Lab Journal: Exploring Our Small Language Model Workshops
Welcome! This short blog post walks through the three hands-on workshops in this repository. Each lab builds on the same local toolkit—Milvus for vector storage, Ollama for small language models, and a few Python helpers—so you can experiment without leaving your laptop. Let us take a friendly tour.
Workshop 1: CSV Embeddings in Milvus
Think of this lab as the foundation. You bring up Milvus, Attu, and Ollama with Docker Compose, then prepare a Python shell that stays active while you work. With that environment ready, you load a course catalog CSV, turn each description into an embedding with Ollama, and store the data inside a Milvus collection.
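If you want a feel for the ingest step before opening the lab, here is a minimal sketch in the same spirit. It assumes a courses.csv with title and description columns, the nomic-embed-text model pulled into Ollama, and Milvus on its default port; the collection, field, and file names are placeholders rather than the lab's exact ones.

```python
# Minimal ingest sketch: define a schema, embed each CSV row, store it in Milvus.
# Assumes courses.csv, a local Milvus, and nomic-embed-text in Ollama (placeholders).
import csv

import ollama
from pymilvus import DataType, MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Define the schema from Python: scalar metadata plus a dense vector field.
schema = client.create_schema(auto_id=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="title", datatype=DataType.VARCHAR, max_length=256)
schema.add_field(field_name="description", datatype=DataType.VARCHAR, max_length=4096)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=768)  # nomic-embed-text is 768-dim
client.create_collection(collection_name="courses", schema=schema)

# Embed every course description with Ollama and store it alongside its metadata.
rows = []
with open("courses.csv", newline="") as f:
    for row in csv.DictReader(f):
        emb = ollama.embeddings(model="nomic-embed-text", prompt=row["description"])
        rows.append({
            "title": row["title"],
            "description": row["description"],
            "vector": emb["embedding"],
        })
client.insert(collection_name="courses", data=rows)
```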
What you will practice
- Creating databases, users, and schemas in Milvus from Python
- Building an IVF_FLAT vector index so searches stay fast
- Issuing vector and scalar queries to explore the new collection (both this and the index build are sketched just after this list)
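Here is the promised sketch of the last two bullets, continuing from the ingest snippet above. The metric, nlist value, and filter expression are assumptions, not the lab's exact settings.

```python
# Index the "courses" collection, then run one vector search and one scalar query.
import ollama
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# IVF_FLAT index over the vector field so searches stay fast.
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="IVF_FLAT",
    metric_type="COSINE",
    params={"nlist": 128},
)
client.create_index(collection_name="courses", index_params=index_params)
client.load_collection(collection_name="courses")

# Vector search: embed the question and look for the closest descriptions.
question = ollama.embeddings(model="nomic-embed-text", prompt="intro to databases")
hits = client.search(
    collection_name="courses",
    data=[question["embedding"]],
    limit=3,
    output_fields=["title"],
)

# Scalar query: plain metadata filtering, no vectors involved.
rows = client.query(
    collection_name="courses",
    filter='title like "Intro%"',
    output_fields=["title", "description"],
)
```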
Why it matters
Once you complete the first workshop you have a working mental model: structured metadata, dense vectors, and Milvus search all live together. The later labs reuse the same habits, so time spent here pays off immediately.
Workshop 2: Cache Prompts to Milvus
The second lab shows how to turn Milvus into a lightweight prompt cache. You connect Ollama embeddings and a chat LLM, define a cache collection, and teach the workflow to reuse past answers before calling the model again.
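A rough sketch of the setup half of that workflow looks like this. It measures the embedding dimension first so the schema matches the model output; the prompt_cache name, field sizes, and index settings are placeholders for whatever the lab actually uses.

```python
# Create a cache collection whose vector dimension is measured from the model.
import ollama
from pymilvus import DataType, MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Measure the embedding dimension once so the schema matches the model output.
dim = len(ollama.embeddings(model="nomic-embed-text", prompt="ping")["embedding"])

schema = client.create_schema(auto_id=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="prompt", datatype=DataType.VARCHAR, max_length=4096)
schema.add_field(field_name="reply", datatype=DataType.VARCHAR, max_length=8192)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=dim)

# Index at creation time so the collection is ready to search immediately.
index_params = client.prepare_index_params()
index_params.add_index(field_name="vector", index_type="IVF_FLAT",
                       metric_type="COSINE", params={"nlist": 128})
client.create_collection(collection_name="prompt_cache", schema=schema,
                         index_params=index_params)
```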
What you will practice
- Measuring embedding dimensions so schemas match your model output
- Creating an indexed collection that stores prompts, replies, and vectors
- Building helper functions that try Milvus first and call Ollama only on a miss (sketched just after this list)
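And here is the cache-aside helper the last bullet refers to, building on the prompt_cache collection sketched above. The llama3.2 model name and the 0.9 similarity threshold are stand-ins you would tune for your own setup.

```python
# Cache-aside helper: answer from Milvus when a similar prompt exists, else call Ollama.
import ollama
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

def embed(text: str) -> list[float]:
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def ask(prompt: str, threshold: float = 0.9) -> str:
    vector = embed(prompt)

    # Try Milvus first: reuse a cached reply if a very similar prompt exists.
    hits = client.search(
        collection_name="prompt_cache",
        data=[vector],
        limit=1,
        output_fields=["reply"],
    )
    if hits[0] and hits[0][0]["distance"] >= threshold:
        return hits[0][0]["entity"]["reply"]

    # Cache miss: call the model once, then store the new pair for next time.
    answer = ollama.chat(
        model="llama3.2",  # placeholder chat model
        messages=[{"role": "user", "content": prompt}],
    )["message"]["content"]
    client.insert(
        collection_name="prompt_cache",
        data=[{"prompt": prompt, "reply": answer, "vector": vector}],
    )
    return answer
```

Because the collection uses a COSINE metric, higher scores mean closer matches, so anything at or above the threshold counts as a hit.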
Why it matters
Caching keeps response times low and protects your hardware from unnecessary inference work. The patterns in this lab transfer to any scenario where repeated questions show up, from chatbots to guided tutorials.
Workshop 3: PDF RAG in Milvus
The final lab pulls everything together to build a retrieval-augmented generation (RAG) loop. You ingest a PDF, split it into digestible chunks, embed each piece with Ollama, and store the results in Milvus. With the data indexed, you run semantic searches, feed the best matches to the language model, and notice how grounded the answers feel.
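A condensed sketch of the ingest half might look like the following, assuming the langchain-community and langchain-text-splitters packages (plus pypdf) are installed and a pdf_chunks collection already exists with a matching vector dimension; the file name and chunk sizes are placeholders.

```python
# Load a PDF, split it into overlapping chunks, embed each chunk, store in Milvus.
import ollama
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Load the PDF and split it into digestible, overlapping chunks.
pages = PyPDFLoader("guide.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(pages)

# Embed each chunk with Ollama and store text and vector side by side in Milvus.
data = [
    {
        "text": chunk.page_content,
        "vector": ollama.embeddings(
            model="nomic-embed-text", prompt=chunk.page_content
        )["embedding"],
    }
    for chunk in chunks
]
client.insert(collection_name="pdf_chunks", data=data)
```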
What you will practice
- Loading and chunking long-form documents with LangChain utilities
- Keeping embedding dimensions aligned between schema and model output
- Orchestrating a complete query: retrieve similar chunks, compose a prompt, and generate an answer (sketched just after this list)
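The final bullet, sketched end to end against the pdf_chunks collection from the ingest snippet above; the prompt wording and the llama3.2 model are assumptions you can swap freely.

```python
# One full RAG query: retrieve similar chunks, compose a grounded prompt, generate.
import ollama
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

def rag_answer(question: str, top_k: int = 3) -> str:
    # 1. Retrieve: embed the question and pull the most similar chunks.
    q_vec = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
    hits = client.search(
        collection_name="pdf_chunks",
        data=[q_vec],
        limit=top_k,
        output_fields=["text"],
    )
    context = "\n\n".join(hit["entity"]["text"] for hit in hits[0])

    # 2. Compose: ground the prompt in the retrieved context.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generate: one chat call with the grounded prompt.
    reply = ollama.chat(model="llama3.2", messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"]
```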
Why it matters
RAG is the sweet spot for small language models. By pairing local embeddings with focused context, you stay in control of data, latency, and cost while still delivering helpful responses.
Wrap-up
All three workshops follow the same friendly rhythm: start services with Docker, keep a Python shell alive, wire Ollama into Milvus, then test your work with clear helper functions. Whether you are cataloging courses, caching chat replies, or grounding a PDF assistant, you now have an approachable playbook for building practical, local-first AI workflows.