Lab Journal: Exploring Our Small Language Model Workshops
Welcome! This short blog post walks through the three hands-on workshops in this repository. Each lab builds on the same local toolkit—Milvus for vector storage, Ollama for small language models, and a few Python helpers—so you can experiment without leaving your laptop. Let us take a friendly tour.
Workshop 1: CSV Embeddings in Milvus
Think of this lab as the foundation. You bring up Milvus, Attu, and Ollama with Docker Compose, then prepare a Python shell that stays active while you work. With that environment ready, you load a course catalog CSV, turn each description into an embedding with Ollama, and store the data inside a Milvus collection.
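If you want a feel for the ingest step before opening the lab, here is a minimal sketch in the same spirit. It assumes a courses.csv with title and description columns, the nomic-embed-text model pulled into Ollama, and Milvus on its default port; the collection, field, and file names are placeholders rather than the lab's exact ones.

```python
# Minimal ingest sketch: define a schema, embed each CSV row, store it in Milvus.
# Assumes courses.csv, a local Milvus, and nomic-embed-text in Ollama (placeholders).
import csv

import ollama
from pymilvus import DataType, MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Define the schema from Python: scalar metadata plus a dense vector field.
schema = client.create_schema(auto_id=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="title", datatype=DataType.VARCHAR, max_length=256)
schema.add_field(field_name="description", datatype=DataType.VARCHAR, max_length=4096)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=768)  # nomic-embed-text is 768-dim
client.create_collection(collection_name="courses", schema=schema)

# Embed every course description with Ollama and store it alongside its metadata.
rows = []
with open("courses.csv", newline="") as f:
    for row in csv.DictReader(f):
        emb = ollama.embeddings(model="nomic-embed-text", prompt=row["description"])
        rows.append({
            "title": row["title"],
            "description": row["description"],
            "vector": emb["embedding"],
        })
client.insert(collection_name="courses", data=rows)
```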
What you will practice
- Creating databases, users, and schemas in Milvus from Python
- Building an IVF_FLAT vector index so searches stay fast
- Issuing vector and scalar queries to explore the new collection (both this and the index build are sketched just after this list)
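Here is the promised sketch of the last two bullets, continuing from the ingest snippet above. The metric, nlist value, and filter expression are assumptions, not the lab's exact settings.

```python
# Index the "courses" collection, then run one vector search and one scalar query.
import ollama
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# IVF_FLAT index over the vector field so searches stay fast.
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="IVF_FLAT",
    metric_type="COSINE",
    params={"nlist": 128},
)
client.create_index(collection_name="courses", index_params=index_params)
client.load_collection(collection_name="courses")

# Vector search: embed the question and look for the closest descriptions.
question = ollama.embeddings(model="nomic-embed-text", prompt="intro to databases")
hits = client.search(
    collection_name="courses",
    data=[question["embedding"]],
    limit=3,
    output_fields=["title"],
)

# Scalar query: plain metadata filtering, no vectors involved.
rows = client.query(
    collection_name="courses",
    filter='title like "Intro%"',
    output_fields=["title", "description"],
)
```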
Why it matters
Once you complete the first workshop you have a working mental model: structured metadata, dense vectors, and Milvus search all live together. The later labs reuse the same habits, so time spent here pays off immediately.
Workshop 2: Cache Prompts to Milvus
The second lab shows how to turn Milvus into a lightweight prompt cache. You connect Ollama embeddings and a chat LLM, define a cache collection, and teach the workflow to reuse past answers before calling the model again.
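A rough sketch of the setup half of that workflow looks like this. It measures the embedding dimension first so the schema matches the model output; the prompt_cache name, field sizes, and index settings are placeholders for whatever the lab actually uses.

```python
# Create a cache collection whose vector dimension is measured from the model.
import ollama
from pymilvus import DataType, MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Measure the embedding dimension once so the schema matches the model output.
dim = len(ollama.embeddings(model="nomic-embed-text", prompt="ping")["embedding"])

schema = client.create_schema(auto_id=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="prompt", datatype=DataType.VARCHAR, max_length=4096)
schema.add_field(field_name="reply", datatype=DataType.VARCHAR, max_length=8192)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=dim)

# Index at creation time so the collection is ready to search immediately.
index_params = client.prepare_index_params()
index_params.add_index(field_name="vector", index_type="IVF_FLAT",
                       metric_type="COSINE", params={"nlist": 128})
client.create_collection(collection_name="prompt_cache", schema=schema,
                         index_params=index_params)
```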
What you will practice
- Measuring embedding dimensions so schemas match your model output
- Creating an indexed collection that stores prompts, replies, and vectors
- Building helper functions that try Milvus first and call Ollama only on a miss (sketched just after this list)
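And here is the cache-aside helper the last bullet refers to, building on the prompt_cache collection sketched above. The llama3.2 model name and the 0.9 similarity threshold are stand-ins you would tune for your own setup.

```python
# Cache-aside helper: answer from Milvus when a similar prompt exists, else call Ollama.
import ollama
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

def embed(text: str) -> list[float]:
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def ask(prompt: str, threshold: float = 0.9) -> str:
    vector = embed(prompt)

    # Try Milvus first: reuse a cached reply if a very similar prompt exists.
    hits = client.search(
        collection_name="prompt_cache",
        data=[vector],
        limit=1,
        output_fields=["reply"],
    )
    if hits[0] and hits[0][0]["distance"] >= threshold:
        return hits[0][0]["entity"]["reply"]

    # Cache miss: call the model once, then store the new pair for next time.
    answer = ollama.chat(
        model="llama3.2",  # placeholder chat model
        messages=[{"role": "user", "content": prompt}],
    )["message"]["content"]
    client.insert(
        collection_name="prompt_cache",
        data=[{"prompt": prompt, "reply": answer, "vector": vector}],
    )
    return answer
```

Because the collection uses a COSINE metric, higher scores mean closer matches, so anything at or above the threshold counts as a hit.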
Why it matters
Caching keeps response times low and protects your hardware from unnecessary inference work. The patterns in this lab transfer to any scenario where repeated questions show up, from chatbots to guided tutorials.
Workshop 3: PDF RAG in Milvus
The final lab pulls everything together to build a retrieval-augmented generation (RAG) loop. You ingest a PDF, split it into digestible chunks, embed each piece with Ollama, and store the results in Milvus. With the data indexed, you run semantic searches, feed the best matches to the language model, and notice how grounded the answers feel.
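A condensed sketch of the ingest half might look like the following, assuming the langchain-community and langchain-text-splitters packages (plus pypdf) are installed and a pdf_chunks collection already exists with a matching vector dimension; the file name and chunk sizes are placeholders.

```python
# Load a PDF, split it into overlapping chunks, embed each chunk, store in Milvus.
import ollama
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Load the PDF and split it into digestible, overlapping chunks.
pages = PyPDFLoader("guide.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(pages)

# Embed each chunk with Ollama and store text and vector side by side in Milvus.
data = [
    {
        "text": chunk.page_content,
        "vector": ollama.embeddings(
            model="nomic-embed-text", prompt=chunk.page_content
        )["embedding"],
    }
    for chunk in chunks
]
client.insert(collection_name="pdf_chunks", data=data)
```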
What you will practice
- Loading and chunking long-form documents with LangChain utilities
- Keeping embedding dimensions aligned between schema and model output
- Orchestrating a complete query: retrieve similar chunks, compose a prompt, and generate an answer (sketched just after this list)
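The final bullet, sketched end to end against the pdf_chunks collection from the ingest snippet above; the prompt wording and the llama3.2 model are assumptions you can swap freely.

```python
# One full RAG query: retrieve similar chunks, compose a grounded prompt, generate.
import ollama
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

def rag_answer(question: str, top_k: int = 3) -> str:
    # 1. Retrieve: embed the question and pull the most similar chunks.
    q_vec = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
    hits = client.search(
        collection_name="pdf_chunks",
        data=[q_vec],
        limit=top_k,
        output_fields=["text"],
    )
    context = "\n\n".join(hit["entity"]["text"] for hit in hits[0])

    # 2. Compose: ground the prompt in the retrieved context.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generate: one chat call with the grounded prompt.
    reply = ollama.chat(model="llama3.2", messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"]
```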
Why it matters
RAG is the sweet spot for small language models. By pairing local embeddings with focused context, you stay in control of data, latency, and cost while still delivering helpful responses.
Wrap-up
All three workshops follow the same friendly rhythm: start services with Docker, keep a Python shell alive, wire Ollama into Milvus, then test your work with clear helper functions. Whether you are cataloging courses, caching chat replies, or grounding a PDF assistant, you now have an approachable playbook for building practical, local-first AI workflows.