# RAG Search
The RAG (Retrieval-Augmented Generation) Search module enables semantic search across your entire knowledge base using vector embeddings. Instead of keyword matching, you can ask questions in natural language and retrieve contextually relevant information from all your processed content.
## Prerequisites
- Foundation module completed (vault structure and AI provider configured)
- Meeting Processing module completed (enriched notes ready to embed)
- Docker installed for running Qdrant vector database
- ~30 minutes to complete setup
## What You'll Get
- Vector Embeddings - Semantic representations of all your meeting notes and documents
- Semantic Search - Find information by meaning, not just keywords
- Context Retrieval - Pull relevant passages for AI-powered answers
- Metadata Filtering - Filter by person, date, type, category, or tags
- FastAPI Service - Optional REST API for integration with other tools
## Qdrant Setup
Qdrant is a vector database that stores and searches embeddings. We'll run it locally using Docker.
### Step 1: Start Qdrant
Run Qdrant as a Docker container:
```bash
docker run -d -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage:z \
  qdrant/qdrant:latest
```

This will:

- Run Qdrant in detached mode (`-d`)
- Expose port 6333 for the HTTP API (and 6334 for gRPC)
- Persist data to `./qdrant_storage` on your machine
- Keep running until you stop it (add `--restart unless-stopped` if you want it to start on system boot)
### Step 2: Verify Qdrant is Running
```bash
curl http://localhost:6333/healthz
```

Expected response:

```json
{"title":"healthz","version":"1.x.x"}
```
## Embedding Scripts
The `embed_to_qdrant.py` script takes your markdown files, generates vector embeddings, and stores them in Qdrant for fast semantic search.
### How It Works
- Read - Parses frontmatter (tags, attendees, category) and content
- Chunk - Splits documents into ~1200 character chunks with 200 char overlap (see the sketch after this list)
- Embed - Generates 768-dimensional vectors using your chosen embedding model
- Store - Uploads to Qdrant with metadata for filtering
- Dedupe - Skips unchanged files, tombstones old versions
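The chunking step can be pictured with a few lines of Python. This is a simplified sketch; the function name and exact boundary handling are illustrative, not the script's actual code:

```python
def chunk_text(text: str, size: int = 1200, overlap: int = 200) -> list[str]:
    """Split text into ~size-char chunks, each overlapping the previous by `overlap`."""
    chunks = []
    step = size - overlap  # advance 1000 chars per chunk
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # last chunk reached the end of the document
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, which keeps retrieval from missing context at the seams.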
### Metadata Extraction
The script automatically extracts the following from frontmatter (see the parsing sketch after this list):

- people - From `attendees`, `people`, or `participants`
- tags - From `tags` or `tag`
- category - From `category` or `project` (legacy)
- type - Inferred from category, tags, or folder path (meeting, one-on-one, email, etc.)
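For illustration, here is roughly how the frontmatter aliases can be normalized with PyYAML. This is a sketch of the behavior described above, not the script's actual function; `extract_metadata` is a hypothetical name, and type inference is omitted:

```python
import yaml  # pip install pyyaml

def extract_metadata(markdown: str) -> dict:
    """Parse YAML frontmatter and normalize the alias fields described above."""
    meta = {}
    if markdown.startswith("---"):
        _, frontmatter, _body = markdown.split("---", 2)
        meta = yaml.safe_load(frontmatter) or {}
    return {
        "people": meta.get("attendees") or meta.get("people") or meta.get("participants") or [],
        "tags": meta.get("tags") or meta.get("tag") or [],
        "category": meta.get("category") or meta.get("project"),  # `project` is legacy
    }
```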
## Local LLM Setup (Privacy-First)
For maximum privacy, use Ollama to generate embeddings locally. Your data never leaves your machine.
### Step 1: Install Ollama
Download from ollama.ai and install.
### Step 2: Pull an Embedding Model
```bash
ollama pull nomic-embed-text
```

Note: The current scripts use Vertex AI for embeddings by default. To use Ollama, you'll need to switch the embedding backend in `embed_to_qdrant.py` (set `EMBED_BACKEND=ollama`, or use sentence-transformers with `EMBED_BACKEND=st`).
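If you do wire up the Ollama backend, the embedding call itself is a single HTTP request to Ollama's REST API. A minimal Python sketch (the `ollama_embed` helper name is illustrative; conveniently, `nomic-embed-text` produces 768-dimensional vectors, matching the default `EMBED_DIM`):

```python
import requests

def ollama_embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Fetch an embedding from a locally running Ollama server."""
    resp = requests.post(
        "http://localhost:11434/api/embeddings",  # Ollama's default port
        json={"model": model, "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]  # nomic-embed-text: 768 dimensions
```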
### Using Sentence Transformers (Fully Local)
Install the sentence-transformers library:
```bash
pip install sentence-transformers
```

Set the environment variables:

```bash
export EMBED_BACKEND=st
export ST_MODEL=BAAI/bge-large-en-v1.5
```

This downloads a ~1GB model once and runs entirely on your CPU/GPU. No API keys, no cloud calls. Note that `bge-large-en-v1.5` produces 1024-dimensional vectors rather than the 768 used by the default backend, so it needs its own collection (see "Dimension mismatch error" under Troubleshooting).
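Under the hood, the sentence-transformers backend amounts to something like this (a minimal sketch using the library's public API; the example sentences are made up):

```python
from sentence_transformers import SentenceTransformer

# Downloads ~1GB of weights on first use, then runs fully offline.
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

chunks = [
    "We agreed to fail over the platform to us-east-2.",
    "Jason will own the incident runbook.",
]
vectors = model.encode(chunks, normalize_embeddings=True)
print(vectors.shape)  # (2, 1024) -- note the 1024 dimensions, not 768
```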
## Cloud Embeddings
Cloud embedding services offer faster processing and require less local compute, but send your content to external APIs.
### Vertex AI (Google Cloud) - Default
The scripts use Vertex AI's `text-embedding-004` model by default.
```bash
# Set these environment variables
export GOOGLE_CLOUD_PROJECT=your-project-id
export GOOGLE_CLOUD_LOCATION=us-central1
export GEMINI_EMBED_MODEL=text-embedding-004
export EMBED_DIM=768
```

Authenticate with gcloud:
```bash
gcloud auth application-default login --project your-project-id
```
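For reference, the underlying Vertex AI call looks roughly like this (a sketch using the `vertexai` SDK; the script's actual wrapper code may differ):

```python
import vertexai
from vertexai.language_models import TextEmbeddingModel

vertexai.init(project="your-project-id", location="us-central1")
model = TextEmbeddingModel.from_pretrained("text-embedding-004")

# get_embeddings() returns one TextEmbedding per input; .values is the vector.
embeddings = model.get_embeddings(["What did we decide about the AWS outage?"])
print(len(embeddings[0].values))  # 768
```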
### OpenAI Embeddings

To use OpenAI's `text-embedding-3-large` model:
```bash
# Set environment variables
export EMBED_BACKEND=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-large
```

Cost: ~$0.13 per 1M tokens (~750k words). For example, a 7,500-word transcript is roughly 10,000 tokens, or about $0.0013; a typical meeting transcript costs $0.001-0.01.
## Running Embeddings
### Embed a Single File
```bash
cd ~/Documents/MyVault
python scripts/embed_to_qdrant.py \
  --path "Meetings/12-09-25 - Staff Meeting.md" \
  --type meeting \
  --collection personal_assistant
```

### Embed an Entire Folder (Batch Mode)
```bash
python scripts/embed_to_qdrant.py \
  --input "Meetings" \
  --input "People" \
  --recursive \
  --ext md \
  --vault-root ~/Documents/MyVault \
  --collection personal_assistant
```

This will:

- Recursively scan the `Meetings/` and `People/` folders
- Process all `.md` files
- Use relative paths from the vault root for stable document IDs (see the sketch after this list)
- Skip unchanged files automatically
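The stable-ID and skip logic can be pictured like this (an illustration of the approach, assuming a SHA-256 content hash; the script's exact scheme may differ):

```python
import hashlib
from pathlib import Path

def doc_identity(file_path: Path, vault_root: Path) -> tuple[str, str]:
    """Derive a stable doc ID (vault-relative path) and a content hash."""
    doc_id = file_path.relative_to(vault_root).as_posix()  # stable across machines
    content_hash = hashlib.sha256(file_path.read_bytes()).hexdigest()
    return doc_id, content_hash

# If the stored hash for doc_id matches, the file is skipped; if it changed,
# the old chunks are tombstoned (or hard-deleted) and fresh ones uploaded.
```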
### Force Re-embedding
To re-embed files even if unchanged:
```bash
python scripts/embed_to_qdrant.py \
  --input "Meetings" \
  --recursive \
  --force
```

### Common Options
- `--force` - Re-embed even if the content hash matches
- `--hard-delete-previous` - Physically delete old chunks instead of tombstoning
- `--debug` - Print frontmatter parsing and metadata resolution
- `--vault-root /path/to/vault` - Use relative paths for stable doc IDs
- `--doc-id-key uid` - Use a frontmatter field as the primary doc identifier
## Search Interface
Use `search_qdrant_simple.py` for metadata-only searches (no embeddings required), or the FastAPI service for full semantic search.
### Metadata Search (Simple)
Search by person, type, category, or tags without semantic matching:
```bash
# Find all meetings with Andrew
python scripts/search_qdrant_simple.py --person Andrew --limit 10

# Find one-on-ones from the last 30 days
python scripts/search_qdrant_simple.py --type one-on-one --timeframe 30

# Search by text in titles
python scripts/search_qdrant_simple.py --text-search "platform strategy"
```

### Semantic Search (FastAPI)
For full semantic search with natural language queries, use the RAG API:
```bash
# Start the FastAPI server
uvicorn qdrant_rag:app --host 0.0.0.0 --port 8123
```

Then query via HTTP:
```bash
curl -X POST http://localhost:8123/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what did we decide about the AWS outage?",
    "limit": 5
  }'
```

## Metadata Filtering
The search system supports inline operators to filter results by metadata (a parsing sketch follows the operator list):
### Filter Operators
- `tag:platform-resiliency` - Filter by tag
- `person:andrew` - Filter by attendee (case-insensitive)
- `category:sync-meeting` - Filter by category
- `type:one-on-one` - Filter by document type
- `after:2025-10-01` - Only documents after this date
- `before:2025-12-31` - Only documents before this date
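A minimal sketch of how such operators can be split out of a query string before embedding (illustrative only; the real parser may handle quoting and date values differently):

```python
import re

OPERATORS = {"tag", "person", "category", "type", "after", "before"}

def parse_query(raw: str) -> tuple[str, dict]:
    """Split a raw query into free text and {operator: value} filters."""
    filters = {}

    def grab(match: re.Match) -> str:
        key, value = match.group(1).lower(), match.group(2)
        if key in OPERATORS:
            filters[key] = value.lower()
            return ""          # strip recognized operators from the free text
        return match.group(0)  # leave unknown key:value tokens alone

    text = re.sub(r"(\w+):(\S+)", grab, raw)
    return " ".join(text.split()), filters

print(parse_query("AWS outage tag:platform-resiliency person:andrew"))
# ('AWS outage', {'tag': 'platform-resiliency', 'person': 'andrew'})
```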
### Example Queries
```bash
# One-on-ones with Jason from the last month
python scripts/search_qdrant_simple.py \
  --person jason \
  --type one-on-one \
  --timeframe 30

# Platform resilience discussions tagged with AWS
python scripts/search_qdrant_simple.py \
  --text-search "outage" \
  --tags platform-resiliency aws

# All emails about Compass Assistant
python scripts/search_qdrant_simple.py \
  --type email \
  --text-search "compass assistant"
```

## RAG Queries
The `/ask` endpoint combines semantic search with LLM-powered answers, citing sources inline.
### Ask a Question
```bash
curl -X POST http://localhost:8123/ask \
  -H "Content-Type: application/json" \
  -d '{
    "query": "summarize my 1:1 with Jason last week person:jason category:one-on-one",
    "k": 8
  }'
```

Response format:
```json
{
  "answer": "Jason discussed three main topics: ...",
  "sources": [
    {
      "id": "abc123",
      "score": 0.89,
      "title": "1:1 with Jason Chen",
      "snippet": "We talked about...",
      "path": "Meetings/2025/Q425/12-02-25 - 1-1 with Jason.md",
      "people": ["jason", "erik"],
      "tags": ["one-on-one", "coaching"],
      "type": "one-on-one"
    }
  ]
}
```

### How It Works
- Parse query - Extract filters (person, tag, type, date) and free text
- Embed query - Generate vector for semantic matching
- Search - Find top K most relevant chunks using cosine similarity
- Retrieve context - Extract text snippets from matched chunks
- Generate answer - Send context + query to LLM for synthesis
- Cite sources - Include [S1], [S2] markers in the answer (the whole flow is sketched below)
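Condensed into code, the pipeline looks roughly like this. This is a sketch only: `embed()` and `llm_complete()` are hypothetical stand-ins for your configured embedding backend and LLM call, `parse_query` is the filter-parsing sketch from earlier, and the `text` payload key is an assumption about how chunks are stored:

```python
from qdrant_client import QdrantClient

def ask(query: str, k: int = 8) -> str:
    client = QdrantClient("localhost", port=6333)
    # 1. Split inline filters (person:, tag:, ...) from the free text;
    #    the real service turns `filters` into a Qdrant payload filter.
    query_text, filters = parse_query(query)
    # 2. Embed the free text with whichever backend you configured.
    vector = embed(query_text)  # hypothetical helper
    # 3. Top-K chunks by cosine similarity.
    hits = client.search(
        collection_name="personal_assistant",
        query_vector=vector,
        limit=k,
    )
    # 4. Number the snippets; the "text" payload key is an assumption.
    context = "\n\n".join(
        f"[S{i + 1}] {hit.payload['text']}" for i, hit in enumerate(hits)
    )
    # 5-6. Ask the LLM to synthesize an answer, keeping the [S#] markers.
    prompt = (
        "Answer using only these sources, citing them as [S1], [S2], ...\n\n"
        f"{context}\n\nQuestion: {query_text}"
    )
    return llm_complete(prompt)  # hypothetical helper
```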
## Verify It Works
### Step 1: Embed Sample Documents
```bash
python scripts/embed_to_qdrant.py \
  --input "Meetings" \
  --recursive \
  --ext md \
  --collection personal_assistant
```

Expected output:
```json
{
  "status": "ok",
  "count_processed": 15,
  "count_errors": 0,
  "collection": "personal_assistant",
  "model": "text-embedding-004",
  "embed_dim": 768,
  "items": [ ... ]
}
```

### Step 2: Search by Metadata
```bash
python scripts/search_qdrant_simple.py --person erik --limit 5
```

Expected output:

```text
Found 5 results:

1. Q1 roadmap planning and budget review
   Type: meeting | Category: team-meeting
   People: erik, sarah, jason
   Tags: planning, budget, roadmap

2. 1:1 with Sarah Chen
   Type: one-on-one | Category: one-on-one
   People: erik, sarah
   Tags: one-on-one, coaching, growth
```

### Step 3: Test Semantic Search (Optional)
Start the API server:
```bash
uvicorn qdrant_rag:app --port 8123
```

Query the search endpoint:
```bash
curl -X POST http://localhost:8123/search \
  -H "Content-Type: application/json" \
  -d '{"query": "AWS outage platform resilience", "limit": 3}'
```

You should get semantically relevant results even if the exact keywords don't match.
## Troubleshooting
### Qdrant not running
Check if the Docker container is running:

```bash
docker ps | grep qdrant
```

Start it if stopped:

```bash
docker start <container-id>
```
### Embedding errors: "Missing dependency"
Install the required packages:

```bash
pip install qdrant-client google-cloud-aiplatform pyyaml
```

### Authentication failed (Vertex AI)
Ensure you've authenticated with gcloud and set the required environment variables:
```bash
gcloud auth application-default login --project your-project-id
export GOOGLE_CLOUD_PROJECT=your-project-id
export GOOGLE_CLOUD_LOCATION=us-central1
```

### Search returns no results
Verify documents were embedded successfully:
```bash
# Check collection stats
curl http://localhost:6333/collections/personal_assistant
```

Look for `points_count` - if it's 0, no documents were embedded.
### Dimension mismatch error
The embedding model must match the dimension configured when the collection was created. Vertex AI's `text-embedding-004` produces 768-dim vectors. If you change models, you may need to recreate the collection or use a different collection name (see the sketch below).
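If you do switch models, a separate collection with the right dimension can be created via the qdrant-client API (a sketch; the collection name here is illustrative):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient("localhost", port=6333)

# One collection per embedding dimension; e.g. for 1024-dim bge-large vectors.
client.create_collection(
    collection_name="personal_assistant_bge",  # illustrative name
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)
```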
## Next Steps
Now that you have semantic search working, continue to Calendar Integration to generate proactive meeting briefs using your searchable knowledge base.