RAG Search

The RAG (Retrieval-Augmented Generation) Search module enables semantic search across your entire knowledge base using vector embeddings. Instead of keyword matching, you can ask questions in natural language and retrieve contextually relevant information from all your processed content.

Prerequisites

  • Foundation module completed (vault structure and AI provider configured)
  • Meeting Processing module completed (enriched notes ready to embed)
  • Docker installed for running Qdrant vector database
  • ~30 minutes to complete setup

What You'll Get

  • Vector Embeddings - Semantic representations of all your meeting notes and documents
  • Semantic Search - Find information by meaning, not just keywords
  • Context Retrieval - Pull relevant passages for AI-powered answers
  • Metadata Filtering - Filter by person, date, type, category, or tags
  • FastAPI Service - Optional REST API for integration with other tools

Qdrant Setup

Qdrant is a vector database that stores and searches embeddings. We'll run it locally using Docker.

Step 1: Start Qdrant

Run Qdrant as a Docker container:

docker run -d -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage:z \
  qdrant/qdrant:latest

This will:

  • Run Qdrant in detached mode (-d)
  • Expose port 6333 for the REST API and 6334 for gRPC
  • Persist data to ./qdrant_storage on your machine

Note that the container won't restart automatically after a reboot unless you add --restart unless-stopped to the docker run command.

Step 2: Verify Qdrant is Running

curl http://localhost:6333/healthz

Expected response: healthz check passed
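
If you'd rather check from Python, here's a minimal sketch using the qdrant-client package (pip install qdrant-client; the embedding scripts need it anyway). The collection list will be empty until you embed documents:

# Minimal sketch: confirm Qdrant is reachable from Python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
print(client.get_collections())  # empty list of collections on a fresh install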

Embedding Scripts

The embed_to_qdrant.py script takes your markdown files, generates vector embeddings, and stores them in Qdrant for fast semantic search.

How It Works

  1. Read - Parses frontmatter (tags, attendees, category) and content
  2. Chunk - Splits documents into ~1200 character chunks with 200 char overlap
  3. Embed - Generates embedding vectors with your chosen model (768-dimensional with the default Vertex AI text-embedding-004)
  4. Store - Uploads to Qdrant with metadata for filtering
  5. Dedupe - Skips unchanged files, tombstones old versions
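
Here is a rough sketch of the chunking step described above, assuming a simple character-based splitter. The function and its boundaries are illustrative, not the actual logic in embed_to_qdrant.py:

# Illustrative character-based chunker with overlap (not the script's exact implementation)
def chunk_text(text: str, chunk_size: int = 1200, overlap: int = 200) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so adjacent chunks share ~200 characters
    return chunks

print(len(chunk_text("x" * 3000)))  # 3 chunks for a 3000-character document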

Metadata Extraction

The script automatically extracts from frontmatter:

  • people - From attendees, people, or participants
  • tags - From tags or tag
  • category - From category or project (legacy)
  • type - Inferred from category, tags, or folder path (meeting, one-on-one, email, etc.)
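
As a rough illustration of the fallback order, a hypothetical helper (not the script's actual code) might resolve the fields like this:

# Hypothetical sketch of frontmatter-to-metadata resolution
def resolve_metadata(fm: dict, folder_path: str) -> dict:
    people = fm.get("attendees") or fm.get("people") or fm.get("participants") or []
    tags = fm.get("tags") or fm.get("tag") or []
    category = fm.get("category") or fm.get("project")  # "project" is the legacy key
    # type is inferred from category, tags, or the folder the file lives in
    if category in ("one-on-one", "email"):
        doc_type = category
    elif "one-on-one" in tags:
        doc_type = "one-on-one"
    elif "Meetings" in folder_path:
        doc_type = "meeting"
    else:
        doc_type = "note"
    return {"people": people, "tags": tags, "category": category, "type": doc_type}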

Local LLM Setup (Privacy-First)

For maximum privacy, use Ollama to generate embeddings locally. Your data never leaves your machine.

Step 1: Install Ollama

Download from ollama.ai and install.

Step 2: Pull an Embedding Model

ollama pull nomic-embed-text

Note: The current scripts use Vertex AI for embeddings by default. To use Ollama, you'll need to modify the embedding backend in embed_to_qdrant.py (set EMBED_BACKEND=ollama or use sentence-transformers with EMBED_BACKEND=st).
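
For reference, a raw call to Ollama's local embeddings endpoint looks roughly like this; how the script wires it in depends on the backend change described above:

# Sketch: request a local embedding from Ollama's REST API
import requests

resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "What did we decide about the AWS outage?"},
)
print(len(resp.json()["embedding"]))  # nomic-embed-text produces 768-dimensional vectors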

Using Sentence Transformers (Fully Local)

Install the sentence-transformers library:

pip install sentence-transformers

Set environment variable:

export EMBED_BACKEND=st
export ST_MODEL=BAAI/bge-large-en-v1.5

This downloads a ~1GB model once and runs entirely on your CPU/GPU. No API keys, no cloud calls.
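
A minimal sketch of encoding text this way is below. Note that BAAI/bge-large-en-v1.5 produces 1024-dimensional vectors, so your Qdrant collection dimension must match (see the dimension mismatch note in Troubleshooting):

# Sketch: fully local embeddings with sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")  # downloaded once, then cached locally
vectors = model.encode(["What did we decide about the AWS outage?"])
print(vectors.shape)  # (1, 1024) - this model is 1024-dimensional, not 768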

Cloud Embeddings

Cloud embedding services offer faster processing and require less local compute, but send your content to external APIs.

Vertex AI (Google Cloud) - Default

The scripts use Vertex AI's text-embedding-004 model by default.

# Set these environment variables
export GOOGLE_CLOUD_PROJECT=your-project-id
export GOOGLE_CLOUD_LOCATION=us-central1
export GEMINI_EMBED_MODEL=text-embedding-004
export EMBED_DIM=768

Authenticate with gcloud:

gcloud auth application-default login --project your-project-id
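
Under the hood, a Vertex AI embedding call looks roughly like this (a sketch using the vertexai SDK; the script's actual wrapper may differ):

# Sketch: generating an embedding with Vertex AI's text-embedding-004
import vertexai
from vertexai.language_models import TextEmbeddingModel

vertexai.init(project="your-project-id", location="us-central1")
model = TextEmbeddingModel.from_pretrained("text-embedding-004")
embeddings = model.get_embeddings(["What did we decide about the AWS outage?"])
print(len(embeddings[0].values))  # 768 dimensions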

OpenAI Embeddings

To use OpenAI's text-embedding-3-large model:

# Set environment variables
export EMBED_BACKEND=openai
export OPENAI_API_KEY=sk-...
export OPENAI_EMBEDDING_MODEL=text-embedding-3-large

Cost: ~$0.13 per 1M tokens (~750k words). A typical meeting transcript costs $0.001-0.01.
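
A quick sketch with the official openai Python client is below. text-embedding-3-large defaults to 3072 dimensions, so pass dimensions=768 (or adjust EMBED_DIM and the collection) to stay consistent with the 768-dim setup used elsewhere in this module:

# Sketch: OpenAI embeddings, trimmed to 768 dimensions to match the collection
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="What did we decide about the AWS outage?",
    dimensions=768,
)
print(len(resp.data[0].embedding))  # 768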

Running Embeddings

Embed a Single File

cd ~/Documents/MyVault
python scripts/embed_to_qdrant.py \
  --path "Meetings/12-09-25 - Staff Meeting.md" \
  --type meeting \
  --collection personal_assistant

Embed an Entire Folder (Batch Mode)

python scripts/embed_to_qdrant.py \
  --input "Meetings" \
  --input "People" \
  --recursive \
  --ext md \
  --vault-root ~/Documents/MyVault \
  --collection personal_assistant

This will:

  • Recursively scan Meetings/ and People/ folders
  • Process all .md files
  • Use relative paths from vault root for stable document IDs
  • Skip unchanged files automatically

Force Re-embedding

To re-embed files even if unchanged:

python scripts/embed_to_qdrant.py \
  --input "Meetings" \
  --recursive \
  --force

Common Options

  • --force - Re-embed even if content hash matches
  • --hard-delete-previous - Physically delete old chunks instead of tombstoning
  • --debug - Print frontmatter parsing and metadata resolution
  • --vault-root /path/to/vault - Use relative paths for stable doc IDs
  • --doc-id-key uid - Use frontmatter field as primary doc identifier

Use search_qdrant_simple.py for metadata-only searches (no embeddings required), or the FastAPI service for full semantic search.

Metadata Search (Simple)

Search by person, type, category, or tags without semantic matching:

# Find all meetings with Andrew
python scripts/search_qdrant_simple.py --person Andrew --limit 10

# Find one-on-ones from the last 30 days
python scripts/search_qdrant_simple.py --type one-on-one --timeframe 30

# Search by text in titles
python scripts/search_qdrant_simple.py --text-search "platform strategy"

Semantic Search (FastAPI)

For full semantic search with natural language queries, use the RAG API:

# Start the FastAPI server
uvicorn qdrant_rag:app --host 0.0.0.0 --port 8123

Then query via HTTP:

curl -X POST http://localhost:8123/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what did we decide about the AWS outage?",
    "limit": 5
  }'

Metadata Filtering

The search system supports inline operators to filter results by metadata:

Filter Operators

  • tag:platform-resiliency - Filter by tag
  • person:andrew - Filter by attendee (case-insensitive)
  • category:sync-meeting - Filter by category
  • type:one-on-one - Filter by document type
  • after:2025-10-01 - Only documents after this date
  • before:2025-12-31 - Only documents before this date
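
The operators are written inline in the query string. Here is a hedged example against the local /search endpoint, assuming it parses these operators the same way the /ask example further below does:

# Sketch: combining inline filters with a semantic query
import requests

resp = requests.post(
    "http://localhost:8123/search",
    json={"query": "incident retrospective person:andrew tag:platform-resiliency after:2025-10-01", "limit": 5},
)
print(resp.json())  # inspect the hits; the exact response shape depends on the service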

Example Queries

# One-on-ones with Jason from the last month
python scripts/search_qdrant_simple.py \
  --person jason \
  --type one-on-one \
  --timeframe 30

# Platform resilience discussions tagged with AWS
python scripts/search_qdrant_simple.py \
  --text-search "outage" \
  --tags platform-resiliency aws

# All emails about Compass Assistant
python scripts/search_qdrant_simple.py \
  --type email \
  --text-search "compass assistant"

RAG Queries

The /ask endpoint combines semantic search with LLM-powered answers, citing sources inline.

Ask a Question

curl -X POST http://localhost:8123/ask \
  -H "Content-Type: application/json" \
  -d '{
    "query": "summarize Jason's 1:1 last week person:jason category:one-on-one",
    "k": 8
  }'

Response format:

{
  "answer": "Jason discussed three main topics: ...",
  "sources": [
    {
      "id": "abc123",
      "score": 0.89,
      "title": "1:1 with Jason Chen",
      "snippet": "We talked about...",
      "path": "Meetings/2025/Q425/12-02-25 - 1-1 with Jason.md",
      "people": ["jason", "erik"],
      "tags": ["one-on-one", "coaching"],
      "type": "one-on-one"
    }
  ]
}

How It Works

  1. Parse query - Extract filters (person, tag, type, date) and free text
  2. Embed query - Generate vector for semantic matching
  3. Search - Find top K most relevant chunks using cosine similarity
  4. Retrieve context - Extract text snippets from matched chunks
  5. Generate answer - Send context + query to LLM for synthesis
  6. Cite sources - Include [S1], [S2] markers in the answer
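
As an illustration of step 1, a simple operator parser might look like this (a sketch, not the service's actual parsing code):

# Sketch: split inline operators (person:, tag:, type:, after:, before:) from free text
import re

OPERATORS = r"\b(person|tag|category|type|after|before):(\S+)"

def parse_query(query: str):
    filters = dict(re.findall(OPERATORS, query))  # keeps the last value per operator for simplicity
    free_text = re.sub(OPERATORS, "", query).strip()
    return free_text, filters

text, filters = parse_query("what did we decide about the AWS outage? person:andrew after:2025-10-01")
print(text)     # what did we decide about the AWS outage?
print(filters)  # {'person': 'andrew', 'after': '2025-10-01'}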

Verify It Works

Step 1: Embed Sample Documents

python scripts/embed_to_qdrant.py \
  --input "Meetings" \
  --recursive \
  --ext md \
  --collection personal_assistant

Expected output:

{
  "status": "ok",
  "count_processed": 15,
  "count_errors": 0,
  "collection": "personal_assistant",
  "model": "text-embedding-004",
  "embed_dim": 768,
  "items": [# additional items]
}

Step 2: Search by Metadata

python scripts/search_qdrant_simple.py --person erik --limit 5

Expected output:

Found 5 results:

1. Q1 roadmap planning and budget review
   Type: meeting | Category: team-meeting
   People: erik, sarah, jason
   Tags: planning, budget, roadmap

2. 1:1 with Sarah Chen
   Type: one-on-one | Category: one-on-one
   People: erik, sarah
   Tags: one-on-one, coaching, growth

Step 3: Test Semantic Search (Optional)

Start the API server:

uvicorn qdrant_rag:app --port 8123

Query the search endpoint:

curl -X POST http://localhost:8123/search \
  -H "Content-Type: application/json" \
  -d '{"query": "AWS outage platform resilience", "limit": 3}'

You should get semantically relevant results even if the exact keywords don't match.

Troubleshooting

Qdrant not running

Check if the Docker container is running: docker ps | grep qdrant

Start it if stopped: docker start <container-id>

Embedding errors: "Missing dependency"

Install required packages:

pip install qdrant-client google-cloud-aiplatform pyyaml

Authentication failed (Vertex AI)

Ensure you've authenticated with gcloud and set the required environment variables:

gcloud auth application-default login --project your-project-id
export GOOGLE_CLOUD_PROJECT=your-project-id
export GOOGLE_CLOUD_LOCATION=us-central1

Search returns no results

Verify documents were embedded successfully:

# Check collection stats
curl http://localhost:6333/collections/personal_assistant

Look for "points_count" - if it's 0, no documents were embedded.

Dimension mismatch error

The embedding model must match the dimension configured during collection creation. Vertex AI's text-embedding-004 produces 768-dim vectors. If you change models, you may need to recreate the collection or use a different collection name.
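
If you do need to recreate the collection at a different dimension (this deletes its existing points), a sketch with qdrant-client looks like this:

# Sketch: recreate the collection with a new vector size (wipes existing points)
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")
client.recreate_collection(
    collection_name="personal_assistant",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),  # e.g. 1024 for bge-large-en-v1.5
)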

Next Steps

Now that you have semantic search working, continue to Calendar Integration to generate proactive meeting briefs using your searchable knowledge base.