Case study
Enterprise AI knowledge platform
Semantic document retrieval, contextual search, embeddings, and AI-assisted querying, built on chunking pipelines, vector search, and scalable ingestion with FastAPI and Qdrant.
Systems engineer — retrieval stack, ingestion, evaluation hooks
- Retrieval: Dense + metadata filters
- Vector store: Qdrant
- Serving: FastAPI (async)
Reference architecture
Problem
- Teams needed answers grounded in internal PDFs, wikis, and tickets, without leaking content across tenant boundaries.
- Naive chunking produced brittle retrieval: tables were split incorrectly, headers were orphaned from their bodies, and near-identical duplicate chunks inflated latency.
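To make these chunking failure modes concrete, here is a minimal Python sketch of header-aware chunking with duplicate removal. It assumes the extracted text uses markdown-style `#` headings. `chunk_by_section`, `Chunk`, and the `max_chars` budget are hypothetical names for illustration, not the platform's actual pipeline. The chunker never splits a paragraph, so contiguous table rows stay in one chunk, and every chunk carries the heading path that scopes it. Duplicates are detected by hashing whitespace- and case-normalized text. This catches trivially different copies but not paraphrases.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: extracted text uses markdown-style headings ("#", "##", ...).
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


@dataclass(frozen=True)
class Chunk:
    text: str
    section_path: Tuple[str, ...]  # e.g. ("Refund policy", "Exceptions")
    content_hash: str


def _normalize(text: str) -> str:
    # Collapse whitespace and case so trivially different copies hash alike.
    return " ".join(text.split()).lower()


def chunk_by_section(doc: str, max_chars: int = 800) -> List[Chunk]:
    """Pack paragraphs into chunks that never cross a heading boundary."""
    path: List[str] = []   # current heading path
    chunks: List[Chunk] = []
    seen = set()           # hashes of chunks already emitted
    buf: List[str] = []    # paragraphs of the chunk being built
    para: List[str] = []   # lines of the paragraph being read

    def flush() -> None:
        if not buf:
            return
        text = "\n\n".join(buf)
        digest = hashlib.sha256(_normalize(text).encode("utf-8")).hexdigest()
        if digest not in seen:  # drop duplicates after normalization
            seen.add(digest)
            chunks.append(Chunk(text, tuple(path), digest))
        buf.clear()

    def end_paragraph() -> None:
        if not para:
            return
        p = "\n".join(para)  # table rows stay together in one paragraph
        # Start a new chunk if this paragraph would exceed the budget;
        # a single oversized paragraph is kept whole rather than split.
        if buf and len("\n\n".join(buf)) + 2 + len(p) > max_chars:
            flush()
        buf.append(p)
        para.clear()

    for line in doc.splitlines():
        m = HEADING.match(line)
        if m:
            end_paragraph()
            flush()
            level = len(m.group(1))
            del path[level - 1:]  # pop headings at this level or deeper
            path.append(m.group(2))
        elif line.strip():
            para.append(line.rstrip())
        else:
            end_paragraph()
    end_paragraph()
    flush()
    return chunks
```

A duplicate is dropped even when it appears under a different section. The surviving chunk keeps the section path of its first occurrence. Whether that is acceptable depends on how much the section path matters for ACLs and citations.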
Architecture
- Ingestion workers normalize files, extract structure-aware chunks, and attach metadata (source, ACL, section path, content hash).
- Embedding requests are batched to the OpenAI API with retry backoff, and writes are idempotent, keyed by content hash.
- Qdrant stores dense vectors with payload filters for tenant, product line, and sensitivity class.
- FastAPI exposes query, feedback, and admin reindex endpoints; LangChain composes retrievers, rerankers, and citation formatting.
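The batching, backoff, and hash-keyed writes described above can be sketched as follows. This is a minimal sketch under stated assumptions:

- `embed_fn` stands in for a call to an embedding API, such as a wrapper around the OpenAI client.
- `TransientEmbeddingError` is a hypothetical placeholder for rate-limit or server errors.
- A plain dict stands in for the vector store.

Point IDs are UUIDv5 values derived from the content hash, because Qdrant accepts UUIDs as point IDs. Re-running ingestion on unchanged content therefore skips the embedding call, and an overwrite never creates a duplicate.

```python
import random
import time
import uuid
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
EmbedFn = Callable[[List[str]], List[Vector]]


class TransientEmbeddingError(Exception):
    """Hypothetical placeholder for a rate-limit (429) or 5xx API error."""


def point_id(content_hash: str) -> str:
    # Deterministic UUID from the content hash: the same content always maps
    # to the same point ID, so re-ingesting it overwrites instead of duplicating.
    return str(uuid.uuid5(uuid.NAMESPACE_OID, content_hash))


def embed_with_backoff(
    embed_fn: EmbedFn,
    texts: List[str],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Vector]:
    for attempt in range(max_retries + 1):
        try:
            return embed_fn(texts)
        except TransientEmbeddingError:
            if attempt == max_retries:
                raise  # give up after the last retry
            # Exponential backoff with full jitter: wait U(0, base * 2^attempt).
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise RuntimeError("unreachable")


def ingest(
    chunks: Sequence[Tuple[str, str]],     # (content_hash, text) pairs
    store: Dict[str, Tuple[str, Vector]],  # stand-in for the vector store
    embed_fn: EmbedFn,
    batch_size: int = 64,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Embed only chunks not already stored; return how many were written."""
    pending: Dict[str, str] = {}
    for content_hash, text in chunks:
        pid = point_id(content_hash)
        if pid not in store:               # skip content embedded in an earlier run
            pending.setdefault(pid, text)  # also dedupes within this run
    items = list(pending.items())
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        vectors = embed_with_backoff(embed_fn, [t for _, t in batch], sleep=sleep)
        for (pid, text), vec in zip(batch, vectors):
            store[pid] = (text, vec)  # upsert keyed by the deterministic ID
    return len(items)
```

Full jitter draws a random delay up to the exponential cap. This spreads out retries from parallel workers so they do not hit the same rate limit in lockstep.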
Challenges
- Chunking strategy: balancing recall on long policy PDFs against precision on short runbooks.
- Cold-start reindex: backfilling millions of tokens without saturating embedding rate limits.
- Grounding: requiring every answer component to cite chunk IDs, and surfacing "insufficient context" instead of hallucinating.
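One way to enforce the grounding rule is to validate citations before returning an answer. The sketch below assumes the generator returns structured components, each with a list of cited chunk IDs. `Component`, `grounded_or_refuse`, and the 0.5 score threshold are hypothetical. The function drops any component that cites nothing or cites a chunk not retrieved above the threshold. If nothing survives, the caller gets "insufficient context".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

INSUFFICIENT = "insufficient context"


@dataclass
class Component:
    text: str
    citations: List[str] = field(default_factory=list)  # cited chunk IDs


def grounded_or_refuse(
    components: List[Component],
    retrieved: Dict[str, float],  # chunk ID -> retrieval score
    min_score: float = 0.5,       # hypothetical evidence threshold
) -> Union[List[Component], str]:
    # Only chunks retrieved above the threshold count as evidence.
    usable = {cid for cid, score in retrieved.items() if score >= min_score}
    # Keep components that cite at least one chunk, and only usable chunks.
    kept = [
        c for c in components
        if c.citations and set(c.citations) <= usable
    ]
    # Nothing grounded survived: refuse rather than answer from model memory.
    return kept if kept else INSUFFICIENT
```

A stricter variant would refuse the whole answer as soon as any component fails validation. Dropping individual components trades completeness for availability.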
Technologies
- FastAPI
- LangChain
- OpenAI APIs
- Qdrant
- PostgreSQL
- Docker
- AWS
Engineering decisions
- Structured chunk metadata instead of larger naive page-level chunks, which improved MRR on internal eval sets.
- Server-side ACL filtering on Qdrant payloads rather than post-filtering in Python, to keep latency predictable.
- Logged retrieval traces (query, filters, top-k IDs) to drive offline evaluation and regression tests on golden questions.
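Logged traces make MRR over a golden question set straightforward to compute. MRR = (1/|Q|) * sum over q in Q of 1/rank_q, where rank_q is the position of the first relevant chunk ID in the logged top-k list for question q. A question contributes 0 if no relevant ID appears. This is a minimal sketch, assuming a hypothetical `RetrievalTrace` record with the fields listed above and a golden set that maps each question to its relevant chunk IDs. The 0.02 regression tolerance is an illustrative value, not one from the project.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence, Set


@dataclass
class RetrievalTrace:
    query: str
    filters: Dict[str, str]  # e.g. {"tenant": "acme"}
    top_k_ids: Sequence[str]  # chunk IDs in ranked order


def reciprocal_rank(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    # 1/rank of the first relevant ID (ranks start at 1); 0 if none retrieved.
    for rank, cid in enumerate(ranked_ids, start=1):
        if cid in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    traces: Iterable[RetrievalTrace], golden: Dict[str, Set[str]]
) -> float:
    # Score only traces whose query belongs to the golden set.
    scores = [
        reciprocal_rank(t.top_k_ids, golden[t.query])
        for t in traces
        if t.query in golden
    ]
    if not scores:
        raise ValueError("no logged traces match the golden question set")
    return sum(scores) / len(scores)


def passes_regression(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    # Fail the check if MRR drops more than `tolerance` below the baseline.
    return current >= baseline - tolerance
```

Because the traces also record the filters, a failing golden question can be replayed with the same tenant and ACL constraints. This helps separate ranking regressions from filtering changes.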
Outcome
- Production retrieval path with traceable citations and tenant-safe filtering.
- Repeatable reindex jobs and versioned embedding models for controlled upgrades.