Design a RAG-based Q&A System

Design a Retrieval-Augmented Generation system that answers questions about a company's internal knowledge base using LLMs.

Functional Requirements

  • Ingest documents (PDFs, docs, wikis) into a searchable knowledge base
  • Answer natural language questions using relevant context
  • Cite sources for each answer
  • Handle multi-turn conversations with context
  • Support document updates and deletions

Non-Functional Requirements

  • Sub-3-second response time for queries
  • Handle 1000+ concurrent users
  • Support 10M+ document chunks
  • Maintain answer accuracy above 90%
  • Cost-efficient LLM usage

Questions to Consider

  • How do you chunk documents for optimal retrieval?
  • What embedding model and vector database should you use?
  • How do you handle context window limits?
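
As a starting point for the context window question, one common approach is to give retrieved chunks a fixed token budget and fill it greedily in rank order. The sketch below is a minimal illustration, not a recommended design: `pack_context` and its parameters are hypothetical names, and the default token counter simply counts whitespace-separated words, which only roughly approximates a real tokenizer.

```python
from typing import Callable


def pack_context(
    ranked_chunks: list[str],
    budget_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> list[str]:
    """Greedily keep the highest-ranked chunks that fit within the token budget.

    `ranked_chunks` is assumed to be sorted from most to least relevant.
    The default `count_tokens` is a crude word count (an assumption);
    swap in the tokenizer of the chosen LLM for real budgets.
    """
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > budget_tokens:
            # This chunk does not fit; a smaller, lower-ranked one still might.
            continue
        selected.append(chunk)
        used += cost
    return selected
```

In a multi-turn setting the same budget would also have to cover the conversation history and the system prompt, so the budget passed here would be whatever remains after those are accounted for.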

Your Solution

Document Ingestion Pipeline

Design the document processing pipeline. Consider parsing different formats, chunking strategies, metadata extraction, and handling updates/deletions.
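
To make two of these considerations concrete, here is a minimal sketch of fixed-size chunking with overlap, plus content-derived chunk IDs so that a re-ingested document only touches the chunks that changed. All names (`Chunk`, `chunk_document`, `diff_chunks`) and the default sizes are hypothetical, and splitting on whitespace is an assumption; a real pipeline would more likely split on sentence or section boundaries after format-specific parsing.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str  # stable ID derived from the document ID and the chunk text
    doc_id: str
    position: int  # index of the chunk within its document
    text: str


def chunk_document(doc_id: str, text: str, size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split text into windows of `size` words; adjacent windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks: list[Chunk] = []
    for position, start in enumerate(range(0, len(words), step)):
        window = " ".join(words[start:start + size])
        digest = hashlib.sha256(f"{doc_id}\n{window}".encode("utf-8")).hexdigest()[:16]
        chunks.append(Chunk(digest, doc_id, position, window))
        if start + size >= len(words):
            break  # this window already reaches the end of the document
    return chunks


def diff_chunks(old: list[Chunk], new: list[Chunk]) -> tuple[set[str], set[str]]:
    """Return (chunk IDs to delete from the index, chunk IDs to embed and insert)."""
    old_ids = {c.chunk_id for c in old}
    new_ids = {c.chunk_id for c in new}
    return old_ids - new_ids, new_ids - old_ids
```

With this scheme, a document update re-chunks the new version, compares it against the stored chunk IDs, and re-embeds only the difference; deleting a document amounts to removing every chunk that carries its `doc_id`. One trade-off to weigh is that an edit near the start of a document shifts every later fixed-size window, so structure-aware chunking tends to keep the diff smaller.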