
GenAI Engineer Projects — 20 Portfolio Ideas to Get Hired in 2026


In the current hiring landscape for GenAI engineering roles, your portfolio projects carry more weight than academic credentials. Employers care far less about your degree or certifications than about your ability to build systems that work in production. This is a fundamental shift from traditional software engineering, where formal education often served as a proxy for capability.

The market demand for GenAI engineers far exceeds supply, but this does not mean hiring standards have dropped. On the contrary, companies are highly selective because failed AI projects are expensive. A bad hire who deploys hallucinating systems or creates security vulnerabilities can cost millions. Your portfolio must prove you can be trusted with production systems.

Why Projects Matter More Than Credentials:

  • Demonstration over declaration: Anyone can list “LangChain” on a resume. Building a system that handles edge cases, implements proper error handling, and scales under load proves actual competence.
  • Portfolio as conversation starter: Interviewers will spend 60-70% of technical discussions on your projects. Poor projects lead to shallow conversations. Strong projects demonstrate depth.
  • Proof of production thinking: Academic exercises optimize for correctness. Production systems optimize for reliability, cost, maintainability, and observability. Your projects must show you understand this distinction.
  • Differentiation in a crowded field: Bootcamp graduates and self-taught developers all build the same tutorial projects. Distinctive, well-architected systems make you memorable.

The Portfolio Mindset:

Treat your GitHub profile as a product. Each repository should tell a story: what problem you solved, why you made specific architectural choices, how you handled failures, and what you would do differently with more resources. Code quality, documentation, and deployment matter as much as functionality.


Understanding what employers actually evaluate in portfolio projects is essential for building the right things. Reviews of hundreds of GenAI engineering candidates and conversations with hiring managers at companies ranging from Series A startups to FAANG reveal clear patterns.

What Employers Actually Look For:

| Evaluation Dimension | What They Want to See | Red Flags |
| --- | --- | --- |
| System Thinking | Architecture diagrams, component separation, clear interfaces | Monolithic scripts, no modularity, everything in one file |
| Production Awareness | Error handling, logging, monitoring, rate limiting | Happy-path only code, no error handling, missing logs |
| Trade-off Analysis | Documented decisions with pros/cons | "I used X because it's popular" without justification |
| Testing Strategy | Unit tests, integration tests, evaluation frameworks | No tests, manual verification only |
| Operational Concerns | Dockerfiles, deployment configs, cost tracking | "Works on my machine", no deployment path |
| Code Quality | Type hints, docstrings, consistent style, linting | Untyped code, no documentation, inconsistent formatting |

The Three-Project Rule:

Quality consistently beats quantity. Three exceptional projects that demonstrate depth across different domains will outperform ten shallow tutorial implementations. Your portfolio should tell a coherent story about your capabilities.

Selecting Projects for Your Target Role:

  • Junior roles (0-2 years): Focus on projects that demonstrate you can learn, follow patterns, and write clean code. Employers expect to teach you, but you must prove you are teachable.
  • Mid-level roles (2-5 years): Projects should show independent system design, deployment experience, and the ability to optimize for non-functional requirements like latency and cost.
  • Senior roles (5+ years): Build systems that demonstrate architectural judgment, scalability thinking, and the ability to make complex trade-offs. Include projects that show technical leadership potential.

Before diving into specific projects, understand what separates portfolio-worthy projects from tutorial implementations. This mental model will guide every architectural decision you make.

The Portfolio-Worthiness Framework:

A project is portfolio-worthy when it demonstrates one or more of the following:

  1. Complex Integration: Multiple systems working together (LLM, database, cache, API) with clear interfaces and error handling
  2. Scale Thinking: Design decisions that would hold up under increased load, even if the current implementation is small
  3. Operational Maturity: Monitoring, deployment, and maintenance considerations built in from the start
  4. Domain Expertise: Deep understanding of a specific problem space (legal, medical, finance) with appropriate constraints and safety measures
  5. Innovation: Novel approaches to known problems, or novel applications of existing techniques

The Layered Architecture Pattern:

Most production GenAI systems follow a consistent layered pattern:

┌─────────────────────────────────────────────────────────────┐
│ Presentation Layer │
│ (Web UI, API endpoints, CLI interface) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Application Layer │
│ (Request validation, orchestration, session management) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Core Logic Layer │
│ (RAG pipeline, agent workflows, prompt templates) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Infrastructure Layer │
│ (LLM clients, vector DB, cache, external APIs) │
└─────────────────────────────────────────────────────────────┘

Each layer has a single responsibility and communicates through well-defined interfaces. This separation enables testing, swapping implementations, and reasoning about the system.
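
One way to make those layer boundaries explicit in Python is to have the core logic depend only on small interfaces (Protocols here), so the infrastructure behind them can be swapped or mocked without touching the layer above. A minimal sketch; the names are illustrative, not prescribed.

from typing import Protocol

class Retriever(Protocol):
    def retrieve(self, query: str, top_k: int) -> list[str]: ...

class LLMClient(Protocol):
    def complete(self, prompt: str) -> str: ...

class QAService:
    """Core-logic layer: orchestrates retrieval and generation, nothing else."""

    def __init__(self, retriever: Retriever, llm: LLMClient):
        self._retriever = retriever
        self._llm = llm

    def answer(self, question: str) -> str:
        # Infrastructure details (which vector DB, which model) live behind the interfaces
        context = "\n\n".join(self._retriever.retrieve(question, top_k=5))
        return self._llm.complete(f"Context:\n{context}\n\nQuestion: {question}")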

The Failure-First Design Principle:

Production systems spend most of their time handling edge cases, not the happy path. Design your projects assuming:

  • The LLM will hallucinate or timeout
  • The vector database will be temporarily unavailable
  • User input will be malformed or malicious
  • External APIs will return errors or rate limit
  • Network calls will fail intermittently

Every component should have a fallback strategy. Document these decisions in your README.
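
As a minimal sketch of the principle, here is one way to wrap an LLM call with a timeout, bounded retries, and an explicit fallback answer instead of a crash. The function names, timeout, and retry limits are illustrative.

import asyncio
import logging

logger = logging.getLogger(__name__)

async def generate_answer(prompt: str, llm_call, max_retries: int = 2) -> str:
    """Call the LLM with retries; degrade to an honest fallback instead of failing."""
    for attempt in range(max_retries + 1):
        try:
            # llm_call is any async function that takes a prompt and returns text
            return await asyncio.wait_for(llm_call(prompt), timeout=30)
        except asyncio.TimeoutError:
            logger.warning("LLM timeout on attempt %d", attempt + 1)
        except Exception:
            logger.exception("LLM call failed on attempt %d", attempt + 1)
        await asyncio.sleep(2 ** attempt)  # exponential backoff between retries
    return "I could not generate an answer right now. Please try again."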


This section provides detailed specifications for eight projects across three career levels. Each specification includes problem context, architecture, technology choices, implementation milestones, testing strategy, deployment approach, and interview preparation.


Before examining individual projects, understand the architectural patterns common to production GenAI systems.

The Standard RAG Pipeline:

┌─────────────────────────────────────────────────────────────────┐
│ INGESTION PIPELINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Raw Documents → Parsing → Chunking → Embedding → Vector Store │
│ ↑ │
│ (PDF, HTML, (Text (OpenAI, (Pinecone, │
│ Markdown) extraction) open source) Weaviate) │
│ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ QUERY PIPELINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ User Query → Embedding → Retrieval → Reranking → LLM → Response│
│ ↑ ↑ │
│ (Optional: (Vector + Keyword (Cross-encoder (GPT-4, │
│ Query rewrite) hybrid) scoring) Claude) │
│ │
└─────────────────────────────────────────────────────────────────┘

The Agent Orchestration Pattern:

┌─────────────────────────────────────────────────────────────────┐
│ AGENT ORCHESTRATOR │
│ (State management, routing, error handling) │
└─────────────────────────────────────────────────────────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Search │ │ Analysis │ │ Action │ │ Response │
│ Agent │ │ Agent │ │ Agent │ │ Agent │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
│ │ │ │
└──────────────┴──────────────┴──────────────┘
┌──────────────────┐
│ Shared State │
│ (Checkpoints) │
└──────────────────┘

The Multi-Tenant SaaS Architecture:

┌─────────────────────────────────────────────────────────────────┐
│ API Gateway │
│ (Auth, rate limiting, routing) │
└─────────────────────────────────────────────────────────────────┘
┌───────────────┼───────────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Tenant │ │ Tenant │ │ Tenant │
│ A │ │ B │ │ C │
│(Isolated│ │(Isolated│ │(Isolated│
│ Data) │ │ Data) │ │ Data) │
└─────────┘ └─────────┘ └─────────┘
│ │ │
└───────────────┼───────────────┘
┌────────────────────┐
│ Shared Services │
│ (LLM, Embedding) │
└────────────────────┘

These projects demonstrate foundational skills. Focus on code quality, clear documentation, and understanding the basic patterns.


Problem Statement:

Organizations generate vast amounts of unstructured documentation (PDFs, manuals, reports) that employees need to query efficiently. Traditional search is keyword-based and misses semantic meaning. Build a system that allows users to upload documents and ask natural language questions, receiving accurate answers grounded in the document content.

Why This Matters:

Document Q&A is the most common enterprise GenAI use case. It demonstrates your ability to implement the core RAG pattern that powers countless production systems. Every interviewer will understand this problem domain.

Architecture Diagram:

┌─────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ ┌─────────────┐ ┌─────────────────────────────┐ │
│ │ Streamlit │ │ File Upload Component │ │
│ │ Web UI │◄────────────►│ (Drag & Drop, Progress) │ │
│ └─────────────┘ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ API LAYER │
│ FastAPI Application │
├─────────────────────────────────────────────────────────────────┤
│ │
│ POST /upload → DocumentHandler → Processing Pipeline │
│ POST /query → QueryHandler → RAG Pipeline │
│ GET /documents → ListHandler → Metadata Store │
│ │
└─────────────────────────────────────────────────────────────────┘
┌──────────────────┼──────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ DOCUMENT │ │ VECTOR │ │ LLM │
│ PROCESSING │ │ STORE │ │ CLIENT │
├───────────────┤ ├───────────────┤ ├───────────────┤
│ pdfplumber │ │ ChromaDB │ │ OpenAI API │
│ (extraction) │ │ (in-memory │ │ GPT-4o-mini │
│ │ │ or persist) │ │ │
│ Recursive │ │ │ │ Async client │
│ chunking │ │ Cosine sim │ │ with retry │
│ (500 tokens, │ │ retrieval │ │ logic │
│ 50 overlap) │ │ │ │ │
└───────────────┘ └───────────────┘ └───────────────┘

Technology Stack:

| Component | Technology | Version | Purpose |
| --- | --- | --- | --- |
| Web Framework | FastAPI | 0.115+ | Async API endpoints |
| UI | Streamlit | 1.40+ | Rapid prototyping interface |
| Document Parsing | pdfplumber | 0.11+ | PDF text extraction |
| Text Chunking | langchain-text-splitters | 0.3+ | Semantic chunking |
| Embeddings | OpenAI text-embedding-3-small | API | Document/query vectors |
| Vector Store | ChromaDB | 0.6+ | Local vector storage |
| LLM | OpenAI GPT-4o-mini | API | Answer generation |
| Validation | Pydantic | 2.10+ | Request/response models |
| Testing | pytest | 8.3+ | Unit and integration tests |
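
The recursive chunking step from the architecture diagram (500 tokens with 50-token overlap) can be sketched with langchain-text-splitters. Using the tiktoken-based splitter is one reasonable choice here, and the exact sizes are tuning knobs rather than fixed requirements.

from langchain_text_splitters import RecursiveCharacterTextSplitter

def chunk_document(text: str) -> list[str]:
    """Split extracted PDF text into overlapping, token-sized chunks."""
    splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        encoding_name="cl100k_base",  # tokenizer family used by OpenAI embedding models
        chunk_size=500,               # tokens per chunk
        chunk_overlap=50,             # tokens shared between neighbouring chunks
    )
    return splitter.split_text(text)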

File Structure:

document-qa-system/
├── README.md
├── requirements.txt
├── pyproject.toml
├── Dockerfile
├── docker-compose.yml
├── .env.example
├── .gitignore
├── src/
│ ├── __init__.py
│ ├── main.py # FastAPI application entry
│ ├── config.py # Configuration management
│ ├── models/
│ │ ├── __init__.py
│ │ ├── schemas.py # Pydantic models
│ │ └── enums.py # Domain enums
│ ├── services/
│ │ ├── __init__.py
│ │ ├── document_service.py # Document processing
│ │ ├── embedding_service.py # Embedding generation
│ │ ├── retrieval_service.py # Vector search
│ │ └── llm_service.py # LLM interaction
│ ├── core/
│ │ ├── __init__.py
│ │ ├── exceptions.py # Custom exceptions
│ │ ├── logging_config.py
│ │ └── constants.py
│ └── api/
│ ├── __init__.py
│ ├── routes.py # API endpoint definitions
│ └── dependencies.py # FastAPI dependencies
├── ui/
│ └── streamlit_app.py # Streamlit interface
├── tests/
│ ├── __init__.py
│ ├── conftest.py # pytest fixtures
│ ├── unit/
│ │ ├── test_document_service.py
│ │ ├── test_retrieval_service.py
│ │ └── test_llm_service.py
│ └── integration/
│ └── test_api.py
└── docs/
└── architecture.md # Design decisions

Implementation Milestones:

| Milestone | Duration | Deliverable | Success Criteria |
| --- | --- | --- | --- |
| M1: Project Setup | 2 days | Repository with structure | Tests run, linting passes |
| M2: Document Processing | 3 days | PDF extraction pipeline | 95%+ text extraction accuracy |
| M3: Vector Pipeline | 3 days | Embedding and storage | Sub-100ms retrieval latency |
| M4: RAG Integration | 4 days | End-to-end Q&A | 80%+ answer relevance (manual) |
| M5: Web Interface | 3 days | Streamlit UI | Upload, query, display flow works |
| M6: Testing & Polish | 3 days | Test suite, documentation | 80%+ code coverage |

Testing Strategy:

# tests/unit/test_retrieval_service.py
import pytest
from unittest.mock import Mock

from src.services.retrieval_service import RetrievalService


class TestRetrievalService:
    @pytest.fixture
    def mock_chroma(self):
        return Mock()

    @pytest.fixture
    def service(self, mock_chroma):
        return RetrievalService(chroma_client=mock_chroma)

    def test_retrieve_returns_formatted_results(self, service, mock_chroma):
        """Retrieval should return context documents with scores."""
        mock_chroma.query.return_value = {
            "documents": [["chunk1", "chunk2"]],
            "distances": [[0.1, 0.3]],
            "metadatas": [[{"source": "doc1"}, {"source": "doc1"}]],
        }
        results = service.retrieve(query="test query", top_k=2)
        assert len(results) == 2
        assert results[0]["content"] == "chunk1"
        assert results[0]["score"] == 0.9  # Converted from distance

    def test_retrieve_handles_empty_results(self, service, mock_chroma):
        """Should gracefully handle no matches found."""
        mock_chroma.query.return_value = {
            "documents": [[]],
            "distances": [[]],
            "metadatas": [[]],
        }
        results = service.retrieve(query="nonsense query", top_k=5)
        assert results == []

    def test_retrieve_respects_top_k(self, service, mock_chroma):
        """Should respect the top_k parameter."""
        mock_chroma.query.return_value = {
            "documents": [["a", "b", "c", "d", "e"]],
            "distances": [[0.1, 0.2, 0.3, 0.4, 0.5]],
            "metadatas": [[{}] * 5],
        }
        results = service.retrieve(query="test", top_k=3)
        assert len(results) == 3
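
For reference, a sketch of the RetrievalService these tests assume: it queries the injected Chroma object, converts cosine distance into a similarity score (1 - distance, matching the 0.9 assertion above), and truncates to top_k. In real ChromaDB the query call lives on a collection object; the injected client stands in for it here, exactly as the mocks do.

from typing import Any

class RetrievalService:
    def __init__(self, chroma_client: Any):
        self._client = chroma_client

    def retrieve(self, query: str, top_k: int = 5) -> list[dict]:
        """Run a similarity search and return chunks with normalized scores."""
        response = self._client.query(query_texts=[query], n_results=top_k)
        documents = response["documents"][0]
        distances = response["distances"][0]
        metadatas = response["metadatas"][0]

        results = []
        for content, distance, metadata in zip(documents, distances, metadatas):
            results.append({
                "content": content,
                "score": round(1.0 - distance, 4),  # cosine distance -> similarity
                "metadata": metadata,
            })
        # Guard against backends that return more results than requested
        return results[:top_k]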

Deployment Approach:

# Dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install system dependencies (curl is needed by the HEALTHCHECK below)
RUN apt-get update && apt-get install -y \
    gcc \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first for layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY src/ ./src/
COPY ui/ ./ui/

# Create volume for persistent storage
VOLUME ["/app/data"]

# Expose ports for both API and UI
EXPOSE 8000 8501

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Default command runs API
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
# docker-compose.yml
version: '3.8'

services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - CHROMA_PERSIST_DIR=/app/data/chroma
    volumes:
      - chroma_data:/app/data/chroma
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  ui:
    build: .
    command: streamlit run ui/streamlit_app.py --server.port=8501 --server.address=0.0.0.0
    ports:
      - "8501:8501"
    environment:
      - API_URL=http://api:8000
    depends_on:
      api:
        condition: service_healthy

volumes:
  chroma_data:

What Interviewers Will Ask:

  1. “How did you handle PDFs with tables and images?”

    • Expectation: Discussion of extraction limitations, choice of pdfplumber for table support, acknowledgement that images require OCR or multimodal models
  2. “What chunking strategy did you use and why?”

    • Expectation: Recursive character splitting with overlap, explanation of trade-offs between chunk size and context preservation
  3. “How do you prevent the system from making up answers when documents do not contain the information?”

    • Expectation: System prompt instructions, confidence thresholds, “I do not know” responses
  4. “What would you change to support 1000 concurrent users?”

    • Expectation: Async processing, connection pooling, vector database scaling, caching layer

Problem Statement:

Job seekers struggle to tailor resumes for specific positions. Recruiters spend seconds scanning resumes and miss qualified candidates due to formatting or keyword issues. Build a tool that analyzes a resume against a job description, extracts key requirements, scores alignment, and provides specific improvement suggestions.

Why This Matters:

This project demonstrates structured output extraction, comparative analysis, and practical utility. HR tech is a major GenAI application area, and this project shows you can build tools with measurable business value.

Architecture Diagram:

┌─────────────────────────────────────────────────────────────────┐
│ INPUT LAYER │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐ ┌─────────────────────────┐ │
│ │ Resume Upload │ │ Job Description Input │ │
│ │ (PDF, DOCX) │ │ (Text paste, URL) │ │
│ └─────────────────┘ └─────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ EXTRACTION PIPELINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Resume Extraction Agent │ │
│ │ • Personal info (name, contact) - Pydantic model │ │
│ │ • Work experience (company, role, dates, bullets) │ │
│ │ • Skills (technical, soft skills) │ │
│ │ • Education (degree, institution, year) │ │
│ │ • Projects (title, description, technologies) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Job Description Extraction Agent │ │
│ │ • Required skills (must-have vs nice-to-have) │ │
│ │ • Experience level (years, seniority) │ │
│ │ • Key responsibilities │ │
│ │ • Company culture indicators │ │
│ │ • Salary range (if present) │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ ANALYSIS ENGINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Skill │ │ Experience │ │ Semantic │ │
│ │ Matching │ │ Comparison │ │ Similarity │ │
│ │ Algorithm │ │ Logic │ │ Scoring │ │
│ │ │ │ │ │ │ │
│ │ Exact match │ │ Years calc │ │ Resume embedding │ │
│ │ Fuzzy match │ │ Level check │ │ JD embedding │ │
│ │ Synonyms │ │ Gap analysis │ │ Cosine similarity │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ OUTPUT GENERATION │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Analysis Report (Structured) │ │
│ │ │ │
│ │ Overall Match Score: 72/100 │ │
│ │ │ │
│ │ Strengths: │ │
│ │ ✓ Strong technical skills alignment (Python, AWS) │ │
│ │ ✓ Relevant 5 years experience │ │
│ │ │ │
│ │ Gaps: │ │
│ │ ✗ Missing: Kubernetes experience │ │
│ │ ✗ Missing: Team leadership experience │ │
│ │ ! Warning: Resume uses "managed" instead of "led" │ │
│ │ │ │
│ │ Recommendations: │ │
│ │ 1. Add Kubernetes to skills section │ │
│ │ 2. Quantify impact in project descriptions │ │
│ │ 3. Use stronger action verbs │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Technology Stack:

| Component | Technology | Version | Purpose |
| --- | --- | --- | --- |
| Document Parsing | python-docx, pdfplumber | 1.1+, 0.11+ | Resume extraction |
| Structured Output | Pydantic | 2.10+ | Schema validation |
| LLM | OpenAI GPT-4o-mini | API | Extraction and analysis |
| Text Similarity | sentence-transformers | 3.4+ | Semantic matching |
| Web UI | Gradio | 5.0+ | Simple interface |
| Async | asyncio | stdlib | Concurrent processing |
| Testing | pytest, pytest-asyncio | 8.3+ | Test framework |

Implementation Milestones:

| Milestone | Duration | Deliverable | Success Criteria |
| --- | --- | --- | --- |
| M1: Schema Design | 2 days | Pydantic models for resume/JD | Validation passes on samples |
| M2: Resume Parser | 3 days | PDF/DOCX extraction | 90%+ field extraction rate |
| M3: JD Parser | 2 days | JD text extraction | Structured output consistent |
| M4: Analysis Engine | 4 days | Matching and scoring | Manual evaluation agrees 75%+ |
| M5: Report Generation | 2 days | Formatted recommendations | Actionable, specific advice |
| M6: UI & Polish | 2 days | Gradio interface | End-to-end flow complete |

Key Code Pattern - Structured Extraction:

# src/models/schemas.py
from pydantic import BaseModel, Field
from typing import List, Optional
from datetime import date
from enum import Enum


class SkillLevel(str, Enum):
    EXPERT = "expert"
    ADVANCED = "advanced"
    INTERMEDIATE = "intermediate"
    BEGINNER = "beginner"


class WorkExperience(BaseModel):
    company: str = Field(description="Employer name")
    title: str = Field(description="Job title")
    start_date: Optional[date] = Field(None, description="Start date")
    end_date: Optional[date] = Field(None, description="End date or null if current")
    is_current: bool = Field(False, description="Whether this is current position")
    bullets: List[str] = Field(default_factory=list, description="Achievement bullets")

    @property
    def duration_months(self) -> int:
        """Calculate experience duration in months."""
        if self.start_date is None:
            return 0
        end = self.end_date or date.today()
        return (end.year - self.start_date.year) * 12 + (end.month - self.start_date.month)


class ResumeData(BaseModel):
    name: str = Field(description="Candidate full name")
    email: Optional[str] = Field(None, description="Contact email")
    phone: Optional[str] = Field(None, description="Contact phone")
    linkedin: Optional[str] = Field(None, description="LinkedIn URL")
    summary: Optional[str] = Field(None, description="Professional summary")
    skills: dict[str, List[str]] = Field(
        default_factory=dict,
        description="Categorized skills: technical, soft, domain, tools"
    )
    experience: List[WorkExperience] = Field(default_factory=list)
    education: List[dict] = Field(default_factory=list)
    projects: List[dict] = Field(default_factory=list)

    @property
    def total_years_experience(self) -> float:
        """Calculate total years of professional experience."""
        total_months = sum(exp.duration_months for exp in self.experience)
        return round(total_months / 12, 1)

    @property
    def all_skills_flat(self) -> List[str]:
        """Return all skills as a flat list."""
        return [
            skill.lower()
            for category in self.skills.values()
            for skill in category
        ]


class JobRequirement(BaseModel):
    skill: str = Field(description="Required skill or qualification")
    is_required: bool = Field(True, description="Must-have vs nice-to-have")
    importance: int = Field(1, ge=1, le=5, description="Importance 1-5")
    context: Optional[str] = Field(None, description="How skill is used in role")


class JobDescription(BaseModel):
    title: str = Field(description="Job title")
    company: Optional[str] = Field(None, description="Company name")
    level: Optional[str] = Field(None, description="Seniority level")
    min_years_experience: Optional[int] = Field(None)
    location: Optional[str] = Field(None)
    salary_range: Optional[str] = Field(None)
    requirements: List[JobRequirement] = Field(default_factory=list)
    responsibilities: List[str] = Field(default_factory=list)
    culture_indicators: List[str] = Field(default_factory=list)


class MatchAnalysis(BaseModel):
    overall_score: int = Field(ge=0, le=100, description="Overall match percentage")
    skill_match_score: int = Field(ge=0, le=100)
    experience_match_score: int = Field(ge=0, le=100)
    semantic_similarity_score: float = Field(ge=0, le=1)
    matched_skills: List[str] = Field(default_factory=list)
    missing_skills: List[JobRequirement] = Field(default_factory=list)
    experience_gaps: List[str] = Field(default_factory=list)
    strengths: List[str] = Field(default_factory=list)
    recommendations: List[str] = Field(default_factory=list, min_length=3)
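
With the schemas in place, extraction itself can be a single structured-output call. A hedged sketch, assuming the OpenAI Python SDK's parse helper; the system prompt wording is illustrative and deliberately conservative to discourage invented fields.

from openai import OpenAI
from src.models.schemas import ResumeData  # defined above

client = OpenAI()

def extract_resume(resume_text: str) -> ResumeData:
    """Ask the model to populate the ResumeData schema from raw resume text."""
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Extract resume fields exactly as written. "
                           "Leave unknown fields null; never invent values.",
            },
            {"role": "user", "content": resume_text},
        ],
        response_format=ResumeData,  # Pydantic model enforces the output schema
    )
    return completion.choices[0].message.parsed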

What Interviewers Will Ask:

  1. “How do you handle resumes with non-standard formats or creative layouts?”

    • Expectation: Discussion of extraction limitations, fallback strategies, graceful degradation
  2. “What accuracy did you achieve for skill extraction, and how did you measure it?”

    • Expectation: Manual evaluation on test set, precision/recall metrics, error analysis
  3. “How did you prevent the system from hallucinating requirements that are not in the job description?”

    • Expectation: Strict output schema, validation, conservative extraction with low confidence handling

These projects demonstrate production-grade implementation skills. Focus on performance optimization, error handling, and deployment concerns.


Project 3: Advanced RAG with Hybrid Search

Problem Statement:

Basic RAG systems often fail to retrieve relevant documents because semantic search alone misses exact keyword matches, especially for technical terms, product names, and acronyms. Build a production-grade RAG system that combines dense (semantic) and sparse (keyword) retrieval, includes reranking, handles conversation history, and deploys as a scalable API.

Why This Matters:

This is the standard for production RAG systems. Basic implementations fail in real-world scenarios with diverse document types and query patterns. This project proves you can build systems that work under realistic constraints.

Architecture Diagram:

┌─────────────────────────────────────────────────────────────────┐
│ DOCUMENT INGESTION │
│ (Async Processing Pipeline) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Upload API → Validation → Parsing → Chunking → Queue │
│ ↓ │
│ (Size, type (Schema (Unstructured (Semantic (Redis│
│ checks) validation) io) split) Stream)│
│ │
│ Worker Pool: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ 1. Generate dense embedding (OpenAI text-embedding-3) │ │
│ │ 2. Generate sparse embedding (BM25/SPLADE via sentence) │ │
│ │ 3. Store in Pinecone with metadata │ │
│ │ 4. Index keywords in Elasticsearch (optional) │ │
│ │ 5. Update processing status │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ QUERY PIPELINE │
│ (Hybrid Retrieval) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ User Query │
│ │ │
│ ▼ │
│ ┌─────────────────┐ ┌─────────────────────────────────────┐ │
│ │ Query Rewriting │───►│ • Expand acronyms │ │
│ │ (Optional LLM) │ │ • Add synonyms │ │
│ │ │ │ • Clarify ambiguous terms │ │
│ └─────────────────┘ └─────────────────────────────────────┘ │
│ │ │
│ ├────────────────────────┬─────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Dense │ │ Sparse │ │ Keyword │ │
│ │ Retrieval │ │ Retrieval │ │ (BM25) │ │
│ │ │ │ │ │ │ │
│ │ Pinecone │ │ Pinecone │ │ Elasticsearch│ │
│ │ (cosine) │ │ (dot prod) │ │ (BM25 score)│ │
│ │ Top 20 │ │ Top 20 │ │ Top 20 │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ └────────────────────┼────────────────────┘ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ Fusion & Deduplication │
│ │ • RRF (Reciprocal Rank Fusion) │
│ │ • Score normalization │
│ │ • Duplicate removal │
│ │ Top 15 candidates │
│ └─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ Cross-Encoder │ │
│ │ Reranking │ │
│ │ │ │
│ │ sentence-transformers │
│ │ ms-marco-MiniLM-L-6-v2 │
│ │ │ │
│ │ Score each query-doc pair │
│ │ Return top 5 │
│ └─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Context Assembly + Prompt Building │ │
│ │ • Combine retrieved chunks │ │
│ │ • Add conversation history (last 3 exchanges) │ │
│ │ • Format with source citations │ │
│ │ • Inject system prompt │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ LLM Generation │ │
│ │ • GPT-4o-mini (default) or GPT-4o (complex queries) │ │
│ │ • Streaming response │ │
│ │ • Citation injection │ │
│ │ • Answer confidence estimation │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Response Post-Processing │ │
│ │ • Format validation │ │
│ │ • Source attribution │ │
│ │ • Suggested follow-up questions │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Technology Stack:

| Component | Technology | Version | Purpose |
| --- | --- | --- | --- |
| API Framework | FastAPI | 0.115+ | Async endpoints |
| Task Queue | Celery + Redis | 5.4+, 7.4+ | Async processing |
| Dense Embeddings | OpenAI text-embedding-3-small | API | Semantic vectors |
| Sparse Embeddings | SPLADE via transformers | 4.46+ | Keyword vectors |
| Vector DB | Pinecone | 5.4+ | Hybrid search |
| Reranker | sentence-transformers cross-encoder | 3.4+ | Result ranking |
| LLM | OpenAI GPT-4o-mini | API | Response generation |
| Monitoring | LangSmith | Latest | Trace and evaluate |
| Deployment | Docker + Docker Compose | 27+ | Containerization |
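
The reranking stage can be sketched directly from the stack above with a sentence-transformers cross-encoder; the candidate format (dicts carrying a "content" key) is an assumption about the retrieval layer's output.

from sentence_transformers import CrossEncoder

_reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[dict], top_n: int = 5) -> list[dict]:
    """Score each (query, chunk) pair and keep the highest-scoring chunks."""
    pairs = [(query, c["content"]) for c in candidates]
    scores = _reranker.predict(pairs)  # one relevance score per pair
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [dict(candidate, rerank_score=float(score)) for candidate, score in ranked[:top_n]]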

File Structure:

advanced-rag-system/
├── README.md
├── pyproject.toml
├── docker-compose.yml
├── .env.example
├── config/
│ ├── __init__.py
│ ├── settings.py # Pydantic Settings with env vars
│ ├── logging.yaml # Structured logging config
│ └── prompts/ # Version-controlled prompts
│ ├── system_prompt.txt
│ ├── query_rewrite.txt
│ └── citation_prompt.txt
├── src/
│ ├── __init__.py
│ ├── main.py # FastAPI app
│ ├── api/
│ │ ├── __init__.py
│ │ ├── routes.py # HTTP endpoints
│ │ ├── dependencies.py # Injectable dependencies
│ │ └── middleware.py # Auth, rate limiting
│ ├── core/
│ │ ├── __init__.py
│ │ ├── exceptions.py
│ │ ├── logging.py
│ │ └── constants.py
│ ├── models/
│ │ ├── __init__.py
│ │ ├── schemas.py # Pydantic models
│ │ └── domain.py # Business entities
│ ├── services/
│ │ ├── __init__.py
│ │ ├── ingestion/
│ │ │ ├── __init__.py
│ │ │ ├── parser.py # Document parsing
│ │ │ ├── chunker.py # Semantic chunking
│ │ │ └── worker.py # Celery tasks
│ │ ├── retrieval/
│ │ │ ├── __init__.py
│ │ │ ├── dense.py # Vector search
│ │ │ ├── sparse.py # BM25/SPLADE
│ │ │ ├── fusion.py # RRF fusion
│ │ │ └── reranker.py # Cross-encoder
│ │ ├── generation/
│ │ │ ├── __init__.py
│ │ │ ├── llm.py # LLM client
│ │ │ ├── history.py # Conversation memory
│ │ │ └── prompts.py # Prompt management
│ │ └── evaluation/
│ │ ├── __init__.py
│ │ └── metrics.py # RAGAS metrics
│ └── infrastructure/
│ ├── __init__.py
│ ├── pinecone_client.py
│ ├── redis_client.py
│ └── langsmith_client.py
├── tests/
│ ├── __init__.py
│ ├── conftest.py
│ ├── unit/
│ ├── integration/
│ └── evaluation/ # RAG evaluation suite
└── scripts/
├── run_ingestion.py
├── evaluate_rag.py
└── benchmark_latency.py

Implementation Milestones:

| Milestone | Duration | Deliverable | Success Criteria |
| --- | --- | --- | --- |
| M1: Infrastructure | 3 days | Docker, config, logging | All services start cleanly |
| M2: Ingestion Pipeline | 4 days | Async document processing | 100 docs/min throughput |
| M3: Hybrid Retrieval | 5 days | Dense + sparse + fusion | Better recall than single method |
| M4: Reranking | 3 days | Cross-encoder integration | 15%+ MRR improvement |
| M5: Generation | 3 days | Streaming, history, citations | Sub-2s time-to-first-token |
| M6: Evaluation | 3 days | RAGAS metrics pipeline | Quantified quality scores |
| M7: Deployment | 2 days | Production Docker setup | Health checks, monitoring |

Testing Strategy:

# tests/evaluation/test_retrieval.py
import pytest
from dataclasses import dataclass
from typing import List

from src.services.retrieval.fusion import RRFusion
from src.services.retrieval.dense import DenseRetriever
from src.services.retrieval.sparse import SparseRetriever


@dataclass
class RetrievalTestCase:
    query: str
    expected_doc_ids: List[str]
    description: str


RETRIEVAL_TEST_CASES = [
    RetrievalTestCase(
        query="What is the company's vacation policy?",
        expected_doc_ids=["hr_handbook_2024.pdf"],
        description="Basic semantic retrieval"
    ),
    RetrievalTestCase(
        query="API rate limits for v2 endpoints",
        expected_doc_ids=["api_docs_v2.md"],
        description="Keyword-heavy technical query"
    ),
    RetrievalTestCase(
        query="How do I reset my 2FA?",
        expected_doc_ids=["security_faq.md", "account_recovery.md"],
        description="Multi-document answer"
    ),
]


class TestRetrievalQuality:
    @pytest.fixture
    async def retrievers(self):
        dense = DenseRetriever()
        sparse = SparseRetriever()
        fusion = RRFusion(k=60)
        return dense, sparse, fusion

    @pytest.mark.asyncio
    @pytest.mark.parametrize("test_case", RETRIEVAL_TEST_CASES)
    async def test_retrieval_recall(self, retrievers, test_case):
        """Test that expected documents are in top-k results."""
        dense, sparse, fusion = retrievers

        # Retrieve using both methods
        dense_results = await dense.search(test_case.query, top_k=20)
        sparse_results = await sparse.search(test_case.query, top_k=20)

        # Fuse results
        fused = fusion.combine([dense_results, sparse_results], top_k=10)
        retrieved_ids = [r.document_id for r in fused]

        # Check expected IDs are present
        for expected_id in test_case.expected_doc_ids:
            assert expected_id in retrieved_ids, \
                f"Expected {expected_id} for query: {test_case.query}"

    @pytest.mark.asyncio
    async def test_hybrid_beats_dense_alone(self, retrievers):
        """Hybrid retrieval should outperform dense for keyword-heavy queries."""
        dense, sparse, fusion = retrievers
        query = "HTTP 429 error troubleshooting"

        dense_results = await dense.search(query, top_k=5)
        sparse_results = await sparse.search(query, top_k=5)
        fused = fusion.combine([dense_results, sparse_results], top_k=5)

        # Check if the relevant doc is in results
        relevant_doc = "api_error_codes.md"
        dense_has = any(r.document_id == relevant_doc for r in dense_results)
        fused_has = any(r.document_id == relevant_doc for r in fused)

        assert fused_has or not dense_has, \
            "Hybrid should find the doc when dense alone doesn't"

What Interviewers Will Ask:

  1. “Why did you choose RRF for fusion instead of linear combination?”

    • Expectation: Discussion of score normalization challenges, why rank-based fusion is more robust across different scoring scales
  2. “How do you handle the latency increase from reranking?”

    • Expectation: Batch processing, async patterns, caching strategies, trade-offs between quality and speed
  3. “What retrieval metrics did you track, and what were your targets?”

    • Expectation: MRR, NDCG, recall@k, precision@k, human evaluation correlation
  4. “How would you scale this to handle 1000 queries per second?”

    • Expectation: Load balancing, caching, read replicas, embedding service scaling, CDN for documents

Problem Statement:

Knowledge workers spend hours researching topics across multiple sources, synthesizing information, and writing summaries. Build an autonomous agent system that researches topics end-to-end: searches the web, reads and extracts key information from sources, synthesizes findings across multiple documents, and produces structured reports with citations.

Why This Matters:

Agent systems represent the next major evolution in GenAI applications. This project demonstrates understanding of multi-agent architecture, tool use, state management, and complex workflow orchestration. These are the skills needed for the most cutting-edge GenAI roles.

Architecture Diagram:

┌─────────────────────────────────────────────────────────────────┐
│ RESEARCH ORCHESTRATOR │
│ (LangGraph State Machine) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Research State │ │
│ │ • query: str │ │
│ │ • sub_queries: List[str] │ │
│ │ • sources: List[Source] │ │
│ │ • findings: List[Finding] │ │
│ │ • synthesis: Optional[Synthesis] │ │
│ │ • report: Optional[Report] │ │
│ │ • iteration_count: int │ │
│ │ • errors: List[Error] │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ State Graph │ │
│ │ │ │
│ │ START → Plan → Search → Extract → Evaluate ──┐ │ │
│ │ ↑ │ │ │
│ │ └────────── Need More Info ◄───────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ Synthesize │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ Write Report → END │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────────┼──────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Planner │ │ Searcher │ │ Extractor │ │
│ │ Agent │ │ Agent │ │ Agent │ │
│ ├─────────────┤ ├─────────────┤ ├─────────────┤ │
│ │ Break down │ │ • SerpAPI │ │ • URL fetch │ │
│ │ complex │ │ • arXiv │ │ • Readability│ │
│ │ queries │ │ • Wikipedia │ │ • LLM extract│ │
│ │ into sub- │ │ • News API │ │ • Key facts │ │
│ │ queries │ │ │ │ • Quotes │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ Synthesis │ │
│ │ Agent │ │
│ ├─────────────┤ │
│ │ Resolve │ │
│ │ conflicts │ │
│ │ Identify │ │
│ │ gaps │ │
│ │ Build │ │
│ │ narrative │ │
│ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Technology Stack:

| Component | Technology | Version | Purpose |
| --- | --- | --- | --- |
| Orchestration | LangGraph | 0.2+ | Agent workflow state machine |
| LLM | OpenAI GPT-4o / Claude 3.5 Sonnet | API | Agent reasoning |
| Search | SerpAPI + arXiv API | Latest | Web and academic search |
| Web Scraping | playwright + readability-lxml | 1.49+, 0.9+ | Content extraction |
| State Store | Redis | 7.4+ | Checkpoint persistence |
| Output | Pydantic | 2.10+ | Structured reports |
| Monitoring | LangSmith | Latest | Trace agent decisions |

Implementation Milestones:

| Milestone | Duration | Deliverable | Success Criteria |
| --- | --- | --- | --- |
| M1: State Design | 3 days | LangGraph state machine | All states transition correctly |
| M2: Planner Agent | 3 days | Query decomposition | Complex queries broken into sub-queries |
| M3: Search Agent | 4 days | Multi-source search | 5+ sources per query |
| M4: Extractor Agent | 4 days | Content extraction | 80%+ extraction success rate |
| M5: Synthesis Agent | 3 days | Conflict resolution | Coherent synthesis from multiple sources |
| M6: Report Writer | 3 days | Formatted output | Structured report with citations |
| M7: Evaluation | 4 days | Quality metrics | Human-evaluated accuracy scores |

Key Code Pattern - LangGraph State Machine:

# src/agents/research_graph.py
from typing import TypedDict, List, Annotated
import operator

from langgraph.graph import StateGraph, END
from langgraph.checkpoint import RedisCheckpoint


class Source(TypedDict):
    url: str
    title: str
    content: str
    relevance_score: float
    accessed_at: str


class Finding(TypedDict):
    claim: str
    evidence: str
    source_url: str
    confidence: float


class ResearchState(TypedDict):
    query: str
    sub_queries: List[str]
    sources: Annotated[List[Source], operator.add]
    findings: Annotated[List[Finding], operator.add]
    iteration: int
    max_iterations: int
    status: str  # "planning", "searching", "extracting", "synthesizing", "complete"
    error: str


# Node functions
async def planner_node(state: ResearchState) -> dict:
    """Break down a complex query into sub-queries."""
    if state["iteration"] >= state["max_iterations"]:
        return {"status": "complete"}
    planner = PlannerAgent()
    sub_queries = await planner.decompose(state["query"])
    return {
        "sub_queries": sub_queries,
        "status": "searching",
        "iteration": state["iteration"] + 1
    }


async def search_node(state: ResearchState) -> dict:
    """Search for sources for each sub-query."""
    searcher = SearchAgent()
    all_sources = []
    for sub_query in state["sub_queries"]:
        sources = await searcher.search(sub_query, max_results=5)
        all_sources.extend(sources)

    # Deduplicate by URL
    seen = set()
    unique_sources = []
    for s in all_sources:
        if s["url"] not in seen:
            seen.add(s["url"])
            unique_sources.append(s)

    return {
        "sources": unique_sources,
        "status": "extracting"
    }


async def extract_node(state: ResearchState) -> dict:
    """Extract key information from sources."""
    extractor = ExtractionAgent()
    all_findings = []
    for source in state["sources"][:10]:  # Limit to top 10
        try:
            findings = await extractor.extract(
                content=source["content"],
                query=state["query"]
            )
            for f in findings:
                f["source_url"] = source["url"]
            all_findings.extend(findings)
        except Exception:
            # Log but continue
            continue

    return {
        "findings": all_findings,
        "status": "evaluating"
    }


def should_continue(state: ResearchState) -> str:
    """Decide whether to continue research or synthesize."""
    if state["status"] == "complete":
        return "synthesize"
    if len(state["findings"]) < 5 and state["iteration"] < state["max_iterations"]:
        return "plan"  # Need more information
    return "synthesize"


# Build the graph
workflow = StateGraph(ResearchState)

# Add nodes
workflow.add_node("planner", planner_node)
workflow.add_node("search", search_node)
workflow.add_node("extract", extract_node)
workflow.add_node("synthesize", synthesis_node)
workflow.add_node("write_report", report_node)

# Add edges
workflow.set_entry_point("planner")
workflow.add_edge("planner", "search")
workflow.add_edge("search", "extract")
workflow.add_conditional_edges(
    "extract",
    should_continue,
    {
        "plan": "planner",
        "synthesize": "synthesize"
    }
)
workflow.add_edge("synthesize", "write_report")
workflow.add_edge("write_report", END)

# Compile with checkpointing
checkpoint = RedisCheckpoint(redis_url="redis://localhost:6379")
research_agent = workflow.compile(checkpointer=checkpoint)
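
A brief usage sketch for the compiled graph, assuming LangGraph's async invoke API; the thread_id is what keys the Redis checkpoints, so an interrupted run can resume from its last completed node. The initial state fields mirror ResearchState above.

import asyncio

async def run_research(question: str) -> dict:
    """Run the research graph end-to-end and return the final state."""
    initial_state = {
        "query": question,
        "sub_queries": [],
        "sources": [],
        "findings": [],
        "iteration": 0,
        "max_iterations": 3,   # hard stop against infinite research loops
        "status": "planning",
        "error": "",
    }
    config = {"configurable": {"thread_id": "research-001"}}  # checkpoint key
    return await research_agent.ainvoke(initial_state, config=config)

final_state = asyncio.run(run_research("Impact of RAG on enterprise search quality"))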

What Interviewers Will Ask:

  1. “How do you prevent the agent from getting stuck in infinite loops?”

    • Expectation: Max iteration limits, state machine constraints, convergence detection
  2. “What happens when a search returns paywalled content?”

    • Expectation: Fallback strategies, content extraction limitations, transparent handling
  3. “How do you evaluate the quality of the final report?”

    • Expectation: Human evaluation framework, factuality checking, citation accuracy metrics

Problem Statement:

Code reviews are bottlenecks in software development teams. Reviewers miss issues due to time constraints or lack of domain knowledge. Build a GitHub bot that automatically analyzes pull requests, identifies security vulnerabilities, performance issues, and style violations, and suggests specific improvements with explanations.

Why This Matters:

Developer productivity tools are high-value GenAI applications. This project demonstrates integration with developer workflows, tool-augmented agents, and structured output generation. It shows you understand the software development lifecycle.

Architecture Diagram:

┌─────────────────────────────────────────────────────────────────┐
│ GITHUB INTEGRATION │
├─────────────────────────────────────────────────────────────────┤
│ │
│ GitHub Webhook → Event Processor → Task Queue → Workers │
│ (PR opened, (Filter, (Celery + (Async │
│ commit pushed) validate) Redis) processing)│
│ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ ANALYSIS PIPELINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Diff Retrieval │ │
│ │ • Fetch PR diff via GitHub API │ │
│ │ • Parse file changes with context │ │
│ │ • Filter relevant files (exclude vendor, generated) │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────────┼───────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Security │ │ Performance │ │ Style │ │
│ │ Agent │ │ Agent │ │ Agent │ │
│ ├─────────────┤ ├─────────────┤ ├─────────────┤ │
│ │ • SQL inj │ │ • N+1 query │ │ • PEP8 │ │
│ │ • XSS risk │ │ • Memory │ │ • Type hints│ │
│ │ • Secrets │ │ • Complexity│ │ • Naming │ │
│ │ • Auth bugs │ │ • Async │ │ • Docs │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ └───────────────────┼───────────────────┘ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Result Aggregation │ │
│ │ • Deduplicate overlapping issues │ │
│ │ • Score severity (critical, warning, suggestion) │ │
│ │ • Sort by importance and file location │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Review Comment Generation │ │
│ │ • Line-specific comments with context │ │
│ │ • Summary comment with statistics │ │
│ │ • Suggested code changes │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ GitHub PR Comment Posting │ │
│ │ • Create review with comments │ │
│ │ • Request changes or approve │ │
│ │ • Update existing review on new commits │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Technology Stack:

| Component | Technology | Version | Purpose |
| --- | --- | --- | --- |
| GitHub Integration | PyGithub | 2.5+ | API client |
| Webhook Handler | FastAPI | 0.115+ | Event reception |
| Task Queue | Celery + Redis | 5.4+, 7.4+ | Async processing |
| Static Analysis | bandit, pylint | 1.7+, 3.3+ | Security/lint checks |
| LLM | OpenAI GPT-4o-mini | API | Review generation |
| Database | PostgreSQL | 16+ | PR history, caching |
| Deployment | Docker | 27+ | Containerization |
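
The entry point of the whole pipeline is the webhook handler. A hedged sketch: verify GitHub's HMAC signature, filter to pull-request events, and hand the work to the queue rather than analyzing inside the request. The Celery task name and secret variable are placeholders.

import hashlib
import hmac
import os

from fastapi import FastAPI, Header, HTTPException, Request

# from src.workers.tasks import review_pull_request  # hypothetical Celery task

app = FastAPI()
WEBHOOK_SECRET = os.environ["GITHUB_WEBHOOK_SECRET"].encode()

@app.post("/webhooks/github")
async def github_webhook(
    request: Request,
    x_hub_signature_256: str = Header(...),
    x_github_event: str = Header(...),
):
    body = await request.body()
    # GitHub signs the raw body with the shared webhook secret
    expected = "sha256=" + hmac.new(WEBHOOK_SECRET, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, x_hub_signature_256):
        raise HTTPException(status_code=401, detail="Invalid signature")

    if x_github_event == "pull_request":
        payload = await request.json()
        if payload.get("action") in {"opened", "synchronize"}:
            # Hand off to the async analysis pipeline instead of blocking the webhook
            # review_pull_request.delay(payload["repository"]["full_name"],
            #                           payload["pull_request"]["number"])
            pass
    return {"status": "queued"}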

What Interviewers Will Ask:

  1. “How do you handle false positives from the security scanner?”

    • Expectation: Confidence scoring, suppressions, user feedback loop
  2. “What prevents the bot from suggesting changes that break existing tests?”

    • Expectation: CI integration, test awareness, conservative suggestions
  3. “How do you ensure the bot does not overwhelm developers with too many comments?”

    • Expectation: Batching, severity filtering, summary-first approach

These projects demonstrate architectural expertise, scale thinking, and the ability to lead complex technical initiatives.


Project 6: Domain-Specific Fine-Tuned Model


Problem Statement:

General-purpose LLMs lack deep expertise in specialized domains like legal, medical, or financial analysis. They struggle with domain-specific terminology, regulatory nuances, and format requirements. Fine-tune an open-source model (Llama 3.3, Mistral) for a specific domain, creating a model that outperforms GPT-4 on domain tasks while being deployable on cost-effective infrastructure.

Why This Matters:

Fine-tuning specialists command premium salaries. This project demonstrates advanced ML skills, dataset engineering, training infrastructure, and model serving. It proves you can go beyond API integration to actual model customization.

Architecture Diagram:

┌─────────────────────────────────────────────────────────────────┐
│ DATA PIPELINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Raw Sources → Curation → Formatting → Tokenization → Dataset │
│ ↓ │
│ (Legal docs, (Quality (Instruction (Llama 3.3 (Hugging│
│ case law, filtering, format with tokenizer, Face │
│ textbooks) dedup) reasoning) truncation) datasets)│
│ │
│ Example Format: │
│ { │
│ "instruction": "Analyze this contract clause...", │
│ "input": "Clause text...", │
│ "output": "Analysis with citations...", │
│ "reasoning": "Step-by-step legal reasoning..." │
│ } │
│ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ TRAINING INFRASTRUCTURE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Training Configuration │ │
│ │ • Base model: meta-llama/Llama-3.3-8B-Instruct │ │
│ │ • Method: QLoRA (4-bit quantization) │ │
│ │ • LoRA rank: 64, alpha: 128 │ │
│ │ • Target modules: q_proj, k_proj, v_proj, o_proj │ │
│ │ • Learning rate: 2e-4 with cosine decay │ │
│ │ • Batch size: 64 (accumulated) │ │
│ │ • Epochs: 3 │ │
│ │ • Max sequence: 4096 tokens │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Training Orchestration (Axolotl/TRL) │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌──────────┐ │ │
│ │ │ Data │ ───► │ Model │ ───► │ Training│ │ │
│ │ │ Loader │ │ Prep │ │ Loop │ │ │
│ │ │ (streaming)│ │ (QLoRA) │ │ │ │ │
│ │ └─────────────┘ └─────────────┘ └────┬─────┘ │ │
│ │ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ │ │ │
│ │ │ Checkpoint │ ◄────│ Validation │ ◄─────────┘ │ │
│ │ │ (HF Hub) │ │ (every N │ │ │
│ │ │ │ │ steps) │ │ │
│ │ └─────────────┘ └─────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Experiment Tracking (Weights & Biases) │ │
│ │ • Training loss curves │ │
│ │ • Learning rate schedule │ │
│ │ • GPU utilization │ │
│ │ • Validation metrics │ │
│ │ • Sample generations │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ EVALUATION FRAMEWORK │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Automated │ │ Human │ │ Benchmark │ │
│ │ Metrics │ │ Evaluation │ │ Comparison │ │
│ ├──────────────┤ ├──────────────┤ ├──────────────────────┤ │
│ │ • Perplexity │ │ • Expert │ │ • GPT-4 baseline │ │
│ │ • BLEU/ROUGE │ │ review of │ │ • Domain-specific │ │
│ │ • Factuality │ │ samples │ │ test sets │ │
│ │ • Safety │ │ • Rubric │ │ • Cost/perf tradeoff │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ MODEL SERVING │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Deployment Options │ │
│ │ │ │
│ │ Option A: vLLM (Recommended) │ │
│ │ • Tensor parallelism for multi-GPU │ │
│ │ • PagedAttention for throughput │ │
│ │ • OpenAI-compatible API │ │
│ │ • ~3,000 tok/sec on A100 │ │
│ │ │ │
│ │ Option B: Text Generation Inference (TGI) │ │
│ │ • Hugging Face native │ │
│ │ • Good for Hub integration │ │
│ │ │ │
│ │ Option C: llama.cpp (CPU/Edge) │ │
│ │ • Quantized GGUF format │ │
│ │ • CPU inference │ │
│ │ • Edge deployment │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Technology Stack:

| Component | Technology | Version | Purpose |
| --- | --- | --- | --- |
| Base Model | Llama 3.3 8B Instruct | Latest | Foundation model |
| Training | Axolotl or TRL | 0.5+ | Fine-tuning framework |
| PEFT | peft | 0.14+ | LoRA/QLoRA implementation |
| Quantization | bitsandbytes | 0.45+ | 4-bit quantization |
| Dataset | Hugging Face datasets | 3.2+ | Data processing |
| Tracking | Weights & Biases | 0.19+ | Experiment logging |
| Serving | vLLM | 0.6+ | High-throughput inference |
| Hardware | A100 40GB or H100 | N/A | Training (cloud rental) |
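
The QLoRA setup from the training configuration box translates fairly directly into transformers + peft + bitsandbytes. A sketch under those assumptions; the model name and hyperparameters mirror the box above and should be treated as starting points, not tuned values.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization of the frozen base weights (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-8B-Instruct",  # base model named in the config box above
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on the attention projections, matching the stated rank/alpha
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,  # assumption: a common default, not specified above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only adapter weights train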

Implementation Milestones:

| Milestone | Duration | Deliverable | Success Criteria |
| --- | --- | --- | --- |
| M1: Dataset Curation | 7 days | 10K+ high-quality examples | Expert-validated samples |
| M2: Training Setup | 4 days | Axolotl config, infra | Successful dry-run |
| M3: Fine-Tuning | 5 days | Trained adapter weights | Loss convergence |
| M4: Evaluation | 5 days | Benchmark results | Beats GPT-4 on domain tasks |
| M5: Deployment | 4 days | vLLM serving endpoint | Sub-100ms TTFT |
| M6: Documentation | 3 days | Training report, model card | Reproducible training |

What Interviewers Will Ask:

  1. “Why did you choose QLoRA over full fine-tuning?”

    • Expectation: Cost trade-offs, memory requirements, catastrophic forgetting concerns
  2. “How did you prevent overfitting on your training data?”

    • Expectation: Validation set design, early stopping, dropout, weight decay discussion
  3. “What was your cost per training run, and how did you optimize it?”

    • Expectation: GPU rental costs, spot instances, gradient accumulation strategies
  4. “How do you handle model updates when new training data becomes available?”

    • Expectation: Continuous training strategies, version management, A/B testing

Problem Statement:

Large organizations need to make institutional knowledge accessible across departments while maintaining strict access controls. Build a multi-tenant RAG system capable of indexing millions of documents across diverse formats, with real-time updates, granular permissions, comprehensive monitoring, and cost tracking.

Why This Matters:

Enterprise scale is where senior engineers differentiate. This project demonstrates distributed systems design, security architecture, and operational excellence. These are the challenges faced by companies like Glean, Microsoft, and Amazon.

Architecture Diagram:

┌─────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ (Web App, Mobile, Slack Bot, API Clients) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ API GATEWAY │
│ (Kong/AWS API Gateway - Auth, Rate Limit, Routing) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ APPLICATION SERVICES │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Query Service │ │ Ingestion │ │ Admin Service │ │
│ │ (FastAPI) │ │ Service │ │ (Management) │ │
│ │ │ │ (FastAPI) │ │ │ │
│ │ • RAG pipeline │ │ • Upload API │ │ • User mgmt │ │
│ │ • Auth check │ │ • Validation │ │ • Permissions │ │
│ │ • Response │ │ • Queue job │ │ • Analytics │ │
│ └────────┬────────┘ └────────┬────────┘ └─────────────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌─────────────────────┐ │
│ │ │ Ingestion Pipeline │ │
│ │ │ (Celery Workers) │ │
│ │ ├─────────────────────┤ │
│ │ │ • Document parsing │ │
│ │ │ • OCR (if needed) │ │
│ │ │ • Chunking │ │
│ │ │ • Embedding │ │
│ │ │ • Vector storage │ │
│ │ └─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ RAG Pipeline (per-tenant) │ │
│ │ │ │
│ │ Query → Auth/ACL → Hybrid Retrieval → Rerank → LLM │ │
│ │ ↓ ↓ │ │
│ │ (Permission (Tenant-scoped │ │
│ │ filtering) vector search) │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ VECTOR DB │ │ CACHE LAYER │ │ SEARCH │
│ (Milvus) │ │ (Redis) │ │ (Elasticsearch│
├───────────────┤ ├───────────────┤ ├───────────────┤
│ • Multi-tenant│ │ • Query cache │ │ • Full-text │
│ collections │ │ • Rate limit │ │ • Faceted │
│ • Partition │ │ • Session │ │ • Filtering │
│ by org │ │ store │ │ │
│ • Role-based │ │ • Pub/sub │ │ │
│ access │ │ for sync │ │ │
└───────────────┘ └───────────────┘ └───────────────┘
│ │ │
└───────────────────┼───────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ DATA & MESSAGING LAYER │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ PostgreSQL │ │ Kafka │ │ S3 / GCS │ │
│ │ (Metadata, │ │ (Event │ │ (Document │ │
│ │ users, │ │ streaming) │ │ storage) │ │
│ │ permissions)│ │ │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ OBSERVABILITY LAYER │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Prometheus │ │ Grafana │ │ Custom Dashboards │ │
│ │ (Metrics) │ │ (Dashboards) │ │ • Query volume │ │
│ │ │ │ │ │ • Cost per tenant │ │
│ │ • Latency │ │ • Latency │ │ • Quality scores │ │
│ │ • Throughput │ │ • Error rate │ │ • Usage patterns │ │
│ │ • Errors │ │ • Cost │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Technology Stack:

| Component | Technology | Version | Purpose |
| --- | --- | --- | --- |
| API | FastAPI | 0.115+ | Application layer |
| Vector DB | Milvus/Zilliz | 2.5+ | Scalable vector search |
| Cache | Redis Cluster | 7.4+ | Performance layer |
| Message Queue | Kafka | 3.8+ | Event streaming |
| Database | PostgreSQL | 16+ | Transactional data |
| Storage | S3/GCS | N/A | Document blob storage |
| Auth | OAuth2 + JWT | N/A | Authentication |
| Monitoring | Prometheus + Grafana | Latest | Observability |
| Cost Tracking | Custom + CloudWatch | N/A | Usage billing |

Implementation Milestones:

| Milestone | Duration | Deliverable | Success Criteria |
| --- | --- | --- | --- |
| M1: Multi-tenant Design | 5 days | Schema, isolation strategy | Security review pass |
| M2: Core Services | 7 days | Query, ingestion, admin APIs | Functional endpoints |
| M3: Vector Pipeline | 6 days | Milvus integration | 10K docs/sec ingestion |
| M4: Auth & ACL | 5 days | Permission system | Row-level security works |
| M5: Monitoring | 4 days | Dashboards, alerts | 99.9% uptime visibility |
| M6: Load Testing | 5 days | Performance validation | 1000 QPS sustained |
| M7: Documentation | 4 days | Runbooks, architecture docs | Onboarding guide |

What Interviewers Will Ask:

  1. “How do you ensure tenant data isolation in the vector database?”

    • Expectation: Namespace separation, collection per tenant, or metadata filtering with strict validation (see the sketch after this list)
  2. “What is your strategy for handling document updates in real-time?”

    • Expectation: CDC patterns, event streaming, incremental indexing
  3. “How do you attribute costs to individual tenants for billing?”

    • Expectation: Token counting per tenant, embedding costs, storage metrics
  4. “Walk me through your disaster recovery strategy.”

    • Expectation: Backups, replication, RPO/RTO targets, runbook procedures
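
A minimal sketch of the metadata-filtering approach from question 1, assuming a shared Milvus collection with a tenant_id scalar field (the field names and helper are illustrative, not part of the original spec). The critical detail is that the filter is built from the authenticated session, never from user input:

# Tenant-scoped retrieval: tenant_id comes from the verified JWT/session,
# never from the request body, so one org cannot query another org's vectors.
from pymilvus import Collection

def tenant_scoped_search(collection: Collection, tenant_id: str,
                         query_vector: list[float], top_k: int = 5):
    # Validate the tenant id before interpolating it into the filter expression
    if not tenant_id.replace("-", "").isalnum():
        raise ValueError("invalid tenant id")
    expr = f'tenant_id == "{tenant_id}"'  # built server-side from the auth context
    return collection.search(
        data=[query_vector],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"ef": 64}},
        limit=top_k,
        expr=expr,                         # hard filter applied alongside similarity search
        output_fields=["doc_id", "chunk_text"],
    )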

Problem Statement:

Business analysts spend hours writing SQL queries and creating reports. Non-technical stakeholders cannot access data insights without going through analysts. Build a system that lets users ask questions about a database in natural language, generates safe SQL, executes it with guardrails, visualizes the results, and explains the findings in business terms.

Why This Matters:

Text-to-SQL is a major enterprise GenAI use case. This project demonstrates complex multi-component system design, safety engineering, and the ability to bridge technical and non-technical domains. It shows full-stack AI system architecture.

Architecture Diagram:

┌─────────────────────────────────────────────────────────────────┐
│ USER INTERFACE │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Conversational UI │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │ │
│ │ │ Chat Panel │ │ Data Viz │ │ Schema Explorer │ │ │
│ │ │ │ │ (Charts, │ │ (Tables, │ │ │
│ │ │ • Natural │ │ Tables) │ │ Columns, │ │ │
│ │ │ language │ │ │ │ Relationships) │ │ │
│ │ │ • Follow-up │ │ • Auto- │ │ │ │ │
│ │ │ questions │ │ generated │ │ • ER diagram │ │ │
│ │ │ • Clarify │ │ • Drill- │ │ • Column stats │ │ │
│ │ │ ambiguous │ │ down │ │ • Sample data │ │ │
│ │ │ queries │ │ │ │ │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ TEXT-TO-SQL PIPELINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Query Understanding │ │
│ │ │ │
│ │ User Query → Intent Classifier → Entity Extractor │ │
│ │ ↓ ↓ │ │
│ │ (SELECT, AGGREGATE, (Dates, │ │
│ │ EXPLAIN, COMPARE) Metrics, Filters) │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Schema Context Retrieval │ │
│ │ │ │
│ │ • Semantic search over table/column descriptions │ │
│ │ • Retrieve relevant table schemas │ │
│ │ • Include sample values for categorical columns │ │
│ │ • Add business metric definitions │ │
│ │ │ │
│ │ Retrieved Context: │ │
│ │ Tables: orders, customers, products │ │
│ │ Metrics: revenue (sum(order_total)), active_users │ │
│ │ Time range: last 30 days │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ SQL Generation + Validation │ │
│ │ │ │
│ │ LLM Prompt: │ │
│ │ • System: You are a SQL expert... │ │
│ │ • Schema: CREATE TABLE orders... │ │
│ │ • Examples: Few-shot examples of similar queries │ │
│ │ • User: "What were top products by revenue last month?" │ │
│ │ │ │
│ │ Generated SQL → Syntax Validator → Safety Checker │ │
│ │ ↓ ↓ │ │
│ │ (SQL parser) (Query allowlist, │ │
│ │ Table permissions) │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Execution + Error Handling │ │
│ │ │ │
│ │ Safe Execution: │ │
│ │ • Read-only connection (no INSERT/UPDATE/DELETE) │ │
│ │ • Query timeout (30 seconds) │ │
│ │ • Row limit (1000 results) │ │
│ │ • Query plan analysis (reject expensive queries) │ │
│ │ │ │
│ │ Error Recovery: │ │
│ │ • Syntax error → Regenerate with feedback │ │
│ │ • No results → Suggest alternative query │ │
│ │ • Timeout → Suggest aggregation/filtering │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Result Processing │ │
│ │ │ │
│ │ • Auto-detect chart type (bar, line, pie, table) │ │
│ │ • Generate natural language summary │ │
│ │ • Suggest follow-up questions │ │
│ │ • Export options (CSV, PNG, PDF) │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
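
A minimal sketch of the execution guardrails in the diagram above (read-only connection, statement timeout, row limit), using psycopg2 against Postgres. The connection string is a placeholder, and the leading-keyword check is a naive stand-in for a real SQL parser such as sqlglot:

import psycopg2

READ_ONLY_DSN = "postgresql://readonly_user:password@localhost:5432/warehouse"  # placeholder

def execute_safely(sql: str, max_rows: int = 1000, timeout_ms: int = 30_000):
    # Naive statement-type check; production code should parse the SQL properly
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed")
    conn = psycopg2.connect(READ_ONLY_DSN)
    try:
        conn.set_session(readonly=True, autocommit=True)          # DB-level write protection
        with conn.cursor() as cur:
            cur.execute(f"SET statement_timeout = {timeout_ms}")  # kill long-running queries
            cur.execute(sql)
            return cur.fetchmany(max_rows)                        # cap result size
    finally:
        conn.close()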

Technology Stack:

| Component | Technology | Version | Purpose |
| --- | --- | --- | --- |
| UI | React + TypeScript | 18+ | Frontend |
| Visualization | Apache ECharts | 5.5+ | Charts |
| API | FastAPI | 0.115+ | Backend |
| LLM | Claude 3.5 Sonnet / GPT-4o | API | SQL generation |
| Database | PostgreSQL | 16+ | Data warehouse |
| Schema Cache | Redis | 7.4+ | Metadata caching |
| Security | Query allowlist, read-only | N/A | Safety layer |

Implementation Milestones:

| Milestone | Duration | Deliverable | Success Criteria |
| --- | --- | --- | --- |
| M1: Schema Introspection | 4 days | Auto-schema discovery | Works on any Postgres DB |
| M2: Text-to-SQL Engine | 7 days | SQL generation pipeline | 80%+ accuracy on test set |
| M3: Safety Layer | 4 days | Query validation | No unauthorized writes |
| M4: Visualization | 5 days | Auto-chart generation | Appropriate chart types |
| M5: Conversation | 4 days | Multi-turn handling | Contextual follow-ups |
| M6: Evaluation | 4 days | Accuracy benchmark | Spider or custom test set |

What Interviewers Will Ask:

  1. “How do you prevent SQL injection when generating queries with LLMs?”

    • Expectation: Parameterized queries, query allowlists, read-only connections, input sanitization
  2. “What is your strategy for handling ambiguous questions?”

    • Expectation: Clarification prompts, confidence scoring, suggested interpretations
  3. “How do you evaluate the accuracy of generated SQL?”

    • Expectation: Execution-based evaluation, result comparison, manual annotation (see the sketch after this list)
  4. “What happens when the database schema changes?”

    • Expectation: Schema versioning, caching invalidation, re-indexing strategies
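
A minimal sketch of execution-based evaluation (the expectation in question 3): run the gold and generated SQL against the same database and compare the result sets as multisets, so row order does not matter. The run_query callable is an assumed helper that returns rows as tuples:

from collections import Counter

def execution_match(run_query, gold_sql: str, generated_sql: str) -> bool:
    """True if both queries return the same multiset of rows."""
    try:
        gold_rows = run_query(gold_sql)
        gen_rows = run_query(generated_sql)
    except Exception:
        return False  # generated SQL that fails to execute counts as wrong
    return Counter(map(tuple, gold_rows)) == Counter(map(tuple, gen_rows))

def execution_accuracy(run_query, test_cases: list[tuple[str, str]]) -> float:
    """test_cases is a list of (gold_sql, generated_sql) pairs."""
    if not test_cases:
        return 0.0
    return sum(execution_match(run_query, g, p) for g, p in test_cases) / len(test_cases)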

7. Trade-offs, Limitations, and Failure Modes


Understanding common portfolio mistakes is as important as knowing what to build. Here are the patterns that distinguish amateur projects from professional ones.

Common Portfolio Mistakes:

| Mistake | Why It Hurts | How to Avoid |
| --- | --- | --- |
| No error handling | Production systems fail constantly. Code that assumes success shows inexperience. | Implement try/except at all boundaries, circuit breakers for external APIs |
| Missing tests | Untested code is broken code. Interviewers will ask about your testing strategy. | Aim for 70%+ coverage, include integration tests |
| No deployment path | “Works on my machine” projects are tutorials, not portfolio pieces. | Include Dockerfile, docker-compose, deployment instructions |
| Undocumented trade-offs | Every decision has trade-offs. Not acknowledging them shows shallow thinking. | Include ADRs (Architecture Decision Records) in your docs |
| Over-engineering | Complex solutions to simple problems waste resources and confuse reviewers. | Start simple, add complexity only with justification |
| No monitoring | You cannot improve what you do not measure. | Add basic logging, latency tracking, error rates (see the sketch after this table) |
| Hardcoded secrets | Exposed API keys in GitHub are an immediate rejection signal. | Use environment variables, include a .env.example |
| No data versioning | ML systems without data versioning are not reproducible. | Use DVC or document dataset versions |
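
As flagged in the “No monitoring” row, even a small project can track latency and errors with nothing but the standard library. A minimal sketch; the endpoint name and the answer_question stub are illustrative:

import functools
import logging
import time

logger = logging.getLogger("genai_app")

def track_latency(endpoint: str):
    """Decorator that logs latency for every call and records exceptions."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                logger.exception("error endpoint=%s", endpoint)
                raise
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                logger.info("latency_ms=%.1f endpoint=%s", elapsed_ms, endpoint)
        return wrapper
    return decorator

@track_latency("rag_query")
def answer_question(question: str) -> str:
    ...  # retrieval + LLM call would go here
    return "stub answer"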

Failure Modes to Address:

  1. LLM Hallucinations: Always validate outputs. Implement confidence scoring. Have fallback responses.

  2. Rate Limiting: External APIs will throttle you. Implement exponential backoff, request queuing, and graceful degradation (see the sketch after this list).

  3. Context Window Overflow: Large documents exceed token limits. Implement chunking strategies and intelligent context selection.

  4. Embedding Drift: As you update embedding models, vector spaces shift. Plan for re-indexing strategies.

  5. Cold Start: Systems with no data provide poor initial experiences. Plan for bootstrap content or onboarding flows.
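
For failure mode 2 above, a minimal retry sketch with exponential backoff and jitter; the call_llm callable and the RateLimitError class are placeholders for whatever client and exception type you actually use:

import random
import time

class RateLimitError(Exception):
    """Placeholder for your LLM client's rate-limit exception."""

def call_with_backoff(call_llm, max_retries: int = 5, base_delay: float = 1.0):
    for attempt in range(max_retries):
        try:
            return call_llm()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up; let the caller degrade gracefully
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ... plus up to 1s of noise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))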


Your projects will dominate technical interviews. Prepare to discuss them at multiple depths.

The Project Discussion Framework:

Interviewers typically probe through three layers:

| Layer | Depth | Example Questions |
| --- | --- | --- |
| What | Surface | “What does this project do?” “What technologies did you use?” |
| How | Implementation | “How did you handle X?” “Why did you choose Y over Z?” |
| Why | Architecture | “Why this architecture?” “What would you do differently at 10x scale?” |

Prepare These Stories:

For each project, prepare a 2-minute overview, a 5-minute deep dive, and a 10-minute technical discussion. Practice the STAR method (Situation, Task, Action, Result) for challenges you overcame.

Common Deep-Dive Questions:

  1. “Tell me about a bug you encountered and how you debugged it.”

    • What they want: Debugging methodology, systematic thinking, persistence
    • Good answer: Trace through observation, hypothesis, experiment, resolution
  2. “What was the hardest technical decision you made?”

    • What they want: Trade-off analysis, decision framework, learning from outcomes
    • Good answer: Options considered, criteria for decision, outcome assessment
  3. “How would this system handle 100x more load?”

    • What they want: Scale thinking, bottleneck identification, architectural evolution
    • Good answer: Specific components that would break, scaling strategies
  4. “What would you do differently if you started over?”

    • What they want: Self-reflection, learning from experience, architectural vision
    • Good answer: Honest assessment of technical debt, better approaches learned

Portfolio Presentation Tips:

  • Lead with the problem, not the technology. Business value matters more than tech stack.
  • Quantify results where possible. “Reduced query latency by 40%” beats “implemented caching.”
  • Acknowledge limitations. Nothing is perfect. Showing awareness of weaknesses demonstrates maturity.
  • Have a live demo ready. Deployed projects make a stronger impression than localhost screenshots.

What separates toy projects from production-ready systems is operational thinking. As you build, ask these questions:

The Production Readiness Checklist:

| Category | Questions to Answer |
| --- | --- |
| Reliability | What happens when the LLM provider is down? How do you handle timeouts? |
| Scalability | What is your throughput bottleneck? How does latency grow with load? |
| Observability | Can you debug issues from logs? Do you have metrics dashboards? |
| Security | How do you handle secrets? Are inputs validated and sanitized? |
| Maintainability | Is the code tested? Is there documentation? Can someone else deploy this? |
| Cost | What is your cost per query? How do you control spend? |
| Compliance | Is PII handled properly? Are there audit trails? |

Cost Engineering:

Production GenAI systems have real costs. Demonstrate awareness:

  • Track token usage per request
  • Implement caching for common queries (see the caching sketch after the cost tracker below)
  • Use smaller models for simple tasks
  • Consider request batching
  • Monitor and alert on spend

Example Cost Tracker:

# Track costs per request
class CostTracker:
    def __init__(self):
        self.metrics = {
            "input_tokens": 0,
            "output_tokens": 0,
            "embedding_tokens": 0,
            "total_cost_usd": 0.0,
        }

    def log_llm_call(self, model: str, input_tokens: int, output_tokens: int) -> float:
        rates = {
            "gpt-4o": {"input": 0.0025, "output": 0.01},  # per 1K tokens
            "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
        }
        rate = rates.get(model, rates["gpt-4o-mini"])
        cost = (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1000
        self.metrics["input_tokens"] += input_tokens
        self.metrics["output_tokens"] += output_tokens
        self.metrics["total_cost_usd"] += cost
        return cost
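
To pair with the “Implement caching for common queries” bullet above, a minimal in-process cache keyed on the normalized question. A real deployment would more likely use Redis with a TTL; generate_answer is a placeholder for your RAG or LLM call:

import hashlib

_cache: dict[str, str] = {}

def cached_answer(question: str, generate_answer) -> str:
    # Normalize so trivially different phrasings of the same query share a key
    key = hashlib.sha256(question.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = generate_answer(question)  # pay for the LLM call only on a miss
    return _cache[key]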

Building a portfolio that gets you hired requires more than following tutorials. It requires demonstrating production thinking, architectural judgment, and the ability to learn from mistakes.

Key Principles:

  1. Quality over quantity. Three exceptional projects outperform ten shallow ones.

  2. Build for the role you want. Junior projects demonstrate learning ability. Senior projects demonstrate architectural judgment.

  3. Show your work. Document decisions, include architecture diagrams, write tests, deploy to production.

  4. Prepare to discuss. Your projects will be 60-70% of technical interviews. Know them deeply.

  5. Iterate based on feedback. Share your projects. Get code reviews. Improve based on critique.

Recommended Project Sequence:

| Career Stage | Projects | Focus |
| --- | --- | --- |
| Beginner | Document Q&A, Resume Analyzer | Code quality, basic patterns, deployment |
| Intermediate | Advanced RAG, Research Agent, Code Review | System design, optimization, integration |
| Advanced | Fine-tuned Model, Enterprise KB, Data Analyst | Architecture, scale, technical leadership |

Next Steps:

  1. Choose one project matching your target career level
  2. Build it following the specifications in this guide
  3. Deploy it and create a live demo
  4. Write a comprehensive README with architecture decisions
  5. Practice explaining it at multiple depths
  6. Iterate based on feedback

Your portfolio is a product. Treat it with the same rigor you would apply to production code at a top company. The effort invested will be reflected in interview performance and job offers.


Last updated: February 2026. Project specifications reflect current industry standards and hiring expectations.