Blog
Building an Evaluation-First RAG System
A benchmark-driven RAG system for scientific question answering, designed to demonstrate disciplined evaluation, system design, and end-to-end ML engineering.
RAG, Evaluation, ML Engineering, MLOps, Python
PocketGuide: Offline Travel LLM, Built Evaluation-First
Building a domain-adapted, offline-capable travel assistant through evaluation-driven development, synthetic data generation, and rigorous benchmarking.
Machine Learning, LLM, Evaluation, Data Generation, Production Engineering