Backend AI Engineer

Dhanesh Vashisth

LangGraph · Qdrant · FastAPI · Kafka · Python

I build production-grade agentic AI systems: self-correcting LangGraph agents, multi-tenant RAG pipelines, NL-to-SQL RAG pipelines, and autonomous research workflows that make real decisions rather than just retrieving text. Every system runs in Docker, streams through Kafka, caches in Redis, and is designed for real users, not notebook demos.

🔗
LangGraph / LangChain
Stateful multi-agent graphs, conditional routing, tool-calling, and agentic loops in production.
📚
RAG Pipelines
Chunking strategy, OpenAI embeddings, Qdrant vector search, FlashRank reranking, RAGAS evaluation.
FastAPI
Async ASGI APIs with dependency injection, Pydantic v2 validation, lifespan events, and OpenAPI docs.
📨
Kafka
Async document ingestion and feedback pipelines. Producers, consumers, consumer groups, offset management.
🗄️
PostgreSQL · Redis · Qdrant
asyncpg connection pools, Redis semantic cache + session memory, Qdrant tenant-isolated collections.
🐍
Python
Async-first, type-annotated, production-grade. Primary language for all backend and AI engineering work.
🐳
Docker
Multi-service Docker Compose, health checks, named volumes, deployments with resource limits and probes.
🔧
CI/CD · Git
GitHub Actions for automated test, build, and deploy pipelines. Structured branching and PR-based workflows.
AI / LLM
LangGraph
LangChain · RAG
OpenAI GPT-4o-mini
FlashRank · RAGAS
API · Data
FastAPI · Python
Pydantic v2 · asyncpg
PostgreSQL · Redis
Qdrant
Messaging · Async
Kafka
aiokafka · Confluent
Consumer groups
Offset management
Infra · Ops
Docker Compose
Kubernetes
Git · GitHub
Health probes
/ nl-sql-rag-pipeline

Production NL-to-SQL system for Excel Texas Wireless — non-technical admins, supervisors, and agents query a live PostgreSQL transaction database using plain English. Schema metadata vectorized with text-embedding-3-small and stored in Qdrant; a 5-node LangGraph pipeline handles schema retrieval, GPT-4o-mini SQL generation, validation with a conditional retry loop, asyncpg execution, and response formatting. Role-based access control enforced at SQL query level — WHERE clauses injected programmatically per role before execution. Redis provides a semantic cache (cosine similarity, 0.92 threshold) with role-scoped keys preventing cross-user data leakage. Full audit trail via query_logs. 35+ tests covering SQL injection, cache isolation, and RBAC bypass. 5 containers orchestrated with Docker Compose health checks.

# query path
POST /query/ ──▶ JWT RBAC ──▶ Redis Semantic Cache
                                   │ miss
LangGraph: Schema Retrieval ──▶ SQL Generation ──▶ Validation+RBAC ──▶ Execution ──▶ Formatter
# security layer
Node 3 ──▶ Forbidden keyword check ──▶ WHERE agent_id = {id} injected ──▶ retry loop
VectorDB: Qdrant   Cache: Redis   DB: PostgreSQL + asyncpg
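A minimal sketch of what the security layer could look like, not the project's actual code. The deny-list contents, the function names, the choice to wrap the generated query in a subquery (which assumes the query selects an `agent_id` column), and the handling of roles other than `admin` and `agent` are all assumptions made for illustration. The filter value is passed as an asyncpg-style `$1` parameter rather than formatted into the SQL string.

```python
import re

# Hypothetical deny-list: the actual keyword list is not given in the write-up.
FORBIDDEN_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE", "GRANT")


def validate_select(sql: str) -> str:
    """Return the statement if it is a single read-only SELECT, else raise ValueError."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:
        # Conservative: this also rejects semicolons inside string literals.
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"select\b", stmt, re.IGNORECASE):
        raise ValueError("only SELECT statements are allowed")
    for keyword in FORBIDDEN_KEYWORDS:
        if re.search(rf"\b{keyword}\b", stmt, re.IGNORECASE):
            raise ValueError(f"forbidden keyword: {keyword}")
    return stmt


def apply_role_filter(sql: str, role: str, agent_id: int) -> tuple[str, list]:
    """Validate the query, then scope it to the caller's rows based on role."""
    stmt = validate_select(sql)
    if role == "admin":
        # Assumption: admins see all rows.
        return stmt, []
    if role == "agent":
        # Wrap the generated query so the filter cannot be bypassed by its own WHERE clause.
        return f"SELECT * FROM ({stmt}) AS scoped WHERE agent_id = $1", [agent_id]
    # Supervisor scoping is not described in the write-up, so it is left out here.
    raise PermissionError(f"no filter rule defined for role: {role}")
```

A validation failure raised here is the kind of error a retry loop could feed back into SQL generation.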
LangGraph FastAPI GPT-4o-mini Qdrant Redis OAuth2 JWT RBAC PostgreSQL Docker
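The role-scoped semantic cache can be sketched roughly as follows. This is an illustration under stated assumptions: an in-memory dict stands in for Redis, the scope key is built from role plus a user ID (the write-up only says keys are role-scoped), and the class and method names are hypothetical. Only the 0.92 cosine threshold comes from the description above.

```python
import math

SIMILARITY_THRESHOLD = 0.92  # threshold quoted in the project description


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class RoleScopedSemanticCache:
    """In-memory stand-in for a Redis-backed cache, partitioned by caller scope."""

    def __init__(self, threshold=SIMILARITY_THRESHOLD):
        self.threshold = threshold
        self._entries = {}  # scope key -> list of (embedding, answer)

    @staticmethod
    def _scope(role, user_id):
        # Assumption: the key combines role and user ID.
        return f"{role}:{user_id}"

    def put(self, role, user_id, embedding, answer):
        self._entries.setdefault(self._scope(role, user_id), []).append((embedding, answer))

    def get(self, role, user_id, embedding):
        """Return the best cached answer within the caller's scope, or None on a miss."""
        best_score, best_answer = 0.0, None
        for cached_vec, answer in self._entries.get(self._scope(role, user_id), []):
            score = cosine(embedding, cached_vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None
```

Because lookups never leave the caller's scope, a semantically identical question from a different user is a miss and goes through the full pipeline with that user's own filters.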
/ multi-tenant-ecommerce-rag

Production RAG system serving Amazon, Flipkart, and Myntra from a single deployment — hard tenant isolation via separate Qdrant collections, no shared index. Policy PDFs ingested asynchronously through Kafka; a 5-node LangGraph pipeline handles routing, Qdrant retrieval, FlashRank cross-encoder reranking, GPT-4o-mini generation, and citation building. Redis provides a semantic cache (cosine similarity, 0.92 threshold) and per-session conversation memory. A feedback loop collects thumbs up/down via Kafka, stores ratings in PostgreSQL, and auto-rewrites underperforming prompts without redeployment. Evaluated with RAGAS (faithfulness + answer relevancy). 9 containers, fully orchestrated with Docker Compose health checks.

# query path
POST /api/v1/query ──▶ Tenant Auth ──▶ Redis Semantic Cache
                                            │ miss
LangGraph: Router ──▶ Retriever ──▶ Reranker ──▶ Generator ──▶ Citations
# ingestion path
POST /api/v1/ingest ──▶ Kafka ──▶ Consumer: chunk · embed · upsert Qdrant · audit log
VectorDB: Qdrant   Cache+Memory: Redis   Prompts: PostgreSQL registry
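One way to picture the hard tenant isolation is a lookup that maps each authenticated tenant to exactly one collection and rejects everything else. The collection names, the error type, and the `search_fn` callable (a placeholder for whatever vector-store client is used) are hypothetical; this is a sketch, not the project's routing code.

```python
# Hypothetical collection names: one Qdrant collection per tenant, no shared index.
TENANT_COLLECTIONS = {
    "amazon": "policies_amazon",
    "flipkart": "policies_flipkart",
    "myntra": "policies_myntra",
}


class UnknownTenantError(Exception):
    """Raised when a request names a tenant that has no collection."""


def collection_for(tenant_id: str) -> str:
    """Resolve a tenant to its dedicated collection; never fall back to a default."""
    key = tenant_id.strip().lower()
    if key not in TENANT_COLLECTIONS:
        raise UnknownTenantError(tenant_id)
    return TENANT_COLLECTIONS[key]


def tenant_search(tenant_id: str, query_vector, search_fn, limit: int = 5):
    """Run a vector search restricted to the caller's collection.

    search_fn is an injected callable (collection_name, vector, limit) -> results,
    standing in for the real vector-store client.
    """
    return search_fn(collection_for(tenant_id), query_vector, limit)
```

Failing closed on an unknown tenant matters here: a default collection would quietly reintroduce the shared index that the design avoids.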
LangGraph FastAPI Kafka Qdrant Redis FlashRank RAGAS PostgreSQL Docker
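The feedback loop needs some rule for deciding which prompts count as underperforming. The write-up does not state one, so the sketch below assumes a simple approval-rate rule; the thresholds, the function name, and the `(prompt_id, thumbs_up)` event shape are all assumptions.

```python
from collections import defaultdict

# Assumed thresholds; the actual criteria are not given in the project description.
MIN_VOTES = 20
MIN_APPROVAL = 0.6


def underperforming_prompts(feedback, min_votes=MIN_VOTES, min_approval=MIN_APPROVAL):
    """Return prompt IDs whose thumbs-up rate is below min_approval.

    feedback is an iterable of (prompt_id, thumbs_up) pairs, for example rows
    read back from the ratings table. Prompts with fewer than min_votes ratings
    are skipped so that one early downvote does not trigger a rewrite.
    """
    ups = defaultdict(int)
    totals = defaultdict(int)
    for prompt_id, thumbs_up in feedback:
        totals[prompt_id] += 1
        if thumbs_up:
            ups[prompt_id] += 1
    return sorted(
        prompt_id
        for prompt_id, count in totals.items()
        if count >= min_votes and ups[prompt_id] / count < min_approval
    )
```

The returned IDs would then be handed to whatever step rewrites the prompt and stores the new version in the prompt registry.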
/ code-review-agent

Production-grade multi-agent code review system built on LangGraph. Submitted code enters a 5-node stateful graph: orchestrator validates and checks a SHA256 cache, then fans out to three parallel specialist agents — bug detection, code quality, and security analysis — all running GPT-4o-mini concurrently. A synthesizer node merges parallel outputs into a structured JSON report with per-category scores. SHA256 caching in PostgreSQL eliminates duplicate LLM calls for identical code submissions. Tenacity retry logic handles transient OpenAI failures. Fully containerised with Docker Compose; FastAPI backend with Pydantic v2 request validation.

# agent graph topology
POST /review ──▶ Orchestrator: validate · SHA256 cache check ──▶ Parallel fan-out (LangGraph · GPT-4o-mini)
  ├──▶ Bug Detector
  ├──▶ Quality Checker  ──▶ Synthesizer ──▶ JSON Report
  └──▶ Security Checker
Cache: SHA256 · PostgreSQL   Retry: tenacity   Infra: Docker Compose
LangGraph FastAPI Multi-Agent Parallel Execution Python PostgreSQL Docker
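The cache-then-fan-out flow can be illustrated with plain `asyncio`, used here as a stand-in for the LangGraph graph. The three agent functions are stubs returning placeholder scores instead of calling GPT-4o-mini, the dict stands in for the PostgreSQL cache table, and all names are hypothetical.

```python
import asyncio
import hashlib

_report_cache = {}  # in-memory stand-in for a PostgreSQL table keyed by SHA256
AGENT_CALLS = []    # records stub invocations so cache hits are observable


async def bug_detector(code: str) -> dict:
    AGENT_CALLS.append("bugs")
    return {"category": "bugs", "score": 8}  # placeholder score


async def quality_checker(code: str) -> dict:
    AGENT_CALLS.append("quality")
    return {"category": "quality", "score": 7}  # placeholder score


async def security_checker(code: str) -> dict:
    AGENT_CALLS.append("security")
    return {"category": "security", "score": 9}  # placeholder score


async def review(code: str) -> dict:
    """Orchestrator: check the cache, fan out to three agents, then synthesize."""
    key = hashlib.sha256(code.encode("utf-8")).hexdigest()
    if key in _report_cache:
        # Identical submission: reuse the stored report, no agent calls.
        return {**_report_cache[key], "cached": True}
    # Run the three specialists concurrently and wait for all of them.
    results = await asyncio.gather(
        bug_detector(code), quality_checker(code), security_checker(code)
    )
    # Synthesizer: merge the parallel outputs into one report with per-category scores.
    report = {"sha256": key, "scores": {r["category"]: r["score"] for r in results}}
    _report_cache[key] = report
    return {**report, "cached": False}
```

Hashing the raw submission means any change to the code, even whitespace, produces a new key and a fresh review.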
/ youtube-rag-agent

Agentic RAG system that ingests YouTube video transcripts and answers questions strictly grounded in transcript content — no hallucination outside source material. Transcripts are chunked, embedded with OpenAI text-embedding-3-small, and stored in Qdrant. A LangGraph agent orchestrates retrieval, applies a constrained generation prompt that refuses to answer beyond the transcript context, and returns cited responses via a FastAPI backend. Built with langchain_qdrant.QdrantVectorStore after migrating from deprecated qdrant_client APIs — demonstrates real-world dependency management on a moving ecosystem.

# pipeline
YouTube URL ──▶ Transcript fetch ──▶ Chunk · Embed ──▶ Qdrant
User Query ──▶ FastAPI ──▶ LangGraph Agent
  ├──▶ Qdrant retrieval
  └──▶ GPT-4o-mini ──▶ Grounded response
Constraint: LLM answers only from transcript context · refuses out-of-scope queries
LangGraph FastAPI Qdrant LangChain Python OpenAI
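Two steps of this pipeline are easy to show in isolation: splitting a transcript into overlapping chunks, and building a prompt that restricts the model to the retrieved excerpts. The chunk size, overlap, refusal wording, and function names below are assumptions chosen for illustration, not the project's actual settings.

```python
REFUSAL = "I can't answer that from the transcript."  # hypothetical refusal text


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of `size`, each sharing `overlap` chars with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop before a start position whose chunk would lie entirely inside the previous chunk.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]


def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Build a prompt that tells the model to answer only from the numbered excerpts."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer using only the transcript excerpts below and cite excerpt numbers. "
        f"If the answer is not in the excerpts, reply exactly: {REFUSAL}\n\n"
        f"Transcript excerpts:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the excerpts gives the model something concrete to cite, which is what makes the returned answers checkable against the transcript.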
2025 — Present
AI Engineer (Direct Hire)
Excel Texas Wireless (TX, USA · Remote)
Built a self-correcting NL→SQL agent using LangGraph — stateful validation loop catches execution failures, routes to retry with corrected SQL, and falls back gracefully. Reduced manual query work by 70%.
Implemented stateful multi-turn memory for SQL context carryover — analysts continue queries across turns without re-specifying schema, table names, or prior filters.
Added Redis semantic cache (cosine similarity threshold) and tenacity-based retry logic — cut redundant LLM calls by ~40% and improved reliability under concurrent load.
Hired directly after 3+ years of consulting delivery for this client; the conversion was based on production performance, not interviews.
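The self-correcting loop from the first bullet can be sketched in plain Python, with the LangGraph wiring left out. `generate_sql` and `execute` are injected callables standing in for the LLM call and the database call; their signatures, the attempt limit, and the result shape are assumptions.

```python
from typing import Any, Callable, Optional


def run_with_self_correction(
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],
    execute: Callable[[str], Any],
    max_attempts: int = 3,
) -> dict:
    """Generate SQL, execute it, and on failure feed the error back into generation.

    generate_sql(question, previous_error) returns a SQL string; previous_error is
    None on the first attempt. execute(sql) returns rows or raises on failure.
    """
    error: Optional[str] = None
    sql: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        sql = generate_sql(question, error)
        try:
            rows = execute(sql)
        except Exception as exc:  # broad on purpose: any execution failure triggers a retry
            error = str(exc)
            continue
        return {"status": "ok", "sql": sql, "rows": rows, "attempts": attempt}
    # Graceful fallback: report the last error instead of raising to the caller.
    return {"status": "failed", "sql": sql, "error": error, "attempts": max_attempts}
```

Passing the database error text back to the generator is what lets the next attempt correct a wrong column or table name rather than repeat the same query.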
2021 — 2025
Software Consultant — Backend & AI Engineering
Rahul Tech Services · Client: Excel Texas Wireless (TX, USA)
Architected backend for a multi-dealer telecom portal handling concurrent users — automated 100% of refill workflows, eliminating manual reconciliation entirely.
Designed high-concurrency FastAPI + PostgreSQL systems with async handlers, connection pooling, and indexed queries — sub-100ms P95 latency on transactional endpoints.
Implemented OAuth2 + JWT with RBAC — multi-role access control for dealers, admins, and operators across the platform.

Open to backend AI engineering roles — production RAG systems, multi-agent architectures, and event-driven AI pipelines.