Backend AI Engineer

Dhanesh Vashisth

LangGraph · Qdrant · FastAPI · Kafka · Python

I build production-grade agentic AI systems: self-correcting LangGraph agents, multi-tenant RAG pipelines, NL-to-SQL RAG pipelines, and autonomous research workflows that make real decisions, not just retrieve text. Every system runs in Docker, streams through Kafka, caches in Redis, and is designed to handle real users rather than notebook demos.


Open to opportunities

Core Stack

🔗

LangGraph / LangChain

Stateful multi-agent graphs, conditional routing, tool-calling, and agentic loops in production.

📚

RAG Pipelines

Chunking strategy, OpenAI embeddings, Qdrant vector search, FlashRank reranking, RAGAS evaluation.

⚡

FastAPI

Async ASGI APIs with dependency injection, Pydantic v2 validation, lifespan events, and OpenAPI docs.

📨

Kafka

Async document ingestion and feedback pipelines. Producers, consumers, consumer groups, offset management.

🗄️

PostgreSQL · Redis · Qdrant

asyncpg connection pools, Redis semantic cache + session memory, Qdrant tenant-isolated collections.

🐍

Python

Async-first, type-annotated, production-grade. Primary language for all backend and AI engineering work.

🐳

Docker

Multi-service Docker Compose, health checks, named volumes, deployments with resource limits and probes.

🔧

CI/CD · Git

GitHub Actions for automated test, build, and deploy pipelines. Structured branching and PR-based workflows.

Stack Matrix

AI / LLM

LangGraph
LangChain · RAG
OpenAI GPT-4o-mini
FlashRank · RAGAS

API · Data

FastAPI · Python
Pydantic v2 · asyncpg
PostgreSQL · Redis
Qdrant

Messaging · Async

Kafka
aiokafka · Confluent
Consumer groups
Offset management

Infra · Ops

Docker Compose
Kubernetes
Git · GitHub
Health probes

Projects

/ nl-sql-rag-pipeline

GitHub

Production NL-to-SQL system for Excel Texas Wireless — non-technical admins, supervisors, and agents query a live PostgreSQL transaction database using plain English. Schema metadata vectorized with text-embedding-3-small and stored in Qdrant; a 5-node LangGraph pipeline handles schema retrieval, GPT-4o-mini SQL generation, validation with a conditional retry loop, asyncpg execution, and response formatting. Role-based access control enforced at SQL query level — WHERE clauses injected programmatically per role before execution. Redis provides a semantic cache (cosine similarity, 0.92 threshold) with role-scoped keys preventing cross-user data leakage. Full audit trail via query_logs. 35+ tests covering SQL injection, cache isolation, and RBAC bypass. 5 containers orchestrated with Docker Compose health checks.

# query path
POST /query/ ──▶ JWT RBAC ──▶ Redis Semantic Cache
                              │ miss
                              ▼
LangGraph: Schema Retrieval ──▶ SQL Generation ──▶ Validation+RBAC ──▶ Execution ──▶ Formatter
# security layer
Node 3 ──▶ Forbidden keyword check ──▶ WHERE agent_id = {id} injected ──▶ retry loop
VectorDB: Qdrant │ Cache: Redis │ DB: PostgreSQL + asyncpg
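A minimal sketch of the Node 3 security layer, assuming the generated SQL arrives as one string and that agents are scoped by an `agent_id` column. The keyword list, the role names, the `UnsafeSQLError` type, and the subquery-wrapping approach are illustrative assumptions rather than the project's actual code; supervisor scoping is left out because the description does not say how it works. In this sketch the scoping value travels as an asyncpg-style `$1` parameter instead of being formatted into the SQL text.

```python
import re
from typing import Optional

# Hypothetical deny-list: anything that writes data or changes schema/permissions.
FORBIDDEN_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "DROP", "ALTER",
                      "TRUNCATE", "CREATE", "GRANT", "REVOKE")
_FORBIDDEN_RE = re.compile(r"\b(" + "|".join(FORBIDDEN_KEYWORDS) + r")\b", re.IGNORECASE)


class UnsafeSQLError(ValueError):
    """Raised when generated SQL fails validation; the graph would route to retry."""


def validate_sql(sql: str) -> str:
    """Accept exactly one read-only SELECT (or WITH ... SELECT) statement."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:
        raise UnsafeSQLError("multiple statements are not allowed")
    if not re.match(r"(select|with)\b", stripped, re.IGNORECASE):
        raise UnsafeSQLError("only SELECT queries are allowed")
    found = _FORBIDDEN_RE.search(stripped)
    if found:
        raise UnsafeSQLError(f"forbidden keyword: {found.group(1).upper()}")
    return stripped


def scope_to_role(sql: str, role: str, agent_id: Optional[int] = None) -> tuple[str, list]:
    """Validate, then wrap the query so an agent only sees rows with their own agent_id.

    Returns (sql, params) using an asyncpg-style $1 placeholder; the id is never
    formatted into the SQL text.
    """
    safe_sql = validate_sql(sql)
    if role == "admin":
        return safe_sql, []  # admins are unscoped in this sketch
    if role == "agent":
        if agent_id is None:
            raise UnsafeSQLError("agent role requires an agent_id")
        return f"SELECT * FROM ({safe_sql}) AS scoped WHERE scoped.agent_id = $1", [agent_id]
    raise UnsafeSQLError(f"no scoping rule for role: {role}")


if __name__ == "__main__":
    print(scope_to_role("SELECT agent_id, amount FROM transactions;", "agent", 7))
```

Wrapping the model's query in an outer `SELECT ... WHERE` applies the filter regardless of how the inner query is written. The word-boundary keyword check errs on the side of rejection (a string literal containing `drop` is refused too), which in this design just sends the query back through the retry loop.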

/ multi-tenant-ecommerce-rag

GitHub

Production RAG system serving Amazon, Flipkart, and Myntra from a single deployment — hard tenant isolation via separate Qdrant collections, no shared index. Policy PDFs ingested asynchronously through Kafka; a 5-node LangGraph pipeline handles routing, Qdrant retrieval, FlashRank cross-encoder reranking, GPT-4o-mini generation, and citation building. Redis provides a semantic cache (cosine similarity, 0.92 threshold) and per-session conversation memory. A feedback loop collects thumbs up/down via Kafka, stores ratings in PostgreSQL, and auto-rewrites underperforming prompts without redeployment. Evaluated with RAGAS (faithfulness + answer relevancy). 9 containers, fully orchestrated with Docker Compose health checks.

# query path
POST /api/v1/query ──▶ Tenant Auth ──▶ Redis Semantic Cache
                                       │ miss
                                       ▼
LangGraph: Router ──▶ Retriever ──▶ Reranker ──▶ Generator ──▶ Citations
# ingestion path
POST /api/v1/ingest ──▶ Kafka ──▶ Consumer: chunk · embed · upsert Qdrant · audit log
VectorDB: Qdrant │ Cache+Memory: Redis │ Prompts: PostgreSQL registry
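The semantic cache in this pipeline can be sketched in a few lines, assuming an in-memory dict in place of Redis and an injected `embed` function in place of OpenAI embeddings (both hypothetical stand-ins). Entries are partitioned by scope, so a lookup for one tenant never scans another tenant's entries; the same idea would cover role-scoped keys in the NL-to-SQL pipeline. The 0.92 threshold comes from the description; the linear scan is a simplification that a Redis deployment would likely replace with an indexed vector search.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """In-memory stand-in for a Redis semantic cache, partitioned by scope.

    A scope is a tenant (or a role), so a hit can only come from answers
    produced for that same scope.
    """

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.92):
        self.embed = embed
        self.threshold = threshold
        self._entries: dict[str, list[tuple[Vector, str]]] = {}

    def get(self, scope: str, query: str) -> Optional[str]:
        """Return the most similar cached answer at or above the threshold, else None."""
        query_vec = self.embed(query)
        best_score, best_answer = -math.inf, None
        for vec, answer in self._entries.get(scope, []):
            score = cosine_similarity(query_vec, vec)
            if score >= self.threshold and score > best_score:
                best_score, best_answer = score, answer
        return best_answer

    def set(self, scope: str, query: str, answer: str) -> None:
        self._entries.setdefault(scope, []).append((self.embed(query), answer))


if __name__ == "__main__":
    # Toy 3-d "embeddings" so the example runs without an embedding API.
    toy_vectors = {
        "refund policy?": [1.0, 0.0, 0.0],
        "what is the refund policy?": [0.95, 0.1, 0.0],
        "shipping time?": [0.0, 1.0, 0.0],
    }
    cache = SemanticCache(embed=toy_vectors.__getitem__)
    cache.set("amazon", "refund policy?", "answer-1")
    print(cache.get("amazon", "what is the refund policy?"))    # → answer-1
    print(cache.get("flipkart", "what is the refund policy?"))  # → None (other tenant)
    print(cache.get("amazon", "shipping time?"))                # → None (not similar)
```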

/ code-review-agent

GitHub

Production-grade multi-agent code review system built on LangGraph. Submitted code enters a 5-node stateful graph: an orchestrator validates the input and checks a SHA256 cache, then fans out to three parallel specialist agents (bug detection, code quality, and security analysis), all running GPT-4o-mini concurrently. A synthesizer node merges the parallel outputs into a structured JSON report with per-category scores. SHA256 caching in PostgreSQL eliminates duplicate LLM calls for identical code submissions. Tenacity retry logic handles transient OpenAI failures. Fully containerized with Docker Compose; FastAPI backend with Pydantic v2 request validation.

# agent graph topology
POST /review ──▶ Orchestrator: validate · SHA256 cache check
    ──▶ Parallel fan-out (LangGraph · GPT-4o-mini)
          ├──▶ Bug Detector
          ├──▶ Quality Checker ──▶ Synthesizer ──▶ JSON Report
          └──▶ Security Checker
Cache: SHA256 · PostgreSQL │ Retry: tenacity │ Infra: Docker Compose
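A minimal sketch of the orchestrator's SHA256 cache check followed by the parallel fan-out and the synthesizer fan-in. Here `asyncio.gather` stands in for LangGraph's parallel branches, a plain dict stands in for the PostgreSQL cache table, and the three specialist functions return canned results instead of calling GPT-4o-mini; all of these names and outputs are hypothetical.

```python
import asyncio
import hashlib
import json
from typing import Awaitable, Callable

Agent = Callable[[str], Awaitable[dict]]


def cache_key(code: str) -> str:
    """SHA256 of the submitted code: identical submissions share one key."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


# Stub specialists standing in for the GPT-4o-mini agents (canned, hypothetical output).
async def bug_detector(code: str) -> dict:
    return {"category": "bugs", "score": 8, "issues": []}


async def quality_checker(code: str) -> dict:
    return {"category": "quality", "score": 7, "issues": ["missing docstring"]}


async def security_checker(code: str) -> dict:
    return {"category": "security", "score": 9, "issues": []}


def synthesize(results: list[dict]) -> str:
    """Fan-in: merge per-agent results into one JSON report keyed by category."""
    report = {r["category"]: {"score": r["score"], "issues": r["issues"]} for r in results}
    return json.dumps(report, sort_keys=True)


async def review(code: str, cache: dict[str, str], agents: list[Agent]) -> str:
    key = cache_key(code)
    if key in cache:  # cache hit: no agent (LLM) calls at all
        return cache[key]
    results = await asyncio.gather(*(agent(code) for agent in agents))  # concurrent fan-out
    report = synthesize(list(results))
    cache[key] = report
    return report


if __name__ == "__main__":
    store: dict[str, str] = {}
    specialists = [bug_detector, quality_checker, security_checker]
    print(asyncio.run(review("def f(): return 1", store, specialists)))
```

Because the key hashes the exact submitted text, a resubmission returns the stored report without running any agent, while any change to the code, even whitespace, produces a new key.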

/ youtube-rag-agent

GitHub

Agentic RAG system that ingests YouTube video transcripts and answers questions strictly grounded in transcript content, with no answers drawn from outside the source material. Transcripts are chunked, embedded with OpenAI text-embedding-3-small, and stored in Qdrant. A LangGraph agent orchestrates retrieval, applies a constrained generation prompt that refuses to answer beyond the transcript context, and returns cited responses via a FastAPI backend. Built with langchain_qdrant.QdrantVectorStore after migrating from deprecated qdrant_client APIs, which demonstrates real-world dependency management in a fast-moving ecosystem.

# pipeline
YouTube URL ──▶ Transcript fetch ──▶ Chunk · Embed ──▶ Qdrant
User Query ──▶ FastAPI ──▶ LangGraph Agent
                             ├──▶ Qdrant retrieval
                             └──▶ GPT-4o-mini ──▶ Grounded response
Constraint: LLM answers only from transcript context · refuses out-of-scope queries
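The chunking and constrained-prompt steps in this pipeline could look roughly like the sketch below. The character-based sliding window, the defaults `chunk_size=1000` and `overlap=200`, and the `REFUSAL` wording are assumptions for illustration; the description does not specify the actual splitter or prompt text.

```python
# Hypothetical refusal wording; the project's real prompt text is not published.
REFUSAL = "I can't answer that from this video's transcript."


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end; stop
            break
    return chunks


def build_grounded_prompt(question: str, retrieved: list[str]) -> str:
    """Number the retrieved chunks so the model can cite them, and force a refusal otherwise."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(retrieved, start=1))
    return (
        "Answer using ONLY the transcript excerpts below and cite them as [n].\n"
        f"If they do not contain the answer, reply exactly: {REFUSAL}\n\n"
        f"Transcript excerpts:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    pieces = chunk_text("abcdefghij", chunk_size=4, overlap=2)
    print(pieces)
    print(build_grounded_prompt("What is discussed?", pieces[:2]))
```

The overlap keeps a sentence that straddles a window boundary intact in at least one chunk, and numbering the excerpts gives the model something concrete to cite.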

Experience

2025 — Present

AI Engineer (Direct Hire)

Excel Texas Wireless (TX, USA · Remote)

Built a self-correcting NL→SQL agent using LangGraph: a stateful validation loop catches execution failures, routes to a retry with corrected SQL, and falls back gracefully. Reduced manual query work by 70%.
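The validation loop described here maps onto a LangGraph conditional edge that routes to formatting, retry, or fallback after each execution attempt. The plain-Python sketch below mirrors that routing without LangGraph; `generate_sql` and `execute_sql` are hypothetical callables standing in for the GPT-4o-mini and asyncpg steps, and `max_attempts=3` and the fallback message are assumed values.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentState:
    """Fields a stateful graph would carry between nodes (hypothetical)."""
    question: str
    sql: Optional[str] = None
    rows: Optional[list] = None
    error: Optional[str] = None
    attempts: int = 0


def route_after_execution(state: AgentState, max_attempts: int) -> str:
    """Plays the role of a conditional edge after the execution node."""
    if state.error is None:
        return "format"
    if state.attempts < max_attempts:
        return "retry"
    return "fallback"


def run_agent(
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],
    execute_sql: Callable[[str], list],
    max_attempts: int = 3,
) -> str:
    state = AgentState(question=question)
    while True:
        state.attempts += 1
        # The previous error (None on the first pass) is fed back so the model can correct itself.
        state.sql = generate_sql(state.question, state.error)
        try:
            state.rows = execute_sql(state.sql)
            state.error = None
        except Exception as exc:  # any execution failure becomes feedback for the next attempt
            state.error = str(exc)
        route = route_after_execution(state, max_attempts)
        if route == "format":
            return f"{len(state.rows)} row(s)"
        if route == "fallback":
            return "Sorry, I couldn't answer that. Please try rephrasing the question."
        # route == "retry": loop back to SQL generation with the error in state
```

Passing the previous error back into generation is what makes the loop self-correcting: the model sees why its last query failed instead of regenerating blind, and the attempt cap guarantees the loop ends in either a result or the fallback.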

Implemented stateful multi-turn memory for SQL context carryover — analysts continue queries across turns without re-specifying schema, table names, or prior filters.

Added a Redis semantic cache (cosine similarity threshold) and tenacity-based retry logic, cutting redundant LLM calls by ~40% and improving reliability under concurrent load.

Directly hired after 3+ years of consulting delivery for this client; the conversion was based on production performance, not interviews.

2021 — 2025

Software Consultant — Backend & AI Engineering

Rahul Tech Services · Client: Excel Texas Wireless (TX, USA)

Architected the backend for a multi-dealer telecom portal serving concurrent users; automated 100% of refill workflows, eliminating manual reconciliation entirely.

Designed high-concurrency FastAPI + PostgreSQL systems with async handlers, connection pooling, and indexed queries — sub-100ms P95 latency on transactional endpoints.

Implemented OAuth2 + JWT with RBAC — multi-role access control for dealers, admins, and operators across the platform.

Contact

Open to backend AI engineering roles — production RAG systems, multi-agent architectures, and event-driven AI pipelines.

contact@dhaneshvashisth.com · github.com/dhaneshvashisth · linkedin.com/in/dhaneshvashisth