Nikhil Kasukurthi
Staff AI Engineer
I'm an AI engineer with 8 years building AI products, from sitting in clinics to understand user constraints, through training and deployment, to owning the product metrics. My recent work centers on agent harnesses, MCP servers, SDKs, and skills, with a focus on evaluation. I open-sourced an evaluation library now adopted by the Gates Foundation, and led the team behind an LLM agent platform serving 100K daily users. I still profile distributed training on H100s and am currently porting Cosmos-Predict2.5 to vLLM. Published researcher with cross-functional leadership experience.
Work Experience
ML Research & Consulting
Göttingen, Germany Nov 2025 – Present
- Training Gemma with GRPO to gate which documents from retrieval make it into the agent's context, using LLM-as-judge rubric scores as the reward signal on Tau-Knowledge Bench
- Designed and launched clarifyit.ai (TypeScript/React), where iterative LLM-driven questioning through tools elicits the user's intent so no bad assumptions are made before generation
- Published a Hemingway editor skill that produces a Hemingway-style readability report for Claude Code and other compatible agents
- Built an Arabic-speaking voice agent that triages patients by symptom and routes to the right specialization, with tools served over a remote MCP server for protocol search and doctor availability. POC serving ~100 users/day
- Porting the Cosmos-Predict2.5 diffusion-based world foundation model to vLLM Omni. Initial results show 12% latency and 8% memory improvements over diffusers
- Profiled CUDA kernels using Nsight Systems to measure NCCL communication overhead in ZeRO-2 on H100 SXM vs PCIe. Published a technical deep-dive identifying a 2x cost-efficiency gap due to NUMA topology
- Trained Nanochat with NVIDIA Transformer Engine at MXFP8 precision, improving training throughput 5% over baseline FP8. Benchmarked NVFP4
Lead Data Scientist | Eka.care
Bengaluru, India Jan 2022 – Oct 2025
Healthcare company building AI-powered tools serving 1M+ DAU
Agents, MCP & Developer Experience
- Discovered via product interviews that doctors abandoned the AI chat window for external references. Designed MedAssist, an LLM client with remote MCP server support. Adopted by Apollo Hospitals; the appointment-booking agent now serves 100K daily users
- Built an MCP server over ClickHouse for natural-language analytics, used by 1,000+ doctors. The agent was grounded on five key table schemas and wrote the ClickHouse queries directly (text-to-SQL)
- Early adopter of MCP; open-sourced an MCP server exposing Indian medical knowledge to LLMs (500K+ branded drugs, protocols, national formulary, 403 clinical calculators). Tool access lifted eval performance from 60% to >90%
- Built client SDKs and skills for agents used by developers at multiple orgs; drove the move to token-based auth for remote MCP servers
- Created a structured tool-use UI pattern (radio/multi-select options as tool results), reducing manual input. Conversation depth +27%. Engineered MedAssist for 2s p99 time to first token
Evaluation & Agent Observability
- Designed and open-sourced KARMA-OpenMedEvalKit, an evaluation library for LLMs with LLM-as-a-judge and rubric scoring, plus golden datasets on HuggingFace. Adopted org-wide; the Gates Foundation adopted it for healthcare bot evaluation
- Built tool-use evaluation that captures agent traces to localize where agents fail (tool selection, input mapping, answer integration). Tool access lifted drug identification to 99.7%
- Kept agents within context budget through tool-result tombstoning and conversation compaction, using a smaller LLM to summarize long histories
- Deployed LangFuse for trace/eval tracking and prompt management; evaluated CrewAI and LangChain, then built a custom agent harness on Google ADK for tighter control over tool orchestration and observability
Search & Retrieval
- Built LLM-augmented retrieval using ColQwen-2.5 on protocol PDFs indexed in Vespa with hybrid search (BM25 + dense retrieval). Recall@Top3 +24% over text-only embeddings. Chose late-interaction architecture over chunking for scalability
- Diagnosed poor medication search through query log analysis (500K docs, 1M daily requests). Discovered doctors used shorthands the system couldn't resolve. Built a query decomposition layer on ElasticSearch; nDCG@10 +55%, relevance +160%
- Built contrastive learning semantic retrieval models and BERT-based NER/entity linking to map unstructured clinical text to medical ontologies; diagnosis coding +30%, medication coding +80%
Speech & Language Models
- Deployed a custom Speech LLM (Whisper + Gemma 2) for medical transcription via vLLM plugins, 10x throughput and 5x latency improvement on L4 GPUs over the torch-compiled variant. Cut STT costs by 60%. Within one week, 60% of doctors adopted it for 95% of consultations
- Noticed clinicians juggling Google Sheets for annotation (duplicates, mis-tagging, annotator fatigue). Led a cross-functional team to build an annotation platform with RBAC and multi-modality support. Throughput jumped from 10 to 25 hours/day
Infrastructure & Data Pipelines
- Served multimodal LLMs via vLLM on Kubernetes, torch-compiled models on RayServe, with TensorRT on latency-critical paths. Cut inference cost by 50% vs SageMaker
- Built CDC ingestion (Debezium, Kafka) with Apache Beam reconciling DynamoDB, MySQL, and MongoDB change streams into BigQuery; migrated analytics workloads to ClickHouse
Leadership
- Owned the AI product roadmap ($500K ARR across MedAssist and Patient Profiling), driving direction across LLM Client, Search, and ML Infra teams
- Led a 10-person cross-functional delivery team for MedAssist with 3 direct reports; promoted 1 and mentored multiple engineers
- Ran product interviews and query log analysis to find gaps between user behavior and system metrics. Findings shaped the research roadmap
- AWS featured Eka.care as a reference customer for healthcare AI
Data Scientist | Udaan
Bengaluru, India May 2021 – Dec 2021
India's largest B2B e-commerce marketplace
- Learning-to-rank models (gradient-boosted trees) lifted search-to-cart conversion by 10%. A/B tested across business verticals
- Built a 3D point cloud processing pipeline (DGCNN) for LiDAR-based volume estimation of warehouse shipments. Built the data science stack from scratch (collection, annotation, training, and deployment). Achieved 40% cost savings and 50% latency reduction
Visiting Researcher | National Centre for Biological Sciences (NCBS) – TIFR
Bengaluru, India May 2020 – Mar 2021
Concurrent with role at SigTuple
- Developed PrISM (Precision for Integrative Structural Models) using Variational Autoencoders, a novel unsupervised technique to score integrative models
- Won Best Poster Award at NCBS Annual Talks 2021
- Published in Bioinformatics (Vol. 38, Issue 15, August 2022)
Data Scientist III | SigTuple
Bengaluru, India Jun 2018 – May 2021
Healthcare AI startup building diagnostic products
- Built retinal disease detection products from annotation and model development through clinical validation and CE certification. Published 2 papers at IEEE ISBI 2019
- Trained a RetinaNet (with focal loss) object detector for localizing retinal structures. Outputs fed into the inference DAG, routing cropped regions to downstream Diabetic Retinopathy, AMD, and Glaucoma models
- Technical lead for two diagnostic products (Fundus, Urine analysis). Defined and executed research and engineering roadmap across science, engineering, and clinical teams
- Led the ML platform team and architected a multi-model inference DAG using TF Serving, Kubernetes, and Cloud Functions. Improved turnaround time by 60% and reduced costs by 40%
Publications
PrISM: precision for integrative structural models
Bioinformatics, Volume 38, Issue 15 August 2022
Deep learning for weak supervision of diabetic retinopathy abnormalities
IEEE International Symposium on Biomedical Imaging (ISBI) July 2019
Dynamic region proposal networks for semantic segmentation in automated glaucoma screening
IEEE International Symposium on Biomedical Imaging (ISBI) July 2019
Technical Skills
LLM & Agentic AI
Evaluation & Observability
ML Systems & Serving
Profiling & Quantization
Search & Retrieval
Infrastructure
Languages
Education
B. Tech - Computer Science and Engineering
VIT University, Vellore, India 2014 – 2018
CGPA: 8.39/10
Awards
Hackathon Winner, AWS GenAI Hackathon
August 2024
Built appointment booking agent through tool use
Impact Award, Eka.care
2023
For major organizational impact
Best Poster Award, NCBS Annual Talks
2021
For PrISM research presentation
Hackathon Runner Up, Practo Sandbox
December 2017
Skin cancer detection through deep learning model