Nikhil Kasukurthi

Staff AI Engineer

I'm an AI engineer with 8 years building AI products, from sitting in clinics to understand user constraints, through training and deployment, to owning the product metrics. My recent work centers on agent harnesses, MCP servers, SDKs, and skills, with a focus on evaluation. I open-sourced an evaluation library now adopted by the Gates Foundation, and led the team behind an LLM agent platform serving 100K daily users. I still profile distributed training on H100s and am currently porting Cosmos-Predict2.5 to vLLM. Published researcher with cross-functional leadership experience.

Work Experience

ML Research & Consulting

Göttingen, Germany | Nov 2025 – Present

Training Gemma with GRPO to gate which documents from retrieval make it into the agent's context, using LLM-as-judge rubric scores as the reward signal on Tau-Knowledge Bench
Designed and launched clarifyit.ai (TypeScript/React), where iterative LLM-driven questioning through tools elicits the user's intent so no bad assumptions are made before generation
Published a Hemingway editor skill for Claude Code and other compatible agents that produces a Hemingway-style readability report
Built an Arabic-speaking voice agent that triages patients by symptom and routes them to the right specialty, with tools served over a remote MCP server for protocol search and doctor availability. POC serving ~100 users/day
Porting the Cosmos-Predict2.5 diffusion-based world foundation model to vLLM Omni. Initial results show 12% latency and 8% memory improvements over diffusers
Profiled CUDA kernels using Nsight Systems to measure NCCL communication overhead in ZeRO-2 on H100 SXM vs PCIe. Published a technical deep-dive identifying a 2x cost-efficiency gap due to NUMA topology
Trained Nanochat with NVIDIA Transformer Engine at MXFP8 precision, improving training throughput 5% over baseline FP8. Benchmarked NVFP4

Lead Data Scientist | Eka.care

Bengaluru, India | Jan 2022 – Oct 2025

Healthcare company building AI-powered tools serving 1M+ DAU

Agents, MCP & Developer Experience

Discovered via product interviews that doctors abandoned the AI chat window for external references. Designed MedAssist, an LLM client with remote MCP server support. Adopted by Apollo Hospitals; the appointment-booking agent now serves 100K daily users
Built an MCP server over ClickHouse for natural-language analytics, used by 1,000+ doctors. The agent was grounded on five key table schemas and wrote the ClickHouse queries directly (text-to-SQL)
Early adopter of MCP; open-sourced an MCP server exposing Indian medical knowledge to LLMs (500K+ branded drugs, protocols, national formulary, 403 clinical calculators). Tool access lifted eval performance from 60% to >90%
Built client SDKs and skills for agents used by developers at multiple orgs; drove the move to token-based auth for remote MCP servers
Created a structured tool-use UI pattern (radio/multi-select options as tool results), reducing manual input. Conversation depth +27%. Engineered MedAssist for 2s p99 time to first token

Evaluation & Agent Observability

Designed and open-sourced KARMA-OpenMedEvalKit, an evaluation library for LLMs with LLM-as-a-judge and rubric scoring, plus golden datasets on Hugging Face. Adopted org-wide and by the Gates Foundation for healthcare bot evaluation
Built tool-use evaluation that captures agent traces to localize where agents fail (tool selection, input mapping, answer integration). Tool access lifted drug identification to 99.7%
Kept agents within context budget through tool-result tombstoning and conversation compaction, using a smaller LLM to summarize long histories
Deployed LangFuse for trace/eval tracking and prompt management; evaluated CrewAI and LangChain, then built a custom agent harness on Google ADK for tighter control over tool orchestration and observability

Search & Retrieval

Built LLM-augmented retrieval using ColQwen-2.5 on protocol PDFs indexed in Vespa with hybrid search (BM25 + dense retrieval). Recall@Top3 +24% over text-only embeddings. Chose late-interaction architecture over chunking for scalability
Diagnosed poor medication search through query log analysis (500K docs, 1M daily requests). Discovered doctors used shorthands the system couldn't resolve. Built a query decomposition layer on ElasticSearch; nDCG@10 +55%, relevance +160%
Built contrastive learning semantic retrieval models and BERT-based NER/entity linking to map unstructured clinical text to medical ontologies; diagnosis coding +30%, medication coding +80%

Speech & Language Models

Deployed a custom Speech LLM (Whisper + Gemma 2) for medical transcription via vLLM plugins, achieving 10x throughput and 5x latency improvements on L4 GPUs over the torch-compiled variant. Cut STT costs by 60%. Within one week, 60% of doctors adopted it for 95% of consultations
Noticed clinicians juggling Google Sheets for annotation (duplicates, mis-tagging, annotator fatigue). Led a cross-functional team to build an annotation platform with RBAC and multi-modality support. Throughput jumped from 10 to 25 hours/day

Infrastructure & Data Pipelines

Served multimodal LLMs via vLLM on Kubernetes and torch-compiled models on RayServe, with TensorRT on latency-critical paths. Cut inference cost by 50% vs SageMaker
Built CDC ingestion (Debezium, Kafka) with Apache Beam reconciling DynamoDB, MySQL, and MongoDB change streams into BigQuery; migrated analytics workloads to ClickHouse

Leadership

Owned the AI product roadmap ($500K ARR across MedAssist and Patient Profiling), driving direction across LLM Client, Search, and ML Infra teams
Led a 10-person cross-functional delivery team for MedAssist with 3 direct reports; promoted 1 and mentored multiple engineers
Ran product interviews and query log analysis to find gaps between user behavior and system metrics. Findings shaped the research roadmap
AWS featured Eka.care as a reference customer for healthcare AI

Data Scientist | Udaan

Bengaluru, India | May 2021 – Dec 2021

India's largest B2B e-commerce marketplace

Built learning-to-rank models (gradient-boosted trees) that lifted search-to-cart conversion by 10%, A/B tested across business verticals
Built a 3D point cloud processing pipeline (DGCNN) for LiDAR-based volume estimation of warehouse shipments. Built the data science stack from scratch (collection, annotation, training, and deployment). Achieved 40% cost savings and 50% latency reduction

Visiting Researcher | National Centre for Biological Sciences (NCBS) – TIFR

Bengaluru, India | May 2020 – Mar 2021

Concurrent with role at SigTuple

Developed PrISM (Precision for Integrative Structural Models), a novel unsupervised technique that uses Variational Autoencoders to score integrative models
Won Best Poster Award at NCBS Annual Talks 2021
Published in Bioinformatics (Vol. 38, Issue 15, August 2022)

Data Scientist III | SigTuple

Bengaluru, India | Jun 2018 – May 2021

Healthcare AI startup building diagnostic products

Built retinal disease detection products from annotation and model development through clinical validation and CE certification. Published 2 papers at IEEE ISBI 2019
Trained a RetinaNet (with focal loss) object detector for localizing retinal structures. Outputs fed into the inference DAG, routing cropped regions to downstream Diabetic Retinopathy, AMD, and Glaucoma models
Technical lead for two diagnostic products (Fundus, Urine analysis). Defined and executed the research and engineering roadmap across science, engineering, and clinical teams
Led the ML platform team and architected a multi-model inference DAG using TF Serving, Kubernetes, and Cloud Functions. Improved turnaround time by 60% and reduced costs by 40%

Publications

PrISM: precision for integrative structural models

Bioinformatics, Volume 38, Issue 15, August 2022

Varun Ullanat, Nikhil Kasukurthi, Shruthi Viswanath

Deep learning for weak supervision of diabetic retinopathy abnormalities

IEEE International Symposium on Biomedical Imaging (ISBI), July 2019

Maroof Ahmad, Nikhil Kasukurthi, Harshit Pande

Dynamic region proposal networks for semantic segmentation in automated glaucoma screening

IEEE International Symposium on Biomedical Imaging (ISBI), July 2019

Shivam Shah, Nikhil Kasukurthi, Harshit Pande

Technical Skills

LLM & Agentic AI

MCP, Agent Harnesses, SDKs, Skill Authoring, Tool Use, Multi-Agent Orchestration, Context Engineering, RAG, Google ADK, GRPO/RLHF

Evaluation & Observability

LLM-as-Judge, Golden Datasets, Rubric Scoring, Tau-Bench2, HELM, LangFuse, Prometheus, Grafana

ML Systems & Serving

PyTorch, vLLM, TensorRT, ONNX, DeepSpeed, RayServe, TorchServe, Diffusion Models

Profiling & Quantization

Nsight Systems, PyTorch Memory Profiler, CUDA Profiling, INT8, MXFP8, NVFP4, Tensor Parallel, ZeRO-2

Search & Retrieval

ElasticSearch, Vespa, FAISS, ColQwen, BM25, Contrastive Learning, Learning-to-Rank

Infrastructure

Kubernetes, Docker, Helm, SLURM, AWS, GCP, Kafka, Apache Beam, PySpark, ClickHouse, BigQuery

Languages

Python, Go, TypeScript, SQL, CUDA

Education

B. Tech - Computer Science and Engineering

VIT University, Vellore, India | 2014 – 2018

CGPA: 8.39/10

Awards

Hackathon Winner, AWS GenAI Hackathon

August 2024

Built an appointment-booking agent using tool use

Impact Award, Eka.care

2023

For major organizational impact

Best Poster Award, NCBS Annual Talks

2021

For PrISM research presentation

Hackathon Runner Up, Practo Sandbox

December 2017

Skin cancer detection using a deep learning model

Interests

Scuba Diving (Open Water Certified), Rock Climbing, Trekking, Formula 1