Resume – Nikhil Kasukurthi | bluenotebook.io

Staff AI Engineer

I'm an AI engineer with 8 years of experience building AI products, from sitting in clinics to understand user constraints, through training and deployment, to owning product metrics. My recent work centers on agent harnesses, MCP servers, SDKs, and skills, with a focus on evaluation. I open-sourced an evaluation library now adopted by the Gates Foundation and led the team behind an LLM agent platform serving 100K daily users. I still profile distributed training on H100s and am currently porting Cosmos-Predict2.5 to vLLM. Published researcher with cross-functional leadership experience.

Work Experience

ML Research & Consulting

Göttingen, Germany Nov 2025 – Present
  • Training Gemma with GRPO to gate which retrieved documents enter the agent's context, using LLM-as-judge rubric scores as the reward signal on Tau-Knowledge Bench
  • Designed and launched clarifyit.ai (TypeScript/React), which uses iterative, tool-driven LLM questioning to elicit the user's intent before generation, so the model doesn't act on bad assumptions
  • Published a Hemingway editor skill that produces a Hemingway-style readability report for Claude Code and other compatible agents
  • Built an Arabic-speaking voice agent that triages patients by symptom and routes them to the right specialty, with tools for protocol search and doctor availability served over a remote MCP server. The POC serves ~100 users/day
  • Porting the Cosmos-Predict2.5 diffusion-based world foundation model to vLLM Omni; initial results show 12% lower latency and 8% lower memory use than diffusers
  • Profiled CUDA kernels with Nsight Systems to measure NCCL communication overhead in ZeRO-2 on H100 SXM vs PCIe. Published a technical deep-dive identifying a 2x cost-efficiency gap caused by NUMA topology
  • Trained Nanochat with NVIDIA Transformer Engine at MXFP8 precision, improving training throughput by 5% over the FP8 baseline. Also benchmarked NVFP4

Lead Data Scientist | Eka.care

Bengaluru, India Jan 2022 – Oct 2025
Healthcare company building AI-powered tools serving 1M+ DAU

Agents, MCP & Developer Experience

  • Discovered via product interviews that doctors abandoned the AI chat window for external references. Designed MedAssist, an LLM client with remote MCP server support. Adopted by Apollo Hospitals; the appointment-booking agent now serves 100K daily users
  • Built an MCP server over ClickHouse for natural-language analytics, used by 1,000+ doctors. The agent was grounded on five key table schemas and wrote ClickHouse queries directly (text-to-SQL)
  • Early adopter of MCP; open-sourced an MCP server exposing Indian medical knowledge to LLMs (500K+ branded drugs, protocols, national formulary, 403 clinical calculators). Tool access lifted eval performance from 60% to >90%
  • Built client SDKs and skills for agents used by developers at multiple orgs; drove the move to token-based auth for remote MCP servers
  • Created a structured tool-use UI pattern (radio/multi-select options as tool results) that reduced manual input; conversation depth rose 27%. Engineered MedAssist for a 2s p99 time to first token

Evaluation & Agent Observability

  • Designed and open-sourced KARMA-OpenMedEvalKit, an LLM evaluation library with LLM-as-a-judge and rubric scoring, plus golden datasets on HuggingFace. Used org-wide and adopted by the Gates Foundation for healthcare bot evaluation
  • Built tool-use evaluation that captures agent traces to localize where agents fail (tool selection, input mapping, answer integration). Tool access lifted drug identification to 99.7%
  • Kept agents within context budget through tool-result tombstoning and conversation compaction, using a smaller LLM to summarize long histories
  • Deployed LangFuse for trace/eval tracking and prompt management; evaluated CrewAI and LangChain, then built a custom agent harness on Google ADK for tighter control over tool orchestration and observability

Search & Retrieval

  • Built LLM-augmented retrieval using ColQwen-2.5 on protocol PDFs indexed in Vespa with hybrid search (BM25 + dense retrieval). Recall@Top3 +24% over text-only embeddings. Chose a late-interaction architecture over chunking for scalability
  • Diagnosed poor medication search through query log analysis (500K docs, 1M daily requests) and found that doctors used shorthand the system couldn't resolve. Built a query decomposition layer on ElasticSearch; nDCG@10 +55%, relevance +160%
  • Built contrastive learning semantic retrieval models and BERT-based NER/entity linking to map unstructured clinical text to medical ontologies; diagnosis coding +30%, medication coding +80%

Speech & Language Models

  • Deployed a custom Speech LLM (Whisper + Gemma 2) for medical transcription via vLLM plugins, achieving 10x higher throughput and 5x lower latency on L4 GPUs than the torch-compiled variant. Cut STT costs by 60%. Within one week, 60% of doctors adopted it for 95% of consultations
  • Noticed clinicians juggling Google Sheets for annotation (duplicates, mis-tagging, annotator fatigue). Led a cross-functional team to build an annotation platform with RBAC and multi-modality support. Throughput jumped from 10 to 25 hours/day

Infrastructure & Data Pipelines

  • Served multimodal LLMs via vLLM on Kubernetes and torch-compiled models on RayServe, using TensorRT on latency-critical paths. Cut inference cost by 50% vs SageMaker
  • Built CDC ingestion (Debezium, Kafka) with Apache Beam reconciling DynamoDB, MySQL, and MongoDB change streams into BigQuery; migrated analytics workloads to ClickHouse

Leadership

  • Owned the AI product roadmap ($500K ARR across MedAssist and Patient Profiling), driving direction across LLM Client, Search, and ML Infra teams
  • Led a 10-person cross-functional delivery team for MedAssist with 3 direct reports; promoted 1 and mentored multiple engineers
  • Ran product interviews and query log analysis to find gaps between user behavior and system metrics. Findings shaped the research roadmap
  • AWS featured Eka.care as a reference customer for healthcare AI

Data Scientist | Udaan

Bengaluru, India May 2021 – Dec 2021
India's largest B2B e-commerce marketplace
  • Built learning-to-rank models (gradient-boosted trees) that lifted search-to-cart conversion by 10%, A/B tested across business verticals
  • Built a 3D point cloud processing pipeline (DGCNN) for LiDAR-based volume estimation of warehouse shipments, building the data science stack from scratch (collection, annotation, training, and deployment). Achieved 40% cost savings and 50% latency reduction

Visiting Researcher | National Centre for Biological Sciences (NCBS) – TIFR

Bengaluru, India May 2020 – Mar 2021
Concurrent with role at SigTuple
  • Developed PrISM (Precision for Integrative Structural Models), a novel unsupervised technique using Variational Autoencoders to score integrative models
  • Won Best Poster Award at NCBS Annual Talks 2021
  • Published in Bioinformatics (Vol. 38, Issue 15, August 2022)

Data Scientist III | SigTuple

Bengaluru, India Jun 2018 – May 2021
Healthcare AI startup building diagnostic products
  • Built retinal disease detection products from annotation and model development through clinical validation and CE certification. Published 2 papers at IEEE ISBI 2019
  • Trained a RetinaNet (with focal loss) object detector for localizing retinal structures. Outputs fed into the inference DAG, routing cropped regions to downstream Diabetic Retinopathy, AMD, and Glaucoma models
  • Technical lead for two diagnostic products (Fundus, Urine analysis). Defined and executed research and engineering roadmap across science, engineering, and clinical teams
  • Led ML platform team, architected multi-model inference DAG using TF Serving, Kubernetes, and Cloud Functions. Improved turnaround time by 60% and reduced costs by 40%

Publications

PrISM: precision for integrative structural models

Bioinformatics, Volume 38, Issue 15, August 2022
Varun Ullanat, Nikhil Kasukurthi, Shruthi Viswanath

Deep learning for weak supervision of diabetic retinopathy abnormalities

IEEE International Symposium on Biomedical Imaging (ISBI), July 2019
Maroof Ahmad, Nikhil Kasukurthi, Harshit Pande

Dynamic region proposal networks for semantic segmentation in automated glaucoma screening

IEEE International Symposium on Biomedical Imaging (ISBI), July 2019
Shivam Shah, Nikhil Kasukurthi, Harshit Pande

Technical Skills

LLM & Agentic AI

MCP, Agent Harnesses, SDKs, Skill Authoring, Tool Use, Multi-Agent Orchestration, Context Engineering, RAG, Google ADK, GRPO/RLHF

Evaluation & Observability

LLM-as-Judge, Golden Datasets, Rubric Scoring, Tau-Bench2, HELM, LangFuse, Prometheus, Grafana

ML Systems & Serving

PyTorch, vLLM, TensorRT, ONNX, DeepSpeed, RayServe, TorchServe, Diffusion Models

Profiling & Quantization

Nsight Systems, PyTorch Memory Profiler, CUDA Profiling, INT8, MXFP8, NVFP4, Tensor Parallel, ZeRO-2

Search & Retrieval

ElasticSearch, Vespa, FAISS, ColQwen, BM25, Contrastive Learning, Learning-to-Rank

Infrastructure

Kubernetes, Docker, Helm, SLURM, AWS, GCP, Kafka, Apache Beam, PySpark, ClickHouse, BigQuery

Languages

Python, Go, TypeScript, SQL, CUDA

Education

B. Tech - Computer Science and Engineering

VIT University, Vellore, India 2014 – 2018
CGPA: 8.39/10

Awards

Hackathon Winner, AWS GenAI Hackathon

August 2024
Built an appointment-booking agent using tool use

Impact Award, Eka.care

2023
For major organizational impact

Best Poster Award, NCBS Annual Talks

2021
For PrISM research presentation

Hackathon Runner Up, Practo Sandbox

December 2017
Skin cancer detection using a deep learning model

Interests

Scuba Diving (Open Water Certified), Rock Climbing, Trekking, Formula 1