Nikhil Kasukurthi
ML Engineer
I'm an ML engineer with 8 years building AI products end-to-end — from sitting in clinics to understand user constraints, through training and deployment, to owning the product metrics. I've built serving infrastructure for multimodal LLMs via vLLM on Kubernetes, designed retrieval pipelines from scratch with Vespa, and profiled distributed training on H100s. Currently porting Cosmos-Predict2.5 to vLLM. Published researcher with cross-functional leadership experience.
Work Experience
Independent ML Research
Göttingen, Germany Nov 2025 – Present
- Actively porting the Cosmos-Predict2.5 diffusion-based world foundation model to vLLM Omni. Initial results show 12% latency and 8% memory improvements over diffusers
- Profiled CUDA kernels using Nsight Systems to measure NCCL communication overhead in ZeRO-2 on H100 SXM vs PCIe. Published technical deep-dive identifying 2x cost-efficiency gap due to NUMA topology
- Trained Nanochat with NVIDIA Transformer Engine at MXFP8 precision, improving training throughput 5% over baseline FP8. Benchmarked NVFP4
- Deployed an Arabic-speaking voice agent on ElevenLabs for clinical appointment booking. Agent triages patients and routes to specializations via a remote MCP server. Currently a POC serving ~100 users/day
- Built Voice Activity Detection (VAD) at the RTP packet level for a telephony product. Deployed ONNX models for optimized edge inference, cutting costs relative to third-party ASR providers
- Designed and launched clarifyit.ai, an interactive prompt refinement tool. Iterative LLM-driven questioning sharpens user prompts before generation
Lead Data Scientist | Eka.care
Bengaluru, India Jan 2022 – Oct 2025
Healthcare company building AI-powered tools serving 1M+ DAU
Speech & Language Models
- Deployed a custom Speech LLM (Whisper + Gemma 2) for medical transcription via vLLM plugins — 10x throughput, 5x latency improvement on L4 GPUs over torch-compiled variant. Cut STT costs by 60%. Within one week, 60% of doctors adopted it for 95% of consultations
- Noticed clinicians juggling Google Sheets for annotation — duplicates, mis-tagging, annotator fatigue. Led a cross-functional team to build an annotation platform with RBAC and multi-modality support. Throughput jumped from 10 to 25 hours/day. Gathered 250 hours of medical speech data and 1,200 protocol documents
Search & Retrieval
- Built LLM-augmented retrieval pipeline using ColQwen-2.5 on protocol PDFs indexed in Vespa with hybrid search (BM25 + dense retrieval). Recall@Top3 +24% over text-only embeddings. Chose late-interaction architecture over chunking for scalability
- Diagnosed poor medication search through query log analysis (500K docs, 1M daily requests). Discovered doctors used shorthands the system couldn't resolve. Built a query decomposition layer on Elasticsearch; nDCG@10 +55%, relevance +160%
- Built contrastive learning semantic retrieval models and BERT-based NER/entity linking to map unstructured clinical text to medical ontologies; diagnosis coding +30%, medication coding +80%. Ran as a batch service averaging 200K queries coded monthly
Products & Agentic AI
- Discovered through product interviews that doctors abandoned our AI assistant for external drug references. Designed MedAssist, an LLM client with remote MCP server support, unifying drug references, protocols, and booking. Adopted by Apollo Hospitals
- Early adopter of MCP — open-sourced MCP server for Indian medical context (drug databases, protocols, appointment booking)
- Deployed the Donut model in-house for OCR-free medical document understanding. Cut costs relative to AWS Textract
ML Infrastructure
- Served multimodal LLMs via vLLM on Kubernetes and torch-compiled models on Ray Serve. Applied TensorRT for latency-critical paths. Cut inference cost by 50% vs SageMaker
- Built PySpark pipelines for feature creation. Used Apache Beam to unify scattered records across DBs into patient profiles
LLM Evaluations
- Open-sourced KARMA-OpenMedEvalKit, an evaluation library for LLMs in healthcare. Gates Foundation adopted KARMA for healthcare bot evaluation
- Evaluated agent harnesses against CrewAI on Tau-Bench2 with clinical scenarios, and ran overall benchmarks on HELM and MedHELM
Leadership
- Managed a cross-functional team of 3 (1 engineer, 2 data scientists). Promoted 1 direct report
- Owned MedAssist and Patient Profiling products end-to-end
- Ran product interviews and query log analysis to find gaps between user behavior and system metrics. Findings shaped the research roadmap
- AWS featured Eka.care as a reference customer for healthcare AI
Data Scientist | Udaan
Bengaluru, India May 2021 – Dec 2021
India's largest B2B e-commerce marketplace
- Learning-to-rank models (gradient-boosted trees) lifted search-to-cart conversion by 10%. A/B tested across business verticals
- Built 3D point cloud processing pipeline (DGCNN) for LiDAR-based volume estimation of warehouse shipments. Built the data science stack from scratch: collection, annotation, training, and deployment. Achieved 40% cost savings and 50% latency reduction
Visiting Researcher | National Centre for Biological Sciences (NCBS) – TIFR
Bengaluru, India May 2020 – Mar 2021
Concurrent with role at SigTuple
- Developed PrISM (Precision for Integrative Structural Models) using Variational Autoencoders, a novel unsupervised technique to score integrative models
- Won Best Poster Award at NCBS Annual Talks 2021
- Published in Bioinformatics (Vol. 38, Issue 15, August 2022)
Data Scientist III | SigTuple
Bengaluru, India Jun 2018 – May 2021
Healthcare AI startup building diagnostic products
- Built retinal disease detection products end-to-end, from annotation and model development through clinical validation and CE certification. Published 2 papers at IEEE ISBI 2019
- Trained a RetinaNet (with focal loss) object detector for localizing retinal structures. Outputs fed into the inference DAG, routing cropped regions to downstream Diabetic Retinopathy, AMD, and Glaucoma models
- Technical lead for two diagnostic products (Fundus, Urine analysis). Defined and executed research and engineering roadmap across science, engineering, and clinical teams
- Led the ML platform team and architected a multi-model inference DAG using TF Serving, Kubernetes, and Cloud Functions. Improved turnaround time by 60% and reduced costs by 40%
Publications
PrISM: precision for integrative structural models
Bioinformatics, Volume 38, Issue 15 August 2022
Deep learning for weak supervision of diabetic retinopathy abnormalities
IEEE International Symposium on Biomedical Imaging (ISBI) July 2019
Dynamic region proposal networks for semantic segmentation in automated glaucoma screening
IEEE International Symposium on Biomedical Imaging (ISBI) July 2019
Technical Skills
ML Systems
Profiling & Quantization
LLM & Agentic AI
Search & Retrieval
Evaluation & Metrics
Infrastructure
Languages
Education
B. Tech - Computer Science and Engineering
VIT University, Vellore, India 2014 – 2018
CGPA: 8.39/10
Awards
Hackathon Winner — AWS GenAI Hackathon
August 2024
Built appointment booking agent through tool use
Impact Award — Eka.care
2023
For major organizational impact
Best Poster Award — NCBS Annual Talks
2021
For PrISM research presentation
Hackathon Runner Up — Practo Sandbox
December 2017
Skin cancer detection through a deep learning model