Resume – Nikhil Kasukurthi | bluenotebook.io

ML Engineer

I'm an ML engineer with 8 years of experience building AI products end-to-end: from sitting in clinics to understand user constraints, through training and deployment, to owning product metrics. I've built serving infrastructure for multimodal LLMs via vLLM on Kubernetes, designed retrieval pipelines from scratch with Vespa, and profiled distributed training on H100s. Currently porting Cosmos-Predict2.5 to vLLM Omni. Published researcher with cross-functional leadership experience.

Work Experience

Independent ML Research

Göttingen, Germany | Nov 2025 – Present
  • Porting the Cosmos-Predict2.5 diffusion-based world foundation model to vLLM Omni; initial results show 12% lower latency and 8% lower memory usage than diffusers
  • Profiled CUDA kernels with Nsight Systems to measure NCCL communication overhead in ZeRO-2 training on H100 SXM vs. PCIe. Published a technical deep-dive identifying a 2x cost-efficiency gap attributable to NUMA topology
  • Trained Nanochat with NVIDIA Transformer Engine at MXFP8 precision, improving training throughput by 5% over the FP8 baseline. Also benchmarked NVFP4
  • Deployed an Arabic-speaking voice agent on ElevenLabs for clinical appointment booking. The agent triages patients and routes them to specialties via a remote MCP server. Currently a proof of concept serving ~100 users/day
  • Built Voice Activity Detection (VAD) at the RTP packet level for a telephony product. Deployed ONNX models for optimized edge inference, cutting costs relative to third-party ASR providers
  • Designed and launched clarifyit.ai, an interactive prompt refinement tool. Iterative LLM-driven questioning sharpens user prompts before generation

Lead Data Scientist | Eka.care

Bengaluru, India | Jan 2022 – Oct 2025
Healthcare company building AI-powered tools serving 1M+ DAU

Speech & Language Models

  • Deployed a custom Speech LLM (Whisper + Gemma 2) for medical transcription via vLLM plugins, achieving 10x higher throughput and 5x lower latency on L4 GPUs than the torch-compiled variant. Cut STT costs by 60%. Within one week, 60% of doctors adopted it for 95% of their consultations
  • Noticed clinicians juggling Google Sheets for annotation, which led to duplicates, mis-tagging, and annotator fatigue. Led a cross-functional team to build an annotation platform with RBAC and multimodal support. Annotation throughput rose from 10 to 25 hours/day. Collected 250 hours of medical speech data and 1,200 protocol documents

Search & Retrieval

  • Built an LLM-augmented retrieval pipeline using ColQwen-2.5 on protocol PDFs indexed in Vespa with hybrid search (BM25 + dense retrieval). Recall@3 improved 24% over text-only embeddings. Chose a late-interaction architecture over chunking for scalability
  • Diagnosed poor medication search through query log analysis (500K docs, 1M daily requests) and discovered that doctors used shorthand the system couldn't resolve. Built a query decomposition layer on Elasticsearch, improving nDCG@10 by 55% and relevance by 160%
  • Built contrastive-learning semantic retrieval models and BERT-based NER/entity linking to map unstructured clinical text to medical ontologies, improving diagnosis coding by 30% and medication coding by 80%. Ran as a batch service coding an average of 200K queries monthly

Products & Agentic AI

  • Discovered through product interviews that doctors were abandoning our AI assistant for external drug references. Designed MedAssist, an LLM client with remote MCP server support that unifies drug references, protocols, and booking. Adopted by Apollo Hospitals
  • Early adopter of MCP: open-sourced an MCP server for Indian medical context (drug databases, protocols, appointment booking)
  • Deployed the Donut model in-house for OCR-free medical document understanding, cutting costs relative to AWS Textract

ML Infrastructure

  • Served multimodal LLMs via vLLM on Kubernetes and torch-compiled models on Ray Serve. Applied TensorRT to latency-critical paths. Cut inference cost by 50% vs. SageMaker
  • Built PySpark pipelines for feature engineering. Used Apache Beam to unify records scattered across databases into patient profiles

LLM Evaluations

  • Open-sourced KARMA-OpenMedEvalKit, an evaluation library for LLMs in healthcare. The Gates Foundation adopted KARMA for healthcare bot evaluation
  • Evaluated agent harnesses against CrewAI on Tau-Bench2 using clinical scenarios, and ran broader benchmarks on HELM and MedHELM

Leadership

  • Managed a cross-functional team of 3 (1 engineer, 2 data scientists). Promoted 1 direct report
  • Owned MedAssist and Patient Profiling products end-to-end
  • Ran product interviews and query log analysis to find gaps between user behavior and system metrics. Findings shaped the research roadmap
  • AWS featured Eka.care as a reference customer for healthcare AI

Data Scientist | Udaan

Bengaluru, India | May 2021 – Dec 2021
India's largest B2B e-commerce marketplace
  • Built learning-to-rank models (gradient-boosted trees) that lifted search-to-cart conversion by 10%, A/B tested across business verticals
  • Built a 3D point cloud processing pipeline (DGCNN) for LiDAR-based volume estimation of warehouse shipments. Built the data science stack from scratch: collection, annotation, training, and deployment. Achieved 40% cost savings and a 50% latency reduction

Visiting Researcher | National Centre for Biological Sciences (NCBS) – TIFR

Bengaluru, India | May 2020 – Mar 2021
Concurrent with role at SigTuple
  • Developed PrISM (Precision for Integrative Structural Models), a novel unsupervised technique that uses variational autoencoders to score integrative models
  • Won Best Poster Award at NCBS Annual Talks 2021
  • Published in Bioinformatics (Vol. 38, Issue 15, August 2022)

Data Scientist III | SigTuple

Bengaluru, India | Jun 2018 – May 2021
Healthcare AI startup building diagnostic products
  • Built retinal disease detection products end-to-end, from annotation and model development through clinical validation and CE certification. Published 2 papers at IEEE ISBI 2019
  • Trained a RetinaNet (focal loss) object detector to localize retinal structures. Its outputs fed the inference DAG, which routed cropped regions to downstream diabetic retinopathy, AMD, and glaucoma models
  • Technical lead for two diagnostic products (Fundus, Urine analysis). Defined and executed the research and engineering roadmap across science, engineering, and clinical teams
  • Led the ML platform team and architected a multi-model inference DAG using TF Serving, Kubernetes, and Cloud Functions, improving turnaround time by 60% and reducing costs by 40%

Publications

PrISM: precision for integrative structural models

Bioinformatics, Volume 38, Issue 15, August 2022
Varun Ullanat, Nikhil Kasukurthi, Shruthi Viswanath

Deep learning for weak supervision of diabetic retinopathy abnormalities

IEEE International Symposium on Biomedical Imaging (ISBI), July 2019
Maroof Ahmad, Nikhil Kasukurthi, Harshit Pande

Dynamic region proposal networks for semantic segmentation in automated glaucoma screening

IEEE International Symposium on Biomedical Imaging (ISBI), July 2019
Shivam Shah, Nikhil Kasukurthi, Harshit Pande

Technical Skills

ML Systems

PyTorch, vLLM, TensorRT, ONNX, DeepSpeed, Ray Serve, TorchServe, TensorFlow, Diffusion Models

Profiling & Quantization

Nsight Systems, PyTorch Memory Profiler, CUDA Profiling, INT8, MXFP8, NVFP4, Tensor Parallel, ZeRO-2

LLM & Agentic AI

MCP, RAG, Whisper, Gemma, BERT, NER/NEL, RLHF, Transformers

Search & Retrieval

Elasticsearch, Vespa, FAISS, ColQwen, BM25, Contrastive Learning, Learning-to-Rank

Evaluation & Metrics

HELM, Tau-Bench2, LLM-as-Judge, A/B Testing

Infrastructure

Kubernetes, Docker, SLURM, Helm, AWS, GCP, Apache Beam, PySpark, BigQuery

Languages

Python, Go, CUDA

Education

B. Tech - Computer Science and Engineering

VIT University, Vellore, India | 2014 – 2018
CGPA: 8.39/10

Awards

Hackathon Winner — AWS GenAI Hackathon

August 2024
Built an appointment booking agent with tool use

Impact Award — Eka.care

2023
For major organizational impact

Best Poster Award — NCBS Annual Talks

2021
For PrISM research presentation

Hackathon Runner Up — Practo Sandbox

December 2017
Skin cancer detection using a deep learning model

Interests

Scuba Diving (Open Water Certified), Rock Climbing, Trekking, Formula 1