Which H100 instance to train Nanochat on
Benchmarking H100 PCIe vs SXM vs NVL on training cost, step times, and NCCL profiling to find the cheapest GPU configuration for Nanochat