Hippocratic AI
LLM Inference Engineer
Palo Alto · Full-time · Mid-level · Added today
About this role
Hippocratic AI, a healthcare-focused generative AI company, seeks an LLM Inference Engineer to optimize and scale its large language model serving infrastructure. You'll design distributed inference architectures, implement advanced optimization techniques, and ensure efficient deployment of safety-critical clinical AI systems.
What you'll do
- Design and implement multi-node serving architectures for distributed LLM inference
- Optimize multi-LoRA serving systems and apply quantization techniques (FP4/FP6)
- Implement speculative decoding and latency optimization strategies
- Develop disaggregated serving solutions with optimized caching for prefill and decoding
- Benchmark and improve system performance across deployment scenarios and GPU types
- Collaborate with healthcare and AI experts to ensure production-grade performance
What they're looking for
- LLM inference optimization at scale
- Distributed serving architectures for large language models
- Quantization techniques for transformer models
- Speculative decoding and draft model implementation
- Python and C++ programming
- CUDA programming and GPU optimization
- Custom CUDA kernel development (nice-to-have)
- Open-source LLM frameworks (vLLM, SGLang, TensorRT-LLM)
Benefits
- Work on safety-focused healthcare AI at a well-funded startup ($404M total funding)
- Collaborate with physicians, AI pioneers, and researchers from top institutions
- Five-day-per-week in-office role in Palo Alto supporting strong team culture
- Opportunity to shape category-defining healthcare AI technology
- Backed by leading healthcare and AI investors (a16z, Kleiner Perkins, CapitalG)
Likely interview questions
- Walk us through a specific LLM inference optimization project you've built—what techniques did you use, what were the performance bottlenecks, and how did you measure success?
- Describe your hands-on experience with quantization techniques like FP4/FP6. How have you balanced model quality against reductions in memory footprint, and what trade-offs did you encounter?