Hippocratic AI

LLM Inference Engineer

Palo Alto · Full-time · Mid-level · Added today

About this role

Hippocratic AI, a healthcare-focused generative AI company, seeks an LLM Inference Engineer to optimize and scale its large language model serving infrastructure. You'll design distributed inference architectures, implement advanced optimization techniques, and ensure efficient deployment of safety-critical clinical AI systems.

What you'll do

  • Design and implement multi-node serving architectures for distributed LLM inference
  • Optimize multi-LoRA serving systems and apply quantization techniques (FP4/FP6)
  • Implement speculative decoding and latency optimization strategies
  • Develop disaggregated serving solutions with optimized caching for prefill and decoding
  • Benchmark and improve system performance across deployment scenarios and GPU types
  • Collaborate with healthcare and AI experts to ensure production-grade performance

What they're looking for

  • LLM inference optimization at scale
  • Distributed serving architectures for large language models
  • Quantization techniques for transformer models
  • Speculative decoding and draft model implementation
  • Python and C++ programming
  • CUDA programming and GPU optimization
  • Custom CUDA kernel development (nice-to-have)
  • Open-source LLM frameworks (vLLM, SGLang, TensorRT-LLM)

Benefits

  • Work on safety-focused healthcare AI at a well-funded startup ($404M total funding)
  • Collaborate with physicians, AI pioneers, and researchers from top institutions
  • Five-day-per-week in-office role in Palo Alto supporting strong team culture
  • Opportunity to shape category-defining healthcare AI technology
  • Backed by leading healthcare and AI investors (a16z, Kleiner Perkins, CapitalG)

Likely interview questions

  • Walk us through a specific LLM inference optimization project you've built: what techniques did you use, what were the performance bottlenecks, and how did you measure success?
  • Describe your hands-on experience with quantization techniques like FP4/FP6. How have you balanced model quality against footprint reduction, and what trade-offs did you encounter?