Abridge
Machine Learning Infrastructure Engineer, Model Inference
SF Office (Remote) | $221k–$260k | Full-time | Mid-level | Added 2 days ago
About this role
Abridge seeks an ML Infrastructure Engineer to design and optimize the inference infrastructure powering our healthcare AI platform. You'll build scalable Kubernetes clusters, develop high-performance model serving systems, and collaborate with research and product teams to deploy and optimize AI models that transform clinical documentation.
What you'll do
- Design, deploy, and maintain scalable Kubernetes clusters for AI model inference and training
- Develop and optimize ML model serving infrastructure for high-performance, low-latency delivery
- Collaborate with ML and product teams to scale backend infrastructure and optimize compute efficiency
- Optimize GPU utilization and compute-heavy workflows for ML workloads
- Build robust model API orchestration systems
- Define and implement infrastructure scaling strategies to support company growth
What they're looking for
- Kubernetes administration and container orchestration
- Distributed systems architecture and design
- ML model deployment and production experience
- API development and real-time/batch workload management
- Model serving frameworks (NVIDIA Triton Inference Server, vLLM, TensorRT-LLM)
- GPU cluster management and CUDA optimization
- Infrastructure as code (Terraform, Ansible)
- PyTorch or TensorFlow experience