Abridge

Machine Learning Infrastructure Engineer, Model Inference

SF Office (Remote) · $221k–$260k · Full-time · Mid-level · Added 2 days ago

About this role

Abridge seeks an ML Infrastructure Engineer to design and optimize the inference infrastructure powering our healthcare AI platform. You'll build scalable Kubernetes clusters, develop high-performance model serving systems, and collaborate with research and product teams to deploy and optimize AI models that transform clinical documentation.

What you'll do

Design, deploy, and maintain scalable Kubernetes clusters for AI model inference and training
Develop and optimize ML model serving infrastructure for high-throughput, low-latency delivery
Collaborate with ML and product teams to scale backend infrastructure and optimize compute efficiency
Optimize GPU utilization and compute-heavy workflows for ML workloads
Build robust model API orchestration systems
Define and implement infrastructure scaling strategies to support company growth

What they're looking for

Kubernetes administration and container orchestration
Distributed systems architecture and design
ML model deployment and production experience
API development and real-time/batch workload management
Model serving frameworks (NVIDIA Triton Inference Server, vLLM, TensorRT-LLM)
GPU cluster management and CUDA optimization
Infrastructure as code (Terraform, Ansible)
PyTorch or TensorFlow experience
