Abridge

Machine Learning Infrastructure Engineer, Model Inference

SF Office (Remote) | $221k–$260k | Full-time | Mid-level | Added 2 days ago

About this role

Abridge seeks an ML Infrastructure Engineer to design and optimize the inference infrastructure powering our healthcare AI platform. You'll build scalable Kubernetes clusters, develop high-performance model serving systems, and collaborate with research and product teams to deploy and optimize AI models that transform clinical documentation.

What you'll do

  • Design, deploy, and maintain scalable Kubernetes clusters for AI model inference and training
  • Develop and optimize ML model serving infrastructure for high performance and low-latency delivery
  • Collaborate with ML and product teams to scale backend infrastructure and optimize compute efficiency
  • Optimize GPU utilization and compute-heavy workflows for ML workloads
  • Build robust model API orchestration systems
  • Define and implement infrastructure scaling strategies to support company growth

What they're looking for

  • Kubernetes administration and container orchestration
  • Distributed systems architecture and design
  • ML model deployment and production experience
  • API development and real-time/batch workload management
  • Model serving frameworks (NVIDIA Triton Inference Server, vLLM, TensorRT-LLM)
  • GPU cluster management and CUDA optimization
  • Infrastructure as code (Terraform, Ansible)
  • PyTorch or TensorFlow experience