
Foxglove

ML Platform Engineer

San Francisco, CA · $183k–$310k · Full-time · Mid-level · Added 2 days ago

About this role

Foxglove seeks an ML Platform Engineer to design and operate production infrastructure for a robotics data platform handling petabyte-scale multimodal data. You'll own inference serving, embedding pipelines, training infrastructure, and cloud systems that enable rapid ML iteration across distributed robotics deployments.

What you'll do

  • Design and operate production inference infrastructure including model serving, autoscaling, and cost optimization
  • Own embedding and retrieval pipeline architecture for multimodal robotics data (images, video, point clouds, timeseries)
  • Build training and evaluation infrastructure with job orchestration, experiment tracking, and dataset versioning
  • Make cloud infrastructure decisions (AWS/GCP) balancing latency, throughput, reliability, and cost
  • Create platform abstractions that let product engineers ship ML features without managing infrastructure
  • Evaluate and integrate third-party ML infrastructure components with clear build vs. buy frameworks

What they're looking for

  • Production ML infrastructure and model serving (vLLM, Triton, TorchServe)
  • Distributed systems and cloud infrastructure (AWS/GCP)
  • Vector databases and retrieval systems at scale
  • Model optimization and cloud cost management
  • ML orchestration and experiment tracking
  • System reliability and operational design
  • Communication across ML and engineering teams
  • Fine-tuning and domain adaptation for LLMs (bonus)

Benefits

  • $300 monthly commuter/workspace budget
  • Competitive equity in a Series B company
  • 100% medical, dental, vision, and life insurance for employees; 75% for dependents
  • 401(k) matching up to 4%
  • 4 weeks vacation plus holidays and winter break
  • All-expenses-paid company off-sites twice yearly