Foxglove
ML Platform Engineer
San Francisco, CA · $183k–$310k · Full-time · Mid-level · Added 2 days ago
About this role
Foxglove seeks an ML Platform Engineer to design and operate production infrastructure for a robotics data platform handling petabyte-scale multimodal data. You'll own inference serving, embedding pipelines, training infrastructure, and cloud systems that enable rapid ML iteration across distributed robotics deployments.
What you'll do
- Design and operate production inference infrastructure including model serving, autoscaling, and cost optimization
- Own embedding and retrieval pipeline architecture for multimodal robotics data (images, video, point clouds, timeseries)
- Build training and evaluation infrastructure with job orchestration, experiment tracking, and dataset versioning
- Make cloud infrastructure decisions (AWS/GCP) balancing latency, throughput, reliability, and cost
- Create platform abstractions that let product engineers ship ML features without managing infrastructure
- Evaluate and integrate third-party ML infrastructure components using clear build-vs.-buy frameworks
What they're looking for
- Production ML infrastructure and model serving (vLLM, Triton, TorchServe)
- Distributed systems and cloud infrastructure (AWS/GCP)
- Vector databases and retrieval systems at scale
- Model optimization and cloud cost management
- ML orchestration and experiment tracking
- System reliability and operational design
- Communication across ML and engineering teams
- Fine-tuning and domain adaptation for LLMs (bonus)
Benefits
- $300 monthly commuter/workspace budget
- Competitive equity in a Series B company
- 100% medical, dental, vision, and life insurance for employees; 75% for dependents
- 401(k) matching up to 4%
- 4 weeks vacation plus holidays and winter break
- All-expenses-paid company off-sites twice yearly