agi-inc
ML Platform & Infrastructure Engineer
San Francisco Office · Full-time · Mid-level · Added 2 days ago
About this role
Join an AGI-focused startup to build ML infrastructure that powers everyday AI agents. You'll design training automation, evaluation systems, and research tools that enable rapid experimentation and reliable model deployment at scale.
What you'll do
- Design and implement CI/CD pipelines for ML training workflows with robust data ingestion and artifact management
- Build scalable evaluation harnesses that benchmark models automatically and catch performance regressions
- Develop internal SDKs, CLIs, and dashboards that let researchers inspect model behavior and iterate efficiently
- Implement comprehensive observability for model latency, throughput, GPU utilization, and inference costs
- Optimize distributed workloads and GPU cluster performance to speed up experimentation
- Build alerting systems that provide real-time visibility into system reliability and performance
What they're looking for
- Python programming
- ML infrastructure and MLOps
- CI/CD pipeline design for ML workflows
- Cloud infrastructure (AWS or GCP)
- Docker and Kubernetes containerization
- LLM serving stacks (vLLM or TGI preferred)
- Internal developer tools and dashboard development
- Distributed systems and GPU cluster management
Benefits
- Competitive company-sponsored medical, dental, and vision insurance
- Top-tier relocation and immigration support
- In-person collaboration in San Francisco with a fast-moving team
- Direct impact on AGI research and consumer AI products
- Work with elite founders and researchers from Stanford, OpenAI, and DeepMind
- Opportunity to define systems shaping research velocity and product quality