agi-inc

ML Platform & Infrastructure Engineer

San Francisco Office | Full-time | Mid-level | Added 2 days ago

About this role

Join an AGI-focused startup to build ML infrastructure that powers everyday AI agents. You'll design training automation, evaluation systems, and research tools that enable rapid experimentation and reliable model deployment at scale.

What you'll do

Design and implement CI/CD pipelines for ML training workflows with robust data ingestion and artifact management
Build scalable evaluation harnesses that benchmark models automatically and catch performance regressions
Develop internal SDKs, CLIs, and dashboards enabling researchers to inspect model behavior and iterate efficiently
Implement comprehensive observability for model latency, throughput, GPU utilization, and inference costs
Optimize distributed workloads and GPU cluster performance for experimentation speed
Create alerting systems providing real-time visibility into system reliability and performance

What they're looking for

Python programming
ML infrastructure and MLOps
CI/CD pipeline design for ML workflows
Cloud infrastructure (AWS or GCP)
Docker and Kubernetes containerization
LLM serving stacks (vLLM, TGI preferred)
Internal developer tools and dashboard development
Distributed systems and GPU cluster management

Benefits

Competitive company-sponsored medical, dental, and vision insurance
Top-tier relocation and immigration support
In-person collaboration in San Francisco with a fast-moving team
Direct impact on AGI research and consumer AI products
Work with elite founders and researchers from Stanford, OpenAI, and DeepMind
Opportunity to define systems shaping research velocity and product quality
