agi-inc

ML Platform & Infrastructure Engineer

San Francisco Office | Full-time | Mid-level | Added 2 days ago

About this role

Join an AGI-focused startup to build ML infrastructure that powers everyday AI agents. You'll design training automation, evaluation systems, and research tools that enable rapid experimentation and reliable model deployment at scale.

What you'll do

  • Design and implement CI/CD pipelines for ML training workflows with robust data ingestion and artifact management
  • Build scalable evaluation harnesses that benchmark models automatically and catch performance regressions
  • Develop internal SDKs, CLIs, and dashboards enabling researchers to inspect model behavior and iterate efficiently
  • Implement comprehensive observability for model latency, throughput, GPU utilization, and inference costs
  • Optimize distributed workloads and GPU cluster performance for experimentation speed
  • Create alerting systems providing real-time visibility into system reliability and performance

What they're looking for

  • Python programming
  • ML infrastructure and MLOps
  • CI/CD pipeline design for ML workflows
  • Cloud infrastructure (AWS or GCP)
  • Docker and Kubernetes containerization
  • LLM serving stacks (vLLM, TGI preferred)
  • Internal developer tools and dashboard development
  • Distributed systems and GPU cluster management

Benefits

  • Competitive company-sponsored medical, dental, and vision insurance
  • Top-tier relocation and immigration support
  • In-person collaboration in San Francisco with a fast-moving team
  • Direct impact on AGI research and consumer AI products
  • Work with elite founders and researchers from Stanford, OpenAI, and DeepMind
  • Opportunity to define systems shaping research velocity and product quality