Skip to main content

openai

Software Engineer, ML Systems & Training Architecture

San Franciscofulltimemid

About this role

OpenAI seeks a Senior Software Engineer to strengthen their robotics team's ML training infrastructure. You'll maintain and improve training frameworks, debug complex ML systems, and unblock researchers by ensuring code quality and infrastructure reliability while working 5 days/week in San Francisco.

What you'll do

  • Review, improve, and refactor code across training frameworks and infrastructure
  • Identify and prevent risky or low-quality changes through code review
  • Debug issues spanning ML training systems, GPUs, clusters, and networking
  • Unblock researchers and engineers from broken training jobs and workflow failures
  • Enhance reliability, maintainability, and usability of the training framework
  • Ship practical engineering solutions that directly improve team velocity

What they're looking for

  • Strong software engineering fundamentals and code review expertise
  • ML systems, training frameworks, or distributed systems experience
  • GPU and infrastructure debugging
  • Ability to read and debug unfamiliar codebases quickly
  • High-velocity shipping with pragmatic judgment
  • Experience with messy or fast-moving codebases
  • Root cause analysis and problem-solving

Benefits

  • Compensation: $295K-$380K USD
  • Relocation assistance provided
  • Work on cutting-edge robotics and AGI research
  • 5 days/week in-office (San Francisco)
  • Collaborative environment with researchers and engineers
Apply on the employer's site

Opens the official application on the employer’s site. No login required.