openai
Software Engineer, ML Systems & Training Architecture
San Franciscofulltimemid
About this role
OpenAI seeks a Senior Software Engineer to strengthen their robotics team's ML training infrastructure. You'll maintain and improve training frameworks, debug complex ML systems, and unblock researchers by ensuring code quality and infrastructure reliability while working 5 days/week in San Francisco.
What you'll do
- Review, improve, and refactor code across training frameworks and infrastructure
- Identify and prevent risky or low-quality changes through code review
- Debug issues spanning ML training systems, GPUs, clusters, and networking
- Unblock researchers and engineers from broken training jobs and workflow failures
- Enhance reliability, maintainability, and usability of the training framework
- Ship practical engineering solutions that directly improve team velocity
What they're looking for
- Strong software engineering fundamentals and code review expertise
- ML systems, training frameworks, or distributed systems experience
- GPU and infrastructure debugging
- Ability to read and debug unfamiliar codebases quickly
- High-velocity shipping with pragmatic judgment
- Experience with messy or fast-moving codebases
- Root cause analysis and problem-solving
Benefits
- Compensation: $295K-$380K USD
- Relocation assistance provided
- Work on cutting-edge robotics and AGI research
- 5 days/week in-office (San Francisco)
- Collaborative environment with researchers and engineers
Opens the official application on the employer’s site. No login required.