OpenAI

Training: ML Framework Engineer

San Francisco · Full-time · Mid-level

About this role

Join OpenAI's Training Runtime team to optimize the distributed machine learning infrastructure that powers frontier-scale model training. You'll raise training throughput by profiling and optimizing the training framework and applying cutting-edge techniques, and you'll help researchers build next-generation models on massive GPU clusters.

What you'll do

Apply advanced techniques to achieve hardware efficiency in the internal training framework
Profile and optimize training framework performance
Collaborate with researchers to support next-generation model development
Design and implement state-of-the-art AI model optimizations
Write high-quality, bug-free machine learning code
Improve distributed system performance at scale

What they're looking for

Python proficiency
Machine learning systems optimization
Distributed systems knowledge
Performance profiling and debugging
Software engineering best practices
ML framework experience
Hardware efficiency optimization
Supercomputer performance understanding

Benefits

Hybrid work model (3 days/week in San Francisco office)
Relocation assistance available
Work on frontier-scale AI training systems
Collaborate with leading AI researchers
Impact large-scale GPU cluster performance
