OpenAI

Training: ML Framework Engineer

San Francisco, full-time, mid-level

About this role

Join OpenAI's Training Runtime team to optimize distributed machine learning infrastructure that powers frontier-scale model training. You'll enhance training throughput by profiling and optimizing frameworks, applying cutting-edge techniques, and enabling researchers to build next-generation models on massive GPU clusters.

What you'll do

  • Apply advanced techniques to achieve hardware efficiency in the internal training framework
  • Profile and optimize training framework performance
  • Collaborate with researchers to support next-generation model development
  • Design and implement state-of-the-art AI model optimizations
  • Write high-quality, bug-free machine learning code
  • Improve distributed system performance at scale

What they're looking for

  • Python proficiency
  • Machine learning systems optimization
  • Distributed systems knowledge
  • Performance profiling and debugging
  • Software engineering best practices
  • ML framework experience
  • Hardware efficiency optimization
  • Supercomputer performance understanding

Benefits

  • Hybrid work model (3 days/week in San Francisco office)
  • Relocation assistance available
  • Work on frontier-scale AI training systems
  • Collaborate with leading AI researchers
  • Impact large-scale GPU cluster performance