Skip to main content

openai

Software Engineer, Frontier Systems - Power Management

San Franciscofulltimemid

About this role

Join OpenAI's Frontier Systems team to architect and optimize power management software for the world's largest AI training supercomputers. You'll develop system-level solutions to monitor, stabilize, and efficiently manage power consumption across hyperscale infrastructure while collaborating with hardware and research teams.

What you'll do

  • Design and implement software solutions to optimize power usage and efficiency in large-scale supercomputer systems
  • Build automation to monitor power patterns during training workloads and develop algorithms to prevent grid instability
  • Create tools for real-time power fault detection and automated remediation across hardware and systems
  • Develop dynamic power throttling mechanisms that adjust consumption based on workload and infrastructure constraints
  • Collaborate with hardware teams to integrate power control requirements into IT hardware design
  • Perform deep system-level investigations into power-related issues and translate findings into automated solutions

What they're looking for

  • Python programming and automation scripting (shell, etc.)
  • Distributed systems and streaming data analysis
  • Electrical engineering concepts (signal processing, power systems, FFT)
  • System-level investigation and fault detection/remediation
  • Data analysis tools (SQL, PromQL, Pandas)
  • Cross-functional collaboration between hardware and software teams
  • Control systems fundamentals
  • Supercomputing or large-scale ML workload experience
Apply on the employer's site

Opens the official application on the employer’s site. No login required.