openai
Software Engineer, Frontier Systems - Power Management
San Franciscofulltimemid
About this role
Join OpenAI's Frontier Systems team to architect and optimize power management software for the world's largest AI training supercomputers. You'll develop system-level solutions to monitor, stabilize, and efficiently manage power consumption across hyperscale infrastructure while collaborating with hardware and research teams.
What you'll do
- Design and implement software solutions to optimize power usage and efficiency in large-scale supercomputer systems
- Build automation to monitor power patterns during training workloads and develop algorithms to prevent grid instability
- Create tools for real-time power fault detection and automated remediation across hardware and systems
- Develop dynamic power throttling mechanisms that adjust consumption based on workload and infrastructure constraints
- Collaborate with hardware teams to integrate power control requirements into IT hardware design
- Perform deep system-level investigations into power-related issues and translate findings into automated solutions
What they're looking for
- Python programming and automation scripting (shell, etc.)
- Distributed systems and streaming data analysis
- Electrical engineering concepts (signal processing, power systems, FFT)
- System-level investigation and fault detection/remediation
- Data analysis tools (SQL, PromQL, Pandas)
- Cross-functional collaboration between hardware and software teams
- Control systems fundamentals
- Supercomputing or large-scale ML workload experience
Opens the official application on the employer’s site. No login required.