openai
Reliability/DFX Engineer
San Franciscofulltimemid
About this role
OpenAI seeks an experienced Reliability/DFX Engineer to design and implement design-for-excellence features in next-generation AI accelerator chips. This cross-stack role bridges chip design, platform architecture, and hardware health teams to architect reliable systems from concept through high-volume production.
What you'll do
- Oversee DFX architecture, implementation, and deployment in silicon, proposing high-ROI features for reliability and fault tolerance
- Build system-level reliability models using empirical data to guide organizational DFX strategy
- Collaborate with chip and platform teams to implement DFX features including digital/mixed-signal IP and firmware
- Partner with hardware health teams to improve reliability in new product introduction and high-volume manufacturing phases
- Drive data-driven improvements through experimental design and analysis across the hardware stack
- Serve as DFX/reliability advocate to align industry ecosystem with OpenAI's requirements
What they're looking for
- RTL design and design-for-test (DFT)
- Reliability modeling and empirical data analysis
- ML chip and platform architecture understanding
- Physical implementation or silicon ATE experience
- System-level hardware design
- ML workload characterization
- Cross-functional collaboration
- Problem-solving at scale
Opens the official application on the employer’s site. No login required.