openai
Security Reliability Engineer
San Franciscofulltimemid
About this role
OpenAI is seeking a Security Reliability Engineer to design, build, and operate secure, reliable infrastructure for identity and access systems across the company. This role owns the full lifecycle of on-premises and hybrid infrastructure, replacing fragile one-off systems with standardized, production-grade infrastructure-as-code solutions that scale with the organization.
What you'll do
- Define and evolve infrastructure patterns for on-premises and hybrid environments using SRE principles
- Own full lifecycle of infrastructure systems including deployment, upgrades, patching, recovery, and operations
- Design and implement infrastructure-as-code using Terraform, Chef, and Ansible for shared services and identity platforms
- Build and operate monitoring, alerting, and incident response mechanisms to meet high availability targets
- Lead incident response and postmortems, driving durable fixes and organizational learning
- Identify and execute high-leverage automation opportunities to eliminate manual toil and reduce operational risk
What they're looking for
- Site Reliability Engineering (SRE) principles and practices
- Infrastructure-as-code tools (Terraform, Ansible, Chef)
- Kubernetes and Docker containerization
- Identity and access management (IAM) systems and Microsoft Entra
- Azure cloud platform and subscription management
- Git-based workflows and version control
- Incident response and postmortem leadership
- Systems design and architectural thinking
Opens the official application on the employer’s site. No login required.