Skip to main content

openai

Security Reliability Engineer

San Franciscofulltimemid

About this role

OpenAI is seeking a Security Reliability Engineer to design, build, and operate secure, reliable infrastructure for identity and access systems across the company. This role owns the full lifecycle of on-premises and hybrid infrastructure, replacing fragile one-off systems with standardized, production-grade infrastructure-as-code solutions that scale with the organization.

What you'll do

  • Define and evolve infrastructure patterns for on-premises and hybrid environments using SRE principles
  • Own full lifecycle of infrastructure systems including deployment, upgrades, patching, recovery, and operations
  • Design and implement infrastructure-as-code using Terraform, Chef, and Ansible for shared services and identity platforms
  • Build and operate monitoring, alerting, and incident response mechanisms to meet high availability targets
  • Lead incident response and postmortems, driving durable fixes and organizational learning
  • Identify and execute high-leverage automation opportunities to eliminate manual toil and reduce operational risk

What they're looking for

  • Site Reliability Engineering (SRE) principles and practices
  • Infrastructure-as-code tools (Terraform, Ansible, Chef)
  • Kubernetes and Docker containerization
  • Identity and access management (IAM) systems and Microsoft Entra
  • Azure cloud platform and subscription management
  • Git-based workflows and version control
  • Incident response and postmortem leadership
  • Systems design and architectural thinking
Apply on the employer's site

Opens the official application on the employer’s site. No login required.