OpenAI

Software Engineer, System Enablement

San Francisco · Full-time · Mid-level

About this role

Join OpenAI's Scaling team to transform early-stage hardware into production-ready fleet capacity. You'll own the complete lifecycle from bare metal to schedulable Kubernetes nodes, managing imaging, provisioning, cluster integration, and operational readiness across lab and cloud environments.

What you'll do

Lead end-to-end hardware bring-up from bare metal to schedulable fleet capacity, including imaging and cluster bootstrap
Build and maintain golden images and provisioning workflows across lab and production environments
Integrate new nodes into fleet infrastructure and IaC pipelines (Terraform, Chef) across cloud and internal systems
Coordinate with scheduling teams to ensure new hardware is reachable, schedulable, and properly configured
Drive node registration, inventory correctness, and end-to-end visibility
Implement baseline health telemetry, pass/fail checks, and automated reporting for early ramp decisions

What they're looking for

Linux systems administration and infrastructure operations (5+ years)
Kubernetes cluster operations and node lifecycle management
Infrastructure-as-Code tools (Terraform, Chef, Ansible)
Provisioning and imaging (PXE/iPXE, cloud-init, golden images)
Networking fundamentals (L2/L3, routing, DNS, troubleshooting)
Python, Go, or Bash automation and tooling
Hardware bring-up and early silicon validation (preferred)
Multi-cloud operations (Azure, GCP, AWS, OCI)
