OpenAI

Software Engineer, System Enablement

San Francisco, full-time, mid-level

About this role

Join OpenAI's Scaling team to transform early-stage hardware into production-ready fleet capacity. You'll own the complete lifecycle from bare metal to schedulable Kubernetes nodes, including imaging, provisioning, cluster integration, and operational readiness across lab and cloud environments.

What you'll do

  • Lead end-to-end hardware bring-up from bare metal to schedulable fleet capacity, including imaging and cluster bootstrap
  • Build and maintain golden images and provisioning workflows across lab and production environments
  • Integrate new nodes into fleet infrastructure and IaC pipelines (Terraform, Chef) across cloud and internal systems
  • Coordinate with scheduling teams to ensure new hardware is reachable, schedulable, and properly configured
  • Drive node registration, inventory correctness, and end-to-end visibility
  • Implement baseline health telemetry, pass/fail checks, and automated reporting for early ramp decisions

What they're looking for

  • Linux systems administration and infrastructure operations (5+ years)
  • Kubernetes cluster operations and node lifecycle management
  • Infrastructure-as-Code tools (Terraform, Chef, Ansible)
  • Provisioning and imaging (PXE/iPXE, cloud-init, golden images)
  • Networking fundamentals (L2/L3, routing, DNS, troubleshooting)
  • Python, Go, or Bash automation and tooling
  • Hardware bring-up and early silicon validation (preferred)
  • Multi-cloud operations (Azure, GCP, AWS, OCI)