Baseten

Infrastructure Ops Engineer

San Francisco (Remote) | $120k–$160k | Full-time | Mid-level | Added 2 days ago

About this role

Join Baseten as an Infrastructure Ops Engineer to manage GPU-powered Kubernetes clusters serving leading AI companies. You'll bridge technical customer needs with hardware operations, orchestrating maintenance cycles, capacity fulfillment, and fleet health while partnering with SRE and infrastructure teams.

What you'll do

  • Manage daily GPU fleet operations including node tainting, draining, and PVC repairs
  • Partner with Sales to scope and fulfill customer capacity requests with clear ETAs
  • Act as operational bridge between SRE and infrastructure teams during maintenance windows
  • Identify capacity lifecycle gaps and drive process improvements with better observability
  • Build automation and tooling to reduce manual workflows and incident resolution time
  • Maintain a GPU-specific knowledge base (H100/A100/B200) to accelerate future troubleshooting

What they're looking for

  • Kubernetes and container orchestration
  • Cloud infrastructure operations
  • Technical project management and coordination
  • GPU hardware knowledge (H100/A100/B200)
  • Scripting and automation
  • Clear technical communication with engineers and vendors
  • Attention to detail and ownership mindset
  • Debugging cluster-level issues

Benefits

  • Competitive compensation with meaningful equity
  • 100% medical, dental, and vision insurance coverage for employees and dependents
  • Flexible PTO with company-wide Winter Break closure
  • SF or NYC office locations with a collaborative team environment