FluidStack

Production Engineer, IaaS

San Francisco, CA | $175k–$300k | Full-time | Mid-level | Added 2 days ago

About this role

Fluidstack is building AI compute infrastructure at massive scale and seeks a Production Engineer to own the observability platform, control plane, and fleet management systems that enable safe operation of tens of thousands of GPUs across distributed data centers.

What you'll do

  • Design and operate an observability platform for real-time fleet visibility, from the site level down to individual devices
  • Define the infrastructure API contracts and control plane that all teams use for machine management and command execution
  • Build data pipelines and health-check frameworks that make GPU fleets legible and queryable
  • Maintain fleet state as the authoritative source of truth across provisioning, operations, and customer platforms
  • Onboard new hardware generations and sites through zero-touch provisioning and infrastructure automation
  • Eliminate manual toil by building systems that scale without human intervention

What they're looking for

  • Large-scale infrastructure and distributed systems
  • Observability/telemetry platform design
  • API design and control plane architecture
  • Kubernetes or container orchestration
  • Systems programming and automation
  • Fleet management and hardware integration
  • Problem-solving from first principles
  • End-to-end ownership mindset