FluidStack
Production Engineer, IaaS
San Francisco, CA | $175k–$300k | Full-time | Mid-level
About this role
Fluidstack is building AI compute infrastructure at massive scale and seeks a Production Engineer to own the observability platform, control plane, and fleet management systems that enable safe operation of tens of thousands of GPUs across distributed data centers.
What you'll do
- Design and operate the observability platform that provides real-time fleet visibility, from whole sites down to individual devices
- Define the infrastructure API contracts and the control plane that every team uses for machine management and command execution
- Build data pipelines and health-check frameworks that make GPU fleets legible and queryable
- Maintain fleet state as the authoritative source of truth across provisioning, operations, and customer platforms
- Onboard new hardware generations and sites through zero-touch provisioning and infrastructure automation
- Eliminate manual toil by building systems that scale without requiring human intervention
What they're looking for
- Large-scale infrastructure and distributed systems
- Observability/telemetry platform design
- API design and control plane architecture
- Kubernetes or container orchestration
- Systems programming and automation
- Fleet management and hardware integration
- First-principles problem-solving
- End-to-end ownership mindset