FluidStack

Distributed Systems Engineer

San Francisco, CA · $175k–$300k · Full-time · Mid-level · Added today

About this role

Fluidstack seeks a Distributed Systems Engineer to build observability and control plane infrastructure for massive-scale GPU compute. You'll own the platform that makes tens of thousands of GPUs manageable and visible in real time, designing APIs and systems that every internal team depends on.

What you'll do

  • Build and operate the observability platform—data pipelines, decoration engines, and healthchecks that provide fleet visibility from site to device level
  • Design and implement the control plane API surface for unified machine management, state inspection, and distributed command execution across the fleet
  • Establish and maintain fleet state as the source of truth for SLOs, site lifecycle, and integration with internal and customer-facing platforms
  • Implement zero-touch provisioning (ZTP), DHCP, DNS, and artifact management to cleanly onboard new hardware generations and sites into production
  • Eliminate manual operational toil by automating repetitive infrastructure tasks and building self-service tooling
  • Own the Kubernetes-based infrastructure underpinning production control and ensure system state always matches reality

What they're looking for

  • Distributed systems design and operation at scale
  • Observability platforms and telemetry pipelines
  • API design and backward compatibility
  • Kubernetes and container orchestration
  • Infrastructure automation and tooling
  • Linux and networking fundamentals
  • Scripting and systems programming
  • Fleet management and provisioning systems

Benefits

  • Work on civilization-scale AI compute infrastructure
  • Extreme ownership and autonomy over end-to-end systems
  • High-velocity, first-principles problem-solving culture
  • San Francisco location

Likely interview questions

  • Tell us about a time you eliminated operational toil by building automation—what made you identify it as a problem, and how did you measure the impact?
  • Describe an API design you've worked on that needed to scale or evolve significantly. What did you get wrong the first time, and how would you approach it differently now?