FluidStack
Software Engineer, Cloud Infrastructure
San Francisco, CA | $175k–$300k | Full-time | Mid-level | Added today
About this role
Fluidstack seeks a Software Engineer for Cloud Infrastructure to build observability platforms, control planes, and fleet management systems supporting tens of thousands of GPUs at hyperscale. You'll own critical infrastructure APIs and state management that enable the entire company to operate AI compute infrastructure reliably and at speed.
What you'll do
- Build and operate an observability platform with data pipelines, a correlation engine, and health checks for GPU fleet visibility
- Design and implement the infrastructure API surface that all internal teams depend on for fleet management
- Develop a Kubernetes-based production control plane with unified machine management and distributed command execution
- Maintain fleet state as the source of truth across provisioning, operations, and customer platforms
- Automate hardware onboarding (ZTP, DHCP, DNS, artifacts) for new GPU generations and data center sites
- Own end-to-end reliability and SLO compliance for infrastructure systems
What they're looking for
- Kubernetes and container orchestration
- API design and system architecture
- Observability platforms and data pipeline design
- Backend/distributed systems engineering
- Infrastructure automation and tooling
- Linux and networking fundamentals
- Relational or time-series databases
- Python or Go
Benefits
- Work on civilization-scale AI infrastructure
- Extreme ownership with full autonomy on end-to-end projects
- Fast-paced, high-impact environment focused on velocity
- Collaborative culture emphasizing first-principles thinking
Likely interview questions
- Describe a time you eliminated toil by automating a repetitive process. How did you approach it?
- Walk us through how you'd design an API that needs to support multiple internal teams managing heterogeneous hardware at scale.