Bespoke Labs
Backend Engineer
Mountain View (Remote), full-time, mid-level, added 2 days ago
About this role
Bespoke Labs seeks an Infrastructure Engineer to design and own the execution layer for RL training environments, enabling agents to operate coherently across multi-tool systems for extended periods. You'll build production systems for sandboxing, state management, performance optimization, and observability to support long-horizon agent training at scale.
What you'll do
- Design sandboxing and execution infrastructure with snapshot/restore capabilities for environment state management across long agent rollouts
- Develop failure detection and recovery systems to identify issues early and revert environments to known-good states
- Optimize platform performance (throughput, latency, and cost per rollout) across thousands of concurrent long-running environments
- Build frameworks for specifying, packaging, and deploying RL environments used internally and by external customers
- Create debugging and observability tooling for researchers to analyze failures across hundreds of agent traces
- Scale prototypes into production systems with high engineering standards and comprehensive documentation
What they're looking for
- Distributed systems and execution engine design
- Container and sandboxing infrastructure (namespaces, cgroups, VMs, gVisor, Firecracker)
- Profiling, scheduling, and resource utilization optimization
- Cloud platforms (GCP, AWS)
- Python programming
- Systems languages (Rust, Go, or C++)
- Filesystem and process state management
- Testing, validation, and reliability engineering