Bespoke Labs

Backend Engineer

Mountain View (Remote) | Full-time | Mid-level | Added 2 days ago

About this role

Bespoke Labs seeks an Infrastructure Engineer to design and own the execution layer for RL training environments, enabling agents to operate coherently across multi-tool systems for extended periods. You'll build production systems for sandboxing, state management, performance optimization, and observability to support long-horizon agent training at scale.

What you'll do

  • Design sandboxing and execution infrastructure with snapshot/restore capabilities for environment state management across long agent rollouts
  • Develop failure detection and recovery systems to identify issues early and revert environments to known-good states
  • Optimize platform performance, including throughput, latency, and cost per rollout, across thousands of concurrent long-running environments
  • Build frameworks for specifying, packaging, and deploying RL environments used internally and by external customers
  • Create debugging and observability tooling for researchers to analyze failures across hundreds of agent traces
  • Scale prototypes into production systems with high engineering standards and comprehensive documentation

What they're looking for

  • Distributed systems and execution engine design
  • Container and sandboxing infrastructure (namespaces, cgroups, VMs, gVisor, Firecracker)
  • Profiling, scheduling, and resource utilization optimization
  • Cloud platforms (GCP, AWS)
  • Python programming
  • Systems languages (Rust, Go, or C++)
  • Filesystem and process state management
  • Testing, validation, and reliability engineering