Periodic Labs

RL Systems Engineer

Menlo Park · $250k–$350k · Full-time · Mid-level · Added today

About this role

Periodic Labs seeks an RL Systems Engineer to build the infrastructure layer that couples fast model training and inference with reinforcement learning feedback loops for scientific discovery. You'll optimize GPU cluster scheduling, implement low-latency communication primitives, and collaborate closely with researchers to co-design systems and algorithms that maximize the speed of the RL loop.

What you'll do

  • Design rack-aware scheduling across Ray, Slurm, and Kubernetes for GPU clusters with heterogeneous topologies
  • Build profilers to identify and eliminate bottlenecks in training and inference stacks
  • Implement S3 checkpoint streaming and optimize I/O for large-scale training runs
  • Write and optimize CUDA kernels and communication primitives for maximum hardware throughput
  • Engineer zero-copy RDMA weight synchronization between training and inference systems
  • Collaborate with ML researchers on algorithm-infrastructure co-design and upstream open-source contributions

What they're looking for

  • GPU cluster orchestration (Ray, Slurm, Kubernetes)
  • Low-level systems programming (RDMA, NVLink, kernel optimization)
  • CUDA kernel development and optimization
  • Distributed ML infrastructure and profiling
  • Large-scale inference architecture and load balancing
  • Checkpoint management and cloud storage integration
  • Performance benchmarking and bottleneck analysis
  • Open-source ML framework contributions

Benefits

  • Competitive base salary of $250,000–$350,000, plus equity
  • Visa sponsorship available
  • Work at the intersection of research and infrastructure on frontier AI problems
  • Close collaboration with world-class ML researchers
  • Opportunity to influence open-source ML ecosystem roadmaps
  • Locations in Menlo Park and San Francisco