Periodic Labs

RL Systems Engineer

Menlo Park | $250k–$350k | Full-time | Mid-level

About this role

Periodic Labs seeks an RL Systems Engineer to build the infrastructure layer that couples fast model training and inference with reinforcement learning feedback loops for scientific discovery. You'll optimize GPU cluster scheduling, implement low-latency communication primitives, and collaborate closely with researchers to co-design systems and algorithms that maximize the speed of the RL loop.

What you'll do

Design rack-aware scheduling across Ray, Slurm, and Kubernetes for GPU clusters with heterogeneous topologies
Build profilers to identify and eliminate bottlenecks in training and inference stacks
Implement S3 checkpoint streaming and optimize I/O for large-scale training runs
Write and optimize CUDA kernels and communication primitives for maximum hardware throughput
Engineer zero-copy RDMA weight synchronization between training and inference systems
Collaborate with ML researchers on algorithm-infrastructure co-design and upstream open-source contributions

What they're looking for

GPU cluster orchestration (Ray, Slurm, Kubernetes)
Low-level systems programming (RDMA, NVLink, kernel optimization)
CUDA kernel development and optimization
Distributed ML infrastructure and profiling
Large-scale inference architecture and load balancing
Checkpoint management and cloud storage integration
Performance benchmarking and bottleneck analysis
Open-source ML framework contributions

Benefits

Competitive base salary of $250,000–$350,000, plus equity
Visa sponsorship available
Work at the intersection of research and infrastructure on frontier AI problems
Close collaboration with world-class ML researchers
Opportunity to influence open-source ML ecosystem roadmaps
Locations in Menlo Park and San Francisco
