Baseten

Software Engineer — GPU Networking & Distributed Systems

San Francisco (Remote) | $165k–$330k | Full-time | Mid-level | Added 2 days ago

About this role

Baseten seeks a GPU networking engineer to architect the software infrastructure that optimizes communication across thousands of GPUs for distributed AI inference. You'll integrate RDMA/RoCE capabilities, optimize distributed serving patterns, and co-design networking alongside compute to achieve wire-speed data movement on cutting-edge hardware.

What you'll do

Integrate RDMA, RoCE, and InfiniBand as first-class primitives in the inference stack
Optimize networking for disaggregated KV cache offload and wide expert parallelism across GPU clusters
Design observability tools to visualize packet flow, congestion, and bandwidth across interconnects
Characterize performance on next-generation NVIDIA Blackwell hardware (B200/B300, NVL72) and write acceptance tests
Implement custom communication kernels and optimize NCCL/NVSHMEM to overlap compute with data transfer
Enable sub-10-second cold starts for large language models via checkpointing and storage optimization

What they're looking for

High-performance networking protocols (InfiniBand, RoCE v2)
C++ and Python with hardware optimization experience
NVIDIA GPU architecture knowledge (H100, Blackwell, memory hierarchy)
NCCL, NVSHMEM, and UCX libraries
Distributed systems and performance debugging
TensorRT-LLM, vLLM, or similar inference frameworks
GPUDirect Storage or high-performance filesystem experience
Hardware benchmarking and cluster qualification

Benefits

Work on bleeding-edge hardware including Blackwell and Rubin architectures
Recently raised $1.5B Series F funding led by top-tier investors
Shape AI deployment for companies like Cursor and Notion, and for OpenAI partners
Deep technical challenges at the intersection of networking and compute
San Francisco location with access to cutting-edge GPU clusters
