Baseten
Software Engineer — GPU Networking & Distributed Systems
San Francisco (Remote) | $165k–$330k | Full-time | Mid-level | Added 2 days ago
About this role
Baseten seeks a GPU networking engineer to architect the software infrastructure that optimizes communication across thousands of GPUs for distributed AI inference. You'll integrate RDMA/RoCE capabilities, optimize distributed serving patterns, and co-design networking alongside compute to achieve wire-speed data movement on cutting-edge hardware.
What you'll do
- Integrate RDMA, RoCE, and InfiniBand as first-class primitives in the inference stack
- Optimize networking for disaggregated KV cache offload and wide expert parallelism across GPU clusters
- Design observability tools to visualize packet flow, congestion, and bandwidth across interconnects
- Characterize performance on next-generation Blackwell hardware (B200/B300, NVL72) and write acceptance tests
- Implement custom communication kernels and tune NCCL/NVSHMEM to overlap compute with data transfer
- Enable sub-10-second cold starts for large language models via checkpointing and storage optimization
What they're looking for
- High-performance networking protocols (InfiniBand, RoCE v2)
- C++ and Python with hardware optimization experience
- NVIDIA GPU architecture knowledge (H100, Blackwell, memory hierarchy)
- NCCL, NVSHMEM, and UCX libraries
- Distributed systems and performance debugging
- TensorRT-LLM, vLLM, or similar inference frameworks
- GPUDirect Storage or high-performance filesystem experience
- Hardware benchmarking and cluster qualification
Benefits
- Work on bleeding-edge hardware including Blackwell and Rubin architectures
- Recently raised $1.5B Series F funding led by top-tier investors
- Impact AI deployment for companies like Cursor, Notion, and OpenAI partners
- Deep technical challenges at the intersection of networking and compute
- San Francisco location with access to cutting-edge GPU clusters