Baseten
Software Engineer - GPU Kernels
San Francisco (Remote) | $180k–$360k | Full-time | Mid-level | Added 2 days ago
About this role
Baseten seeks a GPU Kernel Engineer to optimize high-performance kernels for AI inference workloads that power production systems at companies like Cursor and Notion. You'll design and implement CUDA kernels for matrix operations, attention mechanisms, and advanced features like quantization, directly improving performance for millions of users.
What you'll do
- Design and implement high-performance GPU kernels for ML operations including matrix multiplications, attention mechanisms, and mixture-of-experts routing
- Write optimized code using CUDA, PTX assembly, and architecture-specific techniques
- Apply performance optimization methods such as memory coalescing, tensor core acceleration, and compute/memory overlap
- Implement cutting-edge features like quantization (FP8/FP4), sparsity, and distributed compute techniques
- Identify and resolve performance bottlenecks using profiling tools like Nsight Systems and PyTorch Profiler
- Collaborate with research teams to productionize theoretical advancements and contribute to open-source GPU libraries
What they're looking for
- GPU architecture and CUDA programming
- C++ and performance profiling
- Memory hierarchy optimization and bandwidth tuning
- Transformer models and attention mechanism optimization
- GPU kernel libraries (CUTLASS, Triton, Thrust, CUB)
- GEMM tuning and distributed multi-GPU compute
- Quantization and numerical precision strategies
- Modern GPU features like tensor cores and async operations