Baseten

Software Engineer - GPU Kernels

San Francisco (Remote) · $180k–$360k · Full-time · Mid-level · Added 2 days ago

About this role

Baseten seeks a GPU Kernel Engineer to optimize high-performance kernels for AI inference workloads that power production systems at companies like Cursor and Notion. You'll design and implement CUDA kernels for matrix operations and attention mechanisms, along with advanced features like quantization, directly affecting inference performance for millions of users.

What you'll do

Design and implement high-performance GPU kernels for ML operations including matrix multiplications, attention mechanisms, and mixture-of-experts routing
Write optimized code using CUDA, PTX assembly, and architecture-specific techniques
Apply performance optimization methods such as memory coalescing, tensor core acceleration, and compute/memory overlap
Implement cutting-edge features like quantization (FP8/FP4), sparsity, and distributed compute techniques
Identify and resolve performance bottlenecks using profiling tools like Nsight Systems and the PyTorch Profiler
Collaborate with research teams to productionize theoretical advancements and contribute to open-source GPU libraries

What they're looking for

GPU architecture and CUDA programming
C++ and performance profiling
Memory hierarchy optimization and bandwidth tuning
Transformer models and attention mechanism optimization
GPU kernel libraries (CUTLASS, Triton, Thrust, CUB)
GEMM tuning and distributed multi-GPU compute
Quantization and numerical precision strategies
Modern GPU features like tensor cores and async operations
