
Baseten

Software Engineer - GPU Kernels

San Francisco (Remote) | $180k–$360k | Full-time | Mid-level | Added 2 days ago

About this role

Baseten seeks a GPU Kernel Engineer to optimize high-performance kernels for AI inference workloads that power production systems at companies like Cursor and Notion. You'll design and implement CUDA kernels for matrix operations, attention mechanisms, and advanced features like quantization, directly affecting performance for millions of users.

What you'll do

  • Design and implement high-performance GPU kernels for ML operations including matrix multiplications, attention mechanisms, and mixture-of-experts routing
  • Write optimized code using CUDA, PTX assembly, and architecture-specific techniques
  • Apply performance optimization methods such as memory coalescing, tensor core acceleration, and compute/memory overlap
  • Implement cutting-edge features like quantization (FP8/FP4), sparsity, and distributed compute techniques
  • Identify and resolve performance bottlenecks using profiling tools like Nsight Systems and PyTorch Profiler
  • Collaborate with research teams to productionize theoretical advancements and contribute to open-source GPU libraries

What they're looking for

  • GPU architecture and CUDA programming
  • C++ and performance profiling
  • Memory hierarchy optimization and bandwidth tuning
  • Transformer models and attention mechanism optimization
  • GPU kernel libraries (CUTLASS, Triton, Thrust, CUB)
  • GEMM tuning and distributed multi-GPU compute
  • Quantization and numerical precision strategies
  • Modern GPU features like tensor cores and async operations