Databricks
Software Engineer - GenAI inference
San Francisco, California
Mid-level
About this role
Join Databricks as a Software Engineer to design and optimize the inference engine powering its Foundation Model API. You'll work across the full GenAI inference stack, from GPU kernels to distributed orchestration, ensuring LLM serving systems are fast, scalable, and efficient at production scale.
What you'll do
- Design and implement the inference engine and model-serving stack, optimized for large-scale LLM inference
- Collaborate with researchers to integrate new model architectures and features like sparsity and mixture-of-experts into production
- Optimize for latency, throughput, memory efficiency, and hardware utilization across GPUs and accelerators
- Build instrumentation, profiling, and tracing tools to identify and resolve performance bottlenecks
- Develop routing, batching, scheduling, and memory management mechanisms for distributed inference workloads
- Ensure reliability, fault tolerance, and reproducibility through A/B testing, rollback, and model versioning
What they're looking for
- CUDA and GPU programming (cuBLAS, cuDNN, NCCL)
- ML inference fundamentals (attention, MLPs, quantization, sparse operations)
- Distributed systems design (RPC frameworks, sharding, memory partitioning)
- Performance profiling and bottleneck analysis across kernel, memory, and network layers
- System instrumentation and tracing tools
- Collaboration with ML researchers to productionize novel ideas
- Software engineering in performance-critical systems (3+ years)
- Python, C++, or low-level systems programming
Benefits
- Competitive salary: $142,200–$204,600 USD
- Annual performance bonus eligibility
- Equity compensation
- Work at the intersection of cutting-edge research and production systems
- Opportunity to contribute to open-source ML infrastructure