Databricks

Software Engineer - GenAI inference

San Francisco, California

Mid-level

About this role

Join Databricks as a Software Engineer to design and optimize the inference engine that powers its Foundation Model API. You'll work across the full GenAI inference stack, from GPU kernels to distributed orchestration, to keep LLM serving systems fast, scalable, and efficient at production scale.

What you'll do

Design and implement the inference engine and model-serving stack, optimized for large-scale LLM inference
Collaborate with researchers to bring new model architectures and features, such as sparsity and mixture-of-experts, into production
Optimize for latency, throughput, memory efficiency, and hardware utilization across GPUs and accelerators
Build instrumentation, profiling, and tracing tools to identify and resolve performance bottlenecks
Develop routing, batching, scheduling, and memory management mechanisms for distributed inference workloads
Ensure reliability, fault tolerance, and reproducibility through A/B testing, rollback, and model versioning

What they're looking for

CUDA and GPU programming (cuBLAS, cuDNN, NCCL)
ML inference fundamentals (attention, MLPs, quantization, sparse operations)
Distributed systems design (RPC frameworks, sharding, memory partitioning)
Performance profiling and bottleneck analysis across kernel, memory, and network layers
System instrumentation and tracing tools
Collaboration with ML researchers to productionize novel ideas
Software engineering in performance-critical systems (3+ years)
Python, C++, or low-level systems programming

Benefits

Competitive salary: $142,200–$204,600 USD
Annual performance bonus eligibility
Equity compensation
Work at the intersection of cutting-edge research and production systems
Opportunity to contribute to open-source ML infrastructure
