Databricks

Software Engineer - GenAI inference

San Francisco, California · Mid-level

About this role

Join Databricks as a Software Engineer to design and optimize the inference engine powering their Foundation Model API. You'll work across the full GenAI inference stack—from GPU kernels to distributed orchestration—ensuring LLM serving systems are fast, scalable, and efficient at production scale.

What you'll do

  • Design and implement an inference engine and model-serving stack optimized for large-scale LLM inference
  • Collaborate with researchers to integrate new model architectures and features like sparsity and mixture-of-experts into production
  • Optimize for latency, throughput, memory efficiency, and hardware utilization across GPUs and accelerators
  • Build instrumentation, profiling, and tracing tools to identify and resolve performance bottlenecks
  • Develop routing, batching, scheduling, and memory management mechanisms for distributed inference workloads
  • Ensure reliability, fault tolerance, and reproducibility through A/B testing, rollback, and model versioning

What they're looking for

  • CUDA and GPU programming (cuBLAS, cuDNN, NCCL)
  • ML inference fundamentals (attention, MLPs, quantization, sparse operations)
  • Distributed systems design (RPC frameworks, sharding, memory partitioning)
  • Performance profiling and bottleneck analysis across kernel, memory, and network layers
  • System instrumentation and tracing tools
  • Collaboration with ML researchers to productionize novel ideas
  • Software engineering in performance-critical systems (3+ years)
  • Python, C++, or low-level systems programming

Benefits

  • Competitive salary: $142,200–$204,600 USD
  • Annual performance bonus eligibility
  • Equity compensation
  • Work at the intersection of cutting-edge research and production systems
  • Opportunity to contribute to open-source ML infrastructure