Skip to main content

openai

Software Engineer, Model Inference

San Franciscofulltimemid

About this role

Optimize OpenAI's most advanced AI models for high-performance, low-latency production inference at massive scale. Work with researchers and engineers to enhance model serving infrastructure on Azure GPUs while solving critical performance and reliability challenges.

What you'll do

  • Collaborate with ML researchers and engineers to deploy cutting-edge models into production systems
  • Design and implement optimizations to improve inference performance, latency, throughput, and resource efficiency
  • Build monitoring and debugging tools to identify bottlenecks and system instabilities
  • Maximize GPU utilization and memory efficiency across distributed Azure infrastructure
  • Develop new techniques and architectures to enhance the inference stack
  • Support research acceleration through robust engineering solutions

What they're looking for

  • Software engineering (5+ years professional experience)
  • PyTorch and GPU optimization (CUDA, NCCL)
  • Distributed systems architecture and debugging
  • Performance-critical systems optimization
  • ML inference techniques and optimization
  • HPC technologies (InfiniBand, MPI, NVLink)
  • Production system scaling and refactoring
  • Azure cloud infrastructure
Apply on the employer's site

Opens the official application on the employer’s site. No login required.