OpenAI
Software Engineer, Model Inference
San Francisco · Full-time · Mid-level
About this role
Optimize OpenAI's most advanced AI models for high-performance, low-latency production inference at massive scale. Work with researchers and engineers to enhance model serving infrastructure on Azure GPUs while solving critical performance and reliability challenges.
What you'll do
- Collaborate with ML researchers and engineers to deploy cutting-edge models into production systems
- Design and implement optimizations to improve inference performance, latency, throughput, and resource efficiency
- Build monitoring and debugging tools to identify bottlenecks and system instabilities
- Maximize GPU utilization and memory efficiency across distributed Azure infrastructure
- Develop new techniques and architectures to enhance the inference stack
- Support research acceleration through robust engineering solutions
What they're looking for
- Software engineering (5+ years professional experience)
- PyTorch and GPU optimization (CUDA, NCCL)
- Distributed systems architecture and debugging
- Performance-critical systems optimization
- ML inference techniques and optimization
- HPC technologies (InfiniBand, MPI, NVLink)
- Production system scaling and refactoring
- Azure cloud infrastructure