openai

Software Engineer, Model Inference

San Franciscofulltimemid

About this role

Optimize OpenAI's most advanced AI models for high-performance, low-latency production inference at massive scale. Work with researchers and engineers to enhance model serving infrastructure on Azure GPUs while solving critical performance and reliability challenges.

What you'll do

Collaborate with ML researchers and engineers to deploy cutting-edge models into production systems
Design and implement optimizations to improve inference performance, latency, throughput, and resource efficiency
Build monitoring and debugging tools to identify bottlenecks and system instabilities
Maximize GPU utilization and memory efficiency across distributed Azure infrastructure
Develop new techniques and architectures to enhance the inference stack
Support research acceleration through robust engineering solutions

What they're looking for

Software engineering (5+ years professional experience)
PyTorch and GPU optimization (CUDA, NCCL)
Distributed systems architecture and debugging
Performance-critical systems optimization
ML inference techniques and optimization
HPC technologies (InfiniBand, MPI, NVLink)
Production system scaling and refactoring
Azure cloud infrastructure

Apply on the employer's site →

Opens the official application on the employer’s site. No login required.