openai

Software Engineer, Inference - Multi Modal

San Franciscofulltimemid

About this role

Join OpenAI's Inference team to build scalable infrastructure for serving multimodal AI models including image, audio, and speech capabilities in production. You'll work on high-performance systems that handle complex, heterogeneous workloads while collaborating closely with research and product teams.

What you'll do

Design and implement inference infrastructure for large-scale multimodal models
Optimize systems for high-throughput, low-latency delivery of image and audio inputs/outputs
Enable research workflows to transition into reliable production services
Collaborate with researchers, infrastructure, and product teams on model deployment
Improve system-level performance including GPU utilization and tensor parallelism
Own end-to-end problems in distributed compute and data handling systems

What they're looking for

GPU-based ML workload optimization
Large language model or multimodal inference systems
Distributed computing and networking
Inference frameworks (vLLM, TensorRT-LLM, or custom systems)
High-throughput data pipeline design
Image and audio processing systems
Systems engineering and performance tuning
Model parallelism and hardware abstraction

Apply on the employer's site →

Opens the official application on the employer’s site. No login required.