OpenAI
Software Engineer, Inference - Multi Modal
San Francisco, full-time, mid-level
About this role
Join OpenAI's Inference team to build scalable infrastructure for serving multimodal AI models, including image, audio, and speech capabilities, in production. You'll work on high-performance systems that handle complex, heterogeneous workloads while collaborating closely with research and product teams.
What you'll do
- Design and implement inference infrastructure for large-scale multimodal models
- Optimize systems for high-throughput, low-latency delivery of image and audio inputs/outputs
- Enable research workflows to transition into reliable production services
- Collaborate with researchers, infrastructure, and product teams on model deployment
- Improve system-level performance, including GPU utilization and tensor parallelism
- Own end-to-end problems in distributed compute and data handling systems
What they're looking for
- GPU-based ML workload optimization
- Large language model or multimodal inference systems
- Distributed computing and networking
- Inference frameworks (vLLM, TensorRT-LLM, or custom systems)
- High-throughput data pipeline design
- Image and audio processing systems
- Systems engineering and performance tuning
- Model parallelism and hardware abstraction