OpenAI

Software Engineer, Inference - Multi Modal

San Francisco · Full-time · Mid-level

About this role

Join OpenAI's Inference team to build scalable infrastructure for serving multimodal AI models including image, audio, and speech capabilities in production. You'll work on high-performance systems that handle complex, heterogeneous workloads while collaborating closely with research and product teams.

What you'll do

  • Design and implement inference infrastructure for large-scale multimodal models
  • Optimize systems for high-throughput, low-latency delivery of image and audio inputs/outputs
  • Enable research workflows to transition into reliable production services
  • Collaborate with researchers, infrastructure, and product teams on model deployment
  • Improve system-level performance, including GPU utilization and tensor parallelism
  • Own end-to-end problems in distributed compute and data handling systems

What they're looking for

  • GPU-based ML workload optimization
  • Large language model or multimodal inference systems
  • Distributed computing and networking
  • Inference frameworks (vLLM, TensorRT-LLM, or custom systems)
  • High-throughput data pipeline design
  • Image and audio processing systems
  • Systems engineering and performance tuning
  • Model parallelism and hardware abstraction