OpenAI
Software Engineer, Inference – AMD GPU Enablement
San Francisco | Full-time | Mid-level
About this role
OpenAI is seeking a Software Engineer to optimize and scale inference infrastructure on AMD GPU platforms. You'll work across the full stack—from kernel-level performance tuning to distributed execution—to enable efficient deployment of large AI models on AMD accelerators.
What you'll do
- Own bring-up, correctness, and performance optimization of OpenAI's inference stack on AMD hardware
- Integrate model-serving infrastructure such as vLLM and Triton into GPU-backed systems
- Debug and optimize distributed inference workloads across memory, network, and compute layers
- Develop and optimize high-performance GPU kernels using HIP, Triton, or similar frameworks
- Build and tune collective communication libraries (e.g., RCCL) for multi-GPU parallelization
- Validate correctness, performance, and scalability on large GPU clusters
What they're looking for
- GPU kernel development (HIP, CUDA, or Triton)
- Distributed inference systems and multi-GPU scaling
- Communication libraries (NCCL/RCCL)
- Low-level GPU performance optimization
- End-to-end performance debugging across hardware and software layers
- GPU profiling tools (Nsight, rocprof, perf)
- Model/tensor parallelism and mixed precision
- C++ or Python systems programming