openai

Software Engineer, Inference – AMD GPU Enablement

San Franciscofulltimemid

About this role

OpenAI is seeking a Software Engineer to optimize and scale inference infrastructure on AMD GPU platforms. You'll work across the full stack—from kernel-level performance tuning to distributed execution—to enable efficient deployment of large AI models on AMD accelerators.

What you'll do

Own bring-up, correctness, and performance optimization of OpenAI's inference stack on AMD hardware
Integrate model-serving infrastructure like vLLM and Triton into GPU-backed systems
Debug and optimize distributed inference workloads across memory, network, and compute layers
Develop and optimize high-performance GPU kernels using HIP, Triton, or similar frameworks
Build and tune collective communication libraries (e.g., RCCL) for multi-GPU parallelization
Validate correctness, performance, and scalability on large GPU clusters

What they're looking for

GPU kernel development (HIP, CUDA, or Triton)
Distributed inference systems and multi-GPU scaling
Communication libraries (NCCL/RCCL)
Low-level GPU performance optimization
End-to-end performance debugging across hardware and software layers
GPU profiling tools (Nsight, rocprof, perf)
Model/tensor parallelism and mixed precision
C++ or Python systems programming

Apply on the employer's site →

Opens the official application on the employer’s site. No login required.