OpenAI

Software Engineer, Inference – AMD GPU Enablement

San Francisco · Full-time · Mid-level

About this role

OpenAI is seeking a Software Engineer to optimize and scale inference infrastructure on AMD GPU platforms. You'll work across the full stack—from kernel-level performance tuning to distributed execution—to enable efficient deployment of large AI models on AMD accelerators.

What you'll do

  • Own bring-up, correctness, and performance optimization of OpenAI's inference stack on AMD hardware
  • Integrate model-serving infrastructure such as vLLM and Triton into GPU-backed systems
  • Debug and optimize distributed inference workloads across memory, network, and compute layers
  • Develop and optimize high-performance GPU kernels using HIP, Triton, or similar frameworks
  • Build and tune collective communication libraries (e.g., RCCL) for multi-GPU parallelization
  • Validate correctness, performance, and scalability on large GPU clusters

What they're looking for

  • GPU kernel development (HIP, CUDA, or Triton)
  • Distributed inference systems and multi-GPU scaling
  • Communication libraries (NCCL/RCCL)
  • Low-level GPU performance optimization
  • End-to-end performance debugging across hardware and software layers
  • GPU profiling tools (Nsight, rocprof, perf)
  • Model/tensor parallelism and mixed precision
  • C++ or Python systems programming