Anthropic

Performance Engineer, GPU

San Francisco, CA | New York City, NY | Seattle, WA | From $850k | mid | Added 2 days ago

About this role

Anthropic seeks a GPU Performance Engineer to optimize the large-scale GPU systems that power Claude and other advanced language models. You'll develop custom kernels, distributed architectures, and performance improvements across the full hardware-software stack to maximize efficiency at scale.

What you'll do

  • Design and implement GPU kernel optimizations using CUDA, Triton, and tensor core techniques
  • Architect distributed communication strategies for multi-node GPU clusters
  • Optimize end-to-end training and inference pipelines for frontier language models
  • Build performance modeling frameworks to predict and improve GPU utilization
  • Profile production systems to identify and eliminate performance bottlenecks
  • Collaborate with hardware vendors to influence future accelerator capabilities

What they're looking for

  • GPU programming and optimization (CUDA, Triton, CUTLASS)
  • ML frameworks and compilers (PyTorch, JAX, XLA, torch.compile)
  • Distributed systems (NCCL, NVLink, collective communication)
  • Kernel fusion and memory bandwidth optimization
  • Low-precision quantization (INT8/FP8) and mixed-precision techniques
  • Performance profiling tools (Nsight)
  • Large-scale system design and cluster orchestration
  • Model parallelism and distributed training infrastructure

Benefits

  • $280,000–$850,000 annual salary
  • Hybrid work policy (minimum 25% office presence)
  • Work on state-of-the-art AI systems with real-world impact
  • Collaborative environment with world-class researchers and engineers
  • Opportunity to influence hardware and software evolution