Anthropic

Performance Engineer, GPU

San Francisco, CA | New York City, NY | Seattle, WA
From $850k | mid | Added 2 days ago

About this role

Anthropic seeks a GPU Performance Engineer to optimize large-scale GPU systems powering Claude and advanced language models. You'll develop custom kernels, distributed architectures, and performance innovations across the full hardware-software stack to maximize efficiency at unprecedented scale.

What you'll do

Design and implement GPU kernel optimizations using CUDA, Triton, and tensor core techniques
Architect distributed communication strategies for multi-node GPU clusters
Optimize end-to-end training and inference pipelines for frontier language models
Build performance modeling frameworks to predict and improve GPU utilization
Profile production systems to identify and eliminate performance bottlenecks
Collaborate with hardware vendors to influence future accelerator capabilities

What they're looking for

GPU programming and optimization (CUDA, Triton, CUTLASS)
ML frameworks and compilers (PyTorch, JAX, XLA, torch.compile)
Distributed systems (NCCL, NVLink, collective communication)
Kernel fusion and memory bandwidth optimization
Low-precision quantization (INT8/FP8) and mixed-precision techniques
Performance profiling tools (Nsight)
Large-scale system design and cluster orchestration
Model parallelism and distributed training infrastructure

Benefits

$280,000–$850,000 annual salary
Hybrid work policy (minimum 25% office presence)
Work on state-of-the-art AI systems with real-world impact
Collaborative environment with world-class researchers and engineers
Opportunity to influence hardware and software evolution
