Anthropic
Performance Engineer, GPU
San Francisco, CA | New York City, NY | Seattle, WA
From $850k | mid | Added 2 days ago
About this role
Anthropic seeks a GPU Performance Engineer to optimize the large-scale GPU systems that power Claude and other advanced language models. You'll develop custom kernels, distributed architectures, and performance improvements across the full hardware-software stack to maximize efficiency at scale.
What you'll do
- Design and implement GPU kernel optimizations using CUDA, Triton, and tensor core techniques
- Architect distributed communication strategies for multi-node GPU clusters
- Optimize end-to-end training and inference pipelines for frontier language models
- Build performance modeling frameworks to predict and improve GPU utilization
- Profile production systems to identify and eliminate performance bottlenecks
- Collaborate with hardware vendors to influence future accelerator capabilities
What they're looking for
- GPU programming and optimization (CUDA, Triton, CUTLASS)
- ML frameworks and compilers (PyTorch, JAX, XLA, torch.compile)
- Distributed systems (NCCL, NVLink, collective communication)
- Kernel fusion and memory bandwidth optimization
- Low-precision quantization (INT8/FP8) and mixed-precision techniques
- Performance profiling tools (Nsight)
- Large-scale system design and cluster orchestration
- Model parallelism and distributed training infrastructure
Benefits
- $280,000–$850,000 annual salary
- Hybrid work policy (minimum 25% office presence)
- Work on state-of-the-art AI systems with real-world impact
- Collaborative environment with world-class researchers and engineers
- Opportunity to influence hardware and software evolution