Anthropic

Research Engineer, Interpretability

San Francisco, CA | From $560k | mid | Added 2 days ago

About this role

Anthropic seeks a Research Engineer to build infrastructure powering interpretability research on large language models. You'll develop specialized tools for understanding how AI systems work, from training and inference stacks to activation analysis, directly supporting AI safety efforts.

What you'll do

Build and maintain specialized inference and training infrastructure for interpretability research, including instrumented passes and steering vector application
Identify and resolve scaling bottlenecks through profiling and optimization
Design tools and abstractions enabling researchers to experiment efficiently
Integrate interpretability research into production safety audits with high reliability standards
Work across the full stack from model internals to user-facing research tooling
Collaborate with researchers to translate research needs into engineering solutions

What they're looking for

Software engineering (5-10+ years)
Python proficiency and one additional language (Rust, Go, Java)
Distributed systems optimization
Performance profiling and bottleneck analysis
Machine learning infrastructure
Quick learning across unfamiliar technical domains
Prioritization and decision-making under ambiguity
Cross-functional collaboration
