Anthropic
Research Engineer, Interpretability
San Francisco, CA | From $560k | mid | Added 2 days ago
About this role
Anthropic seeks a Research Engineer to build infrastructure powering interpretability research on large language models. You'll develop specialized tools for understanding how AI systems work, from training and inference stacks to activation analysis, directly supporting AI safety efforts.
What you'll do
- Build and maintain specialized inference and training infrastructure for interpretability research, including instrumented passes and steering vector application
- Identify and resolve scaling bottlenecks through profiling and optimization
- Design tools and abstractions enabling researchers to experiment efficiently
- Integrate interpretability research into production safety audits with high reliability standards
- Work across the full stack from model internals to user-facing research tooling
- Collaborate with researchers to translate research needs into engineering solutions
What they're looking for
- Software engineering (5-10+ years)
- Python proficiency and one additional language (Rust, Go, Java)
- Distributed systems optimization
- Performance profiling and bottleneck analysis
- Machine learning infrastructure
- Quick learning across unfamiliar technical domains
- Prioritization and decision-making under ambiguity
- Cross-functional collaboration