Anthropic

Research Engineer, Interpretability

San Francisco, CA | From $560k | mid | Added 2 days ago

About this role

Anthropic seeks a Research Engineer to build infrastructure powering interpretability research on large language models. You'll develop specialized tools for understanding how AI systems work, from training and inference stacks to activation analysis, directly supporting AI safety efforts.

What you'll do

  • Build and maintain specialized inference and training infrastructure for interpretability research, including instrumented passes and steering vector application
  • Identify and resolve scaling bottlenecks through profiling and optimization
  • Design tools and abstractions enabling researchers to experiment efficiently
  • Integrate interpretability research into production safety audits with high reliability standards
  • Work across the full stack from model internals to user-facing research tooling
  • Collaborate with researchers to translate research needs into engineering solutions

What they're looking for

  • Software engineering (5-10+ years)
  • Python proficiency and one additional language (Rust, Go, Java)
  • Distributed systems optimization
  • Performance profiling and bottleneck analysis
  • Machine learning infrastructure
  • Quick learning across unfamiliar technical domains
  • Prioritization and decision-making under ambiguity
  • Cross-functional collaboration