Anthropic

Software Engineer, Safeguards Evals

San Francisco, CA | New York City, NY | From $485k | mid | Added 2 days ago

About this role

Anthropic is seeking a Software Engineer to build evaluation infrastructure for an AI-powered abuse detection agent. You'll design rigorous experiments, construct real-world datasets, and ship evaluation pipelines that measure how well the system catches misuse across various harm areas.

What you'll do

  • Design and maintain an evaluation harness for an agentic investigation system, including metrics, test cases, and grading approaches
  • Build high-quality datasets representing real-world misuse scenarios across harm domains using real traffic and synthetic generation
  • Measure agent performance on detection precision/recall, investigation quality, and robustness across complex harm areas
  • Identify measurement gaps and evolve evaluations to remain unsaturated as agent capabilities advance
  • Productionize research into regression and release pipelines for agent changes and model upgrades
  • Create tools enabling policy experts to author and iterate on evaluations independently

What they're looking for

  • Python and full-stack development
  • Data pipeline design and maintenance
  • Large language models and agentic systems
  • Data analysis and statistical insight extraction
  • Research prototyping and production-quality code
  • Agent evaluation frameworks and benchmarking
  • Trust and safety or abuse detection systems
  • Prompt engineering and LLM applications

Benefits

  • Competitive annual salary of $320,000–$485,000 USD
  • Work on critical AI safety and alignment challenges
  • Flexible location options in San Francisco or New York City