Anthropic

Software Engineer, Safeguards Evals

San Francisco, CA | New York City, NY | From $485k | Mid-level | Added 2 days ago

About this role

Anthropic is seeking a Software Engineer to build evaluation infrastructure for an AI-powered abuse detection agent. You'll design rigorous experiments, construct real-world datasets, and ship evaluation pipelines that measure how well the system catches misuse across various harm areas.

What you'll do

Design and maintain an evaluation harness for an agentic investigation system, including metrics, test cases, and grading approaches
Build high-quality datasets representing real-world misuse scenarios across harm domains using real traffic and synthetic generation
Measure agent performance on detection precision/recall, investigation quality, and robustness across complex harm areas
Identify measurement gaps and evolve evaluations to remain unsaturated as agent capabilities advance
Productionize research into regression and release pipelines for agent changes and model upgrades
Create tools enabling policy experts to author and iterate on evaluations independently

What they're looking for

Python and full-stack development
Data pipeline design and maintenance
Large language models and agentic systems
Data analysis and statistical insight extraction
Research prototyping and production-quality code
Agent evaluation frameworks and benchmarking
Trust and safety or abuse detection systems
Prompt engineering and LLM applications

Benefits

Competitive annual salary of $320,000–$485,000 USD
Work on critical AI safety and alignment challenges
Flexible location options in San Francisco or New York City
