Anthropic
Software Engineer, Safeguards Evals
San Francisco, CA | New York City, NY | From $485k | Mid-level | Added 2 days ago
About this role
Anthropic is seeking a Software Engineer to build evaluation infrastructure for an AI-powered abuse detection agent. You'll design rigorous experiments, construct real-world datasets, and ship evaluation pipelines that measure how well the system catches misuse across various harm areas.
What you'll do
- Design and maintain an evaluation harness for an agentic investigation system, including metrics, test cases, and grading approaches
- Build high-quality datasets representing real-world misuse scenarios across harm domains using real traffic and synthetic generation
- Measure agent performance on detection precision/recall, investigation quality, and robustness across complex harm areas
- Identify measurement gaps and evolve evaluations to remain unsaturated as agent capabilities advance
- Productionize research into regression and release pipelines for agent changes and model upgrades
- Create tools enabling policy experts to author and iterate on evaluations independently
What they're looking for
- Python and full-stack development
- Data pipeline design and maintenance
- Large language models and agentic systems
- Data analysis and statistical insight extraction
- Research prototyping and production-quality code
- Agent evaluation frameworks and benchmarking
- Trust and safety or abuse detection systems
- Prompt engineering and LLM applications
Benefits
- Competitive annual salary of $320,000–$485,000 USD
- Work on critical AI safety and alignment challenges
- Flexible location options in San Francisco or New York City