Anthropic

Research Engineer, Safeguards Labs

San Francisco, CA | New York City, NY | From $850k | mid | Added 2 days ago

About this role

Anthropic's Safeguards Labs is seeking a Research Engineer to develop and test novel safety methods for Claude, the AI assistant. You'll independently scope projects investigating misuse detection, model safeguards, and abuse prevention, running experiments from conception through potential production deployment.

What you'll do

Lead research projects on detecting Claude misuse, identifying malicious accounts, and strengthening model safeguards
Design and execute offline analyses on usage data to surface abuse patterns and build detection classifiers
Develop prototypes that feed into real-time safeguards systems, partnering with production engineers
Build evaluations and methodologies for measuring safeguard effectiveness, including in agentic settings
Investigate methods for detecting abusive behavior in chat and agent workflows
Communicate findings to Trust & Safety, research, and product teams

What they're looking for

Independent research project management
Python programming and large dataset handling
Large language model fundamentals (sampling, prompting, training)
Machine learning model development and training
Abuse and fraud detection systems
Evaluation methodology design for language models
Red teaming or adversarial ML experience
Production systems prototyping and tech transfer

Benefits

Competitive salary range: $350,000–$850,000 USD annually
Work on high-impact AI safety problems with direct societal benefit
Significant autonomy in scoping and directing research projects
Small team with a 3:1 researcher-to-engineer ratio, allowing high leverage
Hybrid work in San Francisco or New York City
