Anthropic

Research Engineer, Safeguards Labs

San Francisco, CA | New York City, NY | From $850k | mid | Added 2 days ago

About this role

Anthropic's Safeguards Labs is seeking a Research Engineer to develop and test novel safety methods for Claude, the AI assistant. You'll independently scope projects investigating misuse detection, model safeguards, and abuse prevention, running experiments from conception through potential production deployment.

What you'll do

  • Lead research projects on detecting Claude misuse, identifying malicious accounts, and strengthening model safeguards
  • Design and execute offline analyses on usage data to surface abuse patterns and build detection classifiers
  • Develop prototypes that feed into real-time safeguards systems, partnering with production engineers
  • Build evaluations and methodologies for measuring safeguard effectiveness, including in agentic settings
  • Investigate methods for detecting abusive behavior in chat and agent workflows
  • Communicate findings to Trust & Safety, research, and product teams

What they're looking for

  • Independent research project management
  • Python programming and large dataset handling
  • Large language model fundamentals (sampling, prompting, training)
  • Machine learning model development and training
  • Abuse and fraud detection systems
  • Evaluation methodology design for language models
  • Red teaming or adversarial ML experience
  • Production systems prototyping and tech transfer

Benefits

  • Competitive salary range: $350,000–$850,000 USD annually
  • Work on high-impact AI safety problems with direct societal benefit
  • Significant autonomy in scoping and directing research projects
  • Small team with a 3:1 researcher-to-engineer mix, allowing high leverage
  • Hybrid work in San Francisco or New York City