Anthropic

ML/Research Engineer, Safeguards

San Francisco, CA | New York City, NY | From $500k | mid | Added 2 days ago

About this role

Anthropic is hiring ML/Research Engineers to build systems that detect and prevent misuse of AI products, including coordinated attacks and harmful behaviors. You'll develop classifiers, monitoring systems, and defenses while conducting research on adversarial robustness and red-teaming.

What you'll do

  • Develop classifiers and synthetic data pipelines to detect misuse and anomalous behavior at scale
  • Build monitoring systems to identify coordinated harms across multiple interactions
  • Evaluate and improve safety of agentic products, including threat modeling and prompt injection defenses
  • Conduct research on automated red-teaming and adversarial robustness techniques
  • Work across the research-to-deployment pipeline, from experiments to production systems

What they're looking for

  • Machine learning engineering or applied research experience (4+ years)
  • Python proficiency
  • ML systems development
  • Language modeling and transformers
  • Anomaly detection and behavioral ML
  • Adversarial machine learning or red-teaming
  • Interpretability research
  • Large-scale ML systems

Benefits

  • Annual compensation: $350,000–$500,000 USD
  • Visa sponsorship available
  • Hybrid work policy (minimum 25% office time)
  • Work on AI safety and responsible scaling initiatives
  • Collaborative research environment with policy and business experts