Anthropic

ML/Research Engineer, Safeguards

San Francisco, CA | New York City, NY | From $500k | mid | Added 2 days ago

About this role

Anthropic is hiring ML/Research Engineers to build systems that detect and prevent misuse of AI products, including coordinated attacks and harmful behaviors. You'll develop classifiers, monitoring systems, and defenses while conducting research on adversarial robustness and red-teaming.

What you'll do

Develop classifiers and synthetic data pipelines to detect misuse and anomalous behavior at scale
Build monitoring systems to identify coordinated harms across multiple interactions
Evaluate and improve safety of agentic products, including threat modeling and prompt injection defenses
Conduct research on automated red-teaming and adversarial robustness techniques
Work across the research-to-deployment pipeline, from experiments to production systems

What they're looking for

Machine learning engineering or applied research (4+ years)
Python proficiency
ML systems development
Language modeling and transformers
Anomaly detection and behavioral ML
Adversarial machine learning or red-teaming
Interpretability research
Large-scale ML systems

Benefits

Annual compensation: $350,000–$500,000 USD
Visa sponsorship available
Hybrid work policy (minimum 25% office time)
Work on AI safety and responsible scaling initiatives
Collaborative research environment with policy and business experts
