Anthropic
ML/Research Engineer, Safeguards
San Francisco, CA | New York City, NY | From $500k | mid | Added 2 days ago
About this role
Anthropic is hiring ML/Research Engineers to build systems that detect and prevent misuse of AI products, including coordinated attacks and harmful behaviors. You'll develop classifiers, monitoring systems, and defenses while conducting research on adversarial robustness and red-teaming.
What you'll do
- Develop classifiers and synthetic data pipelines to detect misuse and anomalous behavior at scale
- Build monitoring systems to identify coordinated harms across multiple interactions
- Evaluate and improve the safety of agentic products, including threat modeling and prompt injection defenses
- Conduct research on automated red-teaming and adversarial robustness techniques
- Work across the research-to-deployment pipeline, from experiments to production systems
What they're looking for
- 4+ years of machine learning engineering or applied research experience
- Python proficiency
- ML systems development
- Language modeling and transformers
- Anomaly detection and behavioral ML
- Adversarial machine learning or red-teaming
- Interpretability research
- Large-scale ML systems
Benefits
- Annual compensation: $350,000–$500,000 USD
- Visa sponsorship available
- Hybrid work policy (minimum 25% office time)
- Work on AI safety and responsible scaling initiatives
- Collaborative research environment with policy and business experts