Anthropic

Software Engineer, Safeguards

San Francisco, CA | New York City, NY | From $485k | Mid-level | Added 2 days ago

About this role

Anthropic is seeking a Software Engineer for its Safeguards team to build safety and oversight systems for AI models. You'll develop monitoring tools, detection mechanisms, and multi-layered defenses to prevent misuse and protect users across its API platforms.

What you'll do

Develop monitoring systems to detect unwanted API partner behaviors and create automated enforcement actions
Build abuse detection mechanisms and infrastructure for AI systems
Surface abuse patterns to research teams to improve model training and hardening
Create robust, real-time, scalable safety defense systems
Surface findings in internal dashboards for analyst review

What they're looking for

Python and TypeScript proficiency
Full-stack software development
Abuse and fraud detection systems
Technical communication with non-technical stakeholders
AI/ML trust and safety mechanisms
Prompt engineering and adversarial input understanding
Internal tooling and operational systems

Benefits

Annual salary: $320,000–$485,000 USD
Hybrid work policy with 25% minimum office time
Visa sponsorship available
Work on AI safety and beneficial AI systems
Collaborative team of researchers and engineers
