Anthropic

Software Engineer, Safeguards

San Francisco, CA | New York City, NY | From $485k | mid | Added 2 days ago

About this role

Anthropic is seeking a Software Engineer for its Safeguards team to build safety and oversight systems for AI models. You'll develop monitoring tools, detection mechanisms, and multi-layered defenses that prevent misuse and protect users across its API platforms.

What you'll do

  • Develop monitoring systems to detect unwanted API partner behaviors and create automated enforcement actions
  • Build abuse detection mechanisms and infrastructure for AI systems
  • Surface abuse patterns to research teams to improve model training and hardening
  • Create robust, real-time, scalable safety defense systems
  • Surface findings in internal dashboards for analyst review

What they're looking for

  • Python and TypeScript proficiency
  • Full-stack software development
  • Abuse and fraud detection systems
  • Technical communication with non-technical stakeholders
  • AI/ML trust and safety mechanisms
  • Prompt engineering and adversarial input understanding
  • Internal tooling and operational systems

Benefits

  • Annual salary: $320,000–$485,000 USD
  • Hybrid work policy with 25% minimum office time
  • Visa sponsorship available
  • Work on AI safety and beneficial AI systems
  • Collaborative team of researchers and engineers