Character.AI

Research Engineer, AI Safety & Alignment

Redwood City, CA (Remote) | $225k–$400k | Full-time | Mid-level | Added 2 days ago

About this role

Develop and implement AI safety techniques to make large language models more reliable, honest, and aligned with human values. This role combines cutting-edge research in model alignment and interpretability with practical engineering to protect millions of users.

What you'll do

Design evaluation methodologies and metrics to assess safety and alignment of large language models
Research and develop alignment techniques including value learning, interpretability, and RLHF
Conduct adversarial testing to identify vulnerabilities and failure modes
Mitigate biases, toxicity, and harmful behaviors in models through fine-tuning and reinforcement learning
Translate safety research into scalable solutions with engineering and product teams
Publish findings and contribute to the AI safety research community

What they're looking for

Machine learning and transformer architectures
Reinforcement learning from human feedback (RLHF)
Production code development (Python or similar)
GPU training, serving, and debugging
Data pipelines and infrastructure
Model interpretability and explainability
Adversarial testing and vulnerability assessment
Distributed model training

Benefits

Work on critical AI safety challenges affecting millions of users
Contribute to academic publications and present at conferences
Collaborate with a leading AI research team at a unicorn AI company
Located in Redwood City, CA
Opportunity to shape responsible AI development
