Character.AI
Research Engineer, AI Safety & Alignment
Redwood City, CA (Remote) | $225k–$400k | Full-time | Mid-level | Added 2 days ago
About this role
Develop and implement AI safety techniques to make large language models more reliable, honest, and aligned with human values. This role combines cutting-edge research in model alignment and interpretability with practical engineering to protect millions of users.
What you'll do
- Design evaluation methodologies and metrics to assess safety and alignment of large language models
- Research and develop alignment techniques including value learning, interpretability, and RLHF
- Conduct adversarial testing to identify vulnerabilities and failure modes
- Mitigate biases, toxicity, and harmful behaviors in models through fine-tuning and reinforcement learning
- Work with engineering and product teams to translate safety research into scalable solutions
- Publish findings and contribute to the AI safety research community
What they're looking for
- Machine learning and transformer architectures
- Reinforcement learning from human feedback (RLHF)
- Production code development (Python or similar)
- GPU training, serving, and debugging
- Data pipelines and infrastructure
- Model interpretability and explainability
- Adversarial testing and vulnerability assessment
- Distributed model training
Benefits
- Work on critical AI safety challenges affecting millions of users
- Contribute to academic publications and present at conferences
- Collaborate with a leading AI research team at a unicorn AI company
- Located in Redwood City, CA
- Opportunity to shape responsible AI development