OpenAI

Research Engineer, Frontier Evals & Environments

San Francisco · Full-time · Mid-level

About this role

Help build evaluation environments and benchmarks that measure and guide the development of frontier AI agents at OpenAI. You'll create RL environments, design measurement methodologies, and directly influence training runs that shape next-generation model capabilities.

What you'll do

Design and build RL environments to test frontier model capabilities and behaviors
Develop methodologies for automatically exploring and understanding model behavior
Conduct rigorous analysis of evaluation scalability, reliability, and measurement variance
Guide training decisions for large-scale model runs and measure their outcomes
Build scalable systems for continuous evaluation across training pipelines
Create self-improvement loops to automate model understanding and analysis

What they're looking for

Machine learning fundamentals and systems thinking
LLM and reinforcement learning experience (RLHF/RLAIF preferred)
Software engineering and production ML systems
Evaluation, grading, and synthetic data methodology
Statistical analysis and experimental design
Cross-functional communication and collaboration
Research taste combined with engineering execution
Coding agents and tool-using agent experience
