OpenAI
Research Engineer, Frontier Evals & Environments
San Francisco · Full-time · Mid-level
About this role
Help build evaluation environments and benchmarks that measure and guide the development of frontier AI agents at OpenAI. You'll create RL environments, design measurement methodologies, and directly influence training runs that shape next-generation model capabilities.
What you'll do
- Design and build RL environments to test frontier model capabilities and behaviors
- Develop methodologies for automatically exploring and understanding model behavior
- Conduct rigorous analysis on evaluation scalability, reliability, and measurement variance
- Guide training decisions for large-scale model runs and measure their outcomes
- Build scalable systems for continuous evaluation across training pipelines
- Create self-improvement loops to automate model understanding and analysis
What they're looking for
- Machine learning fundamentals and systems thinking
- LLM and reinforcement learning experience (RLHF/RLAIF preferred)
- Software engineering and production ML systems
- Evaluation, grading, and synthetic data methodology
- Statistical analysis and experimental design
- Cross-functional communication and collaboration
- Research taste combined with engineering execution
- Coding agents and tool-using agent experience