Mercor

Research Engineer – Benchmarking, Evals & Failure Analysis

San Francisco | $130k–$500k | Full-time | Mid-level | Added 2 days ago

About this role

Mercor is seeking a Research Engineer to enhance AI models through benchmarking, evaluation systems, and failure analysis. This role involves collaborating with teams to define metrics and improve data quality in a fast-paced environment in San Francisco.

What you'll do

Design and maintain benchmarking metrics for various AI behaviors
Develop and manage evaluation systems for tracking model performance
Conduct failure analysis on model outputs to identify improvement areas
Create and refine rubrics and scoring frameworks for evaluations
Assess data quality and impact on benchmarks to guide data strategies
Collaborate with teams to align evaluations with training goals

What they're looking for

Background in applied research and model evaluation
Strong coding skills related to ML models
Familiarity with data structures and algorithms
Experience with APIs and SQL/NoSQL databases
Ability to analyze model behavior and evaluate data quality
Willingness to work in-office in a dynamic setting

Benefits

Bi-annual performance bonuses
Equity grant vesting over 4 years
Relocation bonuses up to $15k
Housing bonuses for nearby residents
$1.5k monthly meal stipend
Free Equinox membership
