agi-inc

Research Engineer - Evals

San Francisco Office · Full-time · Mid-level · Added 2 days ago

About this role

Join a stealth AI startup that builds consumer-grade AGI agents, and develop the evaluation infrastructure that gates all model and agent releases. You'll own the eval harness across capability, behavior, and user experience, setting the standard for what ships and protecting that standard from product pressure.

What you'll do

  • Build and maintain eval suites for model capabilities, agent behavior, regressions, and human-rated rubrics
  • Create dashboards and tooling to accelerate researcher experiments and inform leadership decisions
  • Define and defend the quality bar for what's ready to ship
  • Instrument real-user behavior on real devices to inform product decisions
  • Advise research and product teams on measurement strategies for non-deterministic systems
  • Translate eval results into language that partners and OEMs can hold the company accountable to

What they're looking for

  • Evaluating non-deterministic systems and agentic behavior
  • Long-horizon task measurement and tool-use assessment
  • Multilingual AI behavior evaluation
  • Metrics design and preventing gaming of metrics
  • Dashboard and instrumentation tooling
  • On-device performance measurement
  • Benchmark and evaluation system design
  • Communication across research, product, and partnerships

Benefits

  • Competitive cash compensation and meaningful equity
  • Top-tier relocation and immigration support
  • In-person role at the San Francisco office
  • Work with elite founders and researchers from Stanford, OpenAI, and DeepMind
  • Shape research roadmap through measurement insights
  • Direct influence on product releases and company standards