Cursor

Software Engineer, Agent Evaluation and Quality

San Francisco | Full-time | Mid-level | Added 3 days ago

About this role

Build evaluation and quality measurement infrastructure for Cursor's AI coding agent. You'll design datasets, evaluation pipelines, and feedback loops that help the agent improve reliably over time, working across product, data, and engineering teams to turn insights into measurable improvements.

What you'll do

  • Design and build AI evaluation systems with curated datasets, offline replay, scorers, and monitoring dashboards
  • Create feedback loops from real user data to inform model and system improvements
  • Develop analysis tools for debugging agent behavior and identifying failure patterns
  • Define quality metrics and operational guardrails for agent reliability
  • Partner with research, product, and infrastructure teams on quality improvements
  • Build pipelines to analyze agent behavior at scale and surface actionable insights

What they're looking for

  • AI/ML evaluation and measurement systems
  • Data pipeline and infrastructure development
  • Python or similar backend languages
  • SQL and data analysis
  • Metrics design and experimentation
  • Debugging and root cause analysis
  • Cross-team collaboration
  • Knowledge of LLMs and AI agents