Cursor

Software Engineer, Agent Evaluation and Quality

San Francisco · Full-time · Mid-level · Added 3 days ago

About this role

Build evaluation and quality measurement infrastructure for Cursor's AI coding agent. You'll design datasets, evaluation pipelines, and feedback loops that help the agent improve reliably over time, working across product, data, and engineering teams to turn insights into measurable improvements.

What you'll do

Design and build AI evaluation systems with curated datasets, offline replay, scorers, and monitoring dashboards
Create feedback loops from real user data to inform model and system improvements
Develop analysis tools for debugging agent behavior and identifying failure patterns
Define quality metrics and operational guardrails for agent reliability
Partner with research, product, and infrastructure teams on quality improvements
Build pipelines to analyze agent behavior at scale and surface actionable insights

What they're looking for

AI/ML evaluation and measurement systems
Data pipeline and infrastructure development
Python or similar backend languages
SQL and data analysis
Metrics design and experimentation
Debugging and root cause analysis
Cross-team collaboration
Knowledge of LLMs and AI agents
