Cursor
Software Engineer, Agent Evaluation and Quality
San Francisco | Full-time | Mid-level | Added 3 days ago
About this role
Build evaluation and quality measurement infrastructure for Cursor's AI coding agent. You'll design datasets, evaluation pipelines, and feedback loops that help the agent improve reliably over time, working across product, data, and engineering teams to turn insights into measurable improvements.
What you'll do
- Design and build AI evaluation systems with curated datasets, offline replay, scorers, and monitoring dashboards
- Create feedback loops from real user data to inform model and system improvements
- Develop analysis tools for debugging agent behavior and identifying failure patterns
- Define quality metrics and operational guardrails for agent reliability
- Partner with research, product, and infrastructure teams on quality improvements
- Build pipelines to analyze agent behavior at scale and surface actionable insights
What they're looking for
- AI/ML evaluation and measurement systems
- Data pipeline and infrastructure development
- Python or similar backend languages
- SQL and data analysis
- Metrics design and experimentation
- Debugging and root cause analysis
- Cross-team collaboration
- Knowledge of LLMs and AI agents