Glean
Machine Learning Engineer, LLM Evals & Observability
San Francisco, CA | $200k–$300k | mid | Added 2 days ago
About this role
Glean is seeking a Machine Learning Engineer to build the evaluation and observability systems that measure and improve the quality of its AI Assistant and Agents. You'll design evaluation pipelines, create LLM-powered judges, curate datasets, and build observability infrastructure that enables continuous quality improvement at scale.
What you'll do
- Design and curate evaluation datasets with sampling strategies and golden sets for representative coverage
- Build large-scale evaluation pipelines measuring assistant quality across thousands of real user queries
- Develop LLM-powered judges that score correctness, completeness, and response quality aligned with human judgment
- Gate product launches and prevent regressions by evaluating new models and changes before shipping
- Create observability infrastructure for AI agents including trace enrichment, data pipelines, and dashboards
- Use evaluation results to inform and drive quality improvements in assistant and agent systems
What they're looking for
- Machine learning and LLM evaluation methodologies
- Large-scale data pipeline and infrastructure engineering
- Python and software engineering best practices
- Statistical analysis and sampling strategies
- LLM prompt engineering and judge calibration
- Observability and monitoring systems design
- SQL and data warehouse experience
- Product-oriented thinking and cross-functional collaboration