Glean

Machine Learning Engineer, LLM Evals & Observability

San Francisco, CA | $200k–$300k | mid | Added 2 days ago

About this role

Glean is seeking a Machine Learning Engineer to build the evaluation and observability systems that measure and improve the quality of its AI Assistant and Agents. You'll design evaluation pipelines, create LLM-powered judges, curate datasets, and build observability infrastructure that enables continuous quality improvement at scale.

What you'll do

Design and curate evaluation datasets, using sampling strategies and golden sets to ensure representative coverage
Build large-scale evaluation pipelines measuring assistant quality across thousands of real user queries
Develop LLM-powered judges whose scores for correctness, completeness, and response quality align with human judgment
Gate product launches and prevent regressions by evaluating new models and changes before shipping
Create observability infrastructure for AI agents including trace enrichment, data pipelines, and dashboards
Use evaluation results to inform and drive quality improvements in assistant and agent systems

What they're looking for

Machine learning and LLM evaluation methodologies
Large-scale data pipeline and infrastructure engineering
Python and software engineering best practices
Statistical analysis and sampling strategies
LLM prompt engineering and judge calibration
Observability and monitoring systems design
SQL and data warehouse experience
Product-oriented thinking and cross-functional collaboration
