Glean

Machine Learning Engineer, LLM Evals & Observability

Mountain View, CA | $200k–$300k | Mid-level | Added 2 days ago

About this role

Glean seeks a Machine Learning Engineer to build evaluation and observability systems for its AI Assistant and Agents. You'll design evaluation pipelines, create LLM-powered judges, and develop observability infrastructure that measures and improves AI quality at scale.

What you'll do

Design and curate evaluation datasets, using sampling strategies and golden sets to ensure representative coverage
Build large-scale evaluation pipelines measuring assistant quality across thousands of real user queries
Develop LLM-powered judges that score correctness, completeness, and response quality, calibrated against human judgment
Evaluate new models and product changes to gate launches and prevent quality regressions
Build observability infrastructure including trace enrichment, data pipelines, and dashboards for agent behavior
Close feedback loops between quality measurement and product improvement

What they're looking for

Machine learning and LLM evaluation methodologies
Large-scale data pipeline and infrastructure engineering
Python or a similar programming language
Statistical analysis and metrics design
Prompt engineering and LLM-as-a-judge techniques
Observability and monitoring systems
SQL and data analysis
AI/ML systems design
