Glean
Machine Learning Engineer, LLM Evals & Observability
Mountain View, CA | $200k–$300k | mid | Added 2 days ago
About this role
Glean seeks a Machine Learning Engineer to build evaluation and observability systems for its AI Assistant and Agents. You'll design evaluation pipelines, create LLM-powered judges, and develop observability infrastructure that measures and improves AI quality at scale.
What you'll do
- Design and curate evaluation datasets, using sampling strategies and golden sets to ensure representative coverage
- Build large-scale evaluation pipelines measuring assistant quality across thousands of real user queries
- Develop LLM-powered judges that score correctness, completeness, and response quality in line with human judgment
- Evaluate new models and product changes to gate launches and prevent quality regressions
- Build observability infrastructure including trace enrichment, data pipelines, and dashboards for agent behavior
- Close feedback loops between quality measurement and product improvement
What they're looking for
- Machine learning and LLM evaluation methodologies
- Large-scale data pipeline and infrastructure engineering
- Python or similar programming languages
- Statistical analysis and metrics design
- Prompt engineering and LLM-as-a-judge techniques
- Observability and monitoring systems
- SQL and data analysis
- AI/ML systems design