Glean

Machine Learning Engineer, LLM Evals & Observability

Mountain View, CA | $200k–$300k | Mid-level | Added 2 days ago

About this role

Glean seeks a Machine Learning Engineer to build evaluation and observability systems for its AI Assistant and Agents. You'll design evaluation pipelines, create LLM-powered judges, and develop observability infrastructure that measures and improves AI quality at scale.

What you'll do

  • Design and curate evaluation datasets with sampling strategies and golden sets for representative coverage
  • Build large-scale evaluation pipelines measuring assistant quality across thousands of real user queries
  • Develop LLM-powered judges that score correctness, completeness, and response quality against human judgment
  • Evaluate new models and product changes to gate launches and prevent quality regressions
  • Build observability infrastructure including trace enrichment, data pipelines, and dashboards for agent behavior
  • Close feedback loops between quality measurement and product improvement

What they're looking for

  • Machine learning and LLM evaluation methodologies
  • Large-scale data pipeline and infrastructure engineering
  • Python or similar programming languages
  • Statistical analysis and metrics design
  • Prompt engineering and LLM-as-a-judge techniques
  • Observability and monitoring systems
  • SQL and data analysis
  • AI/ML systems design