Glean

Machine Learning Engineer, LLM Evals & Observability

San Francisco, CA · $200k–$300k · Mid-level · Added 2 days ago

About this role

Glean is seeking a Machine Learning Engineer to build the evaluation and observability systems that measure and improve the quality of its AI Assistant and Agents. You'll design evaluation pipelines, create LLM-powered judges, curate datasets, and build observability infrastructure that enables continuous quality improvement at scale.

What you'll do

  • Design and curate evaluation datasets, using sampling strategies and golden sets to ensure representative coverage
  • Build large-scale evaluation pipelines measuring assistant quality across thousands of real user queries
  • Develop LLM-powered judges that score correctness, completeness, and response quality aligned with human judgment
  • Gate product launches and prevent regressions by evaluating new models and changes before shipping
  • Create observability infrastructure for AI agents including trace enrichment, data pipelines, and dashboards
  • Use evaluation results to inform and drive quality improvements in assistant and agent systems

What they're looking for

  • Machine learning and LLM evaluation methodologies
  • Large-scale data pipeline and infrastructure engineering
  • Python and software engineering best practices
  • Statistical analysis and sampling strategies
  • LLM prompt engineering and judge calibration
  • Observability and monitoring systems design
  • SQL and data warehouse experience
  • Product-oriented thinking and cross-functional collaboration