
Firecrawl

Research Engineer – Evals

San Francisco, CA (Hybrid) or Remote (Americas, UTC-3 to UTC-10) · $160k–$240k · Full-time · Mid-level · Added 2 days ago

About this role

Design and build the evaluation infrastructure for Firecrawl's web data extraction platform, owning the metrics, pipelines, and datasets that measure output quality across millions of websites. You'll translate evaluation findings into training signals for models and RL systems, working at the intersection of rigorous measurement and practical product impact.

What you'll do

  • Design and implement evaluation metrics and pipelines for scrape, crawl, extract, and map operations across diverse web formats
  • Build benchmark datasets that represent the real-world distribution of customer data, including edge cases and difficult scenarios
  • Develop and validate LLM-as-judge systems for automated quality scoring, with human review tooling for edge cases
  • Integrate evals into CI/CD to catch regressions before production deployment
  • Work with RL and research engineers to convert quality measurements into reward signals and training feedback loops
  • Design and execute experiments to test hypotheses, communicating findings clearly to drive product and model decisions

What they're looking for

  • ML engineering and applied AI experience with production systems
  • LLM evaluation methodology and LLM-as-judge system design
  • Data quality assessment and unstructured data handling
  • Python and evaluation infrastructure development
  • Benchmark and dataset design with human labeling workflows
  • Metrics design and statistical rigor
  • CI/CD integration and automated testing
  • Clear technical communication and experimentation

Benefits

  • Salary: $160,000–$240,000/year (adjusted for location)
  • Equity: Up to 0.10%
  • Flexible location: San Francisco hybrid or remote (Americas, UTC-3 to UTC-10)
  • Full-time role at a fast-growing company (8-figure ARR, 120k+ GitHub stars)
  • Work on essential infrastructure for AI data extraction
  • Direct influence on model training and product decisions