42dot

LLM Engineer (LLM Evaluation)

Pangyo (Software Dream Center), South Korea (Remote) | Full-time | Mid-level | Added 2 days ago

About this role

42dot, a mobility AI company under Hyundai Motor Group, seeks an LLM Engineer to design and build evaluation systems for assessing large language model performance. You'll create benchmark datasets, evaluation protocols, and automated workflows using Kubernetes, Argo Workflows, and MLflow to ensure continuous model quality improvement.

What you'll do

Design benchmark datasets and evaluation metrics (human and LLM-based) for reliable LLM performance assessment
Establish evaluation protocols and ensure reproducibility for fair model comparisons
Build automated evaluation workflows with Argo Workflows and MLflow, integrated into ML pipelines
Design automated regression detection and alerting systems for model deployment validation
Operate repeatable evaluation workflows to validate large-scale model quality and stability
Drive continuous model quality improvement processes based on evaluation results

What they're looking for

LLM evaluation and benchmarking (3+ years of experience)
Deep Learning and NLP research/development
LLM evaluation frameworks (lm-eval, HELM, OpenAI Evals)
Python service development with async/concurrent processing
ML experiment management and reproducibility
Model validation workflow design
Kubernetes and containerized environments (preferred)
GPU distributed inference and large-scale model evaluation (preferred)
