42dot
LLM Engineer (LLM Evaluation)
Pangyo (Software Dream Center), South Korea (Remote) | Full-time | Mid-level | Added 2 days ago
About this role
42dot, a mobility AI company under Hyundai Motor Group, seeks an LLM Engineer to design and build evaluation systems for assessing large language model performance. You'll create benchmark datasets, evaluation protocols, and automated workflows using Kubernetes, Argo Workflows, and MLflow to ensure continuous model quality improvement.
What you'll do
- Design benchmark datasets and evaluation metrics (human and LLM-based) for reliable LLM performance assessment
- Establish evaluation protocols and ensure reproducibility for fair model comparisons
- Build automated evaluation workflows with Argo Workflows and MLflow, integrated with ML pipelines
- Design automated regression detection and alerting systems for model deployment validation
- Operate repeatable evaluation workflows to validate large-scale model quality and stability
- Drive continuous model quality improvement processes based on evaluation results
What they're looking for
- LLM evaluation and benchmarking (3+ years of experience)
- Deep Learning and NLP research/development
- LLM evaluation frameworks (lm-eval, HELM, OpenAI Evals)
- Python service development with async/concurrent processing
- ML experiment management and reproducibility
- Model validation workflow design
- Kubernetes and containerized environments (preferred)
- GPU distributed inference and large-scale model evaluation (preferred)