Periodic Labs

Distributed Training Engineer

Menlo Park, Remote · Full-time · Mid-level · Added 2 days ago

About this role

Periodic Labs is seeking a Distributed Training Engineer to enhance and manage large-scale distributed LLM training systems for scientific research. The ideal candidate will collaborate with researchers and contribute to open-source frameworks while optimizing mid-training workflows.

What you'll do

  • Optimize and operate large-scale distributed LLM training systems
  • Collaborate with researchers on debugging and maintaining workflows
  • Support frontier-scale experiments in AI and science
  • Contribute to open-source large-scale LLM training frameworks
  • Develop tools for distributed training operations

What they're looking for

  • Experience with clusters of ≥5,000 GPUs
  • Experience with 5D-parallel LLM training
  • Familiarity with distributed training frameworks
  • Proficiency in optimizing training throughput
  • Knowledge of Mixture-of-Experts (MoE) models

Benefits

  • Well-funded and rapidly growing company
  • Opportunity for ownership and problem-solving
  • Exposure to new tools and scientific advancements