TRM Labs

Machine Learning Infrastructure Engineer

San Francisco, CA (Remote) · Full-time · Mid-level · Added 2 days ago

About this role

TRM Labs is seeking a Machine Learning Infrastructure Engineer to help build the GPU-backed infrastructure behind its AI systems for detecting financial crime. The role involves optimizing high-throughput ML workloads, implementing distributed inference strategies, and working closely with other engineering teams.

What you'll do

  • Design and manage GPU cluster infrastructure
  • Optimize high-throughput inference systems
  • Implement distributed inference strategies
  • Enhance model optimization workflows
  • Schedule and manage heterogeneous workloads
  • Build observability for ML infrastructure

What they're looking for

  • Experience with distributed systems
  • ML/LLM inference on GPU clusters
  • Understanding of high-throughput inference
  • Familiarity with ML serving frameworks
  • Kubernetes or similar orchestration systems
  • Ability to debug GPU-related issues
  • Adaptability and autonomy
  • Strong communication skills

Benefits

  • Impactful work in crime prevention
  • Collaborative team environment
  • Opportunity to leverage cutting-edge technology
  • Mission-driven organization
  • Diverse and inclusive workplace