TRM Labs
Machine Learning Infrastructure Engineer
San Francisco, CA (Remote) | Full-time | Mid-level | Added 2 days ago
About this role
TRM Labs is seeking a Machine Learning Infrastructure Engineer to help build robust GPU-backed infrastructure for AI systems that detect financial crime. The role involves optimizing high-throughput ML workloads, implementing distributed inference strategies, and collaborating closely with other engineering teams.
What you'll do
- Design and manage GPU cluster infrastructure
- Optimize high-throughput inference systems
- Implement distributed inference strategies
- Enhance model optimization workflows
- Schedule and manage heterogeneous workloads
- Build observability for ML infrastructure
What they're looking for
- Experience with distributed systems
- ML/LLM inference on GPU clusters
- Understanding of high-throughput inference
- Familiarity with ML serving frameworks
- Kubernetes or other orchestration systems
- Ability to debug GPU-related issues
- Adaptability and autonomy
- Strong communication skills
Benefits
- Impactful work in crime prevention
- Collaborative team environment
- Opportunity to leverage cutting-edge technology
- Mission-driven organization
- Diverse and inclusive workplace