Baseten
Software Engineer - Training Infrastructure
San Francisco (Remote) | $165k–$330k | Full-time | Mid-level | Added 2 days ago
About this role
Baseten is seeking a Software Engineer to design and build scalable infrastructure systems for its ML training platform. You'll architect scheduling, storage, and networking systems while partnering with research engineers to deliver reliable, high-performance training solutions for AI companies.
What you'll do
- Design scalable infrastructure systems including scheduling, storage, and networking for ML training
- Partner with developers and research engineers to translate training requirements into technical solutions
- Architect a global training scheduler and reinforcement learning systems
- Drive improvements in reliability and development velocity across the platform
- Make critical architectural decisions that balance performance and system reliability
- Lead technical discussions and mentor junior engineers on infrastructure best practices
What they're looking for
- Go programming (Python a plus)
- Kubernetes in production environments
- Distributed systems design and performance tuning
- Observability systems design
- AWS and GCP cloud platforms
- ML/AI workloads and MLOps platforms
- Distributed storage systems
- Workload orchestration platforms
Benefits
- Competitive compensation with meaningful equity
- 100% medical, dental, and vision insurance coverage for employees and their dependents
- Flexible PTO policy with company-wide winter break
- Paid parental leave
- Fertility and family-building stipend through Carrot
- Company-facilitated 401(k)