
Baseten

Software Engineer - Training Infrastructure

San Francisco (Remote) | $165k–$330k | Full-time | Mid-level | Added 2 days ago

About this role

Baseten is seeking a Software Engineer to design and build scalable infrastructure systems for its ML training platform. You'll architect scheduling, storage, and networking systems while partnering with research engineers to deliver reliable, high-performance training solutions for AI companies.

What you'll do

  • Design scalable infrastructure systems including scheduling, storage, and networking for ML training
  • Partner with developers and research engineers to translate training requirements into technical solutions
  • Architect a global training scheduler and reinforcement learning systems
  • Drive reliability improvements and development velocity across the platform
  • Make critical architectural decisions balancing performance and system reliability
  • Lead technical discussions and mentor junior engineers on infrastructure best practices

What they're looking for

  • Go programming (Python a plus)
  • Kubernetes in production environments
  • Distributed systems design and performance tuning
  • Observability systems design
  • AWS and GCP cloud platforms
  • ML/AI workloads and MLOps platforms
  • Distributed storage systems
  • Workload orchestration platforms

Benefits

  • Competitive compensation with meaningful equity
  • 100% medical, dental, and vision insurance coverage for employees and their dependents
  • Flexible PTO policy with company-wide winter break
  • Paid parental leave
  • Fertility and family-building stipend through Carrot
  • Company-facilitated 401(k)