Baseten

Software Engineer - BIS (Baseten Inference Stack)

San Francisco (Remote) | $180k–$360k | Full-time | Mid-level | Added 2 days ago

About this role

Join Baseten's Inference Stack team to build the distributed runtime powering large-scale LLM inference. You'll work across the full stack—from developer tools to Kubernetes orchestration—enabling AI companies to deploy cutting-edge models with industry-leading performance and reliability.

What you'll do

  • Develop infrastructure and orchestration systems for distributed LLM inference at scale
  • Build platform capabilities for routing, autoscaling, scheduling, observability, and runtime management
  • Work across the stack from customer-facing features to low-level infrastructure components
  • Collaborate with Model Performance engineers to make inference optimizations accessible and configurable
  • Debug complex production systems spanning Kubernetes, distributed runtimes, networking, and GPU workloads
  • Own end-to-end projects from architecture through deployment and iteration based on customer feedback

What they're looking for

  • Distributed systems and backend infrastructure
  • Platform engineering and production systems
  • Kubernetes and container orchestration
  • Debugging across multiple stack layers
  • Developer experience design
  • GPU workload management
  • Willingness to learn new languages and frameworks
  • Strong communication and collaboration

Benefits

  • Work on mission-critical inference powering leading AI companies like Cursor and Notion
  • Recently raised $1.5B Series F with top-tier investors
  • Located in San Francisco
  • Opportunity to own systems in production
  • Cross-functional collaboration with Model Performance teams
  • Impact on frontier AI infrastructure