Baseten

Software Engineer - BIS (Baseten Inference Stack)

San Francisco (Remote) | $180k–$360k | Full-time | Mid-level | Added 2 days ago

About this role

Join Baseten's Inference Stack team to build the distributed runtime powering large-scale LLM inference. You'll work across the full stack—from developer tools to Kubernetes orchestration—enabling AI companies to deploy cutting-edge models with industry-leading performance and reliability.

What you'll do

Develop infrastructure and orchestration systems for distributed LLM inference at scale
Build platform capabilities for routing, autoscaling, scheduling, observability, and runtime management
Work across the stack from customer-facing features to low-level infrastructure components
Collaborate with Model Performance engineers to make inference optimizations accessible and configurable
Debug complex production systems spanning Kubernetes, distributed runtimes, networking, and GPU workloads
Own end-to-end projects from architecture through deployment and iteration based on customer feedback

What they're looking for

Distributed systems and backend infrastructure
Platform engineering and production systems
Kubernetes and container orchestration
Debugging across multiple stack layers
Developer experience design
GPU workload management
Willingness to learn new languages and frameworks
Strong communication and collaboration

Benefits

Work on mission-critical inference powering leading AI companies like Cursor and Notion
Recently raised $1.5B Series F with top-tier investors
Located in San Francisco
Opportunity to own systems in production
Cross-functional collaboration with Model Performance teams
Impact on frontier AI infrastructure
