Baseten
Software Engineer - BIS (Baseten Inference Stack)
San Francisco (Remote) · $180k–$360k · Full-time · Mid-level · Added 2 days ago
About this role
Join Baseten's Inference Stack team to build the distributed runtime powering large-scale LLM inference. You'll work across the full stack—from developer tools to Kubernetes orchestration—enabling AI companies to deploy cutting-edge models with industry-leading performance and reliability.
What you'll do
- Develop infrastructure and orchestration systems for distributed LLM inference at scale
- Build platform capabilities for routing, autoscaling, scheduling, observability, and runtime management
- Work across the stack from customer-facing features to low-level infrastructure components
- Collaborate with Model Performance engineers to make inference optimizations accessible and configurable
- Debug complex production systems spanning Kubernetes, distributed runtimes, networking, and GPU workloads
- Own end-to-end projects from architecture through deployment and iteration based on customer feedback
What they're looking for
- Distributed systems and backend infrastructure
- Platform engineering and production systems
- Kubernetes and container orchestration
- Debugging across multiple stack layers
- Developer experience design
- GPU workload management
- Willingness to learn new languages and frameworks
- Strong communication and collaboration
Benefits
- Work on mission-critical inference powering leading AI companies like Cursor and Notion
- Recently raised $1.5B Series F with top-tier investors
- Located in San Francisco
- Opportunity to own systems in production
- Cross-functional collaboration with Model Performance teams
- Impact on frontier AI infrastructure