Baseten
Software Engineer - Model Products
San Francisco (Remote) · $180k–$360k · Full-time · Mid-level · Added 2 days ago
About this role
Baseten seeks a Software Engineer to own the Model APIs infrastructure that powers hosted endpoints for open-source AI models. You'll optimize inference performance, build serving capabilities, and develop the platform that engineers use to deploy models at scale.
What you'll do
- Design and operate Model APIs with advanced features like structured outputs, function calling, and multi-modal serving
- Profile and optimize GPU kernels, implement custom CUDA operators, and tune memory patterns for high throughput
- Productionize performance improvements including speculative decoding, quantization, and KV-cache optimization
- Build comprehensive benchmarking frameworks to measure real-world performance across architectures and hardware
- Implement platform fundamentals: API versioning, validation, usage metering, quotas, and authentication
- Instrument observability and collaborate across teams on robust, developer-friendly model serving
What they're looking for
- Distributed systems design and operation
- Low-latency backend services and API development
- GPU/CUDA performance optimization and profiling
- LLM inference runtimes (vLLM, TensorRT-LLM, SGLang preferred)
- System debugging and observability (metrics, traces, logs)
- Kubernetes, service meshes, or distributed scheduling
- Written communication and technical documentation
- Infrastructure capacity planning and SLO management
Benefits
- Competitive compensation with meaningful equity
- 100% medical, dental, and vision insurance coverage for employees and dependents
- Flexible PTO with company-wide winter break closure
- Paid parental leave and fertility/family-building stipend
- Company-facilitated 401(k)
- Learning and networking exposure across the AI startup ecosystem