Baseten

Software Engineer - Model Products

San Francisco (Remote) | $180k–$360k | Full-time | Mid-level | Added 2 days ago

About this role

Baseten seeks a Software Engineer to own the Model APIs infrastructure that powers hosted endpoints for open-source AI models. You'll optimize inference performance, build serving capabilities, and develop the platform that engineers use to deploy models at scale.

What you'll do

Design and operate Model APIs with advanced features like structured outputs, function calling, and multi-modal serving
Profile and optimize GPU kernels, implement custom CUDA operators, and tune memory patterns for high throughput
Productionize performance improvements including speculative decoding, quantization, and KV-cache optimization
Build comprehensive benchmarking frameworks to measure real-world performance across architectures and hardware
Implement platform fundamentals: API versioning, validation, usage metering, quotas, and authentication
Instrument observability and collaborate across teams on robust, developer-friendly model serving

What they're looking for

Distributed systems design and operation
Low-latency backend services and API development
GPU/CUDA performance optimization and profiling
LLM inference runtimes (vLLM, TensorRT-LLM, SGLang preferred)
System debugging and observability (metrics, traces, logs)
Kubernetes, service meshes, or distributed scheduling
Written communication and technical documentation
Infrastructure capacity planning and SLO management

Benefits

Competitive compensation with meaningful equity
100% medical, dental, and vision insurance coverage for employees and dependents
Flexible PTO with company-wide winter break closure
Paid parental leave and fertility/family-building stipend
Company-facilitated 401(k)
Learning and networking exposure across the AI startup ecosystem
