Baseten

Software Engineer - Model Products

San Francisco (Remote) | $180k–$360k | Full-time | Mid-level | Added 2 days ago

About this role

Baseten seeks a Software Engineer to own the Model APIs infrastructure that powers hosted endpoints for open-source AI models. You'll optimize inference performance, build serving capabilities, and develop the platform engineers use to deploy models at scale.

What you'll do

  • Design and operate Model APIs with advanced features like structured outputs, function calling, and multi-modal serving
  • Profile and optimize GPU kernels, implement custom CUDA operators, and tune memory patterns for high throughput
  • Productionize performance improvements including speculative decoding, quantization, and KV-cache optimization
  • Build comprehensive benchmarking frameworks to measure real-world performance across architectures and hardware
  • Implement platform fundamentals: API versioning, validation, usage metering, quotas, and authentication
  • Instrument services for observability and collaborate across teams to deliver robust, developer-friendly model serving

What they're looking for

  • Distributed systems design and operation
  • Low-latency backend services and API development
  • GPU/CUDA performance optimization and profiling
  • LLM inference runtimes (vLLM, TensorRT-LLM, SGLang preferred)
  • System debugging and observability (metrics, traces, logs)
  • Kubernetes, service meshes, or distributed scheduling
  • Written communication and technical documentation
  • Infrastructure capacity planning and SLO management

Benefits

  • Competitive compensation with meaningful equity
  • 100% medical, dental, and vision insurance coverage for employees and dependents
  • Flexible PTO with company-wide winter break closure
  • Paid parental leave and fertility/family-building stipend
  • Company-facilitated 401(k)
  • Learning and networking exposure across the AI startup ecosystem