Baseten

Software Engineer - Voice AI (Inference Runtime)

San Francisco (Remote) · $165k–$330k · Full-time · Mid-level · Added 2 days ago

About this role

Baseten seeks a Software Engineer to own the Voice AI inference runtime stack, building production-grade systems for speech-to-text, text-to-speech, and voice agent workloads. You'll lead product areas end-to-end, optimize model serving for latency and throughput, and collaborate across teams to enable real-time voice AI at scale.

What you'll do

  • Own Voice AI product areas end-to-end from architecture through implementation, rollout, and production operations
  • Design and operate real-time, large-scale model serving systems for STT, TTS, and voice agent workloads
  • Optimize the inference stack for tail latency (p95/p99), throughput, and GPU efficiency through profiling and tuning
  • Build infrastructure for multi-model voice agents that orchestrates streaming I/O across components
  • Drive cross-team collaboration on full-stack technical problems and coordinate delivery
  • Mentor teammates through code reviews, design docs, and technical leadership

What they're looking for

  • Production systems design and ownership, with a focus on tail latency optimization
  • Python and general software engineering proficiency
  • Real-time, large-scale distributed systems
  • Machine learning infrastructure and model serving
  • Use of AI coding assistants (Claude, Cursor, Codex) to boost productivity
  • Collaboration and cross-functional communication
  • Model runtime optimizations (dynamic batching, scheduling, decode optimization)
  • Containerization, orchestration, or distributed scheduling (Docker, Kubernetes)