Baseten
Software Engineer - Voice AI (Inference Runtime)
San Francisco (Remote) | $165k–$330k | Full-time | Mid-level | Added 2 days ago
About this role
Baseten seeks a Software Engineer to own the Voice AI inference runtime stack, building production-grade systems for speech-to-text, text-to-speech, and voice agent workloads. You'll lead product areas end-to-end, optimize model serving for latency and throughput, and collaborate across teams to enable real-time voice AI at scale.
What you'll do
- Own Voice AI product areas end-to-end from architecture through implementation, rollout, and production operations
- Design and operate real-time, large-scale model serving systems for STT, TTS, and voice agent workloads
- Optimize the inference stack for latency (p95/p99), throughput, and GPU efficiency through profiling and tuning
- Build infrastructure for multi-model voice agents with streaming I/O orchestration across components
- Drive cross-team collaboration on full-stack technical problems and coordinate delivery
- Mentor teammates through code reviews, design docs, and technical leadership
What they're looking for
- Production systems design and ownership with a focus on tail-latency optimization
- Python and general software engineering proficiency
- Real-time, large-scale distributed systems
- Machine learning infrastructure and model serving
- Productive use of AI coding assistants (Claude, Cursor, Codex)
- Collaboration and cross-functional communication
- Model runtime optimizations (dynamic batching, scheduling, decode optimization)
- Containerization, orchestration, or distributed scheduling (Docker, Kubernetes)