Baseten

Software Engineer - Voice AI (Inference Runtime)

San Francisco (Remote) | $165k–$330k | Full-time | Mid-level | Added 2 days ago

About this role

Baseten seeks a Software Engineer to own the Voice AI inference runtime stack, building production-grade systems for speech-to-text, text-to-speech, and voice agent workloads. You'll lead product areas end-to-end, optimize model serving for latency and throughput, and collaborate across teams to enable real-time voice AI at scale.

What you'll do

Own Voice AI product areas end-to-end from architecture through implementation, rollout, and production operations
Design and operate real-time, large-scale model serving systems for STT, TTS, and voice agent workloads
Optimize the inference stack for latency (p95/p99), throughput, and GPU efficiency through profiling and tuning
Build infrastructure for multi-model voice agents with streaming I/O orchestration across components
Drive cross-team collaboration on full-stack technical problems and coordinate delivery
Mentor teammates through code reviews, design docs, and technical leadership

What they're looking for

Production systems design and ownership with a focus on tail latency optimization
Python and general software engineering proficiency
Real-time, large-scale distributed systems
Machine learning infrastructure and model serving
Productive use of AI coding assistants (Claude, Cursor, Codex)
Collaboration and cross-functional communication
Model runtime optimizations (dynamic batching, scheduling, decode optimization)
Containerization, orchestration, or distributed scheduling (Docker, Kubernetes)
