Bland
Machine Learning Researcher, Audio
San Francisco · $140k–$250k · Full-time · Mid-level · Added 2 days ago
About this role
Bland is seeking an ML researcher to develop foundational audio technologies for enterprise AI phone agents, including speech-to-text, text-to-speech, and neural audio codecs. You'll move research from theory through large-scale training to production systems serving millions of daily calls, collaborating across research and engineering teams.
What you'll do
- Design and train large-scale text-to-speech models with expressive, controllable output, and optimize them for real-time inference
- Build robust speech-to-text systems that handle accents, noise, telephony artifacts, and code-switching
- Research and implement neural audio codecs achieving high compression with minimal perceptual loss
- Develop and scale distributed training pipelines for massive multilingual audio datasets
- Design rigorous ablation studies and experiments with both objective metrics and perceptual evaluations
- Curate audio datasets and implement data filtering and staged training strategies
What they're looking for
- Self-supervised learning and generative modeling
- Text-to-speech (TTS) system development and scaling
- Automatic speech recognition (ASR) and robustness techniques
- Neural audio codecs and audio compression
- Distributed GPU training and large-scale ML pipelines
- Experimental design and rigorous evaluation methodology
- Audio signal processing and multimodal modeling
- Python and deep learning frameworks
Benefits
- Remote or San Francisco office location
- Work on foundational voice AI research at scale
- Work at a well-funded company backed by leading Silicon Valley investors
- Opportunity to impact enterprise customer interactions through voice technology