Bland

Machine Learning Researcher, Audio

San Francisco | $140k–$250k | Full-time | Mid-level | Added 2 days ago

About this role

Bland is seeking an ML researcher to develop foundational audio technologies for enterprise AI phone agents, including speech-to-text, text-to-speech, and neural audio codecs. You'll move research from theory through large-scale training to production systems serving millions of daily calls, collaborating across research and engineering teams.

What you'll do

Design and train large-scale text-to-speech models with expressive, controllable output, and optimize them for real-time inference
Build robust speech-to-text systems that handle accents, background noise, telephony artifacts, and code-switching
Research and implement neural audio codecs achieving high compression with minimal perceptual loss
Develop and scale distributed training pipelines for massive multilingual audio datasets
Design rigorous ablation studies and experiments, evaluated with both objective metrics and perceptual assessments
Curate audio datasets and implement data filtering and staged training strategies

What they're looking for

Self-supervised learning and generative modeling
Text-to-speech (TTS) system development and scaling
Automatic speech recognition (ASR) and robustness techniques
Neural audio codecs and audio compression
Distributed GPU training and large-scale ML pipelines
Experimental design and rigorous evaluation methodology
Audio signal processing and multimodal modeling
Python and deep learning frameworks

Benefits

Remote or San Francisco office location
Work on foundational voice AI research at scale
Join a well-funded company backed by leading Silicon Valley investors
Opportunity to impact enterprise customer interactions through voice technology
