Bland

Machine Learning Researcher, Audio

San Francisco · $140k–$250k · Full-time · Mid-level · Added 2 days ago

About this role

Bland is seeking an ML researcher to develop foundational audio technologies for enterprise AI phone agents, including speech-to-text, text-to-speech, and neural audio codecs. You'll move research from theory through large-scale training to production systems serving millions of daily calls, collaborating across research and engineering teams.

What you'll do

  • Design and train large-scale text-to-speech models with expressive, controllable output, and optimize them for real-time inference
  • Build robust speech-to-text systems that handle accents, noise, telephony artifacts, and code-switching
  • Research and implement neural audio codecs achieving high compression with minimal perceptual loss
  • Develop and scale distributed training pipelines for massive multilingual audio datasets
  • Design rigorous ablation studies and experiments with both objective metrics and perceptual evaluations
  • Curate audio datasets and implement data filtering and staged training strategies

What they're looking for

  • Self-supervised learning and generative modeling
  • Text-to-speech (TTS) system development and scaling
  • Automatic speech recognition (ASR) and robustness techniques
  • Neural audio codecs and audio compression
  • Distributed GPU training and large-scale ML pipelines
  • Experimental design and rigorous evaluation methodology
  • Audio signal processing and multimodal modeling
  • Python and deep learning frameworks

Benefits

  • Work remotely or from the San Francisco office
  • Work on foundational voice AI research at scale
  • Work at a well-funded company backed by leading Silicon Valley investors
  • Opportunity to impact enterprise customer interactions through voice technology