Cohere

Audio Inference Engineer, Model Efficiency

New York (Remote) | Full-time | Mid-level | Added 2 days ago

About this role

Cohere seeks an Audio Inference Engineer to optimize machine learning model serving for audio workloads, focusing on reducing latency and improving throughput. You'll collaborate with training and infrastructure teams to advance real-time audio inference systems and solve complex performance bottlenecks.

What you'll do

Develop and optimize high-performance audio inference systems
Identify and resolve bottlenecks in audio model serving pipelines
Collaborate with training and serving infrastructure teams on model deployment
Advance core metrics including latency, throughput, and quality for audio processing
Work on real-time and streaming audio inference architectures
Deliver solutions for optimizing audio processing and streaming workloads

What they're looking for

C++ and Python programming
Deep learning models for audio, speech, or language applications
GPU programming and low-level system optimization
Machine learning frameworks (PyTorch, TensorFlow, or audio libraries)
Inference frameworks (vLLM, SGLang, TensorRT-LLM, or custom systems)
Model parallelization across multiple GPUs
Real-time streaming architectures
Sequence modeling and transformer-based audio systems

Benefits

Weekly lunch stipend ($75 or equivalent)
Full health, dental, and mental health coverage
6 weeks paid vacation (30 working days)
100% parental leave top-up for up to 6 months
Annual enrichment budget for arts, fitness, and professional development
Home office stipend and remote-friendly work environment
