Cohere
Audio Inference Engineer, Model Efficiency
New York (Remote) | Full-time | Mid-level | Added 2 days ago
About this role
Cohere seeks an Audio Inference Engineer to optimize machine learning model serving for audio workloads, focusing on reducing latency and improving throughput. You'll collaborate with training and infrastructure teams to advance real-time audio inference systems and solve complex performance bottlenecks.
What you'll do
- Develop and optimize high-performance audio inference systems
- Identify and resolve bottlenecks in audio model serving pipelines
- Collaborate with training and serving infrastructure teams on model deployment
- Advance core metrics including latency, throughput, and quality for audio processing
- Work on real-time and streaming audio inference architectures
- Deliver solutions for optimizing audio processing and streaming workloads
What they're looking for
- C++ and Python programming
- Deep learning models for audio, speech, or language applications
- GPU programming and low-level system optimization
- Machine learning frameworks (PyTorch, TensorFlow, or audio libraries)
- Inference frameworks (vLLM, SGLang, TensorRT-LLM, or custom systems)
- Model parallelization across multiple GPUs
- Real-time streaming architectures
- Sequence modeling and transformer-based audio systems
Benefits
- Weekly lunch stipend ($75 or equivalent)
- Full health, dental, and mental health coverage
- 6 weeks paid vacation (30 working days)
- 100% parental leave top-up for up to 6 months
- Annual enrichment budget for arts, fitness, and professional development
- Home office stipend and remote-friendly work environment