Anyscale
Distributed LLM Inference Engineer
San Francisco (Remote) · Full-time · Mid-level · Added 2 days ago
About this role
Anyscale is seeking a Distributed LLM Inference Engineer to build and optimize high-performance, large-scale inference systems using Ray. You'll work across the stack: integrating Ray Data with LLM engines, collaborating with open-source projects such as vLLM, and shipping end-to-end solutions for batch and online inference.
What you'll do
- Develop and optimize batch and online inference solutions at scale for Ray users and Anyscale customers
- Integrate Ray Data with LLM engines to achieve cost-effective large-scale ML inference
- Collaborate with open-source projects like vLLM and contribute improvements back to the community
- Implement state-of-the-art techniques from research and open-source communities
- Work across the full stack to balance throughput, latency, and cost for inference workloads
- Partner with product teams to ship end-to-end solutions quickly
What they're looking for
- Large-scale ML inference optimization
- Distributed systems design
- Deep learning frameworks (PyTorch, TensorFlow)
- Python programming
- Ray framework experience
- GPU/CUDA programming
- ML systems knowledge
- LLM inference engines (vLLM, TensorRT-LLM)
Benefits
- Stock options
- Healthcare plans with 99% premium coverage for employees and dependents
- 401k retirement plan
- Education and wellbeing stipend
- Paid parental leave and fertility benefits
- Paid time off