Anyscale

Distributed LLM Inference Engineer

San Francisco (Remote) | Full-time | Mid-level | Added 2 days ago

About this role

Anyscale is seeking a Distributed LLM Inference Engineer to optimize and build high-performance inference systems at scale using Ray. You'll work across the stack integrating Ray Data with LLM engines, collaborate with open-source communities like vLLM, and ship end-to-end solutions for batch and online inference.

What you'll do

Develop and optimize batch and online inference solutions at scale for Ray users and Anyscale customers
Integrate Ray Data with LLM engines to achieve cost-effective large-scale ML inference
Collaborate with open-source projects like vLLM and contribute improvements back to the community
Implement state-of-the-art techniques from research and open-source communities
Work across the full stack to balance throughput, latency, and cost for inference workloads
Partner with product teams to ship end-to-end solutions quickly

What they're looking for

Large-scale ML inference optimization
Distributed systems design
Deep learning frameworks (PyTorch, TensorFlow)
Python programming
Ray framework experience
GPU/CUDA programming
ML systems knowledge
LLM inference engines (vLLM, TensorRT-LLM)

Benefits

Stock options
Healthcare plans with 99% premium coverage for employees and dependents
401(k) retirement plan
Education and wellbeing stipend
Paid parental leave and fertility benefits
Paid time off
