Anyscale

Distributed LLM Inference Engineer

San Francisco (Remote) | Full-time | Mid-level | Added 2 days ago

About this role

Anyscale is seeking a Distributed LLM Inference Engineer to build and optimize high-performance inference systems at scale using Ray. You'll work across the stack, integrating Ray Data with LLM engines, collaborating with open-source projects such as vLLM, and shipping end-to-end solutions for batch and online inference.

What you'll do

  • Develop and optimize batch and online inference solutions at scale for Ray users and Anyscale customers
  • Integrate Ray Data with LLM engines to achieve cost-effective large-scale ML inference
  • Collaborate with open-source projects like vLLM and contribute improvements back to the community
  • Implement state-of-the-art techniques from research and open-source communities
  • Work across the full stack to balance throughput, latency, and cost for inference workloads
  • Partner with product teams to ship end-to-end solutions quickly

What they're looking for

  • Large-scale ML inference optimization
  • Distributed systems design
  • Deep learning frameworks (PyTorch, TensorFlow)
  • Python programming
  • Ray framework experience
  • GPU/CUDA programming
  • ML systems knowledge
  • LLM inference engines (vLLM, TensorRT-LLM)

Benefits

  • Stock options
  • Healthcare plans with 99% premium coverage for employees and dependents
  • 401k retirement plan
  • Education and wellbeing stipend
  • Paid parental leave and fertility benefits
  • Paid time off