Astera Labs

Machine Learning Infrastructure Engineer

San Jose, California, United States · Mid-level · Added 2 days ago

About this role

Astera Labs seeks a Machine Learning Infrastructure Engineer to design and operate the runtime, platform, and operational systems that power modern AI applications. You'll build internal infrastructure for LLM applications, model gateways, observability, and AI operations across the organization.

What you'll do

  • Build and maintain internal AI infrastructure for LLM applications, agents, and retrieval systems
  • Own inference deployment paths with access control, monitoring, and operational reliability
  • Develop model gateways, routing logic, runtime integrations, and telemetry systems
  • Create AI Ops capabilities including evaluation, observability, incident triage, and cost monitoring
  • Build dashboards, tracing, logging, and alerting for production AI systems
  • Improve performance and unit economics through routing, caching, batching, and latency/cost tuning

What they're looking for

  • Python and backend/systems programming
  • AWS or GCP cloud platforms
  • LLM inference deployment and serving systems
  • Observability, telemetry, and reliability engineering
  • Model APIs, gateways, and runtime infrastructure
  • Evaluation systems and release workflows
  • Incident response and debugging
  • Platform abstraction and SDK design

Benefits

  • Base salary: $140,000 - $165,000
  • Work on cutting-edge AI infrastructure at scale
  • Exposure to agentic systems and modern AI platforms
  • Collaborative team focused on quality and developer experience