Astera Labs

Machine Learning Infrastructure Engineer

San Jose, California, United States · Mid-level · Added 2 days ago

About this role

Astera Labs seeks a Machine Learning Infrastructure Engineer to design and operate the runtime, platform, and operational systems that power modern AI applications. You'll build internal infrastructure for LLM applications, model gateways, observability, and AI operations across the organization.

What you'll do

Build and maintain internal AI infrastructure for LLM applications, agents, and retrieval systems
Own inference deployment paths with access control, monitoring, and operational reliability
Develop model gateways, routing logic, runtime integrations, and telemetry systems
Create AI Ops capabilities including evaluation, observability, incident triage, and cost monitoring
Build dashboards, tracing, logging, and alerting for production AI systems
Improve performance and unit economics through routing, caching, batching, and latency/cost tuning

What they're looking for

Python and backend/systems programming
AWS or GCP cloud platforms
LLM inference deployment and serving systems
Observability, telemetry, and reliability engineering
Model APIs, gateways, and runtime infrastructure
Evaluation systems and release workflows
Incident response and debugging
Platform abstraction and SDK design

Benefits

Base salary: $140,000 - $165,000
Work on cutting-edge AI infrastructure at scale
Exposure to agentic systems and modern AI platforms
Collaborative team focused on quality and developer experience
