Astera Labs
Machine Learning Infrastructure Engineer
San Jose, California, United States · Mid-level
About this role
Astera Labs seeks a Machine Learning Infrastructure Engineer to design and run the runtime, platform, and operations systems that power modern AI applications. You'll build internal infrastructure for LLM applications, model gateways, observability, and AI operations across the organization.
What you'll do
- Build and maintain internal AI infrastructure for LLM applications, agents, and retrieval systems
- Own inference deployment paths with access control, monitoring, and operational reliability
- Develop model gateways, routing logic, runtime integrations, and telemetry systems
- Create AI Ops capabilities including evaluation, observability, incident triage, and cost monitoring
- Build dashboards, tracing, logging, and alerting for production AI systems
- Optimize performance and unit economics through model routing, caching, and batching, balancing latency against cost
What they're looking for
- Python and backend/systems programming
- AWS or GCP cloud platforms
- LLM inference deployment and serving systems
- Observability, telemetry, and reliability engineering
- Model APIs, gateways, and runtime infrastructure
- Evaluation systems and release workflows
- Incident response and debugging
- Platform abstraction and SDK design
Benefits
- Base salary: $140,000 - $165,000
- Work on cutting-edge AI infrastructure at scale
- Exposure to agentic systems and modern AI platforms
- Collaborative team focused on quality and developer experience