Exa Labs
Software Engineer, Infrastructure
San Francisco, California | $180k–$350k | Full-time | Mid-level
About this role
Exa, an applied AI lab building next-generation search infrastructure, seeks an Infrastructure Engineer to design and operate massive-scale systems powering web crawling, embedding models, and vector databases. You'll build the foundational tooling that enables rapid innovation across GPU clusters, distributed training, and high-performance inference systems.
What you'll do
- Scale GPU infrastructure to process web-scale data cost-efficiently across multiple regions and clouds
- Orchestrate multi-region training and inference workloads on GPU clusters using Kubernetes and Ray
- Design and maintain an LLM gateway and CI/CD systems for AI-native development
- Build observability and monitoring tooling for large-scale distributed systems
- Create custom build infrastructure and caching solutions using Nix
- Automate software maintenance and infrastructure improvements company-wide
What they're looking for
- Distributed systems and large-scale infrastructure design
- Kubernetes and GPU cluster orchestration
- Ray or similar distributed computing frameworks
- Infrastructure-as-code and automation (Nix, Terraform, etc.)
- CI/CD pipeline design and optimization
- Multi-cloud and multi-region deployment
- Systems performance optimization and cost efficiency
- Observability and monitoring systems
Benefits
- Premium healthcare (medical, dental, vision)
- Fertility benefits
- 16 weeks fully paid parental leave
- Monthly wellness stipend
- Visa sponsorship available for international candidates
- In-person work in San Francisco