Latent Defense
Site Reliability Engineer
San Francisco | $200k–$275k | Full-time | Mid-level | Added 2 days ago
About this role
Own the production infrastructure for a clinical AI platform serving major health systems, ensuring 99.9%+ uptime while enabling rapid product development. You'll architect and maintain mission-critical systems using Kubernetes, Terraform, and modern DevOps practices in a high-intensity, in-office environment.
What you'll do
- Design, implement, and maintain the production environment for clinical AI infrastructure
- Manage containerized infrastructure using Kubernetes and Helm at scale
- Optimize CI/CD deployment pipelines for TypeScript and Python/ML applications
- Define and implement Infrastructure as Code using Terraform
- Support developer experience by streamlining workflows and tooling
- Own operational excellence and establish standards for system reliability
What they're looking for
- Kubernetes and Helm
- Terraform and Infrastructure as Code
- CI/CD pipeline optimization
- Distributed systems architecture
- Command line and automation
- PostgreSQL, Redis, and Kafka
- Deployment at scale (500+ machines)
- Python and TypeScript