Latent Defense

Site Reliability Engineer

San Francisco | $200k–$275k | Full-time | Mid-level | Added 2 days ago

About this role

Own the production infrastructure for a clinical AI platform serving major health systems, ensuring 99.9%+ uptime while enabling rapid product development. You'll architect and maintain mission-critical systems using Kubernetes, Terraform, and modern DevOps practices in a high-intensity, in-office environment.

What you'll do

Design, implement, and maintain the production environment for clinical AI infrastructure
Manage containerized infrastructure using Kubernetes and Helm at scale
Optimize CI/CD deployment pipelines for TypeScript and Python/ML applications
Define and implement Infrastructure as Code using Terraform
Support developer experience by streamlining workflows and tooling
Own operational excellence and establish standards for system reliability

What they're looking for

Kubernetes and Helm
Terraform and Infrastructure as Code
CI/CD pipeline optimization
Distributed systems architecture
Command line and automation
PostgreSQL, Redis, and Kafka
Deployment at scale (500+ machines)
Python and TypeScript
