
Latent Defense

Site Reliability Engineer

San Francisco | $200k–$275k | Full-time | Mid-level | Added 2 days ago

About this role

Own the production infrastructure for a clinical AI platform serving major health systems, ensuring 99.9%+ uptime while enabling rapid product development. You'll architect and maintain mission-critical systems using Kubernetes, Terraform, and modern DevOps practices in a high-intensity, in-office environment.

What you'll do

  • Design, implement, and maintain the production environment for clinical AI infrastructure
  • Manage containerized infrastructure using Kubernetes and Helm at scale
  • Optimize CI/CD deployment pipelines for TypeScript and Python/ML applications
  • Define and implement Infrastructure as Code using Terraform
  • Support developer experience by streamlining workflows and tooling
  • Own operational excellence and establish standards for system reliability

What they're looking for

  • Kubernetes and Helm
  • Terraform and Infrastructure as Code
  • CI/CD pipeline optimization
  • Distributed systems architecture
  • Command line and automation
  • PostgreSQL, Redis, and Kafka
  • Deployment at scale (500+ machines)
  • Python and TypeScript