Air Apps

Site Reliability Engineer (SRE)

Rome Metropolitan Area (Remote) · Full-time · Mid-level · Added 2 days ago

About this role

Air Apps seeks an experienced Site Reliability Engineer to ensure system reliability, availability, and scalability across cloud environments. You'll design fault-tolerant infrastructure, implement observability tools, and automate deployment processes while collaborating with development teams in their Lisbon office.

What you'll do

  • Design and implement scalable, reliable systems across cloud environments
  • Develop monitoring, logging, and alerting infrastructure using tools like Prometheus, Grafana, or Datadog
  • Automate infrastructure provisioning and deployment using Infrastructure as Code (Terraform, CloudFormation)
  • Conduct root cause analysis and implement preventative measures to minimize failures
  • Optimize CI/CD pipelines and implement load balancing, failover, and disaster recovery strategies
  • Participate in on-call rotations to address system failures and minimize downtime

What they're looking for

  • Site Reliability Engineering and DevOps practices
  • Cloud platforms (AWS, Azure, or GCP)
  • Observability tools (Prometheus, Grafana, ELK, Datadog)
  • Infrastructure as Code (Terraform, CloudFormation, Pulumi)
  • Containerization and orchestration (Docker, Kubernetes, Helm)
  • Scripting languages (Bash, Python, or Go)
  • Linux system administration and networking
  • Incident management and distributed systems knowledge

Benefits

  • Apple hardware for work
  • Annual bonus
  • Top-tier health and life insurance
  • Transportation budget
  • Coverflex benefits package (meals, wellness)
  • Childcare support