Air Apps
Site Reliability Engineer (SRE)
Rome Metropolitan Area (Remote) | Full-time | Mid-level | Added 2 days ago
About this role
Air Apps seeks an experienced Site Reliability Engineer to ensure system reliability, availability, and scalability across cloud environments. You'll design fault-tolerant infrastructure, implement observability tools, and automate deployment processes while collaborating with development teams at the company's Lisbon office.
What you'll do
- Design and implement scalable, reliable systems across cloud environments
- Develop monitoring, logging, and alerting infrastructure using tools like Prometheus, Grafana, or Datadog
- Automate infrastructure provisioning and deployment using Infrastructure as Code (Terraform, CloudFormation)
- Conduct root cause analysis and implement preventative measures to minimize failures
- Optimize CI/CD pipelines and implement load balancing, failover, and disaster recovery strategies
- Participate in on-call rotations to address system failures and minimize downtime
What they're looking for
- Site Reliability Engineering and DevOps practices
- Cloud platforms (AWS, Azure, or GCP)
- Observability tools (Prometheus, Grafana, ELK, Datadog)
- Infrastructure as Code (Terraform, CloudFormation, Pulumi)
- Containerization and orchestration (Docker, Kubernetes, Helm)
- Scripting languages (Bash, Python, or Go)
- Linux system administration and networking
- Incident management and distributed systems knowledge
Benefits
- Apple hardware for work
- Annual bonus
- Top-tier health and life insurance
- Transportation budget
- Coverflex benefits package (meals, wellness)
- Childcare support