Apify Technologies s.r.o.
Platform Reliability Engineer
Prague (Remote) · Full-time · Mid-level · Added 2 days ago
About this role
Apify is looking for a Platform Reliability Engineer in Prague to strengthen its production monitoring, incident management, and alerting systems. You'll bring a developer's mindset to improving observability practices across the platform. The role carries no on-call duties; the focus is on sustainable improvements that help engineering teams ship with confidence.
What you'll do
- Operate and improve the monitoring stack (Prometheus, Grafana, OpenTelemetry) with meaningful metrics and actionable alerts
- Define and implement incident management processes including communication, post-incident learning, and runbooks
- Collaborate with platform and product engineers to make reliability practices practical and adoptable
- Instrument services to expose signals that reflect the customer experience
- Design alert routing and notification strategies to reduce noise and improve signal quality
- Write clear documentation and guidance that engineering teams actually use
What they're looking for
- Prometheus, Grafana, and OpenTelemetry monitoring tools
- Alert routing with PagerDuty or similar incident-management tools
- Reading and writing code (TypeScript/Node.js preferred)
- AWS infrastructure (EC2, EKS) and Kubernetes
- Incident management and post-incident culture
- Metrics selection and observability design
- CI/CD and deployment practices
- Container technologies and Helm
Benefits
- No on-call rotation required
- Focus on sustainable improvement over emergency response
- Work with a modern tech stack including Kubernetes, AWS, and TypeScript
- Collaborative environment with platform and product teams
- Based in Prague, with remote work