Apify Technologies s.r.o.

Platform Reliability Engineer

Prague (Remote) · Full-time · Mid-level · Added 2 days ago

About this role

Apify seeks a Platform Reliability Engineer in Prague to strengthen production monitoring, incident management, and alerting systems. You'll bring a developer's mindset to improving observability practices across the platform. The role has no on-call duties and focuses on sustainable improvements that help engineering teams ship confidently.

What you'll do

  • Operate and improve the monitoring stack (Prometheus, Grafana, OpenTelemetry) with meaningful metrics and actionable alerts
  • Define and implement incident management processes including communication, post-incident learning, and runbooks
  • Collaborate with platform and product engineers to make reliability practices practical and adoptable
  • Instrument services to expose appropriate signals that reflect customer experience
  • Design alert routing and notification strategies to reduce noise and improve signal quality
  • Write clear documentation and guidance that engineering teams actually use

What they're looking for

  • Prometheus, Grafana, OpenTelemetry monitoring tools
  • Alert routing and PagerDuty or similar incident tools
  • Code reading and writing (TypeScript/Node.js preferred)
  • AWS infrastructure (EC2, EKS, Kubernetes)
  • Incident management and post-incident culture
  • Metrics selection and observability design
  • CI/CD and deployment practices
  • Container technologies and Helm

Benefits

  • No on-call rotation required
  • Focus on sustainable improvement over emergency response
  • Work with a modern tech stack including Kubernetes, AWS, and TypeScript
  • Collaborative environment with platform and product teams
  • Located in Prague