Serve Robotics
Reliability Operations Engineer (Malaysia)
Penang, Malaysia (Remote) | Full-time | Mid-level | Added 2 days ago
About this role
Serve Robotics is seeking a Reliability Operations Engineer in Penang, Malaysia, to ensure the operational reliability of its robotic and cloud systems. The role involves handling escalations, improving operational workflows, and collaborating with engineering teams on system health and incident response.
What you'll do
- Lead incident investigations during daytime hours and provide timely updates.
- Respond to Tier 1 escalations and use runbooks for issue remediation.
- Update operational documentation based on new findings.
- Enhance tools and scripts for troubleshooting tasks.
- Use observability tools to identify anomalies in system performance.
- Participate in a weekend on-call rotation for incident response.
What they're looking for
- Bachelor's degree in Computer Science, IT, or a related field
- 2-4 years of experience in Reliability Operations or a related role
- Experience in incident response and log analysis
- Proficiency in Linux system diagnostics
- Familiarity with Grafana, Prometheus, and Google Cloud Monitoring
- Knowledge of CI/CD pipelines
- Experience with Jira or similar systems
- Strong communication and troubleshooting skills