Accenture
Site Reliability Engineer
Arlington, VA | From $203.4k | mid | Added 2 days ago
About this role
Accenture Federal Services seeks a Site Reliability Engineer to ensure the reliability and scalability of enterprise AI systems supporting mission-critical federal applications. You'll manage incident response, observability platforms, and operational workflows while collaborating across AI, DevSecOps, and cybersecurity teams in Arlington, VA.
What you'll do
- Ensure reliability, scalability, and performance of enterprise AI systems in a hub-and-spoke architecture
- Lead incident response efforts and maintain service continuity for mission-critical applications
- Implement and manage SLOs/SLAs, capacity planning, and performance optimization strategies
- Operate observability platforms using OpenTelemetry, Prometheus, Grafana, Loki, and Tempo
- Drive FinOps practices to optimize operational costs and resource utilization
- Collaborate with cross-functional teams to integrate monitoring and continuous feedback mechanisms
What they're looking for
- OpenTelemetry, Prometheus, Grafana, Loki, and Tempo
- SLO/SLA management and monitoring techniques
- Incident response and reliability engineering
- FinOps practices and cost optimization
- Performance optimization and capacity planning
- CI/CD pipelines and continuous delivery
- Enterprise AI systems knowledge
- DevSecOps and observability operations
Benefits
- Competitive base pay commensurate with experience and location
- Comprehensive benefits package (details available on company website)
- Professional development through certifications and industry training
- Collaborative and inclusive workplace culture
- Opportunity to support mission-critical federal applications
- Glassdoor Top 100 Best Place to Work recognition