Accenture

Site Reliability Engineer

Arlington, VA | From $203.4k | Mid-level | Added 2 days ago

About this role

Accenture Federal Services seeks a Site Reliability Engineer to ensure the reliability and scalability of enterprise AI systems supporting mission-critical federal applications. You'll manage incident response, observability platforms, and operational workflows while collaborating with AI, DevSecOps, and cybersecurity teams in Arlington, VA.

What you'll do

  • Ensure the reliability, scalability, and performance of enterprise AI systems in a hub-and-spoke architecture
  • Lead incident response efforts and maintain service continuity for mission-critical applications
  • Implement and manage SLOs/SLAs, capacity planning, and performance optimization strategies
  • Operate observability platforms using OpenTelemetry, Prometheus, Grafana, Loki, and Tempo
  • Drive FinOps practices to optimize operational costs and resource utilization
  • Collaborate with cross-functional teams to integrate monitoring and continuous feedback mechanisms

What they're looking for

  • OpenTelemetry, Prometheus, Grafana, Loki, and Tempo
  • SLO/SLA management and monitoring techniques
  • Incident response and reliability engineering
  • FinOps practices and cost optimization
  • Performance optimization and capacity planning
  • CI/CD pipelines and continuous delivery
  • Knowledge of enterprise AI systems
  • DevSecOps and observability operations

Benefits

  • Competitive base pay commensurate with experience and location
  • Comprehensive benefits package (details available on company website)
  • Professional development through certifications and industry training
  • Collaborative and inclusive workplace culture
  • Opportunity to support mission-critical federal applications
  • Recognized as one of Glassdoor's Top 100 Best Places to Work