OpenAI
Software Engineer, Observability
San Francisco | Full-time | Mid-level
About this role
OpenAI is seeking a Software Engineer to build observability infrastructure and AI-powered tools that help engineers monitor and debug production systems at massive scale. You'll work across the full stack—from distributed logging and time series databases to intelligent dashboards and debugging interfaces—in a small team focused on making OpenAI's systems reliable and observable.
What you'll do
- Design and maintain large-scale observability infrastructure including distributed logging, time series, and trace storage systems
- Develop AI-native tools that enable engineers to autonomously detect, understand, and resolve production issues
- Build user-facing experiences such as dashboards, notebook interfaces, and interactive debugging tools
- Collaborate across engineering, research, and product teams to shape the next generation of observability capabilities
- Operate and scale systems handling petabytes of logs and billions of metrics across OpenAI's fleet
What they're looking for
- Large-scale distributed systems design and operation
- Logging systems and time series database expertise
- Full-stack development (backend, infrastructure, and UI)
- Systems, networking, and cloud infrastructure fundamentals
- Kubernetes and cloud platforms (AWS or equivalent)
- Observability tools experience (Prometheus, OpenTelemetry, etc.)
- Ability to work in ambiguous, unscoped environments
- Product thinking and user-focused engineering