Skip to main content

openai

Software Engineer, Inference - Performance Optimization

San Franciscofulltimemid

About this role

OpenAI seeks a Software Engineer to optimize inference performance by building performance models, analyzing workloads across application and infrastructure layers, and creating tools to help teams understand latency, capacity, and cost tradeoffs. You'll partner with cross-functional teams to turn performance insights into concrete improvements for production systems.

What you'll do

  • Develop and refine performance models converting microbenchmarks into cost-to-serve estimates
  • Conduct end-to-end inference workload analysis across applications, models, and fleet infrastructure
  • Build tooling to identify performance bottlenecks across abstraction layers for latency and throughput
  • Partner with engineering and research teams to implement performance improvements
  • Project how future system changes affect inference performance and capacity needs
  • Analyze inference stack performance across application, model, and fleet layers

What they're looking for

  • Performance profiling and benchmarking
  • Distributed systems design and analysis
  • Model inference optimization
  • Hardware efficiency understanding
  • Cross-layer system analysis (kernels, accelerators, networking)
  • Systems modeling and capacity planning
  • Data analysis and visualization
  • Collaboration with engineering teams
Apply on the employer's site

Opens the official application on the employer’s site. No login required.