openai
Software Engineer, Inference - Performance Optimization
San Franciscofulltimemid
About this role
OpenAI seeks a Software Engineer to optimize inference performance by building performance models, analyzing workloads across application and infrastructure layers, and creating tools to help teams understand latency, capacity, and cost tradeoffs. You'll partner with cross-functional teams to turn performance insights into concrete improvements for production systems.
What you'll do
- Develop and refine performance models converting microbenchmarks into cost-to-serve estimates
- Conduct end-to-end inference workload analysis across applications, models, and fleet infrastructure
- Build tooling to identify performance bottlenecks across abstraction layers for latency and throughput
- Partner with engineering and research teams to implement performance improvements
- Project how future system changes affect inference performance and capacity needs
- Analyze inference stack performance across application, model, and fleet layers
What they're looking for
- Performance profiling and benchmarking
- Distributed systems design and analysis
- Model inference optimization
- Hardware efficiency understanding
- Cross-layer system analysis (kernels, accelerators, networking)
- Systems modeling and capacity planning
- Data analysis and visualization
- Collaboration with engineering teams
Opens the official application on the employer’s site. No login required.