openai

Software Engineer, Inference - Performance Optimization

San Franciscofulltimemid

About this role

OpenAI seeks a Software Engineer to optimize inference performance by building performance models, analyzing workloads across application and infrastructure layers, and creating tools to help teams understand latency, capacity, and cost tradeoffs. You'll partner with cross-functional teams to turn performance insights into concrete improvements for production systems.

What you'll do

Develop and refine performance models converting microbenchmarks into cost-to-serve estimates
Conduct end-to-end inference workload analysis across applications, models, and fleet infrastructure
Build tooling to identify performance bottlenecks across abstraction layers for latency and throughput
Partner with engineering and research teams to implement performance improvements
Project how future system changes affect inference performance and capacity needs
Analyze inference stack performance across application, model, and fleet layers

What they're looking for

Performance profiling and benchmarking
Distributed systems design and analysis
Model inference optimization
Hardware efficiency understanding
Cross-layer system analysis (kernels, accelerators, networking)
Systems modeling and capacity planning
Data analysis and visualization
Collaboration with engineering teams

Apply on the employer's site →

Opens the official application on the employer’s site. No login required.