Baseten
Software Engineer, Model Performance Systems
San Francisco (Remote) | $160k–$200k | Full-time | Mid-level | Added 2 days ago
About this role
Baseten is seeking early-career software engineers to build performance benchmarking and diagnostic tools for AI infrastructure. You'll develop automated systems to validate GPU clusters, measure LLM performance, and optimize model inference across high-performance computing environments.
What you'll do
- Automate LLM quality benchmarks (GSM8K, MMLU) and custom performance testing for specific workloads
- Create acceptance tests for GPU clusters, measuring memory bandwidth, networking throughput, and multi-node performance
- Develop GPU-enabled development environments and maintain internal tools for model experimentation
- Use PyTorch Profiler and NVIDIA Nsight Systems to identify performance bottlenecks and debug the compute/networking stack
- Build dashboards and alerts for real-time monitoring of system health, model startup times, and runtime performance
- Automate performance testing via CI/CD pipelines to catch regressions before production
What they're looking for
- Python programming
- GPU profiling and optimization
- High-performance computing (HPC) concepts
- NVIDIA software stack (CUDA, Nsight Systems)
- Infrastructure and systems understanding
- Benchmarking and performance testing
- C++ (preferred)
- LLM inference knowledge
Benefits
- Competitive compensation with meaningful equity
- Opportunity to gain world-class expertise in GPU orchestration and LLM inference
- High autonomy to build tools from scratch and contribute to open-source projects
- Direct impact on infrastructure powering major AI companies
- Learning from an expert-led team