Baseten

Software Engineer, Model Performance Systems

San Francisco (Remote) | $160k–$200k | Full-time | Mid-level | Added 2 days ago

About this role

Baseten is seeking early-career software engineers to build performance benchmarking and diagnostic tools for AI infrastructure. You'll develop automated systems to validate GPU clusters, measure LLM performance, and optimize model inference across high-performance computing environments.

What you'll do

Automate LLM quality benchmarks (GSM8K, MMLU) and custom performance testing for specific workloads
Create acceptance tests for GPU clusters, measuring memory bandwidth, networking throughput, and multi-node performance
Develop GPU-enabled development environments and maintain internal tools for model experimentation
Use PyTorch Profiler and NVIDIA Nsight Systems to identify performance bottlenecks and debug the compute/networking stack
Build dashboards and alerts for real-time monitoring of system health, model startup times, and runtime performance
Automate performance testing via CI/CD pipelines to catch regressions before production

What they're looking for

Python programming
GPU profiling and optimization
High-performance computing (HPC) concepts
NVIDIA software stack (CUDA, Nsight Systems)
Infrastructure and systems understanding
Benchmarking and performance testing
C++ (preferred)
LLM inference knowledge

Benefits

Competitive compensation with meaningful equity
Opportunity to gain world-class expertise in GPU orchestration and LLM inference
High autonomy to build tools from scratch and contribute to open-source projects
Direct impact on infrastructure powering major AI companies
Learning from an expert-led team
