Baseten

Software Engineer, Model Performance Systems

San Francisco (Remote) · $160k–$200k · Full-time · Mid-level · Added 2 days ago

About this role

Baseten is seeking early-career software engineers to build performance benchmarking and diagnostic tools for AI infrastructure. You'll develop automated systems to validate GPU clusters, measure LLM performance, and optimize model inference across high-performance computing environments.

What you'll do

  • Automate LLM quality benchmarks (GSM8K, MMLU) and custom performance testing for specific workloads
  • Create acceptance tests for GPU clusters, measuring memory bandwidth, networking throughput, and multi-node performance
  • Develop GPU-enabled development environments and maintain internal tools for model experimentation
  • Use PyTorch Profiler and NVIDIA Nsight Systems to identify performance bottlenecks and debug the compute/networking stack
  • Build dashboards and alerts for real-time monitoring of system health, model startup times, and runtime performance
  • Automate performance testing via CI/CD pipelines to catch regressions before production

What they're looking for

  • Python programming
  • GPU profiling and optimization
  • High-performance computing (HPC) concepts
  • NVIDIA software stack (CUDA, Nsight Systems)
  • Infrastructure and systems understanding
  • Benchmarking and performance testing
  • C++ (preferred)
  • LLM inference knowledge

Benefits

  • Competitive compensation with meaningful equity
  • Opportunity to gain world-class expertise in GPU orchestration and LLM inference
  • High autonomy to build tools from scratch and contribute to open-source projects
  • Direct impact on infrastructure powering major AI companies
  • Learning from an expert-led team