Skip to main content

Cursor

Software Engineer, ML Infrastructure

San FranciscofulltimemidAdded 2 days ago

About this role

Build and maintain large-scale ML infrastructure for Cursor's coding AI model. Work closely with researchers to optimize training throughput, manage GPU clusters across cloud and bare metal environments, and develop systems for workload scheduling and cluster automation.

What you'll do

  • Collaborate with ML researchers to improve training throughput and reliability
  • Plan and build cutting-edge GPU infrastructure with OEMs and cloud providers
  • Optimize compute density and scalability for large-scale reinforcement learning workloads
  • Develop software and systems to automate GPU cluster building, monitoring, and operations
  • Design workload scheduling and data movement systems for distributed training

What they're looking for

  • Systems and infrastructure software engineering
  • Python, Typescript, Rust, or Golang
  • Distributed storage and networking infrastructure
  • Linux systems administration across cloud and bare metal
  • Infrastructure-as-code and configuration management
  • Kubernetes administration
  • Large-scale distributed systems design
  • GPU operations (Nvidia, Infiniband/RoCE preferred)
Apply on the employer's site

Opens the official application on the employer’s site. No login required.