Cursor
Software Engineer, ML Infrastructure
San Francisco · Full-time · Mid-level · Added 2 days ago
About this role
Build and maintain large-scale ML infrastructure for Cursor's coding AI model. Work closely with researchers to optimize training throughput, manage GPU clusters across cloud and bare metal environments, and develop systems for workload scheduling and cluster automation.
What you'll do
- Collaborate with ML researchers to improve training throughput and reliability
- Plan and build cutting-edge GPU infrastructure with OEMs and cloud providers
- Optimize compute density and scalability for large-scale reinforcement learning workloads
- Develop software and systems to automate GPU cluster building, monitoring, and operations
- Design workload scheduling and data movement systems for distributed training
What they're looking for
- Systems and infrastructure software engineering
- Python, TypeScript, Rust, or Go
- Distributed storage and networking infrastructure
- Linux systems administration across cloud and bare metal
- Infrastructure-as-code and configuration management
- Kubernetes administration
- Large-scale distributed systems design
- GPU operations (NVIDIA, InfiniBand/RoCE preferred)