Cursor
Software Engineer, ML Data Systems
San Francisco · Full-time · Mid-level · Added 2 days ago
About this role
Build and maintain the data infrastructure that powers Cursor's ML-driven code automation product. You'll own the full lifecycle of data pipelines—from fixing immediate bottlenecks to redesigning systems at scale—while ensuring correctness, cost-efficiency, and privacy guarantees.
What you'll do
- Design and ship data pipeline replacements while maintaining system reliability during transitions
- Define instrumentation requirements for new product surfaces and integrate telemetry end-to-end
- Debug performance issues across client, streaming, storage, and compute layers
- Establish data contracts and schema evolution strategies to prevent silent failures across consumers
- Optimize storage costs through retention policies, compression, and strategic data deletion
- Identify and close instrumentation gaps that impact model evaluation and experimentation
What they're looking for
- Apache Spark (Databricks or open-source)
- Large-scale data pipeline design and ownership
- Data modeling and schema design
- Production debugging across infrastructure layers
- Ray Data
- ClickHouse (preferred)
- Orchestration tools (dbt, Dagster)
- Privacy-aware data systems
Benefits
- In-person offices in North Beach (San Francisco) and Manhattan (New York)
- Flat organizational structure with a small, talented team
- Ownership of high-impact systems that directly enable product teams
- Daily shipping culture with a focus on quality and correctness