Cursor

Software Engineer, ML Data Systems

San Francisco · Full-time · Mid-level · Added 2 days ago

About this role

Build and maintain the data infrastructure that powers Cursor's ML-driven code automation product. You'll own the full lifecycle of data pipelines—from fixing immediate bottlenecks to redesigning systems at scale—while ensuring correctness, cost-efficiency, and privacy guarantees.

What you'll do

Design and ship data pipeline replacements while maintaining system reliability during transitions
Define instrumentation requirements for new product surfaces and integrate telemetry end-to-end
Debug performance issues across client, streaming, storage, and compute layers
Establish data contracts and schema evolution strategies to prevent silent failures across consumers
Optimize storage costs through retention policies, compression, and strategic data deletion
Identify and close instrumentation gaps that impact model evaluation and experimentation

What they're looking for

Apache Spark (Databricks or open-source)
Large-scale data pipeline design and ownership
Data modeling and schema design
Production debugging across infrastructure layers
Ray Data
ClickHouse (preferred)
Orchestration and transformation tools (Dagster, dbt)
Privacy-aware data systems

Benefits

In-person offices in North Beach (San Francisco) and Manhattan (New York)
Flat organizational structure with a small, talented team
Ownership of high-impact systems that directly enable product teams
Daily shipping culture with a focus on quality and correctness
