Cursor

Software Engineer, ML Data Systems

San Francisco | Full-time | Mid-level | Added 2 days ago

About this role

Build and maintain the data infrastructure that powers Cursor's ML-driven code automation product. You'll own the full lifecycle of data pipelines—from fixing immediate bottlenecks to redesigning systems at scale—while ensuring correctness, cost-efficiency, and privacy guarantees.

What you'll do

  • Design and ship data pipeline replacements while maintaining system reliability during transitions
  • Define instrumentation requirements for new product surfaces and integrate telemetry end-to-end
  • Debug performance issues across client, streaming, storage, and compute layers
  • Establish data contracts and schema evolution strategies to prevent silent failures across consumers
  • Optimize storage costs through retention policies, compression, and strategic data deletion
  • Identify and close instrumentation gaps that impact model evaluation and experimentation

What they're looking for

  • Apache Spark (Databricks or open-source)
  • Large-scale data pipeline design and ownership
  • Data modeling and schema design
  • Production debugging across infrastructure layers
  • Ray Data
  • ClickHouse (preferred)
  • Orchestration tools (dbt, Dagster)
  • Privacy-aware data systems

Benefits

  • In-person offices in North Beach (San Francisco) and Manhattan (New York)
  • Flat organizational structure with a small, talented team
  • Ownership of high-impact systems that directly enable product teams
  • Daily shipping culture with a focus on quality and correctness