OpenAI
Data Engineer
San Francisco | Full-time | Mid-level
About this role
OpenAI is hiring a Data Engineer to design and maintain critical data pipelines and warehouse infrastructure that power analytics, safety systems, and model training. You'll collaborate across teams to build scalable, fault-tolerant systems that support business decisions and AI research.
What you'll do
- Design and manage data pipelines for ingesting user event data into the data warehouse
- Develop canonical datasets to track product metrics like user growth, engagement, and revenue
- Collaborate with Infrastructure, Data Science, Product, Marketing, Finance, and Research teams on data solutions
- Build robust, fault-tolerant systems for data ingestion and processing
- Participate in data architecture decisions and technical planning
- Ensure data security, integrity, and compliance with industry standards
What they're looking for
- Data pipeline design and development
- Python, Scala, or Java
- Apache Spark (writing, debugging, optimizing)
- Distributed processing frameworks (Hadoop, Flink)
- ETL schedulers (Airflow, Dagster, Prefect)
- Distributed storage systems (HDFS, S3)
- Data warehouse management
- SQL and data modeling
Benefits
- Work on AI systems with significant real-world impact
- Collaborate with the ChatGPT research team
- San Francisco HQ location
- Relocation assistance provided
- Equal opportunity employer