Databricks Coding Interview Questions
27 Databricks coding interview problems with full optimal solutions: 18 easy, 7 medium, 2 hard. Every problem ships with multiple approaches (brute force first, then the optimal), a complexity table for each approach, company-specific tips on what a Databricks interviewer values, and a FAQ section.
- #16 · easy · foundational
16. Valid Anagram
Determine whether two strings are anagrams — Databricks surfaces this in early screens to test whether you reach for a frequency map, the same mental model behind deduplication passes in Delta Lake compaction jobs.
- #17 · easy · foundational
17. First Bad Version
Find the first broken build in a sequence — a canonical binary-search probe that mirrors how Databricks bisects failing notebook versions or regressed MLflow runs in a CI pipeline.
- #18 · easy · foundational
18. Counting Bits
Count set bits for every integer 0–n — a DP warm-up that directly parallels how Databricks computes per-partition popcount statistics in Photon's vectorized execution engine.
- #19 · medium · foundational
19. Top K Frequent Elements
Return the k most frequent integers — the canonical heap-vs-bucket-sort duel that Databricks maps directly to top-N analytics queries and the cardinality-estimation problems inside Delta Live Tables.
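The bucket-sort side of that duel can be sketched as follows. This is a minimal illustration, not a reference solution; the function name `top_k_frequent` is a hypothetical choice, and ties between equally frequent values are returned in no particular order.

```python
from collections import Counter


def top_k_frequent(nums, k):
    """Return k most frequent values using frequency buckets, O(n) time."""
    counts = Counter(nums)
    # buckets[f] collects every value that occurs exactly f times.
    buckets = [[] for _ in range(len(nums) + 1)]
    for value, freq in counts.items():
        buckets[freq].append(value)

    result = []
    # Walk from the highest possible frequency down to 1.
    for freq in range(len(nums), 0, -1):
        for value in buckets[freq]:
            result.append(value)
            if len(result) == k:
                return result
    return result
```

A heap-based version (`heapq.nlargest` over the counts) runs in O(n log k) and is usually the first answer; the bucket version trades extra memory for linear time.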
- #20 · medium · foundational
20. Min Stack
Design a stack that retrieves its minimum in O(1) — Databricks uses this to test auxiliary-state discipline, a pattern that shows up when tracking minimum-cost DAG nodes in a query optimizer.
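One common way to keep that auxiliary state is to store the running minimum alongside each pushed value. The sketch below assumes the stack is never popped or read while empty; the class and method names are illustrative.

```python
class MinStack:
    """Stack with O(1) push, pop, top, and get_min."""

    def __init__(self):
        # Each entry is (value, minimum of the stack up to and including it).
        self._items = []

    def push(self, value):
        if self._items:
            current_min = min(value, self._items[-1][1])
        else:
            current_min = value
        self._items.append((value, current_min))

    def pop(self):
        return self._items.pop()[0]

    def top(self):
        return self._items[-1][0]

    def get_min(self):
        return self._items[-1][1]
```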
- #21 · medium · foundational
21. Find Peak Element
Locate any local maximum in O(log n) — Databricks ties this to binary-search strategies for finding optimal partition-split points in Delta Lake's data-skipping index.
- #22 · medium · foundational
22. Course Schedule
Detect a cycle in a directed prerequisite graph — the textbook DAG-validation problem that Databricks applies directly to detecting circular dependencies in Delta Live Tables pipeline DAGs.
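A minimal sketch of the cycle check using Kahn's topological sort, assuming prerequisites arrive as `[course, prerequisite]` pairs as in the usual problem statement. The name `can_finish` is illustrative; a DFS with three-color marking is an equally valid approach.

```python
from collections import deque


def can_finish(num_courses, prerequisites):
    """Return True if the prerequisite graph has no cycle."""
    graph = [[] for _ in range(num_courses)]
    indegree = [0] * num_courses
    for course, prereq in prerequisites:
        graph[prereq].append(course)
        indegree[course] += 1

    # Start with every course that has no outstanding prerequisites.
    ready = deque(c for c in range(num_courses) if indegree[c] == 0)
    taken = 0
    while ready:
        node = ready.popleft()
        taken += 1
        for nxt in graph[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any course never reaching indegree 0 sits on a cycle.
    return taken == num_courses
```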
- #23 · medium · foundational
23. Partition Labels
Partition a string into as many parts as possible so that each character appears in at most one part, a greedy range-merging pattern Databricks reuses when computing non-overlapping file-range compaction windows in Delta Lake's OPTIMIZE command.
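The greedy step can be sketched like this: record the last index of every character, then extend the current part until the scan reaches the furthest last index seen so far. The function name `partition_labels` is a hypothetical choice.

```python
def partition_labels(s):
    """Return the sizes of the greedy partitions of s."""
    # Last position at which each character occurs.
    last = {ch: i for i, ch in enumerate(s)}

    sizes = []
    start = end = 0
    for i, ch in enumerate(s):
        # The current part must reach at least the last occurrence of ch.
        end = max(end, last[ch])
        if i == end:
            sizes.append(end - start + 1)
            start = i + 1
    return sizes
```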
- #24 · medium · foundational
24. Subarray Sum Equals K
Count contiguous subarrays whose values sum to k. The technique is a running prefix sum paired with a hash map of previously seen prefix sums, which Databricks relates to rolling aggregations over streaming windows in Structured Streaming.
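A minimal sketch of the prefix-sum approach: a subarray ending at the current index sums to k exactly when an earlier prefix sum equals `total - k`, so a hash map of prefix-sum counts answers that in O(1). The name `subarray_sum` is illustrative.

```python
def subarray_sum(nums, k):
    """Count contiguous subarrays of nums that sum to k, O(n) time."""
    # seen[p] = how many prefixes so far have sum p; the empty prefix sums to 0.
    seen = {0: 1}
    total = 0
    count = 0
    for x in nums:
        total += x
        count += seen.get(total - k, 0)
        seen[total] = seen.get(total, 0) + 1
    return count
```

Unlike a sliding window, this handles negative numbers, which is the usual follow-up question.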
- #25 · medium · foundational
25. Number of Islands
Count connected land components in a 2-D grid — a BFS/DFS connected-components pattern Databricks extends to counting disconnected data-lake zones and partitioning graph-based cluster topology.
- #26 · hard · foundational
26. Sliding Window Maximum
Return the maximum in every sliding window of size k — a deque-based streaming aggregation Databricks implements in Structured Streaming's watermark-bounded window queries over high-throughput event streams.
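The deque technique can be sketched as a monotonic queue of indices whose values are strictly decreasing, so the front always holds the current window's maximum. This is a plain in-memory illustration with a hypothetical function name, not a streaming implementation.

```python
from collections import deque


def max_sliding_window(nums, k):
    """Return the maximum of each length-k window, O(n) time."""
    window = deque()  # indices into nums; their values are decreasing
    result = []
    for i, x in enumerate(nums):
        # Drop smaller-or-equal values: they can never be a maximum again.
        while window and nums[window[-1]] <= x:
            window.pop()
        window.append(i)
        # Drop the front index once it slides out of the window.
        if window[0] <= i - k:
            window.popleft()
        if i >= k - 1:
            result.append(nums[window[0]])
    return result
```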
- #27 · hard · foundational
27. Serialize and Deserialize Binary Tree
Encode and reconstruct an arbitrary binary tree through a string, a serialization-format problem Databricks relates to checkpointing execution-plan trees and persisting MLflow model dependency graphs.
- #1 · easy · frequently asked
1. Two Sum
Given an array of integers, return indices of the two numbers that add up to a target. Databricks uses this as a warm-up to see if you naturally reach for a hash map and to gauge whether you can articulate the brute-force-to-optimal tradeoff in distributed terms.
- #2 · easy · frequently asked
2. Valid Parentheses
Determine if a string of brackets is balanced. Databricks asks this to see if you reach for a stack instinctively and whether you can map it onto SQL-parser or query-AST validation scenarios.
- #3 · easy · frequently asked
3. Merge Two Sorted Lists
Merge two sorted linked lists into one sorted list. Databricks uses this as a launchpad to the real question they care about: how does this generalize to merging K sorted partitions during a shuffle?
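The K-way generalization is usually answered with a min-heap holding one cursor per input. The sketch below uses plain Python lists to stand in for sorted partitions, which is an assumption for illustration; the function name is hypothetical.

```python
import heapq


def merge_k_sorted(partitions):
    """Merge k individually sorted lists into one sorted list."""
    # Heap entries are (value, partition index, position in that partition).
    heap = [(part[0], idx, 0) for idx, part in enumerate(partitions) if part]
    heapq.heapify(heap)

    merged = []
    while heap:
        value, idx, pos = heapq.heappop(heap)
        merged.append(value)
        # Advance the cursor of the partition we just consumed from.
        if pos + 1 < len(partitions[idx]):
            heapq.heappush(heap, (partitions[idx][pos + 1], idx, pos + 1))
    return merged
```

With n total elements across k inputs this runs in O(n log k); the standard library's `heapq.merge` provides the same behavior lazily.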
- #4 · easy · sometimes asked
4. Remove Duplicates from Sorted Array
Modify a sorted array in-place to remove duplicates and return the new length. Databricks uses this to test the two-pointer (read head, write head) pattern that underlies dedup passes over sorted data.
- #5 · easy · sometimes asked
5. Remove Element
Remove all occurrences of a value from an array in-place. Databricks uses this as the in-place-filter primitive that maps onto Spark's filter operator on a partition.
- #6 · easy · sometimes asked
6. Search Insert Position
Given a sorted array, return the index where a target should be inserted to keep it sorted. Databricks uses this to verify you can write a binary search that returns the LEFT bound, which is the canonical primitive for range partitioning.
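A left-bound binary search can be sketched with a half-open interval `[lo, hi)`, which removes most off-by-one risk. The function name is illustrative; the result matches what `bisect.bisect_left` returns.

```python
def search_insert(nums, target):
    """Return the leftmost index at which target can be inserted."""
    lo, hi = 0, len(nums)  # search the half-open range [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if nums[mid] < target:
            lo = mid + 1  # everything up to mid is too small
        else:
            hi = mid  # mid could still be the answer
    return lo
```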
- #7 · easy · frequently asked
7. Maximum Subarray
Find the contiguous subarray with the largest sum. Databricks asks this to test Kadane's algorithm and to set up the harder question: 'now do it on a Spark DataFrame partitioned across the cluster.'
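For reference, Kadane's algorithm on a single in-memory list looks roughly like this. The sketch assumes a non-empty input and uses a hypothetical function name; the distributed follow-up is not covered here.

```python
def max_subarray(nums):
    """Return the largest sum of any non-empty contiguous subarray."""
    best = current = nums[0]
    for x in nums[1:]:
        # Either extend the running subarray or start fresh at x.
        current = max(x, current + x)
        best = max(best, current)
    return best
```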
- #8 · easy · rarely asked
8. Plus One
Given a non-empty array of digits representing a non-negative integer, add one to the integer. Databricks asks this to see if you handle the carry-propagation cleanly and whether you reach for in-place mutation when the structure allows.
- #9 · easy · sometimes asked
9. Merge Sorted Array
Merge two sorted arrays into the first one, in-place, where the first has trailing space to hold the result. Databricks uses this to test the back-to-front merge trick, which is the same memory-efficient pattern their sort-merge join uses.
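The back-to-front trick can be sketched as below: filling from the end means no unread element of the first array is ever overwritten. Parameter names follow the common problem statement (`m` and `n` are the counts of valid elements); the function name is illustrative.

```python
def merge(nums1, m, nums2, n):
    """Merge nums2 into nums1 in place; nums1 has length m + n."""
    i, j, write = m - 1, n - 1, m + n - 1
    # Once nums2 is exhausted, the rest of nums1 is already in place.
    while j >= 0:
        if i >= 0 and nums1[i] > nums2[j]:
            nums1[write] = nums1[i]
            i -= 1
        else:
            nums1[write] = nums2[j]
            j -= 1
        write -= 1
```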
- #10 · easy · sometimes asked
10. Binary Tree Inorder Traversal
Return the node values of a binary tree in inorder. Databricks asks this to see if you can write both the recursive and iterative versions and explain why the iterative one matters in JVM-stack-bounded environments.
- #11 · easy · rarely asked
11. Same Tree
Check whether two binary trees are structurally identical with the same values. Databricks uses this to test recursive pattern-matching, which is the same template Catalyst uses to compare query subtrees during optimizer rule application.
- #12 · easy · rarely asked
12. Symmetric Tree
Determine if a binary tree is a mirror of itself around its center. Databricks asks this to test paired recursion — comparing two pointers that walk in opposite directions, which is the same primitive used in plan-folding and palindrome detection.
- #13 · easy · frequently asked
13. Maximum Depth of Binary Tree
Find the maximum depth of a binary tree. Databricks uses this to test the canonical 'return aggregated value upward' tree recursion that maps directly onto cost estimation in Catalyst.
- #14 · easy · sometimes asked
14. Balanced Binary Tree
Determine if a binary tree is height-balanced. Databricks asks this to test the post-order pattern where you return information up the tree to avoid recomputing heights at every node.
- #15 · easy · sometimes asked
15. Minimum Depth of Binary Tree
Find the minimum depth of a binary tree (the number of nodes on the shortest path from the root to the nearest LEAF). Databricks asks this because it tests whether you can distinguish 'null child' from 'leaf', a subtle case that catches candidates who only memorized max-depth.
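The null-child versus leaf distinction is easiest to see in a breadth-first sketch, where the first node with no children at all ends the search. The `TreeNode` class and function name below are assumptions made for illustration.

```python
from collections import deque


class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def min_depth(root):
    """Return the number of nodes on the shortest root-to-leaf path."""
    if root is None:
        return 0
    queue = deque([(root, 1)])
    while queue:
        node, depth = queue.popleft()
        # A leaf has no children; a node with one null child is not a leaf.
        if node.left is None and node.right is None:
            return depth
        if node.left is not None:
            queue.append((node.left, depth + 1))
        if node.right is not None:
            queue.append((node.right, depth + 1))
```

Naively taking `1 + min(left, right)` recursively would return 1 for a root with a single child, because the missing side reports depth 0.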