1. Two Sum
easyAsked at DatabricksGiven an array of integers, return indices of the two numbers that add up to a target. Databricks uses this as a warm-up to see if you naturally reach for a hash map and to gauge whether you can articulate the brute-force-to-optimal tradeoff in distributed terms.
By Alex Chen, Founder, InterviewChamp.AI · Last verified
Source citations
Public interview reports confirming this problem appears in Databricks loops.
- Glassdoor (2025-11)— Databricks phone screen warm-up; expect the follow-up about scaling to a Spark DataFrame.
- LeetCode Discuss (2026-Q1)— Listed as the canonical Databricks SDE-I phone-screen opener.
Problem
Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target. You may assume that each input would have exactly one solution, and you may not use the same element twice. You can return the answer in any order.
Constraints
2 <= nums.length <= 10^4-10^9 <= nums[i] <= 10^9Only one valid answer exists.
Examples
Example 1
nums = [2,7,11,15], target = 9[0,1]Explanation: nums[0] + nums[1] == 9.
Example 2
nums = [3,2,4], target = 6[1,2]Approaches
1. Brute force double loop
Check every pair (i, j) with i < j.
- Time
- O(n^2)
- Space
- O(1)
function twoSum(nums, target) {
for (let i = 0; i < nums.length; i++) {
for (let j = i + 1; j < nums.length; j++) {
if (nums[i] + nums[j] === target) return [i, j];
}
}
}Tradeoff: Quadratic. Never mention this as your final answer at Databricks — they want the hash-map insight first.
2. One-pass hash map
Walk once; for each x, check if target - x is already in the map. Otherwise store x's index.
- Time
- O(n)
- Space
- O(n)
function twoSum(nums, target) {
const seen = new Map();
for (let i = 0; i < nums.length; i++) {
const need = target - nums[i];
if (seen.has(need)) return [seen.get(need), i];
seen.set(nums[i], i);
}
}Tradeoff: Linear time, single pass. The key insight is reframing a pair search as a 'have I seen the complement' lookup.
Databricks-specific tips
Databricks interviewers will pivot from this single-machine problem to 'now imagine nums is a Spark DataFrame with 10^11 rows.' Be ready to discuss why the naive self-join on (df, target - df.value) is shuffle-heavy and how a broadcast hash join or a map-side aggregation could replace it. Articulating the lazy-evaluation model (transformations stack until an action triggers a shuffle) is the bonus signal they want.
Common mistakes
- Returning values instead of indices — read the prompt twice; LC and Databricks both want indices.
- Using a two-pass approach (build map, then scan) when one pass is cleaner and avoids double-counting i == j.
- Forgetting that nums[i] + nums[j] could overflow in some languages — at Databricks JVM/Scala roles this matters.
Follow-up questions
An interviewer at Databricks may pivot to one of these next:
- What if the array is sorted? (Two pointers, O(1) space.)
- What if you must return ALL pairs that sum to target? (Hash map of value -> list of indices.)
- Scale to a Spark DataFrame with 100B rows — broadcast small side or self-join with salting?
Solve it now
Free. No sign-up. Python and JavaScript run instantly in your browser.
FAQ
Why a hash map and not sorting + two pointers?
Sorting is O(n log n) and you lose the original indices. The hash-map version is O(n) and preserves indices directly.
What if there are duplicates?
The one-pass map handles it naturally: when you reach the second copy of a value, the first copy's index is already stored, so you find the pair.
Practice these live with InterviewChamp.AI
Drill Two Sum and other Databricks interview questions under real-loop conditions with instant feedback on your reasoning, complexity claims, and code.
Practice these live with InterviewChamp.AI →