4. Remove Duplicates from Sorted Array
Easy
Asked at Databricks
Modify a sorted array in-place to remove duplicates and return the new length. Databricks uses this to test the two-pointer (read/write head) pattern, which recurs in distributed dedup operators.
By Alex Chen, Founder, InterviewChamp.AI · Last verified
Source citations
Public interview reports confirming this problem appears in Databricks loops.
- LeetCode Discuss (2025-10): Databricks SDE-II phone screen warm-up.
- Glassdoor (2026-Q1): Followed by 'how does Spark's distinct() work on a sorted partition?'
Problem
Given an integer array nums sorted in non-decreasing order, remove the duplicates in-place such that each unique element appears only once. The relative order of the elements should be kept the same. Return k after placing the final result in the first k slots of nums.
Constraints
- 1 <= nums.length <= 3 * 10^4
- -100 <= nums[i] <= 100
- nums is sorted in non-decreasing order.
Examples
Example 1
Input: nums = [1,1,2]
Output: 2, nums = [1,2,_]
Example 2
Input: nums = [0,0,1,1,1,2,2,3,3,4]
Output: 5, nums = [0,1,2,3,4,_,_,_,_,_]
Approaches
1. Set + rebuild
Throw into a Set, write back.
- Time: O(n)
- Space: O(n)
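function removeDuplicates(nums) {
  // A Set keeps the first occurrence of each value, in insertion order.
  const s = [...new Set(nums)];
  // Copy the unique values back into the front of nums.
  for (let i = 0; i < s.length; i++) nums[i] = s[i];
  return s.length;
}
Tradeoff: Works, but allocates O(n) extra space and ignores the sorted-input invariant. Databricks wants the in-place version.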
2. Read-pointer / write-pointer (two pointers)
The slow pointer marks the last unique value written; the fast pointer scans ahead. When fast sees a new value, advance slow and write the value there.
- Time: O(n)
- Space: O(1)
function removeDuplicates(nums) {
  if (nums.length === 0) return 0;
  let slow = 0; // index of the last unique value written
  for (let fast = 1; fast < nums.length; fast++) {
    // The input is sorted, so a value that differs from the last written one is new.
    if (nums[fast] !== nums[slow]) {
      slow++;
      nums[slow] = nums[fast];
    }
  }
  return slow + 1; // number of unique values
}
Tradeoff: O(1) extra space. The sorted invariant means duplicates are adjacent, which is why one comparison per element is enough.
Databricks-specific tips
Databricks grades the in-place version because the read/write-head pattern is how a sort-based distinct operator can run over a sorted partition without an extra allocation. Be ready to discuss how this scales: on a Spark DataFrame, after sortWithinPartitions you can run the same two-pointer dedup inside each partition with no additional shuffle, provided rows with equal keys already sit in the same partition (for example, after repartitioning by the key). Otherwise duplicates that span partitions survive. Mentioning this shows you understand operator pipelining.
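To make the per-partition idea concrete, here is a minimal sketch in plain JavaScript. It is an analogy only, not Spark code: `distinctSorted` is a hypothetical helper, and the nested arrays merely stand in for sorted partitions. It shows that dedup over sorted input needs only one remembered value of state.

```javascript
// Hypothetical helper (not a Spark API): lazily yields each distinct value
// from an already-sorted iterable, remembering only the previous value.
function* distinctSorted(sortedValues) {
  let hasPrev = false;
  let prev;
  for (const value of sortedValues) {
    if (!hasPrev || value !== prev) {
      yield value;
      prev = value;
      hasPrev = true;
    }
  }
}

// Plain arrays standing in for two independently sorted partitions.
// No value appears in both, so per-partition dedup is also a global dedup.
const partitions = [[0, 0, 1, 1], [2, 2, 3, 4, 4]];
const dedupedPartitions = partitions.map((p) => [...distinctSorted(p)]);
// dedupedPartitions is [[0, 1], [2, 3, 4]]
```

If the same value appeared in both stand-in partitions, it would survive once per partition, which is why co-locating equal keys matters before skipping the shuffle.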
Common mistakes
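- Comparing nums[fast] to nums[fast-1] instead of nums[slow]: this happens to work here because the input is sorted, but it conflates the read and write heads and can break on variants such as LC 80, where earlier slots have already been overwritten.
- Returning slow instead of slow + 1: slow is the index of the last unique value, so the count is one more.
- Allocating a new array: the problem explicitly requires in-place modification.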
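The mistakes above are easy to catch with a small checker. The sketch below is illustrative: `checkDedup` and `offByOne` are hypothetical names, and the two-pointer solution is repeated only so the block runs on its own. The checker compares the returned k and the first k slots against the unique values of the input.

```javascript
// Two-pointer solution, repeated here so this block is self-contained.
function removeDuplicates(nums) {
  if (nums.length === 0) return 0;
  let slow = 0;
  for (let fast = 1; fast < nums.length; fast++) {
    if (nums[fast] !== nums[slow]) {
      slow++;
      nums[slow] = nums[fast];
    }
  }
  return slow + 1;
}

// Deliberately buggy variant: returns the last index instead of the count.
function offByOne(nums) {
  if (nums.length === 0) return 0;
  let slow = 0;
  for (let fast = 1; fast < nums.length; fast++) {
    if (nums[fast] !== nums[slow]) nums[++slow] = nums[fast];
  }
  return slow;
}

// Hypothetical checker: runs fn on a copy of a sorted input and verifies
// both the returned count and the first k slots.
function checkDedup(fn, sortedInput) {
  const nums = [...sortedInput];
  // For sorted input, Set insertion order is the sorted unique order.
  const expected = [...new Set(sortedInput)];
  const k = fn(nums);
  return k === expected.length && expected.every((v, i) => nums[i] === v);
}

const passes = checkDedup(removeDuplicates, [0, 0, 1, 1, 1, 2, 2, 3, 3, 4]);
const catchesBug = !checkDedup(offByOne, [1, 1, 2]);
```

Here `passes` and `catchesBug` are both true. A solution that builds a new array without writing back also fails the check whenever the input contains a duplicate, because the original slots are left unchanged.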
Follow-up questions
An interviewer at Databricks may pivot to one of these next:
- Allow at most 2 duplicates (LC 80).
- Unsorted input — what's the best you can do in-place?
- Distributed: dedup a Spark DataFrame after sorting within partitions.
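For the first follow-up, one common generalization keeps the same read/write heads but compares against the output written maxCopies slots back. This is a sketch under that assumption; `removeDuplicatesAtMost` and `maxCopies` are illustrative names, not part of the original problem statement.

```javascript
// Keeps at most maxCopies occurrences of each value in a sorted array, in place.
// Returns the new length k; the result occupies the first k slots.
function removeDuplicatesAtMost(nums, maxCopies) {
  let write = 0; // next slot to write
  for (let read = 0; read < nums.length; read++) {
    // Compare against the already-written output, not the raw input:
    // if the value maxCopies slots back differs, this copy is still allowed.
    if (write < maxCopies || nums[read] !== nums[write - maxCopies]) {
      nums[write] = nums[read];
      write++;
    }
  }
  return write;
}

const atMostTwo = [1, 1, 1, 2, 2, 3];
const kTwo = removeDuplicatesAtMost(atMostTwo, 2);
// kTwo is 5 and the first five slots are [1, 1, 2, 2, 3]
```

With maxCopies set to 1 this reduces to the original problem, which is a convenient way to present both in an interview.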
FAQ
Why two pointers and not one?
The slow pointer marks where to WRITE; the fast pointer is where to READ. They diverge whenever you skip a duplicate, which is the whole point of in-place compaction.
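To see the two pointers diverge, the following sketch records their positions at every step. `traceDedup` is a hypothetical helper written for illustration; it works on a copy so the input stays untouched.

```javascript
// Hypothetical tracing helper: runs the two-pointer dedup on a copy of the
// input and records both pointers after each step of the scan.
function traceDedup(sortedInput) {
  const nums = [...sortedInput];
  const steps = [];
  if (nums.length === 0) return { k: 0, steps };
  let slow = 0;
  for (let fast = 1; fast < nums.length; fast++) {
    const wrote = nums[fast] !== nums[slow];
    if (wrote) {
      slow++;
      nums[slow] = nums[fast];
    }
    steps.push({ fast, slow, wrote, nums: [...nums] });
  }
  return { k: slow + 1, steps };
}

const trace = traceDedup([1, 1, 2]);
// trace.k is 2
// trace.steps[0] is { fast: 1, slow: 0, wrote: false, nums: [1, 1, 2] }
// trace.steps[1] is { fast: 2, slow: 1, wrote: true,  nums: [1, 2, 2] }
```

In the first step fast moves past a duplicate while slow stays put; in the second, slow advances and receives the new value.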
Does this work on unsorted input?
No. The algorithm assumes duplicates are adjacent. Unsorted input needs a hash set (O(n) extra space) or a sort first (O(n log n) time).
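A minimal sketch of the hash-set option for unsorted input follows. `removeDuplicatesUnsorted` is an illustrative name; the function still compacts in place and keeps first-occurrence order, but it trades the O(1) space bound for an O(n) set of seen values.

```javascript
// Dedups an unsorted array in place, keeping the first occurrence of each value.
// Uses O(n) extra space for the set of values already seen.
function removeDuplicatesUnsorted(nums) {
  const seen = new Set();
  let write = 0; // next slot to write
  for (let read = 0; read < nums.length; read++) {
    if (!seen.has(nums[read])) {
      seen.add(nums[read]);
      nums[write] = nums[read];
      write++;
    }
  }
  return write;
}

const unsorted = [3, 1, 3, 2, 1];
const kUnsorted = removeDuplicatesUnsorted(unsorted);
// kUnsorted is 3 and the first three slots are [3, 1, 2]
```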
Practice these live with InterviewChamp.AI
Drill Remove Duplicates from Sorted Array and other Databricks interview questions under real-loop conditions with instant feedback on your reasoning, complexity claims, and code.