
4. Remove Duplicates from Sorted Array

Easy · Asked at Databricks

Modify a sorted array in-place to remove duplicates and return the new length. Databricks uses this to test the two-pointer (read/write head) pattern, which also underlies sort-based deduplication in distributed engines.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Source citations

Public interview reports confirming this problem appears in Databricks loops.

  • LeetCode Discuss (2025-10): Databricks SDE-II phone screen warm-up.
  • Glassdoor (2026-Q1): Followed by 'how does Spark's distinct() work on a sorted partition?'

Problem

Given an integer array nums sorted in non-decreasing order, remove the duplicates in-place such that each unique element appears only once. The relative order of the elements should be kept the same. Return k, the number of unique elements, after placing them in the first k slots of nums. Whatever remains beyond the first k slots does not matter.

Constraints

  • 1 <= nums.length <= 3 * 10^4
  • -100 <= nums[i] <= 100
  • nums is sorted in non-decreasing order.

Examples

Example 1

Input
nums = [1,1,2]
Output
2, nums = [1,2,_]

Example 2

Input
nums = [0,0,1,1,1,2,2,3,3,4]
Output
5, nums = [0,1,2,3,4,_,_,_,_,_]
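The underscores in the outputs mark slots whose contents are not inspected: only the returned k and the first k entries count. A minimal sketch of such a check follows; `checkResult` is a hypothetical helper name used for illustration, not part of the problem statement.

```javascript
// Hypothetical helper: checks a solver's result the way the examples describe.
// Only k and nums[0..k-1] are compared; anything after index k-1 is ignored.
function checkResult(nums, k, expected) {
  if (k !== expected.length) return false;
  for (let i = 0; i < k; i++) {
    if (nums[i] !== expected[i]) return false;
  }
  return true;
}

// Example 1: the trailing slot may hold any value.
console.log(checkResult([1, 2, 2], 2, [1, 2]));  // true
console.log(checkResult([1, 2, 99], 2, [1, 2])); // true
console.log(checkResult([1, 1, 2], 2, [1, 2]));  // false
```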

Approaches

1. Set + rebuild

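Insert every value into a Set to drop duplicates, then copy the unique values back into the front of nums.

Time: O(n)
Space: O(n)

function removeDuplicates(nums) {
  // A Set keeps insertion order, so the unique values stay sorted.
  const s = [...new Set(nums)];
  // Copy the unique values back into the first s.length slots of nums.
  for (let i = 0; i < s.length; i++) nums[i] = s[i];
  return s.length;
}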

Tradeoff: Works, but it ignores the sorted-input invariant and allocates O(n) extra memory, so it does not meet the in-place requirement Databricks is looking for.

2. Read-pointer / write-pointer (two pointers)

Slow pointer marks the next write slot; fast pointer scans. Write only when fast sees a new value.

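Time: O(n)
Space: O(1)

function removeDuplicates(nums) {
  if (nums.length === 0) return 0;
  // slow is the index of the last unique value written so far.
  let slow = 0;
  for (let fast = 1; fast < nums.length; fast++) {
    // A value different from the last one kept is new, because input is sorted.
    if (nums[fast] !== nums[slow]) {
      slow++;
      nums[slow] = nums[fast];
    }
  }
  // slow is an index, so the count of unique values is slow + 1.
  return slow + 1;
}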

Tradeoff: O(1) extra space. The sorted invariant means duplicates are adjacent — that's why one comparison is enough.
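To see the two heads diverge, the sketch below runs the same logic on Example 2 and records each write. The `log` array and the name `removeDuplicatesTraced` are additions for illustration only.

```javascript
// Same two-pointer logic as above, with an assumed `log` array added
// purely to show where each write lands.
function removeDuplicatesTraced(nums) {
  const log = [];
  if (nums.length === 0) return { k: 0, log };
  let slow = 0;
  for (let fast = 1; fast < nums.length; fast++) {
    if (nums[fast] !== nums[slow]) {
      slow++;
      nums[slow] = nums[fast];
      log.push(`fast=${fast} writes ${nums[fast]} to slot ${slow}`);
    }
  }
  return { k: slow + 1, log };
}

const nums = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4];
const { k, log } = removeDuplicatesTraced(nums);
console.log(log.join("\n"));
// fast=2 writes 1 to slot 1
// fast=5 writes 2 to slot 2
// fast=7 writes 3 to slot 3
// fast=9 writes 4 to slot 4
console.log(k, nums.slice(0, k)); // 5 [0, 1, 2, 3, 4]
```

Only four writes happen for ten elements; every other step is a read that skips a duplicate.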

Databricks-specific tips

Databricks reportedly favors the in-place version because the read/write-head pattern mirrors how a sort-based distinct operator can run over a sorted partition without an extra allocation. Be ready to discuss how this scales: on a Spark DataFrame, after sortWithinPartitions the same adjacent-comparison dedup can run inside each partition with no additional shuffle, provided equal keys already live in the same partition (otherwise a repartition by key, which is a shuffle, has to come first). Mentioning this shows you understand operator pipelining.
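The pipelining idea can be sketched in JavaScript as a streaming operator that remembers only the previous row. This is an illustration under stated assumptions, not Spark's actual implementation; `distinctSorted` and the `partitions` array are hypothetical names.

```javascript
// Hypothetical streaming operator: yields each distinct value from an
// already-sorted iterable, keeping only the previous value in memory.
// Illustration only; this is not Spark's actual implementation.
function* distinctSorted(sortedRows) {
  let first = true;
  let prev;
  for (const row of sortedRows) {
    if (first || row !== prev) {
      yield row;
      prev = row;
      first = false;
    }
  }
}

// Each "partition" is deduplicated independently, so a value present in two
// partitions survives in both unless equal values were co-located beforehand.
const partitions = [[1, 1, 2], [2, 3, 3]];
const perPartition = partitions.map((p) => [...distinctSorted(p)]);
console.log(perPartition); // [[1, 2], [2, 3]]
```

The value 2 appears in both outputs, which is why a per-partition pass yields a global distinct only when equal keys share a partition.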

Common mistakes

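  • Comparing nums[fast] to nums[fast-1] instead of nums[slow]: this happens to work here because the input is sorted, but it conflates the read and write heads and breaks on variants such as LC 80, where earlier slots have already been overwritten.
  • Returning slow instead of slow + 1: slow is the index of the last unique element, so the count is one more.
  • Allocating a new array: the problem explicitly requires in-place modification.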

Follow-up questions

An interviewer at Databricks may pivot to one of these next:

  • Allow at most 2 duplicates (LC 80).
  • Unsorted input — what's the best you can do in-place?
  • Distributed: dedup a Spark DataFrame after sorting within partitions.
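For the first follow-up, one common approach is to compare each value against the slot a fixed distance behind the write head. The sketch below generalizes "at most 2" to a `limit` parameter; that parameter and the function name are assumptions added for illustration.

```javascript
// Sketch for the "at most 2 duplicates" follow-up, generalized to a limit
// (the `limit` parameter is an assumption added for illustration).
function removeDuplicatesAtMost(nums, limit = 2) {
  let write = 0; // next slot to fill
  for (let read = 0; read < nums.length; read++) {
    // Keep nums[read] if fewer than `limit` values are kept so far, or if it
    // differs from the value kept `limit` slots behind the write head.
    if (write < limit || nums[read] !== nums[write - limit]) {
      nums[write] = nums[read];
      write++;
    }
  }
  return write;
}

const a = [1, 1, 1, 2, 2, 3];
const kept = removeDuplicatesAtMost(a);
console.log(kept, a.slice(0, kept)); // 5 [1, 1, 2, 2, 3]
```

With `limit = 1` this reduces to the original problem. The comparison has to use the write head (`write - limit`), not the read head, because slots behind the read head may already have been overwritten.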


FAQ

Why two pointers and not one?

The slow pointer marks where to WRITE; the fast pointer is where to READ. They diverge whenever you skip a duplicate, which is the whole point of in-place compaction.

Does this work on unsorted input?

No. The algorithm assumes duplicates are adjacent, which only sorted input guarantees. Unsorted input needs a hash set, or a sort first.
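A minimal sketch of the hash-set option for unsorted input, keeping the first occurrence of each value and compacting in place. The function name is hypothetical, and the Set costs O(n) extra space.

```javascript
// Sketch for unsorted input: a Set remembers values already kept, so
// duplicates need not be adjacent. Costs O(n) extra space.
function removeDuplicatesUnsorted(nums) {
  const seen = new Set();
  let write = 0; // next slot to fill
  for (let read = 0; read < nums.length; read++) {
    if (!seen.has(nums[read])) {
      seen.add(nums[read]);
      nums[write] = nums[read];
      write++;
    }
  }
  return write;
}

const b = [3, 1, 3, 2, 1];
const n = removeDuplicatesUnsorted(b);
console.log(n, b.slice(0, n)); // 3 [3, 1, 2]
```

The alternative is to sort first, for example with `nums.sort((x, y) => x - y)`, and then run the two-pointer pass. That drops the Set at the price of O(n log n) time and losing the original order; any extra space then depends on the engine's sort.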
