Skip to main content

30. Contains Duplicate

easyAsked at Snowflake

Decide whether an array contains any duplicate value. Snowflake uses this to test the dedup primitive — and to lead into discussions of approximate-cardinality structures like HyperLogLog, which they use for APPROX_COUNT_DISTINCT.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Source citations

Public interview reports confirming this problem appears in Snowflake loops.

  • Glassdoor (2026-Q1)Snowflake new-grad onsites use this to set up HLL follow-up.
  • LeetCode Discuss (2025-12)Recurring at Snowflake SDE-I screens.

Problem

Given an integer array nums, return true if any value appears at least twice in the array, and return false if every element is distinct.

Constraints

  • 1 <= nums.length <= 10^5
  • -10^9 <= nums[i] <= 10^9

Examples

Example 1

Input
nums = [1,2,3,1]
Output
true

Example 2

Input
nums = [1,2,3,4]
Output
false

Example 3

Input
nums = [1,1,1,3,3,4,3,2,4,2]
Output
true

Approaches

1. Sort then scan

Sort. Check adjacent pairs.

Time
O(n log n)
Space
O(1) or O(n) depending on sort
function containsDuplicate(nums) {
  nums.sort((a, b) => a - b);
  for (let i = 1; i < nums.length; i++) {
    if (nums[i] === nums[i - 1]) return true;
  }
  return false;
}

Tradeoff: n log n. Mention only as a no-extra-space option.

2. Hash set (optimal)

Walk once; add to set; if you see a duplicate insertion, return true.

Time
O(n)
Space
O(n)
function containsDuplicate(nums) {
  const seen = new Set();
  for (const n of nums) {
    if (seen.has(n)) return true;
    seen.add(n);
  }
  return false;
}

Tradeoff: Linear time, linear space. Streaming-friendly — process row-at-a-time.

Snowflake-specific tips

Snowflake interviewers want both solutions discussed and your tradeoff reasoning. Bonus signal: discuss HyperLogLog and Snowflake's APPROX_COUNT_DISTINCT — at trillion-row scale you can't keep an exact set, so you use a probabilistic sketch with ~1% error in O(log log n) bits.

Common mistakes

  • Doing nums.length !== new Set(nums).size — works but iterates twice and allocates the full set even when an early duplicate exists.
  • Sorting in place when the caller didn't expect mutation.
  • Using includes() in a loop — O(n^2).

Follow-up questions

An interviewer at Snowflake may pivot to one of these next:

  • Contains Duplicate II (LC 219) — duplicate within k indices.
  • Contains Duplicate III (LC 220) — duplicate within k indices and t value difference.
  • How does HyperLogLog count distinct in a streaming fashion?

Solve it now

Free. No sign-up. Python and JavaScript run instantly in your browser.

Output

Press Run or Cmd+Enter to execute

FAQ

Why hash set and not array?

Hash set gives O(1) average lookup. Using an array means O(n) lookup per check, blowing up to O(n^2) total.

Why does Snowflake care about approximate counts?

An exact COUNT(DISTINCT) over a billion-row column requires either sorting or hashing a billion-element structure. APPROX_COUNT_DISTINCT uses HyperLogLog, which trades 1-2% error for orders of magnitude less memory and CPU.

Practice these live with InterviewChamp.AI

Drill Contains Duplicate and other Snowflake interview questions under real-loop conditions with instant feedback on your reasoning, complexity claims, and code.

Practice these live with InterviewChamp.AI →