30. Contains Duplicate

Q: Why hash set and not array?

Hash set gives O(1) average lookup. Using an array means O(n) lookup per check, blowing up to O(n^2) total.

Q: Why does Snowflake care about approximate counts?

An exact COUNT(DISTINCT) over a billion-row column requires either sorting or hashing a billion-element structure. APPROX_COUNT_DISTINCT uses HyperLogLog, which trades 1-2% error for orders of magnitude less memory and CPU.

easyAsked at Snowflake

Decide whether an array contains any duplicate value. Snowflake uses this to test the dedup primitive — and to lead into discussions of approximate-cardinality structures like HyperLogLog, which they use for APPROX_COUNT_DISTINCT.

By Sam K., Founder, InterviewChamp.AI · Last verified 2026-05-20

Source citations

Public interview reports confirming this problem appears in Snowflake loops.

Glassdoor (2026-Q1)— Snowflake new-grad onsites use this to set up HLL follow-up.
LeetCode Discuss (2025-12)— Recurring at Snowflake SDE-I screens.

Problem

Given an integer array nums, return true if any value appears at least twice in the array, and return false if every element is distinct.

Constraints

1 <= nums.length <= 10^5
-10^9 <= nums[i] <= 10^9

Examples

Example 1

Input

nums = [1,2,3,1]

Output

true

Example 2

Input

nums = [1,2,3,4]

Output

false

Example 3

Input

nums = [1,1,1,3,3,4,3,2,4,2]

Output

true

Approaches

1. Sort then scan

Sort. Check adjacent pairs.

Time: O(n log n)
Space: O(1) or O(n) depending on sort

function containsDuplicate(nums) {
  nums.sort((a, b) => a - b);
  for (let i = 1; i < nums.length; i++) {
    if (nums[i] === nums[i - 1]) return true;
  }
  return false;
}

Tradeoff: n log n. Mention only as a no-extra-space option.

2. Hash set (optimal)

Walk once; add to set; if you see a duplicate insertion, return true.

Time: O(n)
Space: O(n)

function containsDuplicate(nums) {
  const seen = new Set();
  for (const n of nums) {
    if (seen.has(n)) return true;
    seen.add(n);
  }
  return false;
}

Tradeoff: Linear time, linear space. Streaming-friendly — process row-at-a-time.

Snowflake-specific tips

Snowflake interviewers want both solutions discussed and your tradeoff reasoning. Bonus signal: discuss HyperLogLog and Snowflake's APPROX_COUNT_DISTINCT — at trillion-row scale you can't keep an exact set, so you use a probabilistic sketch with ~1% error in O(log log n) bits.

Common mistakes

Doing nums.length !== new Set(nums).size — works but iterates twice and allocates the full set even when an early duplicate exists.
Sorting in place when the caller didn't expect mutation.
Using includes() in a loop — O(n^2).

Follow-up questions

An interviewer at Snowflake may pivot to one of these next:

Contains Duplicate II (LC 219) — duplicate within k indices.
Contains Duplicate III (LC 220) — duplicate within k indices and t value difference.
How does HyperLogLog count distinct in a streaming fashion?

Solve it now

Free. No sign-up. Python and JavaScript run instantly in your browser.

Output

Press Run or Cmd+Enter to execute

FAQ

Why hash set and not array?

Hash set gives O(1) average lookup. Using an array means O(n) lookup per check, blowing up to O(n^2) total.

Why does Snowflake care about approximate counts?

An exact COUNT(DISTINCT) over a billion-row column requires either sorting or hashing a billion-element structure. APPROX_COUNT_DISTINCT uses HyperLogLog, which trades 1-2% error for orders of magnitude less memory and CPU.

30. Contains Duplicate

Source citations

Problem

Constraints

Examples

Approaches

1. Sort then scan

2. Hash set (optimal)

Snowflake-specific tips

Common mistakes

Follow-up questions

Solve it now

Editor Settings

FAQ

More Snowflake coding interview questions

Companies that also ask Contains Duplicate