30. Contains Duplicate
easyAsked at SnowflakeDecide whether an array contains any duplicate value. Snowflake uses this to test the dedup primitive — and to lead into discussions of approximate-cardinality structures like HyperLogLog, which they use for APPROX_COUNT_DISTINCT.
By Alex Chen, Founder, InterviewChamp.AI · Last verified
Source citations
Public interview reports confirming this problem appears in Snowflake loops.
- Glassdoor (2026-Q1)— Snowflake new-grad onsites use this to set up HLL follow-up.
- LeetCode Discuss (2025-12)— Recurring at Snowflake SDE-I screens.
Problem
Given an integer array nums, return true if any value appears at least twice in the array, and return false if every element is distinct.
Constraints
1 <= nums.length <= 10^5-10^9 <= nums[i] <= 10^9
Examples
Example 1
nums = [1,2,3,1]trueExample 2
nums = [1,2,3,4]falseExample 3
nums = [1,1,1,3,3,4,3,2,4,2]trueApproaches
1. Sort then scan
Sort. Check adjacent pairs.
- Time
- O(n log n)
- Space
- O(1) or O(n) depending on sort
function containsDuplicate(nums) {
nums.sort((a, b) => a - b);
for (let i = 1; i < nums.length; i++) {
if (nums[i] === nums[i - 1]) return true;
}
return false;
}Tradeoff: n log n. Mention only as a no-extra-space option.
2. Hash set (optimal)
Walk once; add to set; if you see a duplicate insertion, return true.
- Time
- O(n)
- Space
- O(n)
function containsDuplicate(nums) {
const seen = new Set();
for (const n of nums) {
if (seen.has(n)) return true;
seen.add(n);
}
return false;
}Tradeoff: Linear time, linear space. Streaming-friendly — process row-at-a-time.
Snowflake-specific tips
Snowflake interviewers want both solutions discussed and your tradeoff reasoning. Bonus signal: discuss HyperLogLog and Snowflake's APPROX_COUNT_DISTINCT — at trillion-row scale you can't keep an exact set, so you use a probabilistic sketch with ~1% error in O(log log n) bits.
Common mistakes
- Doing nums.length !== new Set(nums).size — works but iterates twice and allocates the full set even when an early duplicate exists.
- Sorting in place when the caller didn't expect mutation.
- Using includes() in a loop — O(n^2).
Follow-up questions
An interviewer at Snowflake may pivot to one of these next:
- Contains Duplicate II (LC 219) — duplicate within k indices.
- Contains Duplicate III (LC 220) — duplicate within k indices and t value difference.
- How does HyperLogLog count distinct in a streaming fashion?
Solve it now
Free. No sign-up. Python and JavaScript run instantly in your browser.
FAQ
Why hash set and not array?
Hash set gives O(1) average lookup. Using an array means O(n) lookup per check, blowing up to O(n^2) total.
Why does Snowflake care about approximate counts?
An exact COUNT(DISTINCT) over a billion-row column requires either sorting or hashing a billion-element structure. APPROX_COUNT_DISTINCT uses HyperLogLog, which trades 1-2% error for orders of magnitude less memory and CPU.
Practice these live with InterviewChamp.AI
Drill Contains Duplicate and other Snowflake interview questions under real-loop conditions with instant feedback on your reasoning, complexity claims, and code.
Practice these live with InterviewChamp.AI →