87. Repeated DNA Sequences
mediumAsked at VercelFind all 10-letter substrings that appear more than once in a DNA string. Vercel asks this for the rolling-hash / sliding-window pattern — same shape as their request-fingerprint dedup over a sliding time window.
By Alex Chen, Founder, InterviewChamp.AI · Last verified
Source citations
Public interview reports confirming this problem appears in Vercel loops.
- Glassdoor (2025-12)— Vercel runtime screen; sliding window expected.
- LeetCode Discuss (2026-Q1)— Listed in Vercel screen pool.
Problem
The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'. Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
Constraints
1 <= s.length <= 10^5s[i] is either 'A', 'C', 'G', or 'T'.
Examples
Example 1
s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"["AAAAACCCCC","CCCCCAAAAA"]Example 2
s = "AAAAAAAAAAAAA"["AAAAAAAAAA"]Approaches
1. Hash set of 10-char substrings
Slide a window; track seen and seen-twice sets.
- Time
- O(n)
- Space
- O(n)
function findRepeatedDnaSequences(s) {
const seen = new Set();
const twice = new Set();
for (let i = 0; i + 10 <= s.length; i++) {
const sub = s.substring(i, i + 10);
if (seen.has(sub)) twice.add(sub);
else seen.add(sub);
}
return [...twice];
}Tradeoff: Simple and correct. The rolling hash gives a constant-factor speedup but isn't strictly necessary for the LC constraint.
2. Rolling hash with 2-bit encoding (alternative)
Encode each nucleotide as 2 bits; pack 10 nucleotides into one 20-bit int; slide by shifting.
- Time
- O(n)
- Space
- O(n) in worst case
function findRepeatedDnaSequences(s) {
const code = { A: 0, C: 1, G: 2, T: 3 };
const seen = new Set(), twice = new Set();
let hash = 0;
const mask = (1 << 20) - 1;
for (let i = 0; i < s.length; i++) {
hash = ((hash << 2) | code[s[i]]) & mask;
if (i >= 9) {
if (seen.has(hash)) twice.add(s.substring(i - 9, i + 1));
else seen.add(hash);
}
}
return [...twice];
}Tradeoff: Avoids substring allocation. The 2-bit-per-nucleotide packing is the classic bit-trick for DNA processing.
Vercel-specific tips
Vercel grades the sliding-window pattern. Bonus signal: offering the 2-bit rolling-hash version and explaining that the 10-character DNA window fits in 20 bits, perfect for fast hashing. Mention that for general strings you'd use a polynomial rolling hash.
Common mistakes
- Returning ALL substrings, not just those appearing >= twice.
- Using indexOf to count occurrences — O(n^2).
- Off-by-one on the substring bounds — i + 10 vs i + 9.
Follow-up questions
An interviewer at Vercel may pivot to one of these next:
- Generalize to k-letter substrings.
- Find the most common k-mer.
- Streaming version with bounded memory.
Solve it now
Free. No sign-up. Python and JavaScript run instantly in your browser.
FAQ
Why 2 bits per nucleotide?
Four characters need exactly 2 bits (4 = 2^2). Packing 10 characters into 20 bits fits in a JavaScript int — fast hash without collisions on this alphabet.
Why two sets, not one?
We want substrings appearing >= 2 times in the output. A single set can't distinguish 1 vs 2+. Two sets (seen and seen-twice) handle that cleanly.
Practice these live with InterviewChamp.AI
Drill Repeated DNA Sequences and other Vercel interview questions under real-loop conditions with instant feedback on your reasoning, complexity claims, and code.
Practice these live with InterviewChamp.AI →