Skip to main content

87. Repeated DNA Sequences

mediumAsked at Vercel

Find all 10-letter substrings that appear more than once in a DNA string. Vercel asks this for the rolling-hash / sliding-window pattern — same shape as their request-fingerprint dedup over a sliding time window.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Source citations

Public interview reports confirming this problem appears in Vercel loops.

  • Glassdoor (2025-12)Vercel runtime screen; sliding window expected.
  • LeetCode Discuss (2026-Q1)Listed in Vercel screen pool.

Problem

The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'. Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

Constraints

  • 1 <= s.length <= 10^5
  • s[i] is either 'A', 'C', 'G', or 'T'.

Examples

Example 1

Input
s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output
["AAAAACCCCC","CCCCCAAAAA"]

Example 2

Input
s = "AAAAAAAAAAAAA"
Output
["AAAAAAAAAA"]

Approaches

1. Hash set of 10-char substrings

Slide a window; track seen and seen-twice sets.

Time
O(n)
Space
O(n)
function findRepeatedDnaSequences(s) {
  const seen = new Set();
  const twice = new Set();
  for (let i = 0; i + 10 <= s.length; i++) {
    const sub = s.substring(i, i + 10);
    if (seen.has(sub)) twice.add(sub);
    else seen.add(sub);
  }
  return [...twice];
}

Tradeoff: Simple and correct. The rolling hash gives a constant-factor speedup but isn't strictly necessary for the LC constraint.

2. Rolling hash with 2-bit encoding (alternative)

Encode each nucleotide as 2 bits; pack 10 nucleotides into one 20-bit int; slide by shifting.

Time
O(n)
Space
O(n) in worst case
function findRepeatedDnaSequences(s) {
  const code = { A: 0, C: 1, G: 2, T: 3 };
  const seen = new Set(), twice = new Set();
  let hash = 0;
  const mask = (1 << 20) - 1;
  for (let i = 0; i < s.length; i++) {
    hash = ((hash << 2) | code[s[i]]) & mask;
    if (i >= 9) {
      if (seen.has(hash)) twice.add(s.substring(i - 9, i + 1));
      else seen.add(hash);
    }
  }
  return [...twice];
}

Tradeoff: Avoids substring allocation. The 2-bit-per-nucleotide packing is the classic bit-trick for DNA processing.

Vercel-specific tips

Vercel grades the sliding-window pattern. Bonus signal: offering the 2-bit rolling-hash version and explaining that the 10-character DNA window fits in 20 bits, perfect for fast hashing. Mention that for general strings you'd use a polynomial rolling hash.

Common mistakes

  • Returning ALL substrings, not just those appearing >= twice.
  • Using indexOf to count occurrences — O(n^2).
  • Off-by-one on the substring bounds — i + 10 vs i + 9.

Follow-up questions

An interviewer at Vercel may pivot to one of these next:

  • Generalize to k-letter substrings.
  • Find the most common k-mer.
  • Streaming version with bounded memory.

Solve it now

Free. No sign-up. Python and JavaScript run instantly in your browser.

Output

Press Run or Cmd+Enter to execute

FAQ

Why 2 bits per nucleotide?

Four characters need exactly 2 bits (4 = 2^2). Packing 10 characters into 20 bits fits in a JavaScript int — fast hash without collisions on this alphabet.

Why two sets, not one?

We want substrings appearing >= 2 times in the output. A single set can't distinguish 1 vs 2+. Two sets (seen and seen-twice) handle that cleanly.

Practice these live with InterviewChamp.AI

Drill Repeated DNA Sequences and other Vercel interview questions under real-loop conditions with instant feedback on your reasoning, complexity claims, and code.

Practice these live with InterviewChamp.AI →