Skip to main content

187. Repeated DNA Sequences

medium

Find every 10-letter sequence that appears more than once in a DNA string. The bit-encoding trick packs each 10-mer into a 20-bit integer, slashing memory and hash time.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Problem

The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'. Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

Constraints

  • 1 <= s.length <= 10^5
  • s[i] is either 'A', 'C', 'G', or 'T'.

Examples

Example 1

Input
s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output
["AAAAACCCCC","CCCCCAAAAA"]

Example 2

Input
s = "AAAAAAAAAAAAA"
Output
["AAAAAAAAAA"]

Solve it now

Free. No sign-up. Python and JavaScript run instantly in your browser.

Output

Press Run or Cmd+Enter to execute

Hints

Progressive — try the first before opening the next.

Hint 1

Hashing the substrings into a hash map works — O(n * 10) time, O(n * 10) space.

Hint 2

Four nucleotides fit into 2 bits each. A 10-mer fits in 20 bits = one int. Roll the encoding like a sliding-window hash.

Hint 3

Map A=00, C=01, G=10, T=11. The 20-bit rolling hash is updated in O(1): ((hash << 2) | next_code) & MASK20.

Solution approach

Reveal approach

Rolling bit-encoded hash. Map A→0, C→1, G→2, T→3 with a small lookup. For positions i = 0 to len(s) - 10, maintain a 20-bit integer that encodes s[i..i+9]: shift left by 2 to drop the leftmost code (after masking to 20 bits) and OR in the new rightmost code. Use a hash set seen and a hash set added; whenever a rolling hash repeats and hasn't already been added, append the substring to the result and mark it added. O(n) time over the string (each window step is O(1)) and O(n) space. The bit-packing is a constant-factor win on memory and gives the hash set fixed-size integer keys instead of 10-char strings.

Complexity

Time
O(n)
Space
O(n)

Related patterns

  • bit-manipulation
  • hash-map
  • sliding-window

Related problems

Asked at

Companies reported asking this problem (sourced from public Glassdoor, Blind, and Levels.fyi interview posts).

  • LinkedIn

Practice these live with InterviewChamp.AI

Drill Repeated DNA Sequences and Bit Manipulation problems under real interview conditions with instant feedback on your reasoning, complexity claims, and code.

Practice these live with InterviewChamp.AI →