187. Repeated DNA Sequences
mediumFind every 10-letter sequence that appears more than once in a DNA string. The bit-encoding trick packs each 10-mer into a 20-bit integer, slashing memory and hash time.
By Alex Chen, Founder, InterviewChamp.AI · Last verified
Problem
The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'. Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
Constraints
1 <= s.length <= 10^5s[i] is either 'A', 'C', 'G', or 'T'.
Examples
Example 1
s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"["AAAAACCCCC","CCCCCAAAAA"]Example 2
s = "AAAAAAAAAAAAA"["AAAAAAAAAA"]Solve it now
Free. No sign-up. Python and JavaScript run instantly in your browser.
Hints
Progressive — try the first before opening the next.
Hint 1
Hashing the substrings into a hash map works — O(n * 10) time, O(n * 10) space.
Hint 2
Four nucleotides fit into 2 bits each. A 10-mer fits in 20 bits = one int. Roll the encoding like a sliding-window hash.
Hint 3
Map A=00, C=01, G=10, T=11. The 20-bit rolling hash is updated in O(1): ((hash << 2) | next_code) & MASK20.
Solution approach
Reveal approach
Rolling bit-encoded hash. Map A→0, C→1, G→2, T→3 with a small lookup. For positions i = 0 to len(s) - 10, maintain a 20-bit integer that encodes s[i..i+9]: shift left by 2 to drop the leftmost code (after masking to 20 bits) and OR in the new rightmost code. Use a hash set seen and a hash set added; whenever a rolling hash repeats and hasn't already been added, append the substring to the result and mark it added. O(n) time over the string (each window step is O(1)) and O(n) space. The bit-packing is a constant-factor win on memory and gives the hash set fixed-size integer keys instead of 10-char strings.
Complexity
- Time
- O(n)
- Space
- O(n)
Related patterns
- bit-manipulation
- hash-map
- sliding-window
Related problems
Asked at
Companies reported asking this problem (sourced from public Glassdoor, Blind, and Levels.fyi interview posts).
Practice these live with InterviewChamp.AI
Drill Repeated DNA Sequences and Bit Manipulation problems under real interview conditions with instant feedback on your reasoning, complexity claims, and code.
Practice these live with InterviewChamp.AI →