Skip to main content

139. Word Break

mediumAsked at Cohere

Determine if a string can be segmented into words from a dictionary. Cohere asks this because the DP segmentation pattern directly models how subword tokenisers decide optimal byte-pair-encoding splits during vocabulary construction.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Source citations

Public interview reports confirming this problem appears in Cohere loops.

  • Glassdoor (2026-Q1)Cohere MLE and SWE candidates identify Word Break as a recurring DP question with an NLP framing in onsite interviews.
  • Blind (2025-11)Cohere threads note Word Break as especially relevant given the company's focus on language and tokenisation.

Problem

Given a string s and a dictionary of strings wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words. Note that the same word in the dictionary may be reused multiple times in the segmentation.

Constraints

  • 1 <= s.length <= 300
  • 1 <= wordDict.length <= 1000
  • 1 <= wordDict[i].length <= 20
  • s and wordDict[i] consist of only lowercase English letters.
  • All the strings of wordDict are unique.

Examples

Example 1

Input
s = "leetcode", wordDict = ["leet","code"]
Output
true

Explanation: "leetcode" = "leet" + "code".

Example 2

Input
s = "applepenapple", wordDict = ["apple","pen"]
Output
true

Explanation: "apple" + "pen" + "apple" — words can be reused.

Example 3

Input
s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]
Output
false

Explanation: No valid segmentation.

Approaches

1. Bottom-up DP

dp[i] = true if s[0..i-1] can be segmented. For each position i, check all substrings s[j..i-1] where dp[j] is true and the substring is in the wordSet.

Time
O(n³) worst case (n² substrings, each check O(n) for string comparison)
Space
O(n + m)
function wordBreak(s, wordDict) {
  const wordSet = new Set(wordDict);
  const dp = new Array(s.length + 1).fill(false);
  dp[0] = true; // empty prefix is always valid
  for (let i = 1; i <= s.length; i++) {
    for (let j = 0; j < i; j++) {
      if (dp[j] && wordSet.has(s.slice(j, i))) {
        dp[i] = true;
        break;
      }
    }
  }
  return dp[s.length];
}

Tradeoff: Clear DP. O(n²) iterations; each slice is O(n). Use a Set for O(1) word lookup. In practice very fast for the given constraints.

2. BFS over string positions

Treat each index as a node. BFS from 0; for each position, try all words from the dictionary and jump by word.length if the substring matches.

Time
O(n · m · L) where m=dict size, L=max word length
Space
O(n)
function wordBreak(s, wordDict) {
  const visited = new Set();
  const queue = [0];
  while (queue.length) {
    const start = queue.shift();
    if (visited.has(start)) continue;
    visited.add(start);
    for (const word of wordDict) {
      const end = start + word.length;
      if (end <= s.length && s.slice(start, end) === word) {
        if (end === s.length) return true;
        queue.push(end);
      }
    }
  }
  return false;
}

Tradeoff: Intuitive graph framing. The visited set is critical to avoid redundant re-exploration. Better when the dictionary is small relative to n.

Cohere-specific tips

Cohere interviewers appreciate the connection to tokenisation: 'Word Break is essentially the problem a subword tokeniser solves — given a character sequence and a vocabulary, find the optimal segmentation.' Discuss how the DP table maps to the Viterbi algorithm used in hidden Markov model-based tokenisers. If the interviewer asks for all possible segmentations (Word Break II), mention memoised backtracking.

Common mistakes

  • Using wordDict as an array for lookup — convert to a Set for O(1) average lookup.
  • Not initialising dp[0] = true — the empty prefix is the required base case.
  • Forgetting the break after setting dp[i] = true — without it, unnecessary inner iterations continue.
  • Using recursion without memoisation — exponential time for pathological inputs like 'aaaa...a' with words ['a', 'aa', 'aaa'].

Follow-up questions

An interviewer at Cohere may pivot to one of these next:

  • Word Break II (LC 140) — return all valid segmentations.
  • How would you extend this to handle unknown words with a character-level fallback?
  • Describe how byte-pair encoding tokenisation relates to this DP segmentation problem.

Solve it now

Free. No sign-up. Python and JavaScript run instantly in your browser.

Output

Press Run or Cmd+Enter to execute

FAQ

Why is dp[0] = true?

It represents the empty prefix, which is trivially segmentable. Without it, no word matching the start of s would ever set dp[word.length] = true.

What makes this problem hard for memoised recursion?

The same suffix can be reached by many different prefix segmentations. Without memoisation, subproblems are recomputed exponentially. The DP table avoids this.

How does Trie-based lookup improve the complexity?

A Trie allows prefix matching in O(L) rather than checking all m words at each position, reducing the inner loop from O(m·L) to O(n) in practice.

Practice these live with InterviewChamp.AI

Drill Word Break and other Cohere interview questions under real-loop conditions with instant feedback on your reasoning, complexity claims, and code.

Practice these live with InterviewChamp.AI →