Skip to main content

139. Word Break

mediumAsked at Hugging Face

Determine if a string can be segmented into words from a dictionary. Hugging Face uses this as a DP signal problem with direct ML relevance — tokenization itself is a form of word breaking, and understanding how dynamic programming explores segmentation space helps engineers reason about subword tokenizer design.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Source citations

Public interview reports confirming this problem appears in Hugging Face loops.

  • Glassdoor (2026-Q1)Cited in Hugging Face SWE onsite reports as a DP problem with strong tokenization relevance.
  • Blind (2025-11)Hugging Face threads highlight Word Break as a medium problem frequently asked for ML and NLP engineering roles.

Problem

Given a string s and a dictionary of strings wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words. Note that the same word in the dictionary may be reused multiple times in the segmentation.

Constraints

  • 1 <= s.length <= 300
  • 1 <= wordDict.length <= 1000
  • 1 <= wordDict[i].length <= 20
  • s and wordDict[i] consist of only lowercase English letters.
  • All the strings of wordDict are unique.

Examples

Example 1

Input
s = "leetcode", wordDict = ["leet","code"]
Output
true

Explanation: "leetcode" = "leet" + "code".

Example 2

Input
s = "applepenapple", wordDict = ["apple","pen"]
Output
true

Explanation: "apple" + "pen" + "apple". Reuse is allowed.

Example 3

Input
s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]
Output
false

Explanation: No valid segmentation exists.

Approaches

1. Bottom-up DP

dp[i] = true if s[0..i-1] can be segmented. For each position i, check all substrings s[j..i-1] that are in the dictionary and dp[j] is true.

Time
O(n² * m) where m is the average word length for the slice check
Space
O(n)
function wordBreak(s, wordDict) {
  const wordSet = new Set(wordDict);
  const dp = new Array(s.length + 1).fill(false);
  dp[0] = true; // empty prefix is always segmentable
  for (let i = 1; i <= s.length; i++) {
    for (let j = 0; j < i; j++) {
      if (dp[j] && wordSet.has(s.slice(j, i))) {
        dp[i] = true;
        break;
      }
    }
  }
  return dp[s.length];
}

Tradeoff: O(n²) in practice (the slice is O(n) in the worst case). Clear recurrence, easy to explain. Preferred in interviews for its directness.

2. BFS / memoized DFS

Treat each index as a graph node. BFS from index 0: for each position, try all words in the dictionary; if s.startsWith(word, i), add i + word.length to the queue. Reach s.length = success.

Time
O(n * w * L) where w=wordDict size, L=max word length
Space
O(n)
function wordBreak(s, wordDict) {
  const visited = new Set();
  const queue = [0];
  while (queue.length) {
    const start = queue.shift();
    if (visited.has(start)) continue;
    visited.add(start);
    for (const word of wordDict) {
      if (s.startsWith(word, start)) {
        const next = start + word.length;
        if (next === s.length) return true;
        queue.push(next);
      }
    }
  }
  return false;
}

Tradeoff: BFS avoids revisiting positions (visited set). Good when wordDict is small but s is long. The DP approach is generally preferred for its simpler implementation.

Hugging Face-specific tips

Hugging Face will immediately connect this to tokenization: 'This is essentially what a greedy tokenizer does — it tries to match the longest word in the vocabulary at each position. A DP approach finds ALL valid segmentations, not just the greedy one.' Then ask: 'What changes if you need to return all valid segmentations instead of just a boolean?' (Word Break II — LC 140). The ability to articulate this connection signals NLP engineering depth.

Common mistakes

  • Not initializing dp[0] = true — the empty prefix is the base case; without it no word can start at index 0.
  • Using wordDict as a list for membership checks instead of converting to a Set — O(n) per lookup vs. O(1).
  • Forgetting to break early when dp[i] is set to true — otherwise the inner loop continues unnecessarily.
  • Off-by-one: s.slice(j, i) in a length-indexed loop — verify that dp indices and string slice indices align.

Follow-up questions

An interviewer at Hugging Face may pivot to one of these next:

  • Word Break II (LC 140) — return all valid segmentations; use backtracking with memoization.
  • How would you modify the algorithm to prefer the fewest segments?
  • How does this relate to the Viterbi algorithm used in sequence decoding for NLP models?

Solve it now

Free. No sign-up. Python and JavaScript run instantly in your browser.

Output

Press Run or Cmd+Enter to execute

FAQ

Why must dp[0] = true?

It represents the empty prefix, which is trivially segmentable. Without this base case, no word match at the start of the string can ever set dp[word.length] = true.

What is the time complexity?

O(n² * L) where n = s.length and L = average word length (for the string slice). With a Trie instead of a Set, lookups become O(L) total and the inner loop can be eliminated.

How does a Trie optimize this?

Instead of trying all j < i for each i, walk the Trie from each position i backward — the Trie prunes branches early when no word starts with the current prefix, reducing the search space.

Practice these live with InterviewChamp.AI

Drill Word Break and other Hugging Face interview questions under real-loop conditions with instant feedback on your reasoning, complexity claims, and code.

Practice these live with InterviewChamp.AI →