Skip to main content

49. Group Anagrams

mediumAsked at Hugging Face

Group strings that are anagrams of each other. Hugging Face uses this to test canonical-key hashing — the same fingerprinting technique used to cluster semantically equivalent dataset examples or deduplicate near-duplicate model cards in the Hub.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Source citations

Public interview reports confirming this problem appears in Hugging Face loops.

  • Glassdoor (2025-12)Cited in Hugging Face SWE medium-round reports as a canonical-key grouping exercise.
  • Blind (2025-09)Hugging Face threads list Group Anagrams as a common medium problem testing hash map design.

Problem

Given an array of strings strs, group the anagrams together. You can return the answer in any order. An Anagram is a word or phrase formed by rearranging the letters of a different word or phrase, using all the original letters exactly once.

Constraints

  • 1 <= strs.length <= 10^4
  • 0 <= strs[i].length <= 100
  • strs[i] consists of lowercase English letters.

Examples

Example 1

Input
strs = ["eat","tea","tan","ate","nat","bat"]
Output
[["bat"],["nat","tan"],["ate","eat","tea"]]

Explanation: "eat", "tea", "ate" are anagrams; "tan", "nat" are anagrams; "bat" stands alone.

Example 2

Input
strs = [""]
Output
[[""]]

Explanation: Single empty string.

Example 3

Input
strs = ["a"]
Output
[["a"]]

Explanation: Single character.

Approaches

1. Sorted-string key

Sort each string to produce a canonical key. Strings that are anagrams will have the same sorted key. Group them in a Map.

Time
O(n * k log k) where k is max string length
Space
O(n * k)
function groupAnagrams(strs) {
  const map = new Map();
  for (const str of strs) {
    const key = str.split('').sort().join('');
    if (!map.has(key)) map.set(key, []);
    map.get(key).push(str);
  }
  return Array.from(map.values());
}

Tradeoff: Simple and correct. The sort is O(k log k) per string. For short strings (k ≤ 100) this is essentially O(1) per string, making the total O(n). Preferred in most interview settings.

2. Character-count key

Build a 26-element frequency count for each string and use it as a key. No sorting required — O(k) per string.

Time
O(n * k)
Space
O(n * k)
function groupAnagrams(strs) {
  const map = new Map();
  for (const str of strs) {
    const count = new Array(26).fill(0);
    for (const ch of str) count[ch.charCodeAt(0) - 97]++;
    const key = count.join(',');
    if (!map.has(key)) map.set(key, []);
    map.get(key).push(str);
  }
  return Array.from(map.values());
}

Tradeoff: Strictly O(n * k) — better than sort for very long strings. Mention this optimization if the interviewer asks about the bottleneck or if k can be large.

Hugging Face-specific tips

Hugging Face will often follow up: 'What if the strings were Unicode and not just lowercase letters?' Discuss generalizing the character-count key to a Map<char, count> or using a sorted-character key (which already handles Unicode). Connect to their domain: 'The same canonical-fingerprint approach is how dataset deduplication works — you hash a normalized version of each example and group by hash.' This shows ML systems awareness.

Common mistakes

  • Using the original string as a key — anagrams have different strings but the same sorted key.
  • Sorting by character code but forgetting that sort() is lexicographic by default — always pass a comparator or use split/sort/join for strings.
  • Returning the map directly instead of converting to Array.from(map.values()).
  • Not handling empty strings — an empty string sorts to an empty string, which is a valid key.

Follow-up questions

An interviewer at Hugging Face may pivot to one of these next:

  • Group Shifted Strings (LC 249) — group strings whose character differences are the same.
  • How would you scale this to group anagrams in a corpus of 10^9 strings using MapReduce?
  • What canonical key would you use if the alphabet was Unicode instead of just lowercase English letters?

Solve it now

Free. No sign-up. Python and JavaScript run instantly in your browser.

Output

Press Run or Cmd+Enter to execute

FAQ

Why is sorted-string key acceptable despite O(k log k)?

In practice k ≤ 100, so the sort is effectively O(1). For very large k (e.g., genomic sequences), the character-count key at O(k) is strictly better.

Can two different strings have the same sorted key but not be anagrams?

No — sorting is a bijection on multisets of characters. Same sorted string ↔ same character multiset ↔ anagram.

Is the output order important?

No — 'return the answer in any order' means groups and their internal order can vary.

Practice these live with InterviewChamp.AI

Drill Group Anagrams and other Hugging Face interview questions under real-loop conditions with instant feedback on your reasoning, complexity claims, and code.

Practice these live with InterviewChamp.AI →