Skip to main content

49. Group Anagrams

mediumAsked at Cohere

Group strings that are anagrams of one another. Cohere asks this because canonical-key hashing mirrors how embedding models deduplicate semantically-equivalent queries before routing them to a retrieval index.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Source citations

Public interview reports confirming this problem appears in Cohere loops.

  • Glassdoor (2025-Q4)Cohere MLE and SWE candidates cite Group Anagrams as a frequent medium question in coding rounds.
  • Blind (2025-09)Mentioned in Cohere prep guides as a standard hashing-and-grouping problem.

Problem

Given an array of strings strs, group the anagrams together. You can return the answer in any order. An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, using all the original letters exactly once.

Constraints

  • 1 <= strs.length <= 10^4
  • 0 <= strs[i].length <= 100
  • strs[i] consists of lowercase English letters.

Examples

Example 1

Input
strs = ["eat","tea","tan","ate","nat","bat"]
Output
[["bat"],["nat","tan"],["ate","eat","tea"]]

Explanation: Three anagram groups.

Example 2

Input
strs = [""]
Output
[[""]]

Explanation: Single empty string group.

Example 3

Input
strs = ["a"]
Output
[["a"]]

Explanation: Single character group.

Approaches

1. Sort each string as key

For each word, sort its characters to produce a canonical key. Group words by their canonical key in a Map.

Time
O(n · k log k) where k is max string length
Space
O(n · k)
function groupAnagrams(strs) {
  const map = new Map();
  for (const s of strs) {
    const key = s.split('').sort().join('');
    if (!map.has(key)) map.set(key, []);
    map.get(key).push(s);
  }
  return [...map.values()];
}

Tradeoff: Simple and correct. Sorting costs O(k log k) per string. For small alphabets a character-count key is asymptotically faster.

2. Character-count key

Build a 26-element frequency array for each string and use its string representation as the Map key. Avoids sorting.

Time
O(n · k)
Space
O(n · k)
function groupAnagrams(strs) {
  const map = new Map();
  for (const s of strs) {
    const count = new Array(26).fill(0);
    for (const ch of s) count[ch.charCodeAt(0) - 97]++;
    const key = count.join(',');
    if (!map.has(key)) map.set(key, []);
    map.get(key).push(s);
  }
  return [...map.values()];
}

Tradeoff: O(n·k) time — eliminates the sort. The key is a comma-joined frequency string. Preferred when k is large and the alphabet is fixed.

Cohere-specific tips

Cohere engineers think about canonical representations constantly — embeddings, normalised queries, deduplicated document hashes. When presenting this problem, frame canonical-key grouping as: 'I need a representation that is the same for all members of the equivalence class and different across classes.' Then discuss which canonical form is cheaper: sorted characters (O(k log k)) vs frequency vector (O(k)). Cohere teams working on retrieval will find the frequency-vector approach conceptually closer to TF-IDF and bag-of-words indexing.

Common mistakes

  • Using array reference as a Map key — two distinct arrays with identical content are not equal in JS. Always join to a string key.
  • Not handling empty strings — sorted empty string is an empty string, which is a valid key.
  • Sorting in-place and modifying the original string — split first to preserve the original.
  • Using an object instead of a Map — object keys are stringified differently for non-string types, though here both work.

Follow-up questions

An interviewer at Cohere may pivot to one of these next:

  • Find All Anagram Groups in a stream — how would you group anagrams arriving one at a time?
  • Valid Anagram (LC 242) — check if two strings are anagrams.
  • How would you group near-duplicate sentences where word order differs but vocabulary is identical?

Solve it now

Free. No sign-up. Python and JavaScript run instantly in your browser.

Output

Press Run or Cmd+Enter to execute

FAQ

Why join the count array with commas?

Without a separator, [1,10] and [11,0] would produce the same string '110'. Commas prevent key collisions.

What if strings contain uppercase or non-alpha characters?

The frequency-array approach needs a larger array or a general Map<char, count>. Mention this when discussing real-world input sanitization.

Is there a solution faster than O(n·k)?

Not in the worst case — you must read every character at least once to determine the canonical key.

Practice these live with InterviewChamp.AI

Drill Group Anagrams and other Cohere interview questions under real-loop conditions with instant feedback on your reasoning, complexity claims, and code.

Practice these live with InterviewChamp.AI →