Design Slack — System Design Interview Guide

Design Slack is a system-design interview that asks you to build team chat for the workplace: persistent channels organized by workspace, threaded replies, presence, search across years of history, and integrations. The hard part is multi-tenant isolation combined with search over a history that runs to trillions of messages.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Reported in interviews at

  • Slack
  • Salesforce
  • Meta
  • Microsoft
  • Atlassian

Sourced from Glassdoor, Levels.fyi, and Blind interview reports.

Functional requirements

  • Workspace with channels (public and private) and direct messages between users
  • Send a message to a channel; receive in real time on all connected members
  • Threaded replies on any parent message (one-level thread, not arbitrary depth)
  • Search across channel and DM history within a workspace, by keyword and filters
  • Presence: show whether each workspace member is online, away, or active in a channel

Non-functional requirements

  • End-to-end message latency: <500ms p99 from sender send to receiver display
  • Search latency: <1s p99 for keyword search over a workspace's full history
  • Availability: 99.99% for messaging, 99.9% for search (degraded search acceptable)
  • Scale: ~20M+ daily active users across millions of workspaces, ~1B messages/day, peak ~50K messages/sec

Capacity estimation

Public 2023 Slack scale: ~20M+ daily active users, ~10M+ workspaces with a long tail (most workspaces are <100 users; a small minority have 10K+). Daily message volume: ~1B messages/day = ~12K messages/sec average, peak ~50K (heavily skewed to workday hours in US/EU timezones). Average message ~200 bytes text + metadata; ~20% have a file or rich attachment.

Storage: 1B × 200B = 200 GB/day of text. With files, image previews, and metadata indexes, realized storage is ~2-3 TB/day = ~1 PB/year. The search index over the message corpus is of the same order as the raw text, so it grows by roughly 200 GB/day (~70 TB/year), not per year. Workspaces grow forever (Slack rarely deletes), so total accumulated storage reaches several PB within a few years, and tens of PB once replication is counted.
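The arithmetic above is easy to get wrong under interview pressure, so here is a minimal sketch that reproduces it. The input figures are the ones quoted in this section; the variable names are illustrative only.

```python
# Back-of-envelope capacity math using the figures quoted in this section.
SECONDS_PER_DAY = 24 * 60 * 60

messages_per_day = 1_000_000_000   # ~1B messages/day
avg_message_bytes = 200            # text + metadata per message
peak_msgs_per_sec = 50_000         # quoted peak rate
realized_tb_per_day = (2, 3)       # with files, previews, metadata indexes

avg_msgs_per_sec = messages_per_day / SECONDS_PER_DAY
text_gb_per_day = messages_per_day * avg_message_bytes / 1e9
realized_pb_per_year = tuple(tb * 365 / 1000 for tb in realized_tb_per_day)
peak_to_avg = peak_msgs_per_sec / avg_msgs_per_sec

print(f"average rate:     {avg_msgs_per_sec:,.0f} msgs/sec")
print(f"text volume:      {text_gb_per_day:,.0f} GB/day")
print(f"realized storage: {realized_pb_per_year[0]:.2f}-{realized_pb_per_year[1]:.2f} PB/year")
print(f"peak/average:     {peak_to_avg:.1f}x")
```

The peak-to-average ratio is worth stating out loud: capacity is provisioned for the workday peak, not the daily mean.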

The key shape: multi-tenancy. A single global system serves millions of independent workspaces. Each workspace's data must never leak to another, but the infrastructure must share efficiently. This shifts the sharding key conversation: it's almost always workspace_id, with each workspace_id mapped to a shard (by hash or a directory lookup rather than by ID range, which tends to concentrate new workspaces on one shard) so a single tenant's reads colocate.

High-level design

Three core domains: real-time messaging, durable history, and search.

Real-time messaging: clients hold a persistent connection (typically WebSocket) to a regional gateway. The gateway authenticates the connection, joins the user to all their channel rooms, and forwards messages bidirectionally. A connection-routing tier maps user_id → which gateway holds the open socket, so any other server can forward a message to a specific user. Messages are written to a durable message store (sharded relational store, sharded by workspace_id) before being delivered, so an offline recipient sees the message on next reconnect by reading from the per-channel history.
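The persist-then-deliver flow and the connection-routing lookup can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not how any production gateway is built: `Gateway`, `ConnectionRouter`, and `send_message` are hypothetical names, the message store is a plain dict, and each user is assumed to hold a single socket.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Stand-in for a gateway server holding open sockets (hypothetical)."""
    name: str
    delivered: list = field(default_factory=list)

    def push(self, user_id, message):
        # A real gateway would write to the user's WebSocket here.
        self.delivered.append((user_id, message))


class ConnectionRouter:
    """Maps user_id -> the gateway currently holding that user's socket."""

    def __init__(self):
        self._gateway_by_user = {}

    def register(self, user_id, gateway):
        self._gateway_by_user[user_id] = gateway

    def unregister(self, user_id):
        self._gateway_by_user.pop(user_id, None)

    def lookup(self, user_id):
        return self._gateway_by_user.get(user_id)


def send_message(store, router, channel_members, channel_id, sender, text):
    """Persist first, then deliver to every connected member of the channel."""
    store[channel_id].append((sender, text))  # durable write comes first
    offline = []
    for user_id in channel_members[channel_id]:
        if user_id == sender:
            continue
        gateway = router.lookup(user_id)
        if gateway is None:
            offline.append(user_id)  # will read channel history on reconnect
        else:
            gateway.push(user_id, text)
    return offline
```

Because the write happens before delivery, a user returned in `offline` loses nothing: the message is already in the per-channel history they read on reconnect.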

Durable history: a sharded relational store partitioned by workspace_id holds messages with a per-channel monotonically increasing sequence number. Reads (channel scrollback) are bounded range queries keyed on (channel_id, seq), ordered by seq descending with a page limit such as 50. Files and attachments are uploaded directly to object storage; the message envelope carries only the file reference.
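The scrollback query is simple enough to show end to end. The sketch below uses SQLite from the Python standard library as a stand-in for one shard; the table layout, column names, and helper functions are assumptions made for illustration, and a real store would allocate sequence numbers with an atomic counter or row lock instead of `MAX(seq) + 1`.

```python
import sqlite3


def open_store():
    """Create an in-memory stand-in for one message-store shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE messages (
            workspace_id INTEGER NOT NULL,
            channel_id   INTEGER NOT NULL,
            seq          INTEGER NOT NULL,
            sender_id    INTEGER NOT NULL,
            body         TEXT    NOT NULL,
            PRIMARY KEY (channel_id, seq)
        )
        """
    )
    return conn


def append_message(conn, workspace_id, channel_id, sender_id, body):
    """Assign the next per-channel sequence number and persist the message."""
    with conn:  # commits on success, rolls back on error
        (next_seq,) = conn.execute(
            "SELECT COALESCE(MAX(seq), 0) + 1 FROM messages WHERE channel_id = ?",
            (channel_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
            (workspace_id, channel_id, next_seq, sender_id, body),
        )
    return next_seq


def scrollback(conn, channel_id, before_seq=None, limit=50):
    """Return one page of history, newest first, using the (channel_id, seq) key."""
    if before_seq is None:
        cursor = conn.execute(
            "SELECT seq, sender_id, body FROM messages "
            "WHERE channel_id = ? ORDER BY seq DESC LIMIT ?",
            (channel_id, limit),
        )
    else:
        cursor = conn.execute(
            "SELECT seq, sender_id, body FROM messages "
            "WHERE channel_id = ? AND seq < ? ORDER BY seq DESC LIMIT ?",
            (channel_id, before_seq, limit),
        )
    return cursor.fetchall()
```

Paging older history means passing the smallest `seq` of the current page as `before_seq`, so every read stays a bounded range scan on the primary key.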

Search: a dedicated inverted-index cluster, also sharded by workspace_id, ingests messages via change-data-capture from the message store. Each workspace's search index lives on a small number of shards (small workspaces share a shard with others; enterprise workspaces get their own). A search query routes to the workspace's shard(s), runs the query locally, merges if needed, and returns hits. Filters (channel, sender, date range) are applied at the index level.

Presence: a TTL-based key-value store keyed by user_id; gateway servers refresh the user's status on every heartbeat.

Threads: threaded replies are stored with a parent_message_id and rendered as a separate panel when the user opens a thread, which is a single bounded query against the parent's reply list.
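The TTL-based presence store mentioned above can be sketched as follows. This is an in-memory illustration only: the `PresenceStore` class, the 60-second default TTL, and the "offline" fallback status are assumptions, and a real deployment would typically rely on a key-value store's native key expiry.

```python
import time


class PresenceStore:
    """TTL-based presence keyed by user_id (in-memory stand-in)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # user_id -> (status, expires_at)

    def heartbeat(self, user_id, status="online"):
        """Called by the gateway on every client heartbeat; resets the TTL."""
        self._entries[user_id] = (status, self._clock() + self._ttl)

    def status(self, user_id):
        """Return the last reported status, or "offline" once the TTL lapses."""
        entry = self._entries.get(user_id)
        if entry is None:
            return "offline"
        status, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[user_id]   # lazy expiry on read
            return "offline"
        return status
```

The useful property is that nobody has to detect a disconnect: when heartbeats stop, the entry simply expires and the user reads as offline.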

Deep dive — the hard problem

Two deep dives, multi-tenant isolation at scale and search, followed by a third hard surface: thread fanout and notifications.

Multi-tenant isolation: the cardinal failure mode of a multi-tenant chat system is a noisy neighbor — one giant workspace (50K users, 100M messages) ruining latency for a small workspace sharing the same shard. The standard solution is a tiered shard topology. Small workspaces (the long tail) share shards efficiently: hundreds of small workspaces per shard, isolated logically by workspace_id in every query. Medium workspaces get a dedicated shard. Large enterprise workspaces get a dedicated shard cluster — sometimes even a fully separate database instance. The router knows each workspace's tier and routes accordingly.
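A tier-aware router along the lines described above might look like this sketch. The `Tier` enum, the `ShardRouter` class, the shard names, and the choice to spread an enterprise cluster by channel_id are all assumptions for illustration; modulo hashing is used for brevity where a real system would more likely use a directory or consistent hashing to limit data movement.

```python
import hashlib
from enum import Enum


class Tier(Enum):
    SHARED = "shared"        # long tail: many workspaces per shard
    DEDICATED = "dedicated"  # medium: one shard for one workspace
    CLUSTER = "cluster"      # enterprise: several shards for one workspace


def stable_hash(value):
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ShardRouter:
    def __init__(self, shared_shards):
        self._shared_shards = list(shared_shards)
        self._overrides = {}  # workspace_id -> (Tier, [shard names])

    def promote(self, workspace_id, tier, shards):
        """Move a workspace out of the shared pool (data migration not shown)."""
        if tier is Tier.SHARED:
            self._overrides.pop(workspace_id, None)
        else:
            self._overrides[workspace_id] = (tier, list(shards))

    def route(self, workspace_id, channel_id=None):
        override = self._overrides.get(workspace_id)
        if override is None:
            index = stable_hash(workspace_id) % len(self._shared_shards)
            return self._shared_shards[index]
        tier, shards = override
        if tier is Tier.DEDICATED:
            return shards[0]
        if channel_id is None:
            raise ValueError("cluster-tier routing needs a channel_id")
        # Assumption: inside a cluster, channels are spread across its shards.
        return shards[stable_hash(channel_id) % len(shards)]
```

Promotion is the interesting operation in an interview answer: the router change is trivial, while the live data migration behind it is where the real design discussion sits.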

Isolation also means rate limits: per-workspace rate limits on message sends, search queries, and API calls, so one workspace's bot can't degrade another. Quotas are enforced at the gateway before the message reaches the message store. Discuss the rate-limit story explicitly — interviewers always probe noisy-neighbor scenarios.
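One common way to implement per-workspace quotas at the gateway is a token bucket per workspace. The sketch below is a single-process illustration; the class names and the rate and burst parameters are assumptions, and a fleet of gateways would need shared or approximately synchronized counters.

```python
import time


class TokenBucket:
    """Classic token bucket: refills at a steady rate up to a burst cap."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self._rate = rate_per_sec
        self._burst = burst
        self._clock = clock
        self._tokens = float(burst)
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time elapsed, never exceeding the burst capacity.
        self._tokens = min(self._burst, self._tokens + elapsed * self._rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class WorkspaceRateLimiter:
    """One bucket per workspace, so one tenant cannot spend another's quota."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self._rate = rate_per_sec
        self._burst = burst
        self._clock = clock
        self._buckets = {}  # workspace_id -> TokenBucket

    def allow(self, workspace_id):
        bucket = self._buckets.get(workspace_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._burst, self._clock)
            self._buckets[workspace_id] = bucket
        return bucket.allow()
```

The gateway would call `allow(workspace_id)` before forwarding a send, so a rejected message never reaches the message store.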

Search deep dive: inverted indexes over years of history are expensive. Two patterns help. First, time-tiered indexes: recent messages (last 30 days) live in a hot index optimized for fast writes and fast reads; older messages live in a cold index optimized for compression. Most search queries only need recent messages, so if something has to degrade, let it be latency on older results, not recent ones. Second, per-workspace index isolation: a single workspace's query never scans another workspace's data, which both bounds query cost and provides hard isolation. The payoff is that each index is smaller and the workspace dimension is always the first filter; the trade-off is that there are many more indexes to place, balance, and operate.
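Both patterns fit in a toy inverted index. The sketch below is illustrative only: `WorkspaceIndex`, the whitespace tokenizer, and the in-memory sets are assumptions, and a real engine would tier by index segment or time-based index instead of moving postings one by one.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=30)  # the "last 30 days" hot tier from the text


class WorkspaceIndex:
    """One workspace's inverted indexes: term -> set of message ids."""

    def __init__(self):
        self.hot = defaultdict(set)
        self.cold = defaultdict(set)
        self.sent_at = {}  # message_id -> datetime, used for tiering and sort

    def add(self, message_id, text, sent_at):
        """New messages always land in the hot tier."""
        self.sent_at[message_id] = sent_at
        for term in set(text.lower().split()):
            self.hot[term].add(message_id)

    def age_out(self, now):
        """Move postings older than the hot window into the cold tier."""
        cutoff = now - HOT_WINDOW
        for term in list(self.hot):
            old = {m for m in self.hot[term] if self.sent_at[m] < cutoff}
            if old:
                self.cold[term] |= old
                self.hot[term] -= old
                if not self.hot[term]:
                    del self.hot[term]

    def search(self, term, include_cold=False):
        """Hot tier by default; the cold tier is consulted only on request."""
        term = term.lower()
        hits = set(self.hot.get(term, ()))
        if include_cold:
            hits |= self.cold.get(term, set())
        return sorted(hits, key=lambda m: self.sent_at[m], reverse=True)


# Per-workspace isolation: a query can only ever touch its own index.
indexes = defaultdict(WorkspaceIndex)  # workspace_id -> WorkspaceIndex
```

A caller would look up `indexes[workspace_id]` first and search only that object, which is the code-level form of "the workspace dimension is always the first filter".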

Third hard surface: thread fanout and notifications. When a user is @mentioned, the notification routes to their device(s) via the gateway if they're online or via push notification if offline. Threads add a wrinkle — a user can subscribe to a thread to get notifications even when they're not @mentioned. The notification service maintains a subscription edge (thread_id, user_id) and emits a notification on every new thread reply for each subscriber. At scale this is fanout-on-write to a notification queue per user, drained on connect.
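The subscription edge and the fanout-on-write queue can be sketched like this. `NotificationService` and its method names are hypothetical, the queues are in-memory, and excluding the reply's author from the recipients is an assumption added for the example.

```python
from collections import defaultdict, deque


class NotificationService:
    """Fanout-on-write: each thread reply is copied to every subscriber's queue."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # thread_id -> {user_id}
        self._queues = defaultdict(deque)     # user_id -> pending notifications

    def subscribe(self, thread_id, user_id):
        """Record the (thread_id, user_id) subscription edge."""
        self._subscribers[thread_id].add(user_id)

    def unsubscribe(self, thread_id, user_id):
        self._subscribers[thread_id].discard(user_id)

    def on_thread_reply(self, thread_id, author_id, reply_id, mentions=()):
        """Enqueue one notification per subscriber or @mentioned user."""
        recipients = set(self._subscribers.get(thread_id, ())) | set(mentions)
        recipients.discard(author_id)  # assumption: authors are not self-notified
        for user_id in recipients:
            self._queues[user_id].append((thread_id, reply_id))
        return recipients

    def drain(self, user_id):
        """Called when the user connects: hand over and clear the backlog."""
        return list(self._queues.pop(user_id, ()))
```

Write cost scales with the number of subscribers per thread, which is usually small; that is why fanout-on-write is a reasonable default here even though it would be risky for a very large channel.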

Common mistakes

  • Sharding messages by message_id instead of workspace_id — kills tenant-locality and search routing
  • Designing one global search index instead of per-workspace shards — security risk + giant fan-out per query
  • Treating Slack threads as arbitrary-depth trees (they're flat — one parent message + a list of replies)
  • Forgetting noisy-neighbor protection; one 50K-user workspace tanks latency for the hundreds of small ones on the same shard
  • Missing the tiered shard topology — assuming all workspaces look the same scaling-wise

Likely follow-up questions

  • How would you support 'Connect' channels that bridge two different workspaces?
  • What changes if you add end-to-end encryption (server can't see message content)?
  • How would you handle a single workspace with 500K members (a public community-style workspace)?
  • How would you implement search-as-you-type with sub-100ms responses?
  • How would you build a compliance archive feature that exports years of a workspace's messages?

Frequently asked questions

How long is a Design Slack system-design interview?
45-60 minutes. Slack's own loop (and Salesforce's) explicitly grades on multi-tenancy + search + real-time delivery. Source: Glassdoor Slack/Salesforce 2022-2024 reports.
Is Design Slack easier than Design WhatsApp?
Different shape. WhatsApp is heavy on end-to-end encryption and 2B+ user persistent connections; Slack is heavy on multi-tenancy and search. Most interviewers consider them comparable. Slack's search-at-history-scale is the surface WhatsApp doesn't have.
Do I need to talk about specific search engines (Elasticsearch, etc.)?
No — discuss 'an inverted-index cluster' as a generic property. Naming a specific engine is fine but never required. The signal is understanding why search needs its own store, why it's sharded by workspace, and why recent and historical data are tiered.
What's the single most-asked follow-up?
Noisy neighbor — what happens when one workspace is 100x larger than the next. If you don't have a tiered-shard answer ready, you'll spend 10 minutes inventing one under pressure.