Design LinkedIn — System Design Interview Guide

Design LinkedIn is a system-design interview that asks you to build a professional network with a feed, a connection graph reaching three degrees out, and recruiter-grade people search across hundreds of millions of profiles. The hard parts are the connection graph at scale and feed ranking under engagement signals.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Reported in interviews at

  • LinkedIn
  • Meta
  • Microsoft
  • Amazon
  • Google

Sourced from Glassdoor, Levels.fyi, and Blind interview reports.

Functional requirements

  • Create and update a profile (work history, skills, education, headline)
  • Send and accept connection requests; maintain a bidirectional connection graph
  • Browse a personalized home feed of posts from connections and followed pages
  • Search people by name, company, title, or skill with filters for location and seniority
  • Show a profile's degree of separation from the viewer (1st, 2nd, or 3rd)
  • Notify users of profile views, messages, and key network events

Non-functional requirements

  • Scale: ~1B registered members, ~300M monthly active, average ~500 connections/user
  • Feed read latency: <500ms p99 from request to rendered feed
  • Search latency: <300ms p99 for people-search queries with filters
  • Availability: 99.95%; eventual consistency on feeds and connection counts is acceptable
  • Read-heavy: ~50:1 reads to writes (feed loads, profile views, searches dominate)

Capacity estimation

Public 2024 scale anchors: ~1B registered members, ~300M MAU, ~310M weekly visitors. Average member has ~500 first-degree connections; with second-degree the average user is connected to ~1M people, and third-degree often reaches the whole platform — the graph is small-world.

Write QPS: posts at ~5M/day = ~60/sec average, ~300/sec peak. Connection events ~30M/day = ~350/sec average. Profile edits ~10M/day. Reads: feed loads at ~150M/day = ~1,700/sec average, ~10K/sec peak. Search queries ~50M/day = ~600/sec average. People-search dominates compute because each query touches an inverted index over hundreds of millions of profiles.
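The averages above come from dividing each daily volume by the 86,400 seconds in a day. A minimal sketch of that arithmetic, using only the daily volumes quoted in this section; the helper name `per_second` and the dictionary keys are illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: float) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


# Daily volumes as quoted in the capacity estimate.
daily_volumes = {
    "posts": 5_000_000,
    "connection_events": 30_000_000,
    "profile_edits": 10_000_000,
    "feed_loads": 150_000_000,
    "search_queries": 50_000_000,
}

average_qps = {name: per_second(volume) for name, volume in daily_volumes.items()}

for name, qps in average_qps.items():
    print(f"{name}: ~{qps:,.0f}/sec average")
```

The exact quotients are rounded in the text. The quoted peaks (~300/sec for posts, ~10K/sec for feed loads) are roughly five to six times these averages.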

Storage: profile data ~10 KB/user × 1B users = 10 TB metadata. Posts ~1 KB × 5M/day = 5 GB/day = ~2 TB/year. The connection graph at 500 avg connections × 1B users × 2 directions × 16 bytes/edge = ~16 TB. The search index over profile text is larger: ~50 TB of inverted index for full-text plus structured fields. Activity history (notifications, profile-view events) is the most voluminous: an append-only log at ~1B events/day, with several days kept hot.
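The storage figures can be checked the same way. A small sketch, assuming decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and the per-record sizes quoted above; the variable names are illustrative:

```python
KB = 1_000
TB = 1_000 ** 4

members = 1_000_000_000

# 10 KB of profile data per member.
profile_bytes = 10 * KB * members

# 1 KB per post, 5M posts per day, accumulated over one year.
posts_per_year_bytes = 1 * KB * 5_000_000 * 365

# 500 connections per member, stored in both directions, 16 bytes per edge entry,
# following the multiplication used in the text.
graph_bytes = 500 * members * 2 * 16

print(f"profiles: {profile_bytes / TB:.1f} TB")
print(f"posts per year: {posts_per_year_bytes / TB:.3f} TB")
print(f"connection graph: {graph_bytes / TB:.1f} TB")
```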

High-level design

Five core services: profile, connection-graph, feed, search, and notifications. Each is independently sharded with its own storage.

Profile service stores member records in a sharded relational store keyed by user_id. Reads are cached aggressively in an in-memory tier because the same profile is fetched hundreds of times during a feed or search session. Profile updates publish change events to a streaming queue consumed by feed and search downstream.
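The read and update paths described above can be sketched as a cache-aside pattern. This is a minimal illustration, not the production design: the class `ProfileService`, its method names, and the event shape are hypothetical, and plain dictionaries stand in for the sharded store and the in-memory tier.

```python
from typing import Callable, Optional


class ProfileService:
    """Cache-aside reads, with change events published on update (illustrative)."""

    def __init__(self, publish: Callable[[dict], None]) -> None:
        self._store = {}   # stand-in for the sharded relational store
        self._cache = {}   # stand-in for the in-memory cache tier
        self._publish = publish
        self.store_reads = 0  # counts cache misses that reached the store

    def get(self, user_id: int) -> Optional[dict]:
        cached = self._cache.get(user_id)
        if cached is not None:
            return cached
        self.store_reads += 1
        profile = self._store.get(user_id)
        if profile is not None:
            self._cache[user_id] = profile
        return profile

    def update(self, user_id: int, fields: dict) -> None:
        # Merge the new fields, write to the store, then invalidate the cache
        # so the next read repopulates it with the fresh record.
        self._store[user_id] = {**self._store.get(user_id, {}), **fields}
        self._cache.pop(user_id, None)
        # Downstream consumers (feed, search) would read this from the queue.
        self._publish({"type": "profile_updated", "user_id": user_id,
                       "changed": sorted(fields)})
```

Invalidating on write rather than updating the cache in place is one common choice; it keeps the write path simple at the cost of one extra store read after each update.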

Connection-graph service manages the bidirectional connection graph. Edges are stored in a sharded key-value store, sharded by user_id, with each edge written twice (once under each endpoint) so 'my connections' is a single-shard fetch. The interesting structure is second/third-degree reach: precomputing every user's second-degree set is impractical (average ~1M, top users tens of millions). Instead, second-degree is computed on demand via a bidirectional BFS, covered in the deep dive.
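A minimal sketch of that storage layout, assuming modulo sharding on user_id and in-memory maps in place of the key-value store. `NUM_SHARDS`, `shard_for`, `add_connection`, and `connections_of` are hypothetical names for illustration:

```python
from collections import defaultdict

NUM_SHARDS = 4  # illustrative; a real deployment would use far more shards


def shard_for(user_id: int) -> int:
    """Pick the shard that owns a member's adjacency list."""
    return user_id % NUM_SHARDS


# One adjacency map per shard: user_id -> set of connected user_ids.
shards = [defaultdict(set) for _ in range(NUM_SHARDS)]


def add_connection(a: int, b: int) -> None:
    """Store the edge under both endpoints so each side reads from one shard."""
    shards[shard_for(a)][a].add(b)
    shards[shard_for(b)][b].add(a)


def connections_of(user_id: int) -> set:
    """Single-shard fetch of a member's first-degree connections."""
    return shards[shard_for(user_id)].get(user_id, set())
```

The two writes in `add_connection` usually land on different shards, so a real system would need to handle one of them failing; this sketch ignores that.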

Feed service builds personalized timelines. Like Twitter-style social products this is fanout-based, but the per-post fanout is bounded because a typical member has ~500 connections rather than the millions of followers a celebrity account has, so fanout-on-write works for most posts. Each member has a precomputed timeline in an in-memory cache; new posts from connections are pushed into the cache and re-ranked.
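Fanout-on-write can be sketched as follows. This is an assumption-laden toy: the `connections` data, `TIMELINE_LIMIT`, and both function names are made up for illustration, timelines are bounded deques rather than a real cache, and the re-ranking step is left out.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 3  # tiny on purpose; a real timeline would hold far more entries

# Hypothetical first-degree connections: member_id -> connected member_ids.
connections = {1: {2, 3}, 2: {1}, 3: {1}}

# Precomputed timeline per member, newest post first, capped at TIMELINE_LIMIT.
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))


def publish_post(author_id: int, post_id: str) -> None:
    """Push a new post into the timeline of each of the author's connections."""
    for member_id in connections.get(author_id, ()):
        # appendleft on a full deque drops the oldest entry from the right.
        timelines[member_id].appendleft(post_id)


def read_timeline(member_id: int) -> list:
    """Reading is a single lookup because the work was done at write time."""
    return list(timelines[member_id])
```

The write cost is proportional to the author's connection count, which is why this approach works for typical members and needs a different path for very large accounts.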

Search service maintains an inverted index over profile fields and posts. Queries hit the index, which returns candidate document IDs; a re-ranking step scores candidates using viewer-specific signals (network proximity, mutual connections, profile completeness). Index updates flow from the profile and post streams and are applied within minutes.
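The two stages (index lookup, then viewer-specific re-ranking) might look like the sketch below. The sample profiles, the whitespace tokenizer, and the scoring rule (first-degree connections first, then mutual-connection count) are all assumptions chosen to keep the example small; they are not a description of any production ranker.

```python
from collections import defaultdict

# Hypothetical profile text keyed by profile ID.
profiles = {
    1: "staff engineer distributed systems",
    2: "engineer machine learning",
    3: "recruiter technical hiring",
}

# Inverted index: token -> set of profile IDs containing that token.
index = defaultdict(set)
for profile_id, text in profiles.items():
    for token in text.split():
        index[token].add(profile_id)


def search(query: str, viewer_connections: set, mutual_counts: dict) -> list:
    """Return profile IDs matching every query token, ranked for the viewer."""
    tokens = query.lower().split()
    if not tokens:
        return []
    # Stage 1: candidate retrieval by intersecting posting lists.
    candidates = set.intersection(*(index.get(t, set()) for t in tokens))

    # Stage 2: re-rank by network proximity; profile ID breaks ties.
    def sort_key(profile_id: int):
        is_first_degree = profile_id in viewer_connections
        return (not is_first_degree, -mutual_counts.get(profile_id, 0), profile_id)

    return sorted(candidates, key=sort_key)
```

The same query returns a different order for different viewers, which is the point of the re-ranking step.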

Notifications service consumes events (someone viewed your profile, accepted your request, posted in a group) and writes per-user notification lists in a sharded store. Clients pull on app open; pushes go through a separate mobile-push subsystem.

Deep dive — the hard problem

Two deep dives: the connection graph at distance two/three, and feed ranking. A third tradeoff, profile-view privacy, closes the section.

Connection graph: 'degree of separation' is shown on every profile view. Computing it naïvely (BFS from viewer across the graph) is O(network size) per query — at 10K queries/sec it would crush any graph store. Two production tactics combine.

First, bidirectional BFS: instead of walking from A toward B in one direction, expand one hop from A and one hop from B, then check whether the frontiers intersect. After one expansion each side, both sides hold their direct connections (~500 each); intersecting these two sets is O(1000) work, which is trivial, and a non-empty intersection means B is second-degree. Third-degree needs only one more expansion on one side: that side grows to ~250K nodes (500 × 500), much larger but still bounded, and is checked against the other side's ~500 direct connections. Expanding both sides a second time would detect paths of length four, which the degree label never needs.
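A minimal sketch of a bounded bidirectional search for the degree label, assuming the graph fits in a dictionary of adjacency sets (in the real system each lookup would be a fetch from the connection-graph service). The function name and return convention are illustrative.

```python
from typing import Dict, Optional, Set


def degree_of_separation(graph: Dict[int, Set[int]], viewer: int,
                         target: int) -> Optional[int]:
    """Return 0 for self, 1/2/3 for the connection degree, None if further."""
    if viewer == target:
        return 0

    viewer_hop1 = graph.get(viewer, set())
    if target in viewer_hop1:
        return 1

    # One expansion on each side: a shared neighbor means a path of length two.
    target_hop1 = graph.get(target, set())
    if viewer_hop1 & target_hop1:
        return 2

    # Expand only the smaller frontier one more hop and check it against the
    # other side's direct connections: a hit means a path of length three.
    smaller, larger = sorted((viewer_hop1, target_hop1), key=len)
    for node in smaller:
        if graph.get(node, set()) & larger:
            return 3

    return None
```

For example, on a chain 1-2-3-4-5 the call `degree_of_separation(graph, 1, 4)` returns 3 and `degree_of_separation(graph, 1, 5)` returns None. Expanding the smaller frontier keeps the number of adjacency fetches low when one side is a heavily connected account.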

Second, materialized second-degree counts but not the full set: store per-user 'connection-of-connection count' lazily as edges form, but recompute the actual intersection at query time. Most queries only need a degree label and a few example mutual-connections, both cheap on demand.

For the hottest queries (recruiter searches that touch second-degree on many candidates), precompute and cache a per-recruiter set of 'who can I reach in two hops' refreshed nightly. This is the standard tradeoff: precompute the hot path, query the long tail.

Feed ranking: pure recency ordering produces a noisy feed at LinkedIn's connection density (a user with 500 connections might see 200+ new posts/day). Production feeds rank using a learned score combining recency, predicted dwell time, engagement probability, and content-type diversity (jobs, posts, articles, learning suggestions). The score is computed offline per (viewer, post) pair for the hottest candidates and refreshed every few minutes; for fresh posts that haven't been scored yet, a fast online model produces a default score until the offline batch catches up. The interviewer wants to see you understand the two-tier (offline + online) split.
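The offline/online split can be sketched as a lookup with a fallback. Everything here is hypothetical: the shape of `offline_scores`, the post dictionaries, and the age-decay formula in `online_score` are placeholders standing in for a batch-scored table and a fast online model.

```python
from typing import Dict, List, Tuple


def online_score(post: dict, now: float) -> float:
    """Cheap fallback score that decays with post age (placeholder formula)."""
    age_hours = (now - post["created_at"]) / 3600
    return 1.0 / (1.0 + age_hours)


def rank_feed(viewer_id: int, candidates: List[dict],
              offline_scores: Dict[Tuple[int, str], float],
              now: float) -> List[dict]:
    """Prefer the batch-computed score; fall back to the online model."""

    def score(post: dict) -> float:
        key = (viewer_id, post["id"])
        if key in offline_scores:
            return offline_scores[key]
        return online_score(post, now)

    return sorted(candidates, key=score, reverse=True)
```

The request path only does dictionary lookups and a cheap function call per candidate, which is what keeps it inside the feed latency budget.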

Third tradeoff: profile-view privacy. By default 'who viewed my profile' is visible to the viewee. Users in stealth mode mask their identity. This shapes the event-emission and aggregation pipeline: profile-view events carry a privacy flag, and the notification service applies it before showing the viewer's name. Mention this: privacy is a frequent interviewer follow-up.
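A minimal sketch of a profile-view event carrying a privacy flag that the notification side applies at render time. The `ProfileViewEvent` fields, the function name, and the message strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ProfileViewEvent:
    viewer_id: int
    viewee_id: int
    viewer_is_private: bool  # set at emission time from the viewer's settings


def render_notification(event: ProfileViewEvent,
                        display_names: Dict[int, str]) -> str:
    """Apply the privacy flag before any viewer identity reaches the viewee."""
    if event.viewer_is_private:
        return "Someone viewed your profile"
    name = display_names.get(event.viewer_id, "A member")
    return f"{name} viewed your profile"
```

Capturing the flag on the event, rather than looking up the viewer's setting at render time, is one possible design choice: it fixes the privacy decision at the moment of the view.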

Common mistakes

  • Precomputing every user's second-degree connection set in storage: at up to 500 × 500 = 250K IDs per member across 1B members, this is petabytes of data
  • Designing feed ranking as a single online model — the latency budget at p99 won't allow scoring thousands of candidates synchronously
  • Treating people-search as a generic text search — without network-proximity re-ranking, results are useless to a recruiter
  • Storing the connection graph in a single relational table — at 16 TB of edges, sharding by user_id is mandatory
  • Forgetting profile-view privacy — interviewers reward the candidate who raises it unprompted
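The petabyte claim in the first mistake above can be checked with a short calculation. The 8-byte ID size is an assumption, and 500 × 500 is an upper bound that ignores overlap between connections' networks:

```python
PB = 1_000 ** 5

members = 1_000_000_000
avg_connections = 500
bytes_per_id = 8  # assumption: one 64-bit member ID per stored entry

# Upper bound on second-degree set size, ignoring shared connections.
second_degree_ids_per_member = avg_connections * avg_connections

total_bytes = members * second_degree_ids_per_member * bytes_per_id
print(f"{total_bytes / PB:.1f} PB before replication")
```

Even before replication or indexes, that is two orders of magnitude larger than the ~16 TB first-degree graph.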

Likely follow-up questions

  • How would you implement 'People You May Know' suggestions at scale?
  • What changes if you have to support 10M+ first-degree connections for an influencer account?
  • How would you support a recruiter-grade search that scans for candidates passively open to job offers?
  • How would your design handle a major company acquisition where 10K profiles update their employer the same day?
  • How would you implement real-time messaging on top of this connection graph?

Frequently asked questions

How long is the Design LinkedIn system-design round?
45-60 minutes. LinkedIn's own loop runs 60 minutes and expects coverage of the connection graph, feed ranking, and people-search. Source: Glassdoor LinkedIn senior engineer interview reports 2023-2024.
Is Design LinkedIn just Design Twitter plus a graph?
Similar on the surface, but the graph changes everything. Twitter is a follower fanout problem; LinkedIn is a bidirectional graph with second/third-degree queries baked into every page view. The graph component is where the interview signal lives.
Do I need to discuss the 'People You May Know' algorithm?
Mention it as a downstream consumer of the connection graph (triadic closure: friends of friends are likely connections). Don't drill into the ML model: interviewers care that you can identify the offline batch precompute pattern, not the model architecture.
What is the most common mistake on Design LinkedIn?
Treating it like Facebook — designing a global news feed without recognizing that LinkedIn is fundamentally a graph product. Without bidirectional BFS or its equivalent, every profile view would melt your servers.