Skip to main content

21. Find Median from Data Stream

hardAsked at Confluent

Design a structure that absorbs numbers and reports the running median — Confluent uses it because two-heap streaming aggregates are the textbook fit for an online consumer over a Kafka topic.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Problem

Design a class MedianFinder with addNum(num) and findMedian(). findMedian returns the median of all numbers seen so far. Both operations should be efficient as the input grows.

Constraints

  • -10^5 <= num <= 10^5
  • At most 5*10^4 calls to addNum and findMedian
  • findMedian is called after at least one addNum

Examples

Example 1

Input
addNum(1); addNum(2); findMedian()
Output
1.5

Example 2

Input
addNum(3); findMedian()
Output
2.0

Approaches

1. Sorted array

Maintain a sorted array; insert by binary-search; report middle.

Time
O(n) per add
Space
O(n)
// arr stays sorted; addNum binary-searches insertion index, splices in

Tradeoff:

2. Two heaps

Keep a max-heap for the lower half and a min-heap for the upper half; balance sizes after each insert. Median is the top of the larger heap or the average of both tops.

Time
O(log n) per add
Space
O(n)
class MedianFinder {
  constructor() { this.lo = []; this.hi = []; } // lo = max-heap (negate); hi = min-heap
  _push(h, x) { h.push(x); for (let i = h.length - 1; i > 0;) { const p = (i-1)>>1; if (h[p] <= h[i]) break; [h[p], h[i]] = [h[i], h[p]]; i = p; } }
  _pop(h) { const t = h[0]; const last = h.pop(); if (h.length) { h[0] = last; let i=0, n=h.length; for(;;){let l=2*i+1,r=l+1,s=i; if(l<n&&h[l]<h[s])s=l; if(r<n&&h[r]<h[s])s=r; if(s===i)break; [h[s],h[i]]=[h[i],h[s]]; i=s;} } return t; }
  addNum(num) {
    this._push(this.lo, -num);
    this._push(this.hi, -this._pop(this.lo));
    if (this.hi.length > this.lo.length) this._push(this.lo, -this._pop(this.hi));
  }
  findMedian() {
    if (this.lo.length > this.hi.length) return -this.lo[0];
    return (-this.lo[0] + this.hi[0]) / 2;
  }
}

Tradeoff:

Confluent-specific tips

Confluent grades this on streaming integration — describe how partition assignment forces a per-partition pair of heaps and a downstream merger so exactly-once semantics keep the global median stable across consumer-group rebalance.

Solve it now

Free. No sign-up. Python and JavaScript run instantly in your browser.

Output

Press Run or Cmd+Enter to execute

Practice these live with InterviewChamp.AI

Drill Find Median from Data Stream and other Confluent interview questions under real-loop conditions with instant feedback on your reasoning, complexity claims, and code.

Practice these live with InterviewChamp.AI →