Find Median from Data Stream⚓︎
Description⚓︎
The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.
- For example, for
arr = [2,3,4]
, the median is3
. - For example, for
arr = [2,3]
, the median is(2 + 3) / 2 = 2.5
.
Implement the MedianFinder class:
MedianFinder()
initializes the MedianFinder
object.
void addNum(int num)
adds the integer num
from the data stream to the data structure.
double findMedian()
returns the median of all elements so far. Answers within 10^-5
of the actual answer will be accepted.
Example 1:
- Input:
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
- Output:
[null, null, null, 1.5, null, 2.0]
Explanation:
Constraints:
-10^5 <= num <= 10^5
- There will be at least one element in the data structure before calling
findMedian
. - At most
5 * 10^4
calls will be made toaddNum
andfindMedian
.
Solution⚓︎
Class Structure⚓︎
- Class MedianFinder: Contains two priority queues
small
andlarge
. small
: A max heap (standard priority queue in C++) that stores the smaller half of the numbers.large
: A min heap (priority queue withgreater<int>
comparison) that stores the larger half of the numbers.
Constructor⚓︎
- MedianFinder(): Initializes the MedianFinder object. It doesn't require any specific initialization for the heaps as their constructors handle that.
Function addNum(int num)⚓︎
- The function ensures that the two heaps are either equal in size or have a size difference of one.
- If
small
has more or equal elements thanlarge
, the new number is added tosmall
and then the maximum element fromsmall
is moved tolarge
. - Otherwise, the new number is added to
large
and then the minimum element fromlarge
is moved tosmall
. - This approach maintains the heaps such that
small
contains the smaller half of the numbers, andlarge
contains the larger half. The top ofsmall
is always less or equal to the top oflarge
.
Function findMedian()⚓︎
- Returns the median of the current data stream.
- If
small
has more elements, the median is the top ofsmall
. - If
large
has more elements, the median is the top oflarge
. - If they are of equal size, the median is the average of the tops of both
small
andlarge
.
Complexity Analysis⚓︎
- Time Complexity:
addNum(int num)
- Insertion into a priority queue: \(O(\log n)\), where \(n\) is the number of elements in the heap.
- Extracting the top and inserting into the other heap: \(O(\log n)\).
- Total for each insertion: \(O(\log n)\).
findMedian()
- Returning the top of a heap or calculating the average of the tops of two heaps: \(O(1)\).
- Space Complexity: The space complexity is \(O(n)\), where \(n\) is the number of elements in the data stream. This is because the data structure stores all the elements in the two heaps.
Follow-Up⚓︎
If 99%
of all integer numbers from the stream are in the range [0, 100]
, how would you optimize your solution?