Given n objects and associated scores, find top k objects s.t. k is much smaller compared to n.
The most trivial approach is of course to sort n objects based on their scores and pick top k. The best known comparison based sort will do this in O(n log n) with additional pass to pick top k. And of course, this is not what is expected! We need complexity to be lower than O(n log n).
Let's focus on last part of the problem statement. What additional benefit does it give us to know that k << n? One obvious thing that comes to mind is, we are interested in correct ordering of just top k objects, remaining n-k could be unordered at the benefit of having lower complexity. Thinking in the same direction, one can try to pick k such objects such that remaining n-k are virtually ignored! But, there has to be some mechanism to pick top k. The same thing considered in opposite way will give us better insights - i.e. we get lower complexity at the expense of neglecting ordering of non-top n-k objects; meaning we don't care about them, specifically their position in ordered list.
So, is there any way where we can selectively pick elements s.t. the highest score is picked first then the second highest scored and so on? well.. this suggests none other than a max-heap! a max-heap allows you to get maximum element one at a time with delete operation. Let's see the algorithm.
- Build heap of n objects given.
- for i=1 to k do
- get max element (delete from heap)
n = 1024, k = 10n+k log n ~ 1024 + 10 * 10 ~ 1024 -> O(n)
n = 1024, k = 200n+k log n ~ 1024 + 200 * 10 ~ 2000 -> O(k log n) which will approach O(n log n) as k -> n
:)
Hey Girija,
ReplyDeleteNice one. Thanks a lot.
Prashant.