Count-Min Sketch- Approximate Counter:

saurav omar
3 min readSep 11, 2019

--

If you are developer and working on a youtube , prime videos ,hotstar or similar kind of platforms. Now you wanted to check popularity of videos in your platform which can estimated through no of views of that particular videos. First thing directly comes ot our mind is we can use hashing technique.

Let’s say we have some 1 billion of videos then in this case you need 1GB of memory let’s say all videos id’s are not in sequence and generated through some logic which has 32 bit integer value then in this case total you need some 8 GB of memory, and you know the videos keeps on increasing on you platform so now you need 32 bits to generate video id but in future you may require 64 bits or more. So using Hashing you require some couple servers then we need to tackle Distributed problems, and required more money.

We already know if we do not need accurate count in this if we have approximate count this will hold true for us. Then here comes the Count-Min Sketch Algorithm.

Count-Min Sketches:-

Approximately how many time “Eminem Not afraid” has been viewed

One of the most popular forms of the sketch data structure is the Count-Min Sketch introduced by Muthukrishnan and Cormode in 2003. The idea is quite simple and the data structure is based on probabilistic algorithms to serve various types of queries on streaming data. The data structure is parameterized by two factors — ε and δ, where the error in answering the query is within a factor of ε with probability δ. So you can tune these parameters based on the space that you can afford and accordingly amortize the accuracy of results that the data structure can serve you.

The internal structure of a Count-Min Sketch is a table, similar to that of a Hash Table. while Hash Tables use a single hash function.

  • Count-Min Sketches use multiple hash functions
  • Initially, every cell in the Count-Min-Sketch is initialized to 0.
  • When any event occurs, the event’s id is hashed over every column. Each hash function outputs a row value, and the counter at each resulting row-column combination is incremented.
  • To query an event’s count, we take the minimum of that event’s counts over all of the hash functions.

Count(videoId) = min of all the Hash(videoId)

Let’s see with an example count total views of videos:

  • initially table is empty
  • Let’s say someone started viewing video “not afraid” and it mapped to below hash keys and increase the count.
  • Let’s say someone else searched and viewed “sing for the moment” and it mapped to below hash keys and the keys where the values are already present then it will increase existing value.
  • Let’s say we want to get views of “not afraid” songs then we get all the values of hash and take the minimum value which is 1.

count = 1 {min(h1=1, h2=1,h2=2, h2=2) }.

Important properties of Count-Min Sketch:

  • It gives you an approximate value.
  • Count-Min Sketch is always O(1) in both time and space even if we have millions of events occurs.
  • It is parallelizable. We can add multiple events to the table concurrently, so long as the hashes for those events don’t collide.

That’s It,

Happy Learning.

--

--

saurav omar
saurav omar

Written by saurav omar

Geek and Always ready to give more than 100%

No responses yet