We are glad to present an article written by Qrator Labs engineer Dmitry Kamaldinov. If you want to join our Core team, write to us at hr@qrator.net.
1 Introduction
Implementing streaming algorithms often involves counting events, where an event is something like a packet arrival or a connection establishment. Since the number of events is large, the available memory can become a bottleneck: an ordinary \(n\)-bit counter can count no more than \(2^n - 1\) events.
One way to handle a larger range of values using the same amount of memory is approximate counting. This article provides an overview of the well-known Morris algorithm and some of its generalizations.
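To make the idea of approximate counting concrete before the detailed treatment, here is a minimal sketch of the commonly described base-2 variant of a Morris-style counter. It stores only a small exponent `c` instead of the event count itself. The class name `MorrisCounter`, the `bits` parameter, and the saturation behaviour at the maximum exponent are illustrative assumptions, not taken from this article.

```python
import random


class MorrisCounter:
    """Approximate counter (illustrative sketch): stores an exponent c,
    and reports 2**c - 1 as the estimated number of events."""

    def __init__(self, bits=8, rng=None):
        # Largest exponent that fits in `bits` bits (assumed saturation limit).
        self.max_value = (1 << bits) - 1
        self.c = 0
        self.rng = rng or random.Random()

    def increment(self):
        # Raise the stored exponent with probability 2**(-c).
        # The counter saturates once c reaches max_value.
        if self.c < self.max_value and self.rng.random() < 2.0 ** -self.c:
            self.c += 1

    def estimate(self):
        # Estimated number of events seen so far.
        return 2 ** self.c - 1


if __name__ == "__main__":
    counter = MorrisCounter(bits=8, rng=random.Random(1))
    for _ in range(100_000):
        counter.increment()
    print("stored exponent:", counter.c)
    print("estimated events:", counter.estimate())
```

With this scheme an 8-bit register can in principle represent counts far beyond \(2^8 - 1\), at the cost of a random error in any single estimate; averaging over many independent counters brings the mean estimate close to the true count.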
Another way to reduce the number of bits required for counting mass events is to use decay. We discuss such an approach here, and we are going to publish another blog post on this particular topic shortly.
We begin by analysing one straightforward probabilistic counting algorithm and highlighting its shortcomings (Section 2). Then, in Section 3, we describe the algorithm proposed by Robert Morris in 1978 and point out its most essential properties and advantages. For most non-trivial formulas and statements, the text contains our proofs; the demanding reader can find them in the inserts. In the following three sections, we outline valuable extensions of the classic algorithm: you can learn what Morris's counters and exponential decay have in common, how to improve the accuracy by sacrificing the maximum value, and how to handle weighted events efficiently.