Rust implementation of the CVM algorithm for counting distinct elements in a stream

More logic fixes tsk tsk

+7 -2
+1 -1
Cargo.toml
···
 keywords = ["CVM", "count-distinct", "estimation"]
 categories = ["algorithms", ]
 
-version = "0.1.9"
+version = "0.1.10"
 edition = "2021"
 
 [dependencies]
+6 -1
src/lib.rs
···
     pub fn process_element(&mut self, elem: T) {
         // We should switch to a treap (as per Knuth) to avoid the hash overhead, but FxHash
         // is still a lot faster than linear searching a Vec, even at small (1000) buffer sizes
-        self.buf.remove(&elem);
+        // Round 0: if an element exists, remove it. Element is added back due to probability 1
+        // When buffer is full, remove half the elements
+        // Round 1: if an element exists, remove it. Element MAY be added back due to probability 0.5
+        if self.buf.get(&elem).is_some() {
+            self.buf.remove(&elem);
+        }
         if self.rng.gen_bool(self.probability) {
             self.buf.insert(elem);
         }
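
The round-by-round comments in the hunk above can be sketched as a minimal, self-contained version of the remove, re-sample, thin loop. This is not the crate's code. `CvmSketch`, `XorShift64`, `capacity`, `new` and `estimate` are hypothetical names introduced for illustration, a std `HashSet` stands in for the FxHash-backed buffer, and a small xorshift generator stands in for the crate's `rng` so the sketch needs no dependencies. Repeating the thinning step until the buffer has room is also an assumption made here.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Tiny xorshift64 generator (an assumption for this sketch, so that it
/// compiles without external crates; the crate uses an `rng` with `gen_bool`).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns true with probability `p`.
    fn gen_bool(&mut self, p: f64) -> bool {
        // The top 53 bits give a uniform float in [0, 1).
        let u = (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64;
        u < p
    }
}

/// Hypothetical stand-in for the crate's estimator type.
struct CvmSketch<T> {
    buf: HashSet<T>,
    capacity: usize,
    probability: f64,
    rng: XorShift64,
}

impl<T: Eq + Hash> CvmSketch<T> {
    fn new(capacity: usize, seed: u64) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        CvmSketch {
            buf: HashSet::new(),
            capacity,
            probability: 1.0, // round 0: every element is kept
            rng: XorShift64(seed.max(1)), // xorshift state must be non-zero
        }
    }

    fn process_element(&mut self, elem: T) {
        // Forget any earlier decision about this element, then re-sample it
        // at the current round's probability (1, then 0.5, then 0.25, ...).
        self.buf.remove(&elem);
        if self.rng.gen_bool(self.probability) {
            self.buf.insert(elem);
        }
        // When the buffer is full, keep each element with probability 1/2 and
        // halve the sampling probability, i.e. move to the next round.
        while self.buf.len() >= self.capacity {
            let rng = &mut self.rng;
            self.buf.retain(|_| rng.gen_bool(0.5));
            self.probability /= 2.0;
        }
    }

    /// Estimated number of distinct elements seen so far.
    fn estimate(&self) -> f64 {
        self.buf.len() as f64 / self.probability
    }
}
```

While the stream holds fewer distinct elements than `capacity`, the sketch stays in round 0 and `estimate()` returns the exact distinct count. Once thinning starts, the result is a randomized estimate whose accuracy improves with a larger buffer.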