If you're thinking about using this library, you presumably know that it only provides an estimate (within the specified bounds), similar to HyperLogLog. You are trading accuracy for speed!

## Perf
-Calculating the unique tokens in a [418K UTF-8 text file](https://www.gutenberg.org/ebooks/8492) takes 19.2 ms ± 0.3 ms on an M2 Pro
+Calculating the unique tokens in a [418K UTF-8 text file](https://www.gutenberg.org/ebooks/8492) takes 18.6 ms ± 0.3 ms on an M2 Pro

## Implementation Details
This library strips punctuation from input tokens using a regex. I assume there is a small performance penalty, but it seems like a small price to pay for increased practicality.
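For comparison with the regex approach described above, here is a minimal sketch of the same cleanup using only the Rust standard library. It assumes "punctuation" means any character that is not alphanumeric (Unicode-aware via `char::is_alphanumeric`); the crate's actual regex pattern isn't shown in this diff, and `clean_token` is a hypothetical name.

```rust
/// Hypothetical helper: strip punctuation from a single token using only
/// the standard library. Assumption: "punctuation" means any character
/// that is not alphanumeric (Unicode-aware via `char::is_alphanumeric`).
fn clean_token(raw: &str) -> String {
    raw.chars().filter(|c| c.is_alphanumeric()).collect()
}

fn main() {
    let text = "Hello, world! It's a world-class \"world\".";
    let tokens: Vec<String> = text
        .split_whitespace()
        .map(clean_token)
        // A token made only of punctuation (e.g. "--") becomes empty; skip it.
        .filter(|t| !t.is_empty())
        .collect();
    // prints ["Hello", "world", "Its", "a", "worldclass", "world"]
    println!("{:?}", tokens);
}
```

Apostrophes and hyphens are dropped too (`It's` becomes `Its`, `world-class` becomes `worldclass`), which may or may not match the crate's regex. Whether a char filter is measurably faster than a precompiled regex here would need benchmarking.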
src/lib.rs
        // I think this will be faster than a hashset for practical sizes
        // but I need some empirical data for this
        if let Some(pos) = self.buf.iter().position(|x| *x == clean_word) {
-            self.buf.remove(pos);
+            self.buf.swap_remove(pos);
        }
        if self.rng.gen_bool(self.probability) {
            self.buf.push(clean_word);
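The update step in this hunk (drop any existing copy of the word, then re-admit it with probability `self.probability`) appears to match the per-element step of the CVM distinct-elements algorithm by Chakraborty, Vinodchandran and Meel, where the final estimate is the buffer size divided by the current probability. Below is a minimal, self-contained sketch of that algorithm under some assumptions: `Estimator`, `capacity`, `estimate` and the `XorShift64` generator are hypothetical (the crate uses `rand`'s `gen_bool`), and the buffer-full halving step follows the CVM description rather than anything shown in this diff.

```rust
/// Minimal xorshift64 PRNG so the sketch needs no external crates.
/// Hypothetical stand-in for the crate's `rand`-based `self.rng`.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns `true` with probability `p` (for `p` in 0.0..=1.0).
    fn gen_bool(&mut self, p: f64) -> bool {
        // Top 53 bits -> uniform f64 in [0, 1).
        let u = (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64;
        u < p
    }
}

/// Hypothetical CVM-style distinct-count estimator; field names mirror
/// the diff (`buf`, `probability`, `rng`), everything else is assumed.
struct Estimator {
    buf: Vec<String>,
    capacity: usize,
    probability: f64,
    rng: XorShift64,
}

impl Estimator {
    fn new(capacity: usize, seed: u64) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Estimator {
            buf: Vec::with_capacity(capacity),
            capacity,
            probability: 1.0,
            rng: XorShift64(seed.max(1)), // xorshift must not be seeded with 0
        }
    }

    fn add(&mut self, word: &str) {
        // Forget any earlier copy of `word`. The buffer is used as a set, so
        // element order is irrelevant and `swap_remove` (O(1), fills the gap
        // with the last element) is safe where `remove` would shift (O(n)).
        if let Some(pos) = self.buf.iter().position(|x| x == word) {
            self.buf.swap_remove(pos);
        }
        // Re-admit the word with the current sampling probability.
        if self.rng.gen_bool(self.probability) {
            self.buf.push(word.to_string());
        }
        // Buffer full: keep each element with probability 1/2 and halve p.
        // Looping (rather than reporting failure) is a simplification.
        while self.buf.len() >= self.capacity {
            let rng = &mut self.rng;
            self.buf.retain(|_| rng.gen_bool(0.5));
            self.probability /= 2.0;
        }
    }

    /// Estimated number of distinct words seen so far.
    fn estimate(&self) -> f64 {
        self.buf.len() as f64 / self.probability
    }
}

fn main() {
    let mut est = Estimator::new(100, 42);
    // 1,000 distinct tokens, each seen three times.
    for _ in 0..3 {
        for i in 0..1000 {
            est.add(&format!("w{}", i));
        }
    }
    println!("estimate: {:.0} (true count: 1000)", est.estimate());
}
```

On `remove` versus `swap_remove`: `Vec::remove` shifts every later element left, while `Vec::swap_remove` moves the last element into the gap in O(1) but does not preserve order, which is harmless for a buffer used as a set. The `position` scan before it is still linear in the buffer size, so this change removes the shift cost but not the lookup cost that the hashset comment is about.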