Minhash Algorithm Visualizer

Minhash allows you to find similarity at scale. With this Minhash generator tool you can create Minhash values online from pieces of text. We show you the steps needed to create a Minhash value from the given text, and the outputs you can expect from this specific implementation of the Minhash algorithm.

This tool follows the popular Datasketch minhash Python library implementation of the algorithm.

All values are generated client-side.

The input text to generate Minhash values from

Base hash function to hash shingles with

Amount of words per shingle

Amount of hash permutations

Generated Minhash Values

Generate 2 or more Minhash values sets to compare their results with eachother.

No Minhash values generated yet

Compare Minhash Values

It's often interesting to compare Minhash values for clustering or deduplication purposes. Compare your previous Minhash values with Jaccard similarity.


How does this Minhash algorithm work?

When you enter a piece of text, the following steps are taken:

  1. Create k-word shingles from the given text
  2. Apply the

What does Minhash return?

The return value of the Minhash algorithm depends on the implementation. Generating Minhash for a value generally does not return a single value, but a list of values. The number of values returned is determined by the number of hash functions used, often called permutations.