Extra:Hashing

From COMP15212 Wiki

This is mostly for background information; it does not have too much impact on operating systems but can be used in protection mechanisms such as encoding passwords. Hashing may also be employed to facilitate the implementation of some algorithms.

Hashing is a form of encoding a ‘key’ to data in a fixed length form. For example an 8-bit hash could be formed from each (variable length) name in a group of people for convenience in later searches. For example a (poor quality) hash could be made by simply adding the character values of all the letters in a name.

If the hashes are stored then an incoming name (string) can also be hashed in the same way. It then requires a single comparison to determine that the new name doesn’t match an entry in the database. This is much more efficient (faster!) than comparing the strings if there is considerable likelihood of similarity.

John Johnson
John Jones
John Smith
...

Hashes are necessarily not unique: a hash match does not guarantee a string match but a hash mismatch does guarantee a string mismatch, reducing the need for string comparisons.

A good hash function will produce output values with equal probabilities: i.e. in an 8-bit hash there should be (roughly) 1/256 of each input with a given hash value.

When used for disguising a password (for example) a hash value will be considerably longer than 8 bits and the function will be … quite complex.