Provide Best Programming Tutorials

Hash Functions And Hash Codes

A typical hash function first converts a search key to an integer value called a hash code, then compresses the hash code into an index to the hash table.

Java’s root class Object has the hashCode method, which returns an integer hash code. By default, the method returns the memory address for the object. The general contract for the hashCode method is as follows:

  1. You should override the hashCode method whenever the equals method is overridden to ensure two equal objects return the same hash
  2. During the execution of a program, invoking the hashCode method multiple times returns the same integer, provided that the object’s data are not changed.
  3. Two unequal objects may have the same hash code, but you should implement the

hashCode method to avoid too many such cases.

Hash Codes for Primitive Types

For search keys of the type byte, short, int, and char, simply cast them to int. Therefore, two different search keys of any one of these types will have different hash codes.

For a search key of the type float, use Float.floatToIntBits(key) as the hash code. Note floatToIntBits(float f) returns an int value whose bit representation is the same as the bit representation for the floating number f. Thus, two different search keys of the float type will have different hash codes.

For a search key of the type long, simply casting it to int would not be a good choice, because all keys that differ in only the first 32 bits will have the same hash code. To take the first 32 bits into consideration, divide the 64 bits into two halves and perform the exclusive- or operation to combine the two halves. This process is called folding. The hash code for a long key is

int hashCode = (int)(key ^ (key >> 32));

Note >> is the right-shift operator that shifts the bits 32 positions to the right. For example, 1010110 >> 2 yields 0010101. The ^ is the bitwise exclusive-or operator. It operates on two corresponding bits of the binary operands. For example, 1010110 ^ 0110111 yields 1100001. For more on bitwise operations, see Appendix G, Bitwise Operations.

For a search key of the type double, first convert it to a long value using the Double. doubleToLongBits method, then perform a folding as follows:

long bits = Double.doubleToLongBits(key);
int hashCode = (int)(bits ^ (bits >> 32));

 

Hash Codes for Strings

Search keys are often strings, so it is important to design a good hash function for strings. An intuitive approach is to sum the Unicode of all characters as the hash code for the string. This approach may work if two search keys in an application don’t contain the same letters,

but it will produce a lot of collisions if the search keys contain the same letters, such as

tod and dot.

A better approach is to generate a hash code that takes the position of characters into con- sideration. Specifically, let the hash code be

s0 * b(n – 1) + s1 * b(n – 2) + g + sn – 1

where si is s.charAt(i). This expression is a polynomial for some positive b, so this is called a polynomial hash code. Using Horner’s rule for polynomial evaluation the hash code can be calculated efficiently as follows:

( g ((s0 * b + s1) * b + s2) * b + g + sn 2) * b + sn 1

This computation can cause an overflow for long strings, but arithmetic overflow is ignored in Java. You should choose an appropriate value b to minimize collisions. Experiments show that good choices for b are 31, 33, 37, 39, and 41. In the String class, the hashCode is overridden using the polynomial hash code with b being 31.

Compressing Hash Codes

The hash code for a key can be a large integer that is out of the range for the hash-table index, so you need to scale it down to fit in the index’s range. Assume the index for a hash table is between 0 and N-1. The most common way to scale an integer to between 0 and N1 is to use

index = hashCode % N;

Ideally, you should choose a prime number for N to ensure the indices are spread evenly. However, it is time consuming to find a large prime number. In the Java API implementation for java.util.HashMap, N is set to an integer power of 2. There is a good reason for this choice. When N is an integer power of 2, you can use the & operator to compress a hash code to an index on the hash table as follows:

index = hashCode & (N − 1);

index will be between 0 and N 1. The ampersand, &, is a bitwise AND operator (see Appendix G, Bitwise Operations). The AND of two corresponding bits yields a 1 if both bits are 1. For example, assume N = 4 and hashCode = 11. Thus, 11 & (4 1) = 1011 & 0011 = 0011.

To ensure the hashing is evenly distributed, a supplemental hash function is also used along with the primary hash function in the implementation of java.util.HashMap. This function is defined as:

private static int supplementalHash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

>>> is unsigned right-shift operations (also introduced in Appendix G). The bitwise operations are much faster than the multiplication, division, and remainder operations. You should replace these operations with the bitwise operations whenever possible.

The complete hash function is defined as:

h(hashCode) = supplementalHash(hashCode) & (N – 1)

The supplemental hash function helps avoid collisions for two numbers with the same lower bits. For example, both 11100101 & 00000111 and 11001101 & 00000111 yield 00000111. But supplementalHash(11100101) & 00000111 and supplemental- Hash(11001101) & 00000111 will be different. Using a supplemental function reduces this type of collision.

 

 

Leave a Reply

Close Menu