*A typical hash function first converts a search key to an integer value called a hash code, then compresses the hash code into an index to the hash table.*

Java’s root class **Object **has the **hashCode **method, which returns an integer *hash code*. By default, the method returns the memory address for the object. The general contract for the **hashCode **method is as follows:

- You should override the
**hashCode**method whenever the**equals**method is overridden to ensure two equal objects return the same hash - During the execution of a program, invoking the
**hashCode**method multiple times returns the same integer, provided that the object’s data are not changed. - Two unequal objects may have the same hash code, but you should implement the

**hashCode **method to avoid too many such cases.

## Hash Codes for Primitive Types

For search keys of the type **byte**, **short**, **int**, and **char**, simply cast them to **int**. Therefore, two different search keys of any one of these types will have different hash codes.

For a search key of the type **float**, use **Float.floatToIntBits(key) **as the hash code. Note **floatToIntBits(float f) **returns an **int **value whose bit representation is the same as the bit representation for the floating number **f**. Thus, two different search keys of the **float **type will have different hash codes.

For a search key of the type **long**, simply casting it to **int **would not be a good choice, because all keys that differ in only the first 32 bits will have the same hash code. To take the first 32 bits into consideration, divide the 64 bits into two halves and perform the exclusive- or operation to combine the two halves. This process is called *folding*. The hash code for a **long **key is

int hashCode = (int)(key ^ (key >> 32));

Note **>> **is the right-shift operator that shifts the bits 32 positions to the right. For example, **1010110 >> 2 **yields **0010101**. The ^ is the bitwise exclusive-or operator. It operates on two corresponding bits of the binary operands. For example, **1010110 **^ **0110111 **yields **1100001**. For more on bitwise operations, see Appendix G, Bitwise Operations.

For a search key of the type **double**, first convert it to a **long **value using the **Double. ****doubleToLongBits **method, then perform a folding as follows:

long bits = Double.doubleToLongBits(key); int hashCode = (int)(bits ^ (bits >> 32));

## Hash Codes for Strings

Search keys are often strings, so it is important to design a good hash function for strings. An intuitive approach is to sum the Unicode of all characters as the hash code for the string. This approach may work if two search keys in an application don’t contain the same letters,

but it will produce a lot of collisions if the search keys contain the same letters, such as

**tod **and **dot**.

A better approach is to generate a hash code that takes the position of characters into con- sideration. Specifically, let the hash code be

*s*0 * *b*(*n *– 1) + *s*1 * *b*(*n *– 2) + g + *s**n *– 1

where *s _{i} *is

**s.charAt(i)**. This expression is a polynomial for some positive

*b*, so this is called a

*polynomial hash code*. Using Horner’s rule for polynomial evaluation the hash code can be calculated efficiently as follows:

( g ((*s*_{0} * *b *+ *s*_{1}) * *b *+ *s*_{2}) * *b *+ g + *s _{n} *

_{–}

_{2}) *

*b*+

*s*

_{n}_{–}

_{1}

This computation can cause an overflow for long strings, but arithmetic overflow is ignored in Java. You should choose an appropriate value *b *to minimize collisions. Experiments show that good choices for *b *are 31, 33, 37, 39, and 41. In the **String **class, the **hashCode **is overridden using the polynomial hash code with *b *being **31**.

## Compressing Hash Codes

The hash code for a key can be a large integer that is out of the range for the hash-table index, so you need to scale it down to fit in the index’s range. Assume the index for a hash table is between **0 **and **N-1**. The most common way to scale an integer to between **0 **and **N**−**1 **is to use

index = hashCode % N;

Ideally, you should choose a prime number for **N **to ensure the indices are spread evenly. However, it is time consuming to find a large prime number. In the Java API implementation for **java.util.HashMap**, **N **is set to an integer power of **2**. There is a good reason for this choice. When **N **is an integer power of **2**, you can use the **& **operator to compress a hash code to an index on the hash table as follows:

index = hashCode & (N − 1);

**index **will be between **0 **and **N **− **1**. The ampersand, **&**, is a bitwise AND operator (see Appendix G, Bitwise Operations). The AND of two corresponding bits yields a **1 **if both bits are **1**. For example, assume **N = 4 **and **hashCode = 11**. Thus, **11 & (4 **− **1) = 1011 & ****0011 = 0011**.

To ensure the hashing is evenly distributed, a supplemental hash function is also used along with the primary hash function in the implementation of **java.util.HashMap**. This function is defined as:

private static int supplementalHash(int h) { h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); }

**>>> **is unsigned right-shift operations (also introduced in Appendix G). The bitwise operations are much faster than the multiplication, division, and remainder operations. You should replace these operations with the bitwise operations whenever possible.

The complete hash function is defined as:

h(hashCode) = supplementalHash(hashCode) & (N – 1)

The supplemental hash function helps avoid collisions for two numbers with the same lower bits. For example, both **11100101 & 00000111 **and **11001101 & 00000111 **yield **00000111**. But **supplementalHash(11100101) ****& 00000111 **and **supplemental- ****Hash(11001101) & 00000111 **will be different. Using a supplemental function reduces this type of collision.