Provide Best Programming Tutorials

HashSet In Java

HashSet

The HashSet class implements the Set interface, backed by a hash table which is actually a HashMap instance. No guarantee is made as to the iteration order of the set which means that the class does not guarantee the constant order of elements over time. This class permits the null element. The class also offers constant time performance for the basic operations like add, remove, contains and size assuming the hash function disperses the elements properly among the buckets, which we shall see further in the article.
Few important features of HashSet are:

  • Implements Set Interface.
  • The underlying data structure for HashSet is the hashtable.
  • As it implements the Set Interface, duplicate values are not allowed.
  • Objects that you insert in HashSet are not guaranteed to be inserted in same order. Objects are inserted based on their hash code.
  • NULL elements are allowed in HashSet.
  • HashSet also implements Serializable and Cloneable interfaces.

Initial Capacity: The initial capacity means the number of buckets when hashtable (HashSet internally uses hashtable data structure) is created. The number of buckets will be automatically increased if the current size gets full.
Load Factor: The load factor is a measure of how full the HashSet is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

                  Number of stored elements in the table
   load factor = -----------------------------------------
                        Size of the hash table 

Example: If internal capacity is 16 and the load factor is 0.75 then, number of buckets will automatically get increased when the table has 12 elements in it.

Effect on performance

Load factor and initial capacity are two main factors that affect the performance of HashSet operations. The load factor of 0.75 provides very effective performance as respect to time and space complexity. If we increase the load factor value more than that then memory overhead will be reduced (because it will decrease internal rebuilding operation) but, it will affect the add and search operation in the HashTable. To reduce the rehashing operation we should choose initial capacity wisely. If initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operation will ever occur.

NOTE: The implementation in a HashSet is not synchronized, in the sense that if multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be “wrapped” using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set as shown below:

Set s = Collections.synchronizedSet(new HashSet(...));

Constructors in HashSet

  1. HashSet h = new HashSet(); 
    Default initial capacity is 16 and the default load factor is 0.75.
  2. HashSet h = new HashSet(int initialCapacity); 
    default load factor of 0.75
  3. HashSet h = new HashSet(int initialCapacity, float loadFactor);
  4. HashSet h = new HashSet(Collection C);

Example

package SetDemo.hashset;

import java.util.HashSet;
import java.util.Iterator;

public class HashSetDemo {
    public static void main(String[] args) {
        HashSet<String> hashSet = new HashSet<>();
        hashSet.add("Andrew");
        hashSet.add("Programming");

        hashSet.add("Andrew");
        hashSet.add("Hello World!");

        hashSet.add(null);//you can add null to hashset

        System.out.println("--------Using for-each Loop------------");

        for (String item : hashSet) {
            System.out.println(item);
        }

        System.out.println("--------Using Iterator----------");

        Iterator it = hashSet.iterator();
        while (it.hasNext()) {
            System.out.println(it.next());
        }

        hashSet.remove("Hello World!");

        System.out.println("--------After remove------------");

        for (String item : hashSet) {
            System.out.println(item);
        }
    }
}

output

--------Using for-each Loop------------
null
Hello World!
Andrew
Programming
--------Using Iterator----------
null
Hello World!
Andrew
Programming
--------After remove------------
null
Andrew
Programming

 

How HashSet judge whether two elements are equal or not

HashSet using hashcode and equals methods to ensure there is no duplicate element in the set.

Internal working of a HashSet

All the classes of Set interface internally backed up by Map. HashSet uses HashMap for storing its object internally. You must be wondering what to enter a value in HashMap we need a key-value pair, but in HashSet, we are passing only one value.

Storage in HashMap

Actually the value we insert in HashSet acts as the key to the map Object and for its value, java uses a constant variable. So in the key-value pair, all the keys will have the same value.

...

public HashSet() {
       map = new HashMap<>();
   }

   /**
    * Constructs a new set containing the elements in the specified
    * collection.  The {@code HashMap} is created with default load factor
    * (0.75) and an initial capacity sufficient to contain the elements in
    * the specified collection.
    *
    * @param c the collection whose elements are to be placed into this set
    * @throws NullPointerException if the specified collection is null
    */
   public HashSet(Collection<? extends E> c) {
       map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
       addAll(c);
   }

   /**
    * Constructs a new, empty set; the backing {@code HashMap} instance has
    * the specified initial capacity and the specified load factor.
    *
    * @param      initialCapacity   the initial capacity of the hash map
    * @param      loadFactor        the load factor of the hash map
    * @throws     IllegalArgumentException if the initial capacity is less
    *             than zero, or if the load factor is nonpositive
    */
   public HashSet(int initialCapacity, float loadFactor) {
       map = new HashMap<>(initialCapacity, loadFactor);
   }

...

Methods in HashSet

boolean add(E e): Used to add the specified element if it is not present if it is present then return false.
void clear(): Used to remove all the elements from the set.
boolean contains(Object o): Used to return true if an element is present in the set.
boolean remove(Object o): Used to remove the element if it is present in the set.
Iterator iterator(): Used to return an iterator over the element in the set.
boolean isEmpty(): Used to check whether the set is empty or not. Returns true for empty and false for a non-empty condition for the set.
int size(): Used to return the size of the set.
Object clone(): Used to create a shallow copy of the set.

Leave a Reply

Close Menu