1. Overview
Java provides various Set implementations tailored for different use cases. In this tutorial, we're going to examine these Set implementations and their characteristics in terms of thread-safety.
2. Non-Thread-Safe Set Implementations
We'll first look at the non-thread-safe Set implementations including HashSet, LinkedHashSet, and TreeSet. When accessed from multiple threads concurrently, these Set implementations may not behave correctly.
Let's verify this with a simple test:
public class MultiThreadedAccess {
public void doInsert(Set<Object> set) throws InterruptedException {
final int taskCount = 100;
final ExecutorService executorService = Executors.newFixedThreadPool(taskCount);
for (int i = 0; i < taskCount; i++) {
executorService.execute(() -> {
set.add("hi");
});
}
executorService.shutdown();
executorService.awaitTermination(1, TimeUnit.SECONDS);
System.out.println("Set size: " + set.size());
}
// Other methods...
}
In the doInsert method, we execute 100 tasks inserting into the given set.
Now when we invoke it passing a HashSet instance:
public static void main(String[] args) throws InterruptedException {
final MultiThreadedAccess multiThreadedAccess = new MultiThreadedAccess();
multiThreadedAccess.doInsert(new HashSet<>());
}
It prints:
Set size: 98
Although we've executed 100 insert operations, the final set size is 98. This means that we lost 2 insertions due to concurrent access. Of course, this result doesn't occur at all times and requires some lucky timing.
2.1. Iterators
The iterators created from HashSet - LinkedHashSet, and TreeSet - are fail-fast. This means that if a new modification occurs after the construction of the iterator, it throws a ConcurrentModificationException.
public class MultiThreadedAccess {
public void doIterate(Set<Object> set) throws InterruptedException {
final int taskCount = 100;
final ExecutorService executorService = Executors.newFixedThreadPool(taskCount);
for (int i = 0; i < taskCount; i++) {
executorService.execute(() -> {
set.add("hi");
for (Object element : set) {
// Do something.
}
});
}
executorService.shutdown();
}
// Other methods...
}
In the doIterate method, we're executing 100 tasks inserting and iterating the given set.
When we pass a HashSet instance:
public static void main(String[] args) throws InterruptedException {
final MultiThreadedAccess multiThreadedAccess = new MultiThreadedAccess();
multiThreadedAccess.doIterate(new HashSet<>());
}
It shows the exception:
Exception in thread "pool-2-thread-7" java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
at java.util.HashMap$KeyIterator.next(HashMap.java:1469)
at com.javabyexamples.java.concurrency.buildingblocks...(MultiThreadedAccess.java:51)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
3. Synchronized Set
Synchronized collections allow us to turn a non-thread-safe collection into a thread-safe one. They achieve this by wrapping the original collection. Now we'll wrap an existing set using the Collections.synchronizedSet method:
final Set<Object> initialSet = new HashSet<>();
final Set<Object> synchronizedSet = Collections.synchronizedSet(initialSet);
Here, we declare a HashSet instance, initialSet. After the Collections.synchronizedSet invocation, we acquire a thread-safe Set object whose public methods are synchronized. Also, note that the wrapper object uses its own intrinsic lock for synchronization.
If we run doInsert on a synchronized set:
public static void main(String[] args) throws InterruptedException {
final MultiThreadedAccess multiThreadedAccess = new MultiThreadedAccess();
multiThreadedAccess.doInsert(Collections.synchronizedSet(new HashSet<>()));
}
It provides the expected thread-safety:
Set size: 100
One drawback of this approach is that it serializes all access to the original set. Threads can't access the instance concurrently in that only one thread can acquire the lock forcing others to wait until the lock is released.
3.1. Compound Actions
Although a synchronized set guards all public methods, it can't help us when we perform a compound operation. A good example is a put-if-absent operation where we insert an element only if it's absent. Such an operation on a synchronized set is technically thread-safe but the result may not be as expected. To solve this issue, we must use client-side locking:
public void putIfAbsent(Object element) {
synchronized (synchronizedSet) {
if (!synchronizedSet.contains(element)) {
synchronizedSet.add(element);
}
}
}
In this method, we're acquiring synchronizedSet's intrinsic lock which is the same lock guarding other Set methods. With this method, we guarantee that no other thread can operate on the set until the current operation completes. In other words, we're making the putIfAbsent method atomic.
3.2. Iterators
The iterators created from synchronized sets can't handle concurrent access and fail fast. They throw ConcurrentModificationException when there is a modification to the underlying data. We'll use the previous doIterate method to observe their behavior:
public static void main(String[] args) throws InterruptedException {
final MultiThreadedAccess multiThreadedAccess = new MultiThreadedAccess();
multiThreadedAccess.doIterate(Collections.synchronizedSet(new HashSet<>()));
}
Similar to the HashSet example, a sample run shows the exception:
Exception in thread "pool-1-thread-71" java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
at java.util.HashMap$KeyIterator.next(HashMap.java:1469)
at com.javabyexamples.java.concurrency.buildingblocks...(MultiThreadedAccess.java:51)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
To solve this issue, we must employ client-side locking around the iteration block:
synchronized (synchronizedSet) {
for(Object element : synchronizedSet){
// Do work...
}
}
Note that we're synchronizing on the wrapper set synchronizedSet.
4. ConcurrentHashMap-backed Set
ConcurrentHashMap is a thread-safe Map implementation that supports concurrent access. It also provides the static newKeySet method that returns a set backed by a ConcurrentHashMap instance. The returned Set instance inherits the thread-safety guarantees of the ConcurrentHashMap class:
final Set<String> setOfStrings = ConcurrentHashMap.newKeySet();
4.1. Compound Operations
ConcurrentHashMap uses lock striping to provide highly concurrent read and write operations. However, it doesn't support the usage of client-side locking. So we can't create custom compound actions like we did with the Collections.synchronizedSet instances.
4.2. Iterators
ConcurrentHashMap returns weakly consistent iterators that can handle concurrent modifications. They don't throw ConcurrentModificationException. However, as a trade-off, weakly consistent iterators don't make any guarantees about reflecting the recent changes.
5. ConcurrentSkipListSet
ConcurrentSkipListSet is a thread-safe Set implementation. Unlike the synchronized sets created by Collections.synchronizedSet, it supports concurrent access:
final Set<String> setOfStrings = new ConcurrentSkipListSet<>();
5.1. Compound Operations
Similar to the ConcurrentHashMap, ConcurrentSkipListSet doesn't support client-side locking. So we can't introduce new compound operations other than the ones already supported.
5.2. Iterators
ConcurrentSkipListSet returns weakly consistent iterators that don't throw ConcurrentModificationException. They achieve this by loosening the guarantee to reflect the changes that happen after their creation.
6. CopyOnWriteArraySet
The last concurrent Set implementation is CopyOnWriteArraySet. Whenever we attempt to modify the contents, CopyOnWriteArraySet copies the underlying array to apply the new change. In essence, it achieves thread-safety by treating the backing array as an immutable object.
6.1. Compound Operations
Since CopyOnWriteArraySet doesn't use locking to enforce thread-safety, we can't lock the whole set to gain exclusive access. Similar to the previous concurrent sets, we can't add new compound actions.
6.2. Iterators
CopyOnWriteArraySet returns snapshot iterators. Since the underlying array is immutable, each iterator instance operates on a snapshot of the state when it's created. If a modification occurs to the data, it doesn't affect the existing iterators since they work on their own copy of the data. As a result, they don't throw ConcurrentModificationException.
7. Summary
In this tutorial, we've looked at the thread-safety characteristics of different Set implementations in Java.
As always, the source code for all examples is available on Github.