TL;DR: If you’re familiar with the concepts of threads and thread pooling, you can skip right over to the CompletableFuture section, as these are the most common way to run tasks nowadays, and they can run on their own pool if you don’t want to use the default ForkJoinPool.commonPool().
In this tutorial we will discuss all things concurrent in Java (most of which lives in the java.util.concurrent package).
Firstly, let’s discuss what a thread is. In layman’s terms, a thread is an execution sequence that ultimately will come to an end unless we instruct otherwise by ‘blocking’ the thread somehow, whether with a while loop that never ends or by other means.
Threads effectively ‘live’ inside of a process, and a process provides a shared memory space for all subsequent threads that are executed within it.
Each component of a process is detailed below:
Stack — A linear, fixed-size data structure that stores information about the active subroutines (method calls). In a nutshell, it can be thought of as a stack of frames holding either:
A) References to objects stored on the heap (reference types), or
B) Primitive values, stored on the stack itself.
Registers — An integral piece of the computer processor (CPU). A register may hold an instruction, a storage address, or another kind of data needed by the process to orchestrate execution. I suppose there is technically an additional piece, the program counter, but we won’t worry about that, nor registers really.
Heap — Holds the objects that the stack’s references point to: mutable, dynamically allocated pieces of memory. It has no specific fixed size, either collectively or per object.
Thread — A sequence of execution with a finite lifecycle: it has a beginning and an ending, and will execute until complete or instructed otherwise.
In the image above, we notice that in the multi-threaded example, the heap memory is shared between the threads. This is crucial to our discussion of concurrency: it means multiple threads can access the same reference type at the same time, which can lead to some ugly problems if not managed properly. But fear not, Java has us covered, as you’ll see soon enough!
There is technically another form of concurrency known as multi-processing, in which we have multiple processes; however, we will not be discussing multi-processing as part of this guide.
When writing a program in any language, the entry point (whether that’s a ‘main’ method or an entry script such as ‘index.js’) will de facto create a ‘main’ thread. This is our primary thread of execution, from which we will spawn subsequent threads.
The Thread Scheduler
Each Java thread is given a numeric priority between MIN_PRIORITY and MAX_PRIORITY (constants defined in the Thread class). At any given time, when multiple threads are ready to be executed, the thread with the highest priority is chosen for execution. (From IITK)
In short, threads don’t always run in parallel with each other, and when they can’t, how do we tell the JVM which of our N threads should run on our pool of, say, 8 logical processing units? Thread priority: modifying this flag indicates to the JVM that we wish a particular thread to run ahead of another with lesser priority.
If two threads have the same priority, the JVM will typically fall back to a First-In-First-Out principle, so whoever was registered first goes first (though the exact behaviour ultimately depends on the underlying OS scheduler).
Now that we’re familiar with the scheduler, what a process is, and how a thread sits within these constructs, let’s create one.
Extending the Thread Class
The first way to create a thread is simply extending the built-in Thread class:
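A minimal sketch of this approach might look as follows (the class name is my own, not from the original gist):

```java
// Subclassing Thread and overriding run() — the task this thread will complete.
public class ExtendThreadExample extends Thread {
    @Override
    public void run() {
        // Executes on the newly spawned thread (e.g. "Thread-0")
        System.out.println("Running inside: " + Thread.currentThread().getName());
    }

    public static void main(String[] args) throws InterruptedException {
        ExtendThreadExample t = new ExtendThreadExample();
        t.start(); // schedules run() on a NEW thread; calling run() directly would not
        t.join();  // wait for the worker thread to finish
        System.out.println("Back inside: " + Thread.currentThread().getName());
    }
}
```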
The run method is often referred to as the task that this thread will complete. Upon calling .start(), the JVM invokes .run() on a separate thread. I used the static member ‘currentThread’ to identify which thread the sysout is executing within. As we can see, within the run method it is identified as ‘Thread-0’, and in our main method it is ‘main’.
One thing to note: the run method can only return void. We will soon see additional features provided by Java which combat this limitation.
Notice we’re only overriding a single method of the Thread class; extending the whole class likely seems a little overkill for each task we wish to run in a new thread, right?
The solution is to pass a Runnable to the Thread constructor. We have two ways to do this: either an anonymous Runnable class or a lambda expression which complies with the Runnable interface’s functional signature.
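Both options might look something like this (a sketch, not the article’s original gist):

```java
public class RunnableExample {
    public static void main(String[] args) throws InterruptedException {
        // Option 1: an anonymous Runnable class
        Thread anon = new Thread(new Runnable() {
            @Override
            public void run() {
                System.out.println("anonymous runnable");
            }
        });

        // Option 2: a lambda matching Runnable's single abstract method, void run()
        Thread lambda = new Thread(() -> System.out.println("lambda runnable"));

        anon.start();
        anon.join();
        lambda.start();
        lambda.join();
    }
}
```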
If you’re not familiar with lambdas, checkout my Java In A Nutshell — Function Interfaces, Method References and Lambdas tutorial
Propagating Exceptions in Runnables
You can’t. As a thread is an isolated sequence of execution, it simply isn’t possible to propagate exceptions from inside a Runnable up to the thread instance responsible for creating the subsequent thread. (The closest we get is Thread.setUncaughtExceptionHandler, which lets us react to an uncaught exception rather than catch it.)
We will, however, see later how Futures combat this issue.
Useful Thread Methods
On the Thread class/instance itself, it has a good few useful methods:
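Here’s a quick tour of the ones I reach for most (a sketch of my own, demonstrating sleep, naming, priority, daemon status, liveness checks, interruption and joining):

```java
public class ThreadMethods {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(10_000); // pause the CURRENT thread
            } catch (InterruptedException e) {
                System.out.println("interrupted!"); // interrupt() lands us here
            }
        });
        worker.setName("worker-1");               // human-readable name
        worker.setPriority(Thread.MAX_PRIORITY);  // scheduling hint, 1..10
        worker.setDaemon(true);                   // JVM may exit without waiting for it
        worker.start();
        System.out.println(worker.getName() + " alive? " + worker.isAlive());
        worker.interrupt();                       // wake it from sleep via an exception
        worker.join();                            // block until the worker terminates
    }
}
```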
ThreadGroups essentially allow us to collect a group of threads and control their internal fields, such as setting them all to daemon or their priorities as a collective. They apparently aren’t used too often anymore, so here’s a primitive example:
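A primitive ThreadGroup sketch (names are my own) could look like:

```java
public class ThreadGroupExample {
    public static void main(String[] args) {
        ThreadGroup group = new ThreadGroup("workers");
        Runnable task = () -> {
            try { Thread.sleep(5_000); } catch (InterruptedException ignored) { }
        };
        new Thread(group, task, "worker-a").start();
        new Thread(group, task, "worker-b").start();

        // Control the members as a collective
        group.setMaxPriority(Thread.MIN_PRIORITY);
        System.out.println("active threads in group: " + group.activeCount());
        group.interrupt(); // interrupt every thread in the group at once
    }
}
```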
Thread Management via Executors
Creating threads manually is fairly cumbersome and not really ideal. Executors manage this for us: a place where threads are kept alive, and we can issue tasks (Runnables) to a single executor, which manages the queueing of tasks and their execution on each thread for us.
To create an executor we have a few different methods available statically, but here’s a simple one:
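For instance, a single-threaded executor (a minimal sketch):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleThreadExecutorExample {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // execute() queues a Runnable onto the pool's single thread
        executor.execute(() ->
                System.out.println("task ran on: " + Thread.currentThread().getName()));
        executor.shutdown(); // stop accepting tasks, let queued ones finish
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```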
It returns an ‘ExecutorService’ interface implementation (which extends Executor); this provides us with a very wide range of useful methods to control how our executor executes, such as:
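Some of the lifecycle methods in action (a sketch; the method names are real ExecutorService API, the scenario is my own):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecutorServiceMethods {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(() -> "a result"); // like execute(), but returns a Future
        executor.shutdown();               // graceful: finish queued tasks, accept no more
        // executor.shutdownNow();         // forceful: attempts to interrupt running tasks
        System.out.println("isShutdown: " + executor.isShutdown());
        executor.awaitTermination(5, TimeUnit.SECONDS); // block until tasks finish
        System.out.println("isTerminated: " + executor.isTerminated());
    }
}
```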
Now obviously, we wouldn’t often want a single thread in the pool and we may want some control over when the threads cease execution and die. The Executors class provides us a hefty lump of different executor types we can create:
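The common factory methods look like this (each of these is a real Executors method):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorTypes {
    public static void main(String[] args) {
        // Exactly one thread; tasks execute sequentially.
        ExecutorService single = Executors.newSingleThreadExecutor();
        // A fixed pool of N reusable threads.
        ExecutorService fixed = Executors.newFixedThreadPool(4);
        // Grows and shrinks on demand; idle threads die after 60 seconds.
        ExecutorService cached = Executors.newCachedThreadPool();
        // For delayed and periodic tasks (more on this later).
        ExecutorService scheduled = Executors.newScheduledThreadPool(2);
        // A work-stealing ForkJoinPool sized to the available processors.
        ExecutorService workStealing = Executors.newWorkStealingPool();

        single.shutdown(); fixed.shutdown(); cached.shutdown();
        scheduled.shutdown(); workStealing.shutdown();
        System.out.println("all pools shut down");
    }
}
```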
Under the hood, the thread configuration is done via the ‘ThreadFactory’ interfaces, the Executors class uses a default class implementation called ‘DefaultThreadFactory’, it looks as so:
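A simplified sketch of what it does (see the JDK source for the real implementation; this version is trimmed down and my own):

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified approximation of Executors' DefaultThreadFactory
public class SimplifiedDefaultThreadFactory implements ThreadFactory {
    private static final AtomicInteger poolNumber = new AtomicInteger(1);
    private final AtomicInteger threadNumber = new AtomicInteger(1);
    private final String namePrefix;

    SimplifiedDefaultThreadFactory() {
        namePrefix = "pool-" + poolNumber.getAndIncrement() + "-thread-";
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, namePrefix + threadNumber.getAndIncrement());
        if (t.isDaemon()) t.setDaemon(false); // always user (non-daemon) threads
        if (t.getPriority() != Thread.NORM_PRIORITY) t.setPriority(Thread.NORM_PRIORITY);
        return t;
    }

    public static void main(String[] args) {
        ThreadFactory factory = new SimplifiedDefaultThreadFactory();
        System.out.println(factory.newThread(() -> { }).getName());
    }
}
```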
If the configuration here was not ideal for our pool, we can extend ThreadFactory ourselves and pass it to any of the pool constructor methods of Executors. For example:
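A custom factory sketch (the factory name and naming scheme are my own) that makes every pool thread a named daemon:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class NamedDaemonFactory implements ThreadFactory {
    private int counter = 0;

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, "my-pool-" + counter++);
        t.setDaemon(true); // our pool's threads won't keep the JVM alive
        return t;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2, new NamedDaemonFactory());
        executor.submit(() ->
                System.out.println("running on: " + Thread.currentThread().getName())).get();
        executor.shutdown();
    }
}
```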
Synchronising Multiple Threads
Before we head into Futures, let’s look at how we can synchronise communication between our threads.
If we were to create an object and attempt to access it at the same time from two different threads, how can we ensure that only a single thread accesses it at a time, so that our output is ‘synchronized’?
Consider the following:
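A sketch of the unsynchronised situation (method and class names are my own): two threads hammering the same ArrayList, which is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;

public class UnsynchronizedExample {
    static List<Integer> list = new ArrayList<>();

    static void addToList(int from) {
        for (int i = from; i < from + 1000; i++) {
            list.add(i); // ArrayList is NOT thread-safe
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t0 = new Thread(() -> addToList(0));
        Thread t1 = new Thread(() -> addToList(1000));
        t0.start(); t1.start();
        t0.join(); t1.join();
        // Interleaved and possibly corrupted: the size may not even be 2000
        System.out.println("expected 2000, got " + list.size());
    }
}
```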
The output comes out interleaved and in no guaranteed order… :(
What if we cared about the order? What if we had a specific case we needed to fulfil and this race condition is getting in our way? Java provides the ‘synchronized’ keyword for us, and it is applicable to either:
a) A code block
b) An instance method
c) A static method
If we were to modify the method above (addToList) to any of the following:
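Two of the variants might look as follows (a sketch; names are my own, and I’ve shrunk the batches to 3 elements so the output is readable):

```java
import java.util.ArrayList;
import java.util.List;

public class SynchronizedExample {
    static List<Integer> list = new ArrayList<>();

    // b/c) synchronized method: the monitor is `this`
    //      (or the Class object, for a static method as here)
    static synchronized void addToList(int from) {
        for (int i = from; i < from + 3; i++) {
            list.add(i);
        }
    }

    // a) synchronized block: equivalent, but we pick the monitor object explicitly
    static void addToListBlock(int from) {
        synchronized (SynchronizedExample.class) {
            for (int i = from; i < from + 3; i++) {
                list.add(i);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t0 = new Thread(() -> addToList(0));
        Thread t1 = new Thread(() -> addToList(3));
        t0.start(); t1.start();
        t0.join(); t1.join();
        System.out.println(list); // each thread's batch stays contiguous
    }
}
```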
Now our output is:
Nice and neat, in synchronized order.
But what if we don’t want to synchronise on the entire instance? What if we want to synchronise on a single member? Well, we can; the above code (where we only care about synchronising the list) can be written as:
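Something like this sketch, locking only the list’s monitor:

```java
import java.util.ArrayList;
import java.util.List;

public class SynchronizeOnMember {
    static List<Integer> list = new ArrayList<>();

    static void addToList(int from) {
        // Lock only the list's monitor, not the whole instance or class
        synchronized (list) {
            for (int i = from; i < from + 3; i++) {
                list.add(i);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t0 = new Thread(() -> addToList(0));
        Thread t1 = new Thread(() -> addToList(3));
        t0.start(); t1.start();
        t0.join(); t1.join();
        System.out.println("size: " + list.size()); // batches never interleave
    }
}
```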
Additionally, and interestingly, we can synchronise a method call as long as the resource we’re granting explicit monitor access to is referenced by all participating threads:
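For example (a sketch of my own), both threads receive the same list reference, so they contend for the same monitor inside the method:

```java
import java.util.ArrayList;
import java.util.List;

public class SharedMonitorExample {
    static void addToList(List<Integer> list, int from) {
        synchronized (list) { // the monitor of the shared reference
            for (int i = from; i < from + 3; i++) list.add(i);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Both threads are handed the SAME list, hence the same monitor
        List<Integer> shared = new ArrayList<>();
        Thread t0 = new Thread(() -> addToList(shared, 0));
        Thread t1 = new Thread(() -> addToList(shared, 3));
        t0.start(); t1.start();
        t0.join(); t1.join();
        System.out.println("size: " + shared.size());
    }
}
```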
If this doesn’t make too much sense right now, read the monitors section below and come back :)
But what is actually happening here? Why are we passing an instance to the synchronized block, what does this mean?
The object we pass to the synchronised block has a monitor:
Java associates a monitor with each object. The monitor enforces mutually exclusive access to synchronized methods invoked on the associated object. When a thread calls a synchronized method on an object, the JVM checks the monitor for that object. (From CSC)
In a nutshell, it doesn’t matter too much to us developers, and we could consider it an implementation detail of the JVM and its approach to handling synchronisation. But it’s good to know! This means that, depending on what object we pass to a synchronized block and/or use within a synchronized method, those members can only be accessed by a single thread at a time.
I’ve heard of ‘locks’, what are these?
Locks are more or less synonymous with monitors. When a thread has access to a synchronized block, i.e., it ‘owns’ the monitor for the passed-in object and is executing within the synchronized block, that resource is ‘locked’; other threads wishing to gain access must wait until that thread has finished executing the given synchronised block or method.
Notify, NotifyAll and Wait
Beyond synchronising blocks and methods, we have a few other features we can use within synchronised blocks to give us a little more control. What if we didn’t want the final iteration of a loop to be synchronised and wanted to pass the lock / monitor ownership to another thread? Well, we can. It can appear a little confusing at first, but if you think of it as simply passing the monitor lock between 2 threads whenever we choose, it makes much more sense :).
Our scenario: rather than holding the lock for a predetermined execution time (i.e., a simple synchronised block or method), we want a very specific order for the elements added to the list. Specifically, we want Thread 0 adding 3 elements, Thread 1 adding 6 elements, and finally Thread 0 adding the final element. Here is a code example of how this can be achieved with wait / notify:
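A sketch of one way to do it (names and the crude start-up sleep are my own; real code should also guard wait() with a condition loop because of spurious wakeups):

```java
import java.util.ArrayList;
import java.util.List;

public class WaitNotifyExample {
    static final List<Integer> list = new ArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        Thread t0 = new Thread(() -> {
            synchronized (list) {
                for (int i = 0; i < 3; i++) list.add(i); // first 3 elements
                try {
                    list.wait(); // release the monitor so thread 1 can run
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                list.add(9); // final element, added after being notified
            }
        });

        Thread t1 = new Thread(() -> {
            synchronized (list) {
                for (int i = 3; i < 9; i++) list.add(i); // the next 6 elements
                list.notify(); // wake thread 0 (monitor is released when the block ends)
            }
        });

        t0.start();
        Thread.sleep(100); // crude way to ensure t0 grabs the monitor first
        t1.start();
        t0.join(); t1.join();
        System.out.println(list);
    }
}
```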
I personally find wait/notify very confusing in a larger context and do often avoid it, and have done fine in my career without. So if this feels a little overwhelming, don’t worry — me too :P
You’ve likely noticed we’re calling wait / notify on the object passed into the synchronised block. What this means is:
a) Our current thread owns this object’s monitor
b) Since our thread owns this object’s monitor, it makes sense that our thread can determine the state of other threads wishing to access the same resource by calling notify / wait on this specific resource; otherwise any random waiting thread may be awoken (lol)
What if we try to lock the resource outside of a synchronized block? Well…
An IllegalMonitorStateException will occur. As the name implies, the thread communication methods only work within a synchronised block, as they need to ‘own’ the monitor.
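A tiny sketch demonstrating it:

```java
public class IllegalMonitorExample {
    public static void main(String[] args) {
        Object monitor = new Object();
        try {
            // Not inside synchronized(monitor) — we don't own the monitor
            monitor.wait();
        } catch (IllegalMonitorStateException e) {
            System.out.println("caught: " + e.getClass().getSimpleName());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```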
Now obviously, when we call obj.wait() we no longer own the lock; surely there’s a way we can verify this? Of course there is: (:P)
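Thread.holdsLock tells us whether the current thread owns a given object’s monitor:

```java
public class HoldsLockExample {
    public static void main(String[] args) {
        Object monitor = new Object();
        System.out.println("before: " + Thread.holdsLock(monitor));
        synchronized (monitor) {
            System.out.println("inside: " + Thread.holdsLock(monitor));
        }
        System.out.println("after: " + Thread.holdsLock(monitor));
    }
}
```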
Now this is all great and dandy, but be careful: if used incorrectly, we can end up in a nasty situation known as ‘deadlock’. Here is a very primitive example of a deadlock, but I’ll bet you can imagine how easily this can occur…
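A primitive deadlock sketch (my own): two threads each grab one lock and then wait forever for the other’s. The daemon flag here is exactly the safety precaution described below, letting the JVM exit anyway.

```java
public class DeadlockExample {
    static final Object lockA = new Object();
    static final Object lockB = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread t0 = new Thread(() -> {
            synchronized (lockA) {
                sleep(100);
                synchronized (lockB) { // waits forever: t1 holds lockB
                    System.out.println("t0 acquired both");
                }
            }
        });
        Thread t1 = new Thread(() -> {
            synchronized (lockB) {
                sleep(100);
                synchronized (lockA) { // waits forever: t0 holds lockA
                    System.out.println("t1 acquired both");
                }
            }
        });
        t0.setDaemon(true); t1.setDaemon(true); // so the JVM can still exit
        t0.start(); t1.start();
        Thread.sleep(500);
        System.out.println("main exits; the deadlocked daemons die with it");
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}
```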
A good safety precaution to avoid this situation is to mark all threads that may potentially have to wait as daemon threads. This means that after all ‘user’ (non-daemon) threads have finished execution, the daemons can be terminated despite not having finished their tasks. Again, it’s entirely subjective to the problem at hand, but good to remember!
In addition to having the synchronised block control the thread locks, Java 8+ has features to explicitly control locks.
Let’s say we wanted to mimic synchronised behaviour using one of these locks, let’s take a previous example of synchronised and convert it to ReentrantLock:
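A sketch of the conversion (names are mine): lock() plays the role of entering the synchronized block, and unlock() in a finally block plays the role of leaving it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class ReentrantLockExample {
    static final ReentrantLock lock = new ReentrantLock();
    static final List<Integer> list = new ArrayList<>();

    static void addToList(int from) {
        lock.lock(); // blocks until acquired, like entering a synchronized block
        try {
            for (int i = from; i < from + 3; i++) list.add(i);
        } finally {
            lock.unlock(); // ALWAYS unlock in finally, or the lock leaks
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t0 = new Thread(() -> addToList(0));
        Thread t1 = new Thread(() -> addToList(3));
        t0.start(); t1.start();
        t0.join(); t1.join();
        System.out.println("size: " + list.size());
    }
}
```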
Simple enough right? It also has some useful helper methods so we know who actually owns the lock at a given time:
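For example, ReentrantLock exposes isLocked, isHeldByCurrentThread and getHoldCount:

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockInspection {
    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();
        System.out.println("locked: " + lock.isLocked());
        System.out.println("held by me: " + lock.isHeldByCurrentThread());
        System.out.println("hold count: " + lock.getHoldCount()); // re-entrant depth
        lock.unlock();
        System.out.println("locked after unlock: " + lock.isLocked());
    }
}
```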
There’s a concept of what is fair/unfair when it comes to lock acquisition; some methods will try to acquire the lock unfairly and some fairly. I won’t go into the gritty details of locks in this article, but if wanted, I will make an article on it :). I just wanted to make us aware that it does exist, and that we aren’t limited to synchronized blocks.
For extra clarity, here are some of the other classes used for locks:
- ReentrantReadWriteLock (separate read and write locks)
- StampedLock (can also be viewed as a ReadWriteLock via asReadWriteLock())
Futures
If your first language was an asynchronous language with little threading effort (e.g., JS on NodeJS), then Futures will feel right at home for you. Remember the Runnables that may only return void? And the ScheduledExecutor that returns a Future? This is Futures: they’re essentially Runnables that are able to return a value at some point in the future. If you’re familiar with Promises from JS, they work much the same way.
I’d also suggest there’s rarely a reason to use raw Runnables over Futures, but we covered them for completeness.
So, futures in a nutshell, we have three constructs to consider:
- Callable — A functional interface with a single method, call, which returns a value. Basically Runnable’s counterpart: both represent tasks, but Callable returns a value while Runnable does not.
- Future — An interface which represents a future task, has members to determine if the task is complete, get the value, etc.
- FutureTask — A wrapper for a Callable task and an implementation of the Future interface, provides us some useful methods to manipulate our futures.
Futures — The Raw Way
Similar to how we can spin up a thread and pass it a Runnable manually (without an executor), we can do the same with Futures:
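A minimal sketch (names mine): wrap a Callable in a FutureTask, hand it to a raw Thread (FutureTask is also a Runnable), then retrieve the result.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;

public class RawFutureExample {
    public static void main(String[] args) throws Exception {
        Callable<Integer> task = () -> 21 + 21;
        FutureTask<Integer> future = new FutureTask<>(task);

        new Thread(future).start(); // FutureTask implements Runnable
        // .get() blocks the calling thread until the result is ready
        System.out.println("result: " + future.get());
    }
}
```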
As you’ve probably noticed, as long as we keep some kind of reference to our FutureTask after handing it to a thread, we can call .get() to retrieve the result. Note that .get() blocks the calling thread until the task has completed, so there’s no need to guess at sleep durations before retrieving the value.
Futures — The Right Way (Just kidding, via Executors)
So far we’ve been calling .execute on executors, which is great for Runnables but not so great for Callables. We further don’t wanna wrap every Callable up in a FutureTask manually, do we? The .submit() method will do this for us:
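A sketch of submit() handing back a Future for us:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubmitExample {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // submit() wraps our Callable in a FutureTask internally and queues it
        Future<String> future =
                executor.submit(() -> "hello from " + Thread.currentThread().getName());
        System.out.println(future.get()); // blocks until the task completes
        executor.shutdown();
    }
}
```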
Additionally, if we want to use a Runnable, we can just pass a Lambda that doesn’t return and the Executors .submit() will pick the correct overload.
```java
someExecutor.submit(() -> System.out.println("hi")); // Runnable overload
someExecutor.submit(() -> "hi");                     // Callable overload
```
Futures — Scheduled Executor Service
Earlier we mentioned that we can have tasks run on schedules, and we skipped over it in the executor section until we’d learnt Callables. The scheduled executor provides an additional method akin to .execute and .submit: .schedule. In its purest form it looks as so for Runnables and Callables:
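A sketch of both overloads (names mine):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class ScheduleExample {
    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

        // Runnable overload: returns a ScheduledFuture<?> with no value
        scheduler.schedule(() -> System.out.println("runnable ran"), 2, TimeUnit.SECONDS);

        // Callable overload: returns a ScheduledFuture<V> carrying the result
        ScheduledFuture<String> future =
                scheduler.schedule(() -> "callable result", 2, TimeUnit.SECONDS);

        System.out.println(future.get()); // blocks ~2 seconds, then prints
        scheduler.shutdown();
    }
}
```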
After 2 seconds, our scheduled task will run. That’s basically it. There are additional methods to schedule a task to run repeatedly with a fixed delay or at a fixed rate (scheduleWithFixedDelay / scheduleAtFixedRate), but I feel it’s fairly self-explanatory how they work.
Futures — .get Timeouts
Consider the example above, do we really want to call Thread.sleep(N) or some other blocking operation each time? Of course not, we can instead use the .get(time, unit) overload:
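Something like this sketch: wait a bounded amount of time, then bail out via TimeoutException.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class GetTimeoutExample {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> slow = executor.submit(() -> {
            Thread.sleep(5_000); // deliberately slow task
            return "finally done";
        });
        try {
            // Wait at most 1 second for the result, then give up
            slow.get(1, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            System.out.println("gave up waiting");
            slow.cancel(true); // optionally interrupt the still-running task
        }
        executor.shutdownNow();
    }
}
```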
Futures — Invoke/all/any
Lastly, what if we had a collection of tasks we wish to run and not manually submit them each time, say in a loop that calls .submit forEach task? Say no more:
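A sketch of invokeAll over a small collection of Callables:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InvokeAllExample {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(3);
        List<Callable<String>> tasks = List.of(
                () -> "task-1",
                () -> "task-2",
                () -> "task-3"
        );
        // Blocks until ALL tasks complete; futures come back in collection order
        List<Future<String>> results = executor.invokeAll(tasks);
        for (Future<String> f : results) {
            System.out.println(f.get());
        }
        executor.shutdown();
    }
}
```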
And what’s particularly great about this is that our output is in sequential order, meaning our tasks from invokeAll are registered sequentially. No matter the operation time, the total wait depends on the longest-running task, and invokeAll preserves the order of the presented collection in the returned futures.
Lastly, we have invokeAny. As the name implies, it will invoke the Callables in the collection but return a single result directly: whichever task completes successfully (no exceptions):
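A sketch where one task always fails and the other succeeds:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InvokeAnyExample {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        List<Callable<String>> tasks = List.of(
                () -> { throw new IllegalStateException("I always fail"); },
                () -> { Thread.sleep(100); return "the survivor"; }
        );
        // Returns the result of the first task to complete successfully;
        // failing tasks are simply ignored (unless they ALL fail)
        String winner = executor.invokeAny(tasks);
        System.out.println(winner);
        executor.shutdown();
    }
}
```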
So in a nutshell, we can think of invokeAny as invokeAnyThatDontThrowExceptions().
If you remember, before we were using executors to call .submit, which wraps our Callables in a FutureTask for us; here we manually wrapped a Callable in a FutureTask and passed it directly to a raw thread. This worked because the inheritance chain is:
FutureTask&lt;V&gt; implements RunnableFuture&lt;V&gt;, which extends both Runnable and Future&lt;V&gt;
CompletableFutures
What if we had a way to just call a single method where the executor, thread management, and thread configuration are all done for us, so all we need to worry about is the lambda task we wish to run, whether a Runnable or a Callable?
The best way to teach CompletableFutures in my opinion is purely through code, so I’ve grouped a bunch of examples together and prefixed them in block comments detailing how they work and suffixed by the output of each example:
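A grouped sketch in that spirit (scenarios and names are my own; the API calls are the real CompletableFuture methods):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletableFutureExamples {
    public static void main(String[] args) {
        /* supplyAsync: run a supplier on ForkJoinPool.commonPool()
           and transform the result with thenApply. */
        String shouted = CompletableFuture.supplyAsync(() -> "hello")
                .thenApply(String::toUpperCase)
                .join(); // like get(), but throws unchecked exceptions
        System.out.println(shouted);

        /* runAsync: the Runnable flavour, no return value. */
        CompletableFuture.runAsync(() -> System.out.println("side effect only")).join();

        /* Composing two async stages: thenCompose flattens nested futures. */
        String composed = CompletableFuture.supplyAsync(() -> 2)
                .thenCompose(n -> CompletableFuture.supplyAsync(() -> "n = " + n))
                .join();
        System.out.println(composed);

        /* exceptionally: recover from a failing stage. */
        String recovered = CompletableFuture.<String>supplyAsync(() -> {
                    throw new IllegalStateException("boom");
                })
                .exceptionally(ex -> "recovered from: " + ex.getCause().getMessage())
                .join();
        System.out.println(recovered);

        /* Supplying our own pool instead of the default commonPool. */
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CompletableFuture.supplyAsync(() -> Thread.currentThread().getName(), pool)
                .thenAccept(name -> System.out.println("ran on: " + name))
                .join();
        pool.shutdown();
    }
}
```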
And that concludes the article. If this was helpful, I’m happy I’ve helped lay out some of these concepts in layman’s terms; if not, you can reach me in my Discord server.