
Attention Probabilistic U-net

A new hybrid method to find anomalies in CT scans (on-going experiment)

Image classification is not new to us, and the same holds true for image segmentation. Both classes of algorithms have evolved and are quite successful in their respective fields. But when it comes to using these or any other algorithms in healthcare, errors must be minimal and the results must be reliable.

In our recent work we teamed up with a healthcare organisation to help them segment CT scans of the brain. The idea of segmentation is not new, and there is a lot of literature on solving segmentation problems. Much of the code is also available on GitHub, and when you have both the literature and the code, it should be quite easy to implement them and get the desired results. However, it turns out that is not always the case.

The CT scans that we received were very dark, and everything looked absolutely normal to an untrained eye. When we spoke to the radiologist, it turned out that malignant cells were present in every image that we received. The anomalies were small to medium sized in 1000 × 1000 px images, but they appeared so subtly that they blended into the neighbouring pixels, which made them hard to pick out.

Our challenge was to build a model that could:

  1. Be flexible enough to generate segmentation patches even in poor-quality CT scans

  2. Generate enough data to train the segmentation model

Our Approach

We sifted through a lot of literature and found two relevant papers, both from DeepMind:

  1. Hierarchical Probabilistic U-net

  2. Probabilistic U-net

The former is the successor of the latter. Both take the same approach: they augment the primary U-net with two auxiliary networks, a prior network and a posterior network. The Hierarchical Probabilistic U-net uses prior and posterior networks at every de-convolutional block, so it can sample latent variables at each scale and feed them into the next de-convolutional block as it upsamples. The Probabilistic U-net, by contrast, samples the latent variables only once and feeds them in at the end of the decoder.
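To make the "only once, at the end" idea concrete, here is a minimal NumPy sketch of how a single latent sample can be injected: the vector z is tiled over every spatial position of the decoder's final feature map and concatenated along the channel axis. In the Probabilistic U-net paper a small stack of 1×1 convolutions then maps this combined tensor to segmentation logits; that step is omitted here. The channels-last layout, the shapes and the function name `inject_latent` are our own illustrative assumptions, not code from either paper.

```python
import numpy as np

def inject_latent(features: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Tile a latent vector over a feature map and concatenate it channel-wise.

    features: decoder output of shape (H, W, C), channels-last (assumed layout).
    z: a single latent sample of shape (D,).
    Returns an array of shape (H, W, C + D).
    """
    h, w, _ = features.shape
    # Broadcast the same latent vector to every pixel location.
    z_map = np.broadcast_to(z, (h, w, z.shape[0]))
    return np.concatenate([features, z_map], axis=-1)

# Usage: a 4x4 feature map with 8 channels and a 6-dimensional latent.
feats = np.zeros((4, 4, 8))
z = np.arange(6, dtype=float)
out = inject_latent(feats, z)
print(out.shape)  # (4, 4, 14)
```

Roughly speaking, the hierarchical variant repeats a step like this at several decoder resolutions, with a separate latent at each scale.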

We implemented both models and are still in the process of improving them. In this article we present the Probabilistic U-net approach with the U-net replaced by an Attention U-net. Attention mechanisms, popularised by transformers, have gained so much traction that we wanted to modify the approach and present some interesting results to you.

The idea was straightforward: before every de-convolutional block of the U-net decoder, we add an attention block. As mentioned earlier, most of the images were in poor condition and the region of interest, i.e. the malignant cells, was almost buried within the surrounding pixels. An attention mechanism could help the network focus on that region of interest, which is exactly what we were looking for.

To build a robust model, we wanted to train it on a larger dataset, but that wasn't possible. The organisation was not willing to give us more data because of privacy concerns, so we had to find a way to generate more images to train our Attention U-net.

So we developed a ResNet-based CycleGAN that was able to generate images quite precisely; although the original images were of poor quality, the generated images met our expectations. We used a ResNet in order to make use of residual blocks, which can extract vital information such as features, patterns and distributions without suffering from the degradation problem that deeper neural networks usually suffer from.
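Two ideas carry this step: the residual connection inside each ResNet block, and the cycle-consistency loss that CycleGAN uses to keep translated images faithful to their source. The NumPy sketch below illustrates both on toy data. The two "generators" are hypothetical stand-ins (plain Python callables), not our trained networks, and a real CycleGAN also has adversarial losses that are not shown here.

```python
import numpy as np

def residual_block(x, transform):
    """Residual connection: output = x + transform(x).

    If `transform` outputs zeros, the block is exactly the identity, so a deep
    stack of such blocks can at least preserve its input. This is the usual
    explanation for why residual networks avoid the degradation problem.
    """
    return x + transform(x)

def cycle_consistency_loss(x, y, G, F):
    """CycleGAN's L1 cycle-consistency loss, averaged over pixels.

    G maps domain X to domain Y and F maps Y back to X. Here both are plain
    Python callables standing in for trained ResNet generators.
    """
    forward_cycle = np.mean(np.abs(F(G(x)) - x))   # x -> G(x) -> F(G(x)) should return x
    backward_cycle = np.mean(np.abs(G(F(y)) - y))  # y -> F(y) -> G(F(y)) should return y
    return float(forward_cycle + backward_cycle)

# Usage with toy "generators" on random 8x8 images.
rng = np.random.default_rng(0)
x = rng.random((8, 8))
y = rng.random((8, 8))
identity = lambda a: a
shift_up = lambda a: residual_block(a, lambda t: 0.1 * np.ones_like(t))
print(cycle_consistency_loss(x, y, identity, identity))            # 0.0
print(round(cycle_consistency_loss(x, y, shift_up, identity), 6))  # 0.2
```

A generator pair that does not invert each other (the second call) is penalised in proportion to how far the round trip drifts from the original image.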

Once we figured out the underlying problem we started to implement it.


Before proceeding further, we should mention that we are not allowed to present the original findings from the dataset that was given to us. The experiments and results presented in this article have therefore been modified enough that they do not breach anyone's privacy.

We will demonstrate the whole experiment on the LIDC lung cancer dataset, and the code will be made available shortly. Bear in mind that this is an ongoing project, so we will not present any untested or hypothetical methods.

Probabilistic Attention U-Net

This experiment mimics the steps and methods of the original Probabilistic U-net paper, except that we replaced the entire U-net with an Attention U-net. The two auxiliary networks, prior and posterior, are only slightly modified and otherwise remain the same.

During training, the input x is passed through three architectures: the attention U-net, the prior net and the posterior net. The posterior net also receives the ground truth y concatenated to x, where x is the CT scan and y is the segmentation mask of the scan.
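The difference between the two auxiliary networks lies in what they see. The sketch below uses a deliberately tiny, hypothetical stand-in for both nets (global average pooling followed by one linear layer that outputs the mean and log standard deviation of a diagonal Gaussian), loosely following the paper's idea of an encoder whose pooled output parameterises the latent distribution. The image size, `LATENT_DIM` and the random weights are illustrative assumptions; the real networks are convolutional encoders.

```python
import numpy as np

LATENT_DIM = 6  # hypothetical latent size

def toy_gaussian_encoder(inputs, weights):
    """Stand-in for the prior/posterior nets.

    Global-average-pools the (H, W, C) input, then applies a linear map that
    outputs the mean and log standard deviation of a diagonal Gaussian.
    """
    pooled = inputs.mean(axis=(0, 1))  # (C,)
    params = pooled @ weights          # (2 * LATENT_DIM,)
    mu, log_sigma = params[:LATENT_DIM], params[LATENT_DIM:]
    return mu, log_sigma

rng = np.random.default_rng(0)
x = rng.random((64, 64, 1))                        # CT slice (hypothetical size)
y = (rng.random((64, 64, 1)) > 0.9).astype(float)  # binary ground-truth mask

w_prior = rng.normal(size=(1, 2 * LATENT_DIM))  # prior sees 1 channel: x only
w_post = rng.normal(size=(2, 2 * LATENT_DIM))   # posterior sees 2 channels: x and y

prior_mu, prior_log_sigma = toy_gaussian_encoder(x, w_prior)
post_mu, post_log_sigma = toy_gaussian_encoder(np.concatenate([x, y], axis=-1), w_post)
print(prior_mu.shape, post_mu.shape)  # (6,) (6,)
```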

The three networks are trained jointly so that their outputs improve one another. The final feature map of the U-net is concatenated with a latent sample from the posterior net to yield a segmentation output y', which is compared with the ground truth y using binary cross-entropy loss.
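For reference, the pixel-wise binary cross-entropy between y' and y can be written as below. This is a minimal NumPy sketch that assumes y' already holds probabilities; if the network outputs logits, a numerically stable with-logits variant (for example PyTorch's `BCEWithLogitsLoss`) is the safer choice.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean pixel-wise BCE between a binary mask and predicted probabilities."""
    p = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

y = np.array([[0.0, 1.0], [1.0, 0.0]])
y_pred = np.array([[0.1, 0.9], [0.8, 0.2]])
print(round(binary_cross_entropy(y, y_pred), 4))  # 0.1643
```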

It is important to understand that we sample a latent from the posterior and then feed it into the U-net. Each latent sample tries to capture the anomalies present in the concatenated image and mask. Because every draw from the posterior varies slightly, the attention U-net is shown a range of plausible variations, which gives it more precise options for attending to the anomalies.
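Drawing such a latent is usually done with the reparameterisation trick, as is standard for this kind of variational model: z = mu + sigma * eps with eps drawn from a standard normal. Every draw differs, yet all draws come from the same distribution for a given input, and z stays differentiable with respect to mu and sigma. The NumPy sketch below is illustrative; the latent size and sigma value are assumptions.

```python
import numpy as np

def sample_latent(mu, log_sigma, rng):
    """Draw z ~ N(mu, diag(sigma^2)) with the reparameterisation trick.

    Writing z = mu + sigma * eps with eps ~ N(0, I) keeps z differentiable
    with respect to mu and log_sigma, so gradients can flow back into the
    posterior (or prior) network during training.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps

rng = np.random.default_rng(42)
mu = np.zeros(6)
log_sigma = np.log(np.full(6, 0.5))
samples = np.stack([sample_latent(mu, log_sigma, rng) for _ in range(5000)])
print(samples.mean(axis=0).round(2), samples.std(axis=0).round(2))
```

The empirical mean and standard deviation of the samples should land close to 0 and 0.5 respectively.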

Similarly, we use the KL divergence to pull the distributions produced by the prior and posterior networks close together. Since the prior sees only x while the posterior also sees y, this step trains the prior to produce latents that are consistent with the ground truth, even though y is not available at inference time.
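When both networks output diagonal Gaussians, the KL divergence from the posterior Q to the prior P has a closed form, summed over latent dimensions: log(sigma_p / sigma_q) + (sigma_q^2 + (mu_q - mu_p)^2) / (2 sigma_p^2) - 1/2. The sketch below implements that formula and combines it with the reconstruction loss into a beta-weighted objective, in the style of the Probabilistic U-net. The value of `beta` is a placeholder hyperparameter, not the value we used.

```python
import numpy as np

def kl_diag_gaussians(mu_q, log_sigma_q, mu_p, log_sigma_p):
    """KL(Q || P) for diagonal Gaussians, summed over latent dimensions.

    Q is the posterior N(mu_q, sigma_q^2) and P is the prior N(mu_p, sigma_p^2).
    """
    var_q = np.exp(2.0 * log_sigma_q)
    var_p = np.exp(2.0 * log_sigma_p)
    return float(np.sum(
        log_sigma_p - log_sigma_q
        + (var_q + (mu_q - mu_p) ** 2) / (2.0 * var_p)
        - 0.5
    ))

def total_loss(bce, kl, beta=1.0):
    """Reconstruction loss plus a beta-weighted KL term (beta is a placeholder)."""
    return bce + beta * kl

zeros = np.zeros(3)
print(kl_diag_gaussians(zeros, zeros, zeros, zeros))       # 0.0
print(kl_diag_gaussians(np.ones(3), zeros, zeros, zeros))  # 1.5
```

The KL term is zero only when the two distributions coincide, so minimising it drags the prior towards whatever the posterior has learned from the ground truth.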

During inference we use the prior to sample latent variables and generate n segmentation masks for every input image, where n is the number of samples we want from the prior net.
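A minimal sketch of this inference loop follows. The expensive U-net features are computed once per image, and only the latent draw and the fusion step repeat for each of the n samples. The function `toy_combine` is a hypothetical stand-in for the layers that fuse features with a latent, and all shapes are illustrative assumptions.

```python
import numpy as np

def predict_masks(features, prior_mu, prior_log_sigma, combine, n, rng):
    """Generate n candidate segmentations for one image.

    features: decoder feature map for the image, shape (H, W, C), computed once.
    combine: hypothetical stand-in for the layers that fuse features with a
             latent sample and output per-pixel probabilities.
    """
    masks = []
    for _ in range(n):
        # Fresh latent from the prior for every sample.
        z = prior_mu + np.exp(prior_log_sigma) * rng.standard_normal(prior_mu.shape)
        masks.append(combine(features, z))
    return np.stack(masks)  # (n, H, W)

def toy_combine(features, z):
    # Hypothetical fusion: add a latent-dependent bias to the mean feature, then a sigmoid.
    logits = features.mean(axis=-1) + z.sum()
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
feats = rng.normal(size=(32, 32, 8))
masks = predict_masks(feats, np.zeros(6), np.zeros(6), toy_combine, n=4, rng=rng)
print(masks.shape)  # (4, 32, 32)
```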

The first image on the left above demonstrates the training process, where feedback from the posterior net is used to improve the prior. The second image on the left demonstrates the inference process, where the prior is used to sample the latent and the attention U-net uses that sample to generate the segmentation.

Attention U-Net

As mentioned earlier, we used Attention U-net to generate segmentations. The network consisted of three important segments:

  1. Convolutional Block for downsampling

  2. De-convolutional Block for upsampling

  3. Attention Mechanism to address the area of interest

The idea was to downsample with the convolutional blocks into a bottleneck and then, during upsampling, add an attention block before each de-convolutional block, so that the network could focus on the anomalies and produce better results using the latent sample from either the prior or the posterior.
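In the Attention U-Net paper, these attention blocks are additive attention gates applied to the skip connections: encoder features x and a gating signal g from the coarser decoder level are projected, summed, passed through a ReLU, then reduced to one sigmoid coefficient per pixel that rescales x. The NumPy sketch below shows that computation with 1×1 convolutions written as per-pixel matrix products. It simplifies the original by assuming g is already resampled to x's resolution and by omitting biases; all shapes and weights are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def attention_gate(x, g, w_x, w_g, psi):
    """Additive attention gate in the style of Attention U-Net.

    x:   skip-connection features from the encoder, shape (H, W, Cx)
    g:   gating features from the coarser decoder level, assumed already
         resampled to shape (H, W, Cg)
    w_x: (Cx, Ci), w_g: (Cg, Ci), psi: (Ci,); per-pixel linear maps,
         i.e. 1x1 convolutions (biases omitted for brevity)
    Returns the gated features (H, W, Cx) and the attention map (H, W).
    """
    inter = np.maximum(0.0, x @ w_x + g @ w_g)  # ReLU(W_x x + W_g g)
    alpha = sigmoid(inter @ psi)                # one coefficient in [0, 1] per pixel
    return x * alpha[..., None], alpha

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 16, 32))
g = rng.normal(size=(16, 16, 64))
gated, alpha = attention_gate(
    x, g, rng.normal(size=(32, 8)), rng.normal(size=(64, 8)), rng.normal(size=8)
)
print(gated.shape, alpha.shape)  # (16, 16, 32) (16, 16)
```

Because alpha never exceeds 1, the gate can only suppress features; pixels the network considers irrelevant are damped before they reach the de-convolutional block.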

Source: Attention U-Net


We first show results from two epochs and then go into the details. The images are in the following format:

  • The first row will be the CT scans of the lungs

  • The second row will be the ground truth

  • The third row will be the generated samples

It is worth noting that each column in the last row was generated from a different latent sample drawn from the prior.

The image above was captured at the 6th epoch. As you can see, the reconstructed segmentation already closely resembles the ground truth.

The image above was taken at the 9th epoch. This generated segmentation shows that the model was able to capture even the smallest anomaly, and the grade of the segmentation differs for each latent sample.

The model was trained for 15 epochs and we saw some good results, especially where the grades of the generated segmentations differed gradually. At the same time, it also captured anomalies that were not present in the ground truth.

We aren't sure why this happened, but we hypothesise that the attention mechanism can capture complex anomalies when the prior produces good representations. This could also be seen as a precautionary measure.

During the inference we saw some interesting results.

The image above shows that the model learned well during training: it generates segmentations much more flexibly and produces a diverse set of them.

Again, you can see the diversity of the generated segmentations. Interestingly, this suggests that the attention mechanism was a good choice for our experiment.

Our proposal for the future

We were very glad that our experiments turned out this well. We will upload the code for this experiment, along with the dataset, to our GitHub account on 4th June 2021, so please check it out.

While this experiment is ongoing, we would like to highlight the significance of such algorithms in helping radiologists with their day-to-day work. Algorithms like these aim to make doctors' work less stressful and more productive, while allowing them to handle large numbers of patients with ease. They can also help doctors learn more about anomalies that the naked eye sometimes fails to capture.

As human life moves into an unknown space of uncertainty that brings trouble and suffering, we would like to bridge the gap between human intelligence and machine intelligence so that we can tackle problems head on and strive towards a better life.


References

  1. Attention U-Net: Learning Where to Look for the Pancreas (2018).

  2. A Hierarchical Probabilistic U-Net for Modeling Multi-Scale Ambiguities (2019).

  3. A Probabilistic U-Net for Segmentation of Ambiguous Images (2018).
