
Generative Methods in Medical Applications

Exploring Generative Adversarial Networks


  1. Background and Working

  2. Challenges in optimizing GANs

  3. Types of GANs

  4. Applications in medical imaging

  • Reconstruction

  • Medical image synthesis

  • Segmentation

  • Classification

  • Detection

  5. Conclusion

Background and Working


A generative adversarial network (GAN) is a class of machine learning frameworks introduced by Ian J. Goodfellow and colleagues (Goodfellow et al., 2014). It is one of the more recently developed approaches to ‘generative modeling’ and uses a flexible unsupervised deep learning architecture (Raza & Singh, 2018). Generative modeling is an unsupervised learning task in which a model learns the patterns in the input data so that it can then be used to generate new examples (output). The ability of a GAN to generate new samples arises from the fact that it represents a probability distribution over multiple variables in the data.


The vanilla GAN (Goodfellow et al., 2014) was designed to draw samples from the desired data distribution without the need to explicitly model the underlying probability density function. It consists of two neural networks, the generator G and the discriminator D, which compete against each other.

Essentially, the generative network learns a mapping from a latent space to the data distribution of interest, while the discriminative network distinguishes samples produced by the generator from samples of the true data distribution. The generative network's training objective is to increase the error rate of the discriminative network, i.e., to "fool" the discriminator by producing novel candidates that it judges to be real (part of the true data distribution) rather than synthesized.

The input to G, z, is pure random noise sampled from a prior distribution p(z), which for simplicity is generally a Gaussian or a uniform distribution. The output of G, xg, is expected to have visual similarity with the real sample xr that is drawn from the real data distribution pr(x). We denote the non-linear mapping function learned by G, parametrized by 𝜽g, as xg = G(z; 𝜽g). The input to D is either a real or a generated sample. The output of D, y1, is a single value indicating the probability that the input is a real rather than a generated sample. The mapping learned by D, parametrized by 𝜽d, is denoted as y1 = D(x; 𝜽d). The generated samples form a distribution pg(x), which is desired to be an approximation of pr(x) after successful training.
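
The two mappings described above can be sketched in a few lines of Python. This is only a toy illustration: the affine generator, the logistic discriminator, and the parameter values are assumptions made for this example, whereas real GANs use deep neural networks for both G and D.

```python
import math
import random


def generator(z, theta_g):
    """Toy stand-in for x_g = G(z; theta_g): an affine map of the noise vector."""
    weight, bias = theta_g
    return [weight * z_i + bias for z_i in z]


def discriminator(x, theta_d):
    """Toy stand-in for y1 = D(x; theta_d): logistic regression on the sample mean."""
    weight, bias = theta_d
    logit = weight * (sum(x) / len(x)) + bias
    return 1.0 / (1.0 + math.exp(-logit))  # probability that x is real


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(4)]  # z ~ p(z), here a standard Gaussian
theta_g = (0.5, 2.0)   # hypothetical generator parameters
theta_d = (1.0, -2.0)  # hypothetical discriminator parameters

x_g = generator(z, theta_g)        # generated sample
y1 = discriminator(x_g, theta_d)   # D's estimate that x_g is real
print(x_g, y1)
```

During training, 𝜽g and 𝜽d would be updated in alternation; the sketch only shows a single forward pass.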


Challenges in optimizing GANs

Goodfellow et al. (2014) already recognized that GANs are difficult to train. Since two neural networks compete against each other, each one improves at the expense of the other.

The GAN training objective is regarded as a saddle point optimization problem (Yadav et al., 2018) and the training is often accomplished by gradient-based optimization methods. The gradient-based optimization methods are the most popular choice for finding local optima for classical minimization and saddle point problems.
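
For reference, the saddle point objective of the vanilla GAN (Goodfellow et al., 2014) is usually written as below. The symbols G, D, z, p(z), pr(x), 𝜽g and 𝜽d follow the notation used earlier in this post; the value function V and the expectation notation are not defined in the post and are assumed from the common presentation.

```latex
\min_{\theta_g} \max_{\theta_d} V(D, G) =
  \mathbb{E}_{x \sim p_r(x)} \big[ \log D(x; \theta_d) \big]
  + \mathbb{E}_{z \sim p(z)} \big[ \log \big( 1 - D(G(z; \theta_g); \theta_d) \big) \big]
```

D is trained to maximize V (assign high probability to real samples and low probability to generated ones), while G is trained to minimize it.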

The generator and the discriminator are trained simultaneously so that they may evolve together. In practice, one network often becomes more powerful than the other, and in most cases it is D. When D becomes too strong relative to G, the generated samples are too easy to separate from real ones, and training reaches a stage where the gradients from the discriminator approach zero, providing no guidance for further training of the generator.
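
The vanishing gradient described above can be checked numerically. The sketch below assumes that D ends in a sigmoid applied to a logit `a` (an assumption for illustration) and compares the gradient of the minimax generator loss log(1 - D(G(z))) with the non-saturating alternative -log D(G(z)) that Goodfellow et al. (2014) suggest for this situation.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def grad_saturating(a):
    """d/da of log(1 - sigmoid(a)), the generator loss in the minimax objective."""
    return -sigmoid(a)


def grad_non_saturating(a):
    """d/da of -log(sigmoid(a)), the non-saturating generator loss."""
    return -(1.0 - sigmoid(a))


# a is the discriminator's logit on a generated sample.
# A strong D confidently rejects fakes, so a is very negative.
for a in (0.0, -5.0, -10.0):
    print(a, grad_saturating(a), grad_non_saturating(a))
```

As `a` becomes more negative, the first gradient shrinks toward zero while the second stays close to -1, which is why the non-saturating form is a common practical choice.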

Another problem commonly faced in training GANs is mode collapse, where the generator produces only a limited variety of outputs. This means that when the network is trained directly on multi-modal data, the generator learns to fool the discriminator by generating samples from only a few of the modes.

Types of GANs

Different types of GANs have been developed to address these limitations.

To stabilize training and to avoid mode collapse, different losses for D have been proposed, such as:

  1. f-divergence (f-GAN) (Nowozin et al., 2016)

  2. least-square (LSGAN) (Mao et al., 2017)

  3. hinge loss (Miyato et al., 2018)

  4. Wasserstein distance (WGAN, WGAN-GP) (Arjovsky et al., 2017; Gulrajani et al., 2017).

Among these, Wasserstein distance stands out as the most popular metric.
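
To make the differences concrete, here is a minimal sketch of how some of these discriminator losses are commonly written, operating on lists of D's outputs for real and generated samples. The function names and the target values (1 for real, 0 for fake in the least-squares case) are assumptions for illustration, the f-divergence family is left out, and the Lipschitz constraint that WGAN needs (weight clipping or a gradient penalty) is omitted.

```python
import math


def mean(values):
    return sum(values) / len(values)


def d_loss_vanilla(d_real, d_fake):
    """Cross-entropy loss of the vanilla GAN; inputs are probabilities in (0, 1)."""
    return -mean([math.log(p) for p in d_real]) - mean([math.log(1.0 - p) for p in d_fake])


def d_loss_least_squares(d_real, d_fake):
    """LSGAN-style loss with target 1 for real and 0 for generated samples."""
    return 0.5 * mean([(s - 1.0) ** 2 for s in d_real]) + 0.5 * mean([s ** 2 for s in d_fake])


def d_loss_hinge(d_real, d_fake):
    """Hinge loss on unbounded scores."""
    return mean([max(0.0, 1.0 - s) for s in d_real]) + mean([max(0.0, 1.0 + s) for s in d_fake])


def d_loss_wasserstein(d_real, d_fake):
    """WGAN critic loss on unbounded scores (Lipschitz constraint not enforced here)."""
    return mean(d_fake) - mean(d_real)


real_scores = [2.0, 1.0]    # made-up critic outputs on real samples
fake_scores = [-1.0, 0.0]   # made-up critic outputs on generated samples
print(d_loss_least_squares(real_scores, fake_scores))
print(d_loss_hinge(real_scores, fake_scores))
print(d_loss_wasserstein(real_scores, fake_scores))
```

Note that the vanilla loss expects probabilities, while the other three work on raw, unbounded scores.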

Another line of work uses autoencoders. In EBGAN (energy-based GAN), the discriminator network is replaced by an autoencoder. D’s objective then becomes matching the distribution of the autoencoder loss rather than the data distribution.

Conditional GANs (cGANs) take label information as an additional input, which tends to result in better quality images and makes it possible to control how the generated images will look. In other words, cGANs learn to produce better images by exploiting the extra information fed to the model.
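
One common way to feed label information to the generator is to concatenate a one-hot encoded label with the noise vector. The sketch below shows only this input construction; the helper names and the two-class labeling (0 = healthy, 1 = lesion) are hypothetical and chosen purely for illustration.

```python
import random


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def conditional_generator_input(z, label, num_classes):
    """Concatenate noise z with a one-hot label so that G can condition on it."""
    return z + one_hot(label, num_classes)


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(3)]  # noise drawn from p(z)

# Hypothetical labels: 0 = healthy, 1 = lesion
g_input = conditional_generator_input(z, label=1, num_classes=2)
print(g_input)
```

The discriminator is typically given the same label alongside the image, so that it judges whether the image is realistic for that particular class.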

InfoGAN is able to learn disentangled representations and perform conditional data generation based on these attributes. InfoGAN is useful when the dataset is complex and unlabeled but you would still like cGAN-style control over the generated output.

Source: Generative adversarial network in medical imaging: A review (Xin Yi, Ekta Walia, Paul Babyn)

Applications in medical imaging

Medical imaging offers doctors a wide range of information about the patient. The images have to be clear and clean for doctors to understand the problem and prescribe the proper treatment. In this section, we will discuss various applications of generative models in medical imaging.


Reconstruction

Because of constraints in clinical settings, such as radiation dose, the diagnostic quality of acquired medical images may be limited by noise and artifacts. Reconstruction methods have evolved from analytic to iterative and, more recently, to machine learning and deep learning-based methods. These data-driven methods learn to extract important representations and features from the acquired image and reconstruct from them a higher-quality image in which anomalies are more likely to be distinguishable. Framed this way, reconstruction is a form of image-to-image translation.


Medical image synthesis

Medical imaging rarely relies on a single image; it draws on different modalities, each of which provides a different perspective for diagnosing the patient. Each modality, however, comes with its own constraints. For instance, computed tomography (CT) has the advantage of providing the electron density and physical density of the tissue, but it offers poor contrast for soft tissue. In addition, its radiation exposure may add a risk of secondary cancer for younger patients. Magnetic resonance imaging (MRI) is safer and gives much better soft-tissue contrast than CT, but it lacks the density information that is required for therapy planning.


Generative adversarial networks (GANs) have achieved state-of-the-art results in recent years and are able to produce images that look almost real. We can feed the network images of one modality and generate images of another that are nearly as descriptive as real ones. This could enable doctors to synthesize MRI-like images from CT scans, which is attractive because MRI scans are expensive. For the same reason, the method can be applied to synthesize a 7T-like MRI scan from a 3T MRI scan.


Methods like this could be used in rural areas where people cannot afford to spend much money on diagnosis.


Segmentation

Segmentation generally means applying a pixel-wise mask to the region of interest. It usually falls under supervised learning, which requires labeled data.


Segmentation is very helpful for highlighting anomalies found in diagnostic images, but labeled segmentation data is scarce and expensive to produce.

GANs, on the other hand, can be extremely useful for generating synthetic segmented data at low cost. Such data can help doctors study and become aware of new types of anomalies, and it can also help in training deep learning models on a larger amount of data.


Classification

Classification is one of the most widely used tasks in deep learning. In medicine, however, data is scarce, mostly because medical data is very personal and patients do not want their identity to be revealed. As mentioned earlier, GANs can be used to produce synthetic data. This approach serves three purposes:

  1. It generates data that differs from the real images yet remains realistic, thus hiding the patient’s information.

  2. It can produce a lot of data for training and validating classifiers, so deep learning models do not have to wait for real data to be published.

  3. It can produce a wide variety of generated images, which can make the classification of diseases more robust and reliable.


Detection

The discriminator of a GAN can be used to detect anomalies such as malignant cells by learning the probability distribution of training images depicting normal pathology. This essentially means that images that fall outside of this probability distribution can be considered anomalies.
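
As a rough sketch of this idea, suppose a trained discriminator assigns each image a score for how well it fits the distribution of normal images. One simple and purely illustrative rule is to flag images whose score falls below a threshold fitted on normal validation images. The scores, the 5% quantile, and the function names below are made up for this example and do not come from any published method.

```python
def fit_threshold(normal_scores, quantile=0.05):
    """Return the score below which roughly `quantile` of normal images fall."""
    ordered = sorted(normal_scores)
    index = int(quantile * (len(ordered) - 1))
    return ordered[index]


def is_anomaly(score, threshold):
    """Flag an image whose discriminator score is lower than the threshold."""
    return score < threshold


# Made-up discriminator scores for ten normal validation images
normal_scores = [0.91, 0.88, 0.95, 0.79, 0.93, 0.85, 0.90, 0.97, 0.82, 0.89]
threshold = fit_threshold(normal_scores)

print(threshold)
print(is_anomaly(0.30, threshold))  # low score: far from the normal distribution
print(is_anomaly(0.92, threshold))  # high score: consistent with normal images
```

In practice the threshold would have to be tuned on held-out data, since it trades missed anomalies against false alarms.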

This in turn can help doctors across all fields spend more time studying the patterns that are produced by the generator and rejected by the discriminator. Some patterns go unnoticed by the human eye but may stand out to an algorithm simply because their distribution is different. Doctors can then pay attention to such patterns and may be able to draw a conclusion from them.


Conclusion

GANs are a state-of-the-art technology that is being used effectively in some medical facilities across the globe. To summarize what we saw in this post:

  1. GANs are cost-effective and save a lot of time in building a good classifier by producing good data.

  2. GANs can open new doors for medical research and speed up treatment.

  3. GANs can produce high-definition images.

  4. GANs can be used to translate CT scans into MRI-like images.
