Exploring CNN for Lung Tumor Segmentation: U-Net and Res U-Net

The main objective of this project was to address the challenging task of accurately segmenting tumors in lung CT images, as posed by the Medical Segmentation Decathlon challenge. The project was carried out by a group of three, including me, as part of my image processing coursework. This post covers the key parts of the final report and explains how the U-Net and Res U-Net architectures were used for lung tumor segmentation.

How does Tumor segmentation help?

Lung cancer is the deadliest cancer for both sexes, with a death toll reported to be nearly equal to that of breast and colon cancer combined [1]. For treatment planning and dose estimation, radiotherapy relies on medical imaging, particularly computed tomography (CT), to accurately localize tumors and determine electron densities. Precisely delineating the tumor and the organs at risk is crucial, since mistakes could result in over- or under-irradiation of the tumor and/or the surrounding healthy tissue. Automated segmentation of lung tumors from CT scans can help physicians diagnose patients early and track the course of the disease [2,3]. Radiomics is another discipline that can benefit directly from lesion recognition and delineation, as ROI segmentation is one of the most time-consuming and difficult phases of the radiomics workflow [4,5].

Data Description:

The data came from the Medical Segmentation Decathlon dataset, which is available online under a permissive license (CC-BY-SA 4.0). It includes pre-operative thin-section 3D CT scans of 96 lung cancer patients, obtained from The Cancer Imaging Archive (TCIA), with the lung tumors as the target ROI. For the challenge, the images taken from the archive were reformatted into the Neuroimaging Informatics Technology Initiative (NIfTI) format: they were transposed to the closest possible right-anterior-superior coordinate frame and then converted to the NIfTI radiological standard so that they could be used easily.

Segmentation Network Architecture:

The U-Net architecture comprises two pathways: the contracting path (encoder/analysis) and the expansion path (decoder/synthesis), as illustrated in Figure 1. The contracting path resembles a standard convolutional network and extracts the features used for classification. The expansion path uses up-convolutions and concatenates features from the contracting path, which lets the network localize those classification features. It also increases the output resolution, so that the final convolutional layer can produce a fully segmented image. Because the two paths are almost symmetrical, the network has the U shape that gives it its name (Fig. 1) [6].
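The two pathways can be sketched in a few lines of PyTorch. This is a deliberately tiny, one-level illustration, not the network used in the project: the class name `TinyUNet`, the helper `double_conv`, and the channel width of 16 are all assumptions made for the example.

```python
import torch
import torch.nn as nn


def double_conv(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU; padding keeps the spatial size."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
    )


class TinyUNet(nn.Module):
    """One contracting level, a bridge, and one expansion level."""

    def __init__(self, in_ch=1, base=16):
        super().__init__()
        self.enc = double_conv(in_ch, base)          # contracting path
        self.pool = nn.MaxPool2d(2)                  # halves height and width
        self.bridge = double_conv(base, base * 2)
        # Up-convolution: doubles height and width, halves the channels.
        self.up = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec = double_conv(base * 2, base)       # expansion path
        self.head = nn.Conv2d(base, 1, kernel_size=1)  # one logit per pixel

    def forward(self, x):
        e = self.enc(x)
        b = self.bridge(self.pool(e))
        u = self.up(b)
        # Concatenate encoder features with the up-sampled decoder features.
        d = self.dec(torch.cat([u, e], dim=1))
        return self.head(d)


net = TinyUNet()
out = net(torch.randn(2, 1, 64, 64))
```

The output has the same height and width as the input, which is what allows it to be read as a per-pixel mask.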

The following 4 architectures were implemented in this project:

1. Res U-Net:

Res U-Net is a variation of the U-Net architecture that borrows from the ResNet architecture [7] and has proved well suited to medical image segmentation tasks. In each block of the network, the input of the first convolutional layer is added to the output of the second convolutional layer via a skip connection. The skip connection is therefore applied before the down- or up-sampling step in the respective pathway of the U-Net. The unit design of the ResNet blocks is shown in Fig. 2.

Fig. 2: ResNet blocks with skip connection

The Res U-Net model used here comprises 4 down-convolution layers, 1 bridging layer, and 4 up-convolution layers. Each layer contains a residual block consisting of two 3x3 convolution layers, followed by batch normalization and a Rectified Linear Unit (ReLU) activation function. The block also includes a skip connection that applies a 1x1 convolution with a stride of 1. The residual block doubles the number of channels, while max-pooling between down-convolution layers reduces the spatial dimensions. The up-convolution process starts with up-sampling followed by a transpose convolution. The complete architecture is shown in Fig. 3.
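A residual block along these lines could look like the sketch below. The exact ordering of convolution, batch normalization, and ReLU is not spelled out above, so the ordering here is an assumption, as are the class name `ResidualBlock` and the example channel counts.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions plus a 1x1-convolution skip connection."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Main branch: conv -> BN -> ReLU -> conv -> BN (assumed ordering).
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # Skip branch: the 1x1 convolution (stride 1) matches the channel
        # count so that the two branches can be added element-wise.
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))


block = ResidualBlock(32, 64)          # channels are doubled
pool = nn.MaxPool2d(2)                 # applied between down-convolution layers

features = block(torch.randn(2, 32, 32, 32))
pooled = pool(features)
```

The 1x1 convolution on the skip branch is what makes the addition possible when the block changes the number of channels; with equal channel counts an identity skip would also work.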

2. Res U-Net with fewer layers:

While the previous architecture involved tuning over 75 million parameters, it was streamlined significantly by reducing the number of layers. The revised architecture consisted of 2 down-convolution layers, 1 bridging layer, and 2 up-convolution layers, a substantial reduction in complexity. This left only 2.2 million parameters to tune, while the rest of the structure remained identical to the one described in the previous section.
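Parameter counts like these can be checked directly in PyTorch. The helper below, `count_parameters`, is a hypothetical name, and the channel widths are illustrative rather than the ones used in the project; the point is only that the deepest, widest layers account for most of the parameters, which is why removing levels shrinks the model so sharply.

```python
import torch.nn as nn


def count_parameters(model):
    """Number of trainable parameters in a module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# A 3x3 convolution has in_ch * out_ch * 9 weights plus out_ch biases.
shallow_conv = nn.Conv2d(64, 128, kernel_size=3)     # early, narrow layer
deep_conv = nn.Conv2d(512, 1024, kernel_size=3)      # late, wide layer

shallow_params = count_parameters(shallow_conv)
deep_params = count_parameters(deep_conv)
ratio = deep_params / shallow_params
```

Because the channel count doubles at every level, each extra level multiplies the cost of a convolution by roughly four, so the last levels dominate the total.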

3. Modified Res U-Net:

The variant presented here introduces three significant modifications to the previously described architecture. First, the network is one level deeper, with 3 down-convolution layers, 1 bridging layer, and 3 up-convolution layers. Second, we modeled the skip connection in a slightly different way by adding a preparation layer, whose main purpose is to increase the number of channels before the skip-connection addition. This preparation layer consists of two convolution layers, each with a 3x3 kernel, followed by batch normalization and a Rectified Linear Unit (ReLU) activation function. The skip connection adds the output of the preparation layer to the output of the two successive convolution layers. The final major modification is the absence of the transpose convolution layer in the up-convolution segment: only an up-sampling operation is performed between the layers in the latter half. This variation is illustrated in more detail in Fig. 5.
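One way to read this description is sketched below. It assumes that the two successive convolution layers take the output of the preparation layer as their input, and that the up-sampling uses PyTorch's default nearest-neighbor mode; neither detail is stated above, and the names `PrepResidualBlock` and `conv_bn_relu` are hypothetical.

```python
import torch
import torch.nn as nn


def conv_bn_relu(in_ch, out_ch):
    """3x3 convolution followed by batch normalization and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
    )


class PrepResidualBlock(nn.Module):
    """Preparation layer widens the channels; its output is the skip branch."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Preparation layer: two 3x3 convolutions that raise the channel count.
        self.prep = nn.Sequential(
            conv_bn_relu(in_ch, out_ch), conv_bn_relu(out_ch, out_ch)
        )
        # The two successive convolution layers (assumed to follow the prep).
        self.convs = nn.Sequential(
            conv_bn_relu(out_ch, out_ch), conv_bn_relu(out_ch, out_ch)
        )

    def forward(self, x):
        p = self.prep(x)
        # Skip connection: prep output added to the conv output.
        return p + self.convs(p)


# Expansion path: plain up-sampling, no transpose convolution, no weights.
upsample = nn.Upsample(scale_factor=2)

block = PrepResidualBlock(32, 64)
y = block(torch.randn(2, 32, 16, 16))
z = upsample(y)
upsample_params = sum(p.numel() for p in upsample.parameters())
```

Since `nn.Upsample` has no learnable weights, replacing the transpose convolution with it removes parameters from every level of the expansion path.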

4. Modified U-Net architecture:

This variation modifies the previous architecture further by removing the residual connections, so each layer consists of two convolution layers with the same 3x3 kernel. As in the previous version, no transpose convolution is performed in the expansion path. The number of contraction and expansion levels is the same as in the previous variation.

For each scenario, multiple images from the dataset were used, and the predicted masks generated by the network were compared with the ground truth masks. The networks were trained on Nvidia A100 and V100 GPUs in the Google Colab environment. They were implemented in PyTorch and trained with the Adam optimizer at a fixed learning rate of 1 × 10^-4 and a batch size of 8.
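The training setup can be sketched as follows. Only the optimizer, learning rate, and batch size come from the description above; the random tensors, the two-layer stand-in model, and the choice of `BCEWithLogitsLoss` (rather than a sigmoid followed by `BCELoss`) are assumptions made so the example runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Random stand-ins for CT slices and binary tumor masks.
images = torch.randn(16, 1, 32, 32)
masks = (torch.rand(16, 1, 32, 32) > 0.9).float()
loader = DataLoader(TensorDataset(images, masks), batch_size=8, shuffle=True)

# Placeholder model; any of the segmentation networks would go here.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=1),
)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

criterion = nn.BCEWithLogitsLoss()  # binary cross-entropy on raw logits
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # fixed rate


def train_one_epoch(model, loader):
    """Run one pass over the data and return the mean per-sample loss."""
    model.train()
    total = 0.0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        total += loss.item() * x.size(0)
    return total / len(loader.dataset)


epoch_loss = train_one_epoch(model, loader)
```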

Results:

1. Res U-Net:

This network, with over 35 million parameters, was trained for 30 epochs, and the results are shown in Figure 6. The predicted masks were unsatisfactory, and the loss had not converged even after 30 epochs of training. This finding highlighted the difficulty of achieving good performance for lung tumor segmentation with the Res U-Net architecture considered.

Fig. 6: Original Res U-Net Architecture Implementation, 30 epochs

2. Res U-Net with fewer layers:

The poor results were attributed to the depth of the network and the large number of tunable parameters, so the same architecture was run with fewer layers in the hope of faster convergence. Training for over 20 epochs led to a BCE loss of 0.6931. The result is shown in Figure 7.
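A BCE value of 0.6931 is worth a second look: it equals ln 2, which is exactly what binary cross-entropy gives when the predicted probability is 0.5 for every pixel. One possible reading (an interpretation, not something measured in the project) is that the network's output had collapsed to an uninformative constant. The short check below uses a hypothetical helper `bce` to show the arithmetic.

```python
import math


def bce(p, y):
    """Binary cross-entropy for one pixel: prediction p, label y (0 or 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# With p = 0.5 the loss is the same for tumor and background pixels.
loss_tumor = bce(0.5, 1)
loss_background = bce(0.5, 0)
chance_level = round(loss_tumor, 4)
```

Any loss near this value means the model is doing no better than an uninformed guess on each pixel, regardless of how many epochs it has been trained.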

Fig. 7: Res U-Net with less depth, 10 epochs

3. Modified Res U-Net:

The modified Res U-Net architecture (with a preparation layer and without the 2D transpose convolution) was trained for only 3 epochs, and the results were surprisingly better, as Figures 8, 9, and 11 show. The idea behind the modification is that reduced model complexity lets the network capture more general, essential features relevant to lung tumor segmentation, which would explain the improvement after only 3 epochs. Even though the model was trained for a relatively small number of epochs, the resulting BCE loss, shown in Table 1, is close to the BCE loss achieved with the modified U-Net architecture, which is described in the following subsection.

The predicted masks in this scenario resembled the ground truth masks much more closely. As expected, much better segmentation results were obtained, as can be seen in Figure 9. This particular case had a BCE loss of 0.028, which is lower than the one attained with the modified U-Net architecture for the same number of epochs. This Res U-Net architecture outperformed the modified U-Net architecture and produced a promising result, with small prediction errors relative to the ground truth labels.

Fig. 8: Proposed Res U-Net Architecture Implementation, 3 epochs

Fig. 9: Proposed Res U-Net Architecture Implementation, 10 epochs

Fig. 10: U-Net Architecture Implementation, 10 epochs

Fig. 11: Ground Truth versus Predicted tumor

4. Modified U-Net architecture:

The modified U-Net architecture, with approximately 7.8 million parameters, was trained for 10 epochs. The outcomes are shown in Figures 10 and 12. This architecture achieved a BCE loss of 0.0034 and produced reasonably good segmentation results. However, it did not significantly outperform the variation described in the previous case.

Fig. 12: Ground Truth versus Predicted tumor

This project primarily focused on lung tumor segmentation using deep learning architectures, specifically U-Net and Res U-Net. It highlighted the challenges with the original Res U-Net architecture, leading to a modification that demonstrated improved performance. The modified architecture, with reduced model complexity, achieved promising results even with a shorter training duration. Further refinement and exploration of hybrid approaches may enhance the robustness of lung tumor segmentation models.

References:

[1] Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424 (2018).

[2] Barrett, A., Dobbs, J. & Roques, T. Practical Radiotherapy Planning 4th edn (CRC Press, 2009).

[3] Stroom, J. C. & Heijmen, B. J. M. Geometrical uncertainties, radiotherapy planning margins, and the ICRU-62 report. Radiother. Oncol. 64, 75–83 (2002).

[4] Lambin, P. et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur. J. Cancer 48, 441–446 (2012).

[5] Kumar, V. et al. Radiomics: the process and the challenges. Magn. Reson. Imaging 30, 1234–1248 (2012).

[6] U-Net Explained: Understanding its Image Segmentation Architecture | by Conor O'Sullivan | Towards Data Science

[7] Nahian Siddique, Sidike Paheding, Colin P. Elkin, and Vijay Devabhaktuni. U-Net and its variants for medical image segmentation: A review of theory and applications. IEEE Access, 9:82031–82057, 2021.

[8] https://medium.com/@nishanksingla/unet-with-resblock-for-semanticsegmentation-dd1766b4ff66