
Chest X-Ray Analysis of Tuberculosis by Deep Learning with Segmentation and Augmentation

In this paper, the authors explore how effective lung segmentation and lossless and lossy data augmentation are for computer-aided diagnosis (CADx) of tuberculosis, using deep convolutional neural networks applied to a small and not well-balanced chest X-ray (CXR) dataset.

Dataset

The Shenzhen Hospital (SH) dataset of CXR images was acquired from Shenzhen No. 3 People's Hospital in Shenzhen, China. It contains normal CXR images and abnormal CXR images with marks of tuberculosis.

Methodology

According to previous literature, attempts to train on such small CXR datasets without any pre-processing did not give good results. The authors therefore segmented the lungs in each image before feeding it to the model. This led to more successful training and an increase in prediction accuracy.

To perform lung segmentation, i.e. to cut the left and right lung fields out of standard CXRs, manually prepared masks were used.
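As a rough illustration of mask-based segmentation, the sketch below keeps only the pixels that fall inside a binary lung mask. It assumes the image and the mask are 2D NumPy arrays of the same shape and that non-zero mask values mark lung tissue; the function name `apply_lung_mask` is hypothetical and is not taken from the paper.

```python
import numpy as np

def apply_lung_mask(image, mask):
    """Zero out every pixel outside the lung mask (hypothetical helper)."""
    if image.shape != mask.shape:
        raise ValueError("image and mask must have the same shape")
    # Assumption: non-zero mask values mark the lung fields.
    return np.where(mask > 0, image, 0)

# Tiny example: only the two masked pixels keep their intensity.
image = np.array([[10, 20], [30, 40]])
mask = np.array([[0, 1], [1, 0]])
segmented = apply_lung_mask(image, mask)
```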

The dataset was split 8:1:1 into training, validation and test parts. Pixel values were rescaled by 1./255 and the images were resized to 2048×2048. The model was trained on:
  1. Segmented SH dataset
  2. Segmented SH dataset with lossless data augmentation
  3. Segmented SH dataset with lossy data augmentation
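The 8:1:1 split described above can be sketched as follows. This is a minimal example, not the authors' code: the helper name `split_8_1_1`, the shuffling, and the fixed seed are all assumptions, and the post does not say whether the split was stratified by class.

```python
import random

def split_8_1_1(items, seed=0):
    """Shuffle items and split them 8:1:1 (hypothetical helper)."""
    items = list(items)
    random.Random(seed).shuffle(items)  # assumption: random, unstratified split
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # remainder goes to the test part
    return train, val, test

# Example with 100 placeholder file names.
names = [f"img_{i}.png" for i in range(100)]
train, val, test = split_8_1_1(names)
```

With image counts that are not a multiple of 10, the leftover items end up in the test part in this sketch.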

Segmented Dataset

Training on this dataset showed overfitting because of the small size of the dataset. The data augmentation methods discussed in the next two subsections reduce it.

Lossless Data Augmentation

The lossless data augmentation for 2D images included the following transformations: mirror-like reflections (left-right and up-down) and rotations by 90n degrees, where n = 1, 2, 3. This increased the size of the whole dataset 8 times and gave more realistic accuracy and loss values during training and validation.
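The factor of 8 follows from combining the four rotations (0, 90, 180 and 270 degrees) with an optional mirror reflection. The sketch below generates these variants with NumPy; the function name `lossless_variants` is hypothetical, and the paper's own pipeline may produce them differently.

```python
import numpy as np

def lossless_variants(image):
    """Return the 8 rotation/reflection variants of a 2D image."""
    variants = []
    for k in range(4):
        rotated = np.rot90(image, k)          # rotation by 90*k degrees
        variants.append(rotated)
        variants.append(np.fliplr(rotated))   # the same rotation, mirrored
    return variants

# Example on a small image with no symmetry of its own.
image = np.arange(9).reshape(3, 3)
variants = lossless_variants(image)
```

No pixel is interpolated or discarded by these transformations, which is why they are called lossless. The up-down reflection is covered as well, since it equals the left-right mirror of the 180 degree rotation.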

Lossy Data Augmentation

Lossy data augmentation for these 2D images included rotations by 5 degrees. These rotations were applied in addition to the transformations used for lossless data augmentation.
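A rotation by an angle that is not a multiple of 90 degrees has to resample the pixel grid, and parts of the image leave the frame, so some information is lost. The sketch below illustrates this with a nearest-neighbour rotation written in plain NumPy. The helper `rotate_nearest`, the interpolation method, and the zero fill are assumptions for illustration only; the post does not say how the rotations were implemented.

```python
import numpy as np

def rotate_nearest(image, angle_deg):
    """Rotate a 2D image about its centre, keeping its shape (hypothetical helper)."""
    h, w = image.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    dy, dx = ys - cy, xs - cx
    # Inverse mapping: find the source pixel for every output pixel.
    src_y = np.rint(cy + dy * np.cos(theta) - dx * np.sin(theta)).astype(int)
    src_x = np.rint(cx + dy * np.sin(theta) + dx * np.cos(theta)).astype(int)
    inside = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.zeros_like(image)  # assumption: pixels from outside the frame become 0
    out[inside] = image[src_y[inside], src_x[inside]]
    return out

# Rotating by 5 degrees and back does not recover the original image.
image = np.arange(1, 64 * 64 + 1).reshape(64, 64)
rotated = rotate_nearest(image, 5)
round_trip = rotate_nearest(rotated, -5)
```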

Conclusion

Lossless data augmentation of the segmented dataset leads to the lowest validation loss (without overfitting) and nearly the same accuracy (within the limits of standard deviation) in comparison to the original dataset and the other pre-processed datasets after lossy data augmentation. In conclusion, besides more complex deep CNNs and bigger datasets, better CADx results on small and not well-balanced datasets could also be obtained through better segmentation, data augmentation, dataset stratification, and exclusion of non-evident outliers.
