
Chest X-Ray Analysis of Tuberculosis by Deep Learning with Segmentation and Augmentation

In this paper, the authors explore the efficiency of lung segmentation and of lossless and lossy data augmentation in computer-aided diagnosis (CADx) of tuberculosis, using deep convolutional neural networks applied to a small and imbalanced chest X-ray (CXR) dataset.

Dataset

The Shenzhen Hospital (SH) dataset of CXR images was acquired from Shenzhen No. 3 People's Hospital in Shenzhen, China. It contains both normal CXR images and abnormal images with signs of tuberculosis.

Methodology

Based on previous literature, attempts to train on such small CXR datasets without any pre-processing failed to produce good results. The authors therefore segmented the lung regions before feeding the images to the model, which led to more successful training and an increase in prediction accuracy.

To perform lung segmentation, i.e. to extract the left and right lung fields from standard CXRs, manually prepared masks were used.
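The paper does not include code, but masking out everything outside the lung fields can be sketched in NumPy as follows. `apply_lung_mask` is a hypothetical helper name; it assumes the manually prepared mask is a binary array of the same shape as the CXR, with 1 inside the lung fields and 0 elsewhere.

```python
import numpy as np

def apply_lung_mask(image, mask):
    """Zero out everything outside the lung fields.

    image: 2D grayscale CXR array.
    mask:  binary array of the same shape (1 = lung field, 0 = background).
    """
    if image.shape != mask.shape:
        raise ValueError("image and mask must have the same shape")
    # Boolean multiply keeps pixel values inside the mask, zeroes the rest
    return image * (mask > 0)
```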

The dataset was split 8:1:1 into training, validation, and test parts, respectively. Images were rescaled by 1/255 and resized to 2048×2048. The model was trained on
  1. Segmented SH dataset
  2. Segmented SH dataset with lossless data augmentation
  3. Segmented SH dataset with lossy data augmentation
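The rescaling and 8:1:1 split above can be sketched as follows. This is a minimal illustration, not the authors' pipeline; the function names (`rescale`, `split_8_1_1`) and the fixed shuffle seed are assumptions for the sketch.

```python
import numpy as np

def rescale(image):
    """Bring 8-bit pixel values into [0, 1], matching the 1/255 rescaling."""
    return image.astype(np.float32) / 255.0

def split_8_1_1(indices, seed=0):
    """Shuffle sample indices and split them 8:1:1 into train/val/test."""
    rng = np.random.default_rng(seed)
    idx = np.array(list(indices))
    rng.shuffle(idx)
    n = len(idx)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test
```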

Segmented Dataset

Training on this dataset alone showed overfitting due to the small size of the dataset. Overfitting is reduced by the data augmentation methods discussed in the next two subsections.

Lossless Data Augmentation

The lossless data augmentation for 2D images included the following transformations: mirror-like reflections (left-right and up-down) and rotations by 90n degrees, where n = 1, 2, 3. This increased the size of the whole dataset by a factor of 8 and yielded more realistic accuracy and loss estimates during training and validation.
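The reflections and 90-degree rotations above together generate 8 distinct variants of each image (the dihedral symmetries of a square), all lossless because no interpolation is involved. A minimal sketch, with `dihedral_variants` as an assumed helper name:

```python
import numpy as np

def dihedral_variants(image):
    """Return the 8 lossless variants of a 2D image: rotations by
    0/90/180/270 degrees, each with and without a left-right flip.

    np.rot90 and np.fliplr only rearrange pixels, so no image
    information is lost.
    """
    variants = []
    for k in range(4):
        rot = np.rot90(image, k)
        variants.append(rot)
        variants.append(np.fliplr(rot))
    return variants
```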

Lossy Data Augmentation

Lossy data augmentation for these 2D images included rotations by 5 degrees, applied in addition to the lossless augmentation steps described above.
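Unlike 90-degree rotations, a 5-degree rotation requires resampling pixels onto a new grid, which discards some information; this is what makes the augmentation lossy. A self-contained sketch using nearest-neighbor resampling (the paper does not specify the interpolation scheme, and `rotate_lossy` is a hypothetical name):

```python
import numpy as np

def rotate_lossy(image, angle_deg=5.0):
    """Rotate a 2D image about its center by a small angle.

    Output pixels are inverse-mapped into the source image and
    resampled with nearest-neighbor lookup, so the operation is
    lossy, unlike the exact 90-degree rotations.
    """
    theta = np.deg2rad(angle_deg)
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # inverse rotation: where does each output pixel come from?
    sy = cy + (ys - cy) * np.cos(theta) - (xs - cx) * np.sin(theta)
    sx = cx + (ys - cy) * np.sin(theta) + (xs - cx) * np.cos(theta)
    sy = np.clip(np.rint(sy).astype(int), 0, h - 1)
    sx = np.clip(np.rint(sx).astype(int), 0, w - 1)
    return image[sy, sx]
```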

Conclusion

Lossless data augmentation of the segmented dataset leads to the lowest validation loss (without overfitting) and nearly the same accuracy (within one standard deviation) compared to the original dataset and the other pre-processed datasets after lossy data augmentation. In conclusion, beyond more complex deep CNNs and bigger datasets, further progress in CADx for small and imbalanced datasets could be obtained through better segmentation, data augmentation, dataset stratification, and exclusion of non-evident outliers.
