[Paper Notes] ChestX-ray8

Introduction

Title: ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on
Weakly-Supervised Classification and Localization of Common Thorax Diseases

Author: Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, Ronald M. Summers

arXiv: https://arxiv.org/abs/1705.02315

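This paper addresses diagnosis and pathology localization on chest X-rays. The authors observe that applying deep learning to medical images is still held back by scarce data and by a dependence on annotated data. Their aim is to build large-scale, high-accuracy computer-aided diagnosis systems, and they hope to spark interest in the research community in building large-scale medical image databases. Using natural language processing (NLP), they extract report text from the hospital's PACS to obtain labels and construct a hospital-scale chest X-ray database. It covers 8 thoracic diseases plus normal cases and consists of roughly 100,000 chest X-rays (already fairly large compared with other work). On top of this dataset, the authors design a unified DCNN training framework that works with different pre-trained models to perform pathology recognition, and they show that weakly-supervised learning can spatially localize pathologies, reducing the dependence on expert-annotated pathology locations. The paper serves as a benchmark for this research direction.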

Keyword

  • Weakly-Supervised
  • Multi-label Classification
  • Localization
  • Hospital-scale Chest X-ray Database
  • Common Thorax Diseases
  • NLP (Natural Language Processing)

Weakly supervised: the training images carry no pixel-level annotations, only image-level class labels.

Main Work

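  • Built ChestX-ray8, a new weakly-supervised multi-label medical image dataset that is larger than the samples used in previous studies. (Highlight)
  • Showed that common thoracic diseases can still be detected and spatially localized with only weakly-supervised multi-label images. (Highlight)
  • Future goal: build a fully automated “reading chest X-rays” system
    PS: A presentation can be organized around two aspects: ChestX-ray8 itself and the detection/localization method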

Background

Modern hospitals’ PACS (Picture Archiving and Communication Systems) hold a tremendous number of X-ray imaging studies accompanied by radiological reports (i.e., loosely labeled).
Open question: how can this type of hospital-size knowledge database be used for large-scale, high-precision computer-aided diagnosis (CAD) systems?

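In other words, hospital PACS already contain large numbers of X-ray images and radiology reports; the open problem is how to put these databases to work in building large-scale, high-accuracy CAD systems.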

State-of-the-art object detection and segmentation

Dataset

  • MS COCO: 80 categories, 200k images, 1.2M instances(350k people)
  • PASCAL VOC: 20 categories, 11530 images, containing 27450 annotated objects with bounding-boxes(BBox)

Object Detection

  • MS COCO: 0.413 mAP
  • PASCAL VOC: 0.884 mAP

Main limitation of recent notable work

All proposed methods are evaluated on some small-to-middle scale problems of (at most) several hundred patients. The performance of deep learning techniques remains unclear when it scales up to tens of thousands of patient studies.

Shortcoming of current research: sample sizes are small and data is scarce.

There have been recent efforts on creating openly available annotated medical image databases.

  • OpenI (An open access biomedical search engine): 3955 radiology reports, 7470 associated chest x-rays –> caption generation

Challenge

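  1. Generic, open-ended image-level anatomy and pathology labels cannot be obtained through crowd-sourcing such as AMT (Amazon Mechanical Turk): annotators without a medical background cannot label medical images. The authors therefore use NLP, combining images and reports, to extract labels.
  2. The spatial dimensions of a chest X-ray are usually 2000x3000 pixels, but local pathological regions vary widely in size and are small relative to the full image. To address this, the paper proposes a weakly-supervised multi-label classification and localization framework.
  3. Medical image diagnosis is not well suited to directly fine-tuning an ImageNet pre-trained DCNN model, so a weakly-supervised medical image database is needed to learn recognition and localization.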

Constructing the Database

ChestX-ray8

  • A new chest X-ray database proposed by this paper.
  • 108,948 frontal-view X-ray images of 32,717 unique patients (collected from 1992 to 2015).
  • 24636 images contain one or more pathologies. 84312 images are normal cases.
  • Mined from authors’ institute’s PACS system.
  • Dimension: 1024 x 1024 bitmap
  • 200 instances for each pathology (983 images in total), each labeled with a B-Box as GT
  • 8 thoracic pathologies
    • Atelectasis (lung collapse)
    • Cardiomegaly (enlarged heart)
    • Effusion (fluid accumulation)
    • Infiltration
    • Mass
    • Nodule
    • Pneumonia
    • Pneumothorax
  • Each image can have multiple labels, drawn from the 8 thoracic pathologies and a “Normal” label. (Normal means the image shows no pathological findings at all, not merely none of the 8 pathologies above.)
  • Labels are mined from the associated radiological reports using NLP.
  • There are connections between different pathologies

Labeling Disease Names by Text Mining (Label Extraction)

Tools

  • DNorm: a machine learning method for disease recognition and normalization. It maps every mention of keywords in a report to a unique concept ID in the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT), a standardized vocabulary of clinical terminology for the electronic exchange of clinical health information. (Used for clinical term normalization.)
  • MetaMap: a tool to detect bio-concepts from the biomedical text corpus. Unlike DNorm, it is an ontology-based approach for detecting Unified Medical Language System (UMLS) Metathesaurus concepts. (Also used for clinical term detection and normalization.)
    Process: merge the results of DNorm and MetaMap. Transfer: reports –> SNOMED-CT –> UMLS terminology

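The authors first use DNorm and MetaMap and merge their outputs to extract labels, with the labels for each disease stored in a UMLS file. The results are noisy, however, mainly because of negated and uncertain statements, which these tools handle poorly at the semantic level. The authors therefore propose an improved NLP-based pipeline. (These notes do not go into the NLP implementation and algorithms for now.)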

Noise (the tools above produce noisy labels)

Eliminate noisy labeling by ruling out negated pathological statements and uncertain mentions of findings and diseases, e.g., “suggesting obstructive lung disease”.
Regular expressions cannot capture the various syntactic constructions for multiple subjects. For example, in “clear of A and B”, a regular expression marks A as negated but not B.
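
The “clear of A and B” failure can be reproduced with a few lines of Python. This is only an illustrative sketch: the pattern `NEGATION_PATTERN`, the helper `naive_negated_findings`, and the example sentence are hypothetical and are not the rules used in the paper.

```python
import re

# Hypothetical, deliberately naive negation rule: it marks only the first
# word after "clear of" as negated.
NEGATION_PATTERN = re.compile(r"\bclear of (\w+)")


def naive_negated_findings(sentence):
    """Return the findings that the naive regular expression marks as negated."""
    return NEGATION_PATTERN.findall(sentence.lower())


sentence = "Lungs are clear of effusion and pneumothorax."
negated = naive_negated_findings(sentence)
# Only the first conjunct is captured; the second one is missed.
print(negated)
```

A rule defined on the dependency graph can instead follow the conjunction from the first finding to the second, which is the motivation for the syntactic approach described next.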

Improvement

Work at the syntactic level and utilize the syntactic dependency information: define rules on the dependency graph using the dependency labels and the direction information between words.

Steps
  1. Split the reports into sentences and tokenize them using NLTK.
  2. Parse each sentence with the Bllip parser using David McClosky’s biomedical model.
  3. The syntactic dependencies are obtained from the “CCProcessed” dependencies output by applying the Stanford dependencies converter on the parse tree.

Quality Control

Using OpenI API, retrieve a total of 3851 unique radiology reports for validation.

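The proposed method is validated on reports retrieved through the OpenI API, and its performance improves considerably over MetaMap, especially on classes such as Infiltration and Pneumothorax.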

Processing Chest X-ray Images

  • DICOM: Digital Imaging and Communications in Medicine
  • Typical X-ray image dimensions: $3000 \times 2000$
  • ChestX-ray8 : resized as $1024 \times 1024$ bitmap
  • OpenI: $512 \times 512$

DICOM

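DICOM stands for Digital Imaging and Communications in Medicine. It is a standard published by the National Electrical Manufacturers Association (NEMA) that governs the management, storage, printing, and transmission of medical imaging information, and it is the file format used by scanners and hospital Picture Archiving and Communication Systems (PACS). DICOM comprises a file format and a network communication protocol; the protocol defines how medical entities communicate over TCP/IP. A DICOM file consists of a header and image data. The size of the header depends on how much information it holds; it includes fields such as the patient ID and patient name. It also specifies the number of frames and the resolution, which image viewers use to display the image. Even a single image acquisition can produce many DICOM files.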

Reference: https://www.leiphone.com/news/201707/oHpedrbiTzU4nKvK.html

Bounding Box for Pathologies

  • 200 instances for each pathology (1600 instances total), consisting of 983 images.
  • Given an image and a disease keyword, a board-certified radiologist identified only the corresponding disease instance in the image and labeled it with a B-Box.

A subset of the samples was selected, and a board-certified radiologist annotated them with B-Boxes to serve as ground truth.

Unified DCNN Framework

  • Models: ImageNet pre-trained models, e.g., AlexNet, GoogLeNet, VGGNet-16, ResNet-50, by leaving out the fully-connected layers and the final classification layers.
  • Improvement: insert a transition layer, a global pooling layer, a prediction layer, and a loss layer

A global pooling layer and a prediction layer replace the conventional fully-connected and softmax layers.

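The framework is “unified” through the newly added transition layer, which converts the output of each pre-trained model into a standard output of size $S \times S \times D$. It is implemented as a convolutional layer (each feature map is treated as one neuron, so the number of outputs of the layer equals the number of neurons, which equals the number of convolution kernels), so the output after the transition layer has size $S \times S \times D$.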

Multi-label Setup

An 8-dimensional label vector $ y = [y_1, \dots, y_c, \dots, y_C], y_c \in \lbrace 0,1 \rbrace, C = 8 $ for each image. $ y_c $ indicates the presence of the corresponding pathology. Normal: $[0, 0, 0, 0, 0, 0, 0, 0]$
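
As a small illustration of this encoding, the sketch below turns a list of findings into the 8-dimensional vector. The class order in `PATHOLOGIES` simply follows the list earlier in these notes, and the helper name `encode_labels` is hypothetical; the paper does not prescribe either.

```python
# Assumed class order: the order in which the 8 pathologies are listed above.
PATHOLOGIES = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration",
    "Mass", "Nodule", "Pneumonia", "Pneumothorax",
]


def encode_labels(findings):
    """Map a collection of pathology names to an 8-dimensional 0/1 vector."""
    unknown = set(findings) - set(PATHOLOGIES)
    if unknown:
        raise ValueError(f"unknown pathology names: {sorted(unknown)}")
    return [1 if name in findings else 0 for name in PATHOLOGIES]


multi_label = encode_labels(["Effusion", "Mass"])  # image with two findings
normal = encode_labels([])                          # "Normal" image: all zeros
print(multi_label, normal)
```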

Transition Layer

To transform the activations from previous layers into a uniform dimension of output, $ S \times S \times D, S \in \lbrace 8, 16, 32 \rbrace $. $D$ represents the dimension of features at spatial location $ (i, j), i,j \in \lbrace 1, \dots, S \rbrace $, which can be varied in different model settings, e.g., $D=1024$ for GoogLeNet and $D=2048$ for ResNet.

Multi-label Classification Loss Layer

  • First, Hinge Loss (HL), Euclidean Loss (EL), and Cross Entropy Loss (CEL) are tried in place of softmax loss.
    • The model has difficulty learning positive instances (images with pathologies), because the labels contain far more ‘0’s than ‘1’s
  • Improvement: a positive/negative balanced loss function, weighted CEL (W-CEL)

$$ L_{W\text{-}CEL}(f(\vec{x}), \vec{y}) = \beta_P\sum_{y_c=1}-\ln (f(x_c))+\beta_N\sum_{y_c=0}-\ln (1-f(x_c))$$

where $\beta_P$ is set to $\frac{|P|+|N|}{|P|}$, while $\beta_N$ is set to $\frac{|P|+|N|}{|N|}$. $|P|$ and $|N|$ are the total number of ‘1’s and ‘0’s in a batch of image labels.

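Starting from CEL, the loss function adds a balancing term for positive and negative samples to account for the imbalanced label distribution in multi-label classification.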
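
A minimal pure-Python sketch of W-CEL, to make the weighting concrete. Several details are assumptions rather than the paper's implementation: the loss is summed over all images in the batch, probabilities are clipped with `eps` to keep the logarithm finite, and a weight is set to 0 when its class is absent from the batch.

```python
import math


def weighted_cross_entropy(probs, labels, eps=1e-7):
    """W-CEL summed over a batch.

    probs and labels are lists (one entry per image) of length-C lists;
    probs holds predicted probabilities f(x_c), labels holds 0/1 values y_c.
    """
    flat_labels = [y for row in labels for y in row]
    num_pos = sum(flat_labels)               # |P|: number of '1's in the batch
    num_neg = len(flat_labels) - num_pos     # |N|: number of '0's in the batch
    total = num_pos + num_neg
    # Assumption: use weight 0 when a class of labels is absent from the batch.
    beta_p = total / num_pos if num_pos else 0.0
    beta_n = total / num_neg if num_neg else 0.0

    loss = 0.0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # assumption: clip for stability
            if y == 1:
                loss += beta_p * -math.log(p)
            else:
                loss += beta_n * -math.log(1.0 - p)
    return loss


probs = [[0.9, 0.1], [0.2, 0.1]]
labels = [[1, 0], [0, 0]]
loss = weighted_cross_entropy(probs, labels)
print(loss)
```

In this toy batch there is one positive label and three negatives, so the positive term is weighted by 4 and each negative term by 4/3, which is how the rare positive labels are kept from being swamped.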

Global Pooling Layer

  • Role: choose what information to be passed down
  • Max pooling, average pooling, Log-Sum-Exp(LSE) pooling
    • LSE pooled value $$x_p = \frac{1}{r} \cdot \log\left[\frac{1}{S} \cdot \sum_{(i,j)\in S} \exp(r \cdot x_{ij})\right]$$
      $x_{ij}$ is the activation value at $(i,j)$, $(i,j)$ is one location in the pooling region $S$, and $S = s \times s$ is the total number of locations in $S$.
    • LSE ranges from maximum $(r \rightarrow \infty)$ to average $(r \rightarrow 0)$. It serves as an adjustable option between max pooling and average pooling.
    • LSE has overflow/underflow problems
  • Improvement: $$ x_p = x^* + \frac{1}{r} \cdot \log\left[\frac{1}{S} \cdot \sum_{(i,j) \in S} \exp(r \cdot (x_{ij} - x^*))\right]$$
    where $ x^* = \max\lbrace |x_{ij}|, (i,j)\in S\rbrace$.

The modified LSE value (which guards against overflow and underflow) is used as the global pooling layer.
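
The shifted LSE formula can be checked numerically with a short sketch. The function name `lse_pool` and the toy feature map are hypothetical, and the inputs are assumed to be non-negative activations of moderate size.

```python
import math


def lse_pool(activations, r=10.0):
    """Log-Sum-Exp pooling over an S x S map, using the shifted form.

    activations is a nested list of floats; r is the LSE hyper-parameter.
    """
    values = [x for row in activations for x in row]
    x_star = max(abs(x) for x in values)  # x* = max |x_ij|
    # Every exponent r * (x - x*) is <= 0, so math.exp cannot overflow.
    mean_exp = sum(math.exp(r * (x - x_star)) for x in values) / len(values)
    return x_star + math.log(mean_exp) / r


feature_map = [[1.0, 2.0], [3.0, 4.0]]
near_average = lse_pool(feature_map, r=1e-6)  # small r behaves like average pooling
near_max = lse_pool(feature_map, r=1000.0)    # large r behaves like max pooling
large_input = lse_pool([[1000.0, 999.0]], r=10.0)  # unshifted exp(10000) would overflow
print(near_average, near_max, large_input)
```

The two extreme settings of `r` recover approximately the average (2.5) and the maximum (4.0) of the toy map, matching the claim that LSE interpolates between the two pooling strategies.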

Prediction layer

  • Weight of prediction layer : size of $ D \times C $ (C = 8, class number)
  • Transfer $1 \times D $ to $ 1 \times C $

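The prediction layer outputs a probability for each of the 8 diseases, similar to a softmax layer, and its weights are reused later to compute the heat map for each disease. For disease classification, the paper does not compute a single best threshold. Instead, it evaluates with ROC curves and AUC: each threshold corresponds to a point on the ROC curve, so the model is assessed by constructing the ROC curve and computing the AUC.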

Heat map

  • Multiply activations from the transition layer with weights from the prediction layer
  • 8 heatmaps for 8 diseases
  • size: $ S \times S, S \in \lbrace 8, 16, 32 \rbrace $
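
A sketch of the heat map computation under the shapes given in these notes: the transition-layer activations are $S \times S \times D$ and the prediction-layer weights are $D \times C$, so each class gets an $S \times S$ map. Plain nested lists stand in for tensors, and the function name `class_heatmaps` and the toy values are hypothetical.

```python
def class_heatmaps(activations, weights):
    """Return C heatmaps of size S x S.

    activations: S x S x D nested lists (transition-layer output).
    weights:     D x C nested lists (prediction-layer weights).
    """
    size = len(activations)
    depth = len(weights)
    num_classes = len(weights[0])
    heatmaps = []
    for c in range(num_classes):
        # Weighted sum over the D feature channels at every spatial location.
        heatmap = [
            [
                sum(activations[i][j][d] * weights[d][c] for d in range(depth))
                for j in range(size)
            ]
            for i in range(size)
        ]
        heatmaps.append(heatmap)
    return heatmaps


# Toy example with S = 2, D = 2, C = 3 (the paper uses C = 8).
activations = [[[1, 0], [0, 1]], [[1, 1], [2, 0]]]
weights = [[1, 0, 2], [0, 1, -1]]
heatmaps = class_heatmaps(activations, weights)
print(heatmaps)
```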

Bounding Box Generation:

  • Normalize heatmaps to [0,255]
  • Threshold heatmap by {60, 180} individually (ad-hoc)
  • B-Boxes are generated to cover the isolated regions in the resulting binary maps

Each point of the heat map is binarized with a threshold, and B-Boxes are then drawn around the isolated regions.
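
The three steps can be sketched in pure Python as follows. The details are assumptions made for illustration: min-max normalization, a “greater than or equal” threshold test, and 4-connectivity for deciding which cells form one isolated region. Boxes are returned in heat map coordinates as `(row_min, col_min, row_max, col_max)` and would still need rescaling to the image size.

```python
from collections import deque


def normalize_to_255(heatmap):
    """Min-max normalize a 2-D heatmap to integer values in [0, 255]."""
    flat = [v for row in heatmap for v in row]
    low, high = min(flat), max(flat)
    if high == low:
        return [[0 for _ in row] for row in heatmap]
    return [[round(255 * (v - low) / (high - low)) for v in row] for row in heatmap]


def bounding_boxes(heatmap, threshold=180):
    """Threshold the normalized heatmap and box each 4-connected region."""
    norm = normalize_to_255(heatmap)
    rows, cols = len(norm), len(norm[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or norm[r][c] < threshold:
                continue
            # Breadth-first search over one isolated region.
            queue = deque([(r, c)])
            seen[r][c] = True
            r_min = r_max = r
            c_min = c_max = c
            while queue:
                y, x = queue.popleft()
                r_min, r_max = min(r_min, y), max(r_max, y)
                c_min, c_max = min(c_min, x), max(c_max, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and norm[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r_min, c_min, r_max, c_max))
    return boxes


toy_heatmap = [
    [0, 0, 0, 0],
    [0, 9, 10, 0],
    [0, 0, 0, 0],
    [8, 0, 0, 0],
]
boxes_180 = bounding_boxes(toy_heatmap, threshold=180)
boxes_60 = bounding_boxes(toy_heatmap, threshold=60)
print(boxes_180, boxes_60)
```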

Experiments

This section covers the specific experimental settings, the tuning approach, and the model evaluation criteria.

CNN

  • SGD
  • training(70%), validation(10%), testing(20%)
  • Recognition on testing set
  • generate B-Boxes on 983 images with GT
  • Caffe framework
  • ImageNet pre-trained models, i.e., AlexNet, GoogLeNet, VGGNet-16, ResNet-50 from the Caffe model zoo.
  • set $ \text{batch\_size} \times \text{iter\_size} = 80 $ as a constant
  • total training iterations are customized for different CNN models to prevent over-fitting.

Performance

ROC and AUC

  • performance on four models
  • Mass : AUC=0.5609, huge within-class appearance variation
  • Pneumonia: AUC=0.6333, too few instances (less than 1%)

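ROC (Receiver Operating Characteristic) curves and AUC are commonly used to evaluate a binary classifier. Here, the prediction-layer output for each disease is treated as a binary classifier. Each threshold corresponds to a point on the ROC curve, so sweeping the threshold traces out the curve, from which the AUC (Area Under Curve) is computed.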
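
To make the threshold sweep concrete, here is a small pure-Python sketch that builds the ROC points for one disease and integrates them with the trapezoidal rule. The function names and the toy scores are hypothetical, and the code assumes that both positive and negative examples are present.

```python
def roc_curve_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step admits more predictions as positive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / num_neg, tp / num_pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


scores = [0.9, 0.8, 0.35, 0.1]  # prediction-layer outputs for one disease
labels = [1, 0, 1, 0]           # ground-truth presence of that disease
points = roc_curve_points(scores, labels)
area = auc(points)
print(points, area)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.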

If you are not familiar with ROC, it is well worth reviewing the related concepts first, including TN, TP, FN, FP, precision, recall, F1-score, and F-beta score.

Different pooling strategies

  • AVE, MAX, different hyper-parameter r in LSE
  • best at r = 10

W-CEL

Using the weighted CEL improves model performance.

Disease Localization

  • standard Intersection over Union ratio (IoU)
  • the Intersection over the detected B-Box area ratio (IoBB)
  • computed B-Boxes are often larger than GT due to heat map resolution
    • define a correct localization by requiring either $$IoU > T(IoU) \text{ or } IoBB > T(IoBB)$$
    • T(IoBB) is a hyperparameter used to filter B-Boxes
  • localization accuracy (Acc.)
  • Average False Positive (AFP)

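A post-hoc condition is applied to the B-Boxes: the ratio of the intersection between a detected B-Box and the GT B-Box to the area of the detected B-Box (IoBB) is compared against T(IoBB) to filter out some B-Boxes before accuracy is computed.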
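
The two ratios and the acceptance rule can be written down directly. In this sketch, boxes are `(x_min, y_min, x_max, y_max)` tuples, and the default thresholds `t_iou` and `t_iobb` are placeholder values chosen for illustration, not the settings from the paper.

```python
def box_area(box):
    """Area of an (x_min, y_min, x_max, y_max) box; empty boxes have area 0."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def intersection_area(a, b):
    """Area of the overlap between boxes a and b."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return box_area(overlap)


def iou(detected, ground_truth):
    """Intersection over Union."""
    inter = intersection_area(detected, ground_truth)
    union = box_area(detected) + box_area(ground_truth) - inter
    return inter / union if union > 0 else 0.0


def iobb(detected, ground_truth):
    """Intersection over the detected B-Box area."""
    area = box_area(detected)
    return intersection_area(detected, ground_truth) / area if area > 0 else 0.0


def is_correct_localization(detected, ground_truth, t_iou=0.5, t_iobb=0.5):
    """Accept the detection if either ratio exceeds its threshold."""
    return iou(detected, ground_truth) > t_iou or iobb(detected, ground_truth) > t_iobb


ground_truth = (0, 0, 10, 10)
shifted = (5, 5, 15, 15)  # partial overlap with the ground truth
inside = (2, 2, 4, 4)     # small box lying fully inside the ground truth
print(iou(shifted, ground_truth), iobb(shifted, ground_truth))
print(iou(inside, ground_truth), iobb(inside, ground_truth))
```

The `inside` box has a low IoU but an IoBB of 1.0, so it is accepted through the IoBB branch of the rule even though the IoU branch rejects it.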

Sample

Conclusion

ChestX-ray8 can enable the data-hungry deep neural network paradigms to create clinically meaningful applications, including common disease pattern mining, disease correlation analysis, automated radiological report generation, etc. For future work, ChestX-ray8 will be extended to cover more disease classes and integrated with other clinical information, e.g., followup studies across time and patient history.

Personal Experience

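I have only just started reading papers closely, and I have found that a paper usually has to be read more than once before its essence sinks in. The first pass is a quick skim to get the general idea. Later passes are more careful: I consult the other papers it cites, take notes as I go, and work out the overall structure. Early on, the reading branches out into many other papers that I do not understand, so it takes a good deal of time. As for the notes themselves, I have considered writing them purely in Chinese, but my summaries quote the original wording heavily, and I worry that my own translation or paraphrase would not be accurate enough. If I wrote purely in English, rereading the notes later might be more laborious, and I might not be able to tell which parts are my own words and which are excerpts. Of course, one purpose of studying papers is to develop my own ability to write them, so the goal is certainly to express myself well in English. For now, though, I will simply keep the excerpts and notes as I first made them and add my own understanding in Chinese.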

