# Introduction

Title: Structure Correcting Adversarial Network for Organ Segmentation in Chest X-rays
Authors: Wei Dai, Joseph Doyle, Xiaodan Liang, Hao Zhang, Nanqing Dong, Yuan Li, Eric P. Xing (Petuum Inc.)
arXiv: https://arxiv.org/abs/1703.08770

# Keyword

• critic network (discriminator)
• segmentation network
• Organ Segmentation
• Chest X-rays (CXR)
• Structure Correcting (global structure information is obtained through the critic network)
• FCN + GAN

# Main Work

1. Propose the Structure Correcting Adversarial Network (SCAN) framework to segment lung fields and the heart in CXR images.

• a critic network: learns during training to discriminate the ground truth organ annotations from the masks synthesized by the segmentation network; it learns the higher-order regularities and effectively transfers this global information back to the segmentation model to achieve realistic segmentation outcomes. (The critic helps capture high-level structural information; a segmentation model on its own suffers from the small number of training samples.)
• segmentation model: convolutional network
• end-to-end
2. The model produces highly accurate and natural segmentation.

• 94.7% IoU for lung fields (human experts: 94.6%)
• 86.6% IoU for heart fields (human experts: 87.8%)
• Surpasses the current state-of-the-art
3. Using only the very limited training data available, the model reaches human-level performance without relying on any existing trained model or dataset. (Low data dependence, human-level accuracy.)

• SCAN model is more robust when applied to a new, unseen dataset, outperforming the vanilla segmentation model by 4.3%

# Background

## Motivation

1. Chest X-rays (CXR) often account for 2-10x more scans than other imaging modalities such as MRI, CT and PET scans, owing to their low cost and low radiation dose. This places a significant workload on radiologists and medical practitioners.
• In 2015/16, in the UK’s public medical sector: 22.5M X-ray images (8M CXR), 4.5M CT, 3.1M MRI
• Shortage of radiologists worldwide
2. Organ segmentation is a crucial step toward effective computer-aided detection on CXR.
• The segmentation of the lung fields and the heart provides rich structure information about shape irregularities and size measurements that can be used to directly assess certain serious clinical conditions, such as cardiomegaly, pneumothorax, pleural effusion, and emphysema.
• Explicit lung region masks can improve interpretability of computer-aided detection, which is important for clinical use.

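CXR is cheap and involves little radiation, but this also means a very large number of scans and a heavy workload for radiology staff. Semantic segmentation of organs in CXR is an important step toward building a computer-aided diagnosis system, because organ structure information can reveal many conditions. The work in this paper therefore has practical significance.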

## Challenge

• X-rays have low resolution and are 2-D projections, unlike more modern medical imaging technologies such as CT and PET scans.
• Very limited CXR training data with pixel-level annotations, because annotation is expensive
• CXRs exhibit substantial variations across different patient populations, pathological conditions, imaging technology and operation
• CXR images are gray-scale and drastically different from natural images, so existing models transfer poorly
• How to incorporate the implicit medical knowledge involved in contour determination
• Medical experts look for certain consistent structures surrounding the lung fields while annotating them, such as the aortic arch and the cardiodiaphragmatic angles.
• Therefore, a successful segmentation model must effectively leverage global structural information to resolve the local details. (This is the opening the paper exploits.)
• high contrast between rib cage and lung fields.

### Lung Field Segmentation

#### Categories

1. Rule-based systems apply a pre-defined set of thresholding and morphological operations that are derived from heuristics.
2. Pixel classification methods classify the pixels as inside or outside of the lung fields based on pixel intensities.
3. Methods based on deformable models such as Active Shape Model (ASM) and Active Appearance Model.

#### Current state-of-the-art

Registration-based approach: to build a lung model for a test patient, find the patients in an existing database that are most similar to the test patient and perform linear deformation of their lung profiles based on key point matching. (In short: comparison against similar cases plus key point matching.)

### Semantic Segmentation with Convolutional Networks

Aims to assign a pre-defined class to each pixel

#### Current state-of-the-art

• Fully convolutional network (FCN)
• Improvement: Semantic segmentation using adversarial networks

We note that there is a growing body of recent works that apply neural networks end-to-end on CXR images [25, 34]. These models directly output clinical targets such as disease labels without well-defined intermediate outputs to aid interpretability. Furthermore, they generally require a large number of CXR images for training, which is not readily available for many clinical tasks involving CXR images.

# Problem Definition

• CXRs in the posteroanterior (PA, back-to-front) view
• Lung fields definition: the lung fields consist of all the pixels for which radiation passes through the lung but not through the following structures: the heart, the mediastinum (the opaque region between the two lungs), below the diaphragm, the aorta, and, if visible, the superior vena cava.
• The heart boundary is generally visible on two sides, while the top and bottom borders of the heart have to be inferred due to occlusion by the mediastinum.

# Structure Correcting Adversarial Network (SCAN)

The authors adapt FCNs to gray-scale CXR images under the stringent constraint of a very limited training dataset of 247 images. The network departs from the usual VGG architecture and can be trained without transfer learning from existing models or datasets.

## Adversarial Training for Semantic Segmentation

### GAN

• a generator network: learns the data distribution
• a critic network: estimates the probability that a sample comes from the training data instead of being synthesized by the generator
• Adversarial process: The generator’s objective is to maximize the probability that the critic makes a mistake, while the critic is optimized to minimize the chance of mistake.
• The critic, which itself can be a complex neural network, can learn to exploit higher order inconsistencies in the samples synthesized by the generator.

Use the critic to learn these higher order structures and guide the segmentation network to generate masks more consistent with the learned global structures.

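Key: the critic learns higher-order structural information and uses it to steer the segmentation network toward globally consistent structure.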

## Training Objectives

### Data

• $S$: segmentation network
• $D$: critic network
• $x_i$: input image, shape $[H,W,1]$ for a single-channel gray-scale image with height $H$ and width $W$
• $y_i$: the associated mask labels, shape $[H,W,C]$ where $C$ is the number of classes including the background.
• for each pixel location $(j,k)$, $y_i^{jkc}=1$ for the labeled class $c$ while the rest of the channels are zero ($y_i^{jkc'}=0$ for $c' \neq c$).
• $S(x) \in [0, 1]^{H \times W \times C}$: the class probabilities predicted by $S$ at each pixel location, normalized so that they sum to 1 at each pixel.
• $D(x_i, y)$: scalar probability estimate that $y$ is the training-data (ground truth) mask $y_i$ rather than the predicted mask $S(x_i)$
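
The shapes above can be made concrete with a minimal sketch. It assumes NumPy and a hypothetical integer label map as the raw annotation, neither of which is specified in these notes. The softmax over the class axis is also an assumption: the notes only require that the probabilities sum to 1 at each pixel.

```python
import numpy as np


def one_hot_mask(labels, num_classes):
    """Turn an integer label map of shape [H, W] into a one-hot mask [H, W, C]."""
    labels = np.asarray(labels)
    # Row c of the identity matrix is the one-hot vector for class c.
    return np.eye(num_classes, dtype=np.float64)[labels]


def softmax_probs(logits):
    """Map per-pixel scores [H, W, C] to probabilities that sum to 1 per pixel.

    Softmax is an assumed choice of normalization, used here for illustration.
    """
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Hypothetical 2x2 label map with C = 4 (background + 3 organ classes).
labels = np.array([[0, 1], [2, 3]])
y = one_hot_mask(labels, num_classes=4)
s_x = softmax_probs(np.zeros((2, 2, 4)))
```

Here `y` plays the role of $y_i$ and `s_x` the role of $S(x_i)$; both have shape $[H,W,C]$.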

### Optimization problem

Eq.(1):
$$\min_S \max_D \lbrace J(S,D):=\sum_{i=1}^N J_s(S(x_i), y_i) - \lambda [J_d(D(x_i, y_i), 1) + J_d(D(x_i, S(x_i)),0)] \rbrace$$

• With $S$ fixed, maximize $J(S,D)$ with respect to $D$ (the subscript of $\max$)
• With $D$ fixed, minimize $J(S,D)$ with respect to $S$
• $J_s(\hat y, y) := \frac{1}{HW} \sum_{j,k} \sum_{c=1}^C -y^{jkc} \ln \hat y^{jkc}$: multi-class cross-entropy loss for predicted mask $\hat y$ averaged over all pixels.
• $J_d(\hat t, t) := -\lbrace t \ln \hat t + (1-t) \ln(1-\hat t) \rbrace$: binary logistic loss for the critic’s prediction
• $\lambda$: tuning parameter balancing pixel-wise loss and the adversarial loss
We can solve Eq.(1) by alternating between optimizing $S$ and optimizing $D$ using their respective loss functions, i.e. the two networks are trained in turn rather than jointly.
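
As a rough illustration of how the terms of Eq.(1) fit together, here is a plain-Python sketch of $J_s$, $J_d$ and $J(S,D)$. The function names and the nested-list representation are hypothetical, chosen only to keep the example dependency-free; a real implementation would use a tensor library.

```python
import math


def pixel_cross_entropy(y_hat, y):
    """J_s: multi-class cross-entropy averaged over all pixels.

    y_hat and y are nested lists of shape [H][W][C]; y is one-hot.
    """
    height, width = len(y), len(y[0])
    total = 0.0
    for j in range(height):
        for k in range(width):
            for prob, target in zip(y_hat[j][k], y[j][k]):
                if target:  # only the labeled class contributes
                    total -= target * math.log(prob)
    return total / (height * width)


def binary_logistic(t_hat, t):
    """J_d: binary logistic loss; assumes 0 < t_hat < 1."""
    return -(t * math.log(t_hat) + (1 - t) * math.log(1 - t_hat))


def joint_objective(js_terms, d_real, d_fake, lam):
    """J(S, D) from Eq.(1).

    js_terms: J_s(S(x_i), y_i) per sample
    d_real:   D(x_i, y_i) per sample
    d_fake:   D(x_i, S(x_i)) per sample
    """
    critic_loss = sum(
        binary_logistic(dr, 1) + binary_logistic(df, 0)
        for dr, df in zip(d_real, d_fake)
    )
    return sum(js_terms) - lam * critic_loss
```

A critic that cannot tell real from predicted masks outputs 0.5 for both, so each $J_d$ term equals $\ln 2$.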

#### Training the Critic

Train the critic network by minimizing the following objective with respect to $D$ for a fixed $S$:
$$\sum_{i=1}^N \left[ J_d(D(x_i, y_i), 1) + J_d(D(x_i, S(x_i)),0) \right]$$

#### Training the Segmentation Network

Given a fixed D, we train the segmentation network by minimizing the following objective with respect to $S$:
$$\sum_{i=1}^N J_s(S(x_i),y_i) + \lambda J_d(D(x_i,S(x_i)),1)$$

• This objective uses $J_d(D(x_i, S(x_i)),1)$ in place of $-J_d(D(x_i, S(x_i)),0)$, the term that follows directly from Eq.(1), because the latter gives weaker gradient signals when $D$ makes accurate predictions.
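
The gradient argument can be checked numerically with a small sketch. It assumes the critic's output is a sigmoid of a scalar logit $z$, i.e. $D = \sigma(z)$; this parameterization is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_original(z):
    """d/dz of -J_d(sigmoid(z), 0) = ln(1 - sigmoid(z)), which is -sigmoid(z)."""
    return -sigmoid(z)


def grad_replacement(z):
    """d/dz of J_d(sigmoid(z), 1) = -ln(sigmoid(z)), which is -(1 - sigmoid(z))."""
    return -(1.0 - sigmoid(z))


# An accurate critic assigns a predicted mask a probability near 0,
# which corresponds to a strongly negative logit.
z = -6.0
weak = grad_original(z)       # magnitude close to 0: the signal vanishes
strong = grad_replacement(z)  # magnitude close to 1: the signal survives
```

Both gradients are negative, so both objectives push $D(x_i, S(x_i))$ upward; they differ only in how much signal remains when the critic is confident.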

## Segmentation Network

### FCN

• The down-sampling path, similar to an image classification network architecture
• convolutional layers
• max/average pooling layer
• VGG-based
• residual block architecture
• The up-sampling path
• convolutional layers
• deconvolutional layers (transposed convolution)
• Most FCNs are applied to color images with RGB channels, which are not available here because CXRs are gray-scale.
• 3 classes
• the left lung
• the right lung
• the heart
• 247 CXR images

## Critic Network

• input: 4 or 5(including input image) channels
• segmentation network
• global average pool
• fully connected layer
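
A minimal NumPy sketch of the critic's input and output stages, under stated assumptions: the mask and image are joined by channel-wise concatenation, the convolutional body is skipped entirely, and the final sigmoid is assumed so that the output is a probability. The names `critic_input` and `critic_head` and the weights are hypothetical.

```python
import numpy as np


def critic_input(mask, image=None):
    """Build the critic input: the mask alone (C channels) or mask + image (C + 1).

    mask: [H, W, C], image: [H, W, 1]. Channel-wise concatenation is an assumption.
    """
    if image is None:
        return mask
    return np.concatenate([mask, image], axis=-1)


def critic_head(features, weights, bias):
    """Global average pool, then one fully connected unit, then a sigmoid.

    features: [h, w, K] feature map from the (omitted) convolutional body.
    """
    pooled = features.mean(axis=(0, 1))     # global average pool -> [K]
    logit = float(pooled @ weights + bias)  # fully connected layer -> scalar
    return 1.0 / (1.0 + np.exp(-logit))     # assumed sigmoid -> probability


mask = np.zeros((400, 400, 4))   # background + 3 organ classes
image = np.zeros((400, 400, 1))  # single-channel gray-scale CXR
with_image = critic_input(mask, image)
prob = critic_head(np.ones((3, 3, 2)), np.array([1.0, -1.0]), 0.0)
```

With $C = 4$ (three organ classes plus background) this gives the 4-channel and 5-channel variants listed above.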

# Experiments

## Dataset and Processing

### Dataset

Use two publicly available datasets with at least lung field annotations.

#### JSRT

• Released by Japanese Society of Radiological Technology (JSRT)
• 247 CXRs (154 have lung nodules and 93 have no lung nodule)
• Resolution: $2048 \times 2048$
• gray-scale with color depth of 12 bits.
• represents mostly normal lung and heart masks (lung nodules in most cases do not alter the contour of the lungs and heart)

#### Montgomery

• Department of Health and Human Services, Montgomery County, Maryland, USA
• 138 CXRs (80 normal patients and 58 patients with manifested tuberculosis (TB))
• Resolution: $4020 \times 4892$ or $4892 \times 4020$
• gray-scale with color depth of 12 bits.
• Only the two lung masks annotations are available

### Processing

• scale all images to $400 \times 400$ pixels (which retains sufficient visual detail for vascular structures)
• $800 \times 800$ does not improve the segmentation performance
• image normalization : for given image $x$
$$x^{jk} := \frac{x^{jk} - \hat x}{\sqrt{var(x)}}$$
• $\hat x$: mean of pixels in $x$
• $var(x)$: variance of pixels in $x$
• do not use statistics from the whole dataset (the mean and variance are computed per image)
• post-processing: fill in any hole in the predicted mask, and remove the small patches disjoint from the largest mask
• PS: In practice, this is important for the prediction output of the segmentation network alone (FCN), but does not affect the evaluation results for FCN with adversarial training.
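
The normalization and post-processing steps might be sketched as follows. This is an illustrative NumPy version, not the authors' code: it assumes post-processing is applied to one binary class mask at a time, uses 4-connectivity to define connected regions, and does not handle a constant image (zero variance). All function names are hypothetical.

```python
from collections import deque

import numpy as np


def normalize_image(x):
    """Per-image standardization using the image's own mean and variance."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.mean()) / np.sqrt(x.var())


def connected_components(mask):
    """Return the 4-connected regions of True pixels as lists of (row, col)."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    height, width = mask.shape
    regions = []
    for r in range(height):
        for c in range(width):
            if not mask[r, c] or seen[r, c]:
                continue
            seen[r, c] = True
            queue, region = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                i, j = queue.popleft()
                region.append((i, j))
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    inside = 0 <= ni < height and 0 <= nj < width
                    if inside and mask[ni, nj] and not seen[ni, nj]:
                        seen[ni, nj] = True
                        queue.append((ni, nj))
            regions.append(region)
    return regions


def postprocess(mask):
    """Fill holes, then keep only the largest foreground region."""
    mask = np.asarray(mask, dtype=bool).copy()
    height, width = mask.shape
    # A hole is a background region that never touches the image border.
    for region in connected_components(~mask):
        if not any(i in (0, height - 1) or j in (0, width - 1) for i, j in region):
            for i, j in region:
                mask[i, j] = True
    cleaned = np.zeros_like(mask)
    regions = connected_components(mask)
    if regions:
        for i, j in max(regions, key=len):
            cleaned[i, j] = True
    return cleaned
```

For example, a ring-shaped mask with a one-pixel hole and a stray pixel elsewhere comes back as a solid block with the stray pixel removed.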

## Training Protocols

• GANs are unstable during the training process
• pre-train the segmentation network using only the pixel-wise loss $J_s$
• faster
• do not train critic network
• Adam optimizer with learning rate 0.0002 to train all models for 350 epochs
• mini-batch size : 10
• with critic network: perform 5 optimization steps on the segmentation network for each optimization step on the critic network
• evaluation: IoU (Intersection-over-Union)
• $P$: the set of pixels in the predicted segmentation mask for a class
• $G$: the set of pixels in the ground truth mask for a class
• $IoU=\frac{|P \cap G|}{|P \cup G|} = \frac{|TP|}{|TP|+|FP|+|FN|}$
• Dice Coefficient: $\frac{2|P \cap G|}{|P| + |G|} = \frac{2|TP|}{2|TP|+|FP|+|FN|}$
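
Because $P$ and $G$ are defined as sets of pixels, both metrics map directly onto Python set operations. The sketch below is illustrative; returning 1.0 when both masks are empty is a convention assumed here, not something stated in the paper.

```python
def iou(pred, truth):
    """Intersection-over-Union of two sets of (row, col) pixel coordinates."""
    union = pred | truth
    if not union:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return len(pred & truth) / len(union)


def dice(pred, truth):
    """Dice coefficient of two sets of (row, col) pixel coordinates."""
    total = len(pred) + len(truth)
    if total == 0:
        return 1.0  # same assumed convention as above
    return 2 * len(pred & truth) / total


# Hypothetical toy masks: 2 shared pixels, 1 false positive, 1 false negative.
P = {(0, 0), (0, 1), (1, 0)}
G = {(0, 1), (1, 0), (1, 1)}
print(iou(P, G), dice(P, G))
```

For these toy masks the intersection has 2 pixels and the union has 4, so IoU is 2/4 and Dice is 4/6. The two metrics are related by $\text{Dice} = 2\,\text{IoU} / (1 + \text{IoU})$.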

## Experiment Design and Result

### Design

• JSRT
• development set: 209 images (randomly selected)
• evaluation set: 38 images
• tune hyperparameters (such as $\lambda$ in Eq.(1)) using a validation set within development set
• Montgomery
• development set: 117 images (randomly selected)
• evaluation set: 21 images
• use the same hyperparameters tuned in JSRT
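
A random development/evaluation split of this kind could be produced as in the sketch below. The helper name and the fixed seed are hypothetical; the notes do not say how the authors drew their split.

```python
import random


def split_indices(n_total, n_eval, seed=0):
    """Randomly split image indices into (development, evaluation) lists."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    indices = list(range(n_total))
    rng.shuffle(indices)
    return sorted(indices[n_eval:]), sorted(indices[:n_eval])


jsrt_dev, jsrt_eval = split_indices(247, 38)  # JSRT: 209 / 38
mont_dev, mont_eval = split_indices(138, 21)  # Montgomery: 117 / 21
```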

### Performance

1. Compare FCN with SCAN on JSRT

2. Compare to existing methods on JSRT

• current state-of-the-art: registration-based

3. On a different dataset (transferability)
• different population
• train on the full JSRT and test on the full Montgomery
• the plain FCN on its own transfers poorly across datasets

4. Time efficiency