Deep learning has been the #1 buzzword in the digital pathology community over the last few months, and image analysis based on deep learning seems to have the potential to become the killer app that finally drives digitization in pathology to widespread acceptance. One main goal of applying deep learning in pathology is to develop image analysis solutions that provide more reproducible results than manual assessment by human pathologists and thus free them from laborious quantification tasks such as cell counting. Another goal is to achieve this automation more effectively and with better performance than handcrafted algorithms that rely on human feature extraction.
Apart from commercial software vendors promoting digital pathology solutions based on deep learning, we have seen a number of promising scientific publications and competitions, such as the Camelyon Challenge, over the last few months.
The rising availability of digital whole slide images is providing the large datasets that deep learning algorithms need to be successful. But is it really that easy to train a computer to see and think like a pathologist with years or decades of training, if you just give it enough images to train on? The answer is... well... in reality it is a bit more complicated.
Recent applications of deep learning in pathology
Some of the latest publications indicate that deep learning algorithms can take digital pathology image analysis to a level where it can equal or even outperform a human pathologist in specific tasks when given the right kind of training data. Here are some examples:
Mitosis Detection
In 2012, the International Conference on Pattern Recognition (ICPR 2012) held a competition on mitosis detection in breast cancer histological images. The two leading models were among the first successful applications of deep neural networks to the analysis of histological images. The winning model, submitted by a group from the Swiss AI lab IDSIA [1], achieved an F-score (the harmonic mean of precision and sensitivity) of 0.78, followed by a submission from IPAL, Grenoble [2], with an F-score of 0.71. The fact that a deep learning based model submitted by a group of AI experts without a specific background in biomedical imaging outperformed all other submissions expanded the horizon of what might be possible when applying these methods to histological image analysis.
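To make the F-score concrete, here is a minimal sketch of how it can be computed from detection counts. The function name and the example counts are illustrative assumptions and are not taken from the competition data.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F-score: the harmonic mean of precision and sensitivity (recall)."""
    detected = true_positives + false_positives   # everything the model flagged
    actual = true_positives + false_negatives     # everything that is really there
    if true_positives == 0 or detected == 0 or actual == 0:
        return 0.0
    precision = true_positives / detected
    sensitivity = true_positives / actual
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts for illustration only (not the ICPR 2012 results):
# 70 mitoses found correctly, 20 false alarms, 30 mitoses missed.
score = f_score(true_positives=70, false_positives=20, false_negatives=30)
print(f"F-score: {score:.3f}")
```

Because the harmonic mean is dominated by the smaller of the two values, a detector cannot reach a high F-score by flagging everything (high sensitivity, low precision) or by flagging almost nothing (high precision, low sensitivity).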
A tutorial on Deep Learning in Digital Pathology
Andrew Janowczyk’s tutorial [3] is a great starting point for an overview of how seven clinically relevant tasks in digital pathology (nuclei segmentation, epithelium segmentation, tubule segmentation, lymphocyte detection, mitosis detection, invasive ductal carcinoma detection and lymphoma classification) can be automated using deep learning. The tutorial uses the open source deep learning framework Caffe and the popular AlexNet architecture for all tasks and achieves performance similar or superior to handcrafted approaches. It identifies three key issues: selecting the right magnification, managing annotation errors in the training data, and choosing a training data set that provides the network with the most informative exemplars. The step-by-step instructions, source code, trained models and input data for this tutorial are available online.
Figure: Use case 1, nuclei segmentation, from Andrew Janowczyk's tutorial
Detection of Prostate Cancer and Micro- and Macro-Metastases of Breast Cancer
The goal of this project [4] was to evaluate deep learning as a tool to increase both the accuracy and the efficiency of the clinical diagnosis of prostate cancer and breast cancer metastases, not only by detecting cancer and metastases at the slide level but also by excluding slides that do not contain cancer. A set of prototype image regions was extracted from the manually delineated digital slides and used as training data for a convolutional neural network (CNN). The CNN identified all slides containing disease and, without overlooking any of them, excluded up to 32% of the disease-free slides for prostate cancer and up to 44% for breast cancer metastases. This indicates that substantial gains in efficiency are possible by using CNNs to exclude tumor-negative slides from further human analysis.
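The idea of excluding slides without missing any disease can be sketched as choosing the highest score threshold that still keeps every diseased slide. This is a simplified illustration, not the authors' method: it assumes each slide has already been reduced to a single likelihood score by some model, and the function name and example values are hypothetical.

```python
def triage_at_full_sensitivity(scores, labels):
    """Return (threshold, fraction of disease-free slides that can be excluded).

    scores: one likelihood-of-disease value per slide (assumed model output).
    labels: 1 if the slide contains disease, 0 otherwise.
    Slides scoring below the threshold are excluded from human review.
    """
    positive_scores = [s for s, y in zip(scores, labels) if y == 1]
    negative_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one slide of each class")
    # The lowest-scoring diseased slide sets the threshold, so no diseased
    # slide falls below it and sensitivity stays at 100%.
    threshold = min(positive_scores)
    excluded = sum(1 for s in negative_scores if s < threshold)
    return threshold, excluded / len(negative_scores)


# Hypothetical scores for seven slides (four disease-free, three diseased).
scores = [0.05, 0.10, 0.40, 0.55, 0.35, 0.80, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1]
threshold, excluded_fraction = triage_at_full_sensitivity(scores, labels)
print(f"threshold={threshold}, excluded={excluded_fraction:.0%} of disease-free slides")
```

A threshold picked this way is optimistic when it is evaluated on the same slides it was chosen from, so in practice it would need to be validated on separate data.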
Detection of Breast Cancer Metastasis
The Camelyon Challenge, organized by a group of researchers to evaluate both existing and novel algorithms for detecting metastases in hematoxylin and eosin (H&E) stained whole-slide images of lymph node sections, received a total of 32 submissions. The leading submission, by a team from Harvard/MIT, achieved an AUC of 0.99, which is better than the human benchmark (AUC of 0.96). The result shows that deep learning based algorithms can outperform a human in defined recognition tasks if they are given a sufficient amount of correctly labeled training data.
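For readers unfamiliar with the metric, the following sketch shows one way to compute an AUC, taken here to mean the area under the ROC curve. It uses the equivalent pairwise definition: the probability that a randomly chosen positive slide receives a higher score than a randomly chosen negative one. The function name and the example scores are assumptions for illustration and are unrelated to the challenge data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical slide scores: two negatives, two positives.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 3 of 4 pairs ranked correctly -> 0.75
```

An AUC of 1.0 means every positive slide is ranked above every negative slide, while 0.5 corresponds to random ranking.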
Glioma Grading
This work [5] uses a modular deep learning classification pipeline to automatically grade gliomas into Lower Grade Glioma (LGG) Grade II, LGG Grade III, and Glioblastoma Multiforme (GBM) Grade IV. On an independent data set of new patients from a multi-institutional repository, the method achieves a classification accuracy of 96% for GBM vs. LGG classification and 71% for further separating LGG into Grade II and Grade III.
Segmentation and Classification of Epithelium and Stroma
Distinguishing stromal from epithelial tissue is a critical initial step in developing automated algorithms for cancer diagnosis. In this work [6], a deep convolutional network outperforms three models with handcrafted feature extraction in segmenting and classifying stromal and epithelial tissue on H&E stained images of breast cancer and IHC stained images of colorectal cancer, reaching accuracies between 84% and 100%.
Possibilities and Limitations
Most deep learning based approaches use supervised learning and need a sufficient amount of well-chosen labeled training data (e.g. representing all subtypes of a certain cancer). Many of the publications above use training data from just one or two histology labs, so the models might not be suitable for general application, since staining can differ from lab to lab. However, approaches [7] are being developed to address this problem by learning features that are invariant to differences in the input images and to staining variations. Some of the works mentioned above (Janowczyk [3], Litjens et al. [4]) also note that the training data was chosen by an expert so that it is representative of all relevant tissue types or appearances of the disease the model is focused on. To date there is no large scale initiative in sight that is comparable to the ImageNet project and that would provide a data set comparable to what a pathologist sees over many years of training.
Another issue is how to obtain the so-called “ground truth”, meaning the classification of the test data set by a human pathologist against which the model is compared. This concept assumes that the human pathologist does not make any errors, which contradicts the high inter- and intraobserver variability seen in practice, e.g. the 25% disagreement in a study on interpreting breast biopsies. Annotating whole slide images is a long and laborious process that requires expert knowledge and is usually not done at the pixel level but at lower magnifications, leading to numerous errors when the annotations are used at higher magnifications.
The choice of the “right” set of training data, one that enables the network to learn the important differences, seems to be key to the effectiveness of deep neural networks. In this case “right” means that the training data includes, in large enough quantity, all representations of the object to detect (e.g. stages of mitosis, types of tumor) as well as images that do not contain the object. But again, this task requires pathological expertise in the relevant field.
Figure: Use case 6, invasive ductal carcinoma (IDC) segmentation, from Andrew Janowczyk's tutorial
The publications cited above look at a variety of different tasks performed by a pathologist that might be feasible for automation. The typical workflow of a pathologist, both in research and in clinical routine (e.g. when diagnosing cancer), is complex and needs a stepwise approach consisting of various tasks such as cell counting, tissue classification and the analysis of multiple biomarkers to characterize the tumor. Today’s algorithms are not (yet?) able to put data from the different models into context like a human pathologist would before deciding on a diagnosis.
So, what’s in it for pathologists and patients today (and tomorrow)?
As in other complex applications, it does not look like computers will be able to replace a pathologist, e.g. in cancer care, anytime in the near future. But they will probably greatly support pathologists very soon in tasks like cell quantification or cancer scoring. In doing so, they will help pathology labs and health care providers deal with the growing shortage of pathologists while providing the quantified data needed as the basis for modern precision therapies. In this way they will hopefully help cancer patients receive a quick and precise diagnosis that ensures the optimum treatment for their specific disease.