What’s all the fuss about deep learning and how will it impact pathology? Part 3: Deep Learning in Digital Pathology


Deep Learning has been the #1 buzzword in the digital pathology community over the last few months, and it seems like image analysis based on deep learning has the potential to become the killer app that finally drives digitization in pathology to widespread acceptance. One main goal of applying deep learning in pathology is to develop and use image analysis solutions that provide more reproducible results than the manual assessment by human pathologists and thus free them from laborious quantification tasks such as cell counting. Another goal is to achieve this automation more effectively and with better performance than handcrafted algorithms that rely on human feature extraction.

Apart from some commercial software vendors promoting digital pathology solutions based on deep learning we have seen a number of promising scientific publications and competitions such as the Camelyon Challenge over the last few months.

The rising availability of digital whole slide images is forming the necessary large dataset for deep learning algorithms to be successful. But is it really that easy to train a computer to see and think like a pathologist with years and decades of training, if you just give it enough images to train on? The answer is ...well … in reality it is a bit more complicated.

Recent applications of deep learning in pathology

Some of the latest publications indicate that deep learning algorithms can take digital pathology image analysis to a level where it can equal or even outperform a human pathologist in specific tasks when given the right kind of training data. Here are some examples:

Mitosis Detection

In 2012 the International Conference on Pattern Recognition (ICPR 2012) held a competition on Mitosis Detection in Breast Cancer Histological Images. The two leading models were among the first successful applications of Deep Neural Networks to the analysis of histological images. The winning model, submitted by a group from the Swiss AI lab IDSIA1, achieved an F-Score (combined measure of precision and sensitivity) of 0.78, followed by a submission from IPAL, Grenoble2 with an F-Score of 0.71. The fact that the deep learning based model, submitted by a group of AI experts without a specific background in biomedical imaging, outperformed all other submissions expanded the horizon for what could be achieved by applying these methods to histological image analysis.
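To make the F-Score more concrete, here is a minimal Python sketch that computes it as the harmonic mean of precision and sensitivity. The function name and the detection counts are made up for illustration and are not the actual contest figures.

```python
def f_score(true_positives, false_positives, false_negatives):
    """F-Score: harmonic mean of precision and sensitivity (recall)."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    if detected == 0 or actual == 0 or true_positives == 0:
        return 0.0
    precision = true_positives / detected    # share of detections that are real mitoses
    sensitivity = true_positives / actual    # share of real mitoses that were detected
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts, chosen only for illustration
example = f_score(true_positives=80, false_positives=20, false_negatives=20)
print(round(example, 2))
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot reach a high F-Score by detecting everything (high sensitivity, low precision) or by detecting almost nothing (high precision, low sensitivity).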

A tutorial on Deep Learning in Digital Pathology

A great starting point to get an overview of how a selection of seven clinically relevant tasks in digital pathology (nuclei segmentation, epithelium segmentation, tubule segmentation, lymphocyte detection, mitosis detection, invasive ductal carcinoma detection and lymphoma classification) can be automated using deep learning is Andrew Janowczyk’s tutorial3. The tutorial uses the open source deep learning framework Caffe and the popular AlexNet for all the tasks and achieves a performance similar or superior to handcrafted approaches. It identifies the selection of the right magnification, the handling of annotation errors in the training data and the choice of a training data set that provides the network with the most informative exemplars as the key challenges. The step-by-step instructions, source code, trained models and input data for this tutorial are available online.



The goal of this project4 was to evaluate deep learning as a tool to not only increase accuracy but also efficiency in clinical diagnosis of prostate cancer and metastases of breast cancer, not only by detecting cancer and metastasis on a slide level, but by also excluding slides that do not contain cancer. A set of prototype image regions was extracted from all of the manually delineated digital slides and used as training data for the CNN. The CNN was able to identify all slides containing disease and exclude up to 32 % of the slides not containing disease for prostate cancer and up to 44 % for breast cancer metastases without overlooking any slides that contain disease. This indicates that substantial gains in efficiency are possible by using CNNs to exclude tumor-negative slides from further human analysis.
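As a rough illustration of how such an exclusion rate could be computed, the following Python sketch assumes that a model outputs one cancer likelihood score per slide (a simplification, the scoring actually used in the publication is not described here). It places the threshold at the lowest score of any slide containing disease, so that no such slide is missed, and counts how many disease-free slides fall below it. All names and values are hypothetical.

```python
def exclusion_rate_at_full_sensitivity(scores, has_disease):
    """Return (threshold, fraction of disease-free slides below it).

    The threshold is the lowest score among slides containing disease, so
    excluding every slide scoring below it overlooks no diseased slide.
    """
    positive_scores = [s for s, d in zip(scores, has_disease) if d]
    negative_scores = [s for s, d in zip(scores, has_disease) if not d]
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one slide of each class")
    threshold = min(positive_scores)
    excluded = sum(1 for s in negative_scores if s < threshold)
    return threshold, excluded / len(negative_scores)


# Hypothetical per-slide scores and labels
scores = [0.95, 0.80, 0.40, 0.35, 0.60, 0.10, 0.05, 0.45]
has_disease = [True, True, True, False, False, False, False, False]

threshold, rate = exclusion_rate_at_full_sensitivity(scores, has_disease)
print(threshold, rate)
```

In practice such a threshold would have to be fixed on separate validation data, since a threshold tuned on the test slides themselves gives an overly optimistic exclusion rate.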

Detection of Breast Cancer Metastasis

The Camelyon Challenge, organized by a group of researchers to evaluate both existing and novel algorithms to detect metastases in hematoxylin and eosin (H&E) stained whole-slide images of lymph node sections, received a total of 32 submissions. The performance of the leading submission by a team from Harvard/MIT (AUC of 0.99) is significantly better than the human benchmark (AUC of 0.96). The result shows that deep learning based algorithms can outperform a human in defined recognition tasks if they are given a sufficient amount of correctly labeled training data.


This work5 uses a Modular Deep Learning Classification Pipeline to automatically grade gliomas into Lower Grade Glioma (LGG) Grade II, LGG Grade III, and Glioblastoma Multiforme (GBM) Grade IV. The classification accuracy achieved by this method is 96% for the task of GBM vs. LGG classification and 71% for further classifying LGG into Grade II or Grade III, measured on an independent data set coming from new patients from a multi-institutional repository.

Segmentation and Classification of Epithelial and Stroma

Distinguishing stromal from epithelial tissue is a critical initial step in developing automated computerized algorithms for cancer diagnosis. In this work6 a Deep Convolutional Network outperforms three models with handcrafted feature extraction in segmenting and classifying stromal and epithelial tissue on H&E stained images of breast cancer and IHC stained images of colorectal cancer, reaching accuracies between 84 and 100 %.

Possibilities and Limitations

Most deep learning based approaches use supervised learning and need a sufficient amount and the right choice of labeled training data (e.g. representing all subtypes of a certain type of cancer). Many of the publications above use data from just one or two different histology labs for training, so the models might not be suitable for general application since staining might differ from lab to lab. However, approaches7 are being developed to address this problem by trying to learn features that are invariant to differences in the input images and staining variations. Some of the works mentioned above (Janowczyk3, Litjens et al.4) also mention that the training data was chosen by an expert so that the data set is representative of all relevant tissue types or appearances of the disease the model focuses on. To date there is no large scale initiative in sight that is comparable to the ImageNet project and provides a data set comparable to what a pathologist sees in his or her many years of training.

Another issue is how to obtain the so-called “ground truth”, meaning the classification of the test data set by a human pathologist that the model is compared to. This concept assumes that the human pathologist does not make any errors, which contradicts the high inter- and intraobserver variability observed in practice, e.g. the 25% disagreement found in a study8 on interpreting breast biopsies. Annotating whole slide images is a long and laborious process that requires expert knowledge and is usually not done at the pixel level but at lower magnifications, leading to numerous errors when the annotations are used at higher magnifications.

The choice of the “right” set of training data enabling the network to learn about the important differences seems to be key to the effectiveness of deep neural networks. In this case “right” means that the training data includes all representations of the object to detect e.g. stages of mitosis, types of tumor as well as images that do not include the object in a large enough quantity. But again this task requires (pathological) expertise in the relevant field.


The publications cited above look at a variety of different tasks performed by a pathologist that might be feasible for automation. The typical workflow of a pathologist, both in research and in clinical routine, e.g. when diagnosing cancer, is complex and needs a stepwise approach consisting of various tasks such as cell counting, tissue classification and the analysis of multiple biomarkers to characterize the tumor. Today’s algorithms are not (yet?) able to put data from the different models into context like a human pathologist would before deciding on a diagnosis.

So, what’s in it for pathologists and patients today (and tomorrow)?

As in other complex applications it does not look like computers will be able to replace a pathologist e.g. in cancer care anytime in the near future. But they will probably greatly support pathologists in tasks like quantification of cells or cancer scoring very soon. In so doing they will help pathology labs and health care providers to deal with the growing shortage of pathologists while providing the necessary quantified data as the basis for modern precision therapies at the same time. In this way they will hopefully help cancer patients with a quick and precise diagnosis to ensure the optimum treatment for their specific disease.

1 Dan Ciresan, Alessandro Giusti, Luca Gambardella & Jürgen Schmidhuber, Mitosis detection in breast cancer histology images with deep neural networks. Med Image Comput Comput Assist Interv. 8150, 411–418 (2013).

2 Ludovic Roux, Daniel Racoceanu, Nicolas Loménie, Maria Kulikova, Humayun Irshad, Jacques Klossa, et al., Mitosis detection in breast cancer histological images An ICPR 2012 contest. Journal of Pathology Informatics, 4:8, (2013).

3 Andrew Janowczyk & Anant Madabhushi: Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. Journal of Pathology Informatics, 7:29 (2016).

4 Geert Litjens et al.: Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Nature Scientific Reports 6:26286 (2016).

5 Mehmet Ertosun & Daniel Rubin, Automated grading of gliomas using deep learning in digital pathology images: A modular approach with ensemble of convolutional neural networks. AMIA Annual Symposium Proceedings 2015: 1899–1908 (2015).

6 Jun Xu, Xiaofei Luo, Guanhao Wang, Hannah Gilmore & Anant Madabhushi, A deep convolutional neural network for segmenting and classifying epithelial and stromal regions in histopathological images. Neurocomputing, 191: 214-223 (2016).

7 Konstantinos Kamnitsas, Christian Baumgartner, Christian Ledig, Virginia Newcombe, Joanna Simpson, Andrew Kane, David Menon, Aditya Nori, Antonio Criminisi, Daniel Rueckert & Ben Glocker, Unsupervised domain adaptation in brain lesion segmentation with adversarial networks. arXiv:1612.08894v1 [cs.CV] (2016).

8 Joann Elmore, Gary Longton, Patricia Carney, et al, Diagnostic Concordance Among Pathologists Interpreting Breast Biopsy Specimens. JAMA, 313(11):1122-1132 (2015).


What’s all the fuss about deep learning and how will it impact pathology?  Part 2: Deep Learning for Image Recognition


Deep Learning: Learns increasingly complex features, Source: Andrew Ng

The limiting factor of traditional machine learning and computer vision technology before the recent rise of deep learning algorithms was the handcrafted feature extraction, with a human telling the algorithm what to look for and how to classify an image. Hand engineering an algorithm to detect an object in an image is a long and laborious process that requires an expert to extract relevant features and mostly leads to unsatisfying results. Thousands of computer vision experts working on these technologies for many years could not achieve what a three-year-old child learns by looking at millions of images.

So the basic concept behind deep learning for image recognition is to let the Deep Learning algorithm extract the features itself that are needed to classify an image based on a large set of training images. 

This is typically achieved by a hierarchical approach in layers, detecting simpler features and patterns like e.g. (1) light and dark pixels and (2) shapes and edges and then combining them to larger structures like (3) eyes, noses or mouths and finally a (4) human face.

Convolutional Neural Networks

A very popular type of neural network that has proven to be very effective at image recognition tasks is the Convolutional Neural Network (CNN). We don’t want to dig too deep here and will just stick with a rough overview to illustrate how CNNs succeed in reducing complexity to perform this task. Convolution in this context can be viewed as the process of filtering an image for specific patterns. Convolutional Networks combine the following types of layers performing different tasks.

  • Convolutional layers: looking for patterns in the data
  • Rectified Linear Units (ReLUs): applying a non-linear activation that sets negative values to zero, which allows stacked layers to combine patterns to larger structures
  • Pooling layer: reducing complexity
  • Fully connected layer: connecting the findings with labeled data for classification
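The four layer types listed above can be sketched in a few lines of plain Python. This is only a toy illustration under simplifying assumptions: a single 6x6 grayscale image, one hand-picked edge filter and hand-picked weights. In a real CNN the filter and the weights are learned from training data, and all names used here are hypothetical.

```python
def convolve2d(image, kernel):
    """Convolutional layer: slide the filter over the image and record the match."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(feature_map):
    """ReLU: set negative responses to zero."""
    return [[max(v, 0.0) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Pooling layer: keep only the strongest response in each size x size block."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [[max(feature_map[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


def fully_connected(feature_map, weights, bias):
    """Fully connected layer: weighted sum of all pooled values per class."""
    flat = [v for row in feature_map for v in row]
    return [sum(x * w for x, w in zip(flat, w_row)) + b
            for w_row, b in zip(weights, bias)]


# Toy image: dark on the left, bright on the right (a vertical edge)
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
edge_kernel = [[-1.0, 1.0], [-1.0, 1.0]]  # hand-picked filter responding to dark-to-bright edges

feature_map = relu(convolve2d(image, edge_kernel))  # 5x5 map of edge responses
pooled = max_pool(feature_map)                      # 2x2 summary
weights = [[0.5, 0.5, 0.5, 0.5],                    # hypothetical class "edge"
           [-0.5, -0.5, -0.5, -0.5]]                # hypothetical class "no edge"
scores = fully_connected(pooled, weights, [0.0, 0.0])
print(pooled, scores)
```

The pooling step shrinks the 5x5 feature map to 2x2 while keeping the information that an edge was found, which is the sense in which pooling reduces complexity.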

Source: Course Notes Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition by Andrej Karpathy

Training a machine like a child

Inspired by the idea of providing computers with the same kind of learning experiences a child would have in his or her early developmental years, meaning a very large quantity of images, a team of researchers led by Fei-Fei Li (Stanford University) undertook a huge crowdsourcing effort to create a database containing millions of annotated images, the ImageNet database. While back in 2007 Fei-Fei Li was advised by colleagues to do something more useful for her tenure, the massive effort put into this project laid the foundation for the breakthrough of neural networks in computer vision by providing them with a very large set of labeled data.1

The image recognition revolution

In 2012 the first model based on a CNN, AlexNet, was submitted to the annual ImageNet Large Scale Visual Recognition Challenge by a team from the University of Toronto. This CNN brought the error rate down to 15%, compared to 26% for the best conventional machine vision solution.2

The winning solution of the 2015 challenge, submitted by a team from Microsoft, was the first CNN to beat a human, bringing the error down to 3.5% compared to the human benchmark of 5.1%. Driven by these breakthroughs, CNNs are finding their way into all fields of computer vision, including many medical and life science applications such as radiology, pathology and genomics. And while computer vision solutions based on handcrafted algorithms were struggling to achieve acceptable results, deep learning based solutions, e.g. for cancer detection, suddenly appear to be a viable option to assist doctors in diagnosis.

How do CNNs “see” images?

Photo: Janko Ferlic, Unsplash

For a long time researchers did not really have a clear idea what exactly happens inside a neural network. The goal of the Deep Visualization project is to provide a better understanding of this by visualizing how CNNs are seeing images. Researchers created images synthetically to maximally activate individual neurons in a Deep Neural Network (DNN). The images show what each neuron “wants to see”, and thus what each neuron has learned to look for.3

Source: Jason Yosinski

So are we finally there? 

Can we train machines to see and process visual information like humans? And will they be able to replace us e.g. in driving our cars or curing our diseases? Not yet, according to Olga Russakovsky, one of the ImageNet challenge’s organizers. In an article in New Scientist she points out that the algorithms only have to identify images as belonging to one of a thousand categories, which is tiny compared to what humans are capable of. To show true intelligence, machines would have to draw inferences about the wider context of an image, and what might happen one second after a picture was taken.4

In addition to the limitations in the type of tasks they can perform and in adding context to the information, neural networks are also easily fooled. GoogLeNet, the winning submission of the ImageNet challenge 2014, struggled to recognize images to which filters had been applied (e.g. as on Instagram) or that depicted objects in an abstract form like a 3D rendering, painting or sketch.5 A Two-Minute-Paper Session on “Breaking Deep Learning Systems With Adversarial Examples” illustrates how easily neural networks can be fooled by adding noise to the images.6 Another study shows how neural networks make high confidence predictions for images that are unrecognizable for humans.7

The recent news that Google/Alphabet is changing its plans from developing a self-driving car to partnering with car manufacturers and equipping cars with sensors8 could be seen as a symptom of the fact that technical development is slower than expected and that technology still has a long way to go before we will be able to develop truly intelligent machines and replace humans in complex tasks like driving a car or diagnosing cancer. But today’s neural networks performing image recognition are capable of providing us with valuable assistance in such tasks by recognizing obstacles on the road or patterns in tissue that indicate the presence of cancer.


What’s all the fuss about deep learning and how will it impact pathology? (Part 1)


Technology and daily media are full of stories about how artificial intelligence (AI) will change our lives and is already doing so today. Andrew Ng, chief scientist and leading deep learning researcher at Baidu Research, was recently cited as follows: “AI is the new electricity ... Just as 100 years ago electricity transformed industry after industry, AI will now do the same.”1

AI and deep learning algorithms have already found their way into our daily lives through digital assistants on our smart phones capable of recognizing human speech, the selection of news stories according to our own personal interests, and the recognition of the names of our friends in Facebook images.2

While there is potential and interest in AI from almost every industry, healthcare and life science are among the main industries apart from the original tech industry that are adopting these technologies today for applications like drug development or medical image analysis, e.g. performed on digital whole slide images (WSI) scanned from histological sections.3 In recent months we have seen a number of promising results using artificial intelligence, e.g. for cancer detection on WSI, both in scientific publications (overview in Part 3) and in commercial efforts by digital pathology vendors.

Invasive ductal carcinoma (IDC) segmentation using deep learning (probability heatmap), Source: Andrew Janowczyk

The technology that is the enabler of the latest AI breakthroughs is called Deep Learning. The basic idea behind deep learning is to enable machines to learn e.g. speech or image recognition by themselves based on a set of training data without a human programmer “teaching them” the features to look for. This is achieved by using deep neural networks - a way of learning inspired by the human brain. Having spent the last two years with a toddler, witnessing the absolutely amazing development of a new human brain learning how to speak, walk and name all animals in the local zoo at an incredible speed, I find it hard to believe that any kind of algorithm running on a machine could ever reproduce this. So let’s take a closer look at the state of the art and what today’s algorithms are capable of.

In pathology deep learning could help computers look at tissue and recognize patterns such as a tumor more like a pathologist by providing them with the proper training. In this way deep learning could help to overcome existing restrictions in digital image analysis such as the dependency on human designed features. So could deep learning help image analysis to become the killer app for digital pathology? Of course we cannot give a definite answer to this question today. But with all the opportunity for pathologists and patients in mind, it is a good time to take a closer look at some of the technical concepts behind deep learning without digging into all the math behind it and the reasons for the latest revolution (Part 1), their application to image recognition (Part 2), and some of the possible applications in digital pathology and how the technology might impact pathology in general (Part 3).

Part 1: AI, Machine Learning and Deep Learning in a nutshell

Artificial Intelligence, Machine Learning and Deep Learning are often mentioned in the same context. Let’s look at some definitions to clarify. AI is “the science and engineering of making intelligent machines, especially intelligent computer programs.”4 Machine Learning is a field within AI and “the science of getting computers to act without being explicitly programmed.”5 Deep Learning is a specific technology for machine learning utilizing layered, so-called “deep” neural networks and vast quantities of data to discover the intricate structure in large data sets. A network once trained on a training data set can be applied to new data of the same type. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics.6 7

Feature Design

One of the big challenges with traditional machine learning models using handcrafted algorithms is a process called feature design. The human programmer needs to tell the computer what kind of things it should be looking for to make a decision. This places a huge burden on the programmer, and the algorithm's effectiveness relies heavily on how insightful the programmer is. For complex problems such as diagnosing cancer on histology sections this is a huge challenge that basically requires the knowledge of experienced pathologists. Deep learning is a method to overcome the need for handcrafted feature extraction. The models are capable of learning the right features by themselves, given the proper configuration by the programmer and a proper set of training data. Deep learning algorithms are even capable of finding features that no human would have been able to devise. In the first two layers the features often have some relation to attributes humans have devised in the past. In deeper layers of the network the features become so bizarre that no human could possibly have devised them. This makes deep learning an extremely powerful tool for all kinds of applications that involve pattern recognition.

How does it work (apart from all the math)

One application of neural networks  is the classification of objects, especially the recognition of complex patterns. A classifier takes the input and computes a confidence score that indicates if an object belongs to a certain category or not e.g. if an image contains a cat or not. A neural network  can be seen as a combination of classifiers structured in a layered web that consists of an input layer, an output layer and multiple hidden layers in between. The network combines the scores of each of the activated neurons with certain weights and biases leading to the output which is again a confidence score that e.g. an object belongs to a certain category.  This process is called forward propagation. The accuracy of the network’s prediction is improved by training. During the training process, weights and biases are adjusted in a way that the predicted output gets as close as possible to the actual output. This process, called backward propagation, starts with the output and works backwards through the network.
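To illustrate forward and backward propagation without all the math, here is a minimal sketch of a network with one hidden layer, written in plain Python. The task (learning the logical OR of two inputs), the layer sizes, the sigmoid activation, the squared error and the learning rate are all assumptions chosen to keep the example small. Real deep networks are trained with specialized frameworks on GPUs.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    """Forward propagation: input -> hidden layer -> output confidence score."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(w_row, x)) + b)
              for w_row, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def train_epoch(data, params, learning_rate):
    """Backward propagation: adjust weights and biases, starting from the output."""
    w_hidden, b_hidden, w_out, b_out = params
    for x, target in data:
        hidden, output = forward(x, (w_hidden, b_hidden, w_out, b_out))
        # Error signal at the output (derivative of 0.5 * (output - target)^2)
        delta_out = (output - target) * output * (1.0 - output)
        # Pass the error signal backwards to each hidden neuron
        delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
        # Gradient descent updates for the output layer, then the hidden layer
        w_out = [w - learning_rate * delta_out * h for w, h in zip(w_out, hidden)]
        b_out = b_out - learning_rate * delta_out
        w_hidden = [[w - learning_rate * d * xi for w, xi in zip(w_row, x)]
                    for w_row, d in zip(w_hidden, delta_hidden)]
        b_hidden = [b - learning_rate * d for b, d in zip(b_hidden, delta_hidden)]
    return w_hidden, b_hidden, w_out, b_out


def mean_squared_error(data, params):
    return sum((forward(x, params)[1] - t) ** 2 for x, t in data) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
params = ([[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)],  # hidden weights
          [0.0] * 3,                                                   # hidden biases
          [rng.uniform(-1, 1) for _ in range(3)],                      # output weights
          0.0)                                                         # output bias
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

loss_before = mean_squared_error(data, params)
for _ in range(2000):
    params = train_epoch(data, params, learning_rate=0.5)
loss_after = mean_squared_error(data, params)
print(loss_before, loss_after)
```

The error printed after training should be lower than the one before, which is all that training means here: the predicted output has moved closer to the actual output.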

Neural networks are called “deep” when they consist of many hidden layers instead of just one or two. This enables them to recognize complex patterns such as a human face in an image. The first layer would detect simpler patterns like edges, the next layers would combine them to larger structures like a nose, followed by more complex patterns such as a human face on the next layer. These deep networks were inspired by the deep architecture of human brains. Due to their complexity deep networks need a lot of computing power, typically provided by large Graphics Processing Units (GPUs).

Why is it all happening right now?

The basic concepts behind deep learning, such as neural networks, backpropagation and even convolutional neural networks (CNN, more on this in part 2), have been around for decades. For a long time, however, artificial intelligence was more a theme for science fiction movies than something with a real impact on our daily lives. So what are the reasons for the sudden breakthroughs in this field? Apart from technical advances, Geoff Hinton (Google/University of Toronto), one of the deep learning masterminds, mentions two key reasons8 for the recent revolution starting from 2006:
  • Availability of large amounts of labeled data such as the ImageNet Database
  • and massive computing power provided by large GPUs

Deep Learning Timeline

Learning Resources

If you want to learn in more detail about deep learning, machine learning and neural networks, there are a variety of resources freely available online. Here are some recommendations that are easy to understand, even if you don’t have a computer science background: 

More technical: 

In part 2 of our series we will take a closer look at deep learning methods for image recognition and the current state of the art. Part 3 will focus on how these methods can be applied to digital pathology image analysis and how this might impact pathology in general. 


Digital Multiplexing of Brightfield and Fluorescence Whole Slide Images


Apart from all the technical differences in image acquisition, brightfield and fluorescence images can provide pathologists with different insights on the assay. While brightfield images (e.g. stained in H&E) are the primary choice for observing the morphology of the tissue, fluorescence images are better for the visualization of the cellular details. Digital multiplexing of brightfield and fluorescence images can combine the benefits of both worlds while maintaining the individual characteristics of each of the modalities. Many digital pathology scanners available today are equipped to scan images in both brightfield and fluorescence mode. When using both modes for a tissue sample, adjacent sections are typically prepared and scanned separately, so the resulting whole slide images (WSIs) need to be reviewed or analyzed individually. By using image registration the images can be co-located to facilitate visual evaluation or automated image analysis.

Overview Brightfield vs. Fluorescence Whole Slide Imaging

Image Characteristics
  • Brightfield: Shows the whole structure of the tissue incl. surroundings
  • Fluorescence: Shows only what has been stained positively, better at specific cellular location

Ease of Reading
  • Brightfield: Reading slides requires years of histopathology expertise
  • Fluorescence: Easier to read due to eliminated background

Image Acquisition
  • Brightfield: Faster and easier, equipment is often less costly
  • Fluorescence: More complex setup, can take longer and be more expensive

Image Channels
  • Brightfield: Images are acquired and stored in RGB mode
  • Fluorescence: Images are acquired and stored with individual channels

Common Staining
  • Brightfield: H&E, IHC, CISH, other visible stains
  • Fluorescence: Immunofluorescence, FISH

Typical Usage
  • Brightfield: Most commonly used in clinical routine due to ease of use, FDA clearance for various vendors
  • Fluorescence: Used in research applications, FDA clearance and widely used in clinical routine on FISH assays

How to Combine the Best of Both Worlds

Aligned images in brightfield (H&E) and fluorescence (Rhodamine, DAPI, FITC, Cy5) 

Aligning images from multiple modalities can help aggregate information from consecutive sections stained with histochemical and fluorescent dyes, e.g. protein expression and gene amplification by aligning immunohistochemical (IHC) markers with fluorescence in-situ hybridization (FISH).
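To give an idea of what aligning two images means in its simplest form, the following Python sketch searches for the integer shift that best maps one small grayscale image onto another. This is a deliberately naive, translation-only search chosen for illustration. It is not the method used by any particular product, and real whole slide images of differently stained sections need far more elaborate (e.g. multi-resolution and deformable) registration. The function name and the toy images are hypothetical.

```python
def best_shift(fixed, moving, max_shift=2):
    """Find the integer (dy, dx) for which moving[y + dy][x + dx] best matches fixed[y][x].

    Every shift within +/- max_shift is tried and the one with the lowest
    mean squared difference over the overlapping area is returned.
    """
    height, width = len(fixed), len(fixed[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for y in range(height):
                for x in range(width):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        total += (fixed[y][x] - moving[yy][xx]) ** 2
                        count += 1
            error = total / count
            if best is None or error < best[0]:
                best = (error, dy, dx)
    return best[1], best[2]


def blank(size=6):
    return [[0.0] * size for _ in range(size)]


# Toy "sections": the same small structure, shifted one pixel down and two to the right
fixed = blank()
for y, x in [(1, 1), (1, 2), (2, 1)]:
    fixed[y][x] = 1.0
moving = blank()
for y, x in [(2, 3), (2, 4), (3, 3)]:
    moving[y][x] = 1.0

print(best_shift(fixed, moving))
```

Comparing raw intensities only works when both images look alike. For a brightfield and a fluorescence image, a similarity measure that tolerates different appearances would be needed instead of the squared difference.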

The aligned images can then be used to facilitate the visual evaluation by a pathologist or automated image analysis. There is a wide range of applications conceivable for this method from cancer research and diagnosis to drug development and biomarker discovery. One of the most prominent ones might be research and diagnosis of HER2-positive breast cancer.


Sample Application: Multiplex-Analysis of HER2-Positive Breast Cancer

HER2-positive breast cancer tends to be more aggressive, so targeted therapies have been developed for the treatment of this particular type of cancer. Existing image-based solutions carry out the pathological scoring in breast cancer either based on immunohistochemical (IHC) staining, which marks the HER2 protein on the cell membrane, or based on FISH, which marks the HER2 gene, i.e. the deoxyribonucleic acid (DNA) that contains the genetic information to produce the HER2 protein. By using the aligned images the scoring of HER2 protein overexpression and gene amplification could be combined. This can be achieved by employing automatic image registration to align consecutive sections that have been stained with IHC HER2 and FISH. After the alignment, corresponding regions on the sections can be identified quickly and the results of both modalities can be correlated spatially. This could help to overcome the limitations of current tests and help reduce uncertainty in diagnosis.

Brightfield and Fluorescence Multiplexing as Part of Slidematch

microDimensions will introduce the capability to align brightfield and fluorescence whole slide images in the next release of Slidematch - a software for the alignment of differently stained sections. With the new feature you can align WSIs in both modalities at a very high precision, limited only by the scanner resolution. The import wizard lets you load images in formats from different vendors and in different resolutions in the same batch. The images can be stored as an aligned image series in the .svs, .tif, or .ims format for viewing or automated image analysis.

The new feature will also be presented at Pathology Visions 2016, 23 - 25 October in San Diego, California. Come by our booth #310 to learn more. We are looking forward to your feedback!


Camelyon Challenge - Are today’s algorithms ready to diagnose cancer?


The challenge

The Camelyon Challenge is organized by a group of researchers (from Radboud UMC, UMC Utrecht and TU Eindhoven) in conjunction with the ISBI (International Symposium on Biomedical Imaging) and will run for two years. It is the first challenge using whole-slide images in histopathology. The goal is to evaluate new and existing algorithms for automated detection of metastases in hematoxylin and eosin (H&E) stained whole-slide images of lymph node sections.

This task has a high clinical relevance and requires large amounts of reading time from pathologists. A successful solution could help to reduce the workload of the pathologists, reduce the subjectivity in diagnosis and lower the cost of the diagnosis. The 2016 challenge focuses on the detection of metastases in sentinel lymph nodes of breast cancer patients. The data sets provided by the Radboud University Medical Center (Nijmegen, the Netherlands) and the University Medical Center Utrecht (Utrecht, the Netherlands) consist of 400 annotated whole slide images, of which 270 are training images and 130 are test images. The images have been exhaustively annotated, with an average annotation time of 1 hour. The first round of the challenge was completed with a challenge workshop held at ISBI 2016 on April 13th. The submission page was reopened on April 14 for new submissions.


Example slides from the challenge. Source: "Statistics, Leaderboards, Results and Comparison to Pathologist", presented at ISBI 2016 by Babak Ehteshami Bejnordi

As of July 26th there have been 25 submissions. Submissions are evaluated both by slide and by lesion. The ranking for whole slide classification is created by performing a receiver operating characteristic (ROC) analysis, and the measure used for comparing the algorithms is the area under the ROC curve (AUC). The ROC curve is the plot of the true positive rate (or sensitivity) versus the false positive rate. The ranking for tumor localization is based on a free-response receiver operating characteristic (FROC) curve. The FROC curve is defined as the plot of sensitivity versus the average number of false positives per image. Results are published in two leaderboards on the challenge website. Both leaderboards are currently led by a team from Harvard Medical School and MIT, with an AUC of 0.92 for whole slide classification (Leaderboard 1) and a score of 0.7051 for tumor localization (Leaderboard 2). On Leaderboard 1 they are followed by EXB Research and Development co., Germany, and the Chinese University of Hong Kong (CU lab), Hong Kong. On Leaderboard 2 they are followed by Radboud University Medical Center (DIAG), Netherlands, and the Chinese University of Hong Kong (CU lab).
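To make the two curve definitions more concrete, here is a minimal Python sketch of how the points of a ROC curve and of an FROC curve could be computed from scored predictions. It is not the official Camelyon evaluation code. The function names `roc_points` and `froc_points`, the input layout, and the simplification that every correct detection hits a different lesion are assumptions made purely for illustration.

```python
def roc_points(scores, labels):
    """Slide-level ROC: one (false positive rate, true positive rate) pair per threshold.

    scores: predicted tumor probability per slide (illustrative input format)
    labels: 1 for a slide containing metastases, 0 for a normal slide
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def froc_points(detections, n_lesions, n_images):
    """Lesion-level FROC: (average false positives per image, sensitivity) pairs.

    detections: (confidence, hits_lesion) tuples pooled over all images.
    Assumption: each detection with hits_lesion=True hits a different lesion.
    """
    points = []
    for threshold in sorted({c for c, _ in detections}, reverse=True):
        tp = sum(1 for c, hit in detections if c >= threshold and hit)
        fp = sum(1 for c, hit in detections if c >= threshold and not hit)
        points.append((fp / n_images, tp / n_lesions))
    return points
```

Lowering the threshold moves both curves to the right: more lesions or positive slides are found, at the price of more false positives. The difference is the x-axis, which is a rate for ROC and an unbounded count per image for FROC.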

See all results

What do these values actually mean?

The AUC measures the area under the ROC curve and is a commonly used measure in the machine learning community to compare models. The AUC of a classifier is equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance. The closer the ROC curve gets to the optimal point of perfect prediction, the closer the AUC gets to 1. The leading solution submitted by HMS/MIT achieved an AUC value of 0.92, which is only 0.04 points short of the value of 0.96 that was achieved by a human pathologist. Combining the pathologist's analysis with the two best submitted methods, the AUC value reaches 0.99, which would lead to a significant reduction of error in cancer diagnosis.
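The ranking interpretation of the AUC can be checked directly on a small example. The following Python sketch counts, over all pairs of one positive and one negative slide, how often the positive slide receives the higher score. The function name `auc_by_ranking` and the convention of counting ties as half a win are assumptions for illustration, and the numbers are made up rather than taken from the challenge.

```python
def auc_by_ranking(scores, labels):
    """AUC as the fraction of (positive, negative) pairs that are ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive slide ranked above negative slide
            elif p == n:
                wins += 0.5   # tie counted as half a win (assumed convention)
    return wins / (len(positives) * len(negatives))


# Toy example: two tumor slides (label 1) and two normal slides (label 0)
print(auc_by_ranking([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 3 of 4 pairs correct -> 0.75
```

An AUC of 0.92 can therefore be read as follows: given one randomly chosen tumor slide and one randomly chosen normal slide, the algorithm assigns the higher score to the tumor slide in about 92% of cases.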

Image classification error of the ImageNet challenge winning systems vs. human benchmark

Is it just a question of time until algorithms beat the human eye?  

Looking at the results of the ImageNet Large Scale Visual Recognition Challenge, a competition with classification and localization tasks that has been held since 2010 and uses a test data set of 150,000 images, the answer appears to be yes. The winning system of the 2015 challenge, from Microsoft, had a classification error rate of 3.5%, beating the human benchmark of 5.1% for the first time.

It looks like the organizers of the Camelyon Challenge are convinced there is still potential for improvement in cancer recognition on histopathological images, as they reopened the submission page the day after the ISBI 2016 workshop on April 13th. The ranking is updated as new methods are submitted, so it’s definitely worth keeping an eye on it. The Camelyon 17 Challenge will be held in conjunction with ISBI 2017 in Melbourne, Australia.

One of the goals of the challenge was to find solutions for a problem with high relevance in clinical practice and to evaluate their performance in a standardized, quantified and reproducible form. Apart from leaderboards and numbers, the most exciting question will be how and when these methods can be implemented in clinical routine to really improve cancer diagnosis and patient care.

"Three years ago when we started developing algorithms for detecting lymph node metastases of breast cancer, many pathologists in our hospital considered the task nearly impossible for computers, as they believe it's a very complex visual task. The results of the Camelyon16 challenge, however, showed that state-of-the-art artificial intelligence techniques achieve near-human performance in diagnosing breast cancer. While these results are very promising, before introducing these systems into routine clinical practice, we have to validate them rigorously on a large number of samples. The Camelyon 17 challenge would be a major step towards reaching this objective. We are going to provide participants with a much larger multi-center data set to train and test their systems. As a result of next year's challenge, I would envision a number of algorithms that not only produce accurate and reproducible results but also outperform anatomic pathologists in routine diagnostic settings." - Babak Ehteshami Bejnordi, Radboud University Medical Center, lead coordinator of the Camelyon Challenge

More Information


Berlin Calling: What was going on at ECDP 2016?



Berlin called and the European digital pathology community gathered for the 13th European Congress on Digital Pathology from May 25th to 28th. The scientific program included 107 presentations and attracted 262 attendees from 25 countries. A poll during the keynote speech given by Horst Karl Hahn of Fraunhofer MEVIS showed that the attendees were split about evenly between pathologists and computer scientists.

Europe is picking up speed

The general feedback from many attendees, on both the industry and the user side, was that digital pathology is picking up speed in Europe, even if there are big differences between individual countries. While there is already a lot of activity in the Scandinavian countries, the Netherlands and Spain, other countries like Germany have been more reluctant to adopt digitization in recent years. According to Jörg Maas, the General Secretary of the DGP, there has lately been a massive increase of interest in digital pathology among German pathologists as well, which is one of the reasons they took over this conference (3). A big step for digital pathology in Europe certainly is the foundation of the European Society of Digital Pathology, with Marcial Garcia Rojo as President, which was announced on May 27th during the Report from the ECDP Council.

Strong focus on image analysis

This time the scientific program had a strong focus on image analysis, with three sessions covering Quantification, Tissue Modelling, and Graphs & Topology. Many of the presentations introduced methods for automated ROI detection and quantification. A hot topic in image analysis is the use of deep learning algorithms and how they can be applied in digital pathology, as shown for example in a presentation on “Deep convolutional neural networks for histological image analysis in gastric carcinoma whole slide images” by Harshita Sharma of the TU Berlin.

How can innovations developed for research be adopted in clinical environments faster?

A panel on imaging in clinics and research, chaired by J. Lundin (Helsinki, Finland) and G. Kayser (Freiburg, Germany), discussed how the clinical adoption of methods developed for research can be sped up.

Here are some of the main discussion points: 

  • Collaboration on validation: The validation of systems and devices is a long and costly process. A multi-centric approach that shares both the effort and the funding could speed up the validation process and thereby help to implement innovations.
  • More pressure from the pathology community: The pathology community could put more pressure on funding agencies to drive innovation forward, supported by consistent communication that highlights benefits such as savings for healthcare systems. This could be achieved through more scientific studies that show how digital pathology will improve the quality and efficiency of care and lead to better diagnoses. Improved communication between industry and pathologists could support the development of tools that better suit pathologists' needs.
  • Interdisciplinary diagnostic teams: In interdisciplinary diagnostic teams working with integrated data from multiple disciplines (e.g. radiology, genetics, immunology), pathologists could be the drivers of these teams as the experts for medical imaging, since they really know what a disease looks like.
  • Pharma industry as an innovation driver: The pharma industry has a strong interest in detailed information and image analysis methods to be used in drug development and precision medicine. It could provide the necessary investment to take digital pathology to a level where tools and systems are ready for clinical implementation. This development in precision medicine and individualized therapies might also create a need for higher precision in clinical pathology that can only be achieved by machines and not by the human eye.
  • Big data and deep learning: A precondition for exploiting the full potential of digitization is the availability of common (big) data repositories, especially for the development of image analysis algorithms based on deep learning. The amount of training data is a critical factor for the success of deep learning methods. To provide such data, the panel identified the need for a common repository containing anonymized data that can be used for training and testing analysis tools. One way to reach this common repository could be a bottom-up approach that integrates existing repositories, rather than a big top-down initiative.

Image analysis on multiple slides in different stains

One of the main topics discussed during conversations at our booth was image analysis across multiple slides. Working with multiple slides of the same tissue in different IHC stainings is becoming more and more interesting, particularly in applications such as tumor score computation, biomarker development, or companion diagnostics. R. Røge from Aalborg University Hospital, Denmark presented work on “Validation of Virtual Double Staining for Estimation of Ki67 Proliferation Indices in Breast Carcinomas” using stereology on slides stained for Ki67 and pan-cytokeratin. In order to perform analysis on multiple stains efficiently, the digital slides need to be accurately aligned. In this interview, Andreas Keil, our VP of Business Development, explains how our software Slidematch can be used as a tool to automate the alignment of digital whole slides.

Which scanner is the best?

A hot event, especially for the equipment manufacturers but also for users, was the 3rd International Scanner Contest, where manufacturers could bring in their top devices to compete against each other in the following disciplines: high throughput, quality, validated versatility, technical and image analysis. The results are intended to give a neutral and valuable insight into the market. They will be published here. Results of past contests are available as well.

More Information on ECDP16

  1. More reports and impressions from Berlin can be found on twitter at https://twitter.com/ecdp2016.

  2. Recordings of the sessions are available here: http://patho-wsi.charite.de/pub/ECDP2016/ECDP-DATA.htm

  3. ECDP 2016 – Interview with Jörg Maas, General Secretary of the Deutsche Gesellschaft für Pathologie e.V. (DGP): https://www.youtube.com/watch?v=wYdN4ikrhsM


Upcoming Events in Europe

The next ECDP will be held in 2018 in Helsinki, organized by Johan Lundin and Jorma Isola. The next big European event, coming up in September (25th - 29th), is the XXXI International Congress of the International Academy of Pathology and the 28th Congress of the European Society of Pathology (ECP), which will be organised as a joint venture for the first time. More Information