Embedding deep networks into visual explanations☆
Introduction
Deep learning has made significant strides in recent years, surpassing human performance in many tasks such as image classification [1], [2], playing Go [3], and classification of medical images [4]. However, the use of deep learning in real applications still must overcome a trust barrier. Imagine a doctor facing the deep learning prediction "this CT image indicates malignant cancer", or a pilot facing the prediction "make an emergency landing immediately". These predictions may be backed by a claimed high accuracy on benchmarks, but it is human nature not to trust them unless we are convinced that they are reasonable for each individual case. The lack of trust is worsened by known cases where adversarial examples can fool deep networks into producing wrong answers [5], [6]. In order to establish trust, humans need to understand how deep learning makes decisions. Such understanding could also help humans gain additional insights into new problems, potentially improve deep learning algorithms, and improve human-machine collaboration.
Dictionaries often explain a concept in the form "A is something because of B, C, and D", e.g. this is a bird because it has feathers, wings, and a beak. This type of explanation has two properties. First, it is concise: there is no need for a hundred reasons adding up to define A. Second, it relies on B, C, and D, which are themselves high-level concepts. Both of these properties are often at odds with standard deep learning predictions, which are combinations of outputs from thousands of neurons across dozens of layers.
Approaches have been proposed to visualize each of the filters [7] and to have humans name them [8], but it is difficult for these approaches to obtain a concise representation. On the other hand, many other approaches generate attention maps that backtrack a decision to important parts of the original image [9], [10], [11], [12], [13]. These maps are often informative, but they explain only individual images and do not provide high-level concepts that apply broadly to all images in a category.
In this paper, we attempt to make explanations similar to "A is something because of B, C, D" by extracting several high-level concepts from deep networks to aid human understanding (Fig. 1(a)). Our model attaches a separate explanation network (XNN) to a layer in the original deep network and reduces that layer's representation to a few concepts (named the Explanation Space), from which one can generate predictions similar to the original deep network (Fig. 1(b)). We also show that the visualizations of the concepts generated by our method are human understandable.
Our model infers concepts directly from the deep network. It does NOT train from ground-truth concepts defined by humans, whether as labels, attributes, or text. We deliberately choose not to use human concepts in order to accommodate future situations where the deep network performs a task in a domain in which humans lack expert knowledge. An example would be early-stage cancer prediction from medical imaging, where it may not be entirely clear which patterns in the image lead to cancer, but a deep network may be able to find them reliably. XNN could then summarize these concepts, provide guidance to human experts, and allow them to verify this tangible knowledge derived from the DNN and highlighted by XNN.
We evaluate our approach in two ways: 1) human evaluation, where participants are presented with different explanations to determine which one improves their categorization capabilities; and 2) metric-based evaluation, where we define quantitative metrics for the aforementioned properties of an explanation network and evaluate them on two different datasets: a fine-grained bird classification dataset and a scene recognition dataset, both of which have rich ground-truth annotations that allow us to compute the introduced metrics. Although the experiments in the paper focus on convolutional neural networks (CNNs) applied to images, the explanation framework we develop is general and applicable to other types of deep networks as well.
This paper is an extension of our NeurIPS 2017 workshop publication [14], which predates most other related approaches. We believe this was one of the first steps towards general explainable deep learning that can advance human knowledge and enhance future collaboration between humans and machines. In this version, we improved the loss function and performed a comprehensive evaluation involving novel quantitative metrics as well as human studies.
Our contributions in this paper are as follows:
- We propose a novel explanation network to form a low-dimensional, explainable concept space from deep networks. A sparse reconstruction auto-encoder (SRAE) with a pull-away term is proposed to make the explanation network faithful and orthogonal as defined previously (see the sketch after this list).
- We present a visualization paradigm for human understanding of the concept space.
- We present a user study showing that our explanations can improve human performance on difficult tasks.
- We propose automatic quantitative metrics to evaluate the performance of an explanation algorithm in terms of faithfulness, locality, and orthogonality. Experimental results show that the proposed explanation methods provide insights into the inner workings of deep network models.
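A minimal sketch of how such an objective could look in PyTorch is given below. The squared-cosine pull-away term follows the form used in energy-based GANs (cited in the references); applying it across the concept columns of a mini-batch, the loss weights `beta` and `gamma`, and the plain MSE reconstruction term are our illustrative assumptions rather than the paper's exact formulation.

```python
import torch.nn.functional as F

def pull_away_term(e):
    """Pull-away term encouraging orthogonality between concepts.

    Assumption: each concept is represented by one column of the
    explanation-space activations over a mini-batch, and the term
    penalizes the squared pairwise cosine similarity between columns.

    e: (batch, k) explanation-space activations.
    """
    en = F.normalize(e, dim=0)               # unit-norm activation vector per concept
    cos2 = (en.t() @ en).pow(2)              # (k, k) squared cosine similarities
    k = e.size(1)
    return (cos2.sum() - k) / (k * (k - 1))  # mean over off-diagonal pairs

def srae_loss(y_xnn, y_dnn, x_rec, x_feat, e, beta=1.0, gamma=0.1):
    """Sketch of an SRAE-style objective (weights beta/gamma are assumed).

    y_xnn : XNN prediction, which should mimic y_dnn (faithfulness)
    y_dnn : output of the original deep network, treated as the target
    x_rec : reconstruction of the attached layer's features
    x_feat: the attached layer's features themselves
    e     : (batch, k) explanation-space activations
    """
    faithfulness = F.mse_loss(y_xnn, y_dnn)     # mimic the DNN output
    reconstruction = F.mse_loss(x_rec, x_feat)  # keep concepts tied to the input features
    orthogonality = pull_away_term(e)           # avoid degenerate, redundant concepts
    return faithfulness + beta * reconstruction + gamma * orthogonality
```

In this form, the faithfulness term ties the XNN to the DNN's decisions, while the pull-away term keeps the few learned concepts from collapsing onto one another.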
The Explanation Network
Given a deep network (DNN) as a prediction model, we propose to learn an extra Explanation Neural Network (XNN) (Fig. 1(b)), which can be attached to any intermediate layer of the DNN. The XNN learns an embedding into a low-dimensional explanation space, and then directly learns a mapping from the explanation space that mimics the output of the original DNN model. We denote the input feature space of the XNN as …, where x are the input features (in the case of CNNs, an image) and W …
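To make the setup concrete, here is a minimal PyTorch sketch of such an attachable XNN; the layer widths, the linear prediction head, and the `dnn.layer4` hook target are illustrative assumptions, not the architecture used in the paper.

```python
import torch.nn as nn

class XNN(nn.Module):
    """Sketch of an explanation network attached to one DNN layer.

    Embeds the chosen layer's features into a k-dimensional explanation
    space, then maps that space to a prediction that should mimic the
    original DNN's output.
    """
    def __init__(self, feat_dim, k=5, out_dim=1):
        super().__init__()
        self.embed = nn.Sequential(           # layer features -> explanation space
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, k),
        )
        self.predict = nn.Linear(k, out_dim)  # explanation space -> mimicked output

    def forward(self, feats):
        e = self.embed(feats.flatten(1))      # k concept activations per sample
        y = self.predict(e)                   # trained to match the DNN prediction
        return e, y

# The attached layer's activations can be captured with a forward hook
# on the original network, e.g. (layer name is hypothetical):
#   feats = {}
#   dnn.layer4.register_forward_hook(lambda m, i, o: feats.update(out=o))
```

Training would then minimize an objective such as the SRAE loss sketched earlier, with the original DNN held fixed so that the XNN summarizes, rather than changes, its predictions.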
Related work
Explaining high-accuracy but black-box models has become a significant need in many real applications. A large number of approaches have been proposed in the past few years in the medical domain [19], [20], [21], natural language processing [22], [23], [24], computer vision, etc. In computer vision, approaches have been introduced to explain predictions either by associating images with captions/descriptions [25], [26], [27], [28], [29], visualizing individual convolutional filters …
Human evaluation
To investigate the effectiveness of the proposed XNN with SRAE, we designed a user study, referred to as the main human study, in which participants were asked to cluster images that are normally difficult for untrained humans to discern. The goal was to inspect whether the XNN can provide interpretable visualizations to non-experts so that they can differentiate between unlabeled samples about which they have no prior knowledge. We compare XNN against two baselines, one without any visualization (…
Conclusion
In this paper, we propose an explanation network that can be attached to any layer in a deep network to compress that layer into several concepts that approximate an N-dimensional prediction output from the network. A sparse reconstruction autoencoder (SRAE) is proposed to avoid degeneracy and improve orthogonality. A human evaluation is conducted to investigate the performance of our approach against a baseline. The human study shows that on tasks that are very difficult for humans, XNN …
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was partially supported by DARPA contract N66001-17-2-4030. The authors would like to thank Dr. Liang Huang, Dr. Alan Fern and Dr. Tom Dietterich for helpful discussions and for proofreading an earlier version of the paper. We would also like to thank the anonymous reviewers who helped improve the quality of the paper.
References (84)
- Efficient sparse coding algorithms, Adv. Neural Inf. Process. Syst. (2007)
- SPDA-CNN: unifying semantic part detection and abstraction for fine-grained recognition
- ImageNet classification with deep convolutional neural networks
- Deep residual learning for image recognition
- Mastering the game of Go with deep neural networks and tree search, Nature (2016)
- Dermatologist-level classification of skin cancer with deep neural networks, Nature (2017)
- Intriguing properties of neural networks
- Explaining and harnessing adversarial examples
- Visualizing and understanding convolutional networks
- Network dissection: quantifying interpretability of deep visual representations
- Deep inside convolutional networks: visualising image classification models and saliency maps
- Look and think twice: capturing top-down visual attention with feedback convolutional neural networks
- Learning deep features for discriminative localization
- Top-down neural attention by excitation backprop
- Grad-CAM: visual explanations from deep networks via gradient-based localization
- Learning explainable embeddings for deep networks
- Interpretable basis decomposition for visual explanation
- Energy-based generative adversarial network
- Randomized input sampling for explanation of black-box models
- Intelligible models for healthcare: predicting pneumonia risk and hospital 30-day readmission
- Interpretable classifiers using rules and Bayesian analysis: building a better stroke prediction model, Ann. Appl. Stat.
- Supersparse linear integer models for optimized medical scoring systems, Mach. Learn.
- Principles of explanatory debugging to personalize interactive machine learning
- High-precision model-agnostic explanations
- Attention is not not explanation
- Multimodal neural language models
- What are you talking about? Text-to-image coreference
- Visual semantic search: retrieving videos via complex textual queries
- Deep visual-semantic alignments for generating image descriptions
- Generating visual explanations
- On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, PLoS ONE
- The (un)reliability of saliency methods
- Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks
- Learning important features through propagating activation differences
- Attentive explanations: justifying decisions and pointing to the evidence
- "Why should I trust you?": Explaining the predictions of any classifier
- A unified approach to interpreting model predictions
- Streaming weak submodularity: interpreting neural networks on the fly
- Object detectors emerge in deep scene CNNs
- Structural-RNN: deep learning on spatio-temporal graphs
- Analyzing the performance of multilayer neural networks for object recognition
- Interpretable deep models for ICU outcome prediction
- ☆ This paper is part of the Special Issue on Explainable AI.
- 1 This work was done while Zhongang Qi was at Oregon State University.