Embedding deep networks into visual explanations☆
Introduction
Deep learning has made significant strides in recent years, surpassing human performance in many tasks such as image classification [1], [2], playing Go [3], and classification of medical images [4]. However, the use of deep learning in real applications still must overcome a trust barrier. Imagine a doctor facing the deep learning prediction "this CT image indicates malignant cancer", or a pilot facing the prediction "make an emergency landing immediately". These predictions may be backed by a claimed high accuracy on benchmarks, but it is human nature not to trust them unless we are convinced that they are reasonable for each individual case. The lack of trust is worsened by known cases where adversarial examples can fool deep networks into producing wrong answers [5], [6]. In order to establish trust, humans need to understand how deep learning makes decisions. Such understanding could also help humans gain additional insights into new problems, potentially improve deep learning algorithms, and improve human-machine collaboration.
Dictionaries often explain a concept in the form "A is something because of B, C, and D", e.g. this is a bird because it has feathers, wings, and a beak. This type of explanation has two properties. First, it is concise: there is no need for a hundred reasons adding up to define A. Second, it relies on B, C, and D, which are themselves high-level concepts. Both of these properties are often at odds with standard deep learning predictions, which are combinations of outputs from thousands of neurons across dozens of layers.
Approaches have been proposed to visualize each of the filters [7] and to have humans name them [8], but it is difficult for these approaches to obtain a concise representation. On the other hand, many other approaches generate attention maps that backtrack a decision to important parts of the original image [9], [10], [11], [12], [13]. These maps are often informative, but they explain only individual images and do not provide high-level concepts that apply broadly to all images in a category.
In this paper, we attempt to make explanations similar to "A is something because of B, C, D" by extracting several high-level concepts from deep networks to aid human understanding (Fig. 1(a)). Our model attaches a separate explanation network (XNN) to a layer in the original deep network and reduces that layer's representation to a few concepts (named the Explanation Space), from which one can generate predictions similar to the original deep network (Fig. 1(b)). We also show that the visualizations of the concepts generated by our method are human understandable.
Our model infers concepts directly from the deep network. It does NOT train from ground-truth concepts defined by humans, whether as labels, attributes, or text. We deliberately choose not to use human concepts in order to accommodate future situations where the deep network performs a task in a domain in which humans lack expert knowledge. An example would be early-stage cancer prediction from medical imaging, where it may not be entirely clear which patterns in the image lead to cancer, but a deep network may be able to find them reliably. XNN could then summarize these concepts, provide guidance to human experts, and allow them to verify this tangible knowledge derived from the DNN and highlighted by XNN.
We evaluate our approach in two ways: 1) human evaluation, where participants are presented with different explanations to determine which one improves their categorization capabilities; and 2) metric-based evaluation, where we define quantitative metrics for the aforementioned properties of an explanation network and evaluate them on two different datasets: a fine-grained bird classification dataset and a scene recognition dataset, both of which have rich ground-truth annotations that allow us to compute the introduced metrics. Although the experiments in the paper focus on convolutional neural networks (CNNs) applied to images, the explanation framework we develop is general and applicable to other types of deep networks as well.
This paper is an extension of our NeurIPS 2017 workshop publication [14], which predates most other related approaches. We believe this was one of the first steps towards general explainable deep learning that can advance human knowledge and enhance future collaboration between humans and machines. In this version, we improved the loss function and performed a comprehensive evaluation involving novel quantitative metrics as well as human studies.
Our contributions in this paper are as follows:
- We propose a novel explanation network to form a low-dimensional, explainable concept space from deep networks. A sparse reconstruction auto-encoder (SRAE) with a pull-away term is proposed to make the explanation network faithful and orthogonal as defined previously (see the sketch after this list).
- We present a visualization paradigm for human understanding of the concept space.
- We present a user study showing that our explanations can improve human performance on difficult tasks.
- We propose automatic quantitative metrics to evaluate the performance of an explanation algorithm in terms of faithfulness, locality, and orthogonality. Experimental results show that the proposed explanation methods provide insights into the inner workings of deep network models.
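A minimal sketch of how such an objective could look in PyTorch is given below. The squared-cosine pull-away term follows the form used in energy-based GANs (cited in the references); applying it across the concept columns of a mini-batch, the loss weights `beta` and `gamma`, and the plain MSE reconstruction term are our illustrative assumptions rather than the paper's exact formulation.

```python
import torch.nn.functional as F

def pull_away_term(e):
    """Pull-away term encouraging orthogonality between concepts.

    Assumption: each concept is represented by one column of the
    explanation-space activations over a mini-batch, and the term
    penalizes the squared pairwise cosine similarity between columns.

    e: (batch, k) explanation-space activations.
    """
    en = F.normalize(e, dim=0)               # unit-norm activation vector per concept
    cos2 = (en.t() @ en).pow(2)              # (k, k) squared cosine similarities
    k = e.size(1)
    return (cos2.sum() - k) / (k * (k - 1))  # mean over off-diagonal pairs

def srae_loss(y_xnn, y_dnn, x_rec, x_feat, e, beta=1.0, gamma=0.1):
    """Sketch of an SRAE-style objective (weights beta/gamma are assumed).

    y_xnn : XNN prediction, which should mimic y_dnn (faithfulness)
    y_dnn : output of the original deep network, treated as the target
    x_rec : reconstruction of the attached layer's features
    x_feat: the attached layer's features themselves
    e     : (batch, k) explanation-space activations
    """
    faithfulness = F.mse_loss(y_xnn, y_dnn)     # mimic the DNN output
    reconstruction = F.mse_loss(x_rec, x_feat)  # keep concepts tied to the input features
    orthogonality = pull_away_term(e)           # avoid degenerate, redundant concepts
    return faithfulness + beta * reconstruction + gamma * orthogonality
```

In this form, the faithfulness term ties the XNN to the DNN's decisions, while the pull-away term keeps the few learned concepts from collapsing onto one another.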
The Explanation Network
Given a deep network (DNN) as a prediction model, we propose to learn an extra Explanation Neural Network (XNN) (Fig. 1(b)), which can be attached to any intermediate layer of the DNN. The XNN learns an embedding into a low-dimensional explanation space, and then directly learns a mapping from the explanation space that mimics the output of the original DNN model. We denote the input feature space of the XNN as …, where x are the input features (in the case of CNNs, an image) and W …
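To make the setup concrete, here is a minimal PyTorch sketch of such an attachable XNN; the layer widths, the linear prediction head, and the `dnn.layer4` hook target are illustrative assumptions, not the architecture used in the paper.

```python
import torch.nn as nn

class XNN(nn.Module):
    """Sketch of an explanation network attached to one DNN layer.

    Embeds the chosen layer's features into a k-dimensional explanation
    space, then maps that space to a prediction that should mimic the
    original DNN's output.
    """
    def __init__(self, feat_dim, k=5, out_dim=1):
        super().__init__()
        self.embed = nn.Sequential(           # layer features -> explanation space
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, k),
        )
        self.predict = nn.Linear(k, out_dim)  # explanation space -> mimicked output

    def forward(self, feats):
        e = self.embed(feats.flatten(1))      # k concept activations per sample
        y = self.predict(e)                   # trained to match the DNN prediction
        return e, y

# The attached layer's activations can be captured with a forward hook
# on the original network, e.g. (layer name is hypothetical):
#   feats = {}
#   dnn.layer4.register_forward_hook(lambda m, i, o: feats.update(out=o))
```

Training would then minimize an objective such as the SRAE loss sketched earlier, with the original DNN held fixed so that the XNN summarizes, rather than changes, its predictions.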
Related work
Explaining high-accuracy but black-box models has become a significant need in many real applications. A large number of approaches have been proposed in the past few years in the medical domain [19], [20], [21], natural language processing [22], [23], [24], computer vision, etc. In computer vision, approaches have been introduced to explain predictions either by associating images with captions/descriptions [25], [26], [27], [28], [29], visualizing individual convolutional filters …
Human evaluation
To investigate the effectiveness of the proposed XNN with SRAE, we designed a user study, referred to as the main human study, in which participants were asked to cluster images that are normally difficult for untrained humans to discern. The goal was to inspect whether the XNN can provide interpretable visualizations to non-experts so that they can differentiate between unlabeled samples about which they have no prior knowledge. We compare XNN against two baselines, one without any visualization (…
Conclusion
In this paper, we propose an explanation network that can be attached to any layer in a deep network to compress that layer into several concepts that approximate an N-dimensional prediction output from the network. A sparse reconstruction autoencoder (SRAE) is proposed to avoid degeneracy and improve orthogonality. A human evaluation is conducted to investigate the performance of our approach against a baseline. The human study shows that on tasks that are very difficult for humans, XNN …
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was partially supported by DARPA contract N66001-17-2-4030. The authors would like to thank Dr. Liang Huang, Dr. Alan Fern and Dr. Tom Dietterich for helpful discussions and for proofreading an earlier version of the paper. We would also like to thank the anonymous reviewers who helped improve the quality of the paper.
References (84)
- Efficient sparse coding algorithms, Adv. Neural Inf. Process. Syst. (2007)
- SPDA-CNN: unifying semantic part detection and abstraction for fine-grained recognition
- ImageNet classification with deep convolutional neural networks
- Deep residual learning for image recognition
- Mastering the game of Go with deep neural networks and tree search, Nature (2016)
- Dermatologist-level classification of skin cancer with deep neural networks, Nature (2017)
- Intriguing properties of neural networks
- Explaining and harnessing adversarial examples
- Visualizing and understanding convolutional networks
- Network dissection: quantifying interpretability of deep visual representations
- Deep inside convolutional networks: visualising image classification models and saliency maps
- Look and think twice: capturing top-down visual attention with feedback convolutional neural networks
- Learning deep features for discriminative localization
- Top-down neural attention by excitation backprop
- Grad-CAM: visual explanations from deep networks via gradient-based localization
- Learning explainable embeddings for deep networks
- Interpretable basis decomposition for visual explanation
- Energy-based generative adversarial network
- Randomized input sampling for explanation of black-box models
- Intelligible models for healthcare: predicting pneumonia risk and hospital 30-day readmission
- Interpretable classifiers using rules and Bayesian analysis: building a better stroke prediction model, Ann. Appl. Stat.
- Supersparse linear integer models for optimized medical scoring systems, Mach. Learn.
- Principles of explanatory debugging to personalize interactive machine learning
- High-precision model-agnostic explanations
- Attention is not not explanation
- Multimodal neural language models
- What are you talking about? Text-to-image coreference
- Visual semantic search: retrieving videos via complex textual queries
- Deep visual-semantic alignments for generating image descriptions
- Generating visual explanations
- On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, PLoS ONE
- The (un)reliability of saliency methods
- Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks
- Learning important features through propagating activation differences
- Attentive explanations: justifying decisions and pointing to the evidence
- "Why should I trust you?": Explaining the predictions of any classifier
- A unified approach to interpreting model predictions
- Streaming weak submodularity: interpreting neural networks on the fly
- Object detectors emerge in deep scene CNNs
- Structural-RNN: deep learning on spatio-temporal graphs
- Analyzing the performance of multilayer neural networks for object recognition
- Interpretable deep models for ICU outcome prediction
- ☆ This paper is part of the Special Issue on Explainable AI.
- 1 This work was done while Zhongang Qi was at Oregon State University.