Transfer of object category knowledge across visual and haptic modalities: Experimental and computational studies
Highlights
► We address people’s abilities to transfer category knowledge across sensory domains. ► We introduce the See and Grasp data set, the first visual-haptic data set. ► An experiment shows that object category knowledge transfers across sensory domains. ► A Bayesian inference algorithm is proposed for learning componential 3-D shapes. ► Forward models predict sensory-specific features from multisensory representations.
Introduction
When recording neural activity in the human medial temporal lobe, Quiroga, Kraskov, Koch, and Fried (2009) found individual neurons that explicitly encode multisensory percepts. For example, one neuron responded selectively when a person viewed images of the television host Oprah Winfrey, viewed her written name, or heard her spoken name. (To a lesser degree, the neuron also responded to the actress Whoopi Goldberg.) Another neuron responded selectively when a person saw images of the former Iraqi leader Saddam Hussein, saw his name, or heard his name. Clearly, our brains encode abstract representations of objects that are multisensory in the sense that they are activated by perceptual inputs spanning multiple sensory formats or modalities.
Why would our brains acquire abstract representations that are activated by inputs from a variety of sensory modalities? One possible answer to this question is that these representations facilitate the transfer of knowledge across modalities. Consider, for instance, a person who learns to categorize a set of objects based solely on tactile or haptic inputs. Would the person be able to categorize these same objects when the objects are viewed but not grasped? Would the person be able to categorize novel objects from the same categories when those objects are only viewed?
Here, we report experimental and computational studies of the acquisition of multisensory representations of object category, and the role these representations play in the transfer of knowledge across visual and haptic modalities. Our work includes three contributions. First, our experiment used an unusual set of visual-haptic stimuli known as “Fribbles”. Fribbles are complex, 3-D objects with multiple parts and spatial relations among the parts (see Fig. 1). Moreover, they have a categorical structure—that is, each Fribble is an exemplar from a category formed by perturbing a category prototype. Fribbles have previously been used in the study of visual object recognition (Hayward and Williams, 2000, Tarr, 2003, Williams, 1997). An innovation of our work is that we have fabricated a large set of Fribbles using a 3-D printing process and, thus, our Fribbles are physical objects that can be both seen and grasped. Based on this set of stimuli, we have created a data set, referred to as the See and Grasp data set, containing both visual and haptic features of the Fribbles. We are making this data set freely available on the web with the hope that it will encourage quantitative research on computational models of visual-haptic perception.
Second, we conducted an experiment evaluating whether people can transfer knowledge of object category across visual and haptic modalities. Previous researchers have considered the transfer of knowledge of object identity across visual and haptic modalities (e.g., Lacey et al., 2007, Lawson, 2009, Norman et al., 2004). They have also compared similarity and categorization judgements based solely on visual input with those based solely on haptic input (Gaißert and Wallraven, 2012, Gaißert et al., 2011, Gaißert et al., 2008, Gaißert et al., 2010). To our knowledge, our experiment is the first focused on the transfer of object category knowledge across visual and haptic modalities.
Lastly, we developed a computational model, referred to as the MVH (Multisensory-Visual-Haptic) model, accounting for how multisensory representations of prototypical 3-D shape might be acquired, and for the role these representations might play in the transfer of category knowledge across visual and haptic modalities. Like some previous models in the literature (Biederman, 1987; Marr & Nishihara, 1978), the model makes use of part-based representations of prototypes. However, it goes beyond previous work by introducing a learning mechanism for the acquisition of these representations. Using its acquired multisensory representations along with sensory-specific forward models for predicting visual or haptic features from multisensory representations, the model transfers object category knowledge between visual and haptic modalities, thereby providing a qualitative account of our experimental data.
Section snippets
Previous research on visual-haptic object perception
Previous research has shown that knowledge of object identity transfers (at least in part) across visual and haptic domains (e.g., Lacey et al., 2007, Lawson, 2009, Norman et al., 2004). For example, Lacey, Peters, et al. (2007) trained subjects to identify objects either visually or haptically. Following training, subjects were tested on the same task using the untrained sensory modality. Subjects showed excellent transfer to the novel modality when objects were presented at the same…
Fribbles and the See and Grasp data set
A key component of our research is the unusual visual-haptic stimuli that we used in both our experimental and computational studies. These stimuli are a subset of a larger set of stimuli known as “Fribbles”.1 Fribbles have previously been used in the vision sciences to study visual…
Experiment
Questions about categorization and generalization are fundamental to cognitive science, yet many open questions about them remain, particularly in the context of multisensory perception. Important questions include: To what extent does knowledge of object categories gained through one modality transfer to another modality? Is the amount of transfer the same for familiar and novel objects? For example, if a person learns to visually categorize a set of objects, can the person categorize these…
Preliminary remarks regarding the MVH model
Our data show that participants transferred object category knowledge between visual and haptic modalities. How did they do this? To address this question, we propose a novel computational model, referred to as the MVH (Multisensory-Visual-Haptic) model, with several important properties. This model uses multisensory representations of prototypical 3-D shape. Like some previous models in the literature (Biederman, 1987; Marr & Nishihara, 1978), the model makes use of part-based representations…
MVH (Multisensory-Visual-Haptic) model
This section provides the mathematical details of the MVH model. We describe the model from the perspective of a participant from Group V–H in our experiment. During training, the model is provided with images of Fribbles along with the Fribbles’ corresponding category labels. The model learns a multisensory representation of each category’s prototypical 3-D shape on the basis of this information. The model is provided with Fribbles’ haptic features during testing, and it estimates the category…
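The pipeline just described—learn a multisensory prototype per category from labeled visual input, then categorize haptic input by comparing observations against forward-model predictions—can be sketched schematically. This is a toy illustration, not the MVH model’s actual Bayesian inference over part-based 3-D shapes: the abstract feature vectors, the mean-based prototype estimate, and the identity forward model in the example are all assumptions made purely for exposition.

```python
import math

def fit_prototype(examples):
    # Estimate a category's multisensory prototype as the mean of its
    # training exemplars' feature vectors (a crude stand-in for the
    # model's Bayesian inference over part-based 3-D shapes).
    n = len(examples)
    return [sum(vals) / n for vals in zip(*examples)]

def classify(haptic_obs, prototypes, haptic_forward):
    # Predict haptic features from each category's multisensory
    # prototype via a sensory-specific forward model, then pick the
    # category whose prediction is closest to the observation.
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(prototypes,
               key=lambda c: dist(haptic_forward(prototypes[c]), haptic_obs))

# Toy example: two categories with 2-D features; the haptic forward
# model is taken to be the identity map purely for illustration.
prototypes = {
    "A": fit_prototype([[0.0, 0.0], [0.2, 0.0]]),
    "B": fit_prototype([[1.0, 1.0], [0.8, 1.0]]),
}
print(classify([0.9, 1.0], prototypes, lambda p: list(p)))  # B
```

In the actual model, the learned representation is a componential 3-D shape rather than a flat vector, and the visual and haptic forward models are distinct mappings from that shared representation to modality-specific features.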
Simulation results
In the simulations reported here, we used a slightly modified version of the See and Grasp data set for the four categories used in the experiment. We used three images of each Fribble rendered from three orthogonal viewpoints—a top view, a front view, and a right view. In addition, we simplified the images by reducing their resolution (80 pixels × 80 pixels) and by converting pixel values to binary numbers using a thresholding scheme. Therefore, the visual representation of a Fribble was a…
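The image preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions: the views are plain nested lists of grayscale values in [0, 1], the downsampling is nearest-neighbor, and the 0.5 threshold is hypothetical (the article does not specify its thresholding scheme).

```python
import random

def binarize_views(views, size=80, threshold=0.5):
    # Flatten a list of 2-D grayscale views (values in [0, 1]) into one
    # binary feature vector: nearest-neighbor downsample each view to
    # size x size pixels, then threshold each pixel value to 0 or 1.
    features = []
    for img in views:
        h, w = len(img), len(img[0])
        for r in range(size):
            for c in range(size):
                pixel = img[r * h // size][c * w // size]
                features.append(1 if pixel > threshold else 0)
    return features

# Three synthetic 160 x 160 "views" stand in for the top, front, and
# right renderings of a Fribble.
random.seed(0)
views = [[[random.random() for _ in range(160)] for _ in range(160)]
         for _ in range(3)]
x = binarize_views(views)
print(len(x))  # 19200 = 3 views x 80 x 80 binary pixels
```

Under this scheme, each Fribble’s visual representation is a single binary vector concatenating its three thresholded views.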
Discussion
In summary, this article has addressed people’s abilities to transfer object category knowledge across visual and haptic domains. Our work has made three contributions. First, by fabricating Fribbles (3-D, multi-part objects with a categorical structure), we developed (and are making freely available on the web) visual-haptic stimuli that are highly complex and realistic. Second, we conducted an experiment evaluating whether people transfer object category knowledge across visual and haptic…
Acknowledgements
We thank M. Tarr for making the 3-D object files for Fribbles available on his web pages. This work was supported by research grants from the National Science Foundation (DRL-0817250) and the Air Force Office of Scientific Research (FA9550-12-1-0303).
References (45)
- Similarity and categorization: From vision to touch. Acta Psychologica (2011).
- Reuniting perception and conception. Cognition (1998).
- Haptic study of three-dimensional objects activates extrastriate visual areas. Neuropsychologia (2002).
- Forward models: Supervised learning with a distal teacher. Cognitive Science (1992).
- Hand movements: A window into haptic object recognition. Cognitive Psychology (1987).
- The metamodal organization of the brain. Progress in Brain Research (2001).
- Multiple paired forward and inverse models for motor control. Neural Networks (1998).
- Convergence of visual and tactile shape processing in the human lateral occipital complex. Cerebral Cortex (2002).
- Functional imaging of human cross-modal identification and object recognition. Experimental Brain Research (2005).
- Cross-modal repetition priming in young and old adults. European Journal of Cognitive Psychology (2009).
- Grounded cognition. Annual Review of Psychology.
- Recognition-by-components: A theory of human image understanding. Psychological Review.
- Pattern recognition and machine learning.
- The psychophysics toolbox. Spatial Vision.
- Do vision and haptics share common representations? Implicit and explicit memory within and between modalities. Journal of Experimental Psychology: Learning, Memory and Cognition.
- Bayesian estimation of the shape skeleton. Proceedings of the National Academy of Sciences.
- Long-term deprivation affects visual perception and cortex. Nature.
- Categorizing natural objects: A comparison of the visual and the haptic modalities. Experimental Brain Research.
- Analyzing perceptual representations of complex, parametrically-defined shapes using MDS.
- Visual and haptic perceptual spaces show high similarity in humans. Journal of Vision.
- Sample criteria for testing outlying observations. Annals of Mathematical Statistics.
- Effects of vision and haptics on categorizing common objects. Cognitive Processes.