Publicly Available Published by De Gruyter Mouton November 14, 2022

Comparing linguistic and cultural explanations for visual search strategies

  • Brent Wolter, Chi Yui Leung, Shaoxin Wang, Shifa Chen and Junko Yamashita
From the journal Cognitive Linguistics

Abstract

Visual search studies have shown that East Asians rely more on information gathered through their extrafoveal (i.e., peripheral) vision than do Western Caucasians, who tend to rely more on information gathered using their foveal (i.e., central) vision. However, the reasons for this remain unclear. Cognitive linguists suggest that the difference is attributable to linguistic variation, while cultural psychologists contend it is due to cultural factors. The current study used eye-tracking data collected during a visual search task to compare these explanations by leveraging a semantic difference against a cultural difference to determine which view best explained strategies used on the task. The task was administered to Chinese, American, and Japanese participants with a primary focus on the Chinese participants’ behaviors since the semantic difference aligned the Chinese participants with the Americans, while their cultural affiliation aligned them with the Japanese participants. The results indicated that the Chinese group aligned more closely with the American group on most measures, suggesting that semantic differences were more important than cultural affiliation on this particular task. However, there were some results that could not be accounted for by the semantic differences, suggesting that linguistic and cultural factors might affect visual search strategies concurrently.

1 Introduction

It has been well established that some aspects of cognition, such as working memory, are universal and unaffected by differences in language, culture, gender, etc. In contrast, studies focusing on particular cognitive tasks have suggested that factors such as language and culture can have an effect. In the area of visual perception and visual search strategy research, cognitive linguists (and cognitive psychologists) have often observed behavioral differences that have been attributed to linguistic variation. At the same time, cultural psychologists, often investigating similar issues using similar research methodologies, have suggested that such behavioral differences can be attributable to differences in culture. Nonetheless, there have been few attempts to ascertain under what circumstances, and on which cognitive tasks, language or culture provides the most plausible explanation for the observed differences (see Imai and Masuda 2013; Imai et al. 2016).

One reason for the lack of crossover studies in these areas is related to a larger disconnect between the two disciplines of cognitive and cultural psychology, which has led to differences in conceptualizations and definitions for “language” and “culture”. While many cognitive psychologists see language as an essential meaning-making tool that has profound effects on the way we conceptualize, understand, and perceive the world, Imai et al. (2016) argue that researchers in this field have tended to view culture as “inherited knowledge from previous generations, which does not entail diverse culture-specific values and epistemologies.” As for cultural psychologists, Imai et al. (2016: 70) suggest culture is defined much more broadly as “‘narratives’, ‘meaning systems’, ‘systems of thought’, ‘cultural worldview/epistemology’, ‘communication styles’, or ‘self-construals’”. Language, on the other hand, is viewed as a tool which reflects these cultural views and epistemologies, but does not by itself contribute significantly to how these views and epistemologies are formed.

Briefly stated, Imai et al. (2016) suggest that cognitive linguists downplay the role of culture in cognition while cultural psychologists downplay the role of language. Given this situation, it is probably not surprising that there has been so little communication across these disciplines when attempting to pinpoint what provides the best explanation for certain cognitive behaviors. The main aim of the current study was to place these two perspectives in conflict with each other in order to determine which provides the most plausible explanation for visual behaviors on a particular visual search task. The study is described below, but first we will review what studies in these two research traditions have revealed about language, culture, and visual behavior.

1.1 Perception and visual behavior in cognitive linguistic research

A large number of studies have indicated that linguistic differences, primarily lexical and syntactic differences, can lead to differences in how one perceives the world. In the domain of space, a good deal of research has focused on frames of reference, and the results have consistently shown that people whose languages favor certain frames of reference (i.e., egocentric, allocentric, and geocentric) over others perform differently on “rotation tasks”, which involve changing the perspective of the participant and/or the relative placement of objects under observation (Levinson 2003).

Research also indicates that linguistic differences can lead to differences in visual behaviors. A number of studies have looked at differences in color perception and discrimination amongst speakers of different languages. Roberson et al. (2008), for example, used a color discrimination task comparing the performance of Korean speakers, who linguistically divide the English color of green into two distinct categories, yeondu (yellow–green) and chorok (green), and English speakers whose language makes no such distinction. The results indicated that the linguistic difference in Korean affected categorical perception in a manner that was absent in the English-speaking group, providing support for the notion that language influences perception. A similar finding was reported by Winawer et al. (2007) in comparing speakers of Russian, who make a lexical distinction between lighter blues (goluboy) and darker blues (siniy), with speakers of English. (For a more thorough overview of other studies in this area, see Lupyan et al. 2020).

Other studies in this area have focused on differences for certain adjectives that categorize more precisely in one language than another. Goller et al. (2020) compared performance on a series of visual search tasks that leveraged a lexical difference between Korean, which has separate lexical items for tight fit (kkita) and loose fit (nehta), and German, which makes no such lexical distinction. Using reaction times and error rates as dependent measures, the authors found that differences in lexical encoding could account for differences in performance between the two groups, a finding that they argued could be explained by “linguistic spatial relational concepts held in long-term memory” (p. 1).

A final area of investigation into the effects of language on visual perception and cognitive functioning comes from studies investigating differences in grammatical encoding. In one such study using eye-tracking, Flecken et al. (2014) compared the visual behavior of speakers of German, which does not morphosyntactically grammaticalize aspect, with speakers of Arabic, which grammaticalizes aspect. The participants were asked to watch a number of videos that depicted real-world scenes featuring an animal, person, or vehicle moving along a route towards an endpoint. However, in some scenes the endpoint was reached (the control items) but in others it was not (the experimental items). The authors found a significant interaction effect between item type and language group that was consistent with the differences in grammatical encoding. In short, the German speakers tended to fixate more frequently and for longer durations on the endpoints in the critical condition than did the Arabic speakers. Moreover, this pattern held regardless of whether the task was accompanied by participant-generated verbal descriptions of the videos. (See also Flecken et al. (2015) who used an event-related potentials (ERP) paradigm and Papafragou et al. (2008) and Athanasopoulos and Bylund (2013) who used eye-tracking.)

Although this represents only a small sampling of the cognitive linguistic research in this area, many cognitive linguists would suggest that the findings provide a compelling indication that language affects visual behaviors. As noted above, however, there is also a body of research from cultural psychologists indicating that visual behavior can be linked to cultural differences. We will now briefly review some of the key findings in this area.

1.2 Perception and visual behavior in cultural psychology and vision research

Most modern research suggesting that culture affects visual behavior can be traced back to a series of studies by Masuda, Nisbett, and colleagues (e.g., Chua et al. 2005; Masuda and Nisbett 2001, 2006; Nisbett et al. 2001). These investigations focused on the broad distinction between East Asians (EAs) and Western Caucasians (WCs) and the assumption that EAs tend to process visual information more globally than WCs, who focus more on local information.[1] In Masuda and Nisbett (2001), Japanese and American participants were shown a number of short videos depicting underwater scenes featuring “focal” fish in the foreground and other aquatic plants and animals displayed in the background. After watching the videos, participants were asked to provide descriptions of the videos from memory and complete a recall test that asked them to state whether or not they had seen certain fish in the videos using either the original backgrounds as they appeared in the videos or novel backgrounds. The results showed that the Japanese participants reported background information sooner and significantly more regularly in their descriptions than did the American participants, but the Americans were more successful at identifying the focal fish in novel backgrounds than the Japanese. The groups performed equally well when the focal fish were shown in the original backgrounds. Similar findings using different methodologies were reported in Masuda and Nisbett (2006) and Chua et al. (2005). In each case, the authors suggested that the differences in viewing behavior could be attributed to cultural differences.

Masuda and Nisbett (2001) attribute these differences in viewing behavior to historical differences in the intellectual and social traditions of EA and WC cultures. In respect to the intellectual traditions, they contend that EAs process visual information more globally than WCs due to the lasting effects of Taoism, Buddhism, and Confucianism, which they suggest were “more holistic in character” and tended to stress the importance of viewing objects in relation to their larger context or field (p. 923). On the other hand, they suggest that WCs process visual information more locally due to intellectual traditions emerging from ancient Greece, which, they state, “emphasized analytic thought” and, relatedly, the importance of detaching objects from their environments for purposes of categorization (p. 923).

As for the differences in social traditions, Masuda and Nisbett (2001) argue that in the more “socially complex” collectivist EA societies, “people were required to maintain close and well-structured relationships with other group members” which ultimately led to a situation in which they were more likely to “see wholes” (p. 923). In contrast, they propose that in the “less socially complex” individualist WC societies, people “had more personal control over their environment” and their own goals with respect to the environment, which led to a situation in which WCs could pay less attention to the broader context and instead “see parts” (p. 923).

A number of subsequent eye-tracking studies in the field of vision research using facial recognition tasks have extended research in this area and revealed differences in EAs’ and WCs’ use of visual processing strategies, a finding that many cultural psychologists view as evidence of a physical manifestation of the cultural differences described above. Specifically, these studies have shown that EAs direct their fixations more centrally on the nose region during these tasks, while WCs fixate in a triangular pattern over the eyes and mouths (e.g., Blais et al. 2008; Caldara et al. 2010; Kelly et al. 2010, 2011; Miellet et al. 2012; Rodger et al. 2010). Crucially for the argument that EAs make greater use of extrafoveal vision in facial recognition, additional research in this area has indicated that even though EAs tend to fixate more centrally on the nose region, they rely primarily on information gathered regarding the eyes and mouths for making their determinations. Two studies in particular that demonstrated this are Caldara et al. (2010) and Miellet et al. (2012).

In Caldara et al. (2010), the authors used a gaze-contingent design which restricted the participants’ ability to use their extrafoveal vision during a facial recognition task, creating what the authors described as a “spotlight” effect. However, the size of the spotlight was varied with Gaussian apertures of 2°, 5°, and 8°. When the spotlights were set at 2° and 5°, blocking use of extrafoveal vision, Caldara et al. found that the EA participants “actively fixated the same facial information: the eyes and mouth” in a manner that was highly similar to WC participants in natural viewing conditions (2010: 1, original italics). However, when the spotlight was expanded to 8°, allowing for the functional use of extrafoveal vision, the EA participants reverted to fixating centrally on the nose while the WCs continued to fixate on the eyes and mouth. Based on these findings, Caldara et al. concluded that “social experience and cultural factors shape the strategies used to extract information from faces, but [our] results suggest that external forces do not modulate information use” (2010: 1, original italics).

In a similar study, Miellet et al. (2012) also used a gaze-contingent design, but instead of restricting extrafoveal vision the authors blocked foveal vision, creating a central blind spot. As with Caldara et al. (2010), Miellet et al. used apertures of 2°, 5°, and 8°, with 2° creating the smallest blind spot and 8° creating the largest blind spot. The results once again indicated a convergence in visual search strategies between the EA and WC participants once the blind spot was set at 8°. At this aperture, participants in both groups tended to focus their gaze centrally on the nose area and rely on extrafoveal vision to make judgments, which was similar to how the EA participants behaved under normal viewing conditions. Based on these findings, Miellet et al., like Caldara et al., concluded that although EAs rely more on extrafoveal information sampling during face processing in natural conditions, “both groups of observers use the same facial features for face recognition” (2012: 7).

Studies using stimuli other than faces have also revealed that EAs tend to rely more on extrafoveal vision in visual search tasks than do WCs. Importantly for the design of the current study, Lüthold et al. (2018) noted that this behavior seems particularly pronounced in images that are low in visual complexity, such as simple geometric shapes, presented on an otherwise plain background (e.g., Boduroglu et al. 2009; Cramer et al. 2016; Lao et al. 2017; McKone et al. 2010; Petrova et al. 2013). Boduroglu et al. (2009), for example, investigated the use of foveal and extrafoveal information in EAs and WCs using a computerized color change detection task. After looking at a central fixation cross for 507 ms, the computer displayed an image showing four boxes of different colors, all arranged equidistant from the fixation cross. The image was displayed for only 150 ms to ensure participants would not have time for eye movements, and was followed by a blank screen shown for 907 ms. After this, a second image that was similar to the initial image with four boxes was displayed, and the participants’ task was to indicate whether there was or was not a change in color in any of the boxes by pressing corresponding keys on the keyboard (half the trials incorporated a change in color in one of the boxes, while the other half had no changes).

To test participants’ effective use of foveal and extrafoveal vision, the second image sometimes also included a reconfiguration of the placement of the boxes, resulting in four distinct conditions: 1) a “same” condition which did not change the placement of the boxes; 2) a “shrink” condition that brought the boxes closer to the fixation point at the center; 3) an “expand” condition that moved the boxes further away from the fixation point; and 4) a random condition that randomized placement of the boxes in an asymmetrical fashion. The results indicated that both groups performed similarly in the same and random conditions, but differed in the shrink and expand conditions. Consistent with the findings of the facial recognition studies, the EAs performed significantly better than the WCs in the expand condition, which favored use of extrafoveal information, while the opposite result was observed in the shrink condition, which favored the use of foveal information. Based on these findings, the authors concluded that “East Asians have wider attentional focus than Americans” (Boduroglu et al. 2009: 356).

Building on this assumption that EAs rely on information gathered extrafoveally more than WCs, Petrova et al. (2013) compared the saccades (i.e., the eye movements) of a group of EA (Chinese) participants with a group of WC (German) participants. In this eye-tracking study, participants were asked to fix their gaze on a light gray fixation cross presented in the center of a black background. After this, the computer presented the target (a gray rhombus situated either directly above or directly below the fixation cross) and participants were asked to look at the target as quickly and accurately as possible. In addition to the target, some trials also simultaneously presented a gray oval at one of the intercardinal points, which participants were told was irrelevant and were instructed to ignore. The results showed no significant differences between the two groups for the items presented with no distractors. However, there was a significant difference between the two groups when distractors were present, particularly when the distractors appeared in the upper half of the screen with the targets in the lower half. More specifically, the EA group demonstrated significantly higher saccade curvatures and saccade latencies, indicating a more pronounced effect of the distractors for these participants versus participants in the WC group. The authors suggest that these findings “are in line with previous studies reporting cultural differences in attentional processing” (Petrova et al. 2013: 48), and assert that “low-level attentional processes…are modulated by higher-order factors as indicated by participants’ cultural background” (2013: 43).

To summarize, the results of the studies reviewed in this section indicate that visual behaviors might be influenced by cultural factors. Of particular relevance for the current study is the body of research indicating that EAs use extrafoveal information more than WCs, who tend to make greater use of foveal information. However, as noted at the outset of this paper, there have been relatively few studies that have directly compared the influence of linguistic and cultural factors on visual behavior. In the next section, we will turn our attention to a review of two such studies.

1.3 Studies comparing linguistic and cultural views of visual behavior

To the best of our knowledge, there are only two published studies that have attempted to directly compare linguistic and cultural explanations on visual performance tasks, Tajima and Duffield (2012) and Senzaki et al. (2014).[2] Tajima and Duffield (2012) approached the issue from a cognitive linguistics perspective, arguing that some of the visual findings observed in Masuda and Nisbett’s (2001) study of Japanese and English speakers could be explained by differences in linguistic structures rather than cultural affiliation. In addition to Japanese and English-speaking participants, Tajima and Duffield also included a group of Chinese participants to leverage linguistic differences between Japanese and Chinese against these two groups’ shared cultural affiliation as EAs. This was done to determine if the Chinese group would behave more in line with predictions based on the linguistic properties of Chinese or more in line with predictions based on their cultural affiliation as EAs.

As for the linguistic differences between Japanese and Chinese, Tajima and Duffield focused on the positioning of Figure and Ground in Japanese, Chinese, and English sentences. Tajima and Duffield observed that in Japanese, Ground is given syntactic priority over Figure in sentence structure, while the opposite is true in English. To demonstrate this, Tajima and Duffield (2012: 684–685) state that a natural translation of the English sentence the bike is next to the house into Japanese would reverse the ordering of house and bike resulting in the following sentence:

家のそばに、バイクがある。

uti-no soba-ni baiku-ga aru

house (Ground)-gen near-at bike (Figure)-nom is

As another example, Tajima and Duffield also point out a difference in placement of Figure and Ground between Japanese and English in complex sentence structures, with Japanese tending to place Ground in the main clause and Figure in the subordinate clause, in contrast to English which prefers the opposite pattern. As for Chinese, Tajima and Duffield do not provide concrete claims or examples for Figure and Ground placement; however, they speculate that the “Chinese participants – given the more head-initial nature of Chinese grammar – should diverge from Japanese participants in their dependent measures on any similar task, splitting the Asian response” (2012: 687). In their view, such a finding would indicate that linguistic factors may have also played a role in the results in Masuda and Nisbett’s original study.

In respect to performance on visual behavior tasks, Tajima and Duffield speculated that the Japanese language’s requirement of mentioning Ground information prior to Figure information might make the Japanese participants pay closer attention to background information than the other two groups on a visual task. In the experiment, participants were shown three photographs and asked to provide written descriptions of the images in the photos. Importantly for the considerations of the current study, the experiment also incorporated a visual recall task which presented the participants with fragments showing background images extracted from the photos, along with distractor images which did not appear in the photos. After viewing the photos, participants were asked to indicate whether or not the fragments had appeared in the backgrounds of the original photos using a simple yes/no format in order to test participants’ recall.

The results largely confirmed their expectations. The Japanese participants significantly outperformed both the English and the Chinese counterparts on the recall task, suggesting that the structure of the Japanese language conditioned them to pay more attention to the background information than the structure of English and Chinese. Unexpectedly, the English group also significantly outperformed the Chinese group, but the authors were uncertain as to why this was the case. Despite this unexpected finding, however, the authors still felt their results were sufficient to call into question the purely cultural explanations put forth by Masuda and Nisbett (2001), and to suggest that typological differences between Japanese and English provided a reasonable explanation for some of Masuda and Nisbett’s findings.

The second study that directly compared cultural with linguistic explanations comes from researchers who are more aligned with cultural psychology. Senzaki et al. (2014) conducted two eye-tracking studies comparing Canadian students with Japanese students on a visual task. In the first experiment, participants were asked to passively watch eight videos similar to those used in Masuda and Nisbett’s studies (2001, 2006) while times were recorded for fixations on focal fish versus background areas. Contrary to their expectations, the results showed no significant differences between the two groups in average fixation times for either the focal fish or the background areas, indicating that there were no culture-related differences in global versus local viewing behaviors.

In the second experiment, groups of different participants from the same cultural backgrounds (i.e., Canadian and Japanese) watched the same videos but were told prior to viewing that they would have to “describe [the videos] in as much detail as possible for one full minute” (Senzaki et al. 2014: 1497). In contrast to the first experiment, the second experiment did show significant differences between the two groups, with the Japanese participants fixating significantly longer on the background areas and the Canadian participants fixating significantly longer on the focal fish. Additionally, the Japanese participants provided significantly more utterances referring to background information while the Canadian participants produced significantly more utterances referring to the focal fish. In reviewing the findings, Senzaki et al. (2014) conclude that language, and in particular narrative construction, has an effect on cognitive processes and call for “a systematic investigation of the link between culture and language” (2014: 1503).

1.4 The present study

The present study extends this line of research by contrasting linguistic and cultural explanations for performance on a visual task. The current study is similar to Tajima and Duffield (2012) in that it leverages a linguistic difference between the Japanese, Chinese, and English languages against a cultural difference between Japanese and Chinese participants versus WC Americans. The aim was to determine whether linguistic or cultural differences provided the best explanation for performance on the visual task. The current study included data from three separate measures collected simultaneously while performing the same visual task: participant judgments, response times (RTs), and eye-tracking measures. The participant judgments were used to assess how participants understood the goals of the task on a linguistic level, while the RTs were included as a proxy measure of processing costs associated with the differences in the visuals used in the task. The eye-tracking measures were used to gain an understanding of participants’ use of global versus local visual strategies while completing the task. The main focus was on the performance of the Chinese group, while the American and Japanese groups served as control groups.

In this study, we operationalized linguistic differences as the semantic distinction between the English word narrow and its standard translations, 窄 (zhǎi) in Chinese and 狭い (semai) in Japanese. As will be explained in greater detail below, although zhǎi and semai are standard translations for narrow, zhǎi is much closer in meaning to narrow than it is to semai. Although semai can be used to describe something that is narrow in the English sense, it is also often used to describe something that is small in general, with less consideration of width relative to length, or even cramped or confined. If a Japanese speaker states that they have a 狭い部屋 (semai heya or literally a “narrow room”), for example, it typically denotes a room that is small in general with no specific reference to its length and width dimensions (Wolter 2006). Similarly, a space that would normally be considered rather large can be described as semai if it is full of products, shelves, etc.

As for zhǎi, its literal meaning in Chinese typically describes something that is not wide in comparison to length, in line with the definition for the English narrow. If a Chinese speaker mentions that there is a 窄的街道 (narrow street), for example, it indicates that a street is not wide relative to its length, with little consideration of overall space dimensions. In some restricted situations, however, it is used to describe a space that is small in general in the sense that semai does. For example, if a Chinese speaker uses the expression 窄的空间 (narrow space), it implies that the space is small in general rather than simply narrow in the English sense. However, such usages of zhǎi are rare, suggesting that although the meaning of zhǎi bears more similarity with semai than does narrow, its semantic value is still more closely aligned with narrow.

The assumption that zhǎi is close in meaning to narrow was verified by Authors 2, 3, and 4, all of whom are either native speakers or advanced users of Mandarin and all of whom have high fluency in English. Likewise, the assumption that semai differs from narrow along the lines described above was verified by Author 2 (who has high proficiency in Japanese) and Author 5 (who is a native speaker of Japanese), both of whom are also highly proficient in English. Nonetheless, to gain a more objective understanding of the semantic overlap between zhǎi, semai, and narrow, we evaluated the collocational context of zhǎi and semai under the assumption that a word’s semantic value exerts a considerable influence on its collocational patterns and vice versa (see, e.g., Firth 1957; Jarvis and Pavlenko 2010).

We began by consulting two corpora, the 581-million-character Peking University Chinese Contemporary Linguistics Corpus (http://ccl.pku.edu.cn:8080/ccl_corpus/) and the 100-million-word NINJAL-LWP for BCCWJ Japanese Language Corpus (http://nlb.ninjal.ac.jp/). From these corpora, we collected the ten most frequent noun collocates for zhǎi and semai when these words were used in their literal sense.[3] Next, we obtained the 100 most common noun collocates for narrow (using lemma forms) from the Corpus of Contemporary American English or COCA (Davies 2008–). We then checked to see if the collocates identified in the searches for zhǎi and semai were also present among the first 100 collocates for narrow. The results are shown in Table 1. As can be seen in Table 1, 9 of the 10 most common Chinese collocates were attested collocates in COCA (with most occurring with a high rate of frequency). In the case of semai, only 6 of the 10 collocates were attested, providing support for the assumption that zhǎi is closer in meaning to narrow than semai.

Table 1:

Comparison of collocates for zhǎi and semai with narrow.

| Chinese word (freq. of co-occurrence) | English translation | COCA ranking | Japanese word (freq. of co-occurrence) | English translation | COCA ranking |
|---|---|---|---|---|---|
| 街道 (85) | Street/road | 1 | 範囲 (98) | Range | 2 |
| 范围 (43) | Range | 2 | 部屋 (98) | Room | -- |
| 通道 (43) | Path | 3 | 道 (86) | Street/road | 1 |
| 地方 (41) | Place | -- | 所 (78) | Place | -- |
| 空间 (40) | Space | 10 | 場所 (69) | Place [a] | -- |
| 胡同 (38) | Alley | 16 | 路地 (55) | Alley | 16 |
| 楼梯 (36) | Stairs | 36 | Place names (42) [b] | | -- |
| 走廊 (29) | Corridor | 15 | 通路 (41) | Passage/path/aisle | 3 |
| 海峡 (26) | Channel | 22 | 空間 (38) | Space | 10 |
| 小路 (22) | Road | 4 | 地域 (35) | Area | 63 |

[a] 所 (tokoro) and 場所 (basho) are synonymous in their core meaning (place). When these words are used in adjective-noun collocations with semai (semai tokoro/basho), tokoro can entail a broader range of meaning. For instance, a sentence such as “I like semai basho” most likely means “I like a small place (area)”, but “I like semai tokoro” can bear various meanings depending on context, e.g., “I like a small place/country/region/city/room/office/space etc.”

[b] These were proper nouns such as Tokyo and Japan. COCA did not list any place names in the 100 most frequent collocates.
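The overlap check summarized in Table 1 can be sketched in a few lines of Python. This is only an illustrative tally, not the procedure used to query the corpora: the COCA rankings are copied from Table 1, `None` stands in for the “--” entries, and the variable and function names are hypothetical.

```python
# COCA rank (among the 100 most frequent noun collocates of "narrow") for the
# English equivalent of each collocate, copied from Table 1.
# None marks "--", i.e., not attested among the top 100.
ZHAI_COCA_RANKS = {
    "街道": 1, "范围": 2, "通道": 3, "地方": None, "空间": 10,
    "胡同": 16, "楼梯": 36, "走廊": 15, "海峡": 22, "小路": 4,
}
SEMAI_COCA_RANKS = {
    "範囲": 2, "部屋": None, "道": 1, "所": None, "場所": None,
    "路地": 16, "place names": None, "通路": 3, "空間": 10, "地域": 63,
}


def attested_count(ranks):
    """Count the collocates that have a COCA rank, i.e., are attested."""
    return sum(1 for rank in ranks.values() if rank is not None)


zhai_attested = attested_count(ZHAI_COCA_RANKS)
semai_attested = attested_count(SEMAI_COCA_RANKS)

print(f"zhǎi:  {zhai_attested} of {len(ZHAI_COCA_RANKS)} collocates attested")
print(f"semai: {semai_attested} of {len(SEMAI_COCA_RANKS)} collocates attested")
```

Run on the Table 1 values, the tally gives 9 of 10 for zhǎi and 6 of 10 for semai, matching the counts reported in the text.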

A final corpus-based comparison consisted of comparing the relative frequency of the expressions small room and narrow room within the three languages. The Japanese-language corpus revealed a similar frequency of use for small room (小さい部屋/小さな部屋, 101 occurrences) and narrow room (狭い部屋, 98 occurrences), while the Chinese- and English-language corpora both revealed much higher frequencies for small room over narrow room. In the Chinese-language corpus, small room (小房间) appeared 561 times as compared to 8 times for narrow room (窄的房间/窄房间). In the English-language corpus, small room occurred 1,590 times while narrow room occurred 161 times. Though it is possible that circumstances in Japan lead Japanese speakers to encounter and describe narrow rooms (in the English sense) with greater frequency than the circumstances in America and China, an alternate perspective is that Japanese speakers use semai to describe rooms more frequently because it is largely interchangeable with the Japanese word for small (小さい/小さな) in this context. In other words, if we assume that speakers in these three language groups encounter (and describe) rooms that are narrow versus small (in the English sense) with approximately equal frequency, then the higher frequency of use by the Japanese group for the word semai could be linked to the greater semantic flexibility for semai and the fact that it can be used to describe either type of room configuration (i.e., one that is narrow in the English sense versus one that is small in the English sense).
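Because the three corpora differ in size, the raw counts are easier to compare as within-language ratios of small room to narrow room. A short sketch of that arithmetic follows, using only the counts reported above; the names `ROOM_COUNTS` and `small_to_narrow_ratio` are illustrative.

```python
# (small room count, narrow room count) per corpus, as reported in the text.
ROOM_COUNTS = {
    "Japanese": (101, 98),
    "Chinese": (561, 8),
    "English": (1590, 161),
}


def small_to_narrow_ratio(small, narrow):
    """How many times more often 'small room' occurs than 'narrow room'."""
    return small / narrow


ratios = {
    language: small_to_narrow_ratio(small, narrow)
    for language, (small, narrow) in ROOM_COUNTS.items()
}
for language, ratio in ratios.items():
    print(f"{language}: small room is {ratio:.1f}x as frequent as narrow room")
```

On these counts the Japanese ratio is close to 1, whereas the Chinese and English ratios are far above 1, which is the asymmetry the paragraph describes.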

Cultural differences were operationalized using the customary distinction between EAs (the Chinese and Japanese participants) and WCs (the American participants). To summarize the configuration of the groups more succinctly, the linguistic differences placed the English and Chinese speakers in one group and the Japanese speakers in the other group. The cultural differences, however, placed the Chinese and Japanese in one group and the Americans in the other group. According to these groupings, there was no alignment, either linguistically or culturally, between the American and Japanese participants. Thus, we expected to see differences in both how they understood the goals of the task, which was dependent on instructions delivered in their native language, and their demonstrated preference for global versus local viewing strategies (with the Americans adopting more of a local strategy and the Japanese adopting more of a global strategy). In respect to the Chinese group, although we expected them to understand the goals of the task similarly to the Americans (given the closer alignment between narrow and zhǎi), it was unknown whether they would adopt a more global or a more local visual strategy. If they used more of a local strategy, it would suggest that their strategy was influenced more by linguistic factors. If, however, they used more of a global approach, it would indicate their visual strategy was influenced more by cultural factors.

Finally, visual search strategies were operationalized using eye tracking data assessing the percentage of time spent focusing one’s gaze on objects presented on a computer screen versus time spent looking outside the objects (on an otherwise blank background, see below). Our assumption here was that participants who spent a higher proportion of time focusing directly on the objects made greater use of their foveal vision, while participants who spent a higher proportion of time looking outside the objects were making greater use of their extrafoveal vision. This assumption is based on three findings from the vision research literature reviewed above. The first is that people typically attend to the same information when making judgments on a visual search task, even when their gaze is directed at different places in the task, as was seen in Caldara et al.’s (2010) and Miellet et al.’s (2012) complementary face recognition studies discussed above. The second also comes primarily from Caldara et al.’s (2010) and Miellet et al.’s (2012) findings indicating that people, regardless of their cultural background, are capable of adapting their visual search strategies in response to the demands of a particular task, allowing them to gather information either foveally or extrafoveally as needed. The third is that people are fully capable of attending to information outside their direct gaze on visual tasks displaying simple stimuli on otherwise plain backgrounds, as was seen in the studies by Boduroglu et al. (2009) and Petrova et al. (2013) also discussed above.

2 Method

2.1 Participants

Participants included a group of 37 Chinese speakers, 31 American English speakers, and 33 Japanese speakers at universities in their respective countries. All participants were presented with a consent form written in their native language, and all provided their verbal consent prior to data collection. The Chinese and Japanese groups were comprised entirely of undergraduate students, while the American group was comprised primarily of undergraduates but also included some graduate students as well as other people associated with the university where the data were collected.

Although it is difficult in the modern world to find people who are truly monolingual, most participants reported only modest levels of proficiency in any foreign language. Of those who reported higher levels of proficiency in a foreign language, none of them reported that they had more than a basic understanding of the languages spoken by participants in the other two groups. In addition to questions about foreign language proficiency, the participants were also given a questionnaire about their backgrounds to collect information about age, dexterity, and vision. None of the participants reported problems in their natural or corrected vision, nor did they indicate having any problems in using a response pad (see below). Descriptive statistics for the three groups are provided in Table 2.

Table 2:

Demographic information for participants.

Group N Age (SD) Sex (M/F/not disclosed) Dexterity (R/L/both)
Chinese 37 19.0 (0.7) 10/27/0 37/0/0
American 31 33.4 (11.0) 12/18/1 30/1/0
Japanese 33 19.1 (0.7) 33/0/0 33/0/0

2.2 Items

The items used in this study were adapted from a study initially conducted by Wolter et al. (2020). Each item consisted of two images depicting two-dimensional, aerial views of rooms laid out in a side-by-side configuration (see Figure 1). The participants were provided with instructions in their native language before beginning the task. In each case, the task was to choose the narrower of the two rooms by pressing a corresponding button on a response pad. However, as noted above, there was a linguistic difference in respect to the translation of the word narrow which aligned the Chinese group more closely with the American group than the Japanese group.

Figure 1: 
A sample item (50% scale, same height difference, small width difference).

In respect to the items, a total of nine room images were created that differed slightly but noticeably in both width (441, 456, and 471 pixels) and height (636, 651, and 666 pixels).[4] Combining all possible pairings of the nine images resulted in a total of 36 possible combinations. However, we transposed the left-right positioning of these combinations once for every item, thus creating 72 critical items in total. In all items, the two rooms were different in size either horizontally, vertically, or both. All room images were presented in grayscale. To avoid potential confounds, the rooms were sparsely furnished, with no furniture placed near walls, and no doors or windows (all of which could bias eye movements). In addition, the furnishings, and the distances between the furnishings, were kept the same size in every image and the entire furniture group was centered both horizontally and vertically within the rooms (see Figure 1). The thickness of the walls was kept the same. The only visual difference between the rooms, therefore, was in respect to the wall dimensions. Finally, to ensure the task was challenging enough to result in sufficiently rich data, we offset the center of the two images on the screen. Specifically, we placed the center of one image 10 pixels up and that of the other one 10 pixels down from the horizontal center of the screen, so that the participants could not quickly compare the heights and make judgments. The items were divided into two lists to counterbalance this manipulation, meaning that when an item had the left image up and the right one down on one list, the alignment was reversed in the other list. Participants were randomly assigned to one of the two lists.
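The item counts described above follow directly from the three widths and three heights. The sketch below enumerates them; it is an illustration of the combinatorics only, and the variable names and the `diff_label` helper are assumptions rather than part of the original materials.

```python
from itertools import combinations, product

# Room dimensions in pixels, as given in the text.
WIDTHS = (441, 456, 471)
HEIGHTS = (636, 651, 666)

# Nine room images: every width crossed with every height.
rooms = list(product(WIDTHS, HEIGHTS))

# All unordered pairings of distinct images (36 combinations).
pairs = list(combinations(rooms, 2))

# Each pairing is shown in both left-right orders (72 critical items).
items = pairs + [(right, left) for left, right in pairs]


def diff_label(a, b):
    """Map a pixel difference onto the same/medium/large categories."""
    return {0: "same", 15: "medium", 30: "large"}[abs(a - b)]


# Items in which the two rooms differ in width, i.e. items that have a
# narrower room in the English sense.
width_differs = [(l, r) for l, r in items if l[0] != r[0]]

print(len(rooms), len(pairs), len(items), len(width_differs))
```

Of the 72 items, 54 contain rooms of different widths and 18 contain rooms of the same width; the same split holds for height.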

2.3 Procedure

The data for each group were collected at research labs in the participants’ home countries. In all cases, the right eye of the participants was tracked (viewing was binocular) except when the left eye provided a better alternative for tracking purposes (i.e., when the eye-tracker had difficulty tracking the right eye). Each lab used an EyeLink desktop mount eye-tracker developed by SR Research (American and Chinese groups: EyeLink 1000+, Japanese group: EyeLink 1000) at a sampling rate of 1,000 Hz. The participants were seated in front of a 24-inch LCD monitor (resolution = 1920 × 1080, refresh rate = 120 Hz) with their heads on a chinrest at a viewing distance of approximately 94 cm. A 9-point grid calibration was used. The program for the experiment was developed using Experiment Builder software (SR Research).

The experiment began with the instructions of the task, which were displayed on the monitor. The instructions were presented only once (again in the participants’ native language) at the beginning of the experiment. This was followed by 12 practice items to familiarize participants with the task, a pause to ensure participants understood the task, and then finally the 72 experimental items.[5] Items were presented in an individually randomized manner for each participant (as determined by Experiment Builder). Participants were asked to indicate which room they felt was more narrow/zhǎi/semai by pressing a corresponding button on a game pad. If a response was not recorded within 15 seconds, the item timed out and the next item was presented. Each item began with a fixation cross presented in the middle of the screen. In addition to ensuring that the participants were all starting each item with the same fixation point, this procedure was also used to indicate any cases where recalibration was needed by triggering the presentation of the room images only after the fixation cross had been focused on for 500 ms.[6]

As noted above, three types of data were collected: participant judgments (i.e., their choice of the right or left room image), response times (RTs), and eye-tracking data (dwell time percentages inside versus outside the room images).[7] The judgments were used to determine how participants understood the goals of the task (i.e., whether the goal was to identify the narrower room, the smaller room, etc.). The RTs were included to provide a proxy measure of processing costs during the experiment in relation to differences in room dimensions. Finally, the eye-tracking data were used to determine if participants were attending to the images in a more global or more local way by considering what proportion of their dwell times occurred inside the rooms.
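To make the eye-tracking measure concrete, the sketch below computes the share of dwell time falling inside the room images for a single trial. It is a simplified illustration under stated assumptions: the rectangle coordinates, the fixation values, and the names `Rect`, `Fixation`, and `dwell_share_inside` are hypothetical, and the study's own interest-area definitions may differ.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned interest area in screen pixels (hypothetical layout)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


class Fixation(NamedTuple):
    x: float
    y: float
    duration_ms: float


def dwell_share_inside(fixations: List[Fixation], rooms: List[Rect]) -> float:
    """Proportion of total fixation time spent inside any room image."""
    total = sum(f.duration_ms for f in fixations)
    if total == 0:
        return 0.0
    inside = sum(
        f.duration_ms
        for f in fixations
        if any(room.contains(f.x, f.y) for room in rooms)
    )
    return inside / total


# Made-up trial: two fixations inside the rooms, one in the gap between them.
left_room = Rect(440, 215, 896, 866)
right_room = Rect(1024, 215, 1480, 866)
trial = [Fixation(600, 500, 300), Fixation(960, 540, 100), Fixation(1300, 500, 400)]
share = dwell_share_inside(trial, [left_room, right_room])
print(f"{share:.3f}")
```

Expressing the measure as a proportion rather than a raw duration keeps trials of different lengths comparable.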

To reiterate, the main consideration in this study was the behavior of the Chinese group. If, for example, all dependent measures aligned the Chinese group with the American group, then it would provide support for the linguistic explanation for visual processing. If, however, there was a mismatch in the judgment data versus the eye-tracking data (with the judgment data similar to that of the American judgments and the eye-tracking data similar to that of the Japanese group) then it would provide evidence for the cultural explanation of visual processing strategies.

3 Results

3.1 Data analysis

Before analyzing the data, we eliminated responses that timed out at 15 seconds (6 of the 7,344 responses collected). The participant judgments (i.e., left-right choices) for models including width differences (see below) were analyzed using logistic regression modelling with narrowness defined by the English definition. RTs and eye-tracking measures were analyzed using linear mixed-effects modeling, with participant and item entered as random effects (all other variables were entered as fixed effects). All analyses were conducted using R statistical software (R Core Team 2021).

The model fitting procedure for each analysis started with a maximal model that included potential predictor variables as main effects. These included: group (Chinese, American, and Japanese), width differences and height differences between the two rooms (defined as the same [0 pixel difference], medium difference [15 pixel difference], and large difference [30 pixel difference]), and trial (the order in which the participant encountered an item). For analyses using mixed effects modeling (i.e., everything except for the logistic regression analyses for participant judgments), item and person were entered as random effects. In addition, all models included all possible second-order interactions between group and the other main effects. Categorical variables were dummy coded and all numerical predictor variables were standardized (using natural logs) and centered prior to analyses. For the eye-tracking measures, percentage values were used because the Chinese and American groups tended to display longer dwell times per trial on average (Chinese x̄ = 1742, SD = 1674; American x̄ = 2535, SD = 1988) than did the Japanese group (x̄ = 1313, SD = 1468). All numeric outcome variables were log adjusted using natural logs prior to analysis.

It is worth noting that in an earlier version of this study we also included area differences in the models. However, due to the close mathematical connection between width, height, and area, we opted to exclude area difference from the analyses in this study to avoid issues of multicollinearity and model overfitting. In place of this, we added a logistic regression analysis that looked at whether or not the “shorter” room image was chosen on the assumption that the Japanese group might tend to choose the shorter room more often if they were mostly concerned with differences in area.

After constructing each maximal model, a backwards stepwise regression analysis was performed to identify the most plausible models for each measure using Akaike information criterion (AIC) values. No distinctions were made between main effects and second-order interactions in this procedure. The predictor variable that had the least impact on the AIC values at each step was eliminated until only variables that significantly improved the fit were included.
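The general shape of backward elimination by AIC can be sketched as follows. This is not the authors' code (the analyses were run in R): the function `backward_eliminate`, the toy scoring function, and the stopping rule (stop when no single removal lowers AIC) are assumptions made for illustration. Main effects and interactions are treated as one flat list of terms, as described above.

```python
from typing import Callable, FrozenSet, Iterable, List


def backward_eliminate(
    terms: Iterable[str],
    aic_of: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Drop one term at a time while doing so lowers AIC.

    `aic_of` is a caller-supplied function that fits a model with the
    given terms and returns its AIC (hypothetical interface).
    """
    current = frozenset(terms)
    current_aic = aic_of(current)
    while current:
        # AIC of every model obtained by removing exactly one term.
        candidates = [(aic_of(current - {t}), t) for t in sorted(current)]
        best_aic, term_to_drop = min(candidates)
        if best_aic >= current_aic:
            break  # no removal improves the criterion
        current, current_aic = current - {term_to_drop}, best_aic
    return sorted(current)


# Toy stand-in for model fitting: each term adds a fixed amount of
# log-likelihood (made-up numbers), and AIC = 2k - 2 * log-likelihood.
TOY_LOGLIK_GAIN = {"group": 40.0, "width_diff": 12.0, "height_diff": 9.0, "trial": 0.4}


def toy_aic(terms: FrozenSet[str]) -> float:
    log_lik = sum(TOY_LOGLIK_GAIN[t] for t in terms)
    return 2 * len(terms) - 2 * log_lik


kept = backward_eliminate(TOY_LOGLIK_GAIN, toy_aic)
print(kept)
```

In the toy example only the weakly contributing term is removed, because its small gain in log-likelihood does not offset the penalty of two AIC points per parameter.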

3.2 Main findings

3.2.1 Participant judgments

Descriptive statistics for all measures are presented in Table 3. The results of the logistic regression analyses comparing whether or not the narrower room image was chosen are shown in Table 4. For this analysis, items with room images that had the same width were excluded as these items had no correct answer under an English definition of narrow. As shown in the second column of Table 3, the Chinese group chose the narrower room image 77% of the time, the American group 81%, and the Japanese group 55%. Statistical analyses (Table 4) revealed no significant difference between the Chinese and American groups’ averages. However, there were significant differences between the Chinese and the Japanese groups and between the American and the Japanese groups (z = −8.823, p < 0.001).[8]

Table 3:

Mean values for all three groups (standard deviations in parentheses).

Group % narrower/wider room chosena % shorter/taller room chosenb RT on all items (log adjusted) RT for items with different width differences (log adjusted) Dwell time % inside room images
Same Medium Large
Chinese 0.77/0.23 0.49/0.51 7.64 (0.74) 7.74 (0.75) 7.65 (0.74) 7.52 (0.70) 0.77 (0.24)
American 0.81/0.19 0.42/0.58 8.02 (0.66) 8.13 (0.68) 8.03 (0.65) 7.88 (0.65) 0.82 (0.22)
Japanese 0.55/0.45 0.61/0.39 7.57 (0.68) 7.58 (0.69) 7.59 (0.68) 7.54 (0.68) 0.64 (0.30)
  1. aExcludes items with the same room widths. bExcludes items with the same room heights.

Table 4:

Final logistic regression model for participant judgments for narrower room image (excludes items with the same width).

Predictors Odds Ratios Std. error Statistic p
(Intercept) 2.72 0.23 11.69 <0.001
American 1.15 0.15 1.06 0.287
Japanese 0.44 0.05 −7.11 <0.001
Large width diff. 1.81 0.21 5.04 <0.001
Same height 1.23 0.15 1.63 0.103
Large height diff. 0.52 0.07 −5.12 <0.001
American * large width diff. 1.92 0.37 3.40 0.001
Japanese * large width diff. 0.57 0.09 −3.62 <0.001
American * same height 0.97 0.19 −0.14 0.890
Japanese * same height 0.87 0.15 −0.82 0.412
American * large height diff. 0.54 0.10 −3.22 0.001
Japanese * large height diff. 2.59 0.46 5.33 <0.001
Observations 5,506
R2 Tjur 0.079
  1. 1) Model validation was tested by comparing the specified model to the null model using a log likelihood test. This indicated that the specified model provided significantly better fit than the null model χ2 = 449.98, p < 0.001. 2) Reference categories are as follows: group = Chinese, width difference = medium, height difference = medium. Significant values are shown in bold.

In addition to the main effects, the analysis also revealed significant group by width difference and group by height difference interactions (Figures 2 and 3). As shown in Figure 2, the Chinese and American groups were both more likely to choose the narrower room when the width difference was large instead of medium, with a larger discrepancy for the American group. For the Japanese group, however, width differences had almost no effect. For the group by height difference interactions (Figure 3), all three groups were slightly less likely to choose the narrower room when the height difference was medium as opposed to small. However, the Chinese and the American groups’ likelihood of choosing the narrower room image dropped considerably when the height difference was large (more so for the Americans), while the Japanese group’s likelihood increased slightly. These results suggested that the Chinese and American participants were likely judging narrowness in terms of width relative to height rather than absolute differences in width, a tendency which was distinct from the Japanese participants.
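The odds ratios in Table 4 can be turned into approximate predicted probabilities to see the group by width difference pattern numerically. The sketch below multiplies the relevant odds ratios and converts odds to probability; it holds height difference at the reference level (medium), uses the rounded values printed in Table 4, and the function and key names are illustrative assumptions.

```python
# Rounded odds ratios copied from Table 4 (reference: Chinese group,
# medium width difference, medium height difference).
TABLE4_OR = {
    "intercept": 2.72,
    "American": 1.15,
    "Japanese": 0.44,
    "large_width": 1.81,
    "American:large_width": 1.92,
    "Japanese:large_width": 0.57,
}


def prob_narrower(group: str, large_width: bool = False) -> float:
    """Approximate probability of choosing the narrower room.

    Height difference is held at the reference level (medium).
    """
    odds = TABLE4_OR["intercept"]
    if group != "Chinese":
        odds *= TABLE4_OR[group]
    if large_width:
        odds *= TABLE4_OR["large_width"]
        if group != "Chinese":
            odds *= TABLE4_OR[f"{group}:large_width"]
    return odds / (1 + odds)


predicted = {
    group: (prob_narrower(group), prob_narrower(group, large_width=True))
    for group in ("Chinese", "American", "Japanese")
}
for group, (medium, large) in predicted.items():
    print(f"{group}: medium width diff. {medium:.2f}, large width diff. {large:.2f}")
```

Because the inputs are rounded, the results are approximate, but they show the Chinese and American probabilities rising with a large width difference while the Japanese probability barely changes.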

Figure 2: 
Group by width difference interactions for participant judgments for narrower room.

Figure 3: 
Group by height difference interactions for participant judgments for narrower room.

We also created a logistic regression model assessing whether or not participants chose the shorter of the two room images under the assumption that participants (particularly the Japanese participants) might be more concerned with height differences since these also have a direct bearing on overall area differences. In line with the procedure used for assessing whether the narrower room was chosen, this analysis excluded items showing room images with the same height. As seen in Table 3, column 3, the Chinese group chose the shorter room image 49% of the time, the American group 42%, and the Japanese group 61%. Statistical analyses (Table 5) showed that the Chinese group’s mean score was significantly higher than the American group’s but significantly lower than the Japanese group’s. Additional analyses revealed that the Japanese group’s average was also significantly higher than the American group’s (z = 7.883, p < 0.001). This analysis also revealed a significant interaction between group and height differences (Figure 4). As shown in Figure 4, height differences had a negligible effect on the Chinese group’s choices. For the American group, there was a lower probability of choosing the shorter room when the height difference was large, while the Japanese group demonstrated the opposite trend. These results place the Chinese group’s performance between that of the American and Japanese groups.

Table 5:

Final logistic regression model for participant judgments for shorter room image (excludes items with the same height).

Predictors Odds ratios Std. error Statistic p
(Intercept) 1.00 0.06 0.06 0.951
American 0.80 0.06 −2.81 0.005
Japanese 1.55 0.13 5.38 <0.001
Same width 0.86 0.05 −2.46 0.014
Large width diff. 1.03 0.07 0.48 0.632
Large height diff. 1.02 0.10 0.19 0.852
Trial 0.93 0.03 −2.60 0.009
American * large height diff. 0.88 0.12 −0.92 0.358
Japanese * large height diff. 1.20 0.17 1.30 0.193
Observations 5,466
R2 Tjur 0.027
  1. 1) Model validation was tested by comparing the specified model to the null model using a log likelihood test. This indicated that the specified model provided significantly better fit than the null model χ2 = 147.99, p < 0.001. 2) Reference categories are as follows: group = Chinese, width difference = medium, height difference = medium. Significant values are shown in bold.

Figure 4: 
Group by height difference interactions for participant judgments for shorter room.

Overall, the results of the participant judgments were in line with our assumptions regarding the semantic differences between narrow, zhǎi, and semai. In respect to the main effects, the Chinese and American groups both showed a much stronger tendency to make their judgments based on width differences than the Japanese group, which seemed concerned with the combination of width and height differences (as demonstrated by the differences in percentages shown in Table 3). Nonetheless, although the Chinese group’s percentages were clearly much closer in value to the American group’s, they still fell between the extremes displayed by the American and Japanese groups. The interactions told a similar story. The American and Japanese groups’ performance was clearly distinct, while the Chinese group’s results fell between these two extremes. In the case of the first analysis comparing whether or not the narrower room image was chosen, the Chinese group’s response patterns resembled a less extreme version of the American group’s responses (Figures 2 and 3), while in the analysis of whether or not the shorter room image was chosen, the Chinese group’s response pattern was more moderately placed between that of the American and Japanese groups (Figure 4).

3.2.2 Response times

Response times were included as a proxy of processing cost associated with differences in width and height. The results for the final RTs model are shown in Table 6. As for the main effects, the Chinese and Japanese groups both responded significantly faster than the American group (American vs. Japanese: t = −3.390, p < 0.001). This was likely due, at least in part, to the relatively older age of the American participants compared to the other two groups. Investigations into the effects of age on response times (e.g., Thompson et al. 2014) have shown that response speed begins to decline around the age of 24.

Table 6:

Final mixed effects model for log adjusted RTs.

Predictors Estimates Std. error Statistic p
(Intercept) 7.67 0.09 89.57 <0.001
American 0.36 0.13 2.90 0.004
Japanese −0.07 0.12 −0.59 0.557
Same width 0.10 0.03 3.77 <0.001
Large width diff. −0.13 0.02 −5.53 <0.001
Same height −0.01 0.03 −0.58 0.561
Large height diff. −0.08 0.02 −3.32 0.001
American * same width 0.01 0.03 0.38 0.706
Japanese * same width −0.06 0.03 −1.85 0.064
American * large width diff. −0.02 0.03 −0.51 0.612
Japanese * large width diff. 0.08 0.03 2.45 0.014
American * same height 0.02 0.03 0.67 0.500
Japanese * same height 0.08 0.03 2.23 0.026
American * large height diff. 0.04 0.03 1.22 0.223
Japanese * large height diff. −0.07 0.03 −2.15 0.032
Random effects
σ 2 0.22
τ00 person 0.26
τ00 item 0.00
ICC 0.54
N person 102
N item 72
Observations 7,337
Marginal R2/conditional R2 0.084/0.578
  1. Reference categories are as follows: group = Chinese, width difference = medium, height difference = medium. Significant values are shown in bold.
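The ICC reported among the random effects can be checked against the variance components. The sketch below assumes the common definition in which the random-effect variances are summed and divided by the total of random-effect and residual variance; the paper does not state which definition was used, and the function name `icc` is illustrative.

```python
from typing import Sequence


def icc(random_effect_variances: Sequence[float], residual_variance: float) -> float:
    """Share of (random + residual) variance attributable to random effects.

    Assumed definition: sum(tau00) / (sum(tau00) + sigma^2).
    """
    between = sum(random_effect_variances)
    return between / (between + residual_variance)


# Table 6 (RT model): tau00 person = 0.26, tau00 item = 0.00, sigma^2 = 0.22.
icc_rt = icc([0.26, 0.00], 0.22)

# Table 7 (dwell time model): tau00 person = 0.04, sigma^2 = 0.03.
icc_dwell = icc([0.04], 0.03)

print(f"RT model ICC: {icc_rt:.2f}")
print(f"Dwell time model ICC: {icc_dwell:.2f}")
```

Under this definition the rounded variance components reproduce the reported values of 0.54 and 0.57, indicating that a little over half of the unexplained variance lies between participants.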

The analysis also revealed significant group × width difference and significant group × height difference interactions (Figures 5 and 6), which are informative in revealing how similarities and differences in width and height affected processing costs for participants within the three groups. The group × width difference results (Figure 5) indicate increased RTs with decreases in width difference for all three groups; however, the differences for the Chinese and American groups were far more pronounced, suggesting that width differences had a stronger influence on processing for participants in these two groups than they did for the Japanese participants. As for height differences, the results displayed in Figure 6 show the opposite trend. In brief, the Japanese participants’ RTs were more strongly affected by height similarities than they were for participants in the other two groups.
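Because the RTs were log adjusted, the estimates in Table 6 are differences on the natural-log scale, and exponentiating them gives multiplicative changes in RT. The sketch below applies this to the width difference terms; it uses the rounded estimates from Table 6, considers fixed effects only, and the names are illustrative assumptions.

```python
import math

# Rounded log-scale estimates copied from Table 6 (reference: Chinese
# group, medium width difference).
TABLE6 = {
    "same_width": 0.10,
    "large_width": -0.13,
    "American:same_width": 0.01,
    "American:large_width": -0.02,
    "Japanese:same_width": -0.06,
    "Japanese:large_width": 0.08,
}


def rt_ratio_vs_medium(group: str, level: str) -> float:
    """RT at `level` relative to RT at medium width difference (ratio)."""
    log_diff = TABLE6[level]
    if group != "Chinese":
        log_diff += TABLE6[f"{group}:{level}"]
    return math.exp(log_diff)


rt_ratios = {
    group: {level: rt_ratio_vs_medium(group, level) for level in ("same_width", "large_width")}
    for group in ("Chinese", "American", "Japanese")
}
for group, by_level in rt_ratios.items():
    print(
        f"{group}: same width x{by_level['same_width']:.2f}, "
        f"large width diff. x{by_level['large_width']:.2f}"
    )
```

On these rounded values, RTs in the Chinese and American groups change by roughly 10 to 14 percent relative to the medium condition, while the Japanese group's RTs change by about 4 to 5 percent, consistent with the weaker width effect described above.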

Figure 5: 
Group by width difference interactions for response times.

Figure 6: 
Group by height difference interactions for response times.

Overall, these results are largely in line with the participant judgment analyses, indicating that the Chinese and the American groups attended more closely to width differences, as evidenced by the slower RTs with the higher similarity in width, while the Japanese group attended more closely to a combination of width and height differences. Furthermore, it is also worth noting that all significant interactions for the Chinese group occurred in comparison to the Japanese group rather than the American group (Table 6). These findings provide supplementary support for our assumption regarding the relative semantic values of narrow, zhǎi, and semai.

3.2.3 Eye-tracking results

The participant judgment and RT results provided us with an idea of how the participants interpreted the aims of the task on a linguistic level; however, they tell us nothing regarding how the participants went about approaching the task in respect to global versus local visual processing strategies. To gain insights into this, we turned to the eye-tracking data. As indicated in Table 3, 77% of the Chinese group’s dwell times occurred inside the room images, which was slightly less than the 82% observed for the American group and considerably more than the 64% demonstrated by the Japanese group. To gain a more precise understanding of where participants in the three groups focused their gaze, we generated aggregate heat maps for each group (Figure 7).[9] As can be seen in these heat maps, all three groups demonstrated a tendency to focus their gaze on the central area of the room images, suggesting participants in all three groups relied on foveal information while completing the task. However, the Japanese group also demonstrated a propensity to focus on the middle of the screen directly between the two room images, suggesting greater use of extrafoveal information when compared to the other two groups. The main effects in the statistical analysis support the claim that the Chinese group’s visual tendencies aligned more closely with the American group than the Japanese group. As shown in Table 7, there were no significant main effect differences between the Chinese and American groups, but there were between the Chinese group and the Japanese group and between the American group and the Japanese group (American vs. Japanese: t = −3.916, p < 0.001).

Figure 7: 
Aggregate heat maps indicating average dwell times for Chinese (top), American (middle), and Japanese (bottom) participants.

Table 7:

Final mixed effects model for percentage of dwell times inside room images.

Predictors Estimates Std. error Statistic p
(Intercept) 0.77 0.03 23.53 <0.001
American 0.05 0.05 1.04 0.296
Japanese −0.14 0.05 −2.85 0.004
Same width 0.00 0.01 0.01 0.995
Large width diff. −0.01 0.01 −1.67 0.094
Trial 0.01 0.00 3.62 <0.001
American * same width 0.01 0.01 1.00 0.319
Japanese * same width −0.02 0.01 −1.47 0.140
American * large width diff. 0.02 0.01 1.38 0.168
Japanese * large width diff. 0.02 0.01 1.72 0.086
American * Trial −0.03 0.00 −6.58 <0.001
Japanese * Trial 0.00 0.00 0.92 0.359
Random effects
σ 2 0.03
τ00 person 0.04
ICC 0.57
N person 102
Observations 7,291
Marginal R2/conditional R2 0.088/0.611
  1. Significant values are shown in bold.

The results also indicated significant group by width difference and group by trial interactions (Figures 8 and 9). The significant group by width difference interactions were found between the American group and the Japanese group when the rooms were the same width (American vs. Japanese: t = −2.391, p = 0.017). More interesting, however, were the group by trial interactions, as these revealed a significant difference between the Chinese and the American groups, and between the American and the Japanese groups (American vs. Japanese: t = 7.280, p < 0.001), but not between the Chinese and the Japanese groups. The group by trial interaction plots shown in Figure 9 suggest that the Chinese and Japanese groups demonstrated a shift in strategies from more global to more local viewing behaviors as the task progressed, while the American group showed the opposite pattern. This, and other findings, will be considered more carefully in the next section.

Figure 8: 
Group by width difference interactions for dwell time percentages.

Figure 9: 
Group by trial interactions for percentage of dwell times inside room images.

4 Discussion

The current study was designed to add to the limited number of studies comparing cognitive psychology/linguistic and cultural psychology explanations for observed differences between EAs and WCs in the use of global and local visual search strategies. As stated at the outset, there is ample evidence to suggest that both linguistic and cultural differences affect visual processing strategies, but few studies have attempted to determine which provides the most plausible explanation on a particular visual search task. The results of the participant judgments and RT analyses largely confirm our assumption that the semantic value of the Chinese word zhǎi overlaps with both the English word narrow and the Japanese word semai, but more so with narrow than semai. This was shown in the main effects of participant judgments, particularly for the logistic regression analysis comparing whether or not the participants selected the narrower (English sense) of the two room images. And although the Chinese group’s overall RTs were much closer to those of the Japanese group than the American group, the interactions indicated highly similar patterns for the Chinese and American participants as the differences in width and height between the two rooms varied.

Of more concern for the aims of the study were the eye-tracking results, as these were the only data that could provide insights into the visual search strategies used by the participants. The results of the main effects, comparing percentage of dwell times inside the room images, aligned the Chinese group with the American group, supporting the findings of Tajima and Duffield (2012) and Senzaki et al. (2014) and indicating that linguistic differences cannot be disregarded in visual search tasks, particularly when the tasks involve a verbal component either in the form of the instructions provided to the participants or any spoken or written output required on the part of participants.

Nonetheless, despite the growing body of literature indicating the influence of language on visual search tasks, linguistic considerations have remained largely absent from cultural psychology and vision researchers’ theoretical accounts for the observed differences in viewing behavior between EAs and WCs. In their study entitled “Asia has the global advantage: race and visual attention”, for example, McKone et al. (2010, Section 8.2) evaluate a number of theoretical explanations for the EA/WC differences in visual processing, including: 1) genetic differences; 2) individualism-collectivism; 3) differences in brain hemisphere arousal and/or organization; 4) myopia; 5) degree of visual complexity in the physical environment; 6) cultural differences in parents’ direction of babies’ attention during infancy; and 7) situational social variables. Despite the extensiveness of their considerations, however, the authors make no mention of linguistic differences and the role these might play. The results of the current study indicate that linguistic differences should also be considered in studies exploring the link between culture and visual behaviors.

Furthermore, the findings from the study also indicate that the linguistic effects on visual behavior are not limited to those considered in typological analyses, which tend to focus on syntactic, phonological, and morphological differences. Tajima and Duffield demonstrated the importance of typological differences on a syntactic level, while the current study extends their findings by indicating that semantic differences in standard translation equivalents can also have a bearing on the use of visual search strategies. Not only does this need to be considered when interpreting the results of empirical studies, it also needs to be considered when designing experiments investigating differences in visual search strategies across groups. For the task used in the current study, for example, a naïve understanding of the relative semantic values of zhǎi, narrow, and semai might lead one to conclude that all three groups were essentially performing the same task, but this was clearly not the case.

Still, there are some findings in the current study which are difficult to account for through linguistic differences alone. Of particular interest was the shift in strategies employed by participants in the three groups as the task progressed. To review, the Chinese and the Japanese groups demonstrated a tendency toward the use of more local strategies, as evidenced by an increased percentage of dwell times inside the room images, while the Americans tended to gravitate toward a more global approach. One possible explanation for this is that participants from the three groups started off by adopting a preferred style that was more or less global or local, in accordance with their interpretation of zhǎi, narrow, and semai, combined with “default” strategies influenced by cultural background, but they then shifted their strategies in response to the demands of the task and their perceptions regarding the effectiveness of their initial strategy.

This view extends Cramer et al.’s (2016) assertion that environmental factors not directly connected to culture also play a role in participants’ visual search strategies by indicating that task-specific demands might also play a role. Without the benefit of systematic exit interviews, it is difficult to know how intentional this shift was amongst the participants. Nonetheless, some of the American participants did state after the task that they consciously decided to switch from a more local to a more global strategy once they suspected that some of the rooms had the same width. It would be interesting to rerun the experiment without the same width items to see if this pattern was maintained.

Another distinction between the Chinese and the American groups that warrants additional consideration concerns the patterns revealed through the heat map analysis. Although these results showed consistency between these groups in dwell times focused on the center of the screen (i.e., between the room images) and on the center of the room images, the heat maps also show divergent patterns in the location of fixations for the American group and the Chinese group outside these areas. Specifically, the American participants tended to direct their gaze towards both the top and the bottom areas of the room images, while the Chinese group showed a strong preference for only the top areas. Again, it would be interesting to repeat the study without same width items, or even with room images with more pronounced differences in width and size, to see if such variations had an effect on these patterns.

Even without systematically-collected empirical data, however, the findings of the current study underscore some important things that need to be kept in mind when designing tasks and analyzing data in studies focused on linguistic and/or cultural differences in viewing behavior. First of all, it is important to resist dichotomous accounts of global versus local viewing tendencies and the conclusion that EAs and WCs literally see the world differently. As was demonstrated in the blind spot facial recognition studies by Caldara et al. (2010) and Miellet et al. (2012), and as was stated by Ishii et al. (2009), Cramer et al. (2016), and others, EAs and WCs are both fully capable of using global or local perceptual strategies when the need arises. If there are general differences in the visual search strategies used by EAs and WCs, these most likely come down to a difference in preferences rather than culturally-imposed restrictions. Along these lines, the findings in this study highlight the fact that the use of visual strategies is not a static trait. Instead, it is very much situation and task dependent, and many factors beyond language and culture affect people’s choice of strategies, regardless of whether or not this is a conscious choice.

5 Limitations and conclusion

As with any study, there were some limitations with the current study. To begin with, the study focused on a single lexical item, which needs to be considered when attempting to apply the findings more broadly. In addition, the study lacks ecological validity; the task used will rarely be encountered in the real world. People might look at floor layouts of rooms on blueprints or at a real estate office, for example, but this would probably not involve a close comparison of particular aspects related to subtle size differences. Another limitation is that the current study did not take individual differences into account. Although the statistical analyses often showed large differences in group means, they also showed large standard deviations for things like fixation percentages inside the room images (Table 3).

These limitations notwithstanding, the findings of the current study indicated that linguistic differences could explain much of the observed variation in visual strategies on the task used in this study. However, it is important to note that linguistic differences alone were insufficient; there seemed to have been other factors involved as well. When placed in the broader research context, the findings of the current study provide an important, albeit small, piece of information in the complex task of identifying and accounting for the influences and interactions of language, culture, and other relevant factors on cognitive processes.

Data availability statement

The source code used and the datasets generated and analyzed during the current study are available through the Open Science Framework at https://osf.io/gk5cf/


Corresponding author: Brent Wolter, Ocean University of China, Qingdao, China; and Department of English and Philosophy, Idaho State University, 921 S. 8th Ave., Stop 8056, Pocatello, ID, USA 83209-8056, E-mail:

References

Athanasopoulos, Panos & Emanuel Bylund. 2013. Does grammatical aspect affect motion event cognition? A cross-linguistic comparison of English and Swedish speakers. Cognitive Science 37(2). 286–309. https://doi.org/10.1111/cogs.12006.

Blais, Caroline, Rachael E. Jack, Christoph Scheepers, Daniel Fiset & Roberto Caldara. 2008. Culture shapes how we look at faces. PLoS One 3(8). e3022. https://doi.org/10.1371/journal.pone.0003022.

Boduroglu, Aysecan, Priti Shah & Richard E. Nisbett. 2009. Cultural differences in allocation of attention in visual information processing. Journal of Cross-Cultural Psychology 40(3). 349–360. https://doi.org/10.1177/0022022108331005.

Caldara, Roberto, Xinyue Zhou & Sébastien Miellet. 2010. Putting culture under the ‘spotlight’ reveals universal information use for face recognition. PLoS One 5(3). e9708. https://doi.org/10.1371/journal.pone.0009708.

Chua, Hannah Faye, Julie E. Boland & Richard E. Nisbett. 2005. Cultural variation in eye movements during scene perception. Proceedings of the National Academy of Sciences 102(35). 12629–12633. https://doi.org/10.1073/pnas.0506162102.

Cramer, Emily S., Michelle J. Dusko & Ronald A. Rensink. 2016. Group-level differences in visual search asymmetry. Attention, Perception, & Psychophysics 78(6). 1585–1602. https://doi.org/10.3758/s13414-016-1137-0.

Davies, Mark. 2008. The Corpus of Contemporary American English (COCA). Available at: https://www.english-corpora.org/coca/.

Firth, John Rupert. 1957. Studies in linguistic analysis. Oxford: Blackwell.

Flecken, Monique, Panos Athanasopoulos, Jan Rouke Kuipers & Guillaume Thierry. 2015. On the road to somewhere: Brain potentials reflect language effects on motion event perception. Cognition 141. 41–51. https://doi.org/10.1016/j.cognition.2015.04.006.

Flecken, Monique, Christiane Von Stutterheim & Mary Carroll. 2014. Grammatical aspect influences motion event perception: Findings from a cross-linguistic non-verbal recognition task. Language and Cognition 6(1). 45–78. https://doi.org/10.1017/langcog.2013.2.

Goller, Florian, Soonja Choi, Upyong Hong & Ulrich Ansorge. 2020. Whereof one cannot speak: How language and capture of visual attention interact. Cognition 194. 104023. https://doi.org/10.1016/j.cognition.2019.104023.

Imai, Mutsumi & Takahiko Masuda. 2013. The role of language and culture in universality and diversity of human concepts. In Advances in culture and psychology. Oxford University Press. https://oxford.universitypressscholarship.com/view/10.1093/acprof:oso/9780199930449.001.0001/acprof-9780199930449-chapter-1 (accessed 13 January 2021). https://doi.org/10.1093/acprof:oso/9780199930449.003.0001.

Imai, Mutsumi, Junko Kanero & Takahiko Masuda. 2016. The relation between language, culture, and thought. Current Opinion in Psychology 8. 70–77. https://doi.org/10.1016/j.copsyc.2015.10.011.

Ishii, Keiko, Takafumi Tsukasaki & Shinobu Kitayama. 2009. Culture and visual perception: Does perceptual inference depend on culture? Japanese Psychological Research 51(2). 103–109. https://doi.org/10.1111/j.1468-5884.2009.00393.x.

Jarvis, Scott & Aneta Pavlenko. 2010. Crosslinguistic influence in language and cognition, Paperback edition. London, New York: Routledge.

Kelly, David J., Shaoying Liu, Helen Rodger, Sébastien Miellet, Liezhong Ge & Roberto Caldara. 2011. Developing cultural differences in face processing. Developmental Science 14(5). 1176–1184. https://doi.org/10.1111/j.1467-7687.2011.01067.x.

Kelly, David J., Sebastien Miellet & Roberto Caldara. 2010. Culture shapes eye movements for visually homogeneous objects. Frontiers in Psychology 1. 1–7. https://doi.org/10.3389/fpsyg.2010.00006.

Lao, Junpeng, Sébastien Miellet, Cyril Pernet, Nayla Sokhn & Roberto Caldara. 2017. iMap4: An open source toolbox for the statistical fixation mapping of eye movement data with linear mixed modeling. Behavior Research Methods 49(2). 559–575. https://doi.org/10.3758/s13428-016-0737-x.

Levinson, Stephen C. 2003. Space in language and cognition: Explorations in cognitive diversity (Language, culture, and cognition 5). Cambridge, New York: Cambridge University Press. https://doi.org/10.1017/CBO9780511613609.

Lupyan, Gary, Rasha Abdel Rahman, Lera Boroditsky & Andy Clark. 2020. Effects of language on visual perception. Trends in Cognitive Sciences 24(11). 930–944. https://doi.org/10.1016/j.tics.2020.08.005.

Lüthold, Patrick, Junpeng Lao, Lingnan He, Xinyue Zhou & Roberto Caldara. 2018. Waldo reveals cultural differences in return fixations. Visual Cognition 26(10). 817–830. https://doi.org/10.1080/13506285.2018.1561567.

Masuda, Takahiko & Richard E. Nisbett. 2001. Attending holistically versus analytically: Comparing the context sensitivity of Japanese and Americans. Journal of Personality and Social Psychology 81(5). 922–934. https://doi.org/10.1037//0022-3514.81.5.922.

Masuda, Takahiko & Richard E. Nisbett. 2006. Culture and change blindness. https://onlinelibrary.wiley.com/doi/abs/10.1207/s15516709cog0000_63 (accessed 30 September 2022). https://doi.org/10.1207/s15516709cog0000_63.

McKone, Elinor, Anne Aimola Davies, Dinusha Fernando, Rachel Aalders, Hildie Leung, Tushara Wickramariyaratne & Michael J. Platow. 2010. Asia has the global advantage: Race and visual attention. Vision Research 50(16). 1540–1549. https://doi.org/10.1016/j.visres.2010.05.010.

Miellet, Sébastien, Lingnan He, Xinyue Zhou, Junpeng Lao & Roberto Caldara. 2012. When East meets west: Gaze-contingent blindspots abolish cultural diversity in eye movements for faces. Journal of Eye Movement Research 5(2). https://doi.org/10.16910/jemr.5.2.5.

Nisbett, Richard E., Kaiping Peng, Incheol Choi & Ara Norenzayan. 2001. Culture and systems of thought: Holistic versus analytic cognition. Psychological Review 108. 291–310. https://doi.org/10.1037/0033-295X.108.2.291.

Papafragou, Anna, Justin Hulbert & John Trueswell. 2008. Does language guide event perception? Evidence from eye movements. Cognition 108(1). 155–184. https://doi.org/10.1016/j.cognition.2008.02.007.

Petrova, Kalina, Dirk Wentura & Xiaolan Fu. 2013. Cultural influences on oculomotor inhibition of remote distractors: Evidence from saccade trajectories. Vision Research 84. 43–49. https://doi.org/10.1016/j.visres.2013.03.008.

R Core Team. 2021. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Available at: https://www.R-project.org/.

Roberson, Debi, Hyensou Pak & J. Richard Hanley. 2008. Categorical perception of colour in the left and right visual field is verbally mediated: Evidence from Korean. Cognition 107(2). 752–762. https://doi.org/10.1016/j.cognition.2007.09.001.

Rodger, Helen, David J. Kelly, Caroline Blais & Roberto Caldara. 2010. Inverting faces does not abolish cultural diversity in eye movements. Perception 39(11). 1491–1503. https://doi.org/10.1068/p6750.

Senzaki, Sawa, Takahiko Masuda & Keiko Ishii. 2014. When is perception top-down and when is it not? Culture, narrative, and attention. Cognitive Science 38(7). 1493–1506. https://doi.org/10.1111/cogs.12118.

Tajima, Yayoi & Nigel Duffield. 2012. Linguistic versus cultural relativity: On Japanese-Chinese differences in picture description and recall. Cognitive Linguistics 23(4). 675–709. https://doi.org/10.1515/cog-2012-0021.

Thompson, Joseph J., Mark R. Blair & J. Andrew Henrey. 2014. Over the hill at 24: Persistent age-related cognitive-motor decline in reaction times in an ecologically valid video game task begins in early adulthood. PLoS One 9(4). e94215. https://doi.org/10.1371/journal.pone.0094215.

Winawer, Jonathan, Nathan Witthoft, Michael C. Frank, Lisa Wu, Alex R. Wade & Lera Boroditsky. 2007. Russian blues reveal effects of language on color discrimination. Proceedings of the National Academy of Sciences of the United States of America 104(19). 7780–7785. https://doi.org/10.1073/pnas.0701644104.

Wolter, Brent. 2006. Lexical network structures and L2 vocabulary acquisition: The role of L1 lexical/conceptual knowledge. Applied Linguistics 27(4). 741–747. https://doi.org/10.1093/applin/aml036.

Wolter, Brent, Junko Yamashita & Chi Yui Leung. 2020. Conceptual transfer and lexical development in adjectives of space: Evidence from judgments, reaction times, and eye tracking. Applied Psycholinguistics 41(3). 595–625. https://doi.org/10.1017/S0142716420000107.

Received: 2020-09-08
Accepted: 2022-10-16
Published Online: 2022-11-14
Published in Print: 2022-11-25

© 2022 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 7.5.2024 from https://www.degruyter.com/document/doi/10.1515/cog-2020-0105/html