In 2018, it was reported that male students were given priority over female students in exams at some Japanese medical universities. The underlying reason for this controversial priority is thought to be the assumption that female doctors experience more difficulties than male doctors in the clinical field. One of the universities involved justified setting their admissions criteria higher for women based on an academic paper in the field of psychology that claimed women had better communication skills than men (Mainichi Japan 2018). The officials of the university explained that women tend to reach psychological maturity earlier and have relatively higher communication skills than men; therefore, they adjusted their test scoring in an attempt to correct the disadvantage for male test takers. The paper referenced in their statement (Cohn 1991) investigated sex differences in maturity, but not sex differences in communication skills. Nevertheless, it was used to justify gender bias in the exam without referring to actual gender bias in the clinical field or the higher unemployment rate of female doctors than male doctors, which suggests a discriminatory labor environment against female doctors.

In response to this situation, some Japanese psychologists published a statement denouncing the university’s action as justifying sexism by uncritically citing the research. The statement gained the support of over 60 others in the field. Moreover, one academic society on psychological research, the Japan Society of Personality Psychology (https://jspp.gr.jp/en/), held an academic seminar where one of the statement’s authors was invited to discuss how psychologists should interpret gender studies and present the results of such studies to society.

During the academic seminar, an evolutionary psychologist and a social psychologist explained the current research environment on gender studies in their respective fields: (1) gender differences tend to be used to justify maintaining the status quo in communities; (2) it is hard to interpret gender differences in a strictly statistical sense (in terms of both significance probability and effect size); and (3) there are no guidelines or standards on how researchers should state their results when they find gender effects in their studies (note: these statements are based on a report by the author of the paper, Nomura, who participated in the seminar).

Many studies on gender effects have recently been reported in the fields of human–machine interaction, including affective computing. In particular, researchers in the field of human–robot interaction (HRI) have conducted several studies on gender effects (Nomura 2017). HRI is characterized by the embodiment of robots, which makes it easier to assign gender-specific properties to robots (so-called “gendered robots”). Moreover, interaction effects of gendered robots with other factors, including the user’s gender, can be explicitly investigated in HRI. However, these studies may be used to design daily-life applications of robots that entrench existing gender biases.

Whether men or women are more likely to prefer or dislike robots is an important issue in HRI (Nomura 2017). Some studies found that females had more negative attitudes toward interactions with robots than males in general (Nomura et al. 2006, 2008), and other studies reported that males had a more positive attitude than females towards the usefulness of a specific type of robot (Kuo et al. 2009; Lin et al. 2012). Effects of user gender in perception of and feelings about robots may depend on other factors such as the type of robots, and the situations and contexts under which robots are used. There are existing studies comparing males and females under different conditions with respect to the existence and absence of various factors related to robots, such as politeness in behaviors (Strait et al. 2015), machine-like or human-like appearance (Tung 2011), and task structure on cooperation or competition with robots (Mutlu et al. 2006).

One of the important issues in HRI (Nomura 2017) is how interaction with humans can be encouraged by endowing robots with human-like characteristics. These characteristics include human-like appearance, natural language communication with voice, and motions such as eye contact and joint movement/position. Gender has also been considered as an important characteristic. Some studies (Carpenter et al. 2009; Niculescu et al. 2010) investigated the effects of gendered robots on human psychological and behavioral reactions toward these robots. In these studies, methods of gendering robots included: appearance; voice; manipulation of names and pronouns that were used in the instructions of the experiments and surveys; or a combination of these factors.

However, the impact of robot gender is not simple, and there are interactions with situational factors such as task and context, as well as human factors, including users’ gender, educational background, and culture. In other words, gender preference in robotics design is dependent on humans interacting with the robots and the situations in which they interact.

Regarding interaction effects between robot gender and user gender, some studies suggested a cross-gender effect, that is, a tendency for males to prefer robots with female characteristics, and females to prefer robots with male characteristics (Siegel et al. 2009; Alexander et al. 2014). On the other hand, another study revealed that females completed a task with a robot equally fast regardless of the robot’s gender, whereas males were faster in completing the task when they interacted with the male robot (Kuchenbrandt et al. 2014).

Interactions between robot gender and gender stereotypes that users have are also an important consideration. It was found that a male robot was perceived as more masculine than a female robot, and stereotypically male tasks (e.g., transporting goods, steering machines) were perceived as more suitable for the male robot than were stereotypically feminine tasks (e.g., child care, household maintenance) (Eyssel and Hegel 2012). Another study showed a gender stereotype in occupational roles; a male robot was preferred in a scenario involving a job position related to security and a female robot was preferred in a scenario involving a job position related to healthcare (Tay et al. 2014). Moreover, it has recently been revealed that humans’ perception of emotional intelligence, including empathy in other humans and gendered robots, was affected by their stereotypical gender-based expectations such as females having higher emotional intelligence than males (Chita-Tegmark et al. 2019).

If one gender has a stronger aversion to robots, that gender will receive fewer services from robots. As mentioned earlier, it has not been sufficiently confirmed that either gender has such an aversion. Nevertheless, research on user gender in HRI can be used to maintain a gender-biased situation in a community.

The worry is that gender bias and gender stereotyping may be justified by referring to research on user gender regardless of whether the reference is justified in this context. For example, assume that some people believe that females are weaker than males in engineering, including the operation of robots, and as a result, males should dominate service sectors that use robots. To convince others about these ideas, they would refer to studies that suggest that females show more negative attitudes toward robots than males do.

In other cases, where people believe that a gender stereotype can support an expected gender-biased situation, research on gender may be referenced to directly support the gender-biased situation and conceal the gender stereotype. In fact, the case of the exam at the Japanese medical university mentioned in the introduction shows that a study on gender differences in maturity was referenced to justify adjustments to test scoring based on gender, without referring to the gender stereotype in the clinical field, that is, the incorrect belief that male doctors should dominate the clinical field.

In a community, it may be assumed that the female gender is more suited to performing domestic tasks and that female-gendered robots are therefore more suitable for these tasks than male-gendered robots. To support this gender-biased assumption while ignoring the underlying gender stereotype, reference may be made to studies suggesting that stereotypically feminine tasks are suitable for female robots.

Some researchers have criticized the gendering of robots in light of gender stereotypes, arguing that robots are gendered through arbitrary choices by technologists who rely mainly on common sense for decision-making (Robertson 2010). Others have argued that the gendering of robots may reinforce societal gender stereotypes (Weber and Bath 2007), and that naïve gendering of robots might cause users to demonstrate negative behaviors toward robots that follow gender stereotypes (De Angeli and Brahnam 2006).

In this framework, people may maintain gender stereotyping through its reproduction in robotics applications. For example, the stereotype that the female gender is more suited to performing domestic tasks may encourage the assignment of these tasks to female-gendered robots, with reference to studies suggesting that stereotypically feminine tasks are suitable for female robots. This may lead to the reproduction of the stereotype of gender suitability for domestic tasks.

Furthermore, the gender stereotype that females are weaker than males in engineering, including the operation of robots, may lead to the construction of a robotics-based pedagogical system of science and technology that focuses only on male students, with reference to studies suggesting that females show more negative attitudes toward robots than males do. This, in turn, may lead to the reproduction of the stereotype.

On reflection, firstly, we should question the validity of a gender bias, for example, in the provision of commercial or educational services using robots. Secondly, we should clarify the relation between gender stereotypes and gender bias, and whether the gender stereotype is actually supported by academic evidence. In this case, we may have to make the implicit gender stereotype explicit ourselves. Thirdly, we should question whether the referenced studies on gender really support the gender-biased situation or the gender stereotype. In the case of the exam at the Japanese medical university, a study on sex differences in maturity was referenced to justify a claim about sex differences in communication skills. This type of logical error led to the justification of sexism by uncritically citing the research. Finally, we should be careful when using statements on gender differences mentioned in referenced studies. It is hard to interpret gender differences in a strictly statistical sense. We should clarify whether (1) the statements are supported by sufficient evidence, and (2) each study that supports the statements had a sufficient effect size.
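The point about effect size can be illustrated with a minimal sketch, given here under stated assumptions. It computes Cohen's d, a commonly used standardized mean difference, and shows how a two-sample t statistic (pooled-variance form) grows with sample size even when d stays fixed. The function names `cohens_d`, `effect_label`, and `t_from_d` are hypothetical, the labels follow Cohen's conventional rules of thumb rather than strict thresholds, and the numbers are invented for illustration only; they are not taken from any of the cited studies.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def effect_label(d):
    """Conventional rule-of-thumb labels for |d| (not strict cut-offs)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


def t_from_d(d, n_a, n_b):
    """Pooled-variance t statistic implied by a given d and group sizes."""
    return d * math.sqrt(n_a * n_b / (n_a + n_b))


# Invented example: the same tiny effect (d = 0.1) at two sample sizes.
for n in (20, 2000):
    print(n, effect_label(0.1), round(t_from_d(0.1, n, n), 2))
```

With 2000 participants per group, a difference of d = 0.1 yields a t statistic of about 3.16, which would usually be reported as statistically significant even though the two distributions overlap almost entirely. This is one reason a statement such as "one gender scored higher than the other" should not be cited without its effect size.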

It should be pointed out that the argument of this paper is still only a supposition and is not yet supported by sufficient evidence. Thus, a meta-analysis should be conducted to gather cases and confirm the evidence. A protocol for conducting it can be proposed as follows:

  1. For public reports (papers, news, books, theses, and so on) on robotics applications that mention gender (of humans, robots, or both), it should be clarified:

  2. who the stakeholders are in the field where robots are applied (developers, vendors, users (students, general adults, elderly), staff and managers of facilities, etc.),

  3. what the robotics applications aim at,

  4. what gender-biased situations the above aim invites,

  5. which of the above stakeholders expect the above gender-biased situations,

  6. which of the above stakeholders suffer from the above gender-biased situations,

  7. what gender stereotypes are held by the stakeholders who expect the gender-biased situations,

  8. what studies on gender in HRI are referred to by the stakeholders who expect the gender-biased situations,

  9. whether the above studies on gender can really imply the gender-biased situations and/or the gender stereotypes of the stakeholders,

  10. what gender stereotypes can be reproduced if the targeted robotics applications are realized.

This protocol for the meta-analysis can be used to formulate guidelines on how researchers should state their results when they find the effects of gender bias in their studies.
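As one possible way to put the protocol into practice, the sketch below represents a single coded report as a record with one field per question (items 2 to 10), plus a helper that lists the questions still unanswered. The class name `ReportCoding`, its field names, and the helper `unanswered_items` are hypothetical choices made for this illustration and are not part of the proposed protocol itself; the example record is invented.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ReportCoding:
    """Coding sheet for one public report (item 1 of the protocol)."""
    source: str                                                      # the report being coded
    stakeholders: List[str] = field(default_factory=list)            # item 2
    application_aim: str = ""                                        # item 3
    biased_situations: List[str] = field(default_factory=list)       # item 4
    stakeholders_expecting: List[str] = field(default_factory=list)  # item 5
    stakeholders_suffering: List[str] = field(default_factory=list)  # item 6
    stereotypes_held: List[str] = field(default_factory=list)        # item 7
    studies_referenced: List[str] = field(default_factory=list)      # item 8
    studies_support_claims: Optional[bool] = None                    # item 9
    stereotypes_reproduced: List[str] = field(default_factory=list)  # item 10


def unanswered_items(record):
    """Return the names of protocol fields that have not been filled in yet."""
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        # None, an empty string, or an empty list all count as "not yet coded";
        # an explicit False for item 9 counts as an answer.
        if value is None or value == "" or value == []:
            missing.append(f.name)
    return missing


# Invented example record, for illustration only.
example = ReportCoding(
    source="hypothetical news article on a domestic service robot",
    stakeholders=["developers", "users"],
    studies_support_claims=False,
)
print(unanswered_items(example))
```

Keeping item 9 as a three-valued field (`None`, `True`, `False`) separates "not yet assessed" from "assessed and found unsupported", which is the distinction the protocol is mainly concerned with.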