Embodied cognition in Immersive Virtual Environments

Exploring the benefits of gesture on second language acquisition


Note: I'll be using virtual reality, immersive virtual environments and IVEs interchangeably throughout this presentation.

  • digital environments
  • experienced through immersive hardware
    • head-mounted displays
    • embodied controllers

Research Motivation

Immersion + Learning




Research Motivation

I put on a VR headset and thought it was special

  • Spatial presence/Place illusion
  • Embodiment

Spatial presence/Place illusion

  • Spatial/Perceptive presence: being in a place
  • General/Cognitive presence: focusing attention on the place
  • Typically not well-separated in experiments (no exception in this experiment)

Lessiter et al., 2001; Schubert et al., 2001; Witmer, Jerome, Singer, 2005

Embodiment in VR

“The sense that emerges when your virtual body properties are processed as if they were the properties of one’s own biological body”

  • sense of self-location
    • feeling that one’s self is located inside the biological body or an avatar’s body
  • sense of agency
    • the body is able to conduct the user's will
  • sense of body ownership
    • the body is the source of experiences (e.g. body transfer illusion)

Kilteni, Groten, 2012

Can we leverage these for something useful? How?

  • Situated learning (Brown & Duguid 1993)
  • Embodied cognition (Wilson 2002)

Situated Learning

  • Creating meaning from the real activities of daily living
  • Cooperative education experiences … immersed and physically active in an actual work environment
  • “Embeds learning in activity and makes deliberate use of the social and physical context.”

Brown and Duguid, 1993

Embodied Cognition

  • Summary: Cognitive processes are rooted in the body’s interactions with the world
  • “The activity of the mind is grounded in the mechanisms that evolved for interaction with the environment.” (#4)
  • This research is conceptualised through this perspective

Wilson, 2002

Immersion + Learning


  • Simulators, widely deployed, widely successful
    • Extremely vocation-focussed (i.e. not for abstract learning)
    • Often use real hardware, simulated software (consumer IVEs: general hardware)

Abstract/cognitive applications?

  • Theoretically!
    • Spatial
    • Abstract → concrete
    • Interactive
    • Repeatable
    • Feasibility
    • Motivating
    • Post-reality (Slater & Sanchez-Vives, 2016)

Mikropoulos, 2006


  • Yes (so motivating, vs. real-world, Makransky, 2019)
  • No (too confusing, vs. desktop, Makransky, 2019)
  • It's complicated
  • ✅ People learn in IVEs
  • 🤷 People learn better in IVEs vs. real-world
  • 🤷 People learn better in IVEs vs. desktop
  • 🤷 People learn better in IVEs vs. watching people in IVEs

IVEs are different

What affects learning?

  • Meta-analysis (Howard, 2019)
    • ✅ Game elements had a significant impact on outcomes
    • ✅ Output hardware had a notable impact
      • I.e. head-mounted displays
    • 🤔 Input modalities did not have a notable impact on cognitive learning outcomes

Input modalities ≠ Impact

  • Is that true? For everything?
  • Embodied Cognition theories:
    • Bodily-rooted knowledge: lever direction, happiness, response time (Barsalou 2008)
    • Physical manipulation: Learning division with candy (Schwartz & Martin 2006)
    • Gesture as simulated action: related gesture for maths memorisation (Hostetter & Alibali 2008)

Howard limitations

  • Too many variables
    • Range of subjects
    • Design of systems
    • Technologies used (including input types)
  • Calls for more research in individual subjects
  • Calls for comparative studies of immersive system variables
  • Perhaps we can find a strong example to demonstrate an input effect?

Choosing a subject

  • Applied Linguistic theories tell us:
    • Learner output (and so input into a system) is the most important aspect of language learning
    • Gesture: Total Physical Response (Asher, 1969)
    • Speaking: Production Effect (MacLeod, 2010)
    • Gesture + Speaking combined (Kelly, 2010; McNeill 2005; Macedonia, 2019; Bergmann 2013)


Many types of gesture: iconic, beat, deictic, metaphoric. Here: iconic gesture.

  • EEG & Gesture: areas of the brain linked to action activate when learning or using iconic gesture (Macedonia 2019)

Immersive input modalities should help language learning

  • IVEs: allow gesture input, spoken input, interactivity
  • Linguistics: benefits from gesture input, spoken input, interactivity

But... maybe it won't work

  • Researchers have found multimodal inputs in IVEs can be cognitively overwhelming, and harm learning

Schrader, Bastiaens, 2012

We should find out


  • Comparison between two groups of input methods for language memorisation:
    • Gesture + speech vs. speech
  • Monitor learning outcomes + covariables associated with IVE learning
    • Motivation, presence, usability, cognitive load, learning style
  • Three hypotheses


  • Language memorisation occurs when using gesture + spoken production in an IVE
    • I.e. Does the system allow learning?


  • Leveraging gesture + spoken production leads to better language learning than spoken production alone
    • I.e. Are multiple inputs more powerful than single ones, like in real-life language learning?
    • Or does this not carry over? Why?


  • Cognitive load does not vary significantly between the two groups
    • I.e. Is cognitive load really a cause of learning issues for multimodal inputs?

Gesture + Speech vs Speech only



  1. Pre-test
  2. Exposure (20 words in ~15 minutes)
  3. Post-test
  4. Week-test

Data collection

  1. Scores
  2. Learning preference (VARK)
  3. Cognitive load (self-reported)
  4. Presence (self-reported, Slater)
  5. Usability and motivation (MEEGA+)


Did learning occur?


Pre-test mean: 0 (both groups)

Post-test mean: 8.78 💪👄 and 5.5 👄

Was there a difference between groups? Yes

Independent t-test, p < 0.05, one-tailed
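
As a reminder of what this test computes, here is the standard pooled-variance (Student) form of the independent-samples t statistic. The slides do not say whether the pooled or the Welch variant was used, so the pooled form is an assumption. The symbols (group means, standard deviations s_1 and s_2, group sizes n_1 and n_2) are generic notation, not values reported in the study.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
\mathrm{df} = n_1 + n_2 - 2
```

For a one-tailed test in the predicted direction, the p-value is the upper-tail probability P(T ≥ t) under a t distribution with those degrees of freedom.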

Did group affect retention? No

Two-factor mixed ANOVA (Interaction Group, F(1,22) = 1.179, p = .289)

Did cognitive load have an impact? No

m=0.71 💪👄 vs. m=0.50 👄

Not significant (t(22) = 0.37, p=0.36)

Independent t-test, p < 0.05, one-tailed
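
To show how a reported t value maps to a one-tailed p-value, here is a minimal sketch using only the Python standard library. It is not the analysis script used in the study: the function names `t_pdf` and `one_tailed_p` are hypothetical, and the numerical integration (Simpson's rule) is just one convenient way to get the tail area without a statistics package.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(t, df, steps=1000):
    """Upper-tail probability P(T > t) for t >= 0.

    Integrates the density over [0, t] with Simpson's rule and
    subtracts that area from 0.5 (the distribution is symmetric).
    """
    if t < 0:
        raise ValueError("this sketch only handles t >= 0")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Values taken from the cognitive load slide: t(22) = 0.37
p = one_tailed_p(0.37, 22)
print(f"one-tailed p = {p:.3f}")
```

Run with the slide's values, the result comes out at roughly 0.36, in line with the reported p-value and well above the 0.05 threshold.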

How did the covariables relate?

  • No covariable difference between interaction groups.
  • Only presence had a medium-sized impact on overall score:

Multiple linear regression, backwards step-wise method

Learning preference?

We couldn't use our learning preference data, as there wasn't enough variation in preferences to compare.


  • ✅ Language memorisation occurs when using gesture + spoken production in an IVE
  • ✅ Leveraging gesture + spoken production leads to better language learning than spoken production alone
  • ✅ Cognitive load does not vary significantly between the two groups
  • 😃


  • IVE input modalities had an impact on learning, but interestingly:
  • No covariable (motivation/presence/cognitive load/usability) explained the impact of adding gesture interaction on results

Does iconic gesture lead to better memorisation, just because it is embodied?

  • Evidence for embodied cognition, not embodiment-as-motivation-benefit
  • Evidence we can leverage gesture-simulated-action/physical manipulation in an IVE
  • Evidence that embodied cognition doesn't need perfect tools (e.g. full hand replicas, tactile feedback)

Or maybe…

  • We didn't capture covariables well enough
  • There's some other covariable we haven't tracked
  • Adding input modalities helps learning, whatever they are

Other things to note:

  • Motivation did not impact results
    • Has been considered the cause of gesture-language learning (Asher, 1967)
    • Has been considered the cause of IVE learning (Makransky, 2019)
  • No presence difference between inputs
    • Speculation that embodied controls would increase presence


  • Designed to get a strong result (generalisable?)
  • Interactionist, not embodiment: one method has more interaction than the other
  • Covariable measurement could be more robust
  • Learning preference effect? We couldn't check
  • It's not really language learning, is it?
  • What does it tell us about embodiment in VR?
    • Is it gesture-simulated-action (Hostetter)?
    • Or is it physical manipulation (Schwartz)?