
Inaction, Interaction, Embodied Interaction: comparative language acquisition study across HMD and monitor virtual environments

Draft Paper

Analysing the impact of interaction, embodied interaction and head-mounted immersion on memorisation: rationale and experiment design (for submission to Conference on Games 2019 @ QMUL)


This comparative study investigates whether the type of interface has an effect on users' processes of learning words in a second language in a 3D environment.

It explores two types of computer interaction technology, desktop (mouse, keyboard, monitor) and immersive (IVE: head-mounted immersive virtual environments with embodied interaction), across two types of words: nouns and verbs.

This research is important as immersive interfaces are already being used in place of the desktop interface by researchers and consumers, but research is unclear on the benefits and drawbacks of using different interaction methods for different activities. This research seeks to explore if a distinction exists in the language learning space, through an examination of users' memorisation of nouns and verbs after exposure to each interaction type.

We believe that the immersive technology will demonstrate motivational benefits and stronger learning outcomes than the desktop interface.


The test environments should be as similar as possible across the two conditions. Both the desktop and immersive interfaces will feature:

  • A 3D explorable space
  • First-person perspective
  • Interactive objects
  • Contextually-situated action

The environment will be essentially the same across the two interfaces; however, the desktop interface will use keyboard input for actions (such as “Press E to Lift Object, Press F to Pour Tea”), while the immersive interface will require users to make gestures to grab objects and perform actions.

The environments will be designed to include opportunities to learn nouns and verbs.

  • Environment will be designed to look like a Japanese coffee shop
  • Coffee shop layout will be modified so that there are 10 locations within the shop, which, when the avatar enters, prompt interaction and targeted learning with a noun and verb combination
  • User will be able to walk around the space
  • Experience will be split into two parts: presentation and practice
  • Presentation is a step-by-step interaction with each location being experienced in turn
  • Practice is a free exploration of any interaction location
  • Support and instruction will be provided by a helper NPC who issues verbal instructions
  • Users will be able to ask for verbal instructions to be repeated
  • There will be peripheral, non-interactive elements, such as other tables and chairs and cafe patrons
  • Interactive elements will be marked with an outer glow for clarity of interaction
  • Desktop interface will use keyboard entries to carry out required actions
  • Immersive interface will use bespoke gestures to carry out required actions
Reference image

Vocabulary List

There are eighteen word pairs to be acquired, with each verb paired with a noun.

The word pairs are split into six sets (the list below is a working example; the updated list can be found in the resources section of this page):

Session | Interaction Group | Verb | Noun | Verb (JP) | Noun (JP)
1 | A | Take | Black tea | Toru | Kocha
1 | A | Pay | Money | Shiharau | Okane
1 | A | Pour | Milk | Sosogu | Gyuu nyuu
1 | B | Stir | Cup | Mazeru | Yunomi
1 | B | Cover | Lid | Kabuseru | Futa
1 | B | Sip | Drink | Hitokuchi | Nomimono
1 | C | Eat | Rice cake | Tabemasu | Mochi
1 | C | Wipe | Napkin | Fuku | Otefuki
1 | C | Open | Door | Akeru | Tobira
2 | X | Choose | Green tea | Erabu | Houjicha
2 | X | Give | Banknote | Ageru | Osatsu
2 | X | Add | Sugar | Kuwaeru | Satoo
2 | Y | Swirl | Tea bowl | Mawasu | Chawan
2 | Y | Put on | Tray | Oku | Bon
2 | Y | Spill | Beverage | Koboreru | Inryou
2 | Z | Smell | Cake | Kagu | Yogashi
2 | Z | Clean | Towel | Kirei ni suru | Daifukin
2 | Z | Move | Bag | Ugokasu | Kaban

Participants will experience both sessions, one in VR and one in a desktop environment, in one of two orders:

  • VR → Desktop
  • Desktop → VR

Each session has three groups of word pairs (ABC or XYZ), with each group assigned a different interaction type (non-interaction, interaction, embodied). The interaction type assigned to each group varies between participants: e.g. participant one might have A = non-interaction, B = interaction, C = embodied, while participant two might have A = embodied, B = non-interaction, C = interaction.

There are 12 different combinations of interaction type, output type (HMD vs monitor) and word pair group:

Display Type | Word Group A/X | Word Group B/Y | Word Group C/Z
HMD | Abstract | Embodied | Non-interactive
HMD | Abstract | Non-interactive | Embodied
HMD | Embodied | Abstract | Non-interactive
HMD | Embodied | Non-interactive | Abstract
HMD | Non-interactive | Abstract | Embodied
HMD | Non-interactive | Embodied | Abstract
Monitor | Abstract | Embodied | Non-interactive
Monitor | Abstract | Non-interactive | Embodied
Monitor | Embodied | Abstract | Non-interactive
Monitor | Embodied | Non-interactive | Abstract
Monitor | Non-interactive | Abstract | Embodied
Monitor | Non-interactive | Embodied | Abstract
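The 12 conditions above are simply every display type crossed with every ordering of the three interaction types over the word groups, so they can be generated programmatically rather than maintained by hand. A minimal sketch in Python (label strings are this page's own; the variable names are ours):

```python
from itertools import permutations, product

displays = ["HMD", "Monitor"]
interaction_types = ["Abstract", "Embodied", "Non-interactive"]

# One condition per (display, assignment of interaction types to
# word groups A/X, B/Y, C/Z): 2 displays x 3! orderings = 12
conditions = list(product(displays, permutations(interaction_types)))

for display, (a, b, c) in conditions:
    print(f"{display}: A/X={a}, B/Y={b}, C/Z={c}")
```

Assigning each new participant the next condition in this list (cycling back to the start after 12) keeps the counterbalancing even.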

Conducting the experiment

Each session should take no more than 85 minutes. Participants will be studied two at a time, with one in the desktop and one in the VR environment.

Introduction + Pretest

1. Outline experiment and process (5 mins)
  • Inform participants that the experiment is designed to examine language acquisition via technology
  • Explain that we'll be asking them some questions about their knowledge of the target language - Japanese - as well as questions about languages, learning and technology in general
  • Explain that they will get to experience a bespoke language-learning environment and will be asked how they felt about it, as well as some questions regarding anything they learned inside
2. Pre-test: ask them if they recognise the meaning of a list of words (L2) (10 mins)
  • Explain the listening test procedure
  • The computer will play a sound clip of a Japanese word (20 words)
  • The participant should select the English translation from a list
  • If the participant doesn't know the word, they should choose “don't know”
  • Explain that people without prior Japanese were purposefully selected for the study, so they should not feel bad if they do not recognise many (or any) of the words
3. Learning Introduction: provide participants with a list of words/phrases (L1) we'll be trying to learn (5 min)
  • Show list of words (in L1) that participant could learn after using the system

Interface test #1

4. Tech Introduction: introduce them to their interface with a tutorial task (5 mins)
  • Each participant is shown their interface
  • Run a tutorial exercise to familiarise them with the controls
5. Interface test (15 mins)
  • Engage participant with learning interface for set time
6. Test: test to see retention rate (5 mins)
  • Use same test as original pre-test
7. Survey: survey for other factors (10 mins)
  • Qualitative and quantitative questions about other factors (motivation, enjoyment, presence, confidence)

Interface test #2

8. Tech Introduction: introduce them to their interface with a tutorial task (5 mins)
  • Each participant is shown their interface
  • Run a tutorial exercise to familiarise them with the controls
9. Interface test (15 mins)
  • Engage participant with learning interface for set time
10. Test: test to see retention rate (5 mins)
  • Use same test as original pre-test
11. Survey: survey for other factors (10 mins)
  • Qualitative and quantitative questions about other factors (motivation, enjoyment, presence, confidence)


12. Thanks + Next Steps
  • Thank participants
  • Remind them a test will be sent to them to do in seven days
  • It is important they do it on that day to get paid
  • Please DO NOT study any Japanese or Japanese words between now and then

Repeat Test

13. ReTest: test one week later to see retention rate (10 mins)
  • Use same test as original pre-test
  • Ask if they have researched any of the words since

Key Question

Do interaction type and/or display type have a notable impact on L2 word acquisition in a 3D environment, and does any effect depend on the type of word?


  • The biggest difference in memorisation results will be between interaction and non-interaction
    • This effect will be larger for verbs than for nouns
  • HMD will produce better memorisation results than monitor
  • There will be no memorisation difference between embodied and interactive learning
  • Motivation will be higher in the HMD and embodied scenarios
  • Enjoyment will be higher in the HMD and embodied scenarios
  • Presence will be higher in the HMD and embodied scenarios
  • Learning preference will not make a difference for any interaction type


  • Independent (primary): Interaction technology
  • Independent (secondary): Output type (HMD vs. monitor), word types (noun vs. verb)
  • Dependent:
    • Motivation
    • Enjoyment
    • Presence
    • Confidence
    • Word retention (instant)
    • Word retention (one week)
    • Number of interactions

Analyse results

The experiment has three independent variables - display type, interaction type, and word type - in a 2x3x2 design.

It uses a within-subjects design to maximise the number of results and minimise potential random noise. The words and experiences differ enough that we are not worried about transfer of learning across conditions.

Did people learn?

Paired t-tests on before/after scores, to determine whether:

  • People learned instantly (pre-test vs. immediate post-test)
  • People learned after one week (pre-test vs. delayed post-test)
  • There is a difference between the instant and one-week results
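Each of these comparisons can be sketched with scipy's paired t-test; the scores below are purely hypothetical stand-ins for per-participant recognition counts:

```python
from scipy import stats

# Hypothetical recognition scores (out of 20) for eight participants
pre_test  = [1, 0, 2, 1, 0, 1, 2, 0]
post_test = [9, 7, 11, 8, 6, 10, 12, 7]

# Paired t-test: did scores improve from pre-test to immediate post-test?
t_stat, p_value = stats.ttest_rel(post_test, pre_test)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

The same call, with the delayed post-test scores substituted in, covers the one-week comparisons.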

Word difficulty

A measure of word difficulty is still to be determined, i.e. a way to check whether a particular word or phrase is especially hard to learn.


  • Shapiro-Wilk test of normality
  • Check for sphericity and apply a correction if needed (Greenhouse-Geisser)
    • Check the epsilon value to gauge how badly sphericity is violated: closer to 1 is better, with a lower bound of 0.5 for a three-level factor
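The normality check is a one-liner in scipy; the improvement scores below are hypothetical placeholders for the real per-participant data:

```python
from scipy import stats

# Hypothetical per-participant improvement scores (post-test minus pre-test)
improvements = [7, 8, 6, 9, 7, 5, 8, 10, 6, 7, 9, 8]

# Shapiro-Wilk: a low p-value (< .05) suggests the scores deviate from
# normality, in which case a non-parametric alternative may be needed
w_stat, p_value = stats.shapiro(improvements)
print(f"W = {w_stat:.3f}, p = {p_value:.3f}")
```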

What factors influenced learning, and what are their relationships?

  • Three-way repeated measure ANOVA on IMPROVEMENT between tests, main effect of each variable and interaction between them:
    • hmd/desktop
    • interaction/non-interaction/embodied interaction
    • noun/verb
  • Three-way repeated measure ANOVA on IMPROVEMENT between tests, main effect of each variable and interaction between them (for one week later):
    • hmd/desktop
    • interaction/non-interaction/embodied interaction
    • noun/verb
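One way to run this three-way repeated-measures ANOVA is statsmodels' AnovaRM; the sketch below uses simulated improvement scores and our own factor names, so it shows the shape of the analysis rather than the real data:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)

# Simulate one improvement score per participant per cell of the
# 2 (display) x 3 (interaction) x 2 (word type) within-subjects design
rows = []
for subject in range(1, 13):
    for display in ("hmd", "desktop"):
        for interaction in ("non-interaction", "interaction", "embodied"):
            for word in ("noun", "verb"):
                rows.append({
                    "subject": subject,
                    "display": display,
                    "interaction": interaction,
                    "word": word,
                    "improvement": rng.normal(5, 1),
                })
df = pd.DataFrame(rows)

# Main effects of the three within-subject factors plus all interactions
res = AnovaRM(df, depvar="improvement", subject="subject",
              within=["display", "interaction", "word"]).fit()
print(res.anova_table)
```

The resulting table has seven rows: three main effects, three two-way interactions, and the three-way interaction. The same call with fewer `within` factors covers the two-way analyses (presence, motivation, enjoyment) listed below.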

Presence influences?

  • Two-way repeated measure ANOVA on PRESENCE
    • hmd/desktop
    • (maybe interaction/non-interaction/embodied interaction)

Motivation influences?

  • Two-way repeated measure ANOVA on MOTIVATION
    • hmd/desktop
    • (maybe interaction/non-interaction/embodied interaction)

Enjoyment influences?

  • Two-way repeated measure ANOVA on ENJOYMENT
    • hmd/desktop
    • (maybe interaction/non-interaction/embodied interaction)

Interaction method on noun/verb influences?

  • One-way repeated measure ANOVA on interaction/non-interaction/embodied interaction
    • noun/verb outcome

Word Confidence influences?

  • Three-way repeated measure ANOVA on WORD CONFIDENCE
    • hmd/desktop
    • (maybe interaction/non-interaction/embodied interaction)
    • noun/verb

Interaction repetition influences?

  • Three-way repeated measure ANOVA on INTERACTION REPETITION
    • hmd/desktop
    • (maybe interaction/non-interaction/embodied interaction)
    • noun/verb

Learning preference

  • Two-way repeated measure ANOVA on learning outcome
    • interaction/non-interaction/embodied interaction
    • VARK learning preference


Development Plan

  • ◯ Ethics
  • ◯ Screening questions
    • ◯ Form
  • ❌ Introduction script
  • ◯ Pre-test questions (attitudes)
    • ❌ Form
  • ◯ Pre/post-test questions (aptitudes)
    • ◯ Sample question
    • ◯ Record results into database
    • ◯ Form
    • ◯ Play sound file on demand
    • ◯ Type answer
    • ◯ Click “I don't know”
    • ◯ Assign to UserID
    • ◯ Option: PRETEST (all words)
    • ◯ Option: POSTTEST (choose which words to see)
    • ◯ Record answer time (time to click on box after sound file finishes playing)
    • ◯ Ability to send test to external participants
  • ❌ Qualitative questions (attitudes)
    • ❌ Entry form
  • ◯ Word list
  • ❌ Virtual World
    • ❌ Study Variables
      • ❌ Ability to select which word pairs to use
    • ❌ Game Sequence
      • ❌ Learning script for each location
      • ❌ Free explore activation after initial presentation
    • ❌ Interface
      • ◯ VR Movement (Left thumbstick forward/backward towards viewpoint, left/right fixed rotation)
      • ◯ VR Grab Object
      • ◯ Gesture recognition (20 verbs)
      • ❌ Gesture animation (for desktop hotkey, 20 verbs)
      • ❌ Desktop movement
      • ❌ Desktop action
    • ❌ Tutorial area
      • ❌ Move
      • ❌ Grab
      • ❌ Gesture
    • ❌ Sound
      • ◯ Play sound on collision
      • ❌ Coffee shop background noises
      • ❌ Script recording
    • ❌ Graphics
      • ❌ Coffee shop graphics
        • ❌ Background decorations
        • ❌ 20 nouns
        • ❌ Player hands
        • ❌ Helper NPC
          • ❌ Animations
        • ❌ Other NPCs
          • ❌ Animations