Kinesthetic controls in Immersive Virtual Environments: simulated gesture or bodily action?


It is well established that a learner memorises verbs better if they perform a verb-congruent action during the memorisation process (e.g. kicking a ball to learn the verb “kick”). This is known as the enactment, or self-performed task (SPT), effect. Experiments concerning the enactment effect have shown that both taking actions with objects (SPT-Os) and gesturing without objects aid the memorisation process. However, research has also suggested that the two interaction types produce distinct memorisation outcomes (Hall, ???; Wakefield, ???). It has been theorised that interacting with objects therefore presents a different encoding route than gesture-based learning without objects. More simply: acting-on-objects is different from gesturing-on-abstractions.

Recent studies in virtual reality have shown that the enactment and SPT effects carry over into immersive virtual environments (IVEs) (Ratcliffe, 2020; Vasquez, 2018). It is unclear, however, how the sensorimotor learning process is contextualised and encoded by IVE learners: are our actions in IVEs considered actions with objects, or gestures with abstractions? Even with embodied controls, it is arguable that we never truly take “bodily action” in an IVE, as we are always interfacing with a simulation. But equally, when we touch a virtual object, this could be considered distinct from simply gesturing inputs for a digital system to interpret.

We believe that by modifying Hall's experiment into an IVE SPT memorisation test, and comparing our results with their findings, we will be able to determine how we contextualise interactions in IVEs, and to establish whether object interaction benefits verb memorisation. Hall's research leverages congruent movements for memorising verbs, controlling between object interaction (actions) and gesture-abstracted interaction (gestures). Like Hall's, our experiment will have one group attempt to memorise verb names while encoding with a congruent gesture that does not interact with the presented objects, while the other group will be required to grab and use the virtual objects to create the congruent movements. Pre- and post-test outcomes will be compared.

Novelty effect??


  • There will be a difference in memorisation between the group using gesture-simulated interaction and the group taking bodily action
  • The bodily action group will show the strongest learning outcomes

Theoretically, action could be considered more embodied than gesture. Wilson: 3, 4 and 6.


Read in full.


Embodiment in Immersive Virtual Environments

Immersive virtual environments (IVEs), when coupled with embodied controls, are considered highly embodied places. Users have been shown to report a high sense of spatial or perceptive presence (the feeling of being in a place), as well as being cognitively present and focussing their attention on the place created by the IVE (Lessiter et al., 2001; Schubert et al., 2001; Witmer, Jerome, & Singer, 2005). IVE users sometimes also report a feeling that their “virtual body properties are processed as if they were the properties of one's own biological body” (Kilteni, Groten, & Slater, 2012). This includes a sense of self-location, in which one feels that one's self is located inside the avatar's body; a sense of agency, in which the body is able to conduct the user's will; and a sense of body ownership, in which the body is felt to be the source of one's experiences (e.g. the body transfer illusion; Slater, Spanlang, Sanchez-Vives, & Blanke, 2010).

The enhanced feelings of embodiment that IVEs present have been shown or theorised to impact the treatment of anxiety disorders (Powers, Emmelkamp, 2008; Wiederhold, Wiederhold, 2005; Maples-Keller, Bunnell, Kim, 2017), aid pain management (Li, Montaño, Chen, Gold, 2011; Gold, Mahrer, 2018), stimulate empathy (Shin, 2018; Schutte & Stilinović 2017) and influence certain types of learning (Slater, 2017; Lindgren, Johnson-Glenberg, 2013). However, much of the research on IVEs and the impact of embodiment technologies has been disparate, often lacking a clear focus on understanding the mechanisms through which humans contextualise, understand and respond to attempts to embody them in virtual surroundings (Howard, 2019; Lindgren, Johnson-Glenberg, 2013).

For hand-based, embodied interactions with an IVE, there is little research into how we contextualise these actions, although they have been well explored from the perspective of usability and interface design research (Cabral, Morimoto, & Zuffo, 2005). Embodiment questions, such as whether we are taking action in a virtual space or simply gesturing in our real, physical environment, are underexplored. This is an important distinction, as physical manipulation theories (Schwartz & Martin, 2006) and gesture-as-simulated-action theories (Hostetter & Alibali, 2008) differ in their approach to embodied cognition.

There is some evidence that interactions with objects in IVEs are closer to physical manipulations than to gesture simulations, as the body transfer illusion has been observed in IVEs (Yuan & Steed, 2010; Javorský, Sylaiou, Škola, & Liarokapis, 2019). This suggests that we consider the actions of others on our virtual bodies as real actions, rather than actions happening to a distinct avatar. That does not mean, however, that the inverse is true: that actions we take in an IVE are considered physical manipulation rather than gesture-simulated action.

If the boundary between a cognitive agent and their environment is malleable (Andrewson & Richardson, 2010), it could be that IVEs allow us to take actions that have the same outcomes as they would in the real world. An IVE may therefore allow our brains to believe we are taking actual actions, and not just outputting gestures.

A potential method to explore this boundary between physical manipulation and gesture simulation is to examine an area that already shows a distinction between the two in the real world: the enactment effect.

Enactment effect and SPTs

The enactment effect is a well-researched and well-evidenced verb-learning phenomenon (Engelkamp & Zimmer, 1983; Nyberg, 1993; Nyberg, Nilsson, & Backman, 1991). Experiments have overwhelmingly found that performing a relevant action (or gesture) when memorising a verb results in better outcomes than not performing the action, performing an irrelevant action, or watching someone else perform the action. The benefits of the enactment effect have also been replicated in IVE settings, both on their own (Vasquez, 2018) and when combined with other input modalities (Ratcliffe, 2020).

There is evidence of different cognitive responses depending on whether an enactment involves physically manipulating an object (e.g. actually kicking a ball to learn the term “kicking”) or a gesture-simulated action (e.g. miming a kick). Hall (???) found that children learned novel verbs better through action experiences than through gesture experiences, while higher rates of recognition and recall accuracy have been observed for verbs with a greater amount of associated bodily information (Sidhu & Pexman, 2016). Hall (and Wakefield, ???) also found that gesture can highlight important components of an action without being tied to a specific object; a quality beneficial for the generalisation of novel verbs. A similar generalisation advantage for gestures over actions was found in mathematics learning (Novack, ???).

Wakefield theorises that these differences mean gesture and action aid learners via distinct mechanisms. Why there is a distinction is still unclear, but there are several possible explanations. One is the perceived manipulability of an object. If we make a physical action on an object (in the real or virtual world), and the object reacts, we experience the object as manipulable. There is evidence that the perceived manipulability of an object impacts how we memorise it (Madan & Singhal, 2012). Madan and Singhal interpreted the overall benefit for highly manipulable items as being due to automatic activation of motor representations; perhaps actions on objects stimulate these to a higher degree than gestures do.

Kormi-Nouri and Nilsson (2001) offer a contrasting view, explaining the enactment effect not by motor encoding but by the learning episode: by enacting an action like “lifting the pen”, the act of lifting and the pen are registered together in a single episode. Perhaps actions on objects create deeper episodic integrations than gestures near them? Nyberg (1993) proposes a model in which enactment depends on increasing the distinctiveness of the memory trace, which is improved by item-specific as well as relational processing. Taking actions on an object may be more distinctive than simply gesturing near one. Mangels and Heinberg (2006) found that semantically sensible action phrases like “hug the doll” had better memorisation outcomes than stranger ones, like “hug the shovel”, suggesting semantic association plays a role in memorisation. This could be an explanatory factor: actions on objects make more semantic sense than gestures near objects.

Regardless of what causes the distinction between action and gesture in the enactment effect, it seems a ripe effect to explore as a method for understanding whether humans contextualise interacting with an object in an IVE as physical action in a virtual space, or as gestural simulation.