Mixed Reality User Experiences 2026 – Interaction

In this lecture, we will cover the concept of interaction in mixed reality (MR). What is exciting – but also challenging – about this topic is that MR presents an entirely new paradigm of computing interfaces. When digital content is not bound to 2D screens anymore (but rather can exist in the real world around us), the possibilities are endless. And each new possibility brings new challenges. 

It is hard for textbooks to keep up with MR interaction. Therefore, I have been looking for good literature review papers (recent as well as older ones) that combine fundamentals with cutting-edge examples of what MR interaction is. 

For those who are interested, here is a list of reviews: a 2023 review of interaction technique studies, a 2019 review of remote collaborative interaction in MR, and a seminal introduction to 3-D User Interface Design (2001). 

But as a minimum, I expect you to read this chapter by Jens Grubert (2021) which is intended as an introduction to MR interaction for students. The chapter serves as the foundation for what we will cover here, providing a nice overview of different types of MR interaction techniques. 

MR interaction typology

Here’s a TL;DR of Grubert’s typology, providing an overview of the different types of MR interaction techniques with key examples for each. (I borrowed the typology, but several of the examples are more recent.)

  • Tangible interaction: Tangible user interfaces (TUIs) are concerned with using physical objects as a medium for interaction with computers. This is a common type of interaction in MR, where the visual and tangible experience can be integrated through virtual overlays on the tangible objects. Classic examples include MagicBook (Computers & Graphics 2001) and Urp from MIT’s Tangible Media Group (1999).
  • Surface-based interaction: As a special form of tangible interaction, surface-based interaction refers to interactions with touch surfaces like mobile tablets, or with entire room surfaces like walls, tables, and furniture. The latter is often enabled in projection-based systems like RoomAlive (UIST 2014) or WorldKit (CHI 2013).
  • Gesture-based interaction refers to using hand, body, or touch movements as input in MR, allowing users to control systems through sensed physical actions rather than holding dedicated devices. Relying on cameras and other sensors that track poses and motion, it enables “natural” mid-air interactions, yet a remaining challenge is that prolonged use causes arm fatigue. Examples include the hand-based interactions on Meta Quest, or, as a more advanced example, this recent paper on expressive hand gestures: Hand Interfaces (CHI 2022).
  • Pen-based interaction refers to using a stylus or digital pen (often together with a physical surface like a tablet) as a precise input device in MR, enabling tasks such as menu control, note-taking, drawing, modelling, and manipulating 2D interfaces. It offers a more stable and familiar alternative to unsupported mid-air hand gestures or game controllers for fine-grained interaction. Examples include 2D input on tablets, such as in RealitySketch (UIST 2020), or 3D mid-air input, as in various VR sketching apps like ShapesXR.
  • Gaze-based interaction: Gaze-based interaction uses a user’s eye movements as MR input, enabling actions such as selection, navigation, or system adaptation by tracking where someone looks. It can support accessibility (e.g., for users with disabilities) or reduce physical effort by complementing hand input. Techniques vary in speed, accuracy, and suitability depending on the task. As an example, here’s a recent project we did here at AU: Spatial Gaze Markers (CHI 2024).
  • Haptic interaction: Such techniques use touch and force feedback to convey physical sensations in MR, stimulating tactile and kinesthetic senses through active or passive devices. It enriches immersion and task performance by providing a sense of physical presence, but faces challenges around portability (it often requires external devices), visual occlusion (causing tracking issues), and matching virtual feedback to physical objects (which requires high precision). A few examples are Haptic Retargeting (CHI 2016) and HapticBots (UIST 2021); a minimal sketch of the body-warping idea behind haptic retargeting follows this list.
  • Keyboard and mouse: In MR, text entry is hard! If you don’t sit at your desk, you need ways of providing keyboard input mid-air, such as this example here. Meta also just released a virtual surface keyboard on Quest. However, if you’re at your personal desk, a physical keyboard can be integrated. Here, the keyboard and hand representations matter for text entry, e.g., Effects of Hand Representations for Typing in Virtual Reality (IEEE VR 2018). The Apple Vision Pro also supports, as seen in this video, a nice Augmented Virtuality experience (yeah, now you know what that means!).
  • Human-AI interaction: There’s a whole literature review on the intersection of XR and AI. But I link it mainly to show you the pace at which AI development accelerates. The literature review is from pre-LLM days (or at least the early days), and in recent years, so much new stuff has come out that enables entirely new use cases. Recent examples: EmBARDiment (IEEE VR 2025), LLMR (CHI 2024), and Thing2Reality (UIST 2025).
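
To make the haptic interaction bullet above a bit more concrete, here is a minimal, framework-agnostic sketch of the body-warping offset that haptic retargeting builds on. This is my own simplified illustration with assumed names and a simple linear blend, not the implementation from the CHI 2016 paper.

```python
# Minimal body-warping sketch for haptic retargeting (illustrative only;
# the published technique uses a more elaborate warp). All positions are
# assumed to be numpy 3-vectors in the same world coordinate frame.
import numpy as np

def warp_virtual_hand(physical_hand, reach_start, physical_prop, virtual_target):
    """Offset the rendered hand so that reaching for the virtual target
    steers the physical hand onto the single physical prop."""
    total = np.linalg.norm(physical_prop - reach_start)
    remaining = np.linalg.norm(physical_prop - physical_hand)
    # Reach progress: 0 at the start of the reach, 1 when the prop is touched.
    alpha = 1.0 if total < 1e-6 else float(np.clip(1.0 - remaining / total, 0.0, 1.0))
    # Blend in the offset between the prop and the virtual target, so that at
    # alpha = 1 the virtual hand lands on the target exactly when the physical
    # hand lands on the prop.
    return physical_hand + alpha * (virtual_target - physical_prop)

# Example: one physical cube at the origin acts as the proxy for a virtual
# cube rendered 10 cm to its right.
hand = np.array([0.0, 0.0, 0.30])    # current physical hand position (m)
start = np.array([0.0, 0.0, 0.60])   # where the reach began
prop = np.array([0.0, 0.0, 0.0])     # the single physical proxy object
target = np.array([0.10, 0.0, 0.0])  # the virtual object the user reaches for
print(warp_virtual_hand(hand, start, prop, target))  # hand rendered shifted toward the target
```

The gradual blend is the key design choice: applying the full offset at once would make the hand visibly “jump”, whereas a redirection that grows with reach progress tends to stay below what users notice.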

One is not enough

Now, the above types of interaction techniques rarely work optimally on their own. In other words: One is not enough! They must work together in combination. This is explored in the following emerging interaction concepts:

  • Multimodal interaction combines multiple input and/or output modalities (such as speech, gestures, gaze, touch, or haptics) to leverage their complementary strengths in MR. By coordinating several channels, it aims to improve efficiency, realism, and immersion. Multiple modalities are often better, but the benefits depend on careful task-specific design rather than simply adding more modalities. Examples: The technique Gaze + Pinch Interaction in Virtual Reality (SUI 2017) was developed by Ken and Hans from our lab. This was recently implemented into the Apple Vision Pro (a minimal code sketch of the gaze + pinch idea follows this list). It has since been further extended in research, e.g., Reality Proxy (UIST 2025), a technique that relies on object detection, LLMs, and/or digital twin technology. 
  • Multi-display / multi-device interaction involves using multiple physical or virtual screens together (ranging from desktops, tablets, and smartphones to HMDs and large displays) to expand the workspace, support collaboration, or enhance MR experiences. Such techniques are often about enabling content transfer, contextual augmentation, and flexible window management, allowing users to interact seamlessly across devices and reference frames. Examples include FaceDisplay (CHI 2018) and Apple Vision Pro’s EyeSight face display. Other hybrid examples combine multiple computing devices, such as Traversing Dual Realities (CHI 2025), or simultaneous 2D and 3D sketching in MR (e.g., VRSketchIn, SymbiosisSketch). If you’re interested, here’s a large literature review that we conducted on such multi-device MR interaction techniques.
  • Multi-user interaction in MR enables multiple people (co-located or remote) to collaborate, communicate, and manipulate shared content in real time. It supports synchronous coordination and joint tasks in a shared 3D space using virtual avatars to enable social interactions that go beyond what is possible in regular face-to-face communication. A few examples include Blended Whiteboard (CHI 2024) and A “beyond being there” for VR meetings (2021).
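
Since gaze + pinch shows up both above and on the Vision Pro, here is a minimal sketch of how the two modalities divide the work: gaze chooses the target, and a pinch confirms and holds it. This is my own framework-agnostic illustration with assumed thresholds, names, and input sources, not the published SUI 2017 implementation.

```python
# Minimal gaze + pinch selection sketch (illustrative; not the SUI 2017 code).
# Assumes per-frame gaze origin/direction and fingertip positions as numpy
# 3-vectors from your tracking stack; `objects` maps names to world positions.
import numpy as np

PINCH_DOWN = 0.02    # thumb-index distance (m) that starts a pinch
PINCH_UP = 0.04      # release threshold (m); hysteresis avoids flicker
GAZE_CONE_DEG = 5.0  # how far off the gaze ray a target may be and still count

def gaze_target(gaze_origin, gaze_dir, objects):
    """Return the object closest to the gaze ray within the selection cone."""
    gaze_dir = gaze_dir / np.linalg.norm(gaze_dir)
    best, best_angle = None, np.radians(GAZE_CONE_DEG)
    for name, pos in objects.items():
        to_obj = pos - gaze_origin
        angle = np.arccos(np.clip(np.dot(gaze_dir, to_obj / np.linalg.norm(to_obj)), -1.0, 1.0))
        if angle < best_angle:
            best, best_angle = name, angle
    return best

class GazePinch:
    """Gaze chooses the target; a pinch confirms and holds it."""
    def __init__(self):
        self.pinching = False
        self.held = None

    def update(self, gaze_origin, gaze_dir, thumb_tip, index_tip, objects):
        dist = np.linalg.norm(thumb_tip - index_tip)
        if not self.pinching and dist < PINCH_DOWN:
            self.pinching = True
            # The target is decided by gaze at the moment the pinch begins.
            self.held = gaze_target(gaze_origin, gaze_dir, objects)
        elif self.pinching and dist > PINCH_UP:
            self.pinching = False
            self.held = None
        return self.held
```

The hysteresis between the pinch-down and pinch-up thresholds is a typical design choice to avoid selection flicker when the fingertips hover right around a single threshold.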

Techniques vs. Systems

The chapter we have just covered (and expanded) focuses on interaction techniques. Let us zoom out a bit. There is an important distinction in the research field of HCI+MR that you should know of: interaction techniques vs. interactive systems. In short, an interaction technique is a specific implementation that maps input and output modalities to enable users to perform actions on objects (e.g., select, manipulate), whereas an interactive system comprises a set of such techniques to achieve a holistic user experience. The important point is that design and evaluation of user experiences can operate at both of these levels. (A toy code sketch of this distinction follows the examples below.) 

Examples to show the contrast: 

  • Gaze+Pinch is an interaction technique that is then demonstrated in multiple applications to convey its versatility and rich potential. 
  • Blended Whiteboard is an interactive system that comprises multiple interaction techniques to create a unique collaborative experience. 
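
To make the contrast tangible in code, here is a toy sketch (all names are my own, not from the chapter): each interaction technique maps per-frame input to actions on objects, and an interactive system simply composes several techniques into one experience.

```python
# A toy sketch of the technique-vs-system distinction (names are illustrative).
from dataclasses import dataclass, field
from typing import Protocol

class InteractionTechnique(Protocol):
    """Maps one frame of raw input to actions on objects (select, move, ...)."""
    def update(self, frame: dict) -> list[str]: ...

@dataclass
class GazePinchSelect:
    """A single technique: select whatever the user is looking at on pinch."""
    def update(self, frame: dict) -> list[str]:
        if frame.get("pinch") and frame.get("gaze_target"):
            return [f"select:{frame['gaze_target']}"]
        return []

@dataclass
class PenAnnotate:
    """Another technique: ink strokes while the pen is down."""
    def update(self, frame: dict) -> list[str]:
        if frame.get("pen_down"):
            return [f"ink:{frame['pen_pos']}"]
        return []

@dataclass
class WhiteboardSystem:
    """An interactive system: several techniques composed into one experience."""
    techniques: list = field(default_factory=lambda: [GazePinchSelect(), PenAnnotate()])

    def tick(self, frame: dict) -> list[str]:
        return [action for t in self.techniques for action in t.update(frame)]
```

Evaluation can then target either level: a study of the GazePinchSelect technique in isolation, or a study of the whole WhiteboardSystem as a user experience.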

Before we wrap up, a few 🤯 examples…

Let’s end on a few crazy examples. The goal is to expand your mind on the interaction possibilities of MR (+ show you how fun it is to work in academia):

Yes, it is pretty amazing how human capabilities can be augmented when the medium for interaction is able to trick fundamental aspects of human perception!