In this lecture, we will cover key concepts and taxonomies for the technologies underpinning MR user experiences, look at some specific MR devices, and examine the principal techniques for tracking and displays.
In the past, I’ve found that students can find this lecture overwhelming: just a long string of general concepts and example devices, with the consequence that nothing really sticks. So this time, I want to flip it around: we will actively look together at a key online resource, https://vr-compare.com/. I will then explain the concepts you need to know in order to compare key devices like the Meta Quest 3(S), Apple Vision Pro, Samsung Galaxy XR (Android), Snap Spectacles, and HTC Vive Pro 2 (discontinued in 2025). The AI glasses category (such as the Ray-Ban Meta AI glasses) is not technically an MR display; these are rather heads-up displays (HUDs).

My goal with this lecture is that you will learn how to navigate this table comparing the above set of devices – and be able to understand the key differences between state-of-the-art devices, and why these differences matter.
In the process of exploring the above multi-dimensional device comparison, we will aim to cover most of the terms below.
Glossary
Devices:
Each MR device on the market offers a specific combination of tracking and display technologies, intentionally designed to navigate a set of trade-offs in order to create the best MR experience for its target users.
In terms of headworn devices, there is a distinction that has become popular in industry:
Headsets vs. Glasses: There is no clear-cut boundary, but the distinction is currently used to differentiate heavy MR HMDs (most widely used for high-end home entertainment and gaming) from the lightweight glasses form factors (designed for everyday computing). They map onto the Milgram RV continuum with headsets at the V-end and glasses at the R-end of the spectrum.
Tracking:
In MR, the main modality for tracking, and what we will focus on, is called optical tracking – i.e., techniques that rely on cameras (as opposed to other sensors, like GPS, IMUs, ultrasonic tracking, etc.). Here are some key distinctions to be aware of in optical tracking:
- Marker tracking: Markers can take many forms: retroreflective markers used for IR tracking (like those that come with the OptiTrack system), beacons (like the base stations for the HTC Vive Pro 2), and barcodes (like QR codes, markers from ARToolKit developed 25+ years ago, or reacTIVision, which has often been used for tabletop interfaces like this one).
- Markerless tracking: In contrast to the marker-based approaches, markerless tracking does not rely on detecting any dedicated markers in the environment, but rather tracks natural features. There is a range of techniques to enable a markerless approach – such as image texture tracking (e.g., tracking a magazine cover or a coaster), model-based tracking (looking for things whose shape you know in advance, like Apple’s Face ID), or Structure from Motion (SfM) and related techniques like SLAM, which headsets use to build a 3D scene understanding.
- Model-based tracking: This approach uses a model which it can match to the camera input in order to detect and track the entity that the model represents. Common examples include human face/body/eye tracking and object tracking.
- ML-based tracking: Modern solutions are typically not based on manually crafted models, but rather on machine-learning models trained on large datasets, such as YOLO.
- Inside-out vs. Outside-in: There are two different hardware approaches to tracking users and objects for MR experiences. Inside-out tracking (explainer video) uses outward facing cameras to track the environment and thereby track the user’s position in relation to it. In contrast, outside-in tracking (explainer video) uses beacons in the environment to track markers on the headset. The general pro/con here is that inside-out is more mobile, whereas outside-in is more precise. Although the distinction is less relevant today (because all recent commercial devices are inside-out), it is still important to know about – especially if you want to do research that requires super precise tracking, like we did in this CHI2016 project here where we had to build our own AR outside-in tracking system using Optitrack cameras and IR markers.
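To make the marker idea above concrete, here is a toy Python sketch of the core principle behind square fiducial markers (like ArUco tags). This is a simplified, hypothetical 4×4 format – not the real ArUco or QR specification – but it shows the essence: after a tracker locates and rectifies the square in the camera image, it reads an inner bit grid to recover the marker’s ID.

```python
# Toy fiducial-marker sketch (hypothetical 4x4 format, NOT the real
# ArUco/QR spec). In a real pipeline, the camera image is first searched
# for the black square border, the square is rectified, and then the
# inner bit grid is sampled - here we just show the ID <-> bit-grid step.

def encode_marker(marker_id: int, size: int = 4) -> list[list[int]]:
    """Write the ID's bits row-by-row into a size x size grid."""
    bits = [(marker_id >> i) & 1 for i in range(size * size)]
    return [bits[r * size:(r + 1) * size] for r in range(size)]

def decode_marker(grid: list[list[int]]) -> int:
    """Read the bit grid back into an integer ID (what a tracker does
    after rectifying the detected square)."""
    marker_id = 0
    for i, bit in enumerate(b for row in grid for b in row):
        marker_id |= bit << i
    return marker_id

grid = encode_marker(1337)
assert decode_marker(grid) == 1337  # round-trip: detection recovers the ID
```

Real marker systems add error correction and rotation-invariant encodings on top of this idea, so the same marker decodes correctly from any viewing angle.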
All of the above tracking approaches make the following modalities available for interaction input: head, torso, hands, eyes, gaze, and more. You will rely on several of them when you implement interaction techniques during the course, so it’s good to know where they thrive and what their limitations are.
Tracking is about how the device gets the right input for interaction. Now let’s switch to the output:
Displays:
Just like there are visual (optical) and non-visual tracking techniques, there is also a breadth of display types and techniques. Beyond the visual type, there are haptic and auditory feedback mechanisms, which are very important in making effective immersive experiences. But for now, we will focus on the visual dimension of MR displays.
There are two primary things about visual MR displays that you should know: I) we distinguish between two primary types of HMDs with quite different pros/cons, and II) there is a wealth of MR displays beyond HMDs which we will only cover in brief in this course, as you will only be developing for HMDs.
- Two types of HMDs – Optical see-through (OST) vs. Video see-through (VST): OSTs use a visor in front of the user’s eyes and a tiny projector that projects the MR overlay onto the visor. Classic OST examples are HoloLens 2, Snap Spectacles, and Magic Leap. In contrast, VSTs have a lens and a screen in front of each of the user’s eyes, and on the outside front of the headset, cameras point out toward the environment; these provide the ‘eyes’ for the user by streaming real-time stereoscopic video to the two screens inside the headset. Classic VST examples are the Quest 3, Apple Vision Pro, Samsung Galaxy XR, or any of the modern VR headsets.
- Beyond HMDs: There is a nice display taxonomy by Bimber & Raskar (2006) that visualizes the full range of displays for MR. They range from head-attached, through handheld, to spatial, with subcategories in-between. Have a brief read through this paper. A few examples that you should see if you can fit into the taxonomy are SixthSense, the famous interface concept developed at MIT, and RoomAlive by Microsoft Research.
Case studies
We will end these lecture notes with case studies of two MR devices to see how the above techniques have become embedded into modern headsets. We will focus on the Meta Quest 3 and the Apple Vision Pro. They are quite similar on several dimensions, sharing the following main features:
- Hand tracking, face tracking, surface detection, scene understanding (with spatial anchors)
- Recently: Support for pen-based input (Logitech has the MX Ink Stylus for Quest and the Muse for AVP)
However, they also differ in the ways they navigate trade-offs:
Meta Quest 3 (the low-end consumer product)
The latest Quest device is a cheap standalone headset with a fairly open software platform. It is a VST headset with inside-out tracking (see, now you know what that means!).
Main trade-offs:
- Cons: The display has sufficient resolution for most tasks (justified by the low price tag), but it cannot be used to, e.g., read small details or text in the world through the passthrough.
- Pros: The Passthrough Camera API – the video passthrough was recently made accessible for processing by developers, which makes it ideal for research. It has enabled some exciting new tracking capabilities, such as marker (QR code) tracking and live (ML-based) object detection and tracking.
Apple Vision Pro (the high-end professional product)
The AVP is another standalone VST with inside-out tracking, but it is a high-end, expensive headset aimed at professional use. It’s quite a locked-down platform (it’s Apple! No surprise, of course), with only very curated developer access to its tracking capabilities. But it’s a very impressive demo!
Main trade-offs:
- Pros: The display is very high resolution – best in its class! It even has an outward-facing display showing the user’s eyes (EyeSight) to support eye contact (although it looks quite goofy). It supports eye tracking for gaze-based interaction, and its hand tracking has a larger coverage area, enabling the arms to rest in a comfortable position during interaction.
- Cons: It is bulkier/heavier than the Quest and more locked down for developers – e.g., eye tracking and passthrough video are not open enough for research purposes.
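The display trade-off between the two headsets can be made concrete with a back-of-the-envelope pixels-per-degree (PPD) estimate – a metric vr-compare also lists. The spec numbers below are approximate, and the naive formula ignores lens distortion (real PPD varies across the field of view), so treat this as a rough sketch:

```python
# Rough angular resolution estimate:
#   PPD = horizontal pixels per eye / horizontal field of view in degrees.
# Spec values below are approximate; real PPD varies across the lens.

def pixels_per_degree(h_pixels: int, h_fov_deg: float) -> float:
    return h_pixels / h_fov_deg

quest3 = pixels_per_degree(2064, 110)  # ~19 PPD with these approximate specs
avp = pixels_per_degree(3660, 100)     # ~37 PPD with these approximate specs
print(f"Quest 3: {quest3:.0f} PPD, Vision Pro: {avp:.0f} PPD")
```

A figure often cited as ‘retinal’ resolution is around 60 PPD, which helps explain why reading small real-world text through passthrough is hard on both headsets – and why the AVP’s roughly doubled PPD still makes a very noticeable difference.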
As a final remark: in 2025, a real competitor was released that sits in between the low- and high-end products – the Samsung Galaxy XR, built on Google’s Android XR platform. Check out this quick explainer by MKBHD.
Now, if there’s only one thing you take away…
It should be this:
MR devices have different trade-offs. Know the vocabulary/terms and the differences so that you can choose the right technical solution for the problem at hand.
