Immersive Media

What is “Immersive Media”?

Ultimately, it’s the Holodeck. That is, a media experience that is not constrained by a fixed image size, restricted to one viewing position, or encumbered by distracting limitations in image fidelity.

Is Immersive Media created only by computer generated graphics?

Not at all. IDEA’s plans anticipate wide adoption of photographically-captured images, objects and environments. These real-life photographic captures may be stand-alone, such as for a sporting event or concert, or may be seamlessly combined with certain computer generated image elements.

How are immersive scenes photographically captured?

Light field cameras or camera arrays can be used for live-action immersive cinematography. Well-established photogrammetry techniques can be used to capture still images, such as environments, rooms or background plates. And light stages can be used to capture a volumetric image file of a person or object. IDEA will be working with technologists and content creators to identify the tools and workflows appropriate for each.

This sounds futuristic. Does technology exist today that supports this type of Immersive Media?

Almost. Virtual and augmented reality head-mounted displays provide some of the features, but are limited. New multi-focal displays can provide much more realistic stereoscopic (3D) imaging by allowing your eyes to focus near or far. And new light-field displays, also known as holographic displays, are now being developed in research labs around the globe. IDEA intends to complete its work on immersive media standards in time for the commercial launch of the advanced displays, networks and renderers that will enable the user experience.

So is IDEA developing image format specifications for these Light Field Displays?

Yes — the IDEA format specifications are intended to support light field displays, as the “highest common denominator” of the immersive experience. But we recognize that such displays are a few years away from hitting the market. Therefore the specifications will also provide near-term benefits on today’s display technology, including VR headsets and stereoscopic displays. Our key principle is that the IDEA interchange formats are display agnostic.

What do you mean by Display Agnostic?

The ITMF (Immersive Technology Media Format) and future IDEA specifications will represent the image as a full three-dimensional environment, including complex geometry, textures, multiple focal points and viewpoint-dependent lighting and texture. This is all necessary to support light field displays. But a subset of this data will provide the best possible experience on a VR headset (including six degrees of freedom) and on other displays, such as AR, stereoscopic and even traditional televisions and mobile devices. The ITMF stream will be rendered via a “smart network” to a format appropriate for the given device.

Display agnostic also means that the format is open to all manufacturers. The same ITMF format is intended to support displays using different technologies, and from different manufacturers, in order to avoid incompatibility issues.
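As a rough illustration of the "subset" idea, the sketch below picks the scene features a given device would receive. The feature names come from the answer above; the device names and the per-device feature lists are assumptions made only for this example and are not taken from the ITMF specification.

```python
# Features of the full scene representation, as listed in the answer above.
FULL_SCENE_FEATURES = frozenset({
    "geometry",
    "textures",
    "multiple_focal_points",
    "view_dependent_lighting",
})

# ASSUMPTION: illustrative device profiles, not defined by IDEA or ITMF.
ASSUMED_DEVICE_FEATURES = {
    "light_field_display": FULL_SCENE_FEATURES,
    "vr_headset": frozenset({"geometry", "textures", "view_dependent_lighting"}),
    "television": frozenset({"geometry", "textures"}),
}


def features_for(device: str) -> frozenset:
    """Return the subset of the full scene data that a device would use."""
    try:
        wanted = ASSUMED_DEVICE_FEATURES[device]
    except KeyError:
        raise ValueError(f"unknown device profile: {device}") from None
    # The result is always drawn from the one full representation.
    return FULL_SCENE_FEATURES & wanted


for device in ASSUMED_DEVICE_FEATURES:
    print(device, sorted(features_for(device)))
```

The point of the sketch is that there is one source representation and every device view is derived from it, rather than authoring a separate format per display.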

Synthetic 3D and volumetric images are now commonplace in computer games, movies and many other applications. How is what you’re doing different?

Most applications supporting 3D views are not holographic and require a specific display (e.g., stereoscopic or VR/AR) that relies on a 2D right eye / left eye view to create the illusion of depth. Sometimes these displays are augmented with eye tracking or movement tracking in hopes of presenting images appropriate to the viewing angle for the application. However, they remain constrained by limited resolution, narrow field of view, poor optical quality, lack of opacity handling (for AR), problematic motion latency, and the inability of the eye to focus freely within the volume.

A light field creates the full spray of light within the viewing volume that allows the eye to focus on the objects presented. No gear is required. With a light field display you can see behind objects when you move your head. Parallax is maintained, reflections and refractions behave correctly, and the brain concludes that objects are “real.” To achieve this holographic realism, each pixel must contain scene information that changes depending on the viewing angle and location, just like the real world.
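To make the idea of a pixel whose content depends on the viewing angle concrete, here is a minimal sketch. The class `ViewDependentPixel` and its nearest-direction lookup are hypothetical and chosen for simplicity; they are not how any particular light field display or the ITMF format stores data.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Color = Tuple[int, int, int]


def normalize(v: Vec3) -> Vec3:
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return (v[0] / length, v[1] / length, v[2] / length)


class ViewDependentPixel:
    """Hypothetical pixel that stores one colour per sampled viewing direction."""

    def __init__(self) -> None:
        self.samples: List[Tuple[Vec3, Color]] = []

    def add_sample(self, direction: Vec3, color: Color) -> None:
        # direction points from the pixel toward the viewer
        self.samples.append((normalize(direction), color))

    def color_from(self, direction: Vec3) -> Color:
        """Return the colour of the stored sample closest to this direction."""
        d = normalize(direction)
        # The largest dot product identifies the smallest angle between directions.
        best = max(self.samples, key=lambda s: sum(a * b for a, b in zip(s[0], d)))
        return best[1]


pixel = ViewDependentPixel()
pixel.add_sample((-1.0, 0.0, 1.0), (255, 0, 0))  # seen from the left: red
pixel.add_sample((1.0, 0.0, 1.0), (0, 0, 255))   # seen from the right: blue

print(pixel.color_from((-0.5, 0.0, 1.0)))
print(pixel.color_from((0.9, 0.0, 1.0)))
```

A conventional 2D pixel would return the same colour for both queries; here the answer changes with the viewer's position, which is the property the paragraph above describes.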

Will these Immersive Experiences be distributed to the home? How?

Lower-quality experiences such as 360 video are already being delivered to the home from the network. However, in order to stream fully immersive experiences, we envision that the network itself can play a role in processing and computation. With the availability of light-field displays in the future, the network can be prepared to deliver delightful immersive experiences using a multitude of displays.

IDEA will not only develop specifications for the Immersive Technology Media Format, but it will also develop specifications to distribute ITMF over commercial networks, leveraging state-of-the-art IP networks (e.g., Cable 10G, WiFi 6, and 5G) for their speed, low latency, and in-network compute functionalities.

What is a Scene Graph, and why is it helpful?

Scene graphs are used to structure a collection of nodes in a hierarchical “tree”. The node data can be based on vector graphics, point clouds, voxel maps or many other input sources. Live photographic images can be mapped into a scene graph as well, using techniques originally drawn from photogrammetry and lumigraphs. Yes, we know this sounds rather complicated, which is why IDEA will be producing educational seminars to provide background and information about these established techniques and about how the ITMF format can leverage current practices.
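For readers who have not met the term, the hierarchical "tree" can be sketched in a few lines. The `SceneNode` class, its `kind` labels and the example scene are hypothetical and purely illustrative; they do not reflect the node types defined by ITMF.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class SceneNode:
    """Hypothetical scene graph node: a name, a data kind, and child nodes."""
    name: str
    kind: str = "group"  # e.g. "group", "mesh", "point_cloud", "voxel_map"
    children: List["SceneNode"] = field(default_factory=list)

    def add(self, child: "SceneNode") -> "SceneNode":
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "SceneNode"]]:
        """Yield (depth, node) pairs in depth-first, parent-before-child order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Build a small illustrative scene mixing different kinds of source data.
root = SceneNode("scene")
room = root.add(SceneNode("room", "mesh"))
room.add(SceneNode("performer", "point_cloud"))
root.add(SceneNode("smoke", "voxel_map"))

for depth, node in root.walk():
    print("  " * depth + f"{node.name} ({node.kind})")
```

Because every node sits under a parent, a renderer can process or skip whole branches at once, which is one reason the structure is widely used in games and visual effects.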

How is audio being handled?

Sound is an extremely important part of the immersive experience. IDEA will leverage existing standards to represent a multi-dimensional soundfield associated with the image space, using technologies such as higher-order ambisonics. This will be accompanied by object-based audio files with associated metadata, which will allow selected sounds to be positioned relative to the viewing position. The ITMF framework includes the necessary position data to render this combination of ambisonic and object audio files to reinforce the immersive experience.
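The following sketch shows, under stated assumptions, how position data could be used to place an audio object relative to the viewer. It computes the direction of a sound source from the viewing position and then the first-order ambisonic encoding gains for that direction. The axis convention (x forward, y left, z up), the SN3D normalisation and the function names are assumptions for this example; the answer above refers to higher-order ambisonics, of which first order is only the simplest case, and none of this is taken from the ITMF specification.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def relative_direction(source: Vec3, listener: Vec3) -> Tuple[float, float, float]:
    """Return (azimuth, elevation, distance) of the source as seen by the listener.

    Assumed convention: x forward, y left, z up; angles in radians.
    """
    dx, dy, dz = (s - l for s, l in zip(source, listener))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    azimuth = math.atan2(dy, dx)
    elevation = math.atan2(dz, math.hypot(dx, dy))
    return azimuth, elevation, distance


def first_order_gains(azimuth: float, elevation: float) -> Tuple[float, float, float, float]:
    """First-order ambisonic gains (W, X, Y, Z), assuming SN3D normalisation."""
    cos_el = math.cos(elevation)
    return (
        1.0,                          # W: omnidirectional component
        math.cos(azimuth) * cos_el,   # X: front-back
        math.sin(azimuth) * cos_el,   # Y: left-right
        math.sin(elevation),          # Z: up-down
    )


# A sound object two metres to the listener's left.
az, el, dist = relative_direction(source=(0.0, 2.0, 0.0), listener=(0.0, 0.0, 0.0))
print(az, el, dist)
print(first_order_gains(az, el))
```

If the viewer moves, only `listener` changes; recomputing the direction keeps the sound anchored to its place in the scene.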