Comparing the Quality of Highly Realistic Digital Humans in 3DoF and
6DoF: A Volumetric Video Case Study
Shishir Subramanyam*, Jie Li, Irene Viola, Pablo Cesar
CWI, Amsterdam, The Netherlands
Figure 1: Users Evaluating Realistic Digital Humans in 6DoF (left) and 3DoF (right)
ABSTRACT
Virtual Reality (VR) and Augmented Reality (AR) applications have
seen a drastic increase in commercial popularity. Different repre-
sentations have been used to create 3D reconstructions for AR and
VR. Point clouds are one such representation characterized by their
simplicity and versatility, making them suitable for real time appli-
cations, such as reconstructing humans for social virtual reality. In
this study, we evaluate how the visual quality of digital humans, rep-
resented using point clouds, is affected by compression distortions.
We compare the performance of the upcoming point cloud compres-
sion standard against an octree-based anchor codec. Two different
VR viewing conditions, enabling 3 and 6 degrees of freedom, are
tested, to understand how interacting in the virtual space affects the
perception of quality. To the best of our knowledge, this is the first
work performing user quality evaluation of dynamic point clouds
in VR; in addition, contributions of the paper include quantitative
data and empirical findings. Results highlight how perceived visual
quality is affected by the tested content, and how current data sets
might not be sufficient to comprehensively evaluate compression
solutions. Moreover, shortcomings in how point cloud encoding
solutions handle visually-lossless compression are discussed.
Index Terms:
Human-centered computing—Human computer in-
teraction (HCI)—HCI design and evaluation methods—User studies;
—Interaction paradigms—Virtual reality;
1 INTRODUCTION
Recent advances in capturing, media processing, and 3D rendering
technologies make VR/AR applications popular for mass consump-
tion [34]. In this new media landscape, point clouds are becoming
commonplace due to their simplicity and versatility. Still, dense point
clouds are large (a frame of roughly 1M points takes around 19-20 MBytes),
so they must be compressed before transmission. This paper provides an
exhaustive quality comparison
between different encoding configurations of digital humans, repre-
sented as point clouds. By investigating the differences in quality,
we provide insights about how to optimise the delivery for both
downloading and real-time communication. One key novelty of
this paper is to study the quality based on realistic consumption
conditions, in 3 and 6 Degrees of Freedom (DoF) scenarios.
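To put the frame size quoted above in perspective, the following minimal sketch estimates the average storage cost per point and the uncompressed bit-rate of a dynamic point cloud. The capture rate of 30 frames per second and the decimal convention (1 MByte = 10^6 bytes) are assumptions made purely for illustration, and the function names are hypothetical.

```python
# Back-of-the-envelope estimate of raw (uncompressed) point cloud size.
# ASSUMPTIONS (not stated in the paper): 30 frames per second and
# 1 MByte = 10**6 bytes. Function names are illustrative only.

BYTES_PER_MBYTE = 10**6  # assumed decimal convention


def bytes_per_point(frame_mbytes: float, num_points: int) -> float:
    """Average storage cost of a single point, in bytes."""
    return frame_mbytes * BYTES_PER_MBYTE / num_points


def raw_bitrate_mbit_per_s(frame_mbytes: float, fps: float) -> float:
    """Uncompressed bit-rate in Mbit/s for a stream of equally sized frames."""
    return frame_mbytes * 8 * fps


if __name__ == "__main__":
    assumed_fps = 30          # hypothetical capture rate
    num_points = 1_000_000    # "roughly 1M points" per frame
    for frame_mbytes in (19, 20):  # "around 19-20 MBytes" per frame
        per_point = bytes_per_point(frame_mbytes, num_points)
        bitrate = raw_bitrate_mbit_per_s(frame_mbytes, assumed_fps)
        print(f"{frame_mbytes} MBytes/frame: {per_point:.0f} bytes/point, "
              f"{bitrate:.0f} Mbit/s uncompressed at {assumed_fps} fps")
```

Under these assumptions, a single uncompressed stream would require roughly 4.6 to 4.8 Gbit/s, which illustrates why compression is needed before transmission.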
* e-mail: {S.Subramanyam, Jie.Li, Irene.Viola, P.S.Cesar}@cwi.nl
Avatars are a core part of VR applications like social communi-
cation [28], sports training [21], or healthcare [20]. A major line
of scientific work has focused on how to make such avatars more
realistic, interactive, and autonomous [10, 24, 33]. In this paper, we
focus instead on point clouds as a suitable representation for digital
humans based on tele-portation principles [25]. In this case, the
research problem is not so much how to render and animate them to
make them look more realistic, but how to transport them optimally.
Given current advances in technology, real-time delivery of point
clouds is becoming a realistic alternative, drawing the attention
of the research community [23] and industry [32] to encoding and
transmission. Still, given the massive number of points per
representation, decisions need to be made regarding the delivery (type
of encoder, bit-rate) to ensure an acceptable quality of experience
depending on the viewing conditions (3DoF, 6DoF). This is the core
research question this paper answers.
Contributions of the paper are two-fold: 1) It provides a first
evaluation of the quality of highly realistic digital humans repre-
sented as dynamic point clouds in immersive viewing conditions.
Existing protocols [5, 7, 8, 40, 42] did not consider the dynamic nature
of point clouds, focused on one type of data set, and did not take
into account VR viewing conditions; 2) It provides quantitative
subjective results about the perceived quality of the contents, along with
qualitative insights on what is important for users in interacting with
digital humans in VR. Such results will help in better configuring
the network conditions for the real-time delivery of point clouds,
and have implications for ongoing research and
standardisation work regarding the underlying compression technology.
In particular, this paper studies this current and relevant area of
research by contributing 1) a new evaluation protocol,
including the work to create dynamic point clouds for evaluation,
and 2) quality of experience results. These results are based on
an experiment with 52 participants, evaluating 72 stimuli based on
eight dynamic point cloud sequences. Each point cloud sequence
was compressed at four bit-rates, using two types of compression
techniques. These 72 stimuli were evaluated in two viewing con-
ditions (3DoF and 6DoF). The data gathered include rating scores,
presence questionnaires, simulator sickness reports, and time spent
watching the content. The results indicate that, while bit-rate savings
can be obtained by choosing one compression solution over another,
visually lossless compression has not been fully achieved by the
algorithms under evaluation, even at rather large bit-rates. Moreover,
the choice of content can have an impact on how users rate its quality,
influencing the discriminating power of the selected protocol.
2020 IEEE Conference on Virtual Reality and 3D User Interfaces (VR)
2642-5254/20/$31.00 ©2020 IEEE
DOI 10.1109/VR46266.2020.00-73