Towards Tangible Cultural Heritage Experiences -- Enriching VR-Based Object Inspection with Haptic Feedback
STEFAN KRUMPEN, Institute of Computer Science II, University of Bonn, Germany
REINHARD KLEIN, Institute of Computer Science II, University of Bonn, Germany
MICHAEL WEINMANN, Institute of Computer Science II, University of Bonn, Germany
Fig. 1. Illustration of our VR-Based Object Inspection system with haptic feedback.
VR/AR technology is a key enabler for new ways of immersively experiencing cultural heritage artifacts based on their virtual counterparts obtained from a digitization process. In this paper, we focus on enriching VR-based object inspection with additional haptic feedback, thereby creating tangible cultural heritage experiences. For this purpose, we present an approach for interactive and collaborative VR-based object inspection and annotation. Our system supports high-quality 3D models with accurate reflectance characteristics while additionally providing haptic feedback regarding the object shape features based on a 3D printed replica. The digital object model, in terms of a printable representation of the geometry as well as reflectance characteristics, is stored in a compact and streamable representation on a central server, which streams the data to remotely connected users/clients. The latter can jointly perform an interactive inspection of the object in VR with additional haptic feedback through the 3D printed replica. Evaluations regarding system performance and visual quality of the considered models, as well as insights from a user study, indicate an improved interaction, assessment and experience of the considered objects.
CCS Concepts: • Computing methodologies → Virtual reality; Perception; Reflectance modeling.

Additional Key Words and Phrases: tangible cultural heritage, reflectance, bidirectional texture functions, haptic feedback, 3D printing
ACM Reference Format:
Stefan Krumpen, Reinhard Klein, and Michael Weinmann. 2021. Towards Tangible Cultural Heritage Experiences -- Enriching VR-Based Object Inspection with Haptic Feedback. ACM J. Comput. Cult. Herit. 0, 0, Article 0 (2021), 19 pages. https://doi.org/XX.XXXX/XXXXXXX.XXXXXXX
The potential of rapidly emerging VR/AR technology leads to a new paradigm for presenting cultural heritage contents to the public and opens new opportunities regarding their analysis by expert and non-expert users. Instead of an inspection and analysis solely based on depictions in static images, presenting digitized CH artifacts in terms of fully immersive and interactive experiences comes into reach. The ability to create an immersive experience of digitized artifacts is particularly relevant for the less stressful analysis/inspection of highly fragile objects that may easily take damage during inspection, analysis or transport. Furthermore, several artifacts are susceptible to certain environment conditions such as temperature, humidity or illumination. This makes it difficult or even impossible to include them in public exhibitions and complicates their use for scientific or educational purposes. Instead, using accurately digitized virtual counterparts avoids the use of the real artifact during inspection, analysis and transport and, additionally, opens new possibilities for simultaneous artifact inspection as required for virtual museum experiences or collaborative analysis by experts situated at different physical locations.

However, the realization of such a virtual, optionally collaborative, object analysis and annotation depends on several crucial factors that have to be met to allow a pleasing experience while preserving as much of the aura of CH contents [2] as possible. These challenges include accurate object/scene digitization and representation, where both geometric object details and characteristic details of material appearance like the reflectance behavior at scratches, engravings, etc. have to be captured, as well as intuitive interaction metaphors. State-of-the-art reflectance representations based on memory-consuming bidirectional texture functions (BTFs) [25, 43, 44, 49, 50] allow for an accurate virtual model of an object, but require powerful hardware for real-time rendering, whereas analytical reflectance models come at less computational effort and faster rendering. This comes at the cost of often not sufficiently capturing relevant reflectance characteristics such as light exchange at fine structural details or local subsurface scattering. Furthermore, the presentation of digitized contents on VR/AR devices imposes severe constraints regarding real-time rendering at framerates beyond 60 Hz on the visualization devices, instant/high-speed data exchange, and interaction mechanisms to achieve a realistic impression for the joint analysis of and interaction with the object or scene. Despite this progress on representing and visualizing digitized objects, their visualization on screens or VR/AR glasses is limited by the interaction devices (i.e., mouse, keyboard, controller, etc.) that restrict the experience compared to the intuitive canonical interaction of touching and rotating an object in one's hands.

In this paper, we focus on enhancing the perceived aura of digitized objects by augmenting the user experience with sensory information beyond purely sight-related data and providing additional haptic feedback regarding the object shape. For this purpose, we present a practical tool for collaborative remote object analysis and annotation involving both the accurate depiction of the digitized artifacts at interactive framerates within VR devices and haptic feedback based on a 3D printed replica to improve the inspection experience.
Using current VR devices, several experts located at possibly different physical places can access and interact with high-quality virtual counterparts of real artifacts, where the reflectance data and the object geometry are stored on a central server and streamed to the remote clients as needed. Our evaluation demonstrates that providing haptic feedback regarding the object shape enhances VR-based interaction, assessment and experience of the considered objects. In summary, the main contributions of this work are:
• We present a framework uniting the concepts of digital material appearance, VR technology and 3D printing that is designed for the needs of enhanced remote collaborative inspection of cultural heritage contents.
• Our interactive multi-client inspection and annotation tool allows multiple users at different physical places to collaboratively analyze objects based on various interaction metaphors for annotating and illuminating the object.
• We demonstrate the potential of our approach in the scope of a user study.
Immersive Content Presentation.
Facilitating access to CH contents through digital representations has been approached in various forms. As the most obvious form, given by still images or videos, limits the user experience to pre-specified viewpoints and environment conditions (including the lighting conditions), the full aura of CH content is not preserved. Instead, digitizing the 3D geometry based on laser scanning or structured light scanning and additionally capturing a digital representation of the reflectance behavior allows the manipulation of the environment and lighting conditions. Beyond interactive screen-based visualizations, a higher degree of immersion is achieved with interactive 3D experiences of the CH objects/scenes. The range from real-world representations/environments to completely virtual representations/environments has been discussed under the notion of the reality-virtuality continuum [30]. For a detailed discussion of respective developments in the area of cultural heritage, we refer to the survey by Bekele et al. [1].

In VR settings, users are completely immersed in synthetic or pre-captured worlds that differ from the actual surroundings [3, 57]. Among the major applications for experiencing, exploring and manipulating CH contents are virtual museums and education. Augmented Virtuality aims at augmenting such completely virtual worlds with live real-world contents. In contrast, Augmented Reality (AR) enriches the actual real-world surroundings with synthetic contents, as used for exhibition enhancement, education or exploration.

For all these approaches, interactions with the scene and its objects are realized based on the interplay between visual depictions and the control devices of VR/AR equipment. However, these interfaces based on conventional VR/AR controllers, gamepads or gloves do not transport tactile feedback when interacting with objects. Seminal work on tangible interfaces has been presented in terms of the Tooteko framework [5], where facade reliefs were also presented to users as 3D printed objects lying on a table to allow a haptic experience. Touching different surface parts was coupled to respective audio feedback. In contrast, our work extends VR-based 3D object inspection with additional haptic feedback based on a 3D-printed replica.
Physical Interactions in VR.
Early studies have demonstrated humans' ability to gauge movements of their own hands relative to their own bodies. This proprioceptive sense [31] has been explored in terms of body-relative interaction types including the direct manipulation of objects as if these were in the user's hands, physical mnemonics (i.e., 3D body-relative widgets) and gesture commands. Further work focused on providing feedback to the user based on the shape of physical objects. In such passive haptics approaches [28], physical objects are aligned with objects in virtual environments, which has been shown to enhance the user experience of virtual environments [22]. In contrast to providing passive haptic feedback for static phenomena in the scene like ascending/descending stairs [35], the interaction with dynamic objects relies on carefully tracking the respective real-world counterparts. This can be achieved based on special gloves [41] as well as by attaching tracking modules to physical objects [10, 15]. Our approach also exploits passive haptics in terms of a 3D printed counterpart of the object of interest, which is attached to a tracking module of typical VR equipment. In contrast to previous work, our work focuses on both the accurate visual reproduction of objects within the virtual environment and the haptic feedback regarding the object's shape.
Appearance Modeling.
Object appearance, i.e., the observed colors and textures, results from the complexity of light exchange at the surface depending on surface geometry, material reflectance characteristics and the surrounding illumination conditions. As both the human visual system and acquisition devices are only capable of observing material appearance depending on the coupling of these three modalities, appearance modeling inevitably requires a decoupling of the respective modalities. Respective acquisition strategies have been intensively discussed in the literature [14, 53, 55]. While the geometric structure can be accurately recovered for a wide range of materials (i.e., diffuse to moderately specular surfaces) using structured light techniques or laser scanners [53], the separation of reflectance and illumination properties remains challenging. High-quality object digitization of smaller figurines is therefore mostly carried out in lab environments using setups with controllable illumination such as gonioreflectometers [9, 20] or camera arrays [17–19, 23, 38, 39, 49, 50, 52]. Exploiting controllable illumination allows to accurately model reflectance in terms of parametric spatially varying bidirectional reflectance distribution function (SVBRDF) models [9, 18, 19, 21, 23, 26, 27, 36–40, 46] or data-driven bidirectional texture functions [6, 17, 25, 49, 50] that both represent spatially varying surface reflectance depending on view and illumination conditions. SVBRDFs are parameterized over the exact 2D surface of the object geometry, and usually the local surface reflectance behavior is represented in terms of an analytical function. This makes them particularly suitable for specular materials, where the highlights may be lost with a tabulated representation. However, as the accuracy of the scanned geometry is still limited regarding fine structural details, the light exchange may not be accurately reproduced in such cases. In contrast, BTFs are parameterized over an approximate object surface. Hence, subtle effects of light exchange, including interreflections, self-shadowing and self-occlusions induced by finer structures not preserved in the reconstruction, as well as local subsurface scattering, are directly stored in terms of a data-driven image-based representation. The latter results in significantly increased computational and memory requirements in comparison to parametric representations. Whereas standard BTFs are parameterized over 2D surface domains, which suffers from distortions or seam artifacts, the 3D parameterization of OctreeBTFs [25] avoids distortions and significantly reduces the number of seam artifacts. While our framework is not restricted to a particular representation, we employ the data-intensive OctreeBTF representation [25] to demonstrate that even such expensive representations can be used for interactive object inspection and annotation. In addition to a purely visual (in our case VR-based) experience, we provide additional haptic feedback to enhance the user experience. Further digitization methods in the context of cultural heritage are based on reflectance transformation imaging (RTI) [8, 11, 12, 29, 32, 33, 42]. However, these techniques usually capture the lighting-dependent reflectance of an object only from a single view. Hence, RTI techniques are not adequate for in-hand object inspection scenarios, where the view-dependent object appearance has to be taken into account as well.
Rendering and Streaming of Appearance Data.
Collaborative object inspection by users at different physical locations relies on efficient model representations, fast data streaming as well as fast rendering. Storing raw BTF measurements comes at huge memory requirements of several tens to hundreds of gigabytes. These can be compressed based on the combination of fitting a mixture of several SVBRDFs to the BTF data and residual apparent BRDFs (ABRDFs) [56]. The latter are required to take non-local effects of the light transport in the material into account, as otherwise the quality of the reproduced reflectance behavior may be severely impacted. As an alternative, several compression techniques rely on interpreting the BTF as a tensor for which a low-rank approximation can be found via factorization. Respective techniques include Full Matrix Factorization (FMF) [24], Decorrelated-FMF (DFMF) [48] and K-SVD based compression [45], which are capable of preserving details of material appearance while allowing real-time rendering. The K-SVD approach outperforms the FMF-based approaches by a compression factor of about 3 to 4 for comparable quality. Furthermore, investigations on perceptually-inspired BTF compression leveraged the observation that lower downsampled levels of detail sufficiently approximate some of the factorized data, thereby reducing the memory requirements on the GPU, which led to compression factors of about 500.
Fig. 2. Overview of our approach towards sharing tangible cultural heritage experiences. The initial digitization outputs the object geometry and the reflectance characteristics in terms of an OctreeBTF [25]. Then, a pre-processing step (upper center) computes a version of the geometry suitable for 3D printing and converts the BTF data into a streamable version which is loaded by a server (bottom center). The server then streams the compressed OctreeBTF to remote clients that can interactively inspect the object in VR, enhanced by haptic feedback provided by the 3D printed replica of the object.

Further BTF compression schemes rely on the use of neural networks [43, 44], where compression and decompression are achieved based on an autoencoder approach, or on statistical approaches [13] that allow extreme compression at the cost of losing the capability to accurately reproduce fine characteristics of the reflectance behavior, as well as on vector-quantization-based approaches [16] or the use of multivariate radial basis functions [51], which both reach compression rates similar to FMF-based compression. Aside from BTF compression to allow for real-time rendering, scenarios where BTFs have to be streamed over the internet should also allow for a progressive refinement of the rendering. This can be achieved by leveraging the fact that the used DFMF compression orders the BTF data with respect to its importance in order to stream a BTF over the internet and progressively refine the rendering as more data arrives [48]. Further work has shown that the spatial data parts of a 2D-parameterized BTF can be split into smaller tiles that can be streamed to the GPU independently of each other [47], which enables loading only the BTF data for the object regions that are currently visible on the screen. In turn, this enables rendering scenes containing multiple BTFs in real-time, which would otherwise exceed the amount of available VRAM if each BTF had to be loaded completely without streaming. In our system, we combine these ideas [47, 48] to allow for a progressive refinement of the rendering of OctreeBTFs [25], while streaming the most important data, i.e., the data that contributes most to the final result, first.
Our approach for the joint immersive, tangible experience of CH contents by multiple users at possibly different physical locations relies on a client-server model as illustrated in Figure 2. Inputs in terms of the geometric model as well as the reflectance characteristics (for the purpose of an accurate object representation, we use the OctreeBTF representation [25]) of the considered models have to be obtained from a prior digitization process or a model repository.
In an initial step, we pre-process these data by computing a 3D printable version of the 3D shape and transform the OctreeBTF into a more compressed representation that allows for progressive streaming over a network. The resulting object representation (i.e., in terms of geometry and reflectance) is stored on the server and streamed to each remote client, which renders the object on an HMD and allows for interactive inspection enhanced with haptic feedback from a 3D printed replica. In the following sections, we discuss the involved components in more detail.

As we address the generation of highly immersive experiences of cultural heritage objects, the objects need to be accurately captured so that the characteristic details that determine the object appearance remain preserved in the digital model. State-of-the-art capture techniques [14, 53, 55] rely on an initial 3D scanning of the object geometry using laser scanners or structured light systems. Subsequently, spatial variations of the local surface appearance are captured under different viewing and lighting conditions. Depending on the surface material characteristics, the dependence of the local reflectance behavior $f(\mathbf{x}, \boldsymbol{\omega}_v, \boldsymbol{\omega}_l)$ on the surface point $\mathbf{x}$, the viewing direction $\boldsymbol{\omega}_v$ and the lighting direction $\boldsymbol{\omega}_l$ can be represented using Spatially Varying Bidirectional Reflectance Distribution Functions (SVBRDFs) [37] or Bidirectional Texture Functions (BTFs) [6]. Our framework is not restricted to one particular representation; SVBRDFs as captured in several other investigations [18, 19, 21, 23, 26, 27, 36, 38–40, 46] could also be integrated. However, to demonstrate the efficiency of our approach, we conduct our experiments with the computationally more demanding OctreeBTF representation [25], since it delivers the best visual quality of the representations mentioned above. The OctreeBTF representation relies on parameterizing the reflectance representation in a volumetric manner instead of a 2D parameterization over the surface, which ameliorates seam artifacts and texture distortions. We use datasets captured with a camera array [48, 50] for which the surface geometry has been reconstructed using a structured light system [54] and OctreeBTF representations have been provided by the authors of [25]. In addition to the reflectance characteristics, the capturing process also outputs the detailed geometry of the object.

In a first step, the captured 3D geometry is converted to a version that is suited for 3D printing. In particular, if the capturing process does not output a closed mesh, possibly occurring holes in the geometry have to be filled. Afterwards, the model is scaled to a size that fits on the tracker (see Figure 3), which is part of the VR system and used to track the printed object's pose. Optionally, if the object does not have a flat bottom surface or the surface does not cover the extents of the tracker, a socket can be added in order to mount the object on the tracker, as can be seen in Figure 3. Furthermore, we have to consider the fact that the typical capturing process outputs the OctreeBTF in a way that is well-suited for real-time rendering, but does not allow for (progressive) streaming over a network due to data size and data layout. Hence, before we store the OctreeBTF on the server, it has to be further compressed and converted to a representation that allows for both progressive streaming and real-time rendering.
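A minimal sketch of this mesh preparation, using the trimesh Python library, is shown below; the file names and the target size are placeholder assumptions, and the generation of the socket and the screw hole is omitted:

```python
import trimesh

def prepare_for_printing(path_in, path_out, target_size_mm=80.0):
    """Make a scanned mesh printable: close holes and scale it to fit
    the tracker. target_size_mm is a hypothetical value, not the exact
    size used in our system."""
    mesh = trimesh.load(path_in, force='mesh')
    trimesh.repair.fill_holes(mesh)            # close small holes in the scan
    extent = max(mesh.extents)                 # largest bounding-box edge
    mesh.apply_scale(target_size_mm / extent)  # scale down to the tracker mount
    mesh.export(path_out)                      # e.g., STL for the slicer

prepare_for_printing('figurine_scan.ply', 'figurine_print.stl')
```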
Therefore, the BTF representation is split into parts that depend solely on the spatial position $\mathbf{x}$ and on the light and view directions $(\boldsymbol{\omega}_v, \boldsymbol{\omega}_l)$, respectively. These parts are further divided into chunks that represent different levels of the octree, corresponding to levels of detail of the reconstruction. The resulting data chunks as well as the geometry are compressed using Zstandard [4] and ordered in a scheme that allows the client to start rendering when the first few chunks have been transmitted, and to progressively refine the rendering as more chunks arrive. For more detail on the BTF compression, on how the data is divided into the spatial and angular parts and on how the chunks are ordered, we refer to the supplemental material.
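The following sketch illustrates this chunking and compression step; the helper names and the coarse-to-fine ordering are our assumptions for illustration (the precise ordering is described in the supplemental material):

```python
import zstandard as zstd  # Zstandard bindings, the compressor referenced as [4]

def compress_chunks(chunks, level=19):
    """Compress each chunk independently so the client can decompress
    chunks as they arrive instead of waiting for the complete BTF."""
    cctx = zstd.ZstdCompressor(level=level)
    return [cctx.compress(chunk) for chunk in chunks]

def order_chunks(chunks_by_level):
    """Order chunks coarse-to-fine over the octree levels so that
    rendering can start early and refine progressively."""
    ordered = []
    for level in sorted(chunks_by_level):  # level 0 = coarsest LOD
        ordered.extend(chunks_by_level[level])
    return ordered
```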
Initially, the server loads the compressed BTF data chunks and the geometry and then waits for clients to connect. Each new client first receives general information about the object, followed by the geometry and the BTF data chunks in the order discussed in the supplemental material. The server keeps track of the chunks transmitted to each client, allowing multiple clients to connect at different times. Whenever a client places an annotation on the object, the annotation is forwarded to all other clients and is also stored on the server, so that clients connecting at a later time also receive all annotations already present.
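A minimal sketch of such a server is given below, assuming a simple length-prefixed TCP protocol; message-type tags for distinguishing geometry, chunks and annotations are omitted for brevity, and the real system interleaves annotation updates with the chunk stream:

```python
import asyncio, struct

CHUNKS = []       # compressed chunks, already in streaming order
ANNOTATIONS = []  # annotations received so far (serialized bytes)
CLIENTS = set()

async def send_msg(writer, payload: bytes):
    writer.write(struct.pack("<I", len(payload)) + payload)  # length-prefixed framing
    await writer.drain()

async def handle_client(reader, writer):
    CLIENTS.add(writer)
    try:
        for ann in ANNOTATIONS:            # replay annotations placed earlier
            await send_msg(writer, ann)
        for chunk in CHUNKS:               # stream geometry/BTF chunks in order
            await send_msg(writer, chunk)
        while data := await reader.read(65536):  # incoming annotations
            ANNOTATIONS.append(data)
            for other in CLIENTS:          # forward to all other clients
                if other is not writer:
                    await send_msg(other, data)
    finally:
        CLIENTS.discard(writer)

async def main():
    server = await asyncio.start_server(handle_client, "0.0.0.0", 9000)
    async with server:
        await server.serve_forever()

asyncio.run(main())
```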
At the client side, we use an off-the-shelf VR system consisting of an HMD, two controllers and an additional tracker (see Figure 3, right) that can be attached to an object to include said object in the virtual world. To allow insights on how the haptics-enriched VR experience improves on a purely screen-based experience, we also created a non-VR version of the application (in the following denoted as the "2D version"), where the object and the light are controlled by moving the mouse.
Tracking.
The VR system is set up in such a way that we can use a simple tracker for the object, which is held in one hand, and a controller for the light and menu interactions held in the other hand. The tracker features a standard 1/4 inch camera mount that we use to fix the printed object on the tracker, which has the benefit of allowing to quickly exchange the objects. We utilize the trackpad of the controller to control the light intensity and the object scale. Each rendered frame, the tracker, the controller and the HMD send their positions and orientations to the application, which then adapts the scene to match the positions in the real world.
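Conceptually, the per-frame update looks as follows; the `vr` object and its methods are hypothetical stand-ins for the VR runtime API (e.g., OpenVR), whose actual calls are not shown in the paper:

```python
def update_scene(vr, scene):
    """Per-frame synchronization of the virtual scene with the tracked
    real-world devices (sketch; all API names are hypothetical)."""
    poses = vr.get_device_poses()                         # tracker, controller, HMD
    scene.printed_object.set_transform(poses["tracker"])  # printed replica pose
    scene.light.set_transform(poses["controller"])        # user-controlled light
    scene.camera.set_transform(poses["hmd"])              # viewer pose
    # Trackpad input adjusts light intensity and object scale (see controls).
    pad_x, pad_y = vr.get_trackpad("controller")
    scene.light.intensity *= (1.0 + 0.01 * pad_y)
    scene.object_scale *= (1.0 + 0.01 * pad_x)
```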
Rendering and Streaming.
When the client has connected to the server, a placeholder geometry is rendered until the real geometry has been received and decompressed. In general, decompression is performed in a separate thread on the CPU, and the uncompressed data is transferred to the GPU in an asynchronous manner, allowing for seamless updates of geometry and reflectance information without stuttering, which is specifically important when rendering on VR headsets. When a chunk has been uploaded to the GPU, the render thread is signaled to update the rendering according to the newly loaded data. Due to the iterative streaming of BTF data, the user can already view the object and place annotations on it while the rendering is refined as more data arrives.
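A sketch of this decompression/upload pipeline is given below, assuming the chunk-level Zstandard compression described above; `upload_chunk_async` is a placeholder for the respective graphics-API upload function:

```python
import threading, queue
import zstandard as zstd

upload_queue = queue.Queue()      # decompressed chunks waiting for GPU upload
render_dirty = threading.Event()  # signals the render thread to refresh

def decompressor(network_chunks):
    """Runs on a CPU worker thread so decompression never blocks rendering."""
    dctx = zstd.ZstdDecompressor()
    for chunk in network_chunks:  # chunks arrive in streaming order
        upload_queue.put(dctx.decompress(chunk))

def gpu_uploader(upload_chunk_async):
    """Transfers decompressed chunks to the GPU asynchronously."""
    while True:
        data = upload_queue.get()
        upload_chunk_async(data)  # non-blocking copy into VRAM (placeholder)
        render_dirty.set()        # tell the render thread new data is ready
```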
To provide the user with haptic feedback, we chose to print the model using an off-the-shelf 3D printer. This printer uses the fused filament fabrication (FFF, also known as fused deposition modeling, FDM) technique, where a thermoplastic material is heated to its melting point and deposited by a moving printhead in a layer-by-layer fashion. The advantage of this technique is that, over the last years, such printers have become more and more affordable and easy to use, while the quality of the printed objects has increased. As we want to attach the objects to the tracker, we add a baseplate to the object with a hole for a screw. Printed objects are shown in Figure 3. Furthermore, the objects have to be scaled down to fit the tracker, since a large object would occlude the tracker too much, thereby decreasing the tracking accuracy.
The major challenge for the design and development of interaction metaphors is the specification of desirable interaction possibilities with objects as well as their intuitive realization based on the hardware equipment. To allow an immersive object visualization, we leverage the potential of current VR devices and extend the interaction possibilities by adding a haptic object experience. In the following, we provide details regarding the used control mechanisms for handling the object of interest, changing the illumination conditions and adding annotations. The respective controls can easily be adapted for right-handed and left-handed persons (in the following, we only describe the typical setting for right-handed persons).
Controls for Object Handling.
As mentioned in Section 3.4, we create haptic feedback during object exploration in VR by 3D printing the respective object geometry and attaching it to a VR tracking device (we use the HTC Vive tracker, to which the printed object is attached).
Fig. 3. Left and middle: printed objects; right: object mounted on the tracker.
To allow a comparison regarding how the haptic feedback from the printed object enhances the user experience, we also implemented a second mode where the digital object is (virtually) attached to one of the standard controllers, i.e., this interaction is solely determined by how the VR controller is moved and does not provide any haptic feedback. The application is implemented in a way that lets us switch between these two modes at any time directly within the application. In addition, we allow the object to be scaled in the virtual environment based on the trackpad of the controller assigned to the right hand.
Light Controls.
An important aspect of object exploration is the respective illumination conditions. Besides showing how an object will look within a certain environment, which we address by environment lighting [7], our viewer also supports the direct lighting of an object via a user-controlled light source during its inspection. When using the direct lighting mode, the controller in the right hand is used to control the main light source, which can be either a point light or a spotlight to simulate a flashlight as commonly used to inspect objects. The intensity of the light source can be changed via the controller. In case of environment lighting, we use a spherical environment map to define the incoming light from every direction. However, as correct image-based lighting is not possible for BTFs in real-time, we approximate the environment map by a set of eight directional light sources.
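As an illustration of such an approximation, the following sketch extracts n directional lights from a latitude-longitude environment map by picking the brightest solid-angle-weighted texels; this is a simplification for illustration and not necessarily the exact reduction used in our system:

```python
import numpy as np

def env_to_directional_lights(env, n_lights=8):
    """env: (h, w, 3) lat-long environment map.
    Returns n light directions (unit vectors, y-up) and RGB intensities."""
    h, w, _ = env.shape
    theta = (np.arange(h) + 0.5) / h * np.pi       # polar angle per row
    lum = env.mean(axis=2) * np.sin(theta)[:, None]  # solid-angle-weighted luminance
    idx = np.argsort(lum.ravel())[-n_lights:]        # brightest texels
    ys, xs = np.unravel_index(idx, (h, w))
    phi = (xs + 0.5) / w * 2.0 * np.pi
    th = theta[ys]
    dirs = np.stack([np.sin(th) * np.cos(phi),       # spherical -> Cartesian
                     np.cos(th),
                     np.sin(th) * np.sin(phi)], axis=1)
    # Scale by the solid angle each texel covers in the lat-long map.
    colors = env[ys, xs] * (2.0 * np.pi ** 2 / (h * w)) * np.sin(th)[:, None]
    return dirs, colors
```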
Annotation Tools.
During artifact inspection, users often need to annotate respective object parts. For this purpose, our client application allows users to place annotations on the object by moving the second controller (i.e., the one the object is not attached to) over a spot on the object and pressing a button on the controller. Users can thereby annotate the object either by placing a simple marker highlighting a spot to direct other (possibly remotely connected) users to inspect it in more detail, or, if more information is needed, a short descriptive text can be added to the annotation using a virtual keyboard. Annotated points on the object are displayed as small spheres, and the color of a sphere reflects the type of annotation, i.e., whether its purpose is highlighting that particular location or region to other users or whether additional text information is stored at the respective location. The latter texts are only visible to a user if she moves the controller over or next to the sphere, to avoid continuously occluding the object with text. All added annotations are sent to the server, which, in turn, forwards them to the other connected clients. The display of annotations can be activated or deactivated at any time.

As a further functionality, we allow the user to draw on the object by continuously placing points on the object surface while a certain controller button is pressed. This can be used for marking larger features on the object.
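For illustration, an annotation message exchanged between client and server could look as follows; the field names and the JSON encoding are our assumptions, as the paper does not specify the wire format:

```python
import json, uuid
from dataclasses import dataclass, asdict

@dataclass
class Annotation:
    """One annotation as exchanged between client and server (sketch)."""
    position: tuple   # 3D point on the object surface
    kind: str         # "marker" for a highlight, "text" for a note
    text: str = ""    # only shown when a controller hovers the sphere
    uid: str = ""     # lets clients deduplicate forwarded annotations

def encode(ann: Annotation) -> bytes:
    ann.uid = ann.uid or uuid.uuid4().hex
    return json.dumps(asdict(ann)).encode()

# Example: a marker directing other users to a spot on the figurine.
msg = encode(Annotation(position=(0.12, 0.30, 0.05), kind="marker"))
```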
Fig. 4. View of a person operating the system.
To demonstrate the potential of our approach, we provide a perceptual evaluation obtained in terms of a user study as well as evaluations regarding visual quality and performance. In addition, we compare our enhanced VR system to a purely 2D screen-based application with the same features.
To verify whether the combination of VR-based object visualization and haptic feedback actually enriches the experience of inspecting cultural artifacts, we performed a user study where the participants had to inspect an object and perform certain tasks, such as annotating spots or describing materials or the reflectance behavior of the respectively considered object. All participants were naive to the goals of the experiment, provided informed consent, and reported normal or corrected-to-normal visual and hearing acuity. During the experiment, the participants had to perform the given tasks in three different modes:

• In the first mode M1, the 2D application was used, where the user can manipulate the object displayed on a standard screen based on mouse/keyboard interactions.
• In the second mode M2, users could interact with our novel VR-based application without haptic feedback (i.e., the virtual model was attached to the second controller of the VR system instead).
• In the third mode M3, users were able to interact with the objects based on our VR application and the printed model to allow for additional haptic feedback.

To avoid biases, we varied the order of these modes for individual participants, i.e., the participants started with different modes. For each condition, the participants were asked to provide ratings on a 7-point Likert scale regarding aspects such as visual quality, adequacy for inspection, quality of object experience and ease of interacting with the object, performing annotations or controlling view and light conditions. The results shown in Figure 5 demonstrate that the approaches M2 and M3 outperform the 2D application M1 in terms of the ease of inspecting the object and controlling light conditions and, hence, result in a better overall experience and intuitiveness regarding the interactions, which may be a result of the higher intuitiveness of hand-based interactions with objects.
Fig. 5. Statistical results of our user study. Statistics regarding ratings obtained for mode M1 are colored in red, the ones for mode M2 in green and the ones for mode M3 in blue. The illustration includes lower and upper fence (dashed line), interquartile range (colored bar), median (marked with a vertical bar), outliers (marked with •) as well as the average value (marked with ×). For most aspects, the VR-based modes M2 and M3 outperform the 2D application (M1) with keyboard-mouse controls. The additional use of haptic feedback of the object geometry further improves the quality of object experience and assessment as well as view control and leads to the best overall ratings.

The latter represent a kind of canonical approach for us humans to access real 3D objects in our daily lives. In comparison to the depiction of objects on a screen and keyboard/mouse-based interactions (M1), users particularly acknowledged the ease of controlling the viewpoint in the VR-based modes M2 and M3. Furthermore, users seemed to benefit from additionally using a printed replica (M3) regarding the view control. Mode M3 additionally received the best ratings regarding the quality of object assessment and object experience, which likely also resulted in the best overall ease of assessment. However, the resolution of the 2D application M1 was rated higher than for the VR modes M2 and M3, which is a result of the limited resolution of the used HMD. Finally, we expect the lower scores regarding the annotations observed for modes M2 and M3 in comparison to M1 to be the result of the absence of a standard keyboard for entering text. Here, the virtual keyboard, where a laser pointer is used to target each letter individually, would have to be exchanged for a speech recognition system to rapidly allow adequate text annotations. However, as this aspect does not touch the main insight to be gained in the scope of this investigation – that the use of haptic feedback within a VR-based object exploration improves the object experience – we leave it for future work.

In a further experiment, we analyzed whether the improved intuitiveness of interaction reported in our previous study can also be seen in the times required for users to perform certain tasks that involve an interaction with the object, such as rotating it. To further evaluate whether providing the exact geometry for haptic feedback benefits the interaction, we also provided a simple geometry in terms of a 3D printed cube which was mounted on the tracker (denoted as M4).
Fig. 6. Comparison of interaction times for marking a fixed series of spots on an object as well as drawing strokes along characteristic object features. Interaction times for the 2D application (M1) are shown in red, times for M2 are given in green, values for the VR mode with haptic feedback (M3) are given in blue, and the new mode M4 is shown in yellow.

In this experiment, the participants were first asked to highlight a fixed series of spots on the object by clicking on the object and placing an annotation. To separate the impact of the object interaction from the textual annotation, we only measured the times to mark the respective surface points and discarded the entering of text information for this experiment. Each participant performed this task in all the modes M1, M2, M3 described in Section 5.1 as well as in the new mode M4, starting with alternating modes to eliminate biases. We constructed several series of points the users had to annotate and alternated the series between the modes and participants to eliminate learning effects. The series of spots were defined in such a way that the object has to be rotated in order to annotate the next spot, and, when done, the participants were asked to rotate the object back to its original position. In addition, we let the users draw strokes along characteristic object features (neck, face and legs of the Buddha figurine) in all four modes. The respectively required interaction times are displayed in Figure 6 and indicate that providing any form of immersive presentation is beneficial for navigation when the object has to be rotated in order to perform a certain task. Additionally providing haptic feedback regarding the object shape gives a further small benefit, which can be seen in the variance of the interaction times.

As we considered a reflectance representation in terms of OctreeBTFs [25], there are no seams and distortions that affect the visual quality, in contrast to the use of conventional BTFs parameterized over the 2D object surface. However, we had to turn off the spatial filtering of the OctreeBTFs in the VR mode to meet the performance requirements for interactive VR-based visualization. Despite this, the limited OctreeBTF resolution only becomes visible when the object is held very close to the HMD. In Figure 7, we show a set of screenshots from the 2D version of the application, where different objects are placed in different environment lighting and some surface points have additionally been annotated by a user. Furthermore, the dependence of object appearance on varying illumination conditions (in this example, the light source is moved) is shown in Figure 8.
To analyze the performance of our approach, we measured the average frametimes over a period of about 30 seconds during a typical interaction scenario as seen in Figure 4, both using the 2D application and the VR version. When rendering an OctreeBTF, or BTFs in general, the main bottleneck is the amount of data that has to be read from the VRAM each frame, which means that rendering is mainly bandwidth-limited. The amount of data mainly depends on the number of screen pixels covered by the object, which determines how much of the spatial BTF data has to be fetched from the VRAM, and on the number of light sources used to approximate the illumination, as we need to sample the angular data for each light source.
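This cost behavior follows directly from the shading computation: with N discrete light sources, the reflectance f has to be evaluated once per light for every covered pixel, so the angular BTF data is fetched N times per pixel. A minimal formulation is given below; note that, depending on the normalization of the measured data, the cosine foreshortening term may already be contained in the BTF values:

```latex
% Radiance toward the viewer for one covered pixel under N discrete lights;
% each summand requires one fetch of the angular BTF data, so the per-pixel
% bandwidth grows linearly with the number of light sources.
\[
  L_o(\mathbf{x}, \boldsymbol{\omega}_v)
    \approx \sum_{i=1}^{N}
      f(\mathbf{x}, \boldsymbol{\omega}_v, \boldsymbol{\omega}_{l,i}) \, L_i \,
      \max(\langle \mathbf{n}(\mathbf{x}), \boldsymbol{\omega}_{l,i} \rangle, 0)
\]
```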
Fig. 7. Screenshots of the 2D application where two objects are lit by different environments with the user-controlled light being disabled. In the right images, some object parts have been annotated by the user. Please note that the text size has been increased while capturing the screenshots for better readability; in the real application, the texts are much smaller.

Fig. 8. A series of screenshots of the 2D application where the environment lighting is disabled and the user-controlled light is moved on an arc above the object, showing how the reflectance changes according to the light position.
Table 1. Average frametimes [ms] and frames per second [Hz] for both the 2D and the VR application. The values were measured during a typical interaction scenario.
                        2D            VR
With environment        18.18 / 55    28.50 / 35
Without environment     14.28 / 70    18.18 / 55

To get insights into how much the additional lights used to represent the environment (in our experiments, we approximated environments using eight light sources) affect the performance, we performed the measurement once without considering environment lighting and once with the respective environment. When interacting with the objects, we additionally took typical distances for close-range inspection into account to provide a realistic scenario, which means that a large fraction of the screen pixels of the virtual camera of the 2D application or, in the VR case, of the HMD is covered. As shown in Table 1, frametimes are, as expected, higher when rendering in VR, even exceeding 28 ms, corresponding to 35 frames per second, i.e., a slight stuttering of the rendering, due to the increased resolution and the fact that we have to render the scene twice to create the stereo impression. However, interestingly, this aspect was not perceived as disturbing by the participants of the user study (see Section 5.1). We further noticed that after the BTF has been fully streamed and the last octree level has been uploaded to the GPU, the frametimes increase significantly. This is due to the fact that at this point there is an increase in the amount of data the GPU has to fetch from the VRAM for each pixel. In addition, we observed that the frametimes decrease again when the object is very close to the camera. This happens when one voxel of the octree covers more than one screen pixel, which means that the overall number of visible voxels decreases, as does the amount of data the GPU needs to fetch from the VRAM. The data can then be cached between shader invocations that process the same voxel. But for this effect to occur, the object must be held very close to the camera, especially in VR, and then the spatial resolution of the BTF becomes visible.
Despite the benefits of an improved, tangible remote object experience with particularly better capabilities regarding interactive object inspection and assessment, our approach also faces limitations. One major limitation is the resolution of current VR HMDs, an issue that will hopefully be resolved in the near future by the next generations of HMDs. On the other hand, higher display resolutions also demand a higher resolution of the used OctreeBTF, which, in turn, requires more powerful GPUs with more video memory and better compression techniques for real-time streaming. The used 3D printing process also has some limiting factors. The use of an FDM printer, which prints objects layer by layer, causes these layers to be perceivable when touching the object, which probably impacts the haptic feedback regarding fine geometrical features of the object. A solution would be to use an SLA printer, which can produce much thinner layers and can thus better preserve small structures of the object. However, we only expect a small impact on the perception in the scenarios shown in the scope of our evaluation, as the respective object features were easy to perceive, so the overall tendency of an improved tangible object experience would only be slightly strengthened. Furthermore, most 3D printers are limited to one material, which is usually plastic, although wood-like materials are also possible. Some printers have multiple extruders, which allow for multi-material prints, but these materials are all of the same kind, e.g., one cannot mix plastic and wood materials. We hope that in the future these issues can be resolved by more advanced printing technology and, hence, do not consider them to be permanent limitations of our overall approach towards creating tangible object experiences.
We have presented an approach towards tangible cultural heritage experiences. Our method allows interactive and collaborative VR-based object inspection and annotation based on high-quality 3D models with accurate reflectance characteristics, while additionally providing haptic feedback regarding the object shape features based on a 3D printed replica. As demonstrated by our user study, the additional haptic feedback enriches the VR-based interaction, assessment and experience of the considered objects, which, in turn, indicates the potential of our approach. While the navigation speed is on par with alternative VR-based interaction methods, providing haptic feedback regarding the object shape enhances the overall experience. In future work, we see the possibility of circumventing the use of a hardware tracking device, e.g., by printing patterns used for tracking into the geometry.
REFERENCES
[1] M. K. Bekele, R. Pierdicca, E. Frontoni, E. S. Malinverni, and J. Gain. 2018. A Survey of Augmented, Virtual, and Mixed Reality for Cultural Heritage. J. Comput. Cult. Herit. 11, 2 (2018), 7:1–7:36.
[2] W. Benjamin. 1935. The Work of Art in the Age of Mechanical Reproduction (written 1935 as Das Kunstwerk im Zeitalter seiner technischen Reproduzierbarkeit). (1935).
[3] J. Carmigniani, B. Furht, M. Anisetti, P. Ceravolo, E. Damiani, and M. Ivkovic. 2011. Augmented Reality Technologies, Systems and Applications. Multimedia Tools and Applications 51, 1 (2011), 341–377.
[4] Y. Collet and C. Turner. 2016. Smaller and Faster Data Compression with Zstandard. https://code.fb.com/core-data/smaller-and-faster-data-compression-with-zstandard/. Accessed: 2019-01-29.
[5] F. D'Agnano, C. Balletti, F. Guerra, and P. Vernier. 2015. Tooteko: A Case Study of Augmented Reality for an Accessible Cultural Heritage. Digitization, 3D Printing and Sensors for an Audio-Tactile Experience. The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 40, 5 (2015), 207.
[6] K. J. Dana, S. K. Nayar, B. van Ginneken, and J. J. Koenderink. 1997. Reflectance and Texture of Real-World Surfaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 151–157.
[7] P. Debevec. 2006. Image-Based Lighting. In ACM SIGGRAPH 2006 Courses.
[8] T. G. Dulecha, F. A. Fanni, F. Ponchio, F. Pellacini, and A. Giachetti. 2020. Neural Reflectance Transformation Imaging. The Visual Computer 36, 10 (2020), 2161–2174.
[9] J. Filip, R. Vavra, M. Haindl, P. Zid, M. Krupika, and V. Havran. 2013. BRDF Slices: Accurate Adaptive Anisotropic Appearance Acquisition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1468–1473.
[10] A. Franzluebbers and K. Johnsen. 2018. Performance Benefits of High-Fidelity Passive Haptic Feedback in Virtual Reality Training. In Proceedings of the Symposium on Spatial User Interaction. 16–24.
[11] A. Giachetti, I. M. Ciortan, C. Daffara, G. Marchioro, R. Pintus, and E. Gobbetti. 2018. A Novel Framework for Highlight Reflectance Transformation Imaging. Computer Vision and Image Understanding 168 (2018), 118–131.
[12] E. Graeme, M. Kirk, and M. Tom. 2010. Archaeological Applications of Polynomial Texture Mapping: Analysis, Conservation and Representation. Journal of Archaeological Science 37, 8 (2010), 2040–2050.
[13] M. Haindl and J. Filip. 2007. Extreme Compression and Modeling of Bidirectional Texture Function. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 10 (2007), 1859–1865.
[14] M. Haindl and J. Filip. 2013. Visual Texture: Accurate Material Appearance Measurement, Representation and Modeling. Springer.
[15] A. Hanus, M. Hoover, A. Lim, and J. Miller. 2019. A Collaborative Virtual Reality Escape Room with Passive Haptics. 1413–1414.
[16] V. Havran, J. Filip, and K. Myszkowski. 2010. Bidirectional Texture Function Compression Based on Multi-Level Vector Quantization. In Computer Graphics Forum, Vol. 29. 175–190.
[17] V. Havran, J. Hošek, Š. Němcová, J. Čáp, and J. Bittner. 2017. Lightdrum—Portable Light Stage for Accurate BTF Measurement on Site. Sensors 17, 3 (2017), 423.
[18] T. Hawkins, J. Cohen, and P. Debevec. 2001. A Photometric Approach to Digitizing Cultural Artifacts. In Proceedings of the 2001 Conference on Virtual Reality, Archeology, and Cultural Heritage. 333–342.
[19] T. Hawkins, P. Einarsson, and P. E. Debevec. 2005. A Dual Light Stage. In Rendering Techniques.
[20] ACM Transactions on Graphics 29, 4 (2010), 99:1–99:12.
[21] M. Holroyd, J. Lawrence, and T. Zickler. 2010. A Coaxial Optical Scanner for Synchronous Acquisition of 3D Geometry and Surface Reflectance. ACM Transactions on Graphics (TOG) 29, 4 (2010), 1–12.
[22] R. D. Joyce and S. Robinson. 2017. Passive Haptics to Enhance Virtual Reality Simulations. In AIAA Modeling and Simulation Technologies Conference. 1313.
[23] J. Köhler, T. Nöll, G. Reis, and D. Stricker. 2013. A Full-Spherical Device for Simultaneous Geometry and Reflectance Acquisition. 355–362.
[24] M. L. Koudelka, S. Magda, P. N. Belhumeur, and D. J. Kriegman. 2003. Acquisition, Compression, and Synthesis of Bidirectional Texture Functions. 59–64.
[25] S. Krumpen, M. Weinmann, and R. Klein. 2017. OctreeBTFs - A Compact, Seamless and Distortion-Free Reflectance Representation. Computers & Graphics 68 (2017), 21–31.
[26] K. S. Ladefoged and C. B. Madsen. 2019. Spatially-Varying Diffuse Reflectance Capture Using Irradiance Map Rendering for Image-Based Modeling Applications. 46–54.
[27] H. P. A. Lensch, J. Kautz, M. Goesele, W. Heidrich, and H.-P. Seidel. 2003. Image-Based Reconstruction of Spatial Appearance and Geometric Detail. ACM Transactions on Graphics 22, 2 (2003), 234–257.
[28] R. W. Lindeman, J. L. Sibert, and J. K. Hahn. 1999. Hand-Held Windows: Towards Effective 2D Interaction in Immersive Virtual Environments. In Proceedings IEEE Virtual Reality. 205–212.
[29] T. Malzbender, D. Gelb, and H. Wolters. 2001. Polynomial Texture Maps. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques. 519–528.
[30] P. Milgram and F. Kishino. 1994. A Taxonomy of Mixed Reality Visual Displays. IEICE Transactions on Information and Systems E77-D, 12 (1994), 1321–1329.
[31] M. R. Mine, F. P. Brooks Jr., and C. H. Sequin. 1997. Moving Objects in Space: Exploiting Proprioception in Virtual-Environment Interaction. In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. 19–26.
[32] M. Mudge, T. Malzbender, A. Chalmers, R. Scopigno, J. Davis, O. Wang, P. Gunawardane, M. Ashley, M. Doerr, A. Proenca, and J. Barbosa. 2008. Image-Based Empirical Information Acquisition, Scientific Reliability, and Long-Term Digital Preservation for the Natural Sciences and Cultural Heritage. In Eurographics 2008 - Tutorials.
[33] M. Mudge, T. Malzbender, C. Schroer, and M. Lum. 2006. New Reflection Transformation Imaging Methods for Rock Art and Multiple-Viewpoint Display. In Proceedings of the 7th International Conference on Virtual Reality, Archaeology and Intelligent Cultural Heritage. 195–202.
[34] G. Müller. 2009. Data-Driven Methods for Compression and Editing of Spatially Varying Appearance. Dissertation. Universität Bonn.
[35] R. Nagao, K. Matsumoto, T. Narumi, T. Tanikawa, and M. Hirose. 2018. Ascending and Descending in Virtual Reality: Simple and Safe System Using Passive Haptics. IEEE Transactions on Visualization and Computer Graphics 24, 4 (2018), 1584–1593.
[36] G. Nam, J. H. Lee, D. Gutierrez, and M. H. Kim. 2018. Practical SVBRDF Acquisition of 3D Objects with Unstructured Flash Photography. ACM Transactions on Graphics (TOG) 37, 6 (2018), 1–12.
[37] F. E. Nicodemus, J. C. Richmond, J. J. Hsia, I. W. Ginsberg, T. Limperis, et al. 1977. Geometrical Considerations and Nomenclature for Reflectance. Vol. 160.
[38] T. Nöll, J. Köhler, G. Reis, and D. Stricker. 2013. Faithful, Compact and Complete Digitization of Cultural Heritage Using a Full-Spherical Scanner. Vol. 1. 15–22.
[39] T. Nöll, J. Köhler, G. Reis, and D. Stricker. 2015. Fully Automatic, Omnidirectional Acquisition of Geometry and Appearance in the Context of Cultural Heritage Preservation. Journal on Computing and Cultural Heritage (JOCCH) 8, 1 (2015), 1–28.
[40] G. Palma, M. Callieri, M. Dellepiane, and R. Scopigno. 2012. A Statistical Method for SVBRDF Approximation from Video Sequences in General Lighting Conditions. Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering) 31, 4 (2012), 1491–1500.
[41] J. S. Pierce, B. C. Stearns, and R. Pausch. 1999. Voodoo Dolls: Seamless Interaction at Multiple Scales in Virtual Environments. In Proceedings of the 1999 Symposium on Interactive 3D Graphics. 141–145.
[42] F. Ponchio, M. Corsini, and R. Scopigno. 2018. A Compact Representation of Relightable Images for the Web. In Proceedings of the 23rd International ACM Conference on 3D Web Technology. 1:1–1:10.
[43] G. Rainer, A. Ghosh, W. Jakob, and T. Weyrich. 2020. Unified Neural Encoding of BTFs. Computer Graphics Forum 39, 2 (2020), 167–178.
[44] G. Rainer, W. Jakob, A. Ghosh, and T. Weyrich. 2019. Neural BTF Compression and Interpolation. In Computer Graphics Forum, Vol. 38. 235–244.
[45] R. Ruiters and R. Klein. 2009. BTF Compression via Sparse Tensor Decomposition. In Computer Graphics Forum, Vol. 28. 1181–1188.
[46] P. Santos, M. Ritz, R. Tausch, H. Schmedt, R. Monroy, A. De Stefano, O. Posniak, C. Fuhrmann, and D. W. Fellner. 2014. CultLab3D: On the Verge of 3D Mass Digitization. In Proceedings of the Eurographics Workshop on Graphics and Cultural Heritage. 65–73.
[47] C. Schwartz, R. Ruiters, and R. Klein. 2013. Level-of-Detail Streaming and Rendering Using Bidirectional Sparse Virtual Texture Functions. Computer Graphics Forum (Proc. of Pacific Graphics) 32, 7 (2013), 345–354.
[48] C. Schwartz, R. Ruiters, M. Weinmann, and R. Klein. 2013. WebGL-Based Streaming and Presentation of Objects with Bidirectional Texture Functions. J. Comput. Cult. Herit. 6, 3 (2013), 11:1–11:21.
[49] C. Schwartz, R. Sarlette, M. Weinmann, M. Rump, and R. Klein. 2014. Design and Implementation of Practical Bidirectional Texture Function Measurement Devices Focusing on the Developments at the University of Bonn. Sensors 14, 5 (2014), 7753–7819.
[50] C. Schwartz, M. Weinmann, R. Ruiters, and R. Klein. 2011. Integrated High-Quality Acquisition of Geometry and Appearance for Cultural Heritage. In Proceedings of the International Symposium on Virtual Reality, Archaeology and Intelligent Cultural Heritage (VAST). 25–32.
[51] Y. Tsai, K. Fang, W. Lin, and Z. Shih. 2010. Modeling Bidirectional Texture Functions with Multivariate Spherical Radial Basis Functions. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 7 (2010), 1356–1369.
[52] B. Tunwattanapong, G. Fyffe, P. Graham, J. Busch, X. Yu, A. Ghosh, and P. Debevec. 2013. Acquiring Reflectance and Shape from Continuous Spherical Harmonic Illumination. ACM Transactions on Graphics (TOG) 32, 4 (2013), 1–12.
[53] M. Weinmann, F. Langguth, M. Goesele, and R. Klein. 2016. Advances in Geometry and Reflectance Acquisition. In Eurographics 2016 Tutorials.
[54] M. Weinmann, C. Schwartz, R. Ruiters, and R. Klein. 2011. A Multi-Camera, Multi-Projector Super-Resolution Framework for Structured Light. In Proceedings of the International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT). 397–404.
[55] T. Weyrich, J. Lawrence, H. P. A. Lensch, S. Rusinkiewicz, and T. Zickler. 2009. Principles of Appearance Acquisition and Representation. Foundations and Trends in Computer Graphics and Vision 4, 2 (2009), 75–191.
[56] H. Wu, J. Dorsey, and H. Rushmeier. 2011. A Sparse Parametric Mixture Model for BTF Compression, Editing and Rendering. In Computer Graphics Forum, Vol. 30. 465–473.
[57] Q. Zhao. 2009. A Survey on Virtual Reality. Science in China Series F: Information Sciences 52, 3 (2009), 348–400.
A BTF COMPRESSION
BTF measurements can be stored as huge matrices whose columns contain the reflectance values for a particular surface point $\mathbf{x}$ under the various view-light configurations $(\omega_l, \omega_v)$. Depending on the resolution of the measurement setup, this matrix can exceed hundreds of gigabytes even when stored with half-precision floating point values, and thus needs to be compressed in order to meet the demands of real-time rendering and streaming. For this, the Decorrelated Full Matrix Factorization (DFMF) [34] has been demonstrated to achieve good compression ratios while preserving visual quality. In a first step, the BTF color values are converted into the YUV colorspace, which separates brightness from color information. Subsequently, the color channels are decorrelated by taking the logarithm of the Y-channel and dividing the U- and V-channel by the Y-channel. Then, for each of these matrices, an SVD $A = U \Sigma V^T$ is computed to separate the spatial information stored in $V$ from the light- and view-dependent information in $U$. More precisely, a column $u_c(\omega_l, \omega_v)$ of $U$ stores the Eigen-ABRDF, whereas a row $v_c(\mathbf{x})$ of $V$ holds the Eigen-Texture of the BTF component $c$. As the SVD orders the rows of $V$ and the columns of $U$ with respect to their importance, we can compress the BTF by keeping the first $k$ rows of $V$ and the first $k$ columns of $U$:
$$\rho_{BTF}(\mathbf{x}, \boldsymbol{\omega}_l, \boldsymbol{\omega}_v) \approx \sum_{0 \leq c < k} u_c(\boldsymbol{\omega}_l, \boldsymbol{\omega}_v) \cdot \sigma_c \cdot v_c(\mathbf{x})$$
The resulting matrices $\tilde{U}$, $\tilde{\Sigma}$ and $\tilde{V}$ for each color channel are then stored, where $\tilde{\Sigma}$ and $\tilde{V}$ are pre-multiplied for simplicity. Using this compression, the BTF data size is reduced to a few gigabytes, which fits into the memory of current GPUs. Furthermore, the use of the YUV colorspace allows compressing the U- and V-channels even further by keeping fewer components, as most reflectance information concerns changes in brightness, to which the human eye is more sensitive than to changes in color. In our experiments, we observed eight BTF components for the U- and V-channel to be a good compromise between data size and visual quality when rendering. For the Y-channel, a maximum of 100 components is sufficient for nearly all objects; for objects with fewer details, fewer components can be used. When rendering, four components are grouped together into one layer of an RGBA array texture (i.e., $k/4$ layers), so the number of components $k$ is chosen to be a multiple of four; we use $k_Y = 72$ and $k_{UV} = 8$.
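To make these steps concrete, the following minimal numpy sketch illustrates the DFMF pipeline on a toy BTF matrix. It is an illustration rather than our actual implementation; the RGB-to-YUV weights (BT.601), the epsilon guard against division by zero, and the matrix shapes are assumptions made for the sake of the example.

    import numpy as np

    def compress_dfmf(btf_rgb, k_y=72, k_uv=8, eps=1e-6):
        """DFMF sketch: decorrelate YUV channels, then truncate an SVD.

        btf_rgb: array of shape (n_angles, n_texels, 3); rows index
        view-light configurations, columns index surface points.
        Returns per-channel factors (U_k, SV_k), with Sigma pre-multiplied
        into V^T as in the stored representation.
        """
        r, g, b = btf_rgb[..., 0], btf_rgb[..., 1], btf_rgb[..., 2]
        # RGB -> YUV (BT.601 weights assumed here)
        y = 0.299 * r + 0.587 * g + 0.114 * b
        u = 0.492 * (b - y)
        v = 0.877 * (r - y)
        # Decorrelation: log-luminance, chroma divided by luminance
        channels = {"Y": np.log(y + eps),
                    "U": u / (y + eps),
                    "V": v / (y + eps)}
        factors = {}
        for name, mat in channels.items():
            k = k_y if name == "Y" else k_uv
            U, s, Vt = np.linalg.svd(mat, full_matrices=False)
            # Keep only the k most important components
            factors[name] = (U[:, :k], s[:k, None] * Vt[:k, :])
        return factors

    def reconstruct(U_k, SV_k):
        # rho_BTF(x, w_l, w_v) ~ sum_c u_c(w_l, w_v) * sigma_c * v_c(x)
        return U_k @ SV_k

Since numpy returns the singular values in descending order, truncating to the $k$ most important components amounts to simply slicing the SVD factors.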
This algorithm can be applied to both conventional, texture-based BTFs and OctreeBTFs. The difference is that for OctreeBTFs, the compression is performed on the highest octree level $d_{max}$, and the compressed BTF data for the lower levels is computed in a post-processing step by averaging the BTF data over all child nodes in a top-down manner, starting with level $d_{max} - 1$ and proceeding down to $d_{min}$, as described by Krumpen et al. [25].
B BTF DATA LAYOUT
Even when compressed, transmitting a BTF to a client over the network still takes some time, which makes it worthwhile to split the BTF into smaller chunks of data and transmit these chunks independently to a remote client. This enables the client to start rendering the BTF at a lower quality as soon as the first chunks are received and to refine the rendering as more data chunks arrive. An approach for streaming BTFs over the internet has been presented by Schwartz et al. [48]; however, since the data layout of the spatial matrices $\tilde{\Sigma} \tilde{V}$ of the OctreeBTF differs from that of a conventional BTF, this approach cannot be applied to stream OctreeBTFs. The four-dimensional angular data from $\tilde{U}$ is stored as a nested parabolic map converted to an RGBA array texture for both BTF representations, and the layers of these textures can be streamed independently. The difference when streaming lies in the handling of the spatial data: For conventional, texture-based BTFs, the spatial data can be stored as an RGBA array texture for each color channel, where each layer of a texture stores the BTF components $c_i, \ldots, c_{i+3}$ for all surface points $\mathbf{x}$. The individual texture layers can then be streamed independently of each other, which means that the client can start rendering as soon as the first texture layer for all three color channels has been received. For OctreeBTFs, however, the data is stored per voxel, where one voxel corresponds to one point $\mathbf{x}$: first, all components $c_0, \ldots, c_{k-1}$ for the first voxel are stored, then all components for the second voxel, etc. This data layout leads to better cache efficiency on the GPU during rendering, as described by Krumpen et al. [25], but has the disadvantage that it is no longer possible (at least not without a significant, costly re-arrangement of the data) to stream one component for all voxels as one data chunk. Thus, we have to treat all components of all voxels for one octree level as one data chunk, and the client can only start rendering once the whole chunk (again for all color channels) has been received. Finally, rendering the OctreeBTF requires information about the octree itself as well as the normal and tangent vectors for each voxel, leading to an additional chunk of voxel data per octree level. We construct the OctreeBTF similar to Krumpen et al. [25], where the first octree levels are forced to be complete; thus, we have $l = d_{max} - d_{min}$ octree levels. Considering the number of texture layers for the angular data for all three color channels, we end up with
• $l$ chunks of voxel data, storing the octree structure,
• $l \times 3$ chunks of spatial data (one per octree level and color channel), and
• $k_Y/4 + 2 \times k_{UV}/4$ chunks of angular data (one per texture layer).
A sketch that makes these counts explicit follows this list.
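To illustrate this bookkeeping, the following sketch enumerates the chunks of one OctreeBTF under the layout described above; the function and its tuple-based chunk identifiers are hypothetical and only serve to make the chunk counts explicit.

    def enumerate_chunks(d_min, d_max, k_y=72, k_uv=8):
        """List the data chunks of one OctreeBTF under the layout above.

        Per octree level: one voxel-data chunk (octree structure, normals,
        tangents) and one spatial chunk per color channel. Angular data is
        level-independent: one RGBA texture layer packs four components.
        """
        levels = d_max - d_min  # l octree levels, as stated above
        chunks = []
        for lvl in range(levels):
            chunks.append(("voxel", lvl))
            for channel in ("Y", "U", "V"):
                chunks.append(("spatial", lvl, channel))
        for layer in range(k_y // 4):
            chunks.append(("angular", "Y", layer))
        for layer in range(k_uv // 4):
            chunks.append(("angular", "U", layer))
            chunks.append(("angular", "V", layer))
        return chunks

For the component counts used in this paper ($k_Y = 72$, $k_{UV} = 8$), this yields $l$ voxel chunks, $3l$ spatial chunks, and $18 + 2 \times 2 = 22$ angular texture layers.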
C OCTREEBTF STREAMING
For streaming, we need to determine an order in which the chunks are transmitted such that the client can start rendering as early as possible. First, the geometry is transmitted, followed by the voxel and spatial data for the octree level $d_{min}$ and the first four components of angular data (i.e., the first texture layer) for each color channel. The remaining angular texture layers are distributed evenly among the remaining octree levels. The biggest chunk is the last spatial chunk for the Y-channel, due to the large number of voxels with $k_Y$ components each. The complete algorithm is provided in Algorithm 1.
Algorithm 1 Build the load order for BTF data chunks
Add voxel data for level $j_0$
Add spatial data for level $j_0$ and channels Y, U, V
Add first angular layer for channels Y, U, V
layers_per_lvl_Y = max(layers_total_Y / $l$, 1)
layers_per_lvl_UV = max(layers_total_UV / $l$, 1)
layers_added_Y = 1
layers_added_UV = 1
for $i = 1, \ldots, l-1$ do
    Add voxel data for level $j_i$
    Add spatial data for level $j_i$ and channel Y
    for $g = 1, \ldots,$ layers_per_lvl_Y do
        if layers_added_Y < layers_total_Y then
            Add angular-layer[layers_added_Y] for channel Y
        end if
        layers_added_Y += 1
    end for
    Add spatial data for level $j_i$ and channel U
    Add spatial data for level $j_i$ and channel V
    for $g = 1, \ldots,$ layers_per_lvl_UV do
        if layers_added_UV < layers_total_UV then
            Add angular-layer[layers_added_UV] for channel U
            Add angular-layer[layers_added_UV] for channel V
        end if
        layers_added_UV += 1
    end for
end for
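For reference, a direct translation of Algorithm 1 into executable form could look as follows. This is a sketch under the same assumptions as the pseudocode, reusing the hypothetical chunk identifiers from the sketch in Appendix B; since the pseudocode's division is ambiguous about rounding, we ceil-divide the remaining layers over the remaining levels so that no layer is left unscheduled.

    import math

    def build_load_order(levels, layers_total_y, layers_total_uv):
        """Order BTF chunks so a client can start rendering early (Algorithm 1)."""
        # Level 0 (d_min) first: voxel data, all spatial channels,
        # and the first angular layer per channel.
        order = [("voxel", 0)]
        order += [("spatial", 0, ch) for ch in ("Y", "U", "V")]
        order += [("angular", ch, 0) for ch in ("Y", "U", "V")]
        # Distribute the remaining angular layers over the remaining levels
        # (ceiling division is an assumption; see the note above).
        rem_levels = max(levels - 1, 1)
        per_lvl_y = max(math.ceil((layers_total_y - 1) / rem_levels), 1)
        per_lvl_uv = max(math.ceil((layers_total_uv - 1) / rem_levels), 1)
        added_y, added_uv = 1, 1
        for lvl in range(1, levels):
            order.append(("voxel", lvl))
            order.append(("spatial", lvl, "Y"))
            for _ in range(per_lvl_y):
                if added_y < layers_total_y:
                    order.append(("angular", "Y", added_y))
                added_y += 1
            order.append(("spatial", lvl, "U"))
            order.append(("spatial", lvl, "V"))
            for _ in range(per_lvl_uv):
                if added_uv < layers_total_uv:
                    order.append(("angular", "U", added_uv))
                    order.append(("angular", "V", added_uv))
                added_uv += 1
        return order

With $l = 5$ levels and the layer counts from Appendix A ($72/4 = 18$ Y-layers, $8/4 = 2$ UV-layers), calling build_load_order(5, 18, 2) yields an order in which a client can already render a coarse, relightable preview after the first seven chunks.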