Perceptual Evaluation of Liquid Simulation Methods
KIWON UM,
Technical University of Munich
XIANGYU HU,
Technical University of Munich
NILS THUEREY,
Technical University of Munich
Fig. 1. We evaluate different simulation methods (left) with a user study consisting of pair-wise comparisons with a reference (middle). This allows us to robustly evaluate the different simulation methods (right).
This paper proposes a novel framework to evaluate fluid simulation methods based on crowd-sourced user studies in order to robustly gather large numbers of opinions. The key idea for a robust and reliable evaluation is to use a reference video from a carefully selected real-world setup in the user study. By conducting a series of controlled user studies and comparing their evaluation results, we observe various factors that affect the perceptual evaluation. Our data show that the availability of a reference video makes the evaluation consistent. We introduce this approach for computing scores of simulation methods as a visual accuracy metric. As an application of the proposed framework, a variety of popular simulation methods are evaluated.

CCS Concepts: • Computing methodologies → Physical simulation; Perception;

Additional Key Words and Phrases: perceptual evaluation, liquid simulation, fluid-implicit-particle, smoothed particle hydrodynamics, crowd-sourcing
ACM Reference format:
Kiwon Um, Xiangyu Hu, and Nils Thuerey. 2017. Perceptual Evaluation of Liquid Simulation Methods. ACM Trans. Graph. 36, 4, Article 1 (July 2017), 12 pages. DOI: http://dx.doi.org/10.1145/3072959.3073633
This work is supported by the ERC Starting Grant.

In science, we constantly evaluate the results of our experiments. While some aspects can be proven by mathematical measures such as the complexity class of an algorithm, we resort to measurements for many practical purposes. When measuring a simulation, the metrics for evaluation could be the computation time of a novel optimization scheme or the order of accuracy of a new boundary condition. These evaluation metrics are crucial for scientists to demonstrate advances but also useful for users to select the most suitable one among various methods for a given task.

This paper targets numerical simulations of liquids; in this area, most methods strive to compute solutions to the established physical model, i.e., the
Navier-Stokes (NS) equations, as accurately as possible. Thus, researchers often focus on demonstrating an improved order of convergence to show that a method leads to a more accurate solution (Batty et al. 2007; Enright et al. 2003; Kim et al. 2005). However, for computer graphics, the overarching goal is typically to generate believable images from the simulations. It is an open question how algorithmic improvements such as the contribution of a certain computational component map to the opinion of viewers seeing a video generated with this method.

There are several challenges here. Due to the complexity of our brain, we can be sure that there is a very complex relationship between the output of a numerical simulation and a human opinion. So far, there exist no computational models that can approximate or model this opinion. A second difficulty is that the transfer of information through our visual system is clearly influenced not only by the simulation itself but also by all factors that are involved with showing an image, such as materials chosen for rendering and the monitor setup of a user. Despite these challenges, the goal of this paper is to arrive at a reliable visual evaluation of fluid simulation methods. We will circumvent the former problem by directly gathering data from viewers with user studies, and we will design our user study setup to minimize the influence of image-level changes.

While there are interesting studies that investigate individual visual stimuli (Han and Keyser 2016) and the influence of different rendering methods for liquid simulations (Bojrab et al. 2013), our goal is to calculate the perceptual scores for fluid simulations on a high level from animations produced with different simulation
methods. We will demonstrate that a robust perceptual evaluation framework can be realized using crowd-sourced user studies that utilize carefully chosen simulation setups and a reference video. This will allow us to retrieve reliable visual accuracy scores of the different simulation methods evaluated in each study. In order to establish this framework, we ran an extensive series of user studies gathering more than 48,000 votes in total. The overview of our framework is illustrated in Figure 1.

In summary, we propose a novel perceptual evaluation framework for liquid simulations. To the best of our knowledge, the perceptual evaluation of physically-based liquid animations has previously not been studied, and we will use our framework to evaluate different simulation methods and parameterizations. From our evaluation results, we will draw useful observations for different simulation methods.

Fluid simulation methods typically compute solutions to the NS equations, which can be written as ∂u/∂t + u · ∇u = g − ∇p/ρ + ν∇²u, with the additional constraint to conserve volume: ∇ · u = 0, where u is the velocity, g is the gravity, p is the pressure, ρ is the density, and ν is the viscosity coefficient. Numerical solvers for these equations can be roughly categorized as Eulerian and Lagrangian methods. Fluid animations using Eulerian discretizations have been pioneered by Foster and Metaxas (1996), and the stable fluids solver (Stam 1999) has been widely used after its introduction. For liquids, the particle level set method has been demonstrated to yield accurate and smooth surface motions (Enright et al. 2002). Currently, the fluid-implicit-particle (FLIP) approach, which combines Eulerian incompressibility with a particle-based advection scheme to represent small-scale details and splashes, is widely used for visual effects (Zhu and Bridson 2005). The FLIP algorithm has been extended to many interesting applications such as artistic control (Pan et al. 2013) and adaptivity (Ando et al. 2013). In the following, we will focus on liquid simulations in simple domains without any adaptivity. We believe that this is a good starting point for our studies, but these extensions would of course be interesting for perceptual evaluations in the future.

The FLIP method was extended to incorporate position correction of the participating particles (Ando et al. 2012; Um et al. 2014) and to improve its efficiency by restricting particles to a narrow band around the surface (Ferstl et al. 2016). Secondary effects generation has been a highly popular topic within the fluid simulation area in order to increase the apparent detail of the simulation (Ihmsen et al. 2012). Many movies and interactive applications have incorporated hand-tuned parameters and heuristics to approximate where and how splashes, foam, and bubbles develop from an under-resolved simulation.
Moreover, a unilateral pressure solver was proposed to enable large-scale splashes in FLIP (Gerszewski and Bargteil 2013). Recently, several more FLIP variants were proposed to incorporate complex material effects that go beyond regular Newtonian fluids (Ram et al. 2015; Stomakhin et al. 2013). We will later use the closely related affine particle-in-cell (APIC) variant (Jiang et al. 2015) as one of our candidates for simulation methods.

Lagrangian fluid simulation techniques in graphics are typically based on variants of the smoothed particle hydrodynamics (SPH) approach. After its first use for deformable objects (Debunne et al. 1999), an SPH algorithm for liquids was introduced by Müller et al. (2003), and then weakly-compressible SPH (WCSPH) was introduced by Becker and Teschner (2007). The SPH algorithm was adopted and extended in a multitude of ways such as an adaptive discretization (Adams et al. 2007) and a predictor-corrector step that improves efficiency and stability (Solenthaler and Pajarola 2009). Techniques for two-way coupling between rigid bodies and liquids have likewise been proposed (Akinci et al. 2012).

A different formulation using the position-based dynamics viewpoint was proposed for real-time simulations (Macklin and Müller 2013) while other researchers suggested an implicit method for better convergence rates (Ihmsen et al. 2014a); this is known as implicit incompressible SPH (IISPH). From the Lagrangian field, we will restrict our visual accuracy study to a few selected methods: WCSPH and IISPH, which are typical and popular in graphics. Additionally, we also include an engineering SPH variant (Adami et al. 2012), from which we expect particularly accurate simulations; we denote this variant as SPH in our studies.

Naturally, researchers have been interested in combining aspects of the Lagrangian and Eulerian representations by bringing SPH and grid-based solving components together (Losasso et al. 2008; Raveendran et al. 2011).
We have not yet included these hybrid approaches in our studies, although FLIP arguably represents a hybrid particle-grid method. For a thorough overview of popular fluid simulation methods, refer to the book by Bridson (2015) and the state-of-the-art report by Ihmsen et al. (2014b).

The human visual system and perception of image and video contents have received significant attention in computer graphics in order to study how algorithmic choices influence the final judgment of the created images. For example, in the area of rendering techniques, Cater et al. (2002) proposed to use selective and perceptually driven rendering approaches, and Dumont et al. (2003) introduced a theoretical framework to compute perceptual metrics. In photography, Masia et al. (2009) perceptually evaluated different techniques for tone-mapping HDR images with user studies. For videos, an approach for perceptually-driven up-scaling of 3D content was proposed (Didyk et al. 2010) while others investigated a computational model for the perceptual evaluation of videos (Aydin et al. 2010).

Beyond rendering and video, perceptual studies have also been used in the field of character animation. Especially, human characters have received attention. For instance, McDonnell et al. (2008) studied how to populate natural crowds for virtual environments. More recently, researchers also gathered data on the attractiveness of virtual characters (Hoyet et al. 2013). In the area of deformable objects, Han and Keyser (2016) studied how visual details can influence the perceived stiffness of materials. Bojrab et al. (2013) studied how rendering styles of liquids influence user opinion. While this work also considers liquids, our goal is in a way orthogonal to theirs. We focus on simulation methods without being influenced by rendering styles.

Despite the fact that most liquid simulation methods are physically-based and thus capable of approximating the NS equations in the
Fig. 2. Two simulation setups (Botia-Vera et al. 2010; Kleefsman et al. 2005) for the evaluation of liquid simulations and example frames of real experiments: (a) breaking-dam (in meters, top and front views) and (b) sloshing-wave (in centimeters, front and top views).

limit, noticeable visual differences exist among animations created from the different methods. Being aware of these differences, we propose a novel approach that employs user studies to evaluate the different methods in terms of how closely they match real phenomena. The goal of our approach is to robustly and reliably compare different liquid simulations such that the evaluation reflects a general opinion. Therefore, we employ a crowd-sourcing platform in order to recruit many participants to retrieve a reliable evaluation.

We focus on the perceptual evaluation of simulations in terms of what we call visual accuracy. We define this visual accuracy to be a score computed from user study data to compare different methods, and we will make sure that it can be computed in a robust and unbiased way. To collect data, we let users select a preferred video from pair-wise comparisons, and we found it crucial for robustness to provide participants with a visual reference. As we will outline below, this also makes the results very stable with respect to strongly differing rendering styles. These comparisons with a reference video are also our motivation to see the scores we compute as a form of accuracy.

Liquid simulations are commonly used tools in visual effects and applied for a vast range of phenomena from drops of blood to large-scale ocean scenes. While it would be highly interesting to evaluate all of them, we focus on one particular regime of water-like liquids on human scales. This regime is highly challenging due to the low viscosity of water.
The resulting flows typically feature high Reynolds numbers, complex waves, and large amounts of droplets and splashes. Although this naturally limits the scope of our study, we believe that this regime is representative of many effects and thus worth studying. Next, we will present two carefully chosen simulation setups that will also form the basis of our user studies in Section 3.2.
When selecting simulation setups, our requirements are that the setups are easy to realize in numerical simulations; thus, they do not involve any specialized domain boundary conditions or any moving obstacles. Therefore, the setups should be easily reproducible. Nonetheless, the setups need to result in sufficiently complex dynamics such as overturning waves and splashes in order to be relevant for visual effects applications. Note that our setups stem from the engineering community. This has the additional benefit that detailed flow measurements are available as well as video data from real experiments. The latter is especially important for our user study later.

Our first setup is close to the popular breaking dam case often seen in graphics. Such a benchmark setup is also often used for validation in engineering studies, which adds an obstacle in front of the breaking dam for additional complexity (Kleefsman et al. 2005). This setup uses a tank of size 3.22m × 1m with an open roof and a static obstacle of size 0.16m. We refer to this setup as dam in the following, and the details of its initial conditions are illustrated in Figure 2(a).

Our second setup is a sloshing wave tank (Botia-Vera et al. 2010); this is illustrated in Figure 2(b). A rectangular tank partially filled with water experiences a periodic motion that continually injects energy into the system, leading to waves and splash effects forming over time. The size of the tank is 0.9m in length, and we refer to this setup as wave in the following. Additional documentation for both setups is available online (Issa et al. 2017).

For all simulations, we parameterize them according to the real-world dimensions given above using earth gravity as the only external force. Unless otherwise noted, we will not include any additional viscosity. In the following, we explain the user studies, which are based on one of these two setups.

The goal of the user studies is to reliably evaluate the visual accuracy across a set of n videos produced by different simulation methods. While many variants of user studies are imaginable (Leroy 2011), we opted for purely binary questions in order to reduce noise and inconsistencies in the answers. We also want to make the design as simple as possible to prevent misunderstandings. Thus, participants are shown two videos to consider in comparison to a reference video as illustrated in Figure 3. The videos are played repeatedly without a time limit, and the participants are given the task to select the one video which they consider to be closer to the reference video.

All participants have to give their vote for all possible pairs in a study. Thus, for n videos under consideration, we collect
Fig. 3. Our user study design: two videos A and B are shown side by side with the reference video, a progress indicator (e.g., 01/30), and the question "Which one is closer to the reference video?"

n(n−1)/2 votes from each participant; this number of pair-wise comparisons is kept small by limiting the number of videos per study. To compute a score for each of the n videos from these votes, we adopt the widely used Bradley-Terry model (Bradley and Terry 1952). We review the model briefly here. Its goal is to compute scores s_i such that we can define the probability p_ij that a participant chooses video i over video j as:

    p_ij = e^(s_i − s_j) / (1 + e^(s_i − s_j)).    (1)

Let w_ij denote the number of times where video i was preferred over video j in a user study. Assuming the observations are independent, w_ij follows a binomial distribution. Therefore, the log likelihood for all pairs among all videos can be calculated as follows:

    L(s) = Σ_i Σ_j ( w_ij s_i − w_ij ln(e^(s_i) + e^(s_j)) )    (2)

where s = [s_1, s_2, ..., s_n]. The final scores of all videos are computed by solving for the s that maximizes the likelihood function L in Equation (2) (Hunter 2004).

The vector of scores s is what we use to evaluate the visual accuracy in the following. Note that these scores do not yield any "absolute" distances to the reference, and they cannot be used to make comparisons across different studies. However, we found that they yield a reliable scoring and probability (see Equation (1)) for all videos participating in a single study.

In order to prevent bias with respect to the participants, we ran a series of studies in three different crowd-sourcing platforms and found that differences were negligible. Details for these studies can be found in Appendix A.

Fig. 4. Example frames of the opaque and transparent rendering styles: (a) opaque and (b) transparent.

Across our studies, we also noticed that the consistency checks did not significantly influence the results; thus, the large majority of participants was trustworthy.
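To make the scoring procedure concrete, the maximization of the likelihood in Equation (2) can be carried out with the minorization-maximization iteration of Hunter (2004). The following is a minimal, illustrative sketch; the function name and the synthetic win matrix are ours, not from the paper:

```python
import math

def bradley_terry_scores(w, iters=1000, tol=1e-12):
    """Compute Bradley-Terry log-scores s_i from a win matrix.

    w[i][j] is the number of times video i was preferred over video j.
    Iterates on gamma_i = exp(s_i), so that
    p_ij = gamma_i / (gamma_i + gamma_j) = e^(s_i - s_j) / (1 + e^(s_i - s_j)).
    """
    n = len(w)
    gamma = [1.0] * n
    wins = [sum(row) for row in w]          # total wins of each video
    for _ in range(iters):
        new = []
        for i in range(n):
            # sum over opponents: comparisons n_ij = w_ij + w_ji,
            # weighted by 1 / (gamma_i + gamma_j)
            denom = sum((w[i][j] + w[j][i]) / (gamma[i] + gamma[j])
                        for j in range(n) if j != i)
            new.append(wins[i] / denom)
        total = sum(new)
        new = [g / total for g in new]      # scores are only defined up to scale
        if max(abs(a - b) for a, b in zip(new, gamma)) < tol:
            gamma = new
            break
        gamma = new
    s = [math.log(g) for g in gamma]
    mean = sum(s) / n
    return [x - mean for x in s]            # zero-mean log-scores

# Hypothetical study with three videos; video 0 wins most comparisons.
scores = bradley_terry_scores([[0, 8, 9],
                               [2, 0, 7],
                               [1, 3, 0]])
```

As noted above, the resulting scores only rank the videos within one study; they carry no absolute distance to the reference and are not comparable across studies.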
In total, we collected user study data for 48,800 pair-wise comparisons from 557 participants in 65 countries.

Seeing the consistency of answers across different platforms, we believe that the user study design described above yields consistent answers. However, the existence of consistent scores by themselves does not yet mean that we can draw conclusions about the underlying simulation methods rather than about a certain style of visualization. In the next section, we will present a series of user studies to investigate whether we can specifically target simulation methods.

In order to show that there is a very high likelihood that our studies allow conclusions to be drawn about the simulation methods, we now turn to comparisons of studies. Thus, instead of considering individual visual accuracy scores s_i, we will consider multiple sets of score vectors s to be compared with each other. Once we have demonstrated that our user studies allow us to draw conclusions with high confidence, we will discuss individual scores for specific simulation-related questions in Section 4.

In the following, we will analyze pairs of studies for which we make only a single change. For example, one study will have rendering style A, and a second study will have rendering style B while keeping all other conditions identical. We then perform a correlation analysis for these studies. If the studies turn out to be correlated, we can draw conclusions about the influence of the change on the outcome.

For the correlation analysis, we compute the Pearson correlation coefficient and statistical significance (Pearson 1920), which are widely used in statistics as a measure of the linear correlation between two variables x, y ∈ R^n. This correlation coefficient r is the covariance of the two variables divided by the product of their standard deviations σ_x and σ_y, i.e., r = cov(x, y)/(σ_x σ_y). A strong positive correlation, i.e., very similar score distributions, will result in values close to +1, while uncorrelated or inverted scores, hence very different user opinions, will result in correlations of 0 or even negative correlations down to −1.
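This coefficient is simple enough to compute directly; the sketch below mirrors the definition above. The helper name and example vectors are illustrative, and in practice a routine such as scipy.stats.pearsonr also reports the p-value used for the significance levels in our tables:

```python
import math

def pearson_r(x, y):
    """Pearson correlation r = cov(x, y) / (sigma_x * sigma_y) between
    two score vectors, e.g. Bradley-Terry scores of the same videos
    evaluated in two different user studies."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Similarly ordered scores correlate positively; inverted ones negatively.
r_same = pearson_r([0.1, 0.5, 0.9, 1.3], [0.2, 0.6, 1.1, 1.5])
r_inv  = pearson_r([0.1, 0.5, 0.9, 1.3], [1.5, 1.1, 0.6, 0.2])
```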
Table 1. Correlation analysis for the sets of scores evaluated from different user studies using FLIP and SPH. Here, ref. denotes the reference video.
ID   Comparison (IDs in Table 7)         Constant parameters            r         p-value
C1   opaque (A) vs. transparent (B)      dam with ref.
C2   dam (A) vs. wave (C)                rendered in opaque with ref.
C3   opaque (A*) vs. transparent (B*)    dam w/o ref.
C4   dam (A*) vs. wave (C*)              rendered in opaque w/o ref.
C5   with ref. (A) vs. w/o ref. (A*)     dam rendered in opaque         0.64540   0.16632
C6   with ref. (B) vs. w/o ref. (B*)     dam rendered in transparent

(Bar chart in Table 1: correlation coefficients grouped into with ref. (C1, C2), w/o ref. (C3, C4), and with ref. vs. w/o ref. (C5, C6).)

The simulation configurations were chosen to broadly sample the space of typical resolutions and simulation methods. For the studies of this section, we are not particularly interested in the specific details of the simulation methods as long as they are representative for commonly used methods of graphics applications. With this goal in mind, we will use the popular Eulerian method FLIP (Zhu and Bridson 2005) and the Lagrangian method SPH (Adami et al. 2012) with three representative resolutions as shown in Table 2. Note that FLIP effectively is a hybrid Lagrangian-Eulerian method. However, we consider FLIP as Eulerian in our studies due to its Eulerian pressure solver, which is a key component of the algorithm. We put an emphasis on visual aspects with the studies described in the following section.

The space of possible visualization techniques for liquid animations is huge. Many freely available renderers exist to create realistic images. Real-time applications typically use specialized shaders for efficiency, and visual effects in movies employ very refined compositions of many layers to produce highly realistic visuals. Instead of trying to cover this whole space of possibilities, we focus on two extremes of the spectrum: a fully opaque rendering style and a perfectly transparent surface. While the former employs a simple diffuse material similar to a preview rendering, the transparent rendering style exhibits complex lighting effects, such as refraction, reflection, and caustics.
A consequence is that the surface is very clearly visible for the diffuse surface, in contrast to the transparent rendering. Still images for an example of these two rendering styles can be found in Figure 4.

Table 2. Six simulation configurations for the experiments of Tables 1 and 3. Here, S denotes the scaling factors of resolution, and M denotes the methods: Eulerian (Eu.) and Lagrangian (La.)

S    M    Resolutions for particle and grid
          dam                        wave
1x   Eu.  83k (80 × … × 25)          23k (75 × …)
2x   Eu.  664k (160 × … × 50)        186k (150 × …)
4x   Eu.  5,315k (320 × …)           …
1x   La.  84k (80 × … × 25)          24k (75 × …)
2x   La.  665k (160 × … × 50)        186k (150 × …)
3x   La.  2,253k (240 × … × 75)      634k (225 × …)

Comparisons of user studies:
Assuming that our design for user studies is reliable, we expect to see a strong correlation when comparing two studies with these different rendering styles despite the differences in appearance. This hypothesis is confirmed with a correlation coefficient of more than 0.97 with a high confidence level (p < 0.01). The details for this correlation calculation (C1) as well as the following ones can be found in Table 1, and the full studies under consideration are given in Table 7. Considering the significantly different images resulting from these two rendering styles, we believe that the strong correlation is an encouraging result.

When removing the reference video from the user study design (C3), i.e., only showing two videos of numerical simulations with the task to select the "preferred" version, the result changes drastically. Instead of a positive correlation, we now see nearly no correlation (a slightly negative r). Having used the dam setup for the study above, we now repeat this comparison keeping the rendering style constant (i.e., opaque) and comparing simulation setups (dam versus wave). When performing these studies with reference videos, we see a strong correlation of 0.97 (C2) with a high confidence level (p < 0.01), whereas the correlation slightly drops to 0.84 when the reference video is removed (C4). The absence of a reference video does not necessarily lead to inconsistent results for all cases; rather, there is an increased chance of ambiguity and substantially different responses.

From the first two pairs of comparisons, we draw the conclusion that the availability of a visual reference is crucial for a consistent evaluation of the liquid motion. Having a reference video even stabilizes results from strongly differing visualization styles as illustrated with the studies of C1. The reference video is also the reason why we believe our results do not contradict previous work that found significant influence of rendering styles on perception for animated water (Bojrab et al. 2013). Regarding liquid motions in the human-scale regime, our results indicate that the influence of rendering can be made negligible by providing a visual reference. Note that our reference does not need to closely match the rendering style used for the simulation videos.
The results are consistent even for significantly stylized and different rendering styles such as our opaque and transparent styles; both are very different from the reference video. The different correlation scores are summarized visually in the figure in Table 1. This figure again highlights that the low and even negative correlations are stabilized by the availability of a reference video.

Fig. 5. Example frames of our alternate reference video for dam.

To shed further light on this topic, we compute correlations between the studies with and without reference video. These correspondences can be found in C5 and C6 in Table 1. In both cases, the visual accuracy scores of the methods under consideration change significantly when the reference video is removed. This results in correlations that are not statistically significant (p > 0.05). Besides, the results with the transparent rendering style show a drastic change of user opinions. Thus, without a reference video, visual appearance can strongly influence the scores.

Reference videos:
The video we used as reference for the dam example has a visual appearance that is clearly different from our renderings. We note that visual accuracy can be evaluated even when the simulated phenomena bear only rough resemblance to the reference video. Figure 5 shows example frames of a reference video recorded in nature at a seashore. We use this video in an additional user study with the dam example instead of the one shown in Figure 2, and the resulting scores are highly correlated with the results of the original study with the video of the dam experiment. Here, the correlation is 0.93 with a high confidence level (p < 0.01).

On the other hand, when we use a video that differs more strongly, the user study results start to change. The correlation between a study using the wave video with the dam simulations and the original study is not statistically significant (p > 0.05). To summarize, our results show that a reliable visual accuracy can be established even if no reference to the exact simulation setup is available. The human visual system is powerful enough to correlate the visual inputs despite different appearance. However, the stability of the results drops when the physics differ substantially.
Representative methods:
At this point, we also want to confirm our assumption that the two initially chosen simulation methods are representative for commonly used Eulerian and Lagrangian methods. We choose two different methods from the Eulerian and Lagrangian classes: APIC (Jiang et al. 2015) and IISPH (Ihmsen et al. 2014a). With these two methods, we performed new user studies keeping the remainder of the user study and simulation setups constant; i.e., the simulations use the same resolutions of particles and grid as before (Table 2). The strong positive correlation for this pair of studies confirms our initial assumption (C7 in Table 3). Note that our two sets of simulation methods are also correlated in studies without a reference video (C8). Presumably, this indicates that the participants' tendency in preference among the two classes of methods is fairly consistent. In this case, the individual scores of each method change substantially between the FLIP&SPH and APIC&IISPH sets. Thus, this makes it difficult to draw a conclusion among the different methods of each class. However, the correlation between the two sets of methods confirms our assumption that these methods cover the space of Eulerian and Lagrangian classes well. In addition, we find that the availability of a reference video affects the stability also for these methods (C9). This is consistent with the aforementioned results, indicating again that the absence of a reference video results in a chance of ambiguity.

Fig. 6. Visual accuracy scores of the seven simulation methods under consideration (J and K in Table 7).

In this section, we use our approach to evaluate the visual accuracy of various simulation methods (Sections 4.1 and 4.2). We also demonstrate that our evaluation allows us to redeem heuristic approaches, such as the grid resolution for particle skinning (Section 4.3), or algorithmic modifications, such as a splash model for FLIP simulations (Section 4.4).

When establishing our evaluation framework, a central goal was to compare simulation methods. In the following, we evaluate seven simulation methods from the Eulerian and Lagrangian classes: marker-particles (MP) (Foster and Metaxas 1996), a solver with level set surface tracking (LS) (Foster and Fedkiw 2001), FLIP (Zhu and Bridson 2005), and APIC (Jiang et al. 2015) as representatives of Eulerian methods; WCSPH (Becker and Teschner 2007), IISPH (Ihmsen et al. 2014a), and a so-called wall-boundary SPH method (Adami et al. 2012) as representatives of Lagrangian ones. Note that this classification is primarily based on whether the method uses a grid in the pressure solver. Using these seven methods, we simulate our two simulation setups, i.e., dam and wave from Section 3.1.

The evaluation results are summarized in Figure 6. Interestingly, the Lagrangian methods (particularly, IISPH and SPH) consistently receive higher visual accuracy scores than the other methods. Among the Eulerian methods, the FLIP variants (i.e., APIC and FLIP) receive higher scores than MP and LS. Our guess for the latter results is that the MP and LS versions exhibit a very small amount of droplets.
Table 3. Additional correlation analysis for two sets of simulation methods. Here, the dam example is used with opaque style.
ID   Comparison (IDs in Table 7)          Constant parameters   r         p-value
C7   FLIP&SPH (A) vs. APIC&IISPH (D)      with ref.             0.96057   0.00230
C8   FLIP&SPH (A*) vs. APIC&IISPH (D*)    w/o ref.              0.96932   0.00140
C9   with ref. (D) vs. w/o ref. (D*)      APIC&IISPH            0.72139   0.10562

Note that the score of WCSPH is also noticeably low in the wave example; we observe that the amount of splashes is likewise very small, and the surface motion is highly viscous due to its artificial viscosity. Here, the level set method receives a higher score than WCSPH. We presume that this is caused by the artificial viscosity of the WCSPH solve, which often results in a stronger damping of its surface motion in comparison to the LS solve. Figure 7 shows several still frames of all the methods.

For the implementation of each method, we followed the original work without any significant modifications. The Eulerian methods (i.e., MP, LS, FLIP, and APIC) used grid resolutions of 160 × … × 50 for dam and 150 × … × 10 for wave. All methods except LS used 665K particles for dam and 186K particles for wave. Although the Lagrangian methods did not use any grid in their solve, we used the same underlying grid for initializing the particles and sampled each cell with eight particles. While the Lagrangian methods used a uniform sampling, the Eulerian methods randomly jittered the particles to avoid aliasing. Note that this resulted in slightly different numbers of particles.

This experiment focuses on the four methods that ranked highest in the previous evaluation and re-evaluates them with the constraint of a limited computational budget per frame. While the previous study kept resolution and particle count constant, we have adjusted them to yield comparable runtimes for this study. We simulated the dam example using APIC, FLIP, IISPH, and SPH such that they all required approximately 55 seconds per frame of animation. Here, we do not include the computational costs for non-simulation steps such as surface generation and rendering. We are aware that absolute comparisons of performance are difficult in general, but we have made our best efforts to treat all methods fairly and to bring all implementations up to a similar level of optimization (e.g., all implementations employ shared-memory parallelism with OpenMP for most of their steps).

The time restriction leads to a significant reduction in resolution for the SPH-based methods. Both FLIP and APIC use a 320 × … grid, while SPH uses 84k particles sampled from an 80 × … × 25 grid. Example frames for these simulation configurations are shown in Figure 8.

In contrast to the previous evaluation in Section 4.1, our participants gave the Eulerian methods higher visual accuracy scores. The results are shown in Figure 9. Thus, while the previous study suggests that Lagrangian methods capture large-scale splashes better at a given resolution, this study suggests that FLIP and APIC lead to improved results under a restriction in computation time.
Our evaluation approach is also useful to assess heuristic approaches, where parameters are typically chosen by intuition. One example is the grid resolution for generating a surface mesh from particle data, i.e., particle skinning. The commonly used heuristic is a resolution twice that of the simulation grid, but there has been little motivation for this particular setting.

As the base simulation for this experiment, we use FLIP with a 160 Γ Γ 50 grid and 664K particles. After the simulation, a signed distance field is computed from the particles (Zhu and Bridson 2005), which we triangulate with marching cubes. Since the particles are sampled on a 2Β³ sub-grid, the cell size of the base resolution (1x) is 2d, where d denotes the particle spacing. We perform the particle skinning at different resolutions with seven scaling factors relative to d: 0.5x, 0.75x, 1x, 1.5x, 2x, 3x, and 4x. In order to avoid missing particles in the grids that are more than d apart, the particle diameter is adjusted to the larger of either the grid spacing or the particle spacing. Example frames are shown in Figure 10.

As Figure 11 shows, the evaluation indicates that the heuristic of 2x (Zhu and Bridson 2005) is a good one. The higher resolutions do not yield results that can reliably be considered better than the 2x factor, which thus represents the best trade-off.

This section inspects a specific FLIP extension that claims to yield an increased amount of visual detail through secondary effects. It employs a neural-network approach to model the sub-grid-scale dynamics that lead to splashes (Um et al. 2017), and we denote it as MLFLIP in the following. A visual comparison of example frames from both FLIP and MLFLIP can be seen in Figure 12.

In order to see whether this splash model indeed results in better visual accuracy scores, we evaluate both FLIP and MLFLIP together with two additional methods for reference (i.e., MP and SPH). Figure 13 shows the resulting visual accuracy scores. For the dam setup, we observe that the MLFLIP approach yields a notable improvement in score, from 2.28 for regular FLIP to 4.18 for MLFLIP. The gain for the wave setup is lower, from 1.83 to 2.66, but we can still find a statistically significant improvement. These results indicate that splashes are an important visual cue for large-scale liquid phenomena.
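Visual accuracy scores such as these are obtained from pairwise comparisons. One standard way to turn paired preferences into per-method scores is the Bradley-Terry model (Bradley and Terry 1952), which can be fit with the MM algorithm of Hunter (2004); both works are cited in the references. The following is a minimal sketch under that assumption, not the authors' code; the function name and win counts are hypothetical:

```python
import numpy as np

def bradley_terry_scores(wins, iterations=200):
    """Fit Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i][j] = number of participants preferring video i over video j.
    Uses the MM update of Hunter (2004); returns log-strengths shifted
    so that the weakest video sits at zero.
    """
    wins = np.asarray(wins, dtype=float)
    n = wins.shape[0]
    games = wins + wins.T          # total comparisons per pair
    p = np.ones(n)                 # strength parameters, all positive
    for _ in range(iterations):
        for i in range(n):
            denom = sum(games[i, j] / (p[i] + p[j])
                        for j in range(n) if j != i)
            p[i] = wins[i].sum() / denom
        p /= p.sum()               # fix the scale ambiguity
    scores = np.log(p)
    return scores - scores.min()

# Hypothetical tallies for three videos, ten paired queries per pair:
wins = [[0, 8, 9],
        [2, 0, 7],
        [1, 3, 0]]
scores = bradley_terry_scores(wins)
```

A video that is preferred more often receives a higher score, so score gaps like the 2.28 versus 4.18 above summarize how decisively participants favored one method over another.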
ACM Transactions on Graphics, Vol. 36, No. 4, Article 1. Publication date: July 2017.
Fig. 7. Example frames of seven simulations in two examples: (top) dam and (bottom) wave. From left to right: MP, LS, FLIP, APIC, WCSPH, IISPH, and SPH.

Fig. 8. Example frames of four simulations with a similar computation time: (a) FLIP, (b) APIC, (c) IISPH, (d) SPH.

Fig. 9. Visual accuracy scores of the four videos simulated in a similar computation time (L in Table 7).
As our core method of evaluation, we propose to use measurements of visual accuracy scores from user studies with a reference video. However, seeing the strong variability in the previous results, especially for the transparent rendering style, we believe it is important to discuss additional studies that we conducted to investigate the influence of rendering on the scores of simulation methods when no reference video is available. We found this area to be highly complex; thus, the following results are far from a complete mapping of the rendering space.

In a first series of studies, we investigate the behavior of the transition between the opaque and transparent rendering styles. We generated a sequence of three in-between versions by linearly blending the two styles in image space, as shown in Figure 14, and performed user studies. Interestingly, the correlations within this series of studies change smoothly, albeit not linearly, when moving from opaque towards transparent. The data are shown in Figure 15. Due to the strong difference in the initial results (C from Table 1), we found it surprising that the space between these two extremes behaves smoothly.

Table 4. Correlation analysis for the additional rendering styles.

Comparison (IDs in Table 7)             r         p-value
Opaque (A*) vs. Glossy (H*)             0.94329   0.00473
Opaque (A*) vs. Translucent (I*)        0.93170   0.00684
Transparent (B*) vs. Glossy (H*)        0.55867   0.24918
Transparent (B*) vs. Translucent (I*)   0.59764   0.21027

We also performed user studies with the same setup using two additional rendering styles, which we selected to be different from both the opaque and transparent styles. The first additional style is a dark-green glossy surface, while the second is a translucent volume with attenuation effects. These two rendering styles are shown in Figure 16. As Table 4 shows, the correlation coefficients of these two styles with respect to our two initial styles indicate that both the glossy and translucent styles are strongly correlated with the opaque one. Note that all studies discussed in this section were performed without the reference video. The results indicate that the opaque style covers a broader range of other rendering styles, showing strong correlations even when no reference video is given. Presumably, the transparent rendering style with its complex light effects triggers a very different "mental image" for the participants when no reference video is given. This leads to a substantially different evaluation of the videos with transparent rendering. However, note that all studies in Section 4 were conducted with the opaque rendering style and a reference, since our goal is to reliably assess different methods.
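Correlation coefficients and p-values like those reported here can be computed with a standard Pearson test. A short sketch with illustrative, made-up score vectors (not the study's data):

```python
from scipy.stats import pearsonr

# Hypothetical visual accuracy scores of seven methods under two styles:
opaque_scores = [2.9, 2.1, 1.4, 3.2, 0.8, 2.6, 1.1]
glossy_scores = [2.7, 2.3, 1.2, 3.0, 0.9, 2.4, 1.3]

# pearsonr returns the correlation coefficient and a two-sided p-value
# testing the null hypothesis of no correlation.
r, p = pearsonr(opaque_scores, glossy_scores)
print(f"r = {r:.5f}, p = {p:.5f}")
```

A high r together with a small p, as in the Opaque vs. Glossy row, means the two studies rank the methods consistently; a large p, as in the Transparent rows, means the apparent correlation could be coincidental.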
We have presented the first framework to perceptually evaluate liquid simulation methods by employing crowd-sourced user studies. By analyzing the evaluation results from controlled studies, we have demonstrated that our framework can reliably measure user opinions in the form of a visual accuracy score. Our key finding here
Fig. 10. Example frames of seven resolutions for particle skinning. From left to right: 0.5x, 0.75x, 1x, 1.5x, 2x, 3x, and 4x.

Fig. 11. Visual accuracy scores of the seven resolutions for particle skinning (M in Table 7).

is that the availability of a reference video makes stable evaluations possible. Most importantly, the scores are then not influenced by a particular choice of rendering method.

The findings from our studies have led to several insights. For our chosen settings, the studies suggest that
β’ viewers prefer SPH-based methods when comparable particle counts are used,
β’ FLIP and especially APIC are preferred when the computational resources are limited,
β’ the commonly used factor of two for particle skinning is confirmed by our experiment,
β’ and splash effects are an important visual component for large-scale liquids.

As the perception of physical phenomena such as liquids is highly complex, our work clearly represents only a first step. We have not investigated the demographics of our participants in more detail.

Fig. 12. Visual comparison of MLFLIP with FLIP in two examples: (top) dam and (bottom) wave.

Moreover, we currently focus on a specific regime of liquid flows, and it is not clear how applicable our results are to other regimes. Likewise, we have only tested a small selection of simulation methods in our studies. There are many interesting variants that could be evaluated in addition to our current selection. In the future, we are also highly interested in extending our studies to smoke flows and other types of materials, such as objects undergoing elasto-plastic deformations. As we have proposed a first perceptual evaluation framework for liquid simulation methods, we believe these directions are very interesting avenues for future work.
ACKNOWLEDGMENTS
We would like to thank all members of the graphics labs of TUM and IST Austria for the thorough discussions, and the SPHERIC community for providing the experimental videos.
REFERENCES
S. Adami, X. Y. Hu, and N. A. Adams. 2012. A generalized wall boundary condition for smoothed particle hydrodynamics. J. Comput. Phys. DOI: https://doi.org/10.1016/j.jcp.2012.05.005
Bart Adams, Mark Pauly, Richard Keiser, and Leonidas J. Guibas. 2007. Adaptively Sampled Particle Fluids. ACM Trans. Graph. 26, 3, Article 48 (July 2007), 7 pages. DOI: https://doi.org/10.1145/1276377.1276437
Nadir Akinci, Markus Ihmsen, Gizem Akinci, Barbara Solenthaler, and Matthias Teschner. 2012. Versatile Rigid-Fluid Coupling for Incompressible SPH. ACM Trans. Graph. 31, 4 (July 2012), 62:1–62:8. DOI: https://doi.org/10.1145/2185520.2185558
Ryoichi Ando, Nils Thuerey, and Reiji Tsuruno. 2012. Preserving Fluid Sheets with Adaptively Sampled Anisotropic Particles. IEEE Transactions on Visualization and Computer Graphics 18, 8 (2012), 1202–1214. DOI: https://doi.org/10.1109/TVCG.2012.87
Ryoichi Ando, Nils Thuerey, and Chris Wojtan. 2013. Highly Adaptive Liquid Simulations on Tetrahedral Meshes. ACM Trans. Graph. 32, 4 (July 2013), 103:1–103:10. DOI: https://doi.org/10.1145/2461912.2461982

Fig. 13. Notable improvements of MLFLIP in visual accuracy in two examples (N and O in Table 7).
Fig. 14. Examples from our series of rendering styles transitioning from opaque to transparent: (a) transparent, (b) blended (opaque: 0.25), (c) blended (opaque: 0.5), (d) blended (opaque: 0.75), (e) opaque.
Fig. 15. Correlation among the five sets of overall scores evaluated from the user studies with different rendering styles. The pairwise coefficients shown in the figure are:

                 Transparent   Blended(0.25)   Blended(0.5)   Blended(0.75)   Opaque
Opaque             -0.01308        0.27919        0.46189         0.74095     1.00000
Blended(0.75)       0.61722        0.84580        0.92656         1.00000     0.74095
Blended(0.5)        0.77268        0.97564        1.00000         0.92656     0.46189
Blended(0.25)       0.87906        1.00000        0.97564         0.84580     0.27919
Transparent         1.00000        0.87906        0.77268         0.61722    -0.01308
Tunç Ozan Aydin, Martin Čadík, Karol Myszkowski, and Hans-Peter Seidel. 2010. Video Quality Assessment for Computer Graphics Applications. ACM Trans. Graph. 29, 6 (Dec. 2010), 161:1–161:12. DOI: https://doi.org/10.1145/1866158.1866187
Christopher Batty, Florence Bertails, and Robert Bridson. 2007. A Fast Variational Framework for Accurate Solid-Fluid Coupling. ACM Trans. Graph. 26, 3, Article 100 (July 2007), 7 pages. DOI: https://doi.org/10.1145/1276377.1276502
Markus Becker and Matthias Teschner. 2007. Weakly Compressible SPH for Free Surface Flows. In Proceedings of the 2007 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA '07). Eurographics Association, 209–217. http://dl.acm.org/citation.cfm?id=1272690.1272719
Micah Bojrab, Michel Abdul-Massih, and Bedrich Benes. 2013. Perceptual Importance of Lighting Phenomena in Rendering of Animated Water. ACM Trans. Appl. Percept. 10, 1 (March 2013), 2:1–2:18. DOI: https://doi.org/10.1145/2422105.2422107
Elkin Botia-Vera, Antonio Souto-Iglesias, Gabriele Bulian, and L. Lobovský. 2010. Three SPH Novel Benchmark Test Cases for Free Surface Flows. In Proceedings of the 5th ERCOFTAC SPHERIC Workshop on SPH Applications. Manchester, UK.
Ralph Allan Bradley and Milton E. Terry. 1952. Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika 39, 3/4 (1952), 324–345. DOI: https://doi.org/10.2307/2334029
Robert Bridson. 2015. Fluid Simulation for Computer Graphics. CRC Press.

Fig. 16. Example frames of the two additional rendering styles: (a) glossy, (b) translucent.

Kirsten Cater, Alan Chalmers, and Patrick Ledda. 2002. Selective Quality Rendering by Exploiting Human Inattentional Blindness: Looking but Not Seeing. In Proceedings of the ACM Symposium on Virtual Reality Software and Technology (VRST '02). ACM, New York, NY, USA, 17–24. DOI: https://doi.org/10.1145/585740.585744
Forrester Cole, Kevin Sanik, Doug DeCarlo, Adam Finkelstein, Thomas Funkhouser, Szymon Rusinkiewicz, and Manish Singh. 2009. How Well Do Line Drawings Depict Shape? ACM Trans. Graph. 28, 3, Article 28 (July 2009), 9 pages. DOI: https://doi.org/10.1145/1531326.1531334
Gilles Debunne, Mathieu Desbrun, Alan Barr, and Marie-Paule Cani. 1999. Interactive Multiresolution Animation of Deformable Models. In Computer Animation and Simulation '99. Springer, 133–144.
Piotr Didyk, Elmar Eisemann, Tobias Ritschel, Karol Myszkowski, and Hans-Peter Seidel. 2010. Perceptually-Motivated Real-Time Temporal Upsampling of 3D Content for High-Refresh-Rate Displays. Computer Graphics Forum 29, 2 (2010), 713–722. DOI: https://doi.org/10.1111/j.1467-8659.2009.01641.x
Reynald Dumont, Fabio Pellacini, and James A. Ferwerda. 2003. Perceptually-Driven Decision Theory for Interactive Realistic Rendering. ACM Trans. Graph. 22, 2 (April 2003), 152–181. DOI: https://doi.org/10.1145/636886.636888
Douglas Enright, Ronald Fedkiw, Joel Ferziger, and Ian Mitchell. 2002. A Hybrid Particle Level Set Method for Improved Interface Capturing. J. Comput. Phys. DOI: https://doi.org/10.1006/jcph.2002.7166
Doug Enright, Duc Nguyen, Frederic Gibou, and Ron Fedkiw. 2003. Using the Particle Level Set Method and a Second Order Accurate Pressure Boundary Condition for Free Surface Flows. In Proceedings of the 4th ASME-JSME Joint Fluids Summer Engineering Conference, Vol. 2. 337–342. DOI: https://doi.org/10.1115/FEDSM2003-45144
Florian Ferstl, Ryoichi Ando, Chris Wojtan, Rüdiger Westermann, and Nils Thuerey. 2016. Narrow Band FLIP for Liquid Simulations. Computer Graphics Forum 35, 2 (2016), 225–232.
Nick Foster and Ronald Fedkiw. 2001. Practical Animation of Liquids. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '01). ACM, New York, NY, USA, 23–30. DOI: https://doi.org/10.1145/383259.383261
Nick Foster and Dimitri Metaxas. 1996. Realistic Animation of Liquids. Graphical Models and Image Processing 58, 5 (Sept. 1996), 471–483. DOI: https://doi.org/10.1006/gmip.1996.0039
Dan Gerszewski and Adam W. Bargteil. 2013. Physics-Based Animation of Large-Scale Splashing Liquids. ACM Trans. Graph. 32, 6 (Nov. 2013), 185:1–185:6. DOI: https://doi.org/10.1145/2508363.2508430
D. Han and J. Keyser. 2016. Effect of Low-Level Visual Details in Perception of Deformation. Computer Graphics Forum 35, 2 (May 2016), 375–383. DOI: https://doi.org/10.1111/cgf.12839
Ludovic Hoyet, Kenneth Ryall, Katja Zibrek, Hwangpil Park, Jehee Lee, Jessica Hodgins, and Carol O'Sullivan. 2013. Evaluating the Distinctiveness and Attractiveness of Human Motions on Realistic Virtual Bodies. ACM Trans. Graph. 32, 6 (Nov. 2013), 204:1–204:11. DOI: https://doi.org/10.1145/2508363.2508367
David R. Hunter. 2004. MM Algorithms for Generalized Bradley-Terry Models. The Annals of Statistics 32, 1 (Feb. 2004), 384–406. DOI: https://doi.org/10.1214/aos/1079120141
Markus Ihmsen, Nadir Akinci, Gizem Akinci, and Matthias Teschner. 2012. Unified Spray, Foam and Air Bubbles for Particle-Based Fluids. The Visual Computer 28, 6-8 (2012), 669–677.
Markus Ihmsen, Jens Cornelis, Barbara Solenthaler, Christopher Horvath, and Matthias Teschner. 2014a. Implicit Incompressible SPH. IEEE Transactions on Visualization and Computer Graphics 20, 3 (March 2014), 426–435. DOI: https://doi.org/10.1109/TVCG.2013.105
Markus Ihmsen, Jens Orthmann, Barbara Solenthaler, Andreas Kolb, and Matthias Teschner. 2014b. SPH Fluids in Computer Graphics. In Eurographics 2014 - State of the Art Reports. Eurographics Association, Strasbourg, France, 21–42. DOI: https://doi.org/10.2312/egst.20141034
R. Issa, D. Violeau, Antonio Souto-Iglesias, and Elkin Botia-Vera. 2017. SPHERIC Validation Tests. http://spheric-sph.org/validation-tests. (2017).
Chenfanfu Jiang, Craig Schroeder, Andrew Selle, Joseph Teran, and Alexey Stomakhin. 2015. The Affine Particle-in-Cell Method. ACM Trans. Graph. 34, 4 (July 2015), 51:1–51:10. DOI: https://doi.org/10.1145/2766996
ByungMoon Kim, Yingjie Liu, Ignacio Llamas, and Jarek Rossignac. 2005. FlowFixer: Using BFECC for Fluid Simulation. In Eurographics Conference on Natural Phenomena. Eurographics Association, Dublin, Ireland, 51–56. DOI: https://doi.org/10.2312/NPH/NPH05/051-056
K. M. T. Kleefsman, G. Fekken, A. E. P. Veldman, B. Iwanowski, and B. Buchner. 2005. A Volume-of-Fluid Based Simulation Method for Wave Impact Problems. J. Comput. Phys. DOI: https://doi.org/10.1016/j.jcp.2004.12.007
Gondy Leroy. 2011. Designing User Studies in Informatics. Springer London. DOI: https://doi.org/10.1007/978-0-85729-622-1
F. Losasso, J. O. Talton, N. Kwatra, and R. Fedkiw. 2008. Two-Way Coupled SPH and Particle Level Set Fluid Simulation. IEEE Transactions on Visualization and Computer Graphics 14, 4 (2008), 797–804. DOI: https://doi.org/10.1109/TVCG.2008.37
Miles Macklin and Matthias Müller. 2013. Position Based Fluids. ACM Trans. Graph. DOI: https://doi.org/10.1145/2461912.2461984
Belen Masia, Sandra Agustin, Roland W. Fleming, Olga Sorkine, and Diego Gutierrez. 2009. Evaluation of Reverse Tone Mapping Through Varying Exposure Conditions. ACM Trans. Graph. 28, 5, Article 160 (Dec. 2009), 8 pages. DOI: https://doi.org/10.1145/1618452.1618506
Rachel McDonnell, Michéal Larkin, Simon Dobbyn, Steven Collins, and Carol O'Sullivan. 2008. Clone Attack! Perception of Crowd Variety. ACM Trans. Graph. 27, 3, Article 26 (Aug. 2008), 8 pages. DOI: https://doi.org/10.1145/1360612.1360625
Matthias Müller, David Charypar, and Markus Gross. 2003. Particle-Based Fluid Simulation for Interactive Applications. In Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA '03). Eurographics Association, 154–159.
Zherong Pan, Jin Huang, Yiying Tong, Changxi Zheng, and Hujun Bao. 2013. Interactive Localized Liquid Motion Editing. ACM Trans. Graph. 32, 6 (Nov. 2013), 184:1–184:10. DOI: https://doi.org/10.1145/2508363.2508429
Karl Pearson. 1920. Notes on the History of Correlation. Biometrika 13, 1 (Jan. 1920), 25–45. DOI: https://doi.org/10.1093/biomet/13.1.25
Daniel Ram, Theodore Gast, Chenfanfu Jiang, Craig Schroeder, Alexey Stomakhin, Joseph Teran, and Pirouz Kavehpour. 2015. A Material Point Method for Viscoelastic Fluids, Foams and Sponges. In Proceedings of the 2015 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA '15). ACM, New York, NY, USA, 157–163. DOI: https://doi.org/10.1145/2786784.2786798
Karthik Raveendran, Chris Wojtan, and Greg Turk. 2011. Hybrid Smoothed Particle Hydrodynamics. In Proceedings of the 2011 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA '11). ACM, New York, NY, USA, 33–42. DOI: https://doi.org/10.1145/2019406.2019411
B. Solenthaler and R. Pajarola. 2009. Predictive-Corrective Incompressible SPH. ACM Trans. Graph. 28, 3, Article 40 (July 2009), 6 pages. DOI: https://doi.org/10.1145/1531326.1531346
Jos Stam. 1999. Stable Fluids. In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '99). ACM Press/Addison-Wesley, New York, NY, USA, 121–128. DOI: https://doi.org/10.1145/311535.311548
Alexey Stomakhin, Craig Schroeder, Lawrence Chai, Joseph Teran, and Andrew Selle. 2013. A Material Point Method for Snow Simulation. ACM Trans. Graph. 32, 4 (July 2013), 102:1–102:10. DOI: https://doi.org/10.1145/2461912.2461948
Kiwon Um, Seungho Baek, and JungHyun Han. 2014. Advanced Hybrid Particle-Grid Method with Sub-Grid Particle Correction. Computer Graphics Forum 33, 7 (Oct. 2014), 209–218. DOI: https://doi.org/10.1111/cgf.12489
Kiwon Um, Xiangyu Hu, and Nils Thuerey. 2017. Liquid Splash Modeling with Neural Networks. (2017). arXiv:1704.04456
Yongning Zhu and Robert Bridson. 2005. Animating Sand As a Fluid. ACM Trans. Graph. 24, 3 (July 2005), 965–972. DOI: https://doi.org/10.1145/1073204.1073298
A CROWD-SOURCING PLATFORMS
There exist several crowd-sourcing services that provide a web-based platform where a requester can launch user studies through a web interface. This section compares three popular platforms: Amazon Mechanical Turk (MT), CrowdFlower (CF), and Microworkers (MW). In order to investigate the consistency of the three platforms, we use our study setup for dam from Section 3.3 with six different versions. In addition, we included a seventh dummy video, which was synthesized by interleaving the six videos at one-second intervals; we did not include reverse questions in these three studies.

Table 5 and Figure 17 show the evaluation results from the user study run on all three platforms, and Table 6 shows the resulting correlation coefficients. As all p-values fall below the significance threshold, there is significant evidence with 99% confidence to conclude that the user studies obtained on the different platforms match. Thus, when considering only the results of a single study, all three platforms yield very similar results.

Fig. 17. Graph of the seven scores evaluated from the three platforms.

However, there are noticeable differences in the cost of each study. All platforms allow the requester to set a cost per query and the required number of participants. An additional service fee is typically charged on top of this. For our user study, we selected 50 participants for 21 queries and a per-query payment of 0.01 USD, which resulted in costs of 21.00 USD for MT, 12.60 USD for CF, and 23.10 USD for MW. In addition, there were significant differences in execution speed. With these settings, the MT platform took several weeks to complete the study, while the other two platforms yielded results in less than three days. Due to additional limitations with respect to the maximal number of queries on the CF platform, we chose the MW platform for all our studies.
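The reported charges can be sanity-checked against the study parameters. The base payout follows directly from them; the per-platform service overhead computed below is inferred from the totals, not taken from documented fee schedules:

```python
# Study parameters from this appendix.
participants = 50
queries = 21
pay_per_query = 0.01   # USD per query and participant

base = participants * queries * pay_per_query   # payout before service fees
print(f"base payout: {base:.2f} USD")

# Reported totals per platform; the overhead is the fee on top of the base.
for platform, total in {"MT": 21.00, "CF": 12.60, "MW": 23.10}.items():
    overhead = total / base - 1.0
    print(f"{platform}: {total:.2f} USD (~{overhead:.0%} service overhead)")
```

This makes the cost difference concrete: CF adds a markedly smaller fee on top of the same participant payout than MT or MW.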
Table 5. Three sets of scores evaluated from three platforms.
Score (standard error)
ID   CF   MW   MT

Table 6. Pearson's correlations for the three platforms.

CF, MW   MW, MT   MT, CF
Table 7. The visual accuracy scores (and standard errors).