Fast 3D Indoor Scene Synthesis with Discrete and Exact Layout Pattern Extraction
Song-Hai Zhang, Shao-Kui Zhang, Wei-Yu Xie, Cheng-Yang Luo, Hong-Bo Fu
Song-Hai Zhang, Tsinghua University, [email protected]
Shao-Kui Zhang, Tsinghua University, [email protected]
Wei-Yu Xie, Beijing Institute of Technology, [email protected]
Cheng-Yang Luo, Tsinghua University, [email protected]
Hong-Bo Fu, City University of Hong Kong, [email protected]
Abstract
We present a fast framework for indoor scene synthesis, given a room geometry and a list of objects with learnt priors. Unlike existing data-driven solutions, which often extract priors by co-occurrence analysis and statistical model fitting, our method measures the strengths of spatial relations by tests for complete spatial randomness (CSR), and extracts complex priors based on samples with the ability to accurately represent discrete layout patterns. With the extracted priors, our method achieves both acceleration and plausibility by partitioning input objects into disjoint groups, followed by layout optimization based on the Hausdorff metric. Extensive experiments show that our framework is capable of measuring more reasonable relations among objects and simultaneously generating varied arrangements in seconds.
1. Introduction
3D indoor scene synthesis has been thriving in recent years. As demonstrated in [48, 16, 28], automatically synthesizing plausible rooms benefits various applications. With the emergence of various datasets for 3D indoor scenes [38, 23, 49], techniques have shifted toward data-driven approaches, i.e., modeling priors that express the layout strategies of furniture objects. However, inherent difficulties of 3D indoor scene synthesis still exist in various aspects.

First, it is inevitable to deal with furniture layouts parameterized continuously or discretely, which are distributed in complex high-dimensional spaces [24]. A few works (e.g., [15, 32, 13, 30]) attempt to simplify layouts into independent cliques or subsets, e.g., [13, 32]. However, their underlying metric depends on "co-occurrence", which merely counts co-existing frequencies instead of incorporating spatial knowledge. For an example in Figure 2, a high frequency of co-occurrence does not necessarily signify a strong spatial relationship. In other words, scene synthesis purely based on co-occurrence could generate weird outcomes.

Figure 1: Given a list of objects (Left), we decompose them into disjoint groups (Top-Middle) with coherence for each individual group and freedom among groups. By incorporating discrete templates as priors to guide syntheses, our method generates various plausible layouts in seconds.

Second, due to innumerable strategies of arrangement, it is hard to exhaustively list all possible spatial relations among objects [5, 6, 31, 45, 22] or to mathematically formulate unified and accurate models for them [13, 44, 40, 41]. For example, Chang et al. [5] dictate a specific set of possible relations such as "support", "right", "front", etc., which fundamentally limits the variety of possibly synthesized scenes. To model relations with multiple patterns, a common approach is to fit observed layouts with models.
However, "fitting models" could potentially introduce noise and be influenced by noise, especially when the underlying patterns do not satisfy the assumptions of the models, e.g., a commonly used Gaussian mixture model (GMM). Figure 3 shows a failure case of sampling a relative position and an orientation from a GMM of a chair w.r.t. a table. We argue that when the observed data is of sufficient size, the correct cases inside the observations or samples used for fitting already offer exact layout strategies with varieties.

Figure 2: Illustrating the problems of co-occurrence. With similar co-occurrence, the relative positions of two pairs of objects in a bedroom (a) are shown. In (b), the double bed and the stand are obviously spatially related, while in (c) the stand and the chair are distributed randomly.

To address the above difficulties, in this paper, we propose a method to measure the strength of spatial relations between objects by utilizing tests for complete spatial randomness (CSR) [9]. A test for CSR (Section 4) describes how likely a set of events is generated w.r.t. a homogeneous Poisson process. Intuitively, it measures how obviously certain patterns exist in a set of points. Therefore, objects with high measurements tend to be grouped and arranged together. Objects that fail to pass tests for CSR are ignored, even if they have high co-occurrence.

Furthermore, we present an approach for extracting discrete representations of various shapes of layout strategies, incorporating density peak clustering [34]. Finally, we present a framework for automatically synthesizing various arrangements of given objects w.r.t. an input room geometry, by partitioning input objects into disjoint groups according to the extracted priors, followed by optimization based on the Hausdorff metric to cope with discrete priors. The entire process can be done in seconds.

Figure 3: Inherent problems of fitting models. (a) Fitting the raw data with a GMM suffers from noise. (b) A GMM fitting our de-noised data gives a blurred pattern. (c) Ours successfully places the chair at the right place in a correct orientation.

In sum, our work makes the following contributions:
1. We first incorporate tests for complete spatial randomness to measure the strength of spatial relations between objects, which is a more powerful measurement than "co-occurrence", allowing us to decompose a given list of objects into several disjoint groups. Therefore, unnecessary calculations are reduced whilst the plausibility is increased for indoor scene modeling.
2. We propose to use discrete and exact distributions to represent layout patterns of arbitrary shapes of objects for indoor scene configurations.
3. We introduce a fast indoor scene synthesis framework, which is able to generate diverse arrangements in seconds.
2. Related Works
3D Indoor Scene Synthesis aims at generating appropriate layouts of furniture objects for rooms. Various solutions considering different input settings and tasks have been proposed. For example, [3, 7, 14, 37] generate room layouts based on RGB-D images or 3D scans. Human language [6, 5, 30] and hand-drawn sketches [44] have also been explored as additional inputs to guide scene synthesis. [41, 33, 29] iteratively infer the next objects for rooms. A full review of existing works on indoor scene synthesis is beyond the scope of this paper; please refer to an insightful survey in [48].

As discussed in Section 1, representations of layout strategies play an important role in 3D indoor scene synthesis. To encode prior knowledge, [45, 31, 42] attempt to quantify interior design rules. The emerging availability of 3D indoor scene datasets enables various data-driven approaches. For example, [40, 6] model spatial relations between objects using semantics such as "left", "right", "front", etc. Gaussian mixture models (GMMs) are adopted by [13, 44, 19] to fit observed distributions of objects. Graph structures are constructed by [32, 15]. As surveyed in [48], [46, 25] model contexts for objects, e.g., average orientations and distances between objects, orientations w.r.t. the nearest walls, etc. However, despite the variety of representations, the underlying metrics are still confined to co-occurrence, model fitting, or even intuitive semantics, e.g., probabilities of edges are calculated from co-existing frequencies [13, 32].

Our task partially resembles [46] and [42], but takes an automatic approach to extract constraints from existing layouts. We are also inspired by the works of [13] and [43]. However, the former requires exemplar scenes as input, while the latter focuses on re-arrangement of existing scenes. In contrast, we aim to learn general patterns for pairs of objects from existing layout examples for scene synthesis.
Tests for Complete Spatial Randomness (CSR) are a classical topic [11]. Given a series of points distributed on a plane, a test for CSR is typically used to answer how likely the points are placed randomly. Formally, it describes how likely a set of events is generated w.r.t. a homogeneous Poisson process (planar Poisson process). Previously, most applications of CSR have been confined to ecology [17], e.g., to investigate whether or not a set of observed plants is located with patterns. Rosin [35] is probably the first to bring the concept of CSR into computer vision, to handle the problem of detecting white noise inside images. Typical methods of tests for CSR include using Diggle's function [9, 20], distance-based methods [11], etc. In this paper, we follow [2] to test CSR by means of angles (Section 4).

Figure 4: Several results of tests for CSR. (a) plots the relative positions between a wardrobe cabinet and a coffee table (d-value d_{wa,ct} = 1.…), (b) between a dining table and a chair (d_{dt,ch} = 2.…), and (c) between a bed and a nightstand (d_{be,ni} = 2.…).
3. Overview
Our pipeline is split into an offline stage and an online stage. In the offline stage, we first learn the spatial strength graph G_ss indicating how objects are spatially related with each other (Section 4), which is more powerful than counting co-occurrence. We also extract versatile patterns of layout strategies as discrete "templates" and reduce noise within datasets such as SUNCG [38] (Section 5). Given the learnt priors, an empty room, and a set of user-specified objects, during the online stage, our method first groups spatially coherent objects into groups (e.g., a bed and two nightstands, as illustrated in Figure 9b). Next, we do an instant arrangement for each group by heuristically using the learnt templates. Finally, we adjust the overall layout by optimizing a consistent loss function (Section 6).

Formally, the offline input is a multigraph G_in = (V_in, E_in), which is a direct mathematical representation of the original datasets, i.e., each vertex corresponds to an object and each edge corresponds to a displacement between two objects. A vertex v_in^i ∈ V_in contains a set of attributes {(d_wall^{i,ω}, θ_wall^{i,ω}, t_wall^{i,ω}) | ω = 1, 2, ..., Ω}, i.e., the raw values of distances, orientations and translations w.r.t. its nearest walls. Centering an object o_i, the k-th edge e_in^{i,j,k} ∈ E_in from v_in^i to v_in^j is valued by a quadruple (p_x^{i,j,k}, p_y^{i,j,k}, p_z^{i,j,k}, p_θ^{i,j,k}) representing the k-th relative translation and orientation of o_j w.r.t. o_i. We use E_in^{i,j} to denote the set of edges from v_in^i to v_in^j, where v_in^i is the vertex in V_in corresponding to object o_i. We construct G_in with 2266 vertices and over 2 million edges from more than 520,000 rooms in the SUNCG dataset, measure the strength of spatial relations in Section 4, and extract layout priors in Section 5.

Figure 5: (a) plots the CSR value and the co-occurrence of every pair of objects. Two objects might co-occur in many rooms while the strength of their spatial relation is low, and vice versa; for example, the bed and the nightstand do have patterns w.r.t. each other despite their low co-occurrence. (b) plots how tests for CSR are able to retain relations which are mistakenly removed by co-occurrence with increasing thresholds set for co-occurrence.
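The multigraph G_in described above can be sketched as a nested dictionary of edge lists. This is a minimal illustration under assumed data types (the class name, method names and example categories are ours), not the authors' implementation.

```python
from collections import defaultdict

class SceneMultigraph:
    """Minimal sketch of the multigraph G_in: vertices are object
    categories; each edge (o_i -> o_j) stores one observed relative
    transformation (p_x, p_y, p_z, p_theta) of o_j w.r.t. o_i."""

    def __init__(self):
        # v_i -> [(d_wall, theta_wall, t_wall), ...] per observation
        self.vertex_attrs = defaultdict(list)
        # (v_i, v_j) -> [(px, py, pz, ptheta), ...] (a multigraph: many
        # parallel edges per ordered pair)
        self.edges = defaultdict(list)

    def add_wall_observation(self, obj, d_wall, theta_wall, t_wall):
        self.vertex_attrs[obj].append((d_wall, theta_wall, t_wall))

    def add_edge(self, obj_i, obj_j, px, py, pz, ptheta):
        self.edges[(obj_i, obj_j)].append((px, py, pz, ptheta))

    def edge_set(self, obj_i, obj_j):
        """E_in^{i,j}: all observed displacements from o_i to o_j."""
        return self.edges[(obj_i, obj_j)]

g = SceneMultigraph()
g.add_edge("bed", "nightstand", 1.1, 0.0, 0.4, 0.0)
g.add_edge("bed", "nightstand", -1.1, 0.0, 0.4, 0.0)
print(len(g.edge_set("bed", "nightstand")))  # -> 2
```

In a full pipeline, every room in the dataset would contribute one edge per ordered object pair, so popular pairs accumulate thousands of parallel edges.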
4. Spatial Strength Graph
Before actually extracting a template from the datasets for each pair of objects, a question naturally arises: do we require templates for all pairs? As shown in Figure 2, two objects could have very messy layout strategies, with transformations between them rather independent of each other, even though they might have high co-occurrence. This motivates us to learn a spatial strength graph (SSG) so that the multitude of object pairs with low spatial strength can be ignored when arranging rooms. This not only helps us synthesize more plausible scenes but also accelerates the synthesis process.

Formally, an SSG is a weighted graph defined as G_ss = (V_ss, E_ss), where V_ss = V_in represents all objects in the dataset, and E_ss is the set of edges weighted to encode the spatial strength between objects. We measure the weights of E_ss by Equation 1, the "d-value" [2] within the domain of tests for complete spatial randomness (CSR) [9]:

d = √m · sup_θ |F_c(θ) − F_e(θ)|.   (1)

F_c and F_e are respectively the cumulative distribution function (CDF) and the empirical distribution function (EDF) w.r.t. the angle θ, which is subject to a uniform distribution [2], and m is the number of points forming F_e. For each pair of objects o_i and o_j, the weight E_ss^{i,j} is set to d_{i,j}, subject to random samples from E_in^{i,j} in a ratio of 10%, as suggested in [2] and [10]. As shown in Figure 4a, a wardrobe and a coffee table are spatially independent, so their d-value is low. Although considerable noise exists in Figure 4b, the d-value of a dining table and a chair is still reasonably high. Finally, Figure 4c shows clear patterns between a bed and a nightstand.

Figure 5 suggests the differences between tests for CSR and co-occurrence. In Figure 5a, we plot the two measurements for all pairs of objects; pairs including an air-conditioner typically co-occur frequently, but air-conditioners are placed independently of most other objects. Figure 5b illustrates how tests for CSR are able to retain relations mistakenly removed by co-occurrence as the threshold on co-occurrence increases.
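As a hedged sketch, the d-value of Equation 1 is a Kolmogorov–Smirnov-type statistic comparing the empirical distribution of angles against the uniform CDF. The angle-extraction step of [2] is omitted here; we assume the angles are already given.

```python
import numpy as np

def csr_d_value(angles):
    """d = sqrt(m) * sup_theta |F_c(theta) - F_e(theta)|, where F_c is
    the uniform CDF on [0, 2*pi) and F_e is the empirical CDF."""
    theta = np.sort(np.asarray(angles, dtype=float) % (2 * np.pi))
    m = len(theta)
    f_c = theta / (2 * np.pi)            # uniform CDF at each sample
    # The EDF is a step function: check F_c against the EDF value just
    # before and just after each jump to capture the supremum.
    f_e_hi = np.arange(1, m + 1) / m
    f_e_lo = np.arange(0, m) / m
    sup = max(np.max(np.abs(f_c - f_e_hi)), np.max(np.abs(f_c - f_e_lo)))
    return np.sqrt(m) * sup

# Nearly uniform angles -> low d (close to random); clustered -> high d.
uniform_like = (np.arange(100) + 0.5) / 100 * 2 * np.pi
clustered = np.full(100, 0.3)
print(csr_d_value(uniform_like) < csr_d_value(clustered))  # -> True
```

High d-values therefore indicate a strong layout pattern between a pair of objects, and become edge weights of the SSG.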
5. Prior Extraction
Figure 6: The overall process of prior extraction. (a) Input: considerable noise exists. (b) De-noised: the de-noised input is ready to use in our framework. (c) Fitted: our templates can be further generalized into fitted models applicable to other frameworks such as MCMC.

Patterns are priors suggesting how we arrange objects in real-life layouts. Figure 6c shows a pattern of a laptop w.r.t. an office chair. Since relative translations are incorporated, patterns can inherently avoid some unreasonable situations such as collisions. However, it is obvious that we cannot set a unified model for all patterns, since patterns can have arbitrary shapes. To extract arbitrary-shaped patterns in discrete representations, we adopt the approach in [34], which clusters points according to ρ (Equation 2) and δ (Equation 3), where the indicator function I_{d ≤ d_c} returns 1 if d ≤ d_c and 0 otherwise:

ρ_k = Σ_{k'} I_{d ≤ d_c}(d_{k,k'}),  d_c = d_(ηK),   (2)
δ_k = min_{k': ρ_k < ρ_{k'}} d_{k,k'}.   (3)

Given a set of edges E_in^{i,j} from v_in^i to v_in^j in G_in, as shown in Figure 6a, we first calculate pairwise Euclidean distances between them using translations. For each edge e_in^{i,j,k}, ρ_k is counted as the number of other edges with distances less than d_c to it. Taking K points, d_c is the ηK-th greatest value among all pairwise distances, with η chosen as suggested by [34]. δ_k represents the minimal distance from e_in^{i,j,k} to the set of edges e_in^{i,j,k'} with higher ρ_{k'} than ρ_k. As a result, despite arbitrary shapes, only edges with high ρ_k belong to a potential pattern, and each edge with both high ρ_k and high δ_k indexes a potential pattern, which is analogous to a cluster center in [34]. In contrast, noises tend to have high values of δ while their local density is distinctly low. As a result, we reduce noises and highlight patterns E_p^{i,j}, as illustrated in Figure 6b.
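A minimal sketch of the ρ/δ computation above (Equations 2 and 3), assuming 2D translation samples and a cutoff d_c chosen as a small fraction η of all pairwise distances; this is an illustration of Rodriguez and Laio's quantities [34], not the authors' code.

```python
import numpy as np

def density_peaks(points, eta=0.02):
    """Compute rho (local density within d_c) and delta (distance to the
    nearest point of higher density) for each sample."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # d_c: the (eta * #pairs)-th smallest value among pairwise distances.
    pairwise = np.sort(dist[np.triu_indices(n, k=1)])
    d_c = pairwise[max(0, int(eta * len(pairwise)) - 1)]
    rho = (dist < d_c).sum(axis=1) - 1          # exclude the point itself
    delta = np.empty(n)
    for k in range(n):
        higher = dist[k][rho > rho[k]]
        # Highest-density point: conventionally assigned the max distance.
        delta[k] = higher.min() if len(higher) else dist[k].max()
    return rho, delta

# A dense cluster plus one isolated outlier: the outlier gets low rho
# (and high delta), so it is discarded as noise during de-noising.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.05, (50, 2)), [[5.0, 5.0]]])
rho, delta = density_peaks(pts, eta=0.05)
print(rho[-1] == 0, rho[:50].max() > 0)  # -> True True
```

De-noising then amounts to keeping only the samples whose ρ exceeds a threshold, while samples with high ρ and high δ index the pattern centers.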
The rest of the accurate patterns form a discrete template E_p^{i,j}, which is already fully usable in our framework. To incorporate our model into previous works, e.g., MCMC, our priors can easily be fitted to distributions, such as by non-parametric kernel density estimation based on Gaussian kernels, as shown in Figures 6c and 8.

We also perform similar prior extraction tasks for individual objects with regard to their orientations and distances to the nearest walls, where d_{k,k'} becomes a difference of scalars. In doing so, we keep the values t_w, θ_w with both high values of ρ and δ to index the pattern. Then we formulate the translation and rotation priors w.r.t. walls into multinomial distributions, and assign them to their corresponding vertices in G_in.

Figure 7: Assigning existing templates to new objects of similar geometry. Given a previously unseen office chair (Left), we achieve its layout strategy w.r.t. the desk (Right) by merging templates of objects geometrically similar to the chair (Middle).

Next, we further generalize our templates to make them reusable and extensible. We observed that objects with the same semantics and similar geometries share layout strategies. As shown in Figure 7, given a new object without corresponding priors extracted from our datasets, we find similar models by comparing 3D shapes using [21], which uses s_shed to measure the degree of similarity. We select the top-K results {(o_k, s_shed^k) | s_shed^K ≥ β, k = 1, 2, ..., K} and take the union of the K templates as the template for the new object, where β is chosen according to our experiments.

Figure 8 shows some results of the learnt priors. Similar to the visualization of dense optical flows [12], we apply the hue, saturation, value (HSV) system to represent orientations, where angles are normalized within (0, π) as hue, probability densities are represented as saturation, and the value channel is fixed. Since height differences for most objects do not vary significantly, we plot the three channels (p_x^{i,j,k}, p_z^{i,j,k}, p_θ^{i,j,k}) to make the visualization more intuitive.
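The HSV encoding described above can be sketched as follows; colorsys performs the conversion, and the normalization choices mirror the description (angle → hue, density → saturation), with the fixed value of 1.0 being our assumption.

```python
import math
import colorsys

def prior_to_rgb(angle, density, density_max):
    """Map a template sample's orientation and local probability density
    to an RGB color: hue from the angle normalized within (0, pi),
    saturation from the density, value fixed (assumed 1.0)."""
    hue = (angle % math.pi) / math.pi        # angle in (0, pi) -> [0, 1)
    sat = min(1.0, density / density_max)    # density -> saturation
    return colorsys.hsv_to_rgb(hue, sat, 1.0)

# A sample oriented at pi/2 with moderately high density.
r, g, b = prior_to_rgb(angle=math.pi / 2, density=3.0, density_max=4.0)
print(all(0.0 <= c <= 1.0 for c in (r, g, b)))  # -> True
```

Low-density samples thus fade toward white, so the strongest template modes dominate the visualization.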
6. Scene Synthesis
In this section, we incorporate the learnt SSG and priors to synthesize room layouts. A synthesis process typically includes two steps: a heuristic arrangement, followed by an optimization. Given a set of input objects Ô, we first decompose them into several groups according to the SSG, and arrange the objects within each group, where relative transformations are immediately indexed by the templates. Finally, we apply a global optimization to satisfy the layout strategies of the objects in Ô.

We first construct an unweighted graph described by an adjacency matrix M_adj, whose vertices correspond to the input objects Ô. Entries of M_adj are determined by G_ss in Section 4. More specifically, if d_{u,v} ≥ ε, then we set M_adj^{u,v} = 1, where ε is typically set as suggested in [2]. After obtaining M_adj, we iteratively construct disjoint groups g ∈ Gr of objects by finding connected components of the graph represented by M_adj. Figure 9 shows examples of resulting groups. It is common to see a group containing only one object, such as a wardrobe, cabinet or shelf, because their placement usually does not require considering other objects. Such single-object groups greatly ease the following optimization process.

Based on a given room shape, the partitioned groups, and the learnt templates, we then generate proposals for pending scenes, i.e., objects are immediately placed and oriented w.r.t. their groups and walls. For each group g ∈ Gr, layouts of g are heuristically generated by sampling a posterior probability distribution Ψ_{G|E_p}(g) expressed in Equation 4, given templates E_p (Section 5):

Ψ_{G|E_p}(g) = α(g) · Φ_{E_p|G=g}(E_p^µ) / ∫ α(g) · Φ_{E_p|G=g}(E_p^µ) dg   (4)
            = α(g) · Σ_µ Π_{τ∈g} φ_{E_p^µ|T=τ}(E_p^{µ,t}, τ_µ) / ∫ α(g) · Φ_{E_p|G=g}(E_p^µ) dg,   (5)

where α(g) denotes the probability of each object τ ∈ g being the dominant object τ_µ in g. Let deg(τ) denote the degree of τ w.r.t. M_adj, which is the number of objects connected with it according to the test for CSR (Section 4), and let d_max = max_{τ∈g} deg(τ). The likelihood φ_{E_p^µ|T=τ}(·) is a multinomial distribution formed by the given template E_p^{µ,t} of τ w.r.t. τ_µ, while it is equal to a constant when τ = τ_µ:

α(g) = 1 / |{τ | τ ∈ g, deg(τ) = d_max}|  if deg(τ_µ) = d_max,  and 0 otherwise.   (6)

When sampling Ψ_{G|Θ=Ô}, we first randomly decide τ_µ of g. Equation 5 implies that {φ_{E_p^µ|T=τ}(·) | τ ∈ g} are independent of each other, so the transformations of objects are sampled according to their own templates, respectively. In practice, if an object has a relatively low d-value w.r.t. τ_µ, we further decompose the group and assign a new dominant object to it. In some cases, this heuristic strategy can sample a sufficiently plausible layout even without further optimization. However, the heuristic strategy may still result in unreasonable conditions such as collisions between groups, objects out of room boundaries, etc. Next we show how we adjust objects so that a plausible layout of objects is eventually presented.

Equation 7 mathematically formalizes template matching, where we minimize the sum of Hausdorff distances d_H between all objects w.r.t. their templates. X_i indexes the transformation of object o_i, and E_p is the set of sampled transformations from Section 5:

X* = arg min_X L(X, E_p)   (7)
   = arg min_X Σ_{i,j} M_adj^{i,j} d_H(X_i, E_p^{i,j}) + Col(X, r),   (8)

where d_H is the Hausdorff metric between an element and a set, derived from the distance function d_h over the space of translations and rotations. The reason for adopting the Hausdorff distance is that it directly handles samples instead of distributions. As illustrated, it is unlikely that a unified distribution can mathematically express arbitrary layout patterns.
In contrast, if we can extract samples of arbitrary shape, the Hausdorff metric enables pipelines to skip model fitting and to optimize directly on the refined samples:

d_H(x, S) = min_{s∈S} d_h(x, s),   (9)
d_h(x, s) = ||x_p − s_p|| + exp(ori(x_θ, s_θ)),   (10)
ori(θ, θ') = min(2π − |θ − θ'|, |θ − θ'|).   (11)

Equation 12 represents the artifacts among objects and between objects and walls, where p(χ, k) returns the k-th rotated point position of the bounding box or the shape of χ. Ideally, if there is no collision and no object out of boundary, Col(X, r) should equal 0:

Col(X, r) = Col_wall(X, r) + Col_obj(X)
         = Σ_{i,k} Π_r tR(p(X_i, k), p(R, r), p(R, r+1)) + Σ_{i,k,j} Π_l tL(p(X_i, k), p(X_j, l), p(X_j, l+1)).   (12)

Col_wall measures whether or not objects are out of walls, whilst Col_obj calculates the overlaps among objects. Truncated by tR(·) and tL(·), γ(·) represents the "to-left" test of computational geometry [8], as utilized in [18]. In addition to the given objects, we place extra virtual objects as doors and windows with fixed transformations to avoid blocking them:

tR(p_1, p_2, p_3) = max(−γ(p_1, p_2, p_3), 0),   (13)
tL(p_1, p_2, p_3) = max(γ(p_1, p_2, p_3), 0).   (14)

Since the underlying metrics are factorized as quadratic terms, we optimize the loss in Equation 8 utilizing Position-Based Dynamics (PBD) [4], which is also detailed in [42]. Incorporating the heuristic approach, syntheses converge after a small number of iterations on average.

Figure 8: Several results of learnt priors: (a) CoffeeTable-Chair, (b) Sofa-Loudspeaker, (c) TV-CoffeeTable, (d) TVStand-PlayStation, (e) Desk-OfficeChair, (f) Sink-Toilet.

Figure 9: Formulating functionally coherent groups of objects using tests for CSR: (a) marked relations; (b) disjoint graphs.

Figure 10: A comparison between values from tests for CSR and co-occurrence.
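Equations 9–11 can be sketched directly: the sample set S is a template's discrete transformations, here assumed to be (x, z, θ) triples. This is an illustrative implementation under that assumption, not the authors' code.

```python
import numpy as np

def ori(theta_a, theta_b):
    """Circular angular difference (Eq. 11)."""
    d = abs(theta_a - theta_b) % (2 * np.pi)
    return min(2 * np.pi - d, d)

def d_h(x, s):
    """Distance between two transformations (Eq. 10): translation
    distance plus the exponentiated orientation difference."""
    trans = np.linalg.norm(np.asarray(x[:2]) - np.asarray(s[:2]))
    return trans + np.exp(ori(x[2], s[2]))

def d_hausdorff(x, template):
    """Element-to-set distance (Eq. 9): the closest template sample."""
    return min(d_h(x, s) for s in template)

# A hypothetical chair template w.r.t. a table: two mirrored placements.
template = [(0.0, 0.6, 0.0), (0.0, -0.6, np.pi)]
chair = (0.05, 0.58, 0.1)
print(round(d_hausdorff(chair, template), 3))  # -> 1.159
```

Note that exp(0) = 1, so even a perfect match contributes a constant offset of 1; only the gradient w.r.t. the transformation matters during optimization.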
7. Experiments
Figure 10 shows the comparison between using co-occurrence and using tests for CSR to measure the strengths of relations between objects. The results are normalized due to their different scales. The upper triangular part depicts co-occurrence and the lower part shows the results from tests for CSR, which alleviate the unreasonableness caused by co-occurrence. For instance, the shelf is weakly related to the computer spatially, but co-occurrence suggests a strong relation. It is obvious that placing the gray desk is independent of arranging the brown shelf, yet the two have a high frequency of co-existing in different rooms of various types, which potentially influences overall performance. Applying tests for CSR, the pair is decoupled spatially. The same is true of objects preferring independent layouts with most of the others, such as the white dryer, the wardrobe and the brown stand.
Figure 11 compares the priors of our work with others. Since priors model layouts with a high probability of being plausible between objects, we should obtain likely transformations by sampling them. According to our experiments, both [46] and GMMs fail if noise exists in the datasets, so the inputs to all priors are the data de-noised by us (Section 5). Figure 11 shows the transformations sampled from the priors, given the de-noised data; red dots denote the centred objects. The top row shows the priors used by [46], where the relative distances and orientations are averaged; we disturb their mean distances and orientations by a zero-mean Gaussian kernel κ. The middle row shows the GMMs used by [44, 13]. Although we further manually set appropriate thresholds for each pattern to reduce potential noise, as well as assembling [1] to assist the exploration of the number of peaks, the results are still confined to elliptical shapes, as in Figure 11a, or introduce outliers, as in Figure 11b. The bottom row shows our results. Ours are capable of detecting various layout patterns without introducing outliers. For example, Figure 11c shows two patterns (four symmetric patterns) between a sofa and a loudspeaker.

Figure 11: Sampled transformations from extracted priors between objects, where the former object is centred as red dots: (a) DiningTable-Glass, (b) LongTable-Chair, (c) Sofa-Loudspeaker. Top: Yu et al. [46]. Middle: Xu et al. [44], Fisher et al. [13]. Bottom: Ours.
Our work achieves acceleration through the usage of PBD [4], which is verified in [42] to be faster than using MCMC. However, our work differs from [42], since ours is data-driven and does not require user-input constraints for each synthesis. In this section, we conduct several experiments to show the achieved efficiency, where the examples are chosen from Figure 12. We perform heuristic arrangement for both [32] and [46] to speed up their methods. The time costs are shown in Table 1.

Table 1: Time Consumption (sec).

Experimentally, determining terminations for MCMC is hard and proposal moves are precarious. Because whether or not a proposal is accepted is judged after each iteration, resources are wasted.

Results of our work are shown in Figure 12. Formulating functional groups using CSR enables us to generate hybrid rooms. Evaluations of 3D indoor scenes are subjective, so we conduct two user studies to evaluate our method. First, an aesthetic study measures how visually pleasing the generated scenes are, i.e., we ask subjects to grade generated layouts shuffled with ground truth. Subjects grade from level 1 (poor) to level 5 (perfect). As listed in Table 2, our generated results are comparable to the original layouts. Another user study is conducted to measure how well the tests for CSR and the learnt templates satisfy human intuition. We sort pairwise relations by tests for CSR and by co-occurrence, respectively. For each sorted list of pairs, we take the templates of pairs at a fixed interval int = 120 from the highest value. Then subjects judge whether or not the presented templates are consistent with real-life layout strategies. As tabulated in Table 3, the results for co-occurrence contain considerably many spatially independent pairs. The generated scenes and templates were evaluated by invited subjects from different societies, who were merely told to grade layouts and judge patterns.
8. Conclusion
In this paper, we present a framework for 3D indoor scene synthesis based on the analysis of patterns. We experimentally verify its correctness, generalization and effectiveness. Our framework is capable of further expansion by easily incorporating object selection such as [32, 26]. Future work includes obtaining finer comparisons of 3D shapes for generalizing our templates, such as 3DMatch [47]. Recently, improvements to density peak clustering have also become available [39, 27]. We hope the pipeline, learnt models and synthesized layouts can contribute to automatic room layouts as well as associated domains such as scene understanding [36].

Table 2: User study: aesthetics.
Data | Bedroom | Living Room | Bathroom | Dining Room | Balcony | Hall | Garage | Hybrid Room | Total
Ours | 2.944 | 3.292 | 2.989 | 3.344 | 3.344 | 3.061 | 3.256 | 3.317 | 3.194
Ground Truth | 2.911 | 3.422 | 3.156 | 3.589 | 3.378 | 2.878 | 3.511 | 3.367 | 3.276

Table 3: User study: evaluations of tests for CSR and co-occurrence.
Metric | Bedroom | Living Room | Bathroom | Dining Room | Balcony | Hall | Garage | Total
Tests for CSR | 93.31% | 85.47% | 96.67% | 92.42% | 86.36% | 89.47% | 76.17% | 88.55%
Co-Occurrence | 32.26% | 43.81% | 86.67% | 45.76% | 23.08% | 38.46% | 36.84% | 43.53%

Figure 12: Examples of various synthesized results: (a) Bedroom, (b) Living Room, (c) Bathroom, (d) Hybrid Room - 1, (e) Hybrid Room - 2, (f) Hybrid Room - 3. Each scene is generated with three alternative layouts in top views followed by side views.
References

[1] Hirotogu Akaike. Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, pages 199–213. Springer, 1998.
[2] Renato Assuncao. Testing spatial randomness by means of angles. Biometrics, pages 531–537, 1994.
[3] Armen Avetisyan, Manuel Dahnert, Angela Dai, Manolis Savva, Angel X. Chang, and Matthias Nießner. Scan2CAD: Learning CAD model alignment in RGB-D scans. arXiv preprint arXiv:1811.11187, 2018.
[4] Jan Bender, Matthias Müller, Miguel A. Otaduy, Matthias Teschner, and Miles Macklin. A survey on position-based simulation methods in computer graphics. In Computer Graphics Forum, volume 33, pages 228–251. Wiley Online Library, 2014.
[5] Angel Chang, Will Monroe, Manolis Savva, Christopher Potts, and Christopher D. Manning. Text to 3D scene generation with rich lexical grounding. arXiv preprint arXiv:1505.06289, 2015.
[6] Angel Chang, Manolis Savva, and Christopher D. Manning. Learning spatial knowledge for text to 3D scene generation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2028–2038, 2014.
[7] Kang Chen, Yukun Lai, Yu-Xin Wu, Ralph Robert Martin, and Shi-Min Hu. Automatic semantic modeling of indoor scenes from low-quality RGB-D data using contextual information. ACM Transactions on Graphics, 33(6), 2014.
[8] Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational geometry. In Computational Geometry, pages 1–17. Springer, 1997.
[9] Peter J. Diggle. On parameter estimation and goodness-of-fit testing for spatial point patterns. Biometrics, pages 87–101, 1979.
[10] Peter J. Diggle, Julian Besag, and J. Timothy Gleaves. Statistical analysis of spatial point patterns by means of distance methods. Biometrics, pages 659–667, 1976.
[11] Peter J. Diggle et al. Statistical Analysis of Spatial Point Patterns. Academic Press, 1983.
[12] Gunnar Farnebäck. Two-frame motion estimation based on polynomial expansion. In Scandinavian Conference on Image Analysis, pages 363–370. Springer, 2003.
[13] Matthew Fisher, Daniel Ritchie, Manolis Savva, Thomas Funkhouser, and Pat Hanrahan. Example-based synthesis of 3D object arrangements. ACM Transactions on Graphics (TOG), 31(6):135, 2012.
[14] Matthew Fisher, Manolis Savva, Yangyan Li, Pat Hanrahan, and Matthias Nießner. Activity-centric scene synthesis for functional 3D scene modeling. ACM Transactions on Graphics (TOG), 34(6):179, 2015.
[15] Qiang Fu, Xiaowu Chen, Xiaotian Wang, Sijia Wen, Bin Zhou, and Hongbo Fu. Adaptive synthesis of indoor scenes via activity-associated object relation graphs. ACM Transactions on Graphics (TOG), 36(6):201, 2017.
[16] Tobias Germer and Martin Schwarz. Procedural arrangement of furniture for real-time walkthroughs. In Computer Graphics Forum, volume 28, pages 2068–2078. Wiley Online Library, 2009.
[17] Jacques Gignoux, Camille Duby, and Sébastian Barot. Comparing the performances of Diggle's tests of spatial randomness for small samples with and without edge-effect correction: application to ecological data. Biometrics, 55(1):156–164, 1999.
[18] Ronald L. Graham. An efficient algorithm for determining the convex hull of a finite planar set. Info. Pro. Lett., 1:132–133, 1972.
[19] Paul Henderson, Kartic Subr, and Vittorio Ferrari. Automatic generation of constrained furniture layouts. arXiv preprint arXiv:1711.10939, 2017.
[20] L. P. Ho and S. N. Chiu. Testing the complete spatial randomness by Diggle's test without an arbitrary upper limit. Journal of Statistical Computation and Simulation, 76(07):585–591, 2006.
[21] Yanir Kleiman, Oliver van Kaick, Olga Sorkine-Hornung, and Daniel Cohen-Or. SHED: Shape edit distance for fine-grained shape similarity. ACM Transactions on Graphics (TOG), 34(6):235, 2015.
[22] Manyi Li, Akshay Gadi Patil, Kai Xu, Siddhartha Chaudhuri, Owais Khan, Ariel Shamir, Changhe Tu, Baoquan Chen, Daniel Cohen-Or, and Hao Zhang. GRAINS: Generative recursive autoencoders for indoor scenes. ACM Transactions on Graphics (TOG), 38(2):12, 2019.
[23] Wenbin Li, Sajad Saeedi, John McCormac, Ronald Clark, Dimos Tzoumanikas, Qing Ye, Yuzhong Huang, Rui Tang, and Stefan Leutenegger. InteriorNet: Mega-scale multi-sensor photo-realistic indoor scenes dataset. In British Machine Vision Conference (BMVC), 2018.
[24] Yabei Li, Junge Zhang, Yanhua Cheng, Kaiqi Huang, and Tieniu Tan. DF2Net: Discriminative feature learning and fusion network for RGB-D indoor scene classification. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[25] Yuan Liang, Fei Xu, Song-Hai Zhang, Yu-Kun Lai, and Taijiang Mu. Knowledge graph construction with structure and parameter learning for indoor scene design. Computational Visual Media, 4(2):123–137, 2018.
[26] Yuan Liang, Song-Hai Zhang, and Ralph Robert Martin. Automatic data-driven room design generation. In
ComputationalVisual Media , 4(2):123–137, 2018. 2[26] Yuan Liang, Song-Hai Zhang, and Ralph Robert Martin. Au-tomatic data-driven room design generation. In
InternationalWorkshop on Next Generation Computer Animation Tech-niques , pages 133–148. Springer, 2017. 7[27] Ruhui Liu, Weiping Huang, Zhengshun Fei, Kai Wang, andJun Liang. Constraint-based clustering by fast search andfind of density peaks.
Neurocomputing , 330:223–237, 2019.7[28] Gloria Hander Lyons.
Ten Common Home Decorating Mis-takes & How to Avoid Them . Blue Sage Press, 2008. 1[29] Rui Ma, Honghua Li, Changqing Zou, Zicheng Liao, XinTong, and Hao Zhang. Action-driven 3d indoor scene evolu-tion.
ACM Trans. Graph. , 35(6):173–1, 2016. 2[30] Rui Ma, Akshay Gadi Patil, Matthew Fisher, Manyi Li,S¨oren Pirk, Binh-Son Hua, Sai-Kit Yeung, Xin Tong,Leonidas Guibas, and Hao Zhang. Language-driven synthe-sis of 3d scenes from scene databases. In
SIGGRAPH Asia2018 Technical Papers , page 212. ACM, 2018. 1, 2[31] Paul Merrell, Eric Schkufza, Zeyang Li, Maneesh Agrawala,and Vladlen Koltun. Interactive furniture layout using in-terior design guidelines. In
ACM transactions on graphics(TOG) , volume 30, page 87. ACM, 2011. 1, 2[32] Siyuan Qi, Yixin Zhu, Siyuan Huang, Chenfanfu Jiang, andSong-Chun Zhu. Human-centric indoor scene synthesis us-ing stochastic grammar. In
Proceedings of the IEEE Con-ference on Computer Vision and Pattern Recognition , pages5899–5908, 2018. 1, 2, 7[33] Daniel Ritchie, Kai Wang, and Yu-an Lin. Fast and flexi-ble indoor scene synthesis via deep convolutional generativemodels. In
Proceedings of the IEEE Conference on Com-puter Vision and Pattern Recognition , pages 6182–6190,2019. 2[34] Alex Rodriguez and Alessandro Laio. Clustering by fastsearch and find of density peaks.
Science , 344(6191):1492–1496, 2014. 2, 4[35] Paul Rosin. Thresholding for change detection. In
Sixth In-ternational Conference on Computer Vision (IEEE Cat. No.98CH36271) , pages 274–279. IEEE, 1998. 3[36] Scott Satkin, Jason Lin, and Martial Hebert. Data-drivenscene understanding from 3d models. 2012. 737] Tianjia Shao, Weiwei Xu, Kun Zhou, Jingdong Wang, Dong-ping Li, and Baining Guo. An interactive approach to seman-tic modeling of indoor scenes with an rgbd camera.
ACMTransactions on Graphics (TOG) , 31(6):136, 2012. 2[38] Shuran Song, Fisher Yu, Andy Zeng, Angel X Chang, Mano-lis Savva, and Thomas Funkhouser. Semantic scene comple-tion from a single depth image.
Proceedings of 30th IEEEConference on Computer Vision and Pattern Recognition ,2017. 1, 3[39] Bangyu Tong. Density peak clustering algorithm basedon the nearest neighbor. In . Atlantis Press, 2019. 7[40] Kai Wang, Yu-An Lin, Ben Weissmann, Manolis Savva, An-gel X Chang, and Daniel Ritchie. Planit: Planning and in-stantiating indoor scenes with relation graph and spatial priornetworks.
ACM Transactions on Graphics (TOG) , 38(4):132,2019. 1, 2[41] Kai Wang, Manolis Savva, Angel X Chang, and DanielRitchie. Deep convolutional priors for indoor scene synthe-sis.
ACM Transactions on Graphics (TOG) , 37(4):70, 2018.1, 2[42] Tomer Weiss, Alan Litteneker, Noah Duncan, MasakiNakada, Chenfanfu Jiang, Lap-Fai Yu, and Demetri Ter-zopoulos. Fast and scalable position-based layout synthesis. arXiv preprint arXiv:1809.10526 , 2018. 2, 6, 7[43] Hualiang Xie, Wenzhuo Xu, and Bin Wang. Reshuffle-based interior scene synthesis. In
Proceedings of the12th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and Its Applications in Industry , pages191–198. ACM, 2013. 2[44] Kun Xu, Kang Chen, Hongbo Fu, Wei-Lun Sun, and Shi-Min Hu. Sketch2scene: sketch-based co-retrieval and co-placement of 3d models.
ACM Transactions on Graphics(TOG) , 32(4):123, 2013. 1, 2, 7[45] Yi-Ting Yeh, Lingfeng Yang, Matthew Watson, Noah DGoodman, and Pat Hanrahan. Synthesizing open worlds withconstraints using locally annealed reversible jump mcmc.
ACM Transactions on Graphics (TOG) , 31(4):56, 2012. 1,2[46] Lap-Fai Yu, Sai Kit Yeung, Chi-Keung Tang, Demetri Ter-zopoulos, Tony F Chan, and Stanley Osher. Make it home:automatic optimization of furniture arrangement.
ACMTrans. Graph. , 30(4):86, 2011. 2, 7[47] Andy Zeng, Shuran Song, Matthias Nießner, MatthewFisher, Jianxiong Xiao, and Thomas Funkhouser. 3dmatch:Learning local geometric descriptors from rgb-d reconstruc-tions. In
CVPR , 2017. 7[48] Song-Hai Zhang, Shao-Kui Zhang, Yuan Liang, and PeterHall. A survey of 3d indoor scene synthesis.
Journal ofComputer Science and Technology , 34(3):594, 2019. 1, 2[49] Jia Zheng, Junfei Zhang, Jing Li, Rui Tang, Shenghua Gao,and Zihan Zhou. Structured3d: A large photo-realisticdataset for structured 3d modeling.