Geometry-Based Layout Generation with Hyper-Relations AMONG Objects

Abstract

Recent studies show increasing demand for and interest in automatically generating layouts, while there is still much room for improving plausibility and robustness. In this paper, we present a data-driven layout framework without model formulation and loss term optimization. We obtain and organize priors directly from samples in datasets instead of sampling probabilistic models. Therefore, our method enables expressing and generating mathematically inexpressible relations among three or more objects. Subsequently, a non-learning geometric algorithm attempts to arrange objects plausibly, considering constraints such as walls, windows, etc. Experiments show that our generated layouts outperform the state of the art and that our framework is competitive with human designers.


Shao-Kui Zhang, Tsinghua University

Wei-Yu Xie, Beijing Institute of Technology

Song-Hai Zhang*, Tsinghua University


1. Introduction

3D scenes are becoming fundamental to many domains in computer graphics, e.g., photo-realistic rendering, virtual reality (VR), and providing datasets for computer vision [9]. However, the rapid development of computer graphics requires a better ability to model 3D scenes and to provide more layouts. Therefore, we have been investigating techniques for automatically generating scene layouts.

Generating scene layouts benefits various applications. First, it saves the manual work of placing objects, e.g., in video games or industrial design (https://planner5d.com/). Li et al. [11] generate various layouts for better simulations of wheelchair training. Handa et al. [9] generate multi-view images from much fewer 3D scenes.

Existing works already show progress in scene synthesis [31], where scene layouts are judged by their plausibility and aesthetics, i.e., visual assessment of the generated layouts. Existing works are divided into neural-network-based techniques and others. The former train several neural networks for different steps such as placing objects, rotating objects, and deciding when to terminate the arrangement [25]. The latter formulate a set of mathematical models including graphs, and typically optimize a shuffled area based on, e.g., Markov Chain Monte Carlo (MCMC) [30, 19], since the models are too complicated to be solved directly. Nevertheless, algorithmic methods have not been investigated as far as we reviewed, because layout rules would have to be embedded into an algorithm for it to operate properly, and layout rules are innumerable. A qualitative comparison of existing techniques is beyond the scope of this paper. Regardless of the underlying technical details, this paper focuses on the final results, i.e., improving the plausibility and aesthetics of generated layouts.

Code is publicly available at https://github.com/Shao-Kui/3DScenePlatform, including the proposed framework (algorithm) and a 3D scene platform (toolbox).

In this paper, we propose an algorithmic framework for generating room layouts, as shown in Fig. 1. Our framework is split into a data-driven phase, coherent grouping, and a non-data-driven phase, geometric arranging. In the data-driven phase, objects are grouped into several coherent groups (section 3), where priors are learnt for suggesting layouts within each coherent group. We directly use correct and denoised samples extracted from datasets as priors. This has two benefits. First, we no longer hypothesize distributions of layout rules between/among objects, especially mathematically inexpressible relations. Second, we can easily formulate and represent relations among three or more objects, since we only have to structure samples of real distributions. Similar to “hyper-graphs”, where an edge connects more than two vertices, we name our learnt relations among objects “hyper-relations” (section 4). Thus, several objects of the same coherent group are arranged in $O(1)$ time by sampling their hyper-relations. In the non-data-driven phase, given independent coherent groups whose objects are already properly arranged among each other, a geometric algorithm generates the position and orientation of each group. Since layout rules among objects are applied during the data-driven phase, the geometric phase concentrates on far fewer rules related to walls, windows, etc. (section 5).

Current techniques for synthesizing 3D scenes include selecting a set of appropriate objects and generating plausible layouts for the objects. We address “layout”, while techniques for selecting objects, such as [10], are easily incorporated. Note that in this paper, we prefer instance-based priors to category-based priors, e.g., we consider a spatial relation between a specific coffee table and a specific chair, where both have unique textures and geometries. If priors are based on categories, distinct features of objects are lost. As shown in Fig. 2, armchairs of different shapes intuitively have their own priors with respect to the same coffee table.

Figure 1: Our framework lays out objects uniformly, e.g., small objects on a surface are arranged concurrently instead of being treated as another layout problem. In addition to the overall plausibility, we emphasize reasonableness among objects related to each other, i.e., coherent groups. Ours is also friendly to objects hung on walls.

In this paper, we make the following contributions:

1. We introduce and learn, for the first time, hyper-relations among three or more objects, which enriches the layouts of objects within the same coherent group and requires $O(1)$ time to sample the layout of each group, e.g., a coffee table surrounded by several distinct sofas and a TV stand, thus increasing the overall performance.

2. We propose a new scalable geometry-based framework for layout generation, which considers detailed aspects of room layout, e.g., doors, windows, wall decorations, small objects, etc. In coordination with hyper-relations, more plausible and robust layouts are generated.

3. We develop an open-source platform for manipulating 3D scenes, which supports requirements such as rendering, exploring, and modifying scenes, thus allowing researchers to focus entirely on algorithms and techniques.

2. Related Works

3D scene synthesis selects a set of appropriate objects and transforms them plausibly [31]. Earlier works on synthesizing 3D scenes are mainly based on either design rules, e.g., [8, 16] (not data-driven), or data-driven priors. For the former, design rules are mathematically formulated as a set of constraints followed by optimization [18, 26]. For the latter, since the learnt distributions are too complicated to be differentiated, MCMC is used to attempt proposals for solving such situations [30, 14, 13, 19, 29]. Some of them present a framework including both object selection and layout generation, while the rest focus on layouts, though some focus on selecting objects [10]. Our method focuses on generating layouts, i.e., we contribute mainly to making layouts more plausible and robust.

Figure 2: Extracting priors based on instances results in finer priors. This figure shows the priors of the same coffee table with respect to three sofa instances of different shapes and geometries: (a) round sofa, (b) long sofa, (c) L-shape sofa.

With the continued study of neural networks, several works based on convolutional or graph neural networks have been proposed [25, 24, 20], including the current state-of-the-art work PlanIT [24], which is the baseline for our framework. One feature of network-based works is that they couple selection and layout, i.e., selecting an object depends on pending layouts, and vice versa. In contrast, the earlier works mentioned above separate the two stages. Literature based on other techniques also exists, e.g., human-centric assessments [7]. Please refer to the survey on 3D indoor scene synthesis [31] for more insight.

Several works also synthesize 3D scenes from input other than 3D scenes. Xu et al. [28] recover 3D scenes from hand sketches. Luo et al. [15] generate 3D scenes from scene graphs. [1, 3, 22] generate room layouts based on RGB-D images or 3D scans. [27, 5] generate scenes based on input examples. [2, 17] translate human language into 3D scene configurations. However, since different inputs result in different constraints, frameworks and even applications, these works are beyond the scope of this paper.

Figure 3: Three coherent groups, where white dots denote the respective dominant objects.

3. Definitions

Given a list of objects with doors, windows and a room shape, we formulate its corresponding graph $G = \langle V, E \rangle$, where each object $o \in V$. $E$ is the set of edges, which are simply relations between/among objects. Note that in this paper, we assume a relation may involve more than two instances, i.e., a hyper-relation among objects (section 4).

A coherent group $g$ is a list of objects in which each object connects to at least one other object in the same group. In other words, two coherent groups never have an edge between their respective instances. Conceptually, generating coherent groups $g \in G$ is equivalent to finding the maximal connected subgraphs of $G$, given priors as connections. When generating a layout for a given input, we always initially group objects into several $g_i \subset V$, even though a group may contain a single object, such as a wardrobe, a picture frame, a kitchen cabinet, etc. Coherent groups are hierarchical, as shown in Fig. 3, where visual edges are pairwise between parents and children.

A transformation of an object includes its translation $(x, y, z)$ and Y-axis rotation $\theta$, where floors align with the XoZ plane. In this paper, we do not re-scale objects. The same is true of coherent groups. Additionally, transformations of coherent groups are propagated to their subordinate objects.

Priors are used to group objects into coherent groups and to suggest layouts within each coherent group. A prior set $P_{o_1, o_2, o_3, \ldots, o_n}$, abbreviated as $P_O$, involves two or more objects. A single prior $p^k_O \in P_O$ suggests a set of plausible transformations for all objects involved. Each prior set contains a dominant object, such as $o_1$, and other secondary objects. For example, if a dining table is surrounded by several chairs and supports a plant, “dining table” is the dominant object in this scenario and the remaining objects are secondary objects. If only two objects are involved in $P_O$, $P_O$ is a “pairwise relation” between the two objects (section 4.1). If more than two objects are involved and all secondary objects derive from the same instance, $P_O$ is a “pattern chain set”. Otherwise, $P_O$ is a “hyper-relation” (section 4.3).

Figure 4: Three types of priors in this paper: (a) pairwise relation, (b) pattern chain, (c) hyper-relation. Links with the same color suggest the same secondary objects. 4a: a pairwise “one-to-one” relation between a desk and a chair; 4b: a pairwise “one-to-many” relation between a table and several identical chairs; 4c: a hyper-relation among several different objects dominated by a coffee table.
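The definition of coherent groups as maximal connected subgraphs can be illustrated with a small union-find sketch. This is only an illustration under assumptions, not the authors' implementation: objects are represented by hypothetical string ids, and `relations` lists the (dominant, secondary) pairs for which some prior is assumed to exist.

```python
from collections import defaultdict


def coherent_groups(objects, relations):
    """Group objects into maximal connected components.

    objects:   list of hashable object ids (hypothetical representation).
    relations: iterable of (dominant, secondary) pairs that have a prior.
    """
    parent = {o: o for o in objects}

    def find(x):
        # Path-halving find for a disjoint-set forest.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in relations:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra          # merge the two components

    groups = defaultdict(list)
    for o in objects:
        groups[find(o)].append(o)
    return list(groups.values())


# Hypothetical input: one living-room group and two isolated cabinets.
objs = ["coffee_table", "sofa_1", "sofa_2", "tv_stand", "tv",
        "cabinet_1", "cabinet_2"]
rels = [("coffee_table", "sofa_1"), ("coffee_table", "sofa_2"),
        ("coffee_table", "tv_stand"), ("tv_stand", "tv")]
groups = coherent_groups(objs, rels)
```

Objects without any relation end up in singleton groups, which matches the statement that a group may contain a single object.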

4. Priors

In this section we show how we extract relations among objects. We start by extracting traditional pairwise relations, e.g., a desk with respect to a chair. Then, we present pre-computed pattern chains, which generalize one-to-one relations to one-to-many relations, e.g., a dining table surrounded by several identical chairs. Finally, based on pairwise relations and pattern chains, we further generalize and formulate hyper-relations among objects, i.e., relations “among” more than two instances. Fig. 4 illustrates the differences among them. In this section, we show how priors are represented and generated; their usage is shown in section 5.

Theoretically, pairwise relations and pattern chains are both special forms of hyper-relations. We discuss them separately because directly learning hyper-relations is difficult. As a result, we first introduce pairwise relations, from which we derive the more general pattern chains, which in turn enable forming hyper-relations.

A pairwise relation is a set of priors $P_{ab}$ from a dominant object $a$ to a secondary object $b$. Given a pairwise relation $P_{ab}$, we can sample a prior $p_{ab,k} \in P_{ab}$ that is directly a transformation of $b$ with respect to $a$. Note that pairwise relations are directional, and sampled transformations are only relative between the two objects involved, i.e., global transformations are still required (section 5).

We extract discrete pairwise priors by utilizing density peak clustering (DPC) [21], which first calculates $\rho_k = \sum_{k'} I_{\{d \le d_c\}}(d_{k,k'})$, $d_c = d_{(0.[\ldots]K)}$ and $\delta_k = \min_{k' : \rho_{k'} > \rho_k} (d_{k,k'})$ for all points. In our situation, $d_{k,k'}$ denotes the Euclidean distance from the transformation of dominant object $k$ to the transformation of secondary object $k'$. A transformation includes translations and rotations. $d_c$ is a hyper-parameter and $\rho_k$ counts the number of $d_{k,k'}$ that are lower than $d_c$. The selection of $d_c$ follows [21], i.e., the $[\ldots]K$-th greatest $d_{k,k'}$ among all $k$ relative distances. $\delta_k$ seeks the minimal $d_{k,k'}$ among all $d_{k,k'}$ with higher $\rho_{k'}$ than $\rho_k$. Please refer to [21] for more details about this algorithm.

Figure 5: 5a: Directly sampling a pairwise relation without pre-computed patterns results in obvious implausibility. 5b: Recursively formulating a pattern chain. 5c: Additional constraints are optional if, e.g., well-aligned layouts are required.
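The density $\rho$ and distance $\delta$ defined above can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' code: the samples are assumed to be rows of relative transformations, the cutoff $d_c$ is taken as a quantile of the ascending pairwise distances (`dc_fraction` is a hypothetical parameter, since the exact fraction follows [21]), and the thresholds in `remove_noise` are likewise assumptions.

```python
import numpy as np


def density_peaks(points, dc_fraction):
    """Return (rho, delta) for each row of `points`.

    points:      (K, D) array of relative transformations.
    dc_fraction: assumed quantile of pairwise distances used as cutoff d_c.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)          # pairwise Euclidean distances
    k = len(points)
    off_diag = dist[~np.eye(k, dtype=bool)]
    d_c = np.quantile(off_diag, dc_fraction)      # cutoff distance
    rho = (dist <= d_c).sum(axis=1) - 1           # neighbours within d_c, excluding self
    delta = np.empty(k)
    for i in range(k):
        higher = rho > rho[i]
        # Distance to the nearest point of higher density; a densest point
        # conventionally receives its largest distance instead.
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return rho, delta


def remove_noise(points, rho, delta, rho_max, delta_min):
    """Drop points with low density and high delta (thresholds are assumptions)."""
    noisy = (rho <= rho_max) & (delta >= delta_min)
    return points[~noisy]


# Five tightly clustered samples and one isolated sample.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                [0.05, 0.05], [10.0, 10.0]])
rho, delta = density_peaks(pts, dc_fraction=0.3)
kept = remove_noise(pts, rho, delta, rho_max=0, delta_min=1.0)
```

In this toy input the isolated sample has no neighbours within the cutoff and lies far from every denser sample, so it is the only one discarded.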
Although DPC is typically used for clustering, here it performs anomaly detection to eliminate noise, i.e., removing points with low values of $\rho$ and high values of $\delta$. Cluster centers and ordinary points are treated equally since both are already reasonable transformations in this paper. After elimination, the remaining “points” are plausible relations taken directly from datasets (human designers), where each “point” becomes a single pairwise prior $p_{ab,k} \in P_{ab}$ for locally arranging a dominant object and its secondary object. Typical dominant objects include desks, dining tables, coffee tables, beds, etc. We manually label a set of instances that are capable of being dominant objects.

Commonly, a dominant object has several secondary copies of the same instance, e.g., a dining table with several identical chairs. If we sample a pairwise relation twice or more, as shown in figure 5a, the pairwise relations above do not guarantee the plausibility of “one-to-many” relations. We solve this by introducing pattern chains.

A pattern chain set $C_{ab}$ is a prior set between objects $a$ and $b$. Each $c^j_{ab} = \{j_1, j_2, \ldots, j_n\}$, $c^j_{ab} \subset \mathbb{N}$, is a list of indices into its pairwise relation $P_{ab}$, e.g., $j_x$ indexes the $x$-th pairwise prior $p_{ab,j_x}$ in $P_{ab}$. Generating one pattern chain $c^j_{ab}$ is a recursive process. First, a $p_{ab,j_1} \in P_{ab}$ is randomly selected from $P_{ab}$. As discussed, $p_{ab,j_1}$ gives a plausible transformation between $a$ and $b$. Second, we traverse all $p_{ab,i} \in P_{ab}$. If a copy of object $b$ with the transformation of $p_{ab,i}$ does not collide with another copy with the transformation of $p_{ab,j_1}$, $p_{ab,i}$ is included in a new subset $P'_{ab} \subset P_{ab}$. Third, to place another copy of $b$, $p_{ab,j_2}$ is randomly selected from $P'_{ab}$, and the above procedure is executed recursively until $P'_{ab}$ is empty. As shown in figure 5b, after three iterations, placing three chairs around a table filters out a subset of their pairwise priors (gray). Therefore, a fourth chair can only be placed in the remaining pigmented areas. When a chain is generated, we can optionally adjust it, e.g., figure 5c suggests “horizontals and verticals” to make the chain well-aligned.

Note that the above generates one pattern chain $c^j_{ab} = \{j_1, j_2, \ldots\}$. In theory, a $P_{ab}$ of $O(n)$ size has $O(n!)$ undetermined pattern chains. In practice, we only generate one pattern chain for each $p_{ab,k} \in P_{ab}$, to make sure each pairwise prior is used at least once, instead of computing the entire pattern chain set. Otherwise, it would require $O(n!)$ time and space to compute a single set, which would also slow down online arrangement by restricting prior loading.

Figure 6: 6a: Using only pairwise relations results in implausibilities among secondary objects in a coherent group. 6b: Using hyper-relations results in plausibility among all objects involved. 6c: A different object set requires another hyper-relation, since we can not assume “as many objects as possible”.

A hyper-relation $H_O$ is a prior set among several objects $O = \{o_{dom}, o_{sec_1}, o_{sec_2}, \ldots\}$. A dominant object $o_{dom}$, such as a coffee table, exists in $H_O$, and secondary objects relate to each other, e.g., chairs on a rug, or armchairs beside a long sofa. Purely sampling pairwise prior sets results in scenarios such as figure 6a, where secondary objects are only plausible with respect to their dominant object. A hyper-relation is essentially different from pattern chains. Pattern chain sets are still one-to-one relations, and a pattern chain assumes incorporating as many secondary objects as possible. In contrast, a hyper-relation has a definite list of objects, i.e., we can not assume which instances are included and how many copies each instance has in a specific hyper-relation, because areas are limited. As shown in figures 6b and 6c, different numbers and instances of seats derive two distinct hyper-relations.

To generate hyper-relations, we do not hypothesize and learn concrete distributions, because real distributions are too complicated to be expressed, solved and sampled [12]. Instead, we try to obtain as many exact samples as possible. Given a set of objects $O$ and its dominant object $o_{dom} \in O$, we randomly select a secondary object $o_{sec} \in O$ and randomly sample a prior from the pairwise relation between $o_{dom}$ and $o_{sec}$. Thus, $o_{sec}$ is transformed with respect to $o_{dom}$. Next, similar to generating pattern chains, we filter the remaining pairwise relations between $o_{dom}$ and the other secondary objects $o_{sec_1}, o_{sec_2}, \ldots \in O$, to ensure that they are “collision free”. With multiple instances, additional rules are required. We use “tiers”, a term that, as far as we know, was first introduced in [30], for finer filtering. For example, rugs are placed on the ground, and objects such as tables and beds can be put on top of them. Merely detecting collisions would mistakenly filter out plausible priors. Not detecting collisions between objects of different tiers alleviates such situations. After filtering the remaining pairwise relations, we recursively select another secondary object at random and repeat the above steps until all secondary objects are placed appropriately with no implausibilities. After that, a single hyper-prior is generated with the transformations of all secondary objects. We iteratively re-run the entire process to enrich the pending hyper-relation.

Yet, the above steps still require definite lists of objects, and figuring out all undetermined lists is almost equivalent to exhaustively traversing all combinations of objects. To address this, we systematically optimize extraction. After forming coherent groups (section 5), we check their hierarchies. If a parent has two or more children, we try to assemble the hyper-relation for them. If the hyper-relation does not exist, a new thread is started to generate it in the background. In other words, we either load existing hyper-relations if they are already generated, or start a thread for generating them when we need them. Alternatively, users can manually suggest their own lists of objects to generate the corresponding hyper-relation.
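The hyper-prior generation described in this section can be sketched roughly as below. The sketch rests on several simplifying assumptions that are not from the paper: priors are reduced to 2D offsets $(x, z)$ relative to the dominant object, rotation is ignored, collisions are tested on axis-aligned footprints, and all names (`sample_hyper_prior`, `build_hyper_relation`, `half_size`, `tier`) are hypothetical.

```python
import random


def footprint_box(center, half_size):
    """Axis-aligned footprint (xmin, zmin, xmax, zmax) around a centre."""
    (x, z), (hw, hd) = center, half_size
    return (x - hw, z - hd, x + hw, z + hd)


def boxes_overlap(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def sample_hyper_prior(pairwise, half_size, tier, rng):
    """Try to build one hyper-prior {secondary: (x, z)}; None on a dead end."""
    placed = {}
    order = list(pairwise)
    rng.shuffle(order)                      # random order of secondary objects
    for obj in order:
        free = []
        for offset in pairwise[obj]:
            box = footprint_box(offset, half_size[obj])
            # Collisions are only checked between objects of the same tier.
            clash = any(
                tier[other] == tier[obj]
                and boxes_overlap(box, footprint_box(pos, half_size[other]))
                for other, pos in placed.items()
            )
            if not clash:
                free.append(offset)
        if not free:
            return None                     # caller may simply re-run
        placed[obj] = rng.choice(free)
    return placed


def build_hyper_relation(pairwise, half_size, tier, trials, seed=0):
    """Re-run the sampler and collect the distinct hyper-priors found."""
    rng = random.Random(seed)
    found = []
    for _ in range(trials):
        prior = sample_hyper_prior(pairwise, half_size, tier, rng)
        if prior is not None and prior not in found:
            found.append(prior)
    return found


# Hypothetical pairwise priors with respect to a coffee table at the origin.
pairwise = {"sofa_a": [(-2.0, 0.0)],
            "sofa_b": [(-2.0, 0.0), (2.0, 0.0)],
            "rug": [(-2.0, 0.0)]}
half_size = {"sofa_a": (0.5, 0.5), "sofa_b": (0.5, 0.5), "rug": (1.0, 1.0)}
tier = {"sofa_a": 1, "sofa_b": 1, "rug": 0}
relation = build_hyper_relation(pairwise, half_size, tier, trials=50)
```

The rug overlaps `sofa_a` in plan view but sits in a different tier, so it is never filtered out, whereas the two sofas can only coexist on opposite sides of the table.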

5. Geometry-Based Layout Generation

In this section, we show how we finally arrange objects. First, objects are decomposed into several coherent groups $g_i \subset G$ by finding maximal connected subgraphs using pairwise relations between objects, as shown in figure 7, where whether or not two objects are connected depends on the existence of a pairwise relation between them.

One secondary object can have at most one dominant object. If multiple dominant objects are available for a secondary object $o_{sec}$, we randomly select a dominant object and discard the relations between $o_{sec}$ and the other dominant objects. Each dominant object also takes a finite number of copies of a secondary instance, bounded by the lengths of the respective pattern chains. This makes our framework more flexible, e.g., given only one chair but both a dressing table and a desk in a bedroom, we randomly assign the chair to either the dressing table or the desk, which gives more variance to the generated results.

After that, input objects are distributed into coherent groups. As discussed in section 4, within a specific coherent group, we can directly sample a set of transformations for all objects locally within the group. As shown in figure 7, if a parent has two or more descendants and the descendants differ, the hyper-relation is assembled, or its generation is started in the background, e.g., a coffee table with respect to two sofas and a TV stand. If the descendants are identical, the pattern chain set is sampled, e.g., a dining table and four chairs. Otherwise, we use pairwise priors, e.g., a TV stand and a TV. Therefore, the final process is to transform the coherent groups properly in the room.

Figure 7: Coherent grouping. Dotted lines denote hyper-relations of secondary objects. Given a list of objects whose layout is to be generated, we first group them into several coherent groups. For example, a coffee table relates to two sofas and a TV stand, and the TV stand relates to a TV, so they form one coherent group. Two cabinets have no relation to others, so each of them forms its own group.

Eventually, we assign transformations to each coherent group and propagate the transformations to objects. Since priors already lay out objects sufficiently within groups, three more constraints are required to make layouts physically plausible among groups: 1, all groups should be inside the room; 2, groups should not overlap each other; 3, clear paths should exist for windows and doors.

Placing a set of shapes (coherent groups) in another larger polygon (room) is an NP-hard problem [4] in computational geometry. Thus, we geometrically simplify coherent groups as cuboids, consider doors and windows as fixed (pre-arranged) blocks, and make heuristic attempts as shown in algorithm 1.

We first sort coherent groups by occupied area from the largest to the smallest, since bigger groups usually represent the more central functionality of a room, e.g., a bedroom is called a “bedroom” due to a coherent group dominated by a bed. Then, coherent groups are placed in this order, with random positions assigned to them along the inner side of the target room, since the definition of coherent groups indicates that the relations among different coherent groups are weak (section 3). After placing a group, we check potential collisions between this group and other groups or blocks. If it collides, we discard the transformation and randomly select a new one. To enhance performance, we use an exponentially increasing sampling density. If no proper transformation is found at a density of $d$ for the pending coherent group, we increase $d$ to find more possible positions. If the group still collides after the density has been increased several times, we discard the group and proceed to the next one. To increase plausibility, we add more heuristic rules. First, we initially attempt to place groups at corners of rooms and at the sides of other existing coherent groups. Second, during collision detection, we take height into consideration, so it is possible that furniture of lower height is placed in front of windows. Finally, “liftings” $L_f$ are assigned to groups. If $L_f = 0$, a group is placed against walls. If $L_f$ equals half the length of the room, a group is placed in the middle of the room.
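The heuristic placement described above can be approximated by the following sketch. It is deliberately simplified and should be read as an illustration under assumptions, not as the paper's algorithm: the room is an axis-aligned rectangle, groups are unrotated rectangles `(width, depth)`, heights and liftings are ignored, only room corners are used as heuristic positions, and the sampling density grows linearly with `n`. All function names are hypothetical.

```python
import random


def overlaps(a, b):
    """Strict overlap test for rectangles given as (x0, z0, x1, z1)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def insert_rectangle(size, room, occupied, max_density, rng):
    """Return a lower-left corner (x, z) for a w x d rectangle, or None."""
    (w, d), (room_w, room_d) = size, room
    if w > room_w or d > room_d:
        return None

    def ok(pos):
        rect = (pos[0], pos[1], pos[0] + w, pos[1] + d)
        return not any(overlaps(rect, o) for o in occupied)

    # Heuristic attempts: room corners first.
    corners = [(0, 0), (room_w - w, 0), (0, room_d - d), (room_w - w, room_d - d)]
    for pos in corners:
        if ok(pos):
            return pos
    # Random attempts along the walls with increasing sampling density.
    for n in range(1, max_density + 1):
        trials = []
        for _ in range(n * int(max(room_w, room_d))):
            x = rng.uniform(0, room_w - w)
            z = rng.uniform(0, room_d - d)
            trials += [(x, 0), (x, room_d - d), (0, z), (room_w - w, z)]
        rng.shuffle(trials)
        for pos in trials:
            if ok(pos):
                return pos
    return None


def arrange(groups, room, blocks=(), max_density=4, seed=0):
    """Place groups largest-first; returns [(size, position or None), ...]."""
    rng = random.Random(seed)
    occupied = list(blocks)                 # doors and windows as fixed blocks
    placed = []
    for size in sorted(groups, key=lambda s: s[0] * s[1], reverse=True):
        pos = insert_rectangle(size, room, occupied, max_density, rng)
        if pos is not None:
            occupied.append((pos[0], pos[1], pos[0] + size[0], pos[1] + size[1]))
        placed.append((size, pos))
    return placed


# Four 2 x 2 groups exactly tile a 4 x 4 room; a fifth group is discarded.
layout = arrange([(2, 2), (2, 2), (2, 2), (2, 2), (1, 1)], room=(4, 4))
```

A group that cannot be placed is reported with position `None`, mirroring the rule that a group is discarded after repeated failures.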

6. Experiments

We utilize a recent 3D scene dataset, “3D-Front” [6] (https://tianchi.aliyun.com/dataset/dataDetail?dataId=65347), with 70000+ layouts and 9992 3D models. To roam and render 3D scenes, we develop an open-source 3D scene platform as shown in figure 8, where we can add, delete, modify and search objects. We can orbitally control the perspective camera to select better views. By clicking “layout”, the current room is laid out by our proposed framework. We render 3D scenes using Three.js (http://threejs.org/), and the algorithm is mainly implemented with PyTorch and NumPy. Several results are shown in figure 9. Please refer to our supplementary materials for more details. We also ran our framework on SUNCG [23] before this dataset became unavailable; we optionally include SUNCG results in our supplementary materials only to verify the effectiveness of our framework.

We compare our framework with the state-of-the-art PlanIT [24]. PlanIT includes not only arranging objects but also selecting appropriate objects. However, since we focus on arranging objects, we show the better plausibility and aesthetics achieved by our framework by re-arranging the results of PlanIT, i.e., we generate layouts given the objects and room shape selected by PlanIT.

Qualitatively, as shown in figure 11, ours is friendly to layouts among objects with strong relations, i.e., “coherent groups” in this paper. For example, a TV stand and a sofa are strongly related to a coffee table. Ours makes sure they are plausibly arranged with respect to each other. Additionally, ours does not block the paths of doors and windows.

Algorithm 1 Geometric Arranging

Input: Polygon of the room's inner side $P_r$; list of rectangles of coherent groups with height $A_{rec}$; list of rectangles of windows and doors.
Output: Transformations of rectangles $T_{rec}$.

    function CheckOK(A)
        if A does not overlap with existing groups and blocks then
            return True
        else
            return False
        end if
    end function

    function ApplyTransform(A, t)
        apply transformation t to A
        return A
    end function

    function InsertRectangle(A)
        Let T be array of transformations
        // For heuristic
        for edge ∈ P_r and p ∈ existing polygons do
            Push heuristic transformation of edge or p to T
        end for
        for t ∈ T do
            if CheckOK(ApplyTransform(A, t)) then
                return t
            end if
        end for
        // For random
        Clear T
        for n = 1 → max sampling density do
            for edge ∈ P_r do
                Push n * len(edge) random transformations on edge to T
            end for
            Shuffle T
            for t ∈ T do
                if CheckOK(ApplyTransform(A, t)) then
                    return t
                end if
            end for
            Clear T
        end for
        return None
    end function

    for a ∈ A_rec do
        Push InsertRectangle(a) to T_rec
    end for

Quantitatively, we also conduct a user study as shown in figure 10a. 43 subjects are invited. Subjects are university students, workers, housewives, interior designers, etc. (a few subjects preferred to keep this information private). Each subject is given 20 questions, and each question includes a layout generated by ours and a layout generated by PlanIT. Presented layouts are shuffled, i.e., ours can be either left or right. For each question, a subject compares the two layouts and marks each of them. Marks range from 0 (very poor) to 5 (very plausible). All subjects are taught how to use the user study system before the experiment. In figure 10a, the Chinese characters read “there are two room layouts below, please compare the two layouts, considering aesthetic, plausibility and reasonableness, thus marking them respectively.”, “0: totally unreasonable, inaesthetic. It may never appear in the real world layout.” and “5: very aesthetic and plausible. I will refer to this layout in the real world.” Results are shown in table 1, where marks are given as averages (standard deviations) for the respective room types.

Figure 8: We develop an open-source 3D scene platform that allows adding, deleting, modifying and searching objects, as well as rendering and saving scenes: (a) platform overview, (b) viewing and roaming, (c) manipulating and searching, (d) rendering. Users can explore given 3D scenes by orbital control. Our platform is embedded with the proposed algorithm.

Table 1: User study: results of comparing PlanIT with ours, as mean (standard deviation).

    Room Type       PlanIT           Ours
    Bedrooms        1.847 (1.336)    2.66 (1.125)
    Living Rooms    1.749 (1.327)    2.572 (1.266)
    Bathrooms       1.028 (1.2)      2.553 (1.314)
    Kitchens        1.549 (1.342)    2.651 (1.167)
    Total           1.543 (1.341)    2.609 (1.221)

In this section, we compare our generated layouts with the layouts of human designers (ground truths) to verify that ours is competitive with humans. Subjects are the same group as in section 6.2. Each subject is required to choose the most plausible layout from ten alternative layouts as shown in figure 10b, where one layout is designed by a human designer and the remaining nine layouts are generated by ours. Subjects can zoom in on layouts by right-clicking, as in figure 10c. All subjects are taught before the experiment and manuals are available. Ground truths are randomly selected from 3D-Front. In figure 10b, the Chinese characters read “there are ten layouts below and please select your favorite one considering aesthetic, plausibility and reasonableness”, “left-click for selections and right click for zooming in” and “after selecting, press submit for the next question”.

Results are shown in figure 12. Two distributions are plotted, for bedrooms and “living-dining” rooms respectively, i.e., each line is the averaged distribution of user selections for its room type, where “0” denotes the ground truth. Although human-designed layouts outperform ours, generated layouts are still favored competitively, as shown in figure 12.

We test our framework on a PC with an AMD 2700X ([…] GHz), a GTX 970, and a WD20EZRX. The time consumption of a layout depends on the degree of crowding, i.e., the ratio of the total area of coherent groups to the area of the room. Higher degrees result in more discards during geometry-based arrangement (section 5.2), thus slowing down generation. To lay out 3D-Front rooms such as those in figure 9, if priors are cached, our framework takes within […] second per layout. If the corresponding priors of several objects are not loaded, additional IO of up to […] seconds per layout is required. For non-crowded rooms with cached priors, our framework generates layouts in real time.

We also run the state-of-the-art PlanIT [24] on servers with a GTX 1080 Ti. According to our experiments, generating a layout requires more than a minute. Nevertheless, this includes both object selection and object arrangement, and the two are interleaved with each other. Measuring the exact time consumption of the “layout” part of PlanIT is beyond the scope of this paper. Furthermore, [26] is not a data-driven framework. Therefore, it is hard to claim “better efficiency” as a contribution.
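The degree of crowding mentioned above is a simple area ratio. A minimal sketch, assuming each coherent group is summarized by a hypothetical `(width, depth)` footprint and the room area is known:

```python
def crowding_degree(group_sizes, room_area):
    """Ratio of the summed group footprints to the room area."""
    return sum(w * d for w, d in group_sizes) / room_area


# Two groups of 2.0 x 1.5 and 1.0 x 1.0 in a 16-square-unit room.
degree = crowding_degree([(2.0, 1.5), (1.0, 1.0)], room_area=16.0)
```

Here the two groups cover 4 of 16 square units, so the degree is 0.25; values closer to 1 would mean more discarded placements and slower generation.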

7. Conclusions and Future Works

In this paper, we present a new framework for generating room layouts, and we experimentally verify the achieved plausibility and robustness. The code of this framework and a toolbox platform are available. We hope this could contribute to domains related to 3D scenes. However, this work still suffers from the following weaknesses.

The greatest difficulty we encountered is arranging “chains” of objects around walls. For independent objects such as wardrobes, their transformations have a high degree of freedom, since we find appropriate places for them with no collision or implausibility. However, groups of objects such as kitchen cabinets and ovens are frequently placed adjacent to each other, as shown in figure 13. Firstly, the order of a chain should be carefully considered. For example, we commonly place geometrically similar cabinets next to each other. Otherwise, layouts are not aesthetic, as shown in figure 13b. Secondly, an L-shape chain should somehow turn at corners, especially when we have L-shape objects such as L-shape cabinets, which are frequently treated as “corner objects” as shown in figure 13c. Thirdly, doors and windows are also challenges for arranging chains. In our framework, if we treat a chain as an entire group, we currently do not have a plan for sampling such priors. On the other hand, if we treat a chain as individual objects, complicated rules are required, but we also do not have a plan for formulating these rules. As a result, we demonstrate this weakness in detail and will try to fix it in the future. Fortunately, in real-world decoration, most cabinets are fixed on walls.

Figure 9: Results (panels: Bedroom; Living Room & Dining Room). Please zoom in for more details. More results are included in the supplementary files.

Figure 10: User studies. 10a: Marking ours and PlanIT [24] respectively; 10b: Selecting the most plausible layout from ten alternative scenes, where one scene is designed by human designers; 10c: Subjects can zoom in on a particular layout for better cognition.

Figure 11: Qualitatively comparing PlanIT with ours. (a) PlanIT. (b) Ours.

Figure 12: Distributions of user-selected layouts of bedrooms (BLUE) and “living-dining rooms” (RED).

The storage and loading of priors may require further system-level optimizations. Currently, all priors are structured in “.json” format, which is inefficient if the prior of a coherent group is too large. When arranging objects online, loading the corresponding priors into memory may take up to a few seconds. Although this only affects the first attempt, since priors are cached after that, it is still a concern in practice. Finally, our way of extracting patterns is agnostic to shapes and textures instead of incorporating them. Incorporating them would be useful if we want objects arranged more compactly.
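The load-once-then-cache behavior of priors described above can be illustrated with a short Python sketch. It assumes one “.json” file per prior and uses `functools.lru_cache` as the in-memory cache. The function name `load_prior`, the file name, and the toy file contents are hypothetical and do not reflect the released platform's actual layout or API.

```python
import json
import os
import tempfile
from functools import lru_cache


@lru_cache(maxsize=None)
def load_prior(path: str) -> dict:
    """Parse a prior from a JSON file; repeated calls with the same path skip disk IO.

    Note: the cached dict is shared between callers, so it should be treated
    as read-only.
    """
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Usage with a throwaway file standing in for a real prior (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "prior.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"offsets": [[0.0, 1.2], [0.5, 1.0]]}, f)
    first = load_prior(path)   # first attempt: read and parse from disk
    second = load_prior(path)  # later attempts: served from memory
    hits = load_prior.cache_info().hits
```

If parsing large JSON files remains the bottleneck on the first attempt, a binary or memory-mapped format would be one possible direction for the system-level optimization mentioned above.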

Figure 13: The problem of “chains” of objects around walls. 13a: The ground truth; 13b: A failure case of ours; 13c: An L-shape “chain” with L-shape objects.

References

[1] A. Avetisyan, M. Dahnert, A. Dai, M. Savva, A. X. Chang, and M. Nießner. Scan2cad: Learning cad model alignment in rgb-d scans. arXiv preprint arXiv:1811.11187, 2018.

[2] A. Chang, W. Monroe, M. Savva, C. Potts, and C. D. Manning. Text to 3d scene generation with rich lexical grounding. arXiv preprint arXiv:1505.06289, 2015.

[3] K. Chen, Y. Lai, Y.-X. Wu, R. R. Martin, and S.-M. Hu. Automatic semantic modeling of indoor scenes from low-quality rgb-d data using contextual information. ACM Transactions on Graphics, 33(6), 2014.

[4] M. De Berg, M. Van Kreveld, M. Overmars, and O. Schwarzkopf. Computational geometry. In Computational geometry, pages 1–17. Springer, 1997.

[5] M. Fisher, D. Ritchie, M. Savva, T. Funkhouser, and P. Hanrahan. Example-based synthesis of 3d object arrangements. ACM Transactions on Graphics (TOG), 31(6):135, 2012.

[6] H. Fu, R. Jia, L. Gao, M. Gong, B. Zhao, S. Maybank, and D. Tao. 3d-future: 3d furniture shape with texture. arXiv preprint arXiv:2009.09633, 2020.

[7] Q. Fu, H. Fu, H. Yan, B. Zhou, X. Chen, and X. Li. Human-centric metrics for indoor scene assessment and synthesis. Graphical Models, 110:101073, 2020.

[8] T. Germer and M. Schwarz. Procedural arrangement of furniture for real-time walkthroughs. In Computer Graphics Forum, volume 28, pages 2068–2078. Wiley Online Library, 2009.

[9] A. Handa, V. Patraucean, V. Badrinarayanan, S. Stent, and R. Cipolla. Understanding real world indoor scenes with synthetic data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4077–4085, 2016.

[10] Y. He, Y. Cai, Y.-C. Guo, Z.-N. Liu, S.-K. Zhang, S.-H. Zhang, H.-B. Fu, and S.-Y. Chen. Style-compatible object recommendation for multi-room indoor scene synthesis. arXiv preprint arXiv:2003.04187, 2020.

[11] W. Li, J. Talavera, A. G. Samayoa, J.-M. Lien, and L.-F. Yu. Automatic synthesis of virtual wheelchair training scenarios. In , pages 539–547. IEEE, 2020.

[12] Y. Li, J. Zhang, Y. Cheng, K. Huang, and T. Tan. Df2net: Discriminative feature learning and fusion network for rgb-d indoor scene classification. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[13] Y. Liang, F. Xu, S.-H. Zhang, Y.-K. Lai, and T. Mu. Knowledge graph construction with structure and parameter learning for indoor scene design. Computational Visual Media, 4(2):123–137, 2018.

[14] Y. Liang, S.-H. Zhang, and R. R. Martin. Automatic data-driven room design generation. In International Workshop on Next Generation Computer Animation Techniques, pages 133–148. Springer, 2017.

[15] A. Luo, Z. Zhang, J. Wu, and J. B. Tenenbaum. End-to-end optimization of scene layout. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3754–3763, 2020.

[16] G. H. Lyons. Ten Common Home Decorating Mistakes & How to Avoid Them. Blue Sage Press, 2008.

[17] R. Ma, A. G. Patil, M. Fisher, M. Li, S. Pirk, B.-S. Hua, S.-K. Yeung, X. Tong, L. Guibas, and H. Zhang. Language-driven synthesis of 3d scenes from scene databases. In SIGGRAPH Asia 2018 Technical Papers, page 212. ACM, 2018.

[18] P. Merrell, E. Schkufza, Z. Li, M. Agrawala, and V. Koltun. Interactive furniture layout using interior design guidelines. In ACM transactions on graphics (TOG), volume 30, page 87. ACM, 2011.

[19] S. Qi, Y. Zhu, S. Huang, C. Jiang, and S.-C. Zhu. Human-centric indoor scene synthesis using stochastic grammar. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5899–5908, 2018.

[20] D. Ritchie, K. Wang, and Y.-a. Lin. Fast and flexible indoor scene synthesis via deep convolutional generative models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6182–6190, 2019.

[21] A. Rodriguez and A. Laio. Clustering by fast search and find of density peaks. Science, 344(6191):1492–1496, 2014.

[22] T. Shao, W. Xu, K. Zhou, J. Wang, D. Li, and B. Guo. An interactive approach to semantic modeling of indoor scenes with an rgbd camera. ACM Transactions on Graphics (TOG), 31(6):136, 2012.

[23] S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser. Semantic scene completion from a single depth image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1746–1754, 2017.

[24] K. Wang, Y.-A. Lin, B. Weissmann, M. Savva, A. X. Chang, and D. Ritchie. Planit: Planning and instantiating indoor scenes with relation graph and spatial prior networks. ACM Transactions on Graphics (TOG), 38(4):132, 2019.

[25] K. Wang, M. Savva, A. X. Chang, and D. Ritchie. Deep convolutional priors for indoor scene synthesis. ACM Transactions on Graphics (TOG), 37(4):70, 2018.

[26] T. Weiss, A. Litteneker, N. Duncan, M. Nakada, C. Jiang, L.-F. Yu, and D. Terzopoulos. Fast and scalable position-based layout synthesis. arXiv preprint arXiv:1809.10526, 2018.

[27] G. Xiong, Q. Fu, H. Fu, B. Zhou, G. Luo, and Z. Deng. Motion planning for convertible indoor scene layout design. IEEE Transactions on Visualization and Computer Graphics, 2020.

[28] K. Xu, K. Chen, H. Fu, W.-L. Sun, and S.-M. Hu. Sketch2scene: sketch-based co-retrieval and co-placement of 3d models. ACM Transactions on Graphics (TOG), 32(4):123, 2013.

[29] Y.-T. Yeh, L. Yang, M. Watson, N. D. Goodman, and P. Hanrahan. Synthesizing open worlds with constraints using locally annealed reversible jump mcmc. ACM Transactions on Graphics (TOG), 31(4):56, 2012.

[30] L.-F. Yu, S. K. Yeung, C.-K. Tang, D. Terzopoulos, T. F. Chan, and S. Osher. Make it home: automatic optimization of furniture arrangement. ACM Trans. Graph., 30(4):86, 2011.

[31] S.-H. Zhang, S.-K. Zhang, Y. Liang, and P. Hall. A survey of 3d indoor scene synthesis.