GUIGAN: Learning to Generate GUI Designs Using Generative Adversarial Networks
Tianming Zhao
Jilin University, Changchun, China
[email protected]

Chunyang Chen*
Monash University, Melbourne, Australia
[email protected]

Yuanning Liu
Jilin University, Changchun, China
[email protected]

Xiaodong Zhu*
Jilin University, Changchun, China
[email protected]

* Corresponding author.
Abstract—The Graphical User Interface (GUI) is ubiquitous in almost all modern desktop software, mobile applications, and online websites. A good GUI design is crucial to the success of software in the market, but designing a good GUI, which requires much innovation and creativity, is difficult even for well-trained designers. In addition, the demand for rapid GUI iteration further increases designers' workload. The availability of diverse automatically generated GUIs can therefore enhance design personalization and specialization, as they can cater to the tastes of different designers. To assist designers, we develop a model, GUIGAN, to automatically generate GUI designs. Different from conventional image generation models based on image pixels, our GUIGAN reuses GUI components collected from existing mobile app GUIs to compose a new design, in a way similar to natural-language generation. GUIGAN is based on SeqGAN and additionally models GUI component style compatibility and GUI structure. The evaluation demonstrates that our model significantly outperforms the best of the baseline methods by 30.77% in Fréchet Inception Distance (FID) and 12.35% in 1-Nearest Neighbor Accuracy (1-NNA). Through a pilot user study, we provide initial evidence of the usefulness of our approach for generating acceptable brand-new GUI designs.
Index Terms—Graphical User Interface, mobile application, GUI design, deep learning, Generative Adversarial Network (GAN)
I. INTRODUCTION
The Graphical User Interface (GUI) is ubiquitous in almost all modern desktop software, mobile applications, and online websites. It provides a visual bridge between a software application and its end-users, through which they interact with each other. A good GUI design makes an application easy, practical, and efficient to use, which significantly affects the success of the application and the loyalty of its users [1]. For example, computer users view Apple's Macintosh system as having a better GUI than the Windows system; their positive views are almost double those of Windows users, leading to 20% more brand loyalty [2].

Good GUI design is difficult and time-consuming, even for professional GUI designers, as the design process must follow many design rules and principles [3], [4], such as fluent interactivity, universal usability, clear readability, aesthetic appearance, and consistent styles [5], [6], [3]. To follow the fashion trend, GUI designers have to keep reviewing the latest and hottest mobile apps and software, or get inspiration from design-sharing sites (e.g., Dribbble, https://dribbble.com/). Considering that each mobile app, website, or piece of software contains tens of different screens, and that their GUIs need to be updated iteratively under market pressure, designers carry a heavy, innovation-intensive workload.

Unfortunately, this design work often falls to very few designers in a company [7], and software developers have to fill the gap. In a survey of more than 5,700 developers [8], 51% of developers reported working on app GUI design tasks, more than on other development tasks, and they had to perform such tasks every few days. However, software developers often do not have sufficient professional UI/UX design training or artistic sense, which is why it is challenging for them to design a GUI from scratch. Instead, when designing the GUI for websites or mobile apps, developers are very likely to search existing GUI designs on the internet as references, and then implement and customize them for their own purposes [9], [10]. This process is common in GUI development at small start-ups or in small-scale open-source software, where there are no professional UI/UX designers. Although some studies help with GUI search by attribute filtering [11] or by parsing the UI code structure [12], three problems remain with GUI search. First, there is a gap between the developers' intention in mind and the textual query they write, and another gap between the textual query and the visual GUI design; due to these gaps, the retrieved GUI may not satisfy developers' requirements. Second, the retrieved GUI design may already be adopted by other developers, resulting in high similarity to other apps and negatively influencing the uniqueness of the app; directly using others' GUIs may also raise intellectual-property issues. Third, the design style of some retrieved GUIs may be out of date, and it is hard for developers to keep track of the latest trends in GUI design.

"A lot of times, people don't know what they want until you show it to them." — Steve Jobs
Consequently, an automated method for creative GUI design generation is sorely needed to alleviate the burden of both novice designers and developers.
Fig. 1. Overview of the proposed method.

With the generated GUI design, developers can further adopt automated GUI code generation [13]–[15] for the implementation, so that the overall GUI development process is significantly simplified.

In this work, we develop a deep learning model, GUIGAN, to automatically generate GUI designs based on existing GUI screenshots collected from thousands of mobile apps. It can provide designers and developers with brand-new GUI designs, which they can further customize for their own purposes rather than starting from scratch. Although there are plenty of image generation models such as DCGAN, VAE-GAN, CycleGAN, and WGAN [16]–[19], they all operate on plain pixels. In contrast, a GUI is composed of a set of detailed components (e.g., buttons, text, images), and a good GUI design is concerned more with the composition of these components than with fine-grained component pixels. Due to this characteristic of GUIs, and inspired by natural-language generation (i.e., selecting a list of words to compose a sentence), we formulate our task as selecting a list of existing GUI component subtrees to compose new GUI designs.

An overview of our approach is shown in Fig. 1. First, we collect 12,230 GUI screenshots and their corresponding metadata from 1,609 Android apps in 27 categories on Google Play, and decompose them into 41,813 component subtrees for reuse. Second, we develop a SeqGAN-based [20] model: apart from the default generation and discrimination losses, we model the GUI component style compatibility and the GUI layout structure to guide the training. As a result, our GUIGAN can generate brand-new GUI designs for designers' inspiration. The evaluation demonstrates that our model significantly outperforms the best of the baseline methods by 30.77% in Fréchet Inception Distance (FID) and 12.35% in 1-Nearest Neighbor Accuracy (1-NNA). Through a pilot user study, we provide initial evidence of the usefulness of our approach for generating acceptable brand-new GUIs.

Our contributions in this work can be summarized as follows:
• To the best of our knowledge, this is the first study to automatically generate mobile app GUI designs, a task that requires much creativity and visual understanding.
• We propose a novel deep learning-based method to generate brand-new GUI designs composed of subtree sequences from existing GUI designs without additional manual presets.
• The experimental results under two specific development conditions show that our method can successfully capture GUI design styles and structural features, and automatically generate new composite GUIs that conform to consumers' aesthetics and standard GUI structure.
Fig. 2. Real-world data collection of GUI subtrees.

II. PRELIMINARY
In this section, we clarify our goal, establish the corresponding task, and then introduce the deep learning method that our work is based on.
A. Task Establishment
Different from a plain image made up of pixels, a GUI design image consists of two types of components, i.e., widgets (e.g., button, image, text) and spatial layouts (e.g., linear layout, relative layout). The widgets (leaf nodes) are organized by the layouts (intermediate nodes) into the structural tree of one GUI design, as seen in Fig. 2. As most GUI designers may reuse some of their previous design components in a new design [21], we take subtrees of existing GUIs as the basic units for composing a new GUI design, rather than plain pixels. Therefore, we formulate our task as producing a sequence S_{1:T} = (s_1, ..., s_t, ..., s_T), s_t ∈ S, of GUI component subtrees, where S is the subtree repository. It can also be described as generating a new GUI by selecting a list of compatible GUI subtrees.

To obtain these candidate subtrees from GUI screenshots, we cut them from the original screenshots according to certain rules. Given one GUI design with detailed component information, we cut out all the first-level subtrees from the original DOM tree, as seen in Fig. 2. If the width of a subtree exceeds 90% of the GUI width, we continue to cut it at the next level; otherwise, we stop splitting and use this subtree as the smallest-granularity unit. The procedure is iterated until the segmentation stops. Finally, every smallest subtree is given a unique number identification. In the process, we remove subtrees with duplicate bounds within one GUI and keep only one copy. Based on the collection and observation of the data from our pilot study, we also remove subtrees with partial overlap and preserve those with an aspect ratio between 0.25 and 50, which have a specific structure and can be clipped from the original GUI screenshot.
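To make the segmentation rules above concrete, the following is a minimal Python sketch of the recursive subtree cutting; the node representation with bounds (left, top, right, bottom) and the function names are illustrative assumptions, not the exact implementation.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Node:
        bounds: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
        children: List["Node"] = field(default_factory=list)

    def cut(node: Node, gui_width: int, out: List[Node]) -> None:
        """Recursively cut a subtree until it is narrow enough to keep."""
        w = node.bounds[2] - node.bounds[0]
        h = node.bounds[3] - node.bounds[1]
        if w > 0.9 * gui_width and node.children:
            for child in node.children:      # still too wide: descend one level
                cut(child, gui_width, out)
        elif h > 0 and 0.25 <= w / h <= 50:  # keep units with a sane aspect ratio
            out.append(node)

    def collect_subtrees(root: Node, gui_width: int) -> List[Node]:
        """Extract the smallest-granularity subtrees of one GUI."""
        found: List[Node] = []
        for first_level in root.children:    # start from the first-level subtrees
            cut(first_level, gui_width, found)
        unique, seen = [], set()
        for s in found:                      # drop duplicates with equal bounds
            if s.bounds not in seen:
                seen.add(s.bounds)
                unique.append(s)
        return unique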
B. Base Model

Our work is mainly based on generative adversarial networks (GAN) [22], which consist of a generative network (the generator) and a discriminative network (the discriminator). The generator learns the features of the real data and generates new samples to fool the discriminator, while the discriminator tries to distinguish true samples from fake ones. These two networks are trained in an adversarial mode until the discriminator can no longer distinguish the samples generated by the generator.

GUIGAN is based on SeqGAN [20], a variant of GAN. SeqGAN is the first work extending GANs to generate sequences of discrete tokens. It solves the common problems of traditional GANs in dealing with discrete data such as sequences: the generator struggles to transfer gradient updates effectively, and the discriminator cannot evaluate incomplete sequences. SeqGAN combines the GAN with the policy-gradient algorithm of reinforcement learning to guide the training of the generative model through the discriminative model. SeqGAN uses a Long Short-Term Memory (LSTM) network as the generator, a CNN with a highway structure as the discriminator, and a well-trained oracle with the same architecture as the generator to generate samples as the ground truth. In the d-step (the step of training the discriminator), the discriminator updates its parameters by distinguishing real samples from ones produced by the generator, which is a binary classification task. In the g-step (the step of training the generator), the generator uses the Monte Carlo (MC) search reward provided by the discriminator, combined with the policy-gradient method, to update its parameters.

III. APPROACH

We propose a system called GUIGAN that learns to synthesize brand-new GUI designs for designers by modeling GUI component subtree sequences and style compatibility. An overview of the approach is shown in Fig. 3.

Fig. 3. The workflow of GUIGAN.

Based on the subtrees automatically segmented from the original GUIs in Section II-A, we first convert all of them into embeddings by modeling their style (Section III-A). During the training process, the generator generates a sequence of the given length, and the discriminator acts as the environment, in which the reward is calculated as Loss_g by Monte Carlo tree search (MCTS). We compute the homogeneity value of the generated result as Loss_c (Section III-B). By measuring the distance between the generated result and original GUI designs, the model captures structural information (Section III-C), with Loss_s calculated by the minimum edit distance. By integrating all the above loss functions (Section III-D), the parameters of the generator are updated with the back-propagation algorithm.

A. Style Embedding of Subtree
Since we feed the model with sub-images showing GUI component subtrees, we first convert all of them into embeddings. Analogous to a natural-language sentence, the overall GUI component layout tree can be regarded as a sentence, and the subtrees obtained from its metadata decomposition are the constituent words of this sentence. We serialize the subtrees by depth-first traversal and map them into an embedding space to get their vector features as the input of our GUIGAN. To this end, we apply a deep learning network to obtain the feature vector and style embedding of the subtree sequences.

To transform images from the pixel level to the vector level, we adopt a Siamese network [23], [24] with a dual-channel CNN structure, which maps a GUI into a GUI vector space. We take a pair of GUI images (g_1, g_2) as the input, and the goal of the Siamese network is to distinguish whether the two images come from the same app. According to our observation, GUIs from one app are more similar in design style than GUIs from different apps. Therefore, we set up the learning objective as discriminating whether two input design images are from the same app, which makes the input embedding more meaningful, i.e., representative of the design style. The CNN in the Siamese network takes one of the GUI screenshots of the pair as its input. Convolution is then performed with various filters (m × m matrices) to extract features from the GUIs. A ReLU activation and a max-pooling layer follow the convolution operation; together they can be considered one convolutional block, and such blocks are stacked repeatedly. In the end, the output of the last convolutional block, which represents the embedding of the GUI in the vector space, is flattened into a fully-connected (FC) layer.

The goal of the trained CNN in the Siamese network is to convert a GUI screenshot image g into an N-dimensional vector V_g. This non-linear transformation f can be expressed as V_g = f(g; θ), where θ represents the trainable parameters of the network, updated by the back-propagation algorithm during training. The weighted L1 distance between the two feature vectors V_{g_1} and V_{g_2} from the two channels is computed and fed into a sigmoid activation function to produce the prediction. Since this task can be formulated as a binary classification problem, i.e., whether the two input screenshots come from the same app, we adopt the binary cross-entropy loss:

Loss(x, y) = − Σ_i ( x_i log(y_i) + (1 − x_i) log(1 − y_i) )   (1)

where x is the probability output of the network and y is the target (0 or 1). The CNNs of the two channels share the same weights and learn to extract the most representative features of GUIs, which are used to quantitatively compare the appearance design style similarity between GUI images. The pixel information of the layout subtrees clipped from the original GUIs is fed into the trained CNN, and we thus acquire their design embeddings.
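A minimal PyTorch sketch of the dual-channel Siamese setup described above; the block count, filter sizes, and embedding dimension here are illustrative assumptions, not the paper's reported configuration.

    import torch
    import torch.nn as nn

    class StyleCNN(nn.Module):
        """One channel: stacked Conv -> ReLU -> MaxPool blocks."""
        def __init__(self, embed_dim: int = 128):
            super().__init__()
            blocks, in_ch = [], 3
            for out_ch in (64, 128, 256, 512):  # each block doubles the filters
                blocks += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                           nn.ReLU(inplace=True),
                           nn.MaxPool2d(kernel_size=2, stride=2)]
                in_ch = out_ch
            self.features = nn.Sequential(*blocks)
            self.fc = nn.LazyLinear(embed_dim)  # flattened embedding V_g

        def forward(self, g):
            return self.fc(torch.flatten(self.features(g), 1))

    class Siamese(nn.Module):
        """Predicts whether two GUI screenshots come from the same app."""
        def __init__(self):
            super().__init__()
            self.cnn = StyleCNN()            # shared weights for both channels
            self.alpha = nn.Linear(128, 1)   # weighted L1 distance -> logit

        def forward(self, g1, g2):
            v1, v2 = self.cnn(g1), self.cnn(g2)
            return torch.sigmoid(self.alpha(torch.abs(v1 - v2)))

    # Training uses the binary cross-entropy of Eq. (1):
    # loss = nn.BCELoss()(model(g1, g2), same_app_labels)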
B. Modeling Subtree Compatibility

We apply the CNN (from the Siamese network) trained in the previous section to help evaluate the aesthetic identity of the generated samples. In each g-step, the generator generates a complete sequence S_{1:T}, which is composed of T subtrees from different apps (the subtree repository) spliced in order. According to the metadata of their GUIs, we can obtain the coordinates of each subtree and clip their images from the original GUI images. We then input them into the trained CNN of the Siamese network and obtain their embeddings. Using these embeddings, we apply homogeneity (HOM) to evaluate the aesthetic compatibility of the subtrees in the sequence.

Homogeneity (HOM) measures the extent to which clusters contain only members of a single class (here, a class represents an app; when the subtrees all come from the same app, they get the highest harmony):

h = 1 − H(G|C) / H(G)   (2)

where H(G|C) is the conditional entropy of the classes given the cluster assignment,

H(G|C) = − Σ_{g=1}^{|G|} Σ_{c=1}^{|C|} (n_{g,c} / n) log(n_{g,c} / n_c),

and H(G) is the entropy of the classes,

H(G) = − Σ_{g=1}^{|G|} (n_g / n) log(n_g / n).

Here n is the total number of samples, n_g and n_c are the numbers of samples belonging to class g and cluster c respectively, and n_{g,c} is the number of samples of class g assigned to cluster c.

We expect the generator to keep learning to make the generated results achieve higher homogeneity scores, which represent better coordination and compatibility. Therefore, we integrate the homogeneity score of the generated result into the training of the generator after the discriminator feeds back the reward value for a complete sequence. We then calculate the style loss as

Loss_c = exp(−h), if c > 1;  Loss_c = 0, if c = 1   (3)

where c is the number of apps that the subtrees come from. If the subtrees of a sample are all from the same app, Loss_c becomes zero.
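To illustrate Eqs. (2)–(3), here is a small sketch using scikit-learn's homogeneity_score; treating each generated sequence in a batch as one cluster and the source app of each subtree as its class is our assumption about the clustering granularity, not something stated in the paper.

    import numpy as np
    from sklearn.metrics import homogeneity_score

    def style_loss(app_labels, cluster_labels):
        """Loss_c per Eq. (3): 0 when all subtrees share one app, else exp(-h)."""
        if len(set(app_labels)) == 1:       # c = 1: a single source app
            return 0.0
        h = homogeneity_score(app_labels, cluster_labels)  # h from Eq. (2)
        return float(np.exp(-h))

    # Hypothetical batch: two generated sequences (clusters 0 and 1) whose
    # subtrees come from apps 'A', 'B', 'C'.
    apps     = ['A', 'A', 'B', 'B', 'C', 'C']
    clusters = [ 0,   0,   0,   1,   1,   1 ]
    print(style_loss(apps, clusters))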
C. Modeling Subtree Structure

In addition to style information, another factor to be considered is the structure information of the generated sequence. The layout of each real GUI follows certain composition rules, which make a GUI not only more logical in appearance but also practical in function. We hope that when generating new subtree combination sequences, the generator can also follow the composition conventions of GUIs to a certain extent, so that the synthetic sequences stay diverse while still meeting the structural characteristics of real GUIs. For this purpose, we use the structure strings of the subtrees from their metadata, instead of GUI wireframe images [25], [26], to represent their structures, as there is an explicit order among different GUI components. The minimum edit distance (MED) is introduced to quantify the structural similarity between two GUIs, i.e., between the generated samples and the real-world data. By reducing this structural distance, we optimize the generator so that it learns reasonable structure combinations and orders from real GUIs. The structure loss is the edit distance between the two structure strings, Loss_s = lev_{S_r,S_g}(|S_r|, |S_g|), with

lev_{S_r,S_g}(i, j) = max(i, j), if min(i, j) = 0;
lev_{S_r,S_g}(i, j) = min( lev_{S_r,S_g}(i−1, j) + 1,
                           lev_{S_r,S_g}(i, j−1) + 1,
                           lev_{S_r,S_g}(i−1, j−1) + 1_{(S_r[i] ≠ S_g[j])} ), otherwise   (4)

where S_r and S_g represent the subtree structure strings of the real and generated samples, and lev_{S_r,S_g}(i, j) is the distance between the first i characters of S_r and the first j characters of S_g. Each character represents a GUI component such as ListView or FrameLayout.
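A direct dynamic-programming sketch of Eq. (4) over structure strings, where the encoding of component types into single characters is an assumption for illustration:

    def med(s_r: str, s_g: str) -> int:
        """Minimum edit distance (Eq. 4) between two structure strings."""
        m, n = len(s_r), len(s_g)
        # dp[i][j] = lev(i, j): distance between s_r[:i] and s_g[:j]
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            for j in range(n + 1):
                if min(i, j) == 0:
                    dp[i][j] = max(i, j)          # base case of Eq. (4)
                else:
                    dp[i][j] = min(dp[i - 1][j] + 1,              # deletion
                                   dp[i][j - 1] + 1,              # insertion
                                   dp[i - 1][j - 1]
                                   + (s_r[i - 1] != s_g[j - 1]))  # substitution
        return dp[m][n]

    # 'L' = Layout, 'T' = TextView, 'I' = ImageView (hypothetical encoding)
    print(med("LTTI", "LTI"))  # -> 1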
D. Multi-Loss Fusion

The numerical ranges of the three losses (the feedback loss from the discriminator, Loss_g, together with Loss_c and Loss_s) differ widely, so we need to normalize them for the subsequent calculation. By adding trainable noise parameters [27], the three loss values can be balanced to the same scale, and we express the final fused loss function as

Loss_mul = λ_1 Loss_g + λ_2 Loss_c + λ_3 Loss_s   (5)

In the g-step, we update the parameters of the generator by minimizing Loss_mul, and we apply the Adam update algorithm [28] instead of stochastic gradient descent (SGD) for faster convergence:

arg min_G Loss_mul   (6)

The challenge for the model is to generate a new GUI design with a reasonable structure and a compatible style. We do not want the model to only learn to generate sequences similar to the real samples. Fusing the structure and style information into the original sequence features simultaneously pushes GUIGAN to generate new GUI designs with both authenticity and diversity. As shown in Fig. 4, two samples generated by GUIGAN are reconstructed from pieces of real GUIs.
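The trainable noise parameters of [27] correspond to homoscedastic-uncertainty weighting; a minimal PyTorch sketch under that reading (the exact weighting form used in GUIGAN may differ):

    import torch
    import torch.nn as nn

    class MultiLossFusion(nn.Module):
        """Balances Loss_g, Loss_c, Loss_s with trainable log-variances [27]."""
        def __init__(self):
            super().__init__()
            # One learnable log(sigma^2) per loss; lambda_k = exp(-log_var_k).
            self.log_vars = nn.Parameter(torch.zeros(3))

        def forward(self, loss_g, loss_c, loss_s):
            losses = torch.stack([loss_g, loss_c, loss_s])
            weights = torch.exp(-self.log_vars)       # lambda_1..lambda_3
            # Eq. (5) plus a regularizer that keeps weights from collapsing to 0.
            return (weights * losses).sum() + self.log_vars.sum()

    # fusion = MultiLossFusion()
    # loss_mul = fusion(loss_g, loss_c, loss_s)  # Eq. (6): minimized w.r.t. G
    # torch.optim.Adam(list(generator.parameters()) + list(fusion.parameters()),
    #                  lr=0.05).step()           # after loss_mul.backward()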
IV. IMPLEMENTATION
The data in our experiments comes from the open-source Rico dataset [25], and a subset of its GUIs is retained for this paper through manual screening. Our model mainly consists of a SeqGAN (comprising an LSTM and a CNN) and a Siamese network. All networks are implemented on the PyTorch platform and trained on a GPU.
Fig. 4. Two example GUIs generated by GUIGAN with components from corresponding original GUIs.
A. Dataset Construction
Our data comes from Rico [25], an open-source mobile app dataset for building data-driven design applications. Rico is the largest repository of mobile app designs to date, supporting design search, UI layout generation, UI code generation, etc. Rico was built by mining Android apps at runtime via human-powered and programmatic exploration: 13 workers spent 2,450 hours using apps downloaded from the Google Play Store over five months, producing 10,811 user interaction traces. Rico contains design and interaction data for 72,219 UIs from 9,772 apps, spanning 27 categories.

Based on our observation, not all GUIs from the Rico dataset [25] can be used in this study, so we remove some of them. First, we remove the GUIs of game apps, as game GUIs are typically generated by game engines and differ from those of other general apps. Second, we manually remove some low-quality GUIs, including large pop-up windows, advertisements or posters occupying the whole screen, webpages, loading pages with a progress bar, and real scenes from the camera. Some examples can be seen in Fig. 5, and we release all of our datasets in our online gallery (https://github.com/GUIDesignResearch/GUIGAN).

Fig. 5. Examples of GUIs removed from our dataset: a GUI from a game app (a), a pop-up window (b), a large picture (c), a waiting page (d).
B. Model Implementation
As stated, we take subtrees of existing GUIs as the basic units for composing a new GUI design. A sample of the real-world data is the in-order combination of the subtrees from one single real-world GUI, with both the screenshots and the structure information from the metafile; it is mainly used as the training data for our GAN-based approach. We first split the GUIs collected from real-world Android apps on Google Play into subtrees following the procedure in Section II-A. We then train a Siamese network on this data to obtain the subtree style embeddings (Section III-A). The subtree embeddings are input to the generator of GUIGAN for modeling both the style loss (Loss_c) and the structure loss (Loss_s) when generating new subtree sequences, which are used to compose new GUI designs. The generated subtree sequences are then input to the discriminator of the SeqGAN, which is trained to discriminate real-world GUIs from generated ones. After the adversarial training, given random noise or pre-built GUI components, GUIGAN can generate a new GUI by composing subtrees from real-world GUIs.

The structure of the generator and the discriminator in SeqGAN is preserved in GUIGAN, which is implemented in PyTorch. We store the start and end subtrees of each real-world GUI in a start list and an end list, respectively. The LSTM randomly takes a subtree from the start list as the initial matrix, instead of the zero matrices used in SeqGAN. The generator then generates a sequence of length T (the default sequence length is 30, because the GUI subtree length in real-world data is mostly within 30). Starting from the first subtree in the start list, if the total height of the next spliced subtree and all previous subtrees exceeds the rated height, the subsequent splicing is stopped; if a subtree from the end list is selected, the splicing stops directly. A sketch of these stopping rules closes this subsection.

A Long Short-Term Memory (LSTM) network is used as the generative network; both the embedding dimension and the hidden-layer feature dimension are set to 32. Like the discriminative model in SeqGAN, we use a CNN with a highway architecture. The batch size is 32 and the learning rate is set to 0.05.

The Siamese network is a two-channel CNN with shared weights. A positive example is a pair of subtrees from the same app, labeled 1; a negative example pairs subtrees from two different apps, labeled 0. Each subtree image is resized to a fixed input resolution with the nearest-neighbor algorithm. The CNN consists of four Conv → Pool blocks: the first Conv layer uses 64 filters, and each subsequent Conv layer doubles the number of filters. The filter sizes vary across the CNN layers, the convolutional stride is fixed, and pooling units are applied with a stride of 2. We train the Siamese network for 50 epochs, which takes about three hours.
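A simplified sketch of the splicing loop with the two stopping rules described above (height budget and end-list hit); names like sample_next are placeholders for the LSTM sampling step, not the paper's actual API:

    import random

    def generate_sequence(start_list, end_list, sample_next,
                          rated_height, max_len=30):
        """Splice subtrees until the height budget is hit or an end subtree appears."""
        seq = [random.choice(start_list)]   # seed with a start-list subtree
        total_height = seq[0].height
        while len(seq) < max_len:
            nxt = sample_next(seq)          # the LSTM proposes the next subtree
            if total_height + nxt.height > rated_height:
                break                       # rule 1: total height budget exceeded
            seq.append(nxt)
            total_height += nxt.height
            if nxt in end_list:
                break                       # rule 2: an end subtree closes the GUI
        return seq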
V. AUTOMATED EVALUATION

In this section, we prepare GUI images collected according to the category and the development company of the app as experimental data, and test the performance of the proposed model on GUI generation. We use the real-world data as the ground truth and introduce WGAN-GP, FaceOff, and two variations of our GUIGAN as baseline methods, comparing them on the two metrics FID and 1-NNA.
A. Experimental Dataset
When preparing the experimental dataset, we consider two specific usage scenarios in app GUI development. First, most designers and developers have a clear goal of developing the GUI for an app in a specific category such as finance, education, or news. Since each app category has its own characteristics, we test our model's capability of capturing those characteristics by preparing a separate dataset for each of the five most frequent app categories in the Rico dataset [25]: News & Magazines, Books & Reference, Shopping, Communication, and Travel & Local, as shown in Table I. Second, we notice that designers and developers often refer to the GUI design styles of big companies when developing their own GUIs. Therefore, another experiment generates GUIs by learning the GUI designs from a specific company. Based on the metadata, we prepare three sets of GUIs from the three companies with the most apps in our dataset, i.e., Google, YinzCam, and Raycom, as shown in Table I.
TABLE I. GUI DATASET BY CATEGORY OR COMPANY
B. Evaluation Metrics
To quantify the similarity between the real data distribution P_r and the generated sample distribution P_g, we introduce the Fréchet Inception Distance (FID) [29] and the 1-Nearest Neighbor Accuracy (1-NNA) [30] as the evaluation metrics. FID measures the diversity and quality of generated images relative to real images, and 1-NNA analyzes the distribution differences between the two sample sets.

Fréchet Inception Distance (FID) is a widely-used metric [31], recently introduced for measuring the quality and diversity of generated images, especially those produced by GANs. FID improves upon the Inception Score (IS) by comparing the statistics of generated samples to those of real samples, using the Fréchet distance between two multivariate Gaussians:

FID(P_r, P_g) = ||µ_r − µ_g||² + Tr( C_r + C_g − 2 (C_r C_g)^{1/2} )   (7)

where µ_r and µ_g are the means of the 2048-dimensional activations of the Inception-v3 network [32] (which asymmetrically decomposes convolutions to simultaneously increase the depth and width of the ImageNet-pretrained network) for P_r and P_g respectively, while C_r and C_g are the corresponding covariances. A lower FID indicates that the two distributions are closer, meaning that the quality and diversity of the generated images are higher. In our experiments, we input the same number of real and generated images into the Inception-v3 network to obtain the FID score.

1-Nearest Neighbor Accuracy (1-NNA) is used in two-sample tests to assess whether two distributions are identical. A well-trained 1-nearest-neighbor classifier is applied: the better the generative model, the harder it is for the 1-NN classifier to distinguish true samples from false ones. Therefore, the best recognition rate is 50% and the worst is 100%; a recognition rate below 50% indicates that the model may be overfitting. This metric is widely used for evaluating the quality of generated images [33]–[35].
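A NumPy/SciPy sketch of Eq. (7) given precomputed Inception-v3 activations (obtaining the activations themselves is omitted here):

    import numpy as np
    from scipy import linalg

    def fid(act_real: np.ndarray, act_gen: np.ndarray) -> float:
        """Eq. (7): Frechet distance between Gaussians fit to activations.

        act_real, act_gen: (num_samples, 2048) Inception-v3 activations.
        """
        mu_r, mu_g = act_real.mean(axis=0), act_gen.mean(axis=0)
        c_r = np.cov(act_real, rowvar=False)
        c_g = np.cov(act_gen, rowvar=False)
        covmean = linalg.sqrtm(c_r @ c_g)    # (C_r C_g)^(1/2)
        if np.iscomplexobj(covmean):         # numerical noise can yield small
            covmean = covmean.real           # imaginary parts; drop them
        diff = mu_r - mu_g
        return float(diff @ diff + np.trace(c_r + c_g - 2.0 * covmean))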
C. Baseline Models

To the best of our knowledge, there is very little research on generating GUI designs. However, there are some related works on image generation, which is roughly similar to our task, so we use two different kinds of methods as baselines: one from image generation and the other using template search.

The first baseline is WGAN-GP [36], which is proposed on the basis of WGAN [19]. WGAN introduces the Wasserstein distance and optimizes the training procedure to address the gradient-vanishing problem of traditional GANs. WGAN-GP adds a gradient penalty to WGAN, which accelerates the convergence of the model and yields a more stable training process.

The second baseline is FaceOff [21], which parses the DOM tree of a raw input website created by a user, measures tree distances, and uses a CNN to learn style compatibility in order to find a similarly well-designed web GUI. Although FaceOff was designed to generate new web GUIs, in this paper we modify it to take a real-world mobile app GUI structure as the query, and then sort the retrieved results according to the homogeneity score of their subtree combinations.

Apart from the two baselines above, we also derive baselines from our own model by changing the multi-loss in the generator. One (GUIGAN-style) uses only the Loss_c correction, and the other (GUIGAN-structure) uses only the Loss_s correction, so that the generator focuses on either design style or structure characteristics. In this way, we can compare, within our model, how these modifications affect the generated results on the metrics.

Fig. 6. Examples of the GUIs generated by GUIGAN.

D. Results
Tables II and III show the results of the different methods on the two metrics under the category-specific and company-specific development scenarios. Our proposed model achieves the best scores on FID and 1-NNA under both scenarios: a 33.63% and 11.33% improvement in FID and 1-NNA over the best baselines on the category dataset, and a 28.23% and 14.18% improvement on the company dataset, respectively. Fig. 6 shows examples of GUI images from GUIGAN; for ease of observation, we separate different subtrees with thick red lines. It can be seen that the GUIs generated by GUIGAN have a comfortable appearance and a reasonable structure composed of different components, while keeping an overall harmonious design style. After checking many generated GUIs, we find that both the structures and styles of the GUIs are very diverse, which can provide developers or designers with different candidates for their GUI design. More generated GUIs from GUIGAN can be seen in our online gallery (https://github.com/GUIDesignResearch/GUIGAN).

The two baselines, WGAN-GP and FaceOff, do not perform well compared with GUIGAN on FID and 1-NNA. From our observation, the overall layout of GUIs generated by WGAN-GP is visible, as seen in Fig. 7 (a), but very blurred in detail. This is because WGAN-GP is a pixel-based approach that cannot accurately model the information of component-based GUIs. Although it is widely used for natural images, it is not suitable for our artificial GUI design images, especially considering that there is not much data in this study.
TABLE II. PERFORMANCE BY DIFFERENT APP CATEGORIES (FID / 1-NNA)

Category         | WGAN-GP       | FaceOff       | GUIGAN | GUIGAN-Style | GUIGAN-Structure
News & Magazines | 0.181 / 0.999 | 0.145 / 0.987 | — / —  | — / —        | — / —

TABLE III. PERFORMANCE BY DIFFERENT APP DEVELOPMENT COMPANIES (FID / 1-NNA)

Company | WGAN-GP       | FaceOff       | GUIGAN        | GUIGAN-Style | GUIGAN-Structure
Google  | 0.181 / 0.999 | 0.125 / 0.945 | 0.131 / 0.844 | 0.122 / —    | — / —
Fig. 7. Generated GUI examples by WGAN-GP (a) and FaceOff (b, c, d).

FaceOff is much better than WGAN-GP, but there are still some issues with its approach. First, FaceOff often chooses the highest-scoring subtrees to accelerate the convergence of the model, and only compares the structural similarity between the real GUIs and the retrieved template to minimize their distance, resulting in a loss of diversity. Moreover, it does not consider the relative position of each component, especially the specific top-down relationships within the GUI structure. Therefore, most GUIs generated by FaceOff have very similar structures, like that in Fig. 7 (b), and many GUIs also share the same color schema, as in Fig. 7 (c) and (d).
The other two derived baselines of our approach, GUIGAN-style and GUIGAN-structure, explore the impact of style information and structure information on the generated results. Modeling only the design style information, GUIGAN-style can generate GUI designs with harmonious color combinations, as seen in Fig. 8 (a) and (b), but without very good structural designs: for example, the menu tab appears in the middle of the GUI in Fig. 8 (a), and the login button appears at the top of the GUI in Fig. 8 (b). Similar issues apply to GUIGAN-structure, which produces reasonable and diverse layouts but poor color schemas, as seen in Fig. 8 (c) and (d). The results from these two baselines demonstrate that the two loss functions defined in Section III successfully capture the style and structure information.

Fig. 8. Generated GUI examples by GUIGAN-style (a, b) and GUIGAN-structure (c, d).

Although most of the samples generated by our model are satisfactory, there are still some bad designs. We manually inspect these bad GUI designs and summarize the reasons. First, due to the default sizes of the components in a subtree, some of them are difficult to fit into the generated GUI, as seen in Fig. 9 (a) and (b): either some components overlap, or one figure takes up all the GUI space. Second, since our model learns the style and structure information at the same time, there may be an imbalance between them for some generated GUIs. Fig. 9 (c) shows an example with too much emphasis on style consistency while ignoring the structural effects; in contrast, Fig. 9 (d) has a set of diverse components in a good structure but an incompatible color schema.

Fig. 9. Examples of bad results generated by GUIGAN.

VI. HUMAN EVALUATION
The target of this work is to automatically generate a list of GUI designs for novice designers or developers to adopt. The automated experiments above demonstrate the performance of our model compared with the baselines. However, the satisfactoriness of a GUI design can be subjective, depending on the user or developer. To better evaluate the usefulness of GUIGAN, we conduct a user study in this section to collect feedback from developers.
A. Evaluation Metrics
There are no existing evaluation metrics for mobile GUI design in the literature. Inspired by web GUI evaluation [37]–[39] and image evaluation [40], [41], we propose three metrics for participants to rate the quality of a GUI design, considering the characteristics of mobile GUIs. First, design aesthetics evaluates the overall design's pleasing qualities. Second, we adopt color harmony [42], [43], which refers to the property that certain aesthetically pleasing color combinations have, to evaluate the color schema selection within the GUI; such combinations create pleasing contrasts and consonances that are said to be harmonious. Third, structure rationality measures the rationality of the component layout, i.e., the location of components in the GUI and the logic of their combination and ordering. For each metric, the participants give a score ranging from 1 to 5, with 1 representing the least satisfaction and 5 the highest. In addition, to check whether GUIGAN implicitly considers app functionality during training, we further ask participants to manually check whether the component distribution of each generated GUI is functionally correct, e.g., the menu bar is at the top of the page. They mark 1 if the GUI components are functionally correctly distributed, and 0 otherwise.
B. Procedures
In real-world app development, teams often know their target very well. To mimic that practice, we select 5 app categories (the same as in Section V) for category-specific GUI generation. For each category, we randomly generate 10 GUI designs per method. Due to the poor performance of WGAN-GP in the previous experiment, we only take FaceOff as the baseline.

We recruited five Master students majoring in computer science. They all have several years of programming experience and at least one year of Android development experience, mostly in GUI implementation and some GUI design. Therefore, they can be regarded as junior Android developers for evaluating whether they are satisfied with our GUI designs. First, we give them a detailed explanation of the GUI evaluation metrics. Then they are provided with the generated GUI designs from the different methods and score each GUI design on the three metrics, i.e., design aesthetics, color harmony, and structure rationality. Note that they do not know which GUI design comes from which method, and all of them evaluate the GUI designs individually without any discussion. After the experiment, we tell the participants which GUI designs were generated by our model and ask them to leave general comments about GUIGAN.

C. Results
As shown in Table IV, the generated GUI designs from our model significantly outperform those of FaceOff, with average scores of 3.11, 3.30, and 3.21, i.e., 31.22%, 25.00%, and 34.87% increases in overall aesthetics, color harmony, and structure. Beyond the average scores, our model is also better than FaceOff for all five app categories on all three metrics, which demonstrates the generalization of GUIGAN. In a detailed analysis of the experiment results, the GUI designs with low scores tend to have incomplete structures, monotonous content, large and abrupt pictures, or advertisements. In contrast, GUIs with high scores have a concise layout, fairly rich content, and background-compatible images. We also find that users' requirements for content richness are much higher than for the other indicators, but this often conflicts with layout simplicity, which needs further research and balancing.

To understand the significance of the differences between the two approaches, we carry out the Mann-Whitney U test [44], which is specifically designed for small samples (only 10 GUI designs per category), on the three metrics (a usage sketch with SciPy is given below). The test results in Table IV suggest that GUIGAN contributes significantly to GUI design quality on all three metrics, with p-value < 0.01 or p-value < 0.05, except for the aesthetics and color harmony metrics in the Shopping category.

Besides the comparison with the baseline, we also present the participants another dataset mixing 10 randomly selected real-world GUI designs from our dataset with 10 randomly selected GUI designs generated by our model, and ask them to check overall GUI aesthetics, color harmony, and structure. Some generated GUIs are even rated higher than real-world ones. For Fig. 10 (a), the five user-study participants rate the real GUI at 2.8, 3.2, and 3.6 on the three metrics (aesthetics, harmony, structure) on average, versus 3, 3.4, and 3.4 for the generated one. For Fig. 10 (b), they score the real GUI at 3.2, 3.6, and 4, and the generated GUI at 3.6, 3.6, and 4, which is not far apart. There are several reasons why our generated GUIs sometimes get higher scores than real-world GUIs: (1) some generated GUIs (examples in Fig. 10) are of higher quality than some poorly designed real-world GUIs, and note that the scores are just 2.1% to 3.7% higher than those of the real-world GUIs; (2) there may be human bias, as different raters have different aesthetic values and may adopt slightly different criteria when rating GUI quality. We mitigated such potential bias by not telling the participants which GUIs were generated by our model and which came from real-world apps.

The results for functionality can also be seen in Table IV: the average score of 0.812 is 79.65% higher than that of the baseline, a significant difference. This shows that the component distribution of most of our generated GUIs is functionally correct. It also indicates that our model implicitly captures app functionality during training on a large-scale dataset, as GUI design is highly related to app functionality, even though our model does not explicitly consider it. Additionally, we find that the performance of our model differs across app categories.
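A minimal sketch of the significance test with SciPy, using hypothetical score arrays for one category and metric:

    from scipy.stats import mannwhitneyu

    # Hypothetical per-GUI scores (10 designs each) for one category/metric.
    faceoff_scores = [2.2, 2.0, 2.4, 1.8, 2.6, 2.2, 2.0, 2.4, 2.2, 1.8]
    guigan_scores  = [3.2, 3.0, 3.4, 2.8, 3.6, 3.0, 3.2, 3.4, 3.0, 2.8]

    # Two-sided Mann-Whitney U test, suitable for small unpaired samples.
    stat, p_value = mannwhitneyu(faceoff_scores, guigan_scores,
                                 alternative="two-sided")
    print(f"U = {stat}, p = {p_value:.4f}")  # reject H0 if p < 0.05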
TABLE IV. PERFORMANCE OF HUMAN EVALUATION. ** DENOTES p < 0.01 AND * DENOTES p < 0.05

Category          Metric         FaceOff  GUIGAN
News & Magazines  aesthetics     2.08     2.—**
                  harmony        2.54     3.—**
                  structure      2.22     3.—**
                  functionality  0.38     —**
Books & Reference aesthetics     2.40     3.—**
                  harmony        2.46     3.—**
                  structure      2.40     3.—**
                  functionality  0.40     —**
Shopping          aesthetics     2.66     3.02
                  harmony        3.04     3.18
                  structure      2.52     3.—*
                  functionality  0.60     0.78
Communication     aesthetics     2.56     3.—*
                  harmony        2.86     3.—*
                  structure      2.60     3.—*
                  functionality  0.42     —**
Travel & Local    aesthetics     2.14     3.—**
                  harmony        2.30     3.—**
                  structure      2.16     3.—**
                  functionality  0.46     —**
Average           aesthetics     2.37     3.11**
                  harmony        2.64     3.30**
                  structure      2.38     3.21**
                  functionality  0.452    0.812**
Fig. 10. Example pairs from Shopping (a) and Travel & Local (b). In each pair, the first image is a real-world GUI while the second one is generated by GUIGAN.

VII. RELATED WORK
GUIs are crucial for the user experience of modern desktop software, mobile applications, and online websites. In this section, we introduce related work on GUI design and GUI generation.
A. GUI Design
GUI design is an important step in GUI development. Therefore, many researchers work on assisting designers with GUI design, for example by investigating UI design patterns [45], color evolution [46], [47], UI-related user reviews [48], [49], GUI code generation [13], and website GUI generation [21]. Liu et al. [50] follow the design rules from Material Design to annotate mobile GUI designs with their semantics. Swearngin et al. [51] adopt image processing methods to help designers convert mobile UI screenshots into editable files in Photoshop, so that designers can take them as a starting point for further customization. To render inspiration to designers, Chen et al. [52] propose a program-analysis method to efficiently generate a storyboard of UI screenshots, given one app executable file. Fischer et al. [53] transfer style from fine art to GUIs. Chen et al. [54] study different GUI element detection methods on large-scale GUI data and develop UIED [55] to handle diverse and complicated GUI images. Other supporting works, such as GUI tag prediction [56] and GUI component gallery construction [57], can enhance designers' search efficiency.

All of these works target simplifying the design process for professional designers. In contrast, our method focuses on the initial stage of GUI design, i.e., generating diverse GUI designs to give inspiration to novice designers and developers who do not have much GUI design training. Through the GAN method in deep learning, our model learns the design styles and structural characteristics of existing GUIs to generate diversified new GUIs for designers' reference, so as to lower the GUI design barrier.
B. GUI Generation
Thanks to the rapid development of deep learning, image generation performance has been further improved, especially by the Generative Adversarial Network (GAN) [22] and its derivative models. Apart from natural image generation, there are also many works on re-arranging elements to compose graphic designs with better layouts (especially semantic layouts). Sandhaus et al. [58] present an approach for the automatic layout of photo compositions that incorporates knowledge about aesthetic design principles. Yang et al. [43] analyze low-level image features and apply high-level aesthetic design principles and predefined templates to given images and texts, thus automatically suggesting the optimal template, text locations, and colors. Vempati et al. [59] utilize a Mask R-CNN object detector to automatically annotate the required objects/tags and a genetic algorithm to generate an optimal advertisement layout for the given image content, input components, and other design constraints; a ranking model trained on historical banners ranks the generated creatives by predicting their Click-Through Rate (CTR). Li et al. [60] introduce a progressive generative model of image extrapolation with three stages and two important sub-tasks.

In the field of text-to-image synthesis, Hinz et al. [61] introduce Semantic Object Accuracy (SOA) to evaluate images given an image caption. LayoutGAN is proposed by Li et al. [62] for graphic design and scene generation, introducing wireframe rendering for image discrimination; its generator takes a set of vectors as input and uses self-attention modules to refine their labels and geometric parameters jointly. Jyothi et al. [63] propose a variational autoencoder-based method called LayoutVAE, which can generate full image layouts given a label set, or per-label layouts for an existing image given a new label, and can detect unusual layouts. Some works generate GUI test cases for checking GUI usability [64]–[66], accessibility [67], and security [68].

Unlike these works on generating the layout of graphic designs such as posters and advertisements, we are the first to work on GUI design generation. Different from their task of arranging given components, our task is more challenging: selecting components from a repository and composing them into a good GUI design while taking both design style and structure information into consideration. Therefore, we develop a novel approach for modeling that information.

VIII. CONCLUSION
Designing a good GUI, which requires much innovation and creativity, is difficult even for well-trained designers. In this paper, we propose a GAN-based GUI design generation method, GUIGAN, which can assist novice designers and developers by generating new GUIs learned from existing app GUI screenshots. The generated GUI designs can be regarded as starting points or inspiration for their design work. We decompose the filtered GUIs to form a large-scale subtree repository, then feed these subtrees to our model to generate reasonable one-dimensional sequences, which are further used for recomposition. Two additional corrections are added to the generator of our model to improve it in the aspects of design style and structural composition. The automated experiments demonstrate the performance of our model, and the user study confirms the usefulness of GUIGAN.

To improve the generated GUIs, we will proceed in two ways. First, we will improve our model to generate GUIs of higher quality: we can summarize a list of issues with the current model by carrying out a detailed analysis of bad GUI generations on the current data, improve the model accordingly, and also add a list of rules to post-process generated GUIs, e.g., the menu bar should be at the top of the GUI. Second, we will build an AI-human collaboration system, i.e., the GUIs generated by our model are only used to inspire developers and designers, who can further select or customize GUIs according to their purposes. Besides, our model can be used to complete designers' or developers' partial GUI designs into full ones by leveraging pre-built GUI components and controls, though this can be further improved; some examples can be seen in our online gallery (https://github.com/GUIDesignResearch/GUIGAN/blob/master/README.md). We also plan to combine current GUI code generation works [13], [15], [69] with our GUI design generation to fully automate GUI development.

IX. ACKNOWLEDGEMENTS

This work is supported in part by the National Natural Science Foundation of China (Grant 61471181). Chunyang Chen is partially supported by a Facebook research award.

REFERENCES
[1] B. J. Jansen, "The graphical user interface," ACM SIGCHI Bulletin, vol. 30, no. 2, pp. 22–26, 1998.
[2] T. Winograd, "From programming environments to environments for designing," Communications of the ACM, vol. 38, no. 6, pp. 65–74, 1995.
[3] "Essential design principles," https://developer.apple.com/videos/play/wwdc2017/802/, 2017.
[4] "Design - material design," https://material.io/design/, 2014.
[5] W. O. Galitz, The Essential Guide to User Interface Design: An Introduction to GUI Design Principles and Techniques. John Wiley & Sons, 2007.
[6] I. G. Clifton, Android User Interface Design: Implementing Material Design for Developers. Addison-Wesley Professional, 2015.
[7] Y. W. B. Hong, "Matters of design," in Commun. ACM; arXiv preprint arXiv:1901.00891, 2019.
[12] F. Behrang, S. P. Reiss, and A. Orso, "GUIFetch: Supporting app design and development through GUI search," in Proceedings of the 5th International Conference on Mobile Software Engineering and Systems. ACM, 2018, pp. 236–246.
[13] C. Chen, T. Su, G. Meng, Z. Xing, and Y. Liu, "From UI design image to GUI skeleton: A neural machine translator to bootstrap mobile GUI implementation," in Proceedings of the 40th International Conference on Software Engineering. ACM, 2018, pp. 665–676.
[14] T. Beltramelli, "pix2code: Generating code from a graphical user interface screenshot," in Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems. ACM, 2018, p. 3.
[15] K. Moran, C. Bernal-Cárdenas, M. Curcio, R. Bonett, and D. Poshyvanyk, "Machine learning-based prototyping of graphical user interfaces for mobile apps," arXiv preprint arXiv:1802.02312, 2018.
[16] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv preprint arXiv:1511.06434, 2015.
[17] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther, "Autoencoding beyond pixels using a learned similarity metric," in International Conference on Machine Learning. PMLR, 2016, pp. 1558–1566.
[18] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232.
[19] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein GAN," arXiv preprint arXiv:1701.07875, 2017.
[20] L. Yu, W. Zhang, J. Wang, and Y. Yu, "SeqGAN: Sequence generative adversarial nets with policy gradient," in Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[21] S. Zheng, Z. Hu, and Y. Ma, "FaceOff: Assisting the manifestation design of web graphical user interface," in Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. ACM, 2019, pp. 774–777.
[22] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[23] J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah, "Signature verification using a 'siamese' time delay neural network," in Advances in Neural Information Processing Systems, 1994, pp. 737–744.
[24] G. Koch, R. Zemel, and R. Salakhutdinov, "Siamese neural networks for one-shot image recognition," in ICML Deep Learning Workshop, vol. 2, 2015.
[25] B. Deka, Z. Huang, C. Franzen, J. Hibschman, D. Afergan, Y. Li, J. Nichols, and R. Kumar, "Rico: A mobile app dataset for building data-driven design applications," in Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology. ACM, 2017, pp. 845–854.
[26] J. Chen, C. Chen, Z. Xing, X. Xia, and J. Wang, "Wireframe-based UI design search through image autoencoder," ACM Transactions on Software Engineering and Methodology, vol. 29, no. 3, pp. 1–31, 2020.
[27] A. Kendall, Y. Gal, and R. Cipolla, "Multi-task learning using uncertainty to weigh losses for scene geometry and semantics," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7482–7491.
[28] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[29] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, "GANs trained by a two time-scale update rule converge to a local Nash equilibrium," in Advances in Neural Information Processing Systems, 2017, pp. 6626–6637.
[30] D. Lopez-Paz and M. Oquab, "Revisiting classifier two-sample tests," in International Conference on Learning Representations, 2017.
[31] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, "Improved techniques for training GANs," in Advances in Neural Information Processing Systems, 2016, pp. 2234–2242.
[32] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
[33] G. Yang, X. Huang, Z. Hao, M.-Y. Liu, S. Belongie, and B. Hariharan, "PointFlow: 3D point cloud generation with continuous normalizing flows," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 4541–4550.
[34] S. Zhang, Z. Han, Y.-K. Lai, M. Zwicker, and H. Zhang, "Stylistic scene enhancement GAN: Mixed stylistic enhancement generation for 3D indoor scenes," The Visual Computer, vol. 35, no. 6-8, pp. 1157–1169, 2019.
[35] Q. Xu, G. Huang, Y. Yuan, C. Guo, Y. Sun, F. Wu, and K. Weinberger, "An empirical study on evaluation metrics of generative adversarial networks," 2018.
[36] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, "Improved training of Wasserstein GANs," in Advances in Neural Information Processing Systems, 2017.
[37] K. Reinecke, T. Yeh, L. Miratrix, R. Mardiko, Y. Zhao, J. Liu, and K. Z. Gajos, "Predicting users' first impressions of website aesthetics with a quantification of perceived visual complexity and colorfulness," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2013, pp. 2049–2058.
[38] C. K. Coursaris, S. J. Swierenga, and E. Watrall, "An empirical investigation of color temperature and gender effects on web aesthetics," Journal of Usability Studies, vol. 3, no. 3, pp. 103–117, 2008.
[39] J. Li, J. Yang, J. Zhang, C. Liu, and T. Xu, "Attribute-conditioned layout GAN for automatic graphic design," IEEE Transactions on Visualization and Computer Graphics, vol. PP, no. 99, pp. 1–1, 2020.
[40] H. Zhang, J. E. Fritts, and S. A. Goldman, "Image segmentation evaluation: A survey of unsupervised methods," Computer Vision and Image Understanding, vol. 110, no. 2, pp. 260–280, 2008.
[41] Z. Wang, G. Healy, A. F. Smeaton, and T. E. Ward, "Use of neural signals to evaluate the quality of generative adversarial network performance in facial image generation," Cognitive Computation, vol. 12, no. 1, pp. 13–24, 2020.
[42] M. Tokumaru, N. Muranaka, and S. Imanishi, "Color design support system considering color harmony," vol. 1. IEEE, 2002, pp. 378–383.
[43] X. Yang, T. Mei, Y.-Q. Xu, Y. Rui, and S. Li, "Automatic generation of visual-textual presentation layout," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 12, no. 2, pp. 1–22, 2016.
[44] M. P. Fay and M. A. Proschan, "Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules," Statistics Surveys, vol. 4, p. 1, 2010.
[45] K. Alharbi and T. Yeh, "Collect, decompile, extract, stats, and diff: Mining design pattern changes in Android apps," in Proceedings of the 17th International Conference on Human-Computer Interaction with Mobile Devices and Services. ACM, 2015, pp. 515–524.
[46] A. Jahanian, S. Keshvari, S. Vishwanathan, and J. P. Allebach, "Colors–messengers of concepts: Visual design mining for learning color semantics," ACM Transactions on Computer-Human Interaction (TOCHI), vol. 24, no. 1, p. 2, 2017.
[47] A. Jahanian, P. Isola, and D. Wei, "Mining visual evolution in 21 years of web design," in Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems. ACM, 2017, pp. 2676–2682.
[48] B. Fu, J. Lin, L. Li, C. Faloutsos, J. Hong, and N. Sadeh, "Why people hate your app: Making sense of user feedback in a mobile app store," in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2013, pp. 1276–1284.
[49] W. Martin, F. Sarro, Y. Jia, Y. Zhang, and M. Harman, "A survey of app store analysis for software engineering," IEEE Transactions on Software Engineering, vol. 43, no. 9, pp. 817–847, 2017.
[50] T. F. Liu, M. Craft, J. Situ, E. Yumer, R. Mech, and R. Kumar, "Learning design semantics for mobile apps," in The 31st Annual ACM Symposium on User Interface Software and Technology. ACM, 2018, pp. 569–579.
[51] A. Swearngin, M. Dontcheva, W. Li, J. Brandt, M. Dixon, and A. J. Ko, "Rewire: Interface design assistance from examples," in Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 2018, p. 504.
[52] S. Chen, L. Fan, C. Chen, T. Su, W. Li, Y. Liu, and L. Xu, "StoryDroid: Automated generation of storyboard for Android apps," in Proceedings of the 41st International Conference on Software Engineering. ACM, 2019.
[53] M. Fischer, R. R. Yang, and M. S. Lam, "ImagineNet: Style transfer from fine art to graphical user interfaces," 2018.
[54] J. Chen, M. Xie, Z. Xing, C. Chen, X. Xu, L. Zhu, and G. Li, "Object detection for graphical user interface: Old fashioned or deep learning or a combination?" in Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020, pp. 1202–1214.
[55] M. Xie, S. Feng, Z. Xing, J. Chen, and C. Chen, "UIED: A hybrid tool for GUI element detection," in ESEC/FSE, 2020.
[56] C. Chen, S. Feng, Z. Liu, Z. Xing, and S. Zhao, "From lost to found: Discover missing UI design semantics through recovering missing tags," Proceedings of the ACM on Human-Computer Interaction, vol. 4, no. CSCW2, pp. 1–22, 2020.
[57] C. Chen, S. Feng, Z. Xing, L. Liu, S. Zhao, and J. Wang, "Gallery D.C.: Design search and knowledge discovery through auto-created GUI component gallery," Proceedings of the ACM on Human-Computer Interaction, vol. 3, no. CSCW, pp. 1–22, 2019.
[58] P. Sandhaus, M. Rabbath, and S. Boll, "Employing aesthetic principles for automatic photo book layout," in International Conference on Multimedia Modeling. Springer, 2011, pp. 84–95.
[59] S. Vempati, K. T. Malayil et al., "Enabling hyper-personalisation: Automated ad creative generation and ranking for fashion e-commerce," arXiv preprint arXiv:1908.10139, 2019.
[60] Y. Li, L. Jiang, and M.-H. Yang, "Controllable and progressive image extrapolation," arXiv preprint arXiv:1912.11711, 2019.
[61] T. Hinz, S. Heinrich, and S. Wermter, "Semantic object accuracy for generative text-to-image synthesis," arXiv preprint arXiv:1910.13321, 2019.
[62] J. Li, J. Yang, A. Hertzmann, J. Zhang, and T. Xu, "LayoutGAN: Synthesizing graphic layouts with vector-wireframe adversarial networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
[63] A. A. Jyothi, T. Durand, J. He, L. Sigal, and G. Mori, "LayoutVAE: Stochastic scene layout generation from a label set," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 9895–9904.
[64] D. Zhao, Z. Xing, C. Chen, X. Xu, L. Zhu, G. Li, and J. Wang, "Seenomaly: Vision-based linting of GUI animation effects against design-don't guidelines," in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020.
[65] Y. Bo, X. Zhenchang, X. Xin, C. Chunyang, Y. Deheng, and L. Shanping, "Don't do that! Hunting down visual design smells in complex UIs against design guidelines," in The 43rd International Conference on Software Engineering, 2021.
[66] Z. Liu, C. Chen, J. Wang, Y. Huang, J. Hu, and Q. Wang, "OwlEyes: Spotting UI display issues via visual understanding," in Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering. IEEE, 2020, pp. 398–409.
[67] J. Chen, C. Chen, Z. Xing, X. Xu, L. Zhu, G. Li, and J. Wang, "Unblind your apps: Predicting natural-language labels for mobile GUI components by deep learning," in Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, ser. ICSE '20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 322–334.
[68] S. Chen, L. Fan, C. Chen, M. Xue, Y. Liu, and L. Xu, "GUI-squatting attack: Automated generation of Android phishing apps," IEEE Transactions on Dependable and Secure Computing, 2019.
[69] F. Sidong, M. Suyu, Y. Jinzhong, C. Chunyang, Z. TingTing, and Y. Zhen, "Auto-Icon: An automated code generation tool for icon designs assisting in UI development," in 26th International Conference on Intelligent User Interfaces, 2021.