[PDF] Design and Evaluation of Product Aesthetics: A Human-Machine Hybrid Approach

Abstract

Aesthetics are critically important to market acceptance in many product categories. In the automotive industry in particular, an improved aesthetic design can boost sales by 30% or more. Firms invest heavily in designing and testing new product aesthetics. A single automotive "theme clinic" costs between $100,000 and $1,000,000, and hundreds are conducted annually. We use machine learning to augment human judgment when designing and testing new product aesthetics. The model combines a probabilistic variational autoencoder (VAE) and adversarial components from generative adversarial networks (GAN), along with modeling assumptions that address managerial requirements for firm adoption. We train our model with data from an automotive partner-7,000 images evaluated by targeted consumers and 180,000 high-quality unrated images. Our model predicts well the appeal of new aesthetic designs-38% improvement relative to a baseline and substantial improvement over both conventional machine learning models and pretrained deep learning models. New automotive designs are generated in a controllable manner for the design team to consider, which we also empirically verify are appealing to consumers. These results, combining human and machine inputs for practical managerial usage, suggest that machine learning offers significant opportunity to augment aesthetic design.

Full PDF

Design and Evaluation of Product Aesthetics: A Human-Machine Hybrid Approach by Alex Burnap John R. Hauser and Artem Timoshenko July 2019 Alex Burnap is a Postdoctoral Fellow in Marketing, MIT Management School, Massachusetts Institute of Technology, E62-366, 77 Massachusetts Avenue, Cambridge, MA 02139, (405) 880-3660, [email protected]. John R. Hauser is the Kirin Professor of Marketing, MIT Management School, Massachusetts Institute of Technology, E62-538, 77 Massachusetts Avenue, Cambridge, MA 02139, (617) 253-2929, [email protected]. Artem Timoshenko is an Assistant Professor of Marketing at Kellogg School of Management, Northwestern University, 2211 Campus Drive, Suite 5391, Evanston, IL 60208, (617) 803-5630, [email protected]. We thank Jeff Hartley, Joyce Salisbury, Zheng Shen, Sharon Sheremet, John Manoogian II, and Andrew Norton for valuable insights into how product aesthetics are designed and evaluated; Mark Beltramo, Fred Feinberg, Ari Helljaka, Honglak Lee, Ye Liu, and Yanxin Pan for mathematical modeling discussion; and Emrah Bayrak, Songting Dong, Nasreddine El-Dehaibi, Siham El Kilal, Gui Liberali, Ye Liu, Max Yi Ren, Erin MacDonald, and Glen Urban for helpful comments and suggestions.

Design and Evaluation of Product Aesthetics: A Human-Machine Hybrid Approach

Abstract

Aesthetics are critically important to market acceptance in many product categories. In the automotive industry in particular, an improved aesthetic design can boost sales by 30% or more. Firms invest heavily in designing and testing new product aesthetics. A single automotive “theme clinic” costs between $100,000 and $1,000,000, and hundreds are conducted annually. We use machine learning to augment human judgment when designing and testing new product aesthetics. The model combines a probabilistic variational autoencoder (VAE) and adversarial components from generative adversarial networks (GAN), along with modeling assumptions that address managerial requirements for firm adoption. We train our model with data from an automotive partner—7,000 images evaluated by targeted consumers and 180,000 high-quality unrated images. Our model predicts well the appeal of new aesthetic designs—38% improvement relative to a baseline and substantial improvement over both conventional machine learning models and pretrained deep learning models. New automotive designs are generated in a controllable manner for the design team to consider, which we also empirically verify are appealing to consumers. These results, combining human and machine inputs for practical managerial usage, suggest that machine learning offers significant opportunity to augment aesthetic design. Keywords:

Aesthetics, Generative Adversarial Networks, Generating New Products , Machine Learning, Prelaunch Forecasting, Product Development , Variational Autoencoders.

1. Introduction

Consumers consistently rank aesthetics among the three most important attributes in product choice (Bloch 1995; Creusen and Schoormans 2005). For example, the visual design of the original iPod was judged to be a critical factor in its market acceptance (Reppel, Szmigin, and Gruber 2006). In categories such as home appliances, aesthetics help firms establish product differentiation beyond functional attributes (Bloch 1995; Crilly, Moultrie, and Clarkson 2004; Person et al. 2007); for instance, the Dyson DC01 used transparent design to communicate its complexity to consumers, helping it become the best-selling vacuum in the U.K. (Noble and Kumar 2010). Firms use aesthetics to strategically position and enhance brand recognition (Aaker and Keller 1990; Karjalainen and Snelders 2010; Keller 2003); trade dress violations (non-functional attributes that signal brand identity) are hotly contested in Lanham Act (§43A) litigation. Aesthetics pervade marketing—visually-appealing products and packaging drive consumers to choose one product over another, especially at the point of purchase in crowded brick-and-mortar stores, supermarkets, and online retailers (Clement 2007; Orth and Malkewitz 2008). Developing product aesthetics can require substantial investment, yet returns on investment are found across markets—a study of 93 firms across nine product categories found that firms that heavily invested into aesthetic design had 32% higher earnings than industry averages (Hertenstein, Platt, and Veryzer 2005). Marketing and product managers routinely manage the aesthetic design of products, services, and digital marketplaces. In this paper, we propose a methodology to improve the process of aesthetic product design and testing. While the basic concepts of the proposed methodology are applicable across markets, we demonstrate the approach in the automotive industry. In the automotive industry, product aesthetics can explain up to 60% of uncertainty in purchase decision for certain segments (Kreuzbauer and Malter 2005). Automotive design significantly affects its market performance (Cho, Hasija, and Sosa 2015; Jindal et al. 2016; Rubera 2015), in large part by influencing consumer consideration (Liu et al. 2017; Palazzolo and Feinberg 2015). For example, the redesign of the Buick Enclave in 2008 commanded a 30% increase in MSRP over the Buick Rendezvous it replaced, and the redesign of the 2005 Volkswagen Beetle resulted in a 54% market share gain in just a single year (Kreuzbauer and Malter 2005; Blonigen, Knittel, and Soderbery 2013). On the other hand, the visual appeal of the 2001 Pontiac Aztec was cited as a reason for its lack of market success (Vlasic 2011). Not surprisingly, automotive firms invest heavily in design and redesign—$1.25 billion on average with up to $5.7 billion invested in critical designs (Blonigen, Knittel, and Soderbery 2013; Pauwels et al. 2004; Rubera 2015). Traditionally, human judgment enters the aesthetic product design pipeline in at least two ways. While there are established aesthetic heuristics and cognitive design principles (Coates 2003; Crilly, Moultrie, and Clarkson 2004; Norman 2004), aesthetic design is often generated and screened by design teams who have an “eye” for visual design. Design teams are powerful within organizations; their aesthetic judgments are hard to overrule (Vlasic 2011). Human judgment also affects aesthetics through consumer evaluations. Firms often ask consumers to evaluate alternative designs in laboratory test markets, A/B tests, or “theme clinics.” In a typical automotive theme clinic, a few hundred targeted consumers are recruited and brought to a central location to evaluate aesthetic designs. Consumers view the aesthetic designs and rate them on established benchmarks such as semantic scales for sporty, appealing, innovative, and luxurious (Coates 2003; Manoogian II 2013). Theme clinics are costly. Automotive firms typically invest between $100,000 and $1,000,000 per theme clinic for a single new vehicle design. With multiple aesthetic designs per vehicle and over a hundred vehicles in its worldwide product line, General Motors alone spends tens of millions of dollars on theme clinics. With the additional hidden costs incurred when designers routinely and manually screen hundreds of aesthetic designs to narrow down alternatives for a theme clinic, costs can exceed $100 million for a single manufacturer. In this paper, we augment human judgment with machine learning tools to address both aspects of aesthetic design: (1) the generation and (2) the testing of new aesthetic designs. For testing, we predict how consumers would rate product aesthetics directly from visual images. Specifically, we use an encoding model to represent visual designs (images) using a smaller number of 500 “features,” and train a predictive model that predicts aesthetic ratings based on those features. We use the predictive model to screen newly proposed aesthetic designs so that only the highest-potential designs are tested in theme clinics (or their equivalent for non-automotive applications). For generation, we create new designs to inspire the design team. We develop a generative model that creates new product designs that are predicted to be aesthetically appealing while satisfying product attributes desired by the designers (e.g., “Cadillac-like”). This gives the design team a controllable tool which they may use to morph through the design space. While some images are nonsensical, other generated images inspire creative aesthetic design by bending product forms. Our focus is augmentation of human expertise and creativity, rather than its automation . Our focus addresses several key managerial issues related to integrating machine intelligence into existing human workflows. This combination of machine learning and human judgment (machine-human hybrid) leads to better designs and a reduction in the costs of design generation and testing.

2. Conceptual Model of Aesthetic Product Design

Figure 1 summarizes the current design process and the proposed role of machine learning augmentation. Consider first the current process as shown in the first two rows (coded as black). The process begins with a market definition that is external to the aesthetic design efforts. For example, Apple targeted smartphones with a touchscreen; Zenni Optical aimed to develop prescription sports glasses; IKEA targeted affordable yet aesthetically pleasing furniture. In automotive markets, firms target particular segments such as luxury compact utility vehicles (currently the Cadillac XT5, Buick Envision, Volvo XC60, BMW X3, and others). Market definition provides soft constraints on aesthetics based on the targeted consumers and the firm’s capabilities (Box 1). Aesthetic designers then create hundreds to thousands of freehand sketches that are converted to 2D images (Box 2). For example, Dyson and General Motors may generate several hundred sketches per new product model, while IKEA may generate fewer given its product line variety and turnover (Bouchard, Aoussat, and Duchamp 2006; Toffoletto 2013). The human design team next screens potential designs to a smaller set of testable design concepts in a process known as “downselecting” (Box 3). Consumers evaluate the remaining designs in theme clinics resulting in more screening. Successful designs are advanced downstream for further development, including engineering, manufacturing, and marketing communications (advertising, social media, websites, etc.). The process is highly iterative and asynchronous across both generation and testing; our augmentation applies to all iterations.

Figure 1.

Augmenting Aesthetic Design with Machine Learning

The red trapezoids and arrows highlight the machine learning augmentation. After the machine learning models are trained, we use them to augment both design concept testing and generation. In testing, we evaluate design concept images using the predictive model to help eliminate designs likely to score low in theme clinics. This enables faster iteration as the design team focuses on the remaining high-potential images. By focusing on high-potential images, theme clinics are more efficient and effective. The resulting aesthetic designs are likely more profitable when introduced to the market. The firm benefits from shorter product development times by more rapidly sending final designs to downstream firm stages. Cost reductions result from the corresponding lower product “drop rate” (i.e., the share of design concepts later terminated in downstream firm stages) (Cooper 1990; Danneels and Kleinschmidtb 2001). As a final added benefit, rigorous quantitative evaluation of aesthetics helps “shield” designs from downstream changes driven by engineering, manufacturing, or accounting (Hartley 1996; Manoogian II 2013; Vlasic 2011). The generative model creates many designs that are realistic and aesthetically appealing, and can stretch designs in ways that spark creativity among designers (Coates 2003; Martindale 1990). The designers can control the generative model through specifying designer attributes. Example attributes are ‘red,’ ‘Cadillac-like,’ ‘Sport Utility Vehicle (SUV),’ ‘2015 vintage,’ or ‘viewed from the side.’ (The vintage variable is important when training the model. It captures the changing nature of fashion in the automotive industry (Hekkert, Snelders, and Wieringen 2003; Martindale 1990). This intended use is motivated by our experience with real design teams—human creativity is augmentable but currently nowhere near automatable. Designers must innovate product designs while anticipating changes in consumer demand; a challenging task given the often significant time it takes to develop a new product, e.g., Adidas apparel 12-18 months, GM vehicles 36+ months (

Adidas AG 2017 Annual Report

Both the predictive model and the generative model require that we address critical challenges in working with product aesthetics images and with the traditional aesthetic design pipeline. First, images are inherently high dimensional. Even modest quality images are 100 x 100 pixels for each of red, green, and blue colors together comprising 30,000 variables—far too many to be input to conventional choice models. Previous work has instead primarily represented aesthetics in choice models using characteristic lines (Chan, Mihm, and Sosa 2018; Ranscombe et al. 2012), landmark points (Landwehr, Labroo, and Herrmann 2011), silhouettes (Orsborn, Cagan, and Boatwright 2009; Reid, Gonzalez, and Papalambros 2010), and Bezier curves (Kang et al. 2016); or aggregated numbers such as J.D. Power APEAL or online reviews (Cho, Hasija, and Sosa 2015; Homburg, Schwemmle, and Kuehnl 2015; Jindal et al. 2016; Pauwels et al. 2004). We focus on working with images, the industry standard, due to their realism in representing the design. Design concept realism greatly affects consumer evaluation and enhances accuracy and precision in attribute-based product evaluation and forecasting (Dotson et al. 2016; Reid, MacDonald, and Du 2013; Hauser, Eggers, and Selove 2019). Choice modeling with images has only recently been enabled by deep learning, including relevant work in aesthetic prediction (Pan et al. 2017; Vasileva et al. 2018) and aesthetic generation (Kang et al. 2017; Sbai et al. 2019). Our work fundamentally differs from these works as we tackle key managerial issues required for adoption within companies; specifically, limited data sizes, and modeling decisions that augment rather than automate existing human workflows within firms. Second, gathering consumer evaluations is costly and this results in limited training data. In our application, we are fortunate to have 7,000 aesthetic ratings by consumers for vehicles, but even that would be insufficient to estimate a predictive model with 30,000 variables. To address limitations on the size of the training sample, we need to either (a) reduce the number of explanatory variables using dimensionality reduction or (b) increase the amount of training data to make the predictive model feasible. We do both by training an encoding model to embed images in a low-dimensional space. Embeddings reduce the dimensionality, and allow us to combine the relatively small (and expensive for the firm) sample of labelled training data (images with aesthetic ratings) with a much larger sample of unlabeled training data (images without ratings). Success depends upon whether the dimensionality reduction preserves the important information from the full images while still allowing us to predict human aesthetic judgment and generate perceptually realistic new designs. Embeddings using a neural network have seen recent adoption in marketing science; for example, Timoshenko and Hauser (2019) embed textual data to identify consumer needs; Sun, Ghose, and Liu (2018) embed consumer purchasing behavior; Liu, Lee, and Srinivasan (2018) embed product reviews to predict sales conversion; Zhang et al. (2017) embed Airbnb photos to predict demand; Liu, Dzyabura, and Mizik (2017) embed social media images to predict identity; Chakaborthy, Kim and Sudhir (2019) embed reviews to identify sentiment and missing evaluations. In our case, good embeddings are those which reduce dimensionality to low-dimensional “features” while still allowing us to rate product aesthetics and generate realistic new designs. (A “feature” in our context is an abstract representation inherent to embeddings. It is not an attribute such as “round headlight” or “an 8 o slope.”) Third, aesthetic evaluations are holistic (Berlyne 1971; Bloch 1995; Crilly, Moultrie, and Clarkson 2004; Martindale 1990). Aesthetic images cannot be decomposed as is done in marketing-science methods such as conjoint analysis (Orme and Chrzan 2017). Design aspects are interdependent; we cannot expect consumers to evaluate design aspects separately. For example, even for pre-defined aesthetic aspects, a consumer cannot evaluate the aesthetics of a new BMW X3 design as an additive sum of the shape and position of headlights, the slope of the hood, and the height of the beltline. Rather, the Gestalt interplay of all design elements, including subtle elements such as the “Hoffmeister kink,” drive consumer evaluations of qualitative attributes such as appealing, sporty, aggressive, luxurious, or modern (Coates 2003). The same interdependence is true for embedded features, thus we use deep neural networks for the encoder, predictive, and generative models to capture complex non-linear interactions among the reduced embedded “features.” Fourth, the design process is highly iterative and asynchronous. Multiple design teams (and sub-teams) in both design concept testing and generation must be able to use the machine learning augmentation for their corresponding roles. For example, the design team may split into sub-teams to work on several promising design concepts in parallel, while the current theme clinics may be testing entirely different design concepts. To enhance parallel development, we require an ability to separate the predictive and generative models into standalone components. At the same time, we require the ability to combine these models for usage and training.

3. Overview of a Machine Learning Approach to Augment Aesthetic Product Design

Let 𝑋 𝑖 be the (height x width x color) matrix of pixels of image 𝑖 . The proposed approach requires two inputs: product images labeled with aesthetic ratings evaluated by consumers, {(𝑋 𝑖 , 𝑦 𝑖 )| 𝑖 = 1. . 𝑁}, and unlabeled product images without ratings but with product attributes, {(𝑋 𝑗 , 𝑎⃗ 𝑗 )| 𝑗 = 1. . 𝑀} . For example, 𝑦 𝑖 might be the average consumer evaluation of the aesthetic appeal of product design 𝑖 , and 𝑎⃗ 𝑗 may be attributes (if known) such as color and brand. In a typical application, the firm would only have consumer evaluations for a small fraction of images, 𝑁 ≪ 𝑀.

Our two high-level goals are to test new product aesthetics by predicting consumer ratings, 𝑦̂ 𝑛𝑒𝑤 , for new product images created by the design team, 𝑋 𝑛𝑒𝑤 , as well as generate new product designs, 𝑋̂ 𝑔𝑒𝑛 , according to attributes desired by the design team, 𝑎⃗ 𝑔𝑒𝑛 , and with predicted aesthetic ratings, 𝑦̂ 𝑔𝑒𝑛 , that indicate consumer acceptance. Figure 2.

Proposed Machine Learning Augmentation Approach Figure 2 summarizes the general input and output flow of the proposed machine learning augmentation for aesthetic design. For every product design 𝑖 , the encoding model in Figure 2 (red trapezoid on the left) uses a function, 𝑞(ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑦 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗) to embed the 196,608-dimensional image, 𝑋 𝑖 , its aesthetic rating (if known), 𝑦 𝑖 , and its attributes (if known), 𝑎⃗ 𝑖 , to a 512-dimensional embedding, ℎ⃗⃗ 𝑖 . Note the 196,608-dimensional images are due to being 256 x 256 x 3 (height x width x color), while the 512-dimensional embedding size is chosen by us. The embedding (purple square in the middle of Figure 2) makes it tractable to use the predictive model, 𝑝 𝑝𝑟𝑒𝑑 (𝑦̂ 𝑖 | ℎ⃗⃗ 𝑖 , 𝛽⃗) , (green trapezoid near the top of Figure 2) to predict ratings for new images and for generated images. The embedding also makes it tractable to use the generative model, 𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗) , (blue trapezoid on the right of Figure 2) to create new high-potential images, 𝑋 𝑔𝑒𝑛 . When generating images, we sample ℎ⃗⃗ such that 𝑝 𝑝𝑟𝑒𝑑 (𝑦̂|ℎ⃗⃗, 𝛽⃗) is large, and produce either a set of images or a video displaying images as we change ℎ⃗⃗ smoothly. The three models in the proposed approach—the generative model, encoding model, and predictive model—are formulated as probability distributions which are connected by the embedding. We learn an embedding distribution for each product design rather than a point estimate. Each product design 𝑖 has its own embedding distribution ℎ⃗⃗ 𝑖 ~ 𝑞(ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑦 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗) . Learning the parameters of the distribution leverages the variational inference literature (Blei, Kucukelbir, and McAuliffe 2017; Jordan et al. 1999), thereby enabling Bayesian parameter estimation at data sizes otherwise intractable for MCMC sampling. Variational approaches have seen recent adoption in marketing; for example, Braun and McAuliffe (2010) derive variational bounds for the mixed multinomial logit model; Ishigaki, Terui, Sato, and Allenby (2018) estimate a latent topic choice model for sparse transaction data; Ansari, Li, and Zhang (2018) estimate a topic model for a recommender system. In our case, a variational Bayesian approach improves predictive performance and allows better interpolation by the generative model. We use deep neural networks (augmented through adversarial training) to parametrize all three models. The loss functions for the deep neural networks are based on maximizing the data log likelihood—minimizing loss is the same as choosing parameters that lead to distributions that have high data likelihood. We develop the deep neural network model by first using conditional probability to separate the log-likelihood function into components that are based on prediction, generation, and encoding (§4). We then approximate the true log-likelihood with a lower bound based on the observed data (evidence lower bound). Finally, we assume priors from a parametric family of probability distributions and choose parameters to maximize the approximate log-likelihood based on the observed images and ratings. The overall combined model is maximized jointly and is based on those images that have ratings and those that do not have ratings. For the remainder of paper, when it is clear in context, we refer to the predictive model, generative model, and encoding model as the predictor, generator, and encoder, respectively.

4. Proposed Approach: Variational Autoencoder and Generative Adversarial Network

Our goals are to augment design concept testing by predicting aesthetic ratings, 𝑦̂ 𝑛𝑒𝑤 , for new design images, 𝑋 𝑛𝑒𝑤 , as well as augment design concept generation by creating new designs, 𝑋 𝑔𝑒𝑛 , given attributes, 𝑎⃗ 𝑔𝑒𝑛 , desired by the design team. Let 𝛽⃗ be a vector of model parameters. To ease notation, we temporarily write all parameters and likelihoods for a single datum 𝑖 . We seek a joint distribution, 𝑝(𝑦 𝑖 , 𝑋 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗ ), for the ratings and images conditioned on the design attributes and the parameters. The joint distribution can be decomposed into a predictive model and a generative model by the laws of conditional probability: 𝑝(𝑦 𝑖 , 𝑋 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) = 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 | 𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗)𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) . Unfortunately, representing and estimating the two conditional distributions is not feasible when product images are high dimensional. To address high-dimensionality, we approximate the true joint distribution using embeddings . In our case, the embedding compresses information from high-dimensional images, aesthetic ratings, and product attributes to enable tractable predictive and generative models. We choose embeddings such that the embeddings-based likelihood closely approximates the true but intractable likelihood, 𝑝(𝑦 𝑖 , 𝑋 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) . We estimate an embedding posterior distribution, 𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑦 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗) , for each product design 𝑖 such that ℎ⃗⃗ 𝑖 has substantially fewer dimensions (e.g., 512) than 𝑋 𝑖 (e.g., 196,608), yet retains most of the information contained in the images, ratings, and product attributes. To infer embeddings, we use variational Bayes methods to approximate the true joint log-likelihood, log 𝑝(𝑦 𝑖 , 𝑋 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) , with an approximate log-likelihood, ℓ 𝑎𝑝𝑝𝑟𝑜𝑥𝑖 (𝛽⃗ ) , dependent on the embeddings, ℎ⃗⃗ 𝑖 (Blei, Kucukelbir, and McAuliffe 2017; Jordan et al. 1999). Note that while this technique could introduce new data-dependent parameters, 𝛽⃗ 𝑖 , we implicitly include them in the set of “global” parameters, 𝛽⃗ , as they will later be efficiently learned with a deep learning model (amortized inference, Hoffman et al. 2012). To obtain this approximation, we first condition the true log likelihood, log 𝑝(𝑦 𝑖 , 𝑋 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) , on the embeddings via marginalization, which leads to the logarithm of an expectation. We then approximate the logarithm of the expectation by the expectation of the logarithm. By Jensen’s Inequality, the approximation is a lower bound to the true log likelihood. Rearranging terms we arrive at the approximate likelihood in Equation 1. Given an image, 𝑋 𝑖 , its rating, 𝑦 𝑖 , and its attributes, 𝑎⃗ 𝑖 , we seek to maximize ℓ 𝑎𝑝𝑝𝑟𝑜𝑥𝑖 (𝛽⃗) and thus approximately maximize log 𝑝(𝑦 𝑖 , 𝑋 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) . Appendix A2 provides a more-detailed derivation that combines standard variational Bayes theory with predictive models. If 𝐷 𝐾𝐿 (∙ || ∙) signifies the Kullback-Leibler (KL) divergence, then: (1) ℓ 𝑎𝑝𝑝𝑟𝑜𝑥𝑖 (𝛽⃗) = E ℎ⃗⃗⃗ 𝑖 [log 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗) + log 𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗) − log 𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑦 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗)+ log 𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 )]= 𝐸 ℎ⃗⃗⃗ 𝑖 [log 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗) + log 𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)]− 𝐷 𝐾𝐿 (𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑦 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗)||𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 )) Equation 1 is intuitive. The first term seeks to maximize our ability to predict aesthetic ratings based on the embeddings. The second term seeks to reproduce images based on the embeddings. The last term is the negative of KL divergence from the embedding (distribution) to the prior on the embeddings. The KL term acts to regularize the embedding such that the distributions of encoded images do not stray too far from the prior. The KL divergence prevents the encoder from “cheating” by giving each image a unique sub-region of the embedding space, effectively memorizing training data and hurting generalization performance. Note that for this derivation we have assumed conditional independence between images, ratings, and attributes given the embedding. This allows our model to use these data separately, as guided by the challenges discussed in §2.2 with regards to labelled data and the multiple stakeholders involved in current design workflows. Because the approximate log-likelihood, ℓ 𝑎𝑝𝑝𝑟𝑜𝑥𝑖 (𝛽⃗) , is a lower bound to the true log likelihood, 0 log 𝑝(𝑦 𝑖 , 𝑋 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) , the approximation is commonly called the “evidence lower bound” (ELBO; Jordan et al. 1999). We derived Equation 1 for every datum 𝑖 , which leads to an overall full data approximate log-likelihood, ℒ(𝛽⃗) , which we maximize over parameters, 𝛽⃗ , using the observed images and ratings: (2) ℒ(𝛽⃗) = ∑ ℓ 𝑎𝑝𝑝𝑟𝑜𝑥𝑖 (𝛽⃗) 𝑖 Equation 2 assumes global parameters for (1) predicting ratings based on embeddings and (2) generating images based on embeddings. If we did not make the assumption of global parameters, then the model would not be useful for either prediction or generation. We also specify a common prior describing the desired probabilistic structure of embedding (distributions). We make Equations 1 & 2 “variational” by assuming that each datum 𝑖 has its own embedding distribution defined by distributional model parameters conditioned on the images and attributes. Finally, we note that the embedding portions of Equations 1 & 2 are equivalent to those derived from minimizing the KL divergence from (1) a true (unknown) distribution of the embeddings conditioned on the images and attributes to (2) the best variational distribution of the embeddings conditioned on the images and attributes (Jordan et al. 1999). Because of the non-linear relationship between images, ratings, and embeddings we use neural networks for the predictive, generative, and embedding models. To train these neural networks feasibly, we decompose the global parameter space ( 𝛽⃗ ) into parameters for the predictive model ( 𝛽⃗ 𝑃 ), the generative model ( 𝛽⃗ 𝐺 ), and encoding model ( 𝛽⃗ 𝐸 ). We do so by using conditional probability to separate Equation 2 into its component models. Decomposition enables us to train and use each neural network separately and iteratively, which helps to overcome a key managerial issue, namely, that the model must be usable in an asynchronous and iterative environment by teams from industrial design, marketing, brand strategy, and so on. Specifically: (3) ℒ(𝛽⃗) = ℒ 𝑝𝑟𝑒𝑑 (𝛽⃗ 𝑃 ) + ℒ 𝑔𝑒𝑛 (𝛽⃗ 𝐺 ) + ℒ 𝑒𝑛𝑐 ( 𝛽⃗ 𝐸 ) ℒ 𝑝𝑟𝑒𝑑 (𝛽⃗ 𝑃 ) = ∑ 𝐸 ℎ⃗⃗⃗ 𝑖 [log 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗ 𝑃 )] 𝑖 ∈ 𝑟𝑎𝑡𝑒𝑑 ℒ 𝑔𝑒𝑛 (𝛽⃗ 𝐺 ) = ∑ 𝐸 ℎ⃗⃗⃗ 𝑖 [log 𝑝 𝑔𝑒𝑛 (𝑋 𝑖 | ℎ⃗⃗ 𝑖 , 𝛽⃗ 𝐺 )] 𝑖 ∈𝑟𝑎𝑡𝑒𝑑,𝑢𝑛𝑟𝑎𝑡𝑒𝑑 ℒ 𝑒𝑛𝑐 ( 𝛽⃗ 𝐸 ) = − ∑ {𝐷 𝐾𝐿 (𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑦 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗ 𝐸 )||𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 )) 𝑖 ∈ 𝑟𝑎𝑡𝑒𝑑,𝑢𝑛𝑟𝑎𝑡𝑒𝑑 + 𝐷 𝐾𝐿 (𝑞 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑋 𝑖 , 𝛽⃗ 𝐸 )|| 𝑝 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑎⃗ 𝑖 ))} This decomposition is possible since the embedding “bottlenecks” information contained in images, ratings, and attributes that is necessary to both predict ratings and generate new designs (Shu, Bui, and Ghavamzadeh 2016). We further promote separation in practice depending on the data regime, specifically, the presence of rating and attribute information. With unlabeled (i.e., unrated) data, we focus on learning images and attributes without rating dependency. Because the generative likelihood and the encoder likelihood do not depend upon the ratings or the parameters of the predictive model, we use all labeled and unlabeled images in the neural network for the embeddings. For (relatively) less expensive unlabeled data, we have access to product attributes (e.g., color, brand). Accordingly, the encoder log-likelihood, ℒ 𝑒𝑛𝑐 ( 𝛽⃗ 𝐸 ) , differs from that in Equation 1 because we also learn the relationship between images and their attributes. Thus, in addition to the embeddings, ℎ⃗⃗ 𝑖 , we use variational inference to encode attribute information with 𝜋⃗⃗ 𝑖 . Following the same reasoning used to derive Equation 1, we obtain the last KL divergence term in the encoder log-likelihood. This term acts as a classifier for attributes, 𝑎⃗ 𝑖 , when the attributes are known. We use the classifier to predict attributes when they are unknown or ambiguous (e.g., when generating new designs). See also Appendix A3 and Keng (2017). We choose probability distributions for the predictive, generative, and encoding models in Equation 3 using the framework of variational autoencoders (VAE; Kingma and Welling 2013). We later augment the VAE with concepts from adversarial learning as used in Generative Adversarial Networks (GANs, Goodfellow et al. 2014).

Predictive Model.

We use a deep neural network, 𝑓 𝑃 ( ℎ⃗⃗ 𝑖 , 𝛽⃗ 𝑃 ) , that maps the embeddings to a prediction on the rating of interest, say a 1-to-5 rating on “appeal.” Information about the full images and attributes is summarized in the embeddings. For the predictive model, we choose Laplace distributions with means 𝑦 𝑖 and unit diversity, 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗, 𝛽⃗ 𝑃 ) = 𝑒 −|𝑦 𝑖 −𝑓 𝑃 (ℎ⃗⃗⃗ 𝑖 ,𝛽⃗⃗⃗ 𝑃 )| . Define 𝑦̂ =𝑓 𝑃 ( ℎ⃗⃗ 𝑖 , 𝛽⃗ 𝑃 ) as the predicted rating from the neural network, then the predictive distribution implies that we minimize the mean absolute error of predicted versus true ratings. 2 (4) ℒ 𝑝𝑟𝑒𝑑 (𝛽⃗ 𝑃 ) = ∑ 𝐸 ℎ⃗⃗⃗ 𝑖 [log 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗ 𝑃 )] 𝑖 ∈𝑟𝑎𝑡𝑒𝑑 = − ∑ |𝑦 𝑖 − 𝑦 𝑖 ̂ | 𝑖 ∈ 𝑟𝑎𝑡𝑒𝑑 Generative Model.

We use a deep neural network, 𝑓 𝐺 (ℎ⃗⃗ 𝑖 ; 𝛽⃗ 𝐺 ) that generates candidate images from an embedding. We train the model on observed images, but use it for both observed and new images. We choose the generative model to be a high-dimensional Laplace distribution with means, 𝑋 𝑖 , and unit diversity, 𝑝 𝑔𝑒𝑛 (𝑋 𝑖 | ℎ⃗⃗ 𝑖 , 𝛽⃗⃗⃗⃗ 𝐺 ) ∝ 𝑒 −|𝑋 𝑖 −𝑓 𝐺 (ℎ⃗⃗⃗ 𝑖 ,𝛽⃗⃗⃗ 𝐺 )| . Define 𝑋̂ 𝑖 as the predicted image from the neural network with elements 𝑥̂ 𝑖𝑑 for each of the pixels in that image (3D because there are three colors). For many products, and vehicles in particular, we help the generative model by using “masks.” A mask defines the general nature of a product, say “SUV-like” and is represented as a matrix with the same height and width as the product image. The mask’s 𝐷 pixels are binary values describing the presence of the product of a general shape. Masks focus generative models on product designs rather than other potentially unrelated information in product images. In the generator, masks are analogous to a 4 th color; they are predicted by the same deep neural network using the (now-augmented) parameters, 𝛽⃗ 𝐺 . Let 𝑀 be the mask with elements 𝑚 𝑑 and let 𝑀̂ 𝑖 be the predicted mask with elements 𝑚̂ 𝑖𝑑 . Then, following the same arguments used for pixels, the log-likelihood function for the generative model is: (5) ℒ 𝑔𝑒𝑛 (𝛽⃗ 𝐺 ) = ∑ 𝐸 ℎ⃗⃗⃗ 𝑖 [log 𝑝 𝑔𝑒𝑛 (𝑋 𝑖 , 𝑀 𝑖 | ℎ⃗⃗ 𝑖 , 𝛽⃗ 𝐺 )] 𝑖 ∈ 𝑟𝑎𝑡𝑒𝑑,𝑢𝑛𝑟𝑎𝑡𝑒𝑑 = − ∑ { 13𝐷 ∑|𝑥 𝑖𝑑 − 𝑥̂ 𝑖𝑑 | 𝑑 + 1𝐷 ∑|𝑚 𝑑 − 𝑚̂ 𝑖𝑑 | 𝑑 } 𝑖 ∈ 𝑟𝑎𝑡𝑒𝑑,𝑢𝑛𝑟𝑎𝑡𝑒𝑑 Encoding Model.

We use a deep neural network, 𝑓 𝐸 (𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗ 𝐸 ) , to map images, 𝑋 𝑖 , and product attributes, 𝑎⃗ 𝑖 , to an embedding, ℎ⃗⃗ 𝑖 , with distributional parameters, 𝜇⃗ 𝑖 (𝑋 𝑖 , 𝑎⃗ 𝑖 ) and 𝜎⃗ 𝑖 (𝑋 𝑖 , 𝑎⃗ 𝑖 ) . Note that we omit the (labeled) rating dependency of the encoder as discussed in §4.2, as well as implicitly include per-datum variational distributional parameters in 𝛽⃗ 𝐸 . The estimation is “amortized” into a single neural network (Shu et al. 2018). For the embedding variational family, we choose multivariate Gaussian mixture distributions with mixture components depending on product attributes, 𝑎⃗ 𝑖 , (e.g., ‘SUV’). In other words, an embedding, ℎ⃗⃗ 𝑖 , has a Gaussian mixture marginal distribution, but ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 has a single Gaussian conditional distribution given attributes (Dilokthanakul et al. 2016). This expands a representation capacity of our 3 model (Ranganath, Tran, and Blei 2015), without resorting to more complicated autoregressive and flow-based methods (Chen et al. 2016). We further assume each 𝐾 -dimensional Gaussian has diagonal covariance, thereby factorizing into 𝐾 conditionally independent Gaussians in which 𝐾 is the dimensionality of the embeddings. If 𝑘 indexes the elements of the embedding, then the variational assumption implies that 𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 | 𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗ 𝐸 ) ∝ ∏ 𝜎 𝑖𝑘−1 𝑒 − (ℎ 𝑖𝑘 −𝜇 𝑖𝑘 ) 𝜎 𝑖𝑘−2 𝐾𝑘=1 where the 𝜇⃗ 𝑖 and 𝜎⃗ 𝑖 are functions of 𝑋 𝑖 , 𝑎⃗ 𝑖 , and 𝛽⃗ 𝐸 . Using known formulae for 𝐷 𝐾𝐿 (𝒩(𝜇 𝑖𝑘 , 𝜎 𝑖𝑘 ) || 𝒩(0,1)) , we obtain a simpler representation for the first divergence term in the encoder log-likelihood (Kingma and Welling 2013). For the second divergence term in the encoder log-likelihood, we show in Appendix A3 that we may approximate 𝐷 𝐾𝐿 (𝑞 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑋 𝑖 , 𝛽⃗ 𝐸 )|| 𝑝 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑎⃗ 𝑖 )) ≅ 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡 − log 𝑞 𝑎𝑡𝑡𝑟 (𝑎⃗ 𝑖 |𝑋 𝑖 , 𝛽⃗ 𝐸 ) . This results in the encoder acting as a multinomial classifier, 𝑞 𝑎𝑡𝑡𝑟 (𝑎⃗ 𝑖 |𝑋 𝑖 , 𝛽⃗ 𝐸 ), to predict attributes from product images. Specifically, we have 𝐶 multinomial distributions, where 𝐶 is the number of attributes (e.g., brand, bodytype) and ℓ 𝑐 is the number of levels of attribute 𝑐 (e.g., Cadillac). The encoder neural net for 𝜇⃗ 𝑖 and 𝜎⃗ 𝑖 thus also produces 𝜋⃗⃗ 𝑖 , from which we draw Dirichlet probabilities, 𝑎̂ 𝑖 = 𝑞 𝑒𝑛𝑐 (𝜋⃗⃗ 𝑖 |𝑋 𝑖 , 𝛽⃗ 𝐸 ) , using a soft-max function. We recognize 𝐸 𝜋⃗⃗⃗ 𝑖 [log 𝑞 𝑒𝑛𝑐 (𝜋⃗⃗ 𝑖 |𝑋 𝑖 , 𝛽⃗ 𝐸 )] as the cross-entropy for a draw of the attributes, 𝑎⃗ 𝑖 , from the multinomial probabilities, 𝑎̂ 𝑖 . This provides the second term in Equation 6. This term rewards the encoder for its ability to predict the attributes based on the images. During training, this term encourages the encoder to learn attributes, while during prediction (when we do not know attributes) this term allows us to estimate unknown product attributes, 𝑎̂ 𝑖 , by sampling from the multinomial distribution indexed by 𝜋⃗⃗ 𝑖 . Putting both terms together we obtain: (6) ℒ 𝑒𝑛𝑐 ( 𝛽⃗ 𝐸 ) = ∑ {∑ [12 (𝜇 𝑘𝑖2 + 𝜎 𝑘𝑖2 ) − log 𝜎 𝑘𝑖 − 12] 𝐾𝑘=1 + ∑ ∑ 𝑎 𝑖𝑐ℓ log 𝑎̂ 𝑖𝑐ℓℓ 𝑐 ℓ=1𝐶𝑐=1 } 𝑖∈ 𝑟𝑎𝑡𝑒𝑑,𝑢𝑛𝑟𝑎𝑡𝑒𝑑 Equations 3 through 6 specify a set of log-likelihood functions for the neural networks that together compose the VAE in Figure 2. The log-likelihoods balance the ability to predict ratings, generate new images, and represent images by embeddings. But to be effective for machine learning augmentation, we require that the embeddings encode and generate images well. For example, if we are generating luxury crossover utility vehicles (luxury CUVs), the images should look like well-designed luxury CUVs. After extensive experimentation and tuning, we found it necessary to augment the VAE 4 using the concept of adversarial training found in generative adversarial networks (GANs). We note our extension is similar to concurrent work by Heljakka, Solin, and Kannala (2019). The basic idea is that we reward the generative model for generating images that are different than the prior, but we reward the encoder if it can distinguish a generated image from an observed image because it is too far from the prior. The GAN is implemented with competing objectives—a term in the generative model is the negative of a term in the encoder. Because the generator and encoder are trained iteratively, the generator and encoder reach a min-max solution to a two-player game. That is, we converge to a fixed point where generated images and actual images are both encoded to the same embedding space (Goodfellow et al. 2014). We augment the variational autoencoder with the adversarial training concept from GANs, by adding the following term to the likelihood of the encoder to reward the encoder for distinguishing generated images from observed images. We subtract the same term from the log-likelihood of the generator to reward the generator to produce images that are hard to distinguish from observed images. These terms do no simply cancel out as we iteratively train the encoder and predictor while fixing the generator, and vice versa. We use the methods of §4.4 to simplify this expression with normally distributed prior and posterior distributions to obtain an analog to Equation 6. (7) ∑ 𝐷 𝐾𝐿 (𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗ 𝐸 )||𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ 𝑖 ⃗⃗⃗⃗|𝑎⃗ 𝑖 )) 𝑖 ∈𝑔𝑒𝑛𝑒𝑟𝑎𝑡𝑒𝑑 𝑖𝑚𝑎𝑔𝑒𝑠 This approach to adversarial training differs from conventional GANs in that we are not learning an implicit generative model by rewarding a “discriminator” (i.e., the analogue of the encoder in our work) to classify images as real or generated. We instead perform adversarial training in the embedding space much like feature matching and perceptual similarity approaches (Larsen, Sønderby, and Winther 2015). We maintain the probabilistic interpretation using the Kullback-Leibler divergence term rather than previously-used feature matching on point estimates conceptually similar to Huang et al. 2018). §§4.1-4.3 used the perspective of VAEs and probability models to derive an approximate separable likelihood function, the components of which are based on deep neural networks. These VAE components may be viewed as combining semi-supervised VAEs (Kingma et al. 2014) with conditional VAEs (Sohn, Lee, and Yan 2015). In §4.4 we augmented the VAE perspective with adversarial methods (GANs) to encourage the generator to generate recognizable new images while encouraging the encoder to recognize generated images as generated rather than observed. In contrast to typical VAE-GAN 5 approaches (Larsen, Sønderby, and Winther 2015; Zhao et al. 2016; Berthelot et al. 2017), adversarial autoencoders (Makhzani et al. 2015) and adversarial generative encoders (Heljakka, Solin, and Kannala 2018; Ulyanov, Vedaldi, and Lempitsky 2018), we retain the probabilistic interpretation of the VAE viewpoint as our adversarial losses are motivated by divergences of the embedding distribution. The combined approach improves embeddings and minimizes “posterior collapse” in VAEs; our model converges and the embeddings differ from the priors. Table 1 summarizes the log-likelihood functions as loss functions comparable to more typical neural network frameworks. We do so recognizing that 𝑙𝑜𝑠𝑠(𝛽⃗ 𝑃 , 𝛽⃗ 𝐺 , 𝛽⃗ 𝐸 ) = − ℒ(𝛽⃗ 𝑃 , 𝛽⃗ 𝐺 , 𝛽⃗ 𝐸 ) . Table 1 further indicates which images are included in summations.

5. Moving from Theory to Practical Implementation

The basic structure of our machine learning augmentation is derived by using deep neural networks to approximate separable log-likelihood functions. From this perspective, our augmentation is a VAE modified with an adversarial (GAN) component. We differ from most VAEs and GANs because we include as well information from product ratings, product attributes, and masks, all three of which are necessary to enhance predictive ability and generate realistic new aesthetic designs. But given the challenges discussed in §2, we must further customize and “tune” the model.

Many deep learning models are trained for one application (e.g., object detection) and modified for another application (e.g., aesthetic rating prediction), often referred to as “transfer learning” (Evgeniou and Pontil 2004). Such customized models have proven effective on image data for identifying product returns and brand identity (Dzyabura et al. 2019; Liu, Dzyabura, and Mizik 2019), and on textual data for identifying customer needs and predicting sales conversion (Liu, Lee, and Srinivasan 2019; Timoshenko and Hauser 2019). Unfortunately, we were unable to identify any pre-trained models that generate new highly-rated automotive images and which would be controllable in a manner that may be used by designers (e.g., morph Brand A sedan with Brand B sport-utility vehicle). The demands of interconnecting attributes, masks, ratings, high-resolution images, and adversarial training for generation together require a custom deep learning architecture trained on unique data. Because the custom deep learning model required substantial “tuning,” we were careful to hold out data for model evaluation. We randomly split data into training, validation, and testing sets using a seeded random number generator for reproducibility. Validation data were used to set model hyperparameters (e.g., learning rates) and monitor training progress. Testing data were used only for model evaluation. 6

Table 1.

Predictive, Generative, and Encoding Loss Functions (where loss = – log-likelihood)

Loss Function Intuition Predictive Model (summed over rated images) |𝑦 𝑖 − 𝑦 𝑖 ̂ | Laplace term rewards the predictor for predicting ratings.

Generative Model (summed over rated, unrated images, or, when indicated, generated images)

13𝐷 ∑|𝑥 𝑖𝑑 − 𝑥̂ 𝑖𝑑 | 𝑑 Laplace term rewards generator for predicting images that are close to true images.

1𝐷 ∑|𝑚 𝑖𝑑 − 𝑚̂ 𝑖𝑑 | 𝑑 Laplace term rewards generator for predicting masks that correctly segment the design within the image. ∑ [12 (𝜇 𝑘𝑔2 + 𝜎 𝑘𝑔2 ) − log 𝜎 𝑘𝑔 − 12] 𝐾𝑘=1

Adversarial term rewards the generator for images that are drawn from the same distribution as observed images. Summed over generated images only ( 𝑔 ). Encoding Model (summed over rated, unrated images, or, when indicated, generated images) ∑ [ (𝜇 𝑘𝑖2 + 𝜎 𝑘𝑖2 ) − log 𝜎 𝑘𝑖 − ] 𝐾𝑘=1 , 𝜇⃗ 𝑖 , 𝜎⃗ 𝑖 = 𝑓 𝐸 (𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗ 𝐸 ) KL divergence rewards the encoder for embeddings close to the prior. − ∑ ∑ 𝑎 𝑖𝑐ℓ log 𝑎̂ 𝑖𝑐ℓℓ 𝑐 ℓ=1𝐶𝑐=1 , , 𝑎̂ 𝑖 = 𝑓 𝐸 (𝑋 𝑖 , 𝛽⃗ 𝐸 ) Cross entropy rewards encoder for predicting attributes from images. − ∑ [12 (𝜇 𝑘𝑔2 + 𝜎 𝑘𝑔2 ) − log 𝜎 𝑘𝑔 − 12] 𝐾𝑘=1

Adversarial term rewards encoder for distinguishing generated images from observed images. Summed over generated images only ( 𝑔 ). Figure 3 summarizes the deep neural network architectures for the predictive, generative, and encoding models. Each architecture is based on many modeling decisions (and tuning). To simplify the description of the architectures, Figure 3 uses “blocks” of neural network “layers.” The layers in each block are given at the top of Figure 3. Each of these layers (e.g., 2D convolution) performs the indicated operations on “features” from the previous layers. For example, 2D convolution applies two-dimensional filters to the input in an analogy to neuronal receptive fields (Olshausen and Field 1996). We use spectral normalization as a regularization technique to control the magnitude of gradients 7 during model training (see §5.3). A leaky rectified linear layer acts as a nonlinear activation function to transform values from the previous layer, thereby enabling the neural network to learn complex feature interactions. A residual connection simply adds the original input from the first layer of the block to the now transformed features at the end of the block. In doing so, the intermediary layers learn the residual error from the previous block. 2D average pooling reduces the dimensionality of the previous layer by down-sampling patches of 2D features to a single value.

Figure 3.

Deep Neural Network Architectures for Predictive, Generative, and Encoding Models

The rows in Figure 3 corresponding to the generative and encoding models enhance generation and encoding with a “progressive structure” as detailed in (Karras et al. 2017). In a progressive structure, we start at low resolutions, then increase resolution in stages and retrain the blocks until we reach the target image resolution of 256 x 256 pixels (three colors plus the mask). We label the resolution stages by height and width, say 4 x 4 pixels or 32 x 32 pixels. When the model is trained, we obtain lower 8 resolution images by down-sampling from the full image. The input and output image resolutions are given by the corresponding block. For example, we obtain 32 x 32 images as the 4 th block in the encoder model. During generation, at each stage in the progressive structure, the immediately lower resolution image is blended with the stage’s resolution according to an annealing schedule, in which the trained image progressively includes less of the low resolution image. Both the encoder and generator sample for multiple iterations before moving to next resolution—the number of samples is a hyper-parameter set using the validation data. The advantage of a progressive structure is that, as the model learns to encode or generate, it maintains knowledge from previous images effectively. Rather than starting model training from a completely blank slate at full resolution, the model has already learned information about the product design at lower resolutions. Deep learning models do not always have well-behaved parameter estimation procedures. This is especially true for the models in Figure 3, which feature complex deep learning models that balance multiple competing and/or adversarial loss function terms. We therefore seek to avoid unbounded “back-propagated” gradients that lead to catastrophic failures in model training (e.g., outputting images of only black pixels). We stabilize training with the spectral normalization layer in each block. This promotes Lipschitz continuity to regularize the model by dividing the raw output weights of each layer by the largest singular value of the matrix of weights. See Miyato et al. 2018. We also manually enforce both soft and hard constraints on portions of the model architecture. For example, we bound the variance of the Gaussian random variables in the KL-divergence terms and use floating point safeguards to ameliorate the potential numerical instability introduced in §4 by the logarithms and various 𝐿 𝑝 -norms. We tune model training by scaling the contribution of each loss term in Table 1 to its corresponding encoder, generator, or predictor loss with user-defined multiplicative scaling terms. This creates a balancing act between the seven loss terms given their endogenous interdependency during training. Further tuning is often domain specific (e.g., the scaled importance of the vehicle body style versus relative to its orientation). Empirically we found that the KL-divergence loss terms were best scaled to 1/10 or less relative to the loss terms for reconstructing images reconstruction loss, as well as annealed from zero at start of training to a maximum value after model convergence. We stabilize model training by tuning the rate at which the competing models are updated. Empirically, we found that the generative model must be updated substantially more often than the encoding model; from 10 times as much to as little as two. We found that this ratio depends empirically 9 on the stage of model training. This may be unique to our application which combines likelihood-based VAE with adversarial training. Typically, generative GAN models are updated less often than encoding models (Arjovsky, Chintala, and Bottou 2017) and benefit from unequal learning rates (Heusel et al. 2017). By contrast our training ratio favors minimizing 𝐷 𝐾𝐿 (𝑝|𝑞) rather than 𝐷 𝐾𝐿 (𝑞|𝑝) . We train the model by minimizing the loss functions in Table 1 with first-order stochastic gradient methods using mini-batches of training data. Specifically, we use the Adam stochastic gradient optimizer (Kingma and Ba 2015). Stochastic gradient methods are justified given their empirical performance and scalability via the backpropagation algorithm. Backpropagation simplifies an otherwise large calculation of a multi-parameter gradient to an equivalent series of smaller iterative gradient calculations. Gradients for a given loss in Table 1 propagate from the layer calculating the loss backwards to “earlier” layers, thereby taking advantage of the compositional structure of network layers and the chain rule of differentiation (Lagrange 1797). To use gradient methods, we employ the “reparameterization trick” used in Kingma and Welling (2013). We rewrite the otherwise intractable gradient of an expectation over the embedding distribution (Equation 3 and Equation 7) to an equivalent tractable formulation by splitting the stochastic Gaussian embedding distribution into a deterministic neural net and an independent additive stochastic term. With this simplification we more easily compute an unbiased estimate of the gradient using Monte Carlo samples of the independent additive term. We similarly use this reparametrization trick when we do not have access to product attributes during training and inference. In this case we use a relaxation of the otherwise non-differentiable categorical attribute variables called the Gumbel-Softmax (Jang, Gu, and Poole 2016). Case Study: Aesthetic Design of U.S. Automotive Market SUV/CUVs

Our machine learning augmentation has two goals (1) predict consumer evaluations of aesthetic designs (ratings) for a sample of proposed designs created and tested by an automotive partner and (2) generate new aesthetic designs that are likely to be rated highly. As an initial test of the proposed model (Figure 2) we evaluate (1) whether the trained VAE/GAN model can predict ratings of SUVs and CUVs that were evaluated in our partner’s theme clinics and (2) whether high-predicted-appeal designs generated by the VAE/GAN model are rated highly and low-predicted-appeal designs are rated poorly.

We obtained aesthetic ratings for images of the 203 unique SUVs/CUVs from model years 2010-0 2014 (MY2010-14) tested in our partner’s theme clinics. Our partner collected ratings from 178 targeted consumers in one of their theme clinics. Following established procedures for their theme clinics, our partner targeted consumers who were rigorously screened and selected to be interested in purchasing in the target category. Respondents were incentive aligned using our partners’ standard methods, both fiscally and with knowledge that their input would guide future aesthetic design. Ratings were obtained from a web-based survey co-located with the theme clinic. Warm-up questions motivated respondents (credibly) that their ratings would affect aesthetic design and introduced our partner’s standard pairwise-semantic-differential rating-scale (most unappealing to most appealing). We anchored respondents’ ratings by asking each respondent to rate five pre-chosen pairs of images—prechosen in pretests to be most divisive on the pairwise ratings scale. The five image pairs were the same for all respondents, but randomly counterbalanced between respondents. Respondents each rated ten sequential pages of SUVs/CUVs in which each page contained five images. To test respondent consistency, the 2 nd and 8 th page and the 3 rd and 9 th page contained the same images randomly ordered. After extensive pretesting, the survey was implemented by our automotive partner. To maintain consistency among images and mitigate image-color biases, all images were reduced to greyscale and shown with a side viewpoint. (Unrated and generated images are three-color.) Appendix A4 provides an example rating page. To maintain data quality prior to any further analysis, we eliminated respondents who were judged to be inconsistent based on Krippendorf’s 𝛼 where 𝛼 = 𝛼 is a generalization of other interrater reliability measures such as Fleiss’ 𝜅 and Cohen’s 𝜅 (Krippendorff 2011). The cutoff was 𝛼 = 0.75 , which eliminated 38 respondents (21%). Ratings from consistent users were aggregated to a mean value for each of the 203 unique SUVs in MY2010-14, which were assigned to corresponding vehicle viewpoints to result in a full data of 7,308 rated images by giving the same mean rating to all viewpoints. We randomly split the full data into training, validation, and testing data at a ratio of 50%:25%:25%. (Same-model SUVs and CUVs remained together to ensure that validation or testing information did not leak into the training data.) Although high-quality full-color images are readily available on the web, almost all of these images are copyrighted and could not be used without permission to train our model. Fortunately, high-quality images are available from aggregators and are used often by automotive firms in their marketing communications. The typical cost is about $50,000 per month. We obtained 180,000 unrated SUV/CUV 1 images from our partner firm and a proprietary (unnamed) aggregator. All images were rescaled to 256 x 256 pixel images and included several attributes. We used conventional computer vision tools to obtain masks and attributes that were not coded by the firm, such as color and viewpoint.

7. Evaluation of the Machine Learning Augmentation

We evaluate the model on the 1,836 rated images in the held-out testing set. Table 2 reports the mean absolute error (MAE) for predicted-versus-actual ratings on the held-out data. Our model predicts well with a MAE of 0.372 of a scale point. Figure 4 provides example predictions for eight SUVs/CUVs. To put the MAE of the predicted model in perspective, we compare its predictive ability to a series of baseline models that vary from naïve to sophisticated, we used the same training and validation data to develop the benchmarks and spend considerable effort optimizing the sophisticated benchmarks so that the benchmarks provide a meaningful comparison.

Naïve baselines . The most naïve baseline is that respondents select the scale midpoint. This baseline represents zero information. A less naïve baseline uses global information from the training respondents by assuming that all variation about the median (average) rating is noise.

Sophisticated benchmark 1: Random Forest and Computer Vision Features.

Computer vision and machine learning have a long history of processing high-dimensional image and video data for object detection and image segmentation. Conventional methods reduce high-dimensional visual data to a small set of “hand engineered” features chosen based on experience of successful prediction. These features are used in conventional machine learning methods such as support vector machines. Our benchmark uses three types of hand-engineered features from computer vision: (1) Histogram of oriented gradients (HOG) features encode edge and shape information. HOG features divide the image into a grid of image patches, calculate the gradients of each patch, and bin these gradients into a histogram. Edge orientation and shape intensity are contained in the gradients’ direction and magnitude values. (2) A downscaled version of the image itself (e.g., 256 x 256 to 32 x 32). And (3) histograms of color values for each red, green, and blue image channel. These features are used in a random forest with 100 trees. We chose a random forest because it performed best when tested against other common machine learning approaches: support vector machines, Gaussian process regression, and L1/L2-regularized linear regression.

Sophisticated benchmark 2: Pre-trained Deep Learning Model.

Many researchers in marketing (and machine learning) use “pre-trained” open-source neural networks trained for one prediction task and repurposed for another prediction task. As a sophisticated benchmark, we used the pre-trained 2 VGG16 deep learning model trained on the ImageNet database. This benchmark outperformed other common pre-trained models (e.g., ResNet50, InceptionV3) for our prediction task. The VGG16 model is a pyramid of sixteen stacked layers (13 convolutional and 3 fully connected) that sequentially reduce images in size until they are classified in the last layer. The initial layers of VGG16 transform pixels to edges and lines found in visual images. For our benchmark, we maintain the initial layers and replace the last classification layer with two batch-normalized fully-connected rectified-linear layers followed by a regression layer. (This architecture was chosen using validation data.) We train the model in two steps. We first freeze the pre-trained layers and use our data to train the new layers. We then fine-tune the entire neural network with scaled-down back-propagated gradients. The two-step procedure improves the benchmark’s prediction.

Results.

The proposed machine learning augmentation outperforms both the naïve baselines and the sophisticated benchmarks. The improvements are substantial. The sophisticated benchmarks do well relative to the naïve benchmarks and the proposed VAE/GAN deep-network does well relative to the sophisticated benchmarks.

Table 2.

Predictive Test of Machine Learning Augmentation versus Baselines and Benchmarks

Prediction Model Mean Absolute Error (std. dev.) Improvement

Baseline: Scale Midpoint (Zero Information) 0.601 (n/a) 0.0% Baseline: Median Rating in Training Images (Best Uniform) 0.603 (n/a) ~0.0 % Benchmark: Computer Vision Features and Random Forest (Conventional Machine Learning) 0.450 (0.003) 25.3 % Benchmark: VGG16 with Fine-Tuned Final Layers (Pre-trained Deep Learning) 0.434 (0.008) 28.0% Proposed Machine Learning Augmentation (Custom Deep Learning) 0.372 (0.004) 38.3 % Figure 4.

Examples of Predictive Accuracy for Machine Learning Augmentation . Our first test is whether or not the embeddings (and attributes) can create credible product aesthetic images. We begin by sampling points in the embedding space, conditioned on desired attributes, then moving smoothly around that space. Each sampled point is on a hypersphere, such that interpolation is smooth in spherical coordinates. For each point in the embedding space, we generate a high-dimensional image. We judge that the images are realistic and are valuable inputs to the design team given their ability to be controllably morphed. We provide a demonstration video at https://vimeo.com/334094197.

Consumer evaluations . To test consumer reaction, we generated 50 targeted images: 25 were predicted by the VAE/GAN to be rated highly and 25 were predicted to be rated poorly. Figure 5 provides examples of generated images of each type. For consistency, we generated each image to be a light gray SUV/CUV from the side view. We added diversity with masks from other body styles (e.g., sedans, hatchbacks). The generated designs were morphed slightly to ensure plausibility by respondents and to mitigate the effect of biases (Lopez, Miller, and Tucker 2019). Using the methods of §6.1, we used 181 respondents from a professional Internet panel (ProdegeMR, at $4 per respondent) to rate these images. Using industry standards, we screened respondents to be SUV/CUV “intenders.” To maintain quality, we pretested the survey carefully. Prior to any data analysis, following best market-research practices, we used “trap” questions to eliminate professional respondents, consistency checks to eliminate inattentive respondents, and response timings (and pattern checking) to eliminate 4 respondents who were just “clicking through.” In total, we eliminated 53% of the initial 385 respondents who began the survey. Respondents generally preferred images that were predicted to be rated high over images that were predicted to be rated low. In our data, respondents confirmed our predictions 74.0% of the time, well above random. We are encouraged by this initial test.

Figure 5 . Example Generated Images (See also https://vimeo.com/334094197)

8. Conclusion

Deep learning methods are beginning to affect all aspects of marketing science, sometimes with methods customized to the challenge, sometimes by using tuned pre-trained models. Many of these methods rely on hard-to-quantify unstructured data such as natural language or images. We focus on consumers’ aesthetic judgments by using state-of-the-art machine learning to augment human judgment. The augmentation comes in two forms. First, we predict consumer evaluations of new potential aesthetic designs from images. Second, we controllably generate new SUV/CUV images to enhance creative design. These goals are challenging. We developed a machine learning augmentation that combines many concepts. We derive a version of the semi-supervised VAE from a probabilistic perspective to maximize the likelihood that an image is encoded well and rated as predicted, while using a low-dimensional embedding to “bottleneck” information between a predictive and generative model. Within the VAE framework, we include attributes to carry information about images and masks to constrain target images to be realistic. We add adversarial training concepts from the GAN literature to improve 5 training of realistic generated images while retaining VAE probabilistic interpretation so that the output does not collapse to known images. Finally, we use a variety of engineering ideas (e.g., spectral normalization, progressive training, residual connections, adaptively balanced training losses) to tune the deep learning model so that it is able to properly converge during training. The end result is a model that predicts image ratings substantially better than other machine learning benchmarks and controllably generates images that are perceived as valuable boosts to creativity. An important contribution is our modeling decisions that address key managerial issues required for using machine intelligence within existing design processes. These contributions are a key difference of our work when contrasted with recent deep learning approaches; e.g., (Kang et al. 2017; Pan et al. 2017; Vasileva et al. 2018, Sbai et al. 2019) and the review by (Deng, Loy, and Tang 2017). We address the delicate integration between machine intelligence and existing human workflows, a point stressed heavily in our working interactions with marketing and design professionals. We focus on augmenting rather automating human expertise and human creativity, and ensuring all models are meaningfully controllable by the respective teams within the firm. For example, conditional independence assumptions enable full model decomposition into the predictive and generative models, allowing iterative and asynchronous training and usage by teams. Another example includes deriving our work from the VAE probabilistic framework rather than GANs, trading off generative realism for generative diversity for creative design. The second managerial issue we address is the relatively limited amount of rated data on product images and aesthetic ratings. This was due to the expense required to collect rated data as well as show many unique designs exist; our rated data on new-SUV/CUV design images were only 203 unique SUV/CUVs over five years of theme clinics. If we were to train the deep learning model on these images alone, the embeddings and the generative capability would be weak if not impossible. We overcome this issue using semi-supervised learning to combine the expensive rated “small data,” with the inexpensive and significantly larger “big data” of unrated images. Combining small data and big data made it feasible to train a deep learning model that does well on the theme-clinic-based data found in firms.

The SUV/CUV application is a proof-of-concept developed to augment a real automotive design team. New vehicles take many years and $1-5 billion in investment (Blonigen, Knittel, and Soderbery 2013). Over time we will learn whether machine learning augmentation has documented monetary benefits. Directly assessing the financial value of aesthetics is challenging given its interrelatedness with 6 confounding factors such as functional attributes, brand identity, and aesthetic trends (Person, Snelders, and Schoormans 2016). Recent promising work into disentangling these factors includes those that explicitly control of covariation in functional and form attributes (Kang et al. 2016; Zhang et al. 2019), as well as those that temporally model aesthetic trends (Yoganarasimhan 2017). For now, our work relies on predictive statistics and human judgment for validation. We hope that other researchers will test machine learning augmentation in other industries, perhaps industries in which the new product development cycle is shorter and less expensive. Our model was engineered to be effective for vehicles after a heavy degree of engineering and model tuning, with much of the engineering challenge coming from the combination of VAE and GAN concepts. Many recent deep learning advances have focused on design generation using GANs, and, as a result, lack the latent space structure needed for the predictive model. When generation alone is the primary goal, GANs tend to outperform other models including VAEs and flow-based approaches; we direct readers to recent GAN applications for aesthetics (Kang et al. 2017; Pan et al. 2017; Sbai et al. 2019). A natural next step is to assess the degree to which the proposed approach augments designer creativity. While perhaps less straightforward to measure, similar questions have seen recent marketing interest in applications such as idea generation (Toubia and Netzer 2017). Assuming our efforts continue to be guided by real design needs and managerial problems, this trend of various research streams being integrated bodes well for a future of hybrid human and machine intelligence. 7

Appendices A1. Summary of Notation 𝑖, 𝑗 indexes product images 𝑑 indexes dimension of high-dimensional image space (i.e., a pixel) 𝑘 indexes dimension of low-dimensional embedding 𝑐, 𝑙 indexes c th product attribute (e.g., brand) with 𝑙 attribute levels (e.g., Cadillac) 𝑋 𝑖 product image 𝑖 𝑥 𝑖𝑑 pixel 𝑑 of product image 𝑖 𝑋̂ 𝑖 product image (generated) 𝑥̂ 𝑖𝑑 pixel 𝑑 of product image 𝑖 (generated ) 𝑦 𝑖 aesthetic rating (known) 𝑦̂ 𝑖 aesthetic rating (predicted) ℎ⃗⃗ 𝑖 product embedding vector 𝑎⃗ 𝑖 product attributes vector (if known) 𝑎̂ 𝑖 product attributes vector (predicted) 𝑀 𝑖 product mask (if known) 𝑚 𝑖𝑑 pixel 𝑑 of mask for product 𝑖 𝑀̂ 𝑖 product mask (generated) 𝑚̂ 𝑖𝑑 pixel 𝑑 of mask for product 𝑖 (generated) A2. Derivation of Approximate Log-Likelihood in Equations 1 §4.1 decomposed 𝑝(𝑦 𝑖 , 𝑋 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗ ), the joint likelihood of a product image, 𝑋 𝑖 , and its aesthetic rating, 𝑦 𝑖 , conditioned on product attributes, 𝑎⃗ 𝑖 , into the predictive model and generative model. We seek to maximize the log-likelihood, log 𝑝(𝑦 𝑖 , 𝑋 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) , for every datum 𝑖 in our training data with respect to the model parameters 𝛽⃗ . However, parameter estimation and inference of this high-dimensional likelihood is an intractable problem. Instead we approximate the likelihood using variational inference. We first assume the existence of low-dimensional latent embeddings, ℎ⃗⃗ 𝑖 . We introduce ℎ⃗⃗ 𝑖 via marginalization of ℎ⃗⃗ 𝑖 over the joint density in the second line of (A.1). We expand this density to the predictive model and generative model as well as a prior over the product embedding. (A.1) log 𝑝(𝑦 𝑖 , 𝑋 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) = log ∫ 𝑝(𝑦 𝑖 , 𝑋 𝑖 , ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) dℎ⃗⃗ = log ∫ 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) dℎ⃗⃗ We seek to learn an embedding distribution rather than just a point estimate of ℎ⃗⃗ 𝑖 . We do not explicitly assume this form of the new joint density with the introduced product embedding, ℎ⃗⃗ 𝑖 , and instead introduce a tractable distribution which we will use to approximate it, 𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗) , resulting in the “encoder model.” (A.2) log ∫ 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) dℎ⃗⃗ = log ∫ 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗) 𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗)dℎ⃗⃗ = log E ℎ⃗⃗⃗ 𝑖 ~𝑞 𝑒𝑛𝑐 ( ℎ⃗⃗ 𝑖| 𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗ ) [𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗) ] Our new goal is to find the best encoder model, 𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗) , for each datum 𝑖 from an assumed family of tractable densities. In other words, we aim to estimate hyperparameters of the latent product embedding, ℎ⃗⃗ 𝑖 , which index a unique element within an assumed variational distribution family. This results in unique variational parameters for each datum 𝑖 which we implicitly include in the global set of model parameters, 𝛽⃗ . Estimating these parameters using sampling techniques (e.g., MCMC) is intractable, hence we cast sampling as an optimization problem using a lower bound of the expectation in Equation (A.2) via Jensen’s inequality. This approximation is known as the “evidence lower bound,” which is less than or equal to the intractable high-dimensional joint density, log 𝑝(𝑦 𝑖 , 𝑋 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) . (A.3) log E ℎ⃗⃗⃗ 𝑖 ~𝑞 𝑒𝑛𝑐 ( ℎ⃗⃗ 𝑖| 𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗ ) [𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗) ] ≥ E ℎ⃗⃗⃗ 𝑖 ~𝑞 𝑒𝑛𝑐 ( ℎ⃗⃗ 𝑖| 𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗ ) [log 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗) ] With the logarithm moved inside the expectation, we decompose the joint density into three separate terms: the predictive model, the generative models, and the ratio of the encoder and prior model. Under the expectation of the encoder model, this last term ends up being Kullback-Leibler 9 divergence between the encoder and the prior over the embedding, D 𝐾𝐿 (log 𝑞 𝑒𝑛𝑐 ||𝑝 𝑝𝑟𝑖𝑜𝑟 ) . (A.4) E ℎ⃗⃗⃗ 𝑖 ~𝑞 𝑒𝑛𝑐 ( ℎ⃗⃗ 𝑖| 𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗ ) [log 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗) ] = E ℎ⃗⃗⃗ 𝑖 ~𝑞 𝑒𝑛𝑐 ( ℎ⃗⃗ 𝑖| 𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗ ) [log 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)] + E ℎ⃗⃗⃗ 𝑖 ~𝑞 𝑒𝑛𝑐 ( ℎ⃗⃗ 𝑖| 𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗ ) [log 𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)]− E ℎ⃗⃗⃗ 𝑖 ~𝑞 𝑒𝑛𝑐 ( ℎ⃗⃗ 𝑖| 𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗ ) [log 𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗)𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) ] = E ℎ⃗⃗⃗ 𝑖 ~𝑞 𝑒𝑛𝑐 ( ℎ⃗⃗ 𝑖| 𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗ ) [log 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)] + E ℎ⃗⃗⃗ 𝑖 ~𝑞 𝑒𝑛𝑐 ( ℎ⃗⃗ 𝑖| 𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗ ) [log 𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)]− D 𝐾𝐿 (log 𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗)||𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)) = ℓ 𝑎𝑝𝑝𝑟𝑜𝑥𝑖 (𝛽⃗) These three terms comprise the approximation, ℓ 𝑎𝑝𝑝𝑟𝑜𝑥𝑖 (𝛽⃗) , that we maximize. Since the Kullback-Leibler divergence term is negative, maximizing the overall approximation includes minimizing distributional dissimilarity between the posterior of the embedding given by the encoder model, 𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗) , and the distributional prior that we choose, 𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) . If we minimize this divergence to zero, the approximate likelihood is equal to the true likelihood, i.e., log 𝑝(𝑦 𝑖 , 𝑋 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) =ℓ 𝑎𝑝𝑝𝑟𝑜𝑥𝑖 (𝛽⃗) . Thus, maximizing ℓ 𝑎𝑝𝑝𝑟𝑜𝑥𝑖 (𝛽⃗) lower bounds the previously intractable likelihood maximization of log 𝑝(𝑦 𝑖 , 𝑋 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) . A3. Derivation of Approximate Log-Likelihood from Equations 3

We derive Equation 3 using the same reasoning as in Appendix A2. The only difference between Equation 1 and Equation 3 is the additional term, 𝐷 𝐾𝐿 (𝑞 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑋 𝑖 , 𝛽⃗ 𝐸 )| 𝑝 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑎⃗ 𝑖 )) . This term is the product attribute classifier with associated cross entropy loss term given in Table 1. We need this term to train the encoder model to be able to predict product attributes when they are not in the data, in particular, during prediction of new designs from generative model. Adding the latent variables for parameters of the multinomial attribute classifier, 𝜋⃗⃗ 𝑖 , results in a double integral and a corresponding expectation over the joint density of both ℎ⃗⃗ 𝑖 and 𝜋⃗⃗ 𝑖 . Our assumptions on factorization of the latent terms splits into two KL-divergence terms in the last line of the derivation. See Keng (2017) for additional discussion on the relation between KL-divergence and the cross entropy loss term. 0 (A.4) log 𝑝(𝑦 𝑖 , 𝑋 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) = log ∬ 𝑝(𝑦 𝑖 , 𝑋 𝑖 , ℎ⃗⃗ 𝑖 , 𝜋 𝑖 ⃗⃗⃗⃗|𝑎⃗ 𝑖 , 𝛽⃗) dℎ⃗⃗𝑑𝜋⃗⃗ = log ∬ 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)𝑝 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑎⃗ 𝑖 , ℎ⃗⃗ 𝑖 , 𝛽⃗) dℎ⃗⃗𝑑𝜋⃗⃗ = log ∬ 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)𝑝 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑎⃗ 𝑖 , ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑞(ℎ⃗⃗ 𝑖 , 𝜋⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝑋 𝑖 , 𝛽⃗) 𝑞(ℎ⃗⃗ 𝑖 , 𝜋⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝑋 𝑖 , 𝛽⃗)dℎ⃗⃗𝑑𝜋⃗⃗ = log E ℎ⃗⃗⃗ 𝑖 ,𝜋⃗⃗⃗ 𝑖 ~𝑞(ℎ⃗⃗⃗ 𝑖 ,𝜋⃗⃗⃗ 𝑖 |𝑎⃗⃗ 𝑖 ,𝑋 𝑖 ,𝛽⃗⃗⃗) [𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)𝑝 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑎⃗ 𝑖 , ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑞(ℎ⃗⃗ 𝑖 , 𝜋⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝑋 𝑖 , 𝛽⃗) ] ≥ E ℎ⃗⃗⃗ 𝑖 ,𝜋⃗⃗⃗ 𝑖 ~𝑞(ℎ⃗⃗⃗ 𝑖 ,𝜋⃗⃗⃗ 𝑖 |𝑎⃗⃗ 𝑖 ,𝑋 𝑖 ,𝛽⃗⃗⃗) [log 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)𝑝 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑎⃗ 𝑖 , ℎ⃗⃗ 𝑖 , 𝛽⃗)𝑞(ℎ⃗⃗ 𝑖 , 𝜋⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝑋 𝑖 , 𝛽⃗) ] = E ℎ⃗⃗⃗ 𝑖 ,𝜋⃗⃗⃗ 𝑖 ~𝑞(ℎ⃗⃗⃗ 𝑖 ,𝜋⃗⃗⃗ 𝑖 |𝑎⃗⃗ 𝑖 ,𝑋 𝑖 ,𝛽⃗⃗⃗) [log 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗) + log 𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗) + log 𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗)+ log 𝑝 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)𝑞 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑋 𝑖 , 𝛽⃗)] = E 𝜋⃗⃗⃗ 𝑖 ~𝑞(𝜋⃗⃗⃗ 𝑖 |𝑋 𝑖 ,𝛽⃗⃗⃗) [E ℎ⃗⃗⃗ 𝑖 ~𝑞(ℎ⃗⃗⃗ 𝑖 |𝑎⃗⃗ 𝑖 ,𝑋 𝑖 ,𝛽⃗⃗⃗) [log 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗) + log 𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗) − log 𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗)𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)− log 𝑞 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑋 𝑖 , 𝛽⃗)𝑝 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)]] = E ℎ⃗⃗⃗ 𝑖 ~𝑞(ℎ⃗⃗⃗ 𝑖 |𝑎⃗⃗ 𝑖 ,𝑋 𝑖 ,𝛽⃗⃗⃗) [log 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗) + log 𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗) − log 𝑞 𝑒𝑛𝑐 (ℎ⃗⃗ 𝑖 |𝑋 𝑖 , 𝑎⃗ 𝑖 , 𝛽⃗)𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗) ]− E 𝜋⃗⃗⃗ 𝑖 ~𝑞(𝜋⃗⃗⃗ 𝑖 |𝑋 𝑖 ,𝛽⃗⃗⃗) [log 𝑞 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑋 𝑖 , 𝛽⃗)𝑝 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)] = E ℎ⃗⃗⃗ 𝑖 ~𝑞(ℎ⃗⃗⃗ 𝑖 |𝑎⃗⃗ 𝑖 ,𝑋 𝑖 ,𝛽⃗⃗⃗) [log 𝑝 𝑝𝑟𝑒𝑑 (𝑦 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)] + E ℎ⃗⃗⃗ 𝑖 ~𝑞(ℎ⃗⃗⃗ 𝑖 |𝑎⃗⃗ 𝑖 ,𝑋 𝑖 ,𝛽⃗⃗⃗) [log 𝑝 𝑔𝑒𝑛 (𝑋 𝑖 |ℎ⃗⃗ 𝑖 , 𝛽⃗)]− D 𝐾𝐿 [𝑞(ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝑋 𝑖 , 𝛽⃗)||𝑝 𝑝𝑟𝑖𝑜𝑟 (ℎ⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)] − D 𝐾𝐿 (𝑞 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑋 𝑖 , 𝛽⃗)| 𝑝 𝑎𝑡𝑡𝑟 (𝜋⃗⃗ 𝑖 |𝑎⃗ 𝑖 , 𝛽⃗)) A4. Example Rating Page from Aesthetic Rating Survey used in Theme Clinic References

Aaker DA, Keller KL (1990) Consumer evaluations of brand extensions.

Journal of Marketing

Adidas AG 2017 Annual Report . Ansari A, Li Y, Zhang JZ (2018) Probabilistic topic model for hybrid recommender systems: A stochastic variational bayesian approach.

Marketing Science

International Conference on Machine Learning . 214–223. Berlyne DE (1971)

Aesthetics and Psychobiology (Appleton-Century-Crofts, East Norwalk, CT, US). Berthelot D, Schumm T, Metz L (2017) BEGAN: Boundary Equilibrium Generative Adversarial Networks. arXiv:1703.10717 [cs, stat]. Blei DM, Kucukelbir A, McAuliffe JD (2017) Variational Inference: A Review for Statisticians.

Journal of the American Statistical Association

Journal of Marketing

Keeping it Fresh: Strategic Product Redesigns and Welfare (National Bureau of Economic Research). Bouchard C, Aoussat A, Duchamp R (2006) Role of sketching in conceptual design of car styling.

Journal of Design Research

Journal of the American Statistical Association

Cowles Foundation Discussion Paper No. 2176 . Chan TH, Mihm J, Sosa ME (2018) On styles in product design: An analysis of U.S. design patents.

Management Science arXiv:1611.02731 [cs, stat] . Cho H, Hasija S, Sosa M (2015)

How Important is Design for the Automobile Value Chain? (Social Science Research Network, Rochester, NY). Clement J (2007) Visual influence on in-store buying decisions: An eye-track experiment on the visual influence of packaging design.

Journal of Marketing Management

Watches Tell More Than Time: Product Design, Information, and the Quest for Elegance (McGraw-Hill London). Cooper RG (1990) Stage-gate systems: A new tool for managing new products.

Business horizons

Journal of Product Innovation Management

Design Studies

Journal of Product Innovation Management

IEEE Signal Processing Magazine , (4), 80-106. 3 Dilokthanakul N, Mediano PAM, Garnelo M, Lee MCH, Salimbeni H, Arulkumaran K, Shanahan M (2016) Deep unsupervised clustering with gaussian mixture variational autoencoders. arXiv:1611.02648 [cs, stat] . Dotson J, Beltramo M, Feit E, Smith R (2016). Modeling the Effect of Images on Product Choices. SSRN Electronic Journal . Dzyabura D, Hauser JR, El Kihal S, Ibragimov M (2018) Leveraging the power of images in predicting product return rates.

SSRN Electronic Journal . Evgeniou T, Pontil M (2004) Regularized multi–task learning.

Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . (ACM), 109–117. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014). Generative adversarial nets.

Advances in Neural Information Processing Systems . 2672-2680. Hartley J (1996)

Brands Through the Lens of Style.

Quest and Associates.

Hauser J, Eggers F, Selove M (2018). The Strategic Implications of Scale in Choice-Based Conjoint Analysis. Hekkert P, Snelders D, Wieringen PC (2003) ‘Most advanced, yet acceptable’: Typicality and novelty as joint predictors of aesthetic preference in industrial design.

British Journal of Psychology arXiv:1807.03026 [cs, stat] . Heljakka A, Solin A, Kannala J (2019) Towards photographic image manipulation with balanced growing of generative autoencoders. arXiv:1904.06145 [cs, stat] . Hertenstein JH, Platt MB, Veryzer RW (2005) The impact of industrial design effectiveness on corporate financial performance*.

Journal of Product Innovation Management

Advances in Neural Information Processing Systems . 6626–6637. Hoffman M, Blei D, Wang C, Paisley J (2013). Stochastic variational inference.

Journal of Machine Learning Research

Journal of Marketing

Advances in Neural Information Processing Systems . International Journal of Data Science and Analytics arXiv:1611.01144 [cs, stat] . Jindal RP, Sarangee KR, Echambadi R, Lee S (2016) Designed to succeed: Dimensions of product design and their impact on market share.

Journal of Marketing

Machine learning . 207–216. Keng (2017) Semi-supervised Learning with Variational Autoencoders. Available: https://bit.ly/2O9RvF8. Karjalainen TM, Snelders D (2010) Designing visual recognition for the brand.

Journal of Product Innovation Management arXiv preprint arXiv:1710.10196 . Keller KL (2003) Brand Synthesis: The multidimensionality of brand knowledge.

Journal of Consumer Research

International Conference on Learning Representation . (San Diego, California). Kingma DP, Mohamed S, Rezende DJ, Welling M (2014) Semi-supervised learning with deep generative models.

Advances in Neural Information Processing Systems . 3581–3589. Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 . Kreuzbauer R, Malter AJ (2005) Embodied cognition and new product design: Changing product form to influence brand categorization.

Journal of Product Innovation Management

Marketing Science arXiv preprint arXiv:1512.09300 . Liu L, Dzyabura D, Mizik N (2017) Visual listening in: Extracting brand image portrayed on social media. Liu X, Lee D, Srinivasan K (2018) Large scale cross-category analysis of consumer review content on sales conversion leveraging deep learning.

Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence . Liu Y, Li KJ, Chen H, Balachander S (2017) The effects of products’ aesthetic design on demand and marketing-mix effectiveness: The role of segment prototypicality and brand consistency.

Journal of Marketing

Journal of Mechanical Design , (2), 021104. Makhzani A, Shlens J, Jaitly N, Goodfellow I, Frey B (2015) Adversarial autoencoders. arXiv preprint arXiv:1511.05644 . Manoogian J II (2013) Vehicle design process used at general motors. Martindale C (1990) The Clockwork Muse: The Predictability of Artistic Change (Basic Books, New York, NY). Miyato T, Kataoka T, Koyama M, Yoshida Y (2018) Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957 . Noble CH, Kumar M (2010) Exploring the appeal of product design: A grounded, value-based model of key design elements and relationships.

Journal of Product Innovation Management

Emotional Design: Why We Love (or Hate) Everyday Things (Basic books, New York, NY). Olshausen BA, Field DJ (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images.

Nature

Journal of Mechanical Design

Journal of marketing , (3):64-81. Orme B, Chrzan K (2017). Becoming an Expert in Conjoint Analysis: Choice Modelling for Pros. Sawtooth Software. Palazzolo M, Feinberg F (2015) Modeling consideration set substitution. Working Paper - University of Michigan. 5 Pan Y, Burnap A, Hartley J, Gonzalez R, Papalambros PY (2017) Deep design: Product aesthetics for heterogeneous markets. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . KDD ’17. (ACM, New York, NY, USA), 1961–1970. Pauwels K, Silva-Risso J, Srinivasan S, Hanssens DM (2004) New products, sales promotions, and firm value: The case of the automobile industry.

Journal of Marketing

Journal of Marketing Management

Design Studies arXiv:1511.02386 [cs, stat] . Ranscombe C, Hicks B, Mullineux G, Singh B (2012) Visually decomposing vehicle images: Exploring the influence of different aesthetic features on consumer perception of brand.

Design Studies

Journal of Mechanical Design

The Journal of Product and Brand Management; Santa Barbara

Marketing Science

Computer Vision – ECCV 2018 Workshops . (Springer International Publishing, Cham), 37–44. Shu R, Bui HH, Zhao S, Kochenderfer MJ, Ermon S (2018) Amortized inference regularization. arXiv:1805.08913 [cs, stat] . Sohn K, Lee H, Yan X (2015) Learning structured output representation using deep conditional generative models.

Advances in Neural Information Processing Systems . 3483–3491. Sun C, Ghose A, Liu X (2018) Automating Online-Offline Data Multi-View Merger for Integrated Marketing. Available at SSRN: http://dx.doi.org/10.2139/ssrn.3265400. Timoshenko A, Hauser JR (2019) Identifying customer needs from user-generated content.

Marketing Science

Thirty-Second AAAI Conference on Artificial Intelligence . Vasileva, M, Plummer B, Dusad K, Rajpal S, Kumar R, Forsyth D (2018). Learning type-aware embeddings for fashion compatibility. In

Proceedings of the European Conference on Computer Vision (ECCV) . 390-405. Vlasic B (2011)

Once upon a car: The fall and resurrection of America’s big three auto makers–GM, Ford, and Chrysler (William Morrow). Yoganarasimhan H (2017) Identifying the presence and cause of fashion cycles in data.

Journal of Marketing Research arXiv:1904.07964 [cs, stat]arXiv:1904.07964 [cs, stat]