Quantifying Confounding Bias in Generative Art: A Case Study
Ramya Srinivasan and
Kanji Uchino
Fujitsu Laboratories of America
Abstract
In recent years, AI generated art has become very popular. From generating artworks in the style of famous artists like Paul Cezanne and Claude Monet to simulating styles of art movements like Ukiyo-e, a variety of creative applications have been explored using AI. Looking from an art historical perspective, these applications raise some ethical questions. Can AI model artists’ styles without stereotyping them? Does AI do justice to the socio-cultural nuances of art movements? In this work, we take a first step towards analyzing these issues. Leveraging directed acyclic graphs to represent the potential process of art creation, we propose a simple metric to quantify confounding bias due to the lack of modeling the influence of art movements in learning artists’ styles. As a case study, we consider the popular cycleGAN model and analyze confounding bias across various genres. The proposed metric is more effective than a state-of-the-art outlier detection method in understanding the influence of art movements in artworks. We hope our work will elucidate important shortcomings of computationally modeling artists’ styles and trigger discussions related to accountability of AI generated art.
From healthcare and finance to judiciary and surveillance, artificial intelligence (AI) is being employed in a wide variety of applications (Buch, Ahmed, and Maruthappu 2018; Lin 2019; Feldstein 2019). AI has also made inroads into creative fields such as music, dance, poetry, storytelling, cooking, and fashion design, to name a few (Engel et al. 2019; Pettee et al. 2019; Liu et al. 2018; Varshney et al. 2019; Jandial et al. 2020). Creating portraits, generating paintings in the “style” of famous artists, style transfer (i.e., rendering the contents of one image in the style of another image), and creating novel art styles have been some popular applications of AI in art generation (Zhu et al. 2017; Tan et al. 2017; Elgammal et al. 2017; Gatys, Ecker, and Bethge 2016).

With the growing adoption of AI, a large body of work has analyzed its ethical impacts in sensitive applications such as medicine and law enforcement (Obermeyer et al. 2019; Buolamwini and Gebru 2018; Lum, Boudin, and Price 2020; Raghavan et al. 2020). Of late, there has been considerable interest in understanding AI related biases in creative tasks as well. For example, in (Prates, Avelar, and Lamb 2019), the authors investigate gender bias in AI generated translations. A recent work by researchers at the Allen Institute for Artificial Intelligence demonstrates toxicity in popular language models (Wiggers 2020). In (Jain et al. 2020), the authors show that synthetic images obtained from Generative Adversarial Networks (GANs) exacerbate biases of training data. A notable instance of bias in AI generated art concerns an app called “AIportraits” that is shown to exhibit racial bias (Ongweso 2019). It was pointed out that the skin color of people of color is lightened in the app’s portrait rendition.

In addition to noticeable biases concerning race, gender, etc., there can be several latent biases in AI generated art, especially in the context of modeling artists’ styles and style transfer.
For example, the authors in (Srinivasan and Uchino 2020) leverage causal models to study several types of biases in modeling art styles and discuss socio-cultural implications of the same. In a similar vein, the authors in (Hassine and Neeman 2019) discuss some of the shortcomings in AI generated art and argue that such art is rife with culturally biased interpretations.
Artworks have often been used to document important historical events such as wars, political developments, mythological facts, literary anecdotes, and many aspects of everyday lives of common people (Rabb and Brown 1986). For example, ancient Greek art is abundant with mythological paintings depicting Goddesses like Athena and Hera, many Indian artworks portray political and historical events such as the Anglo-Maratha wars and the Anglo-Sikh wars, and ancient Egyptian genre art illustrates culturally rich scenes from lives of ordinary people, such as how women prepared food and how people measured harvest. Art movements entail a wealth of information related to culture, politics, and social structure of past times, and these aspects are often not captured in generated art. Furthermore, owing to automation bias exhibited by people (Skitka, Mosier, and Burdick 1999), AI generated art that fails to do justice to the subtleties of art movements can precipitate bias in understanding history.

Furthermore, any generated art that claims to mimic artists’ styles should not stereotype artists based on a single algorithmically quantifiable metric such as color, brushstrokes, texture, etc. As artist Paul Cezanne describes, “If I were called upon to define briefly the word Art, I should call it the reproduction of what the senses perceive in nature, seen through the veil of the soul”. Thus, several cognitive aspects such as perception, memory, beliefs, and emotions influence artists and artworks.

Figure 1: Sample illustration of Impressionism and Post-Impressionism artworks used in the analysis. Impressionism works were characterized by vibrant colors, spontaneous and accurate rendering of light, color, and atmosphere, focusing mostly on urban lifestyles. Post-Impressionism works focused on lives of ordinary people, depicting emotions and other symbolic contents.
In reality, many of these aspects can never be observed or measured, and thus the true style of any artist cannot be computationally modeled. Models like (Zhu et al. 2017) and (Tan et al. 2017) that claim to generate art in the styles of artists like Claude Monet, Vincent van Gogh, and others are at best capturing correlational features like colors or brushstrokes and overlooking many latent aspects (such as culture and emotions) that characterize artists’ styles (Hertzmann 2018).

For the aforementioned reasons, understanding biases in AI generated art is a necessary task. Given the prevalence of a large number of tools to easily mimic artists’ “styles”, this task becomes even more pertinent. Prior work has mostly focused on qualitatively analyzing biases in AI generated art (Srinivasan and Uchino 2020; Hassine and Neeman 2019). In this work, we provide a quantitative analysis of confounding biases in AI generated art. In general, confounding biases arise due to unmeasured factors that influence both the inputs and outputs of interest. In particular, we quantify the confounding bias due to the lack of modeling of art movement’s influence on artists and artworks.

Art movements can be described as tendencies or styles in art with a specific common philosophy influenced by various factors such as cultures, geographies, political-dynastical markers, etc. and followed by a group of artists during a specific period of time (Wikiart 2020). Renaissance art, Modern art, and Ukiyo-e are some examples of art movements. Further, each art movement can have sub-categories. For example, modern art includes many sub-categories such as Dadaism, Impressionism, Post-Impressionism, Naturalism, Cubism, Futurism, etc.

Let us consider Impressionism and Post-Impressionism, as these are the art movements analyzed in the paper. Figure 1 provides an illustration of artworks belonging to these movements. Although both these movements originated in France, they are marked by subtle differences.
Impressionism was characterized by spontaneous brush strokes, vibrant colors, and urban lifestyles. Impressionists emphasized accurate depiction of light with its changing quality, precise characterization of movement, and the atmosphere (Oxford-Art-Online 2021). Post-Impressionism originated in reaction to Impressionism. Post-Impressionists rejected Impressionists’ concern over accurate depiction of color; instead, they focused on symbolic depiction of content, formal order, and structure. Post-Impressionism artists focused on lives of ordinary people to naturally depict their emotions and lifestyles (Oxford-Art-Online 2021). Thus art movement is a dominant factor influencing both the artists and artworks. A model that ignores the influence of art movement in modeling an artist’s style can thus fail to capture socio-cultural nuances and contribute to confounding bias.
As a case study, we consider the cycleGAN model (Zhu et al. 2017), which has been used to model styles of Paul Cezanne, Claude Monet, and Vincent van Gogh. This is a fully automated method without a human (i.e., an artist) in the loop. Studying biases associated with fully automated AI methods is an essential precursor to understanding biases in AI methods that aid artists in completing an artwork. This is because the latter set of methods can involve both artist and AI related biases, and understanding AI related biases independent of artist specific bias can thus be very beneficial. Therefore, we find the model proposed in (Zhu et al. 2017) appropriate for our case study.

We consider the influence of Impressionism and Post-Impressionism in modeling artists’ styles, as these were the dominant art movements that influenced the artists under consideration in (Zhu et al. 2017). We evaluate the bias due to lack of consideration of art movement in modeling artists’ style in the cycleGAN model across various genres such as landscapes, cityscapes, still life, and flower paintings. It is worth noting that most existing AI methods used to generate art styles largely focus on western art movements. Ideally, it is important to study the biases in generative art corresponding to non-western art movements, as these art movements are at greater risk of being biased due to the already existing social structural disparities. However, due to the paucity of existing AI tools that model multiple art styles of non-western traditions, we have focused on the two aforementioned western art movements for analysis.

Motivated by (Srinivasan and Uchino 2020), we leverage directed acyclic graphs (Pearl 2009) in order to estimate confounding bias.
First, causal relationships between art movement, artists, artworks, art material, genre, and other relevant factors are encoded via directed acyclic graphs (DAGs). DAGs serve as accessible visual analysis and interpretation tools for art historians to encode their domain knowledge. As we are interested in understanding the causal influence of the artist on the artwork, in our DAG, artist is the input variable and artwork is the output variable. Art movement, art material, and genres are potential confounders. Next, the minimum adjustment set to remove confounding bias is determined using d-separation rules and the backdoor adjustment formula (Pearl 2009). As our goal is to analyze the role of art movement in modeling artists’ style, we fix genre and art material across the images used in our analysis. Thus we only have to adjust for art movement.

The computation of confounding bias is based on the idea of covariate matching (Stuart 2010). Suppose the set of real artworks of an artist $i$ is denoted by $A_i$ and the set of cycleGAN generated images corresponding to the artist is denoted by $G_i$. Further, let $A_j$, where $j \in \{1, 2, \ldots, n\}$, $j \neq i$, be the set of real artworks of other artists belonging to the same art movement as artist $i$. First, a RESNET50 architecture is trained to distinguish between Impressionism and Post-Impressionism artworks (He et al. 2015). Then, using the learned classifier’s features representative of the art movement, every element of $A_i$ is matched with its nearest neighbor in $G_i$. Next, every element of $A_i$ is matched with its nearest neighbor in $A_j$. As there can be many artists belonging to the same art movement, we compute the nearest neighbor of $A_i$ with respect to all such artists. As all confounders other than art movement are fixed across all the images in the analysis, any difference between the matched pairs should reflect the bias due to lack of modeling art movement.
In an ideal scenario where the style of the artist is accurately modeled, the mismatch between $A_i$ and $G_i$ should be low, and the mismatch between $A_i$ and $A_j$ should be high, assuming any two artists have distinct styles of their own. Using these intuitions, we propose a simple metric to quantify confounding bias due to the lack of modeling art movement. We also show how our metric is able to quantify bias that state-of-the-art outlier detection methods (Shastry and Oore 2020) cannot capture.

Our findings show that understanding the influence of art movement is essential for learning about artists’ style. This is even more important for learning the styles of artists whose works largely belong to one art movement (e.g., Claude Monet, whose works mostly belong to the Impressionism art movement). This is because the influence of art movement is likely to be higher for such an artist than for those whose works span various art movements. We elaborate on these insights in Section 6. In reality, the true style of an artist cannot be modeled due to many unobserved confounders such as the emotions, beliefs, and other cognitive abilities of the artist. In this regard, we hope our work triggers inter-disciplinary discussions related to accountability of AI generated art, such as the need to understand the feasibility of modeling artists’ styles, the need for incorporating domain knowledge in AI based art generation, and the socio-cultural consequences of AI generated art.

The rest of the paper is organized as follows. Section 2 reviews some related work. In Section 3, we provide an overview of directed acyclic graphs that we leverage to model confounding bias. In Section 4, we describe confounding bias with illustrations. In Section 5, we provide an overview of the method. We report results from our experiments in Section 6. We analyze and discuss the implications of the results in Section 7, before concluding in Section 8.
There has been a growing interest in using AI to generate art. A good review of AI powered artworks can be found in (Miller 2019). There are a variety of AI models to generate art, generative adversarial networks (GANs) being a prominent type. Models such as (Zhu et al. 2017), (Elgammal et al. 2017), and (Tan et al. 2017) are just some illustrations of GAN based art generation. In (Gatys, Ecker, and Bethge 2016), a convolutional neural network architecture is proposed for style transfer. There are also open source platforms that let end users easily create art. For example, (Macnish 2018) allows a user to convert a photo into a cartoon. With (Artbreeder 2020), one can blend the contents of a photo in the style of another. Platforms like (AIportraits 2020) and (GoART 2020) claim to convert a user uploaded photo into the style of famous artists and art movements.

It is also interesting to note that art has been used to expose bias in the AI pipeline. A very prominent example in this regard is the ‘Imagenet Roulette’ project by AI researcher Kate Crawford and artist Trevor Paglen (Crawford and Paglen 2019), wherein biases in machine learning datasets are highlighted through art. A convolutional neural network based architecture is proposed in (Mordvintsev, Olah, and Tyka 2015) that helps to visualize the workings of various layers in deep networks by creating dream-like appearances. These visualizations can aid in understanding the functioning of various layers.

Some recent works have exposed biases in AI generated art. For instance, it was reported in (Sung 2019; Ongweso 2019) that the AIportraits app (AIportraits 2020) was biased against people of color. In (Jain et al. 2020), considering synthetic images generated by GANs, the authors point out that GAN architectures are prone to exacerbating biases of training data. The authors in (Hassine and Neeman 2019) discuss some shortcomings of AI generated art and argue that such art is rife with cultural biases.
The closest work to the present work is (Srinivasan and Uchino 2020), wherein the authors leverage causal graphs to qualitatively highlight various types of biases in AI generated art. We take a step further: leveraging the model proposed in (Srinivasan and Uchino 2020), we quantify confounding bias in AI generated art. This kind of quantitative analysis provides an objective measure for understanding bias.
A directed acyclic graph (DAG) is a directed graph without any loops or cycles. Variables of interest are represented by nodes in the graph, and the directed edges between them indicate the causal relations. These directions are often based on assumptions of domain experts and available knowledge. DAGs allow encoding of assumptions about data, model, and analysis, and serve as a tool to test for various biases under such assumptions. DAGs enable domain experts such as art historians to encode their assumptions, and hence serve as accessible data visualization and analysis tools.

As noted in (Srinivasan and Uchino 2020), there are several aspects that can characterize an artwork. These include the artist, art material, genre, art movement, etc. The relationships between these various aspects can be determined by domain experts. For example, a domain expert (e.g., an art historian) may premise that genre can influence both the artist and the artwork. DAGs aid in visualizing the relationships between these various aspects. Figure 2 provides an illustration of a DAG encoding one set of such assumptions. It is to be noted that depending on the assumptions of various domain experts, there can be other DAGs describing the relationship of an artwork with the artist, genre, art movement, etc. However, confounding biases can be analyzed separately for each DAG, thereby enhancing the robustness of analysis.

Given a DAG, d-separation is a criterion for deciding whether a set X of variables is independent of another set Z, given a third set Y. The idea is to associate “dependence” with “connectedness” (i.e., the existence of a connecting path) and “independence” with “unconnectedness” or “separation” (Pearl 2009). Path here refers to any consecutive sequence of edges, disregarding their direction.

Consider a three vertex graph consisting of vertices X, Y, and Z.
There are three basic types of relations using which any pattern of arrows in a DAG can be analyzed, these being as follows.
• X → Y → Z (causal chain/mediation)
• X ← Y → Z (confounder)
• X → Y ← Z (collider)

In the first case, the effect of X on Z is mediated through Y. Conditioning on Y, X becomes independent of Z; Y is said to block the path from X to Z.

In the second case, Y is a common cause of X and Z. Y is a confounder as it causes spurious correlations between X and Z. Conditioning on Y, the path from X to Z is blocked. This is the scenario we will analyze in detail in this paper. For example, for the DAG in Figure 2, genre G, art movement A, and art material M are all confounders in determining the causal effect of artist X on artwork Z. The causal effect of artist on artwork captures the artist’s influence on the artwork, and is hence reflective of their style.

Figure 2: DAG for the case study considered. X: Artist, Z: Artwork, A: Art movement, G: Genre, M: Art material. Image Source: [Srinivasan and Uchino 2020]

In the last case, Y is a collider as two arrows enter into it. As such, the path from X to Z is blocked. Upon conditioning on Y, the path will be unblocked. In general, a set Y is admissible (or “sufficient”) for estimating the causal effect of X on Z if the following two conditions hold (Pearl 2009):
• No element of Y is a descendant of X
• The elements of Y block all backdoor paths from X to Z, i.e., all paths that end with an arrow pointing to X.

Thus we need to block all backdoor paths in order to remove the effect of confounders (which can introduce spurious correlations) in determining the causal effects of interest. With this background, we discuss confounding bias in more detail in the following section.

The style of an artist is characterized by several aspects. Some such aspects may be observable (e.g., art material, genre, art movements, etc.) and some others such as emotions, beliefs, prejudices, memory, etc.
cannot be perceived or observed. For this reason, the true style of any artist cannot be computationally captured. Our goal is thus not to computationally model any artist’s style, but to analytically highlight the shortcomings in the models that claim to mimic artists’ style. As the bias with respect to unobserved cognitive aspects such as emotions, memory, etc. can never be measured, we restrict our analysis to observable aspects.

We discuss confounding biases that arise due to common causes that affect both the inputs and outputs of interest. In our setting, confounding biases can arise due to factors that affect both artists and artworks. Based on the assumptions encoded in the DAG, such confounders could include art movement, genres, art materials, etc. A model that does not consider the influence of these confounders is prone to bias.

For analysis, we consider the DAG provided by (Srinivasan and Uchino 2020) as shown in Figure 2. We will use this as a running example throughout the paper. Here, the variable X denotes the artist, Z denotes the artwork, G is the genre, M is the art material, and A denotes the art movement. In this setting, the problem of modeling an artist’s style can be viewed as estimating the causal effect of X on Z. According to the assumptions encoded in this DAG, art material, genre, and art movement are confounders influencing both the artist and the artwork. Further, art movement influences the art material. Let us assume that all of the confounders are observable. Under these assumptions, in order to compute the causal effect of an artist on the artwork, we have to block the backdoor paths from X to Z, so as to remove confounding bias.

In order to block all backdoor paths in Figure 2, one has to adjust for genre, art movement, and art material by conditioning on those variables. The following expression captures the causal effect of X on Z for the graph in Figure 2.
$$CE = \sum_{g,a,m} P(Z \mid x, G = g, A = a, M = m) \tag{1}$$

where $CE$ denotes the causal effect of X on Z. The summation $\sum_{g,a,m}$ captures the adjustment across all possible art movements, art materials, and genres that the artist has worked in, in order to model their style. The implication of finding a sufficient set, {A, G, M}, is that stratifying on A, G, M is guaranteed to remove all confounding bias relative to the causal effect of X on Z.

The above instance depicted a DAG without any unobserved confounders. However, in reality, there are many unobserved confounders such as the artist’s memory, beliefs, and emotions. In the presence of unobserved confounders, the causal effect of X on Z is not identifiable, implying that the true style of an artist cannot be modeled. The authors in (Srinivasan and Uchino 2020) illustrate this scenario with a DAG as shown in Figure 3. For the purposes of this work, we will consider only observable confounders and demonstrate the confounding bias that is associated with (Zhu et al. 2017) in not considering the influence of confounders like art movements to model artists’ styles.

Art movements introduced techniques, materials, and themes unique to the culture, society, geographic region, and the times during which these movements gained prominence. Art movements were symbolic of historical, religious, social, and political events of their times. Artists were heavily influenced by the style propagated by the art movement. By not considering the influence of art movement in modeling an artist’s style, the social/cultural/religious/political significance associated with the artwork may be lost, and the intent of the artwork may be misrepresented. In the next section, we describe the proposed method for quantifying confounding bias.

Figure 3: DAG with unobserved confounders. X: Artist, Z: Artwork, A: Art movement, G: Genre, M: Art material, E: Unobserved emotions of the artist. Dotted lines indicate unobserved variables. Image Source: [Srinivasan and Uchino 2020]
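The backdoor reasoning applied to Figure 2 in this section can be checked mechanically. The following is a minimal sketch, not the authors' code: the edge list is an assumption read off the text (art movement, genre, and art material each point to both artist and artwork, art movement points to art material, and artist points to artwork), and the helper names `backdoor_paths`, `is_blocked`, and `is_admissible` are hypothetical. Variables held fixed by design (genre and art material in this paper) are treated as conditioned on.

```python
# Assumed edges (parent, child) for the DAG of Figure 2, as described in the text.
EDGES = [("A", "X"), ("A", "Z"), ("A", "M"), ("G", "X"),
         ("G", "Z"), ("M", "X"), ("M", "Z"), ("X", "Z")]

def parents(node):
    return {p for p, c in EDGES if c == node}

def children(node):
    return {c for p, c in EDGES if p == node}

def descendants(node):
    """All nodes reachable from `node` by following edge directions."""
    seen, stack = set(), [node]
    while stack:
        for child in children(stack.pop()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def simple_paths(src, dst, path=None):
    """Enumerate simple paths from src to dst, ignoring edge direction."""
    path = path or [src]
    if path[-1] == dst:
        yield list(path)
        return
    for nxt in parents(path[-1]) | children(path[-1]):
        if nxt not in path:
            yield from simple_paths(src, dst, path + [nxt])

def backdoor_paths(x, z):
    """Paths between x and z whose first edge points into x."""
    return [p for p in simple_paths(x, z) if p[1] in parents(x)]

def is_blocked(path, adjust):
    """d-separation check for a single path given the adjustment set."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = prev in parents(mid) and nxt in parents(mid)
        if collider:
            # A collider blocks unless it or one of its descendants is conditioned on.
            if mid not in adjust and not (descendants(mid) & adjust):
                return True
        elif mid in adjust:
            # A conditioned chain or fork node blocks the path.
            return True
    return False

def is_admissible(x, z, adjust):
    """Backdoor criterion: no descendant of x in the set, all backdoor paths blocked."""
    adjust = set(adjust)
    if adjust & descendants(x):
        return False
    return all(is_blocked(p, adjust) for p in backdoor_paths(x, z))

if __name__ == "__main__":
    for path in backdoor_paths("X", "Z"):
        print(" - ".join(path))
    print(is_admissible("X", "Z", {"A", "G", "M"}))
    print(is_admissible("X", "Z", {"A"}))
```

Under these assumed edges, the set {A, G, M} satisfies the backdoor criterion, whereas A alone does not. This is consistent with the paper's design choice of holding genre and art material fixed so that only art movement remains to be adjusted for.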
Our goal is to be able to quantify the confounding bias due to the lack of consideration of art movement’s influence in modeling artists’ styles. Thus, first we need to learn good representations of art movements.
The first step is to learn good representations of the images under study with respect to the art movements of interest. We will then use these representations to compute confounding bias (see Section 5.2). We use the RESNET50 architecture (He et al. 2015) to learn classifiers for distinguishing Impressionism from Post-Impressionism. Then, we extract the learned features from the penultimate layer of the trained network for representing the art movements (see Section 6.1). In order to learn accurate representations of the art movements under study, we must ensure diversity in the artworks belonging to those art movements, i.e., we must consider artworks across genres and art material belonging to the art movement, or else we will be learning a biased representation of the art movement.

In fact, as part of our experiments, we tried to learn art movements while fixing the genre, but this lowered the accuracy of the classifier; thus, in order to learn reliable representations of art movements, we need to consider all artworks (across genres, materials, etc.) belonging to the art movement. Confounding bias in modeling styles of artists Monet, Cezanne, and van Gogh is computed using these learned features across multiple genres such as landscapes, cityscapes, flower paintings, and still life. Next we describe the procedure for computation of confounding bias.
We fix genre and art material across all the images considered so that we only have to adjust for art movement as a confounder. However, this does not hurt the generalizability of the method. For multiple confounders, all the elements in the minimum adjustment set have to be adjusted in the same way as the adjustment of art movement described below.

We leverage the concept of covariate matching in order to adjust for confounders. In our problem setting, we want to be able to estimate the causal effect of an artist, say $X = i$, in the presence of a confounder, namely, art movement. Suppose the set of real artworks of the artist is denoted by $A_i$ and the set of generated images of the artist (by the cycleGAN model) is denoted by $G_i$. Specifically, let

$$A_i = \{a_{i1}, a_{i2}, \ldots, a_{iK}\} \tag{2}$$

where $K$ is the number of real artworks of artist $i$, and let

$$G_i = \{g_{i1}, g_{i2}, \ldots, g_{iL}\} \tag{3}$$

where $L$ is the number of generated artworks of artist $i$, and let

$$A_j = \{a_{j1}, a_{j2}, \ldots, a_{jR}\} \tag{4}$$

denote the $R$ real artworks of an artist $j$ belonging to the same art movement as $i$, and belonging to the same genre as considered in the analysis. Since there can be more than one artist belonging to the same art movement as $i$, assume there are $J$ such artists, so $j \in \{1, 2, \ldots, J\}$, $j \neq i$ denotes all these artists. Typically, artists are identified with specific art movements, and such information can be obtained from sources like (Wikiart 2020); the set $A_j$ can be constructed using this information.

First, for each element $g_{il} \in G_i$, its nearest neighbor $a^{match}_{il}$ in set $A_i$ is computed based on the values of the confounders, i.e., features representative of the art movement obtained from (He et al. 2015). Note that each element in the sets $A_i$, $G_i$, $A_j$ is a 1000 dimensional vector. Next, for each element $a_{ik} \in A_i$, its nearest neighbor $a^{match}_{jk}$ in set $A_j$ is computed.
As all other potential confounders such as genre and art material are fixed to be the same across all the images considered, the difference in the corresponding matches between sets $A_i$ and $G_i$ is a measure of the variation in (lack of) modeling art movement and in modeling the specific artist’s style. Similarly, any difference between the corresponding matches between sets $A_i$ and $A_j$ is a measure of variation across artists’ styles and art movements.

In an ideal scenario where a generative model is able to accurately learn the style of artist $i$ considering the influence of art movement, the difference between corresponding matches between the real and generated images, i.e., $(A_i - G_i)$, should be close to 0. On the other hand, the difference between matches across artists should be significant compared to the difference between real and generated images of an artist, i.e., $(A_i - A_j) > (A_i - G_i)$. This is because different artists have distinct styles of their own, assuming they do not mimic one another. Using these intuitions, we propose the following metric to quantify confounding bias due to lack of modeling art movement.

$$\text{bias} = \frac{\frac{1}{L}\sum_{l} \left| a^{match}_{il} - g_{il} \right|}{\frac{1}{J}\sum_{j} \frac{1}{K}\sum_{k} \left| a^{match}_{jk} - a_{ik} \right|} \tag{5}$$

The aforementioned metric captures the two intuitions just described. The numerator in the above equation captures the average difference between real artworks and generated images for artist $i$ across all the generated images. The denominator captures the average difference between real artworks of the artist under consideration and other artists belonging to the same art movement. The inner summation and averaging normalizes with respect to the artist $i$, considering all real artworks of $i$, and the outer summation and averaging normalizes with respect to all $J$ artists belonging to the same art movement as $i$.
When the generated images are similar to real artworks of $i$, the numerator is close to 0. This happens when art movement’s influence is modeled accurately (amongst other relevant factors), since we consider features representative of art movement in capturing this difference. In a similar vein, the denominator of the above metric will be high when the specific artist’s style is learned correctly. So, a low value of the above metric denotes low confounding bias with respect to art movement. Note that the value of the metric can be greater than 1, in which case we assume that there is considerable confounding bias.

We use Euclidean distance to compute matches. We also tried other distance measures such as Manhattan distance, Chebyshev distance, and Wasserstein distance. Across all distance measures, we observed that the relative order of the bias scores remained the same; thus the metric is not sensitive to changes in the choice of distance measure.
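The matching procedure and the metric in Equation 5 can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' implementation: feature vectors are plain tuples of floats, the term $|\cdot|$ in Equation 5 is interpreted as the distance between a vector and its nearest-neighbor match, and the function names `nearest_distance` and `confounding_bias` are hypothetical. The toy two-dimensional vectors stand in for the learned art movement features.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """Manhattan distance, one of the alternative measures mentioned in the text."""
    return sum(abs(a - b) for a, b in zip(u, v))

def nearest_distance(query, candidates, dist=euclidean):
    """Distance from `query` to its nearest-neighbor match in `candidates`."""
    return min(dist(query, c) for c in candidates)

def confounding_bias(real_i, generated_i, real_others, dist=euclidean):
    """Ratio in the spirit of Equation 5.

    real_i      : feature vectors of real artworks of artist i (A_i)
    generated_i : feature vectors of generated images for artist i (G_i)
    real_others : list of sets A_j, one per other artist of the same movement
    """
    # Numerator: average over the L generated images of the distance
    # between g_il and its matched real artwork in A_i.
    numerator = sum(
        nearest_distance(g, real_i, dist) for g in generated_i
    ) / len(generated_i)

    # Denominator: for each other artist j, average over the K real artworks
    # a_ik of the distance to the match in A_j, then average over the J artists.
    per_artist = [
        sum(nearest_distance(a, real_j, dist) for a in real_i) / len(real_i)
        for real_j in real_others
    ]
    denominator = sum(per_artist) / len(per_artist)
    return numerator / denominator

if __name__ == "__main__":
    # Toy 2-D vectors (assumed data, for illustration only).
    A_i = [(0.0, 0.0), (1.0, 0.0)]
    G_i = [(0.0, 1.0), (1.0, 1.0)]
    A_others = [[(0.0, 2.0), (1.0, 2.0)], [(0.0, 4.0), (1.0, 4.0)]]
    print(confounding_bias(A_i, G_i, A_others))
    print(confounding_bias(A_i, G_i, A_others, dist=manhattan))
```

Passing the distance as a parameter makes it straightforward to repeat the computation with different measures and compare the resulting ordering of bias scores across artists and genres.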
In this section, we report results on computing confounding bias along with an interpretation of the same. We begin by describing experiments on learning representations of art movements.
We train a RESNET50 (He et al. 2015) classifier to distinguish between Impressionism and Post-Impressionism, the prominent art movements that were characteristic of artists considered in the cycleGAN model. Specifically, we start with the model pre-trained on the Imagenet dataset and fine-tune using the art dataset under study. We then use the learned features from the penultimate layer of the trained model as representations of the art movement, resulting in a 1000 dimensional vector for each image. Note that any state-of-the-art architecture could be used in place of (He et al. 2015). In order to train the classifier to distinguish between Impressionism and Post-Impressionism, we need to consider artworks across artists belonging to those art movements. From (Wikiart 2020), we collected artworks belonging to artists who were identified as belonging to these art movements, and whose majority of the works ( > ) belonged to Impressionism or Post-Impressionism. This ensured collecting artworks representative of the concerned art movements.

We thus collected about 5083 images belonging to Impressionism and about 3495 images belonging to Post-Impressionism by crawling images from Wikiart. The dataset consists of Impressionist artists like Berthe Morisot, Edgar Degas, Mary Cassatt, Childe Hassam, Antoine Blanchard, Claude Monet, Gustave Caillebotte, Joaquín Sorolla, Konstantin Korovin, amongst others. Post-Impressionist artists included in the dataset are Vincent van Gogh, Paul Cezanne, Samuel Peploe, Moise Kisling, Ion Pacea, Pyotr Konchalovsky, Maurice Prendergast, Maxime Maufra, etc. A sample illustration of the dataset is provided in Figure 1. We used of the images for training and the rest for validation. We obtained best validation accuracy of . with Adam optimizer, learning rate = 0.0001, and batch size = 50.

Additionally, we conducted the experiments with other models such as RESNET34, VGG16, and EfficientNet B0-3 to check for any performance improvement.
Except for EfficientNet B0-3 being computationally faster, there was no significant improvement in validation accuracy, so we resorted to RESNET50 features. It is to be noted that Post Impressionism emerged as a reaction to Impressionism, and many artists such as Cezanne worked across these two art movements. Due to these factors, the two art movements have subtle differences which are often hard to capture computationally, so it is unsurprising that the validation accuracy is not very high. Nevertheless, these learned features serve as a baseline in capturing representations of art movements. Specifically, we used the features from the penultimate layer of the RESNET50.

We considered various genres such as landscapes, cityscapes, flower paintings, and still life for our analysis. Landscapes depict outdoor sceneries; cityscapes are representations of houses, promenades, and prominent city structures. Flower paintings represent a variety of flowers in vases, gardens, and ponds. Still life consists of images of fruits, vegetables, and other food articles. The DAG in Figure 2 can be used to depict the influence of these genres on the artist and artworks, as all the relevant factors are encoded in the DAG. In Section 6, we describe why certain other genres such as portraits cannot be modeled using the DAG shown in Figure 2.

For the artists under consideration, namely Paul Cezanne, Claude Monet, and Vincent van Gogh, we first obtained real artworks belonging to these genres from the Wikiart dataset. These images result in three sets $A_i$, where $i \in \{cezanne, gogh, monet\}$, corresponding to the three artists under consideration. In obtaining these images, we fixed the art material to oil painting so that we do not have to adjust for this factor as a confounder.
Next, we used random images from existing datasets such as the Oxford flower dataset (Nilsback and Zisserman 2008), together with images additionally crawled from Google Images, to obtain images belonging to the various genres under consideration. We then used these images as test images to obtain corresponding generated artworks in the styles of Cezanne, Monet, and van Gogh. These constituted three sets $G_{cezanne}$, $G_{gogh}$, and $G_{monet}$. There were roughly 60 test images in each genre.

Table 1: Bias scores computed across genres with respect to the Impressionism and Post Impressionism art movements. Blank entries denote cases in which there were not ample instances to compute the metric. Imp: Impressionism, Post: Post Impressionism, G: Genre, L: Landscape, C: Cityscapes, F: Flowers, S: Still life.

Next, the sets $A_j$ were constituted using images of other artists who belonged to the same art movement as the artist under consideration. As $J$, the number of such artists, increases, we can get more reliable indicators of art movements, and thus confounding bias due to art movement will become more evident. It is to be noted that not all $J$ artists necessarily had an ample number of images in a particular genre. So, we only considered those artists who had more than 35 images in a particular genre and art movement for analysis within genres. This is because using just a few images of a particular genre by an artist does not help in quantifying bias reliably. For the same reason, confounding bias cannot be estimated when modeling the style of an artist who had too few images in a particular genre and art movement. For example, there are only two landscapes of van Gogh in the Impressionism style, and none for Monet in Post Impressionism, so it is not possible to quantify confounding bias in landscapes with respect to Impressionism for van Gogh and Post Impressionism for Monet. So, we report results for only those scenarios in which there were at least 35 artworks of the artist in that particular genre. We then obtained feature descriptors (using the representations from the penultimate layer of the RESNET50 architecture) of the images in sets $A_i$, $G_i$, and $A_j$ using the learned representations of art movements.
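The image-count filter described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout, the function name `eligible_groups`, and the toy catalogue are assumptions made for the example, and only the threshold of 35 images comes from the text.

```python
from collections import Counter

MIN_IMAGES = 35  # threshold stated in the text ("more than 35 images")


def eligible_groups(records, min_images=MIN_IMAGES):
    """Keep the (artist, genre, movement) groups with more than `min_images` images.

    `records` is an iterable of (artist, genre, movement) tuples, one per image.
    Returns a dict mapping each retained group to its image count.
    """
    counts = Counter(records)
    return {group: n for group, n in counts.items() if n > min_images}


# Hypothetical catalogue, used only to exercise the function.
records = (
    [("artist_a", "landscape", "Impressionism")] * 40
    + [("artist_b", "landscape", "Impressionism")] * 35
    + [("artist_b", "still life", "Post Impressionism")] * 2
)
kept = eligible_groups(records)
print(kept)
```

Groups that fall at or below the threshold are simply left out, which corresponds to the blank entries in Table 1.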
Confounding bias was then computed using eq. (5). Table 1 lists the values of this metric for various genres and artists. Blank entries denote cases where there were not ample instances to compute the metric. The bias scores are mostly lower for Cezanne, who had worked across both Impressionism and Post Impressionism, whereas the scores are higher for van Gogh and Monet, who had largely worked in Post Impressionism and Impressionism respectively. This observation suggests that bias scores vary across artists based on the number of art movements influencing them.

To verify, we conducted statistical hypothesis testing. We set the null hypothesis as: the mean of the bias scores is the same for artists who had worked across art movements and artists who had worked largely in one art movement. Formally, we set the null hypothesis $H_0$ as

$H_0 : M_s = M_m$ (6)

where $M_s$ denotes the mean of the confounding bias scores for artists who largely worked in a single art movement, and $M_m$ denotes the mean of the confounding bias scores for artists who had worked across multiple art movements. The corresponding alternate hypothesis $H_1$ is set as

$H_1 : M_s \neq M_m$ (7)

As there were very few observations at our disposal, we used the non-parametric Wilcoxon signed-rank test. The null hypothesis was rejected with a $p$-value of 0.033 ( α = 0 . ), thus showing that bias scores vary across artists based on the number of art movements influencing them.

The aforementioned results can be interpreted as follows. If an artist had worked across art movements, then modeling the influence of art movement would be less crucial in generating artworks according to the artist's style. This is because there are artworks across art movements for such an artist, and thus there is a greater chance of a match between generated images and real images due to the greater diversity and variation in the set of real images of the artist. On the contrary, if an artist had worked primarily in one art movement, then it is likely that higher bias will be observed if the influence of art movement is not considered. This is because the generated images have to match with respect to that specific art movement or else they will have greater dissimilarity.

Figure 4: Top: Real artworks corresponding to the art movements and artists mentioned in the respective columns. Middle: Illustrations of artworks generated by cycleGAN for the corresponding artists and art movements. Bottom: Corresponding photos used to generate the images in the middle row. Spontaneous and accurate depiction of light along with its changing quality, an important characteristic of Impressionism, is missing in the generated versions (row 2, columns 1 and 2). Expressive brushstrokes emphasizing geometric forms are missing in the generated image corresponding to the Post-Impressionist style of van Gogh.

To elaborate further, let us consider the genre of landscapes. Figure 4 provides an illustration of real landscapes of Monet, van Gogh, and Cezanne, and cycleGAN generated landscapes in the styles of these artists, along with the photos from which the generated images were obtained. There are about 250 landscapes by Monet in the Impressionism style but none corresponding to Post Impressionism. Most of van Gogh's landscapes were set in the Post Impressionism style with just two in the Impressionism style; there are about 35 landscapes by Cezanne in Impressionism and 102 in Post Impressionism. Consider the photo in row 3, column 1. The corresponding cycleGAN generated image shown in row 2, column 1 does not exhibit the sharp colors of twilight shown in the photo, and alters the affect of the original photo. This is not in line with Impressionism, which was characterized by spontaneous and accurate depiction of light with its changing colors. Also, the generated image perhaps does not do justice to the cognitive abilities of the artist; see the image in row 1, column 1, which corresponds to a real landscape by Monet illustrating twilight in the outdoors, with shades of red. In fact, spontaneous and natural rendering of light and color was a distinct feature of Impressionism. In a similar vein, the generated images of van Gogh (row 2, columns 4 and 5) exhibit markedly different brushstrokes and texture compared to the Post Impressionist works of van Gogh. Post Impressionist works of van Gogh were characterized by swirling brushstrokes, emphasizing geometric forms for an expressive effect.

From Table 1, the bias score for landscapes is 2.52 with respect to Monet (Impressionism) and 2.96 with respect to van Gogh (Post Impressionism). On the contrary, the bias scores are lower than 1 for Cezanne, who had worked across Impressionism and Post Impressionism. Higher scores indicate greater bias, thus corroborating the observation that the bias is higher for artists who were influenced by a single art movement than for those who were influenced by multiple art movements.
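The Wilcoxon signed-rank test used above to compare the two groups of bias scores can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the text does not say how the observations were paired, so the pairing, the function name `wilcoxon_signed_rank`, and the score lists below are hypothetical and are not the values in Table 1. The p-value is computed exactly by enumerating all sign assignments, which is only practical for small samples.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped and tied absolute differences receive
    average ranks. Returns (w_plus, p_value), where w_plus is the sum of
    the ranks of the positive differences.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)

    # Under the null hypothesis every rank is positive or negative with
    # probability 1/2, so enumerate all 2**n sign assignments.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical paired bias scores (single-movement vs. multi-movement artists).
single_movement = [2.5, 2.9, 1.8, 2.2, 3.1, 2.7]
multi_movement = [0.7, 0.8, 0.9, 0.6, 1.2, 0.5]

# All six differences are positive, so w_plus = 21 and p = 2 / 2**6.
w_plus, p_value = wilcoxon_signed_rank(single_movement, multi_movement)
print(w_plus, p_value)
```

In practice a library routine such as `scipy.stats.wilcoxon` would normally be used; the explicit enumeration here is only meant to show what the test computes.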
In order to evaluate the effectiveness of the proposed metric, we compared it with a state-of-the-art outlier detection method (Shastry and Oore 2020). Specifically, the authors in (Shastry and Oore 2020) propose to detect outliers by identifying inconsistencies between the activity patterns of the neural network and the predicted class. They characterize activity patterns by Gram matrices and identify anomalies in Gram matrix values by comparing each value with its respective range observed over the training data. The method can be used with any pre-trained softmax classifier. Furthermore, the method neither requires access to outlier data for fine-tuning hyperparameters, nor does it require access to out-of-distribution data for inferring parameters, and hence is appropriate for our comparison.

First, we wanted to test if (Shastry and Oore 2020) can detect outliers with respect to real artworks belonging to different art movements. Across all genres, the best detection accuracy of (Shastry and Oore 2020) in identifying outliers with respect to Impressionism (i.e. in separating real Post Impressionism artworks from real Impressionism artworks) was just . . As Impressionism and Post Impressionism were similar in many aspects, we then tested if (Shastry and Oore 2020) can detect outliers across art movements with marked differences, such as in separating Romanticism and Realism from Impressionism and Post Impressionism. Even in this case, the best detection accuracy was . Finally, the best detection accuracy of (Shastry and Oore 2020) in separating real artworks from generated artworks was . . Unlike (Shastry and Oore 2020), the proposed metric is more effective in capturing the influence of art movements in modeling artists' styles, since the bias scores corresponding to artists who had largely worked in a single art movement are significantly higher than those of artists who had worked across multiple art movements. In the next section, we also discuss the other benefits of the proposed bias metric.
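The core idea of the outlier detector described above, comparing each Gram matrix value with the range observed over the training data, can be sketched as follows. This is a deliberately simplified illustration and not the implementation of (Shastry and Oore 2020): it uses a single layer, ignores the predicted class, and omits other details of the published method. The function names and the tiny feature maps are assumptions made for the example.

```python
def gram_matrix(features):
    """Gram matrix G[i][j] = sum_k F[i][k] * F[j][k].

    `features` is a list of channels, each a list of activations
    (a channels x positions feature map).
    """
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


def fit_ranges(training_maps):
    """Record the min and max of every Gram entry over the training maps."""
    grams = [gram_matrix(f) for f in training_maps]
    n = len(grams[0])
    lo = [[min(g[i][j] for g in grams) for j in range(n)] for i in range(n)]
    hi = [[max(g[i][j] for g in grams) for j in range(n)] for i in range(n)]
    return lo, hi


def deviation(features, lo, hi):
    """Total amount by which Gram entries fall outside their training range."""
    total = 0.0
    for i, row in enumerate(gram_matrix(features)):
        for j, value in enumerate(row):
            if value < lo[i][j]:
                total += lo[i][j] - value
            elif value > hi[i][j]:
                total += value - hi[i][j]
    return total


# Hypothetical two-channel feature maps standing in for network activations.
training_maps = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 1.0]]]
lo, hi = fit_ranges(training_maps)

inlier_score = deviation([[1.5, 0.0], [0.0, 1.0]], lo, hi)
outlier_score = deviation([[1.0, 1.0], [1.0, 1.0]], lo, hi)
print(inlier_score, outlier_score)
```

A sample whose deviation exceeds a chosen threshold would be flagged as an outlier; how that threshold is set is left open in this sketch.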
In this section, we discuss a few other relevant questions in the context of the above results.
The very goal of estimating confounding bias is to be able to capture the drawbacks due to the lack of modeling art movements. When images across art movements are combined, the fact that art movement is a potential confounder is ignored, thereby leading to biased representations. Thus, confounding bias has to be computed with respect to Impressionism and Post Impressionism separately. Computing bias across a combination of images from these two art movements is an illustration of "Simpson's paradox" (Pearl and Mackenzie 2018).

Simpson's paradox is a trend that characterizes the inconsistencies across different groups of the data. Specifically, an effect that appears across different subgroups of the data but gets reversed when the groups are combined illustrates Simpson's paradox. In other words, Simpson's paradox refers to the effect that occurs when the association between two variables is different from the association between the same two variables after controlling for other variables. The correct result (i.e. whether to consider aggregated data or data corresponding to subgroups) is dependent on the causal graph characterizing the problem and data.

The authors in (Pearl and Mackenzie 2018) illustrate Simpson's paradox with several real examples. For example, the authors cite a study of thyroid disease published in 1995 where smokers had a higher survival rate than non-smokers. However, the non-smokers had a higher survival rate in six out of the seven age groups considered, and the difference was minimal in the seventh. Age was a confounder of smoking and survival, and hence it had to be adjusted for. The correct result corresponds to the one obtained after stratifying data by age, and thus it was concluded that smoking had a negative impact on survival.

Let us revisit our example. According to the DAG in Figure 2, art movement is a confounder which needs to be adjusted for.
If, however, we overlook this confounder by combining images across art movements, then the confounder is not adjusted for according to eq. (1). In fact, when we combined images across art movements, the resulting bias score was lower; this result is however incorrect, thus illustrating the paradox.

In cycleGAN (Zhu et al. 2017), the authors propose a cycle consistency loss such that the generated images, when mapped back to the original (real) images, are indistinguishable from the original images. This in turn implies that the generated images are as realistic as possible.

Simpson's paradox elucidates why cycleGAN, which is trained on data combined across art movements and whose loss function intuitively appears sound, cannot capture the influence of art movements. Because the loss is being minimized across images from different art movements, it is not guaranteed to be minimized within each art movement. Results in Table 1 and Figure 4 illustrate this point further. Thus, in order to accurately model artists' styles, (Zhu et al. 2017) would have to minimize the proposed loss after stratifying the data by art movement.
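The reversal at the heart of Simpson's paradox can be reproduced with a few lines of arithmetic. The sketch below mirrors the smoking and survival example discussed above, but the counts are hypothetical and were chosen only so that the reversal appears; they are not the figures from the study cited in (Pearl and Mackenzie 2018), and the two age groups stand in for the seven used there.

```python
# (survived, total) per age group and smoking status; hypothetical counts.
data = {
    "young": {"smoker": (180, 200), "non_smoker": (48, 50)},
    "old": {"smoker": (10, 50), "non_smoker": (60, 200)},
}


def stratified_rates(data):
    """Survival rate of each group within each stratum (age group)."""
    return {
        stratum: {group: survived / total
                  for group, (survived, total) in groups.items()}
        for stratum, groups in data.items()
    }


def aggregated_rates(data):
    """Survival rate of each group after pooling all strata together."""
    totals = {}
    for groups in data.values():
        for group, (survived, total) in groups.items():
            s, t = totals.get(group, (0, 0))
            totals[group] = (s + survived, t + total)
    return {group: s / t for group, (s, t) in totals.items()}


by_age = stratified_rates(data)
pooled = aggregated_rates(data)

# Within every age group non-smokers do better, yet pooling reverses the order.
print(by_age)
print(pooled)
```

The pooled comparison is misleading here because most smokers in the toy data are young; the same mechanism applies when images from different art movements are pooled before computing the bias score.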
The computation of bias is based on the DAG provided. The DAG considered in the case study is not applicable to other genres such as portraits and genre art. Portraits, for example, involve many other factors in their creation. Characteristics such as gender, age, beauty, and other aesthetics play a prominent role in the way sitters are depicted. Factors characterizing the sitter's lineage/genealogy (e.g. race, family, cultural background, religion, etc.) can also influence the rendition. The social standing of the sitter, such as their profession, political background, and power, could influence the artists in the way they depict the sitter. For example, it is possible that powerful people commanded the artists to depict them in a certain way, and the artists thereby had to exaggerate certain characteristics.

Genre art depicted everyday aspects of ordinary people. These artworks encompassed a variety of socio-cultural themes such as cooking, harvesting, dancing, etc. Therefore genre art involves many socio-cultural factors that the DAG considered in the case study does not entail. Thus, for computation of bias in other genres, appropriate DAGs have to be constructed in consultation with art historians, taking into account all relevant variables of interest.
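To make concrete why a different genre calls for a different DAG, the sketch below encodes two small graphs as adjacency sets and lists the variables that point into both the artist and the artwork. The edges are hypothetical and are not a transcription of Figure 2; the node names, the function `common_parents`, and the single extra portrait factor are assumptions chosen to echo the factors named in the text.

```python
def common_parents(dag, treatment, outcome):
    """Return the nodes with a direct edge into both `treatment` and `outcome`.

    `dag` maps each node to the set of its children. Only direct common
    parents are found, which is a simplification of what counts as a
    confounder in a general causal graph.
    """
    return {node for node, children in dag.items()
            if treatment in children and outcome in children}


# Hypothetical DAG for genres such as landscapes (not the paper's Figure 2).
landscape_dag = {
    "art_movement": {"artist", "artwork"},
    "genre": {"artist", "artwork"},
    "artist": {"artwork"},
    "artwork": set(),
}

# Hypothetical extension for portraits: one extra factor from the text.
portrait_dag = dict(landscape_dag)
portrait_dag["sitter_social_standing"] = {"artist", "artwork"}

print(sorted(common_parents(landscape_dag, "artist", "artwork")))
print(sorted(common_parents(portrait_dag, "artist", "artwork")))
```

Every additional common parent is another variable that would have to be adjusted for, which is why the choice of DAG has to precede the computation of bias.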
As discussed in the previous sections, the proposed metric is useful in quantifying the confounding bias associated with generative AI methods that fail to consider art movements and other confounders in modeling artists' styles.

Such an objective assessment of bias can also be useful in authenticating artworks, i.e. the computed bias scores can aid in verifying if an artwork was a genuine creation of a particular artist. This is because, if an artwork is not a real work of an artist, then the bias score associated with such a work is likely to be higher, and similarly, if the artwork is a genuine creation of the artist, then the bias score is likely to be lower. It is to be noted that we are not claiming that the bias score alone is sufficient to validate the authenticity of an artwork; instead, we believe it can be beneficial in assessing the authenticity of artworks along with other forms of evidence, including that of art historians. A related application of the proposed metric would be price assessment of generative art. In other words, the computed bias scores can serve as a measure of the selling price/value of a generative artwork. If the bias score is high, then the value of the generative artwork is likely to be low, and vice versa.

Finally, the proposed metric can also aid in the study of art history. The computed bias scores can provide an independent and complementary source of evidence to art historians to verify their assumptions or opinions regarding various topics of interest, such as understanding characteristics of art movements and studying the influence of specific art materials on artists. By considering different DAGs that encode the assumptions of different art historians, it is also possible to compare perspectives and understand if there are sources of bias that are common across the assumptions of different art historians. Such common bias sources will then serve as strong evidence for art historians in accepting or rejecting a viewpoint.
Art movements influenced the style of artists in many subtle ways. Overlooking the contribution of art movements in modeling artists' styles leads to confounding bias. In reality, there are several unobserved factors, such as emotions and beliefs, that characterize an artist's style. Thus, it is not possible to computationally model an artist's style. In doing so, generative art might be stereotyping artists based on a narrow metric such as color or brush strokes, and not doing justice to the artist's abilities. Furthermore, generated artworks might accentuate automation bias by conveying inaccurate information about socio-political-cultural aspects due to their inability to capture the nuances depicted in art movements. In this work, leveraging directed acyclic graphs, we proposed a simple metric to quantify confounding bias due to the lack of modeling art movement's influence in learning artists' styles. We analyzed this confounding bias across genres for the artists considered in the cycleGAN model, and provided an intuitive interpretation of the bias scores. We hope our work triggers discussions related to the feasibility of modeling artists' styles, and more broadly raises issues related to the accountability of AI-simulated artists' styles.
References

[AIportraits 2020] AIportraits. 2020. Aiportraits: The easiest way to make your portraits look stunning. https://aiportraits.org.
[Artbreeder 2020] Artbreeder. 2020. Artbreeder: Extend your imagination.
[Buch, Ahmed, and Maruthappu 2018] Buch, V. H.; Ahmed, I.; and Maruthappu, M. 2018. Artificial intelligence in medicine: current trends and future possibilities. British Journal of General Practice.
FAccT.
[Crawford and Paglen 2019] Crawford, K., and Paglen, T. 2019. Excavating ai: The politics of images in machine learning training sets.
[Elgammal et al. 2017] Elgammal, A.; Liu, B.; Elhoseiny, M.; and Mazzone, M. 2017. Can: Creative adversarial networks, generating "art" by learning about styles and deviating from style norms. International Conference on Computational Creativity (ICCC).
[Engel et al. 2019] Engel, J.; Agrawal, K. K.; Chen, S.; Gulrajani, I.; Donahue, C.; and Roberts, A. 2019. Gansynth: Adversarial neural audio synthesis. International Conference on Learning Representations.
[Feldstein 2019] Feldstein, S. 2019. The global expansion of ai surveillance. Carnegie Endowment for International Peace.
[Gatys, Ecker, and Bethge 2016] Gatys, L. A.; Ecker, A. S.; and Bethge, M. 2016. Image style transfer using convolutional neural networks. Computer Vision and Pattern Recognition.
[GoART 2020] GoART. 2020. Goart: Ai photo effects. http://goart.fotor.com.s3-website-us-west-2.amazonaws.com.
[Hassine and Neeman 2019] Hassine, T., and Neeman, Z. 2019. The zombification of art history: How ai resurrects dead masters, and perpetuates historical biases. CITAR.
ArXiv.
[Hertzmann 2018] Hertzmann, A. 2018. Can computers create art? ArXiv.
[Jain et al. 2020] Jain, N.; Olmo, A.; Sengupta, S.; Manikonda, L.; and Kambhampati, S. 2020. Imperfect imaganation: Implications of gans exacerbating biases on facial data augmentation and snapchat selfie lenses. ArXiv.
[Jandial et al. 2020] Jandial, S.; Chopra, A.; Ayush, K.; Hemani, M.; Kumar, A.; and Krishnamurthy, B. 2020. Sievenet: A unified framework for robust image-based virtual try-on. WACV.
[Lin 2019] Lin, T. C. 2019. Artificial intelligence, finance, and the law. Fordham Law Review.
ACM Multimedia.
[Lum, Boudin, and Price 2020] Lum, K.; Boudin, C.; and Price, M. 2020. The impact of overbooking on a pre-trial risk assessment tool. FAccT.
https://experiments.withgoogle.com/cartoonify.
[Miller 2019] Miller, A. 2019. The artist in the machine: the world of ai-powered creativity. MIT Press.
[Mordvintsev, Olah, and Tyka 2015] Mordvintsev, A.; Olah, C.; and Tyka, M. 2015. Deep dream. https://github.com/google/deepdream.
[Nilsback and Zisserman 2008] Nilsback, M.-E., and Zisserman, A. 2008. Automated flower classification over a large number of classes. ICVGIP.
[Obermeyer et al. 2019] Obermeyer, Z.; Powers, B.; Vogeli, C.; and Mullainathan, S. 2019. Dissecting racial bias in an algorithm used to manage the health of populations. Science.
[Oxford-Art-Online 2021] Oxford-Art-Online. 2021. Impressionism and post-impressionism. Oxford Art Online.
[Pearl and Mackenzie 2018] Pearl, J., and Mackenzie, D. 2018. The book of why: The new science of cause and effect. Basic Books, New York.
[Pearl 2009] Pearl, J. 2009. Causality: Models, reasoning and inference, 2nd edition. Cambridge University Press.
[Pettee et al. 2019] Pettee, M.; Shimmin, C.; Duhaime, D.; and Vidrin, I. 2019. Beyond imitation: Generative and variational choreography via machine learning. International Conference on Computational Creativity.
[Prates, Avelar, and Lamb 2019] Prates, M.; Avelar, P.; and Lamb, L. 2019. Assessing gender bias in machine translation: a case study with google translate. Neural Computing and Applications.
[Rabb and Brown 1986] Rabb, T. K., and Brown, J. 1986. The evidence of art: Images and meaning in history. The Journal of Interdisciplinary History.
FAccT.
ICML.
[Skitka, Mosier, and Burdick 1999] Skitka, L.; Mosier, K.; and Burdick, M. 1999. Does automation bias decision-making? International Journal of Human-Computer Studies.
[Srinivasan and Uchino 2020] Srinivasan, R., and Uchino, K. 2020. Biases in ai generated art: a causal look from the lens of art history. ArXiv.
[Stuart 2010] Stuart, E. A. 2010. Matching methods for causal inference: A review and a look forward. Statistical Science.
https://mashable.com/article/ai-portrait-generator-pocs/.
[Tan et al. 2017] Tan, W. R.; Chan, C. S.; Aguirre, H.; and Tanaka, K. 2017. Artgan: Artwork synthesis with conditional categorical gans. ArXiv.
[Varshney et al. 2019] Varshney, L. R.; Pinel, F.; Varshney, K. R.; Bhattacharjya, D.; Schörgendorfer, A.; and Chee, Y. 2019. A big data approach to computational creativity: The curious case of chef watson. IBM Journal of Research and Development.
Venture Beat.
[Wikiart 2020] Wikiart. 2020. Visual art encyclopedia.
[Zhu et al. 2017] Zhu, J.-Y.; Park, T.; Isola, P.; and Efros, A. A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks.