Could you become more credible by being White? Assessing Impact of Race on Credibility with Deepfakes
Kurtis Haut, Caleb Wohn, Victor Antony, Aidan Goldfarb, Melissa Welsh, Dillanie Sumanthiran, Ji-ze Jang, Md. Rafayet Ali, Ehsan Hoque
University of Rochester, Department of Computer Science
Abstract
Computer-mediated conversations (e.g., videoconferencing) are now mainstream media. How would credibility be impacted if one could change their race on the fly in these environments? We propose an approach using Deepfakes and a supporting GAN architecture to isolate visual features and alter racial perception. We then crowd-sourced over 800 survey responses to measure how credibility was influenced by changing the perceived race. We evaluate the effect of showing a still image of a Black person versus a still image of a White person using the same audio clip for each survey. We also test the effect of showing either an original video or an altered video in which the appearance of the person in the original video is modified to appear more White. We measure credibility as the percent of participant responses who believed the speaker was telling the truth. We found that changing the race of a person in a static image has negligible impact on credibility. However, the same manipulation of race in a video increases credibility significantly (61% to 73%, p < .05).

Introduction
Manipulating race to measure its impact has been extensively studied in the past. It has been shown that having a Black-sounding name puts you at a disadvantage for employment opportunities compared to having a White-sounding name on identical resumes (Bertrand and Mullainathan 2004). Avatars offer a high degree of predictive customization and have been utilized to research impacts of race in virtual reality (Groom, Bailenson, and Nass 2009) and in creating racial empathy that extends to the real world (Peck et al. 2013). Although names and avatars are both effective ways to create racial perceptions, in the domain of credibility it remains unclear whether the same findings extend to subtle manipulation of images and videos. Recent advancements in AI now allow precise manipulation of skin tone, hair type, eye color, and facial features. This opens up an exciting opportunity to probe for race and measure its impact in a way that wasn't possible before. In this paper, we evaluate racial perceptions by leveraging Deepfakes and a supporting GAN architecture to manipulate facial features and skin color while holding other variables such as accent and clothing constant. This allows us to directly measure the effect of those racial changes. We apply our techniques to a naturalistic dataset on deception collected using the ADDR framework, in which participants engage in an activity on a video call (Sen et al. 2018). Given that the dataset is made of clips from actual video calls recorded by webcam, we hope that our findings can provide insights for real-world videoconferencing situations such as sales, negotiations, job interviews, telemedicine, and virtual trials. Virtual trials have become especially important during the COVID-19 pandemic, as courts in all 50 US states have begun to hold online trials using video conferencing (Branch 2020; Coie 2020). Virtual testimony is perceived to be less credible than in-person testimony
(Landström and Granhag 2010; Landström, Granhag, and Hartwig 2005; Walsh and Walsh). There could be many reasons for this, but understanding whether perception of race contributes to this phenomenon may help future court proceedings. Even for low-stakes video calls, credibility assessments of our conversational partners impact communication. Therefore, understanding how racial perceptions influence credibility in these domains matters. Perception of race is a complex issue composed of variables such as class, education, accent, skin tone, etc. In this paper, we look at racial perceptions through the lens of modifying visual representation (skin tone and facial features).

We designed an experiment consisting of 4 surveys (see fig. 1) to begin the process of answering these research questions:
• How do racial perceptions in videos and images influence an individual's credibility?
• What public sentiments are associated with the credibility of a perceived race?

To assess credibility, we employ Amazon Mechanical Turk (AMT) to crowd-source responses for our surveys. An online worker on AMT, called an mTurker, accepts a Human Intelligence Task (HIT) and is paid for completing the human labor associated with that HIT. In our case, an mTurker watches a video or listens to an audio clip with a still image, then completes a survey.

Figure 1: a) Depicts the image condition. We trained a CycleGAN using the Chicago Face Dataset to generate unambiguous racial images of a Black and a White man. We launched two surveys on Amazon Mechanical Turk (denoted by AMT HIT) and compared the responses. Both surveys play the same audio, and the transcript is shown. b) Depicts the video condition. We took an original video and made subtle racial changes to the face area using DeepFaceLab. We launched two surveys as in a) and compared the responses. The transcript of what the speaker said is shown and both videos have identical audio.

In our surveys, we take credibility to mean the percent of participants who believed that the speaker in the video or audio was telling the truth. In total, we have 4 surveys with 800 total responses from the participants (for demographic information see fig. 2). The first 2 surveys test the condition in which a different still image is shown to the participant while the same audio file is played; i.e., in one survey an image of a Black person is shown to the participant, while in the other survey an image of a White person is shown (see fig. 1a). The next 2 surveys test the condition of showing a different video to the participants. In one survey, the original video is shown. In the other survey, we show an altered version of the same video where the only difference is that we make the speaker appear White (see fig. 1b). We used CycleGAN (Zhu et al. 2017a) to generate image mappings of White to Black and vice versa to test the image condition. For the video condition, we used an encoder-decoder architecture to perform a face swap, essentially changing the visual representation of a South Asian person to appear more White (in terms of skin tone and facial features) using DeepFaceLab (Perov et al. 2020). For each of the four surveys, 200 participants (800 total) responded.

We then performed statistical analysis on the survey responses to determine whether the credibility differences for each condition were meaningful. In summary, we find and recommend the following:
• Those who believed the person was White in the altered video were more likely to say the person was telling the truth.
• There is no statistical difference in credibility between the image conditions when participants listened to the same audio.
• Participants described the altered video using words with more positive sentiment in the free-response survey questions.
• Subtle modifications to alter racial perception may impact credibility in video recordings from prior videoconferencing sessions.

Although creating a high-quality Deepfake in real time during a videoconferencing session is not possible right now, in the near future it will be. Regardless of the situation surrounding the video call, it is likely that people will have the ability to change their perceived race (users can already change their gender or skin complexion on Snapchat). Our findings indicate that this action could have an effect on the credibility of that person. We make the following recommendation to videoconferencing or social media companies, should they decide to enable this feature in the future:
• Complete transparency and full disclosure of the race change to all parties on the video call (e.g., they must be notified and there should be an identifying mark on the window of that person's video during the call).
Background
Racial Biases within Societies
Prior studies within the context of racial discrimination aim to tackle the problem solely through sociological metrics of quantification. Perception of racial identity has been observed to depend on various aspects of race, including skin color (Maddox and Gray 2002) and accents (Dixon, Mahoney, and Cocks 2002). When assessing racial discrimination in the context of skin color, prior works illustrate how people of different races tend to distrust each other the most (Smith 2010). The level of racial discrimination can even be broken down further along the skin-color gradient (Leonardo 2004). For example, even when race is held constant and only skin color varies, dark-skinned Blacks were 11 times more likely to experience racial discrimination than their light-skinned counterparts (Klonoff and Landrine 2000). These studies suggest that skin color alone may cause a person to be discriminated against. In addition to skin color and race, the accent of a person can also have a significant influence on how others perceive them. This is often found in incidents involving people who speak African American Vernacular English (AAVE). There have been many court cases in which the accent of African Americans negatively influenced the outcome of a trial (Rickford and King 2016). Moreover, a non-native accent may cause listeners to perceive a speaker as less credible, either because the accent serves as a signal or because listeners find the speech difficult to process (Lev-Ari and Keysar 2010). Through their work, Shiri Lev-Ari and Boaz Keysar showed that participants in their study perceived trivia statements such as "Ants don't sleep" to be true less frequently when spoken by a non-native English speaker than by a native speaker.
This lends credence to the idea that an individual's socio-political background may have a significant impact on how credible they are perceived to be. Similarly, Günaydin et al. demonstrated that objective facial resemblance to a significant other influences snap judgments of liking automatically, effortlessly, and without conscious awareness (Günaydin et al. 2012). This yields further evidence for the possibility that race may impact credibility, given the large proportion of individuals who grow up in communities which are segregated or dominated by one racial group.

However, there is very limited information in the relevant literature assessing the impact of perceived race on credibility through modifying visual representation exclusively. This could be because one cannot easily change the physical features associated with race, nor replicate life experience, to measure the effect of those changes. However, when we communicate online with images or videos, these constraints do not exist. Because we can modify visual representation algorithmically, we can research the effect this has on credibility when communicating face to face over digital media.
Employing AI for Investigating Racial Bias
In order to more objectively assess how skin color and facial features affect one's credibility, techniques such as CycleGANs and Deepfakes are the most suitable, because they can hold features such as accent, clothing, gestures, and facial expressions constant. CycleGANs have previously been used within research applications to make image-processing training sets more inclusive of various skin tones. In the work "Fairness GAN", Sattigeri et al. illustrate the CycleGAN's power in constructing a demographically inclusive extension of the CelebFaces Attributes dataset, one dimension being skin tone (Sattigeri et al. 2018). Other work with CycleGANs looks at applications such as artificial makeup, which can vary in the degree of heaviness of the synthetic makeup applied (Chen et al. 2019). Deepfakes also hold significant value for manipulating the skin tone and facial features of videos. Although researchers have worked on creating algorithms to detect computer-manipulated images and videos (Korshunov and Marcel 2018), and humans are likely to be more accurate at this task at present (Korshunov and Marcel 2020), the future is anything but certain. These algorithms continue to improve, generating more realistic images at a startling pace. In 2019, Karras et al. introduced StyleGAN, an innovative architecture able to generate more realistic images (Karras, Laine, and Aila 2019). In this paper, we leverage Deepfakes and CycleGANs to create subtle racial changes to help understand the nuances of race perception and credibility in video-call environments.
Methods
The credibility and associated language sentiment for different racial perceptions were measured using responses from 800 participants recruited from Amazon Mechanical Turk. Generally, mTurkers form a more diverse population than the standard internet population or the populations of college students typically used in laboratory experiments (Buhrmester, Kwang, and Gosling 2016; Horton, Rand, and Zeckhauser 2011; Paolacci, Chandler, and Ipeirotis 2010). These participants were given one of the four surveys, which were designed to measure two conditions: image and video. Among the 800 participants, 305 gave us their demographic information (shown in Fig. 2a, 2b and 2c). Participants were compensated 2 dollars for their time completing the surveys.
Design of Experiment
Credibility Ground Truths
We used one audio clip and one video clip from the publicly available UR Lying dataset, collected using the ADDR framework (Sen et al. 2018). Each clip is 30 seconds in length and shows the speaker answering the question "What was your image?". The speakers in the videos are shown an image prior to being asked this question. The ADDR framework instructed the speaker to lie or tell the truth about their image. In this particular instance, the speakers in both the image condition and the video condition described their image as it is (i.e., told the truth). From the audio and video recordings, we designed four separate surveys to test two conditions: two surveys tested an image condition and two surveys tested a video condition. We gathered a total of 800 responses from the four surveys, each survey having 200 participants.

Country      Count  Percent
US           263    86.23%
Brazil       16     5.25%
India        6      1.97%
Italy        4      1.31%
Bangladesh   2      0.66%
China        2      0.66%
Ireland      2      0.66%
Spain        2      0.66%
U.K.         2      0.66%
Argentina    1      0.33%
Bahamas      1      0.33%
Canada       1      0.33%
Hong Kong    1      0.33%
Indonesia    1      0.33%
Mexico       1      0.33%

Figure 2: Self-reported demographic information of 305 participants: (a) Age Distribution, (b) Race, (c) Gender Identity, (d) National Origin.
The Image Condition
We were interested to see how still images of people from different races, paired with the same audio, can influence credibility. The still image was designed to elicit a specific racial perception. We were interested in comparing credibility for a White person versus a Black person. Instead of using a real image from the dataset, which introduces noise such as clothing style and features unique to an individual of that race, we generated our own generic race representations to produce racial mappings of Black to White and vice versa, as shown in fig. 1a. We trained a CycleGAN using 4,000 images from the Chicago Face Dataset (CFD), which learned how to convincingly change complex racial features such as eye color, lip shape, hair type, and skin color (Zhu et al. 2017b). The participants either saw the White image in one survey or the Black image in the other survey, while listening to the same audio. We then compared the responses from these two surveys to see whether the still image shown to the participants influenced the speaker's credibility or the attributed language sentiment.
Video Condition
While the image condition created clear, unambiguous representations of race, the video condition explores the nuances of racial perception by making subtle modifications to the facial region. In this condition, participants either watched the original video in one survey or an altered video in the other survey. In the altered video, the person from the original video is made to appear more White using a DeepFaceLab (Perov et al. 2020) face mapping. We do this by taking the face of a White participant from the UR Lying database and mapping it onto the face of a non-White participant, as seen in fig. 1b. We then create 2 surveys, where one survey shows the participant the original video and the other shows the altered video. We then compare the responses on the credibility and language sentiment of the speaker.
Survey
To measure the effect of an individual's perceived race on that individual's credibility, we created four independent surveys. Two of the surveys played an audio file from the UR Lying database accompanied by either a Black or a White still image. The other two surveys played either an original video from the UR Lying database, whose speaker is South Asian, or an altered video in which the person is made to appear more White. For each survey, we collect information on whether the participant thought the speaker was lying or telling the truth and ask the participant to justify his/her answer in a free response. The questions we asked the participants remain the same in each survey and can be found in table 1. Altogether, we collect 200 responses per survey, totaling 800 responses.

Table 1: Questions for the surveys
Questions
Please briefly (1 or 2 sentences) describe what the person in the video said.
Do you think the speaker was telling the truth?
How confident are you about whether the speaker was lying/telling the truth? (10 means you are certain, 1 means you have no idea)
What made you think the person was lying or telling the truth?
Use a few adjectives to describe the characteristics of the speaker.
What race do you think the speaker is?
What do you think the socioeconomic status of the speaker is?
Do you think the video was authentic?
What is your age?
What do you consider to be your gender?
Where did you grow up? (i.e., where did you spend most of your first 12 years?)
What do you consider to be your socioeconomic class?
What do you consider to be your race?
Analysis
Pre-processing to Identify Perceived Race
We included a question asking the participants to identify the race of the speaker, with the goal of detecting the perceived race. Since our research question concerns how perceptions of race influence credibility, we had to filter out responses which did not "correctly" identify the race (i.e., "Black" in the case of the audio with a still image of a Black man, "White" in the case of the audio with a still image of a White man, "Asian" in the case of the original video of a South Asian speaker, and "White" in the case of the altered video). Thus, after the pre-processing step in our analysis pipeline is complete, we are left only with the responses that perceived the intended race for each survey.
Assessing perceived credibility
We asked the participants whether or not they thought the speaker was telling the truth in each survey in a binary fashion (i.e., yes or no). For the image condition, we analyzed the responses from the two surveys where the still image is either of a Black or a White person, as described earlier. We took the number of respondents who thought the speaker was telling the truth in each case and compared them using a two-proportion z-test. Similarly, for the video condition, we analyzed the responses from the two surveys where the video is either the original or the altered video, as described earlier. Following the same procedure as for the image condition, we took the number of respondents who thought the speaker was telling the truth in each survey and compared them using a two-proportion z-test.
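As a sketch of this comparison (not the authors' analysis code, which is not published here), the pooled two-proportion z-test can be written with the Python standard library alone; the counts in the usage line are hypothetical and chosen for illustration only.

```python
import math

def proportions_ztest(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test with a pooled variance estimate.

    success_* is the number of respondents who believed the speaker was
    telling the truth; n_* is the total number of responses per survey
    (after pre-processing). Returns (z, p_value).
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution:
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts for illustration only (not the study's data):
z, p = proportions_ztest(146, 200, 122, 200)
```

In practice a library routine such as `statsmodels.stats.proportion.proportions_ztest` gives the same pooled statistic; the hand-rolled version above just makes the arithmetic explicit.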
Analyzing Sentiment
We also asked the participants to give their rationale for why they thought the speaker was lying or telling the truth, as well as to give a few adjectives describing the characteristics of the speaker, in free-response form. We analyzed these text-field questions using VADER sentiment analysis (Gilbert and Hutto 2014). VADER uses simple rules and a lexicon built by averaging sentiment ratings collected from Amazon Mechanical Turk workers. Its heuristic rules were designed based on statistical analysis of tweets, and it was tested against other models across multiple domains using AMT to generate a "wisdom of the crowd" ground truth. A major reason we chose VADER is that it was created and validated using AMT surveys (a population similar to ours). We first compute a compound sentiment score for the text of each response, which yields a number between -1 and +1. Thus, for each survey, we have an array of values between -1 and +1. We take the arrays from the two surveys associated with the image condition and compare them against each other using a Mann-Whitney U test. Likewise, for the video condition, we take the arrays from the two surveys for that condition and compare them against each other. This allows us to evaluate whether more positive sentiment is associated with a specific racial perception. We perform the same sentiment analysis on the responses to the question that asked participants to use a few adjectives to describe the characteristics of the speaker.
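The Mann-Whitney U comparison of the two arrays of compound scores can be sketched as follows. This is a simplified standard-library version using the normal approximation and omitting the tie correction to the variance, not the authors' analysis code; the input lists stand in for the per-survey VADER compound scores (each in [-1, 1]).

```python
import math
from itertools import chain

def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney U test via the normal approximation.

    xs and ys are the two samples being compared (here, the arrays of
    VADER compound scores from the two surveys of a condition). Ties get
    midranks, but the tie correction to sigma is omitted for brevity.
    Returns (U1, p_value).
    """
    pooled = sorted(chain(xs, ys))
    # Assign each distinct value its average 1-based rank (midrank).
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + j + 1) / 2
        i = j
    n1, n2 = len(xs), len(ys)
    r1 = sum(ranks[x] for x in xs)          # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2             # U statistic for xs
    mu = n1 * n2 / 2                        # mean of U under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value
```

For real analyses, `scipy.stats.mannwhitneyu` handles ties, continuity correction, and exact small-sample p-values; the sketch above only shows the rank-sum mechanics behind the test.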
Results
Fig. 3a, 3b, 3c and 3d show how the pre-processing was done on the responses for the analysis. Because we are interested in how racial perceptions influence credibility over digital media, we analyze only the responses that believed the intended racial perception, as described in the pre-processing section.
Credibility Analysis
We take credibility to mean the percent of respondents who said that the speaker was telling the truth. In the image condition, 72.3% believed the speaker was telling the truth when an image of a White person was shown, versus 70.3% when an image of a Black person was shown. There was no statistically significant difference in this condition. But in the video condition, 61.0% of participants believed the speaker was telling the truth after watching the original video of the darker-skinned South Asian speaker, and 73.0% of participants believed the speaker was telling the truth after watching the altered video featuring the White speaker. This difference was significant (p < .05). Table 2 shows these results.

Table 2: Summary of Results. "n" is the number of responses after pre-processing. "Credibility" is the percent of responses that believed the speaker was telling the truth. * indicates p < .05.

Survey                        n    Credibility
Black Image                   138  70.29%
White Image                   159  72.33%
Original Video (South Asian)  123  *60.98%
Altered Video (White)         63   73.02%
Sentiment Analysis
We analyzed the responses to the open-ended questions, "What made you think the person was lying or telling the truth?" and "Use a few adjectives to describe the characteristics of the speaker." We used the compound sentiment score from VADER, which measures sentiment on a scale from -1 to +1, with +1 indicating the text is very positive. The results are contained in Figures 4 and 5.

Figure 3: Results for the racial perceptions of the participants from the pre-processing question "What race do you think the speaker is?": (a) responses for the Black image, (b) responses for the White image, (c) responses for the original video, (d) responses for the altered video.
Survey          Comparison  VADER Sentiment
Black Image     Overall     0.13
White Image     Overall     0.149
Black Image     Truth       0.22
White Image     Truth       0.24
Black Image     Lie         -0.098
White Image     Lie         -0.088
Original Video  Overall     **0.061
Altered Video   Overall     0.236
Original Video  Truth       **0.19
Altered Video   Truth       0.359
Original Video  Lie         -0.145
Altered Video   Lie         -0.133

Figure 4: Results from VADER compound sentiment analysis of responses to the question "What made you think the person was lying or telling the truth?". Overall indicates the mean of the compound sentiment regardless of whether the participants believed the speaker or not. 'Truth' indicates those respondents who thought the speaker was telling the truth and 'Lie' indicates those participants who thought the speaker was lying. *Indicates p < .05; **indicates p < .01.

Survey          Comparison  VADER Sentiment
Black Image     Overall     *0.342
White Image     Overall     0.241
Black Image     Truth       *0.451
White Image     Truth       0.367
Black Image     Lie         *0.07
White Image     Lie         -0.088
Original Video  Overall     0.163
Altered Video   Overall     0.221
Original Video  Truth       0.34
Altered Video   Truth       0.325
Original Video  Lie         -0.159
Altered Video   Lie         -0.05

Figure 5: Results from VADER compound sentiment analysis of responses to the question "Use a few adjectives to describe the characteristics of the speaker." Overall indicates the mean of the compound sentiment regardless of whether the participants believed the speaker or not. 'Truth' indicates those respondents who thought the speaker was telling the truth and 'Lie' indicates those participants who thought the speaker was lying. *Indicates p < .05.

Discussion
Nuances of Racial Perception Could Reveal Unconscious Bias
Our decision to use still images of a White person versus a Black person was made to render the perception of race less ambiguous. This can be seen in the distribution of survey responses in figure 3. It may seem that making the differentiation of race clearer would make measuring the effect on credibility more accurate. However, when this racial difference is accentuated, it is also possible that the survey participants compensate in their credibility evaluations based on their awareness of what is being measured, due to the Hawthorne effect (Wickström and Bendix 2000).

For this reason, we also chose to make a more nuanced racial modification in the video condition. In this condition, the racial perception is much more ambiguous, as can be seen in the distribution of the survey responses in figure 3. We speculated that if the racial changes were subtle enough, they could even go unnoticed. This could allow unconscious bias to influence the survey responses assessing credibility. It remains part of our future work to probe and compare different racial filters, including a bi-racial White and Black speaker, for the video condition. More work is needed to also explore the bi-directional alternation of races to understand the whole spectrum of racial perception.
Image Condition Versus Video Condition Credibility Comparison
We did not observe a difference in credibility in the still-image condition, unlike in the video condition. Our original hypothesis was that changing the racial perception of the speaker to be White would increase their credibility. Although we see some evidence of that effect, with a 70.3% credibility rating when the Black still image was shown compared to a 72.3% credibility rating when the White still image was shown, this difference is not significant.

Yet in the video condition, we see that making the speaker appear White does have a tangible influence on their credibility rating (61.0% increased to 73.0% when the participants perceived the speaker as White). This provides some evidence that appearing more White may increase credibility on videoconferencing platforms. However, that conclusion does not explain why we see an increase in credibility for the video condition but not for the image condition. It is possible that for the image condition, with two racially pronounced pictures, the respondents were more aware of the racial bias and did not allow race to influence their judgement. The video condition, however, included a more nuanced alteration along with nonverbal dynamics for the respondents to process when making a decision. As a result, it is possible for race to become a factor in measured credibility.
Comparing Feelings and Sentiment Towards Race
In order to see if perceived race had an effect on sentiment, we compared the responses for the surveys associated with each condition (image and video). For each condition, we measured and compared the overall sentiment of the open-ended text responses. We observed no differences in sentiment for the image condition in the justifications the participants gave for the speaker's credibility. However, for the video condition, we again see statistically significant differences. We found more positive sentiment overall in the words used by participants to justify their decision, and this difference is more pronounced when comparing the truthful text responses. Given that we observed higher sentiment scores for the altered video, this was likely due to the participants being under the impression that the speaker was White. Yet as we do not see this mirrored in the image condition, there could be more to the story.

When we compared the text responses describing the characteristics of the speaker, we found higher sentiment in the Black image survey than in the White image survey. This difference is significant regardless of whether the participants believed the speaker was telling the truth or not. The higher overall sentiment attributed to the speaker perceived to be Black indicates the Hawthorne effect at work, due to its contradictory nature. Because the positive sentiment is not reflected in the justification for the decision made regarding the truthfulness of the speaker, nor is the credibility rating higher for the Black speaker, it is possible that the positive adjectives used by the participants are insincere. Looking at the demographics, a majority of the participants are young, White males (see fig. 2). This particular age/race group, given the current political climate, may be exercising caution when addressing the Black community specifically.
A careless choice of words may be viewed as insensitive or could potentially result in being labeled a White supremacist or racist. In the case of the mTurker, such a complaint could get their account suspended and cause them to be out of work. As a result, for the image condition, it is possible that the participant is cognizant of what is being measured and therefore adjusts his/her behavior accordingly. However, with the video condition, due to the subtleties of the racial adjustments, perhaps we observe the unconscious bias associated with the participants preferring the speaker when perceived to be White.
Recommendations for Video Conferencing Systems
In the future, our virtual identities will become increasingly important. In the physical world, attributes like skin tone, hair type, eye color and facial features are hard to change. These constraints on altering one's physical appearance do not exist when we present ourselves in digital media. If AI can alter how we are perceived, it is important to consider the effect those changes can have on individual credibility in videoconferencing environments. We argue that altering perceived race without informing the audience involved is unethical. We recommend that videoconferencing systems require full disclosure to all parties on the video call when someone has chosen to alter their race. The system should be the entity to provide this disclosure, as holding the individual accountable for disclosing that information may be unreliable. We propose this disclosure be given one time to each person who joins the call through a notification system. We also recommend that the videoconferencing system implement complete transparency for the duration of the call. We propose that the system place a distinguishing mark in the frame of the person who is altering their racial appearance. The mark should be discreet, yet serve as a constant reminder to the audience. Lastly, we recommend that videoconferencing systems implement an appropriate encryption scheme to authenticate videos sent and received over their channel. Otherwise, a nefarious entity could manipulate the videos unbeknownst to the system and therefore its users. This would bypass the intentions of full disclosure and complete transparency.
Limitations
Definitions of “Race”
How human beings racially profile each other is a complex phenomenon composed of many factors. Among these, linguistic features of speech such as accent are a major component. This paper instead focuses on what happens when the visual representation of an individual changes to appear White or more White.
Issues of Video Realism
A potential confounding factor in our experiments is the believability of the altered video. It is conceivable that if a participant found the video to be contrived or altered, this could influence their evaluation of the speaker's credibility. To address video realism as a confounding factor, we posed the statement "I found the video to be authentic" in both the original and the altered video surveys. For the original video, 10% of participants found the video to be not authentic, compared to 15% for the altered video. After running a two-proportion z-test, we concluded that there is no significant difference between these surveys in the perceived authenticity of the videos shown to participants (p > 0.05).
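The authenticity comparison above can be reproduced with a standard two-proportion z-test using a pooled standard error. The sketch below uses only the Python standard library; the per-survey sample sizes (200 each) are assumed for illustration, since the percentages alone do not fix them.

```python
from math import sqrt, erf

def two_proportion_ztest(x1: int, n1: int, x2: int, n2: int):
    """Two-sided two-proportion z-test with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF, via erf.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# 10% vs. 15% "not authentic" ratings; n = 200 per survey is hypothetical.
z, p = two_proportion_ztest(20, 200, 30, 200)
```

With these assumed sample sizes, the resulting p-value exceeds 0.05, consistent with the conclusion that perceived authenticity did not differ between the two surveys.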
We use Amazon Mechanical Turk (AMT) as a mechanism to draw a sample of the general population. We must recognize that this surveying technique suffers from some degree of population bias: the demographics of the survey responses skew toward younger, White males from the US. We could have chosen other surveying options. Unfortunately, bringing people into the lab to complete the surveys was made burdensome by the social distancing stipulations of the pandemic; it also would have been far more expensive and would have introduced a population bias of a different kind. Considering the trade-offs, AMT was our best option given its experimental validity and its ability to crowd-source a large number of responses effectively.
Realism of Credibility Ground Truths
Another issue with our experimental findings concerns the ground truths for the credibility assessment videos shown to the participants. We sacrifice realism in order to guarantee ground truths collected in an experimental setting. The deceptions that occur over face-to-face or computer-mediated platforms in the real world may be very different from those obtained in an experimental context, and the influence of racial perceptions could likewise differ. Despite that acknowledgement, the videos used were of study participants engaged in a computer-mediated deception experiment. We therefore expect the videos to be suitable for exploring credibility in the domain of digital media.
Future Work
We note here that we cannot directly compare how racial perceptions created via images and audio versus video affect credibility over video calls. We would need many examples using consistent image frames and audio to draw a generalized conclusion on that topic. We leave the fine-grained analysis of still image versus video for future work, as the implications are important to understand.
Lastly, this paper focuses on examining what happens when the visual representation of race is changed with AI and how that influences one's credibility. In the future, we will isolate specific accents such as African American Vernacular English (AAVE) and compare them against standard American English to observe the direct effect on an individual's credibility. AI can alter visual appearance and should also be able to modify accent. This will enable new insights through the isolation and fine-grained control of racial audio features. Understanding the implications of those changes in digital media involving audio is important work for the future.
Conclusion
We began the process of quantifying how racial perceptions influence credibility in digital media (e.g., video conferencing, video posting). This is a difficult problem to undertake, as isolating the variables contributing to racial profiling is complicated. Here we isolated one of those variables, visual representation (e.g., skin tone and facial features), and explored how modifying it impacts credibility. We used AI to assist in creating and modifying specific racial perceptions: CycleGAN was used to create unambiguous racial representations, while Deepfakes were used to introduce subtle differences in perceived race. We created surveys to test the image and video conditions and help understand these nuances. We used Amazon Mechanical Turk to recruit participants and obtain a more representative sample of the general population than college students in a laboratory setting. Altogether, we crowd-sourced 800 participants to evaluate credibility (the percent of participants who believed the speaker) in the media shown in the surveys. By comparing the responses for the surveys associated with each condition, we measured how racial perceptions influence credibility. In the image condition, we found that the believed race of the speaker did not affect the credibility of that speaker. However, in the video condition, where the racial adjustments are more nuanced, we found that participants who believed the speaker was White were significantly more likely to believe that speaker was telling the truth (73.0% versus 61.0%). Additionally, we found that more positive sentiment was associated with the text responses justifying credibility decisions for the altered video compared to the original video. Altogether, our evidence suggests that subtle modifications to alter perceived race may allow unconscious bias to impact credibility.
This has implications in the domains of sales, negotiations, job interviews, and court trials when conducted in videoconferencing environments. We therefore recommend these systems implement full disclosure and complete transparency when a person alters their racial appearance on these platforms. We hope this work serves as a formal, initial investigation of this space and encourages further exploration.