Could you become more credible by being White? Assessing Impact of Race on Credibility with Deepfakes
Kurtis Haut, Caleb Wohn, Victor Antony, Aidan Goldfarb, Melissa Welsh, Dillanie Sumanthiran, Ji-ze Jang, Md. Rafayet Ali, Ehsan Hoque
University of Rochester, Department of Computer Science
Abstract
Computer-mediated conversations (e.g., videoconferencing) are now mainstream media. How would credibility be impacted if one could change their race on the fly in these environments? We propose an approach using Deepfakes and a supporting GAN architecture to isolate visual features and alter racial perception. We then crowd-sourced over 800 survey responses to measure how credibility was influenced by changing the perceived race. We evaluate the effect of showing a still image of a Black person versus a still image of a White person using the same audio clip for each survey. We also test the effect of showing either an original video or an altered video in which the appearance of the person in the original video is modified to appear more White. We measure credibility as the percent of participant responses who believed the speaker was telling the truth. We found that changing the race of a person in a static image has negligible impact on credibility. However, the same manipulation of race in a video increases credibility significantly (61% to 73%, p < .05).

Introduction
Manipulating race to measure its impact has been extensively studied in the past. It has been shown that having a Black-sounding name puts you at a disadvantage for employment opportunities compared to having a White-sounding name on identical resumes (Bertrand and Mullainathan 2004). Avatars offer a high degree of predictive customization and have been utilized to research impacts of race in virtual reality (Groom, Bailenson, and Nass 2009) and in creating racial empathy that extends to the real world (Peck et al. 2013). Although names and avatars are both effective ways to create racial perceptions, in the domain of credibility it remains unclear whether the same findings extend to subtle manipulation of images and videos. Recent advancements in AI now allow precise manipulation of skin tone, hair type, eye color, and facial features. This opens up an exciting opportunity to probe for race and measure its impact in a way that wasn't possible before. In this paper, we evaluate racial perceptions by leveraging Deepfakes and a supporting GAN architecture to manipulate facial features and skin color while holding other variables such as accent and clothing constant. This allows us to directly measure the effect of those racial changes. We apply our techniques to a naturalistic dataset on deception collected using the ADDR framework, in which participants engage in an activity on a video call (Sen et al. 2018). Given that the dataset is made of clips from actual video calls recorded by webcam, we hope that our findings can provide insights for real-world videoconferencing situations such as sales, negotiations, job interviews, telemedicine, and virtual trials. Virtual trials have become especially important during the COVID-19 pandemic, as courts in all 50 US states have begun to hold online trials using video conferencing (Branch 2020; Coie 2020). Virtual testimony is perceived to be less credible than in-person testimony
(Landström and Granhag 2010; Landström, Granhag, and Hartwig 2005; Walsh and Walsh). There could be many reasons for this, but understanding whether perception of race contributes to this phenomenon may help future court proceedings. Even for low-stakes video calls, credibility assessments of our conversational partners impact communication. Therefore, understanding how racial perceptions influence credibility in these domains matters. Perception of race is a complex issue composed of variables such as class, education, accent, skin tone, etc. In this paper, we look at racial perceptions through the lens of modifying visual representation (skin tone and facial features).

We designed an experiment consisting of 4 surveys (see fig. 1) to begin the process of answering these research questions:
• How do racial perceptions in videos and images influence an individual's credibility?
• What public sentiments are associated with the credibility of a perceived race?

To assess credibility, we employ Amazon Mechanical Turk (AMT) to crowd-source responses for our surveys. An online worker on AMT, called an mTurker, accepts a Human Intelligence Task (HIT) and is paid for completing the human labor associated with that HIT. In our case, an mTurker watches a video or listens to an audio clip with a still image, then completes a survey.

Figure 1: a) Depicts the image condition. We trained a CycleGAN using the Chicago Face Dataset to generate unambiguous racial images of a Black and a White man. We launched two surveys on Amazon Mechanical Turk (denoted by AMT HIT) and compared the responses. Both surveys play the same audio, and the transcript is shown. b) Depicts the video condition. We took an original video and made subtle racial changes to the face area using DeepFaceLab. We launched two surveys as in a) and compared the responses. The transcript of what the speaker said is shown and both videos have identical audio.

In our surveys, we take credibility to mean the percent of participants who believed that the speaker in the video or audio was telling the truth. In total, we have 4 surveys with 800 total responses from the participants (for demographic information see fig. 2). The first 2 surveys test the condition in which a different still image is shown to the participant while the same audio file is played; i.e., in one survey an image of a Black person is shown to the participant, while in the other survey an image of a White person is shown (see fig. 1a). The next 2 surveys test the condition of showing a different video to the participants. In one survey, the original video is shown. In the other survey, we show an altered version of the same video where the only difference is that we make the speaker appear White (see fig. 1b). We used CycleGAN (Zhu et al. 2017a) to generate image mappings of White to Black and vice versa to test the image condition. For the video condition, we used an encoder-decoder architecture to perform a face swap, essentially changing the visual representation of a South Asian person to appear more White (in terms of skin tone and facial features) using DeepFaceLab (Perov et al. 2020). For each of the four surveys, 200 participants (800 total) responded.

We then performed statistical analysis on the survey responses to determine whether the credibility differences for each condition were meaningful. In summary, we find and recommend the following:
• Those who believed the person was White in the altered video were more likely to say the person was telling the truth.
• There is no statistical difference in credibility between the image conditions when participants listened to the same audio.
• Participants described the altered video using words with more positive sentiment in the free-response survey questions.
• Subtle modifications to alter racial perception may impact credibility in video recordings from prior videoconferencing sessions.

Although creating a high-quality Deepfake in real time during a videoconferencing session is not possible right now, in the near future it will be. Regardless of the situation surrounding the video call, it is likely that people will have the ability to change their perceived race (users can already change their gender or skin complexion on Snapchat). Our findings indicate that this action could have an effect on the credibility of that person. We make the following recommendation to videoconferencing or social media companies, should they decide to enable this feature in the future:
• Complete transparency and full disclosure of the race change to all parties on the video call (e.g., they must be notified and there should be an identifying mark on the window of that person's video during the call).
Background
Racial Biases within Societies
Prior studies within the context of racial discrimination aim to tackle the problem solely through sociological metrics of quantification. Perception of racial identity has been observed to depend on various aspects of race, including skin color (Maddox and Gray 2002) and accents (Dixon, Mahoney, and Cocks 2002). When assessing racial discrimination in the context of skin color, prior works illustrate how people of different races tend to distrust each other the most (Smith 2010). The level of racial discrimination can even be broken down further along the skin-color gradient (Leonardo 2004). For example, even when race is held constant and only skin color varies, dark-skinned Blacks were 11 times more likely to experience racial discrimination than their light-skinned counterparts (Klonoff and Landrine 2000). These studies suggest that skin color alone may cause a person to be discriminated against. In addition to skin color and race, the accent of a person can also have a significant influence on how others perceive them. This is often found in incidents involving people who speak African American Vernacular English (AAVE). There have been many court cases in which the accent of African Americans negatively influenced the outcome of a trial (Rickford and King 2016). Moreover, a non-native accent may cause listeners to perceive a speaker as less credible, either because the accent serves as a signal or because listeners find the speech difficult to process (Lev-Ari and Keysar 2010). Through their work, Shiri Lev-Ari and Boaz Keysar showed that participants in their study perceived trivia statements such as "Ants don't sleep" to be true less frequently when spoken by a non-native English speaker than by a native speaker.
This lends credence to the idea that an individual's socio-political background may have a significant impact on how credible they are perceived to be. Similarly, Günaydin et al. demonstrated that objective facial resemblance to a significant other influences snap judgments of liking automatically, effortlessly, and without conscious awareness (Günaydin et al. 2012). This yields further evidence for the possibility that race may impact credibility, given the large proportion of individuals who grow up in communities which are segregated or dominated by one racial group.

However, there is very limited information in the relevant literature assessing the impact of perceived race on credibility through modifying visual representation exclusively. This could be because one cannot easily change the physical features associated with race, nor replicate life experience, to measure the effect of those changes. However, when we communicate online with images or videos, these constraints do not exist. Because we can modify visual representation algorithmically, we can research the effect this has on credibility when communicating face to face over digital media.
Employing AI for Investigating Racial Bias
In order to more objectively assess how skin color and facial features affect one's credibility, techniques such as CycleGANs and Deepfakes are the most suitable, because they can hold features such as accent, clothing, gestures, and facial expressions constant. CycleGANs have previously been used within research applications to make image-processing training sets more inclusive of various skin tones. In the work "Fairness GAN", Sattigeri et al. illustrate the CycleGAN's power in constructing a demographically inclusive extension of the CelebFaces Attributes dataset, one dimension being skin tone (Sattigeri et al. 2018). Other work with CycleGANs looks at applications such as artificial makeup, which can vary in the degree of heaviness of the synthetic makeup applied (Chen et al. 2019). Deepfakes also hold significant value for manipulating the skin tone and facial features of videos. Although researchers have worked on creating algorithms to detect computer-manipulated images and videos (Korshunov and Marcel 2018), and humans are likely to be more accurate at this task at present (Korshunov and Marcel 2020), the future is anything but certain. These algorithms continue to improve, generating more realistic images at a startling pace. In 2019, Karras et al. introduced StyleGAN, an innovative architecture able to generate more realistic images (Karras, Laine, and Aila 2019). In this paper, we leverage Deepfakes and CycleGANs to create subtle racial changes to help understand the nuances of race perception and credibility in video-call environments.
Methods
The credibility and associated language sentiment for different racial perceptions were measured using responses from 800 participants recruited from Amazon Mechanical Turk. Generally, mTurkers form a more diverse population than the standard internet population or the populations of college students typically used in laboratory experiments (Buhrmester, Kwang, and Gosling 2016; Horton, Rand, and Zeckhauser 2011; Paolacci, Chandler, and Ipeirotis 2010). These participants were given one of the four surveys, which were designed to measure two conditions: image and video. Among the 800 participants, 305 gave us their demographic information (shown in Fig. 2a, 2b and 2c). Participants were compensated 2 dollars for their time completing the surveys.
Design of Experiment
Credibility Ground Truths
We used one audio clip and one video clip from the publicly available UR Lying dataset, collected using the ADDR framework (Sen et al. 2018). Each clip is 30 seconds in length and shows the speaker answering the question "What was your image?". The speakers in the videos are shown an image prior to being asked this question. The ADDR framework instructed the speaker to lie or tell the truth about their image. In this particular instance, the speakers in both the image condition and the video condition described their image as it is (i.e., told the truth). From the audio and video recordings, we designed four separate surveys to test two conditions: two surveys tested an image condition and two surveys tested a video condition. We gathered a total of 800 responses from the four surveys, each survey having 200 participants.

Country      Count  Percent
US           263    86.23%
Brazil       16     5.25%
India        6      1.97%
Italy        4      1.31%
Bangladesh   2      0.66%
China        2      0.66%
Ireland      2      0.66%
Spain        2      0.66%
U.K.         2      0.66%
Argentina    1      0.33%
Bahamas      1      0.33%
Canada       1      0.33%
Hong Kong    1      0.33%
Indonesia    1      0.33%
Mexico       1      0.33%

Figure 2: Self-reported demographic information of 305 participants: (a) Age Distribution, (b) Race, (c) Gender Identity, (d) National Origin.
The Image Condition
We were interested to see how still images of people from different races, paired with the same audio, can influence credibility. The still image was designed to elicit a specific racial perception. We were interested in comparing credibility for a White person versus a Black person. Instead of using a real image from the dataset, which introduces noise such as clothing style and features unique to an individual of that race, we generated our own generic race representations to produce racial mappings of Black to White and vice versa, as shown in fig. 1a. We trained a CycleGAN using 4,000 images from the Chicago Face Dataset (CFD), which learned how to convincingly change complex racial features such as eye color, lip shape, hair type, and skin color (Zhu et al. 2017b). The participants either saw the White image in one survey or the Black image in the other survey, while listening to the same audio. We then compared the responses from these two surveys to see whether the still image shown to the participants influenced the speaker's credibility or the attributed language sentiment.
Video Condition
While the image condition created clear, unambiguous representations of race, the video condition explores the nuances of racial perception by making subtle modifications to the facial region. In this condition, participants either watched the original video in one survey or an altered video in the other survey. In the altered video, the person from the original video is made to appear more White using a DeepFaceLab (Perov et al. 2020) face mapping. We do this by taking the face of a White participant from the UR Lying database and mapping it onto the face of a non-White participant, as seen in fig. 1b. We then create 2 surveys, where one survey shows the participant the original video and the other shows the altered video. We then compare the responses on the credibility and language sentiment of the speaker.
Survey
To measure the effect of an individual's perceived race on that individual's credibility, we created four independent surveys. Two of the surveys played an audio file from the UR Lying database accompanied by either a Black or a White still image. The other two surveys played either an original video from the UR Lying database, whose speaker is South Asian, or an altered video in which the person is made to appear more White. For each survey, we collect information on whether the participant thought the speaker was lying or telling the truth and ask the participant to justify his/her answer in a free response. The questions we asked the participants remain the same in each survey and can be found in table 1. Altogether, we collect 200 responses per survey, totaling 800 responses.

Table 1: Questions for the surveys
Questions
Please briefly (1 or 2 sentences) describe what the person in the video said.
Do you think the speaker was telling the truth?
How confident are you about whether the speaker was lying/telling the truth? (10 means you are certain, 1 means you have no idea)
What made you think the person was lying or telling the truth?
Use a few adjectives to describe the characteristics of the speaker.
What race do you think the speaker is?
What do you think the socioeconomic status of the speaker is?
Do you think the video was authentic?
What is your age?
What do you consider to be your gender?
Where did you grow up? (i.e., where did you spend most of your first 12 years?)
What do you consider to be your socioeconomic class?
What do you consider to be your race?
Analysis
Pre-processing to Identify Perceived Race
We included a question asking the participants to identify the race of the speaker, with the goal of detecting the perceived race. Since our research question concerns how perceptions of race influence credibility, we had to filter out responses which did not "correctly" identify the race (i.e., "Black" in the case of the audio with a still image of a Black man, "White" in the case of the audio with a still image of a White man, "Asian" in the case of the original video of a South Asian speaker, and "White" in the case of the altered video). Thus, after the pre-processing step in our analysis pipeline is complete, we are left only with the responses that perceived the intended race for each survey.
Assessing perceived credibility
We asked the participants whether or not they thought the speaker was telling the truth in each survey in a binary fashion (i.e., yes or no). For the image condition, we analyzed the responses from the two surveys where the still image is either of a Black or a White person, as described earlier. We took the number of respondents who thought the speaker was telling the truth in each case and compared them using a two-proportion z-test. Similarly, for the video condition, we analyzed the responses from the two surveys where the video is either the original or the altered video, as described earlier. Following the same procedure as for the image condition, we took the number of respondents who thought the speaker was telling the truth in each survey and compared them using a two-proportion z-test.
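As a sketch of this comparison (not the authors' analysis code, which is not published here), the pooled two-proportion z-test can be written with the Python standard library alone; the counts in the usage line are hypothetical and chosen for illustration only.

```python
import math

def proportions_ztest(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test with a pooled variance estimate.

    success_* is the number of respondents who believed the speaker was
    telling the truth; n_* is the total number of responses per survey
    (after pre-processing). Returns (z, p_value).
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution:
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts for illustration only (not the study's data):
z, p = proportions_ztest(146, 200, 122, 200)
```

In practice a library routine such as `statsmodels.stats.proportion.proportions_ztest` gives the same pooled statistic; the hand-rolled version above just makes the arithmetic explicit.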
Analyzing Sentiment
We also asked the participants to give their rationale for why they thought the speaker was lying or telling the truth, as well as to give a few adjectives describing the characteristics of the speaker, in free-response form. We analyzed these text-field questions using VADER sentiment analysis (Gilbert and Hutto 2014). VADER uses simple rules and a lexicon built by averaging sentiment ratings collected from Amazon Mechanical Turk workers. Its heuristic rules were designed based on statistical analysis of tweets, and it was tested against other models across multiple domains using AMT to generate a "wisdom of the crowd" ground truth. A major reason we chose VADER is that it was created and validated using AMT surveys (a population similar to ours). We first compute a compound sentiment score for the text of each response, which yields a number between -1 and +1. Thus, for each survey, we have an array of values between -1 and +1. We take the arrays from the two surveys associated with the image condition and compare them against each other using a Mann-Whitney U test. Likewise, for the video condition, we take the arrays from the two surveys for that condition and compare them against each other. This allows us to evaluate whether more positive sentiment is associated with a specific racial perception. We perform the same sentiment analysis on the responses to the question that asked participants to use a few adjectives to describe the characteristics of the speaker.
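The Mann-Whitney U comparison of the two arrays of compound scores can be sketched as follows. This is a simplified standard-library version using the normal approximation and omitting the tie correction to the variance, not the authors' analysis code; the input lists stand in for the per-survey VADER compound scores (each in [-1, 1]).

```python
import math
from itertools import chain

def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney U test via the normal approximation.

    xs and ys are the two samples being compared (here, the arrays of
    VADER compound scores from the two surveys of a condition). Ties get
    midranks, but the tie correction to sigma is omitted for brevity.
    Returns (U1, p_value).
    """
    pooled = sorted(chain(xs, ys))
    # Assign each distinct value its average 1-based rank (midrank).
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + j + 1) / 2
        i = j
    n1, n2 = len(xs), len(ys)
    r1 = sum(ranks[x] for x in xs)          # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2             # U statistic for xs
    mu = n1 * n2 / 2                        # mean of U under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value
```

For real analyses, `scipy.stats.mannwhitneyu` handles ties, continuity correction, and exact small-sample p-values; the sketch above only shows the rank-sum mechanics behind the test.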
Results
Fig. 3a, 3b, 3c and 3d show how the pre-processing was done on the responses for the analysis. Because we are interested in how racial perceptions influence credibility over digital media, we analyze only the responses that believed the intended racial perception, as described in the pre-processing section.
Credibility Analysis
We take credibility to mean the percent of respondents who said that the speaker was telling the truth. In the image condition, 72.3% believed the speaker was telling the truth when an image of a White person was shown, versus 70.3% when an image of a Black person was shown. There was no statistically significant difference in this condition. But in the video condition, 61.0% of participants believed the speaker was telling the truth after watching the original video of the darker-skinned South Asian speaker, and 73.0% of participants believed the speaker was telling the truth after watching the altered video featuring the White speaker. This difference was significant (p < .05). Table 2 shows these results.

Table 2: Summary of Results. "n" is the number of responses after pre-processing. "Credibility" is the percent of responses that believed the speaker was telling the truth. * indicates p < .05.

Survey                        n    Credibility
Black Image                   138  70.29%
White Image                   159  72.33%
Original Video (South Asian)  123  *60.98%
Altered Video (White)         63   73.02%
Sentiment Analysis
We analyzed the responses to the open-ended questions, "What made you think the person was lying or telling the truth?" and "Use a few adjectives to describe the characteristics of the speaker." We used the compound sentiment score from VADER, which measures sentiment on a scale from -1 to +1, with +1 indicating the text is very positive. The results are contained in Figures 4 and 5.

Figure 3: Results for the racial perceptions of the participants from the pre-processing question "What race do you think the speaker is?": (a) responses for the Black image, (b) responses for the White image, (c) responses for the original video, (d) responses for the altered video.
Survey          Comparison  VADER Sentiment
Black Image     Overall     0.13
White Image     Overall     0.149
Black Image     Truth       0.22
White Image     Truth       0.24
Black Image     Lie         -0.098
White Image     Lie         -0.088
Original Video  Overall     **0.061
Altered Video   Overall     0.236
Original Video  Truth       **0.19
Altered Video   Truth       0.359
Original Video  Lie         -0.145
Altered Video   Lie         -0.133

Figure 4: Results from VADER compound sentiment analysis of responses to the question "What made you think the person was lying or telling the truth?". Overall indicates the mean of the compound sentiment regardless of whether the participants believed the speaker or not. 'Truth' indicates those respondents who thought the speaker was telling the truth and 'Lie' indicates those participants who thought the speaker was lying. *Indicates p < .05; **indicates p < .01.

Survey          Comparison  VADER Sentiment
Black Image     Overall     *0.342
White Image     Overall     0.241
Black Image     Truth       *0.451
White Image     Truth       0.367
Black Image     Lie         *0.07
White Image     Lie         -0.088
Original Video  Overall     0.163
Altered Video   Overall     0.221
Original Video  Truth       0.34
Altered Video   Truth       0.325
Original Video  Lie         -0.159
Altered Video   Lie         -0.05

Figure 5: Results from VADER compound sentiment analysis of responses to the question "Use a few adjectives to describe the characteristics of the speaker." Overall indicates the mean of the compound sentiment regardless of whether the participants believed the speaker or not. 'Truth' indicates those respondents who thought the speaker was telling the truth and 'Lie' indicates those participants who thought the speaker was lying. *Indicates p < .05.

Discussion
Nuances of Racial Perception Could Reveal Unconscious Bias
Our decision to use still images of a White person versus a Black person was made to render the perception of race less ambiguous. This can be seen in the distribution of survey responses in figure 3. It may seem that making the differentiation of race clearer would make measuring the effect on credibility more accurate. However, when this racial difference is accentuated, it is also possible that the survey participants compensate in their credibility evaluations based on their awareness of what is being measured, due to the Hawthorne effect (Wickström and Bendix 2000).

For this reason, we also chose to make a more nuanced racial modification in the video condition. In this condition, the racial perception is much more ambiguous, as can be seen in the distribution of the survey responses in figure 3. We speculated that if the racial changes were subtle enough, they could even go unnoticed. This could allow unconscious bias to influence the survey responses assessing credibility. It remains part of our future work to probe and compare different racial filters, including a bi-racial White and Black speaker, for the video condition. More work is needed to also explore the bi-directional alternation of races to understand the whole spectrum of racial perception.
Image Condition Versus Video Condition Credibility Comparison
We did not observe a difference in credibility in the still-image condition, unlike in the video condition. Our original hypothesis was that changing the racial perception of the speaker to be White would increase their credibility. Although we see some evidence of that effect, with a 70.3% credibility rating when the Black still image was shown compared to a 72.3% credibility rating when the White still image was shown, this difference is not significant.

Yet in the video condition, we see that making the speaker appear White does have a tangible influence on their credibility rating (61.0% increased to 73.0% when the participants perceived the speaker as White). This provides some evidence that appearing more White may increase credibility on videoconferencing platforms. However, that conclusion does not explain why we see an increase in credibility for the video condition but not for the image condition. It is possible that for the image condition, with two racially pronounced pictures, the respondents were more aware of the racial bias and did not allow race to influence their judgement. The video condition, however, included a more nuanced alteration along with nonverbal dynamics for the respondents to process when making a decision. As a result, it is possible for race to become a factor in measured credibility.
Comparing Feelings and Sentiment Towards Race
In order to see if perceived race had an effect on sentiment, we compared the responses for the surveys associated with each condition (image and video). For each condition, we measured and compared the overall sentiment of the open-ended text responses. We observed no differences in sentiment for the image condition in the justifications the participants gave for the speaker's credibility. However, for the video condition, we again see statistically significant differences. We found more positive sentiment overall in the words used by participants to justify their decision, and this difference is more pronounced when comparing the truthful text responses. Given that we observed higher sentiment scores for the altered video, this was likely due to the participants being under the impression that the speaker was White. Yet as we do not see this mirrored in the image condition, there could be more to the story.

When we compared the text responses describing the characteristics of the speaker, we found higher sentiment in the Black image survey than in the White image survey. This difference is significant regardless of whether the participants believed the speaker was telling the truth or not. The higher overall sentiment attributed to the speaker perceived to be Black indicates the Hawthorne effect at work, due to its contradictory nature. Because the positive sentiment is not reflected in the justification for the decision made regarding the truthfulness of the speaker, nor is the credibility rating higher for the Black speaker, it is possible that the positive adjectives used by the participants are insincere. Looking at the demographics, a majority of the participants are young, White males (see fig. 2). This particular age/race group, given the current political climate, may be exercising caution when addressing the Black community specifically.
A careless choice of words may be viewed as insensitive or could potentially result in being labeled a White supremacist or racist. In the case of the mTurker, such a complaint could get their account suspended and cause them to be out of work. As a result, for the image condition, it is possible that the participant is cognizant of what is being measured and therefore adjusts his/her behavior accordingly. However, with the video condition, due to the subtleties of the racial adjustments, perhaps we observe the unconscious bias associated with the participants preferring the speaker when perceived to be White.
Recommendations for Video Conferencing Systems
In the future, our virtual identities will become increasingly important. In the physical world, attributes like skin tone, hair type, eye color and facial features are hard to change. These constraints on altering one's physical appearance do not exist when we present ourselves in digital media. If AI can alter how we are perceived, it is important to consider the effect those changes can have on individual credibility in videoconferencing environments. We argue that altering perceived race without informing the audience involved is unethical. We recommend that videoconferencing systems require full disclosure to all parties on the video call when someone has chosen to alter their race. The system should be the entity to provide this disclosure, as holding the individual accountable for disclosing that information may be unreliable. We propose this disclosure be given one time to each person who joins the call through a notification system. We also recommend that the videoconferencing system implement complete transparency for the duration of the call. We propose that the system place a distinguishing mark in the frame of the person who is altering their racial appearance. The mark should be discreet, yet serve as a constant reminder to the audience. Lastly, we recommend that videoconferencing systems implement an appropriate encryption scheme to authenticate videos sent and received over their channel. Otherwise, a nefarious entity could manipulate the videos unbeknownst to the system and therefore its users. This would bypass the intentions of full disclosure and complete transparency.
Limitations
Definitions of “Race”
How human beings racially profile each other is a complex phenomenon composed of many factors. Among these, linguistic features of speech such as accent are a major component. This paper instead focuses on what happens when the visual representation of an individual changes to appear White or more White.
Issues of Video Realism
A potential confounding factor in our experiments is the believability of the altered video. It is conceivable that if a participant found the video to be contrived or altered, this could influence their evaluation of the speaker's credibility. To address video realism as a confounding factor, we posed the statement "I found the video to be authentic" in both the original and the altered video surveys. For the original video, 10% of participants found the video to be not authentic, compared to 15% for the altered video. After running a two-proportion z-test, we concluded that there is no significant difference between these surveys in the perceived authenticity of the videos shown to participants (p > 0.05).
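The authenticity comparison above can be reproduced with a standard two-proportion z-test using a pooled standard error. The sketch below uses only the Python standard library; the per-survey sample sizes (200 each) are assumed for illustration, since the percentages alone do not fix them.

```python
from math import sqrt, erf

def two_proportion_ztest(x1: int, n1: int, x2: int, n2: int):
    """Two-sided two-proportion z-test with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF, via erf.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# 10% vs. 15% "not authentic" ratings; n = 200 per survey is hypothetical.
z, p = two_proportion_ztest(20, 200, 30, 200)
```

With these assumed sample sizes, the resulting p-value exceeds 0.05, consistent with the conclusion that perceived authenticity did not differ between the two surveys.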
We use Amazon Mechanical Turk (AMT) as a mechanism to draw a sample of the general population. We must recognize that this surveying technique suffers from some degree of population bias: the demographics of the survey responses skew toward younger, White males from the US. We could have chosen other surveying options. Unfortunately, bringing people into the lab to complete the surveys was made burdensome by the social distancing stipulations of the pandemic; it also would have been far more expensive and would have introduced a population bias of a different kind. Considering the trade-offs, AMT was our best option given its experimental validity and its ability to crowd-source a large number of responses effectively.
Realism of Credibility Ground Truths
Another issue with our experimental findings concerns the ground truths for the credibility assessment videos shown to the participants. We sacrifice realism in order to guarantee ground truths collected in an experimental setting. The deceptions that occur over face-to-face or computer-mediated platforms in the real world may be very different from those obtained in an experimental context, and the influence of racial perceptions could likewise differ. Despite that acknowledgement, the videos used were of study participants engaged in a computer-mediated deception experiment. We therefore expect the videos to be suitable for exploring credibility in the domain of digital media.
Future Work
We note here that we cannot directly compare how racial perceptions created via images and audio versus video affect credibility over video calls. We would need many examples using consistent image frames and audio to draw a generalized conclusion on that topic. We leave the fine-grained analysis of still image versus video for future work, as the implications are important to understand.
Lastly, this paper focuses on examining what happens when the visual representation of race is changed with AI and how that influences one's credibility. In the future, we will isolate specific accents such as African American Vernacular English (AAVE) and compare them against standard American English to observe the direct effect on an individual's credibility. AI can alter visual appearance and should also be able to modify accent. This will enable new insights through the isolation and fine-grained control of racial audio features. Understanding the implications of those changes in digital media involving audio is important work for the future.
Conclusion
We began the process of quantifying how racial perceptions influence credibility in digital media (e.g., video conferencing, video posting). This is a difficult problem to undertake, as isolating the variables contributing to racial profiling is complicated. Here we isolated one of those variables, visual representation (e.g., skin tone and facial features), and explored how modifying it impacts credibility. We used AI to assist in creating and modifying specific racial perceptions: CycleGAN was used to create unambiguous racial representations, while Deepfakes were used to introduce subtle differences in perceived race. We created surveys to test the image and video conditions and help understand these nuances. We used Amazon Mechanical Turk to recruit participants and obtain a more representative sample of the general population than college students in a laboratory setting. Altogether, we crowd-sourced 800 participants to evaluate credibility (the percent of participants who believed the speaker) in the media shown in the surveys. By comparing the responses for the surveys associated with each condition, we measured how racial perceptions influence credibility. In the image condition, we found that the believed race of the speaker did not affect the credibility of that speaker. However, in the video condition, where the racial adjustments are more nuanced, we found that participants who believed the speaker was White were significantly more likely to believe that speaker was telling the truth (73.0% versus 61.0%). Additionally, we found that more positive sentiment was associated with the text responses justifying credibility decisions for the altered video compared to the original video. Altogether, our evidence suggests that subtle modifications to alter perceived race may allow unconscious bias to impact credibility.
This has implications in the domains of sales, negotiations, job interviews, and court trials when conducted in videoconferencing environments. We therefore recommend these systems implement full disclosure and complete transparency when a person alters their racial appearance on these platforms. We hope this work serves as a formal, initial investigation of this space and encourages further exploration.