Visual Framing of Science Conspiracy Videos: Integrating Machine Learning with Communication Theories to Study the Use of Color and Brightness
Kaiping Chen, Sang Jung Kim, Sebastian Raschka, Qiantong Gao
University of Wisconsin-Madison

Version: February 1st, 2021

Abstract
Recent years have witnessed an explosion of science conspiracy videos on the Internet, challenging science epistemology and public understanding of science. Scholars have started to examine the persuasion techniques used in conspiracy messages, such as uncertainty and fear; yet little is understood about the visual narratives, especially how visual narratives differ in videos that debunk conspiracies versus those that propagate conspiracies. This paper addresses this gap in understanding visual framing in conspiracy videos by analyzing millions of frames from conspiracy and counter-conspiracy YouTube videos using computational methods. We found that conspiracy videos tended to use lower color variance and brightness, especially in thumbnails and earlier parts of the videos. This paper also demonstrates how researchers can integrate textual and visual features to identify conspiracies on social media and discusses the implications of computational modeling for scholars interested in studying visual manipulation in the digital era.
Keywords: conspiracy, color and brightness, YouTube, computer vision, machine learning, text analysis

Recent years have witnessed an explosion of science conspiracy videos on the Internet, challenging science epistemology and public understanding of science (Ahmed et al., 2020). Despite efforts to debunk conspiracies, they are still believed by many populations (Miller, 2020). Scholars have started to examine the persuasion techniques used in conspiracy messages, such as uncertainty and fear (van Prooijen & Douglas, 2017); yet little is understood about the visual narratives, especially how visual narratives differ in videos that support conspiracies versus those that debunk them. A comparative understanding of the visual features of conspiracy versus debunking videos is crucial, as both narratives co-exist and compete in the media ecosystem. Our paper addresses this gap by revealing how science conspiracy videos manipulate color and brightness to arouse uncertainty and fear. As a result, our work brings new knowledge to communication theories on framing, uncertainty, and fear appeals, and informs strategies for distinguishing science from misinformation.

* Kaiping Chen (http://kaipingchen.com) is an Assistant Professor in Computational Communication in the Life Sciences Communication Department at the University of Wisconsin-Madison and is the corresponding author of this paper ([email protected]). Sang Jung Kim is a PhD student in the School of Journalism and Mass Communication at the University of Wisconsin-Madison. Sebastian Raschka (http://pages.stat.wisc.edu/~sraschka/) is an Assistant Professor in the Statistics Department at the University of Wisconsin-Madison. Qiantong Gao is an undergraduate majoring in Computer Science. All authors have contributed equally to this project.
Literature Review
With the rapid circulation of conspiracy theories on social media platforms and their detrimental effects on scientific foundations in society, understanding the nature of conspiracy theories has become a critical topic in communication (Halpern, Valenzuela, Katz, & Miranda, 2019). A conspiracy theory is a type of misinformation that discredits established institutions by building a narrative that these institutions have nefarious intents to threaten society (Scheufele & Krause, 2019). Conspiracy theories challenge established scientific knowledge, such as claiming that 5G technology caused COVID-19 or asserting that scientific institutions have malicious intentions to use the climate change discourse to harm citizens. Such misinformation leads citizens to reject scientific knowledge and threatens informed citizenship, which is crucial for democratic decision-making (Moore, 2018). Conspiracy theories are rapidly circulated on social media platforms and blur the boundaries between authoritative and alternative information sources (Kou, Gui, Chen, & Pine, 2017). Simultaneously, there are corrective efforts from social media users to combat misinformation circulated on social platforms (Vraga, Kim, & Cook, 2019). While the prevalence of conspiracy theories in the social media environment could be damaging, the existence of correction messages combating conspiracy theories by users could contribute to the social media environment’s self-purification. However, before comparing the influence of conspiracy theories and corrective messages, it is essential to understand the content characteristics of conspiracy theories and correction messages on social media platforms.
Framing Theory and Visual Framing in Communication Field
Both conspiracy theories and corrective messages on social media platforms attempt to persuade audiences and influence their beliefs. While conspiracy theories aim to convince audiences that established institutions have malicious intentions, correction messages aim to dispel audiences’ beliefs in conspiracy theories (Garrett, Nisbet, & Lynch, 2013). We rely on framing theories, a core theory in the communication field to examine persuasive messages, to understand the characteristics of conspiracy theories and correction messages. Message frames presented by
Visual Framing in Conspiracy Theories and Their Similarities to Horror Films
One key difference that distinguishes conspiracy theories from correction messages is the strong association between conspiracy theories and the culture of paranoia (Aupers, 2012). Paranoia arises from individuals' persecutory belief that harm will occur and that the harm is intended by others (Raihani & Bell, 2019). Conspiracy theories use framing devices to make audiences paranoid by demonizing established institutions and ultimately making audiences doubt factual knowledge
RQ1: Do conspiracy videos use more low-key lighting than correction videos?
RQ2: Do conspiracy videos use lower color variances than correction videos?
Conspiracy theories about climate change have existed for many years, and conspiracy theories related to COVID-19 increased rapidly after the outbreak of the disease (Mian & Khan, 2020). While conspiracy theories about climate change mostly build narratives that deny the threatening consequences of climate change (Douglas & Sutton, 2015), conspiracy theories about COVID-19 include both narratives denying the devastating consequences of COVID-19 and narratives attributing the cause of COVID-19 to the government (e.g., the Chinese government) or technology (e.g., 5G technology) (Shahsavari, Holur, Wang, Tangherlini, & Roychowdhury, 2020). However, some climate change conspiracy theories also suggest that climate change happens because of the government's nefarious intentions to harm citizens (e.g., geoengineering) (Allgaier, 2019). Since conspiracy theories related to climate change and emerging conspiracy theories around COVID-19 have commonalities, visual frames of conspiracy theories around these issues might be similar. Therefore, we raise the third research question:
RQ3: Do conspiracy videos about emerging science topics (COVID-19) and conspiracy videos about traditional science topics (climate change) share the characteristics of horror films?
A Computational Approach to Understand Visual Framing (in Conspiracy)
There is a growing body of work among communication scholars that analyzes visual frames in media messages (Bucy & Joo, 2021) and applies computational methods to study visual frames (Joo & Steinert-Threlkeld, 2018; Peng, 2021). These computational methods are applied to 1) extract high-level features such as facial expressions in images or videos, and (or) 2) pull out visual modalities such as color or saturation in images or videos. The present paper uses computational methods to answer our research questions by 1) extracting frames from each video, 2) computing visual modalities from video frames, and 3)
Figure 1. Flowchart on how to analyze images from videos
Extracting frames from each video
To extract visual frames from videos, researchers convert videos into an array of images by capturing frames (Derry et al., 2010). Converting videos into frames is an important first step if a researcher wants to process images computationally to extract visual feature information (Afifah, Nasrin, & Mukit, 2018). One of the computational tools that can be used by communication researchers to extract frames from videos is the OpenCV library (Culjak, Abram, Pribanic, Dzapo, & Cifrak, 2012). OpenCV is a comprehensive open-source computer vision library that provides application programming interfaces (APIs) for common programming languages such as C++, Python, Java, and MATLAB. In addition to providing state-of-the-art methods for image analysis, it also provides access to various image and video statistics, for example, the number of frames, frame rate, and frame size (Afifah et al., 2018).
Computing visual modalities from video frames
Communication studies employing computational techniques to analyze visual frames follow at least four approaches (Peng, 2021): 1) extracting attributes predetermined by open-source computer vision libraries and commercial APIs from videos or images (e.g., Peng, 2018), 2) using supervised machine learning models to extract feature attributes defined by researchers (e.g., Joo and Steinert-Threlkeld, 2018), 3) clustering features extracted from a supervised model pre-trained on a large image dataset and fine-tuned to the target dataset (i.e., transfer learning) (Peng, 2021), and 4) computationally analyzing visual modalities of images or videos, such as
Building models to classify conspiracy videos: image features vs. text features
The present paper also introduces another computational approach to the communication field, that is, evaluating the importance of visual features in determining conspiracy videos. Researchers can use machine learning or deep learning techniques to assess if a combination of visual elements in images or videos can successfully distinguish images or videos with different characteristics (Birnbaum et al., 2020). For the present paper, we question whether visual attributes in YouTube videos can be utilized to distinguish between conspiracy videos and correction videos successfully. More specifically, we examine if the information about low-key lighting and color variances can reliably differentiate between the two types of videos. Therefore, we raise the following research question:
RQ4: Can information about low-key lighting and color variances be used to reliably distinguish between conspiracy and non-conspiracy videos?
Researchers can utilize machine learning and deep learning models to predict from a video's visual features whether it is a conspiracy video. Classical machine learning models such as random forests were developed with data in a tabular form in mind (Raschka, Patterson, & Nolet, 2020). In contrast, deep learning models are more attractive and effective for unstructured data, that is, image and text data in its original (raw) form prior to feature extraction (Raschka et al., 2020). Most methods for evaluating the performance of machine learning and deep learning models are based on cross-validation (Raschka, 2018). In k-fold cross-validation, a dataset is randomly shuffled and split into k folds without replacement. Then, k−1 folds are used for model training, and one fold is used for evaluation. This procedure is repeated k times so that k performance estimates are obtained. The final performance is obtained by averaging the k performance estimates. A commonly recommended value for k is 10 (Kohavi, 1995). Studies on misinformation using computational methods often rely on k-fold cross-validation to evaluate whether textual characteristics of misinformation successfully predict misinformation content (Pathack & Shrihari, 2019). The present paper evaluates both a multilayer perceptron (a deep neural network) and a random forest (a traditional decision-tree-based ensemble method) for classifying conspiracy and debunking videos from visual modalities.

RQ5: Does the performance in identifying conspiracy videos increase when we integrate textual and visual features of videos, compared to using one type alone?
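The k-fold procedure described above can be sketched with scikit-learn. The feature matrix and labels below are synthetic stand-ins (200 "videos" with 10 features each), used only to illustrate the evaluation protocol, not our actual data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in data: binary labels that depend (noisily) on feature 0.
rng = np.random.RandomState(0)
X = rng.rand(200, 10)
y = (X[:, 0] + 0.1 * rng.randn(200) > 0.5).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# 10-fold cross-validation: train on 9 folds, evaluate on the held-out fold,
# repeat 10 times, then average the per-fold precision and recall estimates.
scores = cross_validate(clf, X, y, cv=10, scoring=["precision", "recall"])
mean_precision = scores["test_precision"].mean()
mean_recall = scores["test_recall"].mean()
```

The same protocol applies unchanged to a multilayer perceptron by swapping in a different estimator.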
Data and Method
Data Collection
The main dataset we used in this paper consists of YouTube videos that are related to COVID-19 conspiracies. To sample COVID-19 conspiracy-related videos, we drew upon search terms from the literature and Google YouTube Search Trends (Au, Howard, & Bright, 2020; Pennycook, McPhetres, Zhang, Lu, & Rand, 2020). These search terms covered a variety of COVID-19 conspiracies, from themes related to geopolitics (Wuhan Virus, Bioweapon) and modern technology (5G conspiracy) to people's distrust of social elites (the Bill Gates population control conspiracy, Judy Mikovits, QAnon). We then used the YouTube API to sample the top 10 most viewed videos each day from March 1st to May 31st, 2020 (N=3668). Among them, 2695 were English videos. The YouTube API provides access to video-related information such as title, description, channel-related information, and user click data.

Textual Content Analysis
Since we are interested in comparing videos that propagate conspiracies vs debunk conspiracies, we developed two content variables to analyze the textual information (i.e., video transcripts) about each video: relevancy and attitude.
Relevancy means whether a video is about the conspiracy topic. This variable serves as a sanity check to ensure that the videos we collected through the YouTube API are about the conspiracy topics we care about.
Attitude means whether a video propagates conspiracies or debunks conspiracies.
Debunk means a video refutes,
Visual Content Analysis
Our visual content analysis consists of the following steps: 1) extract frames from each video, 2) compute various color features for each frame and then aggregate them at the video level, and 3) conduct Welch two-sample t-tests, controlled with the Benjamini-Hochberg procedure (1995), to examine whether the color statistics differ significantly between conspiracy and debunking videos. To extract the frames from each video, we used the OpenCV Python package (v. 4.4), which allows researchers to extract frames from videos at a pre-determined rate. We extracted one frame per second. Among all the 2153 COVID-19 videos, the average number of frames is 1160. Among the COVID-19 videos, the minimum video length is 5 seconds, the maximum is 36,610 seconds, and the median is 482 seconds. In total, we analyzed 2,497,789 frames. For the climate change videos, the minimum video length is 6 seconds, the maximum is 9,561 seconds, the median is 490 seconds, and the average is 1012 seconds. In total, there are 243,939 frames we analyzed. To calculate the color statistics of each frame, we used both OpenCV and the visual aesthetic GitHub repository from Peng and Jemmott (2018). We are interested in both the low-level features of color use (e.g., lighting) as well as the high-level features of color use (e.g., color variances). Using the two packages, we obtained RGB, HSV, brightness, contrast, and colorfulness statistics for each frame of a video. To compute the color statistics on the video level, we then calculated the median and variance of all the color features across all frames in a video. We also report Cohen's d, which estimates the effect size (i.e., the standardized difference of the population means).

Building classification models to identify conspiracy videos
To answer RQ4 and RQ5, which examine what features (textual and visual) can best identify conspiracy videos, we trained multiple models on our hand-labeled COVID-19 dataset. In the first type of model, we only included the textual features from transcripts (n=560). To explore different options for textual features, we built three models on different textual feature sets: a multilayer perceptron with two densely connected layers of 64 hidden units trained on word frequencies from the whole transcript, a random forest model trained with the dictionary features, and a random forest model trained with document embedding features. In the second type of model, we only included the visual features (n=407, not censored and conspiracy relevant) to train the multilayer perceptron and random forest models as before. In the third type of model, we included different combinations of textual and visual features to train the multilayer perceptron and random forest models: 1) transcript and visual features, 2) dictionary and visual features, and 3) document embedding and visual features. In addition, for each of the models above, we conducted 10-fold cross-validation to assess the performance of the model, including precision and recall. To compute the feature importance, we calculated the permutation importance of each feature to measure its weight using the Python eli5 v.0.10.1 package. To test whether models trained on different combinations of textual and visual features perform significantly differently from each other, we used the McNemar testing procedure, a statistical test for paired categorical data (such as class label predictions) from two classifiers, implemented in MLxtend v. 0.18.0 (Raschka, 2018).
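Permutation importance shuffles one feature at a time and measures the resulting drop in model performance; larger drops indicate more important features. A minimal sketch using scikit-learn's built-in implementation follows (our actual analysis used the eli5 package; the data below are synthetic, with only the first feature informative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Hypothetical stand-in for video-level features: column 0 is informative,
# the rest are noise (our real features were, e.g., brightness, hue, contrast).
rng = np.random.RandomState(42)
X = rng.rand(300, 5)
y = (X[:, 0] > 0.5).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Shuffle each feature n_repeats times and record the mean performance drop.
result = permutation_importance(clf, X, y, n_repeats=10, random_state=42)
ranking = np.argsort(result.importances_mean)[::-1]  # most important first
```

In practice, permutation importance should be computed on held-out data rather than the training set to avoid inflating the importance of features the model has overfit.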
Results
How Conspiracy and Debunking Videos Differ in their Color Use
To address RQ1 and RQ2, whether conspiracy videos use more low-key lighting and lower color variances than correction videos, we compared the average color-related feature values (the values were averaged across all pixels and frames in a given video) of 2153 videos: conspiracy (n = 1677) and correction (n = 476). We found that conspiracy videos indeed exhibit a lower color saturation than correction videos (median saturation 70.87 versus 75.86, Figure 2). While this difference is statistically significant (p = 0.02), the saturation difference between conspiracy and debunking videos is relatively subtle (Cohen's d = 0.121) and may not be visible to a human observer. For other color features such as the RGB channel values, colorfulness, contrast, hue, and brightness, there is no statistically significant difference between all the frames in our conspiracy videos versus those in the correction videos.
Figure 2. Saturation comparison between conspiracy vs debunking videos
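The Welch t-test, effect size, and Benjamini-Hochberg procedure underlying these comparisons can be illustrated as follows. The group means below merely echo the reported medians; the sample sizes, standard deviations, and extra p-values passed to the correction are invented for illustration, and the Benjamini-Hochberg helper is our own minimal implementation.

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
# Hypothetical per-video median saturation values for the two groups.
conspiracy = rng.normal(70.9, 20.0, size=300)
correction = rng.normal(75.9, 20.0, size=100)

# Welch's two-sample t-test (does not assume equal group variances).
t_stat, p_sat = stats.ttest_ind(conspiracy, correction, equal_var=False)

# Cohen's d as an effect-size estimate (standardized mean difference).
pooled_sd = np.sqrt((conspiracy.var(ddof=1) + correction.var(ddof=1)) / 2.0)
cohens_d = (correction.mean() - conspiracy.mean()) / pooled_sd

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (controls false discovery rate)."""
    p = np.asarray(p_values, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * len(p) / (np.arange(len(p)) + 1)
    # Enforce monotonicity from the largest p-value downwards.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty_like(adjusted)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

# Adjust across several color features (the other p-values are illustrative).
adjusted = benjamini_hochberg([p_sat, 0.30, 0.01])
```

Testing many color features simultaneously inflates the chance of false positives, which is why the adjusted, not raw, p-values are compared against the 0.05 threshold.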
However, notable differences were observed when we compared the frames from the first ten seconds of all conspiracy videos versus correction videos. We conducted analyses on the first 10 seconds because this span is considered the average amount of time a YouTube viewer takes to decide whether to engage with the content (McGavin, n.d.); online creators need to catch viewers' eyes as soon as possible or they will lose the audience. Table 1 presents the median values of the color features that are significant at the 0.05 level when comparing frames from the first 10 seconds of all conspiracy videos and correction videos. We found that conspiracy videos use less red, green, and blue color compared to correction videos. We also found that conspiracy videos are less bright than debunking videos and used less color variance (contrast). The effect sizes are small, and thus color use in conspiracy versus correction videos differs significantly (p < 0.05) but not substantively (Cohen's d < 0.5). Besides examining the first 10 seconds of each video, we also studied thumbnails, the picture chosen to represent the whole video and an important piece of information users rely on to judge whether to watch it. Similar to the color use in the first 10 seconds, we found that conspiracy videos used less brightness, contrast, and colorfulness compared to correction videos, and these differences are statistically significant (Table 2).

Table 1. Significant results on COVID-19 conspiracy-related videos: using first 10 seconds

Table 2. Significant results on COVID-19 conspiracy-related videos: using thumbnails
To examine whether our findings hold beyond the COVID-19 conspiracy, we also looked into videos related to conspiracy theories on a traditional controversial science topic, climate change, to answer RQ3. We repeated the same analyses as we did for the COVID-19 videos: using all frames, the first 10 seconds of frames, and the thumbnails. Similar to the findings on COVID-19 conspiracy videos, we found that climate change conspiracy videos used lower saturation, hue, and brightness. The effect size is larger compared to COVID-19 videos (e.g., Cohen's d for the median saturation is near 0.5, suggesting a medium effect size).

Table 3. Significant results on climate change conspiracy-related videos

* For results of all the visual feature variables (including non-significant ones) described in Tables 1-3, please see Supplemental Material Appendices I, II, and III.

Classifying Conspiracy Videos: Comparing Visual vs Textual vs Integrated Features
To answer RQ4 and RQ5, which aim to explore what types of features are more useful for distinguishing conspiracy videos from correction videos, we built different types of models, some using textual features only, some using visual features only, and others integrating both. We used both a neural network (multilayer perceptron) and a random forest to compare the performance (e.g., precision and recall from 10-fold cross-validation) across models trained with different types of features. Table 4 summarizes the model performance. First, we found that using textual or visual features, or an integration of the two, produces satisfactory performance for identifying conspiracy videos. For instance, our models 1, 4, and 6 achieved precision around 80% and recall around 90%. Using model 1 to classify our 1915 videos that have transcripts, we found that 89.34% are relevant to conspiracy theories, and among them, 84.46% propagated conspiracies and 15.54% debunked conspiracies. Second, in terms of whether data integration can achieve better performance, we found inconclusive evidence. For instance, taking models 3, 4, and 5 as examples, we found that although integrating textual features (e.g., emotion scores from analyzing video transcripts) and visual features (model 5) gives us the highest recall, the precision rate is the lowest compared to using only the textual (model 3) or the visual features (model 4). To further study which textual and visual features are more important for identifying conspiracy videos, we computed the feature importance of models 3, 4, and 5. Table 5(a) shows that for textual features, which are measured by applying an emotion dictionary (NRC) to all the video transcripts, emotions such as trust, fear, and anger are the most important features for identifying conspiracy videos. This echoes literature on conspiracy theories that stresses the use of trust and fear appeals as key features of conspiracy theories (Aupers, 2012).
For visual features, we found that hue, brightness, and color contrast are the most important visual features for identifying conspiracy videos. When analyzing the relationships among the visual features such as color hue and brightness (see Supplemental Material Appendix IV), we did not find a notable correlation among the top features, suggesting these top features encode important and non-redundant information for identifying conspiracies. This also aligns with the literature on visual framing in horror films (Rasheed et al., 2005). Table 5(b) further shows that when we used both textual and visual features to identify conspiracies (RQ5), textual features and visual features both appear among the top important features, suggesting that they are both critical in identifying conspiracies.
Table 4. Performance comparison in identifying conspiracy videos

Table 5(a). Feature importance from models 3 and 4
Table 5(b). Feature importance from model 5: integrating textual and visual features
There could be several possible reasons why integration models like model 5 do not necessarily increase model performance. We speculate that the combination of visual and textual features is redundant and thus an integrated approach does not improve model performance. To investigate this claim, we compared models 3 and 4 via McNemar's test under the null hypothesis that there is no difference in the performance of the two predictive models. Given a resulting p-value of 0.26, we fail to reject the null at a significance level of 0.05. This may suggest that the model trained on the textual features and the model trained on the visual features behave similarly, and that a classification model for identifying conspiracies does not prefer one type of data source (textual vs visual) over the other. This also aligns with Table 5(b), where we see that both the textual and the visual features appear in the list of top important features for identifying conspiracy videos. Because both the textual features and the visual features result in good model prediction performance when used individually, the low precision score of the integration model (model 5) could be due to noise in our small dataset. An alternative explanation is that a more sophisticated integration method, compared to our current approach of using simple combinations of emotion + visual features or word embedding + visual features, is required for boosting the performance of a model. We discuss the implications in the discussion section.
Discussion and Future Work
How Conspiracy Videos Manipulate Color and Brightness
Our paper advances the understanding of a core topic in communication -- visual framing in conspiracy theories. Conspiracy theories and correction messages co-exist and compete on social media. Despite many studies that examine the textual features of conspiracies and how conspiracies diffuse on social media (Tingley & Wagner, 2017; Wood, 2018), little is understood about the visual features of conspiracies. Even less is understood about how visual features differ between conspiracies and correction messages. Studying the visual framing of conspiracy is vital for advancing our understanding of conspiracy and misinformation in general, as visuals serve as critical cues that influence people's emotions and behavior (Vraga et al., 2020). In this paper, we focus on one aspect of visual framing, the use of visual modalities, to investigate how conspiracy videos manipulate visual modalities to communicate complex science issues, from the recent COVID-19 conspiracies to the traditionally controversial climate change conspiracies. There are two noteworthy findings from our computational analysis of the images from YouTube videos propagating and debunking conspiracies. First, in terms of the COVID-19 conspiracy, we found that conspiracy videos use lower color variance and brightness compared to debunking videos in general. However, the effect size of this difference is small. This significant but not substantive difference poses challenges for platforms and researchers seeking to use visuals to identify misinformation, as the color and brightness differences are subtle. It also poses challenges for science communicators who teach the public strategies for distinguishing conspiracy from truth. Despite the small visual difference between COVID-19 conspiracy and debunking videos, the difference is much larger and appreciable for climate change conspiracies.
The difference in effect size when we compare different science topics suggests that visual framing could be contingent on the topic and thus, algorithms for identifying conspiracies through visuals need to be tailored to the topic. Through revealing the complexity of visual use in conspiracy and debunking videos across science domains, our paper contributes to the literature on visual framing, conspiracy, and misinformation correction on social media. It will be fruitful for future research to examine other visual aspects beyond color and brightness, and to also compare science topics with other topics such as health and politics. Interestingly, the color and brightness differences between conspiracy and debunking videos were only substantive and statistically significant for the video thumbnails and the first 10-second frames, not the full-length videos. Compared to correction video thumbnails, conspiracy video thumbnails shared more characteristics with horror films, and this was also true for the first 10-second frames of the videos. There could be two reasons. First, examining all frames could bring more noise to the data. For instance, some videos have a huge variation in their use of color features, and even though we used the median color value to aggregate across the frames of a video, this still cannot capture the noise. The second reason, which is more likely, is that online creators deliberately orchestrate thumbnails and the earlier frames of a video to catch the attention of audiences (Zannettou, Chatzis, Papadamou, & Sirivianos, 2018). Since the information provided by social media platforms is almost infinite, creating content that grabs attention is crucial on social media platforms (Webster, 2014). The thumbnails and first 10 seconds of frames of YouTube videos function as clickbait (Zannettou, Sirivianos, Blackburn, & Kourtellis, 2019) compared to the rest of the frames in the videos, and therefore are more likely to use distinguishable textual and visual features to be "watched" by YouTube audiences. Color and brightness could be one effective lever conspiracy video creators manipulate to attract their audience.
Beyond Classification -- Gathering Insights from Small and Large Data Using Machine Learning
From an image analysis perspective, our article offers several lessons for how machine learning can be used to extract insights from a dataset. In recent years, machine learning has seen widespread adoption in many scientific fields and disciplines. Most commonly, machine learning is used to replace hand-designed predictive rules for classification and simpler forms of regression models. Machine learning promises to make data modeling less laborious. Instead of spending countless human hours gaining insights from the data as to what constitutes highly accurate predictive rules, we employ algorithms that can automatically learn these rules from data. However, thinking beyond designing accurate classification systems from data, we can also use machine learning classifiers to gain insights from data, as we demonstrated in this paper. In particular, our paper demonstrates how we can use machine learning to identify which combination of features can play a dominant role in distinguishing between different types of videos. The inspection of individual visual features, such as the saturation levels shown in Figure 2, is insufficient for distinguishing between conspiracy and debunking videos reliably. To get a better understanding, one might consider plotting multiple features at once; however, increasing the number of features to look at simultaneously increases the cognitive complexity and quickly becomes infeasible for humans. Making sense out of datasets with multiple features is where machine learning can be utilized. After all, the objective of supervised machine learning algorithms employed in this study is to utilize the features in a dataset to maximize predictive performance. Hence, after training a highly accurate classifier, we can inspect how it utilizes the dataset’s feature information to conclude about the important characteristics. 
For instance, we found that random forest models can use a combination of multiple visual features to make accurate predictions with a precision of approximately 77% (Table 4), where the three most important features (hue, brightness, and contrast) are approximately equally important to the model (see Table 5). Noting that hue measures the color content and is effectively uncorrelated with the brightness and contrast properties across the frames (see Supplemental Material Appendix IV) provides evidence that both color (RQ1) and brightness and low-key lighting (RQ2) are properties that can be used together (RQ4) to distinguish between conspiracy and correction videos.

When using machine learning for image analysis, two sets of methods may be considered. The first set consists of traditional machine learning methods designed for tabular (also known as "structured") datasets; examples include random forests and multilayer perceptrons, which we employed in this study. The second set consists of deep neural network architectures designed to work directly with images or pixel inputs, known as "unstructured" data; examples include convolutional neural networks (Simonyan & Zisserman, 2015) and vision transformers (Khan et al., 2021). Making unstructured data suitable for traditional machine learning methods requires converting a video or image dataset into a tabular format. This is usually done by extracting feature information via additional preprocessing steps, for instance, calculating summary statistics of the RGB and HSV color channels and other summary features such as colorfulness (Hasler & Süsstrunk, 2003). In contrast, convolutional neural networks and vision transformers take the original images or video frames as input and perform feature extraction implicitly during model training. While this implicit feature extraction is attractive from a researcher's perspective, because a researcher does not have to spend time and resources extracting summary information from images, convolutional neural networks and vision transformers require substantially larger amounts of data than traditional machine learning models designed for tabular data (Figure 5).
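A minimal sketch of this tabular conversion is shown below, assuming each frame is already available as an RGB array; the function name and exact feature set are illustrative, not the paper's code. The colorfulness metric follows Hasler & Süsstrunk (2003), which combines the standard deviations and means of the opponent color channels:

```python
import numpy as np

def frame_features(frame):
    """Summarize an RGB frame (H x W x 3 uint8 array) into tabular features."""
    r, g, b = (frame[..., i].astype(float) for i in range(3))
    feats = {
        "r_mean": r.mean(), "g_mean": g.mean(), "b_mean": b.mean(),
        "r_var": r.var(), "g_var": g.var(), "b_var": b.var(),
    }
    # Colorfulness per Hasler & Suesstrunk (2003): statistics of the
    # opponent channels rg (red-green) and yb (yellow-blue).
    rg = r - g
    yb = 0.5 * (r + g) - b
    feats["colorfulness"] = (
        np.hypot(rg.std(), yb.std()) + 0.3 * np.hypot(rg.mean(), yb.mean())
    )
    return feats

# A uniformly gray frame has zero channel variance and zero colorfulness.
gray = np.full((4, 4, 3), 128, dtype=np.uint8)
print(frame_features(gray)["colorfulness"])  # 0.0
```

In practice, frames would be decoded from video with a library such as OpenCV, and HSV channel statistics could be added analogously after a color-space conversion.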
In this project, we utilized traditional machine learning algorithms because the amount of labeled data (videos annotated as conspiracy or debunking) was limited to a few thousand videos. Based on other work concerned with training convolutional networks for video classification, we estimated that we would require at least hundreds of thousands of videos to train an accurate classifier (Karpathy et al., 2014). While traditional machine learning algorithms can also be applied directly to unstructured data (such as image frames extracted from videos), the sizeable feature-to-sample ratio can be detrimental to performance due to the risk of overfitting. Hence, we proposed several manual feature extraction techniques, such as computing the color channel averages and variances as well as other color and brightness values. As a beneficial side effect, restricting the problem to a few well-defined features grounded in the domain literature (e.g., communication) also simplifies analyzing which features the model uses for making predictions. While it is easier to interpret the feature contributions of traditional machine learning methods trained on structured data, it is worth noting that several methods have recently been developed to gain insights into the features deep neural networks derive from unstructured data as well (Zhang & Zhu, 2018).

Figure 5. Illustration of the complexity of the hyperparameter tuning tasks associated with different models (top) and recommended dataset sizes (bottom) when working with different models suited for structured and/or unstructured data.
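The drawback of a large feature-to-sample ratio can be illustrated with synthetic data: the same signal that a classifier recovers easily from a handful of summary features is effectively drowned out when buried among thousands of raw-input dimensions. The sample sizes, feature counts, and effect size below are arbitrary choices for the illustration, not the study's numbers:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 200  # a small labeled sample, as with annotated videos
y = rng.integers(0, 2, size=n)

# "Raw input" setting: the signal sits in one of 5,000 features,
# so the feature-to-sample ratio is 25:1.
X_wide = rng.normal(size=(n, 5000))
X_wide[:, 0] += 1.5 * y

# "Summary feature" setting: the same signal among only five features.
X_narrow = rng.normal(size=(n, 5))
X_narrow[:, 0] += 1.5 * y

wide_acc = cross_val_score(RandomForestClassifier(random_state=0), X_wide, y, cv=5).mean()
narrow_acc = cross_val_score(RandomForestClassifier(random_state=0), X_narrow, y, cv=5).mean()
print(f"wide: {wide_acc:.2f}, narrow: {narrow_acc:.2f}")
```

Cross-validated accuracy in the narrow setting clearly exceeds the wide setting, echoing the rationale for extracting a few well-defined color and brightness features rather than feeding raw frames to a traditional classifier.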
Our study shows that machine learning can yield concrete insights even when a dataset is small. Moreover, to approach a study in a time- and cost-effective manner, we recommend beginning with established building blocks, such as the methods used in our computational analysis framework, before considering more complex and data-hungry deep learning models. Beyond yielding insights into the data, these methods can serve as valuable performance baselines when collecting additional labeled data and experimenting with deep neural network architectures such as convolutional neural networks in future work. While deep neural networks trained on large datasets have predictive performance advantages over traditional methods, they come with increased hardware and tuning requirements (Sze, Chen, Yang, & Emer, 2017). It is also unclear whether intuitive, human-interpretable information about color and brightness for distinguishing different video categories can be extracted from deep neural networks trained on unstructured data. Future work may explore whether additional insights (for example, information about important image locations) can be gained via gradient-based localization methods by extending this study to large video datasets. Lastly, a new research area called self-supervised learning has recently emerged, which allows researchers to train large deep learning models even when labeled data is scarce (Jing & Tian, 2020), opening new opportunities for applications of deep learning to studying visual manipulation in the digital era.

References
Afifah, F., Nasrin, S., & Mukit, A. (2018). Vehicle speed estimation using image processing. Journal of Advanced Research in Applied Mechanics, 48(1), 9-16.
Ahmed, W., Vidal-Alaball, J., Downing, J., & López Seguí, F. (2020). COVID-19 and the 5G conspiracy theory: Social network analysis of Twitter data. Journal of Medical Internet Research, (5), e19458. https://doi.org/10.2196/19458
Allgaier, J. (2019). Science and environmental communication on YouTube: Strategically distorted communications in online videos on climate change and climate engineering. Frontiers in Communication, 36. https://doi.org/10.3389/fcomm.2019.00036
Au, H., Howard, P. N., & Bright, J. (2020, May 18). Social media misinformation on German Intelligence Reports. Retrieved from https://comprop.oii.ox.ac.uk/wp-content/uploads/sites/93/2020/06/ComProp-Coronavirus-Misinformation-Weekly-Briefing-18-05-2020.pdf
Aupers, S. (2012). 'Trust no one': Modernization, paranoia and conspiracy culture. European Journal of Communication, (1), 22–34. https://doi.org/10.1177/0267323111433566
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289-300.
Birnbaum, M. L., Norel, R., Van Meter, A., Ali, A. F., Arenare, E., Eyigoz, E., Agurto, C., Germano, N., Kane, J. M., & Cecchi, G. A. (2020). Identifying signals associated with psychiatric illness utilizing language and images posted to Facebook. Npj Schizophrenia, (1), 1-38. https://doi.org/10.1038/s41537-020-00125-0
Brantner, C., Lobinger, K., & Wetzstein, I. (2011). Effects of visual framing on emotional responses and evaluations of news stories about the Gaza conflict 2009. Journalism & Mass Communication Quarterly, (3), 523–540. https://doi.org/10.1177/107769901108800304
Brennen, J. S., Simon, F., Howard, P. N., & Nielsen, R. K. (2020). Types, sources, and claims of COVID-19 misinformation. Reuters Institute, 7, 3-1.
Bucy, E. P., & Joo, J. (2021). Editors' introduction: Visual politics, grand collaborative programs, and the opportunity to think big. The International Journal of Press/Politics, (1), 5–21. https://doi.org/10.1177/1940161220970361
Chong, D., & Druckman, J. N. (2007). Framing theory. Annual Review of Political Science, (1), 103–126. https://doi.org/10.1146/annurev.polisci.10.072805.103054
Culjak, I., Abram, D., Pribanic, T., Dzapo, H., & Cifrek, M. (2012, May). A brief introduction to OpenCV. In 2012 Proceedings of the 35th International Convention MIPRO (pp. 1725-1730). IEEE.
Derry, S. J., Pea, R. D., Barron, B., Engle, R. A., Erickson, F., Goldman, R., Hall, R., Koschmann, T., Lemke, J. L., Sherin, M. G., & Sherin, B. L. (2010). Conducting video research in the learning sciences: Guidance on selection, analysis, technology, and ethics. Journal of the Learning Sciences, (1), 3–53. https://doi.org/10.1080/10508400903452884
Douglas, K. M., & Sutton, R. M. (2015). Climate change: Why the conspiracy theories are dangerous. Bulletin of the Atomic Scientists, (2), 98–106. https://doi.org/10.1177/0096340215571908
Filimonov, K., Russmann, U., & Svensson, J. (2016). Picturing the party: Instagram and party campaigning in the 2014 Swedish elections. Social Media + Society, (3), 1-11. https://doi.org/10.1177/2056305116662179
Garrett, R. K., Nisbet, E. C., & Lynch, E. K. (2013). Undermining the corrective effects of media-based political fact checking? The role of contextual cues and naïve theory. Journal of Communication, (4), 617–637. https://doi.org/10.1111/jcom.12038
Geise, S. (2017). Visual framing. In P. Rössler (Ed.), The International Encyclopedia of Media Effects (pp. 1-12). NJ: John Wiley & Sons.
Halpern, D., Valenzuela, S., Katz, J., & Miranda, J. P. (2019). From belief in conspiracy theories to trust in others: Which factors influence exposure, believing and sharing fake news. In G. Meiselwitz (Ed.), Social Computing and Social Media. Design, Human Behavior and Analytics (Vol. 11578, pp. 217–232). NY: Springer International Publishing. https://doi.org/10.1007/978-3-030-21902-4_16
Hasler, D., & Suesstrunk, S. E. (2003, June). Measuring colorfulness in natural images. Proceedings of Human Vision and Electronic Imaging VIII (pp. 87-95). https://doi.org/10.1117/12.477378
Hollway, W., & Jefferson, T. (2005). Panic and perjury: A psychosocial exploration of agency. British Journal of Social Psychology, (2), 147–163. https://doi.org/10.1348/014466604X18983
Hunt, R. W. G. (1989). Measuring Colour. NJ: John Wiley & Sons.
Hussain, M. N., Tokdemir, S., Agarwal, N., & Al-Khateeb, S. (2018, August). Analyzing disinformation and crowd manipulation tactics on YouTube. In 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Barcelona. https://doi.org/10.1109/ASONAM.2018.8508766
Jing, L., & Tian, Y. (2020). Self-supervised visual feature learning with deep neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2020.2992393
Joblove, G. H., & Greenberg, D. (1978). Color spaces for computer graphics. Proceedings of the 5th Annual Conference on Computer Graphics and Interactive Techniques - SIGGRAPH '78 (pp. 20–25). https://doi.org/10.1145/800248.807362
Joo, J., & Steinert-Threlkeld, Z. C. (2018). Image as data: Automated visual content analysis for political science. arXiv preprint arXiv:1810.015
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014). Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1725-1732).
Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., & Shah, M. (2021). Transformers in vision: A survey. arXiv:2101.01169 [cs]. http://arxiv.org/abs/2101.01169
Kienhues, D., Jucks, R., & Bromme, R. (2020). Sealing the gateways for post-truthism: Reestablishing the epistemic authority of science. Educational Psychologist, (3), 144–154. https://doi.org/10.1080/00461520.2020.1784012
King, A. J., & Lazard, A. J. (2020). Advancing visual health communication research to improve infodemic response. Health Communication, (14), 1723–1728. https://doi.org/10.1080/10410236.2020.1838094
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI, 14(2), 1137-1145.
Kou, Y., Gui, X., Chen, Y., & Pine, K. (2017, December). Conspiracy talk on social media: Collective sensemaking during a public health crisis. Proceedings of the ACM on Human-Computer Interaction, (CSCW), 1–21. https://doi.org/10.1145/3134696
Kress, G. R., & Van Leeuwen, T. (1996). Reading Images: The Grammar of Visual Design. NY: Psychology Press.
Le, Q., & Mikolov, T. (2014, June). Distributed representations of sentences and documents. In International Conference on Machine Learning (pp. 1188-1196). PMLR.
McGavin, R. (n.d.). How to use the first 10 seconds of your video to hook your audience and increase views.
PLoS ONE, (7), e40333. https://doi.org/10.1371/journal.pone.0040333
Mian, A., & Khan, S. (2020). Coronavirus: The spread of misinformation. BMC Medicine, (1), 89. https://doi.org/10.1186/s12916-020-01556-3
Miller, J. M. (2020). Psychological, political, and situational factors combine to boost COVID-19 conspiracy theory beliefs. Canadian Journal of Political Science, 1–8. https://doi.org/10.1017/S000842392000058X
Mohammad, S. M., & Turney, P. D. (2013). NRC emotion lexicon. National Research Council, Canada.
Moore, A. (2018). Conspiracies, conspiracy theories and democracy. Political Studies Review, (1), 2–12. https://doi.org/10.1111/1478-9302.12102
Neville-Shepard, R. (2018). Paranoid style and subtextual form in modern conspiracy rhetoric. Southern Communication Journal, (2), 119–132. https://doi.org/10.1080/1041794X.2017.1423106
Oxman, E. (2010). Sensing the image: Roland Barthes and the affect of the visual. SubStance, (2), 71–90. https://doi.org/10.1353/sub.0.0083
Peng, Y. (2018). Same candidates, different faces: Uncovering media bias in visual portrayals of presidential candidates with computer vision. Journal of Communication, 68(5), 920–941. https://doi.org/10.1093/joc/jqy041
Peng, Y. (2021). What makes politicians' Instagram posts popular? Analyzing social media strategies of candidates and office holders with computer vision. The International Journal of Press/Politics, (1), 143–166. https://doi.org/10.1177/1940161220964769
Peng, Y., & Jemmott, J. B. (2018). Feast for the eyes: Effects of food perceptions and computer vision features on food photo popularity. International Journal of Communication, 12, 313-336.
Pennycook, G., McPhetres, J., Zhang, Y., Lu, J. G., & Rand, D. G. (2020). Fighting COVID-19 misinformation on social media: Experimental evidence for a scalable accuracy-nudge intervention. Psychological Science, 31(7), 770–780. https://doi.org/10.1177/0956797620939054
Pinedo, I. C. (2004). Postmodern elements of the contemporary horror film. The Horror Film, 85-117.
Raihani, N. J., & Bell, V. (2019). An evolutionary perspective on paranoia. Nature Human Behaviour, (2), 114–121. https://doi.org/10.1038/s41562-018-0495-0
Raschka, S. (2018). Model evaluation, model selection, and algorithm selection in machine learning. arXiv preprint arXiv:1811.12808.
Raschka, S., Patterson, J., & Nolet, C. (2020). Machine learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence. Information, (4), 193. https://doi.org/10.3390/info11040193
Rasheed, Z., Sheikh, Y., & Shah, M. (2005). On the use of computable features for film classification. IEEE Transactions on Circuits and Systems for Video Technology, (1), 52–64. https://doi.org/10.1109/TCSVT.2004.839993
Rodriguez, L., & Dimitrova, D. V. (2011). The levels of visual framing. Journal of Visual Literacy, (1), 48–65. https://doi.org/10.1080/23796529.2011.11674684
Scheufele, D. A., & Krause, N. M. (2019). Science audiences, misinformation, and fake news. Proceedings of the National Academy of Sciences, (16), 7662–7669. https://doi.org/10.1073/pnas.1805871115
Shahsavari, S., Holur, P., Wang, T., Tangherlini, T. R., & Roychowdhury, V. (2020). Conspiracy in the time of corona: Automatic detection of emerging COVID-19 conspiracy theories in social media and the news. Journal of Computational Social Science, (2), 279–317. https://doi.org/10.1007/s42001-020-00086-5
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Stemmet, C. (2012). Trust no truth: An analysis of the visual translation styles in the conspiracy film (Doctoral dissertation, University of Pretoria).
Sze, V., Chen, Y. H., Yang, T. J., & Emer, J. S. (2017). Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE, 105(12), 2295-2329.
Tingley, D., & Wagner, G. (2017). Solar geoengineering and the chemtrails conspiracy on social media. Palgrave Communications, (1), 12. https://doi.org/10.1057/s41599-017-0014-3
Valdez, P., & Mehrabian, A. (1994). Effects of color on emotions. Journal of Experimental Psychology: General, (4), 394–409. https://doi.org/10.1037/0096-3445.123.4.394
van Prooijen, J.-W., & Douglas, K. M. (2017). Conspiracy theories as part of history: The role of societal crisis situations. Memory Studies, (3), 323–333. https://doi.org/10.1177/1750698017701615
Vraga, E. K., Kim, S. C., & Cook, J. (2019). Testing logic-based and humor-based corrections for science, health, and political misinformation on social media. Journal of Broadcasting & Electronic Media, (3), 393–414. https://doi.org/10.1080/08838151.2019.1653102
Vraga, E. K., Kim, S. C., Cook, J., & Bode, L. (2020). Testing the effectiveness of correction placement and type on Instagram. The International Journal of Press/Politics, (4), 632–652. https://doi.org/10.1177/1940161220919082
Webster, J. G. (2014). The Marketplace of Attention: How Audiences Take Shape in a Digital Age. Cambridge: MIT Press.
Wood, M. J. (2018). Propagating and debunking conspiracy theories on Twitter during the 2015–2016 Zika virus outbreak. Cyberpsychology, Behavior, and Social Networking, (8), 485–490. https://doi.org/10.1089/cyber.2017.0669
Zannettou, S., Chatzis, S., Papadamou, K., & Sirivianos, M. (2018, May). The good, the bad and the bait: Detecting and characterizing clickbait on YouTube. 63–69. https://doi.org/10.1109/SPW.2018.00018
Zannettou, S., Sirivianos, M., Blackburn, J., & Kourtellis, N. (2019). The Web of false information: Rumors, fake news, hoaxes, clickbait, and various other shenanigans. Journal of Data and Information Quality, (3), 1–37. https://doi.org/10.1145/3309699
Zhang, Q. S., & Zhu, S. C. (2018). Visual interpretability for deep learning: A survey. Frontiers of Information Technology & Electronic Engineering, 19(1), 27-39.