Automatic Monitoring Social Dynamics During Big Incidences: A Case Study of COVID-19 in Bangladesh

Abstract

Newspapers are trustworthy media where people get more reliable and credible information than from other sources. Social media, on the other hand, often spread rumors and misleading news to attract traffic and attention. Careful characterization, evaluation, and interpretation of newspaper data can provide insight into intriguing and emotive social issues, which helps in monitoring any big social incidence. This study analyzed a large set of spatio-temporal Bangladeshi newspaper data related to the COVID-19 pandemic. The methodology included volume analysis, topic analysis, automated classification, and sentiment analysis of news articles to gain insight into the COVID-19 pandemic in different sectors and regions of Bangladesh over a period of time. This analysis will help the government and other organizations identify the challenges that have arisen in society due to the pandemic, decide what steps should be taken immediately and in the post-pandemic period, and work out how the government and its allies can come together to address future crises with these problems in mind.


Fahim Shahriar (Comilla University, Cumilla, Bangladesh) and Md Abul Bashar (Queensland University of Technology, Brisbane, Australia)


Keywords:

Topic Analysis · LDA Topic Model · Dynamic Topic Modeling · Time Series Decomposition · Bengali Text Dataset · Newspaper · Text Classification · RNN · LSTM · Sentiment Analysis · CNN-BiLSTM

arXiv [cs.CY]

The outbreak of COVID-19 has brought serious health and economic consequences to society. It triggered one of the largest recessions in the world. Travel and currency companies lost billions of dollars, global stock markets plummeted, schools were closed, and health care systems were exhausted. Mental and social problems arose as people started to worry about infection, losing friends and family, losing their jobs, or isolation.

Bangladesh has not been spared by this terrible virus. The virus has had major impacts on people's lives and significantly degraded their quality of life. There were significant numbers of infections and deaths. Hospitals did not have adequate treatment facilities, including doctors, beds, and emergency supplies. Besides the health crisis, people have suffered enormous economic losses. Many people lost their jobs; companies lost revenue, and many of them went bankrupt. The most affected were day laborers and low-income workers. The lockdown during the pandemic suppressed their income. Many workers starved because their livelihood was cut off. Working people took to the streets in search of a livelihood and protested for relief. Seeing their plight, many people, as well as the government, came forward to help them. Because of the lockdown, the international transport system was shut down, which stopped imports and exports. As a result, the country's industry suffered badly. Objective monitoring and analysis of social dynamics during such a big incident can help the government and other authorities decide and take initiatives where required. This research proposes utilizing articles published in newspapers to objectively monitor and analyze social dynamics during a big incidence, such as the COVID-19 pandemic.

Newspapers are one of the most popular mass media in our daily life. They provide information on the country's financial, political, social, environmental, and other affairs. Whether it is a public campaign, an emergency, or a provocation, newspapers are a great resource for keeping track of internal and external events and stories. This mass medium generally provides authentic information, whereas social media such as Facebook and Twitter often spread rumors and cannot be relied upon for authentic news. Effective classification, analysis, and interpretation of newspaper data can provide a deep understanding of any big incident in a society.

In this research, we analyzed a large spatio-temporal dataset of Bangladeshi daily newspapers related to COVID-19. The approach incorporated volume analysis, topic analysis, automatic classification of news articles, and sentiment analysis to better understand the COVID-19 pandemic in Bangladesh's divisions and districts over time. The experimental results and analysis give an objective insight into the COVID-19 pandemic in Bangladesh that will benefit the government and other authorities in distributing resources. In particular, this paper shows how to utilize automatic techniques for monitoring social dynamics in big incidents such as a pandemic, natural disaster, or social unrest.

This research makes the following main contributions. (1) It collects, manually classifies, and publishes a large collection of COVID-19 related Bangladeshi news articles in Bengali and English. (2) It investigates the topics discussed during the COVID-19 pandemic in Bangladesh and how they have changed over time using manual and automatic techniques. (3) It designs a CNN-BiLSTM architecture for analyzing sentiment in Bengali text. (4) It analyzes COVID-19 related sentiments in the community over time and space. (5) It automatically categorizes documents into classes of observation interest for monitoring social interests.

The rest of the paper is organized as follows: Section 2 discusses related work, Section 3 describes the methodology and data collection, Section 4 presents experimental results, and Section 5 concludes the paper.

In this section, we discuss related work by other researchers. We divide it into four parts: Static Topic Modeling, Dynamic Topic Modeling, Sentiment Analysis, and Text Classification.

Topic modeling is a process of discovering hidden topics in a collection of texts (Bashar et al., 2020a; Balasubramaniam et al., 2020). It can be considered a statistical representation of topics obtained through text mining. One of the most popular topic modeling techniques, Latent Dirichlet Allocation (LDA) (Blei et al., 2003; Bashar et al., 2020a), discovers topics based on word frequency in a set of documents. LDA is very useful for finding a reasonably accurate mixture of topics within a given document.

Topic modeling has been well studied for English text mining. For instance, Zhao et al. (2011) used unsupervised topic modeling to compare the content of Twitter with the traditional news medium The New York Times. They used the Twitter-LDA model to find topics from a representative sample of the entire Twitter and then used text mining techniques to compare these Twitter topics with topics from The New York Times, taking into account the topic category and type. Wang and Blei (2011) developed an algorithm to recommend scientific articles to users in online communities. Their method, collaborative topic modeling, combines the advantages of traditional collaborative filtering and probabilistic topic modeling. Wayasti et al. (2018) applied Latent Dirichlet Allocation to extract topics from ride-hailing customers' posts on Twitter. They tried 40 parameter combinations of LDA to obtain the best combination of topics. According to the perplexity value, the customers discussed 9 topics in their posts, each described by a set of keywords. Tong and Zhang (2016) presented two experiments that build topic models on Wikipedia articles and Twitter users' tweets.

However, unlike English, topic modeling has not been well studied for Bengali text mining. Das and Bandyopadhyay (2010b) performed topic-wise opinion summarization from Bengali text. They applied K-Means clustering and a document-level theme relational graph representation. However, they did not use any topic modeling technique, such as LDA. Rakshit et al. (2015) applied a multi-class SVM classifier for analyzing Bengali poetry and poet relations. They performed a subject-wise classification of poems into predetermined categories. Hasan et al. (2019) compared the performance of the LDA and LDA2vec topic models on Bengali newspapers. Al Helal and Mouhoub (2018) used LDA for detecting the primary topics from a Bengali news corpus. However, they did not directly apply LDA to the Bengali text. Instead, they translated the Bengali text into English and then applied LDA to detect the topics. Rahman et al. (2019) used lexical analysis for sentence-wise topic modeling. Their topic modeling was based on sentiment analysis. None of the existing works used Bengali text topic modeling for monitoring a pandemic or a major event.

In addition to English and Bengali, topic modeling has also been studied in various other languages. De Santis et al. (2020) presented a system that uses NLP pipelines, a theoretical framework for content aging to determine the qualitative parameters of tweets, co-occurrence analysis to build topic maps, and graph partitioning to identify topics related to posts from Italian Twitter users. Han et al. (2020) extracted topics related to COVID-19 from a Sina Weibo (a Chinese microblogging website) text dataset using the LDA topic model.

The dynamic topic model is a generative model that can be used to analyze changes in a document collection over time (Bashar et al., 2020a). There are many studies on dynamic topic modeling for the English language. For example, AlSumait et al. (2008) showed that the LDA model can be extended to an online version by incrementally updating the current model with new data, and that the resulting model can capture dynamic changes in the topics. Dieng et al. (2019) evaluated D-ETM on three datasets and reported the word probabilities of eight different topics that D-ETM learned over time. Nguyen et al. (2020) discovered latent topics in the financial reports of listed companies in the United States and studied the evolution of the discovered topics through dynamic topic modeling. Marjanen et al. (2020) discussed the role of humanistic interpretation in analyzing discourse dynamics through topic models of historical newspapers. Bashar et al. (2020a) extracted five COVID-19 related topics from a Twitter dataset through LDA topic modeling and showed how the extracted topics changed over time. Nevertheless, there is so far no research on dynamic topic modeling for the Bengali language. In this study, we examine the evolution of the extracted COVID-19 related topics over time using dynamic topic modeling.

Text classification, also known as text labeling or text categorization, is the task of categorizing text into organized groups (Bashar et al., 2020b; Bashar and Nayak, 2020; Bashar et al., 2018). Using NLP, classifiers can automatically analyze text and then assign a set of predefined labels or categories based on its content.

Many researchers have worked on text classification in English. For example, Patil and Pawar (2012) used the Naive Bayes algorithm to classify website content. They divided the website content into ten categories, and the average accuracy over the ten categories was almost 80%. Bijalwan et al. (2014) used K-Nearest Neighbors, Naive Bayes, and Term-gram to classify text and showed that the accuracy of K-Nearest Neighbors was better than that of Naive Bayes and Term-gram. Tam et al. (2002) showed that K-Nearest Neighbors was superior to NNet and Naive Bayes for English documents. Pawar and Gawande (2012) showed that the performance of Support Vector Machines is far superior to that of Decision Trees, Naive Bayes, K-Nearest Neighbors, Rocchio's algorithm, and backpropagation networks. Liu et al. (2010) showed that Support Vector Machines perform better than K-Nearest Neighbors and Naive Bayes.

In addition to English text classification, some researchers have also classified Bengali text. For example, Mandal and Sen (2014) applied four supervised learning methods (Naive Bayes, K-Nearest Neighbors, Decision Tree, and Support Vector Machine) to labeled web documents. They classified the documents into five categories: Business, Sports, Health, Technology, and Education. Chy et al. (2014) applied a Naive Bayes classifier to categorize Bengali news. Pal et al. (2015) described a Naive Bayes classifier for Bengali sentence classification. They used over 1747 sentences in their experiment and obtained an accuracy of 84%. Kabir et al. (2015) used a Stochastic Gradient Descent (SGD) classifier to categorize Bengali documents. Eshan and Hasan (2017) created an application that identifies abusive texts in Bengali. They applied Naive Bayes, Random Forest, and Support Vector Machine (SVM) with Radial Basis Function (RBF), linear, polynomial, and sigmoid kernels to classify the texts and compared the results. Islam et al. (2017) applied SVM, Naive Bayes, and Stochastic Gradient Descent (SGD) to classify Bengali documents and compared the results of those classifiers. However, none of the existing works used Bengali text classification for monitoring a pandemic or a major event.

Sentiment analysis refers to computationally identifying and categorizing opinions expressed in a piece of text. It is successfully used in business, where companies track online conversations to identify social sentiment toward their brand, product, or service.

A lot of research has been done on sentiment analysis for the English language. For example, Cui et al. (2006) analyzed about 100,000 product reviews from various websites. They divided the reviews into two main categories: positive and negative. Jagtap and Dhotre (2014) applied a Support Vector Machine and a Hidden Markov Model and reported that the hybrid classification model is well suited for extracting teacher feedback and evaluating sentiment. Alm et al. (2005) grouped seven emotion words into three polarity categories (positive emotion, negative emotion, and neutral) and reached 63% accuracy using the Winnow method with parameter tuning. For extracting Twitter sentiment, Agarwal et al. (2011) applied a unigram model, a tree model, and a feature-based model. Bashar et al. (2020a) used a Convolutional Neural Network to extract sentiments related to COVID-19 from a Twitter dataset.

Some research has applied sentiment analysis to Bengali text. For instance, Das and Bandyopadhyay (2010a) classified emotions into six categories: Happy, Sad, Anger, Disgust, Fear, and Surprise. Chowdhury and Chowdhury (2014) performed sentiment analysis on Bangla microblog posts. They applied a semi-supervised bootstrapping method utilizing SVM and Maximum Entropy. Hasan et al. (2014) proposed a strategy to identify sentiments in Bengali texts by Contextual Valency Analysis. They employed a POS tagger in their approach. Hassan et al. (2016) used recurrent neural networks to analyze sentiment in Bangla and Romanized Bangla texts. In their experiments, they used the Bangla and Romanized Bangla Text (BRBT) dataset. For sentiment analysis of Bangla microblogs, Asimuzzaman et al. (2017) used an Adaptive Neuro-Fuzzy Inference System to predict polarity and used fuzzy rules to express semantic rules of speech. Mahtab et al. (2018) designed a model for sentiment analysis on Bangladesh cricket news. They applied TF-IDF and SVM (Support Vector Machine) in their model and obtained 64.596% accuracy. Tripto and Ali (2018) performed sentiment analysis on YouTube comments. They built a deep learning model that classifies a Bengali text into either three or five sentiment classes. Tabassum and Khan (2019) used the Random Forest algorithm to classify sentiments in Bengali texts. Tuhin et al. (2019) applied Naive Bayes and a topic modeling approach to design an automated system for sentiment analysis of Bengali text. Their system classifies emotions into six categories: happy, sad, tender, excited, angry, and scared. However, none of the existing works used Bengali text sentiment analysis for monitoring a pandemic or a major event.

The pandemic has changed society and the country by a significant margin. The whole face of the country has changed completely. Significant sectors of the nation, such as the economic, social, and political sectors, have been affected massively. The education system has been hit particularly hard. This research aims to automatically analyze the daily newspapers in Bangladesh to reveal what is going on in society and to understand the underlying topics (or subjects) and sentiments that arise and evolve in the discussion.

This study conducts topic and sentiment analysis on a large collection of COVID-19 related news articles published in Bangladesh in both Bengali and English. The analysis covers both spatial and temporal dimensions. In the topic analysis, we used LDA-based topic modeling and dynamic topic modeling to find the topics, their evolution over time, and their distribution over space (location). We also analyzed what impact each topic had on particular areas. Then we analyzed the sentiment distribution over time and space to identify social sentiment in space and time. The experimental workflow of this study is shown in Figure 1.

First, we manually gathered a large collection of COVID-19 related news articles from the six most circulated daily newspapers in Bangladesh. Along with the news text, the collection contains geospatial and temporal information on the news. The dataset was then preprocessed by removing HTML markup and other non-relevant information such as adverts.

Then, we manually organized the news articles into a set of classes and sub-classes. Then we extracted the topics and the subtopics from the dataset. We used these classes and sub-classes to perform basic analysis such as comparing similarity and diversity in the news. These classes and sub-classes were also used to qualitatively evaluate the accuracy of the topics discovered by LDA and the labels predicted by the classifiers before LDA and the classifiers were employed for detailed analysis.

Fig. 1. Experimental workflow. [Figure: boxes labeled Bengali Newspaper Dataset Collection; Preprocessing and Data Preparation; Topic Analysis; Volume Analysis; Temporal Analysis of Volume; Spatial Analysis of Volume; Topic Extraction; Dynamic Topic Modeling: Temporal Trends of Topics; Spatial Distribution of Topics; Automatic Classification of News Articles; Sentiment Analysis.]

These publicly available news articles related to COVID-19 were collected from the six most popular newspapers in Bangladesh from 21 January 2020 to 19 May 2020. The six newspapers are The Daily Prothom Alo, Bangladesh Pratidin, Kaler Kantho, The Daily Star, New Age, and The Daily Observer. A total of 15,565 news articles were collected from these six newspapers. From every news article, we extracted the news title, the main body of the news, a summary of the news (i.e., the first few lines of the news body), the published date, and the location of the news incident. We used Python's BeautifulSoup and Newspaper3k tools for extracting the news content. BeautifulSoup is a popular Python package for parsing HTML and XML documents and one of the most popular web scraping tools. Newspaper3k is a user-friendly library for scraping news articles and related data from newspaper portals. It is built on top of requests and uses lxml for parsing. This module is an improved version of the Newspaper module and serves the same purpose. Table 1 summarizes the statistics of the collection. We call this collection the Comilla University COVID-19 News Collection (CoU-CNC). We made it available online for anyone for further analysis.

Article Source          Language   Article Count
The Daily Prothom Alo   Bengali    4169
Bangladesh Pratidin     Bengali    5584
Kaler Kantho            Bengali    1160
The Daily Star          English    1278
The Daily Observer      English    1191
New Age                 English    2183

Table 1. CoU-CNC Dataset Statistics

Out of these six newspapers, the articles in three (The Daily Prothom Alo, Bangladesh Pratidin, Kaler Kantho) are written in Bengali, and the articles in the other three (The Daily Star, The Daily Observer, New Age) are written in English. There are 10,913 news articles in Bengali, and the remaining 4,652 are in English.

Because we wanted all articles in the same language for consistent parsing, we translated the 4,652 English articles into Bengali via Python's googletrans module. After translation, all the articles are in Bengali. Then, we applied tokenization to split text into smaller tokens: the news articles are split into sentences, and sentences are tokenized into words. Then, we applied noise removal (e.g., removing HTML tags, extra white spaces, special characters, and numbers) to clean up the text. Then, we removed the stopwords from the documents. As there is no built-in Bengali stopword list in nltk, we manually created a stopword list and made it available online. Then, we expanded contractions. We set the minimum word length to 6 letters and removed all words shorter than this minimum. There are no good resources for stemming and lemmatization in the Bengali language, so we applied stemming and lemmatization to the tokens with our own process. After removing all the stopwords and other noise, there were a total of 80,693 tokens. The Bengali language has some specific suffixes, and we removed these suffixes from words with the help of Python. We tried the Bangla Stemmer library for Python to improve accuracy. However, it did not give the expected results, as the library is effective for only a small number of Bengali words. To increase the accuracy of this lemmatized dictionary of 80,693 tokens, we manually verified about 30,000 of the most frequent tokens. We lemmatized manually where needed and corrected incorrect and misspelled words. Many words were lemmatized and corrected through this manual check of 30,000 words. We have published the verified Bengali words online under the title "Modified Bengali Words" for further analysis.

To compare the number of news articles published with the COVID-19 cases in Bangladesh, we collected an open-source dataset of confirmed COVID-19 cases and deaths in Bangladesh from March 8 to May 19. We also collected another open-source dataset of confirmed cases by division and district of Bangladesh from March 8 to May 19.

CoU-CNC Dataset: https://cutt.ly/djGILi2
Bengali-Stopwords: https://cutt.ly/2jXbDRB

Class Distribution in News Articles. After collecting the news articles, we first analyzed them manually. In this process, we extracted eight classes (shown in Table 2) and 19 sub-classes from the news articles. The eight classes and the hierarchical organization of the sub-classes are shown in Figure 2. The distribution of the classes over news articles is shown in Figure 3, and the distribution of the sub-classes over news articles is shown in Figure 4.

Table 2. Eight Classes Extracted from the Collected News Articles

(1) Statistics, (2) Social Information, (3) COVID-19 Effects, (4) COVID-19 Responses and Preventive Measure, (5) Government Announcement and Responses, (6) Solidarity and Cooperation, (7) International Information, and (8) Health Organization Responses
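The text-cleaning steps described earlier in this section (noise removal, tokenization, stopword removal, and the minimum-length filter) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the function name `preprocess`, the regular expressions, and the tiny stopword set are assumptions made for the example, and sentence splitting, contraction expansion, and stemming are left out.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")                     # HTML tags
BENGALI_DIGIT_RE = re.compile(r"[\u09E6-\u09EF]")   # Bengali digits
# Assumption: keep ASCII letters and the Bengali Unicode block, drop the rest.
NOISE_RE = re.compile(r"[^a-zA-Z\u0980-\u09FF\s]")


def preprocess(text, stopwords, min_len=6):
    """Return cleaned tokens from one article (hypothetical helper)."""
    text = TAG_RE.sub(" ", text)            # strip HTML tags
    text = BENGALI_DIGIT_RE.sub(" ", text)  # strip Bengali numerals
    text = NOISE_RE.sub(" ", text)          # strip ASCII digits, punctuation, symbols
    tokens = text.lower().split()           # split() also collapses extra whitespace
    # len() counts Unicode code points, which only approximates "letters"
    # for Bengali words that contain combining vowel signs.
    return [t for t in tokens if t not in stopwords and len(t) >= min_len]


print(preprocess("<p>Hospital beds 123 are   limited!</p>", {"limited"}))
```

A real run would pass the published Bengali stopword list instead of the one-word set used here.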

Modified Bengali Words: https://cutt.ly/8jE6GIC
https://data.humdata.org/dataset/district-wise-quarantine-for-covid-19

Fig. 2. Manually Extracted Classes and Sub-classes. [Figure: a hierarchy linking the eight classes (Statistics; Social Information; COVID-19 Effects; COVID-19 Responses and Preventive measure; Government Announcement and responses; Solidarity and cooperation; Health Organizations Responses; International Information) to the sub-class labels: Social impact; Public unawareness; Protestation; Sanction; Spread of rumors or misinformation on corona virus; Positive patient symptoms and identification; Severe health outcomes and deaths; Transmission patterns and risks; Negative cases and Corona virus recovery stories; Strategic preparedness and response plan; Global economic impact of Corona virus (shown twice); Protective products and machines; Corona virus treatment and Vaccine; Government guidelines, instructions and mobilization; Policy inconsistency; External support, Aids; Trip and transportation; Repatriation.]

Fig. 3. Distribution of Manually Extracted Classes over News Articles. [Pie chart; labels as shown, some truncated: COVID-19 Effects 26.3%; COVID-19 ... 17.4%; Social Information 13.7%; Statistics 13.1%; International ... 10.2%; Government ... 8.9%; Health ... 5.5%; Solidarity and ... 4.9%.]

Fig. 4. Distribution of Manually Extracted Sub-classes over News Articles. [Pie chart; labels as shown, some truncated: Strategic ... 19.1%; Positive patient ... 11.5%; Social impact 11.0%; Severe health ... 9.4%; Global political ... 8.4%; Government ... 8.2%; Negative cases and ... 4.9%; Global economic ... 4.7%; Policy inconsistency 3.4%; Public unawareness 3.0%; Transmission ... 3.0%; Repatriation 2.4%; Sanction 2.4%; Corona virus ... 2.3%; Protestation 1.6%; Trip and ... 1.5%; Protective products ... 1.4%; External support, ... 1.0%; Spread of rumors or ... 0.8%.]

Temporal (time series) analysis of newspaper articles is used to observe how coverage grew over the course of the pandemic. Time series decomposition treats a series as a combination of components in the time dimension: level, trend, seasonality, and noise. Level refers to the average value in the series, trend refers to the increasing or decreasing value in the series, seasonality refers to the repeating short-term cycle in the series, and noise refers to the random variation in the series. Decomposition provides a useful model for reasoning about a time series and for framing problems during time series analysis and decision making. The additive model (Dagum, 2010) assumes that the components are added together:

\[ y(x) = l(x) + t(x) + s(x) + n(x) \tag{1} \]

where y(x) represents the additive model, l(x) represents the observed level, t(x) represents the trend, s(x) represents the seasonality, and n(x) represents the noise or residual in the signal x. This model is linear: changes over time are consistently made by the same amount, so a linear trend is a straight line, and a linear seasonality has the same frequency and amplitude. On the other hand, a multiplicative model (Dagum, 2010) assumes that the components are multiplied together:

\[ y(x) = l(x) \times t(x) \times s(x) \times n(x) \tag{2} \]

where y(x) represents the multiplicative model, l(x) represents the observed level, t(x) represents the trend, s(x) represents the seasonality, and n(x) represents the noise in the signal x. A multiplicative model is nonlinear, for example quadratic or exponential: changes increase or decrease over time, and a nonlinear trend is a curved line. In this study, we decomposed the time series using the multiplicative model.

For spatial analysis, we used Tableau software to geographically compare the number of news articles published with the number of confirmed COVID-19 cases.

Analyzing the topics of news articles published during a major incident or a pandemic like COVID-19 can help monitor the situation and understand public concerns, which is critical for government authorities and charity organizations when distributing required resources and aid. However, in such a situation, a large number of news articles are published in various newspapers. We observed that as the situation deteriorated during the pandemic, newspapers published a great deal of news on various topics. Manually analyzing the topics by reading a large number of articles is time-consuming and expensive.
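As a rough illustration of the multiplicative model in Eq. (2), the following sketch performs a classical ratio-to-moving-average decomposition in plain Python. It is not the study's implementation, since the text does not say which tool was used. The function names, the weekly period of 7, and the synthetic daily counts are assumptions made for the example, and level and trend are estimated together as a single moving-average component.

```python
def centered_moving_average(series, period):
    """Estimate the trend with a centered moving average.

    Positions where the window does not fit are left as None.
    """
    half = period // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        if period % 2 == 1:
            trend[i] = sum(window) / period
        else:
            # Even period: the two end points get half weight.
            trend[i] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return trend


def decompose_multiplicative(series, period):
    """Split a positive series into trend, seasonal and residual factors.

    Needs roughly two full periods of data so that every position
    within the period has at least one trend estimate.
    """
    trend = centered_moving_average(series, period)

    # Ratio of observation to trend, grouped by position within the period.
    ratios = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(series, trend)):
        if t:  # skips None (window did not fit) and zero trend values
            ratios[i % period].append(y / t)

    # Average ratio per position, rescaled so seasonal factors average to 1.
    raw = [sum(r) / len(r) for r in ratios]
    scale = sum(raw) / period
    seasonal = [raw[i % period] / scale for i in range(len(series))]

    # Whatever trend and seasonality do not explain is the residual factor.
    residual = [y / (t * s) if t else None
                for y, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual


# Synthetic daily article counts: a rising trend times a weekly pattern.
weekly_pattern = [0.7, 0.9, 1.0, 1.1, 1.2, 1.3, 0.8]
counts = [(100 + 2 * day) * weekly_pattern[day % 7] for day in range(28)]
trend, seasonal, residual = decompose_multiplicative(counts, period=7)
```

Wherever a trend estimate exists, multiplying the three returned components gives back the original count, which mirrors the product form of Eq. (2).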
We used two unsupervised machine learning techniques: (a) LDA (Blei et al., 2003), a popular topic modeling technique, as static topic modeling to automatically find the topics of articles published in newspapers, and (b) dynamic topic modeling (Blei and Lafferty, 2006) to see how those topics evolve over time.

LDA is a Bayesian probabilistic model that discovers topics and provides a topic distribution over documents and a word distribution over topics. It has two parts: (a) the first models each document as a mixture of topics, and (b) the second models each topic as a mixture of words. LDA uses word co-occurrences within documents to discover topics in a document collection. Words occurring in the same document are likely to come from the same topics, and documents containing similar words are likely to cover similar topics. In this research, the Gensim package in Python was used to run the LDA model. We treated every news article as a document in the topic modeling. Before applying the LDA topic model, we manually assigned documents to general classes and sub-classes so that we could assess the quality of the fine-grained topics extracted by LDA.

Then, we analyzed the temporal trend of each LDA-extracted topic to see when a topic was discussed or published more in the newspapers. Finally, we analyzed each topic's spatial distribution to see what effect each had in a particular place. We used Tableau software to analyze the spatial distribution of each topic.
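The two-level view of LDA described above (documents as mixtures of topics, topics as mixtures of words) can be made concrete with a toy example. The study ran LDA through Gensim, whose implementation is based on online variational Bayes. The sketch below swaps in a small collapsed Gibbs sampler instead, only because it fits in a few lines of dependency-free Python. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the four toy documents are assumptions made for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not Gensim's algorithm).

    docs is a list of token lists. Returns (theta, phi):
    theta[d][k] is the share of topic k in document d,
    phi[k][w] is the probability of word w under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total words per topic
    assign = []                                          # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample: (topic share in doc) x (word share in topic).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    theta = [[(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
             for row, doc in zip(doc_topic, docs)]
    phi = [{w: (topic_word[k][w] + beta) / (topic_total[k] + V * beta)
            for w in vocab} for k in range(num_topics)]
    return theta, phi


toy_docs = [["virus", "hospital", "virus"], ["export", "factory", "export"],
            ["hospital", "virus"], ["factory", "export"]]
theta, phi = lda_gibbs(toy_docs, num_topics=2, iters=50)
```

Each row of `theta` and each entry of `phi` is a probability distribution, which is the pair of outputs the paragraph above attributes to LDA.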

Static topic modeling treats words as exchangeable and also treats documents as exchangeable. However, the assumption of exchangeable documents is unrealistic for collections that accumulate over time, such as tweets, news articles, and scholarly articles, whose content evolves over time. The topics in a newspaper article collection evolve, and it is important to model the dynamics of the underlying topics explicitly.

Dynamic topic modeling extends static topic modeling to capture the evolution of topics in a sequentially organized collection of news articles. In this research, the articles are grouped by week. We used the dynamic topic model to analyze discussion topics and how they change over time.
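Grouping articles by week, as described above, amounts to ordering the documents in time and recording how many fall into each weekly slice. The sketch below shows one way to do this with the standard library. The helper name `weekly_time_slices`, the use of ISO calendar weeks, and the sample articles are assumptions. The resulting list of slice sizes has the shape expected by, for example, the `time_slice` argument of Gensim's `LdaSeqModel`, although the text does not state which dynamic topic model implementation the study used.

```python
from collections import defaultdict
from datetime import date


def weekly_time_slices(dated_docs):
    """Order documents by ISO week and count documents per week.

    dated_docs is an iterable of (publication_date, tokens) pairs.
    Returns (ordered_docs, week_keys, slice_sizes), where week_keys are
    (iso_year, iso_week) tuples in chronological order.
    """
    buckets = defaultdict(list)
    for day, tokens in dated_docs:
        iso = day.isocalendar()           # (ISO year, ISO week, weekday)
        buckets[(iso[0], iso[1])].append(tokens)

    week_keys = sorted(buckets)
    ordered_docs = [doc for key in week_keys for doc in buckets[key]]
    slice_sizes = [len(buckets[key]) for key in week_keys]
    return ordered_docs, week_keys, slice_sizes


articles = [
    (date(2020, 1, 27), ["lockdown", "transport"]),
    (date(2020, 1, 21), ["virus", "wuhan"]),
    (date(2020, 1, 22), ["airport", "screening"]),
]
ordered_docs, week_keys, slice_sizes = weekly_time_slices(articles)
```

Weeks with no articles are simply absent from the output, so a collection with gaps may need empty slices handled separately.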

Then we built a text classiﬁer to verify their performance and predict the class,sub-class, and topics in the unknown (upcoming in the future) news articles.Such classiﬁcation is important when we need to monitor a speciﬁc category (orclass or group) of news. We made Long Short-Term Memory (LSTM) RecurrentNeural Network (RNN) models in Python utilizing Keras deep learning libraryfor text classiﬁcation. RNN is a special kind of neural network where the previ-ous step’s output will be used as the current step’s input. In a traditional neuralnetwork, not all inputs and outputs are interdependent. However, interdepen-dence is an important part of text data. In such cases, the model needs to predictthe next word given the previous words, so the previous word must be stored.Thus RNN was born, which solved the problem with the help of hidden layers.The primary function of RNN is also essential, namely the hidden state . It canremember some information about the sequence.RNN is a neural feedback network that operates on the internal memory.Since the RNN has a similar function for each piece of information and thecurrent range’s output is based on the last count, the RNN is essentially recur-sive. When there is an output, it is copied and sent back to the relay network.The current input and the output of the previous input are taken into accountin determining the prediction of the next word. Unlike direct feedback neuralnetworks, RNNs can use their internal state (memory) to manage the input el-ements’ interdependence. That makes them useful for text data, handwritingrecognition, or speech recognition. The architecture of an unrolled recurrentneural network is shown in Figure 5.In Figure 5, ﬁrst the model gets x from the input sequence. Then it produces h , which is used in the next input to the model along with x . That is, both h and x become inputs to the next step. Then, h and x are input to the nextstep, and so on. 
In this way, the RNN keeps summarizing the context in the hidden state while training. Then, it uses the summarized hidden state to classify the sequence (Bashar et al., 2020b).

Fig. 5.

Unrolled Recurrent Neural Network
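The recurrence can be made concrete with a small sketch of a plain (non-LSTM) recurrent step. It is illustrative only: the tanh activation, the weight names W_xh, W_hh and b_h, and all numeric values are assumptions, and the models in this study were LSTMs built with Keras rather than this hand-written loop.

```python
import math


def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: h_t = tanh(W_xh . x_t + W_hh . h_prev + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        s = b_h[i]
        s += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        s += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(s))
    return h_t


def run_rnn(xs, W_xh, W_hh, b_h):
    """Unroll the network over a sequence, carrying the hidden state forward."""
    h = [0.0] * len(b_h)  # initial hidden state
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


# Hypothetical toy weights and a three-step input sequence.
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [-0.4, 0.3]]
b_h = [0.0, 0.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
states = run_rnn(xs, W_xh, W_hh, b_h)
print(states[-1])  # final hidden state, the summary a classifier would read
```

Feeding the same inputs in reverse order gives a different final state, which is the order sensitivity that makes the hidden state useful for text.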

We proposed a hybrid neural network model based on a Convolutional Neural Network (CNN) and LSTM for sentiment analysis of Bengali texts. Hybrid models are used to solve various vision and NLP problems and can improve on a single model's performance. The following subsections provide an overview of the LSTM and CNN components. We described LSTM in subsection 3.6. In this research, we used a two-layer Bi-LSTM; word embeddings represent the words in the news articles, and the network predicts their sentiment.

The other part of our proposed structure is based on a Convolutional Neural Network (CNN). CNNs have been very successful in various image processing and NLP tasks in recent years. They are powerful at extracting local correlations and patterns in data through learning. To use a CNN for text classification, the words of a sentence (or paragraph) are usually stacked to form a two-dimensional matrix, and convolution filters of different lengths are applied to a window of words to produce a new feature representation. Then, max-pooling is applied to the new features, and the pooled features from the different filters are concatenated to form a hidden representation. Fully connected layers follow these representations for the final prediction. The architecture of our CNN-BiLSTM hybrid network model is shown in Figure 6.

We created a sequential model and added layers to it. In the first layer, we applied a Conv1D with 200 filters for the CNN. After that, we applied two Bi-LSTM layers as the second and third layers with a dropout of 0.5. Then we applied dense layers for the remaining layers. We also used Adam as the optimizer with tuned hyperparameters and applied L2 regularization to reduce overfitting as much as possible. We used only five epochs, as more epochs resulted in overfitting, and kept the batch size at 256, as it worked very well.

Fig. 6.

The architecture of CNN-BiLSTM Hybrid network for Sentiment Identification. Stages shown: Document Matrix (a Bengali input sentence), Convolution producing the Features Map, LSTM Layers, Fully Connected Layers, and Output (Positive or Negative).
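The convolution and max-pooling steps for text can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Keras model of the study: the two-dimensional embeddings, the single filter of width 2, and the ReLU activation are hypothetical choices, and in the hybrid network the feature map is passed on to the Bi-LSTM layers.

```python
def conv1d_valid(doc, kernel, bias=0.0):
    """Slide one filter over a document matrix (one row per word).

    `kernel` has one weight row per word in the window, so a kernel with
    two rows looks at two consecutive words at a time.
    """
    width = len(kernel)
    feature_map = []
    for start in range(len(doc) - width + 1):
        s = bias
        for offset in range(width):
            word_vec = doc[start + offset]
            s += sum(w * x for w, x in zip(kernel[offset], word_vec))
        feature_map.append(max(0.0, s))  # ReLU activation (an assumption)
    return feature_map


def max_over_time(feature_map):
    """Max-pooling: keep the strongest response of the filter."""
    return max(feature_map)


# Hypothetical document matrix: 4 words, 2-dimensional embeddings.
doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernel = [[1.0, -1.0], [0.5, 0.5]]  # one filter spanning 2 words

features = conv1d_valid(doc, kernel)
pooled = max_over_time(features)
print(features, pooled)
```

With several filters of different widths, the pooled values would be concatenated into the hidden representation that the fully connected layers consume.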

The time series volume analysis of newspapers is shown in Figure 7. The figure has four plots, namely observed level, trend, seasonal, and noise or residual. The first plot, Figure 7a, shows the original volume, i.e., the number of COVID-19 related news articles at each time point. The curve began to rise in January, when some COVID-19 cases were found in China and other countries. It increased sharply in early March, when the first few COVID-19 cases were identified in Bangladesh, and remained high afterward with some fluctuations. The second plot, Figure 7b, shows the trend of the COVID-19 related news publication volume. The trend began to rise by the end of January and increased significantly in early March. It stayed high for the rest of the period with some fluctuations. The third plot, Figure 7c, shows the seasonal, cyclical change in the volume, and the fourth plot, Figure 7d, shows the residual or random variation in the volume.

To see how newspapers reacted during the COVID-19 pandemic, we tracked COVID-19 cases, deaths from COVID-19, and COVID-19 news volume in Figure 8. The figure shows that the newspapers were vigilant from the beginning of the pandemic. Newspaper journalists increased COVID-19 related news coverage exponentially as soon as COVID-19 cases were found in Bangladesh in early March. The news volume continued increasing until the last quarter of March.

Fig. 7.

Time Series Decomposition: (a) Observed, (b) Trend, (c) Seasonal, (d) Residual. In each panel the horizontal axis is the date (February to May 2020) and the vertical axis is the number of news articles.

This part of the news volume shows that the newspapers reacted to COVID-19 from the very early stage of the pandemic. They covered the pandemic extensively during the early period of the COVID-19 cases.

The number of identified cases increased significantly by the second quarter of April, and it continued to increase. However, the number of COVID-19 related news articles did not increase during this time; in some cases, the news article volume even decreased marginally. The possible reasons might be: (a) because Bangladesh is a developing country, people had to think more about earning a living than about the pandemic in order to survive, so pandemic news did not attract more attention, and newspapers did not increase COVID-19 related articles; (b) some other big incidents gained more attention than COVID-19; (c) the newspapers had already reached their allocated space for pandemic news.

Fig. 8.

Comparison of Daily News Article Counts and Daily Cases (21 January 2020 to 19 May 2020)
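The four components in Figure 7 can be illustrated with a classical additive decomposition. The sketch below is one common way to obtain such components, under assumptions: an additive model, a weekly period of 7 days, and a synthetic series. The source does not state which method or library produced Figure 7.

```python
def decompose_additive(series, period):
    """Classical additive decomposition; `period` is assumed to be odd.

    Returns (trend, seasonal, residual). Trend and residual are None at
    the edges, where the centred moving average is undefined.
    """
    n, half = len(series), period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        trend[t] = sum(window) / period
    # Average the detrended values at each position within the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period  # force the seasonal part to sum to zero
    seasonal = [means[t % period] - centre for t in range(n)]
    residual = [None if trend[t] is None
                else series[t] - trend[t] - seasonal[t] for t in range(n)]
    return trend, seasonal, residual


# Synthetic daily counts: a linear trend plus a weekly pattern (not real data).
pattern = [3, 1, 0, -1, -2, -2, 1]
series = [10 + 2 * t + pattern[t % 7] for t in range(28)]
trend, seasonal, residual = decompose_additive(series, period=7)
print(trend[10], seasonal[:7])
```

On this synthetic input the recovered seasonal component equals the injected weekly pattern and the residual is zero; real news counts would leave a non-trivial residual like the one in Figure 7d.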

Spatial Analysis of Volume

The spatial distribution of Bengali newspaper articles is shown in Figure 9a. The news articles were concentrated on the central part of Bangladesh, mainly Dhaka, Narayanganj, and Gazipur. More than 6000 of the COVID-19 related news articles published in Bangladeshi newspapers were related to Bangladesh's central part. More than 2000 news articles were related to the southern part of Bangladesh, mainly Chittagong and Cox's Bazar.

The spatial distribution of confirmed cases of COVID-19 is shown in Figure 9b. The central part of Bangladesh is the most affected area. More than 10,000 COVID-19 patients were identified in Dhaka during this time. Outbreaks have been reported in the surrounding areas of Dhaka, mainly Narayanganj and Gazipur. After the Dhaka division, the highest infection rate is in the southern part of Bangladesh, mainly Chittagong. Figure 9 shows a correlation between the number of confirmed COVID-19 cases in an area and the volume of published news related to that area. This means automatic monitoring of news article volume can give a clear view of the severity of a pandemic or other big incidents in a society.

Fig. 9.

Spatial Analysis of News Article Volume
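The relationship between case counts and news volume is read from the maps in Figure 9. If per-district counts were available, it could be quantified with a Pearson correlation coefficient, as in the following sketch. The function is standard; the two input lists are placeholder numbers chosen for illustration and are not values from this study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder per-district values (NOT data from the paper).
news_counts = [60, 20, 8, 5]
case_counts = [100, 30, 12, 6]
r = pearson(news_counts, case_counts)
print(round(r, 3))
```

A value close to 1 would support the visual impression that districts with more confirmed cases also receive more coverage.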

Fig. 10.

District-wise Distribution of News Articles (districts shown: Dhaka, Chittagong, Narayanganj, Gazipur, Comilla, Khulna, Barisal, Sylhet, Kishoreganj, Tangail)

Fig. 11.

Division-wise Distribution of News Articles (divisions shown: Chittagong, Khulna, Mymensingh, Sylhet, Dhaka, Rajshahi, Rangpur, Barisal)

The district-wise breakdown of published news articles with significant volume is shown in Figure 10, and the division-wise breakdown in Figure 11. The figures show that most published news was related to the Dhaka district and Dhaka division. More than 57% of the published news was related to the Dhaka division. After Dhaka, most news was published on Chittagong: more than 19% of the news was related to the Chittagong division. The geospatial and temporal distributions of newspaper articles are shown in Figure 12. The figure shows that the volume of published news articles related to each location changed significantly over time: it was lower before and at the beginning of the pandemic and increased significantly during the pandemic.

Fig. 12.

Geo-spatial and Temporal Distributions of News Articles Published over Time. Horizontal axis shows consecutive weeks in the duration and vertical axis shows the volume (count of news articles).

For topic analysis with the LDA topic model, it is indispensable to decide the optimal number of topics. We gave much thought to finding an appropriate number of LDA topics and interpretations for examining the relationship between the COVID-19 emergency and news articles. We used a coherence score and a perplexity score to assess the choice of an appropriate number of topics. After preprocessing the data, we applied the LDA model to discover hidden topics in news articles. To determine the optimal number of topics, we examined the coherence score and perplexity score graphs shown in Figure 13. Figure 13a shows the coherence score graph and Figure 13b shows the perplexity score graph.

From the coherence score graph, we got the highest coherence score (0.5077) when we set the number of topics to 9, as shown in Figure 13a. From the perplexity score graph, we got the highest perplexity score (-7.59) when we set the number of topics to 24, as shown in Figure 13b. Of the two criteria, we followed the coherence score, because its optimal number of topics, 9, is very close to the number of manually extracted classes, 8, shown in Table 2. So we set the number of topics for LDA topic extraction to 9. The word clouds for the top words (i.e., keywords) in each of the nine topics are shown in Figure 14. The weights and appearance counts of the keywords in each topic are shown in Figure 15. The visualization of the clusters of documents in a 2D space using the t-SNE (t-distributed stochastic neighbor embedding) algorithm is shown in Figure 16. In Figure 17, the inter-topic distance map and 30 relevant keywords are displayed for each topic. The nine discovered topics are listed in Table 3.

Fig. 13.

Determining optimal number of topics: (a) Coherence Score, (b) Perplexity Score. In both panels the horizontal axis is the number of topics (5 to 30); the perplexity axis spans -7.65 to -7.59.
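The search over candidate topic numbers can be sketched as a simple loop. This is a structural sketch only: score_fn is a hypothetical callback that, in practice, would train an LDA model with k topics and return its coherence score, and the toy scoring function below merely stands in for that step.

```python
def select_num_topics(candidates, score_fn):
    """Score every candidate number of topics and return the best one.

    `score_fn(k)` is a hypothetical callback that returns a score where
    higher is better (for example a coherence score).
    """
    scores = {k: score_fn(k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


def toy_score(k):
    """Stand-in scoring function that peaks at k = 9 (illustration only)."""
    return -abs(k - 9)


best_k, scores = select_num_topics(range(5, 31), toy_score)
print(best_k)
```

Keeping the full scores dictionary makes it easy to plot a curve like Figure 13a and to compare the optimum with a second criterion such as perplexity.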

Table 3.

Nine Topics Discovered by LDA

(1) Economic Crisis and Incentives
(2) Epidemic Situation and Outbreak
(3) Vaccine and Treatment
(4) Demonstration for Wages and Relief
(5) Medical Care and Health Organization Responses
(6) Repatriation and International Situations
(7) Daily Infected, Death, and Recovered Cases
(8) Strategic Preparedness
(9) Government Announcement and Responses

Figure 18 shows the topic frequency ratio in the document collection (news articles). The figure shows that Topic 8 (Strategic Preparedness) is the most frequent of the nine topics discovered by LDA, accounting for 26.3%. The second most frequent LDA topic is Topic 2 (Epidemic Situation and Outbreak), which accounted for 20.1%. Topic 9 (Government Announcement and Responses) and Topic 7 (Daily Infected, Death, and Recovered Cases) are the third and fourth most frequent topics, at 13.6% and 11.7%, respectively. Topic 5 (Medical Care and Health Organization Responses), Topic 3 (Vaccine and Treatment), and Topic 4 (Demonstration for Wages and Relief) are in fifth, sixth, and seventh positions, accounting for 9.8%, 5.7%, and 5.2%, respectively. Finally, Topic 6 (Repatriation and International Situations) and Topic 1 (Economic Crisis and Incentives) are the least frequent topics, each with a proportion of less than 5%. By reviewing all these topics and analyses, we can gain insight into the pandemic or any important incident in a society.
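A topic frequency ratio of this kind can be computed by counting how many documents fall under each topic. The sketch below assumes that every article is assigned to its single dominant topic, which the source does not state explicitly; the input list is a toy example, not the study's data.

```python
from collections import Counter


def topic_shares(dominant_topics):
    """Percentage of documents per topic, most frequent topic first."""
    counts = Counter(dominant_topics)
    total = len(dominant_topics)
    return {topic: 100.0 * count / total
            for topic, count in counts.most_common()}


# Toy assignment of eight documents to topics (illustration only).
assignments = [8, 8, 2, 8, 9, 2, 7, 8]
shares = topic_shares(assignments)
for topic, share in shares.items():
    print(f"Topic {topic}: {share:.1f}%")
```

Because the shares are derived from one assignment per document, they sum to 100%, as the percentages reported for Figure 18 should.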

Fig. 14.

Word Clouds for nine topics. Panels (a) to (i) show the word clouds of Topics 1 to 9, respectively.