Attention dynamics on the Chinese social media Sina Weibo during the COVID-19 pandemic
Cui and Kertész
REGULAR ARTICLE
Hao Cui and János Kertész*
*Correspondence: [email protected]
Department of Network and Data Science, Central European University, Quellenstrasse 51, A-1100, Vienna, Austria
Abstract
COVID-19 was first detected in Hubei province of China and has had a severe impact on life in the country since then. We investigate how this epidemic has influenced attention dynamics on the biggest Chinese microblogging website Sina Weibo in the period December 16, 2019 – April 17, 2020. We focus on the real-time Hot Search List (HSL), which provides the ranking of the 50 most popular hashtags based on the number of Sina Weibo searches for them. We show how the specific events, measures and developments during the epidemic affected the emergence of new hashtags and the ranking on the HSL. A significant increase of COVID-19 related hashtags started to occur on the HSL around January 20, 2020, when the transmission of the disease between humans was announced. Then, very rapidly, a situation was reached where the COVID-related hashtags occupied 30-70% of the HSL, however, with changing content. We give an analysis of how the hashtag topics changed during the investigated time span and conclude that there are three periods separated by February 12 and March 12. In period 1, we see strong topical correlations and clustering of hashtags; in period 2, the correlations are weakened, without a clustering pattern; in period 3, we see potential for clustering, although not as strong as in period 1. To quantify the dynamics of the HSL, we measured the lifetimes of hashtags on the list and the rank diversity at given ranks. Our observations indicate attention diversification since the COVID-19 outbreak in Mainland China and a higher rank diversity at the top 15 ranks on the HSL due to the COVID-19 related hashtags, a drastic attention decay shortly after the outburst and a slower decay for a longer period.
Keywords:
Public Attention Dynamics; COVID-19; Social Media; Ranking
In our times of information deluge the dynamics of public attention is of eminent importance from many aspects, including education, politics, marketing and governance. On the new media the flow of information has dramatically accelerated, often leading to rapidly changing public attention. At the same time these media provide unprecedented possibilities to study attention dynamics [1, 2] as they produce Big Data open for investigation. The microblogging service Twitter [3] is particularly suited to provide the basis for quantitative studies on the dynamics of public attention as the content of the messages is available [4]. Accordingly, Twitter data have been used to identify classes of dynamical collective attention [5], to investigate party-related activity and its predictive power for elections [6], to model the related attention dynamics [7], and to study the relationship between public attention and social emotions [8].
Public attention becomes a focal issue in times of crises like pandemics. As early as 2010, four years after it was launched, Twitter was shown to be an adequate, real-time content, sentiment, and public attention trend-tracking tool [9] and was used to study rapidly evolving public sentiment with respect to the H1N1 epidemic [10]. The analysis of tweets made it possible to quantify the difference between attention and fear and their distance-dependence in the case of the Ebola epidemic [11]. Even for the present COVID-19 pandemic, the first Twitter studies on public attention have appeared [12, 13], mainly focusing on the perception of policies by the public.
During the critical time of the Spring Festival travel rush, Wuhan, the capital city of Hubei province, was reported to be the first COVID-19 epicenter.
The service of Twitter is blocked in China, but its local substitute, Sina Weibo, is very popular [14]; therefore it is natural to use data from Weibo for purposes similar to those introduced earlier for Twitter in other countries. Posts on Sina Weibo are predominantly in Chinese, which causes a language barrier; however, scientists have already recognized that this microblogging service provides important insight into the functioning of Chinese society [15, 16]. Recently some studies have appeared dealing with the reaction of Sina Weibo to COVID-19, analyzing, e.g., the propagation of situational information [17].
In this study we focus on the attention dynamics in the period of COVID-19 using the Hot Search List (HSL) of Sina Weibo. This is a ranking of hashtags updated every minute, created according to an algorithm in which the number of searches for the hashtags is dominant. This ranking provides a proxy for the attention preferences of the Weibo users, enabling the quantification of their dynamics, which reflect the changes in attention due to events and measures.
Ranking is present in many fields of today's world, from sports to universities, from wealthy individuals to purchasable goods. Recently, ranking dynamics has been studied widely, from sports [18, 19, 20] to scientists, journals or companies [18]. There are stable rankings (like word frequencies) with little or no change in the ranks, and there are volatile ones with vivid dynamics (mentions of Twitter hashtags) [18]. Clearly, the Weibo HSL belongs to the latter, with rich dynamic properties, which gives insight into the changes in the attention of the Weibo users.
The Weibo HSL provides rich data about public attention and its dynamics in China.
Based on these data, we have been able to identify different periods in the pandemic and could follow how the attention of the population shifted from one group of topics to another.
The paper is organized as follows: in Section 2, we provide background information on the Sina Weibo real-time Hot Search List (HSL) and the methodologies for quantifying attention dynamics. In Section 3, we present our results on attention dynamics, correlations between different types of hashtags, and attention decay. In Section 4, we discuss and summarize the results.
Sina Weibo is the biggest Chinese microblogging website, with MAU (monthly active users) reaching 550 million and DAU (daily active users) 241 million in March [...] "荐" [24] (meaning recommendation).
We took data from the Weibo HSL to study attention dynamics as it captures the vibrant real-time change of public attention. One or two commercial advertisements appear irregularly at the third and the sixth ranks. In order to obtain a constant number of non-advertisement hashtags on the HSL at each timestamp, we removed all the hashtags labeled with "荐", re-ranked the original HSL and took the top 48 hashtags for each timestamp. Throughout the rest of this paper, HSL refers to this re-ranked list with 48 ranks. We collected the data on the HSL every 5 minutes from December 16, 2019 to April 17, 2020. There are in total 26022 hashtags, of which 9120 are related to aspects of COVID-19. To relate social media content to the real-life pandemic situation in Mainland China, we collected the daily number of infections, deaths, and recoveries from the official website of the National Health Commission of China [25]. In the following subsection we explain how we identified the different categories of hashtags.
Fig. 1 shows the number of daily infections, deaths and recoveries in Mainland China.
The number of daily infections and deaths has a sharp peak on February 12 due to the adoption of new diagnostic criteria [26]. The decreasing trend of daily infections since the peak turned into an increasing one after March 13, as a result of the rising number of imported coronavirus cases from abroad [27]. We will argue that there are three periods to be distinguished after the outburst of COVID-19 around January 19, separated by the maximum and the local minimum of the daily number of infections on February 12 and March 12, respectively.
Figure 1 COVID-19 daily infection, death and recovery in Mainland China. The inset enlarges the tail of the infection curve. Three periods after the outbreak on January 19 are separated by the highlighted peak and local minimum.
The public attention towards COVID-19 is believed to change with the real-world pandemic situation. To study the public attention towards COVID-related information, we first extracted hashtags that encompass all aspects of COVID-19 and classified them, based on geographic regions and the order of exposure to the pandemic, into three categories: Mainland China, East Asia outside of Mainland China, and Other Countries outside of East Asia. With a focus on COVID-hashtags related to Mainland China, we manually classified them based on semantic meaning into the following seven sub-categories. The Bad News category comprises hashtags on confirmed infections and deaths in different regions of Mainland China as well as shortages of essential supplies. The Good News category consists of news on cases of recovery, sufficiency of supplies, and decreases in daily infections or deaths. The Regulations category consists of authority responses in the form of national, regional and institutional laws, rules and regulations associated with public behavior during the pandemic. The Life Influence category contains hashtags that reflect the influence of the pandemic on aspects of citizens' lives. The Front Lines category includes hashtags related to the lives of front line workers (mainly doctors and nurses) and their interactions with patients in hospitals. The Science category incorporates scientific understanding of the properties of the virus, vaccine development, and ways of public protection recommended by authoritative doctors. The Supports category takes into account hashtags on worldwide donations and emotional support. All the classifications were made by human decision due to the syntactic-semantic complexity of the Chinese language. For ambiguous cases that contain information of more than one category, our classifications were based on the focus of the main subject. The Mainland China sub-categories are summarized in Table 1 together with examples.
The full list of COVID-related hashtags is available in the dataset, which we have made public (see declaration at the end of the paper).
To further understand how the Mainland China related COVID-hashtags are correlated with each other and with the daily number of infections/deaths/recoveries in the three separate time periods, we measured the Pearson correlations between the seven series of the daily number of new hashtags in each of the sub-categories defined above, together with the three series of the daily number of infections/deaths/recoveries. The correlations of these ten time series are calculated using the percentage change between the current and the prior element instead of the actual value, in order to reduce the effect of the trend, which can cause spurious correlations. For time series category $X = \{X_{t_i} : t_i \in T,\ i = 1, 2, \ldots, n\}$ and category $Y = \{Y_{t_i} : t_i \in T,\ i = 1, 2, \ldots, n\}$, where $T$ is the time index set, the Pearson correlation is calculated using the percentage change series $\tilde{X} = \left\{ \frac{X_{t_{i+1}} - X_{t_i}}{X_{t_i}} : t_i \in T,\ i = 1, 2, \ldots, n-1 \right\}$ and $\tilde{Y} = \left\{ \frac{Y_{t_{i+1}} - Y_{t_i}}{Y_{t_i}} : t_i \in T,\ i = 1, 2, \ldots, n-1 \right\}$.
Table 1 Mainland China COVID-hashtag details. A summary of the example hashtags in each sub-category of the Mainland China category and the number of hashtags in different time periods.
Category | Examples | Period 1 | Period 2 | Period 3
Bad News | 全国累计确诊新冠肺炎例; 黑龙江聚集性疫情共起发病人; 武汉多家医院物资紧张 | | |
Good News | 火神山医院累计治愈患者破千; 省区现有确诊病例清零; 疫情形势出现个积极变化 | | |
Regulations | 上海地铁不戴口罩不得进站; 疫情影响严重的地区可增发生活补助; 非疫情严重国家进京者居家观察天 | | |
Life Influence | 武汉市民江滩唱起国歌; 疫情期间点外卖指南; 一季度民航业亏损亿 | | |
Front Lines | 钟南山等专家连线武汉ICU团队; 护士握手呼唤岁新冠患者; 方舱医院收治第一批患者现场 | | |
Science | 各年龄段人群普遍易感新冠病毒; 口罩的正确使用方法; 如何区分感冒流感和新冠肺炎 | | |
Supports | 汶川村民自发支援武汉吨蔬菜; 欧盟对华运送吨急需物资; 武汉给援汉医疗队全员的感谢信 | | |
One natural measure of social media attention towards a topic category is the quantity of the related hashtags. The growth pattern of the cumulative number of hashtags on the HSL with time reflects the dynamics of the public attention. We separately measured the growth of the cumulative number of all hashtags and of all COVID-related hashtags that ever appeared on the HSL in our observation period. To understand how much of the HSL is occupied by COVID-information at each timestamp, we constructed the historical ratio trajectory of the COVID-related hashtags on the HSL since the first COVID-hashtag 武汉发现不明原因肺炎 (pneumonia of unknown cause found in Wuhan) [...]
The lifetime of a hashtag on the HSL indicates its ability to obtain persistent attention from the public. We quantified the duration (continuous existence on the HSL) of a hashtag with $\tau$: $\tau = \tau_{\mathrm{last}} - \tau_{\mathrm{first}}$, where $\tau_{\mathrm{first}}$ is the timestamp of the first and $\tau_{\mathrm{last}}$ is the timestamp of the last appearance of a hashtag on the HSL.
We compared the duration of the hashtags across various categories and time windows: hashtags before the outbreak on January 19, all COVID-related hashtags, and non-COVID hashtags after the outbreak. To ensure complete life cycles of the hashtags, we took all hashtags whose first arrivals on the HSL are between December 19, 2019 and January 18, 2020 as the sample of hashtags before the pandemic, which includes 6161 in total.
Similarly, we took all COVID-hashtags whose first arrivals are no later than April 14, with a total number of 8808. For the non-COVID hashtags after the outbreak, we took a random sample of all non-COVID hashtags with the same size as the COVID sample. Hashtags that reappeared after disappearing from the HSL were excluded from our calculation. To understand the overall variation of attention towards COVID-hashtags with time, we investigated the daily value of their cumulative average duration. We denote by $D_j$ the cumulative average of the duration from December 31, 2019 (day 0) until day $j$. $D_j$ is calculated as follows:
$$D_j = \frac{1}{|S(j)|} \sum_{i=0}^{j} \sum_{\alpha \in S(j)} d_{\alpha i} \qquad (1)$$
where $d_{\alpha i}$ is the duration of hashtag $\alpha$ whose first appearance was on day $i$, and $S(j)$ is the set of all the hashtags whose first appearance is in the interval $[0, j]$.
The changes in the ranking patterns of the hashtags in different time periods reflect the general public attention dynamics. Rank diversity [20], which measures the number of different hashtags occupying a given rank over a given length of time, gives overall information on the total dynamical trend of the hashtags on the HSL. Rank diversity is known to give characteristic profiles for different types of systems; e.g., it behaves differently in open systems (where only the top part of the competing items is ranked) than in closed systems (where all the items are ranked). We compared the rank diversity at the 48 ranks on the HSL before the outbreak and during the different periods after the outbreak, with and without COVID-19 hashtags.
The public attention towards a hashtag can also be indicated by its highest rank during its lifetime on the HSL. The highest rank of a hashtag reveals its best achievement when competing for attention with the other hashtags.
We studied the highest rank distribution of the classified COVID-hashtags and compared the results with the hashtags before the outbreak as well as the non-COVID hashtags after the outbreak (SI). To understand the overall variation of the highest rank of COVID-hashtags with time, we investigated the daily trajectory of their cumulative average highest rank. We denote by $H_j$ the cumulative average of the highest rank from December 31, 2019 (day 0) until day $j$. $H_j$ is calculated as follows:
$$H_j = \frac{1}{|S(j)|} \sum_{i=0}^{j} \sum_{\alpha \in S(j)} h_{\alpha i} \qquad (2)$$
where $h_{\alpha i}$ is the highest rank of hashtag $\alpha$ whose first appearance was on day $i$, and $S(j)$ is the set of all the hashtags whose first appearance is in the interval $[0, j]$.
The cumulative number of new hashtags on the HSL grows approximately linearly (see Fig. 2 (A)), indicating a nearly constant attention capacity and need for news of the users. Closer inspection reveals, however, that the rate of new hashtags decreases between January 10 and February 12, followed by an increased rate until March 28, after which the original slope of 225 ± [...]
Figure 2 Overview of COVID-hashtags on the Weibo re-ranked Hot Search List (HSL) throughout the pandemic. (A) Cumulative number of all hashtags and all COVID-hashtags with time. The inset indicates a rapid increase in COVID-related hashtags starting from January 19, marked by a vertical red line. (B) Daily new COVID-hashtags on Mainland China, East Asia outside of Mainland China and Other Countries outside of East Asia. (C) Ratio of COVID-hashtags on the HSL at each timestamp. (D) Distribution of all COVID-hashtags by categories.
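The duration $\tau$ and the cumulative average duration $D_j$ of Eq. (1) can be illustrated with a short sketch. It is not the authors' code: the record layout (a list of timestamped hashtag lists) and the function names `hashtag_spans` and `cumulative_average_duration` are assumptions made for illustration, Eq. (1) is read as the mean duration over all hashtags first seen on days 0 to $j$, and the exclusion of reappearing hashtags described in the text is not implemented.

```python
from collections import defaultdict
from datetime import date, datetime


def hashtag_spans(snapshots):
    """Map each hashtag to the (first, last) timestamps of its appearances.

    `snapshots` is an iterable of (timestamp, hashtags) pairs, one per crawl.
    This record layout is an assumption made for the sketch.
    """
    spans = {}
    for ts, tags in snapshots:
        for tag in tags:
            first, last = spans.get(tag, (ts, ts))
            spans[tag] = (min(first, ts), max(last, ts))
    return spans


def cumulative_average_duration(spans, day0):
    """Return {j: D_j}: mean duration in hours of all hashtags whose first
    appearance falls on days 0..j, with days counted from `day0`."""
    by_day = defaultdict(list)
    for first, last in spans.values():
        day = (first.date() - day0).days
        if day >= 0:  # hashtags first seen before day 0 are ignored
            by_day[day].append((last - first).total_seconds() / 3600.0)
    averages, total, count = {}, 0.0, 0
    for j in range(max(by_day, default=-1) + 1):
        durations = by_day.get(j, [])
        total += sum(durations)
        count += len(durations)
        if count:
            averages[j] = total / count
    return averages


# Toy data: three hashtags observed over two days.
snapshots = [
    (datetime(2019, 12, 31, 8, 0), ["a"]),
    (datetime(2019, 12, 31, 10, 0), ["a", "b"]),
    (datetime(2020, 1, 1, 9, 0), ["c"]),
    (datetime(2020, 1, 1, 15, 0), ["c"]),
]
spans = hashtag_spans(snapshots)
D = cumulative_average_duration(spans, date(2019, 12, 31))
print(D)
```

The cumulative average highest rank $H_j$ of Eq. (2) could be obtained the same way by storing the best rank of each hashtag instead of its duration.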
COVID-19 was first observed in East Asia, with Mainland China being the hardest-stricken region, followed by places with growing infections such as South Korea, the Diamond Princess cruise ship and Japan. The epicenter of COVID-19 later shifted to Europe and the rest of the world as the situation eased in East Asia. The results depicted in Fig. 2 (B) follow these events closely, confirming the role of the real-time HSL on Weibo as a reflection of the real world. Unsurprisingly, the upward and downward trend periods of Mainland China and Other Countries coincide with Fig. 2 (C), where the ratio of COVID-related hashtags on the HSL at each timestamp is displayed. The sharp third peak on April 4 in Fig. 2 (C) is due to the national Qingming Festival (also known as Tomb-Sweeping Day), when the victims who died in the COVID-19 pandemic were mourned. The dynamics of the COVID-related hashtags on the HSL demonstrates the vibrant generation of new COVID-19 hashtags about relevant up-to-date events around the world. Fig. 2 (D) shows the distribution of the hashtags in the sub-categories of the Mainland China category along with East Asia and Other Countries. Among the seven sub-categories that belong to Mainland China, Supports, Science, and Good News have relatively fewer hashtags compared with Front Lines, Life Influence, Bad News, and Regulations.
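The ratio shown in Fig. 2 (C) is computed on the re-ranked list, i.e., after advertisement entries are removed and the top 48 hashtags are kept. A minimal sketch of these two steps follows. The `(hashtag, label)` record layout and the function names `rerank` and `covid_ratio` are hypothetical, and the set of COVID-related hashtags is assumed to come from the manual classification described earlier.

```python
AD_LABEL = "荐"  # label that, per the text, marks promoted (advertisement) entries


def rerank(entries, top_n=48):
    """Drop advertisement entries and keep the first `top_n` hashtags.

    `entries` is a list of (hashtag, label) pairs in original rank order.
    This record layout is hypothetical.
    """
    kept = [tag for tag, label in entries if label != AD_LABEL]
    return kept[:top_n]


def covid_ratio(ranked, covid_tags):
    """Fraction of the re-ranked list occupied by COVID-related hashtags."""
    if not ranked:
        return 0.0
    return sum(1 for tag in ranked if tag in covid_tags) / len(ranked)


# Toy snapshot with one promoted entry at the third rank.
raw = [("t1", ""), ("t2", ""), ("promo", AD_LABEL), ("t3", ""), ("t4", "")]
ranked = rerank(raw, top_n=3)
ratio = covid_ratio(ranked, covid_tags={"t2", "t4"})
print(ranked, ratio)
```

Applying `covid_ratio` to every 5-minute snapshot would give a trajectory of the kind plotted in Fig. 2 (C).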
Fig. 3 illustrates the attention dynamics of the sub-categories of Mainland China by showing the quantity variations in Fig. 3 (A) (C) (E), paired with their correlation matrices with daily infections, deaths, and recoveries in Fig. 3 (B) (D) (F). As noted above, we have identified three periods in the investigated time interval: the first period is January 19 – February 11, separated by the huge peak in Fig. 1 from the second one (February 12 – March 12). The third period (March 13 – April 17) is separated from period 2 by the second vertical line, where the number of new infections has a local minimum (Fig. 1 inset).
In Table 1, we show the number of hashtags related to Mainland China in the different categories for the three periods. In Fig. 3, we show that the daily emergence of the categorized COVID-hashtags is dominated in the first two periods by Bad News, with increasing and decreasing trends in period 1 and period 2, respectively. In period 3, the categories Regulations, Life Influence, and Front Lines receive more attention than the rest of the categories. Here the consistently high values in Regulations and Life Influence could result from the worsening world pandemic situation along with the rise of imported infected cases in Mainland China, necessitating the establishment of measures to handle it. The categories of the Mainland China COVID-hashtags move with the number of infections and deaths in the world.
Figure 3 Time series of daily new hashtags from the sub-categories of Mainland China COVID-hashtags and their correlation matrices with daily new infections, deaths, and recoveries, in the three periods after the outbreak.
The patterns of the Pearson correlation matrix of the ten time series reflect a temporal structure with the three periods. Fig. 3 (B) shows a positive correlation block structure. There are strong correlations between New Death, Regulations, Science, and Bad News (upper left block) as well as between Supports, Good News and Front Lines (lower right block), and there is considerable anti-correlation between the two blocks. Fig. 3 (D) (period 2) exhibits much weaker correlations; in fact, very few elements of the matrix reach values beyond the noise level (see SI). Exceptions are new strong correlations between New Death and Front Lines, as well as Bad News and Front Lines. In the third period (Fig. 3 (F)) the block structure again becomes more pronounced, though not as pronounced as in the first period. Note that the categories had to be rearranged in order to achieve this structure. The major change is that Supports/Front Lines and Life Influence/Regulations have exchanged positions. In period 1, the Bad News (mainly infections and deaths) about domestic cases in Mainland China was flooding in; this led to the urgent establishment of regulations, which in turn influenced daily life. In period 3, the domestic situation was under control; therefore, the Bad News in Mainland China was mainly caused by the worsening international situation (infections/deaths and Chinese citizens coming back from abroad). The Regulations and the corresponding Life Influence related to these issues were then no longer strongly associated with domestic deaths. In period 3, the assisting front line doctors were gradually going back home after finishing their work and people expressed their gratitude to them, so that Front Lines and Supports were moving together.
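Each entry of the correlation matrices discussed above is a Pearson correlation between two percentage-change series. The sketch below shows one way to compute such an entry with the Python standard library only. The toy series are invented for illustration, and skipping time steps whose previous value is zero is an assumption of this sketch, since the text does not say how such steps were handled.

```python
from math import sqrt


def pct_change(series):
    """Percentage change (x[t+1] - x[t]) / x[t]; None where x[t] is zero."""
    changes = []
    for prev, cur in zip(series, series[1:]):
        changes.append((cur - prev) / prev if prev != 0 else None)
    return changes


def pearson(x, y):
    """Pearson correlation of two equally long series, skipping None pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented daily counts, for illustration only.
bad_news = [10, 20, 30, 15]
new_deaths = [5, 10, 15, 7.5]   # moves proportionally to bad_news
good_news = [30, 15, 7.5, 15]   # moves roughly in the opposite direction

r_same = pearson(pct_change(bad_news), pct_change(new_deaths))
r_opposite = pearson(pct_change(bad_news), pct_change(good_news))
print(r_same, r_opposite)
```

Repeating this for every pair of the ten series within one period would fill a matrix of the kind shown in Fig. 3 (B), (D) and (F).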
What is the effect of COVID-19 on the ranking dynamics? Fig. 4 shows a comparison of the rank diversity at the top 48 ranks for non-COVID and COVID hashtags in different periods. Striking differences are observed between the rank diversity plots before and after the outbreak. As Fig. 4 (A) suggests, the rank diversity plot before the outbreak was approximately linear with moderate fluctuations. A clear gap emerges in the rank diversity after rank 15 in Fig. 4 (B) during the COVID period. The rank diversity plots before the outbreak and after the outbreak considering only non-COVID hashtags resemble each other, except for the strange drops at ranks 29 and 34 in Fig. 4 (C). Comparing Fig. 4 (D) with Fig. 4 (B), the gap after rank 15 is larger in the rank diversity plot considering only COVID-related hashtags. The rank diversity in period 1 exceeds that in period 2 and period 3 for both non-COVID and COVID hashtags, as depicted in Fig. 4 (C) and (D), and the difference is much larger in the latter case.
Fig. 4 gives evidence that the COVID-hashtags cause the gap in the rank diversity plot after the outbreak. Taking the normalized rank diversity plot before the outbreak as a reference, a higher normalized rank diversity at a certain rank represents a higher number of unique hashtags occurring there within the observation period, so the COVID-related hashtags in the top 15 ranks change faster (with higher frequency) than normal. One possible explanation is that the COVID-hashtags kept emerging with higher frequency than before the outbreak and people paid much attention to these new hashtags. Additionally, when the flood of hashtags contained similar information, such as the new infections and deaths in different cities or provinces of China, the public interest in individual hashtags could drop quickly, resulting in a higher number of unique hashtags at certain ranks per unit time on the HSL.
This effect of higher rank diversity for higher ranks seems to be amplified by the algorithm, leading to the observed gap.
Figure 4 Rank diversity of the 48 ranks on the HSL before and after the COVID-19 outbreak. (A) Rank diversity taking all hashtags in our observation period before the outbreak, approximately linear except for the head and tail parts, with small fluctuations. (B) Rank diversity taking all hashtags after the outbreak, with strange points colored in red. A large gap occurs after the 15th rank. (C) Rank diversity taking all non-COVID hashtags in the three periods after the outbreak, showing strong resemblance to (A). (D) Rank diversity taking all COVID-hashtags in the three periods after the outbreak. The result in period 1 is higher than in period 2 and period 3, revealing a more dynamic change of the hashtags appearing on the HSL. The gap after rank 15 is more pronounced than in (B).
Strange drops of rank diversity at ranks 29 and 34 can also be seen in our plots in Fig. 4. As shown in the SI, there are hashtags that stay at ranks 29 and 34 for an unusually long time and then disappear from the HSL, indicating algorithmic intervention from Weibo. As one of the most popular and influential social media in China, Weibo might shoulder the responsibility during the global public health emergency of keeping people informed about related news in China and around the globe, by changing the algorithm for COVID-hashtags so as to promote crucial news, keep it updated in the top 15 positions, and have hashtags leave the list at rank 29 or 34. Our methods are sensitive enough to reveal this type of intervention. Therefore our observations reflect a combination of both spontaneous attention dynamics from the public and controlled effects from Sina Weibo.
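Rank diversity, as used above, counts the distinct hashtags that occupy a given rank during an observation window. The following sketch illustrates the counting on toy data. The input layout (one ranked list per snapshot) is an assumption, and dividing by the number of snapshots is only one plausible normalization, since the text does not spell out how the normalized rank diversity was computed.

```python
def rank_diversity(snapshots, n_ranks=48):
    """Number of distinct hashtags seen at each rank over all snapshots.

    `snapshots` is a list of ranked hashtag lists (index 0 is rank 1).
    """
    seen = [set() for _ in range(n_ranks)]
    for ranked in snapshots:
        for k, tag in enumerate(ranked[:n_ranks]):
            seen[k].add(tag)
    return [len(tags) for tags in seen]


def normalized_rank_diversity(snapshots, n_ranks=48):
    """Rank diversity divided by the number of snapshots.

    This normalization is an assumption of the sketch.
    """
    if not snapshots:
        raise ValueError("need at least one snapshot")
    return [d / len(snapshots) for d in rank_diversity(snapshots, n_ranks)]


# Toy list with three ranks: rank 1 is stable, ranks 2 and 3 keep changing.
snapshots = [
    ["a", "b", "c"],
    ["a", "c", "d"],
    ["a", "d", "e"],
]
diversity = rank_diversity(snapshots, n_ranks=3)
print(diversity)
print(normalized_rank_diversity(snapshots, n_ranks=3))
```

In this toy case the first rank is held by a single hashtag while the lower ranks turn over at every snapshot. A hashtag that sits at one rank for a long time lowers the count at that rank in the same way, which is the kind of drop reported at ranks 29 and 34.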
Figure 5 Attention decay. (A) Cumulative average highest rank of COVID-hashtags whose first appearance was in the time interval since December 31, 2019. (B) Cumulative average duration (hours). The inset shows a three-parameter exponential fit ($\alpha = 4.$…, $\beta = 0.$…/day, $\gamma = 5.$…) for the cumulative average duration decay after January 22, 2020.
Rank diversity captures attention dynamics from the point of view of the overall dynamical rank movements of the hashtags on the HSL. It is also interesting to follow the dynamics from the perspective of the individual hashtags. The average highest rank of a category of hashtags on a given day is characteristic of the attention paid to that category. (Note, of course, that getting onto the HSL already expresses considerable attention.) Similarly, the average duration is another measure of attention. However, in the latter case it should be mentioned that a short duration can be caused either by decaying attention to the general topic (in this case the hashtag is likely to be replaced by another from a different topic) or by the heavy stream of new hashtags on the same topic.
How do the average highest rank and the average duration accumulate with time? As Fig. 5 (A) shows, the cumulative average highest rank $H_j$ is initially at a top rank, indicating that the first few hashtags about the unknown pneumonia received a huge amount of attention from the public. As more COVID-related hashtags occurred, $H_j$ became lower, with a rapid change at the beginning and a slower change later, the two regimes being separated around January 30. This is due to the rapidly increasing number of COVID-related hashtags and the limited number of ranks on the HSL. In Fig. 5 (B), the first peak of the cumulative average duration $D_j$ is on January 8, when a hashtag appeared reporting that eight patients infected by the unknown pneumonia had recovered and been discharged from hospital.
Then $D_j$ first decreases and then increases again, reaching the second peak on January 22, after which the increasing number of daily new hashtags with short durations started to play a greater role than the few hashtags with long durations. The fast decay of $D_j$ in the period between January 22 and February 18 (see the inset in Fig. 5 (B)) was fitted by an exponential function:
$$f(t) = \alpha e^{-\beta t} + \gamma, \qquad (3)$$
with $\alpha = 4.$…, $\beta = 0.$…/day, $\gamma = 5.$… [...] $D_j$ exhibits a slower and longer decay.
In this work, we have studied the public attention dynamics on the biggest Chinese microblogging website Sina Weibo under the influence of the COVID-19 pandemic. We provide a novel approach to study and quantify the attention dynamics in terms of ranking dynamics, taking advantage of the real-time Hot Search List on Weibo. We have identified three periods within the investigated time interval and analyzed the attention dynamics on different COVID-related categories within them. We have compared the behavior of the hashtags before the outbreak with COVID-related and non-COVID hashtags after the outbreak. We have observed differences in the attention dynamics on the Weibo HSL in the different periods. First, the public attention is mainly driven by the infection and death situation in Mainland China, with mainly domestic cases at the beginning and internationally imported cases later. The attention variation follows major worldwide events. Second, the public attention on Weibo towards COVID-19 diversified into different sub-categories after the outbreak on January 19, 2020, with varying correlation patterns in the three phases. Third, the attention decays as the situation in China gets better. The cumulative average duration follows an exponential decay after the attention peak in the initial phase of the pandemic.
The situation in China is interrelated with the world pandemic situation, which keeps changing, so that the decay of public attention on the Chinese social media Weibo is not a clear-cut case. Fourth, the rank diversity at the top 15 ranks is higher than normal due to COVID-hashtags. The reason may be that Weibo treats COVID-hashtags with different algorithms than normal ones, or possibly a combined influence of both the Weibo algorithm and the spontaneous preference of the public for COVID-related information.
Besides exploring the attention dynamics on the Chinese social media Sina Weibo, we also studied the cumulative growth of all topics and all the COVID-topics on the Twitter trending list in the United States. As shown in the supplementary material Fig. SI1 (A), the cumulative number of all the Twitter trending topics in the United States is almost perfectly linear. The time period in which the cumulative number of all COVID-topics on the Twitter trending list increases is in accordance with the rising period of the number of hashtags in the Other Countries category in Fig. 2 (B). The similarity of the results on the Sina Weibo HSL and Twitter trending reflects that both platforms are influenced similarly by the major worldwide events during the COVID-19 pandemic. Although there are more daily new topics on the Twitter trending list than on the Weibo HSL, the number of COVID-topics on Twitter is much smaller. The topics on Twitter are generally shorter and have a broader meaning, for example, [...] 小区窗台演唱会庆祝解除隔离 (residential community holds windowsill concert to celebrate the lifting of quarantine) [...]
Supplementary information
Additional file 1. Supplementary information (PDF 2.4 MB)
Availability of data and materials
The datasets supporting the conclusions of this article are available in the Attention Dynamics Sina Weibo COVID19 repository, https://github.com/cuihaosabrina/Attention_Dynamics_Sina_Weibo_COVID19
Competing interests
The authors declare that they have no competing interests.
Funding
JK acknowledges partial support from the H2020 project SoBigData++ (ID: 871042).
Authors' contributions
HC and JK conceived the idea and designed the study. HC carried out the data collection; HC and JK did the data analysis. Both authors drafted the manuscript, and read and approved the final manuscript.
Acknowledgements
Thanks are due to Márton Karsai and Tiago Peixoto for suggestions.
References
1. Wu, F., Huberman, B.A.: Novelty and collective attention. Proc. Nat. Aca. Sci., 17599–17601 (2007)
2. Russell Neuman, W., Guggenheim, L., Mo Jang, S., Bae, S.Y.: The dynamics of public attention: Agenda-setting theory meets big data. Journal of Communication, 193–214 (2014)
3. Twitter Microblog and Social Network Service. https://about.twitter.com/. Accessed August 8, 2020.
4. Twitter: Research and Experiments. https://help.twitter.com/en/rules-and-policies. Accessed August 8, 2020.
5. Lehmann, J., Gonçalves, B., Ramasco, J.J., Cattuto, C.: Dynamical Classes of Collective Attention in Twitter. In: Proceedings of the 21st International Conference on World Wide Web (WWW), pp. 251–260 (2007)
6. Eom, Y.-H., Puliga, M., Smailovič, J., Mozetič, I., Caldarelli, G.: Twitter-based analysis of the dynamics of collective attention to political parties. PLoS ONE, 0131184 (2015)
7. Ko, J., Kwon, H.W., Kim, H.S., Lee, K., Choi, M.Y.: Model for twitter dynamics: Public attention and time series of tweeting. Physica A, 141–149 (2014)
8. Pen, T.-Q., Sun, G., Wu, Y.: Interplay between public attention and public emotion toward multiple social issues on twitter. PLoS ONE, 0167896 (2017)
9. Chew, C., Eysenbach, G.: Pandemics in the age of twitter: Content analysis of tweets during the 2009 H1N1 outbreak. PLoS ONE, 14118 (2010)
10. Signorini, A., Segre, A.M., Polgreen, P.M.: The use of twitter to track levels of disease activity and public concern in the U.S. during the influenza A H1N1 pandemic. PLoS ONE, 19467 (2011)
11. van Lent, L.G.G., Sungur, H., Kunneman, F.A., van de Velde, B., Das, E.: Too far to care? measuring public attention and fear for ebola using twitter. J. Med. Internet Res., 193 (2017)
12. Zavarrone, E., Grassia, M.G., Marino, M., Cataldo, R., Mazza, R., Canestrari, N.: CO.ME.T.A. – COVID-19 media textual analysis. A dashboard for media monitoring. https://arxiv.org/pdf/2004.07742.pdf. Accessed August 8, 2020.
13. Lopez, C.E., Vasu, M., Gallemore, C.: Understanding the perception of COVID-19 policies by mining a multilanguage Twitter dataset. https://arxiv.org/ftp/arxiv/papers/2003/2003.10359.pdf. Accessed August 8, 2020.
14. An Introduction to Sina Weibo: Background and Status Quo. Accessed August 8, 2020.
15. Tong, J., Zuo, L.: Weibo communication and government legitimacy in China: a computer-assisted analysis of Weibo messages on two ‘mass incidents’. Information, Communication and Society, 66–85 (2014)
16. Nip, J.Y.M., Fu, K.-w.: Networked framing between source posts and their reposts: an analysis of public opinion on China’s microblogs. Information, Communication and Society, 1127–1149 (2016)
17. Li, L., Zhang, Q., Wang, X., Zhang, J., Wang, T., Gao, T.-L., Duan, W., Tsoi, K.K.-f., Wang, F.-Y.: Characterizing the propagation of situational information in social media during COVID-19 epidemic: A case study on Weibo. IEEE Transactions on Computational Social Systems, 556–562 (2020)
18. Blumm, N., Ghoshal, G., Forró, Z., Schich, M., Bianconi, G., Bouchaud, J.-P., Barabási, A.-L.: Dynamics of ranking processes in complex systems. Physical Review Letters, 128701 (2012)
19. Criado, R., Garcia, E., Pedroche, F., Romance, M.: A new method for comparing rankings through complex networks: Model and analysis of competitiveness of major european soccer leagues. Chaos, 043114 (2013)
20. Morales, J.A., Sánchez, S., Flores, J., Pineda, C., Gershenson, C., Cocho, G., Zizumbo, J., Rodríguez, R.F., Iñiguez, G.: Generic temporal features of performance rankings in sports and games. EPJ Data Science, 33 (2016)
21. Weibo Reports First Quarter 2020 Unaudited Financial Results. http://ir.weibo.com/news-releases/news-release-details/weibo-reports-first-quarter-2020-unaudited-financial-results/. Accessed August 8, 2020.
22. Wang, Y.: An Introduction to Sina Weibo for Journalists. Accessed August 8, 2020.
23. Service, W.C.: Common Questions on the Rules of Real-time Hot-Search-List, Hot-Message-List and Hot-Topic-List. Accessed August 8, 2020.
24. Weibo Advertising. Accessed August 8, 2020.
25. National Health Commission of People’s Republic of China. Accessed August 8, 2020.
26. China confirms 15152 new coronavirus cases, 254 additional deaths. Accessed August 8, 2020.
27. Sajid, I.: China reports 99 new virus cases, majority imported. Accessed August 8, 2020.

SUPPLEMENTARY INFORMATION
Attention dynamics on the Chinese social media Sina Weibo during the COVID-19 pandemic
Hao Cui and János Kertész*
*Correspondence: [email protected]
Department of Network and Data Science, Central European University, Quellenstrasse 51, A-1100, Vienna, Austria

SI1 Twitter trending COVID-topics in the United States
Sina Weibo is the largest microblogging site in China, where Twitter, the most popular service of this kind worldwide, does not operate. It is natural to try to compare our observations made on Sina Weibo with Twitter attention dynamics. Unfortunately, Twitter offers no statistics comparable to the HSL. Instead, Twitter provides a service that reports the most retweeted hashtags of the last 24 hours, updated every minute and broken down by country [1]. We have chosen to study the US tweets.

Categorization of tweets has been widely investigated [2, 3], including recent attempts to analyze the impact of COVID-related topics [4] on Twitter by analyzing the sentiments toward 10 words related to COVID. Twitter even created a “COVID-19 stream” [5] to promote this type of research. Despite these efforts, a direct comparison of our results on Sina Weibo with Twitter is hindered by a number of factors, including the different characters of the listings, the different roles hashtags play in these services and the differences due to the scripts. Nevertheless, we tried to capture at least the overall trends (see Fig. SI1).
Figure SI1
Overview of the cumulative number of topics during the observation period on the Twitter trending list in the United States from January 1, 2020 to April 16, 2020. (A) Cumulative growth of all topics. (B) Cumulative growth of COVID-related topics.
Fig. SI1 (A) shows that the cumulative number of all Twitter trending topics in the United States grows almost perfectly linearly. As Fig. SI1 (B) shows, the number of COVID-topics on the Twitter trending list first grows very slowly and then starts to increase dramatically from late February 2020. The rate of COVID-related topics is, however, much smaller on the Twitter list than on that of Sina Weibo.
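The cumulative curves of Fig. SI1 could be computed along the following lines. This is a minimal sketch, not the code used for the figure: the input format (pairs of day and topic), the function name `cumulative_new_topics`, and the keyword test used to flag COVID-related topics are all assumptions made for illustration.

```python
from datetime import date
from itertools import accumulate


def cumulative_new_topics(observations, is_relevant=lambda topic: True):
    """Cumulative number of distinct topics by day of first appearance.

    observations: list of (day, topic) pairs, one per trending-list entry.
    is_relevant:  predicate selecting the topics to count (assumed filter).
    Returns (sorted days, cumulative counts aligned with those days).
    """
    first_seen = {}
    for day, topic in observations:
        if is_relevant(topic) and (topic not in first_seen or day < first_seen[topic]):
            first_seen[topic] = day
    days = sorted({day for day, _ in observations})
    new_per_day = {day: 0 for day in days}
    for day in first_seen.values():
        new_per_day[day] += 1
    return days, list(accumulate(new_per_day[day] for day in days))


# Toy data, invented for illustration only.
toy = [
    (date(2020, 1, 1), "#NewYear"),
    (date(2020, 1, 1), "#Football"),
    (date(2020, 1, 2), "#NewYear"),
    (date(2020, 1, 2), "#COVID19"),
    (date(2020, 1, 3), "#covid_testing"),
    (date(2020, 1, 3), "#Football"),
]
days, all_counts = cumulative_new_topics(toy)
# Hypothetical keyword rule; the paper does not specify how topics were labeled.
_, covid_counts = cumulative_new_topics(toy, lambda t: "covid" in t.lower())
print(all_counts, covid_counts)
```

A topic is counted once, on the day it first appears, so a linear curve corresponds to a constant daily inflow of new topics.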
SI2 Significance of correlations
To understand how the time series of daily new hashtags in the different categories move together, and whether there are blocks of categories that co-move, we presented the correlation matrices between the ten time series in the three periods after the outbreak. To assess the significance of the correlations we apply a null model, which is created by shuffling the times of the individual values, thus smearing out the correlations. Due to the finiteness of the time series, there is a non-zero background noise level in the null model, denoted by $Z$, which defines the background to which the measured real correlations can be compared. $Z$ is calculated by correlating 500 shuffled time series for each of the 10 categories. We observed that all the pairs have similar standard deviations, between about 0.16 and 0.2. We take a uniform value $Z = 0.2$, and only those correlation matrix elements $C_{ij}$ are presented for which $Z < |C_{ij}|$. Fig. SI2 shows the different Mainland China topical categories and their thresholded correlations in the three pandemic phases. In Fig. SI2 (B) most of the correlations are beyond the threshold, while in Fig. SI2 (D) very few are. In Fig. SI2 (F), though some values at the upper left and lower right corners are beyond the threshold, they are much weaker than in Fig. SI2 (B).

SI3 Categorized Sina Weibo hashtags and properties
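The shuffle-based significance test of Section SI2 can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the paper: the function names and the toy data are hypothetical, and the noise level is read here as the standard deviation of the Pearson correlations between independently shuffled series, which is one plausible interpretation of the description.

```python
import numpy as np


def shuffled_correlation_std(series, n_shuffles=500, seed=0):
    """Standard deviation of pairwise correlations under time shuffling.

    series: array of shape (n_categories, n_days), one row per category.
    Each row is permuted independently, which destroys any co-movement
    while keeping the distribution of values in every row unchanged.
    """
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    n = series.shape[0]
    samples = np.empty((n_shuffles, n, n))
    for s in range(n_shuffles):
        shuffled = rng.permuted(series, axis=1)  # independent shuffle per row
        samples[s] = np.corrcoef(shuffled)
    return samples.std(axis=0)


def threshold_correlations(series, z):
    """Correlation matrix with every entry |C_ij| <= z set to zero."""
    c = np.corrcoef(np.asarray(series, dtype=float))
    return np.where(np.abs(c) > z, c, 0.0)


# Toy data: 10 uncorrelated series of 30 daily values (invented for illustration).
toy = np.random.default_rng(1).normal(size=(10, 30))
noise = shuffled_correlation_std(toy)
kept = threshold_correlations(toy, z=0.2)  # Z = 0.2 as in the text
```

For a finite series the shuffled correlations scatter around zero with a width that shrinks as the series gets longer, which is why a common threshold can only be approximate when the periods differ in length.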
We showed in Fig. 4 of the main paper that the gap between the top 15 ranks and the rest of the ranks in the rank diversity plot after the outbreak is caused by the COVID-hashtags. To further understand the properties of COVID-hashtags and how they influenced the HSL hashtag dynamics, we compared the highest rank and duration distributions of the different COVID-categories with those of the non-COVID hashtags before and after the outbreak.

Fig. SI3 shows a detailed comparison of the highest rank and duration of the categorized Mainland China COVID-hashtags on the Weibo Hot Search List (HSL), before and after the COVID-19 outbreak. As Fig. SI3 (A) shows, most of the categories have a median highest rank close to 15. The Science and Bad News categories are generally ranked higher than the other categories. The median highest rank of the non-COVID hashtags after the outbreak is the same as that of the hashtags before the outbreak (rank 19), while the median highest rank of the COVID-hashtags is higher than both (rank 16). Fig. SI3 (B) shows the lifetime duration of the different categories. The median duration of most of the categories is less than 3.5 hours. The Science category has the longest duration among all categories. Non-COVID hashtags after the outbreak (3.95 hours) and hashtags before the outbreak (3.80 hours) have similar duration distributions. The COVID-hashtags generally have a shorter duration (3.21 hours) than the non-COVID hashtags.
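The two per-hashtag quantities compared in Fig. SI3 could be extracted from list snapshots roughly as below. This is a sketch under assumptions: the snapshot format, the function names, the sampling interval, and the definition of duration as the number of snapshots containing the hashtag times that interval are all hypothetical choices, not taken from the paper.

```python
from collections import defaultdict
from statistics import median


def highest_rank_and_duration(snapshots, interval_hours):
    """Per-hashtag best rank and time on the list.

    snapshots: iterable of (snapshot_index, rank, hashtag).
    Rank 1 is the top of the list, so the highest rank is the smallest number.
    Duration is approximated as (snapshots containing the hashtag) * interval.
    """
    best = {}
    seen = defaultdict(set)
    for t, rank, tag in snapshots:
        best[tag] = min(rank, best.get(tag, rank))
        seen[tag].add(t)
    return {tag: (best[tag], len(seen[tag]) * interval_hours) for tag in best}


def group_medians(stats, group_of):
    """Median highest rank and median duration for each group of hashtags."""
    grouped = defaultdict(list)
    for tag, values in stats.items():
        grouped[group_of(tag)].append(values)
    return {
        group: (median(r for r, _ in vals), median(d for _, d in vals))
        for group, vals in grouped.items()
    }


# Toy snapshots, invented for illustration; the 0.5 h interval is an assumption.
toy = [(0, 5, "a"), (1, 3, "a"), (2, 4, "a"), (0, 20, "b"), (1, 10, "c"), (2, 12, "c")]
stats = highest_rank_and_duration(toy, interval_hours=0.5)
medians = group_medians(stats, lambda tag: "covid" if tag in {"a", "c"} else "other")
```

Grouping is delegated to a caller-supplied function so that the same routine serves any categorization, for example COVID versus non-COVID or the individual sub-categories.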
SI4 Hashtag rank trajectory examples
In the main paper, we saw unexpected drops in the rank diversity plot at ranks 29 and 34 after the outbreak. This implies that the number of unique hashtags occurring at these ranks in a given time interval is smaller than usual, so there should be hashtags staying there for an unusually long time. Here we present examples of normal and abnormal hashtag rank trajectory plots, and verify that there are hashtags that stay at certain ranks, such as ranks 29 and 34, on the HSL for a strangely long time without any fluctuation.

Figure SI2
Topical correlations of Mainland China COVID-categories in three periods. Correlations lower than 0.2 are considered insignificant and are converted to zero.

Figure SI3
Boxplots of the highest rank (A) and duration (B) of the different categories. In both plots, the purple categories are sub-categories of the Mainland China category, which is colored in blue. The blue categories are sub-categories of the Total COVID category, which is colored in orange. The dots in (B) are the outlier hashtags with long duration.
Figure SI4
Examples of rank trajectory plots of COVID-related hashtags. (A), (C), (E) Abnormal rank trajectory plots. (B), (D), (F) Normal rank trajectory plots.
Fig. SI4 shows examples of abnormal and normal rank trajectory plots of COVID-related hashtags on the Weibo HSL. In Fig. SI4 (A), (C), (E), the hashtags stay for a strangely long time at ranks 29 and 34, and then disappear from the HSL. Fig. SI4 (B), (D), (F) show relatively natural fluctuations in the rank trajectories. The example hashtags and their translations are shown in Table SI1. The abnormal rank plots are likely due to algorithmic intervention by Sina Weibo.
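The link between a drop in rank diversity and hashtags frozen at one rank can be illustrated with a short sketch. The snapshot format and both function names are assumptions, rank diversity is taken here simply as the number of unique hashtags seen at a rank (without any normalization), and consecutive entries of a trajectory are assumed to come from consecutive snapshots.

```python
from collections import defaultdict
from itertools import groupby


def rank_diversity(snapshots):
    """Number of distinct hashtags observed at each rank.

    snapshots: iterable of (snapshot_index, rank, hashtag).
    """
    tags_at_rank = defaultdict(set)
    for _, rank, tag in snapshots:
        tags_at_rank[rank].add(tag)
    return {rank: len(tags) for rank, tags in tags_at_rank.items()}


def longest_constant_run(trajectory):
    """Longest stretch of identical consecutive ranks in one trajectory.

    trajectory: list of ranks of a single hashtag in time order.
    Returns (rank, run_length); the first run wins in case of ties.
    """
    best_rank, best_len = None, 0
    for rank, run in groupby(trajectory):
        length = sum(1 for _ in run)
        if length > best_len:
            best_rank, best_len = rank, length
    return best_rank, best_len


# Toy trajectories, invented for illustration only.
frozen = [30, 29, 29, 29, 29, 31]   # stays at rank 29 for four snapshots
natural = [12, 9, 7, 8, 11]         # fluctuates at every snapshot
print(longest_constant_run(frozen), longest_constant_run(natural))
```

A hashtag with a long constant run occupies its rank for many snapshots, so fewer distinct hashtags can appear there in the same interval, which lowers the rank diversity at that rank.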
Table SI1
Chinese original and translations of example hashtags in Figure SI4.