arXiv:2006.12218v3 [cs.SI] 14 Aug 2020


Risk Communication in Asian Countries: COVID-19 Discourse on Twitter

Sungkyu Park¹,²*, Sungwon Han¹,²*, Jeongwook Kim¹,²*, Mir Majid Molaie¹, Hoang Dieu Vu³, Karandeep Singh², Jiyoung Han¹, Wonjae Lee¹, Meeyoung Cha²,¹

¹ Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea

² Institute for Basic Science (IBS), Daejeon, South Korea

³ Hanoi University of Science and Technology (HUST), Hanoi, Vietnam

Abstract

COVID-19 has become one of the most widely talked about topics on social media. This research characterizes risk communication patterns by analyzing the public discourse on the novel coronavirus from four Asian countries: South Korea, Iran, Vietnam, and India, which suffered the outbreak to different degrees. The temporal analysis shows that the official epidemic phases issued by governments do not match well with the online attention on COVID-19. This finding calls for a need to analyze the public discourse by new measures, such as topical dynamics. Here, we propose an automatic method to detect topical phase transitions and compare similarities in major topics across these countries over time. We examine the time lag difference between social media attention and confirmed patient counts. For dynamics, we find an inverse relationship between the tweet count and topical diversity.

1 Introduction

The novel coronavirus pandemic (COVID-19) has affected global health and the economy. The use of social media and the Internet to seek and share information about the virus has increased massively (Lazer et al., 2018), making them an excellent medium for examining the patterns of risk communication during a pandemic. Unfortunately, both unconfirmed information and intentionally spread false information can be seen on these platforms, jeopardizing public health. Studies have shown that people tend to share misinformation faster and more profoundly than real information (Vosoughi et al., 2018; Kwon et al., 2013; Kim et al., 2018). The sheer amount of information, and the mixture of right and wrong within it, leaves people confused about which health and safety guidelines to follow. A new term, Infodemic, which combines information and pandemic, has been introduced to describe this phenomenon¹. In practice, the Infodemic has already had a heavy impact on society. For instance, the rhetoric of misinformation on COVID-19 has shifted from false preventive measures to the anti-vaccination movement (Stecula et al., 2020) and vandalism towards telecommunication infrastructure (Ahmed et al., 2020).

* Equal contribution. Correspondence to: Meeyoung Cha <[email protected]>.

In this research, we gathered data from social media to understand public discourse on the pandemic. Understanding the public concern will help determine which unproven claims or misinformation to debunk first, which contributes to fighting the disease. Primarily, we aim to discern what people say in the wild. For instance, identifying prevalent misinformation in a handful of countries first can help debunk the same piece of misinformation in other countries before the misinformation becomes a dominant topic and poses a threat to public health.

To detect timely topics by phase, we first need to decide on temporal phases that reflect real events and prevailing circumstances well. If the epidemic phases issued by each government are credible, we can use them directly. Otherwise, we can take an alternative approach in which we decide the temporal phases ourselves and extract the topics corresponding to them by utilizing computational natural language processing methods. Based on those topics, we provide implications of the unique traits of risk communication locally and globally. This attempt helps to alleviate the propagation of false claims that can threaten public safety amid the COVID-19 pandemic. Note that the developed code can be accessed via Multimedia Appendix 1 and a GitHub page². To this end, we have set up the following four research questions.

• Can the official epidemic phases issued by governments reflect online interaction patterns?

• How can topical phases be divided automatically with a bottom-up approach?

• What are the major topics corresponding to each topical phase?

• What are the unique traits of the topical trends by country, and are there any notable online communicative characteristics that can be shared?

¹ Coronavirus Disease 2019 (COVID-19) Situation Report. https://bit.ly/2SKCl8X.

² The crawled Twitter dataset and detailed information about the language-specific tokenizers are explained at https://github.com/dscig/COVID19_tweetsTopic.

2 Related Research

Issue Attention Cycle. The issue attention cycle model provides a pertinent theoretical framework for our analyses (Downs, 1972). The model conceptualizes how an issue rises into and fades away from the center of public attention. In the first stage, labeled the pre-problem stage, an undesirable social condition (e.g., the appearance of COVID-19) emerges but has not yet captured much public attention. The second stage, alarmed discovery and euphoric enthusiasm, occurs when a triggering event (e.g., a national spike in newly confirmed cases of COVID-19 or a WHO statement on COVID-19) heightens public awareness of the issue. In the third stage, realizing the cost of significant progress, people begin to recognize the hardship involved: solving the problem requires a significant restructuring of society and significant sacrifices by some groups in the population. This causes a gradual decline of intense public interest, the fourth stage. In the final stage, the post-problem stage, the current issue is replaced by a new one and moves into a twilight zone of lesser public attention.

Not all issues follow the five stages of the issue attention cycle (Nisbet and Huge, 2006). As the cyclical patterns of public attention evolve, a wide array of public discourse has been found across multiple issues, including climate change (McComas and Shanahan, 1999), emerging technologies (Anderson et al., 2012; Wang and Guo, 2018), and public health risks (Shih et al., 2008; Arendt and Scherr, 2019). There are also cultural differences (Jung Oh et al., 2012). Despite these fragmented findings, the issue attention cycle provides insights into how public attention dramatically waxes and wanes. An issue that has gone through the cycle differs from issues that have not in at least two respects. First, while the issue held national prominence, new institutions, programs, and measures would have been developed to deal with the situation. These entities are likely to persist even after public attention has shifted elsewhere, thus having lasting societal impacts. Second, these entities' prolonged impacts are subject to what was heavily discussed when the issue was the primary public concern.

In this regard, scholars need to look at the specifics of public conversations about a target issue. Although the issue attention cycle was initially conceived for traditional media, including newspapers and television, there is a burgeoning literature applying the model to social media platforms. Most notably, the public is increasingly turning to Twitter for information seeking and sharing without a gate-keeping process (David et al., 2016). Twitter conversations, as such, resonate more with real-world word-of-mouth. It is not uncommon for journalists to refer to social media in their news stories. Research consistently finds that Twitter takes the initiative and exerts greater control over public discourse, especially in the early stages of the issue attention cycle (Jang et al., 2017; Wang and Guo, 2018). Building on these prior studies, we analyze the volume of Twitter conversations about COVID-19 to demonstrate an issue attention cycle on a social media platform.

COVID-19-related Analyses. Many studies have looked into the impact of the pandemic on various aspects. Some researchers have focused on predicting the transmissibility of the virus. One work estimated the basic reproduction number (R0) of the virus and showed that the R0 of SARS-CoV-2 already appears to be larger than that of SARS-CoV, the cause of the SARS outbreak first found in the Guangdong province of China in 2002 (Liu et al., 2020). Another work, using a stochastic mathematical prediction model of the infection dynamics, claims that reducing travel worldwide by 90% could significantly reduce the spread of the epidemic (Chinazzi et al., 2020).

Other lines of work concern the propagation of misinformation related to COVID-19. One study applied an epidemic model to the spread of COVID-19 misinformation on social media platforms such as Twitter, Instagram, YouTube, Reddit, and Gab; it showed that users interact with each other and consume information differently depending on the platform (Cinelli et al., 2020). In this light, media platforms like Facebook, YouTube, and Twitter claim to be trying to direct people to reliable sources of medical information. To do so, they maintain direct communication lines with the CDC and WHO (Frenkel et al., 2020).

When narrowing down to country-specific matters, one article claims, based on a qualitative analysis of Japanese online news articles, that fake news online in Japan has led to xenophobia towards patients and Chinese visitors (Shimizu, 2020). Meanwhile, another work surveyed 300,000 online panel members in South Korea in 2015, while the MERS outbreak was prevalent in the country, and claimed that if the information from public health officials is untrustworthy, people rely more on online news outlets and communicate more via social media (Jang and Baek, 2019).

Another report argued that the public could not appreciate the information shared by public health officials due to prevalent misinformation on fake cures and conspiracy theories (Analytica). The efficacy of the response to this Infodemic varies from country to country and depends on public confidence in the authorities. One study compared three countries in terms of political bias. The authors conducted a large-scale survey across the US, the UK, and Canada. They found that, although political polarization over COVID-19 exists in the US and Canada, holding accurate beliefs about COVID-19 is broadly related to the quality of an individual's reasoning skills, regardless of political ideology (Pennycook et al., 2020).

Many datasets have been released to the public as well as to research communities. One study crawled and released tweets in ten languages matching COVID-19-relevant keywords over around three months (Chen et al., 2020). Another work collated over 59K academic articles, including over 47K full research papers, about COVID-19, SARS-CoV-2, and related coronavirus issues (Wang et al., 2020).

3 Method

3.1 Data

We crawled messages (tweets) from Twitter by using the Twint Python library³ and the Search APIs⁴.

³ An advanced Twitter scraping tool written in Python. Detailed information about the scraper is available at https://github.com/twintproject/twint.

⁴ Official search tweets API for developers. The full-archive endpoint option provides complete access to tweets from the first tweet in March 2006. See also https://developer.twitter.com/en/docs/tweets/search/.

Language | Duration | Keywords used† | # of tweets
Korean | Jan 1 to Mar 27, 2020 | Corona, Wuhan pneumonia | 1,447,489
Farsi | Jan 1 to Mar 30, 2020 | #Corona, #Coronavirus, #Wuhan, #pneumonia | 459,610
Vietnamese | Jan 1 to Mar 31, 2020 | corona, n-cov, covid, acute pneumonia | 87,763
Hindi | Jan 1 to Mar 31, 2020 | Corona, Wuhan pneumonia | 1,373,333

† Keywords are listed here after translation into English from the actual local languages, e.g., “코로나” → “Corona” in Korean.

Table 1: Statistics of the crawled tweets.

Our focus is on South Korea, Iran, Vietnam, and India. Because these are all Asian countries, we can control for the significant differences observed on social media between Western and Asian cultures (Cho and Park, 2013; Li et al., 2018). These four countries each followed a distinct course during the current outbreak. In Iran, the number of confirmed cases has gradually increased since the first confirmed case, whereas in Vietnam, the numbers have consistently stayed (relatively) low. There was an abrupt increase in the numbers after the first confirmed case in Korea, but the country seems to have successfully flattened the rising curve of confirmed cases, unlike other countries. In India, the situation had been relatively mild until mid-March; since then, there has been a drastic surge.

We set up two keywords, “Corona” and “Wuhan pneumonia,” to crawl tweets (see Table 1 for the exact keywords used for each country) and collected tweets for the three months from January to March 2020.

3.2 Pipeline for Detecting Topical Phases then Extracting Topics

The pipeline includes the following four modules, which eventually extract and label the major topics for each phase, as shown in Figure 1. This process is repeated for all four countries.

Figure 1: The pipeline of the topic analysis.

Preprocessing Data. To extract topics from the collected tweets with NLP, we first need to tokenize the data, that is, to convert the data into the smallest units that carry meaning. We filtered out unnecessary textual information such as stop words, special characters (non-letters), special commands, and emojis. We then utilized existing Python tokenizer libraries corresponding to each specific language. Detailed information about the language-specific tokenizers is explained at the provided web link.
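The filtering step described above can be sketched with the Python standard library. This is only an illustration under stated assumptions: the stop-word list is a hypothetical English placeholder, "special commands" is read here as retweet markers and @-mentions, and whitespace splitting stands in for the language-specific tokenizers that the study actually used.

```python
import re

# Hypothetical stop-word list; real lists are language-specific.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "and", "in"}

URL_RE = re.compile(r"https?://\S+")
# Assumption: "special commands" means retweet markers and @-mentions.
COMMAND_RE = re.compile(r"(?:^|\s)(?:RT\b|@\w+)")
# Anything that is not a letter or whitespace: punctuation, emojis, digits, "_".
NON_LETTER_RE = re.compile(r"[^\w\s]|[\d_]")


def preprocess(tweet):
    """Strip URLs, commands, and non-letters, then split and drop stop words."""
    text = URL_RE.sub(" ", tweet)
    text = COMMAND_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    # Whitespace splitting is a stand-in for a language-specific tokenizer.
    tokens = text.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]


print(preprocess("RT @user: The #Corona outbreak is spreading!! 😷 https://t.co/abc"))
```

Because `\w` matches Unicode letters, the same filter leaves Korean, Farsi, Vietnamese, and Hindi letters in place; only the final splitting step would need to be swapped for a proper tokenizer.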

Decide Topical Phases. The next step is to set up specific target phases, divided by dates, from which to extract topics. This is nontrivial since the topics fluctuate and change repeatedly, reflecting real events such as an increase in COVID-19 patients. Furthermore, we do not use the epidemic phases announced by governments since these offline phases do not seem to capture the actual online topic trends, as explained in the Basic Daily Trends section.

We, therefore, devise a bottom-up approach to find the dates that show signs of a sudden increase in the daily volume of tweets. We set up two learnable parameters, the first derivative (hereafter velocity) and the second derivative (hereafter acceleration) of the daily tweet volume, as given in the formulas below, where D is a day, t is a target date, and t − 1 is the date one day before t.

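\begin{equation}
\begin{aligned}
\text{velocity}_t &= \frac{\#\,\text{of tweets}_t - \#\,\text{of tweets}_{t-1}}{D_t - D_{t-1}}, \\
\text{acceleration}_t &= \frac{\text{velocity}_t - \text{velocity}_{t-1}}{D_t - D_{t-1}}
\end{aligned}
\tag{1}
\end{equation}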

We took the velocity and acceleration values on the date when the first confirmed case was announced as the ground truths (GT) for each country. The intuition behind this approach is that the velocity and acceleration values are proxies for the unique communication traits of each country with respect to a specific subject (i.e., COVID-19 in our case). Once they have been computed from the first confirmed date, they are assumed to stay the same for the following period.

We set joint thresholds on velocity and acceleration to find the dates on which velocity is still smaller than velocity_GT while acceleration has become larger than acceleration_GT: 0 < velocity < velocity_GT and acceleration > acceleration_GT. In this way, we learn two parameters from the first confirmed date in each country and then detect other dates that can be conjectured to be the start of forthcoming topical phases. When learning the parameters, we round down the velocity_GT value and add 1 for velocity, and we round down the acceleration_GT value for acceleration. This is similar to the loss minimization concept of machine-learning approaches, except that the learning process finishes in one step.
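A minimal sketch of the thresholding rule, assuming one observation per day so that the day difference in the denominator is 1. The function names, `first_case_idx`, and the sample counts are hypothetical, and the smoothing step is omitted; the sketch only returns candidate dates that satisfy the joint condition.

```python
import math


def derivatives(daily_counts):
    """Return (velocity, acceleration) lists aligned with daily_counts.

    Assumes consecutive days, so each day difference equals 1.
    """
    n = len(daily_counts)
    velocity = [None] * n
    acceleration = [None] * n
    for t in range(1, n):
        velocity[t] = daily_counts[t] - daily_counts[t - 1]
    for t in range(2, n):
        acceleration[t] = velocity[t] - velocity[t - 1]
    return velocity, acceleration


def detect_phase_starts(daily_counts, first_case_idx):
    """Learn thresholds at the first confirmed date, then flag candidate dates."""
    velocity, acceleration = derivatives(daily_counts)
    v_gt = math.floor(velocity[first_case_idx]) + 1  # round down, then add 1
    a_gt = math.floor(acceleration[first_case_idx])  # round down
    starts = [
        t
        for t in range(2, len(daily_counts))
        if 0 < velocity[t] < v_gt and acceleration[t] > a_gt
    ]
    return v_gt, a_gt, starts


# Hypothetical daily tweet counts; index 3 is the first confirmed date.
counts = [10, 12, 15, 30, 60, 70, 60, 65, 90]
print(detect_phase_starts(counts, first_case_idx=3))  # (16, 12, [7])
```

In this toy series, day 7 is flagged because the volume has just started to rise again (velocity 5) after a drop, which yields a large acceleration (15) while the velocity is still below the learned threshold.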

We adopted a low-pass filter with 0.2 as the low-frequency threshold to remove noisy signals and smooth the data. Finally, the temporal data was divided into topical phases (see Appendix 1 for the computed daily velocity and acceleration trends and the phases decided accordingly by country).

Extract Topics – Model Topics. We utilized latent Dirichlet allocation (LDA) for the topic modeling task. LDA is a well-known machine-learning method for extracting topics from given textual documents (i.e., a collection of discrete data points), which are tweets in our case (Ostrowski, 2015). LDA generates and maximizes the joint probability between the word distribution of topics and the topic distribution of documents (Blei et al., 2003). The number of topics for each phase is a hyperparameter. We varied the number of topics between 2 and 50 and calculated the perplexity (PPL) for each setting; PPL reflects how many tokens could plausibly be placed at the next step (i.e., it indicates the ambiguity of the possible next token). PPL is a well-known metric for optimizing a language model during training (Adiwardana et al., 2020). During the iteration, we fixed the minimum required frequency of words among all tweets of each phase to 20 and the number of epochs for each topic to 100. We then decided the optimum number of topics for each phase by choosing the setting with the minimum PPL.
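The selection step can be illustrated as follows, assuming the standard definition of perplexity as the exponential of the negative average per-token log-likelihood. The log-likelihood values below are hypothetical placeholders for what a fitted LDA model would report for each candidate number of topics; fitting the model itself is outside this sketch.

```python
import math


def perplexity(total_log_likelihood, n_tokens):
    """Standard perplexity: exp of the negative mean per-token log-likelihood."""
    return math.exp(-total_log_likelihood / n_tokens)


def pick_num_topics(log_likelihood_by_k, n_tokens):
    """Return the candidate topic count with the lowest perplexity."""
    ppl = {k: perplexity(ll, n_tokens) for k, ll in log_likelihood_by_k.items()}
    best_k = min(ppl, key=ppl.get)
    return best_k, ppl


# Hypothetical log-likelihoods for three candidate topic counts on 1,000 tokens.
scores = {2: -7000.0, 10: -6500.0, 20: -6800.0}
best_k, ppl = pick_num_topics(scores, n_tokens=1000)
print(best_k)  # 10
```

In the study the candidates range from 2 to 50, so the dictionary would hold one entry per value in that range for each phase.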

As a result, we decided on the number of topical phases and the corresponding optimized number of topics for each phase, as presented in Table 2. For example, in the case of South Korea, after the number of phases was decided as four by the 'Decide Topical Phases' module, the optimized number of topics was computed for each phase as 2, 41, 15, and 43, respectively.

South Korea (velocity: 274 tweets/day; acceleration: 109 tweets/day²)
Row | Phase 0 | Phase 1 | Phase 2 | Phase 3
Time period | Jan 1-19 | Jan 20-Feb 12 | Feb 13-Mar 9 | Mar 10-27
# of users per day | 14.06 | 2,415.52 | 5,376.769 | 5,577.88
# of tweets per day (A) | 28.17 | 5,244.09 | 17,796.08 | 13,095.65
# of retweets per day (B) | 21.78 | 56,809.78 | 211,310.89 | 147,759.41
Tweet depth (B/A) | 0.77 | 10.83 | 11.87 | 11.28
# of topics (PPL → major → merged) | 2→1→1 | 41→18→8 | 15→6→5 | 43→21→11

Iran (velocity: 1,724; acceleration: 787)
Row | Phase 0 | Phase 1
Time period | Jan 1-Feb 18 | Feb 19-Mar 30
# of users per day | 245.34 | 1,442.46
# of tweets per day (A) | 385.63 | 5,272.04
# of retweets per day (B) | 1,315.13 | 22,128.76
Tweet depth (B/A) | 3.41 | 4.20
# of topics (PPL → major → merged) | 3→3→3 | 5→4→6

Vietnam (velocity: 49; acceleration: 23)
Row | Phase 0 | Phase 1 | Phase 2 | Phase 3 | Phase 4 | Phase 5
Time period | Jan 1-20 | Jan 21-25 | Jan 26-Feb 15 | Feb 16-Mar 4 | Mar 5-22 | Mar 23-31
# of users per day | 3.79 | 131.25 | 179.65 | 485.59 | 340.65 | 433.29
# of tweets per day (A) | 7.37 | 218.50 | 686.60 | 1,238.77 | 1,089.94 | 1,224.00
# of retweets per day (B) | 0.21 | 20.75 | 159.80 | 582.29 | 192.24 | 201.86
Tweet depth (B/A) | 0.03 | 0.09 | 0.23 | 0.47 | 0.18 | 0.16
# of topics (PPL → major → merged) | 19→1→1 | 3→1→2 | 6→3→4 | 46→22→7 | 48→20→10 | 16→4→2

India (velocity: 783; acceleration: 285)
Row | Phase 0 | Phase 1 | Phase 2
Time period | Jan 1-29 | Jan 30-Mar 9 | Mar 10-Mar 31
# of users per day | 107.41 | 1,364.95 | 13,318.63
# of tweets per day (A) | 269.72 | 4,261.13 | 58,924.55
# of retweets per day (B) | 415.69 | 14,467.8 | 318,368.05
Tweet depth (B/A) | 1.54 | 3.40 | 5.40
# of topics (PPL → major → merged) | 3→1→3 | 50→22→5 | 47→22→9

Table 2: The extracted # of phases by country and the optimized # of topics within. First row by country: Time Period; Second: # of Users per Day; Third: # of Tweets per Day (A); Fourth: # of Retweets per Day (B); Fifth: Tweet Depth (B/A); Sixth: Optimized # of Topics based on PPL → Major (i.e., 75% percentile) Topics → Final # of Merged Theme Labels from Human Annotators.

Extract Topics – Label Topics. This step involves labeling the main themes of the extracted topics. This allocates semantic meaning to each topic and allows us to analyze the semantic trends. We first sorted all topics by the number of tweets assigned to them in descending order (i.e., topics with larger tweet volumes are listed first) and discarded the minor topics that accounted for less than the 25% percentile of all tweets.
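The text does not spell out the exact cutoff rule, so the sketch below shows one possible reading only: topics are ranked by tweet volume and kept until they cover 75% of all tweets, which discards the tail that makes up the remaining share. The function name, the `coverage` parameter, and the sample counts are hypothetical.

```python
def major_topics(tweets_per_topic, coverage=0.75):
    """Keep the largest topics until they cover `coverage` of all tweets.

    Assumption: this cumulative-share rule is one interpretation of the
    percentile cutoff, not a confirmed description of the original code.
    """
    total = sum(tweets_per_topic.values())
    ranked = sorted(tweets_per_topic.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0
    for topic, count in ranked:
        if cumulative >= coverage * total:
            break
        kept.append(topic)
        cumulative += count
    return kept


# Hypothetical tweet counts for five topics in one phase.
print(major_topics({0: 500, 1: 300, 2: 120, 3: 50, 4: 30}))  # [0, 1]
```

Under this reading, a skewed phase keeps far fewer than three quarters of its topics, since a few large topics already account for most of the tweets.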

We then extracted the 1K most retweeted tweets and the 30 most probable keywords for each topic. We provided these datasets to domain experts from each country and asked them to label the theme of each topic based on the given data. Via qualitative coding, similar or hierarchical topics were merged into a higher category. The final count of themes is shown as the last number in the sixth row for each country in Table 2. In addition, if one topic corresponds to several themes, it is labeled with multiple classes. The maximum number of themes within a topic was two, and in that case each theme was weighted as 0.5 when plotting the daily trends of the tweet counts.
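The weighting rule for multi-theme topics can be sketched as below. The input shapes, the function name, and the theme labels are hypothetical; the only rule taken from the text is that a topic with two themes contributes 0.5 to each theme per tweet.

```python
from collections import defaultdict


def daily_theme_counts(tweets, topic_themes):
    """Aggregate weighted tweet counts per day and theme.

    tweets: iterable of (date, topic_id) pairs.
    topic_themes: mapping from topic_id to a list of one or two theme labels.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for day, topic_id in tweets:
        themes = topic_themes[topic_id]
        weight = 1.0 / len(themes)  # 1.0 for one theme, 0.5 each for two
        for theme in themes:
            counts[day][theme] += weight
    return {day: dict(by_theme) for day, by_theme in counts.items()}


# Hypothetical labels: topic 0 has one theme, topic 1 has two.
topic_themes = {0: ["news_confirmed"], 1: ["politics", "news_economy"]}
tweets = [("2020-02-18", 0), ("2020-02-18", 1), ("2020-02-18", 1), ("2020-02-19", 1)]
print(daily_theme_counts(tweets, topic_themes))
```

With this weighting, the per-day totals across themes still add up to the number of tweets, so the stacked daily trends remain comparable to the raw tweet counts.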

Concerning the local/global news themes, we narrowed down the labels since people talk about different issues under the news category. We sub-labeled them as _confirmed if tweets are about confirmed/death cases, _hate if about hate crimes towards individual races, _economy if about economic situations/policies, _cheerup if about supporting each other, _education if about when to reopen schools, and none if about general information.

4 Result

4.1 Basic Daily Trends

We first depict the trends by plotting the daily number of tweets (see Figure 2). We then plot the daily tweet counts together with the daily number of confirmed COVID-19 cases by country, as depicted in Figures 3–6; in addition to these two trends, we include the official epidemic phases announced by each government as vertical lines. Comparing the two trends confirms that the tweet trends are associated with the confirmed case trends. Yet, the epidemic phases do not explain the tweet trends accurately.

South Korea. The first COVID-19 patient was reported on January 20, 2020⁵. This explains why the tweet count remains relatively low during early January and increases only after late January (see Figure 3). On January 25, the Korean government issued a travel warning for Wuhan and Hubei province and suggested the evacuation of Korean citizens from those areas, which was heavily discussed on Twitter. On February 18, the numbers increased sharply due to the 31st confirmed case, which was related to a cult religious group, "Shincheonji," in Daegu city. After the 31st confirmed case was found, the quarantine authority carried out rigorous testing focused on Daegu, and the number of confirmed cases increased drastically until mid-March. The tweet trends follow the same pattern. However, the official epidemic phases announced by the government, marked by the vertical dashed lines in the figure, seem to lag behind the increase in the number of tweets. This pattern shows that the official epidemic phases do not match well with online attention.

⁵ COVID-19 pandemic in Korea. Wikipedia 2020. URL: https://bit.ly/3fy4SZp.

Figure 2: Daily trends in the four countries: the X-axis shows dates and the Y-axis shows the number of tweets on a log scale.

Figure 3: Daily trends in South Korea: start/end dates of the official epidemic phases (vertical dashed lines), number of tweets (blue lines), and number of confirmed cases (red bars).

Other countries. We repeated the same analysis for the three other countries, as shown in Figures 4–6 (see Appendix 2 for detailed explanations by country).

4.2 Extracted Topical Trends

We utilized the theme labels acquired from the 'Label Topics' module on a daily basis and analyzed the topic changes across time with plots for the four target countries. One plot shows daily trends based on the number of tweets, while another shows the trends based on the number of tweets that mention country names such as the U.S. Overall, as people talk more about the COVID-19 outbreak (i.e., as the daily number of tweets increases), the topics they talk about become less diverse.

Figure 4: Daily trends in Iran: start/end dates of the official epidemic phases (vertical dashed lines), number of tweets (blue lines), and number of confirmed cases (red bars).

Figure 5: Daily trends in Vietnam: start/end dates of the official epidemic phases (vertical dashed lines), number of tweets (blue lines), and number of confirmed cases (red bars).

Figure 6: Daily trends in India: start/end dates of the official epidemic phases (vertical dashed lines), number of tweets (blue lines), and number of confirmed cases (red bars).


South Korea. The data yielded a total of four topical phases, which are used in Figure 7. Phase 0 has no related topic. From Phase 1 to Phase 3, the number of topics varied as 8, 5, and 11. In Phase 1, people talked much about personal thoughts and opinions linked to the current outbreak, and they cheered each other up. In Phase 2, as the crisis rose to its peak, people talked less about personal issues and mainly talked about political and celebrity issues. The political issues concerned South Korea closing its borders to China and other countries closing theirs to Korea. In Phase 3, as the daily number of tweets became smaller than in Phase 2, people tended to talk about more diverse topics, including local and global news. In particular, people were worried about hate crimes directed towards Asians in Western countries. People may have become interested in different subjects as they thought the crisis had passed its peak.

We examine the daily trends in mentions of other countries by counting the tweets that mention a country's name, either in the local language or in English. Korea, China, and Japan were mentioned the most, and we suspect that this was mainly triggered by political and diplomatic relationships. Meanwhile, the US and Italy were both steadily mentioned across the three months, which we attribute to media outlets broadcasting global news.

Figure 7: Daily topical trends in South Korea: based on the number of tweets (top) and on the number of tweets that mention country names (bottom).

Other countries. We repeat the same analysis and interpret the results for the other cases, Iran, Vietnam, and India, as depicted in Figures 8–10. See Appendix 3 for the derived topical trend graphs and the corresponding detailed explanations by country.

Figure 8: Daily topical trends in Iran: based on the number of tweets.

Figure 9: Daily topical trends in Vietnam: based on the number of tweets.

Figure 10: Daily topical trends in India: based on the number of tweets.

5 Discussion

We have analyzed tweets to understand the public discourse on the COVID-19 pandemic. In South Korea, the daily number of tweets reached local maxima around major offline events. However, in the case of Iran and Vietnam, the tweet counts did not synchronize well with offline events. This may be because Twitter is not as widely used in these two countries. Overall, it is interesting to observe that peaks in Twitter data do not necessarily correlate with significant events identified by local governments. Therefore, we use a bottom-up approach to explore the topical phases that resonate with the flow of public views.

After extracting the topical phases, numbering four in South Korea, two in Iran, six in Vietnam, and three in India, we used LDA to find the optimum number of topics for each topical phase and then labeled the corresponding topics with appropriate themes. In general, as people talk more about COVID-19, the topics they refer to tend to be smaller in number. This was more apparent when the tweet depth value is considered for each phase, as presented in Table 2. Tweet depth is defined as the number of retweets per day divided by the number of tweets per day. It can be deemed a standardized cascading depth, and therefore, a larger value signifies a greater depth for one tweet. In the cases of South Korea and Vietnam, we could verify the observation, as tweet depth tends to get larger when people communicate more on COVID-19. This phenomenon is consistent with another study that found the diameter value of the online Coronavirus network was lower than that of others (Park et al., 2020). However, for Iran and India, the number of phases was too small to observe the general traits of the topical trends.
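As a quick check of the definition, the tweet depth values for South Korea can be recomputed from the tweets-per-day (A) and retweets-per-day (B) rows of Table 2. The helper name is hypothetical; the input figures are the ones reported in the table.

```python
def tweet_depth(retweets_per_day, tweets_per_day):
    """Tweet depth: retweets per day (B) divided by tweets per day (A)."""
    return retweets_per_day / tweets_per_day


# South Korea, Phases 0 to 3, as reported in Table 2.
tweets = [28.17, 5244.09, 17796.08, 13095.65]
retweets = [21.78, 56809.78, 211310.89, 147759.41]

depths = [round(tweet_depth(b, a), 2) for a, b in zip(tweets, retweets)]
print(depths)  # [0.77, 10.83, 11.87, 11.28]
```

The recomputed values match the fifth row of Table 2 for South Korea, and they rise sharply from Phase 0 to Phase 1 along with the daily tweet volume.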

Moreover, we found that the peak of the daily tweet trend preceded the peak of the daily confirmed cases. In Iran, Vietnam, and India, the tweet peak preceded the confirmed-case peak by up to a few weeks. Although the two peaks are close to each other in South Korea, it is worth noting that around that time, the country was becoming the most affected by COVID-19 outside mainland China. Interestingly, as shown in Figures 3–6, the upsurge of the number of tweets in South Korea, Iran, and Vietnam (but not India) was observed simultaneously at the end of February, before the upsurge of the locally confirmed cases. Given that COVID-19 is a global issue, this suggests that the issue attention cycle of COVID-19 on a social media platform is more responsive to global events than local ones. In this regard, the COVID-19 pandemic offers an exciting opportunity for future research to theorize the issue attention cycle model on a global scale and see how it evolves in conjunction with local specific topics such as increasing or decreasing confirmed cases, government measures, and social conflicts.
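As a rough illustration of the lag comparison described above, the gap between the two peaks can be measured as the number of days between the date with the most tweets and the date with the most confirmed cases. The sketch below is an assumption-laden simplification: the helper names `peak_day` and `peak_lag_days` are hypothetical, each series is reduced to its single global maximum, and no smoothing is applied.

```python
from datetime import date

def peak_day(series):
    """Return the date with the largest count in a {date: count} mapping.

    On ties, the first date in the mapping's iteration order is returned.
    """
    return max(series, key=series.get)

def peak_lag_days(tweet_counts, case_counts):
    """Days from the tweet peak to the confirmed-case peak.

    A positive value means the tweet peak came first.
    """
    return (peak_day(case_counts) - peak_day(tweet_counts)).days

# Toy series for illustration only (not data from the study).
toy_tweets = {date(2020, 2, 24): 120, date(2020, 2, 25): 300, date(2020, 2, 26): 180}
toy_cases = {date(2020, 3, 10): 40, date(2020, 3, 12): 95, date(2020, 3, 14): 60}
lag = peak_lag_days(toy_tweets, toy_cases)
```

With real data, a moving average before taking the maximum would make the estimate less sensitive to single-day spikes.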

When comparing South Korea and Vietnam, there is an intriguing point to discuss. The topic of Phase 0 in Korea was not related to COVID-19, whereas that in Vietnam was about global news with confirmed cases. We do not attempt to generalize any findings due to the small tweet volume in Phase 0 for both countries. Still, Vietnamese users discussed the global epidemic issue more from the outset, and this tendency may have contributed to the country's successful defense against the pandemic later on.

To be specific to each country, in the case of South Korea, when the local (offline) pandemic situation became severe (Phase 2), the number of topics became smaller, which means people focused more on a handful of issues. A unique trait is that in Phase 0, people cheered each other up and rallied to express solidarity in difficult times. In the case of Iran, the number of topics has been relatively steady across time, while the significant topics discussed have been confined to news and information: we interpret that Iranian users tend to be cautious about using social media. In the case of Vietnam, in Phase 4, where the tweet traffic is relatively lower than in Phase 3, the number of topics becomes larger, and the themes of topics become less directly tied to the confirmed cases and death tolls, e.g., people talked about the economy in Phases 2 and 4. Meanwhile, the Indian case indicates another unique trait: many topics were mainly related to misinformation, the scale of which was much lower in other countries. A large portion of misinformation, disinformation, and hateful content is steadily observed in both Phases 1 and 2 (see Multimedia Appendix 3).

6 Concluding Remark

There are several limitations to be considered. First, we analyzed tweets solely from the four countries, and therefore, we need to be cautious about drawing explanations and insights that apply in general. We plan to extend the current study by including more countries. Second, there are other ways to decide the topical phases. Our approach can be aligned with the issue attention cycle as we compute unique communication traits (i.e., velocity and acceleration by country) that would be relatively consistent in a pandemic such as COVID-19.
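One simple way to read "velocity" and "acceleration" of a daily tweet count series is as first and second discrete differences. The sketch below shows only that reading; it is an assumption for illustration, and it does not reproduce the smoothing or the rule the study uses to turn these quantities into phase boundaries.

```python
def velocity(counts):
    """First difference of a daily count series: change from one day to the next."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]

def acceleration(counts):
    """Second difference: day-to-day change of the velocity."""
    return velocity(velocity(counts))

# Toy daily tweet counts for illustration only (not data from the study).
toy_counts = [10, 12, 18, 30, 33]
toy_velocity = velocity(toy_counts)
toy_acceleration = acceleration(toy_counts)
```

A sign change in the acceleration marks a day where growth in attention starts to slow down or speed up, which is the kind of signal a phase-detection rule could build on.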

Nonetheless, the current research provides a valuable picture of critical topics on COVID-19 from multiple countries. We automatically divide topical phases and extract major topics by phase. We then find several issues that were uniquely manifested in each country during the recent pandemic crisis. For instance, we observed the emergence of misinformation in Hindi tweets. Our findings shed light on public concerns and misconceptions under the crisis and, therefore, can help determine which misinformation should be discredited with priority. This attempt helps defeat the Infodemic and limit the spread of the pandemic.

References

Daniel Adiwardana, Minh-Thang Luong, David R So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, et al. 2020. Towards a human-like open-domain chatbot. arXiv preprint arXiv:2001.09977.

Wasim Ahmed, Josep Vidal-Alaball, Joseph Downing, and Francesc López Seguí. 2020. Covid-19 and the 5g conspiracy theory: social network analysis of twitter data. Journal of Medical Internet Research, 22(5):e19458.

Oxford Analytica. Misinformation will undermine coronavirus responses. Emerald Expert Briefings, (oxan-db).

Ashley A Anderson, Dominique Brossard, and Dietram A Scheufele. 2012. News coverage of controversial emerging technologies: Evidence for the issue attention cycle in print and online media. Politics and the Life Sciences, 31(1-2):87–96.

Florian Arendt and Sebastian Scherr. 2019. Investigating an issue–attention–action cycle: A case study on the chronology of media attention, public attention, and actual vaccination behavior during the 2019 measles outbreak in austria. Journal of health communication, 24(7-8):654–662.

David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research, 3(Jan):993–1022.

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. Covid-19: The first public coronavirus twitter dataset. arXiv preprint arXiv:2003.07372.

Matteo Chinazzi, Jessica T Davis, Marco Ajelli, Corrado Gioannini, Maria Litvinova, Stefano Merler, Ana Pastore y Piontti, Kunpeng Mu, Luca Rossi, Kaiyuan Sun, et al. 2020. The effect of travel restrictions on the spread of the 2019 novel coronavirus (covid-19) outbreak. Science, 368(6489):395–400.

Seong Eun Cho and Han Woo Park. 2013. A qualitative analysis of cross-cultural new media research: Sns use in asia and the west. Quality & Quantity, 47(4):2319–2330.

Matteo Cinelli, Walter Quattrociocchi, Alessandro Galeazzi, Carlo Michele Valensise, Emanuele Brugnoli, Ana Lucia Schmidt, Paola Zola, Fabiana Zollo, and Antonio Scala. 2020. The covid-19 social media infodemic. arXiv preprint arXiv:2003.05004.

Clarissa C David, Jonathan Corpus Ong, and Erika Fille T Legara. 2016. Tweeting supertyphoon haiyan: Evolving functions of twitter during and after a disaster event. PloS one, 11(3).

Anthony Downs. 1972. Up and down with ecology: The issue attention cycle. The Public Interest, 28:38–51.

Sheera Frenkel, Davey Alba, and Raymond Zhong. 2020. Surge of virus misinformation stumps facebook and twitter. The New York Times.

Kyungeun Jang and Young Min Baek. 2019. When information from public health officials is untrustworthy: The use of online news, interpersonal networks, and social media during the mers outbreak in south korea. Health communication, 34(9):991–998.

S Mo Jang, Yong Jin Park, and Hoon Lee. 2017. Round-trip agenda setting: Tracking the intermedia process over time in the ice bucket challenge. Journalism, 18(10):1292–1308.

Hyun Jung Oh, Thomas Hove, Hye-Jin Paek, Byoungkwan Lee, Hyegyu Lee, and Sun Kyu Song. 2012. Attention cycles and the h1n1 pandemic: A cross-national study of us and korean newspaper coverage. Asian Journal of Communication, 22(2):214–232.

Jooyeon Kim, Behzad Tabibian, Alice Oh, Bernhard Schölkopf, and Manuel Gomez-Rodriguez. 2018. Leveraging the crowd to detect and reduce the spread of fake news and misinformation. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pages 324–332.

Sejeong Kwon, Meeyoung Cha, Kyomin Jung, Wei Chen, and Yajun Wang. 2013. Prominent features of rumor propagation in online social media. In 2013 IEEE 13th International Conference on Data Mining, pages 1103–1108. IEEE.

David MJ Lazer, Matthew A Baum, Yochai Benkler, Adam J Berinsky, Kelly M Greenhill, Filippo Menczer, Miriam J Metzger, Brendan Nyhan, Gordon Pennycook, David Rothschild, et al. 2018. The science of fake news. Science, 359(6380):1094–1096.

Yibai Li, Xuequn Wang, Xiaolin Lin, and Mohammad Hajli. 2018. Seeking and sharing health information on social media: A net valence model and cross-cultural comparison. Technological Forecasting and Social Change, 126:28–40.

Ying Liu, Albert A Gayle, Annelies Wilder-Smith, and Joacim Rocklöv. 2020. The reproductive number of covid-19 is higher compared to sars coronavirus. Journal of travel medicine.

Katherine McComas and James Shanahan. 1999. Telling stories about global climate change: Measuring the impact of narratives on issue cycles. Communication research, 26(1):30–57.

Matthew C Nisbet and Mike Huge. 2006. Attention cycles and frames in the plant biotechnology debate: Managing power and participation through the press/policy connection. Harvard International Journal of Press/Politics, 11(2):3–40.

David Alfred Ostrowski. 2015. Using latent dirichlet allocation for topic modelling in twitter. In Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015), pages 493–497. IEEE.

Han Woo Park, Sejung Park, and Miyoung Chong. 2020. Conversations and medical news frames on twitter: Infodemiological study on covid-19 in south korea. Journal of Medical Internet Research, 22(5):e18897.

Gordon Pennycook, Jonathon McPhetres, Bence Bago, and David Rand. 2020. Predictors of attitudes and misperceptions about covid-19 in canada, the uk, and the usa.

Tsung-Jen Shih, Rosalyna Wijaya, and Dominique Brossard. 2008. Media coverage of public health epidemics: Linking framing and issue attention cycle toward an integrated theory of print news coverage of epidemics. Mass Communication & Society, 11(2):141–160.

Kazuki Shimizu. 2020. 2019-ncov, fake news, and racism. The Lancet, 395(10225):685–686.

Dominik Andrzej Stecula, Ozan Kuru, and Kathleen Hall Jamieson. 2020. How trust in experts and media use affect acceptance of common anti-vaccination claims. Harvard Kennedy School Misinformation Review, 1(1).

Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science, 359(6380):1146–1151.

Lucy Lu Wang, Kyle Lo, Yoganand Chandrasekhar, Russell Reas, Jiangjiang Yang, Darrin Eide, Kathryn Funk, Rodney Kinney, Ziyang Liu, William Merrill, et al. 2020. Cord-19: The covid-19 open research dataset. arXiv preprint arXiv:2004.10706.

Weirui Wang and Lei Guo. 2018. Framing genetically modified mosquitoes in the online news and twitter: Intermedia frame setting in the issue-attention cycle. Public Understanding of Science, 27(8):937–951.

7 Appendices

7.1 Appendix 1. Computed Daily Velocity/Acceleration Trends and Decided Temporal Phases by Country

7.1.1 South Korea

Figure 11: The South Korean case: Daily trends on velocity and acceleration of the # of tweets (top) and divided phases detected by vertical dash lines (bottom).

7.1.2 Iran

Figure 12: The Iranian case: Daily trends on velocity and acceleration of the # of tweets (top) and divided phases detected by vertical dash lines (bottom).

7.1.3 Vietnam

Figure 13: The Vietnamese case: Daily trends on velocity and acceleration of the # of tweets (top) and divided phases detected by vertical dash lines (bottom).

7.1.4 India

Figure 14: The Indian case: Daily trends on velocity and acceleration of the # of tweets (top) and divided phases detected by vertical dash lines (bottom).

7.2 Appendix 2. Daily COVID-19 Confirmed Case and Tweet Count Trends by Country

7.2.1 Iran

On February 19, two people tested positive for SARS-CoV-2 in the city of Qom⁶. After this date, we see a significant surge in the number of tweets, which reaches a peak within a few days (i.e., the peak shown on February 25). On February 23, the government changed the alert from white to yellow. Although the number of confirmed cases keeps increasing, the number of tweets starts to decrease gradually with a little fluctuation, as shown in Figure 4. Thus, unlike in the Korean tweets, the trends of these two numbers show different patterns. Meanwhile, the government gradually increased preventive measures, and several cities with the highest rates of infection were declared hot spots or red zones. Overall, the government did not place the whole country under the red alert. However, it announced new guidance and banned all trips on March 25. On March 28, the president said that 20% of the country's annual budget would be allocated to fight the virus, which might implicitly be a sign of the red alert.

7.2.2 Vietnam

On January 23, 2020, Vietnam officially confirmed the first two COVID-19 patients, who came from Wuhan, China⁷. After that, the number of tweets increased sharply and reached a peak in early February, as shown in Figure 5. Although a few new cases were detected, the number of tweets tended to decrease and then remained stable. In the second half of February, there were no new cases. However, the number of tweets increased rapidly and created a new peak. This peak did not last long. Two possible reasons can explain this trend. The first is that the pandemic had spread over the world. The second is that the last cases in Vietnam were treated successfully. After a long time with no new cases, Vietnam continuously confirmed new cases in Hanoi and many other cities from March 6. The number of tweets in this phase increased again and remained stable at a relatively higher level than in the initial phase.

⁶ COVID-19 pandemic in Iran. Wikipedia 2020. https://bit.ly/3ftQDV5.

⁷ COVID-19 pandemic in Vietnam. Wikipedia 2020. https://bit.ly/35BOyC2.

7.2.3 India

The first case of COVID-19 was confirmed on January 30, 2020⁸. The number of cases quickly rose to three on account of students returning from Wuhan, China. Throughout February, no new cases were reported, and the first weeks of March also saw a relatively low number of cases. The number of cases, however, picked up from the fourth week of March; notable were the 14 confirmed cases of Italian tourists in the Rajasthan province. This eventually led to the government of India declaring a complete lock-down of the country. The daily number of tweets followed a trend similar to that of the number of cases, as depicted in Figure 6. The first confirmed cases around January 30, 2020, caused a sudden spike in the number of tweets, which subsided in February. The first COVID-19 fatality on March 12 and some other local COVID-19 events led to an exponential increase in tweets. The tweets peaked on March 22, when the government declared a lock-down of areas with infected cases, and started trending downwards after that. Notably, the government's declaration of a nationwide lock-down on March 24 caused only a small spike in the number of tweets, and the trend continued downwards. However, March 31 saw a significant spike in the number of tweets owing to the confirmation of mass infections at a religious gathering. Overall, the tweet trends seem to track the government's release of official information (e.g., # of confirmed cases and fatalities on COVID-19).

⁸ COVID-19 pandemic in India. Wikipedia 2020. https://bit.ly/37wIdsN.

7.3 Appendix 3. Daily Topical Trends Shown in Social Media by Country

7.3.1 South Korea

Please refer to the “Basic Daily Trends – South Korea” subsection in the manuscript for the detailed descriptions.

Figure 15: Daily topical trends on South Korea: based on % (top), based on # of tweets (mid), based on # of tweets mentioning country names (bottom).

7.3.2 Iran

Figure 16 top and mid illustrate two topical phases, their proportions, and daily topical frequencies in Farsi tweets. Phase 0 includes global news about China and unconfirmed local news that reflects the fear of the virus spreading in the country. Political issues form a remarkable portion of tweets in this phase, as the country has been struggling with various internal and external conflicts in recent years, and there was a congressional election in Iran. In Phase 1, a significant increase in the number of tweets occurs, where local news regarding the virus outbreak constitutes the majority. An intriguing finding is that informational tweets about preventive measures overshadow global news, which can be explained by the sociology of disaster: when people in a less developed country are at risk, they naturally tend to share more information. However, political tweets are still widespread because of the reasons above and public dissatisfaction with the government response to the epidemic. This finding is also highlighted in Figure 16 bottom, where the U.S. is the most mentioned name after Iran and China. One possible explanation is that the outbreak puts another strain on the frail relationship between Iran and the U.S.

Figure 16: Daily topical trends on Iran: based on % (top), based on # of tweets (mid), based on # of tweets mentioning country names (bottom).

7.3.3 Vietnam

There are six topical phases for Vietnam, visualized in Figure 17 top and mid. Phase 0 is related to global news because, in this period, Vietnam did not have any cases. From Phase 1 to Phase 5, topics diverged, but they focused on local news except in Phase 3. Phase 3 was the phase when no new cases were detected in Vietnam. Phase 0 and Phase 3 thus have in common that there were no new cases in Vietnam (local news), so tweets tended to talk more about global news. In Phase 3 especially, we can see an increase in personal topics that were mostly absent in other phases. This was because a conflict event related to Korean visitors generated a large number of personal tweets. Next, we show the number of tweets that mentioned countries in Figure 17 bottom. The three most mentioned countries are Vietnam, Korea, and China. Vietnam and China were mentioned frequently across phases because Vietnam is the local country and China is the place where the pandemic originated. Korea was also mentioned in many tweets, but these were concentrated only in Phase 3. This is consistent with the topic changes caused by the Korean visitor event in Vietnam.

Figure 17: Daily topical trends on Vietnam: based on % (top), based on # of tweets (mid), based on # of tweets mentioning country names (bottom).

7.3.4 India

We have established three topical phases for Hindi-written tweets (Figure 18 top and mid). In the case of India, the starting phase mainly has tweets that are focused on sharing information about COVID-19 and global news about COVID-19 in China. People tended to share the news about COVID-19 and useful information on how to be safe. Thereafter, in Phase 1, the topics become more diverse. Although the topics in this phase are related to information about the virus and global news, especially about China, a large share of them consists of rumors or misinformation. The number of tweets spikes on January 30, 2020, when the first case was confirmed in India. Towards the end of Phase 1, there is another spike in the number of tweets. This is primarily due to the government beginning to announce measures to contain the virus (such as halting the issuing of new visas to India). Lastly, in Phase 2, a huge spike in the number of tweets is witnessed. The proportion of informational tweets decreases, whereas the proportion of local news tweets confirming new cases increases. Regrettably, a marked portion of the tweets still consists of hateful content and misinformation. Interestingly enough, although the situation continued to worsen, tweets expressing dissatisfaction with the government are negligible.

Phase 2 also witnesses an increase in the mentions of other countries, especially Brazil and Europe, in addition to China and, understandably, India, as depicted in Figure 18 bottom. This could be attributed to the growing number of confirmed cases in Italy, Spain, and Brazil, as well as the news surrounding the use of Hydroxychloroquine in Brazil. The U.S. also receives considerable mentions for the same reasons.

Figure 18: Daily topical trends on India: based on % (top), based on # of tweets (mid), based on # of tweets mentioning country names (bottom).