

Research Article

Research on Sentiment Tendency and Evolution of Public Opinions in Social Networks of Smart City

Yanni Liu,1 Dongsheng Liu,2 and Yuwei Chen3

1School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou 310018, China
2School of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China
3Beijing Yunzhenxin Technology Co., Ltd., Hangzhou 310012, China

Correspondence should be addressed to Dongsheng Liu; lds1118@zjgsu.edu.cn

Received 29 March 2020; Revised 27 April 2020; Accepted 5 May 2020; Published 4 June 2020

Guest Editor: Zhihan Lv

Copyright © 2020 Yanni Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

With the rapid development of the mobile Internet, the social network has become an important platform for users to receive, release, and disseminate information. In order to obtain more valuable information and implement effective supervision of public opinions, it is necessary to study the public opinions, sentiment tendency, and evolution of hot events in the social networks of a smart city. In view of social networks' characteristics such as short texts, rich topics, diverse sentiments, and timeliness, this paper conducts text modeling with word co-occurrence based on the topic model. Besides, sentiment computing and the time factor are incorporated to construct the dynamic topic-sentiment mixture model (TSTS). Then, four hot events were randomly selected from the microblog as datasets to evaluate the TSTS model in terms of topic feature extraction, sentiment analysis, and time change. The results show that the TSTS model is better than traditional models in topic extraction and sentiment analysis. Meanwhile, by fitting the time curve of hot events, the change rules of comments in the social network are obtained.

1. Introduction

With the wide application of Internet technology, the Internet has gradually transformed into a dynamic platform for information sharing and interactive communication. The 43rd statistical report indicated that China had 854 million Internet users, and 99.1 percent of them access the Internet via mobile phones [1]. Social networks of a smart city have become the mainstream platform for information exchange and opinion expression. Users are not only receivers of information, but also creators who publish text comments in social networks. Hot events of public opinion refer to personal opinions on upcoming or already-happened events released through online communication tools and network platforms [2]. The spread of public opinion will snowball and expand through social networks, and emergent events may develop in an uncontrollable direction. Chain events caused by inadequate supervision of social networks can bring about bad influence, and their frequency and harmfulness have shown an obvious rising trend in recent years [3].

Previous research has studied the qualitative aspects, such as the evolution mechanism of public opinion, information element classification, and influence judgment. However, such research cannot meet the needs of online public opinion supervision, and the monitoring and management of hot events in the social networks of a smart city need quantitative judgment. For public opinion monitoring and management, Steyvers and Griffiths [4] proposed a topic model for public opinion detection in the social network. Yeh et al. [5] proposed a conceptually dynamic latent Dirichlet allocation (CD-LDA) model for topic content detection and tracking. Studies on probabilistic topic models for extracting hot topics from long texts have achieved good results [6], but these models are not suited to extracting hot topics from short texts, such as Twitter and Facebook [7]. Kim et al. [8] introduced sentiment scoring based on topics through the n-gram LDA topic modeling technology and investigated topic reports and sentiment dynamics in news about the Ebola virus. Subeno et al. [9] proposed a collapsed Gibbs sampling method based on the latent Dirichlet allocation (LDA) model widely used on Spark. Park et al. [10] used partially collapsed Gibbs sampling for latent Dirichlet allocation and proposed a reasoning LDA method, which effectively obtained unbiased estimation under flexible modeling of heterogeneous text corpora by partially collapsed and Dirichlet mixed processing.

Hindawi Complexity, Volume 2020, Article ID 9789431, 13 pages. https://doi.org/10.1155/2020/9789431

However, there are still some problems in the detection of public opinion for events in the social networks of a smart city. Firstly, the detection and analysis of public opinions for hot events in social networks mostly remain at the level of qualitative analysis or empirical research, lacking quantitative research. Secondly, there is a lack of public opinion analysis methods combined with the characteristics of the microblog in social networks. Thirdly, for the sentiment analysis of public opinion events, most research adopts a two-stage method, that is, to detect the event first and then conduct sentiment analysis and judgment, which is likely to lead to the separation between the event and sentiment. Fourthly, the dissemination of public opinions on hot events is time sensitive, so it is necessary to involve the time factor in comment text analysis. Thus, this paper proposes a dynamic topic-sentiment mixture model (TSTS) for short texts in the social network, which comprehensively incorporates the topics, sentiment, and time factor of events to detect public opinion. By quantitative analysis of real experimental data, the model can not only show the quantitative evolution trend of public opinion but also provide the propagation rule of sudden events.

The main contributions of this paper are reflected in two aspects. Firstly, the TSTS model is proposed by extending the topic model, which can not only extract both topic and sentiment polarity words but also integrate the time factor to realize dynamic analysis of short texts. Secondly, this paper studies the detection and evolution analysis of Internet public opinion with the dynamic topic-sentiment model. Real datasets are used to conduct an experimental analysis of the proposed model, which can reflect the evolution trend of public opinion diffusion.

2. Related Research

2.1. Research on Relevant Topic Models. Traditional opinion mining analyzes sentiment orientations at the level of the document and sentence. The traditional topic model was mainly used to compare the similarity among articles by counting the number of repeated words in different articles. Blei et al. [11] proposed the latent Dirichlet allocation (LDA) topic model to mine the hidden semantics of the text. LDA is a three-layer Bayesian model involving the document, topic, and word. The document is composed of a mixed distribution of topics, and each topic follows a multinomial distribution over words; the Dirichlet distribution is introduced as the prior of the multinomial distribution. The schematic diagram of the LDA topic model is shown in Figure 1.
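The three-layer generative process just described can be illustrated with a toy simulation (a sketch, not the authors' code; the vocabulary, topic distributions, and prior value below are assumed for illustration):

```python
import random

def dirichlet(alphas):
    # Sample from a Dirichlet by normalizing independent Gamma draws.
    xs = [random.gammavariate(a, 1.0) for a in alphas]
    s = sum(xs)
    return [x / s for x in xs]

def categorical(probs):
    # Draw an index with the given probabilities.
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def generate_document(n_words, vocab, phi, alpha):
    # theta: the per-document topic mixture, drawn from Dir(alpha).
    theta = dirichlet([alpha] * len(phi))
    words = []
    for _ in range(n_words):
        z = categorical(theta)   # choose a topic for this word
        w = categorical(phi[z])  # choose a word from that topic's distribution
        words.append(vocab[w])
    return words

random.seed(0)
vocab = ["city", "traffic", "doctor", "hospital"]
# phi: each row is one topic's multinomial over the vocabulary (assumed values).
phi = [[0.6, 0.3, 0.05, 0.05], [0.05, 0.05, 0.5, 0.4]]
doc = generate_document(8, vocab, phi, alpha=0.5)
print(doc)
```

In real LDA these topic-word distributions are latent and inferred from a corpus; the sketch only mirrors the generative story.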

2.1.1. Research on Topic Models of Short Texts. For most topic models, the topic is a word that appears in the document and has some connection to it. In previous studies, the topic model of short texts was expanded by introducing relevant background or author information, which weakened the topic and produced meaningless word contributions. Similarly, if co-occurrence words are extended to the whole corpus in the experiment, the occurrence frequency of each word will be greatly increased and the connection between words will be closer; then the modeling of documents will be easier. Based on the above hypotheses, Cheng et al. [12] proposed the biterm topic model (BTM), which is another way to explain the relationship between words: text modeling of documents can be conducted based on the word co-occurrence pattern of the whole corpus. Rashid et al. [13] proposed fuzzy topic modeling (FTM) for short texts from the fuzzy perspective to solve the sparsity problem. Based on BTM, Lu et al. [14] introduced the RIBS-Bigrams model by learning the usage relationship, which shows topics with bigrams. Zhu et al. [15] proposed a joint model based on latent Dirichlet allocation (LDA) and BTM, which not only alleviates the sparsity of the BTM algorithm in processing short texts but also preserves topic information of the document through the extended LDA.
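The word co-occurrence idea behind BTM can be made concrete with a small helper (function name is mine, not from the paper) that enumerates all unordered word pairs, or biterms, in a short text:

```python
from itertools import combinations

def extract_biterms(tokens):
    # Every unordered pair of distinct positions forms one biterm;
    # in BTM these pairs, rather than single words, are the modeling unit.
    return list(combinations(tokens, 2))

biterms = extract_biterms(["smart", "city", "opinion"])
print(biterms)  # [('smart', 'city'), ('smart', 'opinion'), ('city', 'opinion')]
```

A short comment of n tokens yields n(n-1)/2 biterms, which is how word co-occurrence over the whole corpus compensates for the sparsity of individual short documents.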

2.1.2. Research on Mixed Models Integrating Topic and Sentiment. To evaluate the sentiment tendency of documents, the joint sentiment topic (JST) model added a sentiment layer to the LDA model, forming a four-layer Bayesian network [16]. In this structure, the sentiment polarity label is related to the document, and word generation is influenced by both topic and sentiment. In the traditional LDA model, the generation of the document and words is determined by the topic, but in the JST model, the words of a document are determined by both the topic and the sentiment. Amplayo et al. [17] proposed the aspect and sentiment unification model (ASUM) with a sentiment level. The difference between JST and ASUM is that words in a sentence come from different topics in the JST model, while all words of a sentence belong to one topic in the ASUM model.

Figure 1: Schematic diagram of the LDA model.

2.1.3. Research on Topic Models with a Time Factor. Yao et al. [18] revealed the semantic change process of words by correlating the time factor with Wikipedia text knowledge. In terms of event evolution, the associative topic model (ATM) was proposed [19], in which a recognized cluster is represented as the word distribution of the cluster with the corresponding event. In addition, Topics Over Time (TOT) was proposed to integrate the time factor into the LDA model [20]. In the TOT model, word co-occurrence can affect the discovery of topic words, and time information can also affect the extraction of topic words. Unlike other models, each topic in the TOT model is subject to a continuous distribution over time and does not rely on Markov models to discretize time. For each document generation, the mixed distribution of topics is determined by word co-occurrence and timestamps [21], which allows the TOT model to maintain independence in the time dimension and to predict the time of a document without any time information.

2.2. Gibbs Sampling. The derivation of the experimental model in this paper is a variant of the Markov chain, so the Markov chain Monte Carlo (MCMC) method is used for sampling in the experiment. Gibbs sampling, one of the MCMC methods, has been widely used in prior research. Gibbs sampling is used to obtain a set of observations that approximate a specified multidimensional probability distribution, such as the joint distribution of two random variables.
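As a generic illustration of the idea (not from the paper), a Gibbs sampler for a standard bivariate normal with correlation ρ alternates draws from the two one-dimensional conditional distributions:

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500):
    # For a standard bivariate normal, each conditional is again normal:
    # x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
    sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    samples = []
    for it in range(burn_in + n_samples):
        x = random.gauss(rho * y, sd)  # draw x from its conditional given y
        y = random.gauss(rho * x, sd)  # draw y from its conditional given x
        if it >= burn_in:
            samples.append((x, y))
    return samples

random.seed(42)
samples = gibbs_bivariate_normal(rho=0.8, n_samples=5000)
mean_x = sum(x for x, _ in samples) / len(samples)
print(round(mean_x, 3))  # close to the true mean of 0
```

The same alternation over full conditionals is what the collapsed sampler in Section 3.3 performs, only over discrete topic and sentiment assignments instead of continuous variables.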

The Gibbs sampling method used for the latent Dirichlet allocation (LDA) model can significantly improve the speed on a real-text corpus [22]. Papanikolaou et al. [23] estimated latent Dirichlet allocation (LDA) parameters from Gibbs sampling by using all conditional distributions of latent variable assignments to effectively average multiple samples. Zhou et al. [24] proposed two kinds of Gibbs sampling inference methods, Sparse BTM and ESparse BTM, to achieve BTM by trading off space and time. Bhuyan [25] proposed a correlated random effects model based on latent variables and an algorithm to estimate correlation parameters based on Gibbs sampling.

3. Model Construction

3.1. Topic-Sentiment Mixture Model with Time Factor (TSTS). Based on prior research, this paper mainly improves the topic model in three aspects. Firstly, the sparse matrix caused by short texts in the social network is addressed. Secondly, the topic and sentiment distribution of the same word pair is controlled. Thirdly, the problem of text homogeneity is solved by incorporating the time factor into the topic model. Therefore, the TSTS model proposed in this paper constrains the word pairs in the same document, which greatly reduces the complexity of time and space and makes up for the sparse matrix of short texts to some extent. Moreover, the sentiment layer is integrated into TSTS by extending the hypotheses of ASUM, and the word pairs generated from the constraining sentences are restrained to follow the same topic-sentiment distribution. Finally, the TSTS model incorporating the time factor does not rely on the Markov model to discretize time, and each topic is subject to a continuous temporal distribution. For each document generation, the mixed distribution of topics is determined by word co-occurrence and timestamps. The TSTS model is shown in Figure 2.

The TSTS model simulates the generating process of online comments. Generally, an online comment from a user can be regarded as a document, which is short, pithy, and highly emotional. The word co-occurrence from BTM is the most effective solution for the short-text topic model. In addition, the TSTS model with the time layer can continuously sample users' evaluations of hot events as well as the dynamic changes of users' sentiment. Therefore, the hypotheses of the TSTS model are proposed as follows:

(i) The probability distribution of the time factor is not directly equal to the joint distribution of the topic and sentiment.

(ii) The topic-sentiment distribution of each document is independent [26].

(iii) Similar topics of different sentiment polarity are not automatically categorized [27].

Combined with the probability graph of the Bayesian network, the TSTS model proposed in this paper has four characteristics. First, a word pair is used to replace a single word in the sampling model. Second, each timestamp is related to a topic and a sentiment. Third, the extraction of topic features and sentiment words is done over the whole corpus. Fourth, in the derivation process of the TSTS model, it is not necessary to make topic features correspond to sentiment polarity words, because every topic and sentiment has a corresponding multinomial word-pair distribution. In addition, the text modeling process of the TSTS model also follows the assumption that there is a connection between the sentiment polarity words and the topic features, which also changes with the time factor. So the documents used to train the model must have a specific timestamp, such as the publishing time of the microblog.

3.2. Generation of a Text in the TSTS Model. In the TSTS model, we assume that a corpus is composed of several texts. For instance, a microblog is a text containing the two dimensions of topic and sentiment. Considering the timeliness of public opinions and the related parameters of the microblog text, the word distribution is determined by the topic, sentiment, and time. So TSTS is an unsupervised topic-sentiment mixture model. The generation process of a document is as follows:

(1) Extract a multinomial distribution θd over topics from the Dirichlet prior distribution α, that is, θd ∼ Dir(α).

(2) Extract a multinomial distribution ψzl over timestamps from the Dirichlet prior distribution μ, that is, ψzl ∼ Dir(μ).

(3) Extract a multinomial distribution πz over sentiments from the Dirichlet prior distribution γ, that is, πz ∼ Dir(γ).

(4) For each document d and for each word pair b = (wi1, wi2), b ∈ B:

(a) choose a topic zi ∼ θd;
(b) choose a sentiment label li ∼ πzi;
(c) choose a word pair bi ∼ φzi,li;
(d) choose a timestamp ti ∼ ψzi,li.
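Steps (1)-(4) can be sketched as a toy simulation (my own scaffolding, not the authors' code; the sizes and symmetric prior values are assumptions, and a word pair is encoded as a single index over V² combinations for simplicity):

```python
import random

def dirichlet(alphas):
    # Sample from a Dirichlet by normalizing independent Gamma draws.
    xs = [random.gammavariate(a, 1.0) for a in alphas]
    s = sum(xs)
    return [x / s for x in xs]

def categorical(probs):
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

random.seed(7)
T, S, H, V = 3, 2, 4, 6          # topics, sentiments, timestamps, vocabulary size
alpha, gamma, beta, mu = 0.5, 0.5, 0.01, 0.5

# One distribution per topic (pi) and per (topic, sentiment) pair (phi, psi).
pi  = [dirichlet([gamma] * S) for _ in range(T)]
phi = [[dirichlet([beta] * (V * V)) for _ in range(S)] for _ in range(T)]
psi = [[dirichlet([mu] * H) for _ in range(S)] for _ in range(T)]

def generate_document(n_pairs):
    theta = dirichlet([alpha] * T)   # (1) topic mixture of document d
    doc = []
    for _ in range(n_pairs):
        z = categorical(theta)       # (4a) topic
        l = categorical(pi[z])       # (4b) sentiment label
        b = categorical(phi[z][l])   # (4c) word pair, as one index
        t = categorical(psi[z][l])   # (4d) timestamp
        doc.append((b, z, l, t))
    return doc

doc = generate_document(5)
print(doc)
```

Inference runs this story in reverse: given the observed word pairs and timestamps, Gibbs sampling recovers the latent z and l assignments.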

As shown in Figure 2, word pairs in a document may belong to different timestamps in the text generation process of the TSTS topic model. In theory, all the content of an article, such as words and topics, should belong to the same timestamp, and introducing the time factor into the topic model will affect the topic homogeneity of an article. However, the default time factor of the TSTS model does not affect the homogeneity of the text, so it is assumed in this paper that the time factor has no weight. Based on the TOT and group topic (GT) models, the hyperparameter μ is introduced into TSTS to balance the interaction of time and words in document generation. The parameters of the TSTS model are explained in Table 1.

3.3. Model Derivation. According to the Bayesian network structure diagram of the TSTS model, the multinomial distribution θ of topics, the distribution π of sentiments given a topic, the distribution φ of word pairs given <topic, sentiment>, and the distribution ψ of time given <topic, sentiment> can be calculated from the hyperparameters α, β, γ, and μ. Then Gibbs sampling is performed, which can ensure the convergence of the TSTS model under enough iterations, and each word pair in the document is assigned the topic and sentiment that best fit the facts.

According to the principle of Bayesian independence, the joint probability of word pairs, topics, sentiment polarities, and timestamps is given as follows:

p(b, t, l, z | α, β, γ, μ) = p(b | l, z, β) · p(t | l, z, μ) · p(l | z, γ) · p(z | α),  (1)

where the parameters are mutually independent: word pairs b and parameters α, γ, and μ; timestamps t and parameters α, γ, and β; sentiment polarities l and parameters α, μ, and β; and topics z and parameters β, γ, and μ. Therefore, the joint distribution in the equation can be obtained by calculating the four parts on the right side of the equation.

Given the sentiment polarity label of specific topic features, the distribution of b can be regarded as a multinomial distribution. On the premise of topic zi and sentiment li, bi is generated N times with probability p(b | l, z) each time. Given that word pairs are independent of each other, we can obtain

p(b | l, z, β) = ∏_{i=1}^{N} p(bi | zi, li) = ∏_{i=1}^{N} β · bi.  (2)

Hyperparameters are the representation parameters of the framework in a machine learning model [28], such as the number of classes in a clustering method or the number of topics in a topic model. In the Bayesian network, the distribution and density function of θ are denoted as H(θ) and h(θ), respectively. They are regarded as the prior distribution function and the prior density function, respectively, which are collectively referred to as the prior distribution. If the distribution of θ is obtained after sampling, it is called the posterior distribution. Based on the conjugate property of the Dirichlet-multinomial pair, when the

Figure 2: The TSTS model.

Table 1: Explanation of parameters.

D: number of documents
V: vocabulary size
T: number of topics
S: number of sentiment polarities
H: number of timestamps
M: number of word pairs
B: set of word pairs
b: word pair, b = (wi1, wi2)
w: word
t: time
z: topic
l: sentiment polarity label
θ = [θd]: multinomial distribution of topics
φ = [φzl]: T × S × V matrix, word-pair distribution
π = [πz]: T × S matrix, sentiment distribution
ψ = [ψzl]: T × S × H matrix, time distribution
α: Dirichlet prior parameter of θ
γ: Dirichlet prior parameter of π
β: asymmetric Dirichlet prior parameter of φ
μ: Dirichlet prior parameter of ψ
nd: number of word pairs in document d
ndj: number of word pairs for topic j in document d
nj: number of word pairs for topic j
njk: number of word pairs assigned to topic j and sentiment polarity k
nijk: number of times word pair bi is assigned to topic j and sentiment polarity k
njkh: number of word pairs assigned to topic j and sentiment polarity k with timestamp h
n−p: number of word pairs in the current document except for position p

parameters in the population distribution conform to the multinomial distribution law, the conjugate prior distribution satisfies

Dir(θ | α) + Mult(δ) → Dir(θ | α + δ).  (3)

For the general text model, the discretized Dirichlet distribution and multinomial distribution are as follows:

Dir(b | β) = [Γ(Σ_{j=1}^{T} β) / ∏_{j=1}^{T} Γ(β)] ∏_{j=1}^{T} bj^{β−1},  (4)

Mult(n | b, N) = (N choose n) ∏_{j=1}^{T} bj^{nj},  (5)

where i, j, k, and h index the word pairs, topics, sentiments, and timestamps in the modeling process, respectively. Since p(b | l, z, β) follows the Dirichlet distribution, this paper introduces φ for p(b | l, z, β). It can be obtained by integrating over φ:

p(b | l, z, β) = ∫ p(b | l, z, φ) · p(φ | β) dφ = (Γ(Vβ) / Γ(β)^V)^{T·S} ∏_j ∏_k [∏_i Γ(nijk + β)] / Γ(njk + Vβ).  (6)

To estimate the posterior parameter φ in the formula, we can combine the Bayes formula with the conjugate property of the Dirichlet-multinomial pair. The distribution of the posterior parameter can be obtained as follows:

p(φ | l, z, β) ∝ Dir(φ | nijk + β).  (7)

Given that the expectation of the Dirichlet distribution is E(Dir(ε))i = εi / Σi εi, the parameters are estimated by the expectation of the known posterior parameter distribution; the resulting estimates are shown in equation (12). Similarly, for p(t | l, z, μ), ψ is introduced. By integrating over ψ, it can be obtained as follows:

p(t | l, z, μ) = (Γ(Hμ) / Γ(μ)^H)^{T·S} ∏_j ∏_k [∏_h Γ(njkh + μ)] / Γ(njk + Hμ).  (8)

For p(l | z, γ), π is introduced. By integrating over π, it can be obtained as follows:

p(l | z, γ) = (Γ(Σ_k γk) / ∏_k Γ(γk))^T ∏_j [∏_k Γ(njk + γk)] / Γ(nj + Σ_k γk).  (9)

For p(z | α), θ is introduced. By integrating over θ, it can be obtained as follows:

p(z | α) = (Γ(Σ_j αj) / ∏_j Γ(αj))^D ∏_d [∏_j Γ(ndj + αj)] / Γ(nd + Σ_j αj).  (10)

The TSTS model can estimate the posterior distribution after the estimated values of z and s have been obtained by sampling. Then equations (2)-(6) are substituted into equation (1). Combining this with the properties of the Gamma function, the conditional distribution probability in Gibbs sampling can be obtained:

p(sp = k, zp = j | b, t, l−p, z−p, α, β, γ, μ) ∝ [(ndj^−p + αj) / (nd^−p + Σj αj)] · [(nwpjk^−p + β) / (njk^−p + Vβ)] · [(njk^−p + γk) / (nj^−p + Σk γk)] · [(njktp^−p + μ) / (njk^−p + Hμ)].  (11)
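One collapsed Gibbs update implied by equation (11) can be sketched as follows. This is my reading of the conditional, not the authors' code: the count tensors, their names, and symmetric priors are scaffolding I introduce, and the pair's previous assignment is assumed to have been decremented from the counts before the call:

```python
import random

def zero_counts(D, T, S, V, H):
    # Count tensors used by the sampler, all initialized to zero.
    return {
        'n_dj':  [[0] * T for _ in range(D)],                      # pairs of topic j in doc d
        'n_d':   [0] * D,                                          # pairs in doc d
        'n_bjk': [[[0] * S for _ in range(T)] for _ in range(V)],  # pair b with (j, k)
        'n_jk':  [[0] * S for _ in range(T)],                      # pairs with (j, k)
        'n_jkh': [[[0] * H for _ in range(S)] for _ in range(T)],  # (j, k) at timestamp h
        'n_j':   [0] * T,                                          # pairs with topic j
    }

def sample_topic_sentiment(c, d, b, t, T, S, V, H, alpha, beta, gamma, mu):
    # Weight each (topic j, sentiment k) by the four factors of equation (11),
    # then draw one pair proportionally to its weight.
    weights = []
    for j in range(T):
        doc_term = (c['n_dj'][d][j] + alpha) / (c['n_d'][d] + T * alpha)
        for k in range(S):
            word_term = (c['n_bjk'][b][j][k] + beta) / (c['n_jk'][j][k] + V * beta)
            sent_term = (c['n_jk'][j][k] + gamma) / (c['n_j'][j] + S * gamma)
            time_term = (c['n_jkh'][j][k][t] + mu) / (c['n_jk'][j][k] + H * mu)
            weights.append((j, k, doc_term * word_term * sent_term * time_term))
    r = random.random() * sum(w for _, _, w in weights)
    acc = 0.0
    for j, k, w in weights:
        acc += w
        if r < acc:
            return j, k
    return weights[-1][0], weights[-1][1]

random.seed(0)
T, S, V, H = 3, 2, 10, 4
counts = zero_counts(D=1, T=T, S=S, V=V, H=H)
z, l = sample_topic_sentiment(counts, d=0, b=2, t=1, T=T, S=S, V=V, H=H,
                              alpha=50.0 / T, beta=0.01, gamma=0.5, mu=0.25)
print(z, l)
```

A full sampler would loop this update over every word pair in the corpus for several hundred iterations, re-incrementing the counts after each draw.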

In order to simplify equation (11), the hyperparameter μ = 1/nd is introduced. When the hyperparameters α, β, μ, and γ are given, the set B of word pairs, the corresponding topics z, and the sentiment labels l can be used to infer the parameters φ, θ, π, and ψ based on Bayes' rule and the Dirichlet conjugate properties:

φjki = (nijk + β) / (njk + Vβ),
θdj = (ndj + αj) / (nd + Σj αj),
πjk = (njk + γk) / (nj + Σk γk),
ψjkh = (njkh + μ) / (njk + Hμ).  (12)
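The point estimates in equation (12) are just smoothed relative frequencies of the final counts. A sketch for two of them (function names and count layouts are my own, matching the symmetric-prior simplification):

```python
def estimate_phi(n_bjk, n_jk, beta, V):
    # phi[j][k][i] = (n_ijk + beta) / (n_jk + V * beta)
    T, S = len(n_jk), len(n_jk[0])
    return [[[(n_bjk[i][j][k] + beta) / (n_jk[j][k] + V * beta)
              for i in range(V)] for k in range(S)] for j in range(T)]

def estimate_theta(n_dj, n_d, alpha, T):
    # theta[d][j] = (n_dj + alpha) / (n_d + T * alpha)
    return [[(n_dj[d][j] + alpha) / (n_d[d] + T * alpha) for j in range(T)]
            for d in range(len(n_d))]

# Toy check: with all-zero counts the estimates reduce to the uniform prior.
theta = estimate_theta([[0, 0]], [0], alpha=0.5, T=2)
print(theta)  # [[0.5, 0.5]]
```

π and ψ follow the same pattern with their own counts and priors.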

4. Experiment Analysis

4.1. Data Collection. In order to verify the TSTS model proposed in this paper, four hot events were randomly selected from the trending searches of Sina Weibo in 2019, and the comments of the four events are regarded as the experimental datasets. The four datasets selected are "Military parade on National Day," "The assault on a doctor," "Hong Kong's event," and "Garbage sorting in Shanghai." The comments are extracted from the Sina social network platform. In the original datasets, there are some meaningless tokens in the microblog text, such as stop words, interjections of tone, punctuation marks, and numeric expressions. Before text modeling, a word segmentation package in Python is used to process the initial experimental dataset. In addition, considering that comments on social networks are relatively new, fashionable expressions in the social network are collected and added to a customized dictionary, so these emerging words can be identified as far as possible and replaced with normal expressions. Furthermore, there are some useless strings in the text, such as URL links and numbers, which can be filtered by regular expressions. Finally, a total of 14,288 experimental records in the four events are obtained. The description of the four datasets is shown in Table 2.
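The URL and number filtering described above can be done with regular expressions; a minimal sketch (the patterns and stop-word handling are illustrative, and a real pipeline would first apply Chinese word segmentation, for example with a package such as jieba):

```python
import re

URL_RE = re.compile(r"https?://\S+")  # strip URL links
NUM_RE = re.compile(r"\d+")           # strip numeric expressions

def clean_comment(text, stopwords=frozenset()):
    # Remove URLs and digits, split into tokens, then drop stop words.
    text = URL_RE.sub(" ", text)
    text = NUM_RE.sub(" ", text)
    tokens = text.split()
    return [t for t in tokens if t not in stopwords]

tokens = clean_comment("great parade http://t.cn/abc 2019 !!", stopwords={"!!"})
print(tokens)  # ['great', 'parade']
```

For segmented Chinese text the same function applies unchanged, since the tokens arrive space-separated after segmentation.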

4.2. Sentiment Dictionary. The words or phrases in the sentiment dictionary have obvious sentiment tendency and can be divided into positive and negative words. The sentiment dictionary in this paper has two major roles. On the one hand, it identifies sentiment polarity words and distinguishes topic features from sentiment words. On the other hand, it supplies sentiment prior information that makes the model more accurate in judging the sentiment polarity of the text. Given that sentiment polarity words can reflect users' sentiment tendency, it is of great significance to analyze the sentiment orientation of the text.

At present, there are two major Chinese sentiment dictionaries, NTU and HowNet. The former contains 2812 positive words and 8276 negative words. The latter contains about 5000 positive words and 5000 negative words. Based on HowNet and the classification of sentiment polarity [29, 30], this paper constructs the sentiment dictionary for the TSTS model evaluation experiment, as shown in Table 3.

4.3. Parameter Setting. In this paper, the Gibbs algorithm is used to sample the TSTS model and estimate the four posterior parameters. Following the parameter settings of traditional topic models, the hyperparameters are set as follows. First, α is set to 50/K, where K is the number of topics extracted. Second, β is set to 0.01. Third, γ is set to (0.05 × AVE)/S, where AVE stands for the average length of articles, that is, the average number of words per microblog in this experiment, and S stands for the total number of polarity labels. Finally, μ is set to 1/nd.
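These settings translate directly into code (values follow the paper; the function and variable names are mine, and the example arguments use the averages reported in Section 4.1 plus an assumed document length):

```python
def tsts_hyperparameters(K, avg_doc_len, S, n_d):
    """K: number of topics; avg_doc_len: average words per microblog (AVE);
    S: number of sentiment polarity labels; n_d: word pairs in document d."""
    alpha = 50.0 / K                  # topic prior
    beta = 0.01                       # word-pair prior
    gamma = (0.05 * avg_doc_len) / S  # sentiment prior
    mu = 1.0 / n_d                    # time prior, per document
    return alpha, beta, gamma, mu

alpha, beta, gamma, mu = tsts_hyperparameters(K=20, avg_doc_len=98, S=4, n_d=10)
print(alpha, beta, gamma, mu)
```

With K = 20 and AVE = 98 this gives α = 2.5 and γ ≈ 1.225; μ varies per document with its number of word pairs.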

4.4. Evaluation Indicator. For the extraction of topic features, perplexity is used as an evaluation indicator to measure the predictive power on unknown data in the process of model building; lower perplexity means better performance. The calculation formula of perplexity is as follows:

perplexity(P(D̃t | M)) = exp{ − [Σ_{d=1}^{Dt} log P(b̃dt | M)] / [Σ_{d=1}^{Dt} Ñdt] },  (13)

where D̃t = {b̃dt}_{d=1}^{Dt} represents an unknown dataset with timestamp t, and

P(b̃dt | M) = ∏_{n=1}^{Ñdt} Σ_{l=1}^{L} Σ_{z=1}^{T} P(b̃dn | l, z) P(z | l) P(l),  (14)

where b̃dt represents the vector set of word pairs in text d, Ñdt represents the number of word pairs in b̃dt, and P(b̃dt | M) represents the likelihood of the training corpus, with the formula

P(b̃dt | M) = ∏_{i=1}^{V} (Σ_{l=1}^{L} Σ_{z=1}^{T} φlzi · θdlz · πdl)^{Ñdit}.  (15)
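Given per-document log-likelihoods, the perplexity of equation (13) is a short computation; a sketch with assumed inputs (the log-likelihood values below are illustrative, not from the experiments):

```python
import math

def perplexity(doc_log_likelihoods, doc_lengths):
    # perplexity = exp(-sum_d log P(b_d | M) / sum_d N_d);
    # lower values indicate a better predictive fit.
    return math.exp(-sum(doc_log_likelihoods) / sum(doc_lengths))

# Toy example: two documents of 10 word pairs each, log-likelihoods assumed.
ppl = perplexity([-23.0, -25.0], [10, 10])
print(round(ppl, 3))  # 11.023
```

In practice the log-likelihoods come from equation (15), evaluated with the φ, θ, and π estimates of equation (12) on held-out documents.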

For sentiment discrimination, sentiment judgment from the perspective of the document is used as the evaluation index, based on the sentiment polarity labels in the sentiment dictionary. For the documents in this experiment, the positive and negative sentiment of a document can be judged. This paper adopts the consistency test method to mark the sentiment labels [31].

5. Results

5.1. Extraction of Topics. The primary task of the TSTS model is to extract topic features. As an extension of the topic-sentiment mixture model, the assessment is to judge whether the extracted topic features are reasonable and accurate. Before extracting topic features in text modeling, it is necessary to determine the number of topics to be extracted and the number of iterations of Gibbs sampling. For the effective evaluation of topic discovery, the degree of perplexity is used as the measurement index in this paper: the lower the perplexity, the better the fitting effect of the model. Taking dataset 1 as an example, the simulation results are shown in Figure 3.

Based on the experimental results shown in Figure 3, the number of topics was set to 20 in the subsequent experiments. In addition, we can calculate the perplexity of the three models as the number of iterations changes. By comparing the experimental results of TSTS and LDA, it can be found that TSTS always outperforms LDA and that the perplexity decreases as the iterations increase. This indicates that the topic discovery ability of TSTS gradually improves, mainly because the TSTS model incorporates word pairs to alleviate the sparse matrix of LDA for short texts. By comparing the experimental results of TSTS and BTM, it can be found that TSTS was better than BTM when the number of iterations increases. However, as the number

Table 2: Experiment datasets.

Dataset (number of comments) | Words per microblog (initial / pretreated) | Vocabulary size (initial / pretreated)
Dataset 1 (3562) | 134 / 102 | 9789 / 6319
Dataset 2 (3527) | 127 / 94 | 9736 / 6242
Dataset 3 (3617) | 131 / 100 | 9780 / 6301
Dataset 4 (3582) | 128 / 96 | 9742 / 6254
Average | 130 / 98 | 9762 / 6279
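The pretreatment behind Table 2 (dropping URLs, punctuation, stopwords, and very short tokens) can be sketched as follows; the stopword list is hypothetical and English tokens are used for illustration, whereas real microblog text would first need Chinese word segmentation:

```python
import re

STOPWORDS = {"the", "a", "of", "is", "to", "and"}  # hypothetical list

def pretreat(text):
    """Rough pretreatment in the spirit of Table 2: strip URLs,
    lowercase, keep alphabetic tokens, drop stopwords and
    single-character tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [w for w in tokens if w not in STOPWORDS and len(w) > 1]

print(pretreat("The parade http://t.cn/abc is a GREAT moment!"))
```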

Table 3: Classification of sentiment words.

Sentiment label | Happy | Surprise | Sad | Angry
Vocabulary size | 2467 | 276 | 3025 | 1897

6 Complexity

of iterations increased, the gap between the two models became smaller. This is because the word pairs of BTM are drawn from the whole corpus; when the number of iterations is small, the proportion of noise words is relatively large, resulting in poor quality of topic words. In addition, the sentiment layer is integrated into TSTS, and the error generated in sentiment estimation affects the next iteration. Although TSTS is worse than BTM when there are more iterations, the effect of TSTS can still stay balanced with BTM. Therefore, during the extraction of topic features, the number of topics and iterations can be set to 20 and 600.

5.2. Sentiment Polarity. The information related to sentiment polarity is provided in accordance with the topic and the sentiment polarity of words. The sentiment distribution of topics extracted from the TSTS model is shown in Figure 4. In addition, JST and ASUM are introduced as comparisons to measure the sentiment recognition of the TSTS model. Each document has a binary sentiment label, such as positive or negative sentiment. Taking dataset 2, "The assault on a doctor", as an example, the result is shown in Figure 4. The number of topics is set to 5 at the beginning of the experiment. With the refinement of granularity, the performance of the TSTS model increases. Compared with JST and ASUM, the curve of the TSTS model changes greatly, considering the topic and sentiment relationship among word pairs of the document. The curve of the JST model shows a steady upward trend, and the identification efficiency of ASUM is low. That is because ASUM has strict assumptions, and an increase in the number of topics causes the decentralization of topics and sentiments, which has a great negative impact on the overall performance of the model. The overall effect of the TSTS model was slightly better than JST and ASUM, but the effect decreased slightly after the number of topics increased to 20. This is because the data collected in the dataset are limited and the number of topics has been set to discretize the word distribution; thus, the judgment of sentiment polarity is affected. The sentiment label classification of documents is compared under different topics, and the result of the TSTS model is better than JST and ASUM.

With the increase of topics, the recognition performance of the topic model fluctuates, but the TSTS model was always better than JST and ASUM. When the number of topics and iterations is set to 20 and 600, TSTS is the best model in topic detection. When the number of topics in the four datasets was set to 20, the accuracy of sentiment polarity judgment is shown in Table 4.

From Table 4, the TSTS model is better than JST and ASUM in judging the sentiment polarity of documents. This

Figure 3: The relationship of the perplexity with the number of (a) topics and (b) iterations.

Figure 4: Accuracy of sentiment polarity judgment.


is because sentiment polarity depends on the performance of topic discovery in the previous stage. In this experiment, the effects of JST and ASUM were exactly opposite. The difference was caused by the length of the original documents, which also indirectly verifies the effectiveness of the TSTS model.

From Figure 5, it can be seen that the proportion of positive sentiment is significantly higher than the other sentiments in the dataset "Military parade on National Day" and the dataset "Garbage sorting in Shanghai", which is consistent with the sentiment tendency of users in the social network. For the second dataset, "The assault on a doctor", the two negative sentiment polarities of topic 1 and topic 2 were compared. Topic 1 leans toward sad sentiment, while topic 2 leans toward angry sentiment; topic 1 reflects the statement of the event, and topic 2 represents the follow-up discussion of the event.

5.3. Topic and Sentiment Evolution. The curves of topic features extracted from the four datasets through the TSTS model are shown in Figure 6. Taking dataset 2 as an example, the topic curve conforms to the evolution law of social and abrupt events. The two curves represent the trend of feature words over time in topic 1 and topic 2. Topic 1 is the

Table 4: Accuracy of sentiment polarity judgment.

Dataset | ASUM | JST | TSTS
Dataset 1 | 0.4763 | 0.5427 | 0.6348
Dataset 2 | 0.4617 | 0.5398 | 0.6599
Dataset 3 | 0.4832 | 0.5461 | 0.6475
Dataset 4 | 0.4841 | 0.5294 | 0.6522

Figure 5: Sentiment distribution in four datasets. (a) Military parade on National Day. (b) The assault on a doctor. (c) Hong Kong's event. (d) Garbage sorting in Shanghai.


statement about the case itself. From the beginning of the event, the amount of discussion about the event on the social network rose sharply and then gradually declined. Topic 2 is a discussion on the development of the case, which caused a second wave of hot discussion. The times at which the two curves reach their peaks are not consistent. The peak

Figure 6: The changing of topics in four datasets. (a) Military parade on National Day. (b) The assault on a doctor. (c) Hong Kong's event. (d) Garbage sorting in Shanghai.




value of the topic 2 curve is lower than that of topic 1, which reflects that discussion of the same event fades over time. Even if a new topic emerges, discussion of the new topic is far lower than at the beginning of the event. Meanwhile, similar results can be verified in the other three datasets.

The proportion of the sentiment polarity in the four datasets is shown in Figure 7. Since the sentiment polarity proportion is measured, the four sentiment polarities are distributed in a balanced way before the occurrence of the events. After an event occurred, the polarities of positive and negative sentiment began to move toward the two extremes. Among the four datasets, the positive sentiment was higher than the negative sentiment in the first dataset, "Military parade on National Day", and the fourth dataset, "Garbage sorting in Shanghai", which also conforms to the social sentiment of the events. In addition, it can be found that the difference among the four sentiment labels is large in the initial stage, and the distribution of sentiment labels becomes stable in the later stage. This proves that the second report of a social event does not cause the same heat as the first. But the sentiment tendency in social networks does not decline sharply with the reduction of discussion, which can be seen in topic 2 of the second dataset, "The assault on a doctor". Given that the topic features come from the background of a corpus and contain many noise words, the relative positions of the four curves are closer in terms of sentiment polarity evolution. However, there is still a gap with respect to government sentiment, which is different from the even distribution of sentiment polarity at the beginning of the event.

6. Discussion

From the perspective of theoretical significance, this paper extends the LDA model to some extent. First, in view of the sparse matrix caused by short texts, word pairs are introduced to replace single words for text generation, following BTM. Based on the hypotheses of JST and ASUM, the sentiment layer is introduced to form the Bayesian network structure, and the word pair is limited to the same sentiment polarity distribution. Second, in order to realize dynamic analysis and text homogeneity, the timestamp and the corresponding superparameter are introduced to alleviate the problem of word order in text generation. Third, this research combines behavioral experiments, big data mining, mathematical modeling, and simulation to promote the expansion of research into new situations and new methods.

From the perspective of practical significance, this paper is of great value in tracking and monitoring topics of public opinion in social networks. The online public opinions of hot events can be monitored, which helps governments and departments accurately judge social events and make emergency decisions. In addition, this paper analyzed the evolution, response, and governance of public opinion, which is conducive to understanding the formation mechanism and the collaborative evolution of public opinion. Meanwhile, the use of public opinion information can detect and screen information, prevent the spread of rumors, and scientifically formulate the mechanism of utilization to effectively reduce the risk of loss.

7. Conclusions

In the context of the mobile social network, the number of short texts is growing explosively. In order to extract information from massive short texts quickly and monitor public opinions, the TSTS model is proposed in this study based on LDA, BTM, JST, ASUM, and TOT. From the experimental results, the TSTS model achieves good performance. In terms of topic feature extraction, the perplexity of TSTS is always lower than that of LDA. Moreover, although its perplexity is slightly higher than BTM's as the number of iterations increases, it can maintain balance with BTM. In sentiment analysis, the effect of TSTS was significantly better than JST and ASUM. Finally,


Figure 7: The changing of sentiment in four datasets. (a) Military parade on National Day, topic 1. (b) Military parade on National Day, topic 2. (c) The assault on a doctor, topic 1. (d) The assault on a doctor, topic 2. (e) Hong Kong's event, topic 1. (f) Hong Kong's event, topic 2. (g) Garbage sorting in Shanghai, topic 1. (h) Garbage sorting in Shanghai, topic 2.


the TSTS model incorporating the time factor can determine the change trend of topics and sentiment.

There are still some shortcomings in this paper. Firstly, for the extraction of topic feature words, a global topic level can be added to the topic layer of the TSTS model to filter common topic words. Secondly, in sentiment polarity judgment, the sentiment labels are manually marked based on prior knowledge; however, sentiments are extremely rich and changeable. In future research, Bayesian networks and entity theory can be used to judge sentiment bias.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the Project of the National Science and Technology Department (2018YFF0213102), Public Projects of Zhejiang Province (LGF18G010003, LGF19G010002, and LGF20G010002), the Science and Technology Project of Zhejiang Province (2020C01158), and the First Class Discipline of Zhejiang-A (Zhejiang Gongshang University-Statistics).

References

[1] CNNIC, "Statistical report on internet development in China," CNNIC, Beijing, China, 2019, http://www.cac.gov.cn/2019-08/30/c_1124938750.htm.

[2] M. Ingawale, A. Dutta, R. Roy, and P. Seetharaman, "Network analysis of user generated content quality in Wikipedia," Online Information Review, vol. 37, no. 4, pp. 602–619, 2013.

[3] Q. Gao, Y. Tian, and M. Tu, "Exploring factors influencing Chinese user's perceived credibility of health and safety information on Weibo," Computers in Human Behavior, vol. 45, pp. 21–31, 2015.

[4] M. Steyvers and T. Griffiths, "Probabilistic topic models," in Handbook of Latent Semantic Analysis, Psychology Press, New York, NY, USA, vol. 427, no. 7, pp. 424–440, 2007.

[5] J.-F. Yeh, Y.-S. Tan, and C.-H. Lee, "Topic detection and tracking for conversational content by using conceptual dynamic latent Dirichlet allocation," Neurocomputing, vol. 216, pp. 310–318, 2016.

[6] X. Zhou and L. Chen, "Event detection over twitter social media streams," The VLDB Journal, vol. 23, no. 3, pp. 381–400, 2014.

[7] Y. J. Du, Y. T. Yi, and X. Y. Li, "Extracting and tracking hot topics of micro-blogs based on improved latent Dirichlet allocation," Engineering Applications of Artificial Intelligence, vol. 87, pp. 1–13, 2020.

[8] H. J. Kim, Y. K. Jeong, Y. Kim et al., "Topic-based content and sentiment analysis of Ebola virus on twitter and in the news," Journal of Information Science, vol. 42, no. 6, pp. 763–781, 2016.

[9] B. Subeno, R. Kusumaningrum, and F. Farikhin, "Optimisation towards latent Dirichlet allocation: its topic number and collapsed Gibbs sampling inference process," International Journal of Electrical and Computer Engineering (IJECE), vol. 8, no. 5, pp. 3204–3213, 2018.

[10] H. Park, T. Park, and Y.-S. Lee, "Partially collapsed Gibbs sampling for latent Dirichlet allocation," Expert Systems with Applications, vol. 131, pp. 208–218, 2019.

[11] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.

[12] X. Cheng, X. Yan, Y. Lan, and J. Guo, "BTM: topic modeling over short texts," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 12, pp. 2928–2941, 2014.

[13] J. Rashid, S. M. A. Shah, and A. Irtaza, "Fuzzy topic modeling approach for text mining over short text," Information Processing & Management, vol. 56, no. 6, pp. 1–19, 2019.

[14] H.-Y. Lu, N. Kang, Y. Li, Q.-Y. Zhan, J.-Y. Xie, and C.-J. Wang, "Utilizing recurrent neural network for topic discovery in short text scenarios," Intelligent Data Analysis, vol. 23, no. 2, pp. 259–277, 2019.

[15] L. Zhu, H. Xu, Y. Xu et al., "A joint model of extended LDA and IBTM over streaming Chinese short texts," Intelligent Data Analysis, vol. 23, no. 3, pp. 681–699, 2019.

[16] M. Tang, J. Jin, Y. Liu et al., "Integrating topic, sentiment, and syntax for modeling online product reviews: a topic model approach," Journal of Computing and Information Science in Engineering, vol. 19, no. 1, pp. 1–12, 2019.

[17] R. K. Amplayo, S. Lee, and M. Song, "Incorporating product description to sentiment topic models for improved aspect-based sentiment analysis," Information Sciences, vol. 454-455, pp. 200–215, 2018.

[18] L. Yao, Y. Zhang, B. Wei et al., "Concept over time: the combination of probabilistic topic model with Wikipedia knowledge," Expert Systems with Applications, vol. 60, pp. 27–38, 2016.

[19] S. Park, W. Lee, and I.-C. Moon, "Associative topic models with numerical time series," Information Processing & Management, vol. 51, no. 5, pp. 737–755, 2015.

[20] P. Lorenz-Spreen, F. Wolf, J. Braun et al., "Tracking online topics over time: understanding dynamic hashtag communities," Computational Social Networks, vol. 5, no. 1, pp. 1–18, 2018.

[21] Y. He, C. Lin, W. Gao et al., "Dynamic joint sentiment-topic model," ACM Transactions on Intelligent Systems & Technology, vol. 5, no. 1, pp. 1–21, 2013.

[22] L. Kuo and T. Y. Yang, "An improved collapsed Gibbs sampler for Dirichlet process mixing models," Computational Statistics & Data Analysis, vol. 50, no. 3, pp. 659–674, 2006.

[23] Y. Papanikolaou, J. R. Foulds, T. N. Rubin et al., "Dense distributions from sparse samples: improved Gibbs sampling parameter estimators for LDA," Statistics, vol. 18, no. 62, pp. 1–58, 2015.

[24] X. Zhou, J. Ouyang, and X. Li, "Two time-efficient Gibbs sampling inference algorithms for biterm topic model," Applied Intelligence, vol. 48, no. 3, pp. 730–754, 2018.

[25] P. Bhuyan, "Estimation of random-effects model for longitudinal data with nonignorable missingness using Gibbs sampling," Computational Statistics, vol. 34, no. 4, pp. 1693–1710, 2019.

[26] C. Lin, Y. He, R. Everson, and S. Ruger, "Weakly supervised joint sentiment-topic detection from text," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 6, pp. 1134–1145, 2012.


[27] A. Daud and F. Muhammad, "Group topic modeling for academic knowledge discovery," Applied Intelligence, vol. 36, no. 4, pp. 870–886, 2012.

[28] Z. Huang, J. Tang, G. Shan, J. Ni, Y. Chen, and C. Wang, "An efficient passenger-hunting recommendation framework with multi-task deep learning," IEEE Internet of Things Journal, vol. 6, no. 5, pp. 7713–7721, 2019.

[29] S. M. Mohammad, S. Kiritchenko, and X. Zhu, "NRC-Canada: building the state-of-the-art in sentiment analysis of tweets," in Proceedings of the Seventh International Workshop on Semantic Evaluation Exercises (SemEval-2013), Springer, Atlanta, GA, USA, pp. 1–5, June 2013.

[30] T. Chen, Q. Li, J. Yang, G. Cong, and G. Li, "Modeling of the public opinion polarization process with the considerations of individual heterogeneity and dynamic conformity," Mathematics, vol. 7, no. 10, p. 917, 2019.

[31] S. A. Curiskis, B. Drake, T. R. Osborn, and P. J. Kennedy, "An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit," Information Processing and Management, vol. 57, no. 2, pp. 1–21, 2019.



topic modeling technology and investigated the topic reports and sentiment dynamics in news about the Ebola virus. Subeno et al. [9] proposed a collapsed Gibbs sampling method based on the latent Dirichlet allocation (LDA) model, widely used on Spark. Park et al. [10] used partially collapsed Gibbs sampling for latent Dirichlet allocation and proposed a reasoning LDA method, which effectively obtained unbiased estimation under flexible modeling of a heterogeneous text corpus through partial collapsing and Dirichlet mixed processing.

However, there are still some problems in the detection of public opinion for events in the social networks of a smart city. Firstly, the detection and analysis of public opinions on hot events in social networks mostly remain at the level of qualitative analysis or empirical research, lacking quantitative research. Secondly, there is a lack of public opinion analysis methods that incorporate the characteristics of microblogs in social networks. Thirdly, for the sentiment analysis of public opinion events, most research adopts a two-stage method, that is, detecting the event first and then conducting sentiment analysis and judgment, which is likely to lead to a separation between the event and the sentiment. Fourthly, the dissemination of public opinions on hot events is time sensitive, so it is necessary to involve the time factor in comment text analysis. Thus, this paper proposes a mixed model with dynamic topic-sentiment (TSTS) for short texts in the social network, which comprehensively incorporates the topics, sentiment, and time factor of events to detect public opinion. Through quantitative analysis of real experimental data, the model can not only show the quantitative evolution trend of public opinion but also provide the propagation rule of sudden events.

The main contributions of this paper are reflected in two aspects. Firstly, the TSTS model is proposed by extending the topic model, which can not only extract both topic and sentiment polarity words but also integrate the time factor to realize dynamic analysis of short texts. Secondly, this paper studies the detection and evolution analysis of Internet public opinion with the dynamic topic-sentiment model. Real datasets are used to conduct an experimental analysis of the proposed model, which can reflect the evolution trend of public opinion diffusion.

2. Related Research

2.1. Research on Relevant Topic Models. Traditional opinion mining analyzes sentiment orientations at the level of the document and the sentence. The traditional topic model was mainly used to compare the similarity among articles by counting the number of repeated words in different articles. Blei et al. [11] proposed the latent Dirichlet allocation (LDA) topic model to mine the hidden semantics of a text. LDA is a three-layer Bayesian model involving the document, topic, and word. The document is composed of a mixed distribution of topics, and each topic follows a polynomial distribution; the Dirichlet distribution is introduced as the prior of the polynomial distribution. The schematic diagram of the LDA topic model is shown in Figure 1.

2.1.1. Research on Topic Models for Short Text. For most topic models, a topic consists of words that appear in the document and have some connection. In previous studies, the topic model of short texts was expanded by introducing relevant background or author information, which weakened the topic and produced meaningless word contributions. Similarly, if co-occurring words are extended to the whole corpus in the experiment, the occurrence frequency of each word will be greatly increased and the connections between words will be closer; then modeling the documents will be easier. Based on the above hypotheses, Cheng et al. [12] proposed the biterm topic model (BTM), which is another way to explain the relationship between words; text modeling of documents can be conducted based on the word co-occurrence of the whole corpus. Rashid et al. [13] proposed fuzzy topic modeling (FTM) for short texts from the fuzzy perspective to solve the sparsity problem. Based on BTM, Lu et al. [14] introduced the RIBS-Bigrams model by learning the usage relationship, which shows topics with bigrams. Zhu et al. [15] proposed a joint model based on latent Dirichlet allocation (LDA) and BTM, which not only alleviates the sparsity of the BTM algorithm in processing short texts but also preserves the topic information of the document through the extended LDA.
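The word co-occurrence idea behind BTM can be made concrete: a biterm is an unordered pair of words drawn from the same short document. A minimal sketch (function name and example tokens are our own):

```python
from itertools import combinations

def biterms(tokens):
    """All unordered word pairs (biterms) from one short document,
    as used in BTM-style models; pairs are sorted so that
    (a, b) and (b, a) count as the same biterm."""
    return [tuple(sorted(p)) for p in combinations(tokens, 2)]

print(biterms(["smart", "city", "opinion"]))
```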

2.1.2. Research on Mixed Models Integrating Topic and Sentiment. To evaluate the sentiment tendency of documents, the joint sentiment topic (JST) model added a sentiment layer to the LDA model, forming a four-layer Bayesian network [16]. In this structure, the sentiment polarity label is related to the document, and word generation is influenced by both topic and sentiment. In the traditional LDA model, the generation of documents and words is determined by the topic, but in the JST model, the words of a document are determined by the topic and the sentiment. Amplayo et al. [17] proposed the aspect and sentiment unification model (ASUM) with a sentiment level. The difference between JST and ASUM is that words in a sentence come from different topics in the JST model, while all words of a sentence belong to one topic in the ASUM model.

Figure 1: Schematic diagram of the LDA model.


2.1.3. Research on Topic Models with a Time Factor. Yao et al. [18] revealed the semantic change process of words by correlating the time factor with Wikipedia text knowledge. In terms of event evolution, the associative topic model (ATM) was proposed [19], and the recognized cluster is represented as the word distribution of the cluster with the corresponding event. In addition, Topics Over Time (TOT) was proposed to integrate the time factor into the LDA model [20]. In the TOT model, word co-occurrence can affect the discovery of subject words, and time information can also affect the extraction of topic words. Unlike other models, each topic in TOT is subject to a continuous distribution over time and does not rely on Markov models to discretize time. For each document generation, the mixed distribution of topics is determined by word co-occurrence and timestamp [21], which allows the TOT model to maintain independence in the time dimension and to predict the time of a document without any time information.
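The continuous time distribution used per topic in TOT-style models is typically a Beta distribution over timestamps normalized to (0, 1), with parameters estimated by the method of moments. A sketch of that estimator (the timestamps below are hypothetical):

```python
def fit_beta(timestamps):
    """Method-of-moments fit of Beta(alpha, beta) to timestamps
    normalized to (0, 1), the kind of per-topic time distribution
    TOT-style models maintain. Returns (alpha, beta)."""
    n = len(timestamps)
    mean = sum(timestamps) / n
    var = sum((t - mean) ** 2 for t in timestamps) / n
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common

# hypothetical normalized timestamps for one topic's word pairs
a, b = fit_beta([0.1, 0.15, 0.2, 0.25, 0.3])
print(a, b)
```

A topic whose mass concentrates early in the event gets a Beta skewed toward 0; a long-running topic gets a flatter fit.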

2.2. Gibbs Sampling. The derivation of the experimental model in this paper is a variant of the Markov chain, so the Markov chain Monte Carlo (MCMC) method is used for sampling in the experiment. Gibbs sampling, one of the MCMC methods, has been widely used in prior research. It is used to obtain a set of observations that approximate a specified multidimensional probability distribution, such as the joint probability distribution of two random variables.

The Gibbs sampling method used for the latent Dirichlet allocation (LDA) model can significantly improve the speed on a real-text corpus [22]. Papanikolaou et al. [23] estimated latent Dirichlet allocation (LDA) parameters from Gibbs sampling by using all conditional distributions of potential variable assignments to effectively average multiple samples. Zhou et al. [24] proposed two Gibbs sampling inference methods, Sparse BTM and ESparse BTM, to realize BTM by trading off space and time. Bhuyan [25] proposed a correlated random effect model based on latent variables and an algorithm to estimate correlation parameters based on Gibbs sampling.
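As background for the samplers cited above, the following is a minimal sketch of a collapsed Gibbs sampler for plain LDA, the base model that the TSTS derivation extends; it is not the TSTS sampler itself. Documents are lists of integer word ids, and all sizes are toy-scale:

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA on toy data.
    Returns topic assignments z, doc-topic counts, topic-word counts."""
    rng = random.Random(seed)
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove this token's current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # full conditional p(z_i = k | rest), up to a constant
                weights = [(ndk[d][k2] + alpha) * (nkw[k2][w] + beta)
                           / (nk[k2] + vocab_size * beta)
                           for k2 in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z, ndk, nkw

docs = [[0, 0, 1, 2], [2, 3, 3, 3], [0, 1, 1, 2]]
z, ndk, nkw = lda_gibbs(docs, n_topics=2, vocab_size=4)
print(ndk)
```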

3. Model Construction

3.1. Topic-Sentiment Mixture Model with Time Factor (TSTS). Based on prior research, this paper improves the topic model in three aspects. Firstly, the sparse matrix caused by short texts in the social network is addressed. Secondly, the topic and sentiment distributions of the same word pair are controlled. Thirdly, the problem of text homogeneity is solved by incorporating the time factor into the topic model. Therefore, the TSTS model proposed in this paper constrains the word pairs in the same document, which greatly reduces the complexity of time and space and makes up for the sparse matrix of short texts to some extent. Moreover, the sentiment layer is integrated into TSTS by extending the hypotheses of ASUM, and the word pairs generated by constraining sentences are restrained to follow the same topic-sentiment distribution. Finally, the TSTS model incorporating the time factor does not rely on the Markov model to discretize time, and each topic is subject to a continuous temporal distribution. For each document generation, the mixed distribution of topics is determined by word co-occurrence and timestamps. The TSTS model is shown in Figure 2.

The TSTS model simulates the generating process of online comments. Generally, an online comment from a user can be regarded as a document, which is short, pithy, and highly emotional. The word co-occurrence from BTM is the most effective solution for the short-text topic model. In addition, the TSTS model with the time layer can continuously sample users' evaluations of hot events as well as the dynamic changes of users' sentiment. Therefore, the hypotheses of the TSTS model are proposed as follows:

(i) The probability distribution of the time factor is not directly equal to the joint distribution of the topic and sentiment.

(ii) The topic-sentiment distribution of each document is independent [26].

(iii) Similar topics of different sentiment polarity are not automatically categorized [27].

Combined with the probability graph of the Bayesian network, the TSTS model proposed in this paper has four characteristics. First, a word pair is used to replace a single word in the sampling model. Second, each timestamp is related to a topic and a sentiment. Third, the extraction of topic feature words and sentiment words is performed over the whole corpus. Fourth, in the derivation of the TSTS model, it is not necessary to match topic feature words with sentiment polarity words, because every topic and sentiment has a corresponding polynomial word-pair distribution. In addition, the text modeling process of the TSTS model also follows the assumption that there is a connection between the sentiment polarity words of the topic features, which also changes with the time factor. So the documents used to train the model must have a specific timestamp, such as the publishing time of the microblog.

3.2. Generation of a Text in the TSTS Model. In the TSTS model, we assume that a corpus is composed of several texts. For instance, a microblog is a text containing the two dimensions of topic and sentiment. Considering the effectiveness of public opinions and the related parameters of the microblog text, the word distribution is determined by the topic, sentiment, and time. So TSTS is an unsupervised topic-sentiment mixed model. The generation process of a document is as follows:

(1) Extract a polynomial distribution $\theta_d$ over topics from the Dirichlet prior distribution $\alpha$, that is, $\theta_d \sim \mathrm{Dir}(\alpha)$.

(2) Extract a polynomial distribution $\psi_{zl}$ over time points from the Dirichlet prior distribution $\mu$, that is, $\psi_{zl} \sim \mathrm{Dir}(\mu)$.


(3) Extract a polynomial distribution $\pi_z$ over sentiments from the Dirichlet prior distribution $\gamma$, that is, $\pi_z \sim \mathrm{Dir}(\gamma)$.

(4) For each document $d$ and for each pair of words $b = (w_{i1}, w_{i2})$, $b \in B$, in the document:

(a) Choose a topic $z_i \sim \theta_d$.

(b) Choose a sentiment label $l_i \sim \pi_{z_i}$.

(c) Choose a pair of words $b_i \sim \varphi_{z_i l_i}$.

(d) Choose a timestamp $t_i \sim \psi_{z_i l_i}$.
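Steps (1)-(4) can be simulated directly. A sketch under hypothetical toy dimensions (T topics, S sentiments, a V-word vocabulary, H timestamps), drawing one document of five word pairs; the symmetric-Dirichlet helper and all hyperparameter values here are illustrative, not the paper's settings:

```python
import random

rng = random.Random(1)

def dirichlet(alpha, k):
    """Draw from a symmetric Dirichlet via normalized Gamma samples."""
    xs = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(xs)
    return [x / s for x in xs]

def pick(probs):
    """Draw an index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs)[0]

T, S, V, H = 3, 2, 6, 4           # topics, sentiments, vocab, timestamps
theta = dirichlet(0.5, T)          # step (1): per-document topic mixture
psi = [[dirichlet(0.5, H) for _ in range(S)] for _ in range(T)]  # step (2)
pi = [dirichlet(0.5, S) for _ in range(T)]                       # step (3)
phi = [[dirichlet(0.1, V * V) for _ in range(S)] for _ in range(T)]

doc = []
for _ in range(5):                 # step (4): draw each word pair
    zt = pick(theta)               # (a) topic
    lt = pick(pi[zt])              # (b) sentiment label
    b = pick(phi[zt][lt])          # (c) word-pair index in the V x V grid
    ts = pick(psi[zt][lt])         # (d) timestamp
    doc.append((divmod(b, V), lt, ts))
print(doc)
```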

As shown in Figure 2, word pairs in a document may belong to different timestamps in the text generation process of the TSTS topic model. In theory, all the content of an article, such as words and topics, should belong to the same timestamp; also, introducing the time factor into the topic model will affect the topic homogeneity of an article. However, the default time factor of the TSTS model does not affect the homogeneity of the text, so it is assumed that the time factor in this paper has no weight. Based on TOT and the group topic (GT) model, the superparameter $\mu$ is introduced into TSTS to balance the interaction of time and words in document generation. The parameters of the TSTS model are explained in Table 1.

3.3. Model Deduction. According to the Bayesian network structure diagram of the TSTS model, the polynomial distribution $\theta$ of topics, the distribution $\pi$ of sentiment given the topic, the distribution $\varphi$ of word pairs given <topic, sentiment>, and the distribution $\psi$ of time given <topic, sentiment> can be calculated from the superparameters $\alpha$, $\beta$, $\gamma$, and $\mu$. Then Gibbs sampling is performed, which ensures the convergence of the TSTS model under enough iterations, and each word in the document is assigned the topic and sentiment that best fit the facts.

According to the principle of Bayesian independence, the joint probability of word pairs, topics, sentiment polarities, and timestamps is given as follows:

p(b, t, l, z | α, β, γ, μ) = p(b | l, z, β) · p(t | l, z, μ) · p(l | z, γ) · p(z | α),   (1)

where the variables are conditionally independent of the other priors: word pairs b of α, γ, and μ; timestamps t of α, γ, and β; sentiment polarities l of α, μ, and β; and topics z of β, γ, and μ. Therefore, the joint distribution can be obtained by calculating the four factors on the right side of the equation.

Given the sentiment polarity label of specific topic features, the distribution of b can be regarded as a multinomial distribution. Under the topics z_i and sentiment labels l_i, b_i is generated N times with probability p(b | l, z) each time. Given that word pairs are independent of each other, we can obtain

p(b | l, z, φ) = ∏_{i=1}^{N} p(b_i | z_i, l_i) = ∏_{i=1}^{N} φ_{zi,li,bi}.   (2)

Hyperparameters are the representation parameters of the framework in a machine learning model [28], such as the number of classes in a clustering method or the number of topics in a topic model. In the Bayesian network, the distribution and density function of θ are denoted as H(θ) and h(θ), respectively. They are regarded as the prior distribution function and the prior density function, collectively referred to as the prior distribution. The distribution of θ obtained after sampling is called the posterior distribution. Based on the conjugate property of the Dirichlet–multinomial pair, when the


Figure 2: TSTS model.

Table 1: Explanation of parameters.

D: Number of documents
V: Vocabulary size
T: Number of topics
S: Number of sentiment polarities
H: Number of timestamps
M: Number of word pairs
B: Set of word pairs
b: Word pair b = (w_i1, w_i2)
w: Word
t: Time
z: Topic
l: Sentiment polarity label
Θ = [θ_d]: Multinomial distribution of topics
Φ = [φ_zl]: T × S × V matrix, word pairs' distribution
Π = [π_z]: T × S matrix, sentiment distribution
Ψ = [ψ_zl]: T × S × H matrix, time distribution
α: Dirichlet prior parameter of Θ
γ: Dirichlet prior parameter of Π
β: Asymmetric Dirichlet prior parameter of Φ
μ: Dirichlet prior parameter of Ψ
n_d: The number of word pairs in document d
n_dj: The number of word pairs for topic j in document d
n_j: The number of word pairs for topic j
n_jk: The number of word pairs assigned to topic j and sentiment polarity k
n_ijk: The number of times word pair b_i is assigned to topic j and sentiment polarity k
n_jkh: The number of word pairs assigned to topic j and sentiment polarity k with timestamp h
n_−p: The number of word pairs in the current document except for position p


parameters of the population distribution follow a multinomial distribution, the conjugate prior distribution satisfies

Dir(θ | α) + Mult(δ) = Dir(θ | α + δ).   (3)
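The conjugate update in equation (3) amounts to adding the observed multinomial counts to the Dirichlet pseudo-counts; the posterior mean then follows the Dirichlet expectation used later in the derivation. The numbers below are illustrative:

```python
import numpy as np

alpha = np.array([2.0, 3.0, 5.0])   # Dirichlet prior pseudo-counts (illustrative)
delta = np.array([10, 0, 4])        # observed multinomial counts (illustrative)

posterior = alpha + delta           # Dir(theta | alpha + delta), by conjugacy

# Posterior mean via the Dirichlet expectation E[theta_i] = eps_i / sum(eps).
posterior_mean = posterior / posterior.sum()
```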

For the general text model, the discrete Dirichlet and multinomial distributions are as follows:

Dir(b | β) = [Γ(∑_{j=1}^{T} β) / ∏_{j=1}^{T} Γ(β)] ∏_{j=1}^{T} b_j^{β−1},   (4)

Mult(n | b, N) = (N; n_1, …, n_T) ∏_{j=1}^{T} b_j^{n_j},   (5)

where i, j, k, and h index the word pairs, topics, sentiments, and timestamps in the modeling process, respectively. Since p(b | l, z, β) follows the Dirichlet distribution, this paper introduces φ for p(b | l, z, β), which can be obtained by integrating over φ:

p(b | l, z, β) = ∫ p(b | l, z, φ) · p(φ | β) dφ = (Γ(Vβ)/Γ(β)^V)^{T·S} ∏_j ∏_k [∏_i Γ(n_ijk + β)] / Γ(n_jk + Vβ).   (6)

To estimate the posterior parameter φ in the formula, we can combine the Bayes formula with the conjugate property of the Dirichlet–multinomial pair. The distribution of the posterior parameter can be obtained as follows:

p(φ | l, z, β) ∝ Dir(φ | n_ijk + β).   (7)

Given that the expectation of the Dirichlet distribution is E(Dir(ε)) = ε_i / ∑_i ε_i, the parameters are estimated by the expectation of the known posterior parameter distribution; the estimated results are shown in equation (12). Similarly, ψ is introduced for p(t | l, z, μ). By integrating over ψ, it can be obtained as follows:

p(t | l, z, μ) = (Γ(Hμ)/Γ(μ)^H)^{T·S} ∏_j ∏_k [∏_h Γ(n_jkh + μ)] / Γ(n_jk + Hμ).   (8)

For p(l | z, γ), π is introduced. By integrating over π, it can be obtained as follows:

p(l | z, γ) = (Γ(∑_k γ_k)/∏_k Γ(γ_k))^T ∏_j [∏_k Γ(n_jk + γ_k)] / Γ(n_j + ∑_k γ_k).   (9)

For p(z | α), θ is introduced. By integrating over θ, it can be obtained as follows:

p(z | α) = (Γ(∑_j α_j)/∏_j Γ(α_j))^D ∏_d [∏_j Γ(n_dj + α_j)] / Γ(n_d + ∑_j α_j).   (10)

The TSTS model can estimate the posterior distribution after the estimated values of z and s have been obtained by sampling calculations. Then the calculated equations (2)–(6) are substituted into equation (1). Combining with the properties of the Gamma function, the conditional distribution probability in Gibbs sampling can be obtained:

p(s_p = k, z_p = j | b, t, l_−p, z_−p, α, β, γ, μ) ∝ [(n^−p_dj + α_j)/(n^−p_d + ∑_j α_j)] · [(n^−p_{wp,jk} + β)/(n^−p_jk + Vβ)] · [(n^−p_jk + γ_k)/(n^−p_j + ∑_k γ_k)] · [(n^−p_{jk,tp} + μ)/(n^−p_jk + Hμ)].   (11)
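The sampling equation above can be evaluated numerically over all ⟨topic, sentiment⟩ pairs at once. The following is a hedged sketch: the count-array names, shapes, and hyperparameter values are illustrative assumptions, and the counts passed in are assumed to already exclude the word pair being resampled:

```python
import numpy as np

def gibbs_conditional(counts, d, w, t, alpha, beta, gamma, mu, V, H):
    """Normalized p(z_p = j, s_p = k | rest) for every <topic, sentiment> pair,
    following the four factors of the sampling equation: topic-in-document,
    word-pair, sentiment-in-topic, and timestamp factors."""
    topic_part = (counts["n_dj"][d] + alpha) / (counts["n_d"][d] + alpha.sum())
    word_part = (counts["n_wjk"][w] + beta) / (counts["n_jk"] + V * beta)
    sent_part = (counts["n_jk"] + gamma) / (counts["n_j"][:, None] + gamma.sum())
    time_part = (counts["n_jkt"][..., t] + mu) / (counts["n_jk"] + H * mu)
    p = topic_part[:, None] * word_part * sent_part * time_part
    return p / p.sum()

# Illustrative dimensions and randomly filled count arrays.
rng = np.random.default_rng(1)
T, S, V, H, D = 3, 2, 20, 5, 4
alpha, gamma = np.full(T, 50.0 / T), np.full(S, 0.3)
beta, mu = 0.01, 0.2
counts = {
    "n_dj": rng.integers(0, 10, (D, T)).astype(float),
    "n_wjk": rng.integers(0, 5, (V, T, S)).astype(float),
    "n_jkt": rng.integers(0, 5, (T, S, H)).astype(float),
}
counts["n_d"] = counts["n_dj"].sum(axis=1)
counts["n_jk"] = counts["n_wjk"].sum(axis=0)
counts["n_j"] = counts["n_jk"].sum(axis=1)
p = gibbs_conditional(counts, d=0, w=3, t=2, alpha=alpha, beta=beta,
                      gamma=gamma, mu=mu, V=V, H=H)
```

A sampler would then draw the new ⟨topic, sentiment⟩ assignment from the flattened matrix `p` and update the counts.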

In order to simplify equation (11), the hyperparameter μ = 1/n_d is introduced. When the hyperparameters α, β, μ, and γ are given, the set B of word pairs with the corresponding topics z and sentiment labels l can be used to infer the parameters φ, θ, π, and ψ based on Bayes' rule and the Dirichlet conjugate properties:

φ_jki = (n_ijk + β)/(n_jk + Vβ),

θ_dj = (n_dj + α_j)/(n_d + ∑_j α_j),

π_jk = (n_jk + γ_k)/(n_j + ∑_k γ_k),

ψ_jkh = (n_jkh + μ)/(n_jk + Hμ).   (12)

4. Experiment Analysis

4.1. Data Collection. To verify the TSTS model proposed in this paper, four hot events were randomly selected from the trending searches of Sina Weibo in 2019, and the comments on these four events serve as the experimental datasets. The four datasets are "Military parade on National Day", "The assault on a doctor", "Hong Kong's event", and "Garbage sorting in Shanghai". The comments were extracted from the Sina social network platform. The original datasets contain meaningless tokens in the microblog text, such as stop words, tone interjections, punctuation marks, and numeric expressions. Before text modeling, a word segmentation package in Python was used to process the initial experimental dataset. In addition, considering that comments on social networks are relatively new, fashionable expressions from the social network were collected and added to a customized dictionary, so that these emerging words can be identified as far as possible and replaced with normal expressions. Useless tokens such as URL links and numbers were filtered out by regular expressions. Finally, a total of 14,288 experimental records across the four events were obtained. The four datasets are described in Table 2.
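The cleaning steps described above (URL and number removal by regular expressions, slang normalization, stop-word filtering) can be sketched as follows. The stop words, slang map, and fallback character tokenizer are illustrative stand-ins; the study itself used a Python word segmentation package and a customized dictionary:

```python
import re

STOPWORDS = {"的", "了", "呢", "啊"}        # illustrative stop words
SLANG_MAP = {"奥利给": "加油"}              # illustrative slang -> normal expression

def clean_comment(text, tokenize=None):
    """Clean one microblog comment: strip URLs and numbers with regular
    expressions, normalize slang, drop punctuation and stop words.
    `tokenize` stands in for a segmentation package (e.g. jieba.lcut);
    without one, the sketch falls back to single characters."""
    text = re.sub(r"https?://\S+", "", text)          # remove URL links
    text = re.sub(r"\d+", "", text)                   # remove numbers
    for slang, normal in SLANG_MAP.items():           # replace emerging words
        text = text.replace(slang, normal)
    text = re.sub(r"[^\w\u4e00-\u9fff]+", " ", text)  # drop punctuation marks
    tokens = tokenize(text) if tokenize else list(text.replace(" ", ""))
    return [tok for tok in tokens if tok not in STOPWORDS]

cleaned = clean_comment("奥利给！详情见 https://example.com 2019")
# cleaned == ['加', '油', '详', '情', '见']
```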

4.2. Sentiment Dictionary. The words or phrases in the sentiment dictionary have obvious sentiment tendency,


and can be divided into positive and negative words. The sentiment dictionary in this paper has two major roles. On the one hand, it identifies sentiment polarity words and distinguishes topic features from sentiment words. On the other hand, it combines sentiment prior information to make the model more accurate in judging the sentiment polarity of the text. Given that sentiment polarity words reflect users' sentiment tendency, they are of great significance for analyzing the sentiment orientation of the text.

At present, there are two major Chinese sentiment dictionaries: NTU and HowNet. The former contains 2812 positive words and 8276 negative words; the latter contains about 5000 positive words and 5000 negative words. Based on HowNet and the classification of sentiment polarity [29, 30], this paper constructs the sentiment dictionary for the TSTS model evaluation experiment, as shown in Table 3.

4.3. Parameter Setting. In this paper, the Gibbs algorithm is used to sample the TSTS model and estimate the four posterior parameters. Following the parameter settings of the traditional topic model, the hyperparameters are set as follows. First, α is set to 50/K, where K is the number of extracted topics. Second, β is set to 0.01. Third, γ is set to (0.05 × AVE)/S, where AVE is the average length of articles, that is, the average number of words per microblog in this experiment, and S is the total number of polarity labels. Finally, μ is set to 1/n_d.
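These settings translate directly into code. The concrete values of AVE and n_d below are illustrative (AVE = 98 matches the average pretreated microblog length in Table 2; n_d varies per document):

```python
K = 20                  # number of topics to extract
S = 4                   # number of sentiment polarity labels
AVE = 98                # average words per microblog after preprocessing
n_d = 12                # word pairs in one document (varies per document)

alpha = 50.0 / K        # Dirichlet prior of the topic distribution
beta = 0.01             # Dirichlet prior of the word-pair distribution
gamma = 0.05 * AVE / S  # Dirichlet prior of the sentiment distribution
mu = 1.0 / n_d          # Dirichlet prior of the time distribution
```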

4.4. Evaluation Indicator. For the extraction of topic features, perplexity is used as an evaluation indicator to measure the predictive power on unknown data during model construction; lower perplexity means better performance. The perplexity is calculated as follows:

perplexity(D̃_t | M) = exp{ − [∑_{d=1}^{D_t} log P(b̃^t_d | M)] / [∑_{d=1}^{D_t} Ñ^t_d] },   (13)

where D̃_t = {b̃^t_d}_{d=1}^{D_t} represents an unknown dataset with timestamp t, and

P(b̃^t_d | M) = ∏_{n=1}^{Ñ^t_d} ∑_{l=1}^{L} ∑_{z=1}^{T} P(b̃_{dn} | l, z) P(z | l) P(l),   (14)

where b̃^t_d represents the vector set of word pairs in text d, Ñ^t_d represents the number of word pairs in b̃^t_d, and P(b̃^t_d | M) represents the likelihood of the corpus:

P(b̃^t_d | M) = ∏_{i=1}^{V} ( ∑_{l=1}^{L} ∑_{z=1}^{T} φ_{lzi} · θ_{dlz} · π_{dl} )^{Ñ^t_{di}}.   (15)
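Given per-document held-out log-likelihoods, equation (13) reduces to a few lines. This sketch assumes log P(b̃_d | M) has already been computed for each held-out document:

```python
import math

def perplexity(log_likelihoods, n_pairs):
    """Held-out perplexity: exp of the negative total log-likelihood divided
    by the total number of word pairs. `log_likelihoods[d]` is log P(b_d | M)
    for held-out document d, and `n_pairs[d]` is its number of word pairs."""
    return math.exp(-sum(log_likelihoods) / sum(n_pairs))

# Illustrative values: two held-out documents with 10 and 8 word pairs.
pp = perplexity([-120.0, -85.0], [10, 8])
```

Lower values indicate that the model assigns higher probability to unseen word pairs.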

For sentiment classification, document-level sentiment judgment is used as the evaluation index, based on the sentiment polarity labels in the sentiment dictionary. For the documents in this experiment, the positive or negative sentiment of a document can be judged. This paper adopts the consistency test method to mark the sentiment labels [31].

5. Results

5.1. Extraction of Topics. The primary task of the TSTS model is to extract topic features. As an extension of the topic-sentiment mixed model, the assessment judges whether the extracted topic features are reasonable and accurate. Before extracting topic features in text modeling, it is necessary to determine the number of topics to extract and the number of Gibbs sampling iterations. For an effective evaluation of topic discovery, the degree of perplexity is used as the measurement index in this paper: the lower the perplexity, the better the fitting effect of the model. Taking dataset 1 as an example, the simulation results are shown in Figure 3.

Based on the experimental results shown in Figure 3, the number of topics was set to 20 in the subsequent experiments. In addition, we can calculate the perplexity of the three models as the number of iterations changes. Comparing the experimental results of TSTS and LDA, the effect of TSTS is always better than that of LDA, and the degree of perplexity decreases as the iterations increase. This indicates that the topic discovery ability of TSTS gradually improves, mainly because the TSTS model incorporates word pairs to alleviate the sparse matrix of LDA for short texts. Comparing the experimental results of TSTS and BTM, TSTS was better than BTM as the number of iterations increased. However, as the number

Table 2: Experiment datasets.

Dataset (number of comments)   Words per microblog (initial / pretreated)   Vocabulary size (initial / pretreated)
Dataset 1 (3562)               134 / 102                                    9789 / 6319
Dataset 2 (3527)               127 / 94                                     9736 / 6242
Dataset 3 (3617)               131 / 100                                    9780 / 6301
Dataset 4 (3582)               128 / 96                                     9742 / 6254
Average                        130 / 98                                     9762 / 6279

Table 3: Classification of sentiment words.

Sentiment labels   Happy   Surprise   Sad    Angry
Vocabulary size    2467    276        3025   1897


of iterations increased further, the gap between the two models became smaller. This is because the word pairs of BTM are drawn from the whole corpus: when the number of iterations is small, the proportion of noise words is relatively large, resulting in poor quality of topic words. In addition, the sentiment layer is integrated into TSTS, and the error generated in sentiment estimation affects the next iteration. Although TSTS is worse than BTM when there are more iterations, the effect of TSTS can still stay balanced with BTM. Therefore, for the extraction of topic features, the number of topics and iterations can be set to 20 and 600.

5.2. Sentiment Polarity. The information related to sentiment polarity is provided in accordance with the topic and sentiment polarity of words. The sentiment distribution of topics extracted by the TSTS model is shown in Figure 5. In addition, JST and ASUM are introduced as comparisons to measure the sentiment recognition of the TSTS model. Each document has a binary sentiment label, such as positive or negative sentiment. Taking dataset 2, "The assault on a doctor", as an example, the result is shown in Figure 4. The number of topics is set to 5 at the beginning of the experiment. With the refinement of granularity, the performance of the TSTS model increases. Compared with JST and ASUM, the curve of the TSTS model changes greatly because it considers the topic and sentiment relationship among word pairs of the document. The change curve of the JST model shows a steady upward trend, and the identification efficiency of ASUM is low. That is because ASUM has strict assumptions, and an increase in the number of topics causes the decentralization of topics and sentiments, which has a great negative impact on the overall performance of the model. The overall effect of the TSTS model was slightly better than JST and ASUM, but the effect decreased slightly after the number of topics increased to 20. This is because the

data collected in the dataset are limited, and the fixed number of topics discretizes the word distribution, which affects the judgment of sentiment polarity. The sentiment label classification of documents is compared under different numbers of topics, and the result of the TSTS model is better than JST and ASUM.

As the number of topics increases, the recognition performance of the topic model fluctuates, but the TSTS model is always better than JST and ASUM. When the number of topics and iterations is set to 20 and 600, TSTS is the best model in topic detection. When the number of topics in the four datasets was set to 20, the accuracy of sentiment polarity judgment is shown in Table 4.

From Table 4, the TSTS model is better than JST and ASUM in judging the sentiment polarity of documents. This


Figure 3: The relationship of the perplexity with the number of (a) topics and (b) iterations.


Figure 4: Accuracy of sentiment polarity judgment.


is because sentiment polarity depends on the performance of topic discovery in the previous stage. In this experiment, the effects of JST and ASUM were exactly opposite. The difference was caused by the length of the original documents, which also indirectly verifies the effectiveness of the TSTS model.

From Figure 5, it can be seen that the proportion of positive sentiment is significantly higher than the other sentiments in the dataset "Military parade on National Day" and the dataset "Garbage sorting in Shanghai", which is consistent with the sentiment tendency of users in the social network. For the second dataset, "The assault on a doctor", the two negative sentiment polarities of topic 1 and topic 2 were compared. Topic 1 leans toward sad sentiment, while topic 2 leans toward angry sentiment. Topic 1 reflects the statement of the event, and topic 2 represents the follow-up discussion of the event.

5.3. Topic and Sentiment Evolution. The curves of topic features extracted from the four datasets through the TSTS model are shown in Figure 6. Taking dataset 2 as an example, the topic curve conforms to the evolution law of social and abrupt events. The two curves represent the trend of feature words over time in topic 1 and topic 2. Topic 1 is the

Table 4: Accuracy of sentiment polarity judgment.

Dataset     ASUM     JST      TSTS
Dataset 1   0.4763   0.5427   0.6348
Dataset 2   0.4617   0.5398   0.6599
Dataset 3   0.4832   0.5461   0.6475
Dataset 4   0.4841   0.5294   0.6522


Figure 5: Sentiment distribution in four datasets. (a) Military parade on National Day. (b) The assault on a doctor. (c) Hong Kong's event. (d) Garbage sorting in Shanghai.


statement about the case itself. From the beginning of the event, the amount of discussion about the event on the social network rose sharply and then gradually declined. Topic 2 is a discussion of the development of the case, which triggered a second wave of hot discussion. The times at which the two curves reach their peaks are not consistent. The peak


Figure 6: The changing of topics in four datasets. (a) Military parade on National Day. (b) The assault on a doctor. (c) Hong Kong's event. (d) Garbage sorting in Shanghai.



value of the topic 2 curve is lower than that of topic 1, which reflects that discussion of the same event fades over time. Even if a new topic emerges, discussion of the new topic is far lower than at the beginning of the event. Meanwhile, similar results can be verified in the other three datasets.

The proportion of the sentiment polarities in the four datasets is shown in Figure 7. Since the sentiment polarity proportion is measured, the four sentiment polarities are distributed in balance before the occurrence of the events. After an event occurred, the polarities of positive and negative sentiment began to move toward the two extremes. Among the four datasets, positive sentiment was higher than negative sentiment in the first dataset, "Military parade on National Day", and the fourth dataset, "Garbage sorting in Shanghai", which also conforms to the social sentiment of the events. In addition, it can be found that the difference among the four sentiment labels is large in the initial stage, and the distribution of sentiment labels becomes stable in the later stage. This proves that the second report of a social event does not cause the same heat as the first. But the sentiment tendency in social networks does not decline sharply with the reduction of discussion, which can be seen in topic 2 of the second dataset, "The assault on a doctor". Given that the topic features come from the background of a corpus and contain many noise words, the relative positions of the four curves are closer in terms of sentiment polarity evolution. However, a gap still remains among them, in contrast to the even distribution of sentiment polarity at the beginning of the event.

6. Discussion

From the perspective of theoretical significance, this paper extends the LDA model to some extent. First, in view of the sparse matrix caused by short texts, word pairs are introduced to replace single words for text generation, following BTM. Based on the hypotheses of JST and ASUM, the sentiment layer is introduced to form the Bayesian network structure, and each word pair is limited to the same sentiment polarity distribution. Second, in order to realize dynamic analysis and text homogeneity, the timestamp and the corresponding hyperparameter are introduced to alleviate the problem of word order in text generation. Third, this research combines behavioral experiments, big data mining, mathematical modeling, and simulation to promote the expansion of research into new situations and new methods.

From the perspective of practical significance, this paper is of great value in tracking and monitoring topics of public opinion in social networks. The online public opinions of hot events can be monitored, which contributes to accurately judging social events and making emergency decisions for governments or departments. In addition, this paper analyzed the evolution, response, and governance of public opinion, which is conducive to understanding the formation mechanism and the collaborative evolution of public opinion. Meanwhile, the use of public opinion information can detect and screen information, prevent the spread of rumors, and scientifically formulate utilization mechanisms to effectively reduce the risk of loss.

7. Conclusions

In the context of the mobile social network, the number of short texts is growing explosively. In order to extract information from massive short texts quickly and monitor public opinions, the TSTS model is proposed in this study based on LDA, BTM, JST, ASUM, and TOT. From the experimental results, the TSTS model achieves good performance. In terms of topic feature extraction, the perplexity of TSTS is always lower than that of LDA. Moreover, although its perplexity becomes slightly higher than BTM's as the number of iterations increases, it can maintain balance with BTM. In sentiment analysis, the effect of TSTS was significantly better than JST and ASUM. Finally,


Figure 7: The changing of sentiment in four datasets. (a) Military parade on National Day, topic 1. (b) Military parade on National Day, topic 2. (c) The assault on a doctor, topic 1. (d) The assault on a doctor, topic 2. (e) Hong Kong's event, topic 1. (f) Hong Kong's event, topic 2. (g) Garbage sorting in Shanghai, topic 1. (h) Garbage sorting in Shanghai, topic 2.


the TSTS model incorporating the time factor can determine the change trends of topic and sentiment.

There are still some shortcomings in this paper. Firstly, for the extraction of topic feature words, a global topic level can be added to the topic layer of the TSTS model to filter out common topic words. Secondly, in sentiment polarity judgment, the sentiment labels are manually marked based on prior knowledge. However, sentiments are extremely rich and changeable. In future research, the Bayesian network and entity theory can be used to judge sentiment bias.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the Project of the National Science and Technology Department (2018YFF0213102), Public Projects of Zhejiang Province (LGF18G010003, LGF19G010002, and LGF20G010002), the Science and Technology Project of Zhejiang Province (2020C01158), and the First Class Discipline of Zhejiang - A (Zhejiang Gongshang University - Statistics).

References

[1] CNNIC, "Statistical report on internet development in China," CNNIC, Beijing, China, 2019, http://www.cac.gov.cn/2019-08/30/c_1124938750.htm.

[2] M. Ingawale, A. Dutta, R. Roy, and P. Seetharaman, "Network analysis of user generated content quality in Wikipedia," Online Information Review, vol. 37, no. 4, pp. 602–619, 2013.

[3] Q. Gao, Y. Tian, and M. Tu, "Exploring factors influencing Chinese user's perceived credibility of health and safety information on Weibo," Computers in Human Behavior, vol. 45, pp. 21–31, 2015.

[4] M. Steyvers and T. Griffiths, "Probabilistic topic models," in Handbook of Latent Semantic Analysis, pp. 424–440, Psychology Press, New York, NY, USA, 2007.

[5] J.-F. Yeh, Y.-S. Tan, and C.-H. Lee, "Topic detection and tracking for conversational content by using conceptual dynamic latent Dirichlet allocation," Neurocomputing, vol. 216, pp. 310–318, 2016.

[6] X. Zhou and L. Chen, "Event detection over twitter social media streams," The VLDB Journal, vol. 23, no. 3, pp. 381–400, 2014.

[7] Y. J. Du, Y. T. Yi, and X. Y. Li, "Extracting and tracking hot topics of micro-blogs based on improved latent Dirichlet allocation," Engineering Applications of Artificial Intelligence, vol. 87, pp. 1–13, 2020.

[8] H. J. Kim, Y. K. Jeong, Y. Kim et al., "Topic-based content and sentiment analysis of Ebola virus on twitter and in the news," Journal of Information Science, vol. 42, no. 6, pp. 763–781, 2016.

[9] B. Subeno, R. Kusumaningrum, and F. Farikhin, "Optimisation towards latent Dirichlet allocation: its topic number and collapsed Gibbs sampling inference process," International Journal of Electrical and Computer Engineering (IJECE), vol. 8, no. 5, pp. 3204–3213, 2018.

[10] H. Park, T. Park, and Y.-S. Lee, "Partially collapsed Gibbs sampling for latent Dirichlet allocation," Expert Systems with Applications, vol. 131, pp. 208–218, 2019.

[11] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.

[12] X. Cheng, X. Yan, Y. Lan, and J. Guo, "BTM: topic modeling over short texts," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 12, pp. 2928–2941, 2014.

[13] J. Rashid, S. M. A. Shah, and A. Irtaza, "Fuzzy topic modeling approach for text mining over short text," Information Processing & Management, vol. 56, no. 6, pp. 1–19, 2019.

[14] H.-Y. Lu, N. Kang, Y. Li, Q.-Y. Zhan, J.-Y. Xie, and C.-J. Wang, "Utilizing recurrent neural network for topic discovery in short text scenarios," Intelligent Data Analysis, vol. 23, no. 2, pp. 259–277, 2019.

[15] L. Zhu, H. Xu, Y. Xu et al., "A joint model of extended LDA and IBTM over streaming Chinese short texts," Intelligent Data Analysis, vol. 23, no. 3, pp. 681–699, 2019.

[16] M. Tang, J. Jin, Y. Liu et al., "Integrating topic, sentiment, and syntax for modeling online product reviews: a topic model approach," Journal of Computing and Information Science in Engineering, vol. 19, no. 1, pp. 1–12, 2019.

[17] R. K. Amplayo, S. Lee, and M. Song, "Incorporating product description to sentiment topic models for improved aspect-based sentiment analysis," Information Sciences, vol. 454-455, pp. 200–215, 2018.

[18] L. Yao, Y. Zhang, B. Wei et al., "Concept over time: the combination of probabilistic topic model with Wikipedia knowledge," Expert Systems with Applications, vol. 60, pp. 27–38, 2016.

[19] S. Park, W. Lee, and I.-C. Moon, "Associative topic models with numerical time series," Information Processing & Management, vol. 51, no. 5, pp. 737–755, 2015.

[20] P. Lorenz-Spreen, F. Wolf, J. Braun et al., "Tracking online topics over time: understanding dynamic hashtag communities," Computational Social Networks, vol. 5, no. 1, pp. 1–18, 2018.

[21] Y. He, C. Lin, W. Gao et al., "Dynamic joint sentiment-topic model," ACM Transactions on Intelligent Systems & Technology, vol. 5, no. 1, pp. 1–21, 2013.

[22] L. Kuo and T. Y. Yang, "An improved collapsed Gibbs sampler for Dirichlet process mixing models," Computational Statistics & Data Analysis, vol. 50, no. 3, pp. 659–674, 2006.

[23] Y. Papanikolaou, J. R. Foulds, T. N. Rubin et al., "Dense distributions from sparse samples: improved Gibbs sampling parameter estimators for LDA," Journal of Machine Learning Research, vol. 18, no. 62, pp. 1–58, 2017.

[24] X. Zhou, J. Ouyang, and X. Li, "Two time-efficient Gibbs sampling inference algorithms for biterm topic model," Applied Intelligence, vol. 48, no. 3, pp. 730–754, 2018.

[25] P. Bhuyan, "Estimation of random-effects model for longitudinal data with nonignorable missingness using Gibbs sampling," Computational Statistics, vol. 34, no. 4, pp. 1693–1710, 2019.

[26] C. Lin, Y. He, R. Everson, and S. Rüger, "Weakly supervised joint sentiment-topic detection from text," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 6, pp. 1134–1145, 2012.

[27] A. Daud and F. Muhammad, "Group topic modeling for academic knowledge discovery," Applied Intelligence, vol. 36, no. 4, pp. 870–886, 2012.

[28] Z. Huang, J. Tang, G. Shan, J. Ni, Y. Chen, and C. Wang, "An efficient passenger-hunting recommendation framework with multi-task deep learning," IEEE Internet of Things Journal, vol. 6, no. 5, pp. 7713–7721, 2019.

[29] S. M. Mohammad, S. Kiritchenko, and X. Zhu, "NRC-Canada: building the state-of-the-art in sentiment analysis of tweets," in Proceedings of the Seventh International Workshop on Semantic Evaluation Exercises (SemEval-2013), pp. 1–5, Atlanta, GA, USA, June 2013.

[30] T. Chen, Q. Li, J. Yang, G. Cong, and G. Li, "Modeling of the public opinion polarization process with the considerations of individual heterogeneity and dynamic conformity," Mathematics, vol. 7, no. 10, p. 917, 2019.

[31] S. A. Curiskis, B. Drake, T. R. Osborn, and P. J. Kennedy, "An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit," Information Processing and Management, vol. 57, no. 2, pp. 1–21, 2019.


2.1.3. Research on Topic Model with Time Factor. Yao et al. [18] revealed the semantic change process of words by correlating the time factor with Wikipedia text knowledge. In terms of event evolution, the associative topic model (ATM) was proposed [19], in which a recognized cluster is represented as the word distribution of the cluster with the corresponding event. In addition, Topics over Time (TOT) was proposed to integrate the time factor into the LDA model [20]. In the TOT model, word co-occurrence can affect the discovery of topic words, and time information can also affect the extraction of topic words. Unlike other models, each topic in the TOT model follows a continuous distribution over time and does not rely on Markov models to discretize time. For each document generation, the mixed distribution of topics is determined by word co-occurrence and timestamps [21], which allows the TOT model to maintain independence in the time dimension and to predict the time of a document without any time information.

2.2. Gibbs Sampling. The derivation of the experimental model in this paper is a variant form of the Markov chain, so the Markov chain Monte Carlo (MCMC) method is used for sampling in the experiment. Gibbs sampling, one of the MCMC methods, has been widely used in prior research. It is used to obtain a set of observations that approximate a specified multidimensional probability distribution, such as the joint probability distribution of two random variables.

The Gibbs sampling method applied to the latent Dirichlet allocation (LDA) model can significantly improve the speed on real-text corpora [22]. Papanikolaou et al. [23] estimated LDA parameters from Gibbs sampling by using all conditional distributions of latent variable assignments to effectively average multiple samples. Zhou et al. [24] proposed two Gibbs sampling inference methods, Sparse BTM and ESparse BTM, to accelerate BTM by trading off space and time. Bhuyan [25] proposed a correlated random effect model based on latent variables and an algorithm to estimate correlation parameters based on Gibbs sampling.

3. Model Construction

3.1. Topic-Sentiment Mixture Model with Time Factor (TSTS). Based on prior research, this paper mainly improves the topic model in three aspects. Firstly, the sparse matrix caused by short texts in the social network is addressed. Secondly, the topic and sentiment distribution of the same word pair are controlled. Thirdly, the problem of text homogeneity is solved by incorporating the time factor into the topic model. Therefore, the TSTS model proposed in this paper constrains the word pairs in the same document, which greatly reduces the complexity of time and space and compensates for the sparse matrix of short texts to some extent. Moreover, the sentiment layer is integrated into TSTS by extending the hypotheses of ASUM, and the word pairs generated from the same sentence are restrained to follow the same topic-sentiment distribution. Finally, the TSTS model incorporating the time factor does not rely on the Markov model to discretize time, and each topic follows a continuous temporal distribution. For each document generation, the mixed distribution of topics is determined by word co-occurrence and timestamps. The TSTS model is shown in Figure 2.

The TSTS model simulates the generating process of online comments. Generally, an online comment from a user can be regarded as a document that is short, pithy, and highly emotional. The word co-occurrence from BTM is the most effective solution for the short-text topic model. In addition, the TSTS model with the time layer can continuously sample users' evaluation of hot events as well as the dynamic changes of users' sentiment. Therefore, the hypotheses of the TSTS model are proposed as follows:

(i) The probability distribution of the time factor is not directly equal to the joint distribution of the topic and sentiment.

(ii) The topic-sentiment distribution of each document is independent [26].

(iii) Similar topics of different sentiment polarity are not automatically categorized [27].

Combined with the probability graph of the Bayesian network, the TSTS model proposed in this paper has four characteristics. First, a word pair is used in place of a single word in the sampling model. Second, each timestamp is related to a topic and a sentiment. Third, the extraction of topic features and sentiment words is performed over the whole corpus. Fourth, in the derivation of the TSTS model, it is not necessary to pair topic feature words with sentiment polarity words, because every topic and sentiment has a corresponding multinomial word pair distribution. In addition, the text modeling process of the TSTS model follows the assumption that there is a connection between the sentiment polarity words and the topic features, which also changes with the time factor. Therefore, the documents used to train the model must have a specific timestamp, such as the publishing time of the microblog.

3.2. Generation of a Text in the TSTS Model. In the TSTS model, we assume that a corpus is composed of several texts. For instance, a microblog is a text containing two dimensions, topic and sentiment. Considering the effectiveness of public opinions and the related parameters of the microblog text, the word distribution is determined by the topic, sentiment, and time. So TSTS is an unsupervised topic-sentiment mixture model. The generation process of a document is as follows:

(1) Extract a multinomial distribution θ_d over topics from the Dirichlet prior distribution α, that is, θ_d ~ Dir(α).

(2) Extract a multinomial distribution ψ_{zl} over timestamps from the Dirichlet prior distribution μ, that is, ψ_{zl} ~ Dir(μ).

(3) Extract a multinomial distribution π_z over sentiments from the Dirichlet prior distribution γ, that is, π_z ~ Dir(γ).

(4) For each document d and for each pair of words b = (w_i1, w_i2), b ∈ B:
(a) Choose a topic z_i ~ θ_d.
(b) Choose a sentiment label l_i ~ π_{z_i}.
(c) Choose a pair of words b_i ~ φ_{z_i l_i}.
(d) Choose a timestamp t_i ~ ψ_{z_i l_i}.

As shown in Figure 2, word pairs in a document may belong to different timestamps in the text generation process of the TSTS topic model. In theory, all the content of an article, such as words and topics, should belong to the same timestamp, and introducing the time factor into the topic model may affect the topic homogeneity of an article. However, the default time factor of the TSTS model does not affect the homogeneity of the text, so the time factor in this paper is assumed to carry no weight. Based on the TOT and the group topic (GT) models, the hyperparameter μ is introduced into TSTS to balance the interaction of time and words in document generation. The parameters of the TSTS model are explained in Table 1.
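The generation steps above can be simulated with a toy sketch. This is for illustration only: the dimensions are tiny, and the mixture weights are fixed constants here rather than draws from their Dirichlet priors.

```python
import random

# Toy simulation of the TSTS generative story; every value is hypothetical.
rng = random.Random(0)
T, S, H = 2, 2, 3                        # topics, sentiment labels, timestamps
theta_d = [0.7, 0.3]                     # theta_d ~ Dir(alpha): topic mixture
pi = [[0.9, 0.1], [0.2, 0.8]]            # pi_z: sentiment given topic
phi = {(z, l): [f"pair_{z}{l}_{i}" for i in range(3)]   # phi_{z,l}: word pairs
       for z in range(T) for l in range(S)}
psi = {(z, l): [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]][z]    # psi_{z,l}: timestamps
       for z in range(T) for l in range(S)}

def generate_word_pair():
    z = rng.choices(range(T), weights=theta_d)[0]        # (a) topic
    l = rng.choices(range(S), weights=pi[z])[0]          # (b) sentiment label
    b = rng.choice(phi[(z, l)])                          # (c) word pair
    t = rng.choices(range(H), weights=psi[(z, l)])[0]    # (d) timestamp
    return z, l, b, t

doc = [generate_word_pair() for _ in range(100)]   # one simulated document
```

In a generated document of 100 word pairs, topic frequencies track θ_d, and each pair carries its own sentiment label and timestamp, matching the plate structure of Figure 2.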

3.3. Model Deduction. According to the Bayesian network structure diagram of the TSTS model, the multinomial distribution θ of topics, the distribution π of sentiment given the topic, the distribution φ of word pairs given ⟨topic, sentiment⟩, and the distribution ψ of time given ⟨topic, sentiment⟩ can be calculated from the hyperparameters α, β, γ, and μ. Then Gibbs sampling is performed, which ensures the convergence of the TSTS model under a sufficient number of iterations, and each word pair in the document is assigned the topic and sentiment that best fit the data.

According to the principle of Bayesian independence, the joint probability of word pairs, topics, sentiment polarities, and timestamps is given as follows:

p(b, t, l, z | α, β, γ, μ) = p(b | l, z, β) · p(t | l, z, μ) · p(l | z, γ) · p(z | α),  (1)

where the following are independent: word pairs b and the parameters α, γ, and μ; timestamps t and the parameters α, γ, and β; sentiment polarities l and the parameters α, μ, and β; and topics z and the parameters β, γ, and μ. Therefore, the joint distribution can be obtained by calculating the four parts on the right side of the equation.

Given the sentiment polarity label of specific topic features, the distribution of b can be regarded as a multinomial distribution. Based on the topic words z_i and labels l_i, b_i is generated N times with probability p(b | l, z) each time. Given that word pairs are independent of each other, we can obtain

p(b | l, z, β) = ∏_{i=1}^{N} p(b_i | z_i, l_i) = ∏_{i=1}^{N} β · b_i.  (2)

Hyperparameters are the representation parameters of the framework in a machine learning model [28], such as the number of classes in a clustering method or the number of topics in a topic model. In the Bayesian network, the distribution and density function of θ are denoted as H(θ) and h(θ), respectively. They are regarded as the prior distribution function and the prior density function, collectively referred to as the prior distribution; the distribution of θ obtained after sampling is called the posterior distribution. Based on the Dirichlet–multinomial conjugate property, when the

Figure 2: The TSTS model.

Table 1: Explanation of parameters.

D: number of documents
V: vocabulary size
T: number of topics
S: number of sentiment polarities
H: number of timestamps
M: number of word pairs
B: set of word pairs
b: word pair, b = (w_i1, w_i2)
w: word
t: time
z: topic
l: sentiment polarity label
Θ = [θ_d]: multinomial distribution of topics
Φ = [φ_zl]: T × S × V matrix, word pairs' distribution
Π = [π_z]: T × S matrix, sentiment distribution
Ψ = [ψ_zl]: T × S × H matrix, time distribution
α: Dirichlet prior parameter of Θ
γ: Dirichlet prior parameter of Π
β: asymmetric Dirichlet prior parameter of Φ
μ: Dirichlet prior parameter of Ψ
n_d: the number of word pairs in document d
n_dj: the number of word pairs for topic j in document d
n_j: the number of word pairs for topic j
n_jk: the number of word pairs assigned to topic j and sentiment polarity k
n_ijk: the number of times word pair b_i is assigned to topic j and sentiment polarity k
n_jkh: the number of times word pair b_i is assigned to topic j and sentiment polarity k when the timestamp is h
n_−p: the number of word pairs in the current document except for position p


parameters in the population distribution conform to the multinomial distribution law, the conjugate prior distribution satisfies

Dir(θ | α) + Mult(δ) = Dir(θ | α + δ).  (3)

For the general text model, the Dirichlet distribution and the multinomial distribution are as follows:

Dir(b | β) = (Γ(Σ_{j=1}^{T} β_j) / ∏_{j=1}^{T} Γ(β_j)) ∏_{j=1}^{T} b_j^{β_j − 1},  (4)

Mult(n | b, N) = (N! / ∏_{j=1}^{T} n_j!) ∏_{j=1}^{T} b_j^{n_j},  (5)

where i, j, k, and h index the word pairs, topics, sentiments, and timestamps in the modeling process, respectively. Since p(b | l, z, β) follows the Dirichlet distribution, this paper introduces φ for p(b | l, z, β). It can be obtained by integrating out φ:

p(b | l, z, β) = ∫ p(b | l, z, φ) · p(φ | β) dφ = (Γ(Vβ) / Γ(β)^V)^{T·S} ∏_j ∏_k (∏_i Γ(n_ijk + β)) / Γ(n_jk + Vβ).  (6)

To estimate the posterior parameter φ, we can combine the Bayes formula with the Dirichlet–multinomial conjugate property. The distribution of the posterior parameter can be obtained as follows:

p(φ | l, z, β) ∝ Dir(φ | n_ijk + β).  (7)

Given that the expectation of the Dirichlet distribution is E(Dir(ε)) = ε_i / Σ_i ε_i, the parameters are estimated by the expectation of the known posterior parameter distribution; the estimated results are shown in equation (12). Similarly, for p(t | l, z, μ), ψ is introduced. By integrating out ψ, it can be obtained as follows:

p(t | l, z, μ) = (Γ(Hμ) / Γ(μ)^H)^{T·S} ∏_j ∏_k (∏_h Γ(n_jkh + μ)) / Γ(n_jk + Hμ).  (8)

For p(l | z, γ), π is introduced. By integrating out π, it can be obtained as follows:

p(l | z, γ) = (Γ(Σ_k γ_k) / ∏_k Γ(γ_k))^T ∏_j (∏_k Γ(n_jk + γ_k)) / Γ(n_j + Σ_k γ_k).  (9)

For p(z | α), θ is introduced. By integrating out θ, it can be obtained as follows:

p(z | α) = (Γ(Σ_j α_j) / ∏_j Γ(α_j))^D ∏_d (∏_j Γ(n_dj + α_j)) / Γ(n_d + Σ_j α_j).  (10)

The TSTS model can estimate the posterior distribution after the estimated values of z and s have been obtained by sampling. Then equations (2)–(6) are substituted into equation (1). Combining with the properties of the Gamma function, the conditional distribution probability in Gibbs sampling can be obtained:

p(s_p = k, z_p = j | b, t, l_−p, z_−p, α, β, γ, μ) ∝ (n_dj^−p + α_j) / (n_d^−p + Σ_j α_j) · (n_{w_p jk}^−p + β) / (n_jk^−p + Vβ) · (n_jk^−p + γ_k) / (n_j^−p + Σ_k γ_k) · (n_{jk t_p}^−p + μ) / (n_jk^−p + Hμ).  (11)

In order to simplify equation (6), the hyperparameter μ = 1/n_d is introduced. When the hyperparameters α, β, μ, and γ are given, the set B of word pairs, the corresponding topics z, and the sentiment labels l can be used to infer the parameters φ, θ, π, and ψ based on Bayes' rule and the Dirichlet conjugate properties:

φ_jki = (n_ijk + β) / (n_jk + Vβ),

θ_dj = (n_dj + α_j) / (n_d + Σ_j α_j),

π_jk = (n_jk + γ_k) / (n_j + Σ_k γ_k),

ψ_jkh = (n_jkh + μ) / (n_jk + Hμ).  (12)

4. Experiment Analysis

4.1. Data Collection. In order to verify the TSTS model proposed in this paper, four hot events were randomly selected from the trending searches of Sina Weibo in 2019, and the comments on these four events are regarded as the experimental datasets. The four datasets selected are "Military parade in National Day," "The assault on a doctor," "Hong Kong's event," and "Garbage sorting in Shanghai." The comments were extracted from the Sina social network platform. The original datasets contain some meaningless tokens in the microblog text, such as stop words, interjections of tone, punctuation marks, and numeric expressions. Before text modeling, a word segmentation package in Python was used to process the initial experimental dataset. In addition, considering that comments on social networks are relatively new, fashionable expressions from the social network were collected and added to a customized dictionary, so that these emerging words can be identified as far as possible and replaced with normal expressions. There are also some useless strings in the text, such as URL links and numbers, which were filtered out with regular expressions. Finally, a total of 14,288 experimental records across the four events were obtained. The description of the four datasets is shown in Table 2.
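The cleaning pipeline described above can be sketched with the standard library alone. This is an illustration, not the paper's code: the real experiment used a Python word-segmentation package (such as jieba) plus a customized dictionary of fashionable expressions, both omitted here, and the stopword list below is a toy stand-in.

```python
import re

# Stdlib-only sketch of the comment-cleaning steps: strip URLs and
# numbers with regular expressions, drop punctuation, remove stopwords.
URL_RE = re.compile(r"https?://\S+")
NUM_RE = re.compile(r"\d+")
PUNCT_RE = re.compile(r"[^\w\u4e00-\u9fff ]")   # keep CJK, word chars, spaces

STOPWORDS = {"的", "了", "啊"}                   # toy stopword list

def clean_comment(text):
    """Return the cleaned token list for one microblog comment."""
    text = URL_RE.sub(" ", text)
    text = NUM_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    return [t for t in text.split() if t and t not in STOPWORDS]

tokens = clean_comment("太棒了 https://t.cn/abc 2019 阅兵 真 震撼!!!")
```

Real Chinese text would additionally need word segmentation, since it is not whitespace-delimited; the split here only works on the pre-spaced toy input.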

4.2. Sentiment Dictionary. The words or phrases in the sentiment dictionary have an obvious sentiment tendency and can be divided into positive and negative words. The sentiment dictionary in this paper has two major roles. On the one hand, it identifies sentiment polarity words and distinguishes topic features from sentiment words. On the other hand, it provides sentiment prior information that makes the model more accurate in judging the sentiment polarity of the text. Given that sentiment polarity words reflect users' sentiment tendency, they are of great significance for analyzing the sentiment orientation of the text.

At present, there are two major Chinese sentiment dictionaries, NTU and HowNet. The former contains 2812 positive words and 8276 negative words; the latter contains about 5000 positive words and 5000 negative words. Based on HowNet and the classification of sentiment polarity [29, 30], this paper constructs the sentiment dictionary for the TSTS model evaluation experiment, as shown in Table 3.
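Dictionary-based polarity labeling reduces to counting lexicon hits. The sketch below uses tiny toy word lists as stand-ins for the HowNet-based lexicon and labels a comment by majority vote; it illustrates the mechanism only, not the paper's consistency-test labeling procedure.

```python
# Toy positive/negative word lists standing in for the HowNet lexicon.
POSITIVE = {"好", "棒", "赞", "震撼"}
NEGATIVE = {"差", "糟", "愤怒", "难过"}

def polarity(tokens):
    """Label a tokenized comment by majority of lexicon hits."""
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

label = polarity(["阅兵", "震撼", "赞"])
```

The lexicon hits also supply the sentiment prior that biases the model's word-pair-to-sentiment assignments.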

4.3. Parameter Setting. In this paper, the Gibbs algorithm is used to sample the TSTS model and estimate the four posterior parameters. Following the parameter settings of the traditional topic model, the hyperparameters are set as follows. First, α is set to 50/K, where K is the number of topics extracted. Second, β is set to 0.01. Third, γ is set to (0.05 × AVE)/S, where AVE stands for the average length of articles, that is, the average number of words per microblog in this experiment, and S stands for the total number of polarity labels. Finally, μ is set to 1/n_d.
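These settings can be written out directly. The values of AVE and S below are taken from this experiment (the post-cleaning average microblog length of 98 words from Table 2, and 4 sentiment polarity labels from Table 3); K = 20 anticipates the topic count chosen in Section 5.1.

```python
# Hyperparameter settings of Section 4.3 as code.
K = 20                      # number of topics
AVE = 98                    # average words per microblog after pretreatment
S = 4                       # number of sentiment polarity labels

alpha = 50.0 / K            # symmetric Dirichlet prior on theta
beta = 0.01                 # Dirichlet prior on phi
gamma = (0.05 * AVE) / S    # Dirichlet prior on pi

def mu(n_d):
    """Per-document prior on psi: one over the document's word-pair count."""
    return 1.0 / n_d
```

Note that μ is document-specific, so longer comments place a weaker time prior on each individual word pair.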

4.4. Evaluation Indicator. For the extraction of topic features, perplexity is used as an evaluation indicator to measure the predictive power on unknown data during model construction; lower perplexity means better performance. The calculation formula of the perplexity is as follows:

perplexity(D̃_t | M) = exp{ −Σ_{d=1}^{D_t} log P(b̃_d^t | M) / Σ_{d=1}^{D_t} Ñ_d^t },  (13)

where D̃_t = {b̃_d^t}_{d=1}^{D_t} represents an unknown dataset with timestamp t, and

P(b̃_d^t | M) = ∏_{n=1}^{Ñ_d^t} Σ_{l=1}^{L} Σ_{z=1}^{T} P(b̃_dn | l, z) P(z | l) P(l),  (14)

where b̃_d^t represents the vector set of word pairs in text d, Ñ_d^t represents the number of word pairs in b̃_d^t, and P(b̃_d^t | M) represents the likelihood of the training corpus, whose formula is as follows:

P(b̃_d^t | M) = ∏_{i=1}^{V} (Σ_{l=1}^{L} Σ_{z=1}^{T} φ_lzi · θ_dlz · π_dl)^{Ñ_di^t}.  (15)

For sentiment segmentation, sentiment judgment at the document level is used as the evaluation index, based on the sentiment polarity labels in the sentiment dictionary. For the documents in this experiment, the positive and negative sentiment of each document can be judged. This paper adopts the consistency test method to mark the sentiment labels [31].

5. Results

5.1. Extraction of Topics. The primary task of the TSTS model is to extract topic features. As an extension of the topic-sentiment mixture model, the assessment is to judge whether the extracted topic features are reasonable and accurate. Before extracting topic features in text modeling, it is necessary to determine the number of topics to be extracted and the number of iterations of Gibbs sampling. For an effective evaluation of topic discovery, the degree of perplexity is used as the measurement index in this paper; the lower the perplexity, the better the fitting effect of the model. Taking dataset 1 as an example, the simulation results are shown in Figure 3.

Based on the experimental results shown in Figure 3, the number of topics was set to 20 in the subsequent experiments. In addition, we can calculate the perplexity of the three models as the number of iterations changes. Comparing the experimental results of TSTS and LDA, TSTS always outperforms LDA, and its perplexity decreases as the iterations increase. This indicates that the topic discovery ability of TSTS gradually improves, mainly because the TSTS model incorporates word pairs to alleviate the sparse matrix problem of LDA on short texts. Comparing TSTS and BTM, TSTS is better than BTM as the number of iterations increases. However, as the number

Table 2: Experiment datasets.

Dataset (number of comments) | Words per microblog: Initial | Pretreatment | Vocabulary size: Initial | Pretreatment
Dataset 1 (3562) | 134 | 102 | 9789 | 6319
Dataset 2 (3527) | 127 | 94 | 9736 | 6242
Dataset 3 (3617) | 131 | 100 | 9780 | 6301
Dataset 4 (3582) | 128 | 96 | 9742 | 6254
Average | 130 | 98 | 9762 | 6279

Table 3: Classification of sentiment words.

Sentiment labels | Happy | Surprise | Sad | Angry
Vocabulary size | 2467 | 276 | 3025 | 1897


of iterations increased, the gap between the two models became smaller. This is because the word pairs of BTM are drawn from the whole corpus: when the number of iterations is small, the proportion of noise words is relatively large, resulting in poor quality of topic words. In addition, the sentiment layer is integrated into TSTS, and the error generated in the sentiment estimation affects the next iteration. Although TSTS is worse than BTM at higher iteration counts, its effect can still remain balanced with BTM. Therefore, for the extraction of topic features, the numbers of topics and iterations can be set to 20 and 600.

5.2. Sentiment Polarity. The information related to sentiment polarity is provided in accordance with the topic and sentiment polarity of words. The sentiment distribution of topics extracted by the TSTS model is shown in Figure 5. In addition, JST and ASUM are introduced as comparisons to measure the sentiment recognition of the TSTS model. Each document has a binary sentiment label, positive or negative. Taking dataset 2, "The assault on a doctor," as an example, the result is shown in Figure 4. The number of topics is set to 5 at the beginning of the experiment. With the refinement of granularity, the performance of the TSTS model increases. Compared with JST and ASUM, the curve of the TSTS model changes greatly because it considers the topic and sentiment relationship among word pairs of the document. The curve of the JST model shows a steady upward trend, and the identification efficiency of ASUM is low. That is because ASUM has strict assumptions, and increasing the number of topics causes the decentralization of topics and sentiments, which has a great negative impact on the overall performance of the model. The overall effect of the TSTS model is slightly better than JST and ASUM, but the effect decreases slightly after the number of topics increases to 20. This is because the data collected in the dataset are limited and the fixed number of topics discretizes the word distribution, which affects the judgment of sentiment polarity. The sentiment label classification of documents is compared under different numbers of topics, and the result of the TSTS model is better than JST and ASUM.

With the increase of topics, the recognition performance of the topic model fluctuates, but the TSTS model is always better than JST and ASUM. When the numbers of topics and iterations are set to 20 and 600, TSTS is the best model in topic detection. When the number of topics in the four datasets was set to 20, the accuracy of sentiment polarity judgment is as shown in Table 4.

From Table 4, the TSTS model is better than JST and ASUM in judging the sentiment polarity of documents. This

Figure 3: The relationship of the perplexity with the number of (a) topics and (b) iterations.

Figure 4: Accuracy of sentiment polarity judgment.


is because sentiment polarity depends on the performance of topic discovery in the previous stage. In this experiment, the effects of JST and ASUM were exactly opposite. The difference was caused by the length of the original documents, which also indirectly verifies the effectiveness of the TSTS model.

From Figure 5, it can be seen that the proportion of positive sentiment is significantly higher than the other sentiments in the dataset "Military parade in National Day" and the dataset "Garbage sorting in Shanghai," which is consistent with the sentiment tendency of users in the social network. For the second dataset, "The assault on a doctor," the two kinds of negative sentiment polarity of topic 1 and topic 2 were compared. Topic 1 leans toward sad sentiment, while topic 2 leans toward angry sentiment: topic 1 reflects the statement of the event, and topic 2 represents the follow-up discussion of the event.

5.3. Topic and Sentiment Evolution. The curves of topic features extracted from the four datasets through the TSTS model are shown in Figure 6. Taking dataset 2 as an example, the topic curve conforms to the evolution law of social and abrupt events. The two curves represent the trend of feature words over time in topic 1 and topic 2. Topic 1 is the

Table 4: Accuracy of sentiment polarity judgment.

Dataset | ASUM | JST | TSTS
Dataset 1 | 0.4763 | 0.5427 | 0.6348
Dataset 2 | 0.4617 | 0.5398 | 0.6599
Dataset 3 | 0.4832 | 0.5461 | 0.6475
Dataset 4 | 0.4841 | 0.5294 | 0.6522

Figure 5: Sentiment distribution in four datasets. (a) Military parade in National Day. (b) The assault on a doctor. (c) Hong Kong's event. (d) Garbage sorting in Shanghai.


statement about the case itself: from the beginning of the event, the amount of discussion on the social network rose sharply and then gradually declined. Topic 2 is a discussion on the development of the case, which caused a second wave of hot discussion. The times at which the two curves reach their peaks are not consistent. The peak

Figure 6: The changing of topics in four datasets. (a) Military parade in National Day. (b) The assault on a doctor. (c) Hong Kong's event. (d) Garbage sorting in Shanghai.




value of the topic 2 curve is lower than that of topic 1, which reflects that discussion of the same event fades over time. Even if a new topic emerges, its discussion is far lower than at the beginning of the event. Similar results can be verified in the other three datasets.

The proportions of the sentiment polarities in the four datasets are shown in Figure 7. Since the sentiment polarity proportion is measured, the four sentiment polarities are distributed evenly before the occurrence of the events. After each event occurred, the positive and negative sentiment polarities began to diverge toward the two extremes. Among the four datasets, positive sentiment was higher than negative sentiment in the first dataset, "Military parade in National Day," and the fourth dataset, "Garbage sorting in Shanghai," which also conforms to the social sentiment of these events. In addition, the difference among the four sentiment labels is large in the initial stage, and the distribution of sentiment labels becomes stable in the later stage. This proves that a second report of a social event does not reach the heat of the first. However, the sentiment tendency in social networks does not decline sharply with the reduction of discussion, which can be seen in topic 2 of the second dataset, "The assault on a doctor." Given that the topic features come from the background of the corpus and contain many noise words, the relative positions of the four curves are close in terms of sentiment polarity evolution, although there is still a gap compared with the even distribution of sentiment polarity at the beginning of the event.

6. Discussion

From the perspective of theoretical significance, this paper extends the LDA model to some extent. First, in view of the sparse matrix caused by short texts, word pairs are introduced to replace single words for text generation, following BTM. Based on the hypotheses of JST and ASUM, the sentiment layer is introduced to form the Bayesian network structure, and each word pair is limited to the same sentiment polarity distribution. Second, in order to realize dynamic analysis and text homogeneity, the timestamp and the corresponding hyperparameter are introduced to alleviate the problem of word order in text generation. Third, this research combines behavioral experiments, big data mining, mathematical modeling, and simulation to promote the expansion of research into new situations and new methods.

From the perspective of practical significance, this paper is of great value in tracking and monitoring topics of public opinion in social networks. The online public opinions of hot events can be monitored, which helps governments and departments accurately judge social events and make emergency decisions. In addition, this paper analyzes the evolution, response, and governance of public opinion, which is conducive to understanding the formation mechanism and the collaborative evolution of public opinion. Meanwhile, the use of public opinion information can detect and screen information, prevent the spread of rumors, and support scientifically formulated utilization mechanisms to effectively reduce the risk of loss.

7. Conclusions

In the context of the mobile social network, the number of short texts is growing explosively. In order to extract information from massive short texts quickly and monitor public opinions, the TSTS model is proposed in this study based on LDA, BTM, JST, ASUM, and TOT. From the experimental results, the TSTS model achieves good performance. In terms of topic feature extraction, the perplexity of TSTS is always lower than that of LDA; moreover, although its perplexity is slightly higher than BTM's as the number of iterations increases, it maintains balance with BTM. In sentiment analysis, the effect of TSTS is significantly better than JST and ASUM. Finally,

Figure 7: The changing of sentiment in four datasets. (a) Military parade in National Day, topic 1. (b) Military parade in National Day, topic 2. (c) The assault on a doctor, topic 1. (d) The assault on a doctor, topic 2. (e) Hong Kong's event, topic 1. (f) Hong Kong's event, topic 2. (g) Garbage sorting in Shanghai, topic 1. (h) Garbage sorting in Shanghai, topic 2.


the TSTS model incorporating the time factor can determinethe change trend of the topic and sentiment

There are still some shortcomings in this paper. Firstly, for the extraction of topic feature words, a global topic level can be added to the topic layer of the TSTS model to filter common topic words. Secondly, in sentiment polarity judgment, the sentiment labels are manually marked based on prior knowledge; however, sentiments are extremely rich and changeable. In future research, the Bayesian network and entity theory can be used to judge sentiment bias.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the Project of the National Science and Technology Department (2018YFF0213102), Public Projects of Zhejiang Province (LGF18G010003, LGF19G010002, and LGF20G010002), the Science and Technology Project of Zhejiang Province (2020C01158), and the First Class Discipline of Zhejiang - A (Zhejiang Gongshang University - Statistics).

References

[1] CNNIC, "Statistical report on internet development in China," CNNIC, Beijing, China, 2019, http://www.cac.gov.cn/2019-08/30/c_1124938750.htm.

[2] M. Ingawale, A. Dutta, R. Roy, and P. Seetharaman, "Network analysis of user generated content quality in Wikipedia," Online Information Review, vol. 37, no. 4, pp. 602–619, 2013.

[3] Q. Gao, Y. Tian, and M. Tu, "Exploring factors influencing Chinese user's perceived credibility of health and safety information on Weibo," Computers in Human Behavior, vol. 45, pp. 21–31, 2015.

[4] M. Steyvers and T. Griffiths, "Probabilistic topic models," in Handbook of Latent Semantic Analysis, pp. 424–440, Psychology Press, New York, NY, USA, 2007.

[5] J.-F. Yeh, Y.-S. Tan, and C.-H. Lee, "Topic detection and tracking for conversational content by using conceptual dynamic latent Dirichlet allocation," Neurocomputing, vol. 216, pp. 310–318, 2016.

[6] X. Zhou and L. Chen, "Event detection over twitter social media streams," The VLDB Journal, vol. 23, no. 3, pp. 381–400, 2014.

[7] Y. J. Du, Y. T. Yi, and X. Y. Li, "Extracting and tracking hot topics of micro-blogs based on improved Latent Dirichlet Allocation," Engineering Applications of Artificial Intelligence, vol. 87, pp. 1–13, 2020.

[8] H. J. Kim, Y. K. Jeong, Y. Kim et al., "Topic-based content and sentiment analysis of Ebola virus on Twitter and in the news," Journal of Information Science, vol. 42, no. 6, pp. 763–781, 2016.

[9] B. Subeno, R. Kusumaningrum, and F. Farikhin, "Optimisation towards latent Dirichlet allocation: its topic number and collapsed Gibbs sampling inference process," International Journal of Electrical and Computer Engineering (IJECE), vol. 8, no. 5, pp. 3204–3213, 2018.

[10] H. Park, T. Park, and Y.-S. Lee, "Partially collapsed Gibbs sampling for latent Dirichlet allocation," Expert Systems with Applications, vol. 131, pp. 208–218, 2019.

[11] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.

[12] X. Cheng, X. Yan, Y. Lan, and J. Guo, "BTM: topic modeling over short texts," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 12, pp. 2928–2941, 2014.

[13] J. Rashid, S. M. A. Shah, and A. Irtaza, "Fuzzy topic modeling approach for text mining over short text," Information Processing & Management, vol. 56, no. 6, pp. 1–19, 2019.

[14] H.-Y. Lu, N. Kang, Y. Li, Q.-Y. Zhan, J.-Y. Xie, and C.-J. Wang, "Utilizing recurrent neural network for topic discovery in short text scenarios," Intelligent Data Analysis, vol. 23, no. 2, pp. 259–277, 2019.

[15] L. Zhu, H. Xu, Y. Xu et al., "A joint model of extended LDA and IBTM over streaming Chinese short texts," Intelligent Data Analysis, vol. 23, no. 3, pp. 681–699, 2019.

[16] M. Tang, J. Jin, Y. Liu et al., "Integrating topic, sentiment, and syntax for modeling online product reviews: a topic model approach," Journal of Computing and Information Science in Engineering, vol. 19, no. 1, pp. 1–12, 2019.

[17] R. K. Amplayo, S. Lee, and M. Song, "Incorporating product description to sentiment topic models for improved aspect-based sentiment analysis," Information Sciences, vol. 454-455, pp. 200–215, 2018.

[18] L. Yao, Y. Zhang, B. Wei et al., "Concept over time: the combination of probabilistic topic model with Wikipedia knowledge," Expert Systems with Applications, vol. 60, pp. 27–38, 2016.

[19] S. Park, W. Lee, and I.-C. Moon, "Associative topic models with numerical time series," Information Processing & Management, vol. 51, no. 5, pp. 737–755, 2015.

[20] P. Lorenz-Spreen, F. Wolf, J. Braun et al., "Tracking online topics over time: understanding dynamic hashtag communities," Computational Social Networks, vol. 5, no. 1, pp. 1–18, 2018.

[21] Y. He, C. Lin, W. Gao et al., "Dynamic joint sentiment-topic model," ACM Transactions on Intelligent Systems & Technology, vol. 5, no. 1, pp. 1–21, 2013.

[22] L. Kuo and T. Y. Yang, "An improved collapsed Gibbs sampler for Dirichlet process mixing models," Computational Statistics & Data Analysis, vol. 50, no. 3, pp. 659–674, 2006.

[23] Y. Papanikolaou, J. R. Foulds, T. N. Rubin et al., "Dense distributions from sparse samples: improved Gibbs sampling parameter estimators for LDA," Statistics, vol. 18, no. 62, pp. 1–58, 2015.

[24] X. Zhou, J. Ouyang, and X. Li, "Two time-efficient Gibbs sampling inference algorithms for biterm topic model," Applied Intelligence, vol. 48, no. 3, pp. 730–754, 2018.

[25] P. Bhuyan, "Estimation of random-effects model for longitudinal data with nonignorable missingness using Gibbs sampling," Computational Statistics, vol. 34, no. 4, pp. 1693–1710, 2019.

[26] C. Lin, Y. He, R. Everson, and S. Ruger, "Weakly supervised joint sentiment-topic detection from text," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 6, pp. 1134–1145, 2012.

12 Complexity

[27] A. Daud and F. Muhammad, "Group topic modeling for academic knowledge discovery," Applied Intelligence, vol. 36, no. 4, pp. 870–886, 2012.

[28] Z. Huang, J. Tang, G. Shan, J. Ni, Y. Chen, and C. Wang, "An efficient passenger-hunting recommendation framework with multi-task deep learning," IEEE Internet of Things Journal, vol. 6, no. 5, pp. 7713–7721, 2019.

[29] S. M. Mohammad, S. Kiritchenko, and X. Zhu, "NRC-Canada: building the state-of-the-art in sentiment analysis of tweets," in Proceedings of the Seventh International Workshop on Semantic Evaluation Exercises (SemEval-2013), pp. 1–5, Atlanta, GA, USA, June 2013.

[30] T. Chen, Q. Li, J. Yang, G. Cong, and G. Li, "Modeling of the public opinion polarization process with the considerations of individual heterogeneity and dynamic conformity," Mathematics, vol. 7, no. 10, p. 917, 2019.

[31] S. A. Curiskis, B. Drake, T. R. Osborn, and P. J. Kennedy, "An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit," Information Processing & Management, vol. 57, no. 2, pp. 1–21, 2019.



(3) Extract a multinomial distribution π_z for each sentiment from the Dirichlet prior distribution γ, that is, π_z ∼ Dir(γ).

(4) For each document d and for each pair of words in the article b = (w_{i1}, w_{i2}), b ∈ B:

(a) Choose a topic z_i ∼ θ_d.

(b) Choose a sentiment label l_i ∼ π_{z_i}.

(c) Choose a pair of words b_i ∼ φ_{z_i, l_i}.

(d) Choose a timestamp t_i ∼ ψ_{z_i, l_i}.
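The generative steps above can be sketched with NumPy. The sizes, priors, seed, and variable names below are illustrative assumptions for the sketch, not the authors' implementation or the paper's experimental settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and symmetric priors (assumptions, not the paper's settings)
T, S, V, H = 4, 2, 50, 6      # topics, sentiment labels, word-pair vocabulary, timestamps
alpha, beta, gamma, mu = 0.5, 0.1, 0.1, 0.1

# Corpus-level distributions drawn once
pi = rng.dirichlet([gamma] * S, size=T)        # pi_z ~ Dir(gamma): sentiment given topic, (T, S)
phi = rng.dirichlet([beta] * V, size=(T, S))   # phi_{z,l}: word pair given <topic, sentiment>, (T, S, V)
psi = rng.dirichlet([mu] * H, size=(T, S))     # psi_{z,l}: timestamp given <topic, sentiment>, (T, S, H)

def generate_document(n_pairs):
    """Generate one document as (topic, sentiment, word pair, timestamp) tuples."""
    theta_d = rng.dirichlet([alpha] * T)       # per-document topic distribution
    doc = []
    for _ in range(n_pairs):
        z = rng.choice(T, p=theta_d)           # (a) choose a topic
        l = rng.choice(S, p=pi[z])             # (b) choose a sentiment label
        b = rng.choice(V, p=phi[z, l])         # (c) choose a word pair (biterm)
        t = rng.choice(H, p=psi[z, l])         # (d) choose a timestamp
        doc.append((z, l, b, t))
    return doc

doc = generate_document(10)
```

Each word pair thus carries its own <topic, sentiment> assignment and timestamp, which is exactly the situation discussed next for Figure 2.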

As shown in Figure 2, word pairs in a document may belong to different timestamps in the text generation process of the TSTS topic model. In theory, all the content of an article, such as words and topics, should belong to the same timestamp, and introducing the time factor into the topic model may affect the topic homogeneity of an article. However, the default time factor of the TSTS model does not affect the homogeneity of the text, so it is assumed that the time factor in this paper carries no weight. Based on the TOT model and the group topic (GT) model, the hyperparameter μ is introduced into TSTS to balance the interaction of time and words in document generation. The parameters of the TSTS model are explained in Table 1.

3.3. Model Deduction. According to the Bayesian network structure diagram of the TSTS model, the multinomial distribution θ of topics, the distribution π of sentiment given a topic, the distribution φ of word pairs given <topic, sentiment>, and the distribution ψ of time given <topic, sentiment> can be calculated from the hyperparameters α, β, γ, and μ. Then Gibbs sampling is performed, which guarantees the convergence of the TSTS model given enough iterations, and each word in the document is assigned the topic and sentiment that best fit the data.

According to the principle of Bayesian independence, the joint probability of word pair, topic, sentiment polarity, and timestamp is given as follows:

$$p(b, t, l, z \mid \alpha, \beta, \gamma, \mu) = p(b \mid l, z, \beta) \cdot p(t \mid l, z, \mu) \cdot p(l \mid z, \gamma) \cdot p(z \mid \alpha) \tag{1}$$

where the variables and parameters are mutually independent: word pairs b are independent of α, γ, and μ; timestamps t of α, γ, and β; sentiment polarity labels l of α, μ, and β; and topics z of β, γ, and μ. Therefore, the joint distribution in the equation can be obtained by calculating the four factors on the right side of the equation.

Given the sentiment polarity label of specific topic features, the distribution of b can be regarded as a multinomial distribution. Given the topic z_i and sentiment label l_i, each b_i is generated with probability p(b | l, z), N times in total. Since word pairs are independent of each other, we can obtain:

$$p(b \mid l, z, \beta) = \prod_{i=1}^{N} p\left(b_i \mid z_i, l_i\right) \tag{2}$$

Hyperparameters are the representation parameters of the framework in a machine learning model [28], such as the number of classes in a clustering method or the number of topics in a topic model. In a Bayesian network, the distribution and density function of θ are denoted as H(θ) and h(θ), respectively; they are regarded as the prior distribution function and the prior density function, collectively referred to as the prior distribution. The distribution of θ obtained after sampling is called the posterior distribution. Based on the conjugate property of the Dirichlet-multinomial pair, when the

Figure 2: TSTS model.

Table 1: Explanation of parameters

D: number of documents
V: vocabulary size
T: number of topics
S: number of sentiment polarities
H: number of timestamps
M: number of word pairs
B: set of word pairs
b: word pair, b = (w_{i1}, w_{i2})
w: word
t: timestamp
z: topic
l: sentiment polarity label
θ = [θ_d]: multinomial distribution of topics
φ = [φ_{zl}]: T × S × V matrix, distribution of word pairs
π = [π_z]: T × S matrix, sentiment distribution
ψ = [ψ_{zl}]: T × S × H matrix, time distribution
α: Dirichlet prior parameter of θ
γ: Dirichlet prior parameter of π
β: asymmetric Dirichlet prior parameter of φ
μ: Dirichlet prior parameter of ψ
n_d: number of word pairs in document d
n_{dj}: number of word pairs for topic j in document d
n_j: number of word pairs for topic j
n_{jk}: number of word pairs assigned to topic j and sentiment polarity k
n_{ijk}: number of times word pair b_i is assigned to topic j and sentiment polarity k
n_{jkh}: number of word pairs assigned to topic j and sentiment polarity k with timestamp h
n^{-p}: number of word pairs in the current document excluding position p


parameters of the population distribution follow the multinomial distribution, the conjugate prior updates as follows:

$$\mathrm{Dir}(\theta \mid \alpha) + \mathrm{Mult}(\delta) \to \mathrm{Dir}(\theta \mid \alpha + \delta) \tag{3}$$

For the general text model, the discrete Dirichlet and multinomial distributions are as follows:

$$\mathrm{Dir}(b \mid \beta) = \frac{\Gamma\left(\sum_{j=1}^{T} \beta\right)}{\prod_{j=1}^{T} \Gamma(\beta)} \prod_{j=1}^{T} b_j^{\beta - 1} \tag{4}$$

$$\mathrm{Mult}(n \mid b, N) = \binom{N}{n} \prod_{j=1}^{T} b_j^{n_j} \tag{5}$$

where i, j, k, and h index word pairs, topics, sentiments, and timestamps, respectively, in the modeling process. Since p(b | l, z, β) follows a Dirichlet distribution, this paper introduces φ for p(b | l, z, β); integrating out φ gives:

$$p(b \mid l, z, \beta) = \int p(b \mid l, z, \varphi) \cdot p(\varphi \mid \beta)\, d\varphi = \left(\frac{\Gamma(V\beta)}{\Gamma(\beta)^{V}}\right)^{T \cdot S} \prod_{j} \prod_{k} \frac{\prod_{i} \Gamma\left(n_{ijk} + \beta\right)}{\Gamma\left(n_{jk} + V\beta\right)} \tag{6}$$

To estimate the posterior parameter φ in the formula, we combine the Bayes formula with the conjugate property of the Dirichlet-multinomial pair. The distribution of the posterior parameter is:

$$p(\varphi \mid l, z, \beta) \propto \mathrm{Dir}\left(\varphi \mid n_{ijk} + \beta\right) \tag{7}$$

Given that the expectation of the Dirichlet distribution is E(Dir(ε))_i = ε_i / Σ_i ε_i, the calculated parameters are estimated by taking the expectation of the known posterior distribution in equation (7). Similarly, ψ is introduced for p(t | l, z, μ); integrating out ψ gives:

$$p(t \mid l, z, \mu) = \left(\frac{\Gamma(H\mu)}{\Gamma(\mu)^{H}}\right)^{T \cdot S} \prod_{j} \prod_{k} \frac{\prod_{h} \Gamma\left(n_{jkh} + \mu\right)}{\Gamma\left(n_{jk} + H\mu\right)} \tag{8}$$

For p(l | z, γ), π is introduced; integrating out π gives:

$$p(l \mid z, \gamma) = \left(\frac{\Gamma\left(\sum_{k} \gamma_k\right)}{\prod_{k} \Gamma\left(\gamma_k\right)}\right)^{T} \prod_{j} \frac{\prod_{k} \Gamma\left(n_{jk} + \gamma_k\right)}{\Gamma\left(n_j + \sum_{k} \gamma_k\right)} \tag{9}$$

For p(z | α), θ is introduced; integrating out θ gives:

$$p(z \mid \alpha) = \left(\frac{\Gamma\left(\sum_{j} \alpha_j\right)}{\prod_{j} \Gamma\left(\alpha_j\right)}\right)^{D} \prod_{d} \frac{\prod_{j} \Gamma\left(n_{dj} + \alpha_j\right)}{\Gamma\left(n_d + \sum_{j} \alpha_j\right)} \tag{10}$$

The TSTS model can estimate the posterior distribution once estimates of z and s have been obtained by sampling. Substituting equations (2)–(6) into equation (1) and using the properties of the Gamma function, the conditional distribution probability in Gibbs sampling is obtained:

$$p\left(s_p = k, z_p = j \mid b, t, l^{-p}, z^{-p}, \alpha, \beta, \gamma, \mu\right) \propto \frac{n_{dj}^{-p} + \alpha_j}{n_{d}^{-p} + \sum_{j} \alpha_j} \cdot \frac{n_{w_p jk}^{-p} + \beta}{n_{jk}^{-p} + V\beta} \cdot \frac{n_{jk}^{-p} + \gamma_k}{n_{j}^{-p} + \sum_{k} \gamma_k} \cdot \frac{n_{jk t_p}^{-p} + \mu}{n_{jk}^{-p} + H\mu} \tag{11}$$
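The Gibbs conditional above is just elementwise arithmetic on count arrays. The sketch below assumes symmetric priors and illustrative array names and shapes; it is not the authors' code.

```python
import numpy as np

def gibbs_conditional(n_dj, n_wjk, n_jk, n_j, n_jkt, d, w, t,
                      alpha, beta, gamma, mu, V, H):
    """Unnormalized conditional of equation (11) for one word pair at position p,
    assuming the counts for that position were already decremented (the "-p"
    superscript). Assumed shapes: n_dj (D, T), n_wjk (V, T, S), n_jk (T, S),
    n_j (T,), n_jkt (T, S, H)."""
    T = n_dj.shape[1]
    S = n_jk.shape[1]
    topic_term = (n_dj[d] + alpha) / (n_dj[d].sum() + T * alpha)    # first factor, (T,)
    word_term = (n_wjk[w] + beta) / (n_jk + V * beta)               # second factor, (T, S)
    sent_term = (n_jk + gamma) / (n_j[:, None] + S * gamma)         # third factor, (T, S)
    time_term = (n_jkt[:, :, t] + mu) / (n_jk + H * mu)             # fourth factor, (T, S)
    return topic_term[:, None] * word_term * sent_term * time_term  # joint over (topic, sentiment)

def sample_assignment(rng, probs):
    """Normalize and draw a joint <topic, sentiment> assignment."""
    flat = probs.ravel() / probs.sum()
    idx = rng.choice(flat.size, p=flat)
    return divmod(idx, probs.shape[1])   # (topic j, sentiment k)

# Tiny demo with random, mutually consistent counts (illustrative only)
rng = np.random.default_rng(1)
D, T, S, V, H = 3, 4, 2, 20, 5
n_dj = rng.integers(0, 10, (D, T)).astype(float)
n_wjk = rng.integers(0, 5, (V, T, S)).astype(float)
n_jk = n_wjk.sum(axis=0)
n_j = n_jk.sum(axis=1)
n_jkt = rng.integers(0, 5, (T, S, H)).astype(float)
probs = gibbs_conditional(n_dj, n_wjk, n_jk, n_j, n_jkt, d=0, w=2, t=1,
                          alpha=0.5, beta=0.01, gamma=0.1, mu=0.1, V=V, H=H)
j, k = sample_assignment(rng, probs)
```

One full Gibbs sweep would apply this decrement-compute-sample-increment cycle to every word pair in the corpus.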

In order to simplify equation (6), the hyperparameter μ = 1/n_d is introduced. When the hyperparameters α, β, μ, and γ are given, the set B of word pairs with the corresponding topics z and sentiment labels l can be used to infer the parameters φ, θ, π, and ψ based on Bayes' rule and the Dirichlet conjugate properties:

$$\varphi_{jki} = \frac{n_{ijk} + \beta}{n_{jk} + V\beta}, \quad \theta_{dj} = \frac{n_{dj} + \alpha_j}{n_d + \sum_{j} \alpha_j}, \quad \pi_{jk} = \frac{n_{jk} + \gamma_k}{n_j + \sum_{k} \gamma_k}, \quad \psi_{jkh} = \frac{n_{jkh} + \mu}{n_{jk} + H\mu} \tag{12}$$
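The estimates of equation (12) can be computed in one vectorized pass over the final Gibbs counts. This is a minimal sketch assuming symmetric priors; the array names, shapes, and the toy demo are assumptions, not the paper's settings.

```python
import numpy as np

def estimate_parameters(n_ijk, n_jk, n_dj, n_jkh, alpha, beta, gamma, mu):
    """Point estimates of equation (12). Assumed shapes:
    n_ijk (V, T, S), n_jk (T, S), n_dj (D, T), n_jkh (T, S, H)."""
    V = n_ijk.shape[0]
    T = n_dj.shape[1]
    S = n_jk.shape[1]
    H = n_jkh.shape[2]
    phi = (n_ijk + beta) / (n_jk + V * beta)                                # p(word pair | topic, sentiment)
    theta = (n_dj + alpha) / (n_dj.sum(axis=1, keepdims=True) + T * alpha)  # p(topic | document)
    pi = (n_jk + gamma) / (n_jk.sum(axis=1, keepdims=True) + S * gamma)     # p(sentiment | topic)
    psi = (n_jkh + mu) / (n_jk[:, :, None] + H * mu)                        # p(timestamp | topic, sentiment)
    return phi, theta, pi, psi

# Demo: accumulate counts from simulated assignments, then estimate
rng = np.random.default_rng(0)
D, V, T, S, H, M = 3, 10, 4, 2, 5, 200
n_ijk = np.zeros((V, T, S)); n_dj = np.zeros((D, T)); n_jkh = np.zeros((T, S, H))
for _ in range(M):
    d, i, j, k, h = (rng.integers(D), rng.integers(V), rng.integers(T),
                     rng.integers(S), rng.integers(H))
    n_ijk[i, j, k] += 1; n_dj[d, j] += 1; n_jkh[j, k, h] += 1
n_jk = n_ijk.sum(axis=0)
phi, theta, pi, psi = estimate_parameters(n_ijk, n_jk, n_dj, n_jkh, 2.5, 0.01, 0.1, 0.1)
```

Because the counts are mutually consistent, each estimated distribution sums to one along its normalization axis, which is a cheap correctness check after sampling.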

4. Experiment Analysis

4.1. Data Collection. In order to verify the TSTS model proposed in this paper, four hot events were randomly selected from the trending searches of Sina Weibo in 2019, and the comments on these four events are used as the experimental datasets. The four datasets are "Military parade on National Day," "The assault on a doctor," "Hong Kong's event," and "Garbage sorting in Shanghai." The comments were extracted from the Sina social network platform. The original microblog texts contain some meaningless tokens, such as stop words, interjections of tone, punctuation marks, and numeric expressions. Before text modeling, a word segmentation package in Python was used to process the initial experimental dataset. In addition, considering that comments on social networks are relatively new, fashionable expressions used in the social network were collected and added to a customized dictionary, so that these emerging words can be identified as far as possible and replaced with normal expressions. Useless strings in the text, such as URL links and numbers, were filtered out with regular expressions. Finally, a total of 14,288 experimental records across the four events were obtained. The four datasets are described in Table 2.
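A minimal, stdlib-only sketch of the regular-expression filtering described above is shown below. The stopword set and the sample comment are illustrative stand-ins (the paper works on Chinese text with a segmentation package and a customized dictionary of fashionable expressions, which are omitted here).

```python
import re

STOPWORDS = {"the", "a", "of", "and"}   # placeholder; a Chinese stopword list in practice

def clean_comment(text):
    """Strip URLs, numbers, and punctuation, then tokenize and drop stop words."""
    text = re.sub(r"https?://\S+", " ", text)   # drop URL links
    text = re.sub(r"\d+", " ", text)            # drop numeric expressions
    text = re.sub(r"[^\w\s]", " ", text)        # drop punctuation marks
    return [w for w in text.lower().split() if w not in STOPWORDS]

print(clean_comment("Watch the parade! https://example.com 2019"))  # → ['watch', 'parade']
```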

4.2. Sentiment Dictionary. The words or phrases in the sentiment dictionary have an obvious sentiment tendency,


which can be divided into positive and negative words. The sentiment dictionary in this paper plays two major roles. On the one hand, it identifies sentiment polarity words and distinguishes topic features from sentiment words; on the other hand, it provides sentiment prior information that makes the model more accurate in judging the sentiment polarity of the text. Given that sentiment polarity words reflect users' sentiment tendency, they are of great significance for analyzing the sentiment orientation of the text.

At present, there are two major Chinese sentiment dictionaries, NTU and HowNet. The former contains 2812 positive words and 8276 negative words; the latter contains about 5000 positive words and 5000 negative words. Based on HowNet and the classification of sentiment polarity [29, 30], the sentiment dictionary for the TSTS model evaluation experiment is constructed as shown in Table 3.
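The dictionary's two roles can be illustrated with a toy lookup: count lexicon hits per label and derive a coarse positive/negative prior for a document. The English entries below are placeholders standing in for HowNet-style Chinese entries with the paper's labels (Happy, Surprise, Sad, Angry); none of this is the actual dictionary.

```python
# Toy lexicon (illustrative placeholders, not HowNet entries)
LEXICON = {"proud": "happy", "moving": "happy", "shocked": "surprise",
           "grief": "sad", "furious": "angry", "outrage": "angry"}
NEGATIVE = {"sad", "angry"}   # sad and angry count as negative polarity

def sentiment_prior(tokens):
    """Count lexicon hits per label and report a coarse polarity prior."""
    counts = {}
    for w in tokens:
        label = LEXICON.get(w)
        if label:
            counts[label] = counts.get(label, 0) + 1
    neg = sum(c for l, c in counts.items() if l in NEGATIVE)
    pos = sum(c for l, c in counts.items() if l not in NEGATIVE)
    return counts, ("negative" if neg > pos else "positive" if pos else "neutral")

counts, label = sentiment_prior(["furious", "grief", "proud"])  # negative outweighs positive
```

In the model, such prior labels would bias the sentiment assignment of word pairs rather than decide the document polarity outright.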

4.3. Parameter Setting. In this paper, the Gibbs algorithm is used to sample the TSTS model and estimate the four posterior parameters. Following the parameter settings of the traditional topic model, the hyperparameters are set as follows. First, α is set to 50/K, where K is the number of topics extracted. Second, β is set to 0.01. Third, γ is set to (0.05 × AVE)/S, where AVE is the average length of the articles, that is, the average number of words per microblog in this experiment, and S is the total number of polarity labels. Finally, μ is set to 1/n_d.
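These settings can be sketched as one small helper; the example values plugged in below (K = 20, AVE = 98, S = 4) come from elsewhere in the paper, while using n_d = 98 for a single document is an assumption for illustration.

```python
def tsts_hyperparameters(K, AVE, S, n_d):
    """Hyperparameter settings described above: alpha = 50/K, beta = 0.01,
    gamma = (0.05 * AVE) / S, and a per-document mu = 1/n_d."""
    return {"alpha": 50 / K, "beta": 0.01, "gamma": 0.05 * AVE / S, "mu": 1 / n_d}

params = tsts_hyperparameters(K=20, AVE=98, S=4, n_d=98)  # alpha = 2.5, gamma = 1.225
```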

4.4. Evaluation Indicator. For the extraction of topic features, perplexity is used as an evaluation indicator to measure the predictive power on unknown data during model training; lower perplexity means better performance. The perplexity is calculated as follows:

$$\mathrm{perplexity}\left(P\left(\tilde{D}^{t} \mid M\right)\right) = \exp\left\{-\frac{\sum_{d=1}^{D_t} \log P\left(\tilde{b}_{d}^{t} \mid M\right)}{\sum_{d=1}^{D_t} \tilde{N}_{d}^{t}}\right\} \tag{13}$$

where $\tilde{D}^{t} = \{\tilde{b}_{d}^{t}\}_{d=1}^{D_t}$ represents an unknown dataset with timestamp t, and

$$P\left(\tilde{b}_{d}^{t} \mid M\right) = \prod_{n=1}^{\tilde{N}_{d}^{t}} \sum_{l=1}^{L} \sum_{z=1}^{T} P\left(\tilde{b}_{dn} \mid l, z\right) P(z \mid l) P(l) \tag{14}$$

where $\tilde{b}_{d}^{t}$ represents the vector set of word pairs in text d and $\tilde{N}_{d}^{t}$ represents the number of word pairs in $\tilde{b}_{d}^{t}$. The likelihood of the training corpus, $P(\tilde{b}_{d}^{t} \mid M)$, is computed as follows:

$$P\left(\tilde{b}_{d}^{t} \mid M\right) = \prod_{i=1}^{V} \left(\sum_{l=1}^{L} \sum_{z=1}^{T} \varphi_{lzi} \cdot \theta_{dlz} \cdot \pi_{dl}\right)^{\tilde{N}_{di}^{t}} \tag{15}$$
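Once per-document held-out log-likelihoods are available, equation (13) reduces to a one-liner. The sanity-check numbers below are an assumption for illustration, not the paper's data.

```python
import math

def perplexity(log_likelihoods, lengths):
    """Perplexity of equation (13): exp of the negative average held-out
    log-likelihood per word pair, given per-document log P(b_d | M) values
    and per-document word-pair counts N_d."""
    return math.exp(-sum(log_likelihoods) / sum(lengths))

# Sanity check: if every word pair has probability 1/8, perplexity is 8
ll = [4 * math.log(1 / 8), 4 * math.log(1 / 8)]   # two documents, 4 word pairs each
px = perplexity(ll, [4, 4])
```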

For sentiment classification, document-level sentiment judgment is used as the evaluation index, based on the sentiment polarity labels in the sentiment dictionary. For each document in this experiment, positive or negative sentiment can be judged. This paper adopts the consistency test method to mark the sentiment labels [31].

5. Results

5.1. Extraction of Topics. The primary task of the TSTS model is to extract topic features. As an extension of the topic-sentiment mixed model, the assessment judges whether the extracted topic features are reasonable and accurate. Before extracting topic features through text modeling, it is necessary to determine the number of topics to be extracted and the number of Gibbs sampling iterations. For an effective evaluation of topic discovery, the degree of perplexity is used as the measurement index: the lower the perplexity, the better the fitting effect of the model. Taking dataset 1 as an example, the simulation results are shown in Figure 3.

Based on the experimental results shown in Figure 3, the number of topics was set to 20 in the subsequent experiments. In addition, we can calculate the perplexity of the three models as the number of iterations changes. Comparing the experimental results of TSTS and LDA, the effect of TSTS is always better than that of LDA, and its perplexity decreases as iterations increase. This indicates that the topic discovery ability of TSTS gradually improves, mainly because the TSTS model incorporates word pairs to alleviate the sparse-matrix problem of LDA on short texts. Comparing TSTS and BTM, TSTS is better than BTM when the number of iterations is small. However, as the number

Table 2: Experiment datasets

Dataset (number of comments) | Words per microblog: initial / pretreated | Vocabulary size: initial / pretreated
Dataset 1 (3562) | 134 / 102 | 9789 / 6319
Dataset 2 (3527) | 127 / 94 | 9736 / 6242
Dataset 3 (3617) | 131 / 100 | 9780 / 6301
Dataset 4 (3582) | 128 / 96 | 9742 / 6254
Average | 130 / 98 | 9762 / 6279

Table 3: Classification of sentiment words

Sentiment label | Happy | Surprise | Sad | Angry
Vocabulary size | 2467 | 276 | 3025 | 1897


of iterations increased, the gap between the two models became smaller. This is because the word pairs of BTM are drawn from the whole corpus: when the number of iterations is small, the proportion of noise words is relatively large, resulting in poor quality of topic words. In addition, the sentiment layer is integrated into TSTS, and the error generated in sentiment estimation affects the next iteration. Although TSTS is slightly worse than BTM at higher iteration counts, its performance remains comparable to BTM. Therefore, for the extraction of topic features, the number of topics and iterations can be set to 20 and 600, respectively.

5.2. Sentiment Polarity. The information related to sentiment polarity is provided according to the topic and sentiment polarity of words. The sentiment distribution of topics extracted by the TSTS model is shown in Figure 4. In addition, JST and ASUM are introduced as comparisons to measure the sentiment recognition of the TSTS model. Each document has a binary sentiment label, positive or negative. Taking dataset 2, "The assault on a doctor," as an example, the result is shown in Figure 4. The number of topics is set to 5 at the beginning of the experiment. As the granularity is refined, the performance of the TSTS model increases. Compared with JST and ASUM, the curve of the TSTS model changes greatly because it considers the topic and sentiment relationship among the word pairs of a document. The curve of the JST model shows a steady upward trend, and the identification efficiency of ASUM is low. This is because ASUM makes strict assumptions, and increasing the number of topics causes the decentralization of topics and sentiments, which has a strong negative impact on the overall performance of the model. The overall effect of the TSTS model is slightly better than JST and ASUM, but the effect decreases slightly after the number of topics reaches 20. This is because the data collected in the dataset are limited, and increasing the number of topics discretizes the word distribution, which affects the judgment of sentiment polarity. Comparing the sentiment label classification of documents under different topic numbers, the result of the TSTS model is better than JST and ASUM.

As the number of topics increases, the recognition performance of the topic model fluctuates, but the TSTS model is always better than JST and ASUM. When the number of topics and iterations is set to 20 and 600, TSTS is the best model in topic detection. With the number of topics in the four datasets set to 20, the accuracy of sentiment polarity judgment is shown in Table 4.

From Table 4, the TSTS model is better than JST and ASUM in judging the sentiment polarity of documents. This

Figure 3: The relationship of the perplexity with the number of (a) topics and (b) iterations, for LDA, BTM, and TSTS.

Figure 4: Accuracy of sentiment polarity judgment for ASUM, JST, and TSTS as the number of topics varies.


is because sentiment polarity depends on the performance of topic discovery in the previous stage. In this experiment, the effects of JST and ASUM were exactly opposite; the difference is caused by the length of the original documents, which also indirectly verifies the effectiveness of the TSTS model.

From Figure 5, it can be seen that the proportion of positive sentiment is significantly higher than the other sentiments in the dataset "Military parade on National Day" and the dataset "Garbage sorting in Shanghai," which is consistent with the sentiment tendency of users in the social network. For the second dataset, "The assault on a doctor," the two negative sentiment polarities of topic 1 and topic 2 were compared: topic 1 leans toward sad sentiment, while topic 2 leans toward angry sentiment. Topic 1 reflects the statement of the event, and topic 2 represents the follow-up discussion of the event.

5.3. Topic and Sentiment Evolution. The curves of topic features extracted from the four datasets through the TSTS model are shown in Figure 6. Taking dataset 2 as an example, the topic curve conforms to the evolution law of social and abrupt events. The two curves represent the trend of feature words over time in topic 1 and topic 2. Topic 1 is the

Table 4: Accuracy of sentiment polarity judgment

Dataset | ASUM | JST | TSTS
Dataset 1 | 0.4763 | 0.5427 | 0.6348
Dataset 2 | 0.4617 | 0.5398 | 0.6599
Dataset 3 | 0.4832 | 0.5461 | 0.6475
Dataset 4 | 0.4841 | 0.5294 | 0.6522

Figure 5: Sentiment distribution in four datasets: (a) Military parade on National Day; (b) The assault on a doctor; (c) Hong Kong's event; (d) Garbage sorting in Shanghai.


statement about the case itself: from the beginning of the event, the amount of discussion on the social network rose sharply and then gradually declined. Topic 2 is a discussion of the development of the case, which triggered a second wave of hot discussion. The two curves do not reach their peaks at the same time. The peak

Figure 6: The changing of topics in four datasets: (a) Military parade on National Day; (b) The assault on a doctor; (c) Hong Kong's event; (d) Garbage sorting in Shanghai.




value of the topic 2 curve is lower than that of topic 1, which reflects that discussion of the same event fades over time: even when a new topic emerges, its discussion is far lower than at the beginning of the event. Similar results are verified in the other three datasets.

The proportions of the sentiment polarities in the four datasets are shown in Figure 7. Since the sentiment polarity proportion is measured, the four sentiment polarities are evenly distributed before the events occur. After an event occurred, the positive and negative sentiment polarities began to diverge toward the two extremes. Among the four datasets, positive sentiment was higher than negative sentiment in the first dataset, "Military parade on National Day," and the fourth dataset, "Garbage sorting in Shanghai," which conforms to the social sentiment of these events. In addition, it can be found that the difference among the four sentiment labels is large in the initial stage, and the distribution of sentiment labels becomes stable in the later stage. This shows that the second report of a social event does not generate the heat of the first. However, the sentiment tendency in social networks does not decline sharply with the reduction of discussion, which can be seen in topic 2 of the second dataset, "The assault on a doctor." Given that the topic features come from the background of the corpus and contain many noise words, the four curves are relatively close to each other in terms of sentiment polarity evolution. Nevertheless, there is still a gap relative to the even distribution of sentiment polarity seen at the beginning of the event.

6. Discussion

From the perspective of theoretical significance, this paper extends the LDA model to some extent. First, in view of the sparse matrix caused by short texts, word pairs are introduced to replace single words for text generation, following BTM. Based on the hypotheses of JST and ASUM, the sentiment layer is introduced to form the Bayesian network structure, and each word pair is limited to the same sentiment polarity distribution. Second, in order to realize dynamic analysis and preserve text homogeneity, the timestamp and its corresponding hyperparameter are introduced to alleviate the word-order problem in text generation. Third, this research combines behavioral experiments, big data mining, mathematical modeling, and simulation to promote the expansion of research into new situations and new methods.

From the perspective of practical significance, this paper is of great value for tracking and monitoring topics of public opinion in social networks. The online public opinions on hot events can be monitored, which helps governments and departments accurately judge social events and make emergency decisions. In addition, this paper analyzes the evolution, response, and governance of public opinion, which is conducive to understanding the formation mechanism and collaborative evolution of public opinion. Meanwhile, the use of public opinion information can detect and screen information, prevent the spread of rumors, and support scientifically formulated utilization mechanisms to effectively reduce the risk of loss.

7. Conclusions

In the context of the mobile social network, the number of short texts is growing explosively. In order to extract information from massive short texts quickly and monitor public opinions, the TSTS model is proposed in this study based on LDA, BTM, JST, ASUM, and TOT. The experimental results show that the TSTS model achieves good performance. In topic feature extraction, the perplexity of TSTS is always lower than that of LDA; although its perplexity becomes slightly higher than BTM as iterations increase, it remains comparable to BTM. In sentiment analysis, the effect of TSTS is significantly better than JST and ASUM. Finally,

Figure 7: The changing of sentiment in four datasets: (a) Military parade on National Day, topic 1; (b) Military parade on National Day, topic 2; (c) The assault on a doctor, topic 1; (d) The assault on a doctor, topic 2; (e) Hong Kong's event, topic 1; (f) Hong Kong's event, topic 2; (g) Garbage sorting in Shanghai, topic 1; (h) Garbage sorting in Shanghai, topic 2.


the TSTS model, incorporating the time factor, can determine the change trend of topic and sentiment.

There are still some shortcomings in this paper. First, for the extraction of topic feature words, a global topic level could be added to the topic layer of the TSTS model to filter common topic words. Second, in sentiment polarity judgment, the sentiment labels are manually marked based on prior knowledge, but sentiments are extremely rich and changeable. In future research, Bayesian networks and entity theory can be used to judge sentiment bias.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


[20] P Lorenz-Spreen F Wolf J Braun et al ldquoTracking onlinetopics over time understanding dynamic hashtag commu-nitiesrdquo Computational Social Networks vol 5 no 1 pp 1ndash182018

[21] Y He C Lin W Gao et al ldquoDynamic joint sentiment-topicmodelrdquo ACM Transactions on Intelligent Systems amp Tech-nology vol 5 no 1 pp 1ndash21 2013

[22] L Kuo and T Y Yang ldquoAn improved collapsed Gibbssampler for Dirichlet process mixing modelsrdquo ComputationalStatistics amp Data Analysis vol 50 no 3 pp 659ndash674 2006

[23] Y Papanikolaou J R Foulds T N Rubin et al ldquoDensedistributions from sparse samples improved Gibbs samplingparameter estimators for LDArdquo Statistics vol 18 no 62pp 1ndash58 2015

[24] X Zhou J Ouyang and X Li ldquoTwo time-efficient Gibbssampling inference algorithms for biterm topic modelrdquo Ap-plied Intelligence vol 48 no 3 pp 730ndash754 2018

[25] P Bhuyan ldquoEstimation of random-effects model for longi-tudinal data with non ignorable missingness using Gibbssamplingrdquo Computational Statistics vol 34 no 4pp 1963ndash1710 2019

[26] C Lin Y He R Everson and S Ruger ldquoWeakly supervisedjoint sentiment-topic detection from textrdquo IEEE Transactionson Knowledge and Data Engineering vol 24 no 6pp 1134ndash1145 2012

12 Complexity

[27] A Daud and F Muhammad ldquoGroup topic modeling foracademic knowledge discoveryrdquo Applied Intelligence vol 36no 4 pp 870ndash886 2012

[28] Z Huang J Tang G Shan J Ni Y Chen and C Wang ldquoAnefficient passenger-hunting recommendation framework withmulti-task deep learningrdquo IEEE Internet of Dings Journalvol 6 no 5 pp 7713ndash7721 2019

[29] S MMohammad S Kiritchenko and X Zhu ldquoNRC-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquoin Proceedings of the Seventh International Workshop onSemantic Evaluation Exercises (SemEval-2013) SpringerAtlanta GA USA pp 1ndash5 June 2013

[30] T Chen Q Li J Yang G Cong and G Li ldquoModeling of thepublic opinion polarization process with the considerations ofindividual heterogeneity and dynamic conformityrdquo Mathe-matics vol 7 no 10 p 917 2019

[31] S A Curiskis B Drake T R Osborn and P J Kennedy ldquoAnevaluation of document clustering and topic modelling in twoonline social networks twitter and Redditrdquo InformationProcessing and Management vol 57 no 2 pp 1ndash21 2019

Complexity 13


parameters in the population distribution conform to the multinomial distribution, the conjugate prior distribution satisfies the following update:

Dir(θ | α) + Mult(δ) → Dir(θ | α + δ). (3)
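The conjugate update in equation (3) amounts to adding the observed multinomial counts to the prior parameters. A minimal numerical sketch (the prior and counts below are made up for illustration):

```python
import numpy as np

alpha = np.full(3, 2.0)            # symmetric Dirichlet prior Dir(theta | alpha)
delta = np.array([5.0, 0.0, 1.0])  # observed multinomial counts

# Equation (3): the posterior stays Dirichlet with parameters alpha + delta.
posterior = alpha + delta          # -> [7. 2. 3.]

# Posterior mean, E[Dir(eps)]_i = eps_i / sum(eps), the quantity used later
# for the parameter estimates in equation (12).
posterior_mean = posterior / posterior.sum()
```

This expectation-of-the-posterior step is exactly how the closed-form estimates at the end of the derivation are obtained.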

For the general text model, the discretized Dirichlet distribution and multinomial distribution are as follows:

Dir(b | β) = [Γ(∑_{j=1}^{T} β) / ∏_{j=1}^{T} Γ(β)] ∏_{j=1}^{T} b_j^{β−1}, (4)

Mult(n | b, N) = C(N, n) ∏_{j=1}^{T} b_j^{n_j}, (5)

where i, j, k, and h index the word pairs, topics, sentiments, and timestamps in the modeling process, respectively. Since p(b | l, z, β) follows the Dirichlet distribution, this paper introduces φ for p(b | l, z, β). It can be obtained by integrating out φ:

p(b | l, z, β) = ∫ p(b | l, z, φ) · p(φ | β) dφ = [Γ(Vβ) / Γ(β)^V]^{T·S} ∏_j ∏_k [∏_i Γ(n_{ijk} + β) / Γ(n_{jk} + Vβ)]. (6)

To estimate the posterior parameter φ in the formula, we can combine the Bayes formula with the conjugate property of the Dirichlet–multinomial pair. The distribution of the posterior parameter can be obtained as follows:

p(φ | l, z, β) ∝ Dir(φ | n_{ijk} + β). (7)

Given that the expectation of the Dirichlet distribution is E[Dir(ε)]_i = ε_i / ∑_i ε_i, the parameters are estimated by the expectation of the known posterior parameter distribution; the estimated results are shown in equation (12). Similarly, for p(t | l, z, μ), ψ is introduced. By integrating out ψ, it can be obtained as follows:

p(t | l, z, μ) = [Γ(Hμ) / Γ(μ)^H]^{T·S} ∏_j ∏_k [∏_h Γ(n_{jkh} + μ) / Γ(n_{jk} + Hμ)]. (8)

For p(l | z, γ), π is introduced. By integrating out π, it can be obtained as follows:

p(l | z, γ) = [Γ(∑_k γ_k) / ∏_k Γ(γ_k)]^T ∏_j [∏_k Γ(n_{jk} + γ_k) / Γ(n_j + ∑_k γ_k)]. (9)

For p(z | α), θ is introduced. By integrating out θ, it can be obtained as follows:

p(z | α) = [Γ(∑_j α_j) / ∏_j Γ(α_j)]^D ∏_d [∏_j Γ(n_{dj} + α_j) / Γ(n_d + ∑_j α_j)]. (10)

The TSTS model can estimate the posterior distribution after the estimated values z and s have been obtained by sampling calculations. Then, the calculated equations (2)–(6) are brought into equation (1). Combining with the nature of the Gamma function, the conditional distribution probability in Gibbs sampling can be obtained:

p(s_p = k, z_p = j | b, t, l^{−p}, z^{−p}, α, β, γ, μ) ∝ [(n^{−p}_{dj} + α_j) / (n^{−p}_d + ∑_j α_j)] · [(n^{−p}_{w_p,jk} + β) / (n^{−p}_{jk} + Vβ)] · [(n^{−p}_{jk} + γ_k) / (n^{−p}_j + ∑_k γ_k)] · [(n^{−p}_{jk,t_p} + μ) / (n^{−p}_{jk} + Hμ)], (11)

where the superscript −p indicates that the current word pair p is excluded from the counts.
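Equation (11) translates directly into one sampling step: multiply the four smoothed count ratios over all (topic, sentiment) pairs, normalise, and draw. The sketch below uses random stand-in count arrays; the array names, sizes, and values are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, S, V, H = 4, 2, 50, 10        # topics, sentiment labels, vocabulary size, timestamps
alpha, beta, gamma, mu = 50.0 / T, 0.01, 0.1, 0.01

# Count statistics with the current word pair p excluded (the "-p" counts).
n_dj  = rng.integers(0, 5, size=T).astype(float)        # pairs in document d per topic
n_wjk = rng.integers(0, 3, size=(T, S)).astype(float)   # this word under (topic, sentiment)
n_jk  = rng.integers(5, 20, size=(T, S)).astype(float)  # all pairs under (topic, sentiment)
n_j   = n_jk.sum(axis=1)                                # all pairs under each topic
n_jkt = rng.integers(0, 4, size=(T, S)).astype(float)   # pairs under (j, k) at timestamp t_p

# Equation (11): product of the four smoothed count ratios, up to a constant.
cond = ((n_dj[:, None] + alpha) / (n_dj.sum() + T * alpha)
        * (n_wjk + beta) / (n_jk + V * beta)
        * (n_jk + gamma) / (n_j[:, None] + S * gamma)
        * (n_jkt + mu) / (n_jk + H * mu))

cond /= cond.sum()                                      # normalise over (j, k) pairs
j, k = np.unravel_index(rng.choice(T * S, p=cond.ravel()), (T, S))
```

In a full sampler this draw replaces the old (z_p, s_p) assignment, the counts are incremented again, and the loop moves to the next word pair.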

In order to simplify equation (6), the hyperparameter μ = 1/n_d is introduced. When the hyperparameters α, β, μ, and γ are given, the set B of word pairs, the corresponding topic z, and the sentiment label l can be used to infer the parameters φ, θ, π, and ψ based on Bayes' rule and the Dirichlet conjugate properties:

φ_{jki} = (n_{ijk} + β) / (n_{jk} + Vβ),

θ_{dj} = (n_{dj} + α_j) / (n_d + ∑_j α_j),

π_{jk} = (n_{jk} + γ_k) / (n_j + ∑_k γ_k),

ψ_{jkh} = (n_{jkh} + μ) / (n_{jk} + Hμ). (12)
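The closed-form estimates in equation (12) are just smoothed, normalised counts. A sketch with toy count arrays (the shapes and values are assumptions for illustration, showing φ and π; θ and ψ follow the same pattern):

```python
import numpy as np

T, S, V = 3, 2, 6                 # topics, sentiment labels, vocabulary size
beta, gamma = 0.01, 0.1

# Toy sufficient statistics, as if taken from a finished Gibbs run.
n_ijk = np.random.default_rng(1).integers(0, 9, size=(V, T, S)).astype(float)
n_jk = n_ijk.sum(axis=0)          # word pairs assigned to (topic j, sentiment k)
n_j = n_jk.sum(axis=1)            # word pairs assigned to topic j

# Equation (12): phi_{jki} and pi_{jk} as smoothed count ratios.
phi = (n_ijk + beta) / (n_jk + V * beta)             # word distribution per (j, k)
pi = (n_jk + gamma) / (n_j[:, None] + S * gamma)     # sentiment distribution per topic
```

Because the smoothing terms in each denominator equal the sum of the smoothed numerators, every estimated distribution sums to one by construction.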

4. Experiment Analysis

4.1. Data Collection. In order to verify the TSTS model proposed in this paper, four hot events were randomly selected from the trending searches of Sina Weibo in 2019, and the comments on these four events are regarded as the experimental datasets. The four datasets selected are "Military parade on National Day," "The assault on a doctor," "Hong Kong's event," and "Garbage sorting in Shanghai." The comments were extracted from the Sina social network platform. The original datasets contain some meaningless words in the microblog text, such as stop words, interjections of tone, punctuation marks, and numeric expressions. Before text modeling, a word segmentation package in Python is used to process the initial experimental dataset. In addition, considering that comments on social networks are relatively new, fashionable expressions from the social network were collected and added to a customized dictionary, so that these emerging words can be identified as far as possible and replaced with normal expressions. There are also some useless tokens in the text, such as URL links and numbers, which can be filtered by regular expressions. Finally, a total of 14,288 experimental records across the four events were obtained. The description of the four datasets is shown in Table 2.
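The cleaning steps described above (URL and number removal via regular expressions, slang normalisation, stop-word filtering) might look roughly like the following sketch. The stop-word list and slang map are invented placeholders, and a real pipeline would use a Chinese segmenter (e.g., the Python segmentation package mentioned in the text) instead of whitespace splitting:

```python
import re

STOPWORDS = {"the", "a", "of"}   # placeholder stop-word list
SLANG = {"lol": "laugh"}         # placeholder slang -> normal expression map

def preprocess(text):
    """Clean one microblog comment and return its remaining tokens."""
    text = re.sub(r"https?://\S+", " ", text)   # strip URL links
    text = re.sub(r"\d+", " ", text)            # strip numbers
    text = re.sub(r"[^\w\s]", " ", text)        # strip punctuation marks
    # A real pipeline would segment Chinese text here; whitespace
    # splitting stands in for that step.
    tokens = [SLANG.get(w, w) for w in text.lower().split()]
    return [w for w in tokens if w not in STOPWORDS]
```

For example, `preprocess("The parade lol https://t.cn/xyz 2019!")` keeps only the content tokens, with the slang term replaced by its normal expression.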

4.2. Sentiment Dictionary. The words or phrases in the sentiment dictionary have obvious sentiment tendency and can be divided into positive and negative words. The sentiment dictionary in this paper has two major roles. On the one hand, it identifies sentiment polarity words and distinguishes topic features from sentiment words. On the other hand, it combines sentiment prior information to make the model more accurate in judging the sentiment polarity of the text. Given that sentiment polarity words can reflect users' sentiment tendency, it is of great significance to analyze the sentiment orientation of the text.

At present, there are two major Chinese sentiment dictionaries, NTU and HowNet. The former contains 2,812 positive words and 8,276 negative words; the latter contains about 5,000 positive words and 5,000 negative words. Based on HowNet and the classification of sentiment polarity [29, 30], this paper constructs the sentiment dictionary for the TSTS model evaluation experiment, as shown in Table 3.
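Used this way, the dictionary can supply a document-level polarity prior by counting matches. The word sets below are tiny English stand-ins for the HowNet-based dictionary, not its real contents:

```python
POSITIVE = {"salute", "proud", "happy"}   # stand-ins for HowNet positive entries
NEGATIVE = {"angry", "sad", "shame"}      # stand-ins for HowNet negative entries

def document_polarity(words):
    """Majority vote of dictionary hits; ties fall back to neutral."""
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

In the TSTS model this lookup is used as prior information rather than as the final classifier: the dictionary label biases the sentiment assignment of a word pair during sampling.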

4.3. Parameter Setting. In this paper, the Gibbs algorithm is used to sample the TSTS model and estimate the four posterior parameters. Following the parameter settings of traditional topic models, the hyperparameters are set as follows. First, α is set as 50/K, where K is the number of topics extracted. Second, β is set as 0.01. Third, γ is set as (0.05 × AVE)/S, where AVE stands for the average length of articles (the average number of words per microblog in this experiment) and S stands for the total number of polarity labels. Finally, μ is set as 1/n_d.
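These settings can be collected in one small helper. With K = 20 topics, S = 4 polarity labels, and the average pretreated length of 98 words from Table 2, they give α = 2.5 and γ = 1.225 (the function name is ours, not the paper's):

```python
def tsts_hyperparameters(K, S, ave_len, n_d):
    """Hyperparameter settings from Section 4.3 of the paper."""
    alpha = 50.0 / K              # topic prior, scaled by the number of topics
    beta = 0.01                   # word prior
    gamma = 0.05 * ave_len / S    # sentiment prior from average microblog length
    mu = 1.0 / n_d                # timestamp prior from document length
    return alpha, beta, gamma, mu

alpha, beta, gamma, mu = tsts_hyperparameters(K=20, S=4, ave_len=98, n_d=98)
```

Tying γ to the average text length keeps the sentiment smoothing proportional to how much evidence a typical short document provides.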

4.4. Evaluation Indicator. For the extraction of topic features, perplexity is used as an evaluation indicator to measure the predictive power of the model on unknown data, where a lower perplexity means better performance. The calculation formula of the perplexity is as follows:

perplexity(P(D̃^t | M)) = exp{−[∑_{d=1}^{D_t} log P(b̃^t_d | M)] / [∑_{d=1}^{D_t} Ñ^t_d]}, (13)

where D̃^t = {b̃^t_d}_{d=1}^{D_t} represents an unknown dataset with timestamp t and

P(b̃^t_d | M) = ∏_{n=1}^{Ñ^t_d} ∑_{l=1}^{L} ∑_{z=1}^{T} P(b̃_{dn} | l, z) · P(z | l) · P(l), (14)

where b̃^t_d represents the vector set of word pairs in text d, Ñ^t_d represents the number of word pairs in b̃^t_d, and P(b̃^t_d | M) represents the likelihood of the training corpus, whose formula is as follows:

P(b̃^t_d | M) = ∏_{i=1}^{V} [∑_{l=1}^{L} ∑_{z=1}^{T} φ_{lzi} · θ_{dlz} · π_{dl}]^{Ñ^t_{di}}. (15)

For sentiment segmentation, document-level sentiment judgment is used as the evaluation index, based on the sentiment polarity labels in the sentiment dictionary. For the documents in this experiment, the positive and negative sentiment of a document can be judged. This paper adopts the consistency test method to mark the sentiment labels [31].
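Once the per-document log-likelihoods log P(b̃^t_d | M) are available, equation (13) reduces to the exponential of the negative average log-likelihood per word pair. A minimal sketch under that assumption:

```python
import math

def perplexity(log_likelihoods, n_word_pairs):
    """Equation (13): exp of the negative average log-likelihood per word pair."""
    return math.exp(-sum(log_likelihoods) / sum(n_word_pairs))

# One document of 4 word pairs, each predicted with probability 1/4: the
# perplexity is about 4, i.e. the model is as uncertain as a 4-way choice.
ppl = perplexity([4 * math.log(0.25)], [4])
```

This is why lower perplexity means better fit: a model that assigns higher probability to held-out word pairs behaves like a choice among fewer alternatives.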

5. Results

5.1. Extraction of Topics. The primary task of the TSTS model is to extract topic features. As an extension of the topic-sentiment mixed model, the assessment is to judge whether the extracted topic features are reasonable and accurate. Before extracting topic features from text modeling, it is necessary to determine the number of topics to be extracted and the number of iterations of Gibbs sampling. For the effective evaluation of topic discovery, the degree of perplexity is used as the measurement index in this paper: the lower the perplexity, the better the fitting effect of the model. Taking dataset 1 as an example, the simulation results are shown in Figure 3.

Based on the experimental results shown in Figure 3, the number of topics was set to 20 in the subsequent experiments. In addition, we can calculate the perplexity of the three models as the number of iterations changes. By comparing the experimental results of TSTS and LDA, it can be found that TSTS always outperforms LDA, and its perplexity decreases as the iterations increase. This indicates that the topic discovery ability of TSTS gradually improves, mainly because the TSTS model incorporates word pairs to alleviate LDA's sparse-matrix problem on short texts. By comparing the experimental results of TSTS and BTM, it can be found that TSTS was better than BTM as the number of iterations increased. However, as the number

Table 2: Experiment datasets.

Dataset (number of comments) | Words per microblog (initial) | Words per microblog (pretreated) | Vocabulary size (initial) | Vocabulary size (pretreated)
Dataset 1 (3562) | 134 | 102 | 9789 | 6319
Dataset 2 (3527) | 127 | 94 | 9736 | 6242
Dataset 3 (3617) | 131 | 100 | 9780 | 6301
Dataset 4 (3582) | 128 | 96 | 9742 | 6254
Average | 130 | 98 | 9762 | 6279

Table 3: Classification of sentiment words.

Sentiment label | Happy | Surprise | Sad | Angry
Vocabulary size | 2467 | 276 | 3025 | 1897


of iterations increased, the gap between the two models became smaller. This is because the word pairs of BTM are drawn from the whole corpus: when the number of iterations is small, the proportion of noise words is relatively large, resulting in poor quality of topic words. In addition, the sentiment layer is integrated into TSTS, and the error generated in the sentiment estimation affects the next iteration. Although TSTS is worse than BTM when there are more iterations, the effect of TSTS can still stay balanced with BTM. Therefore, for the extraction of topic features, the number of topics and iterations can be set as 20 and 600.

5.2. Sentiment Polarity. The information related to sentiment polarity is provided in accordance with the topic and the sentiment polarity of words. The sentiment distribution of topics extracted by the TSTS model is shown in Figure 4. In addition, JST and ASUM are introduced as comparisons to measure the sentiment recognition of the TSTS model. Each document has a binary sentiment label, such as positive or negative. Taking dataset 2, "The assault on a doctor," as an example, the result is shown in Figure 4. The number of topics is set to 5 at the beginning of the experiment. With the refinement of granularity, the performance of the TSTS model increases. Compared with JST and ASUM, the curve of the TSTS model changes greatly because it considers the topic and sentiment relationship among the word pairs of a document. The curve of the JST model shows a steady upward trend, and the identification efficiency of ASUM is low. That is because ASUM has strict assumptions, and an increase in the number of topics causes the decentralization of topics and sentiments, which has a great negative impact on the overall performance of the model. The overall effect of the TSTS model was slightly better than JST and ASUM, but the effect decreased slightly after the number of topics increased to 20. This is because the data collected in the dataset are limited, and the fixed number of topics discretizes the word distribution, which affects the judgment of sentiment polarity. The sentiment label classification of documents was compared under different numbers of topics, and the result of the TSTS model is better than JST and ASUM.

With the increase of topics, the recognition performance of the topic model fluctuates, but the TSTS model was always better than JST and ASUM. When the number of topics and iterations is set as 20 and 600, TSTS is the best model in topic detection. When the number of topics in the four datasets was set as 20, the accuracy of sentiment polarity judgment is shown in Table 4.

From Table 4, the TSTS model is better than JST and ASUM in judging the sentiment polarity of documents. This

Figure 3: The relationship of the perplexity with the number of (a) topics and (b) iterations.

Figure 4: Accuracy of sentiment polarity judgment.

is because sentiment polarity depends on the performance of topic discovery in the previous stage. In this experiment, the effects of JST and ASUM were exactly opposite. The difference was caused by the length of the original documents, which also indirectly verifies the effectiveness of the TSTS model.

From Figure 5, it can be seen that the proportion of positive sentiment is significantly higher than the other sentiments in the dataset "Military parade on National Day" and the dataset "Garbage sorting in Shanghai," which is consistent with the sentiment tendency of users in the social network. For the second dataset, "The assault on a doctor," the two kinds of negative sentiment polarity in topic 1 and topic 2 were compared. Topic 1 leans toward sad sentiment, while topic 2 leans toward angry sentiment: topic 1 reflects the statement of the event, and topic 2 represents the follow-up discussion of the event.

5.3. Topic and Sentiment Evolution. The curves of topic features extracted from the four datasets through the TSTS model are shown in Figure 6. Taking dataset 2 as an example, the topic curves conform to the evolution law of social and abrupt events. The two curves represent the trend of feature words over time for topic 1 and topic 2. Topic 1 is the

Table 4: Accuracy of sentiment polarity judgment.

Dataset | ASUM | JST | TSTS
Dataset 1 | 0.4763 | 0.5427 | 0.6348
Dataset 2 | 0.4617 | 0.5398 | 0.6599
Dataset 3 | 0.4832 | 0.5461 | 0.6475
Dataset 4 | 0.4841 | 0.5294 | 0.6522

Figure 5: Sentiment distribution in four datasets: (a) Military parade on National Day; (b) The assault on a doctor; (c) Hong Kong's event; (d) Garbage sorting in Shanghai.

statement about the case itself: from the beginning of the event, the amount of discussion about the event on the social network rose sharply and then gradually declined. Topic 2 is a discussion of the development of the case, which triggered a second wave of hot discussion. The times at which the two curves reach their peaks are not consistent. The peak

Figure 6: The changing of topics in four datasets: (a) Military parade on National Day; (b) The assault on a doctor; (c) Hong Kong's event; (d) Garbage sorting in Shanghai.


value of the topic 2 curve is lower than that of topic 1, which reflects that discussion of the same event fades over time: even when a new topic arises, its discussion volume is far lower than at the beginning of the event. Similar results can be observed in the other three datasets.

The proportions of the sentiment polarities in the four datasets are shown in Figure 7. Since the sentiment polarity proportion is measured, the four sentiment polarities are distributed evenly before the occurrence of the events. After an event occurred, the positive and negative sentiment polarities began to move toward the two extremes. Among the four datasets, positive sentiment was higher than negative sentiment in the first dataset, "Military parade on National Day," and the fourth dataset, "Garbage sorting in Shanghai," which also conforms to the social sentiment of these events. In addition, it can be found that the difference among the four sentiment labels is large in the initial stage, and the distribution of sentiment labels becomes stable in the later stage. This proves that the second report of a social event does not reach the heat of the first. However, the sentiment tendency in social networks does not decline sharply with the reduction of discussion, which can be seen in topic 2 of the second dataset, "The assault on a doctor." Given that the topic features come from the background of the corpus and contain many noise words, the four curves are relatively close to each other in terms of sentiment polarity evolution, although there is still a gap among them, different from the even distribution of sentiment polarity at the beginning of the event.

6. Discussion

From the perspective of theoretical significance, this paper extends the LDA model to some extent. First, in view of the sparse matrix caused by short texts, word pairs are introduced to replace single words for text generation, following BTM. Based on the hypotheses of JST and ASUM, the sentiment layer is introduced to form the Bayesian network structure, and each word pair is limited to the same sentiment polarity distribution. Second, in order to realize dynamic analysis and text homogeneity, the timestamp and the corresponding hyperparameter are introduced to alleviate the problem of word order in text generation. Third, this research combines behavioral experiments, big data mining, mathematical modeling, and simulation to promote the expansion of research into new situations and new methods.

From the perspective of practical significance, this paper is of great value in tracking and monitoring topics of public opinion in social networks. The online public opinions of hot events can be monitored, which helps governments and departments accurately judge social events and make emergency decisions. In addition, this paper analyzed the evolution, response, and governance of public opinion, which is conducive to understanding the formation mechanism and the collaborative evolution of public opinion. Meanwhile, the use of public opinion information makes it possible to detect and screen information, prevent the spread of rumors, and scientifically formulate utilization mechanisms to effectively reduce the risk of loss.

7. Conclusions

In the context of the mobile social network, the number of short texts is growing explosively. In order to extract information from massive short texts quickly and monitor public opinions, the TSTS model is proposed in this study based on LDA, BTM, JST, ASUM, and TOT. From the experimental results, the TSTS model achieves good performance. In terms of topic feature extraction, the perplexity of TSTS is always lower than that of LDA. Moreover, although its perplexity is slightly higher than BTM as the number of iterations increases, it can maintain balance with BTM. In sentiment analysis, the effect of TSTS was significantly better than JST and ASUM. Finally,

Figure 7: The changing of sentiment in four datasets: (a) Military parade on National Day, topic 1; (b) Military parade on National Day, topic 2; (c) The assault on a doctor, topic 1; (d) The assault on a doctor, topic 2; (e) Hong Kong's event, topic 1; (f) Hong Kong's event, topic 2; (g) Garbage sorting in Shanghai, topic 1; (h) Garbage sorting in Shanghai, topic 2.

the TSTS model, incorporating the time factor, can determine the change trend of topics and sentiments.

There are still some shortcomings in this paper. Firstly, for the extraction of topic feature words, a global topic level can be added to the topic layer of the TSTS model to filter the common topic words. Secondly, in sentiment polarity judgment, the sentiment labels are manually marked based on prior knowledge; however, sentiments are extremely rich and changeable. In future research, Bayesian networks and entity theory can be used to judge sentiment bias.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the Project of the National Science and Technology Department (2018YFF0213102), Public Projects of Zhejiang Province (LGF18G010003, LGF19G010002, and LGF20G010002), the Science and Technology Project of Zhejiang Province (2020C01158), and the First Class Discipline of Zhejiang - A (Zhejiang Gongshang University - Statistics).

References

[1] CNNIC, "Statistical report on internet development in China," CNNIC, Beijing, China, 2019, http://www.cac.gov.cn/2019-08/30/c_1124938750.htm.

[2] M. Ingawale, A. Dutta, R. Roy, and P. Seetharaman, "Network analysis of user generated content quality in Wikipedia," Online Information Review, vol. 37, no. 4, pp. 602–619, 2013.

[3] Q. Gao, Y. Tian, and M. Tu, "Exploring factors influencing Chinese user's perceived credibility of health and safety information on Weibo," Computers in Human Behavior, vol. 45, pp. 21–31, 2015.

[4] M. Steyvers and T. Griffiths, "Probabilistic topic models," in Handbook of Latent Semantic Analysis, vol. 427, no. 7, pp. 424–440, Psychology Press, New York, NY, USA, 2007.

[5] J.-F. Yeh, Y.-S. Tan, and C.-H. Lee, "Topic detection and tracking for conversational content by using conceptual dynamic latent Dirichlet allocation," Neurocomputing, vol. 216, pp. 310–318, 2016.

[6] X. Zhou and L. Chen, "Event detection over twitter social media streams," The VLDB Journal, vol. 23, no. 3, pp. 381–400, 2014.

[7] Y. J. Du, Y. T. Yi, and X. Y. Li, "Extracting and tracking hot topics of micro-blogs based on improved latent Dirichlet allocation," Engineering Applications of Artificial Intelligence, vol. 87, pp. 1–13, 2020.

[8] H. J. Kim, Y. K. Jeong, Y. Kim et al., "Topic-based content and sentiment analysis of Ebola virus on twitter and in the news," Journal of Information Science, vol. 42, no. 6, pp. 763–781, 2016.

[9] B. Subeno, R. Kusumaningrum, and F. Farikhin, "Optimisation towards latent Dirichlet allocation: its topic number and collapsed Gibbs sampling inference process," International Journal of Electrical and Computer Engineering (IJECE), vol. 8, no. 5, pp. 3204–3213, 2018.

[10] H. Park, T. Park, and Y.-S. Lee, "Partially collapsed Gibbs sampling for latent Dirichlet allocation," Expert Systems with Applications, vol. 131, pp. 208–218, 2019.

[11] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.

[12] X. Cheng, X. Yan, Y. Lan, and J. Guo, "BTM: topic modeling over short texts," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 12, pp. 2928–2941, 2014.

[13] J. Rashid, S. M. A. Shah, and A. Irtaza, "Fuzzy topic modeling approach for text mining over short text," Information Processing & Management, vol. 56, no. 6, pp. 1–19, 2019.

[14] H.-Y. Lu, N. Kang, Y. Li, Q.-Y. Zhan, J.-Y. Xie, and C.-J. Wang, "Utilizing recurrent neural network for topic discovery in short text scenarios," Intelligent Data Analysis, vol. 23, no. 2, pp. 259–277, 2019.

[15] L. Zhu, H. Xu, Y. Xu et al., "A joint model of extended LDA and IBTM over streaming Chinese short texts," Intelligent Data Analysis, vol. 23, no. 3, pp. 681–699, 2019.

[16] M. Tang, J. Jin, Y. Liu et al., "Integrating topic, sentiment, and syntax for modeling online product reviews: a topic model approach," Journal of Computing and Information Science in Engineering, vol. 19, no. 1, pp. 1–12, 2019.

[17] R. K. Amplayo, S. Lee, and M. Song, "Incorporating product description to sentiment topic models for improved aspect-based sentiment analysis," Information Sciences, vol. 454-455, pp. 200–215, 2018.

[18] L. Yao, Y. Zhang, B. Wei et al., "Concept over time: the combination of probabilistic topic model with Wikipedia knowledge," Expert Systems with Applications, vol. 60, pp. 27–38, 2016.

[19] S. Park, W. Lee, and I.-C. Moon, "Associative topic models with numerical time series," Information Processing & Management, vol. 51, no. 5, pp. 737–755, 2015.

[20] P. Lorenz-Spreen, F. Wolf, J. Braun et al., "Tracking online topics over time: understanding dynamic hashtag communities," Computational Social Networks, vol. 5, no. 1, pp. 1–18, 2018.

[21] Y. He, C. Lin, W. Gao et al., "Dynamic joint sentiment-topic model," ACM Transactions on Intelligent Systems & Technology, vol. 5, no. 1, pp. 1–21, 2013.

[22] L. Kuo and T. Y. Yang, "An improved collapsed Gibbs sampler for Dirichlet process mixing models," Computational Statistics & Data Analysis, vol. 50, no. 3, pp. 659–674, 2006.

[23] Y. Papanikolaou, J. R. Foulds, T. N. Rubin et al., "Dense distributions from sparse samples: improved Gibbs sampling parameter estimators for LDA," Statistics, vol. 18, no. 62, pp. 1–58, 2015.

[24] X. Zhou, J. Ouyang, and X. Li, "Two time-efficient Gibbs sampling inference algorithms for biterm topic model," Applied Intelligence, vol. 48, no. 3, pp. 730–754, 2018.

[25] P. Bhuyan, "Estimation of random-effects model for longitudinal data with nonignorable missingness using Gibbs sampling," Computational Statistics, vol. 34, no. 4, pp. 1963–1710, 2019.

[26] C. Lin, Y. He, R. Everson, and S. Ruger, "Weakly supervised joint sentiment-topic detection from text," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 6, pp. 1134–1145, 2012.


[27] A. Daud and F. Muhammad, "Group topic modeling for academic knowledge discovery," Applied Intelligence, vol. 36, no. 4, pp. 870–886, 2012.

[28] Z. Huang, J. Tang, G. Shan, J. Ni, Y. Chen, and C. Wang, "An efficient passenger-hunting recommendation framework with multi-task deep learning," IEEE Internet of Things Journal, vol. 6, no. 5, pp. 7713–7721, 2019.

[29] S. M. Mohammad, S. Kiritchenko, and X. Zhu, "NRC-Canada: building the state-of-the-art in sentiment analysis of tweets," in Proceedings of the Seventh International Workshop on Semantic Evaluation Exercises (SemEval-2013), pp. 1–5, Atlanta, GA, USA, June 2013.

[30] T. Chen, Q. Li, J. Yang, G. Cong, and G. Li, "Modeling of the public opinion polarization process with the considerations of individual heterogeneity and dynamic conformity," Mathematics, vol. 7, no. 10, p. 917, 2019.

[31] S. A. Curiskis, B. Drake, T. R. Osborn, and P. J. Kennedy, "An evaluation of document clustering and topic modelling in two online social networks: twitter and Reddit," Information Processing and Management, vol. 57, no. 2, pp. 1–21, 2019.

Complexity 13

Page 6: Research on Sentiment Tendency and Evolution of Public ...downloads.hindawi.com/journals/complexity/2020/9789431.pdf · , and the correlation distribution

which can be divided into positive and negative words. The sentiment dictionary in this paper has two major roles. On the one hand, it helps identify sentiment polarity words and distinguish topic features from sentiment words. On the other hand, it supplies sentiment prior information that makes the model more accurate in judging the sentiment polarity of the text. Given that sentiment polarity words reflect users' sentiment tendency, it is of great significance to analyze the sentiment orientation of the text.

At present, there are two major Chinese sentiment dictionaries: NTU and HowNet. The former contains 2812 positive words and 8276 negative words; the latter contains about 5000 positive words and 5000 negative words. Based on HowNet and the classification of sentiment polarity [29, 30], this paper constructs the sentiment dictionary used in the TSTS model evaluation experiment, as shown in Table 3.
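As a minimal, hypothetical sketch of how such a dictionary separates sentiment polarity words from topic-feature words (the word sets below are illustrative placeholders, not the actual HowNet or NTU entries):

```python
# Illustrative polarity lookup; the word sets are placeholders,
# not the actual HowNet/NTU dictionary entries.
POSITIVE = {"excellent", "happy", "proud"}
NEGATIVE = {"angry", "sad", "terrible"}

def split_tokens(tokens):
    """Separate topic-feature words from sentiment words with polarity."""
    features, sentiments = [], []
    for tok in tokens:
        if tok in POSITIVE:
            sentiments.append((tok, "positive"))
        elif tok in NEGATIVE:
            sentiments.append((tok, "negative"))
        else:
            features.append(tok)
    return features, sentiments

features, sentiments = split_tokens(["doctor", "attack", "angry", "sad"])
```

Words not found in the dictionary are passed on as candidate topic features, which mirrors the dictionary's first role described above.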

4.3. Parameter Setting. In this paper, the Gibbs sampling algorithm is used to infer the TSTS model and estimate the four posterior parameters. Following the parameter settings of the traditional topic model, the hyperparameters are set as follows. First, the hyperparameter α is set to 50/K, where K is the number of topics to be extracted. Second, β is set to 0.01. Third, γ is set to (0.05 × AVE)/S, where AVE stands for the average length of articles (that is, the average number of words per microblog in this experiment) and S stands for the total number of polarity labels. Finally, μ is set to 1/n_d.
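These settings can be sketched in a few lines (K, AVE, and S follow the text; treating n_d as the number of word pairs in document d is an assumption, since the source does not spell it out):

```python
def tsts_hyperparams(K, AVE, S, n_d):
    """Hyperparameter settings described in Section 4.3 (a sketch)."""
    alpha = 50.0 / K           # topic prior: 50 / number of topics
    beta = 0.01                # word prior
    gamma = (0.05 * AVE) / S   # sentiment prior scaled by average length
    mu = 1.0 / n_d             # time prior (assumption: n_d = word pairs in d)
    return alpha, beta, gamma, mu

# With K = 20 topics, AVE = 98 words (Table 2), and S = 4 polarity labels:
alpha, beta, gamma, mu = tsts_hyperparams(K=20, AVE=98, S=4, n_d=49)
```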

4.4. Evaluation Indicator. For the extraction of topic features, perplexity is used as an evaluation indicator to measure the model's predictive power on unknown data; the lower the perplexity, the better the model. The perplexity is calculated as follows:

$$\text{perplexity}\left(P\left(\tilde{D}^{t}\mid M\right)\right)=\exp\left\{-\frac{\sum_{d=1}^{D^{t}}\log P\left(\tilde{b}_{d}^{t}\mid M\right)}{\sum_{d=1}^{D^{t}}\tilde{N}_{d}^{t}}\right\} \quad (13)$$

where $\tilde{D}^{t}=\{\tilde{b}_{d}^{t}\}_{d=1}^{D^{t}}$ represents an unknown dataset with the timestamp $t$, and

$$P\left(\tilde{b}_{d}^{t}\mid M\right)=\prod_{n=1}^{\tilde{N}_{d}^{t}}\prod_{l=1}^{L}\prod_{z=1}^{T}P\left(\tilde{b}_{dn}\mid l,z\right)P(z\mid l)P(l) \quad (14)$$

where $\tilde{b}_{d}^{t}$ represents the set of word-pair vectors in text $d$, $\tilde{N}_{d}^{t}$ represents the number of word pairs in $\tilde{b}_{d}^{t}$, and $P(\tilde{b}_{d}^{t}\mid M)$ represents the likelihood of the held-out corpus, computed as

$$P\left(\tilde{b}_{d}^{t}\mid M\right)=\prod_{i=1}^{V}\left(\sum_{l=1}^{L}\sum_{z=1}^{T}\varphi_{lzi}\cdot\theta_{dlz}\cdot\pi_{dl}\right)^{\tilde{N}_{di}^{t}} \quad (15)$$
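Once the per-document log-likelihoods are available, equation (13) reduces to a simple ratio; a minimal sketch (assuming the model has already produced log P(b̃ᵗ_d | M) for each held-out document):

```python
import math

def perplexity(log_likelihoods, n_word_pairs):
    """Equation (13): exp of the negative total log-likelihood divided
    by the total number of word pairs in the held-out documents."""
    return math.exp(-sum(log_likelihoods) / sum(n_word_pairs))

# Two toy held-out documents; lower perplexity means a better fit.
ppl = perplexity([-120.0, -80.0], [30, 20])
```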

For sentiment classification, the sentiment judgment at the document level is used as the evaluation index, based on the sentiment polarity labels in the sentiment dictionary. For the documents in this experiment, the positive and negative sentiment of a document can be judged. This paper adopts the consistency test method to mark the sentiment labels [31].
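A common consistency test for manually assigned labels is an inter-annotator agreement statistic such as Cohen's kappa; the source does not name the exact statistic used, so the following is an illustrative sketch for two annotators:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators beyond chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

kappa = cohens_kappa(["pos", "pos", "neg", "neg"],
                     ["pos", "neg", "neg", "neg"])
```

Labels on which the annotators agree well beyond chance (kappa close to 1) are then treated as reliable gold labels for the accuracy figures reported below.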

5. Results

5.1. Extraction of Topics. The primary task of the TSTS model is to extract topic features. As an extension of the topic-sentiment mixture model, the assessment is to judge whether the extracted topic features are reasonable and accurate. Before extracting topic features through text modeling, it is necessary to determine the number of topics to be extracted and the number of iterations of Gibbs sampling. For an effective evaluation of topic discovery, the degree of perplexity is used as the measurement index in this paper: the lower the perplexity, the better the fitting effect of the model. Taking dataset 1 as an example, the simulation results are shown in Figure 3.

Based on the experimental results shown in Figure 3, the number of topics was set to 20 in the subsequent experiments. In addition, we can calculate the perplexity of the three models as the number of iterations changes. Comparing the experimental results of TSTS and LDA, the effect of TSTS is always better than LDA, and the degree of perplexity decreases as the iterations increase. This indicates that the topic discovery ability of TSTS gradually improves, mainly because the TSTS model incorporates word pairs to alleviate the sparse matrix of LDA for short texts. Comparing the experimental results of TSTS and BTM, TSTS was better than BTM when the number of iterations was small. However, as the number of iterations increased, the gap between the two models became smaller. This is because the word pairs of BTM are drawn from the whole corpus: when the number of iterations is small, the proportion of noise words is relatively large, resulting in poor quality of topic words. In addition, the sentiment layer is integrated into TSTS, and the error generated in the sentiment estimation affects the next iteration. Although TSTS is worse than BTM when there are more iterations, the effect of TSTS can still stay balanced with BTM. Therefore, for the extraction of topic features, the number of topics and iterations can be set to 20 and 600.

Table 2: Experiment datasets.

Dataset (number of comments)   Words per microblog         Vocabulary size
                               Initial    Pretreatment     Initial    Pretreatment
Dataset 1 (3562)               134        102              9789       6319
Dataset 2 (3527)               127        94               9736       6242
Dataset 3 (3617)               131        100              9780       6301
Dataset 4 (3582)               128        96               9742       6254
Average                        130        98               9762       6279

Table 3: Classification of sentiment words.

Sentiment labels   Happy   Surprise   Sad    Angry
Vocabulary size    2467    276        3025   1897

6 Complexity

5.2. Sentiment Polarity. The information related to sentiment polarity is provided in accordance with the topic and sentiment polarity of words. The sentiment distribution of topics extracted from the TSTS model is shown in Figure 4. In addition, JST and ASUM are introduced as comparisons to measure the sentiment recognition effect of the TSTS model. Each document has a binary sentiment label, such as positive or negative. Taking dataset 2, "The attack on the doctor," as an example, the result is shown in Figure 4. The number of topics is set to 5 at the beginning of the experiment. With the refinement of granularity, the performance of the TSTS model increases. Compared with JST and ASUM, the curve of the TSTS model changes greatly, considering the topic and sentiment relationship among word pairs of the document. The change curve of the JST model shows a steady upward trend, and the identification efficiency of ASUM is low. That is because ASUM has strict assumptions, and an increase in the number of topics causes the decentralization of topics and sentiments, which has a strong negative impact on the overall performance of the model. The overall effect of the TSTS model was slightly better than JST and ASUM, but the effect decreased slightly after the number of topics increased to 20. This is because the data collected in the dataset are limited, and the number of topics has been set to discretize the word distribution; thus, the judgment of sentiment polarity is affected. The sentiment label classification of documents is compared under different numbers of topics, and the result of the TSTS model is better than JST and ASUM.

As the number of topics increases, the recognition performance of the topic model fluctuates somewhat, but the TSTS model was always better than JST and ASUM. When the number of topics and iterations is set to 20 and 600, TSTS is the best model in topic detection. When the number of topics in the four datasets was set to 20, the accuracy of sentiment polarity judgment is shown in Table 4.

From Table 4, the TSTS model is better than JST and ASUM in judging the sentiment polarity of documents. This is because sentiment polarity depends on the performance of topic discovery in the previous stage. In this experiment, the effects of JST and ASUM were exactly opposite. The difference was caused by the length of the original documents, which also indirectly verifies the effectiveness of the TSTS model.

[Figure 3: The relationship of the perplexity with the number of (a) topics and (b) iterations.]

[Figure 4: Accuracy of sentiment polarity judgment.]

From Figure 5, it can be seen that the proportion of positive sentiment is significantly higher than the other sentiments in the dataset "Military parade on National Day" and the dataset "Garbage sorting in Shanghai," which is consistent with the sentiment tendency of users in the social network. For the second dataset, "The assault on a doctor," the two kinds of negative sentiment polarity of topic 1 and topic 2 were compared. Topic 1 leans toward sad sentiment, while topic 2 leans toward angry sentiment. Topic 1 reflects the statement of the event, and topic 2 represents the follow-up discussion of the event.

5.3. Topic and Sentiment Evolution. The curves of topic features extracted from the four datasets through the TSTS model are shown in Figure 6. Taking dataset 2 as an example, the topic curve conforms to the evolution law of social and abrupt events. The two curves represent the trend of feature words over time in topic 1 and topic 2. Topic 1 is the statement about the case itself: from the beginning of the event, the amount of discussion on the social network rose sharply and then gradually declined. Topic 2 is a discussion on the development of the case, which triggered a second wave of hot discussion. The two curves reach their peaks at different times. The peak value of the topic 2 curve is lower than that of topic 1, which reflects that discussion of the same event fades over time; even when a new topic appears, its discussion is far lower than at the beginning of the event. Similar results can be verified in the other three datasets.

Table 4: Accuracy of sentiment polarity judgment.

Dataset        ASUM     JST      TSTS
Dataset 1      0.4763   0.5427   0.6348
Dataset 2      0.4617   0.5398   0.6599
Dataset 3      0.4832   0.5461   0.6475
Dataset 4      0.4841   0.5294   0.6522

[Figure 5: Sentiment distribution in four datasets. (a) Military parade on National Day. (b) The assault on a doctor. (c) Hong Kong's event. (d) Garbage sorting in Shanghai.]

[Figure 6: The changing of topics in four datasets. (a) Military parade on National Day. (b) The assault on a doctor. (c) Hong Kong's event. (d) Garbage sorting in Shanghai.]
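The time curves in Figure 6 are per-timestamp proportions of topic assignments; a sketch of how such a curve can be computed from the (timestamp, topic) pairs produced by the sampler (the data below are illustrative, not from the actual datasets):

```python
from collections import Counter

def topic_proportions_over_time(assignments, topics, timestamps):
    """assignments: (timestamp, topic) pairs, one per sampled word pair.
    Returns {topic: [proportion of that topic at each timestamp]}."""
    totals = Counter(t for t, _ in assignments)
    counts = Counter(assignments)
    return {z: [counts[(t, z)] / totals[t] if totals[t] else 0.0
                for t in timestamps]
            for z in topics}

curves = topic_proportions_over_time(
    [(1, "topic1"), (1, "topic1"), (1, "topic2"), (2, "topic2")],
    topics=["topic1", "topic2"], timestamps=[1, 2])
```

In this toy run, topic 1 dominates the first timestamp and then disappears, while topic 2 rises later, the same rise-and-fall pattern described for the event curves above.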

The proportions of the sentiment polarities in the four datasets are shown in Figure 7. Since sentiment polarity proportions are measured, the four sentiment polarities are distributed in a balanced way before the occurrence of the events. After an event occurred, the positive and negative sentiment polarities began to move toward the two extremes. Among the four datasets, positive sentiment was higher than negative sentiment in the first dataset, "Military parade on National Day," and the fourth dataset, "Garbage sorting in Shanghai," which also conforms to the social sentiment of those events. In addition, it can be found that the difference among the four sentiment labels is large in the initial stage, and the distribution of sentiment labels becomes stable in the later stage. This shows that the second report of a social event does not generate the same heat as the first. However, the sentiment tendency in social networks does not decline sharply with the reduction of discussion, which can be seen in topic 2 of the second dataset, "The assault on a doctor." Given that the topic features come from the background of the corpus and contain many noise words, the relative positions of the four curves are closer in terms of sentiment polarity evolution. However, there is still a gap between them, which differs from the even distribution of sentiment polarity at the beginning of the event.

6. Discussion

From the perspective of theoretical significance, this paper extends the LDA model to some extent. First, in view of the sparse matrix caused by short texts, word pairs are introduced to replace single words for text generation, following BTM. Based on the hypotheses of JST and ASUM, the sentiment layer is introduced to form the Bayesian network structure, and each word pair is restricted to the same sentiment polarity distribution. Second, in order to realize dynamic analysis and text homogeneity, the timestamp and the corresponding hyperparameter are introduced to alleviate the problem of word order in text generation. Third, this research combines behavioral experiments, big data mining, mathematical modeling, and simulation to promote the expansion of research into new situations and new methods.

From the perspective of practical significance, this paper is of great value for tracking and monitoring topics of public opinion in social networks. The online public opinions of hot events can be monitored, which helps governments and departments accurately judge social events and make emergency decisions. In addition, this paper analyzed the evolution, response, and governance of public opinion, which is conducive to understanding the formation mechanism and the collaborative evolution of public opinion. Meanwhile, the use of public opinion information can detect and screen information, prevent the spread of rumors, and scientifically formulate utilization mechanisms to effectively reduce the risk of loss.

7. Conclusions

In the context of the mobile social network, the number of short texts is growing explosively. In order to extract information from massive short texts quickly and monitor public opinions, the TSTS model is proposed in this study based on LDA, BTM, JST, ASUM, and TOT. From the experimental results, the TSTS model achieves good performance. In terms of topic feature extraction, the perplexity of TSTS is always lower than that of LDA. Moreover, although its perplexity is slightly higher than BTM's as the number of iterations increases, it can maintain balance with BTM. In sentiment analysis, the effect of TSTS was significantly better than JST and ASUM. Finally, the TSTS model, incorporating the time factor, can determine the change trend of the topic and sentiment.

[Figure 7: The changing of sentiment in four datasets. (a) Military parade on National Day, topic 1. (b) Military parade on National Day, topic 2. (c) The assault on a doctor, topic 1. (d) The assault on a doctor, topic 2. (e) Hong Kong's event, topic 1. (f) Hong Kong's event, topic 2. (g) Garbage sorting in Shanghai, topic 1. (h) Garbage sorting in Shanghai, topic 2.]

There are still some shortcomings in this paper. First, for the extraction of topic feature words, a global topic level can be added to the topic layer of the TSTS model to filter out common topic words. Second, in sentiment polarity judgment, the sentiment labels are manually marked based on prior knowledge; however, sentiments are extremely rich and changeable. In future research, Bayesian networks and entity theory can be used to judge sentiment bias.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the Project of National Science and Technology Department (2018YFF0213102), Public Projects of Zhejiang Province (LGF18G010003, LGF19G010002, and LGF20G010002), Science and Technology Project of Zhejiang Province (2020C01158), and First Class Discipline of Zhejiang-A (Zhejiang Gongshang University-Statistics).

References

[1] CNNIC, "Statistical report on internet development in China," CNNIC, Beijing, China, 2019, http://www.cac.gov.cn/2019-08/30/c_1124938750.htm.

[2] M. Ingawale, A. Dutta, R. Roy, and P. Seetharaman, "Network analysis of user generated content quality in Wikipedia," Online Information Review, vol. 37, no. 4, pp. 602–619, 2013.

[3] Q. Gao, Y. Tian, and M. Tu, "Exploring factors influencing Chinese user's perceived credibility of health and safety information on Weibo," Computers in Human Behavior, vol. 45, pp. 21–31, 2015.

[4] M. Steyvers and T. Griffiths, "Probabilistic topic models," in Handbook of Latent Semantic Analysis, pp. 424–440, Psychology Press, New York, NY, USA, 2007.

[5] J.-F. Yeh, Y.-S. Tan, and C.-H. Lee, "Topic detection and tracking for conversational content by using conceptual dynamic latent Dirichlet allocation," Neurocomputing, vol. 216, pp. 310–318, 2016.

[6] X. Zhou and L. Chen, "Event detection over twitter social media streams," The VLDB Journal, vol. 23, no. 3, pp. 381–400, 2014.

[7] Y. J. Du, Y. T. Yi, and X. Y. Li, "Extracting and tracking hot topics of micro-blogs based on improved latent Dirichlet allocation," Engineering Applications of Artificial Intelligence, vol. 87, pp. 1–13, 2020.

[8] H. J. Kim, Y. K. Jeong, Y. Kim et al., "Topic-based content and sentiment analysis of Ebola virus on Twitter and in the news," Journal of Information Science, vol. 42, no. 6, pp. 763–781, 2016.

[9] B. Subeno, R. Kusumaningrum, and F. Farikhin, "Optimisation towards latent Dirichlet allocation: its topic number and collapsed Gibbs sampling inference process," International Journal of Electrical and Computer Engineering (IJECE), vol. 8, no. 5, pp. 3204–3213, 2018.

[10] H. Park, T. Park, and Y.-S. Lee, "Partially collapsed Gibbs sampling for latent Dirichlet allocation," Expert Systems with Applications, vol. 131, pp. 208–218, 2019.

[11] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.

[12] X. Cheng, X. Yan, Y. Lan, and J. Guo, "BTM: topic modeling over short texts," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 12, pp. 2928–2941, 2014.

[13] J. Rashid, S. M. A. Shah, and A. Irtaza, "Fuzzy topic modeling approach for text mining over short text," Information Processing & Management, vol. 56, no. 6, pp. 1–19, 2019.

[14] H.-Y. Lu, N. Kang, Y. Li, Q.-Y. Zhan, J.-Y. Xie, and C.-J. Wang, "Utilizing recurrent neural network for topic discovery in short text scenarios," Intelligent Data Analysis, vol. 23, no. 2, pp. 259–277, 2019.

[15] L. Zhu, H. Xu, Y. Xu et al., "A joint model of extended LDA and IBTM over streaming Chinese short texts," Intelligent Data Analysis, vol. 23, no. 3, pp. 681–699, 2019.

[16] M. Tang, J. Jin, Y. Liu et al., "Integrating topic, sentiment, and syntax for modeling online product reviews: a topic model approach," Journal of Computing and Information Science in Engineering, vol. 19, no. 1, pp. 1–12, 2019.

[17] R. K. Amplayo, S. Lee, and M. Song, "Incorporating product description to sentiment topic models for improved aspect-based sentiment analysis," Information Sciences, vol. 454-455, pp. 200–215, 2018.

[18] L. Yao, Y. Zhang, B. Wei et al., "Concept over time: the combination of probabilistic topic model with Wikipedia knowledge," Expert Systems with Applications, vol. 60, pp. 27–38, 2016.

[19] S. Park, W. Lee, and I.-C. Moon, "Associative topic models with numerical time series," Information Processing & Management, vol. 51, no. 5, pp. 737–755, 2015.

[20] P. Lorenz-Spreen, F. Wolf, J. Braun et al., "Tracking online topics over time: understanding dynamic hashtag communities," Computational Social Networks, vol. 5, no. 1, pp. 1–18, 2018.

[21] Y. He, C. Lin, W. Gao et al., "Dynamic joint sentiment-topic model," ACM Transactions on Intelligent Systems & Technology, vol. 5, no. 1, pp. 1–21, 2013.

[22] L. Kuo and T. Y. Yang, "An improved collapsed Gibbs sampler for Dirichlet process mixing models," Computational Statistics & Data Analysis, vol. 50, no. 3, pp. 659–674, 2006.

[23] Y. Papanikolaou, J. R. Foulds, T. N. Rubin et al., "Dense distributions from sparse samples: improved Gibbs sampling parameter estimators for LDA," Statistics, vol. 18, no. 62, pp. 1–58, 2015.

[24] X. Zhou, J. Ouyang, and X. Li, "Two time-efficient Gibbs sampling inference algorithms for biterm topic model," Applied Intelligence, vol. 48, no. 3, pp. 730–754, 2018.

[25] P. Bhuyan, "Estimation of random-effects model for longitudinal data with nonignorable missingness using Gibbs sampling," Computational Statistics, vol. 34, no. 4, pp. 1693–1710, 2019.

[26] C. Lin, Y. He, R. Everson, and S. Ruger, "Weakly supervised joint sentiment-topic detection from text," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 6, pp. 1134–1145, 2012.

[27] A. Daud and F. Muhammad, "Group topic modeling for academic knowledge discovery," Applied Intelligence, vol. 36, no. 4, pp. 870–886, 2012.

[28] Z. Huang, J. Tang, G. Shan, J. Ni, Y. Chen, and C. Wang, "An efficient passenger-hunting recommendation framework with multi-task deep learning," IEEE Internet of Things Journal, vol. 6, no. 5, pp. 7713–7721, 2019.

[29] S. M. Mohammad, S. Kiritchenko, and X. Zhu, "NRC-Canada: building the state-of-the-art in sentiment analysis of tweets," in Proceedings of the Seventh International Workshop on Semantic Evaluation Exercises (SemEval-2013), pp. 1–5, Atlanta, GA, USA, June 2013.

[30] T. Chen, Q. Li, J. Yang, G. Cong, and G. Li, "Modeling of the public opinion polarization process with the considerations of individual heterogeneity and dynamic conformity," Mathematics, vol. 7, no. 10, p. 917, 2019.

[31] S. A. Curiskis, B. Drake, T. R. Osborn, and P. J. Kennedy, "An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit," Information Processing and Management, vol. 57, no. 2, pp. 1–21, 2019.


02

015

01

(g)

1 3 5 7 9 11 13 15 17 19 21 23 25 27Time

HappySurprise

SadAngry

Prop

ortio

n

05045

04035

03025

02015

01

(h)

Figure 7+e changing of sentiment in four datasets (a) Military parade in National Day topic 1 (b) Military parade in National Day topic2 (c)+e assault on a doctor topic 1 (d)+e assault on a doctor topic 2 (e) Hong Kongrsquos event topic 1 (f ) Hong Kongrsquos event topic 2(g) Garbage sorting in Shanghai topic 1 (h) Garbage sorting in Shanghai topic 2

Complexity 11

the TSTS model incorporating the time factor can determinethe change trend of the topic and sentiment

+ere are still some shortcomings in this paper Firstlyfor the extraction of topic feature words the global topiclevel can be added to the topic layer of the TSTS model tofilter the common topic words Secondly in sentimentpolarity judgment the sentiment labels aremanually markedbased on prior knowledge However the sentiments areextremely rich and changeable In the future research theBayesian network and entity theory can be used to judgesentiment bias

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is work was supported in part by the Project of NationalScience and Technology Department (2018YFF0213102)Public Projects of Zhejiang Province (LGF18G010003LGF19G010002 and LGF20G010002) Science and TechnologyProject of Zhejiang Province (2020C01158) and First ClassDiscipline of Zhejiang - A (Zhejiang Gongshang University -Statistics)

References

[1] CNNIC ldquoStatistical report on internet development inChinardquo CNNIC Beijing China 2019 httpwwwcacgovcn2019-0830c_1124938750htm

[2] M Ingawale A Dutta R Roy and P Seetharaman ldquoNetworkanalysis of user generated content quality in wikipediardquoOnline Information Review vol 37 no 4 pp 602ndash619 2013

[3] Q Gao Y Tian and M Tu ldquoExploring factors influencingChinese userrsquos perceived credibility of health and safety in-formation onWeibordquo Computers in Human Behavior vol 45pp 21ndash31 2015

[4] M Steyvers and T Griffiths ldquoProbabilistic Topic ModelsrdquoHandbook of Latent Semantic Analysis Psychology Pressvol 427 no 7 pp 424ndash440 New York NY USA 2007

[5] J-F Yeh Y-S Tan and C-H Lee ldquoTopic detection andtracking for conversational content by using conceptual dy-namic latent Dirichlet allocationrdquo Neurocomputing vol 216pp 310ndash318 2016

[6] X Zhou and L Chen ldquoEvent detection over twitter socialmedia streamsrdquoDeVLDB Journal vol 23 no 3 pp 381ndash4002014

[7] Y J Du Y T Yi and X Y Li ldquoExtracting and tracking hottopics of micro-blogs based on improved LatentDirichlet allocationrdquo Engineering Applications of ArtificialIntelligence vol 87 pp 1ndash13 2020

[8] H J Kim Y K Jeong Y Kim et al ldquoTopic-based content andsentiment analysis of Ebola virus on twitter and in the newsrdquoJournal of Information Science vol 42 no 6 pp 763ndash7812016

[9] B Subeno R Kusumaningrum and F Farikhin ldquoOptimi-sation towards latent Dirichlet allocation its topic numberand collapsed gibbs sampling inference processrdquo Interna-tional Journal of Electrical and Computer Engineering (IJECE)vol 8 no 5 pp 3204ndash3213 2018

[10] H Park T Park and Y-S Lee ldquoPartially collapsed gibbssampling for latent Dirichlet allocationrdquo Expert Systems withApplications vol 131 pp 208ndash218 2019

[11] D M Blei A Y Ng and M I Jordan ldquoLatentDirichlet allocationrdquo Journal of Machine Learning Researchvol 3 pp 993ndash1022 2003

[12] X Cheng X Yan Y Lan and J Guo ldquoBTM topic modelingover short textsrdquo IEEE Transactions on Knowledge and DataEngineering vol 26 no 12 pp 2928ndash2941 2014

[13] J Rashid S M A Shah and A Irtaza ldquoFuzzy topic modelingapproach for text mining over short textrdquo Information Pro-cessing amp Management vol 56 no 6 pp 1ndash19 2019

[14] H-Y Lu N Kang Y Li Q-Y Zhan J-Y Xie andC-J Wang ldquoUtilizing recurrent neural network for topicdiscovery in short text scenariosrdquo Intelligent Data Analysisvol 23 no 2 pp 259ndash277 2019

[15] L Zhu H Xu Y Xu et al ldquoA joint model of extended LDAand IBTM over streaming Chinese short textsrdquo IntelligentData Analysis vol 23 no 3 pp 681ndash699 2019

[16] M Tang J Jin Y Liu et al ldquoIntegrating topic sentiment andsyntax for modeling online product reviews a topic modelapproachrdquo Journal of Computing and Information Science inEngineering vol 19 no 1 pp 1ndash12 2019

[17] R K Amplayo S Lee and M Song ldquoIncorporating productdescription to sentiment topic models for improved aspect-based sentiment analysisrdquo Information Sciences vol 454-455pp 200ndash215 2018

[18] L Yao Y Zhang B Wei et al ldquoConcept over time thecombination of probabilistic topic model with Wikipediaknowledgerdquo Expert Systems with Applications vol 60pp 27ndash38 2016

[19] S Park W Lee and I-C Moon ldquoAssociative topic modelswith numerical time seriesrdquo Information Processing ampManagement vol 51 no 5 pp 737ndash755 2015

[20] P Lorenz-Spreen F Wolf J Braun et al ldquoTracking onlinetopics over time understanding dynamic hashtag commu-nitiesrdquo Computational Social Networks vol 5 no 1 pp 1ndash182018

[21] Y He C Lin W Gao et al ldquoDynamic joint sentiment-topicmodelrdquo ACM Transactions on Intelligent Systems amp Tech-nology vol 5 no 1 pp 1ndash21 2013

[22] L Kuo and T Y Yang ldquoAn improved collapsed Gibbssampler for Dirichlet process mixing modelsrdquo ComputationalStatistics amp Data Analysis vol 50 no 3 pp 659ndash674 2006

[23] Y Papanikolaou J R Foulds T N Rubin et al ldquoDensedistributions from sparse samples improved Gibbs samplingparameter estimators for LDArdquo Statistics vol 18 no 62pp 1ndash58 2015

[24] X Zhou J Ouyang and X Li ldquoTwo time-efficient Gibbssampling inference algorithms for biterm topic modelrdquo Ap-plied Intelligence vol 48 no 3 pp 730ndash754 2018

[25] P Bhuyan ldquoEstimation of random-effects model for longi-tudinal data with non ignorable missingness using Gibbssamplingrdquo Computational Statistics vol 34 no 4pp 1963ndash1710 2019

[26] C Lin Y He R Everson and S Ruger ldquoWeakly supervisedjoint sentiment-topic detection from textrdquo IEEE Transactionson Knowledge and Data Engineering vol 24 no 6pp 1134ndash1145 2012

12 Complexity

[27] A Daud and F Muhammad ldquoGroup topic modeling foracademic knowledge discoveryrdquo Applied Intelligence vol 36no 4 pp 870ndash886 2012

[28] Z Huang J Tang G Shan J Ni Y Chen and C Wang ldquoAnefficient passenger-hunting recommendation framework withmulti-task deep learningrdquo IEEE Internet of Dings Journalvol 6 no 5 pp 7713ndash7721 2019

[29] S MMohammad S Kiritchenko and X Zhu ldquoNRC-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquoin Proceedings of the Seventh International Workshop onSemantic Evaluation Exercises (SemEval-2013) SpringerAtlanta GA USA pp 1ndash5 June 2013

[30] T Chen Q Li J Yang G Cong and G Li ldquoModeling of thepublic opinion polarization process with the considerations ofindividual heterogeneity and dynamic conformityrdquo Mathe-matics vol 7 no 10 p 917 2019

[31] S A Curiskis B Drake T R Osborn and P J Kennedy ldquoAnevaluation of document clustering and topic modelling in twoonline social networks twitter and Redditrdquo InformationProcessing and Management vol 57 no 2 pp 1ndash21 2019

Complexity 13

Page 8: Research on Sentiment Tendency and Evolution of Public ...downloads.hindawi.com/journals/complexity/2020/9789431.pdf · , and the correlation distribution

is because sentiment polarity depends on the performance of topic discovery in the previous stage. In this experiment, the effects of JST and ASUM were exactly opposite. The difference was caused by the length of the original documents, which also indirectly verifies the effectiveness of the TSTS model.

From Figure 5, it can be seen that the proportion of positive sentiment is significantly higher than that of the other sentiments in the datasets "Military parade on National Day" and "Garbage sorting in Shanghai," which is consistent with the sentiment tendency of users in the social network. For the second dataset, "The assault on a doctor," the two negative sentiment polarities of topic 1 and topic 2 were compared: topic 1 leans toward sad sentiment, while topic 2 leans toward angry sentiment. Topic 1 reflects the statement of the event, and topic 2 represents the follow-up discussion of the event.

5.3. Topic and Sentiment Evolution. The curves of topic features extracted from the four datasets by the TSTS model are shown in Figure 6. Taking dataset 2 as an example, the topic curves conform to the evolution law of sudden social events. The two curves represent the trend of feature words over time for topic 1 and topic 2. Topic 1 is the

Table 4: Accuracy of sentiment polarity judgment.

Dataset     ASUM     JST      TSTS
Dataset 1   0.4763   0.5427   0.6348
Dataset 2   0.4617   0.5398   0.6599
Dataset 3   0.4832   0.5461   0.6475
Dataset 4   0.4841   0.5294   0.6522
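As a hedged illustration of how the accuracy figures in Table 4 could be computed — the function and the sample labels below are hypothetical, not the paper's data or code:

```python
# Hypothetical sketch: accuracy of sentiment polarity judgment, i.e., the
# fraction of documents whose predicted sentiment label matches the gold label.
def sentiment_accuracy(gold_labels, predicted_labels):
    assert len(gold_labels) == len(predicted_labels)
    correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return correct / len(gold_labels)

# Illustrative labels only.
gold = ["happy", "sad", "angry", "happy", "surprise"]
pred = ["happy", "sad", "happy", "happy", "surprise"]
print(sentiment_accuracy(gold, pred))  # 0.8
```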

Figure 5: Sentiment distribution in four datasets. (a) Military parade on National Day. (b) The assault on a doctor. (c) Hong Kong's event. (d) Garbage sorting in Shanghai.

8 Complexity

statement about the case itself. From the beginning of the event, the amount of discussion on the social network rose sharply and then gradually declined. Topic 2 is a discussion of the development of the case, which triggered a second wave of hot discussion. The two curves do not reach their peaks at the same time.

Figure 6: The changing of topics in four datasets. (a) Military parade on National Day. (b) The assault on a doctor. (c) Hong Kong's event. (d) Garbage sorting in Shanghai.



The peak value of the topic 2 curve is lower than that of topic 1, which reflects that discussion of the same event fades over time: even when a new topic emerges, its discussion volume is far lower than at the beginning of the event. Similar results can be observed in the other three datasets.

The proportions of the four sentiment polarities in the four datasets are shown in Figure 7. Because proportions are measured, the four sentiment polarities are evenly distributed before the occurrence of the events. After an event occurred, positive and negative sentiment began to move toward the two extremes. Among the four datasets, positive sentiment was higher than negative sentiment in the first dataset, "National Day military parade," and the fourth dataset, "Shanghai garbage classification," which conforms to the social sentiment of these events. In addition, the difference among the four sentiment labels is large in the initial stage, and the distribution of sentiment labels becomes stable in the later stage. This shows that a second report of a social event does not generate the same heat as the first report. However, sentiment tendency in social networks does not decline sharply with the reduction of discussion, which can be seen in topic 2 of the second dataset, "The assault on a doctor." Given that the topic features come from the background of the corpus and contain many noise words, the four curves lie relatively close together in terms of sentiment polarity evolution. However, there is still a gap between the curves, which differs from the balanced distribution of sentiment polarity at the beginning of the event.
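The per-time-slice proportions of this kind, as plotted in Figure 7, can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the post data and function name are hypothetical:

```python
from collections import Counter, defaultdict

LABELS = ["happy", "surprise", "sad", "angry"]

def sentiment_proportions(posts):
    """posts: iterable of (time_slice, label) pairs.
    Returns {time_slice: {label: share of posts in that slice}}."""
    by_slice = defaultdict(Counter)
    for t, label in posts:
        by_slice[t][label] += 1
    return {
        t: {lab: counts[lab] / sum(counts.values()) for lab in LABELS}
        for t, counts in by_slice.items()
    }

# Illustrative posts only.
posts = [(1, "happy"), (1, "happy"), (1, "sad"), (2, "angry"), (2, "happy")]
props = sentiment_proportions(posts)
print(round(props[1]["happy"], 3))  # 0.667
```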

6. Discussion

From the perspective of theoretical significance, this paper extends the LDA model to some extent. First, in view of the sparse matrix caused by short texts, word pairs are introduced to replace single words for text generation, following BTM. Based on the hypotheses of JST and ASUM, a sentiment layer is introduced to form the Bayesian network structure, and each word pair is constrained to the same sentiment polarity distribution. Second, in order to realize dynamic analysis and text homogeneity, a timestamp and the corresponding hyperparameter are introduced to alleviate the word-order problem in text generation. Third, this research combines behavioral experiments, big data mining, mathematical modeling, and simulation to extend the research to new situations and new methods.
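The BTM-style word-pair construction described above can be sketched as follows. This is an illustrative reading of the biterm idea, not the authors' implementation: every unordered pair of words co-occurring in a short text becomes one biterm, drawn from a single topic and, in TSTS, a single sentiment polarity.

```python
from itertools import combinations

def extract_biterms(tokens):
    """All unordered word pairs from one short document; for microblog-length
    texts the whole document is treated as the co-occurrence window."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]

print(extract_biterms(["smart", "city", "opinion"]))
# [('city', 'smart'), ('opinion', 'smart'), ('city', 'opinion')]
```

A document of n tokens thus yields n(n-1)/2 biterms, which is what densifies the word co-occurrence statistics relative to per-document word counts.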

From the perspective of practical significance, this paper is of great value for tracking and monitoring topics of public opinion in social networks. The online public opinions of hot events can be monitored, which helps governments and departments judge social events accurately and make emergency decisions. In addition, this paper analyzed the evolution, response, and governance of public opinion, which is conducive to understanding the formation mechanism and collaborative evolution of public opinion. Meanwhile, public opinion information can be used to detect and screen information, prevent the spread of rumors, and scientifically formulate utilization mechanisms to effectively reduce the risk of loss.

7. Conclusions

In the context of the mobile social network, the number of short texts is growing explosively. In order to extract information from massive short texts quickly and monitor public opinions, the TSTS model is proposed in this study based on LDA, BTM, JST, ASUM, and TOT. From the experimental results, the TSTS model achieves good performance. In terms of topic feature extraction, the perplexity of TSTS is always lower than that of LDA; moreover, although its perplexity is slightly higher than that of BTM as the number of iterations increases, it remains comparable with BTM. In sentiment analysis, the effect of TSTS was significantly better than that of JST and ASUM.
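The perplexity used in such comparisons is, in the usual topic-model sense, the exponentiated negative average per-token log-likelihood on held-out text (lower is better). A minimal sketch, with illustrative inputs rather than the paper's actual likelihoods:

```python
import math

def perplexity(token_log_probs):
    """token_log_probs: log-likelihood of each held-out token under the model."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# A model that assigns uniform probability 1/4 to every token of a
# 4-word vocabulary has perplexity 4 (it is as "surprised" as a 4-way guess).
print(perplexity([math.log(0.25)] * 10))  # ~4.0
```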

Figure 7: The changing of sentiment in four datasets. (a) Military parade on National Day, topic 1. (b) Military parade on National Day, topic 2. (c) The assault on a doctor, topic 1. (d) The assault on a doctor, topic 2. (e) Hong Kong's event, topic 1. (f) Hong Kong's event, topic 2. (g) Garbage sorting in Shanghai, topic 1. (h) Garbage sorting in Shanghai, topic 2.


Finally, the TSTS model, incorporating the time factor, can determine the change trend of topic and sentiment.

There are still some shortcomings in this paper. First, for the extraction of topic feature words, a global topic level could be added to the topic layer of the TSTS model to filter common topic words. Second, in sentiment polarity judgment, the sentiment labels are manually marked based on prior knowledge; however, sentiments are extremely rich and changeable. In future research, Bayesian networks and entity theory can be used to judge sentiment bias.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the Project of the National Science and Technology Department (2018YFF0213102), Public Projects of Zhejiang Province (LGF18G010003, LGF19G010002, and LGF20G010002), the Science and Technology Project of Zhejiang Province (2020C01158), and the First Class Discipline of Zhejiang - A (Zhejiang Gongshang University - Statistics).



[13] J Rashid S M A Shah and A Irtaza ldquoFuzzy topic modelingapproach for text mining over short textrdquo Information Pro-cessing amp Management vol 56 no 6 pp 1ndash19 2019

[14] H-Y Lu N Kang Y Li Q-Y Zhan J-Y Xie andC-J Wang ldquoUtilizing recurrent neural network for topicdiscovery in short text scenariosrdquo Intelligent Data Analysisvol 23 no 2 pp 259ndash277 2019

[15] L Zhu H Xu Y Xu et al ldquoA joint model of extended LDAand IBTM over streaming Chinese short textsrdquo IntelligentData Analysis vol 23 no 3 pp 681ndash699 2019

[16] M Tang J Jin Y Liu et al ldquoIntegrating topic sentiment andsyntax for modeling online product reviews a topic modelapproachrdquo Journal of Computing and Information Science inEngineering vol 19 no 1 pp 1ndash12 2019

[17] R K Amplayo S Lee and M Song ldquoIncorporating productdescription to sentiment topic models for improved aspect-based sentiment analysisrdquo Information Sciences vol 454-455pp 200ndash215 2018

[18] L Yao Y Zhang B Wei et al ldquoConcept over time thecombination of probabilistic topic model with Wikipediaknowledgerdquo Expert Systems with Applications vol 60pp 27ndash38 2016

[19] S Park W Lee and I-C Moon ldquoAssociative topic modelswith numerical time seriesrdquo Information Processing ampManagement vol 51 no 5 pp 737ndash755 2015

[20] P Lorenz-Spreen F Wolf J Braun et al ldquoTracking onlinetopics over time understanding dynamic hashtag commu-nitiesrdquo Computational Social Networks vol 5 no 1 pp 1ndash182018

[21] Y He C Lin W Gao et al ldquoDynamic joint sentiment-topicmodelrdquo ACM Transactions on Intelligent Systems amp Tech-nology vol 5 no 1 pp 1ndash21 2013

[22] L Kuo and T Y Yang ldquoAn improved collapsed Gibbssampler for Dirichlet process mixing modelsrdquo ComputationalStatistics amp Data Analysis vol 50 no 3 pp 659ndash674 2006

[23] Y Papanikolaou J R Foulds T N Rubin et al ldquoDensedistributions from sparse samples improved Gibbs samplingparameter estimators for LDArdquo Statistics vol 18 no 62pp 1ndash58 2015

[24] X Zhou J Ouyang and X Li ldquoTwo time-efficient Gibbssampling inference algorithms for biterm topic modelrdquo Ap-plied Intelligence vol 48 no 3 pp 730ndash754 2018

[25] P Bhuyan ldquoEstimation of random-effects model for longi-tudinal data with non ignorable missingness using Gibbssamplingrdquo Computational Statistics vol 34 no 4pp 1963ndash1710 2019

[26] C Lin Y He R Everson and S Ruger ldquoWeakly supervisedjoint sentiment-topic detection from textrdquo IEEE Transactionson Knowledge and Data Engineering vol 24 no 6pp 1134ndash1145 2012

12 Complexity

[27] A Daud and F Muhammad ldquoGroup topic modeling foracademic knowledge discoveryrdquo Applied Intelligence vol 36no 4 pp 870ndash886 2012

[28] Z Huang J Tang G Shan J Ni Y Chen and C Wang ldquoAnefficient passenger-hunting recommendation framework withmulti-task deep learningrdquo IEEE Internet of Dings Journalvol 6 no 5 pp 7713ndash7721 2019

[29] S MMohammad S Kiritchenko and X Zhu ldquoNRC-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquoin Proceedings of the Seventh International Workshop onSemantic Evaluation Exercises (SemEval-2013) SpringerAtlanta GA USA pp 1ndash5 June 2013

[30] T Chen Q Li J Yang G Cong and G Li ldquoModeling of thepublic opinion polarization process with the considerations ofindividual heterogeneity and dynamic conformityrdquo Mathe-matics vol 7 no 10 p 917 2019

[31] S A Curiskis B Drake T R Osborn and P J Kennedy ldquoAnevaluation of document clustering and topic modelling in twoonline social networks twitter and Redditrdquo InformationProcessing and Management vol 57 no 2 pp 1ndash21 2019

Complexity 13

Page 10: Research on Sentiment Tendency and Evolution of Public ...downloads.hindawi.com/journals/complexity/2020/9789431.pdf · , and the correlation distribution

3 5 7 9 2113 15 17 19 23 27251 11Time

00102030405060708

Prop

ortio

n

HappySurprise

SadAngry

(a)

1 3 5 7 9 11 13 15 17 19 21 23 25 27Time

HappySurprise

SadAngry

0

01

02

03

04

05

06

07

Prop

ortio

n

(b)

1 3 5 7 9 11 13 15 17 19 21 23 25 27Time

0

01

02

03

04

05

Prop

ortio

n

HappySurprise

SadAngry

(c)

1 3 5 7 9 11 13 15 17 19 21 23 25 27Time

HappySurprise

SadAngry

0

01

02

03

04

05

Prop

ortio

n

(d)

1 3 5 7 9 11 13 15 17 19 21 23 25 27Time

0

01

02

03

04

05

Prop

ortio

n

HappySurprise

SadAngry

(e)

1 3 5 7 9 11 13 15 17 19 21 23 25 27Time

HappySurprise

SadAngry

0

01

02

03

04

05

Prop

ortio

n

(f )

Figure 7 Continued

10 Complexity

The value of the topic 2 curve is lower than that of topic 1, which reflects that discussion of the same event fades over time. Even if a new topic emerges, discussion of the new topic is far lower than at the beginning of the event. Similar results can be verified in the other three datasets.
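The fading of discussion described above can be illustrated by fitting a simple exponential-decay curve to each topic's proportion series. This is a minimal sketch with hypothetical data, not the paper's actual fitting procedure:

```python
import numpy as np

# Hypothetical daily discussion proportions for two topics of one event
# (topic 2 peaks lower than topic 1, both decaying at the same rate)
t = np.arange(1, 28)
topic1 = 0.60 * np.exp(-0.15 * t)
topic2 = 0.30 * np.exp(-0.15 * t)

def fit_decay(t, y):
    # Log-linear least squares: log y = log a - b * t
    slope, log_a = np.polyfit(t, np.log(y), 1)
    return np.exp(log_a), -slope  # (peak amplitude a, decay rate b)

a1, b1 = fit_decay(t, topic1)
a2, b2 = fit_decay(t, topic2)
print(round(a1, 2), round(a2, 2))  # topic 2's fitted amplitude is lower than topic 1's
```

Comparing the fitted amplitudes makes the "topic 2 is always below topic 1" observation quantitative rather than visual.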

The proportion of each sentiment polarity in the four datasets is shown in Figure 7. Because proportions are measured, the four sentiment polarities are evenly distributed before the occurrence of the events. After an event occurred, the positive and negative sentiment polarities began to move toward the two extremes. Among the four datasets, positive sentiment was higher than negative sentiment in the first dataset, "National Day military parade," and the fourth dataset, "Shanghai garbage classification," which conforms to the social sentiment around these events. In addition, the difference among the four sentiment labels is large in the initial stage, and the distribution of sentiment labels becomes stable in the later stage. This indicates that follow-up reports of a social event do not generate the heat of the first report; however, sentiment tendency in social networks does not decline sharply with the reduction of discussion, as can be seen in topic 2 of the second dataset, "The assault on a doctor." Given that the topic features come from the background of the corpus and contain many noise words, the four curves lie relatively close to one another in terms of sentiment polarity evolution. Nevertheless, a gap remains among the curves, which differs from the even distribution of sentiment polarity at the beginning of the event.
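The proportions plotted in Figure 7 can be derived from sentiment-labeled comments with a small helper. The label set matches the figure legend, but the input data below are hypothetical; the sketch only shows how per-time-slice proportions are computed:

```python
from collections import Counter

LABELS = ["Happy", "Surprise", "Sad", "Angry"]

def sentiment_proportions(posts):
    """posts: list of (time_slice, label) pairs.
    Returns {time_slice: {label: proportion}} with proportions summing to 1."""
    by_slice = {}
    for t, label in posts:
        by_slice.setdefault(t, Counter())[label] += 1
    return {
        t: {lab: counts[lab] / sum(counts.values()) for lab in LABELS}
        for t, counts in by_slice.items()
    }

# Hypothetical labeled comments: balanced before the event (t=1),
# polarized toward "Happy" after it (t=2)
posts = [(1, "Happy"), (1, "Surprise"), (1, "Sad"), (1, "Angry"),
         (2, "Happy"), (2, "Happy"), (2, "Happy"), (2, "Angry")]
props = sentiment_proportions(posts)
print(props[1]["Happy"], props[2]["Happy"])  # 0.25 0.75
```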

6. Discussion

From the perspective of theoretical significance, this paper extends the LDA model to some extent. First, in view of the sparse matrix caused by short texts, word pairs are introduced to replace single words for text generation, following BTM. Based on the hypotheses of JST and ASUM, the sentiment layer is introduced to form the Bayesian network structure, and each word pair is restricted to the same sentiment polarity distribution. Second, in order to realize dynamic analysis and text homogeneity, the timestamp and its corresponding hyperparameter are introduced to alleviate the problem of word order in text generation. Third, this research combines behavioral experiments, big data mining, mathematical modeling, and simulation to promote the extension of the research to new situations and new methods.
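The word-pair idea borrowed from BTM can be sketched as follows: every unordered pair of words in a short document is treated as one co-occurring biterm. This is a simplified illustration of biterm extraction, not the TSTS sampler itself:

```python
from itertools import combinations

def extract_biterms(tokens):
    # In BTM-style models, every unordered pair of words in a short
    # document is treated as a co-occurring biterm; sorting each pair
    # makes the biterm order-independent.
    return [tuple(sorted(p)) for p in combinations(tokens, 2)]

print(extract_biterms(["smart", "city", "opinion"]))
# [('city', 'smart'), ('opinion', 'smart'), ('city', 'opinion')]
```

In the TSTS generative process each such biterm would additionally be constrained to share one sentiment-polarity assignment, per the JST/ASUM-style sentiment layer described above.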

From the perspective of practical significance, this paper is of great value for tracking and monitoring topics of public opinion in social networks. The online public opinions on hot events can be monitored, which helps governments and departments judge social events accurately and make emergency decisions. In addition, this paper analyzed the evolution, response, and governance of public opinion, which is conducive to understanding the formation mechanism and collaborative evolution of public opinion. Meanwhile, public opinion information can be used to detect and screen information, prevent the spread of rumors, and scientifically formulate utilization mechanisms to effectively reduce the risk of loss.

7. Conclusions

In the context of the mobile social network, the number of short texts is growing explosively. In order to extract information from massive short texts quickly and monitor public opinions, the TSTS model is proposed in this study based on LDA, BTM, JST, ASUM, and TOT. From the experimental results, the TSTS model achieves good performance. In terms of topic feature extraction, the perplexity of TSTS is always lower than that of LDA. Moreover, although its perplexity becomes slightly higher than that of BTM as the number of iterations increases, it remains on par with BTM. In sentiment analysis, the effect of TSTS was significantly better than that of JST and ASUM. Finally,

[Figure 7, panels (g) and (h): proportion of each sentiment (Happy, Surprise, Sad, Angry) over time slices 1–27; y-axis: Proportion.]

Figure 7: The change of sentiment in the four datasets. (a) Military parade on National Day, topic 1. (b) Military parade on National Day, topic 2. (c) The assault on a doctor, topic 1. (d) The assault on a doctor, topic 2. (e) Hong Kong's event, topic 1. (f) Hong Kong's event, topic 2. (g) Garbage sorting in Shanghai, topic 1. (h) Garbage sorting in Shanghai, topic 2.

Complexity 11

the TSTS model, incorporating the time factor, can determine the changing trends of topics and sentiment.
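The perplexity comparison reported above can be computed from held-out per-word log-likelihoods. The numbers below are hypothetical placeholders for model outputs, not values from the paper's experiments:

```python
import math

def perplexity(per_word_logprobs):
    # perplexity = exp(-(1/N) * sum of per-word log-likelihoods);
    # lower values mean the model predicts held-out words better
    n = len(per_word_logprobs)
    return math.exp(-sum(per_word_logprobs) / n)

# Hypothetical held-out per-word log-likelihoods from two models
model_a = [-6.1, -5.8, -6.3, -5.9]   # e.g., a TSTS-like model
model_b = [-6.9, -6.5, -7.1, -6.7]   # e.g., an LDA-like baseline
print(perplexity(model_a) < perplexity(model_b))  # True
```

Because the comparison depends only on average per-word log-likelihood, it applies unchanged whether words or BTM-style biterms are the modeling unit.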

There are still some shortcomings in this paper. Firstly, for the extraction of topic feature words, a global topic level can be added to the topic layer of the TSTS model to filter out common topic words. Secondly, in sentiment polarity judgment, the sentiment labels are manually marked based on prior knowledge; however, sentiments are extremely rich and changeable. In future research, Bayesian networks and entity theory can be used to judge sentiment bias.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the Project of the National Science and Technology Department (2018YFF0213102), Public Projects of Zhejiang Province (LGF18G010003, LGF19G010002, and LGF20G010002), the Science and Technology Project of Zhejiang Province (2020C01158), and the First Class Discipline of Zhejiang - A (Zhejiang Gongshang University - Statistics).

References

[1] CNNIC, "Statistical report on internet development in China," CNNIC, Beijing, China, 2019, http://www.cac.gov.cn/2019-08/30/c_1124938750.htm.

[2] M. Ingawale, A. Dutta, R. Roy, and P. Seetharaman, "Network analysis of user generated content quality in Wikipedia," Online Information Review, vol. 37, no. 4, pp. 602–619, 2013.

[3] Q. Gao, Y. Tian, and M. Tu, "Exploring factors influencing Chinese user's perceived credibility of health and safety information on Weibo," Computers in Human Behavior, vol. 45, pp. 21–31, 2015.

[4] M. Steyvers and T. Griffiths, "Probabilistic topic models," in Handbook of Latent Semantic Analysis, vol. 427, no. 7, pp. 424–440, Psychology Press, New York, NY, USA, 2007.

[5] J.-F. Yeh, Y.-S. Tan, and C.-H. Lee, "Topic detection and tracking for conversational content by using conceptual dynamic latent Dirichlet allocation," Neurocomputing, vol. 216, pp. 310–318, 2016.

[6] X. Zhou and L. Chen, "Event detection over twitter social media streams," The VLDB Journal, vol. 23, no. 3, pp. 381–400, 2014.

[7] Y. J. Du, Y. T. Yi, and X. Y. Li, "Extracting and tracking hot topics of micro-blogs based on improved latent Dirichlet allocation," Engineering Applications of Artificial Intelligence, vol. 87, pp. 1–13, 2020.

[8] H. J. Kim, Y. K. Jeong, Y. Kim et al., "Topic-based content and sentiment analysis of Ebola virus on twitter and in the news," Journal of Information Science, vol. 42, no. 6, pp. 763–781, 2016.

[9] B. Subeno, R. Kusumaningrum, and F. Farikhin, "Optimisation towards latent Dirichlet allocation: its topic number and collapsed Gibbs sampling inference process," International Journal of Electrical and Computer Engineering (IJECE), vol. 8, no. 5, pp. 3204–3213, 2018.

[10] H. Park, T. Park, and Y.-S. Lee, "Partially collapsed Gibbs sampling for latent Dirichlet allocation," Expert Systems with Applications, vol. 131, pp. 208–218, 2019.

[11] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.

[12] X. Cheng, X. Yan, Y. Lan, and J. Guo, "BTM: topic modeling over short texts," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 12, pp. 2928–2941, 2014.

[13] J. Rashid, S. M. A. Shah, and A. Irtaza, "Fuzzy topic modeling approach for text mining over short text," Information Processing & Management, vol. 56, no. 6, pp. 1–19, 2019.

[14] H.-Y. Lu, N. Kang, Y. Li, Q.-Y. Zhan, J.-Y. Xie, and C.-J. Wang, "Utilizing recurrent neural network for topic discovery in short text scenarios," Intelligent Data Analysis, vol. 23, no. 2, pp. 259–277, 2019.

[15] L. Zhu, H. Xu, Y. Xu et al., "A joint model of extended LDA and IBTM over streaming Chinese short texts," Intelligent Data Analysis, vol. 23, no. 3, pp. 681–699, 2019.

[16] M. Tang, J. Jin, Y. Liu et al., "Integrating topic, sentiment, and syntax for modeling online product reviews: a topic model approach," Journal of Computing and Information Science in Engineering, vol. 19, no. 1, pp. 1–12, 2019.

[17] R. K. Amplayo, S. Lee, and M. Song, "Incorporating product description to sentiment topic models for improved aspect-based sentiment analysis," Information Sciences, vol. 454-455, pp. 200–215, 2018.

[18] L. Yao, Y. Zhang, B. Wei et al., "Concept over time: the combination of probabilistic topic model with Wikipedia knowledge," Expert Systems with Applications, vol. 60, pp. 27–38, 2016.

[19] S. Park, W. Lee, and I.-C. Moon, "Associative topic models with numerical time series," Information Processing & Management, vol. 51, no. 5, pp. 737–755, 2015.

[20] P. Lorenz-Spreen, F. Wolf, J. Braun et al., "Tracking online topics over time: understanding dynamic hashtag communities," Computational Social Networks, vol. 5, no. 1, pp. 1–18, 2018.

[21] Y. He, C. Lin, W. Gao et al., "Dynamic joint sentiment-topic model," ACM Transactions on Intelligent Systems & Technology, vol. 5, no. 1, pp. 1–21, 2013.

[22] L. Kuo and T. Y. Yang, "An improved collapsed Gibbs sampler for Dirichlet process mixing models," Computational Statistics & Data Analysis, vol. 50, no. 3, pp. 659–674, 2006.

[23] Y. Papanikolaou, J. R. Foulds, T. N. Rubin et al., "Dense distributions from sparse samples: improved Gibbs sampling parameter estimators for LDA," Statistics, vol. 18, no. 62, pp. 1–58, 2015.

[24] X. Zhou, J. Ouyang, and X. Li, "Two time-efficient Gibbs sampling inference algorithms for biterm topic model," Applied Intelligence, vol. 48, no. 3, pp. 730–754, 2018.

[25] P. Bhuyan, "Estimation of random-effects model for longitudinal data with nonignorable missingness using Gibbs sampling," Computational Statistics, vol. 34, no. 4, pp. 1693–1710, 2019.

[26] C. Lin, Y. He, R. Everson, and S. Ruger, "Weakly supervised joint sentiment-topic detection from text," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 6, pp. 1134–1145, 2012.

[27] A. Daud and F. Muhammad, "Group topic modeling for academic knowledge discovery," Applied Intelligence, vol. 36, no. 4, pp. 870–886, 2012.

[28] Z. Huang, J. Tang, G. Shan, J. Ni, Y. Chen, and C. Wang, "An efficient passenger-hunting recommendation framework with multi-task deep learning," IEEE Internet of Things Journal, vol. 6, no. 5, pp. 7713–7721, 2019.

[29] S. M. Mohammad, S. Kiritchenko, and X. Zhu, "NRC-Canada: building the state-of-the-art in sentiment analysis of tweets," in Proceedings of the Seventh International Workshop on Semantic Evaluation Exercises (SemEval-2013), pp. 1–5, Atlanta, GA, USA, June 2013.

[30] T. Chen, Q. Li, J. Yang, G. Cong, and G. Li, "Modeling of the public opinion polarization process with the considerations of individual heterogeneity and dynamic conformity," Mathematics, vol. 7, no. 10, p. 917, 2019.

[31] S. A. Curiskis, B. Drake, T. R. Osborn, and P. J. Kennedy, "An evaluation of document clustering and topic modelling in two online social networks: twitter and Reddit," Information Processing and Management, vol. 57, no. 2, pp. 1–21, 2019.

Complexity 13

Page 11: Research on Sentiment Tendency and Evolution of Public ...downloads.hindawi.com/journals/complexity/2020/9789431.pdf · , and the correlation distribution

value of topic 2 curve is lower than that of topic 1 whichreflects the discussion of the same event will fade over timeEven if there is a new topic discussion of a new topic is farlower than the beginning of the event Meanwhile similarresults can be verified in other three datasets

+e proportion of the sentiment polarity in the fourdatasets is shown in Figure 7 Since the sentiment polarityproportion is measured the four sentiment polarities arebalanced distributed before the occurrence of the eventsAfter the event occurred the polarity of positive and neg-ative sentiment began to change toward the two extremesAmong the four datasets the positive sentiment was higherthan the negative sentiment in the first dataset ldquoNational Daymilitary parade eventrdquo and the fourth dataset ldquoShanghaigarbage classification eventrdquo which also conforms to thesocial sentiment of events In addition it can be found thatthe difference among the four sentiment labels is large in theinitial stage and the distribution of sentiment labels be-comes stable in the later stage It has proved that the secondreport of social events does not cause the heat for the firsttime But the sentiment tendency judgment in social net-works will not decline sharply with the reduction of dis-cussion which can be proved in topic 2 of the seconddataset ldquo+e assault on a doctorrdquo Given that the topicfeatures come from the background of a corpus and containa lot of noise words the relative position of four curves iscloser in terms of sentiment polarity evolution Howeverthere is still a gap between government feelings which isdifferent from the average distribution of sentiment polarityat the beginning of the event

6 Discussion

From the perspective of theoretical significance this paperextends the LDA model to some extent First in view of thesparse matrix caused by the short text word pairs are in-troduced to replace a single word for text generationaccording to BTM Based on the hypotheses of JST and

ASUM the sentiment layer is introduced to form theBayesian network structure and the word pair is limited tothe same sentiment polarity distribution Second in order torealize dynamic analysis and text homogeneity the time-stamp and the corresponding superparameter are intro-duced to alleviate the problem of the word order in the textgeneration +ird this research combines behavioral ex-periments big data mining mathematical modeling andimitating to promote the research expansion of new situa-tions and new methods

From the perspective of practical significance this paperis of great value in tracking and monitoring topics of publicopinion in social networks+e online public opinions of hotevents can be monitored which contributes to accuratelyjudge social events and make emergency decisions forgovernment or departments In addition this paper ana-lyzed the evolution response and governance of publicopinion which is conducive to understand the formationmechanism and the collaborative evolution of publicopinion Meanwhile the use of public opinion informationcan detect and screen information prevent the spread ofrumor and scientifically formulate the mechanism of uti-lization to effectively reduce the loss risk

7 Conclusions

In the context of the mobile social network the number ofshort texts is growing explosively In order to extract in-formation from massive short texts quickly and monitorpublic opinions the TSTS model is proposed in the studybased on LDA BTM JST ASUM and TOT From theexperimental results the TSTS model achieves good per-formance In terms of topic feature extraction the degree ofperplexity of TSTS is always lower than LDA Moreoveralthough the degree of perplexity is slightly higher than BTMwith the increase of iteration times it can maintain thebalance with BTM In the sentiment analysis the effect ofTSTS was significantly better than JST and ASUM Finally

1 3 5 7 9 11 13 15 17 19 21 23 25 27Time

Prop

ortio

n

HappySurprise

SadAngry

04

035

03

025

02

015

01

(g)

1 3 5 7 9 11 13 15 17 19 21 23 25 27Time

HappySurprise

SadAngry

Prop

ortio

n

05045

04035

03025

02015

01

(h)

Figure 7+e changing of sentiment in four datasets (a) Military parade in National Day topic 1 (b) Military parade in National Day topic2 (c)+e assault on a doctor topic 1 (d)+e assault on a doctor topic 2 (e) Hong Kongrsquos event topic 1 (f ) Hong Kongrsquos event topic 2(g) Garbage sorting in Shanghai topic 1 (h) Garbage sorting in Shanghai topic 2

Complexity 11

the TSTS model incorporating the time factor can determinethe change trend of the topic and sentiment

+ere are still some shortcomings in this paper Firstlyfor the extraction of topic feature words the global topiclevel can be added to the topic layer of the TSTS model tofilter the common topic words Secondly in sentimentpolarity judgment the sentiment labels aremanually markedbased on prior knowledge However the sentiments areextremely rich and changeable In the future research theBayesian network and entity theory can be used to judgesentiment bias

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is work was supported in part by the Project of NationalScience and Technology Department (2018YFF0213102)Public Projects of Zhejiang Province (LGF18G010003LGF19G010002 and LGF20G010002) Science and TechnologyProject of Zhejiang Province (2020C01158) and First ClassDiscipline of Zhejiang - A (Zhejiang Gongshang University -Statistics)

References

[1] CNNIC ldquoStatistical report on internet development inChinardquo CNNIC Beijing China 2019 httpwwwcacgovcn2019-0830c_1124938750htm

[2] M Ingawale A Dutta R Roy and P Seetharaman ldquoNetworkanalysis of user generated content quality in wikipediardquoOnline Information Review vol 37 no 4 pp 602ndash619 2013

[3] Q Gao Y Tian and M Tu ldquoExploring factors influencingChinese userrsquos perceived credibility of health and safety in-formation onWeibordquo Computers in Human Behavior vol 45pp 21ndash31 2015

[4] M Steyvers and T Griffiths ldquoProbabilistic Topic ModelsrdquoHandbook of Latent Semantic Analysis Psychology Pressvol 427 no 7 pp 424ndash440 New York NY USA 2007

[5] J-F Yeh Y-S Tan and C-H Lee ldquoTopic detection andtracking for conversational content by using conceptual dy-namic latent Dirichlet allocationrdquo Neurocomputing vol 216pp 310ndash318 2016

[6] X Zhou and L Chen ldquoEvent detection over twitter socialmedia streamsrdquoDeVLDB Journal vol 23 no 3 pp 381ndash4002014

[7] Y J Du Y T Yi and X Y Li ldquoExtracting and tracking hottopics of micro-blogs based on improved LatentDirichlet allocationrdquo Engineering Applications of ArtificialIntelligence vol 87 pp 1ndash13 2020

[8] H J Kim Y K Jeong Y Kim et al ldquoTopic-based content andsentiment analysis of Ebola virus on twitter and in the newsrdquoJournal of Information Science vol 42 no 6 pp 763ndash7812016

[9] B Subeno R Kusumaningrum and F Farikhin ldquoOptimi-sation towards latent Dirichlet allocation its topic numberand collapsed gibbs sampling inference processrdquo Interna-tional Journal of Electrical and Computer Engineering (IJECE)vol 8 no 5 pp 3204ndash3213 2018

[10] H Park T Park and Y-S Lee ldquoPartially collapsed gibbssampling for latent Dirichlet allocationrdquo Expert Systems withApplications vol 131 pp 208ndash218 2019

[11] D M Blei A Y Ng and M I Jordan ldquoLatentDirichlet allocationrdquo Journal of Machine Learning Researchvol 3 pp 993ndash1022 2003

[12] X Cheng X Yan Y Lan and J Guo ldquoBTM topic modelingover short textsrdquo IEEE Transactions on Knowledge and DataEngineering vol 26 no 12 pp 2928ndash2941 2014

[13] J Rashid S M A Shah and A Irtaza ldquoFuzzy topic modelingapproach for text mining over short textrdquo Information Pro-cessing amp Management vol 56 no 6 pp 1ndash19 2019

[14] H-Y Lu N Kang Y Li Q-Y Zhan J-Y Xie andC-J Wang ldquoUtilizing recurrent neural network for topicdiscovery in short text scenariosrdquo Intelligent Data Analysisvol 23 no 2 pp 259ndash277 2019

[15] L Zhu H Xu Y Xu et al ldquoA joint model of extended LDAand IBTM over streaming Chinese short textsrdquo IntelligentData Analysis vol 23 no 3 pp 681ndash699 2019

[16] M Tang J Jin Y Liu et al ldquoIntegrating topic sentiment andsyntax for modeling online product reviews a topic modelapproachrdquo Journal of Computing and Information Science inEngineering vol 19 no 1 pp 1ndash12 2019

[17] R K Amplayo S Lee and M Song ldquoIncorporating productdescription to sentiment topic models for improved aspect-based sentiment analysisrdquo Information Sciences vol 454-455pp 200ndash215 2018

[18] L Yao Y Zhang B Wei et al ldquoConcept over time thecombination of probabilistic topic model with Wikipediaknowledgerdquo Expert Systems with Applications vol 60pp 27ndash38 2016

[19] S Park W Lee and I-C Moon ldquoAssociative topic modelswith numerical time seriesrdquo Information Processing ampManagement vol 51 no 5 pp 737ndash755 2015

[20] P Lorenz-Spreen F Wolf J Braun et al ldquoTracking onlinetopics over time understanding dynamic hashtag commu-nitiesrdquo Computational Social Networks vol 5 no 1 pp 1ndash182018

[21] Y He C Lin W Gao et al ldquoDynamic joint sentiment-topicmodelrdquo ACM Transactions on Intelligent Systems amp Tech-nology vol 5 no 1 pp 1ndash21 2013

[22] L Kuo and T Y Yang ldquoAn improved collapsed Gibbssampler for Dirichlet process mixing modelsrdquo ComputationalStatistics amp Data Analysis vol 50 no 3 pp 659ndash674 2006

[23] Y Papanikolaou J R Foulds T N Rubin et al ldquoDensedistributions from sparse samples improved Gibbs samplingparameter estimators for LDArdquo Statistics vol 18 no 62pp 1ndash58 2015

[24] X Zhou J Ouyang and X Li ldquoTwo time-efficient Gibbssampling inference algorithms for biterm topic modelrdquo Ap-plied Intelligence vol 48 no 3 pp 730ndash754 2018

[25] P Bhuyan ldquoEstimation of random-effects model for longi-tudinal data with non ignorable missingness using Gibbssamplingrdquo Computational Statistics vol 34 no 4pp 1963ndash1710 2019

[26] C Lin Y He R Everson and S Ruger ldquoWeakly supervisedjoint sentiment-topic detection from textrdquo IEEE Transactionson Knowledge and Data Engineering vol 24 no 6pp 1134ndash1145 2012

12 Complexity

[27] A Daud and F Muhammad ldquoGroup topic modeling foracademic knowledge discoveryrdquo Applied Intelligence vol 36no 4 pp 870ndash886 2012

[28] Z Huang J Tang G Shan J Ni Y Chen and C Wang ldquoAnefficient passenger-hunting recommendation framework withmulti-task deep learningrdquo IEEE Internet of Dings Journalvol 6 no 5 pp 7713ndash7721 2019

[29] S MMohammad S Kiritchenko and X Zhu ldquoNRC-Canadabuilding the state-of-the-art in sentiment analysis of tweetsrdquoin Proceedings of the Seventh International Workshop onSemantic Evaluation Exercises (SemEval-2013) SpringerAtlanta GA USA pp 1ndash5 June 2013

[30] T Chen Q Li J Yang G Cong and G Li ldquoModeling of thepublic opinion polarization process with the considerations ofindividual heterogeneity and dynamic conformityrdquo Mathe-matics vol 7 no 10 p 917 2019

[31] S A Curiskis B Drake T R Osborn and P J Kennedy ldquoAnevaluation of document clustering and topic modelling in twoonline social networks twitter and Redditrdquo InformationProcessing and Management vol 57 no 2 pp 1ndash21 2019

Complexity 13

Page 12: Research on Sentiment Tendency and Evolution of Public ...downloads.hindawi.com/journals/complexity/2020/9789431.pdf · , and the correlation distribution

the TSTS model incorporating the time factor can determinethe change trend of the topic and sentiment

+ere are still some shortcomings in this paper Firstlyfor the extraction of topic feature words the global topiclevel can be added to the topic layer of the TSTS model tofilter the common topic words Secondly in sentimentpolarity judgment the sentiment labels aremanually markedbased on prior knowledge However the sentiments areextremely rich and changeable In the future research theBayesian network and entity theory can be used to judgesentiment bias

Data Availability

+e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

+e authors declare that they have no conflicts of interest

Acknowledgments

+is work was supported in part by the Project of NationalScience and Technology Department (2018YFF0213102)Public Projects of Zhejiang Province (LGF18G010003LGF19G010002 and LGF20G010002) Science and TechnologyProject of Zhejiang Province (2020C01158) and First ClassDiscipline of Zhejiang - A (Zhejiang Gongshang University -Statistics)

References

[1] CNNIC, "Statistical report on internet development in China," CNNIC, Beijing, China, 2019, http://www.cac.gov.cn/2019-08/30/c_1124938750.htm.

[2] M. Ingawale, A. Dutta, R. Roy, and P. Seetharaman, "Network analysis of user generated content quality in Wikipedia," Online Information Review, vol. 37, no. 4, pp. 602–619, 2013.

[3] Q. Gao, Y. Tian, and M. Tu, "Exploring factors influencing Chinese user's perceived credibility of health and safety information on Weibo," Computers in Human Behavior, vol. 45, pp. 21–31, 2015.

[4] M. Steyvers and T. Griffiths, "Probabilistic topic models," in Handbook of Latent Semantic Analysis, vol. 427, no. 7, pp. 424–440, Psychology Press, New York, NY, USA, 2007.

[5] J.-F. Yeh, Y.-S. Tan, and C.-H. Lee, "Topic detection and tracking for conversational content by using conceptual dynamic latent Dirichlet allocation," Neurocomputing, vol. 216, pp. 310–318, 2016.

[6] X. Zhou and L. Chen, "Event detection over twitter social media streams," The VLDB Journal, vol. 23, no. 3, pp. 381–400, 2014.

[7] Y. J. Du, Y. T. Yi, and X. Y. Li, "Extracting and tracking hot topics of micro-blogs based on improved Latent Dirichlet Allocation," Engineering Applications of Artificial Intelligence, vol. 87, pp. 1–13, 2020.

[8] H. J. Kim, Y. K. Jeong, Y. Kim et al., "Topic-based content and sentiment analysis of Ebola virus on twitter and in the news," Journal of Information Science, vol. 42, no. 6, pp. 763–781, 2016.

[9] B. Subeno, R. Kusumaningrum, and F. Farikhin, "Optimisation towards latent Dirichlet allocation: its topic number and collapsed Gibbs sampling inference process," International Journal of Electrical and Computer Engineering (IJECE), vol. 8, no. 5, pp. 3204–3213, 2018.

[10] H. Park, T. Park, and Y.-S. Lee, "Partially collapsed Gibbs sampling for latent Dirichlet allocation," Expert Systems with Applications, vol. 131, pp. 208–218, 2019.

[11] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.

[12] X. Cheng, X. Yan, Y. Lan, and J. Guo, "BTM: topic modeling over short texts," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 12, pp. 2928–2941, 2014.

[13] J. Rashid, S. M. A. Shah, and A. Irtaza, "Fuzzy topic modeling approach for text mining over short text," Information Processing & Management, vol. 56, no. 6, pp. 1–19, 2019.

[14] H.-Y. Lu, N. Kang, Y. Li, Q.-Y. Zhan, J.-Y. Xie, and C.-J. Wang, "Utilizing recurrent neural network for topic discovery in short text scenarios," Intelligent Data Analysis, vol. 23, no. 2, pp. 259–277, 2019.

[15] L. Zhu, H. Xu, Y. Xu et al., "A joint model of extended LDA and IBTM over streaming Chinese short texts," Intelligent Data Analysis, vol. 23, no. 3, pp. 681–699, 2019.

[16] M. Tang, J. Jin, Y. Liu et al., "Integrating topic, sentiment, and syntax for modeling online product reviews: a topic model approach," Journal of Computing and Information Science in Engineering, vol. 19, no. 1, pp. 1–12, 2019.

[17] R. K. Amplayo, S. Lee, and M. Song, "Incorporating product description to sentiment topic models for improved aspect-based sentiment analysis," Information Sciences, vol. 454-455, pp. 200–215, 2018.

[18] L. Yao, Y. Zhang, B. Wei et al., "Concept over time: the combination of probabilistic topic model with Wikipedia knowledge," Expert Systems with Applications, vol. 60, pp. 27–38, 2016.

[19] S. Park, W. Lee, and I.-C. Moon, "Associative topic models with numerical time series," Information Processing & Management, vol. 51, no. 5, pp. 737–755, 2015.

[20] P. Lorenz-Spreen, F. Wolf, J. Braun et al., "Tracking online topics over time: understanding dynamic hashtag communities," Computational Social Networks, vol. 5, no. 1, pp. 1–18, 2018.

[21] Y. He, C. Lin, W. Gao et al., "Dynamic joint sentiment-topic model," ACM Transactions on Intelligent Systems & Technology, vol. 5, no. 1, pp. 1–21, 2013.

[22] L. Kuo and T. Y. Yang, "An improved collapsed Gibbs sampler for Dirichlet process mixing models," Computational Statistics & Data Analysis, vol. 50, no. 3, pp. 659–674, 2006.

[23] Y. Papanikolaou, J. R. Foulds, T. N. Rubin et al., "Dense distributions from sparse samples: improved Gibbs sampling parameter estimators for LDA," Statistics, vol. 18, no. 62, pp. 1–58, 2015.

[24] X. Zhou, J. Ouyang, and X. Li, "Two time-efficient Gibbs sampling inference algorithms for biterm topic model," Applied Intelligence, vol. 48, no. 3, pp. 730–754, 2018.

[25] P. Bhuyan, "Estimation of random-effects model for longitudinal data with non-ignorable missingness using Gibbs sampling," Computational Statistics, vol. 34, no. 4, pp. 1693–1710, 2019.

[26] C. Lin, Y. He, R. Everson, and S. Ruger, "Weakly supervised joint sentiment-topic detection from text," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 6, pp. 1134–1145, 2012.

12 Complexity

[27] A. Daud and F. Muhammad, "Group topic modeling for academic knowledge discovery," Applied Intelligence, vol. 36, no. 4, pp. 870–886, 2012.

[28] Z. Huang, J. Tang, G. Shan, J. Ni, Y. Chen, and C. Wang, "An efficient passenger-hunting recommendation framework with multi-task deep learning," IEEE Internet of Things Journal, vol. 6, no. 5, pp. 7713–7721, 2019.

[29] S. M. Mohammad, S. Kiritchenko, and X. Zhu, "NRC-Canada: building the state-of-the-art in sentiment analysis of tweets," in Proceedings of the Seventh International Workshop on Semantic Evaluation Exercises (SemEval-2013), pp. 1–5, Springer, Atlanta, GA, USA, June 2013.

[30] T. Chen, Q. Li, J. Yang, G. Cong, and G. Li, "Modeling of the public opinion polarization process with the considerations of individual heterogeneity and dynamic conformity," Mathematics, vol. 7, no. 10, p. 917, 2019.

[31] S. A. Curiskis, B. Drake, T. R. Osborn, and P. J. Kennedy, "An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit," Information Processing and Management, vol. 57, no. 2, pp. 1–21, 2019.
