
Neurocomputing 275 (2018) 1662–1673

Semi-supervised learning for big social data analysis

Amir Hussain a, Erik Cambria b,∗

a School of Natural Sciences, University of Stirling, FK9 4LA, Stirling, UK
b School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798
∗ Corresponding author. E-mail addresses: [email protected] (A. Hussain), [email protected] (E. Cambria).

Article history: Received 1 May 2017; Revised 6 October 2017; Accepted 6 October 2017; Available online 18 October 2017. Communicated by Zidong Wang.

Keywords: Semi-supervised learning; Big social data analysis; Sentiment analysis

https://doi.org/10.1016/j.neucom.2017.10.010

Abstract

In an era of social media and connectivity, web users are becoming increasingly enthusiastic about interacting, sharing, and working together through online collaborative media. More recently, this collective intelligence has spread to many different areas, with a growing impact on everyday life, such as in education, health, commerce and tourism, leading to an exponential growth in the size of the social Web. However, the distillation of knowledge from such unstructured Big data is an extremely challenging task. Consequently, the semantic and multimodal contents of the Web in this present day are, whilst being well suited for human use, still barely accessible to machines. In this work, we explore the potential of a novel semi-supervised learning model based on the combined use of random projection scaling, as part of a vector space model, and support vector machines to perform reasoning on a knowledge base. The latter is developed by merging a graph representation of commonsense with a linguistic resource for the lexical representation of affect. Comparative simulation results show a significant improvement in tasks such as emotion recognition and polarity detection, and pave the way for the development of future semi-supervised learning approaches to big social data analytics.

© 2017 Elsevier B.V. All rights reserved.


1. Introduction

With the advent of social networks, web communities, blogs, Wikipedia, and other forms of online collaborative media, the way people express their opinions and sentiments has radically changed in recent years [1]. These new tools have facilitated the creation of original content, ideas, and opinions, connecting millions of people through the World Wide Web in a financially and labour-effective manner. This has made a huge source of information and opinions easily available by the mere click of a mouse.

As a result, the distillation of knowledge from this huge amount of unstructured information comes into vital play for marketers looking to create and shape brand and product identities. The practical purpose this encapsulates has led to the emerging field of big social data analysis, which deals with information retrieval and knowledge discovery from natural language and social networks, using graph mining and natural language processing (NLP) techniques to distill knowledge and opinions from the huge amount of information on the World Wide Web. Sentic computing [2] tackles these crucial issues by exploiting affective commonsense reasoning, modeled upon the intrinsically human capacity to interpret the cognitive and affective information associated with natural language, so as to infer new knowledge and make decisions in connection with one's social and emotional values, censors, and ideals. In other words, commonsense computing techniques are applied to narrow the semantic gap between word-level natural language data and the concept-level opinions conveyed by these.

In the past, graph mining and multi-dimensionality reduction techniques [3] were employed on a knowledge base obtained by merging ConceptNet [4], a directed graph representation of commonsense knowledge, with WordNet-Affect (WNA) [5], a linguistic resource for the lexical representation of affect. Our research fits within the sentic computing framework and aims to exploit machine learning for developing a cognitive model for emotion recognition in natural language text. Unlike purely syntactical techniques, concept-based approaches can even detect subtly expressed sentiments, e.g., by analyzing concepts that do not explicitly convey any emotions but are linked implicitly to others which do. On this note, the bag-of-concepts model represents the semantics associated with natural language much better than bag-of-words. In the bag-of-words model, a concept such as cloud_computing would be split into two separate words, which would disrupt the semantics of the input sentence (in which, for example, the word cloud could wrongly activate concepts related to weather).
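To make the contrast concrete, here is a toy Python sketch (the concept inventory and tokenizer are hypothetical illustrations, not part of the authors' pipeline) showing how bag-of-words splits a multi-word concept while bag-of-concepts keeps it intact:

import itertools

# Illustrative inventory of known multi-word concepts (not taken from AffectNet).
KNOWN_CONCEPTS = {"cloud_computing", "go_read_the_book"}

def bag_of_words(sentence):
    # Plain whitespace tokenization: multi-word concepts are lost.
    return sentence.lower().split()

def bag_of_concepts(sentence, max_len=4):
    # Greedily merge the longest run of adjacent words forming a known concept.
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 1, -1):
            candidate = "_".join(words[i:i + n])
            if candidate in KNOWN_CONCEPTS:
                out.append(candidate)
                i += n
                break
        else:
            out.append(words[i])
            i += 1
    return out

print(bag_of_words("I love cloud computing"))
# ['i', 'love', 'cloud', 'computing']  -- 'cloud' alone may activate weather concepts
print(bag_of_concepts("I love cloud computing"))
# ['i', 'love', 'cloud_computing']     -- the concept stays intact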


Concept-level analysis permits the inference of the semantic and affective information associated with natural language opinions and, therefore, facilitates comparative fine-grained feature-based sentiment analysis [6]. Instead of collecting isolated opinions on a whole item (e.g., iPhone8), users prefer to compare different products according to specific features (e.g., iPhone8's vs. GalaxyS8's touchscreen), or even sub-features (e.g., fragility of iPhone8's vs. GalaxyS8's touchscreen). In this context, commonsense knowledge is essential for accurately deconstructing natural language text into sentiments; for instance, the concept go_read_the_book can be deemed positive if present in a book review, whereas in a movie review it would indicate negative feedback. The inference of emotions and polarity from natural language concepts is, however, a formidable task, as it requires advanced reasoning capabilities such as commonsense as well as analogical and affective reasoning.

In the proposed work, a novel semi-supervised learning model based on the merged use of multi-dimensional scaling (MDS) by means of random projections and a biased support vector machine (bSVM) [7] is exploited for the task of emotion recognition. Semi-supervised classification should improve over a purely supervised classification rule, as both unlabeled and labeled data are utilised to learn a classification function empirically (compared to only labeled data). Interest in semi-supervised learning has grown in recent times, which can be attributed to the breadth of suitable application domains (e.g., text mining, natural language processing, image and video retrieval, and bioinformatics). The proposed scheme can benefit from biased regularization, which provides a viable approach to implementing an inductive bias in a kernel machine. This is of fundamental importance in learning theory, given that it heavily influences the generalization ability of a learning system. From a mathematical perspective, inductive bias can be formalized as the set of assumptions which determine the choice of a particular class of functions for supporting the learning process. Therefore, it represents a powerful tool that embeds prior knowledge for the applicative problem at hand.

To this aim, semi-supervised learning is formalized as a supervised learning problem biased by an unsupervised reference solution. First, we introduce a novel, general biased-regularization scheme that integrates biased versions of two well-known kernel machines, namely support vector machines (SVMs) and regularized least squares (RLS). Subsequently, we propose a semi-supervised learning model based on this biased-regularization scheme, adopting a two-stage procedure. In the first stage, a reference solution is obtained using an unsupervised clustering of the complete dataset (including both unlabeled and labeled data). A primary consequence is that the eventual semi-supervised classification framework can derive the reference function from any clustering algorithm, thus providing it with remarkable flexibility. In the following stage, clustering outcomes drive the learning process in a biased SVM (bSVM) or a biased RLS (bRLS) to acquire the class information provided by the labels. The final outcome is that the overall learned function utilizes both labeled and unlabeled data. The developed framework is applicable to linear and non-linear data distributions: the former works under a cluster assumption applied to the data, whilst the latter operates under a manifold hypothesis. Consequently, a semi-supervised learning process can only be valid when unlabeled data can be assumed to have an intrinsic geometric structure, for example, a low-dimensional non-linear manifold in the ideal case. With respect to previous strategies, the results demonstrate significant enhancements and pave the way for the future development of semi-supervised learning approaches echoing that of affective commonsense reasoning.

The rest of this paper is organized as follows: Section 2 introduces related work in the field of sentiment analysis research; Section 3 describes in detail the new semi-supervised learning architecture for affective commonsense reasoning; Section 4 illustrates results obtained by applying the new model to an affective benchmark and to an opinion mining dataset; finally, Section 5 offers some concluding remarks and recommendations for future work.

2. Related work

In recent years, sentiment analysis [6] has become increasingly popular for processing social media data on online communities, blogs, Wikis, microblogging platforms, and other online collaborative media. Sentiment analysis is a branch of affective computing research that aims to classify text (but sometimes also audio and video [8]) into either positive or negative (but sometimes also neutral [9]). Sentiment analysis has raised growing interest both within the scientific community, leading to many exciting open challenges, and in the business world, due to the remarkable benefits to be had from financial forecasting [10] and political forecasting [11], e-health [12] and e-tourism [13], community detection [14] and user profiling [15], and more.

While most works approach it as a simple categorization problem, sentiment analysis is actually a suitcase research problem [16] that requires tackling many NLP tasks, including aspect extraction [17], named entity recognition [18], word polarity disambiguation [19], temporal tagging [20], personality recognition [21], and sarcasm detection [22]. Most existing approaches to sentiment analysis rely on the extraction of a vector representing the most salient and important text features, which is later used for classification purposes [23]. Some of the most commonly used features are term frequency and presence. The latter is a binary-valued feature vector in which the entries merely indicate whether a term occurs (value 1) or not (value 0). Feature vectors can also include further term-based features. Position is one such example, considering that the position of tokens in text units can alter a token's effect on the text's sentiment. The presence of n-grams, usually bi-grams and tri-grams, can also be useful as features, as one can find methods which depend on distances between terms. Part-of-speech (POS) information (for example, nouns, verbs, adverbs, and adjectives) is commonly utilized for general textual analysis, in a basic form of word-sense disambiguation. Some specific adjectives have been proven to be useful indicators of sentiment, and guides for feature selection in sentiment classification. Lastly, other studies carried out the detection of sentiments via phrases selected through pre-specified POS patterns, the majority of which contain either an adverb or an adjective. Numerous approaches exist which map given pieces of text to labels from predefined sets of categories, or to real numbers representing a polarity degree. Nonetheless, these approaches and their performances are confined to an application's domain and relevant areas.

The transformation of sentiment analysis research can be evaluated by examining the token of analysis used, along with the implicit information associated with it. In this way, current approaches can be sorted into four primary categories: keyword spotting, lexical affinity, statistical methods, and concept-based techniques.

Keyword spotting is very naïve and is the most popular approach, due to its accessibility and economical nature. Text is grouped into affect categories depending on the presence of fairly unambiguous affect words like 'happy', 'bored', 'afraid', and 'sad'. For example, Elliott's Affective Reasoner [24] checks for 198 affect keywords (e.g., 'distressed', 'enraged') and affect intensity modifiers (e.g., 'extremely', 'mildly', and 'somewhat'). Other popular sources of affect words are Ortony's Affective Lexicon [25], which groups terms into affective categories, and Wiebe's linguistic annotation scheme [26].

Fig. 1. The Hourglass of Emotions.


Lexical affinity is slightly more sophisticated than keyword spotting as, rather than simply detecting obvious affect words, it assigns specific words a probabilistic 'affinity' for a particular emotion. For example, accident might be assigned a 75% probability of being indicative of a negative affect, as in the case of situations such as car_accident or hurt_by_accident. These probabilities are usually trained from linguistic corpora [27–29].

Statistical methods, like latent semantic analysis (LSA) and SVM, have proven useful for the affective cataloging of texts. Researchers have applied these approaches in projects like Pang's movie review classifier [30], Goertzel's Webmind [31], and more [17,32–34]. By feeding a machine learning algorithm a large training corpus of affectively annotated texts, it is possible for the system to not only learn the affective valence of affect keywords, but also to take into account the valence of other arbitrary keywords (like lexical affinity), punctuation, and word co-occurrence frequencies. Statistical methods, however, tend to be semantically weak, which means that, with the exception of obvious affect keywords, other lexical or co-occurrence elements in a statistical model have little individual predictive value. Hence, statistical classifiers only reach acceptable accuracy when given a large enough text input. Therefore, while these methods may be able to classify text at the paragraph or page level, they are not effective for smaller text units such as sentences.

Concept-based approaches focus on a semantic analysis of text through the use of web ontologies [35] or semantic networks [36], which requires grasping the conceptual and affective information associated with natural language opinions. By relying on large semantic knowledge bases, such approaches step away from the blind use of keywords and word co-occurrence counts, relying instead on the implicit meaning/features associated with natural language concepts. Concept-based approaches differ from purely syntactical techniques in that they can detect sentiments which are expressed in a subtle manner, e.g., through the analysis of concepts which are implicitly linked to other concepts that express emotions.

3. Semi-supervised reasoning

In the proposed framework, MDS is used to represent concepts in a multi-dimensional vector space, and a biased SVM (bSVM) is exploited to infer the semantics and sentics (that is, the conceptual and affective information) associated with such concepts, according to an hourglass-shaped emotion categorization model [2] (Fig. 1). Under the aforementioned model, sentiments are organized around four independent dimensions (Pleasantness, Attention, Sensitivity, and Aptitude) whose different levels of activation make up the total emotional state of the mind. The bSVM model is adopted as a semi-supervised approach to tackle the classification task, so as to overcome the lack of labeled commonsense data. In semi-supervised classification, in fact, both unlabeled and labeled data are exploited to learn a classification function empirically, instead of learning a classification rule based only on labeled data. As a result, concepts for which affective information is missing can be employed in the classification phase. The main purpose of the bSVM-based framework developed in this study is to predict the degree of affective valence each concept possesses in a particular facet of the Hourglass model.

3.1. Affective commonsense knowledge base

The core module of the framework hereby proposed is an affective commonsense knowledge base built upon ConceptNet, the graph representation of the Open Mind Common Sense (OMCS) corpus, which is structurally similar to WordNet [37], but whose scope of contents is general world knowledge, in the same vein as Cyc [38]. Instead of insisting on formalizing commonsense reasoning using mathematical logic [39], ConceptNet uses a new approach: it represents multi-word expressions in the form of a semantic network and makes them available for usage in NLP.

WordNet focuses on lexical categorization and word-similarity determination, and Cyc focuses on formalized logical reasoning. Contrastingly, ConceptNet is characterized by contextual commonsense reasoning: this means that it is meant for the task of making practical context-based inferences over real-world texts. In ConceptNet, WordNet's notion of node in the semantic network is extended from purely lexical items (words and simple phrases with atomic meaning) to include higher-order compound concepts such as satisfy_hunger and follow_recipe, so as to represent knowledge of a greater range of concepts found in everyday life.

Moreover, in ConceptNet, WordNet's repertoire of semantic relations is extended from the triplet of Synonym, IsA, and PartOf to a repertoire of twenty semantic relations including, for example, EffectOf (causality), SubeventOf (event hierarchy), CapableOf (agent's ability), MotivationOf (affect), PropertyOf, and LocationOf. ConceptNet's knowledge is also of a more informal and practical nature. For example, WordNet has the formal taxonomic knowledge that a 'dog' is a 'canine', which is a 'carnivore', which is a 'placental mammal'; but it cannot make the logically oriented member-to-set association that a dog falls under the categories of pet or family_member. ConceptNet, on the other hand, describes something that is often true but not always. For instance, EffectOf(fall_off_bicycle, get_hurt); this shows it contains a lot of knowledge that is defeasible, which is something that cannot be left aside in commonsense reasoning. Most of the facts that interrelate ConceptNet's semantic network are dedicated to making rather generic connections between concepts.

Fig. 2. AffectNet.

ConceptNet is obtained via an automatic process, in which a set of extraction rules is applied to the semi-structured English sentences of the OMCS corpus, following which an additional set of 'relaxation' procedures is applied. This results in network gaps being filled in and smoothed over, thereby optimizing the connectivity of the semantic network. ConceptNet is a good source of commonsense knowledge, but it alone is not enough for sentiment analysis, despite detailing the semantic links between concepts, given that it frequently misses associations between concepts that express the same sentiment. To surmount this flaw, we employed WNA, a semantic resource for the etymological embodiment of affective knowledge, which was built upon WordNet. WNA was developed by allocating a number of WordNet synsets to one or more affective labels (a-labels). For instance, synsets marked with the a-label 'emotion' demarcated concepts representing emotional states. There are also other a-labels for concepts representing situations that provoke emotions, emotional responses, as well as moods. WNA was born through a dual-stage process: the first stage pertained to the documentation of a base core of affective synsets, while the second saw the expansion of the core with the associations outlined in WordNet.

ConceptNet and WNA are then combined through the linear fusion of their respective matrix representations into a single matrix, in which the knowledge between the two databases is shared. For this combination process, the input data from both sources has to be transmuted so that it can be denoted in its entirety in the same matrix. Hence, the lemma forms of ConceptNet notions are allied with the lemma forms of words in WNA, and the most common associations in WNA are charted into ConceptNet's set of relations; for instance, Holonym is charted into PartOf, and Hypernym into IsA. Effectively, we transfigure ConceptNet into a matrix by divvying each assertion into two parts: a concept and a feature, wherein a feature is a contention without any quantified concept, for example 'is a type of fluid'.

Based on the dependability of assertions, the subsequent logs in the matrix are either positive or negative scores, whose scale increases proportionally to their dependability. Like ConceptNet, WNA is also denoted as a matrix, in which rows are affective concepts and columns are associated qualities. In combining ConceptNet and WNA into a single matrix, we created a new affective semantic network, in which commonsense concepts are interrelated with a graded system of affective domain tags. We call this new framework AffectNet [2] (Fig. 2); through it, commonsense and emotional knowledge are not merely connected, but instead melded together. For example, concepts from daily life such as meet_people or have_breakfast are now associated with emotional domain notions such as 'joy', 'anger', or 'surprise'. Such semantic associations can be of much benefit when tasks such as emotion recognition or polarity detection from natural language text are performed, as it is common for both sentiments and opinions to be conveyed implicitly through context- and domain-dependent concepts, instead of through specific affect words.

3.2. Affective analogical reasoning

Problems are solved best when we understand a solution for them. The tricky part is when we encounter problems we have never faced before, which require intuition. Intuition can be explained as the process of forming analogies between a current problem and ones we have cracked previously, in order to find a suitable solution. This process of thinking (Fig. 3) could well be the essence of human intelligence, as in daily life no two situations are identical, and we continually have to apply analogical reasoning to solve problems and make decisions. The human mind continuously applies compression in all vital relations [40]. The compression principles aim to convert diffuse and distended conceptual structures into more focused versions, so as to become more congenial for human understanding.

Fig. 3. AffectiveSpace's first two eigenmoods.

In order to emulate such a process, MDS was previously applied on the matrix representation of AffectNet, a semantic network in which commonsense concepts were linked to semantic and affective features (Table 1). The result was AffectiveSpace. PCA is most widely used as a data-aware dimensionality reduction method [41] and is closely related to the low-rank approximation method, singular value decomposition (SVD), as both work on a transformed version of the data matrix [42].

Table 1. A snippet of the AffectNet matrix.

AffectNet     IsA-pet   KindOf-food   Arises-joy   ...
Dog           0.981     0             0.789        ...
Cupcake       0         0.922         0.910        ...
Songbird      0.672     0             0.862        ...
Gift          0         0             0.899        ...
Sandwich      0         0.853         0.768        ...
Rotten fish   0         0.459         0            ...
Win lottery   0         0             0.991        ...
Bunny         0.611     0.892         0.594        ...
Police man    0         0             0            ...
Cat           0.913     0             0.699        ...
Rattlesnake   0.432     0.235         0            ...
...           ...       ...           ...          ...


Therefore, truncated singular value decomposition (TSVD) is applied to the concept-feature matrix for the purposes of expediently diminishing its dimensionality and netting key links. The purpose of this is to ensure that the new joint model comprises solely the key features that exemplify the general outlook. Employing TSVD on AffectNet results in it defining other qualities that could be of parallel relevance to known affective concepts. If there is no known value for an item in a concept in the matrix, but the same item is inherent in other analogous concepts, then by parallel association it is highly probable that the same item is inherent in the concept as well. Essentially, concepts and features that are aligned and possess high dot products are ideal contenders for analogies.

Osgood et al. [43] produced a revolutionary piece on comprehending and visualizing affective knowledge connected to natural language text. In their work, MDS was utilized to construct visualizations of affective texts based on their parallel scores when contrasted with texts from other cultures. In a multi-dimensional space, words can be conceived of as points, and parallel scores then denote the distances between words. MDS rethinks these distances as points in a reduced dimensional space (most commonly two- or three-dimensional). Similarly, AffectiveSpace's purpose is to visualize the semantic and affective likeness between dissimilar concepts by projecting them onto a multi-dimensional vector space.

However, differently from Osgood's research, the basic foundation of AffectiveSpace is not merely a restricted set of parallel scores between affect words, but instead consists of millions of confidence scores linked to bits of commonsense knowledge that are, in turn, related to a structured system of affective domain labels. Instead of being defined by just a small number of human annotators and characterized as a word-word matrix, AffectiveSpace is solidly based on AffectNet and denoted as a concept-feature matrix. SVD seeks to decompose the AffectNet matrix A ∈ R^{n×d} into three components,

A = U S V^T, (1)

where U and V are unitary matrices, and S is a rectangular diagonal matrix with nonnegative real numbers on the diagonal. SVD has been proved to be optimal in preserving any unitarily invariant norm ‖·‖_M (a norm ‖·‖_M is unitarily invariant if ‖U A V‖_M = ‖A‖_M for all A and all unitary U, V) [42]:

‖A − A_k‖_M = min_{rank(B)=k} ‖A − B‖_M, (2)

where A_k, i.e., AffectiveSpace, is formed by retaining only the top k singular values in S. Hence, in AffectiveSpace, commonsense concepts and emotions are represented by vectors of k coordinates. These coordinates can be seen as describing concepts in terms of 'eigenmoods' which form the axes of AffectiveSpace, i.e., the basis e_0, ..., e_{k−1} of the vector space. For example, the most significant eigenmood, e_0, represents concepts with positive affective valence. That is, the larger a concept's component in the e_0 direction is, the more affectively positive it is likely to be. Correspondingly, concepts with negative e_0 components are likely to have negative affective valence.

Hence, by exploiting the information sharing property of SVD, concepts with the same affective valence are likely to have similar features. This means that concepts which convey the same emotion are more likely to fall in close proximity to each other in AffectiveSpace. Concept similarity depends not on the concepts' absolute positions in the vector space, but on the angle they make with the origin. For example, concepts such as beautiful_day, birthday_party, and make_someone_happy are found very close in direction in the vector space, while concepts like feel_guilty, be_laid_off, and shed_tear are found in a completely different direction (nearly opposite with respect to the centre of the space).

The difficulty with this sort of representation is a lack of model scalability: as the number of concepts and semantic features increases, the AffectNet matrix becomes so high-dimensional and sparse that it can no longer be computed by the SVD [44]. Although there has been substantial research seeking fast approximations of the SVD, the approximate methods are at best ≈5 times faster than the standard one [42]; hence it is not viable for real-world big data applications.

There has been conjecture that neuronal learning has simple, yet powerful meta-algorithms underlying it [45], which should be fast, effective, scalable and biologically plausible, as well as having few-to-no assumptions [44]. Optimizing all of the ≈10^15 connections through the last few million years' evolution is very unlikely [44]. Alternatively, nature probably only optimizes the global connectivity (mainly white matter), but leaves the other details to randomness [44].

To handle the growing number of concepts and semantic features, we replace SVD with random projection (RP) [46], a data-oblivious method, to map the original high-dimensional dataset into a much lower-dimensional subspace by using a Gaussian N(0, 1) matrix, while preserving the pair-wise distances with high probability. This theoretically strong and empirically verified statement follows the Johnson–Lindenstrauss (JL) Lemma [44]. The JL Lemma states that, with high probability, for all pairs of points x, y ∈ X simultaneously,

√(m/d) ‖x − y‖_2 (1 − ε) ≤ ‖Φx − Φy‖_2 ≤ √(m/d) ‖x − y‖_2 (1 + ε), (3)–(4)

where X is a set of vectors in Euclidean space, d is the original dimension of this Euclidean space, m is the dimension of the space we wish to reduce the data points to, ε is a tolerance parameter measuring the maximum allowed distortion rate of the metric space, and Φ is a random matrix.
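The following short numpy sketch (dimensions and data are illustrative assumptions) checks this distance-preservation property empirically with a Gaussian N(0, 1) projection matrix:

import numpy as np

rng = np.random.default_rng(0)
n, d, m = 50, 10000, 500            # points, original dimension, target dimension

X = rng.standard_normal((n, d))     # toy high-dimensional data
Phi = rng.standard_normal((m, d))   # Gaussian N(0,1) random projection matrix
Y = X @ Phi.T / np.sqrt(m)          # scaled so distances are preserved up to (1 +/- eps)

# Compare a few pairwise distances before and after projection.
for i, j in [(0, 1), (2, 3), (4, 5)]:
    print(np.linalg.norm(X[i] - X[j]), np.linalg.norm(Y[i] - Y[j]))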

Structured random projection for making matrix multiplication much faster was introduced in [47]. Achlioptas [48] proposed sparse random projection, replacing the Gaussian matrix with i.i.d. entries

φ_ji = √s · { +1 with prob. 1/(2s);  0 with prob. 1 − 1/s;  −1 with prob. 1/(2s) }, (5)


where one can achieve a ×3 speedup by setting s = 3, since only 1/3 of the data needs to be processed. However, since our input matrix is already too sparse, we avoid using sparse random projection.

When the number of features is a lot larger than the number of training samples (d ≫ n), the subsampled randomized Hadamard transform (SRHT) is preferred, as it behaves similarly to Gaussian random matrices but also accelerates the process from O(nd) to O(n log d) time [49]. Following [49,50], for d = 2^p, where p is any positive integer, a SRHT can be defined as:

Φ = √(d/m) R H D, (6)

where

• m is the number of features we want to subsample randomly from the d features.
• R is a random m × d matrix. The rows of R are m uniform samples (without replacement) from the standard basis of R^d.
• H ∈ R^{d×d} is a normalized Walsh–Hadamard matrix, which is defined recursively: H_d = [ H_{d/2}  H_{d/2} ; H_{d/2}  −H_{d/2} ] with H_2 = [ +1  +1 ; +1  −1 ].
• D is a d × d diagonal matrix whose diagonal elements are i.i.d. Rademacher random variables.
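A dense numpy sketch of this construction follows (for illustration only; a practical implementation would apply the fast Walsh–Hadamard transform instead of materializing H):

import numpy as np

def srht(X, m, seed=0):
    # Project the rows of X (n x d, d a power of two) to m dimensions
    # via Phi = sqrt(d/m) R H D, as in Eq. (6).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    assert d & (d - 1) == 0, "d must be a power of two"
    D = rng.choice([-1.0, 1.0], size=d)           # i.i.d. Rademacher diagonal
    H = np.array([[1.0]])
    while H.shape[0] < d:                         # H_d = [[H, H], [H, -H]]
        H = np.block([[H, H], [H, -H]])
    H /= np.sqrt(d)                               # normalization
    rows = rng.choice(d, size=m, replace=False)   # R: uniform rows, no replacement
    return np.sqrt(d / m) * ((X * D) @ H)[:, rows]

X = np.random.randn(100, 256)   # toy data: 100 points in 256 dimensions
print(srht(X, m=32).shape)      # (100, 32)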

Fig. 4. AffectiveSpace 2.

Our subsequent analysis relies solely on distances and angles between pairs of vectors (i.e., the Euclidean geometry information), so it is sufficient to set the projected space to be logarithmic in the size of the data [51] and apply SRHT. The result is a new vector space model, AffectiveSpace 2 (Fig. 4), which preserves the semantic and affective relatedness of commonsense concepts while being highly scalable [52]. The key to performing commonsense reasoning is to find a good trade-off for knowledge representation. Since no two situations in life are ever exactly the same, no representation should be too concrete, or else it will fail to apply to new situations. At the same time, however, no representation should be too abstract, or it risks suppressing too many details.

Within AffectiveSpace 2, this knowledge representation trade-off can be seen in the choice of the vector space dimensionality. The number m indicates the dimension of AffectiveSpace and, in fact, is a measure of the trade-off between accuracy and efficiency in the representation of the affective commonsense knowledge base. The bigger m is, the more precisely AffectiveSpace represents AffectNet's knowledge; the smaller m is, on the other hand, the more efficiently AffectiveSpace represents affective commonsense knowledge, both in terms of vector space generation and of dot product computation.

3.3. Semi-supervised learning approach

Even though AffectiveSpace 2 is a powerful tool for discovering the semantic and affective relatedness of natural language concepts, reasoning by analogy in such a multi-dimensional vector space is a difficult task, as the distribution of concepts in the space is non-linear and only the affective valence of a relatively small set of concepts is known a priori. Hence, the bSVM and bRLS models are adopted as a semi-supervised approach in order to exploit both unlabeled and labeled commonsense data to learn a classification function empirically. They are both based on biased regularization theory, realized as follows: a reference solution (e.g., a hyperplane) is used to bias the solution of a regularization-based learning machine.

3.3.1. Regularization-based learning

Modern classification methods often rely on regularization theory. In a regularized functional, a positive parameter, λ, rules the tradeoff between the empirical risk, R_emp[f] (the loss function), of the decision function f (i.e., regression or classification) and a regularizing term. The cost to be minimized can be expressed as:

R_reg = R_emp[f] + λ Ω[f], (7)

where the regularization operator, Ω[f], quantifies the complexity of the class of functions from which f is drawn. Usually, f belongs to a Reproducing Kernel Hilbert Space (RKHS) H. For a dataset, X, one computes a square matrix, K, which is symmetric and positive definite. Every entry K(s, x) can be viewed as the inner product <φ(s), φ(x)>, where φ(·) is the (implicit, non-linear) mapping function uniquely defined by H. A kernel method implies the choice of an inner-product formulation; the simplest, linear kernel supports the inner vector product in the actual domain space: <φ(s), φ(x)> ≡ <s, x>.

When dealing with maximum-margin algorithms, Ω[f] is implemented by the term ‖f‖²_H, which supports a square norm in the feature space. The Representer Theorem proves that, when Ω[f] = ‖f‖²_H, the solution of the regularized cost can be expressed as a finite summation over a set of labeled training patterns X = {x_i, y_i}, i = 1, ..., l, y_i ∈ {−1, +1}:

f(x_j) = w · x_j = Σ_{i=1}^{l} β_i K(x_i, x_j). (8)

SVM and RLS are popular methods belonging to this family of regularizing algorithms; both provide excellent performance in pattern recognition problems. The two learning algorithms differ in their choice of loss function: the SVM model uses the 'hinge' loss function, whereas RLS operates on a square loss function.

The SVM training process requires one to solve the following optimization problem:

min_{w, b, ξ}  C Σ_{i=1}^{l} ξ_i + (1/2) ‖w‖²,  ξ_i ≥ 0, (9)

subject to:

y_i (w^T x_i) ≥ 1 − ξ_i, (10)

where ξ_i is a penalty term added for each misclassified pattern, and C is a hyperparameter that plays the role of 1/λ. The constant parameter b has been dropped because one can equivalently augment the space X with a feature of constant value +1. The problem can be efficiently solved in its dual form by using quadratic programming techniques. When using the dual formulation, one optimizes a set of Lagrange multipliers, α_i, and it can be shown that the series coefficients in (8) can be written as β_i = α_i y_i.


When dealing with the RLS model, the problem to be optimized is:

min_f  Σ_{i=1}^{l} (y_i − f_i)² + (λ/2) ‖f‖²_H, (11)

whose optimum in β is found by solving the following linear system:

(K + λI) β = y. (12)
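A minimal numpy sketch of kernel RLS as given by Eqs. (11) and (12) (toy data; the RBF kernel and its width are illustrative choices):

import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # K[i, j] = exp(-gamma * ||a_i - b_j||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))          # training patterns
y = np.sign(X[:, 0])                      # toy +/-1 labels

lam = 0.1
K = rbf_kernel(X, X)
beta = np.linalg.solve(K + lam * np.eye(len(X)), y)   # Eq. (12)
f = K @ beta                              # f(x_j) = sum_i beta_i K(x_i, x_j), Eq. (8)
print("training accuracy:", (np.sign(f) == y).mean())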

3.3.2. Maximal discrepancy bounds for model selection

One of the main obstacles in classification problems is tuning the classifier's regularization parameter(s). When tackling limited-sample problems, strategies such as k-fold cross validation may be difficult to apply due to the small size of both the training and the test sets. Thus, theoretical approaches which derive analytical expressions of the generalization bounds can give powerful options for attaining reliable model selection. These methods do not require data partitioning and are always based on the complexity of the hypothesis space, F. The bound on the true generalization error, R[f], is asserted with confidence at least 1 − δ, and is commonly written as the sum of several terms:

R[f] ≤ R_emp[f] + χ + Δ, (13)

where R_emp[f] is the error on the training set, χ measures the complexity of the space of classifying functions, and Δ penalizes the finiteness of the training sample.

The Maximal-Discrepancy (MD) bound belongs to this class of theoretical approaches. In this research, the MD bound is exploited to confirm that the proposed semi-supervised learning scheme can ensure the shrinking of the generalization bound, thus providing effective tools for model selection in a semi-supervised setting.

3.3.3. A biased regularization

A general biased regularization model is realized as follows: a reference solution (e.g., a hyperplane) is used to bias the solution of a regularization-based learning machine.

In a linear domain, one can define a generic convex loss function, l(X, Y, w), and a biased regularizing term; the resulting cost function is:

l(X, Y, w) + λ1 ‖w − λ2 w0‖², (14)

where w0 is a reference hyperplane, λ1 is the classical regularization parameter that controls smoothness (e.g., 1/C in SVM), and λ2 controls the adherence to the reference solution w0. Expression (14) is a convex functional and thus admits a global solution. From (14) one gets:

l(X, Y, w) + (λ1/2) ‖w − λ2 w0‖² = l(X, Y, w) + (λ1/2) ‖w‖² − λ1 λ2 w · w0, (15)

which actually involves two regularization parameters, λ1 and λ2; this problem setting differs from the one proposed for SVM, where only one regularization parameter was defined, obtaining l(X, Y, w) + λ1 ‖w − w0‖². The latter expression coincides with ours in the special case λ2 = 1.

Fig. 5 explicates the role played by parameter λ2 in three different cases. In all those figures, w is set as the origin, 0, of the space of hypotheses, whereas a black square denotes the reference hyperplane, w0, and a grey square indicates the 'true' optimal solution. For the sake of clarity, and without loss of generality, the examples assume that:

1. λ1 is set to a fixed value (i.e., λ1 = 1).
2. The distance ‖w − λ2 w0‖ is constant for any λ2.
3. w_{λ2=0} (black triangle) is the best solution one can obtain from unbiased learning (i.e., λ2 = 0). Here, the best solution refers to the solution that is closest to w∗ among all the possible w that lie at a distance ‖w − λ2 w0‖ from w0 (the dashed circumference).

Fig. 5(a) refers to the situation in which the reference w0 is closer to the true solution w∗ than w_{λ2=0}. The figure shows that, when λ2 decreases from 1 to 0, the centre of the ideal circumference, which encloses the eventual solution w_{λ2}, drifts. When λ2 → 0, w_{λ2} moves toward the origin w = 0, which represents the condition of no reference exploited. Indeed, the drawing highlights that, when w0 gives a reliable reference, one can take full advantage of biased regularization, as the best solution for λ2 = 1, w_{λ2=1}, definitely improves over w_{λ2=0}.

Fig. 5(b) illustrates the opposite case: the reference w0 is more distant from the true solution w∗ than w_{λ2=0} (it is worth noting that the relative positions of w∗ and w_{λ2=0} with respect to the origin w = 0 remain unchanged when compared with Fig. 5(a)). In this situation, one would obtain the best outcome by setting λ2 = 0, thus neutralizing the contribution of the biased regularization. Hence, w0 does not represent a helpful reference.

Finally, Fig. 5(c) illustrates another situation in which the reference w0 is more distant from the true solution, w∗, than w_{λ2=0}, but biased regularization still remains useful: by adjusting λ2 (i.e., by modulating the contribution of the reference w0), one eventually obtains a solution w_{λ2} that improves over w_{λ2=0}. As a result, one can take advantage of biased regularization even when the reference solution is not optimal.

Fig. 5. The role played by parameter λ2 in the proposed problem setting. Three different situations are analyzed: (a) the reference w0 is closer to the true solution w∗ than w_{λ2=0}; (b) the reference w0 is more distant from the true solution w∗ than w_{λ2=0}; (c) the reference w0 is more distant from the true solution w∗ than w_{λ2=0}, but biased regularization can be useful.

The extension of (14) to non-linear models is obtained by considering a Reproducing Kernel Hilbert Space H. In that case, one has a reference function f0 and the functional (15) becomes:

l(X, Y, f) + (λ1/2) ‖f − λ2 f0‖²_H, (16)

where now the norm of the regularizer is taken in H. Eventually, one obtains the models for the biased SVM (bSVM) and the biased RLS (bRLS), respectively, by adopting the proper loss function l(X, Y, w):

bSVM:  Σ_{i=1}^{l} (1 − y_i f(x_i))_+ + (λ1/2) ‖f − λ2 f0‖²_H, (17)

bRLS:  Σ_{i=1}^{l} (y_i − f(x_i))² + (λ1/2) ‖f − λ2 f0‖²_H. (18)

3.3.4. Biased SVM

The following theorem shows the formalization of the biased version of the support vector machine (bSVM) within this scheme, with the inclusion of a regularizing bias.

Theorem 1 (bSVM): Given a reference hyperplane w0 (or a reference function f0, if the domain is not linear), a regularization constant C, and a biasing constant λ2, the dual form of the learning problem:

min_{ξ, w}  C Σ_{i=1}^{l} ξ_i + (1/2) ‖w − λ2 w0‖²,
subject to  y_i (w · x_i) ≥ 1 − ξ_i ∀i,  ξ_i ≥ 0 ∀i, (19)

is written as:

min_α  (1/2) α^t Q α − Σ_{i=1}^{l} α_i (1 − λ2 y_i f0(x_i)),
subject to  0 ≤ α_i ≤ C ∀i. (20)

The model of the data is:

f(x) = Σ_{i=1}^{l} α_i y_i K(x, x_i) + λ2 f0(x), (21)

where the α_i are the support vector coefficients, the ξ_i are the slack variables used to measure the grade of allowed misclassification, and K is the chosen kernel.

Unlike the conventional SVM formulation, the minimization problem (20) does not contain a linear constraint. Problem (20) can thus be optimized by an SMO version which uses only a single Lagrange multiplier at each iteration. In such a procedure, the gradient integrates the new reference-based term and the regularization parameter. The gradient value for the i-th pattern is:

G_i = y_i Σ_{j=1}^{l} α_j y_j K(x_j, x_i) − 1 + λ2 y_i f0(x_i). (22)

Then, as usual, the projected gradient PG is computed and the KKT optimality conditions are checked on this value. The algorithm runs till the KKT conditions are satisfied. In the following, the pseudo-code of the algorithm is presented.

bSVM Solver:

1. Initialization: α = 0, flag = 0
2. While (!flag):
   a. flag = 1
   b. For i = 1, ..., l:
      i.   G = y_i Σ_{j=1}^{l} α_j y_j K(x_j, x_i) − 1 + λ2 y_i f0(x_i) (23)
      ii.  PG = min(G, 0) if α_i = 0;  max(G, 0) if α_i = C;  G if 0 < α_i < C (24)
      iii. If |PG| > ε:
           1. α_i = min(max(α_i − G/K_ii, 0), C)
           2. flag = 0

The pseudo-code listed above can be considerably accelerated by updating the gradient only when necessary, as well as by using shrinking and random permutations of the indexes of patterns at each iteration.

    .3.5. Biased RLS

    The following theorem formalizes the linear biased version of

    LS.

Theorem 2: Given a reference hyperplane w0, a regularization constant λ1, and a biasing constant λ2, the problem:

\[
\min_{w} \; \bigl\| Xw - y \bigr\|^2 + \lambda_1 \bigl\| w - \lambda_2 w_0 \bigr\|^2 \tag{25}
\]

has solution:

\[
w = (X^t X + \lambda_1 I)^{-1} (X^t y + \lambda_1 \lambda_2 w_0) \tag{26}
\]

The following theorem gives the dual form of biased RLS (bRLS):

Theorem 3: Given a reference hyperplane w0 (or a reference function f0), a regularization constant λ1, and a biasing constant λ2, the dual of the problem:

\[
\min_{w} \; \bigl\| Xw - y \bigr\|^2 + \lambda_1 \bigl\| w - \lambda_2 w_0 \bigr\|^2 \tag{27}
\]

is:

\[
\min_{\beta} \; \bigl\| K\beta + \lambda_2 f_0(X) - y \bigr\|^2 + \lambda_1\, \beta^t K \beta \tag{28}
\]

which has solution:

\[
\beta = (K + \lambda_1 I)^{-1} \bigl( y - \lambda_2 f_0(X) \bigr) \tag{29}
\]

The model of the data is:

\[
f(x) = \sum_{i=1}^{l} \beta_i\, K(x, x_i) + \lambda_2\, f_0(x) \tag{30}
\]

Corollary 1: Given the RLS learning machine, the RKHS H, and the representation

\[
f(x) = \sum_{i=1}^{l} \beta_i\, K(x, x_i) + \lambda_2\, f_0(x) \tag{31}
\]

the solution of the problem

\[
\min_{f} \; \bigl\| f - y \bigr\|^2 + \lambda_1 \bigl\| f - \lambda_2 f_0 \bigr\|_{H}^2 \tag{32}
\]

is:

\[
(K + \lambda_1 I)\, \beta = y - \lambda_2 f_0(X) \tag{33}
\]

From a computational point of view, the choice between the primal and the dual solution depends on the characteristics of the available data. If the samples lie in a low-dimensional space and can be separated by a linear classifier, the primal form is preferable because it scales linearly with the number of features. Conversely, when the number of patterns is lower than the number of features, the dual form should be used.
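The closed forms (26) and (29) translate directly into a few lines of linear algebra. The sketch below is illustrative (the function names are assumptions) and reuses the conventions of the bSVM sketch above.

```python
import numpy as np

def brls_primal(X, y, w0, lam1=1.0, lam2=1.0):
    """Primal biased RLS (26): w = (X^t X + lam1 I)^{-1} (X^t y + lam1 lam2 w0)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam1 * np.eye(d), X.T @ y + lam1 * lam2 * w0)

def brls_dual(K, y, f0, lam1=1.0, lam2=1.0):
    """Dual biased RLS (29): beta = (K + lam1 I)^{-1} (y - lam2 f0(X))."""
    l = K.shape[0]
    return np.linalg.solve(K + lam1 * np.eye(l), y - lam2 * f0)

def brls_predict(K_test, beta, f0_test, lam2=1.0):
    """Model of the data (30): f(x) = sum_i beta_i K(x, x_i) + lam2 f0(x)."""
    return K_test @ beta + lam2 * f0_test
```

Note that the primal route solves a d × d system while the dual route solves an l × l system, which mirrors the trade-off just described: the cheaper option depends on whether features or patterns dominate.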

3.3.6. A semi-supervised learning scheme based on biased regularization

Once the biased version of the SVM kernel machine has been defined, the following four-step procedure can be followed in order to formalize a semi-supervised framework for the classification task.

Let X be a dataset composed of l labeled patterns and u unlabeled patterns; let X_l denote the labeled subset, y_l denote the




corresponding vector of labels, and X_u denote the unlabeled subset. Then the semi-supervised learning scheme can be formalized as follows:

1. Clustering: use any clustering algorithm to perform an unsupervised partition of the dataset X (a bipartition in the simplest case).

2. Calibration: for every cluster, a majority voting scheme is adopted to set the cluster label; this is done by exploiting the labeled samples. Then, for each cluster, assign to each sample the cluster label. Let ŷ denote this new set of labels.

3. Mapping: given X and ŷ, train the selected learning machine and obtain the solution w0.

4. Biasing: given X_l and the true labels y_l, train the biased version of the learning machine (biased by w0). The solution w carries information derived from both the labeled data X_l and the unlabeled data X_u.
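As a concrete illustration, the four steps can be sketched as follows, assuming ±1 labels and treating the learning machines as pluggable callables. The helpers train and train_biased are hypothetical stand-ins for the chosen machine and its biased version (e.g., the bRLS sketch above); scikit-learn's KMeans covers the clustering step.

```python
import numpy as np
from sklearn.cluster import KMeans

def semi_supervised_bias(X_l, y_l, X_u, train, train_biased, n_clusters=2):
    """Four-step scheme: clustering, calibration, mapping, biasing."""
    X = np.vstack([X_l, X_u])                       # 1. Clustering on all patterns
    clusters = KMeans(n_clusters=n_clusters).fit_predict(X)
    l = len(y_l)
    y_hat = np.empty(len(X))
    for c in range(n_clusters):                     # 2. Calibration: majority vote
        members = clusters == c
        votes = y_l[members[:l]]                    # labeled samples in cluster c
        y_hat[members] = 1.0 if votes.sum() >= 0 else -1.0  # ties default to +1
    f0 = train(X, y_hat)                            # 3. Mapping: reference solution
    return train_biased(X_l, y_l, f0)               # 4. Biasing on the true labels
```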

The ultimate result is that the overall learned function exploits both labeled and unlabeled commonsense data. The semi-supervised learning scheme possesses some interesting features:

• Since the proposed method can be applied to both linear and non-linear domains, the result is a completely generalizable learning scheme.
• The present learning scheme distinguishes between two principal actions: clustering and biasing. This means that the two tasks can be tackled independently: if one wants to adopt a particular solution for biasing, or a new clustering algorithm is designed, the two actions can be controlled and adjusted separately.
• If the learning machine is a single-layer learning machine whose cost is convex, then convexity is preserved and a global solution is guaranteed.
• Every clustering method can be used to build the reference solution.

3.3.7. bSVM and bRLS for semi-supervised learning: computational complexity

This semi-supervised learning scheme consists of three computationally intensive steps: clustering, mapping, and biasing. The clustering task can be completed using various clustering algorithms, each characterized by its own computational complexity; in the following, the complexity of clustering is denoted generically as O_c. Efficient solutions also exist for implementing powerful clustering algorithms such as k-means or spectral clustering.

In the second step, mapping, the time complexity is entirely determined by the learning machine applied to all the l + u available samples. For RLS this would mean a complexity of O((l + u)^3), i.e., the solution of the system of linear equations (29). When adopting SVM as the learning machine, one can exploit the SMO algorithm, which scales between O(l + u) and O((l + u)^2).

The third step, biasing, consists of solving either a linear system or an SMO-like algorithm. In both cases, one also needs to pre-compute the predictions of the reference model f0(x) for all the labeled patterns (with d-dimensional patterns the eventual cost is O(ud)). As a result, when bRLS is adopted as the learning machine, the computational complexity is:

\[
O_{bRLS} = O_c + O\bigl((l+u)^3\bigr) + O(ud) + O\bigl(l^3\bigr) \tag{34}
\]

In O_{bRLS} the dominant terms are O((l + u)^3) and possibly the complexity O_c associated with the clustering task. When instead bSVM is used, the computational complexity is:

\[
O_{bSVM} = O_c + O\bigl((l+u)^2\bigr) + O(ud) + O\bigl(l^2\bigr) \tag{35}
\]

where the dominant terms are O((l + u)^2) and, again, the complexity O_c. In this case, one assumes that O(ud) < O((l + u)^2); this is a reasonable hypothesis, except for those cases where the data lie in a highly dimensional space. Therefore, the complexity of the training procedure roughly scales with the complexity of the original learning machine: SVM scales approximately as O(l^2), and its semi-supervised version (bSVM) scales as O((l + u)^2); a similar behavior characterizes RLS and bRLS.

Below are some final considerations regarding computational complexity:

• If it is known a priori that the data are almost linearly separable, then it is possible to build very efficient learning algorithms. For instance, one can couple fast linear k-means implementations with the linear versions of SVM and bSVM. That setup leads to very efficient learning methods, in particular when data are highly sparse, such as in text mining problems.
• The proposed semi-supervised learning scheme can address large-scale problems, as long as the clustering engines scale well. For certain domains, for example text mining, in which linearity and data sparsity are exploitable, adaptations of the learning algorithm can result in extremely fast methods that can process hundreds of thousands of patterns in a matter of seconds.
• New, unseen test patterns can be managed effectively, as class assignment can exploit the closed-form functions.

The proposed framework provides attractive features when compared with other semi-supervised methods. First, the reference function can be worked out by exploiting any clustering algorithm. Second, biased regularization can effectively support model selection. This feature is indeed crucial, in particular when a model with few labeled data is addressed; other approaches to semi-supervised learning do not provide this attribute. Furthermore, the present framework exploits a convex cost function.

The proposed framework also ensures satisfactory performance in terms of computational complexity. This is highlighted in Table 2, which reports the computational complexity of each approach. Computational complexity is formalized by using the number of labeled patterns, l, the number of unlabeled patterns, u, the dimensionality of the data, d, the number of iterations of the learning algorithm, k, and the complexity of the clustering algorithm, O_c. For the proposed biased learning machines, complexity has been formalized by assuming the use of efficient SMO-like routines.

Table 2
Comparison of semi-supervised methods.

Method                           Complexity
bSVM                             O_c + O((l + u)^2)
bSVM linear                      O_c + O(d(l + u)k)
bRLS                             O_c + O((l + u)^3)
LapRLS                           O((l + u)^3)
LapSVM                           O((l + u)^3)
LapSVM linear                    O(d^3)
Cluster kernel                   O(max(l, u)^3)
TSVM                             O(k(l + 2u)^2)
EM                               –
Co-training                      –
Semi-parametric regularization   O((l + u)^3)

It is evident from the table that the current semi-supervised learning scheme can achieve satisfactory performance in terms of computational complexity whenever the clustering algorithm scales as (or better than) the adopted biased machine. In this regard, bSVM appears especially appealing, as it scales quadratically, or even linearly if the underlying problem has particular characteristics. Indeed, one should take into account that the term O_c represents the added cost to be paid for a gain in flexibility.


Fig. 6. The effectiveness of model selection is improved by adopting a formulation of biased regularization which fully exploits parameters λ1 and λ2. Four cases are presented: (a) good reference / weak bias; (b) good reference / strong bias; (c) bad reference / weak bias; (d) bad reference / strong bias.




3.3.8. Biased regularization for semi-supervised learning supports effective model selection

The proposed semi-supervised learning framework can extend the supervised classification scheme presented in [53], which demonstrated that, when a cluster hypothesis holds, the use of clustering to set a reference solution leads to a significant reduction of the space of possible functions. Such a result is noteworthy in that it leads to tight generalization bounds, since the term χ in Eq. (13) measures the complexity of the space of classifying functions. Tight generalization bounds are in turn a necessary condition for supporting effective model selection.

In [53], model selection of SVM is actually performed by exploiting an auxiliary machine, called VQSVM. In the VQSVM model, the learning task is fully supervised because the clustering reference only derives from labeled data. Besides, an Ivanov biased regularization term is used and the regularization parameter λ2 is implicitly set to a fixed value.

To extend the scheme of [53] to semi-supervised learning, the present framework involves both labeled and unlabeled data in the clustering step. Indeed, the effectiveness of model selection is improved by adopting a formulation of biased regularization that fully exploits parameters λ1 and λ2.

Fig. 6 considers four cases, and analyzes the relative positions of the origin w = 0, the reference w0, and the true solution w*. Fig. 6(a) exemplifies the case "good reference / weak bias", whereas Fig. 6(b) illustrates the case "good reference / strong bias". In both situations, the reference solution w0 is not far from the true solution w*. Yet, the first case adopts a weak bias, which permits the biasing step to explore a relatively wide portion of the space around the reference (the lighter circumference, which represents the space of functions). On the other hand, the second case adopts a strong bias, thereby exploring a smaller portion of the space; eventually, the area explored via a strong bias does not include w*. Fig. 6(c) refers to the case "bad reference / weak bias": the reference w0 is quite distant from the true solution w*; by adopting a weak bias, though, one can still exploit biasing to reach w*. Finally, Fig. 6(d) presents the case "bad reference / strong bias": in this situation, a strong bias restricts the space to be explored to a small region around w0, so the proposed solution will be very distant from the true solution w*.

On the whole, the four examples affirm that, by modulating the biasing mechanism through the parameters λ1 and λ2, one can take full advantage of the semi-supervised scheme and properly support the model selection procedure. Eventually, the proposed framework involves a novel semi-supervised classification scheme, which supports fully automated model selection and can be applied even when the size of the labeled dataset is small (l < 50).

4. Experimental results

The proposed affective commonsense reasoning architecture based on bSVM and bRLS was tested on the publicly available AffectNet benchmark (http://sentic.net/downloads). The AffectNet database provides four affective labels for 2257 concepts. Each label corresponds to a level of activation in the Hourglass model. Concepts are described according to the m-dimensional vector space defined by AffectiveSpace 2. In the present experimental session, two different configurations of AffectiveSpace were compared: m = 100 and m = 50.

In order to robustly evaluate the performance of the proposed classification framework, a cross-validation procedure was applied. In particular, we considered ten different experimental runs. In each run, 200 concepts randomly extracted from the complete database provided the test set; the remaining concepts were evenly split into a training set and a validation set. In order to test the semi-supervised approach, 500 unlabeled patterns randomly extracted from a total of 16,431 unlabeled concepts were added to the training set. The reference function of the bSVM and bRLS frameworks can be derived from any clustering algorithm; we chose for this purpose the k-means algorithm.

In the present setup, the validation set was designed to support the model selection phase, i.e., the selection of the best parameterization for the classifier. An RBF kernel was adopted, thus the model selection actually involved two parameters: γ and C. In each run, the classification accuracy was measured by using the prediction system obtained after the model selection phase. Only the patterns included in the test set were exploited to assess classification accuracy, i.e., the patterns that were not involved in the training phase or in the model selection phase.
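A minimal sketch of this validation-driven model selection follows, reusing the rbf_kernel, bsvm_solve, and bsvm_predict helpers defined above; the parameter grids and the fixed λ2 value are illustrative assumptions, not the values used in the paper.

```python
import numpy as np
from itertools import product

def select_model(X_tr, y_tr, X_val, y_val, f0_tr, f0_val,
                 gammas=(0.01, 0.1, 1.0), Cs=(0.1, 1.0, 10.0), lam2=1.0):
    """Pick (gamma, C) for the RBF-kernel bSVM by validation accuracy."""
    best_params, best_acc = None, -1.0
    for gamma, C in product(gammas, Cs):
        K_tr = rbf_kernel(X_tr, X_tr, gamma)          # train-kernel for this gamma
        alpha = bsvm_solve(K_tr, y_tr, f0_tr, C=C, lam2=lam2)
        K_val = rbf_kernel(X_val, X_tr, gamma)        # validation-vs-train kernel
        pred = np.sign(bsvm_predict(K_val, alpha, y_tr, f0_val, lam2=lam2))
        acc = (pred == y_val).mean()
        if acc > best_acc:
            best_params, best_acc = (gamma, C), acc
    return best_params, best_acc
```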

Table 3 reports the average classification accuracy obtained by the bSVM and bRLS frameworks over the ten runs. The table provides the results of the two different experiments addressed in this research: the experiment involving the 100-dimensional AffectiveSpace 2 and the experiment involving the 50-dimensional AffectiveSpace 2. Classification accuracy is evaluated according to two different criteria. The first criterion, "Strict Accuracy", refers to the percentage of patterns for which the classification framework correctly predicted the activation level for every affective dimension. The second criterion, "Relaxed Accuracy", assumes that the prediction is correct even in the event the framework fails to properly assess one affective dimension out of four. The relaxed accuracy actually takes into account that a certain level of noise may affect the AffectNet benchmark, as affective activation levels are assigned according to subjective tests.

Table 3
Emotion recognition accuracy.

Method           Strict accuracy (%)   Relaxed accuracy (%)
bSVM (m = 100)   63                    87
bSVM (m = 50)    62                    89
bRLS (m = 100)   63.5                  88.5
bRLS (m = 50)    64                    90.5
ANN              46.9                  76.5
k-medoids        43.2                  74.1
k-NN             41.9                  72.3
Random           14.3                  40.1
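The two criteria are easy to pin down in code. Below is a small sketch, assuming predictions and ground truth are integer activation levels over the four Hourglass dimensions (the function name is illustrative):

```python
import numpy as np

def strict_relaxed_accuracy(pred, true):
    """pred, true: arrays of shape (n_patterns, 4) with predicted and
    annotated activation levels for the four affective dimensions."""
    wrong = (pred != true).sum(axis=1)   # wrongly assessed dimensions per pattern
    strict = (wrong == 0).mean()         # all four dimensions correct
    relaxed = (wrong <= 1).mean()        # at most one dimension wrong
    return strict, relaxed
```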

The results shown in Table 3 provide a few interesting outcomes. Firstly, both the bSVM-based and the bRLS-based frameworks proved able to attain very promising performance in terms of classification accuracy. Indeed, the proposed approach significantly improves over standard classification techniques such as the k-NN model, k-medoids, and an artificial neural network (ANN). Secondly, both frameworks also attained reliable performance when the 50-dimensional AffectiveSpace 2 was adopted, which considerably reduced the complexity of the reasoning architecture. Such results confirm the ability of the framework to deal with complex problems. Moreover, bRLS slightly improves over bSVM.

The proposed affective reasoning framework with bSVM was also evaluated on an opinion mining dataset derived from a corpus developed by Pang and Lee [32]. The corpus consists of 1000 positive and 1000 negative movie reviews from expert reviewers, collected from rottentomatoes.com. All text has been converted into lowercase and lemmatized, and HTML tags have been removed. Pang and Lee initially labelled each review manually as either positive or negative, following which the Stanford NLP group annotated the dataset at sentence level [54,55]: they extracted 11,855 sentences from the reviews and manually labeled them as positive or negative. We used the emotion-polarity formula provided by the Hourglass model to calculate a binary polarity value for each dataset sentence. Table 4 presents the comparison of the proposed system with the state-of-the-art accuracy.

Table 4
Comparison with the state of the art.

System                 Accuracy (%)
Socher et al. (2012)   80.00
Socher et al. (2013)   85.40
Proposed method        88.50
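For reference, the Hourglass emotion-polarity formula reported in the sentic computing literature computes the polarity of a sentence of N concepts as p = Σᵢ (Pleasantness(cᵢ) + |Attention(cᵢ)| − |Sensitivity(cᵢ)| + Aptitude(cᵢ)) / (3N). The sketch below applies it and thresholds at zero to obtain a binary label; the helper names are illustrative, and the formula should be checked against the Hourglass model paper.

```python
def hourglass_polarity(concepts):
    """concepts: list of (pleasantness, attention, sensitivity, aptitude)
    tuples with affective activation values in [-1, 1]."""
    n = len(concepts)
    total = sum(p + abs(a) - abs(s) + ap for p, a, s, ap in concepts)
    return total / (3 * n)                  # continuous polarity in [-1, 1]

def binary_polarity(concepts):
    """Threshold the continuous polarity into a binary sentence label."""
    return "positive" if hourglass_polarity(concepts) >= 0 else "negative"
```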

5. Conclusion

We live in a world where millions of people express their views and opinions of commercial products on the web on a daily basis. The distillation of knowledge from the massive amounts of unstructured information available is of considerable importance for tasks such as social media marketing, product positioning, and financial market prediction.

While existing approaches are limited by the fact that they work at word level and need a lot of training, semi-supervised commonsense reasoning seems a good solution to the problem of big social data analysis. In this work, a novel learning model based on the combined use of random projections and support vector machines was exploited to perform reasoning on a knowledge base of affective commonsense. Results showed a significant improvement in both emotion recognition and polarity detection and, hence, paved the way for semi-supervised learning approaches to big social data analysis.

References

[1] S. Poria, E. Cambria, R. Bajpai, A. Hussain, A review of affective computing: from unimodal analysis to multimodal fusion, Inf. Fusion 37 (2017) 98–125.
[2] E. Cambria, A. Hussain, Sentic Computing: A Common-Sense-Based Framework for Concept-Level Sentiment Analysis, Springer, Cham, Switzerland, 2015.
[3] E. Cambria, D. Olsher, K. Kwok, Sentic activation: a two-level affective common sense reasoning framework, in: Proceedings of AAAI, Toronto, 2012, pp. 186–192.
[4] R. Speer, C. Havasi, ConceptNet 5: a large semantic network for relational knowledge, in: E. Hovy, M. Johnson, G. Hirst (Eds.), Theory and Applications of Natural Language Processing, Springer, 2012.
[5] C. Strapparava, A. Valitutti, WordNet-Affect: an affective extension of WordNet, in: Proceedings of the International Conference on Language Resources and Evaluation, Lisbon, 2004, pp. 1083–1086.
[6] E. Cambria, D. Das, S. Bandyopadhyay, A. Feraco, A Practical Guide to Sentiment Analysis, Springer, Cham, Switzerland, 2017.
[7] F. Bisio, P. Gastaldo, R. Zunino, S. Decherchi, Semi-supervised machine learning approach for unknown malicious software detection, in: Proceedings of the IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), IEEE, 2014, pp. 52–59.
[8] S. Poria, E. Cambria, D. Hazarika, N. Mazumder, A. Zadeh, L.-P. Morency, Context-dependent sentiment analysis in user-generated videos, in: Proceedings of the ACL, 2017, pp. 873–883.
[9] I. Chaturvedi, E. Ragusa, P. Gastaldo, R. Zunino, E. Cambria, Bayesian network based extreme learning machine for subjectivity detection, J. Frankl. Inst. (2017).
[10] F. Xing, E. Cambria, R. Welsch, Natural language based financial forecasting: a survey, Artif. Intell. Rev. (2018), doi: 10.1007/s10462-017-9588-9.
[11] M. Ebrahimi, A. Hossein, A. Sheth, Challenges of sentiment analysis for dynamic events, IEEE Intell. Syst. 32 (5) (2017).
[12] E. Cambria, A. Hussain, T. Durrani, C. Havasi, C. Eckl, J. Munro, Sentic computing for patient centered application, in: Proceedings of the IEEE International Conference on Signal Processing, Beijing, 2010, pp. 1279–1282.
[13] A. Valdivia, V. Luzon, F. Herrera, Sentiment analysis in TripAdvisor, IEEE Intell. Syst. 32 (4) (2017) 2–7.
[14] S. Cavallari, V. Zheng, H. Cai, K. Chang, E. Cambria, Joint node and community embedding on graphs, in: Proceedings of the International Conference on Information and Knowledge Management, 2017.
[15] R. Mihalcea, A. Garimella, What men say, what women hear: finding gender-specific meaning shades, IEEE Intell. Syst. 31 (4) (2016) 62–67.
[16] E. Cambria, S. Poria, A. Gelbukh, M. Thelwall, Sentiment analysis is a big suitcase, IEEE Intell. Syst. 32 (6) (2017).
[17] S. Poria, I. Chaturvedi, E. Cambria, F. Bisio, Sentic LDA: improving on LDA with semantic similarity for aspect-based sentiment analysis, in: Proceedings of the International Joint Conference on Neural Networks, 2016, pp. 4465–4473.
[18] Y. Ma, E. Cambria, S. Gao, Label embedding for zero-shot fine-grained named entity typing, in: Proceedings of the International Conference on Computational Linguistics, Osaka, 2016, pp. 171–180.
[19] Y. Xia, E. Cambria, A. Hussain, H. Zhao, Word polarity disambiguation using Bayesian model and opinion-level features, Cognit. Comput. 7 (3) (2015) 369–380.
[20] X. Zhong, A. Sun, E. Cambria, Time expression analysis and recognition using syntactic token types and general heuristic rules, in: Proceedings of the ACL, 2017, pp. 420–429.
[21] N. Majumder, S. Poria, A. Gelbukh, E. Cambria, Deep learning-based document modeling for personality detection from text, IEEE Intell. Syst. 32 (2) (2017) 74–79.
[22] S. Poria, E. Cambria, D. Hazarika, P. Vij, A deeper look into sarcastic tweets using deep convolutional neural networks, in: Proceedings of the International Conference on Computational Linguistics, 2016, pp. 1601–1612.
[23] L. Oneto, F. Bisio, E. Cambria, D. Anguita, Statistical learning theory and ELM for big social data analysis, IEEE Comput. Intell. Mag. 11 (3) (2016) 45–55.
[24] C.D. Elliott, The Affective Reasoner: A Process Model of Emotions in a Multi-Agent System, Ph.D. thesis, Northwestern University, Evanston, 1992.
[25] A. Ortony, G. Clore, A. Collins, The Cognitive Structure of Emotions, Cambridge University Press, Cambridge, 1988.
[26] J. Wiebe, T. Wilson, C. Cardie, Annotating expressions of opinions and emotions in language, Lang. Resour. Eval. 39 (2) (2005) 165–210.
[27] T. Wilson, J. Wiebe, P. Hoffmann, Recognizing contextual polarity in phrase-level sentiment analysis, in: Proceedings of the Human Language Technology Conference and Empirical Methods in Natural Language Processing, Vancouver, 2005, pp. 347–354.
[28] R. Stevenson, J. Mikels, T. James, Characterization of the affective norms for English words by discrete emotional categories, Behav. Res. Methods 39 (2007) 1020–1024.
[29] S. Somasundaran, J. Wiebe, J. Ruppenhofer, Discourse level opinion interpretation, in: Proceedings of the International Conference on Computational Linguistics, Manchester, 2008, pp. 801–808.
[30] B. Pang, L. Lee, S. Vaithyanathan, Thumbs up? Sentiment classification using machine learning techniques, in: Proceedings of the Empirical Methods in Natural Language Processing, Philadelphia, 2002, pp. 79–86.
[31] B. Goertzel, K. Silverman, C. Hartley, S. Bugaj, M. Ross, The Baby Webmind project, in: Proceedings of the Adaptation in Artificial and Biological Systems, Birmingham, 2000.
[32] B. Pang, L. Lee, Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales, in: Proceedings of the ACL, Ann Arbor, 2005, pp. 115–124.
[33] M. Hu, B. Liu, Mining and summarizing customer reviews, in: Proceedings of the Conference on Knowledge Discovery and Data Mining, Seattle, 2004.
[34] L. Velikovich, S. Goldensohn, K. Hannan, R. McDonald, The viability of web-derived polarity lexicons, in: Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics, Los Angeles, 2010, pp. 777–785.
[35] A. Gangemi, V. Presutti, D. Reforgiato, Frame-based detection of opinion holders and topics: a model and a tool, IEEE Comput. Intell. Mag. 9 (1) (2014) 20–30.
