Neurocomputing 275 (2018) 1662–1673
Semi-supervised learning for big social data analysis
Amir Hussain a , Erik Cambria b , ∗
a School of Natural Sciences, University of Stirling, FK9 4LA, Stirling, UK
b School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798
Article info
Article history:
Received 1 May 2017
Revised 6 October 2017
Accepted 6 October 2017
Available online 18 October 2017
Communicated by Zidong Wang
Keywords:
Semi-supervised learning
Big social data analysis
Sentiment analysis
Abstract
In an era of social media and connectivity, web users are becoming increasingly enthusiastic about inter-
acting, sharing, and working together through online collaborative media. More recently, this collective
intelligence has spread to many different areas, with a growing impact on everyday life, such as in ed-
ucation, health, commerce and tourism, leading to an exponential growth in the size of the social Web.
However, the distillation of knowledge from such unstructured Big data is an extremely challenging task. Consequently, the semantic and multimodal contents of today's Web, whilst well suited for human use, remain barely accessible to machines. In this work, we explore the potential of a
novel semi-supervised learning model based on the combined use of random projection scaling as part of
a vector space model, and support vector machines to perform reasoning on a knowledge base. The latter
is developed by merging a graph representation of commonsense with a linguistic resource for the lexical
representation of affect. Comparative simulation results show a significant improvement in tasks such as
emotion recognition and polarity detection, and pave the way for development of future semi-supervised
learning approaches to big social data analytics.
© 2017 Elsevier B.V. All rights reserved.
1. Introduction
With the advent of social networks, web communities, blogs,
Wikipedia, and other forms of online collaborative media, the
way people express their opinions and sentiments has radically
changed in recent years [1] . These new tools have facilitated the
creation of original content, ideas, and opinions, connecting mil-
lions of people through the World Wide Web, in a financially and
labour-effective manner. This has made a huge source of informa-
tion and opinions easily available by the mere click of a mouse.
As a result, the distillation of knowledge from this huge
amount of unstructured information comes into vital play for mar-
keters looking to create and shape brand and product identities.
The practical purpose this encapsulates has led to the emerg-
ing field of big social data analysis, which deals with information
retrieval and knowledge discovery from natural language and so-
cial networks using graph mining and natural language processing
(NLP) techniques to distill knowledge and opinions from the huge
amount of information on the World Wide Web. Sentic comput-
ing [2] tackles these crucial issues by exploiting affective common-
sense reasoning, modeled upon the intrinsically human capacity to
∗ Corresponding author. E-mail addresses: [email protected] (A. Hussain), [email protected] (E. Cam-
bria).
https://doi.org/10.1016/j.neucom.2017.10.010
0925-2312/© 2017 Elsevier B.V. All rights reserved.

interpret cognitive and affective information associated with natural language, so as to infer new knowledge and make decisions in connection with one's social and emotional values, censors, and ideals. In other words, we can say that commonsense computing techniques are applied to narrow the semantic gap between word-level natural language data and the concept-level opinions conveyed by these.

In the past, graph mining techniques and multi-dimensionality reduction techniques [3] were employed on a knowledge base obtained by merging ConceptNet [4], a directed graph representation of commonsense knowledge, with WordNet-Affect (WNA) [5], a linguistic resource for the lexical representation of affect. Our research fits within the sentic computing framework and aims to exploit machine learning for developing a cognitive model for emotion recognition in natural language text. Unlike purely syntactical techniques, concept-based approaches can even detect subtly expressed sentiments, e.g., by analyzing concepts that do not explicitly convey any emotions but are linked implicitly to others which do. On this note, the bag-of-concepts model represents semantics associated with natural language much better than bag-of-words. Interestingly, in the bag-of-words model, a concept such as cloud_computing would be split into two separate words, and would hence disrupt the semantics of the input sentence (in which, for example, the word cloud could wrongly activate concepts related to weather).
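The contrast between the two representations can be illustrated with a short sketch (the tiny concept inventory and the greedy merging rule below are hypothetical simplifications, not the actual concept parser used in sentic computing):

```python
# Minimal sketch contrasting bag-of-words with bag-of-concepts.
# The concept inventory below is a hypothetical illustration.

CONCEPTS = {"cloud_computing", "buy_christmas_present"}

def bag_of_words(sentence):
    """Split on whitespace: multi-word concepts are lost."""
    return sentence.lower().split()

def bag_of_concepts(sentence, concepts=CONCEPTS):
    """Greedily merge adjacent tokens that form a known concept."""
    tokens = sentence.lower().split()
    result, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and f"{tokens[i]}_{tokens[i+1]}" in concepts:
            result.append(f"{tokens[i]}_{tokens[i+1]}")
            i += 2
        else:
            result.append(tokens[i])
            i += 1
    return result

sentence = "cloud computing is reliable"
print(bag_of_words(sentence))     # ['cloud', 'computing', 'is', 'reliable']
print(bag_of_concepts(sentence))  # ['cloud_computing', 'is', 'reliable']
```

With bag-of-words, the token cloud could wrongly activate weather-related concepts; the merged concept keeps the intended meaning intact.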
Concept-level analysis permits the inference of semantic and affective information associated with natural language opinions and, therefore, facilitates comparative fine-grained feature-based sentiment analysis [6]. Instead of collecting isolated opinions on a whole item (e.g., iPhone8), users prefer to compare different products according to specific features (e.g., iPhone8's vs. GalaxyS8's touchscreen), or even sub-features (e.g., fragility of iPhone8's vs. GalaxyS8's touchscreen). In this context, commonsense knowledge is essential for deconstructing natural language text into sentiments accurately; for instance, the concept go_read_the_book can be deemed positive if present in a book review, whereas in a movie review it would indicate negative feedback. The inference of emotions and polarity from natural language concepts is, however, a formidable task, as it requires advanced reasoning capabilities such as commonsense as well as analogical and affective reasoning.

In the proposed work, a novel semi-supervised learning model based on the merged use of multi-dimensional scaling (MDS) by means of random projections and a biased support vector machine (bSVM) [7] is exploited for the task of emotion recognition. Semi-supervised classification should demonstrate an improvement over the classification rule, as both unlabeled and labeled data are utilised to empirically learn a classification function (compared to only labeled data). Interest in semi-supervised learning has grown in recent times, which can be attributed to the existence of application domains (e.g., text mining, natural language processing, image and video retrieval, and bioinformatics). The proposed scheme can benefit from biased regularization, which provides a viable approach to implementing an inductive bias in a kernel machine. This is of fundamental importance in learning theory, given that it heavily influences the generalization ability of a learning system. From a mathematical perspective, inductive bias can be formalized as the set of assumptions which determine the choice of a particular class of functions for supporting the learning process. Therefore, it represents a powerful tool that embeds prior knowledge for the applicative problem at hand.

To this aim, semi-supervised learning is formalized as a supervised learning problem biased by an unsupervised reference solution. First, we introduce a novel, general biased-regularization scheme that integrates biased versions of two well-known kernel machines, specifically, support vector machines (SVMs) and regularized least squares (RLS). Subsequently, we propose a semi-supervised learning model based on this biased-regularization scheme, adopting a two-stage procedure. In the first stage, a reference solution is obtained using an unsupervised clustering of the complete dataset (including both unlabeled and labeled data). A primary impact of this is that the eventual semi-supervised classification framework can derive the reference function from any clustering algorithm, thus providing it with remarkable flexibility. In the following stage, clustering outcomes drive the learning process in a biased SVM (bSVM) or a biased RLS (bRLS) to acquire class information provided by the labels. The final outcome is that the overall learned function utilizes both labeled and unlabeled data. The developed framework is applicable to linear and non-linear data distributions: the former works based on a cluster assumption applied to the data, whilst the latter operates based on a manifold hypothesis. Consequently, a semi-supervised learning process can only be valid when unlabeled data can assume an intrinsic geometric structure, for example, a low-dimensional non-linear manifold in the ideal case. With respect to previous strategies, the results demonstrate significant enhancements and pave the way for future development of semi-supervised learning approaches echoing that of affective commonsense reasoning.
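The two-stage procedure can be sketched on toy one-dimensional data as follows (k-means stands in for the generic clustering step, and a simple label-voting rule stands in for the biased SVM/RLS machines; all data and names are invented for illustration):

```python
import random

def kmeans2(points, iters=20, seed=0):
    """Stage 1: unsupervised clustering of ALL points (labeled + unlabeled)."""
    rng = random.Random(seed)
    c = rng.sample(points, 2)
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            groups[0 if abs(p - c[0]) <= abs(p - c[1]) else 1].append(p)
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
    return c

def reference_solution(points, centroids):
    """The cluster assignment acts as the unsupervised reference function."""
    return [0 if abs(p - centroids[0]) <= abs(p - centroids[1]) else 1
            for p in points]

def align_with_labels(assignments, labeled_idx, labels):
    """Stage 2 (simplified): use the few labels to name the clusters.
    The paper instead biases an SVM/RLS towards the reference solution."""
    votes = {0: [], 1: []}
    for i, y in zip(labeled_idx, labels):
        votes[assignments[i]].append(y)
    mapping = {k: max(set(v), key=v.count) if v else k for k, v in votes.items()}
    return [mapping[a] for a in assignments]

# toy data: two 1-D clusters; only the last two points carry labels
unlabeled = [0.1, 0.2, 0.3, 5.1, 5.2, 5.3]
points = unlabeled + [0.15, 5.15]
labeled_idx, labels = [6, 7], [0, 1]   # 0 = negative, 1 = positive

cents = kmeans2(points)
assign = reference_solution(points, cents)
pred = align_with_labels(assign, labeled_idx, labels)
print(pred)  # points near 0 get label 0, points near 5 get label 1
```

The key point mirrored here is that the unlabeled points shape the decision structure, while the two labels only supply class information afterwards.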
The rest of this paper is organized as follows: Section 2 introduces related work in the field of sentiment analysis research; Section 3 describes in detail the new semi-supervised learning architecture for affective commonsense reasoning; Section 4 illustrates results obtained by applying the new model to an affective benchmark and to an opinion mining dataset; finally, Section 5 offers some concluding remarks and recommendations for future work.

2. Related work

In recent years, sentiment analysis [6] has become increasingly popular for processing social media data on online communities, blogs, Wikis, microblogging platforms, and other online collaborative media. Sentiment analysis is a branch of affective computing research that aims to classify text (but sometimes also audio and video [8]) into either positive or negative (but sometimes also neutral [9]). Sentiment analysis has raised growing interest both within the scientific community, leading to many exciting open challenges, as well as in the business world, due to the remarkable benefits to be had from financial forecasting [10] and political forecasting [11], e-health [12] and e-tourism [13], community detection [14] and user profiling [15], and more.

While most works approach it as a simple categorization problem, sentiment analysis is actually a suitcase research problem [16] that requires tackling many NLP tasks, including aspect extraction [17], named entity recognition [18], word polarity disambiguation [19], temporal tagging [20], personality recognition [21], and sarcasm detection [22]. Most existing approaches to sentiment analysis rely on the extraction of a vector representing the most salient and important text features, which is later used for classification purposes [23]. Some of the most commonly used features are term frequency and presence. The latter is a binary-valued feature vector in which the entries merely indicate whether a term occurs (value 1) or not (value 0). Feature vectors can also include other term-based features: position is one such example, considering that the position of a token in a text unit can alter the token's effect on the text's sentiment. The presence of n-grams, usually bi-grams and tri-grams, can also be useful as features, as one can find methods which depend on distances between terms. Part-of-speech (POS) information (for example, nouns, verbs, adverbs, and adjectives) is commonly utilized for general textual analysis, in a basic form of word-sense disambiguation. Some specific adjectives have been proven to be useful indicators of sentiment, and guides for feature selection in sentiment classification. Lastly, other studies carried out the detection of sentiments via selected phrases, chosen through pre-specified POS patterns, the majority of which contain either an adverb or an adjective. Numerous approaches exist which map given pieces of text to labels from predefined sets of categories, or to real numbers representing a polarity degree. Nonetheless, these approaches and their performances are confined to an application's domain and relevant areas.

The transformation of sentiment analysis research can be evaluated by examining the token of analysis used, along with the implicit information associated with it. In this way, current approaches can be sorted into four primary categories: keyword spotting, statistical methods, lexical affinity, and concept-based techniques.

Keyword spotting is very naïve and is the most popular approach, due to its accessibility and economical nature. Text can be grouped into affect categories depending on the presence of fairly unambiguous affect words like 'happy', 'bored', 'afraid', and 'sad'. For example, Elliott's Affective Reasoner [24] checks for 198 affect keywords (e.g., 'distressed', 'enraged') and affect intensity modifiers (e.g., 'extremely', 'mildly', and 'somewhat'). Other popular sources of affect words are Ortony's Affective Lexicon [25], which groups terms into affective categories, and Wiebe's linguistic annotation scheme [26].
Fig. 1. The Hourglass of Emotions.
Lexical ‘affinity’ is slightly more sophisticated than keyword spotting as, rather than simply detecting obvious affect words, it assigns specific words a probabilistic ‘affinity’ for a particular emotion. For example, accident might be assigned a 75% probability of being indicative of a negative affect, as in the case of situations such as car_accident or hurt_by_accident. These probabilities are usually trained from linguistic corpora [27–29].
Statistical methods, such as latent semantic analysis (LSA) and SVM, have proven useful for the affective cataloging of
texts. Researchers have applied these approaches on projects like
Pang’s movie review classifier [30] , Goertzel’s Webmind [31] , and
more [17,32–34] . By feeding a machine learning algorithm a large
training corpus of affectively annotated texts, it is possible for
the systems to not only learn the affective valence of affect key-
words, but to also take into account the valence of other arbi-
trary keywords (like lexical affinity), punctuation, and word co-
occurrence frequencies. Statistical methods, however, tend to be
semantically weak, which means that, with the exception of ob-
vious affect keywords, other lexical or co-occurrence elements in
a statistical model have little predictive value individually. Hence,
statistical classifiers only have an acceptable accuracy when given
a large enough text input. Therefore, while these methods may
be able to classify text at a paragraph or page level, they are not
effective for smaller text units such as sentences.
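A minimal sketch of such a statistical classifier, assuming binary term-presence features and a perceptron-style linear learner standing in for LSA or SVM (the toy corpus and vocabulary are invented):

```python
def presence_vector(text, vocab):
    """Binary term-presence features: 1 if the term occurs, else 0."""
    tokens = set(text.lower().split())
    return [1 if w in tokens else 0 for w in vocab]

def train_perceptron(data, vocab, epochs=10):
    """Learn a linear separator from affectively annotated texts."""
    w = [0.0] * len(vocab)
    b = 0.0
    for _ in range(epochs):
        for text, y in data:            # y is +1 (positive) or -1 (negative)
            x = presence_vector(text, vocab)
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

vocab = ["great", "love", "boring", "awful", "plot"]
corpus = [("great movie I love it", +1),
          ("boring plot awful acting", -1),
          ("I love the plot", +1),
          ("awful and boring", -1)]
w, b = train_perceptron(corpus, vocab)
score = sum(wi * xi
            for wi, xi in zip(w, presence_vector("what a great plot", vocab))) + b
print("positive" if score > 0 else "negative")
```

The semantic weakness noted above is visible even here: the classifier only knows the words it has seen co-occurring with labels, so very short or vocabulary-poor inputs give it little to work with.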
Concept-based approaches focus on a semantic analysis of text
through the use of web ontologies [35] or semantic networks [36] ,
which require grasping the conceptual and affective information
associated with natural language opinions. By relying on large
semantic knowledge bases, such approaches step away from the
blind use of keywords and word co-occurrence count, instead re-
lying on the implicit meaning/features associated with natural lan-
guage concepts. Concept-based approaches differ from purely syn-
tactical techniques, in that they can detect sentiments which are
expressed in a subtle manner, e.g., through the analysis of con-
cepts which are implicitly linked to other concepts that express
emotions.
3. Semi-supervised reasoning
In the proposed framework, MDS is used to represent concepts
in a multi-dimensional vector space and biased SVM (bSVM) is
exploited to infer semantics and sentics (that is, the conceptual
and affective information) associated with such concepts, according
to an hourglass-shaped emotion categorization model [2] ( Fig. 1 ).
Under the aforementioned model, sentiments are organized around
four independent dimensions (Pleasantness, Attention, Sensitivity,
and Aptitude) whose different levels of activation make up the
total emotional state of the mind. The bSVM model is adopted
as a semi-supervised approach to tackle the classification task so
as to overcome the lack of labeled commonsense data. In semi-
supervised classification, in fact, both unlabeled and labeled data
are exploited to learn a classification function empirically instead
of learning a classification rule based only on labeled data. As
a result, concepts for which affective information is missing can
be employed in the classification phase. The main purpose of the
bSVM-based framework developed in this study is to predict the degree of affective valence each concept possesses in a particular facet of the Hourglass model.
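The representation described above can be sketched as follows; the four dimension names come from the Hourglass model, but the activation values and the polarity aggregation shown here are illustrative assumptions, not the model's official definition:

```python
# Sketch: a concept's affective state as activations over the four
# Hourglass dimensions. The numeric values are made up, and the
# polarity aggregation below is an illustrative assumption.

HOURGLASS_DIMS = ("Pleasantness", "Attention", "Sensitivity", "Aptitude")

def polarity(sentics):
    """Collapse a 4-D sentic vector into a single polarity score."""
    p, a, s, t = (sentics[d] for d in HOURGLASS_DIMS)
    return (p + abs(a) - abs(s) + t) / 3.0

birthday_party = {"Pleasantness": 0.8, "Attention": 0.4,
                  "Sensitivity": -0.1, "Aptitude": 0.6}
rotten_fish = {"Pleasantness": -0.7, "Attention": -0.2,
               "Sensitivity": 0.5, "Aptitude": -0.4}

print(round(polarity(birthday_party), 3))  # positive concept
print(round(polarity(rotten_fish), 3))     # negative concept
```

In the paper's framework, the bSVM predicts each of the four activations for a concept; collapsing them into a polarity score is then a simple post-processing step.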
3.1. Affective commonsense knowledge base
The core module of the framework hereby proposed is an
affective commonsense knowledge base built upon ConceptNet, the
graph representation of the Open Mind Common Sense (OMCS)
corpus, which is structurally similar to WordNet [37] , but whose
scope of contents is general world knowledge, in the same vein
as Cyc [38]. Instead of insisting on formalizing commonsense reasoning using mathematical logic [39], ConceptNet uses a new approach: it represents multi-word expressions in the form of a semantic network and makes them available for use in NLP.

WordNet focuses on lexical categorization and word-similarity determination, and Cyc focuses on formalized logical reasoning. Contrastingly, ConceptNet is characterized by contextual commonsense reasoning: this means that it is meant for the task of making practical context-based inferences over real-world texts. In ConceptNet, WordNet's notion of a node in the semantic network is extended from purely lexical items (words and simple phrases with atomic meaning) to include higher-order compound concepts such as satisfy_hunger and follow_recipe, so as to represent knowledge of a greater range of concepts found in everyday life.

Moreover, in ConceptNet, WordNet's repertoire of semantic relations is extended from the triplet of Synonym, IsA and PartOf, to a repertoire of twenty semantic relations including, for example, EffectOf (causality), SubeventOf (event hierarchy), CapableOf (agent's ability), MotivationOf (affect), PropertyOf, and LocationOf. ConceptNet's knowledge is also of a more informal and practical nature. For example, WordNet has formal taxonomic knowledge that a 'dog' is a 'canine', which is a 'carnivore', which is a 'placental mammal'; but it cannot make the logically oriented member-to-set association that a dog falls under the categories of pet or family_member. ConceptNet, on the other hand,
Fig. 2. AffectNet.
Fig. 3. AffectiveSpace’s first two eigenmoods.
describes something that is often true but not always. For instance, EffectOf(fall_off_bicycle, get_hurt); this shows it contains a lot of knowledge that is defeasible, which is something that cannot be left aside in commonsense reasoning. Most of the facts that interrelate ConceptNet's semantic network are dedicated to making rather generic connections between concepts.

ConceptNet is obtained via an automatic process, in which a set of extraction rules is applied to the semi-structured English sentences of the OMCS corpus, following which an additional set of 'relaxation' procedures is applied. This results in network gaps being filled in and smoothed over, thereby optimizing the connectivity of the semantic network. ConceptNet is a good source of commonsense knowledge, but it alone is not enough for sentiment analysis despite detailing the semantic links between concepts, given that it frequently misses associations between concepts that express the same sentiment. To surmount this flaw, we employed WNA, a semantic resource for the etymological embodiment of affective knowledge, which was built upon WordNet. WNA was developed by allocating a number of WordNet synsets to one or more affective labels (a-labels). For instance, synsets marked with the a-label 'emotion' demarcate concepts representing emotional states. There are also other a-labels for concepts representing situations that provoke emotions, emotional responses, as well as moods. WNA was born through a dual-stage process: the first stage pertained to the documentation of a base core of affective synsets, while the second saw the expansion of the core with associations outlined in WordNet. ConceptNet and WNA are then combined through the linear fusion of their respective matrix representations into a single matrix, in which the knowledge of the two databases is shared. For this combination process, the input data from both sources has to be transmuted so that it can be denoted in its entirety in the same matrix. Hence, the lemma forms of ConceptNet notions are allied with the lemma forms of words in WNA, and the most common associations in WNA are charted into ConceptNet's set of relations. For instance, Holonym is charted into PartOf and Hypernym into IsA. Effectively, we transfigure ConceptNet into a matrix by divvying each assertion into two parts: a concept and a feature, wherein a feature is a contention without any quantified concept; for example, 'is a type of fluid'.

Based on the dependability of assertions, the resulting entries in the matrix are either positive or negative scores, whose magnitude increases proportionally to that dependability. Like ConceptNet, WNA is also denoted as a matrix, in which rows are affective concepts and columns are associated qualities. In combining ConceptNet and WNA into a single matrix, we created a new affective semantic network, in which commonsense concepts are interrelated with a graded system of affective domain tags. We call this new framework AffectNet [2] (Fig. 2); through it, commonsense and emotional knowledge are not merely connected, but instead melded together. For example, concepts from daily life such as meet_people or have_breakfast are now associated with emotional domain notions such as 'joy', 'anger', or 'surprise'. Such semantic associations can be of much benefit when tasks such as emotion recognition or polarity detection from natural language text are performed, as it is common for both sentiments and opinions to be conveyed implicitly through context and domain-dependent concepts, instead of through specific affect words.
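The assertion-splitting step described above can be sketched as follows (all triples and confidence scores are invented examples):

```python
# Sketch: turning commonsense assertions into a sparse concept-feature
# matrix. Each assertion (concept, relation, object, score) is split into
# a concept ("dog") and a feature ("IsA-pet"); entries are signed scores
# reflecting the dependability of the assertion. All triples are invented.

assertions = [
    ("dog", "IsA", "pet", 0.981),
    ("dog", "Arises", "joy", 0.789),
    ("cupcake", "KindOf", "food", 0.922),
    ("rotten_fish", "Arises", "joy", -0.5),  # negative score: unreliable link
]

matrix = {}  # concept -> {feature: score}
for concept, relation, obj, score in assertions:
    feature = f"{relation}-{obj}"
    matrix.setdefault(concept, {})[feature] = score

print(matrix["dog"])  # {'IsA-pet': 0.981, 'Arises-joy': 0.789}
```

A dict-of-dicts is used here only because the real matrix is extremely sparse; any sparse matrix format would serve the same purpose.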
3.2. Affective analogical reasoning

Problems are solved best when we understand a solution for them. The tricky part is when we encounter problems we have never faced before, which require intuition. Intuition can be explained as the process of forming analogies between a current problem and ones we have cracked previously, in order to find a suitable solution. This process of thinking (Fig. 3) could well be the essence of human intelligence, as in daily life no two situations are identical, and we are continually having to apply analogical reasoning to solve problems and make decisions. The human mind continuously applies compression in all vital relations [40]. The compression principles aim to convert diffuse and distended conceptual structures to more focused versions, so as to become more congenial for human understanding.

In order to emulate such a process, MDS was previously applied on the matrix representation of AffectNet, a semantic network in which commonsense concepts were linked to semantic and affective features (Table 1). The result was AffectiveSpace. PCA is the most widely used data-aware dimensionality reduction method [41] and is closely related to the low-rank approximation method, singular value decomposition (SVD), as both work on a transformed version of the data matrix [42]. Therefore, truncated singular value decomposition (TSVD) is applied to the
Table 1
A snippet of the AffectNet matrix.
AffectNet IsA-pet KindOf-food Arises-joy ...
Dog 0.981 0 0.789 ...
Cupcake 0 0.922 0.910 ...
Songbird 0.672 0 0.862 ...
Gift 0 0 0.899 ...
Sandwich 0 0.853 0.768 ...
Rotten fish 0 0.459 0 ...
Win lottery 0 0 0.991 ...
Bunny 0.611 0.892 0.594 ...
Police man 0 0 0 ...
Cat 0.913 0 0.699 ...
Rattlesnake 0.432 0.235 0 ...
... ... ... ... ...
concept-feature matrix for the purposes of expediently diminish-
ing its dimensionality and netting key links.
The purpose of this is to ensure that the new joint model com-
prises solely of the key features that exemplify the general outlook.
Employing TSVD on AffectNet results in it defining other qualities
that could be of parallel relevance to known affective concepts. If
there is no known value for an item in a concept in the matrix,
but the same item is inherent in other analogous concepts, then
by parallel association, it is highly probable that the same item is
inherent in the concept as well. Essentially, concepts and features
that are aligned and possess high dot products are ideal contenders
for analogies.
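The dot-product criterion for analogy candidates can be sketched with NumPy (the feature values are invented, loosely echoing the AffectNet snippet in Table 1):

```python
import numpy as np

# Sketch: concepts whose feature vectors have a high dot product are
# candidates for analogy. Features: IsA-pet, KindOf-food, Arises-joy;
# all values are invented.

concepts = {
    "dog":         np.array([0.981, 0.0,   0.789]),
    "cat":         np.array([0.913, 0.0,   0.699]),
    "cupcake":     np.array([0.0,   0.922, 0.910]),
    "rotten_fish": np.array([0.0,   0.459, 0.0]),
}

def best_analogy(name):
    """Return the other concept with the highest dot product."""
    v = concepts[name]
    others = {k: float(np.dot(v, w)) for k, w in concepts.items() if k != name}
    return max(others, key=others.get)

print(best_analogy("dog"))  # 'cat': shares pet-ness and joy
```

Note that dog and cat score highly with each other because they share both a semantic feature (IsA-pet) and an affective one (Arises-joy), exactly the kind of parallel relevance the text describes.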
Osgood et al. [43] produced a revolutionary piece on compre-
hending and visualizing affective knowledge connected to natural
language text. In their work, MDS was utilized to construct visu-
alizations of affective texts based on their parallel scores when
contrasted with texts from other cultures. In a multi-dimensional
space, words can be conceived of as points, and parallel scores
then denote the distances between words. MDS rethinks these dis-
tances as points in a reduced dimensional space (most commonly
two or three dimensions). Similarly, AffectiveSpace's purpose is
to visualize the semantic and affective likeness between dissimi-
lar concepts by projecting them onto a multi-dimensional vector
space.
However, different from that in Osgood’s research, the basic
foundation of AffectiveSpace is not merely a restricted set of par-
allel scores between affect words, but instead consists of millions
of confidence scores linked to bits of commonsense knowledge
that are in turn related to a structured system of affective domain
labels. Instead of being defined by just a small number of human
annotators and characterized as a word-word matrix, AffectiveS-
pace is solidly based on AffectNet and denoted as a concept-feature
matrix. SVD seeks to decompose the AffectNet matrix A ∈ R^{n×d} into three components,

A = U S V^T,   (1)

where U and V are unitary matrices, and S is a rectangular diagonal matrix with nonnegative real numbers on the diagonal.
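The decomposition in Eq. (1), and the truncation used to build AffectiveSpace, can be sketched with NumPy on random stand-in data (not the real AffectNet matrix):

```python
import numpy as np

# Sketch of Eq. (1): A = U S V^T, and of the rank-k truncation.
# A is random stand-in data, not the real AffectNet matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U @ np.diag(s) @ Vt)   # exact reconstruction

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation
coords = U[:, :k] * s[:k]   # each row: a "concept" in k eigenmood coordinates
print(coords.shape)  # (8, 2)
```

Here the rows of `coords` play the role of concepts represented by vectors of k coordinates, as described in the text.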
SVD has been proved to be optimal in preserving any unitarily invariant norm 1 ‖·‖_M [42]:

‖A − A_k‖_M = min_{rank(B)=k} ‖A − B‖_M,   (2)
where A_k, i.e., AffectiveSpace, is formed by retaining only the top k singular values in S. Hence, in AffectiveSpace, commonsense concepts and emotions are represented by vectors of k coordinates.
These coordinates can be seen as describing concepts in terms of
1 A norm ‖ · ‖ M is unitarily invariant if ‖ UAV ‖ M = ‖ A ‖ M for all A and all unitary U , V .
‘eigenmoods’ which form the axes of AffectiveSpace, i.e., the basis e_0, ..., e_{k−1} of the vector space. For example, the most significant eigenmood, e_0, represents concepts with positive affective valence. That is, the larger a concept’s component in the e_0 direction is, the more affectively positive it is likely to be. Correspondingly, concepts with negative e_0 components are likely to have negative affective valence.

Hence, by exploiting the information sharing property of SVD, concepts with the same affective valence are likely to have similar features. This means that concepts which convey the same emotion are more likely to fall in close proximity to each other in AffectiveSpace. Concept similarity depends not on their absolute positions in vector space, but on the angle they make with the origin. For example, concepts such as beautiful_day, birthday_party, and make_someone_happy are found very close in direction in the vector space, while concepts like feel_guilty, be_laid_off, and shed_tear are found in a completely different direction (nearly opposite with respect to the centre of the space).

The difficulty with this sort of representation is a lack of model scalability: as the number of concepts and semantic features increases, the AffectNet matrix becomes so high-dimensional and sparse that it can no longer be computed by the SVD [44]. Although there has been substantial research seeking fast approximations of the SVD, the approximate methods are at best ≈ 5 times faster than the standard one [42]; hence they are not viable for real-world big data applications.

There has been conjecture that neuronal learning has simple, yet powerful meta-algorithms underlying it [45], which should be fast, effective, scalable and biologically plausible, as well as having few-to-no assumptions [44]. Optimizing all the ≈ 10^15 connections through the last few million years’ evolution is very unlikely [44]. Alternatively, nature probably only optimizes the global connectivity (mainly white matter), but leaves the other details to randomness [44].

To handle the growing number of concepts and semantic features, we replace SVD with random projection (RP) [46], a data-oblivious method, to map the original high-dimensional dataset into a much lower-dimensional subspace by using a Gaussian N(0, 1) matrix, while preserving the pair-wise distances with high probability. This theoretically strong and empirically verified statement follows the Johnson–Lindenstrauss (JL) lemma [44]. The JL lemma states that, with high probability, for all pairs of points x, y ∈ X simultaneously,

√(m/d) ‖x − y‖_2 (1 − ε) ≤ ‖Φx − Φy‖_2 ≤ √(m/d) ‖x − y‖_2 (1 + ε),   (3)

where X is a set of vectors in Euclidean space, d is the original dimension of this Euclidean space, m is the dimension of the space we wish to reduce the data points to, ε is a tolerance parameter measuring the maximum allowed distortion rate of the metric space, and Φ is a random matrix.

Structured random projection for making matrix multiplication much faster was introduced in [47]. Achlioptas [48] proposed sparse random projection, replacing the Gaussian matrix with i.i.d. entries

φ_ji = √s · { +1 with prob. 1/(2s);  0 with prob. 1 − 1/s;  −1 with prob. 1/(2s) },   (5)
Fig. 4. AffectiveSpace 2.
here one can achieve a × 3 speedup by setting s = 3 , since only1 3 of the data need to be processed. However, since our input ma-
rix is already too sparse, we avoid using sparse random projection.
When the number of features is a lot larger than the num-
er of training samples ( d � n ), subsampled randomized Hadamardransform (SRHT) is preferred, as it behaves similarly to Gaussian
andom matrices but also accelerates the process from O(n d) to(n log d) time [49] . Following [49,50] , for d = 2 p where p is any
ositive integer, a SRHT can be defined as:
= √
d
m RHD , (6)
where

• m is the number of features we want to subsample, randomly, from the d original features.
• R is a random m × d matrix. The rows of R are m uniform samples (without replacement) from the standard basis of R^d.
• H ∈ R^(d×d) is a normalized Walsh–Hadamard matrix, defined recursively: H_d = [ H_(d/2)  H_(d/2) ; H_(d/2)  −H_(d/2) ], with H_2 = [ +1 +1 ; +1 −1 ].
• D is a d × d diagonal matrix whose diagonal elements are i.i.d. Rademacher random variables.
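The construction above can be sketched directly (a minimal NumPy version; the sizes d and m are illustrative, and d must be a power of two for the Hadamard recursion):

```python
import numpy as np

def srht(d, m, rng):
    """Subsampled randomized Hadamard transform Phi = sqrt(d/m) * R H D
    (cf. Eq. (6)). Requires d to be a power of two."""
    # Normalized Walsh-Hadamard matrix, built by the recursion
    # H_d = [[H_{d/2}, H_{d/2}], [H_{d/2}, -H_{d/2}]], starting from H_1 = [1].
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    H /= np.sqrt(d)                        # normalization: H^T H = I
    # D: diagonal of i.i.d. Rademacher (+/-1) variables.
    D = np.diag(rng.choice([-1.0, 1.0], size=d))
    # R: m rows sampled without replacement from the standard basis of R^d.
    rows = rng.choice(d, size=m, replace=False)
    R = np.eye(d)[rows]
    return np.sqrt(d / m) * R @ H @ D

rng = np.random.default_rng(1)
d, m = 512, 64
Phi = srht(d, m, rng)
x = rng.standard_normal(d)
# The projection preserves squared norms in expectation: E||Phi x||^2 = ||x||^2.
print(np.linalg.norm(Phi @ x) / np.linalg.norm(x))   # ratio close to 1
```

Because HD is orthogonal and R rescaled by √(d/m) is an isometry in expectation, the norm ratio concentrates around 1; with m = d the transform is exactly orthogonal.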
Our subsequent analysis relies solely on distances and angles between pairs of vectors (i.e., the Euclidean geometry information), so it is sufficient to set the projected space to be logarithmic in the size of the data [51] and apply SRHT. The result is a new vector space model, AffectiveSpace 2 (Fig. 4), which preserves the semantic and affective relatedness of commonsense concepts while being highly scalable [52]. The key to performing commonsense reasoning is to find a good trade-off for knowledge representation. Since no two situations in life are ever exactly the same, no representation should be too concrete, or else it will fail to apply to new situations. At the same time, no representation should be too abstract, or it risks suppressing too many details.

Within AffectiveSpace 2, this knowledge representation trade-off can be seen in the choice of the vector space dimensionality. The number m indicates the dimension of AffectiveSpace and, in fact, is a measure of the trade-off between accuracy and efficiency in the representation of the affective commonsense knowledge base. The bigger m is, the more precisely AffectiveSpace represents AffectNet's knowledge; the smaller m is, on the other hand, the more efficiently AffectiveSpace represents affective commonsense knowledge, both in terms of vector space generation and of dot product computation.
3.3. Semi-supervised learning approach
Even though AffectiveSpace 2 is a powerful tool for discovering the semantic and affective relatedness of natural language concepts, reasoning by analogy in such a multi-dimensional vector space is a difficult task, as the distribution of concepts in the space is non-linear and only the affective valence of a relatively small set of concepts is known a priori. Hence, the bSVM and bRLS models are adopted as a semi-supervised approach in order to exploit both unlabeled and labeled commonsense data to learn a classification function empirically. They are both based on biased regularization theory, realized as follows: a reference solution (e.g., a hyperplane) is used to bias the solution of a regularization-based learning machine.
3.3.1. Regularization-based learning
Modern classification methods often rely on regularization theory. In a regularized functional, a positive parameter, λ, rules the trade-off between the empirical risk, R_emp[f] (the loss function), of the decision function f (i.e., regression or classification) and a regularizing term. The cost to be minimized can be expressed as:

R_reg = R_emp[f] + λ Ω[f], (7)

where the regularization operator, Ω[f], quantifies the complexity of the class of functions from which f is drawn. Usually f belongs to a Reproducing Kernel Hilbert Space (RKHS) H. For a data set, X, one computes a square matrix, K, which is symmetric and positive definite. Every entry K(s, x) can be viewed as the inner product ⟨φ(s), φ(x)⟩, where φ(·) is the (implicit, non-linear) mapping function uniquely defined by H. A kernel method implies the choice of an inner-product formulation; the simplest, linear kernel supports the inner vector product in the actual domain space: ⟨φ(s), φ(x)⟩ ≡ ⟨s, x⟩.

When dealing with maximum-margin algorithms, Ω[f] is implemented by the term ‖f‖²_H, which supports a square norm in the feature space. The Representer Theorem proves that, when Ω[f] = ‖f‖²_H, the solution of the regularized cost can be expressed as a finite summation over a set of labeled training patterns X = {x_i, y_i}, i = 1, …, l, y_i ∈ {−1, +1}:

f(x_j) = w · x_j = Σ_{i=1}^{l} β_i K(x_i, x_j). (8)
SVM and RLS are popular methods belonging to this family of regularizing algorithms; both provide excellent performance in pattern recognition problems. The two learning algorithms differ in their choice of loss function: the SVM model uses the 'hinge' loss function, whereas RLS operates on a square loss function.

The SVM training process requires one to solve the following optimization problem:

min_{w, b, ξ}  C Σ_{i=1}^{l} ξ_i + (1/2) ‖w‖²,   ξ_i ≥ 0, (9)

subject to:

y_i (wᵀ x_i) ≥ 1 − ξ_i, (10)

where ξ_i is a penalty term to be added for each misclassified pattern, and C is a hyperparameter that plays the role of 1/λ. The constant parameter b has been dropped because one can equivalently augment the space X with a feature of constant value +1. The problem can be efficiently solved in its dual form by using quadratic programming techniques. When using the dual formulation, one optimizes a set of Lagrange multipliers, α_i, and it can be shown that the series coefficients in (8) can be written as β_i = α_i y_i.
When dealing with the RLS model, the problem to be optimized is:

min_f  Σ_{i=1}^{l} (y_i − f_i)² + (λ/2) ‖f‖²_H, (11)

whose optimum in β is found by solving the following linear system:

(K + λI) β = y. (12)
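In practice, fitting RLS amounts to building the kernel matrix and making a single call to a linear solver for Eq. (12). A minimal NumPy sketch (toy data; the RBF kernel and the values of γ and λ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 3))                   # 40 toy patterns
y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(40))

# Symmetric, positive-definite RBF kernel matrix K.
gamma = 0.5
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq_dists)

# RLS optimum, Eq. (12): solve (K + lambda I) beta = y.
lam = 0.1
beta = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Learned function evaluated on the training points, as in Eq. (8).
f = K @ beta
print(np.mean(np.sign(f) == y))   # training (interpolation) accuracy
```

Note that this checks interpolation on the training set only; the accuracy is high because the regularized system reproduces the labels almost exactly for small λ.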
3.3.2. Maximal discrepancy bounds for model selection

One of the main obstacles in classification problems is tuning the classifier regularization parameter(s). When tackling limited-sample problems, strategies such as k-fold cross-validation may be difficult to apply due to the small size of both the training and the test sets. Thus, theoretical approaches that derive analytical expressions of the generalization bounds can give powerful options for attaining reliable model selection. These methods do not require data partitioning and are always based on the complexity of the hypothesis space, F. The bound on the true generalization error, R[f], is asserted with confidence at least 1 − δ, and is commonly written as the sum of several terms:

R[f] ≤ R_emp[f] + χ + Γ, (13)

where R_emp[f] is the error on the training set, χ measures the complexity of the space of classifying functions, and Γ penalizes the finiteness of the training sample.
The Maximal-Discrepancy bound (MD) belongs to the class of
theoretical approaches. In this research, the MD bound is exploited
to confirm that the proposed semi-supervised learning scheme can
ensure the shrinking of the generalization bound, thus providing
effective tools for model selection in a semi-supervised setting.
3.3.3. A biased regularization
A general biased regularization model is realized as follows: a
reference solution (e.g., a hyperplane) is used to bias the solution
of a regularization-based learning machine.
In a linear domain one can define a generic convex loss function, l(X, Y, w), and a biased regularizing term; the resulting cost function is:

l(X, Y, w) + λ1 ‖w − λ2 w0‖², (14)

where w0 is a reference hyperplane, λ1 is the classical regularization parameter that controls smoothness (e.g., 1/C in SVM), and λ2 controls the adherence to the reference solution w0. Expression (14) is a convex functional and thus admits a global solution. From (14) one gets:

l(X, Y, w) + (λ1/2) ‖w − λ2 w0‖² = l(X, Y, w) + (λ1/2) ‖w‖² − λ1 λ2 w · w0 + const, (15)

where the constant term, (λ1 λ2²/2) ‖w0‖², does not depend on w and can be ignored in the minimization. The expression actually involves two regularization parameters, λ1 and λ2; this problem setting differs from the one proposed for SVM, where only one regularization parameter was defined, obtaining l(X, Y, w) + λ1 ‖w − w0‖². The latter expression coincides with (14) in the special case λ2 = 1.

Fig. 5 explicates the role played by parameter λ2 in three different cases. In all these figures, w is set at the origin, 0, of the hypothesis space, a black square denotes the reference hyperplane, w0, and a grey square indicates the 'true' optimal solution. For the sake of clarity, and without loss of generality, the examples assume that:

1. λ1 is set to a fixed value (i.e., λ1 = 1).
2. The distance ‖w − λ2 w0‖ is constant for any λ2.
3. w_{λ2=0} (black triangle) is the best solution one can obtain from the unbiased learning (i.e., λ2 = 0). Here, the best solution refers to the solution that is closest to w* among all the possible solutions that lie at a distance ‖w − λ2 w0‖ from w0 (the dashed circumference).
Fig. 5(a) refers to the situation in which the reference w0 is closer to the true solution w* than w_{λ2=0}. The figure shows that when λ2 decreases from 1 to 0, the centre of the ideal circumference, which encloses the eventual solution w_{λ2}, drifts. When λ2 → 0, w_{λ2} moves toward the origin w = 0, which represents the condition of no reference exploited. Indeed, the drawing highlights that, when w0 gives a reliable reference, one can take full advantage of biased regularization, as the best solution for λ2 = 1, w_{λ2=1}, definitely improves over w_{λ2=0}.

Fig. 5(b) illustrates the opposite case: the reference w0 is more distant from the true solution w* than w_{λ2=0} (it is worth noting that the relative positions of w* and w_{λ2=0} with respect to the origin w = 0 remain unchanged when compared with Fig. 5(a)). In this situation, one would obtain the best outcome by setting λ2 = 0, thus neutralizing the contribution of the biased regularization. Hence, w0 does not represent a helpful reference.

Finally, Fig. 5(c) illustrates another situation in which the reference w0 is more distant from the true solution, w*, than w_{λ2=0}, but biased regularization still remains useful, as by adjusting λ2 (i.e., by modulating the contribution of the reference w0) one eventually obtains a solution w_{λ2} that improves over w_{λ2=0}. As a result, one can take advantage of biased regularization even when the reference solution is not optimal.
The extension of (14) to non-linear models is obtained by considering a Reproducing Kernel Hilbert Space H. In that case one has a reference function f0 and the functional (15) becomes:

l(X, Y, f) + (λ1/2) ‖f − λ2 f0‖²_H, (16)

where now the norm of the regularizer is taken in H. Eventually, one obtains the models for the biased SVM (bSVM) and the biased RLS (bRLS), respectively, by adopting the proper loss function l(X, Y, w):

bSVM:

Σ_{i=1}^{l} (1 − y_i f(x_i))₊ + (λ1/2) ‖f − λ2 f0‖²_H (17)

bRLS:

Σ_{i=1}^{l} (y_i − f(x_i))² + (λ1/2) ‖f − λ2 f0‖²_H (18)

3.3.4. Biased SVM
The following theorem formalizes the biased version of the support vector machine (bSVM) within this scheme, with the inclusion of a regularizing bias.

Theorem 1 (bSVM). Given a reference hyperplane w0 (or a reference function f0, if the domain is not linear), a regularization constant C, and a biasing constant λ2, the dual form of the learning problem:

min_{ξ, w}  C Σ_{i=1}^{l} ξ_i + (1/2) ‖w − λ2 w0‖²
subject to:  y_i (w · x_i) ≥ 1 − ξ_i ∀i,  ξ_i ≥ 0 ∀i (19)

is written as:

min_{α}  (1/2) αᵗ Q α − Σ_{i=1}^{l} α_i (1 − λ2 y_i f0(x_i))
subject to:  0 ≤ α_i ≤ C ∀i (20)

The model of the data is:

f(x) = Σ_{i=1}^{l} α_i y_i K(x, x_i) + λ2 f0(x) (21)
Fig. 5. The role played by parameter λ2 in the proposed problem setting. Three different situations are analyzed: (a) the reference w0 is closer to the true solution w* than w_{λ2=0}; (b) the reference w0 is more distant from the true solution w* than w_{λ2=0}; (c) the reference w0 is more distant from the true solution w* than w_{λ2=0}, but biased regularization can be useful.
where α are the Lagrange multipliers (whose non-zero entries identify the support vectors), ξ the slack variables used to measure the grade of allowed misclassification, and K the chosen kernel.

Unlike the conventional SVM formulation, the minimization problem (20) does not contain a linear constraint. Problem (20) can be optimized by an SMO version which uses only a single Lagrange multiplier at each iteration. In this new procedure, the gradient integrates the new reference-based term and the regularization parameter. The gradient value for the ith pattern is:

G_i = y_i Σ_{j=1}^{l} α_j y_j K(x_j, x_i) − 1 + λ2 y_i f0(x_i) (22)

Then, as usual, the projected gradient PG is computed and the KKT optimality conditions are checked on this value. The algorithm runs until the KKT conditions are satisfied. In the following, the pseudo-code of the algorithm is presented.
bSVM Solver:

1. Initialization: α = 0, flag = 0
2. While (!flag):
   a. flag = 1
   b. for i = 1, …, l:
      i. G = y_i Σ_{j=1}^{l} α_j y_j K(x_j, x_i) − 1 + λ2 y_i f0(x_i) (23)
      ii. PG = min(G, 0) if α_i = 0;  PG = max(G, 0) if α_i = C;  PG = G if 0 < α_i < C (24)
      iii. if |PG| > ε:
           1. α_i = min(max(α_i − G/K_ii, 0), C)
           2. flag = 0
           end if
      end for
   end while

The pseudo-code listed above can be considerably accelerated by updating the gradient only when necessary, as well as by using shrinking and random permutations of the indexes of the patterns at each iteration.
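The pseudo-code above translates almost line-for-line into NumPy. The following sketch (the toy data, kernel, and hyperparameters are illustrative; with f0 = 0 the machine reduces to a plain bias-free SVM) implements the single-multiplier solver:

```python
import numpy as np

def bsvm_smo(K, y, f0, C=1.0, lam2=1.0, tol=1e-3, max_epochs=200):
    """Single-multiplier SMO for the bSVM dual (Eq. 20): gradient of Eq. (22),
    projected gradient of Eq. (24), clipped coordinate update of step iii."""
    l = len(y)
    alpha = np.zeros(l)
    for _ in range(max_epochs):
        converged = True
        for i in range(l):
            # Eq. (22): G = y_i sum_j alpha_j y_j K(x_j, x_i) - 1 + lam2 y_i f0(x_i)
            G = y[i] * (alpha * y) @ K[:, i] - 1.0 + lam2 * y[i] * f0[i]
            # Eq. (24): projected gradient at the box constraints.
            if alpha[i] <= 0.0:
                PG = min(G, 0.0)
            elif alpha[i] >= C:
                PG = max(G, 0.0)
            else:
                PG = G
            if abs(PG) > tol:
                alpha[i] = min(max(alpha[i] - G / K[i, i], 0.0), C)
                converged = False
        if converged:
            break
    return alpha

# Toy problem: two Gaussian blobs, linear kernel, zero reference function.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
K = X @ X.T
alpha = bsvm_smo(K, y, f0=np.zeros(40), C=1.0)
pred = np.sign(K @ (alpha * y))     # Eq. (21) with f0 = 0
print(np.mean(pred == y))
```

Passing the predictions of a reference model as f0 (together with λ2 > 0) yields the biased behavior of Eqs. (20)–(21).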
3.3.5. Biased RLS

The following theorem formalizes the linear biased version of RLS.

Theorem 2. Given a reference hyperplane w0, a regularization constant λ1, and a biasing constant λ2, the problem:

min_w  ‖Xw − y‖² + λ1 ‖w − λ2 w0‖² (25)

has solution:

w = (Xᵗ X + λ1 I)⁻¹ (Xᵗ y + λ1 λ2 w0) (26)

The following theorem gives the dual form of biased RLS (bRLS):

Theorem 3. Given a reference hyperplane w0 (or a reference function f0), a regularization constant λ1, and a biasing constant λ2, the dual of the problem:

min_w  ‖Xw − y‖² + λ1 ‖w − λ2 w0‖² (27)

is:

min_β  ‖Kβ + λ2 f0(X) − y‖² + λ1 βᵗ K β (28)

which has solution:

β = (K + λ1 I)⁻¹ (y − λ2 f0(X)) (29)

The model of the data is:

f(x) = Σ_{i=1}^{l} β_i K(x, x_i) + λ2 f0(x) (30)

Corollary 1. Given the RLS learning machine, the RKHS H, and the representation

f(x) = Σ_{i=1}^{l} β_i K(x, x_i) + λ2 f0(x), (31)

the solution of the problem

min_f  ‖f − y‖² + λ1 ‖f − λ2 f0‖²_H (32)

is:

(K + λ1 I) β = y − λ2 f0(X) (33)

From a computational point of view, the choice between the primal and the dual solution depends on the characteristics of the available data. If samples lie in a low-dimensional space and can be separated by a linear classifier, the primal form is preferable because it scales linearly with the number of features. Conversely, when the number of patterns is lower than the number of features, the dual form should be used.
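The closed forms can be verified numerically: with a linear kernel, the primal solution (26) and the dual solution (29) must yield identical predictions on the training data. A short NumPy check (random data; the λ values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
l, d = 30, 5
X = rng.standard_normal((l, d))
y = rng.standard_normal(l)
w0 = rng.standard_normal(d)
lam1, lam2 = 0.5, 0.8

# Primal bRLS solution, Eq. (26).
w = np.linalg.solve(X.T @ X + lam1 * np.eye(d), X.T @ y + lam1 * lam2 * w0)

# Dual bRLS solution with a linear kernel, Eq. (29): f0(x) = w0 . x.
K = X @ X.T
f0X = X @ w0
beta = np.linalg.solve(K + lam1 * np.eye(l), y - lam2 * f0X)

# Model of the data, Eq. (30): f(x) = sum_i beta_i K(x, x_i) + lam2 f0(x).
f_dual = K @ beta + lam2 * f0X
f_primal = X @ w
print(np.max(np.abs(f_dual - f_primal)))  # ~0: the two forms agree
```

The agreement follows from the matrix identity Xᵗ(XXᵗ + λ1 I)⁻¹ = (XᵗX + λ1 I)⁻¹Xᵗ, which maps the dual expansion back onto the primal hyperplane.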
3.3.6. A semi-supervised learning scheme based on biased regularization

Once the biased version of the SVM kernel machine has been defined, the following four-step procedure can be followed in order to formalize a semi-supervised framework for the classification task.

Let X be a dataset composed of l labeled patterns and u unlabeled patterns; let X_l denote the labeled subset, y_l denote the
Table 2
Comparison of semi-supervised methods.
Method | Complexity
bSVM | O_c + O((l + u)²)
bSVM linear | O_c + O(d(l + u)k)
bRLS | O_c + O((l + u)³)
LapRLS | O((l + u)³)
LapSVM | O((l + u)³)
LapSVM linear | O(d³)
Cluster kernel | O(max(l, u)³)
TSVM | O(k(l + 2u)²)
EM | –
Co-training | –
Semi-parametric regularization | O((l + u)³)
corresponding vector of labels, and X_u denote the unlabeled subset. Then the semi-supervised learning scheme can be formalized as follows:

1. Clustering: use any clustering algorithm to perform an unsupervised partition of the dataset X (a bipartition in the simplest case).

2. Calibration: for every cluster, a majority voting scheme is adopted to set the cluster label; this is done by exploiting the labeled samples. Then, for each cluster, assign the cluster label to each of its samples. Let ŷ denote this new set of labels.

3. Mapping: given X and ŷ, train the selected learning machine and obtain the solution w0.

4. Biasing: given X_l and the true labels y_l, train the biased version of the learning machine (biased by w0). The solution w carries information derived from both the labeled data X_l and the unlabeled data X_u.

The ultimate result is that the overall learned function exploits both labeled and unlabeled commonsense data. The semi-supervised learning scheme possesses some interesting features:

• Since the proposed method can be applied to both linear and non-linear domains, the result is a completely generalizable learning scheme.
• The present learning scheme distinguishes between two principal actions: clustering and biasing. This means that the two tasks can be tackled independently: if one wants to adopt a particular solution for biasing, or a new clustering algorithm is designed, the two actions can be controlled and adjusted separately.
• If the learning machine is a single-layer learning machine whose cost is convex, then convexity is preserved and a global solution is guaranteed.
• Any clustering method can be used to build the reference solution.
3.3.7. bSVM and bRLS for semi-supervised learning: computational complexity

This semi-supervised learning scheme consists of three computationally intensive steps: clustering, mapping, and biasing. The clustering task can be completed using various clustering algorithms, each characterized by its own computational complexity. In the following, the complexity of clustering is denoted generically as O_c. Efficient solutions also exist for implementing powerful clustering algorithms such as k-means or spectral clustering.

In the second step, mapping, the time complexity is entirely determined by the learning machine applied to all the l + u available samples. For RLS this means a complexity of O((l + u)³), i.e., the solution of the system of linear equations (29). When adopting SVM as the learning machine, one can exploit the SMO algorithm, which scales between O(l + u) and O((l + u)²).

The third step, biasing, consists of solving either a linear system or an SMO-like optimization. In both cases, one also needs to pre-compute the predictions of the reference model f0(x) for all the labeled patterns (with d-dimensional patterns the eventual cost is O(ud)). As a result, when bRLS is adopted as the learning machine, the computational complexity is:

O_bRLS = O_c + O((l + u)³) + O(ud) + O(l³) (34)

In O_bRLS the dominant terms are O((l + u)³) and possibly the complexity O_c associated with the clustering task. When instead bSVM is used, the computational complexity is:

O_bSVM = O_c + O((l + u)²) + O(ud) + O(l²) (35)

where the dominant terms are O((l + u)²) and, again, the complexity O_c. In this case, one assumes that O(ud) < O((l + u)²); this is a reasonable hypothesis, except for those cases where the data lie in a highly dimensional space. Therefore, the complexity of the training procedure roughly scales with the complexity of the original learning machine: SVM scales approximately as O(l²), and its semi-supervised version (bSVM) scales as O((l + u)²); a similar behavior characterizes RLS and bRLS.
Some final considerations on computational complexity:

• If it is known a priori that the data are almost linearly separable, then it is possible to build very efficient learning algorithms. For instance, one can couple fast linear k-means implementations with the linear versions of SVM and bSVM. That set-up leads to very efficient learning methods, in particular when data are highly sparse, such as in text mining problems.
• The proposed semi-supervised learning scheme can address large-scale problems, as long as the clustering engine scales well. For certain domains, for example text mining, in which linearity and data sparsity are exploitable, adaptations of the learning algorithm can result in extremely fast methods, able to process hundreds of thousands of patterns in a matter of seconds.
• New, unseen test patterns can be managed effectively, as class assignment can exploit the closed-form functions.
The proposed framework provides attractive features when compared with other semi-supervised methods. First, the reference function can be worked out by exploiting any clustering algorithm. Second, biased regularization can effectively support model selection. This feature is indeed crucial, in particular when a model with few labeled data is addressed; other approaches to semi-supervised learning do not provide this attribute. Furthermore, the present framework exploits a convex cost function.

The proposed framework also ensures satisfactory performance in terms of computational complexity. This is highlighted in Table 2, which reports the computational complexity of each approach. Computational complexity is formalized by using the number of labeled patterns, l, the number of unlabeled patterns, u, the dimensionality of the data, d, the number of iterations of the learning algorithm, k, and the complexity of the clustering algorithm, O_c. For the proposed biased learning machines, complexity has been formalized by assuming the use of efficient SMO-like routines.

It is evident from the table that the current semi-supervised learning scheme can achieve satisfactory performance in terms of computational complexity whenever the clustering algorithm scales as (or better than) the adopted biased machine. In this regard, bSVM appears especially appealing, as it scales quadratically, or even linearly if the underlying problem has particular characteristics. Indeed, one should take into account that the term O_c represents the added cost to be paid for a gain in flexibility.
Fig. 6. The effectiveness of model selection is improved by adopting a formulation of biased regularization which fully exploits parameters λ1 and λ2 . Four cases are
presented: (a) good reference / weak bias; (b) good reference / strong bias; (c) bad reference / weak bias; (d) bad reference / strong bias.
Table 3
Emotion recognition accuracy.
Method Strict accuracy (%) Relaxed accuracy (%)
bSVM (m = 100) 63 87
bSVM (m = 50) 62 89
bRLS (m = 100) 63.5 88.5
bRLS (m = 50) 64 90.5
ANN 46.9 76.5
k-medoids 43.2 74.1
k-NN 41.9 72.3
Random 14.3 40.1
2 http://sentic.net/downloads
3.3.8. Biased regularization for semi-supervised learning supports effective model selection

The proposed semi-supervised learning framework can extend the supervised classification scheme presented in [53], which demonstrated that, when a cluster hypothesis holds, the use of clustering to set a reference solution leads to a significant reduction of the space of possible functions. Such a result is noteworthy in that it leads to tight generalization bounds, since the term χ in Eq. (13) measures the complexity of the space of classifying functions. Tight generalization bounds are in turn a necessary condition for supporting an effective model selection.

In [53], model selection of SVM is actually performed by exploiting an auxiliary machine, called VQSVM. In the VQSVM model the learning task is fully supervised, because the clustering reference only derives from labeled data. Besides, an Ivanov biased regularization term is used and the regularization parameter λ2 is implicitly set to a fixed value.

To extend the scheme of [53] to semi-supervised learning, the present framework involves both labeled and unlabeled data in the clustering step. Indeed, the effectiveness of model selection is improved by adopting a formulation of biased regularization that fully exploits parameters λ1 and λ2.

Fig. 6 considers four cases, and analyzes the relative positions of the origin w = 0, the reference, w0, and the true solution, w*. Fig. 6(a) exemplifies the case 'good reference / weak bias', whereas Fig. 6(b) illustrates the case 'good reference / strong bias'. In both situations, the reference solution w0 is not far from the true solution w*. Yet, the first case adopts a weak bias, which permits the biasing step to explore a relatively wide portion of space around the reference (the lighter circumference, which represents the space of functions). The second case, on the other hand, adopts a strong bias, thereby exploring a smaller portion of space. Eventually, the area being explored via a strong bias does not include w*.

Fig. 6(c) refers to the case 'bad reference / weak bias'. The reference w0 is quite distant from the true solution w*; by adopting a weak bias, though, one can still exploit biasing to reach w*. Finally, Fig. 6(d) presents the case 'bad reference / strong bias'. In this situation, by adopting a strong bias one restricts the space to be explored to a small region around w0. As a result, the proposed solution will be very distant from the true solution w*.

On the whole, the four examples affirm that, by modulating the biasing mechanism through the parameters λ1 and λ2, one can take full advantage of the semi-supervised scheme and properly support the model selection procedure. Eventually, the proposed framework involves a novel semi-supervised classification scheme, which supports a fully automated model selection and can be applied even when the size of the labeled dataset is small (l < 50).
4. Experimental results

The proposed affective commonsense reasoning architecture based on bSVM and bRLS was tested on the publicly available AffectNet benchmark.² The AffectNet database provides four affective labels for 2257 concepts. Each label corresponds to a level of activation in the Hourglass model. Concepts are described according to the m-dimensional vector space defined by AffectiveSpace 2. In the present experimental session, two different configurations of AffectiveSpace were compared: m = 100 and m = 50.

In order to robustly evaluate the performance of the proposed classification framework, a cross-validation procedure was applied. In particular, we considered ten different experimental runs. In each run, 200 concepts randomly extracted from the complete database provided the test set; the remaining concepts were evenly split into a training set and a validation set. In order to test the semi-supervised approach, 500 unlabeled patterns randomly extracted from a total of 16,431 unlabeled concepts were added to the training set. The reference function of the bSVM and bRLS frameworks can be derived from any clustering algorithm; we chose for this purpose the k-means algorithm.

In the present set-up, the validation set was designed to support the model selection phase, i.e., the selection of the best parameterization for the classifier. An RBF kernel was adopted, thus the model selection actually involved two parameters: γ and C. In each run the classification accuracy was measured by using the prediction system obtained after the model selection phase. Only the patterns included in the test set were exploited to assess classification accuracy, i.e., the patterns that were not involved in the training phase or in the model selection phase.
Table 3 reports the average classification accuracy obtained by the bSVM and bRLS frameworks over the ten runs. The table provides the results of the two different experiments addressed in this research: the experiment involving the 100-dimensional AffectiveSpace 2 and the experiment involving the 50-dimensional AffectiveSpace 2. Classification accuracy is evaluated according to two different criteria. The first criterion, "Strict Accuracy", refers to the percentage of patterns for which the classification framework correctly predicted the activation level for every affective dimension. The second criterion, "Relaxed Accuracy", considers the prediction correct even in the event the framework fails to properly assess one affective dimension out of four. The relaxed
Table 4
Comparison with the state of the art.
System Accuracy (%)
Socher et al. (2012) 80.00
Socher et al. (2013) 85.40
Proposed method 88.50
accuracy actually takes into account that a certain level of noise may affect the AffectNet benchmark, as affective activation levels are assigned according to subjective tests.
The results shown in Table 3 provide a few interesting outcomes. Firstly, both the bSVM-based and the bRLS-based frameworks proved able to attain very promising performance in terms of classification accuracy. Indeed, the proposed approach significantly improves over standard classification techniques such as the k-NN model, k-medoids, and artificial neural networks (ANN). Secondly, both frameworks also attained reliable performance when the 50-dimensional AffectiveSpace 2 was adopted, which consistently reduced the complexity of the reasoning architecture. Such results confirm the framework's ability to deal with complex problems. Moreover, bRLS slightly improves over bSVM.

The proposed affective reasoning framework with bSVM was also evaluated on an opinion mining dataset derived from a corpus developed by Pang and Lee [32]. The corpus consists of 1000 positive and 1000 negative movie reviews from expert reviewers, collected from rottentomatoes.com. All text has been converted into lowercase and lemmatized, and HTML tags have been removed. Pang and Lee initially labelled each review manually as either positive or negative, after which the Stanford NLP group annotated this dataset at sentence level [54,55]. They extracted 11,855 sentences from the reviews and manually labeled them as positive or negative. We used the emotion-polarity formula provided by the Hourglass model to calculate a binary polarity value for each dataset sentence. Table 4 presents the comparison of the proposed system with the state-of-the-art accuracy.
5. Conclusion

We live in a world where millions express their views and opinions of commercial products on the web on a daily basis. The distillation of knowledge from the massive amounts of unstructured information available is of considerable importance for tasks such as social media marketing, product positioning, and financial market prediction.

While existing approaches are limited by the fact that they work at word level and need a lot of training, semi-supervised commonsense reasoning seems a good solution to the problem of big social data analysis. In this work, a novel learning model based on the combined use of random projections and support vector machines was exploited to perform reasoning on a knowledge base of affective commonsense. Results showed a significant improvement in both emotion recognition and polarity detection and, hence, paved the way for semi-supervised learning approaches to big social data analysis.