
Neurocomputing 275 (2018) 1662–1673

Semi-supervised learning for big social data analysis

Amir Hussain a, Erik Cambria b,∗

a School of Natural Sciences, University of Stirling, FK9 4LA, Stirling, UK
b School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798
∗ Corresponding author. E-mail addresses: [email protected] (A. Hussain), [email protected] (E. Cambria).

Article history: Received 1 May 2017; Revised 6 October 2017; Accepted 6 October 2017; Available online 18 October 2017. Communicated by Zidong Wang.

Keywords: Semi-supervised learning; Big social data analysis; Sentiment analysis

https://doi.org/10.1016/j.neucom.2017.10.010

Abstract

In an era of social media and connectivity, web users are becoming increasingly enthusiastic about interacting, sharing, and working together through online collaborative media. More recently, this collective intelligence has spread to many different areas, with a growing impact on everyday life, such as in education, health, commerce and tourism, leading to an exponential growth in the size of the social Web. However, the distillation of knowledge from such unstructured Big data is an extremely challenging task. Consequently, the semantic and multimodal contents of the Web in this present day are, whilst being well suited for human use, still barely accessible to machines. In this work, we explore the potential of a novel semi-supervised learning model based on the combined use of random projection scaling, as part of a vector space model, and support vector machines to perform reasoning on a knowledge base. The latter is developed by merging a graph representation of commonsense with a linguistic resource for the lexical representation of affect. Comparative simulation results show a significant improvement in tasks such as emotion recognition and polarity detection, and pave the way for the development of future semi-supervised learning approaches to big social data analytics.

© 2017 Elsevier B.V. All rights reserved.


1. Introduction

With the advent of social networks, web communities, blogs, Wikipedia, and other forms of online collaborative media, the way people express their opinions and sentiments has radically changed in recent years [1]. These new tools have facilitated the creation of original content, ideas, and opinions, connecting millions of people through the World Wide Web in a financially and labour-effective manner. This has made a huge source of information and opinions easily available by the mere click of a mouse.

As a result, the distillation of knowledge from this huge amount of unstructured information comes into vital play for marketers looking to create and shape brand and product identities. The practical purpose this encapsulates has led to the emerging field of big social data analysis, which deals with information retrieval and knowledge discovery from natural language and social networks, using graph mining and natural language processing (NLP) techniques to distill knowledge and opinions from the huge amount of information on the World Wide Web. Sentic computing [2] tackles these crucial issues by exploiting affective commonsense reasoning, modeled upon the intrinsically human capacity to interpret the cognitive and affective information associated with natural language, so as to infer new knowledge and make decisions in connection with one's social and emotional values, censors, and ideals. In other words, commonsense computing techniques are applied to narrow the semantic gap between word-level natural language data and the concept-level opinions conveyed by these.

In the past, graph mining and multi-dimensionality reduction techniques [3] were employed on a knowledge base obtained by merging ConceptNet [4], a directed graph representation of commonsense knowledge, with WordNet-Affect (WNA) [5], a linguistic resource for the lexical representation of affect. Our research fits within the sentic computing framework and aims to exploit machine learning for developing a cognitive model for emotion recognition in natural language text. Unlike purely syntactical techniques, concept-based approaches can even detect subtly expressed sentiments, e.g., by analyzing concepts that do not explicitly convey any emotions but are linked implicitly to others which do. On this note, the bag-of-concepts model represents the semantics associated with natural language much better than bag-of-words. In the bag-of-words model, a concept such as cloud_computing would be split into two separate words, which would disrupt the semantics of the input sentence (in which, for example, the word cloud could wrongly activate concepts related to weather).
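To make the contrast concrete, here is a toy Python sketch (the concept inventory and tokenizer are hypothetical illustrations, not part of the authors' pipeline) showing how bag-of-words splits a multi-word concept while bag-of-concepts keeps it intact:

import itertools

# Illustrative inventory of known multi-word concepts (not taken from AffectNet).
KNOWN_CONCEPTS = {"cloud_computing", "go_read_the_book"}

def bag_of_words(sentence):
    # Plain whitespace tokenization: multi-word concepts are lost.
    return sentence.lower().split()

def bag_of_concepts(sentence, max_len=4):
    # Greedily merge the longest run of adjacent words forming a known concept.
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 1, -1):
            candidate = "_".join(words[i:i + n])
            if candidate in KNOWN_CONCEPTS:
                out.append(candidate)
                i += n
                break
        else:
            out.append(words[i])
            i += 1
    return out

print(bag_of_words("I love cloud computing"))
# ['i', 'love', 'cloud', 'computing']  -- 'cloud' alone may activate weather concepts
print(bag_of_concepts("I love cloud computing"))
# ['i', 'love', 'cloud_computing']     -- the concept stays intact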


Concept-level analysis permits the inference of the semantic and affective information associated with natural language opinions and, therefore, facilitates comparative fine-grained feature-based sentiment analysis [6]. Instead of collecting isolated opinions on a whole item (e.g., iPhone8), users prefer to compare different products according to specific features (e.g., iPhone8's vs. GalaxyS8's touchscreen), or even sub-features (e.g., fragility of iPhone8's vs. GalaxyS8's touchscreen). In this context, commonsense knowledge is essential for accurately deconstructing natural language text into sentiments; for instance, the concept go_read_the_book can be deemed positive if present in a book review, whereas in a movie review it would indicate negative feedback. The inference of emotions and polarity from natural language concepts is, however, a formidable task, as it requires advanced reasoning capabilities such as commonsense as well as analogical and affective reasoning.

In the proposed work, a novel semi-supervised learning model based on the merged use of multi-dimensional scaling (MDS) by means of random projections and a biased support vector machine (bSVM) [7] is exploited for the task of emotion recognition. Semi-supervised classification should improve over a purely supervised classification rule, as both unlabeled and labeled data are utilised to learn a classification function empirically (compared to only labeled data). Interest in semi-supervised learning has grown in recent times, which can be attributed to the breadth of suitable application domains (e.g., text mining, natural language processing, image and video retrieval, and bioinformatics). The proposed scheme can benefit from biased regularization, which provides a viable approach to implementing an inductive bias in a kernel machine. This is of fundamental importance in learning theory, given that it heavily influences the generalization ability of a learning system. From a mathematical perspective, inductive bias can be formalized as the set of assumptions which determine the choice of a particular class of functions for supporting the learning process. Therefore, it represents a powerful tool that embeds prior knowledge for the applicative problem at hand.

To this aim, semi-supervised learning is formalized as a supervised learning problem biased by an unsupervised reference solution. First, we introduce a novel, general biased-regularization scheme that integrates biased versions of two well-known kernel machines, namely support vector machines (SVMs) and regularized least squares (RLS). Subsequently, we propose a semi-supervised learning model based on this biased-regularization scheme, adopting a two-stage procedure. In the first stage, a reference solution is obtained using an unsupervised clustering of the complete dataset (including both unlabeled and labeled data). A primary consequence is that the eventual semi-supervised classification framework can derive the reference function from any clustering algorithm, thus providing it with remarkable flexibility. In the following stage, clustering outcomes drive the learning process in a biased SVM (bSVM) or a biased RLS (bRLS) to acquire the class information provided by the labels. The final outcome is that the overall learned function utilizes both labeled and unlabeled data. The developed framework is applicable to linear and non-linear data distributions: the former works under a cluster assumption applied to the data, whilst the latter operates under a manifold hypothesis. Consequently, a semi-supervised learning process can only be valid when unlabeled data can be assumed to have an intrinsic geometric structure, for example, a low-dimensional non-linear manifold in the ideal case. With respect to previous strategies, the results demonstrate significant enhancements and pave the way for the future development of semi-supervised learning approaches echoing that of affective commonsense reasoning.

The rest of this paper is organized as follows: Section 2 introduces related work in the field of sentiment analysis research; Section 3 describes in detail the new semi-supervised learning architecture for affective commonsense reasoning; Section 4 illustrates results obtained by applying the new model to an affective benchmark and to an opinion mining dataset; finally, Section 5 offers some concluding remarks and recommendations for future work.

2. Related work

In recent years, sentiment analysis [6] has become increasingly popular for processing social media data on online communities, blogs, Wikis, microblogging platforms, and other online collaborative media. Sentiment analysis is a branch of affective computing research that aims to classify text (but sometimes also audio and video [8]) into either positive or negative (but sometimes also neutral [9]). Sentiment analysis has raised growing interest both within the scientific community, leading to many exciting open challenges, and in the business world, due to the remarkable benefits to be had from financial forecasting [10] and political forecasting [11], e-health [12] and e-tourism [13], community detection [14] and user profiling [15], and more.

While most works approach it as a simple categorization problem, sentiment analysis is actually a suitcase research problem [16] that requires tackling many NLP tasks, including aspect extraction [17], named entity recognition [18], word polarity disambiguation [19], temporal tagging [20], personality recognition [21], and sarcasm detection [22]. Most existing approaches to sentiment analysis rely on the extraction of a vector representing the most salient and important text features, which is later used for classification purposes [23]. Some of the most commonly used features are term frequency and presence. The latter is a binary-valued feature vector in which the entries merely indicate whether a term occurs (value 1) or not (value 0). Feature vectors can also include further term-based features. Position is one such example, considering that the position of tokens in text units can alter a token's effect on the text's sentiment. The presence of n-grams, usually bi-grams and tri-grams, can also be useful as features, as one can find methods which depend on distances between terms. Part-of-speech (POS) information (for example, nouns, verbs, adverbs, and adjectives) is commonly utilized for general textual analysis, in a basic form of word-sense disambiguation. Some specific adjectives have been proven to be useful indicators of sentiment, and guides for feature selection in sentiment classification. Lastly, other studies carried out the detection of sentiments via phrases selected through pre-specified POS patterns, the majority of which contain either an adverb or an adjective. Numerous approaches exist which map given pieces of text to labels from predefined sets of categories, or to real numbers representing a polarity degree. Nonetheless, these approaches and their performances are confined to an application's domain and relevant areas.

The transformation of sentiment analysis research can be evaluated by examining the token of analysis used, along with the implicit information associated with it. In this way, current approaches can be sorted into four primary categories: keyword spotting, lexical affinity, statistical methods, and concept-based techniques.

Keyword spotting is very naïve and is the most popular approach, due to its accessibility and economical nature. Text is grouped into affect categories depending on the presence of fairly unambiguous affect words like 'happy', 'bored', 'afraid', and 'sad'. For example, Elliott's Affective Reasoner [24] checks for 198 affect keywords (e.g., 'distressed', 'enraged') and affect intensity modifiers (e.g., 'extremely', 'mildly', and 'somewhat'). Other popular sources of affect words are Ortony's Affective Lexicon [25], which groups terms into affective categories, and Wiebe's linguistic annotation scheme [26].

Fig. 1. The Hourglass of Emotions.


Lexical affinity is slightly more sophisticated than keyword spotting as, rather than simply detecting obvious affect words, it assigns specific words a probabilistic 'affinity' for a particular emotion. For example, accident might be assigned a 75% probability of being indicative of a negative affect, as in the case of situations such as car_accident or hurt_by_accident. These probabilities are usually trained from linguistic corpora [27–29].

Statistical methods, like latent semantic analysis (LSA) and SVM, have proven useful for the affective cataloging of texts. Researchers have applied these approaches in projects like Pang's movie review classifier [30], Goertzel's Webmind [31], and more [17,32–34]. By feeding a machine learning algorithm a large training corpus of affectively annotated texts, it is possible for the system to not only learn the affective valence of affect keywords, but also to take into account the valence of other arbitrary keywords (like lexical affinity), punctuation, and word co-occurrence frequencies. Statistical methods, however, tend to be semantically weak, which means that, with the exception of obvious affect keywords, other lexical or co-occurrence elements in a statistical model have little individual predictive value. Hence, statistical classifiers only reach acceptable accuracy when given a large enough text input. Therefore, while these methods may be able to classify text at the paragraph or page level, they are not effective for smaller text units such as sentences.

Concept-based approaches focus on a semantic analysis of text through the use of web ontologies [35] or semantic networks [36], which requires grasping the conceptual and affective information associated with natural language opinions. By relying on large semantic knowledge bases, such approaches step away from the blind use of keywords and word co-occurrence counts, relying instead on the implicit meaning/features associated with natural language concepts. Concept-based approaches differ from purely syntactical techniques in that they can detect sentiments which are expressed in a subtle manner, e.g., through the analysis of concepts which are implicitly linked to other concepts that express emotions.

3. Semi-supervised reasoning

In the proposed framework, MDS is used to represent concepts in a multi-dimensional vector space, and a biased SVM (bSVM) is exploited to infer the semantics and sentics (that is, the conceptual and affective information) associated with such concepts, according to an hourglass-shaped emotion categorization model [2] (Fig. 1). Under the aforementioned model, sentiments are organized around four independent dimensions (Pleasantness, Attention, Sensitivity, and Aptitude) whose different levels of activation make up the total emotional state of the mind. The bSVM model is adopted as a semi-supervised approach to tackle the classification task, so as to overcome the lack of labeled commonsense data. In semi-supervised classification, in fact, both unlabeled and labeled data are exploited to learn a classification function empirically, instead of learning a classification rule based only on labeled data. As a result, concepts for which affective information is missing can be employed in the classification phase. The main purpose of the bSVM-based framework developed in this study is to predict the degree of affective valence each concept possesses in a particular facet of the Hourglass model.

3.1. Affective commonsense knowledge base

The core module of the framework hereby proposed is an affective commonsense knowledge base built upon ConceptNet, the graph representation of the Open Mind Common Sense (OMCS) corpus, which is structurally similar to WordNet [37], but whose scope of contents is general world knowledge, in the same vein as Cyc [38]. Instead of insisting on formalizing commonsense reasoning using mathematical logic [39], ConceptNet uses a new approach: it represents multi-word expressions in the form of a semantic network and makes them available for usage in NLP.

WordNet focuses on lexical categorization and word-similarity determination, and Cyc focuses on formalized logical reasoning. Contrastingly, ConceptNet is characterized by contextual commonsense reasoning: this means that it is meant for the task of making practical context-based inferences over real-world texts. In ConceptNet, WordNet's notion of node in the semantic network is extended from purely lexical items (words and simple phrases with atomic meaning) to include higher-order compound concepts such as satisfy_hunger and follow_recipe, so as to represent knowledge of a greater range of concepts found in everyday life.

Moreover, in ConceptNet, WordNet's repertoire of semantic relations is extended from the triplet of Synonym, IsA, and PartOf to a repertoire of twenty semantic relations including, for example, EffectOf (causality), SubeventOf (event hierarchy), CapableOf (agent's ability), MotivationOf (affect), PropertyOf, and LocationOf. ConceptNet's knowledge is also of a more informal and practical nature. For example, WordNet has the formal taxonomic knowledge that a 'dog' is a 'canine', which is a 'carnivore', which is a 'placental mammal'; but it cannot make the logically oriented member-to-set association that a dog falls under the categories of pet or family_member. ConceptNet, on the other hand, describes something that is often true but not always. For instance, EffectOf(fall_off_bicycle, get_hurt); this shows it contains a lot of knowledge that is defeasible, which is something that cannot be left aside in commonsense reasoning. Most of the facts that interrelate ConceptNet's semantic network are dedicated to making rather generic connections between concepts.

Fig. 2. AffectNet.

ConceptNet is obtained via an automatic process, in which a set of extraction rules is applied to the semi-structured English sentences of the OMCS corpus, following which an additional set of 'relaxation' procedures is applied. This results in network gaps being filled in and smoothed over, thereby optimizing the connectivity of the semantic network. ConceptNet is a good source of commonsense knowledge, but it alone is not enough for sentiment analysis, despite detailing the semantic links between concepts, given that it frequently misses associations between concepts that express the same sentiment. To surmount this flaw, we employed WNA, a semantic resource for the etymological embodiment of affective knowledge, which was built upon WordNet. WNA was developed by allocating a number of WordNet synsets to one or more affective labels (a-labels). For instance, synsets marked with the a-label 'emotion' demarcated concepts representing emotional states. There are also other a-labels for concepts representing situations that provoke emotions, emotional responses, as well as moods. WNA was born through a dual-stage process: the first stage pertained to the documentation of a base core of affective synsets, while the second saw the expansion of the core with the associations outlined in WordNet.

ConceptNet and WNA are then combined through the linear fusion of their respective matrix representations into a single matrix, in which the knowledge between the two databases is shared. For this combination process, the input data from both sources has to be transmuted so that it can be denoted in its entirety in the same matrix. Hence, the lemma forms of ConceptNet notions are allied with the lemma forms of words in WNA, and the most common associations in WNA are charted into ConceptNet's set of relations; for instance, Holonym is charted into PartOf, and Hypernym into IsA. Effectively, we transfigure ConceptNet into a matrix by divvying each assertion into two parts: a concept and a feature, wherein a feature is a contention without any quantified concept, for example 'is a type of fluid'.

Based on the dependability of assertions, the subsequent logs in the matrix are either positive or negative scores, whose scale increases proportionally to their dependability. Like ConceptNet, WNA is also denoted as a matrix, in which rows are affective concepts and columns are associated qualities. In combining ConceptNet and WNA into a single matrix, we created a new affective semantic network, in which commonsense concepts are interrelated with a graded system of affective domain tags. We call this new framework AffectNet [2] (Fig. 2); through it, commonsense and emotional knowledge are not merely connected, but instead melded together. For example, concepts from daily life such as meet_people or have_breakfast are now associated with emotional domain notions such as 'joy', 'anger', or 'surprise'. Such semantic associations can be of much benefit when tasks such as emotion recognition or polarity detection from natural language text are performed, as it is common for both sentiments and opinions to be conveyed implicitly through context- and domain-dependent concepts, instead of through specific affect words.

3.2. Affective analogical reasoning

Problems are solved best when we understand a solution for them. The tricky part is when we encounter problems we have never faced before, which require intuition. Intuition can be explained as the process of forming analogies between a current problem and ones we have cracked previously, in order to find a suitable solution. This process of thinking (Fig. 3) could well be the essence of human intelligence, as in daily life no two situations are identical, and we continually have to apply analogical reasoning to solve problems and make decisions. The human mind continuously applies compression in all vital relations [40]. The compression principles aim to convert diffuse and distended conceptual structures into more focused versions, so as to become more congenial for human understanding.

Fig. 3. AffectiveSpace's first two eigenmoods.

In order to emulate such a process, MDS was previously applied on the matrix representation of AffectNet, a semantic network in which commonsense concepts were linked to semantic and affective features (Table 1). The result was AffectiveSpace. PCA is most widely used as a data-aware dimensionality reduction method [41] and is closely related to the low-rank approximation method, singular value decomposition (SVD), as both work on a transformed version of the data matrix [42].

Table 1. A snippet of the AffectNet matrix.

AffectNet     IsA-pet   KindOf-food   Arises-joy   ...
Dog           0.981     0             0.789        ...
Cupcake       0         0.922         0.910        ...
Songbird      0.672     0             0.862        ...
Gift          0         0             0.899        ...
Sandwich      0         0.853         0.768        ...
Rotten fish   0         0.459         0            ...
Win lottery   0         0             0.991        ...
Bunny         0.611     0.892         0.594        ...
Police man    0         0             0            ...
Cat           0.913     0             0.699        ...
Rattlesnake   0.432     0.235         0            ...
...           ...       ...           ...          ...


Therefore, truncated singular value decomposition (TSVD) is applied to the concept-feature matrix for the purposes of expediently diminishing its dimensionality and netting key links. The purpose of this is to ensure that the new joint model comprises solely the key features that exemplify the general outlook. Employing TSVD on AffectNet results in it defining other qualities that could be of parallel relevance to known affective concepts. If there is no known value for an item in a concept in the matrix, but the same item is inherent in other analogous concepts, then by parallel association it is highly probable that the same item is inherent in the concept as well. Essentially, concepts and features that are aligned and possess high dot products are ideal contenders for analogies.

Osgood et al. [43] produced a revolutionary piece on comprehending and visualizing affective knowledge connected to natural language text. In their work, MDS was utilized to construct visualizations of affective texts based on their parallel scores when contrasted with texts from other cultures. In a multi-dimensional space, words can be conceived of as points, and parallel scores then denote the distances between words. MDS rethinks these distances as points in a reduced dimensional space (most commonly two- or three-dimensional). Similarly, AffectiveSpace's purpose is to visualize the semantic and affective likeness between dissimilar concepts by projecting them onto a multi-dimensional vector space.

However, differently from Osgood's research, the basic foundation of AffectiveSpace is not merely a restricted set of parallel scores between affect words, but instead consists of millions of confidence scores linked to bits of commonsense knowledge that are, in turn, related to a structured system of affective domain labels. Instead of being defined by just a small number of human annotators and characterized as a word-word matrix, AffectiveSpace is solidly based on AffectNet and denoted as a concept-feature matrix. SVD seeks to decompose the AffectNet matrix A ∈ R^{n×d} into three components,

A = U S V^T, (1)

where U and V are unitary matrices, and S is a rectangular diagonal matrix with nonnegative real numbers on the diagonal. SVD has been proved to be optimal in preserving any unitarily invariant norm ‖·‖_M (a norm ‖·‖_M is unitarily invariant if ‖U A V‖_M = ‖A‖_M for all A and all unitary U, V) [42]:

‖A − A_k‖_M = min_{rank(B)=k} ‖A − B‖_M, (2)

where A_k, i.e., AffectiveSpace, is formed by retaining only the top k singular values in S. Hence, in AffectiveSpace, commonsense concepts and emotions are represented by vectors of k coordinates. These coordinates can be seen as describing concepts in terms of 'eigenmoods' which form the axes of AffectiveSpace, i.e., the basis e_0, ..., e_{k−1} of the vector space. For example, the most significant eigenmood, e_0, represents concepts with positive affective valence. That is, the larger a concept's component in the e_0 direction is, the more affectively positive it is likely to be. Correspondingly, concepts with negative e_0 components are likely to have negative affective valence.

Hence, by exploiting the information sharing property of SVD, concepts with the same affective valence are likely to have similar features. This means that concepts which convey the same emotion are more likely to fall in close proximity to each other in AffectiveSpace. Concept similarity depends not on the concepts' absolute positions in the vector space, but on the angle they make with the origin. For example, concepts such as beautiful_day, birthday_party, and make_someone_happy are found very close in direction in the vector space, while concepts like feel_guilty, be_laid_off, and shed_tear are found in a completely different direction (nearly opposite with respect to the centre of the space).

The difficulty with this sort of representation is a lack of model scalability: as the number of concepts and semantic features increases, the AffectNet matrix becomes so high-dimensional and sparse that it can no longer be computed by the SVD [44]. Although there has been substantial research seeking fast approximations of the SVD, the approximate methods are at best ≈5 times faster than the standard one [42]; hence it is not viable for real-world big data applications.

There has been conjecture that neuronal learning has simple, yet powerful meta-algorithms underlying it [45], which should be fast, effective, scalable and biologically plausible, as well as having few-to-no assumptions [44]. Optimizing all of the ≈10^15 connections through the last few million years' evolution is very unlikely [44]. Alternatively, nature probably only optimizes the global connectivity (mainly white matter), but leaves the other details to randomness [44].

To handle the growing number of concepts and semantic features, we replace SVD with random projection (RP) [46], a data-oblivious method, to map the original high-dimensional dataset into a much lower-dimensional subspace by using a Gaussian N(0, 1) matrix, while preserving the pair-wise distances with high probability. This theoretically strong and empirically verified statement follows the Johnson–Lindenstrauss (JL) Lemma [44]. The JL Lemma states that, with high probability, for all pairs of points x, y ∈ X simultaneously,

√(m/d) ‖x − y‖_2 (1 − ε) ≤ ‖Φx − Φy‖_2 ≤ √(m/d) ‖x − y‖_2 (1 + ε), (3)–(4)

where X is a set of vectors in Euclidean space, d is the original dimension of this Euclidean space, m is the dimension of the space we wish to reduce the data points to, ε is a tolerance parameter measuring the maximum allowed distortion rate of the metric space, and Φ is a random matrix.
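The following short numpy sketch (dimensions and data are illustrative assumptions) checks this distance-preservation property empirically with a Gaussian N(0, 1) projection matrix:

import numpy as np

rng = np.random.default_rng(0)
n, d, m = 50, 10000, 500            # points, original dimension, target dimension

X = rng.standard_normal((n, d))     # toy high-dimensional data
Phi = rng.standard_normal((m, d))   # Gaussian N(0,1) random projection matrix
Y = X @ Phi.T / np.sqrt(m)          # scaled so distances are preserved up to (1 +/- eps)

# Compare a few pairwise distances before and after projection.
for i, j in [(0, 1), (2, 3), (4, 5)]:
    print(np.linalg.norm(X[i] - X[j]), np.linalg.norm(Y[i] - Y[j]))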

Structured random projection for making matrix multiplication much faster was introduced in [47]. Achlioptas [48] proposed sparse random projection, replacing the Gaussian matrix with i.i.d. entries

φ_ji = √s · { +1 with prob. 1/(2s);  0 with prob. 1 − 1/s;  −1 with prob. 1/(2s) }, (5)


where one can achieve a ×3 speedup by setting s = 3, since only 1/3 of the data needs to be processed. However, since our input matrix is already too sparse, we avoid using sparse random projection.

When the number of features is a lot larger than the number of training samples (d ≫ n), the subsampled randomized Hadamard transform (SRHT) is preferred, as it behaves similarly to Gaussian random matrices but also accelerates the process from O(nd) to O(n log d) time [49]. Following [49,50], for d = 2^p, where p is any positive integer, a SRHT can be defined as:

Φ = √(d/m) R H D, (6)

where

• m is the number of features we want to subsample randomly from the d features.
• R is a random m × d matrix. The rows of R are m uniform samples (without replacement) from the standard basis of R^d.
• H ∈ R^{d×d} is a normalized Walsh–Hadamard matrix, which is defined recursively: H_d = [ H_{d/2}  H_{d/2} ; H_{d/2}  −H_{d/2} ] with H_2 = [ +1  +1 ; +1  −1 ].
• D is a d × d diagonal matrix whose diagonal elements are i.i.d. Rademacher random variables.
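A dense numpy sketch of this construction follows (for illustration only; a practical implementation would apply the fast Walsh–Hadamard transform instead of materializing H):

import numpy as np

def srht(X, m, seed=0):
    # Project the rows of X (n x d, d a power of two) to m dimensions
    # via Phi = sqrt(d/m) R H D, as in Eq. (6).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    assert d & (d - 1) == 0, "d must be a power of two"
    D = rng.choice([-1.0, 1.0], size=d)           # i.i.d. Rademacher diagonal
    H = np.array([[1.0]])
    while H.shape[0] < d:                         # H_d = [[H, H], [H, -H]]
        H = np.block([[H, H], [H, -H]])
    H /= np.sqrt(d)                               # normalization
    rows = rng.choice(d, size=m, replace=False)   # R: uniform rows, no replacement
    return np.sqrt(d / m) * ((X * D) @ H)[:, rows]

X = np.random.randn(100, 256)   # toy data: 100 points in 256 dimensions
print(srht(X, m=32).shape)      # (100, 32)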

Fig. 4. AffectiveSpace 2.

Our subsequent analysis relies solely on distances and angles between pairs of vectors (i.e., the Euclidean geometry information), so it is sufficient to set the projected space to be logarithmic in the size of the data [51] and apply SRHT. The result is a new vector space model, AffectiveSpace 2 (Fig. 4), which preserves the semantic and affective relatedness of commonsense concepts while being highly scalable [52]. The key to performing commonsense reasoning is to find a good trade-off for knowledge representation. Since no two situations in life are ever exactly the same, no representation should be too concrete, or else it will fail to apply to new situations. At the same time, however, no representation should be too abstract, or it risks suppressing too many details.

Within AffectiveSpace 2, this knowledge representation trade-off can be seen in the choice of the vector space dimensionality. The number m indicates the dimension of AffectiveSpace and, in fact, is a measure of the trade-off between accuracy and efficiency in the representation of the affective commonsense knowledge base. The bigger m is, the more precisely AffectiveSpace represents AffectNet's knowledge; the smaller m is, on the other hand, the more efficiently AffectiveSpace represents affective commonsense knowledge, both in terms of vector space generation and of dot product computation.

3.3. Semi-supervised learning approach

Even though AffectiveSpace 2 is a powerful tool for discovering the semantic and affective relatedness of natural language concepts, reasoning by analogy in such a multi-dimensional vector space is a difficult task, as the distribution of concepts in the space is non-linear and only the affective valence of a relatively small set of concepts is known a priori. Hence, the bSVM and bRLS models are adopted as a semi-supervised approach in order to exploit both unlabeled and labeled commonsense data to learn a classification function empirically. They are both based on biased regularization theory, realized as follows: a reference solution (e.g., a hyperplane) is used to bias the solution of a regularization-based learning machine.

3.3.1. Regularization-based learning

Modern classification methods often rely on regularization theory. In a regularized functional, a positive parameter, λ, rules the tradeoff between the empirical risk, R_emp[f] (the loss function), of the decision function f (i.e., regression or classification) and a regularizing term. The cost to be minimized can be expressed as:

R_reg = R_emp[f] + λ Ω[f], (7)

where the regularization operator, Ω[f], quantifies the complexity of the class of functions from which f is drawn. Usually, f belongs to a Reproducing Kernel Hilbert Space (RKHS) H. For a dataset, X, one computes a square matrix, K, which is symmetric and positive definite. Every entry K(s, x) can be viewed as the inner product <φ(s), φ(x)>, where φ(·) is the (implicit, non-linear) mapping function uniquely defined by H. A kernel method implies the choice of an inner-product formulation; the simplest, linear kernel supports the inner vector product in the actual domain space: <φ(s), φ(x)> ≡ <s, x>.

When dealing with maximum-margin algorithms, Ω[f] is implemented by the term ‖f‖²_H, which supports a square norm in the feature space. The Representer Theorem proves that, when Ω[f] = ‖f‖²_H, the solution of the regularized cost can be expressed as a finite summation over a set of labeled training patterns X = {x_i, y_i}, i = 1, ..., l, y_i ∈ {−1, +1}:

f(x_j) = w · x_j = Σ_{i=1}^{l} β_i K(x_i, x_j). (8)

SVM and RLS are popular methods belonging to this family of regularizing algorithms; both provide excellent performance in pattern recognition problems. The two learning algorithms differ in their choice of loss function: the SVM model uses the 'hinge' loss function, whereas RLS operates on a square loss function.

The SVM training process requires one to solve the following optimization problem:

min_{w, b, ξ}  C Σ_{i=1}^{l} ξ_i + (1/2) ‖w‖²,  ξ_i ≥ 0, (9)

subject to:

y_i (w^T x_i) ≥ 1 − ξ_i, (10)

where ξ_i is a penalty term added for each misclassified pattern, and C is a hyperparameter that plays the role of 1/λ. The constant parameter b has been dropped because one can equivalently augment the space X with a feature of constant value +1. The problem can be efficiently solved in its dual form by using quadratic programming techniques. When using the dual formulation, one optimizes a set of Lagrange multipliers, α_i, and it can be shown that the series coefficients in (8) can be written as β_i = α_i y_i.


When dealing with the RLS model, the problem to be optimized is:

min_f  Σ_{i=1}^{l} (y_i − f_i)² + (λ/2) ‖f‖²_H, (11)

whose optimum in β is found by solving the following linear system:

(K + λI) β = y. (12)
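A minimal numpy sketch of kernel RLS as given by Eqs. (11) and (12) (toy data; the RBF kernel and its width are illustrative choices):

import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # K[i, j] = exp(-gamma * ||a_i - b_j||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))          # training patterns
y = np.sign(X[:, 0])                      # toy +/-1 labels

lam = 0.1
K = rbf_kernel(X, X)
beta = np.linalg.solve(K + lam * np.eye(len(X)), y)   # Eq. (12)
f = K @ beta                              # f(x_j) = sum_i beta_i K(x_i, x_j), Eq. (8)
print("training accuracy:", (np.sign(f) == y).mean())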

3.3.2. Maximal discrepancy bounds for model selection

One of the main obstacles in classification problems is tuning the classifier's regularization parameter(s). When tackling limited-sample problems, strategies such as k-fold cross validation may be difficult to apply due to the small size of both the training and the test sets. Thus, theoretical approaches which derive analytical expressions of the generalization bounds can give powerful options for attaining reliable model selection. These methods do not require data partitioning and are always based on the complexity of the hypothesis space, F. The bound on the true generalization error, R[f], is asserted with confidence at least 1 − δ, and is commonly written as the sum of several terms:

R[f] ≤ R_emp[f] + χ + Δ, (13)

where R_emp[f] is the error on the training set, χ measures the complexity of the space of classifying functions, and Δ penalizes the finiteness of the training sample.

The Maximal-Discrepancy (MD) bound belongs to this class of theoretical approaches. In this research, the MD bound is exploited to confirm that the proposed semi-supervised learning scheme can ensure the shrinking of the generalization bound, thus providing effective tools for model selection in a semi-supervised setting.

3.3.3. A biased regularization

A general biased regularization model is realized as follows: a reference solution (e.g., a hyperplane) is used to bias the solution of a regularization-based learning machine.

In a linear domain, one can define a generic convex loss function, l(X, Y, w), and a biased regularizing term; the resulting cost function is:

l(X, Y, w) + λ1 ‖w − λ2 w0‖², (14)

where w0 is a reference hyperplane, λ1 is the classical regularization parameter that controls smoothness (e.g., 1/C in SVM), and λ2 controls the adherence to the reference solution w0. Expression (14) is a convex functional and thus admits a global solution. From (14) one gets:

l(X, Y, w) + (λ1/2) ‖w − λ2 w0‖² = l(X, Y, w) + (λ1/2) ‖w‖² − λ1 λ2 w · w0, (15)

which actually involves two regularization parameters, λ1 and λ2; this problem setting differs from the one proposed for SVM, where only one regularization parameter was defined, obtaining l(X, Y, w) + λ1 ‖w − w0‖². The latter expression coincides with ours in the special case λ2 = 1.

Fig. 5 explicates the role played by parameter λ2 in three different cases. In all those figures, w is set as the origin, 0, of the space of hypotheses, whereas a black square denotes the reference hyperplane, w0, and a grey square indicates the 'true' optimal solution. For the sake of clarity, and without loss of generality, the examples assume that:

1. λ1 is set to a fixed value (i.e., λ1 = 1).
2. The distance ‖w − λ2 w0‖ is constant for any λ2.
3. w_{λ2=0} (black triangle) is the best solution one can obtain from unbiased learning (i.e., λ2 = 0). Here, the best solution refers to the solution that is closest to w∗ among all the possible w that lie at a distance ‖w − λ2 w0‖ from w0 (the dashed circumference).

Fig. 5(a) refers to the situation in which the reference w0 is closer to the true solution w∗ than w_{λ2=0}. The figure shows that, when λ2 decreases from 1 to 0, the centre of the ideal circumference, which encloses the eventual solution w_{λ2}, drifts. When λ2 → 0, w_{λ2} moves toward the origin w = 0, which represents the condition of no reference exploited. Indeed, the drawing highlights that, when w0 gives a reliable reference, one can take full advantage of biased regularization, as the best solution for λ2 = 1, w_{λ2=1}, definitely improves over w_{λ2=0}.

Fig. 5(b) illustrates the opposite case: the reference w0 is more distant from the true solution w∗ than w_{λ2=0} (it is worth noting that the relative positions of w∗ and w_{λ2=0} with respect to the origin w = 0 remain unchanged when compared with Fig. 5(a)). In this situation, one would obtain the best outcome by setting λ2 = 0, thus neutralizing the contribution of the biased regularization. Hence, w0 does not represent a helpful reference.

Finally, Fig. 5(c) illustrates another situation in which the reference w0 is more distant from the true solution, w∗, than w_{λ2=0}, but biased regularization still remains useful: by adjusting λ2 (i.e., by modulating the contribution of the reference w0), one eventually obtains a solution w_{λ2} that improves over w_{λ2=0}. As a result, one can take advantage of biased regularization even when the reference solution is not optimal.

Fig. 5. The role played by parameter λ2 in the proposed problem setting. Three different situations are analyzed: (a) the reference w0 is closer to the true solution w∗ than w_{λ2=0}; (b) the reference w0 is more distant from the true solution w∗ than w_{λ2=0}; (c) the reference w0 is more distant from the true solution w∗ than w_{λ2=0}, but biased regularization can be useful.

The extension of (14) to non-linear models is obtained by considering a Reproducing Kernel Hilbert Space H. In that case, one has a reference function f0 and the functional (15) becomes:

l(X, Y, f) + (λ1/2) ‖f − λ2 f0‖²_H, (16)

where now the norm of the regularizer is taken in H. Eventually, one obtains the models for the biased SVM (bSVM) and the biased RLS (bRLS), respectively, by adopting the proper loss function l(X, Y, w):

bSVM:  Σ_{i=1}^{l} (1 − y_i f(x_i))_+ + (λ1/2) ‖f − λ2 f0‖²_H, (17)

bRLS:  Σ_{i=1}^{l} (y_i − f(x_i))² + (λ1/2) ‖f − λ2 f0‖²_H. (18)

3.3.4. Biased SVM

The following theorem shows the formalization of the biased version of the support vector machine (bSVM) within this scheme, with the inclusion of a regularizing bias.

Theorem 1 (bSVM): Given a reference hyperplane w0 (or a reference function f0, if the domain is not linear), a regularization constant C, and a biasing constant λ2, the dual form of the learning problem:

min_{ξ, w}  C Σ_{i=1}^{l} ξ_i + (1/2) ‖w − λ2 w0‖²,
subject to  y_i (w · x_i) ≥ 1 − ξ_i ∀i,  ξ_i ≥ 0 ∀i, (19)

is written as:

min_α  (1/2) α^t Q α − Σ_{i=1}^{l} α_i (1 − λ2 y_i f0(x_i)),
subject to  0 ≤ α_i ≤ C ∀i. (20)

The model of the data is:

f(x) = Σ_{i=1}^{l} α_i y_i K(x, x_i) + λ2 f0(x), (21)

where the α_i are the support vector coefficients, the ξ_i are the slack variables used to measure the grade of allowed misclassification, and K is the chosen kernel.

Unlike the conventional SVM formulation, the minimization problem (20) does not contain a linear constraint. Problem (20) can thus be optimized by an SMO version which uses only a single Lagrange multiplier at each iteration. In such a procedure, the gradient integrates the new reference-based term and the regularization parameter. The gradient value for the i-th pattern is:

G_i = y_i Σ_{j=1}^{l} α_j y_j K(x_j, x_i) − 1 + λ2 y_i f0(x_i). (22)

Then, as usual, the projected gradient PG is computed and the KKT optimality conditions are checked on this value. The algorithm runs till the KKT conditions are satisfied. In the following, the pseudo-code of the algorithm is presented.

bSVM Solver:

1. Initialization: α = 0, flag = 0
2. While (!flag):
   a. flag = 1
   b. For i = 1, ..., l:
      i.   G = y_i Σ_{j=1}^{l} α_j y_j K(x_j, x_i) − 1 + λ2 y_i f0(x_i) (23)
      ii.  PG = min(G, 0) if α_i = 0;  max(G, 0) if α_i = C;  G if 0 < α_i < C (24)
      iii. If |PG| > ε:
           1. α_i = min(max(α_i − G/K_ii, 0), C)
           2. flag = 0

The pseudo-code listed above can be considerably accelerated by updating the gradient only when necessary, as well as by using shrinking and random permutations of the indexes of patterns at each iteration.

    .3.5. Biased RLS

    The following theorem formalizes the linear biased version of

    LS.

Theorem 2: Given a reference hyperplane w0, a regularization constant λ1, and a biasing constant λ2, the problem:

\[
\min_{w} \; \bigl\| Xw - y \bigr\|^2 + \lambda_1 \bigl\| w - \lambda_2 w_0 \bigr\|^2 \tag{25}
\]

has solution:

\[
w = (X^t X + \lambda_1 I)^{-1} (X^t y + \lambda_1 \lambda_2 w_0) \tag{26}
\]

The following theorem gives the dual form of biased RLS (bRLS):

Theorem 3: Given a reference hyperplane w0 (or a reference function f0), a regularization constant λ1, and a biasing constant λ2, the dual of the problem:

\[
\min_{w} \; \bigl\| Xw - y \bigr\|^2 + \lambda_1 \bigl\| w - \lambda_2 w_0 \bigr\|^2 \tag{27}
\]

is:

\[
\min_{\beta} \; \bigl\| K\beta + \lambda_2 f_0(X) - y \bigr\|^2 + \lambda_1\, \beta^t K \beta \tag{28}
\]

which has solution:

\[
\beta = (K + \lambda_1 I)^{-1} \bigl( y - \lambda_2 f_0(X) \bigr) \tag{29}
\]

The model of the data is:

\[
f(x) = \sum_{i=1}^{l} \beta_i\, K(x, x_i) + \lambda_2\, f_0(x) \tag{30}
\]

Corollary 1: Given the RLS learning machine, the RKHS H, and the representation

\[
f(x) = \sum_{i=1}^{l} \beta_i\, K(x, x_i) + \lambda_2\, f_0(x) \tag{31}
\]

the solution of the problem

\[
\min_{f} \; \bigl\| f - y \bigr\|^2 + \lambda_1 \bigl\| f - \lambda_2 f_0 \bigr\|_{H}^2 \tag{32}
\]

is:

\[
(K + \lambda_1 I)\, \beta = y - \lambda_2 f_0(X) \tag{33}
\]

From a computational point of view, the choice between the primal and the dual solution depends on the characteristics of the available data. If the samples lie in a low-dimensional space and can be separated by a linear classifier, the primal form is preferable because it scales linearly with the number of features. Conversely, when the number of patterns is lower than the number of features, the dual form should be used.
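The closed forms (26) and (29) translate directly into a few lines of linear algebra. The sketch below is illustrative (the function names are assumptions) and reuses the conventions of the bSVM sketch above.

```python
import numpy as np

def brls_primal(X, y, w0, lam1=1.0, lam2=1.0):
    """Primal biased RLS (26): w = (X^t X + lam1 I)^{-1} (X^t y + lam1 lam2 w0)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam1 * np.eye(d), X.T @ y + lam1 * lam2 * w0)

def brls_dual(K, y, f0, lam1=1.0, lam2=1.0):
    """Dual biased RLS (29): beta = (K + lam1 I)^{-1} (y - lam2 f0(X))."""
    l = K.shape[0]
    return np.linalg.solve(K + lam1 * np.eye(l), y - lam2 * f0)

def brls_predict(K_test, beta, f0_test, lam2=1.0):
    """Model of the data (30): f(x) = sum_i beta_i K(x, x_i) + lam2 f0(x)."""
    return K_test @ beta + lam2 * f0_test
```

Note that the primal route solves a d × d system while the dual route solves an l × l system, which mirrors the trade-off just described: the cheaper option depends on whether features or patterns dominate.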

3.3.6. A semi-supervised learning scheme based on biased regularization

Once the biased version of the SVM kernel machine has been defined, the following four-step procedure can be followed in order to formalize a semi-supervised framework for the classification task.

Let X be a dataset composed of l labeled patterns and u unlabeled patterns; let X_l denote the labeled subset, y_l denote the




corresponding vector of labels, and X_u denote the unlabeled subset. Then the semi-supervised learning scheme can be formalized as follows:

1. Clustering: use any clustering algorithm to perform an unsupervised partition of the dataset X (a bipartition in the simplest case).

2. Calibration: for every cluster, a majority voting scheme is adopted to set the cluster label; this is done by exploiting the labeled samples. Then, for each cluster, assign to each sample the cluster label. Let ŷ denote this new set of labels.

3. Mapping: given X and ŷ, train the selected learning machine and obtain the solution w0.

4. Biasing: given X_l and the true labels y_l, train the biased version of the learning machine (biased by w0). The solution w carries information derived from both the labeled data X_l and the unlabeled data X_u.
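As a concrete illustration, the four steps can be sketched as follows, assuming ±1 labels and treating the learning machines as pluggable callables. The helpers train and train_biased are hypothetical stand-ins for the chosen machine and its biased version (e.g., the bRLS sketch above); scikit-learn's KMeans covers the clustering step.

```python
import numpy as np
from sklearn.cluster import KMeans

def semi_supervised_bias(X_l, y_l, X_u, train, train_biased, n_clusters=2):
    """Four-step scheme: clustering, calibration, mapping, biasing."""
    X = np.vstack([X_l, X_u])                       # 1. Clustering on all patterns
    clusters = KMeans(n_clusters=n_clusters).fit_predict(X)
    l = len(y_l)
    y_hat = np.empty(len(X))
    for c in range(n_clusters):                     # 2. Calibration: majority vote
        members = clusters == c
        votes = y_l[members[:l]]                    # labeled samples in cluster c
        y_hat[members] = 1.0 if votes.sum() >= 0 else -1.0  # ties default to +1
    f0 = train(X, y_hat)                            # 3. Mapping: reference solution
    return train_biased(X_l, y_l, f0)               # 4. Biasing on the true labels
```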

The ultimate result is that the overall learned function exploits both labeled and unlabeled commonsense data. The semi-supervised learning scheme possesses some interesting features:

• Since the proposed method can be applied to both linear and non-linear domains, the result is a completely generalizable learning scheme.
• The present learning scheme distinguishes between two principal actions: clustering and biasing. This means that the two tasks can be tackled independently: if one wants to adopt a particular solution for biasing, or a new clustering algorithm is designed, the two actions can be controlled and adjusted separately.
• If the learning machine is a single-layer learning machine whose cost is convex, then convexity is preserved and a global solution is guaranteed.
• Every clustering method can be used to build the reference solution.

3.3.7. bSVM and bRLS for semi-supervised learning: computational complexity

This semi-supervised learning scheme consists of three computationally intensive steps: clustering, mapping, and biasing. The clustering task can be completed using various clustering algorithms, each characterized by its own computational complexity; in the following, the complexity of clustering is denoted generically as O_c. Efficient solutions also exist for implementing powerful clustering algorithms such as k-means or spectral clustering.

In the second step, mapping, the time complexity is entirely determined by the learning machine applied to all the l + u available samples. For RLS this would mean a complexity of O((l + u)^3), i.e., the solution of the system of linear equations (29). When adopting SVM as the learning machine, one can exploit the SMO algorithm, which scales between O(l + u) and O((l + u)^2).

The third step, biasing, consists of solving either a linear system or an SMO-like algorithm. In both cases, one also needs to pre-compute the predictions of the reference model f0(x) for all the labeled patterns (with d-dimensional patterns the eventual cost is O(ud)). As a result, when bRLS is adopted as the learning machine, the computational complexity is:

\[
O_{bRLS} = O_c + O\bigl((l+u)^3\bigr) + O(ud) + O\bigl(l^3\bigr) \tag{34}
\]

In O_{bRLS} the dominant terms are O((l + u)^3) and possibly the complexity O_c associated with the clustering task. When instead bSVM is used, the computational complexity is:

\[
O_{bSVM} = O_c + O\bigl((l+u)^2\bigr) + O(ud) + O\bigl(l^2\bigr) \tag{35}
\]

where the dominant terms are O((l + u)^2) and, again, the complexity O_c. In this case, one assumes that O(ud) < O((l + u)^2); this is a reasonable hypothesis, except for those cases where the data lie in a highly dimensional space. Therefore, the complexity of the training procedure roughly scales with the complexity of the original learning machine: SVM scales approximately as O(l^2), and its semi-supervised version (bSVM) scales as O((l + u)^2); a similar behavior characterizes RLS and bRLS.

Below are some final considerations regarding computational complexity:

• If it is known a priori that the data are almost linearly separable, then it is possible to build very efficient learning algorithms. For instance, one can couple fast linear k-means implementations with the linear versions of SVM and bSVM. That setup leads to very efficient learning methods, in particular when data are highly sparse, such as in text mining problems.
• The proposed semi-supervised learning scheme can address large-scale problems, as long as the clustering engines scale well. For certain domains, for example text mining, in which linearity and data sparsity are exploitable, adaptations of the learning algorithm can result in extremely fast methods that can process hundreds of thousands of patterns in a matter of seconds.
• New, unseen test patterns can be managed effectively, as class assignment can exploit the closed-form functions.

The proposed framework provides attractive features when compared with other semi-supervised methods. First, the reference function can be worked out by exploiting any clustering algorithm. Second, biased regularization can effectively support model selection. This feature is indeed crucial, in particular when a model with few labeled data is addressed; other approaches to semi-supervised learning do not provide this attribute. Furthermore, the present framework exploits a convex cost function.

The proposed framework also ensures satisfactory performance in terms of computational complexity. This is highlighted in Table 2, which reports the computational complexity of each approach. Computational complexity is formalized by using the number of labeled patterns, l, the number of unlabeled patterns, u, the dimensionality of the data, d, the number of iterations of the learning algorithm, k, and the complexity of the clustering algorithm, O_c. For the proposed biased learning machines, complexity has been formalized by assuming the use of efficient SMO-like routines.

Table 2
Comparison of semi-supervised methods.

Method                           Complexity
bSVM                             O_c + O((l + u)^2)
bSVM linear                      O_c + O(d(l + u)k)
bRLS                             O_c + O((l + u)^3)
LapRLS                           O((l + u)^3)
LapSVM                           O((l + u)^3)
LapSVM linear                    O(d^3)
Cluster kernel                   O(max(l, u)^3)
TSVM                             O(k(l + 2u)^2)
EM                               –
Co-training                      –
Semi-parametric regularization   O((l + u)^3)

It is evident from the table that the current semi-supervised learning scheme can achieve satisfactory performance in terms of computational complexity whenever the clustering algorithm scales as (or better than) the adopted biased machine. In this regard, bSVM appears especially appealing, as it scales quadratically, or even linearly if the underlying problem has particular characteristics. Indeed, one should take into account that the term O_c represents the added cost to be paid for a gain in flexibility.


Fig. 6. The effectiveness of model selection is improved by adopting a formulation of biased regularization which fully exploits parameters λ1 and λ2. Four cases are presented: (a) good reference / weak bias; (b) good reference / strong bias; (c) bad reference / weak bias; (d) bad reference / strong bias.




3.3.8. Biased regularization for semi-supervised learning supports effective model selection

The proposed semi-supervised learning framework can extend the supervised classification scheme presented in [53], which demonstrated that, when a cluster hypothesis holds, the use of clustering to set a reference solution leads to a significant reduction of the space of possible functions. Such a result is noteworthy in that it leads to tight generalization bounds, since the term χ in Eq. (13) measures the complexity of the space of classifying functions. Tight generalization bounds are in turn a necessary condition for supporting effective model selection.

In [53], model selection of SVM is actually performed by exploiting an auxiliary machine, called VQSVM. In the VQSVM model, the learning task is fully supervised because the clustering reference only derives from labeled data. Besides, an Ivanov biased regularization term is used and the regularization parameter λ2 is implicitly set to a fixed value.

To extend the scheme of [53] to semi-supervised learning, the present framework involves both labeled and unlabeled data in the clustering step. Indeed, the effectiveness of model selection is improved by adopting a formulation of biased regularization that fully exploits parameters λ1 and λ2.

Fig. 6 considers four cases, and analyzes the relative positions of the origin w = 0, the reference w0, and the true solution w*. Fig. 6(a) exemplifies the case "good reference / weak bias", whereas Fig. 6(b) illustrates the case "good reference / strong bias". In both situations, the reference solution w0 is not far from the true solution w*. Yet, the first case adopts a weak bias, which permits the biasing step to explore a relatively wide portion of the space around the reference (the lighter circumference, which represents the space of functions). On the other hand, the second case adopts a strong bias, thereby exploring a smaller portion of the space; eventually, the area explored via a strong bias does not include w*. Fig. 6(c) refers to the case "bad reference / weak bias": the reference w0 is quite distant from the true solution w*; by adopting a weak bias, though, one can still exploit biasing to reach w*. Finally, Fig. 6(d) presents the case "bad reference / strong bias": in this situation, a strong bias restricts the space to be explored to a small region around w0, so the proposed solution will be very distant from the true solution w*.

On the whole, the four examples affirm that, by modulating the biasing mechanism through the parameters λ1 and λ2, one can take full advantage of the semi-supervised scheme and properly support the model selection procedure. Eventually, the proposed framework involves a novel semi-supervised classification scheme, which supports fully automated model selection and can be applied even when the size of the labeled dataset is small (l < 50).

4. Experimental results

The proposed affective commonsense reasoning architecture based on bSVM and bRLS was tested on the publicly available AffectNet benchmark (http://sentic.net/downloads). The AffectNet database provides four affective labels for 2257 concepts. Each label corresponds to a level of activation in the Hourglass model. Concepts are described according to the m-dimensional vector space defined by AffectiveSpace 2. In the present experimental session, two different configurations of AffectiveSpace were compared: m = 100 and m = 50.

In order to robustly evaluate the performance of the proposed classification framework, a cross-validation procedure was applied. In particular, we considered ten different experimental runs. In each run, 200 concepts randomly extracted from the complete database provided the test set; the remaining concepts were evenly split into a training set and a validation set. In order to test the semi-supervised approach, 500 unlabeled patterns randomly extracted from a total of 16,431 unlabeled concepts were added to the training set. The reference function of the bSVM and bRLS frameworks can be derived from any clustering algorithm; we chose for this purpose the k-means algorithm.

In the present setup, the validation set was designed to support the model selection phase, i.e., the selection of the best parameterization for the classifier. An RBF kernel was adopted, thus the model selection actually involved two parameters: γ and C. In each run, the classification accuracy was measured by using the prediction system obtained after the model selection phase. Only the patterns included in the test set were exploited to assess classification accuracy, i.e., the patterns that were not involved in the training phase or in the model selection phase.
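A minimal sketch of this validation-driven model selection follows, reusing the rbf_kernel, bsvm_solve, and bsvm_predict helpers defined above; the parameter grids and the fixed λ2 value are illustrative assumptions, not the values used in the paper.

```python
import numpy as np
from itertools import product

def select_model(X_tr, y_tr, X_val, y_val, f0_tr, f0_val,
                 gammas=(0.01, 0.1, 1.0), Cs=(0.1, 1.0, 10.0), lam2=1.0):
    """Pick (gamma, C) for the RBF-kernel bSVM by validation accuracy."""
    best_params, best_acc = None, -1.0
    for gamma, C in product(gammas, Cs):
        K_tr = rbf_kernel(X_tr, X_tr, gamma)          # train-kernel for this gamma
        alpha = bsvm_solve(K_tr, y_tr, f0_tr, C=C, lam2=lam2)
        K_val = rbf_kernel(X_val, X_tr, gamma)        # validation-vs-train kernel
        pred = np.sign(bsvm_predict(K_val, alpha, y_tr, f0_val, lam2=lam2))
        acc = (pred == y_val).mean()
        if acc > best_acc:
            best_params, best_acc = (gamma, C), acc
    return best_params, best_acc
```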

Table 3 reports the average classification accuracy obtained by the bSVM and bRLS frameworks over the ten runs. The table provides the results of the two different experiments addressed in this research: the experiment involving the 100-dimensional AffectiveSpace 2 and the experiment involving the 50-dimensional AffectiveSpace 2. Classification accuracy is evaluated according to two different criteria. The first criterion, "Strict Accuracy", refers to the percentage of patterns for which the classification framework correctly predicted the activation level for every affective dimension. The second criterion, "Relaxed Accuracy", assumes that the prediction is correct even in the event the framework fails to properly assess one affective dimension out of four. The relaxed accuracy actually takes into account that a certain level of noise may affect the AffectNet benchmark, as affective activation levels are assigned according to subjective tests.

Table 3
Emotion recognition accuracy.

Method           Strict accuracy (%)   Relaxed accuracy (%)
bSVM (m = 100)   63                    87
bSVM (m = 50)    62                    89
bRLS (m = 100)   63.5                  88.5
bRLS (m = 50)    64                    90.5
ANN              46.9                  76.5
k-medoids        43.2                  74.1
k-NN             41.9                  72.3
Random           14.3                  40.1
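The two criteria are easy to pin down in code. Below is a small sketch, assuming predictions and ground truth are integer activation levels over the four Hourglass dimensions (the function name is illustrative):

```python
import numpy as np

def strict_relaxed_accuracy(pred, true):
    """pred, true: arrays of shape (n_patterns, 4) with predicted and
    annotated activation levels for the four affective dimensions."""
    wrong = (pred != true).sum(axis=1)   # wrongly assessed dimensions per pattern
    strict = (wrong == 0).mean()         # all four dimensions correct
    relaxed = (wrong <= 1).mean()        # at most one dimension wrong
    return strict, relaxed
```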

The results shown in Table 3 provide a few interesting outcomes. Firstly, both the bSVM-based and the bRLS-based frameworks proved able to attain very promising performance in terms of classification accuracy. Indeed, the proposed approach significantly improves over standard classification techniques such as the k-NN model, k-medoids, and an artificial neural network (ANN). Secondly, both frameworks also attained reliable performance when the 50-dimensional AffectiveSpace 2 was adopted, which considerably reduced the complexity of the reasoning architecture. Such results confirm the ability of the framework to deal with complex problems. Moreover, bRLS slightly improves over bSVM.

The proposed affective reasoning framework with bSVM was also evaluated on an opinion mining dataset derived from a corpus developed by Pang and Lee [32]. The corpus consists of 1000 positive and 1000 negative movie reviews from expert reviewers, collected from rottentomatoes.com. All text has been converted into lowercase and lemmatized, and HTML tags have been removed. Pang and Lee initially labelled each review manually as either positive or negative, following which the Stanford NLP group annotated the dataset at sentence level [54,55]: they extracted 11,855 sentences from the reviews and manually labeled them as positive or negative. We used the emotion-polarity formula provided by the Hourglass model to calculate a binary polarity value for each dataset sentence. Table 4 presents the comparison of the proposed system with the state-of-the-art accuracy.

Table 4
Comparison with the state of the art.

System                 Accuracy (%)
Socher et al. (2012)   80.00
Socher et al. (2013)   85.40
Proposed method        88.50
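For reference, the Hourglass emotion-polarity formula reported in the sentic computing literature computes the polarity of a sentence of N concepts as p = Σᵢ (Pleasantness(cᵢ) + |Attention(cᵢ)| − |Sensitivity(cᵢ)| + Aptitude(cᵢ)) / (3N). The sketch below applies it and thresholds at zero to obtain a binary label; the helper names are illustrative, and the formula should be checked against the Hourglass model paper.

```python
def hourglass_polarity(concepts):
    """concepts: list of (pleasantness, attention, sensitivity, aptitude)
    tuples with affective activation values in [-1, 1]."""
    n = len(concepts)
    total = sum(p + abs(a) - abs(s) + ap for p, a, s, ap in concepts)
    return total / (3 * n)                  # continuous polarity in [-1, 1]

def binary_polarity(concepts):
    """Threshold the continuous polarity into a binary sentence label."""
    return "positive" if hourglass_polarity(concepts) >= 0 else "negative"
```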

5. Conclusion

We live in a world where millions of people express their views and opinions of commercial products on the web on a daily basis. The distillation of knowledge from the massive amounts of unstructured information available is of considerable importance for tasks such as social media marketing, product positioning, and financial market prediction.

While existing approaches are limited by the fact that they work at word level and need a lot of training, semi-supervised commonsense reasoning seems a good solution to the problem of big social data analysis. In this work, a novel learning model based on the combined use of random projections and support vector machines was exploited to perform reasoning on a knowledge base of affective commonsense. Results showed a significant improvement in both emotion recognition and polarity detection and, hence, paved the way for semi-supervised learning approaches to big social data analysis.

References

[1] S. Poria, E. Cambria, R. Bajpai, A. Hussain, A review of affective computing: from unimodal analysis to multimodal fusion, Inf. Fusion 37 (2017) 98–125.
[2] E. Cambria, A. Hussain, Sentic Computing: A Common-Sense-Based Framework for Concept-Level Sentiment Analysis, Springer, Cham, Switzerland, 2015.
[3] E. Cambria, D. Olsher, K. Kwok, Sentic activation: a two-level affective common sense reasoning framework, in: Proceedings of AAAI, Toronto, 2012, pp. 186–192.
[4] R. Speer, C. Havasi, ConceptNet 5: a large semantic network for relational knowledge, in: E. Hovy, M. Johnson, G. Hirst (Eds.), Theory and Applications of Natural Language Processing, Springer, 2012.
[5] C. Strapparava, A. Valitutti, WordNet-Affect: an affective extension of WordNet, in: Proceedings of the International Conference on Language Resources and Evaluation, Lisbon, 2004, pp. 1083–1086.
[6] E. Cambria, D. Das, S. Bandyopadhyay, A. Feraco, A Practical Guide to Sentiment Analysis, Springer, Cham, Switzerland, 2017.
[7] F. Bisio, P. Gastaldo, R. Zunino, S. Decherchi, Semi-supervised machine learning approach for unknown malicious software detection, in: Proceedings of the IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), IEEE, 2014, pp. 52–59.
[8] S. Poria, E. Cambria, D. Hazarika, N. Mazumder, A. Zadeh, L.-P. Morency, Context-dependent sentiment analysis in user-generated videos, in: Proceedings of the ACL, 2017, pp. 873–883.
[9] I. Chaturvedi, E. Ragusa, P. Gastaldo, R. Zunino, E. Cambria, Bayesian network based extreme learning machine for subjectivity detection, J. Frankl. Inst. (2017).
[10] F. Xing, E. Cambria, R. Welsch, Natural language based financial forecasting: a survey, Artif. Intell. Rev. (2018), doi: 10.1007/s10462-017-9588-9.
[11] M. Ebrahimi, A. Hossein, A. Sheth, Challenges of sentiment analysis for dynamic events, IEEE Intell. Syst. 32 (5) (2017).
[12] E. Cambria, A. Hussain, T. Durrani, C. Havasi, C. Eckl, J. Munro, Sentic computing for patient centered application, in: Proceedings of the IEEE International Conference on Signal Processing, Beijing, 2010, pp. 1279–1282.
[13] A. Valdivia, V. Luzon, F. Herrera, Sentiment analysis in TripAdvisor, IEEE Intell. Syst. 32 (4) (2017) 2–7.
[14] S. Cavallari, V. Zheng, H. Cai, K. Chang, E. Cambria, Joint node and community embedding on graphs, in: Proceedings of the International Conference on Information and Knowledge Management, 2017.
[15] R. Mihalcea, A. Garimella, What men say, what women hear: finding gender-specific meaning shades, IEEE Intell. Syst. 31 (4) (2016) 62–67.
[16] E. Cambria, S. Poria, A. Gelbukh, M. Thelwall, Sentiment analysis is a big suitcase, IEEE Intell. Syst. 32 (6) (2017).
[17] S. Poria, I. Chaturvedi, E. Cambria, F. Bisio, Sentic LDA: improving on LDA with semantic similarity for aspect-based sentiment analysis, in: Proceedings of the International Joint Conference on Neural Networks, 2016, pp. 4465–4473.
[18] Y. Ma, E. Cambria, S. Gao, Label embedding for zero-shot fine-grained named entity typing, in: Proceedings of the International Conference on Computational Linguistics, Osaka, 2016, pp. 171–180.
[19] Y. Xia, E. Cambria, A. Hussain, H. Zhao, Word polarity disambiguation using Bayesian model and opinion-level features, Cognit. Comput. 7 (3) (2015) 369–380.
[20] X. Zhong, A. Sun, E. Cambria, Time expression analysis and recognition using syntactic token types and general heuristic rules, in: Proceedings of the ACL, 2017, pp. 420–429.
[21] N. Majumder, S. Poria, A. Gelbukh, E. Cambria, Deep learning-based document modeling for personality detection from text, IEEE Intell. Syst. 32 (2) (2017) 74–79.
[22] S. Poria, E. Cambria, D. Hazarika, P. Vij, A deeper look into sarcastic tweets using deep convolutional neural networks, in: Proceedings of the International Conference on Computational Linguistics, 2016, pp. 1601–1612.
[23] L. Oneto, F. Bisio, E. Cambria, D. Anguita, Statistical learning theory and ELM for big social data analysis, IEEE Comput. Intell. Mag. 11 (3) (2016) 45–55.
[24] C.D. Elliott, The Affective Reasoner: A Process Model of Emotions in a Multi-Agent System, Ph.D. thesis, Northwestern University, Evanston, 1992.
[25] A. Ortony, G. Clore, A. Collins, The Cognitive Structure of Emotions, Cambridge University Press, Cambridge, 1988.
[26] J. Wiebe, T. Wilson, C. Cardie, Annotating expressions of opinions and emotions in language, Lang. Resour. Eval. 39 (2) (2005) 165–210.
[27] T. Wilson, J. Wiebe, P. Hoffmann, Recognizing contextual polarity in phrase-level sentiment analysis, in: Proceedings of the Human Language Technology Conference and Empirical Methods in Natural Language Processing, Vancouver, 2005, pp. 347–354.
[28] R. Stevenson, J. Mikels, T. James, Characterization of the affective norms for English words by discrete emotional categories, Behav. Res. Methods 39 (2007) 1020–1024.
[29] S. Somasundaran, J. Wiebe, J. Ruppenhofer, Discourse level opinion interpretation, in: Proceedings of the International Conference on Computational Linguistics, Manchester, 2008, pp. 801–808.
[30] B. Pang, L. Lee, S. Vaithyanathan, Thumbs up? Sentiment classification using machine learning techniques, in: Proceedings of the Empirical Methods in Natural Language Processing, Philadelphia, 2002, pp. 79–86.
[31] B. Goertzel, K. Silverman, C. Hartley, S. Bugaj, M. Ross, The Baby Webmind project, in: Proceedings of the Adaptation in Artificial and Biological Systems, Birmingham, 2000.
[32] B. Pang, L. Lee, Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales, in: Proceedings of the ACL, Ann Arbor, 2005, pp. 115–124.
[33] M. Hu, B. Liu, Mining and summarizing customer reviews, in: Proceedings of the Conference on Knowledge Discovery and Data Mining, Seattle, 2004.
[34] L. Velikovich, S. Goldensohn, K. Hannan, R. McDonald, The viability of web-derived polarity lexicons, in: Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics, Los Angeles, 2010, pp. 777–785.
[35] A. Gangemi, V. Presutti, D. Reforgiato, Frame-based detection of opinion holders and topics: a model and a tool, IEEE Comput. Intell. Mag. 9 (1) (2014) 20–30.
