OUTLINE
Sentiment-Specific Representation Learning for Document-Level Sentiment Analysis. WSDM’15
  Introduction
  Approach: Sentiment-Specific Word Embedding + Sentence Composition
  Experiments
Combining Language and Vision with a Multimodal Skip-gram Model. NAACL’15
  Introduction
  Multimodal Skip-gram Architecture
  Experiments: Abstract Words - Images
Tagging Personal Photos with Transfer Deep Learning. WWW’15
  Introduction
  Approach: Ontology Building + Transfer Learning + Personal Photo Tagging
  Experiments
1
LU Yangyang
May 27th, 2015
AUTHORS
Sentiment-Specific Representation Learning for Document-Level Sentiment Analysis
WSDM 2015, short paper
Duyu Tang (Bing Qin, Ting Liu)
Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
Combining Language and Vision with a Multimodal Skip-gram Model NAACL 2015, full paper Angeliki Lazaridou, Nghia The Pham, Marco Baroni Center for Mind/Brain Sciences, University of Trento
Tagging Personal Photos with Transfer Deep Learning WWW 2015, full paper Jianlong Fu 1, Tao Mei 2, Kuiyuan Yang 2, Hanqing Lu 1, and Yong Rui 2
1. Institute of Automation, Chinese Academy of Sciences
2. Microsoft Research, China
2
Sentiment analysis (also known as opinion mining)
  To analyze people's opinions/sentiments/emotions from texts
Related work: methods based on feature engineering with hand-coded features
  Document-level sentiment analysis >> a text categorization task
  Focusing on designing effective features
  Using many sentiment lexicons and hand-crafted rules as features
Deep learning methods
  Effective in NLP tasks: word segmentation, POS tagging, NER, etc.
  Context-based learning algorithms, focusing on modeling the contexts of words
  Failing to capture the sentiment information of texts
Developing sentiment-specific representation learning methods for document-level sentiment analysis
Sentiment-Specific Representation Learning for Document-Level Sentiment Analysis. WSDM’15
3
Similar contexts of “good” and “bad”
1. Sentiment-Specific Word Embedding
  Simultaneously encoding the contexts of words and the sentiment information of texts in the continuous representation of words
  Using emoticon signals as the sentiment supervision of Twitter sentences
  SSWE: an extension of the C&W model
  SSPE: an extension of Mikolov’s Skip-gram model
2. Sentiment-Specific Sentence Structure
  A sentiment-specific sentence segmentor
3. Sentence Composition
  A sentiment-tailored composition approach
4. Document Composition
  Learning sentiment-sensitive discourse phenomena
Tang’s Approach
4
Model I: sentiment-specific word embedding (SSWE)
  An extension of the C&W model
  Input: an original (or corrupted) ngram + the sentiment polarity of a sentence
  Output: the context score and the sentiment score
Step 1: Sentiment-Specific Word Embedding
5
D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, and B. Qin. Learning sentiment-specific word embedding for twitter sentiment classification. ACL’14
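The SSWE training signal combines two ranking objectives: the original ngram should outscore a corrupted one on context, and the gold polarity should win on sentiment. A minimal sketch of that hinge-loss combination follows; the function names, the toy scores, and the default `alpha` are illustrative assumptions, not the authors' code.

```python
def hinge(x):
    # standard ranking hinge: max(0, 1 - margin)
    return max(0.0, 1.0 - x)

def sswe_loss(ctx_true, ctx_corrupt, sent_scores, polarity, alpha=0.5):
    """Combined SSWE ranking loss (illustrative sketch).

    ctx_true / ctx_corrupt : context scores of the original ngram and of
        the corrupted ngram (middle word replaced at random).
    sent_scores : (positive_score, negative_score) predicted for the ngram.
    polarity    : +1 for a positive tweet, -1 for a negative one.
    alpha       : interpolation weight between the two objectives (assumed).
    """
    # context part: the original ngram should outscore the corrupted one
    loss_c = hinge(ctx_true - ctx_corrupt)
    # sentiment part: the score of the gold polarity should win
    pos, neg = sent_scores
    loss_s = hinge(polarity * (pos - neg))
    return alpha * loss_c + (1.0 - alpha) * loss_s

# a well-ranked example incurs zero loss; a fully wrong one is penalized
print(sswe_loss(2.0, 0.0, (2.0, 0.0), +1))  # → 0.0
```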
Model II: sentiment-specific phrase embedding (SSPE)
  An extension of Mikolov’s Skip-gram model
  Utilizing the embedding of word 𝑤𝑖 to predict its context words
  Using the sentence representation 𝑠𝑒𝑗 (the average of the phrase embeddings) to predict the gold sentiment polarity 𝑝𝑜𝑙𝑗
Step 1: Sentiment-Specific Word Embedding (cont.)
6
D. Tang, F. Wei, B. Qin, M. Zhou, and T. Liu. Building large-scale twitter-specific sentiment lexicon: A representation learning approach. COLING’14
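The SSPE sentiment term above can be sketched in a few lines: the sentence vector 𝑠𝑒𝑗 is the element-wise average of its phrase embeddings, and a linear score of that vector is trained to agree with the gold polarity. This is a toy reconstruction under assumed names (`w_sent`, `alpha`) and a hinge sentiment loss, not the paper's exact formulation.

```python
def average_embeddings(phrase_vecs):
    # se_j: element-wise average of the phrase embeddings in the sentence
    dim = len(phrase_vecs[0])
    return [sum(v[k] for v in phrase_vecs) / len(phrase_vecs) for k in range(dim)]

def sspe_loss(skipgram_loss, sent_vec, w_sent, pol, alpha=0.5):
    """Skip-gram context loss plus a sentence-level sentiment term (sketch).

    skipgram_loss : context-prediction loss from the skip-gram part.
    sent_vec      : se_j, the averaged phrase embeddings of the sentence.
    w_sent        : weights of a linear polarity scorer (assumed form).
    pol           : gold polarity pol_j, +1 or -1.
    """
    score = sum(a * b for a, b in zip(sent_vec, w_sent))
    loss_s = max(0.0, 1.0 - pol * score)
    return alpha * skipgram_loss + (1.0 - alpha) * loss_s

print(average_embeddings([[1.0, 3.0], [3.0, 1.0]]))  # → [2.0, 2.0]
```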
Simultaneously conducting sentence segmentation and sentence-level sentiment classification
  Using a log-linear model to score each segmentation candidate
  Exploiting the phrasal information of top-ranked segmentations as features to build the sentiment classifier
Steps 2 & 3: Sentiment-Specific Sentence Structure
7
D. Tang, F. Wei, B. Qin, L. Dong, T. Liu, and M. Zhou. A joint segmentation and classification framework for sentiment analysis. EMNLP’14
Dataset:
  Positive/negative Twitter sentiment classification on the benchmark dataset from SemEval 2013
Baselines:
  DistSuper: trained with 10 million tweets selected by positive and negative emoticons, LibLinear on BoW
  SVM: LibLinear on BoW
  NRC-Canada: the top-performing system in this task of SemEval 2013, using many lexicons and hand-crafted features
Tang’s Experiments
8
OUTLINE
Sentiment-Specific Representation Learning for Document-Level Sentiment Analysis. WSDM’15
  Introduction
  Approach: Sentiment-Specific Word Embedding + Sentence Composition
  Experiments
Combining Language and Vision with a Multimodal Skip-gram Model. NAACL’15
  Introduction
  Multimodal Skip-gram Architecture
  Experiments: Abstract Words - Images
Tagging Personal Photos with Transfer Deep Learning. WWW’15
  Introduction
  Approach: Ontology Building + Transfer Learning + Personal Photo Tagging
  Experiments
9
Distributional semantic models (DSMs)
  Deriving vector-based representations of meaning from patterns of word co-occurrence in corpora
  Purely textual models are severely impoverished, suffering from a lack of grounding in extra-linguistic modalities
Multimodal distributional semantic models (MDSMs)
  Enriching linguistic vectors with perceptual information (e.g. images, speech)
Drawbacks of current MDSMs
  Generally building linguistic and visual representations separately and then merging them, vs. human learning, where words are heard in a situated perceptual context
  Assuming that both linguistic and visual information is available for all words, with no generalization of knowledge across modalities
The multimodal skip-gram models
  For a subset of the target words, relevant visual evidence from natural images is presented together with the corpus contexts
  Evaluation: traditional semantic benchmarks + an image labeling and retrieval scenario
Combining Language and Vision with a Multimodal Skip-gram Model. NAACL’15
10
Skip-gram model for word embedding learning
  Given a text corpus, Skip-gram aims at inducing word representations that are good at predicting the context words surrounding a target word
Injecting visual knowledge
  For a subset of the target words, the corpus contexts are accompanied by a visual representation of the concepts they denote
Multimodal Skip-gram Architecture
11
Multimodal Skip-gram Model A: directly increases the similarity between linguistic and visual representations
Multimodal Skip-gram Model B: includes an extra layer mediating between linguistic and visual representations
Multimodal Skip-gram Architecture (cont.)
12
MMSkip-Gram Model A / Model B: objective formulas (shown as figures on the original slide)
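The visual term of Model B can be sketched as a max-margin objective: the word's embedding, mapped through a cross-modal matrix, should be closer to the image vector of its own concept than to images of sampled other concepts. Model A corresponds to fixing the mapping to the identity. The function names, margin `gamma`, and toy vectors are assumptions for illustration.

```python
import numpy as np

def cos(a, b):
    # cosine similarity between two vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mmsg_b_visual_loss(u_w, M, v_w, v_negs, gamma=0.5):
    """Max-margin visual term of MMSkip-gram Model B (illustrative sketch).

    u_w    : linguistic embedding of the target word
    M      : cross-modal mapping matrix (the extra mediating layer;
             the identity recovers Model A's direct similarity)
    v_w    : visual vector of the word's concept (held fixed)
    v_negs : visual vectors of randomly sampled other concepts
    """
    mapped = M @ u_w
    pos = cos(mapped, v_w)
    # the mapped vector should beat every negative image by margin gamma
    return sum(max(0.0, gamma - pos + cos(mapped, vn)) for vn in v_negs)
```

With a perfectly aligned pair and an orthogonal negative, the margin is already satisfied and the loss is zero.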
Experiment settings
  Text corpus: a Wikipedia 2009 dump, approximately 800M tokens
  Image corpus: ImageNet, pictures of concepts that occur >= 500 times
  Each word: 100 sampled pictures from ImageNet
  Image representation: a 4096-dimensional vector extracted by the Caffe toolkit
Tasks: approximating human judgments + image labeling and retrieval
Lazaridou’s Experiments
13
Abstract words
  The indirect influence of visual information has interesting effects on the representation of abstract terms
  Abstract words are grounded in relevant concrete scenes and situations
Lazaridou’s Experiments (cont.)
14
Verifying experimentally whether images of concrete things are relevant not only for concrete words, as expected, but also for abstract ones, as predicted by embodied views of meaning
OUTLINE
15
Sentiment-Specific Representation Learning for Document-Level Sentiment Analysis. WSDM’15
  Introduction
  Approach: Sentiment-Specific Word Embedding + Sentence Composition
  Experiments
Combining Language and Vision with a Multimodal Skip-gram Model. NAACL’15
  Introduction
  Multimodal Skip-gram Architecture
  Experiments: Abstract Words - Images
Tagging Personal Photos with Transfer Deep Learning. WWW’15
  Introduction
  Approach: Ontology Building + Transfer Learning + Personal Photo Tagging
  Experiments
The emergence of mobile devices and cloud storage services >> an unprecedented growth in the number of personal photos
  Personal photos: photos usually captured by amateur users with personal digital devices
  Managing personal photos: image tagging
Personal photos: different from Web images
  Lacking accurate text descriptions in general, as users are unlikely to label their photos
  The semantic distribution of personal photos is only a subset of a general vocabulary of Web images
  The appearance of personal photos is more complex
  The tags, if there are any, are very subjective
Tagging Personal Photos with Transfer Deep Learning. WWW’15
16
The challenge of understanding personal photos
Existing work on image tagging
  Model-based approaches: heavily relying on classifiers pre-trained with machine learning algorithms
  Model-free approaches: propagating tags through the tagging behavior of visual neighbors
Both assume that there is a well-labelled image database (source domain) that has the same or at least a similar data distribution as the target domain
However, a well-labelled database is hard to obtain in the domain of personal photos
  Flickr: many personal photos and user-contributed tags, but half of the user-contributed tags are noise with respect to the image content
  ImageNet: accurate supervised information, but two significant gaps remain: the semantic distribution gap and the visual appearance gap between the two domains
A novel transfer deep learning approach with ontology priors
1. Designing an ontology specific to personal photos
2. Reducing the visual appearance gap by transfer deep learning
3. Proposing two modes for personal photo tagging
Tagging Personal Photos with Transfer Deep Learning (cont.)
17
The vocabulary of general Web images: too large and not specific to personal photos
Exploring the semantic distributions in the domain of personal photos
1. Mining frequent tags from active users on Flickr
  10,000 active users: >= 500 photos uploaded in the most recent 6 months, registration time >= 2 years
  20 million photos and 2.7 million unique tags
  Each user-contributed tag >> a concept
  Top tags ranked by frequency (>= 3,000 times) >> 272 concepts
2. Building concept relations
  272 concepts as leaf nodes, grouped into 20 middle-level nodes according to word similarity in WordNet
3. Mapping concepts to ImageNet labels for transfer learning
  Calculating the word similarity between the 272 concepts and the ImageNet-22K labels by using WordNet
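The mapping step above can be sketched generically: each ontology concept is assigned the ImageNet label it is most similar to, keeping only matches above a threshold. The paper uses a WordNet-based similarity; here `sim` is any similarity function, and the threshold and names are illustrative assumptions, not the authors' code.

```python
def map_concepts(concepts, imagenet_labels, sim, threshold=0.8):
    """Map each ontology concept to its closest ImageNet label (sketch).

    sim       : word-similarity function, e.g. a WordNet-based measure
    threshold : pairs scoring below it are left unmapped (assumed value)
    """
    mapping = {}
    for c in concepts:
        # pick the label with the highest similarity to this concept
        best = max(imagenet_labels, key=lambda l: sim(c, l))
        if sim(c, best) >= threshold:
            mapping[c] = best
    return mapping

# toy usage with an exact-match similarity
exact = lambda a, b: 1.0 if a == b else 0.0
print(map_concepts(["dog"], ["dog", "cat"], exact))  # → {'dog': 'dog'}
```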
Step 1: Building Ontology for Personal Photos
18
The stacked convolutional autoencoders (CAEs)
  Pre-training: on both source and target domains (ImageNet, Flickr)
  Fine-tuning: in the source domain >> for shared feature representations
  Adding the top layer: a fully connected layer with ontology priors (FCO)
Step 2: Transfer Deep Learning with Ontology Priors
19
Deep learning with bottom-up transfer
  CNN + AE >> CAE: pre-training + fine-tuning
Deep learning with top-down transfer
  CAE + an FCO layer: adding the priors of the leaf nodes and the middle-level nodes in the defined ontology to reduce the domain gap
Step 2: Transfer Deep Learning with Ontology Priors (cont.)
20
Tagging with single-mode: only takes visual content into account
Tagging with batch-mode: further combines visual content with time constraints in a photo collection
  The timestamp when a photo was taken: a short interval >> event relations
  The geo-location of a photo: same locations >> scenario relations
Step 3: Personal Photo Tagging
21
Labels: 2–Highly Relevant; 1–Relevant; 0–Non-Relevant
Training data: ~0.36 million images
  ImageNet: 272 concepts * about 650 randomly selected images
  Flickr: 272 concepts * 650 photos from the 10K active users
Testing data: 7,000 annotated personal photos from 25 volunteers
  Male/female: 17/8
  Different education backgrounds: CS, mathematics, physics, business, management science, art and design
  All familiar with photography and liked taking photos
  Aged 20~28 / 30~45: 19/6
  Each volunteer contributed >= 500 photos; 35,217 testing photos in total
  Labeling: each volunteer randomly annotated one fifth of their own personal photos
  10 photos having 15 relevant or highly relevant concepts
  About 70% of the photos present more than three relevant or highly relevant concepts
Fu’s Experiments - Dataset
22
Compared approaches:
1. Tag ranking
2. Dyadic transfer learning (DTL)
3. Transfer learning with Geodesic Flow Kernel (GFK)
4. Deep learning with no transfer (DL)
5. Deep learning with Flickr training data (DL(Flickr))
6. Deep learning with top-down transfer (DL+TT)
7. Deep learning with bottom-up transfer (DL+BT)
8. Deep learning with full transfer (DL+FT)
Network architecture
  CAE1: 96 filters of size 7×7×3 with a stride of 2 pixels
  CAE2: 256 filters of size 5×5×96 with a stride of 2 pixels
  CAE3: 384 filters of size 3×3×256 with a stride of 1 pixel
  CAE4: 384 filters of size 3×3×384 with a stride of 1 pixel
  CAE5: 256 filters of size 3×3×384 with a stride of 1 pixel
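As a sanity check on the filter sizes and strides above, the standard convolution output-size formula can be used to trace feature-map dimensions through the stack. The slide does not state padding or the input resolution, so zero padding and the sample input size below are assumptions.

```python
def conv_out(size, kernel, stride, pad=0):
    # standard convolution output size: floor((n + 2p - k) / s) + 1
    return (size + 2 * pad - kernel) // stride + 1

# tracing an assumed 224x224 input through the first layer (CAE1: 7x7, stride 2)
print(conv_out(224, 7, 2))  # → 109
```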
Fu’s Experiments - Settings
23
Evaluation metrics and personal photo tagging results (figures on the original slide)
Fu’s Experiments - Results
24
Personal Photo Search: MIT-Adobe FiveK dataset
Fu’s Experiments – Results (cont.)
26
SUMMARY
Sentiment-Specific Representation Learning for Document-Level Sentiment Analysis. WSDM’15
1. Sentiment-specific word embedding (SSWE) << the C&W model
2. Sentiment-specific phrase embedding (SSPE) << the Skip-Gram model
3. Sentiment-Specific Sentence Structure: segmentor + prediction
Experiment: Sentiment Analysis of Twitter Sentences in SemEval 2013
Combining Language and Vision with a Multimodal Skip-gram Model. NAACL’15
  Two multimodal skip-gram models: direct mapping + intermediate transformation
Experiments:
Approximating human judgments in semantic + Image labeling and retrieval
Abstract words are grounded in relevant concrete scenes and situations.
Tagging Personal Photos with Transfer Deep Learning. WWW’15
Personal Photos: a subset of Web images with several unique properties
1. Building Ontology for Personal Photos (Flickr + WordNet)
2. Transfer Deep Learning with Ontology Priors (CAE + a FCO layer)
3. Personal Photo Tagging (Single-Mode + Batch-Mode with time and location info.)
Experiments: Personal Photo Tagging + Search (8 baselines + 5 architectures)
27