OUTLINE
Sentiment-Specific Representation Learning for Document-Level Sentiment Analysis. WSDM’15
  Introduction
  Approach: Sentiment-Specific Word Embedding + Sentence Composition
  Experiments
Combining Language and Vision with a Multimodal Skip-gram Model. NAACL’15
  Introduction
  Multimodal Skip-gram Architecture
  Experiments: Abstract Words - Images
Tagging Personal Photos with Transfer Deep Learning. WWW’15
  Introduction
  Approach: Ontology Building + Transfer Learning + Personal Photo Tagging
  Experiments
1
LU Yangyang
May 27th, 2015
AUTHORS
Sentiment-Specific Representation Learning for Document-Level Sentiment Analysis
WSDM 2015, short paper
Duyu Tang (Bing Qin, Ting Liu)
Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
Combining Language and Vision with a Multimodal Skip-gram Model NAACL 2015, full paper Angeliki Lazaridou, Nghia The Pham, Marco Baroni Center for Mind/Brain Sciences, University of Trento
Tagging Personal Photos with Transfer Deep Learning WWW 2015, full paper Jianlong Fu 1, Tao Mei 2, Kuiyuan Yang 2, Hanqing Lu 1, and Yong Rui 2
1. Institute of Automation, Chinese Academy of Sciences
2. Microsoft Research, China
2
Sentiment analysis (also known as opinion mining)
  To analyze people's opinions/sentiments/emotions from texts
Related work: methods based on feature engineering with hand-coded features
  Document-level sentiment analysis >> a text categorization task
  Focusing on designing effective features
  Using many sentiment lexicons and hand-crafted rules as features
Deep learning methods
  Effective in NLP tasks: word segmentation, POS tagging, NER, etc.
  Context-based learning algorithms, focusing on modeling the contexts of words
  Failing to capture the sentiment information of texts
Developing sentiment-specific representation learning methods for document-level sentiment analysis
Sentiment-Specific Representation Learning for Document-Level Sentiment Analysis. WSDM’15
3
Similar contexts of “good” and “bad”
1. Sentiment-Specific Word Embedding
  Simultaneously encoding the contexts of words and the sentiment information of texts in the continuous representation of words
  Using emoticon signals as the sentiment supervision of Twitter sentences
  SSWE: an extension of the C&W model
  SSPE: an extension of Mikolov’s Skip-gram model
2. Sentiment-Specific Sentence Structure
  A sentiment-specific sentence segmentor
3. Sentence Composition
  A sentiment-tailored composition approach
4. Document Composition
  Learning sentiment-sensitive discourse phenomena
Tang’s Approach
4
Model I: sentiment-specific word embedding (SSWE)
  An extension of the C&W model
  Input: an original (or corrupted) ngram + the sentiment polarity of a sentence
  Output: the context score and the sentiment score
Step 1: Sentiment-Specific Word Embedding
5
D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, and B. Qin. Learning sentiment-specific word embedding for twitter sentiment classification. ACL’14
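The SSWE training signal combines two ranking objectives: the original ngram should outscore a corrupted one on context, and the gold polarity should win on sentiment. A minimal sketch of that hinge-loss combination follows; the function names, the toy scores, and the default `alpha` are illustrative assumptions, not the authors' code.

```python
def hinge(x):
    # standard ranking hinge: max(0, 1 - margin)
    return max(0.0, 1.0 - x)

def sswe_loss(ctx_true, ctx_corrupt, sent_scores, polarity, alpha=0.5):
    """Combined SSWE ranking loss (illustrative sketch).

    ctx_true / ctx_corrupt : context scores of the original ngram and of
        the corrupted ngram (middle word replaced at random).
    sent_scores : (positive_score, negative_score) predicted for the ngram.
    polarity    : +1 for a positive tweet, -1 for a negative one.
    alpha       : interpolation weight between the two objectives (assumed).
    """
    # context part: the original ngram should outscore the corrupted one
    loss_c = hinge(ctx_true - ctx_corrupt)
    # sentiment part: the score of the gold polarity should win
    pos, neg = sent_scores
    loss_s = hinge(polarity * (pos - neg))
    return alpha * loss_c + (1.0 - alpha) * loss_s

# a well-ranked example incurs zero loss; a fully wrong one is penalized
print(sswe_loss(2.0, 0.0, (2.0, 0.0), +1))  # → 0.0
```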
Model II: sentiment-specific phrase embedding (SSPE)
  An extension of Mikolov’s Skip-gram model
  Utilizing the embedding of word 𝑤𝑖 to predict its context words
  Using the sentence representation 𝑠𝑒𝑗 (the average of the phrase embeddings) to predict the gold sentiment polarity 𝑝𝑜𝑙𝑗
Step 1: Sentiment-Specific Word Embedding (cont.)
6
D. Tang, F. Wei, B. Qin, M. Zhou, and T. Liu. Building large-scale twitter-specific sentiment lexicon: A representation learning approach. COLING’14
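The SSPE sentiment term above can be sketched in a few lines: the sentence vector 𝑠𝑒𝑗 is the element-wise average of its phrase embeddings, and a linear score of that vector is trained to agree with the gold polarity. This is a toy reconstruction under assumed names (`w_sent`, `alpha`) and a hinge sentiment loss, not the paper's exact formulation.

```python
def average_embeddings(phrase_vecs):
    # se_j: element-wise average of the phrase embeddings in the sentence
    dim = len(phrase_vecs[0])
    return [sum(v[k] for v in phrase_vecs) / len(phrase_vecs) for k in range(dim)]

def sspe_loss(skipgram_loss, sent_vec, w_sent, pol, alpha=0.5):
    """Skip-gram context loss plus a sentence-level sentiment term (sketch).

    skipgram_loss : context-prediction loss from the skip-gram part.
    sent_vec      : se_j, the averaged phrase embeddings of the sentence.
    w_sent        : weights of a linear polarity scorer (assumed form).
    pol           : gold polarity pol_j, +1 or -1.
    """
    score = sum(a * b for a, b in zip(sent_vec, w_sent))
    loss_s = max(0.0, 1.0 - pol * score)
    return alpha * skipgram_loss + (1.0 - alpha) * loss_s

print(average_embeddings([[1.0, 3.0], [3.0, 1.0]]))  # → [2.0, 2.0]
```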
Simultaneously conducting sentence segmentation and sentence-level sentiment classification
  Using a log-linear model to score each segmentation candidate
  Exploiting the phrasal information of top-ranked segmentations as features to build the sentiment classifier
Steps 2 & 3: Sentiment-Specific Sentence Structure
7
D. Tang, F. Wei, B. Qin, L. Dong, T. Liu, and M. Zhou. A joint segmentation and classification framework for sentiment analysis. EMNLP’14
Dataset:
  Positive/negative Twitter sentiment classification on the benchmark dataset from SemEval 2013
Baselines:
  DistSuper: trained with 10 million tweets selected by positive and negative emoticons, LibLinear on BoW
  SVM: LibLinear on BoW
  NRC-Canada: the top-performing system in this task of SemEval 2013, using many lexicons and hand-crafted features
Tang’s Experiments
8
OUTLINE
Sentiment-Specific Representation Learning for Document-Level Sentiment Analysis. WSDM’15
  Introduction
  Approach: Sentiment-Specific Word Embedding + Sentence Composition
  Experiments
Combining Language and Vision with a Multimodal Skip-gram Model. NAACL’15
  Introduction
  Multimodal Skip-gram Architecture
  Experiments: Abstract Words - Images
Tagging Personal Photos with Transfer Deep Learning. WWW’15
  Introduction
  Approach: Ontology Building + Transfer Learning + Personal Photo Tagging
  Experiments
9
Distributional semantic models (DSMs)
  Deriving vector-based representations of meaning from patterns of word co-occurrence in corpora
  Purely textual models are severely impoverished, suffering from a lack of grounding in extra-linguistic modalities
Multimodal distributional semantic models (MDSMs)
  Enriching linguistic vectors with perceptual information (e.g. images, speech)
Drawbacks of current MDSMs
  Generally building linguistic and visual representations separately and then merging them, vs. human learning, where words are heard in a situated perceptual context
  Assuming that both linguistic and visual information is available for all words, with no generalization of knowledge across modalities
The multimodal skip-gram models
  For a subset of the target words, relevant visual evidence from natural images is presented together with the corpus contexts
  Evaluation: traditional semantic benchmarks + an image labeling and retrieval scenario
Combining Language and Vision with a Multimodal Skip-gram Model. NAACL’15
10
Skip-gram model for word embedding learning
  Given a text corpus, Skip-gram aims at inducing word representations that are good at predicting the context words surrounding a target word
Injecting visual knowledge
  For a subset of the target words, the corpus contexts are accompanied by a visual representation of the concepts they denote
Multimodal Skip-gram Architecture
11
Multimodal Skip-gram Model A: directly increases the similarity between linguistic and visual representations
Multimodal Skip-gram Model B: includes an extra layer mediating between linguistic and visual representations
Multimodal Skip-gram Architecture (cont.)
12
MMSkip-Gram Model A / Model B: objective formulas (shown as figures on the original slide)
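The visual term of Model B can be sketched as a max-margin objective: the word's embedding, mapped through a cross-modal matrix, should be closer to the image vector of its own concept than to images of sampled other concepts. Model A corresponds to fixing the mapping to the identity. The function names, margin `gamma`, and toy vectors are assumptions for illustration.

```python
import numpy as np

def cos(a, b):
    # cosine similarity between two vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mmsg_b_visual_loss(u_w, M, v_w, v_negs, gamma=0.5):
    """Max-margin visual term of MMSkip-gram Model B (illustrative sketch).

    u_w    : linguistic embedding of the target word
    M      : cross-modal mapping matrix (the extra mediating layer;
             the identity recovers Model A's direct similarity)
    v_w    : visual vector of the word's concept (held fixed)
    v_negs : visual vectors of randomly sampled other concepts
    """
    mapped = M @ u_w
    pos = cos(mapped, v_w)
    # the mapped vector should beat every negative image by margin gamma
    return sum(max(0.0, gamma - pos + cos(mapped, vn)) for vn in v_negs)
```

With a perfectly aligned pair and an orthogonal negative, the margin is already satisfied and the loss is zero.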
Experiment settings
  Text corpus: a Wikipedia 2009 dump, approximately 800M tokens
  Image corpus: ImageNet, pictures of concepts that occur >= 500 times
  Each word: 100 sampled pictures from ImageNet
  Image representation: a 4096-dimensional vector extracted by the Caffe toolkit
Tasks: approximating human judgments + image labeling and retrieval
Lazaridou’s Experiments
13
Abstract words
  The indirect influence of visual information has interesting effects on the representation of abstract terms
  Abstract words are grounded in relevant concrete scenes and situations
Lazaridou’s Experiments (cont.)
14
Verifying experimentally whether images of concrete things are relevant not only for concrete words, as expected, but also for abstract ones, as predicted by embodied views of meaning
OUTLINE
15
Sentiment-Specific Representation Learning for Document-Level Sentiment Analysis. WSDM’15
  Introduction
  Approach: Sentiment-Specific Word Embedding + Sentence Composition
  Experiments
Combining Language and Vision with a Multimodal Skip-gram Model. NAACL’15
  Introduction
  Multimodal Skip-gram Architecture
  Experiments: Abstract Words - Images
Tagging Personal Photos with Transfer Deep Learning. WWW’15
  Introduction
  Approach: Ontology Building + Transfer Learning + Personal Photo Tagging
  Experiments
The emergence of mobile devices and cloud storage services >> an unprecedented growth in the number of personal photos
  Personal photos: photos usually captured by amateur users with personal digital devices
  Managing personal photos: image tagging
Personal photos: different from Web images
  Lacking accurate text descriptions in general, as users are unlikely to label their photos
  The semantic distribution of personal photos is only a subset of a general vocabulary of Web images
  The appearance of personal photos is more complex
  The tags, if there are any, are very subjective
Tagging Personal Photos with Transfer Deep Learning. WWW’15
16
The challenge of understanding personal photos
Existing work on image tagging
  Model-based approaches: heavily relying on classifiers pre-trained with machine learning algorithms
  Model-free approaches: propagating tags through the tagging behavior of visual neighbors
Both assume that there is a well-labelled image database (source domain) that has the same or at least a similar data distribution as the target domain
However, a well-labelled database is hard to obtain in the domain of personal photos
  Flickr: many personal photos and user-contributed tags, but half of the user-contributed tags are noise with respect to the image content
  ImageNet: accurate supervised information, but two significant gaps remain: the semantic distribution gap and the visual appearance gap between the two domains
A novel transfer deep learning approach with ontology priors
1. Designing an ontology specific to personal photos
2. Reducing the visual appearance gap by transfer deep learning
3. Proposing two modes for personal photo tagging
Tagging Personal Photos with Transfer Deep Learning (cont.)
17
The vocabulary of general Web images: too large and not specific to personal photos
Exploring the semantic distributions in the domain of personal photos
1. Mining frequent tags from active users on Flickr
  10,000 active users: >= 500 photos uploaded in the most recent 6 months, registration time >= 2 years
  20 million photos and 2.7 million unique tags
  Each user-contributed tag >> a concept
  Top tags ranked by frequency (>= 3,000 times) >> 272 concepts
2. Building concept relations
  272 concepts as leaf nodes, grouped into 20 middle-level nodes according to word similarity in WordNet
3. Mapping concepts to ImageNet labels for transfer learning
  Calculating the word similarity between the 272 concepts and the ImageNet-22K labels by using WordNet
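The mapping step above can be sketched generically: each ontology concept is assigned the ImageNet label it is most similar to, keeping only matches above a threshold. The paper uses a WordNet-based similarity; here `sim` is any similarity function, and the threshold and names are illustrative assumptions, not the authors' code.

```python
def map_concepts(concepts, imagenet_labels, sim, threshold=0.8):
    """Map each ontology concept to its closest ImageNet label (sketch).

    sim       : word-similarity function, e.g. a WordNet-based measure
    threshold : pairs scoring below it are left unmapped (assumed value)
    """
    mapping = {}
    for c in concepts:
        # pick the label with the highest similarity to this concept
        best = max(imagenet_labels, key=lambda l: sim(c, l))
        if sim(c, best) >= threshold:
            mapping[c] = best
    return mapping

# toy usage with an exact-match similarity
exact = lambda a, b: 1.0 if a == b else 0.0
print(map_concepts(["dog"], ["dog", "cat"], exact))  # → {'dog': 'dog'}
```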
Step 1: Building Ontology for Personal Photos
18
The stacked convolutional autoencoders (CAEs)
  Pre-training: on both source and target domains (ImageNet, Flickr)
  Fine-tuning: in the source domain >> for shared feature representations
  Adding the top layer: a fully connected layer with ontology priors (FCO)
Step 2: Transfer Deep Learning with Ontology Priors
19
Deep learning with bottom-up transfer
  CNN + AE >> CAE: pre-training + fine-tuning
Deep learning with top-down transfer
  CAE + an FCO layer: adding the priors of the leaf nodes and the middle-level nodes in the defined ontology to reduce the domain gap
Step 2: Transfer Deep Learning with Ontology Priors (cont.)
20
Tagging with single-mode: only takes visual content into account
Tagging with batch-mode: further combines visual content with time constraints in a photo collection
  The timestamp when a photo was taken: a short interval >> event relations
  The geo-location of a photo: same locations >> scenario relations
Step 3: Personal Photo Tagging
21
Labels: 2–Highly Relevant; 1–Relevant; 0–Non-Relevant
Training data: ~0.36 million images
  ImageNet: 272 concepts * about 650 randomly selected images
  Flickr: 272 concepts * 650 photos from the 10K active users
Testing data: 7,000 annotated personal photos from 25 volunteers
  Male/female: 17/8
  Different education backgrounds: CS, mathematics, physics, business, management science, art and design
  All familiar with photography and liked taking photos
  Aged 20~28 / 30~45: 19/6
  Each volunteer contributed >= 500 photos; 35,217 testing photos in total
  Labeling: each volunteer randomly annotated one fifth of their own personal photos
  10 photos having 15 relevant or highly relevant concepts
  About 70% of the photos present more than three relevant or highly relevant concepts
Fu’s Experiments - Dataset
22
Compared approaches:
1. Tag ranking
2. Dyadic transfer learning (DTL)
3. Transfer learning with Geodesic Flow Kernel (GFK)
4. Deep learning with no transfer (DL)
5. Deep learning with Flickr training data (DL(Flickr))
6. Deep learning with top-down transfer (DL+TT)
7. Deep learning with bottom-up transfer (DL+BT)
8. Deep learning with full transfer (DL+FT)
Network architecture
  CAE1: 96 filters of size 7×7×3 with a stride of 2 pixels
  CAE2: 256 filters of size 5×5×96 with a stride of 2 pixels
  CAE3: 384 filters of size 3×3×256 with a stride of 1 pixel
  CAE4: 384 filters of size 3×3×384 with a stride of 1 pixel
  CAE5: 256 filters of size 3×3×384 with a stride of 1 pixel
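As a sanity check on the filter sizes and strides above, the standard convolution output-size formula can be used to trace feature-map dimensions through the stack. The slide does not state padding or the input resolution, so zero padding and the sample input size below are assumptions.

```python
def conv_out(size, kernel, stride, pad=0):
    # standard convolution output size: floor((n + 2p - k) / s) + 1
    return (size + 2 * pad - kernel) // stride + 1

# tracing an assumed 224x224 input through the first layer (CAE1: 7x7, stride 2)
print(conv_out(224, 7, 2))  # → 109
```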
Fu’s Experiments - Settings
23
Evaluation metrics and personal photo tagging results (figures on the original slide)
Fu’s Experiments - Results
24
Personal Photo Search: MIT-Adobe FiveK dataset
Fu’s Experiments – Results (cont.)
26
SUMMARY
Sentiment-Specific Representation Learning for Document-Level Sentiment Analysis. WSDM’15
1. Sentiment-specific word embedding (SSWE) << the C&W model
2. Sentiment-specific phrase embedding (SSPE) << the Skip-Gram model
3. Sentiment-Specific Sentence Structure: segmentor + prediction
Experiment: Sentiment Analysis of Twitter Sentences in SemEval 2013
Combining Language and Vision with a Multimodal Skip-gram Model. NAACL’15
  Two multimodal skip-gram models: direct mapping + intermediate transformation
Experiments:
Approximating human judgments in semantic + Image labeling and retrieval
Abstract words are grounded in relevant concrete scenes and situations.
Tagging Personal Photos with Transfer Deep Learning. WWW’15
Personal Photos: a subset of Web images with several unique properties
1. Building Ontology for Personal Photos (Flickr + WordNet)
2. Transfer Deep Learning with Ontology Priors (CAE + a FCO layer)
3. Personal Photo Tagging (Single-Mode + Batch-Mode with time and location info.)
Experiments: Personal Photo Tagging + Search (8 baselines + 5 architectures)
27