
Survey on Text Representation &
Shallow Sentence Embedding: Fixed-Size Ordinally-Forgetting Encoding Approach

Ahmed H. AlGhidani
MSc Student in Computer Science at Cairo University

Research and SDE at RDI Egypt

[email protected]


Agenda

• Word Representation
  - 1-hot Encoding

• Word Embedding
  - Word2Vec
  - GloVe

• Sentence Representation
  - Bag of Words (BoW)

• Sentence Embedding
  - Doc2Vec
  - Fixed-Size Ordinally-Forgetting Encoding (FOFE)
  - Others


Word Representation

• Transforms a word into a vector space model

• Each word has a unique vector that distinguishes it from others

• The vector dimensionality varies according to the method used for the transformation


Word Representation (Cont.)

• 1-hot Encoding
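A minimal sketch of the idea in code (the toy vocabulary and helper below are illustrative, not from the slides):

    # Each word gets a vector with a single 1 at its own index, 0 elsewhere.
    vocab = ["cat", "dog", "hat"]                      # toy vocabulary
    word_to_index = {w: i for i, w in enumerate(vocab)}

    def one_hot(word):
        vec = [0] * len(vocab)                         # dimensionality = |vocabulary|
        vec[word_to_index[word]] = 1
        return vec

    print(one_hot("dog"))   # [0, 1, 0]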


Word Representation (Cont.)

• Pros
  - Easy to understand and implement

• Cons
  - Depends on vocabulary size (memory issues)
  - Doesn't capture the semantics of words


Word Embedding

• A vector space model

• Each word has a fixed-size, unique vector representation

• It shows the semantic relations between words


Word Embedding (Cont.)

• Word2Vec (Mikolov et al., 2013)

• To represent a word, we use its context

• A group of models built on shallow neural networks

• We will cover Skip-gram and Continuous Bag of Words (CBOW)


Word Embedding (Cont.)

• Skip-gram model

• A two-layer multi-layer perceptron (MLP) neural network

• The input is a word and the output is the set of context words likely to occur around it

• A softmax layer gives the probability of each candidate output word
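To make the input/output relation concrete, here is a small illustrative sketch (not from the slides) that builds the (center word, context word) pairs a skip-gram model is trained on, assuming a context window of 2:

    def skipgram_pairs(tokens, window=2):
        # For each center word, pair it with every word within `window` positions.
        pairs = []
        for i, center in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pairs.append((center, tokens[j]))
        return pairs

    sentence = "the cat sat on the hat".split()
    print(skipgram_pairs(sentence)[:4])
    # [('the', 'cat'), ('the', 'sat'), ('cat', 'the'), ('cat', 'sat')]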


Word Embedding (Cont.)

• Continuous Bag of Words (CBOW)

• A two-layer MLP neural network

• The input is a word's context and the output is the word itself

• A softmax layer gives the probability of each candidate output word
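As a usage sketch, both variants are available in the gensim library (parameter names below follow the gensim 4.x API, which is an assumption about the installed version; sg=0 selects CBOW, sg=1 selects Skip-gram):

    from gensim.models import Word2Vec

    # Toy corpus: a list of tokenized sentences.
    sentences = [["the", "cat", "sat", "on", "the", "hat"],
                 ["the", "dog", "ate", "the", "cat", "and", "the", "hat"]]

    # sg=0 -> CBOW (context predicts the word); sg=1 -> Skip-gram.
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

    print(model.wv["cat"].shape)         # (50,) -- a fixed-size word vector
    print(model.wv.most_similar("cat"))  # nearest words in the embedding space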


Word Embedding (Cont.)

• Pros
  - Fast and efficient
  - Fixed-size dimensionality

• Cons
  - We don't really know why it works so well (including Mikolov himself!)


Word Embedding (Cont.)

• Global Vectors (GloVe)

• We build a co-occurrence matrix for the whole corpus

• Factorize this matrix into word vectors and context vectors


Word Embedding (Cont.)

Corpus: A D C E A D F E B A C E D
Window size: 2 (two words on each side; F is left out of the matrix below)

Co-occurrence matrix X:

      A  B  C  D  E
  A   0  1  3  2  3
  B   1  0  1  0  1
  C   3  1  0  2  2
  D   2  0  2  0  4
  E   3  1  2  4  0
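A small illustrative sketch that rebuilds the matrix above (window of two words on each side; F is skipped because it is not in the vocabulary shown):

    from collections import defaultdict

    corpus = "A D C E A D F E B A C E D".split()
    vocab = ["A", "B", "C", "D", "E"]          # F is ignored, as on the slide
    window = 2

    X = defaultdict(int)
    for i, u in enumerate(corpus):
        for j in range(i + 1, min(len(corpus), i + window + 1)):
            v = corpus[j]
            if u in vocab and v in vocab:
                X[(u, v)] += 1                 # symmetric co-occurrence counts
                X[(v, u)] += 1

    print([X[("A", c)] for c in vocab])        # [0, 1, 3, 2, 3] -- row A above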


Word Embedding (Cont.)

We want to use the co-occurrence information to produce the word vectors, so we use the following function for each pair of words.

Our target is to minimize this objective function over all word pairs in the corpus.
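In the standard GloVe formulation (Pennington et al., 2014), the objective being referred to is the weighted least-squares loss

    J = \sum_{i,j=1}^{V} f(X_{ij}) (w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij})^2

where w_i is a word vector, \tilde{w}_j a context vector, b_i and \tilde{b}_j are bias terms, X_{ij} is the co-occurrence count, and V is the vocabulary size.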


Word Embedding (Cont.)

Here, f is a weighting function over the co-occurrence counts.
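In the original paper it is defined as

    f(x) = (x / x_max)^\alpha   if x < x_max
    f(x) = 1                    otherwise

with x_max = 100 and \alpha = 3/4 reported as good default values; this down-weights rare co-occurrences and caps the influence of very frequent ones.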


Word Embedding (Cont.)

• Pros
  - Statistical approach
  - Combines statistical methods and the skip-gram model to produce the word vectors


Sentence Representation

• Given a sequence of words, the target is to map the whole sequence into a vector space model


Sentence Representation (Cont.)

• Bag of Words Model (BoW)

• Depends on term frequencies in the text

• The vector dimensionality equals the number of unique words in the corpus


Sentence Representation (Cont.)

Corpus:
Sentence 1: “The cat sat on the hat”
Sentence 2: “The dog ate the cat and the hat”

Unique words: {the, cat, sat, on, hat, dog, ate, and}

Sentence 1: [2, 1, 1, 1, 1, 0, 0, 0]
Sentence 2: [3, 1, 0, 0, 1, 1, 1, 1]
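A small illustrative sketch that reproduces these counts (vocabulary order as listed above):

    sentences = ["The cat sat on the hat",
                 "The dog ate the cat and the hat"]
    vocab = ["the", "cat", "sat", "on", "hat", "dog", "ate", "and"]

    def bow_vector(sentence):
        # Count how many times each vocabulary word appears in the sentence.
        tokens = sentence.lower().split()
        return [tokens.count(word) for word in vocab]

    print(bow_vector(sentences[0]))   # [2, 1, 1, 1, 1, 0, 0, 0]
    print(bow_vector(sentences[1]))   # [3, 1, 0, 0, 1, 1, 1, 1]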


Sentence Representation (Cont.)

• Pros
  - Easy to understand and implement

• Cons
  - Depends on vocabulary size (memory issues)
  - Doesn't capture the semantics of words


Sentence Embedding

• A vector space model

• Each document has a fixed-size, unique vector representation

• It captures the semantic similarity between documents


Sentence Embedding (Cont.)

• Doc2Vec (Le & Mikolov, 2014)

• Predicts words that fit the document's semantics

• A group of models built on shallow neural networks

• We will cover the PV-DM and PV-DBOW models


Sentence Embedding (Cont.)

• Distributed Memory Model of Paragraph Vectors (PV-DM)

• A two-layer multi-layer perceptron (MLP) neural network

• The input is the paragraph's vector (a column of the paragraph matrix) together with context words, and the output is the predicted word given that paragraph and context


Sentence Embedding (Cont.)

• Distributed Bag of Words version of Paragraph Vectors (PV-DBOW)

• A two-layer multi-layer perceptron (MLP) neural network

• The input is a paragraph vector and the output is the set of words likely to occur in that paragraph (akin to extracting keywords)
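As a usage sketch, both variants are exposed by gensim's Doc2Vec (parameter names follow the gensim 4.x API, an assumption about the installed version; dm=1 gives PV-DM, dm=0 gives PV-DBOW):

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    raw_docs = [["the", "cat", "sat", "on", "the", "hat"],
                ["the", "dog", "ate", "the", "cat", "and", "the", "hat"]]
    docs = [TaggedDocument(words=tokens, tags=[str(i)])
            for i, tokens in enumerate(raw_docs)]

    # dm=1 -> PV-DM: the paragraph vector plus context words predict a word.
    model = Doc2Vec(docs, vector_size=50, window=2, min_count=1, dm=1, epochs=40)

    vec = model.infer_vector(["the", "cat", "and", "the", "dog"])
    print(vec.shape)   # (50,) -- a fixed-size embedding for an unseen document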


Sentence Embedding (Cont.)

• Fixed-Size Ordinally-Forgetting Encoding (FOFE) (Zhang et al., 2015)

• Produces a fixed-size vector given a vocabulary

• The produced vectors are almost always unique, which helps to distinguish sentences from one another

• Used mainly for the NLP language modeling task with a regular (feed-forward) neural network


Vocabulary: {A, B, C}

Initially (1-hot vectors):
A = [1, 0, 0]
B = [0, 1, 0]
C = [0, 0, 1]

Sentence 1: {ABC}
Sentence 2: {ABCBC}

Target: get a fixed-size vector that represents each sentence


Encoding Function

FOFE encodes a word sequence recursively: each step scales the previous code by a constant forgetting factor and then adds the current word's 1-hot vector.
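In symbols (this is the recursion from the original FOFE paper, Zhang et al., 2015):

    z_t = \alpha \cdot z_{t-1} + e_t,    with z_0 = 0

where e_t is the 1-hot vector of the t-th word, 0 < \alpha < 1 is the forgetting factor, and the final z_T is the FOFE code of the whole sentence.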


A = [1, 0, 0], B = [0, 1, 0], C = [0, 0, 1]
Sentence 1: {ABC}
Sentence 2: {ABCBC}
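A minimal sketch that computes these encodings (the forgetting factor value below is an arbitrary illustrative choice; any 0 < alpha < 1 works):

    vocab = {"A": 0, "B": 1, "C": 2}
    alpha = 0.5                      # forgetting factor (illustrative value)

    def fofe(sentence):
        # z_t = alpha * z_{t-1} + e_t, applied left to right over the sentence.
        z = [0.0] * len(vocab)
        for ch in sentence:
            z = [alpha * x for x in z]
            z[vocab[ch]] += 1.0
        return z

    print(fofe("ABC"))    # [alpha^2, alpha, 1]                      -> [0.25, 0.5, 1.0]
    print(fofe("ABCBC"))  # [alpha^4, alpha + alpha^3, 1 + alpha^2]  -> [0.0625, 0.625, 1.25]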


Sentence Embedding (Cont.)

• Deep Sentence Embedding Using Long Short-Term Memory Networks