Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing
Genevieve Gorrell
5th June 2007


Page 1: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Generalized Hebbian Algorithm for Dimensionality Reduction in

Natural Language Processing

Genevieve Gorrell

5th June 2007

Page 2: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Introduction

Think datapoints plotted in hyperspace
Imagine a space in which each word has its own dimension

(vector components: big, bad)
”big big bad” = [ 2 1 ]
”big bad” = [ 1 1 ]
”bad” = [ 0 1 ]

We can compare these passages using vector representations in this space

[Figure: the three passages ”big big bad”, ”big bad” and ”bad” plotted as points against an axis of bigness and an axis of badness]
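
A small sketch of one such comparison: treat each passage as a count vector over the dimensions (big, bad) and compare directions with the cosine measure (cosine is one common choice, assumed here, not prescribed by the slides):

    import numpy as np

    # Each passage as a count vector over the dimensions (big, bad).
    passages = {
        "big big bad": np.array([2.0, 1.0]),
        "big bad":     np.array([1.0, 1.0]),
        "bad":         np.array([0.0, 1.0]),
    }

    def cosine(u, v):
        """Cosine of the angle between two vectors; 1.0 means the same direction."""
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(passages["big big bad"], passages["big bad"]))  # ~0.95
    print(cosine(passages["big bad"], passages["bad"]))          # ~0.71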

Page 3: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Dimensionality Reduction

Do we really need two dimensions to describe the relationship between these datapoints?

[Figure: the same three passages plotted against the axis of bigness and the axis of badness]

Page 5: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Rotation

Imagine the data look like this ...

Page 7: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

More Rotation

Or even like this ...

Page 8: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

More Rotation

Or even like this ... because if these were the dimensions, we would know which were the most important

We could describe as much of the data as possible using a smaller number of dimensions:
- approximation
- compression
- generalisation

Page 11: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Eigen Decomposition

The key lies in rotating the data into the most efficient orientation
Eigen decomposition will give us a set of axes (eigenvectors) of a new space in which our data might more efficiently be represented

Page 12: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Eigen Decomposition

Eigen decomposition is a vector space technique that provides a useful way to automatically reduce data dimensionality
This technique is of interest in natural language processing

Latent Semantic Indexing
Given a dataset in a given space, eigen decomposition can be used to create a nearest approximation in a space with fewer dimensions

For example, document vectors as bags of words in a space with one dimension per word can be mapped to a space with fewer dimensions than one per word

Mv = λv
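
A minimal numpy sketch of the defining equation Mv = λv, using as input the small word co-occurrence matrix that appears later in the ”big bad” example, and keeping only the top eigenvector as a rank-1 approximation:

    import numpy as np

    # Word-by-word co-occurrence matrix for (big, bad) from the small example.
    M = np.array([[5.0, 3.0],
                  [3.0, 3.0]])

    # Eigen decomposition: M v = lambda v for each eigenvector v.
    eigenvalues, eigenvectors = np.linalg.eigh(M)   # eigh: for symmetric matrices

    v = eigenvectors[:, -1]     # eigenvector with the largest eigenvalue
    lam = eigenvalues[-1]
    assert np.allclose(M @ v, lam * v)

    # Keep only the top eigenvector: a rank-1 approximation of M.
    M_rank1 = lam * np.outer(v, v)
    print(M_rank1)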

Page 13: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

A real world example—eigenfaces

Each new dimension captures something important about the data
The original observation can be recreated from a combination of these components

Page 14: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Eigen Faces 2

Each eigen face captures as much information in the dataset as possible (eigenvectors are orthogonal to each other)

This is much more efficient than the original representation

Page 15: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

More Eigen Face Convergence

Eigen faces with high eigenvalues capture important generalisations in the corpus
These generalisations might well apply to unseen data ...

Page 16: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

We have been using this in natural language processing ...

Corpus-driven language modelling suffers from problems with data sparsity

We can use eigen decomposition to make generalisations that might apply to unseen data

But language corpora are very large ...

Page 17: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Problems with eigen decomposition

Existing algorithms often:
- require all the data be available at once (batch processing)
- produce all the component vectors simultaneously, even though they may not all be necessary and it takes longer to do all of them
- are very computationally expensive, and may therefore exceed the capabilities of the computer for larger corpora:
  - large RAM requirement
  - exponential relationship between time/RAM requirement and dataset size

Page 18: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Generalized Hebbian Algorithm (Sanger 1989)

Based on Hebbian learning
Simple localised technique for deriving eigen decomposition
Requires very little memory
Learns based on single observations (for example, document vectors) presented serially, therefore no problem to add more data

In fact, the entire matrix need never be simultaneously available

The eigenvectors with the greatest eigenvalues are produced first

Page 19: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

GHA Algorithm

c += (c . x) x

c is the eigenvector, x is the training datum

Initialise eigenvector randomly
While the eigenvector is not converged {

  Dot-product each training vector with the eigenvector
  Multiply the result by the training vector
  Add the resulting vector to the eigenvector

}

Dot-product is a measure of similarity of direction of one vector with another, and produces a scalar
There are various ways in which one might assess convergence
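
A minimal sketch of this loop in Python/numpy. The learning rate, the explicit renormalisation of c, and the particular convergence test are choices of this sketch rather than part of the pseudocode above.

    import numpy as np

    def gha_first_eigenvector(data, n_epochs=100, learning_rate=0.001, tol=1e-6):
        """Sketch of the GHA update c += (c . x) x for the first eigenvector.

        data: an iterable of training vectors (e.g. document vectors).
        """
        data = np.asarray(data, dtype=float)
        c = np.random.default_rng(0).normal(size=data.shape[1])
        c /= np.linalg.norm(c)
        for _ in range(n_epochs):
            previous = c.copy()
            for x in data:                        # observations presented serially
                c += learning_rate * (c @ x) * x  # add the datum, scaled by c . x
                c /= np.linalg.norm(c)            # keep the eigenvector unit length
            if np.linalg.norm(c - previous) < tol:  # one simple convergence test
                break
        return c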

Page 20: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

GHA Algorithm Continued

Or in other words, train by adding each datum to the eigenvector in proportion to the extent to which it already resembles it
Train subsequent eigenvectors by removing the stronger eigenvectors from the data before training, so that the algorithm does not find those again
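
One way to realise the "removing the stronger eigenvectors from the data" step, continuing the sketch above; each datum (a numpy vector) is deflated before it is used to train the next eigenvector:

    def deflate(x, learned_eigenvectors):
        """Subtract from x its components along the already-learned eigenvectors."""
        for c in learned_eigenvectors:
            x = x - (c @ x) * c
        return x

    # e.g. train the second eigenvector on deflate(x, [c1]) instead of x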

Page 21: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

GHA as a neural net

[Figure: a single linear output unit; inputs Input_1, Input_2, Input_3 ... Input_n connect to it through weights Weight_1, Weight_2, Weight_3 ... Weight_n]

dp = sum over x = 1..n of Input_x * Weight_x

Weight_x += dp * Input_x   (for each x = 1..n)

• Can be extended to learn many eigenvectors

Page 22: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Singular Value Decomposition

Extends eigen decomposition to paired data

Word co-occurrence (counted over ”bad”, ”big bad” and ”big big bad”):

         big  bad
  big     5    3
  bad     3    3

Word bigrams (rows: first word, columns: second word):

         big:2  bad:2
  big:1    1      2
  bad:1    0      0
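
A small sketch of how both matrices can be built from the three passages. The co-occurrence counting convention used here (sum of outer products of passage count vectors) is an assumption of this sketch; it reproduces the numbers above.

    import numpy as np

    words = ["big", "bad"]
    index = {w: i for i, w in enumerate(words)}
    corpus = ["bad", "big bad", "big big bad"]

    # Symmetric word co-occurrence: sum of outer products of passage count vectors.
    cooccurrence = np.zeros((2, 2))
    for passage in corpus:
        counts = np.zeros(2)
        for w in passage.split():
            counts[index[w]] += 1
        cooccurrence += np.outer(counts, counts)

    # Asymmetric bigram counts: rows are first words, columns are second words.
    bigrams = np.zeros((2, 2))
    for passage in corpus:
        tokens = passage.split()
        for first, second in zip(tokens, tokens[1:]):
            bigrams[index[first], index[second]] += 1

    print(cooccurrence)   # [[5. 3.] [3. 3.]]
    print(bigrams)        # [[1. 2.] [0. 0.]]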

Page 23: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Asymmetrical GHA (Gorrell 2006)

Extends GHA to asymmetrical datasets
allows us to work with n-grams, for example

Retains the features of GHA

Page 24: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Asymmetrical GHA Algorithm

c_a += (c_b . x_b) x_a

c_b += (c_a . x_a) x_b

Train singular vectors on data presented as a series of vector pairs: dot the left training datum with the left singular vector, scale the right training datum by the resulting scalar and add it to the right singular vector, and vice versa

for example, the first word in a bigram might be the vector x_a and the second, x_b
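
A minimal numpy sketch of these paired updates. As in the earlier GHA sketch, the learning rate and the renormalisation are stability choices of the sketch, not part of the update rule itself.

    import numpy as np

    def agha_first_singular_vectors(pairs, n_epochs=100, learning_rate=0.001):
        """Sketch of asymmetrical GHA: c_a += (c_b . x_b) x_a and c_b += (c_a . x_a) x_b.

        pairs: a sequence of (x_a, x_b) vector pairs, e.g. one-hot vectors for
        the first and second word of each bigram.
        """
        pairs = [(np.asarray(a, dtype=float), np.asarray(b, dtype=float)) for a, b in pairs]
        rng = np.random.default_rng(0)
        c_a = rng.normal(size=pairs[0][0].shape[0])
        c_b = rng.normal(size=pairs[0][1].shape[0])
        c_a /= np.linalg.norm(c_a)
        c_b /= np.linalg.norm(c_b)
        for _ in range(n_epochs):
            for x_a, x_b in pairs:                       # pairs presented serially
                step_a = learning_rate * (c_b @ x_b) * x_a
                step_b = learning_rate * (c_a @ x_a) * x_b
                c_a += step_a
                c_b += step_b
                c_a /= np.linalg.norm(c_a)               # keep both vectors unit length
                c_b /= np.linalg.norm(c_b)
        return c_a, c_b

Run over the bigram pairs of the small ”big bad” example, c_a and c_b should settle (up to sign) on the first left and right singular vectors of the bigram count matrix.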

Page 25: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Asymmetrical GHA Performance (20,000 NL bigrams)

RAM requirement is linear in the dimensionality and the number of singular vectors required

Time per training step is linear in the dimensionality

This is a big improvement on conventional approaches for larger corpora/dimensionalities ...

But don't forget, the algorithm needs to be allowed to converge

Page 26: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

N-Gram Language Model Smoothing

Modelling language as a string of n-grams
highly successful approach
but we will always have problems with data sparsity
zero probabilities are bad news

A Zipf Curve

Page 27: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

N-gram Language Modelling—An Example Corpus

A man hits the ball at the dog. The man hits the ball at the house. The man takes the dog to the ball. A man takes the ball to the house. The dog takes the ball to the house. The dog takes the ball to the man. The man hits the ball to the dog. The man walks the dog to the house. The man walks the dog. The dog walks to the man. A dog hits a ball. The man walks in the house. The man hits the dog. A ball hits the dog. The man walks. A ball hits. Every ball hits. Every dog walks. Every man walks. A man walks. A small man walks. Every nice dog barks.

Page 28: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

(rows: first word of the bigram; columns: second word)
       man hits the ball at dog house takes to walks a in small nice barks
a      0.03 0.0 0.0 0.03 0.0 0.01 0.0 0.0 0.0 0.0 0.0 0.0 0.01 0.0 0.0
man    0.0 0.04 0.0 0.0 0.0 0.0 0.0 0.02 0.0 0.07 0.0 0.0 0.0 0.0 0.0
hits   0.0 0.0 0.05 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.01 0.0 0.0 0.0 0.0
the    0.1 0.0 0.0 0.07 0.0 0.1 0.05 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
ball   0.0 0.03 0.0 0.0 0.02 0.0 0.0 0.0 0.04 0.0 0.0 0.0 0.0 0.0 0.0
at     0.0 0.0 0.02 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
takes  0.0 0.0 0.04 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
dog    0.0 0.01 0.0 0.0 0.0 0.0 0.0 0.02 0.02 0.02 0.0 0.0 0.0 0.0 0.01
to     0.0 0.0 0.07 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
walks  0.0 0.0 0.02 0.0 0.0 0.0 0.0 0.0 0.01 0.0 0.0 0.01 0.0 0.0 0.0
in     0.0 0.0 0.01 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
every  0.01 0.0 0.0 0.01 0.0 0.01 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.01 0.0
small  0.01 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
nice   0.0 0.0 0.0 0.0 0.0 0.01 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

An Example Corpus as Normalised Bigram Matrix
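
A rough sketch of how such a matrix can be computed. The slide does not state the exact normalisation used; dividing by the total number of bigrams in the corpus is an assumption of this sketch, as is treating sentence boundaries as breaks.

    import numpy as np

    # Abridged: paste in the full example corpus from the previous slide.
    corpus = ("A man hits the ball at the dog. The man hits the ball at the house. "
              "The man takes the dog to the ball.")

    sentences = [s.lower().split() for s in corpus.strip(". ").split(". ")]
    words = sorted({w for sentence in sentences for w in sentence})
    index = {w: i for i, w in enumerate(words)}

    counts = np.zeros((len(words), len(words)))
    for sentence in sentences:
        for first, second in zip(sentence, sentence[1:]):  # bigrams within a sentence
            counts[index[first], index[second]] += 1

    normalised = counts / counts.sum()  # assumed: normalise by total bigram count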

Page 29: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

       man hits the ball at dog house takes to walks a in small nice barks
a      0.02 0.00 0.00 0.02 0.00 0.02 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
man    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
hits   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
the    0.10 0.00 0.00 0.07 0.00 0.10 0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
ball   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
at     0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
takes  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dog    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
to     0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
walks  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
in     0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
every  0.01 0.00 0.00 0.01 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
small  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nice   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

First Singular Vector Pair

Page 30: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

       man hits the ball at dog house takes to walks a in small nice barks
a      0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
man    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
hits   0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
the    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
ball   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
at     0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
takes  0.00 0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dog    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
to     0.00 0.00 0.07 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
walks  0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
in     0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
every  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
small  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nice   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Second Singular Vector Pair

Page 31: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

       man hits the ball at dog house takes to walks a in small nice barks
a      0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
man    0.00 0.04 0.00 0.00 0.01 0.00 0.00 0.02 0.02 0.06 0.00 0.00 0.00 0.00 0.00
hits   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
the    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
ball   0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.02 0.00 0.00 0.00 0.00 0.00
at     0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
takes  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dog    0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.02 0.00 0.00 0.00 0.00 0.00
to     0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
walks  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
in     0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
every  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
small  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nice   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Third Singular Vector Pair

Page 32: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Language Models from Eigen N-Grams

Add k singular vector pairs (“eigen n-grams”) together

Remove all the negative cell values

Normalise row-wise to get probabilities

Include a smoothing approach to remove zeros

       man hits the ball at dog house takes to walks a in small nice barks
a      0.02 0.00 0.00 0.02 0.00 0.02 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
man    0.00 0.04 0.00 0.00 0.01 0.00 0.00 0.02 0.02 0.06 0.00 0.00 0.00 0.00 0.00
hits   0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
the    0.10 0.00 0.00 0.07 0.00 0.10 0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
ball   0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.02 0.00 0.00 0.00 0.00 0.00
at     0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
takes  0.00 0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dog    0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.02 0.00 0.00 0.00 0.00 0.00
to     0.00 0.00 0.07 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
walks  0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
in     0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
every  0.01 0.00 0.00 0.01 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
small  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nice   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
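
A rough sketch of the four steps above, assuming the singular triples (sigma, u, v) come from an SVD of the normalised bigram matrix; the smoothing constant here is a placeholder, not a value used in the experiments.

    import numpy as np

    def svdlm_from_singular_vectors(singular_triples, k, smoothing=1e-4):
        """Rebuild a smoothed bigram model from the first k singular vector pairs."""
        m = sum(sigma * np.outer(u, v) for sigma, u, v in singular_triples[:k])
        m = np.clip(m, 0.0, None)              # remove all the negative cell values
        m = m + smoothing                      # simple add-constant smoothing: no zeros left
        m = m / m.sum(axis=1, keepdims=True)   # normalise row-wise to get P(next | previous)
        return m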

Page 33: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

What do we hope to see?

The theory is that the reduced-dimensionality representation describes the unseen test corpus better than the original representation
As k increases, perplexity should decrease until the optimum is reached
Perplexity should then begin to increase again as the optimum is passed and too much data is included
We hope for a U-shaped curve

Page 34: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Some results ...

Perplexity is a measure of the quality of the language model

k is the number of dimensions (eigen n-grams)

Times are how long it took to calculate the dimensions
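
For reference, one standard definition of perplexity over a held-out set of bigrams, sketched with numpy (lower is better):

    import numpy as np

    def perplexity(test_bigrams, prob):
        """test_bigrams: (previous_id, next_id) pairs; prob: row-normalised P(next | previous)."""
        log_probs = [np.log(prob[prev, nxt]) for prev, nxt in test_bigrams]
        return float(np.exp(-np.mean(log_probs)))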

Page 35: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Some specifics about this experiment

The corpus comprises five newsgroups from CMU's newsgroup corpus
The training corpus contains over a million items
The unseen test corpus comprises over 100,000 items
I used AGHA to calculate the decomposition
I used simple heuristically-chosen smoothing constants and single-order language models

Page 36: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Maybe k is too low?

200,000 trigrams
LAS2 algorithm

Page 37: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Full rank decomposition

20,000 bigrams

Furthermore, perplexity in each case never reaches the baseline perplexity of the original n-gram model

Page 38: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Linear interpolation may generate an interesting result

k   Weight  SVDLM perp.   N-gram perp.  Comb. perp.
25  1       7.071990e+02  4.647004e+02  3.891952e+02
10  1       8.884950e+02  4.647004e+02  3.695157e+02
10  0.7     8.884950e+02  4.647004e+02  3.705559e+02
5   1       1.156845e+03  4.647004e+02  3.788119e+02

The best result is 370, an overall improvement of 20% (however, this involved tuning on the test corpus)

Page 39: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

200,000 Trigram Corpus

k    Weight  SVDLM perp.   N-gram perp.  Comb. perp.
100  1       1.003399e+03  4.057236e+02  3.196404e+02
50   1       1.220449e+03  4.057236e+02  3.008804e+02
25   1       1.508873e+03  4.057236e+02  2.834632e+02
10   1       2.188041e+03  4.057236e+02  2.898518e+02

Improvement on the baseline n-gram is even greater on the medium-sized corpus (30%)

Page 40: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

1 Million Trigram Corpus

k   Weight    SVDLM perp.   N-gram perp.  Comb. perp.
25  1         4.237069e+04  3.730947e+02  3.729931e+02
25  2         4.237069e+04  3.730947e+02  3.728907e+02
25  10        4.237069e+04  3.730947e+02  3.721338e+02
25  100       4.237069e+04  3.730947e+02  3.663666e+02
25  1000      4.237069e+04  3.730947e+02  3.442525e+02
25  10000     4.237069e+04  3.730947e+02  2.980755e+02
25  100000    4.237069e+04  3.730947e+02  2.422045e+02
25  1000000   4.237069e+04  3.730947e+02  2.187968e+02
25  10000000  4.237069e+04  3.730947e+02  2.741027e+02

This is a big dataset for SVD! The weighting on the SVDLM needed to be increased a lot to get a good result

Page 41: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Fine-Tuning k

k   Weight   SVDLM perp.   N-gram perp.  Comb. perp.
25  1000000  4.237069e+04  3.730947e+02  2.192305e+02
20  1000000  4.249082e+04  3.730947e+02  2.174188e+02
15  1000000  4.266386e+04  3.730947e+02  2.100715e+02
10  1000000  4.290579e+04  3.730947e+02  2.102029e+02

Tuning k results in a best perplexity improvement of over 40%

A low optimal k is a good thing, because many algorithms for calculating SVD produce singular vectors one at a time, starting with the largest

Page 42: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Tractability

The biggest challenge with the SVDLM is tractability
Calculating the SVD is computationally demanding

But the optimal k is low
I have also developed an algorithm that helps with tractability

Usability of the resulting SVDLM is also an issue
The SVDLM is much larger than a regular n-gram model
But its size can be minimised by discarding low values, with minimal impact on performance

Page 43: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Backoff SVDLM

Improving on n-gram language modelling is interesting work
However, no improvement on the state of the art has been demonstrated yet!
Next steps involve the creation of a backoff SVDLM

Interpolating with lower-order n-grams is standard
Backoff models have much superior performance

Page 44: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Similar Work

Jerome Bellegarda developed the LSA language model

Uses longer-span eigen decomposition information to access semantic information
Others have since developed the work

Saul and Pereira demonstrated an approach based on Markov models

Again demonstrates that some form of dimensionality reduction is beneficial

Page 45: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Summary

The GHA-based algorithm allows large datasets to be decomposed
The asymmetrical formulation allows data such as n-grams to be decomposed
Promising initial results in n-gram language model smoothing have been presented

Page 46: Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing

Thanks!

Gorrell, 2006. ”Generalized Hebbian Algorithm for Incremental Singular Value Decomposition.” Proceedings of EACL 2006.
Gorrell and Webb, 2005. ”Generalized Hebbian Algorithm for Incremental Latent Semantic Analysis.” Proceedings of Interspeech 2005.
Sanger, T. 1989. ”Optimal Unsupervised Learning in a Single-Layer Linear Feedforward Network.” Neural Networks, 2, 459-473.