Clustering
Shallow Processing Techniques for NLP
Ling570, November 30, 2011


Roadmap

Clustering:
  Motivation & Applications
  Clustering Approaches
  Evaluation

Clustering

Task: Given a set of objects, create a set of clusters over those objects.

Applications:
  Exploratory data analysis
  Document clustering
  Language modeling: generalization for class-based LMs
  Unsupervised word sense disambiguation
  Automatic thesaurus creation
  Unsupervised part-of-speech tagging
  Speaker clustering, ...

Example: Document Clustering

Input: Set of individual documents
Output: Sets of document clusters

Many different types of clustering:
  Category: news, sports, weather, entertainment
  Genre clustering: similar styles, e.g., blogs, tweets, newswire
  Author clustering
  Language ID: language clusters
  Topic clustering: documents on the same topic, e.g., OWS, debt supercommittee, Seattle Marathon, Black Friday, ...

Example: Word Clustering

Input: Words
  Barbara, Edward, Gov, Mary, NFL, Reds, Scott, Sox, ballot, finance, inning, payments, polls, profit, quarterback, researchers, science, score, scored, seats

Output: Word clusters

Example clusters (from NYT):
  ballot, polls, Gov, seats
  profit, finance, payments
  NFL, Reds, Sox, inning, quarterback, scored, score
  researchers, science
  Scott, Mary, Barbara, Edward

Questions

What should a cluster represent?
  Similarity among objects
How can we create clusters?
How can we evaluate clusters?
How can we improve NLP with clustering?

Due to F. Xia

Similarity

Between two instances
Between an instance and a cluster
Between clusters

Similarity Measures

Given x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn):

Euclidean distance: d(x, y) = sqrt(sum_i (xi - yi)^2)
Manhattan distance: d(x, y) = sum_i |xi - yi|
Cosine similarity: cos(x, y) = (x . y) / (||x|| ||y||) = sum_i xi*yi / (sqrt(sum_i xi^2) * sqrt(sum_i yi^2))
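For concreteness, a minimal Python sketch of these three measures over plain list vectors (illustrative only, not the interface used in the homework):

  import math

  def euclidean(x, y):
      """Euclidean (L2) distance between two equal-length vectors."""
      return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

  def manhattan(x, y):
      """Manhattan (L1) distance."""
      return sum(abs(xi - yi) for xi, yi in zip(x, y))

  def cosine(x, y):
      """Cosine similarity: dot product over the product of norms."""
      dot = sum(xi * yi for xi, yi in zip(x, y))
      nx = math.sqrt(sum(xi * xi for xi in x))
      ny = math.sqrt(sum(yi * yi for yi in y))
      return dot / (nx * ny) if nx and ny else 0.0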

Clustering Algorithms

Types of Clustering

Flat vs hierarchical clustering:
  Flat: partition data into k clusters
  Hierarchical: nodes form a hierarchy

Hard vs soft clustering:
  Hard: each object is assigned to exactly one cluster
  Soft: allows degrees of membership, and membership in more than one cluster; often a probability distribution over cluster membership

Hierarchical Clustering

Hierarchical vs. Flat

Hierarchical clustering:
  More informative
  Good for data exploration
  Many algorithms, none good for all data
  Computationally expensive

Flat clustering:
  Fairly efficient
  Simple baseline algorithm: k-means
  Probabilistic models use the EM algorithm

Clustering Algorithms

Flat clustering:
  K-means clustering
  K-medoids clustering

Hierarchical clustering:
  Greedy, bottom-up clustering

K-Means Clustering

Initialize:
  Randomly select k initial centroids
    Centroid: center (mean) of a cluster

Iterate until clusters stop changing:
  Assign each instance to the nearest cluster
    A cluster is nearest if its centroid is nearest
  Recompute cluster centroids
    Centroid: mean of the instances in the cluster
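A compact sketch of this loop in Python, assuming Euclidean distance and random initialization (illustrative only):

  import math
  import random

  def kmeans(points, k, max_iter=100):
      """Basic k-means sketch: points is a list of equal-length tuples."""
      centroids = random.sample(points, k)
      assignment = [None] * len(points)
      for _ in range(max_iter):
          # Assignment step: each point goes to the nearest centroid.
          new_assignment = [
              min(range(k), key=lambda c: math.dist(p, centroids[c]))
              for p in points
          ]
          if new_assignment == assignment:  # clusters stopped changing
              break
          assignment = new_assignment
          # Update step: recompute each centroid as its cluster's mean.
          for c in range(k):
              members = [p for p, a in zip(points, assignment) if a == c]
              if members:
                  centroids[c] = tuple(sum(dim) / len(members)
                                       for dim in zip(*members))
      return centroids, assignment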

K-Means: 1 Step

[Figure: one assignment-and-update step of k-means]

K-Means

Running time:
  Each iteration is linear in the number of instances: O(kn) distance computations for n instances and k clusters
  Converges in a finite number of steps

Issues:
  Need to pick the number of clusters k
  Can find only a local optimum
  Sensitive to outliers
  Requires Euclidean distance (means of numeric vectors):
    What about enumerable classes (e.g., colors)?

Medoid

Medoid: the element of a cluster with the highest average similarity to the other elements in the cluster.

Finding the medoid:
  For each element p of cluster C, compute its average similarity to the others:
    f(p) = (1 / (|C| - 1)) * sum over q in C, q != p, of sim(p, q)
  Select the element with the highest f(p)
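A direct sketch of medoid selection, assuming some similarity function sim(p, q) such as the cosine defined earlier:

  def medoid(cluster, sim):
      """Return the element with the highest average similarity
      to the other elements in the cluster (illustrative sketch)."""
      if len(cluster) == 1:
          return cluster[0]
      def avg_sim(p):
          others = [q for q in cluster if q is not p]
          return sum(sim(p, q) for q in others) / len(others)
      return max(cluster, key=avg_sim)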

K-Medoids

Initialize:
  Select k instances at random as medoids

Iterate until no changes:
  Assign instances to the cluster with the nearest medoid
  Recompute the medoid for each cluster
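Combining the two steps, a minimal k-medoids loop; the similarity function is a parameter, and the function and variable names are illustrative:

  import random

  def k_medoids(items, k, sim, max_iter=100):
      """items: list of vectors; sim(a, b): similarity (higher = closer)."""
      medoids = random.sample(items, k)
      clusters = None
      for _ in range(max_iter):
          # Assign each instance to the most similar medoid.
          new_clusters = [[] for _ in range(k)]
          for item in items:
              best = max(range(k), key=lambda c: sim(item, medoids[c]))
              new_clusters[best].append(item)
          if new_clusters == clusters:  # no change: done
              break
          clusters = new_clusters
          # Recompute each cluster's medoid: the member with the
          # highest average similarity to the rest of the cluster.
          for c, members in enumerate(clusters):
              if not members:
                  continue  # empty cluster: keep its old medoid
              medoids[c] = max(
                  members,
                  key=lambda p: sum(sim(p, q) for q in members if q is not p))
      return medoids, clusters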

Greedy, Bottom-Up Hierarchical Clustering

Initialize:
  Make an individual cluster for each instance

Iterate until all instances are in the same cluster:
  Merge the two most similar clusters
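A sketch of the greedy bottom-up loop; cluster_sim is an assumed cluster-level similarity (the linkage choice, e.g. average pairwise similarity between members, is left open here):

  def agglomerative(items, cluster_sim):
      """Greedy bottom-up clustering; records each merge (sketch)."""
      clusters = [[item] for item in items]   # one cluster per instance
      merges = []
      while len(clusters) > 1:
          # Find the most similar pair of clusters.
          i, j = max(
              ((i, j) for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))),
              key=lambda ij: cluster_sim(clusters[ij[0]], clusters[ij[1]]))
          merged = clusters[i] + clusters[j]
          merges.append((clusters[i], clusters[j]))
          clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
          clusters.append(merged)
      return merges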

Evaluation

Evaluation

With respect to a gold standard:
  Accuracy: for each cluster, assign the most common label to all items
  Rand index
  F-measure

Alternatives:
  Extrinsic evaluation
  Human inspection

Configuration

Given:
  Set of objects O = {o1, o2, ..., on}
  Partition X = {x1, ..., xr}
  Partition Y = {y1, ..., ys}

For each pair of objects, count:

                         In same set in X    In diff't sets in X
  In same set in Y              a                    d
  In diff't sets in Y           c                    b

Rand Index

Measure of cluster similarity (Rand, 1971), using the pair counts a, b, c, d from the table above:

  RI = (a + b) / (a + b + c + d)

No agreement? 0. Full agreement? 1.
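A small sketch of the pair counting behind the Rand index, representing each partition as a list of cluster labels, one per object (an assumed representation, not the homework's format):

  from itertools import combinations

  def rand_index(labels_x, labels_y):
      """Rand index over all object pairs: (a + b) / total pairs."""
      agree = 0
      pairs = list(combinations(range(len(labels_x)), 2))
      for i, j in pairs:
          same_x = labels_x[i] == labels_x[j]
          same_y = labels_y[i] == labels_y[j]
          if same_x == same_y:   # both same (a) or both different (b)
              agree += 1
      return agree / len(pairs)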

Precision & Recall

Assume X is the gold-standard partition and Y is the system-generated partition.

For each pair of items in a cluster in Y:
  Correct if they also appear together in a cluster in X

Can compute P, R, and F-measure: in terms of the table above, P = a / (a + d) and R = a / (a + c).
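The same pair counts give pairwise precision, recall, and F-measure; a sketch using the same label-list representation as above:

  from itertools import combinations

  def pairwise_prf(gold, system):
      """Pair-based P/R/F: a pair is 'predicted' if clustered together
      by the system, 'correct' if also together in the gold standard."""
      a = d = c = 0
      for i, j in combinations(range(len(gold)), 2):
          same_gold = gold[i] == gold[j]
          same_sys = system[i] == system[j]
          if same_sys and same_gold:
              a += 1          # together in both (correct pairs)
          elif same_sys:
              d += 1          # together in system only
          elif same_gold:
              c += 1          # together in gold only
      p = a / (a + d) if a + d else 0.0
      r = a / (a + c) if a + c else 0.0
      f = 2 * p * r / (p + r) if p + r else 0.0
      return p, r, f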

HW #10

Due to F. Xia

Unsupervised POS tagging: word clustering by neighboring-word cooccurrence

Create feature vectors: features are counts of adjacent word occurrences, e.g., L=he:10 or R=run:3
Perform clustering: k-medoids algorithm (with cosine similarity)
Evaluate clusters: cluster mapping + accuracy

Q1: create_vectors

create_vectors.* training_file word_file feat_file outfile

  training_file: one sentence per line: w1 w2 w3 ... wn
  word_file: list of words to cluster, word<tab>freq
  feat_file: list of words to use as features, feat<tab>freq
  outfile: one list per word in word_file
    Format: word L=he 10 L=she 5 ... R=gone 2 R=run 3 ...

Features

Features are of the form (L|R)=xx freq, where:
  xx is a word in the feat_file
  L, R: the position (left or right neighbor) where the feature appeared
  freq: # of times word xx appeared in that position in the training file

Suppose 'New York' appears 540 times in the corpus:
  York L=New 540 ... R=New 0 ...
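A sketch of the counting step for Q1, using plain dictionaries; the function name and in-memory structure are illustrative, and the output formatting per the spec above is omitted:

  from collections import defaultdict

  def count_features(training_file, words, feats):
      """For each target word, count left/right neighbor features.
      words, feats: sets loaded from word_file and feat_file."""
      vectors = {w: defaultdict(int) for w in words}
      with open(training_file, encoding="utf8") as f:
          for line in f:
              tokens = line.split()
              for i, tok in enumerate(tokens):
                  if tok not in words:
                      continue
                  if i > 0 and tokens[i - 1] in feats:
                      vectors[tok]["L=" + tokens[i - 1]] += 1
                  if i + 1 < len(tokens) and tokens[i + 1] in feats:
                      vectors[tok]["R=" + tokens[i + 1]] += 1
      return vectors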

Vector File

One line per word in word_file
Lines should be ordered as in word_file
Features should be sorted alphabetically by feature name
  E.g., L=an 3 L=the 10 ... R=aqua 1 R=house 5
Feature sorting aids the cosine computation (see the sketch below)
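Why the sorting helps: with both feature lists in the same order, the dot product needed for cosine can be computed in a single merge-style pass over the two sparse vectors. A sketch, assuming each vector is a sorted list of (feature, count) pairs:

  import math

  def sparse_cosine(u, v):
      """u, v: lists of (feature, count) pairs, sorted by feature name."""
      dot, i, j = 0.0, 0, 0
      while i < len(u) and j < len(v):
          if u[i][0] == v[j][0]:
              dot += u[i][1] * v[j][1]
              i += 1
              j += 1
          elif u[i][0] < v[j][0]:
              i += 1
          else:
              j += 1
      nu = math.sqrt(sum(c * c for _, c in u))
      nv = math.sqrt(sum(c * c for _, c in v))
      return dot / (nu * nv) if nu and nv else 0.0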

Q2: k_medoids

k_medoids.* vector_file num_clusters sys_cluster_file

  vector_file: created by Q1
  num_clusters: number of clusters to create
  sys_cluster_file: output representing the clustering of vectors
    Format: medoid w1 w2 w3 ... wn
    where medoid is the medoid representing the cluster and w1 ... wn are the words in the cluster

Q2: K-Medoids

Similarity measure: cosine similarity

Initial medoids: medoid i is at the instance given by [formula], where N is the # of words to cluster and C is the # of clusters

Mapping Sys to Gold: One-to-One

Find the highest number in the matrix
Remove the corresponding row and column
Repeat until all rows are removed:
  s1 => g2 10
  s2 => g1 7
  s3 => g3 6
  acc = (10 + 7 + 6) / sum

        g1   g2   g3
  s1     2   10    9
  s2     7    4    2
  s3     0    9    6
  s4     5    0    3

Due to F. Xia

Mapping Sys to Gold: Many-to-One

Find the highest number in the matrix
Remove the corresponding row (but not the column)
Repeat until all rows are removed:
  s1 => g2 10
  s2 => g1 7
  s3 => g2 9
  s4 => g1 5
  acc = (10 + 7 + 9 + 5) / sum

        g1   g2   g3
  s1     2   10    9
  s2     7    4    2
  s3     0    9    6
  s4     5    0    3

Due to F. Xia
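A sketch covering both mapping strategies, with the confusion counts as a dict of dicts (counts[s][g] = # of items of system cluster s carrying gold label g; names illustrative):

  def map_clusters(counts, one_to_one):
      """Greedily pick the largest remaining cell; in one-to-one mode
      each gold cluster can be used at most once."""
      mapping = {}
      used_gold = set()
      remaining = set(counts)
      while remaining:
          best = max(
              ((s, g) for s in remaining for g in counts[s]
               if not (one_to_one and g in used_gold)),
              key=lambda sg: counts[sg[0]][sg[1]],
              default=None)
          if best is None:      # one-to-one: gold clusters exhausted
              break
          s, g = best
          mapping[s] = g
          remaining.discard(s)
          used_gold.add(g)
      total = sum(sum(row.values()) for row in counts.values())
      correct = sum(counts[s][g] for s, g in mapping.items())
      return mapping, correct / total

On the matrix above this reproduces the slide's results: accuracy (10+7+6)/57 one-to-one, (10+7+9+5)/57 many-to-one.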

Q3: calculate_accuracy

calculate_accuracy.* sys_clust gold_clust flag map_file acc_file

  sys_clust: output of Q2: m w1 w2 ...
  gold_clust: same format, gold standard
  flag: 0 = one-to-one; 1 = many-to-one
  map_file: mapping of system to gold clusters
    Format: sys_clust_num => gold_clust_num count
  acc_file: just the overall accuracy

Experiments

Compare different numbers of words and different feature representations
Compare different mapping strategies for accuracy
Tabulate results
