Fex Feature Extractor - v2


Topics

• Vocabulary
• Syntax of scripting language
  – Feature functions
  – Operators
• Examples
  – POS tagging
• Input Formats


Vocabulary

• example
  – A list of active records for which Fex produces a single SNoW example. Usually a sentence.
• record
  – A single position in an example (sentence).
  – Contains a list of fields, each of which holds a different kind of information, e.g. NLP: word, tag; vision: color, etc.
• Raw input to Fex
  – A list of valid examples (raw sentences, tagged corpora, etc.).
• Fex's output
  – Lexical features are written to the lexicon file.
  – Their corresponding numeric IDs are written to the example file.
• feature function
  – A relation among one or more records.


Example: Feature Functions


Script Syntax

• A Fex script file contains a list of definitions, each of which rewrites the given observation into a set of active features.
• Definition format (terms in parentheses are optional):
    target (inc) (loc): FeatureFunc ([left, right])
• target - Target index or word. To treat each record in the observation as a target, use -1; this is a macro for “all words”.
• inc - Include the target word instead of the placeholder (*) in some features.
• loc - Generate features with location relative to the target.
• FeatureFunc - A feature function defined in terms of certain unary and n-ary relations and operators.
• left - Left offset of the scope for generating features. Negative values are to the left of the target, positive values to the right.
• right - Right offset of the scope.
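The definition format above can be read mechanically. The following is a minimal Python sketch of such a reader, not Fex's own parser: the `Definition` type, the `parse_definition` helper, and the regular expression are all hypothetical names introduced for illustration, and the grammar is inferred only from the script lines shown on these slides.

```python
import re
from typing import NamedTuple, Optional


class Definition(NamedTuple):
    """One script line: target (inc) (loc): FeatureFunc ([left, right])."""
    target: str            # target word or index; "-1" means every record
    inc: bool              # True if the optional "inc" flag is present
    loc: bool              # True if the optional "loc" flag is present
    func: str              # feature function text, e.g. "w" or "coloc(t,t,t)"
    left: Optional[int]    # left offset of the scope, None if omitted
    right: Optional[int]   # right offset of the scope, None if omitted


# Assumed grammar, based only on the examples in the slides.
_DEFINITION = re.compile(
    r"""
    ^\s*(?P<target>\S+?)               # target index or word
    (?P<flags>(?:\s+(?:inc|loc))*)     # optional inc / loc flags
    \s*:\s*
    (?P<func>.+?)                      # feature function
    (?:\s*\[\s*(?P<left>-?\d+)\s*,\s*(?P<right>-?\d+)\s*\])?
    \s*$
    """,
    re.VERBOSE,
)


def parse_definition(line: str) -> Definition:
    """Split one definition line into its parts."""
    match = _DEFINITION.match(line)
    if match is None:
        raise ValueError(f"not a definition: {line!r}")
    flags = match["flags"].split()
    left, right = match["left"], match["right"]
    return Definition(
        target=match["target"],
        inc="inc" in flags,
        loc="loc" in flags,
        func=match["func"].strip(),
        left=int(left) if left is not None else None,
        right=int(right) if right is not None else None,
    )


print(parse_definition("-1 loc: t [-2,2]"))
```

The optional scope is kept as `None` when absent, so a caller can tell `lab` (no scope) apart from `w [0,0]`.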


Basic Feature Functions

Type        Fex Notation   Output to Lexicon     Interpretation
Label       lab            lab[target word]      Produces a label feature
            lab(t)         lab[target tag]
Word        w              w[current word]       Active if the word(s) in the current record is within scope
Tag (pos)   t              t[current tag]        Active if the tag(s) in the current record is within scope
Vowel       v              v[initial vowel]      Active if the word(s) in the current record begins with a vowel
Prefix      pre            pre[active prefix]    Active if the word(s) in the current record begins with a prefix in a given list
Suffix      suf            suf[active suffix]    Active if the word(s) in the current record ends with a suffix in a given list
Baseline    base           base[baseline tag]    Active if a baseline tag from a prepared list exists for the word(s) in the current record
Lemma       lem            lem[active lemma]     Active if a lemma from the WordNet database exists for the word(s) in the current record


Example

• Sentence = “(DET The) (NN dog) (V is) (JJ mad)”

Method 1

Script Def        Output to Lexicon    Output to Example File
dog: w [-1,1]     10001 w[The]         10001, 10002, 10003, 10004:
                  10002 w[is]
dog: t [1,2]      10003 t[V]
                  10004 t[JJ]

Method 2

Script Def        Output to Lexicon    Output to Example File
-1: lab           10001 w[The]         1, 10001, 10002, 10003, 10004:
-1: w [-1,1]      10002 w[is]
-1: t [1,2]       10003 t[V]
                  10004 t[JJ]
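To make the relation between script, lexicon, and example file concrete, here is a small Python sketch that reproduces the Method 1 output for the target "dog". It is an illustration under stated assumptions, not Fex itself: `window_features` and `Lexicon` are hypothetical names, the target record is assumed to be skipped (as the slide's output suggests), and feature IDs are assumed to start at 10001.

```python
# Each record is a (tag, word) pair, as in "(DET The) (NN dog) (V is) (JJ mad)".
SENTENCE = [("DET", "The"), ("NN", "dog"), ("V", "is"), ("JJ", "mad")]


def window_features(sentence, target_word, field, left, right):
    """Return w[...] or t[...] features for records in [left, right] around the target."""
    target = [word for _, word in sentence].index(target_word)
    features = []
    for offset in range(left, right + 1):
        position = target + offset
        # Assumption: the target record itself and out-of-range positions are skipped.
        if offset == 0 or not 0 <= position < len(sentence):
            continue
        tag, word = sentence[position]
        features.append(f"{field}[{word if field == 'w' else tag}]")
    return features


class Lexicon:
    """Maps each distinct feature string to a numeric ID (assumed to start at 10001)."""

    def __init__(self, first_id=10001):
        self.ids = {}
        self.next_id = first_id

    def lookup(self, feature):
        if feature not in self.ids:
            self.ids[feature] = self.next_id
            self.next_id += 1
        return self.ids[feature]


lexicon = Lexicon()
active = (window_features(SENTENCE, "dog", "w", -1, 1)
          + window_features(SENTENCE, "dog", "t", 1, 2))
example_line = ", ".join(str(lexicon.lookup(f)) for f in active) + ":"

print(lexicon.ids)
print(example_line)
```

The lexicon keeps the readable feature strings, while the example line carries only their IDs, which is the split between the two output files described under Vocabulary.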


Operators & Complex Functions

• (X) operator - Indicates that a feature is active without any specific instantiation.

    Script Def          Output to Lexicon
    dog: v(X) [-1,1]    10001 v[]

• (x=y) operator - Creates an active feature iff the active instantiation matches the given argument.

    Script Def          Output to Lexicon
    dog: w(x=is)        10001 w[is]

Sentence = “(DET The) (NN dog) (V is) (JJ mad)”


• & operator - Conjunction of two features: produces a new feature that is active iff the record fulfills both constituent features.

    Script Def          Output to Lexicon
    dog: w&t [-1,-1]    10001 w[The]&t[DET]

• | operator - Disjunction of two features: outputs a feature for each term of the disjunction that is active in the current record.

    Script Def          Output to Lexicon
    dog: w|t [-1,-1]    10001 w[The]
                        10002 t[DET]

Sentence = “(DET The) (NN dog) (V is) (JJ mad)”
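The difference between & and | can be sketched in a few lines of Python. This is only an illustration of the behavior described above: the record is modeled as a plain dictionary, a field counts as active when it has a value, and the helper names `active`, `conjunction`, and `disjunction` are hypothetical.

```python
def active(record, field):
    """Return the feature string for a field, or None if the field is not active."""
    value = record.get(field)
    return None if value is None else f"{field}[{value}]"


def conjunction(record, left, right):
    """& operator: one combined feature, only if both constituents are active."""
    a, b = active(record, left), active(record, right)
    return [f"{a}&{b}"] if a and b else []


def disjunction(record, left, right):
    """| operator: one feature per constituent that is active."""
    return [f for f in (active(record, left), active(record, right)) if f]


the = {"w": "The", "t": "DET"}      # record at offset -1 from "dog"
untagged = {"w": "dog"}             # a record with no tag field

print(conjunction(the, "w", "t"))   # one conjoined feature
print(disjunction(the, "w", "t"))   # two separate features
print(conjunction(untagged, "w", "t"), disjunction(untagged, "w", "t"))
```

For the untagged record the conjunction produces nothing, while the disjunction still emits the word feature.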


• coloc function - Consecutive feature function: takes two or more features as arguments and produces a consecutive collocation over two or more records. The order of the arguments is preserved in the active feature.

    Script Def                  Output to Lexicon
    mad: coloc(w, t) [-3,-1]    10001 w[The]-t[NN]
                                10002 w[dog]-t[V]

• scoloc function - Sparse consecutive feature function: operates like coloc, except that active collocations need not be consecutive. The order of the arguments is still preserved in determining whether a feature is active.

    Script Def                  Output to Lexicon
    mad: scoloc(w,t) [-3,-1]    10001 w[The]-t[NN]
                                10002 w[dog]-t[V]
                                10003 w[The]-t[V]
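One way to picture coloc and scoloc is as sliding windows versus ordered combinations over the records in scope. The Python sketch below reproduces the two lexicon listings for the target "mad" under that reading. It is an assumption-based illustration, not Fex's implementation: the helper names are hypothetical, and the sketch ignores the target placeholder because the scope [-3,-1] does not contain the target.

```python
from itertools import combinations

SENTENCE = [
    {"t": "DET", "w": "The"},
    {"t": "NN", "w": "dog"},
    {"t": "V", "w": "is"},
    {"t": "JJ", "w": "mad"},
]


def _scope(sentence, target, left, right):
    """Positions covered by [left, right] around the target, clipped to the sentence."""
    lo = max(0, target + left)
    hi = min(len(sentence) - 1, target + right)
    return range(lo, hi + 1)


def _render(sentence, positions, fields):
    """Apply the i-th field to the i-th position and join the parts with '-'."""
    return "-".join(f"{f}[{sentence[p][f]}]" for p, f in zip(positions, fields))


def coloc(sentence, target, fields, left, right):
    """Consecutive collocations: every run of len(fields) adjacent records."""
    scope = _scope(sentence, target, left, right)
    n = len(fields)
    return [_render(sentence, scope[i:i + n], fields)
            for i in range(len(scope) - n + 1)]


def scoloc(sentence, target, fields, left, right):
    """Sparse collocations: every order-preserving choice of len(fields) records."""
    scope = _scope(sentence, target, left, right)
    return [_render(sentence, combo, fields)
            for combo in combinations(scope, len(fields))]


MAD = 3  # index of the target word "mad"
print(coloc(SENTENCE, MAD, ["w", "t"], -3, -1))
print(scoloc(SENTENCE, MAD, ["w", "t"], -3, -1))
```

`itertools.combinations` keeps positions in left-to-right order, which matches the requirement that argument order is preserved; the order in which the features are listed may differ from the slide.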


Example: POS tagging

• Useful features for POS tagging:
  – The preceding word is tagged c.
  – The following word is tagged c.
  – The word two before is tagged c.
  – The word two after is tagged c.
  – The preceding word is tagged c and the following word is tagged t.
  – The preceding word is tagged c and the word two before is tagged t.
  – The following word is tagged c and the word two after is tagged t.
  – The current word is w.
  – The most probable part of speech for the current word is c.


• Given the sentence:
  – (t1 The) (t2 dog) (t3 ran) (t4 very) (t5 quickly)
• The following Fex script will produce the features from the previous slide:
    -1: lab(t)
    -1 loc: t [-2,2]
    -1: coloc(t,t,t) [-2,2]
    -1 inc: w [0,0]
    -1: base [0,0]
• To do POS tagging, an example needs to be generated for each word in the observation.


• For the third word, “ran”, the script produces the following output:

    Script                      Lexicon Output
    -1: lab(t)                  1 lab[t3]
    -1 loc: t [-2,2]            10001 t[t1_*]
                                10002 t[t2*]
                                10003 t[*t4]
                                10004 t[*_t5]
    -1: coloc(t,t,t) [-2,2]     10005 t[t1]-t[t2]-*
                                10006 t[t2]-*-t[t4]
                                10007 *-t[t4]-t[t5]
    -1 inc: w [0,0]             10008 w[ran]
    -1: base [0,0]              10009 base[V]

• And an example in the example file:
  – 1, 10001, 10002, 10003, 10004, 10005, 10006, 10007, 10008, 10009:
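The lexicon entries for "ran" can be regenerated with a short Python sketch, which may help when reading the `loc` notation and the `*` placeholder. Everything here is inferred from the listing above and is not Fex's code: the function names are hypothetical, `_` is assumed to stand for one skipped position, `*` for the target, and the baseline tag is passed in by hand because the prepared baseline list is not part of the slides.

```python
TAGGED = [("t1", "The"), ("t2", "dog"), ("t3", "ran"), ("t4", "very"), ("t5", "quickly")]


def loc_tag_features(sentence, target, left, right):
    """Tag features with location: '*' marks the target, '_' a skipped position."""
    features = []
    for offset in range(left, right + 1):
        position = target + offset
        if offset == 0 or not 0 <= position < len(sentence):
            continue
        tag = sentence[position][0]
        gap = "_" * (abs(offset) - 1)
        features.append(f"t[{tag}{gap}*]" if offset < 0 else f"t[*{gap}{tag}]")
    return features


def tag_trigrams(sentence, target, left, right):
    """coloc(t,t,t): consecutive tag triples in scope, target shown as '*'."""
    lo = max(0, target + left)
    hi = min(len(sentence) - 1, target + right)
    features = []
    for start in range(lo, hi - 1):          # start + 2 must not exceed hi
        parts = ["*" if p == target else f"t[{sentence[p][0]}]"
                 for p in range(start, start + 3)]
        features.append("-".join(parts))
    return features


def pos_features(sentence, target, baseline_tag):
    """All features the five-line script yields for one target position."""
    tag, word = sentence[target]
    return ([f"lab[{tag}]"]
            + loc_tag_features(sentence, target, -2, 2)
            + tag_trigrams(sentence, target, -2, 2)
            + [f"w[{word}]", f"base[{baseline_tag}]"])


# "V" is the baseline tag shown on the slide for "ran"; it is supplied manually here.
for feature in pos_features(TAGGED, 2, "V"):
    print(feature)
```

Calling `pos_features` once per position gives one example per word, which is what the `-1` target requests.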


Input Formats

• Fex can presently accept data in the following formats:
  – w1 w2 w3 w4 …
  – (t1 w1) (t2 w2) (t3 w3) (t4 w4) …
  – w1 (t2 w2) (t3 t3a; w3) (t4; w4 w4a) …
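A reader for these formats might look like the Python sketch below. The interpretation is an assumption drawn only from the naming on the slide: a bare token is a word without tags, `(t w)` is one tag followed by one or more words, and a semicolon separates a list of tags from a list of words. The function name `parse_observation` and the dictionary layout are hypothetical.

```python
import re

# A parenthesized record, or a bare whitespace-delimited word.
_TOKEN = re.compile(r"\(([^()]*)\)|(\S+)")


def parse_observation(line):
    """Turn one input line into a list of records with 'tags' and 'words' lists."""
    records = []
    for body, bare in _TOKEN.findall(line):
        if bare:
            # Format 1: plain word, no tag information.
            records.append({"tags": [], "words": [bare]})
            continue
        if not body.strip():
            continue  # ignore empty parentheses
        if ";" in body:
            # Assumed format 3: tags before the semicolon, words after it.
            tags, words = (part.split() for part in body.split(";", 1))
        else:
            # Format 2: first token is the tag, the rest are words.
            first, *rest = body.split()
            tags, words = [first], rest
        records.append({"tags": tags, "words": words})
    return records


for record in parse_observation("w1 (t2 w2) (t3 t3a; w3) (t4; w4 w4a)"):
    print(record)
```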


Using Fex (command line)

fex [options] script-file lexicon-file corpus-file example-file

Options:

• -t: target file
  – Each target goes on a separate line.
  – The file must not contain any empty lines.
• -r: test mode
  – Does not create new features
• -h, -I
  – Creates a histogram of active features


Using Fex (command line)

• Target file = targ:
    dog
    cat
• Script file = script:
    -1 : lab
    -1 : w [-1,-1]
    -1 : t [-1,-1]
• Corpus file = corpus:
    (DET The) (NN dog) (V is) (JJ mad)
• Lexicon file = lexicon
• Example file = example
• Command:
    fex -t targ script lexicon corpus example


SNoW


Word representation

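[Figure: word-representation network. Surviving node labels: join, say, as, will, the, NOUN_, VERB-to_modal, _". The link weights are not recoverable from the extracted text.]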


Restrictions on the learning approach

• Multi-class
• Variable number of features
  – per class
  – per example
• Efficient learning
• Efficient evaluation


SNoW

• Network of threshold gates
• Target nodes represent class labels
• Input nodes (features) and links are allocated in a data-driven way (on the order of 10^5 input features for many target nodes)
• Each sub-network (target node) is learned autonomously as a function of the features
• An example presented is positive to one network and negative to the others (depends on the algorithm)
• Allocation of nodes (features) and links is data-driven (a link between feature fi and target tj is created only when fi was active together with target tj)
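The data-driven allocation described above can be sketched as a sparse network in which each target node keeps weights only for the features it has seen in its own positive examples. The Python below is a toy illustration, not SNoW's implementation: the class names, the parameter values, and the use of the textbook multiplicative Winnow update (promote by alpha, demote by beta on a mistake) are assumptions made for the example.

```python
class TargetNode:
    """One threshold gate; holds links only to features seen with this target."""

    def __init__(self, alpha=2.0, beta=0.5, theta=3.0, init_weight=1.0):
        # Illustrative parameter values, not SNoW defaults.
        self.alpha, self.beta = alpha, beta
        self.theta, self.init_weight = theta, init_weight
        self.weights = {}  # feature -> weight; missing key means no link

    def activation(self, features):
        return sum(self.weights.get(f, 0.0) for f in features)

    def learn(self, features, is_positive):
        if is_positive:
            # Data-driven allocation: create links on positive examples only.
            for f in features:
                self.weights.setdefault(f, self.init_weight)
        predicted = self.activation(features) > self.theta
        if predicted == is_positive:
            return  # mistake-driven: no update when the gate is right
        factor = self.alpha if is_positive else self.beta
        for f in features:
            if f in self.weights:
                self.weights[f] *= factor


class Network:
    """A set of target nodes trained independently on the same examples."""

    def __init__(self, targets, **params):
        self.nodes = {t: TargetNode(**params) for t in targets}

    def learn(self, features, label):
        for target, node in self.nodes.items():
            node.learn(features, target == label)

    def predict(self, features):
        return max(self.nodes, key=lambda t: self.nodes[t].activation(features))


net = Network(targets=["join", "say"])
net.learn(["as", "will"], "join")
net.learn(["will", "the"], "say")
print(net.nodes["join"].weights, net.nodes["say"].weights)
print(net.predict(["as", "will"]))
```

Because each node stores only its own links, the number of weights grows with the features actually observed per target rather than with the full feature space.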


Word prediction using SNoW

• Target nodes: each word in the set of candidate words is a target node.
• Input nodes: an input node for feature fi is allocated only if fi was active with some target.
• Decision task: we need to choose one target among all possible candidates.


SNoW (Command line)

snow -train -I inputfile -F networkfile [-ABcdePrsTvW]

snow -test -I inputfile -F networkfile [-bEloRvw]

Architecture

Winnow: -W [α, β, θ, init weight] :targets

Perceptron: -P [learning rate, θ, init weight] :targets

NB: -B :targets


SNoW parameters (training)

-d <none | abs:<k> | rel> : discarding method

-e <i> : eligibility threshold

-r <i> : number of cycles

Output modes

-c <i> : interval for network snapshot

-v <off | min | med | max> : details for the output to the screen


SNoW parameters (testing)

-b <k> : smoothing for NB

-w <k> : smoothing for W, P

Output modes

-E : error file

-o <accuracy | winners | allpred | allact | allboth> : details for the output

-R : results file (stdout)


File Format (Example file)

6, 10034, 10141, 10151, 10158, 10179:

177, 10034, 10035, 10047:

With weights:

6, 10034(1), 10141(1.5), 10151(0.4), 10158(2), 10179(0.1):

177, 10034(2), 10035(4), 10047(0.6):

Only active features appear in an example!
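Both variants of the example-file line can be read with one small routine. The Python sketch below is based only on the sample lines above; the function name `parse_example` is hypothetical, and treating a missing weight as 1.0 is an assumption rather than documented SNoW behavior.

```python
import re

# A feature ID, optionally followed by a weight in parentheses, e.g. 10141(1.5).
_FEATURE = re.compile(r"^(\d+)(?:\(([^)]+)\))?$")


def parse_example(line):
    """Return the (feature_id, weight) pairs of one example-file line."""
    body = line.strip()
    if not body.endswith(":"):
        raise ValueError(f"example line must end with ':': {line!r}")
    pairs = []
    for item in body[:-1].split(","):
        match = _FEATURE.match(item.strip())
        if match is None:
            raise ValueError(f"bad feature entry: {item!r}")
        # Assumption: an entry without an explicit weight counts as weight 1.0.
        weight = float(match[2]) if match[2] else 1.0
        pairs.append((int(match[1]), weight))
    return pairs


print(parse_example("177, 10034, 10035, 10047:"))
print(parse_example("6, 10034(1), 10141(1.5), 10151(0.4), 10158(2), 10179(0.1):"))
```

In the Fex examples earlier in the deck, the first ID on a line is the label feature, so a caller would typically treat `pairs[0]` as the target.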


File Format (Network file)

NB

target 111 0 1 135 1 naivebayes 0 0.1 0.5

111 : 0 : 10020 : 4 0 -3.518980417

111 : 0 : 10021 : 1 0 -4.905274778

Winnow

target 111 1 1 135 1562 winnow 0 1.1 0.9 15 1

111 : 0 : 10020 : 4 1 1.1

111 : 0 : 10021 : 1 0 1

Perceptron

target 111 2 1 270 1 perceptron 0 0.1 4 0.2

111 : 0 : 10020 : 4 1 0.3

111 : 0 : 10021 : 1 0 0.2


File Format (Error file)

Algorithms:
Perceptron: (1, 30, 0.05) Targets: 3, 53, 73

Ex: 8 Prediction: 3 Label: 53
3: 0.5866
53: 0.2592*
73: 0.1192

Ex: 15 Prediction: 3 Label: 73
3: 0.5987
73: 0.001229*
53: 0.0002248