
Page 1: Natural Language Processing

Natural Language Processing

Finite State Morphology

Slides adapted from Jurafsky & Martin, Speech and Language Processing

Page 2: Natural Language Processing

Outline

• Finite State Automata
• (English) Morphology
• Finite State Transducers

Page 3: Natural Language Processing

Finite State Automata

• Let’s start with the sheep language from Chapter 2: /baa+!/

Page 4: Natural Language Processing

Sheep FSA

• We can say the following things about this machine:
  – It has 5 states
  – b, a, and ! are in its alphabet
  – q0 is the start state
  – q4 is an accept state
  – It has 5 transitions

Page 5: Natural Language Processing

More Formally

• You can specify an FSA by enumerating the following things:
  – The set of states: Q
  – A finite alphabet: Σ
  – A start state
  – A set of accept/final states
  – A transition relation that maps Q × Σ to Q

Page 6: Natural Language Processing

Yet Another View

• The guts of FSAs can ultimately be represented as tables

State   b   a   !   ε
0       1   –   –   –
1       –   2   –   –
2       –   3   –   –
3       –   3   4   –
4       –   –   –   –

If you’re in state 1 and you’re looking at an a, go to state 2
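
As an illustration of the table view (mine, not the slides’), the same machine can be written as a nested Python dictionary, where table[state][symbol] gives the next state:

```python
# Transition table for the sheep FSA: table[state][symbol] -> next state.
# A missing entry means there is no transition (the machine rejects).
sheep_table = {
    0: {"b": 1},
    1: {"a": 2},
    2: {"a": 3},
    3: {"a": 3, "!": 4},
    4: {},               # accept state; no outgoing transitions
}

# "If you're in state 1 and you're looking at an a, go to state 2"
assert sheep_table[1]["a"] == 2
```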

Page 7: Natural Language Processing

Recognition

• Recognition is the process of determining if a string should be accepted by a machine

• Or… it’s the process of determining if a string is in the language we’re defining with the machine

• Or… it’s the process of determining if a regular expression matches a string

• Those all amount to the same thing in the end

Page 8: Natural Language Processing

Recognition

• Traditionally (Turing’s notion), this process is depicted with a tape.

Page 9: Natural Language Processing

Recognition

• Simply a process of:
  – Starting in the start state
  – Examining the current input
  – Consulting the table
  – Going to a new state and updating the tape pointer
  – … until you run out of tape

Page 10: Natural Language Processing

D-Recognize
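
The slide body was a figure of the textbook’s D-RECOGNIZE pseudocode. A minimal Python sketch of the same table-driven loop (my reconstruction, using the dictionary encoding above):

```python
def d_recognize(tape, table, start, accept_states):
    """Deterministic recognition: consult the table for each input symbol."""
    state = start
    for symbol in tape:                  # examine input, advance tape pointer
        if symbol not in table[state]:   # no entry in the table: reject
            return False
        state = table[state][symbol]     # go to the new state
    return state in accept_states        # out of tape: accept iff final state

sheep = {0: {"b": 1}, 1: {"a": 2}, 2: {"a": 3}, 3: {"a": 3, "!": 4}, 4: {}}
print(d_recognize("baaa!", sheep, 0, {4}))  # True
print(d_recognize("ba!", sheep, 0, {4}))    # False
```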

Page 11: Natural Language Processing

Key Points

• Deterministic means that at each point in processing there is always one unique thing to do (no choices).

• D-recognize is a simple table-driven interpreter

• The algorithm is universal for all unambiguous regular languages. To change the machine, you simply change the table.

Page 12: Natural Language Processing

Recognition as Search

• You can view this algorithm as a trivial kind of state-space search.

• States are pairings of tape positions and state numbers.

• Operators are compiled into the table
• Goal state is a pairing of the end-of-tape position and a final accept state
• It is trivial because?

Page 13: Natural Language Processing

Non-Determinism

Page 14: Natural Language Processing

Non-Determinism

• Yet another technique: epsilon (ε) transitions
  – Key point: these transitions do not examine or advance the tape during recognition

Page 15: Natural Language Processing

Equivalence

• Non-deterministic machines can be converted to deterministic ones with a fairly simple construction

• That means that they have the same power; non-deterministic machines are not more powerful than deterministic ones in terms of the languages they can accept
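
A compact sketch of that construction, the standard subset construction (the encoding is my own: delta[state][symbol] is a set of next states, with "" as the epsilon label):

```python
from collections import deque

def eps_closure(states, delta):
    """All states reachable from `states` via epsilon ("") transitions."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get(s, {}).get("", set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def determinize(delta, start, accepts, alphabet):
    """Subset construction: returns (dfa_delta, dfa_start, dfa_accepts).
    Each DFA state is a frozenset of NFA states."""
    dfa_start = eps_closure({start}, delta)
    dfa_delta, dfa_accepts = {}, set()
    queue = deque([dfa_start])
    while queue:
        S = queue.popleft()
        if S in dfa_delta:
            continue                      # already processed this state set
        dfa_delta[S] = {}
        if S & accepts:                   # accept if any member accepts
            dfa_accepts.add(S)
        for a in alphabet:
            moved = set()
            for s in S:
                moved |= delta.get(s, {}).get(a, set())
            T = eps_closure(moved, delta)
            if T:
                dfa_delta[S][a] = T
                queue.append(T)
    return dfa_delta, dfa_start, dfa_accepts
```

In the worst case the deterministic machine can be exponentially larger, which is one reason the search-based approach on the next slide is also used in practice.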

Page 16: Natural Language Processing

ND Recognition

• Two basic approaches (used in all major implementations of regular expressions, see Friedl 2006)

1. Either take a ND machine and convert it to a D machine and then do recognition with that.

2. Or explicitly manage the process of recognition as a state-space search (leaving the machine as is).
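
A minimal sketch of approach 2 (my own illustration, not the textbook’s ND-RECOGNIZE): search states are (machine state, tape position) pairs, and an agenda holds the as-yet-unexplored ones:

```python
def nd_recognize(tape, delta, start, accepts):
    """Search over (state, position) pairs; delta[state][symbol] is a set
    of next states, with "" as the epsilon symbol."""
    agenda = [(start, 0)]
    seen = set()
    while agenda:
        state, pos = agenda.pop()          # depth-first; a queue gives BFS
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        if pos == len(tape) and state in accepts:
            return True                    # goal: end of tape + accept state
        # epsilon moves: change state without advancing the tape
        for t in delta.get(state, {}).get("", set()):
            agenda.append((t, pos))
        # ordinary moves: consume the current input symbol
        if pos < len(tape):
            for t in delta.get(state, {}).get(tape[pos], set()):
                agenda.append((t, pos + 1))
    return False                           # every path failed
```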

Page 17: Natural Language Processing

Non-Deterministic Recognition: Search

• In a ND FSA there exists at least one path through the machine for a string that is in the language defined by the machine.

• But not all paths through the machine for an accepted string lead to an accept state.

• No paths through the machine lead to an accept state for a string not in the language.

Page 18: Natural Language Processing

Non-Deterministic Recognition

• So success in non-deterministic recognition occurs when a path is found through the machine that ends in an accept.

• Failure occurs when all of the possible paths for a given string lead to failure.

Pages 19–26: Natural Language Processing

Example

(Eight figure-only slides stepping through the non-deterministic recognition search; the figures are not preserved.)

Page 27: Natural Language Processing

Key Points

• States in the search space are pairings of tape positions and states in the machine.

• By keeping track of as yet unexplored states, a recognizer can systematically explore all the paths through the machine given an input.

Page 28: Natural Language Processing

Why Bother?

• Non-determinism doesn’t get us more formal power, and it causes headaches, so why bother?
  – More natural (understandable) solutions
  – Not always equivalent

Page 29: Natural Language Processing

Compositional Machines

• Formal languages are just sets of strings

• Therefore, we can talk about various set operations (intersection, union, concatenation)

• This turns out to be a useful exercise

Page 30: Natural Language Processing

Union

Page 31: Natural Language Processing

Concatenation
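
Both of these slides were diagrams. As a rough sketch of the standard ε-transition constructions (my encoding: a machine is a (delta, start, accepts) triple, with "" as the epsilon label and delta[state][symbol] a set of next states):

```python
def _tag(i, delta):
    """Relabel every state q as (i, q) so two machines' states can't collide."""
    return {(i, q): {sym: {(i, t) for t in targets}
                     for sym, targets in arcs.items()}
            for q, arcs in delta.items()}

def union(m1, m2):
    """NFA union: a fresh start state with epsilon arcs into both machines."""
    (d1, s1, a1), (d2, s2, a2) = m1, m2
    delta = {**_tag(1, d1), **_tag(2, d2)}
    delta[("u", "start")] = {"": {(1, s1), (2, s2)}}
    return delta, ("u", "start"), {(1, q) for q in a1} | {(2, q) for q in a2}

def concat(m1, m2):
    """NFA concatenation: epsilon arcs from M1's accept states to M2's start."""
    (d1, s1, a1), (d2, s2, a2) = m1, m2
    delta = {**_tag(1, d1), **_tag(2, d2)}
    for q in a1:
        delta.setdefault((1, q), {}).setdefault("", set()).add((2, s2))
    return delta, (1, s1), {(2, q) for q in a2}
```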

Page 32: Natural Language Processing

Words

• Finite-state methods are particularly useful in dealing with a lexicon

• Many devices, most with limited memory, need access to large lists of words

• And they need to perform fairly sophisticated tasks with those lists

• So we’ll first talk about some facts about words and then come back to computational methods

Page 33: Natural Language Processing

English Morphology

• Morphology is the study of the ways that words are built up from smaller meaningful units called morphemes

• We can usefully divide morphemes into two classes:
  – Stems: the core meaning-bearing units
  – Affixes: bits and pieces that adhere to stems to change their meanings and grammatical functions

Page 34: Natural Language Processing

English Morphology

• We can further divide morphology up into two broad classes:
  – Inflectional
  – Derivational

Page 35: Natural Language Processing

Inflectional Morphology

• Inflectional morphology concerns the combination of stems and affixes where the resulting word:
  – Has the same word class as the original
  – Serves a grammatical/semantic purpose that is different from the original, but is nevertheless transparently related to the original

Page 36: Natural Language Processing

Nouns and Verbs in English

• Nouns are simple
  – Markers for plural and possessive

• Verbs are only slightly more complex
  – Markers appropriate to the tense of the verb

Page 37: Natural Language Processing

Regulars and Irregulars

• It is a little complicated by the fact that some words misbehave (refuse to follow the rules)
  – Mouse/mice, goose/geese, ox/oxen
  – Go/went, fly/flew

• The terms regular and irregular are used to refer to words that follow the rules and those that don’t

Page 38: Natural Language Processing

Regular and Irregular Verbs

• Regulars…
  – Walk, walks, walking, walked, walked

• Irregulars
  – Eat, eats, eating, ate, eaten
  – Catch, catches, catching, caught, caught
  – Cut, cuts, cutting, cut, cut

Page 39: Natural Language Processing

Derivational Morphology

• Derivational morphology is the messy stuff that no one ever taught you.
  – Quasi-systematicity
  – Irregular meaning change
  – Changes of word class

Page 40: Natural Language Processing

Derivational Examples

• Verbs and Adjectives to Nouns

  -ation   computerize → computerization
  -ee      appoint → appointee
  -er      kill → killer
  -ness    fuzzy → fuzziness

Page 41: Natural Language Processing

Derivational Examples

• Nouns and Verbs to Adjectives

  -al      computation → computational
  -able    embrace → embraceable
  -less    clue → clueless

Page 42: Natural Language Processing

Morphology and FSAs

• We’d like to use the machinery provided by FSAs to capture these facts about morphology:
  – Accept strings that are in the language
  – Reject strings that are not
  – And do so in a way that doesn’t require us to, in effect, list all the words in the language

Page 43: Natural Language Processing

Start Simple

• Regular singular nouns are ok
• Regular plural nouns have an -s on the end
• Irregulars are ok as is
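
A sketch of that machine as a regular expression, with toy word lists of my own standing in for the textbook’s reg-noun, irreg-sg-noun, and irreg-pl-noun classes:

```python
import re

# Toy word lists standing in for the textbook's word classes
reg_noun = "cat|fox|dog"        # reg-noun
irreg_sg = "goose|mouse"        # irreg-sg-noun
irreg_pl = "geese|mice"         # irreg-pl-noun

# Regular nouns take an optional -s; irregulars are accepted as is.
# (Note it wrongly accepts "foxs" -- the spelling changes come later.)
noun_fsa = re.compile(rf"^(?:(?:{reg_noun})s?|{irreg_sg}|{irreg_pl})$")

for w in ["cat", "cats", "geese", "gooses"]:
    print(w, bool(noun_fsa.match(w)))   # gooses -> False
```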

Page 44: Natural Language Processing

Simple Rules

Page 45: Natural Language Processing

Now Plug in the Words

Page 46: Natural Language Processing

Derivational Rules

If everything is an accept state, how do things ever get rejected?

Page 47: Natural Language Processing

Parsing/Generation vs. Recognition

• We can now run strings through these machines to recognize strings in the language

• But recognition is usually not quite what we need:
  – Often, if we find some string in the language, we might like to assign a structure to it (parsing)
  – Or we might have some structure and we want to produce a surface form for it (production/generation)

• Example
  – From “cats” to “cat +N +PL”

Page 48: Natural Language Processing

Finite State Transducers

• The simple story:
  – Add another tape
  – Add extra symbols to the transitions

• On one tape we read “cats”, on the other we write “cat +N +PL”

Page 49: Natural Language Processing

FSTs

Page 50: Natural Language Processing

Applications

• The kind of parsing we’re talking about is normally called morphological analysis

• It can either be:
  – An important stand-alone component of many applications (spelling correction, information retrieval)
  – Or simply a link in a chain of further linguistic analysis

Page 51: Natural Language Processing

Transitions

• c:c means read a c on one tape and write a c on the other
• +N:ε means read a +N symbol on one tape and write nothing on the other
• +PL:s means read +PL and write an s

c:c   a:a   t:t   +N:ε   +PL:s
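
A tiny runnable sketch of that machine (the encoding is mine): each transition is labeled input:output, reading the lexical tape and writing the surface tape. Running the same machine in the other direction would give parsing instead of generation:

```python
# Each transition: (state, in_symbol) -> (next_state, out_symbol).
# "" is epsilon: write nothing.
fst = {
    (0, "c"):   (1, "c"),
    (1, "a"):   (2, "a"),
    (2, "t"):   (3, "t"),
    (3, "+N"):  (4, ""),    # +N:eps
    (4, "+PL"): (5, "s"),   # +PL:s
}
ACCEPT = {5}

def transduce(symbols, fst, start=0, accepts=ACCEPT):
    """Map a lexical symbol sequence to a surface string (or None)."""
    state, out = start, []
    for sym in symbols:
        if (state, sym) not in fst:
            return None                  # no transition: reject
        state, o = fst[(state, sym)]
        out.append(o)
    return "".join(out) if state in accepts else None

print(transduce(["c", "a", "t", "+N", "+PL"], fst))  # -> "cats"
```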

Page 52: Natural Language Processing

Ambiguity

• Recall that in non-deterministic recognition multiple paths through a machine may lead to an accept state.
  – It didn’t matter which path was actually traversed

• In FSTs the path to an accept state does matter, since different paths represent different parses and different outputs will result

Page 53: Natural Language Processing

Ambiguity

• What’s the right parse (segmentation) for unionizable?
  – Union-ize-able
  – Un-ion-ize-able

• Each represents a valid path through the derivational morphology machine.

Page 54: Natural Language Processing

Ambiguity

• There are a number of ways to deal with this problem:
  – Simply take the first output found
  – Find all the possible outputs (all paths) and return them all (without choosing)
  – Bias the search so that only one or a few likely paths are explored

Page 55: Natural Language Processing

The Gory Details

• Of course, it’s not as easy as “cat +N +PL” ↔ “cats”

• As we saw earlier, there are geese, mice and oxen

• But there are also a whole host of spelling/pronunciation changes that go along with inflectional changes:
  – Cats vs. dogs
  – Fox and foxes

Page 56: Natural Language Processing

Multi-Tape Machines

• To deal with these complications, we will add more tapes and use the output of one tape machine as the input to the next

• So to handle irregular spelling changes we’ll add intermediate tapes with intermediate symbols

Page 57: Natural Language Processing

Multi-Level Tape Machines

• We use one machine to transduce between the lexical and the intermediate level, and another to handle the spelling changes to the surface tape

Page 58: Natural Language Processing

Overall Scheme

Page 59: Natural Language Processing

Cascades

• This is an architecture that we’ll see again and again:
  – Overall processing is divided up into distinct rewrite steps
  – The output of one layer serves as the input to the next
  – The intermediate tapes may or may not wind up being useful in their own right

Page 60: Natural Language Processing

Overall Plan

Page 61: Natural Language Processing

Final Scheme

Page 62: Natural Language Processing

Conclusion

• Finite state machines provide flexible and efficient models of words

• Finite state transducers are the method of choice for morphological analysis
  – If there is a solved problem in NLP, this is it!

• Why not use finite state techniques for all problems in NLP?

Page 63: Natural Language Processing

Lexical to Intermediate Level

Page 64: Natural Language Processing

Intermediate to Surface

• The “add an e” rule, as in fox^s# ↔ foxes#
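
A rough sketch of that rule as a string rewrite (a stand-in for the real two-level FST, simplified to the x, s, z contexts):

```python
import re

def e_insertion(intermediate):
    """Insert 'e' between [xsz]^ and s#, then drop the boundary symbols."""
    surface = re.sub(r"([xsz])\^(?=s#)", r"\1e", intermediate)
    return surface.replace("^", "").replace("#", "")

print(e_insertion("fox^s#"))  # -> "foxes"
print(e_insertion("cat^s#"))  # -> "cats"
```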

Page 65: Natural Language Processing

Foxes

Page 66: Natural Language Processing

Composition

1. Create a set of new states that correspond to each pair of states from the original machines (New states are called (x,y), where x is a state from M1, and y is a state from M2)

2. Create a new FST transition table for the new machine according to the following intuition …

Page 67: Natural Language Processing

Composition

• There should be a transition between two states in the new machine if the output of a transition from a state in M1 is the same as the input of a transition from M2, or …

Page 68: Natural Language Processing

Composition

• δ3((xa, ya), i:o) = (xb, yb) iff there exists a c such that
  δ1(xa, i:c) = xb and δ2(ya, c:o) = yb
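
A direct sketch of that definition (my encoding: deterministic machines, no epsilon handling). Each table maps (state, (input, output)) to a next state, and states are paired whenever M1’s output symbol matches M2’s input symbol:

```python
def compose(delta1, delta2):
    """Build delta3 per the definition: delta3[((xa, ya), (i, o))] = (xb, yb)
    whenever delta1[(xa, (i, c))] = xb and delta2[(ya, (c, o))] = yb."""
    delta3 = {}
    for (xa, (i, c)), xb in delta1.items():
        for (ya, (c2, o)), yb in delta2.items():
            if c == c2:        # M1's output symbol feeds M2's input symbol
                delta3[((xa, ya), (i, o))] = (xb, yb)
    return delta3
```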