OntoNotes: A Unified Relational Semantic Representation

Sameer Pradhan, Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel

http://www.bbn.com/ontonotes


Outline

Multiple layers of annotation and modeling capture useful elements of text meaning at 90% ITA
  – Syntax
  – Proposition
  – Word sense
  – Ontology
  – Coreference
  – Names

An integrated relational database representation
  – Enforces consistency across the different annotations
  – Supports integrated models that can combine evidence from different layers

Some practical issues
  – Sensitivity to changes in layers
  – Adding a new layer to the data

A few lessons learned


Problems with Multiple Layers of Annotation

Not previously available
  – A number of these layers have not been available in significant quantity before:
      • Word Sense
      • Coreference

Not previously integrated
  – Each layer encoded separately as individual files, requiring supporting documentation for interpretation

Not previously completely consistent
  – Mismatches between Treebank and PropBank

Not previously user friendly
  – Raw text format


Unified Representation

Provide a bare-bones representation, independent of the individual layers' semantics, that can
  – Efficiently capture intra- and inter-layer semantics
  – Maintain component independence (facilitate collaboration)
  – Provide a mechanism for flexible integration (for an application)
  – Integrate information at the required level of granularity
  – Store data as close as possible to an application backend
  – Adapt in the face of incremental representational changes
  – Offer an extremely accessible API (you don't need to be a hacker to use it)
  – Easily perform cross-layer queries
  – Be easily extensible
  – Maintain version information, ideally at different possible levels
  – …

Relational Database + Object Oriented API


Relational Representation

[Diagram: Corpus, Trees, Coreference, Names, Propositions, Senses]


Example: Database Representation of Syntax

• Treebank tokens (stored in the Token table) provide the common base
• The Tree table stores the recursive tree nodes, each with its span
• Subsidiary tables define the sets of function tags, phrase types, etc.
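The table layout described above can be illustrated with a small in-memory SQLite sketch. The column names (`token_index`, `start_token`, `end_token`, `parent_id`), the `phrase_type` table, and the toy tree are all assumptions made for this example. They are not the actual OntoNotes schema, and the toy tree is not the real Treebank parse.

```python
import sqlite3

# Hypothetical, simplified schema: tokens are the common base, tree nodes
# carry a token span, and a subsidiary table lists the legal phrase types.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE token (
        id          INTEGER PRIMARY KEY,
        document_id TEXT NOT NULL,
        token_index INTEGER NOT NULL,
        word        TEXT NOT NULL
    );
    CREATE TABLE phrase_type (id TEXT PRIMARY KEY);
    CREATE TABLE tree (
        id          INTEGER PRIMARY KEY,
        document_id TEXT NOT NULL,
        parent_id   INTEGER REFERENCES tree(id),   -- recursive structure
        phrase_type TEXT NOT NULL REFERENCES phrase_type(id),
        start_token INTEGER NOT NULL,              -- inclusive span
        end_token   INTEGER NOT NULL
    );
""")

words = ["realignments", "of", "troops", "in", "central", "Europe"]
conn.executemany(
    "INSERT INTO token (document_id, token_index, word) VALUES ('doc0', ?, ?)",
    list(enumerate(words)),
)
conn.executemany("INSERT INTO phrase_type VALUES (?)", [("NP",), ("PP",)])
conn.executemany(
    "INSERT INTO tree VALUES (?, 'doc0', ?, ?, ?, ?)",
    [
        (1, None, "NP", 0, 5),  # the whole fragment
        (2, 1, "PP", 1, 2),     # "of troops"
        (3, 1, "PP", 3, 5),     # "in central Europe"
        (4, 3, "NP", 4, 5),     # "central Europe"
    ],
)


def node_text(conn, node_id):
    """Return the words covered by a tree node by joining on its token span."""
    rows = conn.execute(
        """
        SELECT token.word
        FROM tree
        JOIN token
          ON token.document_id = tree.document_id
         AND token.token_index BETWEEN tree.start_token AND tree.end_token
        WHERE tree.id = ?
        ORDER BY token.token_index
        """,
        (node_id,),
    ).fetchall()
    return " ".join(word for (word,) in rows)


print(node_text(conn, 3))
```

Because every layer can point at a tree node or a token span, the same join recovers the text behind any annotation.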


Object Oriented API


Using the API: Importing the modules


Using the API: Creating Skeleton Objects


Using the API: Creating Full-fledged Objects (I)


Using the API: Creating Full-fledged Objects (II)


Using the API: Writing to the database


Using the API: Reading from the Database


Data Loading Life-cycle

Database


OntoNotes Data: Current and Future

OntoNotes 1.0

         NW     BN     BC
  Eng    300    -      -
  Chi    250    -      -
  Ara    -      -      -

OntoNotes 2.0

         NW     BN     BC
  Eng    300    200    -
  Chi    250    300    -
  Ara    100    -      -

OntoNotes 3.0

         NW     BN     BC
  Eng    500    200    200
  Chi    250    300    150
  Ara    200    -      -


Advantages of an Integrated Representation

Clean, consistent layers
  – Resolve the inconsistencies and problems that this reveals

Well defined relationships
  – Database schema defines the merged structure efficiently

Extract individual views
  – Treebank, PropBank, etc.

SQL queries can extract examples based on multiple layers or define new views

Python Object-oriented API allows for programmatic access to tables and queries
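As a rough illustration of defining a new view over several layers, the sketch below joins proposition, argument, and tree tables in an in-memory SQLite database. The view name `argument_phrase`, the `proposition` and `tree` tables, and all column names other than those appearing in the query slide (`proposition_id`, `argument_id`, `node_id`, `type`) are assumptions for this example, not the released schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables for two layers (propositions, syntax).
    CREATE TABLE proposition   (id TEXT PRIMARY KEY, lemma TEXT);
    CREATE TABLE argument      (id TEXT PRIMARY KEY, proposition_id TEXT, type TEXT);
    CREATE TABLE argument_node (argument_id TEXT, node_id TEXT);
    CREATE TABLE tree          (id TEXT PRIMARY KEY, phrase_type TEXT);

    -- A new view that pairs each argument with the phrase type of its node.
    CREATE VIEW argument_phrase AS
        SELECT p.lemma, a.type AS argument_type, t.phrase_type
        FROM proposition p
        JOIN argument a       ON a.proposition_id = p.id
        JOIN argument_node an ON an.argument_id = a.id
        JOIN tree t           ON t.id = an.node_id;
""")

conn.executemany("INSERT INTO proposition VALUES (?, ?)",
                 [("p1", "reduce"), ("p2", "say")])
conn.executemany("INSERT INTO argument VALUES (?, ?, ?)",
                 [("a1", "p1", "ARG1"), ("a2", "p1", "ARGM-LOC"), ("a3", "p2", "ARG0")])
conn.executemany("INSERT INTO argument_node VALUES (?, ?)",
                 [("a1", "n1"), ("a2", "n2"), ("a3", "n3")])
conn.executemany("INSERT INTO tree VALUES (?, ?)",
                 [("n1", "PP"), ("n2", "PP"), ("n3", "NP")])

# Once defined, the view is queried like any other table.
reduce_args = conn.execute(
    "SELECT argument_type, phrase_type FROM argument_phrase "
    "WHERE lemma = ? ORDER BY argument_type",
    ("reduce",),
).fetchall()
print(reduce_args)
```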


Example of Database Query Function

What is the distribution of named entities that are ARG0s of the predicate "say"?

    from collections import defaultdict

    # Assumed to exist already: a_proposition_bank, a_tree_document, and
    # a_cursor (a database cursor whose rows can be indexed by column name).
    ne_hash = defaultdict(int)  # named-entity type -> frequency

    for a_proposition in a_proposition_bank:
        # Keep only propositions whose predicate is "say".
        if a_proposition.lemma == "say":
            arg_in_p_q = "select * from argument where proposition_id = '%s';" % (a_proposition.id)
            a_cursor.execute(arg_in_p_q)
            argument_rows = a_cursor.fetchall()

            for a_argument_row in argument_rows:
                a_argument_id = a_argument_row["id"]
                a_argument_type = a_argument_row["type"]

                # Keep only the ARG0 arguments.
                if a_argument_type == "ARG0":
                    n_in_arg_q = "select * from argument_node where argument_id = '%s';" % (a_argument_id)
                    a_cursor.execute(n_in_arg_q)
                    argument_node_rows = a_cursor.fetchall()

                    for a_argument_node_row in argument_node_rows:
                        a_node_id = a_argument_node_row["node_id"]

                        # Named entities that span exactly the argument node.
                        a_ne_node_query = "select * from name_entity where subtree_id = '%s';" % (a_node_id)
                        a_cursor.execute(a_ne_node_query)
                        ne_rows = a_cursor.fetchall()

                        for a_ne_row in ne_rows:
                            a_ne_type = a_ne_row["type"]
                            ne_hash[a_ne_type] = ne_hash[a_ne_type] + 1

                        # a_tree_id is not defined on the slide; it should
                        # identify the tree that contains a_node_id.
                        a_tree = a_tree_document.get_tree(a_tree_id)
                        a_node = a_tree.get_subtree(a_node_id)

                        # Named entities nested inside the argument node.
                        for a_child in a_node.subtrees():
                            a_ne_subtree_query = "select * from name_entity where subtree_id = '%s';" % (a_child.id)
                            a_cursor.execute(a_ne_subtree_query)
                            ne_subtree_rows = a_cursor.fetchall()

                            for a_ne_subtree_row in ne_subtree_rows:
                                a_subtree_ne_type = a_ne_subtree_row["type"]
                                ne_hash[a_subtree_ne_type] = ne_hash[a_subtree_ne_type] + 1

  Name Entity     Frequency
  Person          84
  GPE             34
  Organization    29
  NORP            15
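The nested loops on this slide can also be collapsed into a single SQL join with GROUP BY. The sketch below is a simplified illustration run against an in-memory SQLite database with made-up rows. It counts only entities attached directly to the argument node (it does not walk nested subtrees), and the `proposition` table with its columns is an assumption, since the slide obtains propositions through the object API instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table and column names follow the slide where it shows them;
    -- the proposition table is hypothetical.
    CREATE TABLE proposition   (id TEXT PRIMARY KEY, lemma TEXT);
    CREATE TABLE argument      (id TEXT PRIMARY KEY, proposition_id TEXT, type TEXT);
    CREATE TABLE argument_node (argument_id TEXT, node_id TEXT);
    CREATE TABLE name_entity   (id TEXT PRIMARY KEY, subtree_id TEXT, type TEXT);
""")

# Toy rows, invented for the example.
conn.executemany("INSERT INTO proposition VALUES (?, ?)",
                 [("p1", "say"), ("p2", "say"), ("p3", "reduce"), ("p4", "say")])
conn.executemany("INSERT INTO argument VALUES (?, ?, ?)",
                 [("a1", "p1", "ARG0"), ("a2", "p1", "ARG1"), ("a3", "p2", "ARG0"),
                  ("a4", "p3", "ARG0"), ("a5", "p4", "ARG0")])
conn.executemany("INSERT INTO argument_node VALUES (?, ?)",
                 [("a1", "n1"), ("a2", "n2"), ("a3", "n3"), ("a4", "n4"), ("a5", "n5")])
conn.executemany("INSERT INTO name_entity VALUES (?, ?, ?)",
                 [("e1", "n1", "PERSON"), ("e2", "n2", "GPE"), ("e3", "n3", "PERSON"),
                  ("e4", "n4", "ORG"), ("e5", "n5", "ORG")])


def entity_distribution(conn, lemma, argument_type):
    """Count named-entity types on nodes filling the given argument of a predicate."""
    return conn.execute(
        """
        SELECT ne.type, COUNT(*) AS frequency
        FROM proposition p
        JOIN argument a       ON a.proposition_id = p.id
        JOIN argument_node an ON an.argument_id = a.id
        JOIN name_entity ne   ON ne.subtree_id = an.node_id
        WHERE p.lemma = ? AND a.type = ?
        GROUP BY ne.type
        ORDER BY frequency DESC, ne.type
        """,
        (lemma, argument_type),
    ).fetchall()


print(entity_distribution(conn, "say", "ARG0"))
```

Passing the lemma and argument type as bound parameters avoids building the query with string formatting.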


Reconciling Treebank and PropBank

We found several mismatches between syntax and propositions
  – Sometimes PropBank was right
  – Sometimes Treebank was right

Guidelines modified to bring the two in line

Now each argument points to a single node in the tree
  – Secondary connections are made using Treebank trace chains
  – Almost no discontinuous arguments
  – Non-trace connections are explicitly identified

This greater consistency will make it easier to train models that predict argument structure
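One way to check the "single node" property is to test whether an argument's token span coincides with the span of some constituent. The sketch below assumes a tree encoded as nested tuples and end-exclusive token offsets. Both are choices made for this illustration, the toy tree is not the actual Treebank parse, and traces or empty nodes are not handled.

```python
def collect_spans(node, start, spans):
    """Record the (start, end) token span of every constituent; return the end offset."""
    if isinstance(node, str):  # a leaf token occupies one position
        return start + 1
    label, *children = node
    end = start
    for child in children:
        end = collect_spans(child, end, spans)
    spans.setdefault((start, end), []).append(label)
    return end


def nodes_for_span(tree, arg_start, arg_end):
    """Return the labels of nodes covering exactly [arg_start, arg_end), or None."""
    spans = {}
    collect_spans(tree, 0, spans)
    return spans.get((arg_start, arg_end))


# Toy tree for "reductions of troops in central Europe".
tree = (
    "NP",
    ("NP", ("NNS", "reductions")),
    ("PP", ("IN", "of"), ("NP", ("NNS", "troops"))),
    ("PP", ("IN", "in"), ("NP", ("JJ", "central"), ("NNP", "Europe"))),
)

print(nodes_for_span(tree, 3, 6))  # "in central Europe" is a constituent
print(nodes_for_span(tree, 1, 6))  # "of troops in central Europe" is not
```

An argument whose span returns `None` would need either a guideline change or an explicit non-trace connection.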


Sensitivity to Changes – PropBank changes

... major reductions and realignments of troops in central Europe – ...

[Figure: parse tree of this fragment (node labels S, NP, PP, JJ, NNS, CC, IN, NNP) with the PropBank labels ARG1, ARG2, and ARGM-LOC attached to tree nodes]


Sensitivity to Changes – Treebank changes

... major reductions and realignments of troops in central Europe – ...

[Figure: parse tree of this fragment (node labels S, NP, PP, JJ, NNS, CC, IN, NNP)]

• If a node was deleted, remove the associated annotation
• If any node has a change in its children or parent node, update the associated annotation

Print the new PropBank
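The two rules above can be sketched as a small reconciliation pass. The data structures here (node ids that stay stable across Treebank versions, a dictionary mapping each node id to its parent and children, and annotations keyed by node id) are assumptions for the example and not the project's actual update code.

```python
def reconcile(old_nodes, new_nodes, annotations):
    """Apply the two update rules to annotations that point at tree nodes.

    old_nodes / new_nodes: node_id -> (parent_id, tuple_of_child_ids)
    annotations:           node_id -> label
    Returns (kept, removed, to_update).
    """
    kept, removed, to_update = {}, [], []
    for node_id, label in annotations.items():
        if node_id not in new_nodes:
            # Rule 1: the node was deleted, so drop its annotation.
            removed.append(node_id)
            continue
        kept[node_id] = label
        if old_nodes.get(node_id) != new_nodes[node_id]:
            # Rule 2: parent or children changed, so flag it for updating.
            to_update.append(node_id)
    return kept, removed, to_update


old_nodes = {"n1": (None, ("n2", "n3")), "n2": ("n1", ()), "n3": ("n1", ())}
new_nodes = {"n1": (None, ("n2", "n4")), "n2": ("n1", ()), "n4": ("n1", ())}
annotations = {"n1": "ARG1", "n2": "ARG2", "n3": "ARGM-LOC"}

kept, removed, to_update = reconcile(old_nodes, new_nodes, annotations)
print(kept, removed, to_update)
```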


Adding a new layer

1. What information do you want to capture?

2. Define relationship with the required layer

3. Design tables

4. Superimpose on existing machinery with respect to the anchor

5. Create a class in the corpora package

   a. Define a few specific functions
      • Create object from original annotation (Text Reader)
      • Write object to database (DB Writer)
      • Create object from database (DB Reader)
      • Write database to original format (Text Writer)
      • Pretty print function (Pretty Printer)

   b. Write at least one alignment function at the level where the enrichment is required, or even multiple levels
      • Enrich Treebank/Document/…
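A minimal sketch of what such a class might look like follows. The `SpanLabel` layer, its one-line text format, the `span_label` table, and the dictionary used as a stand-in for a document are all hypothetical. The real corpora package defines its own base classes and method names, which are not shown in these slides.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class SpanLabel:
    """Hypothetical new layer: a label attached to an inclusive token span."""
    document_id: str
    start_token: int
    end_token: int
    label: str

    @classmethod
    def from_text(cls, line):
        """Text Reader: parse 'document_id start end label'."""
        document_id, start, end, label = line.split()
        return cls(document_id, int(start), int(end), label)

    def write_to_db(self, conn):
        """DB Writer: store this object in the span_label table."""
        conn.execute(
            "CREATE TABLE IF NOT EXISTS span_label "
            "(document_id TEXT, start_token INTEGER, end_token INTEGER, label TEXT)"
        )
        conn.execute(
            "INSERT INTO span_label VALUES (?, ?, ?, ?)",
            (self.document_id, self.start_token, self.end_token, self.label),
        )

    @classmethod
    def from_db(cls, conn, document_id):
        """DB Reader: rebuild all objects for one document."""
        rows = conn.execute(
            "SELECT document_id, start_token, end_token, label FROM span_label "
            "WHERE document_id = ? ORDER BY start_token",
            (document_id,),
        )
        return [cls(*row) for row in rows]

    def to_text(self):
        """Text Writer: emit the original one-line format."""
        return f"{self.document_id} {self.start_token} {self.end_token} {self.label}"

    def pretty(self, tokens):
        """Pretty Printer: show the label next to the words it covers."""
        return f"[{self.label}] " + " ".join(tokens[self.start_token:self.end_token + 1])

    def enrich(self, document):
        """Alignment function: attach this annotation to a document (a dict here)."""
        document.setdefault("span_labels", []).append(self)


conn = sqlite3.connect(":memory:")
item = SpanLabel.from_text("doc0 3 5 LOC")
item.write_to_db(conn)
loaded = SpanLabel.from_db(conn, "doc0")

document = {"id": "doc0",
            "tokens": ["realignments", "of", "troops", "in", "central", "Europe"]}
loaded[0].enrich(document)
print(loaded[0].pretty(document["tokens"]))
```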


A Few Errors Found

• Missing co-indices in Trees (found during loading)
• Invalid sense numbers (while checking against repository)
• Multiple sense definitions (in the repository)
• Validation errors in schemas
• Dead pointers in ontology
• Multiple coreference chain memberships
• Missing/Invalid predicate/argument pointers
• Invalid PB/TB merges
• Filename/Content mismatches
• Pinyin/Unicode inconsistencies
• Varying sentence breaks
• SLINK Errors
• Inconsistent TB Empty specifications in the merge process
• Typos (found through Type Tables)
• ... and a few annotation errors
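Two of the checks listed above lend themselves to short illustrations: catching typos by comparing values against a type table, and detecting mentions that belong to more than one coreference chain. The function names and sample rows below are invented for this sketch and do not reflect the actual loading code.

```python
from collections import defaultdict


def find_unknown_types(rows, type_table):
    """Return (row_id, value) pairs whose value is missing from the type table."""
    return [(row_id, value) for row_id, value in rows if value not in type_table]


def multiple_chain_members(chain_members):
    """Return mentions that appear in more than one chain, with their chain ids."""
    chains_by_mention = defaultdict(set)
    for chain_id, mention_id in chain_members:
        chains_by_mention[mention_id].add(chain_id)
    return {mention: sorted(chains)
            for mention, chains in chains_by_mention.items()
            if len(chains) > 1}


# Hypothetical type table and annotation rows; "AGR1" is a deliberate typo.
ARGUMENT_TYPES = {"ARG0", "ARG1", "ARG2", "ARG3", "ARG4", "ARGM-LOC"}
argument_rows = [("a1", "ARG0"), ("a2", "ARGM-LOC"), ("a3", "AGR1")]
typos = find_unknown_types(argument_rows, ARGUMENT_TYPES)

# Hypothetical chain memberships; mention "m2" is in two chains.
memberships = [("c1", "m1"), ("c1", "m2"), ("c2", "m2"), ("c2", "m3")]
conflicts = multiple_chain_members(memberships)

print(typos, conflicts)
```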


Some Interesting Problems Addressed

Word sense annotation transferred from old Treebank to new Treebank

Coreference annotation transferred to new Treebank

Treebank/PropBank with or without NMLs reside in harmony

Various levels of data quality identified in the database

Varying styles of marking traces normalized

Language specific idiosyncrasies in inventories and frames normalized

Data generated for annotation
  – Eventive nouns
  – Coreference


A Few Lessons Learned

Each layer should
  – abide by a minimum dependency principle
  – adhere to a well defined schema

Try to maintain consistency across representation of similar components

Use a centralized, version controlled repository

Need for single-point, push-button loading philosophy


Conclusion

Many annotation layers are available, integrated using a relational schema

An extensible, relational/object oriented architecture is available to the community

Easily accessible
  – Through the Python API
  – Through SQL queries

OntoNotes Release 2.0 available from LDC

unencumbered, open source!!


Backup


Syntax Layer

Identifies meaningful phrases in the text

Lays out the structure of how they are related

Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon .

... major reductions and realignments of troops in central Europe – ...

[Figure: syntax tree of this fragment (node labels S, NP, PP, JJ, NNS, CC, IN, NNP)]


Propositional Structure

Tells who did what to whom

For both verbs and nouns

Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon .

... major reductions and realignments of troops in central Europe – ...

[Figure: parse tree of this fragment (node labels S, NP, PP, JJ, NNS, CC, IN, NNP) with the PropBank labels ARG1, ARG2, and ARGM-LOC attached to tree nodes]


Predicate Frames

Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon .

Predicate frames define the meanings of the numbered arguments

reduction: reduce.01 – Make less

  Argument   Meaning           In the example
  ARG0       Agent             -
  ARG1       Thing falling     the troops
  ARG2       Amount fallen     major
  ARG3       Starting point    -
  ARG4       Ending point      -
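To make the role of a predicate frame concrete, the sketch below stores the reduce.01 frame from this slide in a plain dictionary and uses it to spell out what each numbered argument means. The dictionary layout and the `describe` helper are assumptions for illustration. They are not the format in which the frame files are distributed.

```python
# Frame content copied from the slide; the data layout is hypothetical.
FRAMES = {
    "reduce.01": {
        "gloss": "Make less",
        "roles": {
            "ARG0": "Agent",
            "ARG1": "Thing falling",
            "ARG2": "Amount fallen",
            "ARG3": "Starting point",
            "ARG4": "Ending point",
        },
    }
}

# Nominal predicates map onto the frame of the related verb.
NOUN_TO_FRAME = {"reduction": "reduce.01"}


def describe(frame_id, labeled_args):
    """Expand numbered argument labels into their frame-specific meanings."""
    roles = FRAMES[frame_id]["roles"]
    return [f"{label} ({roles[label]}): {text}"
            for label, text in sorted(labeled_args.items())]


frame_id = NOUN_TO_FRAME["reduction"]
lines = describe(frame_id, {"ARG2": "major", "ARG1": "the troops"})
print("\n".join(lines))
```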


Word Sense and Ontology

The meanings of nouns and verbs are specified using a catalog of possible senses

All the senses are annotatable at 90% ITA

Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon .

Word sense: aim
  1. Point or direct object, weapon, at something ...
  2. Wish, purpose or intend to achieve something (highlighted for this sentence)

Word sense: register
  1. Enter into an official record (highlighted for this sentence)
  2. Be aware of, enter into someone's consciousness
  3. Indicate a measurement
  4. Show in one's face

Ontology links (currently being added) capture similarities between related senses of different words


Coreference

Identifies different mentions of the same entity within a document – especially links definite, referring noun phrases, and pronouns to their antecedents

Two types tagged – Identity and Attributive

Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon .

[Figure: mentions grouped into the entities e0, e1, and e2. Mentions shown: "President Bush", "He", "conventional arms talk", "Vienna talks – which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe", "Pentagon", and "the Pentagon"]
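Identity coreference can be viewed as grouping mentions into chains from pairwise links. The sketch below does this with a small union-find structure. The mention strings come from this slide, but the pairing of mentions in `links` is assumed for illustration, and attributive links are not modeled.

```python
def build_chains(links):
    """Group mentions into coreference chains from pairwise identity links."""
    parent = {}

    def find(mention):
        parent.setdefault(mention, mention)
        while parent[mention] != mention:
            parent[mention] = parent[parent[mention]]  # path compression
            mention = parent[mention]
        return mention

    for first, second in links:
        root_first, root_second = find(first), find(second)
        if root_first != root_second:
            parent[root_second] = root_first  # merge the two chains

    chains = {}
    for mention in parent:
        chains.setdefault(find(mention), []).append(mention)
    return sorted(sorted(chain) for chain in chains.values())


# Assumed pairings of the mentions shown on the slide.
links = [
    ("President Bush", "He"),
    ("Pentagon", "the Pentagon"),
    ("conventional arms talk", "Vienna talks"),
]
chains = build_chains(links)
for index, chain in enumerate(chains):
    print(f"chain {index}: {chain}")
```

Because links are merged transitively, a mention linked to any member of a chain joins the whole chain.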