Type Systems, Interoperability and Database Population Eric Nyberg, CMU Shilpa Arora, CMU Lance Ramshaw, BBN


Page 1: Type Systems, Interoperability and Database Population

Type Systems, Interoperability and Database Population

Eric Nyberg, CMU
Shilpa Arora, CMU
Lance Ramshaw, BBN

Page 2: Type Systems, Interoperability and Database Population

Outline

• Annotation sample analysis
  – emergent type systems
  – ongoing issues / clarification questions
• Data interoperability
• Database population
  – CMU’s Annotations DB
  – OntoNotes
  – Possible architecture for interoperability with UIMA annotators
  – Issues for Discussion

Page 3: Type Systems, Interoperability and Database Population

Task

• Analyze sample outputs from different annotation groups
• Formalize annotation type system (UML object model) for each sample
• Generate clarification questions
• Work toward a unified type system
• Work toward interoperability architecture

In progress, not finished

Not started

Page 4: Type Systems, Interoperability and Database Population

For each annotation sample:

• Overview of what we received

• Brief example annotation

• Type system analysis

• Issues / Questions

Page 5: Type Systems, Interoperability and Database Population

What's in the bin?

# Annotation Manual Samples Analysis Type System

1.1 CMU Belief Annotations x x x x

1.2 CMU Event Coreference Annotations x

2.1 Ed Hovy's Group - Noun Sense Annotation x x x

3.1 BBN Temporal Ordering Annotation x x x x

3.2 BBN Name Annotations x x x

3.3 BBN Coreference Annotation x x x

3.4 BBN (Complex) Coreference Annotation x x x x

4.1 UMBC Modality Annotation x x x x

5.1 Columbia Dialog Annotation x

Page 6: Type Systems, Interoperability and Database Population

CMU/Columbia Belief Annotation

• Annotation Manual:
  – Davis et al., “Annotating belief in Communication: Manual”
• Annotation Units: Propositions identified by PropBank and NomBank

Page 7: Type Systems, Interoperability and Database Population

CMU/Columbia Belief Annotation

• Three categories:
  – Committed belief: Belief expressed in utterance
    • Can be a proposition about present or future
    • E.g. (1) I know Mark and Sandra have eloped. (2) The sun will rise again. (Future)
  – Non-committed belief: Not a strong belief
    • Can be a proposition about present or future
    • E.g. (1) Mark and Sandra may have eloped. (2) John may return tomorrow.
  – Not applicable: Not a belief
    • E.g. (1) I wish Mark and Sandra would finally elope.

Page 8: Type Systems, Interoperability and Database Population

CMU/Columbia Belief Annotation

• Five Classes:
  – Committed Belief
  – Committed Belief Future
  – Non-Committed Belief
  – Non-Committed Belief Future
  – Not Applicable
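One way to read the five classes is as the three categories, with a present/future split on the two belief categories. A minimal Python sketch of that reading follows; the names `BeliefClass` and `classify` and the category strings are illustrative assumptions, not taken from the annotation manual.

```python
from enum import Enum


class BeliefClass(Enum):
    """The five classes listed on the slide (identifier names are illustrative)."""
    COMMITTED_BELIEF = "Committed Belief"
    COMMITTED_BELIEF_FUTURE = "Committed Belief Future"
    NON_COMMITTED_BELIEF = "Non-Committed Belief"
    NON_COMMITTED_BELIEF_FUTURE = "Non-Committed Belief Future"
    NOT_APPLICABLE = "Not Applicable"


def classify(category: str, future: bool = False) -> BeliefClass:
    """Map a category plus a future flag to one of the five classes.

    The category strings ("committed", "non-committed", "not applicable")
    are hypothetical labels chosen for this sketch.
    """
    if category == "committed":
        return (BeliefClass.COMMITTED_BELIEF_FUTURE if future
                else BeliefClass.COMMITTED_BELIEF)
    if category == "non-committed":
        return (BeliefClass.NON_COMMITTED_BELIEF_FUTURE if future
                else BeliefClass.NON_COMMITTED_BELIEF)
    if category == "not applicable":
        return BeliefClass.NOT_APPLICABLE
    raise ValueError(f"unknown category: {category!r}")


# "The sun will rise again." is the slides' example of a committed belief about the future.
sun_example = classify("committed", future=True)
```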

Page 9: Type Systems, Interoperability and Database Population

CMU/Columbia Belief Annotation: Type System (1)

Page 10: Type Systems, Interoperability and Database Population

CMU/Columbia Belief Annotation: Type System (2)

Page 11: Type Systems, Interoperability and Database Population

CMU/Columbia Belief Annotation: Type System (3)


Page 12: Type Systems, Interoperability and Database Population

Follow up questions

• Extensions:
  – What extensions do we expect to the annotation scheme?
  – How can we best tailor the type system toward expected future changes?
• Requirements from application domain?
  – Do we have a set of requirements from the application side?

Page 13: Type Systems, Interoperability and Database Population

Ed Hovy’s group

• Annotations:
  – Annotated with OntoNotes for noun senses
  – 205 nouns, one file for each noun; the sense and location in files are stored for each noun
• Sample annotations:
  – eng/AFGP-2002-600175-Trans.txt 427 4 [email protected] 3 Mon Dec 3 02:31:27 2007
  – eng/AFGP-2002-602187-Trans.txt 25 6 [email protected] 2 Mon Dec 3 02:31:27 2007
  – Noun="position", sense=3; file=AFGP-2002-600175-Trans.txt, position=“427 4”
  – Noun="position", sense=3; file=AFGP-2002-602187-Trans.txt, position=“25 6”
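The mapping from a raw sample line to the (noun, sense, file, position) reading can be sketched in Python as below. The field order is an assumption read off the first sample (path, two position integers, one annotator field, then the sense); the annotator field is obscured in the sample, so the placeholder `ANNOTATOR` stands in for it, and `NounSenseAnnotation` and `parse_sample_line` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NounSenseAnnotation:
    noun: str
    sense: int
    file: str
    # The two integers from the sample line; their meaning is not stated on the slide.
    position: Tuple[int, int]


def parse_sample_line(line: str, noun: str) -> NounSenseAnnotation:
    """Parse one whitespace-separated sample line (assumed field order)."""
    fields = line.split()
    path = fields[0]
    position = (int(fields[1]), int(fields[2]))
    sense = int(fields[4])  # assumed: the integer right after the annotator field
    return NounSenseAnnotation(noun=noun, sense=sense,
                               file=path.split("/")[-1], position=position)


record = parse_sample_line(
    "eng/AFGP-2002-600175-Trans.txt 427 4 ANNOTATOR 3 Mon Dec 3 02:31:27 2007",
    noun="position",
)
```

The noun is passed in separately because, per the slide, each noun has its own file.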

Page 14: Type Systems, Interoperability and Database Population

Type System (Ed Hovy et al. Annotation)

Page 15: Type Systems, Interoperability and Database Population

BBN

1. BBN TTO-3 Temporal Ordering Annotation

2. BBN Name Annotations: named entities (org, date, per, etc.)

3. BBN-Coref-Annotation: entity (with type), entity mentions, etc.

4. BBN-complex-coref-annotation

Page 16: Type Systems, Interoperability and Database Population

Temporal Relationship Assignment

(text)     ID  TT   TP  TR
11/28       1  DS    2  A
Arrived     2  EP    0  B
yesterday   3  DS    2  C
told        4  SP    2  B
Visiting    5  EUN   4  A
left        6  EP    4  A
Return      7  EF    2  A
Monday      8  DS    7  C
is          9  BC    0  C
Return     10  EF    9  A
day        11  DU   10  C
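The rows above can be loaded into a small Python structure as sketched below. The slide does not expand the column names; this sketch assumes that TP holds the ID of another row (with 0 meaning no such row), which is consistent with the values shown but is not confirmed by the source. `TemporalRow` and `parent_of` are hypothetical names.

```python
from typing import NamedTuple, Optional


class TemporalRow(NamedTuple):
    text: str
    id: int
    tt: str
    tp: int
    tr: str


ROWS = [
    TemporalRow("11/28", 1, "DS", 2, "A"),
    TemporalRow("Arrived", 2, "EP", 0, "B"),
    TemporalRow("yesterday", 3, "DS", 2, "C"),
    TemporalRow("told", 4, "SP", 2, "B"),
    TemporalRow("Visiting", 5, "EUN", 4, "A"),
    TemporalRow("left", 6, "EP", 4, "A"),
    TemporalRow("Return", 7, "EF", 2, "A"),
    TemporalRow("Monday", 8, "DS", 7, "C"),
    TemporalRow("is", 9, "BC", 0, "C"),
    TemporalRow("Return", 10, "EF", 9, "A"),
    TemporalRow("day", 11, "DU", 10, "C"),
]

BY_ID = {row.id: row for row in ROWS}


def parent_of(row: TemporalRow) -> Optional[TemporalRow]:
    """Follow TP as a pointer to another row's ID (assumption); 0 yields None."""
    return BY_ID.get(row.tp)
```

Under that assumption, `parent_of(BY_ID[5])` resolves "Visiting" to the row for "told".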

Page 17: Type Systems, Interoperability and Database Population

Type System (BBN Temporal Ordering Annotation)


Page 18: Type Systems, Interoperability and Database Population

BBN Name Annotations (Type system)


Page 19: Type Systems, Interoperability and Database Population

BBN-complex-coref-annotation

Annotations:
• Relations between entities
  – Member
  – Member Base
  – Subset
  – Subset Size (future type system)
• Other annotations: attributes of a mention
  – Reference type
  – Syntactic Context

Page 20: Type Systems, Interoperability and Database Population


Type System for BBN (Complex) coreference annotation

Page 21: Type Systems, Interoperability and Database Population


Type System for BBN (Complex) coreference annotation (contd…)

Page 22: Type Systems, Interoperability and Database Population

UMBC Modality Annotations

• TMR – Text Meaning Representation or Concepts annotated

• Main Annotation – Modality. Its main attributes are TYPE, VALUE, SCOPE and ATTRIBUTED-TO

• TMRs can be nested, i.e. attributes or relations can refer to other TMRs
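A minimal Python sketch of how a nested modality annotation might be modeled is given below. Only the four attribute names come from the slide; the class names, the concept names (`ELOPE`, `HUMAN`), the modality type string and the numeric value are hypothetical placeholders, not UMBC's actual inventory.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class TMR:
    """A text meaning representation; attribute values may be other TMRs."""
    concept: str
    attributes: Dict[str, Union[str, "TMR"]] = field(default_factory=dict)


@dataclass
class Modality:
    """Modality annotation with the four attributes named on the slide."""
    type: str
    value: float
    scope: TMR                      # the TMR the modality applies to
    attributed_to: Optional[TMR] = None


# Hypothetical example: a modality scoped over an event TMR that itself
# refers to another TMR through its AGENT attribute (nesting).
speaker = TMR("HUMAN")
event = TMR("ELOPE", {"AGENT": TMR("HUMAN")})
modality = Modality(type="EPISTEMIC", value=0.5, scope=event, attributed_to=speaker)
```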

Page 23: Type Systems, Interoperability and Database Population


UMBC Modality Annotations

Page 24: Type Systems, Interoperability and Database Population

Interoperability: Data

• Common data model
• Multiple implementations
  – based on the same underlying schema (formal object model)
  – meet different goals / requirements
• Implementation Criteria:
  – Support effective run-time annotation (e.g. UIMA type system)
  – Support effective user interface, query/update (e.g. OntoNotes)
  – Support on-the-fly schema extension (e.g. CMU’s AnnotationsDB)

Page 25: Type Systems, Interoperability and Database Population

Interoperability: Data [2]

• Formal object model is mapped to:
  – UIMA type system definition (create)
  – OntoNotes RDBMS schema (extend)
  – CMU’s Annotations DB (extend)

• Annotated data can be represented in any format that implements the formal model

• “Have your cake and eat it too”
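The idea that one formal model can sit behind several formats can be sketched in Python as follows. This is an illustration only: `EntityAnnotation` and its three fields are assumptions for the sketch, and the row and XML shapes are not the actual UIMA, OntoNotes or Annotations DB formats.

```python
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EntityAnnotation:
    """A tiny stand-in for one type in a shared formal object model."""
    type: str
    offset: int
    length: int


def to_row(annotation: EntityAnnotation) -> dict:
    """Relational view: one column per field."""
    return asdict(annotation)


def to_xml(annotation: EntityAnnotation) -> str:
    """XML view: one attribute per field."""
    element = ET.Element("entity", {k: str(v) for k, v in asdict(annotation).items()})
    return ET.tostring(element, encoding="unicode")


def from_xml(text: str) -> EntityAnnotation:
    """Rebuild the model object from its XML view."""
    element = ET.fromstring(text)
    return EntityAnnotation(element.get("type"),
                            int(element.get("offset")),
                            int(element.get("length")))


original = EntityAnnotation(type="org", offset=21, length=12)
round_tripped = from_xml(to_xml(original))
```

Because both views are derived from the same model object, data written in one format can be read back and re-emitted in the other without loss.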

Page 26: Type Systems, Interoperability and Database Population

CMU’s Annotations Database

• MySQL implementation

• Java APIs (SQL connection API and simple object access API)

• Fully integrated with UIMA

• Used on DTO and DARPA projects

• PRO: tag types can be extended at run time by the application (schema supports open-ended type definition)

• CON: interactive tools are currently limited

Page 27: Type Systems, Interoperability and Database Population

JAVELIN Project Briefing

AQUAINT Program

Annotations Database

In an interview with Defense News, Indian Defence Research and Development Organization (DRDO) scientists said India was launching a comprehensive plan to develop a wide range of modern nuclear missiles. Within two years, India would develop an intercontinental ballistic missile (ICBM), ...

<entity type=org offset=21 length=12 />
<entity type=org offset=35 length=59 />
<entity type=gpe offset=111 length=5 source=bbn ref=#INDIA />
<entity type=gpe offset=223 length=5 source=bbn ref=#INDIA />
<entity type=fac offset=231 length=41 />

[Schema diagram: four linked tables, with * marking the "many" side of each link]

document: datetime, docno, doctype
passage: text
tag: type, value, parent
span: offset, length
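A minimal sketch of the four tables, using Python's built-in `sqlite3` in place of the MySQL implementation the slides describe. The table and column names follow the diagram; the `id` columns and the foreign keys linking passage to document, tag to passage, and span to tag are assumptions, since the diagram's link details did not survive extraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (id INTEGER PRIMARY KEY, datetime TEXT, docno TEXT, doctype TEXT);
CREATE TABLE passage  (id INTEGER PRIMARY KEY,
                       document_id INTEGER REFERENCES document(id), text TEXT);
CREATE TABLE tag      (id INTEGER PRIMARY KEY,
                       passage_id INTEGER REFERENCES passage(id),
                       type TEXT, value TEXT, parent INTEGER REFERENCES tag(id));
CREATE TABLE span     (id INTEGER PRIMARY KEY,
                       tag_id INTEGER REFERENCES tag(id),
                       "offset" INTEGER, length INTEGER);
""")

text = "In an interview with Defense News, Indian Defence Research and Development Organization (DRDO) scientists said"
conn.execute("INSERT INTO document (id, docno) VALUES (1, 'DOC-1')")  # docno is a placeholder
conn.execute("INSERT INTO passage (id, document_id, text) VALUES (1, 1, ?)", (text,))
conn.execute("INSERT INTO tag (id, passage_id, type) VALUES (1, 1, 'org')")
conn.execute('INSERT INTO span (tag_id, "offset", length) VALUES (1, 21, 12)')

# Recover the annotated text: offsets are 0-based, SQL substr() is 1-based.
row = conn.execute("""
    SELECT tag.type, substr(passage.text, span."offset" + 1, span.length)
    FROM span
    JOIN tag ON tag.id = span.tag_id
    JOIN passage ON passage.id = tag.passage_id
""").fetchone()
```

Keeping the tag type as a plain value in a column, not as a table of its own, is what lets an application introduce new tag types at run time without a schema change.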

Page 28: Type Systems, Interoperability and Database Population


An Integrated Annotation DB in OntoNotes

Sameer Pradhan, Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel

http://www.bbn.com/NLP/OntoNotes

Page 29: Type Systems, Interoperability and Database Population

Goals

Capture multiple layers of annotation and modeling
  – Syntax
  – Propositions
  – Word sense
  – Ontology
  – Coreference
  – Names

Using an integrated relational database representation
  – Enforces consistency across the different annotations
  – Supports integrated models that can combine evidence from different layers

Page 30: Type Systems, Interoperability and Database Population

Unified Representation

Provide a bare-bones representation independent of the individual semantics that can
  – Efficiently capture intra- and inter-layer semantics
  – Maintain component independence
  – Provide mechanism for flexible integration
  – Integrate information at the lowest level of granularity

A Relational Database

Page 31: Type Systems, Interoperability and Database Population

Unified Relational Representation

[Diagram of the unified relational representation, with components: Corpus, Trees, Coreference, Names, Propositions, Senses]

Page 32: Type Systems, Interoperability and Database Population

Example: DB Representation of Syntax

• Treebank tokens (stored in the Token table) provide the common base
• The Tree table stores the recursive tree nodes, each with its span
• Subsidiary tables define the sets of function tags, phrase types, etc.
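A minimal sketch of a token table and a recursive tree table, using Python's built-in `sqlite3`. The column names (`parent_id`, `phrase_type`, `start_token`, `end_token`) and the four-word example are assumptions for illustration, not the actual OntoNotes schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE token (id INTEGER PRIMARY KEY, word TEXT);
CREATE TABLE tree  (id INTEGER PRIMARY KEY,
                    parent_id INTEGER REFERENCES tree(id),  -- NULL for the root
                    phrase_type TEXT,
                    start_token INTEGER, end_token INTEGER  -- inclusive token span
);
""")

words = ["troops", "in", "central", "Europe"]
conn.executemany("INSERT INTO token (id, word) VALUES (?, ?)", list(enumerate(words)))

# (NP (NP troops) (PP in (NP central Europe))), one row per phrase node.
conn.executemany(
    "INSERT INTO tree (id, parent_id, phrase_type, start_token, end_token) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, None, "NP", 0, 3), (2, 1, "NP", 0, 0), (3, 1, "PP", 1, 3), (4, 3, "NP", 2, 3)],
)


def span_text(node_id: int) -> str:
    """Return the words covered by a tree node, via its token span."""
    start, end = conn.execute(
        "SELECT start_token, end_token FROM tree WHERE id = ?", (node_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT word FROM token WHERE id BETWEEN ? AND ? ORDER BY id", (start, end)
    )
    return " ".join(word for (word,) in rows)
```

Because every node points at token positions, other layers can attach to the same tokens or nodes instead of keeping their own copy of the text.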

Page 33: Type Systems, Interoperability and Database Population

Advantages of an Integrated Representation

Each layer translates into a common representation

Clean, consistent layers
  – Resolve the inconsistencies and problems that this reveals

Well defined relationships
  – Database schema defines the merged structure efficiently

Original representations available as predefined views
  – Treebank, PropBank, etc.

SQL queries can extract examples based on multiple layers or define new views

Python Object-oriented API allows for programmatic access to tables and queries

Page 34: Type Systems, Interoperability and Database Population


Syntax Layer

Identifies meaningful phrases in the text

Lays out the structure of how they are related

Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon .

[Figure: syntax (parse) tree for the phrase "major reductions and realignments of troops in central Europe" within the sentence, with node labels S, NP, PP, JJ, NNS, CC, IN, NNP]

Page 35: Type Systems, Interoperability and Database Population

Propositional Structure

Tells who did what to whom

For both verbs and nouns

Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon .

[Figure: the parse tree for "... major reductions and realignments of troops in central Europe ..." with constituents labeled ARG1, ARG2 and ARGM-LOC]

Page 36: Type Systems, Interoperability and Database Population

Predicate Frames

Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon .

aim
  aim.01 – Plan
    ARG0 – Aimer
    ARG1 – Action
  aim.02 – Directed motion
    ARG0 – Aimer
    ARG1 – Thing in motion
    ARG2 – Target

reduction
  reduce.01 – Make less
    ARG0 – Agent
    ARG1 – Thing falling
    ARG2 – Amount fallen
    ARG3 – Starting point
    ARG4 – Ending point

Frames shown on the sentence: aim.02 – Directed motion; reduce.01 – Make less

Predicate frames define the meanings of the numbered arguments
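The frames on this slide can be held in a small lookup table that resolves a numbered argument to its frame-specific meaning. The Python sketch below uses only the frame IDs, glosses and role descriptions listed on the slide; the dictionary layout and the name `describe` are illustrative assumptions.

```python
# Frame ID -> gloss and per-argument role descriptions, as listed on the slide.
FRAMES = {
    "aim.01": {
        "gloss": "Plan",
        "roles": {"ARG0": "Aimer", "ARG1": "Action"},
    },
    "aim.02": {
        "gloss": "Directed motion",
        "roles": {"ARG0": "Aimer", "ARG1": "Thing in motion", "ARG2": "Target"},
    },
    "reduce.01": {
        "gloss": "Make less",
        "roles": {
            "ARG0": "Agent",
            "ARG1": "Thing falling",
            "ARG2": "Amount fallen",
            "ARG3": "Starting point",
            "ARG4": "Ending point",
        },
    },
}


def describe(frame_id: str, label: str) -> str:
    """Return the meaning of a numbered argument within a given frame."""
    return FRAMES[frame_id]["roles"][label]


# The same label means different things in different frames.
aim_arg1 = describe("aim.02", "ARG1")
reduce_arg1 = describe("reduce.01", "ARG1")
```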

Page 37: Type Systems, Interoperability and Database Population

Word Sense and Ontology

Meanings of nouns and verbs are specified

All the senses are annotatable at 90% inter-annotator agreement

Catalog of possible meanings supplied in the sense inventory files

Ontology links (currently being added) will capture similarities between related senses of different words

Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon .

Word Sense: aim
  1. Point or direct object, weapon, at something ...
  2. Wish, purpose or intend to achieve something

Word Sense: register
  1. Enter into an official record
  2. Be aware of, enter into someone’s consciousness
  3. Indicate a measurement
  4. Show in one’s face

Senses shown on the sentence: aim, sense 2 (Wish, purpose or intend to achieve something); register, sense 1 (Enter into an official record)

Page 38: Type Systems, Interoperability and Database Population

Coreference

Identifies different mentions of the same entity in text – especially links definite, referring noun phrases, and pronouns in text

Two types are tagged: identity and attributive coreference.

Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon .

[Figure: mentions highlighted on the slide: President Bush, He, conventional arms talk, Vienna talks, the Pentagon, Pentagon; entity label e0]

Page 39: Type Systems, Interoperability and Database Population

Example of DB Query Function

What is the distribution of named entities that are ARG0s of the predicate “say”?

    from collections import defaultdict

    # Assumed to exist already: a_proposition_bank, a_tree_document, a_tree_id,
    # and a_cursor (a DB cursor whose rows can be indexed by column name).
    ne_hash = defaultdict(int)  # named-entity type -> frequency

    for a_proposition in a_proposition_bank:
        if a_proposition.lemma == "say":
            arg_in_p_query = "select * from argument where proposition_id = '%s';" % (a_proposition.id)
            a_cursor.execute(arg_in_p_query)
            argument_rows = a_cursor.fetchall()

            for a_argument_row in argument_rows:
                a_argument_id = a_argument_row["id"]
                a_argument_type = a_argument_row["type"]

                if a_argument_type == "ARG0":
                    n_in_arg_q = "select * from argument_node where argument_id = '%s';" % (a_argument_id)
                    a_cursor.execute(n_in_arg_q)
                    argument_node_rows = a_cursor.fetchall()

                    for a_argument_node_row in argument_node_rows:
                        a_node_id = a_argument_node_row["node_id"]

                        # Named entities attached to the argument node itself
                        a_ne_node_query = "select * from name_entity where subtree_id = '%s';" % (a_node_id)
                        a_cursor.execute(a_ne_node_query)
                        ne_rows = a_cursor.fetchall()

                        for a_ne_row in ne_rows:
                            ne_hash[a_ne_row["type"]] += 1

                        # Named entities attached to subtrees of the argument node
                        a_tree = a_tree_document.get_tree(a_tree_id)
                        a_node = a_tree.get_subtree(a_node_id)

                        for a_child in a_node.subtrees():
                            a_ne_subtree_query = "select * from name_entity where subtree_id = '%s';" % (a_child.id)
                            a_cursor.execute(a_ne_subtree_query)
                            ne_subtree_rows = a_cursor.fetchall()

                            for a_ne_subtree_row in ne_subtree_rows:
                                ne_hash[a_ne_subtree_row["type"]] += 1

Name Entity     Frequency
Person          84
GPE             34
Organization    29
NORP            15
...             ...

Page 40: Type Systems, Interoperability and Database Population

Conclusion

Integrating the annotation layers using a relational schema
  – Improves consistency
  – Allows predictive features that combine evidence from multiple layers

Easily accessible
  – Through Python API
  – SQL queries

Page 41: Type Systems, Interoperability and Database Population

Interoperability: Components

[Architecture diagram. Components: OntoNotes Collection Reader, OntoNotes CAS Consumer, ADB Collection Reader, ADB CAS Consumer, File System Collection Reader, XCAS Collection Reader, XCAS CAS Consumer, UIMA Analysis Engine, Customer’s annotators. Data stores: OntoNotes, AnnotationsDB, XML, TXT. Key: existing UIMA wrapper, new UIMA wrapper, RDBMS storage, file storage.]

A shared, formal type system allows multiple data formats to be combined effectively

Page 42: Type Systems, Interoperability and Database Population

Issues for Discussion

• Persistence formats optimize for different concerns
  – RDBMS: relational querying, update
  – XCAS: fast deserialization of run-time objects

• Consider extending schema to hold XML serialization of document annotations
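One possible shape for such an extension is a text column that stores each document's annotations as serialized XML alongside the relational tables. The Python sketch below uses the built-in `sqlite3` and `xml.etree.ElementTree`; the table name `document_xml`, its columns, the document number and the XML layout are all hypothetical, and the XML here is not the actual XCAS format.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Hypothetical extension table: one serialized annotation document per docno.
conn.execute("CREATE TABLE document_xml (docno TEXT PRIMARY KEY, annotations_xml TEXT)")

# Serialize a document's annotations to XML and store the string.
root = ET.Element("annotations")
ET.SubElement(root, "entity", {"type": "org", "offset": "21", "length": "12"})
conn.execute(
    "INSERT INTO document_xml (docno, annotations_xml) VALUES (?, ?)",
    ("DOC-1", ET.tostring(root, encoding="unicode")),
)

# Load the whole document's annotations back with a single lookup,
# without joining the per-annotation tables.
(xml_text,) = conn.execute(
    "SELECT annotations_xml FROM document_xml WHERE docno = ?", ("DOC-1",)
).fetchone()
entities = ET.fromstring(xml_text).findall("entity")
```

The trade-off is duplication: the relational rows stay queryable and updatable, while the stored XML gives a single-read path for rebuilding run-time objects, so the two copies must be kept in sync.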