XGTagger, a generic interface dealing with XML contents. September 19th, 2005. Xavier Tannier, Jean-Jacques Girardot, Mihaela Mathieu, Ecole des Mines de Saint-Etienne


Page 1: XGTagger, a generic interface  dealing with XML contents

XGTagger, a generic interface dealing with XML contents

September 19th, 2005

Xavier Tannier, Jean-Jacques Girardot, Mihaela Mathieu

Ecole des Mines de Saint-Etienne

Page 2: XGTagger, a generic interface  dealing with XML contents

Initial XML document:
<book> <title>Gone with the wind</title> <author>Margaret Mitchell</author></book>

Input (text only), extracted by XGTagger:
Gone with the wind . Margaret Mitchell

System S (black box; here, a POS tagger)

Output (text only):
Gone VPP with IN the DT wind NN . Margaret NN Mitchell NN

Final XML document, rebuilt by XGTagger
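The text-extraction step can be sketched as follows. This is only an illustration of the idea on the slide, not XGTagger's actual code: the function name `extract_text` and the `" . "` separator parameter are assumptions, every element is treated as a "hard" tag, and mixed content (text following a child element) is ignored.

```python
import xml.etree.ElementTree as ET


def extract_text(xml_source: str, separator: str = " . ") -> str:
    """Join the text of every element, one chunk per element.

    Assumption: each element boundary breaks the text, so chunks are
    separated by `separator` before being handed to the external system.
    """
    root = ET.fromstring(xml_source)
    chunks = []
    for element in root.iter():
        text = (element.text or "").strip()
        if text:  # skip elements that only contain whitespace
            chunks.append(text)
    return separator.join(chunks)


doc = ("<book> <title>Gone with the wind</title> "
       "<author>Margaret Mitchell</author></book>")
print(extract_text(doc))  # Gone with the wind . Margaret Mitchell
```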

Page 3: XGTagger, a generic interface  dealing with XML contents

Initial XML document:
<book> <title>Gone with the wind</title> <author>Margaret Mitchell</author></book>

Input (text only) → System S (black box) → Output (text only):
Gone VPP with IN the DT wind NN . Margaret NN Mitchell NN

Final XML document, rebuilt by XGTagger:
<book>
  <title> <w pos="VPP">Gone</w> <w pos="IN">with</w> <w pos="DT">the</w> <w pos="NN">wind</w> </title>
  <author> <w pos="PN">Margaret</w> <w pos="PN">Mitchell</w> </author>
</book>
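The reconstruction step can be sketched in the same spirit. The names `parse_tagged` and `rewrap` are hypothetical, the output format (alternating word and tag, with a lone `.` marking an element boundary) is read off the slide, and the sketch assumes the external system returns exactly the words it was given, in order.

```python
import xml.etree.ElementTree as ET


def parse_tagged(output: str, separator: str = "."):
    """Turn 'word TAG word TAG . word TAG' into (word, tag) pairs."""
    tokens = output.split()
    pairs = []
    i = 0
    while i < len(tokens):
        if tokens[i] == separator:  # element boundary, carries no tag
            i += 1
            continue
        pairs.append((tokens[i], tokens[i + 1]))
        i += 2
    return pairs


def rewrap(xml_source: str, tagged_output: str) -> str:
    """Replace each element's text by <w pos="..."> children."""
    root = ET.fromstring(xml_source)
    pairs = iter(parse_tagged(tagged_output))
    for element in list(root.iter()):  # snapshot: new <w> nodes are not visited
        words = (element.text or "").split()
        if not words:
            continue
        element.text = None
        for word in words:
            tagged_word, pos = next(pairs)
            if tagged_word != word:
                raise ValueError(f"expected {word!r}, got {tagged_word!r}")
            w = ET.SubElement(element, "w", {"pos": pos})
            w.text = word
    return ET.tostring(root, encoding="unicode")


doc = ("<book> <title>Gone with the wind</title> "
       "<author>Margaret Mitchell</author></book>")
tagged = "Gone VPP with IN the DT wind NN . Margaret NN Mitchell NN"
print(rewrap(doc, tagged))
```

The alignment check on each word is a deliberate choice: if the black box drops or splits a token, the sketch fails loudly instead of attaching tags to the wrong words.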

Page 4: XGTagger, a generic interface  dealing with XML contents

Tag classification [Colazzo et al, 2001]

- "hard" tags: break the linearity of the text. ex: titles, chapters, paragraphs

<tag>text A</tag><tag>text B</tag>

- "soft" tags: identify significant parts of the text, but remain "transparent" when reading it. ex: bold, italics, underlined

text A <bold>text B</bold> text C

- "jump" tags: particular elements, such as margin notes, citations, glosses.

text A<note>text B</note> text C

Page 5: XGTagger, a generic interface  dealing with XML contents

Soft tags, reading contexts and XGTagger

<par> United States <bold>elections</bold> are administered at the state and local level</par>

Reading context:
United States elections are administered at the state and local level
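A minimal sketch of how a soft tag stays transparent, assuming every descendant of the element is soft: the text is simply read through the markup. The function name `reading_context` is an illustrative choice, not part of XGTagger.

```python
import xml.etree.ElementTree as ET


def reading_context(xml_source: str) -> str:
    """Return the text of an element as a reader sees it.

    Assumption: all nested tags are soft, so their text is kept inline.
    """
    element = ET.fromstring(xml_source)
    raw = "".join(element.itertext())  # text and tails, in document order
    return " ".join(raw.split())       # normalise whitespace


par = ("<par> United States <bold>elections</bold> are administered "
       "at the state and local level</par>")
print(reading_context(par))
# United States elections are administered at the state and local level
```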

Page 6: XGTagger, a generic interface  dealing with XML contents

Jump tags, reading contexts and XGTagger

<paragraph> The 2004 United States<footnote>See an article p.163 about the United States of America.</footnote> elections caused less controversy than in 2000.</paragraph>

Reading contexts:
The 2004 United States elections caused less controversy than in 2000. See an article p.163 about the United States of America.

<paragraph>…………………………..<footnote>………………………………………………………………….</footnote>……………………………………………………………………</paragraph>
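The footnote example can be sketched by deferring the content of jump elements until the surrounding text has been read. The set `JUMP_TAGS` and the function `reading_contexts` are hypothetical names; any tag not listed as a jump tag is treated as soft here, which is a simplification.

```python
import xml.etree.ElementTree as ET

JUMP_TAGS = {"footnote"}  # assumption: supplied by the user per document type


def reading_contexts(element, jump_tags=JUMP_TAGS):
    """Return [main text, jump text 1, jump text 2, ...] for an element."""
    main = [element.text or ""]
    deferred = []
    for child in element:
        inner = reading_contexts(child, jump_tags)
        if child.tag in jump_tags:
            deferred.extend(inner)      # whole content leaves the main flow
        else:
            main.append(inner[0])       # soft tag: text stays inline
            deferred.extend(inner[1:])
        main.append(child.tail or "")   # text that follows the child
    return [" ".join("".join(main).split())] + deferred


paragraph = ET.fromstring(
    "<paragraph> The 2004 United States<footnote>See an article p.163 "
    "about the United States of America.</footnote> elections caused "
    "less controversy than in 2000.</paragraph>"
)
for context in reading_contexts(paragraph):
    print(context)
```

Keeping the contexts as separate list entries, rather than one concatenated string, makes it easier to send each one to the external system on its own.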

Page 7: XGTagger, a generic interface  dealing with XML contents

Example: Phrases

Initial XML document:
<book> <title>Advances in Information Retrieval </title></book>

Input (text only):
Advances in Information Retrieval

System S (parser)

Output (text only):
Advances NNS in IN Information///Retrieval NP

Final XML document:
<book> <title> <w id="1" pos="NNS">Advances</w> <w id="2" pos="IN">in</w> <w id="3" pos="NP">Information</w> <w id="3" pos="NP">Retrieval</w> </title> </book>
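One way to read the phrase example: words joined by `///` in the parser output form a single unit, so they share one `id` and one `pos`. The sketch below follows that reading; the function name `wrap_phrases` and the `joiner` parameter are assumptions, and the alternating "unit tag" format is taken from the slide.

```python
import xml.etree.ElementTree as ET


def wrap_phrases(output: str, joiner: str = "///"):
    """Build a <title> whose <w> children share an id within a phrase."""
    title = ET.Element("title")
    tokens = output.split()
    units = zip(tokens[0::2], tokens[1::2])  # (unit, tag) pairs
    for unit_id, (unit, pos) in enumerate(units, start=1):
        for word in unit.split(joiner):
            w = ET.SubElement(title, "w", {"id": str(unit_id), "pos": pos})
            w.text = word
    return title


title = wrap_phrases("Advances NNS in IN Information///Retrieval NP")
print(ET.tostring(title, encoding="unicode"))
```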

Page 8: XGTagger, a generic interface  dealing with XML contents

Example: Translation

Initial XML document:
<element> I had a conversation with my brother</element>

Input (text only):
I had a conversation with my brother

System S (translator)

Output (text only):
I had a conversation/entretien/Gespräch with my brother/frère/Bruder

Final XML document:
<element> <w>I</w> <w>had</w> <w>a</w> <w french="entretien" german="Gespräch">conversation</w> <w>with</w> <w>my</w> <w french="frère" german="Bruder">brother</w> </element>
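The translation example suggests that extra fields attached to a word become attributes of its `<w>` element. A minimal sketch under that assumption follows; the function name `wrap_translations`, the `/` field separator, and the way attribute names are passed in are illustrative choices, not XGTagger's documented options.

```python
import xml.etree.ElementTree as ET


def wrap_translations(output: str, fields=("french", "german"), sep="/"):
    """Wrap each word in <w>, mapping extra fields to attributes in order."""
    element = ET.Element("element")
    for token in output.split():
        word, *extras = token.split(sep)
        # zip stops at the shorter sequence, so plain words get no attributes
        w = ET.SubElement(element, "w", dict(zip(fields, extras)))
        w.text = word
    return element


element = wrap_translations(
    "I had a conversation/entretien/Gespräch with my brother/frère/Bruder"
)
print(ET.tostring(element, encoding="unicode"))
```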