ISO 16642 - a tutorial Part 2: Representing data categories TMF - Terminological Markup Framework Laurent Romary - Laboratoire Loria

Page 1: ISO 16642 - a tutorial Part 2:  Representing data categories

ISO 16642 - a tutorial
Part 2: Representing data categories

TMF - Terminological Markup Framework

Laurent Romary - Laboratoire Loria

Page 2: ISO 16642 - a tutorial Part 2:  Representing data categories

Why formalize DatCats?

Systematizing data category description:
– Notion of Data Category Registry (DCR)
  • I need a data category: is it there?
– Query by name, definition etc.

Automating processes:
– Format control of TMLs
– Filters from one TML to GMT

Page 3: ISO 16642 - a tutorial Part 2:  Representing data categories

Which model for DatCats?

Using XML:
– Coherence with TMF principles
– Using stylesheets to generate schemas and filters

Using RDF (Resource Description Framework):
– Intended format for representing meta-data
  • The description of a DatCat is meta-data with regard to TMF

Page 4: ISO 16642 - a tutorial Part 2:  Representing data categories

RDF - a quick presentation

Cf. other file

Page 5: ISO 16642 - a tutorial Part 2:  Representing data categories

Data Categories

A Formal Description

Page 6: ISO 16642 - a tutorial Part 2:  Representing data categories

Data Category Registry

[Diagram: RDF graph of the registry. Nodes: DCRegistry, Description, Data Category, VersionNumber. Properties: dcsd:DataCategory (with rdf:about), dcsd:VersionNumber.]

Page 7: ISO 16642 - a tutorial Part 2:  Representing data categories

Data Category description

[Diagram: RDF graph of one Data Category. Properties and the nodes they point to: dcsd:DCIdentifier (DCIdentifier), dcsd:DCName (DCName), dcsd:DCDefinition (DCDefinition), dcsd:DCComment (DCComment), dcsd:DCExample (DCExample), dcsd:DCParent (DCParent), dcsd:DCType (DCType: S, C), dcsd:Level (Locus), dcsd:Content (Content), dcsd:DCAdmin (DCAdmin).]

Salt 2000-11-08/SEW

Page 8: ISO 16642 - a tutorial Part 2:  Representing data categories

Simple and complex DatCats

Complex data categories
– shall serve as field identifiers (not names) in databases and can have content. The datatype for this content shall be declared for each data category and can commonly take the form of different categories of text, defined data types (such as dates), and specified data domains, e.g., picklists comprising standardized permissible instances.
  » Example: /Part of Speech/

Simple data categories
– shall serve as the content of complex data categories.
  » Example: /Noun/, /Verb/, /Adjective/ etc.
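The distinction above can be sketched as a small data model. This is only an illustration, not part of ISO 16642 or ISO 12620: the class names `SimpleDatCat` and `ComplexDatCat`, the `accepts` method and the `data_type` strings are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SimpleDatCat:
    """A simple data category: a permissible value such as /Noun/ (hypothetical class)."""
    name: str


@dataclass
class ComplexDatCat:
    """A complex data category: a field identifier with declared content (hypothetical class)."""
    name: str
    data_type: str  # assumed labels, e.g. "plainText" or "picklist"
    picklist: list = field(default_factory=list)  # SimpleDatCat items, used for picklists

    def accepts(self, value) -> bool:
        """Return True if `value` is permissible content for this category."""
        if self.data_type == "picklist":
            return value in self.picklist
        # Other data types are not validated in this sketch; any string passes.
        return isinstance(value, str)


noun, verb, adjective = (SimpleDatCat(n) for n in ("Noun", "Verb", "Adjective"))
part_of_speech = ComplexDatCat("Part of Speech", "picklist", [noun, verb, adjective])
```

Here `part_of_speech.accepts(noun)` holds, while a simple data category outside the picklist is rejected.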

Page 9: ISO 16642 - a tutorial Part 2:  Representing data categories

Levels and content

[Diagram: RDF graph of Content and Level/Loci. Content has dcsd:DataType (DataType) and dcsd:TargetType (TargetType). Lists of references to other datcats are given as rdf:Alt containers of rdf:li items, both under Content and under Level/Loci.]

Page 10: ISO 16642 - a tutorial Part 2:  Representing data categories

Administrative properties

[Diagram: RDF graph of the administrative properties. dcsd:DCAdmin links a Data Category to its DCAdmin node. Properties shown: dcsd:Status (Status), dcsd:StatusDate (StatusDate), dcsd:StatusNote (StatusNote), dcsd:EditionDate (EditionDate), dcsd:Source (Source), dcsd:VariantNames (VariantNames), with dcsd:ShortForm (ShortForm), dcsd:AdmittedName (AdmittedName) and dcsd:ForbiddenName (ForbiddenName).]

Page 11: ISO 16642 - a tutorial Part 2:  Representing data categories

RDF Representation

Page 12: ISO 16642 - a tutorial Part 2:  Representing data categories

/term/ - RDF description (1)

<dcsd:DataCategory dcsd:DCIdentifier="ISO12620A01"
    dcsd:DCName="term"
    dcsd:position="A.01"
    dcsd:DCType="C">
  <dcsd:DCDefinition>A verbal designation of a general concept in a specific subject field</dcsd:DCDefinition>
  <dcsd:DCComment>
    <dcsd:sourceComment>For definition of related term, see ISO 1087-1, 3.4.3.</dcsd:sourceComment>
    <dcsd:conceptComment>Terms can consist of single words or be composed of multiword strings…</dcsd:conceptComment>
    <dcsd:Example>"radix" in annex C, figure C.1.</dcsd:Example>
    <dcsd:DictionnaryID>A.1</dcsd:DictionnaryID>
  </dcsd:DCComment>

Page 13: ISO 16642 - a tutorial Part 2:  Representing data categories

/term/ - RDF description (2)

  <dcsd:Content dcsd:DataType="plainText"/>
  <dcsd:Level>
    <rdf:Alt>
      <rdf:li>TL</rdf:li>
      <rdf:li>TC</rdf:li>
    </rdf:Alt>
  </dcsd:Level>
  <dcsd:DCAdmin dcsd:OrgSource="ISO TC 37"
      dcsd:DocSource="ISO12620:1999"
      dcsd:subDate="2000-10-20 SEW"
      dcsd:registryComment="Prepared 2000-10-20"
      dcsd:Status="Accepted"/>
</dcsd:DataCategory>
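As a rough illustration of how such a description could be read by a program, the sketch below parses a shortened copy of the /term/ description with Python's standard `xml.etree.ElementTree`. The `dcsd` namespace URI is a placeholder invented for the example, since the slides do not give the real one; the `rdf` URI is the standard W3C one. The helper names `dcsd` and `summarize` are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace URI: the slides never state the real dcsd one.
DCSD_NS = "http://example.org/dcsd#"

# Shortened copy of the /term/ description, with namespace declarations added.
DESCRIPTION = f"""
<dcsd:DataCategory xmlns:dcsd="{DCSD_NS}" xmlns:rdf="{RDF_NS}"
    dcsd:DCIdentifier="ISO12620A01" dcsd:DCName="term" dcsd:DCType="C">
  <dcsd:Content dcsd:DataType="plainText"/>
  <dcsd:Level>
    <rdf:Alt><rdf:li>TL</rdf:li><rdf:li>TC</rdf:li></rdf:Alt>
  </dcsd:Level>
  <dcsd:DCAdmin dcsd:Status="Accepted"/>
</dcsd:DataCategory>
""".strip()


def dcsd(local):
    """Build the namespace-qualified name ElementTree uses for dcsd:local."""
    return f"{{{DCSD_NS}}}{local}"


def summarize(xml_text):
    """Extract the main properties of one data category description."""
    root = ET.fromstring(xml_text)
    level = root.find(dcsd("Level"))
    return {
        "identifier": root.get(dcsd("DCIdentifier")),
        "name": root.get(dcsd("DCName")),
        "type": root.get(dcsd("DCType")),
        "data_type": root.find(dcsd("Content")).get(dcsd("DataType")),
        # The rdf:Alt container lists the levels where the category may occur.
        "levels": [li.text for li in level.iter(f"{{{RDF_NS}}}li")],
        "status": root.find(dcsd("DCAdmin")).get(dcsd("Status")),
    }


summary = summarize(DESCRIPTION)
```

The resulting `summary` dictionary is the kind of record a format checker or filter generator could work from.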

Page 14: ISO 16642 - a tutorial Part 2:  Representing data categories

/term type/ - RDF description (1)

<dcsd:DataCategory dcsd:DCIdentifier="ISO12620A0201"
    dcsd:DCName="term type"
    dcsd:position="A.02.01"
    dcsd:DCType="C">
  <dcsd:DCDefinition>An attribute assigned to a term</dcsd:DCDefinition>
  <dcsd:DCComment>
    <dcsd:DictionnaryID>A.2.1</dcsd:DictionnaryID>
  </dcsd:DCComment>
  <dcsd:Content dcsd:DataType="picklist">
    <rdf:Alt>
      <rdf:li>ISO12620A020101</rdf:li>
      <rdf:li>ISO12620A020102</rdf:li>
      <rdf:li>ISO12620A020119</rdf:li>
    </rdf:Alt>
  </dcsd:Content>

Page 15: ISO 16642 - a tutorial Part 2:  Representing data categories

/term type/ - RDF description (2)

  <dcsd:Level>
    <rdf:Alt>
      <rdf:li>TL</rdf:li>
      <rdf:li>TC</rdf:li>
    </rdf:Alt>
  </dcsd:Level>
  <dcsd:DCAdmin dcsd:OrgSource="ISO TC 37"
      dcsd:DocSource="ISO12620:1999"
      dcsd:subDate="2000-10-20 SEW"
      dcsd:registryComment="Prepared 2000-10-20"
      dcsd:Status="Accepted"/>
</dcsd:DataCategory>
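With descriptions like the two above in hand, the registry question "I need a data category: is it there?" becomes a simple lookup. The sketch below is a minimal in-memory stand-in for a Data Category Registry, holding only the two entries shown in these slides; the list-of-dicts layout and the function names `find_by_name` and `search_definitions` are assumptions for illustration, not the interface of any actual DCR tool.

```python
# A toy registry holding the two data categories described in the slides.
REGISTRY = [
    {
        "identifier": "ISO12620A01",
        "name": "term",
        "type": "C",
        "definition": "A verbal designation of a general concept "
                      "in a specific subject field",
    },
    {
        "identifier": "ISO12620A0201",
        "name": "term type",
        "type": "C",
        "definition": "An attribute assigned to a term",
    },
]


def find_by_name(registry, name):
    """Return the entries whose name matches `name`, ignoring case."""
    wanted = name.strip().lower()
    return [entry for entry in registry if entry["name"].lower() == wanted]


def search_definitions(registry, keyword):
    """Return the identifiers of entries whose definition mentions `keyword`."""
    needle = keyword.lower()
    return [
        entry["identifier"]
        for entry in registry
        if needle in entry["definition"].lower()
    ]
```

For instance, `find_by_name(REGISTRY, "Term")` finds the /term/ entry, and an empty result signals that a new data category would have to be proposed.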

Page 16: ISO 16642 - a tutorial Part 2:  Representing data categories

Actualizing a DatCat

TMF specific properties

Page 17: ISO 16642 - a tutorial Part 2:  Representing data categories

Styling properties

[Diagram: RDF graph of the styling properties. dcsd:Style links a Data Category to a Style node. Properties shown: dcsd:StyleName (StyleName, one of Simple, Element, Attribute, TypedElement, ValuedElement, TVElement), dcsd:ElementName (ElementName), dcsd:AttributeName (AttributeName), dcsd:TypeValue (TypeValue), dcsd:Value (Value, for 'Simple'), dcsd:Anchor (AnchorInfo, AnchorLevel).]

Page 18: ISO 16642 - a tutorial Part 2:  Representing data categories

Attribute style description

• dcsd:StyleName="Attribute"
– Conditions of use:
  • Not valid for annotations
– Required properties:
  • dcsd:AttributeName
– Example:
  • dcsd:AttributeName="id"
  • <anchorElement id="xx54893">…</anchorElement>

Page 19: ISO 16642 - a tutorial Part 2:  Representing data categories

Element style description

• dcsd:StyleName="Element"
– Required properties:
  • dcsd:ElementName
– Example:
  • dcsd:ElementName="definition"
  • <definition>…</definition>

Page 20: ISO 16642 - a tutorial Part 2:  Representing data categories

TypedElement style description

• dcsd:StyleName="TypedElement"
– Required properties:
  • dcsd:ElementName, dcsd:TypeValue
– Example:
  • dcsd:ElementName="termNote"
  • dcsd:TypeValue="partOfSpeech"
  • <termNote type="partOfSpeech">N</termNote>

Page 21: ISO 16642 - a tutorial Part 2:  Representing data categories

ValuedElement style description

• dcsd:StyleName="ValuedElement"
– Conditions of use:
  • Not valid for annotations
– Required properties:
  • dcsd:ElementName
– Example:
  • dcsd:ElementName="pos"
  • <pos value="noun"/>

Page 22: ISO 16642 - a tutorial Part 2:  Representing data categories

TVElement style description

• dcsd:StyleName="TVElement"
– Conditions of use:
  • Not valid for annotations
– Required properties:
  • dcsd:ElementName, dcsd:TypeValue
– Example:
  • dcsd:ElementName="free"
  • dcsd:TypeValue="pos"
  • <free type="pos" value="noun"/>

Page 23: ISO 16642 - a tutorial Part 2:  Representing data categories

Simple style description

• dcsd:StyleName="Simple"
– Conditions of use:
  • Express the value of simple data categories
– Required properties:
  • dcsd:Value
– Example:
  • dcsd:Value="Nom"
  • <pos>Nom</pos>
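The six styles described on the preceding slides can be compared side by side in a short sketch. It assumes a style description is available as a plain dictionary keyed by the property names without the `dcsd:` prefix; the functions `actualize` and `simple_value` are hypothetical names, and the real TMF tooling generates XSLT filters and schemas rather than running code like this.

```python
import xml.etree.ElementTree as ET

# Styles that create a child element under the anchor.
ELEMENT_STYLES = {"Element", "TypedElement", "ValuedElement", "TVElement"}


def simple_value(style):
    """Return the string that a 'Simple' style assigns to a simple data category."""
    if style["StyleName"] != "Simple":
        raise ValueError("expected a Simple style")
    return style["Value"]


def actualize(anchor, style, value):
    """Attach `value` to `anchor` following a style description given as a dict."""
    name = style["StyleName"]
    if name == "Attribute":
        # The value becomes an attribute of the anchor element itself.
        anchor.set(style["AttributeName"], value)
        return anchor
    if name not in ELEMENT_STYLES:
        raise ValueError(f"unsupported style: {name}")
    child = ET.SubElement(anchor, style["ElementName"])
    if name in ("TypedElement", "TVElement"):
        child.set("type", style["TypeValue"])
    if name in ("ValuedElement", "TVElement"):
        child.set("value", value)   # value carried by an attribute
    else:
        child.text = value          # value carried as element content
    return child


# Reproduce the slide examples under one hypothetical anchor element.
entry = ET.Element("anchorElement")
actualize(entry, {"StyleName": "Attribute", "AttributeName": "id"}, "xx54893")
actualize(entry, {"StyleName": "TypedElement", "ElementName": "termNote",
                  "TypeValue": "partOfSpeech"}, "N")
actualize(entry, {"StyleName": "TVElement", "ElementName": "free",
                  "TypeValue": "pos"}, "noun")
actualize(entry, {"StyleName": "Element", "ElementName": "pos"},
          simple_value({"StyleName": "Simple", "Value": "Nom"}))
```

The last call combines two styles: the Simple style supplies the string "Nom", and the Element style decides where that string is placed.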

Page 24: ISO 16642 - a tutorial Part 2:  Representing data categories

Dealing with languages

Page 25: ISO 16642 - a tutorial Part 2:  Representing data categories

Two types of languages

Working language
• The language used at a given place in a document, along the XML hierarchy
• Representation: xml:lang

Object language
• The language about which you speak at a given place in your terminological entry (e.g. describes the Language Section level)
• Representation: as a data category "language", with a narrow scope

Page 26: ISO 16642 - a tutorial Part 2:  Representing data categories

Example — DXLT

<langSet lang='en' xml:lang="fr">
  <descrip type="definition">Une valeur entre 0 et 1 utilisée...</descrip>
  <tig>
    <term xml:lang="en">alpha smoothing factor</term>
    <termNote type="termType">fullForm</termNote>
  </tig>
</langSet>

Page 27: ISO 16642 - a tutorial Part 2:  Representing data categories

Example — GMT

<struct type="LS" xml:lang="fr">
  <feat type="language">en</feat>
  <feat type="definition">Une valeur entre 0 et 1 utilisée...</feat>
  <struct type="TL">
    <feat type="term" xml:lang="en">alpha smoothing factor</feat>
    <feat type="termType">fullForm</feat>
  </struct>
</struct>
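The correspondence between the DXLT and GMT examples can be expressed as a small filter. The sketch below is an assumption-laden illustration: the mapping table `STRUCTS` and the function `to_gmt` are inferred only from the two examples in these slides, and the filters mentioned in this tutorial are XSLT stylesheets generated from a TML specification, not Python code.

```python
import xml.etree.ElementTree as ET

# ElementTree's qualified name for the predefined xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Structural DXLT elements and the GMT struct type they map to
# (assumed from the two examples only).
STRUCTS = {"langSet": "LS", "tig": "TL"}

DXLT = """<langSet lang="en" xml:lang="fr">
  <descrip type="definition">Une valeur entre 0 et 1 utilisée...</descrip>
  <tig>
    <term xml:lang="en">alpha smoothing factor</term>
    <termNote type="termType">fullForm</termNote>
  </tig>
</langSet>"""


def to_gmt(node):
    """Convert one DXLT element (and its children) to GMT struct/feat elements."""
    if node.tag in STRUCTS:
        out = ET.Element("struct", {"type": STRUCTS[node.tag]})
        if "lang" in node.attrib:
            # Object language: becomes a "language" data category.
            feat = ET.SubElement(out, "feat", {"type": "language"})
            feat.text = node.get("lang")
        for child in node:
            out.append(to_gmt(child))
    else:
        # Typed elements keep their type; others use the element name.
        out = ET.Element("feat", {"type": node.get("type", node.tag)})
        out.text = node.text
    if XML_LANG in node.attrib:
        # Working language: xml:lang is carried over unchanged.
        out.set(XML_LANG, node.get(XML_LANG))
    return out


gmt = to_gmt(ET.fromstring(DXLT))
```

Calling `ET.tostring(gmt, encoding="unicode")` serializes the result for inspection. Note how the object language (`lang`) turns into a `feat`, while the working language (`xml:lang`) stays an attribute.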

Page 28: ISO 16642 - a tutorial Part 2:  Representing data categories

Conclusion

– A general model for analysing and representing terminological data collections
– An underlying formalism expressed in XML and RDF
– Associated tools (Salt project):
  • DCSEditor
  • DCSBrowser
  • Automatic generation of XSLT filters and XML schemas from a given TML specification

Page 29: ISO 16642 - a tutorial Part 2:  Representing data categories

Useful pointers

SALT project
– http://www.loria.fr/projets/SALT
– http://www.ttt.org/

The TMF site
– http://www.loria.fr/projets/TMF