Page 1: Linguistic markup and transclusion processing in XML documents

mFiL 2015 1

Linguistic markup and processing of transclusion in XML documents

Simon Dew BA MISTC

6 November 2015

Copyright © Simon Dew 2015. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Transclusion

• Theodor Holm Nelson, 1981: Literary Machines
• The inclusion of an electronic document, or part of a document, in the rendering of another document.
• The main document does not contain a copy of the transcluded text, but only a reference to it.
• The software used to render the document obtains the transcluded material and incorporates it into the main work.

Ted Nelson photo by Dgies. Licensed under CC BY-SA 3.0.

This presentation focuses on transclusion in XML (Extensible Markup Language) documents, including, but not limited to:

• DocBook
• DITA
• TEI
• XHTML

Transclusion can be large scale / context-free.

Transclusion can be small scale / parametrised:

• General entities

Definition:

<!ENTITY device "Euro 500">

Reference:

<title>Configuring the &device;</title>

Result:

<title>Configuring the Euro 500</title>
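General entities are expanded by the XML parser itself, before any application sees the document. A minimal sketch, assuming Python's standard-library ElementTree (built on expat, which expands entities declared in a document's internal DTD subset); the function name `expand_title` is illustrative only:

```python
import xml.etree.ElementTree as ET

# The entity is declared in the internal DTD subset and referenced in the
# element content, as on the slide.
DOC = """<!DOCTYPE title [
  <!ENTITY device "Euro 500">
]>
<title>Configuring the &device;</title>"""


def expand_title(doc: str) -> str:
    """Parse the document and return the title text after entity expansion."""
    root = ET.fromstring(doc)
    return root.text


if __name__ == "__main__":
    print(expand_title(DOC))  # Configuring the Euro 500
```

Because the substitution happens during parsing, the parser has no way to vary the entity's form according to its surroundings.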

• XInclude

Definition:

<phrase xml:id="device">Euro 500</phrase>

Reference:

<title>Configuring the <xi:include xpointer="xpath(id('device')/node())"/></title>

Result:

<title>Configuring the Euro 500</title>
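Python's standard-library `xml.etree.ElementInclude` implements a subset of XInclude, but it does not support XPointer, so the same-document `xpointer` reference above cannot be resolved with it. This sketch therefore swaps in a variation of the technique: the term lives in a separate text resource (the name `device.txt` is hypothetical) included with `parse="text"`, and a custom in-memory loader stands in for the file system.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical store of reusable text resources, keyed by href.
RESOURCES = {"device.txt": "Euro 500"}

DOC = (
    '<title xmlns:xi="http://www.w3.org/2001/XInclude">'
    'Configuring the <xi:include href="device.txt" parse="text"/>'
    "</title>"
)


def loader(href, parse, encoding=None):
    """Return the included resource instead of reading a file from disk."""
    if parse != "text":
        raise ValueError("this sketch only handles parse='text'")
    return RESOURCES[href]


def resolve(doc: str) -> ET.Element:
    root = ET.fromstring(doc)
    ElementInclude.include(root, loader=loader)  # edits root in place
    return root


if __name__ == "__main__":
    print(resolve(DOC).text)  # Configuring the Euro 500
```

With `parse="text"` the included string is merged into the surrounding text, so no wrapper element remains in the result.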

• Specific transclusion mechanisms, e.g. DITA conref

Definition:

<ph id="device">Euro 500</ph>

Reference:

<title>Configuring the <ph conref="device"/></title>

Result:

<title>Configuring the <ph>Euro 500</ph></title>
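Real DITA conref values take the form `file.dita#topicid/elementid`; the slide's `conref="device"` is a simplification, and so is this sketch, which resolves only bare IDs within one document. It is a minimal Python illustration of the mechanism, not the DITA Open Toolkit's implementation, and the wrapper `<topic>` document is a toy:

```python
import copy
import xml.etree.ElementTree as ET

# Toy document, kept on one line so that <title> has no trailing whitespace.
DOC = ('<topic><ph id="device">Euro 500</ph>'
       '<title>Configuring the <ph conref="device"/></title></topic>')


def resolve_conrefs(root):
    """Pull the content of the element named by each conref into the
    referencing element, then drop the conref attribute."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    for el in list(root.iter()):
        ref = el.get("conref")
        if ref is None:
            continue
        target = by_id[ref]                # KeyError if the id is missing
        el.text = target.text
        for child in target:               # copy nested markup, if any
            el.append(copy.deepcopy(child))
        del el.attrib["conref"]


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    resolve_conrefs(root)
    print(ET.tostring(root.find("title"), encoding="unicode"))
    # <title>Configuring the <ph>Euro 500</ph></title>
```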

Transcluded content may vary.

1. Local redefinition

2. Conditional processing:

• Conditional profiling: DocBook
• DITAVAL files: DITA

<xsl:param name="profile.vendor" select="'ACME'"/>

<val>
  <prop action="include" att="product" val="ACME"/>
  <prop action="exclude" att="product" val="Yoyodyne"/>
</val>
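A DITAVAL file tells the processor which conditional content to keep. A minimal Python sketch of the filtering step, not a DITA Open Toolkit implementation: it reads only `include`/`exclude` actions, treats content as included by default, and applies a simplified reading of the DITA rule (an element is dropped when every value of one of its filtering attributes is excluded). The sample content is hypothetical.

```python
import xml.etree.ElementTree as ET

DITAVAL = """<val>
  <prop action="include" att="product" val="ACME"/>
  <prop action="exclude" att="product" val="Yoyodyne"/>
</val>"""

# Hypothetical content with one variant per product.
DOC = ('<p>Configure the <ph product="ACME">Euro 500</ph>'
       '<ph product="Yoyodyne">Oz 500</ph>.</p>')


def excluded_values(ditaval):
    """Map each filtering attribute to the set of values marked 'exclude'.
    'include' rules need no action, because content is kept by default."""
    rules = {}
    for prop in ET.fromstring(ditaval).iter("prop"):
        if prop.get("action") == "exclude":
            rules.setdefault(prop.get("att"), set()).add(prop.get("val"))
    return rules


def is_excluded(el, rules):
    """True if every value of some filtering attribute is excluded."""
    for att, values in rules.items():
        tokens = el.get(att, "").split()
        if tokens and all(t in values for t in tokens):
            return True
    return False


def filter_tree(parent, rules):
    """Remove excluded descendants, keeping the text that followed them."""
    i = 0
    while i < len(parent):
        child = parent[i]
        if is_excluded(child, rules):
            tail = child.tail or ""
            if i > 0:
                prev = parent[i - 1]
                prev.tail = (prev.tail or "") + tail
            else:
                parent.text = (parent.text or "") + tail
            del parent[i]
        else:
            filter_tree(child, rules)
            i += 1


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    filter_tree(root, excluded_values(DITAVAL))
    print("".join(root.itertext()))  # Configure the Euro 500.
```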

Linguistic consequences

A different form of the transcluded word or phrase may be required depending on the environment into which it is placed:

• Orthography, e.g. in writing systems that distinguish upper and lower case
• Syntactic case
• Definiteness
• Number
• Others, e.g. initial consonant mutation

<title>_____ Details</title>

organisational unit [TITLE CASE]

Result: <title>Organisational Unit Details</title>

<para>Om nödvändigt, välj _____.</para> (If necessary, select _____.)

organisationsenhet [+DEFINITE]

Result: <para>Om nödvändigt, välj organisationsenheten.</para>

If the transcluded word or phrase is the head of a phrase, it may demand agreement from dependent words.

• Phonetics
• Gender
• Number
• Case
• Definiteness

<para>Configuring a _____ Server</para>

Oz 500 [_V]

Result: <para>Configuring an Oz 500 Server</para>

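<para>Pour configurer le _____ auquel le modem est connecté :</para> (To configure the _____ to which the modem is connected:)

tablette [_C] [FEM] [SING]

Result: <para>Pour configurer la tablette à laquelle le modem est connecté :</para>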

Principles

1. Linguistic markup scheme

Defining the transcluded term:

• Mark up all forms of the term to be transcluded
• Mark up features which affect dependent words

Where the transcluded term is required:

• Mark up the required form
• Mark up dependent words

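2. Linguistic pre-processing: after transclusion and before rendering, transform the document to select the required form of each transcluded term and to inflect dependent words so that they agree with it.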

Markup

XML attributes

• Extend markup schema
• Wrapper element: DocBook <phrase>, DITA <ph>, HTML <span>
• Namespace: http://stanleysecurity.github.io/PACBook/ns/linguistics
• Prefix: ling

ling:pron    Phonetic environment (V, C, ...)
ling:num     Grammatical number (sg, pl, ...)
ling:case    Grammatical case (nom, gen, dat, acc, ...)
ling:gen     Grammatical gender (c, m, f, n, ...)
ling:class   Definiteness / inflectional class (strong, weak, mixed, ind, def, ...)
ling:orth    Orthographic case (upper, lower, title, sentence)
ling:type    head: a form of a head word; depend: a dependent word

Resource — features of head noun that demand agreement

<resource xl:label="Product_Name">
  <phrase vendor="ACME" ling:pron="C">Euro 500</phrase>
  <phrase vendor="Yoyodyne" ling:pron="V">Oz 500</phrase>
</resource>

Phonetic environment:

⟨Euro⟩ /ˈjʊəɹəʊ/ _C (begins with a consonant sound)

⟨Oz⟩ /ˈɒz/ _V (begins with a vowel sound)

Resource — all possible forms of head noun:

<resource xl:label="Org_Unit">
  <phrase ling:gen="c" ling:num="sg">
    <phrase ling:type="head" ling:case="nom" ling:class="ind">organisationsenhet</phrase>
    <phrase ling:type="head" ling:case="gen" ling:class="ind">organisationsenhets</phrase>
    <phrase ling:type="head" ling:case="nom" ling:class="def">organisationsenheten</phrase>
    <phrase ling:type="head" ling:case="gen" ling:class="def">organisationsenhetens</phrase>
  </phrase>
</resource>

Document — mark up required form of transcluded term

<para>Om nödvändigt, välj <phrase ling:class="def" content:ref="Org_Unit"/>.</para>

<title><phrase ling:orth="title" content:ref="Org_Unit"/> Details</title>

Document — mark up dependent words in text

<title>Configuring <wordasword ling:type="depend">a</wordasword> <phrase content:ref="Product_Name"/> Server</title>

<para>Wenn <phrase>
    <wordasword ling:type="depend">ein</wordasword>
    <phrase content:ref="Device"/>
  </phrase> konfiguriert wird, werden die Details <phrase>
    <wordasword ling:type="depend">der</wordasword>
    <phrase content:ref="Device" ling:case="gen"/>
  </phrase> auf der Weboberfläche angezeigt.</para>

(When a _____ is configured, the details of the _____ are displayed on the web interface.)

Dictionary

Complies with the dictionaries module of the TEI.

<entry n="a">
  <form>
    <gramGrp><usg value="C"/></gramGrp>
    <orth>a</orth>
  </form>
  <form>
    <gramGrp><usg value="V"/></gramGrp>
    <orth>an</orth>
  </form>
</entry>

<usg>    Phonetic environment (V, C, ...)
<num>    Grammatical number (sg, pl, ...)
<case>   Grammatical case (nom, gen, dat, acc, ...)
<gen>    Grammatical gender (c, m, f, n, ...)
<oVar>   Definiteness / inflectional class (strong, weak, mixed, ind, def, ...)
<orth>   Output
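Inflecting a dependent word then amounts to a dictionary lookup: find the entry for the dependent word and pick the `<form>` whose features match those of the transcluded head. A minimal Python sketch of that lookup, not PACBook's LingDepend.xsl: it assumes the entry format shown above (without the TEI namespace, features carried by `<usg value="...">`), handles only the phonetic feature, and the function name `inflect` is hypothetical.

```python
import xml.etree.ElementTree as ET

ENTRY = """<entry n="a">
  <form>
    <gramGrp><usg value="C"/></gramGrp>
    <orth>a</orth>
  </form>
  <form>
    <gramGrp><usg value="V"/></gramGrp>
    <orth>an</orth>
  </form>
</entry>"""


def inflect(entry_xml: str, pron: str) -> str:
    """Return the <orth> of the form whose <usg> value matches the
    phonetic environment of the head ('C' or 'V')."""
    entry = ET.fromstring(entry_xml)
    for form in entry.iter("form"):
        usg = form.find("gramGrp/usg")
        if usg is not None and usg.get("value") == pron:
            return form.findtext("orth")
    # No matching form: fall back to the headword itself.
    return entry.get("n")


if __name__ == "__main__":
    print(inflect(ENTRY, "C"), "Euro 500")  # a Euro 500
    print(inflect(ENTRY, "V"), "Oz 500")    # an Oz 500
```

The `pron` argument corresponds to the `ling:pron` value on the selected head phrase (C for Euro 500, V for Oz 500).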

Software

Transformational stylesheets

PACBook XSLT transformations:

• LingHead.xsl: selects the required declension of head nouns.
• LingDepend.xsl: inflects dependent words.
• LingCasing.xsl: sets the orthographic case of specified text.

Licence: GNU Lesser General Public License (LGPL) v3

Repository: https://github.com/STANLEYSecurity/PACBook
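The casing step maps each `ling:orth` value to a string operation. A minimal Python sketch of just that mapping, not PACBook's LingCasing.xsl (which works on XML elements): the function name `set_case` is hypothetical, and Python's built-in case methods are not locale-aware (Turkish dotted and dotless i, for example), which is one reason a library such as ICU may be preferable in production.

```python
def set_case(text: str, orth: str) -> str:
    """Apply one of the ling:orth values: upper, lower, title or sentence."""
    if orth == "upper":
        return text.upper()
    if orth == "lower":
        return text.lower()
    if orth == "title":
        # Capitalise the first letter of each word; leave the rest unchanged.
        return " ".join(w[:1].upper() + w[1:] for w in text.split(" "))
    if orth == "sentence":
        # Capitalise only the first letter, so proper nouns keep their case.
        return text[:1].upper() + text[1:]
    raise ValueError(f"unknown ling:orth value: {orth!r}")


if __name__ == "__main__":
    print(set_case("organisational unit", "title") + " Details")
    # Organisational Unit Details
```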

Limitations

• Only noun phrases are handled.
• Tested with only a small handful of languages.
• Linguistic markup differs for translated texts.
• Linguistic markup can be complex for authors.

Related work

• Various linguistic markup schemas / ontologies
• Internationalisation markup
• Nothing else?
• What should we call this?

Collaboration

• Dictionary: Wiktionary.
• Testing and improving.
• Integrating with other publication workflows.

Development fork: https://github.com/janiveer/PACBook

Examples

Resource:

<resource xl:label="Doc">
  <phrase outputformat="PDF" ling:gen="n" ling:num="sg">
    <phrase ling:type="head" ling:case="nom">Dokument</phrase>
    <phrase ling:type="head" ling:case="acc">Dokument</phrase>
    <phrase ling:type="head" ling:case="gen">Dokuments</phrase>
    <phrase ling:type="head" ling:case="dat">Dokument</phrase>
  </phrase>
  <phrase outputformat="CHM" ling:gen="f" ling:num="sg">
    <phrase ling:type="head" ling:case="nom">Hilfedatei</phrase>
    <phrase ling:type="head" ling:case="acc">Hilfedatei</phrase>
    <phrase ling:type="head" ling:case="gen">Hilfedatei</phrase>
    <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
  </phrase>
</resource>

Document:

<para>Die Einstellung der IP-Adresse ist in
  <wordasword ling:type="depend">dies</wordasword>
  <phrase content:ref="Doc" ling:case="dat"/>
  nicht enthalten.</para>

(The IP address setting is not included in this _____.)

After transclusion:

<para>Die Einstellung der IP-Adresse ist in
  <wordasword ling:type="depend">dies</wordasword>
  <phrase ling:case="dat">
    <phrase outputformat="PDF" ling:gen="n" ling:num="sg">
      <phrase ling:type="head" ling:case="nom">Dokument</phrase>
      <phrase ling:type="head" ling:case="acc">Dokument</phrase>
      <phrase ling:type="head" ling:case="gen">Dokuments</phrase>
      <phrase ling:type="head" ling:case="dat">Dokument</phrase>
    </phrase>
    <phrase outputformat="CHM" ling:gen="f" ling:num="sg">
      <phrase ling:type="head" ling:case="nom">Hilfedatei</phrase>
      <phrase ling:type="head" ling:case="acc">Hilfedatei</phrase>
      <phrase ling:type="head" ling:case="gen">Hilfedatei</phrase>
      <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
    </phrase>
  </phrase>
  nicht enthalten.</para>

After head transformation:

<para>Die Einstellung der IP-Adresse ist in
  <wordasword ling:type="depend">dies</wordasword>
  <phrase ling:case="dat">
    <phrase outputformat="PDF" ling:gen="n" ling:num="sg">
      <phrase ling:type="head" ling:case="dat">Dokument</phrase>
    </phrase>
    <phrase outputformat="CHM" ling:gen="f" ling:num="sg">
      <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
    </phrase>
  </phrase>
  nicht enthalten.</para>
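The head transformation keeps only the head forms compatible with the features required where the term is used. A minimal Python sketch of that step, an approximation of the behaviour shown in the example rather than PACBook's LingHead.xsl: it assumes the required features are the `ling:*` attributes on the outermost transcluded `<phrase>`, and that a head form is dropped when one of its own `ling:*` values contradicts them. The `ling` namespace declaration, omitted on the slides, is added so the fragment parses.

```python
import xml.etree.ElementTree as ET

LING = "http://stanleysecurity.github.io/PACBook/ns/linguistics"
FEATURES = ("pron", "num", "case", "gen", "class")


def ling(name):
    """ElementTree's {namespace}name form of a ling:* attribute."""
    return "{" + LING + "}" + name


# The transcluded phrase from the example, with the ling namespace declared.
PHRASE = f"""<phrase xmlns:ling="{LING}" ling:case="dat">
  <phrase outputformat="PDF" ling:gen="n" ling:num="sg">
    <phrase ling:type="head" ling:case="nom">Dokument</phrase>
    <phrase ling:type="head" ling:case="acc">Dokument</phrase>
    <phrase ling:type="head" ling:case="gen">Dokuments</phrase>
    <phrase ling:type="head" ling:case="dat">Dokument</phrase>
  </phrase>
  <phrase outputformat="CHM" ling:gen="f" ling:num="sg">
    <phrase ling:type="head" ling:case="nom">Hilfedatei</phrase>
    <phrase ling:type="head" ling:case="acc">Hilfedatei</phrase>
    <phrase ling:type="head" ling:case="gen">Hilfedatei</phrase>
    <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
  </phrase>
</phrase>"""


def select_heads(root):
    """Delete head forms whose ling:* features contradict the features
    required by the referencing phrase (the root element)."""
    required = {f: root.get(ling(f)) for f in FEATURES if root.get(ling(f))}
    for parent in list(root.iter()):     # snapshot: the tree is edited below
        for child in list(parent):
            if child.get(ling("type")) != "head":
                continue
            if any(child.get(ling(f)) not in (None, v)
                   for f, v in required.items()):
                parent.remove(child)


if __name__ == "__main__":
    root = ET.fromstring(PHRASE)
    select_heads(root)
    for variant in root:
        print(variant.get("outputformat"), [h.text for h in variant])
    # PDF ['Dokument']
    # CHM ['Hilfedatei']
```

Both output-format variants survive this step; choosing between them is left to conditional processing.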

After conditional processing:

<para>Die Einstellung der IP-Adresse ist in
  <wordasword ling:type="depend">dies</wordasword>
  <phrase ling:case="dat">
    <phrase outputformat="PDF" ling:gen="n" ling:num="sg">
      <phrase ling:type="head" ling:case="dat">Dokument</phrase>
    </phrase>
  </phrase>
  nicht enthalten.</para>

<para>Die Einstellung der IP-Adresse ist in
  <wordasword ling:type="depend">dies</wordasword>
  <phrase ling:case="dat">
    <phrase outputformat="CHM" ling:gen="f" ling:num="sg">
      <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
    </phrase>
  </phrase>
  nicht enthalten.</para>

After dependent transformation:

<para>Die Einstellung der IP-Adresse ist in
  <wordasword ling:type="depend">diesem</wordasword>
  <phrase ling:case="dat">
    <phrase outputformat="PDF" ling:gen="n" ling:num="sg">
      <phrase ling:type="head" ling:case="dat">Dokument</phrase>
    </phrase>
  </phrase>
  nicht enthalten.</para>

<para>Die Einstellung der IP-Adresse ist in
  <wordasword ling:type="depend">dieser</wordasword>
  <phrase ling:case="dat">
    <phrase outputformat="CHM" ling:gen="f" ling:num="sg">
      <phrase ling:type="head" ling:case="dat">Hilfedatei</phrase>
    </phrase>
  </phrase>
  nicht enthalten.</para>
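The dependent transformation reads the case and gender of the transcluded phrase and replaces the dependent stem with the agreeing form. A minimal Python sketch of that step, not PACBook's LingDepend.xsl: a hypothetical in-memory table of the singular forms of German `dies-` stands in for the TEI dictionary, the dependent word is assumed to be immediately followed by its transcluded phrase, and nominative is assumed when no case is marked.

```python
import xml.etree.ElementTree as ET

LING = "http://stanleysecurity.github.io/PACBook/ns/linguistics"


def ling(name):
    """ElementTree's {namespace}name form of a ling:* attribute."""
    return "{" + LING + "}" + name


# Hypothetical stand-in for the TEI dictionary: singular forms of the
# German demonstrative "dies-", keyed by (case, gender).
DICTIONARY = {
    "dies": {
        ("nom", "m"): "dieser", ("nom", "f"): "diese", ("nom", "n"): "dieses",
        ("acc", "m"): "diesen", ("acc", "f"): "diese", ("acc", "n"): "dieses",
        ("dat", "m"): "diesem", ("dat", "f"): "dieser", ("dat", "n"): "diesem",
        ("gen", "m"): "dieses", ("gen", "f"): "dieser", ("gen", "n"): "dieses",
    }
}

# The PDF paragraph after conditional processing, with the ling namespace
# declared so that the fragment parses.
PARA = f"""<para xmlns:ling="{LING}">Die Einstellung der IP-Adresse ist in
  <wordasword ling:type="depend">dies</wordasword>
  <phrase ling:case="dat">
    <phrase outputformat="PDF" ling:gen="n" ling:num="sg">
      <phrase ling:type="head" ling:case="dat">Dokument</phrase>
    </phrase>
  </phrase>
  nicht enthalten.</para>"""


def inflect_dependents(para):
    """Replace each dependent stem with the form that agrees with the
    transcluded phrase immediately after it (singular only)."""
    children = list(para)
    for i, el in enumerate(children):
        if el.tag != "wordasword" or el.get(ling("type")) != "depend":
            continue
        head = children[i + 1]                 # the transcluded phrase
        case = head.get(ling("case"), "nom")   # assumed default: nominative
        # Gender comes from the first phrase inside it that declares one.
        gender = next(p.get(ling("gen")) for p in head.iter()
                      if p.get(ling("gen")))
        el.text = DICTIONARY[el.text][(case, gender)]


if __name__ == "__main__":
    para = ET.fromstring(PARA)
    inflect_dependents(para)
    print(para.find("wordasword").text)  # diesem
```

Running the same function on the CHM paragraph, where the head is feminine, selects "dieser".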

Questions?

References

• [Nelson] Theodor Holm Nelson. 1981. Literary Machines. Mindful Press, Sausalito, California.

• [XML] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau, editors. 26 November 2008. Extensible Markup Language (XML) 1.0 (Fifth Edition). World Wide Web Consortium (W3C).

• [DocBook] DocBook Technical Committee. 1 November 2009. The DocBook Schema Version 5.0. Organization for the Advancement of Structured Information Standards (OASIS).

• [DITA] OASIS DITA Technical Committee. 1 December 2010. Darwin Information Typing Architecture (DITA) Version 1.2. Organization for the Advancement of Structured Information Standards (OASIS).

• [TEI] TEI Consortium, editors. 20 January 2014. TEI P5: Guidelines for Electronic Text Encoding and Interchange, 2.6.0. TEI Consortium.

• [HTML] Ian Hickson, Robin Berjon, Steve Faulkner, Travis Leithead, Erika Doyle Navara, Edward O’Connor, Silvia Pfeiffer, editors. 28 October 2014. HTML5. World Wide Web Consortium (W3C).

• [XInclude] Jonathan Marsh, David Orchard, and Daniel Veillard, editors. 15 November 2006. XML Inclusions (XInclude) Version 1.0 (Second Edition). World Wide Web Consortium (W3C).

• [XSLT] James Clark, editor. 16 November 1999. XSL Transformations (XSLT) Version 1.0. World Wide Web Consortium (W3C).

• [Ant] Stephane Bailliez, et al. 29 December 2013. Apache Ant™ 1.9.3 Manual. The Apache Software Foundation.

• [XProc] Norman Walsh, Alex Milowski, and Henry S. Thompson, editors. 11 May 2010. XProc: An XML Pipeline Language. World Wide Web Consortium (W3C).

• [XLIFF] OASIS XLIFF Technical Committee. 1 February 2008. XML Localisation Interchange File Format (XLIFF) Version 1.2. Organization for the Advancement of Structured Information Standards (OASIS).

• [GOLD] Scott Farrar and D. Terence Langendoen. 2003. A linguistic ontology for the Semantic Web. GLOT International, 7(3), pp. 97-100.

• [ISOcat] M. Kemps-Snijders, M. A. Windhouwer, P. Wittenburg, S. E. Wright. November 2009. ISOcat: Remodeling Metadata for Language Resources. International Journal of Metadata, Semantics and Ontologies (IJMSO), 4(4), pp. 261-276.

• [ICU] ICU Project Management Committee. 7 October 2015. ICU 56. ICU: International Components for Unicode.

