Linguistic markup and transclusion processing in XML documents

  1. mFiL 2015. Linguistic markup and processing of transclusion in XML documents. Simon Dew BA MISTC, 6 November 2015. Copyright Simon Dew 2015. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
  2. Transclusion
  3. Transclusion. Theodor Holm Nelson, 1981: Literary Machines. Transclusion is the inclusion of an electronic document, or part of a document, in the rendering of another document. The main document does not contain a copy of the transcluded text, but only a reference to it. The software used to render the document obtains the transcluded material and incorporates it into the main work. (Ted Nelson photo by Dgies, licensed under CC BY-SA 3.0.)
  4. Transclusion. This presentation focuses on transclusion in XML (Extensible Markup Language) documents, including, but not limited to: DocBook, DITA, TEI, XHTML.
  5. Transclusion. Transclusion can be large scale / context-free.
  6. Transclusion. Transclusion can be small scale / parametrised.
  7. Transclusion. Transclusion can be small scale / parametrised: general entities. Definition: [entity declaration not preserved]. Reference: Configuring the &device; Result: Configuring the Euro 500
  8. Transclusion. Transclusion can be small scale / parametrised: general entities; XInclude. Definition: Euro 500. Reference: Configuring the _____. Result: Configuring the Euro 500
  9. Transclusion. Transclusion can be small scale / parametrised: general entities; XInclude; specific transclusion mechanisms, e.g. DITA conref. Definition: Euro 500. Reference: Configuring the _____. Result: Configuring the Euro 500
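The first two mechanisms named on these slides can be tried out with Python's standard library. The following is a minimal sketch, not the presenter's own markup: the element names `title` and `phrase`, the file name `device.xml`, and the in-memory `RESOURCES` table are assumptions made for illustration, and the XInclude pulls in a whole resource rather than a fragment selected by pointer.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# General entity: declared once in the internal DTD subset, then
# expanded by the parser wherever &device; appears.
ENTITY_DOC = """<!DOCTYPE title [
  <!ENTITY device "Euro 500">
]>
<title>Configuring the &device;</title>"""


def expand_entity(source: str) -> str:
    """Parse the document; the parser substitutes the entity value."""
    return ET.fromstring(source).text


# XInclude: the main document holds only a reference to the resource.
XI_DOC = (
    '<title xmlns:xi="http://www.w3.org/2001/XInclude">'
    'Configuring the <xi:include href="device.xml"/></title>'
)

# Hypothetical in-memory stand-in for real resource files.
RESOURCES = {"device.xml": "<phrase>Euro 500</phrase>"}


def load_resource(href, parse, encoding=None):
    """Loader callback for ElementInclude: resolve href from RESOURCES."""
    data = RESOURCES[href]
    return ET.fromstring(data) if parse == "xml" else data


def expand_xinclude(source: str) -> str:
    """Resolve xi:include elements in place and return the rendered text."""
    root = ET.fromstring(source)
    ElementInclude.include(root, loader=load_resource)
    return "".join(root.itertext())


print(expand_entity(ENTITY_DOC))    # Configuring the Euro 500
print(expand_xinclude(XI_DOC))      # Configuring the Euro 500
```

In both cases the main document stores a reference only, and the rendered text is produced at processing time, which is the property the Nelson definition describes.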
  10. Transclusion. Transcluded content may vary.
  11. Transclusion. Transcluded content may vary: 1. Local redefinition.
  12. Transclusion. Transcluded content may vary: 1. Local redefinition. 2. Conditional processing: conditional profiling (DocBook); DITAVAL files (DITA).
  13. Linguistic consequences
  14. Linguistic consequences. A different form of the transcluded word or phrase may be required depending on the environment into which it is placed: orthography (e.g. writing systems with upper case); syntactic case; definiteness; number; others, e.g. initial consonant mutation. Example: "_____ Details", where the transcluded term is "organisational unit" [TITLE CASE].
  15. Linguistic consequences. A different form of the transcluded word or phrase may be required depending on the environment into which it is placed: orthography (e.g. writing systems with upper case); syntactic case; definiteness; number; others, e.g. initial consonant mutation. Example: "Om nödvändigt, välj _____." (Swedish: "If necessary, select _____."), where the transcluded term is "organisationsenhet" [+DEFINITE].
  16. Linguistic consequences. If the transcluded word or phrase is the head of a phrase, it may demand agreement from dependent words: phonetics; gender; number; case; definiteness. Example: "Configuring a _____ Server", where the transcluded term is "Oz 500" [_V].
  17. Linguistic consequences. If the transcluded word or phrase is the head of a phrase, it may demand agreement from dependent words: phonetics; gender; number; case; definiteness. Example: "Pour configurer le _____ auquel le modem est connecté :" (French: "To configure the _____ to which the modem is connected:"), where the transcluded term is "tablette" [_C] [FEM] [SING].
  18. Principles
  19. Principles. 1. Linguistic markup scheme. When defining a transcluded term: mark up all forms of the term to be transcluded; mark up features which affect dependent words. Where a transcluded term is required: mark up the required form; mark up dependent words.
  20. Principles. 2. Linguistic pre-processing.
  21. Principles. 2. Linguistic pre-processing.
  22. Markup
  23. Markup. XML attributes. Extend markup schema. Wrapper element for DocBook, DITA, HTML. Namespace: http://stanleysecurity.github.io/PACBook/ns/linguistics Prefix: ling
  24. Markup. ling:pron: phonetic environment (V, C, ...); ling:num: grammatical number (sg, pl, ...); ling:case: grammatical case (nom, gen, dat, acc, ...); ling:gen: grammatical gender (c, m, f, n, ...); ling:class: definiteness / inflectional class (strong, weak, mixed, ind, def, ...); ling:orth: orthographic case (upper, lower, title, sentence); ling:type: head (form of a head word) or depend (dependent word).
  25. Markup (resource): features of the head noun that demand agreement. Terms: Euro 500; Oz 500. Phonetic environment: Euro / j / _C; Oz / z / _V.
  26. Markup (resource): all possible forms of the head noun: organisationsenhet, organisationsenhets, organisationsenheten, organisationsenhetens.
  27. Markup (document): mark up the required form of the transcluded term. "Om nödvändigt, välj _____." "_____ Details"
  28. Markup (document): mark up dependent words in the text. "Configuring a _____ Server". "Wenn ein _____ konfiguriert wird, werden die Details der _____ auf der Weboberfläche angezeigt." (German: "When a _____ is configured, the details of the _____ are displayed on the web interface.")
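Selecting the required form of a head noun, as these markup slides describe, can be sketched in a few lines. This is an illustrative model only, not the PACBook schema: the `Form` class, the `select_form` and `fill` helpers, and the feature labels attached to each Swedish form (genitive in -s, definite in -en) are assumptions, since the slide lists the four forms without their attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    text: str
    case: str   # assumed values, after ling:case: "nom", "gen"
    cls: str    # assumed values, after ling:class: "ind", "def"


# The four forms from the slide, with assumed feature labels.
FORMS = [
    Form("organisationsenhet", "nom", "ind"),
    Form("organisationsenhets", "gen", "ind"),
    Form("organisationsenheten", "nom", "def"),
    Form("organisationsenhetens", "gen", "def"),
]


def select_form(forms, **required):
    """Return the single form whose features match every requirement."""
    matches = [
        f for f in forms
        if all(getattr(f, name) == value for name, value in required.items())
    ]
    if len(matches) != 1:
        raise LookupError(f"expected one match for {required}, got {len(matches)}")
    return matches[0].text


def fill(template: str, term: str) -> str:
    """Place the selected form into the slot marked _____."""
    return template.replace("_____", term)


sentence = fill("Om nödvändigt, välj _____.",
                select_form(FORMS, case="nom", cls="def"))
print(sentence)  # Om nödvändigt, välj organisationsenheten.
```

Raising an error when zero or several forms match keeps an under-specified reference from silently picking the wrong form.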
  29. Dictionary
  30. Dictionary. Complies with the dictionaries module of the TEI. Example forms: a, an.
  31. Dictionary. Entries record: phonetic environment (V, C, ...); grammatical number (sg, pl, ...); grammatical case (nom, gen, dat, acc, ...); grammatical gender (c, m, f, n, ...); definiteness / inflectional class (strong, weak, mixed, ind, def, ...); output.
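A dictionary lookup for dependent words might work along the following lines. This is a sketch under stated assumptions rather than the TEI-based dictionary format itself: the `DICTIONARY` structure and the `inflect` function are hypothetical, the feature keys borrow the `ling:` attribute names, and the French entries are added here to cover the earlier "tablette" example.

```python
# Each lemma maps to (required head features, output form) pairs.
DICTIONARY = {
    "a": [
        ({"pron": "C"}, "a"),
        ({"pron": "V"}, "an"),
    ],
    "le": [
        ({"gen": "m", "num": "sg", "pron": "C"}, "le"),
        ({"gen": "f", "num": "sg", "pron": "C"}, "la"),
    ],
    "auquel": [
        ({"gen": "m", "num": "sg"}, "auquel"),
        ({"gen": "f", "num": "sg"}, "à laquelle"),
    ],
}


def inflect(lemma: str, head: dict) -> str:
    """Return the first form whose required features the head noun satisfies."""
    for required, output in DICTIONARY[lemma]:
        if all(head.get(name) == value for name, value in required.items()):
            return output
    raise LookupError(f"no form of {lemma!r} agrees with {head}")


# Features that the head nouns demand of their dependents.
OZ_500 = {"pron": "V"}
EURO_500 = {"pron": "C"}
TABLETTE = {"pron": "C", "gen": "f", "num": "sg"}

print(f"Configuring {inflect('a', OZ_500)} Oz 500 Server")
print(f"Configuring {inflect('a', EURO_500)} Euro 500 Server")
print(f"Pour configurer {inflect('le', TABLETTE)} tablette "
      f"{inflect('auquel', TABLETTE)} le modem est connecté")
```

The head noun carries the features and the dependent word only names its lemma, so swapping the transcluded term changes the article without touching the sentence.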
  32. Software
  33. Transformational stylesheets. PACBook XSLT transformations: LingHead.xsl selects the required declension of head nouns; LingDepend.xsl inflects dependent words; LingCasing.xsl sets the orthographic case of specified text.
  34. Transformational stylesheets. PACBook XSLT transformations: LingHead.xsl selects the required declension of head nouns; LingDepend.xsl inflects dependent words; LingCasing.xsl sets the orthographic case of specified text. Licence: GNU Lesser General Public License (LGPL) v3. Repository: https://github.com/STANLEYSecurity/PACBook
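The orthographic step is the simplest of the three to illustrate. The sketch below is a plain Python stand-in, not LingCasing.xsl: the function name `set_orth` is hypothetical, and only the four values listed for `ling:orth` (upper, lower, title, sentence) are handled.

```python
def set_orth(text: str, orth: str) -> str:
    """Apply one of the four orthographic cases to a transcluded term."""
    if orth == "upper":
        return text.upper()
    if orth == "lower":
        return text.lower()
    if orth == "title":
        # Capitalise each space-separated word, leaving the rest untouched
        # (str.title() would also lowercase the remainder of each word).
        return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))
    if orth == "sentence":
        return text[:1].upper() + text[1:]
    raise ValueError(f"unknown orthographic case: {orth!r}")


print(set_orth("organisational unit", "title") + " Details")
# Organisational Unit Details
```

A production version would need language-aware rules, since title casing conventions differ between languages.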
  35. Limitations. Only noun phrases. Only tested with a small handful of languages. Linguistic markup is different for translated texts. Linguistic markup can be complex for authors.
  36. Related work. Various linguistic markup schemas / ontologies. Internationalisation markup. Nothing else? What should we call this?
  37. Collaboration. Dictionary: Wiktionary. Testing and improving. Integrating with other publication workflows. Development fork: https://github.com/janiveer/PACBook
  38. Examples
  39. Example. Resource: Dokument, Dokument, Dokuments, Dokument; Hilfedatei, Hilfedatei, Hilfedatei, Hilfedatei (four case forms of each noun).
  40. Example. Document: "Die Einstellung der IP-Adresse ist in dies _____ nicht enthalten." (German: "The IP address setting is not included in this _____.")
  41. Example. After transclusion: "Die Einstellung der IP-Adresse ist in dies [Dokument / Dokument / Dokuments / Dokument / Hilfedatei / Hilfedatei / Hilfedatei / Hilfedatei] nicht enthalten."
  42. Example. After head transformation: "Die Einstellung der IP-Adresse ist in dies [Dokument / Hilfedatei] nicht enthalten."
  43. Example. After conditional processing: "Die Einstellung der IP-Adresse ist in dies Dokument nicht enthalten." "Die Einstellung der IP-Adresse ist in dies Hilfedatei nicht enthalten."
  44. Example. After dependent transformation: "Die Einstellung der IP-Adresse ist in diesem Dokument nicht enthalten." "Die Einstellung der IP-Adresse ist in dieser Hilfedatei nicht enthalten."
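The stages of this worked example can be mirrored in a short script. This is a sketch of the sequence of steps, not the PACBook stylesheets: the data layout, the function names, the case labels attached to each listed form, and the small `DIES` table of dative demonstratives are assumptions made for illustration.

```python
# Resource: every case form of each candidate head noun, plus its gender.
RESOURCE = {
    "Dokument": {
        "gender": "n",
        "forms": {"nom": "Dokument", "acc": "Dokument",
                  "gen": "Dokuments", "dat": "Dokument"},
    },
    "Hilfedatei": {
        "gender": "f",
        "forms": {case: "Hilfedatei" for case in ("nom", "acc", "gen", "dat")},
    },
}

# Dative forms of the dependent word "dies", keyed by (gender, case).
DIES = {("m", "dat"): "diesem", ("n", "dat"): "diesem", ("f", "dat"): "dieser"}

TEMPLATE = "Die Einstellung der IP-Adresse ist in {dependent} {head} nicht enthalten."


def head_transformation(case: str) -> dict:
    """Keep only the required case form of every transcluded noun."""
    return {term: (entry["gender"], entry["forms"][case])
            for term, entry in RESOURCE.items()}


def conditional_processing(candidates: dict, output: str) -> tuple:
    """Keep the noun that applies to the output being built."""
    return candidates[output]


def dependent_transformation(gender: str, case: str) -> str:
    """Inflect the dependent word to agree with the surviving head noun."""
    return DIES[(gender, case)]


def render(output: str, case: str = "dat") -> str:
    gender, head = conditional_processing(head_transformation(case), output)
    return TEMPLATE.format(dependent=dependent_transformation(gender, case),
                           head=head)


print(render("Dokument"))
print(render("Hilfedatei"))
```

The dependent word is inflected last because its form depends on which head noun survives conditional processing.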
  45. Questions?
  46. References
      [Nelson] Theodor Holm Nelson. 1981. Literary Machines. Mindful Press, Sausalito, California.
      [XML] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau, editors. 26 November 2008. Extensible Markup Language (XML) 1.0 (Fifth Edition). World Wide Web Consortium (W3C).
      [DocBook] DocBook Technical Committee. 1 November 2009. The DocBook Schema Version 5.0. Organization for the Advancement of Structured Information Standards (OASIS).
      [DITA] OASIS DITA Technical Committee. 1 December 2010. Darwin Information Typing Architecture (DITA) Version 1.2. Organization for the Advancement of Structured Information Standards (OASIS).
      [TEI] TEI Consortium, eds. 20 January 2014. TEI P5: Guidelines for Electronic Text Encoding and Interchange, 2.6.0. TEI Consortium.
      [HTML] Ian Hickson, Robin Berjon, Steve Faulkner, Travis Leithead, Erika Doyle Navara, Edward O'Connor, Silvia Pfeiffer, editors. 28 October 2014. HTML5. World Wide Web Consortium (W3C).
      [XInclude] Jonathan Marsh, David Orchard, and Daniel Veillard, editors. 15 November 2006. XML Inclusions (XInclude) Version 1.0 (Second Edition). World Wide Web Consortium (W3C).
      [XSLT] James Clark, editor. 16 November 1999. XSL Transformations (XSLT) Version 1.0. World Wide Web Consortium (W3C).
      [Ant] Stephane Bailliez, et al. 29 December 2013. Apache Ant 1.9.3 Manual. The Apache Software Foundation.
      [XProc] Norman Walsh, Alex Milowski, and Henry S. Thompson, editors. 11 May 2010. XProc: An XML Pipeline Language. World Wide Web Consortium (W3C).
      [XLIFF] OASIS XLIFF Technical Committee. 1 February 2008. XML Localisation Interchange File Format (XLIFF) Version 1.2. Organization for the Advancement of Structured Information Standards (OASIS).
      [GOLD] Scott Farrar and D. Terence Langendoen. 2003. A linguistic ontology for the Semantic Web. GLOT International. 7 (3), pp. 97-100.
      [ISOcat] M. Kemps-Snijders, M.A. Windhouwer, P. Wittenburg, S.E. Wright. November 2009. ISOcat: Remodeling Metadata for Language Resources. International Journal of Metadata, Semantics and Onto