The use of SGML and XML at the Publications Office

Dr. Holger Bagola, Dir A – Cell “Formats”, [email protected]

Table of contents

• Historical overview

• Formex

• Other areas of XML usage

• Conclusion

Historical overview

• Among the missions of the Publications Office:

– Archiving of legislative publications

• Choice of SGML

– Independence from any platform

– Distinction between structure and presentation

– Support for synoptic document management in a multilingual environment

• Migration to XML

– Basic advantage: availability of tools

Formex (1)

• SGML versions

– Version 1: adopted in 1984

• First deliveries in 1985

• Characteristics:

– Mixture of SGML and CCF (Common Communication Format) for meta-data

– Markup not very detailed

– Character encoding based on ISO 2022

Formex (2)

– Version 2: adopted in 1989, revised in 1992

• First deliveries in 1989

• Characteristics:

– Mixture of SGML and CCF (Common Communication Format) for meta-data

– Introduction of a logical table model

– Character encoding based on ISO 2022

Formex (3)

– Version 3: adopted in 1999

• Work on the specifications began in 1994

• First deliveries in 1999

• Characteristics:

– Markup of the semantic role of document components

– Definition of text entities for 11 languages

– Character encoding based on ISO 2022 (a move to Unicode UTF-8 was discussed)

Formex (4)

• XML version

– Version 4: adopted in 2004

• First deliveries in 2004

• Characteristics:

– XML instead of SGML

– Character encoding based on Unicode (UTF-8)

Formex (5)

• Basic principles

– XML Schema instead of DTD

– One single schema

– 12 root elements instead of 30

– About 350 elements instead of 1200

– Distinction between semantic and physical markup

Formex (6)

ARTICLE (TI.ARTICLE, (PARAG+ | ALINEA+))

TI.ARTICLE (#PCDATA)

PARAG (NO.PARAG, ALINEA+)

NO.PARAG (#PCDATA)

ALINEA ((#PCDATA | NOTE | HT | FT)* | (P | LIST | TABLE)+)

. . .

Blue: semantic markup

Red: physical markup
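The content models above can also be checked procedurally. The following is a minimal Python sketch using xml.etree.ElementTree that covers only the simplified ARTICLE, PARAG and ALINEA models shown here; the actual Formex v. 4 rules live in its XML Schema, and the sets of inline and block elements are taken from this slide alone.

```python
import xml.etree.ElementTree as ET

INLINE = {"NOTE", "HT", "FT"}   # elements allowed in ALINEA mixed content
BLOCK = {"P", "LIST", "TABLE"}  # block-level children of ALINEA

def article_ok(article: ET.Element) -> bool:
    """ARTICLE (TI.ARTICLE, (PARAG+ | ALINEA+))"""
    tags = [c.tag for c in article]
    body = tags[1:]
    return (
        bool(tags) and tags[0] == "TI.ARTICLE" and bool(body)
        and (all(t == "PARAG" for t in body) or all(t == "ALINEA" for t in body))
    )

def parag_ok(parag: ET.Element) -> bool:
    """PARAG (NO.PARAG, ALINEA+)"""
    tags = [c.tag for c in parag]
    return len(tags) >= 2 and tags[0] == "NO.PARAG" and all(t == "ALINEA" for t in tags[1:])

def has_own_text(el: ET.Element) -> bool:
    """True if el holds non-whitespace text outside its child elements."""
    return any((s or "").strip() for s in [el.text] + [c.tail for c in el])

def alinea_ok(alinea: ET.Element) -> bool:
    """ALINEA ((#PCDATA | NOTE | HT | FT)* | (P | LIST | TABLE)+)"""
    tags = {c.tag for c in alinea}
    if tags and tags <= BLOCK:
        return not has_own_text(alinea)  # block content: no loose text allowed
    return tags <= INLINE                # mixed content: text plus inline elements
```

The two ALINEA alternatives are mutually exclusive, so the check first decides which one applies and then rejects any mixture.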

Formex (7)

• Table model

– Analysis of the CALS, HTML and Formex v. 3 table models

– Choice:

• Model close to HTML (top-down approach, nested tables)

• Retention of semantic information as in Formex v. 3

Formex (8)

• Footnotes

– Distinction between notes in text and notes in tables, for readability and simpler production

– Insertion of text notes into the surrounding text

– ID/IDREF to signal identical footnotes

– Numbering is a matter of presentation

– Table notes assembled at the top of the table
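Treating numbering as presentation means a renderer can assign note numbers at output time, with a reference to an identical note reusing that note's number. A minimal Python sketch, assuming hypothetical NOTE (with an ID attribute) and NOTEREF (with an IDREF attribute) elements rather than the exact Formex v. 4 names:

```python
import xml.etree.ElementTree as ET

def number_notes(root: ET.Element) -> tuple[dict[str, int], list[int]]:
    """Assign note numbers at presentation time.

    Returns the number given to each NOTE (by ID, in document order) and the
    sequence of markers to print, where a NOTEREF repeats the number of the
    identical note it points to.
    """
    numbers: dict[str, int] = {}
    for note in root.iter("NOTE"):      # first pass: number the notes
        numbers[note.get("ID")] = len(numbers) + 1
    markers: list[int] = []
    for el in root.iter():              # second pass: markers in text order
        if el.tag == "NOTE":
            markers.append(numbers[el.get("ID")])
        elif el.tag == "NOTEREF":
            markers.append(numbers[el.get("IDREF")])
    return numbers, markers
```

Two passes let a reference resolve even if it appears before the note it points to.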

Formex (9)

• Quotations

– Structured quotations vs. ‘#PCDATA’ quotations

– Elements signaling the start and end of a quotation (quotation marks)

– Container element for structured quotations

Formex (10)

Example:

Article 2

In article 1(2) of regulation (EC) 1234/94 the word ‘car’ is replaced by ‘bus’.

Article 6 of the same regulation is replaced by the following text:

‘Article 6

This is the new text of article 6.’

Formex (11)

Example:

<ARTICLE IDENTIFIER="002">
<TI.ARTICLE>Article 2</TI.ARTICLE>
<ALINEA>In article 1(2) of regulation (EC) 1234/94 the <QUOT.START ID="QS0001" REF.END="QE0001" CODE="2018"/>car<QUOT.END ID="QE0001" REF.START="QS0001" CODE="2019"/> is replaced by <QUOT.START ID="QS0002" REF.END="QE0002" CODE="2018"/>bus<QUOT.END ID="QE0002" REF.START="QS0002" CODE="2019"/>.</ALINEA>
<ALINEA>
<P>Article 6 of the same regulation is replaced by the following text:</P>
<QUOT.S>
<ARTICLE IDENTIFIER="006">
<TI.ARTICLE><QUOT.START ID="QS0003" REF.END="QE0003" CODE="2018"/>Article 6</TI.ARTICLE>
<ALINEA>This is the new text of article 6.<QUOT.END ID="QE0003" REF.START="QS0003" CODE="2019"/></ALINEA>
</ARTICLE>
</QUOT.S>
</ALINEA>
</ARTICLE>
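Because QUOT.START and QUOT.END are empty elements linked by ID, REF.END and REF.START, a quotation can cross element boundaries (as with the replaced Article 6), so the pairing has to be checked explicitly. A minimal Python sketch that checks pairing and document order and renders the marks, assuming CODE holds the hexadecimal Unicode code point of the quotation mark (2018 = ‘, 2019 = ’):

```python
import xml.etree.ElementTree as ET

QUOTE_TAGS = ("QUOT.START", "QUOT.END")

def check_quotes(root: ET.Element) -> list[str]:
    """Report QUOT.START/QUOT.END elements that are not properly paired."""
    starts = {e.get("ID"): e for e in root.iter("QUOT.START")}
    ends = {e.get("ID"): e for e in root.iter("QUOT.END")}
    # Position of each quotation element in document order
    pos = {e.get("ID"): i for i, e in enumerate(root.iter()) if e.tag in QUOTE_TAGS}
    problems = []
    for sid, start in starts.items():
        eid = start.get("REF.END")
        if eid not in ends:
            problems.append(f"{sid}: missing QUOT.END {eid}")
        elif ends[eid].get("REF.START") != sid:
            problems.append(f"{sid}: {eid} does not point back to it")
        elif pos[sid] > pos[eid]:
            problems.append(f"{sid}: {eid} precedes its start")
    for eid, end in ends.items():
        if end.get("REF.START") not in starts:
            problems.append(f"{eid}: missing QUOT.START {end.get('REF.START')}")
    return problems

def render(el: ET.Element) -> str:
    """Flatten el to text, printing each quotation element as its character."""
    out = [el.text or ""]
    for child in el:
        if child.tag in QUOTE_TAGS:
            out.append(chr(int(child.get("CODE"), 16)))  # assumed: hex code point
        else:
            out.append(render(child))
        out.append(child.tail or "")
    return "".join(out)
```

Rendering the first ALINEA of Article 2 this way reproduces the sentence of the original example, quotation marks included.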

Formex (12)

• Splitting large documents

– Fragmentation by defining inclusions in the main document

– Secondary instances referencing the inclusions by means of the XML entity mechanism

– Inclusions are not necessarily valid (or even well-formed) XML instances

Formex (13)

main.xml

<?xml version="1.0"?>
<doc>
<ti>title</ti>
<chap no="1">
<incl ref="frag-1.frg"/>
</chap>
</doc>

frag-1.frg

<text>…</text><text>…</text>

container.xml

<?xml version="1.0"?>
<!DOCTYPE frag [<!ENTITY cnt SYSTEM "frag-1.frg">]>
<frag>&cnt;</frag>
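The container works because frag-1.frg is an external parsed entity: it may hold several top-level elements, which a single document element then wraps. Python's xml.etree.ElementTree does not expand external entities, so this minimal sketch applies the same idea directly: it wraps the fragment text in a container element and splices the resulting elements in place of each incl element. The incl/ref names come from the example; the load callback is a hypothetical stand-in for reading the .frg files.

```python
import xml.etree.ElementTree as ET
from typing import Callable

def parse_fragment(text: str) -> list[ET.Element]:
    """Parse an inclusion that may contain several top-level elements.

    Wrapping the text in one element mirrors container.xml (<frag>&cnt;</frag>).
    """
    return list(ET.fromstring(f"<frag>{text}</frag>"))

def resolve_inclusions(root: ET.Element, load: Callable[[str], str]) -> ET.Element:
    """Replace every <incl ref="..."/> by the elements of the referenced fragment.

    `load` maps a ref to the fragment's text. This sketch is not recursive,
    and any text following an <incl/> element is dropped.
    """
    for parent in list(root.iter()):
        # Walk children backwards so earlier indices stay valid while splicing
        for i, child in reversed(list(enumerate(parent))):
            if child.tag == "incl":
                parent.remove(child)
                for j, el in enumerate(parse_fragment(load(child.get("ref")))):
                    parent.insert(i + j, el)
    return root
```

To read fragments from disk, `load` could be `lambda ref: Path(ref).read_text(encoding="utf-8")` (with `from pathlib import Path`).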

Formex (14)

• Character set

– OJ publications in 20 (21) languages

– Different alphabets

– International character set: Unicode (UTF-8)

– Definition of allowed character ranges

– Special font ‘EU-Albertina’
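Checking instances against allowed character ranges amounts to scanning code points. A minimal Python sketch; the ranges below are illustrative assumptions covering Latin, Greek and Cyrillic scripts plus general punctuation, not the ranges actually defined for Formex:

```python
# Illustrative ranges only (assumption), not the ranges defined in Formex
ALLOWED: list[tuple[int, int]] = [
    (0x0009, 0x000A), (0x000D, 0x000D),  # tab, line feed, carriage return
    (0x0020, 0x007E),  # Basic Latin (printable)
    (0x00A0, 0x024F),  # Latin-1 Supplement, Latin Extended-A and -B
    (0x0370, 0x03FF),  # Greek and Coptic
    (0x0400, 0x04FF),  # Cyrillic
    (0x2000, 0x206F),  # General Punctuation (includes U+2018 and U+2019)
]

def disallowed(text: str) -> list[tuple[int, str]]:
    """Return (offset, 'U+XXXX') for every character outside ALLOWED."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not any(lo <= ord(ch) <= hi for lo, hi in ALLOWED)
    ]
```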

Formex (15)

• Meta-data

– OJ publications are composed of different levels:

• Publication

• Document

• ‘Contents’

– Meta-data separated according to these levels

Formex (16)

• Publication: meta-data concerning the publication; structure of the publication with references to documents

– Document: meta-data for the document; references to components

• Contents, main part (001)

• Contents, Annex 1 (001.001)

• Contents, Annex 2 (001.002)

– Document: meta-data for the document; references to components

• Contents, main part (002)

ProCat
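The numbering in the figure (001, 001.001, 001.002, 002) suggests that each document has a main part with annexes numbered beneath it. A minimal Python sketch of that hierarchy; the class and field names are hypothetical and the numbering rule is inferred from the figure only:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    number: int                                   # 1 -> "001"
    metadata: dict = field(default_factory=dict)  # document-level meta-data
    annexes: int = 0                              # number of annex components

    def components(self) -> list[str]:
        """Identifiers of the contents components: main part, then annexes."""
        main = f"{self.number:03d}"
        return [main] + [f"{main}.{k:03d}" for k in range(1, self.annexes + 1)]

@dataclass
class Publication:
    metadata: dict = field(default_factory=dict)  # publication-level meta-data
    documents: list[Document] = field(default_factory=list)

    def all_components(self) -> list[str]:
        return [c for d in self.documents for c in d.components()]
```

Keeping meta-data on each level, rather than in one flat record, follows the separation by level described above.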

Formex (17)

• Meta-data (continued)

– Extraction of meta-data by means of automatic processes (pre-notices)

– Extension of pre-notices by legal analysis

– Availability of notices in ProCat for other productions (Celex) and projects

Formex (18)

• Final remark on Formex specifications

– Only a few complete production chains from the author to the printer

– Focus on the publication of the Official Journal

Formex (19)

• Validation of Formex deliveries

– In-depth validation necessary

– Automatic procedures

– Manual procedures

Formex (20)

• Validation of Formex deliveries (continued)

– Automatic procedures

• Control of filename conventions

• Parsing of various components

• Control of completeness

• Execution of additional validation rules

• Comparison of contents between Formex and PDF

⇒ Report (XML instance)
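The automatic procedure ends in a report that is itself an XML instance. A minimal Python sketch of two of the checks (filename convention and completeness) producing such a report; the filename pattern and the REPORT/CHECK element names are hypothetical, not the Publications Office's actual conventions:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical convention: <LANG>_<docno>[.<annex>].xml, e.g. "EN_001.001.xml"
NAME = re.compile(r"^([A-Z]{2})_(\d{3})(?:\.(\d{3}))?\.xml$")

def validate_delivery(filenames, expected) -> ET.Element:
    """Check filenames and completeness; return the report as an XML element."""
    report = ET.Element("REPORT")
    found = set()
    for name in sorted(filenames):
        ok = NAME.match(name) is not None
        ET.SubElement(report, "CHECK", TYPE="filename", FILE=name,
                      RESULT="ok" if ok else "error")
        if ok:
            found.add(name)
    for name in sorted(set(expected) - found):  # expected but not delivered
        ET.SubElement(report, "CHECK", TYPE="completeness", FILE=name, RESULT="error")
    return report
```

`ET.tostring(report, encoding="unicode")` then gives the serialized report for the manual verification step.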

Formex (21)

• Validation of Formex deliveries (continued)

– Manual procedures

• Verification of the report generated by the automatic validation procedure

• Control of the use of Formex specifications in all language versions

⇒ Report (XML instance) = basis for archiving or rejection

Formex (22)

• Conversion of Formex v. 3 into Formex v. 4

– Conversion of character set (ISO 2022 → UTF-8)

– Transformation of SGML instances into well-formed XML instances

– Extraction of tables and conversion into an intermediate model

– Generation of meta-data levels

– Conversion of old elements and generation of new elements

– Validation of the results

Formex (23)

• Specifications:

http://formex.publications.eu.int/

Other areas of XML usage (1)

• Index of OJ publications

– Biannual issues

– Monthly issues

– Extraction from Celex/ProCat

– Transformation into PDF by means of XSLT and XSL FO (biannual version only)

Other areas of XML usage (2)

• Consolidation of legal documents

– Mainly based on Formex

– Additional administrative data in XML

– Relations between historical levels

• Description of the composition of a given historical level

• Concordance of numbering schemes (articles, …) for each level
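A concordance between historical levels can be modelled as a mapping from each article number at one level to its number at the next, or to nothing if the article was deleted. A minimal Python sketch with hypothetical data, which chains such mappings to trace one article across several levels and assumes that unlisted articles keep their number:

```python
from typing import Optional

# Old number -> new number at the next historical level (None = deleted)
Concordance = dict[str, Optional[str]]

def trace(article: str, levels: list[Concordance]) -> list[Optional[str]]:
    """Follow one article number through successive historical levels."""
    path: list[Optional[str]] = [article]
    for conc in levels:
        current = path[-1]
        if current is None:
            path.append(None)  # once deleted, stays deleted
        else:
            # Assumption: articles absent from a concordance are unchanged
            path.append(conc.get(current, current))
    return path
```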

Other areas of XML usage (3)

• Conversion to RTF

– Compatibility with other EU services

– Input in SGML or XML

– Output based on LegisWrite templates

Other areas of XML usage (4)

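SGML instance (Formex v. 3) → Character conversion → Transformation into well-formed XML → Transformation into internal XML format → Transformation into RTF (LegisWrite) → Output in RTF (LegisWrite)

XML instance (Formex v. 4) → Transformation into internal XML format → Transformation into RTF (LegisWrite) → Output in RTF (LegisWrite)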

Other areas of XML usage (5)

• Production of the EU budget

– Creation and maintenance of a common central repository (XML)

– Markup of modified elements during the decision process, in the working language

– Translation of modified parts only

– Update of the repository after publication
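Translating only the modified parts implies selecting the marked elements, sending them for translation and merging the results back by identifier. A minimal Python sketch, assuming a hypothetical MODIFIED="yes" attribute and ID attributes on plain-text elements (the actual markup of the budget repository is not described here):

```python
import xml.etree.ElementTree as ET

def extract_for_translation(root: ET.Element) -> dict[str, str]:
    """Return {ID: text} for elements marked as modified (working language)."""
    return {
        el.get("ID"): el.text or ""
        for el in root.iter()
        if el.get("MODIFIED") == "yes"
    }

def merge_translations(root: ET.Element, translations: dict[str, str]) -> ET.Element:
    """Write translated text into a language version and clear the mark."""
    for el in root.iter():
        if el.get("ID") in translations:
            el.text = translations[el.get("ID")]
            el.attrib.pop("MODIFIED", None)
    return root
```

Unmarked elements are never sent out, so their existing translations in each language version stay untouched.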

Other areas of XML usage (6)

Figure: actors and systems in budget production (budget services, translation service, Publications Office, budget XML repository, printer, Formex archive; pre-printing and post-printing phases)

Other areas of XML usage (7)

• ‘Secondary legislation’

– Publication of legislation in force in the ‘new’ languages

– XML production on the basis of the Formex archive

– Transformation of translated input

– Transformation of Formex instances from SGML into XML

– Merging of XML instances

Other areas of XML usage (8)

Word document → Conversion into XML → Extraction of text

Formex archive → Conversion into XML → Extraction of skeleton

Merging of skeleton & text → Simplification of structure → Publication, ProCat, Celex
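The merge step in the figure can be pictured as filling a structural skeleton, taken from an existing language version in the Formex archive, with the translated text extracted from the Word document. A minimal Python sketch, assuming hypothetical {SLOTn} placeholders and that the translated segments arrive in the same order as the original leaf texts:

```python
import xml.etree.ElementTree as ET

def extract_skeleton(root: ET.Element) -> tuple[ET.Element, list[str]]:
    """Replace the text of each leaf element by a numbered placeholder."""
    texts: list[str] = []
    for el in root.iter():
        if len(el) == 0 and (el.text or "").strip():
            texts.append(el.text)
            el.text = f"{{SLOT{len(texts) - 1}}}"  # e.g. "{SLOT0}"
    return root, texts

def merge(skeleton: ET.Element, translated: list[str]) -> ET.Element:
    """Fill each placeholder with the translated segment of the same index."""
    for el in skeleton.iter():
        t = el.text or ""
        if t.startswith("{SLOT") and t.endswith("}"):
            el.text = translated[int(t[5:-1])]
    return skeleton
```

This handles leaf text only; mixed content and the later structure simplification are left out of the sketch.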

Other areas of XML usage (9)

• European document repository

– TIFF of publications

– PDF of publications

– Formex instances of OJ publications

– Exchange of information by XML messages

Other areas of XML usage (10)

• Publication of calls for tender (OJ-S)

– Input in different (electronic) formats

– Harmonization in XML

– Updating of the TED database

– Production of a CD-ROM version

Conclusion

• Difficult start with SGML

• Successful use of XML as well as of other standards such as XSLT/XPath, XSL FO

• Powerful possibilities for re-using XML instances