ETD2004, June 3-5 2004 University of Kentucky, Lexington
From DTD generation to XML conversion:
Structured ETDs at the Document and Publication Server of Humboldt University
Uwe Müller
Humboldt University, Berlin
Electronic Publishing Group
u.mueller@cms.hu-berlin.de
Uwe Müller, Electronic Publishing Group, CMS / UB Humboldt University, Berlin
ETD2004, June 3-5 2004 University of Kentucky, Lexington
From DTD generation to XML conversion:
Structured ETDs at Humboldt's EDoc Server
Background
• Humboldt University: 800 to 1,000 dissertations / year
• Germany: duty to publish dissertations
  – traditional methods:
    • publishing house
    • microfiche
    • 40 … 200 printed copies (depending on faculty regulations)
• Humboldt U.: not mandatory to submit an ETD
• ~ ¼ of dissertations published electronically
• XML as central strategy
Why XML?
• Standardized format
• Long term preservation
• easily convertible to
  – presentation formats (HTML, PDF)
  – other XML structures
• qualified full text retrieval
• contains structural and contextual information
  – in a machine readable format
[Diagram labels: XML; HTML, PDF, and Office document, each with a digital signature]
XML: Restrictions to deal with
• XML source does not contain layout information
• rather linear structure
• XML is not used as Authoring System
  – authors use their 'own' systems
    • Microsoft Word
    • LaTeX
    • Open Office / Star Office
    • Framemaker
    • Word Perfect
How to bridge the gap?
• meet the authors where they are …
• instructions and guidelines for authors
  – usage of style files (e.g., dissertation-hu.dot)
  – manuals, support hotline, regular courses
• different conversion processes
  – SGML author (plug in for MS Word <= 97)
  – Open Office / Star Office
• exploit genuine XML format
  – MS Office 2003: XML according to DiML DTD
  – common pitfalls: tables, pictures
Conversion Process Using Open Office
[Diagram: conversion pipeline. Files shown, in order of appearance: example.doc (opened in Open Office), example.sxw (zip file), content.xml, example_stl.xml, example.xml with its parts front.xml and chapter1.xml to chapter3.xml, and example.html (plus *.gif and *.jpg images) with its parts front.html and chapter1.html to chapter3.html]
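The first step of this pipeline relies on the fact that an .sxw file is a zip archive that contains content.xml. A minimal Python sketch of that unpacking step is given here. The function names and the tiny demo archive are illustrative assumptions, not the converter actually used on the EDoc server.

```python
import io
import zipfile


def extract_content_xml(sxw_bytes: bytes) -> bytes:
    """Return the raw content.xml stored inside an .sxw (zip) archive."""
    with zipfile.ZipFile(io.BytesIO(sxw_bytes)) as archive:
        return archive.read("content.xml")


def make_demo_sxw() -> bytes:
    """Build a tiny stand-in archive so the sketch runs without a real file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Placeholder content; a real content.xml is far richer.
        archive.writestr("content.xml", "<doc><p>Hello</p></doc>")
        archive.writestr("styles.xml", "<styles/>")
    return buffer.getvalue()


if __name__ == "__main__":
    print(extract_content_xml(make_demo_sxw()).decode("utf-8"))
```

For a file on disk, the same call works with the bytes read from example.sxw. The later steps (style mapping, DiML generation, HTML output) are not covered by this sketch.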
Principal Structure of a DiML document
<etd>
  <front>..title...author...abstract...</front>
  <body>
    <chapter>
      <section> ...
  </body>
  <back>..bibliography...appendix...vita...</back>
</etd>
From flat structure to Hierarchy
• only two types of styles in Word
  – paragraph styles
  – character styles
• e.g., for the first occurring Heading 1 paragraph style, the converter has to know
  – Heading 1 is the beginning of a chapter
  – Heading 1 implies a head element
  – the element chapter can only occur in body
</front>
<body>
  <chapter>
    <head id="anyID">Introduction</head>
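The rules above can be sketched as a small Python routine that walks a flat list of (paragraph style, text) pairs and opens the implied elements. This is only an illustration under assumptions: the function name, the input format, the mapping of Heading 2 to section, and the use of p for all other paragraphs are not taken from the actual converter.

```python
import xml.etree.ElementTree as ET


def build_body(paragraphs):
    """Turn a flat list of (style, text) pairs into nested DiML-like elements."""
    body = ET.Element("body")
    chapter = None  # currently open <chapter>, if any
    section = None  # currently open <section>, if any
    for style, text in paragraphs:
        if style == "Heading 1":
            # Heading 1 starts a chapter inside body and implies a head.
            chapter = ET.SubElement(body, "chapter")
            section = None
            ET.SubElement(chapter, "head").text = text
        elif style == "Heading 2":
            # Assumption: Heading 2 maps to a section inside the open chapter.
            if chapter is None:
                raise ValueError("Heading 2 before any Heading 1")
            section = ET.SubElement(chapter, "section")
            ET.SubElement(section, "head").text = text
        else:
            # Assumption: every other paragraph style becomes a plain <p>.
            parent = section if section is not None else chapter
            if parent is None:
                raise ValueError("text before the first heading")
            ET.SubElement(parent, "p").text = text
    return body


if __name__ == "__main__":
    flat = [
        ("Heading 1", "Introduction"),
        ("Standard", "First paragraph."),
        ("Heading 2", "Motivation"),
        ("Standard", "More text."),
        ("Heading 1", "Methods"),
    ]
    print(ET.tostring(build_body(flat), encoding="unicode"))
```

The converter has to carry state (which chapter or section is open) because the Word source itself only records a style per paragraph, not the nesting.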
One Core – Multiple Views
• HTML generation (static or dynamic)
  – performance problems with XSLT and huge documents
  – solution: division of XML sources into components (easier and faster to process)
• PDF + Print on Demand (http://www.proprint-service.de)
• Current problems
  – changing Office systems and versions
    • ongoing implementations and adaptations necessary
    • but: might be restricted to XSL coding
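The division of an XML source into components can be sketched in Python as follows. The component names front.xml and chapterN.xml follow the conversion diagram; the function name and the choice of returning strings rather than writing files are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def split_into_components(etd_xml: str) -> dict:
    """Map component file names to the serialized front matter and chapters."""
    root = ET.fromstring(etd_xml)
    components = {}
    front = root.find("front")
    if front is not None:
        components["front.xml"] = ET.tostring(front, encoding="unicode")
    # One component per chapter, numbered in document order.
    for number, chapter in enumerate(root.iterfind("body/chapter"), start=1):
        components[f"chapter{number}.xml"] = ET.tostring(chapter, encoding="unicode")
    return components


if __name__ == "__main__":
    sample = (
        "<etd><front><title>T</title></front>"
        "<body><chapter><head>A</head></chapter>"
        "<chapter><head>B</head></chapter></body></etd>"
    )
    for name, xml_text in split_into_components(sample).items():
        print(name, xml_text)
```

Each component can then be transformed on its own, so a stylesheet never has to load the whole dissertation at once.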
Towards a universal DTD?
• DiML
  – originally taken from an SGML DTD at Virginia Tech ("ETD"), http://edoc.hu-berlin.de/diml
  – already many elements (> 100)
  – combines elements of different description levels
  – extended and adapted to local needs
• special requirements from several departments (e.g., literature / dramatics, humanities, geography, …)
• necessity to include external DTDs (e.g., CALS-Table, MathML, MusicML, …)
• publication types other than theses and dissertations
  – conference proceedings, electronic journals, other series, …
• first approach: extend DTD aiming at a universal 'mega' DTD
  – problems: complexity, difficult maintenance
• other possibility: create a completely new DTD for each purpose
  – loss of interoperability
Modular DTD Approach
• idea: individually adapted DTDs
  1. split up DTD into modules, such as
     – text, structure, citation, dramatics
  2. handle external DTDs as modules as well, e.g.,
     – MathML, MusicML, CALS-Table
  3. recombine a DTD out of user selected modules
• result
  a. a DTD with only the needed elements and modules
  b. individual reference and sample documents
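Step 3, recombining a DTD out of user selected modules, can be illustrated with a short Python sketch. The module store, the element declarations in it, and the function name are hypothetical; the real modules are XML files processed with XSL and Java.

```python
# Hypothetical module store: module name -> DTD fragment.
# The element declarations are invented for illustration only.
MODULE_FRAGMENTS = {
    "text": "<!ELEMENT em (#PCDATA)>\n<!ELEMENT strong (#PCDATA)>",
    "structure": "<!ELEMENT chapter (head, section*)>\n<!ELEMENT section (head, p*)>",
    "citation": "<!ELEMENT citation (#PCDATA)>",
}


def recombine_dtd(selected):
    """Concatenate the fragments of the selected modules into one DTD string."""
    unknown = [name for name in selected if name not in MODULE_FRAGMENTS]
    if unknown:
        raise KeyError(f"unknown modules: {unknown}")
    parts = []
    for name in selected:
        parts.append(f"<!-- module: {name} -->")
        parts.append(MODULE_FRAGMENTS[name])
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    print(recombine_dtd(["text", "structure"]))
```

Only the selected modules contribute declarations, which is what keeps the resulting DTD limited to the needed elements.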
Modular DTD Approach: Benefits
• modules are easily maintainable
  – distributed development
  – version numbers for each module
• reusability
  – define (several) styles for each module
  – reference information for each module
• support different languages
• get a DTD that exactly fits your needs
DTDSys: Principal Architecture
• modules: small packages of elements that belong together
• stored in separate files in the DTDBase
• include metadata, e.g., descriptive information, version numbers, and dependences on other modules
• DTDSys generates DTD and reference files using
  – XSL / XSLT
  – Java
  – Web Interfaces
Modules and Dependences
Module      Elements
text        br, em, strong, sup, sub, u, tt, pre
common      p, head, caption, url, name, foreign…
structure   chapter, section, subsection…
citation    quotations and references
documents   page numbers, footnotes, endnotes, …
diml        front, body, back, abstract…
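When a user selects modules, every module they depend on has to be pulled in as well. A minimal Python sketch of that resolution step is given here. The module names come from the table, but the dependence edges in DEPENDS_ON are assumed for illustration, since the slide text does not preserve them, and the function name is hypothetical.

```python
# Assumed dependences between the modules named above (illustrative only).
DEPENDS_ON = {
    "text": [],
    "common": ["text"],
    "structure": ["common"],
    "citation": ["common"],
    "documents": ["common"],
    "diml": ["structure", "documents"],
}


def resolve(selected, depends_on=DEPENDS_ON):
    """Return the selected modules plus all dependences, dependences first."""
    ordered = []
    visiting = set()

    def visit(name):
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"circular dependence at {name}")
        visiting.add(name)
        for dependence in depends_on[name]:
            visit(dependence)
        visiting.discard(name)
        ordered.append(name)

    for name in selected:
        visit(name)
    return ordered


if __name__ == "__main__":
    print(resolve(["diml"]))
```

Listing dependences before the modules that need them means a DTD assembled in this order never refers to an element that has not been declared yet.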
DTD Generation Process
[Diagram: module files in the DTDBase (shown as module-text.xml), each including element info, description, and dependences, are processed in XSL and Java+XSL steps. Files shown: dependences.html, selection.xml, full-dtd.xml, xdiml.dtd, dtd-reference.xml, and the reference pages p.php and chapter.php]
Outlook
• SCOPE = Service Core for Open Publishing Environments
  – development of Publication Components (authoring tools, conversion mechanisms, layout and style definitions)
  – management system to maintain versions and dependences
  – publication system
  – workflow component
• Long Term Preservation activities
  – Implementation of OAIS reference model
  – Sun Center of Excellence
Thanks
to Sabine Henneberger, Jakob Voß, Matthias Schulz
Thank you!
Questions?
u.mueller@cms.hu-berlin.de
http://edoc.hu-berlin.de/