Author Generated JATS XML Markup
Andy Gajetzki, CIO, ispub.com
Olivier Wenker, MD, MBA, Founder and CEO, ispub.com
How We Started
• Co-founded Worldwide Cars Online in 1990
  – Sent images of cars and car parts via CompuServe emails (modem speed 7kb/sec)
  – No official Internet
  – Closed the company in 1994
• Created online content while at Baylor in 1994
• Netscape goes public in 1995
• Officially launched 1st online journal in 1995
How We Continued
• Started with The Internet Journal of Anesthesiology
• Added more journals over time
• All were open access from the beginning (no registration required as reader)
• Some of the first articles were submitted in print via mail and I retyped them with Word
• Articles were then submitted to me via email (attached as Word document)
How We Continued
• Initially used a Mosaic browser tool and then a Netscape browser tool to create HTML for the web pages
• Then used the 1st version of FrontPage to create a more complex web site
• We decided in 1997 to convert Word documents into SGML data sets and then to use XML in 1998
What We Are Today
• We currently publish 82 titles (online medical journals) at www.ispub.com
• We use our own article submission system (home-grown) at www.quickmedpub.com
• We just implemented a new backend for article submissions and article flow
• We decided to have authors generate much of the markup
And Now Let's Get Technical
Author Generated JATS XML Markup
by Andy Gajetzki
What is our JATS editor?
• Represents a move to author generated markup for our XML
• Based on a customizable and reusable PHP component
  – Symfony2, a popular PHP framework
• Easy to use
  – Form based, WYSIWYG and linear workflow
Our old workflow
• How we used to do things:
• Three separate workflows for each article:
  1. Header generation
  2. Body markup
  3. Conversion from proprietary XML to JATS as the last step
Word Macros
Problems with our current method
• Time consuming
  – Delays in publishing
• Error prone
  – Data entry is performed by programmers
• Authors don't like the delay to publish and the delay to correct errors
Design Rationale
• We can't support the whole spec.
  – How did we determine what to support?
• Statistical analysis of most markup in our current article corpus
• How can we offload as much markup to the author as possible but still have a clean and intelligible end product?
What is supported
• NLM Blue 3.0
• Two separate support levels
  – Inline-level
  – Block-level
• Our level of JATS support is determined by each level.
Inline Level
• Italics, bold, and all other presentation layer markup supported
Block level
• Single level sections only, as the WYSIWYG editor is based on the HTML DOM
  – Other tools providing a more XML approach are expensive, and more difficult for the author to use
• General structure is
  <sec>
    <title>
    <xyz>
• <sec> -> boxed-text, fig, graphic, preformat, table-wrap, p, list
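
The block-level rule above (a single level of sections, a title first, then a fixed set of block elements) can be sketched as a small check. This is a minimal illustration in Python, not the editor's actual PHP code; the function name `check_section` and the wording of the messages are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Block elements listed on the slide as allowed inside <sec>.
ALLOWED_CHILDREN = {"boxed-text", "fig", "graphic", "preformat",
                    "table-wrap", "p", "list"}


def check_section(sec_xml):
    """Return a list of problems found in one <sec> element (empty if none)."""
    sec = ET.fromstring(sec_xml)
    if sec.tag != "sec":
        return ["root element is <%s>, expected <sec>" % sec.tag]

    problems = []
    children = list(sec)
    has_title = bool(children) and children[0].tag == "title"
    if not has_title:
        problems.append("first child of <sec> must be <title>")

    # Everything after the title must be one of the supported block elements.
    body = children[1:] if has_title else children
    for child in body:
        if child.tag == "sec":
            problems.append("nested <sec> is not supported (single level only)")
        elif child.tag not in ALLOWED_CHILDREN:
            problems.append("<%s> is not a supported block element" % child.tag)
    return problems
```

A section such as `<sec><title>Methods</title><p>Text</p></sec>` passes with an empty list, while a nested `<sec>` is reported as a problem.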
Titles
• Support of presentational elements with, for the most part, a non-mixed content-type
Contributors
• Flexible
• Single / collaborative authors
• Most JATS <contrib-group> markup supported
• Inline-level formatting in block elements
Keywords
• Keywords should be based on MeSH entries
• Validation constraints can be applied based on that
Other article-meta
• Article IDs
• Author notes
• Supplemental content
• Funding/grants
• Article history
• Permissions
Abstract / Body / Appendices
• Currently a moving target
• MathML is not currently supported
• Current subset of JATS covers 99% of our cases, but we will always try to expand coverage
• WYSIWYG HTML Editor
• Utilize a specific subset of HTML that we can unambiguously map to JATS via data transformations
  – XSLT
  – regexp
• If no mapping is possible, another method must be devised
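
The idea of an unambiguous HTML-to-JATS mapping can be illustrated with a small sketch. The slides name XSLT and regular expressions as the actual mechanisms; the Python below swaps in a simple element-renaming pass for brevity. The mapping table, the function name `html_paragraph_to_jats`, and the assumption that the editor emits well-formed XHTML are all illustrative, not taken from the talk.

```python
import xml.etree.ElementTree as ET

# Assumed subset of inline HTML and the JATS element each one maps to.
INLINE_MAP = {
    "i": "italic", "em": "italic",
    "b": "bold", "strong": "bold",
    "u": "underline",
    "sub": "sub", "sup": "sup",
}


def html_paragraph_to_jats(html_p):
    """Rename inline HTML elements in one well-formed <p> to JATS names."""
    p = ET.fromstring(html_p)
    if p.tag != "p":
        raise ValueError("expected a <p> element, got <%s>" % p.tag)
    for el in p.iter():
        if el is p:
            continue
        if el.tag not in INLINE_MAP:
            # No mapping is possible: fail loudly so another method can be used.
            raise ValueError("no JATS mapping for <%s>" % el.tag)
        el.tag = INLINE_MAP[el.tag]
        el.attrib.clear()  # drop HTML-only attributes such as style
    return ET.tostring(p, encoding="unicode")


print(html_paragraph_to_jats("<p>The <i>E. coli</i> strain was <b>resistant</b>.</p>"))
```

Rejecting unmapped elements instead of passing them through keeps the output inside the supported JATS subset, which matches the last bullet above.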
Images / Table Capture / Media
• Images / Figures are handled via out-of-band file upload on a separate page
• Authors are requested to upload the highest quality format that they can
• Tables can either be captured as an image, or inserted via a Word-style table creation tool
• Other media types have not been implemented yet
Endnote Handling – Document references
• JavaScript annotation tool
• Endnote number / reference is highlighted in the text and a resolution is made to a back-matter citation entry
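
Resolving an in-text endnote number to a back-matter citation can be sketched as follows. The real tool is a JavaScript annotation tool working on highlighted text; this Python version only illustrates the resolution step. The bracketed marker style `[n]`, the `ref<n>` id scheme, and the function name `link_endnotes` are assumptions for the example.

```python
import re

# Assumed marker style: a citation number in square brackets, e.g. "[3]".
MARKER = re.compile(r"\[(\d+)\]")


def link_endnotes(text, citation_count):
    """Replace [n] markers with JATS <xref> links; collect unresolved numbers.

    text is plain text (no XML escaping is done in this sketch).
    citation_count is the number of entries in the back-matter citation list.
    """
    unresolved = []

    def replace(match):
        n = int(match.group(1))
        if 1 <= n <= citation_count:
            return '<xref ref-type="bibr" rid="ref%d">%d</xref>' % (n, n)
        unresolved.append(n)      # no matching citation entry
        return match.group(0)     # leave the marker untouched

    return MARKER.sub(replace, text), unresolved


linked, problems = link_endnotes("Seen before [1], but see [3].", 2)
print(linked)
print(problems)
```

A non-empty `problems` list corresponds to endnote problems the author would still have to resolve before submission.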
Supported Back Matter
• Acknowledgments
• Appendices
• Biography
• Glossaries
• Citations
• Notes
  – Content-type attribute of note element supported
Citation Handling – Back matter
• One citation per line
• Regular expression search for meta-data service identifiers at PMC and Crossref
  – If a match is found, correct metadata is pulled from the service
• Simple JavaScript annotation tool to tokenize citation string
• Before submission, author must resolve all endnote problems
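
The regular expression search over one-citation-per-line input might look roughly like the sketch below. The patterns for DOIs (the identifiers Crossref resolves), PMIDs and PMCIDs are common approximations chosen for this example, not the expressions used in the editor, and the sample citations are made up. The metadata lookup at the service is left out.

```python
import re

# Approximate identifier patterns (assumptions, not the editor's own regexes).
DOI = re.compile(r"\b10\.\d{4,9}/\S+")
PMID = re.compile(r"\bPMID:?\s*(\d+)", re.IGNORECASE)
PMCID = re.compile(r"\bPMC\d+\b")


def find_identifiers(citations_text):
    """Scan a block of text with one citation per line for known identifiers."""
    results = []
    for line in citations_text.splitlines():
        line = line.strip()
        if not line:
            continue
        ids = {}
        match = DOI.search(line)
        if match:
            # Trim sentence punctuation that often trails a DOI.
            ids["doi"] = match.group(0).rstrip(".,;)")
        match = PMID.search(line)
        if match:
            ids["pmid"] = match.group(1)
        match = PMCID.search(line)
        if match:
            ids["pmcid"] = match.group(0)
        results.append({"citation": line, "ids": ids})
    return results


# Made-up citations, for illustration only.
sample = """Doe J. Example title. J Example. 2010;1:1-2. doi:10.1234/example.5678.
Roe A. Another title. PMID: 12345
Unidentified source, 1999."""
for entry in find_identifiers(sample):
    print(entry["ids"])
```

Lines that come back with an empty `ids` dictionary are the ones that would fall through to the manual tokenization tool.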
Citation Tokenization Example
From browser to JATS XML
• The block level components operate on the HTML DOM
• CSS classes are added to elements to distinguish content types
• Through various transformations, we interpret the resultant DOM and produce the JATS XML
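
One way to picture the DOM interpretation step is the sketch below, which reads a well-formed XHTML fragment and uses a CSS class to decide which JATS block element to emit. The class name `boxed-text`, the use of `<h2>` for the section title, and the function name `dom_to_sec` are hypothetical; the actual class names and transformations are not given in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical conventions for this sketch.
CLASS_TO_JATS = {"boxed-text": "boxed-text"}                 # <div class="...">
TAG_TO_JATS = {"h2": "title", "p": "p", "pre": "preformat"}  # plain HTML tags


def dom_to_sec(html_fragment):
    """Convert one editor section (a wrapping <div>) into a JATS <sec> string.

    Inline markup is flattened to text here to keep the example short.
    """
    root = ET.fromstring(html_fragment)
    sec = ET.Element("sec")
    for node in root:
        css_class = node.get("class", "")
        if node.tag == "div" and css_class in CLASS_TO_JATS:
            # The CSS class tells us which JATS container this <div> stands for.
            block = ET.SubElement(sec, CLASS_TO_JATS[css_class])
            for child in node:
                para = ET.SubElement(block, "p")
                para.text = "".join(child.itertext())
        elif node.tag in TAG_TO_JATS:
            out = ET.SubElement(sec, TAG_TO_JATS[node.tag])
            out.text = "".join(node.itertext())
        else:
            raise ValueError("no JATS mapping for <%s class=%r>" % (node.tag, css_class))
    return ET.tostring(sec, encoding="unicode")


html = ('<div class="section"><h2>Results</h2><p>Main text.</p>'
        '<div class="boxed-text"><p>Key point.</p></div></div>')
print(dom_to_sec(html))
```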
HTML mapping JATS XML
Validation
• When things go wrong
  1) XSD validation
     – Intervention required by staff
  2) Style/presentation problems
     – Intervention required by author/staff
  3) Copy editing
  4) Peer review
Amazon Mechanical Turk
• For predictable failures, Amazon Mechanical Turk, a platform for "human intelligence tasks", can be used
• For a small price, work units are created and human workers get paid to perform the task
  – 24x7 availability
Summary
Contact For Questions
Technical questions: Andy Gajetzki
General questions: Olivier Wenker, MD, MBA