
Bookshelf: Leafing through XML

NLM Journal Article Tag Suite Conference 2010

Martin Latterner and Marilu Hoeppner

National Center for Biotechnology Information

National Library of Medicine

Latterner M and Hoeppner MA. Bookshelf: Leafing through XML. JATS-CON 2010

NLM Book DTD v2.3

NLM Collection Catalog

PubMed Abstracts

Electronic Literature Archive

Books, Monographs, Reports

Journals

Other publication formats

Book chapters, Monographs, Reports

Books in PubMed

Non-PubMed Books

User guides, Documentation

Journal articles PMC Journals PubMed Central

Bookshelf

Entrez Literature Resources

• Features of the Book DTD
• Books and journals within PubMed Central
• Bookshelf Workflows
• Integration of information between databases

Modifications

• Allowed icon as a child of exlnk.
• Allowed pre as a child of entry.
• Allowed glossary as a child of chapter.
• Added type: ppt.
• Added attributes id and BID to <foot>.
• Added attribute id to <p>.
• Added <title>, child of <bibsect>.
• Added <bb>, <gf> and <figgrp> as children of <linkgrp>.
• Added <email> as child of <txtstyle>.
• Added <pdf> as child of <glossary>.
• Added <figgrp1> as child of <entry>.
• …

NCBI Book DTD 1.0

Based on ISO 12083 Article DTD

March 2003: v1.0

December 2004: v2.0

November 2005: v2.1

BOOKSHELF XML DATA

NCBI BOOK DTD

Book DTD of the NLM Journal Article Tag Suite

Designed to capture the semantic elements of the content, not form

e.g. bibliographic metadata

<front>
  <div type="titlepage" level="1" id="2001902bddd00001">
    <booktitle>
      <ils style="strong">CONFLICT OF INTEREST IN MEDICAL RESEARCH</ils>
    </booktitle>
    <bookauthor>
      <bookauthor.name>Committee on Conflict of Interest in Medical Research</bookauthor.name>
      <bookauthor.info>Board on Health Sciences Policy</bookauthor.info>
      <bookauthor.info>INSTITUTE OF MEDICINE <ils style="smallcap"> <ils style="emphasis"> OF THE NATIONAL ACADEMIES</ils> </ils> </bookauthor.info>
    </bookauthor>
    <publication.stmt>
      <p style="center">
        <publisher>
          <publisher.name>THE NATIONAL ACADEMIES PRESS</publisher.name>
          <publisher.address><state>Washington, D.C.</state></publisher.address>
        </publisher>
      </p>
    </publication.stmt>
    <page number="ii" id="2001902bppp00002"/>
  </div>
  <div type="copyrightpage" level="1" id="2001902bddd00002">
    <publication.stmt>
      <p style="normal">
        <publisher>
          <publisher.name><ils style="strong">THE NATIONAL ACADEMIES PRESS</ils></publisher.name>
          <publisher.address>
            <street><ils style="strong">500 Fifth Street, N.W.</ils></street>
            <state><ils style="strong">Washington, DC</ils></state>
            <postcode><ils style="strong">20001</ils></postcode>
          </publisher.address>
        </publisher>
      </p>
    </publication.stmt>
    <publication.stmt>
      <p style="flindent">ISBN <isbn>978-0-309-13188-9</isbn> (hardcover)</p>
    </publication.stmt>
    <copyright>Copyright <copyright.year>2009</copyright.year> by the <copyright.holder>National Academy of Sciences</copyright.holder>. All rights reserved.</copyright>
    <printinfo>
      <print>Printed in the United States of America</print>
    </printinfo>
  </div>
</front>

<book-meta>
  <book-title-group>
    <book-title>Conflict of Interest in Medical Research</book-title>
  </book-title-group>
  <contrib-group>
    <contrib contrib-type="author">
      <collab>Institute of Medicine (US) Committee on Conflict of Interest in Medical Research, Education, and Practice</collab>
    </contrib>
  </contrib-group>
  <publisher>
    <publisher-name>National Academies Press (US)</publisher-name>
    <publisher-loc>Washington (DC)</publisher-loc>
  </publisher>
  <isbn>978-0-309-13188-9</isbn>
  <pub-date pub-type="ppub">
    <year>2009</year>
  </pub-date>
  <permissions>
    <copyright-statement>Copyright &copy; 2009, National Academy of Sciences</copyright-statement>
    <copyright-year>2009</copyright-year>
  </permissions>
</book-meta>
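Because the bibliographic metadata is tagged semantically, a program can read it without guessing at layout. The sketch below is only an illustration, not Bookshelf code: it parses a trimmed copy of the <book-meta> record with Python's standard library. The function name extract_metadata is hypothetical, and the <permissions> block is left out because the &copy; entity would need the DTD to resolve.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <book-meta> record shown on the slide.
BOOK_META = """<book-meta>
  <book-title-group>
    <book-title>Conflict of Interest in Medical Research</book-title>
  </book-title-group>
  <publisher>
    <publisher-name>National Academies Press (US)</publisher-name>
    <publisher-loc>Washington (DC)</publisher-loc>
  </publisher>
  <isbn>978-0-309-13188-9</isbn>
  <pub-date pub-type="ppub">
    <year>2009</year>
  </pub-date>
</book-meta>"""


def extract_metadata(xml_text):
    """Return key bibliographic fields from a <book-meta> element (hypothetical helper)."""
    meta = ET.fromstring(xml_text)
    return {
        "title": meta.findtext("book-title-group/book-title"),
        "publisher": meta.findtext("publisher/publisher-name"),
        "location": meta.findtext("publisher/publisher-loc"),
        "isbn": meta.findtext("isbn"),
        "year": meta.findtext("pub-date/year"),
    }


if __name__ == "__main__":
    for field, value in extract_metadata(BOOK_META).items():
        print(f"{field}: {value}")
```

Each field is addressed by element name alone, which is the practical payoff of tagging content rather than form.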

More granular text descriptions are handled at attribute level

e.g. preface, foreword

<sec sec-type="preface">

DTD v3.0 Elements

Article

<abbrev-journal-title>, <article>, <article-categories>, <article-id>, <article-meta>, <conf-acronym>, <conference>, <conf-num>, <conf-theme>, <floats-group>, <front>, <front-stub>, <issue-sponsor>, <journal-meta>, <journal-subtitle>, <journal-title>, <journal-title-group>, <response>, <series-text>, <series-title>, <string-conf>, <sub-article>, <unstructured-kwd-group>, <x>

Book

<alternate-form>, <area>, <book>, <book-front>, <book-meta>, <book-part>, <book-part-categories>, <book-part-meta>, <book-title>, <book-title-group>, <collection>, <collection-id>, <collection-list>, <collection-member>, <collection-meta>, <collection-name>, <map>, <map-group>, <multi-link>

<map-group>

XML

<map-group id="my-map-id">
  <graphic xlink:href="img-uri"/>
  <map map-name="my-map">
    <area map-shape="rect" map-coords="1,1,51,76" xlink:href="uri1"/>
    <area map-shape="rect" map-coords="54,4,94,74" xlink:href="uri2"/>
  </map>
</map-group>

XHTML

<img src="img-uri" usemap="#my-map-id"/>
<map id="my-map-id" name="my-map">
  <area href="uri1" shape="rect" coords="1,1,51,76"/>
  <area href="uri2" shape="rect" coords="54,4,94,74"/>
</map>
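The XML-to-XHTML mapping on this slide is mechanical enough to sketch in a few lines. The following is a minimal illustration in Python, not the XSLT that Bookshelf actually uses: the function name map_group_to_xhtml is hypothetical, and the xmlns:xlink declaration is added to the sample as an assumption, since a standalone fragment needs it to parse.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK}}}href"  # ElementTree's qualified name for xlink:href

# The slide's <map-group>, with an xlink namespace declaration added (assumption).
SOURCE = f"""<map-group id="my-map-id" xmlns:xlink="{XLINK}">
  <graphic xlink:href="img-uri"/>
  <map map-name="my-map">
    <area map-shape="rect" map-coords="1,1,51,76" xlink:href="uri1"/>
    <area map-shape="rect" map-coords="54,4,94,74" xlink:href="uri2"/>
  </map>
</map-group>"""


def map_group_to_xhtml(map_group):
    """Build the XHTML <img> and <map> elements for one <map-group> element."""
    group_id = map_group.get("id")
    img = ET.Element("img", {
        "src": map_group.find("graphic").get(HREF),
        "usemap": f"#{group_id}",
    })
    src_map = map_group.find("map")
    xhtml_map = ET.Element("map", {"id": group_id, "name": src_map.get("map-name")})
    for area in src_map.findall("area"):
        ET.SubElement(xhtml_map, "area", {
            "href": area.get(HREF),
            "shape": area.get("map-shape"),
            "coords": area.get("map-coords"),
        })
    return img, xhtml_map


if __name__ == "__main__":
    for element in map_group_to_xhtml(ET.fromstring(SOURCE)):
        print(ET.tostring(element, encoding="unicode"))
```

The id on <map-group> does double duty here: it becomes both the usemap target on the image and the id of the generated map, matching the XHTML shown on the slide.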

<multi-link>

XML

<multi-link>
  <term>IDDM2</term>
  <ext-link ext-link-type="url" xlink:href="LINK1">Bookshelf</ext-link>
  <ext-link ext-link-type="url" xlink:href="LINK2">PubMed Central</ext-link>
  …
</multi-link>
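To show what a consumer of <multi-link> has to work with, here is a small Python sketch that reads one term and its link targets. It is an assumption-laden illustration: read_multi_link is a hypothetical name, the xlink namespace declaration is added so the fragment parses on its own, and only the two links visible on the slide are included. How Bookshelf renders the result is not stated in the slides.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# The slide's example, limited to the two links it shows explicitly.
SOURCE = f"""<multi-link xmlns:xlink="{XLINK}">
  <term>IDDM2</term>
  <ext-link ext-link-type="url" xlink:href="LINK1">Bookshelf</ext-link>
  <ext-link ext-link-type="url" xlink:href="LINK2">PubMed Central</ext-link>
</multi-link>"""


def read_multi_link(multi_link):
    """Return the term and its (label, target) pairs from a <multi-link> element."""
    term = multi_link.findtext("term")
    links = [
        (link.text, link.get(f"{{{XLINK}}}href"))
        for link in multi_link.findall("ext-link")
    ]
    return term, links


if __name__ == "__main__":
    term, links = read_multi_link(ET.fromstring(SOURCE))
    print(term)
    for label, target in links:
        print(f"  {label} -> {target}")
```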

DTD v3.0 Attributes

Article

abbrev-type, article-type, response-type

Book

alternate-form-type, book-id, book-part-number, book-part-type, graphic-type (obsolete), indexed, map-alt, map-coords, map-name, map-shape, primary, qualifier, taxonomic-id

Books & Journals in PubMed Central

Source Conversion

(1) Third-party vendor services: Tagging rules for journals can be applied to book content, especially for lower-level document objects:

• Citations
• Figures
• Tables

(2) In-house conversion: For content submitted in external DTDs, PMC journal modules are reused to handle:

• Dates
• Strings
• CALS to XHTML table conversion
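As an illustration of the last item, CALS to XHTML table conversion, the sketch below maps the basic CALS structure (tgroup, thead/tbody, row, entry) onto XHTML table markup. It is a deliberately simplified Python stand-in, not the PMC module: the function name cals_to_xhtml and the placeholder table are assumptions, only plain text inside each entry is copied, and CALS features such as colspec, namest/nameend spans, and morerows are ignored.

```python
import xml.etree.ElementTree as ET

# Placeholder CALS table; the cell values are illustrative only.
CALS = """<table>
  <tgroup cols="2">
    <thead>
      <row><entry>Column A</entry><entry>Column B</entry></row>
    </thead>
    <tbody>
      <row><entry>A1</entry><entry>B1</entry></row>
    </tbody>
  </tgroup>
</table>"""


def cals_to_xhtml(cals_table):
    """Convert a simple CALS <table> element to an XHTML <table> element."""
    xhtml = ET.Element("table")
    tgroup = cals_table.find("tgroup")
    # Header cells become <th>, body cells become <td>.
    for section, cell_tag in (("thead", "th"), ("tbody", "td")):
        cals_section = tgroup.find(section)
        if cals_section is None:
            continue
        out_section = ET.SubElement(xhtml, section)
        for row in cals_section.findall("row"):
            tr = ET.SubElement(out_section, "tr")
            for entry in row.findall("entry"):
                cell = ET.SubElement(tr, cell_tag)
                cell.text = entry.text  # text only; inline markup is not copied
    return xhtml


if __name__ == "__main__":
    print(ET.tostring(cals_to_xhtml(ET.fromstring(CALS)), encoding="unicode"))
```

A converter like this is the kind of component that can be shared between journal and book pipelines, since tables are tagged the same way in both.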

Data Processing and Ingest

Software to look up PubMed IDs in citations: <pub-id pub-id-type="pmid">

Image-resizing software and validation checks for graphics and supplementary data files such as PDFs

Loading code that extracts key information, such as dates and subject categories
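The PubMed ID lookup step can be pictured as a pass over the citations that appends a <pub-id pub-id-type="pmid"> wherever a match is found. The Python sketch below assumes citations are tagged as <mixed-citation> inside <ref> and takes the lookup as a caller-supplied function. The names add_pmids and KNOWN, the sample citations, and the placeholder ID are all hypothetical, and no real lookup service is called.

```python
import xml.etree.ElementTree as ET

# Hypothetical reference list; citation text and the ID below are placeholders.
CITATIONS = """<ref-list>
  <ref id="r1"><mixed-citation>Example citation one.</mixed-citation></ref>
  <ref id="r2"><mixed-citation>Example citation two.</mixed-citation></ref>
</ref-list>"""

# Stand-in for a real lookup service: maps citation text to a placeholder PMID.
KNOWN = {"Example citation one.": "00000000"}


def add_pmids(ref_list, lookup):
    """Append <pub-id pub-id-type="pmid"> to each citation that lookup resolves.

    lookup takes the citation text and returns a PMID string or None.
    Returns the number of IDs added.
    """
    added = 0
    for citation in list(ref_list.iter("mixed-citation")):
        if citation.find("pub-id[@pub-id-type='pmid']") is not None:
            continue  # already carries a PubMed ID
        pmid = lookup("".join(citation.itertext()))
        if pmid:
            pub_id = ET.SubElement(citation, "pub-id", {"pub-id-type": "pmid"})
            pub_id.text = pmid
            added += 1
    return added


if __name__ == "__main__":
    refs = ET.fromstring(CITATIONS)
    print(add_pmids(refs, KNOWN.get), "PubMed ID(s) added")
    print(ET.tostring(refs, encoding="unicode"))
```

Passing the lookup in as a function keeps the tagging logic independent of whichever service actually resolves the citations.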

CHOP-IT-UP

Output Formats

HTML

Uses the base XSLT article rendering rules to convert XML to HTML, with book-specific overrides or modifications

PDF

Uses the base XSL-FO code for articles, with book-specific overrides or modifications
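The pattern of shared base rules with book-specific overrides can be illustrated outside XSLT. The sketch below is a Python analogy only, assuming a dispatch table keyed by element name in place of XSLT templates and import precedence. Every name in it (ARTICLE_RULES, BOOK_RULES, render) is hypothetical, and the HTML it emits is not Bookshelf's actual output.

```python
import xml.etree.ElementTree as ET
from html import escape


def render_title(el):
    """Base (article) rule: section titles become <h2>."""
    return f"<h2>{escape(el.text or '')}</h2>"


def render_p(el):
    """Base (article) rule: paragraphs become <p>."""
    return f"<p>{escape(el.text or '')}</p>"


def render_book_title(el):
    """Book-specific override: titles get a different heading and class."""
    return f'<h1 class="book-part">{escape(el.text or "")}</h1>'


# Base rules, shared by articles and books.
ARTICLE_RULES = {"title": render_title, "p": render_p}

# Book rules start from the base set and replace only what differs.
BOOK_RULES = {**ARTICLE_RULES, "title": render_book_title}


def render(root, rules):
    """Render the direct children of root with whichever rule set is supplied."""
    return "".join(rules[child.tag](child) for child in root if child.tag in rules)


if __name__ == "__main__":
    sec = ET.fromstring("<sec><title>Preface</title><p>Text &amp; more</p></sec>")
    print(render(sec, ARTICLE_RULES))
    print(render(sec, BOOK_RULES))
```

Only the title rule is redefined for books; the paragraph rule is inherited unchanged, which is the reuse the slide describes.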

Advantages of using a Shared Tag Set

• Share XSLT modules during ingest, conversion, and rendering
• Use similar database infrastructure
• Enable closer integration for a variety of processes, such as PubMed submission and indexing

Bookshelf Workflows

Submission of Content to Bookshelf

• PDF or Word
• XML in NLM Book DTD
• XML in external DTDs
• Word authoring followed by conversion to XML (in-house)

<book>

Submitted Files

• PDF
• Word
• XML (External DTD)
• NLM Book DTD XML

Third-party vendor or in-house converters

Requirements

• Pass validation
• Pass stylecheck

[Diagram: CHOP-IT-UP splits a <book> into individual <book-part> documents. Other labels: PMC, CMS.]
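The slides do not show how CHOP-IT-UP works internally, so the following is only a guess at the core idea: take one <book> document and emit each <book-part> as a standalone unit. The sketch is in Python, the function name chop_book and the sample book are hypothetical, and it assumes the parts are direct children of <body> and carry an id attribute.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, heavily reduced <book>; real books carry far more metadata.
BOOK = """<book>
  <book-meta>
    <book-title-group><book-title>Sample Book</book-title></book-title-group>
  </book-meta>
  <body>
    <book-part id="ch1" book-part-type="chapter">
      <book-part-meta><title-group><title>One</title></title-group></book-part-meta>
    </book-part>
    <book-part id="ch2" book-part-type="chapter">
      <book-part-meta><title-group><title>Two</title></title-group></book-part-meta>
    </book-part>
  </body>
</book>"""


def chop_book(book):
    """Return {id: standalone copy of <book-part>} for each part under <body>."""
    return {
        part.get("id"): copy.deepcopy(part)
        for part in book.findall("body/book-part")
    }


if __name__ == "__main__":
    for part_id, part in chop_book(ET.fromstring(BOOK)).items():
        title = part.findtext("book-part-meta/title-group/title")
        print(f"{part_id}: {title}")
```

Copying each part, rather than detaching it, leaves the whole-book document intact for any checks that still need it.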

Word Authoring Followed by Conversion to XML

[Diagram labels: Microsoft Word document, NCBI Word converter, XML, Instant HTML Preview, Publish to Bookshelf]

Stylechecker

Check business rules

Goal: one set of rendering rules for uniform source XML data

Two checkpoints:

• Whole book (modified article stylechecker)
• Individual book-part (article stylechecker)
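The two checkpoints can be pictured as two small rule sets, one run against the whole book and one against each book-part. The Python sketch below is hypothetical throughout: the slides do not list the actual business rules, so the three rules here (a book needs a title, a book-part needs an id and a title) are invented for illustration, as are the function names.

```python
import xml.etree.ElementTree as ET


def check_book(book):
    """Whole-book checkpoint: return a list of rule violations (illustrative rule)."""
    errors = []
    if not book.findtext("book-meta/book-title-group/book-title"):
        errors.append("book has no book-title")
    return errors


def check_book_part(part):
    """Book-part checkpoint: return a list of rule violations (illustrative rules)."""
    errors = []
    if not part.get("id"):
        errors.append("book-part is missing an id attribute")
    if part.find("book-part-meta/title-group/title") is None:
        errors.append("book-part has no title")
    return errors


def run_checkpoints(book):
    """Run the whole-book check, then the per-part check on every book-part."""
    errors = check_book(book)
    for part in book.iter("book-part"):
        errors.extend(check_book_part(part))
    return errors


if __name__ == "__main__":
    sample = ET.fromstring(
        "<book><book-meta><book-title-group><book-title>Sample</book-title>"
        "</book-title-group></book-meta><body><book-part>"
        "<book-part-meta/></book-part></body></book>"
    )
    for message in run_checkpoints(sample):
        print("stylecheck:", message)
```

Rejecting content that fails such rules is what lets a single set of rendering rules assume uniform source XML.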

Integrating Content from Different Databases

<!DOCTYPE sec SYSTEM "book.dtd">
<sec>
  <title/>
  <sec id="molgen.tables">
    <title/>
    <p content-type="molecular_genetics"><italic>Information in the Molecular Genetics and OMIM tables may differ from that elsewhere in the GeneReview: tables may contain more recent information. &#x02014;</italic>ED.</p>
    <table-wrap id="pkd-ar.molgen.TA" position="anchor">
      <caption>
        <p>Table A. Polycystic Kidney Disease, Autosomal Recessive: Genes and Databases</p>
      </caption>
      <table>
        <tbody>
          <tr>
            <th>Gene Symbol</th>
            <th>Chromosomal Locus</th>
            <th>Protein Name</th>
            <th>Locus Specific</th>
            <th>HGMD</th>
          </tr>

Data in the JATS Book DTD Delivered from External Database

<?get-external-xml molgen.tables?>

Processing Instruction in Source XML
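One way to picture the processing-instruction mechanism is as a substitution pass: each get-external-xml instruction in the source is replaced by the XML fragment that the external database delivers for that key. The Python sketch below makes that assumption explicit. The function name resolve_external_xml, the in-memory EXTERNAL store, and the sample fragments are all hypothetical, and the real system may resolve the instruction quite differently.

```python
import re
import xml.etree.ElementTree as ET

# Matches <?get-external-xml KEY?> and captures KEY.
PI_PATTERN = re.compile(r"<\?get-external-xml\s+([^\s?]+)\s*\?>")

# Stand-in for the external database: key -> XML fragment (placeholder content).
EXTERNAL = {
    "molgen.tables": (
        '<sec id="molgen.tables"><title/>'
        '<p content-type="molecular_genetics">Placeholder table section.</p></sec>'
    ),
}

# Hypothetical source section containing the processing instruction.
SOURCE = '<sec id="parent"><title>Placeholder</title><?get-external-xml molgen.tables?></sec>'


def resolve_external_xml(source, fetch):
    """Replace each get-external-xml instruction with the fragment fetch(key) returns."""
    return PI_PATTERN.sub(lambda match: fetch(match.group(1)), source)


if __name__ == "__main__":
    resolved = resolve_external_xml(SOURCE, EXTERNAL.__getitem__)
    ET.fromstring(resolved)  # raises ParseError if the result is not well-formed
    print(resolved)
```

Keeping only a key in the source XML means the book text and the database content can be updated independently and merged at delivery time.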
