
From XML to RDF Step by Step – XML Prague 2016 | WWW.FREME-PROJECT.EU

Co-funded by the Horizon 2020 Framework Programme of the European Union, Grant Agreement Number 644771

XML PRAGUE | 12 FEBRUARY 2016

Felix Sasaki, DFKI / W3C Fellow, on behalf of the FREME Consortium and Contributors

FROM XML TO RDF STEP BY STEP: APPROACHES FOR LEVERAGING XML WORKFLOWS WITH LINKED DATA

www.freme-project.eu


THE CO-AUTHORS OF THIS EFFORT AND PAPER

• Marta Borriello, Vistatec

• Christian Dirschl, Wolters Kluwer

• Axel Polleres, Vienna University of Economics and Business (WU)

• Phil Ritchie, Vistatec

• Frank Salliau, iMinds

• Felix Sasaki, DFKI / W3C Fellow

• Giannis Stoitsis, Agro-Know


MOTIVATION – THIS BREAKS XML PROCESSING!

<myData>
  <head>...</head>
  <body>
    <linkedDataStorage>...</linkedDataStorage>
    ...
  </body>
</myData>

• Validation
• Transformation
• Query
• ...
• Adapting the schemas is often not possible in real-life scenarios


IS THIS RDF CHIMERA AGAIN?

• No: RDF Chimera is about the relation between formats
  ◦ XML, HTML, RDF, JSON

• Our issue here is the integration of formats
  ◦ RDF in XML workflows for multilingual and semantic enrichment of content


BACKGROUND: THE FREME PROJECT

• Two-year H2020 Innovation Action; started February 2015

• Industry partners lead four business cases around digital content and (linked) data

• FREME = a framework for multilingual and semantic enrichment of digital content

• Is there a real need for this? Oh yes! See the following business cases.


BUSINESS CASE “LINKED DATA IN PUBLISHING WORKFLOWS”

• Wolters Kluwer, Agroknow

• Enrichment of academic publication metadata


BUSINESS CASE “LINKED DATA IN XML LOCALIZATION WORKFLOWS”

• Vistatec – workflows integrating the localization formats XLIFF and ITS 2.0 with linked data in the Ocelot editor for translation editing and review; see the GUI screenshot on the next slide


BUSINESS CASE “LINKED DATA IN BOOK METADATA”

• iMinds – linked data in book metadata

• A potential approach for embedding linked data in ONIX


APPROACHES FOR LINKED DATA INTEGRATION

1. Convert XML to linked data

2. Embed linked data into XML via structured markup

3. Anchor linked data in XML attributes

4. Embed linked data in metadata sections of XML files

5. Anchor linked data via annotations in XML content

Try them out with DocBook or TEI content at http://api-dev.freme-project.eu/doc/freme-showcase/xml-to-rdf.html

The implementation uses FREME, the Okapi Framework and Saxon-CE, the Swiss army knife of XML processing in the browser.

SCREENSHOT FROM DEMO

1. CONVERT XML TO LINKED DATA


Benefits

• No need to change the XML workflow

• Similar to the RDF Chimera approach

• Difference: the focus here is on adding new (linked) information

Drawbacks

• A new tool chain is needed

• No useful representation of mixed content
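As a rough illustration of approach 1 and of its mixed-content drawback, the Python sketch below maps every XML element to an RDF resource typed by its tag name and emits N-Triples. The namespaces `http://example.com/doc#` and `http://example.com/xml-vocab#`, the `child`/`text` properties and the function names are hypothetical; a real converter would use a mapping designed for the specific XML vocabulary rather than this generic one.

```python
import itertools
import xml.etree.ElementTree as ET
from typing import List, Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespaces, chosen only for this illustration
DOC = "http://example.com/doc#"
VOC = "http://example.com/xml-vocab#"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal (escapes backslash, quote, newline)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def xml_to_ntriples(xml_text: str) -> List[str]:
    """Naive mapping: every element becomes a resource typed by its tag name."""
    triples: List[str] = []
    ids = itertools.count(1)

    def visit(elem: ET.Element, parent: Optional[str]) -> None:
        node = f"<{DOC}e{next(ids)}>"
        triples.append(f"{node} <{RDF_TYPE}> <{VOC}{elem.tag}> .")
        if parent is not None:
            triples.append(f"{parent} <{VOC}child> {node} .")
        if elem.text and elem.text.strip():
            triples.append(f"{node} <{VOC}text> {literal(elem.text.strip())} .")
        # elem.tail (text following a child element) is ignored here:
        # this is where mixed content is lost in a naive mapping.
        for child in elem:
            visit(child, node)

    visit(ET.fromstring(xml_text), None)
    return triples


doc = "<para>Welcome to <emphasis>Prague</emphasis>, a home of XML!</para>"
for triple in xml_to_ntriples(doc):
    print(triple)
```

The text ", a home of XML!" does not appear in any triple: it is the tail of `<emphasis>`, and keeping it would require modelling text ordering explicitly, which is exactly the mixed-content problem listed above.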

2. EMBED LINKED DATA INTO XML VIA STRUCTURED MARKUP

<para>We very much welcome you in the city of
  <emphasis vocab="http://schema.org/" typeof="Place" property="name"
    resource="http://dbpedia.org/resource/Prague">Prague</emphasis>,
  a home of
  <emphasis vocab="http://schema.org/" typeof="Thing" property="name"
    resource="http://dbpedia.org/resource/XML">XML</emphasis>!</para>


Benefits

• Relies on hooks for data integration, e.g. RDFa 1.1 Lite

• Common practice in search engine optimization, cf. schema.org

• Other syntaxes such as JSON-LD may be used

Drawbacks

• May break XML validation

• May require tool chains adapted, at minimum, to understand RDFa / JSON-LD
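To show what a consuming tool has to do with the DocBook example of approach 2, here is a Python sketch that reads the `vocab`, `typeof`, `property` and `resource` attributes and emits (subject, predicate, object) tuples. It is deliberately simplified and is NOT a conforming RDFa processor: it assumes `@resource` is the subject, reads `@vocab` only from the same element, and takes the element text as the `@property` value. A conforming RDFa 1.1 processor treats `@property` differently when `@typeof` is present, so its output would not match this sketch. The function name `extract_rdfa_lite` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def extract_rdfa_lite(xml_text: str) -> List[Tuple[str, str, str]]:
    """Simplified reading of RDFa Lite attributes; NOT a conforming RDFa processor.

    Assumptions: @vocab is read only from the same element (no inheritance),
    @resource is the subject, and the element's text is the @property value.
    """
    triples = []
    for elem in ET.fromstring(xml_text).iter():
        subject = elem.get("resource")
        if subject is None:
            continue  # elements without @resource carry no statements here
        vocab = elem.get("vocab", "")
        type_name = elem.get("typeof")
        prop = elem.get("property")
        if type_name:
            triples.append((subject, RDF_TYPE, vocab + type_name))
        if prop:
            triples.append((subject, vocab + prop, "".join(elem.itertext())))
    return triples


para = (
    '<para>We very much welcome you in the city of '
    '<emphasis vocab="http://schema.org/" typeof="Place" property="name" '
    'resource="http://dbpedia.org/resource/Prague">Prague</emphasis>, a home of '
    '<emphasis vocab="http://schema.org/" typeof="Thing" property="name" '
    'resource="http://dbpedia.org/resource/XML">XML</emphasis>!</para>'
)
for triple in extract_rdfa_lite(para):
    print(triple)
```

Even this small extractor has to know about attributes that the DocBook schema does not define, which illustrates both drawbacks: validation against the unmodified schema fails, and the tool chain needs RDFa awareness.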

3. ANCHOR LINKED DATA IN XML ATTRIBUTES

Example: embedding anchors in XLIFF via ITS 2.0 Text Analytics markup

<source ...>
  <mrk ... its:taIdentRef="http://dbpedia.org/resource/Berlin">Berlin</mrk> is the capital of Germany!</source>


Benefits

• Uses existing XML attributes, so no new markup is needed

• The tool chain can be kept as is

Drawbacks

• The actual data integration is merely postponed

• Data integration leaves no trace – missing provenance
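A minimal Python sketch of how a downstream step might pick up the anchors of approach 3 from XLIFF: it collects the marked text and the `its:taIdentRef` IRI from each `<mrk>` inside `<source>`. The XLIFF 2 and ITS namespace URIs are the published ones; the small XLIFF document and the function name `entity_links` are hypothetical, and the document has not been validated against the XLIFF schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

XLF = "urn:oasis:names:tc:xliff:document:2.0"
ITS = "http://www.w3.org/2005/11/its"


def entity_links(xliff_text: str) -> List[Tuple[str, str]]:
    """Collect (marked text, its:taIdentRef) pairs from <mrk> elements in <source>."""
    root = ET.fromstring(xliff_text)
    links = []
    for source in root.iter(f"{{{XLF}}}source"):
        for mrk in source.iter(f"{{{XLF}}}mrk"):
            ref = mrk.get(f"{{{ITS}}}taIdentRef")
            if ref:
                # Only the IRI is recorded: nothing says which tool added the
                # link or when (the missing provenance listed in the drawbacks).
                links.append(("".join(mrk.itertext()), ref))
    return links


xliff = f"""<xliff xmlns="{XLF}" xmlns:its="{ITS}" version="2.1" srcLang="en">
  <file id="f1"><unit id="u1"><segment>
    <source><mrk id="m1" its:taIdentRef="http://dbpedia.org/resource/Berlin">Berlin</mrk> is the capital of Germany!</source>
  </segment></unit></file>
</xliff>"""
print(entity_links(xliff))
```

The attribute only points at `http://dbpedia.org/resource/Berlin`; fetching and merging any data about Berlin is left to a later step, which is what "data integration is merely postponed" means in practice.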

4. EMBED LINKED DATA IN METADATA SECTIONS OF XML FILES


Benefits

• The metadata section does not increase the size of the main content

• Clear separation of concerns and processing

Drawbacks

• No inherent relation to the actual content

• Character offset pointers into the content are fragile
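Why character offsets are fragile can be shown in a few lines of Python. The sketch assumes a hypothetical metadata entry (`OffsetAnnotation`) that stores begin/end offsets into the content plus the anchored text as it was when the annotation was made, so drift can at least be detected; the field names are illustrative, not taken from any particular format.

```python
from dataclasses import dataclass


@dataclass
class OffsetAnnotation:
    """Hypothetical metadata-section entry that points into content by offsets."""
    begin: int
    end: int
    ident_ref: str
    anchor_text: str  # text seen when the annotation was made, kept to detect drift

    def resolve(self, content: str) -> str:
        return content[self.begin:self.end]

    def is_stale(self, content: str) -> bool:
        return self.resolve(content) != self.anchor_text


ann = OffsetAnnotation(0, 6, "http://dbpedia.org/resource/Berlin", "Berlin")

original = "Berlin is the capital of Germany!"
edited = "Today, Berlin is the capital of Germany!"  # content edited after annotating

print(repr(ann.resolve(original)), ann.is_stale(original))
print(repr(ann.resolve(edited)), ann.is_stale(edited))
```

After a single edit in front of the anchor, the same offsets select `'Today,'` instead of `'Berlin'`; storing the anchor text only detects the breakage, it does not repair it.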

5. ANCHOR LINKED DATA VIA ANNOTATIONS IN XML CONTENT – HERE USING W3C ANNOTATION MODEL

{ "id": "http://example.com/myannotations/a1",
  "type": "Annotation", ...
  "selector": { ...
    "/xlf:unit[1]/xlf:segment[1]/xlf:source/xlf:mrk[1]"
  "itsrdf:taIdentRef": "http://dbpedia.org/resource/Berlin",
  "itsrdf:taClassRef": "http://schema.org/Place",
  ... } }


Benefits

• Same as approach 4

• In addition, more robust anchoring via path expressions

Drawback

• Resolving path expressions can be computationally expensive
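A Python sketch of how the path-based anchor from approach 5 could be resolved, and why it survives the edit that broke the character offsets in approach 4. Several points are assumptions: the `"target"` / `"XPathSelector"` nesting of the annotation (XPathSelector is a selector type of the W3C Web Annotation Data Model, but the slide elides the actual structure), the choice of `<file>` as the evaluation context (the slide's path starts at `xlf:unit`), and the minimal XLIFF template. ElementTree supports only a subset of XPath; a full XPath engine would be needed for arbitrary selectors.

```python
import xml.etree.ElementTree as ET

NS = {"xlf": "urn:oasis:names:tc:xliff:document:2.0"}

# Hypothetical annotation; the "target"/"XPathSelector" nesting is an assumption
annotation = {
    "id": "http://example.com/myannotations/a1",
    "type": "Annotation",
    "target": {
        "selector": {
            "type": "XPathSelector",
            "value": "/xlf:unit[1]/xlf:segment[1]/xlf:source/xlf:mrk[1]",
        }
    },
    "itsrdf:taIdentRef": "http://dbpedia.org/resource/Berlin",
    "itsrdf:taClassRef": "http://schema.org/Place",
}

# Hypothetical minimal XLIFF document; "{}" receives text inserted before <mrk>
TEMPLATE = (
    '<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en">'
    '<file id="f1"><unit id="u1"><segment><source>{}<mrk id="m1">Berlin</mrk>'
    " is the capital of Germany!</source></segment></unit></file></xliff>"
)


def resolve(xpath: str, xliff_text: str) -> ET.Element:
    """Resolve the selector relative to <file> (assumed evaluation context).

    ElementTree rejects absolute paths on elements, so the leading "/" is stripped.
    """
    file_elem = ET.fromstring(xliff_text).find("xlf:file", NS)
    found = file_elem.find(xpath.lstrip("/"), NS)
    if found is None:
        raise ValueError(f"selector did not match: {xpath}")
    return found


xpath = annotation["target"]["selector"]["value"]
for prefix in ("", "Today, "):  # second run: content edited before the anchor
    mrk = resolve(xpath, TEMPLATE.format(prefix))
    print(repr(prefix), "->", mrk.text, annotation["itsrdf:taIdentRef"])
```

Both runs resolve to the same `<mrk>`, because the path addresses structure rather than character positions. The cost is that every lookup re-parses and walks the document, which is the computational overhead named in the drawback.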

APPROACHES FOR LINKED DATA INTEGRATION …

1. Convert XML to linked data

2. Embed linked data into XML via structured markup

3. Anchor linked data in XML attributes

4. Embed linked data in metadata sections of XML files

5. Anchor linked data via annotations in XML content

… Or: Routes to Bridge between RDF and XML

ROUTES TO BRIDGE BETWEEN RDF AND XML

• XSPARQL – W3C Member Submission

• Compilation of SPARQL queries into XQuery

NEXT STEPS – WHAT DO YOU THINK?

• Conclusion reached with my co-authors:

“We believe that joint efforts in standardization bodies to bridge the gaps between RDF and XML in order to enable such transformations and integrated tooling in a standard way should be further pursued.”

• What do you think – is it worth documenting this further?
  ◦ Have a W3C community group on the topic?
  ◦ Document approaches and best practices?
  ◦ Provide off-the-shelf tooling?

CONTACTS

FELIX SASAKI

Senior Researcher, DFKI / W3C Fellow

On behalf of the FREME consortium and collaborators

E-mail: [email protected]