
Achille Felicetti, "Introduction to the ARIADNE winter school and to the ARIADNE infrastructure"


ARIADNE is funded by the European Commission's Seventh Framework Programme

ARIADNE Integration Framework

Achille Felicetti

VAST-LAB - PIN, Università degli Studi di Firenze, Italy

Winter School Overview

1. The ARIADNE Infrastructure
2. Integrated registration and resource descriptions
3. Semantic integration of archaeological information
4. Mapping strategies
5. Mapping and conversion tools

Part 1

The ARIADNE Infrastructure

What is ARIADNE
• ARIADNE is a Research Infrastructure aiming at the integration of archaeological datasets in Europe (and beyond)
• Four years' duration
• Starting 1st February 2012
• 24+ partners
• Coordinated by PIN-University of Florence (IT)

The ARIADNE Partnership
[Map of partners. Legend: Coordinator, Partner, Associate]

Archaeology and heterogeneity

Why ARIADNE
• Huge number of archaeological data available in digital format
• Large number of non-communicating archaeological datasets
• Increasing interest of the research community in data sharing
• Social pressure for opening data vaults

Archaeological documentation
• Museum information
• Library information
• Images
• 3D models

Digital documentation
• RDBMS
• GIS
• XML
• CSV
• Excel
• Unstructured files

Integration in ARIADNE
• Creation of an integrated ecosystem of archaeological information
• To guarantee interoperability among data coming from different archives
• To use data as if they were stored in a single archive
  – Unique access point
  – Uniform interfaces
• To ensure retrieval of information in a coherent and meaningful way
  – Semantics

How to achieve integration
Data sharing requires
• Suitability of somebody else's data for one's purposes
• Interoperability of datasets
• Trust in data collected by others
• Guarantee of data "provenance"
• Common understanding of meanings

Project activities
• Networking activities
  – Community building: involving additional institutions in sharing data and establishing guidelines together
  – Standardization and good practices
• Trans-National Access to shared datasets and training in their creation, as well as to on-line repositories
  – Support for digitization and data organization
• Research activities
  – Knowledge organization
  – Data management
  – New or improved tools to extract information
  – Advances in methodology

Integration road map
– Digitisation of information on paper
– Online availability
  • Microsoft Word, Excel, Access, PDF, GIS …
– Online accessibility improvement
  • ADS, ARACHNE, ZENON, FastiOnline, …
– Consistency checking for mapping and information extraction
  • Descriptive information to ACDM (Registry)
  • Legacy data to CIDOC CRM
  • Legacy thesauri to SKOS
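
The last step of the road map, converting a legacy thesaurus to SKOS, can be illustrated with a minimal sketch. It assumes a hypothetical flat term list with optional broader terms and a made-up base URI (`http://example.org/thesaurus/`); neither comes from an ARIADNE partner vocabulary. Only the SKOS namespace and the properties `skos:prefLabel` and `skos:broader` are real. Labels are written out without escaping, which is enough for simple terms but not for labels containing quotes.

```python
# Sketch only: the term list and BASE URI are hypothetical examples.
BASE = "http://example.org/thesaurus/"

# (term, broader term or None) pairs as they might appear in a legacy table
LEGACY_TERMS = [
    ("pottery", None),
    ("amphora", "pottery"),
]

def slug(term: str) -> str:
    """Turn a label into a simple URI-safe local name."""
    return term.strip().lower().replace(" ", "-")

def to_skos_turtle(terms, lang: str = "en") -> str:
    """Serialise (term, broader) pairs as SKOS concepts in Turtle syntax."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for term, broader in terms:
        props = ["a skos:Concept", f'skos:prefLabel "{term}"@{lang}']
        if broader is not None:
            props.append(f"skos:broader <{BASE}{slug(broader)}>")
        lines.append(f"<{BASE}{slug(term)}> " + " ;\n    ".join(props) + " .")
    return "\n".join(lines)

turtle = to_skos_turtle(LEGACY_TERMS)
print(turtle)
```

A real conversion would also need to handle alternative labels, multilingual labels and related terms; the sketch covers only the hierarchy.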

Integrated Registration and Resource Descriptions

Archaeological resources registration
• Registration
  – Datasets inventory
  – Services inventory
• Description
  – ACDM model for describing datasets and services
• Ingestion into Registry
• Data Enrichment Policies
• Integration Strategies

Describing archaeological data
ARIADNE Registry (http://ariadne-registry.dcu.gr)
• Web interface for dataset and service descriptions
  – http://ariadne-registry.dcu.gr/index.php?p=web
• XML file
  – http://ariadne-registry.dcu.gr/index.php?p=xml
• Excel templates
  – http://ariadne-registry.dcu.gr/index.php?p=excel
• Database export to RDF according to the ACDM record
  – http://ariadne-registry.dcu.gr/index.php?p=tools

Registry Web Tool

ARIADNE Catalogue Data Model (ACDM)
• A model for resource description
  – To describe archaeological resources made available by partners for discovery, access and integration
  – Based on existing standards
    • DCAT -> Data resources
    • ISO/IEC 11179 -> Language resources
    • DBPedia -> Services

ARIADNE Catalogue Data Model (ACDM)
• ARIADNE resources
  – Archaeological datasets from 20+ countries
  – 24 languages
  – 1,800,000+ records
  – 50,000+ grey literature reports
• ARIADNE information types
  – Archaeological excavations
  – Monuments and sites
  – Scientific analysis

ARIADNE Catalogue Data Model (ACDM)
• ARIADNE digital resource types
  – DBMS -> PostgreSQL, MySQL, Microsoft Access, …
  – Datasets -> Repositories of digital objects with the same structure
  – Collections -> Sets of text files/images in hierarchical systems
  – Multimedia -> 3D models, images, videos
  – GIS
  – Metadata and vocabularies

ARIADNE Services
• Web Services
  – Visual Media Service: easy publication and presentation on the web of complex visual media assets (http://visual.ariadne-infrastructure.eu/)
  – Landscape Service: large terrain dataset generation, 3D landscape composing and 3D model processing (http://landscape.ariadne-infrastructure.eu/)
  – DANS Dendrochronology Service: http://dendro.dans.knaw.nl/
  – DAI Vocabularies: http://archwort.dainst.org/thesaurus/de/vocab
  – DAI Gazetteer: http://gazetteer.dainst.org/
  – Vocabulary Matching Tool: http://heritagedata.org/vocabularyMatchingTool
• Stand-alone Services
  – MeshLab: open source, portable, and extensible system for the processing and editing of unstructured 3D triangular meshes (http://meshlab.sourceforge.net/)

Visual Media Service
• For each visual media asset:
  – The user fills in a simple form and uploads the data file
    • 3D models
    • High-resolution 2D images
    • Reflectance Transformation Imaging (RTI) images
  – An automatic service on the ARIADNE server opens the file, converts it and transforms it into a browsable page
    • Multi-resolution encoding
    • Progressive transmission
    • Immediate visualization
  – A URL (or a .zip file) is created and sent to the user
  – The user links the media asset in their archive using the provided URL

Visual Media Service: Online Inspection

http://visual.ariadne-infrastructure.eu/

Media Service Integration
• Results of the Visual Media Service can be incorporated directly in any existing archive
• See an example of work accomplished with ADS:

The ACDM model

[Figure: UML class diagram of the ACDM model; multiplicities are not recoverable from the extracted text.
Classes: ArchaeologicalResource, DataResource, LanguageResource, Service, Collection, DataSet, Database, GIS, TextualDocument, Gazetteer, Vocabulary, Mapping, MetadataSchema, MetadataElement, MetadataAttribute, MetadataRecord, DataFormat, DigitalObjectDesc, AttachedDocuments, Distribution, Licence, Version, EncodingLanguage, DBSchema, AriadneConcept.
External classes: dcat:Catalog, dcat:Dataset, dcat:Distribution, foaf:Agent, skos:Concept.
Associations: dct:isPartOf, dct:hasParts, dct:publisher, dct:contributor, owner, dcat:distribution, hasAttachedObject, hasAttachedDocument, hasMetadataRecord, usesVocabulary, hasSimpleDigitalType, applyTo, hasRecordStructure, hasItemMetadataStructure, hasElements, hasAttribute, conformsTo, expressedIn, hasSchema, isRealizedBy, hasLicence, hasVersion, from, to, native-subject, ariadne-subject, provided-subject, derived-subject.]

Resource associations
• dct:isPartOf: associates any archaeological resource with the catalogue.
• dct:publisher: an agent responsible for making the resource publicly accessible.
• dct:contributor: an agent responsible for describing the resource in the Catalogue.
• dct:creator: an agent primarily responsible for creating the resource.
• owner: an agent that is the legal owner of the resource.
• legalResponsible: an agent holding the legal responsibility for the resource.
• scientificResponsible: an agent holding the scientific responsibility.
• technicalResponsible: an agent holding the technical responsibility.
• ariadne-subject: a subject from the AAT vocabulary
  – provided-subject: manually specified subjects drawn from the Getty AAT vocabulary;
  – derived-subject: subjects automatically derived from mapping local vocabularies to the Getty AAT vocabulary.
• native-subject: a subject from a vocabulary in use by the original owner of the resource.
• hasAttachedDocuments: the documents that are attached to the resource for illustration purposes.
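
A few of the associations above can be pictured as fields of a simple record. The sketch below is not the ACDM schema: the class name, the Python field names and the example values are hypothetical. It also assumes that ariadne-subject can be read as the combination of provided and derived subjects.

```python
from dataclasses import dataclass, field

@dataclass
class ArchaeologicalResource:
    """Hypothetical container for a few of the ACDM associations."""
    title: str
    publisher: str                                          # dct:publisher
    owner: str                                              # owner
    native_subjects: list = field(default_factory=list)    # native-subject
    provided_subjects: list = field(default_factory=list)  # provided-subject
    derived_subjects: list = field(default_factory=list)   # derived-subject

    @property
    def ariadne_subjects(self) -> list:
        """Assumed reading of ariadne-subject: provided plus derived AAT
        subjects, de-duplicated with the original order preserved."""
        seen, merged = set(), []
        for uri in self.provided_subjects + self.derived_subjects:
            if uri not in seen:
                seen.add(uri)
                merged.append(uri)
        return merged

resource = ArchaeologicalResource(
    title="Example excavation archive",   # made-up value
    publisher="Example Institute",        # made-up value
    owner="Example Institute",            # made-up value
    native_subjects=["roads"],
    provided_subjects=["http://vocab.getty.edu/aat/300008217"],
    derived_subjects=["http://vocab.getty.edu/aat/300008217"],
)
print(resource.ariadne_subjects)
```

Keeping native subjects apart from AAT subjects means the provider's original terminology is never overwritten by the enrichment.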

ARIADNE integration pathways
• High-level description of the archaeological archives
  – Resource discovery
  – Common elements identification
• Subjects integration
  – Concepts and terms standardisation
• Spatial integration
  – Unambiguous identification of geographic entities
  – Diachronic representation of places
• Temporal integration
  – Unambiguous representation of times and periods

Integration pathways
• "What" -> Subjects and topics
• "Where" -> Place names and geographic entities
• "When" -> Temporal aspects, periods and times

• Formal description -> ACDM model
• Linked Open Data philosophy -> RDF
• Shared knowledge base -> ARIADNE Registry

Integration pathways: "What"
• Thesauri and terminological tools for archaeology
• National or local validity

• Using Getty Art & Architecture Thesaurus (AAT) as a common spine
• Partners to enter their data in the ACDM Registry:
  – Using AAT terms
  – Providing a mapping from a national / regional vocabulary to the AAT
• Mapping tools for interoperability among vocabularies
  – USW Mapping Tool


Integration pathways: "Where"
• All spatial coordinates converted to WGS84
• Where place names are supplied, GeoNames is used to provide coordinates (where possible)
• Historical names: Pelagios project
  – http://pelagios-project.blogspot.it/
  – Linked Open Data and URIs from Pleiades
  – http://pleiades.stoa.org/
• Need to preserve hierarchy in place names (state, region, locality …)
• Modern place name resolution starting from ancient ones
  – Byzantium, Constantinople, Istanbul …
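
Resolving ancient names to a modern place while keeping the place hierarchy can be sketched with a small lookup table. Everything in the table is an illustrative placeholder: the identifier `place:1` is not a real Pleiades or GeoNames identifier, and the hierarchy labels are assumptions. A real implementation would query a gazetteer service instead.

```python
# Sketch only: the identifier and the lookup table are illustrative placeholders.
PLACES = {
    "place:1": {
        "modern_name": "Istanbul",
        "historical_names": ["Byzantium", "Constantinople"],
        # state, region, locality (assumed labels)
        "hierarchy": ["Turkey", "Istanbul Province", "Istanbul"],
    },
}

def build_index(places: dict) -> dict:
    """Map every known name (modern or historical) to its place identifier."""
    index = {}
    for place_id, place in places.items():
        for name in [place["modern_name"], *place["historical_names"]]:
            index[name.casefold()] = place_id
    return index

NAME_INDEX = build_index(PLACES)

def resolve(name: str):
    """Return (place id, modern name, hierarchy) for a known name, else None."""
    place_id = NAME_INDEX.get(name.strip().casefold())
    if place_id is None:
        return None
    place = PLACES[place_id]
    return place_id, place["modern_name"], place["hierarchy"]

print(resolve("Byzantium"))
print(resolve("Constantinople") == resolve("Istanbul"))
```

Because all names point to one identifier, records that use "Byzantium" and records that use "Istanbul" end up attached to the same spatial entity.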


Integration pathways: "When"
• Content providers: different periodisations and subdivisions of the past used in archaeology
• Controlled vocabularies -> supplied to ARIADNE with a start and end date for each term
• Vocabularies, and start and end dates, to be made available as URIs in Linked Data format
  – Collaboration with PeriodO project
  – http://perio.do/
  – Dedicated URIs for period collections created for ARIADNE in PeriodO
  – http://www.ariadne-infrastructure.eu/Resources/PeriodO/Documentation
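
Giving every period term a start and an end date makes differently named or differently dated periods comparable. The sketch below shows the idea with an interval-overlap query. The URIs are placeholders rather than PeriodO identifiers, and the dates are illustrative values chosen only to show two providers dating the same label differently.

```python
from typing import NamedTuple

class Period(NamedTuple):
    uri: str     # placeholder URI, not a real PeriodO identifier
    label: str
    start: int   # year; negative values are BC
    end: int

# Illustrative values only, not taken from any provider vocabulary.
PERIODS = [
    Period("http://example.org/period/a-bronze-age", "Bronze Age (provider A)", -2500, -800),
    Period("http://example.org/period/b-bronze-age", "Bronze Age (provider B)", -3200, -1100),
    Period("http://example.org/period/a-iron-age", "Iron Age (provider A)", -800, 43),
]

def periods_overlapping(start: int, end: int, periods=PERIODS) -> list:
    """Return the periods whose [start, end] range intersects the query range."""
    return [p for p in periods if p.start <= end and start <= p.end]

hits = periods_overlapping(-1000, -900)
print([p.label for p in hits])
```

A query by absolute dates therefore works across providers without requiring them to agree on period names.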


Focus on Data Quality / Enrichment

Content publication pipeline
[Figure: content publication pipeline diagram. Labels: Data + Metadata (three sources), Excel Files, OAI-PMH, XML, Aggregation Infrastructure, Portal Infrastructure, Validation, Enrichment, Publication, Thesauri Mappings, Temporal Mappings, ACDM, ACDM Enriched.]

Subject enrichment
• Data integration
• Native subjects
• AAT mappings
• Derived subjects

Native Subjects -> Derived Subjects (AAT Concepts)
Content provider mapping to AAT
  acdm:nativeSubject: roads
  skos:prefLabel: roads
  acdm:derivedSubject: http://vocab.getty.edu/aat/300008217
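
The enrichment step can be sketched as a lookup from native subjects to AAT concept URIs. Only the "roads" entry and its AAT URI come from the slide; the second mapping entry, the record layout and the function name are assumptions made for illustration.

```python
AAT_ROADS = "http://vocab.getty.edu/aat/300008217"  # URI given on the slide

# Provider-supplied mapping from native terms to AAT concept URIs.
# Only "roads" comes from the slide; "strade" is an assumed extra entry.
PROVIDER_MAPPING = {
    "roads": AAT_ROADS,
    "strade": AAT_ROADS,
}

def enrich(record: dict, mapping: dict = PROVIDER_MAPPING) -> dict:
    """Return a copy of the record with derivedSubject filled in from
    nativeSubject; unmapped native terms are kept but add nothing."""
    derived = []
    for term in record.get("nativeSubject", []):
        uri = mapping.get(term.strip().lower())
        if uri and uri not in derived:
            derived.append(uri)
    return {**record, "derivedSubject": derived}

enriched = enrich({"title": "Survey record", "nativeSubject": ["Roads", "unmapped term"]})
print(enriched["derivedSubject"])
```

The native subjects stay in the record untouched, so the provider's terminology remains available next to the shared AAT concepts.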

Data validation
• Data consistency
• Mandatory fields
• Subject, Space, Time
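
A validation pass of this kind might look like the sketch below. The field names and the record layout are assumptions, not the ACDM schema; the slides only name subject, space and time as the areas that are checked.

```python
# Sketch only: field names and record shape are assumptions, not the ACDM schema.
MANDATORY_FIELDS = ("title", "subject", "space", "time")

def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = [
        f"missing mandatory field: {name}"
        for name in MANDATORY_FIELDS
        if not record.get(name)
    ]
    # Consistency check: a time span must not end before it starts.
    time = record.get("time")
    if time and time["start"] > time["end"]:
        problems.append("inconsistent time span: start is after end")
    return problems

good = {
    "title": "Example survey",
    "subject": ["roads"],
    "space": {"lat": 43.77, "lon": 11.25},   # WGS84 decimal degrees
    "time": {"start": -100, "end": 200},     # years, negative = BC
}
bad = {"title": "Example survey", "subject": [], "time": {"start": 300, "end": 100}}

print(validate(good))
print(validate(bad))
```

Returning a list of problems rather than raising on the first one lets a provider fix all issues of a record in one round.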

Querying the integrated archives
• User interfaces and the ARIADNE Portal
  – Main access point to system and services
• Archaeological objects, places, events, actors and types
  – Browse and refine with facet views
• Interaction with the Registry
  – Drives the queries towards the most relevant archives
• Interaction with terminological data and services
  – Vocabularies to provide support at query and retrieval time

Query on Map and Timeline

ARIADNE Portal
• Version 1.0: released on 24 March 2016
• Version 1.2: released on 22 November 2016
• Official domain: http://portal.ariadne-infrastructure.eu

Content Discovery
• Lightning-fast search
• Content similarity
  – Geographical
  – Thematic
  – Temporal (TBA)
• Faceted browsing
• Multilingual AAT-based search
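
Faceted browsing boils down to two operations: filtering records by the selected facet values and counting the remaining values of each facet. The sketch below shows both on a hypothetical in-memory list; the records, field names and values are made up, and the Portal itself relies on a search engine rather than code like this.

```python
from collections import Counter

# Hypothetical records; field names and values are illustrative.
RECORDS = [
    {"title": "Roman road survey", "subject": "roads", "country": "Italy"},
    {"title": "Villa excavation", "subject": "villas", "country": "Italy"},
    {"title": "Military road report", "subject": "roads", "country": "United Kingdom"},
]

def refine(records: list, **filters) -> list:
    """Keep only the records matching every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]

def facet_counts(records: list, facet: str) -> Counter:
    """Count how many records fall under each value of a facet."""
    return Counter(r[facet] for r in records)

roads = refine(RECORDS, subject="roads")
print(facet_counts(RECORDS, "subject"))
print(facet_counts(roads, "country"))
```

Recomputing the counts after each refinement is what lets the user see how many results every further choice would leave.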

Preserving legacy archives
• Legacy database synchronization
  – ARIADNE system constantly updated according to modifications of legacy archives
• References to legacy archives always provided
  – Data provenance
  – URLs to information on original portals/web applications
• User to navigate original information
  – To perform custom searches tailored to specific needs

 

The ARIADNE Infrastructure
• 17 Content Providers
• > 1.85m Records
• 1 Common Schema
• > 6k AAT Concepts
• > 1.78m Spatial Entities
• > 73k Periods & Dates
• > 10 Services
• > 1.77m Native Subjects
• > 8k Users

Data ingestion / flow within ARIADNE
[Figure: data ingestion flow diagram. Labels: Repository, Excel Sheet, Archive, MORe, ARIADNE Registry, Validation, Cleaning, Enrichment, Integration, RDF Store (RDF), RDF Store (CRM), Elastic Search, ARIADNE Portal.]

Integration Experiments

ARIADNE is a project funded by the European Commission under the Community's Seventh Framework Programme, contract no. FP7-INFRASTRUCTURES-2012-1-313193. The views and opinions expressed in this presentation are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission.

Thank you …