The Research and Education Space: a pathway to bring our cultural heritage (including the BBC archive) to life
Dr Chiara Del Vescovo, Data Architect at BBC
Vision
Web-like Web-based
Interlinking heterogeneous resources
Capturing semantic interrelations
Reliable, provably cleared for education
Linked Open Data
RES (BBC, Jisc, BUFVC)
Core Platform: “Acropolis”
Project RES: Technical Approach
1. The crawler fetches data via HTTP from published sources. Once retrieved, it is indexed by the full-text store and passed to the aggregation engine for evaluation.
2. The results of the aggregation engine's evaluation process are stored in the aggregate store, which contains minimal browse information and information about the similarity of entities.
3. The public face of the core platform is an extremely basic browsing interface (which presents the data in tabular form to aid application developers), and read-write RESTful APIs.
4. Applications may use the APIs to locate information about aggregated entities, and also to store annotations and activity data.
5. Each component employs standard protocols and formats. For example, we can make use of any capable quad-store as our aggregate store.
Linked data crawler: Anansi
Aggregation engine: Spindle
Full-text store
Aggregate store
Minimal browse interface & APIs: Quilt
Activity store
users, developers
Acropolis (index!)
BL
BM
BFI
Tate
V&A
…
BBC
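The aggregate store is described as holding information about the similarity of entities. As a rough illustration only (not the Spindle implementation), the sketch below groups URIs from different collections into clusters, assuming similarity arrives as pairs of equivalent URIs. The function name `aggregate` and all example URIs are hypothetical.

```python
from collections import defaultdict


def aggregate(equivalent_pairs):
    """Group URIs into clusters, treating each pair as 'these denote the same entity'.

    Uses a small union-find structure; this is an illustrative technique,
    not a description of how the real aggregation engine works.
    """
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in equivalent_pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    clusters = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    # Sorted output so that results are deterministic.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical assertions gathered by a crawler from three collections.
pairs = [
    ("http://a.example/person/1", "http://b.example/people/turner"),
    ("http://b.example/people/turner", "http://c.example/artist/jmw-turner"),
    ("http://a.example/place/9", "http://c.example/place/london"),
]
clusters = aggregate(pairs)
```

Each resulting cluster could then back one aggregated entity that applications look up through the APIs.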
Informed by: users, developers
Planned pilots: BL, BM, BFI, Tate, V&A, …, BBC
Acropolis: beta.acropolis.org.uk
What I do (with my colleague Alex)
1. Devise a publishing scheme to determine URIs
2. Translate original metadata into RDF
3. Discover links and reconcile with “hubs” (e.g., LoC, GeoNames, DBpedia)
4. Make the existing schema explicit as a local ontology
5. Match the ontology onto well-established ontologies (e.g., DCMI, FOAF, SKOS, CIDOC-CRM)
6. Advise on how to express machine-readable licenses, for both resources and metadata
7. Provide technical support to publish LOD
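Steps 1, 2 and 5 can be sketched in a few lines. This is a minimal illustration under stated assumptions: the base URI `http://collection.example/`, the record layout, and the field-to-property mapping are all hypothetical, and the output is hand-serialised N-Triples rather than the output of any particular RDF library. Only the DCMI Terms namespace is a real vocabulary.

```python
from urllib.parse import quote

BASE = "http://collection.example/"    # hypothetical base URI for the publishing scheme
DCTERMS = "http://purl.org/dc/terms/"  # DCMI Metadata Terms namespace

# Assumed mapping from local field names to well-established properties.
FIELD_TO_PROPERTY = {
    "title": DCTERMS + "title",
    "creator": DCTERMS + "creator",
    "date": DCTERMS + "date",
}


def mint_uri(kind, local_id):
    """Build a stable URI from an entity kind and a local identifier."""
    return f"{BASE}{kind}/{quote(str(local_id), safe='')}"


def escape_literal(value):
    """Escape the characters that N-Triples requires escaping inside a literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record):
    """Translate one metadata record (a dict) into N-Triples lines."""
    subject = mint_uri("object", record["id"])
    lines = []
    for field, prop in FIELD_TO_PROPERTY.items():
        if field in record:
            lines.append(f'<{subject}> <{prop}> "{escape_literal(record[field])}" .')
    return lines


record = {"id": "A 12/3", "title": 'The "Fighting" Temeraire', "date": "1839"}
triples = record_to_ntriples(record)
```

Keeping the mapping in a plain table like `FIELD_TO_PROPERTY` is one way to make the existing schema explicit before aligning it with shared ontologies.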
Europeana
• “general” Europeana Data Model (EDM)
• collection holders are responsible for fitting their resources and metadata into EDM
1. Which metadata?
• Currently, resource metadata is mostly oriented towards “physical proximity”, i.e., indexes reflect similarity of author’s surname, broad subject, format, media, etc.
• Heterogeneous platforms and data models: incompatibility, transformations needed
• Even when RDF is used, there’s a proliferation of terms, vocabularies, and formats adopted: little (if any) validation
2. Linking
• Systems that do not use RDF do not allow collection holders to express their knowledge as they wish: underspecified knowledge
• Even when RDF is used, information is often provided as literals rather than links to URIs: ad hoc solutions, unavailable in a machine-readable format
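To make the literal-versus-link point concrete, here is a small sketch of reconciliation: triples whose object is a plain string are rewritten to point at a hub URI when a lookup table has a match. The lookup table, the hub URIs under `http://hub.example/`, and the function names are assumptions for illustration; a real workflow would query a hub such as GeoNames or DBpedia and have a person review the matches.

```python
# Hypothetical lookup from normalised labels to hub URIs.
HUB_LOOKUP = {
    "london": "http://hub.example/place/london",
    "j. m. w. turner": "http://hub.example/person/jmw-turner",
}


def normalise(label):
    """Lower-case the label and collapse whitespace so trivial variants match."""
    return " ".join(label.casefold().split())


def reconcile(literal_triples, lookup):
    """Split (subject, property, literal) triples into linked and unresolved lists.

    Linked triples carry a hub URI as object; unresolved ones keep the literal.
    """
    linked, unresolved = [], []
    for subject, prop, literal in literal_triples:
        uri = lookup.get(normalise(literal))
        if uri is not None:
            linked.append((subject, prop, uri))
        else:
            unresolved.append((subject, prop, literal))
    return linked, unresolved


SPATIAL = "http://purl.org/dc/terms/spatial"
literal_triples = [
    ("http://collection.example/object/1", SPATIAL, "  London "),
    ("http://collection.example/object/2", SPATIAL, "Lundenwic"),
]
linked, unresolved = reconcile(literal_triples, HUB_LOOKUP)
```

Exact matching after normalisation is deliberately conservative: anything it cannot resolve stays a literal instead of being linked to the wrong entity.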
3. Usability
• Reliability
• Lack of tools: developers have little contact with collection holders
• Licensing issues: resource licensing (not always explicit) and metadata licensing; users need to be aware of what these mean (note that in education things are slightly easier: blanket licensing etc.)
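Once licences are expressed as URIs, the check that both a resource and its metadata carry an explicit, acceptable licence becomes mechanical. The sketch below assumes a hypothetical record layout with `resource_licence` and `metadata_licence` keys and an example allow-list of two Creative Commons URIs; the allow-list is an illustration, not the project's actual clearance policy.

```python
# Example allow-list only; which licences count as "cleared" is a policy decision.
CLEARED_LICENCES = {
    "https://creativecommons.org/publicdomain/zero/1.0/",
    "https://creativecommons.org/licenses/by/4.0/",
}


def licence_problems(item, cleared=CLEARED_LICENCES):
    """Return a list of licensing problems for one item (empty list means none found)."""
    problems = []
    for key in ("resource_licence", "metadata_licence"):
        value = item.get(key)
        if not value:
            problems.append(f"{key}: missing")
        elif value not in cleared:
            problems.append(f"{key}: not on the cleared list")
    return problems


ok_item = {
    "resource_licence": "https://creativecommons.org/licenses/by/4.0/",
    "metadata_licence": "https://creativecommons.org/publicdomain/zero/1.0/",
}
vague_item = {"resource_licence": "ask the archive"}

ok_report = licence_problems(ok_item)
vague_report = licence_problems(vague_item)
```

Reporting the resource and the metadata separately mirrors the distinction between resource licensing and metadata licensing drawn above.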
Interested?
• Get in touch!
• Newly advertised position as Junior Data Architect: careershub.bbc.co.uk