The Research and Education Space: a pathway to bring our cultural heritage (including the BBC archive) to life
Dr Chiara Del Vescovo, Data Architect at BBC


Vision

• Web-like, Web-based
• Interlinking heterogeneous resources
• Capturing semantic interrelations
• Reliable, provably cleared for education
• Linked Open Data

A pathway

[Diagram labels: users, developers; aggregating platform; BL, BM, BFI, Tate, V&A, BBC]

RES (BBC, Jisc, BUFVC)

Core Platform: “Acropolis”

Project RES: Technical Approach

1. The crawler fetches data via HTTP from published sources. Once retrieved, it is indexed by the full-text store and passed to the aggregation engine for evaluation.

2. The results of the aggregation engine's evaluation process are stored in the aggregate store, which contains minimal browse information and information about the similarity of entities.

3. The public face of the core platform is an extremely basic browsing interface (which presents the data in tabular form to aid application developers), and read-write RESTful APIs.

4. Applications may use the APIs to locate information about aggregated entities, and also to store annotations and activity data.

5. Each component employs standard protocols and formats. For example, we can make use of any capable quad-store as our aggregate store.
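As a rough illustration of steps 2 and 5, the sketch below groups entities that sources declare equivalent and writes the result as quads (subject, predicate, object, graph). It is not the Spindle implementation: the slides do not say how similarity is evaluated, so this assumes it can be read from owl:sameAs assertions alone. The function names, the example.org URIs and the graph name are hypothetical.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def cluster_entities(triples):
    """Group subject URIs into clusters joined by owl:sameAs (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller URI as the representative
            # so that the result does not depend on input order.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for s, p, o in triples:
        find(s)  # every subject is an entity, even without sameAs links
        if p == OWL_SAME_AS:
            union(s, o)

    clusters = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    return dict(clusters)


def to_quads(clusters, graph="http://example.org/graph/aggregate"):
    """Emit one sameAs quad per non-representative cluster member."""
    quads = []
    for representative, members in sorted(clusters.items()):
        for member in sorted(members):
            if member != representative:
                quads.append((representative, OWL_SAME_AS, member, graph))
    return quads
```

Storing the clusters as quads keeps the aggregate store independent of any particular product, which matches the point in step 5 that any capable quad-store could be used.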

[Diagram labels: users, developers; Acropolis (index!); BL, BM, BFI, Tate, V&A, BBC. Components: linked data crawler (Anansi); aggregation engine (Spindle); full-text store; aggregate store; minimal browse interface & APIs (Quilt); activity store.]

[Diagram labels added in later builds: “informed by”, “planned pilots”, beta.acropolis.org.uk]

What I do (with my colleague Alex)

1. devise a publishing scheme to determine URIs

2. translate original metadata into RDF

3. link discovery and reconciliation with “hubs” (e.g., LoC, Geonames, DBPedia)

4. make the existing schema explicit as a local ontology

5. match the ontology onto well-established ontologies (e.g., DCMI, FOAF, SKOS, CIDOC-CRM)

6. advise on how to express machine-readable licenses, for both resources and metadata

7. technical support to publish LOD
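A minimal sketch of how steps 1, 2, 3 and 6 might fit together for a single catalogue record, using only the Python standard library and emitting N-Triples lines. Everything specific here is an assumption for illustration: the base URI, the `/object/<slug>#id` pattern, the record fields, and the choice of CC0 as the default metadata licence are not taken from the slides. The Dublin Core terms (`dct:title`, `dct:creator`, `dct:license`) are real.

```python
import re

BASE = "http://example.org/collection"  # hypothetical base URI
DCT = "http://purl.org/dc/terms/"
CC0 = "https://creativecommons.org/publicdomain/zero/1.0/"


def mint_uri(kind, local_id):
    """Step 1: derive a stable URI from a local identifier (assumed scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "-", str(local_id).lower()).strip("-")
    return f"{BASE}/{kind}/{slug}#id"


def iri(uri):
    return f"<{uri}>"


def literal(text):
    """Quote a plain string literal, escaping backslash, quote and newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def record_to_ntriples(record, hub_links, metadata_licence=CC0):
    """Steps 2, 3 and 6: describe one record as N-Triples lines.

    hub_links maps a creator name to a hub URI; when no link is known,
    the creator stays a literal.
    """
    subject = mint_uri("object", record["id"])
    document = subject.split("#")[0]  # the metadata document about the object
    lines = [f"{iri(subject)} {iri(DCT + 'title')} {literal(record['title'])} ."]

    creator = record.get("creator")
    if creator:
        target = hub_links.get(creator)
        obj = iri(target) if target else literal(creator)
        lines.append(f"{iri(subject)} {iri(DCT + 'creator')} {obj} .")

    # Licences are stated separately for the resource and for its metadata.
    if record.get("licence"):
        lines.append(f"{iri(subject)} {iri(DCT + 'license')} {iri(record['licence'])} .")
    lines.append(f"{iri(document)} {iri(DCT + 'license')} {iri(metadata_licence)} .")
    return lines
```

Calling `record_to_ntriples({"id": "N00524", "title": "Sunrise", "creator": "J. M. W. Turner"}, {"J. M. W. Turner": "http://dbpedia.org/resource/J._M._W._Turner"})` yields a title triple, a creator triple pointing at the hub URI, and a licence triple for the metadata document.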

DBPedialite

British Museum

DBPedia

Europeana

• “general” Europeana Data Model (EDM)
• collection holders are responsible for fitting their resources and metadata into EDM

British Library

Extreme cases

Challenges

Stakeholders go quiet!

1. Which metadata?

• Currently, resource metadata is mostly oriented towards “physical proximity”, i.e., indexes reflect similarity of author’s surname, broad subject, format, media, etc.

• Heterogeneous platforms and data models: incompatibility, transformations needed

• Even when RDF is used, there’s a proliferation of terms, vocabularies, and formats adopted: little (if any) validation
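One lightweight form of validation is to audit which predicates a dataset actually uses against a list of expected terms. The sketch below is an assumption about what such a check could look like, not a tool from the project. `KNOWN_TERMS` lists only a handful of real Dublin Core, FOAF and SKOS terms and would need extending in practice.

```python
from collections import Counter

# A deliberately small subset of real terms, for illustration only.
KNOWN_TERMS = {
    "http://purl.org/dc/terms/": {"title", "creator", "subject", "license"},
    "http://xmlns.com/foaf/0.1/": {"name", "depiction"},
    "http://www.w3.org/2004/02/skos/core#": {"prefLabel", "broader"},
}


def split_term(uri):
    """Split a term URI into (namespace, local name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def audit_predicates(predicates):
    """Count predicates that are known, misspelt, or from an unlisted vocabulary."""
    report = Counter()
    flagged = set()
    for predicate in predicates:
        namespace, local = split_term(predicate)
        if namespace not in KNOWN_TERMS:
            report["unknown vocabulary"] += 1
            flagged.add(predicate)
        elif local not in KNOWN_TERMS[namespace]:
            report["unknown term"] += 1
            flagged.add(predicate)
        else:
            report["ok"] += 1
    return report, sorted(flagged)
```

A misspelling such as `http://purl.org/dc/terms/titel` is reported as an unknown term, while a home-grown predicate is reported as an unknown vocabulary, which gives a quick picture of how far a dataset has drifted from shared vocabularies.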

2. Linking

• Systems that do not use RDF do not allow collection holders to express their knowledge as they wish: underspecified knowledge

• Even when RDF is used, information is often provided as literals rather than links to URIs: ad hoc solutions, unavailable in a machine-readable format
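To make the literals-versus-links point concrete, here is a small sketch that replaces literal values with hub URIs when a normalised label matches, and sets the rest aside for manual review. The normalisation (strip accents and punctuation, ignore case and word order) is a simplifying assumption. Real reconciliation against LoC, Geonames or DBPedia needs far more care, and the URIs in the usage example are hypothetical.

```python
import re
import unicodedata


def normalise(label):
    """Reduce a label to sorted lowercase alphanumeric tokens without accents."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9]+", text.casefold())
    return " ".join(sorted(tokens))


def build_index(hub_labels):
    """Map each normalised label to its hub URI.

    hub_labels: {uri: [label, ...]}
    """
    return {
        normalise(label): uri
        for uri, labels in hub_labels.items()
        for label in labels
    }


def link_literals(triples, index):
    """Swap literal objects for URIs where the index has a match."""
    linked, unmatched = [], []
    for s, p, o in triples:
        uri = index.get(normalise(o))
        if uri is None:
            unmatched.append((s, p, o))  # left for a person to resolve
        else:
            linked.append((s, p, uri))
    return linked, unmatched
```

With an index built from `{"http://example.org/hub/turner": ["J. M. W. Turner"]}`, the literal "Turner, J. M. W." is linked to the hub URI, whereas "Unknown hand" ends up in the unmatched list.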

3. Usability

• Reliability

• Lack of tools: developers have little contact with collection holders

• Licensing issues: resource licensing (not always explicit) and metadata licensing; users need to be aware of what these mean (note that in education things are slightly easier: blanket licensing etc.)

Interested?

• get in touch!

[email protected]

[email protected]

• new advertised position as Junior Data Architect: careershub.bbc.co.uk