EConnect WP1 & semantic issues VU members Guus Schreiber, Antoine Isaac, Jacco van Ossenbruggen, Jan Wielemaker

EConnect WP1 & semantic issues

• VU members: Guus Schreiber, Antoine Isaac, Jacco van Ossenbruggen, Jan Wielemaker

EConnect WP1

• Creation of semantic layer
• Alignment of vocabularies in that layer
• Technical specifications of semantics-based operations
• Integration with operational services

EDL D2.5 Requirements

• Contextualization of works using knowledge organization systems

• Reminder: SKOS can represent cross-vocabulary links
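A minimal Python sketch of such links, assuming a toy in-memory list of (subject, predicate, object) triples rather than any particular RDF store. The concept identifiers and the voc1:/voc2:/voc3: prefixes are hypothetical; skos:exactMatch and skos:prefLabel are real SKOS properties, and the traversal relies on SKOS declaring skos:exactMatch symmetric and transitive.

```python
from collections import defaultdict, deque

# Hypothetical triples; the voc1:/voc2:/voc3: concepts are illustrative only.
TRIPLES = [
    ("voc1:painting", "skos:prefLabel", "painting@en"),
    ("voc2:schilderij", "skos:prefLabel", "schilderij@nl"),
    ("voc3:peinture", "skos:prefLabel", "peinture@fr"),
    # Cross-vocabulary links, each stated once in one direction.
    ("voc1:painting", "skos:exactMatch", "voc2:schilderij"),
    ("voc2:schilderij", "skos:exactMatch", "voc3:peinture"),
]


def exact_match_closure(concept, triples):
    """Return the concept plus everything linked to it through skos:exactMatch.

    Links are followed in both directions (symmetry) and chained
    (transitivity) with a breadth-first search."""
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p == "skos:exactMatch":
            neighbours[s].add(o)
            neighbours[o].add(s)
    seen = {concept}
    queue = deque([concept])
    while queue:
        for nxt in neighbours[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def labels_across_vocabularies(concept, triples):
    """Collect the prefLabels of a concept and of all its exact matches."""
    matches = exact_match_closure(concept, triples)
    return sorted(o for s, p, o in triples if p == "skos:prefLabel" and s in matches)


print(labels_across_vocabularies("voc1:painting", TRIPLES))
# → ['painting@en', 'peinture@fr', 'schilderij@nl']
```

Following the mappings lets a search that starts from one vocabulary's concept also pick up the labels contributed by the other vocabularies.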

Important enriching challenges

Fine vision, but this requires:
• Identifying simple values with more complex objects from outside the original context
• Mapping knowledge organization systems together
• Recognizing relations between surrogates
– Including when they represent the same work

Important enriching challenges

• Finding good vocabularies for enriching and aligning

• Richly structured subject vocabularies or authority files, with appropriate coverage of a domain

• These are likely to be domain-specific
– At the level of aggregators?

The EDM, and then?

• That model alone is not enough

• It fits some precise data modeling and access needs

• But it does not commit much to specific domain or application requirements
– Remember: it's a feature!

Two steps for flexible and useful knowledge representation

• Fitting the domain via specialization
– Cf. Martin: cross-model integration via property specialization

– ens:isAbout > dc:subject > rma:depicts

– skos:broader > ex:broaderPartitive

• Someone has to take care of this:
– Europeana? Content providers? Aggregators?

– Cf. the process devised for ESE
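A minimal Python sketch of what specialization buys, assuming the slide's "A > B" notation means that B is declared an rdfs:subPropertyOf A. Data expressed with the specific property then becomes visible to applications that only know the generic one, following the RDFS entailment rules rdfs5 (transitivity) and rdfs7 (propagation). The resources ex:painting42, ex:Amsterdam and ex:Jordaan are hypothetical.

```python
SUB_PROPERTY = "rdfs:subPropertyOf"

# The slide's "ens:isAbout > dc:subject > rma:depicts" read as sub-property declarations.
SCHEMA = [
    ("rma:depicts", SUB_PROPERTY, "dc:subject"),
    ("dc:subject", SUB_PROPERTY, "ens:isAbout"),
    ("ex:broaderPartitive", SUB_PROPERTY, "skos:broader"),
]

# Hypothetical data that uses only the specialized properties.
DATA = [
    ("ex:painting42", "rma:depicts", "ex:Amsterdam"),
    ("ex:Jordaan", "ex:broaderPartitive", "ex:Amsterdam"),
]


def super_properties(prop, triples):
    """All properties that prop specializes, directly or transitively (rule rdfs5)."""
    direct = {}
    for s, p, o in triples:
        if p == SUB_PROPERTY:
            direct.setdefault(s, set()).add(o)
    result, stack = set(), [prop]
    while stack:
        for parent in direct.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def infer_generalizations(triples):
    """Add (s, q, o) for every (s, p, o) whose property p specializes q (rule rdfs7)."""
    inferred = set(triples)
    for s, p, o in triples:
        if p != SUB_PROPERTY:
            for q in super_properties(p, triples):
                inferred.add((s, q, o))
    return inferred


graph = infer_generalizations(SCHEMA + DATA)
# A generic application asking for ens:isAbout or skos:broader now finds both facts.
print(("ex:painting42", "ens:isAbout", "ex:Amsterdam") in graph)  # → True
print(("ex:Jordaan", "skos:broader", "ex:Amsterdam") in graph)  # → True
```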

Two steps for flexible and useful knowledge representation

• Fitting application requirements

• The art of creating shortcuts in the representations
• New application-specific properties as views over complex paths
– surrogateMatch
– integrating all views
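A minimal Python sketch of a shortcut property materialized as a view over a path, assuming a hypothetical path surrogate → ex:describes → object → ex:realizes → work; none of these names come from the EDM. Two surrogates form a surrogateMatch pair when their paths reach a common node, e.g. a painting and a print that realize the same work.

```python
from itertools import combinations

# Hypothetical graph: two surrogates describe different objects realizing the same work.
TRIPLES = [
    ("ex:surrogateA", "ex:describes", "ex:paintingObj"),
    ("ex:surrogateB", "ex:describes", "ex:printObj"),
    ("ex:surrogateC", "ex:describes", "ex:otherObj"),
    ("ex:paintingObj", "ex:realizes", "ex:workW"),
    ("ex:printObj", "ex:realizes", "ex:workW"),
    ("ex:otherObj", "ex:realizes", "ex:workV"),
]

# Hypothetical path from a surrogate to the work it ultimately represents.
WORK_PATH = ["ex:describes", "ex:realizes"]


def follow_path(start, path, triples):
    """Return every node reachable from start by following the properties of path in order."""
    frontier = {start}
    for prop in path:
        frontier = {o for s, p, o in triples if p == prop and s in frontier}
    return frontier


def surrogate_match(surrogates, path, triples):
    """Materialize the shortcut: pairs of surrogates whose paths end at a shared node."""
    ends = {s: follow_path(s, path, triples) for s in surrogates}
    return {
        (a, b)
        for a, b in combinations(sorted(surrogates), 2)
        if ends[a] & ends[b]
    }


surrogates = ["ex:surrogateA", "ex:surrogateB", "ex:surrogateC"]
print(surrogate_match(surrogates, WORK_PATH, TRIPLES))
# → {('ex:surrogateA', 'ex:surrogateB')}
```

Once computed, the pairs can be stored as ordinary triples, so applications query the shortcut instead of walking the path each time.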

This is important, and related to how Europeana will exploit the EDM

• What Jan is presenting in the room above

• Several options for considering semantic services for Europeana

• Pre-processing of the query
– E.g., autocompletion using semantic networks

• Parallel processing of the query

At the end of parallel processing

[Diagram: Disambiguation, Relations, Vocabularies]

This is important, and related to how Europeana will exploit the EDM

• What Jan is presenting in the room above

• Several options for considering semantic services for Europeana

• Pre-processing of the query
– E.g., autocompletion using semantic networks

• Parallel processing of the query
• Post-processing of the query
– E.g., clustering
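A minimal Python sketch of query pre-processing by autocompletion over a semantic network rather than over bare strings, assuming a hypothetical mini-thesaurus in which each concept has labels in several languages and a broader concept. Suggestions carry the concept identifier and its broader concept's label, so that the user can tell homonyms apart before the query is run.

```python
# Hypothetical mini-thesaurus: concept -> multilingual labels and broader concept.
CONCEPTS = {
    "ex:Leonardo": {"labels": ["Leonardo da Vinci", "Léonard de Vinci"], "broader": "ex:Painters"},
    "ex:Leiden": {"labels": ["Leiden", "Leyde"], "broader": "ex:Cities"},
    "ex:Painters": {"labels": ["painters"], "broader": None},
    "ex:Cities": {"labels": ["cities"], "broader": None},
}


def autocomplete(prefix, concepts, limit=10):
    """Suggest concepts whose labels start with prefix (case-insensitive).

    Each suggestion is (matched label, concept id, broader concept's first
    label), at most one per concept, in concept-id order."""
    prefix = prefix.casefold()
    suggestions = []
    for concept_id, info in sorted(concepts.items()):
        for label in info["labels"]:
            if label.casefold().startswith(prefix):
                broader = info["broader"]
                context = concepts[broader]["labels"][0] if broader else None
                suggestions.append((label, concept_id, context))
                break  # keep only the first matching label of each concept
    return suggestions[:limit]


print(autocomplete("le", CONCEPTS))
# → [('Leiden', 'ex:Leiden', 'cities'), ('Leonardo da Vinci', 'ex:Leonardo', 'painters')]
```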

ClioPatria: “Matisse”

[Screenshot of search results for “Matisse”, grouped as: “Matisse” in the title; Located in “Musee Matisse”; Created by “Matisse”; Paintings in the same style as used by “Matisse”]

What Jan is saying (cont'd)

• Semantic search is oriented towards serendipity
• Great, but there are scalability problems
• Standing in the path of the operational system?
– Not really recommended…

• Still allows for parallel and maybe post-processing
– For scenarios where users can cope with rich information

What Jan is saying (cont'd)

• Other solution?
– Like, more basic stuff!

• Well, we have a schema that makes quite detailed distinctions; let's make it work…

Derived properties as a way to "index" derived relations

• Complex paths are expensive to query
• Shortcuts are useful

• Example: searching for "Everything inspired by Leonardo's work"

In the original EDM-compliant graph

Derived properties as a way to "index" derived relations

• Having the value "Leonardo" somehow directly attached to the surrogate of MonaLisa2000 would be handy
– As well as the labels for Da Vinci in other languages

• In fact this can be used for enriching the (XML) records before they get indexed in the Europeana operational service
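A minimal Python sketch of such a derived property, assuming hypothetical property names (ex:inspiredBy, ex:creator, ex:label) standing in for the path in the EDM-compliant graph. The shortcut collapses surrogate → work → creator → labels into one multilingual field attached directly to the flat record that will later be indexed.

```python
# Hypothetical graph: ex:inspiredBy, ex:creator and ex:label are illustrative,
# not actual EDM properties. Labels are (text, language) pairs.
GRAPH = [
    ("ex:MonaLisa2000", "ex:inspiredBy", "ex:MonaLisa"),
    ("ex:MonaLisa", "ex:creator", "ex:Leonardo"),
    ("ex:Leonardo", "ex:label", ("Leonardo da Vinci", "en")),
    ("ex:Leonardo", "ex:label", ("Léonard de Vinci", "fr")),
    ("ex:Leonardo", "ex:label", ("Leonardo da Vinci", "it")),
]


def objects(subject, prop, graph):
    """Values of prop for subject."""
    return [o for s, p, o in graph if s == subject and p == prop]


def inspired_by_creator_labels(surrogate, graph):
    """Shortcut for surrogate -inspiredBy-> work -creator-> agent -label-> text.

    Returns the distinct label strings in all languages, sorted."""
    labels = set()
    for work in objects(surrogate, "ex:inspiredBy", graph):
        for agent in objects(work, "ex:creator", graph):
            for text, _lang in objects(agent, "ex:label", graph):
                labels.add(text)
    return sorted(labels)


def enrich_record(record, graph):
    """Return a copy of a flat record with the derived values attached directly."""
    enriched = dict(record)
    enriched["inspiredByCreator"] = inspired_by_creator_labels(record["id"], graph)
    return enriched


print(enrich_record({"id": "ex:MonaLisa2000"}, GRAPH))
# → {'id': 'ex:MonaLisa2000', 'inspiredByCreator': ['Leonardo da Vinci', 'Léonard de Vinci']}
```

The path is walked once at enrichment time; afterwards the values sit in the record like any other field.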

Compiling for traditional text-search

[Diagram: EDM, Semantic Engine, XML dump, Lucene]
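A minimal Python sketch of why this compilation step matters for plain text search, using a toy inverted index as a stand-in for Lucene (this is not the Lucene API). The field name inspiredByCreator and the record contents are hypothetical; the point is that a keyword query such as "vinci" only finds the item once the derived labels have been compiled into the record.

```python
import re
from collections import defaultdict

FIELDS = ["title", "inspiredByCreator"]


def tokens(text):
    """Lower-cased word tokens; word characters include accented letters such as 'é'."""
    return re.findall(r"\w+", text.casefold())


def build_index(records, fields):
    """Toy inverted index (term -> set of record ids) over string or list-of-string fields."""
    index = defaultdict(set)
    for rec in records:
        for field in fields:
            value = rec.get(field, [])
            values = [value] if isinstance(value, str) else value
            for text in values:
                for term in tokens(text):
                    index[term].add(rec["id"])
    return index


def search(index, query):
    """Ids of records containing every query term (AND semantics)."""
    terms = tokens(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


# Hypothetical records: the same item before and after enrichment.
plain = {"id": "r1", "title": "MonaLisa2000"}
enriched = {**plain, "inspiredByCreator": ["Leonardo da Vinci", "Léonard de Vinci"]}

print(search(build_index([plain], FIELDS), "vinci"))  # → set()
print(search(build_index([enriched], FIELDS), "Léonard Vinci"))  # → {'r1'}
```

The semantic work happens before indexing, so the operational engine keeps answering simple keyword queries at its usual cost.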

Determining pre-compilation strategies

• What should a pre-compiled, enriched record contain?
– Labels? Closely related concepts? Labels from other languages?

– Which shortcuts are relevant? Which are the most useful?

• Coming up with appropriate ways to make the schema work

• Maybe several profiles can be used
– Cf. the way the different ESE elements are used for different Europeana features (timeline, advanced search, basic search)

• This is also semantics!

• But highly dependent on applications

Guidelines and best practices will be handy

• Connecting specialized data models to more generic ones

• Enrichment
• Connection of objects (identity conditions)
• "Practical" application-specific semantics