Personalised access to cultural heritage spaces

www.paths-project.eu


Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulting from automatic enrichment

Aitor Soroa, Eneko Agirre, Arantxa Otegi, Antoine Isaac

February 10, 2014

Contents

1 Introduction

2 ESEPaths

3 Roadmap for basic conversion of ESEPaths to EDM

4 Using Open Annotation to represent attributes in relations

4.1 Offsets and selectors

5 Conclusion

1 Introduction

This document is a case study on using the Europeana Data Model (EDM) [Doerr et al., 2010]¹ for representing annotations of Cultural Heritage Objects (CHOs). One of the main goals of the PATHS project is to augment CHOs (items) with information that will enrich the user’s experience. The additional information includes links between items in cultural collections and from items to external sources like Wikipedia. To this end, the PATHS project has applied Natural Language Processing (NLP) techniques to a subset of the items in Europeana. Using these techniques, PATHS enriches CH items with the following information [Agirre and de Lacalle, 2011, Otegi et al., 2012]:

• Informativeness score: each item is associated with a value indicating the overall “informativeness” of the item, which is derived from the amount of text in its metadata and is inversely proportional to the number of items in which the same text is mentioned.

• Vocabulary terms: vocabulary terms associated with the item. These terms are used for creating the tag clouds shown to the user.

¹ http://pro.europeana.eu/edm-documentation


• Event information associated with the item: CHOs often provide event- or activity-related information, such as people walking. We enrich the items by means of a predefined list of words that can be used to refer to events. This data allows answering queries like “give me items with people running” or “items with people playing”.

• Related items: CH items which are semantically related.

• Background links that relate CH items with external resources such as Wikipedia. When linking a CH item with some external resource, we keep track of the original text snippet from which the association is derived. For instance, an item could be related to a Wikipedia article because of some text snippet in the dc:description field. In such a case we store the reference to the field and the offset as attributes.² Note, however, that in some cases there is little point in keeping the text, because the enrichment is based on a combination of metadata fields.

The PATHS project started in 2011 and adopted ESE³, the representation schema of choice at the time. We extended it to a format called ESEPaths to represent the enrichment information just mentioned [Agirre and de Lacalle, 2011, Otegi et al., 2012]. In this document we describe a proposal for representing PATHS enrichments following EDM (Europeana Data Model), the new data model used by Europeana.

The document is structured as follows. We first introduce ESEPaths (Section 2), then the roadmap for a simple conversion to EDM (Section 3). Section 4 explains some possible (advanced) solutions to the problems identified in Section 3. Finally, Section 5 draws the conclusions.

2 ESEPaths

PATHS has defined a format derived from ESE, called ESEPaths, which adds the enrichment information described above. Specifically, ESEPaths adds the following fields:

• <paths:informativeness> with the informativeness score of the ESE record.

• <paths:vocabulary>, which links the ESE record with vocabulary terms. The element has the following attributes:

  – name: name of the external vocabulary.
  – URI: the address (URI) of the specific category in the vocabulary.
  – confidence: the confidence of the association.

• <paths:event>, which links the ESE record with external events. The element has the following attributes:

  – source: the name of the external resource of the event (for instance, WordNet).
  – canonical_form: the canonical word form of the annotated event.
  – confidence: confidence of the association.

² Keeping track of this information is useful, for instance, for an interface showing those annotations, as it can emphasize the specific snippet and link it to the Wikipedia/dbpedia article when the user points to it.

³ http://www.europeana.eu/schemas/ese/


    <record>
      <!-- Existing ESE record -->
      <dc:identifier>http://www.thebowesmuseum.org.uk/10432/</dc:identifier>
      <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
      <dc:title>Stembridge Windmill, High Ham, Somerset</dc:title>
      <dc:description>This is a random-coursed blue lias ...</dc:description>
      <dcterms:isPartOf>Bowes Museum</dcterms:isPartOf>
      <dc:subject>1670</dc:subject>
      <dc:type>Image</dc:type>
      <europeana:provider>CultureGrid</europeana:provider>
      <europeana:isShownAt>http://www.thebowesmuseum.org.uk/10432/</europeana:isShownAt>
      <europeana:hasObject>false</europeana:hasObject>
      <europeana:country>uk</europeana:country>
      <europeana:type>IMAGE</europeana:type>
      <europeana:language>en</europeana:language>

      <!-- ESEPaths augmentation -->

      <!-- item informativeness -->
      <paths:informativeness>0.7</paths:informativeness>

      <!-- vocabulary mapping -->
      <paths:vocabulary confidence="0.8" source="wikicat"
          URI="http://en.wikipedia.org/wiki/Category:Tower_mills">Tower Mills</paths:vocabulary>

      <!-- events -->
      <paths:event confidence="0.8" source="wordnet" canonical_form="play"
          start_offset="120" end_offset="127" field="dc:description">playing</paths:event>

      <!-- related items -->
      <paths:related_item confidence="0.8" field="dc:subject" field_no="1"
          method="LDA">http://www.europeana.eu/portal/record/09405t/A6F9A
      </paths:related_item>

      <!-- background links items -->
      <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
          field="dc:subject" confidence="0.015"
          method="wikipedia-miner-1.2.0"
          title="Archaeology">http://en.wikipedia.org/wiki/Archaeology
      </paths:background_link>
    </record>

Figure 1: Example of an ESEPaths record
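A record like the one in Figure 1 can be read with any namespace-aware XML parser. The following is a minimal Python sketch using only the standard library, not the project's actual tooling. The namespace URI `PATHS_NS` and the helper name `paths_enrichments` are assumptions made for illustration: Figure 1 does not show the namespace declarations, so the real URI may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: Figure 1 does not show the declarations.
PATHS_NS = "http://www.paths-project.eu/ns#"

# Shortened version of the ESEPaths augmentation from Figure 1.
SAMPLE = f"""<record xmlns:paths="{PATHS_NS}">
  <paths:informativeness>0.7</paths:informativeness>
  <paths:event confidence="0.8" source="wordnet" canonical_form="play"
               start_offset="120" end_offset="127"
               field="dc:description">playing</paths:event>
  <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
                         field="dc:subject" confidence="0.015"
                         method="wikipedia-miner-1.2.0" title="Archaeology">
    http://en.wikipedia.org/wiki/Archaeology
  </paths:background_link>
</record>"""


def paths_enrichments(xml_text):
    """Return (element name, attributes, text) for every paths:* child."""
    root = ET.fromstring(xml_text)
    prefix = "{" + PATHS_NS + "}"
    found = []
    for child in root:
        if child.tag.startswith(prefix):
            name = child.tag[len(prefix):]
            text = (child.text or "").strip()
            found.append((name, dict(child.attrib), text))
    return found


for name, attrs, text in paths_enrichments(SAMPLE):
    print(name, attrs.get("confidence"), text)
```

Each enrichment comes back with its attributes as a plain dictionary, which is the information that the conversion to EDM has to carry over or, where EDM offers no place for it, drop.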


• <paths:related_item>, which links the ESE record with related CH items. The element has the following attributes:

  – confidence: confidence of the association.
  – method: which method produced the association.
  – field: the name of the ESE field whose content suggests the similarity relation.
  – field_no: the position of the ESE field described above (useful in case the ESE record contains more than one field with the same name).

• <paths:background_link>, which links the ESE record with an item from an external resource. The element has the following attributes:

  – source: the name of the external resource.
  – start_offset: the offset (in characters) within the field element where the text anchor begins.
  – end_offset: the offset (in characters) within the field element where the text anchor ends.
  – field: the field of the ESE record where the anchor for this relation is located.
  – confidence: confidence of the association.
  – method: which method produced the association.
  – title: title of the URL which the background link points to.
  – sentiment: polarity of the textual information included in the corresponding link. It has fixed values, namely “pos” for positive results, “neg” for negative and “neu” for neutral.

Figure 1 shows an example of a CH record in ESEPaths. The first lines are just a copy of the original ESE record, whereas the new elements (in the paths namespace) are at the end. Note that the identifiers (including URIs) are not real and have been shortened so that the listing fits on the page.

3 Roadmap for basic conversion of ESEPaths to EDM

As said before, all the data produced by the PATHS project is encoded following the ESE format extended with new elements. However, Europeana is switching from ESE to a new data model, EDM. The main difference between ESE and EDM is that the latter is more expressive and based on Semantic Web and Linked Data technologies (RDF, ontologies). In this section, we outline the main design we have devised for switching from ESEPaths to EDM.

We have followed these design criteria:

1. All PATHS annotations should be properly represented using EDM.

2. It must be possible to retrieve particular PATHS annotations.

3. We should depart as little as possible from standard EDM.


The first criterion states that all PATHS annotations should be described using EDM. As will be shown below, some annotation attributes are difficult to represent following EDM and, as a consequence, a compromise has to be made between describing PATHS annotations in their full richness and using proper EDM concepts and properties for representing them. The second criterion states that the EDM representation has to respect the types of the PATHS annotations. For instance, it has to be possible to retrieve all background links of a particular CH item (as opposed to, say, its related items). Finally, the last criterion states that we should use widely used EDM objects and properties as far as possible. In particular, the EDM representation should use the set of elements described by Europeana’s instructions for providers⁴, when possible.

We now describe the main steps for converting the PATHS annotations to EDM.

From ESEPaths to EDM

We start by describing the resources which are already in Europeana. These include a Europeana ore:Aggregation resource with information about the digital aggregation process itself (provider, etc.)⁵.

    <http://data.europeana.eu/aggregation/provider/09405/8F49> a ore:Aggregation;
        edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
        edm:dataProvider "English Heritage - Viewfinder";
        edm:provider "CultureGrid";
        edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
        edm:object <http://www.culturegrid.org.uk/1512084/thumbnail_image_jpeg>;
        edm:rights <http://www.europeana.eu/rights/rr-f/>.

Europeana also provides a proxy for the CHO, attached to this aggregation⁶:

    <http://data.europeana.eu/proxy/provider/09405/8F49> a ore:Proxy;
        ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
        ore:proxyIn <http://data.europeana.eu/aggregation/provider/09405/8F49>;
        # Original ESE data
        dc:creator "Davies, J O";
        dc:date "[2001]";
        dc:title "Stembridge Windmill, High Ham, Somerset";
        dc:description "This is a random-coursed blue lias ...".

We now describe the way to represent the enrichment annotations as provided by the PATHS project. We encapsulate these annotations into a new ore:Aggregation. This aggregation resource records a first set of enrichments created by the PATHS project over the original CH object. It includes all relevant information like provider name, access rights, etc. as well as the annotations referring to the whole CH object, as opposed to enrichment information extracted from some subset of the CH object’s metadata.

⁴ http://europeanalabs.eu/wiki/EDMObjectTemplatesProviders

⁵ The resource identifier of the aggregation used in the example is not real. The real one should be provided by Europeana.

⁶ Note again that the resource identifier of the proxy used in the example is not real.


    <http://www.paths-project.eu/aggregation/paths/09405/8F49> a ore:Aggregation;
        edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
        edm:provider "PATHS";
        edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
        edm:rights <http://www.paths-project.eu/rights/rr-f/>;
        # item informativeness
        paths:informativeness "0.7".

A few points are worth noting:

• The isShownAt property points to the original record, as the PATHS project does not store any information besides the proper enrichment of CH items.

• The edm:rights property refers to the annotated information (instead of the rights of the original CH item).

• As said before, the paths:informativeness element pertains to the PATHS aggregation resource because it refers to the CH object as a whole.

Finally, we create a proxy resource for the PATHS aggregation and describe the remaining paths annotations within the scope of this resource (as its properties):

    <http://www.paths-project.eu/proxy/paths/09405/8F49> a ore:Proxy;
        ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
        ore:proxyIn <http://www.paths-project.eu/aggregation/paths/09405/8F49>;
        # vocabulary mapping
        edm:isRelatedTo <http://www.paths-project.eu/vocabulary/Tower_mills>;
        # events
        edm:isRelatedTo <http://www.paths-project.eu/event/playing>;
        # related items
        edm:isRelatedTo <http://www.europeana.eu/portal/record/09405t/A6F9A>;
        # background links items
        edm:isRelatedTo <http://en.wikipedia.org/wiki/Archaeology>.
        # Or <http://dbpedia.org/resource/Archaeology>
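The proxy listing above is regular enough to be generated mechanically. The following Python sketch shows one way this could be done. The function name `paths_proxy_turtle` is hypothetical, the URI patterns are simply copied from the (non-real) identifiers used in the example, and prefix declarations are left out as in the listings of this note.

```python
def paths_proxy_turtle(record_id, related_uris):
    """Serialise a PATHS proxy for one CH object as Turtle text.

    record_id    -- Europeana record path, e.g. "09405/8F49"
    related_uris -- URIs of the enrichment targets (vocabulary concepts,
                    events, related items, background links)
    """
    proxy = f"<http://www.paths-project.eu/proxy/paths/{record_id}>"
    lines = [
        f"{proxy} a ore:Proxy ;",
        f"    ore:proxyFor <http://data.europeana.eu/item/{record_id}> ;",
        f"    ore:proxyIn <http://www.paths-project.eu/aggregation/paths/{record_id}>",
    ]
    for uri in related_uris:
        # Close the previous statement, then add one edm:isRelatedTo per target.
        lines[-1] += " ;"
        lines.append(f"    edm:isRelatedTo <{uri}>")
    lines[-1] += " ."
    return "\n".join(lines)


print(paths_proxy_turtle("09405/8F49", [
    "http://www.paths-project.eu/vocabulary/Tower_mills",
    "http://www.paths-project.eu/event/playing",
    "http://www.europeana.eu/portal/record/09405t/A6F9A",
    "http://en.wikipedia.org/wiki/Archaeology",
]))
```

The sketch does not validate or escape the URIs it is given, and all four kinds of enrichment end up under the same edm:isRelatedTo property.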

Representing various types of enrichment. As shown in the Turtle example above, the proxy resource relates the CH item with external resources such as vocabulary concepts, events, related items or objects from external sources (such as Wikipedia or dbpedia). As all the associations are described by means of the high-level edm:isRelatedTo property, it is necessary to properly declare the types of the external objects related to the CH object. Otherwise, there would be no way to discriminate among the different types of PATHS annotations (for instance, there would be no way to specifically retrieve the vocabulary concepts related to a CH object). As a first solution, we can include a separate description for the resources linked to the CH object using SKOS⁷. Within PATHS we define the following types of external resources:

• Related CH items are of type paths:RelatedItemConcept, which is in turn a subclass of skos:Concept.

• Vocabulary concepts are of type skos:Concept.

• Events are of type paths:EventConcept, a subclass of skos:Concept. It represents any concept which refers to a (type of) event (such as “run”, “play”, etc.).

• Background links are of type paths:BackgroundLinkConcept, a subclass of skos:Concept.

⁷ http://www.w3.org/2004/02/skos

Note that these classes are meant to offer a way to discriminate among the different types of annotations inside the PATHS project. The classes are therefore loosely defined, in the sense that they do not describe the proper semantic type of the resources. For instance, PATHS can relate a CH object with a dbpedia resource representing a place (New_York), a person (Pablo_Picasso), etc. However, within the scope of the PATHS annotations, the only explicit common type for all those resources is the one inherited from their “background link” status.

Also note that, for the time being, Europeana would not be able to perfectly ingest data that uses such sub-classes, as they depart from the set of elements described by Europeana’s instructions for providers⁸. This would require Europeana to handle specialisations of EDM, which is not precisely scheduled at the time of writing.

Based on the above, we also include the following statements in the example:

    <http://www.paths-project.eu/vocabulary/Tower_mills> a skos:Concept;
        skos:prefLabel "Tower Mills"@en.

    <http://www.paths-project.eu/event/playing> a paths:EventConcept;
        skos:prefLabel "playing"@en.

    <http://www.europeana.eu/portal/record/09405t/A6F9A> a paths:RelatedItemConcept.

    <http://en.wikipedia.org/wiki/Archaeology> a paths:BackgroundLinkConcept;
        skos:prefLabel "Archaeology"@en.

along with the definitions of these new types:

    paths:EventConcept a owl:Class ;
        rdfs:subClassOf skos:Concept ;
        rdfs:label "Event Concept"@en ;
        skos:definition "A concept describing an Event"@en .

    paths:RelatedItemConcept a owl:Class ;
        rdfs:subClassOf skos:Concept ;
        rdfs:label "Related Item Concept"@en ;
        skos:definition "A concept describing a CH record"@en .

    paths:BackgroundLinkConcept a owl:Class ;
        rdfs:subClassOf skos:Concept ;
        rdfs:label "Background Link Concept"@en ;
        skos:definition "A concept describing an object from an external source such as dbpedia"@en .

The above definitions can be put next to the annotation data, in a separate file directly provided to Europeana or others, or even served over the Web in a Linked Data scenario. The whole EDM representation for the item is shown in Figure 2.
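To illustrate why the type declarations matter for the second design criterion, the following Python sketch retrieves only the enrichments of one kind. It is a toy illustration under stated assumptions: the triples are held as plain tuples rather than in an RDF store, the function name `related_of_type` is hypothetical, and no subclass inference is performed, so only the explicitly stated type is matched.

```python
# Triples from the example, written as (subject, predicate, object) tuples.
PROXY = "http://www.paths-project.eu/proxy/paths/09405/8F49"
TRIPLES = [
    (PROXY, "edm:isRelatedTo", "http://www.paths-project.eu/vocabulary/Tower_mills"),
    (PROXY, "edm:isRelatedTo", "http://www.paths-project.eu/event/playing"),
    (PROXY, "edm:isRelatedTo", "http://www.europeana.eu/portal/record/09405t/A6F9A"),
    (PROXY, "edm:isRelatedTo", "http://en.wikipedia.org/wiki/Archaeology"),
    ("http://www.paths-project.eu/vocabulary/Tower_mills", "rdf:type",
     "skos:Concept"),
    ("http://www.paths-project.eu/event/playing", "rdf:type",
     "paths:EventConcept"),
    ("http://www.europeana.eu/portal/record/09405t/A6F9A", "rdf:type",
     "paths:RelatedItemConcept"),
    ("http://en.wikipedia.org/wiki/Archaeology", "rdf:type",
     "paths:BackgroundLinkConcept"),
]


def related_of_type(triples, subject, rdf_type):
    """Return the edm:isRelatedTo targets of `subject` typed as `rdf_type`."""
    typed = {s for s, p, o in triples if p == "rdf:type" and o == rdf_type}
    return [o for s, p, o in triples
            if s == subject and p == "edm:isRelatedTo" and o in typed]


print(related_of_type(TRIPLES, PROXY, "paths:BackgroundLinkConcept"))
# prints ['http://en.wikipedia.org/wiki/Archaeology']
```

Without the rdf:type statements, all four targets would be indistinguishable, since they hang off the same edm:isRelatedTo property.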

⁸ http://europeanalabs.eu/wiki/EDMObjectTemplatesProviders


    <http://data.europeana.eu/aggregation/provider/09405/8F49> a ore:Aggregation;
        edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
        edm:dataProvider "English Heritage - Viewfinder";
        edm:provider "CultureGrid";
        edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
        edm:object <http://www.culturegrid.org.uk/1512084/thumbnail_image_jpeg>;
        edm:rights <http://www.europeana.eu/rights/rr-f/>.

    <http://data.europeana.eu/proxy/europeana/09405/8F49> a ore:Proxy;
        ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
        ore:proxyIn <http://www.paths-project.eu/aggregation/europeana/09405/8F49>;
        # Existing ESE record
        dc:creator "Davies, J O";
        dc:date "[2001]";
        dc:title "Stembridge Windmill, High Ham, Somerset";
        dc:description "This is a random-coursed blue lias ...".

    <http://www.paths-project.eu/aggregation/europeana/09405/8F49> a ore:Aggregation;
        edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49>;
        edm:provider "PATHS";
        edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8>;
        edm:rights <http://www.paths-project.eu/rights/rr-f/>;
        # item informativeness
        paths:informativeness "0.7".

    <http://www.paths-project.eu/proxy/europeana/09405/8F49> a ore:Proxy;
        ore:proxyFor <http://data.europeana.eu/item/09405/8F49>;
        ore:proxyIn <http://www.paths-project.eu/aggregation/europeana/09405/8F49>;
        # vocabulary mapping
        edm:isRelatedTo <http://www.paths-project.eu/vocabulary/Tower_mills>;
        # events
        edm:isRelatedTo <http://www.paths-project.eu/event/playing>;
        # related items
        edm:isRelatedTo <http://www.europeana.eu/portal/record/09405t/A6F9A>;
        # background links items
        edm:isRelatedTo <http://en.wikipedia.org/wiki/Archaeology>.
        # Or <http://dbpedia.org/resource/Archaeology>

    <http://www.paths-project.eu/vocabulary/Tower_mills> a skos:Concept;
        skos:prefLabel "Tower Mills"@en.

    <http://www.paths-project.eu/event/playing> a paths:EventConcept;
        skos:prefLabel "playing"@en.

    <http://www.europeana.eu/portal/record/09405t/A6F9A> a paths:RelatedItemConcept.

Figure 2: EDM representation of the ESEPaths example


Using specific metadata fields to represent enrichments. Alternatively, if a PATHS enrichment is known to be certain, a new metadata field can be created for the CH object. For instance, if the mapping of the CH record to a vocabulary concept is known to be correct, we can create a new dc:subject field linking the CH record with the appropriate vocabulary concept. Note, however, that PATHS enrichments are performed automatically, and it is not certain that a concept enrichment derived from a dc:subject field would result in a dc:subject relation between the object and the concept. The link to the concept may have been identified based on only a small part of the original field, thus missing some of the original semantics. Some manual assessment is therefore needed in order to promote the annotation into a proper metadata field.

4 Using Open Annotation to represent attributes in relations

The roadmap described in the previous section covers the main aspects of ESEPaths. However, there is one piece of ESEPaths data that cannot easily be represented in EDM, which inherits RDF’s focus on binary relations: attributes on relations. Almost all annotations created by the PATHS project have some information associated with them. In particular, many annotations record a confidence value describing the level of certainty of the automatic method when creating the annotation.

A way to overcome this limitation in an RDF-based model would be to reify the annotation into an instance of a dedicated class, and represent the annotation attributes using properties of that class. For this we can re-use elements from the Open Annotation (OA) model⁹. Consider this ESEPaths snippet:

    <record>
      ...
      <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
      <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
          field="dc:subject" confidence="0.015"
          method="wikipedia-miner-1.2.0"
          title="Archaeology">http://en.wikipedia.org/wiki/Archaeology
      </paths:background_link>
    </record>

We would create the following oa:Annotation for it:

    background_link1 a oa:Annotation ;
        a paths:BackgroundLinkAnnotation ;
        oa:hasTarget <http://www.paths-project.eu/proxy/europeana/09405/8F49> ;
        oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> ;
        # Or <http://dbpedia.org/resource/Archaeology>
        paths:source <http://en.wikipedia.org> ;
        # Or <dbpedia.org>
        paths:confidence "0.015" .

In the example, the <paths:background_link> annotation has been converted (reified) to an oa:Annotation resource background_link1 of type paths:BackgroundLinkAnnotation, linked by the oa:hasTarget relation to the PATHS proxy resource. The attributes of the original relation are now represented as properties of this new resource.

⁹ http://www.openannotation.org/spec/core/
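The reification step can be sketched as a small Python function that turns one background link and its attributes into the Turtle shown above. This is only an illustration: the function name `background_link_annotation` is hypothetical, the bare resource name is written as in the listings of this note (a real serialisation would need a prefix or a full URI), and the mapping from the ESEPaths source attribute ("wikipedia") to a URI is assumed to have been done by the caller.

```python
def background_link_annotation(name, target, body, source, confidence):
    """Serialise one reified background link as an oa:Annotation in Turtle."""
    return "\n".join([
        f"{name} a oa:Annotation, paths:BackgroundLinkAnnotation ;",
        f"    oa:hasTarget <{target}> ;",
        f"    oa:hasBody <{body}> ;",
        f"    paths:source <{source}> ;",
        f'    paths:confidence "{confidence}" .',
    ])


print(background_link_annotation(
    "background_link1",
    "http://www.paths-project.eu/proxy/europeana/09405/8F49",
    "http://en.wikipedia.org/wiki/Archaeology",
    "http://en.wikipedia.org",
    "0.015",
))
```

Because the annotation is now a resource in its own right, further attributes such as the method or the title could be attached in the same way, each as one more property.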

An alternative to the above approach would be to use the OA “motivation” property for representing the annotation. The OA motivation is meant to represent “the reasons why the Annotation was created, not just the agents involved”¹⁰, which fits particularly well with the kind of information we want to represent. The “motivation” approach would lead to the following triples:

    background_link1 a oa:Annotation ;
        oa:motivatedBy paths:backgroundLinkMotivation ;
        oa:hasTarget <http://www.paths-project.eu/proxy/europeana/09405/8F49> ;
        oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> ;
        # Or <http://dbpedia.org/resource/Archaeology>
        paths:source <http://en.wikipedia.org> ;
        # Or <dbpedia.org>
        paths:confidence "0.015" .

In this case, the <paths:background_link> object is of type oa:Annotation, and it is also oa:motivatedBy a paths:backgroundLinkMotivation, an instance of skos:Concept.

Both approaches described so far solve the main problem of attaching attributes to relations. They also remove the need to define PATHS-specific relations such as paths:background_link, which would conflict with the metadata fields currently used by EDM. Note, however, that the properties of the newly defined reified annotations are still specific to PATHS (paths:source, paths:confidence, etc.).

On a side note, using reified concepts for annotation raises the issue of whether we should still keep the proxy-based representation next to it. Because all the PATHS enrichment data is now attached to the reified annotation, the Proxy object described in Section 3 will convey little or no information at all, compared to the original data.

4.1 Offsets and selectors

There is another piece of ESEPaths data which is not currently represented in EDMPaths, namely the field and offset attributes of the relations. Because all PATHS annotations are extracted from the textual content of some metadata field in the original CH record representation, ESEPaths annotations keep track of the original text snippet (called the anchor) which was used to derive the enrichment.

In order to track this kind of provenance information, EDM could re-use the selectors from the Open Annotation model¹¹. For instance, consider the following ESEPaths snippet:

    <record>
      ...
      <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
      ...
      <paths:background_link start_offset="0" end_offset="11"
          field="dc:subject" ... >http://en.wikipedia.org/wiki/Archaeology
      </paths:background_link>
    </record>

¹⁰ http://www.openannotation.org/spec/core/core.html#Motivations

¹¹ http://www.openannotation.org/spec/core/specific.html#Selectors


It describes a “background link” annotation for the CH object “09405/8F49” which was extracted by analyzing offsets 0-11 of the dc:subject field of the original record. These offsets could be translated to the following Open Annotation snippet:

    background_link1 a oa:Annotation ;
        oa:hasTarget anchor1 ;
        oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> .

    anchor1 a oa:SpecificResource ;
        oa:hasSource ??? ; # which type has this object ?
        oa:hasSelector selector1 .

    selector1 a oa:TextPositionSelector ;
        oa:start 0 ;
        oa:end 11 .

As noted in the snippet, our problem is then to define the type of the anchor1 resource. This object should represent the dc:subject field of CH record “09405/8F49”, but there is actually no way to describe this with EDM. We thus decided to leave this piece of information out of our proposed solution.
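For readers unfamiliar with character-offset selectors, the following Python sketch shows how a start and end offset pick the anchor out of a field value. Two assumptions are made here: the end offset is treated as exclusive, which is consistent with the examples in this note (offsets 0-11 for the eleven-character string "Archaeology", 120-127 for "playing"), and the field value used below is invented, since the real dc:subject content of the record is not shown. The function name `anchor_text` is hypothetical.

```python
def anchor_text(field_value, start, end):
    """Return the text anchor selected by character offsets [start, end)."""
    if not 0 <= start <= end <= len(field_value):
        raise ValueError("offsets fall outside the field value")
    return field_value[start:end]


# Hypothetical dc:subject value; the real field content is not shown in the note.
subject = "Archaeology; windmills"
print(anchor_text(subject, 0, 11))  # prints Archaeology
```

The offsets are only meaningful relative to one specific field value, which is why the selector needs a source resource standing for that field.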

5 Conclusion

In this work we describe a method for representing automatically created PATHS annotations in the EDM model. We first describe a simple way of representing the annotations and discuss its benefits and drawbacks. One important weakness of the simple annotation schema lies in its inability to represent attributes of annotations, such as confidence scores. To overcome this limitation we propose a more complex solution that involves reifying the annotations as instances of dedicated classes, and representing the annotation attributes using properties of those classes. For this we have re-used elements from the Open Annotation (OA) model.

The method presented here, called EDMPaths, is able to properly represent the annotations following EDM, but some information which was previously present in ESEPaths has been left out. In particular, information regarding the particular offset of the anchor that caused the annotation to be produced has proven difficult to represent.

One of our main design goals has been to avoid creating new non-standard classes and properties when defining EDMPaths. We think we have succeeded in this particular aspect, mainly by reusing elements from initiatives such as the Open Annotation model. However, the proposal describes some properties which are still specific to the PATHS project.

References

[Agirre and de Lacalle, 2011] Agirre, E. and de Lacalle, O. L. (2011). D2.1: Processing and representation of content for first prototype. Technical report, PATHS project.

[Doerr et al., 2010] Doerr, M., Gradmann, S., Hennicke, S., Isaac, A., Meghini, C., and van de Sompel, H. (2010). The Europeana Data Model (EDM). In World Library and Information Congress: 76th IFLA General Conference and Assembly, pages 10–15.

[Otegi et al., 2012] Otegi, A., Agirre, E., and Soroa, A. (2012). D2.2: Processing and representation of content for second prototype. Technical report, PATHS project.

12

Page 15: Roadmap from ESEPaths to EDMPaths: a note on representing annotations resulting from automatic enrichment - Aitor Soroa, Eneko Agirre, Arantxa Otegi and Antoine Isaac

• Event information associated with the item: CHOs often provide event- or activity-relatedinformation, such as people walking, etc. We enrich the items by means of a predefinedlist of words that can be used to refer to events. This data allows answering questionslike “give me items with people running”, “items with people playing”, etc.

• Related items: CH items which are semantically related.

• Background links that relate CH items with external resources such as Wikipedia. When linking a CH item with some external resource, we keep track of the original text snippet from which the association is derived. For instance, an item could be related to a Wikipedia article because of some text snippet in the dc:description field. In such a case we store the reference to the field and the offset as attributes.² Note, however, that in some cases there is little point in keeping the text, because the enrichment is based on a combination of metadata fields.

The PATHS project started in 2011 and adopted the representation schema of choice at the time, ESE³. We extended it to a format called ESEPaths to represent the enrichment information just mentioned [Agirre and de Lacalle, 2011, Otegi et al., 2012]. In this document we describe a proposal for representing PATHS enrichments following EDM (Europeana Data Model), the new data model used by Europeana.

The document is structured as follows. We first introduce ESEPaths (Section 2), then the roadmap for a simple conversion to EDM (Section 3). Section 4 explains some possible (advanced) solutions to the problems identified in Section 3. Finally, conclusions are drawn.

2 ESEPaths

PATHS has defined a format derived from ESE, called ESEPaths, which adds the enrichment information described above. Specifically, ESEPaths adds the following fields:

• <paths:informativeness> with the informativeness score of the ESE record.

• <paths:vocabulary>, which links the ESE record with vocabulary terms. The element has the following attributes:

    – name: name of the external vocabulary.
    – URI: the address (URI) of the specific category in the vocabulary.
    – confidence: the confidence of the association.

• <paths:event>, which links the ESE record with external events. The element has the following attributes:

    – source: the name of the external resource of the event (for instance, WordNet).
    – canonical_form: the canonical word form of the annotated event.
    – confidence: confidence of the association.

² Keeping track of this information is useful, for instance, for an interface showing those annotations, as it can emphasize the specific snippet and link it to the Wikipedia/dbpedia article when the user points to it.

³ http://www.europeana.eu/schemas/ese/


<record>
  <!-- Existing ESE record -->
  <dc:identifier>http://www.thebowesmuseum.org.uk/10432/</dc:identifier>
  <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
  <dc:title>Stembridge Windmill, High Ham, Somerset</dc:title>
  <dc:description>This is a random-coursed blue lias ...</dc:description>
  <dcterms:isPartOf>Bowes Museum</dcterms:isPartOf>
  <dc:subject>1670</dc:subject>
  <dc:type>Image</dc:type>
  <europeana:provider>CultureGrid</europeana:provider>
  <europeana:isShownAt>http://www.thebowesmuseum.org.uk/10432/</europeana:isShownAt>
  <europeana:hasObject>false</europeana:hasObject>
  <europeana:country>uk</europeana:country>
  <europeana:type>IMAGE</europeana:type>
  <europeana:language>en</europeana:language>

  <!-- ESEPaths augmentation -->

  <!-- item informativeness -->
  <paths:informativeness>0.7</paths:informativeness>

  <!-- vocabulary mapping -->
  <paths:vocabulary confidence="0.8" source="wikicat"
      URI="http://en.wikipedia.org/wiki/Category:Tower_mills">Tower Mills</paths:vocabulary>

  <!-- events -->
  <paths:event confidence="0.8" source="wordnet" canonical_form="play"
      start_offset="120" end_offset="127" field="dc:description">playing</paths:event>

  <!-- related items -->
  <paths:related_item confidence="0.8" field="dc:subject" field_no="1"
      method="LDA">http://www.europeana.eu/portal/record/09405t/A6F9A</paths:related_item>

  <!-- background links items -->
  <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
      field="dc:subject" confidence="0.015" method="wikipedia-miner-1.2.0"
      title="Archaeology">http://en.wikipedia.org/wiki/Archaeology</paths:background_link>
</record>

Figure 1: Example of an ESEPaths record


• <paths:related_item>, which links the ESE record with related CH items. The element has the following attributes:

    – confidence: confidence of the association.
    – method: which method produced the association.
    – field: the name of the ESE field whose content suggests the similarity relation.
    – field_no: the position of the ESE field described above (useful in case the ESE record contains more than one field with the same name).

• <paths:background_link>, which links the ESE record with an item from an external resource. The element has the following attributes:

    – source: the name of the external resource.
    – start_offset: the offset (in characters) within the field element where the text anchor begins.
    – end_offset: the offset (in characters) within the field element where the text anchor ends.
    – field: the field of the ESE record where the anchor for this relation is located.
    – confidence: confidence of the association.
    – method: which method produced the association.
    – title: title of the URL which the background link points to.
    – sentiment: polarity of the textual information included in the corresponding link. It has fixed values, namely “pos” for positive results, “neg” for negative and “neu” for neutral.

Figure 1 shows an example of a CH record in ESEPaths. The first lines are just a copy of the original ESE record, whereas the new elements (in the paths namespace) are at the end. Note that identifiers (incl. URIs) are not real, and have been shortened so that the listing fits on the page.
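To make the structure of an ESEPaths record more concrete, the following minimal Python sketch reads the PATHS elements of a record and collects their attributes. It is only an illustration under stated assumptions: the record is a shortened version of Figure 1, the namespace URIs (in particular the one bound to the paths prefix) are hypothetical, and the helper read_annotations is our own naming rather than part of any PATHS tool.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions for this sketch; the "paths" one is hypothetical.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "europeana": "http://www.europeana.eu/schemas/ese/",
    "paths": "http://www.paths-project.eu/ns/",
}

# Shortened version of the record in Figure 1, with namespace declarations added.
RECORD = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:europeana="http://www.europeana.eu/schemas/ese/"
        xmlns:paths="http://www.paths-project.eu/ns/">
  <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
  <dc:subject>1670</dc:subject>
  <paths:informativeness>0.7</paths:informativeness>
  <paths:vocabulary confidence="0.8" source="wikicat"
      URI="http://en.wikipedia.org/wiki/Category:Tower_mills">Tower Mills</paths:vocabulary>
  <paths:event confidence="0.8" source="wordnet" canonical_form="play"
      start_offset="120" end_offset="127" field="dc:description">playing</paths:event>
  <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
      field="dc:subject" confidence="0.015" method="wikipedia-miner-1.2.0"
      title="Archaeology">http://en.wikipedia.org/wiki/Archaeology</paths:background_link>
</record>
"""

# Local names of the ESEPaths elements that carry annotations with attributes.
ANNOTATION_TAGS = ("vocabulary", "event", "related_item", "background_link")


def read_annotations(xml_text):
    """Return (item URI, list of annotation dicts) for one ESEPaths record."""
    record = ET.fromstring(xml_text)
    item_uri = record.findtext("europeana:uri", namespaces=NS)
    annotations = []
    for kind in ANNOTATION_TAGS:
        for element in record.findall(f"paths:{kind}", NS):
            annotation = dict(element.attrib)  # confidence, source, offsets, ...
            annotation["kind"] = kind
            annotation["value"] = (element.text or "").strip()
            annotations.append(annotation)
    return item_uri, annotations


item_uri, annotations = read_annotations(RECORD)
for annotation in annotations:
    print(annotation["kind"], annotation["value"], annotation.get("confidence"))
```

Each annotation keeps its attributes (confidence, field, offsets) next to its value; these attributes are the information that the conversion to EDM has to preserve.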

3 Roadmap for basic conversion of ESEPaths to EDM

As said before, all the data produced by the PATHS project is encoded following the ESE format extended with new elements. However, Europeana is switching from ESE to a new data model, EDM. The main difference between ESE and EDM is that the latter is more expressive and is based on Semantic Web and Linked Data technologies (RDF, ontologies). In this section, we outline the design we have devised for switching from ESEPaths to EDM.

We have followed these design criteria:

1. All PATHS annotations should be properly represented using EDM.

2. It must be possible to retrieve particular PATHS annotations.

3. We should depart as little as possible from standard EDM.


The first criterion states that all PATHS annotations should be described using EDM. As will be shown below, some annotation attributes are difficult to represent following EDM and, as a consequence, a compromise has to be made between describing PATHS annotations in their full richness and using proper EDM concepts and properties for representing them. The second criterion states that the EDM representation has to respect the types of the PATHS annotations. For instance, it has to be possible to retrieve all background links of a particular CH item (as opposed to, say, its related items). Finally, the last criterion states that we should use widely used EDM objects and properties as far as possible. In particular, the EDM representation should use the set of elements described by Europeana’s instructions for providers⁴, when possible.

We now describe the main steps for converting the PATHS annotations to EDM.

From ESEPaths to EDM

We start by describing the resources which are already in Europeana. These include a Europeana ore:Aggregation resource with information about the digital aggregation process itself (provider, etc.)⁵.

<http://data.europeana.eu/aggregation/provider/09405/8F49> a ore:Aggregation ;
    edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49> ;
    edm:dataProvider "English Heritage - Viewfinder" ;
    edm:provider "CultureGrid" ;
    edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8> ;
    edm:object <http://www.culturegrid.org.uk/1512084/thumbnail_image_jpeg> ;
    edm:rights <http://www.europeana.eu/rights/rr-f/> .

Europeana also provides a proxy for the CHO, attached to this aggregation⁶:

<http://data.europeana.eu/proxy/provider/09405/8F49> a ore:Proxy ;
    ore:proxyFor <http://data.europeana.eu/item/09405/8F49> ;
    ore:proxyIn <http://data.europeana.eu/aggregation/provider/09405/8F49> ;
    # Original ESE data
    dc:creator "Davies, J O" ;
    dc:date "[2001]" ;
    dc:title "Stembridge Windmill, High Ham, Somerset" ;
    dc:description "This is a random-coursed blue lias ..." .

We now describe the way to represent the enrichment annotations provided by the PATHS project. We encapsulate these annotations into a new ore:Aggregation. This aggregation resource records a first set of enrichments created by the PATHS project over the original CH object. It includes all relevant information such as provider name and access rights, as well as the annotations referring to the whole CH object, as opposed to enrichment information extracted from some subset of the CH object’s metadata.

⁴ http://europeanalabs.eu/wiki/EDMObjectTemplatesProviders

⁵ The resource identifier of the aggregation used in the example is not real. The real one should be provided by Europeana.

⁶ Note again that the resource identifier of the proxy used in the example is not real.


<http://www.paths-project.eu/aggregation/paths/09405/8F49> a ore:Aggregation ;
    edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49> ;
    edm:provider "PATHS" ;
    edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8> ;
    edm:rights <http://www.paths-project.eu/rights/rr-f/> ;
    # item informativeness
    paths:informativeness "0.7" .

A few points should be noted:

• The edm:isShownAt property points to the original record, as the PATHS project does not store any information besides the proper enrichment of CH items.

• The edm:rights property refers to the annotated information (instead of the rights of the original CH item).

• As said before, the paths:informativeness element pertains to the PATHS aggregation resource because it refers to the CH object as a whole.

Finally, we create a proxy resource for the PATHS aggregation and describe the remaining PATHS annotations within the scope of this resource (as its properties):

<http://www.paths-project.eu/proxy/paths/09405/8F49> a ore:Proxy ;
    ore:proxyFor <http://data.europeana.eu/item/09405/8F49> ;
    ore:proxyIn <http://www.paths-project.eu/aggregation/paths/09405/8F49> ;
    # vocabulary mapping
    edm:isRelatedTo <http://www.paths-project.eu/vocabulary/Tower_mills> ;
    # events
    edm:isRelatedTo <http://www.paths-project.eu/event/playing> ;
    # related items
    edm:isRelatedTo <http://www.europeana.eu/portal/record/09405t/A6F9A> ;
    # background links items
    edm:isRelatedTo <http://en.wikipedia.org/wiki/Archaeology> .
    # Or <http://dbpedia.org/resource/Archaeology>

Representing various types of enrichment. As shown in the example, the proxy resource relates the CH item with external resources such as vocabulary concepts, events, related items or objects from some external sources (such as Wikipedia or dbpedia). As all the associations are described by means of the high-level edm:isRelatedTo property, it is necessary to properly declare the types of the external objects related to the CH object. Otherwise, there would be no way to discriminate among the different types of PATHS annotations (for instance, there would be no way to specifically retrieve the vocabulary concepts related to a CH object). As a first solution, we can include a separate description for the resources linked to the CH object using SKOS⁷. Within PATHS we define the following types of external resources:

• Related CH items are of type paths:RelatedItemConcept, which is in turn a subclass of skos:Concept.

• Vocabulary concepts are of type skos:Concept.

• Events are of type paths:EventConcept, a subclass of skos:Concept. It represents any concept which refers to a (type of) event (such as “run”, “play”, etc.).

• Background links are of type paths:BackgroundLinkConcept, a subclass of skos:Concept.

⁷ http://www.w3.org/2004/02/skos

Note that these classes are meant to offer a way to discriminate among the different types of annotations inside the PATHS project. The classes are therefore loosely defined, in the sense that they do not describe the proper semantic type of the resources. For instance, PATHS can relate a CH object with a dbpedia resource representing a place (New_York), a person (Pablo_Picasso), etc. However, within the scope of the PATHS annotations, the only explicit common type for all those resources can be inherited from their “background link” status.

Also note that, for the time being, Europeana would not be able to perfectly ingest data that uses such sub-classes, as they depart from the set of elements described by Europeana’s instructions for providers⁸. This would require Europeana to handle specialisations of EDM, which is not precisely scheduled at the time of writing.

Based on the above, we also include the following statements in the example:

<http://www.paths-project.eu/vocabulary/Tower_mills> a skos:Concept ;
    skos:prefLabel "Tower Mills"@en .

<http://www.paths-project.eu/event/playing> a paths:EventConcept ;
    skos:prefLabel "playing"@en .

<http://www.europeana.eu/portal/record/09405t/A6F9A> a paths:RelatedItemConcept .

<http://en.wikipedia.org/wiki/Archaeology> a paths:BackgroundLinkConcept ;
    skos:prefLabel "Archaeology"@en .

along with the definitions of these new types:

paths:EventConcept a owl:Class ;
    rdfs:subClassOf skos:Concept ;
    rdfs:label "Event Concept"@en ;
    skos:definition "A concept describing an Event"@en .

paths:RelatedItemConcept a owl:Class ;
    rdfs:subClassOf skos:Concept ;
    rdfs:label "Related Item Concept"@en ;
    skos:definition "A concept describing a CH record"@en .

paths:BackgroundLinkConcept a owl:Class ;
    rdfs:subClassOf skos:Concept ;
    rdfs:label "Background Link Concept"@en ;
    skos:definition "A concept describing an object from an external source such as dbpedia"@en .

The above definitions can be put next to the annotation data, in a separate file directly provided to Europeana or others, or even served over the Web in a Linked Data scenario. The whole EDM representation for the item is shown in Figure 2.
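The role of the type declarations can be illustrated with a small Python sketch. It assumes that the statements of the example have been loaded as plain (subject, predicate, object) tuples; the helper related_of_type is hypothetical, and it only matches asserted types, without any RDFS inference over rdfs:subClassOf.

```python
RDF_TYPE = "rdf:type"
RELATED = "edm:isRelatedTo"

# Resources taken from the running example (identifiers are not real).
PROXY = "http://www.paths-project.eu/proxy/paths/09405/8F49"
TOWER_MILLS = "http://www.paths-project.eu/vocabulary/Tower_mills"
PLAYING = "http://www.paths-project.eu/event/playing"
RELATED_ITEM = "http://www.europeana.eu/portal/record/09405t/A6F9A"
ARCHAEOLOGY = "http://en.wikipedia.org/wiki/Archaeology"

TRIPLES = [
    # Every enrichment uses the same high-level property ...
    (PROXY, RELATED, TOWER_MILLS),
    (PROXY, RELATED, PLAYING),
    (PROXY, RELATED, RELATED_ITEM),
    (PROXY, RELATED, ARCHAEOLOGY),
    # ... so the kind of enrichment is only visible through the declared types.
    (TOWER_MILLS, RDF_TYPE, "skos:Concept"),
    (PLAYING, RDF_TYPE, "paths:EventConcept"),
    (RELATED_ITEM, RDF_TYPE, "paths:RelatedItemConcept"),
    (ARCHAEOLOGY, RDF_TYPE, "paths:BackgroundLinkConcept"),
]


def related_of_type(triples, subject, wanted_type):
    """Return the resources related to `subject` whose asserted type is `wanted_type`."""
    asserted = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            asserted.setdefault(s, set()).add(o)
    return [
        o
        for s, p, o in triples
        if s == subject and p == RELATED and wanted_type in asserted.get(o, set())
    ]


print(related_of_type(TRIPLES, PROXY, "paths:BackgroundLinkConcept"))
print(related_of_type(TRIPLES, PROXY, "paths:EventConcept"))
```

Without the type statements, the four edm:isRelatedTo links would be indistinguishable, which is the situation the second design criterion is meant to avoid.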

⁸ http://europeanalabs.eu/wiki/EDMObjectTemplatesProviders


<http://data.europeana.eu/aggregation/provider/09405/8F49> a ore:Aggregation ;
    edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49> ;
    edm:dataProvider "English Heritage - Viewfinder" ;
    edm:provider "CultureGrid" ;
    edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8> ;
    edm:object <http://www.culturegrid.org.uk/1512084/thumbnail_image_jpeg> ;
    edm:rights <http://www.europeana.eu/rights/rr-f/> .

<http://data.europeana.eu/proxy/europeana/09405/8F49> a ore:Proxy ;
    ore:proxyFor <http://data.europeana.eu/item/09405/8F49> ;
    ore:proxyIn <http://data.europeana.eu/aggregation/provider/09405/8F49> ;
    # Existing ESE record
    dc:creator "Davies, J O" ;
    dc:date "[2001]" ;
    dc:title "Stembridge Windmill, High Ham, Somerset" ;
    dc:description "This is a random-coursed blue lias ..." .

<http://www.paths-project.eu/aggregation/europeana/09405/8F49> a ore:Aggregation ;
    edm:aggregatedCHO <http://data.europeana.eu/item/09405/8F49> ;
    edm:provider "PATHS" ;
    edm:isShownAt <http://viewfinder.english-heritage.org.uk/imageUID=8> ;
    edm:rights <http://www.paths-project.eu/rights/rr-f/> ;
    # item informativeness
    paths:informativeness "0.7" .

<http://www.paths-project.eu/proxy/europeana/09405/8F49> a ore:Proxy ;
    ore:proxyFor <http://data.europeana.eu/item/09405/8F49> ;
    ore:proxyIn <http://www.paths-project.eu/aggregation/europeana/09405/8F49> ;
    # vocabulary mapping
    edm:isRelatedTo <http://www.paths-project.eu/vocabulary/Tower_mills> ;
    # events
    edm:isRelatedTo <http://www.paths-project.eu/event/playing> ;
    # related items
    edm:isRelatedTo <http://www.europeana.eu/portal/record/09405t/A6F9A> ;
    # background links items
    edm:isRelatedTo <http://en.wikipedia.org/wiki/Archaeology> .
    # Or <http://dbpedia.org/resource/Archaeology>

<http://www.paths-project.eu/vocabulary/Tower_mills> a skos:Concept ;
    skos:prefLabel "Tower Mills"@en .

<http://www.paths-project.eu/event/playing> a paths:EventConcept ;
    skos:prefLabel "playing"@en .

<http://www.europeana.eu/portal/record/09405t/A6F9A> a paths:RelatedItemConcept .

Figure 2: EDM representation of the ESEPaths example


Using specific metadata fields to represent enrichments. Alternatively, if a PATHS enrichment is known to be certain, a new metadata field can be created for the CH object. For instance, if the mapping of the CH record to a vocabulary concept is known to be correct, we can create a new dc:subject field linking the CH record with the appropriate vocabulary concept. Note however that PATHS enrichments are performed automatically, and it is not certain that a concept enrichment derived from a dc:subject would result in a dc:subject relation between the object and the concept. The link to the concept may have been identified based on only a small part of the original field, thus missing some of the original semantics. Some manual assessment is therefore needed before an annotation is promoted to a proper metadata field.

4 Using Open Annotation to represent attributes in relations

The roadmap described in the previous section covers the main aspects of ESEPaths. However, there is a first piece of ESEPaths data which cannot easily be represented in EDM, as EDM inherits RDF’s focus on binary relations: attributes on relations. Almost all annotations created by the PATHS project have some information associated with them. In particular, many annotations record a confidence value, describing the level of certainty of the automatic method when creating the annotation.

A way to overcome this limitation in an RDF-based model would be to reify the annotation into an instance of a dedicated class, and represent the annotation attributes using class properties. For this we can re-use elements from the Open Annotation (OA) model⁹. Consider this ESEPaths snippet:

<record>
  ...
  <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
  <paths:background_link source="wikipedia" start_offset="0" end_offset="11"
      field="dc:subject" confidence="0.015" method="wikipedia-miner-1.2.0"
      title="Archaeology">http://en.wikipedia.org/wiki/Archaeology</paths:background_link>
</record>

We would create the following oa:Annotation for it:

background_link1 a oa:Annotation ;
    a paths:BackgroundLinkAnnotation ;
    oa:hasTarget <http://www.paths-project.eu/proxy/europeana/09405/8F49> ;
    oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> ;
    # Or <http://dbpedia.org/resource/Archaeology>
    paths:source <http://en.wikipedia.org> ;
    # Or <dbpedia.org>
    paths:confidence "0.015" .

In the example, the <paths:background_link> annotation has been converted (reified) to an oa:Annotation resource background_link1 of type paths:BackgroundLinkAnnotation, linked by the oa:hasTarget relation to the PATHS proxy resource. The attributes of the original relation are now represented as properties of this new resource.

⁹ http://www.openannotation.org/spec/core/
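The reification step can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive converter: it assumes the attributes of the <paths:background_link> element are available as a dictionary, the function name reify_background_link and the SOURCE_URIS lookup table are hypothetical, and the annotation identifier is passed in as given in the example above.

```python
# Hypothetical lookup from the ESEPaths "source" attribute to a resource URI.
SOURCE_URIS = {"wikipedia": "http://en.wikipedia.org"}


def reify_background_link(annotation_id, proxy_uri, body_uri, attributes):
    """Serialise one <paths:background_link> as a reified oa:Annotation (Turtle text)."""
    source_uri = SOURCE_URIS[attributes["source"]]
    confidence = attributes["confidence"]
    lines = [
        f"{annotation_id} a oa:Annotation ;",
        "    a paths:BackgroundLinkAnnotation ;",
        f"    oa:hasTarget <{proxy_uri}> ;",   # the PATHS proxy of the CH object
        f"    oa:hasBody <{body_uri}> ;",      # the linked external resource
        f"    paths:source <{source_uri}> ;",  # former attribute of the relation
        f'    paths:confidence "{confidence}" .',
    ]
    return "\n".join(lines)


turtle = reify_background_link(
    "background_link1",
    "http://www.paths-project.eu/proxy/europeana/09405/8F49",
    "http://en.wikipedia.org/wiki/Archaeology",
    {"source": "wikipedia", "confidence": "0.015"},
)
print(turtle)
```

The attributes that could not be attached to a plain edm:isRelatedTo link now appear as ordinary properties of the annotation resource.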

An alternative to the above approach would be to use the OA “motivation” property for representing the annotation. The OA motivation is meant to represent “the reasons why the Annotation was created, not just the agents involved”¹⁰, which fits particularly well with the kind of information we want to represent. The “motivation” approach would lead to the following triples:

background_link1 a oa:Annotation ;
    oa:motivatedBy paths:backgroundLinkMotivation ;
    oa:hasTarget <http://www.paths-project.eu/proxy/europeana/09405/8F49> ;
    oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> ;
    # Or <http://dbpedia.org/resource/Archaeology>
    paths:source <http://en.wikipedia.org> ;
    # Or <dbpedia.org>
    paths:confidence "0.015" .

In this case, the <paths:background_link> object is of type oa:Annotation, and it is also oa:motivatedBy a paths:backgroundLinkMotivation, an instance of skos:Concept.

Both approaches described so far solve the main problem of attaching attributes to relations, and also remove the need to define PATHS-specific relations such as paths:background_link, which would conflict with the metadata fields currently used by EDM. Note however that the properties of the newly defined reified annotations are still specific to PATHS (paths:source, paths:confidence, etc.).

On a side note, using reified concepts for annotation raises the issue of whether we should still keep the proxy-based representation next to it. Because all the PATHS enrichment data is now attached to the reified annotation, the Proxy object described in Section 3 will convey little or no information at all, compared to the original data.
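One practical consequence of attaching attributes to the annotation resource is that annotations can be selected by those attributes. The Python sketch below illustrates this with an in-memory stand-in for the reified annotations. The ReifiedAnnotation class, the confident_bodies helper and the kind label "RelatedItemAnnotation" are assumptions made for the illustration; the confidence values are those of the record in Figure 1.

```python
from dataclasses import dataclass

PROXY = "http://www.paths-project.eu/proxy/europeana/09405/8F49"


@dataclass(frozen=True)
class ReifiedAnnotation:
    kind: str          # annotation class (or motivation) used to tell enrichments apart
    target: str        # value of oa:hasTarget
    body: str          # value of oa:hasBody
    confidence: float  # value of paths:confidence


ANNOTATIONS = [
    ReifiedAnnotation("BackgroundLinkAnnotation", PROXY,
                      "http://en.wikipedia.org/wiki/Archaeology", 0.015),
    # "RelatedItemAnnotation" is a hypothetical name, by analogy with the class above.
    ReifiedAnnotation("RelatedItemAnnotation", PROXY,
                      "http://www.europeana.eu/portal/record/09405t/A6F9A", 0.8),
]


def confident_bodies(annotations, target, kind, threshold):
    """Return the bodies of the annotations of one kind whose confidence reaches the threshold."""
    return [
        a.body
        for a in annotations
        if a.target == target and a.kind == kind and a.confidence >= threshold
    ]


print(confident_bodies(ANNOTATIONS, PROXY, "RelatedItemAnnotation", 0.5))
print(confident_bodies(ANNOTATIONS, PROXY, "BackgroundLinkAnnotation", 0.5))
```

With the plain edm:isRelatedTo representation of Section 3, such a threshold could not be applied, because the confidence value has no place to be stored.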

4.1 Offsets and selectors

There is another piece of ESEPaths data which is not currently represented in EDMPaths, namely the field and offset attributes of the relations. Because all PATHS annotations are extracted from the textual content of some metadata field in the original CH record representation, ESEPaths annotations keep track of the original text snippet (called the anchor) which was used to derive the enrichment.

In order to track this kind of provenance information, EDM could re-use the selectors from the Open Annotation model¹¹. For instance, consider the following ESEPaths snippet:

<record>
  ...
  <europeana:uri>http://www.europeana.eu/resolve/record/09405/8F49</europeana:uri>
  ...
  <paths:background_link start_offset="0" end_offset="11"
      field="dc:subject" ... >http://en.wikipedia.org/wiki/Archaeology</paths:background_link>
</record>

¹⁰ http://www.openannotation.org/spec/core/core.html#Motivations

¹¹ http://www.openannotation.org/spec/core/specific.html#Selectors


It describes a “background link” annotation for the CH object “09405/8F49” which was extracted by analyzing offsets 0–11 of the dc:subject field of the original record. These offsets could be translated to the following Open Annotation snippet:

background_link1 a oa:Annotation ;
    oa:hasTarget anchor1 ;
    oa:hasBody <http://en.wikipedia.org/wiki/Archaeology> .

anchor1 a oa:SpecificResource ;
    oa:hasSource ??? ; # which type has this object ?
    oa:hasSelector selector1 .

selector1 a oa:TextPositionSelector ;
    oa:start 0 ;
    oa:end 11 .

As noted in the snippet, our problem is then to define the type of the anchor1 resource. This object should represent the dc:subject field of CH record “09405/8F49”, but there is actually no way to describe this with EDM. We thus decided to leave this piece of information out of our proposed solution.
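For illustration, the following Python sketch shows how a text position selector would recover the anchor once the content of the field is known. It sidesteps the open question above by receiving the field text directly as a string. The dc:subject value used here is hypothetical (identifiers and values in the examples are not real), and we assume that the start offset is included and the end offset excluded, which is consistent with the offsets of the ESEPaths examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextPositionSelector:
    start: int  # position of the first character of the anchor (0-based, included)
    end: int    # position just after the last character of the anchor (excluded)


def select_anchor(field_text, selector):
    """Return the text snippet (anchor) that the selector designates in a metadata field."""
    if not 0 <= selector.start <= selector.end <= len(field_text):
        raise ValueError("selector does not fit the field content")
    return field_text[selector.start:selector.end]


# Hypothetical content of the dc:subject field of the record.
subject_field = "Archaeology and excavation"
anchor = select_anchor(subject_field, TextPositionSelector(start=0, end=11))
print(anchor)
```

An interface could use the recovered anchor to highlight the snippet and link it to the external article.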
