Transcript
Page 1: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Need: Guiding Metadata Annotations by Questions People #ask

Hans-Jörg Happel, FZI Karlsruhe, Germany
2010-11-09 @ 9th Int. Semantic Web Conference (ISWC 2010), Shanghai, China

Page 2: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Agenda
• Introduction
• Semantic Gap Heuristics
• Semantic MediaWiki Study
• Extension:Semantic Need
• Summary & Outlook

Semantic Need: Guiding Metadata Annotations by Questions People #ask - ISWC 2010; Shanghai, China 2

Page 3: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic MediaWiki (SMW)
• SMW is a popular Semantic Web application that lets users annotate wiki pages semantically
• Semantic interpretation of the existing wiki categories
• Syntax extension for [[Wiki links]]
– Relations to other pages: [[Capital::Abuja]]
– Literals: [[Inhabitants::182418]]

Page 4: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Structured Queries in SMW
• SMW also allows for structured queries

{{#ask: [[Category:Country]] [[OnContinent::Africa]] |?area |?...}}

• SMW resembles the Semantic Web in miniature

Page 5: Semantic Need: Guiding Metadata Annotations by Questions People #ask

SMW Query Result

{{#ask: [[Category:Country]] [[OnContinent::Africa]] |?area |?...}}

[Screenshot of the query result; callouts: „???“, „…?“]

Page 6: Semantic Need: Guiding Metadata Annotations by Questions People #ask

What happened to „Nigeria“?
• Info might be missing
• …not annotated properly
• Different property name
• Distributed data source not available

Page 7: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Gaps
• Observation:
– „Semantic gap between supply and demand on the Semantic Web” [Mik09]
– Due to the evolutionary nature of the (Semantic) Web (open-world assumption, OWA)
• What is missing? – i.e.:
– KB: Axioms that are known (e.g. statements about Nigeria)
– XKB: Axioms that are not yet known but that people would like to know

Page 8: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Towards Semantic Need
• Research questions
– How to identify „Semantic Gaps“?
– Do „Semantic Gaps“ exist?
– If yes, how to close these gaps?
• Research approach
– Propose heuristics
– Exploratory: Analyze Public Semantic Web
– Constructive: Design and evaluate tools

Page 9: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Agenda
• Introduction
• Semantic Gap Heuristics
• Semantic MediaWiki Study
• Extension:Semantic Need
• Summary & Outlook

Page 10: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Idea: Guide Annotation by Information Needs
• Means for deriving information needs
– (Structured) queries
– Information access/browsing
– Context
– …?
• We chose to focus on queries
– Explicit; can be captured easily
– Express a „demand“ [Mik09]
– Recur across time and different people (at least in IR! [Smy05, Tee06, Zha09])

Page 11: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Identifying „Semantic Gaps“
• Focus on
– Conjunctive queries
– Retrieving instances and their properties
• Core elements

{{#ask: [[Category:Country]] [[OfContinent::Africa]]
|?hasArea
|?population
|?hasCapital
|?Currency}}

– Conditions: [[Category:Country]] [[OfContinent::Africa]]
– Printout statements: |?hasArea |?population |?hasCapital |?Currency
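
The split of an inline query into conditions and printout statements can be sketched in a few lines of Python. This is a minimal illustration, not SMW's actual parser: the names `AskQuery` and `parse_ask` are hypothetical, and the sketch assumes that no pipe character occurs inside a condition.

```python
import re
from typing import NamedTuple


class AskQuery(NamedTuple):
    conditions: list[str]  # e.g. "Category:Country"
    printouts: list[str]   # e.g. "hasArea"


def parse_ask(query: str) -> AskQuery:
    """Split an inline #ask query into conditions and printout statements."""
    # Strip the surrounding {{#ask: ... }} wrapper.
    body = re.sub(r"^\{\{\s*#ask:\s*", "", query.strip())
    body = re.sub(r"\}\}\s*$", "", body)
    # Simplification: the first pipe-separated segment holds the conditions,
    # later segments that start with "?" are printout statements.
    segments = [s.strip() for s in body.split("|")]
    conditions = re.findall(r"\[\[(.+?)\]\]", segments[0])
    printouts = [s[1:].strip() for s in segments[1:] if s.startswith("?")]
    return AskQuery(conditions, printouts)


query = """{{#ask: [[Category:Country]] [[OfContinent::Africa]]
|?hasArea|?population|?hasCapital|?Currency}}"""
parsed = parse_ask(query)
print(parsed.conditions)
print(parsed.printouts)
```

For the query on this slide, the two conditions are the category and continent restrictions, and the four printouts are the requested properties.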

Page 12: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Gap Heuristic #1: Near Matches
• An instance I ∈ KB is considered a near match of a query q if:
– I is not in the result set of q in KB
– There is at least one conjunctive query atom of q for which I is part of the result set
– I would be in the result set of q in KB ∪ XKB
• Correspondingly, we consider q to have an incomplete result set if it has „near matches“
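
A minimal Python sketch of this heuristic on a toy knowledge base may help. All data and the names `result_set` and `near_match_candidates` are hypothetical. Only the first two conditions can be checked mechanically; the third one (membership in the result set over KB plus the unknown XKB) cannot, so the function returns candidates that still need human validation.

```python
# Toy knowledge base: page name -> set of annotation atoms (hypothetical data).
KB = {
    "Nigeria": {"Category:Country", "OnContinent::Africa"},
    "Egypt": {"Category:Country"},  # continent annotation is missing
    "France": {"Category:Country", "OnContinent::Europe"},
    "Cairo": {"Category:City"},
}


def result_set(conditions: set[str], kb: dict[str, set[str]]) -> set[str]:
    """Pages that satisfy every atom of a conjunctive query."""
    return {page for page, atoms in kb.items() if conditions <= atoms}


def near_match_candidates(
    conditions: set[str], kb: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Pages outside the result set that satisfy at least one query atom.

    Each candidate is mapped to the atoms it lacks.
    """
    matches = result_set(conditions, kb)
    return {
        page: conditions - atoms
        for page, atoms in kb.items()
        if page not in matches and conditions & atoms
    }


query = {"Category:Country", "OnContinent::Africa"}
print(result_set(query, KB))
print(near_match_candidates(query, KB))
```

In this toy data both Egypt and France are reported as candidates. Egypt is a genuine gap, while France correctly lacks the annotation, which illustrates why candidates have to be validated manually.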


Page 13: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Gap Heuristic #1: Example

{{#ask: [[Category:Country]] [[OnContinent::Africa]] |?area |?...}}

Egypt | 1.001.449 km2 | 83.082.869 | Cairo | Egyptian pound
„Near Match“: lacks annotation [[OnContinent::Africa]]

Page 14: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Gap Heuristic #2: Missing Printout Values
• An instance I ∈ KB is considered to have missing printout values for a query q if:
– I is part of the result set of q
– q contains a printout statement x for which no property value of I exists in the KB
• Note: Technically, „missing printout values“ can be considered equivalent to near matches
– SPARQL requires the „OPTIONAL“ modifier to yield missing printout values
– SMW-QL allows printout values to be marked as required
• Correspondingly, we consider q to have a sparse result set if it has at least one „missing printout value“
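
The second heuristic can be sketched in the same style. The knowledge base, the property names, and the function `missing_printout_values` are illustrative assumptions, not part of SMW or the extension.

```python
# Toy knowledge base: page name -> {property: value} (hypothetical data).
KB = {
    "Nigeria": {"OnContinent": "Africa", "Capital": "Abuja"},
    "Egypt": {"OnContinent": "Africa", "Capital": "Cairo",
              "Currency": "Egyptian pound"},
    "France": {"OnContinent": "Europe"},
}


def missing_printout_values(
    conditions: dict[str, str],
    printouts: list[str],
    kb: dict[str, dict[str, str]],
) -> dict[str, list[str]]:
    """Map each result page to the printout properties it has no value for."""
    missing = {}
    for page, props in kb.items():
        # The page must be part of the result set of the query.
        if all(props.get(p) == v for p, v in conditions.items()):
            gaps = [p for p in printouts if p not in props]
            if gaps:
                missing[page] = gaps
    return missing


sparse = missing_printout_values(
    {"OnContinent": "Africa"}, ["Capital", "Currency"], KB
)
print(sparse)
```

Here Nigeria is in the result set but has no value for the requested currency, so the query has a sparse result set. France is not reported because it is not part of the result set at all.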


Page 15: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Gap Heuristic #2: Example

[Screenshot of a query result; callouts: „Missing Printout Values“]

Page 16: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Agenda
• Introduction
• Semantic Gap Heuristics
• Semantic MediaWiki Study
• Extension:Semantic Need
• Summary & Outlook

Page 17: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Public SMW Analysis: Design
• Goal
– Do „Semantic Gaps“ exist?
– Find out the significance of missing result values and near matches in real-world queries
• Crawling public SMW installations
– Collected ~200 public SMW installations via overview lists and search engines
– Selection of 8 SMW instances (filtered based on data and technical reasons and random choice)
– Those have on average 1880 annotations and 35 inline queries
• Checking for sparse & incomplete query results
– Analyzing 25 (out of 285) queries (only #ask queries, only "Table" output format, only queries with printout statements or conjunctions, respectively)
– 17 of these queries were located on Template pages

Page 18: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Public SMW Analysis: Results
• Printout values
– On average, 16% of cells in a result set were empty due to missing annotations (up to 63% for certain queries)
– This allows for identifying a total of 296 missing printout values
– Validation showed that 13 out of 15 manually investigated empty cells could be considered missing information
• Near matches
– On average, 22% of all potential result pages of a query lack a selective annotation (up to 94% for certain queries)
– This allows for identifying a total of 147 potentially missing annotations for “selective” properties
– Validation showed that 10 out of 15 manually investigated near matches could be considered missing information
• Note: based on evaluation conditions, only around 9% of the overall inline queries were analyzed
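
The shares behind these figures can be re-derived from the counts quoted on the slides. The short Python calculation below uses only those counts; the helper `share` and the variable names are illustrative, and the validation samples (15 items each) are small.

```python
# Counts quoted on the slides.
analyzed_queries, total_queries = 25, 285
validated = {"missing printout values": (13, 15), "near matches": (10, 15)}


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


coverage = share(analyzed_queries, total_queries)
validation_rate = {name: share(ok, n) for name, (ok, n) in validated.items()}
print(coverage)         # share of inline queries that were analyzed
print(validation_rate)  # share of checked items judged to be real gaps
```

The coverage of about 8.8% matches the "around 9%" stated in the note, and the validation rates come out at roughly 87% for missing printout values and 67% for near matches.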


Page 19: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Agenda
• Introduction
• Semantic Gap Heuristics
• Semantic MediaWiki Study
• Extension:Semantic Need
• Summary & Outlook

Page 20: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Extension:Semantic Need: Idea
• Goal:
– How to close „Semantic Gaps“?
– Guide the creation of semantic annotations in SMW
• Design principles
– „Need-driven Knowledge Sharing“ [Hap09b]
– People are willing to contribute missing information, if they recognize that there is concrete demand
– Derived from related work and supported by user studies
• Core features
– Capture and store needs (i.e. #ask-queries)
– Guide annotations by extending and modifying the SMW user interface based on information need heuristics (i.e. „near matches“ and „missing printout values“)
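
How captured needs might be stored and traced back to their sources can be sketched as follows. This is an assumption-laden illustration, not the extension's implementation: the class `NeedLog`, its methods, and the page names are hypothetical, and the regular expressions cover only simple [[Property::value]] conditions and |?Property printouts.

```python
import re
from collections import defaultdict


class NeedLog:
    """In-memory store of captured #ask queries, indexed by property name."""

    def __init__(self) -> None:
        self._sources: dict[str, set[str]] = defaultdict(set)

    def record(self, source_page: str, query: str) -> None:
        """Remember which page asked for which properties."""
        # Properties used in conditions: [[Property::value]]
        props = re.findall(r"\[\[([^\[\]:|]+)::", query)
        # Properties requested as printouts: |?Property
        props += re.findall(r"\|\s*\?\s*([^|}=]+)", query)
        for prop in props:
            self._sources[prop.strip()].add(source_page)

    def sources_of_need(self, prop: str) -> list[str]:
        """Pages whose queries depend on the given property."""
        return sorted(self._sources.get(prop, set()))


log = NeedLog()
log.record(
    "List of African countries",
    "{{#ask: [[Category:Country]] [[OnContinent::Africa]] |?area |?population}}",
)
log.record("Country infobox", "{{#ask: [[Category:Country]] |?area}}")
print(log.sources_of_need("area"))
print(log.sources_of_need("OnContinent"))
```

A hint shown on a page could then name the property that is missing and list the pages returned by `sources_of_need` as the places where that information is in demand.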


Page 21: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Screenshot: In-Page Annotation

[Screenshot; callouts: „Hint“, „Sources of need“]

Page 22: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Need Online Survey: Design
• 34 questions on SMW and Semantic Need
• Target group: SMW experts (via mailing list, invitation)
• Data collected in June/July 2010
• 30 complete answers (out of 58)

Page 23: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Semantic Need Online Survey: Semantic Need can help
• Problem patterns do occur
– Sparse result set: 12/30 considered problematic
– Incomplete result set: 23/30 considered problematic
• Stressed in free text
• Core issue: „invisibility“ of the issue
• Usage of SMW differs
– „Structured“ settings focus on quality
– „Open“ settings focus on guidance
– Semantic Need generally considered helpful by both groups

Page 24: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Maintenance practices: mostly ad hoc
• Methods & tools used to maintain semantic data
– (7: n.a.; due to given external data model)
– 12: none
– 5: „simple“
– 7: „advanced“ (e.g. scripts, documentation, team decisions)
• How to find missing annotations for a given page
– 6: Compare similar pages („extensional“)
– 7: Check schema („intensional“)
– 4: Text analysis
– 10: Use query

Page 25: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Agenda
• Introduction
• Semantic Gap Heuristics
• Semantic MediaWiki Study
• Extension:Semantic Need
• Summary & Outlook

Page 26: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Insights
• „Semantic Gaps“ do exist
– Information needs are a valuable source to find them
– „Missing printout values“ and „near matches“ seem to be useful heuristics
– Especially „incomplete result sets“ are considered problematic
• No systematic guidance & gardening of SMW knowledge bases

Page 27: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Design Implications
• Semantic Annotation
– Issue: Costly, often driven by a pre-defined ontology structure
– Idea: Consider “incentives for annotation” [Han05]
• Semantic Search
– Issue: Decoupling of provision & access
– Idea: Consider information needs
• Need specification/ontology
• Maintain semantic query logs
• Data Quality/Gardening/Maturing
– Issue: The Semantic Web evolves continuously
– Idea: Allow for better data quality modeling

Page 28: Semantic Need: Guiding Metadata Annotations by Questions People #ask

Summary & Outlook
• Main contributions
– How to identify „Semantic Gaps“? Heuristics based on queries
– Do „Semantic Gaps“ exist? Yes
– If yes, how to close these gaps? Semantic Need
• Next steps
– Large scale analysis of „Semantic Gaps“ (more public SMW instances)
– Provide a stable implementation and gather feedback from field usage of Semantic Need
• Further research opportunities
– Use needs to guide the sharing of semantic annotations
– Use needs to create schema-level mappings or for class/property evolution
– Many more (Semantic query logs, UI, Incentives, …)

Page 29: Semantic Need: Guiding Metadata Annotations by Questions People #ask

References
• Extension:Semantic Need
– http://amazonas.fzi.de/semanticneed/ (Demo Wiki)
– http://www.mediawiki.org/wiki/Extension:Semantic_Need
• Extension:Woogle4MediaWiki (for non SMW-Wikis)
– http://amazonas.fzi.de/wooglenative/ (Demo Wiki)
– http://www.mediawiki.org/wiki/Extension:Woogle4MediaWiki
• Literature
– [Han05] Handschuh, Siegfried: Creating ontology-based metadata by annotation for the semantic web, Dissertation, 2005
– [Hap09b] Hans-Jörg Happel: Towards Need-driven Knowledge Sharing in Distributed Teams. In Proceedings of the 9th International Conference on Knowledge Management (I-KNOW 2009)
– [Hap09c] Hans-Jörg Happel: Social Search and Need-driven Knowledge Sharing in Wikis with Woogle. In Proceedings of the 5th International Symposium on Wikis and Open Collaboration (Orlando, Florida, October 25 - 27, 2009). WikiSym '09. ACM, New York, NY, 1-10.
– [Mik09] Mika, P., Meij, E., Zaragoza, H.: Investigating the semantic gap through query log analysis. In: International Semantic Web Conference. Lecture Notes in Computer Science, vol. 5823, pp. 441–455. Springer (2009)
– [Smy05] Smyth, Barry ; Balfe, Evelyn ; Freyne, Jill ; Briggs, Peter ; Coyle, Maurice ; Boydell, Oisin: Exploiting Query Repetition and Regularity in an Adaptive Community-Based Web Search Engine. In: User Modeling and User-Adapted Interaction 14 (2005), No. 5, pp. 383–423.
– [Tee06] Teevan, Jaime ; Adar, Eytan ; Jones, Rosie ; Potts, Michael: History repeats itself: repeat queries in Yahoo’s logs. In: SIGIR’06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA : ACM, 2006, pp. 703–704.
– [Zha09] Zhang, Dell ; Lu, Jinsong: What queries are likely to recur in web search? In: SIGIR ’09: Proceedings of the 32nd annual international ACM SIGIR conference on Research and development in information retrieval. New York, NY, USA : ACM, 2009, pp. 827–828.

Page 30: Semantic Need: Guiding Metadata Annotations by Questions People #ask

The Semantic Web: Problems
• Lack of resources: you might not annotate everything
– Metadata creation is costly
– Access to metadata might be restricted to different spheres of sharing (private, friends, world…)
– “..probably the most important [open question] for the Semantic Web. How to create incentives for annotation?” (Handschuh 2005, p. 198) [Han05]
• Lack of guidance: you might annotate the wrong things
– „Semantic gap between supply and demand on the Semantic Web” [Mik09]
– The two processes of metadata creation and metadata use are decoupled concerning time and actors
– Existing annotation approaches drive the annotation process by the pre-defined ontology structure
• No unified theory of why metadata is created and how it is shared
– The Semantic Web vision does not address the creator side of metadata: it spends a lot of effort on convincing people to use the Semantic Web, but not to contribute to it

