Semantic Search: different meanings

Page 1: Semantic Search:  different meanings

Semantic Search: different meanings

Page 2: Semantic Search:  different meanings

Semantic search: different meanings

• Definition 1: Semantic search as the problem of searching documents beyond the syntactic level of matching keywords
  – Hakia, Powerset, SearchMonkey

• Definition 2: Semantic search as the problem of searching large semantic web datasets
  – Watson, PowerAqua, Swoogle, Sindice, SWSE

Page 3: Semantic Search:  different meanings

Facing keyword-based search problems

• Relations between search terms:
  – “books about recommender systems” vs. “systems that recommend books”
• Polysemy
  – “mouth” as part of the body vs. “mouth” as part of a stream
• Synonymy
  – “movies” vs. “films”
• Documents about individuals where query keywords do not appear:
  – “English banks”, individual “Abbey”
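These problems can be reproduced with a minimal sketch of bag-of-words keyword matching. The stop-word list, the crude suffix stripping, and the function names are illustrative assumptions, not part of any system named in these slides.

```python
import re

STOP_WORDS = {"about", "that", "the", "a", "of"}  # assumed, deliberately tiny

def keywords(text):
    """Lowercase, split on non-letters, drop stop words, strip crude suffixes."""
    stems = set()
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in STOP_WORDS:
            continue
        # Very rough normalisation: "recommender" -> "recommend", "books" -> "book"
        for suffix in ("er", "s"):
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
        stems.add(tok)
    return stems

def overlap(query, document):
    """Fraction of query keywords that also occur in the document."""
    q, d = keywords(query), keywords(document)
    return len(q & d) / len(q) if q else 0.0

# Relations between terms are lost: both phrases give the same keyword set.
same_keywords = (keywords("books about recommender systems")
                 == keywords("systems that recommend books"))

# Synonymy: no shared keyword, so the score is zero.
synonym_score = overlap("movies", "films")

# Polysemy: the keyword matches whichever sense the document uses.
polysemy_score = overlap("mouth", "the mouth of a stream")
```

In this sketch the two book queries are indistinguishable, "movies" never retrieves "films", and "mouth" matches a document about streams as readily as one about anatomy.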

Page 4: Semantic Search:  different meanings

Several attempts from the IR community

• Early 80s: elaboration of conceptual frameworks and their introduction in IR models
  – Taxonomies (categories + hierarchical relations), e.g., the ODP (Open Directory Project)
  – Thesauri (categories + fixed hierarchical & associative relations), e.g., WordNet (used by linguistic approaches)
  – Algebraic methods such as LSA
• Limitations: the level of conceptualization is often shallow (especially at the level of relations)
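A minimal sketch of how a thesaurus can be plugged into keyword retrieval through query expansion. The tiny thesaurus, the documents, and the function names are made up for illustration; a real system would draw its relations from a resource such as WordNet.

```python
# Toy thesaurus (assumption): each term maps to its associated terms.
THESAURUS = {
    "movies": {"films"},
    "films": {"movies"},
}

DOCS = {  # hypothetical documents
    "d1": "classic films of the seventies",
    "d2": "history of the open directory",
}

def expand(query_terms, thesaurus):
    """Return the query terms plus every term the thesaurus associates with them."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= thesaurus.get(term, set())
    return expanded

def search(query, documents, thesaurus=None):
    """Return ids of documents sharing at least one term with the query."""
    terms = set(query.lower().split())
    if thesaurus is not None:
        terms = expand(terms, thesaurus)
    return [doc_id for doc_id, text in documents.items()
            if terms & set(text.lower().split())]

plain = search("movies", DOCS)                      # keyword match only
with_thesaurus = search("movies", DOCS, THESAURUS)  # expanded query
```

The expansion can only follow the fixed relations that the thesaurus lists, which is one way to read the shallowness noted above.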

Page 5: Semantic Search:  different meanings

The emergence of the SW

• Late 90s: introduction of ontologies as a conceptual framework (classes + instances (KBs) + arbitrary semantic relations + rules)
  – Semantic search: exploiting ontologies as richer conceptualizations & formal languages to enhance traditional keyword-based document retrieval
  – Semantic search: the need to search this emergent and continuously growing structured information space (the Web of Data)
    • DBLP, Geonames, DBpedia, BBC Music, ... (http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets)

Page 6: Semantic Search:  different meanings

The Web of Data: 2007, 2008, 2009

Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis

Page 7: Semantic Search:  different meanings

LOD cloud May 2007

Figure from [4]

Facts:
• Focal points:
  – DBpedia: RDFized version of Wikipedia; many ingoing and outgoing links
  – Music-related datasets
• Big datasets include FOAF, US Census data
• Size approx. 1 billion triples, 250k links

Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis

Page 8: Semantic Search:  different meanings

LOD cloud September 2008

Facts:
• More than 35 datasets interlinked
• Commercial players joined the cloud, e.g., BBC
• Companies began to publish and host datasets, e.g., OpenLink, Talis, or Garlik
• Size approx. 2 billion triples, 3 million links

Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis

Page 9: Semantic Search:  different meanings

LOD cloud March 2009

Facts:
• A large part comes from the Linking Open Drug cloud and the BIO2RDF project
• Notable new datasets: Freebase, OpenCalais, ACM/IEEE
• Size > 10 billion triples

Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis

Page 10: Semantic Search:  different meanings

The LOD clouds

Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis

Page 11: Semantic Search:  different meanings

Commercial interest by publishers

Page 12: Semantic Search:  different meanings

Commercial interest by search engines

• 2007: Yahoo! presents SearchMonkey

Page 13: Semantic Search:  different meanings

Commercial interest by search engines

• July 2008: Microsoft buys Powerset

Page 14: Semantic Search:  different meanings

Commercial interest by search engines

• April 2010: Facebook announced the use of the Open Graph protocol

Page 15: Semantic Search:  different meanings

Commercial interest by search engines

• May 2009: Google announces Rich Snippets and its official use of RDFa and Microformats

Page 16: Semantic Search:  different meanings

Commercial interest by search engines

• July 2010: Google buys Metaweb (the company behind Freebase)

Page 17: Semantic Search:  different meanings

Commercial interest by search engines

• November 2010: Google announced support for the GoodRelations vocabulary in Google Rich Snippets

Page 18: Semantic Search:  different meanings

Challenges

• Exploiting this new information space for semantic search purposes opens new research challenges:
  – Scalability
  – Heterogeneity
  – Uncertainty

Page 19: Semantic Search:  different meanings

Scalability

Effective exploitation of linked data requires infrastructure that scales to a large and ever-growing collection of interlinked data!

Page 20: Semantic Search:  different meanings

Heterogeneity

[Figure: identifiers from different sources]
DATA-LEVEL (reconcile, combine): Dbpedia:Rudi_Studer, Dblp:Studer:Rudi.html, SW:/en/rudi_studer
SCHEMA-LEVEL (align): Dblp:~ley/db/../author, SW:Person, Dbpedia:Professor

Effective exploitation of the data web requires an effective mechanism for
• finding the relevant data sources
• integrating data sources
• combining elements from different data sources
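As a rough illustration of the data-level step, the sketch below groups identifiers that are declared equivalent, so that elements from different sources can be combined. The identifiers are the ones shown on this slide; the equivalence links between them and the function names are assumptions made for the example (on the data web such links are typically expressed with owl:sameAs).

```python
def reconcile(same_as_links):
    """Group identifiers into equivalence classes with union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in same_as_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())

# Assumed equivalence links between the identifiers on the slide.
LINKS = [
    ("Dbpedia:Rudi_Studer", "Dblp:Studer:Rudi.html"),
    ("Dblp:Studer:Rudi.html", "SW:/en/rudi_studer"),
]

entities = reconcile(LINKS)
```

Each resulting set stands for one real-world individual. Schema-level alignment (for example relating SW:Person to Dbpedia:Professor) is a separate problem that this sketch does not address.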

Page 21: Semantic Search:  different meanings

Uncertainty

• Incomplete representation of user’s needs and content meanings
  – The user cannot completely specify the need
  – The semantic information in the search space is incomplete

Effective exploitation requires
• matching user’s needs to data in an imprecise way
• ranking the results
• being flexible enough to adjust to changes in constraints

“Find action films directed by some Hong Kong film director and starring Chinese martial actors”
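One simple way to read these requirements is to treat the parts of a request as soft constraints and to rank candidates by how many they satisfy, instead of rejecting anything that fails one. The sketch below does that for a simplified version of the example query; the film records, field names, and scoring rule are all invented for illustration.

```python
# Hypothetical records; a missing field models incomplete semantic information.
FILMS = [
    {"id": "film_a", "genre": "action", "director_from": "Hong Kong",
     "star_nationality": "Chinese"},
    {"id": "film_b", "genre": "action", "director_from": "Hong Kong"},  # star unknown
    {"id": "film_c", "genre": "drama", "director_from": "France",
     "star_nationality": "French"},
]

# The request expressed as soft constraints (field -> wanted value).
CONSTRAINTS = {
    "genre": "action",
    "director_from": "Hong Kong",
    "star_nationality": "Chinese",
}

def score(record, constraints):
    """Fraction of constraints satisfied; unknown fields count as half a match."""
    total = 0.0
    for field, wanted in constraints.items():
        if field not in record:
            total += 0.5  # unknown: neither confirmed nor contradicted
        elif record[field] == wanted:
            total += 1.0
    return total / len(constraints)

def rank(records, constraints):
    """Return (id, score) pairs, best first, dropping records that match nothing."""
    scored = [(r["id"], score(r, constraints)) for r in records]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)

results = rank(FILMS, CONSTRAINTS)
```

Because the constraints are plain data, adding or removing one changes the ranking without touching the matching code.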

Page 22: Semantic Search:  different meanings

The Search Space: different representations

Page 23: Semantic Search:  different meanings

The search space: different representations

• Unstructured search space
  – The Web of documents (textual and multimedia content)
• Structured search space
  – The Web of data (ontologies + Knowledge Bases)
• Hybrid search space
  – Unstructured content is enriched with metadata
    • Embedded annotations
    • Not embedded annotations

Page 24: Semantic Search:  different meanings

The unstructured search space

• The Web of human-understandable content.
• The Web of documents and links
  – <a href="http://creativecommons.org/licenses/by/3.0/">CC License</a>

[Figure: search space of documents]

Page 25: Semantic Search:  different meanings

Search engines

Page 26: Semantic Search:  different meanings

The structured search space

• The Web of machine-understandable content.
• The Web of objects and relations
  – <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">Creative Commons License</a>

[Figure: search space of objects]
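The difference between the plain link on the unstructured-space slide and the `rel="license"` link here can be made concrete with a small sketch that reads links the way a program would. It uses Python's standard `html.parser` module; the class name, the page identifier `page`, and the fallback label `links-to` are assumptions for the example.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (subject, relation, target) triples from <a> elements."""

    def __init__(self, subject):
        super().__init__()
        self.subject = subject
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # Without rel, a program only knows that some link exists;
        # "links-to" is a placeholder label, not a standard relation.
        relation = attributes.get("rel", "links-to")
        self.triples.append((self.subject, relation, href))

def extract(html, subject="page"):
    """Parse an HTML fragment and return the triples found in it."""
    parser = LinkExtractor(subject)
    parser.feed(html)
    parser.close()
    return parser.triples

URL = "http://creativecommons.org/licenses/by/3.0/"
plain = extract('<a href="%s">CC License</a>' % URL)
typed = extract('<a rel="license" href="%s">Creative Commons License</a>' % URL)
```

Only the second link tells a program what the relation between the page and the target is; in the first, that meaning lives in anchor text written for human readers.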

Page 27: Semantic Search:  different meanings

Search engines

Page 28: Semantic Search:  different meanings

The hybrid search space

• Enriching documents with metadata

How to interlink documents and data?

[Figure: search space linking objects and documents]

Page 29: Semantic Search:  different meanings

Two ways of interlinking metadata and documents

• Information Extraction
• By relying on Web publishers
  – More in the section “Data on the (Semantic) Web”

Page 30: Semantic Search:  different meanings

Search engines