Recent Trends in Semantic Search Technologies

DESCRIPTION

A talk given by Peter Mika and Thanh Tran at SemTechBiz 2013

2. About the speakers
- Peter Mika
  - Senior Research Scientist
  - Head of Semantic Search group at Yahoo! Labs
  - Expertise: Semantic Search, Web Object Retrieval, Natural Language Processing
- Tran Duc Thanh
  - CEO of Semsolute, Semantic Search Technologies Company
  - Served as Assistant Professor for Karlsruhe Institute of Technology and Stanford University
  - Expertise: Semantic Search, Semantic / Linked Data Management

3. Agenda
- Why Semantic Search
- What is Semantic Search
- Innovative Semantic Search Applications
- Behind the Scenes
- Questions

4. Why Semantic Search?

5. Why Semantic Search? I.
- "We are at the beginning of search." (Marissa Mayer)
- Solved large classes of queries, e.g. navigational
- Remaining queries are hard, not solvable by brute force, and require deep understanding of the world and human cognition, e.g.
  - Ambiguous searches: paris hilton
  - Imprecise or overly precise searches
  - Searches for descriptions: 34 year old computer scientist living in barcelona
- Background knowledge and metadata can help to address poorly solved queries
- Many of these queries would not be asked by users, who learned over time what search technology can and cannot do.

6. Why Semantic Search? II.
- The Semantic Web is now a reality
  - Large amounts of data published in RDF
  - Linked Data
  - Metadata in HTML: Facebook's Open Graph Protocol, Schema.org
- Casual users
  - Don't know SPARQL
  - Unaware of the schema of the data
- Searching data instead of, or in addition to, searching documents
  - Enable innovative search applications / tasks

7. What is Semantic Search?

8. Semantic Search: Using Semantic Models for Search
- Semantic search is a retrieval paradigm that
  - Exploits the semantics of the data or explicit background knowledge to understand user intent and the meaning of content
  - Incorporates the intent of the query and the meaning of content into the search process (semantic models)

9. Semantic Search: Different Kinds / Different Uses of Semantic Models
- Wide range of semantic search systems
- Employ different semantic models, possibly at different steps of the search process and in order to support different tasks
  - Query formulation
  - Query processing / understanding
  - Ranking
  - Result presentation
  - Result / query refinement

10. Semantic models
- Semantics is concerned with the meaning of the resources made available for search
- Various representations of meaning
  - Word-level models: models of relationships among words
    - Taxonomies, thesauri, dictionaries of entity names
    - Inference along linguistic relations, e.g. broader/narrower terms
  - Concept-level models: models of relationships among objects
    - Ontologies capture entities in the world and their relationships
    - Inference along domain-specific relations

11. Graph-based Conceptual Models
- Core of W3C standards for knowledge representation and data exchange: RDF, OWL
- Large amount of data / knowledge on the Web available as graphs
  - Linked Data: hundreds of interconnected datasets capturing domain-independent and domain-specific knowledge
  - Metadata in HTML: RDFa, microdata, Facebook's OGP
- Private graphs
  - Google's Knowledge Graph
  - Facebook Graph
  - Yahoo's Knowledge Base (talk yesterday)
  - Microsoft's Satori

12. Linked Data

13. Where can you find Linked Data?
- Downloads: DBpedia data dumps
- SPARQL access: LOD cache by OpenLink (51 billion triples)
- Keyword search: Sindice by SindiceTech

14. Google Knowledge Graph
- Started with Freebase's database, which had 12 million entities
- As of June 2012, Knowledge Graph has 500 million entities and over 3.5 billion relationships between those entities
- Prioritize properties based on what users were most

15. Facebook's Open Graph Protocol
- The Like button provides publishers with a way to promote their content on Facebook and build communities
  - Shows up in profiles and news feed
  - Site owners can later reach users who have liked an object
  - Facebook Graph API allows 3rd party developers to access the data
- Open Graph Protocol is an RDFa-based format for describing the object that the user Likes

16. Facebook's Open Graph Protocol
- RDF vocabulary to be used in conjunction with RDFa
  - Simplifies the work of developers by restricting the freedom in RDFa
- Activities, Businesses, Groups, Organizations, People, Places, Products and Entertainment
- Only HTML accepted
- http://opengraphprotocol.org/
- The Rock (1996) ...

17. Semantic Web markup: schema.org
- Agreement on a shared set of schemas for common types of web content
  - Use a single format to communicate the same information to all three search engines
  - Bing, Google, and Yahoo! (June, 2011), Yandex (Nov, 2011)
- Microdata and RDFa support
- Schemas for most common web content
  - Business listings, images/video, recipes, reviews, products, jobs
- Community [email protected]

18. Schema.org

19. Current state of metadata on the Web
- Analysis of the Bing/Yahoo! Search Crawl
  - US crawl, January, 2012
  - 31% of webpages, 5% of domains contain some metadata
  - P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012
- WebDataCommons.org
  - Data extracted from a public crawl (commoncrawl.org)
  - February, 2012 results show 11% of URLs with metadata compared to 5% in 2009/2010 data
  - 7.3 billion triples available for download
  - H. Mühleisen, C. Bizer. Web Data Commons - Extracting Structured Data from Two Large Web Corpora, LDOW 2012
- Large increase in RDFa and microdata adoption compared to microformats

20. Where can you find HTML metadata?
- Web Data Commons
- Glimmer: glimmer.research.yahoo.com
  - Online index of the schema.org data in Web Data Commons

21. Innovative Semantic Search Applications

22. Innovative Semantic Search Applications
- Entity search: entity/entities as results
- Factual search: direct answers, facts (about entities)
- Relational search: complex relationships between entities
- Semantic auto-completion: suggesting queries based on the intent of the provided inputs
- Results aggregation / analysis / prediction: apply computational models
- Semantic log analysis: understanding user behavior in terms of objects
- Semantic profiling: recommendations based on particular interests
- Semantic context: contextual model of users / interests
- Support for complex tasks, e.g. booking a vacation using a combination of services
- Conversational search

23. Entity Search: Entity-based Disambiguation

24. Entity Search: Entity Summary

25. Entity Search: Entity-based Navigation / Exploration

26. Factual Search

27. Relational Search

28. Semantic auto-completion: Facebook Graph Search

29. Semantic Auto-completion: Semsolute's semantic search engine
[Figure labels: Keywords, Syntactic Completions, Semantic Completions]

30. Results Aggregation

31. Contextual (pervasive, ambient) search
- Yahoo! Connected TV: widget engine embedded into the TV
- Yahoo! IntoNow: recognize audio and show related content

32. Interactive Voice Search
- Siri
  - Question-Answering
    - Variety of backend sources including Wolfram Alpha and various Yahoo! services
  - Task completion
    - E.g. schedule an event

33. Conversational Search
- Google's Interactive Voice Search

34. Conversational Search
- Parlance EU project
- Complex dialogs around a set of objects
  - Restaurant
  - Area
  - Price range
  - Type of cuisine
- Complete system
  - Automated Speech Recognition (ASR)
  - Spoken Language Understanding (SLU)
  - Interaction Management
  - Knowledge Base
  - Natural Language Generation (NLG)
  - Text-to-Speech (TTS)
- Video
- Commercial alternatives from Nuance

35. Behind the Scenes

36. Main Technological Building Blocks
- Query Interpretation
  - Spelling Correction
  - Query Segmentation
  - Entity Recognition
  - Query Intent Interpretation for Semantic Auto-Completion
- Ranking
  - Entity Ranking
  - Relationship Ranking
- Aggregation
  - Result Fusion
  - Rank / Score Aggregation
- Result Presentation
  - Summary Generation
  - Visualization

37. Semsolute's Building Blocks - Keyword / Key Phrase Interpretation
[Figure labels: Entity; example keywords: address company san francisco]
- Semantic entity index
  - Inverted index for entities / triples
  - Returns entities / entity relationships as results for the keys
- Semantic entity ranking
  - Structured language model: one language model for every attribute
  - Returns the entities whose LMs most likely generate the keywords, i.e. the entity descriptions that best match

38. Semsolute's Building Blocks - Semantic Graph Construction
[Figure labels: Relationships / Structure, Entity; example keywords: address company san francisco]
- Offline component: query-independent schema graph
  - Reuse schema
  - Pseudo-schema construction: all possible connections between classes of entities, e.g. friendships between users
- Online component: query-specific keyword matching elements
  - Connect keyword matching elements / entities to the classes they belong to

39. Semsolute's Building Blocks - Graph Exploration
[Figure labels: Relationships / Structure, Entity; example keywords: address company san francisco]
- Top-k graph exploration
  - Shortest-path based algorithm that finds top-k graphs connecting keyword matching elements
- Top-k graph ranking
  - Language model based
  - Aggregated model that combines the LMs of entities matching the keywords

40. Semsolute's Building Blocks - Query Generation & Processing
[Figure labels: Triple, Relationships / Structure, Entity; example keywords: address company san francisco; generated question: Address of companies located in San Francisco?]
- Graph to query mapping
  - Translation rules that map top-ranked graphs to structured queries (SQL, SPARQL)
  - Translation rules that map structured queries to natural language questions
- Graph matching
  - Triple index: cover index supporting different triple patterns
  - Various join implementations

41. Yahoo! Spark: Entity Recommendation in Search
- Different use cases in Web Search
  - Some users are short on time
    - Need direct answers
    - Query expansion, question-answering, information boxes, rich results
  - Other users want to explore
    - Long term interests such as sports, celebrities, movies and music
    - Long running tasks such as travel planning
- Spark is a search assistance tool for exploration
  - Recommends related entities given the user's current query
  - Based on explicit relations in a Knowledge Base

42. Example user sessions

43. Spark example I.

44. Spark example II.

45. High-Level Architecture View
[Figure labels: Entity graph, Data preprocessing, Feature extraction, Model learning, Feature sources, Editorial judgements, Datapack, Ranking model, Ranking and disambiguation, Entity data, Features]

46. Spark challenges
- Interpretation and disambiguation
  - Obama and Toyota are places in Japan, but maybe the user is not looking for them
  - The popularity of "obama" is not a sign of the popularity of a Japanese town
- Ranking
  - "Release Me" from Engelbert Humperdinck should rank higher than "Lesbian Seagull", which only appeared on the soundtrack of a Beavis and Butthead episode
  - Editorial relevance vs. what people click
- Large-scale data processing and ML
  - Knowledge Base built from Wikipedia, Yahoo! data, Web extraction
  - Feature extraction from query logs, Flickr and Twitter data

47. Contact
- Peter Mika [email protected] @pmika
- Tran Duc Thanh [email protected]

48. Resources

49. Resources
- Detailed information
  - Peter Mika. Entity Search on the Web, Keynote at Web of Linked Entities WS
  - Peter Mika, Thanh Tran. Semantic search tutorial, SemTech 2012
- Books
  - Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. ACM Press. 2011
- Survey papers
  - Thanh Tran, Peter Mika. Survey of Semantic Search Approaches. Under submission, 2012.
- Conferences and workshops
  - ISWC, ESWC, WWW, SIGIR, CIKM, SemTech
  - Semantic Search workshop series
  - Exploiting Semantic Annotations in Information Retrieval (ESAIR)
  - Entity-oriented Search (EOS) workshop
  - Web of Linked Entities (WoLE) workshop
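Slide 37 describes entity ranking with a structured language model: one language model per attribute, with entities ranked by how likely their models are to generate the keywords. The following is a minimal sketch of that idea, not Semsolute's implementation. The toy entities, the attribute weights, and the smoothing constant are all assumptions made for illustration, and the query omits the "address" keyword because this sketch only matches attribute values, not attribute names.

```python
import math
from collections import Counter

# Hypothetical toy entities; every attribute value is free text.
ENTITIES = {
    "acme": {"name": "Acme Corp", "type": "company",
             "address": "1 Market Street San Francisco"},
    "globex": {"name": "Globex", "type": "company",
               "address": "5 Main Street Springfield"},
    "sf": {"name": "San Francisco", "type": "city", "address": ""},
}

# Assumed mixing weight of each attribute's language model (not from the talk).
ATTRIBUTE_WEIGHTS = {"name": 0.5, "type": 0.2, "address": 0.3}
SMOOTHING = 0.1  # share of probability mass taken from the collection model


def tokenize(text):
    return text.lower().split()


def collection_model(entities):
    """Unigram model over all attribute values, used for smoothing."""
    counts = Counter()
    for attrs in entities.values():
        for value in attrs.values():
            counts.update(tokenize(value))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def attribute_models(attrs):
    """One maximum-likelihood unigram model per attribute."""
    models = {}
    for attr, value in attrs.items():
        counts = Counter(tokenize(value))
        total = sum(counts.values())
        models[attr] = {t: n / total for t, n in counts.items()} if total else {}
    return models


def score(query, attrs, collection):
    """Log-likelihood that the entity's mixed model generates the query."""
    models = attribute_models(attrs)
    log_likelihood = 0.0
    for term in tokenize(query):
        mixed = sum(weight * models.get(attr, {}).get(term, 0.0)
                    for attr, weight in ATTRIBUTE_WEIGHTS.items())
        p = (1 - SMOOTHING) * mixed + SMOOTHING * collection.get(term, 1e-9)
        log_likelihood += math.log(p)
    return log_likelihood


def rank(query, entities):
    """Entity ids ordered from most to least likely to generate the query."""
    collection = collection_model(entities)
    scored = [(score(query, attrs, collection), eid)
              for eid, attrs in entities.items()]
    return [eid for _, eid in sorted(scored, reverse=True)]


print(rank("company san francisco", ENTITIES))
```

Keeping a separate model per attribute lets a match in a short, specific field (such as the type) count differently from a match in a long free-text field (such as the address).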
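Slide 39 mentions a shortest-path based exploration that finds the top-k graphs connecting the keyword matching elements. A much simplified sketch of this family of techniques is shown below. It runs a shortest-path search from every keyword element and scores each candidate connecting node by the sum of its distances to all keywords. The schema graph, the `kw:` node names, and the unit edge weights are hypothetical, and the function returns only the connecting nodes and their costs rather than full graphs.

```python
import heapq

# Hypothetical schema graph; "kw:" nodes are keyword matching elements.
EDGES = [
    ("Company", "locatedIn", "City"),
    ("Person", "worksFor", "Company"),
    ("Person", "livesIn", "City"),
    ("Company", "address", "kw:address"),
    ("Company", "type", "kw:company"),
    ("City", "name", "kw:san francisco"),
]


def build_adjacency(edges):
    """Treat every edge as undirected for the purpose of exploration."""
    adjacency = {}
    for source, _, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def shortest_distances(adjacency, start):
    """Dijkstra's algorithm with unit edge weights."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor in adjacency[node]:
            candidate = d + 1
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def top_k_roots(edges, keyword_nodes, k):
    """Return the k cheapest (cost, node) pairs that reach every keyword."""
    adjacency = build_adjacency(edges)
    per_keyword = [shortest_distances(adjacency, kw) for kw in keyword_nodes]
    candidates = []
    for node in adjacency:
        if node in keyword_nodes:
            continue
        if all(node in dist for dist in per_keyword):
            candidates.append((sum(dist[node] for dist in per_keyword), node))
    return heapq.nsmallest(k, candidates)


print(top_k_roots(EDGES, ["kw:address", "kw:company", "kw:san francisco"], 2))
```

Each returned node, together with its shortest paths to the keyword elements, defines one connecting graph. A language-model based ranking, as described on the slide, would replace the plain path-length cost used here.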
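Slide 40 describes translation rules that map a top-ranked graph to a structured query such as SPARQL. As a rough illustration of what such a mapping could look like for the keywords "address company san francisco", the sketch below turns a small list of graph edges into a SPARQL SELECT query. The `ex:` vocabulary, the graph encoding, and the single translation rule (one graph edge becomes one triple pattern) are assumptions, not the rules used in the talk.

```python
# Namespace prefixes; "ex" is a made-up example vocabulary.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.org/",
}

# Hypothetical top-ranked graph: plain names are variables,
# prefixed names are schema elements, quoted strings are literals.
GRAPH = {
    "select": ["address"],
    "patterns": [
        ("company", "rdf:type", "ex:Company"),
        ("company", "ex:locatedIn", "city"),
        ("city", "ex:name", '"San Francisco"'),
        ("company", "ex:address", "address"),
    ],
}


def is_variable(node):
    """A node is a variable unless it is a prefixed name or a literal."""
    return not (":" in node or node.startswith('"'))


def render(node):
    return f"?{node}" if is_variable(node) else node


def to_sparql(graph):
    """Map each graph edge to one triple pattern of a SELECT query."""
    lines = [f"PREFIX {prefix}: <{iri}>" for prefix, iri in PREFIXES.items()]
    projection = " ".join(f"?{name}" for name in graph["select"])
    lines.append(f"SELECT {projection} WHERE {{")
    for subject, predicate, obj in graph["patterns"]:
        lines.append(f"  {render(subject)} {predicate} {render(obj)} .")
    lines.append("}")
    return "\n".join(lines)


print(to_sparql(GRAPH))
```

The generated query asks for the addresses of companies located in San Francisco, which corresponds to the natural language question shown on the slide.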