UMBC, an Honors University in Maryland

Finding and Ranking Knowledge on the Semantic Web
Li Ding, Rong Pan, Tim Finin, Anupam Joshi, Yun Peng and Pranam Kolari
University of Maryland, Baltimore County
This work was partially supported by DARPA contract F , NSF grants CCR and IIS and grants from IBM, Fujitsu and HP.

This talk
Motivation
Swoogle overview
Bots navigate the Semantic Web
Ranking Semantic Web content
Use cases and applications
Conclusions

Google has made us smarter

But what about our agents?
[Figure labels: tell, register]
A Google for knowledge on the Semantic Web is needed by people and software agents.

Swoogle Architecture
[Architecture diagram. Stages: SWD discovery, metadata creation, data analysis, interface. Components: The Web, Candidate URLs, Web Crawler, SWD Reader, SWD Cache, SWD Metadata, IR analyzer, SWD analyzer, Web Server, Web Service, Agent Service.]
Swoogle 2 (4/05): 340K SWDs, 48M triples, 5K SWOs, 97K classes, 55K properties, 7M individuals
Swoogle 3 (11/05): 700K SWDs, 135M triples, 7.7K SWOs

Find "Time" Ontology (Demo 1)
We can use a set of keywords to search for an ontology. For example, "time", "before" and "after" are basic concepts for a Time ontology.

Digest "Time" Ontology, document view (Demo 2(a))

Digest "Time" Ontology, term view (Demo 2(b))
[Screenshot labels: TimeZone, before, intAfter]

Find Term "Person" (Demo 3)
Not capitalized! URIrefs are case sensitive!
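The case-sensitivity warning in Demo 3 can be illustrated with a minimal Python sketch. It assumes the FOAF namespace http://xmlns.com/foaf/0.1/; the list of observed URIrefs and the helper name case_variants are hypothetical, invented for illustration, and nothing here is claimed to be how Swoogle itself handles such terms.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical URIrefs, as they might be collected from crawled documents.
observed = [
    FOAF + "Person",
    FOAF + "person",  # differs only in case, so it is a different URIref
    FOAF + "Person",
    FOAF + "mbox",
]


def case_variants(urirefs):
    """Group distinct URIrefs that are equal only when case is ignored."""
    groups = defaultdict(set)
    for uri in urirefs:
        groups[uri.lower()].add(uri)
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


# URIrefs are compared character by character, so these are two distinct terms.
same_term = (FOAF + "Person") == (FOAF + "person")
variants = case_variants(observed)
```

Grouping by the lowercased form is only one possible way to surface likely capitalization mistakes; a search for "person" and a search for "Person" still refer to different terms.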
Digest Term "Person" (Demo)
[Screenshot annotations: "different properties", "562 different properties"]

Swoogle Today (Demo 5(a))

Swoogle Statistics (Demo 5(b))
[Chart labels: FOAF, Trustix, W3C, Stanford]

Swoogle's Triple Store lets you shop
And check out your triples into any of several reasoners.

Summary
Versions: Swoogle (Mar 2004), Swoogle2 (Sep 2004), Swoogle3 (July 2005)
Features, in slide order:
Automated SWD discovery
SWD metadata creation and search
Ontology rank (rational surfer model)
Swoogle watch
Web interface
Ontology dictionary
Swoogle statistics
Web service interface (WSDL)
Bag of URIref IR search
Triple shopping cart
Better (re-)crawling strategies
Better navigation models
Index instance data
More metadata (ontology mapping and OWL-S services)
Better web service interfaces
IR component for string literals

The Semantic Web Onion
Universal RDF Graph: the Semantic Web (about 10M documents)
RDF Document: physically hosts knowledge (about 100 triples per SWD on average)
Class-instance: triples modifying the same subject
Molecule: finest lossless set of triples
Triple: atomic knowledge block
Resource, Literal
Swoogle maintains metadata about objects in different layers of the Semantic Web Onion.

Semantic Web Navigation Model
[Diagram. Nodes: RDF graph, Resource, Web, SWT, SWD, SWO, literal. Entry points: Term Search, Document Search. Link labels: uses, populates, defines, officialOnto, isDefinedBy, owl:imports, rdfs:seeAlso, rdfs:isDefinedBy, isUsedBy, isPopulatedBy, rdfs:subClassOf, sameNamespace, sameLocalname, extends, class-property bond.]
Navigating the HTML web is simple; there's just one kind of link. The SW has more kinds of links and hence more navigation paths.
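The idea of many link kinds can be sketched as a graph whose edges carry a relation name, so a bot can choose which kinds of link to follow. This is an illustrative Python sketch only: the class NavigationGraph, its methods, and the example.org URLs are hypothetical, and only the relation names come from the navigation model slide.

```python
from collections import defaultdict


class NavigationGraph:
    """Directed graph whose edges carry a relation name."""

    def __init__(self):
        # source -> list of (relation, target) pairs
        self._links = defaultdict(list)

    def add_link(self, source, relation, target):
        self._links[source].append((relation, target))

    def follow(self, source, relations=None):
        """Return one-step targets, optionally restricted to some relations."""
        return [
            target
            for relation, target in self._links.get(source, [])
            if relations is None or relation in relations
        ]


graph = NavigationGraph()
# Hypothetical document and ontology URLs, for illustration only.
doc = "http://example.org/alice.rdf"
onto = "http://example.org/foaf-ontology.rdf"
graph.add_link(doc, "populates", "foaf:Person")
graph.add_link(doc, "uses", "foaf:mbox")
graph.add_link(doc, "owl:imports", onto)
graph.add_link(onto, "defines", "foaf:Person")
graph.add_link("foaf:Person", "isDefinedBy", onto)
graph.add_link("foaf:Person", "rdfs:subClassOf", "foaf:Agent")

# Document-to-term navigation versus document-to-document navigation.
terms_in_doc = graph.follow(doc, {"uses", "populates", "defines"})
imported = graph.follow(doc, {"owl:imports"})
# From a term, every outgoing link kind is a possible next step.
from_term = graph.follow("foaf:Person")
```

On the HTML web the relations argument would be unnecessary, since every edge is the same kind of hyperlink; here it is what selects a navigation path.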
Semantic Web Navigation Model
[Diagram. Nodes: RDF graph, Resource, Web, SWT, SWD, SWO, literal. Entry points: Term Search, Document Search. Link labels: uses, populates, defines, officialOnto, isDefinedBy, owl:imports, rdfs:seeAlso, rdfs:isDefinedBy, isUsedBy, isPopulatedBy, rdfs:subClassOf, sameNamespace, sameLocalname, extends, class-property bond.]
Relations in 1 and 3 and parts of 4 require a global view to discover.

An Example
[Diagram labels. Terms: foaf:Person, foaf:Agent, foaf:mbox, owl:InverseFunctionalProperty, owl:Class, owl:Thing. Links: rdfs:subClassOf, rdf:type, rdfs:domain, rdfs:range, rdfs:seeAlso, owl:imports.]
We navigate the Semantic Web via links in the physical layer of RDF documents and also via links in the logical layer defined by the semantics of RDF and OWL.

Rank has its privilege
Google introduced a new approach to ranking query results using a simple popularity metric. It was a big improvement!
Swoogle also ranks its query results. When searching for an ontology, class or property, wouldn't one want to see the most used ones first?
Ranking SW content requires different algorithms for different kinds of SW objects: SWDs, SWTs, individuals, assertions, molecules, etc.

Google's PageRank
A page's rank is a function of how many links point to it and the rank of the pages hosting those links.
The random surfer model provides the intuition:
(1) Jump to a random page.
(2) Select and follow a random link on the page, and repeat until bored.
(3) If bored, go to (1).
Pages are ranked by the relative frequency with which they are visited.
[Flowchart: Jump to a random page; Follow a random link; bored?]
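The random surfer intuition can be turned into a small power-iteration sketch in Python. The damping factor 0.85 (the chance of following a link rather than getting bored), the iteration count, the function name pagerank and the four-page link graph are all assumptions made for illustration; the slides do not give these values.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate visit frequencies of a random surfer by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # A bored surfer jumps to a page chosen uniformly at random.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Otherwise the surfer follows one outgoing link at random.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without links forces a random jump.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages point at "c", and "c" points at "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
```

In this toy graph "c" ends up with the highest rank because three pages link to it, and "a" outranks "b" and "d" because its single inbound link comes from the highly ranked "c".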
Ranking Semantic Web Documents
Target: a pure SW dataset
Nodes: a collection of online SWDs (330K SWDs, 1.5% of which are labeled as ontologies)
Links: in addition to hyperlinks, term-level relations are generalized into TM, EX and IM.
Rational surfer model (an extension of weighted PageRank):
Semantic content (term-level relations) is encoded into links.
The rank of a node is spread iteratively via links.
The weight/capacity of a link varies according to the link semantics.
Weight is propagated to imported ontologies.
Evaluation method: compare OntoRank with PageRank at promoting ontologies, using the same pure SW dataset.

An Example
[Diagram: documents connected by EX and TM links.]
wPR values, in slide order: 0.2, 100, 3, 300
OntoRank values, in slide order: 0.2, 100, 103, 403

Ontology Dictionary
Motivation
One ontology does not always provide all the needed vocabulary.
Many scenarios require assembling terms from multiple ontologies.
DIY ontology engineering
1. Search for an appropriate class C.
2. Search for popular properties used to modify instances of C.
3. Go back to step 1 if more classes are needed.

Ranking Semantic Web Terms
Pr(Term|Doc) can be measured by the normalized value of the product of the term's
Popularity: how many SWDs use the term
Frequency: how many times the term is used in the SWD
SWDs are accessed non-uniformly, according to OntoRank.
TermRank estimates a term's importance as Pr(Term|Doc) * OntoRank(Doc).
Evaluation: compare TermRank with term popularity for the 10 highest-rated terms and give an analytical evaluation.
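The two ranking steps above can be sketched in Python under several stated assumptions. For OntoRank, the sketch assumes each document's score is its wPR plus the wPR of every document whose rank propagates to it, and it assumes a chain topology (d2 to d3 to d4, with d1 isolated) and a pairing of wPR values with documents; that reading reproduces the numbers in the example slide, but the actual diagram may differ. For TermRank, summing Pr(Term|Doc) * OntoRank(Doc) over documents is also an assumption, as are all function names, document names and usage counts.

```python
def onto_rank(wpr, propagates_to):
    """Add each document's wPR to every document it transitively reaches."""

    def reachable(start):
        seen, stack = set(), [start]
        while stack:
            for nxt in propagates_to.get(stack.pop(), []):
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    rank = dict(wpr)
    for source in wpr:
        for target in reachable(source):
            rank[target] += wpr[source]
    return rank


def term_rank(usage, doc_rank):
    """usage[doc][term] is how many times the term is used in doc."""
    # Popularity: the number of documents using the term.
    popularity = {}
    for terms in usage.values():
        for term in terms:
            popularity[term] = popularity.get(term, 0) + 1
    scores = {}
    for doc, terms in usage.items():
        # Pr(Term|Doc): normalized product of frequency and popularity.
        weights = {t: count * popularity[t] for t, count in terms.items()}
        total = sum(weights.values())
        for term, weight in weights.items():
            scores[term] = scores.get(term, 0.0) + (weight / total) * doc_rank[doc]
    return scores


# Assumed pairing and topology: d2 -> d3 -> d4, d1 isolated.
wpr = {"d1": 0.2, "d2": 100, "d3": 3, "d4": 300}
ranks = onto_rank(wpr, {"d2": ["d3"], "d3": ["d4"]})

# Hypothetical term usage counts in two of the documents.
usage = {"d3": {"foaf:Person": 2, "foaf:mbox": 1}, "d4": {"foaf:Person": 1}}
scores = term_rank(usage, ranks)
```

Under these assumptions ranks matches the slide's OntoRank values (0.2, 100, 103, 403), and foaf:Person scores higher than foaf:mbox because it is used in more documents, more often, and in a more highly ranked document.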
Class-Property Bonds
Class definition: rdfs:subClassOf foaf:Agent; rdfs:label "Person"
Class-property bond (introduced by instances): foaf:name, dc:title
Class-property bond (introduced by ontology): foaf:mbox, foaf:name
[Diagram labels: SWD1, SWD2, SWD3, foaf:Person, rdf:type owl:Class, rdfs:comment "a human being", foaf:name "Tim Finin", dc:title "Tim's FOAF File", foaf:mbox, rdfs:domain, rdfs:subClassOf foaf:Agent]

Supporting Semantic Web Developers
Finding SW content: ontologies, classes, properties, molecules, triples, partial ontology mappings, authoritative copies
Ad hoc data collection
Exploring how the SW is being used, e.g.
Computing basic statistics
Ranking properties used with foaf:person
And misused, e.g.
Finding common typos

Applications and use cases
Supporting Semantic Web developers, e.g.
Ontology designers
Vocabulary discovery
Who's using my ontologies or data?
Etc.
Searching specialized collections, e.g.
Proofs in Inference Web
Text Meaning Representations of news stories in SemNews
Supporting SW tools, e.g.
Discovering mappings between ontologies

Will it Scale? How?
Here's a rough estimate of the data in RDF documents on the semantic web, based on Swoogle's crawling.
Columns: System/date, Terms, Documents, Individuals, Triples, Bytes
Swoogle2: 1.5x, x10^5, 7x10^6, 5x10^7, 7x10^9
Swoogle3: 2x10^5, 7x, x, x10^7, 1x
[row label missing]: x10^5, 5x10^6, 5x10^7, 5x10^8, 5x
[row label missing]: x10^5, 5x10^7, 5x10^8, 5x10^9, 5x10^11
We think Swoogle's centralized approach can be made to work for the next few years, if not longer.

How much reasoning?
SwoogleN (N