94
1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department of Computer Science and Engineering Wright State University, Dayton, OH-45435, USA

1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Embed Size (px)

Citation preview

Page 1: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

1

Research in Semantic Web and Information Retrieval:

Trust, Sensors, and Search

T. K. Prasad (Krishnaprasad Thirunarayan)Professor

Kno.e.sis CenterDepartment of Computer Science and EngineeringWright State University, Dayton, OH-45435, USA

Page 2: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

2

Page 3: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Knowledge Enabled Information and Services Science 3

http://knoesis.wright.edu/

Page 4: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Information Retrieval

Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

4

Page 5: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Evolution of the Web

5

Page 6: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Semantic Web

Semantic Web is a standards-based extension of the WWW in which the semantics of information and services on the web is defined, so as to satisfy information need of people and enable machines to use the web content. Machine comprehensible structured data

6

Page 7: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Tim Berner-Lee’s Semantic Web Layer Cake

7

Page 8: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Updated Semantic Web Cake

8

Page 9: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

9

Trust Issues inSocial Media and Sensor

Networks

T. K. Prasad, Cory Henson, Amit Sheth and Pramod Anantharam

Kno.e.sis CenterDepartment of Computer Science and EngineeringWright State University, Dayton, OH-45435, USA

Page 10: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

10

Goal

Study semantic issues relevant to trust in Social Media - Data and Networks Sensor - Data and Networks

Generic Examples involving Trust Analyzing ratings/reviews online on TV

models before making purchasing decision from amazon.com

Seeking recommendations on handy man, car mechanic, etc. from neighbors

Page 11: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

11

Generic Approach

Propose models of trust/trust metrics to formalize trust aggregation and trust propagation to deal with indirect trust

Develop techniques and tools to glean trust information from

social media data (streams) and networks sensor data (streams) and networks

Page 12: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Trust in Social Media Networks

12

Page 13: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Previous WorkStructure of Trust

Trust between a pair of users is modelled as a real number in the closed interval [0,1] or [-1,1] Pros: Facilitates propagation and

computation of aggregated trust Cons:

Too fine-grained, total order Inherent difficulties in initializing,

understanding, and justifying computed trust values

Page 14: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Quote

Guha et al: While continuous-valued trusts are

mathematically clean from the standpoint of usability, most real-world systems will in fact use discrete values at which one user can rate another. E.g., Epinions, Ebay, Amazon, Facebook, etc

all use small sets for (dis)trust/rating values.

Page 15: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Trust-aware Recommender Systems

Collaborative Filtering systems exploit user-similarity to get recommendations. But suffer from data sparsity problem.Adding trust links improves quality of recommendations benefits cold-start users who most need it is robust w.r.t. spamming via engineered

profiles (Shilling Attacks)

15

Page 16: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

16

Our Research Propose a model of trust based on Partially ordered discrete values (with

emphasis on relative magnitude) Local but realistic semantics

Distinguishes functional and referral trust Distinguishes direct and inferred trust

Prefers direct information over conflicting inferred information

Represents ambiguity explicitly

HOLY GRAIL: Direct Semantics in favor of Indirect Translations

Page 17: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Essential conceptsTrust Scope: Context, Action, …Functional Trust: Agent a1 trusts agent a2’s ability in some context or for doing somethingReferral Trust: Agent a1 trusts agent a2’s ability to recommend another agent in some context or for doing somethingTrust is a relationship among agents/users, while belief is a relationship between agents/users and statements

17

Page 18: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Semantics : Interpretation

Four valued logic{inconsistent information, true, false, no information}

Trust / Distrust 4-valued “binary” function among

users

Belief / Disbelief: 4-valued “binary” function on users

and statements

Page 19: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Example: Trust Network - Different Trust Links and

Local Ordering on Trust Links

Alice trusts Bob for recommending good car mechanic.Bob trusts Dick to be a good car mechanic.

Charlie does not trust Dick to be a good car mechanic.

Alice trusts Bob more than Charlie, w.r.t. car mechanic context.Alice trusts Charlie more than Bob, w.r.t. baby sitter context.

19

Page 20: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Formalization of Semantics :Basis for Trust Computation

Algorithm

20

Page 21: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Formalization Approach

Given a trust network (Nodes, Edges with Trust Scopes, Local Orderings), specify when a source agent can trust, distrust, or be ambiguous about another target agent, reflecting:

Functional and referral trust linksDirect and inferred trust Locality

21

Page 22: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

22

Page 23: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

23

Similarly for Evidence in support of Negative Functional Trust.

Page 24: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

24

Page 25: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Quote summarizing potential

bug

The whole problem with the world is that fools and fanatics are always so certain of themselves, but wiser people so full of doubts.

--- Betrand Russell25

Page 26: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Possible Future Extensions

Trust links with trust-scoped exceptions

Straddles two extremes involving just trust links and just trust-scoped links

Trust values annotated with trust path length, target neighborhood summary, etc.Other forms of trust links formalized using upper ontology

26

Page 27: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Trust in Sensor Networks

27

Page 28: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Sensor Networks

Approaches to Trust Reputation-based Trust

Based on past behavior Policy-based Trust

Based on explicitly stated constraints Evidence-based Trust

Based on seeking/verifying evidence

28

Page 29: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Probabilistic basis for reputation-based trust in a Sensor Node

Sensor Reputation and Sensor Observation Credibility determined using outlier detection algorithm aggregating results over time Homogeneous sensor networks can

exploit spatio-temporal locality and redundancy for this purpose

Heterogeneous sensor networks require complex domain models for this purpose

29

Page 30: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

(cont’d)

Trust/Reputation in a Sensor node can be modeled as beta probability distribution function with parameters (,) gleaned from total number of correct () and erroneous () observations so far.

30

Page 31: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Motivation for using Beta PDF

Computational Ease Retain/manipulate just two values (,) Incremental update : after checking

whether new data is normal or outlier

Intuitively Satisfactory Initialization not necessary (flat PDF) PDF variation sufficiently expressive

That is, it assimilates updates and large number of observations satisfactorily

31

Page 32: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Next few slides shed light on beta probability distribution function

(1)Mathematical formulation(2)Graphs for intuitive understanding of its role

32

Page 33: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Role of Beta probability distribution function

33

x is a probability, so it ranges from 0-1

If the prior distribution of p is uniform, then the beta distribution gives posterior distribution of p after observing occurrencesof event with probability pand occurrences of the complementary event with probability (1-p).

Page 34: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

34

= , so the pdf’s are symmetric w.r.t 0.5.Note that the graphs get narrower as () increases.

Page 35: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

35

=/= , so the pdf’s are asymmetric w.r.t . 0.5.Note that the graphs get narrower as () increases.

Page 36: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Advantages: Robust w.r.t. attacks

Bad-mouthing attack E-commerce analogy: Sellers collude with

buyers to give bad ratings to others

Ballot stuffing attack E-commerce analogy: Sellers collude with

buyers to give it unfairly good ratings

Sleeper attacks Apparently trusted agent defects

36

Page 37: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Trust in Tweets

37

Page 38: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Twitter

Large network of peopleLarge number of tweets Tweet – 140 character

description of an event

Problem: How to organize tweets?

38

Page 39: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Exploiting trust information

Rank tweets according to trust information

Trust in the user who tweets

Belief (trust) in the tweet

39

Page 40: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Trust in the person who tweets

Popularity of the user Based on count of followers

Reputation of the user Based on history of making informed

observations

Enrich using Pagerank Analogy? Highly trusted followers count more

than lowly trusted followers

Page 41: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Belief (Trust) in a tweet

Belief in a tweet depends on the trust in the user who generates it.

Belief in a tweet depends on the content of “similar” tweets (originating from approximately the same location around the same time)

Page 42: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Trust in Linked Open Data

42

Page 43: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Linked Data

The Linking Open Data (LOD) project is a community-led effort to create openly accessible, and interlinked, RDF Data on the Web.

RDF: Resource Description Framework – graph-based representation language

43

Page 44: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Linked Data

44

Page 45: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Exploiting trust for access and standardization

Trust in the creator of the data, and belief (trust) in the data How well connected is the data?

Rank LOD according to trust information

45

Page 46: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Sensor Data on LOD

MesoWest weather data in US~20,000+ Sensor Systems~1 billion Observational AssertionsSensors linked with Geonames on LOD

http://wiki.knoesis.org/index.php/SSW

46

Page 47: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Trust in Active Perception

47

Page 48: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Active Perception

Perception is the process of observing, hypothesis generation, and verification

48

Page 49: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Evidence-based Trust

Observations (and hypotheses) are more trusted if they can be verified through empirical evidence

Sensors are more trusted if their observations are trusted

49

Page 50: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Evidence-based Trust

50

Strengthened Trust

Trust

Page 51: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Additional uses of active perception in sensors context

Determining actionable intelligence by narrowing set of explanations to oneEnable use of a “small” set of always on sensors to bootstrap and selectively turn-on additional sensors in a resource (e.g., power) constrained environment

51

Page 52: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

52

References

Krishnaprasad Thirunarayan, Dharan Althuru, Cory Henson, and Amit Sheth, “A Local Qualitative Approach to Referral and Functional Trust,” The 4th Indian International Conference on Artificial Intelligence (IICAI-09), December 2009. Cory Henson, Joshua Pschorr, Amit Sheth, and Krishnaprasad Thirunarayan, “SemSOS: Semantic Sensor Observation Service,” International Symposium on Collaborative Technologies and Systems (CTS2009), Workshop on Sensor Web Enablement (SWE2009), Baltimore, Maryland, 2009. Krishnaprasad Thirunarayan, Cory Henson, and Amit Sheth, “Situation Awareness via Abductive Reasoning from Semantic Sensor Data: A Preliminary Report,” International Symposium on Collaborative Technologies and Systems (CTS2009), Workshop on Collaborative Trusted Sensing, Baltimore, Maryland, 2009.

Page 53: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

53

References

A. Sheth and M. Nagarajan, “Semantics-Empowered Social Computing, IEEE Internet Computing”, Jan/Feb 2009, 76-80Amit Sheth, Cory Henson, and Satya Sahoo, "Semantic Sensor Web," IEEE Internet Computing, vol. 12, no. 4, July/August 2008, p. 78-83.

Page 54: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Machine and Citizen Sensor Data Demos

Illustrate semantic web and information retrieval techniques --

spatio-temporal-thematic ontologies, mash-ups, machine

and citizen sensor data analytics

54

Page 55: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

55

High-level Sensor

Low-level Sensor

How do we check if the three images depict …

• the same time and same place?

• same entity?

• a serious threat?

Motivating Scenario: Spatio-temporal-thematic analytics

Page 56: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Semantic Observation Service:

Overall Architecture and Details

Page 57: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

SemSOS Demohttp://knoesis.wright.edu/research/

semsci/application_domain/sem_sensor/cory/demos/ssos_demo/ssos_demo.htm

Twitris Demo http://twitris.knoesis.org/

57

Page 58: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

Situation Awareness : Analysis

Situation Awareness Components

Physical World: Sensor DataPerception: Entity MetadataComprehension: Relationship Metadata

Semantic Analysis

How is the data represented? Sensor Web EnablementWhat are the sources of the data? Provenance AnalysisWhat objects/events account for the data? Abductive ReasoningWhere did the event occur? Spatial AnalysisWhen did the event occur? Temporal AnalysisWhat is the significance of the event? Thematic AnalysisWhat are the reasons for inconsistency? Abductive Reasoning

Page 59: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

59

A Unified Approach to Retrieving Web Documents and Semantic Web

Data

Trivikram Immaneni* and Krishnaprasad Thirunarayan

Department of Computer Science and EngineeringWright State UniversityDayton, OH-45435, USA

*Currently at: Technorati, San Francisco

Page 60: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

60

Outline

Goal (What?)

Background and Motivation (Why?)

Unified Web Model (Why?)

Query Language and Examples (What?)

Implementation Details (How?)

Evaluation and Applications (Why?)

Conclusions

Page 61: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

61

Goal

Page 62: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

62

Integrate HTML Web and Semantic Web by establishing and exploiting connections between them => Unified Web ModelDesign and implement a language to retrieve data and documents from the Unified Web => Hybrid Query Language Implement the system using mature software components for indexing and search => SITAR

Page 63: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

63

Background and Motivation

Page 64: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

64

HTML Web

Hyperlinked Web of documentsContent human comprehensible

Search engines and web browsers search, retrieve, navigate, and display informationKeyword-based searches have low precision and high recall

Page 65: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

65

Semantic Web

Standards-based labeled graph of resources and binary properties (data)Content machine accessible Database techniques adapted to store and retrieve Semantic Web dataQuery formulation by lay users difficult but results are precise XML, RDF, SPARQL, Web Services, etc.

Page 66: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

66

Shoehorning HTML Web into Semantic Web

Document = Data node + Content

as string in RDF graphRegular expressions in SPARQL used to retrieve documents.

Drawbacks that IR tries to overcome Ease of query formulation: Keyword-based Dealing with Large datasets: Ranking

Page 67: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

67

Formalizing HTML Web as Semantic Web

Techniques for manual (re)-authoring of (legacy) documents using Semantic Web Technologies is neither feasible nor advisable. State-of-the-art NLP and information

extraction techniques inadequate Informal description indispensable for

human comprehension Escape route: Traceability via superposition (E.g.,

RDFa)

Page 68: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

68

Shoehorning Semantic Web into HTML Web

Currently, Semantic Web documents live on the HTML Web but their components are neither accessible nor reasoned with via keyword-based searches Swoogle attempts to rank Semantic

Web documents

Page 69: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

69

Unified Web Model (What?)

Page 70: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

70

Aim

Unified Web integrates the two Webs to enable improved hybrid retrieval of data and documents. Unified Web Model Hybrid Query Language

Page 71: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

71

Unified Web Model Graph

Node Abstract entity identified by its URI

Blank/Literal node names automatically generated

Home URI Section URI index words

Document Section (optional) Outgoing Links Section Triples Section …

Page 72: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

72

(cont’d)

Relationships (Edges) hasDocument

Relates Node to content string hyperlinksTo

Relates Node with another node to which the former node’s document contains a hyperlink

Asserts Relates Node with each RDF statement in the document

linksTo Relates Node with another node

to which the former node’s document contains a hyperlink, or

such that the former node’s document contains a triple with the latter node

Page 73: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

73

Example of Unified Web Model

Document http://www.abc.com/xyz.htm contains the RDF fragment:

… <mailTo: [email protected]/> …<rdf:RDF…>

<owl:Class rdf:ID=’http://www.abc.com/sw#Jaguar’/>

</rdf:RDF> …

Page 74: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

74

Page 75: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

75

Data Retrieval from Unified Web

Unified Web Model can be specified using RDF

In terms of rdfs:Resource, rdfs:Propery, rdfs:Statement, rdfs:Literal, refs:Subject, rdfs:Predicate, rdfs:Object, etc

Unified Web is a reified Semantic Web (user triples)

SPARQL usable as query language

Page 76: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

76

Information Retrieval from Unified Web

Node can be indexed using URI index words

Based on name, content, label, triples, etc

Node can be ranked using its phrasal / URI-based annotations and its node neighborhood

Page 77: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

77

Advantages

Semantic Web nodes can be retrieved using (associated) keywords Legacy document recall improved by interpreting hyperlink as Semantic Markup for reasoning.

Hyperlink: mailto:[email protected] Triple: <mailto:[email protected] rdf:type univ:prof>

Page 78: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

78

Semantics rich URIs (such as those from dictionary.com) in legacy documents can be incrementally equated with ontologies

Document: … <a href = http://dictionary.com/search?q=jaguar> Jaguar </a> God of the Underworld …

Ontology: … <http://dictionary.com/search?q=jaguar owl:Sameas http://www.animalOnto.com/Jaguar> ...

Page 79: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

79

Query Language and Examples (What?)

Page 80: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

80

Aim

Store and retrieve Semantic Web data, and use information in documents to enhance data retrieval Enable use of keywords to deal with

lack of complete URI information Peter affiliated-with ?X

Enable use of partial information about data being searched

Student :: Peter affiliated-with ?X

Page 81: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

81

Store and retrieve documents, and use information in the Semantic Web to enhance document retrieval

Docsearch(<animal>::<jaguar> Maya God)

Page 82: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

82

Sample Queries

Wordset queries: <peter haase> -> retrieves all URIs

indexed by BOTH peter AND haase Includes document and URIsURIs are indexed by words.

The words are obtained by analyzing URIs, from label literals, and anchor text of the URIs.

Page 83: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

83

Wordset Pair queries:

<phdstudent>::<peter> -> specifies that user is looking for peter, the phd student

Transitive closure

Page 84: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

84

More Queries

Get Peter the Phd student’s home page: getBindings ( [<phdstudent>::<peter> <homepage> ?

x] ) Get Peter Haase’s publications that have “Semantic” in their title:

getBindings([<peter haase> <publication> ?x] [?x <title> <semantic>])

Get group 1 element which is white in color getBindings( [?x <group> <group 1>]  [?x <color> <white>] )

Page 85: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

85

Homepages of Phd students named Peter that “talk about” Semantic Grid

getDocsByBindingsAndContent ( [<phdstudent>::<peter> <homepage> ?x] “semantic grid” )

getLinkingNodes (

http://www.aifb.uni-karlsruhe.de/Personen/viewPerson?id_db=2023 ) getAssertingNodes([<peter haase> <publication> ?x]). getDocsByIndexOrContent (peter haase)

Page 86: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

86

Implementation Details (How?)

SITAR : Semantic InformaTion Analysis and Retrieval system

Page 87: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

87

Tools Used

Apache Lucene 2.0 APIs in Java A high-performance, text search

engine library with smart indexing strategies.

Cyberneko HTML ParserJena ARP RDF parser

Page 88: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

88

Evaluation and Application (Why?)

Page 89: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

89

Experiments

DATASETs: AIFB SEAL data

The crawler collected 1665 files (English XHTML pages and RDF/OWL pages).

1455 (610 RDF files and 845 XHTML files) were successfully parsed and indexed

A total of 193520 triples were parsed and indexed

Page 90: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

90

Datasets (cont’d)

TAP dataset Periodic table Lehigh University BenchMarks

Page 91: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

91

Conclusions

Page 92: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

92

Developed a Hybrid Query language for data and document retrieval that is convenient because it is keyword-based that can be accurate and flexible because

disambiguation information can be provided that is expressive because it can support inheritance

reasoning that is pragmatic because it can work with legacy

documents

FUTURE WORK: Robust Ranking Strategy

Page 93: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

93

References

T. Immaneni and K. Thirunarayan, A Unified approach To Retrieving Web Documents and Semantic Web Data, In: Proceedings of the 4th European Semantic Web Conference (ESWC 2007), LNCS 4519, pp. 579-593, June 2007. T. Immaneni, and K. Thirunarayan, Hybrid Retrieval from the Unified Web, In: Proceedings of the 22nd Annual ACM Symposium on Applied Computing (ACM SAC 2007), pp. 1376-1380, March 2007.

Page 94: 1 Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search T. K. Prasad (Krishnaprasad Thirunarayan) Professor Kno.e.sis Center Department

THANK YOU!

http://knoesis.wright.edu/tkprasad/

94