Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search

Wright State University CORE Scholar

Kno.e.sis Publications The Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis)

12-10-2009

Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search
Krishnaprasad Thirunarayan, Wright State University - Main Campus, [email protected]

Follow this and additional works at: https://corescholar.libraries.wright.edu/knoesis

Part of the Bioinformatics Commons, Communication Technology and New Media Commons, Databases and Information Systems Commons, OS and Networks Commons, and the Science and Technology Studies Commons

This Article is brought to you for free and open access by The Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis) at CORE Scholar. It has been accepted for inclusion in Kno.e.sis Publications by an authorized administrator of CORE Scholar. For more information, please contact [email protected], [email protected].

Repository Citation
Thirunarayan, K. (2009). Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search.
https://corescholar.libraries.wright.edu/knoesis/77


Research in Semantic Web and Information Retrieval:

Trust, Sensors, and Search

T. K. Prasad (Krishnaprasad Thirunarayan)

Professor Kno.e.sis Center

Department of Computer Science and Engineering Wright State University, Dayton, OH-45435, USA

http://knoesis1.wright.edu/SSW_NEW_BETA/

Knowledge Enabled Information and Services Science

http://knoesis.wright.edu/

Information Retrieval

Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).


Evolution of the Web

Semantic Web

The Semantic Web is a standards-based extension of the WWW in which the semantics of information and services on the web is defined, so as to satisfy the information needs of people and enable machines to use the web content.

Machine-comprehensible structured data

Tim Berners-Lee’s Semantic Web Layer Cake

Updated Semantic Web Cake

Trust Issues in Social Media and Sensor Networks

T. K. Prasad, Cory Henson, Amit Sheth, and Pramod Anantharam

Kno.e.sis Center

Department of Computer Science and Engineering Wright State University, Dayton, OH-45435, USA

Goal

Study semantic issues relevant to trust in
• Social Media - Data and Networks
• Sensor - Data and Networks

Generic Examples involving Trust
• Analyzing ratings/reviews online on TV models before making a purchasing decision from amazon.com
• Seeking recommendations on a handyman, car mechanic, etc. from neighbors

Generic Approach

• Propose models of trust/trust metrics to formalize trust aggregation and trust propagation to deal with indirect trust
• Develop techniques and tools to glean trust information from
  - social media data (streams) and networks
  - sensor data (streams) and networks

Trust in Social Media Networks


Previous Work: Structure of Trust

Trust between a pair of users is modelled as a real number in the closed interval [0,1] or [-1,1]
• Pros: Facilitates propagation and computation of aggregated trust
• Cons:
  - Too fine-grained, total order
  - Inherent difficulties in initializing, understanding, and justifying computed trust values

Quote

Guha et al.: While continuous-valued trusts are mathematically clean, from the standpoint of usability, most real-world systems will in fact use discrete values at which one user can rate another.

E.g., Epinions, eBay, Amazon, Facebook, etc. all use small sets for (dis)trust/rating values.

Trust-aware Recommender Systems

Collaborative Filtering systems exploit user similarity to generate recommendations, but suffer from the data sparsity problem.

Adding trust links
• improves the quality of recommendations
• benefits cold-start users, who most need it
• is robust w.r.t. spamming via engineered profiles (Shilling Attacks)

Our Research

Propose a model of trust based on
• Partially ordered discrete values (with emphasis on relative magnitude)
• Local but realistic semantics
  - Distinguishes functional and referral trust
  - Distinguishes direct and inferred trust
  - Prefers direct information over conflicting inferred information
  - Represents ambiguity explicitly

HOLY GRAIL: Direct Semantics in favor of Indirect Translations

Essential concepts

• Trust Scope: Context, Action, …
• Functional Trust: Agent a1 trusts agent a2’s ability in some context or for doing something
• Referral Trust: Agent a1 trusts agent a2’s ability to recommend another agent in some context or for doing something
• Trust is a relationship among agents/users, while belief is a relationship between agents/users and statements

Semantics: Interpretation

• Four-valued logic: {inconsistent information, true, false, no information}
• Trust / Distrust: 4-valued “binary” function among users
• Belief / Disbelief: 4-valued “binary” function on users and statements
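As a minimal sketch of the four-valued interpretation, the Python below maps the presence of positive and negative evidence onto the four values listed above. The names `Trust4` and `interpret` are illustrative assumptions and do not come from the original work.

```python
from enum import Enum


class Trust4(Enum):
    """The four values of the interpretation (names are assumptions)."""
    NO_INFORMATION = "no information"
    TRUE = "true"
    FALSE = "false"
    INCONSISTENT = "inconsistent information"


def interpret(has_positive: bool, has_negative: bool) -> Trust4:
    """Map evidence flags to a four-valued result (hypothetical helper)."""
    if has_positive and has_negative:
        return Trust4.INCONSISTENT
    if has_positive:
        return Trust4.TRUE
    if has_negative:
        return Trust4.FALSE
    return Trust4.NO_INFORMATION


print(interpret(True, False).value)
print(interpret(True, True).value)
```

Keeping "no information" and "inconsistent information" as separate values is what lets a model represent ambiguity explicitly instead of collapsing it into a default number.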

Example: Trust Network - Different Trust Links and Local Ordering on Trust Links

• Alice trusts Bob for recommending a good car mechanic.
• Bob trusts Dick to be a good car mechanic.
• Charlie does not trust Dick to be a good car mechanic.
• Alice trusts Bob more than Charlie, w.r.t. car mechanic context.
• Alice trusts Charlie more than Bob, w.r.t. baby sitter context.
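The car mechanic example can be encoded as a small data structure. The sketch below is a simplified, one-hop illustration only, not the trust computation algorithm from the cited paper. It assumes that Alice also accepts recommendations from Charlie (the example does not state this), and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrustNetwork:
    # (source, target, scope) -> +1 (trust) or -1 (distrust)
    functional: dict = field(default_factory=dict)
    # (source, scope) -> set of agents whose recommendations source accepts
    referral: dict = field(default_factory=dict)
    # (source, scope) -> agents ranked most trusted first (local ordering)
    ordering: dict = field(default_factory=dict)

    def infer(self, source, target, scope):
        """Return 'trust', 'distrust', 'ambiguous', or 'no information'."""
        # Direct information is preferred over anything inferred.
        direct = self.functional.get((source, target, scope))
        if direct is not None:
            return "trust" if direct > 0 else "distrust"
        # Otherwise collect the opinions of referral-trusted neighbours.
        opinions = {}
        for ref in self.referral.get((source, scope), set()):
            value = self.functional.get((ref, target, scope))
            if value is not None:
                opinions[ref] = value
        if not opinions:
            return "no information"
        if len(set(opinions.values())) == 1:
            return "trust" if next(iter(opinions.values())) > 0 else "distrust"
        # Conflict: follow the highest-ranked referrer in the local ordering.
        for ref in self.ordering.get((source, scope), []):
            if ref in opinions:
                return "trust" if opinions[ref] > 0 else "distrust"
        return "ambiguous"


net = TrustNetwork()
# Assumption: Alice has a referral link to Charlie as well as to Bob.
net.referral[("Alice", "car mechanic")] = {"Bob", "Charlie"}
net.functional[("Bob", "Dick", "car mechanic")] = +1
net.functional[("Charlie", "Dick", "car mechanic")] = -1
net.ordering[("Alice", "car mechanic")] = ["Bob", "Charlie"]
print(net.infer("Alice", "Dick", "car mechanic"))
```

Because Alice ranks Bob above Charlie in the car mechanic scope, the conflict resolves in favor of Bob's positive opinion. Without the local ordering, the same conflict would be reported as ambiguous.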

Formalization of Semantics : Basis for Trust Computation Algorithm

Formalization Approach

Given a trust network (Nodes, Edges with Trust Scopes, Local Orderings), specify when a source agent can trust, distrust, or be ambiguous about another target agent, reflecting:
• Functional and referral trust links
• Direct and inferred trust
• Locality

Similarly for Evidence in support of Negative Functional Trust.

Quote summarizing potential bug

The whole problem with the world is that fools and fanatics are always so certain of themselves, but wiser people so full of doubts.

--- Bertrand Russell

Possible Future Extensions

• Trust links with trust-scoped exceptions
  - Straddles two extremes involving just trust links and just trust-scoped links
• Trust values annotated with trust path length, target neighborhood summary, etc.
• Other forms of trust links formalized using an upper ontology

Trust in Sensor Networks

Sensor Networks

Approaches to Trust
• Reputation-based Trust: based on past behavior
• Policy-based Trust: based on explicitly stated constraints
• Evidence-based Trust: based on seeking/verifying evidence

Probabilistic basis for reputation-based trust in a Sensor Node

Sensor Reputation and Sensor Observation Credibility are determined using an outlier detection algorithm, aggregating results over time.
• Homogeneous sensor networks can exploit spatio-temporal locality and redundancy for this purpose
• Heterogeneous sensor networks require complex domain models for this purpose
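The slides do not name a specific outlier detection algorithm. As one possible sketch for the homogeneous case, the Python below flags a reading as an outlier when it deviates strongly from the median of co-located sensors, using a modified z-score based on the median absolute deviation. The function name, the threshold of 3.5, and the sample readings are all assumptions made for illustration.

```python
from statistics import median


def flag_outliers(readings, threshold=3.5):
    """Return {sensor_id: True if the reading looks like an outlier}."""
    values = list(readings.values())
    med = median(values)
    # Median absolute deviation: a robust estimate of spread.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Neighbours agree exactly; any different reading is suspicious.
        return {sid: v != med for sid, v in readings.items()}
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return {sid: abs(0.6745 * (v - med) / mad) > threshold
            for sid, v in readings.items()}


# Hypothetical temperature readings from five neighbouring sensors.
readings = {"s1": 21.0, "s2": 21.4, "s3": 20.8, "s4": 35.0, "s5": 21.1}
print(flag_outliers(readings))
```

Each flag can then be treated as one "correct" or "erroneous" observation when a sensor's reputation is aggregated over time.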

(cont’d)

Trust/Reputation in a Sensor node can be modeled as a beta probability distribution function with parameters (α,β) gleaned from the total number of correct (α−1) and erroneous (β−1) observations so far.
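A minimal sketch of this bookkeeping in Python follows. Only the two parameters are stored, and each checked observation increments one of them. The class name and the use of the distribution mean as a single trust score are assumptions for illustration, not details given in the slides.

```python
class BetaReputation:
    """Reputation of one sensor node as Beta(alpha, beta)."""

    def __init__(self):
        # alpha = beta = 1 is the flat (uniform) PDF: no observations yet.
        self.alpha = 1.0
        self.beta = 1.0

    def update(self, is_outlier: bool) -> None:
        """Fold in one observation after the outlier check."""
        if is_outlier:
            self.beta += 1    # one more erroneous observation
        else:
            self.alpha += 1   # one more correct observation

    def expected_trust(self) -> float:
        """Mean of Beta(alpha, beta), used here as a summary score."""
        return self.alpha / (self.alpha + self.beta)


rep = BetaReputation()
for outlier in [False, False, False, True]:
    rep.update(outlier)
print(rep.alpha, rep.beta, rep.expected_trust())
```

After three correct observations and one erroneous one, the parameters are (α,β) = (4,2), matching the (α−1, β−1) counting convention above.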

Motivation for using Beta PDF

• Computational Ease
  - Retain/manipulate just two values (α,β)
  - Incremental update: after checking whether new data is normal or outlier
• Intuitively Satisfactory
  - Initialization not necessary (flat PDF)
  - PDF variation sufficiently expressive: it assimilates updates and a large number of observations satisfactorily

Next few slides shed light on the beta probability distribution function:
(1) Mathematical formulation
(2) Graphs for intuitive understanding of its role

Role of Beta probability distribution function

x is a probability, so it ranges from 0 to 1.

If the prior distribution of p is uniform, then the beta distribution gives the posterior distribution of p after observing α−1 occurrences of an event with probability p and β−1 occurrences of the complementary event with probability (1−p).
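The formula from the mathematical formulation slide did not survive extraction. For reference, the standard textbook form of the beta density, together with its mean and variance, is given below. This is the general definition rather than a reconstruction of the slide's exact notation.

```latex
f(x;\alpha,\beta) \;=\; \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\; x^{\alpha-1}\,(1-x)^{\beta-1},
\qquad 0 \le x \le 1,\quad \alpha,\beta > 0

\mathrm{E}[x] \;=\; \frac{\alpha}{\alpha+\beta},
\qquad
\mathrm{Var}[x] \;=\; \frac{\alpha\beta}{(\alpha+\beta)^{2}\,(\alpha+\beta+1)}
```

With α = β = 1 the density is constant on [0,1], which is the flat prior. The exponents α−1 and β−1 are exactly the counts of the event and its complement.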

[Figure: beta PDFs for (α, β) = (1, 1), (2, 2), (5, 5), (10, 10)]

α = β, so the PDFs are symmetric w.r.t. 0.5. Note that the graphs get narrower as (α+β) increases.

[Figure: beta PDFs for (α, β) = (5, 25), (5, 10), (25, 5), (10, 5)]

α ≠ β, so the PDFs are asymmetric w.r.t. 0.5. Note that the graphs get narrower as (α+β) increases.
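The narrowing can be checked numerically without plotting. The short Python sketch below computes the mean and variance of the beta distribution from the standard closed-form expressions for several of the parameter pairs shown in the graphs. The function name is an illustrative choice.

```python
def beta_mean_var(alpha, beta):
    """Mean and variance of Beta(alpha, beta) from the closed forms."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1))
    return mean, var


for a, b in [(1, 1), (2, 2), (5, 5), (10, 10), (5, 10), (5, 25)]:
    mean, var = beta_mean_var(a, b)
    print(f"alpha={a:2d} beta={b:2d} mean={mean:.3f} var={var:.4f}")
```

In the symmetric cases the mean stays at 0.5 while the variance shrinks as (α+β) grows. In the asymmetric cases the mean moves toward the side with more observations.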

Advantages: Robust w.r.t. attacks

• Bad-mouthing attack
  - E-commerce analogy: Sellers collude with buyers to give bad ratings to others
• Ballot stuffing attack
  - E-commerce analogy: Sellers collude with buyers to give themselves unfairly good ratings
• Sleeper attacks
  - Apparently trusted agent defects

Trust in Tweets


Twitter

• Large network of people
• Large number of tweets
• Tweet: 140-character description of an event

Problem: How to organize tweets?

Exploiting trust information

Rank tweets according to trust information

Trust in the user who tweets

Belief (trust) in the tweet


Trust in the person who tweets

• Popularity of the user: based on count of followers
• Reputation of the user: based on history of making informed observations
• Enrich using PageRank analogy? Highly trusted followers count more than lowly trusted followers
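The PageRank analogy is posed as a question in the slides, so the following is only one way it might be realized: a plain power-iteration PageRank over the "follows" graph, in which a user's score is passed to the users they follow. The function name, damping factor, iteration count, and the sample graph are all assumptions.

```python
def trust_rank(follows, damping=0.85, iterations=50):
    """follows maps each user to the set of users they follow."""
    users = set(follows) | {u for fs in follows.values() for u in fs}
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in users}
        for follower in users:
            followed = follows.get(follower, set())
            if followed:
                # A follower splits their score among the users they follow.
                share = damping * rank[follower] / len(followed)
                for u in followed:
                    new_rank[u] += share
            else:
                # A user who follows nobody spreads their score evenly.
                for u in users:
                    new_rank[u] += damping * rank[follower] / n
        rank = new_rank
    return rank


# Hypothetical graph: dave and bob each have exactly one follower.
follows = {"ann": {"carol"}, "bob": {"carol"}, "carol": {"dave"}, "erin": {"bob"}}
scores = trust_rank(follows)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

In this toy graph dave and bob have the same follower count, yet dave scores higher because his single follower (carol) is herself well followed. That is the sense in which highly trusted followers count more than lowly trusted ones.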

Belief (Trust) in a tweet

• Belief in a tweet depends on the trust in the user who generates it.
• Belief in a tweet depends on the content of “similar” tweets (originating from approximately the same location around the same time).

Trust in Linked Open Data


Linked Data

The Linking Open Data (LOD) project is a community-led effort to create openly accessible and interlinked RDF data on the Web.

RDF: Resource Description Framework, a graph-based representation language

Exploiting trust for access and standardization

• Trust in the creator of the data, and belief (trust) in the data
• How well connected is the data?
• Rank LOD according to trust information

Sensor Data on LOD

MesoWest weather data in US
• ~20,000+ Sensor Systems
• ~1 billion Observational Assertions
• Sensors linked with Geonames on LOD

http://wiki.knoesis.org/index.php/SSW

Trust in Active Perception


Active Perception

Perception is the process of observing, hypothesis generation, and verification


Evidence-based Trust

• Observations (and hypotheses) are more trusted if they can be verified through empirical evidence
• Sensors are more trusted if their observations are trusted

[Figure: Evidence-based Trust; labels: Trust, Strengthened Trust]

Additional uses of active perception in sensors context

• Determining actionable intelligence by narrowing the set of explanations to one
• Enabling use of a “small” set of always-on sensors to bootstrap and selectively turn on additional sensors in a resource (e.g., power) constrained environment
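The two uses above can be sketched together: start from what the always-on sensors report, and wake an additional sensor only when its reading could distinguish between the remaining explanations. The Python below is a toy illustration under that assumption. The weather hypotheses, property names, and function name are hypothetical and are not taken from the cited systems.

```python
def narrow_explanations(explanations, observed, available_sensors):
    """Return the surviving hypotheses and the sensors that were turned on."""
    # Keep hypotheses consistent with what the always-on sensors report.
    candidates = {h for h, props in explanations.items()
                  if all(props.get(k) == v for k, v in observed.items())}
    activated = []
    for prop, read in available_sensors.items():
        if len(candidates) <= 1:
            break  # already narrowed to a single explanation
        # Only wake a sensor whose reading could separate the candidates.
        if len({explanations[h].get(prop) for h in candidates}) > 1:
            value = read()
            activated.append(prop)
            candidates = {h for h in candidates
                          if explanations[h].get(prop) == value}
    return candidates, activated


# Hypothetical explanations and the observations each one predicts.
explanations = {
    "blizzard": {"precipitation": "snow", "wind": "high"},
    "flurry": {"precipitation": "snow", "wind": "low"},
    "rain storm": {"precipitation": "rain", "wind": "high"},
}
observed = {"precipitation": "snow"}             # from an always-on sensor
available_sensors = {"wind": lambda: "high"}     # turned on only if needed
print(narrow_explanations(explanations, observed, available_sensors))
```

Here the snow observation leaves two candidate explanations, so the wind sensor is activated and its reading narrows the set to one.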

References

Krishnaprasad Thirunarayan, Dharan Althuru, Cory Henson, and Amit Sheth, “A Local Qualitative Approach to Referral and Functional Trust,” The 4th Indian International Conference on Artificial Intelligence (IICAI-09), December 2009.

Cory Henson, Joshua Pschorr, Amit Sheth, and Krishnaprasad Thirunarayan, “SemSOS: Semantic Sensor Observation Service,” International Symposium on Collaborative Technologies and Systems (CTS2009), Workshop on Sensor Web Enablement (SWE2009), Baltimore, Maryland, 2009.

Krishnaprasad Thirunarayan, Cory Henson, and Amit Sheth, “Situation Awareness via Abductive Reasoning from Semantic Sensor Data: A Preliminary Report,” International Symposium on Collaborative Technologies and Systems (CTS2009), Workshop on Collaborative Trusted Sensing, Baltimore, Maryland, 2009.

A. Sheth and M. Nagarajan, “Semantics-Empowered Social Computing,” IEEE Internet Computing, Jan/Feb 2009, pp. 76-80.

Amit Sheth, Cory Henson, and Satya Sahoo, “Semantic Sensor Web,” IEEE Internet Computing, vol. 12, no. 4, July/August 2008, pp. 78-83.

Machine and Citizen Sensor Data Demos

Illustrate semantic web and information retrieval techniques: spatio-temporal-thematic ontologies, mash-ups, machine and citizen sensor data analytics

High-level Sensor Low-level Sensor

How do we check if the three images depict …

• the same time and same place?

• same entity?

• a serious threat?

Motivating Scenario: Spatio-temporal-thematic analytics

Semantic Observation Service:

Overall Architecture and Details

SemSOS Demo: http://knoesis.wright.edu/research/semsci/application_domain/sem_sensor/cory/demos/ssos_demo/ssos_demo.htm

http://knoesis.wright.edu/researchers/cory/demos/ssos_demo/ssos_demo.htm

Twitris Demo: http://twitris.knoesis.org/

Situation Awareness: Analysis

Situation Awareness Components
• Physical World: Sensor Data
• Perception: Entity Metadata
• Comprehension: Relationship Metadata

Semantic Analysis
• How is the data represented? Sensor Web Enablement
• What are the sources of the data? Provenance Analysis
• What objects/events account for the data? Abductive Reasoning
• Where did the event occur? Spatial Analysis
• When did the event occur? Temporal Analysis
• What is the significance of the event? Thematic Analysis
• What are the reasons for inconsistency? Abductive Reasoning

A Unified Approach to Retrieving Web Documents and Semantic Web Data

Trivikram Immaneni* and Krishnaprasad Thirunarayan

Department of Computer Science and Engineering

Wright State University Dayton, OH-45435, USA

*Currently at: Technorati, San Francisco


Outline

Goal (What?)

Background and Motivation (Why?)

Unified Web Model (Why?)

Query Language and Examples (What?)

Implementation Details (How?)

Evaluation and Applications (Why?)

Conclusions

Goal

• Integrate HTML Web and Semantic Web by establishing and exploiting connections between them => Unified Web Model
• Design and implement a language to retrieve data and documents from the Unified Web => Hybrid Query Language
• Implement the system using mature software components for indexing and search => SITAR

Background and Motivation

HTML Web

• Hyperlinked Web of documents
• Content human comprehensible
• Search engines and web browsers search, retrieve, navigate, and display information
• Keyword-based searches have low precision and high recall
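For readers unfamiliar with the two measures, the short Python sketch below computes precision and recall for a single query from sets of document identifiers. The document ids are made up to illustrate the "low precision, high recall" pattern attributed to keyword search above.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query, given sets of document ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical keyword search: everything relevant is found (recall 1.0),
# but most of what is returned is not relevant (precision 0.25).
retrieved = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
relevant = {"d1", "d2"}
print(precision_recall(retrieved, relevant))
```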

Semantic Web

• Standards-based labeled graph of resources and binary properties (data)
• Content machine accessible
• Database techniques adapted to store and retrieve Semantic Web data
• Query formulation by lay users difficult but results are precise
• XML, RDF, SPARQL, Web Services, etc.

Shoehorning HTML Web into Semantic Web

Document = Data node + Content as string in RDF graph
• Regular expressions in SPARQL used to retrieve documents.

Drawbacks that IR tries to overcome
• Ease of query formulation: Keyword-based
• Dealing with large datasets: Ranking

Formalizing HTML Web as Semantic Web

Manual (re)-authoring of (legacy) documents using Semantic Web technologies is neither feasible nor advisable.
• State-of-the-art NLP and information extraction techniques are inadequate
• Informal description is indispensable for human comprehension

Escape route: Traceability via superposition (e.g., RDFa)

Shoehorning Semantic Web into HTML Web

Currently, Semantic Web documents live on the HTML Web, but their components are neither accessible nor reasoned with via keyword-based searches.
• Swoogle attempts to rank Semantic Web documents

Unified Web Model (What?)

Aim

Unified Web integrates the two Webs to enable improved hybrid retrieval of data and documents.
• Unified Web Model
• Hybrid Query Language

Unified Web Model

Graph
• Node: abstract entity identified by its URI
  - Blank/Literal node names automatically generated
• Home URI Section
  - URI index words
• Document Section (optional)
• Outgoing Links Section
• Triples Section
• …

(cont’d)

Relationships (Edges)
• hasDocument: relates a Node to its content string
• hyperlinksTo: relates a Node with another node to which the former node’s document contains a hyperlink
• asserts: relates a Node with each RDF statement in the document
• linksTo: relates a Node with another node to which the former node’s document contains a hyperlink, or such that the former node’s document contains a triple with the latter node
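One way to picture a node and its relationships is as a small record type. The Python sketch below is an illustrative data structure only, not SITAR's implementation. The field names mirror the relationships listed above, `links_to` is derived from the other two link kinds as the definition describes, and the mailto address is a hypothetical placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    uri: str                                          # Home URI
    index_words: set = field(default_factory=set)     # URI index words
    document: Optional[str] = None                    # hasDocument (optional)
    hyperlinks_to: set = field(default_factory=set)   # hyperlinksTo
    asserts: list = field(default_factory=list)       # (s, p, o) triples

    def links_to(self) -> set:
        """linksTo: hyperlink targets plus nodes appearing in asserted triples."""
        mentioned = {term for triple in self.asserts for term in triple}
        return self.hyperlinks_to | mentioned


page = Node(
    uri="http://www.abc.com/xyz.htm",
    document="... page text ...",
    hyperlinks_to={"mailto:someone@example.org"},     # hypothetical address
    asserts=[("http://www.abc.com/sw#Jaguar", "rdf:type", "owl:Class")],
)
print(sorted(page.links_to()))
```

Treating every term of an asserted triple as a linked node is an assumption made to keep the sketch short.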

Example of Unified Web Model

Document http://www.abc.com/xyz.htm contains the RDF fragment:

… <mailTo: [email protected]/> … <rdf:RDF…> <owl:Class rdf:ID=’http://www.abc.com/sw#Jaguar’/> </rdf:RDF> …

Data Retrieval from Unified Web

Unified Web Model can be specified using RDF
• In terms of rdfs:Resource, rdf:Property, rdf:Statement, rdfs:Literal, rdf:subject, rdf:predicate, rdf:object, etc.

Unified Web is a reified Semantic Web (user triples)

SPARQL usable as query language

Information Retrieval from Unified Web

Node can be indexed using URI index words

Based on name, content, label, triples, etc

Node can be ranked using its phrasal / URI-based annotations and its node neighborhood


Advantages

• Semantic Web nodes can be retrieved using (associated) keywords
• Legacy document recall improved by interpreting a hyperlink as Semantic Markup for reasoning
  - Hyperlink: mailto:[email protected]
  - Triple: <mailto:[email protected] rdf:type univ:prof>

Semantics-rich URIs (such as those from dictionary.com) in legacy documents can be incrementally equated with ontologies

Document: … <a href = http://dictionary.com/search?q=jaguar> Jaguar </a> God of the Underworld …

Ontology: … <http://dictionary.com/search?q=jaguar owl:sameAs http://www.animalOnto.com/Jaguar> ...

Query Language and Examples (What?)

Aim

• Store and retrieve Semantic Web data, and use information in documents to enhance data retrieval
  - Enable use of keywords to deal with lack of complete URI information
    Peter affiliated-with ?X
  - Enable use of partial information about data being searched
    Student :: Peter affiliated-with ?X
• Store and retrieve documents, and use information in the Semantic Web to enhance document retrieval
    Docsearch(<animal>::<jaguar> Maya God)

Sample Queries

Wordset queries:
<peter haase> -> retrieves all URIs indexed by BOTH peter AND haase
• Includes documents and URIs
• URIs are indexed by words. The words are obtained by analyzing URIs, from label literals, and from anchor text of the URIs.
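The wordset query can be pictured as an AND lookup over an inverted index from words to URIs. The Python sketch below illustrates that idea only. SITAR itself is built on Lucene, and the tokenization rule, class name, and example URIs here are assumptions.

```python
import re
from collections import defaultdict


def words_of(text):
    """Lower-cased alphanumeric tokens; also splits URIs on punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class WordIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # word -> set of URIs

    def add(self, uri, label="", anchor_text=""):
        """Index a URI by words from the URI, its label, and anchor text."""
        for word in words_of(uri) | words_of(label) | words_of(anchor_text):
            self.postings[word].add(uri)

    def wordset(self, *words):
        """URIs indexed by ALL of the given words."""
        sets = [self.postings.get(w.lower(), set()) for w in words]
        return set.intersection(*sets) if sets else set()


index = WordIndex()
# Hypothetical URIs and labels, for illustration only.
index.add("http://example.org/people/haase", label="Peter Haase")
index.add("http://example.org/people/mika", label="Peter Mika",
          anchor_text="Peter's page")
print(index.wordset("peter", "haase"))
```

A query such as `<peter haase>` then corresponds to `index.wordset("peter", "haase")`, which returns only the URIs associated with both words.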

Wordset Pair queries:
<phdstudent>::<peter> -> specifies that the user is looking for peter, the PhD student
• Transitive closure
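The slide mentions transitive closure without saying which relation is closed. The sketch below assumes it is the subclass relation, so that a query on a general class word also matches instances of its subclasses. All names, the index layout, and the sample data are hypothetical.

```python
def subclasses_closure(root, subclass_of):
    """All classes reachable from root via subclass edges, root included."""
    seen, stack = {root}, [root]
    while stack:
        current = stack.pop()
        for child, parent in subclass_of:
            if parent == current and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def wordset_pair(class_word, instance_word, index, types, subclass_of):
    """Resolve <class_word>::<instance_word> to matching instance URIs."""
    classes = set()
    for cls in index.get(class_word, set()):
        classes |= subclasses_closure(cls, subclass_of)
    return {uri for uri in index.get(instance_word, set())
            if types.get(uri) in classes}


# Hypothetical data: word index, instance types, and (child, parent) edges.
index = {
    "student": {"ex:Student"},
    "phdstudent": {"ex:PhDStudent"},
    "peter": {"ex:peter1", "ex:peter2"},
}
types = {"ex:peter1": "ex:PhDStudent", "ex:peter2": "ex:Professor"}
subclass_of = [("ex:PhDStudent", "ex:Student")]

print(wordset_pair("phdstudent", "peter", index, types, subclass_of))
print(wordset_pair("student", "peter", index, types, subclass_of))
```

Both queries select `ex:peter1`. The second one succeeds only because the closure over subclass edges makes a PhD student count as a student.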

More Queries

Get Peter the PhD student’s home page:
getBindings ( [<phdstudent>::<peter> <homepage> ?x] )

Get Peter Haase’s publications that have “Semantic” in their title:
getBindings([<peter haase> <publication> ?x] [?x <title> <semantic>])

Get the group 1 element that is white in color:
getBindings( [?x <group> <group 1>] [?x <color> <white>] )

Homepages of PhD students named Peter that “talk about” Semantic Grid:
getDocsByBindingsAndContent ( [<phdstudent>::<peter> <homepage> ?x] “semantic grid” )

getLinkingNodes ( http://www.aifb.uni-karlsruhe.de/Personen/viewPerson?id_db=2023 )

getAssertingNodes ([<peter haase> <publication> ?x])

getDocsByIndexOrContent (peter haase)

Implementation Details (How?)

SITAR : Semantic InformaTion Analysis and Retrieval system


Tools Used

• Apache Lucene 2.0 APIs in Java: a high-performance text search engine library with smart indexing strategies
• Cyberneko HTML Parser
• Jena ARP RDF parser

Evaluation and Application (Why?)


Experiments

Datasets: AIFB SEAL data
• The crawler collected 1665 files (English XHTML pages and RDF/OWL pages).
• 1455 (610 RDF files and 845 XHTML files) were successfully parsed and indexed.
• A total of 193520 triples were parsed and indexed.

Datasets (cont’d)

• TAP dataset
• Periodic table
• Lehigh University Benchmark (LUBM)

Conclusions

Developed a Hybrid Query Language for data and document retrieval
• that is convenient because it is keyword-based
• that can be accurate and flexible because disambiguation information can be provided
• that is expressive because it can support inheritance reasoning
• that is pragmatic because it can work with legacy documents

FUTURE WORK: Robust Ranking Strategy

References

T. Immaneni and K. Thirunarayan, “A Unified Approach to Retrieving Web Documents and Semantic Web Data,” in: Proceedings of the 4th European Semantic Web Conference (ESWC 2007), LNCS 4519, pp. 579-593, June 2007.

T. Immaneni and K. Thirunarayan, “Hybrid Retrieval from the Unified Web,” in: Proceedings of the 22nd Annual ACM Symposium on Applied Computing (ACM SAC 2007), pp. 1376-1380, March 2007.

THANK YOU!

http://knoesis.wright.edu/tkprasad/
