Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Wright State UniversityCORE Scholar
Kno.e.sis Publications The Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis)
12-10-2009
Research in Semantic Web and InformationRetrieval: Trust, Sensors, and SearchKrishnaprasad ThirunarayanWright State University - Main Campus, [email protected]
Follow this and additional works at: https://corescholar.libraries.wright.edu/knoesis
Part of the Bioinformatics Commons, Communication Technology and New Media Commons,Databases and Information Systems Commons, OS and Networks Commons, and the Science andTechnology Studies Commons
This Article is brought to you for free and open access by the The Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis) at COREScholar. It has been accepted for inclusion in Kno.e.sis Publications by an authorized administrator of CORE Scholar. For more information, pleasecontact [email protected], [email protected].
Repository CitationThirunarayan, K. (2009). Research in Semantic Web and Information Retrieval: Trust, Sensors, and Search. .https://corescholar.libraries.wright.edu/knoesis/77
1
Research in Semantic Web and Information Retrieval:
Trust, Sensors, and Search
T. K. Prasad (Krishnaprasad Thirunarayan)
Professor Kno.e.sis Center
Department of Computer Science and Engineering Wright State University, Dayton, OH-45435, USA
Knowledge Enabled Information and Services Science 3
http://knoesis.wright.edu/
Information Retrieval
Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
4
Evolution of the Web
5
Semantic Web Semantic Web is a standards- based extension of the WWW in which the semantics of information and services on the web is defined, so as to satisfy information need of people and enable machines to use the web content. Machine comprehensible structured data
6
Tim Berner-Lee’s Semantic Web Layer Cake
7
Updated Semantic Web Cake
8
9
Trust Issues in Social Media and Sensor Networks
T. K. Prasad, Cory Henson,
Amit Sheth and Pramod Anantharam Kno.e.sis Center
Department of Computer Science and Engineering Wright State University, Dayton, OH-45435, USA
10
Goal
Study semantic issues relevant to trust in Social Media - Data and Networks Sensor - Data and Networks
Generic Examples involving Trust Analyzing ratings/reviews online on TV models
before making purchasing decision from amazon.com
Seeking recommendations on handy man, car mechanic, etc. from neighbors
11
Generic Approach
Propose models of trust/trust metrics to formalize trust aggregation and trust propagation to deal with indirect trust Develop techniques and tools to glean trust information from
social media data (streams) and networks sensor data (streams) and networks
Trust in Social Media Networks
12
Previous Work Structure of Trust
Trust between a pair of users is modelled as a real number in the closed interval [0,1] or [-1,1] Pros: Facilitates propagation and
computation of aggregated trust Cons:
Too fine-grained, total order Inherent difficulties in initializing,
understanding, and justifying computed trust values
Quote
Guha et al: While continuous-valued trusts are
mathematically clean from the standpoint of usability, most real-world systems will in fact use discrete values at which one user can rate another. E.g., Epinions, Ebay, Amazon, Facebook, etc all
use small sets for (dis)trust/rating values.
Trust-aware Recommender Systems
Collaborative Filtering systems exploit user-similarity to get recommendations. But suffer from data sparsity problem. Adding trust links improves quality of recommendations benefits cold-start users who most need it is robust w.r.t. spamming via engineered profiles
(Shilling Attacks)
15
16
Our Research Propose a model of trust based on Partially ordered discrete values (with emphasis
on relative magnitude) Local but realistic semantics
Distinguishes functional and referral trust Distinguishes direct and inferred trust
Prefers direct information over conflicting inferred information
Represents ambiguity explicitly
HOLY GRAIL: Direct Semantics in favor of Indirect Translations
Essential concepts Trust Scope: Context, Action, … Functional Trust: Agent a1 trusts agent a2’s ability in some context or for doing something Referral Trust: Agent a1 trusts agent a2’s ability to recommend another agent in some context or for doing something Trust is a relationship among agents/users, while belief is a relationship between agents/users and statements
17
Semantics : Interpretation Four valued logic {inconsistent information, true, false, no information} Trust / Distrust 4-valued “binary” function among users Belief / Disbelief: 4-valued “binary” function on users and
statements
Example: Trust Network - Different Trust Links and
Local Ordering on Trust Links
Alice trusts Bob for recommending good car mechanic. Bob trusts Dick to be a good car mechanic. Charlie does not trust Dick to be a good car mechanic. Alice trusts Bob more than Charlie, w.r.t. car mechanic context. Alice trusts Charlie more than Bob, w.r.t. baby sitter context.
19
Formalization of Semantics : Basis for Trust Computation Algorithm
20
Formalization Approach
Given a trust network (Nodes, Edges with Trust Scopes, Local Orderings), specify when a source agent can trust, distrust, or be ambiguous about another target agent, reflecting:
Functional and referral trust links Direct and inferred trust Locality
21
22
23
Similarly for Evidence in support of Negative Functional Trust.
24
Quote summarizing potential bug
The whole problem with the world is that fools and fanatics are always so certain of themselves, but wiser people so full of doubts.
--- Betrand Russell
25
Possible Future Extensions Trust links with trust-scoped exceptions
Straddles two extremes involving just trust links and just trust-scoped links
Trust values annotated with trust path length, target neighborhood summary, etc. Other forms of trust links formalized using upper ontology
26
Trust in Sensor Networks
27
Sensor Networks
Approaches to Trust Reputation-based Trust
Based on past behavior Policy-based Trust
Based on explicitly stated constraints Evidence-based Trust
Based on seeking/verifying evidence
28
Probabilistic basis for reputation-based trust in a Sensor Node
Sensor Reputation and Sensor Observation Credibility determined using outlier detection algorithm aggregating results over time Homogeneous sensor networks can
exploit spatio-temporal locality and redundancy for this purpose
Heterogeneous sensor networks require complex domain models for this purpose
29
(cont’d)
Trust/Reputation in a Sensor node can be modeled as beta probability distribution function with parameters (α,β) gleaned from total number of correct (α−1) and erroneous (β−1) observations so far.
30
Motivation for using Beta PDF
Computational Ease Retain/manipulate just two values (α,β) Incremental update : after checking
whether new data is normal or outlier Intuitively Satisfactory Initialization not necessary (flat PDF) PDF variation sufficiently expressive
That is, it assimilates updates and large number of observations satisfactorily
31
Next few slides shed light on beta probability distribution function
(1) Mathematical formulation (2)Graphs for intuitive understanding of its role
32
Role of Beta probability distribution function
33
x is a probability, so it ranges from 0-1
If the prior distribution of p is uniform, then the beta distribution gives posterior distribution of p after observing α−1 occurrences of event with probability p and β−1 occurrences of the complementary event with probability (1-p).
34
α= 5 β= 5
α= 1 β= 1
α= 2 β= 2
α= 10 β= 10
α= β, so the pdf’s are symmetric w.r.t 0.5. Note that the graphs get narrower as (α+β) increases.
35
α= 5 β= 25
α= 5 β= 10
α= 25 β= 5
α= 10 β= 5
α=/= β, so the pdf’s are asymmetric w.r.t . 0.5. Note that the graphs get narrower as (α+β) increases.
Advantages: Robust w.r.t. attacks
Bad-mouthing attack E-commerce analogy: Sellers collude with
buyers to give bad ratings to others
Ballot stuffing attack E-commerce analogy: Sellers collude with
buyers to give it unfairly good ratings
Sleeper attacks Apparently trusted agent defects
36
Trust in Tweets
37
Large network of people Large number of tweets Tweet – 140 character description
of an event
Problem: How to organize tweets?
38
Exploiting trust information
Rank tweets according to trust information
Trust in the user who tweets
Belief (trust) in the tweet
39
Trust in the person who tweets Popularity of the user Based on count of followers Reputation of the user Based on history of making informed
observations Enrich using Pagerank Analogy? Highly trusted followers count more
than lowly trusted followers
Belief (Trust) in a tweet
Belief in a tweet depends on the trust in the user who generates it. Belief in a tweet depends on the content of “similar” tweets (originating from approximately the same location around the same time)
Trust in Linked Open Data
42
Linked Data
The Linking Open Data (LOD) project is a community-led effort to create openly accessible, and interlinked, RDF Data on the Web. RDF: Resource Description Framework – graph-based representation language
43
Linked Data
44
Exploiting trust for access and standardization
Trust in the creator of the data, and belief (trust) in the data How well connected is the data? Rank LOD according to trust information
45
Sensor Data on LOD
MesoWest weather data in US ~20,000+ Sensor Systems ~1 billion Observational Assertions Sensors linked with Geonames on LOD
http://wiki.knoesis.org/index.php/SSW
46
Trust in Active Perception
47
Active Perception
Perception is the process of observing, hypothesis generation, and verification
48
Evidence-based Trust
Observations (and hypotheses) are more trusted if they can be verified through empirical evidence Sensors are more trusted if their observations are trusted
49
Evidence-based Trust
50
Strengthened Trust
Trust
Additional uses of active perception in sensors context
Determining actionable intelligence by narrowing set of explanations to one Enable use of a “small” set of always on sensors to bootstrap and selectively turn-on additional sensors in a resource (e.g., power) constrained environment
51
52
References
Krishnaprasad Thirunarayan, Dharan Althuru, Cory Henson, and Amit Sheth, “A Local Qualitative Approach to Referral and Functional Trust,” The 4th Indian International Conference on Artificial Intelligence (IICAI-09), December 2009. Cory Henson, Joshua Pschorr, Amit Sheth, and Krishnaprasad Thirunarayan, “SemSOS: Semantic Sensor Observation Service,” International Symposium on Collaborative Technologies and Systems (CTS2009), Workshop on Sensor Web Enablement (SWE2009), Baltimore, Maryland, 2009. Krishnaprasad Thirunarayan, Cory Henson, and Amit Sheth, “Situation Awareness via Abductive Reasoning from Semantic Sensor Data: A Preliminary Report,” International Symposium on Collaborative Technologies and Systems (CTS2009), Workshop on Collaborative Trusted Sensing, Baltimore, Maryland, 2009.
53
References
A. Sheth and M. Nagarajan, “Semantics-Empowered Social Computing, IEEE Internet Computing”, Jan/Feb 2009, 76-80 Amit Sheth, Cory Henson, and Satya Sahoo, "Semantic Sensor Web," IEEE Internet Computing, vol. 12, no. 4, July/August 2008, p. 78-83.
Machine and Citizen Sensor Data Demos
Illustrate semantic web and information retrieval techniques --
spatio-temporal-thematic ontologies, mash-ups, machine and citizen
sensor data analytics
54
55
High-level Sensor Low-level Sensor
How do we check if the three images depict …
• the same time and same place?
• same entity?
• a serious threat?
Motivating Scenario: Spatio-temporal-thematic analytics
Semantic Observation Service:
Overall Architecture and Details
SemSOS Demo http://knoesis.wright.edu/research/sems
ci/application_domain/sem_sensor/cory/demos/ssos_demo/ssos_demo.htm Twitris Demo http://twitris.knoesis.org/
57
Situation Awareness : Analysis Situation Awareness Components
Physical World: Sensor Data Perception: Entity Metadata Comprehension: Relationship Metadata
Semantic Analysis How is the data represented? Sensor Web Enablement What are the sources of the data? Provenance Analysis What objects/events account for the data? Abductive Reasoning Where did the event occur? Spatial Analysis When did the event occur? Temporal Analysis What is the significance of the event? Thematic Analysis What are the reasons for inconsistency? Abductive Reasoning
59
A Unified Approach to Retrieving Web Documents and Semantic Web Data
Trivikram Immaneni* and Krishnaprasad Thirunarayan
Department of Computer Science and Engineering
Wright State University Dayton, OH-45435, USA
*Currently at: Technorati, San Francisco
60
Outline
Goal (What?)
Background and Motivation (Why?)
Unified Web Model (Why?)
Query Language and Examples (What?)
Implementation Details (How?)
Evaluation and Applications (Why?)
Conclusions
61
Goal
62
Integrate HTML Web and Semantic Web by establishing and exploiting connections between them => Unified Web Model Design and implement a language to retrieve data and documents from the Unified Web => Hybrid Query Language Implement the system using mature software components for indexing and search => SITAR
63
Background and Motivation
64
HTML Web
Hyperlinked Web of documents Content human comprehensible Search engines and web browsers search, retrieve, navigate, and display information Keyword-based searches have low precision and high recall
65
Semantic Web
Standards-based labeled graph of resources and binary properties (data) Content machine accessible Database techniques adapted to store and retrieve Semantic Web data Query formulation by lay users difficult but results are precise XML, RDF, SPARQL, Web Services, etc.
66
Shoehorning HTML Web into Semantic Web
Document = Data node + Content as string in RDF graph
Regular expressions in SPARQL used to retrieve documents. Drawbacks that IR tries to overcome Ease of query formulation: Keyword-based Dealing with Large datasets: Ranking
67
Formalizing HTML Web as Semantic Web
Techniques for manual (re)-authoring of (legacy) documents using Semantic Web Technologies is neither feasible nor advisable. State-of-the-art NLP and information
extraction techniques inadequate Informal description indispensable for human
comprehension Escape route: Traceability via superposition (E.g., RDFa)
68
Shoehorning Semantic Web into HTML Web
Currently, Semantic Web documents live on the HTML Web but their components are neither accessible nor reasoned with via keyword-based searches Swoogle attempts to rank Semantic Web
documents
69
Unified Web Model (What?)
70
Aim
Unified Web integrates the two Webs to enable improved hybrid retrieval of data and documents. Unified Web Model Hybrid Query Language
71
Unified Web Model Graph
Node Abstract entity identified by its URI
Blank/Literal node names automatically generated Home URI Section
URI index words Document Section (optional) Outgoing Links Section Triples Section …
72
(cont’d)
Relationships (Edges) hasDocument
Relates Node to content string hyperlinksTo
Relates Node with another node to which the former node’s document contains a hyperlink
Asserts Relates Node with each RDF statement in the document
linksTo Relates Node with another node
to which the former node’s document contains a hyperlink, or such that the former node’s document contains a triple with
the latter node
73
Example of Unified Web Model
Document http://www.abc.com/xyz.htm contains the RDF fragment:
… <mailTo: [email protected]/> … <rdf:RDF…> <owl:Class
rdf:ID=’http://www.abc.com/sw#Jaguar’/> </rdf:RDF> …
74
75
Data Retrieval from Unified Web
Unified Web Model can be specified using RDF
In terms of rdfs:Resource, rdfs:Propery, rdfs:Statement, rdfs:Literal, refs:Subject, rdfs:Predicate, rdfs:Object, etc
Unified Web is a reified Semantic Web (user triples)
SPARQL usable as query language
76
Information Retrieval from Unified Web
Node can be indexed using URI index words
Based on name, content, label, triples, etc
Node can be ranked using its phrasal / URI-based annotations and its node neighborhood
77
Advantages
Semantic Web nodes can be retrieved using (associated) keywords Legacy document recall improved by interpreting hyperlink as Semantic Markup for reasoning.
Hyperlink: mailto:[email protected] Triple: <mailto:[email protected] rdf:type univ:prof>
78
Semantics rich URIs (such as those from dictionary.com) in legacy documents can be incrementally equated with ontologies
Document: … <a href =
http://dictionary.com/search?q=jaguar> Jaguar </a> God of the Underworld …
Ontology: … <http://dictionary.com/search?q=jaguar owl:Sameas http://www.animalOnto.com/Jaguar> ...
79
Query Language and Examples (What?)
80
Aim
Store and retrieve Semantic Web data, and use information in documents to enhance data retrieval Enable use of keywords to deal with lack
of complete URI information Peter affiliated-with ?X
Enable use of partial information about data being searched Student :: Peter affiliated-with ?X
81
Store and retrieve documents, and use information in the Semantic Web to enhance document retrieval Docsearch(<animal>::<jaguar> Maya God)
82
Sample Queries
Wordset queries: <peter haase> -> retrieves all URIs
indexed by BOTH peter AND haase Includes document and URIs URIs are indexed by words.
The words are obtained by analyzing URIs, from label literals, and anchor text of the URIs.
83
Wordset Pair queries:
<phdstudent>::<peter> -> specifies that
user is looking for peter, the phd student
Transitive closure
84
More Queries
Get Peter the Phd student’s home page: getBindings ( [<phdstudent>::<peter> <homepage> ?x] )
Get Peter Haase’s publications that have “Semantic” in their title:
getBindings([<peter haase> <publication> ?x] [?x <title> <semantic>])
Get group 1 element which is white in color getBindings( [?x <group> <group 1>] [?x <color> <white>] )
85
Homepages of Phd students named Peter that “talk about” Semantic Grid
getDocsByBindingsAndContent ( [<phdstudent>::<peter> <homepage> ?x] “semantic grid” )
getLinkingNodes ( http://www.aifb.uni-
karlsruhe.de/Personen/viewPerson?id_db=2023 ) getAssertingNodes
([<peter haase> <publication> ?x]). getDocsByIndexOrContent (peter haase)
86
Implementation Details (How?)
SITAR : Semantic InformaTion Analysis and Retrieval system
87
Tools Used
Apache Lucene 2.0 APIs in Java A high-performance, text search engine
library with smart indexing strategies. Cyberneko HTML Parser Jena ARP RDF parser
88
Evaluation and Application (Why?)
89
Experiments DATASETs: AIFB SEAL data
The crawler collected 1665 files (English XHTML pages and RDF/OWL pages).
1455 (610 RDF files and 845 XHTML files) were successfully parsed and indexed
A total of 193520 triples were parsed and indexed
90
Datasets (cont’d)
TAP dataset Periodic table Lehigh University BenchMarks
91
Conclusions
92
Developed a Hybrid Query language for data and document retrieval that is convenient because it is keyword-based that can be accurate and flexible because disambiguation
information can be provided that is expressive because it can support inheritance
reasoning that is pragmatic because it can work with legacy
documents
FUTURE WORK: Robust Ranking Strategy
93
References
T. Immaneni and K. Thirunarayan, A Unified approach To Retrieving Web Documents and Semantic Web Data, In: Proceedings of the 4th European Semantic Web Conference (ESWC 2007), LNCS 4519, pp. 579-593, June 2007. T. Immaneni, and K. Thirunarayan, Hybrid Retrieval from the Unified Web, In: Proceedings of the 22nd Annual ACM Symposium on Applied Computing (ACM SAC 2007), pp. 1376-1380, March 2007.
THANK YOU!
http://knoesis.wright.edu/tkprasad/
94