Presentation at the SPOT 2010 workshop on Privacy and Trust on the Social and Semantic Web.
Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy
Mathieu d'Aquin, Salman Elahi, Enrico Motta
Knowledge Media Institute, The Open University, UK
Stating the obvious
Personal information exchange on the Web is:
– Big
– Heterogeneous
– Distributed
– Fragmented
– Sometimes implicit
Challenges to individuals
Lack of control over personal information
In sum, we don’t know the most important things about our personal data
What are all the websites that know my e-mail address?
What does amazon.co.uk or the website of my favorite airline know about me?
Why this is important
• Because these things are useful to know in general
• Because these things can tell us a lot about our own behavior, our attitudes towards information sharing and exchange
• Because this behavior has strong implications in terms of privacy and defines our trust relationships with websites online
So, what do we do?
• Unrestricted monitoring of information exchange on the Web by an individual user
• Building a semantically represented and processable dataset of what was shared and with whom
• Analyzing this dataset to build models of the user's behavior related to privacy:
– levels of trust given to websites
– levels of criticality associated with different pieces of data
[Figure: local Web agents (e.g., browser) exchange HTTP requests and HTTP responses with external Web sites through a local logging proxy. Other labels in the figure: Web Exchange RDF Logs, HTTP Ontology, Personal Information, Interaction Patterns.]
Run over a period of 2.5 months, the logging proxy yielded around 100 million triples, representing about 3 million HTTP requests.
Encodes all the info related to HTTP requests and responses.
Data sent and received stored separately.
<REQUEST RDF:ABOUT="#REQUEST-1257949232709-1257949233757">
  <STARTEDAT>1257949232709</STARTEDAT>
  <ENDEDAT>1257949233757</ENDEDAT>
  <ORIGIN RDF:RESOURCE="127.0.0.1" />
  <ONPORT>80</ONPORT>
  <TOHOST RDF:RESOURCE="API.FACEBOOK.COM" />
  <METHOD RDF:RESOURCE="POST"/>
  <TOURL RDF:RESOURCE="HTTP://API.FACEBOOK.COM/RESTSERVER.PHP" />
  <HTTPVERSION RDF:RESOURCE="HTTP-1.1" />
  <HOST RDF:RESOURCE="API.FACEBOOK.COM" />
  <CONTENT-TYPE RDF:RESOURCE="APPLICATION--X-WWW-FORM-URLENCODED" />
  <USER-AGENT RDF:RESOURCE="MOZILLA--5.0_(MACINTOSH;_U;_INTEL_MAC_OS_X;_EN)_APPLEWEBKIT--526.9+_(KHTML._LIKE_GECKO)_ADOBEAIR--1.5.2" />
  <REFERER RDF:RESOURCE="APP:--TWEETDECK.SWF" />
  <X-FLASH-VERSION RDF:RESOURCE="10.0.32.18" />
  <ACCEPT RDF:RESOURCE="*--*" />
  <ACCEPT-LANGUAGE RDF:RESOURCE="EN-US" />
  <ACCEPT-ENCODING RDF:RESOURCE="GZIP._DEFLATE" />
  <COOKIE RDF:RESOURCE="__QCA=1239783354-42963995-12118014;___UTMA=87286159.357565716.1239892196.1252686326.1257582307.16;___UTMZ=87286159.1257582307.16.16.UTMCCN= (REFERRAL)|UTMCSR=FACEBOOK.COM|UTMCCT=--TOS.PHP|UTMCMD=REFERRAL;_C_USER=605559235;_CUR_MAX_LAG=2;_DATR=1239398136-0711BF1215821A9C58848BF0FFD0020EC8450CFA7154B9E228C29;_LSD=P3ZPN;_LXE=METM.DAQUIN%40VIRGIN.NET;_LXS=3;_S_VSN_FACEBOOKPOC_1=9874874320812" />
  <CONTENT-LENGTH RDF:RESOURCE="984" />
  <CONNECTION RDF:RESOURCE="KEEP-ALIVE" />
  <PROXY-CONNECTION RDF:RESOURCE="KEEP-ALIVE" />
  <DATA RDF:RESOURCE="DATA_C22B691F691DABD5AE893B9CB2F8ADD7" />
  <RESPONSE>
    <RESPONSE RDF:ABOUT="#RESPONSE-1257949232709--1257949233757">
      <HTTPVERSION RDF:RESOURCE="HTTP--1.0" />
      <RESPONSECODE RDF:RESOURCE="200_OK" />
      <CACHE-CONTROL RDF:RESOURCE="PRIVATE._NO-STORE._NO-CACHE._MUST-REVALIDATE._POST-CHECK=0._PRE-CHECK=0" />
      <CONTENT-TYPE RDF:RESOURCE="APPLICATION--JSON" />
      <EXPIRES RDF:RESOURCE="MON._26_JUL_1997_05:00:00_GMT" />
      <PRAGMA RDF:RESOURCE="NO-CACHE" />
      <CONTENT-ENCODING RDF:RESOURCE="GZIP" />
      <CONTENT-LENGTH RDF:RESOURCE="5943" />
      <X-CACHE RDF:RESOURCE="MISS_FROM_ROEBURN.OPEN.AC.UK" />
      <PROXY-CONNECTION RDF:RESOURCE="KEEP-ALIVE" />
      <DATA RDF:RESOURCE="DATA_5CCF6054FD0FBA3EE7EB444E178EAF19" />
    </RESPONSE>
  </RESPONSE>
</REQUEST>
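As an illustration of how a logged request could be turned into statements like those in the sample above, here is a minimal Python sketch. It is not the authors' proxy code: the function name `request_to_triples`, the plain-tuple triple format, and the property spellings (`startedAt`, `toHost`, and so on, loosely modelled on the element names in the sample) are all assumptions made for this example.

```python
from urllib.parse import urlsplit

def request_to_triples(started_at, ended_at, method, url, headers):
    """Describe one HTTP request as (subject, property, value) tuples.

    Property names are hypothetical; they only mirror the sample log.
    """
    subject = f"#request-{started_at}-{ended_at}"
    triples = [
        (subject, "startedAt", str(started_at)),
        (subject, "endedAt", str(ended_at)),
        (subject, "method", method),
        (subject, "toURL", url),
        (subject, "toHost", urlsplit(url).hostname),
    ]
    # Every request header becomes one more statement about the request.
    for name, value in headers.items():
        triples.append((subject, name.lower(), value))
    return triples

example_triples = request_to_triples(
    1257949232709,
    1257949233757,
    "POST",
    "http://api.facebook.com/restserver.php",
    {"Content-Type": "application/x-www-form-urlencoded"},
)
```

A real log would use an RDF library and proper URIs; tuples are used here only to keep the sketch self-contained.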
Basic analytics
Focusing on personal data exchange
• Extract information sent through parameters of HTTP Requests
http://uk.search.yahoo.com/beacon/module?p=idiocracy&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0387808%2F
format=JSON&method=fql%2Emultiquery&api%5Fkey=51d350e8d92da1f5623512a9e801da2b&v
=1%2E0&queries=%7B%22query2%22%3A%22SELECT%20app%5Fid%2C%20display%5Fname%20FROM
%20application%20WHERE%20app%5Fid%20IN%20%28SELECT%20app%5Fid%20FROM%20%23query1
%29%22%2C%22query1%22%3A%22SELECT%20post%5Fid%2C%20source%5Fid%2C%20created%5Fti
me%2C%20updated%5Ftime%2C%20actor%5Fid%2C%20target%5Fid%2C%20app%5Fid%2C%20messa
ge%2C%20attachment%2C%20comments%2C%20likes%2C%20permalink%2C%20attribution%2C%2
0type%20FROM%20stream%20WHERE%20filter%5Fkey%20IN%20%28SELECT%20filter%5Fkey%20F
ROM%20stream%5Ffilter%20WHERE%20uid%20%3D%20605559235%20AND%20type%20%3D%20%27ne
wsfeed%27%29%20AND%20%28created%5Ftime%20%3E%3D%201257443596%29%20AND%20%28%28cr
eated%5Ftime%20%3E%201257945423%29%20OR%20%28updated%5Ftime%20%21%3D%20created%5
Ftime%29%29%20ORDER%20BY%20created%5Ftime%20DESC%20LIMIT%20200%22%7D&call%5Fid=1
2565739074246102&sig=01a13a72825ed83ed6d23bdf2791ad1a&session%5Fkey=be312ffdf9b9
e1a5ec6c5768%2D605559235
• Map this data onto a representation of a user profile (set of attributes of personal data)
A tool is used to create mappings between the data sent to websites (from the logs, on the right) and the user profile (on the left), effectively reconstructing the profile from the data.
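The two steps above (extracting request parameters, then mapping them onto profile attributes) can be sketched in Python as follows. This is only an illustration using the standard library, not the tool shown in the slides: the function names, the `MAPPINGS` table, and the attribute label "search interest" are hypothetical. The URL is the Yahoo example given earlier.

```python
from urllib.parse import urlsplit, parse_qs

def extract_parameters(url):
    """Return the host and the decoded query parameters of a request URL."""
    parts = urlsplit(url)
    return parts.hostname, parse_qs(parts.query)

def map_to_profile(host, params, mappings):
    """Apply hand-made (host, parameter) -> profile attribute mappings."""
    facts = []
    for name, values in params.items():
        attribute = mappings.get((host, name))
        if attribute is not None:
            facts.extend((attribute, value, host) for value in values)
    return facts

URL = ("http://uk.search.yahoo.com/beacon/module"
       "?p=idiocracy&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0387808%2F")

# Hypothetical mapping: the "p" parameter on this host reveals a search interest.
MAPPINGS = {("uk.search.yahoo.com", "p"): "search interest"}

host, params = extract_parameters(URL)
facts = map_to_profile(host, params, MAPPINGS)
```

Each resulting fact records which attribute value went to which host, which is the kind of record the later trust and criticality analysis needs.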
What this tells us about Trust and Criticality of data
• 36 attributes, 1,080 values, to 123 domains
• A model of what piece of personal information was sent where (can answer the questions)
• Taking the point of view of an external observer, we can derive an observed model of trust and criticality of data
– If this piece of data is critical to you and you give it to Bob, you must trust Bob
– If you give this piece of data to many untrusted people, you probably don’t consider it critical
• The goal being to help the user to better understand his own behavior
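To show how a model of "what was sent where" can answer questions such as "which websites know my e-mail address?", here is a small Python sketch. The record format (attribute, value, domain) and all the sample values are invented for illustration and are not taken from the actual dataset.

```python
from collections import defaultdict

def domains_by_attribute(records):
    """Index the domains that received each profile attribute."""
    index = defaultdict(set)
    for attribute, _value, domain in records:
        index[attribute].add(domain)
    return index

def attributes_by_domain(records):
    """Index the profile attributes each domain has received."""
    index = defaultdict(set)
    for attribute, _value, domain in records:
        index[domain].add(attribute)
    return index

# Hypothetical records of personal data sent to websites.
RECORDS = [
    ("email", "me@example.org", "shop.example"),
    ("email", "me@example.org", "airline.example"),
    ("postcode", "AB1 2CD", "shop.example"),
]

who_knows_my_email = domains_by_attribute(RECORDS)["email"]
what_shop_knows = attributes_by_domain(RECORDS)["shop.example"]
```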
The model formally
• Trust in a domain = max of the criticality of the data it received
• Criticality of a piece of data = 1 / (1 + Σ (1 − trust in each website that received the data))
• These two formulas are interdependent, so they are treated as a sequence, with initial values at 0.5
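One way to read "treating them as a sequence" is as a simple iteration, sketched below in Python. Several choices here are assumptions rather than details from the slides: the update order (criticality first, then trust), the fixed number of iterations, the function name `observed_model`, and the sample data.

```python
def observed_model(sent, iterations=50, initial=0.5):
    """Iterate the trust and criticality formulas from initial values.

    `sent` maps each piece of data to the set of domains that received it.
    """
    domains = {d for receivers in sent.values() for d in receivers}
    trust = {d: initial for d in domains}
    criticality = {x: initial for x in sent}
    for _ in range(iterations):
        # Criticality = 1 / (1 + sum of (1 - trust) over receiving domains).
        criticality = {
            x: 1.0 / (1.0 + sum(1.0 - trust[d] for d in receivers))
            for x, receivers in sent.items()
        }
        # Trust in a domain = max criticality of the data it received.
        trust = {
            d: max(criticality[x] for x, receivers in sent.items() if d in receivers)
            for d in domains
        }
    return trust, criticality

# Hypothetical example: an e-mail address given to three sites,
# a card number given to only one of them.
SENT = {
    "email": {"a.example", "b.example", "c.example"},
    "card": {"a.example"},
}
trust, criticality = observed_model(SENT)
```

In this toy example the card number, shared with a single site, ends up with a higher criticality than the widely shared e-mail address, and the site that received it ends up with a higher trust value than the others.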
Interacting with the model
Expose the user to his own behavior as observed, so that he can try to align it with his intended behavior
What we can do with this
• Help a user understand his own data exchange
• Compare websites and data in terms of the observed trust and criticality
• “Correct” the model by re-aligning it with the intended behavior
• Detect fundamental conflicts between the observed behavior and the intended behavior
• Observe correlations in the data
Where that leads us
• First tools exploiting logs of personal Web activity
• Demonstrates the need for better approaches to personal information management, understood as managing personal Web data exchange
• Need to exploit and integrate local and external sources of data together to create new mechanisms supporting individuals in interpreting, understanding and managing their information online