Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy
Mathieu d'Aquin, Salman Elahi, Enrico Motta
Knowledge Media Institute, The Open University, UK

DESCRIPTION

Presentation at the SPOT 2010 workshop on Privacy and Trust on the Social and Semantic Web.

Page 1: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

Mathieu d'Aquin, Salman Elahi, Enrico Motta
Knowledge Media Institute, The Open University, UK

Page 2: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

Stating the obvious

Personal information exchange on the Web is:
– Big
– Heterogeneous
– Distributed
– Fragmented
– Sometimes implicit

Page 3: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

Challenges to individuals

Lack of control over personal information

In sum, we don’t know the most important things about our personal data

What are all the websites that know my e-mail address?

What does amazon.co.uk, or the website of my favorite airline, know about me?

Page 4: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

Why this is important

• Because these things are useful to know in general

• Because these things can tell us a lot about our own behavior, our attitudes towards information sharing and exchange

• Because this behavior has strong implications in terms of privacy and defines our trust relationships with websites online

Page 5: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

So, what do we do?

• Unrestricted monitoring of information exchange on the Web by an individual user

• Building semantically represented and processable datasets of what was shared and with whom

• Analyzing these datasets to build models of the user's behavior related to privacy:
– levels of trust given to websites
– levels of criticality associated with different pieces of data

Page 6: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

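[Figure: architecture diagram. Components shown: Local Web Agents (e.g., browser); Local Logging Proxy; External Web Sites. HTTP Requests and HTTP Responses are exchanged between the local Web agents and the proxy, and between the proxy and the external Web sites. Other labels in the diagram: Web Exchange RDF Logs; HTTP Ontology; Personal Information; Interaction Patterns.]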

Page 7: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

Running the logging proxy over a period of 2.5 months yielded around 100 million triples, representing about 3 million HTTP requests.

Encodes all the info related to HTTP requests and responses.

Data sent and received stored separately.

<REQUEST RDF:ABOUT="#REQUEST-1257949232709-1257949233757">
  <STARTEDAT>1257949232709</STARTEDAT>
  <ENDEDAT>1257949233757</ENDEDAT>
  <ORIGIN RDF:RESOURCE="127.0.0.1" />
  <ONPORT>80</ONPORT>
  <TOHOST RDF:RESOURCE="API.FACEBOOK.COM" />
  <METHOD RDF:RESOURCE="POST"/>
  <TOURL RDF:RESOURCE="HTTP://API.FACEBOOK.COM/RESTSERVER.PHP" />
  <HTTPVERSION RDF:RESOURCE="HTTP-1.1" />
  <HOST RDF:RESOURCE="API.FACEBOOK.COM" />
  <CONTENT-TYPE RDF:RESOURCE="APPLICATION--X-WWW-FORM-URLENCODED" />
  <USER-AGENT RDF:RESOURCE="MOZILLA--5.0_(MACINTOSH;_U;_INTEL_MAC_OS_X;_EN)_APPLEWEBKIT--526.9+_(KHTML._LIKE_GECKO)_ADOBEAIR--1.5.2" />
  <REFERER RDF:RESOURCE="APP:--TWEETDECK.SWF" />
  <X-FLASH-VERSION RDF:RESOURCE="10.0.32.18" />
  <ACCEPT RDF:RESOURCE="*--*" />
  <ACCEPT-LANGUAGE RDF:RESOURCE="EN-US" />
  <ACCEPT-ENCODING RDF:RESOURCE="GZIP._DEFLATE" />
  <COOKIE RDF:RESOURCE="__QCA=1239783354-42963995-12118014;___UTMA=87286159.357565716.1239892196.1252686326.1257582307.16;___UTMZ=87286159.1257582307.16.16.UTMCCN= (REFERRAL)|UTMCSR=FACEBOOK.COM|UTMCCT=--TOS.PHP|UTMCMD=REFERRAL;_C_USER=605559235;_CUR_MAX_LAG=2;_DATR=1239398136-0711BF1215821A9C58848BF0FFD0020EC8450CFA7154B9E228C29;_LSD=P3ZPN;_LXE=METM.DAQUIN%40VIRGIN.NET;_LXS=3;_S_VSN_FACEBOOKPOC_1=9874874320812" />
  <CONTENT-LENGTH RDF:RESOURCE="984" />
  <CONNECTION RDF:RESOURCE="KEEP-ALIVE" />
  <PROXY-CONNECTION RDF:RESOURCE="KEEP-ALIVE" />
  <DATA RDF:RESOURCE="DATA_C22B691F691DABD5AE893B9CB2F8ADD7" />
  <RESPONSE>
    <RESPONSE RDF:ABOUT="#RESPONSE-1257949232709--1257949233757">
      <HTTPVERSION RDF:RESOURCE="HTTP--1.0" />
      <RESPONSECODE RDF:RESOURCE="200_OK" />
      <CACHE-CONTROL RDF:RESOURCE="PRIVATE._NO-STORE._NO-CACHE._MUST-REVALIDATE._POST-CHECK=0._PRE-CHECK=0" />
      <CONTENT-TYPE RDF:RESOURCE="APPLICATION--JSON" />
      <EXPIRES RDF:RESOURCE="MON._26_JUL_1997_05:00:00_GMT" />
      <PRAGMA RDF:RESOURCE="NO-CACHE" />
      <CONTENT-ENCODING RDF:RESOURCE="GZIP" />
      <CONTENT-LENGTH RDF:RESOURCE="5943" />
      <X-CACHE RDF:RESOURCE="MISS_FROM_ROEBURN.OPEN.AC.UK" />
      <PROXY-CONNECTION RDF:RESOURCE="KEEP-ALIVE" />
      <DATA RDF:RESOURCE="DATA_5CCF6054FD0FBA3EE7EB444E178EAF19" />
    </RESPONSE>
  </RESPONSE>
</REQUEST>
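As a rough illustration of how one logged exchange could be flattened into triples like those above, here is a minimal Python sketch. It is not the authors' proxy: the function name `request_to_triples`, the predicate names, and the plain-tuple representation are assumptions made for this example, and a real implementation would use the HTTP ontology and an RDF library.

```python
from urllib.parse import urlsplit


def request_to_triples(started_ms, ended_ms, method, url, headers):
    """Return (subject, predicate, object) tuples describing one HTTP request.

    Hypothetical helper: predicate names only loosely mirror the log above.
    """
    subject = f"#REQUEST-{started_ms}-{ended_ms}"
    triples = [
        (subject, "startedAt", str(started_ms)),
        (subject, "endedAt", str(ended_ms)),
        (subject, "method", method),
        (subject, "toHost", urlsplit(url).hostname),
        (subject, "toURL", url),
    ]
    # One triple per header, using the lowercased header name as predicate.
    for name, value in headers.items():
        triples.append((subject, name.lower(), value))
    return triples


example = request_to_triples(
    1257949232709,
    1257949233757,
    "POST",
    "http://api.facebook.com/restserver.php",
    {
        "Content-Type": "application/x-www-form-urlencoded",
        "Connection": "keep-alive",
    },
)
```

Each request in this sketch produces five fixed triples plus one per header, which gives a feel for how a few million requests can grow into a far larger number of triples.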

Page 8: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

Basic analytics

Page 9: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

Focusing on personal data exchange

• Extract information sent through parameters of HTTP Requests

http://uk.search.yahoo.com/beacon/module?p=idiocracy&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0387808%2F

format=JSON&method=fql%2Emultiquery&api%5Fkey=51d350e8d92da1f5623512a9e801da2b&v
=1%2E0&queries=%7B%22query2%22%3A%22SELECT%20app%5Fid%2C%20display%5Fname%20FROM
%20application%20WHERE%20app%5Fid%20IN%20%28SELECT%20app%5Fid%20FROM%20%23query1
%29%22%2C%22query1%22%3A%22SELECT%20post%5Fid%2C%20source%5Fid%2C%20created%5Fti
me%2C%20updated%5Ftime%2C%20actor%5Fid%2C%20target%5Fid%2C%20app%5Fid%2C%20messa
ge%2C%20attachment%2C%20comments%2C%20likes%2C%20permalink%2C%20attribution%2C%2
0type%20FROM%20stream%20WHERE%20filter%5Fkey%20IN%20%28SELECT%20filter%5Fkey%20F
ROM%20stream%5Ffilter%20WHERE%20uid%20%3D%20605559235%20AND%20type%20%3D%20%27ne
wsfeed%27%29%20AND%20%28created%5Ftime%20%3E%3D%201257443596%29%20AND%20%28%28cr
eated%5Ftime%20%3E%201257945423%29%20OR%20%28updated%5Ftime%20%21%3D%20created%5
Ftime%29%29%20ORDER%20BY%20created%5Ftime%20DESC%20LIMIT%20200%22%7D&call%5Fid=1
2565739074246102&sig=01a13a72825ed83ed6d23bdf2791ad1a&session%5Fkey=be312ffdf9b9
e1a5ec6c5768%2D605559235

• Map this data onto a representation of a user profile (set of attributes of personal data)
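The two steps on this slide can be sketched in Python with the standard library. The URL is the Yahoo example shown above; the `PROFILE_MAPPING` table and the attribute name `search-interest` are hypothetical stand-ins for the manually created mappings, not part of the original tool.

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "http://uk.search.yahoo.com/beacon/module"
    "?p=idiocracy&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0387808%2F"
)


def sent_parameters(request_url):
    """Return the target host and the decoded query parameters of a request."""
    parts = urlsplit(request_url)
    # parse_qs percent-decodes values and returns a list per parameter name;
    # this sketch keeps only the first value of each.
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return parts.hostname, params


host, params = sent_parameters(url)

# Hypothetical mapping from (host, parameter name) to a profile attribute.
PROFILE_MAPPING = {("uk.search.yahoo.com", "p"): "search-interest"}

profile = {}
for name, value in params.items():
    attribute = PROFILE_MAPPING.get((host, name))
    if attribute is not None:
        profile.setdefault(attribute, set()).add(value)
```

Parameters without a mapping, such as `url` here, are simply left out of the profile in this sketch.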

Page 10: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

A tool is used to create mappings between the data sent to websites (from the logs, on the right) and the user profile (on the left), effectively reconstructing the profile from the data.

Page 11: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

What this tells us about Trust and Criticality of data

• 36 attributes, 1,080 values, sent to 123 domains

• A model of what piece of personal information was sent where (it can answer the questions posed earlier)

• Taking the point of view of an external observer, we can derive an observed model of trust and criticality of data
– If this piece of data is critical to you and you give it to Bob, you must trust Bob
– If you give this piece of data to many untrusted people, you probably don't consider it critical

• The goal being to help the user to better understand his own behavior
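A model of "what was sent where" can be pictured as a pair of indexes over observed exchanges. The sketch below uses invented observations and placeholder domains (`shop.example`, `airline.example`), not data from the study, and shows how a question such as "which websites know my e-mail address?" becomes a simple lookup.

```python
from collections import defaultdict

# Invented (attribute, value, receiving domain) observations for illustration.
observations = [
    ("email", "user@example.org", "shop.example"),
    ("email", "user@example.org", "airline.example"),
    ("postcode", "AB1 2CD", "shop.example"),
]


def index_observations(obs):
    """Index observations by attribute and by receiving domain."""
    domains_by_attribute = defaultdict(set)
    attributes_by_domain = defaultdict(set)
    for attribute, _value, domain in obs:
        domains_by_attribute[attribute].add(domain)
        attributes_by_domain[domain].add(attribute)
    return domains_by_attribute, attributes_by_domain


domains_by_attribute, attributes_by_domain = index_observations(observations)

# "What are all the websites that know my e-mail address?"
knows_email = sorted(domains_by_attribute["email"])

# "What does this website know about me?"
known_to_shop = sorted(attributes_by_domain["shop.example"])
```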

Page 12: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

The model formally

• Trust in a domain = max of the criticality of the data it received

• Criticality of a piece of data = 1 / (1 + Σ (1 − trust in each website that received the data))

• These two formulas are interdependent, so they are treated as a sequence and computed iteratively, with initial values at 0.5
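The iteration can be sketched as follows. This is one possible reading of the slide, not the authors' implementation: the criticality formula is read as 1 / (1 + Σ (1 − trust)), trust is updated before criticality in each round, and a fixed number of rounds stands in for a convergence test. The function name `iterate_model` and the example domains are hypothetical.

```python
def iterate_model(received, rounds=50, initial=0.5):
    """Iterate the trust and criticality formulas from a common initial value.

    `received` maps each domain to the non-empty set of data items it received.
    """
    items = {item for data in received.values() for item in data}
    criticality = {item: initial for item in items}
    trust = {domain: initial for domain in received}
    for _ in range(rounds):
        # Trust in a domain: the highest criticality among the data it received.
        trust = {
            domain: max(criticality[item] for item in data)
            for domain, data in received.items()
        }
        # Criticality of an item: 1 / (1 + sum of (1 - trust) over its receivers).
        criticality = {
            item: 1.0 / (1.0 + sum(
                1.0 - trust[domain]
                for domain, data in received.items()
                if item in data
            ))
            for item in items
        }
    return trust, criticality


# Hypothetical example: the e-mail went to two sites, the card number to one.
received = {"a.example": {"email"}, "b.example": {"email", "card"}}
trust, criticality = iterate_model(received)
first_trust, first_criticality = iterate_model(received, rounds=1)
```

In this example the card number, shared with a single site, ends up at least as critical as the e-mail address, and the site that received it is trusted at least as much as the other one, which matches the intuition on the previous slide.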

Page 13: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

Interacting with the model

Expose the user to his own behavior as observed, so that he can try to align it with his intended behavior

Page 14: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

What we can do with this

• Help a user understand his own data exchange

• Compare websites and data in terms of the observed trust and criticality

• “Correct” the model by re-aligning it with the intended behavior

• Detect fundamental conflicts between the observed behavior and the intended behavior

• Observe correlations in the data

Page 15: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

Where that leads us

• One of the first tools exploiting logs of personal Web activity

• Demonstrates the need for better ways to approach personal information management as personal Web data exchange

• Need to exploit and integrate local and external sources of data together to create new mechanisms supporting individuals in interpreting, understanding and managing their information online

Page 16: Semantic Monitoring of Personal Web Activity to Support the Management of Trust and Privacy

Thank you

[email protected]

@mdaquin