DEANONYMISATION IN LINKED DATA: A RESEARCH ROADMAP
Dalal Al-Azizy, David Millard, Nigel Shadbolt & Kieron O’Hara
{daaa1g09,dem,nrs,kmo}@ecs.soton.ac.uk
Web & Internet Sciences Research Group
The World Congress on Internet Security - WorldCIS 2014, 8 December, London, England
Outline • Introduction • Background: Deanonymisation, Linked Data • Research motivation • Research objectives • Related work • A paradox in Linked Data • Research roadmap • Conclusions
Introduction
Web evolution • Web of documents >> Web of data • hyperlinks >> hyperdata (machine-understandable data)
Web 1.0: Read-only
Web 2.0: Read-write
Web 3.0: The Semantic Web
(Images: simpleicon.com, Mimagood.com)
Introduction • What about privacy?
“Jigsaw re-identification”
Background: Deanonymisation • A major privacy threat • Matching data across different datasets to re-identify anonymous records in the target dataset.
(Latanyasweeney.org)
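The matching idea above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the slides: the records, the field names (`zip`, `birth`, `sex`) and the helper `link` are all hypothetical. It joins an "anonymised" target dataset to a named auxiliary dataset on shared quasi-identifiers and reports only unambiguous matches.

```python
from collections import defaultdict

# Hypothetical toy data; none of these records come from the slides.
# "Anonymised" target dataset: names removed, quasi-identifiers kept.
target = [
    {"zip": "02138", "birth": "1945-07-31", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth": "1962-01-15", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02139", "birth": "1962-01-15", "sex": "M", "diagnosis": "flu"},
]

# Auxiliary dataset: publicly available, with names attached.
auxiliary = [
    {"name": "Alice Example", "zip": "02138", "birth": "1945-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth": "1962-01-15", "sex": "M"},
]

# Assumed quasi-identifiers shared by both datasets.
QUASI_IDENTIFIERS = ("zip", "birth", "sex")


def key(record):
    """Project a record onto its quasi-identifier values."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def link(target, auxiliary):
    """Return (name, target_record) pairs whose quasi-identifier
    combination occurs exactly once in the target dataset."""
    groups = defaultdict(list)
    for rec in target:
        groups[key(rec)].append(rec)
    matches = []
    for person in auxiliary:
        candidates = groups.get(key(person), [])
        if len(candidates) == 1:  # unique match: re-identified
            matches.append((person["name"], candidates[0]))
    return matches


reidentified = link(target, auxiliary)
for name, rec in reidentified:
    print(name, "->", rec["diagnosis"])
```

In this toy run only the first person is re-identified; the second shares a quasi-identifier combination with two target records, so the match stays ambiguous.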
Background: Linked Data • Linked Data: publish, aggregate, integrate, explore, consume • The four Linked Data rules (Berners-Lee, 2006):
1. Name people, things, concepts, etc. using URIs
2. Use HTTP URIs so that these names can be looked up
3. Provide useful information, using the standards RDF and SPARQL, when anyone looks up a URI
4. Include links to other URIs so that more can be discovered
Linked Data: URI . RDF . Semantics
http://id.ecs.soton.ac.uk/person/1650 (URI for Prof. Wendy Hall, the person)
http://www.ecs.soton.ac.uk/people/wh (HTML page about Prof. Wendy Hall)
http://rdf.ecs.soton.ac.uk/person/1650 (RDF about Prof. Wendy Hall)
Links between them: foaf:page statement in RDF • rdfs:isDefinedBy statement in RDF • <link> element in HTML header
(Cool URIs for the Semantic Web, Sauermann and Cyganiak, 2007) (www.simpleicon.com)
Linked Data: metadata description with RDF triples
Model: Subject → predicate → object
<URI> → <URI> → <URI> or “literal”
Example: “Wendy Hall works in Southampton”
Code: <http://rdf.ecs.soton.ac.uk/person/1650> <http://xmlns.com/foaf/0.1/worksIn> <http://geonames.org/2637487/>
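As a rough illustration of the triple model, the following Python sketch holds a triple as a small record and prints it in an N-Triples-like form. The `Triple` class and `to_ntriples` helper are hypothetical names introduced here; the URIs are the ones shown on the slide, and the second triple (using `foaf:name` with a literal object) is an added example of the "literal" case.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject and predicate are URIs,
    the object is either a URI or a plain literal."""
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialise a triple in an N-Triples-like line (simplified escaping)."""
    if t.obj_is_literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = '"{}"'.format(escaped)
    else:
        obj = "<{}>".format(t.obj)
    return "<{}> <{}> {} .".format(t.subject, t.predicate, obj)


# The statement from the slide: "Wendy Hall works in Southampton".
works_in = Triple(
    "http://rdf.ecs.soton.ac.uk/person/1650",
    "http://xmlns.com/foaf/0.1/worksIn",
    "http://geonames.org/2637487/",
)

# Added for illustration: a triple whose object is a literal.
name = Triple(
    "http://rdf.ecs.soton.ac.uk/person/1650",
    "http://xmlns.com/foaf/0.1/name",
    "Wendy Hall",
    obj_is_literal=True,
)

for t in (works_in, name):
    print(to_ntriples(t))
```

Because every part of a triple except a literal is a URI, the object of one statement can be the subject of another, which is what lets triples from different sources join into a single graph.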
Linked Data: RDF graph
“Wendy Hall works in Southampton”
[Figure: RDF graph. Nodes: rc:WendyHall, foaf:Person, “Wendy Hall”, dbpedia:Southampton, dbpedia:Oxford, dp:Cities_in_UK, 234,600. Edges: rdf:type, foaf:name, rdf:livesIn, rdf:population, skos:subject (twice).]
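One plausible reading of the graph on this slide can be written as a set of triples and queried by pattern matching. The edge directions below are an assumption reconstructed from the node and edge labels, the prefixed names are copied from the slide as plain strings, and `match` is a hypothetical helper rather than a real SPARQL engine.

```python
# Assumed reading of the slide's graph; edge directions are a reconstruction.
graph = {
    ("rc:WendyHall", "rdf:type", "foaf:Person"),
    ("rc:WendyHall", "foaf:name", "Wendy Hall"),
    ("rc:WendyHall", "rdf:livesIn", "dbpedia:Southampton"),
    ("dbpedia:Southampton", "rdf:population", "234,600"),
    ("dbpedia:Southampton", "skos:subject", "dp:Cities_in_UK"),
    ("dbpedia:Oxford", "skos:subject", "dp:Cities_in_UK"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Two-hop traversal: where does the person live, and how is that place categorised?
cities = [o for _, _, o in match(graph, s="rc:WendyHall", p="rdf:livesIn")]
categories = [o for c in cities for _, _, o in match(graph, s=c, p="skos:subject")]

# Reverse lookup: everything filed under the same category.
uk_cities = [s for s, _, _ in match(graph, p="skos:subject", o="dp:Cities_in_UK")]

print(cities, categories, uk_cities)
```

The traversal shows how a statement about a person connects, through a shared node, to facts published about a place by a different source.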
Linking Open Data Project Cloud
Research motivation • Ever-growing publication of personal data on the Web as Linked Data • Use of online social networks • Mashups and data leakage from online social networks • Publication of government data • Increasing concern about data leakage • “Broken promises of anonymisation” (Ohm, 2010) • Political and commercial interests in advancing open data and Linked Open Data take precedence over privacy interests
Research objectives • Understanding the problem of deanonymisation comprehensively

• Identifying where data might not be fully secured before it is published on the Web as Linked Data

• Proposing technical solutions for robust anonymity and legal protection for data use policies

• Acknowledging the privacy challenges that result from publishing data on the Web
Related work • Deanonymisation: context, type of data • Linked Data: privacy challenges ➢ Addressing the knowledge gap: “deanonymisation in Linked Data” ➢ Research problem: “Deanonymisation is potentially made easier by Linked Data”
[Figure labels: deanonymisation; social networks; databases; Linked Data. Social network image: http://kikolani.com]
A paradox in Linked Data • Data discovery (value) vs. data deanonymisation (threat)
co-reference vs. matching approach • link discovery vs. linkage attacks • Risk of Linked Data • Trade-offs between Linked Data and privacy • Balancing the data utility of Linked Data against privacy protection
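The paradox can be made concrete with a small Python sketch, under stated assumptions: the URIs, properties and co-reference links below are hypothetical, and the union-find helpers are one simple way to resolve co-reference, not a technique named in the slides. Resolving which URIs denote the same entity is exactly what makes data discoverable, and the same step assembles a combined profile of one person.

```python
def find(parent, x):
    """Return the representative of x's co-reference group (path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Record that a and b denote the same entity."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def merge_descriptions(descriptions, same_as):
    """Merge per-URI property dictionaries along co-reference links."""
    parent = {}
    for a, b in same_as:
        union(parent, a, b)
    merged = {}
    for uri, props in descriptions.items():
        merged.setdefault(find(parent, uri), {}).update(props)
    return merged


# Hypothetical descriptions of one person, published by three separate sources.
descriptions = {
    "ex:staff/42": {"name": "A. Example", "employer": "Example University"},
    "ex:social/u9": {"nickname": "aex"},
    "ex:survey/r17": {"postcode": "SO17", "health": "condition X"},
}

# Hypothetical co-reference links (for instance, discovered owl:sameAs-style links).
same_as = [("ex:staff/42", "ex:social/u9"), ("ex:social/u9", "ex:survey/r17")]

merged = merge_descriptions(descriptions, same_as)
profile = next(iter(merged.values()))
print(profile)
```

Each source on its own reveals little; once the links are followed, the supposedly anonymous survey record carries a name and an employer.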
Research roadmap • Analysis and classification of the deanonymisation problem • Deanonymising Linked Data

• Preserving privacy and leveraging the data utility of Linked Data

[Figure: deanonymisation framework. Inputs: target dataset, background knowledge, auxiliary information. Deanonymisation: threats, assistive techniques, approaches, attacks. Output: deanonymised data.]
Conclusions • Understanding the deanonymisation problem in Linked Data

• Applying a deanonymisation framework to Linked Data

• Matching is a common approach. (= co-reference!)

• Linkage attacks and the linkability threat. (= link discovery!)
Thank you for your attention

Any comments or questions?

Dalal Al-Azizy LinkedIn: DalalAzizy
References
Auer, S., Lehmann, J., Ngomo, A.-C. N., & Zaveri, A. (2013). Introduction to Linked Data and its lifecycle on the Web. In Reasoning Web - Semantic Technologies for Intelligent Data Access (pp. 1–90).
Berners-Lee, T. (2006). Linked Data - Design Issues. http://www.w3.org/DesignIssues/LinkedData.html
Narayanan, A., & Shmatikov, V. (2008). Robust De-anonymization of Large Sparse Datasets. In IEEE Symposium on Security and Privacy (pp. 111–125). IEEE. doi:10.1109/SP.2008.33
Ohm, P. (2010). Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review. Retrieved from http://papers.ssrn.com/sol3/Papers.cfm?abstract_id=1450006