Enforcing Policies on Social Media Data Extracted from the Web
Nicoletta Fornara and Truc-Vien T. Nguyen
Università della Svizzera italiana, Lugano, Switzerland


Summary

• Web/Internet data collection is becoming increasingly important for many social science fields

• Being able to formalize and enforce policies that regulate the collection and use of those data is crucial, especially considering the privacy and confidentiality wishes of the people who provided the data

• Even when such policies are not all enforced by data publishers, complying with them is crucial for ethical Internet research

• We present the SemPolicy Manager Tool, which is able to enforce a given set of policies by taking into account the meaning of the collected data


Web/Internet data collection technologies

• Internet data collection by means of

– web service interfaces: software designed to support machine-to-machine interaction over a network, or

– system-specific APIs (Application Programming Interfaces): specific interfaces for accessing the data of a data provider

• Web data collection by means of web crawlers: software that systematically browses the World Wide Web, building a local repository of the portion of the Web that it visits; very often the purpose is Web indexing

• Examples used in the paper:

– Facebook: RestFB, a Facebook Graph API client written in Java

– Twitter: the Twitter REST API, an interface for programmatic access to read and write Twitter data


Types of Policies

• Ethical guidelines proposed by various associations for social research (e.g. the American Association for Public Opinion Research, at point I.A.5)

• Legal constraints on the processing of personal data (e.g. Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data)

– This directive states the necessity of anonymization at point (26), defines the notions of personal data and of processing of personal data in Article 2, and constrains personal data processing in Article 8

• Web site policies/terms on how the data available on a web site can be used for automatic data collection (e.g. the Facebook Automated Data Collection Terms and robots.txt files)
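
As an illustration of the last kind of policy, the sketch below checks a URL against robots.txt rules before collecting it, using Python's standard `urllib.robotparser` module. The robots.txt content, the user-agent name `research-crawler`, and the example URLs are hypothetical; a real collector would download the file from the target site, and this is not how the SemPolicy Manager Tool itself is described as working.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /public/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules given as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_collect(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
allowed = may_collect(parser, "research-crawler", "https://example.org/public/page.html")
blocked = may_collect(parser, "research-crawler", "https://example.org/private/data.html")
```

Note that robots.txt only covers crawling; site terms such as the Facebook Automated Data Collection Terms are written for humans and have to be formalized separately.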


The SemPolicy Manager Tool

Innovative technologies used for realizing the tool:

1. Semantic Web technologies for expressing the meaning of the data

2. Declarative norms formalization and enforcement for expressing policies

3. Natural Language Processing techniques used to enrich the collected data with new semantic information contained in unstructured text


Architecture of the SemPolicy Manager Tool

[Figure: architecture diagram. Components: Data Collector, Semantic Analysis Component, Enforcement Service, Social Network Data Ontology, Policy Ontology. Arrow labels: writes, enriches, imports, reads.]


Using the SemPolicy Manager Tool (1)

We evaluated the tool on a specific use case:

• the collection of social network data from Facebook and Twitter, and the enforcement on those data of certain parts of EU Directive 95/46/EC that state the necessity of anonymizing personal data and data revealing confidential information about people (point 26, Articles 2 and 8).

The enforced policies are:

• Policy 1. It is obligatory to make anonymous all personal data relating to an identified or identifiable natural person in order to store, retrieve, and use them. These data include: username, user ID, first name, last name, full name, web site.

• Policy 2. It is obligatory to anonymize or remove a text if it reveals racial or ethnic origin, political opinions, religious or philosophical beliefs.
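
A minimal sketch of what the obliged action of Policy 1 could look like on a single user record. The slides do not say how the tool anonymizes values; keyed hashing (HMAC-SHA256) is an assumption made here, and strictly speaking it is pseudonymization, since whoever holds the secret key can recompute the mapping. The snake_case field names and the `anon-` prefix are also hypothetical.

```python
import hashlib
import hmac

# Properties listed in Policy 1; the snake_case key names are assumptions.
PERSONAL_PROPERTIES = (
    "username", "user_id", "first_name", "last_name", "full_name", "web_site",
)


def pseudonym(value: str, secret: bytes) -> str:
    """Map a value to a stable, non-reversible token using a keyed hash."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "anon-" + digest[:12]


def anonymize_user(record: dict, secret: bytes) -> dict:
    """Return a copy of the record with every personal property replaced."""
    cleaned = dict(record)
    for key in PERSONAL_PROPERTIES:
        if cleaned.get(key) is not None:
            cleaned[key] = pseudonym(str(cleaned[key]), secret)
    return cleaned
```

Because the same input and key always yield the same token, records of one user can still be linked after anonymization, which is often what a social science analysis needs.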


Policies 1 and 2 -> 3 Obligations

• From Policies 1 and 2 we formalized the following three obligations, each having an activation condition and an action to be performed:

• Policy 1-Obligation 1: it is activated when the SN Ontology contains personal data of a user who is not popular. The obliged action consists in retrieving all of the user's personal information and then anonymizing it.

• Policy 1-Obligation 2: it is activated when the SN Ontology contains a message (the content of a post, a comment, or a tweet) that contains personal information. The obliged action consists in anonymizing all personal information that appears in the content of posts, comments, or tweets.

• Policy 2-Obligation 1: it is activated when the semantically enriched collected data contain a statement (post, comment, or tweet) whose content is related to a sensitive topic. The obliged action consists in removing sensitive topics from the content of posts, comments, or tweets.
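
The three obligations can be pictured as pairs of an activation condition and an obliged action. The sketch below is a plain-Python stand-in, not the tool's ontology-based formalization: the record fields (`kind`, `popular`, `personal_mentions`, `sensitive_topics`), the `[ANON]` and `[REMOVED]` placeholders, and the choice to drop the whole text for a sensitive message are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict  # one collected item; all field names below are hypothetical


@dataclass(frozen=True)
class Obligation:
    name: str
    is_active: Callable[[Record], bool]  # activation condition
    perform: Callable[[Record], Record]  # obliged action


def anonymize_profile(rec: Record) -> Record:
    personal = {"username", "user_id", "first_name", "last_name", "full_name", "web_site"}
    return {k: ("[ANON]" if k in personal else v) for k, v in rec.items()}


def anonymize_mentions(rec: Record) -> Record:
    text = rec["text"]
    for mention in rec["personal_mentions"]:
        text = text.replace(mention, "[ANON]")
    return {**rec, "text": text, "personal_mentions": []}


def remove_sensitive_text(rec: Record) -> Record:
    # Simplification: the whole text is dropped, not only the sensitive part.
    return {**rec, "text": "[REMOVED]", "sensitive_topics": []}


OBLIGATIONS = [
    Obligation("P1-O1",
               lambda r: r["kind"] == "user" and not r.get("popular", False),
               anonymize_profile),
    Obligation("P1-O2",
               lambda r: r["kind"] == "message" and bool(r.get("personal_mentions")),
               anonymize_mentions),
    Obligation("P2-O1",
               lambda r: r["kind"] == "message" and bool(r.get("sensitive_topics")),
               remove_sensitive_text),
]


def enforce(record: Record, obligations=OBLIGATIONS):
    """Apply every active obligation in order; return the result and the names applied."""
    applied = []
    for ob in obligations:
        if ob.is_active(record):
            record = ob.perform(record)
            applied.append(ob.name)
    return record, applied
```

Keeping the condition separate from the action mirrors the structure of the obligations above: a condition can be changed without touching the code that performs the action.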


Using the SemPolicy Manager Tool (2)

• The Semantic Analysis Component needs to identify in the collected data (posts, comments, and tweets):

1. personal data: first name, last name, full name (of people), web sites (popular names do not need to be anonymized)

2. sensitive data: data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs.

• The Enforcement Service is in charge of checking whether the policies stored in the Policy Ontology are active (this depends on the semantic content of the collected data) and of enforcing the active policies.
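
To make the role of the Semantic Analysis Component concrete, here is a deliberately naive stand-in that annotates a message with the two kinds of information listed above. The actual component uses Natural Language Processing techniques; the regular expression, the keyword lexicon, the list of popular names, and the output field names are all hypothetical.

```python
import re

# Hypothetical lexicons, standing in for the tool's NLP-based analysis.
POPULAR_NAMES = {"Barack Obama"}
SENSITIVE_KEYWORDS = {
    "political opinions": {"election", "vote", "party"},
    "religious or philosophical beliefs": {"church", "mosque", "atheist"},
    "racial or ethnic origin": {"ethnicity"},
}

# Naive pattern: two consecutive capitalized words are treated as a full name.
FULL_NAME = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


def annotate(text: str) -> dict:
    """Attach personal mentions and sensitive topics found in a message."""
    names = [n for n in FULL_NAME.findall(text) if n not in POPULAR_NAMES]
    words = set(re.findall(r"[a-z]+", text.lower()))
    topics = sorted(t for t, kws in SENSITIVE_KEYWORDS.items() if words & kws)
    return {"text": text, "personal_mentions": names, "sensitive_topics": topics}
```

An enforcement step would then read these annotations to decide which policies are active for the message.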


Evaluation of the SemPolicy Manager Tool

• The response time for the enforcement of the three obligations* eventually reaches a stable level, which suggests that the application can be used in practice.

1. The first obligation takes more time than the others (50 minutes for 200 Facebook seed users; 12 minutes for 500 Twitter users) because Facebook/Twitter users have many private attributes, even more than the number of private data entries found within the messages.

2. The second obligation requires 5 minutes for 200 Facebook seed users and 12 minutes for 500 Twitter users.

3. The third obligation requires 0.20 minutes for 200 Facebook seed users and 0.28 minutes for 400 Twitter users.

* Measured on a PC with an Intel(R) Core(TM) 2 Quad CPU Q9650 @ 3.00 GHz and 4 GB RAM


Conclusions

• Because Semantic Web technologies are used to represent both the collected data and the policies, the activation condition of a formalized policy can be changed without reprogramming the tool

• The tool can be used to enforce other policies, but it may be necessary to program the software that executes the obliged action and/or to extend the Semantic Analysis Component

• In our future work we plan to study how to improve the user interface of the SemPolicy Manager Tool


• Thank you for your attention!

• Questions?