
Automated Tracking of Online Service Policies

J. Trent Adams (1), Kevin Bauer (2), Asa Hardcastle (3), Dirk Grunwald (2), Douglas Sicker (2)

(1) The Internet Society, (2) University of Colorado, (3) OpenLiberty.org

38th Research Conference on Communication, Information and Internet Policy

TPRC 2010: Automated Tracking of Online Service Policies


What They Know

• Search queries
• Web browsing habits
• Shopping habits
• Social relationships
• Offline behaviors
• Personal interests
• Possible medical conditions
• Financial status


User Tracking is Easy and Common

When a user visits a website…

Implicit information revealed: IP address and HTTP request headers (user agent, operating system, local time and language, referrer)

This information alone can be used to construct an identifying, trackable profile [EFF’s Panopticlick, PETS ’10]

Additional tracking elements: Sites often embed cookies and other tools to explicitly identify and track users

[Figure: tracking elements embedded in dictionary.reference.com. Source: http://blogs.wsj.com/wtk]
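Panopticlick quantified how identifying such attributes are in bits of "surprisal". The minimal sketch below assumes a server simply hashes a few implicitly revealed request attributes into one identifier, and computes the bits carried by a value that a given fraction of visitors share. The field selection and the names `fingerprint` and `surprisal_bits` are illustrative assumptions, not Panopticlick's code.

```python
import hashlib
import math

# Implicitly revealed attributes used here (an illustrative subset; real
# fingerprinting studies combine many more signals).
FINGERPRINT_FIELDS = ("ip", "User-Agent", "Accept-Language")


def fingerprint(request_info: dict) -> str:
    """Hash implicitly revealed request attributes into one stable identifier."""
    # Fixed field order: the same attribute values always give the same hash.
    parts = [f"{field}={request_info.get(field, '')}" for field in FINGERPRINT_FIELDS]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def surprisal_bits(matching_visitors: int, total_visitors: int) -> float:
    """Identifying information (in bits) of a value shared by
    matching_visitors out of total_visitors."""
    return math.log2(total_visitors / matching_visitors)


# A browser configuration seen once among 1,024 visitors carries 10 bits.
print(surprisal_bits(1, 1024))  # 10.0
```

No cookie is involved: as long as the attribute values stay stable, the same visitor maps to the same identifier on every request.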


The Need for Clear Policy Articulation

Given the inherent privacy risks in ordinary web browsing, most sites explicitly explain how they handle sensitive user data (personally identifiable information, PII) in a human-readable, natural language privacy policy or terms of service document

Pros of natural language policies:
• Near universal deployment

Cons of natural language policies:
• Users must find, read, and comprehend the policies
• Comprehension is poor for natural language policies [McDonald et al., PETS ’09]


Structured Policy Formats: P3P

• The Platform for Privacy Preferences (P3P) is a machine-readable XML schema for encoding:
  – What kind of user information is collected
  – How any collected user information is used
  – How long user information is stored

• P3P files can be automatically parsed and semantically analyzed by the web browser

• Users can specify their own preferences and interact only with sites with compatible policies

• Policy information can be transformed into “standardized” formats to improve policy comprehension
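The parsing and preference matching described above can be sketched with Python's standard XML library. This is a minimal sketch, assuming the P3P 1.0 namespace and its STATEMENT, PURPOSE, RETENTION, and DATA-GROUP elements; the sample policy is hypothetical and trimmed (a schema-valid policy also needs ENTITY and ACCESS), and `parse_statements` and `is_compatible` are illustrative names, not a browser's actual API.

```python
import xml.etree.ElementTree as ET

P3P_URI = "http://www.w3.org/2002/01/P3Pv1"
NS = {"p3p": P3P_URI}

# Hypothetical, trimmed policy used only for illustration.
SAMPLE_POLICY = f"""
<POLICIES xmlns="{P3P_URI}">
  <POLICY name="sample" discuri="http://www.example.com/privacy.html">
    <STATEMENT>
      <PURPOSE><current/><admin/></PURPOSE>
      <RECIPIENT><ours/></RECIPIENT>
      <RETENTION><stated-purpose/></RETENTION>
      <DATA-GROUP>
        <DATA ref="#dynamic.clickstream"/>
        <DATA ref="#user.home-info.online.email"/>
      </DATA-GROUP>
    </STATEMENT>
  </POLICY>
</POLICIES>
""".strip()


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_statements(policy_xml: str) -> list[dict]:
    """Extract what data is collected, why, and how long it is kept."""
    root = ET.fromstring(policy_xml)
    statements = []
    for stmt in root.iter(f"{{{P3P_URI}}}STATEMENT"):
        purpose = stmt.find("p3p:PURPOSE", NS)
        retention = stmt.find("p3p:RETENTION", NS)
        statements.append({
            "data": [d.get("ref") for d in stmt.findall("p3p:DATA-GROUP/p3p:DATA", NS)],
            "purposes": {_local(p.tag) for p in purpose} if purpose is not None else set(),
            "retention": {_local(r.tag) for r in retention} if retention is not None else set(),
        })
    return statements


def is_compatible(statements, forbidden_purposes: set, forbidden_retention: set) -> bool:
    """True if no statement uses a purpose or retention policy the user rejects."""
    return not any(
        s["purposes"] & forbidden_purposes or s["retention"] & forbidden_retention
        for s in statements
    )
```

A user who rejects telemarketing and indefinite retention would accept this sample policy; a user who rejects the `admin` purpose would not.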


P3P and Standardized Policy Formats

Structured policy formats (like P3P) can be summarized and displayed to users in standardized, easy to read formats

[Figure: “Privacy Finder” P3P search engine result]
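As a rough illustration of turning structured statements into a standardized display, the sketch below renders a plain-text grid of data items against purposes. It assumes statements shaped as dictionaries with `data` and `purposes` keys; the layout is a hypothetical simplification and does not reproduce Privacy Finder's actual format.

```python
def summary_grid(statements: list[dict], purposes: list[str]) -> str:
    """Render a data-by-purpose grid: 'X' where a data item is used for a purpose."""
    uses: dict[str, set] = {}
    for s in statements:
        for item in s["data"]:
            uses.setdefault(item, set()).update(s["purposes"])
    # Pad the first column so every row lines up under the header.
    width = max([len("data")] + [len(item) for item in uses])
    lines = ["data".ljust(width) + " | " + " | ".join(purposes)]
    for item in sorted(uses):
        cells = [("X" if p in uses[item] else "-").center(len(p)) for p in purposes]
        lines.append(item.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


print(summary_grid(
    [{"data": ["#user.name"], "purposes": {"contact"}},
     {"data": ["#dynamic.clickstream"], "purposes": {"admin", "develop"}}],
    ["admin", "contact", "develop"],
))
```

Because every policy is rendered into the same rows and columns, two sites can be compared at a glance, which is the point of a standardized format.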


Slow Adoption for P3P

A study by Cranor et al. found that the most popular websites are more likely to offer P3P, but overall deployment is very low

Source: Cranor et al., Electronic Commerce Research and Applications 2008

2006: Only 10.25% offer P3P

2008: Only 13.59% offer P3P
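Working through the two figures above makes the slow pace concrete: deployment grew by about a third in relative terms, yet the large majority of the sites studied still offered no P3P policy in 2008.

```latex
\Delta_{\text{abs}} = 13.59\% - 10.25\% = 3.34 \text{ percentage points}
\qquad
\Delta_{\text{rel}} = \frac{3.34}{10.25} \approx 0.326 \; (\approx 32.6\%)
\qquad
100\% - 13.59\% = 86.41\% \text{ without P3P in 2008}
```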


Our Goal: Make Interacting with Natural Language Policies Easier

P3P adoption is limited, but human-readable policies are prevalent

This is a stop-gap measure: until a structured policy format is widely adopted, we must interact with natural language policies

Our contribution: Design and implement the Policy Audit System, which:
- Aggregates natural language policies for a wide variety of websites
- Periodically checks these policy documents for updates
- Enables distribution of policies to interested users
- Notifies users about specific changes in policies

[Figure: progression from P3P, to natural language policy tracking, to … a new structured policy format?]


Policy Audit System: Architecture

Key components:
- Policy Monitor: Periodically fetches known policy documents for a large set of websites and checks the policies for changes
- Policy Library: The collection of policy documents for each site over time
- Policy Library Mirrors: Copies of the Policy Library hosted by third parties
- Clients: Offer a way for users to obtain current or past policy information


Policy Monitor

• Periodically fetches a set of policy document URLs
• Extracts relevant policy text using standard text parsing techniques
• Compares the latest version to the previously seen version to detect changes
• Records the latest version (if changed)
• Based on the EFF’s TOSBack service (http://www.tosback.org)
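One monitoring pass (fetch, extract, compare) can be sketched as below. This is a minimal sketch, assuming the fetch step is passed in as a function (for example, one wrapping `urllib.request.urlopen`), that visible text is extracted with the standard `html.parser`, and that changes are detected by comparing SHA-256 digests of whitespace-normalized text. TOSBack's actual extraction and comparison logic may differ; `check_policy` and `extract_policy_text` are hypothetical names.

```python
import hashlib
import re
from html.parser import HTMLParser
from typing import Callable, Optional, Tuple


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def extract_policy_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so cosmetic reflows are not reported as changes.
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def check_policy(url: str, fetch: Callable[[str], str],
                 last_digest: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (digest, text) if the policy changed since last_digest, else None."""
    text = extract_policy_text(fetch(url))
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if digest == last_digest:
        return None  # unchanged: nothing to record
    return digest, text
```

Injecting `fetch` keeps the comparison logic testable offline; the caller stores the returned digest and text only when a change is reported.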


Policy Library

• The Policy Monitor produces a library of policy documents as they change over time

• The Policy Library is a directory structure available via the web, containing:
  – A list of tracked websites
  – Policy text snapshots (previous versions)
  – Various metadata to help find the latest document version

• The master library is hosted by the University of Colorado
• Currently tracking 76 distinct policies (more coming soon)
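A directory-based library like the one described can be sketched as follows. The layout is a hypothetical assumption, not the project's actual structure: one directory per tracked site, one UTC-timestamped `.txt` file per snapshot, and a small `latest` file pointing at the newest snapshot as metadata.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def save_snapshot(library: Path, site: str, text: str, when: datetime) -> Path:
    """Store a new policy version as <library>/<site>/<UTC timestamp>.txt."""
    site_dir = library / site
    site_dir.mkdir(parents=True, exist_ok=True)
    # Timestamped names sort chronologically as plain strings.
    name = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ") + ".txt"
    snapshot = site_dir / name
    snapshot.write_text(text, encoding="utf-8")
    # Metadata: pointer to the newest snapshot, so clients need not list the directory.
    (site_dir / "latest").write_text(name, encoding="utf-8")
    return snapshot


def tracked_sites(library: Path) -> list[str]:
    return sorted(p.name for p in library.iterdir() if p.is_dir())


def history(library: Path, site: str) -> list[str]:
    return sorted(p.name for p in (library / site).glob("*.txt"))


def latest_snapshot(library: Path, site: str) -> Optional[str]:
    pointer = library / site / "latest"
    if not pointer.exists():
        return None  # site not tracked
    name = pointer.read_text(encoding="utf-8")
    return (library / site / name).read_text(encoding="utf-8")
```

Plain files and directories can be served by any web server and copied with ordinary tools, which suits a library that is meant to be mirrored.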


Policy Library Mirrors

• Policy Library copies that are distributed among trusted parties
• The Electronic Frontier Foundation (EFF), the Center for Democracy and Technology (CDT), and the University of Colorado host Policy Library mirrors
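The slides do not say how mirrors are kept consistent with the master library. One plausible check, sketched here as an assumption, is to build a manifest of SHA-256 digests for every file and compare manifests; `manifest` and `compare` are hypothetical names.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(master: dict[str, str], mirror: dict[str, str]) -> dict[str, list[str]]:
    """Report snapshots missing from, altered on, or extra on a mirror."""
    return {
        "missing": sorted(set(master) - set(mirror)),
        "altered": sorted(k for k in master.keys() & mirror.keys() if master[k] != mirror[k]),
        "extra": sorted(set(mirror) - set(master)),
    }
```

Comparing digests rather than full files lets a client check a mirror against a small published manifest, and an "altered" entry flags a snapshot that differs from the master copy.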


Clients

• Generically, a client offers an interface to the Policy Library, providing access to policy data

• A client could offer the ability to search the library or to automate change notification via Twitter, Atom, RSS, or e-mail

• We developed a client as a Firefox plug-in that displays policy information (and notification of changes) for the site the user is currently visiting
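As an example of automated change notification, the sketch below builds an Atom feed with one entry per detected policy change. The feed id, title, author name, and the `(site, snapshot_url, changed_at)` input shape are hypothetical; only the Atom namespace and element names (`feed`, `id`, `title`, `updated`, `author`, `entry`, `link`) come from the Atom format itself.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def _el(parent, tag, text=None, **attrs):
    """Append an Atom-namespaced child element with optional text and attributes."""
    el = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
    if text is not None:
        el.text = text
    return el


def change_feed(feed_id: str, changes: list) -> str:
    """Build an Atom feed; changes holds (site, snapshot_url, changed_at) tuples."""
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    _el(feed, "id", feed_id)
    _el(feed, "title", "Tracked policy changes")
    newest = max((when for _, _, when in changes), default=datetime.now(timezone.utc))
    _el(feed, "updated", newest.isoformat())
    _el(_el(feed, "author"), "name", "Policy Monitor")
    for site, url, when in changes:
        entry = _el(feed, "entry")
        _el(entry, "id", url)
        _el(entry, "title", f"Policy updated: {site}")
        _el(entry, "updated", when.isoformat())
        _el(entry, "link", href=url)  # points readers at the new snapshot
    return ET.tostring(feed, encoding="unicode")
```

Any feed reader can then subscribe to policy changes without a dedicated client, which is one reason a syndication format is a natural fit here.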


Example Client: Firefox Browser Plug-in*

• Accesses the Policy Library and alerts the user when they visit a website that publishes a policy that the Policy Monitor is tracking

Alert icons:
• Visiting a site that’s not tracked
• Visiting a tracked site, but no change in policy since last visit
• Visiting a tracked site with an updated policy since last visit
• Visiting a tracked site with an unread policy

* sponsored by
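The four alert icons amount to a small decision function. The sketch below is an assumption about how a client might choose among them: it takes the latest tracked version (None if the site is not tracked), the version the user saw at their last visit (None on a first visit), and whether the user has read the policy. The names and the precedence (an update outranks "unread") are hypothetical choices, not the plug-in's actual logic.

```python
from enum import Enum
from typing import Optional


class AlertIcon(Enum):
    NOT_TRACKED = "site not tracked"
    UNCHANGED = "tracked, no change in policy since last visit"
    UPDATED = "tracked, policy updated since last visit"
    UNREAD = "tracked, policy not yet read"


def choose_icon(latest_version: Optional[str],
                version_at_last_visit: Optional[str],
                has_read_policy: bool) -> AlertIcon:
    if latest_version is None:
        return AlertIcon.NOT_TRACKED
    # Assumed precedence: a change since the last visit is the most urgent alert.
    if version_at_last_visit is not None and version_at_last_visit != latest_version:
        return AlertIcon.UPDATED
    if not has_read_policy:
        return AlertIcon.UNREAD
    return AlertIcon.UNCHANGED
```

For example, `choose_icon("v2", "v1", True)` yields `AlertIcon.UPDATED`, while a first visit to a tracked site yields `AlertIcon.UNREAD`.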


Plug-in: Visiting a Tracked Site

Menu lists tracked policies


Plug-in: Visiting a Tracked Site with Policy Changes


Plug-in: Discovering Third Party Information Disclosure

Current policies for a visited page (www.apple.com/itunes)

Notify the user of third-party page elements
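Detecting third-party page elements can be sketched by collecting the URLs a page loads and keeping hosts outside the visited site. This is a minimal sketch under stated assumptions: only `script`, `img`, `iframe`, and `embed` sources are considered, and "same site" is approximated by stripping a leading `www.` and matching subdomains. A real implementation would more likely compare registrable domains using the Public Suffix List. `third_party_hosts` is a hypothetical name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tags whose attribute makes the browser fetch a resource (illustrative subset).
SRC_TAGS = {"script": "src", "img": "src", "iframe": "src", "embed": "src"}


class _ResourceCollector(HTMLParser):
    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr = SRC_TAGS.get(tag)
        value = dict(attrs).get(attr) if attr else None
        if value:
            # Resolve relative and scheme-relative URLs against the page URL.
            self.urls.append(urljoin(self.base_url, value))


def _is_first_party(host: str, page_host: str) -> bool:
    # Crude approximation: "www.apple.com" -> "apple.com"; subdomains count as first party.
    site = page_host[4:] if page_host.startswith("www.") else page_host
    return host == site or host.endswith("." + site)


def third_party_hosts(page_url: str, html: str) -> list[str]:
    parser = _ResourceCollector(page_url)
    parser.feed(html)
    parser.close()
    page_host = urlparse(page_url).hostname
    hosts = {urlparse(u).hostname for u in parser.urls}
    return sorted(h for h in hosts if h and not _is_first_party(h, page_host))
```

Each reported host is a party that receives the user's request (and its implicit information) even though the user only chose to visit the first-party site.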


Summary and Conclusion

• Given the absence of a widely adopted structured policy format, we argue that steps should be taken to make natural language policies easier for users to understand

• To this end, we present the Policy Audit System to track natural language policy documents and notify users of policy updates

• Our hope is that this work helps individuals make sense of natural language policies while we wait for a structured policy data format to be widely adopted

For more information:
Project overview: http://www.policymonitor.org/about
Development community: http://www.policymonitor.org/sourcecode
Firefox plug-in download: http://www.policymonitor.org/auditplugin

Thank you