Investigating Privacy Complaints

Overview
As technology has advanced, the ways in which privacy is protected and violated have changed with it. On the Internet, the improved ability to share information creates more ways for privacy to be breached. The Internet has also raised new privacy concerns in an age when computers can store many kinds of records permanently.

Goal of the Project
The overall goal of our project was to create a command-line tool, written in Python, that queries Yahoo! Answers and retrieves privacy complaint data for further analysis.
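A minimal sketch of how such a command-line interface might be declared with Python's standard `argparse` module. The poster does not document the tool's actual options, so the argument names (`keywords`, `--db`, `--max-results`) and their defaults are hypothetical, and the search step is left as a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line interface (option names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Query Yahoo! Answers for privacy-related keywords "
        "and store the results for later analysis."
    )
    # One or more keywords; multi-word phrases are quoted on the command line.
    parser.add_argument("keywords", nargs="+",
                        help="privacy-related keywords or phrases to search for")
    # Hypothetical output location for the collected results.
    parser.add_argument("--db", default="privacy_complaints.db",
                        help="database file to store results in")
    # Hypothetical cap on the number of results kept per keyword.
    parser.add_argument("--max-results", type=int, default=50,
                        help="maximum results to keep per keyword")
    return parser


def main(argv=None) -> list[str]:
    """Parse arguments; the real tool would search and store here."""
    args = build_parser().parse_args(argv)
    # Placeholder: describe each planned query instead of running it.
    return [f"search {kw!r} (max {args.max_results}) -> {args.db}"
            for kw in args.keywords]
```

A script would call `main()` under an `if __name__ == "__main__":` guard, so that running, for example, `python tool.py privacy "identity theft" --max-results 10` parses two keywords and a limit of 10 (the script name is also hypothetical).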

Preliminary Analysis
Our four-member team started by searching the Yahoo! Answers search engine for privacy complaints posted by Yahoo! users, and by identifying the terms or phrases that returned relevant data. The idea was to find questions in which individuals expressed privacy concerns.

Method
The initial step was to connect to Yahoo! Answers, query it for specific keywords, and store the results in a MySQL database. The flowchart summarizes how the script is executed.
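The query-and-store loop can be sketched as follows. This is a hypothetical sketch, not the project's code: the poster does not say which Yahoo! interface the script called (and Yahoo! Answers has since been shut down), so the search step is passed in as a function. Python's built-in `sqlite3` module stands in for MySQL so the example runs without a server, and the table and column names are illustrative.

```python
import sqlite3
from typing import Callable, Iterable


def collect(keywords: Iterable[str],
            search: Callable[[str], list[dict]],
            conn: sqlite3.Connection) -> int:
    """Run `search` for each keyword and store the hits; return rows added.

    `search` is a hypothetical stand-in for the Yahoo! Answers query and must
    return dicts with "id", "subject" and "content" keys (assumed names).
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS results (
               question_id TEXT,
               keyword     TEXT,
               subject     TEXT,
               content     TEXT,
               PRIMARY KEY (question_id, keyword))"""
    )
    before = conn.total_changes
    for keyword in keywords:
        for item in search(keyword):
            # The composite primary key lets INSERT OR IGNORE skip a question
            # that was already stored for this keyword.
            conn.execute(
                "INSERT OR IGNORE INTO results VALUES (?, ?, ?, ?)",
                (item["id"], keyword, item["subject"], item["content"]),
            )
    conn.commit()
    # total_changes counts only rows actually inserted, not ignored ones.
    return conn.total_changes - before
```

With a MySQL driver in place of `sqlite3`, the statement would use MySQL's `INSERT IGNORE` and the driver's own placeholder style (commonly `%s`).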

Many Eyes Visualization Analysis
Many Eyes is a software tool for visualizing patterns in text. It is used to explore information visually and to analyze a data set as a whole. The Word Cloud Generator, a text analysis tool, shows how frequently words appear in a given text and how the words in that text relate to one another. We used the Word Cloud Generator to further analyze the textual data stored in the database. The figure below shows the resulting analysis.
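To illustrate what a word cloud encodes, the word counts behind it can be computed with Python's standard library. This is a sketch of the general technique, not of how Many Eyes worked internally; the tokenizing rule and the small stop-word list are assumptions.

```python
import re
from collections import Counter

# Illustrative stop words; not the list any particular word-cloud tool uses.
STOP_WORDS = {"a", "an", "and", "the", "is", "i", "my", "to", "of",
              "on", "in", "how", "do", "can"}


def word_frequencies(texts, top_n=10):
    """Return the top_n (word, count) pairs across all texts."""
    counts = Counter()
    for text in texts:
        # Lower-case, then treat runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)
```

A word cloud would then draw each word at a size scaled by its count, so the most frequent terms stand out visually.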

Conclusion
This research demonstrates that Yahoo! Answers provides a useful dataset for further analysis with tools such as the Natural Language Toolkit and Many Eyes, because it reflects people's real-world concerns and questions about privacy.

Future Work
Future work could include expanding the list of privacy-related terms and phrases used to generate results, collecting more data, and refining the code to obtain more relevant results from specified categories. Other goals are to apply natural language processing to the linguistic analysis of the text and to create a taxonomy of privacy terms.

Acknowledgements
I would like to thank my graduate mentors, Jennifer King and Nick Doty, and Professor Deirdre Mulligan for giving me the opportunity to work on this project. I would also like to thank Christopher Castillo, Jennifer Felder, German Gomez, and Rafael Negron. Special thanks to Dr. Kristen Gates, NSF, and TRUST for the opportunity to conduct this research.

Anand Sonkar¹, Jennifer King², Nick Doty², Prof. Deirdre Mulligan²

¹Arizona State University, ²University of California, Berkeley

Results
The results collected from Yahoo! Answers were stored in an organized database, in a table created with eight fields (columns).
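Assuming the table's eight entries are its fields (columns), as the caption "Table of fields created in the database" suggests, each collected result occupies one row, so thousands of results fit in an eight-column table. The poster does not list the fields, so every column name below is hypothetical, and `sqlite3` again stands in for MySQL.

```python
import sqlite3

# Hypothetical eight-column schema; the poster does not name the fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS answers_results (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword       TEXT NOT NULL,   -- search term that returned this result
    question_id   TEXT NOT NULL,
    subject       TEXT,
    content       TEXT,
    category      TEXT,
    answer_count  INTEGER,
    retrieved_at  TEXT             -- ISO-8601 timestamp of the query
)
"""


def create_table(conn: sqlite3.Connection) -> list[str]:
    """Create the results table and return its column names in order."""
    conn.execute(SCHEMA)
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) rows.
    return [row[1] for row in conn.execute("PRAGMA table_info(answers_results)")]
```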

As of July 2010, our team had generated over 7,000 results, including the privacy-related keywords, using the tool described above. In our preliminary research, our team derived a list of about 12 privacy-related words and phrases that were effective at collecting data.
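One hypothetical way to check which of the roughly 12 words and phrases were efficient at collecting data is to count the stored results per keyword. The sketch assumes a `results` table with a `keyword` column recording which search term returned each row; the poster does not describe its queries at this level.

```python
import sqlite3


def results_per_keyword(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count stored results per keyword, most productive keyword first.

    Assumes a hypothetical `results` table with a `keyword` column.
    """
    query = """
        SELECT keyword, COUNT(*) AS n
        FROM results
        GROUP BY keyword
        ORDER BY n DESC, keyword
    """
    return conn.execute(query).fetchall()
```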

Yahoo! Answers search engine.

Database populated with the results.

Flowchart for creating the Python command-line tool.

Analyzing the text using Word Cloud Generator.

Table of fields created in the database.

This work was supported by the TRUST Center (NSF award number CCF-0424422)