Prof. Jason Hong, Carnegie Mellon University
Rapid End-User Programming and Visualization for the Web
IDA Session 5
2007 CS Study Panel
24 April 2008
Research Areas
• End-User Programming
– Extracting and visualizing data from web
• Usable Privacy and Security
– Anti-phishing (training, detection)
– Managing privacy and security policies
• Mobile Computing
– Location-based services
– Context-aware computing
Jason Hong
Assistant Professor
Human-Computer Interaction Institute
Carnegie Mellon University
PhD: University of California, Berkeley
Potential Military Applications
• Tools for rapidly integrating data and web services
• Better visualizations of large data sets
• Effective training for security
• Automated algorithms for detecting phishing scams
• Better interfaces for managing security
Principal Investigator
Contact Information
School of Computer Science
Carnegie Mellon University
2504D Newell-Simon Hall
5000 Forbes Ave
Tel: (412) 268 1251
Fax: (412) 268 1266
E-mail: [email protected]
Web: http://www.cs.cmu.edu/~jasonh
30,000-Foot View
• High-level problems observed:
– Stovepipes - Data and services spread over multiple systems
– Agility - Integration takes months or years
– Overload - Too much information to easily process
• Goal: Make it easy for people to visualize and process data gathered from a variety of sources
– Information extraction + visualization + machine learning
– No PhD required
• Analogies:
– Spreadsheets
– Visual Basic
Mashups as Key Focus Area
• More specifically, provide an end-user programming tool that makes it easy to create mashups
– Mashups are applications that combine content and services from multiple web sites
– Ex. Craigslist.com + Google Maps = Housingmaps.com
Other Example Mashups
• Other example mashups
– Ex. MySpace child predators
– Ex. Locations of friends on MySpace or Facebook
• Common themes
– Aggregating multiple sources (web pages, databases, etc.)
– Handling multiple data formats (not designed to be shared)
– Processing the data (filtering, summarizing, etc)
– Supporting multiple forms of output (graphs, maps, lists)
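The four themes above can be sketched as one small pipeline. This is only an illustration under stated assumptions: the two inline data sources, their field names (`address`, `rent`, `addr`, `price_usd`), and the helper functions are hypothetical and do not come from any real mashup.

```python
import csv
import io
import json

# Hypothetical sample data standing in for two sources with different formats.
CSV_SOURCE = "address,rent\n12 Oak St,900\n48 Pine Ave,1450\n"
JSON_SOURCE = '[{"addr": "7 Elm Rd", "price_usd": 1100}]'


def from_csv(text):
    """Read the CSV source and map it onto a common record shape."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"address": row["address"], "rent": int(row["rent"])}


def from_json(text):
    """Read the JSON source and map its differently named fields."""
    for item in json.loads(text):
        yield {"address": item["addr"], "rent": int(item["price_usd"])}


def aggregate(*sources):
    """Aggregate: merge records from every source into one list."""
    records = []
    for source in sources:
        records.extend(source)
    return records


def affordable(records, max_rent):
    """Process: keep only records at or below the rent limit."""
    return [r for r in records if r["rent"] <= max_rent]


def as_list(records):
    """Output: render records as a sorted text list (one possible view)."""
    ordered = sorted(records, key=lambda r: r["rent"])
    return [f'{r["address"]}: ${r["rent"]}' for r in ordered]


listings = aggregate(from_csv(CSV_SOURCE), from_json(JSON_SOURCE))
report = as_list(affordable(listings, max_rent=1200))
```

A map or graph view would replace only `as_list`; the aggregation and processing steps stay the same.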
Creating Mashups is Difficult
• Requires lots of skill to create a mashup
– Ex. Housingmaps creator has PhD in computer science
– Ex. MySpace predator list took months of custom coding
• Requires programming expertise in many areas
– Web crawling
– Text parsing and pattern matching
– Web services (WSDL and REST)
– Databases
– HTML
• Can we accelerate this process to a matter of days or hours for non-experts?
End-User Programming
• Haggis, an end-user programming tool
1. Rapidly extract and combine data from multiple sources
2. Quickly create high-quality interfaces and visualizations
3. Use programming-by-example techniques to specify what is normal and what is anomalous
1. Extract data from multiple sources
• Improved wizards for extracting data from web pages
– Can specify example of desired links, system generalizes
– Better support for other patterns on web: tables, street addresses, etc.
• Support for real-time data
– Weather, traffic, stocks, any web page periodically updated
– Sensor Andrew, sensor network being deployed at CMU: electrical usage, water usage, etc.
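One way to picture "specify example of desired links, system generalizes" is the sketch below. It is not how Haggis works; it assumes a deliberately simple rule (example URLs that differ only in their digit runs are generalized to a regular expression), and the page content and function names are hypothetical.

```python
import re
from html.parser import HTMLParser


def generalize(examples):
    """Build a regex from example URLs that differ only in their digit runs."""
    # Splitting on digit runs leaves the literal text that all examples must share.
    shapes = {tuple(re.split(r"\d+", url)) for url in examples}
    if len(shapes) != 1:
        raise ValueError("examples do not share a single shape")
    literal_parts = shapes.pop()
    return re.compile(r"\d+".join(re.escape(part) for part in literal_parts))


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_like(examples, html):
    """Return every link on the page that matches the generalized pattern."""
    pattern = generalize(examples)
    collector = LinkCollector()
    collector.feed(html)
    return [href for href in collector.links if pattern.fullmatch(href)]


# Hypothetical page: the user points at two item links, the sketch finds the third.
PAGE = """
<a href="/item/101.html">Lamp</a>
<a href="/about.html">About</a>
<a href="/item/2045.html">Desk</a>
<a href="/item/77.html">Chair</a>
"""
matches = extract_like(["/item/101.html", "/item/2045.html"], PAGE)
```

A real wizard would likely also use page structure (position in the DOM, surrounding text), which this sketch ignores.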
2. Interfaces and Visualizations
• Wizards for supporting common UI patterns
– Table views, maps, graph views, alerts, etc.
• Programming-by-example techniques
• Output as a web page or desktop widget
– Yahoo Widgets, Google Desktop, Windows Sidebar
3. Normal versus Anomalous
• Problem: Too much data, so much of it gets dropped on the floor
• Solution: “Teach” the system what patterns to look for
– Analyst-in-the-loop: infoviz + machine learning
– Long-term goal
• Example:
– eBay “penny sellers”: could create custom software, but that is slow
– Analyst uses visualization to find some examples of penny sellers and gives hints to system as to why
– System finds more suspects, analyst gives relevance feedback
– As new data streams in, system can flag suspects
• Can help address high turnover rate at intelligence agencies, loss of organizational memory
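The analyst-in-the-loop cycle above can be sketched with a very small learner. This is an assumption-laden illustration, not the planned system: it swaps in a nearest-centroid rule for whatever machine learning would actually be used, and the two features (average item price, sales per day) and all sample values are hypothetical.

```python
import math


def centroid(points):
    """Return the component-wise mean of a list of equal-length tuples."""
    count = len(points)
    return tuple(sum(p[i] for p in points) / count for i in range(len(points[0])))


class SuspectFinder:
    """Nearest-centroid sketch that learns from analyst-labelled examples."""

    def __init__(self):
        self.suspect = []  # examples the analyst marked as suspicious
        self.normal = []   # examples the analyst marked as normal

    def feedback(self, features, is_suspect):
        """Record one analyst judgement (the relevance feedback step)."""
        (self.suspect if is_suspect else self.normal).append(features)

    def flag(self, features):
        """Flag a seller that sits closer to the suspect centroid than the normal one."""
        if not self.suspect or not self.normal:
            return False  # not enough examples to decide yet
        to_suspect = math.dist(features, centroid(self.suspect))
        to_normal = math.dist(features, centroid(self.normal))
        return to_suspect < to_normal


finder = SuspectFinder()
# Hypothetical features: (average item price in USD, sales per day).
for seller in [(0.01, 200), (0.02, 150)]:
    finder.feedback(seller, is_suspect=True)
for seller in [(25.0, 3), (12.0, 8)]:
    finder.feedback(seller, is_suspect=False)

borderline = (5.0, 100)
before_feedback = finder.flag(borderline)      # system proposes this seller
finder.feedback(borderline, is_suspect=False)  # analyst rejects the proposal
after_feedback = finder.flag(borderline)       # system adjusts
```

Each rejected or confirmed suspect moves the centroids, so the labelled examples act as a stored record of what analysts considered anomalous.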
Current Progress
• First round of interviews completed
– Sensor Andrew team (Civil and Electrical Engineers)
– Mashup Camp
– Programmers around CMU
• Initial prototype of “plumbing” in progress
– An Integrated Development Environment (IDE) for programmers, to facilitate extraction and visualization of data
– Low-level support for extracting data from tables, basic visualizations, etc
– Higher-level tools later to be built on top
• First round of user tests planned for August
Past Work with Marmite
• Wizard for extracting data from arbitrary web pages
• Combine operators together in a dataflow (Unix)
• View the data in multiple ways (table, map)
How Marmite Works
• A wizard UI helps people get the data they want from web pages
• Operators let you know what operations can be done
– Input, processing, output
• Operators are chained together in a dataflow (Unix)
• Current data is shown
• And multiple views too (table, map)
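The Unix-style dataflow of input, processing, and output operators can be sketched with Python generators. This is an analogy only, not Marmite's implementation; the operator names (`source`, `keep`, `add_field`, `as_table`) and the sample rows are hypothetical.

```python
def source(rows):
    """Input operator: emit rows one at a time."""
    yield from rows


def keep(predicate):
    """Processing operator: pass through only rows that satisfy the predicate."""
    def operator(stream):
        return (row for row in stream if predicate(row))
    return operator


def add_field(name, compute):
    """Processing operator: add a derived field to each row."""
    def operator(stream):
        for row in stream:
            yield {**row, name: compute(row)}
    return operator


def pipeline(stream, *operators):
    """Chain operators so each one consumes the previous one's output."""
    for operator in operators:
        stream = operator(stream)
    return stream


def as_table(stream):
    """Output operator: render the stream as rows of a table view."""
    return [(row["name"], row["city"], row["label"]) for row in stream]


# Hypothetical event data.
rows = [
    {"name": "Jazz night", "city": "Pittsburgh"},
    {"name": "Book fair", "city": "Boston"},
    {"name": "Robot expo", "city": "Pittsburgh"},
]
table = as_table(
    pipeline(
        source(rows),
        keep(lambda row: row["city"] == "Pittsburgh"),
        add_field("label", lambda row: row["name"].upper()),
    )
)
```

Because every operator takes a stream and returns a stream, a tool can inspect the data after any step, which is one way to show the current data alongside the dataflow.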
Some High-Level Design Issues
• Centralized model
– Clean data model: well-managed, well-formatted, common representations, well-known databases, etc.
• Decentralized model
– “Anarchic”, multiple data formats in multiple places
– Hard to get lots of people to agree on data format and representation
– More likely scenario (look at how databases are used today)
– Haggis is being designed for this model, assuming that a person may have to clean up the data and resolve formats
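As a small illustration of resolving formats in the decentralized model, the sketch below normalizes dates that arrive in several layouts and sets aside the ones a person would have to fix by hand. The list of formats and the sample values are assumptions chosen for the example, not part of Haggis.

```python
from datetime import datetime

# Hypothetical formats that different sources might use for the same field.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")


def normalize_date(text):
    """Try each known format in turn; return a date, or None if none match."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


raw = ["2008-04-24", "04/24/2008", "24 Apr 2008", "last Thursday"]
cleaned = [normalize_date(value) for value in raw]
# Values the code could not resolve are left for a person to clean up.
unresolved = [value for value, parsed in zip(raw, cleaned) if parsed is None]
```

Note that `%b` depends on the locale, so month abbreviations such as "Apr" parse as shown only under an English or default C locale.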
Other High-Level Design Issues
• Discovery
– What data sources are available?
– May need some kind of centralized store that describes these (sort of like DNS for Internet)
• Security
– Access control, who can access what data sources?
– This is a general problem with sensor data
• Privacy
– What kinds of queries / apps should people be able to do?
– Unclear how to restrict those in practice