CX4242: Data & Visual Analytics Mahdi Roozbahani Lecturer, Computational Science and Engineering, Georgia Tech

CX4242: Data & Visual Analytics - Visualization… · Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. WWW 2007. Find bad sellers (fraudsters) on eBay who




Assignments Overview (tentative and subject to change)



Assignment 1

Platforms, Languages & Technologies: Python, Gephi, SQLite, D3, OpenRefine

Questions:

Q1: Collecting and visualizing data (Python & Gephi)

Q2: Analyzing data using SQLite

Q3: D3 Warmup

Q4: Analyzing data through OpenRefine


Assignment 2

Platforms, Languages & Technologies: D3, Tableau

Questions:

Q1: Designing a good table and visualizing data with Tableau

Q2: Force-directed graph using D3

Q3: Scatter plots using D3

Q4: Heatmap using D3

Q5: Interactive visualization using D3

Q6: Choropleth map using D3

Q7: Pros and cons of various visualization tools


Assignment 3

Platforms, Languages & Technologies: Java, Hadoop, Spark, Pig, Azure

Questions:

Q1: Analyzing a graph with Hadoop/Java

Q2: Analyzing a graph with Spark/Scala on Databricks

Q3: Analyzing data with Pig on AWS

Q4: Analyzing a graph using Hadoop on Microsoft Azure

Q5: Regression using Azure ML Studio


Assignment 4

Platforms, Languages & Technologies: PyPy, PageRank, Random Forest, scikit-learn

Questions:

Q1: Scalable single-machine PageRank

Q2: Implementing a random forest classifier

Q3: Using scikit-learn to run various classifiers


Collection

Cleaning

Integration

Visualization

Analysis

Presentation

Dissemination


Building blocks, not rigid “steps”.

Can skip some

Can go back (two-way street)

• Data types inform visualization design

• Data size informs choice of algorithms

• Visualization motivates more data cleaning

• Visualization challenges algorithm assumptions (e.g., the user finds that results don’t make sense)



How does “big data” affect the process? (Hint: almost everything is harder!)

The Vs of big data (3 Vs originally, then 7, now 42)

Volume: “billions”, “petabytes” are common

Velocity: think Twitter, fraud detection, etc.

Variety: text (webpages), video (YouTube), …

Veracity: uncertainty of data

Variability

Visualization

Value


http://www.ibmbigdatahub.com/infographic/four-vs-big-data

http://dataconomy.com/seven-vs-big-data/

https://tdwi.org/articles/2017/02/08/10-vs-of-big-data.aspx


Three Example Projects from Polo's and Mahdi's Research Groups


Apolo Graph Exploration:

Machine Learning + Visualization


Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning.

Duen Horng (Polo) Chau, Aniket Kittur, Jason I. Hong, Christos Faloutsos. CHI 2011.


Figure: large-graph layouts nicknamed “Beautiful Hairball”, “Death Star”, and “Spaghetti”.


Finding More Relevant Nodes

Apolo uses guilt-by-association (Belief Propagation)

Figure: citation network with labeled nodes “HCI paper” and “Data mining paper”


Demo: Mapping the Sensemaking Literature

Nodes: 80k papers from Google Scholar (node size: number of citations)

Edges: 150k citations


Key Ideas (Recap)

Specify exemplars

Find other relevant nodes (BP)
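The two key ideas above (specify exemplars, then spread relevance with BP) can be illustrated with a small guilt-by-association sketch. This is a minimal two-state loopy belief propagation in Python, not Apolo's implementation: the function name `belief_propagation`, the homophily value 0.6, the exemplar prior of 0.9, and the toy citation graph are all hypothetical choices made for illustration. The exemplar paper gets a prior skewed toward “relevant”, and message passing carries that evidence to papers linked to it.

```python
import numpy as np

def belief_propagation(edges, n_nodes, priors, homophily=0.6, n_iters=20):
    """Loopy BP with two states per node: 0 = relevant, 1 = not relevant (hypothetical model)."""
    priors = np.asarray(priors, dtype=float)
    # Edge potential encoding guilt-by-association: linked nodes tend to share a state.
    psi = np.array([[homophily, 1 - homophily],
                    [1 - homophily, homophily]])
    neighbors = {i: [] for i in range(n_nodes)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    # messages[(i, j)] is the message node i sends to neighbor j; start uniform.
    messages = {(i, j): np.full(2, 0.5) for i in range(n_nodes) for j in neighbors[i]}
    for _ in range(n_iters):
        new_messages = {}
        for i, j in messages:
            # i's prior times every incoming message except the one from j.
            incoming = priors[i].copy()
            for k in neighbors[i]:
                if k != j:
                    incoming *= messages[(k, i)]
            msg = psi.T @ incoming          # sum over the states of node i
            new_messages[(i, j)] = msg / msg.sum()
        messages = new_messages             # synchronous update
    # Belief = prior times all incoming messages, normalized.
    beliefs = priors.copy()
    for i in range(n_nodes):
        for k in neighbors[i]:
            beliefs[i] *= messages[(k, i)]
        beliefs[i] /= beliefs[i].sum()
    return beliefs

# Hypothetical citation graph: papers 0-1-2-3 form a chain; papers 4-5 are a separate pair.
edges = [(0, 1), (1, 2), (2, 3), (4, 5)]
priors = np.full((6, 2), 0.5)
priors[0] = [0.9, 0.1]                      # paper 0 is the user's exemplar
beliefs = belief_propagation(edges, 6, priors)
print(beliefs[:, 0].round(3))               # relevance score per paper
```

Relevance scores fall off with distance from the exemplar along the chain, while the disconnected pair (papers 4 and 5) stays at the neutral 0.5 because no evidence reaches it.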



What did Apolo go through?

Collection

Cleaning

Integration

Visualization

Analysis

Presentation

Dissemination

Scrape Google Scholar. No API. 😩

Design inference algorithm (Which nodes to show next?)

Paper, talks, lectures

Interactive visualization you just saw

You will use a new Apolo prototype (called Argo)


Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning. Duen Horng (Polo) Chau, Aniket Kittur, Jason I. Hong, Christos Faloutsos. ACM Conference on Human Factors in Computing Systems (CHI) 2011. May 7-12, 2011.


NetProbe: Fraud Detection in Online Auctions

NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. WWW 2007.


NetProbe: The Problem

Find bad sellers (fraudsters) on eBay who don’t deliver their items.

Figure: Buyer pays $$$ to Seller.

Non-delivery fraud is a common type of auction fraud.

Source: https://www.fbi.gov/contact-us/field-offices/portland/news/press-releases/fbi-tech-tuesday---building-a-digital-defense-against-auction-fraud



NetProbe: Key Ideas

Fraudsters fabricate their reputation by “trading” with their accomplices

Fake transactions form near-bipartite cores

How to detect them?


NetProbe: Key Ideas

Use Belief Propagation

Figure legend: F = Fraudster, A = Accomplice, H = Honest (darker means more likely)
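The same message-passing idea extends to NetProbe's three user states. The sketch below is hypothetical: the compatibility values in `PSI`, the function `propagate`, and the seeded fraud prior on user 0 are assumptions chosen to encode the intuition on the slides (fraudsters trade with accomplices, forming bipartite cores), not the parameters or initialization used in the WWW 2007 paper.

```python
import numpy as np

STATES = ["fraud", "accomplice", "honest"]

# Hypothetical, symmetric edge-compatibility matrix PSI[s, t]: how plausible it is
# that a transaction links a user in state s with a user in state t.
PSI = np.array([
    [0.05, 0.90, 0.05],   # fraudsters trade mostly with accomplices
    [0.90, 0.05, 0.45],   # accomplices trade with fraudsters and honest users
    [0.05, 0.45, 0.50],   # honest users rarely trade with fraudsters
])

def propagate(edges, n_nodes, priors, psi=PSI, n_iters=30):
    """Loopy BP over an undirected transaction graph with len(psi) states per user."""
    priors = np.asarray(priors, dtype=float)
    nbrs = [[] for _ in range(n_nodes)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    k = psi.shape[0]
    # msgs[(i, j)]: i's current opinion about j's state, initialized uniform.
    msgs = {(i, j): np.full(k, 1.0 / k) for i in range(n_nodes) for j in nbrs[i]}
    for _ in range(n_iters):
        new = {}
        for i, j in msgs:
            h = priors[i].copy()
            for n in nbrs[i]:
                if n != j:                  # exclude the message coming back from j
                    h *= msgs[(n, i)]
            m = psi.T @ h                   # marginalize over i's states
            new[(i, j)] = m / m.sum()
        msgs = new
    beliefs = priors.copy()
    for i in range(n_nodes):
        for n in nbrs[i]:
            beliefs[i] *= msgs[(n, i)]
        beliefs[i] /= beliefs[i].sum()
    return beliefs

# Hypothetical bipartite core: users 0 and 1 each trade with users 2 and 3.
# Only user 0 carries prior evidence of fraud (an assumption); the rest start uniform.
edges = [(0, 2), (0, 3), (1, 2), (1, 3)]
priors = np.full((4, 3), 1 / 3)
priors[0] = [0.8, 0.1, 0.1]
beliefs = propagate(edges, 4, priors)
labels = [STATES[b.argmax()] for b in beliefs]
print(labels)
```

On this toy core, BP labels user 1 as a second fraudster and users 2 and 3 as accomplices, because the assumed compatibility matrix makes fraudster-accomplice links far more plausible than links within either group.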


NetProbe: Main Results



“Belgian Police”



What did NetProbe go through?

Collection

Cleaning

Integration

Visualization

Analysis

Presentation

Dissemination

Scraping (built a “scraper”/“crawler”)

Design detection algorithm

Not released

Paper, talks, lectures


NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. International Conference on World Wide Web (WWW) 2007. May 8-12, 2007. Banff, Alberta, Canada. Pages 201-210.


Homework 1 (out next week; tasks subject to change)

• Simple end-to-end analysis

• Collect data using API

• Store in SQLite database

• Create graph from data

• Analyze using SQL queries (e.g., compute the graph’s degree distribution)

• Visualize graph using Gephi

• Describe your discoveries
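The SQL analysis step above (e.g., a degree distribution) might look like the following Python `sqlite3` sketch. The table name `edges`, its `source`/`target` columns, the helper functions `build_db` and `degree_distribution`, and the sample edges are hypothetical; the actual homework schema and tasks may differ. Each edge is treated as undirected, so both endpoints count toward a node's degree.

```python
import sqlite3

def build_db(edges):
    """Load (source, target) pairs into a hypothetical in-memory `edges` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (source INTEGER, target INTEGER)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    return conn

# Degree = number of edge endpoints a node appears in (graph treated as undirected).
DEGREE_DISTRIBUTION_SQL = """
WITH degrees AS (
    SELECT node, COUNT(*) AS degree
    FROM (SELECT source AS node FROM edges
          UNION ALL
          SELECT target AS node FROM edges) AS endpoints
    GROUP BY node
)
SELECT degree, COUNT(*) AS num_nodes
FROM degrees
GROUP BY degree
ORDER BY degree
"""

def degree_distribution(conn):
    """Return (degree, number of nodes with that degree) pairs, sorted by degree."""
    return conn.execute(DEGREE_DISTRIBUTION_SQL).fetchall()

conn = build_db([(1, 2), (1, 3), (1, 4), (2, 3)])
print(degree_distribution(conn))  # [(1, 1), (2, 2), (3, 1)]
```

`UNION ALL` (rather than `UNION`) keeps duplicate endpoints, which is what makes each appearance count toward the degree.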

Collection

Cleaning

Integration

Visualization

Analysis

Presentation

Dissemination