1
How: Evaluation of visualization techniques for EDA: F+C Explain challenges of displaying large data set (how to see a million+ data points? ie. our age old screen space challenge) Explain the guideline of "overview, filter and detail-on- demand" in design. Relate this to EDA. For example, how to generate an overview if the data is not previously known to be hierarchical (ie. what to display, what to leave out if total number of data points exceeds the total available pixels). How does one provide detail-on-demand, PNZ, O+D, F+C? Is that task/data dependent? Explain how much do we know about the effectiveness and applicability of these visualization techniques, and how we contribute to existing knowledge by evaluating these techniques Theory General F+C Evaluations: lit review Text • Graphical Evaluations: real-life tasks Tree (Adam/ Dmitry study [CHI 2006 ] ++??) Time-series data/xy data (LGE study ) Evaluations: human perceptual studies IT4 IT5 How: EDA Systems Apply study results to an application domain Tools for data analysis Line Graph Explorer (XY data) Session Viewer (Web log data) Information retrieval/management music (MusicLand , MusicLand ++ ??) email (Evita [ CPSC534a report ] ) Previous literature search on personal information management for the Memoplex project InfoVis'06: paper rejected ; CHI 2007 ? VSS , APGV06 Venue?: paper AVI’06 Poster: InfoVis’05 CHI 200 7 Visualizatio n for EDA What: Data Explain what large means using the Agilent high through-put instrument data, and Google keynote study clickstream data Explain that the data sets are not only large, but growing due to the relative ease in data collection Explain the data itself: multivariate (?) with metadata (both continuous and categorical). Data may have hierarchical structure. In our examples, they are both time-series data. The instrument data is continuous (subject to sampling errors), while the clickstream data is discrete (by click events) What: EDA Explain what analysts have been doing to analyze such large datasets. Essentially, they have been using some sort of statistics (descriptive, clustering, factor analysis, PCA; Machine learning). Explain the pros (scalable) and cons of such approach (some are hypothesis instead of data-driven, ie. confirmatory rather than exploratory. As a result, it can be difficult to discover data pattern. No cross-checking with data regarding results?) Explain what EDA is, and how people explore data visually, and why is that needed in the analysis. Why: Visualization Explain how visualization can help by working with established statistical methods (EDA -> generate hypothesis -> checked with statistics/further analyze data with statistical methods -> visually verify analysis results -> visually display results for presentation/report) Explain existing EDA visualization for large data set (TJ, space-filling...) Other related documents and sites: •Time line (Year 2) •GPPF •roadmap v.1

How: Evaluation of visualization techniques for EDA: F+C Explain challenges of displaying large data set (how to see a million+ data points? ie. our age

Embed Size (px)

Citation preview

Page 1: How: Evaluation of visualization techniques for EDA: F+C Explain challenges of displaying large data set (how to see a million+ data points? ie. our age

How: Evaluation of visualization techniques for EDA: F+C

• Explain challenges of displaying large data set (how to see a million+ data points? ie. our age old screen space challenge)

• Explain the guideline of "overview, filter and detail-on-demand" in design. Relate this to EDA. For example, how to generate an overview if the data is not previously known to be hierarchical (ie. what to display, what to leave out if total number of data points exceeds the total available pixels). How does one provide detail-on-demand, PNZ, O+D, F+C? Is that task/data dependent?

• Explain how much do we know about the effectiveness and applicability of these visualization techniques, and how we contribute to existing knowledge by evaluating these techniques

Theory

• General F+C

Evaluations: lit review

• Text

• Graphical

Evaluations: real-life tasks

• Tree (Adam/Dmitry study [CHI 2006] ++??)

• Time-series data/xy data (LGE study)

Evaluations: human perceptual studies

• IT4

• IT5

How: EDA Systems

Apply study results to an application domain

Tools for data analysis

• Line Graph Explorer (XY data)

• Session Viewer (Web log data)

Information retrieval/management

• music (MusicLand, MusicLand++ ??)

• email (Evita [CPSC534a report])

• Previous literature search on personal information management for the Memoplex project

InfoVis'06: paper rejected;

CHI 2007?

VSS, APGV06

Venue?: paper

AVI’06

Poster: InfoVis’05

CHI 2007

Visualization for EDA

Visualization for EDA

What: Data

• Explain what large means using the Agilent high through-put instrument data, and Google keynote study clickstream data

• Explain that the data sets are not only large, but growing due to the relative ease in data collection

• Explain the data itself: multivariate (?) with metadata (both continuous and categorical). Data may have hierarchical structure. In our examples, they are both time-series data. The instrument data is continuous (subject to sampling errors), while the clickstream data is discrete (by click events)

What: EDA

• Explain what analysts have been doing to analyze such large datasets.

• Essentially, they have been using some sort of statistics (descriptive, clustering, factor analysis, PCA; Machine learning).

• Explain the pros (scalable) and cons of such approach (some are hypothesis instead of data-driven, ie. confirmatory rather than exploratory. As a result, it can be difficult to discover data pattern. No cross-checking with data regarding results?)

• Explain what EDA is, and how people explore data visually, and why is that needed in the analysis.

Why: Visualization

Explain how visualization can help by working with established statistical methods (EDA -> generate hypothesis -> checked with statistics/further analyze data with statistical methods -> visually verify analysis results -> visually display results for presentation/report)

Explain existing EDA visualization for large data set (TJ, space-filling...)

Other related documents and sites:•Time line (Year 2)•GPPF •roadmap v.1

Other related documents and sites:•Time line (Year 2)•GPPF •roadmap v.1