
Infographics and Visualisation

(or: Beyond the Pie Chart)

LSS: ITNPBD4, 1 November 2016


ITNPD4: Applications of Big Data

Overview

(short: we covered most of this in the tutorial)

–  Why infographics and visualisation
–  What’s the problem we’re trying to solve?
–  What makes for good infographics and visualisations?
–  Where are we now in this area?
–  Interactive visualisations


The problem

•  Data analysis may tell you something about the structure of a problem
•  Or may predict how to optimise something
   –  Profit, energy usage etc.
•  BUT:
   –  In general you will have to convince someone else
   –  And they may not be convinced by the numbers on their own
•  They expect some sort of graphic that they can show to the Board/CEO to convince them
   –  A visualisation, perhaps an infographic.
•  The other side of this is that people may be presenting their data with a particular axe to grind…


Visualisation and infographics

•  Visualisation is the generic name for displaying data
   –  May be a single image
   –  Or a movie, for example.
   “Visualizations help people see things that were not obvious to them before” (SAS website)
•  There is also sonification, where data is sounded out: this works because our ears are very good at picking up patterns.
   –  E.g. Geiger counter, reversing systems in modern cars.
•  Infographics may be single images
   –  Providing a visualisation of a specific set of data.
   –  But they may also be interactive


Infographics

•  An infographic is a picture that displays information in an accessible and/or informative way.
•  Can be quite simple
•  …or quite complex


…not a new idea (Minard, 1869)!


The standard text in this area is E. R. Tufte, “The visual display of quantitative information”


Infographic shows the troops and troop movements on the eastern front in World War 2.


Visualisation of low-dimensional datasets

•  Low-dimensional datasets are often visualised as simple X/Y graphs, but even here there are issues
   –  For both X and Y axes:
      •  Offset (is the origin at 0?)
      •  Scale
      •  Linear or logarithmic?
      •  Continuous or broken axes?
   –  Graph lines:
      •  One or more than one?
      •  Line style: continuous, dashed, dotted…
      •  Line colour
      •  Symbols and/or lines?
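The effect of the "linear or logarithmic?" choice can be illustrated with a small sketch. The helper `axis_position` is a hypothetical name introduced here, not something from the slides; it reports where a value would be drawn along an axis, as a fraction of the axis length. It uses only the Python standard library and assumes strictly positive values for the logarithmic case.

```python
import math

def axis_position(value, lo, hi, log=False):
    """Return where `value` falls along an axis running from lo to hi,
    as a fraction between 0 (start of the axis) and 1 (end of the axis)."""
    if log:
        # A logarithmic axis spaces values by their order of magnitude,
        # so value, lo and hi must all be strictly positive.
        value, lo, hi = math.log10(value), math.log10(lo), math.log10(hi)
    return (value - lo) / (hi - lo)

# Compare the two axis types for values spanning three orders of magnitude.
for v in (1, 10, 100, 1000):
    linear = axis_position(v, 1, 1000)
    logarithmic = axis_position(v, 1, 1000, log=True)
    print(f"{v:>5}: linear {linear:.3f}, logarithmic {logarithmic:.3f}")
```

On the linear axis the values 1, 10 and 100 are all crowded into the first tenth of the axis, whereas the logarithmic axis spaces them evenly. That is why the choice matters when data spans several orders of magnitude.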


[Figure: four plots, combining a linear (0 to 1000) or logarithmic (10^0 to 10^3) X axis with a linear (-1 to 1) or logarithmic (10^-4 to 10^1) Y axis.]


[Figure: two plots, each with an X axis from 0 to 1000 and a Y axis from -1 to 1.]

Using different line styles and colours…


Visualising 3D data.


Visualising high dimensional datasets

•  This is harder, and can be where infographics come in
   –  Cannot do this directly.
      •  Can plot two or three dimensions directly, but not more
      •  Clever infographics can plot more dimensions, for example using geographical location, lines of varying thickness and colour, multiple symbols
   –  How can we show the structure of such datasets?
      •  When we can’t think of one-off target-domain clever tricks …
   –  Discuss earlier infographics
•  Clearly depends on what we are trying to show!
   –  Geography as timeline, for example
   –  See also http://www.creativebloq.com/graphic-design-tips/great-infographic-design-tips-1232813


What can we do in general

•  Let’s say that we don’t have any inspiration for designing a good infographic (!)
   –  Infographics often depend on specific factors
      •  E.g. dates, geographic distribution, …
•  Can we find 2 or 3 (or even a few more) dimensions that …
   –  … in some sense
      •  …summarise (what we want to emphasise about) the dataset?
•  Ways forward: projecting and clustering


Choosing dimensions and projecting data

•  What if the data is randomly spread throughout all the dimensions and has no structure?
   –  Give up. There’s nothing to be learned from it (if it really is random)
•  Datasets that have something to tell us have some form of structure
•  Maybe the data lie (largely) on a smaller dimensional subset of the high-dimensional space.
   –  As opposed to being spread randomly and evenly throughout the original space.


Example

•  Say that we have 3-dimensional data, sampled over time
   –  Each point is (x,y,z,t): really 4-dimensional data
      •  and $0 \le x^2+y^2+z^2 \le 1$, $0 \le t \le 10$ (the points (x,y,z) are inside a sphere of radius 1, centred at the origin)
•  Let’s also say that at each time t, $\sqrt{x^2+y^2+z^2} = t/10$
   –  So that the points at time t are on the surface of a sphere of radius t/10
•  Clearly, if we simply look at all the (x,y,z) points (ignoring t) they are spread throughout the sphere
   –  But not in an unstructured way
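The example above can be simulated in a few lines. This is only a sketch under stated assumptions: the slides do not say how the points are distributed over each sphere, so uniformly random directions and uniformly random times are assumed here, and the names `sample_point` and `points` are introduced purely for illustration.

```python
import math
import random

def sample_point(t, rng):
    """Return (x, y, z, t) with (x, y, z) on the sphere of radius t/10."""
    # Assumption: the direction is uniformly random. Normalising three
    # independent Gaussian draws gives such a direction.
    while True:
        direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
        length = math.sqrt(sum(c * c for c in direction))
        if length > 0.0:
            break
    radius = t / 10.0
    x, y, z = (radius * c / length for c in direction)
    return (x, y, z, t)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
points = [sample_point(rng.uniform(0.0, 10.0), rng) for _ in range(1000)]

# Ignoring t, the radii cover the range from 0 to 1: the points look
# as if they simply fill the unit sphere.
radii = [math.sqrt(x * x + y * y + z * z) for x, y, z, _ in points]
print(f"smallest radius {min(radii):.3f}, largest radius {max(radii):.3f}")

# With t included, every point satisfies radius = t/10 (up to rounding):
# the 4-dimensional data lies on a lower-dimensional surface.
largest_error = max(abs(r - p[3] / 10.0) for r, p in zip(radii, points))
print(f"largest |radius - t/10| = {largest_error:.2e}")
```

The structure is invisible in a plain 3D scatter of (x,y,z), but becomes obvious once radius is plotted against t.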


Discovering structure in data

•  There are many techniques for discovering (uncovering) structure
   –  Principal component analysis (PCA)
   –  Linearly projecting a high dimensional dataset on to a smaller number of dimensions
      •  In such a way that as much as possible of the variance in the data is contained in this smaller number of dimensions
      •  And the dimensions are orthogonal to each other
      •  Well-understood and commonly used technique for data dimension reduction
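As a rough illustration of the projection idea, the sketch below finds only the first principal component, using power iteration on the covariance matrix with nothing beyond the Python standard library. In practice a numerical library routine would normally be used instead. The test data and all function names are hypothetical, chosen for this illustration.

```python
import math
import random

def centre(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centred):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=100):
    """Power iteration: repeatedly multiply a vector by the covariance
    matrix and renormalise. It converges towards the direction of
    greatest variance, assuming the start vector is not orthogonal to it."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero covariance: the data has no variance at all
        v = [x / norm for x in w]
    return v

# Hypothetical test data: 3-dimensional points scattered closely
# around a line in the direction (1, 2, 0).
rng = random.Random(0)
data = []
for _ in range(500):
    s = rng.uniform(-1.0, 1.0)
    data.append([s + rng.gauss(0, 0.01),
                 2 * s + rng.gauss(0, 0.01),
                 rng.gauss(0, 0.01)])

centred = centre(data)
cov = covariance(centred)
pc1 = first_component(cov)

# Project every point on to the first component (3 numbers become 1)
# and measure how much of the total variance survives the projection.
scores = [sum(r[j] * pc1[j] for j in range(3)) for r in centred]
score_variance = sum(s * s for s in scores) / (len(scores) - 1)
explained = score_variance / sum(cov[i][i] for i in range(3))
print("first component:", [round(c, 3) for c in pc1])
print(f"fraction of variance retained: {explained:.4f}")
```

Because the data was built to lie almost on a line, a single dimension retains nearly all of the variance. With real data the retained fraction indicates how much is lost by plotting only the first few components.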


Independent components analysis

•  Independent components analysis (ICA)
   –  a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals. Hyvärinen (U Helsinki)
•  Essentially looking for dimensions that co-vary
•  Finding ways of summarising points in the N-dimensional space using fewer than N values.
•  Data is assumed to be a linear mixture of underlying latent variables
   –  These are assumed non-Gaussian, and mutually independent: independent components
•  Related to PCA, but can find structure when PCA fails to do so
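The linear-mixture assumption is commonly written as below. The symbols are notation introduced here for illustration and do not appear on the slides: $\mathbf{x}$ is an observed data vector, $\mathbf{s}$ holds the latent independent, non-Gaussian components, $A$ is the unknown mixing matrix, and $W$ is the unmixing matrix that ICA tries to estimate.

```latex
\mathbf{x} = A\,\mathbf{s}, \qquad \hat{\mathbf{s}} = W\,\mathbf{x}
```

ICA chooses $W$ so that the entries of $\hat{\mathbf{s}}$ are as statistically independent as possible. In general the components can only be recovered up to their order and scale.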


Example: input


ICA output


Clustering data

•  Often rather than projecting data on to other axes, it is better to look at how the data points are grouped
   –  The aim is to classify a large number of data vectors into a small number of manageable groups
•  Does the data fall into clusters?
   –  How unevenly distributed is the data?
   –  Does it cluster in
      •  The original high-dimensional space?
      •  A lower-dimensional projected space?


How does clustering work?

•  Techniques
   –  Partition or Hierarchical


Examples


Partition-based clustering

•  Based on distance between vectors
   –  But which distance?
      •  Euclidean
      •  City-block?
      •  Weighted versions
      •  Chebyshev distance
•  Forming clusters:
   –  Simple method:
      •  Start with each vector as a single-element cluster
      •  Identify the two closest vectors that are not already in the same cluster and combine their clusters into one.
      •  Keep doing this until the distance between the two closest vectors not in the same cluster is large.
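The simple method above can be sketched as follows, together with three of the distance measures listed. Merging by closest pair in this way is what is usually called single-linkage agglomerative clustering. The slides leave "large" undefined, so the `threshold` parameter here is an assumption, as are the function names.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    """Sum of the absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Largest absolute difference along any single dimension."""
    return max(abs(x - y) for x, y in zip(a, b))

def merge_clusters(vectors, distance, threshold):
    """Return one cluster label per vector.

    Starts with every vector in its own cluster, then repeatedly joins
    the clusters of the two closest vectors that are not yet together,
    stopping once that closest distance exceeds `threshold`."""
    labels = list(range(len(vectors)))
    # All pairs of vectors, closest first.
    pairs = sorted(
        (distance(vectors[i], vectors[j]), i, j)
        for i, j in combinations(range(len(vectors)), 2)
    )
    for dist, i, j in pairs:
        if dist > threshold:
            break  # every remaining pair is at least this far apart
        if labels[i] != labels[j]:
            old, new = labels[j], labels[i]
            labels = [new if lab == old else lab for lab in labels]
    return labels

vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
labels = merge_clusters(vectors, euclidean, threshold=2.0)
print(labels)
```

Swapping `euclidean` for `city_block` or `chebyshev`, or changing the threshold, can change which vectors end up grouped together. The result depends on these choices as well as on the data.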


Criticisms of clustering

•  Clustering is descriptive, and not unique
   –  Actual clusters may depend on techniques used, as well as on the data
•  Clustering techniques will always find clusters
   –  Even when there aren’t any!
   –  (This implies some measure for quality of clustering should be used)
•  Clustering techniques depend strongly on the measures used
   –  There should ideally be some conceptual support of the measures used to calculate distances between vectors.
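One widely used quality measure is the silhouette score. It is not mentioned on the slides and is offered here only as an example of the kind of measure meant. The sketch below assumes Euclidean distance by default and at least two clusters; the function names are illustrative.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_silhouette(vectors, labels, distance=euclidean):
    """Mean silhouette score over all vectors (needs two or more clusters).

    For each vector: a = mean distance to the other members of its own
    cluster, b = smallest mean distance to the members of any other
    cluster, score = (b - a) / max(a, b). Scores near 1 suggest compact,
    well-separated clusters; scores near 0 or below suggest the grouping
    is not supported by the data."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention for single-element clusters
            continue
        a = sum(distance(vectors[i], vectors[j]) for j in own) / len(own)
        b = min(
            sum(distance(vectors[i], vectors[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

vectors = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = mean_silhouette(vectors, [0, 0, 1, 1])  # grouping that matches the data
bad = mean_silhouette(vectors, [0, 1, 0, 1])   # grouping that cuts across it
print(f"matching grouping: {good:.3f}, mismatched grouping: {bad:.3f}")
```

A score like this does not prove that clusters are real, but a low value is a useful warning that the technique may have found clusters that are not there.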


Examples:

•  Google News indexes
   –  Uses text to create topic clusters
      •  Title, article listings
      •  Used to discover multiple reports of same story
•  Video clusters on YouTube
   –  Uses keywords, popularity, viewer engagement, user browsing history
   –  http://www.strutta.com/blog/six-degrees-of-youtube/


Infographics tools

•  At its simplest, Excel has many facilities for creating infographics and visualisations.
   –  But it’s limited, and proprietary (though one can import comma separated values)
•  Matlab? Not free! Good graphing tools
•  Flot: jQuery and JavaScript based
•  Google Chart API: free
   –  JavaScript based, browser output
•  D3: JavaScript based, very powerful.


Using visualisation and infographics

•  As noted earlier, infographics and visualisation
   –  Are about communicating ideas about data, discoveries from data mining etc. to others
•  But visualisation has another important usage as well
   –  Exploratory (initial) data analysis
      •  How can you decide which tools to apply to data – and how to apply them – if you haven’t an initial idea of what might be useful?
