StatMine


Presentation given at Dutch Information Visualisation Event 2014


StatMine prototype

StatMine: Exploring official statistics

Martijn Tennekes, Edwin de Jonge, Jan van der Laan & Jessica Solcer

Statistics Netherlands (CBS)

Visweek 2013

StatMine, statistical goldmine

Edwin de Jonge (@edwindjonge), Jan van der Laan, Jessica Solcer

Statistics Netherlands / CBS

Dutch Information Visualisation Event 2014, June 19, 2014

StatMine 0.2 2

Statistics Netherlands / CBS

- Creates and publishes official statistics on economics, demographics, health care and other topics.

- Since 1899

- Website: www.cbs.nl
- Online DB: http://statline.cbs.nl (since 1997)

Why StatMine?

- Online StatLine contains more than one billion (10^9) facts
- Its users include:
  - Policy makers
  - Journalists
  - Citizens
  - Enterprises
  - Economists
  - Social scientists
  - Historians
  - etc.


StatMine 4

Problem 1: Numbers ≠ Information

1. Numbers ≠ Information

We know from a user study that:

1. Many interesting patterns in StatLine are not spotted by users.
2. Many important topics in StatLine are scattered across multiple tables.


H1: Data analysis = data insight

H1. Data insight

The goal of StatMine 0.1 was to provide more insight into StatLine numbers by:

• Presenting these facts visually and interactively

We tested this successfully on 4 “difficult” StatLine tables.


Bar chart

- compare

Line chart

- development

Bubble/scatter chart

- correlation

Mosaic chart

- structure

an exploration of dissemination data: StatMine 9

Chart type – bar chart


Small multiples?


Demo


StatMine 0.1 Results

Tested on 25 users.

Findings:

- Test persons think that visualizing data adds value (small multiples)
- Data owners look at their data differently
- They want this tool to check their data before publication


Problem 2: Fragmented information

2. Fragmented information

Most information in StatLine is fragmented:

- Energy consumption with respect to economic growth
- Perceived public safety with respect to registered crime

Users currently need to look in multiple tables and combine the information by hand.


2. Merge data!

H2. Table joining

Goal of StatMine 0.2: create more insight by:

- Letting users combine tables
- Condition: the tables must share at least one column (data dimension)
- Tested on a small set of tables
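The joining condition above can be sketched as an inner join on the shared dimension columns. This is a minimal illustration, not StatMine's implementation: it assumes each table is a list of records (Python dicts) whose measure columns have distinct names, and the function name `join_tables`, the column names, and the placeholder values are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]


def join_tables(left: List[Row], right: List[Row],
                dims: Optional[Sequence[str]] = None) -> List[Row]:
    """Inner-join two tables on their shared dimension columns (hypothetical helper).

    If dims is not given, every column name present in both tables is treated
    as a shared dimension, so measure columns must have distinct names.
    """
    if dims is None:
        left_cols = set(left[0]) if left else set()
        right_cols = set(right[0]) if right else set()
        dims = sorted(left_cols & right_cols)
    if not dims:
        # The condition from the slide: at least one shared dimension is required.
        raise ValueError("tables share no dimension; they cannot be joined")

    # Index the right table by its values on the shared dimensions.
    index: Dict[Tuple[object, ...], List[Row]] = {}
    for row in right:
        key = tuple(row[d] for d in dims)
        index.setdefault(key, []).append(row)

    # Combine each left row with every right row that has the same key.
    joined: List[Row] = []
    for row in left:
        key = tuple(row[d] for d in dims)
        for match in index.get(key, []):
            joined.append({**row, **match})
    return joined


# Hypothetical tables; the numbers are placeholders, not real statistics.
energy = [
    {"year": 2012, "energy_use": 10.0},
    {"year": 2013, "energy_use": 11.0},
]
growth = [
    {"year": 2013, "gdp_growth": 0.5},
    {"year": 2014, "gdp_growth": 1.0},
]

print(join_tables(energy, growth))
# [{'year': 2013, 'energy_use': 11.0, 'gdp_growth': 0.5}]
```

Rows without a match in the other table (here the years 2012 and 2014) are dropped, which is the usual inner-join behaviour; an outer join would keep them with missing values instead.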


StatMine 0.2 Results

Test persons: 20 internal, 40 external (policy makers, journalists).

Findings:

- External users are enthusiastic about the visual possibilities of StatMine
- Joining data fills a user need


Problem 3: Statistical numbers are uncertain

H3. Confidence intervals

- All facts published by Statistics Netherlands have a confidence interval

- European Statistics Code of Practice (12.2): “sampling and non-sampling errors should be systematically documented”

Goal of StatMine 0.3:

Investigate how uncertainty in numbers can be presented in a way that is understandable to users.


Restricted to:

- How do users interpret CIs, and how does that affect their interpretation of facts?
- Do users need CIs?

Assumption:

- For the test data set, point estimates with CIs are available
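To make the assumption concrete, here is a sketch of a point estimate that carries its confidence interval. It assumes a normal approximation (estimate ± z × standard error, with z ≈ 1.96 for a 95% interval); the `Estimate` class, its `label` method, and the placeholder numbers are hypothetical, and the CI methods actually used by Statistics Netherlands may differ per statistic.

```python
from dataclasses import dataclass
from statistics import NormalDist
from typing import Tuple


@dataclass(frozen=True)
class Estimate:
    """A point estimate with its standard error (hypothetical structure)."""
    value: float
    std_error: float

    def confidence_interval(self, level: float = 0.95) -> Tuple[float, float]:
        """Normal-approximation CI: value +/- z * std_error."""
        z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
        margin = z * self.std_error
        return (self.value - margin, self.value + margin)

    def label(self, level: float = 0.95) -> str:
        """Text a chart tooltip could show next to the point estimate."""
        low, high = self.confidence_interval(level)
        return f"{self.value:.1f} ({level:.0%} CI {low:.1f} to {high:.1f})"


# Placeholder numbers, not a real StatLine figure.
est = Estimate(value=12.0, std_error=1.0)
print(est.label())
# 12.0 (95% CI 10.0 to 14.0)
```

The same interval endpoints could drive the error bars or bands of the bar and line charts with uncertainty.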

StatMine 0.3


A user test (100+ participants) with synthetic data shows that:

- CIs improve the validity of user statements (users' statements are more often correct)

User test CIs


StatMine 0.3

- Prototype StatMine 0.3:
  - Shows uncertainty in line charts and bar charts
  - Tested on 25 test persons


Line charts with uncertainty


Bar charts with uncertainty


StatMine 0.4

- Built on the CBS open data API
- Will be public
- Currently in beta test; expected release 2014 Q3


Questions?
