
StatMine


DESCRIPTION

Presentation given at Dutch Information Visualisation Event 2014


Page 1: StatMine

StatMine – prototype

StatMine: Exploring official statistics

Martijn Tennekes, Edwin de Jonge, Jan van der Laan & Jessica Solcer

Statistics Netherlands (CBS)

Visweek 2013

StatMine, statistical goldmine

Edwin de Jonge (@edwindjonge), Jan van der Laan, Jessica Solcer

Statistics Netherlands / CBS

Dutch Information Visualisation Event 2014, June 19, 2014

Page 2: StatMine

StatMine 0.2 2

Statistics Netherlands / CBS

- Creates and publishes official statistics on economics, demographics, health care and others.

- Since 1899

- Website: www.cbs.nl

- Online DB: http://statline.cbs.nl (since 1997)

Page 3: StatMine

Why StatMine?

- Online StatLine contains more than one billion (10^9) facts
  - Policy makers
  - Journalists
  - Citizens
  - Enterprises
  - Economists
  - Social scientists
  - Historians
  - etc.

Page 4: StatMine


Problem 1: Numbers ≠ Information

Page 5: StatMine

1. Numbers ≠ Information

We know from a user study that:

1. Many interesting patterns in StatLine are not spotted by users

2. Many important topics in StatLine are scattered across multiple tables

Page 6: StatMine


H1: Data analysis = Data insight

Page 7: StatMine

H1. Data insight

The goal of StatMine 0.1 was to provide more insight into StatLine numbers by:

• Presenting these facts visually and interactively

We tested this successfully on 4 "difficult" StatLine tables.

Page 8: StatMine


Bar chart

- compare

Line chart

- development

Bubble/scatter chart

- correlation

Mosaic chart

- structure

Page 9: StatMine

an exploration of dissemination data: StatMine 9

Chart type – bar chart

Page 10: StatMine


Small multiples?

Page 11: StatMine


Page 12: StatMine

Demo


Page 13: StatMine

StatMine 0.1 Results

Tested on 25 users.

Findings:

- Test persons think that visualizing data adds value (small multiples)

- Data owners look at their data differently

- They want this tool to check their data before publication.

Page 14: StatMine


Problem 2: Fragmented Information

Page 15: StatMine

2. Fragmented information

Most information in StatLine is fragmented:

- Energy consumption with respect to economic growth

- Perceived public safety with respect to registered crime

Users currently need to look into multiple tables and combine the information by hand.

Page 16: StatMine


2. Merge data!

Page 17: StatMine

H2. Table joining

The goal of StatMine 0.2 was to create more insight by letting users combine tables.

- Condition: the tables share at least one column/data dimension.

- Tested on a small set of tables.
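
The joining condition can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not StatMine's actual implementation: the function name `join_on_shared_dimensions`, the column names, and every number in the two sample tables are hypothetical and made up for the example.

```python
def join_on_shared_dimensions(left, right, dimensions):
    """Inner-join two tables (lists of dict rows) on the dimensions they share.

    `dimensions` lists the candidate dimension columns; only those present
    in both tables are used as the join key.
    """
    if not left or not right:
        return []
    shared = [d for d in dimensions if d in left[0] and d in right[0]]
    if not shared:
        # Mirrors the condition on the slide: no shared dimension, no join.
        raise ValueError("tables share no dimension, so they cannot be joined")

    # Index the right-hand table by its key for a single-pass lookup.
    index = {}
    for row in right:
        index.setdefault(tuple(row[d] for d in shared), []).append(row)

    joined = []
    for row in left:
        for match in index.get(tuple(row[d] for d in shared), []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample tables; the values are invented for illustration.
energy = [
    {"year": 2010, "energy_pj": 100.0},
    {"year": 2011, "energy_pj": 98.0},
    {"year": 2012, "energy_pj": 97.0},
]
growth = [
    {"year": 2011, "gdp_growth_pct": 1.5},
    {"year": 2012, "gdp_growth_pct": -1.0},
    {"year": 2013, "gdp_growth_pct": 0.5},
]

combined = join_on_shared_dimensions(energy, growth, ["year", "region"])
```

Here only `year` occurs in both tables, so it becomes the join key, and `combined` holds one row per year that appears in both tables. An inner join is one possible choice; a tool could equally keep unmatched rows and show the gaps.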


Page 18: StatMine

StatMine 0.2 Results

Test persons: 20 internal, 40 external (policy makers, journalists).

Findings:

- External users are enthusiastic about the visual possibilities of StatMine

- Joining of data fills a user need.

Page 19: StatMine


Problem 3: Statistical numbers are uncertain

Page 20: StatMine

H3. Confidence intervals

- All facts from Statistics Netherlands have a confidence interval

- European Statistics Code of Practice (12.2): “sampling and non sampling errors should be systematically documented”

Goal of StatMine 0.3:

Investigate how uncertainty in numbers can be presented understandably to users.
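
As a reminder of what a confidence interval (CI) around a point estimate looks like in practice, here is a minimal sketch in Python. It assumes a normal approximation and a known standard error, which the slides do not specify. The function names and the sample numbers are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.95):
    """Return (lower, upper) bounds, assuming a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    margin = z * standard_error
    return (estimate - margin, estimate + margin)


def intervals_overlap(a, b):
    """True if two (lower, upper) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical point estimates for three periods, each with standard error 2.0.
first = confidence_interval(100.0, 2.0)
second = confidence_interval(103.0, 2.0)
third = confidence_interval(110.0, 2.0)
```

With these made-up numbers the first and second intervals overlap, while the first and third do not. Overlap is only a rough visual cue, not a formal significance test, but it is the kind of reading that error bars in a chart invite.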


Page 21: StatMine

StatMine 0.3

Restricted to:

- How do users interpret CIs, and how does that affect their interpretation of facts?

- Do users need CIs?

Assumption:

- A test data set of point estimates with CIs is available

Page 22: StatMine

User test CIs

A user test (100+) with synthetic data shows that:

- CIs improve the validity of user statements (they are more correct)

Page 23: StatMine

StatMine 0.3

Prototype StatMine 0.3:

- Show uncertainty in line charts and bar charts

- Tested on 25 test persons.

Page 24: StatMine

Line charts with uncertainty


Page 25: StatMine

Bar charts with uncertainty


Page 26: StatMine

StatMine 0.4

- Built on the CBS open data API

- Will be public

- Currently in beta test, ETA 2014 Q3

Page 27: StatMine

Questions?
