28
Report: Chantal Brakus & Piet Daas ESSnet Workshop #BigData WP 8: Methodology

ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Report: Chantal Brakus & Piet Daas

ESSnet Workshop #BigData WP 8: Methodology

Page 2: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Goal workshop April 25 &26 @Heerlen

The goal of the workshop is to select, prioritize and discuss

the most important topics in the area of :

– Methodology

– Quality and

– IT

when using big data for official statistics; i.e. the topics on

which WP8 focusses.

2

Page 3: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

#ESSnet workshop Day 1

Start workshop at Tuesday April 25th from 13:00 to 17:00

List of participants:

Facilitators: Frans Duijsings, Daniël Herbers and Chantal Brakus (CBS)

3

Name Organisation WP

Magdalena Six Statistics Austria 8

Jacek Maślankowski Statistics Poland 7

Luis Sanguiao Statistics Spain 5

Øyvind Langsrud Statistics Norway 4

Jean-Marc Museux Eurostat -

Fernando Reis Eurostat -

Maiki Ilves Statistics Estonia 3

Owen Abbott Office of National Statistics 1

António Portugal Statistics Portugal 7

Dan Wu Statistics Sweden 1

Valentin Chavdarov Statistics Bulgaria 2

Vesna Horvat Statistical Office of the Republic of Slovenia 6

Ciprian Alexandru Statistics Romania 5

Eleni Bisioti Statistics Greece 4

Marc Debusschere Statistics Belgium 9

Marco Puts Statistics Netherlands 8

Piet Daas Statistics Netherlands 8

Martijn Tennekes Statistics Netherlands 5

Page 4: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Let’s get started with BINGO!

4

Page 5: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Time 4 real action Visuals (release the artist)

5

Page 6: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

WP1 Webscraping job vacancies

The aim of this pilot is to demonstrate by concrete estimates which approaches (techniques, methodology etc.) are most suitable to produce statistical estimates in the domain of job vacancies and under which conditions these approaches can be used in the ESS. The intention is to explore a mix of sources including job portals, job adverts on enterprise websites, and job vacancy data from third party sources.

6

Page 7: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

WP2 Webscraping enterprise characteristics

The aim of this pilot is to

investigate whether

webscraping, text mining

and inference techniques

can be used to collect,

process and improve

general information

about enterprises.

7

Page 8: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

WP3 Smart meters

The aim of this pilot is to demonstrate by concrete estimates whether buildings equipped with smart meters (= electricity meters which can be read from a distance and measure electricity consumption at a high frequency) can be used to produce energy statistics but can also be relevant as a supplement for other statistics e.g. census housing statistics, household costs, impact on environment, statistisc about energy production.

8

Page 9: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

WP4 AIS data

Aim of this work package is

to investigate whether real-

time measurement data of

ship positions (measured by

the so-called AIS-system) can

be used 1) to improve the

quality and internal

comparability of existing

statistics and 2) for new

statistical products relevant

for the ESS.

9

Page 10: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

WP5 Mobile phone data

The aim of this workpackage is to

investigate how NSIs may obtain

more or less ‘stable’ and

continuous access to the data.

10

Page 11: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

WP6 Early estimates

The aim of this pilot is to

investigate how a

combination of (early

available) multiple Big

Data sources and existing

official statistical data can

be used in order to create

existing or new early

estimates for statistics.

11

Page 12: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

WP7 Multiple domains

The aim of this pilot is to

investigate how a combination of

Big Data sources and existing

official statistical data can be

used to improve current statistics

and create new statistics in

statistical domains.

12

Page 13: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Presenting the Visuals

13

Page 14: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Time 2 brainstorm – Methodology

The notes on the board are

the results of a 1 hour

brainstorm on

Methodological issues/topics

when using Big Data for

Statistics

Participants: Fernando, Marc,

Martijn, Luis and Valentin.

14

Page 15: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Time 2 brainstorm – Quality

The notes on the board are

the results of a 1 hour

brainstorm on Quality

issues/topics when using Big

Data for Statistics

Participants: Magdalena,

Maiki, Owen, Piet, Øvind and

Vesna.

15

Page 16: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Time 2 brainstorm – IT

The notes on the board are

the results of a 1 hour

brainstorm on Quality

issues/topics when using Big

Data for Statistics

Participants: Jean-Marc,

Eleni, Marco, Antonio, Dan,

Ciprian and Jacek.

16

Page 17: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Time 2 socialise: table 1 #dinner #mijnstreek #heerlen

17

Page 18: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Time 2 socialise : table 2 #dinner #mijnstreek #heerlen

18

Page 19: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

#ESSnet workshop Day 2

Start workshop at Wednesday April 26th from 9:00 to 12:00

List of participants:

Facilitators: Frans Duijsings, Daniël Herbers and Chantal Brakus (CBS)

19

Name Organisation WP

Magdalena Six Statistics Austria 8

Jacek Maślankowski Statistics Poland 7

Luis Sanguiao Statistics Spain 5

Øyvind Langsrud Statistics Norway 4

Jean-Marc Museux Eurostat -

Fernando Reis Eurostat -

Maiki Ilves Statistics Estonia 3

Owen Abbott Office of National Statistics 1

António Portugal Statistics Portugal 7

Dan Wu Statistics Sweden 1

Valentin Chavdarov Statistics Bulgaria 2

Vesna Horvat Statistical Office of the Republic of Slovenia 6

Ciprian Alexandru Statistics Romania 5

Eleni Bisioti Statistics Greece 4

Marc Debusschere Statistics Belgium 9

Marco Puts Statistics Netherlands 8

Piet Daas Statistics Netherlands 8

Page 20: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Recap day 1: Many ideas Check for missing topics (1)

20

Page 21: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Recap day 1: Many ideas Check for missing topics (2)

21

Page 22: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

IT brainstorm adding structure!

22

Grouping IT Topics: Data source Data storage Data processing

Page 23: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Priorities in Big Data Methodology

23

– Assessing accuracy – What should our final product look like? – Deal with spatial dimension – Changes in datasources – Machine learning in official statistics – Data linkage – Secure multi-party computation – Inference – Sampling – Who processes data-architecture – Unit identification problem

Page 24: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Priorities in Big Data Quality

24

– Coverage

– Comparability over time

– Processing errors

– Process chain control

– Linkability

– Measurement error

– Model errors and Precision

Page 25: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Priorities in Big Data IT

25

– Metadata management (ontology) – Big Data Processing Life Cycle – Format of Big Data processing/unified

framework (languages/libraries) – Datahub (access 2 Big Data) – Data source integration – Chosing the right infrastructure – List of secure and tested API’s – Shared Libraries and standards to document – Data-lakes (link with traditional sources) – Training/skills/knowledge – Speed of algorithmes

Page 26: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Linking the dots: Priorities in Big Data

26

Page 27: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

Overview: Priorities in Big Data WP 8

27

Page 28: ESSnet Workshop #BigData WP 8: Methodology · Fernando Reis Eurostat - Maiki Ilves Statistics Estonia 3 Owen Abbott Office of National Statistics 1 António Portugal Statistics Portugal

What’s next?

– Report of the workshop

– Put results on Wiki

– Link outcomes to WP’s

– Link outcomes to results of other international

projects

– Check with WP-leaders

– Think, discuss and write

– Write deliverables.

28