Data scientists vs. Statisticians: Lessons learned after two years of practical experience

Prof. Dr. Bertrand Loison, Swiss Federal Statistical Office, Vice-Director

5th International Conference on Big Data for Official Statistics
Session “Data Science and Capacity Development in Official Statistics”
Kigali, Rwanda, 3 May 2019



Two years of practical experience (2017-2019)

[Figure: roadmap chart, reproduced here as a list.]

Roadmap for implementing the data innovation strategy within the FSO
Deliverables, version 1.0 / 31.10.2017

Timeline: 01.01.2017 to 31.12.2019, by quarter (Q1 2017 to Q4 2019).

• Proposal to the executive board (GL-Antrag)
• International best practices: UN Statistical Commission, UN Global Working Group on Big Data for Official Statistics (https://unstats.un.org/bigdata/); GWG Big Data travel reports

Strategy (S)
• Phases: definition of the strategy (S1); pilot implementation phase (S2); evaluation of the pilot phase (S3)
• Deliverables: Data Innovation Strategy 2020-2023 (S1.1 - S1.4); reporting to the executive board (GL) and end-of-project report (S2.1 - S2.2); evaluation report (S3.1)
• Definition of the 2017-2019 strategy (S1.1) and of the 2017-2019 roadmap (S1.2)
• Definition of the 2020-2023 strategy (S1.3) and of the 2020-2023 roadmap (S1.4), coherent with the MJP 2020-2023
• Identification of the pilot projects (S2.1)
• Pilot projects, one per division: BB, WI, GS, RU, REG (S2.2)
• Evaluation of the pilot projects (S3.1)
• Acquisition of practical experience; input for drawing up the MJP 2020-2023

Competences (C)
• Phases: "on the job" support (C1); "off the job" training (C2); evaluation of the pilot phase (C3)
• Deliverables: "on the job" support plan (C1.1 - C1.2); "off the job" training concept (C2.1 - C2.2); evaluation report (C3.1)
• Support plan is ready (C1.1)
• Specific support for the staff involved in the pilot projects (C1.2)
• Support for the members of the management committee on data science approaches (C2.1)
• Design of a training module (C2.2)
• Support provided by Statoo Consulting and METH
• Integration into the official EducStat training

Data and infrastructure (DI)
• Phases: access to internal data (DI1); access to external data (DI2); legal bases (DI3); IT platform (DI4); evaluation of the pilot phase (DI5)
• Deliverables: inventory of internal sources in SMS (DI1.1 - DI1.2); list of data brokers of interest to the FSO (DI2.1); draft of the necessary amendments to the legal bases (DI3.1); pilot infrastructure and business requirements for the SIP (DI4.1 - DI4.2); evaluation report (DI5.1)
• Basic inventory of internal data sources (DI1.1)
• Entry of the metadata in SMS (DI1.2)
• Identification of the potential of data brokers as data "suppliers/providers" (DI2.1)
• Setting up the pilot infrastructure (DI4.1)
• Business requirements formulated (DI4.2)
• Evaluation of the pilot infrastructure (DI5.1)
• Inventory of consolidated data, whether or not present in the KD
• Open-source infrastructure, to be integrated into the SIP 2020-2023

Communication (CO)
• Phases: internal communication (CO1); external communication (CO2)
• Deliverables: reporting, workshops and articles (CO1.1 - CO1.10); 1st FSO Datathon 2019, communications (CO2.1 - 2.8)
• Preparation of the presentation at the JSS 2017 (CO1.1)
• Design of the Datathon 2019 (CO2.1)
• Mobilization of the community (CO2.2)
• Communication about the pilot projects (CO3.1)
• GL reporting, internal kick-off, public communication, managers' assembly, internal newsletter, Regiostat, Fedestat
• Workshop with the managers of each business division
• Presentation of the strategy in the public domain, 1st FSO Datathon 2019, scientific articles

Sources:
• Data Innovation Strategy 1.0: https://www.bfs.admin.ch/bfs/en/home/news/whats-new.gnpdetail.2017-0673.html
• Pilot projects: https://www.experimental.bfs.admin.ch/en/


Agenda

1. What did we do?

2. What are our experiences?

3. What will be the next major step?


1. We have created an ad-hoc agile organization inside the FSO - #1

Rules

1. The ad-hoc organization is under the final responsibility of a board member.

2. Members of the ad-hoc organization remain subordinate to their line organization!

3. Five pilot projects are managed by the ad-hoc organization. The final goal is to go into production at the end of 2019.

26 people (statisticians, methodologists, IT specialists), aged between 26 and 58!


1. We have created an ad-hoc agile organization inside the FSO - #2

[Figure: pilot project X, bringing together analysts, methodologists and IT specialists.]

Source: European Commission, Open Data Maturity in Europe 2016, Figure 32, October 2016 (goo.gl/WPHR54)

• The classical hierarchy needed to be forgotten…
• A high degree of autonomy was and still is required…
• Choosing the right people at the beginning was not easy…


2. We have complemented their skills with off- and on-the-job teaching - #1

Off-the-job (9 x 1 day = 9 days spread over 3 months)

• R for Data Science
• Python for Data Science
• Selected topics in data preparation/preprocessing
• Dimensionality reduction (e.g. PCA)
• Clustering (e.g. k-means)
• General introduction to supervised methods
• Regression (e.g. linear, logistic)
• Classification (e.g. k-NN, support vector machines)
• Decision trees
• Bayesian learning
• Neural networks
• Deep neural networks (e.g. CNN, RNN and LSTM)
• Model evaluation
• Feature selection


2. We have complemented their skills with off- and on-the-job teaching - #2

On-the-job (over 2 years)

Scientific advisors
• A professor of data science (University of Geneva) and the FSO’s deputy head of methods have accompanied and supervised the five pilot projects and validated the results from a scientific point of view.

Team of “Data Scientists”
• The key idea is to share and transfer skills and knowledge within the FSO.
• We are trying to avoid an organizational split between “traditional statisticians” and “modern statisticians or data scientists”.




What are our experiences? - #1

Hard skills
• Mathematical and statistical skills, technical skills and subject-matter knowledge can be learned. These skills are not always easy to acquire, but this does not seem to be the hard part of the game!
• By 2020 we will integrate a standard “Data Scientist” curriculum into our internal training program. We will collaborate more with universities to define the exact content.

Soft skills
• Soft skills (e.g. collaboration, creativity, problem solving, communication, storytelling) require a change in the mindset of all employees. These skills are not so easy to teach… they primarily need to be practiced on a daily basis.
• By 2020 we will open a new mandatory curriculum, “Agile Organization”.


What are our experiences? - #2

Academic curricula
• Swiss universities offer standard curricula in data science. The content of these curricula often does not match the FSO’s expectations.
• The FSO does not really need “Supermen or Superwomen” who can cover the whole production process on their own.

National Statistical Institute as employer
• Recruiting data scientists for an NSI like the FSO is not easy. Salaries are not so attractive, and the tasks that we can offer a young data scientist do not provide the variety and freedom they are looking for.

The FSO wants to empower its employees, not replace them with data scientists!




What will be the next major step?

Version 2.0 of our Data Innovation Strategy
The FSO is currently defining version 2.0 of its data innovation strategy for the years 2020-2023. The creation of an official Data Innovation Lab and a Competence Center for Data Science will (probably) be at the heart of the new strategy.

Strategic objective 1: To create a central, cross-departmental and independent "Data Innovation Lab" (DIL) with a focus on data innovation services, i.e. the application of “complementary analytics methods” to data (rather than a focus on the type of data source and/or technology), for the FSO (I), the whole Swiss public statistics system (II) and the whole federal administration (III).


Questions & Answers