Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
15th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019
Data scientists vs. Statisticians: Lessons learned after two years of practical experienceProf. Dr. Bertrand Loison, Swiss Federal Statistical Office, Vice-Director
5th International Conference on Big Data for Official Statistics
Session “Data Science and Capacity Development in Official Statistics”
Kigali, Rwanda, 3 Mai 2019
25th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019
Two years of practical experiences (2017 – 2019)
GL-Antrag
01.01.2017 31.12.201901.04.2017 01.07.2017 01.10.2017 01.01.2018 01.04.2018 01.07.2018 01.10.2018 01.01.2019 01.04.2019 01.07.2019 01.10.2019
Meilleures pratiques internationales Un Statistical Commission – UN Global Working Group on Big Data for Official Statistics (https:// unstats.un.org/bigdata/) Rapports de voyages GWG Big
Data
DélivrablesVersion 1.0 / 31.10.2017
FEUILLE DE ROUTE POUR L’IMPLEMENTATION AU SEIN DE L’OFS DE LA STRATEGIE D’INNOVATION SUR LES DONNEES
Stra
tégi
e (S
)
Définition de la stratégie (S1)
Phase pilote de mise en oeuvre (S2)
Evaluation de la phase pilote (S3)
Stratégie Data Innovation 2020 – 2023 (S1.1 – S1.4)
Reporting devant la GL et rapport de fin projet (S2.1 – S2.2)
Rapport d’évaluation (S3.1)
Com
péte
nces
(C)
Soutien «On the job» (C1)
Formation «Off the job» (C2)
Evaluation de la phase pilote (C3)
Plan de soutien «On the job» (C1.1 – C1.2)
Concept de formation «Off the job» (C2.1 – C2.2)
Inventaire des sources intenes dans SMS (DI1.1 – DI1.2)
Liste des data brokers intéressants pour l’OFS (DI2.1)
Draft des nécessaires modifications des bases légales
(DI3.1)
Infrastructure pilote + besoin métiers pour SIP (DI4.1 – DI4.2)
Com
mun
icat
ion
(CO
)
Communication interne (CO1)
Communication externe (CO2)
Rapport d’évaluation (DI5.1)
Reporting, workshops et articles (CO1.1 – CO1.10)
1er Datathon OFS 2019, communications (CO2.1 – 2.8)
Don
nées
et i
nfra
stru
ctur
e (D
I)
Accès aux données internes (DI1)
Accès aux données externes DI2)
Bases légales (DI3)
Plateforme informatique (DI4)
Evaluation de la phase pilote (DI5)
Rapport d’évaluation (C3.1)
Définition de la stratégie 2017-2019 (S1.1)Définition de la feuille de route 2017-2019 (S1.2)
Définition de la stratégie 2020-2023 (S1.3)Définition de la feuille de route 2020-2023 (S1.4)
Cohérence avec le MJP 2020 - 2023
Identification des projets pilotes (S2.1)
01.01.2017 - 31.03.2017
T1
01.04.2017 - 30.06.2017
T2
01.07.2017 - 30.09.2017
T3
01.10.2017 - 31.12.2017
T4
01.01.2018 - 31.03.2018
T1
01.04.2018 - 30.06.2018
T2
01.07.2018 - 30.09.2018
T3
01.10.2018 - 31.12.2018
T4
01.01.2019 - 31.03.2019
T1
01.04.2019 - 30.06.2019
T2
01.07.2019 - 30.09.2019
T3
01.10.2019 - 31.12.2019
T4
Projets pilotes (1 par division BB, WI, GS, RU, REG) (S2.2)
Evaluation des projets pilotes (S3.1)
Plan de soutien est prêt (C1.1)
Soutien spécifique du personnel faisant partie des projets pilotes (C1.2)
Soutien des membres du comité de direction concernant les approches relatives à la science des données. (C2.1)
Définition de la stratégie 2020-2023 (S1.3)Définition de la feuille de route 2020-2023 (S1.4)
Inventaire de base des sources de données internes. (DI1.1)
Saisie des métadonnées dans SMS (DI1.2)
Mise en place de l’infrastructure pilot (DI4.1)
Besoins métiers formulés (DI4.2)
Evaluation de l’infrastructure pilote (DI5.1)
Préparation de la présentation lors des JSS 2017 (CO1.1)
Mobilisation de la communauté (CO2.2)Conception Datathon 2019 (CO2.1) Communication concernant les
projets pilotes (CO3.1)
GL-Reporting Kick-off interne Communication publique Assemblée des cadres Journal interne Regiostat Fedestat
Workshop avec les cadres de chaque division métier
Soutien donné par Statoo Consulting et METH
Soutien donné par Statoo Consulting et METH
Présentation de la stratégie dans la domaine public 1er Datathon OFS 2019 Articles scientifiques
Inventaire des données consolidées présentent ou pas dans la KD
Infrastructure Open Source A intégrer dans la SIP 2020-2023
Identification du potentiel des data brokers comme «livreurs/fournisseurs» de données (D2.1)
Identification du potentiel des data brokers comme «livreurs/fournisseurs» de données (D2.1)
Conception d’un module de formation (C2.2)
Intégration dans la formation officielle EducStat
Aquisition d’expérience pratique. Input pour l’élaboration du MJP 2020 - 2023
+ + =
Sources:Data Innovation Strategy 1.0 : https://www.bfs.admin.ch/bfs/en/home/news/whats-new.gnpdetail.2017-0673.htmlPilot Projects : https://www.experimental.bfs.admin.ch/en/
35th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019
Agenda
1. What did we do ?
2. What are our experiences ?
3. What will be the next major step ?
45th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019
We have created an ad-hoc agile organization inside FSO - #1
Rules1. Ad-hoc organization is under
the final responsibility of a board member.
2. Members of the ad-hoc organization have been stayed subordinated in their line organization !
3. Five pilot projects are managed by the ad-hoc organization. The final goal is to go into production at the end of 2019.
26 people (Statisticians, Methodologists, IT Specialists)aged between 26 and 58 years old !
55th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019
1. We have created an ad-hoc agile organization inside FSO - #2
Pilotproject X
Analysts Methodologists
IT Specialists
Source: European Commission, Open Data Maturity in Europe 2016, Figure 32, October 2016 (goo.gl/WPHR54)
• Classical hierarchy that needed to be forgotten…• High degree of autonomy was and is still required…• Choosing the right people at the beginning was not easy…
65th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019
2. We have complemented their skills with off- and on-the-job teaching - #1
Off-the-job (9 x 1 day = 9 days dispatched over 3 months)
• R for Data Science• Python for Data Science• Selected topics in data preparation/preprocessing• Dimensionality reduction (e.g. PCA)• Clustering (e.g. k-means)• General introduction to supervised methods• Regression (e.g. linear, logistic)• Classification (e.g. k-NN, support vector machines)
• Decision trees• Bayesian learning• Neural networks• Deep neural networks (e.g. CNN,
RNN and LSTM)• Model evaluation • Feature selection
75th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019
2. We have complemented their skills with off- and on-the-job teaching - #2
On-the-job (over 2 years)
Scientific advisors • Professor for Data Science (University of Geneva) and the FSO’s deputy head of
methods have accompanied/supervised the five pilot projects and validate the results from the scientific point of view.
Team of “Data Scientists” • Key idea is to share/transfer the skills and the knowledge inside the FSO Organization. • Trying to avoid an organizational split between “traditional statisticians” and “modern
statisticians or data scientists”.
85th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019
Agenda
1. What did we do ?
2. What are our experiences ?
3. What will be the next major step ?
95th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019
What are our experiences ? - #1
Hard skills• Math & statistical skill, Technical skill and subject matter can be learned. These skills are
not always easy to acquire but it seems not to be the hard part of the game! Up to 2020 we will integrate a standard curriculum “Data Scientist” in our internal teaching
program. We will collaborate more with Universities to define the exact content.
Soft Skills• Soft skills (e.g. collaboration, creativity, problem solving, communication, story telling)
need a change in the mindset of all employees. These skills are not so easy to teach… they primary need to be practiced on a daily basis Up to 2020 we will open a new mandatory curriculum “Agile Organization”.
105th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019
What are our experiences ? - #2
Academic curricula• Swiss Universities offer standards curricula in data science. The content of these curricula
often does not match with the FSO’s expectations. FSO does not really need “Supermen or Superwomen” that can cover all the production
process on his own.
National Statistical Institute as employer• Recruiting Data Scientist for a NSI likes FSO is not easy. Salaries are not so attractive
and the tasks that we can offer to a young data scientist do not offer the expected variety and freedom that they are looking for.
FSO wants to empower its employees and not to replace them with Data Scientist!
115th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019
Agenda
1. What did we do ?
2. What are our experiences ?
3. What will be the next major step ?
125th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019
What will be the next major step ?
Version 2.0 of our Data Innovation StrategyThe FSO is currently defining a version 2.0 of its data innovation strategy for the years 2020 - 2023. The creation of an official Data Innovation Lab and a Competence Center for Data Science will (probably) be at the heart of the new strategy.
Strategic objective 1: To create a central, cross-departmental and independent "Data Innovation Lab" (DIL) with a focus on data innovation services, i.e. the application of “complementary analytics methods” to data (and not to the type of data source and/or technology), for the FSO (I), the whole Swiss public statistics system (II) and the whole federal administration (III).
135th International Conference on Big Data for Official Statistics | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | 3 Mai 2019
Questions & Answers