Upload
proto204
View
55
Download
2
Embed Size (px)
Citation preview
1
Université Paris-Saclay / CNRSBALÁZS KÉGL
Center for Data ScienceParis-Saclay
RAMP DATA CHALLENGES WITH
MODULARIZATION AND CODE SUBMISSION
• Senior researcher CNRS
• machine learning (20 years) interfacing with particle physics (10 years)
• Head of the Paris-Saclay Center for Data Science
• interfacing with biology, economy, climatology, chemistry, etc. (4 years)
2
WHO AM I?Balázs Kégl
Center for Data ScienceParis-SaclayB. Kégl (CNRS)
Biology & bioinformaticsIBISC/UEvry LRI/UPSudHepatinovCESP/UPSud-UVSQ-Inserm IGM-I2BC/UPSud MIA/AgroMIAj-MIG/INRALMAS/Centrale
ChemistryEA4041/UPSud
Earth sciencesLATMOS/UVSQ GEOPS/UPSudIPSL/UVSQLSCE/UVSQLMD/Polytechnique
EconomyLM/ENSAE RITM/UPSudLFA/ENSAE
NeuroscienceUNICOG/InsermU1000/InsermNeuroSpin/CEA
Particle physics astrophysics & cosmologyLPP/Polytechnique DMPH/ONERACosmoStat/CEAIAS/UPSudAIM/CEALAL/UPSud
The Paris-Saclay Center for Data ScienceData Science for scientific Data
250 researchers in 35 laboratories
Machine learningLRI/UPSud LTCI/TelecomCMLA/Cachan LS/ENSAELIX/PolytechniqueMIA/AgroCMA/PolytechniqueLSS/SupélecCVN/Centrale LMAS/CentraleDTIM/ONERAIBISC/UEvry
VisualizationINRIALIMSI
Signal processingLTCI/TelecomCMA/PolytechniqueCVN/CentraleLSS/SupélecCMLA/CachanLIMSIDTIM/ONERA
StatisticsLMO/UPSud LS/ENSAELSS/SupélecCMA/PolytechniqueLMAS/CentraleMIA/AgroParisTech
Data sciencestatistics
machine learninginformation retrieval
signal processingdata visualization
databases
Domain sciencehuman society
life brain earth
universe
Tool buildingsoftware engineering
clouds/gridshigh-performance
computingoptimization
Data scientist
Applied scientist
Domain scientist
Data engineer
Software engineer
Center for Data ScienceParis-Saclay
datascience-paris-saclay.fr
@SaclayCDS
LIST/CEA
3
Center for Data ScienceParis-Saclay
A multi-disciplinary initiative, building interfaces, matching people, helping them launching projects
345 affiliated researchers, 50 laboratories
Center for Data ScienceParis-SaclayB. Kégl (CNRS) 4
Data scientist
Data Value Architect
Domain expertSoftware engineer
Data engineer
Tool building Data domains
Data sciencestatistics
machine learning information retrieval
signal processing data visualization
databases
software engineeringclouds/grids
high-performancecomputing
optimization
energy and physical sciences health and life sciences Earth and environment
economy and society brain
THE DATA SCIENCE ECOSYSTEMhttps://medium.com/@balazskegl
7Olga Kokshagina 2015
INTERMEDIARIES: THE GROWING INTEREST FOR « CROWDS » - > EXPLOSION OF TOOLS
! Crowdsourcing ! is a model leveraging
on novel technologies (web 2.0, mobile apps, social networks)
! To build content and a
structured set of information by gathering contributions from large groups of individuals
5
9
We decided to turn them into a tool for
1. Collaborative prototyping 2. Teaching aid 3. Data science process management
12
Code submission
1. lets us deliver a working prototype 2. lets the participants collaborate 3. makes the backend challenging to
run (cloud management)
Center for Data ScienceParis-SaclayB. Kégl (CNRS)
RAPID ANALYTICS AND MODEL PROTOTYPING
13
functional data, time series, data augmentation, deep learning, learning on simulations, nonstandard and
multi-objective losses
17
RAMP.STUDIO DATA CHALLENGE WITH CODE SUBMISSION
21 challenges 39 events 1205 users
5952 predictive models
18
RAMP.STUDIO DATA CHALLENGE WITH CODE SUBMISSION
12 hackathons 6 remote data challenges
11 course data camps
Center for Data ScienceParis-SaclayB. Kégl (CNRS)
CLASSIFYING AND REGRESSING ON MOLECULAR SPECTRA
19
chemotherapy drug in elastic pocket
laser spectrometer
molecular spectra
feature extractor 1
feature extractor 2
regressor
concentration
classifier
drug type
Center for Data ScienceParis-SaclayB. Kégl (CNRS)
FORECASTING EL NINO SIX MONTHS AHEAD
21
… 300.14 299.83 298.76 299.87 299.82 300.15 300.10 299.50… …
time series feature extractor
x (a fixed length feature vector)regressor
Center for Data ScienceParis-SaclayB. Kégl (CNRS) 22
what you achieved with a well tuned deep net
the diversity gap
the human blender gap
competitive phase
collaborative phase
THE POWER OF THE (COLLABORATING) CROWD
Center for Data ScienceParis-SaclayB. Kégl (CNRS)
OPEN PHASE LETS PARTICIPANTS CATCH UP THE GOAL OF TEACHING
24
26
ONGOING CHALLENGES: CLASSIFY POLLENATING INSECTS
4.5K€ for the competitive phase 3K€ for the collaborative phase 50 GPU hours per participant
https://www.ramp.studio/events/pollenating_insects_3_JNI_2017
34
Université Paris-Saclay / CNRSBALÁZS KÉGL
Center for Data ScienceParis-Saclay
RAMP DATA CHALLENGES WITH
MODULARIZATION AND CODE SUBMISSION
FRUGAL DATA SCIENCE PROCESS MANAGEMENT
35
data flowTHE DATA FLOW
X
data
con
nect
ors
CALFE CLF
predictive workflow
full automation production
dashboard decision support
ypred
36
POC report
expert labeler amazon turk
simulator
y
ypred score type
score
cross-validation schemeRAMP
full automation production
dashboard decision support
data flow“works on”
data
con
nect
ors
X
data scientists
CALFE CLF
workflow
business unit domain scientists
data value architects
data engineers
THE IDEAL SEQUENCE
• toolkit: https://github.com/paris-saclay-cds/ramp-workflow
• for designing workflows
• set of ready-made metrics, workflows, CV schemes, data readers
• unique command-line test script
• examples: https://github.com/ramp-kits
• a zoo of problems, experiments, workflows
• (at least) one initial solution
37
RAMP-WORKFLOW & RAMP-KITS
38
A SINGLE SCRIPT TO DEFINE THE BUNDLE
X ypred score type
score
cross-validation scheme
data
con
nect
ors
FE CLF
workflow
39
A SINGLE EXECUTABLE TO TEST THE SUBMISSIONS
• Keep your different submissions in a simple file structure
• Communicate them on git
• Execute them also from the notebook
40
You can
1. Participate in upcoming RAMPs 2. Use RAMP in teaching or training 3. Use the toolkit for your own workflows 4. Submit it to us if you want to run a data
challenge
41
toolkit: github.com/paris-saclay-cds/ramp-workflow
examples: github.com/ramp-kits
blogs: medium.com/@balazskegl
slack: ramp-studio.slack.com
frontend: www.ramp.studio
mail: [email protected]