December 5, 2012
MASTODONS meeting
Laboratoire de l’Accélérateur Linéaire (LAL) / CNRS & Université Paris Sud
Laboratoire de la Recherche en Informatique (LRI) / CNRS & INRIA & Université Paris Sud
Laboratoire de l’Informatique du Parallélisme (LIP) / CNRS & INRIA & ENS Lyon & UCB Lyon
PI: Balázs Kégl (LAL) presented by Cécile Germain (LRI)
Data in physics: Large-scale data storage, data management, and data analysis for next generation particle physics experiments
DeePhy
Balázs is at NIPS
Cécile Germain / LRI MASTODONS DeePhy
The collaboration
LAL
AppStat
Auger
TAO
LRI
CompSci / Statistics
Experimental Physics
ATLAS
ILC/Calice
LIP
Engineering support
Service Informatique
AVALON
The participants
•Laboratoire de l’Accélérateur Linéaire (LAL)
•Rémi Bardenet (PhD student, AppStat), Djalel Benbouzid (PhD student, AppStat), François-David Collin (IR CNRS, SI), Laurent Duflot (CR CNRS, ATLAS), Diego Garcia Gamez (postdoc, Auger), Michel Jouvin (IR CNRS, SI), Balázs Kégl (CR CNRS, AppStat & Auger), Oleg Lodygensky (IR CNRS, SI), Roman Poeschl (DR CNRS, ILC), David Rousseau (DR CNRS, ATLAS)
•Laboratoire de la Recherche en Informatique (LRI)
•Cécile Germain (Professeur, UPSud, TAO), Tristan Glatard (CR CNRS, Laboratoire CREATIS), Michèle Sebag (DR CNRS, TAO)
•Laboratoire de l’Informatique du Parallélisme (LIP)
•Simon Delamare (IR CNRS), Gilles Fedak (CR INRIA), Laurent Lefèvre (CR INRIA)
Projects
•ANR Siminole: 2010-2014, LAL/LRI/Telecom ParisTech
•Large-scale simulation-based probabilistic inference, optimization, and discriminative learning with applications in experimental physics
•ANR MapReduce: ??
•MRM Grille Paris Sud: 2010-2014, LAL/LRI
•FP7 EDGI/DEGISCO: ??
•...
Meetings
•Regular (phone) meetings between the PIs
•Project meeting, November 23
•Clouds pour le Calcul Scientifique: November 27-28, 2012 LAL@Orsay
•The First International Workshop on BigData in Science: Systems, Infrastructures and Applications (BIGDATA'2013): October 3-4, 2013 ICPP, Lyon, FRANCE
Budget 2012
• LAL
• LRI
• LIP
Mission of AppStat@LAL
•Motivate fundamental research by real applications
•Bring state-of-the-art analysis techniques to experimental physics
Computational Statistics
High-energy physics
Data analysis methodology
Real data
Motivation
Where we are in 2012
Triggers and lean classifiers
D. Benbouzid, R. Busa-Fekete, and B. Kégl, “Fast classification using sparse decision DAGs,” in International Conference on Machine Learning (ICML), 2012.
The telescope image
The JEM-EUSO telescope on the ISS
Trigger = fast classifier: signal vs. background
Boosting works well but produces slow classifiers
We designed a Markov decision graph (MDDAG) algorithm using reinforcement learning
MDDAG can be reused in any problem constrained at test time (object detection, web page ranking) and can replace classical cascade designs
Leads to interesting research questions (sparsity, representation, deep learning)
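As a point of reference for the classical cascade designs mentioned above, here is a minimal Python sketch of an early-exit classifier: weak scorers are evaluated in order, and evaluation stops as soon as the running score crosses a threshold. This is not the MDDAG algorithm of the paper. The name `cascade_classify`, the three threshold stages, and the exit thresholds are hypothetical and chosen only for illustration.

```python
def cascade_classify(x, stages, reject_below, accept_above):
    """Evaluate weak scorers in order and stop once the running score is decisive.

    Returns (label, number_of_stages_evaluated); label 1 = signal, 0 = background.
    """
    score = 0.0
    for used, stage in enumerate(stages, start=1):
        score += stage(x)
        if score <= reject_below:
            return 0, used  # early rejection: the remaining stages are skipped
        if score >= accept_above:
            return 1, used  # early acceptance
    # No early exit: fall back to the sign of the accumulated score.
    return (1 if score > 0.0 else 0), len(stages)


# Hypothetical weak scorers: each votes +1 or -1 from a single feature.
stages = [
    lambda x: 1.0 if x[0] > 0.5 else -1.0,
    lambda x: 1.0 if x[1] > 0.5 else -1.0,
    lambda x: 1.0 if x[2] > 0.5 else -1.0,
]

clear_signal = cascade_classify((0.9, 0.9, 0.1), stages, -2.0, 2.0)
clear_background = cascade_classify((0.1, 0.1, 0.9), stages, -2.0, 2.0)
ambiguous = cascade_classify((0.9, 0.1, 0.9), stages, -2.0, 2.0)
```

Clear-cut inputs exit after two stages, while the ambiguous one pays for all three. That is the test-time saving a trigger needs; the slide's point is that a learned decision DAG can make this skip-or-evaluate choice per feature instead of following a fixed order.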
Adaptive Metropolis for mixture signals
R. Bardenet, O. Cappé, G. Fort, and B. Kégl, “Adaptive Metropolis with online relabeling,” in International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.
The Auger tank signal
Cosmic rays and the Pierre Auger observatory
Classical adaptive Metropolis was suboptimal due to symmetries (label switching)
We designed adaptive Metropolis with online relabeling (AMOR) that works well on our problem
AMOR is extendible to any problem involving inference on parametrized mixture signals
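To make the label-switching problem concrete, the sketch below shows only a relabeling step, not AMOR itself and not the sampler. Each new draw of mixture-component parameters is permuted so that it lies closest to the running mean of the draws already relabeled. The Euclidean distance, the function name `relabel_online`, and the toy draws are assumptions made for illustration; the paper should be consulted for the actual criterion.

```python
from itertools import permutations


def relabel_online(samples):
    """Permute each sample's components to match the running mean of earlier samples."""
    relabeled = []
    running_mean = None
    for sample in samples:
        if running_mean is None:
            best = tuple(sample)  # the first draw fixes the labeling convention
        else:
            # Pick the permutation with the smallest squared distance to the mean.
            best = min(
                permutations(sample),
                key=lambda p: sum((a - m) ** 2 for a, m in zip(p, running_mean)),
            )
        relabeled.append(best)
        n = len(relabeled)
        if running_mean is None:
            running_mean = list(best)
        else:
            # Incremental update of the component-wise mean.
            running_mean = [m + (a - m) / n for m, a in zip(running_mean, best)]
    return relabeled


# Toy draws of two component means; the 2nd and 4th draws have switched labels.
draws = [(1.0, 5.0), (5.1, 0.9), (1.2, 4.8), (4.9, 1.1)]
aligned = relabel_online(draws)
```

After relabeling, the first coordinate always refers to the component near 1 and the second to the component near 5, so averages over the chain become meaningful again. Enumerating all permutations is only practical for a handful of components.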
The TAO project-team at INRIA Saclay
5 senior researchers or faculty members
9 junior researchers or faculty members
25 PhD students, postdocs, and engineers
Computer Go: 3 gold medals, 2010
CMA-ES: from the theory of stochastic optimization to applications with EADS, IFP, PSA, SIMINOLE
Autonomics + e-Science: Grid Observatory, Green Computing Observatory with EGI
Digital curation of the behavioural data of the EGI grid: observe and publish
www.grid-observatory.org
Specific challenges for analysis
“Unusual” extreme statistics: which metrics? Are our systems stationary? (in fact, no)
Optimisation, autonomics
How to build the knowledge? No gold standard, too few experts
Infer latent causes, e.g. analyze traces as text files, and more
Build credible benchmarks and models, e.g. piecewise AR instead of long-range dependence
The Grid Observatory
[Figure: autonomic control loop. An Autonomic Manager (Monitor, Analyze, Plan, Execute, around shared Knowledge) and a Managed Element.]
Failure: it’s not a bug, it’s a feature
[D. Feng, C. Germain and T. Glatard. Distributed Monitoring with Collaborative Prediction. In 12th IEEE International Symposium on Cluster, Cloud and Grid Computing (CCGrid'12), 2012]
“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable”
[Figure: grid services (BDII, LFC, SE, VOMS, CE) and nodes N1 to N5, with a firewall, at the service/hardware level]
Dependency matrix
          ce-hd  bdii  lfc  voms  ….
lcg-cr    1      1     1    1
nmap      1      1     0    0
srm-ls    1      0     1    1
….
Collaborative prediction more efficient than detection/diagnosis
[Figure: probe matrix, with probes such as lcg-cp]
BigData and BigComputation
Scientific challenges for the next 4-5 years
The LHC
largest: 26 km
highest energy: 7 TeV
coldest: 1.9 K
emptiest: (check, because 13 atm is very high pressure)
The EGEE/EGI grid
largest: 200K CPUs, 15 PB
most distributed: 250 sites
busiest: 1000 jobs/day
The ATLAS collaboration
largest: 3000 scientists
widest: 38 countries and 174 institutions
BigData and BigComputation
Scientific challenges for the next 4-5 years
•Opportunities
•We can apply computationally expensive analysis techniques we could not dream of ten years ago
•The Google cat: a deep learning technique running on 16K cores for three days, watching 10M YouTube stills [Le et al., ICML’12]
•Large-scale parallel Markov chain Monte Carlo, or likelihood-free, simulation-based approximate Bayesian computation [Tavaré et al., 1997]
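As an illustration of what likelihood-free inference looks like in its simplest form, here is a minimal Python sketch of rejection-based approximate Bayesian computation. Parameters are drawn from a prior, data are simulated for each draw, and a draw is kept only when a summary of the simulation falls close to the summary of the observation. The Gaussian toy simulator, the uniform prior, the sample-mean summary, and the tolerance are all assumptions for the example, not the physics setting of the project.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary is within `tolerance` of the observed one."""
    target = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        simulated = simulator(theta, rng)  # no likelihood is ever evaluated
        if abs(summary(simulated) - target) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
# Toy "experiment": 50 measurements from a Gaussian with unknown mean 2.0.
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

posterior = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),
    simulator=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,
    tolerance=0.1,
    n_draws=5000,
    rng=rng,
)
posterior_mean = statistics.fmean(posterior)
```

Every draw is independent of the others, which is why this family of methods spreads so easily over a grid. The price is the low acceptance rate: only a small fraction of the simulations end up in `posterior`.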
BigData and BigComputation
Scientific challenges for the next 4-5 years
•Challenges
•Large-scale machine learning is a new paradigm:
•statistical overfitting is no longer a danger, underfitting is
•optimization time interferes with the approximation-estimation trade-off; mediocre optimization algorithms like stochastic gradient descent become competitive [Bottou-Bousquet, 2008, 2011]
•The algorithmic toolbox shrinks considerably: it is often better to use a large data set and a suboptimal but fast algorithm than a small data set and an optimal but expensive technique
•Data management and work-flow become crucial, often more important than optimizing the single-thread technique
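The point about cheap optimizers on large data sets can be sketched with plain stochastic gradient descent: a single pass over many examples, one constant-cost update per example. The linear model, the synthetic data, the learning rate, and the name `sgd_linear_regression` below are assumptions made for the example.

```python
import random


def sgd_linear_regression(samples, learning_rate=0.01, epochs=1):
    """Fit y ~ w*x + b by stochastic gradient descent on the squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            error = (w * x + b) - y  # residual on this single example
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


rng = random.Random(0)
# Synthetic data set: y = 3x + 1 plus Gaussian noise, 20000 examples.
data = []
for _ in range(20000):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)))

w, b = sgd_linear_regression(data)
```

Each update touches one example and never revisits it, so the cost grows linearly with the data set and the examples can be streamed from storage. A more accurate second-order method would reach a better optimum on a small sample, but in this regime the extra data usually matters more than the extra precision.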
Data as a communication tool between Physics and CS
•A well-designed challenge can attract a large number of professional data miners
•Tricky:
•data cleaning and formatting, defining a standardized problem, evaluation metric, scripting, web interface, marketing
•requires the collaboration of physicists, computer scientists, and engineers
•The prize is worth it: objective evaluation of a large number of techniques on a given problem
What we will do in 2013
The Higgs boson challenge
Observation, at more than 5 sigma, of a particle whose properties are compatible with those of the Higgs boson, Phys. Lett. B 716 (2012) 1-29
Spectrum filtered from a few billion recorded proton-proton collisions
Poor resolution
Signal peak next to a broad and large background peak
The task now is to establish that the new particle really is the expected Higgs boson, by detecting it in other, more difficult channels
[Figure: ATLAS, Higgs signal]
The Higgs boson challenge
•Challenge 1: estimating the mass of the Higgs boson candidate
•as precisely as possible (at the moment: 20%)
•within a fixed CPU time (0.1 s per event)
•Monte-Carlo integration in 5-6 dimensions with constraints
•Adaptive importance sampling, adaptive MCMC, etc.
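The kind of computation behind Challenge 1 can be sketched as importance sampling for a constrained integral in five dimensions. The sketch below is the plain, non-adaptive version, and everything specific in it is an assumption: the Gaussian-shaped `integrand` stands in for the physics integrand, the half-space constraint is hypothetical, and the proposal scale is fixed by hand rather than adapted.

```python
import math
import random

DIM = 5


def integrand(x):
    """Unnormalised Gaussian bump, a stand-in for the real integrand."""
    return math.exp(-0.5 * sum(v * v for v in x))


def satisfies_constraint(x):
    """Hypothetical constraint: keep only the half-space sum(x) >= 0."""
    return sum(x) >= 0.0


def proposal_density(x, scale):
    """Density of an isotropic Gaussian proposal with standard deviation `scale`."""
    norm = (2.0 * math.pi * scale * scale) ** (-len(x) / 2.0)
    return norm * math.exp(-0.5 * sum(v * v for v in x) / (scale * scale))


def importance_sampling_integral(n_samples, scale, rng):
    """Average of integrand/proposal over proposal draws that satisfy the constraint."""
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, scale) for _ in range(DIM)]
        if satisfies_constraint(x):
            total += integrand(x) / proposal_density(x, scale)
    return total / n_samples


estimate = importance_sampling_integral(20000, scale=1.5, rng=random.Random(0))
# By symmetry the half-space holds exactly half of the full Gaussian integral.
exact = 0.5 * (2.0 * math.pi) ** (DIM / 2.0)
```

The toy integrand is chosen so that `exact` is known and the estimate can be checked. An adaptive scheme would tune the proposal (here just `scale`) from earlier draws to reduce the variance within the fixed CPU budget per event.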
The Higgs boson challenge
•Challenge 2: detecting the Higgs boson
•standard classification problem with ~100 features and two classes: signal vs. background
•features are constructed “manually”
•AdaBoost, SVM, neural networks, etc.
•Challenge 3: unsupervised feature extraction
•two-class classification on “raw” input: a variable-size list of particles
•targeting the deep learning community
• Scientific challenges of the project over 4 or 5 years, and any refinement of these challenges since the project submission,
• Project organisation, collaborative working arrangements, schedule of the technical meetings for 2012,
• First scientific results obtained or, where applicable, identification of the first research directions explored and positioning with respect to the state of the art,
• Scientific objectives for 2013, giving a more detailed view than in item 1.
• 1 slide on the use of the budget allocated to the project.