21
arbara Mascialino Monte Carlo 2005 Chattanooga, April 19 th 200 nte Carlo 2005 - Chattanooga, April 200 B. Mascialino, A. Pfeiffer, M. G. Pia, A. B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon, P. Viarengo Ribon, P. Viarengo

Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Embed Size (px)

DESCRIPTION

Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Qualitative evaluation Quantitative evaluation GoF statistical toolkit A project to develop a statistical comparison system A project to develop a statistical comparison system Comparison of distributions Goodness of fit testing

Citation preview

Page 1: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

Monte Carlo 2005 - Chattanooga, April 2005

B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon, P. ViarengoB. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon, P. Viarengo

Page 2: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

Provide tools for the Provide tools for the statistical comparisonstatistical comparison of distributions of distributions equivalent reference distributions experimental measurements data from reference sources functions deriving from theoretical calculations or fits

Detector monitoringDetector monitoring Simulation validationSimulation validation

Reconstruction vs. expectationReconstruction vs. expectationRegression testingRegression testingPhysics analysisPhysics analysis

Data analysis

Page 3: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

Qualitative evaluationQualitative evaluationQuantitative evaluationQuantitative evaluation

GoF statistical toolkit

A project to develop a statistical comparison systemstatistical comparison system

Comparison of distributionsComparison of distributions

Goodness of fit testingGoodness of fit testing

Page 4: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

• United Software Development ProcessUnited Software Development Process, specifically tailoredtailored to the project– practical guidance and tools from the RUPRUP– both rigorous and lightweight– mapping onto ISO 15504

• Guidance from ISO 15504ISO 15504

• Incremental and iterative life cycle model

Software process guidelines

SPIRAL APPROACHSPIRAL APPROACH

Page 5: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

• The project adopts a solid architectural approachsolid architectural approach– to offer the functionalityfunctionality and the qualityquality needed by the users– to be maintainablemaintainable over a large time scale– to be extensibleextensible, to accommodate future evolutions of the

requirements

• Component-based approachComponent-based approach– to facilitate re-use and integration in different frameworks

• AIDAAIDA– adopt a (HEP) standard– no dependence on any specific analysis tool

Architectural guidelines

Page 6: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

Page 7: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

The tests are specialised on the kind of distribution The tests are specialised on the kind of distribution (binned/unbinned)(binned/unbinned)

Page 8: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

G.A.P Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo

“A Goodness-of-Fit Statistical Toolkit”IEEE- Transactions on Nuclear Science (2004), 51 (5): 2056-2063.

Release StatisticsTesting-V1-01-00 downloadable from the web:http://www.ge.infn.it/geant4/analysis/HEPstatistics/

Page 9: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

• Applies to binnedbinned distributions

• It can be useful also in case of unbinned distributions, but the data must be grouped into classes

• Cannot be applied if the counting of the theoretical frequencies in each class is < 5

– When this is not the case, one could try to unify contiguous classes until the minimum theoretical frequency is reached

– Otherwise one could use Yates’ correction

Chi-squared testChi-squared test

Page 10: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

EMPIRICAL DISTRIBUTION FUNCTIONORIGINAL DISTRIBUTIONS

• Kolmogorov-Smirnov test

• Goodman approximation of KS test

• Kuiper test

)(

4 22

nmnmDmn

)()( xGxFSupD mnmn

)()()()( 00* xFxFMaxxFxFMaxD TT

Dmn

Tests based on maximum distanceTests based on maximum distanceunbinned distributionsunbinned distributions

SUPREMUMSUPREMUMSTATISTICSSTATISTICS

Page 11: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

• Fisz-Cramer-von Mises test

• Approx Anderson-Darling test

i

ii xFxFnnnnt 2

21221

21 )]()([)(

i k kkk

kiikk

ik nhHnH

HnnFhnkn

nA

4)(

)(1)1(

)1( 2

22

Tests containing a weighting functionTests containing a weighting functionbinned/unbinned distributionsbinned/unbinned distributions

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

EMPIRICAL DISTRIBUTION FUNCTIONORIGINAL DISTRIBUTIONS

QUADRATICQUADRATICSTATISTICSSTATISTICS

+ + WEIGHTING WEIGHTING FUNCTIONFUNCTION

Sum/integral of all the distances

Page 12: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

The user is completely shieldedshielded from both statistical and computing complexity.

USERUSER

EXTRACTS THE ALGORITHM WRITING ONE LINE OF CODEEXTRACTS THE ALGORITHM WRITING ONE LINE OF CODE

TOOLKITTOOLKIT STATISTICALSTATISTICALRESULTRESULT

User’s point of viewUser’s point of view• Simple user layerSimple user layer• Only deal with AIDA objectsAIDA objects and choice of comparison algorithmcomparison algorithm

Page 13: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

Software testingSoftware testing

Rigorous software process adopted

Test process

Unit tests

Integration tests

System tests

Testing focuses primarily on the evaluation or assessment of qualityquality of the software product, guaranteeing its correctnesscorrectness and robustnessrobustness.

• finding and documenting defects in software quality• validating software product functions as designed• validating that the requirements have been implemented appropriately

Test result summaries are included as part of the documentation of the Toolkit release and are available on the web.

Page 14: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

• Weighted KS tests

• Weighted CVM tests• CVM approximation to a 2 (Tiku test)• Exact Anderson-Darling test

• Watson test• Watson approximation to a 2 (Tiku test)

With these tests the GoF Statistical Toolkit will be the most complete toolkit for the two-sample problem in physics as well as in the statisticsdomain.

Work in progress: new testsWork in progress: new tests

unbinned distributionsunbinned distributions

binned/unbinned binned/unbinned distributionsdistributions

unbinned distributionsunbinned distributions

supremum statisticssupremum statistics

quadratic statisticsquadratic statistics

Page 15: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

2 loses information in a test for unbinned distribution by grouping the data into cellsKac, Kiefer and Wolfowitz (1955) showed that Kolmogorov-Smirnov test

requires n4/5 observations compared to n observations for 2 to attain the same power

Cramer-von Mises and Anderson-Darling statistics are expected to be superior to Kolmogorov-Smirnov’s, since they make a comparison of the two distributions all along the range of x, rather than looking for a marked difference at one point

22 Supremum Supremum statistics statistics

teststests

Tests Tests containing a containing a

weight functionweight function< <

In terms of power:

IsIs 2 the most powerful algorithm?The power of a test is the

probability of rejecting the null hypothesis correctly

Page 16: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

Examples of practical applicationsExamples of practical applications

Page 17: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

MiMicroscopic validation of physics

p-va

lue

H0 REJECTION AREA

The three Geant4 models are equivalent

Physics models under test:• Geant4 Standard• Geant4 Low Energy – Livermore• Geant4 Low Energy – Penelope

Reference data:• NIST ESTAR - ICRU 37

Z

p-value stability study

Geant4 LowE PenelopeGeant4 LowE Penelope Geant4 StandardGeant4 StandardGeant4 LowE EEDLGeant4 LowE EEDLNIST - XCOMNIST - XCOM

Geant4 LowE PenelopeGeant4 LowE Penelope Geant4 StandardGeant4 Standard

Geant4 LowE EEDLGeant4 LowE EEDL

K. Amako, S. Guatelli, V. Ivanchenko, M. Maire, B. Mascialino, K. Murakami, P. Nieminen, L. Pandola, S. Parlati, A. Pfeiffer, M. G. Pia, M.

Piergentili, T. Sasaki, L. UrbanPrecision validation of Geant4 electromagnetic

physics

Page 18: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

Radioprotection Radioprotection applications in manned space missions

-Comparison of inflatable and conventional rigid habitat concepts: Effect of different shielding materialsdifferent shielding materials Effect of shielding thickness shielding thickness E.m.E.m. and hadronic interactionshadronic interactions

S. Guatelli, B. Mascialino, P. Nieminen, M. G. PiaRadioprotection for interplanetary manned missions

inflatable habitat

GCR

vacuum air

phantomAl structure/ inflatable structure + shielding

2.15 cm Al

Inflatable habitat + 10 cm water

Inflatable habitat + 5 cm water

4 cm Al

Energy deposit in the phantom by GCR p

S. Guatelli, B. Mascialino, P. Nieminen, M. G. PiaRadioprotection for interplanetary manned missions

thanks to Susanna GuatelliKS TEST

Page 19: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

2 not appropriate(< 5 entries in some

bins, physical information would be

lost if rebinned)

Anderson-Darling

Ac (95%) =0.752A. Mantero, B. Mascialino, P. Nieminen, M. G. Pia, A. Owens,

M. Bavdaz, A. PeacockA library for simulated X-ray emission from planetary surfaces

Test beam at BessyTest beam at BessyBepi-Colombo missionBepi-Colombo mission

Energy (keV)

Cou

nts

X-ray fluorescence spectrum in Iceand basalt(EIN=6.5 keV)

Very complex distributions

thanks to Alfonso Mantero

Page 20: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

Medical applications-IMRTMedical applications-IMRT

Range D p-value

-84 -60 mm 0.385 0.23-59 -48 mm 0.27 0.90-47 47 mm 0.43 0.1948 59 mm 0.30 0.8260 84 mm 0.40 0.10

Range D p-value

-56 -35 mm 0.26 0.89-34 -22 mm 0.43 0.42-21 21 mm 0.38 0.0822 32 mm 0.26 0.9833 36 mm 0.57 0.13

Kolmogorov-Smirnov testDistance (mm) Distance (mm)

% d

ose

% d

ose

F. Foppiano, B. Mascialino, M. G. Pia, M. PiergentiliGeant4 simulation of an accelerator head for intensity

modulated radiotherapy

thanks to Michela Piergentili

Page 21: Barbara MascialinoMonte Carlo 2005Chattanooga, April 19 th 2005 Monte Carlo 2005 - Chattanooga, April 2005 B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon,

Barbara Mascialino Monte Carlo 2005 Chattanooga, April 19th 2005

• This is a newnew up-to-dateup-to-date easy to handleeasy to handle and powerfulpowerful tool for statistical comparison in particle physics.

• It the first tool supplying such a variety of sophisticated and sophisticated and powerful statistical testspowerful statistical tests in HEP.

• ReleasedReleased and downloadable from the webdownloadable from the web.

• AIDAAIDA interfaces allow its integration in any other data analysis tool.

Applications in: Applications in: HEPHEP, , astrophysicsastrophysics, , medical physicsmedical physics, … , …

ConclusionsConclusions