The History of
Keysteps of Computational Statistics
Wilfried Grossmann, University of Vienna, Austria
Michael G. Schimek, Medical University of Graz, Austria
Peter Paul Sint, Austrian Academy of Sciences, Vienna
30 Years
2
1974: Department of Statistics and Informatics, University of Vienna
Peter Paul, a "senior" Assistant Professor
3
Department of Statistics and Informatics,
University of Vienna
1974 A few years after
Wilfried, a "junior" Assistant Professor
4
1974 A few years after
Michael, a first year student
University of Vienna
Gerhart Bruckmann
5
Outline of Presentation
The Beginning of COMPSTAT
• Early statistical computing
• The institutional environment
• The first symposium and the Compstat Society
Developments in Computational Statistics (CS)
• CS and statistical theory
• CS and algorithms
• CS and computer science
• CS and application
The COMPSTAT Symposia
6
The Beginning of COMPSTAT
7
Early Computational Statistics
• The Beginnings in Vienna
– Institute of Statistics
  • Part of the Law Faculty - S. Sagoroff - Leipzig/Sofia/USA/Berlin//Vienna - Energy Balances
  • First computer: first generation machine
    – Paid for by the Rockefeller Foundation, 1960
    – Arrival of the 'Electronic Brain' (1st generation)
      » Never again similar enthusiasm
  • Institute of Advanced Studies - Ford Institute
    – Statistical machines - card counting - >2nd generation
  • Replaced by IBM /360-44 - 3rd gen. SSP / SPSS
– Computing Center
8
Statistics-Computational
One year Biostatistics department, Oxford University
Still: Not strongly integrated in international statistical community - Main contacts ISI: Central Statistical Office, Sagoroff
1973 ISI session in Vienna - emphasis on applications - computational methods rare
Bring statisticians with our interests to Vienna
Encouragement by publisher Arnulf Liebing /Physica/
What is specific to our department?
Concept of Computational Statistics - Johannes Gordesch (Math) - Peter Paul Sint (Physics)
9
First COMPSTAT Call
COMPSTAT 1974
- Gerhart Bruckmann - Local fame as analyst of voting results during election nights
- Leopold Schmetterer (successor of Sagoroff) - Internationally known Mathematical Statistician
(Franz Ferschl, incoming professor of statistics, new editor of Metrika - added as an editor by the publisher)
10
S. Sagoroff and M. Tantilov
11
First COMPSTAT Editors
12
Preface of the first Proceedings
13
Logic of the Logo
14
J. Gordesch at Compstat76 Berlin
15
Coming of Age
• International from the start
• Compstat Society since Berlin
• Leiden NL 1978: Integration into IASC
• Edinburgh GB 1980 - Toulouse F 1982
• Eastern Europe needed Politics ISI-IASC
• Local Projects redirected: Prague 1984
• Rome I 1986 - Copenhagen DK 1988
• Dubrovnik YU 1990 - Neuchâtel CH 1992
16
Prague 1984
17
Developments in Computational Statistics
18
Computational Statistics
• What is Computational Statistics?
– A question raised many times within the community at the end of the 1980s and the beginning of the 1990s
19
Computational Statistics
• Working definition (A. Westlake)
Computational Statistics is related to the advance of statistical theory and methods through the use of computational methods. This includes both the use of computation to explore the impact of theories and methods, and development of algorithms to make these ideas available to users.
20
Computational Statistics
[Diagram: Computational Statistics at the centre, surrounded by Statistical Theory, Algorithms, Applications and Computer Science; connecting areas: Numerical Analysis, Statistical Software, Modelling, Seminumerical Algorithms]
21
Computational Statistics and Statistical Theory
• The statistical journey in the 20th century
• The Theory Era
• The Methodology Era
22
Computational Statistics and Statistical Theory
• The statistical journey in the 20th century
– B. Efron: Statistics in the 20th century is a journey between three poles:
  • Applications
  • Mathematics
  • Computation
23
Computational Statistics and Statistical Theory
• The Theory Era (Pearson, Neyman, Fisher, Wald)
– From models for solving practical problems towards a mathematical decision theoretic framework
– Based on optimality principles
– Application is based on computations feasible for paper and pencil or mechanical computing devices
24
Computational Statistics and Statistical Theory
• Modelling Era (1)
– Tukey's paper about the future of data analysis (1962) as a turning point from mathematics towards computation
  • Confirmatory versus exploratory analysis
  • Dynamics of data analysis
  • "Robustness"
  • Importance of Graphics
25
Computational Statistics and Statistical Theory
• Modelling Era (2)
– Important developments in the modelling era
  • Nonparametric and Robust Methods
  • Kaplan-Meier and Proportional Hazards
  • Logistic Regression and GLM
  • Jackknife and Bootstrap
  • EM and MCMC
  • Empirical Bayes and James-Stein Estimation
26
Computational Statistics and Statistical Theory
• Modelling Era (3)
– The modelling era is characterized by a strong interplay between statistical theory and computational statistics
– The computer as a workbench for statistical experiments (going back to v. Neumann and S. Ulam)
• Passive usage: Studying feasibility of statistical theory by simulation
• Active usage: Obtain results which cannot be computed by conventional numerical algorithms
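A minimal sketch of the "active usage" idea, assuming the bootstrap as the example: the standard error of a sample median has no simple closed form, but resampling gives an estimate. The function name `bootstrap_se_median`, its parameters and the simulated data are hypothetical and chosen only for illustration.

```python
import random
import statistics


def bootstrap_se_median(sample, n_boot=2000, seed=0):
    """Estimate the standard error of the median by resampling with replacement.

    Illustrative helper, not taken from the presentation.
    """
    rng = random.Random(seed)  # private generator, so results are reproducible
    n = len(sample)
    medians = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        medians.append(statistics.median(resample))
    # The spread of the resampled medians estimates the standard error.
    return statistics.stdev(medians)


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(0.0, 1.0) for _ in range(50)]  # simulated data
    print(bootstrap_se_median(data))
```

The same loop, run over many simulated datasets rather than one observed sample, would correspond to the "passive usage" of checking a theoretical result by simulation.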
27
Computational Statistics and Statistical Theory
• COMPSTAT was probably not always at the frontier of these developments, but the programs and the proceedings reflect quite well the dynamics of the subject in the Modelling Era
28
Computational Statistics and Algorithms
• Numerical Algorithms
– Matrix Computation, Optimization
• Random Numbers / Monte Carlo
• Semi-numerical Algorithms
– Sorting, Searching, Combinatorial Methods, Graph Theoretic Algorithms, …
• Graphical Algorithms
• Symbolic Computation (?)
• Mathematical vs. Statistical Modelling
29
Computational Statistics and Algorithms
• Statistics and Numerical Algorithms (1)
– Fast Fourier Transform (Tukey)
– Recursive Algorithms and Filtering (Kalman Filter)
(Neither topic seems to be a core topic in computational statistics)
30
Computational Statistics and Algorithms
• Statistics and Numerical Algorithms (2)
– Adaptation of optimization techniques (e.g. scoring methods)
– Behaviour of optimization methods in statistical context (numerical convergence vs. stochastic convergence concepts)
Implicit consideration at COMPSTAT
31
Computational Statistics and Algorithms
• Statistics and Random Numbers / Monte Carlo
– Generation of random numbers was (and is) probably more a topic of mathematics (number theory) and computer science
  • In the beginning of COMPSTAT there was also some connection to simulation
– Genuine application of Monte Carlo methods in connection with new developments of statistical theory (e.g. MCMC)
32
Computational Statistics and Algorithms
• Statistics and semi-numerical algorithms
– Applications in context of nonparametric statistics and analysis of tabular data
  • Feasibility of conditional inference for logistic models
– New developments on the borderline between statistics and computer science
  • Data Mining as a new statistical modelling paradigm
COMPSTAT was open towards these developments and integrated them into the program
33
Computational Statistics and Algorithms
• Statistics and Graphical Algorithms
– Development rather complementary to the developments of computer science
– Important issues (L. Wilkinson):
  • Graphics are not only a tool for displaying results but rather a tool for perceiving relationships
  • Dynamic graphics as important tool for data analysis
  • Graphics are a means of model formalization reflecting quantitative and qualitative traits of its variables
Represented quite well at COMPSTAT
34
Computational Statistics and Algorithms
• Mathematical vs. Statistical Modelling
– Emphasis on different methods (e.g. Differential Equations)
– Different modelling environments (J. Nelder)
  • Data structures in statistics
  • Exploratory nature of statistical analysis (statistical analysis cycle)
  • Competence of users
35
Computational Statistics and Computer Science
• Developments in Statistical Software
• Development of Statistical Languages
• Developments in Statistical Database Management
36
Computational Statistics and Computer Science
• Developments in Statistical Software (1)
– From numerical subroutines towards statistical packages
– Main goals:
  • Taking into account the peculiarities of statistical data analysis
  • Usage of current hardware developments
37
Computational Statistics and Computer Science
• Developments in Statistical Software (2)
– From the beginning, COMPSTAT was an important forum for the development of statistical software
  • The proceedings from the early 1980s show numerous software developments for specific statistical models
  • There was always some tension between the presentation of commercial software developments and the scientific character of the conference
38
Computational Statistics and Computer Science
• Development of Statistical Languages (1)
– GLIM was probably the first genuine statistical modelling language
  • Present at COMPSTAT from the very beginning
39
Computational Statistics and Computer Science
• Development of Statistical Languages (2)
– The S language set up a new paradigm for computing which is of interest also outside statistical applications
  • Contribution to Computer Science honoured by the ACM Software System Award for J. Chambers
Although it started as early as 1976, it took a long time to enter the COMPSTAT community
40
Computational Statistics and Computer Science
• Development of Statistical Languages (3)
– R quickly gained popularity within COMPSTAT due to its free availability and the effective organisation of CRAN
– Omegahat: An umbrella for open source projects in computational statistics covering not only statistical computation but also other important aspects in distributed computing
41
Computational Statistics and Computer Science
• Development of Statistical Languages (4)
– XLISP-Stat as proof of concept (in particular for animated graphics)
– XploRe as Java based production system
42
Computational Statistics and Computer Science
• Statistical Database Management
– Main challenge is appropriate usage of the developments in database technology in statistical context
  • Combination of statistical data structures and statistical processing activities with conceptual data models
  • Representation of tabular data
  • Metadata as a tool to capture the complexity of statistical data
A small but active group inside the COMPSTAT community from the very beginning
43
Computational Statistics and Applications
• Challenges for Computational Statistics
Rather independent of application area
– Data
  • Data capture
  • Data structures
  • Data size
– Analysis Process
  • Analysis strategies
  • The role of the statistician in the computer age
44
Computational Statistics and Applications
• Data challenges (1)
– Contributions towards data challenges occur occasionally at COMPSTAT
• Current problems
– Data capture
  • Data capture tools are rather a side branch of computational statistics and more connected to official statistics
  • Data streams are a new challenge that has so far attracted little attention in the computational statistics community
45
Computational Statistics and Applications
• Data challenges (2)
– Data structures
  • New problems (e.g. in connection with data mining) raise questions with respect to the applicability of the basic statistical analysis paradigm (population, sample, measurement process)
– Data size
  • Handling huge datasets
At the moment, none of these challenges seems to be a core topic of computational statistics
46
Computational Statistics and Applications
• Analysis process
– Analysis strategies
  • The question of formalization of analysis strategies was a hot topic at the COMPSTAT conferences at the end of the 1980s, but there was limited success
– The role of statisticians in the computer age
  • Is progress in computational statistics an enabler for statisticians, or does it lead towards a de-skilling of the statistical profession?
47
The COMPSTAT Symposia
48
A full set of COMPSTAT proceedings (one statistical outlier removed)
Do you see the CSDA volumes in the background?
Here they are!
49
The COMPSTAT Symposia I
Symposium   Year  Organizers                      # Submissions  # Papers I/C  # Participants
Vienna      1974  Sint                                           50            100
Berlin      1976  Gordesch, Naeve                                58            180
Leiden      1978  Corsten, Hermans                               68            310
Edinburgh   1980  Barrit, Wishart                 250            4/82          750
Toulouse    1982  Caussinus, Ettinger, Tomassone  250            15/60         500
50
The COMPSTAT Symposia II
Symposium   Year  Organizers                      # Submissions  # Papers I/C  # Participants
Prague      1984  Havranek, Sidak, Novak          300            7/65          ???
Rome        1986  De Antoni, Lauro, Rizzi         300            14/60         900
Copenhagen  1988  Edwards, Raun                   300            9/51          800
Dubrovnik   1990  Momirovic                       115            6/43          180
Neuchâtel   1992  Dodge, Whittaker                115            11/115        200
51
COMPSTAT 1994 Vienna and Satellite Meeting on Smoothing Semmering (World Cultural Heritage)
Randy Eubank
Andrew Westlake, Allmut Hörmann, Wolfgang Härdle
52
On the track from Vienna to Semmering in the Austrian Alps (historical train)
The organizer
53
Satellite Meeting on Smoothing
We finally arrived at the mountain spa Semmering
Antoine de Falguerolles and the organizer at the opening
54
The COMPSTAT Symposia III
Symposium              Year  Organizers                  # Submissions  # Papers I/C  # Participants
Vienna                 1994  Dutter, Grossmann, Schimek  200            11/60         380
Semmering (Satellite)                                    30             7/26          50
Barcelona              1996  Prat                        250            13/56         300
Bristol                1998  Payne, Green                180            12/58         370
Utrecht                2000  Van der Heijden, Bethlehem  250            15/60         220
Berlin                 2002  Härdle                      220            9/90          260
55
The COMPSTAT proceedings from the Vienna and Semmering meetings
Model of Vienna University
Kastalia Fountain