
Establishing Scientific Facts

Victoria Stodden
Department of Statistics
Columbia University

Setting Time Aright, Copenhagen
September 2011

Outline

1. The Changing Concept of a Scientific Fact
   • The Scientific Record
   • Scientific Research is Changing
   • Examples
   • The Credibility Crisis

2. Survey of the Machine Learning Community

3. Responses and Open Questions


The Concept of a Scientific Fact

In Opus Tertium (1267), Roger Bacon distinguishes experimental science by:

1. verification of conclusions by direct experiment,

2. discovery of truths unreachable by other approaches,

3. investigation of the secrets of nature, opening us to a knowledge of past and future.

• He described a repeating cycle of observation, hypothesis, experimentation, and the need for independent verification,

• and he recorded his experiments (e.g. the nature and cause of the rainbow) in enough detail to permit reproducibility by others.


Inductive Scientific Reasoning

In Novum Organum (1620), Francis Bacon proposes:

1. the gathering of facts, by observation or experimentation,

2. verification of general principles.

“There are and can be only two ways of searching into and discovering truth. The one flies from the senses and particulars to the most general axioms, and from these principles, the truth of which it takes for settled and immoveable. ... The other derives axioms from the senses and particulars, rising by a gradual and unbroken ascent, so that it arrives at the most general axioms last of all. This is the true way, but as yet untried.”


The Scientific Record

• The Royal Society of London founded 1660 (the “Invisible College”),

• members discussed Francis Bacon’s “new science” from 1645,

• Society correspondence reviewed by the first Secretary, Henry Oldenburg,

• Oldenburg became the founder, editor, author, and publisher of Philosophical Transactions, launched in 1665.


Scientific Research is Changing

Scientific computation is becoming central to the scientific method:

• changing how research is conducted in many fields,

• changing the nature of how we learn about our world.

Conjecture: Today’s academic scientist probably has more in common with a large corporation’s information technology manager than with a philosophy or English professor at the same university.


I. Examples of Pervasiveness of Computational Methods

• For example, in statistics:

  JASA June   Computational Articles   Code Publicly Available
  1996        9 of 20                  0%
  2006        33 of 35                 9%
  2009        32 of 32                 16%
  2011        29 of 29                 21%

• Social network data and the quantitative revolution in social science (Lazer et al. 2009);

• Computation reaches into traditionally nonquantitative fields: e.g. the WordHoard project at Northwestern, examining word distributions by Shakespearean play.


1. Climate Simulation: Community Climate Models



2. High Energy Physics: Large Hadron Collider

• 4 LHC experiments at CERN: 15 petabytes produced annually,

• data shared through a grid to mobilize computing power,

• Director-General of CERN (Heuer): “Ten or 20 years ago we might have been able to repeat an experiment. They were simpler, cheaper and on a smaller scale. Today that is not the case. So if we need to re-evaluate the data we collect to test a new theory, or adjust it to a new development, we are going to have to be able to reuse it. That means we are going to need to save it as open data.” Computer Weekly, August 6, 2008


3. Dynamic modeling of macromolecules: Sali Lab, UCSF



4. Mathematical “proof” by simulation and grid search

[Image: cover of Phil. Trans. R. Soc. A, vol. 367, no. 1906, pp. 4235–4470, 13 November 2009: the theme issue “Statistical challenges of high-dimensional data,” compiled and edited by D. L. Banks, P. J. Bickel, I. M. Johnstone and D. M. Titterington.]


Evidence of a Problem

Relaxed practices regarding the communication of computational details are creating a credibility crisis in computational science: not only among scientists, but also as a basis for policy decisions and in the public mind.

Recent prominent examples:

• Climategate, 2009,

• microarray-based clinical trials recently terminated at Duke University.


Clinical trials based on flawed genomic studies

Timeline:

• Potti et al. (2006) Nature Medicine; (2006) NEJM; (2007) Lancet Oncology; (2007) Journal of Clinical Oncology: evidence of genomic signatures to guide the use of chemotherapeutics (all since retracted),

• Coombes, Wang, and Baggerly at M.D. Anderson Cancer Center cannot replicate, and find simple flaws: genes misaligned by one row, column labels flipped, genes repeated and missing from the analysis (a toy illustration of the off-by-one error appears after this timeline),

• 2007: correspondence and a supplementary report submitted to the Journal of Clinical Oncology; publication declined. 2008: Nature Medicine declines their correspondence,

• Clinical trials initiated in 2007 (Duke), 2008 (Moffitt).
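To make that first flaw concrete, here is a minimal sketch (my own construction in Python, on synthetic data, not the actual Duke or M.D. Anderson code) of how a one-row offset between a gene-label list and an expression matrix silently attributes every measurement to the wrong gene:

```python
# Hypothetical illustration with synthetic data: gene labels slip by one
# row against the expression matrix, so every downstream "signature"
# gene is reported under its neighbor's name.
import numpy as np

rng = np.random.default_rng(0)
genes = [f"GENE_{i}" for i in range(5)]
expression = rng.normal(size=(5, 3))   # rows = genes, columns = samples

# The reported flaw: labels offset by one row relative to the data matrix.
shifted_labels = genes[1:] + genes[:1]

for correct, reported in zip(genes, shifted_labels):
    print(f"measurements for {correct} are attributed to {reported}")

# A cheap sanity check before any modeling: assert that a few control
# genes with known behavior sit in the rows their labels claim.
```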



Clinical trials based on flawed genomic studies

• Duke launches an internal investigation Sept 2009; all three trials suspended in Oct 2009,

• Oct 2009: results reported as validated, regardless of the errors, because the data were blinded (later found not to be true),

• Jan 2010: Duke clinical trials resume, with patients allocated to treatment and control groups. “Neither the review nor the raw data are being made available at this time.”

• July 2010: 33 prominent biostatisticians write to Varmus as head of the NCI urging suspension of the trials and an examination of standards of review, including reproducibility,

• Sept 2010: IOM committee “Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials” formed,

• Nov 2010: Potti resigns and the clinical trials are terminated.


Controlling Error is Central to Scientific Progress

“The scientific method’s central motivation is the ubiquity of error - the awareness that mistakes and self-delusion can creep in absolutely anywhere and that the scientist’s effort is primarily expended in recognizing and rooting out error.” David Donoho et al. (2009)


The Third Branch of the Scientific Method

• Branch 1: Deductive/Theory, e.g. mathematics, logic,

• Branch 2: Inductive/Empirical, e.g. the machinery of hypothesis testing, statistical analysis of controlled experiments,

• Branch 3? Large-scale extrapolation and prediction, using simulation and other data-intensive methods.


Toward a Resolution of the Credibility Crisis

• Typical scientific communication doesn’t include sufficient detail for reproducibility, i.e. the code and data that generated the findings.

• Most published computational scientific results today are nearly impossible to replicate.

Thesis: Computational science cannot be elevated to a third branch of the scientific method until it generates routinely verifiable knowledge. (Donoho, Stodden, et al. 2009)

Sharing of underlying code and data is a necessary part of this solution, enabling Reproducible Research; a minimal sketch of what that looks like in practice follows.
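The sketch below (in Python; the script structure, seed value, and record fields are illustrative assumptions, not anything prescribed in the talk) shows one way to make a computational finding routinely verifiable: fix the random seed, regenerate the result from raw inputs with a single command, and log the environment alongside the output.

```python
# A minimal, illustrative "reproducible by construction" script: one
# command regenerates the result, and a record of what produced it
# (seed, library versions) travels with the number itself.
import json
import platform
import numpy as np

SEED = 2011  # fixed seed: every rerun yields identical output

def analysis(rng: np.random.Generator) -> float:
    """Stand-in for the real computation: estimate a mean from noisy draws."""
    data = rng.normal(loc=1.0, scale=2.0, size=10_000)
    return float(data.mean())

if __name__ == "__main__":
    record = {
        "result": analysis(np.random.default_rng(SEED)),
        "seed": SEED,
        "python": platform.python_version(),
        "numpy": np.__version__,
    }
    print(json.dumps(record, indent=2))
```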



Survey of Machine Learning Community (Stodden 2010)

Question: Why isn’t reproducibility practiced more widely? The answer builds on the literature on free revealing and open innovation in industry, and on the sociology of science.

• Sample: American academics registered at the machine learning conference NIPS.

• Respondents: 134 responses from 593 requests (∼23%).


Top Reasons Not to Share

Code   Reason                                   Data
77%    Time to document and clean up            54%
52%    Dealing with questions from users        34%
44%    Not receiving attribution                42%
40%    Possibility of patents                   -
34%    Legal barriers (i.e. copyright)          41%
-      Time to verify release with admin        38%
30%    Potential loss of future publications    35%
30%    Competitors may get an advantage         33%
20%    Web/disk space limitations               29%



Top Reasons to Share

Code   Reason                                   Data
91%    Encourage scientific advancement         81%
90%    Encourage sharing in others              79%
86%    Be a good community member               79%
82%    Set a standard for the field             76%
85%    Improve the caliber of research          74%
81%    Get others to work on the problem        79%
85%    Increase in publicity                    73%
78%    Opportunity for feedback                 71%
71%    Finding collaborators                    71%


Grassroots Efforts in Many Fields, and Policy Changes

Independent efforts by researchers:

• AMP 2011: “Reproducible Research: Tools and Strategies for Scientific Computing”
• AMP / ICIAM 2011: “Community Forum on Reproducible Research Policies”
• SIAM Geosciences 2011: “Reproducible and Open Source Software in the Geosciences”
• ENAR International Biometric Society 2011: Panel on Reproducible Research
• AAAS 2011: “The Digitization of Science: Reproducibility and Interdisciplinary Knowledge Transfer”
• SIAM CSE 2011: “Verifiable, Reproducible Computational Science”
• Yale 2009: Roundtable on Data and Code Sharing in the Computational Sciences
• ACM SIGMOD conferences
• ...

Policy changes:

• NSF/OCI report on Grand Challenge Communities (Dec 2010)
• NSF report “Changing the Conduct of Science in the Information Age” (Aug 2011)
• IOM “Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials”
• NIH and NSF: multiple requests for input on data policies
• Journal policy movement toward code and data requirements (e.g. Science, Feb 2011)
• ...


Popular Press

• “The Truth Wears Off,” The New Yorker, Dec 2010: asserts the ‘discovery’ of a mysterious effect by which replicated experiments decrease in significance level.

  - “it appears that nature often gives us different answers”
  - evidence provided in the article:
    - tests on three schizophrenia drugs,
    - Professor Schooler’s inability to replicate his own research results,
    - his colleagues’ assurances that this happens ‘all the time,’
    - ESP experiments from the 1930s,
    - tests for symmetry in sex selection,
    - temporal trends in hundreds of ecology papers.

Question: why bias the publication of results towards ones that agree with previously published results? (Merton’s proposed scientific norm of Universalism.) A toy simulation of this selection effect follows.
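As a hedged illustration (a toy Python simulation of my own, not an analysis from the article): if journals publish only results that clear p < 0.05, the published effect estimates are selected for being large, and honest replications of the same design regress back toward the true effect, a “decline” with nothing mysterious behind it.

```python
# Toy simulation of publication bias producing a "decline effect": many
# small studies of a modest true effect; only significant ones are
# "published"; unbiased replications then look like the effect shrank.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect, n, n_studies = 0.2, 30, 5_000

published, replications = [], []
for _ in range(n_studies):
    sample = rng.normal(true_effect, 1.0, size=n)
    t, p = stats.ttest_1samp(sample, 0.0)
    if p < 0.05 and t > 0:                 # only "positive" results are published
        published.append(sample.mean())
        # a faithful replication of the same design, free of selection
        replications.append(rng.normal(true_effect, 1.0, size=n).mean())

print(f"true effect:               {true_effect:.2f}")
print(f"mean published estimate:   {np.mean(published):.2f}")     # inflated
print(f"mean replication estimate: {np.mean(replications):.2f}")  # near truth
```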



Popular Press

• “Lies, Damned Lies, and Medical Science,” The Atlantic, Nov 2010:

  - profile of the work of John Ioannidis, Stanford University School of Medicine,
  - exposure of bias and flawed statistical reasoning in medical research,
  - decline effect due to initial ‘exaggerations’ of the results and researcher error,
  - misinterpretation of p-values, artificial lowering of p-values (a small demonstration follows).
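One mechanism for artificially lowered p-values is “optional stopping”: testing repeatedly as the data accumulate and stopping the first time p < 0.05. This small Python sketch (my illustration, not from the article) shows that under a true null this inflates the false-positive rate well past the nominal 5%:

```python
# Optional stopping under a true null: peek at the p-value every 10
# observations and stop at the first p < 0.05. The realized
# false-positive rate far exceeds the nominal 5% of a single fixed-n test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_experiments, max_n = 2_000, 100

false_positives = 0
for _ in range(n_experiments):
    data = rng.normal(0.0, 1.0, size=max_n)   # null is true: zero effect
    for n in range(10, max_n + 1, 10):        # interim look every 10 points
        if stats.ttest_1samp(data[:n], 0.0).pvalue < 0.05:
            false_positives += 1
            break

print(f"nominal rate: 5.0%, observed: {100 * false_positives / n_experiments:.1f}%")
```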



Open Questions Regarding Open Data and Code

• Massive codes or datasets, software support, streaming data,

• Tools for ease of implementation, i.e. data provenance and workflow (“progress depends on artificial aids becoming so familiar they are regarded as natural,” I. J. Good, 1958),

• Taleb Effect: scientific discoveries as (misused) black boxes,

• nefarious uses? public misinterpretation?

• black boxes and opacity in software (why the traditional methods section is inadequate; massive codebases),

• lock-in: calcification of ideas in software?

• independent replication discouraged?

• policy-maker engagement: finding support for our norms.