
Validation & Critical Thinking - Uppsala University (xray.bmc.uu.se/kurs/BioinfX3/2008/13_validation.pdf)



Validation & Critical Thinking

Gerard J. Kleywegt, Uppsala University

Critical thinking

What is wrong here?

Critical thinking

And what is wrong here?

Critical thinking

What is wrong here?

(1) The tacR gene regulates the human nervous system

(2) The tacQ gene is similar to tacR but is found in E. coli

==> So, the tacQ gene regulates the E. coli nervous system!

Critical thinking

Of course there is a fine line between critical thinking and silliness …

Knowledge pyramid

Data

Information

Knowledge

Wisdom

Nobel Prize

• Processing
• Visualisation

• Analysis
• Interpretation
• Validation

• Insight
• Experience

• Swedish friends?
• Luck?
• Longevity


Data versus information

Data
• Facts
• Observations

Information
• Context
• Meaning
• Interpretation

ATOM   2567  N    PHE B 175     7.821  -25.530  -22.848  1.00   8.71
ATOM   2568  CA   PHE B 175     8.845  -25.172  -21.877  1.00   9.41
ATOM   2569  C    PHE B 175     9.449  -23.798  -22.169  1.00  10.02
ATOM   2570  O    PHE B 175    10.664  -23.613  -22.103  1.00  10.37
ATOM   2571  CB   PHE B 175     9.928  -26.251  -21.848  1.00   9.53
ATOM   2572  CG   PHE B 175    10.969  -26.137  -22.982  1.00  10.03
ATOM   2573  CD1  PHE B 175    12.356  -25.819  -22.988  1.00  10.51
ATOM   2574  CD2  PHE B 175    11.725  -27.211  -23.402  1.00  10.25
ATOM   2575  CE1  PHE B 175    11.821  -27.095  -22.869  1.00  11.17
ATOM   2576  CE2  PHE B 175    12.282  -26.086  -24.008  1.00  10.95
ATOM   2577  CZ   PHE B 175    10.953  -26.335  -23.622  1.00  11.38
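The step from data to information can be sketched in a few lines of Python: the raw numbers in an ATOM record only become meaningful once each field is given a name. This is a minimal sketch that assumes the fields are whitespace-separated, as they appear on the slide; real PDB files use fixed column positions, so a production parser should slice by column instead. The names `Atom`, `parse_atom_line` and `distance` are illustrative, not part of any library.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int        # atom serial number
    name: str          # atom name, e.g. "CA"
    res_name: str      # residue name, e.g. "PHE"
    chain: str         # chain identifier
    res_seq: int       # residue sequence number
    x: float           # orthogonal coordinates in Å
    y: float
    z: float
    occupancy: float
    b_factor: float    # temperature factor


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM record, assuming whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 11 or fields[0] != "ATOM":
        raise ValueError(f"not a simple ATOM record: {line!r}")
    return Atom(
        serial=int(fields[1]),
        name=fields[2],
        res_name=fields[3],
        chain=fields[4],
        res_seq=int(fields[5]),
        x=float(fields[6]),
        y=float(fields[7]),
        z=float(fields[8]),
        occupancy=float(fields[9]),
        b_factor=float(fields[10]),
    )


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms in Å."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


n = parse_atom_line("ATOM 2567 N PHE B 175 7.821 -25.530 -22.848 1.00 8.71")
ca = parse_atom_line("ATOM 2568 CA PHE B 175 8.845 -25.172 -21.877 1.00 9.41")
print(f"{n.res_name} {n.chain}{n.res_seq}: N-CA distance = {distance(n, ca):.2f} Å")
```

With names attached, a simple derived quantity such as the N-Cα distance (about 1.46 Å for the first two records) can be compared with what is expected for a covalent bond, which is already a small validation step.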

Karl Popper - falsifiability

A theory that is not falsifiable is not scientific

Example
• Theory: all swans are white
• New observation: black swan (Australia)
• New theory 1: Australian ornithologists are incompetent
• New theory 2: all swans except Cygnus atratus are white; C. atratus is black

Astrology versus astronomy

Occam’s razor

• Do not make more assumptions than strictly needed
• When you hear hoof beats, think horses, not zebras (unless you are in Africa!)
• KISS principle - Keep It Simple, Stupid
• Of two equivalent theories or explanations, all other things being equal, the simpler one is to be preferred

Maximum parsimony

Bioinformatics basics

Don’t always believe what databases / programs / lecturers tell you!
• They (almost) always give you some answer, but this can be misleading and is sometimes wrong

Don’t be a naïve user
• Garbage in, garbage out
• Statistical versus biological significance
• Use common sense!

Bioinformatics basics

What is the right question to ask?

Understand limitations of data, databases, search algorithms, alignment methods, prediction methods, etc.

Evaluate result: does it answer your question? Does it make sense?


Validation

Validation = establishing or checking the truth or accuracy of (something)
• Theory
• Hypothesis
• Model
• Assertion, claim, statement, observation

Integral part of scientific activity!

Science, errors & validation

[Diagram: Prior knowledge, Observations, Experiment, Hypothesis or Model, Predictions]

Precision versus accuracy

Precise, but not very accurate
Ex: π ~ 4.0053 ± 0.0001

Fairly accurate, but not very precise
Ex: π ~ 3.1 ± 0.1

Accurate and precise
Ex: π ~ 3.1416 ± 0.0001
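The three π examples can be checked numerically. The sketch below treats the stated uncertainty as the measure of precision and the distance from the true value as the measure of accuracy. The function name `assess` and the rule "the true value lies within two stated uncertainties" are assumptions made for illustration, not a standard definition.

```python
import math


def assess(estimate: float, uncertainty: float, true_value: float = math.pi) -> dict:
    """Compare an estimate and its stated uncertainty with the true value."""
    error = abs(estimate - true_value)          # accuracy: how far off are we?
    return {
        "error": error,
        "uncertainty": uncertainty,             # precision: how tight is the claim?
        # Illustrative rule of thumb (an assumption): the claim is consistent
        # with the truth if the error is at most two stated uncertainties.
        "consistent": error <= 2 * uncertainty,
    }


for estimate, uncertainty in [(4.0053, 0.0001), (3.1, 0.1), (3.1416, 0.0001)]:
    result = assess(estimate, uncertainty)
    print(f"{estimate} ± {uncertainty}: error = {result['error']:.5f}, "
          f"consistent with π: {result['consistent']}")
```

The first estimate has a tiny uncertainty but a large error, so its precision says nothing about its accuracy; only the third is both close to π and tightly bounded.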

Errors affect measurements

Random errors (noise)
• Affect precision
• Usually normally distributed
• Reduce by increasing number of observations

Systematic errors (bias)
• Affect accuracy
• Incomplete knowledge or inadequate design
• Reproducible

Gross errors (bloopers)
• Incorrect assumptions, undetected mistakes or malfunctions
• Sometimes detectable as outliers
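A small simulation illustrates why more observations help against random errors but not against systematic ones. This is a sketch under stated assumptions: the true value (100), the bias (2) and the noise level (1) are made-up numbers, and `simulate` and `summarise` are hypothetical helper names.

```python
import random
import statistics


def simulate(n: int, true_value: float = 100.0, bias: float = 0.0,
             noise_sd: float = 1.0, seed: int = 0) -> list:
    """Draw n measurements with normally distributed noise plus a constant bias."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


def summarise(values: list) -> tuple:
    """Return the mean and the standard error of the mean."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / len(values) ** 0.5
    return mean, sem


for n in (100, 10_000):
    mean, sem = summarise(simulate(n, bias=2.0))
    print(f"n = {n:6d}: mean = {mean:.3f} ± {sem:.3f} (true value 100.0)")
```

The standard error shrinks roughly as 1/√n, so the mean becomes more precise, but it converges on 102 rather than 100: the bias is reproducible and no amount of repetition removes it.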

Errors affect measurements

Bias (accuracy)

Precision (uncertainty; random error)

Errors affect measurements

How tall is Gerard?

200 203 202 203 202 201 203 80

Random error? Systematic error? Gross error?
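One way to look at these numbers is a robust outlier check based on the median and the median absolute deviation (MAD). This is a sketch, not a rule from the lecture: the cut-off of 3 scaled MADs and the function name `flag_outliers` are assumptions chosen for illustration.

```python
import statistics


def flag_outliers(values: list, threshold: float = 3.0) -> list:
    """Return values lying more than `threshold` scaled MADs from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median: nothing can be flagged this way
    scaled_mad = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [v for v in values if abs(v - med) > threshold * scaled_mad]


heights = [200, 203, 202, 203, 202, 201, 203, 80]
outliers = flag_outliers(heights)
kept = [h for h in heights if h not in outliers]

print("flagged as possible gross errors:", outliers)
print("mean of all values:      ", statistics.fmean(heights))
print("mean without the outlier:", statistics.fmean(kept))
```

The scatter among the values near 200 is the random error; the single value of 80 is a candidate gross error that pulls the plain mean far away from the rest. Whether all the measurements share a systematic error cannot be decided from these numbers alone.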


Science, errors & validation

[Diagram: Prior knowledge, Observations, Experiment, Hypothesis or Model, Predictions, Parameterisation, Optimised values; check marks show where each error type applies]

Random errors (precision)

Systematic errors (accuracy)

Gross errors (both)

Science not immune to Murphy’s Law!

Science, errors & validation

Prior knowledge, Observations, Experiment

Hypothesis or Model

Predictions

Fit? Explain?

Quality? Quantity? Inf. content? Reliable?

Experiments

Correct?

Independent observations

Predict?

Other prior knowledge

Fit?

Structure validation

What type of residue is this? What is wrong with it? How did it end up in the PDB?

Structure validation

Should we trust the PDB?

• Structures are based on experimental data
• Amount of data differs
• Structures are interpretations of data
• PDB must accept all depositions

Resolution

Low resolution: little detail

High resolution: much detail


Resolution

1ISR: 4.0 Å
1EA7: 0.9 Å

Interpretation

Structure validation

Users of structures must make sure that these are reliable for their purposes

Ramachandran plot

Fit of model and electron density (http://eds.bmc.uu.se/)

Validation tutorial: http://xray.bmc.uu.se/embo2001/modval/

Torsion angles

Dihedral or torsion angle - given 4 sequential, bonded atoms A-B-C-D
• Dihedral = angle between the planes ABC and BCD
• Torsion = looking at the projection along bond B-C, the angle over which one has to rotate A to bring it on top of D (clockwise = positive)
• note: torsion (ABCD) = torsion (DCBA)
• phi = torsion (C[i-1]-N[i]-Cα[i]-C[i])
• psi = torsion (N[i]-Cα[i]-C[i]-N[i+1])

Validation alert: The arrow points the wrong way!!!
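The torsion definition can be turned into a short calculation. The sketch below uses the common atan2 formulation on the three bond vectors, with the sign chosen so that a clockwise rotation (looking along B-C) is positive. The helper names `sub`, `dot`, `cross` and `torsion`, and the test coordinates, are illustrative and not taken from the slides.

```python
import math
from typing import Sequence

Point = Sequence[float]


def sub(p: Point, q: Point) -> tuple:
    """Vector from q to p."""
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def dot(u: Point, v: Point) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def cross(u: Point, v: Point) -> tuple:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def torsion(a: Point, b: Point, c: Point, d: Point) -> float:
    """Torsion angle A-B-C-D in degrees, in the range -180 to 180."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1 = cross(b1, b2)          # normal of plane ABC
    n2 = cross(b2, b3)          # normal of plane BCD
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# B-C lies along the z axis; A sits on the x axis.
a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
d = (0.0, 1.0, 1.0)

print("torsion(ABCD) =", torsion(a, b, c, d))
print("torsion(DCBA) =", torsion(d, c, b, a))
```

For a protein backbone, phi and psi follow by passing the four atoms listed above (for example C[i-1], N[i], Cα[i], C[i] for phi) in that order. Reversing the atom order gives the same value, in line with the note that torsion (ABCD) = torsion (DCBA).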

Ramachandran plot

Steric clashes (pink dashed lines) develop during rotation around phi (left) and psi (right)

Only certain phi, psi combinations are sterically favourable/allowed: Ramachandran plot

Ramachandran plot

• Favourable regions in the Ramachandran plot
• Good models have very few residues outside these regions
• If there are any, there is usually a good reason


Ramachandran plot

Good model:
• Few outliers
• Strong concentration in core regions

PDBsum

Same structure, different data

Global quality important

Local quality also
• Active site
• Ligand
• Substrate analogue
• Metal-binding site
• Important loop
• …

Electron density fit

Good fit of model and density

Electron density fit

Poor fit of model and density

Electron density fit


PDBreport

http://swift.cmbi.ru.nl/gv/pdbreport/


Oops!

Playing the Blame Game …

Why do errors make it into the literature and the PDB? Who is to blame?

Playing the Blame Game …

Suggestions from students
• Cold Spring Harbor course, 2005
• Copenhagen University course, 2006

Playing the Blame Game …

• Crystallographer (ignorance, lack of experience, incompetence, incorrect preconceptions/bias, cheating, laziness, “science by mouse-click”, stress, can’t be bothered to fix minor problems, no validation)
• PI (pressure to publish/graduate fast, career interest, competition, grant writing, insufficient supervision)
• Referees/Editors (lazy, inadequate reviewing routines, no access to raw data, “validation by senior author name”, lack of experience)
• Software (misses or causes errors)
• PDB (doesn’t check)
• External (competition/danger of being scooped)
• Nature (limitations of the technique/resolution, errors hard to detect, poor data)