Humanizing bioinformatics Jan Aerts Assistant Professor - ESAT/SCD BioData Analysis & Visualization Faculty of Engineering Leuven University [email protected] @jandot


DESCRIPTION

In this talk, I explain the need for basic visualization know-how in bioinformatics.


Page 1: Humanizing bioinformatics

Humanizing bioinformatics

Jan Aerts
Assistant Professor - ESAT/SCD
BioData Analysis & Visualization
Faculty of Engineering
Leuven University

[email protected]
@jandot

Page 2: Humanizing bioinformatics

whoami

Leuven

Page 3: Humanizing bioinformatics

whoami

Wageningen

Page 4: Humanizing bioinformatics

whoami

Roslin

Page 5: Humanizing bioinformatics

whoami

Hinxton

Page 6: Humanizing bioinformatics

whoami

Leuven

Page 7: Humanizing bioinformatics

why “humanizing bioinformatics”?

Page 8: Humanizing bioinformatics

what I’ll talk about

- scientific research paradigms

- big & complex data

- what about the user?

- data visualization

Page 9: Humanizing bioinformatics

scientific research throughout time

Page 10: Humanizing bioinformatics

Science Paradigms

1st 1,000s years ago empirical

2nd 100s years ago theoretical

3rd last few decades computational

4th today data exploration

Jim Gray

Page 11: Humanizing bioinformatics

Science Paradigms

1st 1,000s years ago empirical

2nd 100s years ago theoretical

3rd last few decades computational

4th today data exploration

Jim Gray

computational biology

bioinformatics

Page 12: Humanizing bioinformatics

ever bigger datasets

ever more complicated mining algorithms

Page 13: Humanizing bioinformatics

case in point: genome sequencing

Page 14: Humanizing bioinformatics

why do we sequence?

Page 15: Humanizing bioinformatics

variation discovery

transcriptionally active sites

gene expression

copy number variation

miRNA expression & discovery

protein-DNA interactions

alternative splicing

Page 16: Humanizing bioinformatics

coverage

reads

polymorphisms

gene model

single nucleotide polymorphisms

Page 17: Humanizing bioinformatics

Robberecht et al, 2010

Molecular Biology of the Cell, 4th Edition

structural variation

Page 18: Humanizing bioinformatics
Page 19: Humanizing bioinformatics
Page 20: Humanizing bioinformatics

Human Genome Project

Page 21: Humanizing bioinformatics

automate, automate, automate

Page 22: Humanizing bioinformatics

HGP: 15 years, $3 billion, tens of labs => 1 genome

now: 1 week, $5000, 1 technician => 1 genome
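
To put the two figures above side by side, here is a small back-of-the-envelope calculation. It uses only the numbers on the slide; converting 15 years to weeks at 52 weeks per year is a simplifying assumption.

```python
# Figures taken from the slide; 52 weeks/year is an approximation.
hgp_cost_usd = 3_000_000_000
now_cost_usd = 5_000
hgp_weeks = 15 * 52
now_weeks = 1

cost_ratio = hgp_cost_usd / now_cost_usd   # how many times cheaper
time_ratio = hgp_weeks / now_weeks         # how many times faster

print(f"cost: {cost_ratio:,.0f}x cheaper")  # cost: 600,000x cheaper
print(f"time: {time_ratio:,.0f}x faster")   # time: 780x faster
```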

Page 23: Humanizing bioinformatics

genome sequencing throughput

Mardis, 2010

Page 24: Humanizing bioinformatics

genome sequencing throughput

“next-generation” sequencing platforms

Mardis, 2010

Page 25: Humanizing bioinformatics

NHGRI

Page 26: Humanizing bioinformatics

Metzker et al, 2010

Page 27: Humanizing bioinformatics

big throughput => big data

Page 28: Humanizing bioinformatics

advanced data structures

Page 29: Humanizing bioinformatics

advanced data mining

support vector machine recursive feature elimination

manifold learning

adaptive cascade sharing trees

Page 30: Humanizing bioinformatics

“Dammit Jim, I’m a doctor, not a bioinformatician!”

Christophe Lambert

Page 31: Humanizing bioinformatics

“Dammit Jim, I’m a doctor, not a bioinformatician!”

We’re alienating the user...

too much data

blind trust (?) in bioinformatician

Page 32: Humanizing bioinformatics

but...

what’s the question?

what parameters should I use?

can I trust this output?

I can’t wrap my head around this...

Page 33: Humanizing bioinformatics

what’s the question?

4th paradigm

question -> hypothesis -> generate data

Page 34: Humanizing bioinformatics

what’s the question?

4th paradigm

question -> hypothesis -> generate data

generate data -> see what we can do with it

Page 35: Humanizing bioinformatics

Gene interaction data: “A regulates B”

Page 36: Humanizing bioinformatics

what parameters should I use?

Page 37: Humanizing bioinformatics

peak

Page 38: Humanizing bioinformatics

but is this?

Page 39: Humanizing bioinformatics

van de Wiel et al, 2010

Page 40: Humanizing bioinformatics

T. Voet

Page 41: Humanizing bioinformatics

putative mutations

filter 1

filter 2

A B C

different settings for filters

filter 3

data filtering

can I trust this output?

Page 42: Humanizing bioinformatics

A B C

Page 43: Humanizing bioinformatics

A B C

State of the art: run many filter pipelines and take the intersection

Page 44: Humanizing bioinformatics

A B C

What we should have found...
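
A minimal sketch of the "run many pipelines and take the intersection" approach, using Python sets. The variant identifiers and the contents of pipelines A, B and C are hypothetical, made up purely for illustration; they are not data from the talk.

```python
from collections import Counter

# Hypothetical putative mutations reported by three filter pipelines.
calls = {
    "A": {"chr1:1000", "chr1:2500", "chr2:300", "chr3:42"},
    "B": {"chr1:1000", "chr2:300", "chr2:900", "chr3:42"},
    "C": {"chr1:1000", "chr1:2500", "chr3:42", "chr4:77"},
}

# Calls reported by every pipeline (the conservative consensus).
consensus = set.intersection(*calls.values())

# Calls reported by at least one pipeline.
any_pipeline = set.union(*calls.values())

# Calls the intersection discards, with the number of pipelines backing each.
support = Counter(v for found in calls.values() for v in found)
dropped = {v: support[v] for v in any_pipeline - consensus}

print(sorted(consensus))
print(dropped)
```

In this toy example the intersection keeps two calls and discards four, two of which were still reported by two of the three pipelines. Whether any discarded call is real cannot be read from the sets alone, which is the trust problem the slides point at.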

Page 45: Humanizing bioinformatics

different algorithms for finding the same thing

Page 46: Humanizing bioinformatics
Page 47: Humanizing bioinformatics

I can’t wrap my head around this...

too much (?) info

Page 48: Humanizing bioinformatics

treatment plan for cancer patients

heterogeneous datasets

multiple abstraction levels

multiple sources

multiple formats

population/family data

patient/clinical data

MR/CT/X-ray

tissue samples

pathways

gene expression data

collaborative data examination

pathologist

geneticist

biologist

Page 49: Humanizing bioinformatics

researcher is lost...

Page 50: Humanizing bioinformatics

data visualization

Page 51: Humanizing bioinformatics

“... the use of computer-supported, interactive, visual representations of data to amplify cognition” (S Card, J Mackinlay & B Shneiderman)

“... computer-based visualization systems providing visual representations of datasets intended to help people carry out some task more effectively.” (T Munzner)

Page 52: Humanizing bioinformatics
Page 53: Humanizing bioinformatics

cognitive task => perceptive task

Page 54: Humanizing bioinformatics
Page 55: Humanizing bioinformatics

       I              II             III             IV
   x      y       x      y       x      y       x      y
10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
 8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
 9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
 6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
 4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
 7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
 5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89

n = 11

mean x = 9.0
mean y = 7.5

variance x = 11.0
variance y = 4.12

correlation x & y = 0.816
regression line: y = 3 + 0.5x
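
The summary statistics above can be checked with a few lines of Python. This is a sketch using only the standard library; the y values are those published in Anscombe's quartet (Anscombe, 1973), and the helper name `summarize` is just an illustrative choice.

```python
from statistics import mean, variance

# Anscombe's quartet; sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics listed on the slide for one dataset."""
    mx, my = mean(xs), mean(ys)
    vx, vy = variance(xs), variance(ys)  # sample variances (n - 1)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    slope = cov / vx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": vx,
        "var_y": vy,
        "corr": cov / (vx * vy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (xs, ys) in quartet.items():
    s = summarize(xs, ys)
    print(name, {k: round(v, 2) for k, v in s.items()})
```

All four sets print nearly identical numbers, yet a scatter plot of each looks completely different, which is why the summary alone is not enough.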

Page 56: Humanizing bioinformatics
Page 57: Humanizing bioinformatics

exploration explanation

Page 58: Humanizing bioinformatics

pictorial superiority effect

“information”

“informa” “i”

65% 1%

72hr

exploration explanation

Page 59: Humanizing bioinformatics

J van Wijk

exploration explanation

Page 60: Humanizing bioinformatics

exploration explanation

Page 61: Humanizing bioinformatics

some of the principles

know your visual encodings

power of the plane

danger of depth

eyes beat memory

overview, zoom and filter, details on demand

...

(taken from T Munzner)

Page 62: Humanizing bioinformatics

visual encoding channels

position on common scale

position on unaligned scale

2D size

3D size

Mackinlay

Page 63: Humanizing bioinformatics

“power of the plane”

position on common scale

position on unaligned scale

2D size

3D size

Page 64: Humanizing bioinformatics

examples of sub-optimal encoding

Page 65: Humanizing bioinformatics
Page 66: Humanizing bioinformatics
Page 67: Humanizing bioinformatics

Florence Nightingale

Page 68: Humanizing bioinformatics

Florence Nightingale

Page 69: Humanizing bioinformatics

Don’t believe everything you see

Page 70: Humanizing bioinformatics
Page 71: Humanizing bioinformatics

networks... <sigh>

Page 72: Humanizing bioinformatics
Page 73: Humanizing bioinformatics

Martin Krzywinski

same network

Page 74: Humanizing bioinformatics

Martin Krzywinski

different networks!

Page 75: Humanizing bioinformatics

3D, anyone?

Page 76: Humanizing bioinformatics

3D, anyone?

occlusion

interaction complexity

perspective distortion

text legibility

Page 77: Humanizing bioinformatics

Gene interaction data: “A regulates B”

Page 78: Humanizing bioinformatics

regulator

manager

workhorse

Page 79: Humanizing bioinformatics

“lie factor” = (size of effect shown in graphic) / (size of effect in data)
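
A small sketch of how the lie factor can be computed. It assumes "size of effect" means relative change between two values, which is how Tufte defines it; the function name and the example numbers are made up for illustration.

```python
def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Ratio of the relative change drawn to the relative change in the data.

    A value near 1 means the graphic represents the data faithfully;
    values well above 1 exaggerate the effect, values below 1 understate it.
    """
    effect_in_graphic = abs(graphic_second - graphic_first) / graphic_first
    effect_in_data = abs(data_second - data_first) / data_first
    return effect_in_graphic / effect_in_data


# Hypothetical example: the data doubles (10 -> 20), but the bar
# is drawn four times as long (1 cm -> 4 cm).
exaggerated = lie_factor(1.0, 4.0, 10.0, 20.0)

# Same data drawn with a bar that also doubles (2 cm -> 4 cm).
faithful = lie_factor(2.0, 4.0, 10.0, 20.0)

print(exaggerated, faithful)  # 3.0 1.0
```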

Page 80: Humanizing bioinformatics
Page 81: Humanizing bioinformatics

Humanizing bioinformatics

Page 82: Humanizing bioinformatics

Humanizing bioinformatics

there and back again

put the user back in the loop!

Page 83: Humanizing bioinformatics

Thank you

Page 84: Humanizing bioinformatics

Acknowledgments

• graphics creators

• Tamara Munzner

• Martin Krzywinski

Page 85: Humanizing bioinformatics

Image attributions

... got lost ...

If you find something that’s yours, let me know!