Digital Archives for Molecular Microscopy A community database for biological research Christoph Best European Bioinformatics Institute, Cambridge, UK Matthew T. Dougherty NCMI - Baylor College of Medicine Houston, Texas

Page 1: Digital Archives for Molecular Microscopy

Digital Archives for Molecular Microscopy

A community database for biological research

Christoph Best
European Bioinformatics Institute, Cambridge, UK

Matthew T. Dougherty
NCMI - Baylor College of Medicine, Houston, Texas

Page 2: Digital Archives for Molecular Microscopy

Bioimage Informatics
Informatics in support of biological imaging

Why?
Image data rapidly increasing:
  (Confocal) fluorescence microscopy (Cellular Biology)
  EMDB: Electron Microscopy (Structural Biology)
  High-throughput methods (Genome Biology)

Enabling science by making data accessible, reliable, and understandable

Standards & Conventions
Public Databases
Quality assessment
Open Microscopy Environment

S. Haertel, U. Chile
J. Swedlow, U. Dundee
EMDB, EBI

Page 3: Digital Archives for Molecular Microscopy

Structural Databases at EBI

Protein Databank (PDB)
  Atomic structures (positions of atoms)
  PDB file format, mmCIF
  Derived from X-ray crystallography
  Long tradition, curated database
  Huge: 65,000+ entries, 3 wwPDB sites

Electron Microscopy Databank (EMDB)
  Part of PDB at EBI and Rutgers
  600 density maps of macromolecular structures and subcellular complexes
  Started 2002
  Curated, but limited metadata, experiment info
  XML-based

Page 4: Digital Archives for Molecular Microscopy

SCIENTIFIC BACKGROUND

Page 5: Digital Archives for Molecular Microscopy

Electron microscope

From Schweikert, 2004

Biocenter, U Helsinki

Page 6: Digital Archives for Molecular Microscopy

Page 7: Digital Archives for Molecular Microscopy

Single-particle method

Tripeptidyl-peptidase II (TPP II)

courtesy of B. Rockel, Martinsried

Molecular structure
Many images computationally combined
3D from 2D
Resolution increase by averaging

Page 8: Digital Archives for Molecular Microscopy

Single-particle analysis: GroEL to 4 Å

Ludtke et al, Structure 2008

Page 9: Digital Archives for Molecular Microscopy

Data Management Issues

Initial EM images: O(1000), 4k x 4k -> O(10 GPixel)
Particle stacks: O(100,000), 256 x 256 -> O(10 GPixel)
Final data set: 1 MVoxel (small)
Processing power: O(100) cores, some weeks, lab-owned clusters
Software: 1970s FORTRAN codes, 1990s C codes
Fragmented communities, lack of standards
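
The order-of-magnitude figures on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes "4k" means 4000 pixels per side, and the function name `gigapixels` is made up for this example.

```python
def gigapixels(count: int, width: int, height: int) -> float:
    """Total number of pixels in `count` images of width x height, in units of 1e9."""
    return count * width * height / 1e9


# Assumption: "4k" is read as 4000 pixels; the slide only gives orders of magnitude.
micrographs = gigapixels(1_000, 4_000, 4_000)
particle_stacks = gigapixels(100_000, 256, 256)

# The final map is about 1 MVoxel, i.e. 0.001 in the same units.
final_map = 1e6 / 1e9

print(f"initial EM images: {micrographs:.1f} GPixel")
print(f"particle stacks:   {particle_stacks:.1f} GPixel")
print(f"raw images are about {micrographs / final_map:,.0f} times larger than the final map")
```

Both intermediate data sets land near 10 GPixel, while the deposited result is four orders of magnitude smaller.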

Page 10: Digital Archives for Molecular Microscopy

Electron tomography

3D reconstruction by taking a series of images from different angles
Difficulty: Nanometer accuracy
Problems:
  Limited tilt range ↔ missing wedge ⇒ distortion
  Imperfections of the tilt ↔ alignment ⇒ limited resolution
Computational reconstruction algorithms
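
To give a feeling for the missing wedge, the sketch below estimates how much of Fourier space stays unsampled in an idealized single-axis tilt series. It assumes continuous sampling between -max_tilt and +max_tilt, so the unsampled fraction is simply the angular fraction of the wedge. The tilt values used are illustrative and do not come from the slides.

```python
def missing_wedge_fraction(max_tilt_deg: float) -> float:
    """Unsampled fraction of Fourier space for a single-axis tilt series
    covering -max_tilt_deg .. +max_tilt_deg (idealized, continuous tilts)."""
    if not 0 < max_tilt_deg <= 90:
        raise ValueError("max_tilt_deg must be in (0, 90]")
    # The sampled slices sweep 2 * max_tilt out of 180 degrees.
    return (90.0 - max_tilt_deg) / 90.0


# Illustrative tilt ranges (assumptions, not values from the talk).
for tilt in (45, 60, 70, 90):
    print(f"+/-{tilt} deg: {missing_wedge_fraction(tilt):.0%} of Fourier space missing")
```

Under these assumptions a ±60° range leaves one third of Fourier space unsampled, which is the source of the distortion mentioned above.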

Page 11: Digital Archives for Molecular Microscopy

Tomography of eukaryotic cells

PROJECTION SLICE

Dictyostelium discoideum
O. Medalia et al., Science, 2002

Page 12: Digital Archives for Molecular Microscopy

Image enhancement

Before

Cytoskeleton of Spiroplasma melliferum

J. Kürner et al., Science, 2005

Page 13: Digital Archives for Molecular Microscopy

Image enhancement

yellow: geodetic line
J. Kürner et al., Science, 2005

After

Page 14: Digital Archives for Molecular Microscopy

Automated image analysis

Manual Automatic

A. Linaroudis, Ph.D. Thesis, 2006

Automatic segmentation to identify points/lines/surfaces

Page 15: Digital Archives for Molecular Microscopy

Data Management Issues

Original data: 60 images, 8k x 8k -> O(4 GPixel)
Reconstruction: 8k x 8k x 256 -> O(16 GPixel) ?
Software: 1970s algorithm in 1990s software
Visualization: “let's buy more memory”
Future: web-based applications (Google Maps) ?
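
The tomography numbers can be checked the same way, and converting them to bytes shows why visualization turns into a memory problem. This is a rough sketch: "8k" is read as 8000 pixels, and the four bytes per voxel (32-bit floats) is an assumption, not something stated on the slide.

```python
BYTES_PER_VOXEL = 4  # assumption: 32-bit floating-point values


def giga(n: float) -> float:
    """Express a count in units of 1e9."""
    return n / 1e9


# Assumption: "8k" is read as 8000 pixels per side.
tilt_series_pixels = 60 * 8_000 * 8_000
reconstruction_voxels = 8_000 * 8_000 * 256
reconstruction_bytes = reconstruction_voxels * BYTES_PER_VOXEL

print(f"tilt series:    {giga(tilt_series_pixels):.2f} GPixel")
print(f"reconstruction: {giga(reconstruction_voxels):.2f} GVoxel")
print(f"in memory:      {giga(reconstruction_bytes):.1f} GB at {BYTES_PER_VOXEL} bytes/voxel")
```

At about 65 GB for a single reconstruction, holding the full volume in memory is impractical on most workstations, which is one motivation for tiled, web-based viewers.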

Page 16: Digital Archives for Molecular Microscopy

The Electron Microscopy Data Bank

Contains EM-derived density maps
Complementary to coordinate sets in PDB
Established 2002 @ EBI (Kim Henrick)
Web-based submission and retrieval
Hand-curated (R. Newman)

A bit like eBay, and you won't make any money, either

Page 17: Digital Archives for Molecular Microscopy

THE ELECTRON MICROSCOPY DATA BANK

Page 18: Digital Archives for Molecular Microscopy

A Unified Data Resource for EM

NIH-funded joint project

Baylor College of Medicine, Houston (W. Chiu, M. Baker)

Rutgers University, New Jersey (H. Berman, C. Lawson)

PDBe, EBI, Cambridge, UK (K. Henrick, C. Best, R. Newman)

Baylor College of Medicine, Houston, TX

Rutgers University, Piscataway, NJ

European Bioinformatics Institute, Cambridge, UK

Page 19: Digital Archives for Molecular Microscopy

Characteristics

Curated Community Archive: PDB and EMDB
NIH, EU (in past), and BBSRC funding (+ EMBL)
Worldwide cooperation
Advisory boards and task forces from the community
Open deposition and retrieval
  → Alternative access systems by other institutions
760 entries, 26 GB data
ca. 100 entries/year
Curation both in Europe and US

Page 20: Digital Archives for Molecular Microscopy

Growth of EMDB

Page 21: Digital Archives for Molecular Microscopy

EMDep deposition system

750 entries, current rate approx. 15-20/month
Contents of an entry:
  Metadata (XML header) → experimental metadata
  Map (any format, converted to CCP4/MRC)
  Additional files

Java/Tomcat/XML
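
Since every deposited map ends up in CCP4/MRC format, it may help to see how little is needed to read the basic layout of such a file. The sketch below is not the EMDep converter. It only relies on well-known properties of the format: a fixed 1024-byte header, the first four 32-bit integers holding the grid size and data mode, and the identifier "MAP " at bytes 208-211. Little-endian byte order is assumed, and the helper names are made up for this example.

```python
import struct

HEADER_SIZE = 1024   # fixed header length of the MRC/CCP4 map format
MAP_ID_OFFSET = 208  # word 53 of the header holds the string "MAP "


def build_header(nx: int, ny: int, nz: int, mode: int = 2) -> bytes:
    """Build a minimal synthetic header for testing (mode 2 = 32-bit float)."""
    header = bytearray(HEADER_SIZE)
    struct.pack_into("<4i", header, 0, nx, ny, nz, mode)
    header[MAP_ID_OFFSET:MAP_ID_OFFSET + 4] = b"MAP "
    return bytes(header)


def read_header(header: bytes) -> dict:
    """Extract grid size and data mode from a little-endian MRC/CCP4 header."""
    if len(header) < HEADER_SIZE:
        raise ValueError("header shorter than 1024 bytes")
    if header[MAP_ID_OFFSET:MAP_ID_OFFSET + 4] != b"MAP ":
        raise ValueError("missing 'MAP ' identifier")
    nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)
    return {"nx": nx, "ny": ny, "nz": nz, "mode": mode}


# Round trip on a synthetic 64 x 64 x 32 map header.
info = read_header(build_header(64, 64, 32))
print(info)
```

A production reader would also consult the machine stamp to detect byte order and would handle the extended header; both are left out here to keep the sketch short.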

Page 22: Digital Archives for Molecular Microscopy

Unified data resource plan

Page 23: Digital Archives for Molecular Microscopy

Joint deposition system

Page 24: Digital Archives for Molecular Microscopy

EMDB search system
Java/Tomcat

Page 25: Digital Archives for Molecular Microscopy

EMDB search system
Java/Tomcat

Page 26: Digital Archives for Molecular Microscopy

EMDB Atlas pages

XSLT

Page 27: Digital Archives for Molecular Microscopy

ISSUES

Page 28: Digital Archives for Molecular Microscopy

Metadata management

Difficult: many rounds of consulting the community
Still most fields remain empty
Data harvesting:
  LIMS, PIMS -> rarely used
  Processing pipelines, image processing software -> Lack of standards, idiosyncrasies
Image formats: Appalling lack of standards

Page 29: Digital Archives for Molecular Microscopy

Data issues

Current: Deposit final result of experiment and computation
How much of original/intermediate data should be deposited?
Issues:
  Cost / Practicability
  Reproducibility of experiment
  Intellectual property (un-exploited results?)
  Usefulness

Page 30: Digital Archives for Molecular Microscopy

Non-data issues

Embargo:
  Image data can be withheld up to two years
  Allows original researcher to further exploit them
  Journals and funders must define:
    what data must be deposited
    when they are to be released
Quality Standards:
  Require community acceptance
  Technically difficult
  Data Bank does enrich/annotate, but does not do science → quality standards must be set by scientists

Page 31: Digital Archives for Molecular Microscopy

Image data formats

Current:
  Variety of historical ad hoc formats
  Unclear definitions, variations in different software
Need:
  Interoperability
  Standards
  Technical level? Acceptance? → Question for the community
HDF5:
  Common container format to deal with numerical data
  Heavyweight library, but widely available (but Java?)
  Would at least solve low-level format problems
  Metadata format still needs to be specified

Page 32: Digital Archives for Molecular Microscopy

Ontologies

Systematic way to define
  classes of objects
  attributes of these objects
  relationships between objects
Provides framework for metadata models
Advantage: Powerful formal method
Disadvantage: Not yet widely used
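
The three ingredients listed on this slide (classes, attributes, relationships) can be illustrated with a toy schema. This is a minimal sketch under loud assumptions: the class names, attribute names, and predicates are invented for the example and do not reflect any EMDB data model or an existing ontology language.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    """A class of objects together with the attributes its instances may carry."""
    name: str
    attributes: set = field(default_factory=set)


@dataclass
class Ontology:
    classes: dict = field(default_factory=dict)
    # Allowed relationships, stored as (subject class, predicate, object class).
    relations: set = field(default_factory=set)

    def add_class(self, name: str, *attributes: str) -> None:
        self.classes[name] = OntologyClass(name, set(attributes))

    def allow(self, subject_class: str, predicate: str, object_class: str) -> None:
        for cls in (subject_class, object_class):
            if cls not in self.classes:
                raise KeyError(f"unknown class: {cls}")
        self.relations.add((subject_class, predicate, object_class))

    def is_valid(self, subject_class: str, predicate: str, object_class: str) -> bool:
        return (subject_class, predicate, object_class) in self.relations


# Hypothetical vocabulary, for illustration only.
onto = Ontology()
onto.add_class("Specimen", "name", "organism")
onto.add_class("Micrograph", "pixel_size", "defocus")
onto.add_class("DensityMap", "resolution", "voxel_size")
onto.allow("Micrograph", "imageOf", "Specimen")
onto.allow("DensityMap", "reconstructedFrom", "Micrograph")

print(onto.is_valid("DensityMap", "reconstructedFrom", "Micrograph"))
print(onto.is_valid("Specimen", "imageOf", "Micrograph"))
```

The point of the formal model is the second check: a metadata record that asserts a relationship the ontology does not allow can be rejected automatically.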

Page 33: Digital Archives for Molecular Microscopy

TECHNICAL DEVELOPMENTS

Page 34: Digital Archives for Molecular Microscopy

Rich data sets

Submissions consist of maps (increasingly more than one)
Relations between data sets → unexpressed
XML-based standards for representing relationships between data:
  Subject-predicate-object relationships (RDF framework)
Harvesting interface to EM processing software
Web-based visualization for submission and retrieval, complex submissions assembled interactively (AJAX)
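
The subject-predicate-object idea can be sketched without any RDF library. In the example below, the identifiers (`entry:EXAMPLE-0001`, `map:primary`, and so on) and the predicate names are hypothetical. They only show how relations between several maps in one submission could be made explicit and queried.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical submission with one primary map and two related data sets.
triples = [
    Triple("map:primary", "isPartOf", "entry:EXAMPLE-0001"),
    Triple("map:mask", "isPartOf", "entry:EXAMPLE-0001"),
    Triple("map:mask", "isMaskFor", "map:primary"),
    Triple("map:half1", "isHalfMapOf", "map:primary"),
]


def subjects(store, predicate, obj):
    """All subjects that stand in `predicate` relation to `obj`."""
    return [t.subject for t in store if t.predicate == predicate and t.obj == obj]


def objects(store, subject, predicate):
    """All objects that `subject` points to via `predicate`."""
    return [t.obj for t in store if t.subject == subject and t.predicate == predicate]


print(subjects(triples, "isPartOf", "entry:EXAMPLE-0001"))
print(objects(triples, "map:mask", "isMaskFor"))
```

In a real system the same statements would be serialized in an RDF syntax such as RDF/XML, with URIs in place of the short labels used here.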

Page 35: Digital Archives for Molecular Microscopy

Rich data submissions

Page 36: Digital Archives for Molecular Microscopy

Possible XML representation

Page 37: Digital Archives for Molecular Microscopy

Bioimage informatics tools

Current EMDB interface:
  simple and efficient
  but must be extended to accommodate more complex experiments
OMERO interface:
  geared at labs, not public databases
  All the beauty of AJAX
  high-performance visualization

Page 38: Digital Archives for Molecular Microscopy

Bioimage informatics tools

BISQUE/BISQUIK (UCSB)

multichannel images
lab notebook
tagging
image markup

Page 39: Digital Archives for Molecular Microscopy

No Standards

Experiment? Image? Analytics? Annotations?

Current Imaging Workflow Paradigm

Jason Swedlow (U. Dundee)

Page 40: Digital Archives for Molecular Microscopy

Towards Image Informatics

Page 41: Digital Archives for Molecular Microscopy

OMERO in 2007/8/9

Jason Swedlow (Univ. Dundee)

Page 42: Digital Archives for Molecular Microscopy

CONCLUSIONS

Page 43: Digital Archives for Molecular Microscopy

A Virtual Research Community

[Diagram. Labels recovered from the figure:]
Imaging Centers
Users
Databases
Software
Grid/cloud computing/storage
in-house storage
storage and computing engines
data submission
data harvesting
acquisition, storage, and management of images
storage, distribution, quality assessment

Page 44: Digital Archives for Molecular Microscopy

CONCLUSIONS

Community databases are a central part of the Scientific Data Infrastructure
Image databases rapidly growing
Technical challenges: data formats, size
Standards and interoperability
Improve metadata collection
Keep the community engaged