Digital Archives for Molecular Microscopy
A community database for biological research

Christoph Best
European Bioinformatics Institute, Cambridge, UK

Matthew T. Dougherty
NCMI - Baylor College of Medicine, Houston, Texas
Bioimage Informatics
Informatics in support of biological imaging
Why?
Image data rapidly increasing
  (Confocal) fluorescence microscopy (Cellular Biology)
  EMDB: Electron microscopy (Structural Biology)
  High-throughput methods (Genome Biology)
Enabling science by making data accessible, reliable, and understandable
  Standards & Conventions
  Public Databases
  Quality assessment
Open Microscopy Environment
S. Haertel, U. Chile
J. Swedlow, U. Dundee
EMDB, EBI
Structural Databases at EBI
Protein Data Bank (PDB)
  Atomic structures (positions of atoms)
  PDB file format, mmCIF
  Derived from X-ray crystallography
  Long tradition, curated database
  Huge: 65,000+ entries, 3 wwPDB sites
Electron Microscopy Data Bank (EMDB)
  Part of PDB at EBI and Rutgers
  600 density maps of macromolecular structures and subcellular complexes
  Started 2002
  Curated, but limited metadata, experiment info
  XML-based
SCIENTIFIC BACKGROUND
Electron microscope
From Schweikert, 2004
Biocenter, U Helsinki
Single-particle method
Tripeptidyl-peptidase II (TPP II)
courtesy of B. Rockel, Martinsried
Molecular structure
Many images computationally combined
3D from 2D
Resolution increase by averaging
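The gain from averaging can be illustrated with a toy calculation: averaging N independent noisy measurements of the same signal reduces the noise standard deviation by roughly a factor of sqrt(N). The following is a minimal sketch, assuming Gaussian noise and perfectly aligned copies (real particle images must first be aligned and classified). The function names, the 1D test signal, and the noise level are illustrative assumptions, not part of any EM package.

```python
import random
import statistics


def noisy_copies(signal, n_copies, sigma, rng):
    """Return n_copies of `signal`, each with independent Gaussian noise added."""
    return [[s + rng.gauss(0.0, sigma) for s in signal] for _ in range(n_copies)]


def average(copies):
    """Pixel-wise mean over all copies."""
    n = len(copies)
    return [sum(pixels) / n for pixels in zip(*copies)]


def residual_noise(signal, estimate):
    """Standard deviation of the difference between estimate and true signal."""
    return statistics.pstdev([e - s for e, s in zip(estimate, signal)])


rng = random.Random(0)                          # fixed seed for repeatability
signal = [float(i % 10) for i in range(2000)]   # hypothetical 1D "image"
sigma = 5.0                                     # assumed noise level

noise_1 = residual_noise(signal, average(noisy_copies(signal, 1, sigma, rng)))
noise_100 = residual_noise(signal, average(noisy_copies(signal, 100, sigma, rng)))

print(f"residual noise with 1 image:    {noise_1:.2f}")
print(f"residual noise with 100 images: {noise_100:.2f}")  # roughly sigma / 10
```

With 100 copies the residual noise drops to about a tenth of the single-image value, which is why single-particle work needs very many images.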
Single-particle analysis: GroEL to 4 Å
Ludtke et al., Structure, 2008
Data Management Issues
Initial EM images: O(1000), 4k x 4k -> O(10 GPixel)
Particle stacks: O(100,000), 256 x 256 -> O(10 GPixel)
Final data set: 1 MVoxel (small)
Processing power: O(100) cores, some weeks, lab-owned clusters
Software: 1970s FORTRAN codes, 1990s C codes
  fragmented communities, lack of standards
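The order-of-magnitude figures above can be checked with a few lines of arithmetic. This is a sketch only; it assumes "4k" means 4096 pixels per side and uses the round counts from the slide.

```python
def gigapixels(count, width, height):
    """Total pixels in `count` images of width x height, in units of 10^9."""
    return count * width * height / 1e9


# Counts from the slide; "4k" is taken to mean 4096 (an assumption).
raw_images = gigapixels(1_000, 4096, 4096)
particle_stack = gigapixels(100_000, 256, 256)

print(f"initial EM images: {raw_images:.1f} GPixel")
print(f"particle stacks:   {particle_stack:.1f} GPixel")
```

Both come out at the order of 10 GPixel (about 17 and 7 GPixel), against a final map of about a megavoxel.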
Electron tomography
3D reconstruction by taking a series of images from different angles
Difficulty: nanometer accuracy
Problems:
  Limited tilt range ↔ missing wedge ⇒ distortion
  Imperfections of the tilt ↔ alignment ⇒ limited resolution
Computational reconstruction algorithms
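The size of the missing wedge follows from simple geometry. Each projection supplies one central slice of Fourier space, so an idealized single-axis tilt series covering -θ to +θ samples an angular range of 2θ out of 180 degrees. The sketch below computes the unsampled fraction under that idealization. The tilt ranges in the loop are example values, not taken from the slides.

```python
def missing_wedge_fraction(max_tilt_deg):
    """Fraction of Fourier space left unsampled by an idealized single-axis
    tilt series covering -max_tilt_deg .. +max_tilt_deg."""
    if not 0 < max_tilt_deg <= 90:
        raise ValueError("max_tilt_deg must be in (0, 90]")
    return (90.0 - max_tilt_deg) / 90.0


for tilt in (60, 70, 90):   # example tilt ranges (assumptions)
    fraction = missing_wedge_fraction(tilt)
    print(f"+/-{tilt} deg: {fraction:.0%} of Fourier space missing")
```

A tilt range of ±60 degrees leaves a third of Fourier space unsampled, which shows up as anisotropic distortion in the reconstruction.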
Tomography of eukaryotic cells
PROJECTION SLICE
Dictyostelium discoideum
O. Medalia et al., Science, 2002
Image enhancement
Before
Cytoskeleton of Spiroplasma melliferum
J. Kürner et al., Science, 2005
Image enhancement
After
yellow: geodetic line
J. Kürner et al., Science, 2005
Automated image analysis
Manual Automatic
A. Linaroudis, Ph.D. Thesis, 2006
Automatic segmentation to identify points/lines/surfaces
Data Management Issues
Original data: 60 images, 8k x 8k -> O(4 GPixel)
Reconstruction: 8k x 8k x 256 -> O(16 GPixel) ?
Software: 1970s algorithm in 1990s software
Visualization: “let's buy more memory”
Future: web-based applications (Google Maps) ?
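A rough memory estimate shows where the "buy more memory" reflex comes from. The sketch below recomputes the pixel counts and converts them to bytes. It assumes "8k" means 8192 and that values are stored as 32-bit floats; the slides do not state a data type.

```python
def voxels(*dims):
    """Product of the given dimensions."""
    total = 1
    for d in dims:
        total *= d
    return total


BYTES_PER_VALUE = 4   # assumption: 32-bit floats

tilt_series = 60 * voxels(8192, 8192)        # "8k" taken as 8192 (assumption)
reconstruction = voxels(8192, 8192, 256)

for name, n in (("tilt series", tilt_series), ("reconstruction", reconstruction)):
    gib = n * BYTES_PER_VALUE / 2**30
    print(f"{name}: {n / 1e9:.1f} GPixel, {gib:.0f} GiB")
```

Under these assumptions the reconstruction alone occupies 64 GiB, so it cannot simply be loaded into the memory of a typical workstation for visualization.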
The Electron Microscopy Data Bank
contains EM-derived density maps
complementary to coordinate sets in PDB
established 2002 @ EBI (Kim Henrick)
web-based submission and retrieval
hand-curated (R. Newman)
A bit like Ebay – and you won't make any money, either
THE ELECTRON MICROSCOPY DATA BANK
A Unified Data Resource for EM
NIH-funded joint project
Baylor College of Medicine, Houston, TX (W. Chiu, M. Baker)
Rutgers University, Piscataway, NJ (H. Berman, C. Lawson)
PDBe, European Bioinformatics Institute, Cambridge, UK (K. Henrick, C. Best, R. Newman)
Characteristics
Curated Community Archive: PDB and EMDB
NIH, EU (in past), and BBSRC funding (+ EMBL)
Worldwide cooperation
Advisory boards and task forces from the community
Open deposition and retrieval
  → Alternative access systems by other institutions
760 entries, 26 GB data
ca. 100 entries/year
Curation both in Europe and US
Growth of EMDB
EMDep deposition system
750 entries, current rate approx. 15-20/month
Contents of an entry:
  Metadata (XML header) → experimental metadata
  Map (any format, converted to CCP4/MRC)
  Additional files
Java/Tomcat/XML
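To make the entry layout concrete, here is a minimal sketch of reading an XML metadata header with Python's standard library. The element names, attribute names, entry ID, and file names are invented for illustration and do not reproduce the actual EMDB header schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical header: names below are illustrative, not the real EMDB schema.
HEADER = """
<entry id="EMD-0000">
  <title>Example density map</title>
  <map file="emd_0000.map" format="CCP4"/>
  <experiment method="single particle" resolution="8.0"/>
  <additional file="fsc.xml"/>
</entry>
"""


def summarize(header_xml):
    """Extract a few fields from the header; a missing resolution becomes None."""
    root = ET.fromstring(header_xml)
    experiment = root.find("experiment")
    resolution = experiment.get("resolution") if experiment is not None else None
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "map_file": root.find("map").get("file"),
        "resolution": float(resolution) if resolution else None,
        "additional_files": [a.get("file") for a in root.findall("additional")],
    }


print(summarize(HEADER))
```

Treating experimental fields as optional reflects the situation described later in the talk, where most metadata fields remain empty.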
Unified data resource plan
Joint deposition system
EMDB search system
Java/Tomcat
EMDB Atlas pages
XSLT
ISSUES
Metadata management
Difficult: many rounds of consulting the community
  Still most fields remain empty
Data harvesting
  LIMS, PIMS -> rarely used
  Processing pipelines, image processing software -> lack of standards, idiosyncrasies
Image formats: appalling lack of standards
Data issues
Current: Deposit final result of experiment and computation
How much of original/intermediate data should be deposited?
Issues:
  Cost / practicability
  Reproducibility of experiment
  Intellectual property (un-exploited results?)
  Usefulness
Non-data issues
Embargo: Image data can be withheld up to two years
  Allows original researcher to further exploit them
  Journals and funders must define:
    what data must be deposited
    when they are to be released
Quality Standards:
  Require community acceptance
  Technically difficult
  Data Bank does enrich/annotate, but does not do science
    → quality standards must be set by scientists
Image data formats
Current: Variety of historical ad hoc formats
  Unclear definitions, variations in different software
Need: Interoperability Standards
  Technical level? Acceptance? → Question for the community
HDF5
  Common container format to deal with numerical data
  Heavyweight library, but widely available (but Java?)
  Would at least solve low-level format problems
  Metadata format still needs to be specified
Ontologies
Systematic way to define
  classes of objects
  attributes of these objects
  relationships between objects
Provides framework for metadata models
Advantage: Powerful formal method
Disadvantage: Not yet widely used
TECHNICAL DEVELOPMENTS
Rich data sets
Submissions consist of
  maps (increasingly more than one)
  relations between data sets → unexpressed
XML-based standards for representing relationships between data:
  Subject-predicate-object relationships (RDF framework)
Harvesting interface to EM processing software
Web-based visualization for submission and retrieval; complex submissions assembled interactively (AJAX)
Rich data submissions
Possible XML representation
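One way such a representation could look is sketched below: relationships between the data sets of a submission are held as subject-predicate-object triples and written out as plain XML. This is an illustration only. The identifiers and predicate names are invented, and the output is a simple ad hoc XML layout, not the RDF/XML serialization or any EMDB format.

```python
import xml.etree.ElementTree as ET

# Hypothetical relationships within one submission (names are assumptions).
TRIPLES = [
    ("map_2", "maskedFrom", "map_1"),
    ("map_3", "filteredFrom", "map_1"),
    ("model_1", "fittedInto", "map_2"),
]


def triples_to_xml(triples):
    """Serialize (subject, predicate, object) triples as a flat XML document."""
    root = ET.Element("relations")
    for subject, predicate, obj in triples:
        ET.SubElement(root, "relation",
                      subject=subject, predicate=predicate, object=obj)
    return ET.tostring(root, encoding="unicode")


def objects_of(triples, subject, predicate):
    """All objects linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(triples_to_xml(TRIPLES))
print(objects_of(TRIPLES, "model_1", "fittedInto"))
```

Keeping the relations as uniform triples means new relationship types can be added without changing the document structure, which suits submissions with a growing number of maps.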
Bioimage informatics tools
Current EMDB interface: simple and efficient, but must be extended to accommodate more complex experiments
OMERO interface: geared at labs, not public databases
All the beauty of AJAX
  high-performance visualization
  multichannel images
  lab notebook
  tagging
  image markup
Bioimage informatics tools
BISQUE/BISQUIK (UCSB)
No Standards
Experiment? Image? Analytics? Annotations?
Current Imaging Workflow Paradigm
Jason Swedlow (U. Dundee)
Towards Image Informatics
OMERO in 2007/8/9
Jason Swedlow (Univ. Dundee)
CONCLUSIONS
A Virtual Research Community
[Figure: diagram linking users, imaging centers, databases, software, and computing/storage resources]
USERS
Imaging Centers: acquisition, storage, and management of images
Databases: storage, distribution, quality assessment
Grid/cloud computing/storage, in-house storage: storage and computing engines
Software
Connections: data submission, data harvesting
CONCLUSIONS
Community databases are a central part of the Scientific Data Infrastructure
Image databases rapidly growing
Technical challenges: data formats, size
Standards and interoperability
Improve metadata collection
Keep the community engaged