bosc-2010
IPRStats: a Visualization Tool for InterProScan
Iddo Friedberg
Microbiology and Computer Science & Software Engineering
Miami University
http://github.com/devrkel/IPRStats.git
Microbes are Everywhere
● 10^30 prokaryotic cells on Earth (give or take a couple)
● Dominate the biosphere
● 90% of the cells in your body are prokaryotic (10^14)
● Found in the most hostile environments
Microbes do Everything
● Nutrient reservoir:
– 4x10^10 tons carbon (rivaling plants)
– 1x10^10 tons nitrogen
– 1x10^9 tons phosphorus
● almost
Of course there is health...
● Communicable diseases
● Heart disease
● Gastric cancer
● Irritable Bowel Syndrome
...and Wellness
Microbial Genomics
Phage phi-X174 1978: 5.5Kbp
H. influenzae 1995: 1.7Mbp
Classic microbial genomics
Microbes live in Communities & only 1% can be cultured
What is Metagenomics?
• Culture-independent approach to study microbial communities
– < 1% of microbes can be cultured
– DNA directly isolated from environmental sample and sequenced
• Examining genomic content of organisms in community/environment to better understand:
– Diversity of organisms
– Their roles and interactions in the ecosystem
Metagenomics is the Application of Genomics to Communities
Some things we can learn using Metagenomics
• Taxonomic content: Taxon diversity in a habitat (using taxonomic markers)
• Functional content: biological functions, qualitative and quantitative profiles
• Coping with the environment: differences in functional content between habitats
• Decompose the biotic / abiotic elements in a habitat: metadata analysis
A Metagenomic project
● Sequencing
● Assembly
● Diversity analysis
● Annotation
– Gene finding
– Function prediction
● Diversity analysis
● Comparative analysis
A Metagenomic project
● Sequencing
● Assembly
● Annotation
– Gene finding
– Function prediction
● Diversity analysis
● Comparative analysis
Population analysis tools
InterProScan
● Signature search against an integrated resource of domains and functional sites
● Easy to install, cluster-enabled (pleasantly parallel)
● Maintained by EBI
● Can annotate whole genomes
● PIR, Pfam, TIGRFam, Panther, Prodom, PRINTS,...
● Needs a visualization tool for population / metagenomic annotation
IPRStats
[Screenshot: IPRStats window with File and Help menus, listing the member databases PFAM, PIR, GENE3D, HAMAP, PANTHER, PRINTS, PRODOM, PROFILE, PROSITE, SMART, SUPERFAMILY, TIGRFAMs]
[Diagram labels: Open XML file, Python SAX Parser, Full Databases, Aggregate Queries, Resulting Tables, Charting]
● GUI: wxPython
● Excel export: xlwt
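The "Python SAX Parser" step in the diagram can be illustrated with a minimal sketch using the standard library's `xml.sax`. It streams the XML and tallies matches per member database and signature, so the whole file never has to sit in memory. The element and attribute names (`match`, `dbname`, `id`), the sample records, and the `MatchCounter` and `count_matches` names are assumptions made for illustration; they are not taken from the IPRStats source.

```python
import xml.sax
from collections import Counter


class MatchCounter(xml.sax.ContentHandler):
    """Count <match> elements per (dbname, id) pair while streaming the XML."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        # Assumed layout: <match id="PF00051" name="Kringle" dbname="PFAM">
        if name == "match":
            key = (attrs.get("dbname", "UNKNOWN"), attrs.get("id", "UNKNOWN"))
            self.counts[key] += 1


def count_matches(xml_text):
    """Parse an XML string and return a Counter keyed by (dbname, id)."""
    handler = MatchCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.counts


# Illustrative sample input, not real InterProScan output.
SAMPLE = """<interpro_matches>
  <protein id="seq1">
    <interpro id="IPR000001" name="Kringle">
      <match id="PF00051" name="Kringle" dbname="PFAM"/>
    </interpro>
  </protein>
  <protein id="seq2">
    <interpro id="IPR000001" name="Kringle">
      <match id="PF00051" name="Kringle" dbname="PFAM"/>
      <match id="SM00130" name="KR" dbname="SMART"/>
    </interpro>
  </protein>
</interpro_matches>"""

counts = count_matches(SAMPLE)
for (db, acc), n in counts.most_common():
    print(db, acc, n)
```

A SAX handler sees one element at a time, which is why it suits population-scale annotation files better than loading a full DOM tree.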
IPRStats Architecture
[Diagram components:]
● IPRStats(wx.Frame)
● Menu(wx.MenuBar)
● PropertiesDlg(wx.Dialog)
● Table(wx.PyGridTableBase)
● Chart(wx.StaticBitmap)
● Results(sqlite or pytables)
● StatsData
● Settings
● importers: XML, IPS
● exporters: standalone HTML, XLS (using xlwt), IPS
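The Results component is listed as backed by sqlite or pytables. As a rough sketch of how an aggregate query over a sqlite-backed results table might produce the per-database tables that feed the charts, the following uses Python's built-in `sqlite3`. The table name `matches`, its columns, the sample rows, and the `build_results` and `top_signatures` helpers are hypothetical and are not the IPRStats schema.

```python
import sqlite3


def build_results(rows):
    """Load (protein, dbname, acc, name) rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE matches (protein TEXT, dbname TEXT, acc TEXT, name TEXT)"
    )
    conn.executemany("INSERT INTO matches VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def top_signatures(conn, dbname, limit=10):
    """Return (acc, name, count) for one member database, most frequent first."""
    cur = conn.execute(
        "SELECT acc, name, COUNT(*) AS n FROM matches "
        "WHERE dbname = ? "
        "GROUP BY acc, name "
        "ORDER BY n DESC, acc ASC "
        "LIMIT ?",
        (dbname, limit),
    )
    return cur.fetchall()


# Illustrative sample rows, not real annotation results.
ROWS = [
    ("seq1", "PFAM", "PF00051", "Kringle"),
    ("seq2", "PFAM", "PF00051", "Kringle"),
    ("seq2", "SMART", "SM00130", "KR"),
    ("seq3", "PFAM", "PF00089", "Trypsin"),
]

conn = build_results(ROWS)
pfam_table = top_signatures(conn, "PFAM")
for acc, name, n in pfam_table:
    print(acc, name, n)
```

Each returned row corresponds to one slice of a pie chart or one bar in a bar graph for the selected database.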
What is PyTables?
- Package for creating data structures that can handle large amounts of data
- Uses NumPy (for in-memory) and HDF5 (for disk storage) structures
- Uses Numexpr (JIT compiler) for evaluating expressions (like queries)
- In the context of IPRScan, it provides a way of accessing a huge table of data without requiring that all the data be in memory
Pros
- HDF5 provides very fast, compact and efficient indexing
- NumPy provides efficient in-memory storage
- Minimizes disk and memory usage
- Very fast read times compared to SQLite and MySQL
Cons
- Large memory overhead (particularly relative to the size of smaller datasets)
- Many large, complex dependencies including HDF5, NumPy, Numexpr and Cython
- Slow write times (particularly important since IPRStats bottlenecks on writing)
Multiple graph formats
Pie charts
Bar graphs
Conclusions & Future
● A lightweight, machine-independent visualization tool for InterProScan annotations
● License: AFL
● Todo:
– Comparative population analysis
– Large dataset handling
– More graphic options
– Anything else you like...
● http://github.com/devrkel/IPRStats.git
Thanks
● David Ream
● Han Wang
● Ian Fleming
● David Vincent
● Ryan Kelly
● EBI
● Miami University startup funding
● Miami University Undergraduate Summer Scholars Program
The Friedberg Lab is Recruiting
● Graduate students
● Postdocs
● Catch me later, email me, or look at iddo-friedberg.net to learn more