© 2013 DBCLS Licensed under CC BY 2.1 Japan
Technology development of database integration to make sense of big data in life science
Hidemasa Bono
Database Center for Life Science (DBCLS), Research Organization of Information and Systems (ROIS), Japan
Who we are: togoDB
• The integrated database project in Japan
• Collaborative effort to recycle data
  – Provide data that can easily be reused
  – Retain data that is part of ‘public data’
TogoHeadquarters
Technology developer
DNA data archiver
Universities & institutes
Data organizer
http://biosciencedbc.jp/
NBDC portal
http://biosciencedbc.jp/
http://integbio.jp/dbcatalog/
Photo by @hirabat (1st Bono Conference, 2013-01-13)
• No registration
• Not only for academia, also for-profit
Free!
Big data in life science
• Output mostly from machines
  – NGS (next-generation sequencers)
    • Over 100M lines, 2 GB in size per sample
    • Ethical issues: personal human genomes
• So many variations in...
  – Data format
  – Application: re-sequencing, de novo sequencing, RNA-seq, ...
  – Annotation: granularity of metadata
Pictures from Togo Picture Gallery
http://g86.dbcls.jp/togopic/
[Figure labels: NGS (SRA), GEO, ArrayExpress; Genome, Metagenome, RNAseq, ChIPseq, microarray (GeneChip, Oligoarray)]
Making sense of big data...
1. Exhaustive, but functional index
2. Search engine for life science
3. Highly curated dataset
[Figure labels: NGS (SRA), GEO, ArrayExpress; Genome, Metagenome, RNAseq, ChIPseq, microarray (GeneChip, Oligoarray)]
What we have developed
1. Yellow pages for archived NGS data
   http://SRA.dbcls.jp/
2. Search engine for nucleotide sequences
   http://GGRNA.dbcls.jp/
3. Summarization and visualization of reference transcriptome data
   http://RefEx.dbcls.jp/
1. DBCLS SRA
• Yellow pages for archived NGS data
  – Indexed by metadata. Search by...
    • Statistics
    • Publications
    • Diseases
  – Direct link to original DB (SRA)
• Pre-calculated QC data
Pipeline to help users re-use public NGS data:
Search data → Download → Quality check → Data processing → Analysis
http://SRA.dbcls.jp/
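The "Quality check" step of such a pipeline can be illustrated with a minimal sketch. It is not the QC code behind DBCLS SRA, which the slides do not describe. It assumes 4-line FASTQ records with Phred+33 quality encoding, and the function names `fastq_records` and `quality_summary` are hypothetical.

```python
import io

def fastq_records(handle):
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual

def quality_summary(handle, phred_offset=33):
    """Return read count, base count, and mean Phred score over all bases.

    Assumption: qualities are Phred+33 encoded (hypothetical default).
    """
    n_reads = 0
    n_bases = 0
    score_sum = 0
    for _read_id, seq, qual in fastq_records(handle):
        n_reads += 1
        n_bases += len(seq)
        score_sum += sum(ord(c) - phred_offset for c in qual)
    mean_q = score_sum / n_bases if n_bases else 0.0
    return {"reads": n_reads, "bases": n_bases, "mean_quality": mean_q}

# Toy input: two made-up reads, one with quality 40 ('I'), one with quality 0 ('!').
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n")
summary = quality_summary(example)
print(summary)
```

Because the records are streamed one at a time, the same function could be pointed at a real file handle for a sample with over 100M lines without loading it into memory.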
Statistics: studies
http://SRA.dbcls.jp/
Statistics: samples
http://SRA.dbcls.jp/
Search by publications
Search by diseases
Search by diseases (cont.)
http://SRA.dbcls.jp/
GGRNA
• Quickly finds nucleotide sequences as well as other fields in RefSeq transcripts using a suffix array
• Easily highlights PCR primers, microarray probes and target sequences of siRNA
2. GooGle like RNA search engine http://GGRNA.dbcls.jp/
Naito Y. & Bono H. Nucleic Acids Res. (2012) 40: W592-6. doi: 10.1093/nar/gks448
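To show why a suffix array makes exact sequence lookup fast, here is a minimal sketch. It is only an illustration of the general technique, not GGRNA's implementation, whose internals are not given on the slide. The construction is deliberately naive, and the names `build_suffix_array` and `find_all` as well as the toy sequence are hypothetical.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction (sorts full suffix strings); suitable for a demo only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_all(text, sa, pattern):
    """Return sorted start positions of every exact occurrence of pattern."""
    m = len(pattern)
    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # All suffixes in sa[start:lo] begin with pattern.
    return sorted(sa[start:lo])

# Toy sequence (made up), standing in for a transcript.
transcript = "ATGCGATGCA"
sa = build_suffix_array(transcript)
hits = find_all(transcript, sa, "ATGC")
print(hits)
```

Once the index is built, each query needs only two binary searches over the sorted suffixes, so short queries such as PCR primers or siRNA targets do not require scanning the whole sequence.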
Genome version of GGRNA? Yes, we can!
GooGle like Genome search engine http://GGGenome.dbcls.jp/
3. RefEx: Reference Expression Dataset
• 40-organ dataset, 4 different methods, with BodyParts3D
  – Reference of gene expression in normal organs throughout the mammalian body
  – Practical example of reuse of useful public data
• The search for "tissue-specific genes"
EST: Classical Expressed Sequence Tags
GeneChip: Affymetrix’s microarray
CAGE: Cap Analysis of Gene Expression
RNAseq: Transcriptome Sequencing
http://RefEx.dbcls.jp/
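One way to make the search for "tissue-specific genes" concrete is the sketch below. The slides do not say how RefEx scores specificity, so this swaps in the tau index, a commonly used specificity measure, purely as an assumption. The gene names, expression values, threshold, and function names are all hypothetical.

```python
def tau_index(expression):
    """Tissue-specificity index tau for one gene.

    expression maps tissue name -> non-negative expression value.
    tau is 0 for uniform expression and 1 for expression in a single tissue.
    """
    values = list(expression.values())
    if len(values) < 2:
        return 0.0
    x_max = max(values)
    if x_max <= 0:
        return 0.0  # gene not expressed anywhere
    return sum(1 - v / x_max for v in values) / (len(values) - 1)

def tissue_specific_genes(table, threshold=0.8):
    """Return {gene: (top_tissue, tau)} for genes with tau >= threshold.

    The 0.8 cutoff is an arbitrary illustrative choice.
    """
    result = {}
    for gene, expr in table.items():
        tau = tau_index(expr)
        if tau >= threshold:
            result[gene] = (max(expr, key=expr.get), tau)
    return result

# Made-up expression values for two hypothetical genes across four organs.
toy = {
    "geneA": {"liver": 100.0, "brain": 0.0, "heart": 0.0, "kidney": 0.0},
    "geneB": {"liver": 50.0, "brain": 50.0, "heart": 50.0, "kidney": 50.0},
}
specific = tissue_specific_genes(toy)
print(specific)
```

A single score per gene makes it easy to rank genes within each of the four measurement methods separately, since the index depends only on relative expression across organs.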
What we have developed / are developing
1. Yellow pages for archived NGS data
   http://SRA.dbcls.jp/
2. Search engine for nucleotide sequences
   http://GGRNA.dbcls.jp/
3. Summarization and visualization of reference transcriptome data
   http://RefEx.dbcls.jp/
TogoTV
Archive of talks and tutorial videos expounding how to use biological databases and tools
http://togotv.dbcls.jp/en/
Acknowledgement
• Members in DBCLS for technology development
• NBDC for funding / DDBJ for storage & CPU time
• All people for sharing precious data