Transcript
Page 1: Technology development of database integration to make sense of big data in lifescience

© 2013 DBCLS Licensed under CC BY 2.1 Japan

Technology development of database integration to make sense of big data in lifescience

Hidemasa Bono
Database Center for Life Science (DBCLS), Research Organization of Information and Systems (ROIS), Japan

Page 2: Technology development of database integration to make sense of big data in lifescience

Who we are: togoDB

• The integrated database project in Japan
• Collaborative effort to recycle data
  – Provide data that can easily be reused
  – Retain data that is part of ‘public data’

[Diagram labels: Togo Headquarters; Technology developer; DNA data archiver; Universities & institutes; Data organizer]

http://biosciencedbc.jp/

Page 3: Technology development of database integration to make sense of big data in lifescience

NBDC portal

http://biosciencedbc.jp/

Page 4: Technology development of database integration to make sense of big data in lifescience

http://integbio.jp/dbcatalog/

Page 5: Technology development of database integration to make sense of big data in lifescience

Photo by @hirabat (1st Bono Conference on 20130113)

• No registration
• Not only for academia, but also for-profit

Free!

Page 6: Technology development of database integration to make sense of big data in lifescience

Big data in lifescience

• Output mostly from machines
  – NGS (Next Generation Sequencers)
    • over 100M lines, 2 Gbyte in size per sample
    • Ethical issues: personal human genome
• So many variations in...
  – Data format
  – Application: re-sequencing, de novo seq, RNA-seq, ...
  – Annotation: granularity of metadata

Pictures from Togo Picture Gallery http://g86.dbcls.jp/togopic/

[Figure labels: NGS (SRA), GEO, ArrayExpress; Genome, Metagenome; RNAseq, ChIPseq, microarray (GeneChip, Oligoarray)]
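To give a feel for the "over 100M lines" figure above, here is a minimal sketch that streams a sequencing file and counts reads and bases. It assumes the lines refer to FASTQ, the common four-lines-per-read text format (the slide does not name the format), so 100M lines would correspond to roughly 25M reads. The function names and the tiny in-memory example are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from unwrapped 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:           # end of file
            return
        if not header.strip():   # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\n")
        handle.readline()        # the '+' separator line is ignored
        quality = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], sequence, quality


def summarize(handle: TextIO) -> Tuple[int, int]:
    """Return (number of reads, total bases) without loading the file into memory."""
    n_reads = 0
    n_bases = 0
    for _read_id, sequence, _quality in read_fastq(handle):
        n_reads += 1
        n_bases += len(sequence)
    return n_reads, n_bases


# Hypothetical two-read example standing in for a multi-gigabyte file.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCCA\n+\nIIIII\n")
print(summarize(example))
```

Streaming line by line keeps memory use constant, which matters when a single sample is around 2 Gbyte.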

Page 7: Technology development of database integration to make sense of big data in lifescience

Making sense of big data...

1. Exhaustive, but functional index
2. Search engine for lifescience
3. Highly curated dataset

[Figure labels: NGS (SRA), GEO, ArrayExpress; Genome, Metagenome; RNAseq, ChIPseq, microarray (GeneChip, Oligoarray)]

Page 8: Technology development of database integration to make sense of big data in lifescience

What we have developed

1. Yellow pages for NGS data archived
   http://SRA.dbcls.jp/
2. Search engine for nucleotide sequences
   http://GGRNA.dbcls.jp/
3. Summarization and visualization of reference transcriptome data
   http://RefEx.dbcls.jp/

Page 9: Technology development of database integration to make sense of big data in lifescience

1. DBCLS SRA

• Yellow pages for NGS data archived
  – Indexed by metadata. Search by...
    • Statistics
    • Publications
    • Diseases
  – Direct link to original DB (SRA)
• Pre-calculated QC data

Pipeline to help users re-use public NGS data:
Search data → Download → Quality Check → Data processing → Analysis

http://SRA.dbcls.jp/
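The idea of "yellow pages indexed by metadata" can be illustrated with a small inverted index. This is only a sketch under stated assumptions, not how DBCLS SRA is implemented: the record identifiers, field names (`disease`, `platform`) and values below are all hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical metadata records; real entries would carry archive accessions.
RECORDS = [
    {"id": "STUDY_A", "disease": "diabetes", "platform": "platform_x"},
    {"id": "STUDY_B", "disease": "asthma", "platform": "platform_x"},
    {"id": "STUDY_C", "disease": "diabetes", "platform": "platform_y"},
]

Index = Dict[Tuple[str, str], Set[str]]


def build_index(records: List[Dict[str, str]]) -> Index:
    """Map each (field, lower-cased value) pair to the set of matching record ids."""
    index: Index = defaultdict(set)
    for record in records:
        for field, value in record.items():
            if field != "id":
                index[(field, value.lower())].add(record["id"])
    return index


def search(index: Index, **criteria: str) -> List[str]:
    """Return ids of records matching every given field=value criterion."""
    hits = None
    for field, value in criteria.items():
        matched = index.get((field, value.lower()), set())
        hits = matched if hits is None else hits & matched
    return sorted(hits or [])


index = build_index(RECORDS)
print(search(index, disease="Diabetes"))
print(search(index, disease="diabetes", platform="platform_x"))
```

Each returned id could then be turned into a direct link to the original archive entry, which is the role the slide gives to the index.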

Page 10: Technology development of database integration to make sense of big data in lifescience

Statistics: studies

http://SRA.dbcls.jp/

Page 11: Technology development of database integration to make sense of big data in lifescience

Page 12: Technology development of database integration to make sense of big data in lifescience

Statistics: samples

http://SRA.dbcls.jp/

Page 13: Technology development of database integration to make sense of big data in lifescience

Search by publications

Page 14: Technology development of database integration to make sense of big data in lifescience

Search by diseases

Page 15: Technology development of database integration to make sense of big data in lifescience

Search by diseases (cont.)

http://SRA.dbcls.jp/

Page 16: Technology development of database integration to make sense of big data in lifescience

2. GGRNA: GooGle-like RNA search engine
   http://GGRNA.dbcls.jp/

• Quickly finds nucleotide sequences as well as other fields in RefSeq transcripts using a suffix array
• Easily highlights PCR primers, microarray probes and target sequences of siRNA

Naito Y. & Bono H. Nucleic Acids Res. (2012) 40: W592-6. doi: 10.1093/nar/gks448
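The suffix-array lookup mentioned above can be sketched in a few lines. This is a toy illustration of the general technique, not GGRNA's actual implementation: the construction below is the naive sort-all-suffixes approach, which is fine for short strings but far too slow for a full transcript collection, and the function names and example sequence are hypothetical.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of text, sorted lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, suffix_array: List[int], pattern: str) -> List[int]:
    """Return every 0-based position where pattern occurs, using binary search."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in [start, lo) begin with pattern.
    return sorted(suffix_array[start:lo])


sequence = "GATTACAGATTACA"  # hypothetical example sequence
sa = build_suffix_array(sequence)
print(find_occurrences(sequence, sa, "ATTA"))
```

Because the suffixes are sorted, all matches of a query (for example a primer or siRNA target sequence) sit in one contiguous block of the array, so each lookup needs only a logarithmic number of comparisons once the index is built.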

Page 17: Technology development of database integration to make sense of big data in lifescience

Genome version of GGRNA? Yes, we can!

GooGle like Genome search engine http://GGGenome.dbcls.jp/

Page 18: Technology development of database integration to make sense of big data in lifescience

3. RefEx: Reference Expression Dataset

• 40 organs dataset, 4 different methods, with BodyParts3D
  – Reference of gene expression in normal organs throughout the mammalian body
  – Practical example of reuse of useful public data
• The search for "tissue-specific genes"

EST: Classical Expressed Sequence Tags
GeneChip: Affymetrix’s microarray
CAGE: Cap Analysis of Gene Expression
RNAseq: Transcriptome Sequencing

http://RefEx.dbcls.jp/
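A search for "tissue-specific genes" needs some score of how concentrated a gene's expression is in one organ. The sketch below uses the tau index, a commonly used specificity score (0 for uniform expression, 1 for expression in a single tissue). This is an assumption chosen for illustration; the slides do not say which measure RefEx uses. Gene names, tissue names, expression values and the 0.8 threshold are all hypothetical.

```python
from typing import Dict, List


def tau_index(expression: List[float]) -> float:
    """Tau specificity score for non-negative expression values across tissues."""
    n = len(expression)
    peak = max(expression) if expression else 0.0
    if n < 2 or peak <= 0:
        return 0.0  # undefined cases are treated as non-specific
    return sum(1.0 - value / peak for value in expression) / (n - 1)


def tissue_specific(
    profiles: Dict[str, List[float]], tissues: List[str], threshold: float = 0.8
) -> Dict[str, str]:
    """Map each gene scoring at or above threshold to its peak-expression tissue."""
    hits = {}
    for gene, values in profiles.items():
        if tau_index(values) >= threshold:
            hits[gene] = tissues[values.index(max(values))]
    return hits


# Hypothetical toy data: one expression value per tissue, same order as TISSUES.
TISSUES = ["brain", "heart", "liver", "kidney"]
PROFILES = {
    "gene_A": [0.0, 0.0, 120.0, 0.0],    # expressed only in liver
    "gene_B": [50.0, 50.0, 50.0, 50.0],  # uniform, housekeeping-like
    "gene_C": [5.0, 80.0, 10.0, 5.0],    # mostly heart
}
print(tissue_specific(PROFILES, TISSUES))
```

With a reference dataset covering many organs, the same scoring would run once per measurement method (EST, GeneChip, CAGE, RNAseq), since values from different methods are not directly comparable.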

Page 19: Technology development of database integration to make sense of big data in lifescience

http://RefEx.dbcls.jp/

Page 20: Technology development of database integration to make sense of big data in lifescience

Page 21: Technology development of database integration to make sense of big data in lifescience

Page 22: Technology development of database integration to make sense of big data in lifescience

What we have developed / are developing

1. Yellow pages for NGS data archived
   http://SRA.dbcls.jp/
2. Search engine for nucleotide sequences
   http://GGRNA.dbcls.jp/
3. Summarization and visualization of reference transcriptome data
   http://RefEx.dbcls.jp/

Page 23: Technology development of database integration to make sense of big data in lifescience

TogoTV

Archive of talks and tutorial videos explaining how to use biological databases and tools
http://togotv.dbcls.jp/en/

Acknowledgement

• Members of DBCLS for technology development
• NBDC for funding / DDBJ for storage & CPU time
• Everyone who shares precious data

