Genome Content Management: A Tale of Small RNA


William Spooner, whs@eaglegenomics.com

@whsqwghlm

eSI workshop on data flows in NGS

Eagle: An Open Source Business

Consultancy/advice, Training, Support, Installation/Integration, Customization, Outsourced management

Business / Open Community (e.g. Academia)

Service Company

Service / Collaboration

The Sequencing Cliff

$1000 genome ≈ $0.01 per Mbase (30x coverage)
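The per-megabase figure above can be checked with a short calculation. This is a minimal sketch, assuming a haploid human genome of roughly 3,200 Mbase; that genome size and the function name `cost_per_mbase` are illustrative assumptions, not taken from the slide.

```python
def cost_per_mbase(price_usd: float, genome_mbase: float, coverage: float) -> float:
    """Price divided by the total sequenced megabases (genome size x coverage)."""
    return price_usd / (genome_mbase * coverage)


# Assumption: haploid human genome of roughly 3,200 Mbase (not stated on the slide).
HUMAN_GENOME_MBASE = 3_200

# $1000 genome at 30x coverage, as on the slide.
cost = cost_per_mbase(price_usd=1000, genome_mbase=HUMAN_GENOME_MBASE, coverage=30)
print(f"${cost:.3f} per Mbase")
```

Under that assumption, 30x coverage means about 96,000 Mbase sequenced per genome, which is where the roughly one cent per Mbase comes from.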

Bioinformatics Crash Landing?

What needs to change?

The following must increase:

1. Number/efficiency of bioinformaticians,

2. Hardware scalability,

3. Software quality.

Survey – Attitudes to OS Software

[Bar chart: percentage of respondents (axis 0 to 120 %) rating each attribute as Irrelevant, Useful, or Important. Attributes: Stability/reliability; Scientific validation; Computational efficiency; Easy to install/maintain; Visual representation; Security; Integration; Ease of use; Availability of training; Commercial support.]

Technical attributes win

Usability attributes lose

Source - http://eaglegenomics.com/survey

BUT…CAN THIS APPROACH SCALE?

Bioinformaticians like to:

• Develop their own solutions,

• Using open-source software,

• That’s stable, reliable, and published.

Bioinformaticians don’t like to:

• Develop user-friendly, supported software,

• Or train others to use it.

Is this the Answer?

“Genome Content Management is the set of processes and technologies that support the creating, managing, and reporting of genomic data.”

[Diagram: the Create / Manage / Report cycle, followed by the same cycle with Extend, Share, and Reuse added.]

TIMELINE: Bespoke … Common Models … Content Management Systems

Genome Content Management Systems (G-CMS)

[Diagram: G-CMS landscape with axes Workflow Oriented vs. Database Oriented and Open Source vs. Proprietary.]

Ensembl as a GCMS

Comparative Genomics

Functional Genomics

Variation

Assembly/Genes

Data Integration

Data Reporting

Data Analysis

Data Integration

Data Querying

Data QC

API

Auto-Scaling Ensembl

[Diagram: repeated Ensembl stacks, each with Databases, Activities, and API, alongside eHive running multiple Activities.]

miRNA Prediction

Ensembl eHive-based miRNA prediction from small RNA-Seq:

1. Align small RNA reads to genome, store expressed regions

2. Test regions for stable secondary structure, store transcripts

3. Add further metrics to each transcript, e.g.:

MiRDeep (Friedländer et al. 2008)

MiPred (Jiang et al. 2007)

Drosha Site Finder (Helvik et al. 2007)

RNAfold/RNAeval (Gruber et al. 2008)
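The "align reads, store expressed regions" step can be pictured as collapsing overlapping read alignments into regions. The following is a minimal sketch of that idea only; it is not the Ensembl eHive pipeline's code, and the function name `merge_into_regions`, the `max_gap` parameter, and the coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one strand of one chromosome


def merge_into_regions(alignments: Iterable[Interval], max_gap: int = 0) -> List[Interval]:
    """Collapse read alignments into expressed regions.

    Reads that overlap, or lie within max_gap bases of the current region,
    are merged into it. max_gap is a hypothetical tuning parameter.
    """
    regions: List[Interval] = []
    for start, end in sorted(alignments):
        if regions and start <= regions[-1][1] + max_gap:
            # Extend the current region to cover this read.
            prev_start, prev_end = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end))
        else:
            # Start a new region.
            regions.append((start, end))
    return regions


# Illustrative coordinates only (not real genomic positions).
reads = [(0, 32), (1, 33), (47, 69), (49, 70), (200, 222)]
strict_regions = merge_into_regions(reads)
hairpin_regions = merge_into_regions(reads, max_gap=20)
```

With `max_gap=0` the two arms of a hairpin come out as separate regions; allowing a small gap joins them into one candidate precursor, which is the kind of region a secondary-structure test would then be run on.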

Example Prediction: G.ga Chr. 19

gga-mir-142 CACAGTACACTCATCCATAAAGTAGGAAACACTACACCCTGCAGTGCTGTTTAGTAGTGCTTTCTACTTTATGGGTGACTGCACTGTC

gga-miR-142-3p/gga-miR-142-5p CCATAAAGTAGGAAACACTACA GTAGTGCTTTCTACTTTATGGG

IAH_G0249505300          .((.((.((((((((((((((((((.((((((.....(((....)))...)))))).)))))))))))))))))).)).))
IAH_G0249505300          AGTACACTCATCCATAAAGTAGGAAACACTACACCCTGCAGTGCTGTTTAGTAGTGCTTTCTACTTTATGGGTGACTGCAC
KN-1523_BP25_NR000026458 AGTACACTCATCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000050389 .GTACACTCATCCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000037148 ..TACACTCATCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000055284 .......TCATCCATAAAGTAGGAAACACT...................................................
KN-1523_BP25_NR000055535 ........CATCCATAAAGTAGGAAACACTA..................................................
KN-1523_BP25_NR000026404 ........CATCCATAAAGTAGGAAACACT...................................................
KN-1523_BP25_NR000044105 ........CATCCATAAAGTAGGAAACACT...................................................
KN-1523_BP25_NR000031787 ........CATCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000031957 .........ATCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000022166 .........ATCCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000045530 .........ATCCATAAAGTAGGAAACACTA..................................................
KN-1523_BP25_NR000046154 .........ATCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000037676 ..........TCCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000032163 ..........TCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000065952 ..........TCCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000064893 ..........TCCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000060177 ..........TCCATAAAGTAGGAAACACTA..................................................
KN-1523_BP25_NR000054812 ...........CCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000058317 ...........CCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000007107 ...........CCATAAAGTAGGAAACACTA..................................................
KN-1523_BP25_NR000055367 ...........CCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000016790 ...........CCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000041436 ...........CCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000034949 ...........CCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000008209 ...........CCATAAAGTAGGAAACAC....................................................
KN-1523_BP25_NR000060347 ...........CCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000002912 ...........CCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000006332 ...........CCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000038759 ............CATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000049956 ............CATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000048198 .............ATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000042788 ...............................................TTAGTAGTGCTTTCTACTTTAT............
KN-1523_BP25_NR000051592 .................................................AGTAGTGCTTTCTACTTTATG...........
KN-1523_BP25_NR000064475 .................................................AGTAGTGCTTTCTACTTTATG...........
KN-1523_BP25_NR000065559 ..................................................GTAGTGCTTTCTACTTTATG...........
KN-1523_BP25_NR000031530 ....................................................AGTGCTTTCTACTTTATG...........
KN-1523_BP25_NR000049786 ....................................................AGTGCTTTCTACTTTATGGG.........
KN-1523_BP25_NR000000111 .....................................................GTGCTTTCTACTTTATGGG.........

Known miRNA

Predicted miRNA

Drosha site
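The example prediction can be inspected with a few lines of code: parse the dot-bracket structure into base pairs and locate the known mature sequences within the known hairpin. This is a minimal sketch using only the strings shown on the slide; the function names are hypothetical, and it does not reproduce RNAfold, MiRDeep, or any other tool named above.

```python
from typing import List, Tuple

# Strings copied from the example prediction on the slide.
KNOWN_HAIRPIN = ("CACAGTACACTCATCCATAAAGTAGGAAACACTACACCCTGCAGTGCTGTTTAG"
                 "TAGTGCTTTCTACTTTATGGGTGACTGCACTGTC")
KNOWN_MATURES = ["CCATAAAGTAGGAAACACTACA", "GTAGTGCTTTCTACTTTATGGG"]
PREDICTED_STRUCTURE = ".((.((.((((((((((((((((((.((((((.....(((....)))...)))))).)))))))))))))))))).)).))"


def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return (open_index, close_index) base pairs from a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def paired_fraction(structure: str) -> float:
    """Fraction of positions that take part in a base pair."""
    return 2 * len(parse_dot_bracket(structure)) / len(structure)


# 0-based offsets of each known mature sequence within the known hairpin.
mature_offsets = [KNOWN_HAIRPIN.find(seq) for seq in KNOWN_MATURES]
print(mature_offsets, round(paired_fraction(PREDICTED_STRUCTURE), 2))
```

A high paired fraction is one crude indicator of a stable hairpin; the metrics listed on the miRNA Prediction slide would be computed by the dedicated tools rather than by a check like this.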

Conclusions

• This talk:

• Started with a business model,

• Ended with a sequence annotation,

• The two are linked by content management:

• Reuse

• Extend

• Share

Acknowledgements

• Mick Watson

- Institute of Animal Health

• Nick James

- Eagle Genomics Ltd.

• Madhu Donepudi

- Eagle Genomics Ltd.

Bioinformatics in the age of the $1000 Genome?

http://eaglegenomics.com/survey
