Genome Content Management: A Tale of Small RNA

William Spooner, [email protected], @whsqwghlm

eSI workshop on data flows in NGS


Page 1: Genome Content Management: A Tale of Small RNA

Genome Content Management: A Tale of Small RNA

William Spooner

[email protected]

@whsqwghlm

eSI workshop on data flows in NGS

Page 2: Genome Content Management: A Tale of Small RNA

Eagle: An Open Source Business

• Consultancy/advice

• Training

• Support

• Installation/Integration

• Customization

• Outsourced management

Diagram labels: Business, Open Community (e.g. Academia), Service Company, Service, Collaboration.

Page 3: Genome Content Management: A Tale of Small RNA

The Sequencing Cliff

$1000 genome ≈ $0.01 per Mbase (30x coverage)
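The per-megabase figure follows from simple arithmetic. A minimal sketch, assuming a human genome of roughly 3,000 Mbase (an assumption; the slide gives only the price and the coverage):

```python
# Rough cost per sequenced megabase for a $1000 genome at 30x coverage.
# GENOME_SIZE_MBASE is an assumed round figure, not taken from the slide.
GENOME_SIZE_MBASE = 3_000   # ~3 Gbase human genome (assumption)
COVERAGE = 30               # from the slide
GENOME_PRICE_USD = 1_000    # from the slide


def cost_per_mbase(price_usd: float, genome_mbase: float, coverage: float) -> float:
    """Price divided by the total number of megabases actually sequenced."""
    return price_usd / (genome_mbase * coverage)


cost = cost_per_mbase(GENOME_PRICE_USD, GENOME_SIZE_MBASE, COVERAGE)
print(f"${cost:.3f} per Mbase")  # prints "$0.011 per Mbase"
```

At 30x coverage a 3,000 Mbase genome means about 90,000 Mbase of raw sequence, which is where the roughly one cent per Mbase comes from.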

Page 4: Genome Content Management: A Tale of Small RNA

Bioinformatics Crash Landing?

What needs to change?

The following must increase:

1. Number/efficiency of bioinformaticians,

2. Hardware scalability,

3. Software quality.

Page 5: Genome Content Management: A Tale of Small RNA

Survey – Attitudes to OS Software

Figure: bar chart of survey responses (% of respondents) rating each attribute as Irrelevant, Useful, or Important. Attributes shown: Stability/reliability, Scientific validation, Computational efficiency, Easy to install/maintain, Visual representation, Security, Integration, Ease of use, Availability of training, Commercial support.

Technical attributes: WIN

Usability attributes: LOSE

Source - http://eaglegenomics.com/survey

Page 6: Genome Content Management: A Tale of Small RNA

BUT…CAN THIS APPROACH SCALE?

Bioinformaticians like to:

• Develop their own solutions,

• Using open-source software,

• That’s stable, reliable, and published.

Bioinformaticians don’t like to:

• Develop user-friendly, supported software,

• Or train others to use it.

Page 7: Genome Content Management: A Tale of Small RNA

Is this the Answer?

“Genome Content Management is the set of processes and technologies that support the creating, managing, and reporting of genomic data.”

Diagram: a Create / Manage / Report cycle, followed by an expanded cycle of Create, Manage, Report, Extend, Share, and Reuse.

TIMELINE: Bespoke … Common Models … Content Management Systems

Page 8: Genome Content Management: A Tale of Small RNA

Genome Content Management Systems (G-CMS)

Figure: G-CMS landscape plotted on two axes, Workflow Oriented vs. Database Oriented and Open Source vs. Proprietary.

Page 9: Genome Content Management: A Tale of Small RNA

Ensembl as a GCMS

Comparative Genomics

Functional Genomics

Variation

Assembly/Genes

Data Integration

Data Reporting

Data Analysis

Data Querying

Data QC

API

Page 10: Genome Content Management: A Tale of Small RNA

Auto-Scaling Ensembl

Diagram: the Databases / API / Activities stack shown in several copies, with eHive alongside multiple Activities boxes.

Page 11: Genome Content Management: A Tale of Small RNA

miRNA Prediction

1. Align small RNA reads to genome, store expressed regions

2. Test regions for stable secondary structure, store transcripts

3. Add further metrics to each transcript, e.g.:

MiRDeep (Friedländer et al. 2008)

MiPred (Jiang et al. 2007)

Drosha Site Finder (Helvik et al. 2007)

RNAFold/RNAeval (Gruber et al. 2008)
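The alignment step (storing expressed regions) can be pictured as merging overlapping read alignments into regions. The sketch below is an illustration only, not the pipeline's actual code: the tuple layout, the function name `expressed_regions`, and the `min_reads` threshold are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

Alignment = Tuple[str, int, int]      # (chromosome, start, end), end exclusive
Region = Tuple[str, int, int, int]    # (chromosome, start, end, read_count)


def expressed_regions(alignments: List[Alignment], min_reads: int = 2) -> List[Region]:
    """Merge overlapping alignments into regions; keep those with enough reads."""
    regions: List[Region] = []
    current: Optional[Region] = None
    for chrom, start, end in sorted(alignments):
        if current and chrom == current[0] and start < current[2]:
            # Overlaps the open region: extend it and count the read.
            current = (chrom, current[1], max(current[2], end), current[3] + 1)
        else:
            # Close the previous region (if well supported) and open a new one.
            if current and current[3] >= min_reads:
                regions.append(current)
            current = (chrom, start, end, 1)
    if current and current[3] >= min_reads:
        regions.append(current)
    return regions


# Hypothetical coordinates, chosen only to exercise the function.
reads = [("19", 100, 122), ("19", 105, 127), ("19", 150, 171), ("19", 400, 422)]
print(expressed_regions(reads))  # prints [('19', 100, 127, 2)]
```

Each stored region would then be passed to the secondary-structure test; that step depends on an external folding tool and is not sketched here.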

Page 12: Genome Content Management: A Tale of Small RNA

Ensembl eHive-based miRNA Prediction from Small RNA-Seq

Page 13: Genome Content Management: A Tale of Small RNA

Example Prediction: G.ga Chr. 19

gga-mir-142: CACAGTACACTCATCCATAAAGTAGGAAACACTACACCCTGCAGTGCTGTTTAGTAGTGCTTTCTACTTTATGGGTGACTGCACTGTC

gga-miR-142-3p/gga-miR-142-5p: CCATAAAGTAGGAAACACTACA GTAGTGCTTTCTACTTTATGGG

IAH_G0249505300          .((.((.((((((((((((((((((.((((((.....(((....)))...)))))).)))))))))))))))))).)).))
IAH_G0249505300          AGTACACTCATCCATAAAGTAGGAAACACTACACCCTGCAGTGCTGTTTAGTAGTGCTTTCTACTTTATGGGTGACTGCAC
KN-1523_BP25_NR000026458 AGTACACTCATCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000050389 .GTACACTCATCCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000037148 ..TACACTCATCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000055284 .......TCATCCATAAAGTAGGAAACACT...................................................
KN-1523_BP25_NR000055535 ........CATCCATAAAGTAGGAAACACTA..................................................
KN-1523_BP25_NR000026404 ........CATCCATAAAGTAGGAAACACT...................................................
KN-1523_BP25_NR000044105 ........CATCCATAAAGTAGGAAACACT...................................................
KN-1523_BP25_NR000031787 ........CATCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000031957 .........ATCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000022166 .........ATCCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000045530 .........ATCCATAAAGTAGGAAACACTA..................................................
KN-1523_BP25_NR000046154 .........ATCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000037676 ..........TCCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000032163 ..........TCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000065952 ..........TCCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000064893 ..........TCCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000060177 ..........TCCATAAAGTAGGAAACACTA..................................................
KN-1523_BP25_NR000054812 ...........CCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000058317 ...........CCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000007107 ...........CCATAAAGTAGGAAACACTA..................................................
KN-1523_BP25_NR000055367 ...........CCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000016790 ...........CCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000041436 ...........CCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000034949 ...........CCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000008209 ...........CCATAAAGTAGGAAACAC....................................................
KN-1523_BP25_NR000060347 ...........CCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000002912 ...........CCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000006332 ...........CCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000038759 ............CATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000049956 ............CATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000048198 .............ATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000042788 ...............................................TTAGTAGTGCTTTCTACTTTAT............
KN-1523_BP25_NR000051592 .................................................AGTAGTGCTTTCTACTTTATG...........
KN-1523_BP25_NR000064475 .................................................AGTAGTGCTTTCTACTTTATG...........
KN-1523_BP25_NR000065559 ..................................................GTAGTGCTTTCTACTTTATG...........
KN-1523_BP25_NR000031530 ....................................................AGTGCTTTCTACTTTATG...........
KN-1523_BP25_NR000049786 ....................................................AGTGCTTTCTACTTTATGGG.........
KN-1523_BP25_NR000000111 .....................................................GTGCTTTCTACTTTATGGG.........

Legend: Known miRNA, Predicted miRNA, Drosha site
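The read pile-up on this slide pads each read with dots so that it lines up under the predicted precursor. A minimal sketch of that layout, assuming exact matching with `str.find` (a simplification; the function name `render_read` is hypothetical and a real aligner would also handle mismatches and multiple hits):

```python
def render_read(precursor: str, read: str) -> str:
    """Return the read padded with dots to its position on the precursor."""
    offset = precursor.find(read)
    if offset < 0:
        raise ValueError("read does not match the precursor exactly")
    return "." * offset + read + "." * (len(precursor) - offset - len(read))


# Sequences copied from the slide (predicted precursor and the two mature arms).
precursor = ("AGTACACTCATCCATAAAGTAGGAAACACTACACCCTGCAGTGCTGTTT"
             "AGTAGTGCTTTCTACTTTATGGGTGACTGCAC")
reads = ["CCATAAAGTAGGAAACACTACA", "GTAGTGCTTTCTACTTTATGGG"]

print(precursor)
for read in reads:
    print(render_read(precursor, read))
```

Every rendered row has the same length as the precursor, so the rows can be stacked directly under the dot-bracket structure line.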

Page 14: Genome Content Management: A Tale of Small RNA

Conclusions

• This talk:

- Started with a business model,

- Ended with a sequence annotation.

• The two are linked by content management:

- Reuse

- Extend

- Share

Page 15: Genome Content Management: A Tale of Small RNA

Acknowledgements

• Mick Watson

- Institute of Animal Health

• Nick James

- Eagle Genomics Ltd.

• Madhu Donepudi

- Eagle Genomics Ltd.

Page 16: Genome Content Management: A Tale of Small RNA

Bioinformatics in the age of the $1000 Genome?

http://eaglegenomics.com/survey