Genome Browses and Data Display Andy Conley 3/26/2012 1


Page 1: Genome Browses and Data Display

1

Genome Browses and Data Display

Andy Conley

3/26/2012

Page 2: Genome Browses and Data Display

2

James Kent. Know that name.

He is one of the greatest, perhaps the greatest, bioinformatics programmers ever.

He was deeply involved in the assembly of the public human genome project.

If you were in the fall class, you compiled the James Kent source tree. Almost all of it is his.

He speaks nothing but the truth.

Who is this crazy looking guy?

Page 3: Genome Browses and Data Display

3

“Genome browsers facilitate genomic analysis by presenting alignment, experimental and annotation data in the context of genomic DNA sequences.”

Melissa S Cline & James W Kent, 2009

Genome browsers aggregate data

He knows what a genome browser should be

Page 4: Genome Browses and Data Display

4

The UCSC Genome Browser

Clicking on any of these takes you to a page full of details (example: CDKN2A)

Page 5: Genome Browses and Data Display

5

They are any kind of genomic information

Genes

Transposable element insertions

Transcription factor binding sites

Sites prone to recombination

Conservation of genomic sequences

Tracks displaying ChIP-seq or RNA-seq data are especially important today

Tracks don’t have to be genes

Page 6: Genome Browses and Data Display

6

Arguably the most advanced genome browser, it is much more than a tool for looking at genomes

It integrates a huge amount of data for each gene it displays.

UCSC also has a graphical front end for downloading from its huge back-end database

What’s good about the UCSC GB?

Page 7: Genome Browses and Data Display

7

It hosts the ENCODE project, one of the largest, probably the largest, assemblies of functional genomic data.

It lets you jump between orthologous regions in different genomes, e.g. CDKN2A

It has a massive database backend of over 6,500 tables.

The UCSC browser does so much more

Page 8: Genome Browses and Data Display

8

It’s really, really, really hard to install.

It’s impossible to understand unless you’ve tried to do it.

The UCSC genome browser works so well for the genomes that it has because it is so very, very specialized for those genomes.

Each track in the UCSC browser has been lovingly crafted.

So why aren’t there dozens of UCSC implementations?

Page 9: Genome Browses and Data Display

9

A ridiculous number of genomes

They’re going to be coming out even faster in the next year or two, then faster after that.

New sequencing technologies such as PacBio, which provide longer reads, should make assembling eukaryotic genomes easier.

There are many genomes out now

Page 10: Genome Browses and Data Display

10

You can’t load or annotate them by hand; it all has to be automated.

The UCSC team does it by hand for the human genome because it’s the human genome.

They’re all different from each other.

You have to have some easily deployable storage/display method for your data.

How do you handle so many genomes?

Page 11: Genome Browses and Data Display

11

There are a number of choices out there for a genome browser

There are really just 2 big ones:

UCSC

GMOD & GBrowse

We already discussed why you don’t use the UCSC browser for projects

Browser choices

Page 12: Genome Browses and Data Display

12

Generic – It can handle any organism

Model Organism – Not really; it works for whatever genome you have

Database – Not really a database, but there is a database in it.

GMOD just sounds good

gmod.org

Generic Model Organism Database

Page 13: Genome Browses and Data Display

13

A simple, easily deployable method for storing, viewing and editing genomic data.

GMOD has many, many parts

Some of the big ones:

Apollo – Eww

Chado – A mechanism for storing genomic data

GBrowse – A genome browser

So what is GMOD then?

Page 14: Genome Browses and Data Display

14

Probably (definitely) the most commonly used of the GMOD components

It is a simple but extensible platform for displaying genomic data

It is maintained mostly by this man: Scott Cain

GBrowse

Page 15: Genome Browses and Data Display

15

Many projects use GBrowse as their genome viewer

GBrowse installations

Page 16: Genome Browses and Data Display

16

WormBase is to the C. elegans genome what the UCSC browser is to the human and mouse genomes. It is huge.

WormBase

Page 17: Genome Browses and Data Display

17

FlyBase hosts many Drosophila genomes, though not with the depth of WormBase

WormBase is really at the top of non-UCSC browsers in its depth of information

This makes sense, given that nematodes are so heavily studied and very easy to work with.

FlyBase

Page 18: Genome Browses and Data Display

18

The result of the first couple years of the class

Currently maintained by Lee Katz at the CDC

NBase

Page 19: Genome Browses and Data Display

19

More from NBase

Page 20: Genome Browses and Data Display

20

You can use colors for information

Darker genes were flagged as horizontally transferred by more of the prediction programs

This shows genes that we thought were horizontally transferred
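One way to picture the "darker means more programs agreed" idea is as a mapping from a vote count to a gray level. The sketch below is only an illustration in Python; in GBrowse itself the color would be set in the track configuration, and the function name `shade_for_votes` and the vote counts are hypothetical.

```python
def shade_for_votes(votes, n_programs):
    """Return a hex gray: white for 0 votes, black when every program agrees.

    votes      -- how many prediction programs flagged the gene (hypothetical input)
    n_programs -- total number of programs that were run
    """
    if n_programs <= 0:
        raise ValueError("n_programs must be positive")
    # Clamp so that bad input cannot produce an out-of-range color.
    votes = max(0, min(votes, n_programs))
    # More votes -> lower gray level -> darker color.
    level = round(255 * (1 - votes / n_programs))
    return f"#{level:02x}{level:02x}{level:02x}"


# Made-up example: four programs, genes flagged by 0 to 4 of them.
for votes in range(5):
    print(votes, shade_for_votes(votes, 4))
```

A continuous gray scale is just one choice; a small fixed palette (for example, one color per vote count) would carry the same information and may be easier to read on screen.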

Page 21: Genome Browses and Data Display

21

We had a track of virulence factors in the first year

Clicking on any of them took you to details for the gene, a link to VFDB, etc.

You can also have specialized tracks

Page 22: Genome Browses and Data Display

22

You can alter how tracks are shown in other ways

Add and remove tracks, change the link that appears over a feature in the genome.

This goes beyond colors

Page 23: Genome Browses and Data Display

23

One big, important thing:

“Genome browsers facilitate genomic analysis by presenting alignment, experimental and annotation data in the context of genomic DNA sequences.”

Melissa S Cline & James W Kent, 2009

Genome browsers, in short, aggregate data.

What do all of these have in common?

Page 24: Genome Browses and Data Display

24

My rotifer transcriptome browser. It doesn’t have to be a genome.

Not super exciting from this view. Just the predicted coding region of an assembled contig (mRNA)

You can do even more customization

Page 25: Genome Browses and Data Display

25

All of this is in the conf

Page 26: Genome Browses and Data Display

26

The relative ordering of things in a genome.

Just a few years ago this was not available in GBrowse; it is now.

This could easily work for comparing different bacterial species

Synteny in GBrowse
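To make "relative ordering" concrete, here is a minimal Python sketch that finds runs of genes appearing in the same (or exactly reversed) order in two genomes. It is not how GBrowse_syn works internally; the function name `collinear_blocks`, the gene names, and the assumption that orthologs share a name in both genomes are all made up for illustration.

```python
def collinear_blocks(order_a, order_b, min_len=2):
    """Find runs of shared genes that are adjacent in both gene orders.

    order_a, order_b -- lists of gene names in chromosomal order; orthologs
                        are assumed to carry the same (unique) name.
    Genes present in only one genome are ignored. A run may be in the same
    orientation in both genomes or inverted in the second one.
    """
    set_a = set(order_a)
    shared_b = [g for g in order_b if g in set_a]
    pos_b = {gene: i for i, gene in enumerate(shared_b)}
    shared_a = [g for g in order_a if g in pos_b]

    blocks, current, step = [], [], 0
    for gene in shared_a:
        if current:
            diff = pos_b[gene] - pos_b[current[-1]]
            # Extend the run only if the gene is the immediate neighbor in B
            # and the direction matches the rest of the run.
            if diff in (1, -1) and (len(current) == 1 or diff == step):
                step = diff
                current.append(gene)
                continue
            if len(current) >= min_len:
                blocks.append(current)
        current = [gene]
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Made-up example: the second genome carries an inversion of genes d..g.
genome_a = ["a", "b", "c", "d", "e", "f", "g"]
genome_b = ["a", "b", "c", "g", "f", "e", "d"]
print(collinear_blocks(genome_a, genome_b))
```

When gene order is poorly conserved, a sketch like this returns few or no blocks, which is the situation where a synteny view adds little.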

Page 27: Genome Browses and Data Display

27

GBrowse_syn on TAIR

Page 28: Genome Browses and Data Display

28

It’s more interesting in WormBase

Page 29: Genome Browses and Data Display

29

Are genome browsers useful?

Page 30: Genome Browses and Data Display

30

We deal with huge volumes of data

The fall class will recall my hatred of GUIs

We want high-throughput

Genome browsers give you none of this. None.

We are bioinformaticists

Page 31: Genome Browses and Data Display

31

I spent quite a bit of time in undergrad doing bench work for Dr. Nils Kroger across the street.

I worked with these little guys:

Fascinating creatures

I cared about three genes: Sil1, Sil2, Sil3

The day the genome browser came out changed the game

I wasn’t always a computer nerd

Page 32: Genome Browses and Data Display

32

Still pretty useful

My main uses:

1. Make sure my data are correct. Are my intersections between genes and transposable element insertions correct?

2. Download hosted data.

3. Make nice pictures.

4. Like a biologist, look up information about specific genes.

How useful is it for us?
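The first use above, checking gene and transposable element intersections, comes down to an interval overlap test that the browser lets you confirm by eye. Below is a minimal Python sketch of that test, assuming 1-based inclusive coordinates; the function names, gene names, and coordinates are invented for illustration.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two 1-based, inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def intersect(genes, insertions):
    """Return (gene, insertion) name pairs that overlap on the same sequence.

    Each record is (name, chrom, start, end). This is a simple all-vs-all
    comparison, fine for spot checks but slow for genome-scale data.
    """
    hits = []
    for g_name, g_chrom, g_start, g_end in genes:
        for t_name, t_chrom, t_start, t_end in insertions:
            if g_chrom == t_chrom and overlaps(g_start, g_end, t_start, t_end):
                hits.append((g_name, t_name))
    return hits


# Made-up records for illustration only.
genes = [("geneA", "chr1", 100, 500),
         ("geneB", "chr1", 900, 1500),
         ("geneC", "chr2", 100, 500)]
insertions = [("TE1", "chr1", 450, 700),
              ("TE2", "chr1", 600, 800),
              ("TE3", "chr2", 501, 650)]
print(intersect(genes, insertions))
```

Note that TE3 starts one base after geneC ends, so it is not reported; off-by-one cases like this are exactly what is easy to verify visually in a browser.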

Page 33: Genome Browses and Data Display

33

How useful is it really?

It really depends on who you ask

It’s really for biologists: they find the browser, search for their favorite gene and get some details about it.

Once again, data aggregation.

In answer to the question

Page 34: Genome Browses and Data Display

34

They were super excited about it

They use it all the time

It is like magic to them. If you were to show an iPhone to somebody from 1975, it would be pretty much the same thing. Almost.

The rotifer browser

Page 35: Genome Browses and Data Display

35

Will it ever be the greatest genome browser? No. That will always be the UCSC browser.

Will it remain the easiest to install for some time? Probably.

Will you get the best return on time spent? Yep.

Synteny is very poorly conserved in Haemophilus, so avoid GBrowse_syn for this class, but do keep it in mind.

Conclusion of GBrowse

Page 36: Genome Browses and Data Display

36

Genome browsers:

Allow navigation of the genome

Show genomic features, whatever they are

Show annotations

Show comparisons

Just to make sure you’ve got it

Page 37: Genome Browses and Data Display

37

GBrowse, and all of GMOD, use GFF files

Generic Feature Format

Most of it is pretty simple:

Chromosome (contig), start, stop, strand, id

The last column is what’s important. It lets you put whatever information about the feature you want in there.

It’s a very flexible format.

Database backends
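As a rough illustration of the format, here is a minimal Python sketch that parses one feature line, assuming the GFF3 flavor (nine tab-separated columns, with the last column holding semicolon-separated `tag=value` pairs). The function name `parse_gff3_line` and the example feature are made up; a real pipeline would more likely use an existing GFF library.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Parse one tab-separated GFF3 feature line into a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols

    # The last column is free-form: tag=value pairs separated by ';'.
    attributes = {}
    for pair in attrs.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # GFF3 percent-encodes reserved characters in values.
        attributes[key.strip()] = unquote(value.strip())

    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end),  # 1-based, inclusive
        "score": score, "strand": strand, "phase": phase,
        "attributes": attributes,
    }


# Made-up feature line for illustration.
example = ("contig1\tprediction\tgene\t1200\t2500\t.\t+\t.\t"
           "ID=gene0001;Name=hypA;Note=hypothetical%20protein")
feature = parse_gff3_line(example)
print(feature["seqid"], feature["start"], feature["end"], feature["strand"])
print(feature["attributes"])
```

The attributes dictionary is where the flexibility lives: anything you want shown for a feature can be added as another `tag=value` pair.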

Page 38: Genome Browses and Data Display

38

Thanks for listening

Questions?