Http://cs273a.stanford.edu 1 UCSC Genome Browser Tutorial The UCSC Toolset & Portal to the...

Preview:

Citation preview

http://cs273a.stanford.edu 1

UCSC Genome Browser Tutorial

http://genome.ucsc.edu/

http://genome-test.cse.ucsc.edu/

The UCSC Toolset & Portalto the Human Genome

• Genome Browser• Table Browser

“I was blind and now I can see”

http://cs273a.stanford.edu 2

UCSC Genome Browser

[version9a]

http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml

3

The UCSC Homepage: http://genome.ucsc.edu

navigate

navigateGeneral information

Specific information—new features, current status, etc.

4

The Genome Browser Gatewaystart page choices, December 2006

Make your Gateway choices:

1. Select Clade

2. Select species: search 1 species at a time

3. Assembly: the official backbone DNA sequence

1 2 3

practically speaking, there is no such thing as a genome.

there is only a genome assembly. assemblies update.

frequently. think moving target...

5

Everything in Genomics is a Moving Target

The genomes Their annotations The Portals Our understanding of Biology

Conclusion:

write code

that can be

run...

and rerun

and rerun

and rerun

and rerun

6

The Genome Browser Gatewaystart page, basic search

7

The Genome Browser Gatewaystart page choices, December 2006

Make your Gateway choices:

1. Select Clade

2. Select species: search 1 species at a time

3. Assembly: the official backbone DNA sequence

4. Position: location in the genome to examine

5. Image width: how many pixels in display window; 5000 max

6. Configure: make fonts bigger + other choices

4 5

6

8

The Genome Browser Gatewaystart page, basic search

text/ID searches

Helpful search examples,

suggestions below

Use this Gateway to search by: Gene names, symbols Chromosome number: chr7, or region: chr11:1038475-

1075482 Keywords: kinase, receptor IDs: NP, NM, OMIM, and more…

See lower part of page for help with format

4

9

The Genome Browser Gatewaysample search for Human TP53

Sample search: human, March 2006 assembly, tp53

select

Select from results list ID search may go right to a viewer page, if unique

10

Overview of the wholeGenome Browser page

(mature release)

}Genome viewer section

mRNA and EST Tracks

Expression and Regulation

Comparative Genomics

Variation and Repeats

Groups of data

Mapping and Sequencing Tracks

Genes and Gene Prediction Tracks

ENCODE Tracks

11

Different species, different tracks, same software

Species may have different data tracks Layout, software, functions the same

12

Sample Genome Viewer image, TP53 region

base positionSTS markers

Known genes

RefSeq genes

GenBank seqs

repeats

17 species compared

SNPs

single species compared

13

Visual Cues on the Genome Browser

Track colors may have meaning—for example, Known Gene track:

•If there is a corresponding PDB entry, = black•If there is a corresponding NCBI Reviewed seq, = dark blue•If there is a corresponding NCBI Provisional seq, = light blue

Tick marks; a single location (STS, SNP)

For some tracks, the height of a bar is increased likelihoodof an evolutionary relationship (conservation track)

Intron, and direction of transcription <<< or >>>

<exon exon exon< < < < < < <ex 5' UTR3' UTR

14

Options for Changing Images: Upper Section

Change your view or location with controls at the top Use “base” to get right down to the nucleotides Configure: to change font, window size, more…

Specifya

position

fonts,window,

more

Walkleft orright

Zoomin

Zoomout

click tozoom 3x

and re-center

15

Annotation Track display options

Some data is ON or OFF by default

Links to infoand/or filters

Menu links to info about the tracks: content, methods You change the view with pulldown menus

enforcechanges

After making changes, REFRESH to enforce the change

Change track view

16

Annotation Track options, defined Hide: removes a track from view

Dense: all items collapsed into a single line

Squish: each item = separate line, but 50% height + packed

Pack: each item separate, but efficiently stacked (full height)

Full: each item on separate line

17

Reset, Hide, Configure or Refresh to change settings

You control the views Use pulldown menus Configure options page

reset, back to defaults start from

scratch

enforce any changes (hide, full, squish…)

18

Annotation Track options, if altered….important point: the browser remembers!

Session information (the position you were examining) Track choices (squish, pack, full, etc) Filter parameters (if you changed the colors of any items, or the

subset to be displayed) …are all saved on your computer. When you come back in a

couple of days to use it again, these will still be set. You may—or may not—intend this.

To clear your “cart” or parameters, click default tracks

OR

19

Saved Sessions

20

Click Any Viewer Object for Details

Example: click your mouse anywhere on the TP53 line

Click the item

New web page

opens

Many details and links to more data about TP53

21

Click annotation track item for details pages

Not all genes have This much detail.

Different annotation tracks

carry different data.

informativedescriptionother resource links

microarray data

mRNA secondary structure

links to sequences

protein domains/structure

homologs in other species

Gene Ontology™ descriptions

mRNA descriptions

pathways

22

Get DNA, with Extended Case/Color Options

Use the DNA link at the top

Plain or Extended options

Change colors, fonts, etc.

23

Get Sequence from Details Pages

Click a track, go to Sequence section of details page

Click the line Click the item

sequence sectionon detail page

24

Accessing the BLAT tool

Rapid searches by INDEXING the entire genome Works best with high similarity matches See documentation and publication for details

Kent, WJ. Genome Res. 2002. 12:656

BLAT = BLAST-like Alignment Tool

25

BLAT tool overview: www.openhelix.com/sampleseqs.html

Make choices

DNA limit 25000 basesProtein limit 10000 aa25 total sequences

Paste one or more

sequences

Or upload

submit

26

BLAT results, with links

Results with demo sequences, settings default; sort = Query, Score Score is a count of matches—higher number, better

match Click browser to go to Genome Browser image location (next slide) Click details to see the alignment to genomic sequence (2nd slide)

sorting

go

to b

row

ser/

vie

we

r

go

to a

lign

me

nt d

eta

il

27

BLAT results, browser link

From browser click in BLAT results A new line with your Sequence from BLAT Search appears!

query

click to flip frame

Watch out for reading frame! Click - - - > to flip frame Base position = full and zoomed in enough to see

amino acids

28

BLAT results,alignment details

Your query

Genomic match, color cues

Side-by-side alignment

yoursgenomic

29

Understand Blat’s Limitation

Blat was designed to rapidly align sequence from onegenome back to itself (e.g., EST/cDNA data)

It can and it does miss clear hits at times

Blat actually allows for a single mismatch, but it alsoremoves k-mers with excessive counts for efficiency.

Not suitable for cross-species mapping.

30

Bunch More Goodies – Click Around

31

Bibliography:

http://genome.ucsc.edu/goldenPath/pubs.html The UCSC Genome Browser Database:

update 2008, update 2007, and earlier. UCSC Genome Browser Tutorial UCSC Genome Browser: Deep support

for molecular biomedical research The UCSC Known Genes, 2006. The UCSC Gene Sorter, 2007. Piloting the Zebrafish Genome Browser,

2006.

32

UCSC Genome Browser

[version9a]

33

Genome Browser Database

Primary table: positions, names, etc.

UnderlyingDatabase(MySQL)

Auxiliary table: related data

visualize search & download

34

The Table Browser

Open browser

Open browser

http://genome.ucsc.edu/

35

Table Browser: Choose Genome

In the Human genome (hg16),

search for simple repeats on a chromosome 4 location

with copy number more than 10and download the sequence.

In the Human genome (hg16),

search for simple repeats on a chromosome 4 location

with copy number more than 10and download the sequence.

Choose Genome

36

Table Browser: Choose Table to Search

In the Human genome (hg16),

search for simple repeats on a chromosome 4 location

with copy number more than 10and download the sequence.

Choose Data Table

37

Table Browser: Describe Table

Describe table

38

Table Browser: Choose Region to Search

In the Human genome (hg16), search for simple repeats

on a chromosome 4 locationwith copy number more than 10

and download the sequence.

Choose Region to Search

39

Table Browser: Upload Locations to Search

Paste Upload

40

Table Browser: Filter to Refine Search

Create Filter

In the Human genome (hg16), search for simple repeats

on a chromosome 4 locationwith copy number more than 10

and download the sequence.

Submit Filter

41

Table Browser: Output Data

Output data

In the Human genome (hg16), search for simple repeats

on a chromosome 4 locationwith copy number more than 10

and download the sequence.

42

Table Browser: Output Formats

Output formats

Text Fields

43

Table Browser: Fasta Sequence Output

Sequence

44

Table Browser: Database Format Outputs

Database

45

Table Browser: Custom Track Output

Custom Track

46

Table Browser: Hyperlinks Output

Hyperlinks

47

Table Browser: Obtaining Output

Adding name creates file on desktop,leaving blank creates output in browser.

(exception: custom track)

Data Summary

48

Table Browser: Output configuration

Sequence Format

Get Sequence

49

Table Browser: Intersecting Data

Find simple repeats (copy number > 10) within known genes

and download the sequence.

Intersect

2nd Table

Any Overlap

Submit

50

Table Browser: Intersecting Data Narrows Search

Filtered simple repeats, intersected (overlapping)

w/ known genes

Summary

Filtered simple repeats

51

Table Browser: Downloading Sequence Data

Sequence Format

Get Sequence

52

Table Browser: Correlating Data Tables

Correlate 2 Datasets

Get Results

53

Custom Tracks: Table Browser Searches

Create Track

Get Output

54

Custom Tracks: Name and Configure Track

Download track file to desktop

Name Track:SRepeatKGenes

Describe Track:Intersection …

Choose defaultview in browser

In Genome Browser

55

Custom Tracks: Open Track in Genome Browser

Open Details

Compare

“…caused by anexpanded, unstable

trinucleotide repeat…”

56

Custom Tracks: Track in Table Browser

Custom tracks also are available for filtering and intersections

on the Table Browser

57

Custom Tracks: User-generated Data in Track

Custom Tracks Link

Custom Track How-to

58

Custom Tracks: Four Steps to Create Track

Four steps to create a custom trackDefine track characteristics Define browser characteristicsFormat your dataUpload and view your track

59

Custom Tracks: Submit Track

Copy and pastesmall or simple tracks

Submit File

http://genome.ucsc.edu/FAQ/FAQformat

60

Custom Tracks: Track Appears in Genome Browser

61

Custom Tracks: Track Characteristics

Default view of custom track is “pack”

Default viewof other tracks set

62

Custom Tracks: Track Appears in Table Browser

Custom Track alsoappears in

Table Browser

63

Custom Tracks from Outside Sources

Custom Tracks Link

Contributed Track

64

Bibliography:

http://genome.ucsc.edu/goldenPath/pubs.html The UCSC Table Browser, 2004. Bejerano et al., Nature Methods, 2005. The UCSC Proteome Browser Phylogenomic Resources at the UCSC Genome

Browser