23
2013 & BMMB 597D: Analyzing Next Genera<on Sequencing Data Week 1, Lecture 2 István Albert Bioinforma<cs Consul<ng Center Penn State

%Week%1,%Lecture%2% - Pennsylvania State University

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: %Week%1,%Lecture%2% - Pennsylvania State University

2013%&%BMMB%597D:%Analyzing%Next%Genera<on%Sequencing%Data%

%

%Week%1,%Lecture%2%

István'Albert''

Bioinforma<cs%Consul<ng%Center%

%

Penn%State%

Page 2: %Week%1,%Lecture%2% - Pennsylvania State University

Biological%file%formats%

Each%file%format%represents%

%

1.   Informa3on%–%types%of%knowledge%that%are%stored%in%the%file%

%

2.   Op3miza3on%–%%types%of%opera<ons%that%are%easy/efficient%to%perform%

The%above%implies%that%some%informa<on%may%not%

be%present%or%cannot%be%easily%extracted%from%a%

certain%file%format.%

Page 3: %Week%1,%Lecture%2% - Pennsylvania State University

Download%your%first%data%file%

Saccharomyces%Cerevisae%Feature%File%

You%can%use%the%a%browser%or%the%command%line:%

%

(The%URL%is%on%the%course%webpage)%

%

h:p://downloads.yeastgenome.org/cura3on/chromosomal_feature/saccharomyces_cerevisiae.gff''%

Page 4: %Week%1,%Lecture%2% - Pennsylvania State University

What%is%this?%%

Look%at%the%extension%!%a%GFF%file.%

•  Due%to%a%historical%limita<on%(20%years%or%so%ago)%

and%only%on%Windows%!%files%ended%up%having%a%

three%character%extension%!%.txt,%.exe'etc.%

•  This%limita<on%also%turned%out%to%be%a%blessing%

and%it%stood%the%test%of%<me.%Makes%it%easy%to%see%

the%file%type.%

•  Note:%the%file%extension%can%be%incorrect%!%mind&

bogglingly%confusing%errors%may%arise%then.%

Page 5: %Week%1,%Lecture%2% - Pennsylvania State University

Use%unix%command%to%“poke”%at%the%file%

Page 6: %Week%1,%Lecture%2% - Pennsylvania State University

Tabular%formats%

•  Many%common%bioinforma<cs%data%formats%are%

column%based%and%tab&separated%

%

•  First%format%we%deal%with%will%be%the%%

GFF3'–'Generic''Feature''Format'

(search%for%GFF3%to%see%the%specifica<on%for%version%3)%

%

hap://www.sequenceontology.org/gff3.shtml%

%

Page 7: %Week%1,%Lecture%2% - Pennsylvania State University

The%GFF3%specifica<on%

Page 8: %Week%1,%Lecture%2% - Pennsylvania State University

GFF%format%

Search%for%GFF3%!%hap://www.sequenceontology.org/gff3.shtml%

Tab%separated%with%9%columns.%Missing%aaributes%may%be%replaced%with%a%%dot%!%.%

1.   Seqid'%%%%%%%%%%(usually%chromosome)%

2.   Source%%%%%%%%%(where%is%the%data%coming%from)%

3.   Type%%%%%%%%%%%%%(usually%a%term%from%the%sequence%ontology)%

4.   Start''%%%%%%%%%%%(interval%start%rela<ve%to%the%seqid)%5.   End''''%%%%%%%%%%%(interval%end%rela<ve%to%the%seqid)%6.   Score'''%%%%%%%%%(the%score%of%the%feature,%a%floa<ng%point%number)%

7.   Strand''%%%%%%%%(+/&/.)%8.   Phase'''''''%%%%(used%to%indicate%reading%frame%for%coding%sequences)%

9.   A:ributes%%%%(semicolon%separated%aaributes%!%Name=ABC;ID=1)%

Example%aaribute%specifica<on:%name=REB1;id=YP33546

Page 9: %Week%1,%Lecture%2% - Pennsylvania State University

What%do%the%terms%mean?%

Page 10: %Week%1,%Lecture%2% - Pennsylvania State University

Sequence%ontology%browser%

Page 11: %Week%1,%Lecture%2% - Pennsylvania State University

Searching%for%

%

X_element_combinatorial_repeat%

%

Page 12: %Week%1,%Lecture%2% - Pennsylvania State University

Unix%commands%in%this%lecture%

%

•  wc, cat, head, tail, sort, cut, grep, more, clear

Handy'Tips'%

CTRL&C%or%ESC%!%interrupts%any%process%that%may%be%running%

%

clear%!%clears%the%screen%

%

%cursor%keys%allow%you%to%recall%past%commands%%

%

%auto&complete%!%write%part%of%the%filename%then%press%TAB%

Page 13: %Week%1,%Lecture%2% - Pennsylvania State University

Check%head/tail%of%the%file%

Page 14: %Week%1,%Lecture%2% - Pennsylvania State University

Paging%data%with:%less%(more)%

•  q%or%ESC%!%quits%the%pager%

•  SPACE%or%f%!%go%forward,%next%page%

•  b%!%go%backward%

•  /%word%!%search%for%a%word%%

%

•  /%!%repeats%the%search%for%the%last%word%

Page 15: %Week%1,%Lecture%2% - Pennsylvania State University

Find%paaerns%in%the%file%with%GREP%

Page 16: %Week%1,%Lecture%2% - Pennsylvania State University

Connec<ng%streams%

•  Input%streams:%data%from%the%keyboard%or%%files%

•  Output%streams:%print%on%screen,%into%files%

Stream%redirec<on%the%symbols%of%“arrows”%<,%>%

%

Input%stream%redirec<on%from%file:%%<'filename'Output%stream%redirec<on%to%a%file:%>'filename''

Page 17: %Week%1,%Lecture%2% - Pennsylvania State University

Redirec<ng%to%a%file%%

creates/overwrites%that%file%

Page 18: %Week%1,%Lecture%2% - Pennsylvania State University

Piping%streams%

•  The%pipe%character%%|'channels%the%output%of%one%command%into%the%other%

%

(located%above%the%ENTER%key)%

%

You%can%pipe%mul<ple%commands%together%

Page 19: %Week%1,%Lecture%2% - Pennsylvania State University

Piping%commands%

Page 20: %Week%1,%Lecture%2% - Pennsylvania State University

Break%the%file%into%two%–%before%the%

the%word%FASTA%and%aker%the%word%

Page 21: %Week%1,%Lecture%2% - Pennsylvania State University

How%many%of%each%elements%

Page 22: %Week%1,%Lecture%2% - Pennsylvania State University

Find%out%how%many%of%each%features%

Page 23: %Week%1,%Lecture%2% - Pennsylvania State University

Homework%2%

•  Create%a%file%that%show%all%ontology%terms%that%are%

present%in%the%provided%GFF%file%with%a%count%of%

how%many%<mes%this%element%occurs%in%the%yeast%

genome.%%

%

What%is%the%top&ten%list%of%terms%that%occur%most%

frequently%(provide%counts)%

•  Pick%an%ontology%term%that%is%unfamiliar%to%you%

and%look%it%up%in%the%Sequence%Ontology,%paste%

the%explana<on%into%the%homework%