8/30/12
2012 - BMMB 597D: Analyzing Next Generation Sequencing Data
Week 1, Lecture 2
István Albert
Biochemistry and Molecular Biology and Bioinformatics Consulting Center
Penn State
Get a good text editor
Desired features: syntax highlighting, line numbering, ability to view white space
• Komodo Edit
• Sublime Text
• TextMate
There are many other options.
Download the data for the lecture
The URL was sent out via email (also on the course webpage):
http://downloads.yeastgenome.org/curation/chromosomal_feature/saccharomyces_cerevisiae.gff
Biological file formats
Each file format represents:
1. Information - types of knowledge that are stored in the file
2. Optimization - types of operations that are easy/efficient to perform
The above implies that some information may not be present in, or cannot be easily extracted from, a certain file format.
Tabular formats
• Many common bioinformatics data formats are column-based and tab-separated
• The first format we deal with will be GFF3 - Generic Feature Format
(search for GFF3 to see the specification for version 3)
http://www.sequenceontology.org/gff3.shtml
The GFF3 specification
GFF format: search for GFF3 → http://www.sequenceontology.org/gff3.shtml
Tab-separated with 9 columns. Missing values may be replaced with a dot → .
1. Seqid (usually chromosome)
2. Source (where is the data coming from)
3. Type (usually a term from the sequence ontology)
4. Start (interval start relative to the seqid)
5. End (interval end relative to the seqid)
6. Score (the score of the feature, a floating point number)
7. Strand (+/-/.)
8. Phase (used to indicate reading frame for coding sequences)
9. Attributes (semicolon-separated attributes → Name=ABC;ID=1)
Example attribute specification: name=REB1;id=YP33546
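To make the nine-column layout concrete, here is a minimal sketch using cut, one of the commands covered in this lecture. The record and the file name one_feature.gff are made up for illustration; only the tab-separated, nine-column layout comes from the specification.

```shell
# Write one illustrative GFF3 record (hypothetical values) to a file.
printf 'chrI\tSGD\tgene\t335\t649\t.\t+\t.\tName=ABC;ID=1\n' > one_feature.gff

# cut splits on tabs by default; -f selects columns by number.
feature_type=$(cut -f 3 one_feature.gff)     # column 3: type
coordinates=$(cut -f 4,5 one_feature.gff)    # columns 4 and 5: start and end
attributes=$(cut -f 9 one_feature.gff)       # column 9: attributes

echo "type:       $feature_type"
echo "start/end:  $coordinates"
echo "attributes: $attributes"
```

Because score and phase are unknown for this made-up record, columns 6 and 8 hold a dot.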
Variants of GFF - GTF 2
GTF 2 - Gene Transfer Format, same 9 columns as the GFF
http://mblab.wustl.edu/GTF2.html
Differences
1. Only a subset of types is allowed in column 3: CDS, start_codon, stop_codon and a few more
2. Attribute column format change: each key and its value are separated by a space, not by =
3. Two mandatory attributes at the end of the record:
• gene_id value; A globally unique identifier for the genomic source of the transcript
• transcript_id value; A globally unique identifier for the predicted transcript.
Example attribute specification: name "REB1"; id "YP33546"
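The two attribute styles can be compared side by side. This is a rough sketch, not a full parser: it takes the two example attribute strings from the slides (with plain straight quotes for the GTF one) and pulls out the values. It uses tr, which is not on this lecture's command list, to put each attribute on its own line.

```shell
# The example attribute strings from the slides.
gff_attrs='name=REB1;id=YP33546'
gtf_attrs='name "REB1"; id "YP33546"'

# GFF3 style: one attribute per line, then take the text after '='.
gff_values=$(echo "$gff_attrs" | tr ';' '\n' | cut -d '=' -f 2)

# GTF style: one attribute per line, then take the text between the quotes.
gtf_values=$(echo "$gtf_attrs" | tr ';' '\n' | cut -d '"' -f 2)

echo "$gff_values"
echo "$gtf_values"
```

Both commands print the same two values, which shows that the formats carry the same information and differ only in how keys and values are delimited.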
What do the terms mean?
Sequence ontology browser
Searching for X_element_combinatorial_repeat
Unix commands in this lecture
• wc, cat, head, tail, sort, cut, grep, more, clear
Handy Tips
CTRL-C → interrupts any process that may be running
clear → clears the screen
cursor keys allow you to recall past commands
auto-complete → write part of the filename then press TAB
Investigate your data
Check head/tail of the file
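A minimal sketch of checking both ends of a file. So that it runs anywhere, it first builds a small stand-in file called sample.gff; the file name and its records are hypothetical and are not taken from the yeast data.

```shell
# Build a small stand-in file (hypothetical records). With the lecture data
# you would run the same commands on saccharomyces_cerevisiae.gff instead.
{
  printf '##gff-version 3\n'
  printf 'chrI\tSGD\tchromosome\t1\t230218\t.\t.\t.\tID=chrI\n'
  printf 'chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=g1;Name=ABC\n'
  printf 'chrI\tSGD\tCDS\t335\t649\t.\t+\t0\tParent=g1\n'
} > sample.gff

wc -l sample.gff       # number of lines in the file
head -n 2 sample.gff   # first two lines
tail -n 1 sample.gff   # last line
```

Looking at both ends is a quick sanity check: the top usually shows header or comment lines, and the bottom shows whether the file ends the way you expect.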
Paging data with: less (more)
• q or ESC → quits the pager
• SPACE or f → go forward, next page
• b → go backward
• /word → search for a word
• / → repeats the search for the last word
Find patterns in the file
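A short sketch of pattern matching with grep, again on a hypothetical stand-in file (sample.gff and its records are invented for illustration).

```shell
# Small stand-in file with hypothetical records.
{
  printf '##gff-version 3\n'
  printf 'chrI\tSGD\tchromosome\t1\t230218\t.\t.\t.\tID=chrI\n'
  printf 'chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=g1;Name=ABC\n'
  printf 'chrI\tSGD\tCDS\t335\t649\t.\t+\t0\tParent=g1\n'
} > sample.gff

grep gene sample.gff       # print lines that contain "gene" anywhere
grep -c gene sample.gff    # -c counts matching lines instead of printing them
grep -v '^#' sample.gff    # -v inverts: lines that do NOT start with '#'
```

Note that grep matches the pattern anywhere in the line, so a word that also appears in the attributes column would be matched too.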
Connecting streams
• Input streams: entry from the keyboard or files
• Output streams: print on screen, into files
Stream redirection: the "arrow" symbols <, >
Input stream redirection from a file: < filename
Output stream redirection to a file: > filename
Redirecting to a file creates/overwrites that file
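The redirection rules can be tried out on throwaway files. In this sketch, names.txt and count.txt are hypothetical file names chosen for the example.

```shell
# '>' creates names.txt (or overwrites it if it already exists).
printf 'chrI\nchrII\nchrIII\n' > names.txt

# '<' feeds the file to wc as its input stream; the result goes to the screen.
wc -l < names.txt

# Both at once: read from names.txt, write the line count into count.txt.
wc -l < names.txt > count.txt
cat count.txt

# Redirecting to names.txt again replaces its previous three lines.
printf 'chrIV\n' > names.txt
cat names.txt
```

The last step is the one to be careful with: the earlier content of names.txt is gone without any warning.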
Piping streams
• The pipe character | channels the output of one command into the other
(located above the ENTER key)
You can pipe multiple commands together
Piping commands
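A sketch of piping two and then three commands together, using only commands from this lecture. The file sample.gff and its records are hypothetical stand-ins for the real data.

```shell
# Small stand-in file with hypothetical records.
{
  printf '##gff-version 3\n'
  printf 'chrI\tSGD\tchromosome\t1\t230218\t.\t.\t.\tID=chrI\n'
  printf 'chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=g1;Name=ABC\n'
  printf 'chrI\tSGD\tCDS\t335\t649\t.\t+\t0\tParent=g1\n'
} > sample.gff

# Two commands: drop comment lines, then count what is left.
grep -v '^#' sample.gff | wc -l

# Three commands: drop comments, keep column 3, show the first two values.
first_types=$(grep -v '^#' sample.gff | cut -f 3 | head -n 2)
echo "$first_types"
```

Each command only sees the output of the one before it, so a pipeline can be built up and checked one stage at a time.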
Isolating relevant parts of our file
How many of each element
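One possible way to isolate the type column and count how often each value occurs. This sketch assumes the uniq command, which is not on this lecture's command list, and runs on a hypothetical stand-in file; the real file may need additional filtering.

```shell
# Small stand-in file with hypothetical records (two genes, two CDS).
{
  printf '##gff-version 3\n'
  printf 'chrI\tSGD\tchromosome\t1\t230218\t.\t.\t.\tID=chrI\n'
  printf 'chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=g1;Name=ABC\n'
  printf 'chrI\tSGD\tCDS\t335\t649\t.\t+\t0\tParent=g1\n'
  printf 'chrI\tSGD\tgene\t1807\t2169\t.\t-\t.\tID=g2;Name=DEF\n'
  printf 'chrI\tSGD\tCDS\t1807\t2169\t.\t-\t0\tParent=g2\n'
} > sample.gff

# Isolate the relevant part: skip comments, keep only the type column.
grep -v '^#' sample.gff | cut -f 3

# Count each type: sort puts identical values next to each other,
# and uniq -c collapses each run into one line prefixed with its count.
type_counts=$(grep -v '^#' sample.gff | cut -f 3 | sort | uniq -c)
echo "$type_counts"
```

The sort step matters because uniq only merges identical lines that are adjacent.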
Find out how many of each feature
Homework 2
• Create a file that lists all ontology terms that are present in the provided GFF file, with a count of how many times each term occurs in the yeast genome. Sort this file by this count in reverse order (hint: man sort)
• Pick an ontology term that is unfamiliar to you and look it up in the Sequence Ontology; paste the explanation into the homework