2014 BMMB 852: Applied Bioinformatics, Week 6, Lecture 12, István Albert, Bioinformatics Consulting Center, Penn State

Pennsylvania State University · 2014. 10. 2.

2014 - BMMB 852: Applied Bioinformatics

Week 6, Lecture 12

István Albert

Bioinformatics Consulting Center

Penn State

blastdbcmd – an unsung hero

• a useful tool with an unfortunate name

• and unfortunate parameters

• and unfortunate documentation

makeblastdb -parse_seqids option

• Use the -parse_seqids flag when invoking makeblastdb: it allows the retrieval of sequences based upon sequence identifiers.

• In that case, each sequence must have a unique identifier, and that identifier must have a specific format; see also section 5.14, "Limiting a Search with a List of Identifiers", in the BLAST+ handbook.
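
As a rough illustration of the flag in use, the sketch below only assembles a makeblastdb command line; it does not run it. The file name `ebola.fa`, the database name `ebola`, and the helper `makeblastdb_command` are hypothetical and not from the lecture.

```python
def makeblastdb_command(fasta_path, db_name, dbtype="nucl"):
    """Build a makeblastdb argument list with identifier parsing enabled."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    return [
        "makeblastdb",
        "-in", fasta_path,    # input FASTA file (hypothetical name below)
        "-dbtype", dbtype,    # nucl for nucleotide, prot for protein
        "-parse_seqids",      # index sequence identifiers for later retrieval
        "-out", db_name,      # base name of the database files
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even on a machine without BLAST+ installed.
    print(" ".join(makeblastdb_command("ebola.fa", "ebola")))
```

The resulting list can be handed to `subprocess.run` once BLAST+ is installed and the input file exists.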

FASTA sequence ID format values

Accession number prefixes

Some prefixes have additional meaning. Others may only indicate a database or molecule type.

One of the most common questions: how to extract a small sub-sequence from a genome?

There are a number of answers. blastdbcmd could be the simplest, but it is not all that well documented.
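
One possible way to phrase such an extraction is sketched below. It assumes a database built with -parse_seqids and builds a blastdbcmd call using the `-entry` and `-range` options. The identifier `GENOME_ID`, the database name `ebola`, and both helper functions are hypothetical. The small `subsequence` helper shows the same coordinate convention in plain Python, assuming the range is 1-based and inclusive.

```python
def blastdbcmd_range_command(db_name, entry, start, stop, strand="plus"):
    """Build a blastdbcmd argument list that extracts entry[start..stop]."""
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    if strand not in ("plus", "minus"):
        raise ValueError("strand must be 'plus' or 'minus'")
    return [
        "blastdbcmd",
        "-db", db_name,
        "-entry", entry,                # sequence identifier to fetch
        "-range", f"{start}-{stop}",    # assumed 1-based, inclusive
        "-strand", strand,
    ]


def subsequence(sequence, start, stop):
    """Plain-Python equivalent: 1-based inclusive range to a 0-based slice."""
    return sequence[start - 1:stop]


if __name__ == "__main__":
    # GENOME_ID is a placeholder, not a real accession number.
    print(" ".join(blastdbcmd_range_command("ebola", "GENOME_ID", 1, 10)))
    print(subsequence("ACGTACGT", 2, 4))
```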

Get the Ebola genome for the 1999 outbreak: BioProject: PRJNA14703

blastdbcmd – format and extract sequences in the blast database
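
The formatting side might look like the sketch below, which asks blastdbcmd for every entry (`-entry all`) and prints the accession and sequence length of each through the `-outfmt` specifiers `%a` and `%l`. The database name and both helper functions are hypothetical, and the parser assumes one "accession length" pair per output line.

```python
def blastdbcmd_list_command(db_name, outfmt="%a %l"):
    """Build a blastdbcmd argument list that lists every entry in a database."""
    return ["blastdbcmd", "-db", db_name, "-entry", "all", "-outfmt", outfmt]


def parse_listing(text):
    """Parse lines of 'accession length' into (accession, length) tuples."""
    records = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        accession, length = parts
        records.append((accession, int(length)))
    return records


if __name__ == "__main__":
    print(" ".join(blastdbcmd_list_command("ebola")))
    # Made-up output text, used only to exercise the parser.
    print(parse_listing("A1 10\nB2 250\n"))
```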

More formatting options

List of BLAST+ programs

What I think programs should be called

Official Name   Query        Subject      What should it be called
blastn          Nucleotide   Nucleotide   blast NN
blastp          Protein      Protein      blast PP
blastx          Nucleotide   Protein      blast NP
tblastn         Protein      Nucleotide   blast PN
tblastx         Nucleotide   Nucleotide   tblast NN
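
The table above can also be written as a small lookup, sketched here to make the query/subject pairing easy to check before a run. The names `BLAST_PROGRAMS` and `programs_for` are hypothetical helpers, not part of BLAST+.

```python
# Official program name -> (query type, subject type), as in the table above.
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("nucleotide", "protein"),
    "tblastn": ("protein", "nucleotide"),
    "tblastx": ("nucleotide", "nucleotide"),
}


def programs_for(query_type, subject_type):
    """Return the program names that accept this query/subject combination."""
    wanted = (query_type, subject_type)
    return sorted(name for name, types in BLAST_PROGRAMS.items() if types == wanted)


if __name__ == "__main__":
    print(programs_for("protein", "nucleotide"))
    print(programs_for("nucleotide", "nucleotide"))
```

Two programs share the nucleotide/nucleotide row, which is why the lookup returns a list rather than a single name.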

Running tools in the blast family: blastp

• Think it through: What? Where? How?

protein vs protein

blastx and tblastn

• blastx: nucleotide vs protein

• tblastn: protein vs nucleotide

It is very easy to list the query and database in the wrong order or to use the wrong sequence types. BLAST often will not report the mistake and simply produces no hits.
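
Because a mismatch can pass silently, a quick check before running may help. The sketch below guesses whether a query looks like nucleotide or protein from its letter composition and compares that with what the chosen program expects. The 0.9 threshold, the letter set, and the function names are assumptions for illustration only; the guess can be wrong for short or unusual sequences.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")

# Query type each program expects, following the naming table in this lecture.
EXPECTED_QUERY = {
    "blastn": "nucleotide",
    "blastp": "protein",
    "blastx": "nucleotide",
    "tblastn": "protein",
    "tblastx": "nucleotide",
}


def guess_type(sequence, threshold=0.9):
    """Guess 'nucleotide' or 'protein' from the fraction of ACGTUN letters."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"


def query_matches_program(program, sequence):
    """Return True if the guessed query type is what the program expects."""
    return EXPECTED_QUERY[program] == guess_type(sequence)


if __name__ == "__main__":
    print(query_matches_program("blastp", "ATGCGTAACGT"))
    print(query_matches_program("blastx", "ATGCGTAACGT"))
```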

Homework 12

Create a blast database from all proteins found in the 2014 Ebola paper (you'll have at least 891).

• Find the shortest and longest protein among these.

• Compare these proteins to the NP_066243 nucleoprotein identified during the 1999 Ebola outbreak. What are the best and worst matches?
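
As a starting point for the first bullet, the sketch below computes sequence lengths from FASTA text and picks the shortest and longest records. It is one possible approach, not the expected solution. The record names in the example are made up and the function names are hypothetical.

```python
def fasta_lengths(lines):
    """Map each FASTA record name (first word of the header) to its length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def shortest_and_longest(lengths):
    """Return the names of the shortest and the longest record."""
    if not lengths:
        raise ValueError("no records found")
    return min(lengths, key=lengths.get), max(lengths, key=lengths.get)


if __name__ == "__main__":
    # Made-up records, used only to exercise the functions.
    example = [">p1 first", "MKV", "LA", ">p2", "M", ">p3", "MKVLAGG"]
    print(shortest_and_longest(fasta_lengths(example)))
```

For a real file, the lines can come from `open("proteins.fa")`, where the file name is again a placeholder.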