
Creation, curation and analysis of RNA and Protein alignments with Jalview


DESCRIPTION

Talk given at the Scottish Phylogeny Discussion Group on Feb 18th, 2013 at the James Hutton Institute in Invergowrie, Scotland, UK. It reviews biological sequence and alignment data visualization with Jalview. Full summary and blog post at: http://www.jalview.org/Community/Community-news/Jalview-and-2013-Google-Summer-of-Code-at-the-Scottish-Phylogeny-Discussion


Page 1: Creation, curation and analysis of RNA and Protein alignments with Jalview

Creation, curation and analysis of Protein and RNA alignments with Jalview

Jim Procter, Jalview Coordinator
[email protected]
Scottish Phylogeny Discussion Group
James Hutton Institute
18th February 2013

Page 2: Creation, curation and analysis of RNA and Protein alignments with Jalview

What is Jalview?

• A Java alignment viewer .. but not just for viewing..
• Java?
  – Platform independence
  – Standalone or web based tool
• Open source since '98
  – Coordinated in Dundee
  – Core development funded by …

Page 3: Creation, curation and analysis of RNA and Protein alignments with Jalview

• Columns relate distinct sites in biomolecules

[Figure: example alignment with annotation rows]

FER2_ARATH/1-17     MASTALS----SAIVSTSFLR
Q93Z60_ARATH/1-17   MASTALS----SAIVSTSFLR
FER1_MAIZE/1-21     MATVLGSPRAPAFFFSSSSLR
O80429_MAIZE/1-12   MAATAL---------SMSILR
Quality             (bar graph)
Conservation        **7777---------*6*4**
Consensus           MASTALS----SAIVSTSFLR
T-COFFEE            997655311113445789999

• Alignment programs: T-COFFEE, Muscle, Clustal, etc.
• Sequence database search: Blast, HMMer, etc.
• Ortholog database: Orthodb, Panther, HOGENOM, etc.
• Domain or motif database: NCBI CDD, Prosite, Pfam/Rfam, PIR ..
• Expert knowledge: experimental characterization, mutation studies.
• Phylogenetic trees: MrBayes/RaXML/etc.
• 2D & 3D structure prediction: homology modelling, fold recognition, secondary structure & disorder
• Published literature: figures, supplementary info

Create, Curate, Refine

• Evolutionary analysis: positive selection analysis, molecular basis of character traits
• Molecular analysis: structure-function relationships, active sites, binding motifs

Page 4: Creation, curation and analysis of RNA and Protein alignments with Jalview

Curating Alignments
– Alignments sometimes need manual curation
– Correct alignments conserve common properties
– Shading can highlight differences

Page 5: Creation, curation and analysis of RNA and Protein alignments with Jalview

Alignments are central …

• Columns relate distinct sites in biomolecules

[Figure: the four-sequence example alignment (FER2_ARATH, Q93Z60_ARATH, FER1_MAIZE, O80429_MAIZE) with Quality, Conservation, Consensus and T-COFFEE annotation rows]

• Alignment programs: T-COFFEE, Muscle, Clustal, etc.
• Sequence database search: Blast, HMMer, etc.
• Ortholog database: Orthodb, Panther, HOGENOM, etc.
• Domain or motif database: NCBI CDD, Prosite, Pfam/Rfam, PIR ..
• Expert knowledge: experimental characterization, mutation studies.
• Phylogenetic trees: MrBayes/RaXML/etc.
• 2D & 3D structure prediction: homology modelling, fold recognition, secondary structure & disorder
• Published literature: figures, supplementary info

Create, Curate, Refine, Analyse, Annotate

• Evolutionary analysis: positive selection analysis, molecular basis of character traits
• Molecular analysis: structure-function relationships, active sites, binding motifs

Page 6: Creation, curation and analysis of RNA and Protein alignments with Jalview

Distributed  Annotation  System  

[Diagram: Jalview capabilities]

• Interactive Editing, Visualization: Alignments, Structures, Features, Annotation, Trees, Sequences, PCA
• Figure Generation: Clickable HTML, Line Art, Images
• Analysis: Pairwise alignment, Consensus, Conservation & Clustering, Shading, Trees/PCA

Page 7: Creation, curation and analysis of RNA and Protein alignments with Jalview

J Song et al. Science 2011;331:1036-1040 Published by AAAS

Fig. 1 Structural overview of mDNMT1(650–1602)–DNA 19-nucleotide oligomer complex with bound AdoHcy.

Results at each stage require analysis: manual verification

A stage may be repeated several times with different methods and parameters

Data, analysis, evidence and results must be properly recorded

Analysis involves lots of different kinds of data:

Sequences, alignments, trees, structures, functional assays, literature.

Page 8: Creation, curation and analysis of RNA and Protein alignments with Jalview

Illustrator   Word/LaTeX..  Etc...  

Google  Alignment  on  Web  

Jalview  

• Key qualities for a workbench
  – Undo and Redo!
  – Archival of data, results and all display settings
  – Filtering & Multiple views
  – Easy access to databases and programs

Page 9: Creation, curation and analysis of RNA and Protein alignments with Jalview

Lightweight  UI  

Integrate  with  web  sites  

Multi-windowed UI

Visualization & Analysis

Common Data & Analysis, Editing, messaging and File Input/Output

Jalview  Flavours  

Page 10: Creation, curation and analysis of RNA and Protein alignments with Jalview

Distributed  Annotation  System  

[Diagram: Jalview capabilities, with web services added]

• Interactive Editing, Visualization: Alignments, Structures, Features, Annotation, Trees, Sequences, PCA
• Figure Generation: Clickable HTML, Line Art, Images
• Web services: ClustalW, Mafft, Multiple alignment, Protein Disorder, Functional site analysis, Protein 2ndary structure
• Analysis: Pairwise alignment, Consensus, Conservation & Clustering, Shading, Trees/PCA

Page 11: Creation, curation and analysis of RNA and Protein alignments with Jalview

http://www.jalview.org

• New logo
• Jalview Launch Buttons
• Installation packages and source
• Help and documentation
• Jalview Community
• Jalview Development and release history
• Jalview training news and course dates

Page 12: Creation, curation and analysis of RNA and Protein alignments with Jalview

Launching the Jalview desktop

Page 13: Creation, curation and analysis of RNA and Protein alignments with Jalview
Page 14: Creation, curation and analysis of RNA and Protein alignments with Jalview

Total launches by continent (Sep 2012)

Continent   2012     2011     Change
Americas    26,575   27,427   -3%
Europe      25,161   21,612   +16%
Asia        15,395   11,386   +35%
Oceania     758      1,184    -36%
Africa      468      346      +35%

Country     Launches   Change
UK          5,886      +11%
India       3168       75%
Malaysia    1464       66%
Brazil      2109       +50%
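The year-on-year percentages in the continent table can be checked from the launch counts. A minimal sketch, assuming the change is the simple relative difference between the 2012 and 2011 counts, rounded to the nearest whole percent (the slide does not state the formula):

```python
# Launch counts copied from the slide: (2012 launches, 2011 launches).
launches = {
    "Americas": (26575, 27427),
    "Europe": (25161, 21612),
    "Asia": (15395, 11386),
    "Oceania": (758, 1184),
    "Africa": (468, 346),
}


def percent_change(new, old):
    """Relative change from `old` to `new`, as a percentage."""
    return 100.0 * (new - old) / old


# Round to whole percent, as on the slide.
changes = {
    region: round(percent_change(new, old))
    for region, (new, old) in launches.items()
}

for region, change in changes.items():
    print(f"{region:<9}{change:+d}%")
```

Under that assumption the computed values agree with the figures shown for all five continents.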

Page 15: Creation, curation and analysis of RNA and Protein alignments with Jalview

Jalview  News  

Page 16: Creation, curation and analysis of RNA and Protein alignments with Jalview

How accurate are the Google Analytics Launch Statistics?

Page 17: Creation, curation and analysis of RNA and Protein alignments with Jalview

[Chart: Year on Year increase in Desktop Launches as measured by Google Analytics vs our web server logs. Monthly launches (0 to 30000), January to December, for 2009, 2010, 2011 and 2012, with one set of series from server logs and one from Google Analytics.]

1 in 4 users say 'no' to Google Analytics

Page 18: Creation, curation and analysis of RNA and Protein alignments with Jalview

[Chart: Jalview Desktop monthly usage from 2007-2012. Desktop launches per month and unique IPs per month (0 to 30000).]

Year   Total launches   Unique IPs
2008   144868           23924
2009   181118           32537
2010   191321           38483
2011   233320           48563
2012   275539           55805

Page 19: Creation, curation and analysis of RNA and Protein alignments with Jalview

The Jalview Example Project

• Demonstrates most of Jalview's key features

Page 20: Creation, curation and analysis of RNA and Protein alignments with Jalview

One alignment, many views

Sequence features highlight key regions like functional sites

Alignment annotation area shows graphs and symbols from calculations and manual curation

Page 21: Creation, curation and analysis of RNA and Protein alignments with Jalview

Linked tree viewer allows subgroups to be identified in alignment

Group selections

Colours and mouseovers

Linked Jmol viewer shows one or more structures coloured by alignment views

Page 22: Creation, curation and analysis of RNA and Protein alignments with Jalview

Typical Jalview workflow

1. Import sequence or alignment
   – Drag/drop, Paste, or URL
2. Decorate sequences with references & annotation from external databases
3. Create alignment
4. Use built in shading, conservation analysis, tree and PCA capabilities to explore
   – Also use annotation and structure data if available
5. Select regions for refinement or further analysis
6. Import trees, annotation, etc. created with other programs to explore further
7. Prepare annotated views for publication

Page 23: Creation, curation and analysis of RNA and Protein alignments with Jalview

Phylogenetic analysis and Jalview

• Built in tree methods
  – UPGMA
    • Fast, simple, but not reliable for phylogenetic inference
  – Neighbour joining
    • Slower than UPGMA
    • Perhaps useful for a first approximation
  – Jalview's implementation is not the most efficient
• Import trees for subgroup analysis
  – Load any number of Newick/NH Extended files onto an alignment from another program
    • Bootstraps are displayed
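To make the UPGMA bullet concrete, here is a minimal sketch of the textbook algorithm that emits a Newick string. It is not Jalview's implementation: the function name, the `#n` internal cluster labels, and the four-taxon toy distances are all assumptions for illustration.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster `names` by UPGMA and return a Newick string.

    `dist` maps (name_a, name_b) tuples to pairwise distances.
    """
    # Each cluster stores: Newick text, number of leaves, height above leaves.
    clusters = {n: (n, 1, 0.0) for n in names}
    d = {}
    for a, b in combinations(names, 2):
        d[frozenset((a, b))] = dist[(a, b)] if (a, b) in dist else dist[(b, a)]
    next_id = 0
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        text_a, size_a, h_a = clusters.pop(a)
        text_b, size_b, h_b = clusters.pop(b)
        height = d.pop(pair) / 2.0
        merged = f"({text_a}:{height - h_a:.3f},{text_b}:{height - h_b:.3f})"
        key = f"#{next_id}"  # hypothetical label for the new internal cluster
        next_id += 1
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            d[frozenset((key, c))] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[key] = (merged, size_a + size_b, height)
    text, _, _ = next(iter(clusters.values()))
    return text + ";"


# Toy distances (hypothetical, not taken from the talk).
toy = {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
}
tree = upgma(["A", "B", "C", "D"], toy)
print(tree)
```

Because UPGMA always places joined clusters at equal height, it implicitly assumes a constant rate of change along all lineages, which is one reason it is fast and simple but unreliable for phylogenetic inference.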

Page 24: Creation, curation and analysis of RNA and Protein alignments with Jalview

Typical Jalview workflow

1. Import sequence or alignment
   – Drag/drop, Paste, or URL
2. Decorate sequences with references & annotation from external databases
3. Create alignment
4. Use built in shading, conservation analysis, tree and PCA capabilities to explore
   – Also use annotation and structure data if available
5. Select regions for refinement or further analysis
6. Import trees, annotation, etc. created with other programs to explore further
7. Prepare annotated views for publication

Save as Jalview project at strategic points

Page 25: Creation, curation and analysis of RNA and Protein alignments with Jalview

• Jalview projects store key data for a session
  – Alignments
  – Annotation & Database IDs
  – Structures
  – Trees
  – Display settings
• New Jalview versions are tested for backwards compatibility
  – Archival record of your analysis

Page 26: Creation, curation and analysis of RNA and Protein alignments with Jalview

Distributed  Annotation  System  

[Diagram: Jalview capabilities]

• Interactive Editing, Visualization: Alignments, Structures, Features, Annotation, Trees, Sequences, PCA
• Retrieval from External Databases: Standard DBs

Page 27: Creation, curation and analysis of RNA and Protein alignments with Jalview

DAS allows Jalview access to over 270 Sequence Databases…

• New database dialog in 2.8.x series
• Cross source querying
• ENSEMBL sources

Page 28: Creation, curation and analysis of RNA and Protein alignments with Jalview

JDAS  

Developed by Rafael Jiminez, Jonathan Warren and other Java developers in the DAS Community

Page 29: Creation, curation and analysis of RNA and Protein alignments with Jalview

Sequence  Features  

Page 30: Creation, curation and analysis of RNA and Protein alignments with Jalview

Sources of sequence feature data

• Jalview sequence annotation files
• DAS sources
• GFF files
• Certain 'rich' alignment formats
  – Stockholm
  – AMSA
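As an illustration of one of the feature sources listed above, GFF is a tab-separated text format with nine columns (sequence ID, source, type, start, end, score, strand, phase, attributes). The sketch below reads such records into simple tuples. The `Feature` type, the `parse_gff` function and the sample record are hypothetical; this is not how Jalview itself parses GFF.

```python
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    attributes: str


def parse_gff(text: str) -> List[Feature]:
    """Parse GFF-style lines, skipping blank lines and '#' comments."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 tab-separated columns: {line!r}")
        # A '.' in the score column means "no score".
        score = None if cols[5] == "." else float(cols[5])
        attributes = cols[8] if len(cols) > 8 else ""
        features.append(
            Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                    score, cols[6], attributes)
        )
    return features


# Hypothetical record, made up for illustration only.
SAMPLE = (
    "##gff-version 3\n"
    "FER1_MAIZE\texample\tdomain\t5\t20\t0.9\t+\t.\tNote=hypothetical feature\n"
)
features = parse_gff(SAMPLE)
print(features[0])
```

Each parsed record carries the start and end positions needed to overlay the feature on the matching sequence in an alignment.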

Page 31: Creation, curation and analysis of RNA and Protein alignments with Jalview

Sequence Features Dialog box

DAS ANNOTATION SERVERS

• Query matches ID to Authority
• Map to local reference frame
• Mouse over for feature name, links and scores
• Group features by source
• Type==colour
• Highlight start-end
• Order for optimal display
• Select specific sources
• Filtered list
• Add user defined sources

Page 32: Creation, curation and analysis of RNA and Protein alignments with Jalview

Shading,  thresholding,  colour  by  label.  

Page 33: Creation, curation and analysis of RNA and Protein alignments with Jalview

ClustalW  

Mafft  

AACon  

Clustal Omega: quick alignment of millions of sequences

Assorted protein disorder predictors

Protein conservation calculations

JABAWS 2 services in Jalview 2.8

Page 34: Creation, curation and analysis of RNA and Protein alignments with Jalview

Alignment  Web  Services:  JABAWS:MSA  

Peter  Troshin    

jws2

jaws2

HTTP

Replaces original Jalview 2 services:
• Extensible framework for wrapping command line programs
• Can be installed on user's own machine/cluster

See Troshin et al. application note in Bioinformatics for more details.

Page 35: Creation, curation and analysis of RNA and Protein alignments with Jalview

www.compbio.dundee.ac.uk/jabaws  

Jalview  Web  Service  GUI  

JABAWS  Java  Client  

JABAWS:MSA Troshin et al. 2011, Bioinformatics. JABAWS 2 In Preparation.

Native JABAWS installs on a range of platforms

JABAWS Virtual Appliance for your private use.

JABAWS Amazon Machine Image on EC2

Page 36: Creation, curation and analysis of RNA and Protein alignments with Jalview

Jalview's JABAWS Configuration Panel

Page 37: Creation, curation and analysis of RNA and Protein alignments with Jalview

Jalview's Alignment Methods

• JABAWS alignment services
  – Preset alignment modes
  – User defined settings
• Pairwise alignment
  – Needleman and Wunsch
    • Mostly used internally
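The Needleman and Wunsch method named above is global pairwise alignment by dynamic programming. A minimal sketch follows, assuming a toy scoring scheme (match +1, mismatch -1, gap -1). A real implementation, Jalview's included, would normally score residue pairs with a substitution matrix instead, so treat the parameters and the function name as illustrative only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of `a` and `b`; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


result = needleman_wunsch("ACGT", "AGT")
print(result)
```

The table has (n+1) x (m+1) cells, so time and memory grow with the product of the two sequence lengths.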

Page 38: Creation, curation and analysis of RNA and Protein alignments with Jalview

[Chart: Jalview Public Service Statistics. Monthly job counts (0 to 12000) from Jun-05 to Aug-12 for TcoffeeWS, ProbconsWS, MuscleWS, MafftWS, ClustalOWS, ClustalWS, JNet MSA Predictions, JNet Sequence Predictions, Mafft Alignments, Clustal Alignments and Muscle Alignments.]

Jalview Public Service Statistics

93939 Alignments and 8789 Jpred3 Jobs from September 2011 – August 2012

83950 Alignments and 8193 Jpred3 Jobs from September 2010 – August 2011

Page 39: Creation, curation and analysis of RNA and Protein alignments with Jalview

Common types of alignment algorithm

a. Sequence database searches – optimal alignment between query and hit, e.g. Blast (single sequence), PSI-Blast and HMMER

b. Progressive – optimise alignment between branches on guide tree, e.g. ClustalW

c. Transitive – optimise MSA to maximise consistency between pairs, e.g. T-COFFEE, ProbCons

Profile methods, e.g. Muscle and MAFFT, are hybrids of b and c. Latest methods, e.g. ClustalO, are hybrids that employ sampling strategies to speed up tree building & refinement.

[Figure: three panels (a, b, c), each showing a Query and Hit1 to Hit5 related according to the corresponding strategy.]

Figure adapted from Procter et al. (2010) Nature Methods 7 S16 - S25

Page 40: Creation, curation and analysis of RNA and Protein alignments with Jalview

Alignment Job Parameter Settings

• Browse or edit to change name of set
• Text box to add notes for the parameter set
• Start job with current settings or cancel.
• Buttons appear to create, update, rename or delete user settings.
• Parameters contains more complex settings
• Tooltips give brief description and link (right click) to further info

Page 41: Creation, curation and analysis of RNA and Protein alignments with Jalview

'Realignment' – adding sequences to an existing alignment

Clustal Realign options:
• Jalview leaves gaps in when sending sequences to JABAWS
• ClustalW
  – Fixes sequences with gaps, aligns other sequences to profile
• ClustalO
  – Creates an HMM from sequences with gaps
  – Aligns all sequences to HMM

Page 42: Creation, curation and analysis of RNA and Protein alignments with Jalview

ClustalW  

Mafft  

AACon  

Clustal Omega: quick alignment of millions of sequences

Page 43: Creation, curation and analysis of RNA and Protein alignments with Jalview

18 new alignment conservation calculations provided as web services

AACon

• Work like built-in calculations
• GUI to control parameters
• Settings stored in project file
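To show what a per-column conservation calculation looks like, here is a minimal sketch of one generic measure, Shannon entropy over the residues in each column, applied to the four-sequence example alignment used elsewhere in this talk. It is only an illustration of the idea. It is not claimed to reproduce any particular AACon method or Jalview's built-in Conservation row, and ignoring gaps is an assumption made here.

```python
from collections import Counter
from math import log2

# The four-sequence example alignment shown in the talk's figures.
alignment = [
    "MASTALS----SAIVSTSFLR",  # FER2_ARATH
    "MASTALS----SAIVSTSFLR",  # Q93Z60_ARATH
    "MATVLGSPRAPAFFFSSSSLR",  # FER1_MAIZE
    "MAATAL---------SMSILR",  # O80429_MAIZE
]


def column_entropy(column):
    """Shannon entropy (bits) of the non-gap residues in one column.

    0.0 means every residue in the column is the same; higher values
    mean more variable columns. Gaps are ignored (an assumption).
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


# zip(*alignment) walks the alignment column by column.
entropies = [column_entropy(col) for col in zip(*alignment)]

for position, h in enumerate(entropies, start=1):
    print(f"column {position:2d}: {h:.2f} bits")
```

Note that a column containing a single residue and three gaps also scores 0.0 here, which is one reason raw conservation scores need to be read alongside the alignment itself.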

Page 44: Creation, curation and analysis of RNA and Protein alignments with Jalview

Are conservation scores trustworthy?

• Good quality alignments
  – Conservation == probable molecular similarity
• Poor alignments
  – Conservation == random noise
• How do you measure MSA reliability?
  – Try different methods. Vary parameters
  – Compare the results.

Page 45: Creation, curation and analysis of RNA and Protein alignments with Jalview

[Figure: example alignment with annotation rows]

FER2_ARATH/1-17     MASTALS----SAIVSTSFLR
Q93Z60_ARATH/1-17   MASTALS----SAIVSTSFLR
FER1_MAIZE/1-21     MATVLGSPRAPAFFFSSSSLR
O80429_MAIZE/1-12   MAATAL---------SMSILR
T-COFFEE            997655311113445789999
Quality             (bar graph)
Consensus           MASTALS----SAIVSTSFLR
Conservation        **7777---------*6*4**

Consistent Pairwise Alignments

Score = 740, Length = 17, PID = 100%
Q93Z60_ARATH   MASTALSSAIVSTSFLR
FER2_ARATH     MASTALSSAIVSTSFLR

Score = 160, Length = 17, PID = 29.41%
FER1_MAIZE     LGSPRAPAFFFSSSSLR
FER2_ARATH     MASTALSSAIVSTSFLR

Score = 160, Length = 17, PID = 29.41%
FER1_MAIZE     LGSPRAPAFFFSSSSLR
Q93Z60_ARATH   MASTALSSAIVSTSFLR

Score = 310, Length = 12, PID = 58.33%
O80429_MAIZE   MAATALSMSILR
FER2_ARATH     MASTALSSAIVS

Score = 310, Length = 12, PID = 58.33%
O80429_MAIZE   MAATALSMSILR
Q93Z60_ARATH   MASTALSSAIVS

Major inconsistency: +4 Shift

Score = 120, Length = 12, PID = 41.67%
O80429_MAIZE   MAATALSMSILR
FER1_MAIZE     APAFFFSSSSLR

• Calculate 'shift' between all pairwise alignments and the multiple sequence alignment
  – Higher shifts are less reliable
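The PID (percent identity) values quoted for the pairwise alignments on this slide can be reproduced from the aligned segments. A minimal sketch, assuming PID is the number of identical positions divided by the aligned length (the function name is an assumption, and the segments are ungapped as shown on the slide):

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length segments agree."""
    if len(a) != len(b):
        raise ValueError("aligned segments must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Aligned segments copied from the slide.
pairs = {
    ("Q93Z60_ARATH", "FER2_ARATH"): ("MASTALSSAIVSTSFLR", "MASTALSSAIVSTSFLR"),
    ("FER1_MAIZE", "FER2_ARATH"): ("LGSPRAPAFFFSSSSLR", "MASTALSSAIVSTSFLR"),
    ("O80429_MAIZE", "FER2_ARATH"): ("MAATALSMSILR", "MASTALSSAIVS"),
    ("O80429_MAIZE", "FER1_MAIZE"): ("MAATALSMSILR", "APAFFFSSSSLR"),
}

pids = {names: round(percent_identity(a, b), 2) for names, (a, b) in pairs.items()}

for (first, second), pid in pids.items():
    print(f"{first} vs {second}: PID = {pid}%")
```

Under that definition the four values come out at 100.0, 29.41, 58.33 and 41.67, matching the slide.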

Page 46: Creation, curation and analysis of RNA and Protein alignments with Jalview

T-COFFEE alignment reliability scores

This figure shows a set of structures superimposed according to an alignment generated by T-COFFEE. The T-COFFEE reliability score highlights the most reliable regions in red, and least reliable in blue.

The easiest way to do this is from the command line, or from the T-COFFEE web site:
• CORE – score an alignment
  • http://tcoffee.crg.cat/apps/tcoffee/do:core
• M-COFFEE – combine results from many 'popular aligners'
  • http://tcoffee.crg.cat/apps/tcoffee/do:mcoffee
• Jalview can read the 'score_ascii' file for an alignment

Hope to add functionality to JABAWS in future

Page 47: Creation, curation and analysis of RNA and Protein alignments with Jalview

Multi Harmony – from Jaap Heringa's Group at Amsterdam Free University

Progress monitored in job service window

SRBS client submits alignment and groups to service

Return results to user

Page 48: Creation, curation and analysis of RNA and Protein alignments with Jalview
Page 49: Creation, curation and analysis of RNA and Protein alignments with Jalview

Protein Secondary Structure Prediction

• Neural network trained on amino acid profiles
  – Predicts Helix, shEet, or Coil based on sliding window
• Also predicts coiled coils and surface accessibilities
• Server can take
  – Single Sequence
    • Service finds homologs with PSI-Blast
  – Alignment
    • Service uses MSA to calculate profile for prediction

Page 50: Creation, curation and analysis of RNA and Protein alignments with Jalview

Just to recap…

Sequence features are overlaid on alignment to highlight key regions

Alignment annotation area shows graphs and symbols from calculations and manual curation

Annotation like T-COFFEE scores can be used to shade alignment

Page 51: Creation, curation and analysis of RNA and Protein alignments with Jalview
Page 52: Creation, curation and analysis of RNA and Protein alignments with Jalview

Protein Disorder prediction

• Complementary problem to secondary structure prediction
  – Recognise structured & unstructured domains
  – Predict holes in density maps (REM465)
  – Detect flexible loops ('HOTLOOPS')
• Programs provided by JABAWS 2 employ
  – Machine learning methods (DisEMBL)
  – Similarity to disordered sequences (RONN)
  – Empirical amino acid statistics (IUPred, GlobPlot)

Page 53: Creation, curation and analysis of RNA and Protein alignments with Jalview

Disorder Predictions from JABAWS

Jalview JABAWS 2.0 Client: process results into both annotation and features

JABAWS Analysis Service

Features highlight disordered region or structured domain predictions

Use Threshold & Per-sequence option on 'Colour by Annotation' dialog to shade alignment using raw scores

Page 54: Creation, curation and analysis of RNA and Protein alignments with Jalview

Disorder  in  Interleukin  7  Orthologs  

Page 55: Creation, curation and analysis of RNA and Protein alignments with Jalview

Jalview and Nucleotide data

• Basic by current standards
• Built in cDNA->Amino acid translation
  – Works with Aligned cDNA
  – Preserves alignment annotation
  – Does not backtranslate
• European nucleotide archive records
  – Parse cDNA annotation in contigs
• Display of WUSS or VIENNA RNA secondary structure notation
  – Stockholm file import
  – LocaRNA Clustal files

Page 56: Creation, curation and analysis of RNA and Protein alignments with Jalview

Translation of annotated cDNA alignment

Codon highlighting

Mouse is over Arginine

Structure Highlighting
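Translating an aligned cDNA can be sketched as reading the alignment three columns at a time, so that a gapped codon becomes a gap in the protein row. The example below uses the standard genetic code. The function name, the choice of `X` for partial or unrecognised codons, and the sample sequence are assumptions for illustration rather than a description of Jalview's own behaviour.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate_aligned(cdna):
    """Translate an aligned cDNA row codon by codon.

    '---' becomes '-', so the protein row stays aligned with the cDNA.
    Codons that are partly gapped or not in the table become 'X'.
    """
    protein = []
    usable = len(cdna) - len(cdna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = cdna[i:i + 3].upper().replace("U", "T")
        if codon == "---":
            protein.append("-")
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


# Hypothetical aligned cDNA row: Met, Ala, gap, Arg, stop.
translated = translate_aligned("ATGGCT---CGTTAA")
print(translated)
```

Because each protein column corresponds to exactly three cDNA columns, a mapping like this is what allows a residue such as the arginine under the mouse to be highlighted together with its codon.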

Page 57: Creation, curation and analysis of RNA and Protein alignments with Jalview

Jalview 2.8 and RNA 2nd-ary Structure

Structure Consensus Logo: Shows base pair distribution at each paired position in a given RNA secondary structure.

Linked VARNA RNA Secondary Structure viewer and editor.

RALEE style colouring highlights distinct stems and helices
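Displays like these start from a bracket string (Vienna dot-bracket, or WUSS, which also uses `<>`, `[]` and `{}`) that marks which alignment columns are base paired. A minimal sketch of recovering the paired positions with one stack per bracket type is shown below. The function name is an assumption, and pseudoknot letters and the finer distinctions of WUSS are deliberately ignored.

```python
# Closing bracket -> matching opening bracket.
CLOSERS = {")": "(", "]": "[", "}": "{", ">": "<"}


def base_pairs(structure):
    """Return sorted (open_index, close_index) pairs from a bracket string.

    Any character that is not a bracket is treated as unpaired.
    """
    stacks = {opener: [] for opener in CLOSERS.values()}
    pairs = []
    for index, char in enumerate(structure):
        if char in stacks:
            stacks[char].append(index)
        elif char in CLOSERS:
            stack = stacks[CLOSERS[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at position {index}")
            pairs.append((stack.pop(), index))
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


# Hypothetical hairpin: a three-pair stem closing a four-base loop.
hairpin = base_pairs("(((....)))")
print(hairpin)
```

Consecutive pairs such as (0, 9), (1, 8), (2, 7) form one stem, which is the kind of grouping that helix colouring makes visible.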

Page 58: Creation, curation and analysis of RNA and Protein alignments with Jalview

VARNA has a wide range of 2D RNA plots and supports interactive annotation

Selections and mouse positions shared between alignment view and VARNA

VARNA: Interactive drawing and editing of the RNA secondary structure. Kévin Darty, Alain Denise and Yann Ponty. Bioinformatics (2009) 25 1974-1975

Page 59: Creation, curation and analysis of RNA and Protein alignments with Jalview

A brief history of Jalview

• Michele Clamp, Director of Informatics and Scientific Applications, Harvard.
• James Cuff, High Performance Computing, Harvard.
• Steve Searle, now at Sanger, UK
• Andrew Waterhouse, U. Basel.
• Jim Procter
• David Martin

• 1997: Jalview Version 1
• 2004: Jalview 1 published.
• 2005: Jalview Version 2
• VAMSAS eScience project

Page 60: Creation, curation and analysis of RNA and Protein alignments with Jalview

VAMSAS: Visualization and Analysis of Molecular Sequences, Alignments and Structures

• Jalview v2: Alignment, Analysis, Figure Generation
• TOPALi v2: Evolution & Phylogeny
• AstexViewer@MSD-EBI: Structure analysis

Iain Milne, Dominik Lindner, Frank Wright, David Marshall, Pierre Marguerite, Tom Oldfield, Andrew Waterhouse, Jim Procter, David Martin, Geoff Barton

Page 61: Creation, curation and analysis of RNA and Protein alignments with Jalview

VAMSAS: Visualization and Analysis of Molecular Sequences, Alignments and Structures

• Analysis of Protein Sequences
• Analysis of Nucleic Acid Sequences
• Analysis of Protein Structures

Page 62: Creation, curation and analysis of RNA and Protein alignments with Jalview

Aim: Enable user to move between different VAMSAS Applications

• Jalview: Databases, Annotation, Alignment, 2-ary Structure Prediction
• AstexViewer @MSD-EBI: Structure Databases, Structural Clustering, Uniprot/MSD Mapping
• TOPALi: Model Selection, Phylogeny, Ancestral Sequences, Positive Selection, Recombination

Still can't do this :(

Page 63: Creation, curation and analysis of RNA and Protein alignments with Jalview

A brief history of Jalview

• Michele Clamp, Interim Director of Research Computing, Harvard.
• James Cuff, Cycle Computing, USA.
• Steve Searle, now at Sanger, UK
• Andrew Waterhouse, U. Basel.
• Jim Procter
• David Martin

• 1997: Jalview Version 1
• 2004: Jalview 1 published.
• 2005: Jalview Version 2
• Releases: 2.1, 2.2, 2.3, 2.4
• VAMSAS eScience project
• 2009: Jalview 2 Paper

Page 64: Creation, curation and analysis of RNA and Protein alignments with Jalview

Unsupported graphical tools cause headaches..

User Headaches:
• Can't do what you want:
  – Read/write format 'X'
  – Run my analysis 'Y'
  – Load my 10Gb dataset
• Slow or difficult to use
• Doesn't work properly

Core Developer needed for:
• Staying 'state of the art':
  – New standards
  – New analysis methods
  – New data
• Maintenance
  – Dependencies
  – New OS Versions
  – New hardware
• Hard to do 'research' and maintain software

Page 65: Creation, curation and analysis of RNA and Protein alignments with Jalview

A brief history of Jalview

• Michele Clamp, Interim Director of Research Computing, Harvard.
• James Cuff, Cycle Computing, USA.
• Steve Searle, now at Sanger, UK
• Andrew Waterhouse, U. Basel.
• Jim Procter
• David Martin
• Peter Troshin

• 1997: Jalview Version 1
• 2004: Jalview 1 published.
• 2005: Jalview Version 2
• Releases: 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, .1, 2.7, 2.8
• VAMSAS eScience project
• 2009: Jalview 2 Paper
• New BBSRC Funding 2009-2014

Page 66: Creation, curation and analysis of RNA and Protein alignments with Jalview

The Jalview 5 Year Plan

• Extensible
• Maintainable
• Sustainable
• Community: Users and Developers
• More Capable
• More Flexible

1 Oct 2009 to 30th Sep 2014

5 Year Tools and Resources Development Fund Grant from the UK Biotechnology and Biological Sciences Research Council

Page 67: Creation, curation and analysis of RNA and Protein alignments with Jalview

The Jalview 5 Year Plan

• Extensible
• Maintainable
• More Capable
• More Flexible
• Sustainable
• Community

• Software Engineering
• Large Datasets
• Analysis services
• GUI refactor & plugins

1 Oct 2009 to 30th Sep 2014

Page 68: Creation, curation and analysis of RNA and Protein alignments with Jalview

Co-developed with Sasha Sherstnev on Jpred BBSRC BBR

Page 69: Creation, curation and analysis of RNA and Protein alignments with Jalview

The Jalview 5 Year Plan

• Extensible
• Maintainable
• More Capable
• More Flexible
• Sustainable
• Community: Users and Developers

• Software Engineering
• Large Datasets
• Analysis services
• GUI refactor & plugins

Community Outreach
• Website redesign
• Training
• Users and Developers
• Internationalisation
• True Open Source Development
• Issue tracker and open repository

1 Oct 2009 to 30th Sep 2014

Page 70: Creation, curation and analysis of RNA and Protein alignments with Jalview
Page 71: Creation, curation and analysis of RNA and Protein alignments with Jalview

www.google-melange.org  www.google-melange.com  

Page 72: Creation, curation and analysis of RNA and Protein alignments with Jalview

Lauren Lui, UC Santa Cruz. http://jalview-rnasupport.blogspot.com/

• Alignment fetcher
• Purine/pyrimidine colourscheme
• Colouring to highlight helical structure
• WUSS annotation parser (from RALEE)

NESCent

Page 73: Creation, curation and analysis of RNA and Protein alignments with Jalview

Jan  Engelhardt  (Uni.  Leipzig)  

NESCent  

Page 74: Creation, curation and analysis of RNA and Protein alignments with Jalview

Google Summer of Code 2013: Why should I participate?

Students gain:
• skills
• real world experience
• sample code
• contacts

Organizations gain:
• new contributions & contributors
• global exposure

NESCent

Page 75: Creation, curation and analysis of RNA and Protein alignments with Jalview

Google and NESCent Summer of Code 2013: New phylogeny support in Jalview?

• cDNA/protein editing
  – Display AA translation of cDNA alignment
  – ORF support
• Bacterial genome alignment manipulation
  – New editing
• JABAWS
  – Phylogeny services
  – Extend alignment services to
    • Return guide trees and uncertainty scores
    • Support 'add' and profile alignment

Page 76: Creation, curation and analysis of RNA and Protein alignments with Jalview

The Jalview developers
• Michele Clamp, Harvard, USA.
• James Cuff, Cycle Computing
• Steve Searle, Sanger, UK
• Andrew Waterhouse, Basel, Switzerland.
• Geoff Barton (Money)
• David Martin (Teaching)
• Peter Troshin (JABAWS)
• Barry Strachan (logo)
• Tom Walsh (Apache)
• Ryan Maclaughlan (CSS)
• Andrew Millar (Drupal)
• All the Jalview users, and …

RNA Features
• Lauren Lui, UC Santa Cruz, USA.
• Jan Engelhardt, Univ. Leipzig, Germany.
• Yann Ponty (VARNA), École Polytechnique, Fr.

RNA Experts
• Kersten Schroeder, U. Dundee, UK.
• Paul Gardiner, Rfam, NZ.
• Albert Vilella, EBI, UK.

T-COFFEE Scores
• Paolo di Tomasso, Notredame Group, CRG, Spain.

Google Summer of Code
• Hilmar Lapp & Karen Cranston, NESCent (Duke U.)