Transcriptional regulation & Clustering

Elena Nikolaeva
elenanik@ut.ee
University of Tartu, Estonia

MTAT.03.239 Bioinformatics

• Part 1: Transcriptional regulation
  - Gene regulation in eukaryotes
  - PWM
  - TFBS prediction using PWM

• Part 2: Clustering
  - Goal
  - Types of clustering
  - Distance measures
  - Applications

Information flow in a eukaryotic cell

http://www.nature.com/scitable/topicpage/gene-expression-14121669

An intron is any nucleotide sequence within a gene that is removed by RNA splicing while the final mature RNA product of the gene is being generated.

An exon is any nucleotide sequence encoded by a gene that remains present within the final mature RNA product of that gene.

Transcription factor

A transcription factor is a protein that binds to specific DNA sequences, thereby controlling the flow (or transcription) of genetic information from DNA to messenger RNA.

Transcription factors perform this function alone or with other proteins in a complex, by promoting (as an activator) or blocking (as a repressor) the recruitment of RNA polymerase (the enzyme that performs the transcription of genetic information from DNA to RNA).

[Figure: transcription factors TF1 and TF2. Image from "Optimization of PWMs using statistically synchrofasostatic morphogenetic infrastructural modeling" by Konstantin Tretjakov]

Transcriptional regulators can determine cell types

http://www.nature.com/scitable/topicpage/gene-expression-14121669

How is gene expression regulated?

Transcription begins when an RNA polymerase binds to a so-called promoter sequence on the DNA molecule.

Binding of regulatory proteins to an enhancer sequence causes a shift in chromatin structure that either promotes or inhibits RNA polymerase and transcription factor binding.

[Figure labels: Enhancer; "Proximal" promoter (100 bp to 2 kb 5' upstream); TSS: Transcription Start Site; Gene]

Source: Promoter analysis. TFBS Detection, by D. Rico

Promoters

• Promoters are DNA segments upstream of transcripts that initiate transcription
• A promoter attracts RNA polymerase to the transcription start site

[Figure: 5' Promoter 3']

Source: Promoter analysis. TFBS Detection, by D. Rico

Enhancers

An enhancer is a short region of DNA that can be bound by proteins to enhance the transcription levels of genes. It does not need to be particularly close to the genes it acts on.

http://www.nature.com/scitable/topicpage/gene-expression-14121669

Transcription repression

An inactive repressor protein can become activated by another molecule. The active repressor can then interfere with RNA polymerase binding to the promoter, effectively preventing transcription.

http://www.nature.com/scitable/topicpage/gene-expression-14121669

How to identify Transcription Factor Binding Sites (TFBS)?

http://www.nature.com/scitable/topicpage/gene-expression-14121669

Transcription factors (TFs) recognize specific sequences

[Figure: Gcn4 bound to DNA, with the sequence label TGAGTCATGACTCA]

http://www.bio.jhu.edu/Faculty/Privalov/

Some positions can have different nucleotides

IUPAC ambiguity codes

Gcn4 consensus sequence

TGAGTCA
TGACTCA

Consensus: TGASTCA
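A consensus written with IUPAC ambiguity codes can be matched against a sequence position by position. The sketch below is only an illustration: the function names `matches` and `find_sites` and the example sequence are made up for this purpose, while the code table itself is the standard IUPAC nucleotide alphabet (for instance, S stands for C or G).

```python
# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def matches(consensus, site):
    """True if every base of `site` is allowed by the consensus code at that position."""
    return len(consensus) == len(site) and all(
        base in IUPAC[code] for code, base in zip(consensus, site)
    )

def find_sites(consensus, sequence):
    """Return the 0-based start positions where the consensus matches the sequence."""
    k = len(consensus)
    return [i for i in range(len(sequence) - k + 1)
            if matches(consensus, sequence[i:i + k])]

# Hypothetical example sequence containing one Gcn4-like site.
positions = find_sites("TGASTCA", "AATGACTCATT")
```

Exact consensus matching treats all allowed bases as equally good, which is one reason weight matrices are usually preferred for prediction.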

TFBS: Detection methods

• in vivo
  - Functional analysis
  - ChIP

• in vitro, on a cloned fragment
  - Footprinting reactions
  - Exonuclease digests
  - Gel retardation (EMSA)
  - UV crosslinking

• in vitro, on artificial DNA
  - SELEX: Systematic Evolution of Ligands by Exponential enrichment

Slide from Promoter analysis. TFBS Detection, by D. Rico

ChIP-Seq can be used to detect TF binding sites

Not all nucleotides are equally likely to be present at each position

TF Binding Sites

• Problems:
  - often a poorly defined consensus
  - sequences are not conserved within species, and even less so between species
  - there are examples of enhancers that are functionally conserved but not sequence-conserved
  - most of the TFBS sequence data comes from just a few species
  - the data very often come from in vitro experiments
  - two completely different binding sites could be merged into the same matrix/consensus

Source: Promoter analysis. TFBS Detection, by D. Rico

Binding sites and motifs

• Transcription factor binding is specific, hence binding sites are similar to each other, but variability is often seen
• A motif is the common sequence pattern among the binding sites of a transcription factor

Data collection

Probabilities can be calculated and corrected for background

Position weight matrices (PWMs) are also called position-specific scoring matrices (PSSMs). Their values are in log scale.

From PFM to PWM/PSSM

http://www.nature.com/nrg/journal/v5/n4/box/nrg1315_BX2.html
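One common way to go from counts to log-scale weights is to turn each column of the position frequency matrix into probabilities, divide by the background probability of each base, and take log2. The following is a minimal sketch under stated assumptions, not the method of any particular database: the name `pfm_to_pwm`, the uniform background of 0.25 per base, the pseudocount of 1 spread according to the background, and the toy counts are all illustrative choices.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, background=None, pseudocount=1.0):
    """Convert a position frequency matrix (counts) into a log-odds PWM.

    `pfm` maps each base to a list of counts, one per motif position.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}   # assumed uniform background
    length = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for i in range(length):
        total = sum(pfm[b][i] for b in BASES)
        for b in BASES:
            # The pseudocount keeps log2 finite for bases never seen at this position.
            p = (pfm[b][i] + pseudocount * background[b]) / (total + pseudocount)
            pwm[b].append(math.log2(p / background[b]))
    return pwm

# Toy counts for a two-position motif (hypothetical data).
toy_pfm = {"A": [4, 3], "C": [0, 0], "G": [0, 1], "T": [0, 0]}
toy_pwm = pfm_to_pwm(toy_pfm)
```

Positive weights mark bases that are enriched relative to the background, negative weights mark depleted ones; the size of the pseudocount matters most when only a few sites are known.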

SEQUENCE LOGOS: The information content of a matrix column ranges from 0 bits (no base preference) to 2 bits (only one base used).

http://weblogo.berkeley.edu/
http://www.lecb.ncifcrf.gov/~toms/sequencelogo.html
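The 0 to 2 bit range follows from the usual definition of column information content for DNA: 2 bits (log2 of 4 bases) minus the Shannon entropy of the observed base frequencies. A small sketch, assuming raw counts as input and ignoring the small-sample correction that logo tools may apply; the function name `information_content` is illustrative.

```python
import math

def information_content(counts):
    """Information content in bits of one DNA matrix column, given its four base counts."""
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 2.0 - entropy   # 2 = log2(4), the maximum for a four-letter alphabet

one_base_only = information_content([4, 0, 0, 0])   # a single base used
no_preference = information_content([1, 1, 1, 1])   # all bases equally frequent
```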

Summary

Aligned binding sites:
AAGTTC
AAGCTC
AGGCTC
AAGGTC

Position frequency matrix:
A  4 3 0 0 0 0
C  0 0 0 2 0 4
G  0 1 4 1 0 0
T  0 0 0 1 4 0

Consensus: ARGBTC

Slide from Promoter analysis. TFBS Detection, by D. Rico
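The counts and the consensus on this slide can be reproduced from the four aligned sites. The sketch below counts bases per position and then writes, for each position, the IUPAC code that covers every base observed there. The helper names `build_pfm` and `consensus` are illustrative, and "every observed base is allowed" is just one possible consensus rule.

```python
BASES = "ACGT"

# IUPAC code for each set of observed bases (key: bases in alphabetical order).
IUPAC_BY_SET = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}

def build_pfm(sites):
    """Count each base at each position of equal-length aligned sites."""
    length = len(sites[0])
    return {b: [sum(site[i] == b for site in sites) for i in range(length)]
            for b in BASES}

def consensus(pfm):
    """IUPAC consensus in which every base seen at a position is allowed."""
    length = len(pfm["A"])
    return "".join(
        IUPAC_BY_SET["".join(b for b in BASES if pfm[b][i] > 0)]
        for i in range(length)
    )

sites = ["AAGTTC", "AAGCTC", "AGGCTC", "AAGGTC"]
pfm = build_pfm(sites)
motif = consensus(pfm)
```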

PWM databases

Transfac: not free, >848 matrices, loads of information and references, quality score based on the methods used

Jaspar: open source, 174 matrices, minimal information, majority based on the SELEX method (80%)

TRANSFAC®

http://www.gene-regulation.com/pub/databases.html

Jaspar

http://jaspar.genereg.net/

Jaspar example: Pax6

Futility Theorem: essentially all predicted TFBSs will have no functional role

It is necessary to constrain the search space

Multiple approaches to constrain the search space

• Promoter regions
• Conserved sequences
• Open chromatin
• Integrate over a promoter region
• Proximity to transcription start site (TSS)
• etc.
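TFBS prediction with a PWM usually means sliding the matrix along a sequence, summing the weights of the bases in each window, and reporting windows above a threshold. Constraining the search space can then be as simple as scanning only a chosen region. The sketch below assumes a log-odds PWM stored as a dictionary of lists; the names `score_window` and `scan`, the toy matrix, the sequence, the threshold and the region boundary are all hypothetical values chosen for illustration.

```python
def score_window(pwm, window):
    """Sum the PWM weights of the bases in one window."""
    return sum(pwm[base][i] for i, base in enumerate(window))

def scan(pwm, sequence, threshold, start=0, end=None):
    """Return (position, score) for windows inside sequence[start:end] scoring >= threshold."""
    width = len(pwm["A"])
    end = len(sequence) if end is None else end
    hits = []
    for i in range(start, end - width + 1):
        score = score_window(pwm, sequence[i:i + width])
        if score >= threshold:
            hits.append((i, score))
    return hits

# Hypothetical 3-column log-odds PWM that prefers "AGC".
toy_pwm = {
    "A": [1.5, -2.0, -2.0],
    "C": [-2.0, -2.0, 1.5],
    "G": [-2.0, 1.5, -2.0],
    "T": [-2.0, -2.0, -2.0],
}
sequence = "TTAGCTTTTTTAGCTT"

all_hits = scan(toy_pwm, sequence, threshold=4.0)
# Restrict the scan to an assumed region of interest starting at position 8.
region_hits = scan(toy_pwm, sequence, threshold=4.0, start=8)
```

Scanning a smaller region does not make an individual hit more reliable, but it reduces the number of chance matches that have to be examined.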

Cluster Analysis

Adapted from Meelis Kull's slides, Bioinformatics course 2011

What is cluster analysis?

Clustering is finding groups of objects such that the objects in a group are:
• similar (or related) to the objects in the same group, and
• different from (or unrelated to) the objects in other groups

Why cluster biological data?

• Intuition building
• Hypothesis generation
• Summarizing / compressing large data

Partitional vs Hierarchical

Fuzzy vs Non-Fuzzy

• Fuzzy: each object belongs to each cluster with some weight (the weight can be zero)
• Non-fuzzy: each object belongs to exactly one cluster
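The difference between the two kinds of assignment can be shown on one-dimensional data. In the sketch below the non-fuzzy version returns a single cluster index, while the fuzzy version returns one weight per cluster. Inverse-distance weighting is an assumption made here for illustration only; fuzzy clustering algorithms define their weights in their own ways. The function names are likewise illustrative.

```python
def hard_assignment(point, centers):
    """Non-fuzzy: index of the single closest center."""
    return min(range(len(centers)), key=lambda k: abs(point - centers[k]))

def fuzzy_weights(point, centers):
    """Fuzzy: one weight per cluster, summing to 1 (inverse-distance weighting)."""
    distances = [abs(point - c) for c in centers]
    on_center = [d == 0.0 for d in distances]
    if any(on_center):
        # The point coincides with one or more centers: share the weight among them.
        return [1.0 / sum(on_center) if hit else 0.0 for hit in on_center]
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]

centers = [0.0, 4.0]          # hypothetical cluster centers
label = hard_assignment(1.0, centers)
weights = fuzzy_weights(1.0, centers)
```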

Hierarchical clustering

Hierarchical clustering is usually depicted as a dendrogram (tree)

• Each subtree corresponds to a cluster
• Height of branching shows distance

Hierarchical clustering (0)

Algorithm for agglomerative hierarchical clustering: join the two closest objects

Hierarchical clustering (1)

Join the two closest objects

Hierarchical clustering (2) to (5)

Keep joining the closest pairs

Hierarchical clustering (10)

After 10 steps we have 4 clusters left

Several ways to measure distance between clusters:
• Single linkage (MIN)
• Complete linkage (MAX)
• Average linkage
  - Weighted
  - Unweighted
• ...
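The agglomerative procedure and the linkage choices can be sketched in plain Python. This is a deliberately simple, slow version for small examples, not an optimized implementation: it uses Euclidean distance between points, supports single, complete and unweighted average linkage only, and the names `linkage_distance` and `agglomerate` and the example points are illustrative.

```python
import math
from itertools import combinations

def linkage_distance(a, b, points, method):
    """Distance between clusters a and b, given as lists of point indices."""
    pairs = [math.dist(points[i], points[j]) for i in a for j in b]
    if method == "single":      # MIN: closest pair of members
        return min(pairs)
    if method == "complete":    # MAX: farthest pair of members
        return max(pairs)
    if method == "average":     # unweighted mean over all member pairs
        return sum(pairs) / len(pairs)
    raise ValueError(f"unknown linkage: {method}")

def agglomerate(points, n_clusters, method="single"):
    """Join the two closest clusters until only n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    merges = []                 # (members_a, members_b, height) for each join
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage_distance(clusters[ab[0]], clusters[ab[1]],
                                            points, method),
        )
        height = linkage_distance(clusters[a], clusters[b], points, method)
        merges.append((clusters[a], clusters[b], height))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters, merges

# Hypothetical 2-D points: two tight pairs and one outlier.
points = [(0, 0), (0, 1), (5, 0), (5, 1), (11, 0)]
clusters, merges = agglomerate(points, n_clusters=2, method="single")
```

The recorded heights are what a dendrogram draws as branching heights; stopping at `n_clusters` corresponds to cutting the tree at that level.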

Hierarchical clustering (11)

In this example and at this stage we have the same result as in partitional clustering

Hierarchical clustering (12) and (13)

In the final step the two remaining clusters are joined into a single cluster

Examples of Hierarchical Clustering in Bioinformatics

• Phylogeny
• Gene expression clustering

K-means clustering

• Partitional, non-fuzzy
• Partitions the data into K clusters
• K is given by the user

Algorithm:
• Choose K initial centers for the clusters
• Assign each object to its closest center
• Recalculate cluster centers
• Repeat until convergence
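The four steps of the algorithm map directly onto a short loop. The sketch below is a minimal version under stated assumptions: points are tuples of numbers, distance is Euclidean, the initial centers are K objects drawn at random with a fixed seed, and convergence means that no assignment changes. The function name `kmeans` and its parameters are illustrative.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on tuples of numbers; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)              # choose K initial centers
    labels = None
    for _ in range(max_iter):
        # Assign each object to its closest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:                 # assignments stable: converged
            break
        labels = new_labels
        # Recalculate each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                          # keep the old center if a cluster is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels

# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
```

Which cluster gets which index depends on the random initial centers, so results are best compared as partitions rather than by label value.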

[Figure slides: K-means (1) to (6)]

K-means clustering summary

• One of the fastest clustering algorithms
• Therefore very widely used
• Sensitive to the choice of initial centres
  - there are many algorithms for choosing the initial centres cleverly
• Assumes that the mean can be calculated
  - can be used on vector data
  - cannot be used on sequences (what is the mean of A and T?)

Distance measures

Distance of vectors $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$

• Euclidean distance: $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
• Manhattan distance: $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
• Correlation distance: $d(x, y) = 1 - r(x, y)$, where $r(x, y)$ is the Pearson correlation coefficient

Distance of sequences ACCTTG and TACCTG

• Hamming distance:
    ACCTTG
    TACCTG   => 3
• Levenshtein distance:
    .ACCTTG
    TACC.TG  => 2
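Each of these measures takes only a few lines of code. The sketch below follows the formulas on the slide; the function names are illustrative, and the Levenshtein function uses the standard dynamic-programming recurrence with unit cost for every insertion, deletion and substitution.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def correlation_distance(x, y):
    """1 - Pearson r; undefined (division by zero) if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def hamming(s, t):
    """Number of mismatching positions between two equal-length sequences."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(a != b for a, b in zip(s, t))

def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        current = [i]
        for j, b in enumerate(t, start=1):
            current.append(min(previous[j] + 1,               # delete a
                               current[j - 1] + 1,            # insert b
                               previous[j - 1] + (a != b)))   # substitute or match
        previous = current
    return previous[-1]

d_hamming = hamming("ACCTTG", "TACCTG")
d_levenshtein = levenshtein("ACCTTG", "TACCTG")
```

For the slide's pair of sequences the Levenshtein distance is smaller than the Hamming distance because one insertion and one deletion realign the shared ACC and TG segments.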

K-medoids clustering

• The same as K-means, except that the center is required to be at an object
• Medoid: an object which has minimal total distance to all other objects in its cluster
• Can be used on more complex data, with any distance measure
• Slower than K-means
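Because a medoid is always one of the objects, the method only needs a distance function and can therefore cluster sequences. The sketch below uses a simple alternating scheme (assign objects, then recompute each medoid), which is one of several K-medoids variants and is chosen here only for brevity. The names `medoid` and `kmedoids`, the example sequences and the initial medoids are hypothetical.

```python
def medoid(members, distance):
    """The member with the smallest total distance to all other members."""
    return min(members, key=lambda m: sum(distance(m, other) for other in members))

def kmedoids(objects, initial_medoids, distance, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        clusters = [[] for _ in medoids]
        for obj in objects:
            closest = min(range(len(medoids)),
                          key=lambda k: distance(obj, medoids[k]))
            clusters[closest].append(obj)
        # An empty cluster keeps its previous medoid.
        new_medoids = [medoid(c, distance) if c else m
                       for c, m in zip(clusters, medoids)]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, clusters

def hamming(s, t):
    """Mismatch count; the sequences are assumed to have equal length."""
    return sum(a != b for a, b in zip(s, t))

# Hypothetical equal-length sequences forming two groups.
sequences = ["AAAA", "AAAT", "AATA", "GGGG", "GGGC", "GGCG"]
medoids, clusters = kmedoids(sequences, ["AAAT", "GGGC"], hamming)
```

Recomputing a medoid needs all pairwise distances inside the cluster, which is the main reason the method is slower than K-means.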

[Figure slides: K-medoids (1) to (9)]

Examples of K-means and K-medoids in Bioinformatics

• Gene expression clustering
• Sequence clustering

Summary of Clustering

• Aims: intuition, hypothesis generation, summarization
• Types:
  - Hierarchical / Partitional
  - Fuzzy / Non-Fuzzy
  - Vector-based / Distance-based
  - etc.
• Distance measures:
  - Euclidean, Manhattan, Correlation
  - Hamming, Levenshtein
  - etc.
• Applications:
  - Clustering genes, sequences, organisms, etc.