Explorations of Multidimensional Sequence Space

Page 1

Explorations of Multidimensional Sequence Space

Page 2

One symbol -> 1D

The coordinate along that single dimension = pattern length

Page 3

Two symbols -> Dimension = length of pattern

length 1 = 1D:

Page 4

Two symbols -> Dimension = length of pattern

length 2 = 2D:

Dimensions correspond to positions. For each dimension, there are two possibilities.

Note: Here is a possible bifurcation: a larger alphabet could be represented as more choices along the axis of position!
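A minimal sketch of the two-symbol mapping follows. It assumes the symbols are written A and B and sit at coordinates 0 and 1 on every axis; the letters and function names are illustrative, not taken from the slides. Each pattern of length n becomes a vertex of the n-dimensional cube. Two patterns share a cube edge exactly when they differ at one position.

```python
from itertools import product


def pattern_to_vertex(pattern, alphabet="AB"):
    """Map a two-symbol pattern to a 0/1 coordinate tuple, one axis per position."""
    return tuple(alphabet.index(symbol) for symbol in pattern)


def cube_graph(n, alphabet="AB"):
    """All length-n patterns, plus the pairs that differ at exactly one position (cube edges)."""
    patterns = ["".join(p) for p in product(alphabet, repeat=n)]
    edges = []
    for i, a in enumerate(patterns):
        for b in patterns[i + 1:]:
            if sum(x != y for x, y in zip(a, b)) == 1:
                edges.append((a, b))
    return patterns, edges


patterns, edges = cube_graph(4)
print(len(patterns), len(edges))  # 16 32
print(pattern_to_vertex("ABBA"))  # (0, 1, 1, 0)
```

For n = 4 this reproduces the hypercube's 16 vertices and 32 edges. In general there are n * 2^(n-1) edges.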

Page 5

Two symbols -> Dimension = length of pattern

length 3 = 3D:

Page 6

Two symbols -> Dimension = length of pattern

length 4 = 4D:

aka Hypercube

Page 7

Two symbols -> Dimension = length of pattern

Page 8

Three Symbols (another solution is to use more values for each dimension)
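The parenthetical alternative (more values per dimension) can be sketched the same way. This assumes each of the three symbols takes one of the values 0, 1, 2 on a position's axis; the letters and names are hypothetical.

```python
from itertools import product


def pattern_to_grid_point(pattern, alphabet="ABC"):
    """One axis per position; the symbol picks one of len(alphabet) values (0, 1, 2, ...)."""
    return tuple(alphabet.index(symbol) for symbol in pattern)


def grid(n, alphabet="ABC"):
    """Map every length-n pattern to its point in the {0, ..., k-1}^n grid."""
    return {"".join(p): pattern_to_grid_point(p, alphabet)
            for p in product(alphabet, repeat=n)}


points = grid(2)
print(len(points))   # 9
print(points["CA"])  # (2, 0)
```

One consequence of this choice is that the axis imposes an order on the symbols. A and C end up two steps apart with B between them, although nothing in the alphabet says B lies between the others.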

Page 9

Four Symbols:

I.e., with a 4-symbol alphabet we already get a hypercube (4D) at a pattern size of 2, provided each dimension stays binary (each symbol then takes two binary dimensions).
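A minimal sketch of the 2-bit encoding follows. It assumes the hypothetical letters A, C, G, T and an arbitrary code assignment. Each symbol occupies two binary axes, so a pattern of length n lands on a vertex of the 2n-dimensional cube. For n = 2 that is the 4D hypercube, with the same 2^4 = 4^2 = 16 vertices as a 2-character alphabet at pattern size 4.

```python
from itertools import product

# Hypothetical labels and an arbitrary 2-bit code per symbol (two binary axes each).
CODES = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def pattern_to_binary_vertex(pattern):
    """Concatenate the 2-bit codes: a length-n pattern becomes a vertex of the 2n-cube."""
    return tuple(bit for symbol in pattern for bit in CODES[symbol])


vertices = {pattern_to_binary_vertex(p) for p in product("ACGT", repeat=2)}
print(len(vertices), len(next(iter(vertices))))  # 16 4
```

Note that with this code A -> T flips both bits, so not every single-symbol change is a cube edge. Which changes count as neighbours depends on the code assignment.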

Page 10

Hypercubes with 2- and 4-character alphabets

2 character alphabet, pattern size 4

4 character alphabet, pattern size 2

Page 11

A three-symbol alphabet suggests a fractal representation

Page 12

3-symbol fractal

Enlarge, then fill in.

The outer pattern repeats the inner pattern = self-similar = fractal
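The slides don't spell out the construction rule, so the sketch below rests on an assumption. The three symbols sit at the corners of an equilateral triangle, and each further position adds its symbol's corner at half the previous scale. This is the halving rule that generates the Sierpinski triangle. Under this rule the point set for length n is three half-size copies of the set for length n-1, which matches "outer pattern repeats inner pattern". The corner letters and function names are hypothetical.

```python
from itertools import product

# Assumed geometry: one symbol per corner of a unit equilateral triangle.
CORNERS = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.5, 3 ** 0.5 / 2)}


def pattern_to_point(pattern, corners=CORNERS):
    """Each position adds its symbol's corner, scaled by 1/2 per position.

    The first symbol picks a half-size sub-triangle, the next a quarter-size
    one inside it, and so on, so every level repeats the level above.
    """
    x = y = 0.0
    for depth, symbol in enumerate(pattern, start=1):
        cx, cy = corners[symbol]
        x += cx / 2 ** depth
        y += cy / 2 ** depth
    return (x, y)


def fractal_points(n, corners=CORNERS):
    """Points for all len(corners)**n patterns of length n."""
    return {"".join(p): pattern_to_point(p, corners)
            for p in product(sorted(corners), repeat=n)}


points = fractal_points(3)
print(len(points), len(set(points.values())))  # 27 27
```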

Page 13

3-character alphabet, pattern length 3: fractal

Page 14

3-character alphabet, pattern length 4: fractal

Conjecture: for n -> infinity, the fractal might fill a 2D triangle.

Note: check Mandelbrot

Page 15

Same for a 4-character alphabet

1 position

2 positions

3 positions

Page 16

4 character alphabet continued

(with cheating: I didn’t actually add beads)

4 positions

Page 17

4 character alphabet continued

(with cheating: I didn’t actually add beads)

5 positions

Page 18

4 character alphabet continued

(with cheating: I didn’t actually add beads)

6 positions

Page 19

4 character alphabet continued

(with cheating: I didn’t actually add beads)

7 positions

Page 20

Animated GIF: 1-12 positions
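The slides don't say which shape the 4-character series uses. One natural analogue of the triangle, assumed here, is a regular tetrahedron in 3D with one corner per symbol; the letters A, C, G, T are hypothetical labels. Each frame is built from the previous one by halving it and placing one copy toward each corner (i.e. prepending each symbol), which is one way to read the enlarge/fill-in step.

```python
# Assumed layout: regular tetrahedron with unit edges, one corner per symbol.
# The letters are hypothetical labels for a 4-character alphabet.
CORNERS = {
    "A": (0.0, 0.0, 0.0),
    "C": (1.0, 0.0, 0.0),
    "G": (0.5, 3 ** 0.5 / 2, 0.0),
    "T": (0.5, 3 ** 0.5 / 6, (2 / 3) ** 0.5),
}


def next_frame(frame, corners=CORNERS):
    """Shrink the previous frame by half and place one copy toward each corner.

    Prepending symbol s moves every point p to (corner_s + p) / 2, so a frame
    for n positions holds 4**n points.
    """
    return {
        symbol + pattern: tuple((c + p) / 2 for c, p in zip(corner, point))
        for symbol, corner in corners.items()
        for pattern, point in frame.items()
    }


def frames(max_positions, corners=CORNERS):
    """Frames for 1, 2, ..., max_positions positions (one per animation step)."""
    frame = {"": (0.0, 0.0, 0.0)}  # empty pattern at the origin
    result = []
    for _ in range(max_positions):
        frame = next_frame(frame, corners)
        result.append(frame)
    return result


for n, frame in enumerate(frames(5), start=1):
    print(n, len(frame))  # prints "1 4", then "2 16", ... up to "5 1024"
```

The point count grows as 4^n, so a 12-position frame already holds 4^12 = 16,777,216 points. Rendering that many beads would likely call for sampling or a density plot.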

Page 21

Protein space in Jalview

Page 22

Alignment of V-, F- and A-ATPase ATP-binding subunits (SU), catalytic and non-catalytic

Page 23

UPGMA tree of the V-, F- and A-ATPase ATP-binding SU, with a line dropped across it to partition (and colour) the 4 SU types (V/A catalytic and non-catalytic, F catalytic and non-catalytic). Note that details of the tree $%#&@.
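The slides only say that a UPGMA tree was used. The sketch below shows the UPGMA (average-linkage) rule itself on a hypothetical four-sequence distance matrix, and how "dropping a line" at a given distance partitions the leaves into groups. Because UPGMA merge distances never decrease, stopping the merges at the cut gives the same groups as cutting the finished tree. Names, distances, and functions are illustrative assumptions.

```python
from itertools import combinations


def upgma_clusters(names, dist, cut):
    """Merge the two closest clusters (average linkage = UPGMA) until the
    closest pair is farther apart than `cut`; return the remaining clusters.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    """
    clusters = [(name,) for name in names]

    def average(c1, c2):
        # UPGMA distance: mean over all leaf pairs across the two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        if average(c1, c2) > cut:
            break  # the cut line lies below this merge
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return clusters


d = {
    frozenset(("s1", "s2")): 0.1, frozenset(("s3", "s4")): 0.2,
    frozenset(("s1", "s3")): 0.8, frozenset(("s1", "s4")): 0.9,
    frozenset(("s2", "s3")): 0.7, frozenset(("s2", "s4")): 0.8,
}
print(upgma_clusters(["s1", "s2", "s3", "s4"], d, cut=0.5))
# [('s1', 's2'), ('s3', 's4')]
```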

Page 24

PCA of the V-, F- and A-ATPase ATP-binding SU, using colours from the UPGMA tree

Page 25

Same PCA of the V-, F- and A-ATPase ATP-binding SU, using colours from the UPGMA tree, but turned slightly. (Giardia A SU selected in grey.)

Page 26

Same PCA of the V-, F- and A-ATPase ATP-binding SU, using colours from the UPGMA tree, but with the 1st axis replaced by the 5th. (Eukaryotic A SU selected in grey.)

Page 27

Same PCA of the V-, F- and A-ATPase ATP-binding SU, using colours from the UPGMA tree, but with the 1st axis replaced by the 6th. (Eukaryotic B SU selected in grey; I forgot rice.)
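Jalview computes its PCA from pairwise similarity scores between the sequences (see its documentation for the exact scoring). The sketch below covers only the generic PCA step and assumes each sequence is already a numeric vector, for example a row of such a score matrix. It centres the data and finds the leading axes by power iteration with deflation; that method is chosen here only to avoid external libraries. It then projects onto any chosen axes, so swapping the 1st axis for the 5th or 6th is just a different `which` tuple. Function names and toy data are hypothetical.

```python
def pca_axes(rows, n_axes, iterations=500):
    """Top principal axes of `rows` (one numeric vector per sequence).

    Returns (axes, centered): axes is a list of (variance, unit_vector) pairs,
    largest variance first; centered is the mean-centred data.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    axes = []
    for _ in range(n_axes):
        # Power iteration from an arbitrary, non-symmetric unit start vector.
        start = [1.0 / (k + 1) for k in range(d)]
        norm = sum(x * x for x in start) ** 0.5
        v = [x / norm for x in start]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break  # no variance left to explain
            v = [x / norm for x in w]
        variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        axes.append((variance, v))
        # Deflate: subtract this component so the next pass finds the next axis.
        cov = [[cov[i][j] - variance * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return axes, centered


def project(centered, axes, which):
    """Coordinates on the chosen 0-based axes, e.g. which=(4, 1, 2)
    shows the 5th axis in place of the 1st."""
    return [tuple(sum(c[k] * axes[a][1][k] for k in range(len(c))) for a in which)
            for c in centered]


rows = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
axes, centered = pca_axes(rows, 2)
print([round(var, 3) for var, _ in axes])  # [2.667, 0.667]
```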

Page 28

Problems
• Jalview’s approach requires an alignment.
• Solution: use pattern presence/absence as coordinates.
• Which patterns?
  – GBLOCKS (new additions use PSSMs)
  – CDD PSSM profiles
  – It would be nice to stick to small words.
• One could screen for words/motifs/PSSMs that have good resolving power:
  – Run a PCA with all of them and keep only the ones that contribute to the main axis.
  – Probably better: do a databank search to find how often each one is present. One could generate random motifs (or all possible motifs) and check them out (the criterion needs work).
  – Empirical orthogonality
  – Exhaustive vs. random
  – How to judge discriminatory power (maybe a 5% significance level)
  – Presence/absence: optimal discriminatory power?
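A minimal sketch of the presence/absence idea follows. It assumes an exhaustive screen of all short words (k-mers) and uses Fisher's exact test as the discriminatory criterion at the 5% level; the slides name neither a test nor these functions, and the toy sequences are made up. Each sequence's presence/absence vector is again a vertex of a binary hypercube, with one axis per word.

```python
from itertools import product
from math import comb


def all_words(alphabet, k):
    """Every word of length k over the alphabet (exhaustive screen)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def word_presence(sequence, words):
    """Presence/absence coordinates: 1 if the word occurs in the sequence, else 0."""
    return tuple(int(w in sequence) for w in words)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]
    (rows = groups, columns = word present / absent)."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x "present" counts in the first group.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    support = range(max(0, row1 - (n - col1)), min(row1, col1) + 1)
    probs = [prob(x) for x in support]
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


def screen_words(group1, group2, alphabet, k, alpha=0.05):
    """Words whose presence/absence separates the two groups at p < alpha."""
    hits = []
    for w in all_words(alphabet, k):
        a = sum(w in s for s in group1)
        c = sum(w in s for s in group2)
        p = fisher_two_sided(a, len(group1) - a, c, len(group2) - c)
        if p < alpha:
            hits.append((w, p))
    return sorted(hits, key=lambda hit: hit[1])


group1 = ["ACGTAC", "ACGGTA", "TACGAA", "GACGTT", "CACGCA"]
group2 = ["ATTTAA", "GGTTAA", "TTAGGA", "AATTGG", "TGGATA"]
print(word_presence("ACGTAC", ["AC", "CG", "TT"]))  # (1, 1, 0)
print([w for w, _ in screen_words(group1, group2, "ACGT", 2)])  # ['AC', 'CG']
```

In this toy data, "AC" and "CG" occur in all of the first group and none of the second, giving p = 2/252 (about 0.008) each. Testing many words at p < 0.05 produces false positives by chance, so a multiple-testing correction would likely be needed in practice.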