Suffix Trees ALGGEN: Algorithmics and genetics group Dep. Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya Dr. Xavier Messeguer http://www.lsi.upc.es/~alggen

Suffix Trees

ALGGEN: Algorithmics and genetics group

Dep. Llenguatges i Sistemes Informàtics

Universitat Politècnica de Catalunya

Dr. Xavier Messeguer

http://www.lsi.upc.es/~alggen

Suffix trees

Given string ababaas:

Suffixes:

1: ababaas
2: babaas
3: abaas
4: baas
5: aas
6: as
7: s

Suffix tree (a leaf written as label,i ends the suffix that starts at position i):

root
  a
    ba
      baas,1
      as,3
    as,5
    s,6
  ba
    baas,2
    as,4
  s,7
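The numbered suffix list can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names `suffixes` and `root_branches` are made up for this example and do not come from the slides. Grouping the suffixes by their first character shows which of them end up under the same branch of the root.

```python
from collections import defaultdict


def suffixes(text):
    """Return (start, suffix) pairs, with 1-based start positions as on the slide."""
    return [(i + 1, text[i:]) for i in range(len(text))]


def root_branches(text):
    """Group suffix start positions by first character (one group per root branch)."""
    groups = defaultdict(list)
    for start, suffix in suffixes(text):
        groups[suffix[0]].append(start)
    return dict(groups)


for start, suffix in suffixes("ababaas"):
    print(f"{start}: {suffix}")
print(root_branches("ababaas"))
```

For ababaas the groups are a: 1, 3, 5, 6; b: 2, 4; s: 7, which matches the three branches a, ba and s,7 leaving the root.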

What kind of queries?

Queries on Suffix trees

[Figure: the suffix tree of ababaas]

• Does the sequence ababaas contain any occurrence of the patterns abab, aab, and ab?

• Find repeats within the sequence ababaas.

…………………………

…………………………
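Both queries can be sketched in Python. For brevity this example uses an uncompressed suffix trie rather than the compact tree drawn on the slides, and every name in it (`build_suffix_trie`, `occurrences`, `repeats`) is hypothetical. Each trie node remembers the start positions of the suffixes that pass through it, so a pattern occurs wherever the node reached by spelling the pattern says it does, and a repeat is any node reached by at least two suffixes.

```python
def build_suffix_trie(text):
    """Uncompressed suffix trie; each node stores the 1-based starts of the
    suffixes that pass through it. Quadratic in space, fine for short strings."""
    root = {"children": {}, "starts": []}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
            node["starts"].append(start + 1)
    return root


def occurrences(trie, pattern):
    """Return the 1-based positions where pattern occurs (empty list if none)."""
    node = trie
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []
    return node["starts"]


def repeats(trie, prefix=""):
    """Return (substring, positions) for every substring occurring at least twice."""
    found = []
    for ch, child in sorted(trie["children"].items()):
        if len(child["starts"]) >= 2:
            found.append((prefix + ch, child["starts"]))
            found.extend(repeats(child, prefix + ch))
    return found


trie = build_suffix_trie("ababaas")
for pattern in ("abab", "aab", "ab"):
    print(pattern, occurrences(trie, pattern))
print(repeats(trie))
```

Under these assumptions abab occurs once (position 1), aab does not occur, and ab occurs at positions 1 and 3; the repeats are a, ab, aba, b and ba.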

Quadratic Insertion algorithm

Given the string ababaabbs, insert its suffixes one at a time, starting each insertion at the root:

Suffix 1 (ababaabbs): new leaf ababaabbs,1 under the root.
Suffix 2 (babaabbs): new leaf babaabbs,2 under the root.
Suffix 3 (abaabbs): shares aba with ababaabbs,1; the edge is split into aba with leaves baabbs,1 and abbs,3.
Suffix 4 (baabbs): shares ba with babaabbs,2; the edge is split into ba with leaves baabbs,2 and abbs,4.
Suffix 5 (aabbs): shares a with the edge aba; the edge is split into a followed by ba, and leaf abbs,5 is added under a.
Suffix 6 (abbs): follows a, then shares b with the edge ba; the edge is split into b followed by a, and leaf bs,6 is added under b.
Suffix 7 (bbs): shares b with the root edge ba; the edge is split into b followed by a, and leaf bs,7 is added under b.
Suffix 8 (bs): follows b; leaf s,8 is added under b.
Suffix 9 (s): new leaf s,9 under the root.

Resulting tree:

root
  a
    b
      a
        baabbs,1
        abbs,3
      bs,6
    abbs,5
  b
    a
      baabbs,2
      abbs,4
    bs,7
    s,8
  s,9
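The insertion steps can be written as a short Python sketch. It assumes, as in the slides' examples, that the string ends with a character that appears nowhere else (the final s). The class and function names (`Node`, `insert_suffix`, `build_suffix_tree`, `render`) are invented for this illustration. Each insertion walks down from the root and may compare as many characters as the suffix is long, so n insertions cost O(n^2) character comparisons in the worst case, which is where the name "quadratic" comes from.

```python
class Node:
    def __init__(self, start=None):
        self.children = {}  # edge label (str) -> child Node
        self.start = start  # 1-based suffix start for a leaf, None for internal nodes


def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def insert_suffix(root, suffix, start):
    """Walk down from the root, splitting an edge where the suffix diverges."""
    node = root
    while True:
        for label in list(node.children):
            k = common_prefix_len(label, suffix)
            if k == 0:
                continue
            if k == len(label):
                # Whole edge matched: descend and keep matching the rest.
                node, suffix = node.children[label], suffix[k:]
            else:
                # Partial match: split the edge at position k.
                child = node.children.pop(label)
                mid = Node()
                mid.children[label[k:]] = child
                mid.children[suffix[k:]] = Node(start)
                node.children[label[:k]] = mid
                return
            break
        else:
            # No edge starts with the next character: add a new leaf.
            node.children[suffix] = Node(start)
            return


def build_suffix_tree(text):
    root = Node()
    for i in range(len(text)):
        insert_suffix(root, text[i:], i + 1)
    return root


def render(node, depth=0):
    """Indented listing of the tree; leaves are shown as label,start."""
    lines = []
    for label, child in sorted(node.children.items()):
        tag = label if child.start is None else f"{label},{child.start}"
        lines.append("  " * depth + tag)
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(build_suffix_tree("ababaabbs"))))
```

The listing sorts sibling edges alphabetically, so the order of branches may differ from the drawing, but the nodes and leaves are the same.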

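Definition of MUM

MUM: Maximal Unique Matching

… a a t g….c t g...

… c g t g….c c c ...

A MUM is a shared segment (here t g….c) that occurs exactly once in each sequence (unique) and cannot be extended to the left or to the right without the two sequences differing (maximal).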

Search for MUMs

Given strings ababaabs and aabaat:

[Figure: generalized suffix tree of ababaabs and aabaat; leaves ending in s come from the first string, leaves ending in t from the second]

1st: Bottom-up traversal (through the tree)

List of UM: aab, abaa, baa.

2nd: Search for maximals (through the list of UM)

MUMs: aab, abaa.
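The two phases can be checked with a brute-force Python sketch. It does not use the suffix tree from the slides: it simply tests every substring, which is far slower but easy to verify by hand. All function names are hypothetical. The first phase collects unique matches that cannot be extended to the right (the role played by the bottom-up traversal), and the second keeps those that cannot be extended to the left either.

```python
def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    return sum(text.startswith(pattern, i)
               for i in range(len(text) - len(pattern) + 1))


def unique_matches(s1, s2):
    """Map each substring occurring exactly once in both strings
    to its 0-based positions (p1, p2). Brute force, not the tree method."""
    found = {}
    for i in range(len(s2)):
        for j in range(i + 1, len(s2) + 1):
            w = s2[i:j]
            if count_occurrences(s1, w) == 1 and count_occurrences(s2, w) == 1:
                found[w] = (s1.find(w), i)
    return found


def right_maximal(s1, s2, w, p1, p2):
    e1, e2 = p1 + len(w), p2 + len(w)
    return e1 == len(s1) or e2 == len(s2) or s1[e1] != s2[e2]


def left_maximal(s1, s2, p1, p2):
    return p1 == 0 or p2 == 0 or s1[p1 - 1] != s2[p2 - 1]


def find_um_and_mums(s1, s2):
    matches = unique_matches(s1, s2)
    # 1st phase: unique matches that cannot be extended to the right.
    um = sorted(w for w, (p1, p2) in matches.items()
                if right_maximal(s1, s2, w, p1, p2))
    # 2nd phase: keep those that cannot be extended to the left either.
    mums = [w for w in um if left_maximal(s1, s2, *matches[w])]
    return um, mums


um, mums = find_um_and_mums("ababaabs", "aabaat")
print("UM:", um)
print("MUMs:", mums)
```

For ababaabs and aabaat this gives UM aab, abaa, baa and MUMs aab, abaa: baa is dropped because both of its occurrences are preceded by a, so it extends to abaa.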