29
E.G.M. Petrakis Tries 1 Tries Trees of order >= 2 Variable length keys The decision on what path to follow is taken based on potion of the key Static environment, fast retrieval but large space overhead Applications: dictionaries text searching (patricia tries) compression (Ziv-Lembel encoding)

E.G.M. PetrakisTries1 Trees of order >= 2 Variable length keys The decision on what path to follow is taken based on potion of the key Static environment,

  • View
    214

  • Download
    0

Embed Size (px)

Citation preview

E.G.M. Petrakis Tries 1

Tries

Trees of order >= 2 Variable length keys The decision on what path to follow is

taken based on potion of the key Static environment, fast retrieval but

large space overhead Applications:

dictionaries text searching (patricia tries) compression (Ziv-Lembel encoding)

E.G.M. Petrakis Tries 2

Example MACCBOY

MACAWMACARONMACARONIMACARONICMACAQUEMACADAMIAMACADAMMACACACOMACABRE

.

.

.

E.G.M. Petrakis Tries 3

1. Words in TreeM

A

C

A

D Q R WCB

C

A

B

O

Y

@

@R A A

N O

OU

CE EM

@

@

@

@

@

@

@

@O I

C

NI

A

@: word terminating symbolas many words as terminating symbols

E.G.M. Petrakis Tries 4

Word Searching At most m+1 comparisons to find the word

x@=a1a2….am+1, with root: i=1 and t(ai): child of node ai.

i=1; u = ai;

while (u != @ && i <= m) { if (t(ai): undefined) X is not in the trie; else { u = t(ai); i = i+1; } if (u == @ && word(t) == X) X is in the trie; else X is not in the trie;

}

E.G.M. Petrakis Tries 5

2. Words at the LeafsM

A

C

A

D Q R WCB

C

MACCBOY

MACAWMACACOA

N O

OMACAQUE

MMACABRE

MACARONI

MACARONIC

MACADAMI

C

MACAROONI

MACADAMI

.

.

...

..

.

.

E.G.M. Petrakis Tries 6

Example Implementation

26 symbols/node => large space overhead

E.G.M. Petrakis Tries 7

Insertions

“bluejay” inserted

E.G.M. Petrakis Tries 8

3. Compact TrieMAC

A

Q RO

MACAW

C

MACABOY

...

MACAQUE

N

DAM

I

MACADAMI

MACADAM

MACARON I

MACARONI

C

MACARONIC

MACAROO

compact nodes

wordsat the leafs

E.G.M. Petrakis Tries 9

Patricia Trie

Practical Algorithm To Retrieve Information Coded In Alphanumeric especially good for text searching replace sub-words by a reference position in

text together with its length e.g., “jay” in “the blue jay” is (10,3)

internal nodes keep only the length and the first letter of the phrase

external nodes keep the position of the phrase in the text

E.G.M. Petrakis Tries 10

Suffix Trie Construct a trie with all

suffixes “the blue-jay@” => 12

suffixes Merge branches with

common prefix Leafs point to suffices Search: “e-jay”

go to position” 3 or 7 increase by query

length: 5 check suffices at

positions 3+5 and 7+5 the second one matches

1. theblue-jay@2. heblue-jay@3. eblue-jay@4. blue-jay@5. lue-jay@6. ue-jay@7. e-jay@8. -jay@9. jay@10. ay@11. y@12.@

E.G.M. Petrakis Tries 11

“the blue-jay”

t h e b l u yaj- @

b -

91 64 52 11 1210

73

8

Suffix Trie

Search: “e-jay@”

E.G.M. Petrakis Tries 12

Search Patricia

Scan query from left to right go to first matching symbol in trie increase by the increment if not in the trie => search failed if all symbols match the query =>

found !!!

E.G.M. Petrakis Tries 13

Patricia Pros & Cons

Good for secondary storage applications the trie is not high

The space overhead is still a problem

Mainly useful for static environments

E.G.M. Petrakis Tries 14

Ziv-Lempel Encoding

Mainly compression method for any kind of data based on tries and symbol alphabets

Basic idea: compute code for repeating symbols or for sets of symbols

Globally and locally optimal Compared with Huffman

no a-priori knowledge of symbol probabilities Huffman is globally optimum but, not always Locally optimal (for parts of the input)

E.G.M. Petrakis Tries 15

Encoding

Construct trie: nodes: code of a symbol or of sequences of

symbols arc: alphabet symbol initially a trie with all symbols each symbol is assigned a unique code

Scan input from left to right: read next symbol follow appropriate branch in trie if symbol not in trie => insert symbol in trie,

assign it a new code and output its code

E.G.M. Petrakis Tries 16

initial trie

root

codes

a c zb

1 2 3 26

........

E.G.M. Petrakis Tries 17

alphabet {a, b, c}initial trie:

a b a b c b a b a b a a1 2 3 4 5 6 7 8 9 10 11 12

α

b

c

1 2 3

Input:

E.G.M. Petrakis Tries 18

α

output 1 output 2

b

α

b

c

1 2 3

α α

b

c

1 2 3

b

α

b

c

1

2 3b

4

α

b

c

1 2

3b

4

α

5

E.G.M. Petrakis Tries 19

c

b

output 4

output 3

a

b

α

b

c

1 2

3b

4

α

5

α

b

c

1 2

3b

4

α

5c

6

b

α

5

b c

2 3

α

5

b

7

E.G.M. Petrakis Tries 20

output 5

a

b

a

output 8

a

output 1

2

8

b

b

α

5

2

8

b

b

α

5

2

8

b

b

α

5

c

2

9

b

b

α

5

α

α

81 c

1

b

4

α

6

E.G.M. Petrakis Tries 21

Decoding

Follows the same sequence of steps in reverse order start from the same initial trie scan the code from left to right read next code symbol go to the node having the same code if the code is not in the trie guess the

appropriate position in the trie

E.G.M. Petrakis Tries 22

ab

c

1 2 3

Alphabet: {a, b, c}Code: 1 2 4 3 5 8 1

E.G.M. Petrakis Tries 23

1

2

4

output “α”

output “b”

α

b

c

1 2 3

α

b

c

1 2 3

1

4

b

3

cb

α

2

read(1): must be coming from “a”

In order to go from “1” to “2”, we must have visited “b”

E.G.M. Petrakis Tries 24

output “αb”

3

5

1

4

b

3

c

b

α

2

α

In order to go from “2” to “4” we must go through “ab” (passing from 1)“a” was a child of “2”

E.G.M. Petrakis Tries 25

output “c”

5

6

5

1

4

b

3

cb

α

2

α

c

In order to go from “4” to “3” we must go through “c” and“c” was child of “4”

E.G.M. Petrakis Tries 26

6

5

1

4

b

3

cb

α

2

α

c

b

7

8

In order to go from “3” to “5” we must go through “ba” and “b” was child of “3”The next symbol is “8”, but … there is no “8” in the trie!!

E.G.M. Petrakis Tries 27

output “bαb”

1

6

5

1

4

b

3

cb

α

2

α

c

b

7

b

8

This may happen only if after “5” we passed through the same path since “8” cannot be under “6” or “7” (there is no “6” or “7” in the code)

E.G.M. Petrakis Tries 28

output “α”

9

α

6

5

1

4

b

3

cb

α

2

α

c

b

7

b

8

E.G.M. Petrakis Tries 29

n

output “dxd”

Special case: “x” can be null

d

x

.....

..... .....d

x

.....

..... .....

nd

Generalize