PhD Defensetildeweb.au.dk/au121/advising/slides/jesper-sindahl-nielsen.pdf · PhD Defense Jesper...

Preview:

Citation preview

AARHUSUNIVERSITY

PhD Defense

Jesper Sindahl Nielsen

Implicit Data Structures,Sorting, and Text Indexing

Overview

1/24

Jesper Sindahl Nielsen

Overview

1/24

Jesper Sindahl Nielsen

1. Algorithms and Data Structures(a) Abstract description(b) Example

2. Implicit Data Structures(a) Finger Search(b) Priority Queues

3. Integer Sorting4. Text Indexing5. State and University Library

Overview

1/24

Jesper Sindahl Nielsen

1. Algorithms and Data Structures(a) Abstract description(b) Example

2. Implicit Data Structures(a) Finger Search(b) Priority Queues

3. Integer Sorting4. Text Indexing5. State and University Library

Overview

1/24

Jesper Sindahl Nielsen

1. Algorithms and Data Structures(a) Abstract description(b) Example

2. Implicit Data Structures(a) Finger Search(b) Priority Queues

3. Integer Sorting4. Text Indexing5. State and University Library

Overview

1/24

Jesper Sindahl Nielsen

1. Algorithms and Data Structures(a) Abstract description(b) Example

2. Implicit Data Structures(a) Finger Search(b) Priority Queues

3. Integer Sorting4. Text Indexing5. State and University Library

Overview

1/24

Jesper Sindahl Nielsen

1. Algorithms and Data Structures(a) Abstract description(b) Example

2. Implicit Data Structures(a) Finger Search(b) Priority Queues

3. Integer Sorting4. Text Indexing5. State and University Library

Overview

1/24

Jesper Sindahl Nielsen

1. Algorithms and Data Structures(a) Abstract description(b) Example

2. Implicit Data Structures(a) Finger Search(b) Priority Queues

3. Integer Sorting4. Text Indexing5. State and University Library

Algorithms

2/24

Jesper Sindahl Nielsen

Algorithms

2/24

Jesper Sindahl Nielsen

Algorithm

Algorithms

2/24

Jesper Sindahl Nielsen

Algorithminput

Algorithms

2/24

Jesper Sindahl Nielsen

Algorithminput output

Algorithms

2/24

Jesper Sindahl Nielsen

Algorithminput output

x + yAddition

Algorithms

2/24

Jesper Sindahl Nielsen

Algorithminput output

x=5y=7 x + y

Addition

Algorithms

2/24

Jesper Sindahl Nielsen

Algorithminput output

12x=5y=7 x + y

Addition

Algorithms

2/24

Jesper Sindahl Nielsen

Algorithminput output

12x=5y=7 x + y

Addition

Sort L

Algorithms

2/24

Jesper Sindahl Nielsen

Algorithminput output

12x=5y=7 x + y

Addition

L=5,1,3,2,4,6 Sort L

Algorithms

2/24

Jesper Sindahl Nielsen

Algorithminput output

12x=5y=7 x + y

Addition

L=1,2,3,4,5,6L=5,1,3,2,4,6 Sort L

We want to design fast algorithms

Data Structures

3/24

Jesper Sindahl Nielsen

Data Structures

3/24

Jesper Sindahl Nielsen

Data

Data Structures

3/24

Jesper Sindahl Nielsen

DataData

Structure

Data Structures

3/24

Jesper Sindahl Nielsen

DataData

Structure

Data organized insome clever manner

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

DataStructure

input

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

DataStructure

Queryinput

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

DataStructure

Queryinput

answer to query on data

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

DataStructure

Queryinput

answer to query on data

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123

Arnold

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

DataStructure

Queryinput

answer to query on data

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123

find phone number

Arnold

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

DataStructure

Queryinput

answer to query on data

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123

find phone number

Arnold 123456

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

DataStructure

Queryinput

answer to query on data

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123

find phone number

Mom 375874

Arnold 123456

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

DataStructure

Queryinput

answer to query on data

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123

find phone number

Mom 375874

Arnold 123456

efficient when data is organized

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123Data

StructureUpdate

inputUpd. DataStructure

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123Data

StructureUpdate

inputUpd. DataStructure

Dad,357951

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123Data

StructureUpdate

inputUpd. DataStructure

Dad,357951

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123Data

StructureUpdate

inputUpd. DataStructure

Dad,357951

Advisor,359617Arnold,123456Brother,681958Dad,357951Jean,456321Mom,375874Sylvester,987654Wesley,789123

Change phone number

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123Data

StructureUpdate

inputUpd. DataStructure

Dad,357951

Advisor,359617Arnold,123456Brother,681958Dad,357951Jean,456321Mom,375874Sylvester,987654Wesley,789123

Change phone number

Data Structures

3/24

Jesper Sindahl Nielsen

Data BuildData

Structure

Data organized insome clever manner

Mom,375874Dad,278393Brother,681958Advisor,359617Arnold,123456Sylvester,987654Jean-Claude,456321Wesley,789123

create sorted phonebook

Advisor,359617Arnold,123456Brother,681958Dad,278393Jean,456321Mom,375874Sylvester,987654Wesley,789123Data

StructureUpdate

inputUpd. DataStructure

Dad,357951

Advisor,359617Arnold,123456Brother,681958Dad,357951Jean,456321Mom,375874Sylvester,987654Wesley,789123

Change phone number

We want space efficient structures,fast queries, and fast updates

Outline

4/24

Jesper Sindahl Nielsen

1. Algorithms and Data Structures(a) Abstract description(b) Example

2. Implicit Data Structures(a) Finger Search(b) Priority Queues

3. Integer Sorting4. Text Indexing5. State and University Library

Outline

4/24

Jesper Sindahl Nielsen

1. Algorithms and Data Structures(a) Abstract description(b) Example

2. Implicit Data Structures(a) Finger Search(b) Priority Queues

3. Integer Sorting4. Text Indexing5. State and University Library

Implicit Data Structures

5/24

Jesper Sindahl Nielsen

Implicit Data Structures

5/24

Jesper Sindahl Nielsen

Primary goal: space efficiency

Implicit Data Structures

5/24

Jesper Sindahl Nielsen

Primary goal: space efficiency

Data Structure has n comparable elements, laid out in an array of size n.

Implicit Data Structures

5/24

Jesper Sindahl Nielsen

Primary goal: space efficiency

0 1 2

Data Structure has n comparable elements, laid out in an array of size n.

n− 1

Implicit Data Structures

5/24

Jesper Sindahl Nielsen

Primary goal: space efficiency

0 1 2

Data Structure has n comparable elements, laid out in an array of size n.

n− 1

Only the array and n is maintained between operations.

Implicit Data Structures

5/24

Jesper Sindahl Nielsen

Primary goal: space efficiency

0 1 2

Data Structure has n comparable elements, laid out in an array of size n.

n− 1

Only the array and n is maintained between operations.

Strict implicit model: Nothing but elements and n stored

Implicit Data Structures

5/24

Jesper Sindahl Nielsen

Primary goal: space efficiency

0 1 2

Data Structure has n comparable elements, laid out in an array of size n.

n− 1

Only the array and n is maintained between operations.

Strict implicit model: Nothing but elements and n stored

Weak implicit model: Allow a constant number of extra memory cells

Implicit Data Structures

5/24

Jesper Sindahl Nielsen

Primary goal: space efficiency

0 1 2

Data Structure has n comparable elements, laid out in an array of size n.

n− 1

Only the array and n is maintained between operations.

Strict implicit model: Nothing but elements and n stored

Weak implicit model: Allow a constant number of extra memory cells

We work with the strict implicit model

Implicit Data Structures

5/24

Jesper Sindahl Nielsen

Primary goal: space efficiency

0 1 2

Data Structure has n comparable elements, laid out in an array of size n.

n− 1

Only the array and n is maintained between operations.

Strict implicit model: Nothing but elements and n stored

Weak implicit model: Allow a constant number of extra memory cells

We work with the strict implicit model

Fundamental trick: pair encoding bits.

Implicit Data Structures

5/24

Jesper Sindahl Nielsen

Primary goal: space efficiency

0 1 2

Data Structure has n comparable elements, laid out in an array of size n.

n− 1

Only the array and n is maintained between operations.

Strict implicit model: Nothing but elements and n stored

Weak implicit model: Allow a constant number of extra memory cells

We work with the strict implicit model

Fundamental trick: pair encoding bits. x yx<y ⇒1

x>y ⇒0

Implicit Data Structures

5/24

Jesper Sindahl Nielsen

Primary goal: space efficiency

0 1 2

Data Structure has n comparable elements, laid out in an array of size n.

n− 1

Only the array and n is maintained between operations.

Strict implicit model: Nothing but elements and n stored

Weak implicit model: Allow a constant number of extra memory cells

We work with the strict implicit model

Fundamental trick: pair encoding bits. x yx<y ⇒1

x>y ⇒0Requires distinct elements.

Implicit Data Structures

5/24

Jesper Sindahl Nielsen

Primary goal: space efficiency

0 1 2

Data Structure has n comparable elements, laid out in an array of size n.

n− 1

Only the array and n is maintained between operations.

Strict implicit model: Nothing but elements and n stored

Weak implicit model: Allow a constant number of extra memory cells

We work with the strict implicit model

Fundamental trick: pair encoding bits. x yx<y ⇒1

x>y ⇒0Requires distinct elements.

Procedures can use a constant number of working memory/registers

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Maintain a (dynamic) set of n keys, while supporting:

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Insert(k)

Maintain a (dynamic) set of n keys, while supporting:

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Insert(k)

Delete(k)

Maintain a (dynamic) set of n keys, while supporting:

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Insert(k)

Delete(k)

Find(k)

Maintain a (dynamic) set of n keys, while supporting:

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Insert(k)

Delete(k)

Find(k)

Predecessor(k)

Maintain a (dynamic) set of n keys, while supporting:

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Insert(k)

Delete(k)

Find(k)

Predecessor(k)

Successor(k)

Maintain a (dynamic) set of n keys, while supporting:

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Insert(k)

Delete(k)

Find(k)

Predecessor(k)

Time: O(log n)

Successor(k)

Maintain a (dynamic) set of n keys, while supporting:

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Insert(k)

Delete(k)

Find(k)

Predecessor(k)

Only in dynamic structures

Time: O(log n)

Successor(k)

Maintain a (dynamic) set of n keys, while supporting:

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Insert(k)

Delete(k)

Find(k)

Predecessor(k)

Only in dynamic structures

Time: O(log n)

Successor(k)

Maintain a (dynamic) set of n keys, while supporting:

Special element: the finger element

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Insert(k)

Delete(k)

Find(k)

Predecessor(k)

Only in dynamic structures

Time: O(log n)

Successor(k)

Maintain a (dynamic) set of n keys, while supporting:

Special element: the finger element

a c d g j ` m o q s z0 1 2 3 4 5 6 7 8 9 10

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Insert(k)

Delete(k)

Find(k)

Predecessor(k)

Only in dynamic structures

Time: O(log n)

Successor(k)

Maintain a (dynamic) set of n keys, while supporting:

Special element: the finger element

a c d g j ` m o q s z0 1 2 3 4 5 6 7 8 9 10

n=11

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Insert(k)

Delete(k)

Find(k)

Predecessor(k)

Only in dynamic structures

Time: O(log n)

Successor(k)

Maintain a (dynamic) set of n keys, while supporting:

Special element: the finger element

a c d g j ` m o q s z0 1 2 3 4 5 6 7 8 9 10

finger

n=11

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Insert(k)

Delete(k)

Find(k)

Predecessor(k)

Only in dynamic structures

Time: O(log n)

Successor(k)

Maintain a (dynamic) set of n keys, while supporting:

Special element: the finger element

a c d g j ` m o q s z0 1 2 3 4 5 6 7 8 9 10

finger

n=11

Find(m)

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Insert(k)

Delete(k)

Find(k)

Predecessor(k)

Only in dynamic structures

Time: O(log n)

Successor(k)

Maintain a (dynamic) set of n keys, while supporting:

Special element: the finger element

a c d g j ` m o q s z0 1 2 3 4 5 6 7 8 9 10

finger

n=11

Find(m)

Distance: t = 3

Dictionary and Fingersearch

6/24

Jesper Sindahl Nielsen

Insert(k)

Delete(k)

Find(k)

Predecessor(k)

Only in dynamic structures

Time: O(log n)

Successor(k)

Maintain a (dynamic) set of n keys, while supporting:

Special element: the finger element

a c d g j ` m o q s z0 1 2 3 4 5 6 7 8 9 10

finger

n=11

Find(m)

Distance: t = 3

Time: O(log t) forFind, Predecessor, and Successor

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

· · ·

220

221

222

22`

· · · · · · · · ·

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

· · ·

220

221

222

D0

22`

· · · · · · · · ·

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

· · ·

220

221

222

D0 D1

22`

· · · · · · · · ·

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

· · ·

· · ·

220

221

222

D0 D1 D2 D`

22`

· · · · · · · · ·

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

· · ·

· · ·

220

221

222

D0 D1 D2 D`

22`

· · ·

Elements in sorted order

· · · · · ·

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

· · ·

· · ·

220

221

222

D0 D1 D2 D`

f

22`

· · ·

Elements in sorted order

· · · · · ·

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

· · ·

· · ·

220

221

222

D0 D1 D2 D`

f

22`

· · ·

Elements in sorted order

· · · · · ·

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

· · ·

· · ·

220

221

222

D0 D1 D2 D`

f

22`

· · ·

Elements in sorted order

· · · · · ·

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

· · ·

· · ·

220

221

222

D0 D1 D2 D`

f

22`

· · ·

Elements in sorted order

· · · · · ·

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

· · ·

· · ·

220

221

222

D0 D1 D2 D`

f

22`

· · ·

Elements in sorted order

· · · · · ·

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

· · ·

· · ·

220

221

222

D0 D1 D2 D`

f

22`

· · ·

Elements in sorted order

· · · · · ·

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

· · ·

· · ·

220

221

222

D0 D1 D2 D`

f

22`

· · ·

Elements in sorted order

· · · · · ·

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

· · ·

· · ·

220

221

222

D0 D1 D2 D`

f

22`

· · ·

Elements in sorted order

· · ·

· · · · · ·

· · ·

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

· · ·

· · ·

220

221

222

D0 D1 D2 D`

f

22`

· · ·

Elements in sorted order

· · ·

· · · · · ·

· · ·

Operation Time

Insert

Delete

Find O(log t)

Change-Finger

O(logn)

O(logn)

O(nε)

Implicit Finger Search Idea

7/24

Jesper Sindahl Nielsen

· · ·

· · ·

220

221

222

D0 D1 D2 D`

f

22`

· · ·

Elements in sorted order

· · ·

· · · · · ·

· · ·

Operation Time

Insert

Delete

Find O(log t)

Change-Finger

O(logn)

O(logn)

O(nε)

optimal trade-off

Priority Queues

8/24

Jesper Sindahl Nielsen

Priority Queues

8/24

Jesper Sindahl Nielsen

Maintain a set of n keys and support:

• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set

Priority Queues

8/24

Jesper Sindahl Nielsen

Maintain a set of n keys and support:

• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set

Priority Queues

8/24

Jesper Sindahl Nielsen

Maintain a set of n keys and support:

• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set

Priority Queues

8/24

Jesper Sindahl Nielsen

Maintain a set of n keys and support:

• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set

Priority Queues

8/24

Jesper Sindahl Nielsen

Maintain a set of n keys and support:

• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set

Insert ExtractMin Moves StrictIdenticalElements

Amortized bounds are marked with ?

Priority Queues

8/24

Jesper Sindahl Nielsen

Maintain a set of n keys and support:

• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set

Williams

Insert ExtractMin Moves StrictIdenticalElements

Amortized bounds are marked with ?

log nlog n log n yes yes

Priority Queues

8/24

Jesper Sindahl Nielsen

Maintain a set of n keys and support:

• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set

Williams

Carlsson et al.

Insert ExtractMin Moves StrictIdenticalElements

Amortized bounds are marked with ?

log nlog n log n yes yes

log n1 log n no yes

Priority Queues

8/24

Jesper Sindahl Nielsen

Maintain a set of n keys and support:

• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set

Williams

Carlsson et al.

Insert ExtractMin Moves StrictIdenticalElements

Amortized bounds are marked with ?

log nlog n log n yes yes

log n1 log n no yes

log n1 log n no yesEdelkamp et al.

Priority Queues

8/24

Jesper Sindahl Nielsen

Maintain a set of n keys and support:

• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set

Williams

Carlsson et al.

Harvey & Zatloukal

Insert ExtractMin Moves StrictIdenticalElements

Amortized bounds are marked with ?

log nlog n log n yes yes

log n1 log n no yes

log n1 log n no yesEdelkamp et al.

? log n? 1 ? log n yes yes

Priority Queues

8/24

Jesper Sindahl Nielsen

Maintain a set of n keys and support:

• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set

Williams

Franceschini & Munro

Carlsson et al.

Harvey & Zatloukal

Insert ExtractMin Moves StrictIdenticalElements

Amortized bounds are marked with ?

log nlog n log n yes yes

log n1 log n no yes

log n1 log n no yesEdelkamp et al.

? log n? 1 ? log n yes yes

? log n? log n ? 1 yes no

Priority Queues

8/24

Jesper Sindahl Nielsen

Maintain a set of n keys and support:

• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set

Williams

Franceschini & Munro

Carlsson et al.

Harvey & Zatloukal

Insert ExtractMin Moves StrictIdenticalElements

Amortized bounds are marked with ?

log nlog n log n yes yes

log n1 log n no yes

log n1 log n no yesEdelkamp et al.

? log n? 1 ? log n yes yes

? log n? log n ? 1 yes no

? log n? 1 ? 1 yes yesNew

Priority Queues

8/24

Jesper Sindahl Nielsen

Maintain a set of n keys and support:

• FindMin: Return the minimum element• ExtractMin: Remove the minimum element from the set• Insert(k): Add the key k to the set

Williams

Franceschini & Munro

Carlsson et al.

Harvey & Zatloukal

Insert ExtractMin Moves StrictIdenticalElements

Amortized bounds are marked with ?

log nlog n log n yes yes

log n1 log n no yes

log n1 log n no yesEdelkamp et al.

? log n? 1 ? log n yes yes

? log n? log n ? 1 yes no

? log n? 1 ? 1 yes yes

log n1 log n yes no

New

New

Outline

9/24

Jesper Sindahl Nielsen

Outline

9/24

Jesper Sindahl Nielsen

1. Algorithms and Data Structures(a) Abstract description(b) Example

2. Implicit Data Structures(a) Finger Search(b) Priority Queues

3. Integer Sorting4. Text Indexing5. State and University Library

Outline

9/24

Jesper Sindahl Nielsen

1. Algorithms and Data Structures(a) Abstract description(b) Example

2. Implicit Data Structures(a) Finger Search(b) Priority Queues

3. Integer Sorting4. Text Indexing5. State and University Library

Integer Sorting

10/24

Jesper Sindahl Nielsen

Comparison based sorting: only allowed operation ≤

Integer Sorting

10/24

Jesper Sindahl Nielsen

Comparison based sorting: only allowed operation ≤

Integer Sorting

10/24

Jesper Sindahl Nielsen

Comparison based sorting: only allowed operation ≤

A B

Integer Sorting

10/24

Jesper Sindahl Nielsen

Comparison based sorting: only allowed operation ≤

A B

Integer Sorting

10/24

Jesper Sindahl Nielsen

Comparison based sorting: only allowed operation ≤

A B

conclude A > B

Integer Sorting

10/24

Jesper Sindahl Nielsen

Comparison based sorting: only allowed operation ≤

A B

conclude A > B

To sort n gold pieces we needΘ(n log n) scalings/comparisons

Integer Sorting

10/24

Jesper Sindahl Nielsen

Comparison based sorting: only allowed operation ≤

A B

conclude A > B

To sort n gold pieces we needΘ(n log n) scalings/comparisons

A B

287315

Integer Sorting

10/24

Jesper Sindahl Nielsen

Comparison based sorting: only allowed operation ≤

A B

conclude A > B

To sort n gold pieces we needΘ(n log n) scalings/comparisons

A B

287315

Question: Can we sort faster ifwe have access to a digital scale?

Integer Sorting

10/24

Jesper Sindahl Nielsen

Comparison based sorting: only allowed operation ≤

A B

conclude A > B

To sort n gold pieces we needΘ(n log n) scalings/comparisons

A B

287315

Question: Can we sort faster ifwe have access to a digital scale?

Answer: Yes.

Integer Sorting

10/24

Jesper Sindahl Nielsen

Comparison based sorting: only allowed operation ≤

A B

conclude A > B

To sort n gold pieces we needΘ(n log n) scalings/comparisons

A B

287315

Question: Can we sort faster ifwe have access to a digital scale?

Answer: Yes.

Question: How much faster?

Integer Sorting

10/24

Jesper Sindahl Nielsen

Comparison based sorting: only allowed operation ≤

A B

conclude A > B

To sort n gold pieces we needΘ(n log n) scalings/comparisons

A B

287315

Question: Can we sort faster ifwe have access to a digital scale?

Answer: Yes.

Depends on accuracy of the scale

Question: How much faster?

Integer Sorting

11/24

Jesper Sindahl Nielsen

Integer Sorting

11/24

Jesper Sindahl Nielsen

Time Space NotesAlgorithm

w is the number of bits per word.

Integer Sorting

11/24

Jesper Sindahl Nielsen

Time Space NotesAlgorithm

Radix sort (1887?) O(n · wlogn

) O(n) optimal for w = O(log n)

w is the number of bits per word.

Integer Sorting

11/24

Jesper Sindahl Nielsen

Time Space NotesAlgorithm

Radix sort (1887?) O(n · wlogn

) O(n) optimal for w = O(log n)

vEB sort (1975) O(n log w) O(2w) Only O(n) space with Y-fast trie

w is the number of bits per word.

Integer Sorting

11/24

Jesper Sindahl Nielsen

Time Space NotesAlgorithm

Radix sort (1887?) O(n · wlogn

) O(n) optimal for w = O(log n)

vEB sort (1975) O(n log w) O(2w) Only O(n) space with Y-fast trie

Kirkpatrick &Reisch (1984)

O(n log wlogn

) Only asymptotically better thanvEB when w = O(log n)

w is the number of bits per word.

Integer Sorting

11/24

Jesper Sindahl Nielsen

Time Space NotesAlgorithm

Radix sort (1887?) O(n · wlogn

) O(n) optimal for w = O(log n)

vEB sort (1975) O(n log w) O(2w) Only O(n) space with Y-fast trie

Kirkpatrick &Reisch (1984)

O(n log wlogn

) Only asymptotically better thanvEB when w = O(log n)

Signature sort(1995)

O(n) O(n) Only for w = Ω(log2+ε n)

w is the number of bits per word.

Integer Sorting

11/24

Jesper Sindahl Nielsen

Time Space NotesAlgorithm

Radix sort (1887?) O(n · wlogn

) O(n) optimal for w = O(log n)

vEB sort (1975) O(n log w) O(2w) Only O(n) space with Y-fast trie

Kirkpatrick &Reisch (1984)

O(n log wlogn

) Only asymptotically better thanvEB when w = O(log n)

Signature sort(1995)

O(n) O(n) Only for w = Ω(log2+ε n)

Han & Thorup(2002) O(n

√log(w/ log n)) Randomized with mulitplicationO(n)

w is the number of bits per word.

Integer Sorting

11/24

Jesper Sindahl Nielsen

Time Space NotesAlgorithm

Radix sort (1887?) O(n · wlogn

) O(n) optimal for w = O(log n)

vEB sort (1975) O(n log w) O(2w) Only O(n) space with Y-fast trie

Kirkpatrick &Reisch (1984)

O(n log wlogn

) Only asymptotically better thanvEB when w = O(log n)

Signature sort(1995)

O(n) O(n) Only for w = Ω(log2+ε n)

Han & Thorup(2002) O(n

√log(w/ log n)) Randomized with mulitplication

Thorup (2002) O(n log log n) O(n) Randomized, AC0

O(n)

w is the number of bits per word.

Integer Sorting

11/24

Jesper Sindahl Nielsen

Time Space NotesAlgorithm

Radix sort (1887?) O(n · wlogn

) O(n) optimal for w = O(log n)

vEB sort (1975) O(n log w) O(2w) Only O(n) space with Y-fast trie

Kirkpatrick &Reisch (1984)

O(n log wlogn

) Only asymptotically better thanvEB when w = O(log n)

Signature sort(1995)

O(n) O(n) Only for w = Ω(log2+ε n)

Han & Thorup(2002) O(n

√log(w/ log n)) Randomized with mulitplication

Thorup (2002) O(n log log n) O(n) Randomized, AC0

O(n)

Han (2004) O(n log log n) O(n) Deterministic, with multiplication

w is the number of bits per word.

Integer Sorting

12/24

Jesper Sindahl Nielsen

Time NotesAlgorithm

Radix sort (1887?) O(n · wlogn

) Optimal for w = O(log n)

Signature sort (1995) O(n) Only for w = Ω(log2+ε n)

Han & Thorup (2002) O(n√log(w/ log n)

)Randomized with mulitplication

Time per element

w

log n

O(1)

O(√log log n)

log2 n log log n log2+ε n

NEW1

2

3

1

2

3

Overview

13/24

Jesper Sindahl Nielsen

1. Algorithms and Data Structures(a) Abstract description(b) Example

2. Implicit Data Structures(a) Finger Search(b) Priority Queues

3. Integer Sorting4. Text Indexing5. State and University Library

Overview

13/24

Jesper Sindahl Nielsen

1. Algorithms and Data Structures(a) Abstract description(b) Example

2. Implicit Data Structures(a) Finger Search(b) Priority Queues

3. Integer Sorting4. Text Indexing5. State and University Library

Text Indexing

14/24

Jesper Sindahl Nielsen

Text Indexing

14/24

Jesper Sindahl Nielsen

Most of us daily use services like Google, Bing, etc.

Text Indexing

14/24

Jesper Sindahl Nielsen

Most of us daily use services like Google, Bing, etc.

Text Indexing

14/24

Jesper Sindahl Nielsen

Most of us daily use services like Google, Bing, etc.

We want documents that contain both terms

Text Indexing

14/24

Jesper Sindahl Nielsen

Most of us daily use services like Google, Bing, etc.

We want documents that contain both terms

Text Indexing

14/24

Jesper Sindahl Nielsen

Most of us daily use services like Google, Bing, etc.

We want documents that contain both terms

Find all emails sent to madalgo that are not seminars

Text Indexing - Two Patterns

15/24

Jesper Sindahl Nielsen

Text Indexing - Two Patterns

15/24

Jesper Sindahl Nielsen

Problem definition:

Text Indexing - Two Patterns

15/24

Jesper Sindahl Nielsen

Problem definition:

Given documents D = d1, d2, . . . , dD

Text Indexing - Two Patterns

15/24

Jesper Sindahl Nielsen

Problem definition:

Given documents D = d1, d2, . . . , dD

preprocess and store D such that

Text Indexing - Two Patterns

15/24

Jesper Sindahl Nielsen

Problem definition:

Given documents D = d1, d2, . . . , dD

preprocess and store D such that

when given two text strings P1, P2 (online)

Text Indexing - Two Patterns

15/24

Jesper Sindahl Nielsen

Problem definition:

Given documents D = d1, d2, . . . , dD

preprocess and store D such that

when given two text strings P1, P2 (online)

we can report all di where both P1 and P2 occur

Text Indexing - Two Patterns

15/24

Jesper Sindahl Nielsen

Problem definition:

Given documents D = d1, d2, . . . , dD

preprocess and store D such that

when given two text strings P1, P2 (online)

we can report all di where both P1 and P2 occur(The two-pattern problem)

Text Indexing - Two Patterns

15/24

Jesper Sindahl Nielsen

Problem definition:

Given documents D = d1, d2, . . . , dD

preprocess and store D such that

when given two text strings P1, P2 (online)

we can report all di where both P1 and P2 occur

we can report all di where both P1 occurs but P2

does not occur

(The two-pattern problem)

Text Indexing - Two Patterns

15/24

Jesper Sindahl Nielsen

Problem definition:

Given documents D = d1, d2, . . . , dD

preprocess and store D such that

when given two text strings P1, P2 (online)

we can report all di where both P1 and P2 occur

we can report all di where both P1 occurs but P2

does not occur

(The two-pattern problem)

(The forbidden pattern problem)

Text Indexing - Two Patterns

16/24

Jesper Sindahl Nielsen

Space Query Time AuthorsProblem

Text Indexing - Two Patterns

16/24

Jesper Sindahl Nielsen

Space Query Time AuthorsProblem

Two-Pattern (2P) n2−α logc n nα + t Ferragina et al. ’03

Text Indexing - Two Patterns

16/24

Jesper Sindahl Nielsen

Space Query Time AuthorsProblem

Two-Pattern (2P) n2−α logc n nα + t Ferragina et al. ’03

Two-Pattern n logn√nt log n log2 n + t Cohen and Porat ’10

Text Indexing - Two Patterns

16/24

Jesper Sindahl Nielsen

Space Query Time AuthorsProblem

Two-Pattern (2P) n2−α logc n nα + t Ferragina et al. ’03

Two-Pattern n logn√nt log n log2 n + t

Two-Pattern n√nt log n log n + t

Cohen and Porat ’10

Hon et al. ’10

Text Indexing - Two Patterns

16/24

Jesper Sindahl Nielsen

Space Query Time AuthorsProblem

Two-Pattern (2P) n2−α logc n nα + t Ferragina et al. ’03

Two-Pattern n logn√nt log n log2 n + t

Two-Pattern n√nt log n log n + t

Cohen and Porat ’10

Hon et al. ’10

Forbidden Pattern (FP) n3/2/ logn√n + t Fischer et al. ’12

Text Indexing - Two Patterns

16/24

Jesper Sindahl Nielsen

Space Query Time AuthorsProblem

Two-Pattern (2P) n2−α logc n nα + t Ferragina et al. ’03

Two-Pattern n logn√nt log n log2 n + t

Two-Pattern n√nt log n log n + t

Cohen and Porat ’10

Hon et al. ’10

Forbidden Pattern (FP) n3/2/ logn√n + t Fischer et al. ’12

Forbidden Pattern Hon et al. ’12n√nt log n log2 n + t

Text Indexing - Two Patterns

16/24

Jesper Sindahl Nielsen

Space Query Time AuthorsProblem

Two-Pattern (2P) n2−α logc n nα + t Ferragina et al. ’03

Two-Pattern n logn√nt log n log2 n + t

Two-Pattern n√nt log n log n + t

Cohen and Porat ’10

Hon et al. ’10

Forbidden Pattern (FP) n3/2/ logn√n + t Fischer et al. ’12

Forbidden Pattern Hon et al. ’12n√nt log n log2 n + t

Both 2P and FP n√nt log n logε n + t New

Text Indexing - Two Patterns

16/24

Jesper Sindahl Nielsen

Space Query Time AuthorsProblem

Two-Pattern (2P) n2−α logc n nα + t Ferragina et al. ’03

Two-Pattern n logn√nt log n log2 n + t

Two-Pattern n√nt log n log n + t

Cohen and Porat ’10

Hon et al. ’10

Forbidden Pattern (FP) n3/2/ logn√n + t Fischer et al. ’12

Forbidden Pattern Hon et al. ’12n√nt log n log2 n + t

Both 2P and FP n√nt log n logε n + t New

Is√n query time really necessary?

Text Indexing - Two Patterns

17/24

Jesper Sindahl Nielsen

Text Indexing - Two Patterns

17/24

Jesper Sindahl Nielsen

Build suffix tree on concatenation of the documents

Text Indexing - Two Patterns

17/24

Jesper Sindahl Nielsen

Build suffix tree on concatenation of the documents

S = d1$d2$ · · · $dD$

Text Indexing - Two Patterns

17/24

Jesper Sindahl Nielsen

Build suffix tree on concatenation of the documents

S = d1$d2$ · · · $dD$

Text Indexing - Two Patterns

17/24

Jesper Sindahl Nielsen

Build suffix tree on concatenation of the documents

S = d1$d2$ · · · $dD$

Annotate each leaf with thedocument id the suffx starts in

4 5 4 7 6 4 8 9 9 1 2 3

Text Indexing - Two Patterns

17/24

Jesper Sindahl Nielsen

Build suffix tree on concatenation of the documents

S = d1$d2$ · · · $dD$

Annotate each leaf with thedocument id the suffx starts in

4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and

Text Indexing - Two Patterns

17/24

Jesper Sindahl Nielsen

Build suffix tree on concatenation of the documents

S = d1$d2$ · · · $dD$

Annotate each leaf with thedocument id the suffx starts in

4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and

Text Indexing - Two Patterns

17/24

Jesper Sindahl Nielsen

Build suffix tree on concatenation of the documents

S = d1$d2$ · · · $dD$

Annotate each leaf with thedocument id the suffx starts in

4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and

P1P2

Text Indexing - Two Patterns

17/24

Jesper Sindahl Nielsen

Build suffix tree on concatenation of the documents

S = d1$d2$ · · · $dD$

Annotate each leaf with thedocument id the suffx starts in

4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and

A

P1P2

Text Indexing - Two Patterns

17/24

Jesper Sindahl Nielsen

Build suffix tree on concatenation of the documents

S = d1$d2$ · · · $dD$

Annotate each leaf with thedocument id the suffx starts in

4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and

A B

P1P2

Text Indexing - Two Patterns

17/24

Jesper Sindahl Nielsen

Build suffix tree on concatenation of the documents

S = d1$d2$ · · · $dD$

Annotate each leaf with thedocument id the suffx starts in

4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and

A B

Report: A B∩

P1P2

Text Indexing - Two Patterns

17/24

Jesper Sindahl Nielsen

Build suffix tree on concatenation of the documents

S = d1$d2$ · · · $dD$

Annotate each leaf with thedocument id the suffx starts in

4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and

A B

Report: A B∩

P1P2

= 4

Text Indexing - Two Patterns

17/24

Jesper Sindahl Nielsen

Build suffix tree on concatenation of the documents

S = d1$d2$ · · · $dD$

Annotate each leaf with thedocument id the suffx starts in

4 5 4 7 6 4 8 9 9 1 2 3Query: P1 P2and

A B

Report: A B∩

P1P2

= 4set intersection is usually hard

Text Indexing - Two Patterns

18/24

Jesper Sindahl Nielsen

Text Indexing - Two Patterns

18/24

Jesper Sindahl Nielsen

Lower Bound NotesProblem

Text Indexing - Two Patterns

18/24

Jesper Sindahl Nielsen

Lower Bound NotesProblem

Both 2P and FP ω is matrix multiplication constant (new)P (n) + nQ(n) = Ω(nω)

Text Indexing - Two Patterns

18/24

Jesper Sindahl Nielsen

Lower Bound NotesProblem

Both 2P and FP ω is matrix multiplication constant (new)P (n) + nQ(n) = Ω(nω)

Two-Pattern P (n) + n1+εQ(n) = Ω(n4/3−o(1)) Kopelowitz et al (3SUM)

Text Indexing - Two Patterns

18/24

Jesper Sindahl Nielsen

Lower Bound NotesProblem

Both 2P and FP ω is matrix multiplication constant (new)P (n) + nQ(n) = Ω(nω)

Two-Pattern P (n) + n1+εQ(n) = Ω(n4/3−o(1))

Two-Pattern P (n) + n3/4+εQ(n) = Ω(n4/3−ε)

Kopelowitz et al (3SUM)

Kopelowitz et al (3SUM)

Text Indexing - Two Patterns

18/24

Jesper Sindahl Nielsen

Lower Bound NotesProblem

Both 2P and FP ω is matrix multiplication constant (new)P (n) + nQ(n) = Ω(nω)

Two-Pattern P (n) + n1+εQ(n) = Ω(n4/3−o(1))

Two-Pattern P (n) + n3/4+εQ(n) = Ω(n4/3−ε)

Kopelowitz et al (3SUM)

Kopelowitz et al (3SUM)

Text Indexing - Two Patterns

18/24

Jesper Sindahl Nielsen

Lower Bound NotesProblem

Both 2P and FP ω is matrix multiplication constant (new)

Both 2P and FP Pointer Machine Model (new)

P (n) + nQ(n) = Ω(nω)

S(n)Q(n) = Ω(n2−o(1))

Two-Pattern P (n) + n1+εQ(n) = Ω(n4/3−o(1))

Two-Pattern P (n) + n3/4+εQ(n) = Ω(n4/3−ε)

Kopelowitz et al (3SUM)

Kopelowitz et al (3SUM)

Text Indexing - Two Patterns

18/24

Jesper Sindahl Nielsen

Lower Bound NotesProblem

Both 2P and FP ω is matrix multiplication constant (new)

Both 2P and FP Pointer Machine Model (new)

Semi group model (new)

P (n) + nQ(n) = Ω(nω)

S(n)Q(n) = Ω(n2−o(1))

S(n)Q(n)2 = Ω(n2−o(1))Both 2P and FP

Two-Pattern P (n) + n1+εQ(n) = Ω(n4/3−o(1))

Two-Pattern P (n) + n3/4+εQ(n) = Ω(n4/3−ε)

Kopelowitz et al (3SUM)

Kopelowitz et al (3SUM)

Text Indexing - Two Patterns

18/24

Jesper Sindahl Nielsen

Lower Bound NotesProblem

Both 2P and FP ω is matrix multiplication constant (new)

Both 2P and FP Pointer Machine Model (new)

Semi group model (new)

P (n) + nQ(n) = Ω(nω)

S(n)Q(n) = Ω(n2−o(1))

S(n)Q(n)2 = Ω(n2−o(1))Both 2P and FP

Two-Pattern P (n) + n1+εQ(n) = Ω(n4/3−o(1))

Two-Pattern P (n) + n3/4+εQ(n) = Ω(n4/3−ε)

Kopelowitz et al (3SUM)

Kopelowitz et al (3SUM)

Q(n) = (nt)12−α ⇒

Both 2P and FPS(n) = Ω(n

1+6α1+2α

−o(1))

Pointer Machine Modelt is the output size (new)

Text Indexing - How to prove the lower bounds?

19/24

Jesper Sindahl Nielsen

We use two different techniques:

1) Reduction from Matrix Multiplication

2) Find a hard input for a Pointer Machine Data Structure

1) Given two matrices, produce a set of documentsand queries such that the product of the twomatrices can be found

Text Indexing - How to prove the lower bounds?

20/24

Jesper Sindahl Nielsen

× =T F TT F FT T T

F F TF T TT F T

A B C

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

Jesper Sindahl Nielsen

× =T F TT F FT T T

F F TF T TT F T

A B C

dA1 = 1#2#3#

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

Jesper Sindahl Nielsen

× =T F TT F FT T T

F F TF T TT F T

A B C

dA1 = 1#2#3#

dA2 = 3#

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

Jesper Sindahl Nielsen

× =T F TT F FT T T

F F TF T TT F T

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

Jesper Sindahl Nielsen

× =T F TT F FT T T

F F TF T TT F T

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

dB1 = 6#

123

4 5 6 Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

Jesper Sindahl Nielsen

× =T F TT F FT T T

F F TF T TT F T

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

dB1 = 6#

dB2 = 5#6#

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

Jesper Sindahl Nielsen

× =T F TT F FT T T

F F TF T TT F T

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

dB1 = 6#

dB2 = 5#6#

dB3 = 4#6#

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

Jesper Sindahl Nielsen

× =T F TT F FT T T

F F TF T TT F T

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

dB1 = 6#

dB2 = 5#6#

dB3 = 4#6#

d1 = 1#2#3#6#

d2 = 3#5#6#

d3 = 1#3#4#6#

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

Jesper Sindahl Nielsen

× =T F TT F FT T T

F F TF T TT F T

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

dB1 = 6#

dB2 = 5#6#

dB3 = 4#6#

d1 = 1#2#3#6#

d2 = 3#5#6#

d3 = 1#3#4#6# D

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

Jesper Sindahl Nielsen

× =T F TT F FT T T

F F TF T TT F T

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

dB1 = 6#

dB2 = 5#6#

dB3 = 4#6#

d1 = 1#2#3#6#

d2 = 3#5#6#

d3 = 1#3#4#6# D

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

P1 = 1 P2 = 4Jesper Sindahl Nielsen

Query:

× =T F TT F FT T T

F F TF T TT F T

T

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

dB1 = 6#

dB2 = 5#6#

dB3 = 4#6#

d1 = 1#2#3#6#

d2 = 3#5#6#

d3 = 1#3#4#6# D

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

P1 = 1 P2 = 4Jesper Sindahl Nielsen

Query:

× =T F TT F FT T T

F F TF T TT F T

T

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

dB1 = 6#

dB2 = 5#6#

dB3 = 4#6#

d1 = 1#2#3#6#

d2 = 3#5#6#

d3 = 1#3#4#6# D

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

1 5P1 = 1 P2 =Jesper Sindahl Nielsen

Query:

× =T F TT F FT T T

F F TF T TT F T

T F

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

dB1 = 6#

dB2 = 5#6#

dB3 = 4#6#

d1 = 1#2#3#6#

d2 = 3#5#6#

d3 = 1#3#4#6# D

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

1 5P1 = 1 P2 =Jesper Sindahl Nielsen

Query:

× =T F TT F FT T T

F F TF T TT F T

T F

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

dB1 = 6#

dB2 = 5#6#

dB3 = 4#6#

d1 = 1#2#3#6#

d2 = 3#5#6#

d3 = 1#3#4#6# D

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

i ∈ d` ⇔ Ai,` = Tj + 3 ∈ d` ⇔ B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

1 5P1 = 1 P2 =Jesper Sindahl Nielsen

Query:

× =T F TT F FT T T

F F TF T TT F T

T F TF F TT

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

dB1 = 6#

dB2 = 5#6#

dB3 = 4#6#

d1 = 1#2#3#6#

d2 = 3#5#6#

d3 = 1#3#4#6# D

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

i ∈ d` ⇔ Ai,` = Tj + 3 ∈ d` ⇔ B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

Jesper Sindahl Nielsen

Query:

× =T F TT F FT T T

F F TF T TT F T

T F TF F TT

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

dB1 = 6#

dB2 = 5#6#

dB3 = 4#6#

d1 = 1#2#3#6#

d2 = 3#5#6#

d3 = 1#3#4#6# D

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

i ∈ d` ⇔ Ai,` = Tj + 3 ∈ d` ⇔ B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

3 5P1 = P2 =Jesper Sindahl Nielsen

Query:

× =T F TT F FT T T

F F TF T TT F T

T F TF F TT T

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

dB1 = 6#

dB2 = 5#6#

dB3 = 4#6#

d1 = 1#2#3#6#

d2 = 3#5#6#

d3 = 1#3#4#6# D

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

i ∈ d` ⇔ Ai,` = Tj + 3 ∈ d` ⇔ B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

3 5P1 = P2 =Jesper Sindahl Nielsen

Query:

× =T F TT F FT T T

F F TF T TT F T

T F TF F TT T

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

dB1 = 6#

dB2 = 5#6#

dB3 = 4#6#

d1 = 1#2#3#6#

d2 = 3#5#6#

d3 = 1#3#4#6# D

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

i ∈ d` ⇔ Ai,` = Tj + 3 ∈ d` ⇔ B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

Jesper Sindahl Nielsen

3 6P1 = P2 =Query:

× =T F TT F FT T T

F F TF T TT F T

T F TF F TT T T

A B C

dA1 = 1#2#3#

dA2 = 3#

dA3 = 1#3#

dB1 = 6#

dB2 = 5#6#

dB3 = 4#6#

d1 = 1#2#3#6#

d2 = 3#5#6#

d3 = 1#3#4#6# D

123

Ci,j = T ⇔ ∃` s.t.

Ai,` = B`,j = T

i ∈ d` ⇔ Ai,` = Tj + 3 ∈ d` ⇔ B`,j = T

Text Indexing - How to prove the lower bounds?

20/24

Jesper Sindahl Nielsen

3 6P1 = P2 =Query:

Overview

21/24

Jesper Sindahl Nielsen

1. Algorithms and Data Structures(a) Abstract description(b) Example

2. Implicit Data Structures(a) Finger Search(b) Priority Queues

3. Integer Sorting4. Text Indexing5. State and University Library

Overview

21/24

Jesper Sindahl Nielsen

1. Algorithms and Data Structures(a) Abstract description(b) Example

2. Implicit Data Structures(a) Finger Search(b) Priority Queues

3. Integer Sorting4. Text Indexing5. State and University Library

State and University Library

22/24

Jesper Sindahl Nielsen

State and University Library

22/24

Jesper Sindahl Nielsen

State and University Library

22/24

Jesper Sindahl Nielsen

State and University Library

22/24

Jesper Sindahl Nielsen

State and University Library

22/24

Jesper Sindahl Nielsen

”Old Days”

State and University Library

22/24

Jesper Sindahl Nielsen

”Old Days”

State and University Library

22/24

Jesper Sindahl Nielsen

”Old Days”

State and University Library

22/24

Jesper Sindahl Nielsen

”Old Days”

State and University Library

22/24

Jesper Sindahl Nielsen

”Old Days”

State and University Library

22/24

Jesper Sindahl Nielsen

”Old Days”

State and University Library

22/24

Jesper Sindahl Nielsen

”Old Days”

State and University Library

22/24

Jesper Sindahl Nielsen

”Old Days”

LOTS of data

A nuisance in a collection

23/24

Jesper Sindahl Nielsen

A nuisance in a collection

23/24

Jesper Sindahl Nielsen

Danish Radio collection 2 hour chunks

Before one runs out another is started

A nuisance in a collection

23/24

Jesper Sindahl Nielsen

Danish Radio collection 2 hour chunks

Before one runs out another is started⇒ overlap

A nuisance in a collection

23/24

Jesper Sindahl Nielsen

Danish Radio collection 2 hour chunks

Before one runs out another is started⇒ overlap

Annoying for customers

A nuisance in a collection

23/24

Jesper Sindahl Nielsen

Danish Radio collection 2 hour chunks

Before one runs out another is started⇒ overlap

Annoying for customers

Task: find overlap and eliminate it

A nuisance in a collection

23/24

Jesper Sindahl Nielsen

Danish Radio collection 2 hour chunks

Before one runs out another is started⇒ overlap

Annoying for customers

Task: find overlap and eliminate it

A nuisance in a collection

23/24

Jesper Sindahl Nielsen

Danish Radio collection 2 hour chunks

Before one runs out another is started⇒ overlap

Annoying for customers

Task: find overlap and eliminate it

A nuisance in a collection

23/24

Jesper Sindahl Nielsen

Danish Radio collection 2 hour chunks

Before one runs out another is started⇒ overlap

Annoying for customers

Task: find overlap and eliminate it

Use Cross Correlation.

24/24

Thank You

Jesper Sindahl Nielsen

Recommended