
Data Structures and Algorithms

Emmanuel Stefanakis

http://www2.unb.ca/~estef/

Stefanakis, E., 2014. Geographic Databases and Information Systems.

CreateSpace Independent Publ. [In English], pp.386.

Get a copy from Amazon

Chapter 8

Storing data elements

• Problem: Given two numbers, find their sum

x = 20, y = 10; sum = x + y = 20 + 10 = 30

• Program (C-like):

    int x;                 x is a variable of type integer
    int y;
    int sum;
    x = 20; y = 10;
    sum = x + y;
    printf("%d", sum);

• Built-in types (in programming languages):

    int       4-byte integer
    float     4-byte real
    char      1-byte character
    double    8-byte real
    etc.

Storing data elements

    int x;      (type) (variable name); 4 bytes allocated in RAM by the operating system
    x = 20;

Main memory (RAM): the value 20 occupies 4 bytes at physical address 2100; x is the pseudo-name of that location.

    x = 20          content
    &x = 2100       address
    *(&x) = 20      content stored at that address

20 (base 10) = 00010100 (base 2)
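The relation between a variable, its address and the content at that address can be tried out with a minimal C sketch. The function name `read_through_address` is hypothetical, and the printed address will differ from the slide's illustrative 2100.

```c
#include <stdio.h>

/* Returns the content stored at the address of a local variable,
   mirroring the slide's *(&x) = 20. */
int read_through_address(void)
{
    int x = 20;          /* x names a few bytes of RAM (typically 4) */
    int *addr = &x;      /* &x: the address of x */

    printf("x = %d, &x = %p, *(&x) = %d\n", x, (void *)addr, *addr);
    return *addr;        /* *(&x): the content at that address */
}
```

Calling `read_through_address()` prints the three values and returns the stored content.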

Structure: Table

• Table:
– Collection of data elements of the same type (e.g., 5 integers)

    int x[5];     5x4=20 bytes allocated in RAM by the operating system
    x[0] = 2;     stored in bytes 3200-3203
    x[1] = 3;
    x[2] = 1;
    x[3] = 5;     stored in bytes 3212-3215 (3200+3x4=3212)
    x[4] = 8;

Main memory (RAM): x occupies 5x4=20 bytes starting at address 3200 and holds 2, 3, 1, 5, 8 in x[0] to x[4].

Direct access to all table elements through the calculated address
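The address calculation can be written down as a small C sketch. The helper names `element_address` and `matches_pointer_arithmetic` are hypothetical; the addresses 3200 and 3212 are the slide's illustrative values, not real ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte address of element i in a table that starts at `base`
   and stores elements of `elem_size` bytes each. */
uintptr_t element_address(uintptr_t base, size_t i, size_t elem_size)
{
    return base + (uintptr_t)(i * elem_size);
}

/* Checks that C computes &x[3] with the same rule: base + 3 * sizeof(element). */
int matches_pointer_arithmetic(void)
{
    int x[5] = {2, 3, 1, 5, 8};
    return (char *)&x[3] == (char *)x + 3 * sizeof x[0];
}
```

With a 4-byte `int`, `element_address(3200, 3, 4)` gives the 3212 shown for x[3].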

Searching the elements in a Table…

• Traditional databases…
– Alphanumeric datasets
• A set of N=11 atomic values
• Question: Is “9” present in this set?

3 11 6 7 17 15 1 9 8 2 13

Searching the elements in a Table…

• Algorithm 1: Sequential (serial) Search
– Very slow!
» Complexity: O(N) (i.e., up to N comparisons)

3 11 6 7 17 15 1 9 8 2 13     (N numbers)

Each element is compared with 9 in turn: =9? NO, =9? NO, =9? NO, … =9? YES. FOUND!
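A minimal C sketch of the sequential search follows. The function name and the optional comparison counter are assumptions added for illustration.

```c
#include <stddef.h>

/* Sequential search: returns the index of `key` in a[0..n-1], or -1 if absent.
   If `comparisons` is not NULL it receives the number of elements inspected. */
int sequential_search(const int *a, size_t n, int key, size_t *comparisons)
{
    size_t count = 0;
    int found = -1;

    for (size_t i = 0; i < n; i++) {
        count++;                      /* one comparison per inspected element */
        if (a[i] == key) {
            found = (int)i;
            break;
        }
    }
    if (comparisons != NULL)
        *comparisons = count;
    return found;
}
```

On the 11 values above, searching for 9 stops at the eighth element; searching for a missing value inspects all N elements.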

Searching the elements in a Table…

• Algorithm 2: Binary Search
– Similar to the phone book search
– The numbers are sorted first

3 11 6 7 17 15 1 9 8 2 13
    ↓ sorting algorithm
1 2 3 6 7 8 9 11 13 15 17

Searching the elements in a Table…

• Algorithm 2: Binary Search
• Question: Is “9” present in this set?

1 2 3 6 7 8 9 11 13 15 17

Step 1: Go to the middle element (8): =9? NO; 8 < 9, so continue with the right part

9 11 13 15 17

Step 2: Continue recursively; middle element (13): =9? NO; 13 > 9, so continue with the left part

9 11

Step 3: =9? YES. FOUND!
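The halving steps can be sketched in C as below. This is an iterative variant written for illustration; the function name is an assumption, and the exact elements it probes may differ slightly from the ones drawn on the slides.

```c
#include <stddef.h>

/* Binary search over a table sorted in ascending order.
   Returns the index of `key`, or -1 if it is absent. */
int binary_search(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;                 /* search window is a[lo..hi-1] */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;   /* middle element of the window */
        if (a[mid] == key)
            return (int)mid;
        if (a[mid] < key)
            lo = mid + 1;                  /* continue in the right half */
        else
            hi = mid;                      /* continue in the left half */
    }
    return -1;
}
```

The window is halved on every iteration, which is where the logarithmic number of steps comes from.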

Searching the elements in a Table…

• Algorithm 2: Binary Search
• What’s the complexity?
– In step 1: $N$ numbers, i.e. $N/2^{0}$
– In step 2: $N/2$ numbers, i.e. $N/2^{1}$
– In step 3: $N/4$ numbers, i.e. $N/2^{2}$
– …
– In step $k$: 1 number (left), i.e. $N/2^{k-1} = 1$

$1 = N/2^{k-1} \;\Rightarrow\; 2^{k-1} = N \;\Rightarrow\; \log_2(2^{k-1}) = \log_2 N \;\Rightarrow\; k-1 = \log_2 N \;\Rightarrow\; k = \log_2 N + 1$

1 2 3 6 7 8 9 11 13 15 17
9 11 13 15 17
9 11

Searching the elements in a Table…

• Algorithm 2: Binary Search
• Very fast!
– Complexity: O(log2 N)

N            Serial search O(N)    Binary search O(log2 N)
10           10                    3
100          100                   7
1,000        1,000                 10
10,000       10,000                13
100,000      100,000               17
1,000,000    1,000,000             20

(number of comparisons)

Structure: Table

• Advantages:
– Direct access to its elements
– Fast search – binary search
• only if the items are sorted; sorting has complexity O(N log2 N)
• Disadvantage:
– The structure is not appropriate for dynamic data
• i.e., data that change often

Structure: Table

• It is not appropriate for dynamic data

1. The size of the table must be defined at the beginning of a program

    int x[100];     a table of 100 integers

– what if only 10 elements are stored?
» Waste of space
– what if 100 elements are stored and a new element appears?
» Table overflows

Structure: Table

• It is not appropriate for dynamic data

2. The binary search requires sorted elements

• Hard to keep the table sorted (costly)
– what if a new element with value 5 arrives?
» All elements right of 3 must be shifted one spot right (costly)
– what if element 3 is deleted?
» All elements right of 3 must be shifted one spot left (costly)

1 2 3 6 7 8 9 11 13
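The shifting cost can be seen in a short C sketch. The function names and the `capacity` parameter are assumptions; the sketch also returns an error when the fixed-size table would overflow.

```c
#include <stddef.h>
#include <string.h>

/* Insert `value` into the sorted table a[0..*n-1], keeping it sorted.
   Returns 0 on success, -1 if the table is already full (overflow). */
int sorted_insert(int *a, size_t *n, size_t capacity, int value)
{
    if (*n >= capacity)
        return -1;

    size_t pos = 0;
    while (pos < *n && a[pos] < value)
        pos++;
    /* shift every element from pos onwards one spot to the right */
    memmove(&a[pos + 1], &a[pos], (*n - pos) * sizeof a[0]);
    a[pos] = value;
    (*n)++;
    return 0;
}

/* Delete the first occurrence of `value`.
   Returns 0 on success, -1 if the value is not stored. */
int sorted_delete(int *a, size_t *n, int value)
{
    size_t pos = 0;
    while (pos < *n && a[pos] != value)
        pos++;
    if (pos == *n)
        return -1;
    /* shift every element right of pos one spot to the left */
    memmove(&a[pos], &a[pos + 1], (*n - pos - 1) * sizeof a[0]);
    (*n)--;
    return 0;
}
```

Both operations move up to N elements, so each insertion or deletion costs O(N) even though the search itself is fast.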

Abstract Data Types (ADT)

• Each programming language
– offers some built-in data types
• e.g., int, short, float, double, char
• Developers may define their own types
– User-defined data types or ADT
• e.g., point, line, polygon, etc.

Abstract Data Types (ADT)

• Definition of an ADT (C-like)

    struct point {
        float x;       4 bytes
        float y;       4 bytes
    };
    point P;           8 bytes
    P.x = 32.4;        stored in the first 4 bytes
    P.y = -66.2;       stored in the second 4 bytes

Main memory (RAM): P occupies 8 bytes starting at physical address 12300 and holds 32.4 followed by -66.2.

Abstract Data Types (ADT)

• Owners table in the Cadastre Database

    struct date {
        short day;             2 bytes
        short month;           2 bytes
        int year;              4 bytes
    };

    struct owner {
        int ID;                4 bytes
        char SURNAME[20];      20 bytes
        char NAME[12];         12 bytes
        date BIRTH_DATE;       8 bytes
        char STREET[30];       30 bytes
        int NUMBER;            4 bytes
        char P_CODE[5];        5 bytes
        char CITY[12];         12 bytes
    };

    owner O[100];              definition of a table of records

sizeof(owner) = 95 bytes (each record; this is the sum of the field sizes, and a compiler may add padding)

Abstract Data Types (ADT)

• Owner’s record in the Cadastre Database

    owner O[100];      table of records; 100 x 95 = 9500 bytes or 9.5KB

Populate the first row in the table (95 bytes needed):

    O[0].ID = 46419735;
    O[0].SURNAME = "SMITH";
    O[0].NAME = "JOHN";
    O[0].BIRTH_DATE.day = 15;
    O[0].BIRTH_DATE.month = 8;
    O[0].BIRTH_DATE.year = 1952;
    O[0].STREET = "EL. VENIZELOU";
    O[0].NUMBER = 45;
    O[0].P_CODE = "11562";
    O[0].CITY = "ATHENS";
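In standard C, character arrays cannot be assigned with `=`, so the C-like notation above needs a string copy. The sketch below is one possible translation, not the author's code: `set_text` and `populate_first` are hypothetical helpers, and `P_CODE` is widened to 6 bytes (an assumption) so that the five digits and the string terminator both fit.

```c
#include <stdio.h>
#include <string.h>

struct date { short day; short month; int year; };

struct owner {
    int ID;
    char SURNAME[20];
    char NAME[12];
    struct date BIRTH_DATE;
    char STREET[30];
    int NUMBER;
    char P_CODE[6];    /* assumption: 5 digits plus the '\0' terminator */
    char CITY[12];
};

/* Copies src into dst, truncating if needed, and always terminates the string. */
static void set_text(char *dst, size_t cap, const char *src)
{
    snprintf(dst, cap, "%s", src);
}

/* Fills one record with the values shown for O[0]. */
void populate_first(struct owner *o)
{
    memset(o, 0, sizeof *o);               /* start from an all-zero record */
    o->ID = 46419735;
    set_text(o->SURNAME, sizeof o->SURNAME, "SMITH");
    set_text(o->NAME, sizeof o->NAME, "JOHN");
    o->BIRTH_DATE.day = 15;
    o->BIRTH_DATE.month = 8;
    o->BIRTH_DATE.year = 1952;
    set_text(o->STREET, sizeof o->STREET, "EL. VENIZELOU");
    o->NUMBER = 45;
    set_text(o->P_CODE, sizeof o->P_CODE, "11562");
    set_text(o->CITY, sizeof o->CITY, "ATHENS");
}
```

A table of records is then declared as `struct owner O[100];` and filled with `populate_first(&O[0]);`. The value of `sizeof(struct owner)` depends on the compiler and may exceed the sum of the field sizes because of alignment padding.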

Structure: Linked-List

• This is an alternative structure …
– for storing a collection of data elements of the same type (built-in or ADT)
• data are maintained in a list
– The linked-list makes use of an ADT called node:

    struct node {
        int x;         data element (4 bytes)
        node *p;       address of the next node in RAM (4 bytes)
    };
    node N;            definition of a new node

N: [ x | p ]    8 bytes

Structure: Linked-List

• A node accommodates…
– one data element, and
– the address of the next node (one pointer)
• To store N data elements…
– N nodes are required
– these nodes are stored scattered in the RAM
• any new node is stored randomly in the RAM and pointed to by the previous node in the list
– all nodes can be accessed through the head node

Node: [ data | pointer ]

Structure: Linked-List

• A linked list example…

Main memory (RAM); Head → 2100

Address    Data    Pointer (next)
2100       3       3210
3210       6       4523
4523       7       6100
6100       9       4780

Structure: Linked-List

• Advantages
– Insertions and deletions of nodes (i.e., data elements) are easily handled
– A sorted list can be easily maintained
• through appropriate arrangement of pointers
– The list size is always appropriate and dependent on the stored list elements
– i.e., 10 elements → 10 nodes; 100 elements → 100 nodes; no overflow occurs

Structure: Linked-List

• Example: insert data element 4…
• The list must be maintained sorted

Before the insertion (Head → 2100):

Address    Data    Pointer (next)
2100       3       3210
3210       6       4523
4523       7       6100
6100       9       4780

Step 1: Create a node; store 4; and allocate memory in RAM (new node at address 3930: data 4, pointer null)

Step 2: Arrange the pointers (node 3 now points to 3930; node 4 points to 3210)

After the insertion (Head → 2100):

Address    Data    Pointer (next)
2100       3       3930
3930       4       3210
3210       6       4523
4523       7       6100
6100       9       4780
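The two insertion steps can be sketched in C as follows. The function name and the pointer-to-pointer walk are implementation choices made for this illustration; the node layout follows the slides, with the next-pointer typed as a pointer to a node.

```c
#include <stdlib.h>

struct node {
    int x;              /* data element */
    struct node *p;     /* address of the next node, NULL at the end of the list */
};

/* Inserts `value` into a list kept in ascending order.
   Returns 0 on success, -1 if no memory could be allocated. */
int list_insert_sorted(struct node **head, int value)
{
    /* Step 1: create a node, store the value, allocate memory */
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->x = value;
    n->p = NULL;

    /* Step 2: arrange the pointers */
    struct node **link = head;           /* the pointer that will point to n */
    while (*link != NULL && (*link)->x < value)
        link = &(*link)->p;
    n->p = *link;                        /* new node points to its successor */
    *link = n;                           /* predecessor (or head) points to n */
    return 0;
}
```

No existing element moves in memory; only two pointers change, whatever the length of the list.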

Structure: Linked-List

• Example: delete data element 7…
• The list must be maintained sorted

Step 1: Arrange the pointers (node 6 now points to 6100; node 7 points to null)

Step 2: Free node (de-allocate memory)

After the deletion (Head → 2100):

Address    Data    Pointer (next)
2100       3       3930
3930       4       3210
3210       6       6100
6100       9       4780
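A matching C sketch for the deletion is shown below, under the same assumptions as the node layout on the slides; the function name is hypothetical.

```c
#include <stdlib.h>

struct node {
    int x;              /* data element */
    struct node *p;     /* address of the next node, NULL at the end of the list */
};

/* Deletes the first node holding `value`.
   Returns 0 if a node was deleted, -1 if the value is not in the list. */
int list_delete(struct node **head, int value)
{
    struct node **link = head;           /* the pointer that points to the candidate */
    while (*link != NULL && (*link)->x != value)
        link = &(*link)->p;
    if (*link == NULL)
        return -1;

    struct node *victim = *link;
    *link = victim->p;                   /* Step 1: arrange the pointers */
    free(victim);                        /* Step 2: free node (de-allocate memory) */
    return 0;
}
```

Finding the node still requires a serial walk from the head, but the removal itself touches a single pointer.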

Structure: Linked-List

• Disadvantage…
– the binary search cannot be applied…
• … although the list is sorted
• … any node is reachable through head only
– only serial search!

Head → 1 → 2 → 3 → 6 → 7 → 8 → 9 → 11 → 13 → 15 → 17

Table (vs) Linked-List

• Tables
– Support fast search (binary search)
– Inefficient dynamic data handling
• Linked-Lists
– Efficient dynamic data handling
– Fast search not supported
• Need to combine the advantages
– i.e., fast search and efficient handling → Trees

Structure: Tree

• Trees are…
– Hierarchical and non-linear (multi-dimensional) linked-lists
– Consist of nodes
• Each node has
– One precedent (except the root, which has none)
– Zero, one or more descendants
– All nodes are accessed through the root (head) only

Structure: Tree

• Example tree…

[Figure: example tree with nodes A to L; the root A is at level = 0 and the deepest node at level = 4; one sub-tree is highlighted]

degree = 3; height = 5
Leaf-nodes: { G,H,L,E,J,K }
ABDIL = path

Structure: Tree

• A binary tree…

[Figure: a binary tree with nodes A to G]

Node structure: [ Data | Pointer to the left child | Pointer to the right child ]

All nodes of degree 2; i.e., 2 children per node (maximum)

Structure: Tree

• A full and balanced binary tree…

All leaf-nodes at the same level. All non-leaf nodes have two children.

[Figure: a full and balanced binary tree with 15 nodes, A to O]

Structure: Tree

• A full and balanced binary tree…
– Height (h) vs Number of nodes (N)

$2^{h} + (2^{h} - 1) = N \;\Rightarrow\; 2^{h+1} = N + 1 \;\Rightarrow\; h = \log_2(N+1) - 1$

Tree level    Nodes in this level    Sum of nodes from the root to this level    Height
0 (root)      $2^{0} = 1$            $2^{0} + (2^{0} - 1)$                       0
1             $2^{1}$                $2^{1} + (2^{1} - 1)$                       1
2             $2^{2}$                $2^{2} + (2^{2} - 1)$                       2
...           ...                    ...                                         ...
h             $2^{h}$                $2^{h} + (2^{h} - 1)$                       h

[Figure: a full and balanced binary tree with 15 nodes, A to O]
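The relation between height and number of nodes can be checked with a short integer-only C sketch. Both function names are hypothetical, and `full_tree_nodes` assumes h + 1 is smaller than the bit width of `unsigned long`.

```c
/* Number of nodes in a full and balanced binary tree of height h:
   2^h + (2^h - 1) = 2^(h+1) - 1. */
unsigned long full_tree_nodes(unsigned h)
{
    return (1UL << (h + 1)) - 1UL;
}

/* Height h = log2(n + 1) - 1 of a full and balanced binary tree with n nodes.
   Returns -1 if n is not of the form 2^(h+1) - 1. */
int full_tree_height(unsigned long n)
{
    int h = 0;
    unsigned long total = 1;       /* nodes in a tree of height 0 */

    while (total < n) {
        h++;
        total = 2 * total + 1;     /* adding one level doubles the count, plus the root */
    }
    return total == n ? h : -1;
}
```

For the 15-node tree in the figure this gives a height of 3, i.e. levels 0 to 3.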

Structure: Tree

• The binary search tree…
– It is a binary tree
– For each node with value x (e.g., 15),
• data values in the left sub-tree are less than x
• data values in the right sub-tree are greater than x

[Figure: a node x with its left sub-tree (<x) and right sub-tree (>x); example tree with root 15 holding the values 2, 3, 4, 6, 7, 13, 15, 18]

Structure: Tree

• Building the binary search tree…

Insert: { 15, 18, 6, 3, 4, 7, 13, 2, 20, 9, 17 }

[Figures (a) to (k): the tree after each insertion, in the order 15, 18, 6, 3, 4, 7, 13, 2, 20, 9, 17]

Search for: 13 (follow the arrows)

if full and balanced: search in O(log2 N)
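Building and searching such a tree can be sketched in C as below. This assumes the usual insertion rule (smaller values go left, larger values go right, duplicates are ignored); the names `tnode`, `bst_insert` and `bst_search_steps` are hypothetical.

```c
#include <stdlib.h>

struct tnode {
    int data;
    struct tnode *left;     /* sub-tree with values less than data */
    struct tnode *right;    /* sub-tree with values greater than data */
};

/* Inserts `value` into the binary search tree; duplicates are ignored.
   Returns 0 on success, -1 if no memory could be allocated. */
int bst_insert(struct tnode **root, int value)
{
    struct tnode **link = root;
    while (*link != NULL) {
        if (value == (*link)->data)
            return 0;
        link = value < (*link)->data ? &(*link)->left : &(*link)->right;
    }
    struct tnode *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->data = value;
    n->left = NULL;
    n->right = NULL;
    *link = n;
    return 0;
}

/* Returns the number of nodes visited until `value` is found, or 0 if it is absent. */
int bst_search_steps(const struct tnode *root, int value)
{
    int steps = 0;
    while (root != NULL) {
        steps++;
        if (value == root->data)
            return steps;
        root = value < root->data ? root->left : root->right;
    }
    return 0;
}
```

With the insertion order above, searching for 13 visits 15, 6, 7 and 13, so four nodes are inspected.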

Structure: Tree

• Building the binary search tree…
– depends on the order the elements are inserted
– these three trees contain the same elements: {2,3,6,8,11,13,15}

[Figure: three binary search trees holding the same elements; the balanced one is marked “best structure”]

Structure: Tree

• Tree 2-3:
– Always balanced
– Logarithmic search guaranteed – O(log2 N)
– Data elements reside in the leaf-nodes only
– Non-leaf nodes are index nodes
– Each non-leaf node may have 2 or 3 children
– Each leaf node may accommodate 2 or 3 elements

Structure: Tree

• Tree 2-3:

[Figure: a 2-3 tree with root (7, 16), index nodes (5, -), (8, 12), (19, -) and the data elements 2, 5, 7, 8, 12, 16, 19 in the leaves]

log3 N < height < log2 N

Structure: Tree

• Tree 2-3:
– the tree grows from the top to remain balanced

[Figure, four panels:
  Initial tree: root (7, 16); index nodes (5, -), (8, 12), (19, -)
  Insert 6: index nodes become (5, 6), (8, 12), (19, -)
  Insert 10: the node (8, 12) has no free spot
  Node split to insert 10: (8, 12) is split into (8, -) and (12, -)]

[Figure: new root (10, -) with children (7, -) and (16, -); index nodes (5, 6), (8, -), (12, -), (19, -)]

10 inserted; the tree has grown one level

Structure: Tree

• Tree 2-3:
– Tree height: at most $\log_2 N$, i.e. O(log2 N)
• Worst case: all nodes have only 2 children

In level 0 (leaves): $N/2^{0} = N$ nodes
In level 1: $N/2^{1} = N/2$ nodes
In level 2: $N/2^{2}$ nodes
...
In level $h$ (root): $N/2^{h} = 1$ node

$N/2^{h} = 1 \;\Rightarrow\; h = \log_2 N$

Structure: Tree

• B+tree:
– Extending the idea of tree 2-3.
– Used in DBMS to index alphanumeric data
– Each node has between M/2 and M children (e.g., M=100)

Example with M=4:

Root:           34 42
Index level:    24 28 34 39 42
Leaves:         22 24 25 26 28 29 30 32 34 36 37 39 41 42

B+Tree Index Structure

• Indexing the attribute table…

OWNERS

ID    NAME    AGE
22    LOLA    34
41    NINI    23
39    KIKI    24
24    ZIZI    67
26    PAPA    21
42    MAMA    76
39    LALA    45
37    WIWI    34
32    RIRI    67
36    TOTO    24
34    SASA    39
30    PEPE    19
28    BOBO    49
25    ZOZO    72

An index is built on attribute: ID

Root:           34 42
Index level:    24 28 34 39 42
Leaves:         22 24 25 26 28 29 30 32 34 36 37 39 41 42

Each leaf entry holds a pointer to the table record.

Introduction to Graph Theory

Emmanuel Stefanakis

Graphs…

• Definition…
– A graph G(N,E)
• is a collection of
– Nodes N, and
– Edges E
• can simulate…
– a road network
– the countries of a continent (nodes) and their topological relationships (edges)

Graphs…

• Types of nodes and edges…

[Figure labels: node 1, node 2, edge, blind edge, floating edge, multiple edges]

node 1 is “adjacent” to node 2 (connected through a single edge)

Graphs…

• Types of graphs…

[Figure: a simple “non-directed” graph on nodes A to F]

[Figure: a “directed” graph on nodes A to F]

Graphs…

• Types of graphs…

A “directed and weighted” graph

[Figure: a directed graph on nodes A to F with edge weights 5, 3, 8, 4, 6, 2, 9, 7]

moving from node B to E (through nodes C and D) costs: 3+8+6=17

The weights may represent: time, distance, fuel consumption, etc.

Graphs…

• The definition of a graph…
– does not imply its graphical representation

[Figure: two different drawings of the same graph on nodes 1 to 5]

Both graphs G(N,E) are defined as:

N: {1, 2, 3, 4, 5}; E (directed): {(1,2), (3,2), (4,3), (4,1), (3,5)}

Graphs…

• The concept of “Polygon”
– A closed chain of edges forms a polygon
– Any edge is the border of two polygons
– A directed edge has a left and a right polygon
– A polygon may have holes (or islands)
– A hole may be simple (if it consists of one edge with the same start and end node) or complex (if it consists of more than one edge)
– An edge is called a bridge edge if, when deleted, a hole is created (→ non-connected graphs)

Graphs…

• The concept of “Polygon”

[Figure labels: polygon, bridge, simple hole, complex hole]

Graphs…

• Planar and non-planar graphs…
– Planar graphs…
• When two edges intersect → node (i.e., every intersection of two edges is a node)
– Non-planar graphs…
• Two edges may intersect without any node

Planar Graphs…

• In Planar graphs…
– The concept of polygon is meaningful…
– Euler criterion
• In planar graphs: p – e + n = 2
– p: number of polygons
– e: number of edges
– n: number of nodes

Planar Graphs…

• Euler criterion
– In planar graphs: p – e + n = 2

p = 4, e = 8, n = 6: 4 – 8 + 6 = 2 (ok!)

Check the consistency of the topology (a necessary but not sufficient condition)

Planar Graphs…

• Euler criterion (generalized version)…
– for c non-connected graphs
– In planar graphs: p – e + n – c = 1

p = 6, e = 11, n = 9, c = 3: 6 – 11 + 9 – 3 = 1 (ok!)
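The consistency check is a one-line computation; a minimal C sketch follows, with the hypothetical function name `euler_consistent`. For a single connected graph (c = 1) the generalized form reduces to p – e + n = 2.

```c
/* Generalized Euler criterion for a planar graph made of c non-connected parts:
   p - e + n - c == 1, with p polygons, e edges and n nodes.
   Returns 1 if the counts are consistent, 0 otherwise.
   Passing the check is necessary but not sufficient for a valid topology. */
int euler_consistent(int p, int e, int n, int c)
{
    return p - e + n - c == 1;
}
```

Both worked examples pass: `euler_consistent(4, 8, 6, 1)` and `euler_consistent(6, 11, 9, 3)`.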

Representation of a Graph…

• Data structure: table
– Simple graph (symmetric table)

[Figure: the simple graph on nodes A to F]

     A  B  C  D  E  F
A    1  1  0  1  0  0
B    1  1  1  0  0  0
C    0  1  1  1  1  0
D    1  0  1  1  1  0
E    0  0  1  1  1  1
F    0  0  0  0  1  1

Representation of a Graph…

• Data structure: table
– Directed graph

[Figure: the directed graph on nodes A to F]

Rows: from; columns: to

     A  B  C  D  E  F
A    1  0  0  0  0  0
B    1  1  1  0  0  0
C    0  0  1  1  0  0
D    1  0  0  1  1  0
E    0  0  1  0  1  1
F    0  0  0  0  1  1

Representation of a Graph…

• Data structure: table
– Directed and weighted graph

[Figure: the directed graph on nodes A to F with edge weights 5, 3, 8, 4, 6, 2, 9, 7]

Rows: from; columns: to; “-” marks a missing edge

     A  B  C  D  E  F
A    0  -  -  -  -  -
B    5  0  3  -  -  -
C    -  -  0  8  -  -
D    2  -  -  0  6  -
E    -  -  4  -  0  9
F    -  -  -  -  7  0
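The weighted table maps directly onto a two-dimensional C array. In the sketch below the weights are read from the slide's table with rows as the "from" node, which agrees with the B → C → D → E cost of 3+8+6=17 quoted earlier; the identifiers `NODE_A` to `NODE_F`, `NO_EDGE` and `path_cost` are assumptions made for this illustration.

```c
#include <stddef.h>

enum { NODE_A, NODE_B, NODE_C, NODE_D, NODE_E, NODE_F, NODE_COUNT };

#define NO_EDGE (-1)    /* marks a missing edge */

/* weight[from][to], as read from the table */
static const int weight[NODE_COUNT][NODE_COUNT] = {
    /*            A        B        C        D        E        F   */
    /* A */ {       0, NO_EDGE, NO_EDGE, NO_EDGE, NO_EDGE, NO_EDGE },
    /* B */ {       5,       0,       3, NO_EDGE, NO_EDGE, NO_EDGE },
    /* C */ { NO_EDGE, NO_EDGE,       0,       8, NO_EDGE, NO_EDGE },
    /* D */ {       2, NO_EDGE, NO_EDGE,       0,       6, NO_EDGE },
    /* E */ { NO_EDGE, NO_EDGE,       4, NO_EDGE,       0,       9 },
    /* F */ { NO_EDGE, NO_EDGE, NO_EDGE, NO_EDGE,       7,       0 },
};

/* Cost of walking path[0] -> path[1] -> ... -> path[len-1].
   Returns -1 if two consecutive nodes are not connected by an edge. */
int path_cost(const int *path, size_t len)
{
    int total = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        int step = weight[path[i]][path[i + 1]];
        if (step == NO_EDGE)
            return -1;
        total += step;
    }
    return total;
}
```

The table gives direct access to any edge weight, at the price of storing a cell for every pair of nodes, connected or not.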

Representation of a Graph…

• Data structure: linked-list
– Directed graph

[Figure: the directed graph on nodes A to F]

A →
B → A → C
C → D
D → A → E
E → C → F
F → E

Representation of a Graph…

• Data structure: linked-list
– Directed and weighted graph

[Figure: the directed graph on nodes A to F with edge weights 5, 3, 8, 4, 6, 2, 9, 7]

A →
B → A/5 → C/3
C → D/8
D → A/2 → E/6
E → C/4 → F/9
F → E/7
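The linked-list representation can be sketched in C with one list of outgoing arcs per node. The type and function names are hypothetical, nodes A to F are mapped to the indices 0 to 5, and new arcs are prepended, so the order inside a list may differ from the drawing.

```c
#include <stdlib.h>

enum { NODE_COUNT = 6 };           /* A..F mapped to 0..5 */

struct arc {
    int to;                        /* target node */
    int weight;                    /* cost of the edge */
    struct arc *next;              /* next outgoing arc of the same node */
};

struct graph {
    struct arc *adj[NODE_COUNT];   /* head of each node's list, NULL if empty */
};

/* Adds the directed edge from -> to. Returns 0 on success, -1 on allocation failure. */
int graph_add_arc(struct graph *g, int from, int to, int weight)
{
    struct arc *a = malloc(sizeof *a);
    if (a == NULL)
        return -1;
    a->to = to;
    a->weight = weight;
    a->next = g->adj[from];        /* prepend to the list of `from` */
    g->adj[from] = a;
    return 0;
}

/* Weight of the edge from -> to, or -1 if there is no such edge. */
int graph_weight(const struct graph *g, int from, int to)
{
    for (const struct arc *a = g->adj[from]; a != NULL; a = a->next)
        if (a->to == to)
            return a->weight;
    return -1;
}
```

Only existing edges take up memory, but looking up one edge requires a serial walk through the list of its start node.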

Shortest Path in a Network

• Dijkstra Algorithm
– Finds the shortest path from A to all graph nodes

[Figure: a weighted graph on nodes A to E with edge weights 10, 15, 10, 2, 4, 2, 8, 3; start node N0 = A]


Page 68: Semi-structured Data and XML in Geographic Data Modeling ...estef/UNB_Home_files/GISBookSlides/Chapter_8.pdf · –one data element, and –the address of the next node (one pointer)

Shortest Path in a Network

• Dijkstra Algorithm

– Finds the shortest path from A to all graph nodes

[Figure: weighted network with nodes A, B, C, D and E; edge weights 10, 15, 10, 2, 4, 2, 8 and 3]

Start node: N0 = A

Columns A to E hold the shortest distance from A found so far (- = empty cell).

Step  Visited (VN)  Next (NVN)  Curr (CN)   A   B   C   D   E
0     -             A           -           0   -   -   -   -
1     A             BCE         A           0   2  10   -  15
2     AB            CED         B           0   2   6  10  12
3     ABC           ED          C           0   2   6   8  12
4     ABCD          E           D           0   2   6   8  11
5     ABCDE         -           E           0   2   6   8  11


Shortest Path in a Network

• Dijkstra Algorithm

– Finds the shortest path from A to all graph nodes

[Figure and step table repeated from the previous slide; start node N0 = A]

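Find the shortest path from A to E by walking backwards from E. At each node, subtract the weight of an incoming edge; the predecessor is the node whose final distance equals the result:

E (11 - 3 = 8) -> D (8 - 2 = 6) -> C (6 - 4 = 2) -> B (2 - 2 = 0) -> A.

Hence the shortest path is A, B, C, D, E, with total length 11.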
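The walkthrough can be reproduced with a short C sketch of the array-based version of Dijkstra's algorithm. Several things here are assumptions rather than content of the slides: the edge list (A-B 2, A-C 10, A-E 15, B-C 4, B-D 8, B-E 10, C-D 2, D-E 3) is inferred from the distance columns of the step table, the edges are treated as undirected, and the names `w`, `dijkstra` and `path_to` are hypothetical. Instead of subtracting weights on the way back, the sketch records each node's predecessor while distances are updated, which yields the same path.

```c
#include <limits.h>

#define K 5         /* number of nodes: A..E at indices 0..4 */
#define INF INT_MAX /* "not reached yet" */

/* w[i][j] is the edge weight between i and j; 0 means no edge (assumed). */
static const int w[K][K] = {
    /*        A   B   C   D   E */
    /* A */ {  0,  2, 10,  0, 15 },
    /* B */ {  2,  0,  4,  8, 10 },
    /* C */ { 10,  4,  0,  2,  0 },
    /* D */ {  0,  8,  2,  0,  3 },
    /* E */ { 15, 10,  0,  3,  0 },
};

/* Fills dist[] with shortest distances from start and prev[] with the
   predecessor of each node on its shortest path (-1 for none). */
void dijkstra(int start, int dist[K], int prev[K]) {
    int visited[K] = {0};
    for (int i = 0; i < K; i++) {
        dist[i] = INF;
        prev[i] = -1;
    }
    dist[start] = 0;

    for (int step = 0; step < K; step++) {
        /* Current node: the closest reached node that is not yet visited. */
        int cur = -1;
        for (int i = 0; i < K; i++)
            if (!visited[i] && dist[i] != INF &&
                (cur == -1 || dist[i] < dist[cur]))
                cur = i;
        if (cur == -1) break; /* the remaining nodes are unreachable */
        visited[cur] = 1;

        /* Relax every edge that leaves the current node. */
        for (int j = 0; j < K; j++)
            if (w[cur][j] > 0 && !visited[j] &&
                dist[cur] + w[cur][j] < dist[j]) {
                dist[j] = dist[cur] + w[cur][j];
                prev[j] = cur;
            }
    }
}

/* Writes the path from the start node to target as letters into out
   (room for K + 1 chars); returns its length, or 0 if target is unreachable. */
int path_to(int target, const int dist[K], const int prev[K], char out[K + 1]) {
    if (dist[target] == INF) {
        out[0] = '\0';
        return 0;
    }
    int len = 0;
    for (int v = target; v != -1; v = prev[v]) len++;
    out[len] = '\0';
    int i = len;
    for (int v = target; v != -1; v = prev[v]) out[--i] = (char)('A' + v);
    return len;
}
```

Calling `dijkstra(0, dist, prev)` and then `path_to(4, dist, prev, path)` corresponds to the query "shortest path from A to E". The two nested loops over the K nodes are where the quadratic running time comes from.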


Shortest Path in a Network

• Dijkstra Algorithm

– Complexity: O(K^2) for the basic array-based version

• where K is the number of nodes.

• Very slow in large networks!

• Artificial Intelligence:

– A* Algorithm (heuristics)


Graph problems

• Other problems solved using graphs…

– TSP problem

• Traveling salesman problem

– Coloring the polygons of a thematic map using as few colors as possible, so that no two adjacent polygons share a color.

[Figure: weighted graph with nodes A, B, C, D, E, F and G; edge weights wAF, wAB, wBD, wCD, wAG, wBC, wFE, wEG, wCE, wCG and wFD]
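The map-coloring problem can be illustrated with a greedy heuristic in C. This is a sketch under stated assumptions: the five-polygon adjacency matrix is made-up data, the names `adjacent` and `greedy_coloring` are hypothetical, and the greedy rule is a simple heuristic that does not guarantee the minimum number of colors in general.

```c
#define P 5 /* number of polygons in the hypothetical map */

/* adjacent[i][j] = 1 when polygons i and j share a boundary (made-up data). */
static const int adjacent[P][P] = {
    { 0, 1, 1, 0, 0 },
    { 1, 0, 1, 1, 0 },
    { 1, 1, 0, 1, 1 },
    { 0, 1, 1, 0, 1 },
    { 0, 0, 1, 1, 0 },
};

/* Greedy rule: visit the polygons in index order and give each one the
   smallest color index not used by an already colored neighbor.
   Returns the number of colors used. */
int greedy_coloring(int color[P]) {
    int used_colors = 0;
    for (int i = 0; i < P; i++) color[i] = -1; /* -1 = not colored yet */

    for (int i = 0; i < P; i++) {
        int taken[P] = {0}; /* taken[c] = 1 if a neighbor already has color c */
        for (int j = 0; j < P; j++)
            if (adjacent[i][j] && color[j] != -1) taken[color[j]] = 1;

        int c = 0;
        while (taken[c]) c++; /* at most P - 1 neighbors, so c stays below P */
        color[i] = c;
        if (c + 1 > used_colors) used_colors = c + 1;
    }
    return used_colors;
}
```

The result depends on the order in which the polygons are visited; a different order may need more or fewer colors.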


Topological Data Structures

• In GIS we usually…

– build topological data structures to accelerate processing

– Topological relationships can also be extracted through processing…

• however, this is time-consuming, and

• this is why they are sometimes pre-calculated


Topological Data Structures

• A planar graph

• The description of its topology (data structure)


Topological Data Structures

• Some typical queries

– Supported by the topological data structure

GENERAL QUERY | EXAMPLE
Find the neighbors of a polygon. | Which parcels meet the park?
Is there any hole in the polygon? | Which are the islands of this lake?
Are the two nodes connected? | Is there a road connecting the two towns?


Topological Data Structures

• Some typical queries

– Supported by the topological data structure

GENERAL QUERY | EXAMPLE
Which is the shortest path from node A to node B? | Which is the shortest route from town A to town B?
Which edges meet at this node? | Which roads end at this square?
Which is the intersection of two polygons A and B? | Which area has an annual average temperature higher than 10 °C and is a forest?
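Two of these queries can be sketched in C on a generic arc-node structure, in which every arc stores its end nodes and the polygons on its left and right. This is an illustration only and not the CARIS structure: the three-arc data set (two parcels, 1 and 2, separated by arc 1), the use of 0 for the outside area, and the names `struct arc`, `arcs`, `neighbors_of` and `arcs_at_node` are all hypothetical.

```c
#define ARC_COUNT 3

/* One arc of a planar graph: its end nodes and the polygon on each side. */
struct arc {
    int id;
    int from_node, to_node;
    int left_poly, right_poly; /* 0 stands for the outside area (assumed) */
};

/* Made-up data: polygon 1 lies west and polygon 2 east of arc 1, which runs
   from node 1 (north) to node 2 (south); arcs 2 and 3 are outer boundaries. */
static const struct arc arcs[ARC_COUNT] = {
    { 1, 1, 2, 2, 1 },
    { 2, 2, 1, 0, 1 },
    { 3, 2, 1, 2, 0 },
};

/* "Find the neighbors of a polygon": collects the distinct polygons found on
   the other side of poly's arcs into out[]; returns how many were written. */
int neighbors_of(int poly, int out[], int max_out) {
    int count = 0;
    for (int i = 0; i < ARC_COUNT; i++) {
        int other;
        if (arcs[i].left_poly == poly) other = arcs[i].right_poly;
        else if (arcs[i].right_poly == poly) other = arcs[i].left_poly;
        else continue;              /* arc does not bound this polygon */
        if (other == poly) continue; /* same polygon on both sides */

        int seen = 0;
        for (int k = 0; k < count; k++)
            if (out[k] == other) seen = 1;
        if (!seen && count < max_out) out[count++] = other;
    }
    return count;
}

/* "Which edges meet at this node?": collects the ids of the arcs that start
   or end at node into out[]; returns how many were written. */
int arcs_at_node(int node, int out[], int max_out) {
    int count = 0;
    for (int i = 0; i < ARC_COUNT; i++)
        if ((arcs[i].from_node == node || arcs[i].to_node == node) &&
            count < max_out)
            out[count++] = arcs[i].id;
    return count;
}
```

Both queries only scan the stored arc table and never touch coordinates, which is the benefit of pre-calculating the topology.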


Topological Data Structures

• An example structure (in CARIS)

[Figures: the example structure, shown over three slides]

Data Structures and Algorithms

Emmanuel Stefanakis

http://www2.unb.ca/~estef/

Stefanakis, E., 2014. Geographic Databases and Information Systems.

CreateSpace Independent Publ. [In English], pp.386.

Get a copy from Amazon

Chapter 8