Bit Sequential (bSQ) Data Model and Peano Count Trees (P-trees)

Department of Computer Science, North Dakota State University, USA

(the bSQ and P-tree technology is patented by NDSU)



Background on Spatial Data

• Pixel – a point in a space
• Band – a feature attribute of the pixels
• Value – usually one byte (0 to 255)
• Images have different numbers of bands:
– TM4/5: 7 bands (B, G, R, NIR, MIR, TIR, MIR2)
– TM7: 8 bands (B, G, R, NIR, MIR, TIR, MIR2, PC)
– TIFF: 3 bands (B, G, R)
– Ground data: individual bands (Yield, Moisture, Nitrate level, Temperature, Elevation, …)


RSI dataset example

[Figure: TIFF image and Yield Map]

RSI data can be viewed as a collection of pixels, each of which has a value for each feature attribute.

For example, the RSI dataset above has 320 rows and 320 columns of pixels (102,400 pixels) and 4 feature attributes (B, G, R, Y). The (B, G, R) feature bands are in the TIFF image, and the Y (yield) feature is color-coded in the Yield Map.


Spatial Data Formats

Existing formats
– BSQ (Band Sequential)
– BIL (Band Interleaved by Line)
– BIP (Band Interleaved by Pixel)

New format
– bSQ (bit Sequential)


Spatial Data Formats (Cont.)

BAND-1 (2 × 2 pixels):
254 (1111 1110)   127 (0111 1111)
14 (0000 1110)   193 (1100 0001)

BAND-2 (2 × 2 pixels):
37 (0010 0101)   240 (1111 0000)
200 (1100 1000)   19 (0001 0011)

BSQ format (2 files)

Band 1: 254 127 14 193
Band 2: 37 240 200 19


BIL format (1 file)

254 127 37 240 14 193 200 19


BIP format (1 file)

254 37 127 240 14 200 193 19


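bSQ format (16 files; each column Bij is one file holding bit j of band i, bit 1 = most significant; one row per pixel in raster order)

B11 B12 B13 B14 B15 B16 B17 B18 B21 B22 B23 B24 B25 B26 B27 B28
 1   1   1   1   1   1   1   0   0   0   1   0   0   1   0   1
 0   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0
 0   0   0   0   1   1   1   0   1   1   0   0   1   0   0   0
 1   1   0   0   0   0   0   1   0   0   0   1   0   0   1   1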
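The four layouts can be reproduced from the two example bands with a few lines of Python. This is a minimal sketch, assuming each band is a 2 × 2 image stored in raster (row-major) order and that bit 1 is the most significant bit, as in the bSQ example; the function names `bsq`, `bil`, `bip`, and `bsq_bits` are hypothetical helpers, not part of any NDSU software.

```python
# Sketch only: function names are hypothetical, not from the original slides.
BAND1 = [[254, 127], [14, 193]]  # 2 x 2 image, row by row
BAND2 = [[37, 240], [200, 19]]
BANDS = [BAND1, BAND2]


def bsq(bands):
    """BSQ: one sequence per band, pixels in raster order."""
    return [[v for row in band for v in row] for band in bands]


def bil(bands):
    """BIL: for each image row, emit that row of every band in turn."""
    out = []
    for r in range(len(bands[0])):
        for band in bands:
            out.extend(band[r])
    return out


def bip(bands):
    """BIP: for each pixel in raster order, emit its value in every band."""
    out = []
    for r in range(len(bands[0])):
        for c in range(len(bands[0][0])):
            out.extend(band[r][c] for band in bands)
    return out


def bsq_bits(bands, nbits=8):
    """bSQ: one bit sequence per (band i, bit j), keyed 'Bij'; bit 1 is the MSB."""
    files = {}
    for i, values in enumerate(bsq(bands), start=1):
        for j in range(1, nbits + 1):
            files[f"B{i}{j}"] = [(v >> (nbits - j)) & 1 for v in values]
    return files


if __name__ == "__main__":
    print("BSQ:", bsq(BANDS))
    print("BIL:", bil(BANDS))
    print("BIP:", bip(BANDS))
    for name, bits in bsq_bits(BANDS).items():
        print(name, bits)
```

Each bSQ file is simply one column of the bSQ table: it holds a single bit of every pixel, so 2 bands of 8-bit values produce 16 files.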


bSQ Format

Split each band into eight separate files, one for each bit position.

Reasons for using the bSQ format:
– Different bits contribute to the value differently: in an 8-bit value, bit j carries weight 2^(8-j), so bit 1 matters most.
– The bSQ format facilitates the representation of a precision hierarchy (from 1-bit up to 8-bit precision).
– The bSQ format facilitates the creation of an efficient data structure (the P-tree), the P-tree algebra, and the T-cube.
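The precision hierarchy can be illustrated by reading only the first k bit files of a band. This is a minimal sketch, assuming "k-bit precision" means keeping the k most significant bits and treating the unread low bits as 0; `truncate` and `from_bit_files` are hypothetical names.

```python
# Sketch only: function names are hypothetical, not from the original slides.


def truncate(value: int, k: int, nbits: int = 8) -> int:
    """Value as seen through only its k most significant bits (low bits zeroed)."""
    shift = nbits - k
    return (value >> shift) << shift


def from_bit_files(bits, nbits: int = 8) -> int:
    """Rebuild a value from bits B1..Bk (most significant first); missing low bits are 0."""
    value = 0
    for j, b in enumerate(bits, start=1):
        value |= b << (nbits - j)  # bit j carries weight 2**(nbits - j)
    return value


if __name__ == "__main__":
    # Precision hierarchy for the pixel value 254 (1111 1110).
    for k in range(1, 9):
        print(k, truncate(254, k))
```

For 254, 1-bit precision gives 128, 3-bit precision gives 224, and 8-bit precision recovers 254 exactly; each extra bit file refines the previous approximation.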


The “tabular” formats (inverted list)

BSQ and bSQ are “tabular” formats:
– BSQ consists of a separate table for each feature band.
– bSQ consists of a separate table for each bit of each band.

One can view it this way:
– The data set is initially one “relation” or table, R(K1, …, Kk, A1, A2, …, An), where K1, …, Kk are the structure attributes and each Ai is a feature attribute.

• The structure attributes of a 2-D image are the X and Y coordinates of the pixels (each pixel is a row of R).

• The feature attributes are the bands: B, G, R, NIR, …

• In BSQ we separate each feature into a separate file and suppress the structure attributes altogether (under the assumption that the pixels are always arranged in raster order).

• In bSQ we separate each bit of each feature into a separate file (same raster order assumption).
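Suppressing the structure attributes loses nothing, because under the raster-order assumption a value's position in the file determines its coordinates. A minimal sketch, assuming X is the column, Y is the row, and the image width is known; `suppress_structure` and `structure_of` are hypothetical names.

```python
# Sketch only: function names are hypothetical, not from the original slides.


def suppress_structure(relation, width, height, nbands):
    """Turn R(X, Y, A1..An) into one raster-ordered list per feature attribute.

    relation: iterable of (x, y, a1, ..., an) tuples in any order, where x is the
    column and y is the row (an assumption). X and Y are dropped from the output."""
    files = [[None] * (width * height) for _ in range(nbands)]
    for x, y, *features in relation:
        pos = y * width + x  # raster (row-major) position
        for b, value in enumerate(features):
            files[b][pos] = value
    return files


def structure_of(pos, width):
    """Recover the suppressed structure attributes (x, y) from a raster position."""
    y, x = divmod(pos, width)
    return x, y


if __name__ == "__main__":
    # The 2 x 2, 2-band example as an unordered relation R(X, Y, B1, B2).
    R = [(1, 1, 193, 19), (0, 0, 254, 37), (1, 0, 127, 240), (0, 1, 14, 200)]
    print(suppress_structure(R, 2, 2, 2))
```

The same idea carries over to bSQ: splitting each band list further into bit lists keeps the raster positions, so coordinates remain recoverable.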


Peano Count Tree (P-tree)

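A P-tree represents spatial bSQ data bit by bit in a recursive quadrant-by-quadrant arrangement.

A P-tree is a lossless representation of the original data.

A P-tree is a compressed structure: pure-1 and pure-0 quadrants are not subdivided further.

A P-tree is “count pre-computed”: each node stores the number of 1-bits in its quadrant.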


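An example of a P-tree

Key terms: Peano (Z) ordering, pure (pure-1/pure-0) quadrant, root count, level, fan-out, QID (quadrant ID).

8 × 8 bit matrix:
1 1 1 1 1 1 0 0
1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0
1 1 1 1 1 1 1 0
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1

P-tree (fan-out 4; quadrants in Peano order: 0 = upper-left, 1 = upper-right, 2 = lower-left, 3 = lower-right):
Level 0 (root count): 55
Level 1: 16 8 15 16
Level 2: 3 0 4 1 (children of 8); 4 4 3 4 (children of 15)
Level 3: 1 1 1 0 (children of the first 3); 0 0 1 0 (children of 1); 1 1 0 1 (children of the second 3)

Pure quadrants (a count of 0, or a count equal to the quadrant's pixel count) have no children.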
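The counts in this example can be reproduced by recursive quadrant splitting. This is a minimal sketch, assuming a square image whose side is a power of two and a fan-out of 4 all the way down to single pixels; `PNode`, `build_ptree`, and `levels` are hypothetical names, and the in-memory layout is for illustration only, not the patented storage format.

```python
# Sketch only: PNode, build_ptree and levels are hypothetical names.
from dataclasses import dataclass, field
from typing import List


@dataclass
class PNode:
    count: int  # pre-computed number of 1-bits in this quadrant
    size: int   # number of pixels in this quadrant
    children: List["PNode"] = field(default_factory=list)  # empty for pure quadrants


def build_ptree(bits, row=0, col=0, n=None):
    """Build a P-tree over the n x n block of `bits` whose top-left pixel is (row, col).

    n must be a power of two. Children are visited in Peano (Z) order:
    upper-left, upper-right, lower-left, lower-right."""
    if n is None:
        n = len(bits)
    if n == 1:
        return PNode(bits[row][col], 1)
    h = n // 2
    kids = [build_ptree(bits, row + dr, col + dc, h)
            for dr, dc in ((0, 0), (0, h), (h, 0), (h, h))]
    count = sum(k.count for k in kids)
    if count in (0, n * n):
        kids = []  # pure-0 or pure-1: the count alone describes the quadrant
    return PNode(count, n * n, kids)


def levels(root):
    """Counts level by level (breadth-first); pure quadrants contribute no children."""
    out, frontier = [], [root]
    while frontier:
        out.append([node.count for node in frontier])
        frontier = [child for node in frontier for child in node.children]
    return out


BITS = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 1],
]

if __name__ == "__main__":
    for level in levels(build_ptree(BITS)):
        print(level)
```

The printed levels are 55; 16 8 15 16; 3 0 4 1 4 4 3 4; and the twelve leaf bits, matching the tree above.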


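Quadrant IDs: at every level the four sub-quadrants are numbered 0, 1, 2, 3 in Peano order. For example, the pixel at (row, column) = (7, 1) has binary coordinates (111, 001). Interleaving the row and column bits gives 10.10.11, so its QID is 2.2.3.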
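The bit-interleaving rule for QIDs fits in a few lines. This is a minimal sketch, assuming 0-based (row, column) coordinates and that each level's digit is formed as 2 × row bit + column bit, which matches the 2.2.3 example; `qid` is a hypothetical name.

```python
# Sketch only: qid is a hypothetical name, not from the original slides.


def qid(row, col, depth):
    """Quadrant ID of pixel (row, col) in a 2**depth x 2**depth image.

    At each level, take the next row bit and column bit (most significant first)
    and read them as a two-bit quadrant number: 2 * row_bit + col_bit."""
    digits = []
    for level in range(depth - 1, -1, -1):
        row_bit = (row >> level) & 1
        col_bit = (col >> level) & 1
        digits.append(str(2 * row_bit + col_bit))
    return ".".join(digits)


if __name__ == "__main__":
    print(qid(7, 1, 3))
```

Running it prints 2.2.3, the QID of pixel (7, 1) in the 8 × 8 example.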


P-tree variation – PM-tree

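The Peano Mask tree (PM-tree) uses a mask instead of a count: 1 denotes pure-1, 0 denotes pure-0, and m denotes mixed. It provides an efficient way to AND P-trees: a pure-0 node in either operand makes that quadrant pure-0 in the result, and a pure-1 node simply passes the other operand's quadrant through.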

PM-tree for the same 8 × 8 bit matrix as in the P-tree example:
Level 0: m
Level 1: 1 m m 1
Level 2: m 0 1 m (children of the first m); 1 1 m 1 (children of the second m)
Level 3: 1 1 1 0; 0 0 1 0; 1 1 0 1 (children of the three mixed level-2 quadrants)
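A PM-tree can be derived with the same quadrant recursion as a P-tree, storing a mask symbol instead of a count. This is a minimal sketch, assuming a power-of-two square bit matrix and representing a node as the string '1' or '0' or as the tuple ('m', children); `build_pmtree` and `pm_levels` are hypothetical names.

```python
# Sketch only: build_pmtree and pm_levels are hypothetical names.


def build_pmtree(bits, row=0, col=0, n=None):
    """PM-tree node for the n x n block at (row, col): '1' (pure-1), '0' (pure-0),
    or ('m', [four children in Peano order]) for a mixed quadrant."""
    if n is None:
        n = len(bits)
    if n == 1:
        return str(bits[row][col])
    h = n // 2
    kids = [build_pmtree(bits, row + dr, col + dc, h)
            for dr, dc in ((0, 0), (0, h), (h, 0), (h, h))]
    if all(k == "1" for k in kids):
        return "1"
    if all(k == "0" for k in kids):
        return "0"
    return ("m", kids)


def pm_levels(root):
    """Mask symbols level by level, descending only into mixed nodes."""
    out, frontier = [], [root]
    while frontier:
        out.append(" ".join(k if isinstance(k, str) else "m" for k in frontier))
        frontier = [c for k in frontier if not isinstance(k, str) for c in k[1]]
    return out


BITS = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 1],
]

if __name__ == "__main__":
    for line in pm_levels(build_pmtree(BITS)):
        print(line)
```

Equivalently, a PM-tree node is 0 when the P-tree count is 0, 1 when the count equals the quadrant size, and m otherwise.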


Ptree Algebra

– And
– Or
– Complement
– Other (XOR, etc.)

Ptree:
Level 0: 55
Level 1: 16 8 15 16
Level 2: 3 0 4 1 (under 8); 4 4 3 4 (under 15)
Level 3: 1110 (under the first 3); 0010 (under 1); 1101 (under the second 3)

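Complement (each count c of a quadrant with n pixels becomes n − c; the tree shape is unchanged):
Level 0: 9
Level 1: 0 8 1 0
Level 2: 1 4 0 3 (under 8); 0 0 1 0 (under 1)
Level 3: 0001 (under the first 1); 1101 (under 3); 0010 (under the second 1)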
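The complement rule can be checked mechanically on the example tree. This is a minimal sketch, assuming a node is written as a (count, children) pair, with an empty child list for pure quadrants and single pixels, and that each child covers a quarter of its parent's pixels; `complement`, `leaf`, and `levels` are hypothetical names.

```python
# Sketch only: complement, leaf and levels are hypothetical names.


def complement(node, size):
    """node = (count, children). A quadrant of `size` pixels with count c becomes
    size - c; pure quadrants stay childless, so the tree shape is unchanged."""
    count, children = node
    return (size - count, [complement(child, size // 4) for child in children])


def leaf(*bits):
    """Four single-pixel children of a mixed 2 x 2 quadrant."""
    return [(b, []) for b in bits]


def levels(node):
    """Counts level by level (breadth-first)."""
    out, frontier = [], [node]
    while frontier:
        out.append([count for count, _ in frontier])
        frontier = [child for _, kids in frontier for child in kids]
    return out


# The 8 x 8 example P-tree (64 pixels).
PTREE = (55, [
    (16, []),
    (8, [(3, leaf(1, 1, 1, 0)), (0, []), (4, []), (1, leaf(0, 0, 1, 0))]),
    (15, [(4, []), (4, []), (3, leaf(1, 1, 0, 1)), (4, [])]),
    (16, []),
])

if __name__ == "__main__":
    for level in levels(complement(PTREE, 64)):
        print(level)
```

Because no node is added or removed, the complement costs one subtraction per stored node, and applying it twice returns the original tree.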


Ptree ANDing Operation

PM-tree1:
Level 0: m
Level 1: 1 m m 1
Level 2: m 0 1 m; 1 1 m 1
Level 3: 1110; 0010; 1101

PM-tree2:
Level 0: m
Level 1: 1 0 m 0
Level 2: 1 1 1 m
Level 3: 0100

Result (PM-tree1 AND PM-tree2):
Level 0: m
Level 1: 1 0 m 0
Level 2: 1 1 m m
Level 3: 1101; 0100
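The ANDing rules behind this result can be written as a short recursion. This is a minimal sketch, assuming PM-tree nodes are the strings '1' and '0' or the tuple ('m', children) with children in Peano order; `pm_and`, `T1`, and `T2` are hypothetical names, with `T1` and `T2` encoding PM-tree1 and PM-tree2.

```python
# Sketch only: pm_and, T1 and T2 are hypothetical names.


def pm_and(a, b):
    """AND two PM-trees whose nodes are '1', '0', or ('m', [four children])."""
    if a == "0" or b == "0":
        return "0"  # pure-0 in either operand decides the whole quadrant
    if a == "1":
        return b    # pure-1 passes the other quadrant through unchanged
    if b == "1":
        return a
    kids = [pm_and(x, y) for x, y in zip(a[1], b[1])]
    # Two mixed quadrants can AND to pure-0, but never to pure-1
    # (each operand contains at least one 0 bit).
    if all(k == "0" for k in kids):
        return "0"
    return ("m", kids)


T1 = ("m", ["1",
            ("m", [("m", list("1110")), "0", "1", ("m", list("0010"))]),
            ("m", ["1", "1", ("m", list("1101")), "1"]),
            "1"])
T2 = ("m", ["1", "0", ("m", ["1", "1", "1", ("m", list("0100"))]), "0"])

if __name__ == "__main__":
    print(pm_and(T1, T2))
```

Only quadrants that are mixed in both operands need to be visited further; here that is just quadrant 2, where the mixed leaves 1101 and 0100 are each paired with a pure-1 node and copied through.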

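Depth-first pure-1 path code (each entry is the quadrant path of a pure-1 quadrant or pixel):

PM-tree1: 0 100 101 102 12 132 20 21 220 221 223 23 3
PM-tree2: 0 20 21 22 231
Matching pairs: 0 & 0 → 0; 20 & 20 → 20; 21 & 21 → 21; 220 221 223 & 22 → 220 221 223; 23 & 231 → 231
RESULT: 0 20 21 220 221 223 231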
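The matching pairs suggest one way to AND two path codes directly: a quadrant is pure-1 in the result when one input's path is a prefix of the other's, and the longer (more specific) path is kept. This is a minimal sketch under that reading, using a simple all-pairs comparison rather than a merge; `path_and` is a hypothetical name.

```python
# Sketch only: path_and is a hypothetical name; all-pairs comparison for clarity.


def path_and(p1, p2):
    """AND two depth-first pure-1 path codes (lists of digit strings).

    Within one input the pure-1 quadrants are disjoint, so a pure-1 region of the
    result exists exactly where a path of one input is a prefix of a path of the
    other; the longer path describes that region."""
    result = set()
    for a in p1:
        for b in p2:
            if b.startswith(a):
                result.add(b)
            elif a.startswith(b):
                result.add(a)
    # Lexicographic order of digit strings is depth-first (pre-order) order.
    return sorted(result)


if __name__ == "__main__":
    P1 = "0 100 101 102 12 132 20 21 220 221 223 23 3".split()
    P2 = "0 20 21 22 231".split()
    print(" ".join(path_and(P1, P2)))
```

Running it prints 0 20 21 220 221 223 231, the same RESULT as the PM-tree AND.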


Basic, Value and Tuple Ptrees

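Basic, value, and tuple Ptrees form a hierarchy in which each level is built by ANDing the one below (Pij is the basic Ptree for bit j of band i, and ’ denotes the complement):
– Basic Ptrees (e.g., P11, P12, …, P18, P21, …, P28, …, P71, …, P78)
– Value Ptrees, ANDs of basic Ptrees (e.g., P1,001 = P11’ AND P12’ AND P13)
– Tuple Ptrees, ANDs of value Ptrees (e.g., P001,010,111 = P1,001 AND P2,010 AND P3,111)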
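The construction of value and tuple Ptrees can be sketched with flat bit vectors standing in for P-trees: ANDing two P-trees yields the P-tree of the ANDed bits, and the root count is the number of 1s. This is a minimal sketch under that simplification, using the 2 × 2 example bands in raster order; `bit_plane`, `value_ptree`, and `tuple_ptree` are hypothetical names.

```python
# Sketch only: flat bit vectors replace trees; all names are hypothetical.


def bit_plane(values, j, nbits=8):
    """Basic Ptree P_ij as a flat vector: bit j (1 = most significant) of each pixel."""
    return [(v >> (nbits - j)) & 1 for v in values]


def value_ptree(values, pattern, nbits=8):
    """P_i,pattern: AND of P_ij where pattern bit j is 1, and of P_ij' where it is 0."""
    result = [1] * len(values)
    for j, ch in enumerate(pattern, start=1):
        plane = bit_plane(values, j, nbits)
        if ch == "0":
            plane = [1 - b for b in plane]  # complement P_ij'
        result = [r & b for r, b in zip(result, plane)]
    return result


def tuple_ptree(bands, patterns, nbits=8):
    """Tuple Ptree: AND of one value Ptree per band."""
    result = [1] * len(bands[0])
    for values, pattern in zip(bands, patterns):
        result = [r & b for r, b in zip(result, value_ptree(values, pattern, nbits))]
    return result


BAND1 = [254, 127, 14, 193]
BAND2 = [37, 240, 200, 19]

if __name__ == "__main__":
    t = tuple_ptree([BAND1, BAND2], ["111", "001"])
    print(t, "root count:", sum(t))
```

Only pixel 1 (254 in band 1, 37 in band 2) has the 3-bit values 111 and 001, so the tuple Ptree is 1 0 0 0 with root count 1.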


Algorithm: Build the set of confident rules, C (initially empty), as follows:

– Start with 1-bit values and 2 bands;
– then 1-bit values and 3 bands; …
– then 2-bit values and 2 bands;
– then 2-bit values and 3 bands; …
– …
– At each stage defined above, do the following:

• Find all confident rules by rolling up the T-cube along each potential consequent set using summation.

• Compare these sums with the support threshold to isolate rule support sets with the minimum support.

• Compare the normalized T-cube values (divide by the rolled-up sum) with the minimum confidence level to isolate the confident rules.

• Place any new confident rule in C, but only if its rank is higher than that of any of its generalizations already in C.
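One stage of this procedure, for 2 bands at k-bit precision, can be sketched as follows. This is a minimal sketch under stated assumptions: the T-cube is a table of pixel counts for each pair of k-bit values (the root counts of the corresponding tuple Ptrees), the consequent is the second band, the rolled-up sum compared with the support threshold is the antecedent count, and the generalization/rank check of the last step is not modeled. `tcube` and `confident_rules` are hypothetical names, and the data are the toy example bands.

```python
# Sketch only: one stage, 2 bands; tcube and confident_rules are hypothetical names.
from collections import Counter


def tcube(band_a, band_b, k, nbits=8):
    """2-D T-cube: pixel count for each (a, b) pair of k-bit values."""
    shift = nbits - k
    return Counter((a >> shift, b >> shift) for a, b in zip(band_a, band_b))


def confident_rules(cube, min_support, min_confidence):
    """Rules a => b with the consequent on the second axis of the cube."""
    rolled = Counter()
    for (a, _), n in cube.items():
        rolled[a] += n  # roll up along the consequent axis by summation
    rules = []
    for (a, b), n in cube.items():
        if rolled[a] < min_support:
            continue  # rolled-up sum below the support threshold
        confidence = n / rolled[a]  # normalized T-cube value
        if confidence >= min_confidence:
            rules.append((a, b, confidence))
    return sorted(rules)


BAND1 = [254, 127, 14, 193]
BAND2 = [37, 240, 200, 19]

if __name__ == "__main__":
    for k in (1, 2):
        print(k, confident_rules(tcube(BAND1, BAND2, k), 2, 0.8))
```

Moving from 1-bit to 2-bit values refines each cube cell into four, which mirrors the stage order listed above.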