Spatial Data What is special about Spatial Data? Briggs Henan University 2012 1


What is needed for spatial analysis?

1. Location information: a map
2. An attribute dataset: e.g. population, rainfall
3. Links between the locations and the attributes
4. Spatial proximity information
– Knowledge about relative spatial location
– Topological information

Topology --knowledge about relative spatial positioning
Topography --the form of the land surface, in particular, its elevation

Berry’s geographic matrix

[Figure: Berry's geographic matrix. Rows are locations (areal unit 1, areal unit 2, ..., areal unit n; for example Henan, Shanxi). Columns are attributes or variables (Variable 1, Variable 2, ..., Variable P; for example Population, Income). A third axis is time (1990, 2000, 2010). Annotations on the figure: geographic fact, geographic distribution, geographic associations.]

Berry, B.J.L. 1964. Approaches to regional analysis: A synthesis. Annals of the Association of American Geographers, 54, pp. 2-11.
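Berry's matrix can be sketched as a nested lookup table indexed by time, location and variable. This is only an illustration: the numbers are placeholders (not real statistics), the function names are hypothetical, and reading "geographic fact" as one cell and "geographic distribution" as one column is an interpretation of the figure labels.

```python
# Berry's geographic matrix as a nested dictionary: time -> location -> variable.
# All values are illustrative placeholders, NOT real data.
matrix = {
    1990: {"Henan": {"Population": 1.0, "Income": 10.0},
           "Shanxi": {"Population": 2.0, "Income": 20.0}},
    2000: {"Henan": {"Population": 1.5, "Income": 15.0},
           "Shanxi": {"Population": 2.5, "Income": 25.0}},
}


def geographic_fact(matrix, time, location, variable):
    """One cell: a single variable at a single location and time."""
    return matrix[time][location][variable]


def geographic_distribution(matrix, time, variable):
    """One column: a single variable across all locations at one time."""
    return {loc: attrs[variable] for loc, attrs in matrix[time].items()}


def change_over_time(matrix, location, variable):
    """The same cell followed through the time slices."""
    return {t: matrix[t][location][variable] for t in sorted(matrix)}


henan_income_1990 = geographic_fact(matrix, 1990, "Henan", "Income")
population_2000 = geographic_distribution(matrix, 2000, "Population")
shanxi_income = change_over_time(matrix, "Shanxi", "Income")
```

Comparing two columns (for example Population and Income) is where geographic associations would be studied.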

Admin_Name      Admin_Type   Code_GB  GMI_ADMIN  Area_km2  Area_mi2  Area_prcnt_CH  Area_prcnt_All  Pop2008   PopDenKM2_03
Anhui           Province     340000   ANH        139400    53800     1.44           1.44            61350000  463.5
Beijing         City         110000   BJN        16808     6490      0.17           0.17            22000000  1309
Chongqing       City                  CQG        82300     31800     0.85           0.85            31442300  379
Fujian          Province     350000   FUJ        121400    46900     1.26           1.26            36040000  289.2
Fujian, ROC     ROC                   PNG        182.66    70.51                    0.00            91261
Gansu           Province     620000   GAN        454000    175300    4.71           4.70            26281200  57.7
Guangdong       Province     440000   GND        177900    68700     1.84           1.84            95440000  467
Guangxi         Province_AR  450000   GNG        236700    91400     2.45           2.45            48160000  207
Guizhou         Province     520000   GUI        176100    68000     1.82           1.82            37927300  222
Hainan          Province     460000   HAI        33920     13100     0.35           0.35            8540000   241
Hebei           Province     130000   HEB        187700    72500     1.94           1.94            69888200  363
Heilongjiang    Province     230000   HLN        460000    177600    4.77           4.76            38253900  83
Henan           Province     410000   HEN        167000    64500     1.73           1.73            94290000  582
Hong Kong       SAR                   HKG        1104      422       0.011          0.01            7003700   6380
Hubei           Province     420000   HUB        185900    71800     1.93           1.92            57110000  324
Hunan           Province     430000   HUN        211800    81800     2.19           2.19            63800000  316
Inner Mongolia  Province_AR  150000   NMN        1183000   456800    12.28          12.24           24137300  20.2
Jiangsu         Province     320000   JNS        102600    39600     1.06           1.06            76773000  724
Jiangxi         Province     360000   JNG        166900    64400     1.73           1.73            44000000  257
Jilin           Province     220000   JIL        187400    72400     1.94           1.94            27340000  145

Types of Spatial Data

• Continuous (surface) data

• Polygon (lattice) data

• Point data

• Network data


Spatial data type 1: Continuous (Surface Data)

• Spatially continuous data
– attributes exist everywhere
• There are an infinite number of locations
– But, attributes are usually only measured at a few locations
• There is a sample of point measurements
• e.g. precipitation, elevation
– A surface is used to represent continuous data

Spatial data type 2: Polygon Data
• polygons completely covering the area*
– Attributes exist and are measured at each location
– Area can be:
• irregular (e.g. US state or China province boundaries)
• regular (e.g. remote sensing images in raster format)

*Polygons completely covering an area are called a lattice

Spatial data type 3: Point data

• Point pattern
– The locations are the focus
– In many cases, there is no attribute involved

Spatial data type 4: Network data
• Attributes may measure
– the network itself (the roads)
– Objects on the network (cars)
• We often treat network objects as point data, which can cause serious errors
– Crimes occur at addresses on networks, but we often treat them as points

See: Yamada and Thill. Local indicators of network-constrained clusters in spatial point patterns. Geographical Analysis 39 (3), 2007, pp. 268-292

Which will we study?

1. Point data (point pattern analysis: clustering and dispersion)
2. Polygon data* (polygon analysis: spatial autocorrelation and spatial regression)
3. Continuous data* (surface analysis: interpolation, trend surface analysis and kriging)

*in the fall semester

Converting from one type of data to another.

--very common in spatial analysis


Converting point to continuous data: interpolation

[Figure: scatter of sample point locations.]

Interpolation
• Finding attribute values at locations where there is no data, using locations with known data values
• Usually based on
– Value at known location
– Distance from known location
• Methods used
– Inverse distance weighting
– Kriging

[Figure: simple linear interpolation, with known and unknown locations marked.]
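As a rough illustration of inverse distance weighting, the sketch below estimates a value at an unsampled location from the values and distances of known points. The function name `idw`, the `power` parameter (set to 2 here) and the sample values are assumptions made for the example, not part of the lecture.

```python
import math


def idw(known, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    known  -- list of (x, y, value) sample points
    target -- (x, y) location with no measurement
    power  -- distance exponent (2 is a common choice, assumed here)
    """
    numerator = 0.0
    denominator = 0.0
    for x, y, value in known:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target sits exactly on a sample point: use its value.
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Two hypothetical samples on a line, 4 units apart.
samples = [(0.0, 0.0, 10.0), (4.0, 0.0, 30.0)]
halfway = idw(samples, (2.0, 0.0))      # equal distances, so equal weights
near_first = idw(samples, (1.0, 0.0))   # the closer sample dominates
```

Nearby samples get larger weights, so the estimate is pulled toward the closest known value. Kriging also weights by distance but derives the weights from a fitted model of spatial dependence, which is not shown here.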

Converting point data to polygons using Thiessen polygons

[Figure: scatter of sample point locations used to build Thiessen polygons.]

Thiessen or Proximity Polygons (also called Dirichlet or Voronoi Polygons)

• Polygons created from a point layer
• Each point has a polygon (and each polygon has one point)
• any location within the polygon is closer to the enclosed point than to any other point
• space is divided as ‘evenly’ as possible between the polygons

[Figure: Thiessen or proximity polygons around a set of points, one labeled A.]

How to create Thiessen Polygons


1. Connect point to its nearest (closest) neighbor

2. Draw perpendicular line at midpoint

3. Repeat for other points

4. Thiessen polygons
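The sketch below does not perform the geometric construction above (perpendicular lines at midpoints). Instead it uses the defining property of Thiessen polygons, that every location belongs to its nearest point, and labels the cells of a small grid accordingly. The site names, coordinates and function names are hypothetical.

```python
import math


def nearest_site(location, sites):
    """Return the ID of the site closest to `location`.

    A location inside a site's Thiessen polygon is, by definition,
    closer to that site than to any other. Ties go to the first site.
    """
    return min(sites, key=lambda site_id: math.dist(location, sites[site_id]))


def thiessen_grid(sites, width, height):
    """Label each unit cell (by its center) with its nearest site."""
    return [
        [nearest_site((col + 0.5, row + 0.5), sites) for col in range(width)]
        for row in range(height)
    ]


# Three hypothetical sample points.
sites = {"A": (1.0, 1.0), "B": (5.0, 1.0), "C": (3.0, 5.0)}
grid = thiessen_grid(sites, width=6, height=6)
for row in reversed(grid):          # print with y increasing upward
    print(" ".join(row))
```

The printed blocks of A, B and C are a raster approximation of the three polygons; an exact vector result would need the bisector construction or a computational geometry library.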

Converting polygon to point data using Centroids

• Centroid: the balancing point for a polygon
• used to apply point pattern analysis to polygon data
• More about this later
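One common way to compute the balancing point of a simple polygon is the area-weighted centroid formula built on the shoelace sum. The sketch below assumes a closed ring (first and last coordinate equal) that does not cross itself; the function name and the test square are made up for illustration.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon.

    ring -- list of (x, y) pairs, first and last pair identical.
    Works for clockwise or counter-clockwise rings.
    """
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0      # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0.0:
        raise ValueError("polygon has zero area")
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))


# A hypothetical 2 x 2 square; its balancing point is the middle.
square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
center = polygon_centroid(square)
```

For strongly concave shapes the centroid can fall outside the polygon, which matters when it stands in for the polygon in point pattern analysis.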

Using a polygon to represent a set of points: Convex Hull

• the smallest convex polygon able to contain a set of points
– no concave angles pointing inward
• A rubber band wrapped around a set of points
• “reverse” of the centroid
• Convex hull often used to create the boundary of a study area
– a “buffer” zone often added
– Used in point pattern analysis to solve the boundary problem.
• Called a “guard zone”
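A convex hull can be computed in several ways; the sketch below uses Andrew's monotone chain method, chosen here only because it is short. The point set is hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    lower = []
    for p in pts:
        # Drop the last vertex while it would make a right turn or lie
        # on a straight line: that is a "concave angle" or a redundant point.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Four corner points plus two interior points (hypothetical data).
points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = convex_hull(points)
```

The interior points are discarded, leaving only the corners that the "rubber band" would touch.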

Models for Spatial Data: Raster and Vector

two alternative methods for representing spatial data

Concept of Vector and Raster

[Figure: a real world scene (house, river, trees) shown two ways. Raster representation: a 10 × 10 grid (rows and columns 0 to 9) with cells coded R (river), T (trees) and H (house). Vector representation: the same features drawn as a point, a line and a polygon.]

Comparing Raster and Vector Models

Raster Model
• area is covered by grid with (usually) equal-size, square cells
• attributes are recorded by giving each cell a single value based on the majority feature (attribute) in the cell, such as land use type or soil type
• Image data is a special case of raster data in which the “attribute” is a reflectance value from the electromagnetic spectrum
– cells in image data often called pixels (picture elements)

Vector Model
The fundamental concept of vector GIS is that all geographic features in the real world can be represented either as:
• points or dots (nodes): trees, poles, fire plugs, airports, cities
• lines (arcs): streams, streets, sewers
• areas (polygons): land parcels, cities, counties, forest, rock type
Because representation depends on shape, ArcGIS refers to files containing vector data as shapefiles

Raster model

Land use (or soil type)

    0 1 2 3 4 5 6 7 8 9
0   1 1 1 1 1 4 4 5 5 5
1   1 1 1 1 1 4 4 5 5 5
2   1 1 1 1 1 4 4 5 5 5
3   1 1 1 1 1 4 4 5 5 5
4   1 1 1 1 1 4 4 5 5 5
5   2 2 2 2 2 2 2 3 3 3
6   2 2 2 2 2 2 2 3 3 3
7   2 2 2 2 2 2 2 3 3 3
8   2 2 4 4 2 2 2 3 3 3
9   2 2 4 4 2 2 2 3 3 3

[Figure: the same area as a land use map with fields labeled corn, wheat, fruit and clover.]

Image

Each cell (pixel) has a value between 0 and 255 (8 bits); example cell values shown: 186, 21
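The land use grid can be held as a list of rows, which makes it easy to look up a cell or count cells per class. This is a minimal sketch: the cell codes are copied from the grid on the slide, while the helper names are made up and the meaning of each code (which number is corn, wheat and so on) is not assumed.

```python
from collections import Counter

# Land use raster: 10 rows x 10 columns, one class code per cell.
land_use = [
    [1, 1, 1, 1, 1, 4, 4, 5, 5, 5],
    [1, 1, 1, 1, 1, 4, 4, 5, 5, 5],
    [1, 1, 1, 1, 1, 4, 4, 5, 5, 5],
    [1, 1, 1, 1, 1, 4, 4, 5, 5, 5],
    [1, 1, 1, 1, 1, 4, 4, 5, 5, 5],
    [2, 2, 2, 2, 2, 2, 2, 3, 3, 3],
    [2, 2, 2, 2, 2, 2, 2, 3, 3, 3],
    [2, 2, 2, 2, 2, 2, 2, 3, 3, 3],
    [2, 2, 4, 4, 2, 2, 2, 3, 3, 3],
    [2, 2, 4, 4, 2, 2, 2, 3, 3, 3],
]


def value_at(grid, row, col):
    """Class code stored in one cell."""
    return grid[row][col]


def cell_counts(grid):
    """Number of cells per class; with equal-size cells this is also area."""
    return Counter(value for row in grid for value in row)


counts = cell_counts(land_use)
for code in sorted(counts):
    print(f"class {code}: {counts[code]} cells")
```

An 8-bit image would use the same layout, with each cell holding a reflectance value from 0 to 255 instead of a class code.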

Vector Model
• point (node): 0-dimensions
– single x,y coordinate pair
– zero area
– tree, oil well, location for label
• line (arc): 1-dimension
– two connected x,y coordinates
– road, stream
– A network is simply 2 or more connected lines
• polygon: 2-dimensions
– four or more ordered and connected x,y coordinates
– first and last x,y pairs are the same
– encloses an area
– county, lake

Examples (coordinates read from the figure axes, x = 7 to 8, y = 1 to 2):
Point: 7,2 (x=7, y=2)
Line: 7,2 8,1
Polygon: 7,2 8,1 7,1 7,2
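The three vector primitives can be written down directly as coordinate lists. The sketch below uses the coordinates from the slide's examples; the helper functions are hypothetical and only check the stated rules (a polygon needs four or more pairs and must close on itself).

```python
import math

# Coordinates taken from the slide's examples.
point = (7, 2)
line = [(7, 2), (8, 1)]
polygon = [(7, 2), (8, 1), (7, 1), (7, 2)]


def is_closed_ring(coords):
    """True if there are four or more pairs and first == last."""
    return len(coords) >= 4 and coords[0] == coords[-1]


def line_length(coords):
    """Total length of a line made of connected x,y pairs."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def ring_area(coords):
    """Enclosed area of a closed ring (shoelace formula)."""
    twice_area = sum(x0 * y1 - x1 * y0
                     for (x0, y0), (x1, y1) in zip(coords, coords[1:]))
    return abs(twice_area) / 2.0


print(is_closed_ring(polygon), line_length(line), ring_area(polygon))
```

A point has no length or area, a line has length but no area, and only the closed ring encloses an area, which matches the 0, 1 and 2 dimensions listed above.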

Using raster and vector models to represent surfaces


Representing Surfaces with raster and vector models: 3 ways
• Contour lines
– Lines of equal surface value
– Good for maps but not computers!
• Digital elevation model (raster)
– raster cells record surface value
• TIN (vector)
– Triangulated Irregular Network (TIN)
– triangle vertices (corners) record surface value

Contour Lines (isolines) for surface representation

Contour lines of constant elevation, also called isolines (iso = equal)

Advantages
• Easy to understand (for most people!)
– Circle = hill top (or basin)
– V shape pointing downhill (>) = ridge
– V shape pointing uphill (<) = valley
– Closer lines = steeper slope

Disadvantages
• Not good for computer representation
• Lines difficult to store in computer

Raster for surface representation

Each cell in the raster records the height (elevation) of the surface

[Figure: a surface and the raster cells that contain its elevation values, shown next to contour lines at 105, 110, 115 and 120.]

Triangulated Irregular Network (TIN): Vector surface representation

• a set of non-overlapping triangles formed from irregularly spaced points
• preferably, points are located at “significant” locations
– bottom of valleys, tops of ridges
• Each corner of the triangle (vertex) has:
– x, y horizontal coordinates
– z vertical coordinate measuring elevation.

Point #   X    Y    Z
1         10   30   160
2         25   30   150
3         30   25   140
4         15   20   130
etc

[Figure: a TIN with vertices numbered 1 to 5; a valley, a ridge and a vertex are labeled.]
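Once a surface is stored as triangles, the elevation at any location inside a triangle can be estimated by linear (barycentric) interpolation between its three vertices. In the sketch below the vertex coordinates are points 1, 2 and 4 from the table, but treating those three as one triangle is an assumption made for the example, as is the function name.

```python
def interpolate_z(triangle, x, y):
    """Linear interpolation of z inside one TIN triangle.

    triangle -- three (x, y, z) vertices
    Raises ValueError if (x, y) lies outside the triangle.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("vertices are collinear")
    # Barycentric weights: each is 1 at its own vertex and 0 on the far edge.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-9:
        raise ValueError("location is outside the triangle")
    return w1 * z1 + w2 * z2 + w3 * z3


# Points 1, 2 and 4 from the table, assumed here to form one triangle.
triangle = [(10, 30, 160), (25, 30, 150), (15, 20, 130)]
z_at_vertex = interpolate_z(triangle, 10, 30)
z_on_edge = interpolate_z(triangle, 17.5, 30)   # halfway between points 1 and 2
```

Because each triangle is a flat facet, the interpolated value at a vertex is exactly the measured elevation and varies linearly in between.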

Draft: How to Create a TIN surface: from points to surfaces

[Figures: Thiessen3.jpg, Thiessen4.jpg]

Links together all spatial concepts: point, line, polygon, surface

Using raster and vector models to represent polygons (and points and lines)

Representing Polygons (and points and lines) with raster and vector models

• Raster model not good
– not accurate
• Also a big challenge for the vector model
– but much more accurate
– the solution to this challenge resulted in the modern GIS system

Using Raster model for points, lines and polygons --not good!

For points
• Point located at cell center, even if it's not
• Point “lost” if two points in one cell

For lines and polygons
• Line not accurate
• Polygon boundary not accurate

[Figure: the 10 × 10 land use raster (rows and columns 0 to 9) with a point location marked X.]
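The two problems listed for points can be reproduced in a few lines: converting coordinates to cells moves every point to a cell center, and two points in the same cell can no longer be told apart. The point names, coordinates and unit cell size below are hypothetical.

```python
def to_cell(x, y, cell_size=1.0):
    """(row, col) of the raster cell containing location (x, y)."""
    return (int(y // cell_size), int(x // cell_size))


def cell_center(row, col, cell_size=1.0):
    """Location the raster reports for anything stored in a cell."""
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)


def rasterize_points(points, cell_size=1.0):
    """Group point names by the cell they fall in."""
    cells = {}
    for name, (x, y) in points.items():
        cells.setdefault(to_cell(x, y, cell_size), []).append(name)
    return cells


# Three hypothetical points; p1 and p2 fall in the same cell.
points = {"p1": (2.1, 3.9), "p2": (2.8, 3.2), "p3": (7.5, 0.5)}
cells = rasterize_points(points)

for (row, col), names in cells.items():
    print(names, "stored at", cell_center(row, col))
```

Here p1 and p2 are both reported at the center of one cell, so one of them is effectively lost unless the cell size is reduced.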

Using vector model to represent points, lines and polygons: Node/Arc/Polygon Topology

The relationships between all spatial elements (points, lines, and polygons) are defined by four concepts:

• Node-Arc relationship
– specifies which points (nodes) are connected to form arcs (lines)
• Arc-Arc relationship
– specifies which arcs are connected to form networks
• Polygon-Arc relationship
– defines polygons (areas) by specifying which arcs form their boundary
• From-To relationship on all arcs
– Every arc has a direction from a node to a node
– This establishes left side and right side of an arc (e.g. street)
– Also polygon on the left and polygon on the right for every side of the polygon

[Figure: an arc drawn from its “from” node to its “to” node, with its left and right sides labeled.]

Node/Arc/Polygon and Attribute Data: Example of computer implementation

Spatial Data

Node Table
Node ID   Easting   Northing
1         126.5     578.1
2         218.6     581.9
3         224.2     470.4
4         129.1     471.9

Arc Table
Arc ID   From N   To N   L Poly   R Poly
I        4        1               A34
II       1        2               A34
III      2        3      A35      A34
IV       3        4               A34

Polygon Table
Polygon ID   Arc List
A34          I, II, III, IV
A35          III, VI, VII, XI

Attribute Data

Node Feature Attribute Table
Node ID   Control   Crosswalk   ADA?
1         light     yes         yes
2         stop      no          no
3         yield     no          no
4         none      yes         no

Arc Feature Attribute Table
Arc ID   Length   Condition   Lanes   Name
I        106      good        4
II       92       poor        4       Birch
III      111      fair        2
IV       95       fair        2       Cherry

Polygon Feature Attribute Table
Polygon ID   Owner      Address
A34          J. Smith   500 Birch
A35          R. White   200 Main

[Figure: polygon A34 (Smith Estate) bounded by arcs I, II, III and IV between nodes 1, 2, 3 and 4, with streets Birch and Cherry and the neighboring polygon A35.]
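The tables above can be sketched as dictionaries to show what the topology buys: adjacency can be read straight from the arc table without comparing any coordinates. The values are copied from the slide's tables. Two things are assumptions: placing A34 on the right side of arcs I, II and IV (inferred from the node coordinates and arc directions), and the function names. Polygon A35 is left out of the polygon table here because its other arcs (VI, VII, XI) are not listed in the arc table.

```python
# Node table: node ID -> (easting, northing)
nodes = {1: (126.5, 578.1), 2: (218.6, 581.9), 3: (224.2, 470.4), 4: (129.1, 471.9)}

# Arc table: from/to nodes and the polygon on each side (None = outside the data).
arcs = {
    "I":   {"from": 4, "to": 1, "left": None,  "right": "A34"},
    "II":  {"from": 1, "to": 2, "left": None,  "right": "A34"},
    "III": {"from": 2, "to": 3, "left": "A35", "right": "A34"},
    "IV":  {"from": 3, "to": 4, "left": None,  "right": "A34"},
}

# Polygon table: polygon ID -> ordered list of boundary arcs.
polygon_arcs = {"A34": ["I", "II", "III", "IV"]}

# Polygon feature attribute table, linked by polygon ID.
owners = {"A34": "J. Smith", "A35": "R. White"}


def neighbors(polygon_id):
    """Polygons sharing an arc with `polygon_id`, read from left/right."""
    found = set()
    for arc in arcs.values():
        sides = {arc["left"], arc["right"]}
        if polygon_id in sides:
            found |= sides - {polygon_id, None}
    return found


def boundary_nodes(polygon_id):
    """Walk the polygon's arcs from node to node.

    Assumes the arcs are listed in order and all point the same way
    around the polygon, as they do for A34 in this example.
    """
    arc_ids = polygon_arcs[polygon_id]
    path = [arcs[arc_ids[0]]["from"]]
    for arc_id in arc_ids:
        if arcs[arc_id]["from"] != path[-1]:
            raise ValueError("arcs are not connected in order")
        path.append(arcs[arc_id]["to"])
    return path


print(neighbors("A34"), [owners[p] for p in neighbors("A34")])
print([nodes[n] for n in boundary_nodes("A34")])
```

The shared arc III is stored once, so the common boundary of A34 and A35 cannot drift apart the way separately stored borders can.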

This is how a vector GIS system works!

This data structure was invented by Scott Morehouse at the Harvard Laboratory for Computer Graphics in the 1960s.

Another graduate student named Jack Dangermond hired Scott Morehouse, moved to Redlands, CA, started a new company called ESRI Inc., and created the first commercial GIS system, ArcInfo, in 1971

Modern GIS was born!

Other ways to represent polygons with vector model

2. Whole polygon structure

3. Points and Polygons structure

•Used in earlier GIS systems before node/arc/polygon system invented

•Still used today for some, more simple, spatial data (e.g. shapefiles)

•Discuss these if we have time!


Vector Data Structures: Whole Polygon

Whole Polygon (boundary structure): list coordinates of points in order as you ‘walk around’ the outside boundary of the polygon.
– all data stored in one file
– coordinates/borders for adjacent polygons stored twice;
• may not be same, resulting in slivers (gaps), or overlap
– all lines are ‘double’ (except for those on the outside periphery)
– no topological information about polygons
• which are adjacent and have a common boundary?
– used by the first computer mapping program, SYMAP, in late 1960s
– used by SAS/GRAPH and many later business mapping programs
– Still used by shapefiles.

Topology --knowledge about relative spatial positioning -- knowledge about shared geometry

Topography --the form of the land surface, in particular, its elevation

Whole Polygon: illustration

Data File
A 3 4
A 4 4
A 4 2
A 3 2
A 3 4
B 4 4
B 5 4
B 5 2
B 4 2
B 4 4
C 3 2
C 4 2
C 4 0
C 3 0
C 3 2
D 4 2
D 5 2
D 5 0
D 4 0
D 4 2
E 1 5
E 5 5
E 5 4
E 3 4
E 3 0
E 1 0
E 1 5

[Figure: polygons A, B, C, D and E drawn on a grid with x from 1 to 5 and y from 0 to 5.]
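To make the "stored twice" problem concrete, the sketch below reads the data file from the illustration and looks for boundary segments that appear in more than one polygon. The parsing and function names are hypothetical; only the coordinates come from the slide.

```python
from collections import defaultdict

# Whole-polygon data file from the illustration: polygon ID, x, y per line.
DATA_FILE = """\
A 3 4
A 4 4
A 4 2
A 3 2
A 3 4
B 4 4
B 5 4
B 5 2
B 4 2
B 4 4
C 3 2
C 4 2
C 4 0
C 3 0
C 3 2
D 4 2
D 5 2
D 5 0
D 4 0
D 4 2
E 1 5
E 5 5
E 5 4
E 3 4
E 3 0
E 1 0
E 1 5"""


def read_polygons(text):
    """Group the coordinate pairs by polygon ID, in file order."""
    rings = defaultdict(list)
    for record in text.splitlines():
        polygon_id, x, y = record.split()
        rings[polygon_id].append((int(x), int(y)))
    return dict(rings)


def duplicated_edges(rings):
    """Segments stored by more than one polygon (exact matches only)."""
    owners = defaultdict(set)
    for polygon_id, ring in rings.items():
        for a, b in zip(ring, ring[1:]):
            owners[frozenset((a, b))].add(polygon_id)
    return {edge: ids for edge, ids in owners.items() if len(ids) > 1}


rings = read_polygons(DATA_FILE)
shared = duplicated_edges(rings)
for edge, ids in shared.items():
    print(sorted(edge), "stored by", sorted(ids))
```

Exact matching finds the four borders among A, B, C and D, each stored twice. It misses E's borders with A and C, because E stores one long segment where A and C store shorter ones: the file holds no topological information, so adjacency has to be worked out from geometry.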

Vector Data Structures: Points & Polygons

Points and Polygons: list ID numbers of points in order as you ‘walk around’ the outside boundary
• a second file lists all points and their coordinates.
– solves the duplicate coordinate/double border problem
– still no topological information
• Do not know which polygons have a common border
– first used by CALFORM, the second generation mapping package, from the Laboratory for Computer Graphics and Spatial Analysis at Harvard in early ‘70s

Points and Polygons: Illustration

Points File
1 3 4
2 4 4
3 4 2
4 3 2
5 5 4
6 5 2
7 5 0
8 4 0
9 3 0
10 1 0
11 1 5
12 5 5

Polygons File
A 1, 2, 3, 4, 1
B 2, 5, 6, 3, 2
C 4, 3, 8, 9, 4
D 3, 6, 7, 8, 3
E 11, 12, 5, 1, 9, 10, 11

[Figure: polygons A, B, C, D and E drawn on a grid with x from 1 to 5 and y from 0 to 5, with points numbered 1 to 12.]
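The two files from the illustration can be sketched as two dictionaries: one holds each coordinate pair once, the other lists point IDs per polygon. The data is copied from the slide; the function names are made up.

```python
# Points file: point ID -> (x, y). Each coordinate pair is stored once.
points = {
    1: (3, 4), 2: (4, 4), 3: (4, 2), 4: (3, 2), 5: (5, 4), 6: (5, 2),
    7: (5, 0), 8: (4, 0), 9: (3, 0), 10: (1, 0), 11: (1, 5), 12: (5, 5),
}

# Polygons file: polygon ID -> point IDs in 'walk around' order.
polygons = {
    "A": [1, 2, 3, 4, 1],
    "B": [2, 5, 6, 3, 2],
    "C": [4, 3, 8, 9, 4],
    "D": [3, 6, 7, 8, 3],
    "E": [11, 12, 5, 1, 9, 10, 11],
}


def ring_coordinates(polygon_id):
    """Look up the coordinates for a polygon's point IDs."""
    return [points[point_id] for point_id in polygons[polygon_id]]


def shared_points(first, second):
    """Point IDs used by both polygons."""
    return set(polygons[first]) & set(polygons[second])


print(ring_coordinates("A"))
print(shared_points("A", "B"))
```

Twelve coordinate pairs are stored instead of the 27 in the whole-polygon file, and a moved point moves every polygon that uses it. Adjacency is still not recorded anywhere: it has to be derived by comparing ID lists, as `shared_points` does.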

Hopefully, you now have a better understanding of what is special about spatial data!

Monday, we will begin talking about Spatial Statistics
