2
Contents
– the basics of data storage
– overview of databases
  • the database approach
  • types of databases
  • databases in GIS
– design considerations
– development of an ARC/INFO database
3
Conceptual, logical and physical ...
Conceptual Logical Physical
4
A storage hierarchy ...
– files/tables
  • records
  • fields (types …)
– databases
– information systems
– decision support systems (DSS)
– approaches to storage
  • application/file based
  • databases
(increasing complexity)
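As a minimal sketch of the lowest levels of this hierarchy, a table can be pictured as a list of records, each made of typed fields. The `ParcelRecord` class, its field names and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class ParcelRecord:
    """One record; each attribute is a field with a declared type."""
    parcel_id: int
    owner: str
    area_ha: float


# A file/table is a collection of records sharing the same fields.
parcel_table = [
    ParcelRecord(101, "A. Smith", 0.25),
    ParcelRecord(102, "B. Jones", 1.10),
]


def field_types(record_cls):
    """Return the field names and type names declared for a record class."""
    return {f.name: f.type.__name__ for f in fields(record_cls)}
```

Calling `field_types(ParcelRecord)` lists the fields and their types, which is the kind of information a database would hold centrally rather than leaving it implicit in each application.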
5
Application based approach
[Figure: three applications (Permits, Tax/Rates Assessment, Sewer Maintenance), each with its own data store (Permit Data, Assessment Data, Sewer Data)]
Applications using data stored as application-specific data
6
Database approach
[Figure: the Permits, Tax/Rates Assessment and Sewer Maintenance applications access Permit Data, Assessment Data and Sewer Data through a shared Database Management System]
Database approach and use of shared data - implications for GIS
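The shared-data idea can be sketched with Python's built-in `sqlite3` module standing in for the DBMS. The table names, column names and sample rows are assumptions made for this illustration; the point is only that two applications read the same parcel record instead of keeping private copies.

```python
import sqlite3


def build_shared_db():
    """Create an in-memory database shared by all applications (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE parcel (parcel_id INTEGER PRIMARY KEY, address TEXT NOT NULL);
        CREATE TABLE permit (permit_id INTEGER PRIMARY KEY,
                             parcel_id INTEGER REFERENCES parcel(parcel_id),
                             kind TEXT);
        CREATE TABLE assessment (parcel_id INTEGER REFERENCES parcel(parcel_id),
                                 value REAL);
        INSERT INTO parcel VALUES (1, '12 High St');
        INSERT INTO permit VALUES (10, 1, 'building');
        INSERT INTO assessment VALUES (1, 250000.0);
    """)
    return conn


def permits_app(conn, parcel_id):
    """Permits application: address and permit kind for a parcel."""
    return conn.execute(
        "SELECT p.address, m.kind FROM parcel p "
        "JOIN permit m ON m.parcel_id = p.parcel_id WHERE p.parcel_id = ?",
        (parcel_id,),
    ).fetchall()


def assessment_app(conn, parcel_id):
    """Assessment application: address and assessed value for a parcel."""
    return conn.execute(
        "SELECT p.address, a.value FROM parcel p "
        "JOIN assessment a ON a.parcel_id = p.parcel_id WHERE p.parcel_id = ?",
        (parcel_id,),
    ).fetchall()
```

Because the address is stored once, a single `UPDATE parcel ...` is seen by both applications; in the application based approach each data store would need its own correction.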
7
Database … a definition
• A collection of interrelated data stored together with controlled redundancy to serve one or more applications in an optimal fashion.
• A common and controlled approach is used in adding new data and in modifying and retrieving existing data within the database.
8
Databases … objectives/advantages
– centralised data storage and management … global view of data … data dictionary
  • standardisation of all aspects of data management
  • reduced duplication
  • multiple access / retrieval flexibility
  • integrity constraints … validation enforced
  • ...
– database management system (DBMS)
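To make "integrity constraints … validation enforced" concrete, here is a small sketch using `sqlite3` from the Python standard library. The `landuse` table, its columns and the permitted codes are hypothetical; the DBMS, not the application, rejects invalid rows.

```python
import sqlite3


def make_landuse_table():
    """Create a table whose rules are declared once, in the database itself."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE landuse (
            parcel_id INTEGER PRIMARY KEY,
            code      TEXT NOT NULL CHECK (code IN ('RES', 'COM', 'IND')),
            area_ha   REAL CHECK (area_ha > 0)
        )
    """)
    return conn


def try_insert(conn, parcel_id, code, area_ha):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO landuse VALUES (?, ?, ?)",
                     (parcel_id, code, area_ha))
        return True
    except sqlite3.IntegrityError:
        return False
```

Every application that writes to the table is subject to the same checks, which is the standardisation benefit listed above.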
9
Database/s … data dictionary
– the most critical (?) element of a database
– data about data … metadata
– essential for system development
– uses include
  • design - entities and data relationships
  • data capture - entry/validation
  • operations - program documentation
  • maintenance (impact assessment of proposed changes, estimation of effort, cost …)
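One way to see "data about data" in practice is to ask a DBMS to describe its own tables. The sketch below uses SQLite's `PRAGMA table_info` through Python's `sqlite3` module; the `describe` helper and the output keys are hypothetical names, and a real data dictionary would hold far more (sources, validation rules, relationships).

```python
import sqlite3


def describe(conn, table):
    """Return simple metadata (field name, declared type, required?) for a table.

    PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {"field": r[1], "type": r[2], "required": bool(r[3])}
        for r in rows
    ]


def demo_connection():
    """Hypothetical table used only to demonstrate the lookup."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parcel (parcel_id INTEGER NOT NULL, owner TEXT)")
    return conn
```

A data capture program could use such metadata to build entry forms or validation checks instead of hard-coding the table structure.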
12
DBMS … key modules
– a data description/definition module
  • defines/creates/restructures
  • enforces rules
– a query module
  • retrieval for queries, ad hoc queries, simple reports
– a report writing program
– a high level language interface
– ...
13
Database … stages of development
– information systems plan for organisation
– system specification … user needs analysis
– conceptual design … data modelling
  • hardware and software independent
– physical design … database design
– database implementation
– monitoring/audit
15
Organisational strategy and IT: Land Information System (LIS) (i)
– Problems/issues:
  • rationalisation of land related information in government agencies
  • the removal/reduction of duplication
  • introduction of economies in data capture, maintenance and storage
  • better (and wider) access to data
solutions ...
16
Organisational strategy and IT: Land Information System (LIS) (ii)
– Solutions:
  • better data distribution mechanism (data format and location transparent to user)
  • knowledge of data distribution built into the data dictionary
  • reduction of data duplication
  • uniform query language (SQL)
  • coding and data interchange standardisation ( … SDTS)
19
Database types - hierarchical (i)
– lends itself to GIS use as data are often hierarchical in structure, e.g. municipality within province within country
– records divided into logically related fields … connected in a tree-like arrangement
– master field in each group of records … pointers … updates require pointers to be modified
– fast preset queries … ad hoc queries difficult or impossible
20
Database types - hierarchical (ii)
COUNTRY (USA)
  States
    Counties
      Boundaries
        Nodes
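The trade-off between preset and ad hoc queries in a hierarchical store can be sketched with a plain tree of records linked by parent-to-child pointers. The `Node` class, the helper names and the sample place names are assumptions for illustration, not part of any particular hierarchical DBMS.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A record with pointers to its child records."""
    name: str
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child


def follow_path(root, path):
    """Preset query: walk the pointers along a path known in advance."""
    node = root
    for name in path:
        node = next(c for c in node.children if c.name == name)
    return node


def search(root, name):
    """Ad hoc query: with no pointer to the target, visit the tree until found.

    Returns the matching node (or None) and the number of records visited.
    """
    visited = 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        if node.name == name:
            return node, visited
        stack.extend(node.children)
    return None, visited


def sample_tree():
    """Hypothetical country/state/county tree."""
    usa = Node("USA")
    california = usa.add(Node("California"))
    nevada = usa.add(Node("Nevada"))
    california.add(Node("Marin"))
    nevada.add(Node("Washoe"))
    return usa
```

`follow_path` touches only the records on the path, whereas `search` may have to visit every record, which is why queries that do not follow the built-in hierarchy are costly.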
23
Database types - network (i)
– similar to hierarchical but with multiple connections between files to accommodate many to many (M:M) relationships
– access to a particular file without searching the entire hierarchy above that file
– linked records … quick preset searches … large overhead in pointer management
– modification after creation difficult
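A rough sketch of linked records in the network style: each record keeps explicit pointers to every record it is related to, so a shared boundary arc can belong to two counties at once (an M:M relationship). The `Record` class and the `link`/`unlink` helpers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False, repr=False)
class Record:
    """A record holding direct pointers to related records."""
    name: str
    links: list = field(default_factory=list)


def link(a, b):
    """Connect two records; pointers are stored on both sides."""
    a.links.append(b)
    b.links.append(a)


def unlink(a, b):
    """Removing a relationship means updating the pointers on both records."""
    a.links.remove(b)
    b.links.remove(a)
```

Following `arc.links` reaches both counties directly without a search, but every change has to keep the two pointer lists consistent, which is the pointer-management overhead noted above.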
26
Database types - relational (i)
– model developed from mathematics
– records and fields in a 2-dimensional table
– no pointers etc … any field can be used to link one table to another
– normalisation … redundancy/stable structure
– ad hoc queries SQL … modifications easy
– not very efficient for GIS … SQL3
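As an illustration of linking tables through a shared field and of an ad hoc SQL query, the following sketch again uses Python's `sqlite3`. The `parcel` and `zone` tables and their contents are invented for the example.

```python
import sqlite3


def build_zoning_db():
    """Two flat tables related only by the values in zone_code (no pointers)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE parcel (parcel_id INTEGER, zone_code TEXT, area_ha REAL);
        CREATE TABLE zone (zone_code TEXT, description TEXT);
        INSERT INTO parcel VALUES (1, 'R1', 0.2), (2, 'C1', 0.5), (3, 'R1', 0.3);
        INSERT INTO zone VALUES ('R1', 'Residential'), ('C1', 'Commercial');
    """)
    return conn


def area_by_zone(conn):
    """Ad hoc query: total parcel area per zone description."""
    return conn.execute("""
        SELECT z.description, ROUND(SUM(p.area_ha), 2)
        FROM parcel p JOIN zone z ON p.zone_code = z.zone_code
        GROUP BY z.description
        ORDER BY z.description
    """).fetchall()


def add_owner_column(conn):
    """Structural change after creation: add a field without rebuilding links."""
    conn.execute("ALTER TABLE parcel ADD COLUMN owner TEXT")
```

The join is decided at query time, so a question nobody anticipated at design time needs no new pointers, only a new `SELECT`.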
30
Centralised vs distributed
– a database does not necessarily mean a centralised arrangement, i.e. all data in one physical place
31
GIS and distributed databases ...
– trend towards open systems ...
  • special hardware and software can be used widely … specific applications optimised
  • system/network communication is easier
– modular implementation from an overall design … incremental change
– unlimited capacity (nodes) … lower risks
32
Approaches to GIS system design
– develop a proprietary system
– develop a hybrid system: proprietary graphics + commercial DBMS for attribute data (e.g. ARC/INFO)
– use a commercial DBMS and develop the spatial functions and graphics display used in geographic analysis (e.g. siroDBMS, System9)
– develop a spatial DBMS from scratch
35
GIS databases … some problems (i)
– centralised risk
  • centralisation demands better quality control, otherwise higher potential for disaster
– cost
  • large DBMSs are expensive to design, implement and operate
  • piecemeal design is difficult
– complexity
  • need to keep track of complex hardware and software
  • need to keep track of graphical as well as attribute data and the links
36
GIS databases … some problems (ii)
[Figure: Cascading effects of change in a GIS database (ESRI 1989)]
39
Objectives of design
– a good design results in a database which:
  • contains necessary data but no redundant data
  • organises data so that different users access the same data
  • accommodates different views of the data
  • distinguishes applications which maintain data from those that use it
  • appropriately represents, codes and organises geographic features
40
Design methodology (for ARC/INFO)
– conceptual model
  • model the users’ view
  • define entities and their relationships
– logical model
  • identify representation of entities
  • match to ARC/INFO data model
  • organise into geographic data sets
– physical model
41
Design methodology (for ARC/INFO)
– 1. Model the users’ view
– 2. Define entities and their relationships
– 3. Identify representation of entities
– 4. Match to ARC/INFO data model
– 5. Organise into geographic data sets
42
1. Model the users’ view
– create a model of work performed by users for which ‘location’ is a factor
  • identify organisational functions
  • identify the data which supports the functions
– organise data into sets of geographic features
  • data function matrix
    – high level classification of data
    – interdependence of data and function
    – difference between users and creators of data
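A data function matrix can be sketched as a small table of organisational functions against data sets, with each cell marking whether the function creates (`"C"`) or only uses (`"U"`) the data. The functions, data sets and cell values below are hypothetical; the slide does not give the actual matrix.

```python
# Rows: organisational functions. Columns: data sets.
# "C" = the function creates/maintains the data, "U" = it only uses the data.
matrix = {
    "Permits": {"parcels": "U", "permits": "C"},
    "Tax/Rates Assessment": {"parcels": "C", "assessments": "C"},
    "Sewer Maintenance": {"parcels": "U", "sewers": "C"},
}


def roles(matrix, dataset):
    """Return (creators, users) of a data set, each as a sorted list of functions."""
    creators = sorted(f for f, row in matrix.items() if row.get(dataset) == "C")
    users = sorted(f for f, row in matrix.items() if row.get(dataset) == "U")
    return creators, users
```

Reading a column this way shows which function is responsible for maintaining a data set and which functions depend on it.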
45
2. Define entities and their relationships
– entities: distinguishable objects which have a common set of properties
  • identify and describe entities
  • identify and describe the relationships among these entities
  • document the process
    – diagrams
    – data dictionary
  • normalise the data
48
Normalisation
– First Normal Form (1NF)
– Second Normal Form (2NF)
– Third Normal Form (3NF)
ASR - Assessor
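The slide only names the normal forms, so the following is a hedged illustration of one step, moving a table into 3NF by removing a transitive dependency. In the invented flat table, `owner_address` depends on `owner_id` rather than directly on the key `parcel_id`, so it is repeated for every parcel an owner holds.

```python
# Hypothetical flat table: owner_address is repeated for each parcel of owner "A".
flat = [
    {"parcel_id": 1, "owner_id": "A", "owner_address": "1 Elm St"},
    {"parcel_id": 2, "owner_id": "A", "owner_address": "1 Elm St"},
    {"parcel_id": 3, "owner_id": "B", "owner_address": "9 Oak Ave"},
]


def to_3nf(rows):
    """Split the flat table so each fact is stored once.

    parcels: parcel_id -> owner_id   (depends on the key only)
    owners:  owner_id  -> address    (the transitive dependency, moved out)
    """
    parcels = [{"parcel_id": r["parcel_id"], "owner_id": r["owner_id"]} for r in rows]
    owners = {r["owner_id"]: r["owner_address"] for r in rows}
    return parcels, owners
```

After the split, a change of address for owner "A" is a single update in `owners` instead of one update per parcel.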
53
3. Identify representation of entities
– determine the most effective spatial representation for geographic features
– consider whether:
  • a feature might be represented on a map
  • the shape of a feature might be significant in performing geographic analysis
  • the feature will have different representations at different map scales
  • textual attributes of the feature will be displayed on map products
  • ...
54
4. Match to ARC/INFO data model
– determine the appropriate ARC/INFO representation for entities
  • points, lines, polygons
– ensure complex feature classes are supported
  • a route is comprised of sections, which in turn are based on arcs
  • a region is composed of polygons
  • an event is a point or a line which occurs along a route
– others (e.g. GRID, TIN)
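The route, section and arc relationship can be pictured with a few plain Python classes. This is a conceptual sketch only: the class names, the fractional start/end positions and the `measure` attribute are assumptions for illustration and do not reproduce ARC/INFO's internal file structures.

```python
import math
from dataclasses import dataclass


@dataclass
class Arc:
    """A line feature stored as an ordered list of (x, y) vertices."""
    arc_id: int
    vertices: list

    def length(self):
        return sum(math.dist(a, b) for a, b in zip(self.vertices, self.vertices[1:]))


@dataclass
class Section:
    """The part of one arc that belongs to a route (fractions of the arc length)."""
    arc: Arc
    start_frac: float = 0.0
    end_frac: float = 1.0

    def length(self):
        return self.arc.length() * (self.end_frac - self.start_frac)


@dataclass
class Route:
    """A route is built from sections, which in turn are based on arcs."""
    name: str
    sections: list

    def length(self):
        return sum(s.length() for s in self.sections)


@dataclass
class PointEvent:
    """An event located by its distance (measure) along a route."""
    route: Route
    measure: float

    def is_on_route(self):
        return 0.0 <= self.measure <= self.route.length()
```

Because sections reference arcs rather than copying them, several routes can share the same underlying line work.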
55
Matching to ARC/INFO data model
[Table headings: Entity, Spatial type, ARC/INFO, Related to, Coverage, Attribute files, Anno., LUT]
56
5. Organise into geographic data sets
– to identify and name the geographic data sets that will contain the various entities:
  • define the contents of geographic data sets (coverages, grids etc)
  • name workspaces, geographic data sets, entities and attributes
  • complete entity definitions
  • add cartographic text and lookup tables
57
5 (i) Define the content of geographic data sets
– data sets supported: coverage, grid, tin, image and drawing
– coverages: several entities can be grouped into a single coverage
– DBMS: stored in a separate database management system
58
5 (ii) Geographic datasets, entities and attributes
– coverage definitions
  • high level summary of the data physically stored in the database
  • required for defining the coverage structure
– file naming conventions in ARC/INFO
59
5 (iii) Complete entity definitions
– background information: coverage name, data source, agency, number of records etc.
– attribute definition
  • attribute name, type, field width
  • validation rules / permitted values
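An attribute definition of this kind (name, type, field width, permitted values) can be written down as data and then used to check incoming values. The `AttributeDef` class and the sample `ZONE` attribute are hypothetical and only sketch the idea.

```python
from dataclasses import dataclass


@dataclass
class AttributeDef:
    """One entry of an entity definition."""
    name: str
    type_: type
    width: int
    permitted: tuple = ()  # empty tuple means any value of the right type/width

    def validate(self, value):
        """Check type, field width and (if given) the list of permitted values."""
        if not isinstance(value, self.type_):
            return False
        if len(str(value)) > self.width:
            return False
        return not self.permitted or value in self.permitted


# Hypothetical attribute: a zoning code up to 3 characters wide.
zone_attr = AttributeDef("ZONE", str, 3, ("R1", "C1"))
```

Keeping the rules in the definition rather than in each program means data entry and later maintenance apply the same checks.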
60
5 (iv) Cartographic text & code tables
– annotation (text, placing rules etc)
– look up tables
  • predefined set of values
  • description / labels
  • means of creating displays based on attribute values
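A lookup table in this sense maps a stored code to a label and a display symbol. The codes, labels and colours below are made up for the sketch, as is the `symbolise` helper.

```python
# Hypothetical lookup table: attribute code -> (label, display colour).
LUT = {
    "R1": ("Residential", "yellow"),
    "C1": ("Commercial", "red"),
}


def symbolise(features, lut, default=("Unknown", "grey")):
    """Attach a label and colour to each feature based on its zone code."""
    return [(f["id"],) + lut.get(f["zone"], default) for f in features]
```

Changing how a class is drawn then means editing one row of the lookup table, not every feature record.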
61
Robinson (Ch 14): Scale and GIS databases
– (past) a map’s scale greatly influenced map content and data resolution
– GIS data are ‘scaleless’ … but scale is still a critical factor with digital databases, because of the ways in which we create them
– scale and resolution (Tab 14.1)
62
Robinson (Ch 14): Scale and resolution issues
– symbolisation and display problems
– handling databases of different scales
  • join problems (e.g. urban rural)
  • merge problems (different themes)
  • scale levels
    – in general
    – large scale data (AM/FM etc.)
63
Robinson (Ch 15): Managing large GIS
– data organisation
  • partitioning
  • spatial indexes
  • metadata
– data compression
  • run length encoding (RLE)
  • quadtree encoding
  • others ...
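The two compression schemes named above can be sketched briefly in Python. These are textbook-style illustrations under simple assumptions (a raster row as a list of cell values; a square grid whose side is a power of two), not the encoding used by any particular GIS package.

```python
from itertools import groupby


def rle_encode(row):
    """Run length encoding: store (value, run length) instead of every cell."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def rle_decode(pairs):
    """Expand (value, run length) pairs back into the original row."""
    return [value for value, count in pairs for _ in range(count)]


def quadtree(grid, x=0, y=0, size=None):
    """Quadtree encoding of a square grid (side length a power of two).

    A uniform block is stored as its single value; a mixed block is split
    into four quadrants in the order NW, NE, SW, SE.
    """
    size = len(grid) if size is None else size
    first = grid[y][x]
    if all(grid[y + dy][x + dx] == first
           for dy in range(size) for dx in range(size)):
        return first
    half = size // 2
    return [
        quadtree(grid, x, y, half),
        quadtree(grid, x + half, y, half),
        quadtree(grid, x, y + half, half),
        quadtree(grid, x + half, y + half, half),
    ]
```

Both schemes pay off when neighbouring cells tend to share values, as in thematic rasters with large homogeneous areas.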