BA_thesis_excerpt

FLOOD RISK ASSESSMENT AND MAPPING USING GEOGRAPHICAL

INFORMATION SYSTEM: CASE STUDY IN KUALA LUMPUR

HADI

AEA090705

THESIS SUBMITTED IN FULFILMENT OF THE REQUIREMENTS FOR

THE DEGREE OF BACHELOR OF ARTS AND SOCIAL SCIENCE

DEPARTMENT OF GEOGRAPHY

FACULTY OF ARTS AND SOCIAL SCIENCE

UNIVERSITY OF MALAYA

SESSION 2011/2012

ABSTRACT

The last decade has seen an increase in the frequency and magnitude of hydrometeorological hazards in Malaysia. Flash floods in particular are considered a major recurring hazard, especially in urban areas where development activities and people's livelihoods are closely interconnected with rivers. Rapid and uncontrolled development in Kuala Lumpur, the Malaysian capital, especially in floodplain areas, has increased the risk associated with flash floods. The risk encompasses a wide dimension of livelihood, and there is a growing need for effective integrated flood risk management in Kuala Lumpur. One important component of the integrated approach is the assessment of flood risk, which results from the action of a hazard on the vulnerable population and the exposed elements. The risk varies in space and time and can therefore be best assessed using a geographical approach. A Geographical Information System (GIS) provides powerful capabilities to facilitate such spatial assessment, a digital environment to store, process, and manage the large amount of spatial and non-spatial data involved, and cartographic capabilities to produce high-quality flood risk maps that communicate the risk information. This study demonstrates the integration of GIS in flood risk assessment and mapping using two spatial models: a statistical (binary logistic regression) model and an index model. The logistic regression model generated an equation that predicts the probability of flood occurrence from the statistically significant predictor variables, which was used to produce a flood probability map. The index model conceptualized and quantified risk as a function of hazard, vulnerability, and coping capacity, using weighted overlay analysis, to produce a flood risk index map. These maps can assist policy makers and planners in planning land use development and flood disaster management, while informing local residents of the flood danger level at their houses.

METHODOLOGY

CHART 3.3 FLOOD PREDICTION LOGISTIC REGRESSION MODELLING

SCENARIO 1

1. Geocode flood events' locations and digitize OCCURRENCE points.
2. Create buffer for OCCURRENCE points.
3. Randomly generate NON-OCCURRENCE sample points outside buffer zones.

SCENARIO 2

1. Using flood area (polygon) year 2008, randomly generate OCCURRENCE sample points inside flood area.
2. Randomly generate NON-OCCURRENCE sample points outside flood area.

BOTH SCENARIOS

1. Overlay the sample points with the IV** raster layers: Elevation, Slope, Planar Curvature, Profile Curvature, Curvature, Population Density, Drainage Density, Distance to River, Road Density, Rainfall*, Land Use (Built Up and Non Built Up), Flow Accumulation, Distance to Mitigation Site*, Subbasin Area.
2. Intersect points with all IV layers to extract rasters' values from all layers at all OCCURRENCE sample points, and likewise at all NON-OCCURRENCE sample points.
3. Tabulate extracted rasters' values into an OCCURRENCE points IV table and a NON-OCCURRENCE points IV table.
4. Assign DV*** value = 1 in the OCCURRENCE table and DV*** value = 0 in the NON-OCCURRENCE table.
5. Open and combine tables into a single dataset in SPSS.
6. Run backward stepwise binary logistic regression.
7. Interpret result and construct predictive equation with significant IVs and their coefficients.
8. Map flood probability for the whole study area using GIS Raster Calculator.

Notes:

* Used in Scenario 2

** Independent Variables

*** Dependent Variable. New column created with value all 0 for NON-OCCURRENCE table, all 1 for OCCURRENCE table.
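The Scenario 1 sampling steps in Chart 3.3 can be illustrated with a minimal Python sketch. It is not the GIS workflow used in the thesis: the function name `sample_non_occurrence`, the coordinates, the study-area bounds, and the 500 m buffer radius are all hypothetical values chosen for illustration.

```python
import math
import random

def sample_non_occurrence(occurrence_pts, bounds, buffer_radius, n, seed=0,
                          max_tries=100000):
    """Draw n random points inside bounds that lie outside every buffer."""
    xmin, ymin, xmax, ymax = bounds
    rng = random.Random(seed)
    points = []
    tries = 0
    while len(points) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place enough points outside the buffers")
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        # Keep the candidate only if it is outside all occurrence buffers.
        if all(math.hypot(x - ox, y - oy) > buffer_radius
               for ox, oy in occurrence_pts):
            points.append((x, y))
    return points

# Hypothetical coordinates and buffer radius in metres (not thesis data).
occurrence = [(1000.0, 1000.0), (4000.0, 2500.0)]
non_occurrence = sample_non_occurrence(
    occurrence, bounds=(0.0, 0.0, 5000.0, 5000.0), buffer_radius=500.0, n=50)

# Dependent variable: 1 for OCCURRENCE points, 0 for NON-OCCURRENCE points.
dataset = ([(x, y, 1) for x, y in occurrence]
           + [(x, y, 0) for x, y in non_occurrence])
```

Rejection sampling is used here only because it is simple to read; a GIS random-point tool with an exclusion mask would serve the same purpose.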

[Figure: sample OCCURRENCE and NON-OCCURRENCE points overlaid on the stacked raster layers: slope, distance to river, elevation, drainage density, road density, rainfall, subbasin area, distance to mitigation site, curvature, planar curvature, profile curvature, land use (built up or non built up), and population density. The Intersect Point Tool extracts the raster value from all overlaid layers at each sample point location to generate the independent variables.]

FIGURE 3.4 MULTIPLE LAYER OVERLAY FOR LOGISTIC REGRESSION MODELLING
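The extraction step in Figure 3.4 amounts to looking up, for every sample point, the cell it falls in on each raster layer. The sketch below shows that idea in plain Python under stated assumptions: the rasters are small nested lists sharing one grid, and the names `cell_value` and `extract_iv_table`, the layer values, the origin, and the 30 m cell size in the example are illustrative, not the thesis data or the GIS tool's API.

```python
def cell_value(raster, origin, cell_size, point):
    """Return the raster value under a point, or None if outside the grid.

    raster is a list of rows (row 0 is the northernmost row);
    origin is the (x, y) of the raster's upper-left corner.
    """
    x0, y0 = origin
    x, y = point
    col = int((x - x0) // cell_size)
    row = int((y0 - y) // cell_size)
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return None

def extract_iv_table(layers, origin, cell_size, points, dv):
    """Build one table row per point: the DV plus one value per IV layer."""
    table = []
    for point in points:
        row = {"dv": dv}
        for name, raster in layers.items():
            row[name] = cell_value(raster, origin, cell_size, point)
        table.append(row)
    return table

# Hypothetical 2 x 2 rasters on a shared grid (not thesis data).
layers = {
    "elevation": [[50, 52], [48, 47]],
    "slope": [[2.0, 3.5], [1.0, 0.5]],
}
occurrence_table = extract_iv_table(
    layers, origin=(0.0, 60.0), cell_size=30.0,
    points=[(10.0, 50.0), (45.0, 10.0)], dv=1)
```

Running the same function on the NON-OCCURRENCE points with `dv=0` and concatenating the two lists gives the combined dataset that the flowchart passes to the regression step.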

CHART 3.4 FLOOD RISK INDEX MODELLING

RISK

Hazard
- Historical Data: Water Depth, Frequency of Occurrence
- Binary Logistic Regression Model: Flood Occurrence Probability

Vulnerability
- Exposure: Flood Area Up to Year 2000
- Physical Vulnerability: Distance to River
- People Vulnerability: Population Density, % People Aged Below 10 Years Old, % People Aged 65 and Over
- Socioeconomic Vulnerability: Land Use Type, Road Density

Coping Capacity
- Distance to rescue station
- Distance to shelter
- Distance to hospital
- Distance to main road
- Distance to warning sign board
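The abstract states that the index model combines its factors by weighted overlay analysis. A minimal sketch of a cell-wise weighted overlay follows; the factor scores and the weights (0.6 and 0.4) are hypothetical, since this excerpt does not list the weights actually used, and `weighted_overlay` is an illustrative name rather than a GIS tool.

```python
def weighted_overlay(layers, weights):
    """Cell-wise weighted sum of equally sized score grids."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    names = list(layers)
    n_rows = len(layers[names[0]])
    n_cols = len(layers[names[0]][0])
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for name in names:
        grid = layers[name]
        for r in range(n_rows):
            for c in range(n_cols):
                # Each layer contributes its score scaled by its weight.
                result[r][c] += weights[name] * grid[r][c]
    return result

# Hypothetical reclassified scores for two coping capacity factors.
coping_layers = {
    "distance_to_shelter": [[1, 2], [3, 4]],
    "distance_to_hospital": [[5, 5], [1, 1]],
}
coping_weights = {"distance_to_shelter": 0.6, "distance_to_hospital": 0.4}
coping_score = weighted_overlay(coping_layers, coping_weights)
```

Requiring the weights to sum to 1 keeps the output on the same scale as the input scores, which makes the resulting index easier to classify.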

TABLE 4.1 DATABASE SUMMARY

1 Administrative
   Source: Digitized from the Kuala Lumpur Local District Map in the 2010 Population and Housing Census Report published by the Department of Statistics Malaysia
   Uses: Main base map (for population layer)
   Entity type: Polygon
   Data model: Vector
   Attributes: Local district name, area.

2 Land Use Kuala Lumpur 2008
   Source: GIS Unit, JPF*, DBKL**
   Uses: Characterisation of social geography in the study area; socioeconomic vulnerability assessment in flood risk index modelling; extraction of river features for physical distance analysis to assess physical vulnerability; independent variable for flood probability (binary logistic regression) modelling; raster analysis mask; flood impact estimation.
   Entity type: Polygon
   Data model: Vector
   Attributes: FID, location, ID, and land use classes, including:
      1. Industry
      2. Institution
      3. Open area
      4. Recreation
      5. Religious use
      6. Residential
      7. Public facilities
      8. Community facilities
      9. KTM train rail
      10. Commercial
      11. Cemetery
      12. Agriculture, fishery, forestry
      13. Land reserve for electricity line
      14. School
      15. Squatter
      16. River and water bodies
      17. Terminal

3 Topography
   Source: 30 m x 30 m Digital Elevation Model (DEM) downloaded from USGS website, clipped for study area, converted to raster (.img).
   Uses: To derive slope, aspect, curvature, planar curvature, profile curvature, flow direction, flow accumulation (drainage), contour, watershed, and 3D view of study area. The DEM derivatives are parameters in both index and logistic regression modelling.
   Entity type: Surface
   Data model: Raster
   Attributes: Elevation

4 Roads (2008)
   Source: Provided by JPF DBKL
   Uses: To estimate road density raster for logistic regression; extract main roads for distance analysis as a factor in coping capacity evaluation; flood economic impact estimation.
   Entity type: Polyline
   Data model: Vector
   Attributes: Reference no., strategic, area, parliament, hierarchy, hierarchy no., status, status code, length, width, geometric area, from (origin), to (destination).

5 Congested Road, Morning (AM) and Afternoon (PM)
   Source: Digitized from image provided by JPB*** DBKL
   Uses: Distance analysis for coping capacity
   Entity type: Polyline
   Data model: Vector
   Attributes: Road name

6 Infrastructure
   Layers and sources:
      Fire and rescue station: Addresses obtained from bomba.gov.my, geocoded and point digitized on location coordinate
      Public hospital and clinic: moh.gov.my, geocoded and digitized as point feature
      Multipurpose hall and community center: jkpdbkl.com, geocoded and point digitized on location coordinate
      Flood mitigation work location: Digitized from paper map in Kuala Lumpur Flood Problem and Solution Report (DBKL, 2005)
      Flood electronic warning sign board: Digitized from paper map in Kuala Lumpur Flood Problem and Solution Report (DBKL, 2005)
   Uses: Proximity analysis for coping capacity assessment; distance to mitigation site is an input in logistic regression analysis.
   Entity type: Point
   Data model: Vector
   Attributes: Name, address, x-, y- coordinate

7 Subbasin Area
   Source: Digitized from image in KL City Plan 2020 Report (DBKL)
   Uses: Hydrology base map, shows small catchments
   Entity type: Polygon
   Data model: Vector
   Attributes: Name of draining stream/river, area.

8 Flood Points
   Source: Geocoded and digitized from DID Malaysia report on flood events in Kuala Lumpur from 2004 to 2011
   Uses: Sample OCCURRENCE points (dependent variable/binary response value = 1) for logistic regression; historical flood hazard assessment.
   Entity type: Point
   Data model: Vector
   Attributes: Location, date and year of event, depth, frequency

9 Flood Area
   Source: Digitized from flood areal extent paper map prepared by DID Malaysia
   Uses: Sample OCCURRENCE points (DV = 1) for logistic regression; flood exposure identification
   Entity type: Polygon
   Data model: Vector
   Attributes: Year (until 2000, 2008, and 3 March 2009), location, area

10 Rainfall (2009)
   Source: Daily station reading in MET*** website
   Uses: Create rainfall distribution surface by interpolating the maximum value at reading stations, as one independent variable for logistic regression modelling
   Entity type: Point
   Data model: Vector
   Attributes: Date, station location coordinate, rainfall reading

11 Population
   Source: Data entry from 2010 Population and Housing Census
   Uses: To derive population density and the percentage of the age groups under 15 and 65 and over for people vulnerability assessment
   Entity type: Polygon
   Data model: Vector
   Attributes: Local district name, total population, population by ethnic group (Malaysian, Non-Malaysian, Bumiputera, Malay, other Bumiputera, Chinese, Indian, others), population by age group (0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75 and over), population by gender (Male and Female), households, living quarters, population density.

RESULTS

The main results of the flood occurrence probability logistic regression modelling and the risk index modelling are presented in the following maps. Maps are used to communicate flood risk information to the target stakeholders, so a risk map should follow sound cartographic design that makes the information easy to visualize for all map readers.

Results of Binary Logistic Regression Test

Table 5.1d Variables in the Equation

                                                                   95% C.I. for Exp(B)
          Variable         B       S.E.    Wald     df   Sig.   Exp(B)   Lower    Upper
Step 7a   distriv         -.002    .001    10.104   1    .001    .998     .997     .999
          slope           -.026    .014     3.336   1    .068    .974     .948    1.002
          popdens          .000    .000     4.640   1    .031   1.000     .999    1.000
          Density          .077    .034     5.147   1    .023   1.080    1.010    1.154
          @10landuse(1)    .735    .338     4.727   1    .030   2.086    1.075    4.049
          Constant        1.802    .993     3.295   1    .070   6.060

a. Variable(s) entered on step 1: fill, subbasi, distriv, flowacc, slope, popdens, plancur, profile, drainde, Density, @10landuse.

Table 5.1d above is the main result of the binary logistic regression analysis, after seven backward stepwise steps that sequentially eliminated statistically insignificant predictor variables. As the note on the variable(s) entered on step 1 shows, 'curvatu' was not included because SPSS found redundancy between the 'curvatu' variable and the profile and planar curvature variables. Table 5.1d suggests that, of the 12 independent variables tested, 5 were found 'significant'. They are 'distriv' (distance to river), 'slope', 'popdens' (population density), 'Density' (road density), and '@10landuse(1)' (built-up land use). However, the results need further interpretation in terms of their significance.

The significance value (Sig.) of the Wald statistic is one way to assess the significance of the 5 variables. 'Distriv', 'popdens', 'Density', and '@10landuse(1)' were statistically significant at the 95% confidence level, with Sig. values below 0.05. 'Slope' is not significant in this case because its Sig. value (0.068) is above 0.05.
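The Sig. column can be reproduced from the Wald column, because each Wald statistic is compared against a chi-square distribution with 1 degree of freedom. The sketch below uses only the Python standard library and the Wald values printed in Table 5.1d; the function name `wald_p_value` and the 0.05 threshold constant are illustrative choices, and the results match the table only up to rounding.

```python
import math

def wald_p_value(wald):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # For 1 df, P(chi2 > w) equals erfc(sqrt(w / 2)).
    return math.erfc(math.sqrt(wald / 2.0))

# Wald statistics as printed in Table 5.1d (step 7).
wald_values = {
    "distriv": 10.104,
    "slope": 3.336,
    "popdens": 4.640,
    "Density": 5.147,
    "@10landuse(1)": 4.727,
}

ALPHA = 0.05  # 95% confidence level
p_values = {name: wald_p_value(w) for name, w in wald_values.items()}
significant = [name for name, p in p_values.items() if p < ALPHA]

for name, p in p_values.items():
    print(f"{name:15s} p = {p:.3f}")
```

With these inputs, every variable except 'slope' falls below the 0.05 threshold, which agrees with the reading of the Sig. column given above.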

Another way to assess the predictors is the Exp(B) value, which is the odds ratio. An Exp(B) equal to 1 indicates that an increase in the corresponding variable does not change the odds of the outcome occurring. An Exp(B) value of less than 1 (more than 1) indicates that a one-unit increase in the value of the corresponding variable leads to a drop (increase) in the odds. Interpreted in this way, 'popdens' is not significant, as an increase in its value does not change the likelihood of flooding (Exp(B) = 1.000). 'Distriv' (Exp(B) = 0.998) and 'slope' (0.974) cause a slight drop in the odds of flooding, while 'Density' (1.080) causes a slight increase in the odds. A more influential predictor is '@10landuse(1)': the presence of built-up land use more than doubles the odds (Exp(B) = 2.086), that is, the odds of flooding in a built-up area are more than 2 times those in a 'non-built up' area.
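The Exp(B) column and its 95% confidence interval follow directly from B and S.E.: the odds ratio is exp(B) and the interval is exp(B ± 1.96 × S.E.). The sketch below recomputes three rows from the rounded values printed in Table 5.1d, so small differences in the third decimal place are expected; `odds_ratio_ci` is an illustrative name, not an SPSS function.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval

def odds_ratio_ci(b, se, z=Z_95):
    """Return (Exp(B), lower, upper) for a logistic coefficient and its S.E."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# (B, S.E.) pairs as printed in Table 5.1d.
coefficients = {
    "slope": (-0.026, 0.014),
    "Density": (0.077, 0.034),
    "@10landuse(1)": (0.735, 0.338),
}

odds_ratios = {name: odds_ratio_ci(b, se) for name, (b, se) in coefficients.items()}

for name, (ratio, lower, upper) in odds_ratios.items():
    print(f"{name:15s} Exp(B) = {ratio:.3f}  95% CI = [{lower:.3f}, {upper:.3f}]")
```

An interval that contains 1, as it does for 'slope', means the data are compatible with no change in the odds, which is the same conclusion the Sig. value gives for that variable.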

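Regardless of the significance tests, the 'B' values in Table 5.1d are the logistic coefficients from which the predictive equation was constructed, as below:

\[ \operatorname{logit}(y) = 1.802 - 0.002\,\text{distriv} - 0.026\,\text{slope} + 0.077\,\text{density} + 0.735\,\text{@10landuse(1)} \]

\[ y = \frac{e^{\,1.802 - 0.002\,\text{distriv} - 0.026\,\text{slope} + 0.077\,\text{density} + 0.735\,\text{@10landuse(1)}}}{1 + e^{\,1.802 - 0.002\,\text{distriv} - 0.026\,\text{slope} + 0.077\,\text{density} + 0.735\,\text{@10landuse(1)}}} \]

where y is the probability of flood occurrence, distriv is distance to river, slope is slope steepness, density is road density, and @10landuse(1) is the built-up land use category. Note that the variable 'popdens', with a 'B' value of 0.000, can be left out of the equation because it makes almost no contribution to the prediction.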
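As a worked illustration of the predictive equation, the sketch below evaluates the logit and converts it to a probability with the logistic function. The constant and coefficients are the B values from Table 5.1d (road density is taken as 0.077, as printed in the table). The input values in the example are hypothetical, and the units of each variable are assumed to match those of the rasters used to fit the model, which this excerpt does not state.

```python
import math

CONSTANT = 1.802
COEFFICIENTS = {
    "distriv": -0.002,   # distance to river
    "slope": -0.026,     # slope steepness
    "density": 0.077,    # road density
    "built_up": 0.735,   # @10landuse(1): 1 if built up, 0 otherwise
}

def flood_probability(distriv, slope, density, built_up):
    """Probability of flood occurrence from the scenario A logit equation."""
    logit = (CONSTANT
             + COEFFICIENTS["distriv"] * distriv
             + COEFFICIENTS["slope"] * slope
             + COEFFICIENTS["density"] * density
             + COEFFICIENTS["built_up"] * built_up)
    # Equivalent to exp(logit) / (1 + exp(logit)).
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical cells: one built-up cell on the river, one further away.
p_near_river = flood_probability(distriv=0, slope=0, density=0, built_up=1)
p_far_from_river = flood_probability(distriv=1000, slope=5, density=10, built_up=0)
```

In a GIS raster calculator the same expression is applied to whole layers at once, with each variable name replaced by its raster.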

The overall performance of the model is given by the classification table (Table 5.1e) and the model summary (Table 5.1f). The classification table shows that 67.9% of the cases were correctly classified. However, the Nagelkerke R square indicates that only 27.6% of the variation in the dependent variable can be explained by the independent variables.

Table 5.1e Classification Table (a)

                                  Predicted
                                  Flood              Percentage
         Observed                 No       Yes       Correct
Step 7   Flood        No          62       33        65.3
                      Yes         28       67        70.5
         Overall Percentage                          67.9

a. The cut value is .500
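The percentages in the classification table can be checked by hand from the four cell counts. The short sketch below does that arithmetic; the variable names are illustrative.

```python
# Cell counts from Table 5.1e (rows: observed, columns: predicted).
observed_no = {"pred_no": 62, "pred_yes": 33}
observed_yes = {"pred_no": 28, "pred_yes": 67}

def percent(part, whole):
    """Share of part in whole, as a percentage."""
    return 100.0 * part / whole

# Non-flood points correctly predicted as non-flood.
pct_no_correct = percent(observed_no["pred_no"], sum(observed_no.values()))
# Flood points correctly predicted as flood.
pct_yes_correct = percent(observed_yes["pred_yes"], sum(observed_yes.values()))
# All correct predictions over all sample points.
total_cases = sum(observed_no.values()) + sum(observed_yes.values())
pct_overall = percent(observed_no["pred_no"] + observed_yes["pred_yes"], total_cases)

print(round(pct_no_correct, 1), round(pct_yes_correct, 1), round(pct_overall, 1))
```

The counts also show that the sample is balanced, with 95 observed non-flood points and 95 observed flood points, 190 in total.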

Table 5.1f Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
7      219.325 (b)         .207                   .276
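The two pseudo R square values in the model summary can be recovered from the -2 log likelihood, as the sketch below suggests. It rests on two assumptions that are not stated in the text: that the sample holds 190 points split 95/95 (taken from the counts in Table 5.1e), and that the null model is intercept-only, so its -2 log likelihood is -2 × n × ln(0.5).

```python
import math

n = 190                      # 95 flood + 95 non-flood points (from Table 5.1e)
neg2ll_model = 219.325       # -2 log likelihood at step 7 (Table 5.1f)

# Assumed intercept-only null model on a balanced sample: p = 0.5 for every case.
neg2ll_null = -2.0 * n * math.log(0.5)

# Cox & Snell: 1 - (L0 / L1)^(2/n), written with -2LL values.
cox_snell = 1.0 - math.exp(-(neg2ll_null - neg2ll_model) / n)

# Nagelkerke rescales Cox & Snell by its maximum attainable value.
cox_snell_max = 1.0 - math.exp(-neg2ll_null / n)
nagelkerke = cox_snell / cox_snell_max

print(round(cox_snell, 3), round(nagelkerke, 3))
```

Under these assumptions the computed values round to the figures reported in Table 5.1f, which supports reading the model as explaining roughly a quarter of the variation in the flood outcome.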

Map 5.1 FLOOD OCCURRENCE PROBABILITY MAP BASED ON BINARY LOGISTIC REGRESSION ANALYSIS SCENARIO A

Map 5.1 shows flood probability as predicted by the significant predictors from binary logistic regression scenario A, which used OCCURRENCE points of flooding locations from 2004 to 2011. The significant predictor variables are distance to river, slope, road density, and land use. The computed probability was classified into five categories with equal class intervals: very low (0.00 – 0.20), low (0.21 – 0.40), moderate (0.41 – 0.60), high (0.61 – 0.80), and very high (0.81 and above). The model predicts that a vast area, from Taman Ibu Kota in the northeast through the city center local district of Bandar Kuala Lumpur to Taman Overseas Union in the south, has a very high probability of flooding. Flooding is very unlikely to occur in the areas near Taman Bukit Maluri, Jinjang Utara, Taman Kepong, and Taman Cheras.
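The five-class scheme described above can be expressed as a small lookup. In the sketch below, the class labels and break values come from the text, while the treatment of values that fall between the printed ranges (for example 0.205) is an assumption: each upper bound is treated as inclusive. The function name `probability_class` is illustrative.

```python
from bisect import bisect_left

UPPER_BOUNDS = [0.20, 0.40, 0.60, 0.80]
LABELS = ["very low", "low", "moderate", "high", "very high"]

def probability_class(p):
    """Map a flood probability in [0, 1] to one of five equal-interval classes."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # bisect_left returns the index of the first upper bound >= p,
    # so a value equal to a bound stays in the lower class.
    return LABELS[bisect_left(UPPER_BOUNDS, p)]

classes = [probability_class(p) for p in (0.05, 0.20, 0.21, 0.50, 0.80, 0.95)]
```

Applied to every cell of the probability raster, this reclassification yields the five legend categories of the map.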

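Map 5.2 FLOOD OCCURRENCE PROBABILITY MAP BASED ON BINARY LOGISTIC REGRESSION ANALYSIS SCENARIO B

\[ \operatorname{logit}(y) = 11.348 - 0.53\,\text{rf11max} - 0.002\,\text{distriv} + 0.690\,\text{plancur} - 0.021\,\text{fill} + 0.621\,\text{lu10(1)} - 0.206\,\text{drainde} \]

\[ y = \frac{e^{\,11.348 - 0.53\,\text{rf11max} - 0.002\,\text{distriv} + 0.690\,\text{plancur} - 0.021\,\text{fill} + 0.621\,\text{lu10(1)} - 0.206\,\text{drainde}}}{1 + e^{\,11.348 - 0.53\,\text{rf11max} - 0.002\,\text{distriv} + 0.690\,\text{plancur} - 0.021\,\text{fill} + 0.621\,\text{lu10(1)} - 0.206\,\text{drainde}}} \]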

Map 5.2 shows the predicted probability of flood occurrence based on binary logistic regression scenario B, which used sample OCCURRENCE points from the available flood areal extent for 2008. The significant variables are rainfall, distance to river, planar curvature, elevation, land use, and drainage density. In both scenarios, distance to river and land use (built up or non-built up) remain important factors that determine the likelihood of flooding. An even wider area was classified as having a very high chance of flooding. Moderate and low flood probabilities were predicted near Bukit Kiara, Universiti Malaya, Ulu Klang, Taman Cheras, and Sungai Besi.

Map 5.3 LEVEL OF TOTAL VULNERABILITY TO FLOOD HAZARD

Map 5.4 COPING CAPACITY LEVEL

Map 5.3 shows that vulnerability is generally low (green). The highly vulnerable (pink) areas were identified mainly in Batu, Setapak, and Petaling. Map 5.4 shows that the majority of the study area has high coping capacity, with low coping capacity detected in the western part of Batu, Ulu Klang, and the southern part of Petaling.

The combination of predominantly very high hazard (scenario A probability in this case) and the more variable but generally low vulnerability results, in Map 5.5, in a mixed distribution pattern of low, moderate, and high values of (hazard x vulnerability). Pink areas are areas with high vulnerability, as identified in Map 5.3.

Map 5.5 HAZARD X VULNERABILITY

Finally, taking into account the coping capacity element, which is predominantly high, results in risk Map 5.6. The middle area, where the hazard level is high, has low risk due to high coping capacity. High-risk areas appear in highly vulnerable areas with lower coping capacity, especially in Setapak and Petaling. Very high flood risk was observed on the western side of Taman Overseas Union.

Map 5.6 FLOOD RISK = HAZARD X VULNERABILITY / COPING CAPACITY

Risk Score   Risk Level   Cell Count   %
0            Very Low     2511          6.14
1            Low          11990        29.34
2            Moderate     15752        38.54
3            High         9214         22.55
4 and 5      Very High    1402          3.43
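The relation in the caption of Map 5.6, risk = hazard x vulnerability / coping capacity, is a cell-by-cell raster calculation. A minimal sketch follows; the three 2 x 2 grids hold hypothetical scores, and the excerpt does not describe how the resulting values were converted to the risk scores in the table above, so that step is not attempted here.

```python
def risk_index(hazard, vulnerability, coping):
    """Cell-wise risk = hazard * vulnerability / coping capacity."""
    risk = []
    for h_row, v_row, c_row in zip(hazard, vulnerability, coping):
        row = []
        for h, v, c in zip(h_row, v_row, c_row):
            if c <= 0:
                raise ValueError("coping capacity score must be positive")
            row.append(h * v / c)
        risk.append(row)
    return risk

# Hypothetical scores on a 2 x 2 grid (not thesis data).
hazard = [[5, 5], [3, 1]]
vulnerability = [[1, 3], [3, 1]]
coping = [[5, 1], [3, 1]]

risk = risk_index(hazard, vulnerability, coping)
```

The top-left cell shows the pattern described for the middle of the study area: a high hazard score is offset by low vulnerability and high coping capacity, giving a low risk value.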
