
Source: leo.ugr.es/pgm2012/proceedings/posters/vogel_flood_poster.pdf


Flood Damage and Influencing Factors: A Bayesian Network Perspective

Kristin Vogel 1, Carsten Riggelsen 1, Bruno Merz 2, Heidi Kreibich 2, Frank Scherbaum 1

1 Institute of Earth and Environmental Science, Potsdam University; Germany
2 GFZ - GeoForschungszentrum Potsdam; Germany

contact: [email protected]

1. Flood risk assessment

• Classical approaches relate flood damage for a certain class of objects to inundation depth.

• Single and joint effects of other parameters (e.g. inundation duration, quality of warning) are largely unknown and widely neglected in damage assessments.

• A dataset of 1135 (partly incomplete) observations of 29 variables, collected after the 2002 and 2005/2006 floods on the Elbe and Danube, offers a data mining opportunity: learning a Bayesian Network that reveals interactions ignored so far.

Dresden 2002

2. Bayesian Network

• A Bayesian Network (BN) describes a joint probability distribution by decomposing it into a product of (local) conditional probability distributions according to a directed acyclic graph, which encodes the conditional independences.

• Graph structure, DAG, and parameters, Θ, can be learned from data, d, and are chosen here as the maximum a posteriori (MAP) estimate of the joint posterior: P(DAG,Θ|d) ∝ P(d|DAG,Θ) P(Θ,DAG).
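
The factorization described above can be illustrated with a minimal sketch. The network, variable names and probability values below are hypothetical and chosen only for illustration; they are not the BN learned from the flood data. The sketch shows how a joint probability is obtained as a product of local conditional distributions along a DAG.

```python
import itertools

# Hypothetical toy DAG (NOT the learned flood network):
# W (warning quality) -> P (precaution); W and D (water depth) -> L (loss)
parents = {"W": (), "D": (), "P": ("W",), "L": ("W", "D")}

# Invented conditional probability tables: cpt[var][parent_states][state]
cpt = {
    "W": {(): {"good": 0.4, "bad": 0.6}},
    "D": {(): {"low": 0.7, "high": 0.3}},
    "P": {("good",): {"yes": 0.8, "no": 0.2},
          ("bad",): {"yes": 0.3, "no": 0.7}},
    "L": {("good", "low"): {"small": 0.9, "large": 0.1},
          ("good", "high"): {"small": 0.6, "large": 0.4},
          ("bad", "low"): {"small": 0.7, "large": 0.3},
          ("bad", "high"): {"small": 0.2, "large": 0.8}},
}


def joint_probability(assignment):
    """Multiply the local conditionals P(x_i | parents(x_i)) along the DAG."""
    p = 1.0
    for var, pa in parents.items():
        pa_states = tuple(assignment[q] for q in pa)
        p *= cpt[var][pa_states][assignment[var]]
    return p


def all_assignments():
    """Enumerate every joint state of the toy network."""
    states = {v: list(next(iter(cpt[v].values()))) for v in parents}
    names = list(states)
    for combo in itertools.product(*(states[n] for n in names)):
        yield dict(zip(names, combo))


# The factorized joint distribution sums to one over all joint states.
total = sum(joint_probability(a) for a in all_assignments())
example = joint_probability({"W": "good", "D": "low", "P": "yes", "L": "small"})
```

Here `example` is simply 0.4 · 0.7 · 0.8 · 0.9, one factor per node. Structure learning would search over `parents`, and parameter learning over `cpt`; neither is shown in this sketch.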

3. Automatic Discretization

• Since we do not want to make assumptions about the functional form of the (conditional) distribution functions, we need to discretize continuous variables for BN learning.

• Λ, a set of interval boundaries for all continuous variables, defines the discretization that "bins" the continuous data, d_c, into a discrete version, d.

• Instead of discretizing prior to BN learning, we aim to optimize the BN and the discretization simultaneously, using an extension of the BN MAP score:

P(DAG,Θ,Λ|d_c) ∝ P(d_c|d,Λ) P(d|DAG,Θ,Λ) P(DAG,Θ,Λ).

• The score also regularizes the number of intervals and network arcs.
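
As a rough illustration of how a boundary set Λ turns continuous data d_c into discrete data d, the following sketch bins two variables with NumPy. The variable names, values and boundaries are invented for illustration. The sketch covers only the binning step; the search over Λ and the score terms are not shown.

```python
import numpy as np


def discretize(d_c, lam):
    """Bin each continuous variable using its sorted interval boundaries."""
    return {name: np.digitize(values, lam[name]) for name, values in d_c.items()}


# Hypothetical continuous observations d_c (values invented for illustration)
d_c = {
    "water_depth_cm": np.array([5.0, 40.0, 120.0, 260.0, 75.0]),
    "duration_h": np.array([2.0, 30.0, 96.0, 12.0, 48.0]),
}

# Hypothetical boundary set Lambda: k boundaries give k + 1 intervals
lam = {
    "water_depth_cm": np.array([50.0, 150.0]),
    "duration_h": np.array([24.0]),
}

d = discretize(d_c, lam)  # interval index per observation and variable
n_intervals = {name: len(b) + 1 for name, b in lam.items()}
```

In a simultaneous optimization, a candidate Λ would be scored together with the DAG, so that adding a boundary only pays off if the discrete data are explained sufficiently better.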

4. A Single Continuous Target

• An originally continuous target variable (e.g. relative loss of building) is rediscretized into a very fine resolution for an almost continuous approximation of the conditional densities.

• The number of realizations per state decreases significantly for the target variable, leading to unreliable parameter estimates, if the maximum likelihood estimator is used.

• A Gaussian kernel density estimator is used instead, exploiting the observations not only of the state of interest, but also of neighbouring states.
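
The difference between the two estimators can be sketched as follows. This is a simplified illustration under assumptions, not the authors' implementation: the bandwidth (given in units of states), the number of states and the observations are all hypothetical, and the observations stand for the cases that fall under a single parent configuration.

```python
import numpy as np


def ml_estimate(states, n_states):
    """Maximum likelihood: relative frequency of each state."""
    counts = np.bincount(states, minlength=n_states)
    return counts / counts.sum()


def kernel_estimate(states, n_states, bandwidth=1.5):
    """Gaussian-kernel smoothing of state counts across neighbouring states.

    bandwidth is an assumed smoothing width, measured in numbers of states.
    """
    counts = np.bincount(states, minlength=n_states).astype(float)
    idx = np.arange(n_states)
    # weights[i, j]: contribution of an observation in state j to state i
    weights = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / bandwidth) ** 2)
    smoothed = weights @ counts
    return smoothed / smoothed.sum()


# Hypothetical observations of a finely discretized target (10 states)
obs = np.array([2, 3, 3, 4, 7])
p_ml = ml_estimate(obs, 10)
p_kde = kernel_estimate(obs, 10)
```

With only five observations spread over ten states, the maximum likelihood estimate assigns probability zero to every unobserved state, whereas the kernel estimate borrows mass from neighbouring states and stays strictly positive.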

[Figure: Effect of discretization refinement. Conditional probability of building loss plotted against log loss (x-axis from −12 to 0) for good precaution and warning versus bad precaution and warning. Shaded histograms show the conditional densities for the coarse automatic discretization. Lines show the corresponding conditional densities for the refined discretization.]

5. Results

[Figure: Comparison of the prediction performance of the BN with currently used approaches. Left panel: RMSE for sd-f, FL and BN. Right panel: correlation coefficient for sd-f, FL and BN.]

sd-f: stage damage function, which depends only on water depth and object class; FL: FLEMOps+r, a model developed from the same data set. 100 bootstrap samples, each with 100 events, are drawn from the dataset; the building loss prediction is quantified by root mean squared error (left) and Pearson correlation coefficient (right).
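
The bootstrap evaluation can be sketched roughly as below. The observed and predicted losses are synthetic stand-ins, since the flood data are not reproduced here. The function names are hypothetical, and the sketch assumes that predictions are already available for every event and are only re-evaluated on each resample.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted loss."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted loss."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def bootstrap_scores(y_true, y_pred, n_samples=100, sample_size=100, seed=0):
    """Return an (n_samples, 2) array of (RMSE, correlation) per resample."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_samples):
        # Draw event indices with replacement
        idx = rng.integers(0, len(y_true), size=sample_size)
        scores.append((rmse(y_true[idx], y_pred[idx]),
                       pearson(y_true[idx], y_pred[idx])))
    return np.array(scores)


# Synthetic stand-in data (NOT the flood dataset): relative loss in [0, 1]
rng = np.random.default_rng(1)
observed = rng.uniform(0.0, 1.0, size=1135)
predicted = np.clip(observed + rng.normal(0.0, 0.1, size=1135), 0.0, 1.0)

scores = bootstrap_scores(observed, predicted)
```

Running this once per model (sd-f, FL, BN) would give the score distributions that a box plot comparison summarizes.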

• Box 2 "Bayesian network" shows the network learned from the collected flood data using the automatic discretization approach. The learned BN reveals and confirms non-trivial interactions.

• The performance of the BN (with a refined discretization of the building loss variable) in predicting the building loss is compared to flood damage assessment approaches currently used in Germany (see the performance comparison figure in this section).

• Even though the BN is designed to represent the joint distribution rather than to predict the target variable distribution optimally, the quality of its building loss predictions is comparable to that of existing procedures. Moreover, the BN captures uncertainty and allows reasoning under it.