
TerraFlow

Flow Computation on Massive Grid Terrains

Helena Mitasova
Dept. of Marine, Earth & Atmospheric Sciences, NCSU, USA

Lars Arge, Laura Toma
Dept. of Computer Science, Duke University, USA

http://www.cs.duke.edu/geo*/terraflow

Modeling Flow on Grids

Flow direction
• The direction water flows at a cell

Flow routing
• Compute flow direction for all cells in the terrain, including flat areas

Flow accumulation value
• Total amount of water that flows through a cell per unit width of contour
• Flow is distributed according to the flow directions

Flow accumulation
• Compute flow accumulation values for all cells in the terrain

Modeling Flow

[Figure: Sierra-Nevada DEM; panels: Flow Direction, Flow Accumulation]

Applications

Automatic estimation of various terrain parameters
• Watershed basins
• Stream network
• Topographic indices

Surface saturation, soil water content, erosion, deposition, forest structure, sediment transport, solar radiation

Massive Data

Remote sensing data available
• NASA-SRTM (whole Earth, 5TB at 30m resolution)
• USGS (entire US at 10m resolution)
• LIDAR (1m resolution)

Ex: Appalachian Mountains dataset
• 100m resolution (500MB)
• 30m resolution (5.5GB)
• 10m resolution (50GB)
• 1m resolution (5TB)
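
The sizes in the Appalachian example are consistent with grid size growing with the square of the resolution ratio, since both grid dimensions scale. A minimal Python sketch of that arithmetic follows; `scaled_size_mb` is a hypothetical helper name, and the quadratic scaling is an assumption rather than something stated on the slide.

```python
def scaled_size_mb(base_size_mb, base_res_m, new_res_m):
    """Estimate the dataset size at a different resolution.

    Assumption: the number of cells (and so the size) grows with the
    square of the resolution ratio, because both grid dimensions scale.
    """
    return base_size_mb * (base_res_m / new_res_m) ** 2


# Start from the 100m, 500MB figure and refine the resolution.
for res_m in (30, 10, 1):
    print(f"{res_m}m: about {scaled_size_mb(500, 100, res_m):,.0f} MB")
```

Under this assumption the estimates land close to the listed 5.5GB, 50GB and 5TB.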

Process Massive Data?

GRASS
• r.watershed, ...
• Killed after running for 17 days on a 6700 x 4300 grid (approx. 50MB dataset)

TARDEM
• flood, d8, aread8
• Killed after running for 20 days on a 12000 x 10000 grid (approx. 240MB dataset)
• CPU utilization 5%, 3GB swap file

ArcInfo
• flowdirection, flowaccumulation
• Can handle the 130MB dataset
• Doesn’t work for datasets bigger than 2GB

TerraFlow

TerraFlow is our suite of programs for flow routing and flow accumulation on massive grids [ATV`00, AC&al`02]

Flow routing and flow accumulation are modeled as graph problems and solved in optimal I/O bounds

Efficient
• 2-1000 times faster on very large grids than existing software

Scalable
• 1 billion elements!! (>2GB data)

Flexible
• Allows for both D8 and D-inf flow modeling

r.terraflow

Port of TerraFlow into GRASS

Preliminary results on
• Augment with additional features
  • Output plateaus, depressions, tci, water outlet queries, watershed basins
• Comparison with GRASS flow routines
  • r.watershed, r.flow, r.topidx, ...
• Performance results

Outline

Scalability to large data
• Why standard programs are not in general scalable
• One approach to improve scalability
  • I/O-efficient algorithms

r.terraflow
• Algorithm outline
• Related work and programs
• Preliminary comparison and performance results
• Output illustration

Scalability to Massive Data

Why?
Most GIS programs assume data fits in memory and minimize only CPU computation

But... massive data does not fit in main memory!
The OS places data on disk and moves data in and out of memory
• Data is moved in blocks
• Accessing the disk is 1000 times slower than accessing main memory

When processing massive data, disk I/O is the bottleneck, rather than CPU time!

Scalability to Massive Data

How?
Local data accesses vs. scattered data accesses

Example: reading an array from disk
• Array size N = 10 elements
• Disk block size B = 2 elements
• Memory size = 4 elements (2 blocks)

Algorithm 1: loads 10 blocks
1 5 2 6 3 8 9 4 7 10

Algorithm 2: loads 5 blocks
1 2 10 9 5 6 3 4 8 7

N blocks >> N/B blocks
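
The two block counts can be reproduced with a small simulation. The sketch below assumes that each sequence is the on-disk layout, that the program reads the values 1 to N in increasing order, and that memory evicts the least recently used block. None of these details is stated on the slide, and `count_block_loads` is a hypothetical name.

```python
from collections import OrderedDict


def count_block_loads(disk_layout, block_size, memory_blocks):
    """Count block loads needed to read the values in increasing order.

    disk_layout lists the values in the order they are stored on disk.
    Memory holds at most memory_blocks blocks and evicts the least
    recently used one (an assumed replacement policy).
    """
    position = {value: index for index, value in enumerate(disk_layout)}
    cache = OrderedDict()  # block id -> True, ordered by recency of use
    loads = 0
    for value in sorted(disk_layout):
        block = position[value] // block_size
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            loads += 1                     # miss: the block is read from disk
            cache[block] = True
            if len(cache) > memory_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return loads


scattered = [1, 5, 2, 6, 3, 8, 9, 4, 7, 10]
local = [1, 2, 10, 9, 5, 6, 3, 4, 8, 7]
print(count_block_loads(scattered, block_size=2, memory_blocks=2))
print(count_block_loads(local, block_size=2, memory_blocks=2))
```

With N = 10 and B = 2, the scattered layout costs one load per element (N loads), while the local layout costs one load per block (N/B loads).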

Example

r.watershed
• r.watershed -m el=elev_grid dir=dir_grid ac=accu_grid
• Running on a 500MHz PIII, 1GB RAM, FreeBSD
• On the Hawaii dataset we let it run for 17 days, in which it completed 65%

Dataset       Grid dimensions   Elements   r.watershed
Kaweah        1100 x 1400       1.6M       12 min
Puerto Rico   4400 x 1300       6M         5 days
Hawaii        6800 x 4300       28M        26 days
Capdem        12000 x 10000     122M       ?

However good the OS, it cannot change the data access pattern of the program!!
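
The 26-day figure reported for Hawaii is consistent with extrapolating the observed run (17 days for 65% of the work). A small sketch of that arithmetic, assuming progress is linear in time, which the slide does not state; `estimated_total_days` is a hypothetical helper name.

```python
def estimated_total_days(days_elapsed, fraction_done):
    """Linear extrapolation of total running time (assumed model)."""
    return days_elapsed / fraction_done


print(round(estimated_total_days(17, 0.65)))
```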

TerraFlow Approach

Redesign the algorithm to be I/O-efficient
• Block size is large! At least 8KB (32KB, 64KB)
• Compute on the whole block while it is in memory
• Avoid loading a block each time
• Improved locality
• Speedup = block size

I/O-efficient algorithms
• Measure of complexity: number of blocks transferred between main memory and disk

r.terraflow outline

Step 1: Flow routing
Water flows downhill: SFD, MFD
• Compute SFD/MFD flow directions by inspecting 8 neighbor points
• Identify flat areas: plateaus and sinks
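
A minimal in-memory sketch of Step 1 for a single cell. It assumes the grid is a list of rows, that SFD picks the steepest downslope neighbor, and that MFD keeps every downslope neighbor. The function names and the tie-breaking are illustrative and not taken from r.terraflow.

```python
import math

# Row/column offsets of the 8 neighbors of a cell.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]


def downslope_neighbors(elev, r, c):
    """Yield (offset, slope) for every neighbor strictly lower than (r, c)."""
    for dr, dc in NEIGHBORS:
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(elev) and 0 <= nc < len(elev[0]):
            drop = elev[r][c] - elev[nr][nc]
            if drop > 0:
                # Diagonal neighbors are farther away, so divide by distance.
                yield (dr, dc), drop / math.hypot(dr, dc)


def sfd_direction(elev, r, c):
    """Single flow direction: offset of the steepest downslope neighbor.

    Returns None when no neighbor is lower (the cell is on a flat area).
    """
    best = max(downslope_neighbors(elev, r, c),
               key=lambda item: item[1], default=None)
    return None if best is None else best[0]


def mfd_directions(elev, r, c):
    """Multiple flow directions: offsets of all downslope neighbors."""
    return [offset for offset, _ in downslope_neighbors(elev, r, c)]
```

Cells for which `sfd_direction` returns None are the ones that the flat-area handling (plateaus and sinks) has to resolve.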

Flow Routing on Flat Areas

…no obvious flow direction

Plateaus
• Assign flow directions such that each cell flows towards the nearest spill point of the plateau

Sinks
• Either catch the water inside the sink
  • Assign flow directions towards the center of the sink
• Or route the water outside the sink using uphill flow directions
  • Simulate flooding the terrain: sinks become plateaus
  • Assign uphill flow directions on the original terrain by assigning downhill flow directions on the flooded terrain
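
The plateau rule can be sketched as a multi-source breadth-first search that starts at the spill points and grows inward, so each plateau cell points to a neighbor one step closer to its nearest spill point. This is an in-memory illustration only: `route_plateau` and its inputs are hypothetical names, and measuring distance in 8-connected steps is an assumption.

```python
from collections import deque


def route_plateau(plateau_cells, spill_points):
    """Assign a flow target to every plateau cell reachable from a spill point.

    plateau_cells: set of (row, col) cells forming the plateau.
    spill_points:  plateau cells that already have a downslope neighbor.
    Returns a dict mapping each non-spill plateau cell to the neighboring
    cell it flows to.
    """
    flows_to = {}
    visited = set(spill_points)
    queue = deque(spill_points)
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                neighbor = (r + dr, c + dc)
                if neighbor in plateau_cells and neighbor not in visited:
                    visited.add(neighbor)
                    flows_to[neighbor] = (r, c)  # flow back towards the spill point
                    queue.append(neighbor)
    return flows_to
```

After flooding, a filled sink can be treated the same way, since it becomes a plateau of the flooded terrain.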

r.terraflow outline

Step 2: Compute flow accumulation
• Water flows following the flow directions
• Goal: compute the total amount of water through each grid cell
  • Initially one unit of water in each grid cell
  • Every cell distributes water to the neighbors pointed to by its flow direction(s)

All these steps can be solved I/O-efficiently
• Flow routing: modeled as graph problems (breadth-first search, connected components, graph contraction)
• Flow accumulation: sweeping using an I/O-efficient priority queue
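
A small in-memory sketch of the sweep in Step 2. Cells are visited in decreasing elevation order, and flow is passed forward through a priority queue keyed by the receiver's position in that order. Python's `heapq` stands in for the I/O-efficient priority queue, the equal split among receivers is an assumption, and every receiver is assumed to be strictly lower than its sender.

```python
import heapq


def flow_accumulation(elev, directions):
    """Sweep cells from high to low and accumulate flow.

    elev:       dict mapping cell -> elevation.
    directions: dict mapping cell -> list of receiving cells, each assumed
                to be strictly lower than the sending cell.
    Returns a dict mapping cell -> accumulated flow.
    """
    order = sorted(elev, key=lambda cell: (-elev[cell], cell))
    rank = {cell: index for index, cell in enumerate(order)}
    pending = []  # heap of (rank of receiving cell, amount of flow)
    accumulation = {}
    for index, cell in enumerate(order):
        total = 1.0  # initially one unit of water in each cell
        # Collect everything that higher cells sent to this cell.
        while pending and pending[0][0] == index:
            total += heapq.heappop(pending)[1]
        accumulation[cell] = total
        receivers = directions.get(cell, [])
        for receiver in receivers:
            # Assumed rule: split the flow equally among the receivers.
            heapq.heappush(pending, (rank[receiver], total / len(receivers)))
    return accumulation
```

Because messages are popped in exactly the order the sweep reaches their receivers, the grid itself is never accessed at random, which is the property an external-memory priority queue exploits.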

Related Work

TerraFlow’s emphasis
• Computational aspects, not modeling

Flow modeling
• [O’Callaghan and Mark 1984]
  • D8 method for flow accumulation
• [Jenson and Domingue 1988]
  • General technique of flooding
• Software
  • GRASS, ArcInfo, Tardem, Topaz, Tapes-G, RiverTools

GRASS Raster Flow Functions

r.watershed
• Most commonly used. Uses A* algorithm to determine flow of water. Ehlschlaeger, USACERL.
• Input: elevation, [..]
• Output: flow direction, flow accumulation, [watersheds, stream segments, slope length, slope steepness]
• Flow direction grid equivalent to running r.drain for every cell on the grid
• Watershed grid equivalent to running r.water.outlet for multiple outlets

r.drain
• Traces the least-cost (steepest-downslope) flow path from a given cell. Stops in pits.
• Input: elevation, point coordinates
• Output: least-cost path

r.water.outlet
• Generates a watershed basin from a flow direction map. Ehlschlaeger, USACERL.
• Input: flow direction (from r.watershed), basin coordinates
• Output: watershed basin map

GRASS Raster Flow Functions

r.basin.fill
• Generates a raster map of watershed subbasins. Larry Band.
• Input: stream network (from r.watershed), thinned ridge network (by hand!)
• Output: watershed subbasins

r.topmodel, r.topidx
• Simulates TOPMODEL, Keith Beven.
• Input: elevation, basin, TOPMODEL parameters file
• Output: flow direction, filled elevation, tci, watersheds, [..]

r.flow, r.flowmd
• Constructs flowlines, flowpath lengths and flowline densities. Flowlines stop in pits. Mitas, Mitasova, Hofierka, Zlocha.
• Input: elevation, [..]
• Output: flowline density, flowlines (vector), lengths

More complex models
• r.water.fea - Finite element analysis program for hydrologic simulations
• r.hydro.CASC2D - Fully integrated distributed cascaded 2D hydrologic modeling
• r.wrat - Water Resource Assessment Tool

r.terraflow features

Input
• elevation grid

Output
• flow direction grid
  • SFD (D8) single flow directions
  • MFD (Dinf) multiple flow directions
• flow accumulation grid
  • Option to switch to SFD when the flow value exceeds a user-defined threshold
• topographic convergence index (tci) grid
• plateau and depressions grid
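
The slides do not define tci. A common definition of the topographic convergence (wetness) index is ln(a / tan β), where a is the flow accumulation per unit contour width and β is the local slope. The sketch below uses that form. Whether r.terraflow computes exactly this is an assumption here, and the `min_tan` guard for flat cells is purely illustrative.

```python
import math


def topographic_convergence_index(flow_accumulation, slope_radians,
                                  min_tan=1e-6):
    """Return ln(a / tan(beta)) for one cell.

    min_tan is an assumed lower bound on tan(beta) that avoids a division
    by zero on flat cells; it is not taken from r.terraflow.
    """
    tan_beta = max(math.tan(slope_radians), min_tan)
    return math.log(flow_accumulation / tan_beta)


# More accumulated flow or a gentler slope both raise the index.
print(topographic_convergence_index(100.0, math.radians(5)))
print(topographic_convergence_index(100.0, math.radians(30)))
```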

GRASS:> r.terraflow help

Description:
  Flow computation for massive grids.

Usage:
  r.terraflow [-sq] elev=name filled=name direction=name watershed=name
    accumulation=name tci=name [d8cut=value] [memory=value]
    [STREAM_DIR=name] [stats=name]

Flags:
  -s   SFD (D8) flow (default is MFD)
  -q   Quiet

Parameters:
  elev           Input elevation grid
  filled         Output (filled) elevation grid
  direction      Output direction grid
  watershed      Output watershed grid
  accumulation   Output accumulation grid
  tci            Output tci grid
  d8cut          If flow accumulation is larger than this value it is routed using SFD (D8) direction
                 (meaningfull only for MFD flow only). default: infinity
  memory         Main memory size (in MB) default: 300
  STREAM_DIR     Location of intermediate STREAMs default: /var/tmp
  stats          Stats file default: stats.outv

Preliminary Experimental Results

PIII dual 1GHz processor, 1GB RAM

Dataset             Grid dimensions   Grid size (million elements)   r.terraflow   r.watershed
Kaweah              1163 x 1424       1.6                            1.85 min      9.2 min
Puerto Rico         4452 x 1378       5.9                            4.65 min      93 min
Sierra Nevada       3750 x 2672       9.5                            19.22 min     18.2 hours
Hawaii              6784 x 4369       28.2                           22.35 min
Lower New England   9148 x 8509       77.8                           114 min
Panama              11283 x 10862     122.5                          3.5 hr

r.watershed, remaining entries: killed after 6 days; < 1% done
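
For the three datasets where both programs have a completed running time above, the speedup follows by dividing the two timings. A short sketch of that arithmetic, assuming the timings pair up with the datasets in the order listed (Kaweah, Puerto Rico, Sierra Nevada):

```python
# (r.terraflow minutes, r.watershed minutes); 18.2 hours converted to minutes.
timings_min = {
    "Kaweah": (1.85, 9.2),
    "Puerto Rico": (4.65, 93.0),
    "Sierra Nevada": (19.22, 18.2 * 60),
}

speedups = {
    name: watershed / terraflow
    for name, (terraflow, watershed) in timings_min.items()
}

for name, speedup in speedups.items():
    print(f"{name}: r.terraflow is about {speedup:.1f}x faster")
```

The ratio grows with grid size, from roughly 5x on the smallest grid to roughly 57x on Sierra Nevada.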

[Figure: Panama DEM]

[Figure: Panama, r.terraflow MFD]

[Figure: r.terraflow MFD, zoom, 3D]

[Figure: r.terraflow SFD, zoom, 3D]

[Figure: r.terraflow MFD, zoom, 2D]

[Figure: r.terraflow SFD, zoom, 2D]

[Figure: r.terraflow MFD TCI, zoom, 2D]

[Figure: r.terraflow SFD TCI, zoom, 2D]

[Figure: Flat DEM]

[Figure: r.terraflow MFD]

[Figure: r.terraflow SFD]

[Figure: r.watershed]

Conclusions/Future Work

Work in progress
• More features
  • Water outlet queries
  • Watershed delineation
• Experimental analysis

Other features? Modeling? Other (intensive computing, I/O-bound) applications?