TRLN High Performance Data Storage System 21 Sep 2006 Jim Porto Ken Galluppi


TRLN High Performance Data Storage System

21 Sep 2006

Jim Porto

Ken Galluppi

Not Many Answers

• We do not have a large disaster database—no one really does—that’s the problem.

• Data exists in many formats and is distributed across 100 counties and numerous state agencies.

• I think we are all open to using metadata tags, at a minimum, on the data that we do compile.
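As one illustration of what a minimal metadata tag on a compiled dataset might look like, here is a small sketch. The class name `DatasetTag`, its fields, and the example values are all hypothetical; they are not taken from any metadata standard or from the project itself.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DatasetTag:
    """Hypothetical minimal tag; the field names are illustrative only."""
    title: str
    source_agency: str   # who holds the original data
    county: str          # which county the data covers
    data_format: str     # e.g. "shapefile", "csv"
    theme: str           # e.g. one of the data element types


def to_json(tag: DatasetTag) -> str:
    """Serialize a tag so it can be stored alongside the dataset."""
    return json.dumps(asdict(tag), sort_keys=True)


if __name__ == "__main__":
    # Placeholder values, not real records.
    example = DatasetTag(
        title="Example stream network",
        source_agency="Example Agency",
        county="Example County",
        data_format="shapefile",
        theme="Hydrography",
    )
    print(to_json(example))
```

Keeping the tag this small reflects the "at a minimum" framing; a real deployment would more likely adopt an existing standard schema.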

Not Many Answers

• We are open to “standard” metadata formats.

• Size: we estimate that the size of all the databases that we want to compile will be on the order of 100 TB. It depends on what we add, but we could be adding up to 10% per year.

• A sizeable portion of the data could be permanent and will require long-term storage. We hope to develop longitudinal data sets for analysis over time.

• We do not yet know how much data will be generated per month or per year.
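To give a feel for the size estimate above, the sketch below projects storage from a 100 TB starting point at the upper bound of 10% growth per year. It assumes the growth compounds annually, which the slides do not state; the function name `project_size_tb` is made up for this example.

```python
from typing import List


def project_size_tb(initial_tb: float, annual_growth: float, years: int) -> List[float]:
    """Return projected sizes in TB, starting with year 0.

    Assumes compound growth: each year adds `annual_growth` (a fraction,
    e.g. 0.10 for 10%) of the previous year's total.
    """
    sizes = [initial_tb]
    for _ in range(years):
        sizes.append(sizes[-1] * (1.0 + annual_growth))
    return sizes


if __name__ == "__main__":
    # 100 TB today, growing by up to 10% per year, over ten years.
    for year, size in enumerate(project_size_tb(100.0, 0.10, 10)):
        print(f"year {year:2d}: {size:7.1f} TB")
```

Under these assumptions the total would be about 259 TB after ten years, so the upper-bound growth rate matters more for long-term planning than the initial size.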

Not Many Answers

• We do not yet know either the granularity or the frequency of access.

• The software is not complex at this time, but developing very large database algorithms to tie spatial features together, to provide 3D spatial analysis, and to provide time series analysis for visual image displays and comparisons could become fairly complex.

The Database

NC OneMap Data Element Types (Example)

Economy

Health

Environment

Socio-Demographic

Bio-Habitat

Geophysical

Weather

Land Cover

Utility-Telecom

Imagery-Basemap

Elevation

Location-Geodetic

Structures

Transportation

Cadastral

Boundaries

Hydrography

Problems With the NC OneMap DB

• Display only: data cannot be manipulated once displayed.

• Distributed across 100 counties and multiple state agencies.

• Multiple formats for databases.

• Access and security concerns.

• Not all data is captured.

A Disaster GEO-Informatics Database

[Diagram labels: County Databases; State Databases; WMS; NC Map; 1000 TB TRLN Database (Research Modeling, Operations, Planning)]

Characteristics of Database

• Could have a significant archive component.

• Could have a significant transient component requiring periodic flushing (e.g., weather radar).

• Builds relationships between permanent and transient data.

• One model is to link to distributed databases (local, state, and federal), rapidly build transient databases both for mapping and for data manipulation, AND to partner with Triangle Universities to archive static data in data depositories.

Computer Science Issues

• How to maintain assembled data for transient use once maps have been rendered.

• Management of large data sets assembled on the fly.

• Time sampling and flushing of unused pieces.

• Data ownership.

• Data security.
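The "time sampling and flushing of unused pieces" issue above can be pictured with a small sketch: a holder for assembled transient data that records when each piece was last used and drops pieces that have sat idle too long. This is only one possible policy (idle-time eviction), and the names `TransientStore`, `max_idle_seconds`, and `flush_unused` are invented for the illustration.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class _Entry:
    value: Any
    last_access: float  # clock reading at the most recent put/get


class TransientStore:
    """In-memory holder for assembled data, flushed when left unused."""

    def __init__(self, max_idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_idle = max_idle_seconds
        self._clock = clock  # injectable so the policy can be tested
        self._entries: Dict[str, _Entry] = {}

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = _Entry(value, self._clock())

    def get(self, key: str) -> Any:
        """Return the stored value (or None) and mark it as recently used."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        entry.last_access = self._clock()
        return entry.value

    def flush_unused(self) -> List[str]:
        """Drop entries idle longer than max_idle_seconds; return their keys."""
        now = self._clock()
        stale = [key for key, entry in self._entries.items()
                 if now - entry.last_access > self._max_idle]
        for key in stale:
            del self._entries[key]
        return stale
```

A periodic job would call `flush_unused()`; data such as weather radar frames that nobody has requested recently would then be released while actively used layers stay in place.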

Additional Computer Science Issues

• Data integrity compared to archived data.

• Bandwidth to move data around the network.

• Visualization of large datasets.

• How to build applications on top of large data sets (e.g., spatial analysis in standard GIS packages is not feasible).
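For the data integrity issue above, one common approach is to compare a cryptographic digest of a working copy against the digest of the archived copy. The sketch below uses SHA-256 from Python's standard library and reads files in chunks so that large files do not need to fit in memory. The function names and the choice of SHA-256 are assumptions for this example, not something the slides specify.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_archive(working_copy: str, archived_copy: str) -> bool:
    """True if the working copy is byte-for-byte identical to the archive."""
    return sha256_of_file(working_copy) == sha256_of_file(archived_copy)
```

In practice the archive's digest would be computed once and stored with the data, so that only the working copy has to be re-read when checking.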

Uses of Database

• Development of new modeling and display software that can handle 3D, time series, and very large databases.

• Substantive research into disaster vulnerabilities through large-scale modeling.

• Source of regional and state operational information.

• Source of regional and state planning information.

Benefits of Database

• Aid to Disaster Decision Makers (General Assembly is increasingly interested in the State’s Disaster Response).

• Aid for Scenario construction and testing to improve preparedness.

• Cross-discipline approach bringing in many professions (for example, environmental quality, meteorology, public health, economics, planning and land use, agriculture, and forestry).

• Regional economic analysis more feasible.

Benefits of Database

• Gain experience in working with ultra large databases.

• Develop new software tools to integrate, visualize, and analyze very large, diverse databases.

• Develop solutions for security and working agreements for data ownership.

Final Thoughts

• We want to advance the state of knowledge in the use of large datasets.

• We want to give the Triangle Universities a distinctive project that will garner national recognition.

• AND we want to provide relevant research that aids the citizens of NC (especially since this project is being funded by tax dollars).