M.Lautenschlager (WDCC / MPI-M) / 27.10.05 / 1
WS Spatiotemporal Databases for Geosciences, Biomedical Sciences and Physical Sciences
Edinburgh, November 1st + 2nd, 2005
World Data Center Climate: Terabyte Data Storage in a Relational Database System
WDCC Home: www.wdcc-climate.de / WDCC Contact: [email protected]
Michael Lautenschlager, Hannes Thiemann and Frank Toussaint
ICSU World Data Center Climate
Model and Data / Max-Planck-Institute for Meteorology
Hamburg, Germany
Content:
Introduction of WDCC
CERA2 Data Model
Data Access
Connection to Mass Storage Archive
Summary
WDCC Content
ERA40
IPCC
CEOP
BALTEX
HOAPS
CARIBIC
WOCE
ERA15/40
NCEP
GEBCO
COSMOS
Simulations @ MPI, GKSS, …
Data from Earth System Modelling and Related Observations
EH5/MPI-OM
IPCC-AR4
Start: Approved in January 2003
Maintenance: Model and Data (M&D/MPI-M) and German Climate Computing Centre (DKRZ)
October 2005: 580 experiments / 68,000 data sets
WDCC Access
WDCC Size
4.6 Billion BLOBs
WDCC DB Storage
Storage of global coverages per file or BLOB:
• all levels, all parameters, arbitrary time intervals
• all levels, all parameters, 1 moment (6 by 6 hours)
• 1 level, 1 parameter, 1 moment (= 1 BLOB = 1 global field)
[Figure: data cubes with axes parameters, levels, and time (days/4)]
How we get the grid data:
• files from climate model
• postprocessing step 1: homogenizing time and calculation of diagnostics
• postprocessing step 2: isolation of levels & parameters and creation of BLOB table input
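The finest storage layout (1 level, 1 parameter, 1 moment = 1 BLOB) can be illustrated with a minimal sketch. It assumes a hypothetical in-memory model output indexed as time → parameter → level → flat list of grid values. The function name `split_into_blobs`, the sample values, and the 32-bit float packing are illustrative assumptions; the WDCC postprocessing itself is not shown in the slides, and the archive stores standard formats such as GRIB or NetCDF.

```python
import struct
from typing import Dict, Iterator, List, Tuple

# Hypothetical model output: time -> parameter -> level -> global field,
# where a global field is a flat list of grid-point values.
ModelOutput = Dict[str, Dict[str, Dict[int, List[float]]]]


def split_into_blobs(output: ModelOutput) -> Iterator[Tuple[str, int, str, bytes]]:
    """Yield one (parameter, level, time, blob) row per global field."""
    for time, params in sorted(output.items()):
        for param, levels in sorted(params.items()):
            for level, values in sorted(levels.items()):
                # Pack the field as little-endian 32-bit floats (assumption;
                # the real archive stores GRIB or NetCDF records).
                blob = struct.pack(f"<{len(values)}f", *values)
                yield param, level, time, blob


# Made-up sample: two time steps, one parameter, two levels.
output = {
    "2005-10-27T00": {"temperature": {850: [280.5, 281.0], 500: [255.0, 254.5]}},
    "2005-10-27T06": {"temperature": {850: [279.5, 280.0], 500: [254.0, 253.5]}},
}
rows = list(split_into_blobs(output))
print(len(rows))  # one BLOB per (parameter, level, time) combination
```

Each yielded row corresponds to one entry of a BLOB table input, so a time series of a single variable on a single level becomes a sequence of rows that differ only in the time key.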
Data Model
CERA 1) Concept: Semantic Data Management
(I) Data catalogue and Unix files (pointer or BLOB-table entry)
• Enable search and identification of data
• Allow for data access as they are (coarse granularity)
(II) Application-oriented data storage
• Time series of individual variables are stored as BLOB entries in DB tables (fine granularity)
• Allow for fast and selective data access
• Storage in standard data format (GRIB, NetCDF)
• Allow for application of standard data processing routines (PINGOs, CDOs)
1) Climate and Environmental data Retrieval and Archiving
WDCC Data Topology
Level 1 interface: metadata entries (XML, ASCII) + data files
Level 2 interface: separate files containing BLOB table data in application-adapted structure (time series of single variables)
Experiment Description
Unix-Files Table / Pointer
Dataset 1 Description
Dataset n Description
BLOB Data Table
BLOB Data Table
A BLOB DB table corresponds to a scalable, virtual file at the operating system level.
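The idea that a BLOB table behaves like a scalable virtual file can be sketched with Python's built-in `sqlite3` module standing in for the Oracle database. The table and column names (`blob_data`, `dataset`, `time_step`, `field`) and the data set name `EXP1_T850` are hypothetical and not taken from the CERA schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per time step of a single-variable data set.
conn.execute(
    "CREATE TABLE blob_data ("
    " dataset TEXT NOT NULL,"
    " time_step INTEGER NOT NULL,"
    " field BLOB NOT NULL,"
    " PRIMARY KEY (dataset, time_step))"
)

# Appending rows extends the "virtual file" without rewriting earlier entries.
for step in range(8):
    payload = bytes([step]) * 16  # placeholder for one encoded global field
    conn.execute(
        "INSERT INTO blob_data VALUES (?, ?, ?)", ("EXP1_T850", step, payload)
    )


def read_steps(dataset: str, first: int, last: int) -> list:
    """Return the BLOBs of an inclusive time-step range, in time order."""
    cur = conn.execute(
        "SELECT field FROM blob_data"
        " WHERE dataset = ? AND time_step BETWEEN ? AND ?"
        " ORDER BY time_step",
        (dataset, first, last),
    )
    return [row[0] for row in cur]


subset = read_steps("EXP1_T850", 2, 4)  # three fields, nothing else is read
```

Unlike a flat file, the range query touches only the requested time steps, which is the selective access that the fine-granularity storage is meant to provide.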
CERA Data Model
Entry
Reference
Status
Distribution
Contact
Coverage
Parameter
Spatial Reference
Local Adm.
Data Access
Data Org
CERA Modules
3 Modules:
• DATA_ACCESS: for automated data access (remote data access)
• DATA_ORG: organization of grid data (geo-references of grid points in BLOBs)
• CODE: matching of (internal) model code numbers
Data Model Functions
The CERA2 data model …
• allows for data search according to discipline, keyword, variable, project, author, geographical region and time interval, and for data retrieval.
• allows for specification of data processing (aggregation and selection) without attaching the primary data.
• is flexible with respect to local adaptations, to storage of different types of geo-referenced data, and to definition of data topologies (hierarchical, network, …).
• is open for cooperation and interchange with other database systems (e.g. FGDC metadata standard and ISO 19115 included).
But: it is not the simplest data model for each single application.
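A catalogue search by keyword, geographical region, and time interval can be sketched as a single SQL query. This is a minimal illustration using Python's `sqlite3`; the one-table schema, the column names, and the sample rows are all made up for the example, whereas CERA2 itself distributes this information over many metadata tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified catalogue table with made-up sample rows.
conn.execute(
    "CREATE TABLE entry (name TEXT, keyword TEXT, project TEXT,"
    " lat_min REAL, lat_max REAL, year_start INTEGER, year_end INTEGER)"
)
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("DS_A", "temperature", "PROJECT_X", -90.0, 90.0, 1860, 2100),
        ("DS_B", "temperature", "PROJECT_Y", 50.0, 70.0, 1990, 2000),
        ("DS_C", "salinity", "PROJECT_Z", -90.0, 90.0, 1990, 1998),
    ],
)


def search(keyword: str, lat: float, year_from: int, year_to: int) -> list:
    """Names of entries matching keyword, latitude, and an overlapping period."""
    cur = conn.execute(
        "SELECT name FROM entry"
        " WHERE keyword = ?"
        " AND lat_min <= ? AND lat_max >= ?"
        # Two intervals overlap when each one starts before the other ends.
        " AND year_start <= ? AND year_end >= ?"
        " ORDER BY name",
        (keyword, lat, lat, year_to, year_from),
    )
    return [row[0] for row in cur]


hits = search("temperature", 0.0, 1995, 1996)
```

The search runs on metadata only; no BLOB data is read until a data set has been identified.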
Data Access
Web Access to WDCC
Access path: METADATA / DATA
• GUI: display in applet / JDBC
• jblob script: search for DS names / JDBC, jblob –f …
• http: html display, xml download (ISO, DC, …) / http download, URL: http://…
Interactive Catalogue Access
[Diagram: web browser, request: URL, Internet, Application Server (Servlet / JSP), http: html, dynamic html pages]
Catalogue access via WWW
• URL parsed by JSP
• integrated DB retrieval by JSP
• response in standard html
• efficient administration of detailed meta information
HTTP and JDBC Data Download
[Diagram: web browser, Internet, Application Server (Servlet / JSP), program "jblob", write to client disk]
Data download via WWW
• request: html form
• request handled by JSP
• return of binary file (http file download, written to client disk)
Data download via script/batch
• request: jdbc (program "jblob")
• standard client side jdbc retrieval
• return of binary file (jdbc file download, written to client disk)
XML Interface for http Metadata Output
[Diagram: user applications, request: URL, Internet, Application Server, xsql query, xsl mapping, http: XML]
Various metadata formats: raw xml, xhtml, ISO xml, DC xml, ...
Metadata access via WWW:
• xsql query to DB
• xml output from DB
• xsl mapping to any metadata format
See wini.wdc-climate.de
http Data Output
[Diagram: user applications, request: URL, Internet, Application Server (Java Servlet), http: plain, bin, html]
Various data formats: plain ASCII, html tables, binary objects, ...
Data access via WWW
• URL parsed by servlet
• query: DB access by jdbc
• response in any format
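The request flow for data output (parse the URL, query the database, answer in the requested format) can be sketched in a few lines. The slides describe a Java servlet with JDBC; the sketch below swaps in plain Python with a dictionary as a stand-in for the database. The URL, the query parameters `dataset`, `step`, and `format`, and the sample values are assumptions made for illustration.

```python
from html import escape
from urllib.parse import parse_qs, urlparse

# Stand-in for the database: (dataset, time step) -> list of values.
FAKE_DB = {("EXP1_T850", 0): [280.5, 281.0, 279.75]}


def handle_request(url: str) -> str:
    """Parse the URL, look up the data, and render it in the requested format."""
    query = parse_qs(urlparse(url).query)
    dataset = query["dataset"][0]
    step = int(query["step"][0])
    fmt = query.get("format", ["plain"])[0]  # plain ASCII is the default
    values = FAKE_DB[(dataset, step)]
    if fmt == "html":
        cells = "".join(f"<td>{escape(str(v))}</td>" for v in values)
        return f"<table><tr>{cells}</tr></table>"
    if fmt == "plain":
        return " ".join(str(v) for v in values)
    raise ValueError(f"unsupported format: {fmt}")


plain = handle_request("http://example.org/data?dataset=EXP1_T850&step=0")
html_out = handle_request(
    "http://example.org/data?dataset=EXP1_T850&step=0&format=html"
)
```

Because the whole request is encoded in the URL, user applications can fetch data with any HTTP client and need no database driver.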
Connection to Mass Storage Archive
Oracle DBMS + HSM
DXDB: Unitree client on DB machines for communication between Oracle DB and tape archive
Tapes, Disks
Use of DXDB
DXDB is used for
• ordinary Oracle datafiles
• redo logs
• backup
[Diagram: table partitions in tablespaces on dxdb, with Migout / Migin to the backend]
TBS - RW: Tbl Partition 1
TBS - RW: Tbl Partition 2
TBS - RO: Tbl Partition 1
All tablespaces are moved "at once" to dxdb.
Migout / Migin
• Migout takes place after files haven't been modified for x minutes.
• Only one migout process per dxdb filesystem.
• Migin takes place immediately after a file is requested.
• Only parts accessed are retrieved from the backend storage.
• One migin process per requested file.
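The migout and migin rules can be mimicked with a small simulation. This is a sketch under stated assumptions, not DXDB or Unitree code: the class `ManagedFile`, the notion of numbered file parts, the file names, and the simulated clock in minutes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ManagedFile:
    """A file on the dxdb filesystem, viewed as a set of numbered parts."""
    name: str
    last_modified: float      # minutes on a simulated clock
    resident: Set[int]        # parts currently present on disk
    on_backend: bool = False  # True once migout has copied the file


def migout(files: List[ManagedFile], now: float, idle_minutes: float) -> List[str]:
    """Copy files left unmodified for `idle_minutes` to the backend."""
    migrated = []
    for f in files:  # one sequential loop models the single migout process
        if not f.on_backend and now - f.last_modified >= idle_minutes:
            f.on_backend = True
            migrated.append(f.name)
    return migrated


def migin(f: ManagedFile, wanted: Set[int]) -> Set[int]:
    """Fetch only the requested parts that are not already on disk."""
    fetched = wanted - f.resident
    f.resident |= fetched
    return fetched


files = [
    ManagedFile("tbs1.dbf", 0.0, {0, 1, 2, 3}),
    ManagedFile("tbs2.dbf", 50.0, {0, 1, 2, 3}),
]
migrated = migout(files, now=60.0, idle_minutes=30.0)
files[0].resident.clear()         # pretend the disk copy was purged later
first = migin(files[0], {1, 2})   # both parts come from the backend
second = migin(files[0], {2, 3})  # part 2 is already back on disk
```

The second request transfers a single part, which is why applications do not have to wait for a complete file restore.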
Purging
[Diagram: fill level of the dxdb filesystem with low-water mark (LWM) and high-water mark (HWM)]
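The slide gives only the labels LWM, HWM, and Purging. A common watermark scheme in hierarchical storage management, assumed here rather than confirmed by the source, starts purging when disk usage exceeds the high-water mark and stops once it falls to the low-water mark, removing the least recently accessed files first. The function name, the file records, and the thresholds below are hypothetical.

```python
from typing import Dict, List, Tuple


def purge(
    files: Dict[str, Tuple[int, float]],  # name -> (size, last access time)
    capacity: int,
    hwm: float,
    lwm: float,
) -> List[str]:
    """Return names to purge: oldest access first, until usage reaches the LWM."""
    used = sum(size for size, _ in files.values())
    if used <= hwm * capacity:
        return []  # below the high-water mark: nothing to do
    purged = []
    for name, (size, _) in sorted(files.items(), key=lambda kv: kv[1][1]):
        if used <= lwm * capacity:
            break
        used -= size  # assumes the file already has a copy on the backend
        purged.append(name)
    return purged


# Made-up example: 95 of 100 units used, HWM at 90 %, LWM at 50 %.
disk = {"a": (40, 1.0), "b": (30, 3.0), "c": (25, 2.0)}
to_purge = purge(disk, capacity=100, hwm=0.9, lwm=0.5)
```

Keeping a gap between the two marks avoids purging on every small write once the filesystem is nearly full.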
Pro
• It works.
• It's fast.
• Applications don't have to wait until files are completely restored from tapes.
Contra
• It works, if the backend works.
• Dxdb is not supported by Oracle.
• Oracle's officially supported backend requirements do not necessarily match requirements from other applications like HSM systems (i.e. connection to Unitree is not standardised).
Summary
Efficient handling of detailed metadata
• easy and structured administration of > 60 metadata tables
• access support: Java Server Pages (JSP), Servlets, jdbc, xsql, including standard DB features (sql, views, triggers, ...)
Efficient handling of fine granularity data
• random access to arbitrary time steps of single parameters
• access support: Java Server Pages (JSP), Servlets, jdbc, including standard DB features (authorisation, ...)
• transparent migration of bulk data to tape
The Winter TopTen Program identifies the world’s largest and most heavily used databases.
Email received on September 13th: "… Congratulations on achieving Grand Prize award winner status (1) in Database Size, Other, All and TopTen Winner status Database Size, Other, Linux; Workload, Other, Linux in Winter Corp.'s 2005 TopTen Program! …"
(1) Grand prizes are awarded for first place winners in the All Environments categories only.
WDCC's CERA DB has been identified as the largest Linux DB.
http://www.wintercorp.com/VLDB/2005_TopTen_Survey/2005TopTenWinners.pdf