CERA / WDCC
Hannes Thiemann
Max-Planck-Institut für Meteorologie, Modelle und Daten
hannes.thiemann @ zmaw.de
NCAR, October 27th – 29th, 2008
Contents
Statistics
Requirements + Features
General architecture
Implementation (current and new)
Migration
Summary
Basic Statistics
WDCC / CERA: General Statistics at 01-10-2008 00:00:10
Database size (TByte): 370
Number of BLOBs: 6663287791 (6.6 billion)
Data access by fields and not by files.
Number of experiments: 1146
Number of datasets: 142062
Total size divided by number of BLOBs gives the average size of data access granules: 50 - 60 kB/BLOB
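The average granule size quoted above can be checked with a few lines of arithmetic. This is a minimal sketch using only the two figures from this slide; whether "TByte" and "kB" are meant in decimal or binary units is not stated, so both readings are computed. The function name is illustrative.

```python
# Figures taken from the statistics slide.
DATABASE_SIZE_TBYTE = 370
NUMBER_OF_BLOBS = 6_663_287_791


def average_blob_size(size_tbyte: float, blobs: int, binary: bool = False) -> float:
    """Average BLOB size in kB (base 1000) or KiB (base 1024, binary=True)."""
    base = 1024 if binary else 1000
    size_bytes = size_tbyte * base ** 4  # TByte -> bytes
    return size_bytes / blobs / base     # bytes per BLOB -> kB or KiB


decimal_kb = average_blob_size(DATABASE_SIZE_TBYTE, NUMBER_OF_BLOBS)
binary_kib = average_blob_size(DATABASE_SIZE_TBYTE, NUMBER_OF_BLOBS, binary=True)
print(f"{decimal_kb:.1f} kB (decimal units), {binary_kib:.1f} KiB (binary units)")
```

Under either unit convention the result falls inside the 50 - 60 kB range given on the slide.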
Users by continent
[Pie chart: Active Users 1-Jan-2008 until 14-Oct-2008. Legend: Campus, Germany, Europe, AF+OC+SA, North America, Asia. Slice labels: 12%, 25%, 27%, 4%, 13%, 19%.]
Download destinations
[Pie chart: Download destinations 1-Jan-2008 until 14-Oct-2008. Legend: Campus, Germany, Europe, OC, AF, SA, North America, Asia. Slice labels: 3%, 12%, 6%, 0%, 14%, 65%.]
Records per download
[Bar chart: percent (y axis, 0 to 100) against records per download (x axis: 1, 12, 120, 240, 600, 1200, 12000). Bar values: 66, 67, 72, 85, 87, 90, 98.]
Record size
[Chart: record size in bytes (y axis, logarithmic, 1 to 10000000000) against percent (x axis: 1, 8, 29, 32, 35, 84, 89, 92, 96, 99).]
Requirements and constraints
Access over WAN
Downloads typically quite small, but huge downloads to some extent.
Small downloads imply that users are not willing to wait long …
We cannot scan through large files for each download
Granularity has to be small
Datatypes
Model data
  Climatological runs (global and regional) (IPCC, …)
  Weather forecasts (DPHASE, CEOP, …)
Reanalysis data
Observational data (COPS, CARIBIC, …)
Satellite data products
Formats
CERA provides the ability to store data of any format.
These are the formats used:
GRIB (60%)
NetCDF (18%)
Other (22%)
General Architecture
[Diagram: components Proxy, Webserver, Appl. Server (midtier), Metadata, Data. Metadata blocks: Entry, Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Local Adm., Data Access, Data Org. Data access functions: select timestep + region, convert format.]
Storage within CERA
[Diagram: database table holding the data of a single variable, one row per timestep, with an index.]
1 Data of timestep i
2 Data of timestep i+1
3 Data of timestep i+2
…
n Data of timestep i+n
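The storage layout on this slide (one table per variable, one row per timestep, found through an index) can be sketched as follows. This is an illustration only: it uses Python's built-in SQLite as a stand-in for the Oracle databases named in the talk, and the table name `temperature_2m`, the column names, and the helper functions are hypothetical.

```python
import sqlite3


def build_variable_table(conn, table, records):
    """Create one table for a single variable; one BLOB row per timestep.

    The PRIMARY KEY on timestep gives the index used for lookups.
    The table name is trusted input in this sketch (hypothetical schema).
    """
    conn.execute(
        f"CREATE TABLE {table} (timestep INTEGER PRIMARY KEY, data BLOB NOT NULL)"
    )
    conn.executemany(f"INSERT INTO {table} (timestep, data) VALUES (?, ?)", records)


def fetch_timesteps(conn, table, first, last):
    """Return only the rows for the requested timesteps, via the index."""
    cur = conn.execute(
        f"SELECT timestep, data FROM {table} "
        "WHERE timestep BETWEEN ? AND ? ORDER BY timestep",
        (first, last),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
# 1000 dummy timesteps of 8 bytes each stand in for real field data.
records = [(t, bytes([t % 256]) * 8) for t in range(1, 1001)]
build_variable_table(conn, "temperature_2m", records)

selected = fetch_timesteps(conn, "temperature_2m", 10, 12)
print([(t, len(blob)) for t, blob in selected])
```

A request for three timesteps touches three rows, not the whole data set, which is the point of keeping the granularity small.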
Handicap
Handicap: not enough disk space available
Data stored within database: approx. 400 TB
Disks available: approx. 24 TB
Database has been coupled transparently to the HSM system
How do we avoid frequent tape accesses?
Big cache
Store data as close as possible according to the needs of users: split into single variables
Data migration
[Diagram labels: TBS - RW, TblPartition 1; TBS - RW, TblPartition 2; dxdb; TBS - RO, TblPartition 1; Migout / Migin.]
All tablespaces are moved “at once” to dxdb
Inside the datafile
[Diagram: contents of a datafile. Labels: Header 128k, Primary Key, Lob Index, Table, Blob data.]
Frontend versus Backend
[Diagram: filesystem frontend and HSM backend. Labels: Header 128k (shown twice), Part 1 = 512 MB, Part 2 = 512 MB.]
Retrieving data
[Diagram: retrieval in five numbered steps (1 to 5). Labels: Header 128k, Tape Request.]
Warehouse features
Compression – nothing special used within the server
Partitioning – allow parts of data to be moved to HSM
Backup
  Nologging - beware of crash …
  Read only - two copies on tape
New implementation
Metadata database will stay as is
The Oracle databases holding the data will be replaced by a new in-house development
Why?
There is a certain risk that a future version of Oracle may not work with a / any HSM system
In the long run, some license costs will be saved
General Architecture - new
[Diagram labels: Webserver, Appl. Server, Metadata, Data, Oracle-DB, Blobserver.]
CERA-Container
Instead of keeping data within BLOBs in Oracle databases, data records will be kept within so-called CERA Container Files.
Ability to keep a huge number of records.
They provide fast access independent of position within the file (granular access).
They provide fault tolerance against tape damage by keeping checksums within the files.
Read/write operations against container files are enclosed in transactions.
Well-known format
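The talk does not describe the actual CERA Container file layout, so the following is only a toy sketch of two of the listed properties: access to any record by offset without scanning the file, and a per-record checksum that exposes damaged data on read. The record header (payload length plus CRC32), the offset list used as an index, and all function names are assumptions made for illustration; transactions are not covered.

```python
import io
import struct
import zlib

# Hypothetical record header: payload length and CRC32, both unsigned 32-bit.
RECORD_HEADER = struct.Struct("<II")


def append_record(f, payload: bytes) -> int:
    """Append one record to the end of the container; return its offset."""
    offset = f.seek(0, io.SEEK_END)
    f.write(RECORD_HEADER.pack(len(payload), zlib.crc32(payload)))
    f.write(payload)
    return offset


def read_record(f, offset: int) -> bytes:
    """Seek straight to a record and verify its checksum before returning it."""
    f.seek(offset)
    length, expected_crc = RECORD_HEADER.unpack(f.read(RECORD_HEADER.size))
    payload = f.read(length)
    if zlib.crc32(payload) != expected_crc:
        raise ValueError(f"checksum mismatch at offset {offset}")
    return payload


container = io.BytesIO()  # stands in for a file on disk or tape
index = [append_record(container, f"timestep {i}".encode()) for i in range(5)]
print(read_record(container, index[3]))

# Flip one payload byte to simulate media damage; the read must fail loudly.
damaged = bytearray(container.getvalue())
damaged[index[3] + RECORD_HEADER.size] ^= 0xFF
try:
    read_record(io.BytesIO(bytes(damaged)), index[3])
except ValueError as err:
    print("detected:", err)
```

Keeping the checksum next to each record means a damaged region affects only the records it overlaps; the remaining records stay readable through their offsets.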
Migration
Concept / Team (namely Peter Drakenberg, DKRZ)
  Not yet really finished
Software
  First software ready, in order to migrate data
Convert old data
  Started last week, but will take at least a year
Dataflow: outbound
[Diagram: outbound data flow in eight numbered steps (1 to 8). Labels: Webserver, Appl. Server, Metadata, Data, Processing.]
Dataflow: inbound
[Diagram: inbound data flow. Labels: Model run, Postprocessing, GFS, Metadata, Dataserver.]
Summary
CERA allows for the storage of data of different kinds
Format independent
Metadata enables addressing of internal and external data
Users typically fetch only small amounts of data.
System allows for efficient access to small data granules
  By using warehousing functions like partitioning
  By using small Oracle database BLOBs or, in future, CERA Container files.
Thank you!