Michael Lautenschlager, Hannes Thiemann, Frank Toussaint
WDC Climate / Max-Planck-Institute for Meteorology, Hamburg
Joachim Biercamp, Ulf Garternicht, Stephan Kindermann, Wolfgang Stahl
German Climate Computing Centre (DKRZ), Hamburg
CAS2K9, September 13th – 16th, 2009, Annecy, France


Page 1

Michael Lautenschlager, Hannes Thiemann, Frank Toussaint

WDC Climate / Max-Planck-Institute for Meteorology, Hamburg

Joachim Biercamp, Ulf Garternicht, Stephan Kindermann, Wolfgang Stahl

German Climate Computing Centre (DKRZ) Hamburg

CAS2K9, September 13th – 16th, 2009, Annecy, France

Page 2


[Diagram: DKRZ HLRE2 system architecture]

IBM Power6 "blizzard": 2 x login, 250 x compute nodes, 150 TFlops peak
GPFS disk file systems (3 PByte): /work, /pf, /scratch
HPSS tape archive (10 PByte/a): /hpss/arch, /hpss/doku, /dxul/ut, /dxul/utf, /dxul/utd
StorageTek silos (LTO and Titan): total capacity 60,000 tapes, approx. 60 PB
Archive access: pftp, or ssh blizzard, or sftp xtape.dkrz.de with "get /hpss/arch/<prjid>/<myfile>" (see the sketch below)
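To make the access path concrete: a minimal sketch of retrieving one archive file over SFTP, assuming Python with the paramiko library. The host name and archive path come from the slide; the account name, project ID and file name are placeholders kept as on the slide.

    import paramiko

    # Placeholders, as in the slide's "get /hpss/arch/<prjid>/<myfile>".
    PRJID = "<prjid>"
    MYFILE = "<myfile>"

    # Connect to the tape archive access node named on the slide.
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("xtape.dkrz.de", username="<account>")  # placeholder account

    # Fetch one file from the HPSS archive namespace to the local directory.
    sftp = client.open_sftp()
    sftp.get(f"/hpss/arch/{PRJID}/{MYFILE}", MYFILE)
    sftp.close()
    client.close()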

Page 3

Data production on IBM-P6: 50-60 PB/year

Limit for long-term archiving: 10 PB/year

◦ A complete data catalogue entry in WDCC (metadata) is required, but the decision procedure for the transition into the long-term archive is not yet finalized (data storage policy).

Limit for field-based data access: 1 PB/year

◦ Oracle BLOB tables are being replaced by the CERA container file infrastructure, which is developed by DKRZ/M&D.


Page 4


[Chart: archive growth as of Oct. 2008; mid-2009: 10 PB]

Page 5


[Chart: database growth as of Oct. 2008; mid-2009: 400 TB]

Page 6


Page 7

Data system (HPSS) (information on the DKRZ web server)

DXUL/UniTree will be replaced by HPSS (High Performance Storage System). The existing DXUL-administered data, about 9 PetaByte, will be transferred.

6 robot-operated silos with 60,000 slots for T10000 A/B, LTO4, 9940B and 9840C magnetic cartridges provide a primary capacity of 60 PetaByte with 75 tape drives.

The average bandwidth of the data server is at least 3 GigaByte/s while simultaneously reading and writing, with peak flow rates of up to 5 GigaByte/s.

390 TB of Oracle BLOB data transferred into CERA container files


Page 8

9 PB of DXUL/UniTree data have to be transferred to HPSS without copying the data

◦ The 9 PB of DXUL data are stored on 25,000 cartridges in 25 million files
◦ It was not feasible to run two systems in parallel for 3-5 years, which is the estimated time for copying from DXUL/UniTree to HPSS at DKRZ

Challenges of the physical movement from Powderhorn (UniTree) into SL8500 (HPSS):

◦ Technical aspects
◦ Legal aspects
◦ Quality assurance


Page 9

Challenges of the physical movement from Powderhorn (UniTree) into SL8500 (HPSS):

◦ Technical aspects: In principle it is possible to read UniTree cartridges with HPSS, but this had only been tested with old systems and with less complex name spaces (17 name spaces on 3 servers had to be consolidated into 1 HPSS name space).

◦ Legal aspects: An unexpected license problem appeared with the proprietary UniTree library data format. The solution was to write the data library information after consolidation into one large text file (10 GB).

◦ Quality assurance: complete comparison of metadata, plus checksum comparison of a 1% subset of the data files (a sketch of such a spot check follows below).

The transfer to HPSS has been successfully completed; the new system is up and running with the old data.
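A minimal sketch of such a 1% checksum spot check, assuming Python. The manifest file and the staging paths are hypothetical; only the idea of sampling 1% of the files and comparing checksums comes from the slide.

    import hashlib
    import random

    def md5sum(path, chunk=1 << 20):
        # Stream the file so multi-GB archive members fit in memory.
        h = hashlib.md5()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
        return h.hexdigest()

    # Hypothetical manifest: one relative file path per line, with the
    # same files staged from the UniTree source and the HPSS target.
    with open("file_manifest.txt") as f:
        all_files = [line.strip() for line in f]

    # Draw a reproducible 1% sample of the file population.
    random.seed(42)
    sample = random.sample(all_files, k=max(1, len(all_files) // 100))

    mismatches = [p for p in sample
                  if md5sum(f"unitree/{p}") != md5sum(f"hpss/{p}")]
    print(f"checked {len(sample)} files, {len(mismatches)} mismatches")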


Page 10


3 of 6 StorageTek SL8500 silos under construction

Room for 10,000 magnetic cartridges in each silo

Page 11

CERA-2 data model left unchanged

◦ Metadata model modifications are planned depending on the outcome of the EU project METAFOR and CIM (Common Information Model)

WDCC metadata still reside in Oracle database tables, which form the searchable data catalogue.


Page 12


[Diagram: CERA-2 data model (unchanged since 1999) with blocks Entry, Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Local Adm., Data Access, Data Org]

METAFOR / CIM:
• Data provenance information
• Searchable Earth system model description

Page 13

Field-based data access is changing from Oracle BLOB data tables to CERA container files for two reasons:

◦ Financial aspect: Oracle license costs for an Internet-accessible database system of PB size are out of DKRZ's scope.


Page 14

◦ Technical aspect: The BLOB data concept in the range of TB and PB requires seamless data transition between disk and tape in order to keep the RDBMS restartable. This worked for Oracle and UniTree, but it could not be guaranteed for the future by either Oracle or HPSS.

◦ Requirement for the BLOB data replacement: The transfer to CERA container files has to be transparent for CERA-2 and user data access (see the sketch below).
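A minimal sketch of what that transparency could look like, assuming Python; both back ends and their internals are hypothetical, the point is only that the catalogue-facing call keeps one signature while the storage behind it changes.

    # Hypothetical back ends sharing one read interface, so CERA-2 and
    # user data access do not change when BLOB tables are replaced.

    class BlobTableBackend:
        """Field access via a LOB row in a database table (legacy path)."""
        def __init__(self, table):
            self.table = table  # {(var, t): bytes}, stands in for Oracle

        def read_field(self, variable, timestep):
            return self.table[(variable, timestep)]

    class ContainerFileBackend:
        """Field access via a container file's offset/length index."""
        def __init__(self, file, index):
            self.file = file    # open binary file object
            self.index = index  # {(var, t): (offset, length)}

        def read_field(self, variable, timestep):
            offset, length = self.index[(variable, timestep)]
            self.file.seek(offset)
            return self.file.read(length)

    def get_field(backend, variable, timestep):
        # The caller-visible access path is identical for both back ends.
        return backend.read_field(variable, timestep)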


Page 15


[Diagram: matrix of model variables (columns) versus model run time T1, T2, T3, ..., Tn, ..., Tend (rows). 2D variables (T2M, Precip, SLP, ...): small LOBs (180 KB). 3D variables (Temp, water vapour, ...): large LOBs (3 MB). Each column is one data table in CERA-2.]

CERA Container Files
• are LOBs plus an index for random data access (see the sketch below)
• are transparent for field-based data access in WDCC
• include the basic security mechanisms of Oracle BLOBs
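A minimal sketch of the container idea, assuming Python; the record layout and index format are invented for illustration, only "LOBs plus an index for random access" comes from the slide.

    import json
    import struct

    # Hypothetical container layout: concatenated binary LOBs, followed
    # by a JSON index and an 8-byte trailer giving the index position.

    def write_container(path, lobs):
        """lobs: dict mapping (variable, timestep) -> bytes."""
        index = {}
        with open(path, "wb") as f:
            for key, blob in lobs.items():
                index["/".join(map(str, key))] = (f.tell(), len(blob))
                f.write(blob)
            index_offset = f.tell()
            f.write(json.dumps(index).encode())
            f.write(struct.pack("<Q", index_offset))  # trailer

    def read_field(path, variable, timestep):
        """Random access to one LOB without scanning the container."""
        with open(path, "rb") as f:
            f.seek(-8, 2)                      # trailer at end of file
            (index_offset,) = struct.unpack("<Q", f.read(8))
            f.seek(index_offset)
            index = json.loads(f.read()[:-8])  # strip trailer bytes
            offset, length = index[f"{variable}/{timestep}"]
            f.seek(offset)
            return f.read(length)

With the index in hand, a single field is read with a few seeks instead of scanning the whole container, which is what makes field-based access to large, tape-resident containers practical.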

Page 16

Motivated by the long-term archive strategy and scientific applications like CMIP5/AR5, WDCC data access is extended along four paths (sketched below):

◦ CERA Container Files: transparent field-based data access from tapes and disks (substitution of Oracle BLOB data tables)

◦ Oracle BFILEs: transparent file-based data access from disks and tapes

◦ THREDDS Data Server: field-based data access from files on disks (CMIP5/AR5)

◦ Non-transparent data access: URLs provide links to data which are not directly/transparently accessible by WDCC/CERA (e.g. remote data archives)
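Purely as an illustration, a sketch in Python of dispatching a catalogue entry to one of the four access paths above; the access type names, entry fields and handler bodies are all hypothetical.

    # Hypothetical dispatch from an entry's access type to a handler.
    def via_container(entry):
        return f"field from CERA container file {entry['container']}"

    def via_bfile(entry):
        return f"file via Oracle BFILE {entry['bfile']}"

    def via_thredds(entry):
        return f"field from THREDDS Data Server at {entry['tds_url']}"

    def via_url(entry):
        # Non-transparent access: just hand the link back to the user.
        return entry["url"]

    HANDLERS = {"container": via_container, "bfile": via_bfile,
                "thredds": via_thredds, "url": via_url}

    def retrieve(entry):
        return HANDLERS[entry["access_type"]](entry)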

Page 17


[Diagram: WDCC/CERA access architecture. Midtier: application server with TDS (or the like) and LobServer in front of Storage@DKRZ. Storage: HPSS holding the archive (files) and the containers (LOBs). CERA DB layer answers: what, where, who, when, how.]

Page 18

Three major decisions were made in connection with long-term archiving in the transition to HLRE2 and HPSS:

Limitation of annual growth rates
◦ File archive: 10 PB/year
◦ CERA Container Files: 1 PB/year

Development of the CERA Container File infrastructure with emphasis on field-based data access from tapes

Integration of transparent file-based data access into WDCC/CERA in addition to the traditional field-based data access
