HDF/HDF-EOS Workshop III, Sept. 14-16, 1999. Mike Folk, HDF Group, http://hdf.ncsa.uiuc.edu/, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign. Topics: I. Overview; II. NCSA HDF Activities; III. HDF5; IV. HDF4 vs. HDF5.
NCSA/Univ of Illinois at Urbana-Champaign
HDF
HDF/HDF-EOS Workshop III
Sept. 14-16, 1999
Mike Folk, HDF Group
http://hdf.ncsa.uiuc.edu/
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign

Topics
I. Overview
II. NCSA HDF Activities
III. HDF5
IV. HDF4 vs. HDF5

I. HDF Overview

HDF Mission
To develop, promote, deploy, and support open and free technologies that facilitate scientific data storage, exchange, access, analysis and discovery.

What is HDF?
• Scientific data file format & supporting software
• For images, arrays, tables, other structures
• Features
  – Portability across architectures
    • I/O library
    • Files
  – Efficient I/O
  – Efficient storage

Why use HDF?
• Manage data
• Share data
• Use software that understands HDF
• Improve I/O performance
• Improve storage efficiency
• Use an open standard

An HDF File: A Collection of Scientific Data Objects
[Figure: HDF file containing four 3-D arrays]

Mixing HDF Objects in One File
[Figure: one HDF file containing 3-D arrays, raster images, a palette, a group, and a table]
Table:
Lat   lon   temp
----  ----  -----
12    23    3.1
15    24    4.2
17    21    3.6
16    35    5.7

HDF Software
[Figure: HDF software layers, top to bottom]
• General applications: utilities and applications for manipulating, viewing, and analyzing data.
• HDF I/O library
  – Application programming interfaces: high-level, object-specific APIs.
  – Low-level interface: low-level API for I/O to files, etc.
• HDF file: file or other data source.

HDF Applications Software
• Free software
  – NCSA HDF library and utilities
  – Other software
• Commercial/other software that “understands”
  – all of HDF (Noesys, IDL, HDF Explorer)
  – certain HDF objects (MATLAB, WebWinds)
  – certain HDF applications (SHARP, WIM)
• http://hdf.ncsa.uiuc.edu/tools.html

What platforms does HDF run on?
• Sun: Solaris
• SGI: Indy, Power Challenge, Origin, Cray C90, YMP, T3E
• HP9000, HP-Convex Exemplar
• IBM: RS6000, SP2
• DEC: Alpha/Digital UNIX, OpenVMS; VAX: OpenVMS
• Intel: Solaris x86, Linux, FreeBSD, Windows NT/98
• PowerPC: Mac-OS

A Sampling of HDF Users
• NCSA-affiliated science teams: visualization, data exch, fast I/O, ...
• Mathworks, Fortner Software, Research Systems Inc., etc.: format supported by vendors of vis and data analysis software
• Boeing: space-time change detection in images
• Distributed Oceanographic Data System (DODS): remote access to earth science data
• Army Research Lab: network distributed global memory
• Center for Analysis & Prediction of Storms: fast parallel I/O, portability, multi-resolution grids
• TRAPPIST (Euro consortium): exchange, analysis & visualization of non-destructive testing data

Major User #1: EOSDIS
• ESDIS Project
  – open standard exchange format and I/O library for EOSDIS
  – EOS applications
• HDF requirements
  – Earth science data types (HDF-EOS, etc.)
  – User support for scientists, data producers, etc.
  – Library and file structure improvements
  – HDF tools, utilities, access software
  – Software maintenance and QA

Major User #2: ASCI
• ASCI Data Models and Formats (DMF) Group
  – open standard exchange format and I/O library for ASCI
  – DOE tri-lab ASCI applications
• HDF requirements
  – large datasets (> a terabyte)
  – ASCI data types, especially meshes
  – good performance in massively parallel environments
  – primarily HDF5

II. NCSA HDF Activities

Java applications
• HDF APIs
  – Basis for tools that access HDF
• HDF Viewers
  – HDF browser/visualizer
• HDF4 Data Server Prototype
  – Lessons learned about remote access to

Remote Data Access
• The SDB: Web-based Server-side Data Browser
• Java for remote access
• WP-ESIP: DODS project
• Computational Grids (Globus/GASS)

HDF Standardization
• To share files, users must organize them similarly.
• HDF user groups create standard profiles
  – Ways to organize data in HDF files
  – Metadata
  – API
• Examples: HDF-EOS, ASCI DMF

HDF-EOS software layers
[Figure: HDF-EOS software layers, top to bottom]
• General applications and HDF-EOS applications
• Application programming interfaces, including the HDF-EOS API (HDF-EOS profiles)
• Low-level interface
• HDF file

“HDF Configuration Record” (HCR)
• To simplify the tasks of defining, comparing, and producing HDF-EOS files
• Formal (ODL) descriptions of HDF-EOS objects

HCR of Swath
/* Project XYZ */
/* First version defined on June 10th, 1998 */
OBJECT = SWATH
  NAME = SCAN1
  OBJECT = Dimension
    NAME = GeoTrack
    Size = 1200
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = GeoCrossTrack
    Size = 205
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = DataX
    Size = 2410
  END_OBJECT = Dimension
END_OBJECT = SWATH
END
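
The nesting of OBJECT / END_OBJECT blocks in the swath record above can be read with a few lines of code. The following is only a minimal sketch, not one of the HCR utilities: `parse_odl` is a hypothetical helper, and it assumes one `KEY = value` statement per line, which is a simplification of ODL.

```python
# Sketch only: a tiny reader for the OBJECT/END_OBJECT nesting shown on the
# "HCR of Swath" slide. parse_odl is a hypothetical helper, not an HCR tool.

HCR_TEXT = """\
/* Project XYZ */
/* First version defined on June 10th, 1998 */
OBJECT = SWATH
  NAME = SCAN1
  OBJECT = Dimension
    NAME = GeoTrack
    Size = 1200
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = GeoCrossTrack
    Size = 205
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = DataX
    Size = 2410
  END_OBJECT = Dimension
END_OBJECT = SWATH
END
"""


def parse_odl(text):
    """Return the top-level objects as nested dicts (type, attrs, children)."""
    root = {"type": "ROOT", "attrs": {}, "children": []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("/*"):
            continue  # skip blank lines and one-line comments
        if line == "END":
            break
        key, _, value = (part.strip() for part in line.partition("="))
        if key == "OBJECT":
            node = {"type": value, "attrs": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif key == "END_OBJECT":
            closed = stack.pop()
            if closed["type"] != value:
                raise ValueError(f"END_OBJECT = {value} closes {closed['type']}")
        else:
            stack[-1]["attrs"][key] = value
    if len(stack) != 1:
        raise ValueError("unclosed OBJECT block")
    return root["children"]


swath = parse_odl(HCR_TEXT)[0]
dims = {d["attrs"]["NAME"]: int(d["attrs"]["Size"]) for d in swath["children"]}
print(swath["type"], swath["attrs"]["NAME"], dims)
```

A structure like `dims` is the kind of summary one would need in order to compare a record against an existing file, though how the real HCR tools do this is not described on the slides.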

HCR
• HCR Utilities:
  – Converters: HCR to/from HDF-EOS
  – Edit HCR and HDF-EOS
  – Compare HCR with HDF-EOS file
• Current projects:
  – Extend HCR converters to all of HDF4
  – Similar work with HDF5
  – XML too

III. HDF5

Why HDF5?
• HDF shortcomings exposed by EOSDIS, ASCI and others...
  – Limits on object & file size (<2GB)
  – Limited number of objects (<20K)
  – Rigid data models
  – I/O performance
  – Aging software infrastructure (code entropy)

• …new demands...
  – Bigger, faster machines and storage systems
    • massive parallelism, parallel file systems
    • teraflop speeds, terabyte storage
  – Greater complexity
    • complex data structures
    • complex subsetting
  – More emphasis on remote & distributed access

• … and ASCI requirements
  – Compatibility with vector bundle model
  – Compatibility with MPI-IO
  – Ability to transform data between memory & storage
  – Parallel file systems: PIOFS, HPSS, etc.

New HDF5 Features
• More scalable
  – Larger arrays and files
  – More objects
• Improved data model
  – New datatypes
  – Single comprehensive dataset object
• Improved software
  – More flexible, robust library
  – More flexible API
  – More I/O options

HDF5 data model
• Two primary objects
• Dataset
  – multidimensional array of elements
  – rich variety of datatypes
• Group
  – directory-like structure
  – contains datasets, groups, other objects

Dataset components
• multidimensional array
• header with metadata
  – datatype
  – dataspace
  – attributes
  – storage properties

Simple datatypes
• The usual scalars: integer & float
• user-defined scalars (e.g. 13-bit integers)
• variable length (e.g. strings)
• pointers to objects or regions of datasets
• enumeration
• opaque

Compound datatypes
• User-defined
• Comparable to C structs
• Members can be simple or compound types
• Members can be multidimensional
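
To make the "comparable to C structs" point concrete, here is a minimal sketch that uses Python's standard `struct` module rather than the HDF5 library. The field names and widths (two 16-bit integers and a 32-bit float, packed little-endian) are assumptions chosen for illustration; the rows reuse the lat/lon/temp table from the earlier slide.

```python
import struct

# Assumed layout for illustration: lat and lon as 16-bit integers, temp as a
# 32-bit float, little-endian with no padding ("<" turns alignment off).
RECORD = struct.Struct("<hhf")
FIELDS = ("lat", "lon", "temp")

rows = [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6), (16, 35, 5.7)]

# A 1-D "array of records": every element has the same fixed-size layout.
buffer = b"".join(RECORD.pack(*row) for row in rows)


def read_record(buf, index):
    """Decode the record at position `index` into a field-name -> value dict."""
    values = RECORD.unpack_from(buf, index * RECORD.size)
    return dict(zip(FIELDS, values))


third = read_record(buffer, 2)
print(RECORD.size, len(buffer), third["lat"], third["lon"])
```

Because every record has the same size, element `i` always starts at byte `i * RECORD.size`, which is what makes an array of records addressable like any other array.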

Data Spaces
• How data are organized to form a dataset
  – rank
  – dimensions
• Subsetting during I/O operations
  – What subset of data is to be moved
  – In-memory organization of data
  – In-file organization of data

HDF5 dataset: array of records
[Figure: a 5 x 3 array in which each element is a record]
Dimensionality: 5 x 3
Datatype (record): int8, int4, int16, float32

Dataspaces: Reading Dataset into Memory from File
[Figure: a read moves data from a 2D array of integers in the file into a 3D array of floats in memory]

Selection: Examples of mappings between file selections and memory selections.
(a) A hyperslab from a 2D array to the corner of a smaller 2D array.
(b) A regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array.
(c) A sequence of points from a 2D array to a sequence of points in a 3D array.
(d) Union of slabs in file to union of slabs in memory. The number of elements must be equal.
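
Mapping (a) can be sketched without any HDF5 calls. The code below is an illustration under stated assumptions: `hyperslab` and `copy_selection` are hypothetical helpers working on nested Python lists, and the start/stride/count/block parameters are one common way to describe a regular selection, not a claim about the library's internals.

```python
from itertools import product


def hyperslab(start, stride, count, block):
    """List the coordinates of a regular selection.

    Per dimension: `count` blocks of `block` elements each, with the blocks
    beginning `stride` elements apart, starting at `start`.
    """
    axes = []
    for s, st, c, b in zip(start, stride, count, block):
        axes.append([s + i * st + j for i in range(c) for j in range(b)])
    return list(product(*axes))


def copy_selection(src, src_coords, dst, dst_coords):
    """Move elements pairwise from a 2D source selection to a 2D destination."""
    if len(src_coords) != len(dst_coords):
        raise ValueError("selections must contain the same number of elements")
    for (sr, sc), (dr, dc) in zip(src_coords, dst_coords):
        dst[dr][dc] = src[sr][sc]


# "File" array: 5 x 6, element value encodes its position (10*row + col).
file_array = [[10 * r + c for c in range(6)] for r in range(5)]
# "Memory" array: a smaller 3 x 3 buffer.
memory = [[0] * 3 for _ in range(3)]

file_sel = hyperslab(start=(1, 2), stride=(1, 1), count=(2, 3), block=(1, 1))
mem_sel = hyperslab(start=(0, 0), stride=(1, 1), count=(2, 3), block=(1, 1))
copy_selection(file_array, file_sel, memory, mem_sel)

for row in memory:
    print(row)
```

The length check in `copy_selection` mirrors the rule stated in (d): the two selections may have different shapes, but they must contain the same number of elements.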

Attributes
• Named pieces of data
• Stored in a dataset or group header
• Operations are scaled-down versions of the dataset operations
  – Not extendible
  – No compression
  – No partial I/O

Property list
• Properties of objects or operations
• Describe how to create, store, access and transfer data

Some Properties
• chunked: better subsetting access time; extendable
• compressed: improves storage efficiency, transmission speed
• extendable: datasets can be extended in any direction
• split file: metadata in one file, raw data in another
[Figure: split file. Metadata for dataset “Fred” is stored in File A; data for Fred is stored in File B.]
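
Why chunking helps subsetting can be shown with a little index arithmetic. This is a sketch only: `chunks_touched` is a hypothetical helper, and the 1000 x 1000 dataset with 100 x 100 chunks is an assumed example, not a figure from the slides.

```python
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """Return the indices of every chunk overlapped by a contiguous selection.

    `start` and `count` give the selection's origin and extent per dimension;
    `chunk_shape` gives the chunk size per dimension.
    """
    ranges = []
    for s, n, c in zip(start, count, chunk_shape):
        first = s // c            # chunk holding the first selected element
        last = (s + n - 1) // c   # chunk holding the last selected element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


# Assumed example: a 1000 x 1000 dataset stored as 100 x 100 chunks.
# Select a narrow vertical strip: all 1000 rows, columns 250..259.
touched = chunks_touched(start=(0, 250), count=(1000, 10), chunk_shape=(100, 100))
print(len(touched), touched[0], touched[-1])
```

In this example the strip falls inside 10 of the 100 chunks. With plain row-by-row contiguous storage, the same strip would be spread over 1000 separate row segments.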

Dataset components
[Figure: a dataset consists of metadata and data]
• Metadata
  – Datatype: int16
  – Dataspace: Rank=2; Dim_1=5, Dim_2=4, Dim_3=2
  – Storage properties: chunked; compressed
  – Attributes: time = 32.4, pressure = 987, temp = 56
• Data

Groups
• Structures for organizing the file
• Like Vgroups in HDF4
• Like directories in a hierarchical file system
• Every file starts with a root group
• Groups have attributes

Groups
• A mechanism for collections of related objects
• Every file starts with a root group
• Can have attributes
• Like directories in Unix, but a graph, rather than a tree
[Figure: a graph of groups starting at “root”]

Groups
Groups and members of groups can be shared
[Figure: a group graph starting at root, with shared members]
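
The "graph, rather than a tree" idea, where the same object can be a member of more than one group, can be modeled in a few lines. This is a conceptual sketch only: the `Group` class, `link`, and `resolve` are hypothetical names, and a plain Python list stands in for a dataset.

```python
class Group:
    """A named collection of members; a member may be a Group or any object."""

    def __init__(self):
        self.attrs = {}     # groups can carry attributes
        self.members = {}   # link name -> object

    def link(self, name, obj):
        """Add `obj` under `name`; the same object may be linked many times."""
        self.members[name] = obj
        return obj


def resolve(root, path):
    """Follow a slash-separated path of link names starting at `root`."""
    node = root
    for name in path.strip("/").split("/"):
        if name:
            node = node.members[name]
    return node


root = Group()
a = root.link("a", Group())
b = root.link("b", Group())

temps = a.link("temps", [3.1, 4.2, 3.6])  # stand-in for a dataset
b.link("shared_temps", temps)             # second link to the same object
b.link("back_to_a", a)                    # a group can be shared as well

same = resolve(root, "/a/temps") is resolve(root, "/b/shared_temps")
print(same, resolve(root, "/b/back_to_a/temps"))
```

Two different paths reach the same object, which is exactly what a strict directory tree would not allow.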

Mounting
[Figure: two files, File A and File B, each with its own root group; one file is mounted into the group hierarchy of the other (“mount!”)]

Reading & writing with HDF5
• Set properties
• Describe the data
  – datatypes
  – rank and dimensions
  – mapping between file and memory
• Read/write

Files needn’t be files - Virtual File Layer
VFL: A public API for writing I/O drivers
[Figure: VFL, the Virtual File I/O Layer. A “file” handle (hid_t) goes through the VFL to one of several I/O drivers (memory, mpio, stdio, network), which map onto the actual “storage”: files, memory, or the network.]
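
The idea behind a virtual file layer, where the same calling code runs against different kinds of storage, can be sketched as follows. This is not the VFL API, which is a C interface; `MemoryDriver`, `StdioDriver`, and `store_and_load` are hypothetical names used only to illustrate the pattern.

```python
import os
import tempfile


class MemoryDriver:
    """Keeps the whole "file" in a growable in-memory buffer."""

    def __init__(self):
        self._buf = bytearray()

    def write(self, offset, data):
        end = offset + len(data)
        if end > len(self._buf):
            self._buf.extend(b"\0" * (end - len(self._buf)))  # zero-fill gap
        self._buf[offset:end] = data

    def read(self, offset, size):
        return bytes(self._buf[offset:offset + size])


class StdioDriver:
    """Stores the "file" in an ordinary file on disk."""

    def __init__(self, path):
        self._fh = open(path, "w+b")

    def write(self, offset, data):
        self._fh.seek(offset)
        self._fh.write(data)

    def read(self, offset, size):
        self._fh.seek(offset)
        return self._fh.read(size)

    def close(self):
        self._fh.close()


def store_and_load(driver, payload, offset=16):
    """Code above the driver layer: it only relies on write() and read()."""
    driver.write(offset, payload)
    return driver.read(offset, len(payload))


with tempfile.TemporaryDirectory() as tmp:
    disk = StdioDriver(os.path.join(tmp, "demo.bin"))
    results = [store_and_load(d, b"HDF") for d in (MemoryDriver(), disk)]
    disk.close()

print(results)
```

`store_and_load` never learns where the bytes end up, so a network or parallel I/O driver could be added without changing it.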

HDF5 tools
• Current
  – hdf5ls - lists contents of HDF5 file
  – h5dumper - higher level view
  – hdf5 to hdf4 converter
• Future
  – Convert HDF5 to ascii, binary, GIFF, etc.
  – Convert HDF4 to HDF5
  – Java tools - VisAD, etc.
  – File/code generation from DDL description
  – Talking to vendors

Other HDF5 activities
• Performance tuning
• Object model
• Fortran and C++ API
• Thread-safe HDF5

IV. HDF4 vs. HDF5

HDF4 vs. HDF5
• HDF4
  – Original format and library
  – Compatible with all earlier versions
  – 6 primary objects
    • multidim array of scalars
    • raster image, palette
    • table
    • annotation
    • group
  – Biggest current user: Earth Observing System Data and Info System (EOSDIS)
• HDF5 - successor to HDF4
  – New format and library
  – Not compatible with earlier versions
  – 2 primary objects
    • multidim. array of records
    • group
  – Biggest current user: Accelerated Strategic Computing Initiative (ASCI)

HDF4 object types can be derived from HDF5 datasets and groups
[Figure: mapping from HDF5 objects to HDF4 object types]
• HDF5 dataset
  – HDF4 SDS: n-dim array of scalars
  – HDF4 8-bit raster
  – HDF4 24-bit raster: 2-dim array of multi-component scalars
  – HDF4 Vdata: 1-dim array of records
• HDF5 group
  – HDF4 Vgroup
Example contents shown in the figure:
• Array data: 03 04 43 43 43 -3 72 44 50 34 45 77 34 23 57 45 67 87 00 45
• Text: March 15, 1990. Simulation with k=10.0, beta=1.22e3. Calculate the magnitude ...
• Table:
lat   lon   temp
12    23    3.1
15    24    4.2
17    21    3.6
23    35    7.2
25    31    6.3

Status of HDF4 vs. HDF5
• HDF4 is still an EOS standard
• HDF5 likely also
• HDF4 maintenance
  – Maintained as long as EOS needs it
  – Minimal new features
• New applications: use HDF5 if possible!
  – New features, performance improvements, etc.

HDF Information
• HDF Information Center
  – http://hdf.ncsa.uiuc.edu/
• HDF Help email address
  – [email protected]
• HDF users mailing list
  – [email protected]