Reading HDF family of formats via NetCDF-Java / CDM John Caron UCAR/Unidata

NetCDF-Java library

• 100% Java
• Open Source (LGPL, MIT)
• Independent implementation
• Used as a component in other software (partial list)
  – Integrated Data Viewer, THREDDS Data Server (Unidata)
  – Panoply (NASA)
  – ncBrowse (EPIC/NOAA)
  – Java NEXRAD Viewer (NCDC/NOAA)
  – MyWorld GIS (Northwestern)
  – EDC for ArcGIS, ERDDAP (SFSC/NOAA)
  – Live Access Server (PMEL/NOAA)
  – ncWMS (Reading)
  – Matlab plug-in (USGS)

NetCDF-Java/CDM architecture

[Figure: layered architecture diagram. Labels recoverable from the slide: Application; Scientific Feature Types; Datatype Adapter; NetcdfDataset; CoordSystem Builder; NcML; NetcdfFile; I/O service provider (NetCDF-3, NetCDF-4, HDF5, GRIB, GINI, NIDS, Nexrad, DMSP, …); OPeNDAP; THREDDS; Catalog.xml]

Format Readers (IOSP)

• General: NetCDF, HDF5, HDF4, OPeNDAP
• Gridded: GRIB-1, GRIB-2, GEMPAK
• Radar: NEXRAD 2&3, DORADE, CINRAD, Universal Format
• Point: BUFR, ASCII
• Satellite: DMSP, GINI, McIDAS AREA
• Misc: GTOPO, Lightning, etc.
• Others in development (partial list):
  – AVHRR, GPCP, GACP, SRB, SSMI, HIRS (NCDC)

Lines of Code (est.)

             LOC  semicolons  ratio LOC  ratio semi
netcdf3     1977         846        1           1
hdf4        3151        1405        1.6         1.7
hdf-eos     3737        1695        1.9         2.0
hdf5        5735        2672        2.9         3.2

common     28121        9267
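The two ratio columns are each reader's size divided by the netcdf3 reader's size. A quick check of that arithmetic, using only the figures from the table (the function name is just for illustration):

```python
# Lines of code and semicolon counts per reader, copied from the table above.
counts = {
    "netcdf3": (1977, 846),
    "hdf4": (3151, 1405),
    "hdf-eos": (3737, 1695),
    "hdf5": (5735, 2672),
}

base_loc, base_semi = counts["netcdf3"]


def ratios(name):
    """Return (LOC ratio, semicolon ratio) relative to netcdf3, rounded to 1 decimal."""
    loc, semi = counts[name]
    return round(loc / base_loc, 1), round(semi / base_semi, 1)


for name in counts:
    print(name, *ratios(name))
```

By either measure, the HDF5 reader is roughly three times the size of the NetCDF-3 reader.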

Why all the trouble?

• ~20-40% of C/C++ development time is spent on portability issues
• Platform independence
  – Linux, Solaris, Windows (Sun)
  – Mac OS X (Apple)
  – AIX, Linux, Windows, z/OS (IBM)
  – HP-UX (Hewlett-Packard)

• Programmer productivity
  – Object-oriented
  – Garbage collected: no memory leaks
  – Rich libraries
  – Open source

• Faster than C for some applications

Independent implementation

• Written entirely from reading HDF4, HDF5 file specifications

• Helped debug (HDF5), validate file specs

• File format spec is what will be needed in 100 years to read legacy data
  – OTOH, semantics not always obvious

• Don’t confuse reference implementation with the file/protocol specification

HDF family of formats

• HDF5/NetCDF-4

• HDF4

• HDF-EOS

• Note: read-only, no parallel I/O, etc.

HDF5/NetCDF4

• Goal is to read all HDF5
  – Can read all HDF5 files for which we have examples
  – Including references and soft links
  – Complete coverage is difficult to guarantee: combinatorial explosion

• Some esoteric features we are skipping
  – File drivers, external files, szip compression

• Working on a comprehensive test harness
  – JNI interface to the NetCDF-4/HDF5 library
  – Read every byte and compare
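The "read every byte and compare" idea can be sketched independently of any library. In the sketch below both readers are hypothetical stand-ins (plain dictionaries mapping a variable name to its raw bytes), not the NetCDF-Java or JNI APIs; only the shape of the comparison is the point.

```python
def compare_readers(reference, candidate):
    """Compare two {variable name: bytes} mappings.

    Returns a list of mismatch descriptions; an empty list means identical.
    """
    problems = []
    for name in sorted(set(reference) | set(candidate)):
        if name not in candidate:
            problems.append(f"{name}: missing from candidate")
        elif name not in reference:
            problems.append(f"{name}: missing from reference")
        else:
            ref, cand = reference[name], candidate[name]
            if len(ref) != len(cand):
                problems.append(f"{name}: length {len(ref)} != {len(cand)}")
                continue
            # Report only the first differing byte offset per variable.
            for offset, (a, b) in enumerate(zip(ref, cand)):
                if a != b:
                    problems.append(f"{name}: first difference at byte {offset}")
                    break
    return problems


# Stand-in data: the second reader disagrees on the last byte of "data".
c_library = {"lat": b"\x00\x01", "data": b"\x10\x20\x30"}
java_reader = {"lat": b"\x00\x01", "data": b"\x10\x20\x31"}
print(compare_readers(c_library, java_reader))
```

Running this over every variable in every sample file turns the comparison into a regression test, with the reference library's output as ground truth.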

HDF4 / HDF-EOS

• Complete, works against all examples

• Tested against 400 sample files (27 GB)
  – Thanks to Ruth Duerr (NSIDC)

• Spot checked against HDFView

• Need systematic test to compare reading against the HDF4 C Library

Geolocation Primer

Swath

Float lat(245, 33477);

Float lon(245, 33477);

Float time(33477);

Float data(245, 33477);

Just know that it's swath data
• 245 points cross track
• 33477 along the track
• Each scan has a time coordinate

Swath

Float lat(33477, 245);

Float lon(33477, 245);

Float time(33477);

Float data(245, 33477);

Swath

Float lat(999,999);

Float lon(999,999);

Float time(999);

Float data(999,999);

Swath

Float v1(999, 999);

Float v2(999, 999);

Float v3(999);

Float v4(999,999);

If you write data

• Don’t rely on variable name conventions

• Don’t rely on index ordering

• Don’t rely on matching index sizes

• Minimize “you just have to know that…”
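To see why matching index sizes is fragile, here is a minimal sketch of the guess a reader is forced to make when dimensions are anonymous. The function name is an illustrative assumption, not part of NetCDF-Java.

```python
def guess_time_axis(data_shape, time_length):
    """Return indices of data axes whose size equals the time variable's length.

    Exactly one match means the guess is safe; zero or several means it is not.
    """
    return [axis for axis, size in enumerate(data_shape) if size == time_length]


# Sizes differ: only one axis can be the along-track (time) axis.
print(guess_time_axis((245, 33477), 33477))  # [1]

# Sizes coincide: both axes match, so the reader cannot tell which is time.
print(guess_time_axis((999, 999), 999))  # [0, 1]
```

The second case is the 999 x 999 swath from the earlier slides: nothing in the sizes says which axis the time variable belongs to.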

Dimensions

Dimensions:

d1=999;

d2=999;

Variables:

float v1(d1=999, d2=999);

float v2(d1=999, d2=999);

float v3(d2=999);

float v4(d2=999,d1=999);
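With named, shared dimensions the question "which axis is which?" has an unambiguous answer even though both sizes are 999. A sketch using the declarations from this slide; the helper functions are hypothetical:

```python
# Variable name -> tuple of dimension names, as declared on the slide.
variables = {
    "v1": ("d1", "d2"),
    "v2": ("d1", "d2"),
    "v3": ("d2",),
    "v4": ("d2", "d1"),
}


def axis_of(var, dim):
    """Index of dimension `dim` in variable `var`, or None if it is not used."""
    dims = variables[var]
    return dims.index(dim) if dim in dims else None


def shares_dimensions(a, b):
    """True when two variables are defined over the same set of named dimensions."""
    return set(variables[a]) == set(variables[b])


# v3 runs along d2, which is the first axis of v4 but the second axis of v1.
print(axis_of("v4", "d2"), axis_of("v1", "d2"))
print(shares_dimensions("v1", "v4"))
```

Because the lookup goes by name, the transposed ordering of v4 is no longer a problem.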

Good

Variables:
  float v1(d1=999, d2=999);
    v1:standard_name = "Latitude";
  float v2(d1=999, d2=999);
    v2:standard_name = "Longitude";
  float v3(d2=999);
    v3:standard_name = "Time";
  float v4(d2=999, d1=999);

Data_type = "Swath";
Conventions = "My unique name";
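Once coordinates are tagged by attribute, a reader can locate them without caring what the variables are called. A minimal sketch, with the attribute table written out by hand rather than read from a file:

```python
# Variable name -> attributes, mirroring the annotated declarations above.
attributes = {
    "v1": {"standard_name": "Latitude"},
    "v2": {"standard_name": "Longitude"},
    "v3": {"standard_name": "Time"},
    "v4": {},
}


def find_coordinates(attrs):
    """Map each standard_name to the variable that carries it."""
    return {a["standard_name"]: name for name, a in attrs.items() if "standard_name" in a}


def data_variables(attrs):
    """Variables without a standard_name are treated as data (an assumption of this sketch)."""
    return [name for name, a in attrs.items() if "standard_name" not in a]


print(find_coordinates(attributes))
print(data_variables(attributes))
```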

If you write data

• Unique signature

• Specify dimensions

• Identify georeferencing coordinates

• Identify data type

• Units are not optional
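Units matter for time in particular: the same stored number decodes to different instants depending on the unit and the epoch. The units and epochs below are chosen purely for illustration; nothing here says which epoch any given file uses.

```python
from datetime import datetime, timedelta, timezone


def decode_time(value, unit, epoch):
    """Convert a raw time value to a datetime, given its unit and epoch."""
    seconds_per = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}
    return epoch + timedelta(seconds=value * seconds_per[unit])


raw = 86400.0  # a value as it might be stored in a time variable

unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
other_epoch = datetime(1993, 1, 1, tzinfo=timezone.utc)  # illustrative alternative epoch

print(decode_time(raw, "seconds", unix_epoch))   # one day after 1970-01-01
print(decode_time(raw, "seconds", other_epoch))  # one day after 1993-01-01
print(decode_time(raw, "hours", unix_epoch))     # 3600 days after 1970-01-01
```

Without a units attribute a reader has no way to choose among these interpretations.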

HDF-EOS, HDF-EOS2

• Read “structural metadata” field to obtain more semantics

• Parse text in "ODL"
  – Data type: Swath, Grid, Point
  – Dimensions
  – Geolocation coordinate variable types: Latitude, Longitude, Time
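A rough idea of what parsing the structural metadata involves. The sample text below is hand-written in the general GROUP/OBJECT/END_ shape of ODL and is not taken from a real file; the parser handles only that small subset and is a sketch, not how NetCDF-Java does it.

```python
def parse_odl(text):
    """Parse a small subset of ODL: GROUP/OBJECT blocks and KEY=VALUE lines.

    Returns nested dicts. Values are kept as strings with surrounding quotes removed.
    """
    root = {}
    stack = [root]  # stack[-1] is the block currently being filled
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line == "END":
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key in ("GROUP", "OBJECT"):
            child = {}
            stack[-1][value] = child
            stack.append(child)
        elif key in ("END_GROUP", "END_OBJECT"):
            stack.pop()
        else:
            stack[-1][key] = value.strip('"')
    return root


# Illustrative metadata only; the names and sizes are made up for this sketch.
sample = """
GROUP=SwathStructure
  GROUP=SWATH_1
    SwathName="ExampleSwath"
    GROUP=Dimension
      OBJECT=Dimension_1
        DimensionName="d1"
        Size=999
      END_OBJECT=Dimension_1
    END_GROUP=Dimension
  END_GROUP=SWATH_1
END_GROUP=SwathStructure
END
"""

meta = parse_odl(sample)
dim = meta["SwathStructure"]["SWATH_1"]["Dimension"]["Dimension_1"]
print(dim["DimensionName"], dim["Size"])
```

The data type comes from the enclosing group name and the dimensions come from the nested objects, which is the extra semantics the raw HDF4/HDF5 structures do not carry.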

HDF-EOS, HDF-EOS2

• Good
  – Unique signature, identify coordinates and data type

• Not so good
  – ODL
  – Not using hdf4/5 constructs

• Bad
  – No data units
  – No time coordinate units!

Better EOS

Variables:
  float v1(999, 999);
    v1:standard_name = "Latitude";
    v1:dims = "d1 d2";
  float v2(999, 999);
    v2:standard_name = "Longitude";
    v2:dims = "d1 d2";
  float v3(999);
    v3:standard_name = "Time";
    v3:dims = "d2";
  float v4(999, 999);
    v4:dims = "d2 d1";

NPP (i1.4.0.3_NPP_QUAL)

• Good
  – XML better than ODL

• Not so good
  – Not using hdf4/5 constructs

• Bad
  – No data units
  – No time coordinate units!

• Fatal Error: please reboot
  – Metadata not in the same file

Summary

• Netcdf-Java reads entire HDFx family

• Good for Java-philes

• Needs more testing
  – Send example files, $

• Dimensions are not optional

• Keep structural and georeferencing metadata in the same file as the data
  – Can also have specialized external files

Contact

caron@ucar.edu

Google “netcdf java”

NetCDF-4 and Common Data Model (Data Access Layer)

Dimension primer

Float lat(180);

Float lon(360);

Float alt(20);

Float time(1200);

Float data(1200,20,180,360);

Unique Name!

Float lfip(lfip=180);

Float lflop(lflop=180);

Float zorg(zorg=20);

Float skdf(skdf=1200);

Float dglot(skdf=1200, zorg=20, lfip=180, lflop=180);

Float lfip(180);

Float lflop(180);

Float zorg(20);

Float freebish(1200);

Float dglot(1200,20,180,180);

Float lat(180);

Float lon(180);

Float alt(20);

Float time(1200);

Float data(1200,20,180,180);

Recommended