Astronomical Tiled Image Compression: How & Why
Authors:
Rob Seaman, NOAO
Bill Pence, NASA/GSFC
Rick White, STScI
Mark Dickinson, NOAO
Frank Valdes, NOAO
Nelson Zárate, NOAO
Statement of problem
No one compression is always best
New instruments and survey programs will dwarf data sets that have come before
Observatories' data storage costs
Transport latency & bandwidth challenge not just budgets, but technology and human patience
The bottom line is data handling throughput, not static storage
Host level compression
Per-file gzip compression
Contents of file are opaque
Speed of compression
Speed of decompression
Size of output
Limited support for on-the-fly decompression
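As a rough illustration of why a gzipped file is opaque, the sketch below compresses a FITS-like byte string (one 80-character header card followed by pixel bytes) with Python's standard gzip module. The card text and pixel pattern are made up for the example and do not form a valid FITS file.

```python
import gzip

# One 80-character header card followed by pixel bytes.
# This only mimics the FITS layout; it is not a valid FITS file.
card = b"SIMPLE  =" + b" " * 20 + b"T / illustrative header card"
header = card.ljust(80)
pixels = bytes(range(256)) * 64

packed = gzip.compress(header + pixels)

# The header can no longer be read in place at the start of the file...
header_in_place = packed[:80] == header

# ...it has to be recovered by running the decompressor over the stream.
recovered = gzip.decompress(packed)[:80]

print(header_in_place, recovered == header, len(packed))
```

Any tool that wants a keyword value or a single HDU from such a file has to decompress it first, which is the limitation the tiled convention is meant to avoid.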
How
FITS tile compression convention
Provides a general framework
Supports any compression algorithm that can operate on multidimensional image sections
FITS headers remain readable
Access to individual FITS HDUs
Files are still FITS
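The idea behind tiling can be sketched in a few lines of Python. This is a minimal illustration, not the convention itself: it uses zlib in place of Rice or Hcompress, keeps the tiles in a plain list rather than a FITS binary table, and the TILEROWS key and both function names are hypothetical.

```python
import struct
import zlib

def compress_tiles(image, tile_rows=1):
    """Compress a 2D image (equal-length rows of 16-bit ints) one tile at a time."""
    tiles = []
    for start in range(0, len(image), tile_rows):
        block = image[start:start + tile_rows]
        flat = [pixel for row in block for pixel in row]
        raw = struct.pack(f">{len(flat)}h", *flat)  # big-endian 16-bit integers
        tiles.append(zlib.compress(raw))
    # The header stays uncompressed, so it can be read without touching pixel data.
    header = {"NAXIS1": len(image[0]), "NAXIS2": len(image), "TILEROWS": tile_rows}
    return header, tiles

def read_tile(header, tiles, index):
    """Decompress a single tile; the other tiles are left untouched."""
    raw = zlib.decompress(tiles[index])
    flat = struct.unpack(f">{len(raw) // 2}h", raw)
    width = header["NAXIS1"]
    return [list(flat[i:i + width]) for i in range(0, len(flat), width)]

image = [[(x * y) % 100 for x in range(64)] for y in range(32)]
header, tiles = compress_tiles(image)   # one tile per row
row_10 = read_tile(header, tiles, 10)[0]
print(header, len(tiles), row_10 == image[10])
```

Because each tile is compressed independently, any algorithm that works on an image section can be dropped into compress_tiles, and a reader only pays for the tiles it actually needs.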
Limitations
Only partially supported by IRAF
Supported by CFITSIO, but caveats:
Not idempotent, even a losslessly compressed file would suffer keyword changes
Original convention covered only per-HDU issues, e.g., compressing a SIF produced same binary table as MEF original
Only application was the limited imcopy example program
Unsupported algorithms
Improvements
fpack compression tool
Compress images in-place
Multi-image archives for efficiency
Idempotent
Supports FITS Checksum
Applications layered on CFITSIO access compressed files and file archives transparently
Support for Hcompress
General purpose option for adaptively scaling input data
fpack / funpack
fpack, a FITS tile-compression engine. Version 0.8.2 (25 September 2006)
usage: fpack [-r|-p|-g|-h] [-w|-t <axes>] [-n <bits>] [-v] [-Etc] <FITS>

Flags must appear (separately) before filenames:
  -r          Rice compression [default], or
  -p          PLIO compression, or
  -g          GZIP (per-tile) compression
  -h          Hcompress compression
  -w          override tile size to be whole image, or
  -t <axes>   comma separated list of tile sizes [default=row]
  -n <bits>   noise bits to preserve for real pixels [default=4]
  -v          verbose
  -F          clobber output [default overwrites input in-place]
  -K          keep (don't delete, overwrite or change) input files
  -A <file>   write (append or clobber) output to single file, or
  -P <pre>    prepend <pre> to create separate output filenames
  -L          list and validate contents, files unchanged
  -H          print this message
  -V          print version number
  <FITS>      FITS files or extensions to pack
… & Why
Preserve the scientific integrity of processed astronomical data sets
Native integer data products permit lossless compression techniques for neutral effect, or
May benefit from lossy compression for high compression factors
Processing, pipeline or hands-on, often creates floating point
Choose lossy compression, or
Scale data into integers
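Scaling floating point pixels into integers can be sketched as a linear mapping with a zero point and a step size, in the spirit of the FITS BZERO and BSCALE keywords. The sample pixel values, the step of 0.05 and both function names are assumptions for illustration; the slides do not say how the scale is chosen.

```python
def scale_to_integers(pixels, scale):
    """Map floats onto integer steps of size `scale`, measured from the minimum."""
    zero = min(pixels)
    stored = [round((value - zero) / scale) for value in pixels]
    return stored, zero

def restore(stored, scale, zero):
    """Recover approximate floats: value = zero + scale * stored."""
    return [zero + scale * value for value in stored]

pixels = [100.03, 100.41, 99.87, 101.26, 100.55]  # made-up sample values
scale = 0.05                                      # assumed quantization step

stored, zero = scale_to_integers(pixels, scale)
restored = restore(stored, scale, zero)

# Rounding moves each pixel by at most half a step.
max_error = max(abs(a - b) for a, b in zip(pixels, restored))
print(stored, max_error)
```

Once the pixels are integers, a lossless integer compressor can be applied to them; the only loss is the rounding step, which is bounded by half of the chosen scale.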
Compression statistics
Additional cost for gzip’ed floating point output from pipeline is $2.86 per image versus Rice compressed integers.
Benefits
Reduced:
Disk space
Bandwidth
Latency
Remove need to decompress
Pack multiple files for efficient transport
Headers remain readable
Individual HDUs are accessible
Choice of algorithm isn’t fixed
DMS architecture
Benefits NSA, NHPP, NVO portal
No need for ASCII header files
Smaller footprint
Faster replication
Files remain FITS throughout
Extends upstream into domes
Extends downstream to users
Compression can be free or better than free