Stockpile Resource Center – Aircraft Compatibility Summer Work Presentation: Graflab Data Compression Study Myuran Kanga August 12, 2010 Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
2. Presentation Outline
- Introduction
- Project Overview
- Sam Stearns
- Data Compression
- Uses for Data Compression
- Types of Data Compression
- Three Algorithms
- Testing Procedure
- Compression/Decompression Example
- Findings
- Conclusion
4. Introduction
Myuran Kanga
- Bachelor's Degree: Oklahoma State University, Electrical Engineering
- Master's Fellowship Program: Rice University, Electrical Engineering (Communications Specialization)
Sandia: Meaningful Work/Projects:
- Team assimilation
- Shaker testing
- Cadence OrCAD electronic design software familiarization
- OrCAD installation/licensing procedure documentation
- Courses: Quality for Project Management, Engineering Excellence, LabVIEW Core I, and LabVIEW Core II
- Graflab Data Compression Study/Evaluation
6. Project Overview: Graflab Data Compression Study
Summary: Evaluation of three data compression algorithms created by Dr. Samuel D. Stearns.
Primary Investigator/Technical Project Lead: Myuran Kanga
Key Personnel: Jerry Cap and Troy Skousen
Biography of the compression algorithms' author, Dr. Sam Stearns [1]:
- Electrical engineer specializing in digital signal processing and adaptive signal processing
- Distinguished Member of the Technical Staff at Sandia National Laboratories for 27 years; retired in 1996
- Author/co-author of 7 signal processing textbooks
- Professor Emeritus at the University of New Mexico, involved with teaching and research at the university since 1960
7. Project Overview: Graflab Data Compression Study (cont.)
Project: Evaluation and interpretation of three data compression algorithms.
- The algorithms are labeled 2, 3, and 4
- The code is written in Matlab
- The algorithms are similar in nature
- Each successive algorithm implements additional and more sophisticated methods of compression
- The more complex algorithms are said to require longer computation time but to give greater accuracy
- The hope is to use compression with GRAFLAB
- GRAFLAB is a database, analysis, and plotting package used for data reduction, analysis, and archival purposes at Sandia
9. What is Data Compression? [2]
10. Data Compression
Definition: The process of encoding information using fewer units of storage than an unencoded representation of the data would need, through the use of specific encoding schemes. [3]
Data compression, sometimes called source coding, converts an input data stream into another data stream that is smaller but retains the essential information contained in the original.
12. Data Compression Implementations
- Compression is useful because it reduces the consumption of resources such as hard disk space or transmission bandwidth.
- With the growing interest in, and volume of, environmental test data for the Surveillance Program, computer storage resources will come under significant strain.
- Archiving of environmental test data from legacy systems, including data for the Environmental Test lab.
- Familiar examples of compressed files include the .zip and .rar file extensions. A .tar file is an archive that is not compressed by itself and is usually paired with a compressor such as gzip (.tar.gz). [4]
14. Lossless vs. Lossy Compression
Two forms of compression: lossless and lossy.
Lossless compression:
- These algorithms usually exploit statistical redundancy to represent the user's data more concisely without error.
- Most real-world data has statistical redundancy.
- Example: in English text, the letter "e" is much more common than the letter "z". Similarly, the probability that the letter "q" will be followed by the letter "z" is very small.
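The statistical redundancy described above can be put into numbers with Shannon entropy, the average number of bits per symbol that an ideal lossless coder would need. The following is a minimal Python sketch, not part of the original study (which used Matlab); the two test strings are made up for illustration.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(text):
    """Shannon entropy of the symbol frequencies in `text`, in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical inputs: four distinct letters in both strings.
skewed = "e" * 12 + "t" * 2 + "q" + "z"   # 'e' dominates, loosely like English text
uniform = "etqz" * 4                       # every letter equally likely

h_skewed = entropy_bits_per_symbol(skewed)
h_uniform = entropy_bits_per_symbol(uniform)
print(f"skewed:  {h_skewed:.3f} bits/symbol")
print(f"uniform: {h_uniform:.3f} bits/symbol")
```

The skewed string needs fewer bits per symbol than the uniform one even though both use the same four letters, which is the redundancy a lossless coder exploits.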
15. Lossless vs. Lossy Compression
Lossy compression:
- Guided by research on how people perceive the data in question.
- Used when some loss of fidelity is acceptable.
- As an example, the human eye is more sensitive to subtle variations in luminance than to variations in color. Color detail can therefore be reduced while the image still appears intact.
- JPEG image compression works in part by rounding off some of this less important information.
- Lossy data compression provides a way to obtain the best fidelity for a given amount of compression.
17. Compression Algorithms
Compression 2:
- Quantizes the data signal and packs the result into a sequence of bytes.
Compression 3:
- Predicts the quantized data and packs the prediction error into a sequence of bytes.
Compression 4:
- Said to provide the maximum compression
- Encodes the prediction error into a sequence of bytes using adaptive arithmetic coding. [5]
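To see why packing a prediction error can take less space than packing the quantized signal itself, the sketch below compares the fixed-width field size each would need. It is a Python illustration under stated assumptions, not Dr. Stearns's Matlab code: the predictor is simply "the previous sample", the quantizer is an 8-bit rounding, and all function names are hypothetical.

```python
import math

def quantize_8bit(samples):
    """Map samples in [-1, 1] to signed integers in [-127, 127]."""
    return [round(s * 127) for s in samples]

def prediction_errors(q):
    """Predict each sample as the previous one and keep only the error."""
    return [q[0]] + [q[n] - q[n - 1] for n in range(1, len(q))]

def reconstruct(err):
    """Undo prediction_errors with a running sum (lossless)."""
    out, total = [], 0
    for e in err:
        total += e
        out.append(total)
    return out

def bits_per_sample(values):
    """Width of a fixed-size signed field that can hold every value."""
    return max(abs(v) for v in values).bit_length() + 1

# Slowly varying test signal: 200 samples per sine period (assumed values).
sine = [math.sin(2 * math.pi * n / 200) for n in range(1000)]
q = quantize_8bit(sine)
err = prediction_errors(q)

print("bits per quantized sample:", bits_per_sample(q))
print("bits per prediction error:", bits_per_sample(err))
```

Because neighboring samples of a smooth signal are close together, the errors span a much smaller range than the samples, so each one fits in fewer bits. An entropy coder such as the arithmetic coder in Compression 4 would go further by assigning shorter codes to the most frequent error values.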
18. Compression Algorithms cont.
Quantization [6]
- The process of mapping a continuous range of values to a relatively small set of discrete symbols or integer values.
- Sampling occurs on a periodic basis to convert the continuous signal into a sequence of discrete values.
- Can be viewed as accumulating data in bins
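The "bins" view of quantization can be sketched as a uniform quantizer: each sample is replaced by the index of the bin it falls in, and decoding returns the bin midpoint. This is an illustrative Python sketch with assumed parameters (range -1 to 1, 256 levels); it is not taken from the algorithms under study.

```python
def quantize(x, lo, hi, levels):
    """Return the bin index (0 .. levels-1) for a value x in [lo, hi]."""
    step = (hi - lo) / levels
    index = int((x - lo) / step)
    return min(max(index, 0), levels - 1)   # clamp so x == hi lands in the top bin

def dequantize(index, lo, hi, levels):
    """Return the midpoint of the bin as the reconstructed value."""
    step = (hi - lo) / levels
    return lo + (index + 0.5) * step

lo, hi, levels = -1.0, 1.0, 256            # 256 levels fit in one byte per sample
samples = [-1.0, -0.3, 0.0, 0.25, 1.0]
indices = [quantize(s, lo, hi, levels) for s in samples]
restored = [dequantize(i, lo, hi, levels) for i in indices]
worst = max(abs(s - r) for s, r in zip(samples, restored))

packed = bytes(indices)                    # one byte per quantized sample
print(indices, worst)
```

With midpoint reconstruction the error is never more than half a bin width, which is the fidelity given up in exchange for storing one byte per sample.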
19. Compression Algorithms cont.
Linear Prediction [7]
- Signal processing tool in which future values of a digital signal are estimated as a linear function of previous samples in the data.
- Model: a time-varying digital filter, an excitation function x(n), and a desired output y(n).
- Goal: find the excitation function and filter coefficients that minimize the error between the predicted y(n) and the original y(n).
- Also called Linear Predictive Coding
- Common application: speech compression
  - Transmit only the filter coefficients (Hk) and the excitation sequence x(n)
  - For extreme compression, transmit only the filter coefficients and use a fixed-frequency excitation (voice coder)
Equations:
$$y(n) = \sum_{j=1}^{N} a_j\, y(n-j) + \sum_{j=0}^{M} b_j\, x(n-j)$$
$$y(n) = \sum_{j=1}^{N} a_j\, y(n-j) + e(n)$$
$$\hat{y}(n) = \sum_{j=1}^{N} a_j\, y(n-j)$$
$$e(n) = y(n) - \hat{y}(n)$$
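As a worked instance of the prediction error e(n) = y(n) - ŷ(n), the Python sketch below applies a second-order predictor to a pure sine wave. It relies on the identity sin(ωn) = 2cos(ω)·sin(ω(n-1)) - sin(ω(n-2)), so the coefficients a_1 = 2cos(ω) and a_2 = -1 are known in advance. The tone frequency and sample rate are assumed values, and a real coder would have to estimate the coefficients from the data.

```python
import math

def prediction_error(y, a):
    """e(n) = y(n) - sum_{j=1..N} a[j-1] * y(n-j), evaluated for n >= N."""
    N = len(a)
    return [y[n] - sum(a[j - 1] * y[n - j] for j in range(1, N + 1))
            for n in range(N, len(y))]

omega = 2 * math.pi * 5 / 1000             # 5 Hz tone at 1000 samples/s (assumed)
y = [math.sin(omega * n) for n in range(1000)]

# A pure sinusoid obeys y(n) = 2cos(omega) y(n-1) - y(n-2).
a = [2 * math.cos(omega), -1.0]
e = prediction_error(y, a)
largest = max(abs(v) for v in e)
print("largest prediction error:", largest)
```

The error is zero up to floating-point rounding, so after the first two samples only the two coefficients are needed to regenerate the wave. Real data leaves a nonzero but small error, and that error is what gets encoded.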
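20. Compression Algorithms cont.
Arithmetic Coding [8]
- Long data strings are represented by a single number, which is obtained by repeatedly partitioning the range of possible values in proportion to the probabilities of the symbols in the data string.
- Example string: DABDDB (frequencies: A = 1, B = 2, D = 3)

Symbol   Part 1     Part 2 (freq. product)   Total
D        6^5 x 3    (none)                   23328
A        6^4 x 0    3                        0
B        6^3 x 1    3 x 1                    648
D        6^2 x 3    3 x 1 x 2                648
D        6^1 x 3    3 x 1 x 2 x 3            324
B        6^0 x 1    3 x 1 x 2 x 3 x 3        54
Sum                                          25002

Coded data: the total (25002) plus the product of all symbol frequencies (3 x 1 x 2 x 3 x 3 x 2 = 108) bounds the range [25002, 25110); any value in it, such as 25100, encodes the string.
Part 1:
- 6-symbol string = radix of 6
- Each power of 6 is multiplied by the index of the letter, A = 0 to D = 3 (for this string the index equals the symbol's cumulative frequency)
Part 2:
- Multiply by the accumulated product of the frequencies of the symbols already coded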
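The DABDDB example can be checked with a few lines of Python. This sketch follows the radix-style calculation from the example (power of the radix, times the symbol's cumulative frequency, times the product of the frequencies seen so far). It is a static illustration only, not the adaptive arithmetic coder used in Compression 4, and the function name is hypothetical.

```python
from collections import Counter

def radix_arithmetic_bounds(message):
    """Return (lower, upper) of the integer interval that encodes `message`."""
    base = len(message)
    freq = Counter(message)

    # Cumulative frequency: total count of all symbols that sort before this one.
    cumulative, running = {}, 0
    for symbol in sorted(freq):
        cumulative[symbol] = running
        running += freq[symbol]

    lower, product = 0, 1
    for i, symbol in enumerate(message):
        power = base ** (base - 1 - i)       # 6^5, 6^4, ... 6^0 for a 6-symbol string
        lower += power * cumulative[symbol] * product
        product *= freq[symbol]              # accumulated frequency product
    return lower, lower + product

lower, upper = radix_arithmetic_bounds("DABDDB")
print(lower, upper)
```

Any integer from `lower` up to, but not including, `upper` identifies the string, which is why a round value inside the interval can be transmitted instead of the exact lower bound.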
22. Evaluation Procedure/Analysis
Classical Waveform Compression Study:
- Triangle Wave
- Trapezoid Wave
- Sine Wave
- Sawtooth Wave
- Hanning Window
- Harmonic Sine Waves
- Combined Sine Waves
- Gap Analysis
- White Noise
- Sine Wave with Noise
- Power Spectral Density
- Square Wave
- .wav File
Waveforms were created manually in individual m-files so that the vector arrangement in Matlab is predictable. Frequencies and signal durations are easily modifiable.
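A few of the test waveforms listed above can be generated as follows. The study built them in individual Matlab m-files; this is an equivalent Python sketch using only the standard library, with hypothetical function names and with frequency, duration, and sample rate left as parameters.

```python
import math
import random

def time_axis(duration_s, sample_rate_hz):
    """Sample times in seconds, starting at 0."""
    return [n / sample_rate_hz for n in range(int(duration_s * sample_rate_hz))]

def sine(t, freq_hz):
    return [math.sin(2 * math.pi * freq_hz * x) for x in t]

def square(t, freq_hz):
    # +1 for the first half of each period, -1 for the second half
    return [1.0 if (x * freq_hz) % 1.0 < 0.5 else -1.0 for x in t]

def sawtooth(t, freq_hz):
    # ramps from -1 up to +1 over each period
    return [2.0 * ((x * freq_hz) % 1.0) - 1.0 for x in t]

def triangle(t, freq_hz):
    # starts at +1, falls to -1 at mid-period, rises back to +1
    return [2.0 * abs(2.0 * ((x * freq_hz) % 1.0) - 1.0) - 1.0 for x in t]

def white_noise(t, seed=0):
    rng = random.Random(seed)              # seeded so runs are repeatable
    return [rng.uniform(-1.0, 1.0) for _ in t]

t = time_axis(duration_s=1.0, sample_rate_hz=1000)
noisy_sine = [s + 0.1 * w for s, w in zip(sine(t, 5), white_noise(t))]
```

Keeping every generator a function of the same time axis makes it easy to change frequency or duration for one waveform without touching the others.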
24. Testing and Measurements
Implemented Analysis and Measurements:
- Input and output data array sizes
- Percentage accuracy of compression
- Compression ratio
- Relative computational time
- Percent difference: max. and min. values of original and decompressed waveforms
- Percent difference: standard deviation value of original and decompressed waveforms
- Percent error: max. and min. values of original and decompressed waveforms
- Percent error: standard deviation value of original and decompressed waveforms
- Root Mean Square values of original and decompressed waveforms
- Normal values of original and decompressed waveforms
- Difference in RMS values
- Difference in Normal values
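Several of these measurements have standard textbook forms, sketched below in Python. The slides do not give the formulas actually used, so the definitions here are assumptions: compression ratio as original size over compressed size, percent error relative to the original value, percent difference relative to the mean of the two values, and a sample standard deviation with N-1 normalization. The two short arrays are made-up data.

```python
import math

def compression_ratio(original_bytes, compressed_bytes):
    """Values above 1 mean the compressed form is smaller."""
    return original_bytes / compressed_bytes

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def std_dev(x):
    """Sample standard deviation (N-1 normalization)."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / (len(x) - 1))

def percent_error(reference, measured):
    """Deviation of `measured` from `reference`, as a percentage of the reference."""
    return abs(measured - reference) / abs(reference) * 100.0

def percent_difference(a, b):
    """Symmetric comparison: difference relative to the mean magnitude."""
    return abs(a - b) / ((abs(a) + abs(b)) / 2.0) * 100.0

# Made-up original and decompressed waveforms for illustration.
original = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
decompressed = [0.0, 0.5, 0.996, 0.5, 0.0, -0.5, -0.996, -0.5]

print("percent error (max):", percent_error(max(original), max(decompressed)))
print("percent difference (std):", percent_difference(std_dev(original), std_dev(decompressed)))
print("difference in RMS:", rms(original) - rms(decompressed))
```

Percent error treats the original waveform as the reference, while percent difference treats both waveforms symmetrically, which is why the two can be reported side by side.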
26. Compression/Decompression Example
- An m-file was written to create this .wav file for real-world compression/decompression testing.
- Using Compression 4, the compression ratio of the file was 1.52, with an accuracy of 99.6078 percent.
- Compressed output is played for Compression 2 and Compression 4. Turn up your volume: the amplitude of the compressed file is much lower.
- Compressed data should not resemble the original data stream. This example demonstrates the inefficiency of Compression 2.
Audio clips: Original Song; Compressed Song (Compression 2); Decompressed Song; Compressed Song (Compression 4)
28. Findings
Compression 2:
- This algorithm generally produced a compression ratio of about 1 (a compression ratio of 1 means no compression). For simple waveforms such as the square wave, compression did occur.
- Fastest compression algorithm of the three
- Inefficient compression
Compression 3 and 4:
- Compression ratio increases with increased data length/duration
- Increased data length/duration causes longer calculation times (within limits)
- Compression 4 produced a much higher compression ratio than the other algorithms
- Compression 4 is the slowest algorithm
All three compression methods, special cases:
- The square wave produces 100% accuracy and very high compression with all three algorithms
- White noise does not seem to compress much past a ratio of 1
- Code has been modified to handle gaps in the input data
- The accuracy of compression/decompression for all three algorithms was above 99% in all cases tested
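The contrast between the square wave and white noise is not specific to these three algorithms; it appears with any general-purpose lossless compressor. The Python sketch below uses the standard `zlib` module as a stand-in (it is not one of the algorithms evaluated here) on made-up 8-bit data.

```python
import random
import zlib

def ratio(data):
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, 9))

n = 20000
# 8-bit square wave: 50 samples high, 50 samples low, repeated
square = bytes(255 if (i // 50) % 2 == 0 else 0 for i in range(n))
# 8-bit white noise from a seeded generator, so the run is repeatable
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(n))

print("square wave ratio:", ratio(square))
print("white noise ratio:", ratio(noise))
```

The square wave is almost entirely redundant and shrinks dramatically, while the noise has no pattern to exploit and stays at a ratio of about 1.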
30. Future Work
- Similar waveform analysis with the raw data files provided by Dr. Sam Stearns
- Additional error or warning messages for:
  - Noise
  - Gaps
  - Invalid array data
- Implementation of the compression algorithms in the Graflab database
- Investigate possibilities of real-time compression/decompression
Recommendations:
- Filter noise from data prior to compression
- Compress all data, regardless of size
- Continue replacing gaps with zeros
31. Summer Work Applicability / Benefit
- Applicability to our organization
  - Meaningful work
  - Storing new and legacy environmental test data from the surveillance program
  - Environmental Test lab data storage
- Opportunity to continue education
  - Improved Matlab skills
  - Introduction to LabVIEW
  - OrCAD familiarity
  - Organizational and leadership skills (management course)
- Assimilation to Albuquerque, the work environment at Sandia National Laboratories, and Aircraft Compatibility [9] [10]
32. Citations and Questions
[1] University of New Mexico ECE, Dr. Samuel D. Stearns, 2010. [Online]. Available: http://www.ece.unm.edu/faculty/stearns/. [Accessed: July 2010].
[2] Plus Magazine, Text, Bytes and Videotape, January 1, 2003. [Online]. Available: http://plus.maths.org/issue23/features/data/data.jpg. [Accessed: August 2010].
[3] Wikipedia, Data compression, July 20, 2010. [Online]. Available: http://en.wikipedia.org/wiki/Data_compression. [Accessed: August 2010].
[4] Hoax-slayer.com, Burning-hard-drive, 2010. [Online]. Available: http://www.hoax-slayer.com/images/burning-hard-drive.jpg. [Accessed: August 2010].
[5] S. Stearns, Encoding and Decoding of Instrumentation and Telemetry Waveforms. Samuel D. Stearns: Sandia National Laboratories. January 25, 2008.
[6] Wikipedia, Quantization (signal processing), July 2, 2010. [Online]. Available: http://en.wikipedia.org/wiki/Quantization_(signal_processing). [Accessed: June 2010].
[7] Connexions, Linear Prediction and Cross Synthesis, March 18, 2008. [Online]. Available: http://cnx.org/content/m15478/latest/. [Accessed: June 2010].
[8] Wikipedia, Arithmetic coding, August 7, 2010. [Online]. Available: http://en.wikipedia.org/wiki/Arithmetic_coding. [Accessed: June 2010].
[9] Rice University, Home page, 2010. [Online]. Available: http://www.rice.edu. [Accessed: August 2010].
[10] Sandia National Laboratories, Home page, 2010. [Online]. Available: http://www.sandia.gov. [Accessed: August 2010].
[11] T. Skousen. (private communication). 2010.
[12] J. Cap. (private communication). 2010.