CIT 140: Introduction to IT Slide #1
CSC 140: Introduction to IT
Advanced File Processing
CIT 140: Introduction to IT Slide #2
Topics
1. Compressing files
   1. compress
   2. gzip
   3. bzip2
2. Archiving files: tar
3. Sorting files: sort
CIT 140: Introduction to IT Slide #3
Data Compression
Problem: How can we store X bytes using only Y < X bytes?
Solution: Find redundancies in the data.
1. Run-length encoding
   Encode repetitions as the repeated value and a count.
   Ex: thethethe -> the3
2. Dictionary encoding
   Build a dictionary of words.
   Encode each word with a number.
   Common words: the, an, is, this
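As a rough illustration of run-length encoding, here is a minimal Python sketch. It uses the classic character-level variant (runs of one repeated character), whereas the slide's example repeats a whole word. The function names rle_encode and rle_decode are made up for this example.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]

def rle_decode(pairs):
    """Expand (character, count) pairs back into the original string."""
    return "".join(char * count for char, count in pairs)

encoded = rle_encode("aaaabbbcc")
print(encoded)              # [('a', 4), ('b', 3), ('c', 2)]
print(rle_decode(encoded))  # aaaabbbcc
```

Text with no repeated characters gains nothing from this scheme, so run-length encoding only pays off when the data really contains long runs.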
CIT 140: Introduction to IT Slide #4
Data Compression
"Ask not what your country can do for you -- ask what you can do for your country."
Dictionary:
1 ask
2 not
3 what
4 your
5 country
6 can
7 do
8 for
9 you
Encoded version:
“1 2 3 4 5 6 7 8 9 – 1 3 9 6 7 8 4 5.”
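The encoding above can be reproduced with a short Python sketch. It assumes words are compared case-insensitively and punctuation is dropped; the function name dictionary_encode is hypothetical.

```python
import re

def dictionary_encode(text):
    """Number each distinct word in order of first appearance, then encode the text."""
    dictionary = {}
    codes = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in dictionary:
            dictionary[word] = len(dictionary) + 1
        codes.append(dictionary[word])
    return dictionary, codes

quote = ("Ask not what your country can do for you -- "
         "ask what you can do for your country.")
dictionary, codes = dictionary_encode(quote)
print(dictionary)
print(codes)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 3, 9, 6, 7, 8, 4, 5]
```

A real compressor would also have to store the dictionary alongside the codes, so the saving depends on how often words repeat.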
CIT 140: Introduction to IT Slide #5
Compressing Files: compress
compress [-c] [-d] [-v] file1 [file2, …]
-c Send output to stdout.
-d Decompress instead of compressing.
-v Provide verbose output.
CIT 140: Introduction to IT Slide #6
Compressing Files Old School
The compress command
compress [options] [file-list]
CIT 140: Introduction to IT Slide #7
Uncompressing Files Old School
The uncompress command
CIT 140: Introduction to IT Slide #8
Compressing Files: gzip
gzip [-#] [-c] [-d] [-l] [-v] file1 [file2, …]
-# Specify compression level. Default=6.
-c Send output to stdout.
-d Decompress instead of compressing.
-l List compression stats.
-v Provide verbose output.
CIT 140: Introduction to IT Slide #9
Compressing Files: gzip
> man bash >bash.man
> man tcsh >tcsh.man
> ls -l *man
-rw-r--r-- 1 waldenj 267350 Oct 4 19:48 bash.man
-rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
> gzip *.man
> ls -l *gz
-rw-r--r-- 1 waldenj 71333 Oct 4 19:45 bash.man.gz
-rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
> gzip -l *gz
compressed  uncompressed  ratio  uncompressed_name
     71333        267350  73.3%  bash.man
     69759        239534  70.8%  tcsh.man
    141092        506884  72.1%  (totals)
>
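The same idea can be tried from Python, whose standard gzip module uses the same compression format. This is only a sketch: the sample data is invented, and the "saved" figure is computed simply as 1 - compressed/uncompressed, which may differ slightly from what gzip -l reports.

```python
import gzip

# Highly repetitive sample data (made up for this example) compresses well.
data = b"Ask not what your country can do for you. " * 500

# compresslevel plays the role of the -# option (1 = fastest, 9 = smallest).
for level in (1, 6, 9):
    packed = gzip.compress(data, compresslevel=level)
    saved = 1 - len(packed) / len(data)
    print(f"level {level}: {len(data)} -> {len(packed)} bytes, {saved:.1%} saved")

# Decompressing returns exactly the original bytes.
restored = gzip.decompress(gzip.compress(data))
print(restored == data)  # True
```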
CIT 140: Introduction to IT Slide #10
Uncompressing Files: gunzip
> gunzip bash.man.gz
> ls -l *man *gz
-rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
-rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
> gzip -v bash.man
bash.man: 73.3% -- replaced with bash.man.gz
> gzip -dc bash.man.gz | less
User Commands BASH(1)
NAME
bash - GNU Bourne-Again Shell
…
> ls -l *man *gz
-rw-r--r-- 1 waldenj 71333 Oct 4 19:45 bash.man.gz
-rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
CIT 140: Introduction to IT Slide #11
Modern Compression: bzip2
bzip2 [-#] [-c] [-d] [-v] file1 [file2, …]
-# Specify compression level. Default=9.
-c Send output to stdout.
-d Decompress instead of compressing.
-v Provide verbose output.
CIT 140: Introduction to IT Slide #12
Modern Compression: bzip2
> bzip2 -v bash.man tcsh.man
bash.man: 4.821:1, 1.659 bits/byte, 79.26% saved, 267350 in, 55456 out.
tcsh.man: 4.259:1, 1.878 bits/byte, 76.52% saved, 239534 in, 56236 out.
> ls -l *bz2
-rw-r--r-- 1 waldenj 55456 Oct 4 19:45 bash.man.bz2
-rw-r--r-- 1 waldenj 56236 Oct 4 19:48 tcsh.man.bz2
> bzip2 -d bash.man.bz2
> bunzip2 tcsh.man.bz2
> ls -l *.man
-rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
-rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
> bzip2 -dc bash.man.bz2 | less
User Commands BASH(1)
NAME
bash - GNU Bourne-Again Shell
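The figures printed by bzip2 -v follow directly from the input and output sizes. The Python sketch below recomputes them from the byte counts on this slide and then round-trips some invented sample data through the standard bz2 module. The helper name bzip2_stats is hypothetical.

```python
import bz2

def bzip2_stats(size_in, size_out):
    """Recompute the ratio, bits per byte, and percent saved from two byte counts."""
    ratio = size_in / size_out
    bits_per_byte = 8 * size_out / size_in
    saved = 100 * (1 - size_out / size_in)
    return f"{ratio:.3f}:1, {bits_per_byte:.3f} bits/byte, {saved:.2f}% saved"

print(bzip2_stats(267350, 55456))  # 4.821:1, 1.659 bits/byte, 79.26% saved
print(bzip2_stats(239534, 56236))  # 4.259:1, 1.878 bits/byte, 76.52% saved

# Round trip on made-up sample data; compresslevel mirrors the -# option.
sample = b"bash - GNU Bourne-Again Shell\n" * 1000
packed = bz2.compress(sample, compresslevel=9)
print(bz2.decompress(packed) == sample)  # True
print(bzip2_stats(len(sample), len(packed)))
```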
CIT 140: Introduction to IT Slide #13
Displaying Compressed Files
zcat – Identical to compress -dc
gzcat – Identical to gzip -dc
bzcat – Identical to bzip2 -dc
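A comparable way to read compressed text without first unpacking it to disk is sketched below in Python. The file names and contents are stand-ins created in a temporary directory for this example only.

```python
import bz2
import gzip
import tempfile
from pathlib import Path

text = "bash - GNU Bourne-Again Shell\n"

with tempfile.TemporaryDirectory() as tmp:
    gz_path = Path(tmp) / "bash.man.gz"
    bz_path = Path(tmp) / "bash.man.bz2"
    gz_path.write_bytes(gzip.compress(text.encode()))
    bz_path.write_bytes(bz2.compress(text.encode()))

    # Opening in "rt" mode decompresses on the fly; the files on disk stay compressed.
    with gzip.open(gz_path, "rt") as handle:
        from_gz = handle.read()
    with bz2.open(bz_path, "rt") as handle:
        from_bz2 = handle.read()

print(from_gz, end="")
print(from_bz2, end="")
```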
CIT 140: Introduction to IT Slide #14
Compression Benchmarks
> ls -l patch*
-rw-r--r-- 1 waldenj 28944395 Oct 4 19:37 patch-2.6.13
-rw-r--r-- 1 waldenj 10238237 Oct 4 19:37 patch-2.6.13.Z
-rw-r--r-- 1 waldenj 5009926 Oct 4 19:37 patch-2.6.13.bz2
-rw-r--r-- 1 waldenj 6220228 Oct 4 19:37 patch-2.6.13.gz
Compression Tool    Compression Ratio
compress            64.6%
gzip                78.5%
bzip2               82.7%
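The ratios in the table are the fraction of space saved, 1 - compressed/original. A short Python check using the file sizes from the ls -l listing on this slide:

```python
# Sizes in bytes, copied from the listing for patch-2.6.13 and its compressed forms.
original = 28944395
compressed_sizes = {"compress": 10238237, "gzip": 6220228, "bzip2": 5009926}

ratios = {tool: f"{1 - size / original:.1%}" for tool, size in compressed_sizes.items()}
for tool, ratio in ratios.items():
    print(f"{tool:<10}{ratio}")
```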
CIT 140: Introduction to IT Slide #15
Archiving Files: tar
tar [-c] [-t] [-x] [-v] [-f file.tar] file1 [file2, …]
-c Create a new tape archive.
-f Write the archive to the specified file instead of writing to tape.
-t List the archive's table of contents.
-v Provide verbose output.
-x eXtract archive contents.
CIT 140: Introduction to IT Slide #16
Archiving Files: tar
> tar -cvf manpages.tar *.man
bash.man
tcsh.man
> ls -l manpages.tar
-rw-r--r-- 1 waldenj 512000 Oct 4 21:01 manpages.tar
> tar -tf manpages.tar
bash.man
tcsh.man
> tar -tvf manpages.tar
-rw-r--r-- waldenj/students 267350 2005-10-04 19:45 bash.man
-rw-r--r-- waldenj/students 239534 2005-10-04 19:48 tcsh.man
> mkdir tmp
> cd tmp
> tar -xvf ../manpages.tar
bash.man
tcsh.man
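The create, list, and extract steps can also be sketched with Python's standard tarfile module. The file contents here are short stand-ins, not the real manual pages, and everything happens in a temporary directory.

```python
import tarfile
import tempfile
from pathlib import Path

# Stand-in file contents, invented for this example.
pages = {
    "bash.man": "bash - GNU Bourne-Again Shell\n",
    "tcsh.man": "tcsh - C shell with file name completion\n",
}

with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    for name, body in pages.items():
        (work / name).write_text(body)

    archive = work / "manpages.tar"

    # Like tar -cf: create a new archive and add each file under its bare name.
    with tarfile.open(archive, "w") as tar:
        for name in pages:
            tar.add(work / name, arcname=name)

    # Like tar -tf (list members) and tar -xf (read the members back).
    with tarfile.open(archive) as tar:
        listed = tar.getnames()
        restored = {name: tar.extractfile(name).read().decode() for name in listed}

print(listed)  # ['bash.man', 'tcsh.man']
print(restored == pages)  # True
```

As with the tar command itself, the archive is not compressed; it only bundles the files together.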
CIT 140: Introduction to IT Slide #17
Other File Compression Tools
PKzip/WinZip: zip, unzip
ARJ: arj, unarj
RAR: rar, unrar
CIT 140: Introduction to IT Slide #18
Sorting
Ordering a set of items by some criterion.
Systems in which sorting is used include:
– Words in a dictionary.
– Names of people in a telephone directory.
– Numbers.
CIT 140: Introduction to IT Slide #19
Sorting: sort
sort [-d] [-f] [-i] [-n] [-r] [-u] file1 [file2, …]
-d Sort in dictionary order (only blanks, letters, and digits are compared).
-f Ignore case of letters.
-i Ignore non-printable characters.
-n Sort in numerical order.
-r Reverse order of sort.
-u Do not list duplicate lines in output.
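For comparison, here are rough Python analogues of a few of these options. The word list is invented, and Python compares strings by character code, which matches sort only in a plain C locale.

```python
lines = ["pear", "Banana", "apple", "pear"]

# Plain comparison: uppercase letters sort before lowercase ones.
print(sorted(lines))                 # ['Banana', 'apple', 'pear', 'pear']

# Roughly like -f: compare without regard to case.
print(sorted(lines, key=str.lower))  # ['apple', 'Banana', 'pear', 'pear']

# Roughly like -r: reverse the order.
print(sorted(lines, reverse=True))   # ['pear', 'pear', 'apple', 'Banana']

# Roughly like -u: drop duplicate lines before sorting.
print(sorted(set(lines)))            # ['Banana', 'apple', 'pear']
```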
CIT 140: Introduction to IT Slide #20
sort Example
> cat days.txt
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
> sort days.txt
Friday
Monday
Saturday
Sunday
Thursday
Tuesday
Wednesday
CIT 140: Introduction to IT Slide #21
sort Example
> cat days.txt
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
> sort -r days.txt
Wednesday
Tuesday
Thursday
Sunday
Saturday
Monday
Friday
CIT 140: Introduction to IT Slide #22
sort Example
> cat numbers.txt
101
5571
58
2001
9
> sort numbers.txt
101
2001
5571
58
9
> sort -n numbers.txt
9
58
101
2001
5571
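The difference between the two orderings comes from comparing the lines as text versus as numbers. A small Python sketch, assuming the five values 101, 5571, 58, 2001, and 9 from the example:

```python
numbers = ["101", "5571", "58", "2001", "9"]

# Text comparison goes character by character, so "2001" sorts before "58".
print(sorted(numbers))           # ['101', '2001', '5571', '58', '9']

# Comparing by numeric value gives the order that sort -n produces.
print(sorted(numbers, key=int))  # ['9', '58', '101', '2001', '5571']
```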