CIT 140: Introduction to IT Slide #1 CSC 140: Introduction to IT Advanced File Processing

CIT 140: Introduction to IT Slide #1

CSC 140: Introduction to IT

Advanced File Processing

CIT 140: Introduction to IT Slide #2

Topics

1. Compressing files:
   1. compress
   2. gzip
   3. bzip2

2. Archiving files: tar

3. Sorting files: sort

CIT 140: Introduction to IT Slide #3

Data Compression

Problem: How can we store X bytes using only Y < X bytes?

Solution: Find redundancies in the data.

1. Run-length encoding
   Encode repetitions as the repeated value and a count.
   Ex: thethethe -> the3

2. Dictionary encoding
   Build a dictionary of words.
   Encode each word with a number.
   Common words: the, an, is, this
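
A minimal Python sketch of run-length encoding, for illustration only. It works on single characters, the classic form of the technique, whereas the slide's example treats the three-letter unit "the" as the repeated value. The function names `rle_encode` and `rle_decode` are made up for this sketch.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]

def rle_decode(pairs):
    """Expand (character, count) pairs back into the original string."""
    return "".join(char * count for char, count in pairs)

encoded = rle_encode("aaaabbbcc")
print(encoded)              # [('a', 4), ('b', 3), ('c', 2)]
print(rle_decode(encoded))  # aaaabbbcc
```

Text with no adjacent repeated characters gains nothing from this scheme, which is why real tools combine several techniques.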

CIT 140: Introduction to IT Slide #4

Data Compression

"Ask not what your country can do for you -- ask what you can do for your country."

Dictionary:

1 ask

2 not

3 what

4 your

5 country

6 can

7 do

8 for

9 you

Encoded version:

"1 2 3 4 5 6 7 8 9 -- 1 3 9 6 7 8 4 5."
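
The encoding above can be reproduced with a short Python sketch. It assumes words are compared case-insensitively and numbered in order of first appearance, which is what the slide's dictionary suggests; the function names are hypothetical.

```python
import re

def dictionary_encode(text):
    """Number each distinct word in order of first appearance and encode the text."""
    dictionary = {}
    codes = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in dictionary:
            dictionary[word] = len(dictionary) + 1
        codes.append(dictionary[word])
    return dictionary, codes

def dictionary_decode(dictionary, codes):
    """Map each number back to its word (punctuation and case are not restored)."""
    words = {number: word for word, number in dictionary.items()}
    return " ".join(words[number] for number in codes)

quote = ("Ask not what your country can do for you -- "
         "ask what you can do for your country.")
dictionary, codes = dictionary_encode(quote)
print(codes)
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 3, 9, 6, 7, 8, 4, 5]
```

The 17 words of the quote are stored as 17 small numbers plus a 9-entry dictionary; the saving grows as the same words recur in longer text.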

CIT 140: Introduction to IT Slide #5

Compressing Files: compress

compress [-c] [-d] [-v] file1 [file2 …]

-c Send output to stdout.

-d Decompress instead of compressing.

-v Provide verbose output.

CIT 140: Introduction to IT Slide #6

Compressing Files Old School

The compress command

compress [options] [file-list]

CIT 140: Introduction to IT Slide #7

Uncompressing Files Old School

The uncompress command

CIT 140: Introduction to IT Slide #8

Compressing Files: gzip

gzip [-#] [-c] [-d] [-l] [-v] file1 [file2, …]

-# Specify compression level. Default=6.

-c Send output to stdout.

-d Decompress instead of compressing.

-l List compression stats.

-v Provide verbose output.
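
To get a feel for the compression level option, here is a small Python sketch using the standard `gzip` module, which reads and writes the same format as the gzip command. The sample data is hypothetical, and the sizes printed will vary with the input, so no expected output is shown.

```python
import gzip

# Hypothetical sample data: highly redundant text compresses well.
data = b"the quick brown fox jumps over the lazy dog\n" * 1000

# compresslevel plays the role of gzip's -1 ... -9 option.
for level in (1, 6, 9):
    compressed = gzip.compress(data, compresslevel=level)
    saved = 100 * (1 - len(compressed) / len(data))
    print(f"level {level}: {len(data)} -> {len(compressed)} bytes ({saved:.1f}% saved)")

# Decompression restores the original bytes exactly, whatever the level.
restored = gzip.decompress(gzip.compress(data, compresslevel=1))
print(restored == data)  # True
```

Higher levels spend more CPU time searching for redundancy; they never change what decompression returns.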

CIT 140: Introduction to IT Slide #9

Compressing Files: gzip

> man bash >bash.man
> man tcsh >tcsh.man
> ls -l *man
-rw-r--r-- 1 waldenj 267350 Oct 4 19:48 bash.man
-rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
> gzip *.man
> ls -l *gz
-rw-r--r-- 1 waldenj 71333 Oct 4 19:45 bash.man.gz
-rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
> gzip -l *gz
compressed uncompressed ratio uncompressed_name
     71333       267350 73.3% bash.man
     69759       239534 70.8% tcsh.man
    141092       506884 72.1% (totals)

CIT 140: Introduction to IT Slide #10

Uncompressing Files: gunzip

> gunzip bash.man.gz
> ls -l *man *gz
-rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
-rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
> gzip -v bash.man
bash.man: 73.3% -- replaced with bash.man.gz
> gzip -dc bash.man.gz | less
User Commands BASH(1)
NAME
 bash - GNU Bourne-Again Shell
…
> ls -l *man *gz
-rw-r--r-- 1 waldenj 71333 Oct 4 19:45 bash.man.gz
-rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
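
The difference between replacing a file and merely displaying its contents can be sketched in Python with the standard `gzip` module. This is an approximation of the commands, not their implementation; the file name `sample.man` and its contents are hypothetical stand-ins.

```python
import gzip
import os
import tempfile

text = "bash - GNU Bourne-Again Shell\n" * 200  # hypothetical stand-in for a man page

with tempfile.TemporaryDirectory() as workdir:
    plain_path = os.path.join(workdir, "sample.man")
    gz_path = plain_path + ".gz"

    with open(plain_path, "w") as f:
        f.write(text)

    # Roughly `gzip sample.man`: write sample.man.gz, then remove the original.
    with open(plain_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        dst.write(src.read())
    os.remove(plain_path)

    # Roughly `gzip -dc sample.man.gz`: read the contents, leave the .gz file alone.
    with gzip.open(gz_path, "rt") as f:
        restored = f.read()

    gz_still_exists = os.path.exists(gz_path)

print(restored == text, gz_still_exists)  # True True
```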

CIT 140: Introduction to IT Slide #11

Modern Compression: bzip2

bzip2 [-#] [-c] [-d] [-v] file1 [file2 …]

-# Specify compression level. Default=9.

-c Send output to stdout.

-d Decompress instead of compressing.

-v Provide verbose output.

CIT 140: Introduction to IT Slide #12

Modern Compression: bzip2

> bzip2 -v bash.man tcsh.man
bash.man: 4.821:1, 1.659 bits/byte, 79.26% saved, 267350 in, 55456 out.
tcsh.man: 4.259:1, 1.878 bits/byte, 76.52% saved, 239534 in, 56236 out.
> ls -l *bz2
-rw-r--r-- 1 waldenj 55456 Oct 4 19:45 bash.man.bz2
-rw-r--r-- 1 waldenj 56236 Oct 4 19:48 tcsh.man.bz2
> bzip2 -d bash.man.bz2
> bunzip2 tcsh.man.bz2
> ls -l *.man
-rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
-rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
> bzip2 -dc bash.man.bz2 | less
User Commands BASH(1)
NAME
 bash - GNU Bourne-Again Shell
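
The three figures in the verbose bzip2 output all follow from the "in" and "out" byte counts. The Python sketch below recomputes them from the sizes shown on the slide; `bzip2_stats` is a hypothetical helper name, and the formulas are inferred from the numbers rather than taken from bzip2's source.

```python
def bzip2_stats(bytes_in, bytes_out):
    """Return (ratio, bits_per_byte, percent_saved) for a compressed file."""
    ratio = bytes_in / bytes_out               # e.g. 4.821 means 4.821:1
    bits_per_byte = 8 * bytes_out / bytes_in   # output bits per input byte
    percent_saved = 100 * (1 - bytes_out / bytes_in)
    return ratio, bits_per_byte, percent_saved

for name, size_in, size_out in [("bash.man", 267350, 55456),
                                ("tcsh.man", 239534, 56236)]:
    ratio, bpb, saved = bzip2_stats(size_in, size_out)
    print(f"{name}: {ratio:.3f}:1, {bpb:.3f} bits/byte, {saved:.2f}% saved")
# bash.man: 4.821:1, 1.659 bits/byte, 79.26% saved
# tcsh.man: 4.259:1, 1.878 bits/byte, 76.52% saved
```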

CIT 140: Introduction to IT Slide #13

Displaying Compressed Files

zcat: Identical to compress -dc

gzcat: Identical to gzip -dc

bzcat: Identical to bzip2 -dc

CIT 140: Introduction to IT Slide #14

Compression Benchmarks

> ls -l patch*

-rw-r--r-- 1 waldenj 28944395 Oct 4 19:37 patch-2.6.13

-rw-r--r-- 1 waldenj 10238237 Oct 4 19:37 patch-2.6.13.Z

-rw-r--r-- 1 waldenj 5009926 Oct 4 19:37 patch-2.6.13.bz2

-rw-r--r-- 1 waldenj 6220228 Oct 4 19:37 patch-2.6.13.gz

Compression Tool    Compression Ratio
compress            64.6%
gzip                78.5%
bzip2               82.7%
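
The ratios in the table are the fraction of space saved, computed from the file sizes in the listing. A short Python check, with `space_saved` as a hypothetical helper name:

```python
original_size = 28944395  # patch-2.6.13

compressed_sizes = {
    "compress": 10238237,  # patch-2.6.13.Z
    "gzip": 6220228,       # patch-2.6.13.gz
    "bzip2": 5009926,      # patch-2.6.13.bz2
}

def space_saved(original, compressed):
    """Percentage of the original size that compression removed."""
    return 100 * (1 - compressed / original)

for tool, size in compressed_sizes.items():
    print(f"{tool:<8} {space_saved(original_size, size):.1f}%")
```

Rounded to one decimal place, the results agree with the 64.6%, 78.5%, and 82.7% shown in the table.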

CIT 140: Introduction to IT Slide #15

Archiving Files: tar

tar [-c] [-t] [-x] [-v] [-f file.tar] file1 [file2, …]

-c Create a new tape archive.

-f Write the archive to the specified file instead of writing to tape.

-t Trace (view) archive contents.

-v Provide verbose output.

-x eXtract archive contents.

CIT 140: Introduction to IT Slide #16

Archiving Files: tar

> tar -cvf manpages.tar *.man
bash.man
tcsh.man
> ls -l manpages.tar
-rw-r--r-- 1 waldenj 512000 Oct 4 21:01 manpages.tar
> tar -tf manpages.tar
bash.man
tcsh.man
> tar -tvf manpages.tar
-rw-r--r-- waldenj/students 267350 2005-10-04 19:45 bash.man
-rw-r--r-- waldenj/students 239534 2005-10-04 19:48 tcsh.man
> mkdir tmp
> cd tmp
> tar -xvf ../manpages.tar
bash.man
tcsh.man
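
The create, list, and extract steps can also be sketched with Python's standard `tarfile` module. This is an illustration under simplifying assumptions: the archive lives in an in-memory buffer rather than in manpages.tar, and the file contents are short hypothetical stand-ins for the real man pages.

```python
import io
import tarfile

# Hypothetical stand-ins for bash.man and tcsh.man.
files = {
    "bash.man": b"bash - GNU Bourne-Again Shell\n",
    "tcsh.man": b"tcsh - C shell with file name completion\n",
}

# Roughly `tar -cf manpages.tar *.man`, but written to an in-memory buffer.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    for name, content in files.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(content)
        archive.addfile(info, io.BytesIO(content))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as archive:
    # Roughly `tar -tf manpages.tar`: list the member names.
    names = archive.getnames()
    # Roughly `tar -xf`, except contents are read into memory, not written to disk.
    extracted = {name: archive.extractfile(name).read() for name in names}

print(names)               # ['bash.man', 'tcsh.man']
print(extracted == files)  # True
```

Note that tar only bundles files; it does not compress them, which is why the archive on the slide is slightly larger than the two files combined.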

CIT 140: Introduction to IT Slide #17

Other File Compression Tools

PKzip/WinZip: zip, unzip

ARJ: arj, unarj

RAR: rar, unrar

CIT 140: Introduction to IT Slide #18

Sorting

Ordering a set of items by some criterion.

Systems in which sorting is used include:
– Words in a dictionary.
– Names of people in a telephone directory.
– Numbers.

CIT 140: Introduction to IT Slide #19

Sorting: sort

sort [-d] [-f] [-i] [-n] [-r] [-u] file1 [file2 …]

-d Sort in dictionary order (only blanks, letters, and digits are compared).

-f Ignore case of letters.

-i Ignore non-printable characters.

-n Sort in numerical order.

-r Reverse order of sort.

-u Do not list duplicate lines in output.

CIT 140: Introduction to IT Slide #20

sort Example

> cat days.txt
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
> sort days.txt
Friday
Monday
Saturday
Sunday
Thursday
Tuesday
Wednesday

CIT 140: Introduction to IT Slide #21

sort Example

> cat days.txt
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
> sort -r days.txt
Wednesday
Tuesday
Thursday
Sunday
Saturday
Monday
Friday
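
The same orderings can be reproduced in Python as a quick check. One caveat: the sort command's ordering depends on the locale, while Python compares strings by character code, so results can differ for mixed-case or accented text. For these capitalized day names the two agree.

```python
days = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]

# Like `sort days.txt`
print(sorted(days))
# ['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday', 'Wednesday']

# Like `sort -r days.txt`
print(sorted(days, reverse=True))
# ['Wednesday', 'Tuesday', 'Thursday', 'Sunday', 'Saturday', 'Monday', 'Friday']
```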

CIT 140: Introduction to IT Slide #22

sort Example

> cat numbers.txt
10155715820019
> sort numbers.txt
10120015571589
> sort -n numbers.txt
95810120015571
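
Why plain sort and sort -n disagree can be seen in a small Python sketch. The numbers below are hypothetical and are not the contents of numbers.txt: without -n, lines are compared as text, character by character, so "100" comes before "25" because "1" precedes "2".

```python
lines = ["100", "9", "25", "1000", "3"]  # hypothetical file contents, one number per line

# Like plain `sort`: compare as text, character by character.
print(sorted(lines))           # ['100', '1000', '25', '3', '9']

# Like `sort -n`: compare by numeric value.
print(sorted(lines, key=int))  # ['3', '9', '25', '100', '1000']
```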