Chapter Four
UNIX File Processing
2
Lesson A
Extracting Information from Files
3
Objectives
• Explain the UNIX approach to file processing
• Use basic file manipulation commands
• Extract characters and fields from a file using the cut command
4
Objectives
• Rearrange fields inside a record using the paste command
• Merge files using the sort command
• Create a new file by combining cut, paste, and sort
5
UNIX Approach to File Processing
• Based on the approach that files should be treated as nothing more than character sequences
• Because you can directly access each character, you can perform a range of editing tasks – this offers flexibility in terms of file manipulation
6
Understanding UNIX File Types
• Regular files, also known as ordinary files
  – Contain information that you create, maintain, and manipulate, and include ASCII and binary files
  – Represented by a - in the 1st position of the file permissions
• Directories
  – System files for maintaining file system structure
  – Represented by a d in the 1st position of the file permissions
• Special files
  – Character special files relate to serial I/O devices
    • Communicate one character at a time; represented by a c in the 1st position of the file permissions
  – Block special files relate to devices such as disks
    • Communicate using blocks of data; represented by a b in the 1st position of the file permissions
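The type characters described above can be checked with ls -l. The sketch below is only an illustration: the scratch directory and the file name notes.txt are made up, and it assumes /dev/null is a character special file, as it is on typical UNIX and Linux systems.

```shell
# Create a scratch directory holding one regular file (names are illustrative).
workdir=$(mktemp -d)
touch "$workdir/notes.txt"

# The first character of the ls -l permissions field is the file type.
# -d lists the directory itself rather than its contents.
file_type=$(ls -l "$workdir/notes.txt" | cut -c1)
dir_type=$(ls -ld "$workdir" | cut -c1)
char_type=$(ls -l /dev/null | cut -c1)   # assumed to be a character special file

echo "regular file: $file_type"
echo "directory:    $dir_type"
echo "/dev/null:    $char_type"

# Clean up the scratch directory.
rm -r "$workdir"
```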
7
File Structures
• Files can be structured in many ways depending on the kind of data they store
• UNIX stores data, such as letters and product records, as flat ASCII files
• Three kinds of regular files are
  – Unstructured ASCII characters
  – Unstructured ASCII records
  – Unstructured ASCII trees
9
Processing Files
• When you run UNIX commands, UNIX processes data by receiving input from a standard input device (e.g. keyboard) and sending output to a standard output device (e.g. monitor)
• System administrators and programmers refer to standard input as stdin and standard output as stdout
• A third standard device is called standard error, or stderr. When UNIX detects errors, it directs the error messages to stderr, which by default is the monitor
10
Using Input and Error Redirection
• You can use redirection operators to retrieve input from something other than the standard input device and send output to something other than the standard output device
• Examples of redirection:
  – Redirect the ls command output to a file, instead of to the monitor (or screen)
  – Redirect a program that receives input from the keyboard to receive input from a file instead
  – Redirect error messages to files, instead of to the screen by default
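The three kinds of redirection listed above can be sketched in a few lines. This is a minimal illustration, not the slide's own example: the scratch directory and the file names (data, listing.txt, errors.txt, no_such_file) are hypothetical.

```shell
# Scratch area with two files to list (names are illustrative).
workdir=$(mktemp -d)
mkdir "$workdir/data"
touch "$workdir/data/alpha" "$workdir/data/beta"

# Output redirection (>): the ls listing goes to a file instead of the screen.
ls "$workdir/data" > "$workdir/listing.txt"

# Input redirection (<): wc reads from the file instead of the keyboard.
line_count=$(wc -l < "$workdir/listing.txt" | tr -d ' ')
echo "lines in listing: $line_count"

# Error redirection (2>): the error message goes to a file, not the screen.
# "|| true" keeps the sketch going even though ls reports a failure.
ls "$workdir/no_such_file" 2> "$workdir/errors.txt" || true
error_text=$(cat "$workdir/errors.txt")
echo "captured error: $error_text"

rm -r "$workdir"
```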
11
Using Input and Error Redirection
Create a file by typing in all the commands, or by redirecting the cat command output to a file
12
Creating Files – cat and touch
• When you manipulate files, you work with the files themselves, as well as their contents
• Create files using output redirection
  – cat command - concatenate text via output redirection; creates a file and enters text into the file
    • cat >file1
      – Each character that you type will be entered into the file. To terminate file entry, press <CTRL>d on a new line (this signals end of file; <CTRL>c would interrupt the command instead)
  – touch command - used to create empty files or to change the timestamp on a file
    • touch file1
      – Creates an empty file
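As a rough sketch of the two commands above: interactively you would type the text after cat >file1 and signal end of file from the keyboard; here a here-document supplies the same input so the example runs unattended. The scratch directory and the sample text are assumptions made for illustration.

```shell
workdir=$(mktemp -d)

# cat with output redirection: everything read from standard input
# (here, the lines up to EOF) is written into file1.
cat > "$workdir/file1" <<'EOF'
first line
second line
EOF

# touch creates an empty file when the name does not exist yet.
touch "$workdir/file2"

file1_lines=$(wc -l < "$workdir/file1" | tr -d ' ')
file2_bytes=$(wc -c < "$workdir/file2" | tr -d ' ')
echo "file1 has $file1_lines lines, file2 has $file2_bytes bytes"

rm -r "$workdir"
```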
13
Deleting Files – rm
• To delete files
  – The rm command permanently removes a file; an empty directory is removed with rmdir
    • rm file1 (will remove the specified file from the dir)
    • rm f* (will remove all files beginning with an f in the working dir)
• The -r option of the rm command will remove a directory and everything it contains, as well as any directory beneath. Be very careful with this command. You can remove an entire branch of the directory tree!
• In the directory structure on the right, rm -r work will remove the work directory, file3, file4, the projects directory, and the file spec!
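To try rm -r safely, the work branch described above can be rebuilt inside a throwaway directory. This is a sketch under that assumption: the tree is recreated under a mktemp directory rather than in a real home directory.

```shell
# Rebuild the example branch in a scratch location.
workdir=$(mktemp -d)
mkdir -p "$workdir/work/projects"
touch "$workdir/work/file3" "$workdir/work/file4" "$workdir/work/projects/spec"

# Preview everything that lives under work before deleting it.
find "$workdir/work" | sort

# Remove the directory and everything beneath it.
rm -r "$workdir/work"

if [ -e "$workdir/work" ]; then work_exists=yes; else work_exists=no; fi
echo "work still exists: $work_exists"

rm -r "$workdir"
```

Listing the branch with find first is a cheap habit that shows exactly what rm -r is about to delete.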
14
Copying files - cp
• Copy files as a means of back-up or as a means to assist with new file creation
  – cp command - copies the file(s) specified by the source path to the location specified by the destination path
    • cp file1 file2 (creates a copy of file1 named file2)
    • cp file1 newdir/file2 (creates a copy of file1, named file2, in the directory newdir)
    • cp file1 file2 file3 newdir (copies all three files to the directory newdir)
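The three cp forms above can be exercised as follows. This is an illustrative sketch run inside a scratch directory; the file contents are made up.

```shell
workdir=$(mktemp -d)
echo "sample" > "$workdir/file1"
touch "$workdir/file3"
mkdir "$workdir/newdir"

# Copy a file to a new name in the same directory.
cp "$workdir/file1" "$workdir/file2"

# Copy several files into a directory; the last argument is the destination.
cp "$workdir/file1" "$workdir/file2" "$workdir/file3" "$workdir/newdir"

# Join the listing onto one line for display.
newdir_listing=$(ls "$workdir/newdir" | paste -sd' ' -)
echo "newdir contains: $newdir_listing"

rm -r "$workdir"
```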
15
Moving Files – mv
• The mv command moves a file from one directory to another directory.
  – mv file1 work
    • This command will remove file1 from the jdoe directory and move it to the work directory.
• The mv command can also be used to rename a file within the current directory without moving it.
  – mv file1 myfile
    • This command will simply rename file1 to myfile. It will remain in the jdoe directory.
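Both uses of mv, moving and renaming, can be seen side by side in a short sketch. A scratch directory stands in for the jdoe directory here, and file2 is a hypothetical second file added so that each command has its own input.

```shell
workdir=$(mktemp -d)          # stands in for the jdoe directory
mkdir "$workdir/work"
echo "sample" > "$workdir/file1"
echo "other" > "$workdir/file2"

# Destination is an existing directory: the file moves into it.
mv "$workdir/file1" "$workdir/work"

# Destination is a new name: the file is renamed in place.
mv "$workdir/file2" "$workdir/myfile"

top_listing=$(ls "$workdir" | paste -sd' ' -)
work_listing=$(ls "$workdir/work")
echo "top level: $top_listing"
echo "work:      $work_listing"

rm -r "$workdir"
```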
16
Finding files - find
• The find command helps you locate a file in the directory structure by name, size, date last modified, etc.
  – The first parameter specifies the directory from which the search will begin. You may search by filename using the -name parameter, by last access time using the -atime parameter, by group name using the -group parameter, or by last modification time using the -mtime parameter. See 'man find' for a full list of parameters.
• To search for file1 from your current directory:
  – find . -name file1 (. indicates the current directory)
• To search for file1 from the / directory:
  – find / -name file1
• To search for all files beginning with an 'f' from your current dir:
  – find . -name "f*" (You must put quotes around a name with a wildcard so that the shell passes it to find unexpanded)
• To search for all files from / belonging to the group Acct:
  – find / -group Acct
• To search for all files created or modified within the last 5 days:
  – find . -mtime -5
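The searches above can be tried on a small, known tree so the results are easy to check. The layout below (sub, forms, notes) is hypothetical, and -type f is added to restrict matches to regular files.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/sub"
touch "$workdir/file1" "$workdir/notes" "$workdir/sub/file1" "$workdir/sub/forms"

# Exact name: matches file1 at both levels of the tree.
by_name=$(find "$workdir" -type f -name file1 | wc -l | tr -d ' ')

# Wildcard: quoted so the shell hands the pattern to find unexpanded.
by_pattern=$(find "$workdir" -type f -name "f*" | wc -l | tr -d ' ')

# Modified within the last 5 days: all four files were just created.
recent=$(find "$workdir" -type f -mtime -5 | wc -l | tr -d ' ')

echo "named file1: $by_name, starting with f: $by_pattern, recent: $recent"

rm -r "$workdir"
```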
17
Manipulating Files
18
Manipulating Files – Combining Multiple Files
• Combining files using output redirection
  – cat command - concatenate text of two different files via output redirection
    • cat product1 product2 >combinedprods
      – combinedprods will consist of all of the records in product1 followed by all of the records in product2
  – paste command - joins text of different files in side by side fashion
    • paste product1 product2 >sidebyside
      – sidebyside will consist of the records in product1 and the records in product2 in 2 columns
• Extracting fields of a file using output redirection
  – cut command - extracts specific columns or fields from a file (the original file is left unchanged)
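A small sketch of cat, paste, and cut working on the product files named above. The record layout (a number and a name separated by a colon) is an assumption made for illustration; the slides do not show the actual file contents.

```shell
workdir=$(mktemp -d)

# Hypothetical records in the form number:name.
printf '1:hammer\n2:saw\n' > "$workdir/product1"
printf '3:drill\n4:wrench\n' > "$workdir/product2"

# cat: product2's records follow product1's records.
cat "$workdir/product1" "$workdir/product2" > "$workdir/combinedprods"

# paste: corresponding lines sit side by side, separated by a tab.
paste "$workdir/product1" "$workdir/product2" > "$workdir/sidebyside"

# cut: -d sets the field delimiter, -f picks the field to extract.
names=$(cut -d: -f2 "$workdir/combinedprods" | paste -sd' ' -)
first_row=$(head -n 1 "$workdir/sidebyside")

echo "names: $names"
echo "first side-by-side row: $first_row"

rm -r "$workdir"
```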
20
Manipulating Files - sort
• Re-arranging the contents of a file
  – sort command - sorts a file’s contents alphabetically or numerically
  – The sort command offers many options:
    • You can sort the contents of a file and redirect the output to another file
    • Utilizing a sort key, which provides the option of sorting on a field position within each line
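The two sort options above, redirecting the sorted output and sorting on a key field, might look like this. The stock file and its name:quantity layout are hypothetical.

```shell
workdir=$(mktemp -d)

# Hypothetical records in the form name:quantity.
printf 'pear:30\napple:200\nfig:5\n' > "$workdir/stock"

# Default sort is alphabetical by line; output is redirected to a new file.
sort "$workdir/stock" > "$workdir/by_name"

# Sort key: -t sets the field separator, -k2,2n sorts numerically on field 2.
sort -t: -k2,2n "$workdir/stock" > "$workdir/by_qty"

by_name=$(cut -d: -f1 "$workdir/by_name" | paste -sd' ' -)
by_qty=$(cut -d: -f1 "$workdir/by_qty" | paste -sd' ' -)
echo "alphabetical: $by_name"
echo "by quantity:  $by_qty"

rm -r "$workdir"
```

Without the n modifier the quantities would be compared as text, so 200 would sort before 30.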
22
Lesson B
Assembling Extracted Information
23
Objectives
• Create a script file
• Use the join command to link files using a common field
• Use the awk command to create a professional-looking report
24
Using Script Files
• UNIX users create shell script files to contain commands that can be run sequentially as a set – this helps with the issues of command automation and re-use of command actions
• UNIX users use the vi editor to create script files, then make the script executable using the chmod command with the x argument
25
Using Script Files
Type out the script and then make it executable using the chmod command.
26
Scripts
• Scripts can be used to simply give a short name to a complex command or to combine multiple UNIX commands into a single command.
• Your scripts should always be placed in the bin directory beneath your home directory. You may have to create this directory if you do not have one already. (mkdir bin)
• You also need to check to make sure that UNIX will find your script.
  – To do this, type in
    • echo $PATH
    • View the directories that are listed.
    • Do you see youruserid/bin? For example, my username is marty. I look for /home/fac/marty/bin.
  – If you see it, you’re fine; skip the next step. Your bin directory will be searched any time you issue a command.
  – If you DON’T see it, type in
    » PATH=$PATH:$HOME/bin
    » (Using the full path $HOME/bin, rather than just bin, lets UNIX find your scripts from any directory, not only from your home directory.)
• Now we’re ready to create our first script….
27
Scripts cont.
• Let’s write a script called home to change our current directory to the home directory.
  – vi bin/home
    • # This script will take you to your home directory from any location
    • cd
    • <Escape>:wq
  – The # symbol at the beginning of a line makes the line a comment
  – The cd command with no argument changes to your home directory (cd .. would only move up one level)
  – Now, we need to make the script executable.
    • chmod u+x bin/home …. Or….. chmod 700 bin/home
    • We will now be able to run the script by simply typing in the script name.
  – home (and the script executes automatically!)
  – Because a script runs in its own child shell, its cd does not carry over to your login shell. To actually change your current directory, run the script as . home
  – We will discuss script files in much more detail in Chapters 6 and 7.
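The create, chmod, and run cycle can be sketched end to end without touching a real home directory. Everything named here is hypothetical: a scratch directory stands in for your home directory, and the script greet_demo simply prints a line. The sketch also assumes the scratch location allows files to be executed.

```shell
demo_home=$(mktemp -d)          # stands in for your home directory
mkdir "$demo_home/bin"

# Write the script; the quoted 'EOF' stops the shell from expanding its contents.
cat > "$demo_home/bin/greet_demo" <<'EOF'
#!/bin/sh
# This script prints a short greeting
echo "hello from greet_demo"
EOF

# Give the owner execute permission, then add the bin directory to the search path.
chmod u+x "$demo_home/bin/greet_demo"
PATH=$PATH:$demo_home/bin

# The script can now be run by name alone.
greeting=$(greet_demo)
echo "$greeting"

rm -r "$demo_home"
```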
28
Using the Join Command
• The join command is used in relational database processing
• Relational databases consider files as tables and records as rows
• Relational databases also consider fields as columns that can be joined to create new records
• The UNIX join command lets you extract information from files sharing a common field. You can use this command to associate lines in two files on the basis of a common field that they both share.
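A minimal sketch of joining two files on a shared field. The files vendors and products and their colon-separated layout are made up for illustration; both are already sorted on the vendor number, which join expects.

```shell
workdir=$(mktemp -d)

# Hypothetical files sharing a vendor number in field 1.
printf '101:Acme\n102:Globex\n' > "$workdir/vendors"
printf '101:hammer\n101:saw\n102:drill\n' > "$workdir/products"

# -t sets the field separator; by default join matches on field 1 of each file.
join -t: "$workdir/vendors" "$workdir/products" > "$workdir/report"

report_lines=$(paste -sd' ' "$workdir/report")
echo "$report_lines"

rm -r "$workdir"
```

Each output line holds the common field first, then the remaining fields from the first file, then those from the second.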
30
Using the Join Command to Create the Vendor Report
Use the join command to create reports showing the relationship between two files
31
A Brief Introduction to the Awk Program
• Awk, a pattern-scanning and processing language, helps to produce professional-looking reports
• The awk command lets you do the same things as the cat command (in conjunction with the join command), but more quickly and easily
32
A Brief Introduction to the Awk Program
Awk uses printf, a print formatting function borrowed from the C programming language, to achieve a more professional-looking report
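As a rough illustration of C-style formatting in awk, the sketch below lines up two columns with printf. The prices file and its name:price layout are assumptions, not data from the chapter.

```shell
workdir=$(mktemp -d)

# Hypothetical records in the form name:price.
printf 'hammer:12.5\nsaw:8\n' > "$workdir/prices"

# %-10s left-justifies the name in 10 columns;
# %8.2f right-justifies the price in 8 columns with 2 decimal places.
formatted=$(awk -F: '{ printf "%-10s %8.2f\n", $1, $2 }' "$workdir/prices")
echo "$formatted"

# Keep the first line for inspection.
first_line=$(printf '%s\n' "$formatted" | head -n 1)

rm -r "$workdir"
```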
33
Using the awk Command to Refine the Vendor Report
• To refine and automate the vendor report, create a shell script that includes only the awk command, not a series of separate commands. To have awk perform the automation properly, redirect its input to come from a disk file and not from the keyboard.
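One way this could look, as a sketch only: a script whose sole command is awk, with its input redirected from a disk file. The script name vendor_report, the data file vendor_data, and the three-field layout are all hypothetical; the chapter's actual vendor files are not shown here.

```shell
workdir=$(mktemp -d)

# Hypothetical data in the form vendor_number:vendor_name:product.
printf '101:Acme:hammer\n102:Globex:drill\n' > "$workdir/vendor_data"

# The script contains only an awk command; "< $1" redirects awk's input
# to come from the file named as the script's first argument.
cat > "$workdir/vendor_report" <<'EOF'
#!/bin/sh
# Print a heading, then one formatted line per record
awk -F: 'BEGIN { printf "%-8s %-10s %s\n", "ID", "Vendor", "Product" }
               { printf "%-8s %-10s %s\n", $1, $2, $3 }' < "$1"
EOF

# Run the script through sh, passing the data file.
report=$(sh "$workdir/vendor_report" "$workdir/vendor_data")
echo "$report"

report_lines=$(printf '%s\n' "$report" | wc -l | tr -d ' ')
header=$(printf '%s\n' "$report" | head -n 1 | tr -s ' ')

rm -r "$workdir"
```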
34
Using the awk Command to Refine the Vendor Report
Awk has many features that let you manage your report output to your specification
35
Chapter Summary
• UNIX supports regular files, directories, and character and block special files
• File structures depend on the data being stored; the three kinds of regular files are unstructured ASCII characters, records, and trees
• When running, UNIX receives input from the standard input device (keyboard), also known as stdin, and sends output to the standard output device (monitor), also known as stdout. Another standard device, stderr, receives error messages and defaults to the monitor
36
Chapter Summary
• The touch command updates a file’s time and date stamps and creates empty files
• The rmdir command removes empty directories
• The cut command extracts specific columns or fields from a file
• To combine two or more files, use the paste command
• Use the sort command to sort a file’s contents alphabetically or numerically
37
Chapter Summary
• To automate command processing, include commands in a script file that you can later execute as a program
• Use the join command to extract data from two files sharing a common field and use this field to join the two files
• Awk is a pattern-scanning and processing language useful for creating a formatted report with a professional look