Mario Dantas Unix-perl
available in many organizations can be gathered to
solve a large number of problems from several research areas.
Biology is an example of an area that can improve
experiments through the use of
these distributed resources.
Motivation
• Workflow:
represents an execution flow in which data are passed between some tasks obeying rules previously defined.
+ Ontology:
An ontology can be expressed as a formal and explicit specification of a shared concept.
Each organization develops its own ontology
unique ontology
[Figure: Provider module, Matchmaker, Monitoring Interface]
Objective
+ UNICS: 1969, PDP-7 minicomputer
+ PDP-7 goes away, rewritten on PDP-11 to "help patent lawyers"
+ V1: 1971
+ V6: 1976 (rewritten in C, base for BSD)
+ V7: 1979 (licensed, portable)
PDP-11
+ Xenix -> SCO
+ Standardization (Posix, X/Open)
...But Then The Feuding Began
+ Unix International vs. Open Software Foundation
(to compete with desktop PCs)
+ Battle of the Window Managers
Openlook, Motif
Most popular UNIX variant
Free with GNU license
NetBSD (1993, focus on portability)
OpenBSD (1996, focus on security)
Free with BSD license
Darwin
+ Apple abandoned old Mac OS for UNIX. Purchased NeXT in December 1996
Unveiled in 2000
Open Source
+ AT&T's legal concerns
Not allowed to enter computer business but needed to write software to help with switches
Licensed cheaply or free
performance, better code
SCO vs. Linux
+ Jan 2002: SCO releases Ancient Unix: BSD style licensing of V5/V6/V7/32V/System III
+ March 2003: SCO sues IBM for $3 billion. Alleges contributions to Linux come from proprietary licensed code
AIX is based on System V r4, now owned by SCO
+ Aug 2003: Evidence released
Code traced to Ancient UNIX
Isn't in 90% of all running Linux distributions
Already dropped from Linux in July
More complex functionality by combining programs
Make every program a filter
Portability better for rapidly changing hardware
+ Use flat ASCII files
Common, simple file format (yesterday's XML)
Example of portability over efficiency
+ Reusable code
Good programmers write good code; great programmers borrow good code
..continued
print $(who | awk '{print $1}' | sort | uniq) | sed 's/ /,/g'
• Build prototypes quickly (high level interpreted languages)
..continued
Problems with scale
+ Silence is golden
+ Think hierarchically
+ Multitasking and multiuser capability for minicomputer
+ Inter-process communication
Pipes: output of one program fed into input of another
+ Software tools
+ Development tools
+ Scripting languages
+ TCP/IP
+ The kernel is...
a program loaded into memory during the boot process, and always stays in physical memory.
Swapper
Inter-process communication (IPC)
+ Memory management: Virtual memory
+ 3 main categories:
File/device manipulation
Information manipulation
+ e.g. getuid(), time()
Enter at login: prompt
+ Entering commands
+ Typical format: command options arguments
+ Examples: who, date, ls, cat myfile, ls -l
Case sensitive
+ Windows
+ UWIN (AT&T) – http://www.research.att.com/sw/tools/uwin
+ Red Hat / Fedora: commercial success
+ Ubuntu: currently most popular, based on Debian. Focus on desktop
+ Gentoo: portability
+ Includes files and terminals
+ Integration of storage devices
+ How processes share CPU, memory and signals
+ Scheduling
+ Functionality:
/bin/csh   C shell
/bin/tcsh  Enhanced C Shell
/bin/ksh   Korn shell
/bin/bash  Free ksh clone
Command history
+ A shell program is a set of shell commands that constitute an executable program
+ A shell script is a regular text file that contains shell or UNIX commands
Simple Commands
+ simple command: sequence of non-blank arguments separated by blanks or tabs.
+ 1st argument (numbered zero) usually specifies the name of the command to be executed.
+ Any remaining arguments are passed as arguments to that command.
Arguments may be filenames, pathnames, directories or special options (up to command)
Special characters are interpreted by shell
+ Parsing into command and arguments is called splitting
$
Depends on command
+ whatis, apropos
2. System calls
6. Games
SYNOPSIS
ls [ options ] [ file ... ]
DESCRIPTION
For each directory argument ls lists the contents; for each file argument the name and requested information are listed. The current directory is listed if no file arguments appear. The listing is sorted by file name by default, except that file arguments are listed before directories.
-F, --classify
Append a character for typing each entry.
-l, --long|verbose
-r, --reverse
-R, --recursive
SEE ALSO
chmod(1), find(1), getconf(1), tw(1)
ls (1) USER
Fundamentals of Security
+ UNIX systems have one or more users, identified with a number and name.
+ A set of users can form a group. A user can be a member of multiple groups.
+ A special user (id 0, name root) has complete control.
+ Each user has a primary (default) group.
How are Users & Groups used?
+ Used to determine if file or process operations can be performed:
Can a given file be read? written to?
Can this program be run?
Can I use this piece of hardware?
Can I stop a particular process that's running?
A simple example
[Figure: directory tree with entries such as usr, etc, .profile and foo]
Case sensitive.
One per process.
A pathname relative to the working directory (as opposed to an absolute pathname)
.. refers to parent directory
. refers to current directory
Files and Directories
+ Files are just a sequence of bytes
No file types (data vs. executable)
No sections
Example of UNIX philosophy
+ Directories are a list of files and status of the files:
Creation date
Attributes
etc.
+ Each user has a home directory
+ Most shells (ksh, csh) support the ~ operator:
– ~ expands to my home directory
• ~/myfile -> /home/kornj/myfile
– ~user expands to user's home directory
• ~unixtool/file2 -> /home/unixtool/file2
+ Useful because home directory locations vary by machine
Mounting File Systems
+ When UNIX is started, the directory hierarchy corresponds to the file system located on a single disk called the root device.
+ Mounting allows root to splice the root directory of a file system into the existing directory hierarchy.
+ File systems created on other devices can be attached to the original directory hierarchy using the mount mechanism.
+ The commands mount and umount manage this
Printing File Contents
+ The cat command can be used to copy the contents of a file to the terminal. When invoked with a list of file names, it concatenates them.
+ Some options:
-v display control-characters in visible form (e.g. ^C)
+ pwd: print process working dir
+ ed, vi, emacs...: create/edit files
+ ls: list contents of directory
+ rm: remove file
+ mv: rename file
+ file: determine file contents
File Permissions
+ UNIX provides a way to protect files based on users and groups.
+ Three types of permissions:
+ read, process may read contents of file
+ write, process may write contents of file
+ execute, process may execute file
+ Three sets of permissions:
+ permissions for owner
+ permissions for group (1 group per file)
+ permissions for other
+ Directories have the same types and sets of permissions as files:
read: process may read the directory contents (i.e., list files)
write: process may add/remove files in the directory
+ chmod: change file permissions
+ chown: change file owner
+ umask: user file creation mode mask
+ only owner or super-user can change file attributes
+ example: chmod -r file
+ Octal access modes:
octal  read  write  execute
0      no    no     no
1      no    no     yes
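The octal digits above combine per set (owner, group, other). A minimal sketch, assuming the usual bit values read=4, write=2, execute=1 (only rows 0 and 1 survive in the table) and a hypothetical scratch file created with mktemp:

```sh
# Hypothetical scratch directory and file, used only for this illustration.
workdir=$(mktemp -d)
touch "$workdir/report.txt"

# 6 = read+write for owner, 4 = read for group, 0 = nothing for other.
chmod 640 "$workdir/report.txt"

# The first ten characters of `ls -l` show the file type and the three permission sets.
perms=$(ls -l "$workdir/report.txt" | cut -c1-10)
echo "$perms"

rm -rf "$workdir"
```

Reading the mode string left to right gives the owner, group and other sets in that order.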
The Open File Table
+ I/O operations are done on files by first opening them, reading/writing/etc., then closing them.
+ The kernel maintains a global table containing information about each open file.
Inode  Mode  Count  Position
+ Each process contains a table of files it has opened.
+ Inherits open files from parent
+ Each open file is associated with a number or handle, called file descriptor (fd).
+ Each entry of this table points to an entry in the open file table.
+ Always starts at 0
+ Convenient for kernel
+ Numbering scheme can be local to process (0 .. 128)
+ Extra information stored:
Standard in/out/err
+ The first three entries in the file descriptor table are special by convention:
• Entry 0 is for input
• Entry 1 is for output
• Entry 2 is for error messages
Devices
+ Besides files, input and output can go from/to various hardware devices
+ UNIX innovation: Treat these just like files!
– /dev/tty, /dev/lpr, /dev/modem
+ By default, standard in/out/err opened with /dev/tty
Redirection
+ Before a command is executed, the input and output can be changed from the default (terminal) to a file
Shell modifies file descriptors in child process
The child program knows nothing about this
+ Append output: >>
+ Special files in /dev directory
Example: /dev/tty
Example: /dev/lp
• cat big_file > /dev/null
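The redirection operators can be tried in a scratch directory. This is a small sketch; the file name out.txt and the nonexistent path are made up for the illustration:

```sh
# Hypothetical scratch directory for the demonstration.
workdir=$(mktemp -d)

echo "first"  >  "$workdir/out.txt"   # > creates or truncates the file
echo "second" >> "$workdir/out.txt"   # >> appends to it

# Descriptor 2 (error messages) is sent to /dev/null, so nothing reaches the terminal.
ls /nonexistent-path-for-demo 2> /dev/null || true

# < makes the file the command's standard input.
line_count=$(wc -l < "$workdir/out.txt" | tr -d ' ')
echo "$line_count"

rm -rf "$workdir"
```

In each case the command itself is unchanged; only the shell rewires its file descriptors before starting it.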
Links
+ Directories are a list of files and directories.
Each directory entry links to a file on the disk
Two different directory entries can link to the same file
+ In same directory or across different directories
Moving a file does not actually move any data around.
+ Creates link in new location
+ Deletes link in old location
+ ln command
Symbolic links
+ Symbolic links are different than regular links (often called hard links). Created with ln -s
+ Can be thought of as a directory entry that points to the name of another file.
+ Does not change link count for file
When original deleted, symbolic link remains
+ They exist because:
Hard links don't work across file systems
Hard links only work for regular files, not directories
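The difference between the two kinds of link shows up when the original name is removed. A minimal sketch, with hypothetical file names in a scratch directory:

```sh
# Hypothetical scratch directory and file names.
workdir=$(mktemp -d)
echo "hello" > "$workdir/original"

ln "$workdir/original" "$workdir/hardlink"   # second directory entry for the same file
ln -s original "$workdir/symlink"            # entry that stores the name "original"

rm "$workdir/original"                       # removes one directory entry only

hard_content=$(cat "$workdir/hardlink")      # the data is still reachable
if [ -L "$workdir/symlink" ] && [ ! -e "$workdir/symlink" ]; then
    symlink_state="dangling"                 # the symlink remains but points at a missing name
else
    symlink_state="ok"
fi
echo "$hard_content $symlink_state"

rm -rf "$workdir"
```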
[Figure: Hard link vs. Symbolic Link, showing each directory entry (dirent) and the contents of the file it leads to]
Tree Walking
+ How can we find a set of files in the hierarchy?
+ One possibility:
– ls -l -R /
+ What about:
All files below a given directory in the hierarchy?
All files since Jan 1, 2001?
All files larger than 10K?
+ find pathlist expression
+ find recursively descends through pathlist and applies expression to every file.
+ expression can be:
– -name pattern
+ true if file name matches pattern. Pattern may include shell patterns such as *, must be in quotes to suppress shell interpretation.
Eg: find / -name '*.c'
find utility (continued)
• -perm [+-]mode: Find files with given access mode, mode must be in octal.
Eg: find . -perm 755
• -type ch: Find files of type ch (c=character, b=block, f for plain file, etc..).
Eg: find /home -type f
• -user userid/username: Find by owner userid or username
• -group groupid/groupname: Find by group groupid or groupname
• -size size: File size is size units (use +size for larger, -size for smaller)
+ many more...
+ op1 -a op2 matches both patterns op1 and op2
+ op1 -o op2 matches either op1 or op2
• ( ) group expressions together
• -print prints out the name of the current file (default)
• -exec cmd: Executes cmd, where cmd must be terminated by an escaped semicolon (\; or ';').
If you specify {} as a command line argument, it is replaced by the name of the current file just found.
– -exec executes cmd once per file.
Example:
• find . -name "*.o" -exec rm "{}" ";"
find Examples
+ Find all files beneath home directory beginning with f
– find ~ -name 'f*' -print
+ Find all files beneath home directory modified in last day
– find ~ -mtime -1 -print
+ Find all files beneath home directory larger than 10K
– find ~ -size +10k -print
+ Count words in files under home directory
– find ~ -exec wc -w {} \; -print
+ Remove core files
– find / -name core -exec rm {} \;
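The same expressions can be tried safely on a throwaway tree instead of the home directory. A sketch under that assumption; the directory layout and file names are invented for the example:

```sh
# Hypothetical scratch tree: two C sources and one text file.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/lib"
touch "$workdir/src/main.c" "$workdir/src/lib/util.c" "$workdir/src/notes.txt"

# -name takes a quoted shell pattern; sed strips the scratch prefix for display.
c_files=$(find "$workdir" -type f -name '*.c' -print | sort | sed "s|^$workdir/||")
echo "$c_files"

# -exec runs rm once per matching file; {} is replaced by the file name.
find "$workdir" -name '*.txt' -exec rm {} \;
remaining=$(find "$workdir" -type f -print | wc -l | tr -d ' ')
echo "$remaining"

rm -rf "$workdir"
```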
diff: comparing two files
+ diff: compares two files and outputs a description of their differences
Usage: diff [options] file1 file2
– -i: ignore case
apples oranges walnuts
apples oranges grapes
+ cmp: Tests two files for equality
If equal, nothing returned. If different, location of first differing byte returned
Faster than diff for checking equality
+ comm: Reads two files and outputs three columns:
+ Lines in first file only
+ Lines in second file only
+ Lines in both files
+ Information about each process.
+ Process table: contains an entry for every process in the system.
+ Open-file table: contains at least one entry for every open file in the system.
+ Definitions
program: collection of bytes stored in a file that can be run
image: computer execution environment of program
process: execution of an image
+ Unix can execute many processes simultaneously.
+ fork is typically followed by an exec
Process Setup
+ All of the per process information is copied with the fork operation
Working directory
+ Before exec, these values can be modified
Unix process genealogy
[Figure: process generation, showing init (process 1), init, execs, getty]
Background Jobs
+ By default, executing a command in the shell will wait for it to exit before printing out the next prompt
+ Trailing a command with & allows the shell and command to run simultaneously
$ /bin/sleep 1 &
[1] 424
$
Program Arguments
+ When a process is started, it is sent a list of strings
– argv, argc
Ending a process
+ When a process ends, there is a return code associated with the process
+ This is a non-negative integer
0 means success
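In the shell, the return code of the most recent command is available as $?. A brief sketch using the standard true and false utilities and a child shell that exits with an arbitrary code (3 is chosen only for illustration):

```sh
# true always succeeds, so its return code is 0.
true
ok_status=$?

# false always fails; capture its non-zero return code.
false || fail_status=$?

# A child shell can pick its own return code with exit.
sh -c 'exit 3' || custom_status=$?

echo "$ok_status $fail_status $custom_status"
```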
+ Process group id: number used to identify set of processes
+ Umask: Default file permissions for new file
We haven't talked about these yet:
+ Effective user and group id: The user and group this process is running with permissions as
+ Real user and group id: The user and group that invoked the process
+ Environment variables
Setuid and Setgid Mechanisms
+ The kernel can set the effective user and group ids of a process to something different than the real user and group
Files executed with a setuid or setgid flag set cause these values to change
+ Make it possible to do privileged tasks: Change your password
+ Keys and values are strings
+ Passed to children processes
+ Common examples:
PATH: Where to search for programs
TERM: Terminal type
The PATH environment variable
+ Colon-separated list of directories.
+ Non-absolute pathnames of executables are only executed if found in the list.
Searched left to right
+ Example:
$ myprogram
sh: myprogram not found
$ PATH=/bin:/usr/bin:/home/kornj/bin
$ myprogram
hello!
$ ls
foo
$ foo
sh: foo: not found
$ PATH=:/bin
$ ls
foo
$ cd /usr/badguy
$ ls
Congratulations, your files have been removed and you have just sent email to Prof. Korn challenging him to a fight.
$ ./foo
Hello, foo.
Shell Variables
+ Shells have several mechanisms for creating variables. A variable is a name representing a string value. Example: PATH
Shell variables can save time and reduce typing errors
+ Allow you to store and manipulate information
+ Two types: local and environmental
local are set by the user or by the shell itself
Variables (con't)
+ Syntax varies by shell
– varname=value # sh, ksh
– set varname = value # csh
+ To access the value: $varname
$PATH  list of directories to search for
$MAIL  Absolute pathname to mailbox
$USER  Your user id
$TERM  Type of your terminal
$PS1   Prompt
+ Passing arguments, environment
+ Read/write regular files
Signals
+ Signal: message a process can send to a process or process group, if it has appropriate permissions.
+ Message type represented by a symbolic name
+ For each signal, the receiving process can:
Explicitly ignore signal
Specify action to be taken upon receipt (signal handler)
Otherwise, default action takes place (usually process is killed)
+ Common signals:
SIGKILL, SIGTERM, SIGINT
SIGSTOP, SIGCONT
SIGSEGV, SIGBUS
An Example of Signals
+ When a child exits, it sends a SIGCHLD signal to its parent.
+ If a parent wants to wait for a child to exit, it tells the system it wants to catch the SIGCHLD signal
+ kill: send a signal to a pid
+ wait: parent process waits for one of its children to terminate
+ nohup: makes a command immune to the hangup signal
+ sleep: sleep in seconds
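A signal handler can be installed from the shell with the trap builtin. The sketch below runs in a child sh so that $$ refers to that child; SIGUSR1 and the message text are choices made for this example only:

```sh
# Run the demonstration in its own shell so the signal targets only that process.
result=$(sh -c '
    trap "echo caught USR1" USR1   # specify the action to take on receipt
    kill -USR1 $$                  # send the signal to this shell itself
    echo done                      # execution continues after the handler runs
')
echo "$result"
```

Without the trap line the default action for SIGUSR1 would apply and the child shell would be killed before printing anything.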
Pipes
+ General idea: The input of one program is the output of the other, and vice versa
+ Both programs run at the same time
+ Could this be done with files?
+ Unnecessary use of the disk
Slower
Can take up a lot of space
+ Makes no use of multi-tasking
More about pipes
+ What if a process tries to read data but nothing is available?
UNIX puts the reader to sleep until data available
+ What if a process can't keep up reading from the process that's writing?
UNIX keeps a buffer of unread data
+ This is referred to as the pipe size.
If the pipe fills up, UNIX puts the writer to sleep until the reader frees up space (by doing a read)
+ Multiple readers and writers possible with pipes.
Called filters
+ FIFO (named pipes): special file that when opened represents pipe
Pipelines
+ Output of one program becomes input to another
Uses concept of UNIX pipes
+ Example: $ who | wc -l
counts the number of users logged in
+ Pipelines can be long
vs.
Both of these commands send input to command from a file instead of the terminal:
+ A class of Unix tools called filters.
Utilities that read from standard input, transform the file, and write to standard out
+ Using filters can be thought of as data oriented programming.
Output: lines from the file sorted
+ Grep
Output: lines that match the argument
+ Awk
cat: The simplest filter
+ The cat command copies its input to output unchanged (identity filter). When supplied a list of file names, it concatenates them onto stdout.
+ Some options:
– -n number output lines (starting from 1)
– -v display control-characters in visible form (e.g. ^C)
cat file
+ Syntax: head [-n] [filename...]
-n - number of lines to display, default is 10
filename... - list of filenames to display
+ Displays the last part of a file
+ Syntax: tail +|-number [lbc] [f] [filename]
or: tail +|-number [l] [rf] [filename]
+number - begins copying at distance number from beginning of file, if number isn't given, defaults to 10
-number - begins from end of file
l,b,c - number is in units of lines/blocks/characters
r - print in reverse order (lines only)
tail -f /usr/local/httpd/access_log
tee
+ Copy standard input to standard output and one or more files
Captures intermediate results from a filter in the pipeline
tee con't
+ Syntax: tee [ -ai ] file-list
-a - append to output file rather than overwrite, default is to overwrite (replace) the output file
-i - ignore interrupts
file-list - one or more file names for capturing output
+ Examples
who | tee user_list | wc
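The effect of tee in the middle of a pipeline can be checked with fixed input. In this sketch printf stands in for who so the result is reproducible; the three names are invented:

```sh
# Hypothetical scratch directory; printf replaces `who` to give stable input.
workdir=$(mktemp -d)

# tee saves a copy of the stream in user_list and passes it on to wc.
count=$(printf 'alice\nbob\ncarol\n' | tee "$workdir/user_list" | wc -l | tr -d ' ')
saved=$(cat "$workdir/user_list")

echo "$count"
echo "$saved"

rm -rf "$workdir"
```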
[Sample data: a tab-separated file, a pipe-separated file, and colon-separated /etc/passwd entries]
Tab Separated, Pipe-separated
cut: select columns
+ The cut command prints selected parts of input lines.
can select columns (assumes tab-separated input)
can select a range of character positions
+ Some options:
– -f listOfCols: print only the specified columns (tab-separated) on output
– -c listOfPos: print only chars in the specified positions
– -d c: use character c as the column separator
!ut %! 1%. # data
column+ without counting the columns.
paste: join columns
+ The paste command displays several text files "in parallel" on output.
+ If the inputs are files a, b, c
the first line of output is composed of the first lines of a, b, c
the second line of output is composed of the second lines of a, b, c
+ Lines from each file are separated by a tab character.
+ If files are different lengths, output has all lines from longest file, with empty strings for missing lines.
1 &
cut + & = ata 3 ata&
cut + = ata 3 ata
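cut and paste are often used together to reorder columns. A small sketch with an invented three-column, tab-separated file named data:

```sh
# Hypothetical scratch directory and sample data (name, age, city).
workdir=$(mktemp -d)
printf 'ann\t30\tNYC\nbob\t25\tLA\n' > "$workdir/data"

# Split out columns 1 and 3 into separate files.
cut -f 1 < "$workdir/data" > "$workdir/names"
cut -f 3 < "$workdir/data" > "$workdir/cities"

# Rejoin them in the opposite order; paste separates fields with a tab.
pasted=$(paste "$workdir/cities" "$workdir/names")

# -c selects character positions instead of fields.
first_chars=$(cut -c 1-2 < "$workdir/data")

echo "$pasted"
echo "$first_chars"
rm -rf "$workdir"
```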
sort: Sort lines of a file
+ The sort command copies input to output but ensures that the output is arranged in ascending order of lines.
By default, sorting is based on ASCII comparisons of the whole line.
+ Other features of sort:
understands text data that occurs in columns.
(can also sort on a column other than the first)
can distinguish numbers and sort appropriately
can sort files "in place" as well as behaving like a filter
capable of sorting very large files
sort: Options
+ Syntax: sort [-dftnr] [-o filename] [filename(s)]
-d Dictionary order, only letters, digits, and whitespace are significant in determining sort order
-f Ignore case (fold into lower case)
-t Specify delimiter
-n Numeric order, sort by arithmetic value instead of first digit
-r Sort in reverse order
-o filename - write output to filename, filename can be the same as one of the input files
+ Lots more options...
• V&.1 , V ,& Vn
Exclusive
+ New way: -k f[.c][options][,f[.c][options]]
• k&.1 ,k!1 ,kn
Inclusive
sort t: V% /etc/passw
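The -t, -k and -n options can be compared on a small colon-separated file. The sketch below uses invented passwd-like records rather than the real /etc/passwd, and the newer -k key syntax:

```sh
# Hypothetical scratch directory and three made-up records: name:x:id
workdir=$(mktemp -d)
printf 'carol:x:12\nalice:x:100\nbob:x:7\n' > "$workdir/users"

# Default: compare whole lines as text.
by_name=$(sort "$workdir/users" | cut -d: -f1 | paste -s -d ' ' -)

# Key is field 3 only, compared as text: "100" < "12" < "7".
by_id_text=$(sort -t: -k3,3 "$workdir/users" | cut -d: -f1 | paste -s -d ' ' -)

# Same key with n: compared by arithmetic value, 7 < 12 < 100.
by_id_num=$(sort -t: -k3,3n "$workdir/users" | cut -d: -f1 | paste -s -d ' ' -)

echo "$by_name / $by_id_text / $by_id_num"
rm -rf "$workdir"
```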
+ Remove or report adjacent duplicate lines
+ Syntax: uniq [ -cdu] [input-file] [output-file]
-c Supersede the -u and -d options and generate an output report with each line preceded by an occurrence count
-d Write only the duplicated lines
-u Write only those lines which are not duplicated
wc: Counting results
+ The word count utility, wc, counts the number of lines, characters or words
+ Options:
sort file | uniq -u | wc -l
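Because uniq only looks at adjacent lines, it is normally fed by sort. A sketch with a made-up word list; the awk step is only there to format the counts:

```sh
# Invented input: pear x3, apple x2, fig x1.
words=$(printf 'pear\napple\npear\nfig\napple\npear\n')

# sort groups duplicates, uniq -c counts them, sort -rn orders by count.
counts=$(printf '%s\n' "$words" | sort | uniq -c | sort -rn \
         | awk '{print $2 "=" $1}' | paste -s -d ' ' -)

# uniq -u keeps only lines that occur once; wc -l counts them.
unique_only=$(printf '%s\n' "$words" | sort | uniq -u | wc -l | tr -d ' ')

echo "$counts"
echo "$unique_only"
```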
tr: TRanslate Characters
+ Copies standard input to standard output with substitution or deletion of selected characters
+ Syntax: tr [ -cds ] [ string1 ] [ string2 ]
• -d delete all input characters contained in string1
• -c complements the characters in string1 with respect to the entire ASCII character set
tr (continued)
+ tr reads from standard input.
Any character that does not match a character in string1 is passed to standard output unchanged
Any character that does match a character in string1 is translated into the corresponding character in string2 and then passed to standard output
+ Examples
tr s z replaces all instances of s with z
tr so zx replaces all instances of s with z and o with x
tr a-z A-Z replaces all lower case characters with upper case characters
tr -d a-c deletes all a-c characters
+ Rewrite numbers: tr ,. .,
+ Import DOS files: tr -d '\r' < dos_file
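The tr examples above can be checked directly on short strings. A sketch with invented inputs; quoting the ranges keeps the shell from expanding them:

```sh
# Translate lower case to upper case.
upper=$(echo 'hello world' | tr 'a-z' 'A-Z')

# Delete every a, b or c.
deleted=$(echo 'abcdef' | tr -d 'a-c')

# Swap commas and periods to rewrite a number.
swapped=$(echo '1.234,56' | tr ',.' '.,')

# Strip the carriage return from a DOS-style line ending.
unix_text=$(printf 'line\r\n' | tr -d '\r')

echo "$upper $deleted $swapped $unix_text"
```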
xargs
+ Unix limits the size of arguments and environment that can be passed down to child
+ What happens when we have a list of 10,000 files to send to a command?
+ xargs solves this problem
Reads arguments from standard input
Sends them to commands that take file lists
May invoke program several times depending on size of arguments
find utility and xargs
+ find . -type f -print | xargs wc -l
– -type f for files
– -print to print them out
xargs invokes wc 1 or more times
• wc -l a b c d e f g
wc -l h i j k l m n o ...
+ Compare to: find . -type f -exec wc -l {} \;
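The batching behaviour of xargs can be made visible with its -n option, which caps the number of arguments per invocation. A sketch on five invented one-line files; the cap of 2 is chosen only to force several batches:

```sh
# Hypothetical scratch directory with five one-line files.
workdir=$(mktemp -d)
for name in a b c d e; do
    echo "$name" > "$workdir/$name.txt"
done

# xargs turns the file names on stdin into arguments for cat.
total=$(find "$workdir" -type f -print | xargs cat | wc -l | tr -d ' ')

# With -n 2, echo is invoked once per batch of at most two names: 2 + 2 + 1.
batches=$(find "$workdir" -type f -print | xargs -n 2 echo | wc -l | tr -d ' ')

echo "$total $batches"
rm -rf "$workdir"
```

Note that plain xargs splits on whitespace, so this form assumes file names without spaces.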
grep command
Processes: ps, kill
cut, paste
File name arguments
+ Regular Expressions
Allow you to search for text in files
grep command
What Is a Regular Expression?
+ A regular expression (regex) describes a set of possible input strings.
+ Regular expressions descend from a fundamental concept in Computer Science called finite automata theory
+ Regular expressions are endemic to Unix
vi, ed, sed, and emacs
awk, tcl, perl and Python
grep, egrep, fgrep
compilers
Regular Expressions
+ The simplest regular expressions are a string of literal characters to match.
Regular Expressions
+ A regular expression can match a string in more than one place.
Scrapple from the apple.
+ The . regular expression can be used to match any character.
For me to poop on.
Character Classes
+ Character classes [] can be used to match any specific set of characters.
beat a brat on a boat
More About Character Classes
– [aeiou] will match any of the characters a, e, i, o, or u
– [kK]orn will match korn or Korn
+ Ranges can also be specified in character classes
– [1-9] is the same as [123456789]
– [abcde] is equivalent to [a-e]
You can also combine multiple ranges
• [abcde123456789] is equivalent to [a-e1-9]
Named Character Classes
+ Commonly used character classes can be referred to by name (alpha, lower, upper, alnum, digit, punct, cntrl)
+ Syntax [:name:]
Anchors
Repetition
regular expression: a.*e
+ A match will be the longest string that satisfies the regular expression.
Repetition Ranges
+ Ranges can also be specified
– { } notation can specify a range of repetitions for the immediately preceding regex
– {n} means exactly n occurrences
– {n,} means at least n occurrences
– {n,m} means at least n occurrences but no more than m occurrences
Subexpressions
+ If you want to group part of an expression so that * or { } applies to more than just the previous character, use ( ) notation
+ Subexpressions are treated like a single character
– a* matches 0 or more occurrences of a
– abc* matches ab, abc, abcc, abccc, ...
– (abc)* matches abc, abcabc, abcabcabc, ...
– (abc){2,3} matches abcabc or abcabcabc
grep
• grep comes from the ed (Unix text editor) search command "global regular expression print" or g/re/p
+ This was such a useful command that it was written as a standalone utility
+ There are two other variants, egrep and fgrep that comprise the grep family
+ grep - uses regular expressions for pattern matching
+ fgrep - file grep, does not use regular expressions, only matches fixed strings but can get search strings from a file
+ egrep - extended grep, uses a more powerful set of regular expressions but does not support backreferencing, generally the fastest member of the grep family
+ agrep - approximate grep; not standard
Syntax
+ Regular expression concepts we have seen so far are common to grep and egrep.
+ grep and egrep have slightly different syntax
grep: BREs
+ Major syntax differences:
grep: \( and \), \{ and \}
Protecting Regex Metacharacters
+ Since many of the special characters used in regexs also have special meaning to the shell, it's a good idea to get in the habit of single quoting your regexs
This will protect any special characters from being operated on by the shell
Escaping Special Characters
+ Even though we are single quoting our regexs so the shell won't interpret the special characters, some characters are special to grep (eg * and .)
+ To get literal characters, we escape the character with a \ (backslash)
+ Suppose we want to search for the character sequence a*b*
Unless we do something special, this will match zero or more 'a's followed by zero or more 'b's, not what we want
Egrep: Alternation
+ Regex also provides an alternation character | for matching one or another subexpression
– (T|Fl)an will match 'Tan' or 'Flan'
– ^(From|Subject): will match the From and Subject lines of a typical email message
+ It matches a beginning of line followed by either the characters 'From' or 'Subject' followed by a ':'
+ Subexpressions are used to limit the scope of the alternation
– At(ten|nine)tion then matches "Attention" or "Atninetion"
Egrep: Repetition Shorthands
+ The * (star) has already been seen to specify zero or more occurrences of the immediately preceding character
• + (plus) means "one or more"
abc+d will match 'abcd', 'abccd', or 'abccccccd' but will not match 'abd'
Equivalent to {1,}
Egrep: Repetition Shorthands cont
+ The '?' (question mark) specifies an optional character, the single character that immediately precedes it
July? will match 'Jul' or 'July'
Equivalent to {0,1}
Also equivalent to (Jul|July)
+ The *, ?, and + are known as quantifiers because they specify the quantity of a match
+ Quantifiers can also be used with subexpressions
– (a*c)+ will match 'c', 'ac', 'aac' or 'aacaacac' but will not match 'a' or a blank line
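The quantifier examples can be verified with grep -E, the POSIX spelling of egrep (an assumption about the reader's system; egrep itself behaves the same where installed). The -x option, which requires the whole line to match, is added here so partial matches do not blur the result:

```sh
# abc+d: one or more c between ab and d; 'abd' must not match.
plus=$(printf 'abd\nabcd\nabccd\n' | grep -E -c 'abc+d')

# July?: the y is optional; -x rejects 'June'.
optional=$(printf 'Jul\nJuly\nJune\n' | grep -E -x 'July?' | paste -s -d ' ' -)

# (a*c)+: matches c, ac, aacaacac but neither 'a' nor an empty line.
group=$(printf 'c\nac\naacaacac\na\n\n' | grep -E -x -c '(a*c)+')

echo "$plus / $optional / $group"
```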
Grep: Backreferences
+ Sometimes it is handy to be able to refer to a match that was made earlier in a regex
+ This is done using backreferences
– \n is the backreference specifier, where n is a number
+ Looks for nth subexpression
+ For example, to find if the first word of a line is the same as the last:
– ^\([[:alpha:]]\{1,\}\) .* \1$
The \([[:alpha:]]\{1,\}\) matches 1 or more letters
+ Variable names in C
– [a-zA-Z_][a-zA-Z_0-9]*
+ Dollar amount with optional cents
– \$[0-9]+(\.[0-9][0-9])?
+ Time of day
– (1[012]|[1-9]):[0-5][0-9] (am|pm)
+ HTML headers <h1> <H1> <h2> ...
– <[hH][1-4]>
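These patterns can be exercised on a few invented lines. The sketch uses grep for the backreference (BRE syntax) and grep -E for the extended patterns; the ^ and $ anchors around the time-of-day pattern are an addition for this example so that '13:00 pm' is rejected as a whole line:

```sh
# Backreference: first word of the line equals the last word.
same_word=$(printf 'run and run\nrun and walk\n' \
            | grep '^\([[:alpha:]]\{1,\}\) .* \1$')

# Time of day, anchored to the whole line.
times=$(printf '9:30 am\n13:00 pm\n12:05 pm\n' \
        | grep -E '^(1[012]|[1-9]):[0-5][0-9] (am|pm)$' | paste -s -d ',' -)

# HTML headers h1 to h4 in either case; -c counts matching lines.
headers=$(printf '<h1>\n<H3>\n<h7>\n' | grep -E -c '<[hH][1-4]>')

echo "$same_word / $times / $headers"
```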
+ Syntax
grep [-hilnv] [-e expression] [filename]
egrep [-hilnv] [-e expression] [-f filename] [expression] [filename]
fgrep [-hilnxv] [-e string] [-f filename] [string] [filename]
-h Do not display filenames
-i Ignore case
-l List only filenames containing matching lines
-n Precede each matching line with its line number
-v Negate matches
-x Match whole line only (fgrep only)
-e expression Specify expression as option
-f filename Take the regular expression (egrep) or a list of strings (fgrep) from filename
grep Examples
• grep 'men' GrepMe
• grep 'fo*' GrepMe
• egrep 'fo+' GrepMe
• egrep -n '[Tt]he' GrepMe
• fgrep 'The' GrepMe
• egrep 'NC+[0-9]*A?' GrepMe
• fgrep -f expfile GrepMe
• Find all lines with signed numbers
$ egrep '[-+][0-9]+\.?[0-9]*' *.c
bsearch.c: return -1;
compile.c: strchr("+1-2*3", t-> op)[1] - '0', dst,
convert.c: Print integers in a given base 2-16 (default 10)
convert.c: sscanf( argv[ i+1], "% d", &base);
strcmp.c: return -1;
strcmp.c: return +1;
• egrep has its limits: For example, it cannot match all lines that contain a number divisible by 7.
Fun with the Dictionary
• /usr/dict/words contains about 25,000 words
– egrep hh /usr/dict/words
+ beachhead
+ highhanded
+ withheld
+ withhold
+ egrep as a simple spelling checker: Specify plausible alternatives you know
egrep "n(ie|ei)ther" /usr/dict/words
neither
+ How many words have 3 a's one letter apart?
– egrep a.a.a /usr/dict/words | wc -l
+ 54
– egrep u.u.u /usr/dict/words
+ cumulus
+ Use /dev/null as an extra file name
Will print the name of the file that matched
• grep test bigfile
– This is a test.
• grep test /dev/null bigfile
– bigfile:This is a test.
Regular expression summary (fgrep, grep, egrep)
x          Ordinary characters match themselves (NEWLINES and metacharacters excluded)
xyz        Ordinary strings match themselves
\m         Matches literal character m
^          Start of line
$          End of line
.          Any single character
[xy^$z]    Any of x, y, ^, $, or z
[^xy^$z]   Any one character other than x, y, ^, $, or z
[a-z]      Any single character in given range
r*         Zero or more occurrences of regex r
r1r2       Matches r1 followed by r2
\(r\)      Tagged regular expression, matches r
\n         Set to what matched the nth tagged expression (n = 1-9)
\{n,m\}    Repetition
r+         One or more occurrences of r
r?         Zero or one occurrences of r
r1|r2      Either r1 or r2
(r1|r2)r3  Either r1r3 or r2r3
(r1|r2)*   Zero or more occurrences of r1|r2, e.g., r1, r1r1, r2r1, r1r1r2r1, ...
{n,m}      Repetition
Sed: Stream-oriented, Non-Interactive, Text Editor
+ Look for patterns one line at a time, like grep
+ Change lines of the file
+ Non-interactive text editor
Editing commands come in as script
There is an interactive editor ed which accepts the same commands
+ A Unix filter
Superset of previously mentioned tools
Input line (Pattern Space)
+ Commands in a sed script are applied in order to each line.
+ If a command changes the input, subsequent commands will be applied to the modified line in the pattern space, not the original input line.
+ The input file is unchanged (sed is a filter).
+ Results are sent to standard output
+ A script is nothing more than a file of commands
+ Each command consists of up to two addresses and an action, where the address can be a regular expression or line number.
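The point about later commands seeing the modified pattern space can be shown with two substitutions applied in different orders. A minimal sketch; the words cat, dog and bird are arbitrary:

```sh
# First command rewrites cat to dog; the second then sees "dog" and rewrites it again.
chained=$(echo 'cat' | sed -e 's/cat/dog/' -e 's/dog/bird/')

# Reversed order: s/dog/bird/ finds nothing in "cat", then s/cat/dog/ applies.
reversed=$(echo 'cat' | sed -e 's/dog/bird/' -e 's/cat/dog/')

echo "$chained $reversed"
```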
Sed Flow of Control
+ sed then reads the next line in the input file and restarts from the beginning of the script file
+ All commands in the script file are compared to, and potentially act on, all lines in the input file
cmd 1   cmd 2   . . .   cmd n
sed [-n] [-f scriptfile] [file(s)]
-n - only print lines specified with the print command (or the 'p' flag of the substitute ('s') command)
-f scriptfile - next argument is a filename containing editing commands
-e command - the next argument is an editing command rather than a filename, useful if multiple commands are specified
+ sed commands have the general form
[address[, address]][!]command [arguments]
+ sed copies each input line into a pattern space
If the address of the command matches the line in the pattern space, the command is applied to that line
If the command has no address, it is applied to each line as it enters pattern space
If a command changes the line in pattern space, subsequent commands operate on the modified line
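The last point, that later commands see the modified pattern space, is easiest to see by running the same two substitutions in both orders. This is a small sketch with made-up input, not an example from the slides.

```shell
# Two commands applied in script order to every input line.
# The second command sees the output of the first, not the original line.
chained=$(printf 'cat\n' | sed -e 's/cat/dog/' -e 's/dog/bird/')

# Reversing the order changes the result: s/dog/bird/ finds nothing to
# replace, then s/cat/dog/ rewrites the line once.
reversed=$(printf 'cat\n' | sed -e 's/dog/bird/' -e 's/cat/dog/')

printf '%s\n' "$chained" "$reversed"
```

The first pipeline ends with bird and the second with dog, even though both scripts contain the same commands.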
Addressing
+ An address can be either a line number or a pattern, enclosed in slashes ( /pattern/ )
+ A pattern is described using regular expressions (BREs, as in grep)
+ If no pattern is specified, the command will be applied to all lines of the input file
+ To refer to the last line: $
Addressing (continued)
+ Most commands will accept two addresses
If only one address is given, the command operates only on that line
+ A command is a single letter
+ Example: Deletion: d
• [address1][,address2]d
Delete the addressed line(s) from the pattern space; line(s) not passed to standard output.
Address and Command Examples
• d deletes all the lines
• 6d deletes line 6
• /^$/d deletes all blank lines
• 1,10d deletes lines 1 through 10
• 1,/^$/d deletes from line 1 through the first blank line
• /^$/,$d deletes from the first blank line through the last line of the file
• /^$/,10d deletes from the first blank line through line 10
• /^ya*y/,/[0-9]$/d deletes from the first line that begins
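A few of these address forms can be checked against a tiny input. The sketch below uses invented sample data (two header lines, a blank line, two body lines) purely for illustration.

```shell
# Sample input: a header block, a blank line, then a body (made-up data).
input=$(printf 'To: a\nFrom: b\n\nbody 1\nbody 2\n')

# /^$/d deletes every blank line.
no_blank=$(printf '%s\n' "$input" | sed '/^$/d')

# 1,/^$/d deletes from line 1 through the first blank line.
body_only=$(printf '%s\n' "$input" | sed '1,/^$/d')

# /^$/,$d deletes from the first blank line through the last line.
head_only=$(printf '%s\n' "$input" | sed '/^$/,$d')

printf '%s\n---\n%s\n---\n%s\n' "$no_blank" "$body_only" "$head_only"
```

The single quotes around each sed script matter: they keep the shell from treating $d or $/ as a parameter expansion.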
Multiple Commands
+ Braces {} can be used to apply multiple commands to an address
[/pattern/[,/pattern/]]{
command1
}
+ Strange syntax:
The opening brace must be the last character on a line
Sed Commands
+ Although sed contains many editing commands, we are only going to cover the following subset:
• d - delete
• p - print
• y - transform
• q - quit
Print
+ The Print command (p) can be used to force the pattern space to be output, useful if the -n option has been specified
+ Syntax: [address1[,address2]]p
+ Note: if the -n option has not been specified, p will cause the line to be output twice!
+ Examples:
1,5p will display lines 1 through 5
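The interaction between p and -n can be seen by counting output lines. This is a minimal sketch on three made-up input lines.

```shell
lines=$(printf 'one\ntwo\nthree\n')

# With -n, only the lines selected by p are printed.
first_two=$(printf '%s\n' "$lines" | sed -n '1,2p')

# Without -n, the addressed lines appear twice: once from p and once from
# the default output of the pattern space.
doubled=$(printf '%s\n' "$lines" | sed '1,2p')
doubled_count=$(printf '%s\n' "$doubled" | wc -l | tr -d ' ')

printf '%s\n' "$first_two"
echo "lines without -n: $doubled_count"
```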
replacement - replacement string for pattern
flags - optionally any of the following
+ n a number from 1 to 512 indicating which occurrence of pattern should be replaced
+ g global, replace all occurrences of pattern in pattern space
+ p print contents of pattern space
Substitute Examples
• s/Puff Daddy/P. Diddy/
Substitute P. Diddy for the first occurrence of Puff Daddy in pattern space
• s/Tom/Dick/2
Substitutes Dick for the second occurrence of Tom in the pattern space
• s/wood/plastic/p
Substitutes plastic for the first occurrence of wood and outputs (prints) pattern space
Replacement Patterns
+ Substitute can use several special characters in the replacement string
– & - replaced by the entire string matched in the regular expression for pattern
– \n - replaced by the nth substring (or subexpression) previously specified using "\(" and "\)"
Replacement Pattern Examples
"the UNIX operating system ..."
s/.NI./wonderful &/
"the wonderful UNIX operating system ..."
cat test1
first:second
one:two
sed 's/\(.*\):\(.*\)/\2:\1/' test1
second:first
two:one
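Both replacement forms can be run directly from a pipeline. The sketch below feeds the sample text through printf instead of a file, which is an assumption made to keep it self-contained.

```shell
# & stands for the whole matched text (here: UNIX).
amp=$(printf 'the UNIX operating system\n' | sed 's/.NI./wonderful &/')

# \1 and \2 stand for the first and second \( ... \) groups.
swapped=$(printf 'first:second\none:two\n' | sed 's/\(.*\):\(.*\)/\2:\1/')

printf '%s\n' "$amp" "$swapped"
```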
Append, Insert, and Change
+ Syntax for these commands is a little strange because they must be specified on multiple lines
+ append [address]a\
Append and Insert
+ Append places text after the current line in pattern space
+ Insert places text before the current line in pattern space
Each of these commands requires a \ following it.
text must begin on the next line.
If text begins with whitespace, sed will discard it unless you start the line with a \
+ Example:
/<Insert Text Here>/i\
Line 1 of inserted text\
\    Line 2 of inserted text
would leave the following in the pattern space
Line 1 of inserted text
    Line 2 of inserted text
<Insert Text Here>
Change
+ Unlike Insert and Append, Change can be applied to either a single line address or a range of addresses
+ When applied to a range, the entire range is replaced by text specified with change, not each line
Exception: If the Change command is executed with other commands enclosed in { } that act on a range of lines, each line will be replaced with text
+ No subsequent editing allowed
Change Examples
+ Remove mail headers, i.e., the address specifies a range of lines beginning with a line that begins with From until the first blank line.
The first example replaces all lines with a single occurrence of <Mail Header Removed>.
The second example replaces each line with <Mail Header Removed>
/^From /,/^$/c\
<Mail Headers Removed>
/^From /,/^$/{
s/^From //p
c\
<Mail Header Removed>
}
Using !
+ If an address is followed by an exclamation point (!), the associated command is applied to all lines that don't match the address or address range
+ Examples:
1,5!d would delete all lines except 1 through 5
/black/!s/cow/horse/ would substitute "horse" for "cow" on all lines except those that contained "black"
"The brown cow" -> "The brown horse"
"The black cow" -> "The black cow"
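The negated forms can be tried on the cow example itself. In this sketch the second command uses the range 2,3 on four made-up lines rather than 1,5, only to keep the input short.

```shell
cows=$(printf 'The brown cow\nThe black cow\n')

# Substitute only on lines that do NOT contain "black".
negated=$(printf '%s\n' "$cows" | sed '/black/!s/cow/horse/')

# Delete every line except lines 2 through 3.
kept=$(printf 'a\nb\nc\nd\n' | sed '2,3!d')

printf '%s\n' "$negated" "$kept"
```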
Transform
+ The Transform command (y) operates like tr, it does a one-to-one or character-to-character replacement
+ Transform accepts zero, one or two addresses
• [address[,address]]y/abc/xyz/
every a within the specified address(es) is transformed to an x. The same is true for b to y and c to z
– y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/ changes all lower case characters on the addressed line to upper case
Quit
+ Quit causes sed to stop reading new input lines and stop sending them to standard output
+ It takes at most a single line address
Once a line matching the address is reached, the script will be terminated
This can be used to save time when you only want to process some portion of the beginning of a file
+ Example: to print the first 100 lines of a file (like head) use:
– sed '100q' filename
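Transform and Quit are both easy to check from the command line. The sketch below uses 3q on five made-up lines instead of 100q so the result fits on screen.

```shell
# y maps each character in the first list to the one at the same
# position in the second list.
upper=$(printf 'hello sed\n' |
    sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')

# 3q prints lines 1 through 3 and then stops reading input, like head -3.
first3=$(printf '1\n2\n3\n4\n5\n' | sed '3q')

printf '%s\n' "$upper" "$first3"
```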
Pattern and Hold spaces
+ Pattern space: Workspace or temporary buffer where a single line of input is held while the editing commands are applied
+ Hold space: Secondary temporary buffer for temporary storage only
Figure: Pattern space, Hold space, in, out
+ No facilities to manipulate numbers
+ Cumbersome syntax
+ Functionality: Execute other programs
+ Full programming language
/bin/ksh Korn shell
+ Interactively
+ Scripting
[control information]
Standard out [data]
Shell Scripts
+ A shell script is a regular text file that contains shell or UNIX commands
Before running it, it must have execute permission:
• chmod +x filename
+ A script can be invoked as:
– sh name arg
– sh < name args
Shell Scripts
+ When a script is run, the kernel determines which shell it is written for by examining the first line of the script
If 1st line starts with #!pathname-of-shell, then it invokes pathname and sends the script as an argument to be interpreted
If #! is not specified, the current shell assumes it is a script in its own language
+ leads to problems
+ Advantages of shell scripts
Easy to work with other programs
Easy to work with files
Easy to work with strings
Great for prototyping. No compilation
+ Disadvantages of shell scripts
Slower
Not well suited for algorithms & data structures
+ Inadequate for scripting
Poor control over file descriptors
Difficult quoting: "I say \"hello\"" doesn't work
Can only trap SIGINT
Can't mix flow control and commands
+ Survives mostly because of interactive features.
Job control
Command history
+ Scripts will also run with ksh, bash
+ Influenced by ALGOL
Simple Commands
+ simple command: sequence of non-blank arguments separated by blanks or tabs.
+ 1st argument (numbered zero) usually specifies the name of the command to be executed.
+ Any remaining arguments:
Are passed as arguments to that command.
Arguments may be filenames, pathnames, directories or special options
ls -l /
/bin/ls -l /
Background Commands
+ Any command ending with "&" is run in the background.
• wait will block until the command finishes
firefox &
Complex Commands
+ The shell's power is in its ability to hook commands together
+ We've seen one example of this so far with pipelines:
cut -d: -f2 /etc/passwd | sort | uniq
+ We will see others
+ Redirection of input: <
+ Append output: >>
standard output remains the same
• cmd > file 2>&1 send both standard error and standard output to file
• cmd > file1 2>file2 send standard output to file1
send standard error to file2
Here Documents
+ Shell provides alternative ways of supplying standard input to commands (an anonymous file)
+ Shell allows in-line input redirection using << called here documents
+ Syntax:
arbitrary-delimiter
• arbitrary-delimiter should be a string that does not appear in text
#!/bin/sh
mail steinbrenner@yankees.com <<EOT
Sorry, I really blew it this year. Thanks for not firing me.
Yours,
Joe
EOT
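A here document can be tested without sending mail by feeding it to cat. The sketch below is an illustration only: the function names and the sender variable are hypothetical, and it also shows the quoted-delimiter form (<<'EOT'), which the slides do not mention, to contrast literal text with expanded text.

```shell
sender=Joe   # hypothetical value used inside the here documents

# Unquoted delimiter: parameters inside the here document are expanded.
expanded_note() {
cat <<EOT
Thanks for not firing me.
Yours,
$sender
EOT
}

# Quoted delimiter: the text is passed through literally.
literal_note() {
cat <<'EOT'
Yours,
$sender
EOT
}

expanded=$(expanded_note)
literal=$(literal_note)
printf '%s\n---\n%s\n' "$expanded" "$literal"
```

Replacing cat with a command such as mail gives the form used in the slide.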
+ Read: $var
+ Variables can be local or environment. Environment variables are part of UNIX and can be accessed by child processes.
+ Turn local variable into environment: export variable
$PATH list of directories to search for
$MAIL Absolute pathname to mailbox
$USER Your login name
$TERM Type of your terminal
$PS1 Prompt
#!/bin/sh
mail steinbrenner@yankees.com <<EOT
Sorry, I really blew it this year. Thanks for not firing me.
Yours,
$USER
EOT
positional parameter, starting from 1
special parameter
+ To get the value of a parameter: ${param}
Can be part of a word (abc${foo}def)
Works within double quotes
+ The arguments to a shell function
+ Arguments to the set built-in command
– set this is a test
• $1=this, $2=is, $3=a, $4=test
• $1=a, $2=test
#!/bin/sh
# Parameter 1: word
# Parameter 2: file
grep $1 $2 | wc -l
$ countlines ing /usr/dict/words
• $- Options currently in effect
• $$ Process number of current process
• $! Process number of background process
• $* All arguments on command line
• "$@" All arguments on command line individually quoted "$1" "$2" ...
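The positional and special parameters can be explored with set. This is a sketch under one assumption: that the second listing in the slide ($1=a, $2=test) shows the parameters after a shift 2, which the extracted text does not state.

```shell
set this is a test
count_before=$#          # number of positional parameters
first=$1

shift 2                  # assumed step: drop the first two parameters
after_shift="$1 $2"

# "$@" keeps each argument as one word; unquoted $* is split again.
set "a b" c
n_at=0
for arg in "$@"; do n_at=$((n_at + 1)); done
n_star=0
for arg in $*; do n_star=$((n_star + 1)); done

echo "$count_before $first / $after_shift / $n_at vs $n_star"
```

The last two counts differ because the argument "a b" contains a space: "$@" preserves it as one word while $* lets the shell split it.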
Command Substitution
+ Used to turn the output of a command into a string
+ Used to create arguments or variables
+ Command is placed with grave accents ` ` to capture the output of command
$ date
Wed Sep 25 14:40:50 EDT 2001
$ NOW=`date`
$ grep `generate_regexp` myfile.c
+ Wildcards (patterns)
? matches any single character
[list] matches any character in list
[lower-upper] matches any character in range lower-upper inclusive
[!list] matches any character not in list
+ This is the same syntax that find uses
File Expansion
+ If multiple matches, all are returned and treated as separate arguments:
+ Handled by the shell (programs don't see the wildcards)
argv[0]: /bin/cat
argv[1]: file1
argv[2]: file2
$ cat file*
not argv[1]: file*
+ Command groupings
pipelines
+ Boolean operators
+ Control structures
Boolean Operators
+ Exit value of a program (exit system call) is a number
0 means success
anything else is a failure code
+ cmd1 && cmd2
+ cmd1 || cmd2
executes cmd2 if cmd1 is not successful
$ ls bad_file > /dev/null && date
$ ls bad_file > /dev/null || date
5e Iep & ":%:& &
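The two operators can be compared side by side with a command that is known to fail. In this sketch the path /nonexistent/bad_file is an assumption (it is expected not to exist), and echo stands in for date so the output is predictable.

```shell
missing=/nonexistent/bad_file   # assumed not to exist

# && runs the next command only on success, so "ran" is skipped here
# and the || branch supplies the fallback text instead.
and_result=$(ls "$missing" > /dev/null 2>&1 && echo ran || echo skipped)

# || runs the next command only on failure.
or_result=$(ls "$missing" > /dev/null 2>&1 || echo ran)

# The exit value itself is available in $? right after the command.
ls "$missing" > /dev/null 2>&1 || failed_status=$?

echo "$and_result $or_result $failed_status"
```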
What is an expression?
+ Any UNIX command. Evaluates to true if the exit code is 0, false if the exit code > 0
+ Special command /bin/test exists that does most common expressions
String compare
Numeric comparison
+ Good example of UNIX tools working together
echo "I know you"
else
echo "I dont know you"
fi
if [ -f /tmp/stuff ] && [ `wc -l < /tmp/stuff` -gt 10 ]
then
echo "The file has more than 10 lines in it"
else
echo "The file is nonexistent or small"
fi
test Summary
+ String based tests
-z string            Length of string is 0
-n string            Length of string is not 0
string1 = string2    Strings are identical
string1 != string2   Strings differ
string               String is not NULL
+ Numeric tests
int1 -eq int2        First int equal to second
int1 -ne int2        First int not equal to second
-gt, -ge, -lt, -le   greater, greater/equal, less, less/equal
+ File tests
-r file              File exists and is readable
-w file              File exists and is writable
-f file              File is regular file
-d file              File is directory
-s file              file exists and is not empty
+ Use external command /bin/expr
standard output.
Particularly useful with command substitution
X=`expr $X + 2`
expr 4 "*" 12
expr <(< % V <)< <8< &
• while ... done
+ Different than C:
for var in list
do
command
done
+ Typically used with positional parameters or a list of files:
sum=0
for var in "$@"
do
sum=`expr $sum + $var`
done
echo The sum is $sum
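The summing loop can be wrapped in a function so it runs on its own. This is a sketch: the function name sum_args and the sample arguments are hypothetical, and backticks are kept to match the slides.

```shell
# Add up all arguments with expr, one call per argument.
sum_args() {
    sum=0
    for var in "$@"
    do
        sum=`expr $sum + $var`
    done
    echo "The sum is $sum"
}

total=$(sum_args 1 2 3 4)

# The * must be quoted so the shell does not expand it as a wildcard.
product=`expr 4 "*" 12`

echo "$total; product is $product"
```

Each expr call starts a separate process, which is one reason the built-in arithmetic described later in the deck is much faster.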
Case statement
+ Like a C switch statement for strings:
case $var in
opt1) command1
command2
;;
#!/bin/sh
Case Options
+ opt can be a shell pattern, or a list of shell patterns delimited by |
+ Example:
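The slide's own example did not survive extraction, so the following is a stand-in sketch rather than the original. The function name classify and the chosen patterns are hypothetical; it shows a pattern list delimited by |, a bracket pattern, and a catch-all.

```shell
# Print a short description of the first argument based on shell patterns.
classify() {
    case $1 in
        *.c|*.h)  echo "C source" ;;            # list of patterns
        [0-9]*)   echo "starts with a digit" ;; # bracket pattern
        "")       echo "empty" ;;
        *)        echo "something else" ;;      # default branch
    esac
}

classify main.c
classify 42abc
classify ""
classify README
```

The first matching branch wins, so the catch-all * must come last.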
+ Programs
Most that are part of the OS in /bin
+ Built-in commands
Built-in Commands
+ Built-in commands are internal to the shell and do not create a separate process. Commands are built-in because:
They are intrinsic to the language (exit)
They produce side effects on the current process (cd)
They perform faster
+ No fork/exec
Important Built-in Commands
exec : replaces shell with program
cd : change working directory
shift : rearrange positional parameters
set : set positional parameters
umask : change default file permissions
exit : quit the shell
eval : parse and execute string
time : run command and print times
export : put variable into environment
trap : set signal handlers
: : true
Functions are similar to scripts and other commands except:
+ They can produce side effects in the caller's script.
+ Variables are shared between caller and callee.
+ The positional parameters are saved and restored when invoking a function.
Syntax:
name () {
commands
}
+ Not recommended for scripts
+ Built-ins not associated with PATH
• PATH search
How the Shell Parses
+ Part 1: Read the command:
Read one or more lines as needed
Separate into tokens using space/tabs
Form commands based on token types
+ Part 2: Evaluate a command:
Expand word tokens (command substitution, parameter expansion)
/home/unixtool/bin/showargs
+ Comments end at the end of the line
+ Comments can begin whenever a token begins
+ Examples
# This is a comment
# and so is this
grep foo bar # this is a comment
grep foo bar# this is not a comment
Special Characters
+ The shell processes the following characters specially unless quoted:
| & ( ) < > ; " ' $
space tab newline
+ The following are special whenever patterns are processed:
* ? [ ]
+ The following are special at the beginning of a word:
#
+ The following is special when processing assignments:
=
Token Types
+ The shell uses spaces and tabs to split the line or lines into the following types of tokens:
Control operators (;;)
Redirection operators (<)
Operator Tokens
+ Operator tokens are recognized everywhere unless quoted. Spaces are optional before and after operator tokens.
+ I/O Redirection Operators:
Each I/O operator can be immediately preceded by a single digit
+ Control Operators:
+ Quoting causes characters to lose special meaning.
• \ Unless quoted, \ causes next character to be quoted. In front of new-line causes lines to be joined.
• '...' Literal quotes. Cannot contain '
• "..." Removes special meaning of all characters except $, ", and \. The \ is only special before one of these characters and new-line.
$ cat file\*
cat: file*: not found
$ cat file1 > /dev/null
$ cat file1 \> /dev/null
cat: >: cannot open
Simple Commands
+ A simple command consists of three types of tokens:
Assignments (must come first)
Command word tokens
The first token must not be a reserved word
Command terminated by new-line or ;
Word Splitting
+ After parameter expansion, command substitution, and arithmetic expansion, the characters that are generated as a result of these expansions that are not inside double quotes are checked for split characters
+ Default split character is space or tab
Pathname Expansion
+ After word splitting, each field that contains pattern characters is replaced by the pathnames that match
+ Quoting prevents expansion
DATE=`date` echo $foo > /dev/null
assignment word param redirection
echo hello there /dev/null
+ Strip CR from files
#!/bin/sh
TMPFILE=/tmp/file$$
if [ "$1" = "" ]
then
tr -d '\r'
exit 0
fi
trap 'rm -f $TMPFILE' 1 2 15
[ "$1" != "" ] && cd "$1"
cat <<HUP
<html>
<h1> Directory listing for $PWD </h1>
<table border=1>
<tr>
HUP
num=0
for file in *
do
genhtml $file # this function is on next page
done
cat <<HUP
</tr>
</table>
</html>
HUP
Function genhtml
Allows nesting
Expressions
+ Expressions are built-in with the [[ ]] operator
if [[ $var = "" ]] ...
+ Gets around parsing quirks of /bin/test, allows checking strings against patterns
Patterns
+ Can be used to do string matching:
if [[ $foo = *a* ]]
if [[ $foo = [abc]* ]]
• ${param-value} Default value if param not set
+ Variables can be arrays
– foo[3]=test
– echo ${foo[3]}
+ Indexed by number
• ${#arr} is length of the array
+ Multiple array elements can be set at once:
– set -A foo a b c d
– echo ${foo[1]}
Set command can also be used for positional params: set a b c d; print $2
+ Much faster
Additional Features
+ Built-in arithmetic: Using $((expression))
e.g., print $(( 1 + 1 * 3 / x ))
+ Tilde file expansion
~+ $PWD
~- $OLDPWD
The eval built-in
• eval arg ...
Causes all the tokenizing and expansions to be performed again
trap command
+ trap specifies command that should be evaled when the shell receives a signal of a particular value.
• trap [ [command] {signal}+ ]
trap 'echo "please, dont interrupt!"' SIGINT
trap 'rm /tmp/tmpfile' EXIT
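The EXIT form of trap is commonly used for cleanup. The sketch below is an illustration under stated assumptions: it uses mktemp for a scratch file and runs the trap inside a subshell, so the handler fires when the subshell ends rather than when the surrounding script does.

```shell
tmpfile=$(mktemp)

output=$(
    # Registered inside the subshell; runs when the subshell exits.
    trap 'rm -f "$tmpfile"' EXIT
    echo "scratch data" > "$tmpfile"
    read -r line < "$tmpfile"
    echo "read back: $line"
)

# By now the trap has removed the file.
if [ -e "$tmpfile" ]; then still_there=yes; else still_there=no; fi

echo "$output (file still there: $still_there)"
```

Using rm -f rather than rm keeps the handler quiet if the file was never created.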
Reading Lines
+ read is used to read a line from a file and to store the result into shell variables
read -r prevents special processing
Uses IFS to split into words
If no variable specified, uses REPLY
read
+ By default attributes hold strings of unlimited length
+ Attributes can be set with typeset:
readonly (-r) cannot be changed
export (-x) value will be exported to env
upper (-u) letters will be converted to upper case
lower (-l) letters will be converted to lower case
ljust (-L width) left justify to given width
rjust (-R width) right justify to given width
zfill (-Z width) justify, fill with leading zeros
integer (-i [base]) value stored as integer
float (-E [prec]) value stored as C double
nameref (-n) a name reference
Name References
+ A name reference is a type of variable that references another variable.
• nameref is an alias for typeset -n
Example:
user1="jeff"
user2="adam"
typeset -n name="user1"
print $name
jeff
+ ${param:offset:len} Substring with offset
Patterns Extended
+ Additional pattern types so that shell patterns are equally expressive as regular expressions
+ Used for:
file expansion
• $'...' Uses C escape sequences
$'\t' $'Hello\nthere'
+ printf added that supports C like printing:
printf "You have %d apples" $x
+ Declared with typeset -A
+ Set: name["foo"]="bar"
+ Reference ${name["foo"]}
+ Client application and server application communicate via a network protocol
+ A protocol is a set of rules on how the client and server communicate
Figure: Web client, Web server, HTTP, transport
Network Access/Internet Layers
+ Network Access Layer
Deliver data to devices on the same physical network
Ethernet
Determines routing of datagram
Datagram fragmentation and reassembly
Provides error-free, point-to-point connection between hosts
+ User Datagram Protocol (UDP)
Unreliable, connectionless
+ Transmission Control Protocol (TCP)
Reliable, connection-oriented
Acknowledgements, sequencing, retransmission
+ Both TCP and UDP use 16-bit port numbers
+ A server application listens to a specific port for connections
+ Ports used by popular applications are well-defined
SSH (22), SMTP (25), HTTP (80)
1-1023 are reserved (well-known)
+ Clients use ephemeral ports (OS dependent)
Name Service
+ Every node on the network normally has a hostname in addition to an IP address
+ Domain Name System (DNS) maps IP addresses to names
e.g. 128.122.81.155 is access1.cims.nyu.edu
+ DNS lookup utilities: nslookup, dig
+ Local name address mappings stored in /etc/hosts
+ Sockets provide access to TCP/IP on UNIX systems
+ Sockets are communications endpoints
+ Invented in Berkeley UNIX
+ Allows a network connection to be opened as a file (returns a file descriptor)
machine 1   machine 2
+ Telnet (Port 23)
Provides virtual terminal for remote user
The telnet program can also be used to connect to other ports
+ FTP (Port 20/21)
Used to transfer files from one machine to another
Uses port 20 for data, 21 for control
+ SSH (Port 22)
For logging in and executing commands on remote machines
Used by mail transfer agents (MTAs)
+ IMAP (Port 143)
Allow clients to access and manipulate emails on the server
ksh93: /dev/tcp
+ Files in the form /dev/tcp/hostname/port result in a socket connection to the given service:
exec 3<>/dev/tcp/smtp.cs.nyu.edu/25 #SMTP
print -u3 "EHLO cs.nyu.edu"
print -u3 "QUIT"
while IFS= read -u3
do
print -r "$REPLY"
done
+ Hypertext Transfer Protocol
Use port 80
+ Language used by web browsers (IE, Netscape, Firefox) to communicate with web servers (Apache, IIS)
HTTP request:
Resources
+ Web servers host web resources, including HTML files, PDF files, GIF files, MPEG movies, etc.
+ Each web object has an associated MIME type
HTML document has type text/html
JPEG image has type image/jpeg
+ Web resource is accessed using a Uniform Resource Locator (URL)
– http://www.cs.nyu.edu:80/courses/fall06/G22.2245-001/index.html
protocol host port resource
Host: www.nyu.edu
Content-type: image/gif
Content-length: 3210
HOST: www.cs.nyu.edu
Server: Apache/2.0.49 (Unix) mod_perl/1.99_14 Perl/v5.8.4
mod_ssl/2.0.49 OpenSSL/0.9.7e mod_auth_kerb/4.13 PHP/5.0.0RC3
Content-Length: 163
<html>
<head>
<body>
</body>
</html>
request
response
Status Codes
+ Status code in the HTTP response indicates if a request is successful
+ Some typical status codes:
401 Authorization required
CGI
+ Common Gateway Interface is a standard interface for running helper applications to generate dynamic contents
Specify the encoding of data passed to programs
+ Allow HTML documents to be created on the fly
+ Transparent to clients
Client sends regular HTTP request
Web server receives HTTP request, runs CGI program, and sends contents back in HTTP responses
+ CGI programs can be written in any language
Web Server
<head>
<p>
+ HTML is a file format that describes a web page.
+ These files can be made by hand, or generated by a program
+ Data sent via HTTP request
+ Server launches CGI script to process data
<form method=POST
action=“http://www.cs.nyu.edu/~unixtool/cgi-
<input type=submit>
+ Radio Buttons <input type=radio name=size value=“S”>
Small
<input type=radio name=size value=“M”> Medium
<input type=radio name=size value=“L”> Large
+ Checkboxes
<input type=checkbox name=extras value=“lettuce”> Lettuce
<input type=checkbox name=extras value=“tomato”> Tomato
+ Text Area
<textarea name=address cols=50 rows=4>
...
</textarea>
Submit Button
+ Submits the form for processing by the CGI script specified in the form tag
<input type=submit value=“Submit Order”>
+ Two methods:
POST
+ Form variables sent as content of HTTP request
Encoding Form Values
+ Browser sends form variable as name-value pairs
– name1=value1&name2=value2&name3=value3
+ Names are defined in form elements
– <input type=text name=ssn maxlength=9>
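The name-value encoding can be taken apart with standard tools. The sketch below is a simplified illustration, not a full CGI parser: it splits on & and =, and deliberately skips URL decoding (such as + and %XX escapes). The variable names are hypothetical.

```shell
query='name1=value1&name2=value2&name3=value3'

parsed=$(
    # Put each pair on its own line, then split each line at the first "=".
    printf '%s\n' "$query" | tr '&' '\n' |
    while IFS='=' read -r name value
    do
        printf '%s -> %s\n' "$name" "$value"
    done
)

printf '%s\n' "$parsed"
```

A library written for the purpose also handles decoding and repeated names, which this loop does not.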
HOST: www.cs.nyu.edu
GET or POST?
+ GET method is useful for
Retrieving information, e.g. from a database
Embedding data in URL without form element
+ POST method should be used for forms with
Many fields or long fields
Sensitive information
Data for updating database
+ POST: Data in standard input (from body of request)
+ Most scripts parse input into an associative array
You can parse it yourself
Or use available libraries (better)
Part 1: HTML Form
<html>
<center>
<H1>Anonymous Comment Submission</H1>
</center>
Please enter your comment below which will be sent anonymously to <tt>kornj@cs.nyu.edu</tt>. If you want to be extra cautious, access this page through <a href="http://www.anonymizer.com">Anonymizer</a>.
<p>
<form action=cgi-bin/comment.cgi method=post>
<textarea name=comment rows=20 cols=30>
</textarea>
<input type=submit value="Submit Comment">
</form>
</html>
#!/home/unixtool/bin/ksh
. cgi-lib.ksh # Read special functions to help parse
ReadParse
PrintHeader
print -r -- "${Cgi.comment}" | /bin/mailx -s "COMMENT" kornj
print "<H2>You submitted the comment</H2>"
print "<pre>"
Debugging
+ Debugging can be tricky, since error messages don't always print well as HTML
+ One method: run interactively
$ QUERY_STRING='birthday=10/15/02'
$ ./birthday.cgi
Content-type: text/html
<html>
Your birthday is <tt>10/15/02</tt>.
</html>
+ This can vary by web server type
http://www.cims.nyu.edu/systems/resources/webhosting/index.html
+ Typically, you give your script a name that ends with .cgi
+ Give the script execute permission
+ Specify the location of that script in the URL
+ Never trust user input - sanity-check everything
+ If a shell command contains user input, run without shell escapes
+ Always encode sensitive information, e.g. passwords
Also use HTTPS
+ Clean up - don't leave sensitive data around
Work well with text
Example: Find words in Dictionary
<form action=dict.cgi>
Regular expression: <input type=entry name=re value=".*">
<input type=submit>
</form>
#!/home/unixtool/bin/ksh
PATH=$PATH:.
. cgi-lib.ksh
ReadParse
PrintHeader
+ Practical Extraction and Report Language
+ Scripting language created by Larry Wall in the mid-80s
+ Functionality and speed somewhere between low-level languages (like C) and high-level ones (like shell)
+ Influence from awk, sed, and C Shell
+ Easy to write (after you learn it), but sometimes hard to read
+ Widely used in CGI scripting
Hello, world
1)1tfI),>"(o<4s)1rs(2uB2(uC,6w-26 '
(i#A@11f523"Vt'')A4,BdJ
(41'<JH2")JI1"%)(2,,4c52)FU,$'4B'T1r,#,)'@4r)
'+"Bfor('x2B'x+!!!B'#%%'x)<s><<;>>Bus$HH,'WB
'xUX'I'#&%%'>>for(%%'Jsu6str('#,1VVU))
+ Support OO programming and user-defined types
+ Data types have separate name spaces
$foo scalar
@foo list
%foo hash
&foo function
$num = 22345;
$num = -13e38;
$str = 'Who\'s there?';
$str = "good evening\n";
$! System error message
$/ Input record separator
$. Input record number
+ String repetition: x
+ Binary assignments:
$val = 2; $val *= 3; # $val is 6
$state .= "City"; # "NewYorkCity"
Greater than or equal to
>= ge
if ($val) { ... }
+ No boolean data type
+ '' and '0' are false; other strings are true
+ The unary not (!) negates the boolean value
$f = ++$n;
+ Use defined to check if a value is undef
if (defined($val)) { ... }
+ Each element is a scalar variable
+ Indices are integers starting at 0
$teams[3] = "Celtics"; # add new elt
@foo = (); # empty list
@nums = (1..100); # list of 1-100
@arr = ($x, $y=6);
@arr1 = @arr2;
More About Arrays and Lists
+ Quoted words - qw
@planets = qw/ earth mars jupiter /;
@planets = qw{ earth mars jupiter };
+ Last element's index: $#planets
Not the same as number of elements in array!
+ Last element: $planets[-1]
@colors = qw< red green blue >;
+ Array interpolated as string:
print "My favorite colors are @colors\n";
+ Prints My favorite colors are red green blue
+ Array in scalar context returns the number of elements in the list
$num = @colors + 5; # $num gets 8
+ push adds element to end of array
@colors = qw# red green blue #;
push(@colors, "yellow"); # same as
@colors = (@colors, "yellow");
$lastcolor = pop(@colors);
shift and unshift
+ shift and unshift: similar to push and pop on the "left" side of an array
+ unshift adds elements to the beginning
@colors = qw# red green blue #;
unshift @colors, "orange";
$c = shift(@colors); # $c gets "orange"
@list1 = qw# NY NJ CT #;
@list2 = reverse(@list1); # (CT,NJ,NY)
@day = qw/ tues wed thurs /;
@sorted = sort(@day); # (thurs,tues,wed)
@nums = sort 1..10; # 1 10 2 3 ... 8 9
+ reverse and sort do not modify their arguments
Iterate over a List
+ foreach loops through a list of values
@teams = qw# Knicks Nets Lakers #;
foreach $team (@teams) {
print "$team win\n";
}
+ Synonym for the for keyword
• $_ is the default
foreach (@teams) {
$_ .= " win\n";
print; # print $_
}
Hashes
+ Associative arrays - indexed by strings (keys)
$cap{"Hawaii"} = "Honolulu";
%cap = ( "New York", "Albany", "New Jersey", "Trenton", "Delaware", "Dover" );
+ Unwinding the hash
@caparr = %cap;
Gets unordered list of key-value pairs
+ Assigning one hash to another
%cap2 = %cap;
%capof = reverse %cap;
print $capof{"Trenton"}; # New Jersey
@state = keys %cap;
+ values returns a list of values
@city = values %cap;
}
print "%cap\n";
+ Individual elements can
print "Capital of $state is $cap{$state}\n";
More Hash Functions
+ exists checks if a hash element has ever been initialized
print "Exists\n" if exists $cap{"Utah"};
Can be used for array elements
A hash or array element can only be defined if it exists
+ delete removes a key from the hash
delete $cap{"New York"};
+ Method 1: Treat them as lists
%h3 = (%h1, %h2);
+ Method 2 (save memory): Build a new hash by looping over all elements
%h3 = ();
while (($k,$v) = each(%h1)) {
}
printhello; # print "Hello Jane"
printhello(); # print "Hello Jane"
+ Parameters are assigned to the special array @_
+ Individual parameter can be accessed as $_[0], $_[1], ...
sub sum {
my $x; # private variable $x
foreach (@_) { # iterate over params
$x += $_;
More on Parameter Passing
+ Any number of scalars, lists, and hashes can be passed to a subroutine
+ Lists and hashes are "flattened"
func($x, @y, %z);
Inside func:
• $_[0] is $x
• $_[1] is $y[0]
• $_[2] is $y[1], etc.
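Flattening can be checked directly by having a subroutine report what it received. A minimal sketch; show_args and count_args are hypothetical helper names, and the hash has a single pair so the flattened order is predictable.

```perl
use strict;
use warnings;

sub show_args  { return join("|", @_); }   # all arguments as one string
sub count_args { return scalar @_; }       # how many arguments arrived

my $x = "x";
my @y = ("y0", "y1");
my %z = (k => "v");

my $flat  = show_args($x, @y, %z);
my $count = count_args($x, @y, %z);

print "$flat\n";
print "$count arguments\n";
```

The subroutine cannot tell where @y ends and %z begins; it only sees one flat list.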
Return Values
+ The return value of a subroutine is the last expression evaluated, or the value returned by the return operator
sub myfunc {
    my $x = 1;
    $x + 2; # last expression evaluated: myfunc returns 3
}
+ Can also return a list: return @somelist;
+ If return is used without an expression (failure), undef or () is returned depending on context
sub myfunc {
    return; # undef in scalar context, () in list context
}
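The context-dependent result of a bare return can be observed by calling the same subroutine in scalar and in list context. A sketch under stated assumptions: first_even is a hypothetical subroutine written for this illustration.

```perl
use strict;
use warnings;

# Returns the first even argument, or fails with a bare return.
sub first_even {
    foreach my $n (@_) {
        return $n if $n % 2 == 0;
    }
    return;    # no even number found
}

my $hit       = first_even(1, 4, 6);   # success: a value
my $miss      = first_even(1, 3, 5);   # scalar context: undef
my @miss_list = first_even(1, 3, 5);   # list context: empty list

print "hit=$hit\n";
print "scalar miss is ", (defined $miss ? "defined" : "undef"), "\n";
print "list miss has ", scalar(@miss_list), " elements\n";
```

Because the list-context failure is an empty list, a test such as if (@miss_list) behaves as expected.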
Lexical Variables
+ Variables can be scoped to the enclosing block with the my operator
sub myfunc {
    my $x;
    …
}
+ Under use strict, all new variables need to be declared with my
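#!/usr/bin/perl -w
@nums = (1, 2, 3);
$num = 4;
@res = dec_by_one(@nums, $num); # @res=(0, 1, 2, 3)
                                # (@nums,$num)=(1, 2, 3, 4)
minus_one(@nums, $num);         # (@nums,$num)=(0, 1, 2, 3)
sub dec_by_one {
    my @ret = @_; # make a copy
    for my $n (@ret) { $n-- }
    return @ret;
}
sub minus_one {
    for (@_) { $_-- } # @_ aliases the caller's variables
}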
• STDIN is the built-in filehandle for standard input
+ Use the line input operator around a filehandle to read from it
$line = <STDIN>; # read next line
chomp($line);
…
< >
+ Diamond operator < > helps Perl programs behave like standard Unix utilities (cut, sed, …)
+ Lines are read from list of files given as command line arguments (@ARGV), otherwise from stdin
while (<>) {
    chomp;
}
• ./myprog file1 file2 - : Read from file1, then file2, then standard input
• $ARGV is the current filename
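The diamond operator can be exercised without a command line by filling @ARGV by hand. This is a sketch only: it creates two temporary files with the core File::Temp module, and the names @lines and %seen_file are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small input files that are removed when the program exits.
my ($fh1, $file1) = tempfile(UNLINK => 1);
print $fh1 "alpha\nbeta\n";
close $fh1;

my ($fh2, $file2) = tempfile(UNLINK => 1);
print $fh2 "gamma\n";
close $fh2;

# Pretend the program was started as: ./myprog file1 file2
@ARGV = ($file1, $file2);

my (@lines, %seen_file);
while (<>) {
    chomp;
    push @lines, $_;
    $seen_file{$ARGV} = 1;    # $ARGV holds the name of the file being read
}

print "read: @lines\n";
print "from ", scalar(keys %seen_file), " files\n";
```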
Filehandles
+ Use open to open a file for reading/writing
open LOG, "syslog"; # read
open LOG, "<syslog"; # read
open LOG, ">syslog"; # write
open LOG, ">>syslog"; # append
+ When you're done with a filehandle, close it
close LOG;
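The same read, write and append modes can also be written with the three-argument form of open and a lexical filehandle, a style many current Perl programs prefer. This is a swapped-in variant, not the slide's two-argument form; the temporary directory and the names $out, $in and $path are assumptions for the sketch.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
my $path = "$dir/syslog";

open(my $out, '>', $path) or die "Cannot create $path: $!";    # write
print $out "first\n";
close $out;

open($out, '>>', $path) or die "Cannot append to $path: $!";   # append
print $out "second\n";
close $out;

open(my $in, '<', $path) or die "Cannot open $path: $!";       # read
my @lines = <$in>;
close $in;

chomp @lines;
print "log contains: @lines\n";
```

Keeping the mode separate from the filename avoids surprises when a filename itself starts with < or >.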
Errors
+ When a fatal error is encountered, use die to print out error message and exit program
die "Something bad happened\n" if …;
+ Always check return value of open
open LOG, "syslog"
    or die "Cannot open log: $!";
+ For non-fatal errors, use warn instead
warn "Temperature is below 0"
    if $temp < 0;
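To watch die and warn behave without ending the script, the die can be trapped with eval and the warning collected through the __WARN__ signal handler. Both are standard Perl mechanisms but go beyond what the slides show, so treat this as a sketch; the path is hypothetical and chosen so that open fails.

```perl
use strict;
use warnings;

# Hypothetical path that should not exist.
my $path = "/no/such/dir/syslog";

# eval traps the die; the message ends up in $@.
my $ok = eval {
    open(my $log, '<', $path) or die "Cannot open log: $!\n";
    1;
};
my $error = $ok ? "" : $@;
print "caught: $error" unless $ok;

# Collect warnings instead of letting them go to standard error.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $temp = -5;
warn "Temperature is below 0\n" if $temp < 0;
print "program continues after warn\n";
```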
    or die "Cannot open messages: $!\n";
while (<MSG>) {
    chomp;
    …
}
$line = <LOG>; # scalar context: reads the next line
@lines = <LOG>; # list context: reads all remaining lines
+ Undefine $/ to read the rest of file as a string
undef $/;
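Since $/ is global, undefining it affects every later read. A common way to limit the effect is local inside a block, sketched here on a temporary file; $first and $rest are illustrative names.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "line 1\nline 2\nline 3\n";
close $tmp;

open(my $in, '<', $path) or die "Cannot open $path: $!";

my $first = <$in>;                   # normal read: one line
my $rest  = do { local $/; <$in> };  # $/ is undef only inside this block
close $in;

print "first: $first";
print "rest:\n$rest";
```

After the do block, $/ is back to its previous value, so other filehandles still read line by line.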
    or die "Cannot create log: $!";
print LOG "Some log messages…\n";
printf LOG "%d entries processed\n",