Working on a Unix System and Introduction to Perl Lecture Note for Computational Biology 1 (LSM 5191) Jiren Wang http://www.bii.a-star.edu.sg/~jiren BioInformatics Institute Singapore

Outline

Working on a Unix System
Introduction to PERL

Unix System

What is Unix?
Unix is a powerful operating system for multi-user, multi-process computer systems.
It has been in existence for over 30 years.
During that time it has been used primarily in industry and academia, where networked, multi-user, high-performance computer systems are required.
Unix is rich in commands and possibilities.

Different Flavors of Unix

Linux
    An open-source, Unix-like operating system.
Solaris – Sun Microsystems
IRIX – Silicon Graphics
Digital Unix – Compaq Corporation
HP-UX – Hewlett-Packard
AIX – IBM

The Advantages of Linux
Linux is free.
Linux has been ported to a wide range of hardware platforms.
Linux was made to keep on running.
    As with Unix, a Linux system is expected to run for long periods without rebooting.
Linux is secure and versatile.
    The security model used in Linux is based on the Unix idea of security.
Linux is scalable.
The Linux OS and Linux applications have very short debug times.

Some Basic Unix Concepts

The structure of the file system.
Commands for working with directories and files.
Controlling access to your files and directories.

File System Basics

Files are named locations on the computer’s storage device. Each file name is a pointer to a discrete object with a beginning and an end.
Directories are containers that can hold files and other directories.
The file hierarchy on a Unix system is structured as a tree, with a root directory that branches into subdirectories and subdirectories of subdirectories.

Four Different Types of File

Ordinary files
    Ordinary files can contain text, data, or program information.
Directories
Special files
    Special files represent input/output devices, like a tty (terminal), a disk drive, or a printer.
Links
    A link is a pointer to another file.

The Structure of Unix File System

/
    bin      cat, cp, date
    dev      cdrom, tty
    etc
    home     globus, webuser, student
    lib
    opt
    root
    sbin
    tmp
    usr      bin, local (bin, src), lib
    var      mail
    …

Some Major Subdirectories in Unix

/dev – contains the device files through which peripherals connected to the system are accessed.
/etc – houses all the configuration files local to your machine.
/home – a common, but not standard, part of Unix. Usually a fairly large, separate partition that houses all user home directories.
/opt – this is where optional, usually commercial, packages are installed.
/root – the home directory for root.
/sbin – holds system-level commands that normal users probably won’t need.
/bin – executable file directory.
/lib – a small subset of system libraries that are needed by programs in /bin and /sbin.
/tmp and /var/tmp – typically configured to be readable / writable / executable by all users.
/usr – the repository for the majority of programs, compilers, libraries, and documentation on the Unix system.
/usr/local – the typical directory in which to install programs and documentation so that they aren’t overwritten by the operating system.
/var – the directory used by all system programs that write output to the disk.

Paths to Files and Directories

Each file on the file system can be uniquely identified by a combination of a file name and a path.
The absolute path describes the relationship of the file to the root directory, /.
The relative path describes its relationship to the current working directory.
Each directory on the system contains two links, ./ and ../, which refer to the current directory and its parent directory, respectively.
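The difference between the two kinds of path can be tried out in a scratch directory. The sketch below is only an illustration: the names demo, sub, and notes.txt are made up for the example, and it assumes the mktemp utility is available.

```shell
# Build a small scratch tree: <scratch>/demo/notes.txt and <scratch>/demo/sub/
scratch=$(mktemp -d)
mkdir -p "$scratch/demo/sub"
echo "hello" > "$scratch/demo/notes.txt"

# Make demo/sub the current working directory.
cd "$scratch/demo/sub"

# Absolute path: spelled out from the root directory /.
abs_content=$(cat "$scratch/demo/notes.txt")

# Relative path: spelled out from the current directory; ../ is its parent.
rel_content=$(cat ../notes.txt)

# ./ is the current directory, so ./../notes.txt names the same file again.
dot_content=$(cat ./../notes.txt)

echo "$abs_content / $rel_content / $dot_content"
```

All three cat commands read the same file; only the way the path is written differs.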

Unix Command Line Structure

A command is a program that tells the Unix system to do something.

command [options] [arguments]

Options modify the command, changing the way it performs. Options are generally preceded by a hyphen (-), and for most commands, more than one option can be strung together, in the form:

command -[option][option][option]

Arguments indicate what the command is to act on, usually a file or series of files.

ls -alR *
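As a small check of how options can be strung together, the sketch below runs ls once with two separate options and once with the same options combined. The file names are hypothetical, and mktemp is assumed to be available.

```shell
# Create a scratch directory with one ordinary and one hidden file.
scratch=$(mktemp -d)
cd "$scratch"
touch visible.txt .hidden.txt

# -a lists hidden entries too; -1 prints one entry per line.
separate=$(ls -a -1)

# The same two options strung together behind a single hyphen.
combined=$(ls -a1)

if [ "$separate" = "$combined" ]; then
    echo "ls -a -1 and ls -a1 produce the same listing"
fi
```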

Getting Help

The Unix manual, usually called man pages, is available on-line to explain the usage of the Unix system and commands.
To use a man page, type the command "man" at the system prompt followed by the command for which you need information.

man [options] command_name

Commands for Working with Directories and Files (1)

Command/Syntax                      Function
cd [directory]                      change directory
ls [options] [directory or file]    list directory contents or file permissions
mkdir [options] directory           make a directory
pwd                                 print working (current) directory
rmdir [options] directory           remove a directory

Comparing Similar Unix and DOS Commands

Command                                         Unix     DOS
list directory contents                         ls       dir
make directory                                  mkdir    md (mkdir)
change directory                                cd       cd (chdir)
delete (remove) directory                       rmdir    rd (rmdir)
return to user's home directory                 cd       cd \
location in path (present working directory)    pwd      cd

Example of Command cd

cd                               changes to the user's home directory
cd /                             changes directory to the system's root
cd ..                            goes up one directory level
cd ../..                         goes up two directory levels
cd /full/path/name/from/root     changes directory to the absolute path named (note the leading slash)
cd path/from/current/location    changes directory to a path relative to the current location (no leading slash)
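The forms of cd listed above can be followed step by step in a scratch tree. This is a sketch only: the directory names a and b are invented for the example, mktemp is assumed to be available, and the plain cd (home directory) form is left out because it depends on the account it runs under.

```shell
# Scratch tree: <base>/a/b
base=$(mktemp -d)
mkdir -p "$base/a/b"

cd "$base/a/b"                 # absolute path (leading slash)
cd ..                          # up one level: now in a
one_up=$(basename "$(pwd)")

cd b                           # relative path (no leading slash): back in a/b
cd ../..                       # up two levels: now in the scratch directory
two_up=$(basename "$(pwd)")

cd a/b                         # relative path with two components
deepest=$(basename "$(pwd)")

echo "$one_up $two_up $deepest"
```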

Commands for Working with Directories and Files (2)

Command/Syntax                Function
chgrp [options] group file    change the group of the file
chmod [options] mode file     change file or directory access permissions
chown [options] owner file    change the ownership of a file; can only be done by the superuser
cp [options] file1 file2      copy file1 into file2; file2 is created, or overwritten if it already exists
mv [options] file1 file2      move file1 into file2
rm [options] file             remove (delete) a file or directory (-r recursively deletes a directory and its contents; -i prompts before removing files)
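A short session with cp, mv, and rm might look like the following. It is a sketch under stated assumptions: the file and directory names are hypothetical, and everything happens inside a scratch directory created with mktemp so that nothing else is touched.

```shell
scratch=$(mktemp -d)
cd "$scratch"
echo "ACGT" > seq.txt

cp seq.txt copy.txt                 # copy.txt is created as a copy of seq.txt
copied=$(cat copy.txt)

mv copy.txt renamed.txt             # rename: copy.txt is gone afterwards
if [ -e copy.txt ]; then old_name_exists=yes; else old_name_exists=no; fi

mkdir results
mv renamed.txt results/             # move the file into a directory
moved=$(cat results/renamed.txt)

rm seq.txt                          # delete a single file
rm -r results                       # delete a directory and its contents
remaining=$(ls -A | wc -l | tr -d ' ')

echo "$copied $old_name_exists $moved $remaining"
```

Because rm does not ask for confirmation unless -i is given, trying such commands in a scratch directory first is a reasonable habit.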

Display Commands

Command/Syntax                         Function
cat [options] file                     concatenate (list) a file
echo [text string]                     echo the text string to stdout
head [-number] file                    display the first 10 (or number of) lines of a file
more (or less or pg) [options] file    display the contents of a text file on the terminal, one screenful at a time
tail [options] file                    display the last part of a file
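To see head and tail at work, the sketch below writes a 20-line file and looks at both ends of it. The file is a temporary one created with mktemp (assumed available), and the line count is given with the -n option, which is the standardized spelling of the -number form shown in the table.

```shell
# Write "line 1" ... "line 20" into a temporary file.
file=$(mktemp)
i=1
while [ "$i" -le 20 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$file"

# Without a count, head shows the first 10 lines.
default_count=$(head "$file" | wc -l | tr -d ' ')

# An explicit count selects the first or last few lines.
first_three=$(head -n 3 "$file")
last_two=$(tail -n 2 "$file")

echo "$default_count"
echo "$first_three"
echo "$last_two"
```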

System Resource Commands

Command/Syntax                      Function
date [options]                      report the current date and time
df [options] [resource]             display number of free disk blocks and files
du [options] [directory or file]    summarize disk usage
ps [options]                        report process status
whereis [options] command           locate the binary, source, and man page files for a command
which command                       locate a command; display its pathname or alias
who or w                            report who is on the system or display information about currently logged in users

Controlling Access to Your Files and Directories

Displaying access permissions.
Understanding access permissions.
Changing the group ownership of files.
Changing access permissions.

Displaying access permissions

To display the access permissions of a file or directory, use the command:

ls -l filename (directory)

ls -l test.txt
-rwxr-xr-- 1 test staff 100 Aug 8 11:58 test.txt

Understanding Access Permissions

Three types of permissions
    r - read the file or directory.
    w - write to the file or directory.
    x - execute the file or search the directory.

Three types of user
    u - the user who owns the file.
    g - members of the group to which the owner belongs.
    o - all other users.

A string of nine characters represents the access permissions for all three types of user.

user     group    others
r w x    r w x    r w x
4 2 1    4 2 1    4 2 1    (octal values)
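The octal weights can be added up mechanically. The sketch below converts the nine-character string rwxr-xr-- into its octal mode and then applies that mode to a temporary file to confirm that ls -l shows the same string. The variable names are illustrative only, and mktemp is assumed to be available.

```shell
perm="rwxr-xr--"
mode=""

# Take the characters three at a time: user, group, others.
for start in 1 4 7; do
    triplet=$(printf '%s' "$perm" | cut -c"$start"-"$((start + 2))")
    digit=0
    case "$triplet" in r??) digit=$((digit + 4)) ;; esac   # read    = 4
    case "$triplet" in ?w?) digit=$((digit + 2)) ;; esac   # write   = 2
    case "$triplet" in ??x) digit=$((digit + 1)) ;; esac   # execute = 1
    mode="$mode$digit"
done
echo "$perm -> $mode"

# Apply the octal mode and read the permission string back from ls -l.
file=$(mktemp)
chmod "$mode" "$file"
listed=$(ls -l "$file" | cut -c2-10)
echo "$listed"
```

Here rwx gives 4+2+1 = 7, r-x gives 4+1 = 5, and r-- gives 4, so the mode is 754.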

Summary of Access Permission Modes

Mode    File Access                       Directory Access
r       read permission to the file       list the directory contents (requires x on the directory)
w       write permission to the file      create and delete files in the directory
x       execute permission to the file    search permission

Changing Group Ownership of Files and Directories

Every user is a member of one or more groups.
To find out which groups you belong to, use the command:

groups

To find out which groups another user belongs to, use the command:

groups username

To list the group ownership of your files:

ls -l

To change the group ownership of a file or directory, use the command:

chgrp group_name file/directory_name

Changing Access Permissions

To change the access permissions for a file or directory use the command

chmod mode filename
chmod mode directory_name

The mode consists of three parts: who the permissions apply to, how the permissions are set, and which permissions to set.

who – 'u' (user), 'g' (group), 'o' (other), and 'a' (all)
how – '+' (add), '-' (subtract), and '=' (assign)
which – 'r' (read), 'w' (write), and 'x' (execute)

Example of Command chmod

Changing access permissions using the chmod command.

ls -l test.txt
-rw-rw-r-- 1 test test 100 Aug 8 11:58 test.txt
chmod u+w,g-w,o-r test.txt
ls -l test.txt
-rw-r----- 1 test test 100 Aug 8 11:58 test.txt
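The same sequence can be replayed on a temporary file, which also gives a chance to try the '=' (assign) form. This is a sketch: the starting mode 664 is chosen to match rw-rw-r--, the temporary file stands in for test.txt, and mktemp is assumed to be available.

```shell
file=$(mktemp)
chmod 664 "$file"                        # start from rw-rw-r--
before=$(ls -l "$file" | cut -c2-10)

# Add write for the user, remove write from the group, remove read from others.
chmod u+w,g-w,o-r "$file"
after=$(ls -l "$file" | cut -c2-10)

# '=' sets the listed permissions exactly: group and others get read only.
chmod go=r "$file"
assigned=$(ls -l "$file" | cut -c2-10)

echo "$before -> $after -> $assigned"
rm -f "$file"
```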

Commands for Remote Connections

telnet
    Open a shell on a remote Unix machine; the workstation on which the command is issued becomes a terminal for that machine.
ftp
    Transfer files from one computer to another.
ssh
    Provide secure encrypted communications between two untrusted hosts over an insecure network.
scp
    Secure copy (remote file copy program).

Example of Command ssh

Secure remote login connection
    ssh remotemachine -l remoteuser
    ssh 192.168.115.66 -l students2003

Executing single commands remotely
    ssh remotemachine -l remoteuser command
    ssh remotemachine command
    ssh mammoth.bii.a-star.edu.sg "ls -l > /tmp/myfile"

Introduction to PERL

PERL
Perl is an acronym for Practical Extraction and Report Language.
Perl is freely available for Unix, MVS, VMS, MS/DOS, Macintosh, OS/2, and other operating systems.
Perl has powerful text-manipulation functions.
Perl has enjoyed recent popularity for World Wide Web programming and serves as glue and gateway between systems, databases, and users.

PERL’s Benefit

Ease of Programming
Rapid Prototyping
Portability, Speed, and Program Maintenance

Ease of Programming

Perl deals easily with information in ASCII text files or flat files.
It is easy to process and manipulate long sequences such as DNA and proteins.
It is convenient to write a program that controls one or more other programs.
It is used to put biology research labs and their results on their own dynamic web sites.

Rapid Prototyping

Many problems can be solved in far fewer lines of Perl code than in C or Java.

Portability, Speed, and Maintenance

Portability
    How many types of computer systems can the language run on?
Speed
    The speed with which the program runs.
    A program written in C typically runs two or more times faster than the comparable Perl program.
Maintenance
    Adding features to a program, handling more types of input, porting the program to run on other computer systems, fixing bugs, etc.

PERL Built-in Data Types

Scalar
    A scalar stores a single, simple value, typically a string or a number.
Array
    An array is an ordered list of scalars that you access with a numeric subscript (subscripts start at 0).
Hash
    A hash is an unordered set of key/value pairs that you access using strings (keys) as subscripts, to look up the scalar value corresponding to a given key.

Example of Built-in Data Types

#!/usr/bin/perl -w
# store DNA in a scalar, and print it out.
$DNA = "AGCTTAGCAAAT";
print $DNA, "\n";
# declare an array, initialized with a list of four scalar values.
@bases = ("A", "C", "G", "T");
print $bases[0], "\n";
# initialize a hash with some key-value pairs
%classification = (
    'dog'   => 'mammal',
    'robin' => 'bird',
    'asp'   => 'reptile',
);
print $classification{'dog'}, "\n";

Strings

Strings
    Sequences of characters.
Single-Quoted Strings
    A sequence of characters enclosed in single quotes.
    'hello', 'don\'t', 'tell\\me', and 'hello\n'.
Double-Quoted Strings
    A sequence of characters enclosed in double quotes.
    "hello world\n", and "my\tbook".

Some Double-quoted String Representations

\n – new line
\r – return
\t – tab
\f – formfeed
\b – backspace
\\ – backslash
\" – double quote

Numeric and String Comparison

Comparison                       Numeric    String
Equal                            ==         eq
Not equal                        !=         ne
Less than                        <          lt
Greater than                     >          gt
Less than or equal to            <=         le
Greater than or equal to         >=         ge
Comparison (returns -1, 0, 1)    <=>        cmp

Binary Assignment Operators

A shorthand for the operation of altering a variable that appears on both sides of an assignment.

$a = $a + 5;

$a += 5;

$b = $b * 3;

$b *= 3;

Array Assignment and Array Element Access

@one = (1, 2, 3);
@two = @one;
@two = (4, 5, @one, 6, 7);    # @two = (4, 5, 1, 2, 3, 6, 7)

@array = (7, 8, 9);
$b = $array[0];               # $b = 7

$#array                       # subscript of the last element of @array (here 2)

The push and pop Functions

Do things to the "right" side of a list (the portion with the highest subscripts).

push function
    @mylist = (1, 2, 3);
    push(@mylist, 4);             # @mylist = (1, 2, 3, 4)

pop function
    Removes the last element of an array.
    Returns undef if given an empty list.

    @mylist = (1, 2, 3);
    $lastvalue = pop(@mylist);    # $lastvalue = 3

The shift and unshift Functions

Do things to the "left" side of a list (the portion with the lowest subscripts).

shift function
    @one = (1, 2, 3);
    $a = shift(@one);             # $a = 1; @one = (2, 3);

unshift function
    $a = 3;
    @one = (4, 5, 6);
    unshift(@one, $a);            # @one = (3, 4, 5, 6);

The reverse Function

The reverse function reverses the order of the elements of its argument, returning the resulting list.

@alist = (7, 8, 9);

@blist = reverse(@alist); # @blist = (9, 8, 7);

The sort Function

The sort function takes its arguments, and sorts them as if they were single strings in ascending ASCII order. It returns the sorted list without altering the original list.

@one = sort("one", "two", "three");    # @one = ("one", "three", "two");

@one = sort(1, 2, 4, 8, 16, 32, 64);   # @one = (1, 16, 2, 32, 4, 64, 8);

<STDIN> as a Scalar Value

Each time you use <STDIN> in a place where a scalar value is expected, Perl reads the next complete text line from standard input (up to the first newline), and uses that string as the value of <STDIN>.

$a = <STDIN>;

<STDIN> as an Array

In a list context, <STDIN> returns all remaining lines up to end of file. Each line is returned as a separate element of the list.

@a = <STDIN>;

Control Structure

Statement Blocks
The if/unless Statement
The while/until Statement
The do {} while/until Statement
The for Statement
The foreach Statement

Statement Blocks

A statement block is a sequence of statements, enclosed in matching curly braces.

{
    statement_1;
    statement_2;
    …
    statement_n;
}

The if Statement

if (some_expression)
    true_statement_block

if (some_expression)
    true_statement_block
else
    false_statement_block

if (some_expression_1)
    true_statement_block_1
elsif (some_expression_2)
    true_statement_block_2
elsif (some_expression_3)
    true_statement_block_3
else
    all_false_statement_block

The unless Statement

Do that if this is false.

$a = 20;
unless ($a < 18) {
    print "Old enough! So go vote!\n";
} else {
    print "You're not old enough to vote.\n";
}

The while / until Statement

while statement
    If the value of the control expression is true, the body of the while statement is evaluated repeatedly until the control expression becomes false.

    while (some_expression)
        statement_block

until statement
    If the value of the control expression is false, the body of the until statement is evaluated repeatedly until the control expression becomes true.

    until (some_expression)
        statement_block

The do {} while/until Statement

do {} while/until statement
    This statement doesn’t test the expression until after executing the loop once.

    do
        statement_block
    while some_expression;

    do
        statement_block
    until some_expression;

The for statement

The for statement looks like C or Java’s for statement and works in the same way.

for (initial_exp; test_exp; re-init_exp)
    statement_block

for ($i = 1; $i <= 10; $i++) {
    print "i = $i\n";
}

The foreach Statement

This statement takes a list of values and assigns them one at a time to a scalar variable, executing a block of code with each successive assignment.

@list = ("a", "b", "c", "d", "e", "f");

foreach $element (@list) {
    print "$element\n";
}

Example: Command Line Values and Iterative Loops

print "$#ARGV is the subscript of the ", "last command argument.\n";

# Iterate on numeric subscript 0 to $#ARGV:
for ($i = 0; $i <= $#ARGV; $i++) {
    print "Argument $i is $ARGV[$i].\n";
}

# A variation on the preceding loop:
foreach $item (@ARGV) {
    print "The word is: $item.\n";
}

# A similar variation, using the "Default Scalar Variable" $_ :
foreach (@ARGV) {
    print "Say: $_.\n";
}

Output of the Example

> perl example5.pl Good morning, everyone!
2 is the subscript of the last command argument.
Argument 0 is Good.
Argument 1 is morning,.
Argument 2 is everyone!.
The word is: Good.
The word is: morning,.
The word is: everyone!.
Say: Good.
Say: morning,.
Say: everyone!.

Hash Functions

The keys function
The values function
The each function
The delete function

The keys Function

The keys(%hashname) function returns a list of all the current keys in the hash %hashname.

%classification = (
    'dog'   => 'mammal',
    'asp'   => 'reptile',
    'robin' => 'bird',
);
# keys(%classification) gets ('dog', 'asp', 'robin'), in no guaranteed order

The values Function

The values(%hashname) function returns a list of all the current values of %hashname, in the same order as the keys returned by the keys(%hashname) function.

%classification = (
    'dog'   => 'mammal',
    'asp'   => 'reptile',
    'robin' => 'bird',
);
# values(%classification) gets ('mammal', 'reptile', 'bird')

The each Function

The each(%hashname) function returns a key-value pair as a two-element list. On each evaluation of this function for the same hash, the next successive key-value pair is returned until all the elements have been accessed.

while (($mykey, $myvalue) = each(%classification)) {
    print "The hash value of $mykey is $myvalue.\n";
}

The delete Function

The delete function removes hash elements from a hash.

delete $classification{'dog'};
while (($mykey, $myvalue) = each(%classification)) {
    print "The hash value of $mykey is $myvalue\n";
}

Miscellaneous Control Structures

The last Statement
The next Statement
The redo Statement
&& and || as Control Structures

The last Statement

The last statement breaks out of the innermost enclosing loop block, causing execution to continue with the statement immediately following the block.

while (something) {
    something;
    if (somecondition) {
        someotherthing;
        last;    # break out of the while loop
    }
    morethings;
}
# last comes here

The next Statement

The next statement causes execution to skip past the rest of the innermost enclosing looping block without terminating the block.

while (something) {
    firstpart;
    if (somecondition) {
        somepart;
        next;
    }
    otherpart;
    # next comes here
}

The redo Statement

The redo statement causes a jump to the beginning of the current block (without reevaluating the control expression).

while (somecondition) {
    # redo comes here
    something;
    if (somecondition) {
        somestuff;
        redo;
    }
    morething;
}

&& and || as Control Structures

They determine the truth of the statement by evaluating the fewest number of operands possible.
If the left operand of an && operator is false, the right operand is never evaluated.
If the left operand of the || operator is true, the right operand is never evaluated.

open(FILE, "somefile") || die "Cannot open input file.\n";