Topics
  Quiz 1
  Homework Review
  Programming Assignment # 1
  Perl shortcuts
  Declaring variables and Scope
  Subroutines
    passing arguments
    array references
  Programming Methods
    Top Down Design
    Bottom Up Coding and Testing
  Debugging
  Reading manuals and help pages
  Plain old documentation (POD)
  Lab time
BINF 634 FALL 2015
Acknowledgements
Thanks to John Grefenstette for allowing me to use these slides as a starting point for tonight’s lecture
Perl Shortcuts
Any simple statement can be followed by a single modifier right before the ; or closing }

    STATEMENT if EXPR
    STATEMENT unless EXPR
    STATEMENT while EXPR
    STATEMENT until EXPR

    $ave = $ave/$n unless $n == 0;

Same as:

    unless ($n == 0) { $ave = $ave/$n }

What does this do?

    $x = 0;
    print $x++, "\n" until $x == 10;

Output:

    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
Perl Shortcuts
Any simple statement can be followed by a single modifier

    STATEMENT foreach LIST

STATEMENT is evaluated for each item in LIST, with $_ set to the current item.

    @A = qw/One two three four/;
    print "$_\n" foreach @A;

Output:

    One
    two
    three
    four
Perl Shortcuts
Predefined Perl functions may be used with or without parentheses around their arguments:

    $next = shift @A;
    open FILE, $filename or die "Can't open $filename";
    @chars = split //, $word;
    @fields = split /:/, $line;

Many Perl functions assume $_ if their argument is omitted:

    @A = qw/One two three four/;
    print length, " $_\n" foreach @A;

Output:

    3 One
    3 two
    5 three
    4 four
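The shortcuts on these slides can be combined. A minimal sketch, assuming a hypothetical helper named longer_than (not part of the lecture), that uses a statement modifier together with the implicit $_:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# longer_than is a hypothetical helper for illustration only.
# It returns the items that are longer than $min characters.
sub longer_than {
    my ($min, @items) = @_;
    my @keep;
    foreach (@items) {
        # length() with no argument measures $_;
        # the "if" modifier guards the single push statement
        push @keep, $_ if length() > $min;
    }
    return @keep;
}

my @A = qw/One two three four/;
print "$_\n" foreach longer_than(3, @A);
```

With the list from the slide, only three and four are longer than 3 characters, so those are the two lines printed.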
Scope of variables
my variables can be accessed only until the end of the enclosing block (or until the end of the file, if outside any block)
It's best to declare a variable in the smallest possible scope

    if ($x < $y) { my $tmp = $x; $x = $y; $y = $tmp }

Variables declared in a control-flow statement are visible only within the associated block:

    my @seq_list = qw/ATT TTT GGG/;
    my $sequence = "NNN";
    for my $sequence (@seq_list) {
        $sequence .= "TAG";
        print "$sequence\n";
    }
    print "$sequence\n";

Output:

    ATTTAG
    TTTTAG
    GGGTAG
    NNN

Are these two different variables?
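The output above suggests the answer is yes: the loop variable and the outer $sequence are separate variables. A small sketch of the same effect with a bare block, using a hypothetical sub named scope_demo that is not part of the lecture:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# scope_demo is a hypothetical sub for illustration only.
# It returns the value seen inside the block and the value seen after it.
sub scope_demo {
    my $sequence = "NNN";          # outer variable
    my $inner;
    {
        my $sequence = "ATT";      # a new variable; hides the outer one in this block
        $sequence .= "TAG";
        $inner = $sequence;        # remember what the block saw
    }                              # the inner $sequence goes out of scope here
    return ($inner, $sequence);    # $sequence is the untouched outer variable
}

my ($inner, $outer) = scope_demo();
print "inner: $inner\n";    # inner: ATTTAG
print "outer: $outer\n";    # outer: NNN
```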
Subroutines
Advantages of Subroutines
  Shorter code
  Easier to test
  Easier to understand
  More reliable
  Faster to write
  Re-usable
Subroutines
Defining a subroutine:

    sub name { BLOCK }

Arguments are accessed through the array @_
Subroutine values are returned by:

    return VALUE

Subroutines may be defined anywhere in the file, but are usually placed at the end
They can be arranged alphabetically or by functionality
Passing Parameters Into Subroutines
Values are passed into subroutines using the special array @_
  How do we know that this is an array?
  The shortened name of this argument is _
  It contains all of the scalars passed into the subroutine
Pass by Value

    #!/usr/bin/perl -w
    # A driver program to test a subroutine that
    # uses pass by value

    use strict;
    use warnings;

    my $i = 2;

    simple_sub($i);

    print "In main program, after the subroutine call, \$i equals $i\n\n";

    exit;

    sub simple_sub {
        my ($i) = @_;

        $i += 100;

        print "In subroutine simple_sub, \$i equals $i\n\n";
    }

Output:

    In subroutine simple_sub, $i equals 102

    In main program, after the subroutine call, $i equals 2

Why are the two values different?
    #!/usr/bin/perl
    use strict;
    use warnings;
    # File: min.pl

    my $a = <STDIN>; chomp $a;
    my $b = <STDIN>; chomp $b;

    $small = min($a, $b);

    print "min of $a and $b is $small\n";
    exit;

    sub min {
        my ($n, $m) = @_;    # @_ is the array of parameters
        if ($n < $m) { return $n } else { return $m }
    }

There is a bug in this program as written. Can you find it? How would you fix it to produce the output indicated below?

    % min.pl
    123
    45
    min of 123 and 45 is 45

Answer: $small is not declared. Under use strict, it must be declared with my before it is used.
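One possible fix, sketched under the assumption that fixed values may stand in for the two <STDIN> reads so the example runs without input. The variable names $x and $y are illustrative; the only essential change from min.pl is the my on $small.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same logic as the lecture's min, written with the conditional operator.
sub min {
    my ($n, $m) = @_;    # @_ is the array of parameters
    return $n < $m ? $n : $m;
}

my ($x, $y) = (123, 45);    # stand-ins for the values read from STDIN
my $small = min($x, $y);    # "my" declares $small, as use strict requires

print "min of $x and $y is $small\n";    # min of 123 and 45 is 45
```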
    #!/usr/bin/perl
    use strict;
    use warnings;
    # File: min_max.pl
    #
    # Subroutines can return lists

    my $a = <STDIN>; chomp $a;
    my $b = <STDIN>; chomp $b;

    my ($small, $big) = min_max($a, $b);

    print "max of $a and $b is $big\n";
    print "min of $a and $b is $small\n";
    exit;

    sub min_max {
        my ($n, $m) = @_;    # @_ is the array of parameters
        if ($n < $m) { return ($n, $m) } else { return ($m, $n) }
    }

Output:

    % min_max.pl
    123
    45
    max of 123 and 45 is 123
    min of 123 and 45 is 45
Passing arguments
All arguments are passed in a single list

    @a = qw/ This will all /;
    $b = "end";
    @c = qw/ up together /;
    @c = foo(@a, $b, @c);
    print "@c\n";

    sub foo {
        my @args = @_;
        return @args;
    }

Output:

    This will all end up together
Array Flattening

    #!/usr/bin/perl -w
    # A driver program to test a subroutine that
    # illustrates array flattening

    use strict;
    use warnings;

    my @i = ('1', '2', '3');
    my @j = ('a','b','c');

    print "In main program before calling subroutine: i = " . "@i\n";
    print "In main program before calling subroutine: j = " . "@j\n";

    reference_sub(@i, @j);

    print "In main program after calling subroutine: i = " . "@i\n";
    print "In main program after calling subroutine: j = " . "@j\n";

    exit;

    sub reference_sub {
        my (@i, @j) = @_;

        print "In subroutine : i = " . "@i\n";
        print "In subroutine : j = " . "@j\n";

        push(@i, '4');

        shift(@j);
    }

Output:

    In main program before calling subroutine: i = 1 2 3
    In main program before calling subroutine: j = a b c
    In subroutine : i = 1 2 3 a b c
    In subroutine : j = 
    In main program after calling subroutine: i = 1 2 3
    In main program after calling subroutine: j = a b c
Passing by Value Versus Passing by Reference
Passing by Value
  Pass a copy of the variable
  Changes made to the variable in the subroutine do not affect the value of variables in the main body
  Can cause array flattening
Passing by Reference
  Pass a reference (pointer) to the variable
  Must be dereferenced when used in the subroutine
  This is the cure for array flattening
Perl References - I
A reference is a scalar value that refers to (points to) another variable
So a reference might refer to an array

    $aref = \@array;    # $aref now holds a reference to @array
    $xy = $aref;        # $xy now holds a reference to @array

In the three lines below, lines 2 and 3 working together do the same thing as line 1

    $aref = [ 1, 2, 3 ];
    @array = (1, 2, 3);
    $aref = \@array;
http://perl.plover.com/FAQs/references.html
Dereferencing
${$aref}[3] is too hard to read, so you can write $aref->[3] instead
Additional helpful discussions can be found at
http://oreilly.com/catalog/advperl/excerpt/ch01.html
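The dereferencing forms can be compared side by side. A minimal sketch with made-up data; the variable names other than $aref and @array are illustrative and not from the slides:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = (10, 20, 30, 40);
my $aref  = \@array;             # reference to @array

my $block_form = ${$aref}[3];    # element 3, block form
my $arrow_form = $aref->[3];     # same element, arrow form
my $count      = scalar @{$aref};    # number of elements in the referenced array
my @copy       = @$aref;         # copies the elements into a new array

$aref->[0] = 99;                 # changes @array itself, not @copy

print "block form: $block_form, arrow form: $arrow_form\n";
print "count: $count\n";
print "array: @array\n";
print "copy:  @copy\n";
```

Writing through the reference changes @array, while @copy keeps the original values because it was filled before the assignment.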
Passing by Reference

    #!/usr/bin/perl -w
    # A driver program to test a subroutine that
    # passes by reference

    use strict;
    use warnings;

    my @i = ('1', '2', '3');
    my @j = ('a','b','c');

    print "In main program before calling subroutine: i = " . "@i\n";
    print "In main program before calling subroutine: j = " . "@j\n";

    reference_sub(\@i, \@j);

    print "In main program after calling subroutine: i = " . "@i\n";
    print "In main program after calling subroutine: j = " . "@j\n";

    exit;

    sub reference_sub {
        my ($i, $j) = @_;

        print "In subroutine : i = " . "@$i\n";
        print "In subroutine : j = " . "@$j\n";

        push(@$i, '4');

        shift(@$j);
    }

Output:

    In main program before calling subroutine: i = 1 2 3
    In main program before calling subroutine: j = a b c
    In subroutine : i = 1 2 3
    In subroutine : j = a b c
    In main program after calling subroutine: i = 1 2 3 4
    In main program after calling subroutine: j = b c
Array references
To pass more than one list to a subroutine, use references to the arrays

    @a = qw/ This will all /;
    $b = "end";
    @c = qw/ up together /;

    # this passes in references to the arrays
    bar(\@a, $b, \@c);    # \@a is a reference (pointer) to @a

    sub bar {
        my ($x, $b, $z) = @_;    # @_ has three items

        # dereference first argument
        my @A = @$x;             # @$x is the array referenced by $x

        # dereference third argument
        my @C = @$z;

        print "@A\n";
        print "$b\n";
        print "@C\n";
    }

Output:

    This will all
    end
    up together
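References also let a subroutine keep two lists apart while working on both. A small sketch, assuming a hypothetical sub named common_items and made-up codon lists (neither is from the lecture):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# common_items is a hypothetical sub for illustration only.
# It takes two array references and returns a reference to the
# items of the second array that also occur in the first.
sub common_items {
    my ($first, $second) = @_;               # two scalars, each an array reference
    my %seen = map { $_ => 1 } @$first;      # mark every item of the first array
    my @common = grep { $seen{$_} } @$second;
    return \@common;                         # return a reference, not a flattened list
}

my @x = qw/ATT TTT GGG/;
my @y = qw/GGG CCC ATT/;

my $both = common_items(\@x, \@y);
print "@$both\n";    # GGG ATT
```

Had the two arrays been passed directly, they would have arrived as a single six-item list and the subroutine could not tell where one ended and the other began.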
Program Design

    Input -> Algorithm -> Output

Q. What is the form of the input data?
Q. How will the program get it?
 - interactive
 - command line
 - parameter file
Q. How will the program process the data to compute the desired output?
Q. How will the output be formatted and delivered?
Specified by user requirements
Program Design
Design Top Down
  Identify the inputs
  Understand the requirements for the output
  Design an overall algorithm for computing the output
  Express overall method in pseudocode
  Refine pseudocode until each step forms a well-defined subroutine
Test Bottom Up
  Write and debug subroutines one at a time
  Start with “utility” subroutines that will be used by other subroutines
  Test each subroutine with input data that gives known results
  Include subroutines that help debugging, such as printing routines for data structures
Pseudocode
High level, informal program
No details
Example: print out length statistics and overall nucleotide usage statistics for a file of sequences

Input:
    get sequences from DNAfile
Algorithm:
    for each DNA sequence,
        get length statistics
        count each type of nucleotide
Output:
    print length statistics
    print nucleotide usage statistics
Pseudocode
Keep pseudocode in the Perl program as comments

    # get sequences from DNAfile

    # for each DNA sequence,
    #     get length statistics
    #     count each type of nucleotide

    # print length statistics

    # print nucleotide usage statistics
Refinement
Refine pseudocode into more detailed steps:

Input:
    get name of DNAfile
    open DNAfile
    read lines from DNAfile, putting DNA sequences in a list
Algorithm:
    for each DNA sequence in the list
        get length and update statistics
        count each type of nucleotide in the sequence
Output:
    print length statistics
    print nucleotide usage statistics
Algorithm Refinement
Try to express complex tasks using Perl control structures (e.g. loops) until the inner subtasks form well-defined tasks that can each be done by a single subroutine.

Algorithm:
    for each DNA sequence in the list
        get length and update statistics
        count each type of nucleotide in the sequence

becomes

    for each DNA sequence in the list
        get length and update statistics
        for each base
            count the occurrence of that base in the sequence

Now write a subroutine to count any base in any sequence
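The refined pseudocode might translate into Perl roughly as follows. This is a sketch, not the lecture's solution: the sub names count_base and sequence_stats, the choice of min/max/mean as the length statistics, and the hard-coded sequence list (standing in for reading DNAfile) are all assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count one base in one sequence, ignoring case.
sub count_base {
    my ($base, $dna) = @_;
    my $count = () = $dna =~ /\Q$base\E/gi;    # number of matches
    return $count;
}

# Hypothetical helper: returns (min length, max length, mean length,
# reference to a hash of nucleotide counts) for a list of sequences.
sub sequence_stats {
    my @seqs = @_;
    my ($min, $max, $total) = (undef, undef, 0);
    my %usage = (A => 0, C => 0, G => 0, T => 0);

    # for each DNA sequence in the list
    for my $seq (@seqs) {
        # get length and update statistics
        my $len = length $seq;
        $min = $len if !defined $min || $len < $min;
        $max = $len if !defined $max || $len > $max;
        $total += $len;

        # for each base, count the occurrence of that base in the sequence
        $usage{$_} += count_base($_, $seq) for keys %usage;
    }
    my $mean = @seqs ? $total / @seqs : 0;
    return ($min, $max, $mean, \%usage);
}

my @dna_list = qw/tagATAGAC ACGT GGGCC/;    # made-up data in place of DNAfile
my ($min, $max, $mean, $usage) = sequence_stats(@dna_list);

# print length statistics
print "length: min $min, max $max, mean $mean\n";    # length: min 4, max 9, mean 6
# print nucleotide usage statistics
print "$_: $usage->{$_}\n" for sort keys %$usage;    # A: 5, C: 4, G: 6, T: 3 (one per line)
```

Keeping the pseudocode lines as comments makes it easy to check that every step of the design has a matching piece of code.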
    #!/usr/bin/perl
    # File: sub1.pl
    # subroutine to count A's in DNA
    use warnings;
    use strict;

    my $a;
    my $dna = "tagATAGAC";

    $a = count_A($dna);
    print "$dna\n";
    print "a: $a\n";
    exit;

    #########################################
    # subroutine to count A's in DNA
    #
    sub count_A {
        # @_ is the list of parameters
        my ($dna) = @_;    # array context assignment
        my $count;

        # tr returns number of matches
        $count = ($dna =~ tr/Aa//);
        return $count;
    }

Output:

    tagATAGAC
    a: 4

After you've written a subroutine, ask yourself if it can be made a bit more general
    #!/usr/bin/perl
    # File: sub2.pl
    # subroutine to count any letter in DNA
    use warnings;
    use strict;

    my ($a, $c, $g, $t);
    my $dna = "tagATAGAC";

    $a = count_base('A', $dna);
    $t = count_base('T', $dna);
    $c = count_base('C', $dna);
    $g = count_base('G', $dna);

    print "$dna\n";
    print "a: $a t: $t c: $c g: $g\n";
    exit;

    ##########################################
    # subroutine to count any letter in DNA
    #
    sub count_base {
        my( $base, $dna ) = @_;
        my( $count );

        $count = ($dna =~ s/$base//ig);
        return $count;
    }

Output:

    tagATAGAC
    a: 4 t: 2 c: 1 g: 2
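One detail worth knowing about sub2.pl: s///g returns the number of substitutions made, but when nothing matches it returns an empty string rather than 0, so a base that is absent prints as a blank. A sketch comparing that approach with a match-counting alternative; the sub names count_base_subst and count_base_match are illustrative and not from the lecture:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The approach from sub2.pl: count by deleting matches from a copy of the string.
sub count_base_subst {
    my ($base, $dna) = @_;
    my $count = ($dna =~ s/$base//ig);    # empty string when there is no match
    return $count;
}

# Alternative: count matches in list context; gives 0 when there is no match.
sub count_base_match {
    my ($base, $dna) = @_;
    my $count = () = $dna =~ /\Q$base\E/gi;
    return $count;
}

my $dna = "tagATAGAC";
for my $base (qw/A N/) {
    printf "%s: subst='%s' match='%s'\n",
        $base, count_base_subst($base, $dna), count_base_match($base, $dna);
}
# A: subst='4' match='4'
# N: subst='' match='0'
```

Both versions leave the caller's string untouched, because each sub works on its own copy taken from @_.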
Program Design: Managing Complexity
Understand inputs and outputs
Use pseudocode to refine your algorithm
Use divide-and-conquer to turn big problems into manageable pieces
  within a chromosome, process one gene at a time
  within each gene, process one reading frame at a time
  within each reading frame, process one ORF at a time
Pick data structures that make algorithms easier
  this gets easier with experience!
Write subroutines to
  transform one data object to another, for example:
    dna (string) to reading frame (array of codons)
    reading frame to orf
  perform some well defined task
  compute some statistics on a single data object
  produce final output format
Write small programs (drivers) to test each subroutine before combining them together
Some Good Programming References
Niklaus Wirth, Algorithms + Data Structures = Programs (Prentice-Hall Series in Automatic Computation)
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, Introduction to Algorithms
Read The Fine Manual (RTFM)
The more you read manuals, the easier it will be
For each function we have covered tonight, read the corresponding description in Ch. 29 of Wall
If you find something in the manual you don't understand, look it up (or ask someone)
Learn to use the online help pages, e.g.,

    % perldoc -f join

To see a list of online tutorials, see

    % perldoc perl

For example:

    % perldoc perlstyle

The interface is somewhat vi-like
Debugging Strategies
Before running the program, always run

    % perl -c prog

Read the warnings and error messages from the compiler carefully
Always use strict and use warnings
Basic strategy: bottom-up debugging
  Test and debug one subroutine at a time
Insert print statements
  to figure out where a program fails
  to print values of variables
  Comment out when not needed - don't remove!
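As a variation on commenting print statements in and out, debugging output can be switched with a flag. This is a sketch of one option, not the lecture's method; the $DEBUG flag and the debug and gc_content subs are hypothetical names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $DEBUG = 1;    # hypothetical flag: set to 0 to silence the debugging output

# Hypothetical helper: print a message to STDERR only when $DEBUG is true.
sub debug {
    my (@msg) = @_;
    print STDERR "DEBUG: @msg\n" if $DEBUG;
    return;
}

# Hypothetical example sub: fraction of G and C in a sequence.
sub gc_content {
    my ($dna) = @_;
    my $gc = ($dna =~ tr/GCgc//);            # tr returns the number of matches
    debug("gc_content: dna=$dna gc=$gc");    # shows the intermediate value
    return length($dna) ? $gc / length($dna) : 0;
}

print gc_content("GGCCAT"), "\n";
```

Sending the messages to STDERR keeps them out of the program's normal output, so they can stay in the code while the results are redirected to a file.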
Starting the Debugger
[binf:~/binf634/workspace/binf634_book_examples] jsolka% perl -d example-6-4.pl
Loading DB routines from perl5db.pl version 1.28
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
main::(example-6-4.pl:11): my $dna = 'CGACGTCTTCTAAGGCGA';
DB<1>
Getting Help Within the Debugger - I
DB<2> h
List/search source lines: Control script execution:
l [ln|sub] List source code T Stack trace
- or . List previous/current line s [expr] Single step [in expr]
v [line] View around line n [expr] Next, steps over subs
f filename View source in file <CR/Enter> Repeat last n or s
/pattern/ ?patt? Search forw/backw r Return from subroutine
M Show module versions c [ln|sub] Continue until position
Debugger controls: L List break/watch/actions
o [...] Set debugger options t [expr] Toggle trace [trace expr]
<[<]|{[{]|>[>] [cmd] Do pre/post-prompt b [ln|event|sub] [cnd] Set breakpoint
! [N|pat] Redo a previous command B ln|* Delete a/all breakpoints
H [-num] Display last num commands a [ln] cmd Do cmd before line
= [a val] Define/list an alias A ln|* Delete a/all actions
h [db_cmd] Get help on command w expr Add a watch expression
h h Complete help page W expr|* Delete a/all watch exprs
|[|]db_cmd Send output to pager ![!] syscmd Run cmd in a subprocess
q or ^D Quit R Attempt a restart
Getting Help Within the Debugger - II
Data Examination: expr Execute perl code, also see: s,n,t expr
x|m expr Evals expr in list context, dumps the result or lists methods.
p expr Print expression (uses script's current package).
S [[!]pat] List subroutine names [not] matching pattern
V [Pk [Vars]] List Variables in Package. Vars can be ~pattern or !pattern.
X [Vars] Same as "V current_package [Vars]". i class inheritance tree.
y [n [Vars]] List lexicals in higher scope <n>. Vars same as V.
e Display thread id E Display all thread ids.
For more help, type h cmd_letter, or run man perldebug for all docs.
Stepping Through Statements With the Debugger
main::(example-6-4.pl:11): my $dna = 'CGACGTCTTCTAAGGCGA';
DB<2> p $dna
DB<3>
DB<3> n
main::(example-6-4.pl:12): my @dna;
DB<6> l
12==> my @dna;
13: my $receivingcommittment;
14: my $previousbase = '';
15
16: my$subsequence = '';
17
18: if (@ARGV) {
19: my$subsequence = $ARGV[0];
20 }else{
21: $subsequence = 'TA';
DB<6> p $dna
CGACGTCTTCTAAGGCGA
Using the Perl Debugger
DB<7> n
main::(example-6-4.pl:13): my $receivingcommittment;
DB<7> n
main::(example-6-4.pl:14): my $previousbase = '';
DB<7> n
main::(example-6-4.pl:16): my$subsequence = '';
DB<7> n
main::(example-6-4.pl:18): if (@ARGV) {
DB<7> n
main::(example-6-4.pl:21): $subsequence = 'TA';
DB<7> n
main::(example-6-4.pl:24): my $base1 = substr($subsequence, 0, 1);
Using the Perl Debugger
DB<7> n
main::(example-6-4.pl:25): my $base2 = substr($subsequence, 1, 1);
DB<7> n
main::(example-6-4.pl:28): @dna = split ( '', $dna );
DB<7> p $base1
T
DB<8> p $base2
A
DB<9>
DB<9> n
main::(example-6-4.pl:39): foreach (@dna) {
DB<9> p @dna
CGACGTCTTCTAAGGCGA
DB<10> p "@dna"
C G A C G T C T T C T A A G G C G A
DB<11>
Examining the Loop
DB<12> l 39-52
39==> foreach (@dna) {
40: if ($receivingcommittment) {
41: print;
42: next;
43 } elsif ($previousbase eq $base1) {
44: if ( /$base2/ ) {
45: print $base1, $base2;
46: $recievingcommitment = 1;
47 }
48 }
49: $previousbase = $_;
50 }
51
52: print "\n";
DB<13>
DB<13> b 40
Clearing Breakpoints and Exiting the Debugger
DB<14> c
main::(example-6-4.pl:40): if ($receivingcommittment) {
DB<14> p
C
DB<16> B
Deleting a breakpoint requires a line number, or '*' for all
DB<18> q
For additional discussions please see Ch. 20 of Wall or Ch. 6 of Tisdall
Modules and Libraries - I
We will have more to say about this later
We will collect subroutines into handy files called modules or libraries
We tell the Perl compiler to utilize a particular module with the “use” command
Modules and Libraries - II
Modules end in .pm

    BeginPerlBioinfo.pm

The last line in a module must be

    1;

So we would access this module by putting the line

    use BeginPerlBioinfo;

If the Perl compiler can’t find it you may have to tell it the path

    use lib '/home/tisdall/book';
    use BeginPerlBioinfo;
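To show what a module's contents look like, here is a single-file sketch. The package name SeqUtils and the sub revcom are hypothetical; in practice the package would live in its own file (SeqUtils.pm, ending with 1;) and be loaded with use SeqUtils;. It is inlined here only so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In a real project the contents of this block would be the file SeqUtils.pm
# (hypothetical name), with "1;" as its last line.
{
    package SeqUtils;

    # Reverse complement of a DNA string.
    sub revcom {
        my ($dna) = @_;
        my $rc = reverse $dna;            # scalar context: reverses the string
        $rc =~ tr/ACGTacgt/TGCAtgca/;     # complement each base
        return $rc;
    }
}

# Back in the main program, call the sub by its fully qualified name.
print SeqUtils::revcom("ATGC"), "\n";    # GCAT
```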
POD (Ch. 26 in Wall)
Plain Old Documentation produces self-documenting programs
Comments can be extracted and formatted by external programs called translators
Keeps program documentation consistent with external documentation
pod text begins with "=identifier" at the start of a line
  but only where the compiler is expecting a new statement
All text is ignored by the compiler until the next line starting with "=cut"
Various translators produce formatted documentation
  perldoc, pod2text, pod2html, pod2latex, etc.
  details of format depend on identifier
=begin
Put any number of lines of comments here. They will appear in the properformat when processed by pod translators.
=cut
# program text goes here
=begin comment
The identifier indicates which translator should process this text.This text will be ignored by all translators. Use this for internal documentation only.
=cut
# more program text ...
=head1 Section Heading text goes here, for example:
=head1 SYNOPSIS
usage: fasta.pl fastafile
=over
This starts a list:
=item *
First item in a list.
=item *
Second item.
=back
=cut
An Example Program

#!/usr/bin/perl

=head1 NAME

arglist.pl

=head1 AUTHOR

Jeff Solka

=head1 SYNOPSIS

usage: arglist.pl arg1 arg2 ...

=head1 DESCRIPTION

Echoes out the command line arguments.

=over

=item *

First item in a list.

=item *

Second item.

=back

=cut

### main program
print "The arguments are: @ARGV\n";
exit;
Our Program in Action
[binf:fall09/binf634/mycode] jsolka% arglist.pl cat
The arguments are: cat
pod2text Acting on Our Program
[binf:fall09/binf634/mycode] jsolka% pod2text arglist.pl
NAME
    arglist.pl

AUTHOR
    Jeff Solka

SYNOPSIS
    usage: arglist.pl arg1 arg2 ...

DESCRIPTION
    Echoes out the command line arguments.

    * First item in a list.

    * Second item.

See Ch. 26 for other formatting tricks.
perldoc Acting on Our Program
[binf:fall09/binf634/mycode] jsolka% perldoc arglist.pl > arglist.mp
[binf:fall09/binf634/mycode] jsolka% cat arglist.mp
ARGLIST(1)    User Contributed Perl Documentation    ARGLIST(1)

NAME
    arglist.pl

AUTHOR
    Jeff Solka

SYNOPSIS
    usage: arglist.pl arg1 arg2 ...

DESCRIPTION
    Echoes out the command line arguments.

    o First item in a list.

    o Second item.

perl v5.8.8    2009-09-20    ARGLIST(1)
pod2html Acting on Our Program
[binf:fall09/binf634/mycode] jsolka% pod2html arglist.pl > arglist.html
/usr/bin/pod2html: no title for arglist.pl.
[binf:fall09/binf634/mycode] jsolka% cat arglist.html
<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>arglist.pl</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:[email protected]" />
</head>

<body style="background-color: white">

<p><a name="__index__"></a></p>
<!-- INDEX BEGIN -->

<ul>
<li><a href="#name">NAME</a></li>
<li><a href="#author">AUTHOR</a></li>
<li><a href="#synopsis">SYNOPSIS</a></li>
<li><a href="#description">DESCRIPTION</a></li>
</ul>
<!-- INDEX END -->

<hr />
<p></p>
<h1><a name="name">NAME</a></h1>
<p>arglist.pl</p>
<p></p>
<hr />
<h1><a name="author">AUTHOR</a></h1>
<p>Jeff Solka</p>
<p></p>
<hr />
<h1><a name="synopsis">SYNOPSIS</a></h1>
<p>usage: arglist.pl arg1 arg2 ...</p>
<p></p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>Echoes out the command line arguments.</p>
<ul>
<li><p>First item in a list.</p></li>
<li><p>Second item.</p></li>
</ul>

</body>

</html>
Readings
Read Tisdall Chapters 7 and 8
HW3: Exercises 7.2 and 7.3
Don’t forget to turn in Program 1 next week.
Don’t forget about Quiz # 2 next week.