Bioinformatics master course, ‘11/’12 Paolo Marcatili
Data Types in Perl
Agenda
• Perl Basics
• Hello World
• Scalars
• Arrays
• Hashes
Task Today
Parsing
• Parse a file
• Sort its words alphabetically
• Sort its words by number of occurrences
Perl Basics
PERL
Practical Extraction and Report Language
• Handle text files
• Web (CGI)
• Small scripts
http://www.perltutorial.org/
Install
Windows: ActivePerl (http://www.activestate.com/activeperl/) or Cygwin (Linux emulation)
Linux / OS X: native
Hello World!
First script
Open an editor (e.g. gedit) and type:
#!/usr/bin/perl -w
use strict;
use warnings;
print "Hello World!\n";
Save as -> first.pl
How to run a script
Terminal -> move to the script folder
perl first.pl
or
chmod a+x first.pl    <- now it is executable by everyone
./first.pl            <- ./ means 'in this folder'
Variable Overview
Overview
• Data types
• Casting
• Variable scope
• (De)referencing
Data Types
5
Hello world!
6.28 1.6e-4
J
(1,1,2,3,5,8)
ACCGACGACGCAGC
Mamma:3381245671
Overview
These values fall into three Perl data types: scalars, arrays, and hashes.
Scalars
Scalars
my $scalar;
$scalar = 5;
$scalar = $scalar + 3;
$scalar = "scalar is $scalar\n";
print $scalar;
> scalar is 8
Scalars - 2
• Scalar data can be a number or a string.
• In Perl, strings and numbers can be used nearly interchangeably.
• A scalar variable holds scalar data.
• A scalar variable name starts with a dollar sign ($) followed by a Perl identifier.
• A Perl identifier can contain alphanumeric characters and underscores.
• It is not allowed to start with a digit.
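The "nearly interchangeable" rule can be sketched as follows. This is a minimal illustration, not part of the original slides; the variable names are made up for the example. It also shows where the interchangeability stops: numbers and strings are compared with different operators (== versus eq).

```perl
use strict;
use warnings;

# A numeric string behaves like a number in arithmetic...
my $count = "2000";
my $sum   = $count + 1;          # 2001

# ...and a number behaves like a string in string operations.
my $label = $sum . " reads";     # "2001 reads"

# == compares as numbers, eq compares as strings.
my $same_number = ("1.0" == 1)   ? "yes" : "no";   # yes: both are 1 numerically
my $same_string = ("1.0" eq "1") ? "yes" : "no";   # no: the characters differ

print "$sum\n";                          # prints 2001
print "$label\n";                        # prints 2001 reads
print "$same_number $same_string\n";     # prints yes no
```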
Examples
# floating-point values
my $x = 3.14;
my $y = -2.78;
# integer values ($a and $b are best avoided as names: sort uses them)
my $m = 1000;
my $n = -2000;
my $s = "2000"; # similar to $s = 2000;
# strings
my $str = "this is a string in Perl";
my $str2 = 'this is also a string';
Operations
my $x = 5 + 9;  # Add 5 and 9, and then store the result in $x
$x = 30 - 4;    # Subtract 4 from 30 and then store the result in $x
$x = 3 * 7;     # Multiply 3 and 7 and then store the result in $x
$x = 6 / 2;     # Divide 6 by 2
$x = 2 ** 8;    # Two to the power of 8
$x = 3 % 2;     # Remainder of 3 divided by 2
$x++;           # Increase $x by 1
$x--;           # Decrease $x by 1
my $y = $x;     # Assign $x to $y
$x += $y;       # Add $y to $x
$x -= $y;       # Subtract $y from $x
$x .= $y;       # Append $y onto $x
Operations - 2
my $x = 3;
my $c = "he ";
my $s = $c x $x;      # $c repeated $x times
my $bye = "bye";
print $s . "\n";      # print $s and start a new line
                      # similar to print "$s\n";
my $all = $s . $bye;  # Concatenate $s and $bye
print $all;
# Interpolation
$x = 10;
$s = "you get $x";
print $s;
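Repetition and interpolation combine naturally when building sequence strings. The sketch below is illustrative only, with made-up variable names. It shows two details the operations above do not cover: braces mark where a variable name ends inside a string, and single quotes switch interpolation off.

```perl
use strict;
use warnings;

my $unit   = "AC";
my $repeat = $unit x 3;          # "ACACAC"
my $n      = length($repeat);    # 6

# Without the braces, Perl would look for a variable called $nbp.
my $report = "${repeat} is ${n}bp long";

# Single quotes keep the characters as typed.
my $literal = 'value of $n';

print "$report\n";     # prints ACACAC is 6bp long
print "$literal\n";    # prints value of $n
```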
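Type Casting
Type casting (or data conversion, or coercion) is usually silent in Perl.
my $x = "3";
print $x + 4 . "\n";      # prints 7
Be careful!! + and . have the same precedence and are evaluated left to right.
my $y = 1;
my $z = "uno";
print $x + $y . "\n";     # prints 4
print $x + $z . "\n";     # prints 3 ("uno" counts as 0; use warnings reports it)
print $x + 4 . 1 . "\n";  # prints 71: (3 + 4) followed by "1"
print $x + 4.1 . "\n";    # prints 7.1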
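Because coercion is silent, it can help to check a value before doing arithmetic with it. The sketch below uses looks_like_number from the core module Scalar::Util. The helper safe_add is a hypothetical name introduced for this example, not something from the course material.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: add two values only if both look numeric.
sub safe_add {
    my ($left, $right) = @_;
    return unless looks_like_number($left) && looks_like_number($right);
    return $left + $right;
}

my $ok  = safe_add("3", 1);        # 4
my $bad = safe_add("3", "uno");    # undef: "uno" is not a number

# Parentheses make the intended grouping explicit.
my $line = ("3" + 4) . 1;          # "71"

print defined $ok  ? "$ok\n"  : "not numeric\n";   # prints 4
print defined $bad ? "$bad\n" : "not numeric\n";   # prints not numeric
print "$line\n";                                   # prints 71
```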
Arrays
boxed scalars
[Figure: Scalar vs. Array]
Indices are sequential integers starting from 0
array - 1
("Perl","array","tutorial");
(5,7,9,10);
(5,7,9,"Perl","list");
(1..20);
();
array - 2
my @str_array = ("Perl","array","tutorial");
my @num_array = (5,7,9,10);
my @mixed_array = (5,7,9,"Perl","list");
my @rg_array = (1..20);
my @empty_array = ();
print $str_array[1]; # prints "array": the 1st element is [0]
operations
my @int = (1,3,5,2);
push(@int,10);            # add 10 to the end of @int
print "@int\n";
my $last = pop(@int);     # remove 10 from the end of @int
print "@int\n";
unshift(@int,0);          # add 0 to the start of @int
print "@int\n";
my $start = shift(@int);  # remove 0 from the start of @int
print "@int\n";
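A short sketch of how indexing and these operations fit together, using an illustrative array of bases that is not from the slides. It shows how to get the size and the last index of an array, and how push and shift together behave like a first-in, first-out queue.

```perl
use strict;
use warnings;

my @bases = ("A", "C", "G", "T");

my $size       = scalar(@bases);   # number of elements: 4
my $last_index = $#bases;          # index of the last element: 3
my $first      = $bases[0];        # "A"
my $last       = $bases[-1];       # negative indices count from the end: "T"

push(@bases, "N");                 # A C G T N
my $served = shift(@bases);        # removes "A", leaving C G T N

print "$size $last_index $first $last $served\n";   # prints 4 3 A T A
print "@bases\n";                                   # prints C G T N
```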
on array
my @int = (1,3,5,2);
foreach my $element (@int) {
    print "element is $element\n";
}
my @sorted = sort(@int);
foreach my $element (@sorted) {
    print "element is $element\n";
}
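One caveat worth a sketch: with no comparison block, sort compares elements as strings. That gives the expected result for single-digit numbers as above, but not once a value such as 10 is added (the extra element is an assumption made for this example). A comparison block with <=> sorts numerically.

```perl
use strict;
use warnings;

my @int = (1, 3, 5, 2, 10);

my @as_strings = sort @int;                  # string order
my @as_numbers = sort { $a <=> $b } @int;    # ascending numeric order
my @descending = sort { $b <=> $a } @int;    # descending numeric order

print "@as_strings\n";    # prints 1 10 2 3 5
print "@as_numbers\n";    # prints 1 2 3 5 10
print "@descending\n";    # prints 10 5 3 2 1
```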
Hashes
Hashes
• Hashes are like arrays: they store collections of scalars
  ... but unlike arrays, indexing is by name (just like in real life!!!)
• Two components to each hash entry:
  – Key, example: name
  – Value, example: phone number
• Hashes are denoted with %
  – Example: %phoneDirectory
• Elements are accessed using {} (like [] in arrays)
Hashes continued ...
• Adding a new key-value pair
  $phoneDirectory{"Shirly"} = 7267975;
  – Note the $ to specify "scalar" context!
• Each key can have only one value
  $phoneDirectory{"Shirly"} = 7265797; # overwrites previous assignment
• Multiple keys can have the same value
• Accessing the value of a key
  my $phoneNumber = $phoneDirectory{"Shirly"};
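Besides adding and reading entries, a script often needs to test whether a key is present, remove an entry, or count the entries. A minimal sketch with the built-in exists, delete, and keys functions follows. The "Mamma" entry reuses the example value from the Data Types slide, and "Nobody" is a made-up key.

```perl
use strict;
use warnings;

my %phoneDirectory = (
    "Shirly" => 7267975,
    "Mamma"  => 3381245671,
);

# exists tests for a key without creating it.
my $known   = exists $phoneDirectory{"Shirly"} ? 1 : 0;   # 1
my $unknown = exists $phoneDirectory{"Nobody"} ? 1 : 0;   # 0

# delete removes the key and its value.
delete $phoneDirectory{"Mamma"};

# keys in scalar context gives the number of entries.
my $entries = scalar(keys %phoneDirectory);               # 1

print "$known $unknown $entries\n";   # prints 1 0 1
```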
Hashes and Foreach
• Foreach works in hashes as well!
foreach my $person (keys(%phoneDirectory)) {
    print "$person: $phoneDirectory{$person}\n";
}
• Never depend on the order in which you put key/value pairs into the hash! Perl has its own magic to make hashes amazingly fast!!
Hashes and Sorting
• The sort function works with hashes as well
• Sorting on the keys
foreach my $person (sort keys %phoneDirectory) {
    print "$person : $phoneDirectory{$person}\n";
}
– This will print the phoneDirectory hash table in alphabetical order based on the name of the person, i.e. the key.
Hash and Sorting cont...
• Sorting by value
foreach my $person (sort { $phoneDirectory{$a} <=> $phoneDirectory{$b} } keys %phoneDirectory) {
    print "$person : $phoneDirectory{$person}\n";
}
– Prints each person and their phone number in the order of their phone numbers, i.e. the value.
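Since several keys can share the same value, a sort by value alone leaves the relative order of tied entries unspecified. A common idiom, sketched here, is to chain a second comparison with "or" so that ties fall back to alphabetical order of the key. The "Anna" entry is hypothetical and was added only to create a tie.

```perl
use strict;
use warnings;

my %phoneDirectory = (
    Shirly => 7265797,
    Mamma  => 3381245671,
    Anna   => 7265797,      # hypothetical entry sharing Shirly's number
);

my @by_number = sort {
    $phoneDirectory{$a} <=> $phoneDirectory{$b}   # smaller number first
        or $a cmp $b                              # same number: alphabetical
} keys %phoneDirectory;

print "@by_number\n";   # prints Anna Shirly Mamma
```

Swapping $a and $b in the first comparison would give descending order instead.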
Exercise
• Choose your own text or use wget
• Identify the 10 most frequent words
• Sort the words alphabetically
• Sort the words by the number of occurrences
Counting Words
my %seen;
my $l = "Lorem ipsum";
my @w = split(" ", $l); # this is a new function...
foreach my $word (@w) {
    $seen{$word}++;
}
print "Sorted by occurrences\n";
foreach my $word (sort { $seen{$a} <=> $seen{$b} } keys %seen) {
    print "Word $word N: $seen{$word}\n";
}
print "Sorted alphabetically\n";
foreach my $word (sort(keys %seen)) {
    print "Word $word N: $seen{$word}\n";
}
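To pick out only the most frequent words, one possible approach is to sort in descending order of count and take a slice of the result. The sketch below is one way to do it, not the course's reference solution. The input string and the value of $top_n are stand-ins chosen so the result is easy to check by hand; for the exercise, $top_n would be 10 and the text would come from a file.

```perl
use strict;
use warnings;

my $text = "the cat and the dog and the bird";   # stand-in for a real text

my %seen;
foreach my $word (split(" ", lc $text)) {
    $seen{$word}++;
}

# Most frequent first; ties are broken alphabetically.
my @ranked = sort { $seen{$b} <=> $seen{$a} or $a cmp $b } keys %seen;

my $top_n = 2;
$top_n = @ranked if $top_n > @ranked;    # never ask for more words than exist
my @top = @ranked[0 .. $top_n - 1];      # array slice: the first $top_n words

foreach my $word (@top) {
    print "$word: $seen{$word}\n";
}
# prints:
# the: 3
# and: 2
```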
Homework
Download the "Divina commedia" (wget http://www.gutenberg.org/cache/epub/1000/pg1000.txt)
For each word length, count the number of occurrences (e.g. 123456 words of length 2, etc.)
Length of a string: length($a)
Exam format:
Difficulty: February < June < September
To take the exam you MUST have sent me all the homework assignments and one practical exercise.