Bioinformatics master course, ‘11/’12 Paolo Marcatili Data Types in Perl

Master datatypes 2011

Page 1: Master datatypes 2011

Data Types in Perl

Page 2: Master datatypes 2011

Agenda

•  Perl Basics
•  Hello World
•  Scalars
•  Arrays
•  Hashes

Page 3: Master datatypes 2011

Task Today

Page 4: Master datatypes 2011

Parsing

Parse a file
Sort its words alphabetically
Sort its words by number of occurrences

Page 5: Master datatypes 2011

Perl Basics

Page 6: Master datatypes 2011

PERL

Practical Extraction and Report Language
✓  Handle text files
✓  Web (CGI)
✓  Small scripts

http://www.perltutorial.org/

Page 7: Master datatypes 2011

Install

Windows: ActivePerl (http://www.activestate.com/activeperl/) or Cygwin (Linux emulation)

Linux / OS X: Perl is available natively

Page 8: Master datatypes 2011

Hello World!

Page 9: Master datatypes 2011

First script

Open an editor (e.g. gedit)

    #!/usr/bin/perl -w
    use strict;
    use warnings;
    print "Hello World!\n";

Save as -> first.pl

Page 10: Master datatypes 2011

How to run a script

Terminal -> move to the script folder

    perl first.pl

or

    chmod a+x first.pl    <- now it is executable by everyone
    ./first.pl            <- ./ means ‘in this folder’

Page 11: Master datatypes 2011

Variable Overview

Page 12: Master datatypes 2011

Overview

Data types
Casting
Variable Scope
(De)referencing

Page 13: Master datatypes 2011

Data Types

5
Hello world!
6.28
1.6e-4
J
(1,1,2,3,5,8)
ACCGACGACGCAGC
Mamma:3381245671

Page 14: Master datatypes 2011

Overview

Scalars: 5, 6.28, 1.6e-4, J, Hello world!, ACCGACGACGCAGC
Arrays: (1,1,2,3,5,8)
Hashes: Mamma:3381245671

Page 15: Master datatypes 2011

Scalars

Page 16: Master datatypes 2011

Scalars

    my $scalar;
    $scalar = 5;
    $scalar = $scalar + 3;
    $scalar = "scalar has value $scalar\n";
    print $scalar;

> scalar has value 8

Page 17: Master datatypes 2011

Scalars - 2

✓  Scalar data can be a number or a string.
✓  In Perl, strings and numbers can be used nearly interchangeably.
✓  A scalar variable is used to hold scalar data.
✓  A scalar variable starts with a dollar sign ($) followed by a Perl identifier.
✓  A Perl identifier can contain alphanumeric characters and underscores.
✓  It is not allowed to start with a digit.

Page 18: Master datatypes 2011

Examples

    # floating-point values
    my $x = 3.14;
    my $y = -2.78;

    # integer values
    my $a = 1000;
    my $b = -2000;
    my $s = "2000";   # similar to $s = 2000;

    # strings
    my $str  = "this is a string in Perl";
    my $str2 = 'this is also a string';

Page 19: Master datatypes 2011

Operations

    my $x = 5 + 9;   # Add 5 and 9, and then store the result in $x
    $x = 30 - 4;     # Subtract 4 from 30 and then store the result in $x
    $x = 3 * 7;      # Multiply 3 and 7 and then store the result in $x
    $x = 6 / 2;      # Divide 6 by 2
    $x = 2 ** 8;     # Two to the power of 8
    $x = 3 % 2;      # Remainder of 3 divided by 2
    $x++;            # Increase $x by 1
    $x--;            # Decrease $x by 1

    my $y = $x;      # Assign $x to $y
    $x += $y;        # Add $y to $x
    $x -= $y;        # Subtract $y from $x
    $x .= $y;        # Append $y onto $x

Page 20: Master datatypes 2011

Operations - 2

    my $x = 3;
    my $c = "he ";
    my $s = $c x $x;    # $c repeated $x times
    my $b = "bye";
    print $s . "\n";    # print $s and start a new line
                        # similar to print "$s\n";
    my $a = $s . $b;    # Concatenate $s and $b
    print $a;

    # Interpolation
    $x = 10;            # $x and $s already exist, so no second "my"
    $s = "you get $x";
    print $s;

Page 21: Master datatypes 2011

Type Casting

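Type casting (or data conversion, or coercion) is usually silent in Perl

    my $x = "3";
    print $x + 4 . "\n";       # prints 7

Be careful!!

    my $y = 1;
    my $z = "uno";
    print $x + $y . "\n";      # prints 4
    print $x + $z . "\n";      # prints 3: "uno" counts as 0 (reported only when warnings are on)
    print $x + 4 . 1 . "\n";   # prints 71: + and . are evaluated left to right, so ($x + 4) . 1
    print $x + 4.1 . "\n";     # prints 7.1: here 4.1 is a single number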
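
Because coercion is silent, a script may want to check its input before doing arithmetic. The following is a minimal sketch, assuming the core module Scalar::Util is available; the helper name `checked_sum` is hypothetical and not part of the course material.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper (not from the slides): adds two values only when
# both look like numbers, and returns undef otherwise.
sub checked_sum {
    my ($left, $right) = @_;
    return undef unless looks_like_number($left) && looks_like_number($right);
    return $left + $right;
}

my $ok  = checked_sum("3", 4.1);     # both values are numeric
my $bad = checked_sum("3", "uno");   # "uno" is not numeric, so undef

print "3 + 4.1 = $ok\n";
print "3 + uno = ", (defined $bad ? $bad : "not a number"), "\n";
```

With this check the string "uno" is rejected explicitly instead of being counted as 0.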

Page 22: Master datatypes 2011

Arrays

Page 23: Master datatypes 2011

boxed scalars

Scalar Array

Indices are sequential integers starting from 0

Page 24: Master datatypes 2011

array - 1

    ("Perl","array","tutorial");
    (5,7,9,10);
    (5,7,9,"Perl","list");
    (1..20);
    ();

Page 25: Master datatypes 2011

array - 2

    my @str_array   = ("Perl","array","tutorial");
    my @num_array   = (5,7,9,10);
    my @mixed_array = (5,7,9,"Perl","list");
    my @rg_array    = (1..20);
    my @empty_array = ();
    print $str_array[1];   # 1st element is [0]
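
A short sketch of the most common ways to look inside an array, reusing `@str_array` from the slide. The other variable names are chosen here for illustration only.

```perl
use strict;
use warnings;

my @str_array = ("Perl", "array", "tutorial");

my $first = $str_array[0];        # indices start at 0
my $last  = $str_array[-1];       # negative indices count from the end
my $count = scalar(@str_array);   # number of elements
my $top   = $#str_array;          # index of the last element

print "first=$first last=$last count=$count top=$top\n";
```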

Page 26: Master datatypes 2011

operations

    my @int = (1,3,5,2);
    push(@int,10);             # add 10 to the end of @int
    print "@int\n";
    my $last = pop(@int);      # remove 10 from the end of @int
    print "@int\n";
    unshift(@int,0);           # add 0 to the start of @int
    print "@int\n";
    my $start = shift(@int);   # remove 0 from the start of @int
    print "@int\n";

Page 27: Master datatypes 2011

on array

    my @int = (1,3,5,2);
    foreach my $element (@int){
        print "element is $element\n";
    }
    my @sorted = sort(@int);
    foreach my $element (@sorted){
        print "element is $element\n";
    }
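
One point worth keeping in mind: without a comparison block, `sort` compares its elements as strings. That gives the expected result for single-digit numbers such as (1,3,5,2), but not in general. The sketch below uses a made-up list to show the difference.

```perl
use strict;
use warnings;

my @int = (10, 9, 2, 33, 1);   # made-up values for illustration

my @as_strings = sort @int;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @int;   # numeric, ascending
my @descending = sort { $b <=> $a } @int;   # numeric, descending

print "@as_strings\n";   # 1 10 2 33 9
print "@as_numbers\n";   # 1 2 9 10 33
print "@descending\n";   # 33 10 9 2 1
```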

Page 28: Master datatypes 2011

Hashes

Page 29: Master datatypes 2011

Hashes

•  Hashes are like arrays: they store collections of scalars
   ... but unlike arrays, indexing is by name (just like in real life!!!)

•  Two components to each hash entry:
   –  Key (example: name)
   –  Value (example: phone number)

•  Hashes are denoted with %
   –  Example: %phoneDirectory

•  Elements are accessed using {} (like [] in arrays)

Page 30: Master datatypes 2011

Hashes continued ...

•  Adding a new key-value pair
       $phoneDirectory{"Shirly"} = 7267975;
   –  Note the $: a single hash element is a scalar!

•  Each key can have only one value
       $phoneDirectory{"Shirly"} = 7265797;   # overwrites previous assignment

•  Multiple keys can have the same value

•  Accessing the value of a key
       $phoneNumber = $phoneDirectory{"Shirly"};
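
The points above can be put together in one small script. This is only a sketch: the "Mamma" entry is borrowed from the data types slide, and the key "Nobody" and the variables `$entries` and `$known` are made up for illustration.

```perl
use strict;
use warnings;

my %phoneDirectory = (
    "Shirly" => 7267975,
    "Mamma"  => 3381245671,
);

$phoneDirectory{"Shirly"} = 7265797;              # overwrites the old value
my $phoneNumber = $phoneDirectory{"Shirly"};      # look up a value by its key
my $entries     = scalar(keys %phoneDirectory);   # number of keys
my $known       = exists $phoneDirectory{"Nobody"} ? "yes" : "no";

print "Shirly: $phoneNumber\n";
print "entries: $entries\n";
print "Nobody listed? $known\n";
```

`exists` tests whether a key is present without creating it, which is safer than reading the value and checking whether it is empty.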

Page 31: Master datatypes 2011

Hashes and Foreach

•  foreach works on hashes as well!

    foreach my $person (keys %phoneDirectory) {
        print "$person: $phoneDirectory{$person}\n";
    }

•  Never depend on the order in which you put key/value pairs into the hash! Perl stores them in its own order, using its own magic to make hashes amazingly fast!!

Page 32: Master datatypes 2011

Hashes and Sorting

•  The sort function works with hashes as well
•  Sorting on the keys

    foreach my $person (sort keys %phoneDirectory) {
        print "$person : $phoneDirectory{$person}\n";
    }

   –  This will print the phoneDirectory hash in alphabetical order based on the name of the person, i.e. the key.

Page 33: Master datatypes 2011

Hash and Sorting cont...

•  Sorting by value

    foreach my $person (sort {$phoneDirectory{$a} <=> $phoneDirectory{$b}} keys %phoneDirectory) {
        print "$person : $phoneDirectory{$person}\n";
    }

   –  Prints each person and their phone number in the order of their respective phone numbers, i.e. the value.
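
When several keys share the same value, a sort by value alone leaves their relative order undefined. The sketch below reverses the comparison to get the largest value first and falls back to the key on ties. The directory entries are made up for illustration.

```perl
use strict;
use warnings;

my %phoneDirectory = (
    "Shirly" => 7265797,
    "Anna"   => 7265797,   # made-up entries for illustration
    "Marco"  => 1234567,
);

# Largest number first; equal numbers fall back to alphabetical names.
my @by_value = sort {
    $phoneDirectory{$b} <=> $phoneDirectory{$a}
        or $a cmp $b
} keys %phoneDirectory;

foreach my $person (@by_value) {
    print "$person : $phoneDirectory{$person}\n";
}
```

Swapping `$a` and `$b` in a comparison is what reverses the order. `<=>` compares numbers, while `cmp` compares strings.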

Page 34: Master datatypes 2011

Exercise

•  Choose your own text or use wget
•  Identify the 10 most frequent words
•  Sort the words alphabetically
•  Sort the words by the number of occurrences

Page 35: Master datatypes 2011

Counting Words

    my %seen;
    my $l = "Lorem ipsum";
    my @w = split(" ", $l);   # this is a new function...
    foreach my $word (@w){
        $seen{$word}++;
    }
    print "Sorted by occurrences\n";
    foreach my $word (sort {$seen{$a} <=> $seen{$b}} keys %seen){
        print "Word $word N: $seen{$word}\n";
    }
    print "Sorted alphabetically\n";
    foreach my $word (sort(keys %seen)){
        print "Word $word N: $seen{$word}\n";
    }
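
For the "most frequent words" part of the exercise, one possible approach is to sort in descending order and keep only the first N words. The sketch below makes several assumptions that are not in the slides: the helper name `count_words` is hypothetical, words are lower-cased, and a word is taken to be a run of ASCII letters or apostrophes (accented letters would need a different pattern).

```perl
use strict;
use warnings;

# Hypothetical helper: returns a word => count hash for a piece of text.
sub count_words {
    my ($text) = @_;
    my %seen;
    # lc() folds case; the pattern picks runs of ASCII letters and apostrophes
    $seen{$_}++ for lc($text) =~ /[a-z']+/g;
    return %seen;
}

my $text = "The cat and the dog and the bird";   # stand-in for a real file
my %seen = count_words($text);

# Most frequent first; ties are broken alphabetically.
my @ranked = sort { $seen{$b} <=> $seen{$a} or $a cmp $b } keys %seen;

my $n = 2;                      # use 10 for the exercise
$n = @ranked if $n > @ranked;   # fewer distinct words than requested
my @top = @ranked[0 .. $n - 1];

print "$_: $seen{$_}\n" for @top;
```

For this sample text the script prints "the: 3" followed by "and: 2".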

Page 36: Master datatypes 2011

Homework

Download the “Divina commedia”
(wget http://www.gutenberg.org/cache/epub/1000/pg1000.txt )
For each word length, count the number of occurrences
(e.g. 123456 words of length 2, etc.)
Length of a string: length($a)

Page 37: Master datatypes 2011

Exam format:
Difficulty: February < June < September
To take the exam you MUST have sent me all the homework assignments and one practical exercise