Object-Oriented Programming (OOP)
and introduction to BioPerl
Laurent Falquet (original course by Marco Pagni), Basel October, 2006
Swiss Institute of Bioinformatics, Swiss EMBnet node
LF Basel October 2006
Overview in 3 parts
Motivation for OOP
The key concepts of OOP
Using BioPerl objects
Part I Motivation for OOP
Background
OOP didn't come out of the blue. It has strong historical roots in other paradigms and practices.
It came about to address problems commonly grouped together as the "software crisis".
The "software crisis" manifests itself in
1. cost overruns
2. user dissatisfaction with the final product
3. buggy software
4. brittle software
Complexity
Software is inherently complex because
- we attempt to solve problems in complex domains
- we are forced by the size of the problem to work in teams
- software is an incredibly malleable building material
- discrete systems are prone to unpredictable behavior
- software systems consist of many pieces, many of which communicate.
Some factors that impact on and reflect complexity in software
- The number of names (variables, functions, etc.) that are visible
- Constraints on the time-sequence of operations (real-time constraints)
- Memory management (garbage collection and address spaces)
- Concurrency
- Event-driven user interfaces.
How do humans cope with complexity in everyday life?
Humans deal with complexity by abstracting details away.
E.g., surfing the Internet doesn't require knowledge of the processor's internal registers; it is sufficient to think of a computer as a simple visualization tool.
To be useful, an abstraction (model) must be smaller than what it represents.
E.g., road map vs photographs of terrain vs physical model.
Exercise 1
Memorize as many numbers from the following sequence as you can. I'll show them for 30 seconds. Now write them down.
1759376099873461324287593345108941120765934
How many did you remember?
How many could you remember with unlimited amounts of time?
Exercise 2
Write down as many of the following telephone numbers as you can:
Pizza:
Friend 1:
Post Office:
Co-worker:
Cellular:
Friend 2:
Fax:
Parents:
Boss:
Home:
Answer to Exercises 1 and 2
By abstracting the details of the numbers away and grouping them into a new concept (telephone number) we have increased our information handling capacity by nearly an order of magnitude!
Working with abstractions lets us handle more information.
Exercise 3
How many of these (unrelated) concepts can you memorize in 30 seconds?
Answer to Exercise 3
Miller (Psychological Review, vol 63(2)):
"The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information"
Working with abstractions lets us handle more information (e.g. phone numbers), but we're still limited by Miller's observation. What if you have more than 7 things to juggle in your head simultaneously?
Hierarchy
A common strategy: form a hierarchy to classify and order our abstractions.
Common examples are
- military, large companies, administration
- Linnaeus' classification system of organisms
- EC numbers for enzymatic reactions
- UNIX file system.
Decomposition
Divide and conquer is a handy skill for many thorny life problems.
We want to compose a system from small pieces, rather than build a large monolithic system, because the former can be made more reliable.
Failure in one part, if the system is properly designed, won't cause failure of the whole. This depends on the issue of coupling.
A system composed of many parts has many places where it can fail, but we can beat this grim view by properly decomposing and decoupling. Another reason to decompose is that we can divide up the work more easily.
Object technology
There is nothing unique about forming abstractions, but in OOP this is the main focus of activity and organization.
We can take advantage of the natural human tendency to anthropomorphise. We'll call our abstractions objects.
We'll put our abstractions into a hierarchy to keep them organized and minimize redundancy.
This is a natural way to "divide and conquer" the large state spaces we face (complexity).
Part II The Key Concepts of OOP
Class and Instance
A class is a part of a program that describes the properties of an object. These properties fall into two broad categories:
- attributes - the data associated with an object,
- methods - the functions, procedures or subroutines that come along with an object.
An instance is a member of a class whose attributes have received particular values.
Class and Instance example
A "square" class could have size and color attributes and methods to alter them:
Name: square
Attributes: size, color
Methods: set_size, set_color, perimeter
Two instances of the square class may consist of a large blue square and a tiny red one.
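The square class above could be sketched in Perl roughly as follows. This is only a minimal illustration: the class name Square, the key => value constructor arguments, and the numeric sizes are assumptions made for the example.

```perl
use strict;
use warnings;

{
    package Square;

    # Constructor: takes the attribute values as key => value pairs.
    sub new {
        my ($class, %args) = @_;
        my $self = {
            size  => $args{size},
            color => $args{color},
        };
        return bless $self, $class;
    }

    # Methods that alter the attributes.
    sub set_size  { my ($self, $size)  = @_; $self->{size}  = $size;  return $self; }
    sub set_color { my ($self, $color) = @_; $self->{color} = $color; return $self; }

    # A method that computes a value from an attribute.
    sub perimeter { my ($self) = @_; return 4 * $self->{size}; }
}

# Two instances of the same class with different attribute values.
my $big   = Square->new(size => 10, color => 'blue');
my $small = Square->new(size => 1,  color => 'red');

print $big->perimeter(),   "\n";   # 40
print $small->perimeter(), "\n";   # 4

$small->set_size(2);               # changes only this instance
print $small->perimeter(), "\n";   # 8
```

Both objects share the same methods, but each carries its own attribute values.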
Encapsulation
The attributes of an object usually receive some degree of privacy.
- Private attributes are not accessible from outside the object.
- Public attributes can be directly accessed by any other object.
The methods of an object usually receive some degree of privacy as well.
Encapsulation: the values of an object's attributes should only be altered by its own methods. Private attributes should always be favored.
Encapsulation
A "person" class could have two attributes, name and credit card number, for example.
Nobody wants their credit card number to be public!
Encapsulation hides the implementation from the user.
One should be able to drive a car without knowing how the engine works…
Benefit of encapsulation
Encapsulation is a technique for minimizing interdependencies among objects by defining a strict external interface. This way, internal coding can be changed without affecting the interface, as long as the new implementation supports the same (or upwards compatible) external interface.
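As a rough sketch of the "person" example, the class below keeps the credit card number out of the object itself. Perl does not enforce privacy on hash-based objects, so this uses one possible technique (a lexical hash keyed by the object's address, sometimes called an inside-out object). The names Person, masked_card and the card number are invented for the illustration.

```perl
use strict;
use warnings;

{
    package Person;
    use Scalar::Util qw(refaddr);

    # Private storage: this lexical hash is visible only inside this block,
    # so code outside the class cannot read the card numbers directly.
    my %card_number_of;

    sub new {
        my ($class, %args) = @_;
        my $self = bless { name => $args{name} }, $class;
        $card_number_of{ refaddr($self) } = $args{card_number};
        return $self;
    }

    # Public interface: callers only ever use these methods.
    sub name { my ($self) = @_; return $self->{name}; }

    sub masked_card {
        my ($self) = @_;
        my $number = $card_number_of{ refaddr($self) };
        return ('*' x (length($number) - 4)) . substr($number, -4);
    }

    # Remove the private entry when the object goes away.
    sub DESTROY {
        my ($self) = @_;
        delete $card_number_of{ refaddr($self) };
    }
}

my $person = Person->new(name => 'Alice', card_number => '1234567890123456');
print $person->name(), "\n";          # Alice
print $person->masked_card(), "\n";   # ************3456

# The card number is not stored in the object's own hash.
my $exposed = exists $person->{card_number} ? 'exposed' : 'hidden';
print "$exposed\n";                   # hidden
```

Because callers depend only on name() and masked_card(), the internal storage could later be changed without touching any code that uses the class.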
Methods
One could broadly distinguish four kinds of methods:
- A method that creates a new object is called a constructor.
- A method that destroys an object is called a destructor.
- It is common to define many simple access methods just to set or get attribute values. These methods ensure data encapsulation.
- Other methods permit the object to perform some useful actions.
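The four kinds of methods can be seen side by side in a small Perl sketch. The Primer class, its sequence attribute and the instance counter are hypothetical and exist only to make each kind of method visible. In Perl the destructor is a method named DESTROY, which the interpreter calls when the last reference to an object disappears.

```perl
use strict;
use warnings;

{
    package Primer;

    my $live_instances = 0;    # class-level bookkeeping for the demo

    # 1. Constructor
    sub new {
        my ($class, %args) = @_;
        my $self = bless { sequence => $args{sequence} }, $class;
        $live_instances++;
        return $self;
    }

    # 2. Destructor: called automatically by Perl.
    sub DESTROY { $live_instances--; }

    # 3. Access method: get the attribute, or set it when a value is given.
    sub sequence {
        my $self = shift;
        $self->{sequence} = shift if @_;
        return $self->{sequence};
    }

    # 4. A method that does something useful: fraction of G and C bases.
    sub gc_content {
        my ($self) = @_;
        my $gc = ($self->{sequence} =~ tr/GCgc//);   # counts, does not modify
        return $gc / length($self->{sequence});
    }

    # Class method reporting how many objects currently exist.
    sub live_instances { return $live_instances; }
}

my $primer = Primer->new(sequence => 'acgt');
print $primer->gc_content(), "\n";       # 0.5
$primer->sequence('ggcc');               # set through the access method
print $primer->gc_content(), "\n";       # 1
print Primer->live_instances(), "\n";    # 1
undef $primer;                           # last reference gone, DESTROY runs
print Primer->live_instances(), "\n";    # 0
```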
Hierarchy through inheritance
Classes can have children: that is, one class can be created out of another class.
A sub-class inherits all the attributes and methods of the super-class, and may have additional attributes and behaviors.
Inheritance aids in the reuse of code.
Name: parent
Attributes: familyname
Name: child
Attributes: familyname
Hierarchy example
A very simple class diagram:
Vehicle (Attr: speed; Meth: start, stop)
    Air-Vehicle (Attr: nr-wing; Meth: take-off, land)
        Plane (Attr: fuel; Meth: none)
    Land-Vehicle (Attr: nr-wheel; Meth: none)
        Bike (Attr: none; Meth: none)
        Car (Attr: fuel; Meth: none)
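One branch of this diagram could be written in Perl along the following lines. This is a sketch only: the class names are adapted to valid Perl package names (LandVehicle instead of Land-Vehicle), the speed value set by start is arbitrary, and inheritance is declared through the @ISA array.

```perl
use strict;
use warnings;

{
    package Vehicle;

    sub new {
        my ($class, %args) = @_;
        return bless { speed => 0, %args }, $class;
    }
    sub start { my ($self) = @_; $self->{speed} = 10; return $self; }   # arbitrary speed
    sub stop  { my ($self) = @_; $self->{speed} = 0;  return $self; }
    sub speed { my ($self) = @_; return $self->{speed}; }
}

{
    package LandVehicle;
    our @ISA = ('Vehicle');          # LandVehicle inherits from Vehicle
    sub nr_wheel { my ($self) = @_; return $self->{nr_wheel}; }
}

{
    package Car;
    our @ISA = ('LandVehicle');      # Car inherits from LandVehicle
    sub fuel { my ($self) = @_; return $self->{fuel}; }
}

my $car = Car->new(nr_wheel => 4, fuel => 'petrol');

$car->start();                       # method inherited from Vehicle
print $car->speed(), "\n";           # 10
print $car->nr_wheel(), "\n";        # 4, method inherited from LandVehicle
print $car->fuel(), "\n";            # petrol, method defined in Car
print $car->isa('Vehicle') ? "a Car is a Vehicle\n" : "not a Vehicle\n";
```

Car defines only what is specific to cars. The constructor, start, stop and nr_wheel are all reused from the classes above it.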
A real example of hierarchy
The BioPerl class diagram:
Polymorphism
Polymorphism means the ability to request that the same operations be performed by a wide range of different objects.
Polymorphism is a consequence of inheritance. From the previous example, any vehicle can start and stop.
Think of a computer desktop, where one likes to open, resize and close any window, whatever its content.
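A minimal Perl sketch of this idea: the same request, start, is sent to objects of different classes, and each object answers with its own class's behavior. The classes and messages are invented for the illustration.

```perl
use strict;
use warnings;

{
    package Vehicle;
    sub new   { my ($class) = @_; return bless {}, $class; }
    sub start { my ($self) = @_; return ref($self) . " starts moving"; }
}

{
    package Bike;
    our @ISA = ('Vehicle');          # inherits start unchanged
}

{
    package Plane;
    our @ISA = ('Vehicle');
    # Plane provides its own version of start, building on the inherited one.
    sub start {
        my ($self) = @_;
        return $self->SUPER::start() . " and takes off";
    }
}

# The calling code does not need to know which class each object belongs to.
my @messages = map { $_->start() } (Bike->new(), Plane->new());
print "$_\n" for @messages;
# Bike starts moving
# Plane starts moving and takes off
```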
Objects In Perl Are Deceptively Simple
- An object instance is simply a reference that happens to know which class it belongs to (a reference is a scalar, just like a number or a string).
- A class is simply a package that happens to provide methods to deal with object references.
- A method is simply a subroutine that expects an object reference (or a package name, for class methods) as the first argument.
- A class inherits through the @ISA array.
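These statements can be checked directly with a few lines of Perl. The Greeter class below is hypothetical. ref is a Perl built-in and reftype comes from the core module Scalar::Util.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

{
    package Greeter;
    sub new   { my ($class, %args) = @_; return bless { %args }, $class; }
    sub hello { my ($self) = @_; return "hello from $self->{name}"; }
}

my $plain  = { name => 'a plain hash' };          # ordinary reference
my $object = Greeter->new(name => 'an object');   # reference that knows its class

print ref($plain), "\n";          # HASH
print ref($object), "\n";         # Greeter
print reftype($object), "\n";     # HASH (the underlying data is still a hash)

# A method call is a subroutine call with the object as first argument:
print $object->hello(), "\n";            # hello from an object
print Greeter::hello($object), "\n";     # hello from an object
```

The second form bypasses inheritance lookup, so the arrow notation is the one to use in practice.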
Class vs module
package MyMod;
sub f { … }
sub g { … }
…
1;
package MyObj;
sub new {                        # constructor
    my $class = shift;           # class name, passed as the first argument
    my $self  = { @_ };          # set attributes
    bless($self, $class);        # create object
    return($self);               # return instance ref
}
sub other_methods { … }
…
1;
Class vs module
use MyMod;
…
MyMod::f($param);
my $a = MyMod::g();
use MyObj;
…
# call constructor
my $instance = MyObj->new($attr);
# call method
$instance->other_methods();
Part III Using BioPerl Objects
What is BioPerl?
- It is a collection of Perl modules for processing data for the life sciences
- A project made up of biologists, bioinformaticians, computer scientists
- An open source toolkit of building blocks for life sciences applications
http://www.bioperl.org
First work in 1996
BioPerl 1.0 was released in May 2002
Current version: 1.5.1 (October 2005)
Part of the open-bio.org foundation (BioJava, BioPython, BioPerl, EMBOSS, BioMoby)
What to expect from BioPerl?
If you're looking for a script built to fit your exact need it's likely you won't find it.
What you will find is a diverse set of Perl modules that will enable you to write your own script, and a community of people who are willing to help you.
The toolkit is divided into several packages; most people will only want to deal with the Core package.
- Core package provides the main parsers; this is the basic package and it's required by all the other packages
- Run package provides wrappers for executing some 60 common bioinformatics applications
- Ext package is for C-language extensions including some alignment algorithms and an interface to the Staden IO library
- GUI package includes some basic widgets in Perl-Tk
- BioPerl db is a subproject to store sequence and annotation data in a BioSQL relational database
- Pedigree package is for manipulating genotype, marker, and individual data for linkage studies
- Microarray package has preliminary objects for manipulating some microarray data formats
- Network package parses and analyzes protein-protein interaction data
- Pipeline package is a project for creating analysis pipelines out of bioperl-run modules
Code Sample
The following piece of code
#!/usr/local/bin/perl
use Bio::Seq;
$seq_obj = Bio::Seq->new('-seq' => 'acgt');
print $seq_obj->seq(), "\n";
prints out
acgt
Let's look at it one line at a time.
Line 1: Invoking Perl
This line tells your operating system where to find the Perl interpreter
#!/usr/local/bin/perl
nothing object-oriented here!
Line 2: Import class
This line tells Perl to use a module on your machine called Seq.pm, found in the directory Bio
use Bio::Seq;
The code of the object class Bio::Seq is located in this module (as well as the associated documentation).
The :: notation reflects how the modules are organized in the file system.
What is Bio::Seq ?
The Bio::Seq object, or "Sequence object", or "Seq object", is ubiquitous in BioPerl. It contains a single sequence and associated names, identifiers, and properties.
This generic "Sequence object" could be either protein or DNA, and it is not linked to a particular format, like the SwissProt, the EMBL or the GenBank ones.
Line 3: Create instance
This line creates a sequence object (in memory)
$seq_obj = Bio::Seq->new('-seq' => 'acgt');
- The Perl variable $seq_obj refers to an instance of the Bio::Seq class.
- new is a subroutine found in the module Bio/Seq.pm. The call Bio::Seq->new acts as the constructor of the object.
- '-seq' => 'acgt' assigns the value 'acgt' to the sequence attribute.
More details
In BioPerl, most constructors take arguments in the form of key => value pairs. This gives the programmer maximal flexibility. Many other keys (e.g., '-id' or '-desc') are available for the Bio::Seq->new constructor.
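To see why key => value arguments are flexible, here is a sketch of a constructor written in that style. ToySeq is a hypothetical class written for this illustration. It is not part of BioPerl and does not show how Bio::Seq is implemented. The default values are also assumptions.

```perl
use strict;
use warnings;
use 5.010;    # for the // (defined-or) operator

{
    package ToySeq;    # hypothetical class, not a BioPerl module

    sub new {
        my ($class, %args) = @_;
        my $self = {
            seq  => $args{'-seq'}  // '',          # defaults are assumptions
            id   => $args{'-id'}   // 'unknown',
            desc => $args{'-desc'} // '',
        };
        return bless $self, $class;
    }

    sub seq  { my ($self) = @_; return $self->{seq}; }
    sub id   { my ($self) = @_; return $self->{id}; }
    sub desc { my ($self) = @_; return $self->{desc}; }
}

# Only the keys that matter need to be given...
my $minimal = ToySeq->new('-seq' => 'acgt');

# ...and they can be given in any order.
my $full = ToySeq->new('-desc' => 'example 1',
                       '-seq'  => 'acgt',
                       '-id'   => '#12345');

print $minimal->id(), "\n";    # unknown
print $full->id(), "\n";       # #12345
print $full->desc(), "\n";     # example 1
```

With positional arguments, the caller would have to remember the order and supply placeholders for every value left out.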
Read the documentation! http://doc.bioperl.org/
Line 4: Method call
This line prints out what is returned by the method seq() of the object $seq_obj (here, acgt)
print $seq_obj->seq(), "\n";
The -> notation means that one specifically intends to call the subroutine seq that is attached to $seq_obj. Indeed, a different object might have a method named seq, with a possibly different implementation if it belongs to a different class (polymorphism).
More methods
The BioPerl documentation tells us that the Bio::Seq object has many other methods. Some, like seq(), return a scalar.
Read the documentation!
The following piece of code
#!/usr/local/bin/perl
use Bio::Seq;
$seq_obj = Bio::Seq->new('-seq' => 'acgt');
print $seq_obj->seq(), "\n";
print $seq_obj->alphabet(), "\n";
print $seq_obj->subseq(3,4), "\n";
prints out
acgt
dna
gt
More methods
The BioPerl documentation tells us that the Bio::Seq object has methods that return a new instance of the Bio::Seq object
Read the documentation!
The following piece of code
#!/usr/local/bin/perl
use Bio::Seq;
$seq_obj = Bio::Seq->new('-seq' => 'acgt');
$seq_obj_2 = $seq_obj->trunc(1,3);
print $seq_obj_2->seq(), "\n";
$seq_obj_3 = $seq_obj_2->revcom();
print $seq_obj_3->seq(), "\n";
$seq_obj_4 = $seq_obj_2->translate();
print $seq_obj_4->seq(), "\n";
prints out
acg
cgt
T
More objects
The Bio::SeqIO object is responsible for reading and writing sequences to files. It provides support for the various database formats. The next script creates a sequence object and saves it to a file named 'test.seq' in FASTA format:
#!/usr/local/bin/perl
use Bio::Seq;
use Bio::SeqIO;
$seq_obj = Bio::Seq->new('-seq'  => 'acgt',
                         '-id'   => '#12345',
                         '-desc' => 'example 1');
$seqio_obj = Bio::SeqIO->new('-file'   => '>test.seq',
                             '-format' => 'fasta');
$seqio_obj->write_seq($seq_obj);
It saves the file test.seq containing
>#12345 example 1
acgt
Simple change…
If one replaces '-format' => 'fasta' with '-format' => 'embl' in the Bio::SeqIO constructor, one gets
ID #12345 standard; DNA; UNK; 4 BP.
XX
AC unknown;
XX
DE example 1
XX
FH Key Location/Qualifiers
FH
XX
SQ Sequence 4 BP; 1 A; 1 C; 1 G; 1 T; 0 other;
acgt 4
//
A format converter
#!/usr/local/bin/perl
use Bio::SeqIO;
$in = Bio::SeqIO->new('-file' => 'infile' ,
'-format' => 'Fasta');
$out = Bio::SeqIO->new('-file' => '>outfile',
'-format' => 'EMBL');
while (my $seq=$in->next_seq()) {
$out->write_seq($seq);
}
What's next with BioPerl?
There are several tutorials and plenty of examples to help you start with BioPerl.
Read the documentation and play with the examples.
Many have learned through practice.
What's next with OOP?
The design of object internals was not covered in detail here, because this requires some familiarity with programming. This is not especially difficult.
There are problems that greatly benefit from OOP, and others that are more easily managed without.
Applied improperly, or by people without the skills, knowledge, and experience, OOP doesn't solve any problems, and might even make things worse. It can be an important piece of the solution, but isn't a guarantee or a silver bullet.
Many programmers like the OOP way. Maybe you too? ;-)