Object Oriented Programming (OOP)

and introduction to BioPerl

Laurent Falquet (original course by Marco Pagni), Basel October, 2006

Swiss Institute of Bioinformatics
Swiss EMBnet node

LF Basel October 2006

Overview in 3 parts

Motivation for OOP

The key concepts of OOP

Using BioPerl objects

Part I Motivation for OOP

Background

OOP didn't come out of the blue. It has strong historical roots in other paradigms and practices.

It came about to address problems commonly grouped together as the "software crisis".

The "software crisis" manifests itself in

1. cost overruns
2. user dissatisfaction with the final product
3. buggy software
4. brittle software

Complexity

Software is inherently complex because

- we attempt to solve problems in complex domains
- we are forced by the size of the problem to work in teams
- software is an incredibly malleable building material
- discrete systems are prone to unpredictable behavior
- software systems consist of many pieces, many of which communicate.

Some factors that impact on and reflect complexity in software

- The number of names (variables, functions, etc.) that are visible
- Constraints on the time-sequence of operations (real-time constraints)
- Memory management (garbage collection and address spaces)
- Concurrency
- Event-driven user interfaces.

How do humans cope with complexity in everyday life?

Humans deal with complexity by abstracting details away.

E.g., surfing the Internet doesn't require knowledge of internal processor registers; it is sufficient to think of a computer as a simple visualization tool.

To be useful, an abstraction (model) must be smaller than what it represents.

E.g., road map vs photographs of terrain vs physical model.

Exercise 1

Memorize as many numbers from the following sequence as you can. I'll show them for 30 seconds. Now write them down.

1759376099873461324287593345108941120765934

How many did you remember?

How many could you remember with unlimited amounts of time?

Exercise 2

Write down as many of the following telephone numbers as you can:

Pizza:
Friend 1:
Post Office:
Co-worker:
Cellular:
Friend 2:
Fax:
Parents:
Boss:
Home:

Answer to Exercises 1 and 2

By abstracting the details of the numbers away and grouping them into a new concept (telephone number) we have increased our information handling capacity by nearly an order of magnitude!

Working with abstractions lets us handle more information.

Exercise 3

How many of these (unrelated) concepts can you memorize in 30 seconds?

Answer to Exercise 3

Miller (Psychological Review, vol 63(2), 1956):

"The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information"

Working with abstractions lets us handle more information (e.g. phone numbers), but we're still limited by Miller's observation. What if you have more than 7 things to juggle in your head simultaneously?

Hierarchy

A common strategy: form a hierarchy to classify and order our abstractions.

Common examples are

- the military, large companies, administrations
- Linnaeus’ classification system of organisms
- EC numbers for enzymatic reactions
- the UNIX file system.

Decomposition

Divide and conquer is a handy skill for many thorny life problems.

We want to compose a system from small pieces, rather than build a large monolithic system, because the former can be made more reliable.

If the system is properly designed, a failure in one part won't cause failure of the whole. This depends on how tightly the parts are coupled.

We can beat the grim view of a system composed of many parts by properly decomposing and decoupling. Another reason to decompose is that we can divide up the work more easily.

Object technology

There is nothing unique about forming abstractions, but in OOP this is a main focus of activity and organization.

We can take advantage of the natural human tendency to anthropomorphise. We'll call our abstractions objects.

We'll put our abstractions into a hierarchy to keep them organized and minimize redundancy.

This is a natural way to "divide and conquer" the large state spaces we face (complexity).

Part II The Key Concepts of OOP

Class and Instance

A class is a part of a program that describes the properties of an object. These properties fall into two broad categories:

- attributes: the data associated with an object,
- methods: the functions, procedures or subroutines that come along with an object.

An instance is a member of a class whose attributes have received particular values.

Class and Instance example

A "square" class could have size and color attributes and methods to alter them:

Name: square
Attributes: size, color
Methods: set_size, set_color, perimeter

Two instances of the square class could be a large blue square and a tiny red one.
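The square class described above could be sketched in Perl roughly as follows. This is only an illustration: the class name Square, the default values and the hash-based storage are assumptions made for the example, not something prescribed by the course.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Square;

# Constructor: store the attributes in a hash and bless it into the class.
sub new {
    my ($class, %args) = @_;
    my $self = {
        size  => defined $args{size}  ? $args{size}  : 1,        # assumed default
        color => defined $args{color} ? $args{color} : 'black',  # assumed default
    };
    return bless $self, $class;
}

# Methods that alter the attributes.
sub set_size  { my ($self, $size)  = @_; $self->{size}  = $size;  return $self; }
sub set_color { my ($self, $color) = @_; $self->{color} = $color; return $self; }

# Methods that read or derive values from the attributes.
sub color     { my ($self) = @_; return $self->{color}; }
sub perimeter { my ($self) = @_; return 4 * $self->{size}; }

package main;

# Two instances of the same class with different attribute values.
my $big   = Square->new(size => 10, color => 'blue');
my $small = Square->new(size => 1,  color => 'red');

print "big:   ", $big->color,   ", perimeter ", $big->perimeter,   "\n";
print "small: ", $small->color, ", perimeter ", $small->perimeter, "\n";

$small->set_size(2);
print "small after set_size(2): perimeter ", $small->perimeter, "\n";
```

Both instances share the same methods; only the attribute values stored in each instance differ.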

Encapsulation

The attributes of an object usually receive some degree of privacy.

- Private attributes are not accessible from outside the object.
- Public attributes can be directly accessed by any other object.

The methods of an object also usually receive some degree of privacy.

Encapsulation: the values of an object's attributes should only be altered by its own methods. Private attributes should always be favored.

Encapsulation

A "person" class could have two attributes, for example a name and a credit card number.

Nobody wants their credit card number to be public!

Encapsulation hides the implementation from the user.

One should be able to drive a car without knowing how the engine works…

Benefit of encapsulation

Encapsulation is a technique for minimizing interdependencies among objects by defining a strict external interface. This way, the internal code can be changed without affecting the interface, as long as the new implementation supports the same (or an upward-compatible) external interface.
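A minimal sketch of the "person" example, assuming a hypothetical Person class with the methods name and card_hint (both names are invented for this illustration). Note that Perl does not enforce privacy for hash-based objects; the leading underscore is only a convention that tells users to go through the methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Person;

sub new {
    my ($class, %args) = @_;
    # Leading underscores mark the attributes as private by convention only.
    my $self = {
        _name => $args{name},
        _card => $args{card},
    };
    return bless $self, $class;
}

# Public interface: read access to the name.
sub name { my ($self) = @_; return $self->{_name}; }

# Public interface: reveal only the last four digits of the card number.
sub card_hint {
    my ($self) = @_;
    return '****' . substr($self->{_card}, -4);
}

package main;

my $p = Person->new(name => 'Alice', card => '1234567812345678');
print $p->name, " pays with card ", $p->card_hint, "\n";
```

As long as name and card_hint keep behaving the same way, the internal storage could later be changed (for instance to an array, or to an encrypted card number) without touching any code that uses the class.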

Methods

One could broadly distinguish four kinds of methods:

- A method that creates a new object is called a constructor.
- A method that destroys an object is called a destructor.
- It is frequent to define many simple access methods just to set or get attribute values. These methods ensure data encapsulation.
- Other methods that permit the object to perform some useful actions.
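The four kinds of methods can be seen side by side in a small sketch. The Counter class and all its method names are hypothetical; the only Perl-specific fact relied on is that a subroutine named DESTROY is called automatically when the last reference to an object disappears.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Counter;

my $live = 0;    # how many Counter objects currently exist

# 1. Constructor
sub new {
    my ($class, %args) = @_;
    my $self = bless { count => defined $args{start} ? $args{start} : 0 }, $class;
    $live++;
    return $self;
}

# 2. Destructor: called by Perl when the object is no longer referenced.
sub DESTROY { $live--; }

# 3. Access methods
sub get_count { my ($self) = @_; return $self->{count}; }
sub set_count { my ($self, $n) = @_; $self->{count} = $n; return; }

# 4. A method that performs a useful action
sub increment { my ($self) = @_; return ++$self->{count}; }

# Class method: reports how many instances are alive.
sub live_count { return $live; }

package main;

{
    my $c = Counter->new(start => 5);
    $c->increment;
    print "count = ", $c->get_count, ", live = ", Counter->live_count, "\n";
}   # $c goes out of scope here, so DESTROY runs

print "live after scope exit = ", Counter->live_count, "\n";
```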

Hierarchy through inheritance

Classes can have children; that is, one class can be created out of another class.

A sub-class inherits all the attributes and methods of the super-class, and may have additional attributes and behaviors.

Inheritance aids in the reuse of code.

Name: parent
Attributes: familyname

Name: child
Attributes: familyname

Hierarchy example

A very simple class diagram:

Vehicle (Attr: speed; Meth: start, stop)
  Air-Vehicle (Attr: nr-wing; Meth: take-off, land)
    Plane (Attr: fuel; Meth: none)
  Land-Vehicle (Attr: nr-wheel; Meth: none)
    Bike (Attr: none; Meth: none)
    Car (Attr: fuel; Meth: none)
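The vehicle hierarchy can be written down in Perl as a rough sketch. The package names are adapted from the diagram (hyphens are not allowed in Perl identifiers, so Air-Vehicle becomes AirVehicle and take-off becomes take_off); the attribute values and the messages returned are assumptions for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Vehicle;
sub new {
    my ($class, %attr) = @_;
    return bless { speed => 0, %attr }, $class;
}
sub start { my ($self) = @_; return ref($self) . " starts"; }
sub stop  { my ($self) = @_; $self->{speed} = 0; return ref($self) . " stops"; }

package AirVehicle;
our @ISA = ('Vehicle');              # AirVehicle is a child of Vehicle
sub take_off { my ($self) = @_; return ref($self) . " takes off"; }
sub land     { my ($self) = @_; return ref($self) . " lands"; }

package LandVehicle;
our @ISA = ('Vehicle');              # adds no methods of its own

package Plane;
our @ISA = ('AirVehicle');

package Bike;
our @ISA = ('LandVehicle');

package Car;
our @ISA = ('LandVehicle');

package main;

my $plane = Plane->new(nr_wing => 2, fuel => 'kerosene');
my $bike  = Bike->new(nr_wheel => 2);

print $plane->start,    "\n";    # inherited from Vehicle
print $plane->take_off, "\n";    # inherited from AirVehicle
print $bike->start,     "\n";    # inherited from Vehicle
print "Bike can take off? ", (Bike->can('take_off') ? "yes" : "no"), "\n";
```

Plane, Bike and Car contain no code of their own here; everything they can do is reused from the classes above them.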

A real example of hierarchy

The BioPerl class diagram:

Polymorphism

Polymorphism means the ability to request that the same operations be performed by a wide range of different objects. Polymorphism is a consequence of inheritance. From the previous example, any vehicle can start and stop.

Think of a computer desktop where one likes to open, resize and close any window, whatever its content.
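The desktop analogy can be sketched as follows. The classes Window, TextWindow and ImageWindow and the methods show and content are hypothetical names chosen for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Window;
sub new     { my ($class, %attr) = @_; return bless { %attr }, $class; }
sub show    { my ($self) = @_; return "show: " . $self->content; }
sub content { return "empty window"; }

package TextWindow;
our @ISA = ('Window');
# Overrides content; show is inherited unchanged.
sub content { my ($self) = @_; return "text '$self->{text}'"; }

package ImageWindow;
our @ISA = ('Window');
sub content { my ($self) = @_; return "image $self->{file}"; }

package main;

my @windows = (
    Window->new,
    TextWindow->new(text => 'hello'),
    ImageWindow->new(file => 'cell.png'),
);

# The same request is sent to every object; each class answers in its own way.
print $_->show, "\n" for @windows;
```

The loop does not need to know which kind of window it is handling, which is the practical benefit of polymorphism.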

Objects In Perl Are Deceptively Simple

- An object instance is simply a reference that happens to know which class it belongs to (a reference is a scalar, just like a number or a string).
- A class is simply a package that happens to provide methods to deal with object references.
- A method is simply a subroutine that expects an object reference (or a package name, for class methods) as the first argument.
- A class inherits through its @ISA array.
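These four statements can be checked directly with a few lines of Perl. The classes Animal and Dog are hypothetical and exist only for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Animal;
sub new   { my ($class, $name) = @_; return bless { name => $name }, $class; }
sub speak { my ($self) = @_; return "$self->{name} makes a sound"; }

package Dog;
our @ISA = ('Animal');    # Dog inherits new and speak from Animal

package main;

my $plain = { name => 'Rex' };    # an ordinary hash reference
my $dog   = Dog->new('Rex');      # a reference that knows its class

print "plain reference: ", ref($plain), "\n";    # HASH
print "object:          ", ref($dog),   "\n";    # Dog

# A method call is a subroutine call with the object as the first argument.
print $dog->speak, "\n";
print Animal::speak($dog), "\n";
```

The last two lines print the same text, since the arrow notation only looks up the subroutine through the class of $dog (and its @ISA) and passes $dog as the first argument.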

Class vs module

package MyMod;

sub f { … }

sub g { … }

1;

package MyObj;

sub new {                        # constructor
    my $class = shift;           # get class name
    my $self  = { @_ };          # set attributes (key => value pairs)
    bless($self, $class);        # create object
    return($self);               # return instance ref
}

sub other_methods { … }
…
1;

Class vs module

use MyMod;

MyMod::f($param);

my $a = MyMod::g();

use MyObj;
…
# call constructor (attributes are passed as key => value pairs)
my $instance = MyObj->new(%attr);
# call method
$instance->other_methods();

Part III Using BioPerl Objects

What is BioPerl?

- It is a collection of Perl modules for processing data for the life sciences
- A project made up of biologists, bioinformaticians, computer scientists
- An open source toolkit of building blocks for life sciences applications

http://www.bioperl.org

First work in 1996

Bioperl 1.0 was released in May 2002

Current version 1.5.1 October 2005

Part of the open-bio.org foundation (BioJava, BioPython, BioPerl, EMBOSS, BioMoby)

What to expect from BioPerl?

If you're looking for a script built to fit your exact need it's likely you won't find it.

What you will find is a diverse set of Perl modules that will enable you to write your own script, and a community of people who are willing to help you.

The toolkit is divided into several packages; most people will only want to deal with the Core package.

- Core package provides the main parsers; this is the basic package and it is required by all the other packages
- Run package provides wrappers for executing some 60 common bioinformatics applications
- Ext package is for C-language extensions including some alignment algorithms and an interface to the Staden IO library
- GUI package includes some basic widgets in Perl-Tk
- BioPerl db is a subproject to store sequence and annotation data in a BioSQL relational database
- Pedigree package is for manipulating genotype, marker, and individual data for linkage studies
- Microarray package has preliminary objects for manipulating some microarray data formats
- Network package parses and analyzes protein-protein interaction data
- Pipeline package is a project for creating analysis pipelines out of bioperl-run modules

Code Sample

The following piece of code

#!/usr/local/bin/perl
use Bio::Seq;
$seq_obj=Bio::Seq->new('-seq'=>'acgt');
print $seq_obj->seq(),"\n";

prints out

acgt

Let's look at it one line at a time.

Line 1: Invoking Perl

This line tells your operating system where to find the Perl interpreter.

#!/usr/local/bin/perl
use Bio::Seq;
$seq_obj=Bio::Seq->new('-seq'=>'acgt');
print $seq_obj->seq(),"\n";

nothing object-oriented here!

Line 2: Import class

This line tells Perl to use a module on your machine called Seq.pm, found in the directory Bio.

#!/usr/local/bin/perl
use Bio::Seq;
$seq_obj=Bio::Seq->new('-seq'=>'acgt');
print $seq_obj->seq(),"\n";

The code of the object class Bio::Seq is located in this module (as well as the associated documentation). The :: notation mirrors how the modules are organized in the file system: Bio::Seq corresponds to the file Bio/Seq.pm.

What is Bio::Seq ?

The Bio::Seq object, or "Sequence object", or "Seq object", is ubiquitous in BioPerl. It contains a single sequence and associated names, identifiers, and properties.

This generic "Sequence object" could be either protein or DNA, and it is not linked to a particular format, such as the SwissProt, EMBL or GenBank formats.

Line 3: Create instance

This line creates a sequence object (in memory).

#!/usr/local/bin/perl
use Bio::Seq;
$seq_obj=Bio::Seq->new('-seq'=>'acgt');
print $seq_obj->seq(),"\n";

- The Perl variable $seq_obj refers to an instance of the Bio::Seq class.
- new is a subroutine found in the module Bio/Seq.pm. The call Bio::Seq->new acts as the constructor of the object.
- '-seq'=>'acgt' assigns the value acgt to the sequence attribute.

More details

In BioPerl, most constructors take arguments in the form of key => value pairs. This gives the programmer maximal flexibility: arguments can be given in any order, and optional ones can simply be left out. Many other keys (e.g., '-id' or '-desc') are available for the Bio::Seq->new constructor.

Read the documentation! http://doc.bioperl.org/
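To see why key/value arguments are flexible, here is a small sketch of a constructor written in the same style. MiniSeq is a hypothetical stand-in class invented for this example; it is not part of BioPerl and does not reproduce how Bio::Seq is implemented. The default values are assumptions as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package MiniSeq;    # hypothetical class, NOT a BioPerl module

sub new {
    my ($class, %args) = @_;    # the argument list is read as key => value pairs
    my $self = {
        seq  => $args{'-seq'},
        id   => defined $args{'-id'}   ? $args{'-id'}   : 'unknown',  # assumed default
        desc => defined $args{'-desc'} ? $args{'-desc'} : '',         # assumed default
    };
    return bless $self, $class;
}

sub seq  { my ($self) = @_; return $self->{seq};  }
sub id   { my ($self) = @_; return $self->{id};   }
sub desc { my ($self) = @_; return $self->{desc}; }

package main;

# Keys can come in any order, and optional ones ('-id' here) can be omitted.
my $s = MiniSeq->new('-desc' => 'example 1', '-seq' => 'acgt');
print join(' | ', $s->id, $s->desc, $s->seq), "\n";
```

With positional arguments the caller would have to remember the order and pass placeholders for unused values; with named keys the call documents itself.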

Line 4: Method call

This line prints out what is returned by the method seq() of the object $seq_obj (here, acgt).

#!/usr/local/bin/perl
use Bio::Seq;
$seq_obj=Bio::Seq->new('-seq'=>'acgt');
print $seq_obj->seq(),"\n";

The -> notation means that one specifically intends to call the subroutine seq that is attached to $seq_obj. Indeed, a different object might have a method named seq, with a possibly different implementation if it belongs to a different class (polymorphism).

More methods

The BioPerl documentation tells us that the Bio::Seq object has many other methods. Some, like seq(), return a scalar.

Read the documentation!

The following piece of code

#!/usr/local/bin/perl
use Bio::Seq;
$seq_obj=Bio::Seq->new('-seq'=>'acgt');
print $seq_obj->seq(),"\n";
print $seq_obj->alphabet(),"\n";
print $seq_obj->subseq(3,4),"\n";

prints out

acgt
dna
gt

More methods

The BioPerl documentation tells us that the Bio::Seq object has methods that return a new instance of a Bio::Seq object.

Read the documentation!

The following piece of code

#!/usr/local/bin/perl
use Bio::Seq;
$seq_obj=Bio::Seq->new('-seq'=>'acgt');
$seq_obj_2=$seq_obj->trunc(1,3);
print $seq_obj_2->seq(),"\n";
$seq_obj_3=$seq_obj_2->revcom();
print $seq_obj_3->seq(),"\n";
$seq_obj_4=$seq_obj_2->translate();
print $seq_obj_4->seq(),"\n";

prints out

acg
cgt
T

More objects

The Bio::SeqIO object is responsible for reading sequences from files and writing them to files. It provides support for the various database formats. The next script creates a 'sequence' object and saves it to a file named 'test.seq' in FASTA format:

#!/usr/local/bin/perl
use Bio::Seq;
use Bio::SeqIO;
$seq_obj = Bio::Seq->new('-seq'  => 'acgt',
                         '-id'   => '#12345',
                         '-desc' => 'example 1');
$seqio_obj = Bio::SeqIO->new('-file'   => '>test.seq',
                             '-format' => 'fasta');
$seqio_obj->write_seq($seq_obj);

saves the file test.seq containing

>#12345 example 1
acgt

Simple change…

If one replaces '-format' => 'fasta' with '-format' => 'embl' in the Bio::SeqIO constructor, one gets

ID #12345 standard; DNA; UNK; 4 BP.
XX
AC unknown;
XX
DE example 1
XX
FH Key Location/Qualifiers
FH
XX
SQ Sequence 4 BP; 1 A; 1 C; 1 G; 1 T; 0 other;
acgt 4
//

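A format converter

#!/usr/local/bin/perl
use Bio::SeqIO;

# open the input file for reading in FASTA format
$in = Bio::SeqIO->new('-file'   => 'infile',
                      '-format' => 'Fasta');
# open the output file for writing (note the '>') in EMBL format
$out = Bio::SeqIO->new('-file'   => '>outfile',
                       '-format' => 'EMBL');
# copy the sequences one at a time
while (my $seq = $in->next_seq()) {
    $out->write_seq($seq);
}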

What's next with BioPerl?

There are several tutorials and plenty of examples to help you start with BioPerl.

Read the documentation and play with the examples.

Many have learned through practice.

What's next with OOP?

The design of object internals was not covered in detail here, because it requires some familiarity with programming. It is not especially difficult.

There are problems that greatly benefit from OOP, and others that are more easily managed without.

Applied improperly, or by people without the skills, knowledge, and experience, OOP doesn't solve any problems, and might even make things worse. It can be an important piece of the solution, but isn't a guarantee or a silver bullet.

Many programmers like the OOP way. Maybe you too? ;-)