
Articles

A Portable Bioinformatics Course for Upper-Division Undergraduate Curriculum in Sciences*

Received for publication, April 11, 2008, and in revised form, May 21, 2008

Wely B. Floriano‡

From the Biological Sciences Department, California State Polytechnic University Pomona, Pomona, California 91768

This article discusses the challenges that bioinformatics education is facing and describes a bioinformatics course that is successfully taught at the California State Polytechnic University, Pomona, to fourth-year undergraduate students in biological sciences, chemistry, and computer science. Information on lecture and computer practice topics, free-for-academic-use software and web links required for the laboratory exercises, and student surveys for two instances of the course is presented. This course emphasizes hands-on experience and focuses on developing practical skills while providing a solid knowledge base for critically applying these skills. It is designed in blocks of 1-hour lecture followed by 2 hours of computer laboratory exercises, both covering the same general topic, for a total of 30 hours of lecture and 60 hours of computer practice. The heavy computational aspect of this course was designed to use a single multiprocessor computer server running Linux, accessible from laptops through Virtual Network Computing sessions. The laptops can be either provided by the institution or owned by the individual students. This configuration avoids the need to install and maintain bioinformatics software on each laptop. Only a single installation is required for most bioinformatics software on the Linux server. The content of this course and its software/hardware configuration are well suited for institutions with no dedicated computer laboratory. This author believes that the same model can be successfully implemented in other institutions, especially those that do not have strong instructional computer technology support, such as community colleges and small universities.

Keywords: Bioinformatics, computational biology, biotechnology education, computers in research and teaching, problem-based learning.

INTRODUCTION

Bioinformatics is in Demand

Biomedical researchers are increasingly using computers to collect, store, manipulate, and analyze biological data relevant to human health and living. The last decade has witnessed an explosion of available biological data, thanks to the vast amount of data produced by genome projects and high-throughput biological experiments. Bioinformatics, defined here as the use of computational approaches to extract knowledge of the biological system from biological data, has become critical to translate all this data into useful knowledge. Mining data to determine the biological functions and cellular roles of all genes and proteins, to understand their interaction mechanisms with other proteins and small molecules, and to look for potential therapies to control gene expression and protein function are important challenges currently faced by bioinformatics.

Job market projections are extremely positive for bioinformatics professionals. According to the U.S. Bureau of Labor Statistics [1], bioinformatics is a particularly vibrant new area of work, and job opportunities in this area are expected to have the highest growth between 2004 and 2014, compared to the other fields in scientific research and development. The key driver for this expected 12% growth was identified as the increased demand for medical and pharmaceutical advances resulting from an aging population. The U.S. Bureau of Labor Statistics recommends that biological sciences undergraduate students take computer courses, particularly in bioinformatics.

Bioinformatics as an intrinsically multidisciplinary field draws on knowledge from biology, computer science, physics, mathematics, and chemistry [2–7, 10]. This heavily multidisciplinary nature is a challenge for most departments, because they are generally unified by discipline, and faculty cross-appointment is not the norm. Cooperation and planning involving multiple departments are necessary to provide quality education to form bioinformatics professionals [5, 6, 8–11]. This limits the offering of bioinformatics as major, minor, option, or emphasis in many universities. What most departments can do is to train students in the use of bioinformatics tools, so that they can apply current tools in their professional endeavors.

*This work is supported by the College of Science and the Biological Sciences Department at the California State Polytechnic University Pomona.

‡ To whom correspondence should be addressed. Tel.: (909) 869-4059; Fax: (909) 869-4078; E-mail: wbfloriano@csupomona.edu.

This paper is available on line at http://www.bambed.org. DOI 10.1002/bmb.20217

© 2008 by The International Union of Biochemistry and Molecular Biology. BIOCHEMISTRY AND MOLECULAR BIOLOGY EDUCATION, Vol. 36, No. 5, pp. 325–335, 2008

Teaching Practical Skills Versus Fundamentals

The demand for bioinformatics professionals is clear and, based on the job market projections, there is no doubt that the stronger demand is for a professional that can apply knowledge of physics, mathematics, chemistry, computer science, and information science to biomedical research [5, 6, 10]. These professionals need to master the fundamental concepts behind bioinformatics, including algorithms and computer languages. However, biology students traditionally have a limited computational background, and most biology programs have a weak quantitative component. Courses that require the use of computer programs (such as biometrics, which covers statistical approaches used to collect and analyze biological data) typically follow a recipe-style approach, in which students learn how to use the programs but see nothing of how they are written or work internally (the algorithms).

“Toolkit” Bioinformatics is the First Step

Balancing the clear need for professionals that can develop/modify/adapt computational tools to deal with the ever-growing amount of biological data against the limited computer skills of biology students is not an easy task. There has been a great deal of debate in the literature on how to teach bioinformatics [4, 6–12]. This author believes that the best approach is to build layers of skills, beginning with a practical course that gives students familiarity with current databases and computer applications. The course presented in this study forms this “first layer.” Students are given an overall view of the available computational tools (including simplified views of the algorithms whenever appropriate), learn how to critically choose tools and parameters, and analyze results. Finally, they are taught how to apply these skills to typical real-world cases. This first layer “basic skills” course should be followed by more in-depth courses (additional layers) covering algorithms in detail, programming, specialized applications such as microarray data analysis, or knowledge-building topics such as understanding and analyzing genomes.

The inclusion of at least one bioinformatics course that focuses on the development of practical skills will provide science students with quality education that meets the demands of our modern society.

A Portable Introductory Course Is Within the Reach of Most Higher Education Institutions

The most important advantage of the proposed course is that economically challenged universities can implement it at a reasonable cost, and universities geographically located near each other can, potentially, share a course they develop together. The course does not require a dedicated computer laboratory, and systems administration duties (such as installing and updating programs, creating accounts, and running backups) are simplified by the use of a single Linux server and multiple client laptops, which are essentially “dumb” (keyboard, mouse, and display) terminals requiring minimum maintenance.

Targeted Audience and Prerequisites

This course targets upper-division undergraduate students in biological sciences, chemistry, and computer science. Students are expected to have a minimum background in cellular biology and biochemistry and should be familiar with web browsing and text editors widely used in personal computers. Although the course is based on the Linux operating system (OS), no prior Linux experience is required.

TEACHING STRATEGY

A Problem-solving Approach

This course adopts a problem-solving approach where students learn to use bioinformatics first in the context of a specific application and later as part of their devised strategy to complete a project. The lectures are designed assuming minimum prior knowledge and, thus, the basic information necessary to understand why, when, and how to use bioinformatics tools is provided. It is implicitly assumed that the students will have access to other related courses, so that they can gradually build up their knowledge.

Online Resources

Lecture notes, and all materials and procedures for the course, are available to students online through Blackboard [13]. A course survey that students take online near the end of the course is also on Blackboard, which computes and returns all the statistics for the survey. An alternative for institutions that do not subscribe to a course management system such as Blackboard is to make the course material available through an editable web site using a wiki engine such as the open source software packages MediaWiki [14] or TWiki [15].

Lectures

A 1-hour lecture is given each day the class assembles, followed by 2 hours of computer laboratory practice. Students move from lecture to laboratory practice seamlessly, as both occur in the same room, and students are occasionally prompted to perform tasks using their laptops during the lecture component. The lectures are designed to introduce the topics and give the students enough information for them to understand why, in what circumstances, and how to perform the laboratory exercises. For example, the lecture on Linux covers basic concepts on OSs and shared environments, typical organization of a Linux directory tree (known as a file system), command-line-based operations, command-line syntax, and basic Linux commands students will use throughout the course. Throughout the Linux lecture, students are instructed to type each command being discussed and observe its “effects.” Students learn how to log into the system; look at their location within the file system (i.e., what is their current directory); change directories; list the content of the current directory; create and delete directories; create simple text files; rename, copy, delete and move files; and log off the system. These basic commands are executed by the students during the lecture and reinforced during the laboratory exercise, which also covers the use of text and spreadsheet editors to create laboratory reports.
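The file operations practiced in the Linux lecture can also be illustrated outside the shell. The following is a minimal sketch, not part of the course materials, that replays the same sequence of actions (create a directory, create a text file, rename, copy, move, delete, list) with the Python standard library inside a temporary directory. The directory and file names are hypothetical, and the shell commands quoted in the comments are the usual Linux equivalents.

```python
import shutil
import tempfile
from pathlib import Path


def run_session(home):
    """Replay the basic file operations inside `home`; return what is left."""
    work = home / "lab01"                        # hypothetical directory name
    work.mkdir()                                 # mkdir lab01
    notes = work / "notes.txt"
    notes.write_text("first Linux session\n")    # create a simple text file
    report = notes.rename(work / "report.txt")   # mv notes.txt report.txt
    shutil.copy(report, work / "report_copy.txt")  # cp report.txt report_copy.txt
    archive = home / "archive"
    archive.mkdir()                              # mkdir archive
    shutil.move(str(work / "report_copy.txt"),
                str(archive / "report_copy.txt"))  # mv between directories
    report.unlink()                              # rm report.txt
    work.rmdir()                                 # rmdir lab01 (must be empty)
    # Roughly what a recursive listing (ls -R) would show, relative to home.
    return sorted(p.relative_to(home).as_posix() for p in home.rglob("*"))


# A temporary directory stands in for the student's home directory.
with tempfile.TemporaryDirectory() as tmp:
    print(run_session(Path(tmp)))
```

Running the sketch in a throwaway directory mirrors the way students are asked to type each command and observe its effect without risk to existing files.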

Lectures are prepared using PowerPoint [16], printed to PDF files as handouts with three slides per page, and posted on Blackboard [13] before class. Most students print the notes, bring them to class, and use them to write their own notes during lecture. Table I lists the main topics covered in the lectures in the order they are presented.

Laboratory Exercises

After each lecture, students spend 2 hours executing laboratory exercises under the instructor’s supervision. Procedures for each laboratory task are posted online prior to the laboratory and remain accessible throughout the course. The procedures consist of detailed instructions to execute a particular bioinformatics task. For example, students are asked to retrieve homologous sequences to a particular query protein, reduce the redundancy of the matching set, perform a multiple sequence alignment, and analyze it to identify highly conserved and variable regions. Laboratory exercises were designed to walk students through the use of bioinformatics tools, with emphasis on selecting the right tools and parameters for specific biological problems. The course has a total of 24 laboratory exercises covering topics on nucleotide sequence analysis and comparison, amino acid sequence analysis and comparison, and structural bioinformatics. Table II lists the topics, objectives, required resources, and web links for each laboratory task.
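As an illustration of the redundancy-reduction and conservation-analysis steps in the example above, the following minimal sketch works on a toy set of sequences that are assumed to be already aligned (equal length, with "-" marking gaps). It is not the procedure used in the course, which relies on tools such as BLAST and Clustal. The function names, the greedy filtering strategy, and the toy sequences are assumptions made for illustration; the 90% cutoff follows the threshold quoted for laboratory 11 in Table II.

```python
from collections import Counter


def percent_identity(a, b):
    """Percent of aligned, gap-free columns in which a and b are identical."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def reduce_redundancy(aligned, max_identity=90.0):
    """Greedy filter: keep a sequence only if it stays below max_identity
    with respect to every sequence already kept."""
    kept = {}
    for name, seq in aligned.items():
        if all(percent_identity(seq, other) < max_identity
               for other in kept.values()):
            kept[name] = seq
    return kept


def conserved_columns(aligned, min_fraction=1.0):
    """Indices of columns whose most common residue reaches min_fraction."""
    seqs = list(aligned.values())
    hits = []
    for i in range(len(seqs[0])):
        residue, count = Counter(s[i] for s in seqs).most_common(1)[0]
        if residue != "-" and count / len(seqs) >= min_fraction:
            hits.append(i)
    return hits


# Hypothetical pre-aligned toy sequences (not real proteins).
toy = {
    "query": "MKTAYIAKQR",
    "hom_a": "MKTAYIAKQR",  # identical to the query, so it is redundant
    "hom_b": "MKSAYLAKQR",
    "hom_c": "MRTA-IGKQK",
}
kept = reduce_redundancy(toy)
print(list(kept))
print(conserved_columns(kept))
```

Columns that are not reported by `conserved_columns` correspond to the variable regions students are asked to identify; lowering `min_fraction` relaxes the definition of conservation.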

The laboratory exercises are designed to be completed within the 2 hours allocated for them, and most students do finish their tasks and reports within the allocated time. However, if students need more time to complete their reports, to repeat particular steps of the procedures, to correct questions they may have answered incorrectly, or to review some of the tools they learned to use, they can access the course materials and all their files remotely. Everything they work on is stored on the Linux server and can be accessed from any computer with an internet connection and a Virtual Network Computing (VNC) viewer. This could include their home computers, for example. They are not restricted to the university laptops provided during the laboratories. The Linux server is available to the students 24 hours a day, 7 days a week. Feedback from student surveys (see Table IV) indicates that this is one of the strong positive points of the course.

STUDENT ASSESSMENT

Students are assessed through laboratory reports, a final written report, and a short oral presentation. The grading weights and a summary description of activities are provided in Table III. A more detailed discussion of each assessment criterion is given below.

Laboratory Reports

Students are required to write a report for each laboratory exercise. The reports are written during the laboratory and consist of parameters, data, and answers to questions posed in the procedures. Grading includes points for having all the files expected to be generated throughout the exercise, obtaining expected values for any quantities calculated during the exercises (e.g., phi and psi angles for a given structure, root mean square deviation in carbon-alpha coordinates between two three-dimensional structures of proteins), and answering posed questions correctly.
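The two kinds of quantities mentioned above can be computed from atomic coordinates with a few lines of code. The sketch below is an illustration only and is not taken from the course materials. The function `rmsd` assumes that the two structures are already superposed and list matching atoms in the same order, and `dihedral` returns the signed torsion angle defined by four points (for phi these would be the C, N, C-alpha, and C atoms of consecutive residues; for psi, the N, C-alpha, C, and N atoms). The coordinates are made-up values.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of
    (x, y, z) points. No superposition is performed here."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum((a - b) ** 2
                for p, q in zip(coords_a, coords_b)
                for a, b in zip(p, q))
    return math.sqrt(total / len(coords_a))


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle, in degrees, around the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Made-up coordinates: two "structures" offset by one unit along z.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
# Four points in a planar cis arrangement.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
```

In practice the superposition step matters: an RMSD computed without first aligning the two structures reflects their relative placement as well as their conformational difference.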

TABLE I
Lecture Topics

| Lecture no. | Topic | Lecture no. | Topic |
|---|---|---|---|
| 1 | Introduction to bioinformatics | 16 | Protein structure representations |
| 2 | Linux | 17 | Structural classification of proteins |
| 3 | Biological databases | 18 | Structural alignment |
| 4 | Genome databases | 19 | Analyzing protein-ligand interactions |
| 5 | Final project description and assignment | 20 | HBs, torsion angles and secondary structure assignment |
| 6 | Nucleotide sequence analysis | 21 | Quality of protein structures |
| 7 | Nucleotide sequence comparison | 22 | Secondary structure prediction: globular proteins |
| 8 | Database similarity search: BLAST | 23 | Secondary structure prediction: membrane proteins |
| 9 | Database similarity search: FASTA | 24 | Protein structure prediction |
| 10 | Substitution matrices | 25 | Software for modeling proteins |
| 11 | Multiple sequence alignment | 26 | Overview of structural bioinformatics |
| 12 | Genes, proteins and evolution | 27 | Student presentations |
| 13 | Domains, patterns and motifs | 28 | Student presentations |
| 14 | Overview of sequence analysis | 29 | Student presentations |
| 15 | Protein structure determination and archival | 30 | Student presentations |

Each lecture is 1-hr long, followed by 2 hr of laboratory exercises. Topics are listed in order of presentation.


TABLE II
Topics for computer laboratory exercises

| Lab no. | Component | Content | Required software resources | Web links |
|---|---|---|---|---|
| 1 | Introduction to Linux | Connect to server. Execute basic Linux commands and interpret the resulting computer action. | Access to student account on Linux server through VNC | |
| 2 | Manipulating text files in a Linux environment | Create, edit, save, read, rename, copy, move, and delete text files in a Linux environment. Find job ads in monster.com and save as text files. Find local and nationwide salary information on monster.com for minimum of one position at three levels (BSc, MSc, and PhD) and save it in a spreadsheet. | Internet access; Web browser (Firefox); Text editor (Gedit); Text and spreadsheet editors (OpenOffice Writer and Calc) | |
| 3 | ENTREZ | Learn about online resources at the National Center for Biotechnology Information (NCBI). Follow tutorials for PubMed and MeSH, and use the ENTREZ system to perform literature searches using these two tools. | Internet access; Web browser (Firefox); OpenOffice Writer | http://www.ncbi.nlm.nih.gov/Entrez/ |
| 4a | Genome browser tour | Use UCSC Gene Sorter to find information about specific genes and genes that are related to one another. | Internet access; Web browser (Firefox); OpenOffice Writer | http://genome.ucsc.edu/ |
| 4b | Ensembl worked example | Follow the Ensembl worked example, which is a walk through the main pages of the Ensembl Browser. | Internet access; Web browser (Firefox); OpenOffice Writer | http://www.ensembl.org/info/helpdesk/tutorials/worked_example.pdf |
| 5 | Using the OMIM (Online Mendelian Inheritance in Man) database | Use the OMIM (Online Mendelian Inheritance in Man) database to find information about genetic diseases/conditions such as sickle cell anemia (locus 11p15.4), galactosemia (locus 9p13) or hemophilia A (locus Xq28). | Internet access; Web browser (Firefox); OpenOffice Writer | http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM |
| 6 | Sequence analysis using the EMBOSS package | Use the EMBOSS package to analyze a nucleotide sequence: identify open reading frames, translate sequences, and calculate nucleotide frequencies. | EMBOSS/jemboss package (plotorf, getorf, transeq and freak); Text editor (gedit) | http://emboss.sourceforge.net/, http://www.ncbi.nlm.nih.gov |
| 7 | Global versus local pairwise alignments | Compare a global to a local pairwise alignment using the pairwise sequence alignment algorithms Needleman-Wunsch and Smith-Waterman, implemented in the programs Needle and Water. | EMBOSS/jemboss package (Needle and Water); Text editor (gedit) | |
| 8 | BLAST search using a nucleotide sequence | Use BLAST at NCBI to find information about an unidentified nucleotide sequence. Report its accession code, source organism, and gene product. Find homologous sequences and report the species of the closest BLAST hits. | Internet access; Web browser (Firefox); Text editor (gedit) | http://www.ncbi.nlm.nih.gov/BLAST/ |
| 9 | Finding related genes using Gene Sorter | Find genes that are related by protein-level homology, similarity of gene expression profiles, or genomic proximity. | Internet access; Web browser (Firefox); Text editor (gedit) | http://genome.ucsc.edu/ |
| 10 | Effects of substitution matrices on pairwise alignments | Analyze the effects of different substitution matrices on the outcome of pairwise alignments obtained using the Needleman-Wunsch algorithm. Report values for length, and percentages of identity and similarity for the same pair of sequences aligned using different substitution matrices. | Internet access; EMBOSS/jemboss package (Needle); Text editor (gedit) | |
| 11 | Multiple sequence alignment of a nonredundant set of proteins using Clustal | Use ExPASy to find a query protein sequence and submit a sequence homology search through the Swiss Institute of Bioinformatics BLAST server. Retrieve a set of non-redundant amino acid sequences by reducing the set until all the sequences have less than 90% of sequence identity and none have less than 65% sequence identity. Create and analyze a multiple sequence alignment using the program Clustal. | Internet access; Web browser (Firefox); OpenOffice Writer; ClustalX; EMBOSS/jemboss (Emma, prettyplot) | http://www.expasy.org |
| 12 | Drawing trees | Draw phylogenetic trees using the multiple sequence alignment previously created. | Text editor (gedit); Phylip; EMBOSS/jemboss (jalview) | |
| 13 | Amino acid patterns and protein domains | Explore databases of domains, patterns and profiles. Classify proteins into families by scanning their amino acid sequences. | Internet access; Web browser (Firefox); OpenOffice Writer | http://ca.expasy.org/tools/blast/, http://www.sanger.ac.uk/Software/Pfam/ |
| 14 | Final project: sequence retrieval, analysis, and comparison | Students apply tools/knowledge from laboratory exercises 1-13 to their final project under instructor’s assistance. | | |
| 15 | Obtaining PDB structures | Explore the RCSB Protein Data Bank and obtain 3D structures of proteins. Structures will be used in following laboratory tasks. | Internet access; Web browser (Firefox) | http://www.rcsb.org |
| 16 | Graphical representations of protein structure | Protein structure visualization using the molecular graphics program VMD. Tasks: (1) Visualize protein structure: hydrogen bonds, secondary, and tertiary structures. (2) Create graphical representations: Calpha trace, backbone trace, ribbons and cartoons. (3) Color representations by secondary structure, chain, molecule. (4) Perform structural alignment and calculate RMSD for a pair of structures. (5) Save view state: save the steps you performed to a file so you can re-create the same graphical representation again. (6) Save and manipulate image files. | VMD (Visual Molecular Dynamics) | |
| 17 | Structural classification using SCOP | Classify a catalytic domain in folds, superfamilies and families using the database SCOP (Structural Classification Of Proteins). | Internet access; Web browser (Firefox) | http://scop.mrc-lmb.cam.ac.uk/scop/ |
| 18 | Comparing protein structures | Perform structural alignment and calculate RMSDs (Calpha and backbone) for a pair of structures. Compare sequence alignment based on structural superposition to sequence-based pairwise alignment. | VMD (Visual Molecular Dynamics); ClustalX; OpenOffice Writer | |
| 19 | Analyzing protein-ligand interactions | Display and analyze protein-ligand interactions. Analyze the binding interactions of a small drug to its protein target. Perform a structural alignment to compare ligand-bound and unbound conformations of the same protein. Calculate RMSD between bound and unbound forms. | VMD (Visual Molecular Dynamics); OpenOffice Writer | |
| 20 | Analyzing quality of protein structures: Ramachandran plots | Calculate torsion angles and identify hydrogen bonds. Identify regions of secondary structure using torsion angles and hydrogen bond pattern. Create and analyze Ramachandran plots. Case structures: crambin (pdb code 1CRN) and 7 residues polyalanine structures in extended and helical conformation. | DSSP; VMD (Visual Molecular Dynamics); OpenOffice Writer | |
| 21 | Analyzing quality of protein structures: Whatcheck and Procheck | Check and compare the quality of protein structures. Analyze Ramachandran plots, secondary structure content, close contacts, missing residues and/or atoms, unsatisfied hydrogen bond acceptors and donors. | Whatcheck (locally and/or through PDBSum); Procheck (locally and/or through PDBSum); OpenOffice Writer | http://swift.cmbi.kun.nl/swift/whatcheck/, http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html, http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/ |
| 22 | Secondary structure prediction for globular proteins | Use various methods to predict the secondary structure elements of a globular protein. Compare predictions to the actual secondary structure assignment based on torsional angles and hydrogen bond patterns. Discuss the accuracy of the different methods. | Internet access; Web browser (Firefox); DSSP; JPRED; EMBOSS/jemboss (Garnier, jalview); OpenOffice Writer | Protein Data Bank www.rcsb.org (for structure retrieval), http://www.compbio.dundee.ac.uk/~www-jpred/ |
| 23 | Secondary structure prediction for membrane proteins | Use TMHMM and SOSUI to predict the transmembrane domains of four proteins. Build a consensus prediction for each case. | OpenOffice Writer | http://www.ncbi.nlm.nih.gov/ (for sequence retrieval), http://www.cbs.dtu.dk/services/TMHMM-2.0/, www.bp.nuap.nagoya-u.ac.jp/sosui/ |
| 24 | Homology modeling using SwissModel | Use a homology-based fully automated method to predict the three-dimensional (3D) structure of a globular protein. Compare predicted and experimentally determined structures, and calculate the backbone RMS deviation between the two. Analyze the quality of the predicted structure. | Internet access; Web browser (Firefox); Whatcheck (through SwissModel); Procheck (locally); OpenOffice Writer | http://swissmodel.expasy.org//SWISS-MODEL.html |
| 25 | Commercial software demo | Demonstrate a commercial software package such as Sybyl (www.tripos.com), MOE (www.chemcomp.com), Discovery Studio (accelrys.com), or Quanta (www.accelrys.com). | Any available commercial software package for bioinformatics and computational biology | |
| 26 | Final project: applying structural bioinformatics tools | Students apply tools/knowledge from laboratory 15–25 to their final project under instructor’s assistance. | | |

Each laboratory is 2-hr long and starts immediately after a 1-hr lecture on the topic of the exercise. Students complete a report at the end of each laboratory. The laboratory report consists of data presentation and analysis, and answers to posed questions. Required software is listed in the Required software resources column.
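The difference between the global and local alignments compared in laboratory 7 can be seen in a small dynamic-programming sketch. The code below is an illustration under simplifying assumptions and is not the EMBOSS implementation: it uses hypothetical match and mismatch scores with a linear gap penalty in place of a substitution matrix, and it returns only the optimal score rather than the alignment itself.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Optimal pairwise alignment score with a linear gap penalty.

    local=False fills the table as in Needleman-Wunsch (global alignment);
    local=True clamps every cell at zero and reports the best cell,
    as in Smith-Waterman (local alignment).
    """
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment pays for gaps at the sequence ends.
        for i in range(1, rows):
            table[i][0] = i * gap
        for j in range(1, cols):
            table[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = table[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, table[i - 1][j] + gap, table[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)
                best = max(best, cell)
            table[i][j] = cell
    return best if local else table[-1][-1]


# Toy sequences: the short one matches only the end of the long one.
print(alignment_score("AAAACGT", "CGT"))              # global: -5
print(alignment_score("AAAACGT", "CGT", local=True))  # local: 3
```

With these toy sequences the global score is pulled down by the four unmatched positions, while the local score reflects only the shared segment, which is the contrast the exercise asks students to observe.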


TABLE III
Student assessment criteria

| Component | Material covered | Percent of the final grade |
|---|---|---|
| Laboratory reports (24) | One report per laboratory task. Includes analysis of results and answers to posed questions. Students work in teams of two. | 60 |
| Final project: written | Individual projects. Student receives a code corresponding to a gene. Student has to select and apply bioinformatics tools to discover the identity and function of the gene, and the structure and function of its product. Typical genes assigned to students are plant and animal genes relevant to agriculture and animal care, and human genes involved in diseases. | 30 |
| Final project: oral presentation | Each student prepares and presents a 10 minute PowerPoint [17] talk explaining his/her strategy to the project and the results obtained. Attendance is required and the audience is encouraged to engage in discussions and ask questions. | 10 |
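The weights in Table III combine into a final grade as a weighted sum. The sketch below assumes that each component is scored on a 0 to 100 scale, which the table does not state; the component keys and the example scores are hypothetical.

```python
# Percent of the final grade per component, as listed in Table III.
WEIGHTS = {
    "laboratory_reports": 60,
    "final_project_written": 30,
    "final_project_oral": 10,
}


def final_grade(scores):
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the Table III components")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student: (60*90 + 30*80 + 10*70) / 100 = 85.0
print(final_grade({
    "laboratory_reports": 90,
    "final_project_written": 80,
    "final_project_oral": 70,
}))
```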

TABLE IV
Course surveys from Spring 2006 and Fall 2007 cohorts (45 students)

Percentage (%) of students responding with each score:

| No. | Statement | 5 | 4 | 3 | 2 | 1 | NA |
|---|---|---|---|---|---|---|---|
| | Confidence with bioinformatics skills | | | | | | |
| 1 | Before this course, I could effectively use bioinformatics tools such as the ones in the NCBI website | 6 | 18 | 24 | 18 | 35 | 0 |
| 2 | After this course, I understand how to effectively use bioinformatics tools | 71 | 24 | 6 | 0 | 0 | 0 |
| 3 | Overall, I have improved my computer skills | 41 | 47 | 6 | 6 | 0 | 0 |
| 4 | I have gained skills in using computational tools to study biological problems | 65 | 35 | 0 | 0 | 0 | 0 |
| 5 | I have acquired a better understanding of how bioinformatics tools are used in biomedical research | 53 | 47 | 0 | 0 | 0 | 0 |
| 6 | I am likely to apply the skills I acquired in this course in my professional life | 29 | 53 | 12 | 0 | 0 | 6 |
| 7 | I have a better understanding of the links between DNA-proteins disease | 41 | 47 | 6 | 6 | 0 | 0 |
| 8 | I increased my skills in viewing and interpreting 3D structures of proteins | 65 | 24 | 12 | 0 | 0 | 0 |
| | Course content | | | | | | |
| 9 | The laboratory tasks were appropriate in terms of difficulty level | 53 | 41 | 0 | 6 | 0 | 0 |
| 10 | The laboratory tasks were appropriate in terms of the time required to complete them | 59 | 41 | 0 | 0 | 0 | 0 |
| 11 | Too much time was spent with lectures | 0 | 0 | 41 | 35 | 24 | 0 |
| 12 | Covering a topic with a 1-hr lecture followed by two hours of practice helped me to better understand and retain its content | 29 | 65 | 6 | 0 | 0 | 0 |
| 13 | I would have preferred to have lectures on a separate day and time | 0 | 0 | 18 | 53 | 29 | 0 |
| 14 | I would have preferred to be evaluated by written exams | 0 | 0 | 18 | 35 | 47 | 0 |
| 15 | The laboratory tasks material was too detailed | 6 | 12 | 24 | 41 | 18 | 0 |
| 16 | The description of the final project was clear and easy to follow | 29 | 65 | 6 | 0 | 0 | 0 |
| 17 | I enjoyed the focus of the final projects | 47 | 41 | 6 | 0 | 0 | 6 |
| 18 | I enjoyed listening to my peers presenting their final projects to the class | 41 | 47 | 12 | 0 | 0 | 0 |
| | Team work | | | | | | |
| 19 | I would have liked to work more independently throughout the laboratories | 35 | 18 | 29 | 6 | 6 | 6 |
| 20 | I feel I would have learned more if I did not have to share a computer with my team mate | 29 | 29 | 18 | 12 | 6 | 6 |
| | Computer hardware and software | | | | | | |
| 21 | The computer facilities for this course were adequate | 18 | 59 | 6 | 6 | 6 | 6 |
| 22 | The software and web tools used in this course were adequate | 29 | 53 | 18 | 0 | 0 | 0 |
| 23 | Having 24-hr remote access to all software and materials of this course was very helpful | 82 | 18 | 0 | 0 | 0 | 0 |
| | Overall satisfaction | | | | | | |
| 24 | The instructor was effective | 82 | 18 | 0 | 0 | 0 | 0 |
| 25 | This course should be offered again | 82 | 18 | 0 | 0 | 0 | 0 |

Student responses are based on the following score scale: 5, strongly agree; 4, agree; 3, neither agree nor disagree; 2, disagree; 1, strongly disagree; NA, not applicable. Evaluations are completed online at any time following the midpoint of the course and are completely voluntary.

332 BAMBED, Vol. 36, No. 5, pp. 325–335, 2008

Applied Project at the End of the Course

For the final project, each student is assigned a code that uniquely identifies a nucleotide sequence. Students are expected to use a combination of the bioinformatics tools they learned in the laboratory exercises to: (a) identify the nucleotide sequence and the protein it encodes; (b) identify homologous sequences; (c) find information about the biological function of the protein; (d) find structural information; (e) use automated servers to build structural models for the protein, if no experimental structure is available; and (f) analyze the quality of the structure and correlations between structure and function of the protein.
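Part of step (a), going from a nucleotide sequence to the protein it encodes, can be illustrated with a short script. This is a minimal sketch, not the EMBOSS tooling used in the course: it assumes the standard genetic code, searches only the forward strand, and the function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from the first base until a stop codon or the end of the sequence."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def longest_orf_protein(dna):
    """Return the longest translation that starts at an ATG on the forward strand.

    Simplification: a reading frame that runs off the end of the sequence
    without a stop codon is still accepted.
    """
    dna = dna.upper()
    best = ""
    start = dna.find("ATG")
    while start != -1:
        candidate = translate(dna[start:])
        if len(candidate) > len(best):
            best = candidate
        start = dna.find("ATG", start + 1)
    return best


# Invented toy sequence, for illustration only.
print(longest_orf_protein("CCATGGCTTAAGG"))
```

The resulting protein sequence is what a student would then carry into the homology search and structure steps (b) through (f).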

Guidelines for executing the final project are posted on Blackboard. They include instructions on the layout of the written report (paper-style with standard sections such as abstract, methods, results and discussion, and conclusions) and a general strategy for developing the projects. The general strategy includes steps that are not applicable to every project. Students are expected to decide which steps are relevant for their particular project. They are also instructed to devise their own strategy according to the focus they want to give to their project. For example, a student planning to attend medical school may feel motivated to focus on the human disease aspects of his/her assigned gene. This focus will then determine which type of information needs to be obtained and which databases and programs will be used.

Grading is based on style and required sections; clear definition of goals and strategy; justification for the choice of methods and tools; proper description of procedures, parameters, and results; proper analysis of results; use of at least five bioinformatics tools; at least one attempt at sequence comparison; and at least one attempt at obtaining a three-dimensional structure for the protein encoded by the query gene.

Oral Presentation of Final Project

Each student is required to prepare and present a 10-minute PowerPoint [16] talk explaining his/her strategy for executing the course project and the results obtained. Attendance at presentations is required from all students, and the audience is encouraged to engage in discussions and ask questions. Guidelines for preparing and presenting a short talk are posted on Blackboard [13].

The oral presentations are scheduled before the deadline for the written report. This way, students can incorporate the feedback they receive from their peers and from the instructor into their written report.

Grading is based on general aspects such as pace, voice, and use of visual aids; content organization and logical flow; command of subject; and handling of questions from the audience.

COURSE EVALUATION

The course is assessed using a survey comprising 25 Likert-scale questions. The questions were designed to assess students' satisfaction with the following aspects of the course: (a) confidence with acquired bioinformatics skills; (b) course content; (c) course structure (team work, grading system, lecture/laboratory format); (d) course instruction (instructor's effectiveness, course materials, etc.); (e) computer hardware and software; and (f) overall satisfaction. This survey is in addition to standard instructor evaluations.

The combined results of the Spring 2006 and Fall 2007 assessments, shown in Table IV, are very encouraging. All (100%) students surveyed strongly agreed/agreed that the instructor was effective (statement 24) and that this course should be offered again (statement 25). The vast majority of students (93%) were highly satisfied or satisfied with the course's structure, content, and instruction model (statements 2, 9, 10, 12, and 16). Most of them (82% strongly agree/agree) believe that the skills they learned will be useful in their professional lives (statement 6), all (100% strongly agree/agree) believe that they acquired a better understanding of the application of bioinformatics in the biomedical field (statement 5), and all (100% strongly agree/agree) feel confident that they can use these skills (statement 4). Students also responded very positively (94% strongly agree/agree) to the 24-hour remote access provided (statement 23).
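The "strongly agree/agree" figures quoted above are the sum of the two highest score columns for each statement in Table IV. The sketch below reproduces that arithmetic for a few statements, assuming that scores 5 and 4 are the "strongly agree" and "agree" columns, which is how the paragraph above reads them; the names `RESPONSES` and `percent_agree` are hypothetical.

```python
# Percentages copied from Table IV, in column order 5, 4, 3, 2, 1, NA.
RESPONSES = {
    4: (65, 35, 0, 0, 0, 0),
    5: (53, 47, 0, 0, 0, 0),
    6: (29, 53, 12, 0, 0, 6),
    24: (82, 18, 0, 0, 0, 0),
    25: (82, 18, 0, 0, 0, 0),
}


def percent_agree(row):
    """Sum of the two highest score columns (assumed: strongly agree + agree)."""
    return row[0] + row[1]


for statement, row in sorted(RESPONSES.items()):
    print(f"statement {statement}: {percent_agree(row)}% strongly agree/agree")
```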

The Spring 2006 survey identified physical space and laboratory computer workstations (slow Pentium III PCs with insufficient memory were used in that instance) as the main areas for improvement. In Fall 2007, students had access to better equipment (dual-core laptops with 2 GB of memory and 17-inch LCD screens) and physical facilities, which is reflected in their answers to statement 21, "The computer facilities for this course were adequate" (56% strongly agree/agree and 31% disagree/strongly disagree for Spring 2006, compared with 82% and 0%, respectively, in Fall 2007). Data for individual surveys are not shown in this work but are available upon request.

The results of the two course assessments performed indicate that this is an effective course that students enjoy taking. These results, combined with personal feedback received from students, are the motivation for writing this article.

RECOMMENDED BOOKS

Finding a comprehensive bioinformatics book is not always an easy task. Bioinformatics is a young field and, because of its interdisciplinary nature, can be approached from many different angles. For this reason, the course was designed to be self-contained, and all the information necessary for the students to succeed in their course work is provided in the lectures or as electronic links to online resources. This author opted to recommend multiple books [17–20] and to leave to the students the choice of selecting one according to their personal preferences and aptitudes.

SYSTEM REQUIREMENTS

Software

Many of the software tools required for the laboratory tasks are available online (their URLs are found in Table II).

Only three applications (VNC viewer [21, 22], PuTTY [23], and WinSCP [24]) need to be installed on the laptops the students use in the laboratory. All other software is installed and runs locally on the Linux server machine and is open source and/or freely distributed for academic use. The use of a server machine makes software maintenance and updating trivial.

Nucleotide sequence analysis and manipulation is performed using the open-source software package EMBOSS [25] through a freely distributed Java-based GUI called JEMBOSS [26]. Amino acid sequence analysis and manipulation use EMBOSS/JEMBOSS and Clustal [27]. Phylogenetic trees are created using the PHYLIP [28] package.

Protein structure analysis and visualization use the molecular visualization program Visual Molecular Dynamics (VMD) [29]. Quality checks for protein structures use Procheck [30] and WhatCheck [31]. Both programs read PDB files, calculate torsion angles, create Ramachandran plots, and check other structural parameters.
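To make the torsion-angle step concrete, the sketch below computes a signed dihedral angle from four atomic positions, which is the quantity plotted on the two axes of a Ramachandran plot. It is an illustrative implementation of the standard atan2 formula, not code from Procheck or WhatCheck; the function name and the test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond.

    For the backbone angle phi of residue i the four atoms are
    C(i-1), N(i), CA(i), C(i); for psi they are N(i), CA(i), C(i), N(i+1).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Invented coordinates: an eclipsed (cis) and an anti (trans) arrangement.
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```

Applying this function to every residue of a PDB file gives the (phi, psi) pairs that a Ramachandran plot displays.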

A remote desktop environment is created using VNC, cross-platform remote-control software that connects one computer desktop (the VNC server) to other desktops running a VNC viewer. There are several free VNC server packages available for Linux [32] and many free VNC viewers available for Microsoft Windows and Mac OS X (e.g., VNCviewer [21] for Windows and Chicken of the VNC [22] for Mac OS X). Each student is given a user account on the Linux server machine, and an instance of the VNC server is run for that account. Using the VNC viewer on their laptops, students can then connect to their own VNC server instance.
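One way to keep track of the per-student VNC server instances is a small lookup table from account to display number. The sketch below is an assumption about how such bookkeeping could be done, not a description of the author's setup; it relies only on the VNC convention that display :N listens on TCP port 5900 + N, and the account names and host name are hypothetical.

```python
VNC_BASE_PORT = 5900  # by VNC convention, display :N listens on TCP port 5900 + N


def assign_displays(usernames, first_display=1):
    """Give each student account its own display number, in sorted order."""
    return {user: first_display + offset
            for offset, user in enumerate(sorted(usernames))}


def viewer_targets(usernames, host, first_display=1):
    """Map each account to the host:port string a VNC viewer would connect to."""
    displays = assign_displays(usernames, first_display)
    return {user: f"{host}:{VNC_BASE_PORT + display}"
            for user, display in displays.items()}


# Hypothetical account names and host name, for illustration only.
students = ["student02", "student01", "student03"]
for user, target in viewer_targets(students, "bioinfo.example.edu").items():
    print(user, target)
```

A fixed mapping of this kind lets each student reconnect to the same desktop session from any laptop.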

Text and spreadsheet editing use the programs Writer and Calc, which are part of the free OpenOffice package [33] included with most Linux distributions. Image viewing and manipulation use ImageMagick [34], which is also included with most Linux distributions.

PuTTY [23], a free SSH client for Windows platforms, is used for logging into the Linux server from another computer to perform quick tasks such as restarting VNC servers for the students. WinSCP [24] is used for transferring files to and from the server machine.

Hardware

Our implementation of this course uses an eight-core (dual quad-core) Dell PowerEdge 1950 server running Linux with 8 GB of memory and 320 GB of disk space. This server is capable of handling at least 13 concurrent students actively using the scientific applications required for performing a laboratory exercise. Students use Dell Inspiron 9400 laptops to remotely connect to the server and perform their laboratory exercises. These laptops have Intel dual-core 2.0 GHz CPUs, 2 GB of memory, 80 GB of disk space, 17-inch LCD screens, and built-in wireless adapters.

DISTRIBUTION OF COURSE MATERIALS

Please contact the author.

CONCLUSIONS

The development of a bioinformatics course is not an easy task. As an emerging, intrinsically multidisciplinary field, bioinformatics has no standard content established at present. Moreover, it can be approached from as many different angles as there are fundamental disciplines forming the core of bioinformatics. A course intended for biological sciences undergraduates can, in principle, be designed very differently from a course for computer science majors. So where do we start the bioinformatics education of undergraduate students, regardless of major? A course such as the one presented in this study is general enough that students from different majors can take it together, which helps to emphasize the multidisciplinary and collaborative nature of the field. If implemented as an introductory course to a bioinformatics series, the courses that follow in the series can be tailored to a particular "origin" field. For example, a computer science department may include algorithms for bioinformatics, database design, and Linux and shell scripting as follow-up courses, whereas a biological sciences department may include genomes, microarray data analysis, and introduction to programming for life sciences as follow-ups.

The course presented here is especially attractive to community colleges and small universities with limited resources for software, hardware, and information technology support. All software used in the course is freely available for academic use. The course requires only one server machine running Linux, with all the access-point machines being standard PCs running Windows (including Vista) or Macs running Mac OS X. There is no need for local installation of bioinformatics software, because all computation is done on the server machine. Even departments with no computer laboratories can offer this course by requiring that students bring their own laptops, which most of them have these days. Given the portability of the laptops, the course can also easily move from room to room, with the only requirements being wireless access and adequate network bandwidth to the Linux server. The minimum investment required from the institution is one computer with enough memory, CPU power, and disk space to run the course; one faculty member who is knowledgeable in the field or is willing to learn the topics suggested in this article; and one computer technician who can set up the Linux server and install and maintain software (most bioinformatics faculty can perform these duties, if needed).

It is the author's hope that this study will facilitate the implementation of bioinformatics courses and help disseminate this fascinating field.

Acknowledgment: The author thanks Darryl Willick from the MSC center at the California Institute of Technology for valuable technical advice. The author thanks the students from the Spring 2006 and Fall 2007 classes for their feedback, which was critical to improving the quality and effectiveness of this course.

REFERENCES

[1] U.S. Bureau of Labor Statistics. http://www.bls.gov.
[2] R. B. Altman (1998) A curriculum for bioinformatics: The time is ripe, Bioinformatics 14, 549–550.

[3] T. Doom, M. Raymer, D. Krane, O. Garcia (2003) Crossing the interdisciplinary barrier: A baccalaureate computer science option in bioinformatics, IEEE Trans. Educ. 46, 387–393.
[4] P. A. Pevzner (2004) Educating biologists in the 21st century: Bioinformatics scientists versus bioinformatics technicians, Bioinformatics 20, 2159–2161.
[5] S. Ranganathan (2005) Bioinformatics education: Perspectives and challenges, PLoS Comput. Biol. 1, e52.
[6] M. Gerstein, D. Greenbaum, K. Cheung, P. L. Miller (2007) An interdepartmental Ph.D. program in computational biology and bioinformatics: the Yale perspective, J. Biomed. Inform. 40, 73–79.
[7] S. Cattley (2004) A review of bioinformatics degrees in Australia, Brief. Bioinform. 5, 350–354.
[8] B. Krilowicz, W. Johnston, S. B. Sharp, N. Warter-Perez, J. Momand (2007) A summer program designed to educate college students for careers in bioinformatics, CBE Life Sci. Educ. 6, 74–83.
[9] N. B. Centeno, J. Villa-Freixa, B. Oliva (2003) Teaching structural bioinformatics at the undergraduate level, Biochem. Mol. Biol. Educ. 31, 386–391.
[10] P. B. Heidorn, C. L. Palmer, D. Wright (2007) Biological information specialists for biological informatics, J. Biomed. Discov. Collab. 2, 1.
[11] D. Counsell (2003) A review of bioinformatics education in the UK, Brief. Bioinform. 4, 7–21.
[12] J. E. Honts (2003) Evolving strategies for incorporation of bioinformatics within the undergraduate cell biology curriculum, Cell Biol. Educ. 2, 233–247.
[13] Blackboard. http://www.blackboard.com.
[14] MediaWiki: Written for Wikipedia. www.mediawiki.org/.
[15] TWiki: the Open Source Wiki for the Enterprise. http://www.twiki.org/.
[16] PowerPoint. http://office.microsoft.com/powerpoint.
[17] M. Zvelebil, J. Baum (2007) Understanding Bioinformatics, Garland Science, New York. ISBN 978 0 8153 4024 9.
[18] A. D. Baxevanis, B. F. F. Ouellette, Eds. (2005) Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd ed., Wiley, New York. ISBN 0471478784.
[19] T. A. Brown (2006) Genomes, 3rd ed., Garland Science, New York. ISBN 0815341385.
[20] C. Gibas, P. Jambeck (2001) Developing Bioinformatics Computer Skills, O'Reilly. ISBN 1-56592-664-1.
[21] VNC viewer for Windows. http://www.realvnc.com.
[22] VNC viewer for Mac OS X. http://sourceforge.net/projects/cotvnc/.
[23] PuTTY. http://www.putty.org.
[24] WinSCP. http://winscp.net.
[25] EMBOSS. http://emboss.sourceforge.net/.
[26] JEMBOSS. http://emboss.sourceforge.net/Jemboss/.
[27] Clustal. http://www.clustal.org.
[28] PHYLIP. http://evolution.genetics.washington.edu/.
[29] VMD. http://www.ks.uiuc.edu/Research/vmd.
[30] Procheck. http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html.
[31] WhatCheck. http://swift.cmbi.kun.nl/swift/whatcheck.
[32] VNC servers. http://www.realvnc.com, http://www.tightvnc.com, http://xf4vnc.sourceforge.net.
[33] OpenOffice package. http://www.openoffice.org.
[34] ImageMagick. http://www.imagemagick.org.
