
IMPLEMENTING A WEB-BASED INTRODUCTORY BIOINFORMATICS COURSE FOR NON-BIOINFORMATICIANS THAT INCORPORATES PRACTICAL EXERCISES

Antony T. Vincent1, Yves Bourbonnais1, Jean-Simon Brouard1, Hélène Deveau1, Arnaud Droit2, Stéphane M. Gagné1, Michel Guertin1, Claude Lemieux1, Louis Rathier3, Steve J. Charette1 and Patrick Lagüe1*

1Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des sciences et de génie, Université Laval, Québec (Québec), Canada

2Centre Hospitalier de l’Université Laval, Faculté de Médecine, Université Laval, Québec (Québec), Canada

3Équipe de soutien informatique, Faculté des sciences et de génie, Université Laval, Québec (Québec), Canada

Correspondence:

Corresponding Author

[email protected]

SUPPLEMENTARY MATERIAL


TABLE OF CONTENTS

Table S1. Topics, tools and subjects covered in the BIF-1901 course

Brief descriptions of the assignments

BIF-1901 Syllabus (translated English version)

BIF-1901 Plan de cours (original French version)


Table S1. Topics, tools and subjects covered in the BIF-1901 course.

Topic | Tools | Subjects covered

Informatics
GNU/Linux | NoMachine, FileZilla | Bash command line; Remote access to a server

Biological sequences
The databases | GenBank, PDB, UCSC, ExPASy | Creation of databases; Type of data; Specific vs. general databases; Cross information between databases; Ontology
Sequencing and assembly | Greedy algorithm: SSAKE, VCAKE, SHARCGS; Graph algorithm: Velvet, ABySS, Ray; Tablet | Sanger sequencing; Human genome project; Next-generation sequencing; Sequence assembly (with and without a reference); Quality score (PHRED)
Sequence alignments | LALIGN, BLAST, Muscle, Jalview | Homology; Global and local alignments; Substitution matrices and gap score; Multiple alignment
Databases of model organisms | Saccharomyces Genome Database, Dictybase, Pseudomonas Genome database, Mouse Genome Informatics, SignalP, Phobius, TMHMM, NetPhos | Using online resources on model organisms; Gene ontology (GO); Determining motifs and domains of a protein; Predict the cellular localization of a protein; Predict the post-translational modifications of a protein
Phylogenetics | Muscle, Jalview, Gblocks, ClustalX, PhyML, NJplot | Phylogenetic vocabulary; Rooted and unrooted trees; Multiple alignment; Filtration; Phylogenetic models; Phylogenetic methods

Structural bioinformatics
3D structure of protein | PyMOL | Understand the structural determinants of a protein; Visualizing the 3D structure of a protein; Generate a publishing grade quality image
Protein structure prediction | chofas, SCRATCH, Jpred 3, Modeller, PyMOL | Crystallography; Nuclear magnetic resonance; Secondary structure prediction; 3D structure prediction
Molecular modeling and docking | Poseview, PyMOL, Autodock Tools, Autodock Vina, Molinspiration | Energy minimization; Conformational search; Molecular dynamics; Molecular docking; Understanding the notion of force field

Links to the tools listed in Table S1:

NoMachine: https://www.nomachine.com/
FileZilla: https://filezilla-project.org/
GenBank: https://www.ncbi.nlm.nih.gov/genbank/
PDB: http://www.rcsb.org/pdb/home/home.do
UCSC: https://genome.ucsc.edu/
ExPASy: https://www.expasy.org/
SSAKE: https://github.com/warrenlr/SSAKE
VCAKE: https://sourceforge.net/projects/vcake/
SHARCGS: http://sharcgs.molgen.mpg.de/
Velvet: https://www.ebi.ac.uk/~zerbino/velvet/
ABySS: http://www.bcgsc.ca/platform/bioinfo/software/abyss
Ray: http://denovoassembler.sourceforge.net/
Tablet: https://ics.hutton.ac.uk/tablet/
LALIGN: http://www.ebi.ac.uk/Tools/psa/lalign/
BLAST: https://blast.ncbi.nlm.nih.gov/Blast.cgi
Muscle: http://www.drive5.com/muscle/
Jalview: http://www.jalview.org/
Saccharomyces Genome Database: http://www.yeastgenome.org/
Dictybase: http://dictybase.org/
Pseudomonas Genome database: http://www.pseudomonas.com/
Mouse Genome Informatics: http://www.informatics.jax.org/
SignalP: http://www.cbs.dtu.dk/services/SignalP/
Phobius: http://phobius.sbc.su.se/
TMHMM: http://www.cbs.dtu.dk/services/TMHMM/
NetPhos: http://www.cbs.dtu.dk/services/NetPhos/
Gblocks: http://molevol.cmima.csic.es/castresana/Gblocks_server.html
ClustalX: http://www.clustal.org/
PhyML: http://www.atgc-montpellier.fr/phyml/
NJplot: http://doua.prabi.fr/software/njplot
PyMOL: https://www.pymol.org/
chofas: http://fasta.bioch.virginia.edu/fasta_www2/fasta_www.cgi?rm=misc1&pgm=cho
SCRATCH: http://scratch.proteomics.ics.uci.edu/
Jpred 3: http://www.compbio.dundee.ac.uk/jpred/
Modeller: https://salilab.org/modeller/
Poseview: http://www.zbh.uni-hamburg.de/en/research/research-group-for-computational-molecular-design/server/poseview.html
Autodock Tools and Autodock Vina: http://autodock.scripps.edu/
Molinspiration: http://www.molinspiration.com/


BRIEF DESCRIPTIONS OF THE ASSIGNMENTS


Assignment 1 (Module 1: Introduction to Bioinformatics, Software and Linux)

The main goals of the first assignment are to (1) assess the students' knowledge of basic Bash commands, and (2) ensure that they all have the computer skills and materials needed to complete the course. Students have to use a bioinformatics server running the GNU/Linux operating system through the NoMachine software installed on their computer. Before this assignment became part of the course, some students waited until shortly before the first homework deadline to access the server for the first time. Some of these students then ran into technical issues that dramatically slowed their learning and their ability to complete the assignment within the required timeframe.

To complete this assignment, students should connect to the server, open a terminal window and take screen captures of their desktop showing the commands and the results of the following Bash commands, which are introduced in a clip on the course’s portal:

1) Go to the “Desktop” folder, and use the command “pwd” to print the folder’s path.

2) Create a folder named after your login name, and list the files in the “Desktop” folder.

3) Create an empty file named “homework1.txt” in the “Desktop” folder, and list the contents of the “Desktop” folder.

4) Copy the file “homework1.txt” from the “Desktop” folder to the folder named after your login name, and list the contents of this folder.

5) Remove the file “homework1.txt” from the “Desktop” folder and list the contents of the “Desktop” folder.

6) Go to the folder named after your login name and rename the file “homework1.txt” to “loginname-homework1.txt”, where loginname is your login name.

7) Print all the running processes on the server (students have to find this command by themselves as it is not taught in the course).

8) Print the history of the commands used so far in the assignment.

9) Print a human-readable file listing of the folder named after your login name.

10) Go back to the “Desktop” folder and create an archive (a “tar” file) of the folder named after your login name and all the screen captures created in the previous steps. Deposit this archive file in the appropriate section of the course’s portal.
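
For readers who want to see the file-handling logic of steps 2 to 6 and 10 in one place, the following is a minimal Python sketch of the same operations. It is not part of the course material, which expects Bash commands typed in a terminal. The function name run_file_steps and the login name "jdoe" are hypothetical, and the sketch works inside a temporary directory rather than a real “Desktop” folder.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def run_file_steps(desktop: Path, login: str) -> Path:
    """Mirror steps 2-6 and 10 of the assignment inside `desktop`."""
    user_dir = desktop / login
    user_dir.mkdir()                                  # step 2 (Bash: mkdir)
    homework = desktop / "homework1.txt"
    homework.touch()                                  # step 3 (Bash: touch)
    shutil.copy(homework, user_dir / homework.name)   # step 4 (Bash: cp)
    homework.unlink()                                 # step 5 (Bash: rm)
    renamed = user_dir / f"{login}-homework1.txt"
    (user_dir / homework.name).rename(renamed)        # step 6 (Bash: mv)
    archive = desktop / f"{login}.tar"
    with tarfile.open(archive, "w") as tar:           # step 10 (Bash: tar)
        tar.add(user_dir, arcname=login)
    return archive


# Work in a throwaway directory; "jdoe" is a made-up login name.
with tempfile.TemporaryDirectory() as tmp:
    desktop = Path(tmp) / "Desktop"
    desktop.mkdir()
    archive = run_file_steps(desktop, "jdoe")
    with tarfile.open(archive) as tar:
        archived_names = sorted(tar.getnames())
    print(archived_names)
```

The screen captures, the process listing and the command history (steps 7 and 8) are specific to the interactive shell and are left out of this sketch.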

Assignment 2 (Module 4: Sequence Alignment and Database Search (1st part) and Module 5: Sequence Alignment and Database Search (2nd part))

This assignment is a practical introduction to protein dataset retrieval and sequence alignment. To complete this assignment, the students have to:

1) Choose a dataset of curated sequences from the NCBI Protein Clusters database (https://www.ncbi.nlm.nih.gov/proteinclusters). The dataset should contain between 15 and 30 sequences, each between 300 and 600 amino acids in length. Since this dataset will also be used for the third assignment on phylogenetic analysis and trees, the sequences should come from taxonomically diverse organisms (several genera of bacteria, for example).

2) Create a FASTA file from the dataset of the previous step, and manually edit this file to add a short description to the title of each sequence.

3) Transfer the FASTA file to the bioinformatics server, and perform a multiple sequence alignment using the MUSCLE software.

4) Open the file created in step 3 in JALVIEW, and color the residues of the alignment having an identity of at least 80% using the ClustalX color scheme. The result should be exported to a file in EPS format and deposited in the appropriate section of the course’s portal.
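
The dataset criteria of step 1 (15 to 30 sequences, each 300 to 600 residues long) can be checked automatically before running the alignment. The sketch below is one possible way to do so in Python and is not part of the course material. The function names parse_fasta and check_dataset are hypothetical, and the two toy sequences are invented for illustration.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Return {title: sequence} from FASTA-formatted text."""
    records: Dict[str, str] = {}
    title = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            title = line[1:]
            records[title] = ""
        elif title is not None:
            records[title] += line
    return records


def check_dataset(records: Dict[str, str], min_seqs: int = 15, max_seqs: int = 30,
                  min_len: int = 300, max_len: int = 600) -> List[str]:
    """List every way the dataset departs from the criteria of step 1."""
    problems = []
    if not min_seqs <= len(records) <= max_seqs:
        problems.append(f"{len(records)} sequences (expected {min_seqs}-{max_seqs})")
    for title, seq in records.items():
        if not min_len <= len(seq) <= max_len:
            problems.append(f"{title}: {len(seq)} residues (expected {min_len}-{max_len})")
    return problems


# Toy input: one sequence of acceptable length and one that is far too short.
example = ">seq1 toy protein\n" + "M" * 350 + "\n>seq2 too short\nMKV\n"
problems = check_dataset(parse_fasta(example))
print(problems)
```

An empty list means that the dataset satisfies the size and length criteria. Taxonomic diversity still has to be judged by the student.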

Assignment 3 (Module 7: Phylogenetic Analysis and Module 8: Building Phylogenetic Trees)

In this assignment, each student has to construct a maximum likelihood tree using the protein dataset that was selected for the second assignment. Reusing the same dataset helps keep students interested in their results, since they chose the sequences themselves and can concretely see their work evolve. The robustness of the inferred tree is also evaluated by analysis of bootstrap replicates. Detailed instructions are provided in a clip deposited on the course’s portal. The assignment involves the following steps:


1) Removal, using GBLOCKS, of the ambiguously aligned regions in the sequence alignment that was generated in the second assignment.

2) Conversion of the filtered FASTA file to the PHYLIP format using the CLUSTALX program.

3) Inference of a maximum likelihood tree and evaluation of its robustness using the PHYML program.

4) Visualization and editing of the phylogenetic tree, and production of an illustration using the interactive NJPLOT program.

Students should deposit on monPortail the original alignment and the filtered one, the resulting inferred tree, and a short text discussing the resulting topology.
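
The bootstrap analysis mentioned above rests on a simple idea: each replicate is a new alignment of the same length, built by drawing columns of the original alignment at random with replacement. PHYML performs this resampling internally, so the Python sketch below is only an illustration of the principle, not the program's implementation. The function name bootstrap_replicate and the three toy sequences are hypothetical.

```python
import random
from typing import Dict


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw the same column indices for every sequence so columns stay intact.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


# Toy alignment, invented for illustration only.
toy_alignment = {"taxonA": "MKV-LT", "taxonB": "MRVALT", "taxonC": "MKIALS"}
replicate = bootstrap_replicate(toy_alignment, random.Random(0))
original_columns = set(zip(*toy_alignment.values()))
replicate_columns = list(zip(*replicate.values()))
print(replicate)
```

A tree is then inferred from each replicate, and the support of a branch is the fraction of replicate trees in which that branch appears.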

Assignment 4 (Module 9: 3D Structure of Proteins and Module 10: Predicting the Structure of Proteins)

This assignment is a practical introduction to PyMOL and the preparation of publication-quality images, and to the prediction of protein 3D structures. To complete this assignment, the students, in teams of 2 or 3, have to:

1) Build a publication-quality (ray-traced) image of the protein PDB 3RGK using PyMOL.

2) For a given amino acid sequence (UniProtKB R4QRB9), find in the PDB the homologous proteins with a known 3D structure. For each homolog, provide the PDB code, the name of the protein, and the percentages of identity and similarity. Also provide the alignments.

3) Predict the secondary structure of the amino acid sequence using three different methods.

4) Predict the 3D structure by homology modeling using the PDB 3VNF as a template and MODELLER on the bioinformatics server.

5) Using PyMOL, build a publication-quality image that compares the predicted structure with the template structure. Save the PyMOL session used to build the image in a PSE file.

6) Deposit the files in the appropriate section of the course’s portal.
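
The percentage of identity requested in step 2 is normally read from the output of the alignment tool, but the calculation itself is short. The Python sketch below assumes one common convention, in which the number of identical aligned residues is divided by the total number of alignment columns, gaps included. Other conventions use a different denominator. The function name percent_identity and the two toy sequences are hypothetical, and similarity is left out because it depends on the substitution matrix chosen.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as a match only if both residues agree and are not gaps.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


# Toy pairwise alignment: 5 identical columns out of 7.
identity = percent_identity("MKV-LTA", "MRVALTA")
print(f"{identity:.1f}%")
```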


Assignment 5 (Module 11: Introduction to Molecular Modeling and Module 12: Molecular Docking)

For this assignment, the students are required to complete two docking simulations using Autodock Vina and its PyMOL plugin. The different steps of the simulations are presented in a series of clips on the course’s portal. Each team of 2 or 3 students performs its docking simulations on the bioinformatics server, following the steps presented in the clips but using a different protein-ligand complex from the PDB database, and discusses the results in a report.

For example, a student team may be assigned the protein-ligand complex PDB 10GS (human glutathione S-transferase P1-1 complexed with the ligand TER117). The students are required to:

1) Present the protein-ligand complex in the Introduction of the report, including the biological function of the protein and the nature of the ligand (inhibitor, natural substrate, cofactor, etc.).

2) Identify the ligand in the complex and determine the molecular interactions between the protein and the ligand. A PyMOL image and a table reporting the interactions are included in the report.

3) Prepare the protein and the ligand for docking using PyMOL.

4) Determine the docking parameters using the Vina PyMOL plugin and run the docking simulation.

5) Modify the ligand using PyMOL, with the expectation of increasing the ligand’s affinity for the protein.

6) Run a docking simulation for the modified ligand.

7) Include PyMOL images and a table of the docking results in the report.

8) Include in the report a discussion of the docking results with respect to the protein-ligand interactions identified in step 2 and the chemical modifications of step 5.
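
Among the docking parameters of step 4, the search box is usually defined by a center and a size along each axis. In the course these values are set interactively in the PyMOL plugin. The Python sketch below shows one possible way to derive them from the coordinates of the bound ligand, assuming a box that encloses the ligand plus a margin. The function name docking_box, the 5 Å padding and the toy coordinates are assumptions made for illustration. The keys center_x to size_z follow the parameter names of the Autodock Vina configuration file.

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def docking_box(ligand_atoms: List[Coord], padding: float = 5.0) -> Dict[str, float]:
    """Box enclosing the ligand atoms, enlarged by `padding` on every side."""
    if not ligand_atoms:
        raise ValueError("at least one atom coordinate is required")
    box: Dict[str, float] = {}
    for index, axis in enumerate("xyz"):
        values = [atom[index] for atom in ligand_atoms]
        lo, hi = min(values), max(values)
        box[f"center_{axis}"] = (lo + hi) / 2
        box[f"size_{axis}"] = (hi - lo) + 2 * padding
    return box


# Toy ligand coordinates in angstroms, invented for illustration only.
toy_ligand = [(10.0, 2.0, -4.0), (14.0, 6.0, -1.0), (12.0, 3.0, -2.0)]
box = docking_box(toy_ligand)
config_text = "\n".join(f"{key} = {value:.3f}" for key, value in box.items())
print(config_text)
```

A larger padding lets the ligand explore more of the binding site at the cost of a longer search, so the value should be adapted to each complex.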


BIF-1901 SYLLABUS (TRANSLATED ENGLISH VERSION)


SYLLABUS

BIF-1901: Introduction to Bioinformatics and Bioinformatics Tools

Faculty of Science and Engineering

Department of Biochemistry, Microbiology and Bioinformatics

Mode of instruction: Online course

Credits: 3

Note: This syllabus is a translation of the essential sections of the original lesson plan of the course, available online at www.ulaval.ca and provided as supplementary material.

Description

The course aims to introduce students to the various fields of application of bioinformatics. The course emphasizes learning the main tools of bioinformatics related to:

• databases of sequences and structures

• methods of sequencing and assembling genomes

• analysis and alignment of amino acid sequences

• phylogenetic analysis

• modelling the structure of proteins from their sequence

• modelling protein-ligand interactions (e.g., antibiotic, substrate, inhibitor) from molecular docking

The students are also introduced to the concepts of systems biology.

This course is offered online. For more information, see the course page at www.distance.ulaval.ca.

Objectives

At the end of the course, students will be able to:

• Describe the major fields of application and challenges in bioinformatics

• Effectively retrieve information from biological databases and understand their importance in bioinformatics

• Understand the main theoretical and practical aspects of:

o aligning and assembling DNA and protein sequences

o similarity searches in databases

o phylogenetics

o prediction of the 3D structure of proteins

o molecular modelling

• Use various specialized resources to characterize the patterns and domains of a protein, its location in the cell, and its likely post-translational modifications

• Compare the experimental methods used to determine the 3D structure of proteins at the atomic level with the tools for predicting the structure of proteins

• Understand the importance of modelling and molecular docking in the study of biological molecules


Educational approach: a note to students

This course is designed according to a pedagogical approach specific to online learning. The teaching materials and the method allow students to adopt a relatively autonomous learning approach. You can manage your own study time and are responsible for your own learning.

The person responsible for the course will remain available to support you throughout the session. The role of this person is to facilitate the learning conditions and to help you in your approach so that you achieve the objectives of the course. You can communicate with this person by various means: send an e-mail for more personal questions, and use the forum for issues of general interest that will benefit the entire class. The modalities of supervision (response time, availability, etc.) are further described in the Lesson Plan.

The course website and the textbook (Essential Bioinformatics) contain all the teaching materials required to take the online course, including booklets, lectures, demonstrations, exercises, and anything else you might need. Each week, you are invited to consult the Content and Activities tab of the module for the description of the planned learning and evaluation activities. In general, the schedule proposed in the Content and Activities section is flexible and can be adapted to your own schedule within the week. Online training allows you to learn at your own pace; however, by adopting a regular learning schedule from the beginning of the course, you will be able to benefit from regular feedback from the person providing supervision (for example, plan to do your coursework at the beginning of each week to allow time to ask questions and resolve difficult content questions). You remain, of course, the only person who manages your schedule, but you must commit to completing the homework and summative evaluations at the prescribed times (see the Evaluations and Results tab).

For the duration of the session, you will have access to a bioinformatics server dedicated to the needs of the course. This server is hosted at the Faculty of Science and Engineering at IP XXX.XXX.XXX.XXX and runs on Linux. To complete the assignments, you will need to log in remotely to the server and use some of the bioinformatics programs installed there. All the information needed to connect and work properly with the server will be presented in Module 1.

The expected duration of the course is 15 weeks. It is divided into 12 modules, each tackling a specific theme. Typically, each module has a duration of one week. The amount of work required to complete the modules and the evaluations is about 135 hours. On average, the weekly workload is about 9 hours, but some modules are longer and others are shorter. To access the modules, go to the Content and Activities tab from the course website.

In each module, you will find the following information:

Content tab

• Introduction

• Specific objectives

Activities tab

• Learning activities: instructions detailing the work to be done for a given module

• Learning resources: mandatory instructional material (text, videos, clips, etc.)

Complementary tab

• For each module, links to interesting website addresses are offered.

Summative evaluation tab

• This tab will only be available for the modules where you need to complete a summative evaluation. It will contain hyperlinks and evaluation information.

The Student Aid Center offers advice on how to succeed academically. They can help you improve your learning strategies, help with basic content, and help you in the management of your study time. See https://www.aide.ulaval.ca/cms/site/aide.

Terms and conditions

The feedback provided by the supervisor can take different routes. This course relies on two means of coaching: e-mail and discussion forums. It is important to be aware that responses to e-mail questions will not be instantaneous. In this course, the supervisor will reply within two working days. In order to avoid delays, it is recommended that you send e-mail only for personal questions, and use the forum for general content questions. Please be clear and explicit in your questions and comments (e.g., specify document names and references).

In addition, you can use the discussion forums to discuss various content issues with other students. As you study remotely, the forum is a tool that allows you to converse with your colleagues and with the person providing the supervision. In this course, there will be three types of forums:

• the forums specific to each module, where you can discuss and ask questions about the content of each module

• the General Questions forum, where you can ask questions about the administrative aspects of the course

• the Technical Aspects forum, where you can ask questions about the connection to the course server, the transfer of files or other technical aspects important to the success of the course

To facilitate timely responses to your needs and the management of the forums, be sure to ask your questions in the right section. Be explicit in the title of your messages to get the best conversations and responses.

We are committed to answering or validating your responses within two business days. Face-to-face support will be available on Monday afternoons on campus.

Course Goal

The aim of this course is to introduce the student to the vocabulary and the theory of bioinformatics and to provide them with hands-on experience with specialized bioinformatics tools and programs.

Introduction

The course BIF-1901: Introduction to bioinformatics and its tools aims to introduce students to the various fields of application of bioinformatics. The course focuses on learning the main tools of bioinformatics, including:

• sequencing databases

• genome sequencing and assembly methods

• the alignment of nucleic acid and protein sequences

• modelling of protein structures, based on their sequences

• modelling of enzyme-substrate interactions by molecular docking

• phylogenetic analysis


This course is intended for final-year students enrolled in bachelor's programs in Biochemistry or Microbiology. The course BCM-2000 Molecular Genetics II or BIO-2003 Molecular Biology is a prerequisite.

In addition, to take this course, you will have to master some basic computer skills. For example, at the beginning of the session, it is your responsibility to ensure that you are confident in working remotely in a Linux environment. All of the necessary information about the computer requirements will be given to you in the first module of the course.

This lesson plan presents all the information necessary for participation in this course. It includes the instructions for the teaching materials you will use, information about how much leeway you have in the paths you choose to follow, and the different requirements you will have to fulfill.

Happy reading and good luck with the course!

Course content and activities

The table below presents a week-by-week plan of the course activities.

Week and module | Date
Week 1 - Module 1: Introduction to Bioinformatics, Software and Linux | Sep 5, 2016
Week 2 - Module 2: Biological Databanks | Sep 12, 2016
Week 3 - Module 3: Sequencing and Assembly Techniques | Sep 19, 2016
Week 4 - Module 4: Sequence Alignment and Database Search (1st part) | Sep 26, 2016
Week 5 - Module 5: Sequence Alignment and Database Search (2nd part) | Oct 3, 2016
Week 6 - Free Work | Oct 10, 2016
Week 7 - Module 6: Model Organism Databases and Prediction of Patterns and Functions of Proteins | Oct 17, 2016
Week 8 - Module 7: Phylogenetic Analysis | Oct 24, 2016
Week 9 - Reading Week | Oct 31, 2016
Week 10 - Module 8: Building Phylogenetic Trees | Nov 7, 2016
Week 11 - Module 9: 3D Structure of Proteins | Nov 14, 2016
Week 12 - Module 10: Predicting the Structure of Proteins | Nov 21, 2016
Week 13 - Module 11: Introduction to Molecular Modeling | Nov 28, 2016
Week 14 - Module 12: Molecular Docking | Dec 5, 2016
Week 15 - Free Work | Dec 12, 2016

Note: Please refer to the Content and Activities section of the course website for further details.

Evaluations and results

Assignment instructions. Each assignment will be accompanied by detailed instructions. Instructions may vary between modules, as different professors are responsible for each. Unless otherwise stated, homework should be uploaded to the drop box of the course website. Homework should be done individually unless otherwise indicated. A penalty of 10% will be applied to late homework.

DO NOT WAIT UNTIL THE LAST MINUTE to start the homework. Homework consists of practical exercises that use computer tools, and time-consuming problems related to particular hardware or software configurations may arise. Sometimes a few hours or a few days are required to deal with these problems. If homework is handed in late because a breakdown could not be resolved in time after you started too late, you are still responsible for the lateness of your assignment. Part of learning to use these computer tools is developing a realistic and practical sense of how long they can take, and planning for smooth handling of contingencies in your projects.

List of evaluations

Evaluation | Date | Mode | Weight

Online evaluations
Online evaluation #1 (Modules 4 and 5) | From Oct. 13, 2016 9:00 AM to Oct. 14, 2016 4:00 PM | Individual | 10%
Online evaluation #2 (Module 6) | From Oct. 27, 2016 9:00 AM to Oct. 28, 2016 4:00 PM | Individual | 15%

Homework
Homework #1 (Module 1) | Sept. 23, 2016 at 12:59 PM | Individual | 15%
Homework #2 (Modules 4 & 5) | Oct. 14, 2016 at 12:59 PM | Individual | 10%
Homework #3 (Modules 7 & 8) | Nov. 18, 2016 at 12:59 PM | Individual | 15%
Homework #4 (Modules 9 & 10) | Dec. 2, 2016 at 12:59 PM | Team | 15%
Homework #5 (Modules 11 & 12) | Dec. 16, 2016 at 12:59 PM | Team | 20%
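
The weights listed above add up to 100%, so a final mark can be obtained as a weighted average of the marks of the seven evaluations. The Python sketch below illustrates this arithmetic under the assumption that each evaluation is marked out of 100. The function name final_grade and the example marks are hypothetical, and the late penalty is not modelled.

```python
WEIGHTS = {
    "Online evaluation #1": 10,
    "Online evaluation #2": 15,
    "Homework #1": 15,
    "Homework #2": 10,
    "Homework #3": 15,
    "Homework #4": 15,
    "Homework #5": 20,
}


def final_grade(scores: dict) -> float:
    """Weighted average of marks out of 100, using the weights of the syllabus."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


total_weight = sum(WEIGHTS.values())

# Made-up marks: 80/100 everywhere except 90/100 on the last homework.
example_scores = {name: 80.0 for name in WEIGHTS}
example_scores["Homework #5"] = 90.0
grade = final_grade(example_scores)
print(total_weight, grade)
```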

Grading Scale

Course Materials

Required Materials

Essential bioinformatics

Author: Xiong, Jin

Publisher: Cambridge University Press (Cambridge New York, 2006)

ISBN: 9780521600828

Software

Most of the required computer programs are already installed on the course server. However, you are required to install the following (free) software:

Remote connection to the course server

• NoMachine (http://www.nomachine.com/download.php)

• FileZilla client (https://filezilla-project.org/download.php?show_all=1)


Modules 9 and 10: 3D structure of proteins

• PyMOL (http://pymol.org)

• Foldit (http://fold.it)

• Folding@home (http://folding.stanford.edu/English/Download)

Technological Specifications

To take this course you will also need one of the authorized calculators, which do not have internet connectivity, as listed in the Policy on the Use of Electronic Devices during an Assessment (https://www.fsg.ulaval.ca/fileadmin/fsg/documents/PDF/appareils_electro.pdf).

To be able to follow this course you will need to have, or have access to, the following tools:

• A computer with the applications required for web browsing

• An Internet connection (intermediate speed at minimum; high speed recommended)

• Speakers or headphones

Mediagraphy and Annex

Bibliography

• Claverie, J. M. and C. Notredame. 2007. Bioinformatics for Dummies. 2nd Edition. Wiley Publishing, Inc. ISBN: 9780470089859

• Edwards, D., Stajich, J. E. and D. Hansen. 2009. Bioinformatics: Tools and Applications. Springer. ISBN: 0387927379

• Pevsner, J. 2003. Bioinformatics and Functional Genomics. 1st Edition. John Wiley & Sons, Inc. ISBN: 0471210048

• Rodríguez-Ezpeleta, N., Hackenberg, M. and A. Aransay. 2012. Bioinformatics for High Throughput Sequencing. Springer. ISBN: 9781461407812

• Xiong, J. 2006. Essential Bioinformatics. Cambridge University Press. ISBN: 9780521600828

• Zvelebil, M. and J. O. Baum. 2008. Understanding Bioinformatics. Garland Science, Taylor & Francis Group, LLC. ISBN: 9780815340249


BIF-1901 PLAN DE COURS (ORIGINAL FRENCH VERSION)
