LSM3241: Bioinformatics and Biocomputing Lecture 2: Bioinformatics of viral genome Prof. Chen Yu...


LSM3241: Bioinformatics and Biocomputing

Lecture 2: Bioinformatics of viral genome

Prof. Chen Yu Zong

Tel: 6874-6877
Email: csccyz@nus.edu.sg
http://xin.cz3.nus.edu.sg

Room 07-24, level 7, SOC1, National University of Singapore

Resource of Viral Genomes

NCBI Genome Database http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=genome


2,226 entries of viral genomes (1,524 distinct virus strains) in the database, up from 1,250 entries (1,022 distinct strains) in early 2005.

1,193 entries are complete viral genomes, up from 900 in early 2005.


12 entries of coronavirus genomes (8 in early 2005)

16 entries of influenza H5N1 genomes
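
The counts above come from searching the Entrez web interface. As a rough sketch of how a similar query could be issued programmatically, the snippet below builds a request URL for NCBI's E-utilities `esearch` service. The endpoint, the `nuccore` database choice and the search term are assumptions made for illustration; none of them appear in the lecture. The function only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities esearch endpoint (not part of the lecture slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(term, db="nuccore", retmax=20):
    """Return an esearch URL for `term` in the Entrez database `db`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)

# Illustrative search term; adjust the wording to match the entries you need.
url = build_esearch_url("SARS coronavirus complete genome")
print(url)
```

Opening the printed URL in a browser returns an XML list of matching record identifiers, which can then be inspected one by one as described in the following slides.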

Information on viral genomes in the database can also be retrieved by clicking the viruses link.


List of viral genomes: (1,927 entries in Jan 2006, 1,461 in Jan 2005)


Viral taxonomy groups:


Viral genome list:


Bioinformatics of Viral Genomes

Viral name link:

Viral genome link

All entries

Viral protein link:

Limit to title search

SARS coronavirus PP1ab PID link. It gives multiple entries from different strains or from related species.

Viral strain

Different strains of SARS coronavirus

Note: A viral polyprotein is not a single functional protein; it is a precursor that is cleaved into several mature proteins. Information about these proteins can be difficult to read.

Suggestion: Look into the latest NCBI entry for the same virus from a reputable research group.

SARS coronavirus unknown sars3a PID link:

Alternative way to find the SARS coronavirus genome: look for the latest entry with a complete genome and good functional annotation. Not all entries have these.

The latest good entry: AY572038, SARS coronavirus civet020, complete genome (in Jan 2005: AY310120, SARS coronavirus FRA).

SARS Coronavirus Genome

You are expected to find the information about each gene (genome location, sequence, function).
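
Once a gene's location is read from the genome entry, its sequence can be cut out of the genome sequence. The sketch below shows the coordinate arithmetic only, assuming a simple `start..end` location on the forward strand. The toy sequence and coordinates are invented for illustration and do not come from any SARS coronavirus record.

```python
def extract_region(genome, start, end):
    """Return the sub-sequence for a GenBank-style location `start..end`.

    GenBank coordinates are 1-based and inclusive, so location 3..11
    corresponds to the Python slice [2:11].
    """
    if not (1 <= start <= end <= len(genome)):
        raise ValueError("location outside the sequence")
    return genome[start - 1:end]

# Toy sequence and coordinates, invented for illustration only.
toy_genome = "ATATGGCTTAAGG"
gene_seq = extract_region(toy_genome, 3, 11)
print(gene_seq, len(gene_seq))
```

Locations written with `join(...)` or `complement(...)` need extra handling that this sketch does not attempt.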

Function of SARS Coronavirus Genes

Where to find the proteins in the genome entry?

Source 1: mat_peptide

Protein name


Putative 3C-like protease mat_peptide link:

Protein name

Protein function

Source 2: CDS

Protein name


Nucleocapsid protein protein_id link:

Protein name
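
The two sources above (mat_peptide and CDS features) can also be pulled out of the text of a genome entry with a few lines of code. The following is a minimal sketch, assuming the usual GenBank flat-file layout in which a feature key is indented by five spaces and qualifiers appear as `/name="value"` on the lines below it. The record excerpt, coordinates and protein_id are placeholders invented for this example.

```python
import re

# Toy excerpt in GenBank feature-table layout; coordinates and IDs are
# placeholders invented for illustration, not copied from a real record.
RECORD = '''\
     CDS             100..400
                     /product="example structural protein"
                     /protein_id="XXX00001.1"
     mat_peptide     500..800
                     /product="example protease"
'''

FEATURE = re.compile(r"^ {5}(\S+)\s+(\S+)$")       # feature key + location
QUALIFIER = re.compile(r'^\s+/(\w+)="([^"]*)"$')   # single-line qualifier

def parse_features(text):
    """Parse feature lines and single-line /name="value" qualifiers."""
    features = []
    for line in text.splitlines():
        m = FEATURE.match(line)
        if m:
            features.append({"key": m.group(1), "location": m.group(2)})
            continue
        q = QUALIFIER.match(line)
        if q and features:
            features[-1][q.group(1)] = q.group(2)
    return features

proteins = [f for f in parse_features(RECORD) if f["key"] in ("mat_peptide", "CDS")]
for f in proteins:
    print(f["key"], f["location"], f.get("product"), f.get("protein_id"))
```

Qualifier values that wrap across several lines (long translations, for instance) are ignored here; a full parser would have to join them.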

How to find the name or function of a putative protein in a genome?

• Medline keyword search

• Google search


What if the function of a putative protein is unknown?

• Sequence alignment (BLAST, PSI-BLAST). This will be further discussed in lecture 4.

• Motif analysis (Conduct a PROSITE motif search)

• If sequence analysis fails or is in doubt, try a machine learning method (SVMProt, Nucleic Acids Res., 31: 3692-3697; ProtFun, Bioinformatics, 19: 635-642). This will be studied in lecture 5.
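
To make the motif-analysis option concrete, here is a small sketch that converts a PROSITE-style pattern into a regular expression and scans a protein sequence with it. It covers only the common pattern syntax (`x`, `[..]`, `{..}`, repeats and the `<`/`>` anchors) and is not a replacement for the PROSITE scanning tools. The pattern `N-{P}-[ST]-{P}` is the N-glycosylation site pattern in PROSITE; the peptide sequence is made up.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression.

    Handles x (any residue), [..] (allowed set), {..} (forbidden set),
    (n) and (n,m) repeats, and the < and > terminal anchors.
    """
    regex = pattern.rstrip(".")
    regex = regex.replace("-", "")
    regex = regex.replace("x", ".")
    # Forbidden sets first, so the braces are free for repeat counts.
    regex = regex.replace("{", "[^").replace("}", "]")
    regex = regex.replace("(", "{").replace(")", "}")
    regex = regex.replace("<", "^").replace(">", "$")
    return regex

def find_motif(pattern, sequence):
    """Return (1-based position, matched text) for each non-overlapping hit."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in compiled.finditer(sequence)]

# Made-up peptide used only to exercise the function.
hits = find_motif("N-{P}-[ST]-{P}.", "MKNGSAANPTL")
print(hits)
```

A hit from such a short pattern is only a hint about function; it should be weighed together with alignment results.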


Drug design:

• Step 1: Finding the right target in the genome

– A key protein involved in the viral cycle (stop the disease process)

– Different from human proteins (reduce side effects)

• Step 2: Finding or making a chemical agent to stop the protein

– In the majority of cases: protein inhibitors

• Step 3: Testing and clinical trials

SARS Drug design:

The target: 3C-like protease

• Inhibitor design: Finding inhibitors of similar proteins, such as those of the same name (3C-like proteases or 3C proteases of other species), may offer clues to inhibitor design.

Search from NCBI

Search from NCBI finds 19 references.

Check each abstract to find the name of one or more inhibitors.

Be prepared to read the full paper to find inhibitors.

Make sure the paper discusses inhibitors of the right protein.

This one actually discusses inhibitors of the protease family, and thus may not necessarily be suitable for SARS 3C-like protease.
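
The check described above can be given a rough first pass in code before reading the papers. The sketch below keeps only abstracts that both name the 3C-like protease and mention inhibition. The regular expressions and the abstract snippets are assumptions written for this example; they are not real papers, and a keyword match does not replace reading the abstract.

```python
import re

# Matches e.g. "3C-like protease", "3C like proteinase", "3CLpro".
TARGET = re.compile(r"3C[- ]?like\s+prote(?:in)?ase|3CL\s*pro", re.IGNORECASE)
INHIBITOR = re.compile(r"inhibit", re.IGNORECASE)

def mentions_target_inhibitor(abstract):
    """True if the text names the 3C-like protease and discusses inhibition."""
    return bool(TARGET.search(abstract)) and bool(INHIBITOR.search(abstract))

# Hypothetical abstract snippets written for this example, not real papers.
abstracts = {
    "A": "We report inhibitors of the SARS coronavirus 3C-like protease.",
    "B": "A survey of inhibitors across a general protease family.",
    "C": "Structure of the 3CLpro active site.",
}
shortlist = [key for key, text in abstracts.items()
             if mentions_target_inhibitor(text)]
print(shortlist)
```

Snippet B is dropped because it never names the target protein, which mirrors the caution above about papers on the wider protease family.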

Search from Google

Search from Google finds numerous entries.

Check each entry to find the name of one or more inhibitors.

Be prepared to read the full paper to find inhibitors.

Design of SARS 3C-like protease inhibitors using rhinovirus 3C-like protease inhibitors as templates

Summary of Today’s lecture

• Genome database at NCBI

• Viral genomes

– SARS coronavirus genome as an example

• Finding proteins from a genome

• Therapeutic target identification from a genome and inhibitor design
