Virus Host co-evolution in sight of their proteomes and codon preferences

Bioinformatics project 2007

Yaar Reuveni

Instructor - Michal Linial

Outline:

My project is composed of two phases:

1. Phase I: The virus host web tool – VirOsNet. You are welcome to visit at: www.virosnet.cs.huji.ac.il

2. Phase II: Virus Host co-evolution research using codon usage analysis.

Viruses:

Basically a capsid envelope that contains genetic information.

Viruses cannot replicate by themselves and depend on the host for reproduction.

Their main purpose in life is to enter a host and use its facilities to reproduce.

Viruses fight back:

Phase I: VirOsNet

VirOsNet provides a database and tools for exploring virus evolution and virus-host co-evolution.

Background and Motivation:

Ample examples suggest that viruses often steal genetic information from their hosts.

Viruses must optimize their amount of genetic material and physical size.

Viruses evolve very fast:

  o Hard to trace.
  o Might change by switching hosts.
  o Shuffle their genetic material.

Phase (I) main objective:

Compare all viral proteins to all known proteins and detect resemblance.

Meaning: in what way do viral proteins "resemble" any of all other known proteins in our world?

Objectives and possible outcomes (i)

Clever search: Provide crossbreeding factors when searching

Offer comparisons of viruses relative to the proteome of their known hosts

Stolen elements: where were they stolen from? Was it from the host?

Mimicking phenomenon: detect host-protein mimicry

When did it happen: Evolutionary tracking

Objectives and possible outcomes (ii)

Recent events: indicated by similarity search results that are exceptional.

Insights on viruses and their proteomes.

Long term: pharmaceutical applications, such as the proposal of drug targets.

Methods:

Data is from the ProtoNet DB (currently ~1.8 million proteins). All proteins are from UniProt.

New tables were added to the DB, specialized for host-virus relations.

Pre-computed BLAST (BLOSUM62) and dynamic BLAST options.

The entry is a viral protein; BLAST search results are sorted by descending E-values.

Several display schemes.

Each result is associated with domain information (InterPro).

Download options for next-phase analysis.
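The sorting step in the methods above can be illustrated with a short sketch. It assumes the BLAST results are available in the standard 12-column tabular layout (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); whether VirOsNet's download option uses this layout is an assumption. The names `BlastHit`, `parse_blast_tabular` and `sort_hits` are hypothetical and not part of VirOsNet.

```python
from typing import List, NamedTuple


class BlastHit(NamedTuple):
    """One row of a BLAST tabular result (hypothetical helper type)."""
    query_id: str
    subject_id: str
    percent_identity: float
    evalue: float
    bit_score: float


def parse_blast_tabular(text: str) -> List[BlastHit]:
    """Parse tab-separated BLAST rows, skipping blank and comment lines."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Columns 11 and 12 (index 10 and 11) hold the E-value and bit score.
        hits.append(BlastHit(cols[0], cols[1], float(cols[2]),
                             float(cols[10]), float(cols[11])))
    return hits


def sort_hits(hits: List[BlastHit], descending: bool = True) -> List[BlastHit]:
    """Sort hits by E-value; descending=True follows the order named in the slides."""
    return sorted(hits, key=lambda h: h.evalue, reverse=descending)
```

Passing `descending=False` puts the most significant hits (smallest E-value) first, which may be more convenient when looking for exceptional similarities.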

Tool overview:

The tool works in a 4-step scheme:

1. Step 1: search for a virus to query, using one of the search methods.

2. Step 2: choose a specific virus.

3. Step 3: choose one of its proteins and the BLAST properties.

4. Step 4: choose one of the BLAST results to get its pairwise alignment.

Some statistics:

7,763 viruses and 199,563 proteins

Entry point to viruses according to their genetic material complexity

Example: check all dsRNA viruses

Affecting Eukaryotes

Case study: Abelson murine leukemia virus carries a very close homolog of a human and a mouse protein tyrosine kinase that:

(i) Regulates the cytoskeleton during cell differentiation, cell division and cell adhesion.

(ii) Regulates DNA repair, potentially in severe damage.

The viral protein causes cancer (active site mutation).

Let's look at it...

Active site

Summary Phase I:

Pros:

Platform for studying viruses relative to hosts.

A discovery tool.

Rich BLAST options for an evolutionarily wider view.

Crossbreeding with host data (i.e. InterPro domains).

Dynamic view of BLAST results as a group (ProtoMesh).

Cons:

Usability for the average biologist still needs improvement.

VirOsNet can get very slow under overload or with some of the filtering options.

Phase II: Codon usage

Virus-host classification using codon usage analysis with SVM

Figure adapted from L. Merkel, N. Budisa, "Veränderung des genetischen Codes", BIOspektrum 2006, 12, 41.

RNA codons (rows grouped by 1st base; columns by 2nd base):

1st base | 2nd base U | 2nd base C | 2nd base A | 2nd base G
U | UUU (Phe/F) Phenylalanine | UCU (Ser/S) Serine | UAU (Tyr/Y) Tyrosine | UGU (Cys/C) Cysteine
U | UUC (Phe/F) Phenylalanine | UCC (Ser/S) Serine | UAC (Tyr/Y) Tyrosine | UGC (Cys/C) Cysteine
U | UUA (Leu/L) Leucine | UCA (Ser/S) Serine | UAA Ochre (Stop) | UGA Opal (Stop)
U | UUG (Leu/L) Leucine | UCG (Ser/S) Serine | UAG Amber (Stop) | UGG (Trp/W) Tryptophan
C | CUU (Leu/L) Leucine | CCU (Pro/P) Proline | CAU (His/H) Histidine | CGU (Arg/R) Arginine
C | CUC (Leu/L) Leucine | CCC (Pro/P) Proline | CAC (His/H) Histidine | CGC (Arg/R) Arginine
C | CUA (Leu/L) Leucine | CCA (Pro/P) Proline | CAA (Gln/Q) Glutamine | CGA (Arg/R) Arginine
C | CUG (Leu/L) Leucine | CCG (Pro/P) Proline | CAG (Gln/Q) Glutamine | CGG (Arg/R) Arginine
A | AUU (Ile/I) Isoleucine | ACU (Thr/T) Threonine | AAU (Asn/N) Asparagine | AGU (Ser/S) Serine
A | AUC (Ile/I) Isoleucine | ACC (Thr/T) Threonine | AAC (Asn/N) Asparagine | AGC (Ser/S) Serine
A | AUA (Ile/I) Isoleucine | ACA (Thr/T) Threonine | AAA (Lys/K) Lysine | AGA (Arg/R) Arginine
A | AUG (Met/M) Methionine, Start | ACG (Thr/T) Threonine | AAG (Lys/K) Lysine | AGG (Arg/R) Arginine
G | GUU (Val/V) Valine | GCU (Ala/A) Alanine | GAU (Asp/D) Aspartic acid | GGU (Gly/G) Glycine
G | GUC (Val/V) Valine | GCC (Ala/A) Alanine | GAC (Asp/D) Aspartic acid | GGC (Gly/G) Glycine
G | GUA (Val/V) Valine | GCA (Ala/A) Alanine | GAA (Glu/E) Glutamic acid | GGA (Gly/G) Glycine
G | GUG (Val/V) Valine | GCG (Ala/A) Alanine | GAG (Glu/E) Glutamic acid | GGG (Gly/G) Glycine

Main question:

Given a viral protein, determine who might be a potential host of the virus.

The basis for the hypothesis: viruses optimize their codon usage toward that of their hosts.

Objectives:

Create a classification tool, that receives a viral protein and will give a prediction on its potential hosts.

Classify all the proteins to different classes, using a maximum-margin hyperplane.

Provide different levels of classification.

Create a “host rank” for a given viral protein for each of its potential hosts.

Results: May suggest a “virus cross-species potential index”
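One way to picture the “host rank” objective is the sketch below. It is not the SVM-based method planned in the slides: as a simple stand-in, it scores each candidate host by the cosine similarity between the viral codon usage vector and the host's codon usage vector, then sorts the hosts by that score. The function names and the toy 3-position vectors (real vectors would have 64 positions) are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def host_rank(viral_vector: Sequence[float],
              host_vectors: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (host, score) pairs, most similar host first."""
    scored = [(host, cosine_similarity(viral_vector, vec))
              for host, vec in host_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy usage with made-up vectors (hypothetical data, not from the project).
example_hosts = {"host_A": [1.0, 0.0, 0.0],
                 "host_B": [0.0, 1.0, 0.0],
                 "host_C": [0.7, 0.7, 0.0]}
example_ranking = host_rank([0.9, 0.1, 0.0], example_hosts)
```

A ranking of this kind gives one score per potential host, which is the shape of output the “host rank” and the “cross-species potential index” would need, whatever classifier finally produces the scores.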

Methods:

Collect and arrange all the codon usage data (or other relevant data for this classification).

Analyze the data: normalization and processing.

Unsupervised learning and clustering for better understanding of the data.

Given the codon usage for all species, use the SVM algorithm to create a predictor for new specimens.

Provide various levels of classes for the codon data.
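The maximum-margin idea behind the SVM step can be sketched as follows. This is a minimal linear, two-class SVM trained by full-batch subgradient descent on the regularized hinge loss, written in plain Python so that it is self-contained. It is an illustration under assumptions (two classes only, toy 2-position vectors, hypothetical function names and hyperparameters), not the classifier used in the project, which would more likely rely on an existing SVM library.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Minimize lam/2 * |w|^2 + mean(hinge loss); labels must be +1 or -1."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point is inside the margin: hinge loss is active
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy data: two made-up "codon usage" classes (hypothetical values).
X_train = [[0.8, 0.2], [0.7, 0.3], [0.9, 0.1],
           [0.2, 0.8], [0.3, 0.7], [0.1, 0.9]]
y_train = [1, 1, 1, -1, -1, -1]
w_fit, b_fit = train_linear_svm(X_train, y_train)
```

For the real task, each vector would have 64 positions and there would be several host classes, so a multi-class scheme (for example one-vs-rest) would be needed on top of this two-class sketch.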

About the data:

Codon usage is calculated for each species.

Each species is represented by a 64-position vector.

The question of normalization:

  o Standard: normalize to 1.
  o Functional: per amino acid, or by entropy.
  o Percentage: per column.
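The 64-position vector and two of the normalization options above can be sketched as follows. The codon order (U, C, A, G for each base, first base varying slowest) and the function names are assumptions made for illustration; the slides do not state the exact convention used in the project. The amino acid string is the standard genetic code written in that same order, with `*` marking stop codons.

```python
from collections import Counter
from itertools import product
from typing import List

BASES = "UCAG"
# 64 codons in the order UUU, UUC, UUA, UUG, UCU, ... GGG.
CODONS = ["".join(p) for p in product(BASES, repeat=3)]
# Standard genetic code in the same order; "*" marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")


def codon_counts(cds: str) -> List[int]:
    """Count codons in a coding sequence (DNA or RNA), ignoring a trailing partial codon."""
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    return [counts.get(codon, 0) for codon in CODONS]


def normalize_to_one(counts: List[int]) -> List[float]:
    """Standard normalization: the 64 values sum to 1."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def normalize_per_amino_acid(counts: List[int]) -> List[float]:
    """Functional normalization: values of synonymous codons sum to 1."""
    totals = Counter()
    for aa, c in zip(AMINO_ACIDS, counts):
        totals[aa] += c
    return [c / totals[aa] if totals[aa] else 0.0
            for aa, c in zip(AMINO_ACIDS, counts)]
```

The standard normalization keeps information about amino acid composition, whereas the per-amino-acid normalization isolates the preference among synonymous codons, which is closer to what "codon preference" means.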

[Figure: codon usage (positions 1 ... 64) for Bacteria and Primates. Amino acids are ordered as R, L, S (6 codons each); T, P, A, G, V (4 each); K, N, Q, H, E, D, Y, C, F (2 each); I (3); M (1); W (1); STOP (3).]

Data from Nakamura: Codon usage tabulated from the international DNA sequence databases. Nakamura, Y., Gojobori, T. and Ikemura, T. (2000) Nucl. Acids Res. 28, 292.

Downloading the codon usage table.

The data covers all species (including viruses).

[Figure: usage distribution, positions 1-13, for Bacteria, Invertebrates, Primates, Viral, Plants and Rodents.]

Our data:

Codon usage was expected to differ between taxonomic groups.

There are 703 distinct known hosts in our DB and 2152 distinct known hosted viruses.

I created an interface for extracting the CDS data from the coding data we have in ProtoNet.

I used the same convention for the vector.

In ProtoNet (version 5.1): 16,567 viruses and 409,726 proteins.

Dividing our data into groups:

Group name: Fungi, Bacteria, Viridiplantae (green plants), Rodents, Primates, Aves (birds), Tetrapoda, Arthropoda

Taxid: 4751233090998999443 8782325236656

Distinct hosts: 4463393831313418788

Number of viruses (not distinct): 916914142511015474262761263

Distinct viruses: 9161329162868163741549175

Distinct viruses with CDS: 9151304150816163631462169

Who infects what?

[Figure: diagram of which viruses infect which host groups: Primates, Rodents, Tetrapoda, Bacteria, Fungi, Plants, Arthropoda and Others.]

These are all different virus groups:

Comparison: positions 1-12

Looks Promising!

Clustering: preliminary results

Using the COMPACT tool set (COMPACT: A Comparative Package for Clustering Assessment).

Varshavsky et al., 2005, ISPA: 159-167.

Visualization of results

Scoring

Hierarchical - Percentage Normalization

Hierarchical - Standard Normalization
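The hierarchical clustering named in the slide titles above can be sketched in a few lines. This is a generic agglomerative clustering with average linkage and Euclidean distance, written in plain Python. It is not the COMPACT implementation, and the linkage, the distance and the function names are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1: List[int], c2: List[int],
                    vectors: Sequence[Sequence[float]]) -> float:
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(vectors: Sequence[Sequence[float]],
                  n_clusters: int) -> List[List[int]]:
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Returns clusters as lists of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], vectors)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Running the same clustering on vectors prepared with different normalizations is one way to compare them, since the grouping can change when the vectors are scaled differently.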

Summary phase II:

All data is organized, accessible and will be updated along with the ProtoNet DB.

Comprehensive analysis created a good understanding of the data.

Future plans:

Decide on a good division into classes.

Use the SVM algorithm to create a classifier that, given a virus's codon preferences, guesses potential hosts.

Create an interface that offers this service.

Acknowledgements:

Thank you to all the people that helped:

Michal Linial, Iris Bahir, Menachem Fromer, Alexander Savenok, Michael Dvorkin, Roy Varshavsky

Thank You!
