CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS
Exercise: Prediction of MHC:peptide binding using PSSM and ANN
SYFPEITHI
* Q1: What are the characteristics of the peptides that bind HLA-A*0201? Which positions are anchor positions, and which amino acids are found at the anchor positions?
* Q4: What is the predictive performance of the matrix method (Pearson coefficient and Aroc value)?
Pearson coefficient for N= 66 data: 0.63507
Aroc value: 0.83399
* Q5: How many of the 1200 peptides in the train set are included in the matrix construction?
Number of positive training examples: 136
Go back to the EasyPred server window (use the Back button). Set the clustering method to No clustering and the weight on prior to zero, and redo the calculation.
* Q6: What is the predictive performance of the matrix method now?
Pearson coefficient for N= 66 data: 0.51898
Aroc value: 0.78954
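The two metrics quoted above can be computed directly from predicted and measured binding values. As a minimal sketch (the toy scores below are invented for illustration, not EasyPred output): Pearson correlation compares predicted scores against measured affinities, while the Aroc (AUC) value treats prediction as a binder/non-binder classification problem.

```python
# Sketch: the two evaluation metrics used throughout this exercise.
# The score/label values below are made-up toy data.
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def aroc(scores, labels):
    """Area under the ROC curve: probability that a randomly chosen
    positive example scores higher than a randomly chosen negative one."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

pred = [0.9, 0.7, 0.4, 0.2, 0.8, 0.1]     # predicted binding scores
meas = [0.8, 0.6, 0.5, 0.1, 0.9, 0.2]     # measured values
labels = [1, 1, 0, 0, 1, 0]               # binder / non-binder
print(round(pearson(pred, meas), 3))      # -> 0.944
print(aroc(pred, labels))                 # -> 1.0 (all binders rank on top)
```

An Aroc of 0.5 corresponds to random ranking, 1.0 to perfect separation of binders from non-binders.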
EasyPred
* Q7: Does clustering and pseudo count (weight on prior) improve the prediction accuracy?
• Clustering down-weights similar sequences in the training data, thereby removing a non-biological bias.
• Pseudo counts allow you to estimate frequencies for amino acids that are not observed in the data, using Blosum substitution frequencies. Setting the weight on prior (or weight on pseudo count) to zero removes the pseudo counts, and the method becomes less capable of generalizing.
• A network contains a very large set of parameters
• A network with 5 hidden neurons predicting binding for 9-meric peptides has 9x20x5 = 900 weights
• Over-fitting is a problem
• Stop training when the test performance is optimal
Neural network training
[Figure: temperature plotted against years]
Neural network training: cross validation
• Train on 4/5 of the data, test on 1/5
• => Produces 5 different neural networks, each with a different prediction focus
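The 5-fold scheme above can be sketched as a partitioning of the training data. This is a minimal illustration (the integer list stands in for the peptides of the train.set file):

```python
# Sketch of 5-fold cross validation: each of the 5 networks trains on
# 4/5 of the data and stops on the remaining 1/5.
def five_fold(data):
    folds = [data[i::5] for i in range(5)]  # round-robin partition into 5 folds
    for k in range(5):
        test = folds[k]
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train, test

data = list(range(10))  # 10 dummy examples standing in for peptides
for train, test in five_fold(data):
    assert len(train) + len(test) == len(data)
    assert not set(train) & set(test)  # train and test are disjoint
```

Every example is used for testing exactly once, and each network sees a slightly different training set, which is what gives the five networks their different prediction focus.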
Neural network training curve
The point of maximum test set performance is where the network is most capable of generalizing.
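The stop-at-maximum-test-performance rule (early stopping) amounts to scanning the per-epoch test performance and keeping the weights from the best epoch. A minimal sketch, with an invented performance curve that peaks and then over-fits:

```python
# Sketch of early stopping: find the epoch with maximal test performance.
# The per-epoch numbers are invented to mimic a curve that peaks at
# epoch 4 and then degrades as the network over-fits the training data.
def best_epoch(test_perf_per_epoch):
    """Return (best performance, epoch it occurred in), 1-indexed."""
    best, best_ep = float("-inf"), 0
    for epoch, perf in enumerate(test_perf_per_epoch, start=1):
        if perf > best:
            best, best_ep = perf, epoch  # keep the weights from this epoch
    return best, best_ep

curve = [0.40, 0.65, 0.78, 0.80, 0.79, 0.75, 0.70]
print(best_epoch(curve))  # -> (0.8, 4)
```

This is exactly what the EasyPred output reports in Q8 and Q10: the maximal test set Pearson correlation and the epoch in which it occurred.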
EasyPred - ANN
* Q8: What is the maximal test performance (maximal test set Pearson correlation), and in what epoch does it occur?
Maximal test set Pearson correlation coefficient sum = 0.801300 in epoch 103
* Q9: What is the evaluation performance (Pearson correlation and Aroc values)?
Pearson coefficient for N= 66 data: 0.58693
Aroc value: 0.85490
Go back to the EasyPred interface and change the parameters so that you use the bottom 80% of the train.set to train the neural network and the top 20% to stop the training. Redo the network training with the new parameters.
* Q10: What is the maximal test performance, and in what epoch does it occur?
Maximal test set Pearson correlation coefficient sum = 0.837800 in epoch 90
* Q11: What is the evaluation performance?
Pearson coefficient for N= 66 data: 0.55571
Aroc value: 0.78170
EasyPred - ANN
* Q12: How does the performance differ from what you found in the previous training?
The training stops sooner and with a better test (stop) performance, but the evaluation performance drops compared with the previous training.
* Q13: Why do you think the performance differs so much?
If the training set and the test (stop) set are too similar, you get an artificially good test performance. However, the training will be biased towards this similarity, so when evaluating on a more dissimilar set the performance is not as good. Alternatively, the evaluation set could share similarity with the top 80% of the training data, thereby imposing a bias on the evaluation performance.
Hidden Neurons
Go back to the EasyPred interface and change the parameters back so that you use the top 80% of the train.set for training. Next, do neural network training with different numbers of hidden neurons (1 and 5, for instance).
* Q14: How does the test performance differ when you vary the number of hidden neurons?
It does not differ very much.
* Q15: How does the evaluation performance differ?
It does not differ very much.
* Q16: Can you decide on an optimal number of hidden neurons?
No.
Hidden Neurons (units)
* Q17: Why do you think the number of hidden neurons has so little importance?
Having more than one neuron in the hidden layer enables the network to solve higher order correlations like the "exclusive or" (XOR) function. This occurs in nature, for example, when two neighboring amino acids compete for a single binding pocket: one of the amino acids has to be large, but both of them cannot be. Such a problem can only be solved by methods that can handle higher order correlations, like artificial neural networks.
Here we do not see any advantage of having more than one hidden neuron, so either we do not have higher order correlations, or their influence is so subtle that this training set is too small to learn them from. Capturing higher order correlations requires estimating amino acid pair frequencies, on the order of 400 such frequencies, and this cannot be done from 136 binding peptides. We would need much more data than what we have available.
Neural networks
A single neuron computes a linear function of its inputs, with weights v1 and v2:
y = x1 · v1 + x2 · v2
Neural networks
Adding a hidden layer (input weights w11, w12, w21, w22 and output weights v1, v2) lets the network compute a higher order function of its inputs.
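The higher-order capability of the hidden layer can be demonstrated on the XOR function mentioned earlier. This is a sketch with hand-picked (not trained) weights; a single linear unit y = x1·v1 + x2·v2 cannot represent XOR for any choice of v1, v2:

```python
# A two-layer network with fixed, hand-picked weights computing XOR.
# Hidden units implement OR and AND; the output combines them.
def step(z):
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)    # hidden unit 1: OR
    h2 = step(1.0 * x1 + 1.0 * x2 - 1.5)    # hidden unit 2: AND
    return step(1.0 * h1 - 1.0 * h2 - 0.5)  # output: OR and not AND = XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))
```

In the binding-pocket analogy, x1 and x2 are "large amino acid at neighboring position" indicators: binding requires one to be large (OR) but not both (not AND).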
Ensembles
Write down the test performance for each of the five networks.
* Q18: How does the train/test performance differ between the different partitions?
The test Pearson CC varies between 0.79 and 0.86.
* Q19: What is the evaluation performance, and how does it compare to the performance you found previously?
Pearson coefficient for N= 66 data: 0.61446
Aroc value: 0.83137
Pearson is better; AUC (Aroc) is a little worse.
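A common way to combine the five cross-validation networks into an ensemble is to average their predictions per peptide, which is sketched below. The per-network scores are invented for illustration:

```python
# Sketch: ensemble prediction by averaging the outputs of the five
# cross-validation networks for each peptide.
def ensemble_predict(per_network_scores):
    """per_network_scores: list of score lists, one list per network."""
    n_nets = len(per_network_scores)
    return [sum(scores) / n_nets for scores in zip(*per_network_scores)]

nets = [
    [0.70, 0.20],  # network 1 scores for two peptides
    [0.80, 0.10],
    [0.75, 0.30],
    [0.65, 0.25],
    [0.60, 0.15],
]
print([round(s, 3) for s in ensemble_predict(nets)])  # -> [0.7, 0.2]
```

Averaging tends to cancel the individual networks' biases (their different "prediction focus"), which is consistent with the improved Pearson correlation seen in Q19.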
Hidden Units
* Q20: How many high binding epitopes (score > 0.5 ≈ 200 nM, or > 0.426 ≈ 500 nM) do you find?
2
Is this number reasonable (how large a fraction of random 9-meric peptides is expected to bind a given HLA complex)?
2/400 = 0.5%, or 1 in 200.
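The score thresholds quoted above appear to follow the common log-transformed affinity convention score = 1 - log(IC50)/log(50000) (the transform used by NetMHC-style predictors); under that assumption the conversion can be sketched as:

```python
# Sketch of the log-transformed affinity convention, assuming
# score = 1 - log(IC50) / log(50000).  Under this transform a score of
# 0.5 corresponds to roughly 200 nM and 0.426 to roughly 500 nM, matching
# the thresholds on the slide.
import math

def score_to_ic50(score, max_ic50=50000.0):
    """Invert score = 1 - log(ic50)/log(max_ic50) to get IC50 in nM."""
    return max_ic50 ** (1.0 - score)

def ic50_to_score(ic50, max_ic50=50000.0):
    return 1.0 - math.log(ic50) / math.log(max_ic50)

print(round(score_to_ic50(0.500)))  # -> 224 (the slide rounds to ~200 nM)
print(round(score_to_ic50(0.426)))  # -> 498, i.e. ~500 nM
```

The transform maps the huge dynamic range of IC50 values (1 nM to 50000 nM) onto a 0-1 scale that is easier for a network to learn.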
• Q21: What is the predictive performance of the method?
• Pearson coefficient for N= 22 data: 0.35836
• Aroc value: 0.46875
• Threshold for counting example as positive: 0.362000
• Q22: What is the predictive performance of the method?
• Pearson coefficient for N= 22 data: 0.56559
• Aroc value: 0.83333
• Threshold for counting example as positive: 0.362000
• Q23: What is the predictive performance of the method?
• Pearson coefficient for N= 22 data: 0.57822
• Aroc value: 0.87500
• Threshold for counting example as positive: 0.362000
• Q24: How does the logo compare to the binding motif described in the SYFPEITHI database?
Server down.
• Q25: Which positions are most important for binding?
P1, P3 and P9.
• What is the predictive performance of the method, and how does it compare to that of the TEPITOPE method?
Pearson coefficient for N= 22 data: 0.74057, Aroc value: 0.88542. These values are much higher than what was obtained for the TEPITOPE method.