Algorithm for Analysis of Amino Acid Bond Lengths of Proteins


  • 8/9/2019 Algorithm for Analysis of Amino Acid Bond Lengths of Proteins

    1/34

    Algorithm for Analysis of Amino Acid Bond Lengths of Proteins

    Final Project Report

    CS 487 Introduction to Cluster Computing

    Old Dominion University

    Algorithm for Analysis of Amino Acid Bond Lengths of Proteins

    Tim Dugan, Computer Engineering Department

    Gordon Bland, Computer Science Department

    Tyler Wood, Computer Science Department

    Adrian Ostolski, Computer Science Department

    Research Advisor: Professor Jay Morris

    ABSTRACT

    Bioinformatics is a growing field of study that supplies a steady stream of computing
    problems. Although the definition of the term is debated, it is generally understood as the
    use of computers to solve biological problems or answer biological questions. One such
    problem is to develop an algorithm that compares bond lengths of proteins in order to search
    for protein keys. A protein key is a protein which sends signals to other cells by means of a
    chemical reaction where binding occurs. Our team has chosen to develop an algorithm for the
    analysis of amino acid bond lengths of proteins because this analysis will assist in
    identifying protein keys.


    Table of Contents

    1 Problem Description
      1.1 Introduction
      1.2 Background
      1.3 Need for Solution
    2 Solution Design
      2.1 Module Design
      2.2 Classes
      2.3 Functions
    3 Results
      3.1 Generated Protein Files
      3.2 Output
    4 Conclusions and Recommended Further Research
    5 Acknowledgements
    6 Appendices
      A References
      B Source Code and Documentation
      C Runtime Instructions


    List of Figures

    Figure 1. Amino Acid Structure
    Figure 2. Peptide Bond
    Figure 3. Function Call in Main
    Figure 4. Modify Function
    Figure 5. Load Function
    Figure 6. Algorithm Function
    Figure 7. Generate Function in Main
    Figure 8. Generate Function
    Figure 9. Protein1_clean
    Figure 10. Protein2_clean
    Figure 11. Protein1_raw
    Figure 12. Protein2_raw
    Figure 13. Protein_matchs

    List of Tables

    Table 1. Amino Acids by Name


    1 PROBLEM DESCRIPTION

    Bioinformatics is a growing field of study that supplies a steady stream of computing
    problems. Although the definition of the term is debated, it is generally understood as the
    use of computers to solve biological problems or answer biological questions. One such
    problem is to develop an algorithm that compares bond lengths of proteins in order to search
    for protein keys. A protein key is a protein which sends signals to other cells by means of a
    chemical reaction where binding occurs.

    For example, Dutch scientists found a protein, produced by glial cells in the central
    nervous system, that transmits messages between brain cells which control the release of
    chemicals that affect memory, attention, and addiction. Acetylcholine affects memory and
    dopamine affects addiction, to name two. Scientists anticipate using this protein key to
    develop drugs which influence certain neuronal functions but not others. Many more protein
    keys remain to be identified, however.

    1.1 INTRODUCTION

    The purpose of this Beowulf class is to demonstrate how cluster computing can be

    used to solve large problems which could not otherwise be solved. Our team has

    chosen to develop an algorithm for analysis of amino acid bond lengths of proteins

    because this analysis will assist in identifying protein keys.


    1.2 BACKGROUND

    A protein is made up of amino acids, and therefore amino acids lie at the heart of
    bioinformatics. Proteins in the human body are built from 20 standard amino acids. Each
    amino acid has unique properties and can be represented by a full name or by a three-letter
    or one-letter code, as shown in Table 1.

    Name            3-letter  1-letter  Hydropathy   Charge
    Alanine         Ala       A         Hydrophobic  Neutral
    Cysteine        Cys       C         Hydrophobic  Neutral
    Aspartic acid   Asp       D         Hydrophilic  Negative
    Glutamic acid   Glu       E         Hydrophilic  Negative
    Phenylalanine   Phe       F         Hydrophobic  Neutral
    Glycine         Gly       G         Hydrophobic  Neutral
    Histidine       His       H         Hydrophilic  Neutral/Positive/Negative
    Isoleucine      Ile       I         Hydrophobic  Neutral
    Lysine          Lys       K         Hydrophilic  Positive
    Leucine         Leu       L         Hydrophobic  Neutral
    Methionine      Met       M         Hydrophobic  Neutral
    Asparagine      Asn       N         Hydrophilic  Neutral
    Proline         Pro       P         Hydrophobic  Neutral
    Glutamine       Gln       Q         Hydrophilic  Neutral
    Arginine        Arg       R         Hydrophilic  Positive
    Serine          Ser       S         Hydrophilic  Neutral
    Threonine       Thr       T         Hydrophilic  Neutral
    Valine          Val       V         Hydrophobic  Neutral
    Tryptophan      Trp       W         Hydrophobic  Neutral
    Tyrosine        Tyr       Y         Hydrophobic  Neutral

    Table 1. Amino Acids by Name.

    All amino acids share the same basic structure, built around a central carbon atom known
    as the C-alpha. This carbon atom is bonded to a hydrogen atom, an amino group, a carboxylic
    acid group, and a fourth group known as the variable sidechain. Sidechains are what
    differentiate one amino acid from another. Amino acids are connected by a peptide bond
    between the carboxyl group of the first amino acid and the amino group of the second amino
    acid. Figure 1 shows the general structure of an amino acid, and Figure 2 shows a peptide
    bond.

    Figure 1. Amino Acid Structure.


    Figure 2. Peptide Bond.

    1.3 NEED FOR SOLUTION

    This solution has the potential to greatly improve our ability to overcome diseases and
    other medical conditions. Each newly discovered protein key gives scientists another route
    toward curing or treating a cause of human suffering. The protein key in the example given
    earlier could lead to drugs that treat Alzheimer's disease or schizophrenia, or that help
    people quit smoking and overcome other drug addictions. Work of this kind could ultimately
    save many lives.

    2 SOLUTION DESIGN

    Our team designed this solution at Old Dominion University's Beowulf Laboratory, where our
    supercomputer is housed. The solution accepts input in one of two ways: the user either
    supplies an actual protein file or first generates protein files. If the latter is chosen,
    the user is prompted for the number of protein nodes and the maximum number of possible
    connections between nodes. The user also inputs the sigma value, that is, the maximum
    difference allowed between two distances for them to count as a match. The code then
    generates random proteins based on these criteria and writes the comparison to a text file.

    2.1 MODULE DESIGN

    It is important to note at the outset that the modules are independent of one another.
    The intent is that each function can be run as a stand-alone function, so that a program
    designer can collect whatever user input is needed and feed it into the functions.

    2.2 CLASSES

    There are two classes used in this solution: the point class and the edge class. The point
    class contains the node id and the x and y coordinate values for the node object. The edge
    class contains two points and the distance between them. These two classes can be found in
    Appendix B.
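    As a rough illustration of the data these two classes carry, the sketch below uses plain
    structs. The names Point, Edge, and makeEdge here are simplified stand-ins chosen for this
    example, not the class definitions from Appendix B, and std::hypot is used in place of the
    sqrt/pow expression that appears in the appendix.

```cpp
#include <cmath>

// Simplified stand-in for the report's point class: a node id plus coordinates.
struct Point {
    int id;      // node id
    double x;    // x coordinate
    double y;    // y coordinate
};

// Simplified stand-in for the report's edge class: two points and their distance.
struct Edge {
    Point a;          // first endpoint
    Point b;          // second endpoint
    double distance;  // Euclidean distance between a and b
};

// Build an edge from two points, computing the length once up front.
inline Edge makeEdge(const Point& a, const Point& b) {
    return Edge{a, b, std::hypot(b.x - a.x, b.y - a.y)};
}
```

    Storing the distance on the edge means the comparison step only has to read a number per
    edge instead of recomputing it from coordinates each time.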

    2.3 FUNCTIONS

    There are four major functions in our code: the generate, modify, load, and algorithm
    functions. The generate function is discussed in Section 3.1. Figure 3 shows each of these
    functions being called in Main.cpp and the parameters each one takes.

    Figure 3. Function Call in Main.cpp.

    The modify function, shown in Figure 4, requires an input file name (s) and an output file
    name (d). It reads the data from file s and writes it to file d in a format that the load
    function can accept.
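    The conversion can be sketched on a single record. This is a minimal illustration, not the
    modify function itself: the name cleanRecord is hypothetical, and the sketch assumes a raw
    record looks like "A(1) 5, 4 A(4) 10, 11" with whitespace between the two nodes. It drops
    the same four characters that the appendix code filters out ('A', '(', ')' and ',').

```cpp
#include <string>

// Convert one raw record such as "A(1) 5, 4 A(4) 10, 11" into the
// whitespace-separated numeric form "1 5 4 4 10 11".
// Assumption: the only decoration in a raw record is 'A', '(', ')' and ','.
inline std::string cleanRecord(const std::string& raw) {
    std::string clean;
    clean.reserve(raw.size());
    for (char ch : raw) {
        if (ch == 'A' || ch == '(' || ch == ')' || ch == ',') {
            continue;  // drop the label letter and the punctuation
        }
        clean.push_back(ch);  // keep digits and the separating whitespace
    }
    return clean;
}
```

    After cleaning, every record is a plain sequence of numbers that can be read with ordinary
    stream extraction.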


    Figure 4. Modify Function.

    The load function requires an input file name (s) and a list of edges (d). Shown in
    Figure 5, it loads the data from file s into the edge list d, which is passed to the
    function by reference.
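    A minimal sketch of this loading step follows, assuming the clean format stores six
    numbers per record (node id, x, y, connected node id, x, y). BondRecord, loadBonds, and
    loadBondsFromText are hypothetical names for this example, and a vector stands in for the
    edge list used by the report.

```cpp
#include <cmath>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One loaded bond: the two node ids and the distance between the nodes.
struct BondRecord {
    int fromId;
    int toId;
    double length;
};

// Read records of the assumed form "id x y id x y" until the input runs out.
inline std::vector<BondRecord> loadBonds(std::istream& in) {
    std::vector<BondRecord> bonds;
    int idA, idB;
    double x1, y1, x2, y2;
    // The extraction doubles as the loop test, so a truncated final
    // record is discarded rather than stored half-filled.
    while (in >> idA >> x1 >> y1 >> idB >> x2 >> y2) {
        bonds.push_back(BondRecord{idA, idB, std::hypot(x2 - x1, y2 - y1)});
    }
    return bonds;
}

// Convenience wrapper for in-memory text, handy for small experiments.
inline std::vector<BondRecord> loadBondsFromText(const std::string& text) {
    std::istringstream in(text);
    return loadBonds(in);
}
```

    Passing an std::ifstream opened on a clean protein file to loadBonds would exercise the
    same code path as the in-memory wrapper.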


    Figure 5. Load Function.

    The algorithm function, shown in Figure 6, is at the heart of our solution. It takes an
    output file name (s), two lists of edges (d and f), and a delta (x), the tolerance that the
    source code calls sigma. It compares each edge length in list d with each edge length in
    list f, and if the difference between the two lengths is within the delta x value, the
    program writes the matching edges from d and f to the file s.
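    The comparison itself can be sketched in a few lines. This is an illustration under
    simplifying assumptions rather than the function from Appendix B: matchLengths is a
    hypothetical name, the two edge lists are reduced to vectors of lengths, and the matches
    are returned as index pairs instead of being written to a file.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Return every index pair (i, j) for which the lengths d[i] and f[j]
// differ by at most sigma.
inline std::vector<std::pair<std::size_t, std::size_t>>
matchLengths(const std::vector<double>& d,
             const std::vector<double>& f,
             double sigma) {
    std::vector<std::pair<std::size_t, std::size_t>> matches;
    for (std::size_t i = 0; i < d.size(); ++i) {
        for (std::size_t j = 0; j < f.size(); ++j) {
            if (std::fabs(d[i] - f[j]) <= sigma) {
                matches.emplace_back(i, j);  // lengths agree within tolerance
            }
        }
    }
    return matches;
}
```

    Every edge in the first list is compared with every edge in the second, so the number of
    comparisons is the product of the two list sizes.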

    Figure 6. Algorithm Function.

    3 RESULTS

    Our team has successfully generated two separate protein files (protein1 and protein2)
    which mimic actual protein files. We then used these files to compare bond lengths. Our
    current solution can perform this comparison on any protein file supplied to it, with no
    alterations.


    3.1 GENERATED PROTEIN FILES

    The generated protein files are created by the generate function. It requires an output
    file name (s), an amino acid length (x), and a number of bonds (y), and it generates a
    random protein of x amino acids. Each amino acid is connected to anywhere from 1 to y
    nodes, and the function writes the data in a specified format to a file named s.
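    The generation step might look roughly like the sketch below. Several details are
    assumptions made for this example only: the names GeneratedNode and generateProtein, the
    coordinate range 0 to maxCoord, the rule that a node never bonds to itself, and the use of
    std::mt19937 with an explicit seed in place of the srand/rand calls in Main.cpp, which
    keeps runs reproducible.

```cpp
#include <random>
#include <vector>

// One generated node with the ids of the nodes it is connected to.
struct GeneratedNode {
    int id;                  // 1-based node id
    int x;                   // random x coordinate
    int y;                   // random y coordinate
    std::vector<int> bonds;  // ids of connected nodes (repeats are possible)
};

// Generate `length` nodes, each connected to between 1 and maxBonds other nodes.
// Returns an empty result when the request cannot be satisfied.
inline std::vector<GeneratedNode> generateProtein(int length, int maxBonds,
                                                  unsigned seed, int maxCoord = 100) {
    if (length < 2 || maxBonds < 1 || maxCoord < 0) {
        return {};
    }
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> coord(0, maxCoord);
    std::uniform_int_distribution<int> bondCount(1, maxBonds);
    std::uniform_int_distribution<int> partner(1, length - 1);

    std::vector<GeneratedNode> nodes;
    nodes.reserve(static_cast<std::size_t>(length));
    for (int id = 1; id <= length; ++id) {
        GeneratedNode node{id, coord(rng), coord(rng), {}};
        const int count = bondCount(rng);
        for (int k = 0; k < count; ++k) {
            int other = partner(rng);  // drawn from 1 .. length-1
            if (other >= id) {
                ++other;               // shift past this node's own id
            }
            node.bonds.push_back(other);
        }
        nodes.push_back(node);
    }
    return nodes;
}
```

    Writing each node and its partners out in the raw format would then reproduce the kind of
    file that the modify function expects as input.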

    Figure 7. Generate Function in Main.

    Figure 8. Generate Function.

    When the protein files are generated by the generate function, the files list the node,
    its coordinates, the connecting node number, and the connecting node's coordinates. For
    example, the beginnings of our two generated protein files, Protein1 and Protein2, are
    shown in Figures 9 and 10.

    Figure 9. Protein1_clean.


    Figure 10. Protein2_raw.

    3.2 OUTPUT

    The output for this project is the comparison of lengths performed on our generated
    protein test data. Both the raw and the clean files list node data in pairs: the first
    entry of a pair is the node itself, and the second entry is a node it is connected to. In
    the raw file, for example, the records

    A(1) 5, 4 A(4) 10, 11
    A(1) 5, 4 A(5) 13, 15

    mean that node 1 is connected to node 4 and to node 5.

    Figure 11. Protein1_raw.


    Figure 12. Protein2_clean.


    The matching file lists all matching edges, and is shown in Figure 13.

    Figure 13. Protein_matchs.

    4 CONCLUSIONS AND RECOMMENDED FURTHER RESEARCH

    Our current solution finds all matching bond lengths between two given protein files, but
    it does not yet actively search for protein keys. Further development would therefore
    implement an algorithm that enables this module to identify protein keys. This would,
    however, require substantial additional research to arrive at a method precise enough to
    make such identifications reliably.


    5 ACKNOWLEDGEMENTS

    Our team would like to thank Professor Jay Morris for the concept of this project and for
    his invaluable assistance throughout the development of this module. We would also like to
    thank the Computer Science Department for providing the supercomputer for us to work with
    during this course. A special thank you also goes to Tihomir Hristov for the training and
    initial setup which he provided.


    6 APPENDICES

    A REFERENCES

    [1] Carter, J.S. (2004, November 02). Amino acids and proteins. Retrieved from
        http://biology.clc.uc.edu/courses/bio104/protein.htm

    [2] CMBI. (2010, February 12). Amino acid. Retrieved from
        http://wiki.cmbi.ru.nl/index.php/Amino_acid

    [3] Vriend, G., & Gelder, C.V. (n.d.). Intro bioinformatics. Retrieved from
        http://swift.cmbi.ru.nl/teach/B1M/

    [4] Yahoo Stories. (2001, May 16). Protein key to new smoking, Alzheimer's drugs.
        Retrieved from http://cmbi.bjmu.edu.cn/news/0105/97.htm

    B SOURCE CODE AND DOCUMENTATION

    Main.cpp

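    #include <iostream>  // cout, cin
    #include <fstream>   // fstream, ifstream, ofstream
    #include <list>      // list
    #include <cmath>     // sqrt, pow, fabs
    #include <cstdlib>   // srand, rand
    #include <ctime>     // time
    #include <cctype>    // toupper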

    using namespace std;

    #include "point.h"

    #include "Edge.h"

    #include "function.h"

    int main()

    {

    list<Edge> protein1;

    list<Edge> protein2;

    list<Edge>::iterator pitr;

    int a,b,c,d;

    double sigma;

    char str;

    srand(time(NULL));

    cout>a>>b;

    cout>c>>d;

    coutsigma;

    cout


    LoadProteinFile("protein2_clean.txt",protein2);

    algorithm("protein_matchs.txt",protein1,protein2,sigma);

    cout> str;

    str = toupper(str);

    if(str=='Y'){

    cout


    Function.h

    struct node{

    int label,nlabel;

    int x,nx;

    int y,ny;

    };

    //////////////////////////////

    //functions

    //////////////////////////////

    void GenerateProteinFile(char str[256],int array_size,int node_connection){

    int a=0;

    int b=0;

    int c=0;

    int d=0;

    int e=0;

    int f=0;

    node A[array_size];

    //USER INPUT

    fstream fout(str,ios::out);


    array_size++;

    //generate array of nodes

    for(int z=0; z


    fout


    while (fin.good())

    {

    c = &b;

    fin.get(*c);

    a=*c;

    if(fin.good())

    {

    if(a!=65 && a!=40 && a!=41 && a!=44)//ascii values for A ( ) , #

    {

    if(a==32)//ascii value for space

    {

    fout


    fin.close();

    fout.close();

    }

    /////////////////////////////////////////////////////////////////////

    void LoadProteinFile(const char infileName[256], list<Edge> &protein){

    //variable declaration

    ifstream fin;

    ofstream fout;

    char * c;

    //USER INPUT

    fin.open(infileName);

    int idA, idB, index = 0;
    double x1, y1, x2, y2, distance;

    // Each record holds a node (id, x, y) followed by the node it is bonded to.
    // Reading the whole record in the loop condition stops cleanly at end of
    // file and never stores a partially read edge.
    while (fin >> idA >> x1 >> y1 >> idB >> x2 >> y2) {
        // Stack objects suffice because Edge copies the point data it is given.
        Point a(idA, x1, y1);
        Point b(idB, x2, y2);

        // Euclidean distance between the two bonded nodes
        distance = sqrt(pow((x2 - x1), 2) + pow((y2 - y1), 2));

        Edge amino(index, distance, &a, &b);
        index++;
        protein.push_back(amino);
    }

    }

    //////////////////////////////////////////////////////

    void DisplayProtein(list<Edge> &protein){

    list<Edge>::iterator pitr;

    int i;

    double x,y;

    for(pitr=protein.begin(); pitr!=protein.end();pitr++)

    {

    pitr->display();

    }

    }

    ////////////////////////////////////////////////////////

    void algorithm(const char outfileName[256],list<Edge> &protein1,list<Edge> &protein2, double sigma){

    double delta;

    fstream fout(outfileName,ios::out);

    list<Edge>::iterator protein1_itr;

    list<Edge>::iterator protein2_itr;

    for(protein1_itr=protein1.begin(); protein1_itr!=protein1.end();protein1_itr++) {

    for(protein2_itr=protein2.begin(); protein2_itr!=protein2.end();protein2_itr++) {

    delta=fabs(protein1_itr->getDistance()-protein2_itr->getDistance());

    if(sigma>=delta){

    fout


    fout


    //Operator=

    private:

    int name;

    double x;

    double y;

    };

    Edge.h

    class Edge {

    public:

    Edge();

    Edge(int i,double dist, Point *x, Point *y){

    index = i;

    distance = dist;

    a.setX(x->getX());

    a.setY(x->getY());

    a.setName(x->getName());

    b.setX(y->getX());

    b.setY(y->getY());

    b.setName(y->getName());

    }

    int getAname(){return a.getName();}

    double getAx(){return a.getX();}

    double getAy(){return a.getY();}

    int getBname(){return b.getName();}


    double getBx(){return b.getX();}

    double getBy(){return b.getY();}

    double getDistance(){return distance;}

    void display(){cout


    asked if they would like the output printed to the screen. If no is selected, the output

    may be found in the text file.