z-Bioinformatics

  • 8/3/2019 z-Bioinformatics

    Contents

    What is bioinformatics?

    What is computational biology?

    Data mining.

    Applications of data mining.

    Net-accessible resources.

    Sequence analysis.

    What can be done with sequence analysis?

    Identification of protein primary sequence from DNA sequence.

    Tips for searching databases.

    The process of evolution.

    Principles and their importance.

    Conclusion.

    What is bioinformatics?

    Bioinformatics describes any use of computers to handle biological information. In practice, the definition used by most people is narrower: to them, bioinformatics is a synonym for computational molecular biology, the use of computers to characterize the molecular components of living things.

    What is data mining?

    Data mining is the process by which testable hypotheses are generated about the function or structure of a gene or protein of interest by identifying similar sequences in better-characterized organisms.

    Applications of data mining: these include fraud detection, credit card scoring and personal-profile marketing. Skillful interpretation of data can enhance customer relations, direct marketing, trend analysis, financial market forecasting and international criminal investigations.
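
    The sequence-based idea above, proposing a function for an uncharacterized sequence from its most similar, better-characterized relative, can be sketched in Python. This is a minimal illustration only: it swaps in a simple shared k-mer (Jaccard) score instead of a real alignment or database-search tool, and the entries, their annotations and the helper names `kmers`, `jaccard` and `best_annotation` are hypothetical.

```python
from __future__ import annotations


def kmers(seq: str, k: int = 3) -> set[str]:
    """Return the set of overlapping substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by all distinct k-mers (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_annotation(query: str, annotated: dict[str, tuple[str, str]], k: int = 3):
    """Return (identifier, known function, score) of the most similar entry.

    `annotated` maps an identifier to (sequence, known function). The top hit
    only suggests a testable hypothesis about the query; it proves nothing.
    """
    q = kmers(query, k)
    scored = [
        (name, function, jaccard(q, kmers(seq, k)))
        for name, (seq, function) in annotated.items()
    ]
    return max(scored, key=lambda item: item[2])


# Hypothetical, made-up entries standing in for better-characterized proteins.
database = {
    "seqA": ("MKTAYIAKQRQISFVKSHFSRQ", "putative kinase"),
    "seqB": ("MADEEKLPPGWEKRMSRSSGRV", "putative binding protein"),
}
name, function, score = best_annotation("MKTAYIAKQRQLSFVKSHF", database)
print(name, function, round(score, 2))
```

    The result is a hypothesis to be tested in the laboratory, which is exactly the role the definition above gives to data mining.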

    Net-accessible resources

    Two main World Wide Web sites provide information on data mining:

    The Data Mine: this includes pointers to FTP-able papers and two large data mining bibliographies. It attempts to provide links to as much of the available data mining information on the net as is possible. Run by Pryke at the University of Birmingham.

    Knowledge Discovery Mine: the Knowledge Discovery Mine has the KDD FAQ, a comprehensive catalog of tools for discovery in data, as well as back issues of the KDD-Nugget mailing list. Run by leading KDD researcher Gregory Piatetsky-Shapiro.

    What is sequence analysis?

    Sequence analysis is the process of trying to find out something about a nucleotide or amino acid sequence using in silico biology techniques. You may have sequenced a gene yourself and wish to learn what the long string of letters representing bases actually codes for. You may want to confirm that you have indeed cloned a gene successfully, or you might want to learn about a DNA sequence that you know absolutely nothing about. You may want to know whether a worm has a protein similar to a human one.
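
    As a minimal sketch of the cloning check mentioned above, the Python below compares a sequenced clone with the expected sequence position by position and reports percent identity and the mismatch positions. It assumes the two sequences are already aligned and of equal length (no insertions or deletions); a real check would normally use an alignment program first. The function name `compare_clone` and the example sequences are hypothetical.

```python
from __future__ import annotations


def compare_clone(expected: str, observed: str) -> tuple[float, list[int]]:
    """Compare two pre-aligned, equal-length DNA sequences.

    Returns (percent identity, 1-based positions of mismatches).
    """
    if len(expected) != len(observed):
        raise ValueError("sequences must be aligned and of equal length")
    if not expected:
        raise ValueError("sequences must not be empty")
    mismatches = [
        i + 1
        for i, (a, b) in enumerate(zip(expected.upper(), observed.upper()))
        if a != b
    ]
    identity = 100.0 * (len(expected) - len(mismatches)) / len(expected)
    return identity, mismatches


identity, mismatches = compare_clone("ATGGCCATTGTAATG", "ATGGCCATTGTTATG")
print(f"{identity:.1f}% identical; mismatches at {mismatches}")
# prints: 93.3% identical; mismatches at [12]
```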

    What can be done now with sequence analysis?

    Given the pessimistic view of sequence analysis presented in the previous section, why do we even bother with it? In the first place, the attempt to find methods for successful sequence analysis is a research goal in its own right, one whose potential rewards are so vast as to make it of the first importance. In the second place, although there are many things that sequence analysis cannot yet do, there are many very worthwhile things that can currently be done with it, and these are summarized in this section.

    Identification of protein sequence from DNA sequence

    The computer programs used to infer protein sequence from DNA sequence provide information that can help you approach a solution. For example, if you are trying to find out where in a DNA sequence a protein is encoded, it is very useful to know which peptides would be encoded by all six reading frames (three on each strand). A stretch containing many stop codons is a poor candidate for encoding a protein. This will not tell you with certainty where the protein sequence starts and stops, but it will help you guess where that might occur. Programs exist for doing this. In fact, there are many factors you can use to guess where in a DNA sequence a protein sequence might reside: the expected codon bias, the presence of characteristic sequences representing regulatory signals in the DNA, and so forth. One family of programs integrates a variety of these approaches and, using either explicit algorithms or trained neural nets, makes a prediction.
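
    The six-frame idea above can be sketched in Python: translate the three forward frames and the three frames of the reverse complement with the standard genetic code, then count stop codons (`*`) in each frame. This is a simplified illustration, not one of the prediction programs the text refers to; it ignores codon bias and regulatory signals, and the function names are hypothetical.

```python
from __future__ import annotations

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...) and '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; codons containing other letters become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict[str, str]:
    """Frames +1, +2, +3 (given strand) and -1, -2, -3 (reverse complement)."""
    forward = dna.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
for frame, peptide in six_frames(seq).items():
    # A long peptide with few stop codons ('*') is the better coding candidate.
    print(frame, peptide, "stop codons:", peptide.count("*"))
```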

    Tips for searching databases

    • Use the latest database version.
    • Use BLAST first, then a finer tool (FASTA, search, BLITZ, sweep, block et al.).
    • Search both strands when using FASTA. This is done automatically in the GCG program.
    • Translate the sequence where relevant.
    • Search the 6-frame translation of the DNA database.
    • EO200 for protein
    • If the query has repeated segments, delete them and repeat the search.
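
    To illustrate why both strands should be searched, the Python sketch below looks for an exact pattern in a DNA sequence and in its reverse complement, reporting minus-strand hits in the coordinates of the given strand. Real tools such as FASTA and BLAST score approximate, gapped matches; exact matching here is a deliberate simplification, and `find_all` and `find_both_strands` are hypothetical helpers.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> list[int]:
    """0-based start positions of every exact, possibly overlapping, match."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def find_both_strands(dna: str, pattern: str) -> list[tuple[str, int]]:
    """Return (strand, 0-based start on the given strand) for each exact match."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    dna, pattern = dna.upper(), pattern.upper()
    hits = [("+", pos) for pos in find_all(dna, pattern)]
    # A match starting at p in the reverse complement covers positions
    # len(dna) - p - len(pattern) .. len(dna) - p - 1 of the given strand.
    n, m = len(dna), len(pattern)
    hits += [("-", n - p - m) for p in find_all(reverse_complement(dna), pattern)]
    return hits


print(find_both_strands("ATGCTTTTGCAT", "ATGC"))  # [('+', 0), ('-', 8)]
```

    A search of the given strand alone would report only the first hit; the second is found only because the reverse complement was searched too.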

    The process of evolution

    Homologous proteins arise from mutations in a common ancestral coding gene. Through the process of gene divergence, some mutations have been accepted by natural selection because they preserved the folding and function of the encoded protein. This can be represented by a schematic tree in which several genes descend from a common ancestral gene.

    Principles and their importance

    Sensitivity versus specificity

    There are different ways to estimate the similarity between two sequences, which allows us to adjust the sensitivity and specificity of the results when performing a sequence database search with a query sequence. If the sensitivity is high, more distantly related sequences, such as the S. griseus protease, will be retrieved.

    Continued.

    However, unrelated sequences such as the endochitinase will also be returned. On the other hand, if the specificity is high, only closely related sequences will be returned, but in this case distantly related ones will be missed. Thus, a researcher has to know how to manage this trade-off, and this is one more reason why biologists should not treat software as a black box.
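
    The trade-off described above can be made concrete. Given database hits whose true relationship to the query is known, sensitivity is the fraction of truly related sequences that are retrieved, and specificity is the fraction of unrelated sequences that are correctly left out. The Python sketch below computes both at several score cut-offs; the scores and labels are invented for illustration and are not taken from the S. griseus protease or endochitinase example.

```python
from __future__ import annotations


def sensitivity_specificity(hits: list[tuple[int, bool]], threshold: int) -> tuple[float, float]:
    """hits: (score, is_related) pairs; a hit is retrieved if score >= threshold.

    sensitivity = TP / (TP + FN)  related sequences that were retrieved
    specificity = TN / (TN + FP)  unrelated sequences that were rejected
    """
    tp = sum(1 for s, rel in hits if rel and s >= threshold)
    fn = sum(1 for s, rel in hits if rel and s < threshold)
    tn = sum(1 for s, rel in hits if not rel and s < threshold)
    fp = sum(1 for s, rel in hits if not rel and s >= threshold)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


# Hypothetical search scores; True means the hit is truly related to the query.
hits = [(95, True), (80, True), (42, True), (38, False), (30, True), (25, False), (12, False)]

for threshold in (20, 35, 60):
    sens, spec = sensitivity_specificity(hits, threshold)
    print(f"cut-off {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# cut-off 20: sensitivity 1.00, specificity 0.33
# cut-off 35: sensitivity 0.75, specificity 0.67
# cut-off 60: sensitivity 0.50, specificity 1.00
```

    Lowering the cut-off retrieves the distant relative scoring 30 but also admits unrelated hits; raising it rejects every unrelated hit but misses related ones.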

    Window approaches

    In particular, when comparing two sequences, a dot matrix can be used, in which one sequence is written out horizontally and the other vertically. A dot is placed at the intersection of a row and a column for each matched pair of letters. If the frequency of matched letters between two sequences is high, the background noise is high. This is particularly true of DNA sequences, which are composed of only four building blocks: with equal base frequencies, about one letter pair in four matches by chance. To reduce the noise, one can place a dot only when several consecutive letters match. The number of consecutive letters evaluated together is called the window size.
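
    The windowed dot matrix described above can be sketched in Python. A dot is placed at (i, j) only when the `window` letters starting at position i of the first sequence exactly match the `window` letters starting at position j of the second; with `window=1` this reduces to the plain letter-by-letter matrix. Many dot-plot programs also let a window score as a match when most, but not all, of its letters agree; requiring an exact match is a simplifying assumption here, and `dot_matrix` and `show` are hypothetical names.

```python
from __future__ import annotations


def dot_matrix(seq1: str, seq2: str, window: int = 1) -> list[list[bool]]:
    """Dot matrix with seq1 across the top and seq2 down the side.

    Cell [j][i] is True when the `window` letters starting at seq1[i]
    exactly match the `window` letters starting at seq2[j].
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    cols = len(seq1) - window + 1
    rows = len(seq2) - window + 1
    return [
        [seq1[i:i + window] == seq2[j:j + window] for i in range(cols)]
        for j in range(rows)
    ]


def show(matrix: list[list[bool]]) -> None:
    """Print '*' for a dot and '.' for an empty cell."""
    for row in matrix:
        print("".join("*" if cell else "." for cell in row))


seq = "GATTACA"
show(dot_matrix(seq, seq, window=1))  # off-diagonal dots are chance matches
print()
show(dot_matrix(seq, seq, window=3))  # only the main diagonal survives
```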

    Efficient use of programs

    When performing a database search, a researcher should know that the results can be improved. A researcher who knows the principles, such as the use of windows, can increase the sensitivity by decreasing the window size parameter, which improves the program's ability to recognize distantly related sequences. Alternatively, the researcher can increase the specificity by increasing the window size parameter.
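
    The effect of the window size parameter can be checked numerically. The Python sketch below counts dots for a range of window sizes. Because every dot at window w + 1 implies a dot in the same cell at window w, the count can only stay the same or fall as the window grows. This is why small windows favour sensitivity (along with noise) and large windows favour specificity. The two sequences are invented for illustration, and `count_dots` is a hypothetical helper.

```python
from __future__ import annotations


def count_dots(seq1: str, seq2: str, window: int) -> int:
    """Number of (i, j) cells where the window-length segments match exactly."""
    cols = len(seq1) - window + 1
    rows = len(seq2) - window + 1
    return sum(
        seq1[i:i + window] == seq2[j:j + window]
        for i in range(cols)
        for j in range(rows)
    )


# Hypothetical pair: the second sequence carries two point substitutions.
a = "ATGGCCATTGTAATGGGCCGC"
b = "ATGGCGATTGTAATGGGACGC"

for w in range(1, 7):
    print(f"window {w}: {count_dots(a, b, w)} dots")
```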

    Conclusion

    It is important for a researcher who wants to use the programs available for sequence analysis to acquire reliable knowledge of biocomputing. Knowing the capabilities and the drawbacks of the programs will help us use them in a more accurate and efficient way.