
Bioinformatics Lab 1


UNIVERSITI TEKNOLOGI MARA

FAKULTI KEJURUTERAAN KIMIA

BIOINFORMATICS (CBE 647)

NAME : NORAFIQAH BINTI AZMAN

STUDENT NO. : 2010872226

GROUP : EH 222 8A

EXPERIMENT : LAB 1

DATE PERFORMED : 19TH MARCH 2014

SEMESTER : 8

PROGRAMME / CODE : CBE 647 (BIOINFORMATICS)

SUBMIT TO : DR. TAN HUEY LING

No Title Marks

1 Abstract / Summary

2 Introduction

3 Aims

4 Theory

5 Methodology

6 Results

7 Discussions

8 Conclusion

Remarks:

Checked by : Recheck by :

.. .......................... .............................

Date : Date :


ABSTRACT

In this exercise, students were introduced to the basics of bioinformatics. Several bioinformatics websites were introduced, such as GenBank, KEGG, UniProtKB, OMIM, GO, ORF Finder and NCBI. These databases provide the information needed to complete the tasks given. The exercise has four parts: finding public biological databases; NCBI Entrez and searching biological databases; determining the Open Reading Frame (ORF) of the Hemoglobin Alpha 2 gene; and sequence extraction.

INTRODUCTION

To familiarize students with bioinformatics databases on the web, four laboratory exercises were performed.

The first part is finding public biological databases, such as GenBank, KEGG, UniProtKB, OMIM and GO. These databases can be accessed through NAR (Nucleic Acids Research). NAR Online contains hotlinks to all of the databases in its compilation as well as brief summaries of their content. The second part is NCBI Entrez and searching biological databases. In this part, students were assigned to investigate the human triose phosphate isomerase 1 gene, whose enzyme catalyses the conversion of dihydroxyacetone phosphate to glyceraldehyde-3-phosphate in glycolysis. To perform this task, students had to visit the NCBI website and open the “All Databases” page.

The third part is determining the Open Reading Frame (ORF) and the gene product encoded by the ORF. Students were assigned to determine the start and stop codons of the Hemoglobin Alpha 2 (HBA2) gene, using the GenBank database.

Lastly, the fourth part is sequence extraction. In this section, students were given step-by-step guidelines for finding out whether the nucleotide sequences ara h2 and opsins are related to peanut allergy or to an eye gene related to long-wave sensitivity and colour blindness.

OBJECTIVE

The main goal of this laboratory exercise is to give students practical experience using the NCBI interface. It also gives students an early idea of how to navigate the NCBI website and perform basic and advanced searches on it.

THEORY

Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analyses. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics.

Triose phosphate isomerase (TPI) plays an important role in glycolysis and is essential for efficient energy production. TPI has been found in nearly every organism searched for the enzyme, including animals such as mammals and insects, as well as fungi, plants and bacteria.

In molecular genetics, an open reading frame (ORF) is the part of a reading frame that contains no stop codons; in practice it runs from a start codon (usually AUG) to the first in-frame stop codon (UAA, UAG or UGA). The transcription termination pause site is located after the ORF, beyond the translation stop codon, because if transcription stopped before the stop codon, an incomplete protein would be made during translation.
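To make the definition concrete, the sketch below scans one strand of a sequence for ORFs, taking an ORF to run from an AUG start codon to the first in-frame UAA, UAG or UGA. It is a minimal illustration that assumes the standard genetic code and AUG-only starts; the function name `find_orfs` and its `min_codons` parameter are hypothetical and are not part of any tool used in this lab.

```python
# Minimal ORF scan on one strand, assuming the standard genetic code:
# an ORF starts at AUG and ends at the first in-frame UAA, UAG or UGA.
START = "AUG"
STOPS = {"UAA", "UAG", "UGA"}


def find_orfs(rna: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return sorted (start, end) pairs, 0-based, for ORFs in frames 1-3.

    `end` is one past the stop codon, so rna[start:end] includes both the
    start and the stop codon. `min_codons` counts codons before the stop.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA or RNA input
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(rna):
            if rna[i:i + 3] != START:
                i += 3
                continue
            # Walk codon by codon from the start codon to an in-frame stop.
            j = i
            while j + 3 <= len(rna) and rna[j:j + 3] not in STOPS:
                j += 3
            if j + 3 > len(rna):
                break  # no in-frame stop downstream, so no closed ORF here
            if (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3))
            i = j + 3  # resume after the stop; nested AUGs share this stop
    return sorted(orfs)


if __name__ == "__main__":
    seq = "GGAUGAAAUAAGG"
    for start, end in find_orfs(seq):
        print(start, end, seq[start:end])  # 2 11 AUGAAAUAA
```

Resuming the scan after each stop codon means an AUG lying inside an ORF is not reported as a separate, shorter ORF with the same stop; this matches the usual convention of reporting the longest ORF per stop codon.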


METHODOLOGY


RESULT & DISCUSSION

For question A) in Finding Public Biological Databases, we opened the NAR database page (http://www.oxfordjournals.org/nar/database/a/). A screenshot of the website is shown below.

Nucleic Acids Research (NAR) is a journal that publishes the results of research on the physical, chemical, biochemical and biological aspects of nucleic acids and of the proteins involved in nucleic acid metabolism and/or interactions. Its database page lists the databases in the NAR compilation together with summaries of selected databases.

First is GenBank (Nucleotide Sequence Databases).

GenBank contains publicly available nucleotide sequences for more than 300 000 organisms. NCBI, which hosts GenBank, aims to help researchers understand the fundamental molecular and genetic processes that control health and disease. For example, to find out more about DNA and RNA, click on DNA&RNA on the left side of the site; this links to all the databases related to that category. In this way, researchers can find the information they need quickly and easily.


Second is KEGG (Metabolic Pathways).

KEGG is a database resource for understanding high-level functions and utilities of the biological system. Its main page has organism-specific entry points: if researchers know the KEGG organism code, they can enter it and be taken directly to the page for that organism. For example, entering the code hsa, which stands for Homo sapiens (human), opens a page with the genome information for that organism, such as pathway maps, BRITE hierarchies, modules, BLAST and taxonomy.
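The organism-code lookup described above can also be scripted. The sketch below assumes KEGG's public REST interface at https://rest.kegg.jp and its `list` operation, which returns one tab-separated entry per line; the helper names are hypothetical, and the network call is kept in its own function so that URL building and parsing can be checked offline.

```python
# Sketch: list the pathway maps for a KEGG organism code such as "hsa".
# Assumes the KEGG REST "list" operation (https://rest.kegg.jp/list/pathway/<org>),
# whose plain-text reply has one "<entry id>\t<description>" pair per line.
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"


def pathway_list_url(org_code: str) -> str:
    """Build the URL that lists all pathway maps for one organism code."""
    return f"{KEGG_REST}/list/pathway/{org_code}"


def parse_kegg_list(text: str) -> dict[str, str]:
    """Turn KEGG's tab-separated list output into {entry_id: description}."""
    entries = {}
    for line in text.splitlines():
        if "\t" in line:
            entry_id, description = line.split("\t", 1)
            entries[entry_id] = description
    return entries


def fetch_pathways(org_code: str) -> dict[str, str]:
    """Download and parse the pathway list (requires network access)."""
    with urlopen(pathway_list_url(org_code)) as response:
        return parse_kegg_list(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(pathway_list_url("hsa"))  # https://rest.kegg.jp/list/pathway/hsa
```

Calling `fetch_pathways("hsa")` would return the human pathway list; its exact contents depend on the current KEGG release.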

Third is UniProtKB (Proteins).

This site provides the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information. For example, to look up amylase, we entered "amylase" as the search term and were taken to a page listing all amylase protein entries. At the time of the search, UniProtKB returned 80 657 results for amylase.
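A keyword search like the amylase example can be expressed as a URL. The sketch below assumes the current UniProt REST API (https://rest.uniprot.org, which replaced the interface available in 2014) with its `query`, `format` and `size` parameters, and assumes the total hit count is reported in the `X-Total-Results` response header. The helper names are hypothetical, and a count retrieved today will differ from the 80 657 results recorded in this exercise, since the database grows with each release.

```python
# Sketch: keyword search against UniProtKB via the UniProt REST API.
# Assumes https://rest.uniprot.org/uniprotkb/search with query/format/size
# parameters and an X-Total-Results header carrying the total hit count.
from urllib.parse import urlencode
from urllib.request import urlopen

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def uniprot_search_url(query: str, fmt: str = "tsv", size: int = 25) -> str:
    """Build a search URL; `size` caps how many entries come back per page."""
    params = {"query": query, "format": fmt, "size": size}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


def uniprot_hit_count(query: str) -> int:
    """Ask for a single entry and read the total from the response header."""
    with urlopen(uniprot_search_url(query, size=1)) as response:
        return int(response.headers["X-Total-Results"])


if __name__ == "__main__":
    print(uniprot_search_url("amylase"))
```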


Fourth is OMIM (Online Mendelian Inheritance in Man).

OMIM provides a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily. For example, searching for amylase returns a list of 33 amylase genes.

Lastly is GO (Gene Ontology).

GO is a major bioinformatics initiative to standardize the representation of genes and gene product attributes across species and databases.


Part B (NCBI Entrez and Searching Biological Databases). For question B1), we used the NCBI website to determine which of the gene, protein and nucleotide databases gives the best search results for triosephosphate isomerase.

For query gene:

For query protein:

For query nucleotide:

Based on these queries, the gene query was the most effective, because its results specify both the name of the gene and the organism. From the data obtained, the gene is TPI1 (triosephosphate isomerase 1) and the organism is Homo sapiens (human).
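The B1 comparison can be reproduced with NCBI's E-utilities, which expose Entrez searches as plain URLs. The sketch assumes the ESearch endpoint (`esearch.fcgi`) with its `db`, `term` and `retmax` parameters, and reads the hit count from the top-level `<Count>` element of the XML reply; note that the Nucleotide database is called `nuccore` in E-utilities. The helper names and the example search term are illustrative choices, not the exact queries used in the lab.

```python
# Sketch: compare Entrez databases for the same search term via ESearch.
# Assumes the NCBI E-utilities ESearch endpoint and its XML reply format.
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for one Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def esearch_count(xml_text: str) -> int:
    """Read the total hit count from the top-level <Count> element."""
    return int(ET.fromstring(xml_text).findtext("Count"))


def count_hits(db: str, term: str) -> int:
    """Run the search over the network and return the number of hits."""
    with urlopen(esearch_url(db, term)) as response:
        return esearch_count(response.read().decode("utf-8"))


if __name__ == "__main__":
    term = "triosephosphate isomerase AND human[Organism]"
    for db in ("gene", "protein", "nuccore"):
        print(db, esearch_url(db, term))
```

When calling `count_hits` in a loop, keep to NCBI's rate limit of about three requests per second without an API key.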


For question B2), we have to find the RefSeq accession numbers for this gene in mRNA form and in protein form.

From this page, we can see that the RefSeq accession number for this gene in mRNA form is NM_000365.5 and in protein form is NP_000356.1. For question B3), we have to determine which chromosome this gene lies on.

For this question, the gene lies on chromosome 12. For question B4), we have to determine how many amino acids are present in the protein encoded by this gene and identify the first five.

The first five amino acids obtained are MAPSR (methionine, alanine, proline, serine and arginine). For question B5), we have to determine the authors of this paper and its unique PubMed ID.

For this question, the authors are Watanabe H, Seino T and Sato Y and the PubMed ID is 15358119.
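Questions B2 and B4 can also be checked programmatically. The sketch below decodes a RefSeq accession into its prefix, number and version using NCBI's documented prefixes (for example NM_ for mRNA and NP_ for protein), builds an EFetch URL that returns a record as FASTA text, and counts the residues in a FASTA record. The function names are hypothetical, and the FASTA string in the usage lines is a made-up toy record, not the real TPI1 protein.

```python
# Sketch: decode RefSeq accessions and summarise a protein FASTA record.
# Assumes the NCBI E-utilities EFetch endpoint with rettype=fasta.
from typing import Optional, Tuple
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# A few of NCBI's RefSeq accession prefixes and the molecules they denote.
REFSEQ_PREFIXES = {
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "XM": "predicted mRNA",
    "XR": "predicted non-coding RNA",
    "XP": "predicted protein",
    "NC": "complete genomic molecule",
    "NG": "genomic region",
}


def parse_refseq(accession: str) -> Tuple[str, str, Optional[int], str]:
    """Split e.g. 'NM_000365.5' into ('NM', '000365', 5, 'mRNA')."""
    base, _, version = accession.partition(".")
    prefix, _, number = base.partition("_")
    molecule = REFSEQ_PREFIXES.get(prefix, "unknown")
    return prefix, number, int(version) if version else None, molecule


def efetch_fasta_url(db: str, accession: str) -> str:
    """Build an EFetch URL returning one record as FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fasta_sequence(fasta_text: str) -> str:
    """Join the sequence lines of a single FASTA record, dropping the header."""
    return "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def summarize_protein(fasta_text: str, n: int = 5) -> Tuple[int, str]:
    """Return (number of residues, first n residues)."""
    seq = fasta_sequence(fasta_text)
    return len(seq), seq[:n]


if __name__ == "__main__":
    print(parse_refseq("NP_000356.1"))  # ('NP', '000356', 1, 'protein')
    print(efetch_fasta_url("protein", "NP_000356.1"))
    print(summarize_protein(">toy record\nMAPSRK\nFFVG\n"))  # (10, 'MAPSR')
```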


Part C. Determination of the Open Reading Frame (ORF) of the Hemoglobin Alpha 2 (HBA2) Gene. For part C1), we have to retrieve the mRNA sequence from the GenBank database.

The mRNA sequence, used below to locate the start and stop codons, is:

auggugcug ucuccugcc gacaagacc aacgucaag gccgccugg gguaagguc

ggcgcgcac gcuggcgag uauggugcg gaggcccug gagaggaug uuccugucc

uuccccacc accaagacc uacuucccg cacuucgac cugagccac ggcucugcc

cagguuaag ggccacggc aagaaggug gccgacgcg cugaccaac gccguggcg

cacguggac gacaugccc aacgcgcug uccgcccug agcgaccug cacgcgcac

aagcuucgg guggacccg gucaacuuc aagcuccua agccacugc cugcuggug

acccuggcc gcccaccuc cccgccgag uucaccccu gcggugcac gccucccug

gacaaguuc cuggcuucu gugagcacc gugcugacc uccaaauac cguuaagcu

ggagccucg guagccguu ccuccugcc cgcugggcc ucccaacgg gcccuccuc

cccuccuug caccggccc uuccugguc uuugaauaa

From this sequence, we can see that aug acts as the start codon, while uaa acts as the stop codon (the first in-frame uaa, immediately after the codon cgu). The ORF, from the start codon up to but not including the stop codon, is:

auggugcug ucuccugcc gacaagacc aacgucaag gccgccugg gguaagguc

ggcgcgcac gcuggcgag uauggugcg gaggcccug gagaggaug uuccugucc

uuccccacc accaagacc uacuucccg cacuucgac cugagccac ggcucugcc

cagguuaag ggccacggc aagaaggug gccgacgcg cugaccaac gccguggcg

cacguggac gacaugccc aacgcgcug uccgcccug agcgaccug cacgcgcac

aagcuucgg guggacccg gucaacuuc aagcuccua agccacugc cugcuggug

acccuggcc gcccaccuc cccgccgag uucaccccu gcggugcac gccucccug

gacaaguuc cuggcuucu gugagcacc gugcugacc uccaaauac cgu


For question C2), we have to translate the first 10 codons into an amino acid sequence using the genetic code table. The start codon aug codes for methionine (M); the ten codons that follow it translate as:

i. gug = valine (V)
ii. cug = leucine (L)
iii. ucu = serine (S)
iv. ccu = proline (P)
v. gcc = alanine (A)
vi. gac = aspartic acid (D)
vii. aag = lysine (K)
viii. acc = threonine (T)
ix. aac = asparagine (N)
x. guc = valine (V)
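The table look-up in C2 can be sketched in code. The codon table below is the standard genetic code (NCBI translation table 1), packed into a 64-letter string ordered by first, second and third base in the order U, C, A, G, with '*' marking stop codons. The function `translate` is a hypothetical helper; it stops at the first stop codon or after `n_codons` amino acids.

```python
# Sketch: translate RNA codons with the standard genetic code.
from typing import Optional

BASES = "UCAG"
# One letter per codon, ordered UUU, UUC, UUA, UUG, UCU, ... GGG; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(rna: str, n_codons: Optional[int] = None) -> str:
    """Translate codon by codon from position 0, stopping at a stop codon
    or once n_codons amino acids have been produced."""
    rna = rna.upper().replace("T", "U")
    protein = []
    for pos in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
        if n_codons is not None and len(protein) == n_codons:
            break
    return "".join(protein)


if __name__ == "__main__":
    # First 36 nucleotides of the HBA2 mRNA listed in part C1.
    start = "auggugcugucuccugccgacaagaccaacgucaag"
    print(translate(start))                   # MVLSPADKTNVK
    print(translate(start[3:], n_codons=10))  # VLSPADKTNV
```

Skipping the first three nucleotides reproduces the ten-codon list above, which begins after the aug start codon.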

For question C3), ORF Finder reports the ORFs found in all six reading frames, and the statement to test is that the longest frame is the one most probably translated into the protein.

For this question, we pasted the mRNA sequence into ORF Finder, and the results page showed the ORFs in the six available frames. Selecting the longest ORF produced the diagram above together with its corresponding translation. Thus the statement is true for this gene.
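The six-frame search that ORF Finder performs can be approximated as follows. This is a simplified sketch, not NCBI's ORF Finder algorithm: it assumes AUG-only starts and the standard stop codons, scans the three frames of the sequence and of its reverse complement, and returns the longest ORF. Coordinates on the '-' strand are positions in the reverse-complement sequence, and the function names are hypothetical.

```python
# Sketch: longest AUG-to-stop ORF over all six reading frames.
from typing import Iterator, Optional, Tuple

STOPS = {"UAA", "UAG", "UGA"}
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an uppercase RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, frame: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) ORFs in one frame; end is one past the stop codon."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "AUG":
            start = i  # first AUG after the previous stop opens an ORF
        elif start is not None and codon in STOPS:
            yield start, i + 3
            start = None


def longest_orf(rna: str) -> Optional[Tuple[int, str, int, int, int]]:
    """Return (length_nt, strand, frame 1-3, start, end) of the longest ORF."""
    rna = rna.upper().replace("T", "U")
    best = None
    for strand, seq in (("+", rna), ("-", reverse_complement(rna))):
        for frame in range(3):
            for start, end in orfs_in_frame(seq, frame):
                if best is None or end - start > best[0]:
                    best = (end - start, strand, frame + 1, start, end)
    return best


if __name__ == "__main__":
    print(longest_orf("AUGAAAUAA"))     # (9, '+', 1, 0, 9)
    print(longest_orf("UUACCCGGGCAU"))  # (12, '-', 1, 0, 12)
```

The second example has no ORF on the given strand; the ORF is found only on its reverse complement, which is why all six frames have to be checked.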


Part D (Sequence Extraction)

Steps 1-2

Step 3

Step 4

Step 5


Step 6

Step 7

From these steps, we can state that ara h2 is related to peanut allergy, while the opsins are eye genes related to long-wave sensitivity and colour blindness.


CONCLUSIONS

From this experiment, students should become familiar with using the NCBI interface to find the information needed to solve bioinformatics problems. The NCBI website contains a wide range of resources that can help students find solutions related to their tasks.