Gene Hunting for the Cystic Fibrosis gene Centro Università di Milano-Scuola per la diffusione delle bioscienze e delle biotecnologie www.cusmibio.unimi.it




The Cus-Mi-Bio staff, composed of both university professors and high school teachers, are the scientific editors and authors of this Handbook’s contents.

Workshop Leaders

Giovanna Viale
Professor of Biology and Genetics, Dept. of Biology and Genetics for Medical Sciences, University of Milan, via Viotti 3/5, Milan, Italy

Cinzia Grazioli
High school teacher working full time at Cus-Mi-Bio, Dept. of Biology and Genetics for Medical Sciences, University of Milan, via Viotti 3/5, Milan, Italy

Cristina Gritti
High school teacher working full time at Cus-Mi-Bio, Dept. of Biology and Genetics for Medical Sciences, University of Milan, via Viotti 3/5, Milan, Italy


Cystic Fibrosis
A bioinformatic gene hunt

Scenario

You are a geneticist, and a young couple asks you for genetic counselling. Sophie and David have two children: Peter, 6 years old, is a healthy boy; Mary, 4 years old, has a frequent cough, colds and recurrent lung infections. Sophie tells you that her sister died at 12 years of age from Cystic Fibrosis (CF), while in David’s family there are no cases of the disease. Sophie is now in the 6th week of pregnancy with her third child and asks about her risk of giving birth to a child affected by CF. You suggest that Mary has a sweat test to measure Na+ and Cl- ion levels. This is a very simple test routinely used for CF diagnosis.

From the data available at this point, draw the pedigree of the family and answer the following questions:

• The risk of Sophie being heterozygous for CF
• The risk of David being heterozygous for CF
• The risk of the foetus being affected by CF
• The probability of the baby being healthy / being heterozygous for CF

David and Sophie come back to show you the results of the sweat test. The test confirms the diagnosis of CF for Mary (Na+ = 87 mmol/l, well above the reference limit of 60 mmol/l). The affected child Mary can now be represented by a full symbol in the pedigree. With this new piece of information, answer the same questions again:

• The risk of Sophie being heterozygous for CF
• The risk of David being heterozygous for CF
• The risk of the foetus being affected by CF
• The probability of the baby being healthy / being heterozygous for CF
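The arithmetic behind these questions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the handbook's official solution: before the sweat test, Sophie is treated as the unaffected sister of an affected person (carrier probability 2/3), David is given an assumed population carrier frequency of 1 in 25, and Mary's symptoms are ignored; after the test, both parents are treated as obligate carriers. The function name `child_risks` is hypothetical.

```python
from fractions import Fraction


def child_risks(p_mother_carrier, p_father_carrier):
    """Risks for a child of two unaffected parents (autosomal recessive trait)."""
    both = p_mother_carrier * p_father_carrier
    exactly_one = (p_mother_carrier * (1 - p_father_carrier)
                   + (1 - p_mother_carrier) * p_father_carrier)
    affected = both * Fraction(1, 4)            # two carriers: 1/4 affected
    carrier = (both + exactly_one) * Fraction(1, 2)  # 1/2 carriers in both cases
    return {"affected": affected, "carrier": carrier, "healthy": 1 - affected}


# Before the sweat test (assumed values, see the text above).
before = child_risks(Fraction(2, 3), Fraction(1, 25))

# After the sweat test: Mary is affected, so both parents are carriers.
after = child_risks(Fraction(1), Fraction(1))

print("before:", before)
print("after: ", after)
```

Here "healthy" means "not affected" and therefore includes carriers. Using exact fractions keeps the results in the same form in which they are usually written on a pedigree.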


Fig.1. CFTR is a unique member of the ABC transporter family that forms a novel Cl- channel. It is located predominantly in the apical membrane of epithelia where it mediates transepithelial salt and liquid movement. The protein has 5 domains: 2 MSD (membrane spanning domain), 2 NBD (nucleotide binding domain) binding ATP and a regulatory (R) domain. In the majority of CF cases, the NBD1 domain is affected.

Cystic Fibrosis (CF)

Cystic fibrosis is an autosomal recessive genetic disorder associated with abnormal secretions of the exocrine glands; it therefore affects many vital body functions, from respiration to digestion and reproduction. The disease is caused by mutations that disrupt the cystic fibrosis transmembrane conductance regulator (CFTR) protein (Fig.1), resulting in abnormal regulation of electrolyte transport across the epithelial cell membrane. The CFTR channel protein has two ATP-hydrolyzing domains, which allow it to use energy in the form of ATP to control channel gating. Malfunction of CFTR results in the production of poorly hydrated, thickened mucous secretions in the lungs and pancreas and in the characteristically high chloride levels in sweat (Fig.2). CF is the most common hereditary disease in the Caucasian population, with an incidence of 1 in 2500-3500 individuals. One in 25-30 people of European descent carries one mutated copy of the CFTR gene (is heterozygous).

The CFTR gene is located at 7q31.2 on chromosome 7, is 230 000 base pairs long (27 exons), and encodes a protein of 1,480 amino acids. The most common mutation, ΔF508, is a deletion (Δ) of three nucleotides that results in the loss of the amino acid phenylalanine (F) at position 508 of the protein (Fig.3). This mutation accounts for about 70% of CF cases worldwide and more than 80% in Northern European populations. There are, however, over 1,400 other rare mutations that can produce CF. Most identified mutations affect single nucleotides or a small number of adjacent nucleotides.

To describe mutations, a specific nomenclature is used: a mutation can be described as a change in genomic DNA (g.), in cDNA (c.) or in the protein (p.). For example, p.F508del or ΔF508 means that the mutation is a deletion of the amino acid phenylalanine (F) at position 508. Five mutations cover around 90% of CF cases in Caucasian populations (Tab.1). Affected individuals can carry two identical mutations or two different mutations.
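The incidence and the carrier frequency quoted above are linked by a simple calculation. The sketch below assumes Hardy-Weinberg equilibrium, which the handbook does not state explicitly: if q is the frequency of mutated alleles, the incidence of affected individuals is q² and the carrier frequency is 2pq with p = 1 - q. The function name `carrier_frequency` is hypothetical.

```python
import math


def carrier_frequency(incidence):
    """Carrier frequency 2pq, assuming Hardy-Weinberg equilibrium (incidence = q**2)."""
    q = math.sqrt(incidence)  # frequency of mutated alleles
    p = 1 - q                 # frequency of normal alleles
    return 2 * p * q


for one_in in (2500, 3500):
    freq = carrier_frequency(1 / one_in)
    print(f"incidence 1 in {one_in}: about 1 carrier in {1 / freq:.1f}")
```

Under this assumption, the two incidence values bracket a carrier frequency of roughly 1 in 25 to 1 in 30, in line with the range given in the text.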

Mutation        Exon/Intron   %
p.F508del       Exon 10       81.0
p.Gly551Asp     Exon 11       3.5
p.Gly542X       Exon 11       1.1
c.621+1G>T      Intron 4      1.0
c.1898+1G>A     Intron 12     0.9

Fig. 2. Main organ systems affected in CF. In all affected individuals, sweat glands produce excessive amounts of salt. Thick mucus blocks the release of digestive enzymes from the pancreas. Viscous mucus blocks the airways and makes breathing difficult. In males, dense mucus obstructs the sperm ducts, leading to infertility. Only 2-3% of affected males are fertile.

Fig. 3. Chromosome 7, which carries the CFTR gene, and the nucleotide and amino acid sequences involved in the ΔF508 (F508del) mutation.

Tab.1 Most common mutations of the CFTR gene in Northern Europe. The prefixes p and c mean that the mutation is described in the protein or in the cDNA, respectively; “+1” means that the nucleotide involved is the first one in the intron that follows the exon ending at the nucleotide position given by the number before the “+”.


CF mutations affect CFTR function at different levels. As shown in Fig.4, nonsense mutations cause premature arrest of protein synthesis, while missense mutations can affect protein processing, regulation or function. The ΔF508 deletion affects CFTR glycosylation in the Golgi, blocking its transport to the plasma membrane. Different mutations result in different degrees of severity of the disease.

There is no cure for CF. Average life expectancy is around 36.8 years. Management of CF is limited to modifying and reducing the effects of the most frequent mutations. Gene therapy holds promise as a potential avenue to cure cystic fibrosis. Gene therapy attempts to place a normal copy of the CFTR gene into affected cells. Studies have shown that only 5–10% of the normal amount of CFTR gene expression is needed to prevent the lung manifestations of cystic fibrosis.

CF is due to mutations of the CFTR gene on chromosome 7. CF is an autosomal recessive disease, and therefore Mary carries two mutated copies of the CFTR gene. (Note that, like all affected individuals, Mary could carry two identical mutations or two different mutations.) To answer Sophie’s and David’s questions you need to know the genotypes of some crucial members of the family. Which ones?

DNA analysis makes it possible to define the genotypes and to identify the mutations present in the family (in particular in Mary and her parents). You recommend that Sophie undergo amniocentesis so that fetal DNA can be examined for the presence of CF mutations. The most common DNA-based diagnostic test for CF is a hybridization technique (Dot Blot) that detects the most frequent mutations of the CFTR gene (see Tab.1 in the CF card). DNA is extracted from the patient’s cells, deposited in multiple spots on a filter, denatured and hybridized to labelled oligonucleotide probes specific for the different mutations. After extensive washing to eliminate excess unbound probe, a simple colorimetric reaction is performed to identify positive probe hybridization.

Dot Blot results:

• The three DNA samples (the parents’ and Mary’s) hybridize to the same probe;
• The parents’ DNA also hybridizes to the control probe (wild type gene);
• Mary’s DNA does not hybridize to the control probe;
• Fetal DNA (from amniocentesis) hybridizes to a specific probe (the same as Mary’s and her parents’) and to the control probe.

From the Dot Blot results it is possible to derive:

• The parents’ genotypes
• Mary’s genotype
• The genotype of the fetus
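The reasoning used to read a Dot Blot can be written as a small decision rule. The following Python sketch is an illustration only: the function `genotype_from_dot_blot` and the sample table are hypothetical, and the rule assumes that the control probe detects the wild type sequence at the same site as the mutation-specific probe, so it cannot distinguish a carrier from a person who has a second, untested mutation elsewhere in the gene.

```python
def genotype_from_dot_blot(mutant_probe, control_probe):
    """Interpret hybridization to one mutation-specific probe and its wild type control."""
    if mutant_probe and control_probe:
        return "heterozygous (one mutated and one wild type copy at this site)"
    if mutant_probe:
        return "homozygous for the mutation"
    if control_probe:
        return "homozygous wild type at this site"
    return "no signal (test not interpretable)"


# Hybridization pattern described in the text: (mutation probe, control probe).
samples = {
    "Sophie": (True, True),
    "David": (True, True),
    "Mary": (True, False),
    "Fetus": (True, True),
}

for name, (mutant, control) in samples.items():
    print(f"{name}: {genotype_from_dot_blot(mutant, control)}")
```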

Fig. 4. Effects of most common mutations on CFTR function.


Identification of the mutation present in Mary’s family

You are given the sequence of the probe that hybridizes with Mary’s and her parents’ DNA, and you are asked to identify the mutation present in this family.

Probe sequence: 5’- CACCATTTAAAGAAAATATCGGTGTTTCCTATGATGA -3’

Start your search at the National Center for Biotechnology Information (NCBI) http://www.ncbi.nlm.nih.gov/ On the homepage, choose BLAST (on the horizontal menu bar at the top of the page).

BLAST (Basic Local Alignment Search Tool) finds regions of local similarity between sequences (homology regions). The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches. BLAST can be used to infer functional and evolutionary relationships between sequences and to help identify members of gene families. The BLAST software comprises different algorithms that align not only nucleotide sequences to other nucleotide sequences, but also protein sequences to protein sequences, and nucleotide sequences to protein sequences and vice versa, using the rules of the genetic code to translate nucleotide sequences into amino acid sequences.

In the BLAST page of the NCBI site choose Nucleotide-nucleotide BLAST (blastn). In the search box, paste the sequence you want to compare to the database. In the field “choose database” you can choose the database you want to use for the analysis. Choose the default option nr (non redundant), which contains all the deposited non-redundant sequences; the name is only historical, since much of the information now contained in the databases is redundant. (In many cases, you might want to restrict the alignment to sequences from a given organism; in this case, you can specify the organism of interest in the field “Options for advanced blasting” and restrict the sequence analysis, for instance, to Homo sapiens.) All the underlined entries are hyperlinks with short definitions of the corresponding field. You can get further information by clicking on the various entries. For simplicity, we will use the standard criteria for the analysis, without any option for advanced blasting. Once you have pasted your sequence, click on the “BLAST!” button.

Identification of the sequence with the best alignment score

The results page is divided into five sections. In the first section you find:

• a link to the “reference” for the program (BLASTN);
• information on the database;
• a link to the “taxonomy report”, where you can find a phylogenetic representation of all the species to which the sequences identified by the BLAST analysis belong. The same sequences are also listed by organism;
• information about the “query” sequence (i.e. its length, shown as “n. letters”, where letters stands for nucleotide bases).

The second section is a graphic representation of the sequences which obtained the highest scores in the alignment to the “query”:


• The thick red line represents the “query”;
• the numbers underneath refer to the length in nucleotides;
• each of the thin lines below the “query”, in various colors, shows an alignment of the “query” sequence to one sequence from the nucleotide database;
• the color code used to represent sequences reflects the alignment score obtained, which in turn depends on the percent identity between the two sequences. If you move the pointer over the different colored segments, the name and the accession number of the corresponding sequence will appear; clicking on one of the segments will take you to the corresponding sequence alignment;
• under the graphic scheme you find a new link, “distance tree of results”, which is a graphic representation of the evolutionary distances separating the sequences that share alignments with the “query”.

BLAST can be used to infer evolutionary relationships between sequences. Phylogenetic tree programs have been developed to provide graphic representations (trees) of the relationships existing among biological sequences. Note that, since they are inevitably based on a certain degree of approximation, phylogenetic trees in certain cases do not give definite results.

In the third section, under the phylogenetic tree, the nucleotide sequences producing significant alignments to the “query” are listed, starting from those with the highest score and the lowest statistical significance value (E). The E value is the number of matches with a similar score that one would expect to find just by chance. For example, an E value of 1 assigned to a hit means that, in a database of the current size, one might expect to see 1 match with a similar score simply by chance. Therefore, the lower the E value (the closer it is to 0), the more significant the match.
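To make the meaning of the E value more concrete, the sketch below uses the standard relation between the E value and the bit score S' reported by BLAST, E = m · n · 2^(-S'), where m is the query length and n the total length of the database. This relation is not given in the handbook; in addition, BLAST applies corrections (effective lengths) that are ignored here, and the database size used in the example is a made-up number.

```python
def expected_chance_hits(bit_score, query_length, database_length):
    """Approximate E value: E = m * n * 2**(-S'), ignoring effective-length corrections."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical example: a 37-base query against a database of 10**10 bases.
for bit_score in (20, 40, 60):
    e_value = expected_chance_hits(bit_score, 37, 10**10)
    print(f"bit score {bit_score}: E = {e_value:.3g}")
```

The example shows why short probes need almost perfect matches: with the same query and database, each additional bit of score halves the number of matches expected by chance.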


The fourth section shows the significant alignments of the "query" sequence to the sequences in the database with the highest similarity to the "query" (“subject” sequences).

For each alignment, the following features are given:

• the alignment Score;
• Expect, i.e. the “E value”;
• Identities: the ratio between the number of identical bases (in this case 37) and the alignment length (in this case 40); in brackets, the resulting percent identity between the two sequences in the aligned region (in this case 92%);
• Gaps: the ratio between the number of gaps present in the alignment (in this case 3) and the alignment length; in brackets, the resulting percentage (in this case 7%);
• Strand: the polarity of the "query" sequence with respect to the sequence in the database (Plus/Plus means that the “query” sequence has the same polarity as the sequence in the database, while Plus/Minus means that the two sequences have opposite polarity);
• the complete alignment between the "query" sequence and a given sequence in the database (“sbjct”). Numbers correspond to base positions within the two sequences; when, in a given position, the base in the "query" sequence corresponds to that in the “sbjct” sequence, the symbol "|" appears in between. This symbol is missing when, in a given position, the nucleotides in the "query" and “sbjct” sequences are different or when one sequence shows a gap, as found in alignments with low identity scores.

The fifth section contains statistical details of the search.
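The Identities and Gaps figures can be recomputed from any gapped alignment. The sketch below is a hypothetical helper, not part of BLAST: it takes two aligned strings of equal length in which "-" marks a gap. The toy alignment uses the 15-nucleotide wild type sequence around F508 quoted later in this handbook and the same sequence with CTT deleted. Percentages are truncated to whole numbers, which reproduces the 92% (37/40) and 7% (3/40) shown above; whether BLAST truncates or rounds in general is an assumption.

```python
def alignment_stats(query, subject):
    """Count identities and gaps in two aligned sequences ('-' marks a gap)."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    length = len(query)
    identities = sum(q == s and q != "-" for q, s in zip(query, subject))
    gaps = sum(q == "-" or s == "-" for q, s in zip(query, subject))
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "identity_pct": 100 * identities // length,  # truncated, not rounded
        "gap_pct": 100 * gaps // length,
    }


stats = alignment_stats("ATCAT---TGGTGTT",   # deleted sequence (query)
                        "ATCATCTTTGGTGTT")   # wild type sequence (subject)
print(stats)
```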


Now click on the link corresponding to S71320.1 (the first in the list) to go to the database page containing information about this sequence.

Which mutation is present in Mary’s family?

Hunting the cystic fibrosis gene

Go back to the NCBI homepage: http://www.ncbi.nlm.nih.gov/

To limit your search, choose Nucleotide in the Search box and type CFTR cystic fibrosis homo sapiens in the “for” box. Now click GO.


The database has found 301 hits. In the list, choose the hit related to the CFTR gene and click on the corresponding code, NM_000492, to open a typical GenBank nucleotide file. Much of it is hard to read, but a few things are clear. First, you find the LOCUS, the DEFINITION and the ACCESSION code. Under REFERENCE, you find citations to the publications of this sequence in the scientific literature. To see an abstract of the article in which this gene is described, click the PubMed link below the reference. Scrolling down the page you find the FEATURES section, i.e. information on the main sequence elements, the protein sequence and the coding sequence (CDS).

To obtain the CFTR cDNA sequence, click on the Display box (top left of the page) and select FASTA. The following page shows the complete cDNA sequence in a format compatible with your next search. Save the sequence in a Word file (cDNAFASTA.doc).
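A FASTA record is simply a header line starting with ">" followed by one or more lines of sequence. If you prefer to handle the saved sequence with a script rather than a word processor, a minimal reader could look like the sketch below. The function `read_fasta` and the toy record are hypothetical; for a real file you would save the sequence as plain text and pass its content to the function.

```python
def read_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


# Toy record for illustration only (not the real CFTR cDNA).
toy_fasta = ">toy_record hypothetical\nATCATC\nTTTGGTGTT\n"
sequences = read_fasta(toy_fasta)
for name, sequence in sequences.items():
    print(name, len(sequence), "nucleotides")
```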

Going back to the NM_000492 page, find the identifying code of the normal CFTR protein (NP_000483) and its amino acid sequence. Copy it and save it as the file CFTRprot.doc. You can count the aa residues (using the Word tool “word count”), or you can find the protein length (number of aa residues) by clicking on the protein code. From the cDNA sequence you can find the genomic sequence of the CFTR gene using the BLAT (BLAST-Like Alignment Tool) software.


Go to http://genome.ucsc.edu/cgi-bin/hgBlat?db=mm2 and paste the full cDNA sequence (in FASTA format) in the BLAT search box. This software lets you easily compare cDNA sequences with genomic sequences and identify the exon-intron structure of a given gene.

Submit and, in the results page, click “details” on the left of the first record (score 6106, size in nucleotides 8094, 100% identity). The gene, as you already know, is located on chromosome 7. In the following page, you find your cDNA sequence and, below it, the alignment of this sequence to the genomic sequence. Matching bases in the cDNA and genomic sequences are colored blue and capitalized. Light blue bases mark the boundaries of gaps in either sequence (often splice sites). In the genomic sequence below, the aligned regions (exons) are shown in blue capital letters, while the regions with no alignment (introns) are in small black letters. Click on the links in the left column to navigate through the alignment. Clicking on the various blocks takes you to the corresponding exon in the genomic sequence. The CFTR gene has 27 exons.

Construction of deleted cDNA

Open the text file cDNAFASTA.doc, make a copy and save it as ΔcDNA.doc. You now have to introduce the ΔF508 deletion into the wild type cDNA sequence. We suggest you use the Find function to locate the 15-nucleotide wild type sequence shown in Fig.3 (CF card). The sequence ATCATCTTTGGTGTT includes the positions deleted in the ΔF508 mutation. Delete the 3 nucleotides involved in ΔF508 (CTT) and save the new sequence.
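The same edit can be done with a short script instead of the Find function. The sketch below is only an illustration: `delete_f508` is a hypothetical helper, and it is demonstrated on a toy sequence rather than on the real CFTR cDNA. It looks for the 15-nucleotide window quoted above, checks that it occurs exactly once, and removes the CTT triplet inside it.

```python
WILD_TYPE_WINDOW = "ATCATCTTTGGTGTT"  # 15-nt sequence around F508 (from the text)
DELETED_TRIPLET = "CTT"               # nucleotides removed by the deletion


def delete_f508(cdna):
    """Return the cDNA with the CTT triplet of the F508 window removed."""
    cdna = cdna.upper()
    if cdna.count(WILD_TYPE_WINDOW) != 1:
        raise ValueError("the wild type window must occur exactly once")
    window_start = cdna.index(WILD_TYPE_WINDOW)
    cut = window_start + WILD_TYPE_WINDOW.index(DELETED_TRIPLET)
    return cdna[:cut] + cdna[cut + len(DELETED_TRIPLET):]


# Toy sequence: three flanking bases on each side of the window.
toy_cdna = "GGG" + WILD_TYPE_WINDOW + "AAA"
deleted = delete_f508(toy_cdna)
print(toy_cdna)
print(deleted)
```

Checking that the window occurs exactly once guards against deleting the triplet in the wrong place, which is easy to do by hand in a sequence of several thousand nucleotides.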


Translation of the deleted cDNA sequence to obtain the mutated protein sequence

Go to: http://arbl.cvmbs.colostate.edu/molkit/translate/index.html

Here you can translate your cDNA sequence into an amino acid sequence. Just paste your deleted cDNA sequence in the white box and click “translate DNA”. You will get an image with a black background showing the results of the translation of your sequence in the six possible reading frames (three for each DNA strand). The putative ATG start codons are shown in green and the stop codons in mauve. Choose the frame with the longest open reading frame, ORF (i.e. the longest sequence without stop codons). In our case, the longest ORF is in forward frame 1.

Now click “Text Output” to view the sequence and its translation into amino acids. Choose forward frame 1. You can easily customize the software output (you can choose to view the aa sequence in the one-letter or three-letter code, alone or together with the DNA sequence). Choose “one letter code” and “amino acids and DNA”. Copy the whole content of the box into a new Word file (pCFTR mutated.doc) using the font “Courier New”, font size 8, black; this font is suitable for sequence alignments since all the letters and spaces have the same width.
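What the translation tool does can be reproduced with a short script. The sketch below is a minimal illustration, not the algorithm of the web tool: it builds the standard genetic code table, translates the three forward and three reverse frames, and reports the longest stretch that starts with methionine (M) and runs to the next stop codon or to the end of the sequence. All function names are hypothetical, and the input is assumed to contain only the upper-case letters A, C, G and T.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a DNA sequence codon by codon, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def longest_orf(seq):
    """Return (strand, frame number, protein) for the longest M-initiated ORF."""
    best = ("forward", 1, "")
    strands = (("forward", seq), ("reverse", reverse_complement(seq)))
    for strand_name, strand in strands:
        for offset in range(3):
            for segment in translate(strand[offset:]).split("*"):
                start = segment.find("M")
                if start >= 0 and len(segment) - start > len(best[2]):
                    best = (strand_name, offset + 1, segment[start:])
    return best


print(translate("ATCATCTTTGGTGTT"))  # wild type window around F508
print(translate("ATCATTGGTGTT"))     # same window after deleting CTT
print(longest_orf("CCATGAAATTTTGACC"))  # toy sequence, not CFTR
```

Translating the two windows shows the effect of the deletion directly: the phenylalanine disappears while the reading frame of the following codons is preserved.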

Localization of the main features of the cDNA

Now we can start locating all the main cDNA/mRNA features, i.e.:

• 5’UTR (5’ UnTranslated Region)
• start codon (start of translation): ATG (AUG), coding for methionine
• CDS (CoDing Sequence)
• stop codon (end of translation): TAA (UAA) / TAG (UAG) / TGA (UGA)
• 3’UTR (3’ UnTranslated Region)
• polyA site (polyadenylation site): the consensus sequence is “AATAAA”, generally located 20-30 bp (range 10-35 bp) upstream of the polyA tail. In some genes, this signal can differ slightly from the consensus sequence (e.g. ATTAAA).

Now try to identify all the above-mentioned elements and mark them with the yellow highlighter. Starting from the 5’ end of the molecule, find the correct start codon, i.e. the first ATG, corresponding to the first amino acid M, that is not followed at short distance by any stop codon (aa position 45). Once you have identified the start codon, you can delete the upstream aa sequence: this region (5’UTR) is not translated by the ribosomes into protein. Now find the stop codon (in this case a TGA codon) and delete the downstream sequence (3’UTR). The CDS (i.e. the sequence which is translated into protein) extends from nucleotide 133 to nucleotide 4570. Finally, identify the polyadenylation signal, i.e. the sequence that marks where transcription can stop and where the polyA tail is added (starting at position 6106).

How long is the mutated protein?

Compare it with the length of the normal protein you noted earlier.

View the CFTR 3D structure and compare the normal and mutated protein structures.

Accurate 3D macromolecular structures are obtained by two main techniques: X-ray crystallography (83%) and Nuclear Magnetic Resonance (NMR). Rough structure predictions can also be obtained by comparative modelling (when the unknown structure shows a good level of sequence identity with a known structure), but templates are available for only 20% of domains. This kind of analysis, although less accurate, can also provide important insights into the structure and function of a protein. The 3D coordinates of every atom are collected in a file and can be viewed using dedicated software. We will use the Deep View software. The files containing the coordinates of the molecules whose structure has already been determined are collected (in different formats) in the PDB (“Protein Data Bank”) database and are accessible using different kinds of software.

Deep View

Deep View (formerly called Swiss-PdbViewer) is a powerful molecular graphics program available from the Expert Protein Analysis System (ExPASy) Molecular Biology Server in Geneva, Switzerland. Deep View can be downloaded from http://www.expasy.org/spdbv/. It provides a user-friendly interface that lets you view protein 3D structures, create models and analyse several proteins at the same time. The proteins can be superimposed in order to deduce structural alignments and compare their active sites or any other relevant parts. Amino acid mutations, H-bonds, angles and distances between atoms are easy to obtain thanks to the intuitive graphic and menu interface. For proteins of known sequence but unknown structure, DeepView submits amino acid sequences to ExPASy to find homologous proteins, onto which you can subsequently align your sequence to build a preliminary three-dimensional model. DeepView then submits your alignment to ExPASy, where the SWISS-MODEL server builds a final model, called a homology model, and returns it directly to DeepView.

With the Deep View software, you will learn to:

• observe the tertiary structures of proteins and identify domains and secondary structures
• get familiar with the different types of visualization: ribbon diagrams, backbone and side chains
• superimpose and compare two or more models simultaneously
• identify the residues involved in a given pathologic mutation within a structure (e.g. F508del)
• compare the 3D structures of the normal and mutated protein.

Open the file 2BBO (with “.pdb” or “.ent” extension) containing the 3D structure of the normal CFTR protein. This file is also available at the “Protein Data Bank” site, http://www.rcsb.org/pdb/, by typing CFTR in the search box. Open the file with the “DeepView” visualization software. To display a molecule, simply drag a PDB file onto the Swiss-PdbViewer icon, select the "Open" item from the "File" menu, or select one of the recently opened proteins that appear at the bottom of the File menu. By default, the molecule will appear in wireframe representation in the Display Window. The upper window gives access to the Menu and to the most common tools for manipulating the protein. The lower window shows the protein structure.

[Figure: DeepView toolbar and display window. Labels: protein manipulation icons (Center, Drag, Zoom, Rotate); information about the protein; image of the protein structure.]


The icons in the toolbar window allow protein manipulation. The icon on the left moves the protein to the center of the screen. The next three icons, above the Move all button, are used to drag, zoom and rotate the protein. Once an icon has been selected (with a mouse click), you can manipulate the protein shown in the lower window by moving the mouse.

Icons of the third group, on the right, are for measuring distances, bond angles, and dihedral angles. We won’t use these functions.

The icon showing a written page, to the left of Move all, opens a window containing information about the protein, including its amino acid sequence.

Control Panel

Use the command Wind: Control Panel to display it. The window will appear on the left part of the screen. The Control Panel lists the amino acid residues and other contents of the PDB file. You use the Control Panel to select residues, establish the content of the display, label residues, and color them. The first click on an inactive window activates it, but does not cause other changes.

First, simply scroll down to the bottom of the list to see how many amino acid residues the protein contains. Now click and drag, starting on the first residue at the top of the window, dragging down to the tenth, and releasing the mouse button. All group names from 1 to 10 should turn red. Groups printed in red are now selected. The simplest way to select a small number of residues is to click their names and drag to select a range of them. Press the Return key: this action hides all groups except those selected. Notice also that a checkmark has appeared in the show column next to the selected residues, indicating that they are on display. A side chain is shown only if the rest of the residue is shown, so you only see the side chains of displayed residues in the graphics window.

There are also two narrow columns to the left of the group column. The first column is blank when the current model contains only one protein chain. If the crystal contains more than one chain, the letters A, B, etc. appear in this column. In the Control Panel a chain corresponding to aa residues 389-678 is shown; this is the NBD1 functional domain of the CFTR protein. The second column contains groups of the letters h or s. Groups labeled h comprise alpha helices, while groups labeled s comprise strands of beta sheet. Click anywhere in the A column to select the entire chain (all the A groups turn red). Press Return and the A chain will be displayed in the window. In the same way, you can select the h or s parts of the chain. If you want to select two separate groups, select the first one and then the second while holding down the Control key. Once they are selected, press Return to display them.

Select some residues and click on the columns show, side, labl, surface and ribn.

• Labl: shows the names of the selected aa residues.
• Surface: shows the van der Waals surface of each aa, rendered as a dotted surface. Other kinds of surfaces are available from the menu under the black triangle symbol.
• Ribn: draws the 3D protein structure as a ribbon.

You can select or deselect groups by keeping the mouse button pressed and dragging along the Control Panel columns; you can get the same result by clicking on the top of the columns.


Remember that you can center, move, zoom and rotate the image in the display window using the corresponding icons. To show the alpha helix (red) and beta sheet (yellow) structure of the protein, select all the amino acids in the ribn column of the Control Panel (pressing the Control and Shift keys), select Secondary Structure from the Color menu, and select Render in 3D from the Display menu.

Coloring (Color menu)

DeepView provides many different ways to color a model. Colors can reveal structural, chemical, and comparative features of the protein. Go to the Color menu and click:

• Secondary Structure: DeepView colors helical residues red, beta sheet residues (strands) yellow, and all others gray. Notice that the color for each residue is displayed in a small square beside the residue in the right-hand column of the Control Panel.
• Secondary Structure Succession: DeepView colors helices and strands, but with this command the color reflects the order of each structural element in the overall sequence of residues. The result is that it is easy to follow the chain through the protein: elements of secondary structure are colored from the N-terminal to the C-terminal end in the order violet, blue, green, yellow, orange, red.
• Chain: DeepView colors the entire model yellow. If there were more than one chain in this model, each chain would be shown in a different color. This color provides good contrast to the colors you will now add for side chains.
• Type: this command recolors the side chains of all residues according to chemical type: non-polar groups in grey (note that most non-polar hydrophobic side chains are inside the protein), acidic (negative) groups in red and basic (positive) groups in blue.
• CPK: this operation returns the atoms to the standard colors: white for carbon, red for oxygen, blue for nitrogen and yellow for sulfur.

N.B. You can change the default colors for atoms, individual residues or complex selections of residues (chains), as well as the background color, from Preferences (clicking Color). In the new window you can choose the colors you like.

Select Menu

From the Select menu you can select the submenus:

• All: to select all the protein residues (press Return to display).


• Secondary structure: to select and display different parts of the protein (press Return to display).
  - Helices: to select and display residues forming alpha helices.
  - Strand: to select and display only residues forming beta sheets.
  - Coil: to select and display the remaining residues.
• Group property: to select and display only positive, negative, polar or nonpolar residues (press Return to display).

Wind Menu

From the Wind menu you can select the submenu:
• Alignment: The Align window appears at the bottom left. You use this window when you want to compare two or more proteins. The sequences of the chains are shown in one-letter abbreviations. The name of the currently active layer is shown in red, with the reference layer at the top. You can click a layer name to make it active, and you can also select residues and switch layers from this window.

Select phenylalanine 508 in the list of amino acid residues and color it blue. Its position will be shown in the 3D image, and in the sequence in the Alignment window the currently active residue will be shown in red.

Comparing the wild type and mutated forms of the CFTR protein

In this case, we can align the wild type protein with its aberrant form. 3D structures are available for both forms of this important protein, so we can identify the mutated amino acid from its position in the sequence and view the structural changes induced by the mutation. This is an optimal situation. However, 3D structures are not available for many proteins, and even less often for different forms of the same protein. One then has to identify the mutation from the amino acid positions in the sequence and infer structural changes from the nature and position of the change. This is certainly less than optimal, and it will change with the availability of more accurate structure prediction algorithms that allow researchers to build testable hypotheses without having to determine the 3D structure of mutated proteins experimentally. Resolving protein structures experimentally is a tremendous effort and should be reserved for the final confirmation of already validated drug targets.

Now you will compare two different forms of the same protein, the normal and the F508del forms of CFTR. DeepView allows you to superimpose different models and to color the models to emphasize conformational differences. Open the file 1XMJ.pdb (F508del) in DeepView (the normal protein is already displayed on your screen). Below the 3D structure, you will find the sequence alignment of the two proteins. The 1XMJ structure corresponds to amino acid residues 389-678. In the Control Panel, select the amino acid residues around position 508 and give them a different color. As expected, F508 is missing in the 1XMJ crystal.

To compare the two structures, you carry out a simple superposition of the two proteins in the same window. Before you do this, click on the Control Panel, then click and hold on the name 2BBO at the top. A menu appears, showing the names of all loaded models. Select 1XMJ. Examine the effects of the visible and can move buttons on 1XMJ.
Restore movement and visibility for 1XMJ.
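When no alignment window is at hand, the missing residue can also be located by comparing the two sequences directly. The following minimal Python sketch uses the standard `difflib` module for this. The function name `find_deletions` is hypothetical, and the five-residue fragment is an assumption: it is the stretch commonly quoted for CFTR positions 506-510 and should be checked against the database entry before being relied on.

```python
from difflib import SequenceMatcher


def find_deletions(wild_type, mutant, first_position=1):
    """Return (position, residue) for each wild-type residue absent from the mutant.

    first_position is the sequence number of the first residue in wild_type.
    Substitutions are ignored; only pure deletions are reported.
    """
    matcher = SequenceMatcher(None, wild_type, mutant, autojunk=False)
    deletions = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            for index in range(i1, i2):
                deletions.append((first_position + index, wild_type[index]))
    return deletions


# ASSUMPTION: residues 506-510 of wild-type CFTR as commonly quoted (I I F G V);
# verify against the sequence database before use.
WILD_TYPE_FRAGMENT = "IIFGV"
MUTANT_FRAGMENT = "IIGV"  # the same stretch with the phenylalanine removed

deleted = find_deletions(WILD_TYPE_FRAGMENT, MUTANT_FRAGMENT, first_position=506)
```

With these inputs, `deleted` holds the single deleted residue together with its position. When a deletion falls inside a run of identical residues, the reported position is one of several equivalent choices.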


To better compare the two structures, color the normal protein model (2BBO) pink before superimposing them (to act on one of the two proteins, for example to change its colors, choose the correct file from the drop-down menu in the Control Panel). To superimpose the two structures, select Magic Fit from the Fit menu. You will immediately note that the two proteins have superimposable 3D structures.
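How closely two superimposed structures agree is commonly expressed as the root-mean-square deviation (RMSD) between paired atoms. The sketch below is only an illustration of that measure, not a description of how Magic Fit works internally: it assumes the two coordinate lists have already been superimposed and paired atom by atom, and the function name `rmsd` and the coordinates in the usage example are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) tuples.

    ASSUMPTION: the structures are already superimposed and the atoms are
    paired by index; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates (in angstroms) for two paired atoms in each model.
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(model_1, model_2)
```

A value close to zero means that the paired atoms occupy nearly the same positions, which is what "superimposable" expresses in numbers.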

Go back to the Cystic Fibrosis card at the beginning of this handbook. As shown in Fig. 4, the F508del mutation blocks the processing of the CFTR protein in the endoplasmic reticulum (ER). The mutated protein fails to reach the membrane not because its structure is altered but because its maturation is incomplete. F508 is involved in a crucial signal for the correct post-translational modification of the protein: the F508del protein does not pass the quality control step in the ER. The mutated protein is instead transported to proteasomes and degraded.


Links

Bioinformatics is a tool for accessing and managing data in a database. Bioinformatics includes:
1. Databases of experimental data generated in research laboratories
2. Software for navigating databases

A biological database is a collection of data generated by experimental research, both from in vitro and in vivo studies, from bioinformatics analysis (in silico) and from the scientific literature. Databases are systematically arranged collections of data structured to enable efficient user access and management of different types of information. Databases are composed of “entries”. Each entry contains data according to the database type.

First level Databases: collections of nucleotide (DNA, RNA) or protein sequences and information to identify their species of origin and their functions.

Specialized Databases: collections of specific information about taxonomy, functional domains, scientific literature, and diseases associated with sequence mutations.

A list of Databases used by Bioinformatics labs (save them in your Bookmarks)

http://www.ncbi.nlm.nih.gov/ NCBI (National Center for Biotechnology Information, NIH) has developed a public database and the software for genome analysis.

http://www.ensembl.org Ensembl (from the French word “ensemble” and “EMBL”, the European Molecular Biology Laboratory) is a database created through the collaboration of the EMBL European Bioinformatics Institute (EBI) and the Wellcome Trust Sanger Institute (WTSI) to develop a software system that automatically manages all the available annotations on some eukaryotic genomes.

http://www.expasy.org/sprot/sprot-top.html Swiss-Prot

http://smart.embl-heidelberg.de/ SMART (Simple Modular Architecture Research Tool)

http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/ The PDBsum database is a collection of images that give a preview of the 3D structures contained in the Protein Data Bank (PDB). It shows the different components of each structure (i.e. protein chains, ligands, metal ions) and gives a schematic view of their interactions.

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM OMIM, Online Mendelian Inheritance in Man

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=pubmed PubMed, accessible from NCBI Entrez, was developed by the National Center for Biotechnology Information (NCBI) and by the National Library of Medicine (NLM), at the U.S. National Institutes of Health (NIH). PubMed provides access to the scientific and medical literature.

Bioinformatics tools

http://genome.ucsc.edu/cgi-bin/hgBlat?db=mm2 BLAT (BLAST-Like Alignment Tool)

http://arbl.cvmbs.colostate.edu/molkit/translate/index.html This tool accepts nucleic acid sequences (cDNA or mRNA) and returns all the possible translations into amino acid sequences.

http://www.expasy.org/spdbv/ DeepView (from the Expert Protein Analysis System (ExPASy) Molecular Biology Server in Geneva) is an application that provides a user-friendly interface for viewing protein 3D structures, creating models and analyzing several proteins at the same time. For proteins of known sequence but unknown structure, DeepView submits amino acid sequences to ExPASy to find homologous proteins, onto which you can subsequently align your sequence to build a preliminary three-dimensional model. DeepView then submits your alignment to ExPASy, where the SWISS-MODEL server builds a final model, called a homology model, and returns it directly to DeepView.
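What the translation tool listed above means by "all the possible forms of translation" can be sketched in Python as a six-frame translation: three reading frames on the given strand and three on its reverse complement. This is a minimal sketch using the standard genetic code, not the code of the online tool. The function names are hypothetical and the example sequence is invented for illustration.

```python
from itertools import product

# Standard genetic code, written compactly with the bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, offset=0):
    """Translate complete codons starting at offset; '*' is a stop, 'X' an unknown codon."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(sequence):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = sequence.upper().replace("U", "T")  # accept mRNA as well as cDNA
    frames = {}
    for strand, strand_sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_sequence, offset)
    return frames


# Invented example sequence, for illustration only.
frames = six_frame_translation("ATGAAATTTGGGTAA")
```

Only one of the six frames usually corresponds to the real protein. Long stretches without a stop symbol (`*`) are a common first hint as to which frame that is.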