

Restriction Enzyme Searches Using the Word Processing Software: LOCOSCRIPT

T SMITH

Department of Biochemistry and Microbiology University of St Andrews St Andrews, Fife KY16 9AL, Scotland, UK

Introduction

One of the most important routine applications of computers in biochemistry and molecular biology is the determination of restriction enzyme sites.

In our laboratory, we routinely carry out restriction enzyme searches on the University's Vax system, using the Staden Molecular Biology software. Dedicated molecular biology software, however, can be expensive, and is frequently difficult to obtain for the computer system which happens to be available, while for those involved in introducing a genetic engineering/biotechnology component into the life science courses offered by colleges of further education and schools, the purchase of software may prove a major hurdle.

Even when the systems and software are available, competition from colleagues for terminals can mean constant frustration and delay to the biochemist wanting to do a simple restriction enzyme search which is well within the capacity of the cheapest PCW.

We now describe a rapid and reliable method of performing restriction enzyme searches and other molecular biology functions using the word processing software LOCOSCRIPT included with the Amstrad PCW 8256. At a total cost for the complete system (computer, monitor, dot matrix printer and LOCOSCRIPT) of £299 plus VAT, this brings restriction enzyme searches and similar functions within the range of all educational institutes and research labs, and even of the individual scientist or teacher, many of whom already possess and use the system as a word processor.

Method

Creating a File Containing the DNA and Protein Sequence

Nucleotide or amino acid sequences can be keyed in separately, or a nucleotide sequence can be written with its corresponding amino acid sequence beneath. A suitable format for A4 paper is 60 nucleotides or 20 amino acids per line, which leaves space in the left-hand margin for numbering the residues. In this article a format of 30 nucleotides and 10 amino acids per line is used.

To begin, press the TAB button twice, or move the cursor to the required spot with the space bar, then key in the first 60 nucleotides, using the conventional ATGC shorthand. After keying in the 60th nucleotide, press the RETURN key. Type in the number 1, then move the cursor to the position directly below the first nucleotide, and type in the corresponding amino acids using the standard three-letter code. Repeat this process until the whole sequence is entered, then press EXIT twice to obtain the Menu Exit Options, enabling the sequence to be saved on file, and printed out if desired, e.g.:

    GTGACGAGCGAAAACGGGCAGCGCGCGGAA
1   ValThrSerGluAsnGlyGlnArgAlaGlu

    CTGAAGGAGAGCAGATCTGTGCAGAACAAG
11  LeuLysGluSerArgSerValGlnAsnLys

    ATCTGGAAGAACCGGTGGGGCGAGCGGTGG
21  IleTrpLysAsnArgTrpGlyGluArgTrp

    TCTGTGCAGATCTCCAACACCGACGCATCT
31  SerValGlnIleSerAsnThrAspAlaSer
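The two-row layout described above can also be produced programmatically. What follows is a minimal Python sketch, not part of the original method; the function name `layout` and the four-character margin are assumptions chosen for illustration. Because the three-letter amino acid code uses exactly one character per nucleotide, the same slice offsets serve for both rows.

```python
def layout(dna, protein, nt_per_line=30, margin=4):
    """Return display rows: a nucleotide row, then a numbered amino acid row.

    `protein` is the three-letter-code string (e.g. "ValThrSer..."), so it has
    the same length as `dna` and can be sliced at the same offsets.
    `nt_per_line` is assumed to be a multiple of 3.
    """
    rows = []
    for start in range(0, len(dna), nt_per_line):
        residue_number = start // 3 + 1            # first residue on this row
        rows.append(" " * margin + dna[start:start + nt_per_line])
        rows.append(f"{residue_number:<{margin}}" + protein[start:start + nt_per_line])
    return rows


# First two rows of the example sequence from the article.
dna = ("GTGACGAGCGAAAACGGGCAGCGCGCGGAA"
       "CTGAAGGAGAGCAGATCTGTGCAGAACAAG")
protein = ("ValThrSerGluAsnGlyGlnArgAlaGlu"
           "LeuLysGluSerArgSerValGlnAsnLys")

for row in layout(dna, protein):
    print(row)
```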

Restriction Enzyme Searches

These make use of the FIND and EXCHANGE facilities of LOCOSCRIPT. Lists of restriction enzymes and their sites are available in many of the commercial molecular biology catalogues, supplied free of charge by companies such as Pharmacia, New England Biolabs, Amersham, Gibco-BRL and so on, and usually appear in the form:

Bgl II    AGATCT
          TCTAGA

In order to locate the restriction enzyme sites, the cursor should be positioned at the start of the sequence by pressing ALT/SHIFT/DOC. On pressing the FIND button, a menu appears asking us to key in the characters we want to find. If we key in AGATCT and press ENTER, the cursor will move through the sequence until it comes to the first Bgl II site. To continue the search for more Bgl II sites, press the B button.
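For readers who want to check a FIND search away from the PCW, the same row-by-row lookup can be sketched in Python. This is an illustrative analogue rather than a description of how LOCOSCRIPT works internally; the function name `find_in_rows` is hypothetical, and each row is searched separately, just as the rows appear on screen.

```python
def find_in_rows(rows, site):
    """Return (row, column) pairs, both counted from 1, for each match of `site`."""
    hits = []
    for row_number, row in enumerate(rows, start=1):
        pos = row.find(site)
        while pos != -1:
            hits.append((row_number, pos + 1))
            pos = row.find(site, pos + 1)   # continue from just past the last hit
    return hits


# The four nucleotide rows of the example sequence.
rows = [
    "GTGACGAGCGAAAACGGGCAGCGCGCGGAA",
    "CTGAAGGAGAGCAGATCTGTGCAGAACAAG",
    "ATCTGGAAGAACCGGTGGGGCGAGCGGTGG",
    "TCTGTGCAGATCTCCAACACCGACGCATCT",
]

print(find_in_rows(rows, "AGATCT"))   # [(2, 13), (4, 8)]
```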

A much faster and more permanent method uses the EXCHANGE function of LOCOSCRIPT. First copy the gene or protein sequence into another file (simply press f3 [copy] from the disc menu and follow the instructions).

Press EXCHANGE; key into the menu the sequence you want to find, in this case AGATCT, and the text you wish to exchange it for, say the name of the enzyme, in this case Bgl II or -Bgl2-. Move the cursor down the menu until it is over "Automatic exchange to end of DOC", press ENTER, and the cursor will move through the text, replacing AGATCT with -Bgl2- wherever it occurs in the document.

Sequence After Restriction Enzyme Search:

    GTGACGAGCGAAAACGGGCAGCGCGCGGAA
1   ValThrSerGluAsnGlyGlnArgAlaGlu

    CTGAAGGAGAGC-Bgl2-GTGCAGAACAAG
11  LeuLysGluSerArgSerValGlnAsnLys

    ATCTGGAAGAACCGGTGGGGCGAGCGGTGG
21  IleTrpLysAsnArgTrpGlyGluArgTrp

    TCTGTGC-Bgl2-CCAACACCGACGCATCT
31  SerValGlnIleSerAsnThrAspAlaSer
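The automatic exchange can be imitated with an ordinary string replacement. The sketch below is an assumption-laden Python analogue, not the LOCOSCRIPT implementation; the function name `exchange` is hypothetical. It returns a new list and leaves the input rows untouched, which mirrors the advice to work on a copy of the file.

```python
def exchange(rows, site, label):
    """Return a copy of `rows` with every occurrence of `site` replaced by `label`."""
    return [row.replace(site, label) for row in rows]


# The four nucleotide rows of the example sequence.
rows = [
    "GTGACGAGCGAAAACGGGCAGCGCGCGGAA",
    "CTGAAGGAGAGCAGATCTGTGCAGAACAAG",
    "ATCTGGAAGAACCGGTGGGGCGAGCGGTGG",
    "TCTGTGCAGATCTCCAACACCGACGCATCT",
]

for row in exchange(rows, "AGATCT", "-Bgl2-"):
    print(row)
```

The label -Bgl2- has six characters, the same as the site it replaces, so each row keeps its length and stays aligned with the amino acid row beneath it.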

BIOCHEMICAL EDUCATION 16(4) 1988


In order to emphasize the position of the restriction site still more effectively, its nucleotide sequence can be exchanged for a blank space by pressing the space bar six times (instead of -Bgl2-).

Restriction Sites Deleted for Emphasis:

    GTGACGAGCGAAAACGGGCAGCGCGCGGAA
1   ValThrSerGluAsnGlyGlnArgAlaGlu

    CTGAAGGAGAGC      GTGCAGAACAAG
11  LeuLysGluSerArgSerValGlnAsnLys

    ATCTGGAAGAACCGGTGGGGCGAGCGGTGG
21  IleTrpLysAsnArgTrpGlyGluArgTrp

    TCTGTGC      CCAACACCGACGCATCT
31  SerValGlnIleSerAsnThrAspAlaSer

A permanent copy of the restriction sites can then be obtained by printing out the sequence. At 90cps, this takes less than a minute for a typical sequence of around 1500 nucleotides, and the next search can be performed whilst printing out is taking place.

Shifting the Sequence to Reveal Hidden Sites

Restriction enzyme sites lying partly on one line and partly on another will not be recognised; this problem, however, can easily be overcome by simultaneously searching a copy of the original sequence which has been shifted forward six nucleotides.

In order to produce the shifted sequence, make a copy of the original, i.e. position the cursor at the beginning of the sequence, press COPY, move the cursor to the end of the sequence and press COPY 0; then move the cursor to a point below the original where you want to insert the copy and press PASTE 0, whereupon a copy of the original sequence such as the one below will appear:

    GTGACGAGCGAAAACGGGCAGCGCGCGGAA
1   ValThrSerGluAsnGlyGlnArgAlaGlu

    CTGAAGGAGAGCAGATCTGTGCAGAACAAG
11  LeuLysGluSerArgSerValGlnAsnLys

    ATCTGGAAGAACCGGTGGGGCGAGCGGTGG
21  IleTrpLysAsnArgTrpGlyGluArgTrp

    TCTGTGCAGATCTCCAACACCGACGCATCT
31  SerValGlnIleSerAsnThrAspAlaSer

In order to delete the six nucleotides or two amino acids at the end of each line, move the cursor to the sixth nucleotide from the end of the first line, in this case G; press COPY; press EOL (End Of Line); press CUT 1.

Move cursor down to the sixth character from the end of the second line, in this case the A of Ala; press COPY; press EOL; press CUT 2.

Carry on until you have run out of numbers (only the numbers 1-9 and 0 can be used).

    GTGACGAGCGAAAACGGGCAGCGC
1   ValThrSerGluAsnGlyGlnArg

    CTGAAGGAGAGCAGATCTGTGCAG
11  LeuLysGluSerArgSerValGln

    ATCTGGAAGAACCGGTGGGGCGAG
21  IleTrpLysAsnArgTrpGlyGlu

    TCTGTGCAGATCTCCAACACCGAC
31  SerValGlnIleSerAsnThrAsp

To transfer the nucleotides deleted from the end of the first line onto the beginning of the third, position the cursor over the first nucleotide of the third line, in this case C; press PASTE 1.

Similarly, to transfer the amino acids deleted from the end of the second line onto the fourth line, position the cursor over the first character of the fourth line, in this case the L of Leu; and press PASTE 2, and so on.

On re-screening, the third Bgl II (-Bgl2-) site is revealed:

    GTGACGAGCGAAAACGGGCAGCGC
1   ValThrSerGluAsnGlyGlnArg

    GCGGAACTGAAGGAGAGC-Bgl2-GTGCAG
11  AlaGluLeuLysGluSerArgSerValGln

    AACA-Bgl2-GGAAGAACCGGTGGGGCGAG
21  AsnLysIleTrpLysAsnArgTrpGlyGlu

    CGGTGGTCTGTGC-Bgl2-CCAACACCGAC
31  ArgTrpSerValGlnIleSerAsnThrAsp

With the shifted version positioned below the original, both can be screened for restriction enzymes at the same time, increasing the search time by no more than a few seconds per enzyme.
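The effect of the six-nucleotide shift can be checked with a short Python sketch. It is an illustration under stated assumptions, not the author's procedure: the helper names `wrap` and `sites_seen` are hypothetical, and, unlike the cut-and-paste method above, the six nucleotides left over at the end are kept as a short final row rather than discarded.

```python
def wrap(seq, width=30, shift=0):
    """Split `seq` into rows of `width`, shortening the first row by `shift`."""
    first = width - shift
    return [seq[:first]] + [seq[i:i + width] for i in range(first, len(seq), width)]


def sites_seen(rows, site):
    """Return the absolute 0-based positions of `site` found within single rows."""
    found, offset = set(), 0
    for row in rows:
        pos = row.find(site)
        while pos != -1:
            found.add(offset + pos)
            pos = row.find(site, pos + 1)
        offset += len(row)
    return found


# The example sequence as one unbroken string of 120 nucleotides.
seq = ("GTGACGAGCGAAAACGGGCAGCGCGCGGAA"
       "CTGAAGGAGAGCAGATCTGTGCAGAACAAG"
       "ATCTGGAAGAACCGGTGGGGCGAGCGGTGG"
       "TCTGTGCAGATCTCCAACACCGACGCATCT")

original = sites_seen(wrap(seq), "AGATCT")
shifted = sites_seen(wrap(seq, shift=6), "AGATCT")
print(sorted(original))             # [42, 97]
print(sorted(original | shifted))   # [42, 58, 97]
```

The site at position 58 straddles the second and third rows of the original layout, so only the shifted copy reveals it. Searching both layouts together catches any site that is no longer than the shift.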

Conclusion

LOCOSCRIPT provides a cheap and efficient way of performing restriction enzyme searches of DNA sequences. Using the EXCHANGE function in a similar way, searches can also be made for open reading frames and for conserved sequences, while specific amino acid residues which may have functional significance can be highlighted by exchanging them for a blank space or an asterisk.

With the nucleotide sequence written in the format GTG.ACG.AGC. (i.e. with a dot or space between each codon), translations can also be carried out:


(1) COPY the new format nucleotide sequence.
(2) Using tables, FIND each codon in turn and EXCHANGE it for its corresponding amino acid.
(3) With the space bar, position the cursor on the line above the first letter of the first amino acid and PASTE back the nucleotide sequence.
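The codon-by-codon exchange in step (2) can be sketched in Python as a series of string replacements. This is a hedged illustration only: the helper names `dotted` and `translate_by_exchange` are hypothetical, and `CODON_TABLE` is deliberately partial, covering just the codons in the first row of the example sequence rather than the full genetic code.

```python
# Partial table (standard genetic code), limited to the codons used below.
CODON_TABLE = {
    "GTG": "Val", "ACG": "Thr", "AGC": "Ser", "GAA": "Glu", "AAC": "Asn",
    "GGG": "Gly", "CAG": "Gln", "CGC": "Arg", "GCG": "Ala",
}


def dotted(seq):
    """Write `seq` with a dot after each codon, e.g. 'GTG.ACG.AGC.'."""
    return "".join(seq[i:i + 3] + "." for i in range(0, len(seq), 3))


def translate_by_exchange(dotted_seq, table):
    """Exchange each 'codon.' for 'residue.', one table entry at a time."""
    text = dotted_seq
    for codon, residue in table.items():
        # The trailing dot ensures only whole codons in frame are matched.
        text = text.replace(codon + ".", residue + ".")
    return text


row = "GTGACGAGCGAAAACGGGCAGCGCGCGGAA"
print(dotted(row))
print(translate_by_exchange(dotted(row), CODON_TABLE))
```

The dot after each codon is what makes the exchange safe: without it, a search for a codon could match across the boundary between two neighbouring codons and produce a wrong residue.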

Combined with simple, inexpensive experiments like agarose gel electrophoresis of DNA fragments generated by restriction enzyme digestion, and of their ligation products etc, the simple molecular biology programme described here makes a neat package whereby some of the most important principles of DNA technology can be introduced at minimal cost in the pre-university situation, while for the professional researcher it provides a convenient fireside means of carrying out restriction enzyme and other similar searches, whilst relieving the load on the institute's central computer network.

The Use of Computers in the Teaching of Quantitative Aspects of Biochemistry

J B CLARKE

Biochemistry Department Royal Holloway and Bedford New College Egham Hill, Egham Surrey TW20 0EX, UK

Laboratory work and the subsequent analysis of results are an important element in science education for a number of reasons. Perhaps the most significant of these is that students participate actively, in contrast to passive lecture attendance, and this is by far the most effective way of learning.1 Moreover, biochemists gain experience of the behaviour of biological materials and learn how to manipulate and obtain experimental data from apparatus which may sometimes be complex. In this way both measurement techniques and fundamental aspects of the nature of biological materials can be learnt satisfactorily. However, especially in large classes, the problems attendant on the subsequent analysis of data are often not well dealt with. A critical approach to results and their evaluation during laboratory sessions is frequently not encouraged with sufficient vigour. The reluctance of students to carry out numerical tasks on the spot leads to a situation in which analysis of results is carried out later. Essential 'interaction' with the experiment therefore does not occur. In addition, students frequently have a poor grasp of the significance of experimental error and do not attempt to interpret their results in a statistically meaningful way.

These failings are very important as assessment of the significance of results is a vital element of the scientific method. Indeed, an ability to cope with statistically-based interpretations is becoming as important in biochemistry as it is in biology, medicine, etc. This is partly due to the ease with which statistical analysis can now be carried out. Spreadsheets and other powerful software such as MINITAB and SPSSx are important tools in this context. Indeed, students ought to be taught that experiments should be designed to take advantage of these methods. Although gains are small when analysing small data sets, significant advantages occur when large sets are involved. It also becomes possible to plan more efficient and complex experiments. Furthermore, the use of these spreadsheets stimulates a critical approach to the evaluation of results which at present is conspicuous by its absence in student work.

Laboratory work and the subsequent analysis of data give an opportunity to improve the numeracy of students and are a vital aspect of training for scientific and other careers. Problems relating to numeracy are dealt with primarily through practical work and analysis of data. However, lack of mathematical confidence or ability remains a problem with a high percentage of students. Moreover, circumstances in school education and elsewhere are likely to compound the situation in the future. These difficulties must be resolved, for students' job prospects are correlated highly with numerical ability. Although it is true that top quality students do cope, many do not and these form the majority. The numeracy of this weaker-ability group must be improved if our courses are to remain relevant to the needs of future students. The problem is likely to be particularly acute at times when the number of students applying for places in Biochemistry Departments is declining nationally.

The use of computers as an integral part of laboratory work offers a way to redress the balance. It becomes possible to bypass slow and error-prone analysis 'by hand' and obtain an instant statistical or graphical assessment of data. This strongly reinforces the interactive element which allows students to spot outliers and other problems in their data sets whilst they still have time to repeat observations or extend the range of conditions studied. A critical approach is encouraged and it is far more likely that students will complete laboratory sessions with readily interpretable and hopefully meaningful data sets. This achievement of itself will induce a much more positive approach to practical work.

Success in the introduction of computer-based methods, however, depends upon a number of factors and it is important to take a broad view of the role of these methods in the curriculum. Geographical considerations are important as the interactive element of computer use stressed above depends upon the computing facility being close to or in the laboratories concerned. The nature of the hardware and software to which the student has access must also be considered. In this context, vocational value is an important concern. Thus where a choice is possible, students should be directed towards a statistical package such as MINITAB or general spreadsheets such as LOTUS 1-2-3, VISICALC and SUPERCALC, especially if these have an inbuilt graphics capability. Expertise obtained in the use of these packages is far more likely
