
TIBTECH - MAY 1987 [Vol. 5]

Biotechnology information in Europe; problems and solutions

Richard Wakeford

Information will have an increasing impact on biotechnology. A co-ordinated European programme is required now to provide resources and services, such as databanks and advanced computers, so that Europe can make a full contribution to efforts undertaken internationally.

Biotechnology today is a knowledge-dependent industry, and perhaps a unique one in its requirement for rapid and efficient information. Imagine the needs of the research and marketing directors of a biotechnology company: in the chain of events that brings a molecule to market, they might install an automated gene sequencer, access international databanks, search the scientific literature, find out about production techniques and the latest manufacturing regulations, consult market research data, and analyse their customers and competitors. Just as biotechnology is spoken of as an enabling technology for the health-care or food industry, information is an equally pervasive and multidisciplinary foundation element of biotechnology, and innovation and good management are vital to the success of any project.

Many features of the information process are shared between biotechnology and other areas of high technology: there is a widespread need for the fast delivery of scientific literature and for the understanding and analysis of new markets. In a short article, however, it is most useful to review the areas that are unique to biotechnology.

Richard Wakeford is at The British Library, Biotechnology Information Service, 9 Kean Street, London WC2B 4AT, UK. This article represents the personal opinions of the author only.

Information and biotechnology are tightly coupled in two fundamental ways. First, DNA is the world's oldest database, with a one-dimensional code closely mimicking the data strings used by digital computers. Second, biotechnology is in the business of exploiting the full range of biological diversity: any of the 50 000 proteins of the human body, or any of the 2 million known organisms, is a potential source of products. Here lies one way in which biotechnology differs from biological science, where the emphasis is much more on the complete understanding of a few paradigm systems.
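To make the analogy between DNA and the data strings of digital computers concrete, here is a minimal Python sketch that packs a sequence into 2 bits per base. It assumes an unambiguous four-letter alphabet (A, C, G, T only), and the particular base-to-bit assignment is an arbitrary illustrative choice, not a format used by any databank named in this article.

```python
from __future__ import annotations

# Assumed 2-bit code per nucleotide; the assignment is arbitrary (illustrative only).
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = {code: base for base, code in ENCODE.items()}


def pack(seq: str) -> tuple[int, int]:
    """Pack a DNA string into one integer, 2 bits per base.

    The length is returned as well, because leading 'A's (code 00)
    would otherwise be indistinguishable from leading zero bits.
    Raises KeyError for characters outside A, C, G, T.
    """
    value = 0
    for base in seq.upper():
        value = (value << 2) | ENCODE[base]  # shift left, append the new base
    return value, len(seq)


def unpack(value: int, length: int) -> str:
    """Recover the DNA string from a packed integer and its length."""
    bases = []
    for _ in range(length):
        bases.append(DECODE[value & 0b11])  # lowest 2 bits hold the last base
        value >>= 2
    return "".join(reversed(bases))


packed, n = pack("GATTACA")
print(packed, n)          # 9156 7
print(unpack(packed, n))  # GATTACA
```

At 2 bits per base, a sequence occupies a quarter of the space of one-byte-per-character text. Real sequence data also use ambiguity codes (such as N), which a pure 2-bit scheme cannot represent.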

The advantages that will come from finding the right answers to the questions posed by the interaction of biotechnology and information technology are almost unlimited. Indeed, some major objectives, such as the prediction of protein folding and the mapping and sequencing of the human genome, are inconceivable without massive computer support. And for culture collections, resources that have traditionally been fragmented and underserved, automated catalogues and listings of strain data (e.g. MINE and MICIS - see Table 1) mean that they can begin to be integrated into functioning networks and to deliver their services in ways not possible until now.

Europe and international collaboration

Already there are signs that Europe is slipping behind the lead given by the USA and the rapid progress made in Japan. The technological base for biotechnology information in Europe is now inadequate for the demands being made upon it, and the shortage of manpower trained in the hybrid skills of biotechnology and information technology will soon limit the rate of further growth. In contrast, the US government is investing several hundred million dollars in this area; the National Institutes of Health (NIH), for example, recently commissioned its own CRAY supercomputer to undertake exhaustive pattern-matching searches between sequences. In Europe, however, access to such machines remains extremely limited.

At the moment, Europe and the USA enjoy an unrestricted and open collaboration, the link between NIH's GenBank (Genetic Sequence Databank) and the European Molecular Biology Laboratory (EMBL) being one notable instance, but this can continue only on the basis of an equal partnership. As an aggregate research community Europe is as large as the USA, but in its fragmented state no single country can contribute on anything like equal terms. The message being voiced by agencies such as the Concentration Unit for Biotechnology in Europe (CUBE - part of the European Commission's Directorate for Science, Research and Development) is that Europe is in danger of finding itself excluded from some critically important areas of research. Biotechnology information has therefore been targeted for support in the Bioinformatics Collaborative European Programmes and Strategy (BICEPS - see Table 2), which is budgeted to spend over ECU 100 million (US$114 million) on an automated laboratory, automated process plant and computerized training.

© 1987, Elsevier Publications, Cambridge 0166-9430/87/$02.00
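To illustrate why exhaustive pattern-matching searches between sequences call for supercomputer time, here is a minimal Python sketch of one standard exhaustive method, Smith-Waterman local alignment with a linear gap penalty, applied to every entry of a small in-memory database. The article does not say which algorithm the NIH machine ran; the scoring values, the `exhaustive_search` helper and the toy entries are illustrative assumptions.

```python
from __future__ import annotations


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a against b (linear gap penalty).

    Every cell of the len(a) x len(b) matrix is filled, so the cost per pair
    is proportional to len(a) * len(b); only two rows are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are never allowed to drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def exhaustive_search(query: str, database: dict[str, str]) -> list[tuple[int, str]]:
    """Score the query against every entry in the database, best hits first."""
    hits = [(smith_waterman(query, seq), name) for name, seq in database.items()]
    return sorted(hits, reverse=True)


toy_db = {"entry1": "TTACGTGG", "entry2": "CCCCCCCC"}  # hypothetical entries
print(exhaustive_search("ACGT", toy_db))  # [(8, 'entry1'), (2, 'entry2')]
```

Because every query is compared with every entry, cell by cell, the work grows with the product of query length and total databank size, which is the kind of load that outstrips a microcomputer as the databanks grow.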

The biological sciences are now entering a phase of international collaboration that has been mandatory for the physics community for many years. The need for a European centre is now widely accepted, and the natural choice would be EMBL at Heidelberg, where work on computer resources, the Nucleic Acid Data Library and a training programme is already well under way. The infrastructure for more distributed collaborative networks is being established, and one basic requirement, a European biotechnology research directory (BIOREP - see Table 2), is due to be started this year by the Organization for the Advancement of Pure Research (ZWO) in the Netherlands.

Table 1 Sources of biotechnology information in Europe (information source: location)

Molecular data

Nucleic Acid Data Library: European Molecular Biology Laboratory, Meyerhofstrasse 1, 6900 Heidelberg, FRG

SWISS-PROT (protein sequence data bank): Département de Biochimie Médicale, Université de Genève, 1 rue Michel Servet, 1211 Genève, Switzerland

PG-TRANS (protein sequence data bank): Computer Science Unit, Institut Pasteur, 28 rue du Dr Roux, 75724 Paris Cedex 15, France

ENZIDEX (enzyme data): Biocatalysts Ltd, Main Avenue, Treforest Industrial Estate, Pontypridd CF37 5UT, UK

Cellular data

MINE (Microbial Information Network in Europe): CAB International Mycological Institute (UK Node), Ferry Lane, Kew, Surrey TW9 3AF, UK

MICIS (Microbial Information Service): Laboratory of the Government Chemist, Cornwall House, Waterloo Road, London SE1 8XY, UK

CFISM (French Database of Microbial Strains): INRA, Institut National Agronomique, 16 rue Claude-Bernard, 75321 Paris Cedex 05, France

In Vitro Conservation Database (plant cell culture techniques database): International Board for Plant Genetic Resources, Via delle Terme di Caracalla, 00100 Rome, Italy

Hybridoma Databank: University of Nice/CODATA, Parc Valrose, 0604 Nice Cedex, France

MIRDAB (Microbiological Resources Databank): Elsevier Science Publishers, P.O. Box 211, 1000 AE Amsterdam, The Netherlands

Bibliographic databases

Abstracts in BioCommerce (commercial abstracts): BioCommerce Data Ltd, Old Crown Building, Windsor Road, Slough, Berks SL1 2DY, UK

Biotechnology Abstracts (scientific abstracts): Derwent Publications Ltd, 128 Theobalds Road, London WC1X 8RP, UK

Biotechnologies (scientific abstracts): Centre de Documentation Scientifique et Technique, 26 rue Boyer, 75971 Paris Cedex 20, France

Current Biotechnology Abstracts (scientific abstracts): Royal Society of Chemistry, The University, Nottingham NG7 2RD, UK

Attempts at European co-operation may of course always founder upon political differences outside science, and it will be interesting to see how the future treats the efforts of the two major funding agencies concerned, the European Community (12 member states) and EMBL (18 European member states plus Israel). At the time of writing, the long-term research plans of the European Community, such as BICEPS, are uncertain, for they are all bound within the FRAMEWORK programme, which has been under intense debate within the Council of Research Ministers for several months.

Problems and answers

If there is one overriding problem retarding the full impact of advanced information techniques, it is the lack of awareness of their significance. When this is combined with slow returns from long-term data-collection projects and resistance to the diversion of funds from core experimental science, it is not surprising that progress is piecemeal. The pervasive nature of information also means that responsibility has been spread across several funding agencies. The effect of this division can be seen in the recent attempt by several UK research councils to set up a joint information network, which has now been shelved. However, many of the problems of working with information in biotechnology are common to all countries and institutions and often cannot be solved simply by the investment of money. Breakdowns in organization may have far more impact, and can occur at any stage in the chain of collecting, processing and distributing information.

It is now common for centres that collect data to find that, although their users are enthusiastic about the continued enlargement of the database, they are sometimes reluctant to supply data themselves. Reasons range from the real barrier of commercial confidentiality to simple laziness. The serious problem, however, is that scientists have few incentives to contribute, since database entries, unlike published articles, are not citable or publicly rewarded. A mechanism therefore needs to be found for allocating professional prestige, a need that is becoming urgent in nucleic acid sequencing, where fewer and fewer determined sequences can be published by conventional means. The time may yet come when we see the 'GenBank Top 100 Citations'. An alternative suggestion, made in a recent report from the US National Research Council, is that the deposition of data in a public databank should become a condition of grant award¹.

Effective transmission of information depends on a common understanding, achieved through data drawn up in a standard form and using a standard terminology. Some movement towards this ideal pre-Babel state has been made by GenBank and EMBL, who have now agreed upon a common record format, and by the use of the Rogosa, Krichevsky and Colwell data standard² in the Microbial Strain Data Network (MSDN), a format rich and flexible enough to accommodate any microbial record collection. The MSDN is sponsored by the Committee on Data for Science and Technology (CODATA - a committee of the International Council of Scientific Unions), which has also been looking at the requirements for protein sequence data. In practice, though, it may be the de facto standards set by pace-setting data collectors, rather than the deliberations of committees, that determine what happens. However, for the naming of clones, strains and genetic elements, there will remain a need for international agreement paralleling the accepted systematic nomenclature used at the organism level. The US National Library of Medicine is now attempting to catalyse efforts in this direction¹.
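The practical benefit of a common record format is that a single parser can read every collector's entries. As a minimal Python sketch of that idea, here is a reader for a simplified, hypothetical record layout loosely modelled on the two-letter line codes of the EMBL flat-file format (ID, DE, SQ, with `//` ending each entry); it is not a complete or exact parser for EMBL or GenBank entries, and `parse_records`, `sequence_of` and the sample entry are illustrative names.

```python
from __future__ import annotations


def parse_records(text: str) -> list[dict[str, list[str]]]:
    """Parse records in a simplified line-code layout (hypothetical, EMBL-like).

    Each line starts with a two-character code (e.g. 'ID', 'DE', 'SQ');
    lines with a blank code carry sequence data; '//' ends a record.
    A trailing record without a '//' terminator is not returned.
    """
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue                        # ignore empty lines
        if line.startswith("//"):
            records.append(current)         # terminator: close the record
            current = {}
            continue
        code = line[:2].strip() or "SEQ"    # blank code means a sequence line
        current.setdefault(code, []).append(line[2:].strip())
    return records


def sequence_of(record: dict[str, list[str]]) -> str:
    """Join sequence lines, dropping the spaces used to group bases."""
    return "".join(record.get("SEQ", [])).replace(" ", "").upper()


SAMPLE = "\n".join([
    "ID   DEMO01",
    "DE   Hypothetical example entry",
    "SQ   Sequence 8 BP;",
    "     acgt acgt",
    "//",
])

entry = parse_records(SAMPLE)[0]
print(entry["ID"], sequence_of(entry))  # ['DEMO01'] ACGTACGT
```

Keeping every field as a list under its line code means repeated lines (several DE lines, for instance) are preserved in order without the parser needing to know every code in advance.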

Data linkage between different levels of biological organization is a problem of a different order of complexity. For example, databases of protein sequences, such as the Protein Information Resource produced by the National Biomedical Research Foundation in Washington, exist in complete logical isolation from databases of higher-order structures, such as the Protein Data Bank at the Brookhaven National Laboratory. However, work is now under way at many laboratories in Europe, the USA and Japan to link the two levels of description. Eventually this will provide the solution to the formidable problem of predicting how proteins will fold; then an even greater challenge will emerge: mapping biological structure to function. This achievement, a 'Holy Grail' of molecular biology, lies far in the future, but it emphasizes that there are still gaps, such as the activity and function of molecules, in the systematic recording of biological knowledge.
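When two databases share no identifiers, as with the sequence and structure databases described above, the simplest way to link them is to match on content. The following minimal Python sketch assumes each structure entry can be reduced to the sequence of its solved chain and links entries only on exact sequence identity; the identifiers and sequences are hypothetical, and real linkage would also need to handle fragments, variants and multi-chain structures.

```python
from __future__ import annotations

from collections import defaultdict


def link_by_sequence(sequence_db: dict[str, str],
                     structure_db: dict[str, str]) -> dict[str, list[str]]:
    """Link two databases that share no identifiers by exact sequence match.

    sequence_db maps sequence-entry IDs to amino-acid sequences;
    structure_db maps structure-entry IDs to the sequence of the solved chain.
    Returns, for each sequence entry, the structure entries with an identical sequence.
    """
    by_sequence: dict[str, list[str]] = defaultdict(list)
    for struct_id, chain in structure_db.items():
        by_sequence[chain.upper()].append(struct_id)   # index structures by content
    return {
        acc: sorted(by_sequence.get(seq.upper(), []))  # .get avoids adding empty keys
        for acc, seq in sequence_db.items()
    }


# Hypothetical identifiers and sequences, for illustration only.
sequences = {"SEQ1": "MKVLAA", "SEQ2": "GGAPST"}
structures = {"STRUCT_A": "mkvlaa", "STRUCT_B": "PPPPPP"}
print(link_by_sequence(sequences, structures))
# {'SEQ1': ['STRUCT_A'], 'SEQ2': []}
```

Indexing the structure side by sequence first means each sequence entry needs a single dictionary look-up rather than a comparison against every structure.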

Electronic networks now offer the cheapest and fastest communication medium for dispersed groups of scientists who wish to exchange data, access software and correspond with each other. Either a dedicated computer is plugged into a public telecommunications system or a group rents time on an electronic mailbox. Two networks are coming into widespread use in biotechnology at the moment: BIONET and CODATA. BIONET is an example of the use of public telecommunications. Intelligenetics Inc. has mounted sequencing software and the major molecular databanks on its own machine, which communicates world-wide with scientists through the TELENET telecommunications network. Bulletin boards have grown to cover topics ranging from requests for reagents to discussions on gene expression, plant molecular biology or oncogenes. However, the system has been a victim of its own considerable success: its performance degrades as more users come on-line. Intelligenetics is currently enlarging its machine capacity and establishing satellite BIONET nodes. Although European users are welcome on BIONET, they now need adequate capacity on local BIONET-like systems. Limitations on networking will continue to be felt for some years in the field of molecular design, as high-resolution graphics displays cannot run on low-capacity networks such as TELENET, EARN (European Academic Research Network) or JANET (Joint Academic Network - UK), which support BIONET. A low-cost solution to the communications problem has been adopted by the CODATA network, which runs entirely on the commercial DIALCOM system. Although at the moment it is used mainly for its mailbox facilities, it is expected to provide a directory to the Hybridoma Databank and to the MSDN.

Table 2 Biotechnology information projects receiving funds from the European Commission (project: location)

Nucleic Acid Data Library (1983-84): EMBL, Heidelberg

European Biotechnology Information Project (1984-86): The British Library

Hybridoma Databank (1986-): University of Nice

BIOREP - A database of ongoing research (1987-): Organisation for the Advancement of Pure Research, The Netherlands

BIOROM - 'Biotechnology Abstracts' on CD-ROM (1987-): Derwent, London/Telesysteme, Paris

BAP - Biotechnology Action Programme (1985-89), comprising: protein electrophoresis - data capture, analysis and databank construction; automated DNA sequencing; MINE - Microbial Information Network in Europe; automated control and monitoring of biotechnological processing; protein engineering software

Proposed projects

BICEPS - Bioinformatics Collaborative European Programmes and Strategy (1988-). Part of the FRAMEWORK research programme. Projects involve molecular modelling, advanced computing, automated process engineering and computer-aided learning.

An alternative strategy for distributing information that has recently gained much attention is the use of compact discs (CD-ROMs). A start in this direction is about to be made in the BIOROM project where, with the support of the European Commission, Derwent Publications will be publishing 'Biotechnology Abstracts' on a disc with search software supplied by Telesysteme. The enormous storage capacity of CD-ROMs would be well exploited by sequence data, which have now overflowed both the storage and the processing capacities of personal microcomputers. Sequence files, now of the order of tens of megabytes, can easily be held on a disc, together with pre-processed indexes and dictionaries of every record field and sub-structure. These would enable microcomputers to handle time-consuming problems that would previously have been impossible³.
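The value of shipping pre-processed indexes alongside the data is that a search becomes a series of look-ups instead of a scan of every sequence. Here is a minimal Python sketch of one generic kind of index, a k-mer (fixed-length word) inverted index; the article does not describe the index format used on any disc, and the word length k = 4, the function names and the toy entries are illustrative assumptions.

```python
from __future__ import annotations

from collections import defaultdict


def build_kmer_index(sequences: dict[str, str], k: int = 4) -> dict[str, set[str]]:
    """Pre-process once: map every length-k word to the entries containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in sequences.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return dict(index)


def candidates(index: dict[str, set[str]], query: str, k: int = 4) -> list[tuple[str, int]]:
    """Rank entries by how many distinct query words they share, using only index look-ups."""
    query = query.upper()
    words = {query[i:i + k] for i in range(len(query) - k + 1)}
    counts: dict[str, int] = defaultdict(int)
    for word in words:
        for name in index.get(word, ()):
            counts[name] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


entries = {"e1": "ACGTACGT", "e2": "TTTTGGGG"}  # hypothetical entries
idx = build_kmer_index(entries)
print(candidates(idx, "CGTACG"))  # [('e1', 3)]
```

The expensive step, building the index, is done once by the publisher; the microcomputer then only performs look-ups, and could pass the few top-ranked candidates to a slower, exact comparison.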

The information business

As well as stimulating the development of the biotechnology industry directly through the introduction of new products, biotechnology information has commercial implications of its own. Publishing, either on paper or electronically, is the most obvious area of exploitation, and there is always the risk that without a healthy home-grown trade, Europe will be forced to buy back the results of its own research from overseas suppliers. European companies are on the move, however. The Swiss on-line host Datastar, for instance, now has its own marketing team in the USA selling British and Dutch biomedical databases. Biotechnology produces large and complex data files, and the stringent demands of handling material of this nature are exactly the sort of problem that stimulates developments in advanced hardware and software. To take two examples, transputer technology is being applied by Chemical Design Ltd, Oxford, UK to a high-power graphics processor that would initially deal with protein structures and receptor-ligand docking problems, and the University of London, together with the Imperial Cancer Research Fund (UK), is developing a knowledge-based system for molecular biology with GEC Ltd as part of the UK's fifth-generation Alvey Programme.

Conclusion

Is the application of information to biotechnology, then, merely another technique, perhaps on a par with chromatography or centrifugation, or is it a critical foundation element that should receive support even in the face of more immediate demands from experimental scientists? Certainly it is difficult at the moment to assess the impact of databases, databanks or graphical displays on any specific commercial product. On the one hand, the influence of information is so pervasive as to be almost invisible; on the other, the development of most specific applications is only at the beginning of a curve that will take 10 or 15 years to culminate. Few of these areas are likely to grow without the investment of public money on a large scale, and it is to this difficult proposition that European funding agencies must now address themselves.

References

1 National Research Council (1986) Nomenclature and Information Organization. National Academy Press, Washington, D.C.

2 Rogosa, M., Krichevsky, M.I. and Colwell, R.R. (1971) Int. J. Syst. Bacteriol. 20, 6A-175A

3 Coulson, A.F.W. and Collins, J.F. Biological Sequence Analysis. Advanced Architecture Computing Requirements. Report prepared under contract with the European Economic Communities. Copies may be obtained from CUBE-DGXII, 200 rue de la Loi, B-1049 Brussels, Belgium


Controlling the risks to health and environment from biotechnology - what is the European Community doing?

Cynthia Whitehead

The EEC is an economic community which must endeavour to unify the regulation of trade and environment among its member states. The establishment of international regulatory standards in biotechnology will be relatively less hindered by entrenched national legislation and practices than in other industrial sectors. Rapid progress towards a common biotechnology market will benefit both manufacturers and research in Europe, and provide common standards of health and environmental protection.

During the past three years, the European Community (EC) has been quietly and steadily laying the foundation for a framework of policies and laws that will identify and control the risks to human health and the environment that might arise from the commercial application of the many new techniques of genetic manipulation.

Cynthia Whitehead is Editor of the European Environment Review, 23, Av. Eisenhower, B-1030 Brussels, Belgium.

In fact, the work started somewhat earlier. In 1982, later than in the USA, the Community looked at the risks that might come from r-DNA research and adopted the Council (see Glossary) Recommendation 82/472/EEC on the registration of work involving r-DNA. This urged the member states to set up national notification schemes and containment guidelines. Most of the member states have done so; several of the schemes are mandatory, and some involve containment guidelines, but one member state in which considerable research is going on, Italy, has nothing.

At the same time, the Commission of the European Communities (see Glossary) decided that biotechnology applications covered such a diverse range of industrial sectors and products that a cross-sectoral, Community-wide approach was vital to ensure that European companies would be in a position to compete successfully on world markets. Hence, in February 1982, the Biotechnology Steering Committee (BSC) was set up to coordinate EC policies affecting or affected by developments in biotechnology. The BSC is composed of the directors-general of the Commission services concerned: internal market and industrial affairs; environment, con-
