
Neurotoxicology and Teratology, Vol. 12, pp. 449--454. © Pergamon Press plc, 1990. Printed in the U.S.A. 0892-0362/90 $3.00 + .00

Dealing With Large Data Sets¹

JOHN F. GRAEFE AND RONALD W. WOOD²

Department of Environmental Medicine, New York University Medical Center 550 First Avenue, New York, NY 10016

GRAEFE, J. F. AND R. W. WOOD. Dealing with large data sets. NEUROTOXICOL TERATOL 12(5) 449--454, 1990. Collection, storage and retrieval of large amounts of data from multiple experiments for subsequent reduction, graphing and statistical analysis need not be a burdensome task. Although turnkey systems may offer significant economies for single well-defined and repetitive tasks, they may not permit sufficient flexibility to achieve the diverse aims required by many research programs. Using popular microcomputers to run one or a few experimental subjects may confront the investigator not only with significant bookkeeping problems, but also with an allocation of labor resources to computer maintenance and support that might be better invested in research effort. By using networked minicomputers, economies of scale emerge in data collection, transfer, reduction, and analysis, as well as in maintenance, support, and scientific effort.

Data collection Data management Data analysis Network SKED RS/1

In laboratories focused on behavioral pharmacology/toxicology and the experimental analysis of behavior, the use of computers to control experimental processes and to collect and store data has become as commonplace as the use of electromechanical relays and counters was two decades ago. Although computers have been used for this purpose for quite some time (2), the choice among various system designs can be perplexing.

One strategy is to use personal computers, each dedicated to run one or a few experimental subjects, but unless appropriately networked, this microcomputer approach confronts the investigator not only with the headache of tracking and assembling data from disks on each PC before performing overall analyses, but also with the maintenance of hardware and software on multiple instruments.

Collection, storage and retrieval of large amounts of data from multiple experiments running simultaneously for subsequent reduction, graphing and statistical analysis need not be burdensome in a minicomputer environment. The use of a minicomputer dedicated to process control and data collection and that is networked to a separate minicomputer for data reduction relieves the investigator from having to deal repetitively with individual data files and disks. In our laboratory, data are collected and stored on a single disk on a PDP11/73 and are transferred electronically to a MicroVAX II for subsequent examination and statistical analysis using simple command files. Only one local backup operation must be performed to archive data versus separate backups on each of several microcomputers.

Computing needs in our laboratory are dictated by two projects, one focused on the behavioral effects of solvents subject to abuse by inhalation, and the second focused on the development of an animal model of cocaine smoking. The solvent project is currently examining the structure-activity relationships among the single and double constituent C7, C8 and C9 alkyl benzenes across several different preparations using mouse, rat, and monkey. Behavioral effects are characterized in individual animals using enough replications to allow for regression analyses and simultaneous comparisons within animals; the studies employ enough animals to permit quantitative estimates of the variation in sensitivity between animals. For example, spontaneous and food-reinforced activity are simultaneously studied in four 1.3-m³ exposure chambers. One chamber holds 24 rats in suspended wire cages, one holds 24 mice in running wheels, and the two remaining chambers each hold eight rats in running wheels (four of which have feeders) for a total of 64 animals. Animals are typically exposed on Tuesday and Friday to one member of a series of six replications at each of six solvent concentrations, a typical design used in behavioral pharmacology. Activity counts from individual animals are recorded every 10 min, six hours per day, five days per week. A total of 11,520 numbers are recorded per week, 207,360 numbers per compound studied in 64 animals, or a total of 622,080 numbers per year for three compounds.

A second focus is on schedule-controlled operant behavior in both rat and monkey. In the rat preparations, 18 animals per day can be studied in individual five-liter acrylic inhalation boxes; six animals per day are studied in the monkey preparation. Animals are run five days per week for 72 min each day, responses are recorded every two min, and six replications of six concentrations of a solvent are typically examined. Thus, the rat and monkey operant preparations can generate 174,960 and 58,320 numbers, respectively, for a total of 855,360 numbers annually.
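The totals quoted in the two preceding paragraphs can be reproduced with a short calculation. The sketch below is in Python, which is not a language used in the paper. It assumes that the 36 exposures per compound (six replications of six concentrations), at two exposures per week, span 18 weeks, and that three compounds per year applies to the operant work as well as to the activity work. Both assumptions are inferred from the text; they happen to reproduce the published figures.

```python
# Reproduce the data-volume figures quoted in the text.
# Assumption (inferred, not stated outright): 36 exposures per compound at
# 2 exposures per week means 18 weeks of recording per compound.

def numbers_per_compound(animals, session_min, bin_min, days_per_week=5,
                         replications=6, concentrations=6,
                         exposures_per_week=2):
    """Return (numbers per week, numbers per compound) for one preparation."""
    bins_per_session = session_min // bin_min
    per_week = animals * bins_per_session * days_per_week
    weeks = (replications * concentrations) // exposures_per_week
    return per_week, per_week * weeks

COMPOUNDS_PER_YEAR = 3  # stated for activity; assumed for operant work

activity_week, activity_compound = numbers_per_compound(64, 6 * 60, 10)
_, rat_compound = numbers_per_compound(18, 72, 2)
_, monkey_compound = numbers_per_compound(6, 72, 2)

activity_year = activity_compound * COMPOUNDS_PER_YEAR
rat_year = rat_compound * COMPOUNDS_PER_YEAR
monkey_year = monkey_compound * COMPOUNDS_PER_YEAR
total_year = activity_year + rat_year + monkey_year

print(activity_week, activity_compound, activity_year)
print(rat_year, monkey_year, total_year)
```

Under these assumptions the 855,360 figure is the combined annual total for the activity and operant preparations, not for the operant preparations alone.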

The initial phase of the cocaine project has been the development of a cocaine aerosol ("smoke") generator for delivery of measured atmospheric concentrations with a known particle size distribution appropriate for nonhuman primates. The primary

¹Supported by grants K02-DA00117, R01-DA05080, and R01-DA04438 from the National Institute on Drug Abuse.

²Requests for reprints should be addressed to Ronald W. Wood, A.J. Lanza Labs, Long Meadow Road, Tuxedo, NY 10987.


[Figure 1 diagram not reproduced. Legible labels: exposure chambers holding rats and mice in running wheels (some with feeders); six inhalation chambers (18 rats, SCOB); one mouse activity inhalation chamber (18 mice); FORTRAN; chromatograph/integrator; 80286 microcomputer (NODE: PC1); bridge; mainframe network / e-mail.]

FIG. 1. Flow diagram depicting the hardware and software utilized in experiment control and data collection.

instruments for this task are a gas chromatograph and integrator (GC), which is connected by a serial line to the MicroVAX, and a particle sizing optical spectrometer (OPS), which is operated by software on a PC. The PC is attached to ethernet, thus allowing data files collected from the OPS to be written directly onto a "virtual" disk on the MicroVAX using the Personal Computer Systems Architecture (PCSA), which relies on the DECnet protocols. The GC generates 50 numbers and the OPS generates about 400 numbers per typical sample run. If 10 samples per day are taken, a total of about 4,500 numbers per day are recorded. Heart rate and operant performance are recorded during experimental sessions; heart rate is obtained by recording the interbeat interval with 10-msec resolution, while responses are recorded in two-min epochs. In a 60-min session approximately 36,000 numbers are recorded. For both the analytical and function measurements, a total of approximately 40,000 numbers per day or upwards of 4 million numbers per year could be generated if warranted.

Thus, in a year we anticipate collecting about 5 million numbers (not bytes). The management of a data base of this size is small in comparison to that of an airline reservation or banking system, and the computational effort involved in number-crunching and sorting data into daily tables is not nearly as intensive as in fluid dynamics, weather prediction, or computer-aided design. Basically, the task is the same one that behavioral pharmacologists have always done, except that there are five times the number of animals that Peter Dews (1982) said was possible to track before experimental error begins to creep in and obscure effects.

In order to conduct reliable scientific research and stay within budgets without a significant actuarial burden, our general strategy has been to purchase proprietary hardware and software products (thus profiting from the extensive engineering invested in them and not having to design and test interfaces and operating system software) and to let service contracts for computer hardware and software maintenance. This leaves the investigators responsible only for data collection and analysis, and monitoring the continuing function of the instruments. Our systems have proven to be cost effective and powerful (Figs. 1 and 2).

The remainder of this paper examines: 1) the desirable attributes of a data collection system; 2) the requirements of data transfer, storage, and retrieval; and 3) the desirable attributes of a data reduction, analysis and graphics system.

DESIRABLE ATTRIBUTES OF A DATA COLLECTION SYSTEM

1) Reliability of hardware. This factor is critical to continuous data collection. The hobbyist "do-it-yourself" machine is "self-insured" and has hidden personnel costs when the equipment breaks and requires repairs, or requires finding someone else to perform either remote or onsite repairs. A proprietary instrument with a service contract for 24-hour repair from trained service company personnel assures the minimal interruption of data collection and analysis and the continued productivity of laboratory staff.

2) Economy of scale. Duplicating microprocessor systems in the $2000-$5000 range, each with its own peripherals, may ultimately be costlier than purchasing one $8000-$10,000 machine with a larger disk drive, fewer keyboards, monitors, peripherals, power supplies, and tape units. Furthermore, address space of a controlling microcomputer may limit data manipulation: thus, if local data reduction is required in a microcomputer


[Figure 2 diagram not reproduced. Legible labels: PDP 11/73, NODE: RWW (end node); PDP 11/73, NODE: RWW2 (end node); MicroVAX II, NODE: MVAX (routing node); 80286 microcomputer, NODE: PC1; terminal server (16 ports) with printers and terminals; Winchester disk drives (70 MB and 20 MB); disk drives (684 MB total); 1.2 MB floppy disk drives; cassette tape units (95 MB); tape libraries.]

FIG. 2. Diagram of the hardware configuration utilized for data collection, storage and retrieval.

Do j = 1 to 24;
    /* make individual rat tables from a mastertable containing all rat data */
    Make Table ('RAT'.j) from table ('Mastertable') where column RAT = j;

    /* make a series of graphs of total responses across daily sessions */
    Make graph ('RAT'.j.'Calendar') from column DATE of table ('RAT'.j) vs column TOTAL RESPONSES;
    Display graph ('RAT'.j.'Calendar');

    /* make a table of summary statistics where columns are different doses */
    Allocate table 'Summary' 20 rows by 8 columns;
    Do k = 1 to 8;    /* 6 concentrations + air & control */
        MEASURE Column TOTAL RESPONSES of Table('RAT'.j) where column DOSE = k;
        if table exists('Summary') then
            {ADD Column to Table 'Summary' from Table 'Measures'}
        else
            {make Table 'Summary' from Table 'Measures'}
        SET ROW DOSE COL K to K    /* Enter Dose for Column */
    End;

    /* make a series of dose-effect graphs with SEMs for each rat */
    Make graph ("DoseEffect".j) from row DOSE of table 'summary' vs row MEAN;
    ADD ERROR BARS to CURVE 1 of GRAPH ("DoseEffect".j) from row STDERROR of Table 'summary';
    Plot graph ("DoseEffect".j)
End;

FIG. 3. Example of RS/1 Research Programming Language used to make individual animal data files, daily session data plotted over actual calendar days, summary statistics tables, and concentration-effect graphs with S.E.M.s for individual animals.
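For readers without access to RS/1, the tabular part of the workflow in Fig. 3 can be sketched in a general-purpose language. The Python below is an illustrative analogue only, not a translation checked against RS/1: the record layout (`rat`, `dose`, `total_responses`) and the sample values are hypothetical, and the graphing steps are omitted.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical master table: one record per animal per session.
master = [
    {"rat": 1, "dose": "air", "total_responses": 4000},
    {"rat": 1, "dose": "air", "total_responses": 4200},
    {"rat": 1, "dose": "1000", "total_responses": 3000},
    {"rat": 1, "dose": "1000", "total_responses": 3300},
    {"rat": 2, "dose": "air", "total_responses": 5000},
]

def split_by_rat(records):
    """Build one table per animal, as the outer Do loop does in Fig. 3."""
    tables = defaultdict(list)
    for rec in records:
        tables[rec["rat"]].append(rec)
    return dict(tables)

def summarize_by_dose(rat_table):
    """Return {dose: (n, mean, sem)}; SEM is None when n < 2."""
    by_dose = defaultdict(list)
    for rec in rat_table:
        by_dose[rec["dose"]].append(rec["total_responses"])
    summary = {}
    for dose, values in by_dose.items():
        n = len(values)
        sem = stdev(values) / sqrt(n) if n > 1 else None
        summary[dose] = (n, mean(values), sem)
    return summary

rat_tables = split_by_rat(master)
summary_rat1 = summarize_by_dose(rat_tables[1])
print(summary_rat1)
```

Keeping one table per animal mirrors the paper's emphasis on characterizing effects in individual animals before comparing across them.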


      1           2           3           4    5            6          7     8
Row   INTERVAL 1  INTERVAL 2  INTERVAL 3  ...  INTERVAL 36  DATE       SOLV  CONC
1     220         222         224         ...  53           11-MAR-87  0     0
2     213         203         173         ...  124          12-MAR-87  0     0
3     195         175         235         ...  36           13-MAR-87  0     AIR
4     220         223         227         ...  27           17-MAR-87  0     AIR
5     153         110         134         ...  199          18-MAR-87  0     0

      9           10         11           12      13          14
Row   TOTAL RESP  LOG10      LOG SUM      LAST30  LOG LAST30  LOG SUMLAST30
                             - LOG THURS                      - LOG THURS
1     4867        3.687261   0.143317     280     2.447158    -0.176091
2     3499        3.543944   0.000000     420     2.623249     0.000000
3     4167        3.619824   0.170808     36      1.556303    -0.663806
4     4057        3.608205   0.159190     52      1.716003    -0.504105
5     3592        3.555336   0.106321     233     2.367356     0.147248

FIG. 4. Example of a portion of an individual animal daily data file.
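The derived columns in Fig. 4 appear to be base-10 logarithms of the session total and of the last-30 count, each also expressed as a difference from a reference (Thursday) session. The Python sketch below reproduces the derived values in row 1 under that reading. Treating the 12-MAR-87 row as the reference is an inference from its zero entries, not something the figure states.

```python
from math import log10

# Totals and last-30 counts copied from rows 1 and 2 of Fig. 4.
sessions = {
    "11-MAR-87": {"total": 4867, "last30": 280},
    "12-MAR-87": {"total": 3499, "last30": 420},  # assumed reference session
}

def log_difference(value, reference):
    """log10(value) - log10(reference), i.e. the log of the ratio."""
    return log10(value) - log10(reference)

ref = sessions["12-MAR-87"]
row = sessions["11-MAR-87"]

log_total = log10(row["total"])
log_total_vs_ref = log_difference(row["total"], ref["total"])
log_last30_vs_ref = log_difference(row["last30"], ref["last30"])

print(f"{log_total:.6f} {log_total_vs_ref:.6f} {log_last30_vs_ref:.6f}")
```

A difference of logarithms is the logarithm of a ratio, so a value of zero means performance equal to the reference session and negative values mean a decrease.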

environment, data sorting or description may be awkward and/or time consuming as intermediate files or disk resident arrays are brought in and out of memory.

3) Independent starting and stopping of experiments. Single program path environments, such as those using BASIC or RT11, can require the investigator to anticipate and load every parameter at once at the start of the session. The investigator must anticipate required service for any condition of every experiment on each path through the program's cycle. This can cause severe problems in debugging as well as limit the experimenter's ability to modify parameters while an experiment is in progress.

4) Multiple user access. Several experimenters must be able to access the machine simultaneously for program preparation, experiment control, and data handling.

5) Easy exportation of data to other computing environments.

6) Local backup capabilities. How clumsy will it be a year from now to find your data spread across 400 diskettes? How important is loss of data to you? Backing up data regularly on removable media of some form is essential.

[Figure 5 plot not reproduced. Legible labels: vertical axis ticks from 0 to 6000; horizontal axis dates 27-OCT-87, 06-NOV-87, 16-NOV-87, 26-NOV-87; legend: Control, 3000 ppm, 4000 ppm.]

FIG. 5. Example of a portion of an individual animal's performance across calendar days.

To attain these attributes we chose a Digital Equipment Corporation (DEC) Micro PDP11/73 system with 4 megabytes (mb) of memory (2 mb will do), a 70-mb nonremovable disk, and a 95-mb tape unit for software distribution and backup. We use Coulbourn interface modules (Coulbourn Instruments, Inc.) and SKED-11 (State Systems, Inc.) software for experiment control and online monitoring of experiment status. Raw binary data files are formatted and converted to machine-independent ASCII data files using Fortran 77. This machine can also be used for further data reduction, although we no longer do so.
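Converting raw binary files to ASCII is what makes the data portable between machines. The authors did this in Fortran 77; the Python sketch below illustrates the idea only. The record layout (36 little-endian unsigned 16-bit counts per session) is a hypothetical stand-in, since the paper does not describe the binary format written by SKED-11.

```python
import struct

BINS_PER_SESSION = 36                             # hypothetical record length
RECORD = struct.Struct(f"<{BINS_PER_SESSION}H")   # hypothetical layout

def binary_to_ascii(raw: bytes) -> str:
    """Convert packed records to one space-separated text line per record."""
    if len(raw) % RECORD.size != 0:
        raise ValueError("input is not a whole number of records")
    lines = []
    for offset in range(0, len(raw), RECORD.size):
        counts = RECORD.unpack_from(raw, offset)
        lines.append(" ".join(str(c) for c in counts))
    return "\n".join(lines) + "\n"

# Round-trip demonstration with made-up counts.
sample_counts = list(range(200, 200 + BINS_PER_SESSION))
raw_bytes = RECORD.pack(*sample_counts)
ascii_text = binary_to_ascii(raw_bytes)
print(ascii_text.split()[:3])
```

Fixing the byte order explicitly in the format string is the step that makes the conversion independent of the machine that runs it.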

THE REQUIREMENTS OF DATA TRANSFER, STORAGE, AND RETRIEVAL

Data Transfer

To move data between machines requires one of the following: a) compatible media, so that disks or tapes can be carried between machines ("sneaker net"); b) asynchronous communications programs such as Kermit or xmodem to move files between machines; these are usually multistep procedures that require operator execution, and are limited to conventional baud rates (e.g., 9600 bps); or c) networks. Networks entail additional expense, but have major advantages: 1) speed (10 megabits/sec with ethernet); 2) direct compatibility with institution-wide computing systems; 3) given privilege, access to all machines on the network from one seat at one terminal; 4) multiple concurrent sessions, e.g., examining experiment control on one or several machines while doing data reduction and analysis or working on manuscripts or correspondence on another machine; 5) easy syntax for moving files (e.g., copy a to b); 6) capability for complete automation, e.g., copy on time of day.
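The practical weight of the speed difference is easy to estimate. The Python sketch below compares nominal transfer times for one month of data (about 15 mb, the usage estimate given under Data Storage) over a 9600-bps serial line and 10-megabit/sec ethernet. It assumes 1 mb = 10^6 bytes, 10 bits on the wire per byte for the asynchronous line (start and stop bits), and no protocol overhead on either link, so real transfers would be slower.

```python
def transfer_seconds(n_bytes, bits_per_second, bits_per_byte=8):
    """Nominal transfer time, ignoring protocol overhead and retries."""
    return n_bytes * bits_per_byte / bits_per_second

MONTHLY_BYTES = 15 * 10**6   # about 15 mb per month (from the text)

# Asynchronous serial line: assume 10 bits sent per data byte.
serial_s = transfer_seconds(MONTHLY_BYTES, 9600, bits_per_byte=10)
# Ethernet at its nominal 10 megabits/sec.
ethernet_s = transfer_seconds(MONTHLY_BYTES, 10 * 10**6)

print(f"serial:   {serial_s / 3600:.1f} h")
print(f"ethernet: {ethernet_s:.0f} s")
```

Even as a rough bound, the gap between hours and seconds is what makes unattended, scheduled copying over the network practical.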

For data transfer DECnet has been adopted; this is a proprietary protocol which is in wide use and has a very high degree of reliability and security.

Data Storage

For storage, a central MicroVAX provides 684 mb of disk capacity divided across three disk drives: one system disk and two data disks. One of the drives can stand in for either of the other two should it fail. Estimated use is about 15 mb per month in our environment.

A 95-mb cartridge tape drive is used for weekly backups (slow, but 35 cents/mb), and four sets of tapes are rotated to minimize potential data loss. When an experiment is completed, the raw binary data are archived on tape and then deleted from the disks. Backup tapes of the operating system and archived data are stored in a vault for physical security.
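A four-set weekly rotation is simple to automate. The Python sketch below picks the tape set for a given date and works out the media cost of one full cartridge from the quoted 35 cents/mb. The set labels and the rotation start date are assumptions made for illustration; the paper does not say how the sets were scheduled.

```python
from datetime import date, timedelta

TAPE_SETS = ("A", "B", "C", "D")     # hypothetical labels for the four sets
ROTATION_START = date(1987, 1, 5)    # hypothetical first backup week
CARTRIDGE_MB = 95
CENTS_PER_MB = 35

def tape_set_for(day: date) -> str:
    """Cycle through the sets, one per week, counting from ROTATION_START."""
    weeks_elapsed = (day - ROTATION_START).days // 7
    return TAPE_SETS[weeks_elapsed % len(TAPE_SETS)]

def cartridge_cost_dollars() -> float:
    """Media cost of one full cartridge at the quoted price per mb."""
    return CARTRIDGE_MB * CENTS_PER_MB / 100

four_weeks = [tape_set_for(ROTATION_START + timedelta(weeks=w)) for w in range(4)]
print(four_weeks, cartridge_cost_dollars())
```

With four sets in weekly rotation, the oldest set on hand is always about four weeks old, which bounds how far back a recovery can reach.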


0   Statistic            1 0.000000        2 air             3 178.000000
1   Count (N)            26.000000         16.000000         6.000000
2   Sum                  88102.000000      62813.000000      24816.000000
3   Mean                 3388.538462       3925.812500       4136.000000
4   SEM (s.e. of mean)   229.874067        233.032584        666.927282
5   Median               3392.500000       4027.000000       4564.000000
6   Variance             1373894.258462    868866.962500     2668752.000000
7   StDev (sd)           1172.132355       932.130336        1633.631537
8   Maximum              5703.000000       5503.000000       5690.000000
9   Minimum              760.000000        1986.000000       2176.000000
10  Range                4943.000000       3517.000000       3514.000000
11  Skewness             0.009963          -0.365674         -0.375960
12  Kurtosis             0.175232          0.068746          -2.462357
13  Mean+2SEM            3848.286596       4391.877668       5469.854565
14  Mean-2SEM            2928.790327       3459.747332       2802.145435

0   Statistic            4 300.000000      5 560.000000      6 1000.000000
1   Count (N)            6.000000          6.000000          6.000000
2   Sum                  33661.000000      25374.000000      18678.000000
3   Mean                 5610.166667       4229.000000       3113.000000
4   SEM (s.e. of mean)   320.840918        750.410110        420.425182
5   Median               5680.000000       4397.500000       2932.000000
6   Variance             617633.366667     3378692.000000    1060544.000000
7   StDev (sd)           785.896537        1838.121868       1029.827170
8   Maximum              6637.000000       6327.000000       4327.000000
9   Minimum              4666.000000       2088.000000       1645.000000
10  Range                1971.000000       4239.000000       2682.000000
11  Skewness             -0.069509         -0.172171         -0.056658
12  Kurtosis             -1.542608         -2.057119         -1.000018
13  Mean+2SEM            6251.848502       5729.820220       3953.850363
14  Mean-2SEM            4968.484831       2728.179780       2272.149637

0   Statistic            7 1780.000000     8 3000.000000
1   Count (N)            6.000000          2.000000
2   Sum                  7027.000000       1229.000000
3   Mean                 1171.166667       614.500000
4   SEM (s.e. of mean)   168.172217        300.500000
5   Median               1129.000000       614.500000
6   Variance             169691.366667     180600.500000
7   StDev (sd)           411.936120        424.971175
8   Maximum              1664.000000       915.000000
9   Minimum              557.000000        314.000000
10  Range                1107.000000       601.000000
11  Skewness             -0.218929
12  Kurtosis             -0.587222
13  Mean+2SEM            1507.511100       1215.500000
14  Mean-2SEM            834.822233        13.500000

FIG. 6. Example of a summary statistics table showing control (0 ppm), air and six concentrations of a test atmosphere.
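The rows of Fig. 6 are related in the usual way: the SEM is the standard deviation divided by the square root of N, and the last two rows are the mean plus and minus two SEMs. A short Python check using the published values from the control (0 ppm) column:

```python
from math import sqrt

# Values copied from the control (0 ppm) column of Fig. 6.
n = 26
mean = 3388.538462
std_dev = 1172.132355

sem = std_dev / sqrt(n)     # standard error of the mean
upper = mean + 2 * sem      # "Mean+2SEM" row
lower = mean - 2 * sem      # "Mean-2SEM" row

print(f"SEM={sem:.3f}  mean+2SEM={upper:.3f}  mean-2SEM={lower:.3f}")
```

The computed values agree with the SEM, Mean+2SEM and Mean-2SEM rows of the table to within rounding.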

Retrieval

By keeping large amounts of data on line, the necessity of maintaining extensive hardcopy catalogues and tape or disk libraries from which data must be physically retrieved is obviated. Because the storage capacity of 5.25-inch Winchester disk drives is becoming enormous, the economy of scale is dramatic; by purchasing large capacity drives for a MicroVAX, one can achieve almost an order of magnitude savings in cost per unit of storage.

As in most laboratories, we have experienced disk failures, and the existence of our backup tapes has permitted us to come back up painlessly after hardware replacement. As disks get larger, backup becomes more burdensome unless an incremental backup strategy or a larger capacity tape technology is adopted.

DESIRABLE ATTRIBUTES OF A DATA REDUCTION, ANALYSIS AND GRAPHICS SYSTEM

1) Ease of use of any software system is paramount, so that the naive user is encouraged to jump right in; extensive defaulting and native language syntax are very helpful to this end. The investigator should be able to ask a naive technician to make a graph and that person should have success on the first try.

2) The system should either be menu driven or should have a graphical user interface for the naive user.

3) The user should be able to address and retrieve data sets quickly and automatically using a syntax for generating data set names, e.g., by concatenating strings of letters and numbers under program control, so that data structures are addressable by animals, treatments, sex, or other attribute.

4) There should be macro or programming language capability for the automation of repeated sequences of commands.

5) The user should be able to edit all objects, i.e., data tables, graphs, barcharts, macros, and programs.

6) There should be device independence, i.e., any graphics terminal, laser printer, plotter, or dumb device should be supported.

7) The system should allow the user to work at a fast enough pace so that he/she doesn't feel put upon. Speed reflects not just central processing unit (CPU) speed but data throughput (hardware and software).

8) Color displays are very helpful for rapid examination of complex data sets; otherwise, unique line types and symbols must be used to extract the finding of interest.

9) Independence of data format restrictions is quite useful; for example, being able to plot data using the date or the time of day as the abscissa can be advantageous.

10) Efficient use of storage media is important. Space should not be allocated to hold nothing; garbage should be collected and data compressed.

11) The system and its software should allow access by multiple users to shared procedures and data thus reducing duplication of effort.

12) Security of data sets from inappropriate mixing, manipulation, or vandalism is imperative.

13) There should be extensive documentation, easy management and installation, online help, and a phone support hotline.

14) There should be enough statistical capability to perform routine tasks and some not so routine tasks, but not so much as to duplicate other major packages (e.g., SAS, BMDP).
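Attribute 3 is what the 'RAT'.j construction in Fig. 3 provides: names are assembled from parts so that a loop can address every data set. A minimal Python sketch of the same idea follows. The naming scheme (species, animal number and an optional suffix joined by dots) is hypothetical and is not RS/1's own convention.

```python
def dataset_name(species: str, animal: int, suffix: str = "") -> str:
    """Concatenate attributes into a data set name, e.g. 'RAT.7.Calendar'."""
    parts = [species.upper(), str(animal)]
    if suffix:
        parts.append(suffix)
    return ".".join(parts)

# Address every rat's calendar graph under program control.
calendar_graphs = [dataset_name("rat", j, "Calendar") for j in range(1, 25)]
print(calendar_graphs[0], calendar_graphs[-1])
```

Because the names are generated rather than typed, adding an animal or a treatment changes a loop bound, not a list of file names.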

RS/1, a product developed and marketed by Bolt, Beranek and Newman (BBN Software Products Corp.), together with its companion statistical advisor, RS/Explore, meets or exceeds all of the above requirements. This product was originally bought for the PDP11/73. It was a major advance for this laboratory, since it is much faster and easier to use than other programming and data reduction environments with which we have had experience. Figure 3 is an example of the programming language and some of the descriptive statistics capabilities of RS/1. Figures 4, 5, 6 and 7 show the tables and graphs produced by the program shown in Fig. 3.

When this software environment was no longer supported on this machine, we bought a MicroVAX to implement it on a stable computing platform. At about the same time the MicroVAX was purchased, DECnet became available. Given the reductions in hardware prices that have occurred in the past few years, the whole environment consisting of a MicroVAX, the Micro PDP11/73, interface hardware, and software can be purchased for about $40,000. It would not be possible with our limited manpower to undertake the effort to handle such large data sets without this environment.

[Figure 7 plot not reproduced. Legible labels: vertical axis ticks at 2000 and 4000; horizontal axis CONCENTRATION with categories air, 178, 300, 560, 1000, 1780, 3000.]

FIG. 7. Example of an individual animal concentration-effect graph (mean ± 2 s.e.m.) made from a table similar to that shown in Fig. 6.

We have a final word about flexibility and obsolescence. One of the great advantages of the early equipment used in behavioral pharmacology and toxicology was the flexibility and rapidity with which procedures could be changed; this ability to ask scientific questions quickly facilitated innovation and the fleshing out of the literature. Even then the hardware imposed limits on the questions asked; the limits now are frequently the effort and sophistication required to implement the experiment. This ability to do things quickly and simply is a major consideration today when we ask students to become simultaneously not only psychologists, pharmacologists, and toxicologists, but also computer programmers. The computing environment chosen directly influences the actual behavior of the scientist at the bench and the amount of effort necessary to ask interesting scientific questions.

There is an old saying that if you can buy a piece of equipment it is already obsolete. There is a lot of wisdom in this observation which reflects the product development cycle of the computer industry, a cycle largely driven by the joint actions of engineering advances and marketing strategies. The experimenter considering a system purchase must realize the possibility that the competencies developed in a given environment may eventually be useless if the cost of maintaining the system becomes too great or parts cease to be available. An examination of the history of computing systems across the last twenty years shows us a littered landscape of once useful systems that have fallen by the wayside as either too clumsy, too weak, or too expensive to maintain. The secret of success in this area is choosing systems that can be rapidly configured to ask the scientific questions at hand at a reasonable cost in both manpower and equipment. This choice is heavily influenced by the state of the computer industry as a whole. The researcher should be satisfied that the chosen computer "hotbox," today's dream machine, will have a product lifecycle long enough to comprise a suitably large proportion of his or her research career.

REFERENCES

1. Dews, P. Epistemology of screening for behavioral toxicity. In: Mitchell, C. L., ed. Nervous system toxicology. New York: Raven Press; 1982:229-236.

2. Wood, R. W.; Sette, W. F.; Weiss, B. Languages for psychologists. Am. Psychol. 30:230-238; 1975.