Hogarth* 1 - ISSCT Hogarth Computerisation of... · I D.M. HOGARTH AND J.C. SKINNER 48 1 The clone summaries have proved to be very useful when assessing the perfor- mance of a series

D.M. Hogarth* and J.C. Skinner**

Bureau of Sugar Experiment Stations, *Bundaberg and **Gordonvale, Australia

ABSTRACT

The data bases developed by BSES to store and retrieve plant breeding data are described. The type of data required for input and the reports generated are described.

Computerisation has greatly increased efficiency in the plant breeding program, and has made it possible to make more complex decisions. Problems encountered during the development of the data bases are discussed briefly.

INTRODUCTION

Although computers have been used for more than 20 years for statistical analysis of clone trials and research data, it is only in the past 10 to 15 years that they have been used extensively for the storage and retrieval of plant breeding data. Plant breeders create a vast quantity of data which are very amenable to computer storage and subsequent retrieval.

The development of computer techniques in plant breeding has been aided by the improvement in data base management packages and the more efficient key to disk data input methods which replaced paper tape and punch cards. The HSPA was probably the first sugarcane research organisation to use computers effectively, and their early system was described by Meyer2. This system has recently been completely revised, and is now much easier to operate (Wu, personal communication). Although many research stations have developed computer systems, the methods adopted have usually not been published. This paper discusses the data bases developed by BSES and their advantages.

DATA BASES

All data bases are maintained on an HP 1000 system situated in the BSES Head Office in Brisbane, some 1500 km from the breeding station at Meringa, near Cairns. Obviously, this distance has created some problems, but these should be overcome, to some extent, when communications are established between the computer in Brisbane and a personal computer at Meringa.

Four major data bases are maintained:

(a) Clone data file;

(b) Crossing register;

(c) Crossing chart;

(d) Clone index.

Keywords: Sugarcane, breeding, computers, data bases

The type of data stored and the reports generated form the basis of this paper.

Clone data file

The clone data file was designed to record the results of all clone yield trials and pathology trials. It also records information on the movement of all clones introduced from overseas and the exchange of clones between districts within Australia.

Clone yield trials

When a clone yield trial is analysed statistically, the computer has been programmed to create a file in which all relevant information on the trial is stored. This file is then used as an input file for the data base. Data stored are:

Trial code name; date of harvest; location of trial; names of standard clones; mean cane t/ha, c.c.s., and sugar t/ha for standard clones; for each clone, mean cane t/ha, c.c.s., sugar t/ha, and net merit grade (Skinner3). In addition, 't' tests comparing clone means with standard means are calculated for each character, and the probabilities of obtaining these 't' values are also stored.

Reports on individual clones or particular districts can be requested. In a report, the trial code, standard clone names, date of harvest, cane t/ha, c.c.s., sugar t/ha, and net merit grade are all printed for each trial of each clone. In addition, probabilities and average standard values are printed. Parentages are also reported.

After all data on a clone have been printed, two summaries of the clone performance are printed. In the first summary, means for cane t/ha, c.c.s., sugar t/ha, and net merit grade are compared with standard means for all plant crops, all first ratoon crops, all other ratoon crops, and over all trials. For net merit grade, probabilities from each trial are combined into a combined probability using the method of Fisher described by Steel and Torrie4. In the second summary, a modification of the gain-even-loss system used by HSPA is presented (Meyer2). In the BSES system (called the LEG system) there are five categories based on the net merit grade probabilities. The probability (p) ranges are :

p < -0.95               significantly inferior to standards.
-0.95 ≤ p ≤ -0.50       inferior to standards.
-0.50 < p < 0.50        'equal' to standards.
0.50 ≤ p ≤ 0.95         superior to standards.
p > 0.95                significantly superior to standards.
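The five LEG categories above can be expressed as a simple classification. The following is a minimal sketch, not the BSES program. It assumes that the stored probability is signed (negative when the clone falls below the standards) and that the boundary values -0.95, -0.50, 0.50 and 0.95 fall in the 'inferior' and 'superior' categories. The function names are hypothetical.

```python
from collections import Counter

LEG_CATEGORIES = (
    "significantly inferior",
    "inferior",
    "equal",
    "superior",
    "significantly superior",
)


def leg_category(p: float) -> str:
    """Map a signed net merit grade probability to a LEG category.

    Assumption: p lies in [-1, 1] and is negative when the clone
    performed below the standard clones.
    """
    if not -1.0 <= p <= 1.0:
        raise ValueError("signed probability must lie in [-1, 1]")
    if p < -0.95:
        return "significantly inferior"
    if p <= -0.50:
        return "inferior"
    if p < 0.50:
        return "equal"
    if p <= 0.95:
        return "superior"
    return "significantly superior"


def leg_scores(probabilities):
    """Count how many trials of a clone fall in each LEG category."""
    counts = Counter(leg_category(p) for p in probabilities)
    return {name: counts.get(name, 0) for name in LEG_CATEGORIES}
```

For a clone with five older ratoon trials, `leg_scores([0.97, 0.99, 0.96, 0.30, -0.60])` would report three trials in the 'significantly superior' category, which is the kind of tally printed in the second summary.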

An example of the summaries is given in Table I. Both summaries show that the clone concerned improved, relative to standard clones, in ratoon crops. For example, the LEG scores show that in three out of five older ratoon trials, the clone was significantly superior to the standard clones. The yield summary shows that this was largely due to much higher yields of cane, although the sugar content was also higher.
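The combined probability for net merit grade in the first summary relies on Fisher's method. A minimal sketch of the standard form of that method is given below, in which the statistic -2 Σ ln p is referred to a chi-square distribution with 2k degrees of freedom for k trials. How BSES handles the sign of the probabilities is not stated in this paper, so the sketch assumes ordinary one-sided probabilities in (0, 1]. The function name is hypothetical.

```python
import math


def fisher_combined_probability(p_values):
    """Combine independent one-sided probabilities by Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even
    degrees of freedom the upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, which is used here.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one probability is required")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("probabilities must lie in (0, 1]")

    half_stat = -sum(math.log(p) for p in p_values)  # X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total
```

With a single trial the combined probability equals the original probability, which is a convenient check on the calculation.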


The clone summaries have proved to be very useful when assessing the performance of a series of clones. As data accumulate, the clone data file also provides a very useful summary of the performance of standard clones relative to one another.

Pathology trials

BSES conducts statistically designed and analysed trials to test the resistance of clones to Fiji disease, leaf scald, red rot, and mosaic. Results for all these trials are stored in the clone data file. Information stored on trials includes: disease name; trial code number; number of replicates; correlation for standards.

The correlation referred to is that between per cent infection of standard clones and their known disease rating. For each clone, per cent infection (if applicable) and rating (ISSCT 0-9 scale) are stored.
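The correlation for standards can be illustrated with a short calculation. The sketch below assumes a Pearson product-moment correlation; the paper does not say which coefficient is used, and the function name and data are hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical standards: per cent infection observed in one trial and
# the known disease rating (0-9 scale) of each standard clone.
percent_infection = [2.0, 10.0, 25.0, 48.0, 70.0]
known_rating = [1, 3, 5, 7, 9]
r = pearson_correlation(percent_infection, known_rating)
```

A value of r close to 1 would indicate that the trial ranked the standard clones in line with their known ratings.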

When clones are first planted in a disease trial, their names are entered into the data file. This prevents clones that are in current trials from being replanted in new trials; previously, checking for this possibility was very time-consuming. To indicate the type of information produced in a typical report, an extract showing the disease ratings of Q124 is produced in Table II.

The pathology report has been useful, not only to pathologists but also to plant breeders who use it when making decisions on the future of clones. It is also an important source of information for the crossing chart data base because disease ratings play an important role in the assessment of parent clones.

Clone exchange

The names of all foreign clones requested by plant breeders are stored in the data base. Data stored include year of request from plant breeders; year requested from Head Office; year received in quarantine; germination; year released from quarantine; stations to which clone distributed; comments.

Reports generated enable plant breeders to monitor what has happened to clones in which they are interested. The reports are also useful when compiling a list of clones to import each year. All local clones being exchanged between districts within Australia are also entered on the data base.

Crossing register

The BSES plant breeding program is built on a proven cross system, which relies on selection rates from crosses at all stages of selection. The selection data used for making decisions on the status of crosses are stored in crossing registers, one for each of the four selection zones in Queensland. For each cross, the register records the germination, number of seedlings planted to the field, and number of selections at each stage of selection. From stage three onwards, the serial numbers of all selections are also recorded. From the crossing registers, selection rates for each stage of selection are calculated and all the data available on a cross for a ten-year period are considered when deciding whether a cross should be proven or not.

The system of recording data manually and then assessing the crosses was very time-consuming and prone to error. This was an obvious field for improving efficiency by using computers. In the computer system that has been developed, ten years of data are stored. Thus, when a new series of data is about to be input, the oldest series is deleted. The data base consists of ten data sets, the most important of which are:

PVARS: a master data set containing basic information on parent clones.

CROSS: a detail data set containing basic information on each cross for each year on each station.

SETT30, YOT, FI: detail data sets containing basic information on each cross in selection stages 3, 4, and 5 respectively.

TOTALS: a detail data set containing total selections for each stage of selection for each year.
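The ten-year window described above behaves like a fixed-length queue: adding the newest series pushes out the oldest. The following is a minimal sketch of that behaviour only. The class name and record layout are hypothetical and do not reflect the structure of the HP 1000 data base.

```python
from collections import deque

MAX_SERIES = 10  # ten years of data are kept


class CrossingRegister:
    """Keeps only the ten most recent series of crossing data."""

    def __init__(self):
        # A deque with maxlen drops the oldest entry automatically.
        self._series = deque(maxlen=MAX_SERIES)

    def add_series(self, year, records):
        """Add a new series; return the year that was deleted, if any."""
        dropped = None
        if len(self._series) == MAX_SERIES:
            dropped = self._series[0][0]
        self._series.append((year, records))
        return dropped

    def years(self):
        """Years of the series currently held, oldest first."""
        return [year for year, _ in self._series]
```

Adding an eleventh series to a full register returns the year of the series that was removed, so the deletion can be logged or checked.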

There are additional files outside the data base system. Two files have lists of female and male parents, and other files have lists of serial numbers of clones selected at various stages from each of the ten current series. The parentage of each clone in these latter files is also recorded in coded form.

The data for input into the system are provided from Annual Seedling Reports prepared at each Experiment Station with a selection program. These reports are almost identical to those used with the manually-recorded system. However, punching of data onto disk files is simple, rapid, and accurate compared to entering data into crossing registers. After data have been punched and checked, they are quickly entered onto the data base, and crossing registers can be produced.

The program that produces the crossing register also calculates cross ratios for each year that a cross is planted. The cross ratio is the ratio of the selection rate for the cross compared to the selection rate for the whole population. It is calculated for the latest stage of selection reached by the cross. For example, if some members of the population have reached stage four, the cross ratio is calculated on selection rates to stage four only; selection rates to stage three are ignored. If the cross has progeny in earlier stages of selection due to different selection pathways or repeating of clones at some stages (Skinner3), an estimate is made of the number of clones expected to be selected to stage four. The total number of clones expected to be selected is used in the calculation of the cross ratio.
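The cross ratio defined above can be written out as a short calculation. This is a sketch under stated assumptions: the selection rate is taken as selections divided by seedlings planted, and the estimate of clones still expected from earlier stages is supplied as an input because the paper does not give the method used to obtain it. All names are hypothetical.

```python
def selection_rate(selected, planted):
    """Proportion of planted seedlings selected to the latest stage."""
    if planted <= 0:
        raise ValueError("number planted must be positive")
    return selected / planted


def cross_ratio(cross_selected, cross_planted,
                pop_selected, pop_planted,
                expected_from_earlier_stages=0.0):
    """Selection rate of a cross relative to the whole population.

    expected_from_earlier_stages is the estimated number of clones,
    still current at earlier stages, that are expected to reach the
    latest stage (assumed to be estimated elsewhere).
    """
    pop_rate = selection_rate(pop_selected, pop_planted)
    if pop_rate == 0:
        raise ValueError("population selection rate is zero")
    total_expected = cross_selected + expected_from_earlier_stages
    return selection_rate(total_expected, cross_planted) / pop_rate
```

For example, a cross with 6 selections from 200 seedlings, in a population with 150 selections from 10 000 seedlings, has a cross ratio of 2, meaning it was selected at twice the population rate.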

The latest stage of selection is used because some crosses may have high selection rates into stage three, for example, because of high sugar contents, but few of the clones are selected for stage four because of poor cane yield. Such a cross is of little value, and it would be misleading to use stage three selection rates in addition to stage four selection rates. Similar arguments apply to populations that have reached stage five. If crosses cannot produce clones which are worth selecting for stage 5 trials, they are not of much use to the program.

If a cross has been planted several times in the last ten years, a weighted estimate of all cross ratios is calculated. Cross ratios are weighted according to the number of clones planted in the original seedling population, and more weight is given to the most recent series. If large numbers are planted from a cross in the most recent few years and selection rates are low, the cross will be penalised severely. This has been most useful for discarding crosses susceptible to rust disease, which did not enter Australia until 1978. Susceptible crosses could be removed relatively rapidly even if they had performed well prior to the rust epidemic. Nevertheless, some of these crosses persisted for a few years longer than they should have, a point which indicates that the proven cross system is slower to adapt to a sudden change of selection priorities than is desirable. In such cases, it is probably more expedient for the plant breeder to over-rule the decision printed by the computer than to try to change the rules that are usually effective.
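The weighting of cross ratios over several plantings can be sketched as follows. The paper states only that weights depend on the number of seedlings planted and favour recent series. The geometric recency factor and its default value of 0.9 are assumptions made for illustration, as are the function and parameter names.

```python
def weighted_cross_ratio(plantings, latest_year, recency_decay=0.9):
    """Weighted mean cross ratio over the ten-year window.

    plantings: iterable of (year, seedlings_planted, cross_ratio).
    Each weight is seedlings_planted * recency_decay ** age, where age
    is the number of years before latest_year (hypothetical scheme).
    """
    numerator = 0.0
    denominator = 0.0
    for year, seedlings, ratio in plantings:
        age = latest_year - year
        if not 0 <= age < 10:
            continue  # outside the ten years of stored data
        weight = seedlings * recency_decay ** age
        numerator += weight * ratio
        denominator += weight
    if denominator == 0:
        raise ValueError("no plantings within the ten-year window")
    return numerator / denominator
```

With this scheme, a cross that scored 2.0 from 100 seedlings in 1980 but only 0.5 from 400 seedlings in 1984 receives a weighted ratio of about 0.71, so a large recent planting with poor selection rates quickly outweighs an earlier success.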

An example of the printout produced from the crossing register data base is given in Table III. The figures shown in parentheses are the numbers of clones still current in a particular stage. If a number is followed by a '+', it means that there are clones current at an earlier stage of selection. Using the weighted mean cross ratios, proven cross status is allotted to crosses as follows:

Cross ratio     Cross status
0 - 0.8         Discard
0.9 - 1.5       Single repeat proven cross
1.6 - 2.5       Double repeat proven cross
2.6 - 3.5       Triple repeat proven cross
> 3.5           Quadruple repeat proven cross
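The allocation of proven cross status is a simple lookup on the weighted mean cross ratio. The sketch below assumes that the ratio is rounded to one decimal place before the table is applied, since the printed classes leave gaps between, for example, 0.8 and 0.9, and that any ratio above 3.5 is a quadruple repeat. The function name is hypothetical.

```python
def proven_cross_status(weighted_ratio):
    """Return the cross status for a weighted mean cross ratio."""
    if weighted_ratio < 0:
        raise ValueError("cross ratio cannot be negative")
    ratio = round(weighted_ratio, 1)  # assumed rounding to one decimal
    if ratio <= 0.8:
        return "Discard"
    if ratio <= 1.5:
        return "Single repeat proven cross"
    if ratio <= 2.5:
        return "Double repeat proven cross"
    if ratio <= 3.5:
        return "Triple repeat proven cross"
    return "Quadruple repeat proven cross"
```

A cross with a weighted ratio of 1.7 would be printed as a double repeat proven cross, while one that has fallen to 0.7 would be discarded.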

If seed is available, proven crosses are planted each year until the cross ratio is reduced to less than 0.9. The number of seedlings planted per cross varies according to the cross status, with the greatest number being planted from quadruple repeat crosses. Surprisingly few crosses are planted more than three times, which suggests that the population mean is continually improving, or that very few crosses are capable of producing clones that will be selected for stage five (farm introductions in Table III). Selection errors may also be an important factor, because the cross ratio has a fairly low broad-sense heritability. A genetically inferior cross may have high selection rates in the initial planting because of a large positive environmental effect, or a large positive cross x year interaction. Such a cross is likely to be discarded after one or two more plantings.

Crossing chart

The crossing chart data base was developed to assist with the problem of choosing the experimental crosses with the most potential for producing commercial clones. Before computerisation, compilation of a list of proposed crosses required staff to check the following records: parentages, to avoid inbreeding; proven cross list, to check if proven; crossing register, to check if made in previous years; crossing index, to check if already made in current year; stored seed list, to check if seed held in cold storage.

Experimental crosses were often chosen by intuition, based on the plant breeders' intimate knowledge of the performance of particular parents in previous years. Compilation of the crossing list was very time-consuming and required the attention of experienced plant breeders. This list is prepared on a daily basis during the crossing season, when staff are fully occupied. With the computerised system, the task is much less costly in terms of time, and specialist plant breeders can leave most of the work to experienced technicians.

The data base is relatively simple, as it consists basically of only two data sets:


CLONE - a master data set containing information on each parent clone:

(a) Parentage

(b) Free-flowering or not

(c) Disease ratings (0-9 scale) for leaf scald, Fiji disease, rust, yellow spot, and mosaic

(d) Number of crosses tried but not proven (that is, failed crosses) on each of the four Experiment Stations

(e) Net merit grades on each of the Experiment Stations

(f) Sugar content relative to a standard clone for each Experiment Station

(g) Total proven cross status for each Experiment Station

(h) Breeding codes for each Experiment Station (0-9 scale where 9 is best and 0 is worst).

CROSS - a detail data set containing information on each cross:

(a) Cross status for each Experiment Station
    = 0 for experimental cross
    = 1 for failed cross
    = 2 for proven cross

(b) Proven cross status for each Experiment Station
    = 0 if not proven
    = 1 if single repeat
    = 2 if double repeat
    = 4 if triple repeat
    = 6 if quadruple repeat

(c) Cross in field for each Experiment Station
    = 0 not in field
    = 1 in field

(d) Number of germination failures

(e) No. of bags of seed in store

(f) Cross made in current season for each Experiment Station
    = 0 not made
    = 1 factorial polycross
    = 2 biparental cross

In the data set clone, the total proven cross status for a parent clone is the sum of the proven cross status of all proven crosses including that parent. The breeding code is derived from a function that takes into account net merit grade, disease ratings, number of failed crosses, and sum of proven cross status both for the particular station and for all other stations. This function provides a value called a breeding estimate which is then subjected to a square root transformation, standardised, and expressed on a 0-9 scale. The emphasis given to each component in the function is currently under review, using results from cross evaluation trials which include


station. Parent clones that have been used in fewer than five crosses are indicated on the printout with a 'U' and some preference is given to such parents. Often 'U' parents will have relatively low breeding codes due to lack of information, so they are usually crossed with well-performed parents with high breeding codes.
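The total proven cross status, the scaling of the breeding estimate, and the 'U' flag described for the crossing chart data base can be sketched together. The paper states that the breeding estimate is square-root transformed, standardised, and expressed on a 0-9 scale, but it does not give the function that produces the estimate or the mapping onto the scale. The linear mapping `4.5 + 1.5 * z`, clipped to 0-9, is therefore an assumption, as are all names used here.

```python
import math
import statistics


def total_proven_cross_status(parent, crosses):
    """Sum the proven cross status of every cross involving parent.

    crosses: iterable of (female, male, proven_cross_status).
    """
    return sum(status for female, male, status in crosses
               if parent in (female, male))


def is_untested(number_of_crosses):
    """A parent used in fewer than five crosses is flagged 'U'."""
    return number_of_crosses < 5


def breeding_codes(estimates):
    """Convert non-negative breeding estimates to 0-9 breeding codes.

    Steps: square-root transformation, standardisation to z-scores,
    then a hypothetical linear mapping clipped to the 0-9 scale.
    """
    roots = {clone: math.sqrt(value) for clone, value in estimates.items()}
    values = list(roots.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    codes = {}
    for clone, root in roots.items():
        z = 0.0 if sd == 0 else (root - mean) / sd
        codes[clone] = max(0, min(9, round(4.5 + 1.5 * z)))
    return codes
```

Because the codes depend on the mean and spread of the whole parent collection, they would need to be recalculated whenever the collection changes substantially.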

An example of a crossing chart is shown in Table IV. This table is only a small extract from a crossing chart, using parents that demonstrate some of the symbols used. The first experimental crosses chosen would be '9' crosses involving a 'U' parent, such as the crosses with H56-752. The parents shown in Table IV are above average; most charts have over 50 per cent of the crosses discarded and relatively few crosses have predicted values of '9'.


TABLE IV. Extract of a crossing chart for the Northern Station

NORTHERN 30/05/85

                  73C487  CP44-101  H49-104  M147-44  66N2008  77N378  Q121
                  9       6         6        6        9        4       9     U

58A515    3       $       -         -        -        -        -       6
63B47     7       8.      6.        7        $        %        6       9.
Co740     6       7.      -         6.       6        9.       5       8.
CP57-526  8       8.      $         7        -        9.       6       9.
H56-752   9U      9.      6.        7        7        -        7       9.
63N165    5U      7.      -         (        5        9.       5       7
Q120      8       7       6         7        %        -        )       )

NOTES:

1. Males are shown at the top of the chart and females along the side.

2. Breeding codes of parents are shown below the male parents and beside the female parents.

3. U indicates the parent has not been used in five previous crosses.

4. Symbols in chart:

(a) $ = seed in store, e.g. 58A515 x 73C487

(b) % = proven cross, e.g. 63B47 x 66N2008

(c) ( = progeny would be disease susceptible, e.g. 63N165 x H49-104

(d) ) = progeny would be inbred, e.g. Q120 x Q121

(e) 9. = predicted value of cross (here 9); the '.' indicates that the cross has a predicted value > 7 on at least two other stations, e.g. H56-752 x 73C487.


Several other printouts are produced from the crossing chart data base. These are:

Breeding clones file. In this listing, parentage, disease ratings, net merit grades, breeding codes, sums of proven cross status, and numbers of failed crosses are printed for each clone in the parent collection. Where relevant, the information for each Experiment Station is printed.

Proven cross list. This is a list of all proven crosses, showing for which station(s) the crosses are proven, the status of the crosses for those stations, and the predicted values for the stations on which the crosses are not proven. This printout also calculates how many bags of seed are required from each cross, which makes it possible to decide how many times particular crosses should be made during a crossing season.

Index of stored seed and crosses made. At the end of each crossing season, this listing shows the name of each cross for which seed is held in store as well as the names of crosses made during the current year. This listing produces the cross status for each cross, i.e. the predicted value or proven cross status for each Experiment Station.

List of 'dot' crosses. This is a listing of all crosses with a '.' plus crosses with high predicted values on two stations. This listing is only used if the crossing charts cannot be produced for some reason. It is a relatively difficult listing to use because of its bulk, but it does enable an experimental cross list to be compiled in an emergency.

Clone index

All breeding stations have large collections of clones, either in breeding blocks or selection blocks. The collections are dynamic with many clones being added or discarded each year. Frequently, a plant breeder wishes to locate a particular clone on the station. In addition, decisions have to be made on each clone each year, e.g. should it be planted in a breeding block, should it be heat-treated, should it be discarded? Both these problems are greatly simplified by the computerised clone index. The data base itself is simple, but the decision-making process is complex, and draws on information from the Clone Data File and the Crossing Chart data bases as well as information stored in the Clone Index.

The system comprises three data sets:

VARS - a master data set containing general information on each clone, including:

(b) Number of proven parents crossed with the clone

(c) Number of proven crosses with the clone

(d) Sum of proven cross status

(e) Net merit grade in Meringa farm trials

(f) Number of years planted in breeding block

(g) Latest year planted in breeding block

(h) Number of years that clone had tassels

FIELD - a master data set containing general information on each block:

(a) Type of block
    B = breeding
    P = propagation
    S = selection
    T = introduction
    W = wild cane

(b) Crop class
    0 = plant
    1 = first ratoon
    2 = second ratoon, etc.

VDATA - a detail data set containing specific information about each clone in each block:

(a) Size of plot

(b) Code - this code indicates what type of block is involved. For example, breeding blocks may be BP or BR depending on whether the block is plant or ratoon.

The major listing produced is a printout of all clones, showing the blocks in which they are planted, and the size of the plots planted. In addition, a decision on the future of each clone based on the performance of clones in yield trials and as parents is printed. For clones planted in breeding blocks, the printout indicates the number of rows that should be planted, based on the breeding value of the clone and on the average number of tassels produced per row in previous years. If the clone is an important parent but produces few tassels per row, the printout indicates that the clone should be planted on a substation where flowering is expected to be more profuse than at Meringa. A secondary listing is a printout of all clones that should be included in the breeding clones list.

An important decision to make is which clones should be discarded. Rules for discarding clones proved difficult to develop, but are now well established. By using a combination of estimated breeding value and number of failed crosses with a clone, it has been possible to discard a reasonable number of clones each year, without discarding potentially good parents. Parents with high breeding codes must have been tried in at least ten crosses before they are flagged for discard. The software for this data base has been the most difficult to develop, but the gain in efficiency and time saved in making decisions are proving worthwhile.
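The discard rule can be illustrated in outline. Only one element of the rule is given in the paper: parents with high breeding codes are not flagged until they have been tried in at least ten crosses. The thresholds `high_code`, `low_code`, and `max_failed` below are hypothetical values chosen for illustration, and the way the two criteria are combined is an assumption.

```python
def flag_for_discard(breeding_code, crosses_tried, failed_crosses,
                     high_code=7, low_code=3, max_failed=5):
    """Decide whether a parent clone should be flagged for discard.

    high_code, low_code, and max_failed are hypothetical thresholds.
    """
    # Parents with high breeding codes are protected until they have
    # been tried in at least ten crosses (rule stated in the paper).
    if breeding_code >= high_code and crosses_tried < 10:
        return False
    # Assumed combination: low breeding value or many failed crosses.
    return breeding_code <= low_code or failed_crosses >= max_failed
```

Under these assumed thresholds, a parent with a breeding code of 8 and six failures in six crosses would be retained, but would be flagged once it had failed in enough of twelve crosses.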

DISCUSSION

The data bases developed by BSES have resulted in significant gains in efficiency. They have released specialist plant breeders and technicians from much repetitive book work, making time available for more productive work. The computer systems are also more accurate, and it is possible to use more complex rules for making decisions.

Possibly the most important aspect of any computer system is the initial specification given to the programmer. While a considerable amount of thought went into writing detailed specifications, it was frequently found that unforeseen complications arose when data were first processed by computer. Therefore, it is most important that programs be dynamic and easily modified to cater for unexpected changes in specifications. All the BSES software was created 'in-house', so modifications were accomplished easily, but this can be an expensive item if software is developed by a commercial firm.

Computers are fast, accurate, and reliable; however, there is a consequent danger that results produced by a computer will automatically be accepted as correct. It is important to monitor the computer printouts in case unexpected changes have occurred. For example, when the crossing chart data base commenced in 1976, breeding codes were normally distributed. Since that time, the breeding clone population has increased substantially in size and changed in composition, and it was found in 1984 that breeding codes were no longer normally distributed. This has since been rectified.

Future development of computer systems may include a program to print a list of the optimum crosses to make with a given number of tassels per clone. If successful, this would make more time available to consider modifications to the computer list. It would be tempting to accept the computer's list as being optimum, but it is important that any computer program should facilitate rather than replace the plant breeder's control of the crossing program.

REFERENCES

1. Berding, N. and Skinner, J.C. (1980). Improvement of sugarcane fertility by modification of cross-pollination environment. Crop Sci. 20, 463-467.

2. Meyer, H.K., Heinz, D.J., Lawrence, E., Kimura, N. and Ladd, S.L. (1974). Computer processing of sugarcane yield, breeding and selection records. Proc. Int. Soc. Sugar Cane Technol. 15th Cong. 24-35.

3. Skinner, J.C. (1965). Grading varieties for selection. Proc. Int. Soc. Sugar Cane Technol. 12th Cong. 938-949.

4. Steel, R.G.D. and Torrie, J.H. (1960). Principles and Procedures of Statistics. McGraw-Hill Book Company.
