

DEPARTMENT for ENVIRONMENT, FOOD and RURAL AFFAIRS CSG 15 (Rev. 6/02)

Research and Development

Final Project Report
(Not to be used for LINK projects)

Two hard copies of this form should be returned to:
Research Policy and International Division, Final Reports Unit
DEFRA, Area 301
Cromwell House, Dean Stanley Street, London, SW1P 3JH.

An electronic version should be e-mailed to [email protected]

Project title Biometrics website for statistics support of biological research     

DEFRA project code HH3809SX

Contractor organisation and location

HRI-Wellesbourne          

Total DEFRA project costs £ 68,955

Project start date 01/05/03 Project end date 31/03/04

Executive summary (maximum 2 sides A4)

Purpose of the project

In a new Joint Code of Practice for Research issued by BBSRC, Defra, FSA and NERC at http://www.Defra.gov.uk/science/Quality/default.asp, it is emphasised that appropriate statistical validation of the experimental plan and the procedures for the analysis of data must be addressed at the planning stage of any research project. The proper use of appropriate biometrics methodology is an essential component of quality assurance in all areas of biological research and the overall purpose of this project is to develop computer software for improving biometrics support for biological research.

The motivation for the project is the development of computer virtual advice systems to provide scientists and statisticians with interactive advice about the design of efficient and effective experiments.

Aims of the project

1. To research the potential of virtual biometrics advice resources for providing support for biological research using interactive web-based facilities.

2. To develop http://www.hri.ac.uk/ExperimentalDesigns/Website/hri.htm into a fully functional test site for developing web-based biometrics support methods.

The initial design website was developed by a summer student in 2001 and the main aims of the current one-year Defra funded project were to upgrade that website to provide an enhanced basic facility for the design and analysis of simple experiments and to assess future options for virtual biometrics support.

The website statistics software engine

The primary aim of the project was to couple the existing website to a suitable statistics package to provide the algorithms needed for a general design website. This was achieved by coupling the website to the major open-source statistical package R at http://cran.r-project.org/. The R statistical software now provides the underpinning statistical algorithms needed for effective design construction. The website was also coupled to the GenStat software package under a verbal agreement from http://www.nag.co.uk/stats/tt_soft.asp allowing GenStat to be used on an internet server to provide dummy analysis of variance for constructed designs. GenStat is well regarded by agricultural statisticians and researchers but license restrictions mean that GenStat cannot be freely distributed with the project software. Therefore it is important to use R for the essential underpinning design algorithms and to use GenStat only for optional functions such as exploring a dummy analysis of variance after a design has been completed.

The browser software

The current internet site at http://www.hri.ac.uk/ExperimentalDesigns/Website/hri.htm uses Microsoft Active Server Pages for the website server pages and Microsoft Visual Basic for the website scripting language. However, we have found numerous problems in using this technology for our scientific website and believe that further development using this technology would be extremely difficult. Instead, we believe that Java (http://java.sun.com/) multi-platform technology using Java Server Pages (JSP) is more appropriate and we recommend that the future development of the website should be Java based.

The website functionality

The current website contains a range of menu options for design construction and an extensive range of documentation describing the functions of the website and providing background information on the principles and practice of good experimental design. The menu design options comprise:

i) General Block and Covariance Designs allowing for the construction of any general block design and possibly including plot covariates to allow for known pre-treatment plot effects.

ii) Latin Squares designs allowing for direct construction of Latin squares. These squares can have factorial treatment structure or control treatments and there is a standard GenStat analysis for the constructed design.

iii) Incomplete Latin squares designs allowing for the direct construction of incomplete Latin squares obtained by omitting a single row from a complete Latin square.

iv) Trojan Squares designs allowing for the construction of all possible Trojan squares of sizes up to and including (9 x 9)/8, which should be adequate for all practical purposes.

v) Incomplete Trojan Squares designs allowing for the construction of all possible incomplete Trojan designs of sizes up to and including (9 x 8)/8, which should be adequate for all practical purposes.

vi) Split-Plot Designs allowing for the construction of blocked split-plot designs with any number of main plots within replicates and any number of sub-plots within main plots.
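
As an illustration of what "direct construction" of a design can mean for option (ii), the following is a minimal Python sketch of a randomised Latin square. It assumes the familiar cyclic starting square followed by random permutation of rows, columns and treatment labels. The report does not describe the website's own construction method, so the function names `latin_square` and `is_latin` and the approach itself are illustrative assumptions only.

```python
import random


def latin_square(treatments, rng):
    """Return a randomised Latin square as a list of rows (illustrative sketch)."""
    n = len(treatments)
    # Cyclic starting square: cell (i, j) holds treatment index (i + j) mod n.
    square = [[(i + j) % n for j in range(n)] for i in range(n)]
    # Permuting rows, columns and labels preserves the Latin property.
    rng.shuffle(square)
    cols = list(range(n))
    rng.shuffle(cols)
    labels = list(treatments)
    rng.shuffle(labels)
    return [[labels[row[c]] for c in cols] for row in square]


def is_latin(square):
    """Check that every treatment occurs exactly once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return len(symbols) == n and rows_ok and cols_ok


design = latin_square(["A", "B", "C", "D"], random.Random(1))
for row in design:
    print(" ".join(row))
```

Permuting a single cyclic square does not sample uniformly from all Latin squares of a given size, which is one reason a production tool might prefer a different construction.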

The Project User Group Report

A steering group of agricultural scientists and statisticians from a range of Institutes including IGER, ADAS, the Stockbridge Technology Centre, Nottingham University, Queen Mary University of London, HRI East Malling and HRI Wellesbourne was established to advise on the progress of the website project in 2003-2004. The group held a single meeting at HRI-Wellesbourne on the 4th Feb 2004 to review progress.

Following a demonstration of the development site, there was a round-table discussion and most attendees felt that the work had potential not only for practical use by scientists and statisticians but also for teaching and training. The relative merits of a website versus a downloadable application were discussed and a number of attendees thought that the simplicity and convenience of a website were very important for biological users. However, specialist statisticians thought that a downloadable version of the software would provide more flexibility of access than a simple website and it was recommended that a downloadable version of the software, in addition to an internet website, would be valuable.

Future development

The current project has created an operational website for the design of statistical experiments and provides advice and support for the construction of a range of standard types of experiments. In addition, we have provided facilities for the construction of efficient non-standard incomplete block and covariance designs using a simple browser interface. The consensus view of the user group was that the website software was potentially useful and provided a substantial opportunity for development of useful support software for research but that a downloadable version in addition to an internet website would also be valuable.

Our recommended future development is to convert the existing website development to a Java based website using JSP technology and to provide both an internet server based website and a downloadable version. The downloadable version could be run on local machines using a virtual server based on the freely available Java Virtual Machine technology and could be underpinned by the open-source statistical software package R. It appears feasible to bundle all the necessary software into a self-installing package on a CD or at a suitable software website such as the R site http://cran.r-project.org/. The bundled package would then install automatically on the user machine to provide a local server for the website on the user machine. The advantage of this approach would be that the same software could be used both to supply an internet server site and to supply a virtual server for local machines.

Provided that the future development can be based on open-source software using cross-platform technology, and provided that the development uses high-quality and reliable algorithms, there should be future opportunities to attract research funding and research effort from a range of agencies and organisations interested in ensuring high-quality research using designed experiments.

Scientific report (maximum 20 sides A4)

1) Introduction

1.1) Biometrics support for biological research

In a recently commissioned Defra report (Assuring the Quality of Defra Research, Risk Solutions, July 2002), the need to assure the statistical validity of research was emphasised. One conclusion of the report was that there was a requirement for formal statistical design of research and it was recommended that, when research proposals are reviewed, greater attention should be given to the statistical design of the experiments. This recommendation would ensure that Defra would have the necessary confidence in the results, or at least understand the limitations. However, the Risk Solutions report also noted concerns that research organisations lacked the skills and resources to implement a quality management system fully.

In a new Joint Code of Practice for Research issued by BBSRC, Defra, FSA and NERC at http://www.Defra.gov.uk/science/Quality/default.asp, it is emphasised that appropriate statistical validation of the experimental plan and the procedures for the analysis of data must be addressed at the planning stage of any research project. The proper use of appropriate biometrics methodology is an essential component of quality assurance in all areas of biological research and the overall purpose of the current project was to develop computer software for improving biometrics support for biological research.

Biological and environmental research organisations have traditionally maintained a strong biometrics capability. The need for proper statistical support for biological research continues to be an absolute requirement for good quality biological research but much of the traditional institute support for biometrics has now disappeared. It is therefore timely and relevant to consider new ways in which the demand for biometrics support for research in the biological sciences can be met.

One important future source of biometrics support is likely to be increased use of computer virtual advice systems and it can be assumed that expert virtual advice systems will eventually play an important role in disseminating biometrics advice. The specialist skills and experience needed for the design and analysis of research projects will increasingly be supplemented and supported by computer based systems that will provide advice at the planning stage of projects. Eventually, it is likely that these systems will provide full in-depth advice in areas such as the design and analysis of experiments and power studies for proposed research projects.

1.2) Statistical Design of Experiments

Good experimental design is essential for ensuring reliable and cost effective research. The principles of good design are well understood and have two key requirements. First, the design should estimate the required model or treatment effects and second, the design should estimate the reliability of the estimated model or treatment effects. In biological research, the special problem that complicates good design is the high natural variability of biological material and it is essential to use some form of replication to account for variability between individual experimental units. Furthermore, it is often necessary to arrange experimental units into blocks of units to provide precise comparisons between units within the same block. Properly designed biological experiments estimate not only the treatment effects themselves, but also provide estimates of the precision of the treatment effects so that their significance can be assessed relative to the natural background variability. Properly designed biological experiments can be regarded as the definitive scientific methodology for biological research.
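
The roles of replication and blocking described above can be made concrete with a small Python sketch of a randomised complete block layout, in which every treatment appears once in every block and the treatment order is randomised independently within each block. The function name `randomised_blocks`, the treatment labels and the seed are hypothetical choices made for illustration and are not taken from the project software.

```python
import random


def randomised_blocks(treatments, n_blocks, rng):
    """Return (block, plot, treatment) tuples for a randomised complete block design."""
    layout = []
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)  # independent randomisation within each block
        layout.extend((block, plot, t) for plot, t in enumerate(order, start=1))
    return layout


plan = randomised_blocks(["T1", "T2", "T3"], n_blocks=4, rng=random.Random(2004))
for block, plot, treatment in plan:
    print(f"block {block}  plot {plot}  {treatment}")
```

Each block here supplies one replicate of every treatment, so treatment comparisons are made within blocks and block-to-block variability does not inflate their error.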

1.3) Choice of a good statistical design

Although the principles of good statistical design are well understood, the practical application of good design can be difficult. First, the design must fit the appropriate treatment model using an appropriate choice of treatment design. Second, the design must fit and allow for natural variability using an appropriate choice of block design. The proper choices of block and treatment design, together with an appropriate level of replication, are the key components of good experimental design. However, the proper choice of these key components is often complicated by factors such as the magnitude of the background variability, the required level of precision and the likely magnitude of any assumed block effects. Good design requires that all these factors be considered simultaneously. Where the parameters or assumptions of a design are uncertain, it is often necessary to undertake a full design study and to choose the most appropriate design for the required purpose.

1.4) Motivation for the development of design and analysis of experiments software

The literature on experimental design is enormous but many papers are of theoretical interest only. Many of the remaining designs are so highly specialized that they are of little general use. However, there remains a substantial body of relevant statistical and biometrical literature for biological research. This literature covers the standard designs such as randomized blocks, Latin squares and split-plot designs and also the important classes of Trojan and incomplete Trojan designs recently discussed by Edmondson (1998) and Edmondson (2002). General incomplete block designs and general covariance designs are also important for practical research.

Although there is a large amount of literature information available on the design of experiments, the practical application of good design in experimental research is often limited by lack of specialist skills and training. The essential motivation for the development of good design of experiments software is to provide support for scientists in the design of efficient and effective experiments.

1.5) Currently available design and analysis of experiments software

i) Packages and free-standing software

The current availability of design software is extensive and there are many current software packages. See for example the listing of design software available at http://lib.stat.cmu.edu/. Also, many commercial packages such as GenStat (http://www.nag.co.uk/stats/tt_soft.asp), SAS (http://www.sas.com/) and Stat-Ease (http://www.statease.com/) contain design software components. However, none of these packages meet all of the criteria for a general design and analysis of experiments package, as listed above. Most of the freely available packages are either highly specialized for a particular class of design or offer only very simple general design facilities. We are not aware that any of the above mentioned design packages has been linked to any open-source statistical packages, nor are we aware that any of the above mentioned design packages can be accessed in a simple way using simple browser technology. The commercial design packages are linked to commercial software that requires expensive license agreements and is highly restrictive. None of the stand-alone design packages that we have examined cover the range of practical designs discussed in this project and none of the current design software packages that we have examined are integrated with any major free statistics software packages.

ii) Website software

The availability of scientific websites is developing rapidly and it is clear that the technology is now available to make interactive web based software a practical reality. There are a number of current websites that have some design of experiments component and one of the most general appears to be: http://www.webdoe.cc/main.php. Although this site has excellent interactive facilities, it appears to be directed mainly towards industrial response surface type designs and is not suitable for biological research. Nevertheless, the site is well documented and provides an excellent example of the potential for website development.

1.6) Website versus free-standing design software

There are two distinct approaches to providing virtual biometrics support, one based on web-site software and the other based on free-standing software. Essentially, web-based software runs on a remote server and is accessible via the internet through a browser interface whereas free-standing software runs directly on the user machine. Server based systems provide easy user access and can be maintained and updated centrally whereas free-standing applications require software to be installed and maintained on the user machine. However, server based systems have certain security restrictions on accessing data files and software on the user machine whereas free-standing applications have full access to user software and files. Also, the freedom to run very large iterative routines on servers must be restricted to ensure availability to other users whereas, on a local machine, the user is free to run any size of job.

The advantage of a server-based system is that biologists and occasional users can access the software immediately without the need to download or install software packages. Most users are already familiar with browser technology and web-sites provide the widest possible access for users. A further advantage of a server is that installation and maintenance are the responsibility of the provider and the user should have no need to download and install special software.

For the occasional user, the advantages of a web-site service are very substantial and the current project will concentrate on the development of a web-based service. However, as it is likely that regular users of design software will want a downloadable version, the project also considers future options for providing downloadable software.

2) Background to the 2003-2004 project

2.1) The aims of the project

1. To research the potential of virtual biometrics advice resources for providing support for biological research using interactive web-based facilities.

2. To develop http://www.hri.ac.uk/ExperimentalDesigns/Website/hri.htm into a fully functional test site for developing web-based biometrics support methods.

The initial design website was developed during the summer of 2001 by Yvonne Walker in partial completion of her MSc in Software Development at Coventry University. The main aim of the current one-year Defra funded project was to upgrade that website to provide an enhanced basic facility for the design and analysis of simple experiments and to assess future options for virtual biometrics support. The primary aim of the upgrade was to couple the website to a statistics software package to provide an interactive statistics software engine for the website. Coupling the website to a statistics software package allows the use of powerful statistical algorithms for constructing general designs and also allows the statistical properties of constructed designs to be explored interactively at the design stage.

The current work is intended as the first year of a rolling project to develop a fully functional virtual biometrics advice facility and the project report will provide a blueprint for the future development of this facility. The project will also provide an upgraded version of the existing website and this will provide an important component of any future virtual biometrics support site.

2.2) The project user group

At an early stage of the project, a user group of agricultural scientists was established to provide user feedback and to inform and modify the development of the project. The current group comprises:

Rob Jacobson, Stockbridge Technology Centre
Martin Broadley, Sutton Bonnington, University of Nottingham
Douglas Wilson, ADAS
Helen Ougham, IGER
Ruth Sanderson, IGER
David Pink, HRI-Wellesbourne
Graham King, HRI-Wellesbourne
Dave Simpson, East Malling Research
Colin Campbell, East Malling Research
Steve Gilmour, Queen Mary University of London

The group was established to consider the requirement for biometric software for the design and analysis of experiments and the group was scheduled to meet and report on the initial pilot project during Feb 2004.

2.3) The main project areas of work

i) The statistics engine

The website development undertaken by Yvonne Walker in 2001 provided a novel and useful way of presenting simple statistical designs using a server and a web browser interface. The flexible and interactive development of simple designs was of considerable interest but, as originally developed, the website provided little more than could be obtained from a good textbook. The accessibility and interactivity of website software provided a new and potentially important way of presenting statistical designs to users but the full potential of the software could be developed only if the browser was linked to a powerful statistical software package that could provide the necessary tools for constructing and testing advanced statistical designs. Therefore a major component of the current project was the investigation of a statistics software package suitable for linking to a browser interface.

ii) The browser software

Although the initial software development was undertaken by Yvonne Walker using Microsoft ASP and Visual Basic technology, other possible technologies are available and it was thought important to investigate the most appropriate software for a scientific website development. Computer technology and software languages are evolving at a very fast rate and it is important to make the right choices if a website project is to have long term viability. Ideally, the software development needs to be undertaken by the scientific community but this will not happen unless the software can be made easily available using a widely accepted technology. An appropriate choice of technology is therefore crucial for the future development of the project and we have investigated and reported a range of options.

iii) The website functionality

The main aim of the current project was to explore options for future development of the website browser software and to achieve this it has been necessary to develop a sufficient level of functionality to allow the future potential of the site to be visualised. Therefore the project work has aimed to develop a functional website that can be used not only for standard text-book designs but also for exploring and optimising new designs. We have provided a range of options for constructing standard designs and have also developed simple iterative search algorithms for block and covariance designs.
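
The report does not specify the iterative search algorithms that were developed, so the following Python sketch shows only one generic possibility: an interchange search for an incomplete block design. It assumes an equireplicate design in which no treatment is repeated within a block, and it uses the sum of squared pairwise concurrences as the criterion to minimise. The criterion, the starting design and the names `initial_design`, `concurrence_score` and `improve` are all illustrative assumptions rather than the project's method.

```python
import random
from collections import Counter
from itertools import combinations


def initial_design(v, k, b):
    """Cut the cyclic sequence 0..v-1, 0..v-1, ... into b blocks of size k (k <= v)."""
    seq = [i % v for i in range(b * k)]
    return [seq[i * k:(i + 1) * k] for i in range(b)]


def concurrence_score(blocks):
    """Sum of squared pairwise concurrences; smaller means more evenly spread pairs."""
    counts = Counter()
    for block in blocks:
        counts.update(combinations(sorted(block), 2))
    return sum(c * c for c in counts.values())


def improve(blocks, rng, n_iter=2000):
    """Swap plots between blocks, keeping any swap that does not worsen the score."""
    blocks = [list(block) for block in blocks]
    best = concurrence_score(blocks)
    for _ in range(n_iter):
        i, j = rng.sample(range(len(blocks)), 2)
        p = rng.randrange(len(blocks[i]))
        q = rng.randrange(len(blocks[j]))
        x, y = blocks[i][p], blocks[j][q]
        if x in blocks[j] or y in blocks[i]:
            continue  # the swap would repeat a treatment within a block
        blocks[i][p], blocks[j][q] = y, x
        score = concurrence_score(blocks)
        if score <= best:
            best = score
        else:
            blocks[i][p], blocks[j][q] = x, y  # undo a worsening swap
    return blocks, best


start = initial_design(v=7, k=3, b=7)
final, final_score = improve(start, random.Random(0))
print("start score:", concurrence_score(start))
print("final score:", final_score)
for block in final:
    print(sorted(block))
```

Because swaps only exchange treatments between blocks, the replication of every treatment is unchanged by the search. A simple search of this kind can stall in a local optimum, so in practice it would usually be restarted from several random starting designs.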

3) The Statistics Engine

i) Background

Good design software needs to be linked to a major software package to provide access to the statistical and mathematical algorithms that are needed to construct and test efficient designs. The browser website software must be linked directly to a major statistics software package that can interact with the browser interface and execute any necessary statistical operations automatically in response to browser commands. The most direct approach, and the one used here, is to use the website program language to construct appropriate statistical programs for an underpinning statistical package and then to execute the programs via an appropriate batch shell command. These operations can be carried out automatically and invisibly during the design construction stage. Batch operations are less interactive than true graphical user interfaces (GUIs) but are far simpler to implement and should provide sufficient interactivity for the highly structured and defined environment envisaged for this project.
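
The batch approach described above can be sketched in a few lines. The example below is written in Python rather than the ASP and Visual Basic used by the project, and it assumes that the `Rscript` command-line front end to R is installed and on the search path. The function names, the file name `design.R` and the generated R program are hypothetical. The sketch generates the text of a small R program, writes it to a temporary file and runs it in batch mode, skipping the run if `Rscript` cannot be found.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_r_script(treatments, n_blocks, seed):
    """Return R source that prints a randomised order of treatments for each block."""
    labels = ", ".join(f'"{t}"' for t in treatments)
    return "\n".join([
        f"set.seed({seed})",
        f"treatments <- c({labels})",
        f'for (b in 1:{n_blocks}) cat("Block", b, ":", sample(treatments), "\\n")',
        "",
    ])


def batch_command(script_path):
    """Shell command (as an argument list) that runs an R script non-interactively."""
    return ["Rscript", "--vanilla", str(script_path)]


def run_batch(script_text):
    """Run the generated script in batch mode; return stdout, or None if R is absent."""
    if shutil.which("Rscript") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "design.R"
        path.write_text(script_text)
        result = subprocess.run(batch_command(path), capture_output=True, text=True)
        return result.stdout


script = build_r_script(["A", "B", "C"], n_blocks=4, seed=1)
print(script)
output = run_batch(script)
print(output if output is not None else "Rscript not found; script generated but not run.")
```

Passing the command as an argument list rather than a single shell string avoids quoting problems when user-supplied values are placed in file names.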

The two necessary requirements for a suitable statistics software package are that the program commands can be executed automatically in batch mode via a suitable shell command and that the statistics package is freely available for internet access and for distribution to any server. In practice, this latter requirement implies the use either of open-source software or of software with a special license agreement that allows unrestricted use of the necessary design algorithms.

ii) GenStat

GenStat (http://www.nag.co.uk/stats/tt_soft.asp) is a well-established and well-regarded statistical software package with a strong tradition in agricultural research. Most of the UK Agricultural Research Institutes use GenStat as a front-line statistics package for the analysis of data from designed experiments. However, GenStat is licensed software and appears to be relatively little used outside the area of agricultural research. Although the project has an agreement from the NAG organization allowing GenStat to be used as a statistics engine for the internet website (personal communication from Roger Payne), it is impracticable to tie the development of the website to the GenStat package. To do so would be highly restrictive for further development of the website, as it would be impossible to distribute the software for use by anyone except GenStat licence holders. It is essential that the core statistics algorithms used to underpin the main design engine be open-source software that can be freely distributed for use on any machine.

iii) R

The R statistical program language (http://cran.r-project.org/) is a major software resource that is freely available and widely used throughout the world. The R software initiative is supported and developed by a very large user group and R will become an increasingly important statistics software resource for the future. The free availability and widespread distribution of R makes it an ideal resource for a design website engine. R can provide all the necessary statistical and mathematical algorithms required for design construction and R can be freely downloaded to any user machine. Software developed using R can be freely distributed to any user at any time for any purpose.

The R project is widely supported by the research community and there are already many contributed packages from research workers and research groups. Currently, rather few packages provide support for experimental design but a relatively recent contributed package (Feb 2004) called AlgDesign provides algorithms for constructing incomplete block designs and some classes of factorial and fractional factorial designs. Although we have not yet had time to evaluate AlgDesign fully, it is possible that some of the algorithms from this package may provide direct underpinning support for some of the functions needed for the design website. It seems likely that, in the future, further packages will be developed that may provide additional underpinning support for the website statistical engine.

iv) Dummy statistics analysis of variance

It is clear that, to ensure the future viability of the project and to ensure maximum distribution and uptake, the underpinning statistics software for the website must be based on the R language. These algorithms are invisible to the user, so the user has no need to understand or be aware of the language used for the underpinning design algorithms. However, once the design has been constructed, it can be useful to examine it using a dummy analysis of simulated data, or indeed a large number of dummy analyses if power studies are to be undertaken. Such dummy analyses can be useful for displaying the properties of a constructed design and for choosing a suitable design for data with known properties.

Obviously, the results of a simulated analysis must be displayed to a user in a conventional output file and it is beneficial to display the output using a statistical language that is familiar to potential users. As a major group of potential users for the design software will be agricultural research scientists familiar with GenStat, (see steering group membership) there is considerable potential benefit in providing visible statistical output using GenStat. As the underpinning statistical algorithms are completely separate from the dummy analysis algorithms, it is perfectly feasible to use R for the main statistics website engine while using GenStat for the visible front-end dummy analysis visible to the user.

Our current website development uses R for the underpinning design algorithms but uses GenStat for the dummy analysis of variance displayed to the user.

4) The browser software

i) Microsoft ASP

The current internet site at http://www.hri.ac.uk/ExperimentalDesigns/Website/hri.htm uses Microsoft Active Server Pages (ASP) for the website server pages and Microsoft Visual Basic for the website scripting language. Although ASP and Visual Basic have been a highly successful technology for the development of commercial websites, we have found numerous problems in using this technology for a scientific website.

Visual Basic is a very simple programming language that can be used efficiently only for relatively simple tasks. Also, Visual Basic allows data variables to change type according to circumstance and, while this certainly increases flexibility, it has proved a major source of unpredictable behaviour in our website. For example, unless data types are declared explicitly, values can sometimes be treated as strings instead of integers, giving totally unexpected behaviour.

Also, the ASP technology mixes HTML and Visual Basic script indiscriminately and can be very difficult to debug. We believe our current website is now working satisfactorily but we also believe that further development of the website using this technology would be extremely difficult.
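The kind of error described above can be illustrated with a minimal sketch. It is written in Python rather than Visual Basic, and the function names are hypothetical, but it shows the same class of problem: a value that arrives from a web form as text is compared as a string unless it is converted explicitly.

```python
# Form fields arrive from a browser as text, so a plot count typed as 9
# reaches the script as the string "9" unless it is converted explicitly.

def exceeds_limit_untyped(plots, limit):
    # Compares the raw values; two strings are compared character by character.
    return plots > limit


def exceeds_limit_typed(plots, limit):
    # Converts both values to integers before comparing.
    return int(plots) > int(limit)


print(exceeds_limit_untyped("9", "150"))  # string comparison: "9" sorts after "1"
print(exceeds_limit_typed("9", "150"))    # integer comparison
```

The untyped comparison reports that 9 plots exceed a limit of 150 because the character "9" sorts after the character "1", whereas the typed comparison gives the intended answer.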

A further problem with Microsoft ASP is that, for security reasons, the execution of shell commands from within ASP is extremely restricted. We have succeeded in executing the necessary batch shell commands for running batch jobs in R or GenStat using Microsoft Internet Information Server (IIS) with Windows 2000 or with Microsoft Server 2000 but we have been completely unsuccessful in executing the necessary shell command using the Windows XP or Microsoft Server 2003 operating systems. This means that we cannot run the current ASP website using the Windows XP or Microsoft Server 2003 operating systems.

ii) Microsoft ASP.NET

Microsoft has recently introduced a new technology called ASP.NET. This technology is intended to overcome some of the problems associated with ASP and we have explored the possible use of ASP.NET technology for the design website. VB.Net http://msdn.microsoft.com/vbasic/ is a new version of Visual Basic that is intended to provide an improved scripting language for website development. Although VB.Net does, indeed, overcome some of the problems associated with the old Visual Basic, it remains a very simple high-level language and appears unsuitable for serious scientific programming work.

ASP.NET appears to have overcome some of the security issues associated with ASP: we have found, for example, that execution of the shell batch command from ASP.NET is now feasible using IIS on Windows XP (we have not tested the Server 2003 operating system). Therefore the use of ASP.NET together with a more powerful language such as C++ or C# could be a feasible option for developing a design website. However, the ASP.NET family is licensed Microsoft software and Microsoft products are usually restricted for use only with Microsoft operating systems. Thus ASP.NET appears to be available for use only with IIS. This is a serious restriction as scientific applications need to be open-source and cross-platform to ensure maximum uptake by the scientific community. For this reason, further development of the website software using Microsoft-based technology is not recommended.

iii) JAVA technology

Java (http://java.sun.com/) is a multi-platform technology that can be used to support web server pages using Java Server Pages (JSP) (http://java.sun.com/products/jsp/). JSP provides a simplified, fast way to create dynamic web content and enables rapid development of web-based applications that are server- and platform-independent. JSP technology can be implemented using open-source servers available from the Apache foundation at http://jakarta.apache.org/tomcat/. We have tested an Apache Tomcat server on a Windows XP personal computer and have succeeded in launching the R batch shell command from a snippet of Java program code. We therefore anticipate that it is both feasible and practicable to convert the present website to a JSP site and to replace the VB scripting language with Java. The advantage of this conversion would be that the design website could then be run either as an internet-based service from any remote server or as an application on a local machine using the freely available and cross-platform Java Virtual Machine (see http://java.sun.com/).

Our recommendation for the future development of the website is therefore a Java-based site using the Java programming language together with JSP.
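The step of launching an R batch job from server-side code can be sketched as follows. The test described above used Java; this sketch uses Python purely for illustration, and the function names and file names are hypothetical. It assumes only that R is started in its standard non-interactive form, `R CMD BATCH`, and it passes the arguments as a list so that no shell interpretation of user-supplied text takes place.

```python
import shutil
import subprocess


def r_batch_command(script_path, output_path, r_executable="R"):
    # "R CMD BATCH [options] infile outfile" is R's non-interactive invocation.
    return [r_executable, "CMD", "BATCH", "--no-save", script_path, output_path]


def run_r_batch(script_path, output_path, r_executable="R"):
    # Returns the exit code of the R job, or None if R cannot be found.
    if shutil.which(r_executable) is None:
        return None
    command = r_batch_command(script_path, output_path, r_executable)
    completed = subprocess.run(command, check=False)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical file names for a design job written out by the website.
    print(r_batch_command("design.R", "design.Rout"))
```

A website page would write the design request to the script file, call `run_r_batch`, and then read the results back from the output file.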

5) The website functionality

i) Background Information

The current website contains a range of menu options for design construction and has an extensive range of documentation describing the functions of the website and providing background information on the principles and practice of good experimental design. Although the current site is limited in scope and provides only part of the functionality required from a complete design website, it does, nevertheless, provide a useful facility that can be used for practical design purposes by scientists and students. The current site also provides a valuable test-bed for development work on a larger and more comprehensive future design site.

The main information pages currently available at the site include:

An introduction to the website: http://biometrics.hri.ac.uk/ExperimentalDesigns/Website/designexperiments/introduction.htm

Background information on designed experiments: http://biometrics.hri.ac.uk/ExperimentalDesigns/Website/designexperiments/background.htm

Brief explanation of the design types available at this website: http://biometrics.hri.ac.uk/ExperimentalDesigns/Website/designexperiments/designtypes.htm

Brief overview of the analysis of data (to be updated): http://biometrics.hri.ac.uk/ExperimentalDesigns/Website/designexperiments/analysis.htm

Additional links and references (to be updated): http://biometrics.hri.ac.uk/ExperimentalDesigns/Website/designexperiments/additionalinfo.htm

ii) Menu options

The menu options currently available at the site for designed experiments are:

General Block and Covariance Designs

This option provides for the construction of any general block design and can include plot covariates to allow for known pre-treatment plot effects. After selecting this option, the user can select any number of treatments and any number of replicates subject to the restriction that the total plot number does not exceed 150 (this limit is to prevent the server being overwhelmed by large optimisation problems and may be increased once the various optimisation routines have been improved). The user has the option of choosing a control treatment with single or double replication, if required, and can have completely unstructured or crossed factorial treatment structures for the remaining treatments. All factors and factor levels can be named, if required. After selecting the required treatment structure, the user is presented with options that allow the physical shape of the design plan to be modified or that allow the treatments to be grouped into blocks that may contain any multiple number of replications provided that the multiple divides the replication number.
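The two restrictions described above (a maximum of 150 plots, and a number of replicates per block that divides the replication number) can be expressed as a simple check. This is a minimal sketch in Python with hypothetical names; it ignores the extra plots that a doubly replicated control treatment would add and is not the validation code used by the website.

```python
MAX_PLOTS = 150  # plot limit quoted in the text


def check_request(treatments, replicates, reps_per_block=1):
    """Return a list of problems; an empty list means the request is acceptable."""
    if treatments < 1 or replicates < 1 or reps_per_block < 1:
        return ["all counts must be positive"]
    problems = []
    if treatments * replicates > MAX_PLOTS:
        problems.append("total plot number exceeds 150")
    if replicates % reps_per_block != 0:
        problems.append("replicates per block must divide the replication number")
    return problems


print(check_request(16, 4))     # 64 plots in single-replicate blocks
print(check_request(40, 4))     # 160 plots
print(check_request(10, 4, 3))  # 3 does not divide 4
```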

After selecting the plan and replication number options, the user can then choose whether to add additional block or plot covariance constraints. New blocks or covariates can be added in arbitrary combinations and there are no restrictions on the numbers of additional constraints that can be included. There are special functions for adding patterned block constraints (http://biometrics.hri.ac.uk/ExperimentalDesigns/Website/DesignExperiments/rb_blockfun.asp#Discussion) and there are similar special functions for adding covariates for spatial trend effects in field trials.

After adding all necessary block and covariate information, the design information is passed to an R optimisation routine via a batch command. The R algorithm then seeks to maximise the determinant of the contrast information matrix via a simple pairwise treatment swapping algorithm. After a fixed number of swaps, the improved design is output to a design page together with the A-efficiency factor of the design. As the improved design may not have reached a local maximum, there is an option that allows the optimisation algorithm to be continued from its previous end point to test whether further iteration achieves any further improvement in efficiency. Finally, an option is available that allows a new search to be undertaken starting from some random design choice. The new search option allows the algorithm to search for a new and possibly improved local maximum.

The optimisation algorithm is intended to be a general purpose algorithm that can deal with any type of design and this means it is not possible to use simple updating equations for the determinant of the contrast information matrix. After each treatment swap, the full determinant must be recalculated ab initio, which results in a very slow algorithm, especially for large designs with a large number of block constraints. In the future, it will be important to develop ways to improve the speed of the search algorithm.
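The general idea of a pairwise swapping search with full recalculation of the determinant can be sketched as follows. This is an illustration in Python, not the R routine used by the website, and it makes several assumptions: the design has a single block factor and no covariates, the information matrix is taken as C = R - N K^-1 N', the criterion is the determinant of C + J/v (which equals the product of the non-zero eigenvalues of C for a connected design), and a swap is kept only if the criterion strictly increases. All function names are hypothetical.

```python
import random


def determinant(matrix):
    # Gaussian elimination with partial pivoting on a copy of the matrix.
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def d_criterion(blocks, v):
    # Builds C + J/v, where C = R - N K^-1 N' is the treatment information
    # matrix adjusted for blocks and J is the v x v matrix of ones.
    c = [[1.0 / v] * v for _ in range(v)]
    for block in blocks:
        k = len(block)
        counts = [0] * v
        for t in block:
            counts[t] += 1
        for i in range(v):
            c[i][i] += counts[i]
            for j in range(v):
                c[i][j] -= counts[i] * counts[j] / k
    return determinant(c)


def improve_by_swaps(blocks, v, n_swaps, rng):
    # Tries n_swaps random exchanges of plots between two different blocks.
    blocks = [b[:] for b in blocks]
    best = d_criterion(blocks, v)
    for _ in range(n_swaps):
        b1, b2 = rng.sample(range(len(blocks)), 2)
        p1 = rng.randrange(len(blocks[b1]))
        p2 = rng.randrange(len(blocks[b2]))
        if blocks[b1][p1] == blocks[b2][p2]:
            continue
        blocks[b1][p1], blocks[b2][p2] = blocks[b2][p2], blocks[b1][p1]
        value = d_criterion(blocks, v)  # full recalculation after every swap
        if value > best + 1e-9:
            best = value
        else:
            # Undo a swap that does not improve the criterion.
            blocks[b1][p1], blocks[b2][p2] = blocks[b2][p2], blocks[b1][p1]
    return blocks, best


if __name__ == "__main__":
    # Four treatments (0 to 3) in six blocks of two plots, badly arranged.
    start = [[0, 0], [1, 1], [2, 2], [3, 3], [0, 1], [2, 3]]
    final, value = improve_by_swaps(start, 4, 300, random.Random(1))
    print(final, value)
```

Calling `improve_by_swaps` again on the returned design corresponds to continuing the search from its previous end point, while calling it on a freshly shuffled design corresponds to the new search option. Each call to `d_criterion` rebuilds and factorises the whole matrix, which is the source of the slowness noted above.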

After completing the optimisation, the final design page displays a plan layout of the design and a factorial design key for experiments with factorial treatment structures. The final design page also has options for displaying a GenStat dummy analysis of variance of the finalised design and for downloading a file of design details for use in constructing a suitable analysis program for the finished design.

The file (Examples - please read) that is associated with the general block and covariance design option in the menu list contains a simple set of instructions for constructing a semi-Latin square design for four replicates of 16 treatments using this menu option. It is intended to add further examples to this file at a later date.

Latin Squares

The Latin square option is a special option for the direct construction of Latin squares. These squares can have factorial treatment structure and control treatments and there is a standard GenStat analysis for the constructed designs.

Incomplete Latin squares

The incomplete Latin square option is a special option for the direct construction of incomplete Latin squares obtained by omitting a single row from a complete Latin square. These squares can have factorial treatment structure and control treatments and there is a standard GenStat analysis for the constructed designs.
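A direct construction of this kind can be sketched in Python as follows. The sketch assumes a cyclic starting square whose rows, columns and symbols are then permuted at random, which preserves the Latin property but does not sample uniformly from all Latin squares; the incomplete square is obtained here by omitting the last row. The function names are hypothetical and the website's own construction may differ.

```python
import random


def random_latin_square(n, rng):
    # Cell (i, j) of the cyclic square holds (i + j) mod n; permuting rows,
    # columns and symbols keeps the Latin property while randomising the plan.
    rows = rng.sample(range(n), n)
    cols = rng.sample(range(n), n)
    symbols = rng.sample(range(n), n)
    return [[symbols[(r + c) % n] for c in cols] for r in rows]


def omit_last_row(square):
    # An incomplete Latin square: the complete square with one row removed.
    return [row[:] for row in square[:-1]]


def no_repeats_in_rows_or_columns(square):
    # True if no treatment occurs twice in any row or in any column.
    n_cols = len(square[0])
    rows_ok = all(len(set(row)) == len(row) for row in square)
    cols_ok = all(len({row[j] for row in square}) == len(square) for j in range(n_cols))
    return rows_ok and cols_ok


if __name__ == "__main__":
    square = random_latin_square(5, random.Random(0))
    for row in omit_last_row(square):
        print(row)
```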

Trojan Squares

Trojan squares are a special class of semi-Latin square designs discussed by Edmondson (1998, 2002).

http://biometrics.hri.ac.uk/ExperimentalDesigns/Website/designexperiments/designtypes.htm#trojansquare

Trojan squares have special properties of balance that make for a particularly efficient recovery of inter-block treatment information in addition to having optimum intra-block efficiency. Although the general block algorithm discussed above will construct semi-Latin squares on the assumption that row and column block effects are simply additive, the algorithm will not yet give optimum designs for semi-Latin squares with a rows-by-columns interaction stratum. For that reason, Trojan designs have been included explicitly as a special class of design that can be defined directly using the Trojan Squares option. The option includes all possible Trojan designs of size up to and including designs of size (9 x 9)/8 and should be adequate for all practical purposes. There is an option for displaying a stratified GenStat analysis of variance of the completed design and the design details can be downloaded, if required, for construction of a suitable analysis program.

Incomplete Trojan Squares

Incomplete Trojan squares are a special class of incomplete semi-Latin square designs discussed by Edmondson (1998, 2002) and can be obtained from complete Trojan squares by omitting an appropriate main row or main column. The option includes all possible incomplete Trojan designs of sizes up to and including designs of size (9 x 8)/8 and should be adequate for all practical purposes. The same options are available as for the complete Trojan designs.
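A Trojan square of size (n x n)/k can be obtained by superimposing k mutually orthogonal n x n Latin squares, each carrying its own set of n treatments. The following Python sketch covers only prime n, where the squares with entries (a*i + j) mod n for a = 1, ..., k are mutually orthogonal; prime-power sizes such as n = 9 need finite field arithmetic and are not handled. The function names are hypothetical, and omitting the last main row is an assumption about what counts as an appropriate row.

```python
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def trojan_square(n, k):
    # Returns an n x n array of blocks; each block holds k treatment labels,
    # one from each of k mutually orthogonal Latin squares of prime order n.
    if not is_prime(n):
        raise ValueError("this sketch only handles prime n")
    if not 2 <= k <= n - 1:
        raise ValueError("k must lie between 2 and n - 1")
    return [[[(a - 1) * n + (a * i + j) % n for a in range(1, k + 1)]
             for j in range(n)]
            for i in range(n)]


def omit_last_main_row(square):
    # An incomplete Trojan square: the complete square with one main row removed.
    return [[cell[:] for cell in row] for row in square[:-1]]


if __name__ == "__main__":
    for row in trojan_square(5, 3):
        print(row)
```

In the (5 x 5)/3 example each of the 15 treatments occurs once in every main row and once in every main column, and any two treatments from different squares occur together in exactly one block.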

Split-Plot Designs

The split-plot menu option allows the construction of a blocked split-plot design with any number of main plots within replicates and any number of sub-plots within main plots. There are options for entering a factorial treatment structure for main plot effects or for sub-plot effects and all factors and factor levels can be named, if required. The constructed design can be re-oriented to allow sub-plots to run either vertically or horizontally within main plots and there is an option for displaying a stratified GenStat analysis of variance of the completed design. The design details can be downloaded, if required, for construction of a suitable analysis program.
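The randomisation of a blocked split-plot design can be sketched as follows: main-plot treatments are randomised within each replicate and sub-plot treatments are randomised independently within each main plot. This is a minimal Python sketch with hypothetical names, not the website's own code; the `subplots_vertical` flag is an assumed way of representing the re-orientation option as a transposed layout.

```python
import random


def split_plot_design(n_reps, main_treatments, sub_treatments, rng):
    # Each replicate is a list of (main treatment, randomised sub-plot order).
    design = []
    for _ in range(n_reps):
        mains = rng.sample(list(main_treatments), len(main_treatments))
        replicate = [
            (m, rng.sample(list(sub_treatments), len(sub_treatments)))
            for m in mains
        ]
        design.append(replicate)
    return design


def replicate_layout(replicate, subplots_vertical=False):
    # By default each main plot is one row with its sub-plots side by side;
    # with subplots_vertical=True each main plot becomes one column instead.
    grid = [[f"{m}/{s}" for s in subs] for m, subs in replicate]
    if subplots_vertical:
        grid = [list(column) for column in zip(*grid)]
    return grid


if __name__ == "__main__":
    design = split_plot_design(3, ["A", "B"], ["x", "y", "z"], random.Random(2))
    for replicate in design:
        for line in replicate_layout(replicate):
            print(line)
        print()
```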

6) The Project User Group Report

The Biometrics Website Project User Group met at HRI-Wellesbourne on 4th Feb 2004 to review progress on the website project. The attendance list was:

Dr David Pink, R. N. Edmondson, Dr James Lynn, Richard Reader and Dr Julie Jones (HRI Wellesbourne), Dr Steven Gilmour (Queen Mary, University of London), Rob Jacobson (Stockbridge Technology Centre), Ruth Sanderson (IGER), Martin Broadley (Sutton Bonington), Doug Wilson (ADAS), Dave Simpson (HRI-EM), Colin Campbell (HRI-EM) and Matt Bell (Defra).

Following an introduction by Dr Pink, RNE gave a broad background explanation of the project and gave a demonstration of the current website developments from the current project work.

RNE then gave a demonstration of a development site using a local, non-internet server (IIS) on a Windows 2000 laptop machine, and demonstrated the potential of the current internet site for constructing arbitrary block and covariance designs with a range of treatment structures.

RNE outlined some of the main conclusions that have become apparent from the year 1 work.

1) R is the best choice of statistical program language for the underpinning statistics engine for the website:

i) R does not require a licence agreement

ii) R is very widely available and is closely related to the important commercial S-plus software

iii) R is likely to continue to be developed and to acquire new functionality at a very rapid rate

iv) R is likely to become the standard scientific and academic statistics software for the future

2) Website technology is potentially highly convenient for users, as only a minimum of knowledge is required to access a website on a remote server. However, there are security restrictions on uploading data and websites are less flexible than applications that execute on the user machine.

3) The current websites were developed using the Microsoft Active Server Pages technology. This technology is restricted to Microsoft servers and is aimed at commercial rather than scientific users. Java software is more suitable for cross-platform development and is likely to provide a more powerful and flexible environment for integrated software development than other software technologies.

Following the presentation, there was a round-table discussion and most attendees felt that the use of R for the underpinning algorithms was appropriate. However, as GenStat is commonly used in agricultural research in the UK, it would be valuable to have a GenStat program for each design in addition to an R program. (CC, RJ, RS)

Many of the attendees felt the development had potential value for teaching and training (SG, MB, DW).

DW suggested that large organisations could provide internal training and support in the use of the software but RJ pointed out that smaller organisations could not afford to do this. Ideally, the software would be fully self-documented.

CC thought that the ability to execute routine power studies easily and conveniently at the design stage of an experiment could be valuable for experimenters as this is often not done due to the computational complexity of the problem. Routine power studies executed at the design stage of an experiment could be valuable to researchers and funding agencies for deciding the amount of resources to invest in a research problem.

The issue of website security was discussed. All websites have some vulnerability to malicious attack. However, this site runs on a server outside the HRI firewall, so there is no risk to HRI users in general. It is not intended that the site will execute data analysis, so there is no risk that user data will be attacked or corrupted. RR explained that Java software may be less vulnerable than Microsoft software, so converting the site to Java technology might provide some additional protection.

The merits of websites versus downloadable applications were discussed and a number of attendees thought that the simplicity and convenience of a website were very important for biological users (CC, RJ). However, specialist statisticians (RNE, RS, SG) may want more flexibility of access than that provided by a simple website and a downloadable version of the software, in addition to an internet website, may be valuable. RR believes that it will be possible to produce website software using Java technology that would also be available for downloading to run on a local server on the user machine.

Some General Conclusions

i) R should underpin the future website development, although it will still be desirable to provide GenStat dummy analyses for GenStat users.

ii) A website development is favoured for user convenience but it will also be desirable to provide a downloadable version for local user machines if possible.

iii) The website development should be based on Java rather than Microsoft technology to give cross-platform flexibility, to access a more powerful technology for scientific applications and to provide improved security against malicious attack.

After the general discussion, DP asked whether the steering group thought the project was worthwhile and whether it should be continued if further funding was made available. The general consensus was that the project was worthwhile and had the potential to supply useful and important software for biometrics support and was worth developing further.

DW thought it would be important to have a well-defined development programme, possibly planned using a Gantt chart.

MB indicated that Defra was interested in developing the work further.

Defra had previously indicated that if the work was to be developed further they would want proper integration with user requirements and would want a steering or user group to be involved in the future development of the project. All the attendees were asked if they would be interested in continuing their involvement with the project if additional funding was obtained and all indicated that they would be interested to continue their involvement.

7) Current and Future development

The current project has created an operational website for the design of statistical experiments and provides advice and support for the construction of a range of standard types of experiments. In addition, we have provided facilities for constructing efficient non-standard incomplete block and covariance designs using a simple browser interface. We have demonstrated the software on a number of occasions including a demonstration to statisticians at the International Biometric Society British Region meeting at Reading University on 16th-18th Sep. 2003 (see Appendix B) and a demonstration for a group of potential users at our user group meeting at HRI on 4th Feb 2004.

The consensus view is that an internet website service for the design of experiments is potentially useful and should provide a substantial opportunity for development of useful support software for biometrics both for research and for teaching. However, it has become apparent during the current project that an internet website service alone is unlikely to meet the requirements of professional statisticians. There will also be a need for a downloadable version of the software that can be implemented and maintained on the user machine and that can be used without the security or access restrictions necessary on a remote server.

The recommended future development is to convert the existing website development to a Java based website using JSP technology and to provide both an internet server based website and a downloadable version that can be run on local machines using a virtual server based on Java Virtual Machine technology and underpinned by R statistical software. It appears feasible to bundle all the necessary software into a self-installing package on a CD or at a suitable software website such as the R site http://cran.r-project.org/ that could then be used to install all the required software automatically on the user machine.

The advantage of this approach would be that the same software could be used both to supply an internet server site and to supply a virtual server site for local machines.

Provided that the future development of the project is based on open-source software using cross-platform technology and provided the development uses high-quality and reliable algorithms, we believe there will be substantial opportunities to attract new research funding from agencies and organisations interested in maintaining and improving the quality of experimental research in all areas of applied biology.

A further 3-year program of work to develop the ideas outlined above has been submitted to Defra for the period 2004-2007.

8) References

EDMONDSON, R. N. (1998). Trojan and incomplete Trojan square designs for crop research. Journal of Agricultural Science, Cambridge, 131, 135-142.

EDMONDSON, R. N. (2002). Generalized incomplete Trojan designs. Biometrika, 89, 877-891.

Appendix A

Defra funding was used to provide a dedicated Viglen CX120 Pedestal Server for the project. The Server is currently located in the Server room at HRI-Wellesbourne where routine maintenance and back-up is the responsibility of the IT group.

Vig350B Dual Xeon Server Motherboard
• Supports two Intel Xeon CPUs with 512K L2 cache
• ServerWorks ServerSet GC-SL chipset
• 400/533MHz front side bus
• Supports up to 4GB ECC registered DDR memory
• On-Board dual channel Adaptec Ultra320 SCSI
• On-Board dual Intel 10/100/1000 Server NIC
• On-Board ATi VGA Adapter with 8MB of video memory
• Five full-length PCI slots: three 64-bit/33MHz and two 32-bit/33MHz
• Basic hardware management built in with LANDesk software

Viglen Vig811 Server Chassis with 450w PSU
• Rackmount or pedestal configuration
• 450W PFC Power Supply
• Support for four IDE or five SCSI hard disk drives
• Three high quality fans for system wide cooling
• Locking front door and chassis intrusion detection
• Dimensions: 220mm x 449mm x 622mm (WxHxD)

Upgrade to Single Intel Xeon 2.8GHz, 512k cache, 400MHz FSB
Upgrade to 1GB DDR ECC SDRAM
1 x 36.7GB Ultra320 SCSI-3 10,000RPM 1" HDD
1.44MB 3.5 Floppy Disk Drive
52x IDE CD-ROM Drive

Appendix B

Summary of IBS-BR03 Presentation

Construction and Analysis of Standard Designs

We will give a brief demonstration of the website and show how the site can be used to construct and analyze standard block and treatment designs. Designs can be copied directly into an Excel spreadsheet (or Word document) by using the copy and paste options of the browser and we will demonstrate how constructed designs can be manipulated using the Excel (or Word) editor to give a finished plan.

We will show how the site automatically generates a GenStat analysis of variance program for any constructed design and we will show how this program can be downloaded to the user machine and used to construct a dummy analysis of variance for the constructed design.

Construction and Analysis of Covariance Designs

Covariance information may be available at the design stage of an experiment and our site allows us to enter up to five covariates for any randomized block or Latin or incomplete Latin square design. The site automatically generates 500 independent randomizations of the design and these can be downloaded and analyzed using a GenStat program that will automatically calculate the covariance efficiency factors of the randomized designs. The procedure can be iterated as often as may be required until a satisfactory distribution has been established for the covariance efficiency factors of the randomized designs.

We believe that designs should be chosen that have a high efficiency factor for the assumed covariance model and we will demonstrate this methodology by constructing a Latin square design that is robust against low-order polynomial row-by-column interaction effects. We will then compare this methodology with that discussed in Edmondson (1993).

EDMONDSON, R. N. (1993). Systematic row and column designs balanced for low-order polynomial interactions between rows and columns. Journal of the Royal Statistical Society Series B, 55, 707-723.
