Shirley Crompton
Source: Rob Allan

Institutional Repository
Subject Repository
Data Producer Repository

Grid
• share resources
• solve bigger problems
• integrate communities
• secure setting
• integrate data

Repository
• single dataset
• traditional tools
• value added services
• various metadata
• various data storage

• search function
• ontologies
• metadata registry
• algorithm registry
• geo cross walk
• question bank
• classification mappings
• variable mapping

• harmonisation tools
• sub-setting tools
• modelling tools
• simulation tools
• dataset linking tools
• annotation tools
• methods capture tools
• traditional tools
• safe setting tools
• statistical disclosure tools

1. Recommend that the British Household Panel Survey (BHPS), the Census 1991 Samples of Anonymised Records (SARs) and the EDINA UKBorders Census boundary data for the 1991 SARs be the first priority for Grid-enabling.

2. Recommend that a Grid Shibboleth and/or Athens authentication system be put in place, together with a GIS that can use the boundary data.

3. Recommend that the datasets initially reside in an Oracle database, but that in the long term the data be pulled to the Grid from the existing dataset provider. Note: at present both the BHPS and SARs reside on Nesstar servers, where authentication and sub-setting systems already exist. The outputs from these servers are downloadable Zip files containing the data in SPSS, SAS, Stata, NSDStat or delimited text format.

4. Recommend that Grid projects that have previously used these datasets, such as GEMEDA, MoSeS and GEMS, be used as exemplars. Where possible, their data outputs, techniques and methodology should also be made available. Note: a further enhancement would be to let researchers modify the models and simulations themselves, so that they can experiment with the systems.

5. Recommend that priority for Grid-enabling be given to health-related datasets available in Nesstar, such as the Health Survey for England and the National Child Development Study. Note: the BHPS also contains health-related questions.

6. Recommend that Grid projects that have previously used this type of data, such as HYDRA and MoSeS, be used as exemplars, as above.

7. Recommend that datasets from other disciplines be made available on the Grid to social science researchers: for example, more sensitive medical data from the Medical Research Council, or environmental data on air pollution or global warming from the Natural Environment Research Council.

8. Recommend that Grid projects that have previously used this type of data, such as GeoVUE and ESG II, be used as exemplars, as above.

9. Recommend that the experience gained and difficulties encountered in the pilot projects be pooled, to ensure that the metadata describing these datasets is sufficient for ease of use and that the data are accompanied by additional systems to ease interoperability. These pooled experiences should also form the basis of procedure and best-practice guides for Grid-enabling datasets.

10. Recommend that long-running data series that are suitable for harmonisation and available via Nesstar be considered for Grid-enabling. These include the Quarterly Labour Force Survey, the General Household Survey, the British Social Attitudes Survey, the Workplace Employee Relations Survey, the ONS Omnibus Survey and the Millennium Cohort Study.

11. Recommend that Grid tools be put in place to facilitate the harmonisation of long-running data series. Tools for sub-setting, modelling, simulation and linking of datasets should also be available on the Grid. The methods employed should be captured for future use and, where applicable, added as metadata to the appropriate dataset.

12. Recommend that additional Grid tools for geographic mappings, metadata registries, controlled vocabularies, ontologies, question banks, classification schema and variable mappings also be considered.

13. Recommend that consideration be given to making traditional social science tools such as SPSS, SAS and Stata available on the Grid.

14. Recommend that the following aggregate datasets be Grid-enabled: the International Monetary Fund (IMF), World Bank and Organisation for Economic Co-operation and Development (OECD) macro databank series from the ESDS International service. Note: these datasets are available via Beyond 20/20.

15. Recommend that Grid projects that have previously used this type of data, such as SAMD, be used as exemplars, as above.

16. Recommend that the British Crime Survey and the ONS Neighbourhood Statistics also be considered.

17. Recommend that Grid projects that have previously used this type of data, such as the Offenders Personal and Area-based Social Exclusion project, be used as exemplars, as above.

18. Recommend that European datasets, such as the European Social Survey and the Eurobarometer series, which are also available via Nesstar, be considered for Grid-enabling.

19. Recommend investigation into Grid-enabling data that are not available via existing data centres, such as administrative, retail, consumer, video, CCTV and web usage data.

20. Recommend negotiations with ONS and ESDS to establish a Grid virtual organisation which could act as a safe setting for statistically sensitive data such as the Census Controlled Access Microdata Samples.

21. Recommend development of Grid software to determine whether the combining or sub-setting of datasets would lead to statistical disclosure of individuals.
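As a rough illustration of the kind of check such software might perform, the sketch below applies a simple minimum-frequency (k-anonymity style) rule: a subset or linked dataset is flagged if any combination of quasi-identifying variables describes fewer than a threshold number of respondents. This is an assumption for illustration only; practical statistical disclosure control uses many more tests, and the function names, the default threshold of 3 and the sample records are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence

# Hypothetical illustration of a minimum-frequency (k-anonymity style) check.
# It is not a complete statistical disclosure control method.


def small_cells(
    records: Iterable[Mapping[str, object]],
    quasi_identifiers: Sequence[str],
    threshold: int = 3,  # illustrative minimum cell size (an assumption)
) -> dict:
    """Return each quasi-identifier combination seen fewer than `threshold` times."""
    counts = Counter(
        tuple(record[name] for name in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < threshold}


def is_safe_to_release(
    records: Iterable[Mapping[str, object]],
    quasi_identifiers: Sequence[str],
    threshold: int = 3,
) -> bool:
    """True if every quasi-identifier combination covers at least `threshold` records."""
    return not small_cells(records, quasi_identifiers, threshold)


# Hypothetical records, e.g. the result of sub-setting or linking two datasets.
sample = [
    {"region": "North West", "age_band": "30-39", "occupation": "teacher"},
    {"region": "North West", "age_band": "30-39", "occupation": "nurse"},
    {"region": "North West", "age_band": "30-39", "occupation": "teacher"},
    {"region": "Wales", "age_band": "40-49", "occupation": "teacher"},
    {"region": "Wales", "age_band": "40-49", "occupation": "nurse"},
]

print(is_safe_to_release(sample, ["region", "age_band"], threshold=2))  # True
print(is_safe_to_release(sample, ["region", "age_band", "occupation"], threshold=2))  # False
```

Adding one more variable (here `occupation`, as might happen when two datasets are linked) splits the groups into smaller cells, which is how combining or sub-setting can create a disclosure risk that the coarser view did not have.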

This report concludes that Grid-enabling datasets is not in itself sufficient to stimulate researchers' uptake of Grid technologies, or of the new research methodologies that become possible when data are exposed to the computational power of the Grid. To encourage uptake, the report suggests that the metadata associated with Grid-enabled datasets must be sufficient to support both the combination of data and the new forms of research, and that systems must be in place to facilitate the processes involved, such as metadata registries, geo-cross walks, question banks, ontologies, classification schema and variable mappings. The report also concludes that exposing statistically sensitive datasets in a controlled safe setting, or providing systems for analysing the outputs of modifiable simulation models, would offer a unique opportunity for social science research and increase the uptake of the Grid.

• British Household Panel Survey, 1991-: 4,522
• Quarterly Labour Force Survey, 1992-: 17,386 (735)
• General Household Survey, 1971-: 5,343 (935)
• Family Expenditure Survey, 1961-2001
• Health Survey for England, 1991-
• British Social Attitudes Survey, 1983-
• British Cohort Study (BCS70), 1970-2005
• British Crime Survey, 1982-
• British Election Studies, 1969-
• Family Resources Survey, 1993-
• National Child Development Study, 1958-
• Workplace Employee Relations Survey, 1980-