
How is data generated?


Page 1: How is data generated?

How is data generated?

How do researchers get data?

  • Recap: ‘What is data’
  • Methods of data collection, setting down, storage
  • Varies across disciplines
  • Sliding scale of accessibility and formality
  • How-to guide: example cheat sheet

Page 2: How is data generated?

Defining research data

Examples:
  • Numbers
  • Words/texts
  • Survey results
  • Interviews
  • Machine readings
  • Voice recordings
  • Voice transcripts
  • Images
  • Video
  • Sound
  • Artifacts
  • Specimens
  • Samples (medical, paleo, geo, …)
  • …

Collection method: → Quantitative → Qualitative (& quant) → Quant/Qual → Quant/Qual → Quantitative → Qualitative → Qualitative → ? → Qualitative → ? → ? → ?

Other terms: Observational? Mixed methods? Secondary? Case study? Cross-sectional? Longitudinal? ‘Big data’

Page 3: How is data generated?

Defining research data

Different data formats (should be documented!):
  • Raw
  • Transcribed
  • Converted (in format, by analysis)
  • Derived (e.g., confidentialised, de-sensitised)
  • Physical or digitised
  • Single, multiple, combined datasets

Same ‘research input’ may have multiple data outputs (e.g., ancient/historical scripture – image, digital image, transcription, interpretation)
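One way to document this is to record each data output alongside the output it was made from. The sketch below is only an illustration: the class names, field names, and the format assigned to each output are assumptions, not part of the original slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataOutput:
    """One documented form of the data (field names are illustrative)."""
    label: str                          # e.g. "digital image"
    form: str                           # e.g. "raw", "transcribed", "converted", "derived"
    medium: str                         # "physical" or "digitised"
    derived_from: Optional[str] = None  # label of the output this one was made from


@dataclass
class ResearchInput:
    """A single research input and every data output produced from it."""
    name: str
    outputs: List[DataOutput] = field(default_factory=list)

    def lineage(self, label: str) -> List[str]:
        """Follow derived_from links from an output back to the original."""
        by_label = {o.label: o for o in self.outputs}
        chain: List[str] = []
        current = by_label.get(label)
        while current is not None:
            chain.append(current.label)
            parent = current.derived_from
            current = by_label.get(parent) if parent else None
        return chain


# The slide's example; the form/medium values are assumed for illustration.
scripture = ResearchInput(
    "ancient/historical scripture",
    [
        DataOutput("image", "raw", "physical"),
        DataOutput("digital image", "converted", "digitised", derived_from="image"),
        DataOutput("transcription", "transcribed", "digitised", derived_from="digital image"),
        DataOutput("interpretation", "derived", "digitised", derived_from="transcription"),
    ],
)

print(scripture.lineage("interpretation"))
```

Calling `lineage("interpretation")` walks back through transcription and digital image to the original image, which is the kind of provenance the "should be documented!" note asks for.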

Page 4: How is data generated?

Common features of data

  • ‘Building blocks of information’
  • As information varies with discipline, so do the main kinds of data and methods of collection

http://www.dcc.ac.uk/sites/default/files/documents/publications/DCC_Howto_Discover_Requirements.pdf

  • E.g., Medical science: bloods + readings = disease presence
  • E.g., Anthropology: recorded interviews + observations = cultural practices

Page 5: How is data generated?

What is data, recap

  • Formats: can be physical/analog (e.g., paper) or digital; some fields, e.g., papyrology, can be both
  • Original or transcribed/described/representative
  • Methodology: cross-sectional vs. longitudinal, survey vs. administrative
  • Can be created by and for a range of people and services

Data questions?

Page 6: How is data generated?

How and where data is stored

  • Data storage vs. metadata
  • Continuums of data storage*

Formal (conventions around capture, vocab): stores/repositories
Informal (much variability): individual researcher

*Does not necessarily relate to accessibility

Screenshot from: ada.edu.au [Accessed 28/04/2014].

Page 7: How is data generated?

Who manages stored data?

  • Cultural institutions
  • Researchers
  • On institutional file storage networks or portable media
  • Captured by third parties: storage or social media service providers (e.g., Dropbox, Flickr, Figshare) or data repositories (e.g., Australian Data Archive (NCI, RDSI), VicNode (RDSI))
  • More examples of databases/repositories after lunch

Page 8: How is data generated?

Continuums of metadata storage

Formal: registries/commons
Informal: project website

Screenshot from: researchdata.ands.org.au [Accessed 28/04/2014].
Screenshot from: rsha.anu.edu.au [Accessed 28/04/2014].

Page 9: How is data generated?

Accessibility & quality of metadata and data don’t align

Public-Public (Open access)
  • Metadata is fully discoverable
  • Data are accessible and immediately downloadable
  • Preferred option for non-sensitive data from completed projects

Public-Private (Mediated open access)
  • Metadata is fully discoverable
  • Mediated access to data via data custodian
  • Good option for sensitive or confidential data

Private-Private (Closed access)
  • Metadata is not publicly available
  • Data not discoverable or available to third parties
  • Safest option for highly-sensitive data

http://libguides.library.curtin.edu.au/
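The three access models can be written down as a small lookup, which makes it easier to see that metadata visibility and data access are separate settings. This is a minimal sketch: the names `AccessModel`, `AccessRules`, and `suggest_access_model` are hypothetical, and the sensitivity labels are simply the words used on the slide.

```python
from dataclasses import dataclass
from enum import Enum


class AccessModel(Enum):
    OPEN = "Public-Public (open access)"
    MEDIATED = "Public-Private (mediated open access)"
    CLOSED = "Private-Private (closed access)"


@dataclass(frozen=True)
class AccessRules:
    metadata_discoverable: bool  # can third parties find the metadata record?
    data_access: str             # how third parties reach the data itself


RULES = {
    AccessModel.OPEN: AccessRules(True, "immediately downloadable"),
    AccessModel.MEDIATED: AccessRules(True, "mediated via data custodian"),
    AccessModel.CLOSED: AccessRules(False, "not available to third parties"),
}

# Sensitivity labels taken from the slide wording; the mapping is a sketch.
_RECOMMENDED = {
    "non-sensitive": AccessModel.OPEN,
    "sensitive": AccessModel.MEDIATED,
    "confidential": AccessModel.MEDIATED,
    "highly-sensitive": AccessModel.CLOSED,
}


def suggest_access_model(sensitivity: str) -> AccessModel:
    """Return the access model the slide recommends for a sensitivity label."""
    try:
        return _RECOMMENDED[sensitivity.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown sensitivity label: {sensitivity!r}") from None


model = suggest_access_model("confidential")
print(model.value, "->", RULES[model].data_access)
```

Note that the mediated model keeps metadata discoverable even though the data is not directly downloadable, so a dataset can be findable without being open.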

Page 10: How is data generated?


Page 11: How is data generated?


Accessing data

When this might be harder – Sharing and accessing sensitive data

Page 12: How is data generated?

Getting data

How do people find/discover data?

  • Movable feast / changing beast
  • No established methods like other scholarly outputs
  • No standard practice or vocab
  • Databases are non-exhaustive
  • Methods for searching and terms are driven by why people are looking (e.g., may start with direct contact from a project website) and by subject matter as well as methodology, accessibility, etc.

Page 13: How is data generated?

Finding data

1. Have you already identified the data, or are you exploring?
2. Search formal databases (public/private mix):
     • Research Data Australia (RDA), Australian Bureau of Statistics (ABS), Australian Data Archive (ADA), Figshare, Trove
     • data.gov.au, data.gov, data.gov.uk
     • http://databib.org/index.php
     • Think about search terms by data topics AND characteristics
3. Informal searching:
     • ‘Googling’
     • From publications
     • Peer networks
     • Cold calling

Why metadata counts!
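Searching by topic AND characteristics only works when catalogue metadata records both. The sketch below filters a tiny in-memory catalogue; every record, field name, and the `search` function itself are invented for illustration and do not reflect the search interface of any real repository.

```python
# Hypothetical catalogue records; titles and fields are made up for this sketch.
catalogue = [
    {
        "title": "Youth wellbeing survey (hypothetical)",
        "topics": ["mental health", "young people"],
        "method": "survey",
        "design": "cross-sectional",
        "access": "mediated",
    },
    {
        "title": "Adolescent health panel (hypothetical)",
        "topics": ["mental health", "physical activity"],
        "method": "survey",
        "design": "longitudinal",
        "access": "open",
    },
    {
        "title": "Regional rainfall readings (hypothetical)",
        "topics": ["climate"],
        "method": "machine readings",
        "design": "longitudinal",
        "access": "open",
    },
]


def search(records, topic, **characteristics):
    """Return titles that match the topic AND every given characteristic."""
    wanted = topic.lower()
    hits = []
    for record in records:
        topics = {t.lower() for t in record.get("topics", [])}
        if wanted not in topics:
            continue  # topic does not match
        if all(record.get(key) == value for key, value in characteristics.items()):
            hits.append(record["title"])
    return hits


print(search(catalogue, "mental health"))
print(search(catalogue, "mental health", design="longitudinal", access="open"))
```

A topic-only search returns both mental health datasets, while adding the design and access characteristics narrows the result to one. A record with missing or inconsistent metadata would silently drop out of the second search.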

Page 14: How is data generated?

Case study

A student approaches ANU library staff to access the Child and Adolescent Component (1998) of the National Survey of Mental Health and Wellbeing after reading a study that uses the data.

  • Google locates a researcher in WA…
  • …who says the data is in the Australian Data Archive…
  • …in Canberra (but you have to know to look there! It is not found via a Google search)
  • Link to request permission for a license (once registered with ADA)

Page 15: How is data generated?

Accessing data

So you’ve found an interesting dataset. How do you GET it?

  • Repository catalogue entries (derived from metadata) will typically provide info about how to obtain the data …or at least a contact…
  • Access varies depending on the access policy of the owner

Open Access (public/public): download from website
Conditional/Mediated (public/sort-of private): may need to pay fee and/or sign contract
No access: highly sensitive data (e.g., not de-identified medical records)

Why metadata counts!

Page 16: How is data generated?

Conditional or mediated access to data

May be held by:
  • Custodian of data
  • Login or approval required (e.g., ADA)
  • Licensed = reuse is (legally) conditional
      • AusGoal
      • Organisational licenses (or repository or data manager)

What is a license?

Page 17: How is data generated?

AusGoal licences

  • Australian Government Open Access and Licensing Framework
  • Ready-made licences with legal surety. Endorsed by CAUL.
  • Apply the least restrictive of the 6 levels of Creative Commons license
      • Least restrictive = CC BY (default licence for Aust Govt)
      • Most restrictive = CC BY-NC-ND
  • Restricted License (template): for data that contains personal or other confidential information
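The "apply the least restrictive licence" rule can be sketched as a lookup over the six Creative Commons licences, each described by its licence elements (BY, SA, NC, ND). Treating "fewest elements" as "least restrictive" is a simplifying assumption made for this sketch, and the function name `least_restrictive` is hypothetical; this is not an AusGoal tool and is no substitute for licensing advice.

```python
# The six Creative Commons licences and their licence elements.
CC_LICENCES = {
    "CC BY": {"BY"},
    "CC BY-SA": {"BY", "SA"},
    "CC BY-NC": {"BY", "NC"},
    "CC BY-ND": {"BY", "ND"},
    "CC BY-NC-SA": {"BY", "NC", "SA"},
    "CC BY-NC-ND": {"BY", "NC", "ND"},
}


def least_restrictive(required=()):
    """Pick the licence with the fewest elements that still includes every
    required element. Counting elements is an assumed proxy for restrictiveness."""
    needed = {"BY"} | set(required)
    candidates = [name for name, elements in CC_LICENCES.items() if needed <= elements]
    if not candidates:
        raise ValueError(f"no Creative Commons licence combines {sorted(needed)}")
    return min(candidates, key=lambda name: len(CC_LICENCES[name]))


print(least_restrictive())              # no extra conditions needed
print(least_restrictive({"NC"}))        # non-commercial reuse only
print(least_restrictive({"NC", "ND"}))  # non-commercial, no derivatives
```

With no extra conditions the sketch returns CC BY, matching the default named above. Requesting both SA and ND raises an error, because no Creative Commons licence combines those two elements.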


Page 18: How is data generated?

Sensitive data

Sensitive data is data that can be used to identify an individual or object and so place them at risk of discrimination, harm, or unwanted attention.

  • Invokes law (Privacy Act) and research ethics
  • Examples:
      • Survey data including names and criminal records
      • Hospital records
      • Location of endangered species

* sensitive by context

Page 19: How is data generated?

Can sensitive data be shared?

Typically, yes! But how? When?

  • When consent is explicitly given, and/or
  • When data is de-sensitised (‘de-identified’)
  • When data is modified
  • When an appropriate license is applied

Different issues when data is new vs. existing
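As a rough illustration of de-sensitising and modifying data, the sketch below drops direct identifiers from a survey record and replaces date of birth with age in whole years. The field names, the `DIRECT_IDENTIFIERS` list, and the `deidentify` function are assumptions made for this example, not a prescribed method.

```python
from datetime import date

# Assumed list of direct identifiers; a real project would define its own.
DIRECT_IDENTIFIERS = {"name", "address", "email"}


def deidentify(record, reference_date):
    """Return a copy without direct identifiers, with DOB reduced to age in years."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "dob"
    }
    dob = record.get("dob")
    if dob is not None:
        had_birthday = (reference_date.month, reference_date.day) >= (dob.month, dob.day)
        cleaned["age_years"] = reference_date.year - dob.year - (0 if had_birthday else 1)
    return cleaned


# Made-up respondent for illustration only.
respondent = {
    "name": "A. Example",
    "dob": date(1996, 7, 15),
    "school": "Example High",
    "internet_hours_per_week": 12,
}

print(deidentify(respondent, date(2014, 4, 28)))
```

Removing names and exact dates does not by itself guarantee that nobody can be identified: a remaining field such as school, combined with age, may still single out an individual, so the result still needs review before sharing.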


Page 20: How is data generated?

Stay tuned…

ANDS Guide to Sharing Sensitive Data Safely is on the way


Page 21: How is data generated?

Case Study

A group of researchers at University of Timbuktoo were interested in the links between mental health, activity, and internet use in young people. They surveyed 986 young people aged 16-20 years. The survey asked about their age (DOB), school, physical and mental health, eating habits, physical activity, computer/internet use, educational achievement, family structure, and parents’ cultural background. Paper surveys were used and then destroyed once the data had been entered into an electronic database. The researchers would like to make their data available to other researchers, particularly to form new collaborations and link with similar datasets on young people.

Page 22: How is data generated?

1. Is the data sensitive?

2. Barriers to sharing/publishing

3. What can be done now towards sharing?


Page 23: How is data generated?

Barriers/issues | Solutions | To look into

1.

2.

3.