Data goodness
Mostly in black and white
By Dom




You must love your data!

• Lost data:
– Current imaging data in BRIC cost ~£5.1M, just for scanning costs! (2011)
– No research, no publications
– No jobs, no PhDs!
– Sad Dom

• Look after your data!
– It looks after you
– Happy Dom


Data Storage

• Home directories:
– ISIS home, U Home
» Not for large amounts of imaging data

• Projects directory:
– ISIS, V: Big stuff goes here

• If you require large amounts of space (e.g. > 50 GB):
– LET ME KNOW IN ADVANCE!
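To check whether a data set is over the 50 GB mark before copying it to the projects directory, something like the following could be used. It is a minimal sketch: the function names are made up for illustration, and 1 GB is assumed to mean 10**9 bytes, which the slide does not specify.

```python
import os
from pathlib import Path

THRESHOLD_GB = 50  # figure taken from the slide; adjust if the policy changes


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and not path.is_symlink():
                total += path.stat().st_size
    return total


def needs_advance_notice(size_bytes: int, threshold_gb: float = THRESHOLD_GB) -> bool:
    """True when the data set is larger than the threshold (assumes 1 GB = 10**9 bytes)."""
    return size_bytes > threshold_gb * 10**9
```

For example, `needs_advance_notice(directory_size_bytes(Path("my_study")))` reports whether the hypothetical folder `my_study` is big enough to warrant a heads-up.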


Server goodness

• Why is the server a good place to store data?

– Mirroring and parity: if some errors occur, data can be easily recovered
– Backups:
» Tape backups, daily, 1 month retention
» If you have funding, processed data can be mirrored off site
» Raw data is always mirrored off site (ECDF) by default

• Desktop PCs:
– Not reliable: no mirroring, no parity; if some errors occur, data is lost (often all of it)
– Network backups often fail
» Machines turned off, network busy
» Moving to a new system when I get time!
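Why parity makes recovery possible can be shown with a toy example. This is a generic single-parity (XOR) illustration only; the slides do not say which scheme the server actually uses, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) walks the blocks column by column; each column is a tuple of ints.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing block from the surviving blocks plus the parity block."""
    return xor_blocks(surviving + [parity])
```

The parity block is computed once with `xor_blocks(all_blocks)`. If any single block is later lost, XOR-ing the survivors with the parity block gives it back. A desktop disk with no parity and no mirror has nothing to rebuild from.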


Data love

• Curation: do this as you work!

• Plan your data use
– Use meaningful folder names
– Make 'README.txt' files with dates, names of students/employees involved, references to software, scripts and versions, purpose of experiment/processing
– Be tidy with your data: tidy up occasionally
» Friday afternoon: quick tidy up
» Big tidy up at end of experiment/project/phase/year

• BE CAREFUL, don’t rush

• Data, spreadsheets, databases
– Anonymisation
– *** Repatriation keys ***
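The 'README.txt' habit is easier to keep if a script writes the file. The sketch below is one possible layout, not a lab standard: the function name, field labels and argument names are all assumptions chosen to match the items listed on the slide.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def write_readme(folder: Path, purpose: str, people: list[str],
                 software: dict[str, str], when: Optional[date] = None) -> Path:
    """Write README.txt into folder; software maps a tool or script name to its version."""
    when = when or date.today()
    lines = [
        f"Date: {when.isoformat()}",
        f"People: {', '.join(people)}",
        f"Purpose: {purpose}",
        "Software and scripts:",
    ]
    lines += [f"  {name} {version}" for name, version in sorted(software.items())]
    readme = folder / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Calling it once when a processing folder is created, and again whenever the software versions change, keeps the record next to the data it describes.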


Code and Scripts

• Coding:

• Testing
– Make sure that the software you are using does exactly what you think it does!
» Check every step for every image!
– Do not use hard coded paths

• Use versioning software (ECDF)
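One common way to avoid hard coded paths is to take the data location from the command line, falling back to an environment variable. This is a sketch under assumptions: the `--data-root` option and the `PROJECT_DATA_ROOT` variable are hypothetical names, not something the slides prescribe.

```python
import argparse
import os
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Command-line parser with a single option for the data location."""
    parser = argparse.ArgumentParser(description="Process images under a data root.")
    parser.add_argument("--data-root", help="folder holding the input images")
    return parser


def resolve_data_root(cli_value: Optional[str] = None,
                      env_var: str = "PROJECT_DATA_ROOT") -> Path:
    """Prefer the command-line value, then the environment variable; never a literal path."""
    value = cli_value or os.environ.get(env_var)
    if not value:
        raise ValueError(f"pass --data-root or set {env_var}")
    return Path(value)
```

A script would then call `resolve_data_root(build_parser().parse_args().data_root)` and build every other path relative to the result, so the same code runs unchanged from a home directory, the projects directory or a colleague's machine.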


Safe data is Happy data!