PIPING HOT: Little Bins in big workflows
Alex Garnett
Digital Preservation & Data Curation
SFU Library
Thesis: I am a terrible programmer
• 20% of you are thinking “no kidding!”
• The other 80% of you are thinking “uh huh. Stupid false-modest shmuck.”
• Who needs impostor syndrome when you have a bash shell?
• For the record, this is the payoff from all those colonoscopy jokes. Yep.
But how does it apply to libraries?
[If MJ Suhonos is here this year, this is his cue to groan audibly]
LIBRARY PROBLEM #1: PDF/A
• ProQuest wants PDF/A submissions from now on
• “now on” apparently = the past five years’ backlog
• We have to convert five years of theses!
• This is now also being used at the UofA.
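The slide does not say which tool did the conversion. As a minimal sketch, assuming Ghostscript's `pdfwrite` device (an assumption, not the documented SFU workflow), a backlog conversion could be planned like this. The function names are hypothetical, and `PDFA_def.ps` is the definition file shipped with Ghostscript, which normally has to be edited to point at an ICC profile before the output will validate.

```python
from pathlib import Path


def pdfa_command(src, dst, pdfa_def="PDFA_def.ps"):
    """Build a Ghostscript argv that rewrites `src` as PDF/A-2 at `dst`.

    Assumption: `gs` is on PATH and `pdfa_def` is a local, edited copy of
    the PDFA_def.ps file distributed with Ghostscript.
    """
    return [
        "gs",
        "-dPDFA=2",                       # target PDF/A-2
        "-dBATCH", "-dNOPAUSE",           # run non-interactively
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=RGB",  # PDF/A needs device-independent colour
        "-dPDFACompatibilityPolicy=1",    # drop features PDF/A does not allow
        f"-sOutputFile={dst}",
        str(pdfa_def),
        str(src),
    ]


def plan_backlog(src_dir, out_dir):
    """One command per PDF found in `src_dir`, written under `out_dir`."""
    return [
        pdfa_command(pdf, Path(out_dir) / pdf.name)
        for pdf in sorted(Path(src_dir).glob("*.pdf"))
    ]
```

Each command list can be handed to `subprocess.run(cmd, check=True)`. It is worth running a PDF/A validator over the results afterwards rather than trusting the conversion blindly.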
LIBRARY PROBLEM #2: ARCHIVES PROBLEM: LIBRARY HARDER STARRING BRUCE WILLIS
CRAP, I USED UP THE WHOLE SLIDE ON THE TITLE
• Archives needed a GUI tool to be able to create restrictive FTP accounts for donors.
LIBRARY PROBLEM #3: PDF REDACTION (IT’S LIKE THE FIRST ONE BECAUSE NO ONE LIKED THE SEQUEL, DOES ANYONE WANT TO WATCH TEMPLE OF DOOM LATER, OH HELL I’VE DONE IT AGAIN)
• We learned we had some poorly redacted PDFs
• Blackout meant to obscure text; still selectable
• Solution:
– Detect offending pages with ghostscript…
• (this is the hard part; dumping PDF guts is appalling)
• … and then:
– Snip offending pages with pdftk
– Convert them to images with imagemagick
– OCR back into PDF (minus obscured text) with tesseract and fix up the dimensions with gs again
– Paste back in with pdftk.
– 5 lines, all free tools! Documentation & piping.
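The "and then" steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the original five lines: the detection step is left out entirely (the offending page number is taken as input), the file names, the 300 dpi density, and the US Letter dimensions are placeholders, and `convert` is the ImageMagick 6 command name (ImageMagick 7 calls it `magick`).

```python
import shlex


def splice_ranges(page, total):
    """pdftk ranges that swap page `page` of handle A for page 1 of handle B."""
    ranges = []
    if page > 1:
        ranges.append(f"A1-{page - 1}")
    ranges.append("B1")
    if page < total:
        ranges.append(f"A{page + 1}-end")
    return ranges


def redaction_commands(src, page, total, width_pt=612, height_pt=792,
                       out="fixed.pdf"):
    """Build the five commands that re-create one badly redacted page.

    Rasterising the page throws away the hidden text layer, so the OCR
    step only sees what is actually visible around the blackout.
    """
    stem = f"page{page}"
    return [
        # 1. Snip the offending page out of the original.
        ["pdftk", src, "cat", str(page), "output", f"{stem}.pdf"],
        # 2. Flatten it to an image.
        ["convert", "-density", "300", f"{stem}.pdf", f"{stem}.png"],
        # 3. OCR the image back into a searchable PDF ({stem}_ocr.pdf).
        ["tesseract", f"{stem}.png", f"{stem}_ocr", "pdf"],
        # 4. Scale the OCR output back to the original page size.
        ["gs", "-o", f"{stem}_fit.pdf", "-sDEVICE=pdfwrite",
         "-dFIXEDMEDIA", "-dPDFFitPage",
         f"-dDEVICEWIDTHPOINTS={width_pt}",
         f"-dDEVICEHEIGHTPOINTS={height_pt}",
         f"{stem}_ocr.pdf"],
        # 5. Paste the clean page back into place.
        ["pdftk", f"A={src}", f"B={stem}_fit.pdf", "cat",
         *splice_ranges(page, total), "output", out],
    ]


if __name__ == "__main__":
    for cmd in redaction_commands("thesis.pdf", page=5, total=10):
        print(shlex.join(cmd))
```

Building the commands as lists keeps the sketch testable without the tools installed; running each list through `subprocess.run(cmd, check=True)` in order would execute the pipeline.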
Takeaway
• If you find yourself doing a very bad job of learning PHP and feeling like you have something to prove: it doesn’t have to be this way
• There is a huge amount of useful space you can occupy as a barely-programmer if you’re comfortable using a terminal for problem solving (less so on Windows). Stack Overflow and Google are your friends.
• Open-source command line tools are really good these days! They are powerful, they are straightforward, and they are often cutting edge.
Surprise: Everybody gets a free colonoscopy after all!
• Thanks! [email protected] ; @axfelix