Workflows for Digital Preservation and Curation Workshop Open Repositories 2012




Stacy Kowalczyk
Beth Plale
Kavitha Chandrasekar
Yiming Sun

Agenda
  • Introduction to Digital Curation
  • Workflow Systems Overview
  • Workflows for Digital Curation
  • Break
  • Implementing Workflows in Trident
      • Modifying a Workflow
      • Create a new Workflow
      • Creating Components
  • Wrap up

7/10/12

Acknowledgements

This workshop was made possible through a generous grant from Microsoft Research and by the Data to Insight Center of Indiana University's Pervasive Technology Institute.

Thanks to Quan Zhou, Ph.D. student and developer, for his help with developing components, workflows, and documentation.

Introduction to Digital Curation
  • Defining curation
  • Infrastructure for curation
  • Curating the files
  • Curating the object

Defining Curation

Digital curation involves maintaining, preserving and adding value to digital research data throughout its lifecycle. The active management of research data reduces threats to their long-term research value and mitigates the risk of digital obsolescence. Meanwhile, curated data in trusted digital repositories may be shared among the wider research community. As well as reducing duplication of effort in research data creation, curation enhances the long-term value of existing data by making it available for further high quality research.

Source: Digital Curation Centre

Curation Infrastructure
  • Repository
  • Public access
  • Policies
  • Processes
  • Institutional support

Curating the Files
  • Bitstream integrity
      • Fixity
      • Duplicate copies
  • File integrity
      • Format verification
      • Format validation
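
Format verification, as listed above, asks whether a file really is the format it claims to be. A minimal sketch in Python follows, assuming verification is done by comparing the file's leading "magic" bytes with the format implied by its extension. The signature table, the extension mapping, and the function names are illustrative assumptions, not part of the workshop materials, and this is far lighter than the full validation a tool such as JHOVE performs.

```python
from pathlib import Path

# Leading "magic" bytes for a few formats; an illustrative subset only.
SIGNATURES = {
    "tiff": (b"II*\x00", b"MM\x00*"),   # little-endian and big-endian TIFF
    "jpeg": (b"\xff\xd8\xff",),
    "pdf": (b"%PDF-",),
    "png": (b"\x89PNG\r\n\x1a\n",),
}

# Hypothetical mapping from file extension to the format it claims to be.
EXTENSIONS = {
    ".tif": "tiff", ".tiff": "tiff",
    ".jpg": "jpeg", ".jpeg": "jpeg",
    ".pdf": "pdf", ".png": "png",
}


def identify_format(path):
    """Return the format whose signature matches the file's first bytes, or None."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    for name, magics in SIGNATURES.items():
        if header.startswith(magics):
            return name
    return None


def verify_format(path):
    """Return True when the extension's claimed format matches the signature."""
    claimed = EXTENSIONS.get(Path(path).suffix.lower())
    return claimed is not None and identify_format(path) == claimed
```

A file named `fake.tif` that actually starts with `%PDF-` would be identified as `pdf` and fail verification.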

File Formats
  • Durability
  • Transparency
  • Documentation
  • Ubiquity
  • Renderability
  • Longevity

Format Choices
  • Master files for preservation
      • Highest quality
      • Highest fidelity
      • Lossless
  • Derivative files for active use and delivery
      • Smallest possible for user needs
      • Fast delivery
      • Easy to use format

Curating the Object
  • Context
  • Relationships between files
  • Technical metadata
  • Intellectual metadata
  • To Metadata
  • Implicit/explicit context

Curation Activities
  • Ongoing verification
      • File integrity
      • Object integrity
  • Metadata management
  • Management of obsolescence
      • Hardware
      • Software
      • Formats
  • Documentation

Workflow Systems
  • Purpose of workflow systems
  • Types of workflow systems
  • Trident Workflow Workbench

Why Workflow Systems
  • Repetitive and mundane activities simplified
  • Facilitates and enforces best practices
  • Enables efficient scheduling
  • Machinery for coordinating the execution of services and linking together resources
  • Facilitates outreach to researchers for direct deposit and automatic curation
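
The coordinating role described above can be illustrated with a very small runner that executes steps in a fixed, repeatable order and passes a shared context between them. This is a sketch in Python under stated assumptions, not Trident's API; the step functions and the `steps_run` key are hypothetical placeholders for real components.

```python
def run_workflow(steps, context):
    """Run each step in order; every step takes and returns the context dict."""
    completed = []
    for step in steps:
        context = step(context)
        completed.append(step.__name__)
    # Record which steps ran so the execution can be reviewed afterwards.
    context["steps_run"] = completed
    return context


# Hypothetical steps; real components would do the actual curation work.
def fixity_check(context):
    context["fixity_ok"] = True
    return context


def create_metadata(context):
    context["metadata"] = {"title": context["name"]}
    return context


result = run_workflow([fixity_check, create_metadata], {"name": "slide-001.tif"})
```

Because the step list is data, the same sequence can be rerun unchanged on every object, which is where the repeatability comes from.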


Types of Workflow Systems
  • Ptolemy II
  • Taverna

Trident
  • Open source project
  • Based on Windows Workflow Foundation classes
  • Supported by Microsoft Research and academic researchers
  • Integrates with myExperiment
  • Well accepted in the research community
      • Well over 100 peer-reviewed and white papers were discovered from one scholarly aggregation service


Trident Components
  • Trident Management Studio
  • Trident Workflow Composer
  • Trident Workflow Application
  • Microsoft SQL Server
  • Trident Silverlight client for web execution of workflows
  • Microsoft Visual Studio
      • C# development environment

Design

[Architecture diagram; components shown:]
  • Visual Workflow Composer
  • Trident Registry
  • Workflow Packages (domain specific)
  • Trident Runtime Services
  • Windows Workflow Foundation (.NET 4.0)
  • Provenance
  • Workflow Scheduling Service
  • Admin Console
  • Workflow Monitor
  • Web Portal (search, launch, monitor)
  • Workflow Launcher
  • Results Repository
  • Workflow Repository (myExperiment)
  • Data Access Layer
  • Data Object Model (data source abstraction layer)
  • Data Storage Providers: SQL Server, Local XML store,

Workflows for Curation
  • Goals
      • Systematic and repeatable processes
      • Helps remove human errors
  • Data ingest
      • Integrity checks
      • Format normalization/derivative generation
      • Metadata creation
  • Curation activities
      • Integrity checks
      • Format migration
      • Media migration

Data Ingest Workflows
Scenarios
  • Single part objects (individual images)
  • Multi-part objects (a book)
  • Multiple instantiations of a logical object (Word, PDF and PPT of a research paper)
  • Multiple multi-part objects (a group of letters)
  • Research data products (multiple files of various types)
  • Scientific workflow process

Single Part Objects Workflow
  • Magic Lantern Slides
  • Individual files
  • Spreadsheet

[Workflow diagram; components shown:]
  • Derivative Generation
  • Format Validation and Verification
  • Fixity Check
  • Create Tech Metadata
  • Create Intellectual Metadata
  • Create Object Metadata
  • Persistent Identification
  • Deposit in Repository
  • Image Quality Checks

Multi-part Object Workflow
  • Comic Book
  • RIS
  • Set of .tif files

[Workflow diagram; components shown:]
  • Create Tech Metadata
  • Derivative Generation
  • Format Validation and Verification
  • Fixity Check
  • Object Integrity
  • Create Intellectual Metadata
  • Create Object Metadata
  • Persistent Identification
  • Deposit in Repository
  • Image Quality Checks

Multiple Instantiations of a Logical Object Workflow
  • Papers
  • Each logical object per subdirectory
  • RIS, Word file and (perhaps) supplemental file

[Workflow diagram; components shown:]
  • Format Normalization
  • Format Validation and Verification
  • Fixity Check
  • Create Tech Metadata
  • Create Intellectual Metadata
  • Create Object Metadata
  • Persistent Identification
  • Deposit in Repository
  • Derivative Generation

Multiple Multi-part Object Workflow
  • Ball collection
  • RIS for collection and inventory spreadsheet
  • Each logical object in separate subdirectory

[Workflow diagram; components shown:]
  • Create Tech Metadata
  • Derivative Generation
  • Format Validation and Verification
  • Fixity Check
  • Object Integrity
  • Create Intellectual Metadata
  • Create Object Metadata
  • Persistent Identification
  • Deposit in Repository
  • Image Quality Checks
  • Collection Integrity
  • Create Collection Metadata

Research Data Products
  • Vortex
  • Each subdirectory is an experiment with FGDC metadata

[Workflow diagram; components shown:]
  • Compress Data
  • Fixity Check
  • Create Intellectual Metadata
  • Create Object Metadata
  • Persistent Identification
  • Deposit in Repository

Workflow Components
  • Format conversions (for normalization and derivative generation)
      • .xlsx to .csv
      • .docx to .pdf
      • .ppt to .pdf
      • .tif to .jpg
  • Zipping on demand
  • Image (.tif or .jpg) to .pdf
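
Of the components listed above, zipping on demand is the simplest to illustrate without third-party libraries. The sketch below uses Python's standard `zipfile` module to package one object's directory; the function name and the choice to store paths relative to the source directory are assumptions for illustration, not a description of the Trident component.

```python
import zipfile
from pathlib import Path


def zip_directory(source_dir, archive_path):
    """Write every file under source_dir into a compressed zip archive.

    Returns the list of member names, stored relative to source_dir.
    """
    source = Path(source_dir)
    archive_resolved = Path(archive_path).resolve()
    names = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            if not path.is_file():
                continue
            # Skip the archive itself if it is being written inside source_dir.
            if path.resolve() == archive_resolved:
                continue
            name = path.relative_to(source).as_posix()
            zf.write(path, arcname=name)
            names.append(name)
    return names
```

Storing relative names keeps the subdirectory layout of a multi-part object intact when the archive is unpacked elsewhere.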

Workflow Components 2
  • Context creation
      • MIX data generator and validator
      • METS data generator and validator
  • Data integrity
      • MD5 checksum generator
      • MD5 checksum validator
      • JHOVE for format verification and validation
      • Group validation (for object integrity)
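
The MD5 checksum generator and validator pair listed above can be sketched with Python's standard `hashlib`. This is an illustration of the technique only: the manifest is assumed to be a plain dictionary of relative path to hex digest, and all function names are hypothetical rather than taken from the Trident components.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def generate_manifest(directory):
    """Generator side: map each file's relative path to its MD5 digest."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): md5_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def validate_manifest(directory, manifest):
    """Validator side: return names that are missing or no longer match."""
    root = Path(directory)
    failures = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or md5_of_file(target) != expected:
            failures.append(name)
    return failures
```

Running the generator at ingest and the validator on a schedule afterwards gives the ongoing fixity verification described earlier; an empty list from `validate_manifest` means every recorded file is present and unchanged.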

Post Deposit Curation Workflow
Scenarios
  • Fixity verification
  • Format normalization
  • New or additional derivative generation
  • Media migration
  • Persistent identifier updates
  • Metadata updates

Workflows in Trident

Executing Workflows
  • Individual object ingest
  • Multipart object ingest
  • Multiple multipart object ingest
  • Multiple instantiations of a single logical object
  • Research data ingest
  • Scientific workflow
  • Fixity check curation workflow

Implementing Workflows in Trident
  • Launch the Remote Desktop application
  • User: AMAZONA-JJOAL14\oruser
  • PWD: TridentOR12!!
  • Computer IP addresses on slip of paper being passed out now.

Trident Workflow Composer

Participant Exercises

Modifying Workflows
Add components to existing workflows
  • Select the Individual Ingest Workflow
      • Add the DOI component before the METS generator component
      • Make the connections
  • Select the Group Ingest Workflow Comic
      • Add the METS generation component after the last component in the main line
      • Make the connections

Simple Curation Workflow Creation
Create a workflow for a simple curation process: validate MD5 checksums
  • Define a directory of image files
  • Define a METS file
  • Define an output location
  • Link the MD5 checksum validation component
  • Link the MD5 checksum report component
  • Save and execute the workflow
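
Outside Trident, the same exercise can be approximated in a few lines of Python. The sketch below assumes the METS file records each image in its file section as a `file` element with `CHECKSUM` and `CHECKSUMTYPE="MD5"` attributes and an `FLocat` child whose `xlink:href` is a path relative to the image directory. That layout, the tab-separated report format, and the function names are assumptions for illustration; the workshop's own components may behave differently.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

NS = {"mets": "http://www.loc.gov/METS/"}
HREF = "{http://www.w3.org/1999/xlink}href"


def expected_checksums(mets_path):
    """Map each file location in the METS document to its recorded MD5."""
    expected = {}
    root = ET.parse(mets_path).getroot()
    for file_el in root.iterfind(".//mets:file", NS):
        if file_el.get("CHECKSUMTYPE") != "MD5":
            continue
        locat = file_el.find("mets:FLocat", NS)
        if locat is not None and locat.get(HREF):
            expected[locat.get(HREF)] = file_el.get("CHECKSUM", "").lower()
    return expected


def checksum_report(image_dir, mets_path, report_path):
    """Validate every recorded checksum and write one status line per file."""
    lines = []
    for href, recorded in sorted(expected_checksums(mets_path).items()):
        target = Path(image_dir) / href
        if not target.is_file():
            status = "MISSING"
        else:
            actual = hashlib.md5(target.read_bytes()).hexdigest()
            status = "OK" if actual == recorded else "MISMATCH"
        lines.append(f"{status}\t{href}")
    Path(report_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

The three inputs mirror the exercise: a directory of image files, a METS file, and an output location for the report.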

Creating Components
Exercise:
  • Create a new Trident workflow component
  • Implement the MARCXML to MODS stylesheet
  • Kavitha Chandrasekar will demonstrate the process

Wrap Up
  • Thumb drives
  • Trident CodePlex site
  • Trident listserv
  • Contributing to Trident
  • Workshop evaluation form
  • Ongoing conversation

Contacts for Further Discussion
  • Trident CodePlex site:
  • Trident Listserv:
  • Stacy Kowalczyk: skowalcz@indiana.edu
  • Kavitha Chandrasekar:
  • Yiming Sun:
  • Quan Zhou:


