Documenting metadata application profiles and vocabularies

Paul Walk

Director, Antleaf

Managing Director, Dublin Core Metadata Initiative (DCMI)

Web: http://www.paulwalk.net

Email: [email protected]

Twitter: @paulwalk

www.antleaf.com www.dublincore.org

Sharing profiles: Documenting profiles and vocabularies on the Web

is it more important that application profiles are machine-friendly, or user-friendly?

the specific challenge:

how to manage & publish the Dublin Core technical documentation in a more efficient & sustainable way, making it as user-friendly as possible while maintaining its machine-readability

context

• DCMI publishes important technical documentation (vocabularies, specifications, models) on the Web

• until recently, this was managed in a sophisticated bespoke system:

  • sources edited as XML files

  • maintained in a Subversion repository

  • assembled & converted with shell scripts and 'Ant'

  • transferred by FTP to a 'staging server'

  • deployed to the live server by the server admin, on request

• essentially a "closed" system

three technologies which make the difference

1. Git

• stable, sophisticated, free version control technology which is ubiquitously supported

• GitHub: global-scale infrastructure providing Git as a service

• invite contribution by 'pull request'

2. Markdown

• simple, parseable but easily readable plain text format

3. Static website generators

• a new class of content management system where sources are managed locally and compiled into webpages which are then uploaded to a server (like we used to do it in the early 90s!)

• supports distributed content-management via Git

• supports long-term preservation by requiring only simple text-based formats

• supports use of desktop authoring tools, e.g. text editors

we are exploring how these three technologies:

* Git/GitHub

* Markdown (with metadata “front matter”)

* static-site generators

can be harnessed together to address our challenge

what are static site generators?

• a different kind of web-content management system, designed to publish content as static files to a bog-standard web server

• content is processed during the publishing operation, rather than when the user requests it (although client-side JavaScript is still supported)

• simple command-line application to generate content and serve pages

• no database: content is held in semi-structured text files

components - standard to most systems

1. content-model

• folder hierarchy, text files

2. content pages

• (markdown, front-matter)

• blog type content is also often supported

3. templates (& themes)

• (with some level of basic scripting)

4. generator software

• typically a command-line script or application

5. configuration file

1. content-model

• text files arranged in a folder hierarchy

• folder hierarchy relates to URL path structure

• filename relates to URL
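A minimal sketch of the folder-to-URL mapping described above. The `content` root folder, the `index` convention and the trailing-slash ("pretty URL") style are assumptions for illustration; individual generators differ in the details. The file names in the usage note are hypothetical.

```python
from pathlib import PurePosixPath


def url_for(source_path, content_root="content"):
    """Map a source file path to the URL path it might be published at.

    Assumes (hypothetically) that sources live under `content_root` and
    that a file called `index` stands for the folder that contains it.
    """
    relative = PurePosixPath(source_path).relative_to(content_root)
    if relative.stem == "index":
        # index.md represents its parent folder
        parts = relative.parent.parts
    else:
        # any other file becomes a path segment named after the file
        parts = relative.with_suffix("").parts
    if not parts:
        return "/"
    return "/" + "/".join(parts) + "/"
```

With this sketch, `url_for("content/terms/creator.md")` gives `/terms/creator/`, and `url_for("content/terms/index.md")` gives `/terms/`.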

2. content pages

• "front-matter" metadata

• often in YAML format

• main body in Markdown; arbitrary HTML is also accepted where necessary
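To illustrate how a content page combines front-matter and a Markdown body, here is a rough sketch that splits the two. It assumes the front-matter is delimited by `---` lines and contains only flat `key: value` pairs; a real generator would use a full YAML parser. The field names in the sample page are hypothetical.

```python
def split_front_matter(text):
    """Split a content page into (metadata dict, Markdown body).

    Only handles flat `key: value` front-matter between `---` lines;
    this is a simplification, not a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front-matter at all
    metadata = {}
    for index in range(1, len(lines)):
        line = lines[index]
        if line.strip() == "---":
            # everything after the closing delimiter is the body
            body = "\n".join(lines[index + 1:]).strip("\n")
            return metadata, body
        key, separator, value = line.partition(":")
        if separator:
            metadata[key.strip()] = value.strip().strip('"')
    return {}, text  # closing delimiter missing: treat it all as body


# Hypothetical content page for a single term
PAGE = '---\ntitle: "Example term"\nstatus: draft\n---\n\nFree-text description in **Markdown**.\n'
metadata, body = split_front_matter(PAGE)
```

Keeping the metadata in a small structured header and the description in free text is what lets the same file serve both human readers and machine processing.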

3. templates

• can reference metadata (e.g. 'page title') from content page

• can re-use 'partial' templates (e.g. a common 'header' & 'footer')

• often in a common templating language such as HAML

• (example below is in Go's templating syntax)

= include partials/header.html .

div.row-fluid

div class="col-xs-12"

h1.page-title {{if .Draft}}[**draft**]{{end}}{{.Title}}

h2.page-title

i {{.Params.author}}, {{.Date.Format "Monday, January 02, 2006"}}

{{.Content}}

= include partials/share_buttons.html .

= include _internal/disqus.html .

= include partials/footer.html .
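The same two ideas (referencing page metadata, and re-using 'partial' templates) can be sketched outside any particular generator. The example below swaps in Python's standard `string.Template` in place of HAML or Go templates, purely for illustration; the template text, partial names and `site_name` value are all hypothetical.

```python
from html import escape
from string import Template

# Hypothetical 'partial' templates shared by every page
PARTIALS = {
    "header": Template("<header>$site_name</header>"),
    "footer": Template("<footer>Published by $site_name</footer>"),
}

# Hypothetical page template that pulls in the partials and the page title
PAGE_TEMPLATE = Template("$header\n<h1>$title</h1>\n$content\n$footer")


def render_page(metadata, content_html, site_name):
    """Render one page from its metadata and already-converted HTML body."""
    shared = {"site_name": escape(site_name)}
    return PAGE_TEMPLATE.substitute(
        header=PARTIALS["header"].substitute(shared),
        footer=PARTIALS["footer"].substitute(shared),
        title=escape(metadata["title"]),  # metadata taken from the front-matter
        content=content_html,
    )


page = render_page({"title": "Example term"}, "<p>Body</p>", "Example Site")
```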

4. generator software

• used to generate new content

• also used to run a local server to see how the site will look

deployment options

• SFTP

• Rsync (over SSH)

• git commit hooks (or GitHub webhooks)

  • requires the site to be built on the server, so a little more infrastructure (a simple CGI) is required
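For the webhook option, the small piece of server-side infrastructure would normally check that a request really came from GitHub before rebuilding the site. The sketch below shows only that check, assuming a shared secret has been configured for the webhook and that the request carries GitHub's `X-Hub-Signature-256` header (`sha256=` followed by a hex HMAC of the raw request body). The secret and payload values are made up, and the rebuild step itself is left out.

```python
import hashlib
import hmac


def is_valid_signature(secret, payload, signature_header):
    """Return True if the header matches the HMAC-SHA256 of the payload.

    `secret` and `payload` are bytes; `signature_header` is the string
    value of the X-Hub-Signature-256 request header.
    """
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # constant-time comparison avoids leaking information through timing
    return hmac.compare_digest(expected, signature_header)


# Hypothetical values, for illustration only
SECRET = b"example-secret"
PAYLOAD = b'{"ref": "refs/heads/main"}'
GOOD_HEADER = "sha256=" + hmac.new(SECRET, PAYLOAD, hashlib.sha256).hexdigest()
```

Only when the check passes would the handler go on to pull the latest sources and run the generator.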

436 known generators

https://staticsitegenerators.net

workflow

‘flipping’ the approach

old approach (single source file)

new approach (many source files, one per term)

pros and cons

• old approach (source in XML file or similar)

  • pros:

    • easy to track source files (few in number)

    • easy to transform into other machine-readable formats

  • cons:

    • difficult to maintain the source - not user-friendly

    • poor support for extensive free text description

• new approach (source in Markdown+YAML)

  • pros:

    • easier for humans to read and maintain

    • good support for extensive free text description

    • easy to re-use (partially/completely)

  • cons:

    • may not suit very complex vocabularies or profiles
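One way to keep machine-readability with the new approach is to compile the per-term sources back into a single machine-readable file at build time. The slides do not say how this is done, so the sketch below is only an assumption: it takes metadata already parsed from each term's front-matter and emits one JSON document. The field names (`name`, `label`, `definition`) and the base URI are hypothetical.

```python
import json


def build_vocabulary(terms, base_uri):
    """Combine per-term metadata dicts into one JSON document.

    `terms` is a list of dicts, one per source file, each assumed to
    carry hypothetical `name`, `label` and `definition` fields.
    """
    entries = []
    for term in sorted(terms, key=lambda t: t["name"]):
        entries.append({
            "id": base_uri + term["name"],
            "label": term["label"],
            "definition": term["definition"],
        })
    return json.dumps({"terms": entries}, indent=2)


# Hypothetical terms, as if parsed from two separate Markdown+YAML files
TERMS = [
    {"name": "termB", "label": "Term B", "definition": "Second example."},
    {"name": "termA", "label": "Term A", "definition": "First example."},
]
vocabulary_json = build_vocabulary(TERMS, "http://example.org/terms/")
```

The same per-term metadata could equally be fed to a different serialiser, which is the sense in which the many-files source remains easy to re-use.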

simplifying curation and preservation

• version control and redundancy

  • synchronised repositories & distributed version control via Git

• active curation

  • ease of access and contribution to sources via Git

  • simple & readable plain text formats (Markdown)

  • "one click" deployment

• minimal deployment infrastructure

  • standard web-server

  • text files, open formats, no database or server-side 'logic', static site generators

  • reduces broken websites

issues & challenges

1. is this still too technical for some people who may need to maintain a metadata profile or vocabulary?

2. will this approach be sophisticated enough to document the majority of candidate profiles/vocabularies?

3. can we generalise this approach to provide a useful, re-usable tool kit for others to adopt?

4. how do we handle versioning? By term, or by ‘collection’ (e.g. vocabulary or profile)?

versioning by term


Thank you!
