Metadatos Border Crossings Weiber Stuart

  • 8/14/2019 Metadatos Border Crossings Weiber Stuart


    D-Lib Magazine

    July/August 2005

    Volume 11 Number 7/8

    ISSN 1082-9873

    Border Crossings

    Reflections on a Decade of Metadata Consensus Building

    Stuart L. Weibel

    Senior Research Scientist

    OCLC Research

    In June of this year, I performed my final official duties as part of the Dublin Core Metadata

    Initiative management team. It is a happy irony to affix a seal on that service in this journal, as

    both D-Lib Magazine and the Dublin Core celebrate their tenth anniversaries. This essay is a

    personal reflection on some of the achievements and lessons of that decade.

    The OCLC-NCSA Metadata Workshop took place in March of 1995, and as we tried to understand

    what it meant and who would care, D-Lib magazine came into being and offered a natural venue

    for sharing our work [16]. I recall a certain skepticism when Bill Arms said "We want D-Lib to be

    the first place people look for the latest developments in digital library research." These were the

    early days in the evolution of electronic publishing, and the goal was ambitious. By any measure, a

    decade of high-quality electronic publishing is an auspicious accomplishment, and D-Lib (and its

    host, CNRI) deserve congratulations for having achieved their goal. I am grateful to have been a


    That first DC workshop led to further workshops, a community, a variety of standards in several

    countries, an ISO standard, a conference series, and an international consortium. Looking back on

    this evolution is both satisfying and wistful. While I am pleased that the achievements are

    substantial, the unmet challenges also provide a rich till in which to cultivate insights on the

    development of digital infrastructure.

    The Achievements

    When we started down the metadata garden path, the term itself was new to most. The known Web

    was less than a million pages, people tried to bribe their way into sold-out Web conferences, and

    the term 'search engine' was as yet unfamiliar outside of research labs. The OCLC-NCSA Metadata

    Workshop brought practitioners and theoreticians together to identify approaches to improve

    discovery. In two and a half days, an eclectic Gang of 52 (we affectionately described ourselves as

    'geeks, freaks, and people with sensible shoes') brought forward a core element set upon which

    many resource description efforts have since been based.

    The goal was simple, modular, extensible metadata: a starting place for more elaborate

    description schemes. From the thirteen original elements we grew to a core of fifteen, and later

    elaborated the means for refining those rough categories. In recent years much work has been done

    on the modular and extensible aspects, as application profiles have emerged to bring together terms

    from separate vocabularies [9].

    A Consensus Community

    The workshop series coalesced as a community of people from many countries and many domains,

    drawn by the appeal of a simple metadata standard. Openness was the Prime Directive, and early

    progress was often marked by the contentious debate of consensus building. But our belief that

    value would emerge from many voices informed our deliberations, and still does. Not without

    difficulty: in one early meeting, participants spent an hour of scarce plenary time talking about

    Type before realizing that the librarians and the computer scientists had been talking about

    completely different concepts. Crossing borders is often difficult.

    This open, inclusive approach to problem solving helped the Dublin Core community to frame the

    metadata conversation for the past decade. The Dublin Core brand has been for some years the first

    link returned for the Google search term "metadata", and for a time, it outranked all other results

    for the search "Dublin" (as of this writing, it is #6). With only moderate irony, we might say "I feel



    As a workshop series evolved into a set of standards and a community, the need for rules and

    governance evolved as well. DCMI developed a process for evaluating proposed changes and

    bringing them into conformance with the overall standard [5]. The DCMI Usage Board is

    comprised of knowledgeable, experienced metadata experts from five countries who exercise

    editorial guidance over the evolution of DCMI terms and their conformance with the DCMI

    Abstract Model [13].

    This model itself is among the most important of the achievements of the Initiative, representing as

    it does the convergence of theory and practice over a decade of vigorous debate and practical

    implementation. It emerged from early intuition and experience, informed by an evolving sense of

    grammatical structure [2,6] and further refined by a long co-evolution with the W3C's Resource

    Description Framework (RDF) and the Semantic Web.

    At a higher level, DCMI has a Board of Trustees [1], who oversee operations and do strategic

    planning, and an Affiliate Program and governance structure that distributes the cost of the

    initiative and assures that the needs of stakeholders are accommodated [3]. At the time of this

    writing, there are four national DCMI Affiliates and several more in discussion.


    The global nature of the Web demands commitment to internationalization. The difficulties of

    achieving system interoperability in multiple languages are immense, and still only partially solved

    (anyone used IRIs recently?). Nonetheless, DCMI has succeeded in attracting translations of its

    basic terms in 25 languages and offers a multilingual registry infrastructure of global reach [14].

    The venues for the workshops and conferences have been chosen to make the Initiative accessible

    to people in as many places as possible. Workshops and conferences are held in the Americas,

    Europe, and Austral-Asia on a rotating basis, and Dublin Core principals have given talks on every

    continent save Antarctica. This policy of international inclusion has been a philosophic mainstay

    for the Initiative, attracting long-term participation from around the world.

    Where we were confused

    Confusions and unmet challenges are both interesting and instructive. A few of these are historical

    curiosities, and interesting mostly as a source of wry humility. Others represent unsolved dilemmas

    that remain prominent challenges for the metadata world in general.

    Author-created Metadata

    The idea of user-created metadata is seductive. Creating metadata early in the life cycle of an

    information asset makes sense, and who should know the content better than its creator? Creators

    also have the incentive of their work being more easily found: who wouldn't want to spend an

    extra few minutes with so much already invested?

    The answer is that almost nobody will spend the time, and probably the majority of those who do

    are in the business of creating metadata-spam. Creating good quality metadata is challenging, and

    users are unlikely to have the knowledge or patience to do it very well, let alone fit it into an

    appropriate context with related resources. Our expectations to the contrary seem touchingly naïve

    in retrospect.

    The challenge of creating cost-effective metadata remains prominent. As Erik Duval pointed out in

    his DC-2004 keynote, 'Librarians don't scale' [7]. We need automated (or at least, hybrid) means

    for creating metadata that is both useful and inexpensive.

    What is metadata for?

    Another naïve assumption was that metadata would be the primary key to discovery on the Web.

    While one may quibble about the effectiveness of unstructured search for some purposes, it is the

    dominant idiom of discovery for Web resources, and may be expected to remain so. What then, is

    metadata for?

    There are many answers to this question, though given the high stakes in the search domain, expect

    these answers to shift and weave for the foreseeable future. Searching the so-called 'dark web'

    remains a function of gated access, and metadata is a central feature of such access. One might

    simply say harvest and index. OCLC's exposure of WorldCat assets in search engines such as

    Google and Yahoo is exemplary of this approach [11]. Indexed metadata terms connect users to the

    location of the physical assets via holdings records, but it is reasonable to ask... would simple, full-

    text indexing of these assets be better still? We may argue the fine points today but in the future,

    we'll know the answer, for the day of digitization is fast upon us.

    Structured metadata remains important in organizing and managing intellectual assets. The

    Canadian Government's approach to managing electronic information illustrates this strategy [4].

    Metadata becomes the linkage relating content, legislative mandates, reporting requirements,

    intended audience, and various other management functions. One does not achieve this sort of

    functionality with unstructured full text.

    The International Press Telecommunications Council is exploring embedding Dublin Core in their

    new generation of news standards [17]. No domain is more digitally now than this one. If you want

    to know the value of structured metadata, look to the requirements and business cases in such

    communities [10].

    Similarly, in the management of intellectual property rights, well-structured data is essential, and

    as these requirements become ubiquitous, the creation and management of metadata will be central

    to the story.

    Metadata for images is a critical use. Association of images with text makes them discoverable.

    When the asset is a stand-alone image, metadata is the primary avenue by which they can be

    accessed. Picture Australia is an early and enduring (and widely copied) model in this area,

    showing how a photo archive can become a primary cultural heritage asset through the addition of

    systematic search tools and Web accessibility [12].

    There is much talk of taxonomies, their strengths, and deficiencies these days, and in fact the

    emergence of 'folksonomies' hints at a sea change in the use of vocabularies to improve

    organization and discovery [9]. The Dublin Core community has struggled with the role of

    controlled vocabularies, how to declare and use them, and how important (or impotent?) they

    might be. The notion that uncontrolled vocabularies (community-based, emergent vocabularies)

    might play an important role in aggregation and discovery occasions a certain discomfort for those

    schooled in formal information management. Whether it is just the latest fad, or an important

    emerging trend, remains to be seen.

    A Major Unmet Challenge

    Entropy is an arrow. In the absence of constant care and fussing, our greatest successes break

    down. Failures, however, remain potent without much attention, retaining their power to impede.

    One of the yet-unsolved problems in the metadata community is the railroad gage dilemma. The

    first editor of D-Lib, Amy Friedlander, introduced me to the notion of train gages as metaphor for

    interoperability challenges [8]. Last year I rode that metaphor from Beijing to Ulan Bator,

    Mongolia. A cursory knowledge of Asian history reminds us that relations between Mongolia and

    China have been less-than-cordial from time to time, and this history remains manifest at the Gobi

    border crossing today. In the dark of night, the Beijing train of the Trans-Siberian Railway pulls

    into a longhouse of clanking and switching as the entire train is raised on hydraulic jacks. Chinese

    bogeys (wheel carriages) are rolled out, and Mongolian bogeys of a different gage are rolled in.

    Border guards with comically high hats (and un-comical sidearms) work their way through the

    train cars in the manner of border guards everywhere. After a couple of hours, the train is rolling

    through the Gobi anew.

    It is a fascinating display of technological diplomacy, a kind of Maginot line that helps those on

    both sides of the border sleep better. These images belong to a Bogart movie or a Clancy novel, but

    their abstraction pervades the metadata arena.

    Stacked bogeys, ready to be rolled into use. Photo by Stuart Weibel.

    A railroad car raised on one of dozens of hydraulic jacks that raise an entire train at once for

    the exchange of bogeys. Photo by Stuart Weibel.

    We load our metadata into structures in one domain and when we cross borders we unload it,

    repackage it, massage it to something slightly different, and suffer a measure of broken semantics

    in the bargain. We're running on different gages of track, manifested in different data models,

    slightly divergent semantics, and driven by related, but meandering, often poorly-understood

    functional requirements. Crosswalks are the hydraulic jacks: quieter, but no more efficient than

    the clanking and grinding in the train longhouse.
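
    The unload-and-repackage step described above can be sketched in a few lines. This is only an
    illustration under stated assumptions: the source field names, the mapping table, and the
    illustrator, digitization date, and reading level are invented for the example, and only the
    target element names (title, creator, contributor, date) come from the Dublin Core element set.

```python
# Hypothetical crosswalk table: source field name -> simple Dublin Core element.
# The source field names are invented for illustration only.
CROSSWALK = {
    "main_title": "title",
    "author": "creator",
    "illustrator": "contributor",
    "editor": "contributor",
    "date_issued": "date",
    "date_digitized": "date",
}


def crosswalk(record):
    """Map a source record to Dublin Core elements; return (mapped, dropped)."""
    mapped = {}
    dropped = []
    for field, value in record.items():
        element = CROSSWALK.get(field)
        if element is None:
            dropped.append(field)  # no target element: the value is lost
        else:
            # Several source fields may collapse into one target element,
            # which erases the distinction the source schema made.
            mapped.setdefault(element, []).append(value)
    return mapped, dropped


# Illustrative record; the values are not real catalogue data.
source = {
    "main_title": "Border Crossings",
    "author": "Weibel, Stuart L.",
    "illustrator": "A. Example",
    "date_issued": "2005-07",
    "date_digitized": "2005-08",
    "reading_level": "adult",
}

dc_record, lost_fields = crosswalk(source)
```

    In this sketch the two dates arrive in the target record as indistinguishable values of a single
    date element, and the reading level has nowhere to go at all: a small instance of the broken
    semantics that crossing the border costs.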

    Metadata standards specify the means to make (mostly) straightforward assertions about resources.

    Many of these assertions are as simple as attribute-value pairs. Others are more complex, involving

    dependencies or hierarchies. None are so complicated that they cannot be accommodated within a

    common formal model. Yet we do not have such a model in place. Why?

    NIH (Not Invented Here) Syndrome is often blamed for disparities that emerge in solutions

    from separate domains targeted at similar problems. Certainly our propensity to like our

    own ideas better than those of others plays a role, but my view is that it is not such a large

    role.

    Developments take place in parallel. It is unusual to have the luxury of waiting to see how

    another group is approaching a particular problem before tackling it yourself. It is quite

    hard enough to know what is happening in one's own community, let alone to follow related

    developments in others, whose differences in terminology obscure what we need to know.

    The functional requirements of various metadata standards are often ambiguous and always

    focused slightly differently. DCMI focuses on simple, extensible, high-level metadata.

    IEEE LOM (Learning Object Metadata) also concerns itself with discovery metadata, but

    focuses more strongly on educational process descriptors. MPEG is about media, where

    technical image metadata is central, and intellectual property rights management is crucial.

    MODS is grounded firmly in the legacies of MARC (and the world's largest installed base

    of resource discovery systems).

    The cost of collaboration, in intellectual as well as financial terms, is high. People have

    to know and trust one another, which generally requires face-to-face engagement:

    transporting ourselves and our ideas to other time zones, surviving frequent-flyer-flues,

    finding the means to support travel costs, and missing baseball games of our children.

    The problems are more complicated than we imagine at the outset. The recent approval of

    the Dublin Core Abstract Model by DCMI is the culmination of a journey that began

    almost at the outset of the Initiative. Early attempts, under the guise of the DC Data Model

    Working Group, rank among my most contentious professional experiences. To borrow

    from the oldest joke of the Dismal Profession, put all the data modelers in the world end to

    end, and you won't reach a conclusion (we did, but it took ten years to manage it).

    The idea of achieving similar consensus across communities with their own legacies of such

    conflict is daunting in the extreme, though recent discussions on this topic with colleagues in

    another metadata community remind me that hopefulness and optimism are as much a part of our

    domain as contention [18].
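
    The earlier claim that attribute-value pairs, dependencies, and hierarchies can all be
    accommodated within one formal model can be made concrete with a small sketch. The class names,
    the flatten helper, and the example identifiers below are hypothetical; this is not the DCMI
    Abstract Model or any other published model, only an indication of how little structure such a
    model needs.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Description:
    """A set of statements about one resource (hypothetical structure)."""
    resource: str
    statements: List["Statement"] = field(default_factory=list)


@dataclass
class Statement:
    """One assertion: a property paired with a literal or a nested description."""
    prop: str
    value: Union[str, Description]


def flatten(description, prefix=""):
    """Yield (path, literal) pairs, walking nested descriptions depth-first."""
    for statement in description.statements:
        path = prefix + statement.prop
        if isinstance(statement.value, Description):
            # A hierarchy is just a statement whose value is another description.
            yield from flatten(statement.value, path + ".")
        else:
            yield path, statement.value


# Illustrative data; the identifiers are placeholders, not real URIs.
creator = Description(
    "urn:example:person",
    [Statement("name", "Stuart L. Weibel"), Statement("affiliation", "OCLC Research")],
)
essay = Description(
    "urn:example:essay",
    [Statement("title", "Border Crossings"), Statement("creator", creator)],
)

pairs = list(flatten(essay))
```

    Simple attribute-value pairs and nested descriptions share one representation here, so a flat
    consumer can still read everything as path-value pairs. The hard part, as argued above, is
    agreement among communities rather than the data structure.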

    Collaboration and consensus in the digital environment

    The Web demands an international, multicultural approach to standards and infrastructure. The

    costs in time and treasure are substantial, and the results are uncertain. Paying for collaboration

    that spans national boundaries, language barriers, and the often-divergent interests of different

    domains is a major part of these challenges. Doing this while sustaining forward progress and

    attracting a suitable mix of contributors, reviewers, implementers, and practitioners, is particularly


    A recent presentation by Google's Adam Bosworth, referenced in the Blandiose blog [15], makes

    for provocative reading for those debating the costs and benefits of heavy-weight versus light-

    weight standards. The tension between these approaches sharpens designers and practitioners (and

    especially, entrepreneurs), to the eventual benefit of users. Any standards activity ignores this

    balancing act at its peril.

    As we try to foment change and react to it at once, we are like Escher's Hands, designing the

    future as it, in turn, designs us... except that there are often implements other than pencils in those

    hands. Ever try explaining what you do for a living to your mother? In the Internet standards arena,

    conveying an appropriate balance of glee, terror, satisfaction, frustration, and pure wonder is no

    easy task. I just tell her I'm not a real librarian, but I play one on the Internet. It seems enough.


    I wish to acknowledge my personal debt to uncountable colleagues in the Dublin Core community,

    and my deep sense of gratitude for the opportunity to have played the role I have. The patience,

    forbearance, and generosity of OCLC management in supporting my efforts and

    DCMI in general have been singular and essential.

    Thomas Baker reviewed and improved this manuscript with several insightful suggestions.

    Amy Friedlander and Bonnie Wilson, successive editors of D-Lib, have made me look better than I

    am in these pages for 10 years. Congratulations to them and to all who have helped make this

    journal (and its authors) what they are.

    References and Notes

    [1] About the Initiative

    DCMI Website, accessed June 23, 2005.

    [2] Baker, Thomas

    "A Grammar of Dublin Core"

    D-Lib Magazine, October 2000

    Volume 6 Number 10


    [3] DCMI Affiliate Program

    DCMI Website, accessed June 23, 2005.

    [4] Committee of Federal Metadata Experts Metadata Action Team,

    Council of Federal Libraries.

    Government of Canada Metadata Implementation Guide For Web Resources

    3rd edition - July 2004


    [5] DCMI Usage Board

    DCMI Usage Board Mission and Principle

    DCMI Website, June 11, 2003.

    [6] DCMI Usage Board

    DCMI Grammatical Principles

    [18] The author has been party to discussions with Erik Duval and Wayne Hodgins of the IEEE

    LOM effort centered around the possibility of cross-standard data modeling that might promote

    convergence among various metadata activities. The means and methods for carrying such work

    forward are presently undetermined.

    Copyright 2005 OCLC Online Computer Library Center, Inc.