Metadata and Controlled Vocabularies - IA


DESCRIPTION

Controlled vocabularies in relation to the metadata work.


  • Taxonomy & Metadata / Information Architecture Consulting

    Amy J. Warner, Ph.D.

    Metadata & Taxonomies for a More Flexible Information Architecture

    Information Architecture Summit, March 16, 2002

    [email protected]

  • Amy J. Warner, Ph.D. 2

    Outline

    What I'll cover: Metadata and IA. Metadata schema. Vocabulary development.

    Underlying themes: Standards. Reality. Some IR (information retrieval) issues.

  • Amy J. Warner, Ph.D. 3

    What is Metadata?

    Metadata is structured data which describes the characteristics of a resource. It shares many similar characteristics to the cataloguing that takes place in libraries, museums and archives.

    Chris Taylor, University of Queensland

  • Amy J. Warner, Ph.D. 4

    Types & Functions of Metadata

    Type: Administrative
    Definition: Metadata used in managing and administering resources
    Examples: Acquisition information; rights and reproduction tracking; documentation of legal access requirements; location information; version control

    Type: Descriptive
    Definition: Metadata used to describe or identify information resources
    Examples: Cataloging records; specialized indexes; hyperlinked relationships between resources; annotations by users

    Type: Preservation
    Definition: Metadata related to the preservation of information resources
    Examples: Documentation of actions taken to preserve physical and digital versions of resources (e.g., data refreshing and migration)

    Type: Technical
    Definition: Metadata related to how a system functions or metadata behaves
    Examples: Digitization information (e.g., formats, compression ratios, scaling routines); authentication and security data (e.g., encryptions, passwords)

    Type: Use
    Definition: Metadata related to the level and type of use of information resources
    Examples: Use and user tracking; content re-use and multi-versioning information

    Introduction to Metadata, Getty Information Institute

  • Amy J. Warner, Ph.D. 5

    Confusing Terminology

    Controlled vocabularies

    Subject Headings: traditionally employed in libraries to tag (index) the topics of books and other library materials

    Thesauri: traditionally employed in abstracting & indexing services to tag (index) the topics of journal articles and other scholarly material in a given subject area (e.g. medicine, engineering)

    Taxonomies: the classification of different organisms into mutually exclusive categories, ranked from phylum to species

  • Amy J. Warner, Ph.D. 6

    Levels of Control

    [Diagram: a continuum running from Simple to Complex]

    (Vocabularies) Synonym Rings, Authority Files, Classification Schemes, Thesauri

    (Relationships) Equivalence, Hierarchical, Associative

    Taxonomies

  • Amy J. Warner, Ph.D. 7

    Metadata & IA

    Content: Identify patterns in content.

    Users: Determine how target audience(s) search for and use information.

    Business Context: Determine how stakeholders want to organize & present their information.

  • Amy J. Warner, Ph.D. 8

    IA Generations

    Brochureware

    Pages served from database

    Metadata-driven website

    CMS

  • Amy J. Warner, Ph.D. 9

    Metadata in Metadata-Driven Websites

    [Diagram: Metadata Records point to Content through the URL field]

    Author: J. Jones
    Title: xxxx
    DocType: White Paper
    Audience: Employees
    URL: http://...
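    A metadata record like the one on this slide can be sketched as a small data structure. This is only an illustration: the field names follow the slide's labels, while the sample titles, the second record, the example.com URLs, and the `urls_for_audience` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """One metadata record; field names follow the slide's labels."""
    author: str
    title: str
    doc_type: str
    audience: str
    url: str  # pointer from the record to the content itself


# Hypothetical sample values; the slide elides the title and URL.
records = [
    MetadataRecord("J. Jones", "Sample white paper", "White Paper",
                   "Employees", "http://example.com/docs/1"),
    MetadataRecord("A. Smith", "Sample press release", "Press Release",
                   "Customers", "http://example.com/docs/2"),
]


def urls_for_audience(recs, audience):
    """Return the URLs of all content whose record names the given audience."""
    return [r.url for r in recs if r.audience == audience]


print(urls_for_audience(records, "Employees"))
```

    The site queries the records, not the content, and follows the URL only once a record matches.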

  • Amy J. Warner, Ph.D. 10

    Two Parts to Generating a Metadata Schema

    Decisions about indexable parameters (attributes, aspects) of documents; this corresponds to fields in the database records.

    Decisions about the elements (terms, descriptors, subject headings, tags) that these fields contain.

  • Amy J. Warner, Ph.D. 11

    Two Possibilities

    Content already exists

      Identify content that exists (content inventory).

    Most or all content does not exist

      Use wish lists to identify desired content.

    To do content inventory, need to go to those who are going to develop, own, maintain content.

  • Amy J. Warner, Ph.D. 12

    Content Analysis

    Look for patterns, similarities:

      logical: themes, sensitivity, specialization.

      physical: formats, dynamic vs. static (dated vs. rarely updated).

    Look for relationships: note connections between content (parent-child, sibling, dependencies).

    Begin to create groupings.

  • Amy J. Warner, Ph.D. 13

    Generating a Metadata Table

    The beginning of a metadata-driven website.

    Determine the major indexable parameters or attributes for each major document type in your sample.

    Determine what major types of rules or general guidelines your indexing system will follow for each attribute.

    Create an X-by-Y table.

    Put indexable attributes on the X axis and the rules on the Y axis.

    Fill in the decisions you make about each rule application in the individual cells of the table.

  • Amy J. Warner, Ph.D. 14

    Metadata Table

    Attribute  Required  Repeatable  Auto/Manual  Whole doc/Concepts  CV
    Author     Yes       Yes         Manual       Whole Doc.          No
    Title      Yes       No          Manual       Whole Doc.          No
    DocType    No        Yes         Manual       Whole Doc.          DocTypes List
    Subject    Yes       Yes         Semi-Auto    Concepts            Subjects Vocabulary
    Audience   No        No          Manual       Whole Doc.          Audience List
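    One way to put such a table to work is to hold the rules as data and check each record against them. The sketch below transcribes the Required, Repeatable, and CV columns from the slide; the vocabulary contents, the sample records, and the `validate` function are assumptions made for illustration.

```python
# Rules per attribute, transcribed from the metadata table.
# cv=None means free text; otherwise the name of a controlled vocabulary.
SCHEMA = {
    "Author":   {"required": True,  "repeatable": True,  "cv": None},
    "Title":    {"required": True,  "repeatable": False, "cv": None},
    "DocType":  {"required": False, "repeatable": True,  "cv": "DocTypes List"},
    "Subject":  {"required": True,  "repeatable": True,  "cv": "Subjects Vocabulary"},
    "Audience": {"required": False, "repeatable": False, "cv": "Audience List"},
}

# Hypothetical vocabulary contents, for illustration only.
VOCABULARIES = {
    "DocTypes List": {"White Paper", "Memo"},
    "Subjects Vocabulary": {"Metadata", "Taxonomies"},
    "Audience List": {"Employees", "Customers"},
}


def validate(record):
    """Return a list of rule violations for a record of {attribute: [values]}."""
    problems = []
    for attr in record:
        if attr not in SCHEMA:
            problems.append(f"{attr}: not in schema")
    for attr, rules in SCHEMA.items():
        values = record.get(attr, [])
        if rules["required"] and not values:
            problems.append(f"{attr}: required but missing")
        if not rules["repeatable"] and len(values) > 1:
            problems.append(f"{attr}: not repeatable")
        if rules["cv"]:
            allowed = VOCABULARIES[rules["cv"]]
            for value in values:
                if value not in allowed:
                    problems.append(f"{attr}: '{value}' not in {rules['cv']}")
    return problems


good = {"Author": ["J. Jones"], "Title": ["Sample"], "Subject": ["Metadata"]}
bad = {"Author": ["J. Jones"], "Title": ["A", "B"], "Audience": ["Martians"]}
print(validate(good))
print(validate(bad))
```

    Keeping the rules in a table-shaped structure means a change to the schema is a change to data, not to the checking logic.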

  • Amy J. Warner, Ph.D. 15

    User and Stakeholder Involvement

    When organizing content, start with the content, generate the metadata, and then evaluate with users and stakeholders.

    When organizing entities (i.e. products, projects) where content is not the major focus, start with stakeholders and users to determine metadata.

  • Amy J. Warner, Ph.D. 16

    Identify Terms

    Published Reference Materials: Thesauri, classification schemes, encyclopedias, dictionaries, glossaries, indexes.

    Content: Representative sample of web site / intranet.

    Users: Search log analysis, surveys, interviews.

    Experts: Authors, subject experts.

  • Amy J. Warner, Ph.D. 17

    Organize Terms

    Define preferred terms. Link synonyms and variants. (Synonym Rings)

    Group preferred terms by subject. Identify broader and narrower terms. (Taxonomies / Hierarchies)

    Identify related terms. (Thesauri)

  • Amy J. Warner, Ph.D. 18

    Variant Terms

    Variant terms provide the user with entry points into the vocabulary.

    Synonyms (same meaning):
      cats USE felines
      helicopters USE whirlybirds

    Lexical Variants (different word forms):
      paediatrics USE pediatrics
      BK USE Burger King

    Quasi-Synonyms (treated as equivalent):
      generic posting: beagle USE dog
      antonyms/continuum: wetness USE dryness
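    The USE references on this slide amount to a lookup from variant to preferred term. A minimal sketch, assuming case-insensitive matching (the slide does not say how case is handled) and a hypothetical `preferred` helper:

```python
# USE references from the slide: variant term -> preferred term.
USE = {
    "cats": "felines",
    "helicopters": "whirlybirds",
    "paediatrics": "pediatrics",
    "bk": "Burger King",
    "beagle": "dog",
    "wetness": "dryness",
}


def preferred(term):
    """Map a user's term to the preferred term; unknown terms pass through."""
    cleaned = term.strip()
    return USE.get(cleaned.lower(), cleaned)


print(preferred("Paediatrics"))
print(preferred("BK"))
print(preferred("felines"))
```

    A term that is already preferred, or that the vocabulary does not know, is returned unchanged so the search can still run on it.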

  • Amy J. Warner, Ph.D. 19

    Term Specificity

    Assuming a good entry vocabulary, increased term specificity allows for improved precision without hurting recall (but costs grow fast).

    Vocabulary A: United States

    Vocabulary B: United States > California > San Diego

  • Amy J. Warner, Ph.D. 20

    Compound Terms

    Article Title: Software for Information Architects

    [Diagram: a scale running from High Precision (top) to High Recall (bottom)]

    One Term: Information Architecture Software

    Two Terms: Information Architecture; Software

    Three Terms: Architecture; Information; Software
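    The precision/recall trade-off can be illustrated by indexing the same article three ways and seeing which single-term queries find it. This sketch assumes exact matching on index terms and reads the two-term case as "Information Architecture" plus "Software"; `retrieved_by` is a hypothetical helper.

```python
# Three ways to index the same article, from most to least pre-coordinated.
INDEXINGS = {
    "one term":    {"Information Architecture Software"},
    "two terms":   {"Information Architecture", "Software"},
    "three terms": {"Architecture", "Information", "Software"},
}


def retrieved_by(query_term):
    """Return the indexings under which the query term finds the article."""
    return [name for name, terms in INDEXINGS.items() if query_term in terms]


print(retrieved_by("Information Architecture Software"))
print(retrieved_by("Software"))
print(retrieved_by("Architecture"))
```

    The broad query "Architecture" only reaches the article under three-term indexing, which raises recall; the same term would also pull in documents about buildings, which lowers precision.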

  • Amy J. Warner, Ph.D. 21

    Facets

    Facets of a Topic: Things (entities), Concepts, Processes, People, Organizations, Occupations, etc.

    Facets of Documents: Topic, Audience, Intellectual Level, Form, Type, Language, Date, etc.

    Aspects of Documents to Index

    Controlled Vocabular(ies)

  • Amy J. Warner, Ph.D. 22

    Facet Analysis

    Facets come from content inventory, intuition, and users.

    Break domain into logical categories or chunks based on how documents need to be managed (both for system and for search).

  • Amy J. Warner, Ph.D. 23

    Polyhierarchy

    Strict Hierarchies

      Each term appears in only one place in the hierarchy.

      Essential for placement of physical objects.

    Polyhierarchies

      Terms cross-listed in multiple categories.

      Accepts complex nature of reality.

  • Amy J. Warner, Ph.D. 24

    Polyhierarchy

    Compound terms needed to manage 6 million documents in Medline.

    High level of pre-coordination forces polyhierarchy.

    Terms may have more than one BT.

    [Diagram: Viral Pneumonia sits under both Virus Diseases and Respiratory Tract Diseases, which both sit under Diseases]

    Medical Subject Headings (MeSH)
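    A polyhierarchy can be stored by letting each term list more than one broader term. The sketch below uses the four headings from this slide; the exact tree shape is my reading of the diagram, and `ancestors` and `narrower` are hypothetical helpers, not part of MeSH.

```python
# Broader-term (BT) links; a term may list more than one BT.
BT = {
    "Viral Pneumonia": ["Virus Diseases", "Respiratory Tract Diseases"],
    "Virus Diseases": ["Diseases"],
    "Respiratory Tract Diseases": ["Diseases"],
    "Diseases": [],
}


def ancestors(term):
    """Collect every broader term reachable from `term`, at any level."""
    found = set()
    stack = list(BT.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(BT.get(current, []))
    return found


def narrower(term):
    """Return the direct narrower terms (NT) of `term`, derived from BT."""
    return sorted(t for t, parents in BT.items() if term in parents)


print(sorted(ancestors("Viral Pneumonia")))
print(narrower("Diseases"))
```

    Because the NT links are derived from the BT links, a term added under two parents shows up in both branches without being stored twice.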

  • Amy J. Warner, Ph.D. 25

    Facets, Coordination, Specificity

    Entities: Apples, Pears, Peaches

    Processes: Canning, Freezing, Drying

    Forms: Canned, Frozen, Fresh

    Partial List of Potential Combinations:

      Apples, Pears, Peaches, Canning, Freezing, Drying, Canned, Frozen, Fresh

      Canning of Apples, Canning of Pears, Canning of Peaches

      Freezing of Apples, Freezing of Pears, Freezing of Peaches

      Drying of Apples, Drying of Pears, Drying of Peaches

      Canned Apples, Canned Pears, Canned Peaches

      Frozen Apples, Frozen Pears, Frozen Peaches

      Fresh Apples, Fresh Pears, Fresh Peaches

      Freezing of Canned Apples, Canning of Dried Pears, Drying of Fresh Peaches
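    The growth in pre-coordinated headings can be counted directly. This sketch uses the three facets from the slide and only three phrase patterns ("Process of Entity", "Form Entity", "Process of Form Entity"); the patterns and variable names are assumptions chosen to mirror the slide's examples.

```python
from itertools import product

entities = ["Apples", "Pears", "Peaches"]
processes = ["Canning", "Freezing", "Drying"]
forms = ["Canned", "Frozen", "Fresh"]

# Post-coordinated: keep three short lists and combine them at search time.
faceted_terms = entities + processes + forms

# Pre-coordinated: enumerate each compound heading in advance.
process_of_entity = [f"{p} of {e}" for p, e in product(processes, entities)]
form_entity = [f"{f} {e}" for f, e in product(forms, entities)]
process_of_form_entity = [
    f"{p} of {f} {e}" for p, f, e in product(processes, forms, entities)
]

print(len(faceted_terms))
print(len(process_of_entity) + len(form_entity) + len(process_of_form_entity))
```

    Nine faceted terms cover the same ground as 45 compound headings here, and each value added to a facet widens the gap.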

  • Amy J. Warner, Ph.D. 26

    Semantic Relationships

    Equivalence: Use/Used For (USE/UF)

      Leads from variants to preferred.

      e.g., prams: USE baby carriages

    A = B

  • Amy J. Warner, Ph.D. 27

    Semantic Relationships

    Hierarchical: Broader Term/Narrower Term (BT/NT)

    Types

      Generic (class/species, inheritance): Vertebrata NT Amphibia

      Whole-Part (associative unless exclusive): Ear NT Vestibular Apparatus

      Instance (proper name): Seas NT Mediterranean Sea

    AB

  • Amy J. Warner, Ph.D. 28

    Semantic Relationships

    Associative: Related Term (RT, See Also)

      Non-hierarchical and non-equivalent.

      Relation should be strongly implied.

    e.g., hammers RT nails

    A B

  • Amy J. Warner, Ph.D. 29

    Associative Relationships

    Field of Study and Object of Study: Forestry RT Forests

    Process and its Agent: Temperature Control RT Thermostat

    Concepts and their Properties: Poisons RT Toxicity

    Action and Product of Action: Weaving RT Cloth

    Concepts Linked by Causal Dependence: Bereavement RT Death
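    The three relationship types (USE/UF, BT/NT, RT) can be held in one small structure that records each reciprocal link automatically. The `Thesaurus` class and its method names are hypothetical; the sample terms come from the slides.

```python
from collections import defaultdict


class Thesaurus:
    """Stores USE/UF, BT/NT and RT links, adding each reciprocal automatically."""

    def __init__(self):
        self.use = {}                     # variant -> preferred (USE)
        self.used_for = defaultdict(set)  # preferred -> variants (UF)
        self.bt = defaultdict(set)        # term -> broader terms
        self.nt = defaultdict(set)        # term -> narrower terms
        self.rt = defaultdict(set)        # term -> related terms

    def add_equivalence(self, variant, preferred):
        self.use[variant] = preferred
        self.used_for[preferred].add(variant)

    def add_hierarchy(self, broader, narrower):
        self.nt[broader].add(narrower)
        self.bt[narrower].add(broader)

    def add_related(self, a, b):
        # RT is symmetric, so record it in both directions.
        self.rt[a].add(b)
        self.rt[b].add(a)


t = Thesaurus()
t.add_equivalence("prams", "baby carriages")
t.add_hierarchy("Vertebrata", "Amphibia")
t.add_hierarchy("Seas", "Mediterranean Sea")
t.add_related("hammers", "nails")
t.add_related("Forestry", "Forests")

print(t.use["prams"])
print(sorted(t.bt["Amphibia"]))
print(sorted(t.rt["nails"]))
```

    Writing the reciprocal at insertion time keeps the vocabulary consistent: an NT never exists without its BT, and an RT always reads the same from either side.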

  • Amy J. Warner, Ph.D. 30

    Leveraging the Thesaurus

    User Interface:

      Generate browsable indexes (site-wide, sub-site, specialized authority lists).

      Enable field-specific searching (filters, zones, sorting).

      Support personalization (map profile to vocabulary).

    Behind the Scenes:

      Enable efficient content management.

      Support decentralized tagging.

  • Amy J. Warner, Ph.D. 31

    Uses of Metadata-Driven Website

    Routing

    Search

    Navigation

  • Amy J. Warner, Ph.D. 32

    Routing

    [Diagram: Document Stream (from individual contributors or syndication service) -> Metadata Filter (profile or filter) -> Document Subset]

  • Amy J. Warner, Ph.D. 33

    Generalizations about Routing

    Can be push or pull.

    Can be driven by various metadata elements (e.g., audience, topic, etc.).

    May have both internal and external metadata schemes to consider; mapping may be an important issue.
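    Routing as described here is a filter over a stream of metadata records. The sketch below is one possible reading: a profile maps each metadata element to the values a reader accepts, and a document passes only if every element matches. The `route` function, the sample documents, and the profile are all hypothetical.

```python
def route(document_stream, profile):
    """Yield documents whose metadata matches every element in the profile.

    `profile` maps a metadata element to the set of values a reader accepts.
    """
    for doc in document_stream:
        if all(doc.get(element) in accepted
               for element, accepted in profile.items()):
            yield doc


# Hypothetical incoming documents and reader profile.
stream = [
    {"title": "Benefits update", "audience": "Employees", "topic": "HR"},
    {"title": "Product launch", "audience": "Customers", "topic": "Marketing"},
    {"title": "Expense policy", "audience": "Employees", "topic": "Finance"},
]
profile = {"audience": {"Employees"}, "topic": {"HR", "Finance"}}

subset = list(route(stream, profile))
print([d["title"] for d in subset])
```

    The same function serves push (run it as documents arrive) and pull (run it when the reader asks), since it only depends on the metadata.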

  • Amy J. Warner, Ph.D. 34

    Searching

    [Diagram: User Query -> Databases (Metadata Records) -> Document Subset]

  • Amy J. Warner, Ph.D. 35

    Epicurious.com

  • Amy J. Warner, Ph.D. 36

    Epicurious, First Facet

    Browse > Picnics

  • Amy J. Warner, Ph.D. 37

    Epicurious.com Facets

    Main Ingredients: Beans, Beef, Berries, Cheese, Chocolate, Citrus, Dairy, Eggs, Fish, Fruits, Garlic, Ginger, Grains, Greens, Herbs, Lamb, Mushrooms, Mustard, Nuts, Olives, Onions, Pasta, Peppers, Pork, Potatoes, Poultry, Rice, Shellfish, Tomatoes, Vegetables

    Cuisine: African, American, Asian, Caribbean, Eastern European, French, Greek, Indian, Italian, Jewish, Mediterranean, Mexican, Middle Eastern, Scandinavian, Spanish

    Preparation Method: Advance, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cook, Poach, Quick, Roast, Sauté, Slow Cook, Steam, Stir Fry

    Season/Occasion: Christmas, Easter, Fall, Fourth of July, Hanukkah, New Year's, Picnics, Spring, Summer, Superbowl, Thanksgiving, Valentine's Day, Winter

    Course/Dish: Appetizers, Bread, Breakfast, Brunch, Condiments, Cookies, Desserts, Hors D'oeuvres, Main Dish, Salads, Sandwiches, Sauces, Side Dish, Snacks, Soup, Vegetables

  • Amy J. Warner, Ph.D. 38

    Epicurious, Second Facet

    Browse > Picnics > Poultry
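    Narrowing by a first and then a second facet can be sketched as successive filters over tagged items. The recipes below are invented for illustration and this is not how Epicurious is implemented; only the facet names and values come from the slides, and `browse` is a hypothetical helper.

```python
# Hypothetical recipes tagged with values from two of the facets.
recipes = [
    {"name": "Fried chicken", "Season/Occasion": {"Picnics", "Summer"},
     "Main Ingredients": {"Poultry"}},
    {"name": "Potato salad", "Season/Occasion": {"Picnics"},
     "Main Ingredients": {"Potatoes", "Eggs"}},
    {"name": "Roast turkey", "Season/Occasion": {"Thanksgiving"},
     "Main Ingredients": {"Poultry"}},
]


def browse(items, *selections):
    """Narrow items by successive (facet, value) selections."""
    for facet, value in selections:
        items = [i for i in items if value in i.get(facet, set())]
    return items


first = browse(recipes, ("Season/Occasion", "Picnics"))
second = browse(recipes, ("Season/Occasion", "Picnics"),
                ("Main Ingredients", "Poultry"))
print([r["name"] for r in first])
print([r["name"] for r in second])
```

    Each selection only removes items, so the order in which a user picks facets does not change the final result.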

  • Amy J. Warner, Ph.D. 39

    Integration of Search and Browse

  • Amy J. Warner, Ph.D. 40

    Integration of Search and Browse

  • Amy J. Warner, Ph.D. 41

    Amazon.com Advanced Search

  • Amy J. Warner, Ph.D. 42

    Generalizations about Search & Navigation

    The relationship between the metadata and search engine capabilities is crucial.

    Controlled vocabulary and keyword searching are often both enabled.

    Navigation and search are often both provided as complements to each other.

  • Amy J. Warner, Ph.D. 43

    Contact: Amy J. Warner, [email protected]

    Questions?
