KATHOLIEKE UNIVERSITEIT LEUVEN
FACULTEIT INGENIEURSWETENSCHAPPEN
DEPARTEMENT COMPUTERWETENSCHAPPEN
AFDELING INFORMATICA
Celestijnenlaan 200 A — B-3001 Leuven

A Dynamic Learning Object Life Cycle and its Implications for Automatic Metadata Generation

Promotor:

Prof. Dr. H. OLIVIÉ

Dissertation submitted in partial fulfilment of the requirements for the degree of Doctor in Engineering

by

Kris CARDINAELS

June 2007


KATHOLIEKE UNIVERSITEIT LEUVEN
FACULTEIT INGENIEURSWETENSCHAPPEN
DEPARTEMENT COMPUTERWETENSCHAPPEN
AFDELING INFORMATICA
Celestijnenlaan 200 A — B-3001 Leuven

A Dynamic Learning Object Life Cycle and its Implications for Automatic Metadata Generation

Jury:
Prof. Dr. ir. D. Vandermeulen, voorzitter
Prof. Dr. H. Olivié, promotor
Prof. Dr. ir. E. Duval
Prof. Dr. D. De Schreye
Prof. Dr. M.-F. Moens
Prof. Dr. Daniel R. Rehak (University of Memphis, USA)

Dissertation submitted in partial fulfilment of the requirements for the degree of Doctor in Engineering

by

Kris CARDINAELS

U.D.C. 681.3*H1, 681.3*H3

June 2007


© Katholieke Universiteit Leuven – Faculteit Ingenieurswetenschappen
Arenbergkasteel, B-3001 Heverlee (Belgium)

Alle rechten voorbehouden. Niets uit deze uitgave mag worden vermenigvuldigd en/of openbaar gemaakt worden door middel van druk, fotocopie, microfilm, elektronisch of op welke andere wijze ook zonder voorafgaande schriftelijke toestemming van de uitgever.

All rights reserved. No part of the publication may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the publisher.

D/2007/7515/39
ISBN 978-90-5682-805-9


Foreword

Immediately after my training as licentiaat in Informatica (Computer Science), I started in 1995 as a staff member of Prof. Henk Olivié, where I was given the opportunity to contribute to ARIADNE, the international project from which this doctoral work has now grown.

Since a doctorate cannot come about without the support of a research group, I also want to thank all the (former) members of HMDB for the good cooperation. First of all, of course, Henk Olivié for offering me the opportunity to start this research and for his support throughout all these years. Second, Erik Duval for the guidance within the ARIADNE project and the continuation of my research. And finally the colleagues I have worked with within HMDB: Koen Hendrikx, Raf Van Durm, Bart Verhoeven, Thomas Cleenewerck, Stefaan Ternier and Michael Meire.

I also want to thank the colleagues in Communicatie- en Multimediadesign at KHLim for the often lively discussions about research in different domains. These sometimes gave me a very different view on the subject and helped me to finish the story.

Thanks to my parents, my parents-in-law, my sister, my brothers-in-law and sisters-in-law for their support over all these years. Thanks to everyone who asked me, at every possible and impossible moment, how my thesis was coming along and thereby kept the pressure on. But above all, thanks to Christine, Stan, Simon, Marthe and Kaat: you know why.


A Dynamic Learning Object Life Cycle and its Implications for Automatic Metadata Generation

Starting from a reuse life cycle for courseware development, we define a dynamic learning object life cycle in which the dynamic character of the metadata is the key issue. The dynamic metadata help to enhance the reusability of the learning object as the metadata can contain much richer information. In this life cycle, we have taken the labeling phase out of the cycle and placed it in parallel with all the other phases. In this way, metadata can be added or updated throughout all the phases of the life cycle.

Second, we have developed a framework of automatic metadata generation for learning objects to overcome the problems with manual indexing. Within this framework, metadata are generated automatically taking into account different sources of information that are available in the different phases of the life cycle. Examples are the learning objects themselves, user feedback, relationships with other learning objects, and so on. The implementation of this framework is based on a formal model of learning object metadata. This model defines how metadata can be associated with learning objects and how the metadata from different sources can be combined to overcome conflicts between sources. In the formal model we also introduce the notion of context-awareness for learning object metadata.

Een Dynamische Levenscyclus voor Leerobjecten en de Gevolgen voor Automatische Metadatageneratie

Vertrekkende van een levenscyclus gebaseerd op hergebruik voor de ontwikkeling van cursusmateriaal, definiëren we een dynamische levenscyclus voor leerobjecten waarbij het dynamische karakter van de metadata de belangrijkste rol speelt. Deze dynamische metadata helpen de herbruikbaarheid van de leerobjecten te verbeteren doordat de metadata veel rijkere informatie kunnen bevatten. In de levenscyclus wordt dit benadrukt door de indexeringsfase uit de cyclus te halen en parallel te plaatsen aan al de andere fasen van de cyclus. Op deze manier kunnen metadata op elk moment van de andere fasen worden toegevoegd of aangepast.

Ten tweede hebben we een raamwerk voor automatische metadatageneratie ontwikkeld om de problemen met manuele indexering op te lossen. In dit raamwerk worden de metadata automatisch gegenereerd aan de hand van verschillende bronnen van informatie die beschikbaar zijn in de verschillende fasen van de levenscyclus. Voorbeelden hiervan zijn de leerobjecten zelf, gebruikersfeedback en relaties met andere leerobjecten. Om dit raamwerk te implementeren, hebben we een formeel model voor leerobject-metadata gedefinieerd. Dit model definieert hoe metadata met leerobjecten geassocieerd kunnen worden en hoe metadata van verschillende bronnen gecombineerd kunnen worden door eventuele conflicten op te lossen. In het formele model introduceren we ook context-afhankelijke leerobject-metadata.


Contents

Contents
List of Acronyms
List of Figures
List of Tables
Preface

1 Learning Objects & Learning Object Metadata
  1.1 Introduction
  1.2 Learning Objects
    1.2.1 Categories of Definitions
    1.2.2 The Interest in Reuse
  1.3 Learning Object Metadata
    1.3.1 Definition
    1.3.2 IEEE Learning Object Metadata (LOM)
  1.4 ARIADNE
    1.4.1 Background
    1.4.2 ARIADNE Metadata Recommendation
    1.4.3 Knowledge Pool System
    1.4.4 ARIADNE Life Cycle
  1.5 Related Work
    1.5.1 Dublin Core Metadata Initiative
    1.5.2 IMS
    1.5.3 SCORM
    1.5.4 Merlot
  1.6 Conclusions

2 Dynamic Learning Object Life Cycle
  2.1 Introduction
  2.2 Reuse Life Cycle
    2.2.1 Two Development Phases
    2.2.2 Knowledge Transfer between the Phases
  2.3 Traditional Life Cycle
  2.4 Dynamic Life Cycle
  2.5 The Labeling Phase
    2.5.1 Information Retrieval/Extraction
    2.5.2 Social Information Retrieval
    2.5.3 Explicit Feedback
    2.5.4 Implicit Feedback
    2.5.5 Discussion
  2.6 Related Learning Object Life Cycles
    2.6.1 Alternative Presentation of the Traditional Life Cycle
    2.6.2 Learning Objects Ontology
    2.6.3 COLIS Global Use Case
    2.6.4 Digital Library Framework - Discovery to Delivery
    2.6.5 Discussion
  2.7 Conclusions

3 Automatic Metadata Generation
  3.1 Introduction
  3.2 Metadata Sources
    3.2.1 Manual Metadata
    3.2.2 Object Contents
    3.2.3 Contexts
    3.2.4 Actual Use
  3.3 Different Sources - Different Values
    3.3.1 Quality of Metadata
    3.3.2 Uncertain Metadata
    3.3.3 Conflicts Between Sources
    3.3.4 Using Confidence Values to Solve Conflicts
  3.4 Conclusions

4 Formal Reuse and Metadata Model
  4.1 Introduction
  4.2 Learning Objects and Metadata
    4.2.1 Learning Objects
    4.2.2 Single-valued Metadata Elements
    4.2.3 Multi-valued Metadata Elements
    4.2.4 Metadata Records
  4.3 Fuzzy Metadata
    4.3.1 Basic Definitions
    4.3.2 Selecting Facet Values
    4.3.3 Information Retrieval Models
  4.4 Context-Awareness in Metadata
    4.4.1 Contexts
    4.4.2 Definitions
    4.4.3 Context Reification
  4.5 RDF Metadata Model
  4.6 Metadata Standards
    4.6.1 IEEE LOM
    4.6.2 Dublin Core Metadata Initiative
  4.7 Metadata Propagation
    4.7.1 Discussion
    4.7.2 Context-aware Propagation
  4.8 Conclusions

5 Automatic Metadata Generation Framework
  5.1 Introduction
  5.2 Overall Structure
    5.2.1 ObjectBasedIndexer
    5.2.2 ContextBasedIndexer
    5.2.3 Confidence Values
  5.3 The Simple Indexing Interface
  5.4 Evaluation
    5.4.1 ToledoBBContextBasedIndexer
    5.4.2 Results
    5.4.3 Further Evaluation
    5.4.4 Related Experimentation
  5.5 Related Work
  5.6 Conclusions

6 General Conclusions
  6.1 Summary
  6.2 Further Research Topics
  6.3 Final Reflection

7 Een Dynamische Levenscyclus voor Leerobjecten en de Gevolgen voor Automatische Metadatageneratie
  7.1 Leerobjecten en Leerobject-metadata
    7.1.1 Leerobjecten
    7.1.2 Leerobject-metadata
  7.2 Dynamische Levenscyclus
    7.2.1 Levenscyclus
    7.2.2 Technologieën om Informatie te Vergaren
  7.3 Automatische Metadata-Generatie
    7.3.1 Metadatabronnen
    7.3.2 Meerdere Bronnen - Meerdere Waarden
  7.4 Formele Model
  7.5 Raamwerk voor Automatische Metadatageneratie
    7.5.1 Klassenstructuur
    7.5.2 The Simple Indexing Interface
  7.6 Besluit
    7.6.1 Mogelijk Verder Onderzoek

Bibliography
Curriculum Vitae
Publicaties van de Doctorandus

A The Formal Model, Summary
  A.1 Learning Objects and Metadata
  A.2 Fuzzy Metadata
  A.3 Selecting Facet Values
  A.4 Context-awareness
  A.5 Metadata Propagation


List of Acronyms

AMG      Automatic Metadata Generation
API      Application Program Interface
ARIADNE  Alliance of Remote Instructional Authoring and Distribution Networks for Europe
DCMI     Dublin Core Metadata Initiative
COLIS    Collaborative Online Learning and Information Systems
ELF      E-Learning Framework
IEEE     Institute of Electrical and Electronics Engineers
KPS      Knowledge Pool System
LCMS     Learning Content Management System
LMS      Learning Management System
LOR      Learning Object Repository
LOM      Learning Object Metadata
LTSC     Learning Technology Standardization Committee
MIME     Multipurpose Internet Mail Extensions
OAI      Open Archives Initiative
OCW      Open CourseWare
OWL      Web Ontology Language
RDF      Resource Description Framework
RLO      Reusable Learning Object
SCORM    Sharable Content Object Reference Model
SII      Simple Indexing Interface
SQI      Simple Query Interface
XML      Extensible Markup Language
URI      Uniform Resource Identifier


List of Figures

1.1 The CISCO RLO-RIO Structure
1.2 The ARIADNE Structure
1.3 The SCORM Bookshelf
1.4 The SCORM Content Aggregation Model

2.1 Reuse Life Cycle [Vittorini and Di Felice, 2000]
2.2 The Traditional Learning Object Life Cycle
2.3 The Dynamic Learning Object Life Cycle
2.4 Use Scenario for Learning Resources [Van Assche and Vuorikari, 2006]
2.5 Information Retrieval/Extraction in the Life Cycle
2.6 Learning Object Illustrating a Sorting Algorithm
2.7 Language Detection in MS Word
2.8 Learning Object Metadata which may be 'copied' to Related Learning Objects
2.9 Learning Object Comparing Sorting Algorithms
2.10 Social Information Retrieval/Extraction in the Life Cycle
2.11 Explicit Feedback Mechanisms in the Life Cycle
2.12 Implicit Feedback in the Life Cycle
2.13 Alternative Presentation proposed by [Strijker, 2004]
2.14 First-level Issues of the Learning Objects Ontology
2.15 COLIS Global Use Case
2.16 Workflow in the D2D Model

3.1 Metadata Information Sources in the Life Cycle
3.2 A combination of manually and automatically filled in metadata in the Blackboard LMS. The metadata elements Title, Language, Creation Date and Resource Format are filled in automatically by the system; all elements can be modified by the user.
3.3 Learning object properties available for metadata harvesting in MS Word.
3.4 The Attention Metadata Management Framework [Najjar et al., 2005]

4.1 A Simple RDF Graph
4.2 A Structured RDF Graph
4.3 A Blank Node in RDF
4.4 Resource State/Condition Description Framework
4.5 DCMI Resource Model
4.6 DCMI Description Model
4.7 Metadata Inheritance and Accumulation

5.1 Overall Structure of the Automatic Indexing Framework
5.2 The Class Hierarchy of ObjectBasedIndexer
5.3 The Class Hierarchy of ContextBasedIndexer
5.4 Information Flow in the AMG Framework
5.5 Typical SII Sequence Diagram
5.6 Average Quality Grade for the Metadata [Meire et al., 2007]
5.7 Expected Accuracy Level for Automatic Generation of Dublin Core [Greenberg et al., 2005]
5.8 IBM MAGIC's object analyzers and metadata integration [Li et al., 2005]
5.9 Beagle's Indexing Process

7.1 De Dynamische Levenscyclus voor Leerobjecten


List of Tables

1.1 The Dublin Core Element Set

2.1 Mapping of the Learning Object Ontology to the Dynamic Life Cycle
2.2 Comparison of the Different Life Cycle Models

4.1 Language Detection Accuracy

5.1 Methods of the Simple Indexing Interface
5.2 Metadata Elements Generated by the Toledo Context
5.3 Metadata Elements Generated by the Object-based Indexers
5.4 Comparing Automatic and Manual Metadata


Preface

In the last decade, learning objects have gained a lot of interest as the basis of a new type of computer-based instruction in which the instructional content is created from individual components. The interesting aspects of these learning objects are their reusability, adaptability and scalability. One of the current drawbacks is their availability, defined as the possibility to retrieve the appropriate learning object from a large set. An important aspect of availability is the usefulness of the associated metadata. The goal of this research is to optimize the availability of learning objects in order to increase their reusability.

This optimization is performed by looking at two major aspects of metadata. First of all, learning object metadata are considered to have a dynamic character. This is represented in the dynamic life cycle of learning objects, which is opposed to the traditionally static approach of the life cycle. Within this life cycle, the labeling phase - the phase in which metadata about the learning object are added - plays an important role, because it allows a continuous update of the metadata during the life cycle of the learning object.

Furthermore, a framework of automatic metadata generation for learning objects is implemented to overcome the problems with manual indexing. Within this framework, metadata are generated automatically taking into account different sources of information that are available in the different phases of the life cycle. Examples are the learning objects themselves, user feedback, relationships with other learning objects, and so on. The implementation of this framework is based on a formal model of learning object metadata. This model defines how metadata can be associated with learning objects and how the metadata from different sources can be combined to overcome conflicts between sources. In the formal model we also introduce the notion of context-awareness for learning object metadata.


Contents

Chapter 1: Learning Objects & Learning Object Metadata

This chapter considers which definitions for learning objects and learning object metadata apply in the context of the dynamic life cycle and automatic metadata generation. Both for learning objects and for their metadata, multiple definitions are in use, depending on the context. Chapter 1 provides the appropriate definitions and the focal points of this work.

As a general introduction to this research, Chapter 1 refers to current work concerning learning objects and metadata, such as the IEEE LTSC LOM standard for learning object metadata and the ARIADNE system. The latter has been the working context for this research. The chapter also briefly describes related research projects which are referred to throughout this text.

The conclusions of Chapter 1 list several problems with learning objects and metadata, which motivate the further structure of this thesis.

Chapter 2: Dynamic Metadata in the Learning Object Life Cycle

This chapter deals with the definition of the dynamic learning object life cycle. This is considered an important factor to enhance the reusability of learning objects, because the metadata of the learning objects will be much richer, representing, among other things, real use information. The dynamic life cycle replaces the traditional static life cycle approach to learning objects.

In this dynamic life cycle, the labeling phase is taken out of the life cycle and placed in parallel with all the other phases. In this way, metadata can be added or updated throughout all the phases of the life cycle.

Chapter 2 describes the different sources of information that become available during the life of a learning object and how they can be applied in the labeling phase.

Chapter 3: Automatic Metadata Generation

Automatic metadata generation is important for two reasons. First, research has proven that automatic indexing can be as good as manual indexing. Second, it helps to acquire the critical mass of learning objects needed to establish real reuse.

We consider several reasons why users rarely make learning objects available for reuse or do not create metadata for those objects. As a consequence, that critical mass is not obtained. Most importantly, the current tools available for metadata creation are not user friendly. Most tools directly relate to some standard and present that standard to the users. The user has to fill in a substantial number of electronic forms. However, the standards are not meant for end users. A direct representation of these standards on forms makes it very difficult and time consuming to fill out the correct values for metadata in substantial quantities.

The aspect of automatic metadata generation is tackled by describing how metadata can be acquired from different sources of information about the learning objects and how these values can be combined into the metadata for that object. The contribution of this chapter is not the creation of new algorithms or techniques that derive metadata, but a means of integrating different methods to acquire appropriate metadata.

The automatic metadata generation process closely relates to the dynamic life cycle because it defines how the different sources of information in the different phases can be applied to provide metadata. The dynamic life cycle enables the application of more advanced information than object-related information only, such as the analysis of the actual learning object use and context information.

The use of different sources of metadata introduces possible conflicts between values. Therefore, we have to provide a solution to resolve these conflicts. Chapter 3 introduces a confidence value as a possible solution.

Chapter 4: Formal Reuse and Metadata Model

In this chapter, we formalize the automatic metadata generation model described in Chapter 3. Such a formal model helps to build formal reasoning about metadata generation and metadata propagation. In Chapter 5, an automatic indexing framework is developed based on the theory described and the formal model given in Chapter 4.

The definition of the formal model is split into three parts. In the first part, learning objects are considered on their own. This means that we do not look at their use in learning environments or their relationships with other learning objects. In this approach the learning objects are also considered black boxes with an unknown internal structure.

In the second part, those learning objects are placed in a context of use, related to other learning objects and possibly exposing a defined structure. This approach is more realistic because learning objects are meant to be used in learning contexts and do have relationships with other learning objects within those contexts.


In the third part of this chapter, we introduce metadata propagation rules that can be used to generate metadata in a specific reuse situation, namely content aggregation and disaggregation. The rules that are defined are an initial proposal, which requires further investigation.

Chapter 5: Automatic Metadata Generation Framework

This chapter describes the reference framework that has been implemented for automatic metadata generation. The implementation is based on the explanations given in the previous chapters.

The development of this framework has partially been performed within the context of this research, and continues as part of related work. Therefore, the explanation of the framework can be divided into two parts. In the first, the implementation of automatic indexing is considered. In the second, the framework as a service-oriented architecture is described. The main contribution of the research described in this thesis relates to the first part, the automatic indexing. In this thesis, only an initiation of the service-oriented architecture is developed, called the Simple Indexing Interface. This is currently being developed further as the Simple AMG Interface, focusing on the interoperability with learning object repositories.

Chapter 6: General Conclusions

This last chapter provides a general conclusion for this research and points to options for further research concerning the reusability of learning objects.

In this chapter we compare the approach of this work with the current developments on the semantic web and Web 2.0. Metadata, and the possibility to update these metadata continuously, play an important role for the semantic web and relate closely to our work.


Chapter 1

Learning Objects & Learning Object Metadata

1.1 Introduction

In this chapter we introduce the domain of learning objects and learning object metadata. The focus of our research is on enabling learning object reuse through the availability of appropriate metadata. This focus is reflected in the definition we apply for learning objects in section 1.2, and for learning object metadata in section 1.3.

Section 1.4 describes ARIADNE, the operational context of this research, in which we have developed our automatic indexing framework (see Chapter 5). In section 1.5 we briefly describe related research projects which we often refer to throughout this text.

In section 1.6 we provide some conclusions and summarize the issues that we will tackle in the remainder of this thesis.

1.2 Learning Objects

David Wiley explains the popularity of learning objects in e-learning as follows [Wiley, 2001]:

An instructional technology called "learning objects" currently leads [...] the position of technology of choice in the next generation of instructional design, development, and delivery, due to its potential for reusability, generativity, adaptability, and scalability.


In this description, Wiley introduces an instructional technology which facilitates the design, development, and delivery of learning material. This technology is called 'learning objects'. The potential of this technology is gained from the basis upon which it is built, namely the individual components, also called 'learning objects'. These components have the potential for reusability, generativity, adaptability, and scalability. In this research, we focus on the second approach to learning objects, i.e. on the components that can be used in the instructional technology to develop e-learning, instead of on the learning technology itself.

Learning objects have been approached from many different points of view, resulting in a plethora of definitions. Because of this, it is important that we clearly describe which definition is appropriate for our purpose and which aspects we do not focus on. In the next section we describe the different options for defining learning objects, and explain which definition we are using.

1.2.1 Categories of Definitions

Depending on the focus of research, several definitions of learning objects exist; [Farance, 1999] distinguishes at least three categories of definitions. The definitions in these categories do not have to focus on a single aspect; mostly they combine several important aspects.

  • Reusability

    The idea of reusable components in instruction is not new. In 1965, Ted Nelson already described settings where information and course design were based on the use of reusable objects taken from interconnected digital libraries [Nelson, 1965; Oliver, 2001]. Reusability of learning material is one of the main points of interest in learning objects and is mentioned in most of the definitions found.

    This object-based principle is based upon the idea that a course or lesson can be built from reusable instructional components which can be built separately and modified to the user's needs. A learning object is a self-contained component with associated metadata that allow the object to be reused in different contexts. Additionally, learning objects are generally understood to be digital entities deliverable over the internet, making them accessible and usable by multiple users in parallel [Wiley, 2001].

    Therefore, learning objects are often related to the object-oriented paradigm of software development, although reusability is probably the only link between the two.

Learning resources are objects in an object-oriented model. They have methods and properties. Typical methods include rendering and assessment methods. Typical properties include content and relationships to other resources [Friesen, 2001].

The object-oriented paradigm includes many more principles, such as encapsulation and inheritance, which do not apply to learning objects. For example, content models explicitly describe the internal structure of learning objects in different types of components, which contradicts the idea of encapsulation of objects.

Most definitions of learning objects explicitly refer to the idea of reusability. The IEEE Learning Technology Standardization Committee (LTSC) uses it as the basis for its definition [IEEE, 2002]:

A learning object is defined as any entity, digital or non-digital, which can be re-used or referenced during technology-supported learning.

McGreal also focuses on the reusability of the learning objects, but emphasizes the importance of standards, both for the learning objects and the metadata that describe them [McGreal, 2002]:

Learning objects enable and facilitate the use of educational content online. Internationally accepted specifications and standards make them interoperable and reusable by different applications in diverse learning environments. The metadata that describe them facilitate searching and render them accessible.

• Goal

In this category of definitions, the focus is on the objective of the learning objects. Each learning object has an associated goal about the knowledge to be achieved or accomplished. This learning objective (or "learning outcome") is an explicit statement of what the learner is expected to demonstrate after the learning has been completed. A definition stressing this educational goal is the following one [1]:

A self-contained piece of learning material with an associated learning objective, which could be of any size and in a range of media. Learning objects are capable of re-use by being combined together with other objects for different purposes.

[1] Definition given at http://www.ict4lt.org/en/en glossary.htm, accessed October 30, 2006.

The CISCO Reusable Learning Object Strategy [Barritt and Lewis, 1999] defines Reusable Information Objects (RIO) as the base components for Reusable Learning Objects (RLO). A RIO is a collection of content items, practice items and assessment items that are combined based on a single learning objective. A Reusable Learning Object is based on a single objective; each RIO is built upon an objective that supports the RLO's objective. In practice, an RLO corresponds to a lesson, which is part of a module. Figure 1.1 shows how such an RLO is constructed from RIOs. Each RLO contains a pre-assessment that allows the learner to check whether he/she is able to start studying the RLO, and a post-assessment to see if the content has been successfully studied. In order to learn the content, from five up to nine components are combined into the learning object, preceded by an overview and followed by a summary object. Depending on the results of the assessments, the learner can go directly to the appropriate information objects, study the complete learning object or skip the content of the learning object.

Figure 1.1: The CISCO RLO-RIO Structure

The CISCO strategy also shows the importance of content models to enable reuse of learning objects. A content model describes the different types of learning objects and their components and how they can be combined to form reusable objects. It provides a more precise definition of what learning objects are and allows us to identify learning object components and repurpose them [Verbert and Duval, 2004]. Repurposing belongs to the domain of reuse, but stresses the importance of processing learning objects to make them available in multiple contexts, instead of direct reuse. Other content models focussing on these aspects are the SCORM Content Aggregation Model [ADL, 2004] and ALOCoM, a generalized content model for learning objects [Verbert and Duval, 2004].


• Containerization

A learning object is considered as a separable unit that has boundaries and hides its implementation from the outer world. Again, this category is implied by the object-oriented paradigm; one of the main principles in object-orientation is that of containerization, which helps to enable reuse because the objects become much less dependent on one another if the inner details are hidden. In practice, however, we think that the idea of encapsulation is not completely feasible for learning objects, because the inner details are important for effective reuse, e.g. through content repurposing.

Containerization itself refers explicitly to the possibility of transportation, i.e. copying or transmitting an object and its associated metadata from one location to another [Currie and Place, 2000]. Currently, this is also a subject of research, such as in the IMS Content Packaging Information Model [2] or the SCORM Content Aggregation Model [ADL, 2004].

Because of the black box principle in containerization, this focus also implies the need for metadata about learning objects. These metadata make it possible to select learning objects for use without knowing all the inner details.

On the other hand, the inner details of the learning object do become important for the automatic metadata generation process, because this process needs the contents and the structure to generate valuable metadata. This containerization aspect only applies to the use of learning objects for learning, and does not apply to our objective of metadata generation.

Next to these categories, [Farance, 1999] describes two other categories that are less commonly used; we briefly mention them here for completeness. In the first category, learning objects are treated as objects that learn from the interaction with the users. In the second, the focus is on the separation of content and structure: learning objects contain no course structure, but only content.

For our purpose, we can apply the definition of a learning object given by the IEEE Learning Technology Standardization Committee, mentioned before:

Any entity, digital or non-digital, which can be re-used or referenced during technology-supported learning.

[2] See http://www.imsglobal.org/content/packaging/index.html


1.2.2 The Interest in Reuse

(Learning object) reuse is the process of creating (instructional) components from existing components rather than building those components from scratch [Krueger, 1992]. In [Wiley, 2001], one suggestion for the interest in learning objects is given, based on the findings of Reigeluth and Nelson [Reigeluth and Nelson, 1997]:

When teachers first gain access to instructional materials, they often break the material down into their constituent parts. They then reassemble these parts in ways that support their individual instructional goals. This suggests one reason why reusable instructional components, or learning objects, may provide instructional benefits: if instructors received instructional resources as individual components, this initial step of decomposition could be bypassed, potentially increasing the speed and efficiency of instructional development.

If the speed and efficiency of instructional development is increased, reuse becomes very attractive for instructional designers. In this case, the cost of reuse is small enough compared to creating a new learning object. We can compare this with the attractiveness of software reuse, which depends on the costs of development and of the search for reusable components [Milli et al., 1995].

Generally, a distinction is made between black box reuse, whereby the object is reused without modifications, and white box reuse, whereby the component is adapted and then integrated. Although black box reuse is theoretically applicable to learning objects, it will occur only in very specific situations or for learning objects with a very low granularity (very small learning objects). White box reuse – learning object repurposing – is more applicable to learning objects.

The following discussion shows, from a theoretical point of view, when it is economically interesting to perform learning object reuse instead of developing from scratch. In practice, these calculations are less interesting because most of the factors cannot be calculated in advance.

Black box reuse becomes attractive if the following condition is met, where $C_{search}$ is the cost of searching for the appropriate learning object, $p$ the chance of finding the appropriate learning object, and $C_{development}$ the cost of developing the learning object from scratch:

$$p \times C_{search} + (1 - p) \times [C_{search} + C_{development}] \leq C_{development}$$

or

$$C_{search} + (1 - p) \times C_{development} \leq C_{development}$$

or

$$C_{search} \leq p \times C_{development}$$

Black box reuse is interesting if the chance of finding the appropriate learning object is large enough (large $p$) or if the cost of finding that learning object is small enough (small $C_{search}$).
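As a purely illustrative, worked instance of this condition (the figures are assumed, not taken from any study): suppose developing a learning object from scratch costs $C_{development} = 10$ hours and the chance of finding a suitable object is $p = 0.4$. Black box reuse then pays off only if

$$C_{search} \leq p \times C_{development} = 0.4 \times 10 = 4 \text{ hours},$$

i.e. only if, on average, less than four hours are spent searching the repository.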

This inequality reveals two important aspects of learning object repositories. The first is the need for large coverage, i.e. the need for a critical mass of learning objects available in the repository. The second is that searches must be effective and efficient. To establish the latter, appropriate metadata are an important aspect, and this is what we focus on in this research.

In the case of white box reuse, we have to distinguish between the cost of developing the object from scratch and that of adapting an existing object. The cost of this type of reuse can be expressed as:

$$C_{search} + (1 - p) \times (C_{approxSearch} + q \times C_{adaptation} + (1 - q) \times C_{development})$$

In this expression, $q$ expresses the chance of finding an appropriate learning object for adaptation to the user's needs, $C_{approxSearch}$ is the cost of searching for this adaptable learning object, and $C_{adaptation}$ is the cost of the adaptation.

For reuse to be attractive, the condition is still that its cost must be less than the cost of developing a new learning object from scratch:

$$C_{search} + (1 - p) \times (C_{approxSearch} + q \times C_{adaptation} + (1 - q) \times C_{development}) \leq C_{development}$$

Adaptation of an existing learning object is more interesting than creating a new learning object from scratch if the cost of the former is less than the cost of the development: $C_{adaptation} \leq C_{development}$. Using this bound, the inequality above can be reduced to the following sufficient condition:

$$C_{search} + (1 - p) \times C_{approxSearch} \leq p \times C_{development}$$

In other words, the cost of searching for a satisfactory learning object, which can be reused directly or must be adapted, should be less than the savings made for those cases ($100 \times p$ %) in which a learning object can be reused directly.

In this formula we accepted the premise that the cost of adaptation is less than the cost of developing from scratch. In general, however, the cost of adaptation grows fast as the portion of contents to be modified goes up. In this case, that premise might not be correct anymore. According to [Milli et al., 1995], it is fair to say that, for software reuse, white box reuse is cost-effective if it is restricted to those cases where modifications are minor, and thus the cost of adaptation is low. The goal of content models and repurposing, in the domain of learning objects, is to provide opportunities for cost-effective white box reuse, even with large modifications.

Both formulas stress the importance of low costs in searching for appropriate learning objects, whether for direct reuse or adaptation. Associating adequate metadata with the learning objects surely helps to decrease this cost and makes reuse more attractive.
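To make the two conditions easier to follow, the sketch below simply restates them as executable checks. It only encodes the inequalities above; the class name, the method names and the cost figures in main are hypothetical illustrations, not part of any ARIADNE tool.

```java
/**
 * Illustrative sketch of the reuse cost model discussed above.
 * All names and figures are hypothetical; the class only encodes the
 * expected-cost expressions for black box and white box reuse.
 */
public final class ReuseCostModel {

    /** Expected cost of black box reuse: search, and develop from scratch when nothing suitable is found. */
    static double blackBoxCost(double cSearch, double p, double cDevelopment) {
        return p * cSearch + (1 - p) * (cSearch + cDevelopment);
    }

    /** Expected cost of white box reuse: search, then adapt or develop when no directly reusable object exists. */
    static double whiteBoxCost(double cSearch, double p, double cApproxSearch,
                               double q, double cAdaptation, double cDevelopment) {
        return cSearch + (1 - p) * (cApproxSearch + q * cAdaptation + (1 - q) * cDevelopment);
    }

    public static void main(String[] args) {
        // Hypothetical cost figures, expressed in hours of effort.
        double cDevelopment = 10.0, cSearch = 1.0, cApproxSearch = 2.0, cAdaptation = 3.0;
        double p = 0.4; // chance of finding a directly reusable learning object
        double q = 0.7; // chance of finding an adaptable object when no exact match exists

        // Reuse is attractive when its expected cost stays below developing from scratch.
        System.out.println("Black box reuse attractive: "
                + (blackBoxCost(cSearch, p, cDevelopment) <= cDevelopment));
        System.out.println("White box reuse attractive: "
                + (whiteBoxCost(cSearch, p, cApproxSearch, q, cAdaptation, cDevelopment) <= cDevelopment));
    }
}
```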

1.3 Learning Object Metadata

1.3.1 Definition

Comparable to the situation with the definition of learning objects, a single definition of metadata does not exist. Different definitions are in use, depending on the working context. Literally, metadata are data about data – information about an object [IEEE, 2002]. The goal of metadata, however, is not only to provide a description of a set of data. The most common goal is enabling the discovery of the objects described. [Greenberg, 2002] generalizes this idea, and describes metadata as function enablers:

[Metadata are] structured data about an object that support functions associated with the designated object.

Metadata must support the activities and behaviors of the object, by enabling not only the discovery of the object, but also the application of the object. With respect to learning objects, the main function of the objects is supporting learning; in this case the metadata are the enablers of this learning, for example by making it possible to apply the learning object correctly or to find the appropriate learning object (based on its didactic properties), and so on.

Secondly, metadata are defined as structured data. Generally, metadata are classified in (at least) the following categories: descriptive metadata, structural metadata, and administrative metadata [NISO, 2004]. Other categories include rights management and preservation metadata. The structure of the metadata is defined in metadata schema specifications which serve as the model for a systematic ordering of the data.


Because the generation of metadata is also the focus of our research, we refer to the following definition, which stresses the importance of the creation of metadata:

Information about a data set which is provided by the data supplier or the generating algorithm and which provides a description of the content, format, and utility of the data set. Metadata provide criteria which may be used to select data for a particular (scientific) investigation. [3]

This definition also refers to metadata as function enablers, especially for the selection of data.

One of our objectives is to help the data supplier, i.e. the author of the learning object or the indexer, in his/her difficult task by creating automatic metadata suppliers that analyze the learning objects and the contexts in which they are used, and return metadata based on this analysis.

1.3.2 IEEE Learning Object Metadata (LOM)

In 2002, the IEEE standard 1484.12.1, the Learning Object Metadata Data Model, was defined [IEEE, 2002]. This standard specifies the syntax and semantics of learning object metadata, defined as the attributes required to adequately describe a learning object in order to make it possible to discover, manage and use learning objects. The standard accommodates locally extending the basic fields and entity types, and the fields can have a status of obligatory or optional. Since its definition, it has become a widely adopted standard. All the other learning object initiatives have made their systems compliant with this standard.

The purpose of the standard is to facilitate the search, evaluation, acquisition, and use of learning objects, for instance by learners, instructors or automated software processes. The data model defines a conceptual schema that ensures that different implementations of the standard have a high degree of semantic interoperability.

Metadata Schema

The conceptual schema defines metadata elements in nine categories, which form the base schema. Both the categories and the number of categories are extensible for specific goals.

1. General: this category of elements describes the learning object as a whole;

2. Lifecycle: features related to the history and current state of this learning object and those who have affected this learning object during its evolution;

3. Meta-Metadata: information about the metadata instance itself;

4. Technical: technical requirements and technical characteristics of the learning object;

5. Educational: educational and pedagogic characteristics of the learning object;

6. Rights: intellectual property rights and conditions of use for the learning object;

7. Relation: features that define the relationship between the learning object and other related learning objects;

8. Annotation: provides comments on the educational use of the learning object and provides information on when and by whom the comments were created;

9. Classification: describes this learning object in relation to a particular classification system.

[3] PO.DAAC Glossary, http://podaac.jpl.nasa.gov/glossary, accessed October 30, 2006.

Metadata elements are either aggregate elements, defining a sub-hierarchy of elements, or simple elements, which contain the actual metadata values. Simple elements are either single-valued or multi-valued. In the latter case, the list of values can be ordered or unordered. In total, the schema defines 77 elements that can be used to describe a learning object. We refer to [IEEE, 2002] for the description of these categories and the elements defined within them.
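To make this structure concrete, the sketch below models one heavily simplified metadata record grouped by a few of the nine categories. It is only an illustration: the class, field and value names are assumptions made for this example and do not reproduce the normative LOM element set or any of its bindings.

```java
import java.util.List;
import java.util.Map;

/**
 * Heavily simplified, illustrative view of a learning object description
 * organised along a few of the nine LOM categories. The field names are
 * assumptions for this sketch, not the normative LOM elements.
 */
public record SimplifiedLomRecord(
        Map<String, String> general,      // e.g. title, language, description
        Map<String, String> technical,    // e.g. format, size, location
        Map<String, String> educational,  // e.g. context, typical age range
        List<String> keywords) {

    static SimplifiedLomRecord example() {
        return new SimplifiedLomRecord(
                Map.of("title", "Quicksort, step by step", "language", "en"),
                Map.of("format", "text/html"),
                Map.of("context", "higher education"),
                List.of("sorting", "algorithms"));
    }

    public static void main(String[] args) {
        // Prints the record via the generated toString() of the Java record.
        System.out.println(example());
    }
}
```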

1.4 ARIADNE

1.4.1 Background

ARIADNE was initiated in 1996 by the European Commission's telematics for education and training program as the European educational digital library project [Duval et al., 2001; Duval, 1999]. Its goal is to promote the concept of share and reuse of educational resources. Within the project, an infrastructure has been developed for the production of reusable learning content, including its description, distributed storage, and discovery, as well as its exploitation in structured courses. The ARIADNE architecture is built around a distributed library called the Knowledge Pool System, for use in both academic and corporate environments.

Besides the creation of tools for enabling share and reuse, one of the main goals of ARIADNE is to address realistic methodologies, techniques and tools so that learners enrol in, access and use these technology-supported curricula. Different tools are built to accomplish these goals, categorized in the following classes [Forte et al., 1997a; Forte et al., 1997b] (Figure 1.2):

  • Specific theoretical tools: Models or methodologies which are based on theoretical results and which are applicable to pedagogic objects.

  • Specific computer tools: Tools based on theoretical models which act on pedagogic elements, either to create them, mark them, segment them, etc. An example is the Text Conceptual Segmentation & Hypertext Generation tool.

  • Specific pedagogical elements: Software products developed with the specific tools which will constitute the basic material of the knowledge pool.

  • General system integration and access tools: Computer- or telematics-based tools or methods applied in a pedagogic context to assemble pedagogic elements into the components of a curriculum. In Figure 1.2, this is indicated as the Curriculum Editor and the ARIADNE Learner Interface.

After the completion of the ARIADNE and ARIADNE II projects, the 'ARIADNE Foundation for the European Knowledge Pool' was founded in mid-2000 and the focus of the project shifted slightly to the management of learning objects and metadata. Within the context of the ARIADNE Foundation, research is performed in multiple learning object domains, not only focusing on repositories. These domains include, among others, automatic metadata generation, content models, attention metadata, and information visualization for metadata access.

1.4.2 ARIADNE Metadata Recommendation

To enable share and reuse, the ARIADNE project defined its metadata scheme at the beginning of its work, with the goal of overcoming two problems with the widespread use of any metadata system [Duval et al., 2000]:

1. indexing should be as easy as possible;

2. the exploitation of the metadata by users should be as easy and efficient as possible.

Together with the IMS project (see next section), ARIADNE submitted in 1998 a joint proposal and specification to the IEEE LTSC, which formed the basis for LTSC Working Group 12 and resulted in the IEEE LOM standard in July 2002 (see above).

The ARIADNE Educational Metadata Recommendation consisted of the following categories of descriptors:


Figure 1.2: The ARIADNE Structure


1. General information on the resource itself,

2. Semantics of the resource,

3. Pedagogical attributes,

4. Technical characteristics,

5. Conditions for use,

6. Meta-metadata information,

7. Annotations (optional),

8. Physical data of the represented educational resource (optional).

Throughout the research and practical experiments with the metadata recommendation, the scheme has been updated in different versions, but the principal idea of the categories and metadata records has been maintained. Currently, ARIADNE has defined a binding between its own metadata schema and the LOM standard, which allows metadata records to be exchanged between multiple projects. One of the results of applying the standard for metadata is the definition of the Simple Query Interface (SQI), which makes it possible to create metadata aggregations and perform aggregated searches across repositories [Simon et al., 2005].

1.4.3 Knowledge Pool System

In essence, the Knowledge Pool System (KPS) is a distributed database of multimedia pedagogical documents (learning objects) and their descriptions (metadata) [Cardinaels et al., 1998]. Currently the KPS is also interconnected with other repositories, such as MERLOT, CGIAR or EdNA Online. The KPS itself is structured as a star-shaped distributed database with a central root node (Central Knowledge Pool or CKP) and local nodes, which are either publicly available or private (Local Knowledge Pool or LKP, and Private Knowledge Pool or PKP).

Metadata information is distributed to all the local nodes and the central node; the learning objects themselves, however, are copied to the central node only, unless a local node explicitly demands that object. A policy expressing interest in objects is set up to define which objects are also distributed to other local nodes.

To cope with copyrights and fees, the KPS distinguishes between different types of learning objects. The first distinction introduced was between free and non-free documents. Later on, this policy was extended to distinguish between documents available to everyone, to ARIADNE members only, to users of a specific knowledge pool, or on a negotiation basis.


1.4.4 ARIADNE Life Cycle

Figure 1.2 not only shows the structure of the ARIADNE system, it also shows the life cycle of an ARIADNE learning object in this system. From top to bottom, the learning objects pass through the following steps:

1. Learning objects are developed using one of the authoring tools, starting from existing learning objects or from scratch.

2. The metadata are added to the learning object, using the pedagogical header generation, i.e. the indexing tool, called SILO in the current version of ARIADNE.

3. The learning object is stored in the Knowledge Pool System together with its metadata record.

4. Courses (called curricula) are created using the curriculum editor, selecting existing learning objects from the KPS.

5. Learners access the learning objects through the curriculum web site.

In Chapter 2 we will see that this sequence corresponds well to the traditional learning object life cycle. In that chapter we also propose a new life cycle supporting the creation of more advanced metadata, resulting from the actual use of learning objects.

1.5 Related Work

The variety of learning object and metadata definitions indicates how broad a research domain learning objects have become. In this section, we provide an overview of several important initiatives that relate to our research. The goal of this section is not to list all initiatives exhaustively, but to provide some explanation of the projects and developments for further reference.

Most of the initiatives perform research on many learning object-related aspects, comparable to the ARIADNE Foundation, such as metadata, content models, interoperability of repositories, and so on. In the description of the related initiatives below, we refer to the most important aspects for dynamic metadata and automatic metadata generation.

1.5.1 Dublin Core Metadata Initiative

The Dublin Core Metadata Initiative4 (DCMI) has the mission to make it easier to find resources using the Internet through several activities:

4 Dublin Core Metadata Initiative: http://www.dublincore.org/


Element Name   Definition
Title          A name given to the resource
Creator        An entity primarily responsible for making the content of the resource
Subject        A topic of the content of the resource
Description    An account of the content of the resource
Publisher      An entity responsible for making the resource available
Contributor    An entity responsible for making contributions to the content of the resource
Date           A date of an event in the life cycle of the resource
Type           The nature or genre of the content of the resource
Format         The physical or digital manifestation of the resource
Identifier     An unambiguous reference to the resource within a given context
Source         A reference to a resource from which the present resource is derived
Language       A language of the intellectual content of the resource
Relation       A reference to a related resource
Coverage       The extent or scope of the content of the resource
Rights         Information about rights held in and over the resource

Table 1.1: The Dublin Core Element Set

• Develop metadata standards for discovery across domains;

• Define frameworks for the interoperation of metadata sets;

• Facilitate the development of community- or disciplinary-specific metadata sets.

The best-known result of the DCMI is the Dublin Core Metadata Element Set, a definition of 15 metadata elements that can be applied to any resource (see Table 1.1). This element set is also referred to as Simple DC, because it is such a simple definition of metadata elements.
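To illustrate how lightweight such a description is, the following minimal sketch represents a Simple DC record as a flat set of element-value pairs taken from Table 1.1. The described resource is invented for the example.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // A Simple DC description is just a flat list of (element, value) pairs drawn
    // from the 15 elements of Table 1.1; all elements are optional and repeatable.
    // The described resource is invented for this example.
    public class SimpleDublinCoreSketch {
        public static void main(String[] args) {
            Map<String, String> record = new LinkedHashMap<>();
            record.put("title", "Introduction to Sorting Algorithms");
            record.put("creator", "Jane Doe");
            record.put("subject", "computer science; sorting algorithms");
            record.put("language", "en");
            record.put("format", "text/html");
            record.put("date", "2006-10-01");
            // Elements for which no value is known are simply omitted.
            record.forEach((element, value) ->
                    System.out.println("dc:" + element + " = " + value));
        }
    }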

This simplicity of Dublin Core can be both a strength and a weakness. Simplicity lowers the cost of creating metadata and promotes interoperability. On the other hand, simplicity does not accommodate the semantic and functional richness supported by complex metadata schemes. In effect, the Dublin Core element set trades richness for wide visibility. The design of Dublin Core mitigates this loss by encouraging the use of richer metadata schemes in combination with Dublin Core. Richer schemes can also be mapped to Dublin Core for export or for cross-system searching. Conversely, simple Dublin Core records can be used as a starting point for the creation of more complex descriptions [ISO 15836:2003, 2003].

In parallel with Simple DC, an extended version exists, known as Qualified DC, which allows the introduction of additional metadata elements that may contain a complex structure. In this model, users can define the metadata elements themselves or use a mapping from existing standards to Qualified DC.

Dublin Core relates closely to the general metadata standard RDF5, which is a language to express any information about a resource on the world-wide web. RDF itself does not define which elements to use for the description of resources, but only offers the syntax to create statements.
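The following minimal sketch illustrates this statement model: every piece of information is expressed as a (subject, predicate, object) triple. The URI is invented, the predicates reuse Dublin Core element names purely as an example vocabulary, and no particular RDF library is assumed.

    import java.util.List;

    // RDF's statement model in miniature: information about a resource is a set of
    // (subject, predicate, object) triples. The URI is invented; the predicates
    // reuse Dublin Core element names purely as an example vocabulary.
    public class RdfTripleSketch {

        record Triple(String subject, String predicate, String object) { }

        public static void main(String[] args) {
            String resource = "http://example.org/lo/sorting-applet";
            List<Triple> statements = List.of(
                    new Triple(resource, "dc:title", "Sorting algorithms applet"),
                    new Triple(resource, "dc:language", "en"),
                    new Triple(resource, "dc:creator", "Jane Doe"));

            for (Triple t : statements) {
                System.out.println("<" + t.subject() + "> "
                        + t.predicate() + " \"" + t.object() + "\" .");
            }
        }
    }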

In Chapter 4, we describe DC and RDF in more detail, when we compare our formal metadata model with these systems.

1.5.2 IMS

IMS was founded in 1997 as the Instructional Management System project within the National Learning Infrastructure Initiative of EDUCAUSE6. Through the years, the focus of the project has shifted somewhat from the development of such learning management systems to the interoperability of learning systems and content, and only the abbreviated name IMS or the formal name IMS Global Learning Consortium (IMS/GLC) is used nowadays.

To establish the interoperability, IMS focusses on defining specifications for both online and off-line learning settings. An example of such a specification is the IMS Learning Resources Meta-data Specification [McKell and Thropp, 2001], but the scope of the work is much broader than only metadata. Among others, the following specifications are available from the IMS project:

• IMS Content Packaging [Smythe and Jackl, 2004]: an information model for a standardized set of structures that can be used to exchange content between content creation tools, learning management systems, and run-time environments;

• IMS Digital Repositories [Riley and McKell, 2003]: to provide recommendations for the interoperation of the most common repository functions;

• IMS Learner Information Package [Smythe et al., 2001]: a collection of information about a learner or a producer of learning content to import data into and extract data from an IMS compliant Learner Information server;

5 Resource Description Framework: http://www.w3.org/RDF/
6 see http://www.imsglobal.org


• IMS ePortfolio [Smythe et al., 2005]: to make ePortfolios interoperable across different systems and institutions.

The development of the IMS Meta-data Specification and the ARIADNE Metadata Recommendation started around the same time, and the resulting specifications corresponded very well to each other. Therefore, the two projects started to cooperate for the development of one general standard, resulting in the IEEE LOM specification that we have discussed before.

1.5.3 SCORM

SCORM is an acronym for Sharable Content Object Reference Model, developed by ADL [ADL, 2004]. The aim of this project is to provide the specifications needed for the developers to produce content that is sharable, reusable, and most importantly interoperable.

The rationale behind the development of the SCORM specifications is based on so-called 'ilities', properties often associated with learning objects:

• Accessibility: the ability to locate and access instructional components from one remote location and deliver them to many other locations.

• Adaptability: the ability to tailor instruction to individual and organizational needs.

• Affordability: the ability to increase efficiency and productivity by reducing the time and costs involved in delivering instruction.

• Durability: the ability to withstand technology evolution and changes without costly redesign, reconfiguration or recoding.

• Interoperability: the ability to take instructional components developed in one location with one set of tools or platform and use them in another location with a different set of tools or platform.

• Reusability: the flexibility to incorporate instructional components in multiple applications and contexts.

SCORM’s organization is mostly represented as a bookshelf, which is depicted inFigure 1.3, consisting of technical books containing the SCORM specifications.The different technical books are the Content Aggregation Model, the Run-TimeEnvironment book and the Sequencing and Navigation book. As shown in thepicture, SCORM integrates technology developments from groups such as IMS,ARIADNE and the IEEE LTSC within a single reference model to specify consis-tent implementations that can be used across the e-learning community.


Figure 1.3: The SCORM Bookshelf

For our work, the Content Aggregation Model (CAM) is the most interesting, because it describes the organization of components to build a learning experience; it defines how the lower-level sharable learning resources are aggregated and organized into higher-level units of instruction. The different components the model distinguishes are Assets, Sharable Content Objects (SCOs), Activities, Content Organization and Content Aggregations. Figure 1.4 shows how the CAM organizes these into content aggregations.

Together with the metadata of a content aggregation, content models are, for us, most interesting because they make the relationships between the components explicitly available. For automatic indexing, these relationships can be used to generate metadata automatically in the case of aggregation (the creation of the aggregate) or disaggregation (the extraction of components for individual reuse). We refer to this in the chapter on automatic indexing (Chapter 3).
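As a rough illustration of the structures the CAM describes, the following sketch models a content aggregation as a tree in which an aggregation organizes SCOs, which in turn reference assets. The titles, nesting and Java representation are our own assumptions for the example; SCORM itself only defines the component types.

    import java.util.ArrayList;
    import java.util.List;

    // A toy tree of SCORM-style components: an aggregation organizes SCOs, which
    // reference assets. The titles and nesting are invented for the example.
    public class ContentAggregationSketch {

        static class Component {
            final String type;   // "aggregation", "sco" or "asset"
            final String title;
            final List<Component> children = new ArrayList<>();

            Component(String type, String title) {
                this.type = type;
                this.title = title;
            }

            void print(String indent) {
                System.out.println(indent + type + ": " + title);
                for (Component child : children) {
                    child.print(indent + "  ");
                }
            }
        }

        public static void main(String[] args) {
            Component course = new Component("aggregation", "Sorting Algorithms");
            Component lesson = new Component("sco", "Bubble sort");
            lesson.children.add(new Component("asset", "bubble-sort-applet.jar"));
            lesson.children.add(new Component("asset", "bubble-sort-notes.html"));
            course.children.add(lesson);
            course.children.add(new Component("sco", "Quicksort"));
            course.print("");
        }
    }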

Figure 1.4: The SCORM Content Aggregation Model

1.5.4 Merlot

The Multimedia Educational Resource for Learning and Online Teaching (MERLOT) is a community of academic institutions, professional discipline organizations, and individual people building a collection of web-based teaching and learning resources where faculty can easily find peer-reviewed materials for use in their classes [Malloy and Hanley, 2001]. In practice it acts as a learning object repository but adds some additional features, such as discipline communities and peer reviews [Vargo et al., 2003]. The latter is part of MERLOT's strategic goal: to improve the effectiveness of teaching and learning by increasing the quantity and quality of peer-reviewed online learning materials that can be easily incorporated into faculty-designed courses.

MERLOT was the first repository in which search results were offered according to their peer review ranking. Peer reviews are performed on three dimensions: quality of content, ease of use, and potential effectiveness as a teaching tool. Quality of Content encompasses both the educational significance of the content and its accuracy or validity. Ease of Use encompasses usability for first-time users, aesthetic value, and provision of feedback to user responses. Potential effectiveness as a teaching tool encompasses the pedagogically appropriate use of media and interactivity, and clarity of learning goals [Nesbit et al., 2002].

Currently (October 2006) MERLOT contains about 15,000 learning resources in seven different disciplines: Arts, Business, Education, Humanities, Mathematics and Statistics, Science and Technology, and Social Sciences. The system allows users to create personal collections of material and to define and store assignments using the material found. Both the collections and the assignments can be viewed by the community members.

1.6 Conclusions

In this chapter we have introduced the important subjects of this thesis: learning objects and learning object metadata. From the many definitions available, we discussed which ones are applicable for our purpose, focusing on the aspects our research is about. We are aware of the incompleteness of this overview, but assembling an exhaustive overview of research on both learning objects and learning object metadata would lead us too far. For example, we almost completely ignored the instructional use of learning objects, which is itself a broad domain of research.

Although the definition of the IEEE LOM standard has led to a wide adoption of learning object metadata, learning objects still suffer from the difficulty of creating metadata. As a consequence we note that:

• Most reuse initiatives still struggle to achieve a critical mass of learning objects to really establish reuse,

• Many learning objects have only a very limited set of metadata associated with them [Najjar et al., 2003; Najjar et al., 2004],

• Currently, metadata are added only once and remain unchanged afterwards, during the further life of the learning object.


Both the critical mass and the amount of adequate metadata play an important role in the cost-expression for reuse to be attractive, as we discussed in section 1.2.2.

We consider several reasons why users often do not make the learning objects available for reuse or do not create metadata for those objects (see also [Duval and Hodgins, 2003; Greenberg et al., 2003]). Most importantly, the current tools available for metadata creation are not user friendly. Most tools relate directly to some standard and present that standard to the users. The user has to fill in a substantial number of electronic forms. However, such standards were not meant to be visible to end users. A direct representation of these standards on forms makes it very difficult and time consuming to fill out the correct values for the metadata in substantial quantities. The slogan that "electronic forms must die" addresses this specific concern.

To overcome the problem of static metadata, we introduce in Chapter 2 the dynamic life cycle for learning objects. In this life cycle the labeling phase is taken out of the sequence and put on top of the other phases, allowing metadata to be added in every other phase of the life cycle. During the life of the learning object, multiple sources of metadata will provide information for this labeling phase, contributing metadata to the learning object.

A possible solution to the problem of a lack of metadata is to create the metadata automatically. In that way, the users do not have to bother about the metadata if they do not want to. This can be compared with search engines on the web that index web pages in the background without any intervention of the creator or the host of the site. In our approach, if the user wants to correct, add or delete metadata, he/she will still be able to do so, but most users will not need to spend time on it. In Chapter 3, we discuss the options for automatic metadata generation using different information sources, introduced in the dynamic life cycle. In Chapter 4, we create a formal model that allows us to implement the automatic indexing framework, discussed in Chapter 5.


Chapter 2

Dynamic Learning Object Life Cycle

2.1 Introduction

Every object, whether electronic or not, passes through a life cycle from its creation on. With respect to the metadata, the learning object life cycle has been neglected by most of the management systems, or those systems considered the metadata to be very static. However, dynamic metadata can reflect much more advanced information, which cannot be obtained in a static life cycle, and thus enable more advanced use of these metadata.

In this chapter we define the learning object life cycle from the dynamic metadata point of view. In contrast to the traditional point of view, the metadata can be updated in every phase of the life cycle; the traditional approach considers the indexing phase a singular event before the learning object is offered for reuse. In the dynamic approach, metadata become available in every stage of the life cycle and enhance the information constantly.

One of the main aspects we look at in this dynamic life cycle is the availability of multiple sources of information about the learning object, which provide information at different moments in that cycle. In Chapter 3 we describe how metadata can be generated automatically, based on these different sources of information in the life cycle. In that chapter we also refer to possible conflicts in the generated metadata and discuss options to overcome these problems.

The organization of this chapter is as follows. In section 2.2 we first describe the basis of our life cycle, the reuse life cycle for courseware reuse. We use this reuse life cycle as the basis for the information flow in the dynamic learning object life cycle. Next, section 2.3 discusses the traditional life cycle of learning objects. In section 2.4 we introduce the dynamic learning object life cycle. Section 2.5 then describes how the labeling phase in this life cycle must be interpreted. In section 2.6 we compare the dynamic life cycle to related models. Finally, section 2.7 provides some general conclusions concerning this life cycle.

Figure 2.1: Reuse Life Cycle [Vittorini and Di Felice, 2000]

2.2 Reuse Life Cycle

2.2.1 Two Development Phases

We have based our idea of dynamic metadata on the reuse life cycle for courseware, defined by [Vittorini and Di Felice, 2000] (see Figure 2.1). This courseware life cycle is an adaptation of the life cycle defined by [Milli et al., 1995] for software development. This life cycle distinguishes between two main phases in the development: the first phase is the development for reuse, the second the development with reuse.

The important aspect of this life cycle is the influence of both phases on each other. The latter phase benefits from the first, but also influences it and vice versa: knowledge is transferred back and forth between the phases.

1. In theory, in the first phase of this cycle, new objects are created for direct use, but attention should be paid to their reusability. This is done through the following activities during the development:

• Qualification and generalization, and


• Classification.

Qualification and generalization is performed to qualify a component for reuse by evaluating the cost to build it, its usefulness and quality; the component is then generalized by a particular abstraction mechanism.

The generalization step is very interesting for software reuse because the developers benefit from the use of generic types. A generic object can be instantiated according to the current needs of the development, requiring only little effort. A well-known example of a generic type is a list. Lists are defined very generally and can be instantiated to contain elements of a specific type when needed:

Generic declaration:   List<T>
Instantiated examples: List<Integer>, List<Animal>

For the generalization of learning objects, we distinguish between two types: first-order and second-order learning objects. In the discussion on the definition of learning objects, D. Wiley [Wiley, 2003] suggests that "learning objects should not contain content at all; rather, they should contain the educational equivalent of algorithms: instructional strategies (teaching techniques) for operating on separately available, structured content". This has led to the distinction of different types of learning objects, called first-order and second-order learning objects. These are defined as follows [Allert et al., 2004]:

• First-Order Learning Objects are resources which are created or redesigned towards a specific learning objective. The learning objective is an integral part of the first-order learning object, no matter if it is explicitly stated or not.

First-order learning objects, in general, only exploit the possibility of presentation generalization (presentation-generative in [Wiley, 2001]). Generalization of content in first-order learning objects is more difficult, because of the close relationship between content and learning objective.

• Second-Order Learning Objects provide and reflect learning strategies, such as problem-solving or decision making strategies, operating on the first-order learning objects.

Wiley classifies these as generative-instructional. These objects possess the logic and structure for combining learning objects and evaluating student interactions with those combinations, created to support the instantiation of abstract instructional strategies (such as "remember and perform a series of steps") [Wiley, 2001; Brophy and Velankar, 2006].

Classification is used to catalogue components in a repository for later retrievability. This step is also applicable to learning objects. We have already mentioned several times in this text the importance of cataloging learning objects to enable sharing and reuse. Classification, however, must be interpreted very broadly, including adding metadata to the learning object, and classifying it according to (multiple) schemas.

In practice, we note that little attention is paid to the reusability of first-order learning objects except for the presentation generalization mentioned above. The main focus of this discussion on reusability (of first-order learning objects) is on the separation of content and instruction, which should make the learning objects reusable in more educational settings (see a.o. [Wiley, 2001; Polsani, 2003; Sicilia and García, 2003]).

2. The second phase represents the development with reuse. In this phase, (courseware) objects are developed using existing objects. According to [Milli et al., 1995], the development in this phase does not have to be radically different from the development in the first phase. Three activities are identified in this phase:

• Retrieval: repositories are searched for potentially reusable components.

• Understanding and adaptation: reuse the learning object as a black box or customize it to the particular needs of a course.

• Composition: retrieved components are assembled to build more complex objects.

2.2.2 Knowledge Transfer between the Phases

The interesting aspect of this reuse life cycle is the fact that development knowledge and experience is transferred between the two phases. We distinguish between different types of knowledge that can be passed between the phases.

1. The first type is concerned with the development of the learning objects themselves. This is comparable to the development of reusable software assets. Figure 2.1 shows development data and metadata passing from the first phase to the second and vice versa. The first phase, obviously, delivers the basis for the development in the second phase – the courseware bases. The second phase, on the other hand, influences how assets will be created the next time, because development experience is available.


2. The second way considers metadata as the information flow between the phases. As with the development knowledge, the learning object metadata are also exchanged between the two phases: the metadata of the reused learning object influence the newly created object, and the metadata of the latter also influence the reused metadata.

An example of reuse in which this metadata flow becomes clear is the use of content aggregations. When we are creating aggregations, metadata about the components that are included in the aggregate influence the metadata of that aggregate. Vice versa, the metadata of the aggregate contain useful information about the components in the case of disaggregation.

In the remainder of this chapter we explain this dynamic life cycle of learning objects with respect to the metadata and explain how reuse benefits from it. Before going into the details of this life cycle, we first discuss the traditional life cycle.

2.3 Traditional Life Cycle

Figure 2.2: The Traditional Learning Object Life Cycle (Creating, Labeling, Offering, Selecting, Using, Retaining)

In general, the life cycle is represented as a sequence of stages, starting at the creation of the object, through its use, to the archival of the object. In [Strijker, 2004], this life cycle consists of six different stages that are performed one after the other (see Figure 2.2). This life cycle mainly focuses on the creation of new learning objects, which pass through different stages during their existence. The basis for this model comes from the records management life cycle [Pederson, 1999], which describes the different stages in the existence of a record from its creation to its retirement.

The different stages of this life cycle are explained as follows:

1. Creating: in the first phase the learning objects are created. Creating a learning object can be accomplished in two ways: creating a new one or reusing an existing one. The first way conforms to the first phase of the reuse life cycle previously discussed, i.e. development for reuse. The second way is the development with reuse, in which existing learning objects are reused.


It is our opinion that the use of this creation phase combined with the selecting phase (below) stresses the creational aspect too much, instead of the reuse aspect of learning objects. The life cycle suggests creating a learning object first and then using it in the learning activity. In the dynamic life cycle (see section 2.4), we combine these phases into a single phase at the beginning of the life cycle (the obtaining phase). In that way, the selecting of a learning object is performed either for reuse in development or for reuse in a learning activity.

2. Labeling: after the creation, the learning object gets labeled or described with learning object metadata. The most important reason to label a learning object is to be able to retrieve it; the better a learning object is labeled, the greater the chance that the object will be retrieved.

3. Offering: an LCMS tries to offer a set of learning objects to be used for different goals. The goals depend on the needs of the course developers. How the material is offered should fit the search strategy of the course developer. For example, content models and ontologies help to find appropriate learning objects for repurposing, and query interfaces help to perform advanced queries.

4. Selecting: the selection of the learning objects is closely related to the to-be-created course material. The selected material should fit the needs of the course developer. The offered material can only be selected if it has been labeled according to these needs.

5. Using: when a learning object actually is selected, it may be used. If the material fulfils the exact needs of the course developer, it may be used as it is; otherwise the developer may choose to change the material to fulfil those needs.

6. Retaining: after a learning object is used, the choice can be made to reuse the object in the future as it is, to revise it, or to remove the object from the repositories.

The retaining phase for electronic records is still a subject of discussion. In the life cycle of physical objects, the retaining phase is used as a decision point about whether to keep an object or not. Useful objects are stored in archives; other objects are destroyed and cease to exist. Concerning electronic records in general, this archival state is said to be inadequate: records are unlikely to reach a definite inactive point but are instead migrated into new formats following developments in technology [Yusof and Chell, 2000]. Instead of talking about a life cycle for electronic records, records should be considered to exist in a continuum for their proper management. The Records Continuum defined in [Pederson, 1999; An, 2001] is such a management framework for electronic records.

On the other hand, researchers argue that archival and destruction of electronic records is not needed or wanted. The availability of large storage space, for example, makes the destruction of learning objects superfluous; furthermore, learning objects may always be referred to by other objects, which prohibits their destruction.

From these arguments, the importance of proper retention schemas in the life cycle of learning objects – and electronic records in general – becomes clear and should be investigated thoroughly [James et al., 2003]. In this thesis we are not going into the details of this discussion.

Although this life cycle correctly represents the current use of learning objects in most systems, it does not cope well with the dynamic aspect of metadata and focusses too little on reuse. In the next section, we tackle these problems by making the labeling phase available to all other phases and putting the focus on the 'obtaining' phase before the 'authoring' phase is entered.

2.4 Dynamic Life Cycle

Figure 2.3: The Dynamic Learning Object Life Cycle

In order to stress the aspects of reuse and the dynamic character of metadata, we define the learning object life cycle differently (see Figure 2.3). Basically, the stages of the traditional life cycle remain, but their positioning is changed, stressing two important aspects in this life cycle:

• Reuse: We stress the aspect of reuse in two ways. In the first place, the obtaining phase is put at the first place of the cycle; although the cyclic aspect allows the life cycle to be entered in every phase, we want the obtaining phase to be considered first. In our case, obtaining indicates the selecting (finding) of relevant learning objects and all the steps needed to get hold of those learning objects, such as downloading, rights management issues, and so on.

In the second place, the cycle introduces a point of choice that allows a learning object to be reused directly in a learning activity (possibly after having been repurposed and/or integrated). This option also opens up the possibility of automatic selection of learning objects, as in adaptive learning systems. Learning objects can be selected automatically and integrated immediately in the activity, for example using a learning object ontology as mentioned in [Verbert et al., 2005b]. Although this aspect of the life cycle is not the focus of our research, we will refer to it in the comparison with other life cycle models later in this chapter.

• Dynamic metadata: Metadata are made dynamic by taking the labeling phase out of the sequence and putting it in parallel with all the other phases, accessible by those phases. In other words, the labeling can be done at every moment during the life of the learning object.

In fact, the labeling phase is not really a phase that the learning object reaches at some point; it is a consideration that should be made at every stage in the life cycle. Metadata not only get added once, just before the offering phase, but they continue to be updated every time new information becomes available. In the next section we provide an in-depth discussion of the labeling phase in this life cycle.

The life cycle as we define it contains the following phases:

• Obtaining: The first phase in the use of learning objects consists of trying to find existing learning objects that can be reused - directly in a learning setting or indirectly by adapting them or using them as the basis for a new learning object.

This phase incorporates different substeps that are needed to actually get hold of the learning object: selecting, rights management, downloading, and so on. In our context, these details of the phase are not important and can be grouped into this single phase.

• Repurposing: In this phase, a learning object is slightly modified to make it suitable for the other phases. Repurposing can be done manually or automatically; it includes activities such as aggregation and disaggregation [Verbert et al., 2005b; Van Assche and Vuorikari, 2006].

After the repurposing phase, the life cycle offers the choice whether to directly integrate the learning object in the learning activity or to start the authoring phase to edit the learning object or create a new one by reusing the current learning object(s).

• Authoring: We call the third phase the authoring phase; this phase is the combination of creating a new learning object and modifying or updating an existing learning object. In the alternative life cycle presented by [Strijker, 2004], discussed in section 2.6.1, this phase is split into two phases, namely obtaining and adapting. In the dynamic life cycle the difference between those two is only minimal and therefore we keep authoring as a single phase.

• Offering: In the fourth step in the cycle, the learning objects, both new ones and modified ones, are offered to the other users (both instructors and learners) by storing them in a repository.

• Integration: In the integration phase the learning objects are integrated in the learning context to enable the learners to use the object.

• Using: This phase represents the actual use by learners.

• Retaining: In the discussion of the traditional life cycle above, we already mentioned the importance of good retention schemes for learning objects. This latter phase allows such schemes to be integrated in the repositories.

• Labeling: Spread across all the other phases we find the labeling phase. This phase is accessible from all other phases, allowing the metadata to be updated constantly during the life of the learning object. Arrows also go out of the labeling phase towards the other phases to indicate the importance of using metadata in these phases; every phase has access to metadata.

In the discussion of the labeling phase, we refer to current research in information retrieval and the web2.0 to distinguish between different sources of metadata that are available in specific phases of the life cycle.

The definition of the stages of this life cycle is based on the use scenario given in [Van Assche and Vuorikari, 2006], shown in Figure 2.4. This scenario describes the typical actions taken upon a learning object, from the point of view of quality assurance of the learning object. For our context we have modified the layout of the cycle to incorporate the labeling phase in parallel with the other phases, and we have combined several phases into a single phase because the dynamic life cycle can be defined without those details.

In the use scenario, two sub-cycles can be discovered that intersect at the repurpose & reuse phase and that are run in parallel. The first cycle corresponds to the development phase, the other represents the use of a learning object, i.e. including it in a learning activity and the actual use of it by the learners. Both cycles start from the discovery of the learning object:

1. The development phase is represented by the upper circle: Discovery, Evaluate, Resolution, Obtaining, Repurposing & Reuse, Describe, Create, Approve, Publish and Retract.


Figure 2.4: Use Scenario for Learning Resources [Van Assche and Vuorikari, 2006]

2. The use life cycle also starts at the Discovery phase and then passes through the following phases: Evaluate, Resolution, Obtain, Repurpose & Reuse, Integrate, Use/Play and Local Delete.

These two cycles in the use scenario roughly correspond to the two parallel tracks in the dynamic life cycle. The upper sequence represents the development phase, the lower sequence represents the use phase (the corresponding phases of the use scenario are put between parentheses):

1. The development cycle corresponds to the following phases: Obtaining (Discovery, Evaluate, Resolution, Obtaining), Repurposing (Repurposing & Reuse), Authoring (Create), Offering (Approve, Publish).

2. The use cycle follows these phases: Obtaining, Repurposing, Integration (Integrate), Using (Use/Play), Retaining (Local Delete).

2.5 The Labeling Phase

For the labeling phase, we refer to current research on information retrieval, the world-wide web in general, and the semantic web or web2.0 in particular, for the generation of metadata for learning objects. As we explained above, metadata become available in all the separate phases of the life cycle, but the sources of these metadata differ according to the phase in the life cycle.

In this section, we explain which phases of the life cycle benefit from which research for the acquisition of metadata and how the different phases may benefit from the available metadata. In the next chapter, we discuss these technologies when we refer to different sources of metadata for automatic indexing.

An extreme approach to dynamic metadata is given in [McCalla, 2004], which is called the 'ecological approach'. In this approach metadata are captured during the life of learning objects, e.g. at their creation and during the actual use by learners, without an explicit labeling phase. This captured information is used in a multitude of ways, based on enhancements of collaborative filtering approaches. The information captured includes [Brooks and McCalla, 2006]:

• cognitive, affective, social characteristics of the users and their goals in accessing the content;

• information about the content itself;

• information about how the users interacted with the content;

• information about the technical context of use;

• information about the social context of use.

Although our approach is less extreme, this ecological approach indicates the importance of dynamic metadata, especially during the using phase of the life cycle. For each technology and life cycle phase, we illustrate the possibilities of the labeling phase using concrete examples of learning objects.

2.5.1 Information Retrieval/Extraction

Description: Information retrieval is the search for material of an unstructured nature that satisfies an information need in large collections [Manning et al., 2005]. Information extraction is a type of information retrieval that focuses on extracting and interpreting information from this material instead of returning the material itself.

Obviously, finding learning objects for (re-)use is a specific type of information retrieval in which learning objects are the requested information. Creating metadata for those learning objects, on the other hand, is closely related to extracting information from the set of available learning objects or from one single learning object itself.


Applicability: With respect to information retrieval we have to clearly distinguish between two different activities in the life cycle. The first activity relates to the obtaining phase, in which learning objects are to be found for further use. Of course, information retrieval is an important technology to help this search for adequate learning objects.

In the second activity, we apply information retrieval and extraction to obtain metadata from one or more learning objects. In this activity, these methods operate on the learning object itself or on a large set of learning objects to find relationships between objects. For this purpose, the technology is best applicable in the repurposing phase, the authoring phase, the offering phase and the integration phase (Figure 2.5).

Figure 2.5: Information Retrieval/Extraction in the Life Cycle

The creation of metadata is generally considered to take two approaches: metadata extraction or metadata harvesting [Greenberg, 2004]. The method of metadata extraction refers to the automatic creation of metadata based on document mining (information extraction), i.e. deriving information from the content of the object. Metadata harvesting collects metadata from the object that were previously added to that object, automatically or manually.
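The difference between the two approaches can be made concrete with a toy sketch, assuming the learning object is a plain-text document that starts with a few embedded 'key: value' header lines; real harvesters and extractors are of course far richer than this.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Toy illustration of the two approaches, assuming the learning object is a
    // plain-text document that starts with a few embedded "key: value" lines.
    public class ExtractionVsHarvestingSketch {

        // Harvesting: collect metadata that were previously added to the object
        // (here: the "key: value" header lines at the top of the document).
        static Map<String, String> harvest(String document) {
            Map<String, String> metadata = new LinkedHashMap<>();
            for (String line : document.split("\n")) {
                int colon = line.indexOf(':');
                if (colon < 0) break;                        // end of header block
                metadata.put(line.substring(0, colon).trim().toLowerCase(),
                             line.substring(colon + 1).trim());
            }
            return metadata;
        }

        // Extraction: derive metadata from the content itself
        // (here: a crude size measure and a naive keyword candidate).
        static Map<String, String> extract(String document) {
            Map<String, String> metadata = new LinkedHashMap<>();
            String[] words = document.toLowerCase().split("\\W+");
            metadata.put("sizeInWords", Integer.toString(words.length));
            String longest = "";
            for (String word : words) {
                if (word.length() > longest.length()) longest = word;
            }
            metadata.put("candidateKeyword", longest);
            return metadata;
        }

        public static void main(String[] args) {
            String document = "title: Sorting algorithms\nauthor: Jane Doe\n\n"
                    + "Bubble sort repeatedly swaps adjacent elements ...";
            System.out.println("harvested: " + harvest(document));
            System.out.println("extracted: " + extract(document));
        }
    }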

• Repurposing phase: The repurposing phase is a kind of authoring phase, with specific activities such as aggregation, disaggregation of components, or modifications with respect to language, cultural or pedagogic aspects, etc.

During the repurposing phase, the internal structure of the learning object is very interesting for the labeling phase. In Chapter 3 we describe how this structure can be used for metadata propagation from an aggregate learning object to the components and vice versa.

Examples

– A typical example of an activity during the repurposing phase is the disaggregation of a learning object into smaller components for individual reuse. Images, for example, are often stored separately to reuse in a presentation or a text document. In this case, the (text) information surrounding the image provides information about the image itself. A clear example is the label associated with the image, containing the title of the image as a new learning object.

– In the case of content aggregation, metadata from the integrated learning object can be applied as information about the new learning object (the aggregate). For example, the pedagogical duration of the aggregate can be derived from the pedagogical duration values of the different components that are included in the learning object. As an example, take an applet illustrating a sorting algorithm, shown in Figure 2.6. The typical learning time for this learning object is defined by the indexer to be 20 minutes. In the case of an aggregate object, this duration can automatically be taken into account for the typical learning time of that aggregate (a sketch of this kind of propagation is given at the end of this section).

Figure 2.6: Learning Object Illustrating a Sorting Algorithm

Typical Metadata Elements: In general, the repurposing phase of the life cycle helps to generate metadata based on the relationships between the different components, but there is no clear distinction between the elements that can or cannot be generated during this phase. In Chapter 4 we discuss several options for automatic metadata generation based on propagation rules in the context of content aggregates.

• Authoring phase: During the authoring of the learning object, mainly document mining is applicable for the information extraction. This is typically performed by content analysis tools such as keyword extractors, language detection algorithms (see Figure 2.7), structure analyzers [Greenberg et al., 2006], and so on.

Figure 2.7: Language Detection in MS Word

Typical Metadata Elements: In this phase, the metadata elements that can be generated automatically with extraction methods depend on the authoring environment used and the type of learning object created. In general-purpose editors, such as text editors, mainly general or technical metadata can be extracted. In LOM, this corresponds to the categories 'General', 'Life Cycle' or 'Technical'. In specific-purpose editors, educational information can also be obtained (categories 'Educational', 'Relation' and 'Classification').

• Offering phase: During the offering phase, the learning object is placed in a large collection of learning objects, some of which may be related to the new learning object. At that moment, these relationships can be indicated explicitly in the metadata or can be tacitly present. In the latter case, similarity searches between objects could be performed to discover relationships between several objects [Cardinaels et al., 2002; Haveliwala et al., 2002].

Relationships can be applied to retrieve appropriate learning objects (see also recommender systems in the next section on social information retrieval) or to add metadata to those learning objects.

Example: Given the sorting algorithm applet of Figure 2.6, the ARIADNE Knowledge Pool System contains several similar learning objects, illustrating related algorithms (quicksort, selection sort, ...). In the offering phase, many metadata values can be obtained automatically if the relationship with the first learning object is known. Figure 2.8 shows three categories of learning object metadata that can be 'copied' to these new learning objects.

Figure 2.8: Learning Object Metadata which may be 'copied' to Related Learning Objects

Typical Metadata Elements: The offering phase allows metadata in most categories to be found, based on the relationships found between learning objects in the repository.

• Integration phase: Comparable to the offering phase, the integration phase also places the learning object in a collection of learning objects. In this phase, however, a more explicit structure might be available, such as sessions, modules, and so on, together with course information (e.g. syllabi) that contains specific information applicable to the learning object as well.

Example: In the context of sorting algorithms, Figure 2.9 shows an applet comparing the speed of different algorithms. When this learning object is integrated in a session about sorting algorithms, this information can automatically be used to create or update the metadata about the applet.

Figure 2.9: Learning Object Comparing Sorting Algorithms

Typical Metadata Elements: The type of metadata that can be generated in this phase is also comparable to that of the offering phase, described above. In this case, the educational elements benefit most because the relationships exist in an educational context, such as a course or session.
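To make the propagation idea of the repurposing example above concrete, the following minimal sketch estimates the typical learning time of an aggregate by summing the known values of its components. The component values (in minutes) are invented, and summation is only one of several possible propagation rules of the kind discussed in Chapter 4.

    import java.util.List;
    import java.util.Map;

    // A minimal propagation rule for aggregates: the typical learning time of the
    // aggregate is the sum of the (known) typical learning times of its
    // components. Values are in minutes and invented for the example; other
    // propagation rules are possible.
    public class PropagationSketch {

        static int aggregateTypicalLearningTime(List<Map<String, Integer>> components) {
            int total = 0;
            for (Map<String, Integer> metadata : components) {
                // Components without a value simply do not contribute.
                total += metadata.getOrDefault("typicalLearningTime", 0);
            }
            return total;
        }

        public static void main(String[] args) {
            List<Map<String, Integer>> components = List.of(
                    Map.of("typicalLearningTime", 20),   // the sorting-algorithm applet
                    Map.of("typicalLearningTime", 10),   // an accompanying text
                    Map.of());                           // an image without a value
            System.out.println("aggregate typicalLearningTime = "
                    + aggregateTypicalLearningTime(components) + " minutes");
        }
    }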

2.5.2 Social Information Retrieval

Description: Social information retrieval is a subdomain of information retrieval that has gained interest over the last several years, in which the characteristics of the current web2.0 are largely exploited. One of the important aspects for information retrieval within this domain is the social structure of the information space [Kirsch et al., 2006]. Currently, the best-known domain within social information retrieval is that of collaborative filtering or social recommender systems (see further); these systems are mainly used in commercial applications. Social recommender systems suggest objects to the users based on previous selections made by other users or based on relationships between the different objects in the repository.

A second aspect, important within the current development of the web, is the social interaction between users for the creation of information. Well-known examples of this trend are wikipedia [Lih, 2004; Krotzsch et al., 2005], blogs, flickr [Weiss, 2005], and so on. In these applications, information is not authored by one individual anymore, but by a collaboration of individuals. Information can also be updated constantly by multiple sources. This clearly relates closely to the dynamic character of metadata, which can also be updated constantly.

Applicability: Social information retrieval is mainly based on the users' activities or social network. In the learning object life cycle, this domain relates most closely to the obtaining phase and the using phase (see Figure 2.10).

Figure 2.10: Social Information Retrieval/Extraction in the Life Cycle

• Obtaining phase: In the obtaining phase, query information and data about the query results can be gained from the users. This type of information is typically interesting for use in recommender systems.

Social information retrieval is applicable both to obtain metadata for the learning objects and to help the users in the obtaining phase to select the most appropriate learning object.

To help the users make these selections, social information retrieval enables the system to offer different types of recommendations. These recommendations are based upon correlations found between the current user, the objects he/she has selected or used (item-to-item correlations) and other users that may have shown a similar interest in learning objects (people-to-people correlations) [Schafer et al., 1999].

Example: Social information retrieval can help the user to find learning objects as is done for finding books: 'people who used this learning object have also included the following learning objects in their courses'. This is an example of people-to-people correlations (a sketch of such co-occurrence-based recommendation is given at the end of this section). Analogously, item-to-item correlations also help to find learning objects, based on characteristics of the selected learning object; for example, learning objects with the same subject, pedagogical duration, difficulty level and interactivity type.

Typical Metadata Elements: In this phase, mainly relationships between learning objects are indicated by the system. These relationships can be used for the recommendation of learning objects or for the updating of the metadata.

• Using phase: During the using phase, the social interaction as in the web2.0 becomes interesting to exploit for the creation of metadata. Typical information that can be created are user evaluations, such as ratings, reviews or annotations [Vuorikari et al., 2006]. At the beginning of this section we also mentioned the ecological approach, which generates metadata almost exclusively during the using phase.

As in the selecting phase, recommender systems also gain information during the using phase, as already mentioned in the previous paragraphs.

Typical Metadata Elements: For the collaborative creation of metadata, we distinguish between two (extreme) approaches. The first allows all the metadata elements to be created collaboratively, as is done in folksonomies1. In the second approach, only annotations can be added to the metadata.
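The co-occurrence idea behind recommendations of the kind 'people who used this learning object have also included the following learning objects in their courses' can be sketched as follows. The usage data are invented, and a real recommender system would normalize and weight these counts rather than use them directly.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Co-occurrence based recommendation in miniature: objects are ranked by how
    // often they appear together with the selected object in other users' courses.
    // The usage data are invented for the example.
    public class CoOccurrenceRecommenderSketch {

        static List<String> recommend(String selected, List<Set<String>> courses) {
            Map<String, Integer> coOccurrence = new HashMap<>();
            for (Set<String> course : courses) {
                if (!course.contains(selected)) continue;
                for (String other : course) {
                    if (!other.equals(selected)) {
                        coOccurrence.merge(other, 1, Integer::sum);
                    }
                }
            }
            List<String> ranked = new ArrayList<>(coOccurrence.keySet());
            ranked.sort((a, b) -> coOccurrence.get(b) - coOccurrence.get(a));
            return ranked;
        }

        public static void main(String[] args) {
            List<Set<String>> courses = List.of(
                    Set.of("bubble-sort-applet", "quicksort-applet", "big-O-notes"),
                    Set.of("bubble-sort-applet", "quicksort-applet"),
                    Set.of("bubble-sort-applet", "recursion-tutorial"));
            System.out.println(recommend("bubble-sort-applet", courses));
        }
    }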

2.5.3 Explicit Feedback

Description: In general, metadata in phases other than the offering phase are mostly captured as some kind of feedback [Croft, 1995], either manually, provided by the users, or automatically, captured by the system [White et al., 2001]. We distinguish between these two types of feedback as explicit and implicit feedback. Explicit feedback is given by the users through relevance indications, evaluations, comments, and so on. Implicit feedback is captured by the system, through user interaction analysis [White et al., 2002], click-through analyses [Joachims, 2002], query expansions [Fitzpatrick and Dent, 1997], and so on. We discuss implicit feedback in section 2.5.4.

Currently, feedback in e-learning systems – both explicit and implicit – is gaining much interest. Important research is focussing on attention metadata; attention metadata concern collecting detailed information about the relation between users and the content they access. In this area, the interest paid by the user to digital material is analyzed. This information should help the systems to offer recommendations that suit the users' needs. Attention metadata can be considered the generalization of the feedback mechanisms and the use of social information retrieval of the previous paragraph.

1 A folksonomy is a user-generated taxonomy used to categorize and retrieve web content using open-ended labels called tags, http://en.wikipedia.org/wiki/Folksonomy


Figure 2.11: Explicit Feedback mechanisms in the Life Cycle

Applicability: Feedback can be gained in every phase of the life cycle in which some interaction with the system occurs, especially in those phases in which the system suggests results and the users can reply to those results. The phases of the life cycle in which feedback mechanisms are best suited are shown in Figure 2.11.

• Obtaining phase: In this phase, relevance feedback or comments on the retrieved results are typically provided by the users that are looking for an appropriate learning object. An example of a system in which this type of feedback is implemented is MERLOT, described in section 1.5.

Typical Metadata Elements: Explicit feedback can relate to all metadata elements, as the users can provide useful information about those elements when searching for appropriate learning objects. Furthermore, relevance feedback is also often applied to optimize the search facilities, helping the system to provide better search results, i.e. to define an ordering among the learning objects in the search result.

• Using phase: Comparable to feedback in the obtaining phase, learners can also provide feedback about the learning objects during their learning activities. Comments on the applied learning objects are an example of subjective metadata that are provided by explicit feedback.

Typical Metadata Elements: In this phase, it is most interesting to gain feedback on the educational metadata of the learning object, which are often subjective; for example, real values for the pedagogical duration or the difficulty level.

• Retaining phase: In the retaining phase, the decision is taken whether a learning object should be kept or removed from the system. The reason for keeping or removing an object is important information that could be stored in the metadata of that object.


2.5.4 Implicit Feedback

Description: As mentioned above, implicit feedback is obtained from system analysis, providing information about the use of learning objects or the learning system. We distinguish between two types of metadata that can be gained from such analysis: instance metadata and context metadata.

• Instance metadata are metadata that contain information about the learning objects themselves. This is typically the metadata represented in the IEEE LOM element set, such as the object's title, the difficulty level or the pedagogical duration.

• Context metadata, on the other hand, cannot directly be associated with the individual learning objects, but contain information about the system in general, such as user profiles and usage information. Context metadata are mostly applied for system optimization, such as faster response times for queries, or for adaptive learning systems.

Applicability: System analysis is typically applied in the obtaining phase and the using phase of the life cycle.

Figure 2.12: Implicit Feedback in the Life Cycle

• Obtaining phase: In the obtaining phase, both instance metadata and context metadata can be obtained. System analysis mechanisms such as click-through analysis [Joachims, 2002] help to obtain instance metadata while the users are selecting and evaluating learning objects for use or adaptation. Query expansion [Fitzpatrick and Dent, 1997], for example, is used for the optimization of the selecting system, which yields context metadata.

The use of implicit feedback is analogous to the explicitly obtained user feedback which we have described above. The feedback is mostly used to optimize the search results, such as providing an ordering of the learning objects based on their relevance, for example within a certain discipline.

• Using phase: During the using phase, both instance metadata and context metadata can be obtained from system analysis. In [Najjar et al., 2006], for example, a system is described that applies attention metadata to drop the rating button from the user interface. Another example is the use of system logs to estimate the typical learning time of an object by inspecting the time users typically spend on that object; a minimal sketch of such an estimation is given after this list.

Context metadata include learner profiles, which are applicable in adaptive learning environments or for personalization of learning content [Bellaachia et al., 2006]. In [Zaïane, 2000], for example, web usage mining is used to enhance the learning environment itself, i.e. the user interface aspects of it.
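As an illustration of the log-based estimate mentioned in the using phase above, the following Python sketch computes the typical learning time of an object as the median duration of the sessions recorded for it. The log record format (dictionaries with object, start and end fields) is a hypothetical example, not a format prescribed by any of the systems discussed here.

```python
from statistics import median

def typical_learning_time(log_events, object_id):
    """Estimate the typical learning time of a learning object as the
    median duration of the sessions observed for it in system logs."""
    durations = [event["end"] - event["start"]
                 for event in log_events
                 if event["object"] == object_id and event["end"] > event["start"]]
    return median(durations) if durations else None

# Hypothetical log records (timestamps in seconds):
events = [
    {"object": "lo-42", "user": "u1", "start": 0, "end": 600},
    {"object": "lo-42", "user": "u2", "start": 0, "end": 900},
    {"object": "lo-7",  "user": "u1", "start": 0, "end": 120},
]
print(typical_learning_time(events, "lo-42"))  # 750.0 seconds, i.e. 12.5 minutes
```

The median is used rather than the mean so that a few abandoned or idle sessions do not dominate the estimate.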

2.5.5 Discussion

The overview given in this section shows which kinds of metadata can be retrieved during the life cycle of a learning object. In every phase, specific information becomes available that requires appropriate technologies to acquire it, either about the learning objects themselves or about the system in which they are used.

In the discussion of the different technologies we mentioned different domains that are suited for the application of the obtained metadata. The information gathering and the use of that information are closely related, because the use also determines which information should be obtained. Metadata are not only applicable during the selecting phase of the learning object, but in the other phases too. For example, context information is typically used for personalization in the using phase of the cycle; during authoring, metadata may be applied to assist the author in including reused learning objects (see for example also [Verbert et al., 2005a]).

The main use of metadata is finding relevant material in a large collection; most information about learning objects or the systems is intended to help in this activity.

2.6 Related Learning Object Life Cycles

In this section we describe several life cycle models that are comparable to the one we proposed in this chapter. For each model we describe how the different phases of the cycles correspond, whether dynamic metadata are considered, and which (dis)advantages the model may have. We summarize these findings in Table 2.2.


Figure 2.13: Alternative Presentation proposed by [Strijker, 2004]

2.6.1 Alternative Presentation of the Traditional Life Cycle

An alternative presentation of the traditional life cycle is also proposed in [Strijker, 2004], and shown in Figure 2.13. The most important change is the sequence of the stages and the possibility to traverse the cycle through different paths. The latter is introduced because reuse is performed differently in various contexts, comparable to the two parallel paths in the dynamic cycle (see [Strijker, 2004] for the details of this discussion). The introduction of the additional arrows, however, creates a more complex life cycle, which is not needed for the purpose of dynamic metadata.

It is interesting to remark that this alternative allows the labeling phase to be skipped (together with the offering, selecting and adapting phases) before the first use of the learning object (arrow 3), because the labeling phase is sometimes difficult and time-consuming to perform. From a pragmatic point of view this is a reasonable option, but we think that it is better to try to automate the indexing process instead of skipping it. This indexing process should already start during the authoring phase of the learning object, which decreases the effort needed in the other phases.

Furthermore, this alternative life cycle does not address metadata in a different way from the other representation. The labeling is still performed only once, right before the object is stored in the repository.

As such, this alternative cycle only defines some explicit paths in the life cycle but does not introduce other aspects.


Figure 2.14: First-level Issues of the Learning Objects Ontology

2.6.2 Learning Objects Ontology

The Learning Objects Ontology defined in [Metros and Bennett, 2004] distinguishes between pedagogical issues and technical and administrative issues concerning learning objects (see Figure 2.14). Because of the nature of an ontology – as opposed to a life cycle model – the different issues are not put in a specific order but represent concepts or activities related to learning objects. The first-level issues in both categories of the ontology, however, are closely related to the phases of a life cycle model.

Table 2.1 shows the matching of the different (first-level) issues of the ontology to the life cycle stages. We have indicated aspects relating to the labeling phase in italics in their most appropriate life cycle phase(s).

The learning object ontology corresponds quite well to the life cycle model:

• The technical part of the ontology maps best to the subcycle concerning the development of learning objects. Technical issues, indeed, are more concerned with the creation of learning objects and making them available to the users than with their use in learning itself.

• The pedagogical issues correspond to both the development cycle and the instructional use cycle. Pedagogical issues are important in the development phase as the creation of learning objects always takes into account the pedagogical use of those learning objects. The importance of pedagogical issues in the instructional use phase is obvious.


• Many learning object issues categorized in the ontology relate to metadata or correspond to activities that require or provide metadata. Although this ontology does not explicitly mention dynamic metadata, it does define multiple dynamic metadata issues. For example, the tracking and evaluation of learning objects add metadata to the learning object after it has been offered, i.e. in the using phase of the life cycle.

Although the ontology does not directly represent a life cycle, we consider it a valuable addition to the life cycle, explaining, for the different phases, the important aspects and the relationships between them, both with respect to the learning objects and to the metadata. Based on the mapping described in Table 2.1, the two may be used together. The columns in this table represent the different phases of the dynamic life cycle. The different elements of the ontology are put in the cells of the corresponding life cycle phase, distinguishing between pedagogical and technical issues.

2.6.3 COLIS Global Use Case

The COLIS² Global Use Case [Dalziel, 2002] incorporates the learning object life cycle in a broader context of e-learning. The life cycle is represented as a workflow, introducing the different actors involved in the learning activity (see Figure 2.15). Because of this global context, the life cycle is an extension of the traditional life cycle, presented above, revealing an important distinction between learning content and learning activities [Dalziel, 2003].

The activities, such as discussion groups and chat rooms, may incorporate one or more learning objects. The left part of Figure 2.15 – the phases related to authority, creator, arranger and infoseeker, from the Prescribe phase to the Structure LOs & Activities phase – shows the phases related to the learning content. The other phases in this use case relate more closely to the learning activities.

Other extensions are the involvement of new types of actors in the life cycle and the attention paid to digital rights management. The actors include – apart from instructors, authors and learners – authorities, facilitators, monitors and certifiers. As a result, the workflow is a much more detailed representation of the learning object life cycle, including stages that are not directly important in our life cycle.

² The Collaborative Online Learning and Information Services (COLIS) project is an Australian DEST-funded initiative to build a broad, interoperable, standards-based e-learning environment for the future.


Table 2.1: Mapping of the Learning Object Ontology to the Dynamic Life Cycle


Figure 2.15: COLIS Global Use Case


Figure 2.16: Workflow in the D2D model

2.6.4 Digital Library Framework - Discovery to Delivery

As part of the Service Framework of the Digital Library Framework, A. Powell introduces the Discovery to Delivery (D2D) reference model [Powell, 2005]. This model (Figure 2.16) is described as the process by which an end-user moves from identifying a need to obtaining a resource that they can use to address that need.

The D2D model describes the different activities – a workflow – that have to be performed to obtain a resource. This is not exactly a life cycle, but the different steps traversed also relate to the life cycle phases.

• Survey/Discover: 'Survey' is concerned with identifying the collections that are worthy of further investigation; 'discover' drills down into each of the collections identified during the initial survey. The intention is to discover a particular item of interest. The result of this activity is not the item itself, but a metadata record about that item.

• Detail: in this phase the user builds up knowledge about a particular resource to the point where enough is known that there is no ambiguity about the item that must be requested.

• UseRecord: here the user uses the metadata record about the resource.

• Request: having sufficiently identified the particular resource that he or she is interested in, the end-user may attempt to obtain it.

• Deliver: this phase is initiated by the content provider, in response to the end-user's request.

• UseResource: in the last stage, the delivered resource is used.


These phases map to the life cycle model as follows:

Obtaining    –
Labeling     –
Offering     –
Selecting    Survey/Discover, UseRecord, Detail, Request, Deliver
Using        UseResource
Retaining    –

There are two important differences that arise immediately when looking at this mapping:

1. The D2D model focuses on the delivery of resources and does not pay attention to the first phases of the learning object life cycle, which deal with the creation or obtaining of the resources. The steps it does define, however, are defined in more detail and show clearly how the activities of searching for a resource and using it follow one another.

2. As a consequence of the previous issue, the labeling phase is not mentioned within the D2D model. The model only mentions 'UseRecord', the possibility to check the metadata of the resource; it does not address the creation of metadata.

2.6.5 Discussion

In this section we presented several different life cycle models. Together with the traditional life cycle and our dynamic life cycle, this comparison shows that there is no standard definition of what the life cycle should be. Every life cycle focuses on the specific needs of the application for which it has been defined. As a consequence, the different models differ a lot in the details of the phases that are modeled. In general, however, the traditional life cycle can be seen as a solid basis for the other models. Table 2.2 summarizes the comparison with respect to the presence of the different phases of the dynamic life cycle and the availability of dynamic metadata (++ more detailed than the dynamic life cycle, + good correspondence between both models, ± rather good correspondence, − bad correspondence, −− not present or supported).

Because of the different focuses of the models, it is clear that some of the phases in the models are defined more extensively compared to our more general model. For example, the COLIS global use case describes in much detail the different aspects of the using phase, applying more detailed subphases in its model. On the other hand, the Discovery to Delivery model focuses on a single aspect of the life cycle and, thus, does not address the other phases.

Table 2.2: Comparison of the Different Life Cycle Models

Little attention has been paid to dynamic metadata. Only the learning object ontology incorporates the possibility of dynamic metadata; all the other models define a single labeling phase, comparable to the traditional life cycle. We think, however, that it is important to address dynamic metadata in the life cycle to enable new aspects of metadata, such as the attention metadata described above.

2.7 Conclusions

Starting from the reuse life cycle, we have defined in this chapter a dynamic life cycle for learning objects, focusing on the dynamic aspect of learning object metadata. An important phase in this life cycle is the labeling phase, which is accessible from every other phase in the life cycle, allowing the metadata to be updated continuously. During the life of a learning object, different sources of information may be consulted for metadata. In this chapter we have shown how current technologies can be applied to obtain the appropriate metadata in every phase.

As there is no standard for the life cycle of a learning object, almost every application has its own definition focusing on its important aspects. In the second part of this chapter we have compared several life cycle models with our dynamic life cycle. An important conclusion of this comparison is that little attention has been paid to dynamic metadata.

The implementation of dynamic metadata also has consequences for the systems working with learning objects. Dynamic metadata imply that metadata records will never be finalized and may be updated constantly. Currently, learning object repositories do not support this updating of metadata. A service-oriented architecture may help to implement a system supporting dynamic metadata. An example component that could be part of such an architecture is the Simple Indexing Interface that we discuss in Chapter 5.

Freely updating metadata probably leads to conflicts between metadata gained from different phases or different sources. We go into the details of this issue when we discuss automatic metadata generation and conflict resolution (see Chapter 3 and Chapter 4).

Another related aspect that should get attention when metadata are updated more often is versioning, both of learning objects and of their metadata. For example, [Brooks et al., 2003] cite four types of relationships between a learning object and a different version of it:

• Functionally equivalent: the meaning of the content of both versions is unchanged.

• Subset: the new learning object contains all the information from the reused one as well as new information.

• Superset: the new learning object contains only information from the original object, but does not contain all the information of the original.

• Non-null intersection: an overlap between both objects exists, but this cannot be defined as a proper subset or superset.

Concerning the metadata, these types of versioning also define relationships between the metadata of the learning objects, which may be used for automatic metadata generation. Consider the example given in [Brooks et al., 2003]:

Given an online tutorial about database systems built for students in a final year undergraduate level course. An instructor for a graduate course on database systems may well adopt this learning object but take out any information not explicitly on relational database systems, and use it as an introduction. This reworking of the learning object has broadened its audience while restricting the topics that it is relevant to.

This example shows how the versioning of a learning object influences the metadata. In Chapter 4 we discuss the relationships between metadata of learning objects in a specific reuse situation, namely content aggregation. In content aggregates, a subset relationship between the components and the aggregate is created (or a superset relationship in the case of disaggregation).

In Chapter 4 a third aspect with respect to dynamic metadata is discussed: context-dependent metadata. Especially pedagogical metadata and usage metadata depend on the context in which they are given. Current metadata schemas barely support this aspect.


Chapter 3

Automatic Metadata Generation

3.1 Introduction

One of the concerns of learning object research is the problem of acquiring a critical mass, which is one of the factors needed to establish real reuse. There are multiple causes of this problem, and several research projects focus on different aspects to solve it.

First of all, learning objects are difficult to develop; a multidisciplinary team with a background in pedagogy, graphical design, programming, etc. is needed to author rich digital learning resources [Duval et al., 2001]. In [Downes, 2001; Greenberg et al., 2003], a solution for this problem is suggested by making the development of content easier, faster or cheaper. The use of content models and the interoperability of learning objects to facilitate the repurposing of learning objects has gained a lot of interest as a way to address this aspect [Sicilia and García, 2003; Verbert and Duval, 2004].

Secondly, multiple learning object repositories exist, each containing a rather small number of learning objects. A more substantial number of learning objects is obtained when repositories can be combined. Federated searches across interoperable repositories [Ternier et al., 2003] and content federations create such a larger set of learning objects available for reuse.

Another cause is the difficulty of creating good-quality metadata for the learning objects. This is the aspect we focus on. Without appropriate metadata, no learning content will be really reusable, as it will be difficult or even impossible to identify and retrieve it. The manual creation of metadata is a difficult and time-consuming task, which can be supported by automatic indexing aids. In this chapter we define a method to create learning object metadata automatically. Research has shown that the advantages of such automatic indexing are the larger number of metadata available and the higher reliability of those metadata values [Anderson and Perez-Carballo, 2001; Duval and Hodgins, 2004; Han et al., 2003; Hatala and Forth, 2003; Liddy et al., 2002].

The contribution of this chapter to the research on learning objects and learning object metadata is twofold:

1. In contrast to other indexing projects or frameworks, we develop a framework that closely relates to the dynamic life cycle described in the previous chapter. In Chapter 2 we have shown that, depending on the phase of the life cycle, different sources of information become available. Our framework considers generators for all these sources. Generally, metadata generation methods focus on the objects only, because that is the only source available in the labeling phase. The object is isolated from any actual use information or any related objects; relating it to the dynamic life cycle allows us to consult all the available sources of information. This aspect is already in use on the semantic web and in web applications and search engines, such as YouTube and Google. In these systems, users can add metadata (annotations, ratings, ...) to the objects (e.g. videos on YouTube), and relationships between objects are analyzed and used to provide more advanced search results. With respect to learning objects, however, this is an innovative approach.

It is important to note that the focus is not on the search for new or advanced information extraction algorithms, such as keyword extraction or language detection algorithms. Our focus is on the integration of existing techniques into a framework that is applicable in the context of learning objects and learning object metadata. Throughout this text we refer to information extraction techniques, but we do not go into their details. In Chapter 5, we discuss the automatic metadata generation framework that we have developed, which uses the different sources of information discussed in the current chapter.

2. We associate a confidence value with the generated metadata. This confidence value has two purposes. First, it highlights the uncertainty about metadata. When providing metadata, we are not always completely sure about the values; a confidence value helps to express this uncertainty. Second, the confidence value is an aid in resolving conflicts between different sources. When multiple sources provide values for the same metadata elements, these values may conflict. Using the confidence value as a measure of certainty can help to resolve such a conflict.

In Chapter 4, we define a formal model for the representation of metadata, based on the aspects we describe for automatic metadata generation. The most important aspects this model formalizes are the use of the confidence value to express the uncertainty about the metadata values and the introduction of contexts for metadata. In this chapter we introduce contexts as a source of information; in Chapter 4 we use contexts to define context-dependent metadata.

This chapter is structured as follows. In section 3.2, we describe the different types of metadata sources we can distinguish, based on the discussion of the labeling phase in Chapter 2. In section 3.3, we discuss how possible conflicts between metadata sources may arise and how these conflicts can be solved. Section 3.4 then provides conclusions on this topic.

3.2 Metadata Sources

In the labeling phase of the dynamic life cycle (see section 2.5), we distinguished between several sources of information that are available in the different phases of that life cycle. Based on these sources, we distinguish between different methods for metadata generation. The combination of the results of these methods provides us with the metadata – the initial metadata or updates for existing metadata – for a given learning object (depending on the phase the learning object is in).

From the point of view of metadata generation, we classify the information sources, described in the previous chapter, as follows:

• Manual metadata: Manual metadata are the initial metadata for a given learning object, updates on those metadata, and the explicit feedback mentioned in the previous chapter, manually provided by a human indexer or by the users (learners, instructors).

• Object contents: Object analysis is mainly used for information extraction, i.e. the gathering of metadata from the contents of the learning object itself.

• Contexts: Context analysis plays an important role in social information retrieval and in implicit feedback on learning objects, where the relationships with other learning objects or user environments are important.

• Actual use: This provides us with implicit feedback on the use of the learning objects, but is also applicable for explicit feedback about the actual use, such as comments on the use.

The applicability of these information sources during the life cycle is depicted in Figure 3.1. We explain this applicability during the discussion of the different information sources.

It is our goal to create a framework that enables learning object metadata generation for different types of objects through the integration of multiple indexers that obtain information from the sources listed above. Similar to related research (see below), we are not trying to develop spectacular new methods for analyzing object contents, but to integrate existing methods for application in the domain of learning objects.

Figure 3.1: Metadata Information Sources in the Life Cycle

3.2.1 Manual Metadata

An obvious source of information is the people creating the learning object, integrating it in a course, or using it for learning. In the dynamic life cycle, all these people can manually add metadata to the given learning object.

In general, any metadata value can be provided manually. Traditionally, this is done during the authoring or offering phases of the learning object, in which the authors or the indexers provide the needed information for the metadata, but in principle metadata can be added manually in any phase of the life cycle.

The literature describes several problems with manually indexing learning objects [Anderson and Perez-Carballo, 2001; Cardinaels et al., 2005; Duval and Hodgins, 2003; Greenberg et al., 2003; Kabel et al., 2003; Kabel et al., 2004; Najjar et al., 2003]:

• The quality of the metadata is rather low. Indexing is a difficult task to perform, human indexers may not always completely understand the meaning of the metadata elements, and, even worse, most indexing tools are direct reflections of metadata standards, which makes them not human-friendly. The metadata standards were not meant to be visible to end users; a direct representation of these standards on forms makes it very difficult to fill out the correct values for the metadata in substantial quantities. The slogan "electronic forms must die" addresses this specific concern.

• Even when expert indexers perform the task, the consistency of the metadata is a problem, both between indexers (different indexers providing metadata for a single object) and between learning objects (a single person indexing several learning objects). [Kabel et al., 2004] describes how consistency depends on different factors: the nature of the indexing vocabulary (flat lists versus ontologies), properties of the domain (formal versus informal), media type of fragments (text, video, audio, pictures), and applicable attribute types (tangible versus intangible).

• Instructors generally do not want to lose time on a time-consuming activity such as indexing, which takes the focus away from their main task.

A solution to these problems is the use of automatic indexing, possibly in combination with human indexing [Greenberg et al., 2005], because automatic indexing is less costly, more efficient and more consistent. Content creation applications often have facilities for author-supplied attributes or automated capture of attributes that can simplify the creation of metadata [Duval et al., 2002]; in electronic forms, values can be pre-filled automatically (Figure 3.2) and modified by the user if necessary.

3.2.2 Object Contents

The method for metadata generation from object contents is object analysis or information extraction. In this method, only a single object is considered, independent of any specific use or relationship with other learning objects [Deniman et al., 2003]. Therefore, as depicted in Figure 3.1, object analysis is typically applied in the life cycle phases in which the object itself has the focus, i.e. the repurposing phase, the authoring phase and the offering phase.

With respect to object analysis, Greenberg [Greenberg, 2004; Greenberg et al., 2006] distinguishes between metadata extraction and metadata harvesting. Metadata extraction refers to the automatic creation of metadata based on document mining, i.e. deriving information from the document (object) itself through an analysis of its contents. Metadata harvesting, on the other hand, collects metadata from the object that are readily available, e.g. added automatically or manually in an earlier stage, typically in a type-specific metadata format (HTML META tags, Office properties, ...). For example, Figure 3.3 shows metadata that can be harvested because they are added during the authoring phase of the object.

Figure 3.2: A combination of manually and automatically filled-in metadata in the Blackboard LMS. The metadata elements Title, Language, Creation Date and Resource Format are filled in automatically by the system; all elements can be modified by the user.

According to [Greenberg, 2004], some metadata elements may be better suited for harvesting or for extraction methods. For our metadata generation framework as a whole, the distinction between harvesting and extraction is not important, because we are not focusing on the method of retrieval but on the metadata obtained. Furthermore, the use of confidence values (see further) allows an indexer to indicate how confident it is about a value. In this way, the framework can cope with different indexers (extractors and harvesters) that possibly provide different values. In the remainder of this text, when we talk about object analysis, this can indicate either metadata extraction or metadata harvesting.
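To make the idea of harvesting concrete, the following Python sketch collects the title and META tags of an HTML learning object into a simple dictionary of candidate metadata. It is an illustration only, not a component of the framework described in this text; the element names in the result simply mirror whatever names the document's META tags happen to use, and the file name in the usage comment is hypothetical.

```python
from html.parser import HTMLParser

class MetaHarvester(HTMLParser):
    """Collects the <title> and <meta name="..." content="..."> values of an HTML document."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs and "content" in attrs:
            # e.g. <meta name="author" content="K. Cardinaels">
            self.metadata.setdefault(attrs["name"].lower(), attrs["content"])
        elif tag == "title":
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            self.metadata.setdefault("title", "".join(self._title_parts).strip())

def harvest_html(path):
    """Return a dictionary of candidate metadata harvested from one HTML file."""
    parser = MetaHarvester()
    with open(path, encoding="utf-8", errors="replace") as f:
        parser.feed(f.read())
    return parser.metadata

# harvest_html("lesson.html") might return, for example:
# {"author": "K. Cardinaels", "keywords": "databases, SQL", "title": "Introduction to SQL"}
```

Such harvested values are cheap to obtain but only as reliable as the author or authoring tool that supplied them, which is one reason to attach confidence values to them (see section 3.3).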

Figure 3.3: Learning object properties available for metadata harvesting in MS Word.

Research has indicated a close relationship between document contents and structure on the one hand and document genre on the other [Toms et al., 1999]. Therefore, object analyzers almost always focus on specific types of objects only, or generate values for a specific set of metadata elements: audio or video analyzers [Li et al., 2005], text categorizers, web site analyzers [Pierre, 2001], language detection algorithms, and so on. For example, [Moens, 1999] describes a system for the automatic abstracting of legal documents.

Comparable work on object analyzers, which is interesting for our indexing framework, can be found in advanced (desktop) search tools, which allow users to search the desktop for more complex information than just filenames. In principle, these search tools perform some kind of metadata generation that can be compared to the indexing we are doing. Well-known examples of these tools are Google Desktop¹, SIMILE², Beagle³ and Spotlight on Mac OS X [Apple Computer, 2005]. In the discussion of the indexing framework in Chapter 5, we compare the architectures of these tools. Here, we are especially interested in the object analysis components they define. Mostly, these tools only implement object analysis and do not consider objects in relationship with other objects; neither do they implement a dynamic view of the metadata.

¹ http://desktop.google.com
² http://simile.mit.edu
³ http://www.beagle-project.org


• The SIMILE project defines components called RDFizers: tools that create (RDF) metadata for one specific type of document. For example, the JPEG2RDF tool harvests JPEG pictures for EXIF or IPTC information; the OCW2RDF tool creates RDF metadata for an OpenCourseWare course web site. In principle, each object analyzer of our framework could be converted into an RDFizer and vice versa, although our framework is more general with respect to the metadata profile than RDF alone.

• The Beagle search tool defines backends and filters to create metadata for indexable objects. The backends obtain information from a data source that relates to the indexed object, for example an e-mail that contains an image. Therefore, backends can be considered context analyzers, which we discuss in the next section. Filters process the contents of the object and thus are object analyzers. A filter extracts metadata based on the content structure (e.g. HTML tags) as well as the plain text content, which will be indexed.

• Google Desktop allows its functionality to be extended through the implementation of additional indexing plug-ins. Plug-ins analyze the object contents and generate a metadata record for Google Desktop according to pre-defined schemas. The base schema is Google.Desktop.Indexable, a schema defining some general information about the indexed object. Plug-ins must use a sub-schema of this one to create type-specific information, such as Google.Desktop.MediaFile, which contains information about videos, sound or images. Currently, Google Desktop only indexes text-based documents; video and audio contents are not processed.

The major drawback of this system is that it does not allow the properties of the different schemas to be extended and that new schemas cannot be added, which makes it difficult to extend it with new metadata, such as learning object metadata.

As opposed to Google Desktop, the Google web search engine also incorporates contexts in its indexing of web pages. Typically, the PageRank system [Brin and Page, 1998] takes into account the relationships between web pages, based on the incoming and outgoing links of a page.

• In Spotlight, the object analyzers are also called plug-ins; each plug-in is specialized for a particular file type or set of file types. Additional plug-ins can be developed to support specific object types or additional metadata.


3.2.3 Contexts

The use of contexts has gained a lot of attention in the last several years. Contexts are important for different purposes. A well-known application is that of context-awareness in information systems, in which the information provided is context-dependent, i.e. varies according to the situation of the user (time, location, reason, ...) [Abowd and Mynatt, 1999]. This can be compared to adaptive learning systems, which take into account the user's information to provide learning material [Brusilovsky et al., 1998; De Bra et al., 2004; Nabeth et al., 2004], and to the more recent approaches to ubiquitous and ambient learning [Kolmel, 2006].

In our approach, we are not looking to exploit this context information in a real-time setting or for personalization as described above. Context analysis deduces metadata from the given context that can be applied to the learning objects in that context.

Figure 3.1 shows the applicability of context information in different phases: the repurposing phase, the offering phase and the integration phase. These are the phases in which the learning object is put in a larger context with other learning objects.

We distinguish between different types of contexts that provide information in different phases of the life cycle.

1. The working environments of the authors (school, research institution, ...) and the courses that include the learning objects form the first type of contexts. For example, personnel databases can provide author information, lecture guides contain course information such as session duration, languages and subject, and electronic learning environments contain course metadata. All this information is a good candidate for metadata about the learning objects.

2. In [Cardinaels et al., 2002] we describe user profiles that may be available at the moment of indexing. A user profile is a template with pre-filled metadata that are applicable to the learning objects.

User profiles relate closely to the first type of contexts, because they may represent similar information. User profiles, however, are typically created manually and contain more specific information that is not available from administrative sources. Educational metadata are a good example of what may be included in such profiles.

User profiles as contexts are already applied in metadata editors, such as ARIADNE SILO (Search and Index Learning Objects), in which every registered indexer can create his or her personal profile, which is automatically used when new learning objects are inserted. A similar profile mechanism can be used in automatic indexing tools [Pansanato and Fortes, 2005]. The profile is then provided as a context for the learning object and the system extracts the appropriate metadata values from it.

3. The third type is the relationships between learning objects. Learning objects can be related to one another for different reasons. For example, one learning object may be a different version of another [Doan et al., 2002], or it may discuss the same topic as the other.

When relationships are explicitly available, such as in versioning systems, it is straightforward for the system to find the known metadata for the indexing of the other learning objects. Implicitly related learning objects are sometimes more difficult to find, but search facilities, such as similarity searches, may help to find related learning objects [Cardinaels et al., 2002; Mayorga et al., 2006; Pinkwart et al., 2005]. Similarity searches for learning objects can be compared to the Google search facility for similar pages.

4. In many cases, the reuse of learning objects consists of the creation of a content aggregate, or of the reverse action, the extraction of components from an aggregate as new learning objects. In these situations, the generation of metadata can be considered a special case of both object analysis and context analysis.

Metadata generation for content aggregates allows special characteristics of these aggregations to be exploited, namely the relationships between the components and the aggregate object, and the relationships between sibling and child components [Cardinaels and Duval, 2003; Verbert and Duval, 2004]. For example, one slide in a slide show often contains relevant information about the contents of the next slide, and a figure in a document is mostly surrounded by text that explains the contents of that figure [Declerck et al., 2004]. Because of these relationships, we also consider the content aggregate a special kind of context for the component objects. The components reside within the context of that aggregate, and that specific context contains a lot of information for the components; a minimal sketch of this idea is given after this list.

Another interesting development in composite documents is the use of content models. Content models help the automation of metadata creation because they make the content structure explicit [Anjewierden and Kabel, 2001; Stuckenschmidt and van Harmelen, 2001]. [Anjewierden and Kabel, 2001], for example, distinguish between four types of ontologies depending on the point of view taken to index the document: general and syntactical, semantic, instructional, and domain ontologies.


For example, the domain ontology is a description of a specific domain; titles in documents can often be mapped to concepts in this domain ontology. The semantic ontology represents the description type and scope of object fragments. The description type tells from which point of view a fragment was created: for example, a structural description, a description of components, or a description of the location of components. The scope describes the nature of the fragment: general, short, elaborate. For example, an introduction paragraph of a text would be mapped to the description scope 'short'. The knowledge contained in these ontologies may be used to generate metadata for the given learning object.
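The following Python sketch illustrates the aggregate-as-context idea from item 4 above: candidate metadata for a component are derived by inheriting selected elements from the aggregate that contains it. The element names ('language', 'discipline', 'keywords') and the plain dictionary representation are illustrative assumptions, not a fixed schema of the framework.

```python
def metadata_from_aggregate(aggregate_metadata, component_metadata,
                            inherited=("language", "discipline", "keywords")):
    """Derive candidate metadata for a component from the aggregate it belongs to.
    Values already present for the component are never overwritten."""
    candidates = dict(component_metadata)
    for element in inherited:
        if element in aggregate_metadata and element not in candidates:
            candidates[element] = aggregate_metadata[element]
    return candidates

# Example: a slide inherits language and keywords from the slide show it is part of.
slide_show = {"language": "en", "keywords": ["SQL", "databases"], "title": "Database Course"}
slide = {"title": "Joins"}
print(metadata_from_aggregate(slide_show, slide))
# {'title': 'Joins', 'language': 'en', 'keywords': ['SQL', 'databases']}
```

In the full framework, such inherited values would, like any other generated values, carry a confidence value (see section 3.3) rather than being accepted unconditionally.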

3.2.4 Actual Use

Metadata derived from object analysis or from the contexts of use only provide theoretical values for the different elements. The analysis of the actual use of a learning object provides more factual metadata.

Typical sources for the actual use information are system logs, available in the learning management systems or installed on the client side, as described in [Najjar et al., 2005; Najjar et al., 2006]. The framework shown in Figure 3.4 enables use information to be tracked from a variety of sources:

• Learning Object Repositories (LORs): Users may interact with an LOR (such as MERLOT, EdNa, ARIADNE and SMETE) to introduce or search for relevant learning objects.

• Learning Management Systems (LMSs): Teachers may interact with an LMS (such as Blackboard, Moodle, and WebCT) to manage the objects of their courses. Students can also use an LMS to access those objects.

• Internet Browsers: Users may download relevant learning objects from the appropriate LOR and then open and read them (learn from them) in another application, such as a web browser.

• Messaging Systems: users may also chat about a learning object using a messaging system.

• Email Clients: users may send or receive learning objects or information about them by email messages or RSS feeds.

• Audio and Video Players: users may use an audio or video player to learn.

• Other tools, such as Microsoft Word and PowerPoint.

Figure 3.4: The Attention Metadata Management framework [Najjar et al., 2005]

The Attention Metadata Module (AMM) converts the information of the different sources into the correct format and then the information is merged into a single attention metadata record. This record is the basis for the object's metadata.
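The sketch below gives a rough impression of what such a merge step could look like in Python. The event format and the fields of the resulting record are assumptions made for this illustration; they do not reproduce the actual AMM schema of [Najjar et al., 2005].

```python
from collections import defaultdict

def merge_attention_events(events):
    """Merge usage events from heterogeneous sources (LMS, repository, browser,
    media player, ...) into one attention record per learning object."""
    records = defaultdict(lambda: {"uses": 0, "total_duration": 0, "sources": set()})
    for event in events:
        record = records[event["object"]]
        record["uses"] += 1                                   # count every observed interaction
        record["total_duration"] += event.get("duration", 0)  # accumulate time spent, if reported
        record["sources"].add(event["source"])                # remember where the evidence came from
    return dict(records)

# Hypothetical input events:
events = [
    {"object": "lo-42", "source": "lms", "action": "view", "duration": 120},
    {"object": "lo-42", "source": "browser", "action": "download"},
]
print(merge_attention_events(events)["lo-42"])
# {'uses': 2, 'total_duration': 120, 'sources': {'lms', 'browser'}} (set order may vary)
```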

3.3 Different Sources - Different Values

In this section, we introduce a confidence value to express the uncertainty about a given metadata value and to solve conflicts between sources. The uncertainty about values is closely related to the discussion on the quality of metadata, which we address first.

3.3.1 Quality of Metadata

For the operation of learning object repositories in general, and specifically for the reusability of the learning objects, the quality of metadata is an important issue [Hughes, 2004; Guy et al., 2003]. In the discussion about quality, a distinction is made between correctness and quality. Correctness is an objective measure and thus better associated with objective metadata. Objective metadata are factual information, such as physical attributes, dates and so on [Hodgins, 2000], which indeed can be right or wrong. Subjective metadata, on the other hand, are the more varied and valuable attributes determined by the person who creates the metadata. For these metadata, it is difficult to argue whether a value is correct or not. An indication of the quality is more appropriate for these metadata.

In [Barton et al., 2003], this is argued as follows: "There will always be some aspects of the metadata that are inaccurate, inconsistent or out of date, even in systems which have extensive quality assurance procedures in place and have invested heavily in the creation of good quality metadata. [...] Nevertheless, it is essential that quality assurance is built into the metadata creation process at the outset, that its scope extends beyond the local context and that the resulting metadata is as 'good' as it can be within the inevitable limitations of time and cost."

For subjective metadata, we argue that it is indeed more important to have good quality metadata, as mentioned in the quote above, as opposed to trying to label these subjective metadata as right or wrong. It is true that some values may be better than others, but this does not imply that the latter should not be used, as they may still help us to find relevant learning material. [Guy et al., 2003] describe the importance of qualitative metadata as follows: 'high quality metadata support the functional requirements of the system it is designed to support', which corresponds to the definition of metadata as function enablers in Chapter 1.

To assess the quality of metadata, multiple approaches have been taken. In general we can distinguish two types: applying human expertise and quality measurement. Although it is not our goal to discuss these approaches in detail, we briefly describe both types below.

• Human Expertise: Human expertise is applied in the manual metadata generation process. This takes place when a person such as a professional metadata creator or content provider produces metadata, either for the initial metadata or for the validation of existing metadata [Greenberg, 2003].

Validation mechanisms, for both metadata and contents, have been implemented in learning object systems, but the results have not always been satisfactory, as shown by the following two examples.

– Metadata: The first version of the ARIADNE Knowledge Pool System distinguished between validated and unvalidated metadata records [Duval et al., 2001]. Validated records were controlled by experts, possibly modified, and then approved. Upon validation of the metadata, the learning object becomes available for reuse by all the users of the system. In a next version of this system, validation has become optional, as an extra quality label. Learning objects stored in the repository may be reused by other users even if the metadata record has not been validated.

– Contents: An example of a quality assurance mechanism for the learning contents is the MERLOT system [Swift, 1997]. In this system, peer reviews are used to control the quality of the learning objects. In practice, these peer reviews contain metadata about the learning object at hand, such as the pedagogical goals of the learning object, an abstract, and so on.


Implementing such explicit validation, however, is difficult. Generally, we can say that the validation process suffers from the same problem as the manual indexing of learning objects: it is difficult to perform and time-consuming. For example, in March 2006, only about ten percent of the MERLOT contents had been reviewed: in the Science and Technology category, 658 of the 5570 learning objects had a peer review record associated with them [Knauff, 2001]. For the ARIADNE system, we already mentioned the change to optional validation as an extra quality mark for the records, because the validation mechanism had proven to be a bottleneck in the reuse life cycle of learning objects. Often, learning objects were stored in the repository in order to reuse them quickly in another context, but the validation prevented this.

• Quality Measurement: Because of the subjectivity of the metadata we are measuring, it is difficult to assign objective values to this quality. Therefore, evaluation frameworks have been defined that help to reduce this subjectivity. Currently, the most important work has been performed by [Moen et al., 1997], who define 23 evaluation criteria. This set includes, among others, accuracy, consistency and completeness as criteria. Afterwards, this initial set has been adapted to improve its applicability [Gasser and Stvilia, 2001; Bruce and Hillman, 2004]. In [Ochoa and Duval, 2006], a proposal is given to convert these evaluation criteria into objective metrics for quality assessment.

3.3.2 Uncertain Metadata

An alternative to the validation process is an indicator of the certainty of a metadata value. Instead of arguing whether a value is correct or not, the metadata source indicates a priori how confident it is about that value. This confidence can be kept in mind when searching for learning objects [Mihaila et al., 2000]. Although this confidence value can be provided by human indexers too, we think it is more appropriate in automatic metadata generation. In an automated process, the algorithms will be evaluated to define a confidence value that can be associated with the different values the algorithm generates. In some situations, the algorithm could calculate this value automatically; a language detection algorithm using N-grams [Cavnar and Trenkle, 1994], for example, could do so.
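As an illustration of an algorithm that can produce its own confidence, the toy Python sketch below guesses the language of a text by comparing character-trigram profiles, loosely in the spirit of [Cavnar and Trenkle, 1994], and derives a crude confidence from how much closer the best-matching language is than the worst alternative. The profile size, the distance measure and the confidence formula are simplifying assumptions for this illustration, not the original algorithm.

```python
from collections import Counter

def trigram_profile(text, size=100):
    """Most frequent character trigrams of a text (a crude language fingerprint)."""
    text = " ".join(text.lower().split())
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    return [gram for gram, _ in counts.most_common(size)]

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between two profiles (smaller means more similar)."""
    penalty = len(lang_profile)
    return sum(abs(rank - lang_profile.index(gram)) if gram in lang_profile else penalty
               for rank, gram in enumerate(doc_profile))

def detect_language(text, samples):
    """Return (language, confidence); 'samples' maps language names to reference texts.
    At least two reference languages are needed for the confidence to be meaningful."""
    doc = trigram_profile(text)
    distances = {lang: out_of_place(doc, trigram_profile(sample))
                 for lang, sample in samples.items()}
    best = min(distances, key=distances.get)
    worst = max(distances.values())
    confidence = 1.0 - distances[best] / worst if worst else 1.0
    return best, round(confidence, 2)
```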

We introduce a confidence value for metadata elements that expresses this certainty about a given value. The table below shows a few example metadata elements and their values, together with a confidence value, for a learning object. For example, the confidence that the Difficulty Level is 'very difficult' is 95%, which is a high confidence; about the Typical Learning Time the record is less confident (60%). A confidence value of 1 indicates complete confidence, a value of 0 no confidence at all.

Metadata element         Value              Confidence value
Author                   E. Duval           0.85
Typical Learning Time    30 min.            0.6
Difficulty Level         'very difficult'   0.95
Package Size             356 KB             1.0
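In a program, such statements can be represented directly; the minimal Python sketch below mirrors the table above. The class and field names are our own illustrative choices, not the formal model of Chapter 4.

```python
from typing import Any, NamedTuple

class MetadataValue(NamedTuple):
    """One metadata statement together with the confidence its source attaches to it."""
    element: str
    value: Any
    confidence: float  # 0.0 = no confidence at all, 1.0 = complete confidence

record = [
    MetadataValue("Author", "E. Duval", 0.85),
    MetadataValue("Typical Learning Time", "30 min.", 0.6),
    MetadataValue("Difficulty Level", "very difficult", 0.95),
    MetadataValue("Package Size", "356 KB", 1.0),
]
```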

The confidence value is inspired by Bayesian certainty factors, by the fuzziness value in fuzzy set theory [Zadeh, 1995], and by fuzzy data for the semantic web. In these domains, the value always indicates more or less the same thing: the certainty, confidence or probability of a given value (we discuss the differences between these in section 4.3). For our purpose, however, the confidence value is only a means that helps to solve conflicts between metadata sources. As a consequence, we do not define this confidence value more formally.

The practical management of semantic knowledge bases itself needs classifications that are fuzzy by nature, such as trustworthy or reliable; these categories cannot be sharply defined. However, people make sense of this information and use it in decision making [Mazzieri, 2004].

We refer to Chapter 4 for the definition of how this confidence value can be applied in metadata.

3.3.3 Conflicts Between Sources

As the metadata of a learning object can be updated in different phases of the life cycle, it is possible that a conflict occurs between an old value and a new value. It is also possible that different sources of information provide different information at the same time. In these cases, we possibly have to solve the conflict between those values.

Generally, we have several options when multiple values are provided for the same metadata element:

• Include all the generated values in the resulting metadata set;

• Propose the different values to the user and let him/her decide which to retain;

• Apply the confidence values associated with the values to decide which to retain;

• Apply heuristics to decide.

Page 90: A Dynamic Learning Object Life Cycle and its Implications for Automatic Metadata Generation

70 Automatic Metadata Generation

The first option – including all the values in the resulting set – is the easiest to implement and might be feasible for some metadata elements. For example, a list of concepts could contain all the keywords returned by several sources. Whether this solution is feasible depends on the metadata element: it works when the element can contain multiple values. For elements that permit only one value, this solution is not possible.

The second option could be used in a small system with only a small number of new entries per week or month. In larger systems, however, we would lose all the benefits of automatic indexing, as the user has to spend time checking all the values and deciding which one to use.

In the third option, the system decides on the most appropriate value for the element. This decision is made based on the confidence values associated with the different values. Different algorithms can be used to solve the conflict, as we discuss later.

The fourth option applies in certain cases, when experimentation or evaluation has defined rules for certain metadata values. In that case, those heuristics may provide the solution to the conflict. An example element for which heuristics may apply is the document language. Many language families exist, and within those families the differences between languages might be very small. For example, Italian and Catalan are closely related but are different languages; the same applies to Afrikaans and Dutch. If one indexer decides the language is Catalan, the heuristic might say to use Italian. In either case, if the document is used in an Italian or a Catalan environment, the users will probably understand the contents and thus be able to use the object. Applying Catalan as the document language would be more precise, but the value Italian is not wrong. Recall, for this latter case, the discussion on the correctness of metadata values, in which we stated that there is often no strict 'wrong' or 'right'.

3.3.4 Using Confidence Values to Solve Conflicts

When conflicts occur, the confidence value can be applied to solve them. This value allows us to perform calculations on the given values if necessary.

For example, to decide between two values, the simplest solution is to retain the value that has the largest confidence value associated with it. This is shown in the table below, for the typical learning time element.

Value     Confidence Value   Retain
33 min.   0.75               ←
60 min.   0.50
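As a minimal illustration of this simplest strategy, the following Python sketch selects the candidate with the highest confidence value (the helper name is illustrative, not part of any implementation described here):

    # Sketch: resolve a single-valued facet conflict by keeping the
    # candidate with the highest confidence value.
    def resolve_by_confidence(candidates):
        """candidates: list of (value, confidence) pairs; returns the retained pair."""
        return max(candidates, key=lambda pair: pair[1])

    # The typical learning time example from the table above:
    print(resolve_by_confidence([("33 min.", 0.75), ("60 min.", 0.50)]))
    # -> ('33 min.', 0.75)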


This example takes the simple assumption that the most confident source provides the 'best' value. In Chapter 4, we go into the details of the confidence value, distinguishing between certainty, confidence and accuracy. For the moment, we assume that a higher confidence value indicates a higher probability of a correct or better value.

Next to simple solutions, as shown above, the confidence value can be applied to solve more difficult conflicts. Suppose that two different sources provide the following information about the authors of a learning object:

Source   Value(s)        Confidence Value
1        K. Cardinaels   0.78
         E. Duval        0.56
2        K. Cardinaels   0.43
         E. Duval        0.82

To solve this conflict we apply the metrics defined by D. McAllister to combine the certainty of different evidence in expert systems [Chassell, 1997a]. McAllister defines this rule to combine factors of evidence to obtain a certain conclusion about a situation. In our context we combine certainties about conclusions to obtain a final conclusion. The rule for positive evidence is defined as follows:

CF_combine(CF_a, CF_b) = CF_a + CF_b − CF_a · CF_b

CF_a and CF_b are the certainty factors of two sources of evidence a and b; CF_combine is the certainty factor of the combination of these two sources of evidence. The following example shows the application of this rule:

Consider the predicate: The pilot is suffering from hypoxia.

Given the fact that the pilot is flying at an altitude of 4,000 m in an unpressurized airplane, and given the fact that the pilot informs us that he is feeling wonderful, we can state that the first fact is strongly suggestive evidence (certainty factor 0.80) for suffering from hypoxia and that the second fact is suggestive evidence (certainty 0.60) for the same conclusion.

In combination, these two pieces of evidence result in a certainty factor of 0.92 that the pilot suffers from hypoxia.
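Written out with the combination rule above, this certainty factor is obtained as follows:

    CF_combine(0.80, 0.60) = 0.80 + 0.60 − 0.80 · 0.60 = 1.40 − 0.48 = 0.92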

Applying this rule to the example above results in the following combined facet values and confidence values:

Value(s)        Confidence Value
K. Cardinaels   0.875
E. Duval        0.921
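A small Python sketch of this combination step, reproducing the values in the table (the function and variable names are illustrative only):

    # Sketch: combine per-value certainty factors from two sources using
    # McAllister's rule for positive evidence and order the results.
    def cf_combine(cf_a, cf_b):
        return cf_a + cf_b - cf_a * cf_b

    source1 = {"K. Cardinaels": 0.78, "E. Duval": 0.56}
    source2 = {"K. Cardinaels": 0.43, "E. Duval": 0.82}

    combined = {name: cf_combine(source1[name], source2[name]) for name in source1}
    for name, cf in sorted(combined.items(), key=lambda item: item[1], reverse=True):
        print(f"{name}: {cf:.3f}")
    # E. Duval: 0.921
    # K. Cardinaels: 0.875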


In this case the combination rule provides an ordering for the two values. Depending on the metadata element, either both values may be kept in the specified order or the value with the highest combined confidence value may be retained.

3.4 Conclusions

Starting from the dynamic life cycle defined in the previous chapter, we have introduced automatic metadata generation for learning objects, based on the different sources of information that become available in the different phases of that life cycle.

Similar to other research projects and developments on advanced search engines and automatic metadata generation, we do not invent a radically new method for metadata extraction, but integrate existing methods into a single framework. In this research, the framework focuses specifically on learning objects and on metadata resulting from a dynamic life cycle.

There are two important aspects for the metadata in this framework:

1. The life cycle implies the use of different sources of metadata. Without the advanced use of these sources, this framework is not different from other indexing frameworks. However, the use of sources such as contexts and actual use analysis results in much more advanced metadata.

2. The use of relationships between learning objects and, especially, the relationships between learning objects in content aggregates, creates even more opportunities for the definition of metadata. In this way, content models are not only important to enable reuse through repurposing of learning objects, but also for the definition of metadata values. In Chapter 4, we provide some initial definitions of metadata propagation for content aggregates. Further research on this topic, however, has to be performed, especially on the relationship between content models and metadata values.

Another aspect we have focused on is the possible problem of conflicting values when multiple sources provide information about the same element. In this chapter we have suggested the use of confidence values to solve this problem. As mentioned above, this confidence value also expresses an uncertainty about the metadata. The integration of these values in repositories opens up the possibility for other types of queries for learning objects, such as fuzzy queries [Carrasco et al., 2003].


Chapter 4

Formal Reuse and Metadata Model

4.1 Introduction

In this chapter we develop a formal model for learning object metadata. On the one hand, such a model helps us to reason in a formal, rigorous way about metadata in the case of sharing and reuse; on the other hand, it helps to implement the automatic metadata generation framework (see Chapter 5).

One of the important aspects we introduce in this model is context-awareness in metadata. With respect to the dynamic learning object life cycle, we indicate the importance of expressing metadata relative to specific situations. A context represents such a situation in which a learning object can be used and in which the metadata are to be interpreted. The use of contexts allows us to define more advanced metadata values which enable better, i.e. more efficient and more effective, retrievability of learning objects.

The second innovation our model introduces is a first definition of metadata propagation rules in the context of content aggregation. The propagation rules define relationships between the metadata elements of the aggregate and the components. These rules can be operationalized in order to generate derived metadata, or in order to validate the quality of metadata.

This chapter defines the model in different sections. We provide basic definitions of learning objects and metadata in section 4.2, which form the basis of the model. In section 4.3 we introduce fuzzy metadata using a confidence value associated with metadata elements. In section 4.4 we extend the model by introducing learning object contexts for metadata. In section 4.5 we compare our simple model with the RDF metadata model because this is a generally approved model for metadata about resources in general. In section 4.6 we explain how the formal model can be applied to express IEEE LOM records or Dublin Core metadata, which are widely used metadata standards. In section 4.7 we define metadata propagation rules for content aggregations. In section 4.8, we summarize our findings and provide some possibilities for further research.

4.2 Learning Objects and Metadata

4.2.1 Learning Objects

The first definitions of our model consider metadata for learning objects without taking into account any context in which they are used or reused. In this section, we consider learning objects as atomic objects whose inner details are hidden. The model introduces a notation for learning objects as elements of a set:

L = {l|l is a learning object} (4.1)

4.2.2 Single-valued Metadata Elements

Metadata are data about data. In the model we consider metadata elements individually, each one describing a specific aspect of the learning object. Later in this chapter we show how elements can be grouped into records.

In the remainder of this chapter, we use the term facet to refer to a metadata element; a facet value is the value for a given facet. The term facet is appropriate because we are, in some way, dealing with multi-faceted classifications as introduced by Prieto-Díaz in [Prieto-Díaz, 1991]. In the formal model, a facet is represented as a function providing the value for that facet for a given learning object.

f : L → cod_f     (4.2)

The function f maps a learning object to the facet value in cod_f, the codomain of f. The codomain of a function is the set of possible values resulting from applying the function to all elements in the domain. We assume that the set of possible values always includes the value null; our use of the value null is similar to the way it is used in the relational model, in which this value may indicate that the value of the facet is unknown, unavailable or inapplicable [Elmasri and Navathe, 2004].


Analogously to learning objects, this definition of facets does not specify anything about the possible values in the codomain. This approach opens the possibility to include all sorts of metadata items, even with a complex structure.

4.2.3 Multi-valued Metadata Elements

Definition 4.2 applies to single-valued elements only - the function maps the learning object to one and only one element of the codomain. In the previous chapters, we distinguished between single-valued and multi-valued elements; therefore, we now redefine a facet to include multiple values as well.

f : L → (cod_f)^n,  n ≥ 1     (4.3)

(cod_f)^n is the n-fold Cartesian product of the set cod_f. This definition states that the value f(l) can be a tuple (f_1(l), f_2(l), f_3(l), ...) consisting of n values.
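These definitions can be mimicked directly in code. The sketch below (facet names, sample objects and the dictionary-based storage are illustrative assumptions) models a single-valued and a multi-valued facet as functions from learning objects to values, with None playing the role of the null value:

    # Sketch: facets as functions from learning objects to (tuples of) values.
    # None plays the role of the null value for unknown/unavailable facets.
    learning_objects = {
        "lo1": {"title": "The Life and Work of Da Vinci",
                "keywords": ("art", "renaissance", "painting")},
        "lo2": {"keywords": ()},
    }

    def title(lo):                    # single-valued facet: L -> cod_title
        return learning_objects[lo].get("title")      # None if unknown

    def keywords(lo):                 # multi-valued facet: L -> (cod_keyword)^n
        return learning_objects[lo].get("keywords", ())

    print(title("lo1"))     # 'The Life and Work of Da Vinci'
    print(title("lo2"))     # None (the null value)
    print(keywords("lo1"))  # ('art', 'renaissance', 'painting')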

4.2.4 Metadata Records

In definitions 4.2 and 4.3, we considered metadata as individual items that describe a learning object. In practice, metadata are often collected in records. A metadata record is a grouping of metadata facets associated with a single learning object. A record can be a flat set of facets or it may contain a structure. Our model allows us to work with both individual metadata facets and metadata records.

We define a record as a metadata facet grouping different facets together:

r : L → cod_r
cod_r = (cod_{f1})^p × (cod_{f2})^q × ... × (cod_{fn})^x     (4.4)

This definition is recursively applicable to the facets f_i, allowing the introduction of a hierarchical structure in the metadata record.

An example of a metadata record is given in section 4.6 in the discussion of IEEE LOM records.

4.3 Fuzzy Metadata

For the automatic metadata generation, we have introduced the aspect of a confidence value for generated metadata. In this chapter we use this value to define fuzzy metadata.

Fuzzy metadata are inspired by fuzzy set theory [Zadeh, 1995] or fuzzy logic, although we do not intend to define them as rigorously as is done in these theories. In these theories a degree of truth is associated with the elements of a set, expressing how true the expression is in that set. This degree of truth is defined by a membership function µ mapping elements to [0, 1]. [Marchiori, 1998] introduced fuzziness for metadata in the context of web pages, expressing the relevance of the facet value for a given page. We will use this fuzziness to indicate the confidence about a value in order to solve conflicts between metadata values.

We start our discussion on fuzzy metadata for metadata generation in the simplest situation, in which we only address the learning objects themselves. Afterwards we extend our mechanisms by also looking at contexts of reuse.

4.3.1 Basic Definitions

In Chapter 3, we have introduced the fuzziness attribute for metadata as a mechanism to express the confidence about an automatically generated facet value. Using this confidence value, we now define fuzzy metadata as follows:

f : L → cod_f × [0, 1]     (4.5)

In the general form, in which multiple values can be associated with one facet, the confidence value is associated with each individual value.

f : L → (cod_f × [0, 1])^n     (4.6)

The confidence value can be defined analogously to the membership function of fuzzy logic. This function provides the confidence value of a certain facet value for a learning object; for conflict resolution (see further), it is necessary that this function can return the confidence value for every facet value provided by different metadata sources.

CV_f : cod_f × L → [0, 1]^n     (4.7)
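Under these definitions a fuzzy facet value is simply a (value, confidence) pair. A minimal sketch (object identifiers, facet name and stored values are assumptions for illustration):

    # Sketch: fuzzy facet values as (value, confidence) pairs and a CV lookup
    # that returns the confidence(s) assigned to a given value.
    fuzzy_language = {            # f : L -> (cod_f x [0,1])^n
        "lo1": (("en", 0.95), ("nl", 0.30)),
    }

    def cv_language(value, lo):
        """CV_f : cod_f x L -> the confidence values attached to that facet value."""
        return tuple(conf for val, conf in fuzzy_language.get(lo, ()) if val == value)

    print(cv_language("en", "lo1"))   # (0.95,)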

In general, fuzzy metadata can be applied in two manners. In the first, the fuzziness value helps in the metadata generation process, as we discussed in the previous chapter. In the second, it introduces the idea of uncertainty in learning object metadata, which is a new approach for these kinds of metadata, comparable to fuzzy metadata for web pages [Marchiori, 1998].


Language        Acc (%)    Language     Acc (%)
Afrikaans       97         Italian      95
Croatian        100        Norwegian    95
Czech/Slovak    44         Polish       100
Danish          96         Portuguese   96
Dutch           100        Rumanian     93
English         95         Spanish      95
French          92         Swedish      98
Hungarian       94         Welsh        97

Table 4.1: Language Detection Accuracy

Example: Given the values from Table 4.1 and given a learning object s, the confidence value for the facet language could be the following:

CV_language("English", s) = 0.95

The example states that the confidence value of the value "English" for the facet "language" is 0.95.

The confidence value as we define it should not be confused with the accuracy of a specific algorithm or the probability that a value is correct. From our point of view, the accuracy or the probability are possible ways of calculating the confidence value, but they are not the only possibilities. The accuracy of an algorithm is measured a posteriori, calculating the correctness of the algorithm given a data set. The probability expresses an extrapolated value starting from known results. Our confidence value expresses a subjective feeling about a value. Of course, this value can be calculated and defined from the accuracy or a probability, as shown below.

Table 4.1 provides an example of a language classification algorithm and the accuracy of the algorithm for different languages (see [Sibun and Spitz, 1994] for a discussion of the algorithms). This accuracy value is an excellent example of how the confidence value could be defined for a specific metadata facet, in this case using the accuracy to define the confidence value for the language element. For example, if the algorithm indicates the language of a learning object as English, it would associate 0.95 as the confidence value for that value.

Another way to define the confidence value is to use heuristics about the learning objects, for example, defining the language of a learning object using course information such as language and audience. In this case, we can set the language equal to the course language with a rather high confidence value if the audience is first year university or higher education.
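The sketch below illustrates these two ways of obtaining a confidence value; the accuracy figures come from Table 4.1, while the heuristic confidence of 0.8 and the default of 0.5 are assumptions made purely for the example:

    # Sketch: derive a confidence value either from the a-posteriori accuracy of
    # a language detector (Table 4.1) or from a course-based heuristic.
    DETECTOR_ACCURACY = {"English": 0.95, "Dutch": 1.00, "French": 0.92}

    def confidence_from_accuracy(detected_language):
        # default of 0.5 for languages not in the table: an illustrative assumption
        return (detected_language, DETECTOR_ACCURACY.get(detected_language, 0.5))

    def confidence_from_course(course_language, audience):
        # heuristic: reuse the course language with a rather high confidence
        # for higher-education audiences (0.8 chosen for illustration only)
        conf = 0.8 if audience in ("first year university", "higher education") else 0.5
        return (course_language, conf)

    print(confidence_from_accuracy("English"))                  # ('English', 0.95)
    print(confidence_from_course("Dutch", "higher education"))  # ('Dutch', 0.8)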


4.3.2 Selecting Facet Values

Our general definitions of metadata facets allow the possibility that multiple values for one facet are associated with a learning object, as we have explained for the dynamic life cycle, in which multiple sources can provide information for the same facets. Obviously, this can result in conflicting values from which the 'best' value must be kept.

The least complex situation is when only one value should be kept; then the best value, e.g. the one with the highest confidence value, is used. When multiple values should be kept, it is more complex to solve conflicts between sources. In these situations, the ordering of values can be important, values missing in one or more sources must be handled, and so on. A general solution for these problems quickly becomes complex. The function MG that we define next hints at possible ways in which confidence values can be used to solve these conflicts. We define MG as the function that provides the appropriate value(s) for a given facet:

MG_f : L × (cod_f × [0, 1])^n → (cod_f × [0, 1])^m,  m ≤ n     (4.8)

Possibly, this function returns multiple values that are applicable for the given facet (m > 1). In this case, the function MG can be defined to provide an ordering among the different values (e.g. in the case of multiple authors).

For example, when every source provides only one value for a facet, the simplest solution to retain one value is to keep the value that is returned with the highest confidence value.

MG_f(l, f_1(l), f_2(l), ..., f_n(l)) = f_j(l) for that j for which CV_j(f, l) = max_{∀k} CV_k(f, l)     (4.9)

Although this simple situation is easy to define, the generalization of the problem of multiple values tends to become difficult to manage. In Chapter 3, we suggested using the rules for combining positive certainty factors of evidence given by David McAllister [Chassell, 1997a]. We now briefly revisit the example provided in that chapter to illustrate the function MG.

Example: In this example we are looking at the authors of documents. We use two algorithms to determine the author(s) of a given learning object l. Suppose the following values for the facet author are given:


Source   Value(s)        Confidence Value
s1       K. Cardinaels   0.78
         E. Duval        0.56
s2       K. Cardinaels   0.43
         E. Duval        0.82

CF_combine(CF_a, CF_b) = CF_a + CF_b(1 − CF_a)

Applying this rule to the example above results in the following combined facetvalues and confidence values:

Value(s)        Confidence Value
K. Cardinaels   0.875 = 0.78 + 0.43(1 − 0.78)
E. Duval        0.921 = 0.56 + 0.82(1 − 0.56)

In this case the combination rule provides an ordering for the two values. Depending on the metadata element, either both values can be kept in the specified order or the value with the highest combined confidence value can be retained.

Or, formally, the result of MG in this situation is given as follows:

MG_author(l, s1, s2) = ((E. Duval, 0.921), (K. Cardinaels, 0.875))

given

s1 = ((K. Cardinaels, 0.78), (E. Duval, 0.56)) and
s2 = ((K. Cardinaels, 0.43), (E. Duval, 0.82))

We can conclude that the confidence value is a good candidate for solving problems with automatically generated metadata values when only one value can be retained for a certain facet or when an ordering has to be decided. In simple situations a straightforward solution can be applied; in more complex situations the confidence value can be used to single out the most appropriate value.

4.3.3 Information Retrieval Models

Given the purpose of metadata – finding relevant learning objects – there should be a relationship between a metadata model and the available information retrieval models. The use of confidence values to express a certainty about a metadata facet closely relates to some of the major information retrieval models:

• Fuzzy Set Theory
  In this theory an element has a varying degree of membership to a set instead of the traditional binary choice. Every index term gets a weight for the given document which reflects the degree to which the term describes the contents of the document [Zadeh, 1995].

In the context of learning object metadata and metadata generation, fuzzy metadata can be interesting in two different ways:


– In the first way, the fuzziness attribute is used during the automatic metadata generation in order to solve possible conflicts. The confidence value expresses the confidence about the value provided by the indexer.

– In the second way, the generated fuzziness attribute is stored together with the metadata for later use during search and retrieval of learning objects.

• Vector Space Model [Salton et al., 1975]
  The vector space model represents the documents as vectors in a multidimensional space, whose dimensions are the terms used to build an index to represent the documents [Spoerri, 1995]. The values in the vector are a statistical computation of the frequency of occurrence of each index term within the document:

  D_i = (d_{i1}, d_{i2}, ..., d_{it})

  Different methods to calculate these values are available.

The advantage of the vector space model is the possibility to compute the similarity between vectors, using, for example, the cosine similarity measure. This method is applied to retrieve documents relevant for a given query (also expressed as a vector).

A metadata record in our model could be interpreted as a sparse representation of a vector in the vector space model, including only those 'terms' for which the record contains a value. For each such term, the confidence value would be applied as d_ij. In this way metadata records for learning objects could be compared using methods from the vector space model, as illustrated in the sketch below.
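A minimal sketch of this interpretation, treating two metadata records as sparse, confidence-weighted vectors and comparing them with the cosine measure (the record keys and weights are illustrative assumptions):

    # Sketch: metadata records as sparse vectors whose weights are confidence
    # values, compared with the cosine similarity measure.
    import math

    record_a = {"language:en": 0.95, "keyword:zeta": 0.80, "keyword:analysis": 0.60}
    record_b = {"language:en": 0.90, "keyword:zeta": 0.70}

    def cosine(a, b):
        dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
        norm = math.sqrt(sum(v * v for v in a.values())) * \
               math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    print(round(cosine(record_a, record_b), 3))  # approximately 0.9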

This comparison with information retrieval techniques also reveals another improvement of the confidence value over traditional metadata models. Using the confidence value, the possibility arises to express more advanced queries for learning objects. Traditionally, queries are resolved by exactly matching the query terms against the learning object metadata, but now a similarity between queries and learning objects can be expressed by applying techniques from information retrieval.


4.4 Context-Awareness in Metadata

4.4.1 Contexts

Description

A context represents a specific situation where a learning object can be used. Generally, metadata values for learning objects are context-dependent and this fact should be represented in the metadata for the learning object [Brasher and McAndrew, 2004].

In this section we consider how contexts can be brought into metadata. Making metadata context-dependent allows users to do more fine-grained searches for learning objects to use in their environment. Context-awareness in metadata also helps to enable more intelligent tutoring systems, based on adaptive hypermedia in which the current learner model is used to select the learning objects to be shown (e.g. [O'Connor et al., 2004]). The latter is not the focus of this research.

Context-aware systems define a context as follows:

Any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant (to the interaction between a user and an application, including the user and application themselves). [Dey, 2000]

A system is context-aware if

it uses context to provide relevant information and/or services to the user, where relevancy depends on the user's task. [Dey, 2000]

Finding a complete definition of what a context should embrace is elusive; a minimal set of what it should cover is given by the "five W's": Who, What, Where, When and Why [Abowd and Mynatt, 1999].

Example

We clarify the use of contexts by writing out the example given in [Robson, 2002] about the Riemann Zeta function. (The Riemann Zeta function, denoted ζ(s) with s ∈ C, is an important function in number theory because of its relation to the distribution of prime numbers.)

An animated fly-through of the graph of the Riemann Zeta function might be considered:

• Advanced for use in a public mathematics lecture,


• Moderate for use in a complex analysis course,

• Beginning as an example of computer animation techniques.

It makes sense to have different metadata records associated with the same fly-through graph that reflect the intended use of the graph by different communities.

We define the contexts as follows:

• PM: public mathematics lecture

• CA: complex analysis course

• COMPA: computer animation techniques

Using these contexts we can, for example, specify the difficulty:

Educational.Difficulty(rz, PM) = 'very difficult'
Educational.Difficulty(rz, CA) = 'medium'
Educational.Difficulty(rz, COMPA) = 'easy'
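A context-dependent facet can be sketched as a function taking both the learning object and the context; the identifiers follow the Riemann Zeta example above, and the dictionary-based lookup is purely illustrative:

    # Sketch: a context-dependent facet as a function of (learning object, context).
    difficulty = {  # Educational.Difficulty : L x C -> cod
        ("rz", "PM"):    "very difficult",   # public mathematics lecture
        ("rz", "CA"):    "medium",           # complex analysis course
        ("rz", "COMPA"): "easy",             # computer animation techniques
    }

    def educational_difficulty(lo, context):
        return difficulty.get((lo, context))  # None if no value for this context

    print(educational_difficulty("rz", "CA"))  # 'medium'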

Context-Independent Facets

Note that not all metadata facets have to be context-dependent. It is possible that the metadata set includes facets that have the same value for all possible contexts. Our definitions given in the previous sections allow this situation: we can apply either the first definition of single-valued facets or the second one of multi-valued facets.

Example: The LOM standard defines several elements that can be considered context-independent:

• Most elements in the technical category, such as file size, format or duration.

• Elements in the life cycle category, about the creation of the object.

In general, the educational facets are more context-dependent than the technical facets.

4.4.2 Definitions

Now we redefine metadata and reuse for the case where metadata are context-dependent. This is a major change from the approach of the IEEE LOM standard or the Dublin Core Metadata Initiative.


At first, we consider a context as an abstract object or black box of which we do not know the inner details. Typically, a context will be a course at an institution, or information about a person, incorporating information as described in the previous section.

To work with contexts in our definitions, we introduce a set of contexts C:

C = {c|c is a context} (4.10)

Now we can redeclare f_i to take as input the learning object and, possibly, multiple contexts.

f_i : L × C^m → (cod_{f_i} × [0, 1])^n     (4.11)

Using a context as a black-box object is a very simplistic approach. Contexts are much more complex and should be modeled adequately for proper use by people or systems. In [Apps and MacIntyre, 2003] a ContextObject is used to model a context for usage in the OpenURL Framework for context-sensitive services. A ContextObject contains up to six entities to represent its context: the referent, the referring entity, the requester, the request resolver, the referring service and the type of service requested. For our purposes, however, the simple black-box view is sufficient for the moment.

4.4.3 Context Reification

From the metadata generation point of view, a specific category of contexts is of much interest. In Chapter 3 we described these contexts as composite documents or content aggregates. Within a content aggregate, the learning objects are closely related to each other and influence each other's metadata (see also further below).

In the case of content aggregates, the composite is a context for the components and vice versa. In this situation, the context is a learning object itself.

We can use reification of contexts in our model: context reification considers the context itself a learning object. Then we can use the model to associate metadata with the context. For automatic metadata generation, these metadata can help to create the metadata values of the learning object within the context.

The definition of a metadata facet in this reification becomes:

C ⊂ L
f : L × L^m → (cod_f × [0, 1])^n     (4.12)


The advantage of this reified statement is that a context is a learning object. The contexts can be described using metadata as defined in our model, as is also possible in the RscDF described in the next section.

4.5 RDF Metadata Model

No Contexts

The metadata model based on facets, described above, is closely related to the model used in the Resource Description Framework. Formally, RDF uses triples as the expression mechanism. Each triple or RDF expression consists of a subject, a predicate and an object, which can be represented as an RDF graph as shown in Figure 4.1 [W3C, 2004].

Figure 4.1: A Simple RDF Graph

The RDF triplet expresses the relationship 'resource X has value Y for predicate Z'. [Melnik, 1999] defines an RDF triplet not as a function, because a single property can result in multiple values for one resource, but as a product of sets:

S ⊂ P × R × (R ∪ L ∪ {blank node})

S: set of statements
P: set of properties
R: set of resources
L: set of literals

For example, the following triplet is a correct RDF statement:

(creator, ”http://www.w3.org/Home/Lassila”, ”Ora Lassila”)
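Following the set-product reading of [Melnik, 1999], the statement set can be sketched as a plain set of (property, resource, value) tuples; the code below is an illustrative representation in Python, not the API of any RDF library:

    # Sketch: RDF statements as a set S, a subset of P x R x (R ∪ L ∪ {blank node}),
    # represented here as plain (property, resource, value) tuples.
    statements = {
        ("creator", "http://www.w3.org/Home/Lassila", "Ora Lassila"),
    }

    def values_for(prop, resource):
        """All values a property has for a resource (may be more than one)."""
        return {o for p, r, o in statements if p == prop and r == resource}

    print(values_for("creator", "http://www.w3.org/Home/Lassila"))  # {'Ora Lassila'}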

With respect to resources and properties, RDF and our model do not differ much. RDF is more general, talking about resources in general, not only learning objects. The properties that can be defined are comparable in both models. In summary, we say that the set L is a subset of the set of resources R and that a facet f is an element of the set of properties P:

L ⊂ R

f ∈ P


Figure 4.2: A Structured RDF Graph

The main difference between RDF and our metadata model is the definition of the facet values. We have defined the possible values as elements of the codomain of the facet, and introduced records as a possible structured facet. RDF defines values as resources or literals (R ∪ L ∪ {blank node}).

Structured descriptions are created by referring to a resource (URI reference) or through the insertion of a blank node. A blank node must be treated as an anonymous local resource (with respect to the RDF statement). This is shown in Figure 4.2 and Figure 4.3, and is comparable to the possibility in our model to create complex structures, such as metadata records.

The generality of RDF allows the use of resources as facet values; this is not possible in our model. In our model resources are learning objects, which cannot be facet values directly. In order to allow relationships between learning objects (e.g. "this learning object is a version of learning object x"), identifiers are used.

The RDF model is a very flexible and open generalization of our model. Indeed, using RDF, any type of property can be applied to any type of resource. For our objective, however, this generality is not required and it would render the further extensions of the model more complex; therefore we keep our model for learning object metadata.

Resource State/Condition Description Framework (RscDF)

The definition of context-sensitive metadata is also used in the RscDF quads ([MacGregor and In-Young Ko, 2003] and [Nikitin et al., 2005]). An RscDF quad is a Resource State/Condition Description Framework tuple and is represented as (C, S, P, O). C represents the context and (S, P, O) is an RDF tuple (subject, predicate, object).


Figure 4.3: A Blank Node in RDF

Figure 4.4: Resource State/Condition Description Framework

See also Section 4.5 for more about RDF and dynamic metadata. Figure 4.4 graphically shows an example of a quad using the property rscdf:trueInContext to include the context in the tuple.

Our formal model also allows the expression of context-sensitive metadata in this RscDF language, using the following statements:

[PM rz lom-edu:Difficulty 'very difficult']
[CA rz lom-edu:Difficulty 'medium']
[null PM dc:title 'Public Mathematics Lecture']
[null CA dc:title 'Complex Analysis Course']

These quads express that in the contexts of the 'Public Mathematics Lecture' (PM) and the 'Complex Analysis Course' (CA) the value of Educational.Difficulty is 'very difficult' and 'medium' respectively, as was given in the example earlier in this section.
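The same context-sensitive statements can be sketched as quads, with None standing in for the null context; this is an illustrative data representation only, not an RscDF implementation:

    # Sketch: RscDF-style quads (context, subject, predicate, object).
    quads = [
        ("PM", "rz", "lom-edu:Difficulty", "very difficult"),
        ("CA", "rz", "lom-edu:Difficulty", "medium"),
        (None, "PM", "dc:title", "Public Mathematics Lecture"),
        (None, "CA", "dc:title", "Complex Analysis Course"),
    ]

    def value_in_context(subject, predicate, context):
        return [o for c, s, p, o in quads
                if s == subject and p == predicate and c == context]

    print(value_in_context("rz", "lom-edu:Difficulty", "CA"))  # ['medium']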


An advantage of the RscDF is the possibility of adding metadata to a context without special constructions. A context is considered a resource and can thus get metadata. In our model, only a special type of contexts, those which can be considered learning objects too, such as content aggregates (see the next section), can get metadata. The generality of RDF allows this construction, which is not possible in the formal model for learning objects.

4.6 Metadata Standards

Currently, two major metadata standards are in use. The first one is the IEEE LOM standard for learning objects [IEEE, 2002]; the second is the more general Dublin Core Metadata Initiative [Powell et al., 2004]. The major difference between these two systems is the way metadata are associated with the learning object. In the LOM standard, metadata are contained in records which are structured as hierarchical sets of metadata elements. Dublin Core defines independent elements.

Using the formal model, we can cope with both approaches, as we show in this section.

4.6.1 IEEE LOM

Data Elements

IEEE LOM metadata instances are defined as a structure of data elements grouped into categories. Data elements are either nodes containing simple data elements or aggregate data elements defining a sub-hierarchy. Aggregate data elements can be expressed in our model using the construction for records (see section 4.2.4); simple data elements correspond to the facets that we have defined:

LOM                        Formal Model
simple data element        ≡ facet
aggregate data element     ≡ record

Because of the goal of the LOM standard, data elements are defined more explicitly, with a value space and a data type. For our purpose, these aspects are not important in the definition.

List Values

Some elements in LOM are defined as multi-valued elements of which the value is a list of values. This corresponds to the multi-valued facets that we have defined above.


LOM further distinguishes multi-valued elements into ordered and unordered lists. For example, the list of authors of an object is ordered because the ordering of the values is important. Through the definition of a multi-valued element as a Cartesian product of the codomain, the value of such an element is implicitly ordered because the tuples in a Cartesian product are ordered tuples.

LOM                        Formal Model
single-valued element      ≡ f : L → cod_f
list value                 ≡ f : L → (cod_f)^n
                             (f_1(l), f_2(l), ...)

A LOM Record

Using the comparison of the data elements in LOM and the definitions of our model, we can express a LOM record as follows:

lom : L → general × lifecycle × metametadata × technical × educational × rights × relation × annotation × classification

general : L → identifier × title × language × description × keyword × coverage × structure × aggregation_level

lifecycle : L → version × status × contribute

...

We can express a record instance using this definition as shown below. We specify only the elements General.Title and General.Language to show how a record instance can be created.

lom(l) = (general(l), lifecycle(l), metametadata(l), technical(l), educational(l), rights(l), relation(l), annotation(l), classification(l))

general(l) = (identifier(l),
              ["en", "The Life and Work of Da Vinci"],
              ("en-GB", "de", "fr-CA"),
              description(l), keyword(l), coverage(l),
              structure(l), aggregation_level(l))

...


Figure 4.5: DCMI Resource Model

We have used ["language code", "string"] to indicate a LangString value (see [IEEE, 2002] for the definition of this datatype). In this example, the title of the object is the English sentence "The Life and Work of Da Vinci". A list of values is enclosed in parentheses; the language facet contains three values: "en-GB", "de" and "fr-CA".
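The record instance above can be sketched as a nested data structure in which LangString values are (language code, string) pairs and list values are tuples; the selection of fields is illustrative and far from a complete LOM record:

    # Sketch: a fragment of a LOM-like record as nested dicts/tuples,
    # with LangString values as (language code, string) pairs.
    lom_record = {
        "general": {
            "title":    ("en", "The Life and Work of Da Vinci"),  # LangString
            "language": ("en-GB", "de", "fr-CA"),                 # ordered list value
        },
        "lifecycle": {
            "version": None,   # null: unknown / unavailable
        },
    }

    print(lom_record["general"]["title"][1])   # 'The Life and Work of Da Vinci'
    print(lom_record["general"]["language"])   # ('en-GB', 'de', 'fr-CA')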

4.6.2 Dublin Core Metadata Initiative

The abstract model of the Dublin Core Metadata Initiative [Powell et al., 2004] distinguishes between a resource model (Figure 4.5) and a description model (Figure 4.6). For our purpose, the resource model is most interesting because it defines how properties are associated with resources. The description model extends the resource model by defining how statements about resources are grouped into descriptions and description sets.

Dublin Core is closely related to the RDF model, which we have compared to our model above. The remarks made for RDF, about its generality, are also valid for the DC model.

The important aspects of the resource model and the description model, with respect to our context, are defined as follows:

• Each resource has zero or more property/value pairs.

• Each property/value pair is made up of one property and one value.


Figure 4.6: DCMI Description Model


• Each value is a resource (the physical or conceptual entity that is associated with a property when it is used to describe a resource).

• A description is made up of one or more statements (about one, and only one, resource) and zero or one resource URI (a URI reference that identifies the resource being described).

• Each statement instantiates a property/value pair and is made up of a property URI (a URI reference that identifies a property), zero or one value URI (a URI reference that identifies a value of the property), zero or one vocabulary encoding scheme URI (a URI reference that identifies the class of the value) and zero or more value representations of the value.

We go into the details of two aspects of these models which relate closely to what we defined before.

Property/Value Pairs   A property corresponds to the definition of a metadata facet in our model. The property is defined as the metadata facet and the value is an element of the codomain.

In the resource model, a value is defined as a resource itself; therefore, we define the codomain of the property as a subset of the set of resources. This generality also results from the relationship with the RDF model.

Descriptions and Description Sets   A description is the grouping of several statements about a single resource. This is similar to the record definition we have given earlier in this chapter: the description corresponds to the record.

Description sets group several descriptions together. Mostly, these sets contain descriptions about related resources and possibly also a meta-description about the set itself. For example, a description set may contain descriptions about a painting, the artist and the creator of the description set itself.

Our formal model does not support the notion of description sets. Metadata facets and records always describe one learning object individually or provide meta-metadata about the facets.

4.7 Metadata Propagation

The generation of metadata based on relationships is called metadata inheritance or metadata propagation [Hatala and Forth, 2003; Hatala and Richards, 2003]. Metadata propagation is mostly associated with content aggregates, as a major part of learning object reuse can be classified as creating content aggregates or using components from such aggregations as individual objects. Secondly, content aggregates are interesting to use because the relationships between the different components are rather easy to identify. The metadata of the components and those of the aggregate closely relate to one another and they can be applied to automatically define metadata, using metadata propagation. In this section, we define how this propagation can be applied.

Formally, a content aggregate is a learning object that contains other learning objects as its components. These components can be explicitly related to one another using a content model, e.g. the IMS Simple Sequencing Specification that specifies rules for the branching or flow of the content according to the user's interaction with the content [Norton and Panar, 2003], or implicitly, resulting from the aggregate structure (e.g. a slide show). For us, these relationships or the internal structure of the content aggregate are not important; we define an aggregate as a set of learning objects. In that case, a content aggregate is an element of the set L and it is also a subset of that set:

a ∈ L
a ⊂ L

We refer to the elements of the aggregate as c_j, the component learning objects of the aggregate.

In order to formulate the rules of metadata propagation, we rule out a possible complicating factor in metadata: contexts. Using context-aware metadata challenges the definition of propagation rules, as metadata can be context-dependent. In general, context-dependent metadata cannot be propagated across different contexts. We discuss this issue at the end of this section.

Metadata Categories

Not all types of metadata elements expose the same behavior concerning propagation within content aggregations. In order to express propagation rules for metadata elements, it is important that we classify the elements into different categories. Investigating the IEEE LOM element set, we can distinguish the following types of metadata:

• Not-propagated metadata

• Accumulated metadata

• Implied metadata

• Additive metadata


• Special relationships

We explain these categories below and provide the metadata propagation rules that apply to them.

Not-propagated metadata

The metadata sets include several elements for which the values do not propagate, or for which no general rules can be defined with respect to propagation from the aggregate to the components or vice versa.

The table below shows the LOM elements that belong to this category.

Metadata Category - Element
1.1 General - Identifier
1.2 General - Title
2.1 Life Cycle - Version
2.2 Life Cycle - Status
2.3 Life Cycle - Contribute
3.1 Meta-metadata - Identifier
3.2 Meta-metadata - Contribute
4.3 Technical - Location
5.3 Educational - Interactivity Level
5.4 Educational - Semantic Density
5.8 Educational - Difficulty
5.9 Educational - Typical Learning Time
8 Annotation

Accumulated metadata

Accumulated metadata are metadata for which the following relationship exists between the content aggregate and the components:

f(a) ⊃ ⋃_j f(c_j)

or

∀c_j ∈ a : f(c_j) ⊂ f(a)

The aggregate object is represented by a, the components of the aggregate by c_j. Metadata propagation for this category can be defined from the components towards the aggregate object, but not vice versa. For a facet f, the value of that facet for the learning object a is defined as containing the union of the facet values of the different components c_j. Possibly, f(a) contains additional values that are not part of any f(c_j). Therefore, we defined the relationship as ⊃.

For the components, the relationship only shows that the facet values should be part of the set of values of the aggregate.

The IEEE LOM elements that belong to this category are shown below.

Metadata Category - Element
1.3 General - Language
1.4 General - Description
1.5 General - Keyword
1.6 General - Coverage
4.1 Technical - Format
4.4 Technical - Requirement
4.5 Technical - Installation Remarks
4.6 Technical - Other Platform Requirements
5.2 Educational - Learning Resource Type
5.10 Educational - Description
7 Relation
9 Classification

For example, the installation remarks of an aggregate can be defined as the union of all installation remarks of the different components. Vice versa, given the installation remarks of an aggregate object, we know that they contain the installation remarks of the components, but we cannot tell which ones these are exactly.
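A minimal sketch of this accumulation rule, computing a suggested aggregate value as the union of the component values (the elements and sample values are illustrative):

    # Sketch: accumulated metadata -- the aggregate's facet value contains the
    # union of the component values (propagation from components to aggregate).
    components = {
        "c1": {"1.3 language": {"en"}, "4.1 format": {"text/html"}},
        "c2": {"1.3 language": {"en", "nl"}, "4.1 format": {"application/pdf"}},
    }

    def accumulate(element):
        suggested = set()
        for metadata in components.values():
            suggested |= metadata.get(element, set())
        return suggested

    print(sorted(accumulate("1.3 language")))  # ['en', 'nl']
    print(sorted(accumulate("4.1 format")))    # ['application/pdf', 'text/html']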

In [Hatala and Forth, 2003; Hatala and Richards, 2003], these are the metadata that exhibit the accumulation property. Accumulated metadata propagate upward in the hierarchy, i.e. the values are accumulated from the components to the higher-level elements. This is shown in Figure 4.7, part b. A learning object is represented as a tree. The elements of the tree are aggregates, containing other learning objects, or simple learning objects. The metadata are shown as small tables next to the respective element. The black bars in the table represent the value for a specific facet. The element higher in the hierarchy accumulates the values from the elements lower in the tree. This is defined as follows:

ASSV_{O_n}(E) = ⋃_i M_{O_i}(E),  {O_i | O_i ∈ subtree of O_n}

This definition describes how an accumulated set of suggested values (ASSV) for a metadata element E is generated from a hierarchically organized content aggregate by taking the union of all values in the subtree of the object. O_n is the learning object the metadata are calculated for; the objects O_i are the objects in the subtree of O_n. The metadata values for element E of an object in the subtree are indicated by M_{O_i}(E).

Figure 4.7: Metadata Inheritance and Accumulation [Hatala and Forth, 2003]

Additive metadata

Propagation of additive metadata is also from the components towards the aggregate, as for accumulated metadata. The propagation rule, however, can be made more specific: the aggregate value is the sum of the component values.

f(a) = Σ_j f(c_j)

Metadata Category - Element
4.2 Technical - Size
4.7 Technical - Duration
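For these additive elements the propagation is simply a sum over the components; a one-line sketch (the byte counts are illustrative values):

    # Sketch: additive metadata -- the aggregate's size is the sum of the
    # components' sizes (4.2 Technical - Size).
    component_sizes = [102400, 20480, 8192]
    print(sum(component_sizes))  # 131072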

Inherited metadata

Inherited metadata propagate from the aggregate object to the components.

In practice it is difficult to express the inheritance relationship in one formula because the inheritance can depend on the type of learning object and the metadata facet. [Hatala and Forth, 2003] use a union operation (see below) to express the idea of inheritance, but this is not the formula to calculate the value. They distinguish between different types of aggregate objects – aggregations and assemblies – in the calculation of the facet value. An assembly contains individually independent units; in an aggregation the components relate more closely to one another. Given this distinction, the inheritance within an assembly object would be defined dependent on the object's type, such as the media type. In an aggregation, this distinction would not be made. The following example explains this difference:

For example, the creator element for the aggregates (i.e. course content) exhibits the inheritance property and we can include each value found in the creator element in records on the path to the root record of the hierarchy. However, for the same creator element for the asset another set of values would be generated based on the values collected from all the assets in the assembly with the similar media type, based on the assumption that it is likely that a creator of a specific media type is a media developer specializing in a particular type of media object [?].

Metadata Category - Element
3.3 Meta-metadata - Metadata Schema
3.4 Meta-metadata - Language
5.1 Educational - Interactivity Type
5.5 Educational - Intended End User Role
5.6 Educational - Context
5.7 Educational - Typical Age Range
5.11 Educational - Language
6.1 Rights - Cost
6.2 Rights - Copyright
6.3 Rights - Description

As for accumulated metadata, [Hatala and Forth, 2003] (Figure 4.7, part a) defines the category of inherited suggested values for the propagation of metadata from the aggregate to the components:

ISSV_{O_n}(E) = ⋃_{i=0}^{n−1} M_{O_i}(E),  {O_i | O_i ∈ path from O_0 to O_n}

The notation is similar to the notation used for the definition of the accumulated set of suggested values. The difference is that the tree is traversed from the root to the learning object O_n.

Special Relationships

The LOM element set includes two metadata items that are typically associated with content aggregations: Structure and Aggregation Level. Therefore, we can classify these as exposing a special relationship. Both elements have a discrete, ordered domain. The value for the components is less than or equal to the value of the aggregate, as shown in the formula below. In specific applications, the values for these elements can easily be defined and automatically applied to the aggregate or the components.

∀c_j ∈ a : f(c_j) ≤ f(a)

Metadata Category - Element
1.7 General - Structure
1.8 General - Aggregation Level
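This constraint can be used to validate or suggest values; a small sketch for the aggregation level (the numeric scale 1 to 4 follows the LOM vocabulary, the function name is illustrative):

    # Sketch: special relationship -- a component's aggregation level must be
    # less than or equal to that of the aggregate (LOM 1.8, values 1..4).
    def check_aggregation_levels(aggregate_level, component_levels):
        return all(level <= aggregate_level for level in component_levels)

    print(check_aggregation_levels(3, [1, 2, 2]))  # True
    print(check_aggregation_levels(2, [3]))        # False: violates f(c) <= f(a)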

4.7.1 Discussion

In this section we have defined propagation rules for automatic metadata generation in a special case of reuse, i.e. content aggregation. We have chosen this type of reuse because most reuse can be classified in this category. Defining comparable rules for other types of reuse is much more difficult because the relationships between the reused learning objects and the newly created object are much less apparent.

On the other hand, these rules define a method to suggest metadata values for new learning objects – [Hatala and Forth, 2003] talks about sets of suggested values – because these rules are not definite or completely exact. In many cases, these rules will probably provide less accurate values than, for example, a human expert can provide. In the previous chapter, we already explained the need for good suggestions because they help to obtain a critical mass of metadata that are acceptable to use.

Furthermore, in specific cases or environments, more explicit relationships may be defined between aggregate metadata and component metadata, which allows the suggestion of more specific metadata values.

4.7.2 Context-aware Propagation

In the beginning of this chapter we defined context-awareness of metadata, allowing us to define metadata values relative to a specific context of use. In this section on metadata propagation we explicitly left the context out of the definition of the rules because it makes these rules much more complex.

Generally, we argue that metadata can be propagated as defined above if the metadata items are context-independent or if the contexts in which the learning objects are put remain equivalent. If the contexts are not equivalent, it is very difficult to define propagation rules for the metadata elements.


The issue of context-independent metadata elements is relatively easy to solve. Not every metadata element for a learning object is dependent on a context; for example, the package size of an object remains the same in any context, as we have mentioned in section 4.4.1.

For context-dependent metadata elements, the equivalence of different contexts becomes important in order to define propagation rules. Currently, however, we are not aware of research on the equivalence of contexts with respect to learning objects. In a simplified approach, equality could be considered a reasonable definition.

The following example explains this idea of context-dependent metadata propagation.

Example

In section 4.4.1 we described a learning object containing an animated fly-through animation of the Riemann Zeta function. Suppose we have an aggregate object including this learning object for a complex analysis course; the difficulty level (LOM element 5.8) would, for example, be set to the value 'medium'. Suppose that the application profile and the working context allow us to define a propagation rule for this element (see footnote 2) that says that the value remains the same for the components: difficulty(a) = difficulty(c_j). Applying the rule for the difficulty level of the animation component object results in the value 'medium'.

Reusing the same animation, however, in the context of the public mathematics course would have to indicate that the difficulty becomes 'very difficult' instead of 'medium', as we stated in the introduction of contexts, because the context of reuse is not equivalent to the original context.

From the example it follows, indeed, that some elements cannot be propagated if the context of reuse is not 'equivalent' to the original context for which the original metadata were defined.

4.8 Conclusions

The formal model we discussed in this chapter defines how learning object metadata can be represented and used for automatic metadata generation. This model introduces three important aspects for learning object metadata:

2 Generally, we have stated that the metadata element Difficulty cannot be propagated from the aggregate to the components or vice versa.


1. The first aspect is the use of a confidence value to indicate the certainty about a metadata value for a specific facet. This opens up the possibility to express more advanced queries based on information retrieval models such as the vector space model.

2. The second is the definition of context-awareness in metadata. Context-aware metadata provide the possibility to express more advanced uses of metadata, in contrast to the very static metadata that are traditionally used.

3. The third defines propagation rules for metadata elements in the case of reuse in content aggregations. These propagation rules also help to automatically define metadata values for learning objects based on existing metadata.

Introducing context-awareness for metadata opens up possibilities for further research on the expressiveness of contexts. For example, [Chalmers et al., 2004] introduce compound contexts to indicate a relationship between contexts in which one context is more specific than the other. A context is formed of many aspects and it is possible to abstract from several aspects in a specific situation. Making those aspects concrete in another situation results in the more specific context. Currently, we treat contexts as black boxes, but they have more potential to augment the metadata value.

The definition of metadata propagation rules also implies further research. In our approach, we only defined an aggregate as a very simple set of component learning objects. Using a more advanced definition based on content models, for example, allows the definition of propagation rules based on relationships between different related components in the aggregate. Furthermore, the combination of context-awareness and metadata propagation is a field that has not yet been explored.


Chapter 5

Automatic Metadata Generation Framework

5.1 Introduction

Based on the discussion of automatic metadata generation and the formal model in the previous chapters, we have implemented an automatic indexing framework. In this chapter we explain how this framework is structured. We developed our framework as an extensible application which can be integrated in learning-object-based e-learning environments or used from within other environments, such as learning object authoring tools. The framework can be extended by implementing new technologies for metadata generation, i.e. metadata extraction or metadata inheritance, or by the definition of new conflict resolution mechanisms. Our research and implementation mainly focus on applying and combining existing technologies for information extraction in a learning-object-specific environment, instead of developing new analysis algorithms. Therefore, it is important to note that this chapter does not go into the details of the individual algorithms that extract metadata from the objects, as this is not the focus of the work.

The development of the initial framework has been performed within the con-text of this research, but is still continuing as part of related research. Therefore,the explanation of the framework can be divided in two components. In the first,the implementation of automatic indexing is considered. In the second, the frame-work as a service-oriented architecture is described. The main contribution of theresearch described in this thesis relates to the first part, the automatic indexing.In this thesis, only an initiation of the service-oriented architecture is developed,called the Simple Indexing Framework. This is currently being developed furtheras the Simple AMG Interface, focusing on the interoperability with learning object


Figure 5.1: Overall Structure of the Automatic Indexing Framework (the IndexingService entry point uses the Indexer, Extractor and MetadataMerger hierarchies; example subclasses are ObjectBasedIndexer, ContextBasedIndexer, TextExtractor, PdfTextExtractor, PropertyExtractor and OfficePropertyExtractor)


In section 5.2 we describe the overall structure of the framework, referring to the different sources of metadata discussed in Chapter 3 and the definition of confidence values for metadata, discussed in Chapter 4. That section also describes the main hierarchies of the framework in more detail. In section 5.3 we describe the web service as the main component of the Simple Indexing Interface. In section 5.4 we illustrate the framework in combination with a specific learning management system as a context, which allows us to evaluate the possibilities for automatic indexing. Section 5.5 compares two related implementations with this framework. Section 5.6 provides some general conclusions on the implementation.

5.2 Overall Structure

The overall structure of the framework is based on the categorization of metadata sources and the discussion about conflict resolution using fuzzy metadata, described in Chapter 3 and Chapter 4. The working principle of the framework is to consult different sources of information (indexers) and to combine the metadata sets of those indexers into a single metadata record using confidence values. Depending on the stage in the life cycle, more sources of information may be available and consulted for metadata.

In Figure 5.1, the main classes of the framework are defined as Indexer and MetadataMerger.

• Indexer objects perform the actual metadata extraction from the given source. Currently, we have implemented indexers that take the learning object or the context objects as the source of metadata. For the moment, we have not defined indexers for actual use analysis or content aggregates, although we have foreseen classes that allow this implementation.


Figure 5.2: The Class Hierarchy of ObjectBasedIndexer (with the sub hierarchies PropertyIndexer, containing LanguageIndexer and TextCategorizer, and MimeBasedIndexer, containing ArchiveIndexer, PdfIndexer, ImageIndexer and MSOfficeIndexer with its subclasses PowerPointIndexer and WordIndexer)


• The MetadataMerger class allows the implementation of different strategies for conflict handling. One of the strategies our framework has implemented is the simple rule that uses the value with the highest confidence value. Confidence values are defined for each indexer and are associated with the metadata values that those indexers return (see further).

The framework also uses helper classes, called Extractors. These classes are used to extract specific parts from a source so that they can be used afterwards for metadata extraction. Typical classes defined in this hierarchy are text extractors for non-ASCII text documents, such as Acrobat PDF documents, and property extractors, mainly in the context of metadata harvesting.
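
As an illustration of such a helper class, the sketch below shows how a PdfTextExtractor could be built on top of a third-party library such as Apache PDFBox (assuming its 2.x API); the framework's actual Extractor interface and implementation are not shown and may differ.

import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical sketch of a PdfTextExtractor; only the text extraction step is shown.
public class PdfTextExtractor {

    // Returns the plain text of a PDF document so that property- or
    // language-indexers can analyze it afterwards.
    public String extractText(File pdfFile) throws IOException {
        try (PDDocument document = PDDocument.load(pdfFile)) {
            return new PDFTextStripper().getText(document);
        }
    }
}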

The entry point of the framework is the web service, IndexingService, that acts as the interface for the metadata generation. Further in this chapter we discuss how this web service forms the core of the Simple Indexing Interface.

We will now describe the main hierarchies of the framework, ObjectBasedIndexer and ContextBasedIndexer, and the implementation of confidence values in more detail.


5.2.1 ObjectBasedIndexer

The hierarchy of ObjectBasedIndexer contains several sub hierarchies that group specific types of indexer classes (see Figure 5.2). These indexers use the Extractor classes to generate metadata that are derivable from the learning object content. The size of the metadata set generated by these classes may vary; because each indexer is specialized, we have noticed that the classes typically implement metadata generation for about five facets, although we have not verified this formally.

1. The first group is called MimeBasedIndexer. These classes are able to deal with a particular type of document, like PDF texts. We introduced the class of MIME-based indexers to enable the implementation of document-type-specific indexers. Each indexer is aware of the type-specific properties and acts as the expert for indexing that type of document.

Within this hierarchy we have also defined the ArchiveIndexer class, which implements strategies to handle aggregations of learning objects and treat them as a whole. An example of such an archive is a set of hypertexts that have links between the documents and are stored as an archived (zip) document.

2. PropertyIndexers: these classes have the expertise to analyze the document for one or more specific properties. Examples of this type of indexer are algorithms that determine the language of a given object or classify the object according to a given classification scheme. These indexers correspond to the object analysis techniques described in the previous chapters, based on information extraction.

In practice, a set of indexers is defined for the different types of learning objects, and these are activated to index the given learning object. After the indexing, the results of the different indexers are combined into a single metadata record by the metadata merger.

The specification of the ObjectBasedIndexer class contains only the method to generate metadata for a given learning object, identified by a FileDataSource (a class from the Java Activation Framework). The result of the method is a set of metadata with confidence values. This API is shown in Listing 5.1.



public abstract class ObjectBasedIndexer implements Indexer {

    /**
     * Generate the metadata that can be derived from the learning
     * object, and return it as a metadata set with confidence values
     *
     * @param lo The learning object for which metadata must be generated
     * @return The resulting ARIADNE metadata instance
     */
    public abstract AriadneMetadataWithConfidenceValue getMetadata(DataSource lo);

}

Listing 5.1: The ObjectBasedIndexer API

As an example, we show part of the implementation of the OfficeIndexer, the class that generates metadata from the MS Office properties, in Listing 5.2.

public class OfficeIndexer extends MimeBasedIndexer {

    /**
     * Class that generates metadata for an office document.
     */
    public AriadneMetadataWithConfidenceValue getMetadata(FileDataSource lo) {

        AriadneMetadataWithConfidenceValue metadata = ...;

        ...
        String subject =
            props.getProperty(OfficePropertyExtractor.PID_SUBJECT);
        if (subject != null && !subject.equalsIgnoreCase(""))
            metadata.addMainOrOtherConcept(new LangString("en", subject),
                AriadneMetadataWithConfidenceValue.REASONABLEDOUBT);

        String keywords =
            props.getProperty(OfficePropertyExtractor.PID_KEYWORDS);
        if (keywords != null && !keywords.equalsIgnoreCase(""))
            metadata.addMainOrOtherConcept(new LangString("en", keywords),
                AriadneMetadataWithConfidenceValue.REASONABLEDOUBT);
        ...
    }
}

Listing 5.2: Partial Implementation of OfficeIndexer

The implementation maps the MS Office properties extracted from the object by the OfficePropertyExtractor to metadata and associates confidence values with the resulting values. We discuss the different confidence values in section 5.2.3. The OfficePropertyExtractor is shown in Figure 5.1 as a subclass of Extractor and PropertyExtractor.

5.2.2 ContextBasedIndexer

Contexts typically describe the use of a learning object in a course, or represent information about the author of the learning object (profiles).


Figure 5.3: The Class Hierarchy of ContextBasedIndexer (with the subclasses ContentAggregationContextBasedIndexer, LMSContextBasedIndexer and LibraryContextBasedIndexer; LMSContextBasedIndexer has subclasses such as BBContextBasedIndexer, OCWContextBasedIndexer and WebCTContextBasedIndexer, and LibraryContextBasedIndexer has AACEContextBasedIndexer)

Currently, the framework focuses on the creation of context-based indexers for learning management systems or repositories specifically (Figure 5.3). The context-based indexers are also the classes that enable the integration of the dynamic life cycle and the indexing framework. For example, these indexers can be made aware of the different log formats of the LMSs or the repositories in order to analyze part of the actual use of the learning objects. Another example is the definition of the learning object's discipline from the course information available in the LMS. The latter is only possible in the integration phase of the life cycle and not during the authoring of the object.

To implement contexts related to a learning management system, we define the class LMSContextBasedIndexer. Other contexts are represented by other classes, such as library systems or administrative databases. Within the LMS category of classes, we have classes that generate metadata for a Blackboard document or an OpenCourseWare object. Basically, these classes analyze the consistent context that courses in both environments display.

For example, Blackboard maintains information about the user logged in (which is a reasonable candidate as the author of learning objects that are stored in the current session), about the domain that the course covers (a reasonable candidate for the domain of the learning objects that the course includes), etc. Similarly, the OCW web site is quite consistent in how it makes that kind of information available to end users. Our indexer for OCW basically mines this consistent structure for relevant metadata about the learning objects referenced in the course web site.

An example of a LibraryContextBasedIndexer is one for a digital library such as the AACE digital library. A library context differs from an LMS context in that an LMS is able to provide much more information about the specific learning context, whereas an object in a library is more isolated from other objects. The library typically exposes cataloging and author information about the objects.


The class ContentAggregationContextBasedIndexer and its subclasses allow the handling of learning objects within a content aggregation. This differs from the ArchiveIndexer among the object-based indexers in that this hierarchy uses the content aggregation as the information source for learning objects within the aggregate. For example, the third slide in a MS PowerPoint presentation can be considered a learning object for separate storage; the presentation itself and the surrounding slides of this learning object provide specific metadata. The expertise about this metadata lies within these classes. Because we have not defined any rules for these types of relationships in content aggregates, we have not further implemented these classes in the framework.

The definition of ContextBasedIndexer is similar to that of ObjectBasedIndexer, except that we have to identify the context using a utility class, called MetadatasourceId, because a context is usually not available as a single document that can be stored. The MetadatasourceId fully identifies a given source of metadata and makes it possible to read the data in it. In practice, every context is identified by its respective identification class, such as BBMetadatasourceId for Blackboard courses and OCWMetadatasourceId for OpenCourseWare courses. The specification of the ContextBasedIndexer is given in Listing 5.3.

public abstract class ContextBasedIndexer implements Indexer {

    /**
     * Sets the identification of the context given a MetadatasourceId
     *
     * @param metadatasourceId the id of the context
     */
    public abstract void setMetadatasourceId(MetadatasourceId metadatasourceId);

    /**
     * Generate the metadata that can be derived from the given
     * context, and return it as a metadata set with confidence values
     *
     * @return The resulting ARIADNE metadata instance
     */
    public abstract AriadneMetadataWithConfidenceValue getMetadata();
}

Listing 5.3: ContextBasedIndexer

The BBContextBasedIndexer (see also section 5.4) extends this abstract class for the generation of metadata in a Blackboard environment. Listing 5.4 shows a small part of this implementation. The code is used to obtain the author metadata from an LDAP server connected with the Blackboard environment, or from the Blackboard environment itself if the LDAP server is not available. The LDAP server is preferred because it contains more detailed information than Blackboard does.



public class BBContextBasedIndexer extends LMSContextBasedIndexer {

    public AriadneMetadataWithConfidenceValue getMetadata() {
        ...
        try {
            Person personBB = ...;
            Person personLDAP = ...;
            ariadneMetadata.addAuthor(personLDAP.getFamilyName(),
                personLDAP.getGivenName(), ..., orderNr,
                AriadneMetadataWithConfidenceValue.DOUBT);
        } catch (Exception e) {
            ariadneMetadata.addAuthor(personBB.getFamilyName(),
                personBB.getGivenName(), ..., orderNr,
                AriadneMetadataWithConfidenceValue.REASONABLEDOUBT);
        }
        ...
    }
}

Listing 5.4: Partial Implementation of BBContextBasedIndexer

5.2.3 Confidence Values

The framework implements the idea of fuzzy metadata, discussed in the formal metadata model, for the generated metadata. Every indexer returns, for each value, a confidence value that is appropriate for that value.

We have based our implementation on the ARIADNE metadata application profile. Therefore, we introduced the confidence value in the definition of this metadata class. The resulting class is called AriadneMetadataWithConfidenceValue, which includes a confidence value for each facet in the metadata profile. The class allows every metadata element defined in the ARIADNE application profile to be added together with a confidence value. We show two such methods as an example in Listing 5.5.

To facilitate the easy expression of confidence values, we introduced several constant values:

CERTAIN = 1

PROBABLE = 0.8

REASONABLEDOUBT = 0.6

DOUBT = 0.4

STRONGDOUBT = 0.2

NOOPINION = 0

These values can be compared to the values defined by McAllister [Chassell, 1997b], although our current implementation uses hedges related more to probabilities than to fuzziness:


strongly or highly suggestive 0.8

suggestive 0.6

weakly suggestive 0.4

slight hint 0.2

Of course, the indexers are free to associate any value in the range [0..1] with the generated facet value. In many situations the indexers will be tested manually and the confidence value derived from those tests. Typically, a test set of learning objects will be used to check how accurate the indexer algorithm is. In other situations, the confidence value might be derived automatically by the indexer, using parameters available in its calculations.
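
The following is an illustrative sketch only, showing one way a confidence value could be derived from such a manual test on a labelled set of learning objects; the class and method names and the clamping policy are assumptions, not part of the framework.

public final class ConfidenceEstimator {

    // Accuracy on the test set, clamped into the allowed range [0..1].
    public static float fromTestResults(int correctValues, int totalValues) {
        if (totalValues <= 0) {
            return 0.0f; // corresponds to NOOPINION
        }
        float accuracy = (float) correctValues / totalValues;
        return Math.max(0.0f, Math.min(1.0f, accuracy));
    }

    public static void main(String[] args) {
        // e.g. a language indexer that labelled 41 of 50 test objects correctly
        System.out.println(fromTestResults(41, 50)); // prints 0.82, close to PROBABLE
    }
}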

class AriadneMetadataWithConfidenceValue {

    /**
     * Adds a value for document language to the set of metadata
     *
     * @param documentLanguage The language to be added
     * @param cv The confidence value for this metadata value
     */
    public boolean addDocumentLanguage(String documentLanguage,
        float cv);

    /**
     * Adds a value for Granularity to the set of metadata
     *
     * @param ls The granularity as a LangString element
     * @param cv The confidence value for this metadata value
     */
    public boolean addGranularity(LangString ls, float cv);

    ...
}

Listing 5.5: AriadneMetadataWithConfidenceValue

5.3 The Simple Indexing Interface

The Simple Indexing Interface (SII) is the initial implementation of a service-oriented architecture for automatic indexing of learning objects. In essence, it is an application programmer's interface for implementing the services for automatic indexing.

This interface is being developed as part of the research on the development of a global framework for learning object web services. The first initiative in this context has led to the development of the Simple Query Interface standard [Simon et al., 2004], a definition of web services that enables querying learning object repositories in a standardized way. Our specification for the indexing interface closely relates to SQI and uses the same design principle: the specification should be simple and easy to implement. The work on SII is being continued, focusing on the interoperability with learning object repositories.


Currently, this work is called the Simple AMG Interface (SAmgI), but it is not part of the research described here. In the conclusions of this chapter, we discuss the reasons why a second version of the framework has been implemented.

Figure 5.4 shows the process of indexing a learning object. The following steps are shown in this schema:

a) The user identifies the object to be indexed and the contexts within which the object resides and which are to be used by the automatic indexing process.

Note that for every learning object, multiple identification objects may be instantiated. The learning object itself is identified, but every context in which the object is used may be used for indexing as well.

The example of Figure 5.4 shows a learning object on the upper left and two courses in electronic learning environments, one Blackboard course and one OCW course, both including the same learning object.

b) The identifying objects defined in the previous step are then fed to the system, which uses the identifiers to create the correct Indexer instances for the metadata generation. The decision on which indexers are applicable for the document is based on the file type (for example MS PowerPoint) or defined by the context that is provided.

As mentioned above, this part of the framework is also configurable for every type of object, which makes it easy to extend when new indexers become available in the system.

In the example, three indexers are shown, one for each of the identification objects.

c) For each Indexer instance associated with the learning object, the system sends a request to create metadata for that object.

d) As described in section 3.3, we need conflict resolution between different metadata instances. For now, we only implement strategy 3: working with degrees of confidence for generated metadata values, and choosing the one(s) with the best confidence value. This is implemented by the MetadataMerger and AriadneMetadataWithConfidenceValue classes. The latter class represents the metadata instance, with an associated confidence value for each metadata element.


Figure 5.4: Information Flow in the AMG Framework (a learning object in a file system, a Blackboard course and an OCW course are identified by MetadatasourceId objects (a); these are mapped to an ObjectBasedIndexer, a BBContextBasedIndexer and an OCWContextBasedIndexer (b); each indexer produces an AriadneMetadataWithConfidenceValue metadata set (c); the MetadataMerger combines these sets into a single record (d))


When a metadata element is associated with the instance, the confidence values are checked. Only if the confidence value of the new value is higher than that of the previously added value does the new one replace the old one; otherwise, the old value is preserved. The MetadataMerger class can merge two existing metadata instances into one according to the same strategy.
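
A minimal, self-contained sketch of this highest-confidence-wins rule is given below. The real framework applies the rule to AriadneMetadataWithConfidenceValue instances; the generic map and the class names used here are only illustrative.

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: keep, per metadata element, the value with the highest confidence.
public class HighestConfidenceMerger {

    public static final class ValueWithConfidence {
        final String value;
        final float confidence;

        ValueWithConfidence(String value, float confidence) {
            this.value = value;
            this.confidence = confidence;
        }
    }

    private final Map<String, ValueWithConfidence> record = new HashMap<>();

    // Adds a candidate value for an element; it only replaces an existing
    // value when its confidence is strictly higher.
    public void add(String element, String value, float confidence) {
        ValueWithConfidence current = record.get(element);
        if (current == null || confidence > current.confidence) {
            record.put(element, new ValueWithConfidence(value, confidence));
        }
    }

    // Merges another record into this one using the same strategy.
    public void merge(HighestConfidenceMerger other) {
        for (Map.Entry<String, ValueWithConfidence> entry : other.record.entrySet()) {
            add(entry.getKey(), entry.getValue().value, entry.getValue().confidence);
        }
    }
}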

The Simple Indexing Interface wraps a web service around this automatic indexing in order to index learning objects in a session-based environment that is configurable according to the needs of the user. The different methods that offer this functionality are given in Table 5.1.

The SII methods are classified into three categories: session management, session configuration, and learning object indexing.

• The session management methods allow the user to start and end a session with the indexing service. Sessions are implemented for the sake of the configurability of the interaction; users can configure several parameters according to their needs on a session basis.

• Session configuration is performed to obtain results according to the user's needs. The options the user can control are metadata language, conflict handling method and metadata format. Of course, every implementation is responsible for implementing the configuration parameters; possibly, for example, only one conflict handling method could be supported, or only one metadata schema in which to request metadata.

• The indexing methods are the most important methods, because they allow the user to request metadata for a given learning object. Metadata can either be requested in a metadata schema format, such as DC or ARIADNE, or as a raw XML document.

A typical course of action when using the Simple Indexing Interface is illustrated in Figure 5.5. The sequence diagram shows the use of the three categories of methods. First the session is established (step 1) and then configured (steps 2 and 3). In step 4, metadata for a given learning object is requested. This request is handled internally as described above. Finally, when the result is obtained and no more learning objects are to be indexed, the session is finalized (step 5).
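
To make this interaction concrete, the sketch below shows how a client might drive the interface. The Java interface declared here is a guess at the shape of SII, reconstructed from Table 5.1 and Figure 5.5; in reality SII is exposed as a web service, sessions are identified by the token returned from startSession, and learning objects and contexts are identified by MetadatasourceId objects rather than plain strings.

// Assumed shape of the Simple Indexing Interface, for illustration only.
interface SimpleIndexingInterface {
    String startSession();
    void setMetadataLanguage(String session, String language);
    void setMetadataFormat(String session, String format);
    String getMetadataXML(String session, String[] metadatasourceIds);
    void endSession(String session);
}

class SiiClientExample {

    // Follows the steps of Figure 5.5 for a single learning object.
    static String indexOneObject(SimpleIndexingInterface sii) {
        String session = sii.startSession();                     // 1: open a session
        sii.setMetadataLanguage(session, "en");                   // 2: configure the session
        sii.setMetadataFormat(session, "ARIADNE");                // 3: other configuration calls
        String metadata = sii.getMetadataXML(session,             // 4: request metadata for the
            new String[] { "file:slides.pdf", "bb:courseXYZ" });  //    object and its contexts
        sii.endSession(session);                                  // 5: close the session
        return metadata;
    }
}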

5.4 Evaluation

In order to evaluate the possibilities of the framework, we implemented a context-based indexer for the Toledo environment of the K.U.Leuven and indexed several learning objects using a combination of this context and object-based indexers.


Figure 5.5: Typical SII Sequence Diagram (between a client and the SII indexer: 1: startSession(): String; 2: setMetadataLanguage(String, String); 3: other configuration calls; 4: getMetadata(MetadatasourceId[]), which internally performs 4.1: createIndexer(MetadatasourceId) for every MetadatasourceId and 4.2: getMetadata(): AriadneMetadataWithConfidenceValue; 5: endSession(): String)


Session handling methods: startSession, endSession

Session configuration: setMetadataLanguage, setConflictHandlingMethod, setMetadataFormat, getSupportedConflictHandlingMethods, getSupportedLanguages, getSupportedMetadataFormats

Learning object indexing: getMetadata, getMetadataXML

Table 5.1: Methods of the Simple Indexing Interface

The tests that we have performed must be seen as a proof of concept rather than a formally conducted evaluation. A more formal evaluation is currently being performed in the context of the SAmgI framework. In section 5.4.4 we refer to related experiments in order to show the possibilities of automatic indexing.

5.4.1 ToledoBBContextBasedIndexer

In our indexing framework we implemented a component ToledoBBContextBasedIndexer as a subclass of BBContextBasedIndexer. Toledo is the electronic learning environment implemented at the K.U.Leuven; Blackboard has been integrated in this environment as a learning management system. The ToledoBBContextBasedIndexer implements the context for Blackboard in the Toledo environment. This indexer generates all possible metadata that can be derived from the Blackboard LMS context of a learning object. To retrieve this information we could make use of:

• The file system Blackboard uses internally. For example, all documents of course "XYZ" reside in a directory "XYZ".

• The database used internally by Blackboard to manage all the data.

• The Blackboard API. Blackboard offers a Java API on top of the database and file system. This API allows one, for example, to retrieve all announcements, all staff information and all course documents.

Using these means for metadata generation, the Toledo indexer generates values for the metadata elements shown in Table 5.2.


ARIADNE Metadata Element
1.2 General - Title
3.2 Meta-metadata - Contribute (creator)
3.4 Meta-metadata - Language
4.1 Technical - Format
4.5 Technical - Installation Remarks
5.2 Educational - Learning Resource Type
5.5 Educational - Intended End User Role
6 Rights
9 Classification (main discipline, sub discipline, concept)

Table 5.2: Metadata Elements Generated by the Toledo Context

The object-based indexers for the different learning objects used in this test can generate values for several additional metadata elements (Table 5.3).

Metadata Element
1.2 General - Title
1.3 General - Language
2.3 Life Cycle - Contribute (author)
4.2 Technical - Size
5.1 Educational - Interactivity Type
5.2 Educational - Learning Resource Type

Table 5.3: Metadata Elements Generated by the Object-based Indexers

Of course, Table 5.3 can vary according to the type of object, because some objects contain more information than others. For example, image objects contain size information, possibly also the date when a picture was taken, and so on. Hypertexts can contain META tags with a lot of information, such as title, author, keywords, etc.

There are several issues concerning the suggested values that are generated automatically:

• The pedagogical duration is difficult to determine. A possible heuristic is to consider it as a function of the number of pages. For example, a MS PowerPoint document containing 30 slides can be considered to have a duration of 60 minutes. This is, however, not a robust approach: one person can take 2 hours for 30 slides while another person can do it in 20 minutes. To solve this issue, we can derive it from a context like the electronic learning environment.


If that environment supports the notion of a "lesson", we can deduce the pedagogical duration from the context. For instance, if a lesson takes 2 hours and contains 2 documents, we could take a pedagogical duration of 1 hour for each of them.

• The authors: when a user is logged in and uploads a document, it is known who does this, so during the document upload we have the information about the author. However, in our case we were processing already existing documents at a later moment, and unfortunately the Blackboard system does not record who uploaded a document. We have solved this issue by taking the instructors of the course as the authors of the document. For courses with only one instructor, this is probably correct in the majority of the cases. Even when a teaching assistant uploads a file, it may still be acceptable to take the instructor as the author.

• The accuracy of the fields concerning top-level classification (ScienceType, MainDiscipline and SubDiscipline) depends on the way course numbers or identifiers are chosen. For Toledo, the classification is determined fairly well for a considerable part of the documents. For example, in Toledo all courses in the field of Philosophy start with a "W", so for all documents used in such a course we can take "Human and Social Sciences" as ScienceType, "Human Sciences" as MainDiscipline and "Philosophy" as SubDiscipline (a small illustrative sketch of this heuristic follows after this list). However, this highly depends on the particular Blackboard configuration. A possible extension and solution for this is to take the instructor information into account: for example, instructors who work at the department of Computer Science produce documents about computer science.

• The lowest-level classification element (MainConcept) more or less represents keywords for the document. Here we have several options, among which the label of the file within Blackboard or the name of the directory in which the file resides. For example, for the course "Multimedia", one of the topics is "XML". The course documents about that topic could be structured in folders like "Multimedia > XML > Introduction". Then it would be a reasonable approach to take "Multimedia", "XML" and "Introduction" as main concepts, as they are all relevant keywords. The assumption then is that instructors make folders with relevant names. A possible extension in the future would be to use other techniques, like AI approaches, for automatic keyword extraction.

• To determine the language of documents, we use a Java library. This seems to do a good job if there is enough text to process, so for determining the documentLanguage field it seems to be a viable solution. For now, we are also using it to determine the language of, for example, the MainConcept. The only problem is that this consists of only one or a few words, which makes the job of determining the language a lot harder. Because in the particular case study of Toledo we know that the courses are in English or Dutch, we have limited the possible languages to "en" and "nl". But even then, it is often not correct.


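The sketch below illustrates the course-code heuristic mentioned above: the first letter of a course identifier is mapped to a classification triple. Only the "W" entry comes from the text; the class name, method and any further prefixes would have to be configured per institution and are purely hypothetical.

import java.util.Map;

// Illustrative sketch of a course-code prefix classifier (not part of the framework).
public class CourseCodeClassifier {

    // Prefix table: first letter of the course id -> {ScienceType, MainDiscipline, SubDiscipline}.
    private static final Map<Character, String[]> PREFIXES = Map.of(
        'W', new String[] { "Human and Social Sciences", "Human Sciences", "Philosophy" }
        // further prefixes would be added per institution
    );

    // Returns the classification triple, or null when the prefix is unknown
    // and no classification should be suggested.
    public static String[] classify(String courseId) {
        if (courseId == null || courseId.isEmpty()) {
            return null;
        }
        return PREFIXES.get(Character.toUpperCase(courseId.charAt(0)));
    }
}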

5.4.2 Results

For this test, we have taken 16 learning objects from the course "Beginselen van Programmeren", containing mainly (PDF) handouts of presentations used in the lectures. The objects we have taken were available in the ARIADNE Knowledge Pool System together with a manually created metadata record, and were included in the Toledo environment associated with this course. For all the objects, we generated metadata automatically using the indexers described above and compared those values with the values in the ARIADNE record.

Table 5.4 shows the comparison between the automatically generated metadata and the manual record for one learning object in this test. Of course, we cannot generalize conclusions from this single example, but the results for all objects were comparable to this one. The table shows some typical differences in the metadata values.

We split up the table into two parts. The first part contains the metadata fields for which the difference is understandable or not important.

• For example, "expositive" and "Expositive" were regarded as being different by the comparison tool, but the only difference is the capitalization.

• Another difference concerns the creation date and publication date. However, this is normal too, because of our way of working: the same document was at some moment inserted in the ARIADNE KPS (hence the publication and creation date of 28 October 2003) and at some later time in Toledo (hence the publication date of 2 February 2004). The metadata creation date for the Toledo version was the time at which we ran our framework on the document.

• Then, concerning the data about the author, the automatic values are also more correct, as the framework contacts the K.U.Leuven LDAP server to retrieve the information, whereas the manual data was entered by hand and is therefore liable to errors.

• The title also differs. Moreover, the language of the title field differs, and in this particular case the automatic value is the correct one. Probably, the title language has been entered incorrectly by the human indexer.

The second part of the table contains more pertinent differences.


• The most important one is perhaps the main discipline. As we mentioned above, the automatic value is not always as accurate as we would want. Currently, the value depends on the course structure in the Blackboard system. The definition of the main discipline depends on this manual course identification made by the administrators; if this system changes, the classification will no longer function correctly.

• Concerning the document language, the chosen course is a bit particular, as it is a course about programming. As a consequence, it consists of pieces of explanation (in Dutch) and pieces of Java code. Therefore, both values are in a way correct, although one can argue that the explanation is most important in determining the language, and that the automatic value is therefore more correct. But we must admit that this is certainly not always the case: for some of the documents of the same course, the automatic value was "fy" (Frisian) and thus less accurate than the manual one.

• The documentFormat is a tricky issue. The document is a PDF, but it is actually the conversion of a PowerPoint presentation to PDF. Generally, the framework considers PDF documents as texts and does not look any further to see whether another value would be more appropriate.

5.4.3 Further Evaluation

In [Meire et al., 2007] a more rigorous evaluation of the framework is performed. In this experiment, 22 expert metadata reviewers were asked to evaluate the quality of manually and automatically generated metadata. The set of metadata records reviewed contained 10 objects for which the metadata was generated automatically (randomly chosen from a set of 114 learning objects with automatically generated metadata) and 10 learning objects that have a manually created record (out of a total of 425 objects about information technologies in the ARIADNE KPS).

Using the quality assessment framework described in [Bruce and Hillman, 2004], the metadata records are evaluated according to seven parameters: completeness, accuracy, provenance, conformance to expectation, logical consistency and coherence, timeliness and accessibility (see http://ariadne.cti.espol.edu.ec/Metrics/instructions.jsp for a description of these parameters in this evaluation). Figure 5.6 shows the average quality grade for these seven parameters for the automatically generated metadata and the manually created records.

On average, the automatic values score 0.4 points higher, but the significance of this difference is not high enough to claim that those values are indeed better.



Metadata Field | Value generated automatically by the AMG framework | Value inserted manually

Manual and automatic value differ, but this difference is normal, understandable and/or not meaningful or not important:
DocumentType | expositive | Expositive
PackageSize | 83.9 | 84
PublicationDate | 02-02-2004 | 28/10/2003
CreationDate | 29-10-2004 | 28/10/2003
OS Type | Multi-OS | MS-Windows
AccessRight | private | Restricted
Author/PostalCode | B-3001 | 3001
Author/Affiliation | K.U.Leuven | KULeuven
Author/City | Heverlee | Leuven
Author/Tel | +32 16 327538 | /
Author/Departement | Afdeling Informatica | Computerwetenschappen
Author/Email | Henk.Olivie@cs... | ...
.../Type | multiValued | String
Title | Werken met Java | Praktisch werken met Java
Title/Lang | nl | en

Manual and automatic values differ; it has to be investigated whether this matters for 'quality':
MainDiscipline | Civil Engineering/Architecture | Informatics/Information Processing
DocumentLanguage | nl | en
DocumentFormat | Narrative Text | Slides

Table 5.4: Comparing Automatic and Manual Metadata

For two parameters, the difference is significant enough, although the results cannot be generalized from these two. All the AMG records came from the same source, which makes them very reliable; the manual records were chosen randomly, and thus the value of their provenance is lower. With respect to timeliness, we have to remark that the different reviewers did not agree on the evaluation of this parameter for the metadata records (the intra-class correlation between the reviewers was 0.67, which is too low to indicate agreement on the evaluation).

Overall, we can conclude, for the test performed here, that there is no statistical difference between the quality grades given to the manually created metadata records and the automatically generated records. In other words, using the framework for the creation of metadata will not degrade the quality of the metadata records in the ARIADNE repository.


Figure 5.6: Average Quality Grade for the Metadata [Meire et al., 2007]


5.4.4 Related Experimentation

[Greenberg et al., 2005] analyzes the quality of automatic metadata generation based on the expected accuracy for DC elements expressed by experts. These expectations are shown in Figure 5.7. The scale used ranges from 3 (very accurate) to 1 (not very accurate). Greater accuracy is expected for technical metadata elements, and less accuracy is expected for subjective metadata or elements requiring more intellectual discretion.

Experiments indeed show that these expectations are reasonable, but also that good results can be obtained in specific situations:

• [Nadkarni et al., 2001] show that progress has been made in domain-specific automatic indexing, but that the results for subject or description metadata are still not accurate enough for production use in general-domain collections.

The experiments used the Unified Medical Language System Metathesaurus to index narrative texts. For the training set, the authors used 100 documents and identified the concepts in those documents; 82.6 percent of 8,745 concepts were identified correctly (true positives). For a test set of 24 documents, 1,701 concepts were identified, of which 76.3 percent were correct.


Figure 5.7: Expected Accuracy Level for Automatic Generation of Dublin Core [Greenberg et al., 2005]


• [Saini et al., 2006] describe an automatic classifier for learning objects and the use of ontologies to associate metadata with the classified learning object. In their experiments, they used three repositories and two classification algorithms to classify learning objects. The results of the micro-F1 measure (testing both precision and recall) range from 31 percent to 69 percent.

• [Han et al., 2003] and [Kim and Seamus, 2006] describe the use of semi-structured document analysis for the automatic generation of metadata elements such as title, author or publisher. A typical example of such a document is a conference paper, which mostly contains the author information in the header section of the paper. The accuracy and precision of these algorithms are very good, which makes it possible to integrate such techniques into an automatic indexing framework. For example, the automatic generation of the author element resulted in a precision of 99.3 percent, and the recall for this element was 98.4 percent.

• In [Li et al., 2005], IBM's MAGIC tool (see further) is described and evaluated for two sets of objects. The first set contains reference books, video captures of seminars and recorded class lectures. The second set contains 172 text documents (PDF, MS Word or HTML).

Although the actual use of the video analysis is rather limited at the moment, the system performance is rather good. Currently, the system segments videos into three types: narrator, informative text and linkage segments. Overall, the system identifies these segments correctly; in some situations, such as very small faces in the video or fast-moving cameras, the performance degrades.


With respect to the text analysis, the authors evaluated nine test documents from the second set. For this test, they asked evaluators to score the automatically generated metadata from 1 (very poor) to 5 (very good). For the elements title, keywords and summary, the scores are 3.53, 3.72 and 3.69. These results are rather good, especially for a subjective element such as summary. In this test set, the results for authors and publishers are rather poor (2.58 for names and 2.83 for places).

5.5 Related Work

In the discussion on automatic indexing, we already argued that it closely relates to current research on advanced search and indexing tools, such as Google Desktop or Beagle. In section 3.2.2 we compared the different indexer components of these frameworks with the metadata generators of our framework. In this section we compare their overall architecture with that of the AMG framework.

• IBM MAGIC

MAGIC (Metadata Automatic Generation for Instructional Content) is a system for automatic metadata generation developed at IBM [Li et al., 2005], initially focusing on audio/video content. Figure 5.8 shows how different analyzers generate metadata for multiple types of objects.

The upper part of the figure shows components that analyze the objects. The objective of this system is the same as ours, i.e. integrating different analyzers to generate metadata. The lower part of the image shows the integration part that creates SCORM metadata and packages for the indexed object.

The MAGIC system and the automatic indexing part of our framework do not differ much. Both systems can be extended with content analyzers and generate a metadata record for the given object. The main differences between the two are:

– MAGIC is probably less application-dependent, as it creates SCORM-compliant packages. Our framework is developed in close connection with a repository, the ARIADNE system, which makes it somewhat less general, although this drawback should be resolved in the SAmgI framework once a full service-oriented framework is defined. The SAmgI (Simple AMG Interface) framework is the implementation made as part of further research on automatic metadata generation.


Figure 5.8: IBM MAGIC’s object analyzers and metadata integretation [Li et al.,2005]

– The MAGIC system focuses on a single indexing phase, generating metadata for an object only once. As discussed earlier, our framework also supports the dynamic character of metadata.

• BEAGLE

Indexing in the Beagle search tool is performed in two sequential steps, shown in Figure 5.9. In the first step, the backend gets the object to be indexed from a data source and creates some backend-specific metadata for that object. In Chapter 3, we already compared these backends with our context-based indexers, which generate metadata based on the context in which the learning object resides. In the second step, a filter processes the contents of the object in order to generate metadata from those contents.


Figure 5.9: Beagle’s Indexing Process

In Beagle, both the contents as a whole, i.e. the textual contents together with their structure (HTML tags, EXIF metadata, ...), and the textual contents by themselves are considered to create metadata. Filters correspond to the MIME-based indexers we defined above. For every supported MIME type, a corresponding filter is implemented. If a MIME type is not supported, Beagle does not perform a contents-based indexing of the object; only the backend generates metadata for it.

We notice two important differences between our framework and Beagle. In Beagle, the filters seem to be the most important metadata generators, which makes it less flexible to configure when new information can be gained from other sources for a certain object. In our framework, the indexing process is divided across different indexers, and for every object multiple indexers can be configured to generate metadata. This is not the case in Beagle.

Second, although we compared a backend with a contextual indexer, Beagle only supports one such backend per object. Again, this is less flexible than the use of multiple contexts for one learning object, each providing metadata related to its own information.


In our framework, one learning object can reside in different contexts (for example, two learning management systems), and these can be treated as the same object. In Beagle, the data sources bound to two backends are not considered the same object.

5.6 Conclusions

In this chapter we have described a framework for automatic indexing of learning objects, based on the information sources given in the previous chapters. Without going into the details of the implementation, we have shown how the automatic indexing process is executed using indexers, either object-based or context-based, and conflict resolution mechanisms.

The case study that we have performed shows an interesting result with respect to the capabilities of automatic indexing: to obtain valuable information about a learning object, a context is an invaluable source of metadata. Without the use of contexts, only a limited number of metadata elements can be defined; with the use of a context, an extensive set of metadata can be generated.

Linking this result to the dynamic life cycle of a learning object, we stress the importance of labeling in every phase of that cycle, especially in the integration and using phases, because of the valuable information available there.

The work on this framework is being continued because of several limitations of the first implementation (see also [Meire et al., 2007]):

• Currently, the framework is closely bound to the ARIADNE LOM application profile, which limits its use to the ARIADNE system. This limitation should be resolved by supporting any binding of the LOM element set. In SAmgI, a standard implementation for representing LOM records is used, which resolves this problem.

• The architecture is updated to better support a service-oriented framework and a stand-alone implementation.

• The current implementation of the confidence values and conflict resolution mechanisms is rather difficult to extend.


Chapter 6

General Conclusions

In this chapter we review the results of this work and discuss possibilities for further research and development.

6.1 Summary

In general, we can summarize the contributions to the field of learning object reusability as follows:

1. Dynamic Life Cycle
First of all, we have defined a new way of looking at the life cycle that learning objects go through. We have called this the dynamic learning object life cycle, as it considers the learning objects in a more dynamic environment than is usually the case. In this environment, information about the learning objects can be generated in every phase of the life cycle.

This dynamic life cycle contrasts with the traditional static life cycle, which contains a single labeling phase after the authoring of the object and right before the object is stored in a repository. In the new life cycle, much information about the learning object becomes available in the other phases, such as effective use information or context information. As an example, we mentioned the discipline element, which can be obtained during the integration phase, when the learning object is placed in a course environment.

In a broader context, this dynamic life cycle closely relates to the current trends on the World-Wide Web, or Web 2.0, with respect to the creation of content. Web content is also considered to exist in a dynamic environment.


Not only can the content itself be updated continuously, but new information can also be added constantly as metadata to that content, and this is done voluntarily by many people. An important result of this is the emergence of folksonomies. Folksonomies are rooted in the Web 2.0 principle of user-generated content, in which users work together to create content and continuously update (each other's) information. In this way, web content exhibits a dynamic life cycle similar to the learning object life cycle. The main difference with the learning object life cycle is that the web content life cycle is not explicitly defined, but results from the collaborative authoring paradigm readily applied on the web. Until now, this approach has not been adequately tested for learning objects.

Furthermore, folksonomies reveal another issue related to metadata. Folk-sonomies are opposed to closed classification systems which apply controlledvocabularies to define the values instead of tags that allow every possiblevalue. The main criticism for open tagging, is its unreliability and incon-sistency [Golder and Huberman, 2005]. Furthermore, it is stated that thisopenness invites for meta noise which decreases the usefulness of the meta-data [Bateman et al., 2006]. The drawbacks of the closed systems are thecost in applying the controlled vocabularies and the difficulty in learning tocategorize objects correctly. The automated metadata generation frameworkthat we have developed tries to overcome these problems for a closed learn-ing object metadata system.

However, folksonomies can be considered a key to a real semantic web, because they have been shown to work in many applications. Therefore, we think that the use of folksonomies or similar methods can be beneficial for learning objects too, although this should also be the subject of scientific experiments. In a traditional life cycle, learning object metadata are typically created based on a standard schema, such as IEEE LOM, which suits the type of information added. In the dynamic life cycle, however, the type of information that becomes available may not fit that well in such a schema. In that case, an open system can help to overcome the problem of adding metadata. In the dynamic life cycle, especially the phases related to using a learning object are suited to such a system.

Although the dynamic life cycle for learning objects and their metadata seems very promising, we have to admit that there have been no real integrated experiments for this idea. Most experiments consider an isolated component for metadata generation. For example, the framework we have described has mostly been tested in a static environment, i.e. during the offering phase. Tests performed by [Najjar et al., 2005] and [Verbert et al., 2005a] also focus on a single phase, respectively the using phase and the repurposing phase of the life cycle.


The framework that we have developed (see below), however, offers the possibility to use different types of information sources, in order to allow this integration with the dynamic life cycle.

2. Automatic Metadata Generation
Second, we have developed an automatic metadata generation framework. Automatic metadata generation can assist in the creation of good-quality metadata. In this way, it helps with the difficult and time-consuming task of indexing, for which instructors often do not have the time or knowledge. Two aspects of this framework are important:

(a) The metadata generation method proposed in this text corresponds closely to the dynamic life cycle. The method consults different sources of information that become available in the dynamic life cycle. In a static life cycle, only object analysis can be used as a source of metadata. In the dynamic life cycle, other information sources can be used to obtain metadata too. Examples of these sources are the courses in which the learning object is used, attention metadata from actual use by learners, and explicit feedback mechanisms from users, both trainees and instructors.

In Chapter 3, we have explained these different metadata sources by referring to the world-wide web. Many advanced web applications, such as e-commerce systems and search engines, apply several of these sources too in the analysis of the content or the users' interactions. In our framework, we try to apply similar technologies in a specific context (learning objects), which allows us to add more semantics to the metadata values. Compared to general search engines, these semantics help to answer more detailed search queries. Search engines often apply only keywords and little semantics, such as title or publication date, to answer queries. For example, Google Scholar allows searching for publications of specific authors, words in the title or body, and categories.

(b) The second contribution of the automatic metadata generation framework is the introduction of a confidence value for learning object metadata. This confidence value serves two goals. First, it highlights the (un)certainty about metadata. The metadata provider is not always completely sure about a value; the confidence value helps to express this uncertainty. Second, we apply the confidence value to resolve conflicts between the multiple sources of metadata, i.e. to define the most appropriate value. This is needed because several sources can provide metadata during the life of the learning object, but the values they provide can conflict with one another.


At this moment, the definition of the confidence value is introduced ad hoc. We have referred to fuzzy logic as the inspiration for this definition, but a formal link between the two has not been made. We think that this link, such as the definition of different axioms for the confidence value, should be investigated.

6.2 Further Research Topics

Some of the ideas and results presented also introduce new questions or options for further research and testing. We discuss some of these below.

Dynamic Life Cycle

Currently, the learning object life cycle is defined by each research project itself; no standard definition exists that covers all the needs or allows those needs to be defined within the standard. In principle, the dynamic life cycle is an ad hoc definition too. It introduces a new aspect within our context: the labeling phase running in parallel with the other phases.

We think that learning objects could benefit from a standardization effort for this life cycle. Similar to the result of the metadata standardization effort, learning object repository development would benefit from a standardized life cycle. For example, the development of architectures such as the Simple Indexing Interface or the Simple AMG Interface could refer to the standard life cycle.

Context-awareness

In the formal model, we have introduced context-awareness for metadata.

• In the definition of context-awareness, we have considered the contexts themselves as black boxes hiding their internal structure. For the current definition, this internal structure is not important. Further research on context-awareness, however, cannot neglect the importance of the context and its internal structure and contents. We have already mentioned one example of this aspect: context reification, when a context is a content aggregate and thus a learning object itself. Possibly, the latter is even the simplest case, because learning objects have been studied rather well and thus might be relatively easy to use as contexts.

Generally, context-awareness in information systems takes into consideration the Who's, Where's, When's and What's of entities and uses this information to determine why a situation is occurring [Dey, 2000].


The generalization of context-aware metadata to these four W's has to be investigated further.

• Context-awareness introduces other aspects that need further investigation. For example, in Chapter 4 we have mentioned context-aware propagation of metadata as an extension of the rules given.

• Another aspect is the definition of equivalence of contexts, which is impor-tant to implement context-aware metadata systems. In other words, how canwe make sure that equal contexts are not treated as being different or un-equal contexts as being the same? In order to allow the implementation ofcontext-aware repositories or query mechanisms, this in an important aspectto solve. Without an appropriate definition, for example, a lot of metadatawill become duplicated because they appear to refer to different contexts. Inpractice, however, the contexts would be equivalent, which does not implyduplicate metadata.
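To make the equivalence problem tangible, the sketch below is a purely illustrative Java example; the class, the four-W fields and the normalization rules are assumptions and are not part of the formal model. It represents a context whose identity is based on normalized components, so that two slightly different descriptions of the same course environment are recognized as the same context.

```java
import java.net.URI;

/**
 * Illustrative only: a context reduced to the four W's, with its identity based on
 * normalized components so that two descriptions of the same course environment
 * (e.g. the same URL with or without a trailing slash) are treated as equivalent.
 */
record Context(String who, URI where, String when, String what) {

    /** Factory that normalizes the components before the record's equals()/hashCode() use them. */
    static Context of(String who, String where, String when, String what) {
        String w = where.trim().toLowerCase();
        if (w.endsWith("/")) {
            w = w.substring(0, w.length() - 1);
        }
        return new Context(who.trim(), URI.create(w).normalize(), when.trim(), what.trim());
    }
}

class ContextDemo {
    public static void main(String[] args) {
        var a = Context.of("prof. X", "https://LMS.example.org/course/ABC/", "2006-2007", "lecture");
        var b = Context.of("prof. X", "https://lms.example.org/course/abc", "2006-2007", "lecture");
        // Without normalization these would be considered different contexts,
        // and metadata attached to them would be duplicated.
        System.out.println(a.equals(b)); // true
    }
}
```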

Advanced Integration with Dynamic Life Cycle & Conflict Resolution Mechanisms

The current implementation of the framework contains several aspects that may need further attention:

1. Currently, the framework implements a limited number of indexers, mainly focusing on object-based indexers. With respect to usage information, for example, no indexers have been integrated yet. In order to fully support automatic metadata generation related to the dynamic life cycle, additional indexers should be developed. Also, further analysis of how the framework can be integrated with this life cycle support must be performed. The current framework mainly focuses on the offering phase of the life cycle; the use of the framework in other phases, for example for log analysis, has not been tested adequately.

2. With respect to conflict resolution mechanisms:

(a) The framework uses a straightforward definition of conflict resolution, selecting the value of the most confident indexer. However, this is not necessarily the best value to select.

(b) In the specific case of multi-valued metadata elements or ordered list elements, conflicts can become complex to solve (one possible merge strategy is sketched below).
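For the multi-valued case, the following minimal Java sketch (again with hypothetical names, not the thesis implementation) shows one conceivable strategy: identical keywords proposed by several indexers reinforce each other, and only sufficiently confident keywords survive the merge.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One keyword proposed by one indexer, with the indexer's confidence. */
record Keyword(String text, double confidence) { }

final class KeywordMerger {
    /**
     * Merge keyword proposals from several indexers: identical keywords reinforce
     * each other (their confidences are added), and only keywords whose accumulated
     * confidence reaches the threshold are kept.
     */
    List<String> merge(List<List<Keyword>> proposals, double threshold) {
        Map<String, Double> accumulated = new LinkedHashMap<>();
        for (List<Keyword> indexerOutput : proposals) {
            for (Keyword k : indexerOutput) {
                accumulated.merge(k.text().toLowerCase(), k.confidence(), Double::sum);
            }
        }
        return accumulated.entrySet().stream()
                .filter(e -> e.getValue() >= threshold)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```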

6.3 Final Reflection

As a concluding remark, I think we can state that the results obtained in this work can really help in tackling the problems regarding learning objects and their metadata that are mentioned throughout this text. One of the main advantages of automatic metadata generation is the possibility to hide the metadata from the end user and make their use transparent. Searching for learning objects can become as easy as searching for other information on the web; the query results, however, will be based on quality metadata that have become available automatically in different phases of the life cycle. Furthermore, the dynamic life cycle defines explicitly which information can be applied as metadata, and thus adds more valuable information to the metadata.

In my opinion, the practical use of learning objects has shown the need for this transparency in order to effectively embed the idea of sharing and reusing learning objects in learning.


Chapter 7

A Dynamic Life Cycle for Learning Objects and its Implications for Automatic Metadata Generation

7.1 Learning Objects and Learning Object Metadata

7.1.1 Learning Objects

In recent years, learning objects have received a great deal of attention as a didactic technology in which lessons, or parts of them, can be built from existing material. From a didactic point of view, much research is being done on this technology, which has gained strong momentum mainly thanks to the rise of the world-wide web. In this work, however, we are primarily interested in the components that make this technology possible, and not so much in the didactic aspects themselves. The great interest in learning objects arose from the possibility of reusing the individual components instead of always having to create all material anew.

To put a clear emphasis on this, we define a learning object as 'any entity, digital or non-digital, that can be reused or referenced during technology-supported learning'. This is also the definition used in the IEEE Learning Object Metadata standard.


Depending on its objectives, a research effort places a different emphasis on different aspects of learning objects. The most important ones are: reusability, learning objectives and encapsulation/containerization. For our purpose, as indicated above, reusability from a technological point of view is an important property.

7.1.2 Learning Object Metadata

The general definition of metadata, which also applies to learning object metadata, states that 'metadata are structured data about an object that support the functions associated with that object'. This definition contains two parts. On the one hand, metadata make it possible to find and use the object, i.e. to carry out those functions. On the other hand, it defines metadata as structured data. Typical categories of metadata are: descriptive metadata, structural metadata and administrative metadata. To increase the reusability of learning objects, we base the structure on the IEEE standard; it is therefore mainly important to improve the findability of the learning objects by creating more and better metadata.

Besides this general definition, it is interesting to use a second definition that emphasizes the generation of the metadata, because this is important for our research. This definition reads: information about a set of data, offered by the data source or by a generating algorithm, that describes the content, the format and the usability of that set of data. In the context of this work, the reference to the generating algorithm is interesting because it offers the possibility of creating metadata automatically.

The development of the standard for learning object metadata, the IEEE LOM standard, has made the use of learning objects and their metadata widespread, but projects still suffer from the problem of creating metadata. Among other things, this has the consequence that

• most projects have difficulties obtaining the critical mass needed to really achieve reuse,

• many learning objects have only a limited set of metadata associated with them, and

• metadata, once they have been added, are no longer adjusted or improved.

The main reason for this is that the tools developed to add metadata to a learning object are often not user-friendly, usually because those tools stay too close to the metadata standard, which forces users to fill in several complicated forms. A well-known slogan used to attack this problem is that 'electronic forms must die'.

Figure 7.1: The Dynamic Life Cycle for Learning Objects (phases Obtaining, Repurposing, Authoring, Offering, Integration, Using and Retaining, with the Labeling phase in parallel with all of them).

In this work, we try to solve the problem of static metadata by defining a dynamic life cycle for learning objects in which the labeling phase is placed in parallel with all the other phases, which makes it possible to adjust and improve the metadata in every phase.

A possible solution to the lack of metadata is to generate them automatically. In this way, users are no longer confronted with the metadata if they do not need to be. In this work, we develop a method for automatically indexing learning objects, paying particular attention to integrating existing methods for automatic metadata generation from other research domains for application to learning objects.

7.2 Dynamic Life Cycle

7.2.1 Life Cycle

Every object, electronic or not, goes through a life cycle from the moment it is created. As far as the metadata are concerned, this life cycle has so far been ignored for learning objects, or the metadata have been considered static. Dynamic metadata, however, can contain much more advanced information that cannot be obtained in a static life cycle, with the consequence that the metadata can be used in far more advanced ways and the reusability of the learning objects can thus be increased. The emphasis of the dynamic life cycle is on reuse of learning objects and on dynamic metadata.

We stimulate reuse on the one hand by emphasizing the search for suitable learning objects (the Obtaining phase), and on the other hand by introducing a choice point in the cycle that allows an existing learning object to be reused directly in a course or to be used as the basis for a new learning object.


The cycle is shown in Figure 7.1.

Dynamic metadata are obtained by placing the labeling phase in parallel with the other phases of the life cycle, so that existing metadata can constantly be adjusted or new metadata can be added.

The different phases that we distinguish in the life cycle are the following:

• Obtaining phase: obtaining a suitable existing object by searching for it with the help of metadata,

• Repurposing phase: adapting the object found to one's own specific needs, or extracting parts of it for reuse,

• Authoring phase: creating a new learning object, whether or not based on existing learning objects,

• Offering phase: offering/publishing an object in a database,

• Integration phase: inserting a learning object into a course environment,

• Using phase: the actual use by the learners,

• Retaining phase: the phase in which it is decided whether or not the object is kept.

7.2.2 Technologies for Gathering Information

In order to generate metadata in the different phases of the life cycle, it is important that we employ the right technologies in each phase. We distinguish the following four technologies for obtaining that information, each of which can be applied in one or more phases of the life cycle:

1. Information extraction. We use information extraction to derive metadata from the content of the object itself or from the context in which the object is used. Typical examples of the techniques used are keyword extraction and language detection (a minimal keyword-extraction sketch follows this list). Information extraction is mainly applicable in the phases in which the learning object is developed or in which it is added to a course, i.e. the Repurposing, Authoring, Offering and Integration phases.

2. Social information gathering. Social information gathering makes use of the social networks that arise from the use of (learning) objects by different users. The information obtained is used to make recommendations to the users. This is well known from online shops that suggest potentially interesting items to customers on the basis of the buying behaviour of other customers. A second important aspect – which emerged mainly with the development of the world-wide web – is collaboration in creating information, as in Wikipedia and blogs.

Because this source mainly exploits the relations between users of learning objects, it is best applicable in the Obtaining phase and the Using phase. In the former, recommendations can be made to find the appropriate objects; in the latter, users can collaborate on the metadata of the learning object they are using.

3. Explicit feedback. Explicit feedback is used to give the users the opportunity to comment on the learning objects or on the search system. This information source is especially interesting for obtaining subjective judgements about the learning object that was used, because such metadata are difficult to gather automatically. The phases of the cycle in which explicit feedback is mainly used are the Obtaining phase, the Using phase and the Retaining phase.

4. Implicit feedback. Implicit feedback is used to obtain information about the actual use of the learning objects, such as the actual time needed to 'study' a learning object, and to gather information about the systems used. The latter is employed, for example, to optimize the search engines.
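To make the information-extraction technology of item 1 concrete, the sketch below is a minimal, purely illustrative keyword extractor in Java; the class name and the tiny stop-word list are assumptions and not part of the framework. It counts term frequencies after stop-word removal and returns the most frequent terms as keyword candidates.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NaiveKeywordExtractor {
    // A tiny illustrative stop-word list; a real indexer would use a per-language list.
    private static final Set<String> STOP_WORDS =
            Set.of("the", "a", "an", "of", "and", "to", "in", "is", "for", "that");

    /** Return the n most frequent non-stop-words of the text as keyword candidates. */
    List<String> extract(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (token.length() > 2 && !STOP_WORDS.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts.entrySet().stream()
                .sorted((a, b) -> Integer.compare(b.getValue(), a.getValue()))
                .limit(n)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```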

7.3 Automatic Metadata Generation

7.3.1 Metadata Sources

Based on the technologies discussed for the life cycle, we develop a method to generate metadata automatically from different information sources. The sources we distinguish are:

• Content: the content of the learning object is used to extract information from the object itself, such as keywords or the language of the object.

• Context: the context of a learning object contains information about the environment in which the learning object resides. This can be information about the developer or the author (such as the working environment in which he or she creates the object), as well as about the courses in which the object is used (for example, information about the students or the content of the course).


• Actual use: the actual use of the learning objects provides us with practical metadata, as opposed to the more theoretical metadata that the other sources can supply.

7.3.2 Multiple Sources – Multiple Values

Because we offer the possibility to adjust or add metadata in different phases, conflicts may arise between the different values for the same metadata element.

To resolve such conflicts between values, we introduce a value that expresses the trust in a metadata value (the confidence value). This value is determined for every source and technology that generates metadata and is stored together with the metadata. When conflicts between values occur and a choice has to be made between different metadata values, this confidence value can be used to resolve the conflict. A simple option, for example, is to always choose the metadata value with the highest confidence value.

More generally, the question arises of the reliability or correctness of the metadata obtained. For learning object metadata, we argue that it is important to obtain good metadata, but that completely correct metadata are not feasible within the constraints of time and cost. This uncertainty about the correctness of a value can be integrated into the metadata themselves by using the confidence value mentioned above.

7.4 Formal Model

To enable the development of the framework for automatic metadata generation, we have defined a formal model for learning object metadata. This model also gives us the opportunity to reason about metadata in a rigorous way; among other things, we apply this to the definition of rules for propagating metadata when learning objects are reused. Besides the definition of metadata and of the confidence value from the previous section, we also introduce in this model the notion of context-dependent metadata. The latter makes an even more efficient use of the metadata possible.

We refer to Appendix A for a summary of the different definitions that are part of this model.

For the definition of the metadata propagation rules, we focus on reuse by means of aggregations, i.e. learning objects that are composed of several other learning objects. In many cases, reuse can be classified as this kind of reuse: either components are taken out of the aggregate for individual reuse, or several objects are combined into a new aggregate. In both cases, relations exist between the metadata of the aggregate and those of the different components. Depending on the type of the metadata element, we define rules that allow the metadata to be determined automatically from existing metadata.
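The Java sketch below illustrates what such propagation rules could look like in code; the element names and the concrete rules (sum the sizes, take the union of languages and keywords) are chosen for illustration only and are not the exact rules defined in this work.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Minimal component description used only for this sketch. */
record ComponentMetadata(long sizeInBytes, Set<String> languages, Set<String> keywords) { }

/** Metadata of the aggregate derived from its components. */
record AggregateMetadata(long sizeInBytes, Set<String> languages, Set<String> keywords) { }

final class AggregationRules {
    /** Propagate component metadata to the aggregate using simple, element-specific rules. */
    AggregateMetadata propagate(List<ComponentMetadata> components) {
        long size = 0;                                 // rule: size of aggregate = sum of sizes
        Set<String> languages = new LinkedHashSet<>(); // rule: languages = union of languages
        Set<String> keywords = new LinkedHashSet<>();  // rule: keywords = union of keywords
        for (ComponentMetadata c : components) {
            size += c.sizeInBytes();
            languages.addAll(c.languages());
            keywords.addAll(c.keywords());
        }
        return new AggregateMetadata(size, languages, keywords);
    }
}
```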

7.5 Framework for Automatic Metadata Generation

The framework for automatic metadata generation that we implement consists of two large parts. First, there is the part that provides classes which determine metadata values for a given learning object. The second part is a service-oriented architecture that offers a simple interface for plugging the framework into existing learning object systems. In this work, we mainly pay attention to the first part and discuss an initial implementation of the service system. At the moment, the latter is being developed further as part of related research.

7.5.1 Class Structure

In the class structure, we distinguish two groups of classes: Indexer classes and MetadataMerger classes. The first group contains the classes that determine the actual metadata on the basis of the learning objects or other available sources. The MetadataMerger classes implement strategies to merge metadata from different sources into the final values, using the confidence value discussed earlier.

The distinction between different sources of metadata is made, among other things, by using object-based indexer classes (ObjectBasedIndexer) on the one hand and metadata generators for context-dependent metadata (ContextBasedIndexer) on the other. The contexts used in this first implementation are mainly contexts related to electronic learning environments such as Blackboard.
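A skeletal reading of this class structure could look as follows in Java; the interfaces and method signatures are assumptions made for illustration and do not reproduce the actual framework code.

```java
import java.util.List;

/** One generated metadata statement: element name, value and the indexer's confidence. */
record GeneratedValue(String element, String value, double confidence) { }

/** Placeholder for whatever representation of a learning object the framework uses. */
interface LearningObject {
    byte[] content();
}

/** Common contract for all metadata generators. */
interface Indexer {
    List<GeneratedValue> index(LearningObject object);
}

/** Indexers that look only at the object itself (content, file properties, ...). */
interface ObjectBasedIndexer extends Indexer { }

/** Indexers that exploit the context of the object, e.g. a Blackboard course. */
interface ContextBasedIndexer extends Indexer {
    void setContext(Object context); // e.g. a course description exported from the LMS
}

/** Strategy for merging the values produced by several indexers into final metadata. */
interface MetadataMerger {
    List<GeneratedValue> merge(List<List<GeneratedValue>> valuesPerIndexer);
}
```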

7.5.2 The Simple Indexing Interface

The service-oriented architecture that we offer for using the framework is called the Simple Indexing Interface, because it provides a simple method to generate metadata for learning objects automatically. The implementation of this architecture is currently being continued as part of related research. Within the present research, we have built a first prototype to demonstrate its usability.

We have evaluated the framework by means of a number of tests in which we automatically generated metadata for learning objects and compared them with the metadata available in the ARIADNE Knowledge Pool. These – admittedly limited – tests show that the framework can be a good aid for indexing learning objects. However, more formal and more extensive tests certainly have to be carried out as part of the further development of the framework and the service interface.

7.6 Conclusion

The contributions of this work can be summarized as follows:

1. By defining a dynamic life cycle, we make it possible to learn more, and also richer, metadata about a learning object. This fits in with current developments on the world-wide web, where more and more attention is being paid to the semantics of the available information, and where that information has also acquired a very dynamic character.

2. The implementation of a framework for automatic metadata generation, based on a formal model, ensures that metadata can be obtained in a simple way, in every phase of the life cycle. For this, we consult the different sources that are available in those phases.

To ensure that possible conflicts between different sources can be resolved, we have introduced a confidence value that indicates how certain the framework is about the generated values.

7.6.1 Possible Further Research

1. Dynamic Life Cycle: Many projects define life cycles within the framework in which they operate, and in principle this is no different for us. It is our opinion that it would be a good thing if a standard could be designed for the life cycle of learning objects, taking the different aspects into account, such as the dynamic character of the metadata.

2. Context dependence: In the formal model of the metadata, we introduce context dependence of metadata. It is, however, very interesting to investigate how this can be implemented concretely and whether it actually yields benefits for the use and reuse of learning objects. The use of contexts for the generation of metadata is also still in its infancy and requires further research.

3. Conflict resolution between different sources of metadata: at the moment we implement a very simple strategy to resolve conflicts between metadata sources. It may, however, be interesting to examine which more advanced techniques could be used here.


Bibliography

[Abowd and Mynatt, 1999] G.D. Abowd and E.D. Mynatt. Charting Past,Present, and Future Research in Ubiquitous Computing. ACM Transactionson Computer-Human Interaction, special issue on HCI in the new Millenium,7(1):29–58, 1999.

[ADL, 2004] ADL. Sharable Content Object Reference Model (SCORM) Content Aggregation Model (CAM), 2004. Advanced Distributed Learning (ADL), http://www.adlnet.gov/scorm/index.cfm.

[Allert et al., 2004] H. Allert, C. Richter, and W. Nejdl. Lifelong learning andsecond-order learning objects. British Journal of Educational Technology,35(6):701–715, 2004.

[An, 2001] X. An. A Chinese view of Records Continuum methodology and implications for managing Electronic Records. Available at http://www.caldeson.com/RIMOS/xanuum.html, 2001.

[Anderson and Perez-Carballo, 2001] J.D. Anderson and J. Perez-Carballo. Thenature of indexing: how humans and machines analyze messages and texts forretrieval: part i: research, and the nature of human indexing. Inf. Process.Manage., 37(2):231–254, 2001.

[Anjewierden and Kabel, 2001] A. Anjewierden and S. Kabel. Automatic indexingof documents with ontologies, 2001.

[Apple Computer, 2005] Inc Apple Computer. Spotlight, find anything on yourmac instantly. Technology Brief, http://images.apple.com/macosx/pdf/MacOSXSpotlight TB.pdf, 2005.

[Apps and MacIntyre, 2003] A. Apps and R. MacIntyre. Using the OpenURLFramework to Locate Bibliographic Resources. In Proceedings of 2003 DublinCore Conference: Supporting Communities of Discourse and Practice-MetadataResearch & Applications, 2003.


[Barritt and Lewis, 1999] C. Barritt and D. Lewis. Reusable learning object strat-egy, definition, creation process, and guidelines for building. Technical report,CISCO Systems, Inc., 1999.

[Barton et al., 2003] J. Barton, S. Currier, and J. Hey. Building quality assuranceinto metadata creation: an analysis based on the learning objects and e-printscommunities of practice. In Proceedings of the 2003 Dublin Core Conference,2003.

[Bateman et al., 2006] S. Bateman, C. Brooks, and G. McCalla. Collaborativetagging approaches for ontological metadata in adaptive e-learning systems. InInternational Workshop on Applications of Semantic Web technologies for E-Learning, 2006.

[Bellaachia et al., 2006] A. Bellaachia, E. Vommina, and B. Berrada. Minel: Aframework for mining e-learning logs. In Proceedings of Web-based Education,2006.

[Brasher and McAndrew, 2004] A. Brasher and P. McAndrew. Human-generatedlearning object metadata. In R. Meersman, Zahir Tari, and A. Corsaro, editors,Proceedings of On the Move to Meaningful Internet Systems 2004: OTM 2004Workshops: OTM Confederated International Workshops and Posters, GADA,JTRES, MIOS, WORM, WOSE, PhDS, and INTEROP 2004, volume 3292 /2004, pages 723 –730, 2004.

[Brin and Page, 1998] S. Brin and L. Page. The anatomy of a large-scale hypertex-tual Web search engine. Computer Networks and ISDN Systems, 30(1/7):107–117, 1998.

[Brooks and McCalla, 2006] C. Brooks and G. McCalla. Towards flexible learningobject metadata. Int. J. Cont. Engineering Education and Lifelong Learning,16(1/2):50–63, 2006.

[Brooks et al., 2003] C. Brooks, J. Cooke, and J. Vassileva. Versioning of Learn-ing Objects. In Proceedings of the The 3rd IEEE International Conference onAdvanced Learning Technologies (ICALT’03), 2003.

[Brophy and Velankar, 2006] S. Brophy and Y. Velankar. Work in progress: Cat-aloging instructional design patterns that facilitate generative learning. In Pro-ceedings of the 36th Frontiers in Education Conference (FIE 2006), 2006.

[Bruce and Hillman, 2004] T. Bruce and D. Hillman. The continuum of meta-data quality: defining, expressing, exploiting. In D.Hillman and L. Westbrooks,editors, Metadata in Practice. Chicago: American Library Association, 2004.


[Brusilovsky et al., 1998] P. Brusilovsky, J. Eklund, and E. Schwarz. Web-basededucation for all: A tool for developing adaptive courseware. In Proceedings of7th International World Wide Web Conference, 1998.

[Cardinaels and Duval, 2003] K. Cardinaels and E. Duval. Composite learningobjects: exposing the components. In Proceedings of the 3rd Annual ARIADNEConference, pages 1–7. ARIADNE Foundation, 2003.

[Cardinaels et al., 1998] K. Cardinaels, K. Hendrikx, E. Vervaet, E. Duval,H. Olivie, F. Haenni, K. Warkentyne, M. Wentland-Forte, and E. Forte. Aknowledge pool system of reusable pedagogical elements. In 4th InternationalConference on Computer Aided Learning and Instruction in Science and Engi-neering, pages 54–62, jan 1998.

[Cardinaels et al., 2002] K. Cardinaels, E. Duval, and H. Olivie. Issues in Auto-matic Learning Object Indexation. In Proceedings of ED-MEDIA World Con-ference on Educational Multimedia, Hypermedia & Telecommunications, pages239–240. AACE, 2002.

[Cardinaels et al., 2005] K. Cardinaels, M. Meire, and E. Duval. AutomatingMetadata Generation: the Simple Indexing Interface. In Proceedings of the 14thInternational World Wide Web Conference (WWW2005), pages 548–556, 2005.

[Carrasco et al., 2003] Ramon Alberto Carrasco, María Amparo Vila, and José Galindo. FSQL: a flexible query language for data mining, pages 68–74. Kluwer Academic Publishers, Hingham, MA, USA, 2003.

[Cavnar and Trenkle, 1994] W.B. Cavnar and J. M. Trenkle. N-Gram-Based TextCategorization. In Proceedings of Third Annual Symposium on Document Anal-ysis and Information Retrieval, Las Vegas, NV, pages 161–175. UNLV Publica-tions/Reprographics, 1994.

[Chalmers et al., 2004] D. Chalmers, N. Dulay, and M. Sloman. Towards Reason-ing About Context in the Presence of Uncertainty. In Proceedings of Workshopon Advanced Context Modelling, Reasoning And Management at UbiComp 2004,2004.

[Chassell, 1997a] R.J. Chassell. Certainty factors. http://www.rattlesnake.com/notions/certainty-factors.html, 1997.

[Chassell, 1997b] R.J. Chassell. An exercise using certainty factors. http://www.rattlesnake.com/notions/certainty-factor-exercise.html, 1997.

[Croft, 1995] W.B. Croft. What do people want from information retrieval. D-LibMagazine, Nov 1995.


[Currie and Place, 2000] D. Currie and C. Place. Learning object containers: Asuggested method of transporting metadata with a learning object. In PietKommers and Griff Richards, editors, Proceedings of World Conference on Ed-ucational Multimedia, Hypermedia and Telecommunications 2000, pages 1300–1301, Chesapeake, VA, 2000. AACE.

[Dalziel, 2002] J. Dalziel. Reflections on the COLIS (Collaborative Online Learn-ing and Information Systems) Demonstrator Project and the ”Learning ObjectLifecycle”. In Proceedings of ASCILITE 2002, 2002.

[Dalziel, 2003] J. Dalziel. The learning object lifecycle: Creation, trading, installation, digital rights management, presentation. http://www.melcoe.mq.edu.au/documents/EDUCAUSE LOLifecycle2003rev.ppt, 2003.

[De Bra et al., 2004] P. De Bra, L. Aroyo, and V. Chepegin. The next big thing:Adaptive web-based systems. Journal of Digital Information, 5(1), 2004.

[Declerck et al., 2004] T. Declerck, J. Contreras, O. Corcho, and C. Crispi. Text-based semantic annotation service for multimedia content in the esperontoproject. European Workshop on the Integration of Knowledge, Semantics andDigital Media Technology, 2004.

[Deniman et al., 2003] D. Deniman, T. Sumner, L. Davis, S. Bhushan, and J. Fox.Merging metadata and content-based retrieval. Journal of Digital Information,4(3), 2003.

[Dey, 2000] A.K. Dey. Providing Architectural Support for Building Context-AwareApplications. PhD thesis, Georgia Institute of Technology, 2000.

[Doan et al., 2002] B. Doan, W. Kekhia, and Y. Bourda. A semi-automatic toolfor the indexation of learning objects. In Proceedings of ED-MEDIA WorldConference on Educational Multimedia, Hypermedia and Telecommunications,pages 190–191, 2002.

[Downes, 2001] S. Downes. Learning objects: Resources for distance educationworldwide. International Review of Research in Open and Distance Learning,2(1), 2001.

[Duval and Hodgins, 2003] E. Duval and W. Hodgins. A LOM research agenda. InProceedings of the twelfth international conference on World Wide Web, pages1–9. ACM Press, 2003.

[Duval and Hodgins, 2004] Erik Duval and Wayne Hodgins. Making metadata goaway - hiding everything but the benefits. In DC-2004: Proceedings of the Inter-national Conference on Dublin Core and Metadata Applications, pages 29–35.shangai library, Shangai scientific & technological literature publishing house,2004.


[Duval et al., 2000] E. Duval, E. Vervaet, B. Verhoeven, K. Hendrikx, K. Cardi-naels, H. Olivie, E. Forte, F. Haenni, K. Warkentyne, M. Wentland-Forte, andF. Simillion. Managing Digital Educational Resources with ARIADNE Meta-data system. Journal of Internet Cataloging, 3(1 and 2/3):145–171, 2000. Alsopublished as book with Editor J. Greenberg : Metadata and Organizing Educa-tional Resources on the Internet, ISBN 0-7890-1178-6 en 0-7890-1179-4.

[Duval et al., 2001] E. Duval, E. Forte, K. Cardinaels, B. Verhoeven,R. Van Durm, K. Hendrikx, M. Wentland-Forte, N. Ebel, M. Macowicz,K. Warkentyne, and F. Haenni. The ARIADNE Knowledge Pool System.Communications of the ACM, 44(5):73–78, may 2001.

[Duval et al., 2002] E. Duval, W. Hodgins, S. Sutton, and S.L. Weibel. Metadataprinciples and practicalities. D-Lib Magazine, 8(4), 2002.

[Duval, 1999] E Duval. An Open Infrastructure for Learning - the ARIADNEproject - Share and Reuse without boudaries. In Proceedings of ENABLE99 -Enabling Network-Based Learning, pages 144–151, June 1999. keynoteURL =http://www.enable.evitech.fi/enable99/papers/duval/duval.html.

[Elmasri and Navathe, 2004] R. Elmasri and S.B. Navathe. Fundamentals ofDatabase Systems, Fourth Edition. Pearson, Addison Wesley, 2004.

[Farance, 1999] F. Farance. Learning Objects - Definitions. Edutool.Com, a divi-sion of Farance Inc., 3 1999.

[Fitzpatrick and Dent, 1997] L. Fitzpatrick and M. Dent. Automatic feedbackusing past queries: Social searching? In SIGIR ’97: Proceedings of the 20thAnnual International ACM SIGIR Conference on Research and Developmentin Information Retrieval, July 27-31, 1997, Philadelphia, PA, USA, pages 306–313. ACM, 1997.

[Forte et al., 1997a] E. Forte, M. Wentland Forte, and E. Duval. The ARIADNEproject (part 1): Knowledge pools for computer-based and telematics-supportedclassical, open and distance education. European Journal of Engineering Edu-cation, 22(1):61–74, 1997.

[Forte et al., 1997b] E. Forte, M. Wentland Forte, and E. Duval. The ARIADNEproject (part 2): Knowledge pools for computer-based and telematics-supportedclassical, open and distance education. European Journal of Engineering Edu-cation, 22(2):153–166, 1997.

[Friesen, 2001] N. Friesen. What are educational objects. Interactive LearningEnvironments, 9(3), 2001.

[Gasser and Stvilia, 2001] L. Gasser and B. Stvilia. A new framework for infor-mation quality. Technical report, ISRN UIUCLIS, 2001.


[Golder and Huberman, 2005] S. Golder and B.A. Huberman. The structure ofcollaborative tagging systems. Technical report, Information Dynamics Lab,HP Labs, 2005.

[Greenberg et al., 2003] J. Greenberg, A. Crystal, W.D. Robertson, andE. Leadam. Iterative Design of Metadata Creation Tools for Resource Authors.In S. Sutton, J. Greenberg, and J. Tennis, editors, 2003 Dublin Core Confer-ence: Supporting Communities of Discourse and Practice - Metadata Researchand Applications. DC-2003: Proceedings of the International DCMI Conferenceand Workshop., pages 49–58, 2003.

[Greenberg et al., 2005] J. Greenberg, K. Spurgin, and A. Crystal. Final reportfor the amega (automatic metadata generation applications) project. Technicalreport, School of Information and Library Science, 2005.

[Greenberg et al., 2006] J. Greenberg, K. Spurgin, and A. Crystal. Functionalitiesfor automatic metadata generation applications: a survey of metadata experts’opinions. Int. J. Metadata, Semantics and Ontologies, 1(1):3–20, 2006.

[Greenberg, 2002] J. Greenberg. Metadata and the world wide web. Encyclopediaof Library and Information Science, 72(35):244–261, 2002.

[Greenberg, 2003] J. Greenberg. Metadata generation: Processes, people and tools.Bulletin of the American Society for Information Science and Technology, vol.29, nr. 2, 2003. http://www.asis.org/Bulletin/Dec-02/greenberg.html, accessedNovember 17, 2006.

[Greenberg, 2004] J. Greenberg. Metadata Extraction and Harvesting: A Compar-ison of Two Automatic Metadata Generation Applications. Journal of InternetCataloging, 6(4):59–82, 2004.

[Guy et al., 2003] M. Guy, A. Powell, and M. Day. Improving the quality of meta-data in eprint archives. Ariadne, 38, 2003.

[Han et al., 2003] Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zha,Zhenyue Zhang, and Edward A. Fox. Automatic document metadata extrac-tion using support vector machines. In JCDL ’03: Proceedings of the 3rdACM/IEEE-CS joint conference on Digital libraries, pages 37–48, Washington,DC, USA, 2003. IEEE Computer Society.

[Hatala and Forth, 2003] M. Hatala and S. Forth. A comprehensive system forcomputer-aided metadata generation. In Proceedings of 12th International Con-ference of The World Wide Web Consortium (WWW2003), Budapest, May 20-24, 2003.


[Hatala and Richards, 2003] M. Hatala and G. Richards. Value-added metatag-ging: Ontology and rule based methods for smarter metadata. In M. Schroederand G. Wagner, editors, Rules and Rule Markup Languages for the SemanticWeb (RuleML2003), volume 2876 of Lecture Notes in Computer Science, pages65–80. Springer Berlin/Heidelberg, 2003.

[Haveliwala et al., 2002] T. Haveliwala, A Gionis, D. Klein, and P. Indyk. Evalu-ating strategies for similarity search on the web. In Proceedings of WWW2002,The Eleventh International World Wide Web Conference, 2002.

[Hodgins, 2000] W. Hodgins. Into the future: a vision paper. Technical report,Commission on Technology and Adult Learning, 2000.

[Hughes, 2004] B. Hughes. Metadata quality evaluation: Experience from the openlanguage archives community. In Proceedings of the 7th International Conferenceon Asian Digital Libraries (ICADL 2004), volume 3334 of Lecture Notes onComputer Science, pages 320–329, 2004.

[IEEE, 2002] IEEE. Standard for Learning Object Metadata, jul 2002. Sponsoredby the Learning Technology Standards Committee of the IEEE, http://ltsc.ieee.org.

[ISO 15836:2003, 2003] Information and documentation the dublin core metadataelement set, 2003. ISO Standard 15836:2003.

[James et al., 2003] H. James, R. Ruusalepp, S. Anderson, and S. Pinfield. Feasibility and requirements study on preservation of e-prints. Technical report, Report Commissioned by the Joint Information Systems Committee, 2003. Available online at: http://www.jisc.ac.uk/uploaded documents/e-prints report final.pdf.

[Joachims, 2002] T. Joachims. Optimizing search engines using clickthrough data.In KDD, pages 133–142, 2002.

[Kabel et al., 2003] S. Kabel, R. de Hoog, and B.J. Wielinga. Consitency inindexing learning objects: an empirical investigation. In Learning Objects2003 Symposium: Lessons Learned, Questions Asked, 2003. online version:http://www.cs.kuleuven.ac.be/∼erikd/PRES/2003/LO2003/Kabel.pdf.

[Kabel et al., 2004] S. Kabel, R. de Hoog, B. Wielinga, and A. Anjewierden. In-dexing learning objects: Vocabularies and empirical investigation of consistency.Journal of Educational Multimedia and Hypermedia, 13(4):405–425, October2004. Special Issue: Special Issue.

[Kim and Seamus, 2006] Y. Kim and R. Seamus. Automating metadata extrac-tion: Genre classification. In Proceedings UK e-Science All Hands Meeting 2006:Achievements, Challenges, and New Opportunities, Nottingham, 2006.


[Kirsch et al., 2006] S.M. Kirsch, M. Gnasa, and A.B. Cremers. Beyond the web:Retrieval in social information spaces. In Proceedings of the 28th EuropeanConference on Information Retrieval (ECIR 2006), 2006.

[Knauff, 2001] B. Knauff. Merlot promises efficiency and quality control. availablehttp://www.dartmouth.edu/∼webteach/spotlights/merlot.html, 2001.

[Kolmel, 2006] B. Kolmel. Ambient learning final report. Technical report, CASSoftware AG, 2006.

[Krotzsch et al., 2005] M. Krötzsch, D. Vrandecic, and M. Völkel. Wikipedia and the semantic web - the missing links, 2005.

[Krueger, 1992] C.W. Krueger. Software reuse. ACM Comput. Surv., 24(2):131–183, 1992.

[Li et al., 2005] Ying Li, Chitra Dorai, and Robert Farrell. Creating magic: systemfor generating learning object metadata for instructional content. In MULTI-MEDIA ’05: Proceedings of the 13th annual ACM international conference onMultimedia, pages 367–370, New York, NY, USA, 2005. ACM Press.

[Liddy et al., 2002] E.D. Liddy, E. Allen, S. Harwell, S. Corieri, O. Yilmazel, N.E.Ozgencil, A. Diekema, N. McCracken, J. Silverstein, and S. Sutton. Automaticmetadata generation & evaluation. In SIGIR ’02: Proceedings of the 25th annualinternational ACM SIGIR conference on Research and development in informa-tion retrieval, pages 401–402, New York, NY, USA, 2002. ACM Press.

[Lih, 2004] A. Lih. Wikipedia as participatory journalism: Reliable sources? met-rics for evaluating collaborative media as a news resource. In 5th InternationalSymposium on Online Journalism, 2004.

[MacGregor and In-Young Ko, 2003] R. MacGregor and In-Young Ko. Represent-ing Contextualized Data using Semantic Web Tools. In Proceedings of the 1st In-ternational Workshop on Practical and Scalable Semantic Systems, ISWC 2003,2003.

[Malloy and Hanley, 2001] T. E. Malloy and G. L. Hanley. Merlot: A faculty-focused website of educational resources. Behavior Research Methods Instru-ments & Computers, 33:274–276, 2001.

[Manning et al., 2005] C.D. Manning, P. Raghavan, and H. Schütze. An Introduction to Information Retrieval. Cambridge University Press, 2005. Preliminary draft.

[Marchiori, 1998] M. Marchiori. The limits of Web metadata, and beyond. InWWW7: Proceedings of the seventh international conference on World Wide


Web 7, pages 1–9, Amsterdam, The Netherlands, The Netherlands, 1998. Else-vier Science Publishers B. V.

[Mayorga et al., 2006] J.I. Mayorga, B. Barros, C. Celorrio, and M.F. Verdejo.Accessing a learning object repository through a semantic layer. In Workshopon ’Learning Object Repositories as Digital Libraries: Current challenges’, 2006.

[Mazzieri, 2004] M. Mazzieri. A fuzzy rdf semantics to represent trust metadata.In Proceedings of Semantic Web Applications and Perspectives (SWAP), 1stItalian Semantic Web Workshop, 2004.

[McCalla, 2004] G. McCalla. The ecological approach to the design of e-learningenvironments: Purpose-based capture and use of information about learners.Journal of Interactive Media in Education, 1(7):23, 2004. Special Issue on theEducational Semantic Web.

[McGreal, 2002] R. McGreal. Learning Objects: A Practical Definition. Interna-tional Journal of Instructional Technology & Distance Education, 2002.

[McKell and Thropp, 2001] M. McKell and S. Thropp. Ims learning resourcemeta-data information model, 2001. IMS Global Learning Consortium, Inc.

[Meire et al., 2007] M. Meire, E. Duval, and X. Chebab. Samgi: Automatic meta-data generation v2.0. In Proceedings of ED-MEDIA 2007, World Conference onEducational Multimedia, Hypermedia & Telecommunications, 2007. submitted.

[Melnik, 1999] S. Melnik. Algebraic specification for rdf models. Technical report,Stanford University, 1999.

[Metros and Bennett, 2004] S. Metros and K. Bennett. Learning objects in highereducation: The sequal. Research Bulletin 11, EDUCAUSE Center for AppliedResearch (ECAR), 2004.

[Mihaila et al., 2000] George A. Mihaila, Louiqa Raschid, and María-Esther Vidal. Using quality of data metadata for source selection and ranking. In WebDB (Informal Proceedings), pages 93–98, 2000.

[Milli et al., 1995] H. Milli, F. Mili, and A. Mili. Reusing Software: Issues andResearch Directions. IEEE Transactions on Software Engineering, 21(6):528–562, 1995.

[Moen et al., 1997] W. E. Moen, E. L. Stewart, and C. R. McClure. The role of content analysis in evaluating metadata for the US Government Information Locator Service (GILS): results from an exploratory study. http://www.unt.edu/wmoen/publications/GILSMDContentAnalysis.htm, 1997. Accessed November 17, 2006.


[Moens, 1999] M.F. Moens. Automatically Indexing and Abstracting the Con-tent of Document Texts. PhD thesis, Faculteit Toegepaste Wetenschappen,K.U.Leuven, 1999.

[Nabeth et al., 2004] T. Nabeth, A.A. Angehrn, and R. Balakrishnan. Integratingcontext in e-learning systems design. icalt, 0:355–359, 2004.

[Nadkarni et al., 2001] P. Nadkarni, R. Chen, and C. Brandt. Umls concept in-dexing for production databases: A feasibility study. Journal of the AmericanMedical Informatics Association, 8:80–91, 2001.

[Najjar et al., 2003] J. Najjar, S. Ternier, and E. Duval. The actual use of meta-data in ARIADNE: an empirical analysis. In Proceedings of the 3rd AnnualARIADNE Conference, pages 1–6. ARIADNE Foundation, 2003.

[Najjar et al., 2004] J. Najjar, S. Ternier, and E. Duval. User behavior in learn-ing object repositories: an empirical analysis. In Proceedings of the ED-MEDIA2004 World Conference on Educational Multimedia, Hypermedia and Telecom-munications, pages 4373–4379. AACE, 2004.

[Najjar et al., 2005] J. Najjar, M. Meire, and E. Duval. Attention Metadata Man-agement: Tracking the use of Learning Objects through Attention.XML. InProceedings of Ed-Media, 2005.

[Najjar et al., 2006] J. Najjar, M Wolpers, and E. Duval. Attention metadata:Collection and management. In WWW2006 - Workshop on Logging Traces ofWeb Activity: The Mechanics of Data Collection, 2006.

[Nelson, 1965] T. H. Nelson. Complex information processing: a file structure forthe complex, the changing and the indeterminate. In Proceedings of the 196520th national conference, pages 84–100, New York, NY, USA, 1965. ACM Press.

[Nesbit et al., 2002] J. Nesbit, K. Belfer, and J. Vargo. A convergent participationmodel for evaluation of learning objects. Canadian Journal of Learning andTechnology, 28(3), 2002.

[Nikitin et al., 2005] S. Nikitin, V.Y. Terziyan, Y. Tsaruk, and A. Zharko. Query-ing Dynamic and Context-Sensitive Metadata in Semantic Web. In Proceedingsof AIS-ADM 2005, 2005.

[NISO, 2004] Understanding Metadata. NISO Press, http://www.niso.org/standards/resources/UnderstandingMetadata.pdf, 2004.

[Norton and Panar, 2003] M. Norton and A. Panar. Ims simple sequencing infor-mation and behavior model, 2003. IMS Global Learning Consortium, Inc.


[Ochoa and Duval, 2006] X. Ochoa and E. Duval. Quality metrics for learningobject metadata. In Piet Kommers and Griff Richards, editors, Proceedings ofWorld Conference on Educational Multimedia, Hypermedia and Telecommuni-cations 2006, pages 1004–1011, Orlando, FL, USA, June 2006. AACE.

[O’Connor et al., 2004] A. O’Connor, V. Wade, and O. Conlan. Context-InformedAdaptive Hypermedia. In Proceedings of Workshop on Advanced Context Mod-elling, Reasoning And Management at UbiComp 2004, 2004.

[Oliver, 2001] R. Oliver. Learning objects: supporting flexible delivery of flexiblelearning. In G. Kennedy, M. Keppell, C. McNaught, and T. Petrovic, editors,Meeting at the crossroads: Proceedings of ASCILITE 2001, pages 453–460, 2001.

[Pansanato and Fortes, 2005] L.T.E. Pansanato and R.P.M. Fortes. Strategies forautomatic lom metadata generating in a web-based cscl tool. In WebMedia ’05:Proceedings of the 11th Brazilian Symposium on Multimedia and the web, pages1–8, New York, NY, USA, 2005. ACM Press.

[Pederson, 1999] A. Pederson. Life Cycle and Continuum - one viewpoint.Australian Archivists mailing list, http://lists.archivists.org.au/pipermail/archivists.org.au/aus-archivists/1999-February/001129.html, 1999.

[Pierre, 2001] J.M. Pierre. On the automated classification of web sites. LinkopingElectronic Articles in Computer and Information Science, 6(0), 2001.

[Pinkwart et al., 2005] N. Pinkwart, N. Malzahn, D. Westheide, and H.U. Hoppe.Community support based on thematic objects and similarity search. In Pro-ceedings of the International Workshop on Applications of Semantic Web tech-nologies for E-Learning (SW-EL’05), 2005.

[Polsani, 2003] P.R. Polsani. Use and abuse of reusable learning objects. Journalof Digital Information, 3(4), 2003. Article No. 164.

[Powell et al., 2004] A. Powell, M. Nilsson, A. Naeve, and P. Johnston. DCMIAbstract Model. Technical report, Dublin Core Metadata Initiative, 2004.

[Powell, 2005] A. Powell. The ’discovery to delivery’ DLF reference model. JISC-CETIS Conference, Edinburgh, November 2005, 2005.

[Prieto-Díaz, 1991] R. Prieto-Díaz. Implementing faceted classification for software reuse. Commun. ACM, 34(5):88–97, 1991.

[Reigeluth and Nelson, 1997] C.M. Reigeluth and L.M. Nelson. A new paradigm ofISD. In R.C. Branch and B.B. Minor, editors, Educational media and technologyyearbook, volume 22, pages 24–35. Englewoord, CO: Libraries Unlimited, 1997.


[Riley and McKell, 2003] K. Riley and M. McKell. Ims digital repositories in-teroperability - core functions information model, 2003. IMS Global LearningConsortium, Inc.

[Robson, 2002] R. Robson. Metadata Matters. Digital Mathematics Library, FirstDML Planning Meeting, Participant statements, Washington, DC, July 20-30,available http://www.library.cornell.edu/dmlib/robson.pdf, 2002.

[Saini et al., 2006] P.S. Saini, M. Ronchetti, and D. Sona. Automatic generationof metadata for learning objects. In Sixth International Conference on AdvancedLearning Technologies, pages 275–279, 2006.

[Salton et al., 1975] G. Salton, A. Wong, and C. S. Yang. A vector space modelfor automatic indexing. Commun. ACM, 18(11):613–620, 1975.

[Schafer et al., 1999] J.B. Schafer, J. Konstan, and J. Riedi. Recommender sys-tems in e-commerce. In EC ’99: Proceedings of the 1st ACM conference onElectronic commerce, pages 158–166, New York, NY, USA, 1999. ACM Press.

[Sibun and Spitz, 1994] P. Sibun and A.L. Spitz. Language determination: Natu-ral language processing from scanned document images. In Proceedings of theFourth ACL Conference on Applied Natural Language Processing (13–15 Octo-ber 1994, Stuttgart), 1994.

[Sicilia and Garcıa, 2003] M. Sicilia and E. Garcıa. On the concepts of usabilityand reusability of learning objects. International Review of Research in Openand Distance Learning, 4(3), 2003.

[Simon et al., 2004] B. Simon, D. Massart, and E. Duval. Simple Query InterfaceSpecification. Technical report, CEN/ISSS Workshop on Learning Technologies,2004.

[Simon et al., 2005] B. Simon, D. Massart, F. van Assche, S. Ternier, E. Du-val, S. Brantner, D. Olmedilla, and Z. Miklos. A Simple Query Interfacefor Interoperable Learning Repositories. In Proceedings of the 1st Work-shop on Interoperability of Web-based Educational Systems, pages 11–18, 2005.http://www.13s.de/∼olmedilla/events/interopPapers/paper02.pdf.

[Smythe and Jackl, 2004] S. Smythe and A. Jackl. Ims content packaging infor-mation model, 2004. IMS Global Learning Consortium, Inc.

[Smythe et al., 2001] C. Smythe, F. Tansey, and R. Robson. Ims learner infor-mation package information model specification, 2001. IMS Global LearningConsortium, Inc.

[Smythe et al., 2005] C. Smythe, D. Cambridge, and M. McKell. Ims eportfolioinformation model, 2005. IMS Global Learning Consortium, Inc.


[Spoerri, 1995] A. Spoerri. InfoCrystal, A Visual Tour For Information Re-trieval. PhD thesis, MIT, 1995. online version http://www.scils.rutgers.edu/∼aspoerri/InfoCrystal/InfoCrystal.htm.

[Strijker, 2004] A. Strijker. Reuse of Learning Objects in Context, Human andTechnical Aspects. PhD thesis, University Twente, The Netherlands, 2004.

[Stuckenschmidt and van Harmelen, 2001] H. Stuckenschmidt and F. van Harme-len. Ontology-based metadata generation from semistructured information. InProceedings of the First Conference on Knowledge Capture (K-CAP’01), 2001.

[Swift, 1997] C. Swift. About the peer reviews of merlot learning materials. avail-able http://taste.merlot.org/catalog/peer review/, 1997.

[Ternier et al., 2003] Stefaan Ternier, Filip Neven, Erik Duval, Maciej Macowicz,and Norbert Ebel. Web services for Learning Object Repositories: a Case Study -the ARIADNE Knowledge Pool System. In Proceedings of Twelfth InternationalWorld Wide Web Conference, Budapest, Hungary, pages 203–204, 2003.

[Toms et al., 1999] E. Toms, D. Campbell, and R. Blades. Does genre define theshape of information: the role of form and function in user interaction withdigital documents. In Proceedings of the 62nd American Society for InformationScience Annual Meeting, pages 693–704, 1999.

[Van Assche and Vuorikari, 2006] F. Van Assche and R. Vuorikari. A frameworkfor quality of learning resources. In U. Ehlers and J.M. Pawlowski, editors,European Handbook for Quality and Standardization in E-Leanring. Springer,2006.

[Vargo et al., 2003] J. Vargo, J.C. Nesbit, K. Belfer, and A. Archambault. Learn-ing object evaluation: Computer mediated collaboration and inter-rater relia-bility. International Journal of Computers and Applications, 25, 2003.

[Verbert and Duval, 2004] K. Verbert and E. Duval. Towards a global architecturefor learning objects: a comparative analysis of learning object content models.In Proceedings of the ED-MEDIA 2004 World Conference on Educational Mul-timedia, Hypermedia and Telecommunications, pages 202–208, 2004.

[Verbert et al., 2005a] K. Verbert, J. Jovanovic, E. Duval, D. Gaevic, andM. Meire. Ontology-based learning content repurposing: the alocom framework.PROLEARN-iClass thematic workshop on Learning Objects in Context, March3-4, Leuven, Belgium, 2005.

[Verbert et al., 2005b] K. Verbert, J. Jovanovic, D. Gasevic, and E. Duval. Re-purposing learning object components. In OTM 2005 Workshop on Ontologies,Semantics and E-Learning, 2005.


[Vittorini and Di Felice, 2000] P. Vittorini and P. Di Felice. Issues in CoursewareReuse for a Web-based Information System. In Proceedings of NAWeb2000,2000.

[Vuorikari et al., 2006] R. Vuorikari, N. Manouselis, and E. Duval. Using meta-data for storing, sharing and reusing evaluations for social recommendations:the case of learning resources. In D.H. Go and S. Foo, editors, Social Informa-tion Retrieval Systems: Emerging Technologies and Applications for Searchingthe Web Effectively. Hershey, PA: Idea Group Publishing, 2006. accepted forpublication.

[W3C, 2004] W3C. Resource description framework (rdf): Concepts and abstractsyntax. http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/, feb 2004.

[Weiss, 2005] A. Weiss. The power of collective intelligence. netWorker, 9(3):16–23, 2005.

[White et al., 2001] R.W. White, J.M. Jose, and I. Ruthven. Comparing explicitand implicit feedback techniques for web retrieval: Trec-10 interactive trackreport. In The tenth Text Retrieval Conference (TREC 2001), Gaithersburg,Maryland, November 13-16, 2001.

[White et al., 2002] R.W. White, I. Ruthven, and J.M. Jose. The use of implicitevidence for relevance feedback in web retrieval. In Lecture Notes in ComputerScience, volume 2291. Springer Berlin/Heidelberg, 2002.

[Wiley, 2001] D.A. Wiley. Connecting learning objects to instructional design the-ory: A definition, a metaphor, and a taxonomy. In D.A. Wiley, editor, The In-structional Use of Learning Objects, online version. Open Publication License,available http://reusability.org/read/chapters/wiley.doc, 2001.

[Wiley, 2003] D.A. Wiley. Learning objects: Difficulties and opportunities. avail-able http://wiley.ed.usu.edu/docs/lo do.pdf, 2003.

[Yusof and Chell, 2000] Zawiyah M. Yusof and Robert W. Chell. The Records LifeCycle: an inadequate concept for technology-generated records. InformationDevelopment, 16(3):135–141, 2000.

[Zadeh, 1995] L.A. Zadeh. Fuzzy sets. Information and Control, 8:338–353, 1995.

[Zaïane, 2000] O.R. Zaïane. Web usage mining for a better web-based learning environment. In Conf. on Advanced Technology for Education, Banff, Alberta, 2000.


Curriculum Vitae

After studying Mathematics-Sciences at the St.-Hubertuscollege in Neerpelt, Kris Cardinaels started the Licentiate in Informatics programme at the K.U. Leuven in 1991. In 1995 he completed this programme with distinction and started working in the same year as a project collaborator in the Hypermedia and Databases research group of Prof. Olivie. In the period 1996-2001 he worked mainly on the ARIADNE project, led by Prof. Duval, with as principal role the development of the Knowledge Pool System – a distributed database for managing learning objects and their metadata. During this period he also contributed to the development of the publication database of the Department of Computer Science, supporting the reporting in annual reports and project proposals for the department.

In 2001 Kris Cardinaels started part-time as a Guest Professor at the Katholieke Hogeschool Limburg in the Communication and Multimedia Design (C-MD) programme. In the following academic year he left the Department of Computer Science as a permanent staff member and was employed full-time by the KHLim. For the completion of the doctorate he remained a voluntary collaborator at the K.U.Leuven.


Publications of the Doctoral Candidate

1. K. Cardinaels, E. Duval, H. Olivie, A Formal Model of Learning Object Metadata. In Innovative Approaches for Learning and Knowledge Sharing (W. Nejdl and K. Tochtermann, eds.), Proceedings of the First European Conference on Technology Enhanced Learning, Greece, October 1-6, pp. 74-87, 2006.

2. Xavier Ochoa, Kris Cardinaels, Michael Meire, Erik Duval, Methodological and Technological Frameworks for the Automatic Indexation of Learning Management Systems Content into Learning Object Repositories. In Proceedings of the ED-MEDIA 2005 World Conference on Educational Multimedia, Hypermedia, and Telecommunications, 2005.

3. K. Cardinaels, M. Meire, and E. Duval, Automating Metadata Generation: the Simple Indexing Interface. In Proceedings of the 14th International World Wide Web Conference (WWW2005), pp. 548-556, 2005.

4. P. Huion and K. Cardinaels, Your Place or Mine? Parkpleintje.be: From prototype to full-fledged community site for the senior citizen. In Proceedings of the IADIS International Conference Web Based Communities 2004 (Kommers, P., Isaías, P. and Baptista Nunes, M., eds.), pp. 378-385, 2004.

5. F. Neven, E. Duval, S. Ternier, K. Cardinaels, and P. Vandepitte, An open and flexible indexation- and query tool for Ariadne. In Proceedings of the ED-MEDIA 2003 World Conference on Educational Multimedia, Hypermedia, and Telecommunications (Lassner, D. and McNaught, C., eds.), pp. 107-114, 2003.

6. K. Cardinaels and E. Duval, Composite learning objects: exposing the components. In Proceedings of the 3rd Annual ARIADNE Conference (Duval, E., ed.), pp. 1-7, 2003.

7. K. Cardinaels, H. Olivie, and E. Duval, A Classification of Document Reuse in Web-based Learning Environments. In Proceedings of ED-MEDIA World Conference on Educational Multimedia, Hypermedia & Telecommunications (Barker, P. and Rebelsky, S., eds.), pp. 234-238, 2002.

8. K. Cardinaels, E. Duval, and H. Olivie, Issues in Automatic Learning Object Indexation. In Proceedings of ED-MEDIA World Conference on Educational Multimedia, Hypermedia & Telecommunications (Barker, P. and Rebelsky, S., eds.), pp. 239-240, 2002.

9. E. Duval, E. Forte, K. Cardinaels, B. Verhoeven, R. Van Durm, K. Hendrikx, M. Wentland-Forte, N. Ebel, M. Macowicz, K. Warkentyne, and F. Haenni, The ARIADNE Knowledge Pool System. In Communications of the ACM 44 (5), pp. 73-78, May 2001.

10. B. Verhoeven, K. Cardinaels, R. Van Durm, E. Duval, and H. Olivie, Experiences with the ARIADNE pedagogical document repository. In Proceedings of ED-MEDIA 2001 World Conference on Educational Multimedia, Hypermedia & Telecommunications (Montgomery, C. and Viteli, J., eds.), pp. 1949-1954, 2001.

11. R. Van Durm, E. Duval, B. Verhoeven, K. Cardinaels, and H. Olivie, Extending the ARIADNE Web-Based Learning Environment. In Proceedings of ED-MEDIA 2001 World Conference on Educational Multimedia, Hypermedia & Telecommunications (Montgomery, C. and Viteli, J., eds.), pp. 1932-1937, 2001.

12. E. Duval, K. Hendrikx, K. Cardinaels, R. Van Durm, B. Verhoeven, T. Cleenewerck, and H. Olivie, Using Ariadne for Real: the Leuven Experience. In Proceedings of the Ariadne Foundation (Forte, E., ed.), vol. 1, pp. 73-82, 2001.

13. E. Duval, E. Vervaet, B. Verhoeven, K. Hendrikx, K. Cardinaels, H. Olivie, E. Forte, F. Haenni, K. Warkentyne, M. Wentland-Forte, and F. Simillion, Managing Digital Educational Resources with ARIADNE Metadata system. In Journal of Internet Cataloging 3 (1 and 2/3), pp. 145-171, 2000.

14. E. Forte, F. Haenni, K. Warkentyne, E. Duval, K. Cardinaels, E. Vervaet, K. Hendrikx, M. Wentland-Forte, and F. Simillion, Semantic and Pedagogic Interoperability Mechanisms in the Ariadne Educational Repository. In ACM SIGMOD Record 28 (1), pp. 20-25, March 1999.

15. E. Duval, K. Hendrikx, K. Cardinaels, E. Vervaet, R. Van Durm, B. Verhoeven, and H. Olivie, Evaluating the Ariadne Core Tools in a Course on Algorithms and Data Structures. In Proceedings of Ed-Media99 - World Conference on Educational Multimedia, Hypermedia & Telecommunications (Collis, B. and Oliver, R., eds.), pp. 1067-1072, 1999.

16. K. Cardinaels, E. Duval, and H. Olivie, High-level database document specifications using XML. In Proceedings of WebNet98 - World Conference of the WWW, Internet and Intranet (Maurer, H. and Olson, R.G., eds.), pp. 115-120, 1998.

17. K. Cardinaels, K. Hendrikx, E. Vervaet, E. Duval, H. Olivie, F. Haenni, K. Warkentyne, M. Wentland-Forte, and E. Forte, A knowledge pool system of reusable pedagogical elements. In 4th International Conference on Computer Aided Learning and Instruction in Science and Engineering (Alvegard, C., ed.), pp. 54-62, 1998.


Appendix A

The Formal Model, Summary

A.1 Learning Objects and Metadata

• The set of learning objects:

  $L = \{\, l \mid l \text{ is a learning object} \,\}$  (1)

• Single-valued metadata element (facet):

  $f : L \to cod_f$  (2)

• Multi-valued metadata element:

  $f : L \to (cod_f)^n, \quad n \geq 1$  (3)

• Metadata records:

  $r : L \to cod_r$

  $cod_r = cod_{f_1}^{p} \times cod_{f_2}^{q} \times \ldots \times cod_{f_n}^{x}$  (4)
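To make these definitions concrete, the following short Python sketch illustrates a single-valued facet, a multi-valued facet and a metadata record for a toy learning object. All names in the sketch (LearningObject, title, keywords, record) are invented for illustration and are not part of the formal model or of any ARIADNE tool.

# Illustrative sketch of Section A.1; all class and facet names are hypothetical.
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class LearningObject:
    identifier: str
    text: str

# Single-valued facet f : L -> cod_f  (equation 2)
def title(lo: LearningObject) -> str:
    # Here cod_title is simply the set of strings.
    return lo.text.splitlines()[0] if lo.text else ""

# Multi-valued facet f : L -> (cod_f)^n, n >= 1  (equation 3)
def keywords(lo: LearningObject) -> tuple[str, ...]:
    # cod_keywords is the set of strings; the facet yields n >= 1 values.
    words = {w.strip(".,").lower() for w in lo.text.split() if len(w) > 6}
    return tuple(sorted(words)) or ("untitled",)

# Metadata record r : L -> cod_r, combining the facets above  (equation 4)
def record(lo: LearningObject) -> dict:
    return {"title": title(lo), "keywords": keywords(lo)}

if __name__ == "__main__":
    lo = LearningObject("doc-1", "Introduction to Algorithms\nSorting and searching techniques.")
    print(record(lo))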

A.2 Fuzzy Metadata

• Single-valued facet:

  $f : L \to cod_f \times [0, 1]$  (5)


• Multi-valued facet:

  $f : L \to (cod_f \times [0, 1])^n, \quad n > 1$  (6)

• Confidence value:

  $CV_f : cod_f \times L \to [0, 1]^n$  (7)
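The sketch below shows one possible reading of these fuzzy facets: every generated value is paired with a confidence in [0, 1], and the confidence function simply looks up the value attached to one candidate. The language facet and its scoring heuristic are invented purely for illustration.

# Illustrative sketch of Section A.2; the 'language' facet and its scoring are hypothetical.
from __future__ import annotations

def fuzzy_language(text: str) -> tuple[tuple[str, float], ...]:
    """Multi-valued fuzzy facet: candidate values paired with a confidence in [0, 1]."""
    # Deliberately naive scoring, only to show the shape of (cod_f x [0,1])^n.
    lowered = f" {text.lower()} "
    dutch = sum(lowered.count(w) for w in (" het ", " een ", " van "))
    english = sum(lowered.count(w) for w in (" the ", " and ", " of "))
    total = dutch + english or 1
    return (("nl", dutch / total), ("en", english / total))

def confidence(value: str, candidates: tuple[tuple[str, float], ...]) -> float:
    """CV_f: the confidence attached to one candidate value (0.0 if it was not generated)."""
    return dict(candidates).get(value, 0.0)

if __name__ == "__main__":
    cands = fuzzy_language("The life cycle of the learning object and its metadata")
    print(cands)                      # e.g. (('nl', 0.0), ('en', 1.0))
    print(confidence("en", cands))    # 1.0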

A.3 Selecting Facet Values

  $MG_f : L \times (cod_f \times [0, 1])^n \to (cod_f \times [0, 1])^m, \quad m \leq n$  (8)

Example:

  $MG_f(l, f_1(l), f_2(l), \ldots, f_n(l)) = f_j(l)$ for that $j$ for which $CV_j(f, l) = \max_{\forall k}\,(CV_k(f, l))$
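A minimal sketch of this selection step, assuming that each generator contributes one (value, confidence) pair for the same facet; the generator outputs in the example are invented.

# Illustrative sketch of Section A.3: MG_f keeps the candidate with the highest
# confidence among the values produced by several generators for the same facet.
from __future__ import annotations

def select_facet_value(candidates: list[tuple[str, float]]) -> tuple[str, float]:
    """Return f_j(l) for the j with maximal CV_j(f, l); the m = 1 case of equation (8)."""
    if not candidates:
        raise ValueError("at least one generated value is required")
    return max(candidates, key=lambda pair: pair[1])

if __name__ == "__main__":
    # Hypothetical outputs of three generators for one facet of one learning object.
    generated = [("en", 0.80), ("nl", 0.15), ("en", 0.92)]
    print(select_facet_value(generated))   # ('en', 0.92)

Equation (8) allows any m ≤ n candidates to survive the merge; the sketch implements only the m = 1 case used in the example above.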

A.4 Context-awareness

• The set of contexts:

  $C = \{\, c \mid c \text{ is a context} \,\}$  (9)

• Context-dependent facet (single-valued, multi-valued):

  $f : L \times C^m \to (cod_f \times [0, 1])^n, \quad n > 0$  (10)

• Reified, in the case of content aggregations:

  $C \subset L$, so that $f : L \times L^m \to (cod_f \times [0, 1])^n$  (11)
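The following sketch suggests what such a context-dependent facet could look like when, as in equation (11), the contexts are themselves learning objects (the aggregations in which a component is used). The LO class and the rule that a component inherits its parent's classification with a reduced confidence are assumptions made purely for illustration.

# Illustrative sketch of Section A.4: a facet whose value for a learning object also
# depends on the learning objects (aggregations) that form its context, i.e. C is a subset of L.
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class LO:  # hypothetical, minimal stand-in for a learning object
    identifier: str
    classification: tuple[str, ...] = ()

def classification_in_context(lo: LO, contexts: tuple[LO, ...]) -> tuple[tuple[str, float], ...]:
    """f : L x L^m -> (cod_f x [0,1])^n: own values plus values taken over from the context."""
    own = tuple((value, 1.0) for value in lo.classification)
    # Assumption made only for this illustration: a classification of an enclosing
    # aggregation also applies to the component, but with a lower confidence.
    inherited = tuple((value, 0.6) for ctx in contexts for value in ctx.classification)
    return own + inherited

if __name__ == "__main__":
    course = LO("course-1", ("computer science",))
    slide = LO("slide-7")
    print(classification_in_context(slide, (course,)))   # (('computer science', 0.6),)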


A.5 Metadata Propagation

• Accumulated metadata:

  $f(a) \supset \bigcup_j f(c_j)$, or equivalently $\forall c_j \in a : f(c_j) \subset f(a)$

• Additive metadata:

  $f(a) = \sum_j f(c_j)$

• Special Relationships:

  $\forall c_j \in a : f(c_j) \leq f(a)$
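As an illustration of these three propagation patterns, the sketch below computes an accumulated, an additive and an upper-bounded facet of an aggregate from its components; the facet names (keywords, duration, difficulty) are invented for this example.

# Illustrative sketch of Section A.5: three ways component metadata can propagate to
# an aggregate learning object a with components c_1 .. c_n (all facet names invented).
from __future__ import annotations

def accumulated_keywords(component_keywords: list[set[str]]) -> set[str]:
    """Accumulated: f(a) contains at least the union of the components' values."""
    result: set[str] = set()
    for keywords in component_keywords:
        result |= keywords
    return result

def additive_duration(component_durations: list[float]) -> float:
    """Additive: f(a) is the sum of the components' values (e.g. a duration in minutes)."""
    return sum(component_durations)

def difficulty(component_levels: list[int]) -> int:
    """Special relationship: every f(c_j) <= f(a); taking the maximum satisfies this."""
    return max(component_levels)

if __name__ == "__main__":
    print(accumulated_keywords([{"metadata", "reuse"}, {"metadata", "life cycle"}]))
    print(additive_duration([15.0, 30.0, 5.0]))   # 50.0
    print(difficulty([1, 3, 2]))                  # 3

Taking the maximum is only one way to satisfy the special relationship $\forall c_j \in a : f(c_j) \leq f(a)$; any upper bound of the components' values would do.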