Semantics in Adaptive and Personalized Services: Methods, Tools and Applications

Manolis Wallace, Ioannis E. Anagnostopoulos, Phivos Mylonas, and Maria Bielikova (Eds.)

Semantics in Adaptive and Personalized Services

Studies in Computational Intelligence, Volume 279

Editor-in-Chief
Prof. Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
ul. Newelska 6
01-447 Warsaw
Poland
E-mail: [email protected]

Further volumes of this series can be found on our homepage: springer.com

Vol. 258. Leonardo Franco, David A. Elizondo, and Jose M. Jerez (Eds.): Constructive Neural Networks, 2009. ISBN 978-3-642-04511-0

Vol. 259. Kasthurirangan Gopalakrishnan, Halil Ceylan, and Nii O. Attoh-Okine (Eds.): Intelligent and Soft Computing in Infrastructure Systems Engineering, 2009. ISBN 978-3-642-04585-1

Vol. 260. Edward Szczerbicki and Ngoc Thanh Nguyen (Eds.): Smart Information and Knowledge Management, 2009. ISBN 978-3-642-04583-7

Vol. 261. Nadia Nedjah, Leandro dos Santos Coelho, and Luiza de Macedo de Mourelle (Eds.): Multi-Objective Swarm Intelligent Systems, 2009. ISBN 978-3-642-05164-7

Vol. 262. Jacek Koronacki, Zbigniew W. Ras, Slawomir T. Wierzchon, and Janusz Kacprzyk (Eds.): Advances in Machine Learning I, 2009. ISBN 978-3-642-05176-0

Vol. 263. Jacek Koronacki, Zbigniew W. Ras, Slawomir T. Wierzchon, and Janusz Kacprzyk (Eds.): Advances in Machine Learning II, 2009. ISBN 978-3-642-05178-4

Vol. 264. Olivier Sigaud and Jan Peters (Eds.): From Motor Learning to Interaction Learning in Robots, 2009. ISBN 978-3-642-05180-7

Vol. 265. Zbigniew W. Ras and Li-Shiang Tsay (Eds.): Advances in Intelligent Information Systems, 2009. ISBN 978-3-642-05182-1

Vol. 266. Akitoshi Hanazawa, Tsutom Miki, and Keiichi Horio (Eds.): Brain-Inspired Information Technology, 2009. ISBN 978-3-642-04024-5

Vol. 267. Ivan Zelinka, Sergej Celikovsky, Hendrik Richter, and Guanrong Chen (Eds.): Evolutionary Algorithms and Chaotic Systems, 2009. ISBN 978-3-642-10706-1

Vol. 268. Johann M.Ph. Schumann and Yan Liu (Eds.): Applications of Neural Networks in High Assurance Systems, 2009. ISBN 978-3-642-10689-7

Vol. 269. Francisco Fernandez de Vega and Erick Cantu-Paz (Eds.): Parallel and Distributed Computational Intelligence, 2009. ISBN 978-3-642-10674-3

Vol. 270. Zong Woo Geem: Recent Advances In Harmony Search Algorithm, 2009. ISBN 978-3-642-04316-1

Vol. 271. Janusz Kacprzyk, Frederick E. Petry, and Adnan Yazici (Eds.): Uncertainty Approaches for Spatial Data Modeling and Processing, 2009. ISBN 978-3-642-10662-0

Vol. 272. Carlos A. Coello Coello, Clarisse Dhaenens, and Laetitia Jourdan (Eds.): Advances in Multi-Objective Nature Inspired Computing, 2009. ISBN 978-3-642-11217-1

Vol. 273. Fatos Xhafa, Santi Caballé, Ajith Abraham, Thanasis Daradoumis, and Angel Alejandro Juan Perez (Eds.): Computational Intelligence for Technology Enhanced Learning, 2010. ISBN 978-3-642-11223-2

Vol. 274. Zbigniew W. Ras and Alicja Wieczorkowska (Eds.): Advances in Music Information Retrieval, 2010. ISBN 978-3-642-11673-5

Vol. 275. Dilip Kumar Pratihar and Lakhmi C. Jain (Eds.): Intelligent Autonomous Systems, 2010. ISBN 978-3-642-11675-9

Vol. 276. Jacek Mandziuk: Knowledge-Free and Learning-Based Methods in Intelligent Game Playing, 2010. ISBN 978-3-642-11677-3

Vol. 277. Filippo Spagnolo and Benedetto Di Paola (Eds.): European and Chinese Cognitive Styles and their Impact on Teaching Mathematics, 2010. ISBN 978-3-642-11679-7

Vol. 278. Radomir S. Stankovic and Jaakko Astola: From Boolean Logic to Switching Circuits and Automata, 2010. ISBN 978-3-642-11681-0

Vol. 279. Manolis Wallace, Ioannis E. Anagnostopoulos, Phivos Mylonas, and Maria Bielikova (Eds.): Semantics in Adaptive and Personalized Services, 2010. ISBN 978-3-642-11683-4

Manolis Wallace, Ioannis E. Anagnostopoulos, Phivos Mylonas, and Maria Bielikova (Eds.)

Semantics in Adaptive and Personalized Services

Methods, Tools and Applications


Dr. Manolis Wallace
Department of Computer Science and Technology
University of Peloponnese
End of Karaiskaki st.
22100, Tripolis
Greece

E-mail: [email protected]

Dr. Ioannis E. Anagnostopoulos
University of the Aegean
Department of Information and Communication Systems Engineering
Karlovassi, Samos, GR-83 200
Greece


Dr. Phivos Mylonas
National Technical University of Athens
School of Electrical & Computer Engineering
Division of Computer Science
Zographoy Campus, Iroon Polytechneioy 9
15780, Athens
Greece


Prof. Maria Bielikova
Institute of Informatics and Software Engineering
Faculty of Informatics and Information Technologies
Slovak University of Technology in Bratislava
Ilkovicova 3
842 16 Bratislava 4
Slovakia

E-mail: [email protected]

ISBN 978-3-642-11683-4 e-ISBN 978-3-642-11684-1

DOI 10.1007/978-3-642-11684-1

Studies in Computational Intelligence ISSN 1860-949X

Library of Congress Control Number: 2010920317

© 2010 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India.

Printed on acid-free paper


springer.com

Contents

Semantics in Adaptive and Personalized Services: Methods, Tools and Applications
Manolis Wallace, Ioannis Anagnostopoulos, Phivos Mylonas, Maria Bielikova

Semantic-Enabled Information Access: An Application in the Electricity Market Domain
Panos Alexopoulos, Manolis Wallace, Konstantinos Kafentzis, Christoforos Zoumas, Dimitris Askounis

Ontology-Based Profiling and Recommendations for Mobile TV
Yannick Naudet, Armen Aghasaryan, Sabrina Mignon, Yann Toms, Christophe Senot

The USHER System to Generate Semantic Personalised Maps for Travellers
Zekeng Liang, Kraisak Kesorn, Stefan Poslad

Semantic Based Error Avoidance and Correction for Video Streaming
Christian Spielvogel, Sabina Serbu, Pascal Felber, Peter Kropf

Semantics in the Field of Widgets: A Case Study in Public Transportation Departure Notifications
Alena Kovarova, Lucia Szalayova

An Adaptive Mechanism for Author-Reviewer Matching in Online Peer Assessment
Ioannis Giannoukos, Ioanna Lykourentzou, Giorgos Mpardis, Vassilis Nikolopoulos, Vassili Loumos, Eleftherios Kayafas

Towards Emotion Recognition from Speech: Definition, Problems and the Materials of Research
Christos-Nikolaos Anagnostopoulos, Theodoros Iliou

Health Care Web Information Systems and Personalized Services for Assisting Living of Elderly People at Nursing Homes
Stefanos Nikolidakis, Dimitrios D. Vergados, Ioannis Anagnostopoulos

Introducing Context-Awareness and Adaptation in Telemedicine Systems
Charalampos Doukas, Ilias Maglogiannis, Kostas Karpouzis

Blog Rating as an Iterative Collaborative Process
Malamati Louta, Iraklis Varlamis

Simulation-Based UMTS e-Learning Software
Florin Sandu, Szilard Cserey

Author Index

Semantics in Adaptive and Personalized Services: Methods, Tools and Applications

Manolis Wallace, Ioannis Anagnostopoulos, Phivos Mylonas, and Maria Bielikova

1 Introduction

“Semantics in Adaptive and Personalized Services” initially strikes one as a specific and perhaps narrow domain. Yet a closer examination of the term reveals much more.

On one hand, there is the issue of semantics. Nowadays, this most often refers to the use of OWL, RDF or some other XML-based ontology description language in order to represent the entities of a problem. Still, semantics may just as well refer to the consideration of meanings and concepts, rather than arithmetic measures, regardless of the representation used. On the other hand, there is the issue of adaptation, i.e. automated reconfiguration based on some context. This could be the network and device context, the application context or the user context; we refer to the latter case as personalization. From a different perspective, there is the question of the point of view from which to examine the topic. There is the point of view of tools, referring to the algorithms and software tools one can use; the point of view of methods, referring to the abstract methodologies and best practices one can follow; and the point of view of applications, referring to successful and pioneering case studies that lead the way in research and innovation. Or at least so we thought.

Manolis Wallace
Department of Computer Science and Technology, University of Peloponnese, End of Karaiskaki St., 22100, Tripolis, Greece
e-mail: [email protected]

Ioannis Anagnostopoulos
Department of Information and Communication Systems Engineering, University of the Aegean, Karlovassi, Samos, GR-83 200, Greece
e-mail: [email protected]

Phivos Mylonas
Image, Video and Multimedia Systems Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, Zografou Campus, Iroon Polytechneioy 9, Zografou, Greece
e-mail: [email protected]

Maria Bielikova
Institute of Informatics and Software Engineering, Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, Ilkovicova 3, 842 16 Bratislava 4, Slovakia
e-mail: [email protected]

M. Wallace et al. (Eds.): Semantics in Adaptive and Personalized Services, SCI 279, pp. 1–7. springerlink.com © Springer-Verlag Berlin Heidelberg 2010

Based on the above reasoning, we identified key researchers and practitioners in each of the aforementioned categories and invited them to contribute a corresponding work to this book. However, as the authors’ contributions started to arrive, we also started to realize that although these categories participate in each chapter to different degrees, none of them is ever entirely absent from any chapter. Moreover, it seems that theory and methods are inherent in the development of tools and applications and, conversely, the application is also inherent in the motivation and presentation of tools and methods.

As a result, and contrary to what one might expect based on the title, the book is not partitioned into distinct parts, and every chapter simultaneously addresses all three issues: methods, tools and applications.

Of course, the editors’ work is worth only as much as the manuscripts that authors have entrusted to them. We are grateful to all contributors who trusted us with their works, regardless of whether those works made the final cut for inclusion in the book, as well as to the reviewers who have greatly assisted in safeguarding the quality of this volume. Our thanks also go out to all our friends from the SMAP Initiative events, as well as to Janusz Kacprzyk, the series editor, and Thomas Ditzinger, senior editor with Springer, for their support.

2 Book Contents

In addition to this introduction, the book consists of 11 chapters, each one focusing on a different aspect of the theory and practice of semantics in adaptive and personalized services, as follows.

In “Semantic-Enabled Information Access: An Application in the Electricity Market Domain”, prepared by Alexopoulos et al., a novel framework is presented for the generation of information retrieval systems. Borrowing the best from a variety of scientific fields related to the processing of knowledge and information, such as ontologies, case-based reasoning and fuzzy systems, this framework is able not only to represent the inherent uncertainty of real-life information and adapt to the application context, but also to model and take into account the end users’ interpretation of what is similar and what is not.

The framework is described via the presentation of the architecture, design and development of a system that uses it: the electronic library of the Hellenic Transmission System Operator S.A. (HTSO), a deployed semantic information access system that provides the public with effective and efficient access to knowledge regarding the Greek electricity market. Experience gained from the implementation of the system indicates that the framework is particularly effective, while knowledge elicitation might be the next barrier to target.

In “Ontology-based Profiling and Recommendations for Mobile TV”, prepared by Naudet et al., we focus on the issue of automated recommenders for mobile television content. In particular, we go beyond conventional systems for personalized television content selection that merely allow users to specify general preferences, and see an approach that allows matching between content and users along three distinct dimensions: categories or themes of interest, content description, and precise interest descriptions defined in an ontology. User interests can be formalized using one or more of these dimensions and can moreover be associated with contextual data. The computation of user profiles relies on both explicit and implicit profiling, based on incremental learning of interest degrees from content usage.

This ontological formalization, used in conjunction with rule sets and a global matchmaking algorithm, has been successfully demonstrated in a mobile recommender system for broadcast TV and Video on Demand, as part of the MOVIES project. The profiling engine prototype has been implemented within the broader scope of multiple content delivery platforms (IPTV/VoD, Web portals, mobile video), where customers can use a diversity of terminals (TV/Set-Top-Box, mobile phone, and laptop), thus indicating its versatility and broad applicability.

In the chapter “The USHER System to Generate Semantic Personalised Maps for Travellers”, by Liang et al., still focusing on mobile users, we turn our attention to Geospatial Information Systems, which have recently emerged as a leading technology for the development of systems storing and delivering spatial information. Under the general umbrella of ontology-based personalized Spatial-Aware Map Services, we see three contributions. The first is an ontology-based representation of dynamic user preferences, interlinked to a domain model, that is able to detect shifts in user interests. The second is the creation of sharable user markup data governed by an access control matrix. The third is the generation of personalized annotated GIS maps. The presented approach enables users to set preferences based on their context and user profiles, to customise the searching and selection of content, and to mark up maps in situ, forming a personalized spatial memory.

The framework has been used in the context of the USHER project in order to develop a system that provides such services for the Queen Mary University of London (QMUL) Mile End campus and surrounding areas. The prototype application has been used to demonstrate that semantics make it possible for users’ annotations to be shared when they are relevant. As the authors note, two new fronts clearly open before us: the automated and context-aware definition of this relevance, and the consideration of privacy issues.

With “Semantic based Error Avoidance and Correction for Video Streaming”, by Spielvogel et al., we see how semantics can play a role in media streaming decisions, such as whether and how to alter video quality in order to avoid errors or perform error correction. These decisions are based not only on information regarding the network but also on semantic information regarding the content itself. This assumes that the content is available in multiple description coding format, which makes it possible to shift between different qualities and levels of compression according to the current network and stream context.


The presented implementation contains a search subsystem that is able to quickly locate alternative sources for a desired video. Experimental results indicate that this subsystem is particularly efficient, even in the case of very rare videos, which automatically provides more options regarding from where and how to implement the streaming. Additionally, simulation results indicate that the proposed methodology is capable of producing optimal decisions regarding whether to perform error avoidance, error correction or a combination of the two.

In “Semantics in the Field of Widgets: a Case Study in Public Transportation Departure Notifications”, prepared by Kovarova and Szalayova, we discuss the utilization of semantics in widgets. In this context, semantics can be used to describe widgets’ required input in a more generic form, thus making it possible for a given widget to be reused by different users and in quite different application and environment contexts. The theory is presented through a helpful running example: John, who lives in Bratislava and uses a widget in order to quickly and easily acquire bus route information. If John moves, the widget will need to draw input from a different source and then display information for different bus routes.

The presented approach has actually been implemented as a personalizable widget that is linked to a site providing bus route information for the city of Bratislava. Once users have provided semantic feedback regarding their typical routes, the widget is able to automatically provide them with both conventional route information and relevant alerts (e.g. cancellations). The presented approach appears quite efficient and could also be ported to other application domains, such as logistics or catering.

In “An Adaptive Mechanism for Author-Reviewer Matching in Online Peer Assessment”, authored by Giannoukos et al., a novel peer matching mechanism in the context of adaptation is proposed. This mechanism provides adaptive and personalized services for performing automatic optimal matching between authors and reviewers. It takes into account the feedback provided by the authors regarding the perceived usefulness of the comments they received from the reviewers. The methodological background is based on feed-forward neural networks, and the main goal is to estimate the optimal reviewer set for a specific author. The proposed method uses past data to construct author and reviewer user profiles, which are semantically represented.

In “Towards Emotion Recognition from Speech: Definition, Problems and the Materials of Research”, authored by Anagnostopoulos and Iliou, we see some experiments regarding the problem of emotion recognition from speech, which is of high importance for many applications. Besides the combination of speech processing and artificial intelligence techniques, new approaches incorporating linguistic semantics are discussed, in parallel to the classical artificial intelligence techniques that try to solve the problem addressed. The authors emphasize that such approaches could be applied especially to recognizing the emotions of people from different cultures, where one can easily identify the significant role of semantics in linguistic emotion recognition.

In “Health Care Web Information Systems and Personalized Services for Assisting Living of Elderly People at Nursing Homes”, the authors Nikolidakis et al. present a web application that can be used in nursing homes in order to manage the health care services provided to elderly people. The support provided for different types of health services is semantically represented. The proposed architecture can be used by doctors through PDAs or tablet PCs in order to collect both personal and clinical information, creating in parallel a personalized file record for the hospitalized persons. This application can also generate an overall report with respect to the needs as well as the demographics and population status in nursing homes. Finally, it provides the capability of exchanging semantically annotated information among different nursing homes.

In the chapter entitled “Introducing Context-Awareness and Adaptation in Telemedicine Systems”, Doukas et al. present a context-aware medical content adaptation platform that utilizes semantic content and context representation. Moreover, by using appropriate reasoning techniques, content adaptation as well as medical image and video transmission is performed only when deemed necessary. The mechanism encodes the transmitted data according to network availability and quality, taking into account the user preferences and the patient status. The architecture of the framework is open and does not depend on the monitoring applications used, the underlying networks or any other issues regarding the employed telemedicine system.

In “Blog Rating as an Iterative Collaborative Process”, authored by Louta and Varlamis, we see an iterative collaborative process that provides a global rating for a set of blogs using local rating information expressed via blogroll and post hyperlinks. The rating model is mathematically and semantically formulated, comprising local accumulative blog site rating formation, collaborative local blog site rating formation, and global rating formation. The semantic information attached to each hyperlink allows bloggers to better describe their intentions behind creating the link, to prioritize affiliated blogs in the blogroll, or even to provide topic information for the posts pointed to. The rating mechanism is also used to update the local scores and to employ them in providing collaborative and global scores. An initial experimental evaluation shows that the model performs well by “punishing” spam blogs that receive many links from a single source and favouring blogs that receive in-links on a regular basis.

Finally, in “Simulation-based UMTS e-Learning Software”, Sandu and Cserey describe adaptable educational software that allows university students or company workers (mainly in the mobile communication field) to learn, understand and study the processes, events and flows that appear in typical telecommunication technologies. The software consists of an editor capable of graphically representing semantic relations in information nodes and diagrams, based on the personalization of educational services.

3 Related Work and Relevant Sources

A few years back, we found ourselves working on topics that simultaneously borrowed from and linked to semantics, media, personalization, adaptation and other fields. Of course we were not the only ones; we just did not know who the others were and where to look for the sum of the related work performed by them.

Identifying this need, we organized the first meeting of our small informal society in Athens in 2006 as the 1st International Workshop on Semantic Media Adaptation and Personalization (SMAP). Several other such meetings have followed since. The proceedings of these meetings contain very interesting works that are closely related to the scope of this book, including completed works, works in progress and position statements.

• P. Mylonas, M. Wallace, I. Anagnostopoulos (Eds.), Semantic Media Adaptation and Personalization (proceedings of the 4th International Workshop on Semantic Media Adaptation and Personalization, San Sebastian, Spain), IEEE Computer Society, 2009,

• P. Mylonas, M. Wallace, M. Angelides (Eds.), Semantic Media Adaptation and Personalization (proceedings of the 3rd International Workshop on Semantic Media Adaptation and Personalization, Prague, Czech Republic), IEEE Computer Society, 2008,

• P. Mylonas, M. Wallace, M. Angelides (Eds.), Semantic Media Adaptation and Personalization (proceedings of the 2nd International Workshop on Semantic Media Adaptation and Personalization, London, UK), IEEE Computer Society, 2007,

• P. Mylonas, M. Wallace, M. Angelides (Eds.), Semantic Media Adaptation and Personalization (proceedings of the 1st International Workshop on Semantic Media Adaptation and Personalization, Athens, Greece), IEEE Computer Society, 2006.

Similarly, relevant works are included in the edited volumes

• M. Angelides, P. Mylonas, M. Wallace (Eds.), Advances in Semantic Media Adaptation and Personalization, Volume 2, CRC Press, 2009,

• M. Wallace, M. Angelides, P. Mylonas (Eds.), Advances in Semantic Media Adaptation and Personalization, Springer Verlag Studies in Computational Intelligence, Vol. 93, ISBN 978-3-540-76359-8, February 2008

and journal special issues

• P. Mylonas, M. Bielikova, Y. Kompatsiaris, R. Troncy (Eds.), Semantic Media Adaptation & Personalization, International Journal on Semantic Web and Information Systems, 2010,

• M. Angelides, P. Mylonas, M. Wallace (Eds.), Semantic Media Adaptation and Personalization, Multimedia Tools and Applications, Volume 43, Number 3, 2009,

• P. Mylonas, H. Hellwagner, P. Castells, M. Wallace (Eds.), Multimedia Semantics, Adaptation & Personalization, Signal, Image and Video Processing, Volume 2, Number 4, 2008,

• M. Angelides, P. Mylonas, M. Wallace (Eds.), Semantic Media Adaptation and Personalization, ACM/Springer Multimedia Systems Magazine, Volume 13, Number 2, August, 2007


that our society has been regularly producing. The book you are holding is actually a part of this effort.

Manolis Wallace
Ioannis Anagnostopoulos
Phivos Mylonas
Maria Bielikova

Semantic-Enabled Information Access: An Application in the Electricity Market Domain

Panos Alexopoulos, Manolis Wallace, Konstantinos Kafentzis, Christoforos Zoumas, and Dimitris Askounis

Abstract. In this chapter we combine theory from ontologies, case-based reasoning and fuzzy algebra to construct a novel framework for semantic-enabled information access. This framework provides a comprehensive and effective way to develop semantic information retrieval systems that serve specific domains and operate under specific contexts. To aid the reader and to demonstrate the effectiveness of the proposed framework, the theory is presented through a real-life application in the electricity market domain.

1 Introduction

In this chapter we describe the development process of the electronic library of the Hellenic Transmission System Operator S.A. (HTSO), a deployed semantic information access system that provides the public with effective and efficient access to knowledge regarding the Greek electricity market. HTSO is a governmental organization responsible for the management and operation of the Greek electricity network and market. In this context, it has the main responsibility for providing market-related information to the public. The electronic library, which takes the form of a knowledge portal offering semantic-enabled services such as search and navigation, serves this purpose.

Panos Alexopoulos and Konstantinos Kafentzis
IMC Technologies S.A., Fokidos 47, 11527, Athens, Greece
e-mail: palexopoulos,[email protected]

Manolis Wallace
Department of Computer Science and Technology, University of Peloponnese, End of Karaiskaki St., 22100, Tripolis, Greece
e-mail: [email protected]

Christoforos Zoumas
Hellenic Transmission System Operator S.A., 22 Asklipiou Str., 14568, Krioneri, Greece
e-mail: [email protected]

Dimitris Askounis
School of Electrical and Computer Engineering, National Technical University of Athens, 9, Iroon Polytechniou str., Zografou 15773, Athens, Greece
e-mail: [email protected]

More specifically, the knowledge to be accessed comprises a number of legal and technical documents which, due to their size and the lack of proper cross-referencing, are difficult for an individual to understand and use. The system tackles these two problems by enabling the storage and retrieval of decomposed parts of the documents (usually paragraphs), as well as navigation across these parts. The above services are semantic-enabled in that the system implements them by utilizing domain ontological knowledge and relevant reasoning techniques for capturing and interrelating the parts’ semantic content. This allows for significantly more effective information retrieval, in terms of result relevance, as well as for more intuitive navigation across the content through various semantic structures such as concept taxonomies.

All the semantic characteristics of the system are implemented by means of a novel semantic information retrieval framework that has been developed within our organization and which provides a generic but comprehensible and structured way of building semantic information retrieval systems in any domain. The framework draws upon ideas and techniques from the areas of Case-Based Reasoning, Ontologies and Fuzzy Algebra. Its basic characteristic is that it enables the knowledge engineer to adjust the knowledge representation and reasoning procedure to the users’ subjective perception of information relevance.

In the rest of the chapter we describe the aforementioned framework and illustrate its applicability in developing semantic information retrieval systems by describing the exact development process that we followed in the case of HTSO’s electronic library.

2 Semantic Information Retrieval Framework

2.1 Introduction

As suggested in the previous section, the core functionality of the deployed system, namely information retrieval (IR), was based on a hybrid semantic IR framework that combines three distinct artificial intelligence reasoning techniques: Structural Case-Based Reasoning, Ontology-Based Reasoning and Fuzzy Algebra. This combination was made possible through a fuzzy ontology framework described in ([1]) that allows for customized assessment of semantic similarity between ontological concepts.

In the following paragraphs we describe this hybrid approach. We discuss how Structural Case-Based Reasoning is used for information retrieval, how the use of ontologies within SCBR turns this retrieval into semantic retrieval, and how Fuzzy Set Theory helps deal with the inherent fuzziness of the concept of relevance.


2.2 Structural Case Based Reasoning for IR

The CBR technique originates from Schank’s concept of remindings ([9]), which states that when people are thinking they are merely recalling past experiences that are somehow similar to their current situation. When applied to problem solving, this translates into trying to solve new problems by comparing them to problems already solved ([2], [8], [4]). The underlying assumption is that if two problems are sufficiently similar, then their solutions are probably also similar.

Apart from problem solving, the CBR approach can be successfully applied to building information retrieval systems. In such systems, information items are regarded as cases and are retrieved according to their similarity to the query. Thus, a key requirement is to define, for each system, a proper similarity measure that will produce the best results. Such a definition is heavily dependent on the application domain and on the intended users’ information needs.
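To make the retrieval loop concrete, here is a minimal Python sketch that treats each information item as a case and ranks cases by a pluggable similarity measure. It is an illustration under assumptions, not the HTSO implementation. The `Case` type, the keyword-set representation and the Jaccard overlap measure are placeholders chosen here. The chapter's own measure is derived from its fuzzy ontology framework.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Case:
    """An information item treated as a case; here described by a set of keywords (assumption)."""
    case_id: str
    features: frozenset


# A similarity measure maps (query features, case features) to a score in [0, 1].
Similarity = Callable[[frozenset, frozenset], float]


def jaccard(query: frozenset, case: frozenset) -> float:
    """Placeholder measure: size of the overlap divided by the size of the union."""
    union = query | case
    return len(query & case) / len(union) if union else 0.0


def retrieve(query: frozenset, cases: Sequence[Case], sim: Similarity, k: int = 3) -> list:
    """Score every case against the query and return the k most similar ones."""
    scored = [(case.case_id, sim(query, case.features)) for case in cases]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical case base and query.
cases = [
    Case("doc-a", frozenset({"tariff", "market"})),
    Case("doc-b", frozenset({"grid", "outage"})),
    Case("doc-c", frozenset({"market", "auction", "tariff"})),
]
results = retrieve(frozenset({"market", "tariff"}), cases, jaccard, k=2)
```

Because `retrieve` takes the measure as a parameter, a domain-specific measure can replace `jaccard` without touching the retrieval loop. This reflects the point that the right measure depends on the domain and the users' information needs.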

In commercial CBR systems there are three main approaches that differ in the sources, materials, and knowledge they use ([4]). These are the textual approach, the conversational approach and the structural approach. In the latter, namely structural CBR (SCBR), the basic idea is that cases are represented according to a common structure called the domain model. In different SCBR systems, this model can be as simple as a flat table or as complex as an object-oriented model.

Applying SCBR to information retrieval means creating metadata-based descriptions of documents (or information objects in general), which are then stored as cases in the case base (see Figure 1). Each description (or characterization) contains a link to the information object itself. Moreover, the vocabulary used to represent the cases is developed a priori for the domain at hand and contains the relevant domain concepts that occur in the information objects.

Fig. 1 CBR Representation (Adapted from [6])


When searching for information objects, the user’s query is first transformed into a characterization of a fictional (or ideal) information object, i.e. the information object that best matches the user’s query. This object, also referred to as the “query-case”, is then compared to the cases stored in the system. The comparison is performed on the cases’ characterizations using some similarity measure, and its results comprise a relevance score assigned to each of the stored cases. Thus, the system is able to retrieve those cases that are most similar (and therefore most relevant) to the query-case.
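To make the retrieval step concrete, the following Python sketch ranks stored cases against a query-case using a weighted average of per-attribute local similarities, a common SCBR scheme. The attribute names, the weights and the `exact_match` fallback are illustrative assumptions, not the similarity measure actually used in the HTSO system.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A local similarity compares two values of the same attribute and returns a score in [0, 1].
LocalSim = Callable[[str, str], float]


@dataclass
class Case:
    case_id: str
    link: str                   # link to the information object itself
    attributes: dict[str, str]  # metadata-based characterization of the object


def exact_match(x: str, y: str) -> float:
    """Fallback local similarity (illustrative assumption): 1 if equal, else 0."""
    return 1.0 if x == y else 0.0


def global_similarity(query: dict[str, str], case: Case,
                      weights: dict[str, float],
                      local_sims: dict[str, LocalSim]) -> float:
    """Weighted average of local similarities over the attributes set in the query-case."""
    total, weight_sum = 0.0, 0.0
    for attr, q_value in query.items():
        w = weights.get(attr, 1.0)
        c_value: Optional[str] = case.attributes.get(attr)
        sim = 0.0 if c_value is None else local_sims.get(attr, exact_match)(q_value, c_value)
        total += w * sim
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


def retrieve(query: dict[str, str], cases: list[Case], weights: dict[str, float],
             local_sims: dict[str, LocalSim], k: int = 3) -> list[tuple[str, float]]:
    """Return the ids and relevance scores of the k cases most similar to the query-case."""
    scored = [(global_similarity(query, c, weights, local_sims), c) for c in cases]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(c.case_id, round(score, 3)) for score, c in scored[:k]]


# Hypothetical example: paragraphs described by language and topic.
cases = [
    Case("p1", "doc1#p1", {"language": "en", "topic": "balancing market"}),
    Case("p2", "doc1#p2", {"language": "el", "topic": "balancing market"}),
    Case("p3", "doc2#p1", {"language": "en", "topic": "transmission tariffs"}),
]
query_case = {"language": "en", "topic": "balancing market"}
print(retrieve(query_case, cases, weights={"topic": 3.0, "language": 1.0}, local_sims={}))
# [('p1', 1.0), ('p2', 0.75), ('p3', 0.25)]
```

Swapping `exact_match` for a concept-to-concept similarity derived from an ontology is exactly the step that turns this plain SCBR scheme into a semantic one.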

2.3 SCBR and Ontologies for Semantic IR

The key characteristic of the SCBR approach is that the definition of similarity measures is tightly integrated with object-oriented vocabulary representations ([5]). Such representations, however, can neither represent explicit semantics nor perform any kind of reasoning. That makes their use for the semantic assessment of relevance between cases extremely limited and inefficient. On the other hand, representation mechanisms with formal semantics afford applications the luxury of automated reasoning. The latter is an important capability when it comes to comparing the meanings of different cases and determining their similarity and relevance to a user’s query. That is why the incorporation of formal semantics, by means of ontologies, into CBR systems seems to be the next step in the evolution of these systems.

Ontologies have been developed and investigated for some time in Artificial Intelligence as the main way of facilitating knowledge sharing and reuse. However, only recently has the notion of ontology attracted attention from fields such as intelligent information integration and retrieval, electronic commerce and knowledge management. This is because ontologies make it possible to annotate information sources with machine-processable semantics, thus facilitating effective and efficient access to them by various software artifacts and agents.

Technically speaking, ontologies are formal descriptions of the entities, relationships, and constraints that make up a conceptual model. Depending on the expressiveness and the degree of formality of the underlying representation language, ontologies can range from a simple taxonomic hierarchy of concepts to a logic program utilizing first-order predicate logic, modal logic, or even higher-order logics with probabilities.

Given the above, incorporating formal semantics into SCBR primarily means replacing the object-oriented vocabulary with an ontology, as shown in figure 2. The resulting paradigm, namely Ontology-Based CBR, can then be used for more intelligent and efficient information retrieval.

More specifically, most ontology-based systems utilize logic-based deductive inference, while SCBR systems provide a search functionality that uses similarity measures to rank results according to their utility with respect to a given query. In the Ontology-Based CBR paradigm, these two types of reasoning are combined by defining similarity measures that are tightly integrated with the ontological model instead of an object-oriented vocabulary. Such a measure has been defined and used in our case.


Fig. 2 Ontology-based CBR

2.4 The Role of Fuzziness

In figure 2, one can see that the object-oriented vocabulary has actually been replaced by a fuzzy ontology ([7], [10]). The reason is that fuzzy logic and fuzzy algebra may be exploited to enhance the power and expressiveness of ontologies, especially when it comes to assessing semantic similarity and relevance [11]. Besides, according to Zadeh ([12]), relevance as a concept is fuzzy rather than bivalent, as it denotes the degree to which a piece of information is relevant to another piece or to a query. And to define fuzzy concepts, what is needed is the conceptual structure of fuzzy set theory, where everything is, or is allowed to be, a matter of degree.

2.5 Assessment of Semantic Similarity

As can be deduced from the previous paragraphs, the most important aspect of our IR framework is the assessment of semantic similarity between ontological concepts. In our case, this is facilitated through the framework described in ([1]). Its basic idea is that the assessment of semantic relevance should be application-oriented rather than domain-oriented. In other words, in different IR scenarios the same ontological information should be interpreted differently in order to yield the most appropriate results, and this interpretation depends heavily on the actual IR scenario and on the users’ intended information needs.

More specifically, the framework of ([1]) is based on a Fuzzy Ontology Framework, according to which domain knowledge is modelled as a fuzzy ontology.


This ontology captures both concrete and vague knowledge about the application domain by defining relevant concepts and fuzzy semantic relations between them.

More formally, a Fuzzy Ontology is a tuple $O_F = \{E, \mathcal{R}\}$, where $E$ is a set of semantic entities (or concepts) and $\mathcal{R}$ is a set of fuzzy binary semantic relations. Each element of $\mathcal{R}$ is a function $R : E^2 \to [0,1]$.

In particular, $\mathcal{R} = \{T, NT\}$, where $T$ is the set of taxonomic relations and $NT$ is the set of non-taxonomic relations. Fuzziness in a taxonomic relation $R \in T$ has the following meaning: high values of $R(a,b)$, where $a, b \in E$, imply that $b$’s meaning approaches that of $a$, while low values suggest that $b$’s meaning becomes “narrower” than that of $a$. On the other hand, a non-taxonomic relation has an ad hoc meaning defined by the ontology engineer. Fuzziness in this case is needed when such a relation represents a concept for which there is no exact definition; the degree then reflects the extent to which the relation can be considered true.
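A fuzzy ontology of this kind maps naturally onto sparse dictionaries from concept pairs to membership degrees. The Python sketch below is one possible encoding under that assumption; the class name, the `"T"`/`"NT"` tags and the example concepts are hypothetical and only illustrate the definitions above.

```python
from dataclasses import dataclass, field

# A fuzzy binary relation R: E x E -> [0, 1], stored sparsely; absent pairs have degree 0.
FuzzyRelation = dict[tuple[str, str], float]


@dataclass
class FuzzyOntology:
    entities: set[str]                                                     # E
    taxonomic: dict[str, FuzzyRelation] = field(default_factory=dict)      # T
    non_taxonomic: dict[str, FuzzyRelation] = field(default_factory=dict)  # NT

    def add(self, kind: str, name: str, a: str, b: str, degree: float) -> None:
        """Assert R(a, b) = degree for the taxonomic ("T") or non-taxonomic ("NT") relation."""
        if kind not in ("T", "NT"):
            raise ValueError("kind must be 'T' or 'NT'")
        if not 0.0 <= degree <= 1.0:
            raise ValueError("membership degrees must lie in [0, 1]")
        if a not in self.entities or b not in self.entities:
            raise KeyError("both concepts must belong to E")
        target = self.taxonomic if kind == "T" else self.non_taxonomic
        target.setdefault(name, {})[(a, b)] = degree

    def degree(self, name: str, a: str, b: str) -> float:
        """Return R(a, b), or 0.0 if the pair is not in the relation."""
        rel = self.taxonomic.get(name) or self.non_taxonomic.get(name) or {}
        return rel.get((a, b), 0.0)


def inverse(rel: FuzzyRelation) -> FuzzyRelation:
    """R^-1(a, b) = R(b, a)."""
    return {(b, a): d for (a, b), d in rel.items()}


# Hypothetical concepts: a high taxonomic degree means b's meaning is close to a's.
onto = FuzzyOntology({"Market Process", "Day-Ahead Scheduling", "Producer"})
onto.add("T", "processes", "Market Process", "Day-Ahead Scheduling", 0.8)
onto.add("NT", "participatesInProcess", "Producer", "Day-Ahead Scheduling", 1.0)
print(onto.degree("processes", "Market Process", "Day-Ahead Scheduling"))  # 0.8
```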

In any case, the above semantic relations are the primary means for computing the similarity between concepts, as they (usually) denote some kind of “semantic relatedness” between them. However, which of these relations should participate in the assessment of semantic similarity, in what way and to what degree, is application-dependent information that is captured separately from the ontological knowledge. This information is modelled by means of the Ontology Application Context (OAC).

OAC is in essence a set of parameters that characterize the expected role of the fuzzy ontology in the similarity assessment process and that take different values according to the application scenario. More formally, given a fuzzy ontology $O_F = \{E, T, NT\}$, $OAC_{O_F}$ defines:

• how each taxonomic relation $R \in T$ should be used for computing similarity between concepts.

• how each non-taxonomic relation $R \in NT$ should be used for computing similarity between concepts.

• how each pair of a relation $R_1 \in T$ and a relation $R_2 \in NT$ should be used for computing similarity between concepts.

To do that, OAC comprises three different contexts that correspond to each of the above cases:

• The Taxonomic Relation Application Context, which is defined as a set of functions $F = \{f_i\}, i = 1, 2$, where $f_i : T \to [-1,1]$.

• The Non Taxonomic Relation Application Context, which is defined as a set of functions $G = \{g_i\}, i = 1, 2$, where $g_i : NT \to [-1,1]$.

• The Taxonomic - Non Taxonomic Relation Pair Application Context, which is defined as a set of functions $H = \{h_i\}, i = 1, \dots, 4$, where $h_i : NT \times T \to [-1,1]$.

The exact meaning of each context is the following:

• If $R \in T$ and $a \in E$, then $f_1(R)$ is the degree to which all concepts $b \in E$ for which $[Tr_t(R)](a,b) \neq 0$ should be considered similar to $a$.

• If $R \in T$ and $a \in E$, then $f_2(R)$ is the degree to which all concepts $b \in E$ for which $[Tr_t(R)]^{-1}(a,b) \neq 0$ should be considered similar to $a$.

• If $R \in NT$ and $a \in E$, then $g_1(R)$ is the degree to which all concepts $b \in E$ for which $R(a,b) \neq 0$ should be considered similar to $a$.

• If $R \in NT$ and $a \in E$, then $g_2(R)$ is the degree to which all concepts $b \in E$ for which $R^{-1}(a,b) \neq 0$ should be considered similar to $a$.

• If $R_{NT} \in NT$, $R_T \in T$ and $a \in E$, then $h_1(R_{NT}, R_T)$ is the degree to which all concepts $b \in E$ for which $[R_{NT} \circ_t Tr_t(R_T)](a,b) \neq 0$ or $[Tr_t(R_T) \circ_t R_{NT}](a,b) \neq 0$ should be considered similar to $a$.

• If $R_{NT} \in NT$, $R_T \in T$ and $a \in E$, then $h_2(R_{NT}, R_T)$ is the degree to which all concepts $b \in E$ for which $[R_{NT} \circ_t Tr_t(R_T)^{-1}](a,b) \neq 0$ or $[Tr_t(R_T^{-1}) \circ_t R_{NT}](a,b) \neq 0$ should be considered similar to $a$.

• If $R_{NT} \in NT$, $R_T \in T$ and $a \in E$, then $h_3(R_{NT}, R_T)$ is the degree to which all concepts $b \in E$ for which $[R_{NT}^{-1} \circ_t Tr_t(R_T)](a,b) \neq 0$ or $[Tr_t(R_T) \circ_t R_{NT}^{-1}](a,b) \neq 0$ should be considered similar to $a$.

• If $R_{NT} \in NT$, $R_T \in T$ and $a \in E$, then $h_4(R_{NT}, R_T)$ is the degree to which all concepts $b \in E$ for which $[R_{NT}^{-1} \circ_t Tr_t(R_T)^{-1}](a,b) \neq 0$ or $[Tr_t(R_T^{-1}) \circ_t R_{NT}^{-1}](a,b) \neq 0$ should be considered similar to $a$.

The values of all the above degrees may range from $-1$ to $1$. A degree of $-1$ denotes that the relation or the pair of relations should not be considered at all in measuring similarity. A degree of $1$ denotes the exact opposite, namely that two concepts connected by this relation should be considered identical. Any degree between $-1$ and $1$ denotes an intermediate situation.

The utilization of OAC for the application-specific interpretation of the fuzzy ontology in the semantic similarity assessment is done through a process called “contextualization”. This process is formally described as follows: given a fuzzy ontology $O_F = \{E, T, NT\}$ and a corresponding application context $OAC_{O_F} = \{F, G, H\}$, we define the application context operator as follows:

$$aco(R(a,b), f) = \begin{cases} R(a,b)^{1 - f(R)}, & 0 \le f(R) \le 1 \\ R(a,b) \times (1 + f(R)), & -1 \le f(R) < 0 \end{cases} \qquad (1)$$

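where $R \in \mathcal{R}$ and $f \in OAC_{O_F}$. Then we apply this operator to the fuzzy ontology through the following steps:

1. $\forall R_T \in T$ we take $R'_T = aco(Tr_t(R_T), f_1)$ and $R''_T = aco(Tr_t(R_T)^{-1}, f_2)$.
2. $\forall R_{NT} \in NT$ we take $R'_{NT} = aco(R_{NT}, g_1)$ and $R''_{NT} = aco(R_{NT}^{-1}, g_2)$.
3. $\forall R_T \in T, R_{NT} \in NT$ such that $[R_T \circ_t R_{NT}] \neq \emptyset$ we take $R'_{T,NT} = aco([R'_T \circ_t R'_{NT}], h_1) \cup aco([R''_{NT} \circ_t R'_T], h_2) \cup aco([R''_T \circ_t R'_{NT}], h_3) \cup aco([R''_{NT} \circ_t R''_T], h_4)$.
4. $\forall R_T \in T, R_{NT} \in NT$ such that $[R_{NT} \circ_t R_T] \neq \emptyset$ we take $R''_{T,NT} = aco([R'_{NT} \circ_t R'_T], h_1) \cup aco([R'_T \circ_t R''_{NT}], h_2) \cup aco([R'_{NT} \circ_t R''_T], h_3) \cup aco([R''_T \circ_t R''_{NT}], h_4)$.
5. $\forall R_{T_1}, R_{T_2} \in T, R_{NT} \in NT$ such that $[(R_{T_1} \circ_t R_{NT}) \circ_t R_{T_2}] \neq \emptyset$ we take $R'_{T_1,NT,T_2} = [R'_{T_1,NT} \circ_t (R'_{T_2} \cup R''_{T_2})]$.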
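Equation (1) and its application to a whole relation can be sketched in Python as follows. Relations are stored sparsely as dictionaries from concept pairs to degrees; keeping unrelated pairs at 0 (instead of raising 0 to the power 0 when $f(R) = 1$) is our assumption about how the operator is applied, since Eq. (1) is meant for pairs that are actually related.

```python
def aco(degree: float, f_value: float) -> float:
    """Application context operator of Eq. (1) for a single degree R(a, b).

    0 <= f <= 1 : R(a, b) ** (1 - f)   (f = 1 turns any positive degree into 1)
    -1 <= f < 0 : R(a, b) * (1 + f)    (f = -1 cancels the relation)
    """
    if not -1.0 <= f_value <= 1.0:
        raise ValueError("context values must lie in [-1, 1]")
    if degree <= 0.0:
        return 0.0  # assumption: unrelated pairs stay unrelated
    if f_value >= 0.0:
        return degree ** (1.0 - f_value)
    return degree * (1.0 + f_value)


def aco_relation(rel: dict[tuple[str, str], float],
                 f_value: float) -> dict[tuple[str, str], float]:
    """Apply aco to every pair of a sparse fuzzy relation, dropping pairs that become 0."""
    out = {pair: aco(d, f_value) for pair, d in rel.items()}
    return {pair: d for pair, d in out.items() if d > 0.0}


print(aco(0.5, 1.0), aco(0.5, 0.0), aco(0.5, -1.0))  # 1.0 0.5 0.0
```

A positive context value thus strengthens a relation (for instance, $f = 0.5$ maps 0.64 to 0.8), while a negative one weakens it linearly ($f = -0.6$ maps 0.8 to 0.32).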

At the end of the above procedure, we take the fuzzy union of all the resulting relations and end up with a fuzzy ontology that comprises a single contextualized fuzzy relation $R_C$. The CBR engine of our system can then determine the semantic similarity between any two concepts $a, b \in E$ simply by getting the degree $R_C(a,b)$.
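The following Python sketch covers steps 1 and 2 of the contextualization and the final fuzzy union, which is enough to see how the contextualized relation R_C arises. It assumes the minimum as the t-norm t (so composition is sup-min and Tr_t is the max-min transitive closure) and omits steps 3 to 5; all concept and relation names are hypothetical.

```python
Rel = dict[tuple[str, str], float]  # sparse fuzzy relation: (a, b) -> degree in (0, 1]


def inverse(r: Rel) -> Rel:
    return {(b, a): d for (a, b), d in r.items()}


def union(*rels: Rel) -> Rel:
    """Fuzzy union (pairwise max)."""
    out: Rel = {}
    for r in rels:
        for pair, d in r.items():
            out[pair] = max(out.get(pair, 0.0), d)
    return out


def compose(r: Rel, s: Rel) -> Rel:
    """Sup-min composition: [r o s](a, c) = max_b min(r(a, b), s(b, c))."""
    out: Rel = {}
    for (a, b), d1 in r.items():
        for (b2, c), d2 in s.items():
            if b == b2:
                out[(a, c)] = max(out.get((a, c), 0.0), min(d1, d2))
    return out


def transitive_closure(r: Rel) -> Rel:
    """Max-min transitive closure: iterate R := R u (R o R) until nothing changes."""
    closure = dict(r)
    while True:
        nxt = union(closure, compose(closure, closure))
        if nxt == closure:
            return closure
        closure = nxt


def aco(r: Rel, f: float) -> Rel:
    """Eq. (1) applied pairwise; pairs whose degree is or becomes 0 are dropped."""
    out: Rel = {}
    for pair, d in r.items():
        if d <= 0.0:
            continue
        v = d ** (1.0 - f) if f >= 0.0 else d * (1.0 + f)
        if v > 0.0:
            out[pair] = v
    return out


def contextualize(taxonomic: dict[str, Rel], non_taxonomic: dict[str, Rel],
                  f1: dict[str, float], f2: dict[str, float],
                  g1: dict[str, float], g2: dict[str, float]) -> Rel:
    parts: list[Rel] = []
    for name, r in taxonomic.items():      # step 1
        tr = transitive_closure(r)
        parts += [aco(tr, f1[name]), aco(inverse(tr), f2[name])]
    for name, r in non_taxonomic.items():  # step 2
        parts += [aco(r, g1[name]), aco(inverse(r), g2[name])]
    return union(*parts)                   # steps 3-5 omitted in this sketch


taxonomic = {"processes": {("Market Process", "Scheduling"): 0.9,
                           ("Scheduling", "Day-Ahead Scheduling"): 0.8}}
non_taxonomic = {"participatesInProcess": {("Producer", "Day-Ahead Scheduling"): 1.0}}
rc = contextualize(taxonomic, non_taxonomic,
                   f1={"processes": 1.0}, f2={"processes": -1.0},
                   g1={"participatesInProcess": -0.2}, g2={"participatesInProcess": -1.0})
print(rc.get(("Market Process", "Day-Ahead Scheduling"), 0.0))  # 1.0
```

With $f_1 = 1$ and $f_2 = -1$, every concept reachable downwards in a taxonomy ends up fully similar to its ancestor, while the upward direction contributes nothing.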

3 Application of the IR Framework in HTSO Case

3.1 Enabling Architecture and Development Methodology

The application of the aforementioned semantic IR framework in the case of HTSO was facilitated through the architecture of figure 3, which reflects the key aspects of the framework in a natural way. More specifically, the system consists of a commercial CBR engine, which provides the SCBR reasoning functionality required by the framework, and of a “Fuzzy Ontology Contextualizer” subsystem, which implements the process described in paragraph 2.5 for calculating the semantic similarity between concepts of the domain ontology.

Fig. 3 System Architecture


The CBR engine supports the definition of metadata for describing and storing documents as cases, the definition of vocabularies for assigning values to these metadata, and the use of both for calculating the similarity between cases. For the latter in particular, the engine relies on similarity values assigned to pairs of vocabulary terms by a domain expert or knowledge engineer. Thus, the engine can perform the kind of reasoning that our framework supports simply by using the contextualized fuzzy domain ontology as its vocabulary.

This is made possible by the contextualizer subsystem, which takes as input the fuzzy domain ontology and its application context (also represented as an ontology), performs the reasoning algorithm of paragraph 2.5 and transforms the resulting fuzzy relation into a format compatible with the engine’s vocabulary. This process is repeated each time the initial ontology changes.
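The transformation step can be pictured as writing the contextualized relation out as a term-by-term similarity table. The commercial engine’s actual vocabulary format is not described in this chapter, so the CSV layout, the function name `export_similarity_table` and the choice of 1.0 on the diagonal below are all hypothetical.

```python
import csv
import io


def export_similarity_table(rc: dict[tuple[str, str], float], terms: list[str]) -> str:
    """Write RC as a full term-by-term CSV matrix (hypothetical format).

    Row a, column b holds RC(a, b); the diagonal is set to 1.0 by assumption.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["term"] + terms)
    for a in terms:
        row = [1.0 if a == b else rc.get((a, b), 0.0) for b in terms]
        writer.writerow([a] + row)
    return buf.getvalue()


rc = {("Market Process", "Scheduling"): 1.0, ("Producer", "Scheduling"): 0.8}
table = export_similarity_table(rc, ["Market Process", "Producer", "Scheduling"])
print(table)
```

Because the matrix is regenerated from scratch, rerunning the export whenever the ontology changes keeps the engine’s vocabulary consistent with the latest contextualization.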

Thus, given the above architecture, the actual implementation of the framework for HTSO comprised the following steps:

1. We modelled the available documents as cases through a proper metadata schema and stored them in the system’s case base.

2. We developed a fuzzy ontology covering the domain of the Electricity Market.

3. We used the concepts of the ontology for the semantic annotation of the cases.

4. We defined the Ontology Application Context for the specific ontology and the specific IR scenario and applied it to the ontology in order to produce its contextualized version.

3.2 Case Representation

For the first step, we took into account a basic requirement: the system’s answers to the users’ queries should be as detailed as possible, i.e. having the system return whole documents was not an acceptable option. For that reason, we decided to decompose the documents at the paragraph level and to consider these paragraphs as the system’s cases.

All the cases were represented by means of a common schema that included, among others, classical metadata such as title, author, language, etc. Furthermore, we chose the attribute Thematic Content as the one to be used for the semantic characterization of the cases and the corresponding assessment of their semantic similarity. The values this attribute could take were semantic concepts derived from the HTSO domain ontology.
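A minimal sketch of this case representation follows. The field names, the blank-line paragraph splitting and the validation against a set of ontology concepts are assumptions made for illustration; the actual HTSO metadata schema includes further attributes not shown here, and the document and concept names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ParagraphCase:
    case_id: str
    title: str
    author: str
    language: str
    text: str
    thematic_content: list[str] = field(default_factory=list)  # ontology concepts


def split_into_cases(doc_id: str, title: str, author: str, language: str,
                     body: str) -> list[ParagraphCase]:
    """Decompose a document into paragraph-level cases (paragraphs separated by blank lines)."""
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    return [ParagraphCase(f"{doc_id}#p{i}", title, author, language, p)
            for i, p in enumerate(paragraphs, start=1)]


def annotate(case: ParagraphCase, concepts: list[str], ontology_concepts: set[str]) -> None:
    """Set Thematic Content, accepting only concepts that exist in the domain ontology."""
    unknown = [c for c in concepts if c not in ontology_concepts]
    if unknown:
        raise ValueError(f"not in the domain ontology: {unknown}")
    case.thematic_content = list(concepts)


body = "Scope of the code.\n\nDay-ahead scheduling rules.\n\n\nImbalance settlement."
cases = split_into_cases("code-1", "Hypothetical Market Code", "HTSO", "en", body)
annotate(cases[1], ["Day-Ahead Scheduling"], {"Day-Ahead Scheduling", "Imbalance Settlement"})
print([c.case_id for c in cases], cases[1].thematic_content)
```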

3.3 Ontology Modelling

In the second step, the HTSO domain ontology was developed and, in accordance with our framework, it was structured as a fuzzy ontology comprising, in the end, nine categories of concepts, nine taxonomic relations and six non-taxonomic relations, all relevant to the Electricity Market domain. The concept categories were used for grouping the domain’s concepts according to their abstract meaning and for identifying the various relationships between them in a more intuitive way.

Fig. 4 HTSO Sample Taxonomies

More specifically, after the knowledge acquisition phase, which included content analysis and interviews with domain experts, the identified categories were:

• Market Processes
• Market Rules
• Market Rights & Obligations
• Market Information Sources
• Market Participants
• Market Units & Systems
• Market Services
• Market Actions
• Market Extents

Each of these categories contained a significant number of concepts, about 1800 across all categories. Furthermore, a fuzzy taxonomic relation was defined for each category, all having the semantics of fuzzy specialization as described in paragraph 2.5. Figure 4 depicts snapshots from two of these taxonomies, namely processes and rights/obligations. Finally, a number of non-taxonomic relations, each relating concepts from different categories, were defined. These were:

• participatesInProcess(Participant, Process)
• performsAction(Participant, Action)
• hasRightOrObligation(Participant, Right & Obligation)
• regardsProcess(Rule, Process)
• isRelatedToProcess(Extent, Process)
• foundInInformationSource(Extent, Information Source)
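The domain and range of these six relations can be written down as typed signatures, which makes it easy to reject assertions that link concepts from the wrong categories. Mapping each argument type to one of the nine categories listed above is our reading of the signatures; the helper `check_assertion` and the example concepts are hypothetical.

```python
# (domain category, range category) for each non-taxonomic relation.
NT_SIGNATURES: dict[str, tuple[str, str]] = {
    "participatesInProcess": ("Market Participants", "Market Processes"),
    "performsAction": ("Market Participants", "Market Actions"),
    "hasRightOrObligation": ("Market Participants", "Market Rights & Obligations"),
    "regardsProcess": ("Market Rules", "Market Processes"),
    "isRelatedToProcess": ("Market Extents", "Market Processes"),
    "foundInInformationSource": ("Market Extents", "Market Information Sources"),
}


def check_assertion(relation: str, a: str, b: str, category_of: dict[str, str]) -> bool:
    """True if a belongs to the relation's domain category and b to its range category."""
    domain, range_ = NT_SIGNATURES[relation]
    return category_of.get(a) == domain and category_of.get(b) == range_


category_of = {"Producer": "Market Participants", "Day-Ahead Scheduling": "Market Processes"}
print(check_assertion("participatesInProcess", "Producer", "Day-Ahead Scheduling", category_of))
print(check_assertion("participatesInProcess", "Day-Ahead Scheduling", "Producer", category_of))
```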

3.4 Case Semantic Annotation

The semantic annotation of the system’s cases involved assigning values to the “Thematic Content” attribute of each case. These values had to be derived from the domain ontology’s semantic concepts and to reflect the semantic content of the cases as accurately as possible. For that reason, the assignment was mainly performed by experts who knew the domain and the content well.

This option, though quite demanding and time-consuming because of the large number of cases and concepts, was deemed the most appropriate because, out of the many semantic terms contained in most of the system’s cases, only a few were actually indicative of the cases’ thematic content. Furthermore, no case was annotated with more than three concepts, because in most cases the ontology made it unnecessary to annotate a case with all the relevant concepts: the reasoning mechanism was able to infer this relevance.

3.5 Ontology Contextualization

The final step of the process involved defining the Ontology Application Context. This was initially performed based on the results of the knowledge acquisition process, while the final values of the context’s parameters were determined after a two-month period of testing the system and receiving feedback from the users. The values of the Taxonomic and the Non Taxonomic Relation Application Contexts are shown in Table 1. As for the Taxonomic - Non Taxonomic Relation Pair Application Context, the values for all pairs were $h_1 = 1$, $h_2 = 0$, $h_3 = 1$ and $h_4 = 0$.


Table 1 HTSO Taxonomic and Non Taxonomic Relation Application Contexts

Relation                              f1 or g1    f2 or g2
Processes Taxonomy                    1           -1
Extents Taxonomy                      1           -1
Rules Taxonomy                        1           -1
Rights & Obligations Taxonomy         1           -1
Information Sources Taxonomy          1           -1
Participants Taxonomy                 1           -1
Services Taxonomy                     1           -1
Units & Systems Taxonomy              1           -1
Actions Taxonomy                      1           -1
participatesInProcess                 -0.2        -1
performsAction                        -0.2        -1
hasRightOrObligation                  0           -1
isRelatedToProcess                    -0.6        -0.95
regardsProcess                        -0.5        -0.7
foundInInformationSource              -0.2        -0.8
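For experimentation, the context of Table 1 and the pair values $h_1, \dots, h_4$ can be encoded as plain dictionaries. The sketch below (the dictionary names are ours) applies Eq. (1) to a hypothetical relation degree of 0.9 to show how strongly each non-taxonomic relation is damped in each direction.

```python
TAXONOMIES = ["Processes", "Extents", "Rules", "Rights & Obligations", "Information Sources",
              "Participants", "Services", "Units & Systems", "Actions"]

F1 = {t: 1.0 for t in TAXONOMIES}   # towards narrower concepts: full similarity
F2 = {t: -1.0 for t in TAXONOMIES}  # towards broader concepts (inverse): ignored
G1 = {"participatesInProcess": -0.2, "performsAction": -0.2, "hasRightOrObligation": 0.0,
      "isRelatedToProcess": -0.6, "regardsProcess": -0.5, "foundInInformationSource": -0.2}
G2 = {"participatesInProcess": -1.0, "performsAction": -1.0, "hasRightOrObligation": -1.0,
      "isRelatedToProcess": -0.95, "regardsProcess": -0.7, "foundInInformationSource": -0.8}
H = {"h1": 1.0, "h2": 0.0, "h3": 1.0, "h4": 0.0}  # same values for every (R_NT, R_T) pair


def aco(degree: float, f_value: float) -> float:
    """Eq. (1) for one positive degree R(a, b)."""
    return degree ** (1.0 - f_value) if f_value >= 0.0 else degree * (1.0 + f_value)


# Effect of the context on a hypothetical degree R(a, b) = 0.9, in both directions.
for rel in G1:
    forward, backward = aco(0.9, G1[rel]), aco(0.9, G2[rel])
    print(f"{rel:26s} R: {forward:.3f}  R^-1: {backward:.3f}")
```

Read against the definitions of paragraph 2.5, these values make the taxonomies fully transitive downwards and inert upwards, leave hasRightOrObligation unchanged in its forward direction, and damp every other relation, with the inverse directions damped most.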

4 System Deployment and Evaluation

The Electronic Library of HTSO was deployed and made available to the public on October 15, 2008, at the URL http://emarketinfo.desmie.gr/htso/user (figure 5). The implemented features and characteristics of the overall system can be summarized as follows:

• Content and interface available in two languages, English and Greek.
• Content retrievable and viewable in HTML and PDF format.
• Structural navigation within the documents through their tables of contents.
• Semantic navigation by means of the ontology’s taxonomies.
• Semantic search through free text queries and filtering criteria.

The deployment of the system was preceded by a two-month period of thorough testing and fine-tuning of the framework’s parameters in order to increase retrieval effectiveness. During the same period, HTSO staff were trained in using the system’s administrative tools for content management and semantic annotation, as well as for management of the domain ontology. This type of administration is necessary because new content is expected to be added to the library frequently, and new ontological concepts may be needed to describe it.

The final evaluation of the system by HTSO staff, right before its release, showed that they were satisfied with the quality of the retrieval and the navigation capabilities. At the time of writing, we are designing, and HTSO plans to implement, an end-user feedback mechanism in order to collect end users’ comments on the system’s effectiveness. These comments are expected to be analyzed continually by the administrators and used to further fine-tune the system’s model; we will most probably report on these results in a future publication.

Fig. 5 HTSO Electronic Library

5 Summary and Conclusions

In this chapter we described the development process of the electronic library of the Hellenic Transmission System Operator, a knowledge portal that utilizes explicit semantics in order to provide effective access to documents related to the electricity market. Through this description, we have presented a novel semantic information retrieval framework that provides a generic yet comprehensible and structured way to build semantic information retrieval systems in any domain.

Among other things, the framework enables us to take into consideration and model the users’ subjective perception of semantic similarity in the context of the specific application and domain. This leads to much higher user satisfaction in terms of the system’s search effectiveness. Other semantic similarity measures in the literature fail to address the issue of subjectivity, so in this respect we have broken new ground. Similarly, it is novel and useful that we are now able to define which components of the ontology should be used in the information retrieval process, and in what way.

During the actual realization of the proposed approach and methodology in the development of the HTSO system, perhaps the greatest challenge was the knowledge acquisition process, and in particular engaging the domain experts in it. The reason is that explaining the system’s underlying retrieval mechanism to the domain experts without getting into overly technical details proved harder than one might expect. It seems that the knowledge elicitation barrier has yet to be overcome.


Clearly, our future work will have to include the definition of a formal and detailed methodology through which knowledge engineers will be able to better “exploit” the domain experts in the process of implementing the framework.

References

1. Alexopoulos, P., Wallace, M., Kafentzis, K.: A Fuzzy Ontology Framework for Customized Assessment of Semantic Similarity. In: 3rd International Workshop on Semantic Media and Adaptation (SMAP 2008), Prague, Czech Republic, December 15-16 (2008)

2. Aamodt, A., Plaza, E.: Case-based Reasoning: Foundational Issues, methodological variations and systems approaches. AI-Communications 7(1), 39–59 (1994)

3. Abecker, A., Hinkelmann, K., Maus, H., Muller, H.J. (eds.): Geschaftsprozessorientiertes Wissensmanagement. Springer, Heidelberg (2002)

4. Bergmann, R., Breen, S., Goker, M., Manago, M., Wess, S.: Developing industrial case-based reasoning applications. LNCS (LNAI), vol. 1612. Springer, Heidelberg (1999)

5. Bergmann, R.: Experience Management: Foundations, Development Methodology, and Internet-Based Applications. LNCS (LNAI), vol. 2432. Springer, Heidelberg (2002)

6. Bergmann, R., Schaaf, M.: Structural Case-Based Reasoning and Ontology-Based Knowledge Management: A Perfect Match? LNCS (LNAI), vol. 2432. Springer, Heidelberg (2002); Journal of Universal Computer Science 9(7), 608–626 (2003)

7. Calegari, S., Sanchez, E.: A Fuzzy Ontology-Approach to improve Semantic Information Retrieval. In: Bobillo, F., da Costa, P.C.G., D’Amato, C., Fanizzi, N., Fung, F., Lukasiewicz, T., Martin, T., Nickles, M., Peng, Y., Pool, M., Smrz, P., Vojtas, P. (eds.) Proceedings of the Third ISWC Workshop on Uncertainty Reasoning for the Semantic Web - URSW 2007. CEUR Workshop Proceedings, vol. 327, CEUR-WS.org (2007)

8. Leake, D., Wilson, D.: Case Based Reasoning: Experiences, Lessons & Future Directions. AAAI-Press, Menlo Park (1996)

9. Schank, R.C.: Dynamic Memory: A Theory of Reminding and Learning in Computers and People. Cambridge University Press, Cambridge (1982)

10. Straccia, U.: Towards a Fuzzy Description Logic for the Semantic Web (Preliminary Report). In: Gomez-Perez, A., Euzenat, J. (eds.) ESWC 2005. LNCS, vol. 3532, pp. 167–181. Springer, Heidelberg (2005)

11. Wallace, M.: Ontologies and Soft Computing in Flexible Querying. Control and Cybernetics 2(38), 481–507 (2009)

12. Zadeh, L.A.: From search engines to question-answering systems: the need for new tools. In: Menasalvas, E., Segovia, J., Szczepaniak, P.S. (eds.) AWIC 2003. LNCS (LNAI), vol. 2663. Springer, Heidelberg (2003)

M. Wallace et al. (eds.): Semantics in Adaptive and Personalized Services, SCI 279, pp. 23–48. springerlink.com © Springer-Verlag Berlin Heidelberg 2010

Ontology-Based Profiling and Recommendations for Mobile TV

Yannick Naudet, Armen Aghasaryan, Sabrina Mignon, Yann Toms, and Christophe Senot

Abstract. In this chapter, we present a recommender system that has been developed for filtering TV content provided to users on their mobile devices. This recommender is fully based on ontologies, which are used to formalize both the user and her/his interests, and the audiovisual content. The developed ontologies allow matchmaking between user and content at different levels, based on three means of defining user interests: categories, content description, or any combination of concepts defined in an ontology. The computation of user profiles relies on both explicit and implicit profiling, based on incremental learning of interest degrees from content usage.

1 Introduction

In converged mobile broadcast and cellular services, the ability to choose among hundreds of digital broadcast streams and to browse a multitude of IP-based content will necessitate the integration of advanced user interfaces capable of personalized service discovery. Next-generation mobile terminals should be capable not only of displaying the available services in the ESG (Electronic Service Guide) but also of dynamically filtering the content according to user preferences.

This is one of the main objectives of the project “Mobile Video and Interactive Services” (MOVIES¹), in the scope of which the work presented in this chapter has been carried out [18]. The cooperation between DVB-H broadcast [6] and 3G mobile networks (Fig. 1) makes it possible to offer new services, such as interactive TV, personalized services, on-demand services, and media access rights protection and security. The specificity of such a converged streaming & broadcast platform lies in the fact that the 3G return channel enables access to a centralized personalization logic (profiling intelligence and content recommendation), while broadcast content selection and usage tracking are done on the mobile terminal.

Yannick Naudet and Sabrina Mignon
Centre de Recherche Public Henri Tudor, 29, Av. John F. Kennedy, L-1855 Luxembourg-Kirchberg, Luxembourg
e-mail: yannick.naudet,[email protected]

Armen Aghasaryan, Yann Toms, and Christophe Senot
Alcatel Lucent Bell Labs, Centre de Villarceaux, route de Villejust, 91620 Nozay, France
e-mail: armen.aghasaryan,[email protected] [email protected]

¹ Project partially funded by EUREKA Celtic initiative.

24 Y. Naudet et al.

Fig. 1 Converged DVB-H and 3G architecture. (Diagram labels: Content provider; Internet; Cooperative Service Platform; Personalization; Security and Conditional Access; Interactive broadcast module; Mobile network Operator; 3G; Information request; Point-to-point delivery; Broadcast network Operator (DVB-H); Content Broadcasting; Terminal; Interaction software.)

For a few years now, the Semantic Web [3] and its associated technologies have been gaining interest, and have been used in particular to push personalization to a semantic level using ontologies. Whereas classical recommender systems are often based on categories or on matchmaking over a limited set of properties of the delivered content, ontology-based recommenders open new possibilities [2][4]. Using ontologies for content description or indexing and, at the same time, for user modeling allows a better matching of contents with users in the information filtering process. Concepts used in content and user profiles are formally defined within a common representation framework provided by ontologies. During matchmaking, there is thus no ambiguity between compared terms, which leads to more accurate results and richer personalization thanks to inference possibilities.

Personalization systems using ontologies for concept disambiguation or reasoning based on semantic relations have appeared recently. The main interest of those ontologies is to provide a basis for reasoning and to introduce inferred information into the matchmaking process [10][11], be it between user and content or between user and user, for content-based or collaborative filtering approaches respectively. In the multimedia domain, we can cite [16], in which semantic descriptions of user preferences have been added to MPEG-7 and MPEG-21. In [2], semantic query refinement is addressed, based on a set of ontologies for personalized information retrieval in TV multimedia content collections and cultural archives. In [5], multimedia content filtering based on ontologies is addressed; weighted concept vectors are used for user profiles and content descriptions.


The existing works, however, do not seem to have explored all the possibilities offered by the use of ontologies. In particular, the expression of user interests in precise things, formalized as a combination of concepts, is not discussed. We have designed a set of ontologies allowing user and Audio/Video content matchmaking along three dimensions: categories or themes of interest, content description, and precise interest description. User interests can be formalized using one or more of these dimensions and can moreover be associated with contextual data. This ontological formalization, used in conjunction with rule sets and a global matchmaking algorithm, has been successfully demonstrated in the mobile recommender system for broadcast TV and Video on Demand, which we present here.

This chapter is an extension of a preliminary version presented in [13]. In the remainder, section 2 first presents the mobile TV recommender architecture and the implementation of its main components: the profiling and recommending services. Section 3 presents the set of ontologies we have designed. In section 4, we present the profiling service and discuss the approach used for explicit and implicit profile updating. Section 5 presents the recommending service and its associated algorithms. In section 6 we illustrate the recommender with two examples that have been tested in a fully integrated environment. Finally, section 7 concludes and gives some perspectives.

2 Mobile TV Recommender Architecture

The architecture illustrated in Fig. 2 shows a profiling and personalization solution for mobile TV and video delivery in a converged broadcast and streaming service. A key component of such a service is the ESG, delivered to the terminal in the form of an XML file encapsulated in the broadcast channel [1]. The ESG lists the services and content available to the user. Besides TV channels, it can describe VoD services, radio, or data services (news, weather, stocks, etc.). Because consumers cannot afford to waste time finding their way through the large amount of content and services now available on mobile devices, personalized ESGs targeting user needs and interests are the next step in mobile applications. Our personalized ESG application proposes both non-filtered and filtered content; filtering can be done in different ways: ordering, faceted classification according to each interest and/or category, etc.

In our architecture, personalization works as follows. The ESG application sends a request to the recommendation service, transmitting the user and ESG identifiers. The recommender first retrieves the user profile from the profiling module, together with ontological descriptions of the contents from a dedicated knowledge base. It then processes the data and returns a list of contents matching the user profile, with the associated coefficients and corresponding interest categories.

Additionally, the user is constantly monitored by the profiling service, which keeps his profile up to date. The profiling service uses two mechanisms: explicit and implicit profiling. In the explicit profiling phase, the user declares some of his interests and non-interests via a web portal. Implicit profiling consists in learning and updating the profile from usage traces (log files), which are collected in the ESG application and then packaged and sent to the profiling service by its corresponding proxy (on the terminal).

26 Y. Naudet et al.

Fig. 2 Mobile TV recommender architecture.

Generally, such a profiling service can enable both well-known types of personalization techniques: the content-based approach (e.g. [8]) and collaborative filtering (e.g. [14]). Content-based algorithms look at the ‘similarity’ between the user (profile) and the item (metadata) to recommend, while collaborative filtering algorithms recommend an item if it has been appreciated by ‘similar’ users (based on consumption history or ratings). However, within the scope of the current work and to fit the needs of the recommending service, the profiling service uses only the content-based approach. The profiling service therefore also relies on content metadata describing the semantics of the consumed contents and services.

Finally, as can be seen in Fig. 2, a more centralized solution for a pure VoD streaming service is also possible using the same profiling and recommendation components. In this case, the personalization requests as well as the usage traces are received from the VoD portal.

2.1 Implementation

The profiling engine prototype has been implemented within the larger scope of multiple content delivery platforms (IPTV/VoD, Web portals, mobile video), where customers can use a diversity of terminals: TV/Set-Top-Box, mobile phone, and laptop [15]. The profiling engine allows profiles to be built and queried through a Web Service/SOAP² interface. One specificity of the implementation in the context of converged streaming and broadcast Mobile TV consists in realizing two profiling proxies that interact with the central profiling service: 1/ the profiling proxy residing in the streaming platform, and 2/ the profiling proxy embedded in the mobile terminal (for the case of broadcast content). The role of both proxies is to capture the content usage data for each user and to transfer them to the profiling service in the right format.

The northbound interface of the profiling service provides an OWL user profile upon request from the recommendation service for a given user identifier. Such a profile can then be processed directly by the recommender. Although profiles are provided to the recommender in OWL format, they are stored in a database (DB) in the form of <concept, value> pairs. This storage of instances in a relational DB allows faster access and thus faster processing for the profiling engine. However, the ontologies that define the structure of the profile are not stored in the database but kept as OWL files, which brings flexibility by allowing easy remodeling and ensures good synchronization with the recommender’s model.
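A minimal sketch of the relational side, assuming a single table of <concept, value> pairs keyed by user. The table and column names and the `cato:` concept identifiers are hypothetical; the chapter does not describe the actual schema or DBMS. The sketch uses Python's standard sqlite3 module and, as described above, keeps only profile instances in the database.

```python
import sqlite3

# Hypothetical schema: one row per <concept, value> pair of a user profile.
# Only instances are stored here; the ontologies themselves stay in OWL files.
SCHEMA = """
CREATE TABLE IF NOT EXISTS profile (
    user_id TEXT NOT NULL,
    concept TEXT NOT NULL,
    value   REAL NOT NULL CHECK (value >= 0.0 AND value <= 1.0),
    PRIMARY KEY (user_id, concept)
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the profile store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def set_interest(conn: sqlite3.Connection, user_id: str, concept: str, value: float) -> None:
    """Insert a <concept, value> pair, overwriting any previous value for that concept."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO profile (user_id, concept, value) VALUES (?, ?, ?)",
            (user_id, concept, value),
        )


def load_profile(conn: sqlite3.Connection, user_id: str) -> dict:
    """Return the user's profile as a {concept: value} dictionary."""
    rows = conn.execute(
        "SELECT concept, value FROM profile WHERE user_id = ? ORDER BY concept",
        (user_id,),
    )
    return dict(rows.fetchall())


conn = open_store()
set_interest(conn, "u1", "cato:Comedy", 0.8)
set_interest(conn, "u1", "cato:Animation", 0.9)
set_interest(conn, "u1", "cato:Comedy", 0.6)  # overwrites the previous value
print(load_profile(conn, "u1"))  # {'cato:Animation': 0.9, 'cato:Comedy': 0.6}
```

The CHECK constraint rejects values outside [0,1], so the database enforces the same range as the profile model.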

The recommending system relies on ontology and rule processing on the one hand, and on matching algorithms on the other. Although it has been designed generically with reusability in mind, it is specialized for processing the ontologies we present in the next section. As the whole system is written in Java, we have chosen the Jena³ semantic web library to manipulate ontologies and profiles, express rules and make inferences.

During an initialization phase, the ontologies are loaded from their OWL representation. The engine infers a new knowledge base, adding links that account for the proximity between some terms in the ontology set, based in particular on category similarity and on statically defined rules linking related concepts. This first inference phase is required only when the ontologies have been modified. Then, once the recommender is requested to filter a set of contents for a given user, a second inference phase based on the matchmaking approach begins.

The recommender is built as a service component, accessible from the mobile phone as a SOAP web service that takes a reference to an ESG as a parameter. When this parameter is not specified, a VoD mode is selected. When the recommender is queried, it requests the user profile from the profiling service, as well as the descriptions of the contents referenced in the ESG and contextual information. The inference engine of the recommender, fed with a set of rules, is then run, and matchmaking between the user profile and the content descriptions is performed. The recommendation is returned to the calling application.
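The request flow just described can be sketched as follows. This is an illustration under stated assumptions, not the recommender's actual code: `recommend`, `get_profile` and `get_descriptions` are hypothetical names, the rule-based inference phase is left out, and the score is a placeholder (the best product of interest level and category affiliation) standing in for the matchmaking formulas of Section 5.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Profile = Dict[str, float]      # concept -> interest level
Description = Dict[str, float]  # category -> degree of affiliation


@dataclass
class Recommendation:
    content_id: str
    score: float                                         # matching coefficient
    categories: List[str] = field(default_factory=list)  # interests that matched


def recommend(user_id: str,
              esg_id: Optional[str],
              get_profile: Callable[[str], Profile],
              get_descriptions: Callable[[Optional[str]], Dict[str, Description]],
              ) -> List[Recommendation]:
    """Server-side flow: profile + content descriptions -> ranked list.

    esg_id=None stands for the VoD mode. The inference phase is omitted and
    the score is a placeholder, not the chapter's matchmaking formula.
    """
    profile = get_profile(user_id)
    results = []
    for content_id, desc in get_descriptions(esg_id).items():
        shared = [c for c in desc if c in profile]
        if not shared:
            continue  # nothing in the profile relates to this content
        scores = {c: profile[c] * desc[c] for c in shared}
        ranked = sorted(shared, key=lambda c: scores[c], reverse=True)
        results.append(Recommendation(content_id, scores[ranked[0]], ranked))
    results.sort(key=lambda r: r.score, reverse=True)  # best matches first
    return results
```

As in the architecture above, the terminal only supplies the user and ESG identifiers; everything else is fetched and computed on the server side.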

All processing is done by the recommending service on the server side. On the mobile side, only the user and ESG identifiers are used as inputs. This kind of architecture, where all the workload is put on the server side, is less constraining in our mobile environment, since we do not have to deal with the low processing capabilities of mobile devices.

² Simple Object Access Protocol, http://www.w3.org/TR/soap12-part0/
³ http://jena.sourceforge.net/


The inference engine used is built on the RETE algorithm [7]. This is an efficient pattern matching algorithm for rule systems, but it requires some tuning to obtain good performance. Indeed, the inference engine considers the atomic conditions constituting a rule antecedent one after the other, from top to bottom. To ensure good performance, we reordered our rules’ antecedents following this principle: for each rule, atomic conditions should be ordered from top to bottom in increasing order of their probability of being matched. Basically, the time needed to examine which facts (i.e. RDF triples here) match a condition depends on the number of facts that have to be scanned. It is thus better to put the most discriminating condition near the top.
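The reordering principle can be automated with a simple heuristic, sketched below: count how many facts each atomic condition matches on its own in a sample fact base, and put the least-matching conditions first. This is an assumption (the text does not say how the reordering was carried out), the helper names are hypothetical, and the count ignores variable bindings shared between conditions, which a real RETE network also exploits.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables


def matches(pattern: Pattern, fact: Triple) -> bool:
    """True if the fact matches the pattern in isolation (variables match anything)."""
    return all(p.startswith("?") or p == f for p, f in zip(pattern, fact))


def match_count(pattern: Pattern, facts: Sequence[Triple]) -> int:
    """Number of facts a condition matches on its own (lower = more discriminating)."""
    return sum(1 for fact in facts if matches(pattern, fact))


def reorder_antecedents(antecedents: List[Pattern], facts: Sequence[Triple]) -> List[Pattern]:
    """Most discriminating conditions first; sorted() is stable, so ties keep their order."""
    return sorted(antecedents, key=lambda p: match_count(p, facts))
```

On a fact base with many rdf:type triples, a condition such as (?u uo:age ?a) would thus be moved above a generic (?x rdf:type ?t).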

3 The Set of Ontologies

The ontology set we have designed comprises ontologies for the user, content, context, and also categories. Fig. 3 illustrates the relationships between the different ontologies. The user ontology (UO) allows expressing interests that are valid in a given context, hence the link to the context ontology (CXO).

Fig. 3 Ontology set for the mobile recommender.

These interests can concern a content item, as defined in the content ontology (CTO), or a category of things, which is the core concept of the category ontology (CATO). Contents are described by a specific ontology, which in our case is TV-Anytime⁴ (TV-A), and are assigned to categories. Finally, a specific link models the fact that a user is situated in a given context at a given time: it is especially important to consider the context in which the user is to receive some content.

The user ontology is inspired by different user models, including GUMO (General User Model Ontology) [9]. We have kept only the concepts that are relevant for our application case. In the proposed ontology, illustrated in Fig. 4, a User is represented by the Person he is, and by his interests, preferences and usage history.

⁴ http://www.tv-anytime.org/


Fig. 4 The User Ontology.

The user-as-a-person part might in the future be formalized as a separate ontology. It comprises in particular:

• A personalia, represented by the Person concept, User being a subclass of it. It is defined with concepts and properties linked to the user himself: who he is, his demographic information, his working and leisure activities, etc. In particular, the Role concept is also used with the TV-A ontology to specify, e.g., the actors or director of a movie. This personalia part can be used, e.g., for age-restricted content, to provide the user with content related to his birthplace, to send him a gift on his birthday, or to propose content related to his work, leisure, language, etc.

• A list of Abilities, characterizing the physical capabilities of the user. This will typically be used for content alteration regarding handicaps, for filtering content the user will not be able to access or understand, etc.


Then, the remainder of the user ontology comprises:

• The Interests of the user regarding categories of things, content, or more specific interests that can be expressed using instances of rdf:Resource.

• The UsageHistory, containing references to instances of contents that were consumed by a user.

Fig. 5 The Content Ontology.

The Content concept is defined in the content ontology (CTO), illustrated in Fig. 5. Any consumed content is associated with a context, defining the situation in which the content was consumed. The Context concept is defined in the context ontology (CXO). The model proposes a UserGroup class for which interests and usage history can be specified. Groups will be used for applications based on collaborative filtering. The Interest concept is used to specify user interests as well as non-interests. Interests are associated with a validity period and a level of interest. The latter will be used to weight the interest in the global matching process.

An interest can be expressed classically using categories. We have designed a specific ontology for categories (CATO), illustrated in Fig. 6. It is a simple taxonomy of categories, instances of a main class cato:ThingCategory, linked by an isSubCategoryOf property. Currently, we use categories and sub-categories related to movies, sport, and music. For example, in the case of the Movie category, classical concepts such as Action, Adventure or Comedy can be found as sub-categories.
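The CATO taxonomy can be pictured as a parent map over isSubCategoryOf, with transitive lookups on top. The sketch below assumes a tree (at most one parent per category); the `Slapstick` entry and the function names are hypothetical illustrations, not part of CATO.

```python
from typing import Dict, List, Optional

Taxonomy = Dict[str, Optional[str]]

# Hypothetical excerpt of CATO: category -> parent category (isSubCategoryOf).
IS_SUB_CATEGORY_OF: Taxonomy = {
    "Movie": None,
    "Action": "Movie",
    "Adventure": "Movie",
    "Comedy": "Movie",
    "Slapstick": "Comedy",  # hypothetical deeper level, for illustration only
    "Sport": None,
    "Music": None,
}


def ancestors(category: str, taxonomy: Taxonomy = IS_SUB_CATEGORY_OF) -> List[str]:
    """All super-categories of `category`, nearest first."""
    chain: List[str] = []
    parent = taxonomy.get(category)
    while parent is not None:
        chain.append(parent)
        parent = taxonomy.get(parent)
    return chain


def is_sub_category_of(child: str, parent: str, taxonomy: Taxonomy = IS_SUB_CATEGORY_OF) -> bool:
    """Transitive closure of isSubCategoryOf."""
    return parent in ancestors(child, taxonomy)


def direct_sub_categories(parent: str, taxonomy: Taxonomy = IS_SUB_CATEGORY_OF) -> List[str]:
    """Categories whose immediate parent is `parent`, sorted by name."""
    return sorted(c for c, p in taxonomy.items() if p == parent)
```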


Extensive lists of categories exist, e.g. in GUMO or TV-A, which could be reused if needed, provided their structure is adapted. Indeed, at the time of writing, we have formalized a TV-A category taxonomy structured according to CATO. The main foreseen advantage is a direct correspondence with the classifications usually used in ESGs.

Another possibility offered by the model is to express interest in specific content characteristics. This is typically what will be exploited when the user consumption history is used. Instances of cto:Content are created from the consumed content profiles, summarizing the user's preferences in terms of content.

Last, the user might want to specify more precisely some interest in things or facts that cannot be expressed using simple categories or through the description of a content item: e.g. an interest in someone in a particular role, or in a given place, an event, etc. Such interests can be formalized by directly specifying an instance of rdf:Resource as the subject of interest. This way of expressing user interest provides enough flexibility to express almost anything that can be said using facts and ontologies.

Fig. 6 An extendible taxonomy of categories.

The content ontology CTO (Fig. 5) is very small and generic, so that any existing rich multimedia content description model can be used to detail the properties of specific kinds of content. To deal with TV programs, we have implemented a part of the TV-Anytime standard as an ontology. CTO defines a few common properties for multimedia content. In particular, a Content is linked to categories, instances of the category ontology CATO, each associated with a given proportion. A specific property links content instances to a technical description. In the case of the TV-A ontology, this is given by the tvao:BasicContentDescription concept, which is the main concept for content description according to the TV-A standard.

Finally, the context ontology CXO proposes concepts for defining both user mood and feelings and environmental conditions (e.g. time, location, temperature, noise, etc.). We do not go into more detail here, as it is not yet used in the current version of the mobile TV recommender. The reader can refer to [12] for more details.


4 Profiling Service

This service is provided by a multi-platform and application-agnostic profiling engine [1], which automatically learns each user’s profile and provides an estimation of his interest domains. For this purpose, usage traces from different sources (with different semantics and formats) are collected and analyzed, e.g. the consumption and purchase logs from the diverse service delivery platforms (IPTV, Web Portals, or Mobile content) of a large telecommunication operator. The profiling engine is application-agnostic in the sense that it is not tailored to a specific personalized application, but provides a generic intelligent interface with a number of reusable primitives to be used by different personalized applications such as targeted advertising, content recommenders, or social networking applications. In the scope of this chapter, however, we focus on a single content recommendation application, the personalized EPG on mobile TV, although usage traces from different sources are considered: VoD consumption traces (download/purchase logs) and DVB-H broadcast content viewing (duration, channel zapping).

The profiling engine is designed to decouple its internal logic from any particular user profile model and content metadata structure. It can thus easily be applied to a new profile/metadata structure and semantics. In the next sections, we first describe the basic concepts used in the internal data model of the profiling service, as well as their mapping to the ontology concepts used by the recommending service. Then, the incremental profiling process is presented.

4.1 Concepts and Measurable Quantities

Semantic concepts constitute the core elements of the profiling engine’s data model. They represent the glue between user profiles and content characteristics. All the entities in which a user interest can be expressed are special cases of the SemanticConcept class; see the inheritance relation in the diagram of Fig. 7 (right side). In addition to cato:ThingCategory, the profiling engine supports rdf:Resource and cto:Content, which can be specified as interest objects in the UO ontology.

The user profile is basically represented by a set of <concept, value> pairs, where each value is taken from the interval [0,1] and reflects the level of interest in the given (semantic) concept. More generally, the profiling engine manipulates three important classes of objects that associate a numerical value with a semantic concept, a mechanism they inherit from their common ancestor, the SemanticQuantity class; see Fig. 7 (left side). These entities are defined as follows:

• Quantity of Affiliation (QoA) characterizes the degree of affiliation of a content item to a given semantic concept. Each content item can be characterized by a set of QoA. For example, the film “Shrek” can be described by Animation = 0.9, Comedy = 0.8.

• Quantity of Consumption (QoC) characterizes the intensity of a consumption act with respect to a given semantic concept. For example, if two users watch “Shrek” for 10 minutes and 1 hour, respectively, it could be inferred that the second user is more interested in that content (and its semantic concepts Animation and Comedy) than the first one. Thus, each consumption act can be characterized by a set of QoC.

• Quantity of Interest (QoI) characterizes the degree of interest of the user in a given semantic concept. The user profile is composed of a set of QoI.

Note that this model allows each class of semantic quantities (QoA, QoC, and QoI) to introduce its own specific attributes, in addition to the single inherited attribute Value (Fig. 7). For example, an explicitly declared non-interest in a semantic concept can be expressed by an additional Boolean attribute Non-interest defined in the QoI class. In the next section, however, we focus on the implicit learning and update of the single inherited attribute expressing the interest value⁵.

Fig. 7 Profile modelling elements and their mapping to ontology concepts.

In the sequel, by abuse of notation, we use QoA, QoC, and QoI to denote both the class names and their respective Value attributes.
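The class structure of Fig. 7 (left side) can be sketched with Python dataclasses: a common ancestor carrying the single `value` attribute, and QoA, QoC and QoI subclasses, QoI adding the Boolean non-interest flag mentioned above. Class and field names are assumptions; the engine's actual classes are not listed in the text.

```python
from dataclasses import dataclass


@dataclass
class SemanticConcept:
    """Anything a user can be interested in (category, content, RDF resource)."""
    uri: str


@dataclass
class SemanticQuantity:
    """Common ancestor: associates a numerical Value in [0, 1] with a concept."""
    concept: SemanticConcept
    value: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.value <= 1.0:
            raise ValueError(f"value must be in [0, 1], got {self.value}")


@dataclass
class QoA(SemanticQuantity):
    """Quantity of Affiliation: degree of affiliation of a content item to the concept."""


@dataclass
class QoC(SemanticQuantity):
    """Quantity of Consumption: intensity of one consumption act for the concept."""


@dataclass
class QoI(SemanticQuantity):
    """Quantity of Interest: one entry of the user profile."""
    non_interest: bool = False  # class-specific attribute (explicit non-interest)


# The "Shrek" example of the text, as a set of QoA.
shrek = [QoA(SemanticConcept("cato:Animation"), 0.9),
         QoA(SemanticConcept("cato:Comedy"), 0.8)]
```

The range check lives in the common ancestor, so every subclass inherits it together with the Value attribute.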

4.2 Incremental Profiling Process

First of all, the user can declare some of his preferences (interests and non-interests) via a web portal; we call this phase explicit profiling. The data provided by the user at this step are taken as a starting point for the implicit profiling process, which updates the profile data by further analyzing the usage traces. The next stage consists in characterizing each consumption act in terms of values on semantic concepts.

⁵ The non-interest attribute is not updated by the incremental profiling process and may therefore contradict the implicitly learned and updated interest value.


4.2.1 QoC Computation

The user’s consumption act is described in terms of Quantities of Consumption (QoC). Each QoC value gives a normalized measure, QoC ∈ [0,1], of the observed user interest in the given semantic concept. This measure is based on the assumption that the longer the user consumes the content, the more interested he is in its subject; similarly, the more the user pays to watch some content, the more interested he should be in that type of content. A measure combining both viewing duration and price can be obtained as follows:

$$QoC_n^i = \tau \cdot \frac{1 + c}{2} \cdot QoA_n^i, \qquad \tau = \frac{\tau_{act}}{\tau_{max}}, \qquad c = \frac{c_{act}}{c_{max}} \tag{4.1}$$

where $i$ represents the relevant semantic concept, $n$ refers to the consumption event ordering, $\tau_{act}$ is the actual consumption duration and $\tau_{max}$ is the total video duration, $c_{act}$ corresponds to the price paid for the consumed item, and $c_{max}$ is the maximum price of an item in a given domain.
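A direct transcription of Eq. (4.1), read as QoC = τ·(1 + c)/2·QoA with τ = τ_act/τ_max and c = c_act/c_max. Clipping both ratios to [0, 1], and treating a domain with c_max = 0 (free content only) as c = 0, are assumptions added here so that the result stays in [0, 1].

```python
def qoc(qoa: float, t_act: float, t_max: float, c_act: float, c_max: float) -> float:
    """Quantity of Consumption for one semantic concept, following Eq. (4.1).

    qoa   -- QoA of the consumed item for the concept, in [0, 1]
    t_act -- actual viewing duration;  t_max -- total video duration
    c_act -- price paid;               c_max -- maximum price in the domain
    """
    if t_max <= 0:
        raise ValueError("t_max must be positive")
    tau = min(max(t_act / t_max, 0.0), 1.0)                      # viewing ratio (clipped)
    c = min(max(c_act / c_max, 0.0), 1.0) if c_max > 0 else 0.0  # normalized price
    return tau * (1.0 + c) / 2.0 * qoa
```

With the “Shrek” QoA of the text (Comedy = 0.8) and a hypothetical 90-minute free screening, watching 10 minutes gives about 0.044 and watching one hour about 0.267: the longer viewing yields the larger QoC. A fully watched item bought at the maximum price gives QoC = QoA.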

4.2.2 QoI Update

The Quantity of Interest (QoI) represents the estimated value of user interest in a given semantic concept. In the profiling engine, two complementary QoI update functions are used: 1/ consumption event-based QoI learning, and 2/ time-based QoI decay.

The consumption event-based QoI learning function makes the QoI data on user interests evolve by combining their previously known values with newly observed interest manifestations (QoC). We have considered a particular family of functions where the new QoI is obtained cumulatively by a weighted addition of the newly observed consumption, QoC, to the previous QoI:

$$QoI_{n+1}^i = QoI_n^i + W\left(QoI_n^i\right) \cdot QoC_n^i \tag{4.2}$$

The weight, given by the function $W(QoI_n^i) < 1$, represents how much the new observation influences the profile evolution. Note that this function should be selected carefully so that formula (4.2) always produces QoI values below 1. Such a variable weight yields a “learning curve” behavior, where small interests grow relatively slowly (because of a small weight) and high interests saturate at the upper limit of one (again, because of a diminishing variable weight). Here, the term “learning curve” refers to the relationship between the duration of a student’s learning period and the knowledge or experience gained. For a stable consumption pattern, the QoI evolution follows a sigmoid shape, as shown in Fig. 8.
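The text does not give the weight function W. One hypothetical choice with the stated properties is W(q) = β(ε + (1 − ε)q)(1 − q): it is small for small q (slow start), vanishes as q approaches 1 (saturation), and stays below 1 − q, so (4.2) never reaches 1. The sketch iterates (4.2) with this W and a constant QoC; ε = 0.1 and β = 0.5 are arbitrary illustration values, not the engine's parameters.

```python
from typing import List


def weight(qoi: float, eps: float = 0.1, beta: float = 0.5) -> float:
    """Hypothetical variable weight W(QoI).

    With eps < 1 and beta <= 1, W(q) < 1 - q for any q < 1, so
    q + W(q) * QoC stays below 1 for every QoC in [0, 1].
    """
    return beta * (eps + (1.0 - eps) * qoi) * (1.0 - qoi)


def update_qoi(qoi: float, qoc: float) -> float:
    """Eq. (4.2): QoI_{n+1} = QoI_n + W(QoI_n) * QoC_n."""
    return qoi + weight(qoi) * qoc


# Stable consumption pattern: the same QoC observed at every event.
q = 0.0
history: List[float] = []
for _ in range(50):
    q = update_qoi(q, 0.4)
    history.append(q)
```

The per-event increments first grow and then shrink towards zero, which is the sigmoid-like “learning curve” described above.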


The time-based decay function is used to account for the aging of profile data:

$$QoI_{k+1}^i = \mathrm{Decay\_QoI}\left(QoI_k^i, P_k^i\right) \tag{4.3}$$

This function is called with a given periodicity, indexed by $k$, which should be significantly longer than the average time interval between consumption events. For example, it can be called on a monthly or quarterly basis. To decide how to decrease a given QoI, this function can take into account parameters $P_k^i$, such as the frequency or the recency of consumption events on that semantic concept. Depending on the consumption frequency, the QoI can thus be diminished linearly, exponentially, or left without decay for a fixed period of non-consumption followed by a decay curve.
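A hypothetical instance of Decay_QoI, using the consumption frequency as the only parameter P_k^i: no decay when the concept was consumed on at least `f_ref` days of the last month, and a proportionally larger cut (up to `rate`) otherwise. The thresholds are illustration values, not those used in the profiling engine.

```python
def decay_qoi(qoi: float, freq_days: int, f_ref: int = 8, rate: float = 0.2) -> float:
    """Hypothetical Decay_QoI(QoI_k, P_k) of Eq. (4.3), with P_k = consumption frequency.

    freq_days -- number of days in the last month with consumption on the concept.
    No decay when freq_days >= f_ref; otherwise a cut that grows linearly as the
    frequency drops, reaching `rate` (20 %) when nothing was consumed.
    """
    shortfall = 1.0 - min(freq_days, f_ref) / f_ref  # 0.0 (frequent) .. 1.0 (none)
    return qoi * (1.0 - rate * shortfall)
```

For example, a QoI of 0.8 is kept at 0.8 with 10 consumption days, lowered to about 0.72 with 4 days, and to about 0.64 with none.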

As an example, Fig. 8 illustrates a QoI evolution driven by both (4.2) and (4.3). The consumption event-based QoI update (4.2) is driven by a relatively stable consumption sample on a given semantic concept: a QoC sequence with, on average, two occurrences per week over 160 days, with QoC values randomly selected in the interval [0, 0.5]. Due to the cumulative nature of the update function (4.2), the QoI approaches its upper limit of 1 after about 30 days of such stable but moderate consumption. It follows a non-linear, sigmoid-like evolution path that reflects the nature of the variable weight in (4.2).

The time-based decay (4.3) is calculated on the basis of the consumption frequency (the number of days per month on which consumption on a given semantic concept is registered). The decay function is applied every 3 weeks. The higher the consumption frequency, the smaller the decay applied to the QoI. Note that the decay function causes small step-wise perturbations of the QoI evolution curve without having a significant impact, except when the consumption frequency keeps diminishing (i.e. beyond the first 160 days).

[Plot: QoI, Freq and QoC versus time (day), from 0 to 200 days; vertical axis from 0 to 1]

Fig. 8 A sigmoid QoI evolution curve given a temporal sequence of QoC.


4.2.3 Profile Query

The profile query interface supports different personalization systems (e.g. targeted advertisement, content recommender, community-based applications) by giving them access to user profiles through Web Services and by providing some reusable profile exploitation tools (distance computation, access to different views, etc.). In the framework of the project in which the work reported here was carried out, the profile query interface is in charge of mapping the user profile data onto the User Ontology. Upon a request from the recommendation service, the query interface provides the user’s profile in the form of ontology instances expressed in OWL, i.e. UO instances in the integrated system. The QoI values are mapped onto the interestLevel attribute of the uo:Interest class.
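The mapping from QoI values to uo:Interest instances can be sketched as a small serializer. Assumptions: Turtle is used for readability (the chapter only says OWL), the `http://example.org/` namespace and the interest URI scheme are placeholders, and the property names uo:isInterestedIn, uo:hasSubject and uo:interestLevel are taken from the rules shown in Sect. 5.1.

```python
from typing import Dict

# Placeholder namespace: the actual UO namespace URI is not given in the chapter.
UO_NS = "http://example.org/uo#"


def profile_to_turtle(user_uri: str, qois: Dict[str, float]) -> str:
    """Serialize {concept URI: QoI value} as uo:Interest instances in Turtle."""
    lines = [
        f"@prefix uo: <{UO_NS}> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
    ]
    for i, (concept, value) in enumerate(sorted(qois.items())):
        interest = f"{user_uri}_interest_{i}"  # hypothetical URI scheme
        lines.append(f"<{user_uri}> uo:isInterestedIn <{interest}> .")
        lines.append(f"<{interest}> a uo:Interest ;")
        lines.append(f"    uo:hasSubject <{concept}> ;")
        lines.append(f'    uo:interestLevel "{value:.3f}"^^xsd:float .')  # QoI -> interestLevel
    return "\n".join(lines)
```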

4.3 Privacy Enhancement and Explicit Profiling

The privacy of the end-user is a fundamental element to take into account when designing a personalization system that relies heavily on knowledge of the user’s profile. First, the service provider must ensure compliance with the legal privacy rules of each country where the solution is deployed. User acceptance is also a major issue, as profiling can easily be perceived by end-users as a threat and an intrusion into their private life. Furthermore, the user should be provided with a comprehensive interface for setting his privacy options. Our approach encompasses two aspects of user privacy protection:

1. Configuration of the intrusiveness level in the profiling process, i.e. which usage sources can be used and what kind of processing is allowed.

2. Access control of user profile data, i.e. which personalized applications can access (part of) the profile data and under which circumstances.

Privacy is handled by high-level privacy policy rules. Part of these rules is introduced by the service provider in order to define its global profiling and personalization strategy in conformance with the applicable legislation. For example, the requirements put forward by the European Union [17] include three core principles.

• Transparency: The user has the right to be informed about the purpose of the processing of his personal data, the recipients of the data, and all other information required to ensure the processing is fair.

• Legitimate purpose: Personal data can only be processed for specified, explicit and legitimate purposes and may not be processed further in a way incompatible with those purposes.

• Proportionality: Personal data may be processed insofar as they are adequate, relevant and not excessive in relation to the purposes for which they are collected and/or further processed.

The remaining privacy policy rules are introduced by each user in order to tune his personal privacy preferences. For example, the user can specify the types of services (mobile video streaming or broadcast, web browsing, IPTV, etc.) and the types of usage traces (viewing history, payments, interactivity, ESG navigation, etc.) that can be used for his profiling. The user can also identify the temporal and geographical contexts in which profiling is activated (period of the day, location). In that sense, filtering traces directly at the level of the user’s terminal brings additional protection by avoiding the transmission of personal data over the network.

On the other hand, with respect to access to his profile data, a user can define variable restrictions depending on the type of personalized application. For example, if a user does not want to receive targeted advertisements based on some of his video interests, this part of the profile must be hidden from targeted advertisement applications, but can be available to a recommender system. The restrictions can also concern the granularity of the information made available in a given domain of user interests (e.g. the maximal visible depth in the taxonomy tree).
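Access control of this kind can be sketched as a profile view computed per application type. The policy format below (a `hidden` set of concepts and a `max_depth` in the category taxonomy) is hypothetical, and treating an application type absent from the policy as not opted in is an assumption.

```python
from typing import Any, Dict, Optional, Set

ParentMap = Dict[str, Optional[str]]  # category -> parent (isSubCategoryOf)


def depth(category: str, parent_of: ParentMap) -> int:
    """Depth of a category in the taxonomy (roots and unknown concepts: 0)."""
    d = 0
    while parent_of.get(category) is not None:
        category = parent_of[category]
        d += 1
    return d


def is_hidden(category: str, hidden: Set[str], parent_of: ParentMap) -> bool:
    """A category is hidden if it, or any of its ancestors, is hidden."""
    node: Optional[str] = category
    while node is not None:
        if node in hidden:
            return True
        node = parent_of.get(node)
    return False


def visible_profile(profile: Dict[str, float], app_type: str,
                    policy: Dict[str, Dict[str, Any]], parent_of: ParentMap) -> Dict[str, float]:
    """Part of the profile that the given type of application may see."""
    rules = policy.get(app_type)
    if rules is None:  # application type not opted in: nothing is visible
        return {}
    hidden = rules.get("hidden", set())
    max_depth = rules.get("max_depth")
    return {
        concept: value
        for concept, value in profile.items()
        if not is_hidden(concept, hidden, parent_of)
        and (max_depth is None or depth(concept, parent_of) <= max_depth)
    }
```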

The graphical user interface for privacy management includes a multi-level opt-in mechanism: at his first connection, the user is asked to opt in for each category of personalized applications (personalized EPG, VoD Recommender, targeted Ad, etc.). The user is then offered the option to configure more detailed privacy options if he wishes to do so.

As an important part of the user interface for privacy management, we also include the explicit profiling feature: read and write access of the user to his profile data. It allows not only initializing the system but also correcting the learned profile. In the latter case, the profiling process continues with the new current profile, modified explicitly by the user, in the same way as in the initialization phase.

5 Recommending Service

The recommending service is called from the client terminal by a mobile ESG application. Given a client identifier, the corresponding user profile instantiated from UO is retrieved from the profiling service through a web service call. The matchmaker module then computes a matching value for each content item referenced in the ESG program. This is done by comparing each content description, instantiated from CTO, with the user profile. Profiles containing little information are sufficient to get a recommendation, but obviously the more complete a profile, the more accurate the proposed contents. A content item can match a user in two ways: when it matches some properties of the user profile (i.e. the user himself), or when it matches one of the user’s interests or non-interests.

The processing of content descriptions is performed in two steps. First, an inference phase using a set of rules filters out unwanted content and produces an inferred fact base that is used in the second phase. Second, a matchmaking phase computes matching levels corresponding to the user’s interest level for the contents. These two steps are detailed in the next two sections.


5.1 Inference Rules

Associated with the set of ontologies is a set of inference rules for deducing links between concepts used in profiles. Rules are classified into three types: a) inference rules, which define the behavior linked to concepts defined in ontologies; b) filtering rules, which exclude some contents from the processing; and c) interest creation rules, which build new interests from the user profile. The first are necessary complements to the ontologies and are an integral part of them. The others are specific to the applicative context and are exploited during the personalization process.

The following rule, expressed in the syntax of the Jena rule language, illustrates the first kind of rule. It formalizes the fact that a user who is able to talk (at a level above 0.5) and has a specific mother tongue is able to speak the corresponding language. Note that the “built-in” mechanism of the Jena API is used to call functions in rules.

[speakAbility:
    (?user uo:nativeLanguage ?language),
    (?user uo:isAbleTo ?talkAbility),
    (?talkAbility rdf:type muo:Talk),
    (?talkAbility uo:level ?talkLevel),
    greaterThan(?talkLevel, 0.5),
    uriConcat(?user, '_speak_', ?language, ?speakAbility)
    ->
    (?user uo:isAbleTo ?speakAbility),
    (?speakAbility rdf:type uo:Speak),
    (?speakAbility uo:language ?language),
    (?speakAbility uo:level ?talkLevel)
]

The next example is a filtering rule, which pre-filters contents according to the user’s age and the parental guidance associated with these contents:

[parentalGuidance:
    (?user uo:age ?userAge),
    (?contentDesc rdf:type tvao:BasicContentDescription),
    (?content cto:isDescribedBy ?contentDesc),
    (?contentDesc tvao:parentalGuidance ?parentalGuidance),
    (?parentalGuidance tvao:minimumAge ?minimumAge),
    lessThan(?userAge, ?minimumAge)
    ->
    (?user uo:wontBeInterestedIn ?content)
]

In this rule, a new fact is created using the uo:wontBeInterestedIn property, specifying that the content has to be removed from the list to process before the matchmaking phase. In this approach, the contents to be filtered are annotated first and then removed all together in one step. Another possible method would be to use the reactive rule principle by calling a built-in function (see [11]). This option has been set aside since it is more time consuming.
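The annotate-then-remove strategy can be sketched outside Jena as two separate steps. The Python function `parental_guidance` only mirrors the logic of the [parentalGuidance] rule; the dictionary-based content descriptions and the `minimum_age` key are hypothetical stand-ins for the TV-A triples.

```python
from typing import Any, Callable, Dict, List, Set

User = Dict[str, Any]
Desc = Dict[str, Any]
FilterRule = Callable[[User, Desc], bool]


def parental_guidance(user: User, desc: Desc) -> bool:
    """Mirrors [parentalGuidance]: the user is younger than the minimum age."""
    minimum_age = desc.get("minimum_age")
    return minimum_age is not None and user["age"] < minimum_age


def annotate(user: User, descs: Dict[str, Desc], rules: List[FilterRule]) -> Set[str]:
    """Step 1: mark the contents the user 'won't be interested in' (no removal yet)."""
    return {cid for cid, d in descs.items() if any(rule(user, d) for rule in rules)}


def prefilter(descs: Dict[str, Desc], annotated: Set[str]) -> Dict[str, Desc]:
    """Step 2: remove all annotated contents together, before matchmaking."""
    return {cid: d for cid, d in descs.items() if cid not in annotated}
```

Keeping the two steps apart means any number of filtering rules can contribute annotations, while the list of contents is rebuilt only once.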

To illustrate the last type, the following rule creates an interest in the “music” category when the user has indicated in his profile that “musician” is one of his leisure activities:


[musician:
    (?musician rdf:type uo:Musician),
    (?musician uo:isLeisureActivityOf ?user),
    uriConcat(?user, '_music_interest_generated', ?musicInterest)
    ->
    (?user uo:isInterestedIn ?musicInterest),
    (?musicInterest rdf:type uo:Interest),
    (?musicInterest uo:hasSubject muo:music),
    (?musicInterest uo:interestLevel 0.5)
]

In this case, an initial interest level is attributed to the new interest. It is then updated by the profiling service depending on the user’s consumption.

All these rules are exploited during several phases of the recommendation process, as shown in Fig. 9.

Fig. 9 Recommendation Process

First, inference rules are applied to the user profile to deduce a set of facts that depend only on his profile. This phase is performed only when the user has modified his profile or when the ontologies have changed. When the user asks for a recommendation, a new inference is launched on a set of contents to process, based on filtering and interest creation rules. This phase generates a set of pre-filtered contents, as well as a set of interests extending the user profile. In a last phase, matching levels are computed in order to provide the resulting recommendations.

5.2 Matchmaking Approach

The second step in the recommendation process consists in performing matchmaking between the user’s (non-)interests and content descriptions. For an interest $I$ of a user $U$, we express the matching $M_I \in [0, 1]$ between $I$ and a content $C$ as:

$$M_I(I,C) = \alpha(I) \times M_{ctx}\big(ctx(I), ctx(t)\big) \times \frac{1}{nb(S(I))} \sum_{i=1}^{nb(S(I))} M_{SI}\big(S(I)_i, C\big) \tag{5.1}$$

where the interest level of I for U is written α(I) ∈ [0, 1]. Mctx ∈ [0, 1] is a matching function for contexts, comparing the validity context of the interest I, ctx(I), with ctx(t), the context in which U is located at a given time t. α(I) is specified in an instance of the uo:Interest class with the interestLevel property. Mctx can be a binary function or take context proximity into account, depending on the application domain; it is currently not used in the mobile TV application. S(I) denotes a subject of I, as defined in the user ontology UO. As all subjects are considered equally important, MI is calculated as the average of the matchings MSI(S(I)i, C) ∈ [0, 1]. According to UO, an interest subject can be either a category S(I) = catI, a content S(I) = contI, or any resource S(I) = resI in the RDF sense.

40 Y. Naudet et al.
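As a sketch of how Eq. (5.1) combines its three factors, the Python function below assumes that the context matching Mctx and the per-subject matchings MSI(S(I)i, C) have already been computed and are passed in as plain numbers. The function name and the choice to return 0 for an interest without any subject are illustrative assumptions, not part of the described system.

```python
from typing import Sequence


def interest_matching(alpha: float, m_ctx: float, subject_matches: Sequence[float]) -> float:
    """M_I(I, C) = alpha(I) * M_ctx * mean of the M_SI(S(I)_i, C) values, Eq. (5.1)."""
    if not (0.0 <= alpha <= 1.0 and 0.0 <= m_ctx <= 1.0):
        raise ValueError("alpha and m_ctx must lie in [0, 1]")
    if not subject_matches:
        return 0.0  # assumption: an interest without subjects matches nothing
    # All subjects have equal importance, hence a plain average.
    return alpha * m_ctx * sum(subject_matches) / len(subject_matches)


# Mobile TV case: context matching is unused, modelled here as m_ctx = 1.0.
print(interest_matching(0.5, 1.0, [0.8, 0.4]))
```

With a binary Mctx, an interest whose validity context does not hold at time t simply contributes 0.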

5.2.1 Categories Matching

For categories, Mcat = MSI(catI, C) is a function of catI and of all the categories cat(C) of C. Let mcat = [mcat1, ..., mcatn] be a vectorial function whose elements are the individual matchings mcati = mcat(catI, cat(C)i) between catI and a category cat(C)i of C, n being their total number. We can write Mcat = f(mcat). Different parameters influence the calculation of Mcat: the maximum individual matching mcati, the mean and variance of mcat, and the number of categories n. We have chosen to consider the maximum, increased by a delta proportional to the mean, which gives coherent results according to our tests. We call this function maxM, which we will reuse several times, defined for a vector x of dimension n as:

$$\mathrm{maxM}(\mathbf{x}) = x_{max} + \frac{1 - x_{max}}{n - 1} \sum_{i=1,\; x_i \neq x_{max}}^{n} x_i \tag{5.2}$$

where $x_{max} = \max_{1 \le i \le n} x_i$. For category matching, we have:

$$M_{cat} = \mu_{cat} \times \mathrm{maxM}(\mathbf{m}_{cat}) \tag{5.3}$$

with mcati = ϕi × Sim(catI, cat(C)i). μcat is a tuning coefficient used to weight the importance of category matching in the global matching calculation. According to the content ontology, a content can be linked to a category in two ways: for its main category, ϕ = 1, while for partial categories ϕ is the value specified by the proportion property. Sim is a generic similarity function for categories, detailed below.
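A minimal Python sketch of maxM (Eq. 5.2) and of Mcat (Eq. 5.3) follows. It reads the condition x_i ≠ x_max literally, so every element tied with the maximum is left out of the sum; returning 0 for an empty vector and passing the similarity values Sim(catI, cat(C)i) in directly are assumptions made for illustration. The function names are hypothetical.

```python
from typing import Sequence


def max_m(x: Sequence[float]) -> float:
    """maxM(x) = x_max + (1 - x_max) / (n - 1) * sum of the x_i different from x_max."""
    if not x:
        return 0.0  # assumption: an empty vector gives no matching
    n = len(x)
    x_max = max(x)
    if n == 1:
        return x_max  # no other element, so the delta is zero
    rest = sum(v for v in x if v != x_max)
    # The delta never pushes the result above 1, since rest / (n - 1) <= x_max <= 1.
    return x_max + (1.0 - x_max) * rest / (n - 1)


def category_matching(mu_cat: float, phis: Sequence[float], sims: Sequence[float]) -> float:
    """M_cat = mu_cat * maxM(m_cat), with m_cat_i = phi_i * Sim(cat_I, cat(C)_i)."""
    m_cat = [phi * s for phi, s in zip(phis, sims)]
    return mu_cat * max_m(m_cat)


print(max_m([0.8, 0.5, 0.2]))                            # 0.8 plus 0.2 * mean(0.5, 0.2)
print(category_matching(1.0, [1.0, 0.3], [0.6, 1.0]))    # main category plus one partial category
```

Compared with a plain maximum, the extra term rewards contents whose other categories also match, without letting the score exceed 1.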

The matching between two categories is calculated based on the similarity of their direct super-categories and on the levels of these two categories in a category taxonomy. The matching function $sc(c_1, c_2, P^i_{c_1}, P^j_{c_2})$ computes the similarity between categories c1 and c2 considering one super-category of each, $P^i_{c_1}$ and $P^j_{c_2}$.

It is a recursive function defined as:

$$sc(c_1, c_2, P^i_{c_1}, P^j_{c_2}) = \lambda \times Sim(P^i_{c_1}, P^j_{c_2}) \times \min(lev_{c_1}, lev_{c_2}) \tag{5.4}$$

with $lev_{c_i}$ the level of $c_i$ (for instance, category "Sport" is of level 2, and category "Fighting sport" is of level 3), and $\lambda = \dfrac{max_{lev} - 1}{max_{lev}^2}$, with $max_{lev}$ the


maximum level a category can have, i.e. the depth of the category taxonomy. Let $\mathbf{M}_{sc}(c_1,c_2) = (msc_{i,j}) = \big(sc(c_1, c_2, P^i_{c_1}, P^j_{c_2})\big)$, $1 \le i \le m$, $1 \le j \le n$, be the matrix of functions sc for c1 and c2, m and n being the number of parents of c1 and c2 respectively. Let scmax(c1, c2) be the vector:

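$$\mathbf{scmax}(c_1,c_2) = \begin{cases} \left(\max_{1 \le j \le n} msc_{i,j}\right)_{1 \le i \le m} & \text{if } n \le m \\ \left(\max_{1 \le i \le m} msc_{i,j}\right)_{1 \le j \le n} & \text{otherwise} \end{cases} \tag{5.5}$$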

The similarity level between two categories c1 and c2 is given by $Sim(c_1,c_2) = \mathrm{maxM}\big(\mathbf{scmax}(c_1,c_2)\big)$, which finally gives:

$$m_{cat_i} = \varphi_i \times \mathrm{maxM}\big(\mathbf{scmax}(cat_I, cat(C)_i)\big) \tag{5.6}$$

the final matching value between catI and the categories of C then being obtained by computing Mcat as defined previously.
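The recursive category similarity can be sketched in Python as below, following the structure of Eqs. (5.4) to (5.6): a matrix of sc values over pairs of parents, a row-wise or column-wise maximum along the smaller dimension, then maxM. Several points are assumptions because the text does not state them: the base cases (Sim(c, c) = 1, and 0 for two distinct categories when either has no parent), the coefficient λ being passed in as a parameter (0.2 here is arbitrary), and the small taxonomy, whose root "Root" at level 1 is hypothetical. The helper names are illustrative.

```python
from typing import Dict, List, Sequence


def max_m(x: Sequence[float]) -> float:
    """maxM of Eq. (5.2): maximum plus (1 - maximum) times the mean of the other elements."""
    if not x:
        return 0.0
    x_max = max(x)
    if len(x) == 1:
        return x_max
    rest = sum(v for v in x if v != x_max)
    return x_max + (1.0 - x_max) * rest / (len(x) - 1)


def make_category_sim(parents: Dict[str, List[str]], level: Dict[str, int], lam: float):
    """Build a recursive similarity Sim(c1, c2) in the spirit of Eqs. (5.4) to (5.6)."""

    def sim(c1: str, c2: str) -> float:
        if c1 == c2:
            return 1.0  # assumed base case: identical categories
        p1, p2 = parents.get(c1, []), parents.get(c2, [])
        if not p1 or not p2:
            return 0.0  # assumed base case: no super-category left to compare
        lev = min(level[c1], level[c2])
        # msc[i][j] = sc(c1, c2, P^i_c1, P^j_c2) = lam * Sim(P^i_c1, P^j_c2) * min level
        msc = [[lam * sim(a, b) * lev for b in p2] for a in p1]
        m, n = len(p1), len(p2)
        if n <= m:  # one entry per parent of c1: its best match among c2's parents
            scmax = [max(row) for row in msc]
        else:  # one entry per parent of c2: its best match among c1's parents
            scmax = [max(msc[i][j] for i in range(m)) for j in range(n)]
        return max_m(scmax)

    return sim


# Hypothetical taxonomy; levels follow the text ("Sport" = 2, "Fighting sport" = 3).
parents = {"Sport": ["Root"], "FightingSport": ["Sport"], "BallSport": ["Sport"],
           "Boxing": ["FightingSport"], "Wrestling": ["FightingSport"], "Tennis": ["BallSport"]}
level = {"Root": 1, "Sport": 2, "FightingSport": 3, "BallSport": 3,
         "Boxing": 4, "Wrestling": 4, "Tennis": 4}
sim = make_category_sim(parents, level, lam=0.2)
print(sim("Boxing", "Wrestling"), sim("Boxing", "Tennis"))  # siblings score higher than cousins
```

The individual matching of Eq. (5.6) would then be ϕi multiplied by `sim(cat_I, cat_C_i)`. The choice of λ must keep the products within [0, 1]; the sketch does not clamp them.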

5.2.2 Resource and Content Matching

For resources, Mres = MSI(resI, C) is a function of resI and of all resources res(C) describing C. This similarity function represents proximity in the domain the resources belong to: how similar are two entities of the same kind? As content descriptions are all stored in the same knowledge base, the RDF triples concerning each content must first be retrieved in order to be compared with resI. We define resC = (res(C)i)1≤i≤n, the vector of resources describing C and relevant for the matchmaking, as the set of all resources r verifying:

$$\forall r \in \mathbf{res}_C:\; (r = c) \lor \exists\, t = (s,p,o) \in KB : (s \in \mathbf{res}_C) \land (o = r) \land (p \in DOM) \tag{5.7}$$

where t is an RDF triple having subject s, predicate p and object o, and belonging to the knowledge base KB containing the content descriptions; c denotes the instance of content C in KB, and DOM is the set of properties defined in the application domain (here, in our set of ontologies). The matching function for resources Mres ∈ [0, 1] is then defined as:
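One way to compute resC is to read Eq. (5.7) as the smallest set that contains c and is closed under properties of DOM, and to build it by a breadth-first traversal of the knowledge base. The Python sketch below does this under simplifying assumptions: the KB is a hypothetical list of (subject, predicate, object) string triples, and the `ex:` prefix and all identifiers are invented for the example.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def content_resources(c: str, kb: Iterable[Triple], dom: Set[str]) -> List[str]:
    """Smallest set containing c and every object reachable from it via DOM properties."""
    by_subject: Dict[str, List[str]] = {}
    for s, p, o in kb:
        if p in dom:  # only properties of the application domain are followed
            by_subject.setdefault(s, []).append(o)
    seen, order, queue = {c}, [c], deque([c])
    while queue:
        s = queue.popleft()
        for o in by_subject.get(s, []):
            if o not in seen:  # the seen set also protects against cycles in the graph
                seen.add(o)
                order.append(o)
                queue.append(o)
    return order


kb = [
    ("ex:MillionDollarBaby", "ex:hasDirector", "ex:ClintEastwood_Director"),
    ("ex:ClintEastwood_Director", "ex:isRoleOf", "ex:ClintEastwood"),
    ("ex:MillionDollarBaby", "ex:internalId", "ex:record42"),  # not in DOM, ignored
]
print(content_resources("ex:MillionDollarBaby", kb, {"ex:hasDirector", "ex:isRoleOf"}))
```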

$$M_{res} = \mu_{res} \times \max_{1 \le i \le n} m_{res}\big(res_I, res(C)_i\big) \tag{5.8}$$

where μres is a tuning coefficient defining the importance attached to resource matching, and mres ∈ [0, 1] is a recursive function computing the similarity between two RDF resources, defined as:


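$$m_{res}(res_1,res_2) = \begin{cases} 1 & \text{if } res_1 = res_2 \\ 0 & \text{if } res_1 \neq res_2 \land res_1 \in L \\ \dfrac{1}{\dim P(res_1)} \displaystyle\sum_{\substack{p \in P(res_1),\\ (res_1,p,o_1) \in KB \,\land\, (res_2,p,o_2) \in KB}} m_{res}(o_1,o_2) & \text{otherwise} \end{cases} \tag{5.9}$$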

where P(res) is the set of properties p ∈ DOM such that ∃(res, p, o) ∈ KB, and L is the set of literals. The "=" operator on RDF resources is defined as follows. If both resources are literals, the comparison is performed on values, including conversion if types are different. For instance, "5.0"^^<http://www.w3.org/2001/XMLSchema#double> will be equal to "5"^^<http://www.w3.org/2001/XMLSchema#string>. Otherwise, URIs are directly compared.
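The recursive resource similarity of Eq. (5.9) and the resource matching of Eq. (5.8) can be sketched in Python as follows. This is a simplified illustration, not the system's implementation: URIs are plain strings, literals are a small hypothetical `Literal` class, and value comparison of literals is approximated by converting both lexical forms with `float` when possible (a stand-in for full XSD datatype handling). Three further assumptions are labelled in the code: when a property has several values, the best-matching pair is taken so that mres stays in [0, 1]; a resource without any DOM property gets similarity 0; and a depth cut-off guards against cycles in the RDF graph.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """Hypothetical RDF literal: lexical form plus datatype (kept for reference only)."""
    lexical: str
    datatype: str = "xsd:string"


Resource = Union[str, Literal]  # a URI is modelled as a plain string
Triple = Tuple[str, str, Resource]


def rdf_equal(a: Resource, b: Resource) -> bool:
    """'=' on RDF resources: literals compared by value (numeric conversion), URIs directly."""
    if isinstance(a, Literal) and isinstance(b, Literal):
        try:
            return float(a.lexical) == float(b.lexical)
        except ValueError:
            return a.lexical == b.lexical
    return a == b


def make_m_res(kb: Iterable[Triple], dom: Set[str], max_depth: int = 5):
    """Build m_res over a knowledge base restricted to the properties in DOM."""
    props: Dict[Resource, Dict[str, List[Resource]]] = {}
    for s, p, o in kb:
        if p in dom:
            props.setdefault(s, {}).setdefault(p, []).append(o)

    def m_res(r1: Resource, r2: Resource, depth: int = 0) -> float:
        if rdf_equal(r1, r2):
            return 1.0
        if isinstance(r1, Literal) or depth >= max_depth:
            return 0.0  # differing literals; the depth cut-off is an assumption
        p1 = props.get(r1, {})
        if not p1:
            return 0.0  # assumption: dim P(res1) = 0 gives no similarity
        p2 = props.get(r2, {})
        total = 0.0
        for p, objs1 in p1.items():
            objs2 = p2.get(p, [])
            if objs2:  # assumption: best object pair per shared property
                total += max(m_res(o1, o2, depth + 1) for o1 in objs1 for o2 in objs2)
        return total / len(p1)

    return m_res


def resource_matching(mu_res: float, res_i: Resource, res_c: List[Resource], m_res) -> float:
    """M_res = mu_res * max_i m_res(res_I, res(C)_i)."""
    return mu_res * max((m_res(res_i, r) for r in res_c), default=0.0)


kb = [  # hypothetical descriptions
    ("ex:Bird", "ex:hasDirector", "ex:ClintEastwood"),
    ("ex:MillionDollarBaby", "ex:hasDirector", "ex:ClintEastwood"),
    ("ex:Bird", "ex:rating", Literal("5.0", "xsd:double")),
    ("ex:MillionDollarBaby", "ex:rating", Literal("4", "xsd:string")),
]
m_res = make_m_res(kb, dom={"ex:hasDirector", "ex:rating"})
print(m_res("ex:Bird", "ex:MillionDollarBaby"))  # shared director matches, ratings differ
```

Since Mcont is defined as Mres applied to contents, the same `resource_matching` call covers content-related interests when `res_i` is the content that is the subject of the interest.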

When the subject of interest is a content, the matchmaking comes to comparing two instances of the class cto:Content, which is achieved using the same approach as for resources. The function Mcont is written as a specialization of Mres:

$$M_{cont} = M_{SI}(cont_I, C) = M_{res}(cont_I, C) \tag{5.10}$$

contI being the content that is a subject of I. By default, the weighting coefficient is the same, μcont = μres, but it can be set independently.

5.2.3 Global Matching

For all contents having gone through the filtering step, the global matching value M(U,C) ∈ [0, 1] of a content C for a user U is computed over the set of all interests and non-interests expressed in the user profile, as:

$$M(U,C) = \Gamma\big(\mathbf{M_I}(U,C), \mathbf{M_{NI}}(U,C)\big) \tag{5.11}$$

where MI = [MI1, ..., MIl] and MNI = [MNI1, ..., MNIm] are vectors containing the matching values for interests and non-interests respectively, calculated according to the formula for MI(I,C) defined in the preceding sections. The function Γ is chosen according to the desired behavior of the recommender. A default choice will be proposed to the user, but he will always be able to modify this choice to better reflect his expectations. We have identified two borderline cases that we take as a basis to determine the function Γ: a) the number of non-interests (resp. interests) exceeds the number of interests (resp. non-interests), whereas the average matching level of interests (resp. non-interests) is greater; b) the average matching levels of interests and non-interests are equivalent, leading to a global matching level near zero.


If we consider that the average matching level must prevail over the number of interests (resp. non-interests), we may use the following function:

$$\Gamma_1 = \frac{1}{l} \sum_{i=1}^{l} M_{I_i}(U,C) - \frac{1}{m} \sum_{j=1}^{m} M_{NI_j}(U,C) \tag{5.12}$$

On the other hand, if we consider that the number of interests is important, the following function is more appropriate:

$$\Gamma_2 = \frac{1}{l + m} \left( \sum_{i=1}^{l} M_{I_i}(U,C) - \sum_{j=1}^{m} M_{NI_j}(U,C) \right) \tag{5.13}$$

The choice between Γ1 and Γ2 determines the behavior of the recommender in the first case. If matching levels of zero should be avoided when interests and non-interests have been defined (case b), Γ2 may be used. In order to obtain a more representative result, we may also give priority to the interest or non-interest having the maximum matching level, which leads to the following function:

$$\Gamma_3 = \frac{1}{\beta + \gamma} \big( \beta \times d_{max}(U,C) + \gamma \times \Gamma_2(U,C) \big) \tag{5.14}$$

where $d_{max}(U,C) = \max \mathbf{M_I}(U,C) - \max \mathbf{M_{NI}}(U,C)$, and β and γ are two empirically chosen coefficients.
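The three aggregation functions can be sketched in Python as follows, assuming Γ3 is the β/γ-weighted average of dmax and Γ2. The text does not report the values of β and γ, so the equal weights used below are arbitrary, as are the matching values; returning 0 for empty vectors is also an assumption.

```python
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0  # assumption: an empty vector contributes 0


def gamma1(mi: Sequence[float], mni: Sequence[float]) -> float:
    """Eq. (5.12): mean interest matching minus mean non-interest matching."""
    return _mean(mi) - _mean(mni)


def gamma2(mi: Sequence[float], mni: Sequence[float]) -> float:
    """Eq. (5.13): difference of the sums, normalised by the total count l + m."""
    total = len(mi) + len(mni)
    return (sum(mi) - sum(mni)) / total if total else 0.0


def gamma3(mi: Sequence[float], mni: Sequence[float], beta: float, gamma: float) -> float:
    """Eq. (5.14): weighted combination of d_max and Gamma_2."""
    d_max = (max(mi) if mi else 0.0) - (max(mni) if mni else 0.0)
    return (beta * d_max + gamma * gamma2(mi, mni)) / (beta + gamma)


mi, mni = [0.6, 0.2], [0.4]  # hypothetical matching values
print(gamma1(mi, mni), gamma2(mi, mni), gamma3(mi, mni, beta=1.0, gamma=1.0))
```

In this example the interest and non-interest averages balance, so Γ1 is about zero (case b), while Γ2 stays positive because there are more interests than non-interests, and Γ3 is raised further by the strongest interest.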

6 Experimentation

The integrated system has been successfully demonstrated during the final review of the MOVIES project. We present in this section two scenarios that have been tested, and finally discuss the global efficiency of the approach.

6.1 Illustrative Examples

Different use cases have been tested and have helped enhance the ontologies as well as the whole profiling and recommendation process. One of them is the case of Billy (Fig. 10), a 20-year-old boy mainly interested in adventure and action films; he also has other, less important interests (e.g. rock).

Each time he watches a new movie, his profile evolves accordingly. Fig. 10 shows the profile evolution from the beginning of the observation (t=n). In our scenario, Billy first watches (t=n+1) the James Bond film "Die Another Day", characterized by the action, adventure and automobile racing categories; then (t=n+2) "The Fast and the Furious", characterized by the crime, action and automobile racing categories. These two consumptions reinforce the action category, which was already present in the profile. Additionally, the repeated consumption of the automobile racing category brings this new interest into Billy's profile.


[Figure: "Profile evolution (selected categories)". Interest level (0 to 1) against consumption history (n, n+1, n+2) for Action (0,75; 0,81; 0,83), Adventure (0,75; 0,81; 0,81), Rock (0,08 at each step), AutoRacing and Crime.]

Fig. 10 Use Case 1.

When Billy opens the ESG on his mobile phone and asks for a recommendation, the system suggests a motor-sport documentary: a report on stock cars that is currently playing on one of the channels…

Another use case shows a more elaborate situation for the recommendation service. According to his profile data, John is a fan of wrestling and jazz, and dislikes westerns. As a jazz fan, he particularly appreciates Charlie Parker. Strangely, he also likes films directed by Clint Eastwood, except westerns. The modelling of these interests is partly illustrated in Fig. 11.

To better illustrate the behavior of the recommender, we consider a video-on-demand case, where many videos are available. However, the behavior and results would have been similar for the same contents proposed in an ESG. In the content database, two westerns are present among others: "Unforgiven" and "Il buono, il brutto, il cattivo". Apart from the language, which would have been a discriminating factor, these are given a low matching value because of the user's non-interest. For the second one, Clint Eastwood is an actor in the film, not its director, so the movie is still considered irrelevant. The interesting case is that of "Million Dollar Baby", which is directed by Clint Eastwood and is about boxing. The first property directly corresponds to one of the user's interests. The second corresponds to a category which is close to wrestling in our category taxonomy: they have the same parent category (fightingSport). Hence, the movie's matching score is increased. Another interesting case is that of "Bird", which is related to jazz (through Charlie Parker) and is also directed by Clint Eastwood. Because it corresponds to multiple interests of the user, this movie is logically rated better than the others. In this scenario, we have demonstrated the use of interests and non-interests, the matching for categories (Jazz, Wrestling and Western) and the matching for resources (Charly Parker_Jazzman and Clint Eastwood_Director). Table 1 shows the results obtained with the different functions Γ presented in the previous section. The results are coherent with the user's interests. The case of "Unforgiven" illustrates the difference between the three functions. Indeed, it is a western directed by Clint Eastwood, which corresponds to both an interest and a non-interest in John's profile. In this case, Γ1 gives a value near 0, as the total weight of the considered interests and that of the non-interests balance each other; Γ2 gives a positive value because the number of interests is larger (the movie is also categorized as "drama"); and Γ3 gives a higher value as it considers both the number and the maximal matching values.

Fig. 11 Use Case 2.

Table 1 Sample matching values obtained with the different Γ functions.

Content                            Γ1      Γ2      Γ3
Million Dollar Baby                0,49    0,32    0,61
Letters from Iwo Jima              0,35    0,23    0,58
Bird                               0,35    0,23    0,58
Mystic River                       0,35    0,23    0,58
Unforgiven                        -0,05    0,1     0,25
Il buono, il brutto, il cattivo   -0,45   -0,22   -0,39

Finally, if the user chooses to follow the system's recommendation and consumes the first movie, "Million Dollar Baby", this will reinforce the concept Clint Eastwood Director and create new concepts such as boxing, Clint Eastwood Actor, etc.

6.2 Efficiency

The efficiency of the recommendations has so far been measured for interests targeting categories, on a test set made up of contents listed in IMDB, completed by fake content descriptions to obtain a good balance between movies and other kinds of content. In this case, the obtained precision/recall values are almost perfect. Some false negatives appear when the same subcategory appears in different, independent category hierarchies. At the level of categories, the relevance of filtering depends only on the structure of the category taxonomy, which must therefore be carefully designed. The influence of multiple and potentially conflicting interests is hard to quantify and has not been measured yet. Finally, since we do not currently consider approximate matchmaking, matching with content- or resource-related interests, as well as with contextual data, relies solely on the presence of corresponding statements in the content description and is thus linked only to the efficiency of the inference.

To validate the efficiency of the profiling service (i.e. the relevance of learned user interests), a more sophisticated usage database is needed, comprising a measurement of the user's consumption intensity. Such a study is underway with consumption data provided by BARB⁶ for TV consumers.

7 Conclusion and Perspectives

We have been able to empirically demonstrate the potential of a complete profiling and recommendation system based on incremental profile learning and semantic web technologies for mobile TV. This system has been tested both for VoD and for broadcast audiovisual content, on devices able to use 3G, WiFi and DVB-H. The implementation is modular and web-based, thus avoiding some limitations inherent to mobile devices.

The recommendation system we have tested is at a stage where it can still be enhanced. Results are very promising for user profiles concerning categories, and also for profiles concerning any ontology concept describing content or any other entity. The system is currently being extended in a research project to other application cases, exploiting, e.g., additional linked content, targeted advertisements, or communities of users. In further research work, we intend to exploit the context-awareness capabilities of our ontologies, which we did not discuss here. This will be particularly important in mobile environments.

6 Broadcasters Audience Research Board Ltd.

From the profiling service perspective, the current solution can be extended along several dimensions: elaboration of adaptive decay functions taking into account consumption patterns for each content category; learning of non-interests and their integration within the incremental profiling process; multi-scale profile evolution approaches differentiating long-term and short-term interests; extension of the current approach to take into account the statistical correlations between different semantic concepts; etc. Finally, a major next step currently under study is the learning of user community profiles to enable peer-to-peer content sharing or community-based applications on mobile devices.

References

[1] Aghasaryan, A., Betgé-Brezetz, S., Senot, C., Toms, Y.: A Profiling Engine for Converged Service Delivery Platforms. Bell Labs Technical Journal, Summer 2008 issue on Applications and their Enablers in a Converged Communications World 13(2) (2008)

[2] Aroyo, L., Bellekens, P., Björkman, M., Houben, G.-J.: Semantic-based framework for personalized ambient media. Multimedia Tools and Applications 36(1-2), 71–87 (2008)

[3] Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American Magazine (2001)

[4] Buriano, L., Marchetti, M., Carmagnola, F., Cena, F., Gena, C., Torre, I.: The Role of Ontologies in Context-Aware Recommender Systems. In: Proc. of 7th International Conference on Mobile Data Management (MDM 2006), May 10-12, p. 80 (2006)

[5] Cantador, I., Fernández, M., Vallet, D., Castells, P.: A Multi-Purpose Ontology-Based Approach for Personalized Content Filtering and Retrieval. In: Advances in Semantic Media Adaptation and Personalization. Studies in Computational Intelligence series, vol. 93. Springer, Heidelberg (2008)

[6] DVB-CBMS A099, IP Datacast over DVB-H: Electronic Service Guide (ESG) (November 2005)

[7] Forgy, C.L.: Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence 19(1), 17–37 (1982)

[8] Germanakos, P., Mourlas, C.: Adaptation and Personalization of Web-based Multimedia Content. In: Proc. of the Workshop on Personalization for e-Health of the 10th International Conference on User Modeling (UM 2005), Edinburgh, July 29, pp. 67–70 (2005)

[9] Heckmann, D., Schwartz, T., Brandherm, B., Schmitz, M., von Wilamowitz-Moellendorff, M.: GUMO - the general user model ontology. In: Ardissono, L., Brna, P., Mitrović, A. (eds.) UM 2005. LNCS (LNAI), vol. 3538, pp. 428–432. Springer, Heidelberg (2005)

[10] Krunoslav, T., Alisa, D., Gordan, J., Mario, K., Sasa, D.: Semantic Matchmaking of Advanced Personalized Mobile Services using Intelligent Agents. In: 12th Conference on Software, Telecommunications and Computer Networks SoftCOM (2004)


[11] Li, L., Horrocks, I.: A software framework for matchmaking based on semantic web technology. In: Proceedings of the Twelfth International World Wide Web Conference (2003)

[12] Mignon, S., Groues, V., Naudet, Y.: Advanced Personalisation by Ontologies: Audiovisual Content Filtering on Mobile Devices. In: Proc. of JFO 2008, Lyon, France, December 1-3 (2008)

[13] Naudet, Y., Aghasaryan, A., Toms, Y., Senot, C.: An Ontology-based Profiling and Recommending System for Mobile TV. In: Proc. of the 3rd International Workshop on Semantic Media Adaptation and Personalization (SMAP 2008), Prague, Czech Republic, December 15-16, pp. 94–99. IEEE Computer Society Publishers, Los Alamitos (2008)

[14] Pampapathi, R., Mirkin, B., Levene, M.: A Review of the Technologies and Methods in Profiling and Profile Classification. EPALS (2005)

[15] Senot, Y.T., Aghasaryan, A., Betgé-Brezetz, S.: Multi-Platform User and Usage Profiling Demonstration: Video-on-Demand Services in IPTV and Mobile Video environments. Demonstration Paper at User Modeling 2007, Athens, Greece (2007), http://www.iit.demokritos.gr/um2007/UM2007-Demos-Leaflet.pdf

[16] Tsinaraki, C., Christodoulakis, S.: A multimedia user preference model that supports semantics and its application to MPEG 7/21. In: 12th Int. Conf. on Multi Media Modeling (MMM 2006), Beijing, China, January 4-6 (2006)

[17] European Parliament and Council Directive 95/46/EC of 24 October 1995 on the Protection of Individuals with regard to the Processing of Personal Data and on the Free Movement of such Data. Official Journal L 281 of 23.11.1995, http://ec.europa.eu/justice_home/fsj/privacy/docs/95-46-ce/dir1995-46_part1_en.pdf

[18] The official site of Eureka Celtic initiative, Movies project, http://www.celtic-initiative.org/Projects/MOVIES/


The USHER System to Generate Semantic Personalised Maps for Travellers

Zekeng Liang, Kraisak Kesorn, and Stefan Poslad

Abstract. Map applications based upon Geospatial Information Systems (GIS) are seen as a key application area for mobile users, e.g., to enable travellers and mobile assets to be located and tracked with respect to spatial views, or maps, of destinations and routes. However, current GIS map services tend to lack support for personalisation: to enable users to set preferences based on their context and user profiles; to customise searching and selecting content; and to mark up maps in situ, forming a personalised spatial memory. For example, current services cannot store spatial short-cuts, good parking spaces, etc., which have been discovered in situ, in the physical world. These GIS map services also tend to lack provision for such tagged personal spaces to be used within shared social spaces, i.e., to share spatial memories. An ongoing spatial-aware framework called USHER (U-commerce Services HEre for Roamers) has been extended to semantically adapt and personalise maps, and has been tested. The contributions of this framework are: an ontology-based representation of dynamic user preferences interlinked to a domain model that is able to detect shifts in user interests; the creation of sharable user markup data governed by an access control matrix; and the generation of personalised annotated GIS maps.

1 Introduction

Spatial-Aware Map Services (SAMS) enable business, everyday and leisure travellers to relate a location to a spatial context and to a spatial view of that context. Spatial contexts include specific services, buildings and persons at a location, and a location in relation to other locations, e.g., destinations, routes and regions. Spatial contexts are defined with respect to a specific spatial view of a physical environment (the map), which is normally a direct 'overhead' view of a region. Typical components of SAMS are: wireless smart mobile devices, which enable travellers to seamlessly access spatial information services anytime and anywhere; maps, which can be pre-cached or accessed on demand; location sensors such as the satellite-based Global Positioning System (GPS); and a Geographic Information System (GIS). A GIS defines and organises spatial objects at varying layers of spatial abstraction, enabling GIS applications to query and select spatial objects, then to build customised spatial views that relate to particular applications and user tasks.

Zekeng Liang, Kraisak Kesorn, and Stefan Poslad
School of Electronic Engineering and Computer Science, Queen Mary, University of London, UK
e-mail: zekeng.liang,[email protected]
[email protected]

50 Z. Liang, K. Kesorn, and S. Poslad

SAMS applications such as vehicle SatNav systems offer maps that are location-aware, i.e., centred around the current location and showing locations with reference to routes and to a destination. Although these SAMS are location-aware, they offer only very limited user-awareness, e.g., preferences for route constraints such as fastest route, avoiding main roads, etc. SAMS that are not user-aware must either provide lowest-common-denominator spatial contexts and views, e.g., positions along road routes, or combine many spatial views, e.g., positions along roads in relation to main tourist sights, business-driven building annotation, etc. Combining views crowds in too much information, much of which is unneeded, which is a particular problem for low-resource devices, while the lowest-common-denominator approach may omit useful content. In contrast, user-context-aware SAMS can adapt maps to the traveller, e.g., content about footbridges for crossing main roads can be included for pedestrians but excluded for motorists. In practice, context-aware systems tend to orient system outputs to the current context, e.g., the mode of travel or the current travel activities. More advanced user-context-aware systems may relate the current context to a goal context [8]. In addition, these leverage the history of past user contexts in order to (partially) predict future (including goal) contexts [8].

User (context) awareness is often taken to be synonymous with personalisation. Personalisation focuses on adapting system outputs to the particular interests of individual users [11], e.g., preferences for visiting different types and instances of buildings and for travelling specific routes. A user model or profile generally refers to either or both user context and user preferences. User contexts and preferences, as well as spatial contexts, can be complex, multi-valued, heterogeneous, dynamic and contradictory [8]. Increasingly, semantic representations such as ontologies are used to represent these because of their higher expressivity and precision [11] [14] [22].

SAMS can filter and adapt spatial views to user contexts and preferences, e.g., specific travellers may be interested in specific types of building by architecture or by function. Travellers may also prefer to customise the presentation of content, e.g., to include both local names of services and any translations of these names into their home language, in order to make content more understandable to them. Other preferences may be used to qualify selections or service recommendations derived from the set of all possible services.

Travellers often wish to create and store their own customised spatial contexts as map annotations, e.g., good or bad routes to a particular destination, or good or bad vehicle parking areas, which they directly experienced in the field. Users may wish to reuse these spatial experiences when they revisit an area. Users may also wish to share the information that they create with others and to share relevant information created by others. However, as the amount of such shared information increases, searching becomes harder. Furthermore, context awareness, and in particular personalisation and location-awareness, generates a raft of privacy concerns [20]. If privacy issues within a context-aware service environment such as Location-Aware Services (LAS) are not properly addressed, users risk revealing their context, publicising their personal details and even compromising their safety at the current location, at the intended destination and en route. There is often a legal requirement in many countries to protect the privacy of mobile users' information. If users perceive that the risks of using a technology outweigh its potential benefits, they may stop using that technology. The issue of keeping personalised, mobile LAS private has to be addressed in order to exploit their full market potential.

The objectives of this research are to model and develop a system based on semantic geospatial services that adapts spatial content for mobile users to their tasks and preferences, and that allows users to create, manage and share their own markup. The rest of the paper is organised as follows. Section 2 surveys related work, including personalised map services and semantic user profiling techniques. Section 3 describes how the personalised and user-aware SAMS applications, as part of the USHER [7] system, are designed and implemented. Some results of the semantic-based personalised SAMS application demonstrator are presented in Section 4. Finally, a discussion and the conclusions of this research are given in Sections 5 and 6 respectively.

2 Related Work

Personalisation has been proposed as a means to reduce information overload for over two decades [12]. The main motivation for personalisation for travellers is that it can act as a filter in addition to location when retrieving information, reducing information overload because it filters information according to a specific user profile rather than for all users. Personalisation is an added advantage on the lower-resource service access devices used by travellers, as information overload can also overload a person's mobile device. In this section, existing personalised SAMS applications, user context and personal profile acquisition techniques, and user profiling techniques are surveyed and analysed in order to identify their best practices and limitations.

OpenStreetMap [5] allows registered users to create user diaries with location information in a simple form, which includes a diary topic, the content, the author and the creation date. This system allows users to set their profile description and home location. Freebase [2] provides a strong semantic structure for registered users to create their own types of data that can be shared on the Freebase web. However, it does not provide specific support for mobile users or for sharing and overlaying the marked-up data on a map. The GUIDE project [1] supports non-semantic direct input of user preferences. The system mainly uses the user's location as user context to retrieve location-related information. Individual users cannot filter the map data based on their own preferences or their individual tasks. In the CRUMPET project [7], personal profiles are specified by combining persona models with direct and indirect input by the user, such as observations of where and what users choose to visit, but these models are not semantic. The AmbieSense project [3] [4] situates each user task within a use case using case-based reasoning and combines this with location-awareness in order to make user recommendations. RECO [6] is similar to AmbieSense but, instead of using case-based reasoning, it situates each user task within a sequence, by learning a user's preferences over time, in order to make user recommendations.

Before information services such as SAMS can be personalised, travellers' context and profile need to be acquired. An important distinction is whether the user context or profile is input directly, gathered indirectly through user interaction, or acquired by a hybrid of the two. Probably most personalisation systems that can apply to SAMS include at least a basic element of directly gathered user input. Any information entered by the user into SAMS, such as destinations and route preferences, can be used to profile users. Information entered into other external applications, such as a calendar, could also be input into a person modeller.

Gauch et al. [14] proposed a method that automatically creates user profiles from Web-based information retrieval searches. First, a reference ontology is created automatically by spidering any of a number of online subject hierarchies. Second, the Web pages linked within each subject are spidered and used as training data for a text classifier. Liu [5] proposed two steps to improve retrieval effectiveness. First, the system automatically deduces, for each user, a small set of categories for each query submitted by the user, based on his/her search history. Second, the system uses this set of categories to augment the query used to conduct the web search. The framework relies on the user's usage history; it lacks the capability to adapt flexibly to a user's changes in interests and ignores short-term interests. Xu and colleagues [16] [17] proposed a novel framework for semantic annotation and personalised information retrieval. However, user preferences acquired from users' queries may be ambiguous, causing irrelevant results to be returned to the user.

Alternatively, multiple sensors can be used to acquire a user's context indirectly, but this has its own challenges: determining what can be sensed unobtrusively, and dealing with false positives where the indirectly derived user context is incorrect.

There are two main ways to sense users' context: smart devices that travellers carry around with them can acquire the user context information, smart environments can be instrumented with sensors to sense and acquire it, or both. Instrumented environments have several main disadvantages: they can be expensive to create and maintain, may be far from pervasive, and may invade travellers' privacy by collecting information about them and passing it to third parties. Location sensors in mobile devices can be used to generate user contexts. If, for instance, a user visits a number of old churches, then he is probably interested in churches and perhaps also in other historic buildings in this town, like an old city hall [7]. The rate of movement of users could be used to indicate the mode of transport. Smart phones increasingly incorporate micro sensors such as accelerometers and gyroscopes to support 3D user gestures, and these can also be exploited to provide information about user contexts. By incorporating temporal information, Widyantoro and colleagues [13] presented a novel scheme to represent a user's interest categories, and an adaptive algorithm to learn the dynamics of the user's interests through positive and negative feedback, which is the main novelty of this framework. However, user preferences are stored in metrics. Thus, no semantic relationships between concepts are stored.

To summarise, the surveyed work tends not to construct rich user models based on user and spatial contexts, user preferences, user annotations, user goals and tasks in order to adapt map content to individual users. There are several challenges with directly using user inputs. Inputs may only be gathered very intermittently, often only before travelling or when the user is stationary en route. If travellers' contexts are dynamic and rich, the system needs to ask or monitor the user more frequently for detailed input about their context, and this in itself can be obtrusive and can overload the user [8]. The majority of research and applications tend to focus on sensing specific, isolated actions rather than on building persistent user models and applying these to travellers. Hence, a hybrid approach to user modelling, combining direct and indirect user input, is needed in practice.

In addition, the surveyed systems do not provide a capability for travellers to create and share knowledge by creating and sharing spatial markup information based on semantic modelling. They lack representations of the user's profile at an appropriate level of detail of user interests, and their methods lack support for the terminology heterogeneity problem, e.g., terms in a user's interests may not appear in existing ontologies. Addressing these gaps is the main objective of this research.

3 Semantic Based Personalised SAMS

The architecture of the semantic-based personalised application uses an extension of the CRUMPET system called USHER [7], based upon a three-tier client-server architecture that consists of the client access device, a combined client proxy and mediator, and the service provider. The map server is implemented on a spatial extension of MySQL to store and retrieve spatial data. The client calls GeoTools¹ map API-based middleware that supports advanced interactive map services via a client proxy, which masks some of the complexity of map retrieval and adaptation from the client device. The map demonstrator uses GIS content based on the Queen Mary, University of London (QMUL) Mile End campus and surrounding areas. These spatial services are described in the following sections.

3.1 System Architecture Overview

The system framework in Fig. 1 shows the main components and their data flows within the system. The User Model (of Traveller) component receives a number of inputs: user goals and tasks, e.g., attending a meeting at Queen Mary, University of London; user preferences, such as interests in specific types of building by architecture, either directly from user input or indirectly generated from the user queries by a User Preferences Acquisition component; user annotation, e.g., markup data created by an individual user in the field that can be shared with others; and current user contexts. From these inputs it constructs the user model for individual mobile users. Current User Contexts are generated by the User Context Acquisition component from user events, such as user information queries, and by sensing the users' spatial-temporal context and a history of context changes. The Spatial Temporal Context Acquisition component acquires environment context from environment events, such as the user's location, by gathering GPS data, and the user's movement, by measuring three-axis acceleration data from the user's mobile device. The Context Processing component supports mediation of multiple heterogeneous contexts and generates the context used to adapt specific applications. The Context Management component handles context storage, retrieval and access control. The Personalised SAMS Application component takes the processed contexts as input to generate and deliver the personalised location-aware map to the mobile user.

1 GeoTools: The Open Source Java Toolkit, See http://geotools.codehaus.org/

Fig. 1 A personalised SAMS application framework

3.2 Traveller (User) Context Acquisition and Modelling

The core part of the traveller personal context ontology model is described in Fig. 2. The three traveller instances, i.e. the user stereotypes, are defined as follows:

1. Tourist: someone who is new to a place and wants to know more about the area.
2. Business Man: someone who is new to the area and just wants to know the essential map information needed to finish their work.
3. Regular user: someone who is familiar with the area and knows most of the basic map information about it. What they really want to know is something new in the area that they do not yet know.



Fig. 2 Traveller personal context ontology model

The Traveller ontology consists of the following main properties: Travel Mode, Travel Activity, Travel Goal, Destination, User Markup and User Preferences/Interests. Travel Mode defines how the user is travelling when using this map application, with the instances walking, driving, cycling and jogging. The user's travel mode can be classified from their movements, detected by measuring three-axis acceleration data received from the user's mobile device. Travel Activity has the properties Posture and Repeat, and the instances Transit and Stop. Posture defines the posture the user might be in, such as walking, sitting or standing. Repeat defines whether the user activity is periodic or not. Experiments to indirectly determine travel activity and posture have been carried out in other research, which will be reported elsewhere, and also by other projects such as [27]. Travel Goal defines the purpose of travelling, such as seeing something, meeting someone, delivering something or someone, or collecting something or someone. Destination defines a traveller's destination, which contains the property Type, among others. User Markup/Annotation defines how individual knowledge can be stored and presented in the map; markup is sharable among users depending on the access restrictions set by the markup owner. User Markup/Annotation has the property Type, which classifies markup points into different types of place. User Preferences/Interests define individual user interests.



Each of these concepts can be expanded further at a finer level of granularity. For example, the Destination concept can be expanded as follows. The Type property contains further properties such as Eating Place and Sight. Eating Place indicates places for food, with types such as pub, café and restaurant. Sight defines sightseeing places such as museums, parks and others. Destination has the instances Single-shot and Multi-shot, representing a single place and multiple places involved in the destination, respectively.

By defining the traveller personal context ontology model and gathering such user contexts, the system is better able to understand the user's goal, travel mode, destination and preferences in order to generate a more personalised location-aware map for the mobile user.
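As a rough illustration of how these concepts fit together, the following Python sketch encodes part of the traveller model as plain data classes rather than as an ontology. The class, field and function names (`Traveller`, `Destination`, `parent_type`, etc.) are our own hypothetical choices, not identifiers from the system; only the values (stereotypes, travel modes, destination types, Single-shot/Multi-shot) come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set


class Stereotype(Enum):
    TOURIST = "Tourist"
    BUSINESS_MAN = "Business Man"
    REGULAR = "Regular"


class TravelMode(Enum):
    WALKING = "walking"
    DRIVING = "driving"
    CYCLING = "cycling"
    JOGGING = "jogging"


# Destination Type hierarchy (Is-A): parent type -> subtypes, as listed in the text.
DESTINATION_TYPES: Dict[str, Set[str]] = {
    "Eating Place": {"Pub", "Cafe", "Restaurant"},
    "Sight": {"Museum", "Park"},
}


def parent_type(subtype: str) -> Optional[str]:
    """Return the Type (e.g. 'Eating Place') that a subtype belongs to, if any."""
    for parent, children in DESTINATION_TYPES.items():
        if subtype in children:
            return parent
    return None


@dataclass
class Destination:
    places: List[str]   # names of the places involved in the destination
    place_type: str     # a subtype such as "Pub" or "Museum"

    @property
    def shot(self) -> str:
        # Single-shot: one place; Multi-shot: several places.
        return "Single-shot" if len(self.places) == 1 else "Multi-shot"


@dataclass
class Traveller:
    stereotype: Stereotype
    travel_mode: TravelMode
    travel_goal: str                            # e.g. "See something"
    destination: Optional[Destination] = None
    interests: List[str] = field(default_factory=list)
    markup_ids: List[str] = field(default_factory=list)
    # Travel Activity (Posture, Repeat; Transit/Stop) is omitted for brevity.


traveller_a = Traveller(Stereotype.TOURIST, TravelMode.WALKING, "See something",
                        Destination(["Good pub"], "Pub"))
print(traveller_a.destination.shot, parent_type(traveller_a.destination.place_type))
# prints: Single-shot Eating Place
```

Keeping the Type hierarchy as explicit parent-to-children data makes the Is-A lookup a simple traversal, which is the kind of query an ontology reasoner would answer in the real system.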

3.3 Ontological-Based Traveller (User) Preference Representation

A hybrid method is used for acquiring knowledge about user preferences, which combines a statistical calculation and an ontological Knowledge Base (KB) model. Ontology-based user preferences are constructed from user queries and the unstructured textual data of markup information. An external lexical reference system, WordNet [18], and a domain ontology are exploited in the process to achieve a higher degree of automation. Because some travellers' interests (preferences) may change over time, a system that relies only on usage history may perform worse when a traveller changes his/her interests. For example, traveller A might usually choose the route that takes the shortest time to reach the destination as a regular traveller, but might prefer a scenic route as a tourist in a new area. Thus, the system should be dynamic, continuously and incrementally refining, extending and updating traveller preferences during system operation in order to cope with new facts and evidence about users' preferences. This requirement led to the development of a learning model that distinguishes dynamic from static traveller preferences. To solve this problem, we create two types of profile for each user to represent their preferences.

3.3.1 Dynamic User Preferences

To create dynamic user preferences, usage information is collected during a user query session. For a new user, some initial preferences are created on first use, based on the selected user stereotype instance of the user model. Ontologies enable initial user preferences to be matched with existing concepts in the domain ontology and with the relationships between these concepts. The method in this paper is based on that of Gauch et al [14]. Whereas Gauch links a visited page to five categories in the Open Directory Project, we link user interests to our traveller context domain ontology and to any markup descriptions associated with the markup points selected from the results returned for user queries. Building an ontological model of a user's interests may cause inconsistencies if the domain ontology does not contain any of the words that form a given user's preferences (the terminological problem). To solve this problem, after applying a Natural Language Processing (NLP) technique², key words from user queries and markup descriptions are augmented with semantically similar or related terms. WordNet is exploited as a lexical reference system in order to find these additional related terms. The similarity between terms and concepts in the domain ontology is then computed to determine the category that best matches the user's preferences. The concept with the highest similarity value is selected to construct the user preferences, which therefore consist of all concepts resulting from the previous step, organised according to the domain ontology. The main advantage of this technique over existing learning algorithms is that it does not require a large training set to identify a strong pattern, which makes it suitable for modelling dynamic user preferences. Figure 3 shows the algorithm for generating user preferences. The result of this step is the initial user preference profile; all concepts in this profile are called the user interest concepts.
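The keyword-to-concept matching step can be sketched as follows. This is a minimal, self-contained sketch under several assumptions: a tiny hand-written `LEXICON` stands in for WordNet, each sense is represented by its hypernym chain, similarity is a simple path-length measure (1 / (1 + path length)), and stemming is omitted. The text does not state which WordNet similarity measure is used, so the measure and all helper names here are illustrative only.

```python
import re
from typing import Dict, List, Optional, Tuple

STOP_WORDS = {"a", "an", "the", "in", "of", "on", "with", "and"}

# Hypothetical stand-in for WordNet: word -> senses, each sense written as its
# hypernym chain from the most specific to the most general node.
LEXICON: Dict[str, List[List[str]]] = {
    "buffet": [["buffet", "meal", "food"],              # meal sense
               ["buffet", "furniture", "artifact"]],    # sideboard sense
    "restaurant": [["restaurant", "eating place", "building"]],
    "cafe": [["cafe", "restaurant", "eating place", "building"]],
    "pub": [["pub", "eating place", "building"]],
    "museum": [["museum", "sight", "building"]],
}


def path_similarity(a: List[str], b: List[str]) -> float:
    """1 / (1 + path length via the lowest common hypernym); 0.0 if none."""
    for i, node in enumerate(a):
        if node in b:
            return 1.0 / (1 + i + b.index(node))
    return 0.0


def keywords(text: str) -> List[str]:
    """Steps 1-2: tokenise and drop stop words (stemming omitted here)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def best_concept(text: str, class_labels: List[str]) -> Optional[Tuple[str, float]]:
    """Step 3: pick the class label whose best sense pair is most similar."""
    best: Optional[Tuple[str, float]] = None
    for k in keywords(text):
        for c in class_labels:
            for k_sense in LEXICON.get(k, []):
                for c_sense in LEXICON.get(c, []):
                    s = path_similarity(k_sense, c_sense)
                    if s > 0 and (best is None or s > best[1]):
                        best = (c, s)
    return best


labels = ["restaurant", "pub", "museum"]
print(best_concept("Yongfa Chinese restaurant with nice buffet in Mile End", labels))
# prints: ('restaurant', 1.0)
```

Here the keyword "restaurant" matches the class label exactly, so the score is 1.0. A description mentioning only "cafe" would instead map to "restaurant" with a lower score (0.5), because in this toy lexicon a cafe sits one level below restaurant.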

1. Extract words from the user query or markup description using an NLP algorithm;
2. Get the set of keywords K by removing stop words and stemming;
3. Get the set of keywords C from the class labels in the domain ontology;
   a. For each keyword pair Ki, Ci:
   b. Look up all word senses in WordNet;
   c. Compute the similarity between Ki and Ci;
   d. Select the concept which has the highest similarity value of word sense;
4. Construct the user's preferences profile;

Fig. 3 User preferences acquisition algorithm that exploits WordNet

After creating the initial user preferences, the presented system recommends other relevant concepts to users. We hypothesise that the lower a concept is in the hierarchical user preferences, the more relevant it is to the user's interests; conversely, the higher the level of a concept, the more general it is. Therefore, the proposed system implicitly recommends instances based on the leaf nodes (LNs) of the user preferences.

For example, suppose a user accesses a shared markup point with the description 'Yongfa Chinese restaurant with nice buffet in Mile End'. The user preferences are constructed using the domain ontology and WordNet, from which we can acquire the following information:

1. Eating Place ⊇ Restaurant ⊇ Chinese
2. Yongfa <is-a> Chinese Restaurant

where ⊇ denotes the 'hypernym' relationship. The hierarchical model of user preferences is depicted in Fig. 4. In this example, the LN is 'Chinese'. The system recommends only two types of relevant concept to users, adding them to the user preferences based on their similarity to the LN: siblings, via the so-called Sibling Similarity (SS) (e.g., Happy Chops and Lotus), and concepts reached via its parents, the so-called Parent Similarity (PS) (e.g., the Thai restaurant concept). The degree of similarity between a recommended concept and the LN is measured based upon the distance between them, i.e. the number of nodes separating them, and their properties. For instance, 'Happy Chops' and 'Lotus' (distance 1 from the LN) have a higher semantic relevance than 'Thai Smile' (distance 3 from the LN). For sibling instances (those sharing the same parent, here the LN), similarity is calculated from their properties. For instance, 'Happy Chops' has a higher degree of similarity than 'Lotus' in terms of the style property, as Happy Chops is also a buffet restaurant whereas Lotus is a different style of restaurant, called "Dim Sum". With this technique, the presented system is able to model the taxonomy of users' preference profiles at a more appropriate granularity than the surveyed frameworks.

2 In this framework, we employ the ESpotter framework. See ESpotter: Adaptive Named Entity Recognition for Web Browsing, http://people.kmi.open.ac.uk/jianhan/ESpotter
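The sibling/parent recommendation can be illustrated on the example hierarchy. The sketch below assumes that candidates are ranked first by tree distance from the leaf node and then by the number of `style` values shared with the accessed instance. The text names these two ingredients but gives no combined formula, so this ordering rule, the `STYLE` values and the helper names are hypothetical choices.

```python
from typing import Dict, List, Set, Tuple

# Toy fragment of the preference hierarchy from the example (child -> parent).
PARENT: Dict[str, str] = {
    "Chinese": "Restaurant", "Thai": "Restaurant",
    "Yongfa": "Chinese", "Happy Chops": "Chinese", "Lotus": "Chinese",
    "Thai Smile": "Thai",
}
# Hypothetical 'style' property values for each restaurant instance.
STYLE: Dict[str, Set[str]] = {
    "Yongfa": {"buffet"}, "Happy Chops": {"buffet"},
    "Lotus": {"dim sum"}, "Thai Smile": set(),
}


def ancestors(node: str) -> List[str]:
    """The node followed by its parent, grandparent, ... up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def distance(a: str, b: str) -> int:
    """Number of edges between two nodes of the tree."""
    ca, cb = ancestors(a), ancestors(b)
    for i, node in enumerate(ca):
        if node in cb:
            return i + cb.index(node)
    raise ValueError("nodes are not in the same tree")


def recommend(leaf: str, accessed: str) -> List[Tuple[str, str]]:
    """Rank instances by distance to the leaf node, then by shared style."""
    instances = [n for n in PARENT if n not in PARENT.values() and n != accessed]
    ranked = sorted(instances,
                    key=lambda n: (distance(leaf, n), -len(STYLE[n] & STYLE[accessed])))
    # SS: instances directly under the leaf node; PS: reached via its parent.
    return [(n, "SS" if PARENT[n] == leaf else "PS") for n in ranked]


print(recommend("Chinese", "Yongfa"))
# prints: [('Happy Chops', 'SS'), ('Lotus', 'SS'), ('Thai Smile', 'PS')]
```

With 'Chinese' as the LN and 'Yongfa' as the accessed point, Happy Chops and Lotus (distance 1) rank above Thai Smile (distance 3), and Happy Chops precedes Lotus because it shares the buffet style.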

The advantage of this technique is that users' preferences can be assigned a well-defined meaning using the global domain ontology model. Ontologies consist of term descriptions and their interrelationships, and they support logical inferences, so that content retrieval can extend beyond the capability of keyword-based searches; e.g., semantic searches can find relevant markup information even when the search keyword in the query does not appear in the description of the markup points.

Fig. 4 The Ontology-model of part of traveller model

3.3.2 Static User Preferences

User preferences can also be learnt from users' usage history; these are referred to as multi-session user preferences and are recorded by the user-model component using the statistical model. Markup data from a user can be accumulated from previous markup information to form two matrices: the Markup-Term (MT) and Concept-Term (CT) matrices (Fig. 5). The MT matrix holds the relationships between each markup point and the key terms in its markup description. Stop words are removed and Porter stemming is performed before constructing the MT matrix. The CT matrix derives information from the MT matrix and stores the relationships between concepts, from WordNet, and the key terms. The value in each cell CT(i,j) is a weight for each term that measures the degree of importance between the key term and the concept. We apply TF-IDF to calculate the weight of each term. We select this weighting scheme because it is simple and effective, and it can be scaled to large datasets [25].

Mark-up/Term   Weatherspoon   Steak   Fishbone   Fish    Chip
M1             1              1       0          0       0
M2             0              0       1          1       1

(a) Markup-Term matrix (MT)

Concept/Term   Weatherspoon   Steak   Fishbone   Fish    Chip
Restaurant     0.855          0       0.855      0       0
food           0.577          0.855   0.577      0.855   0.855
animal         0              0       0          0.855   0
meat           0              0.855   0          0       0

(b) Concept-Term matrix (CT)

Fig. 5 Matrix representation of markup information
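The construction of the MT matrix and its TF-IDF weighting can be sketched as below. It is a simplified sketch: Porter stemming is omitted (it is not in the Python standard library), the stop-word list is a small hypothetical one, and the textbook weighting tf × log(N/df) is assumed. The text does not specify how terms are mapped to WordNet concepts to produce the CT values in Fig. 5(b), so this sketch does not attempt to reproduce those numbers.

```python
import math
import re
from collections import Counter
from typing import Dict

STOP_WORDS = {"a", "an", "the", "and", "in", "of", "with"}


def markup_term_matrix(markups: Dict[str, str]) -> Dict[str, Counter]:
    """Rows: markup points; columns: key terms with their counts (no stemming)."""
    return {
        mid: Counter(t for t in re.findall(r"[a-z]+", text.lower())
                     if t not in STOP_WORDS)
        for mid, text in markups.items()
    }


def tf_idf(mt: Dict[str, Counter]) -> Dict[str, Dict[str, float]]:
    """w(t, d) = tf(t, d) * log(N / df(t)), the textbook TF-IDF variant."""
    n = len(mt)
    df = Counter(t for row in mt.values() for t in row)  # documents containing t
    return {mid: {t: tf * math.log(n / df[t]) for t, tf in row.items()}
            for mid, row in mt.items()}


mt = markup_term_matrix({"M1": "Weatherspoon steak", "M2": "Fishbone fish and chip"})
weights = tf_idf(mt)
print({t: round(w, 3) for t, w in weights["M1"].items()})
# prints: {'weatherspoon': 0.693, 'steak': 0.693}
```

Each term in this two-point example occurs in exactly one markup description, so every non-zero weight equals log 2; the binary rows of `mt` match Fig. 5(a) once the terms are lower-cased.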

Some studies argue that keywords and their frequencies (weights) alone are insufficient for an accurate semantic model of the user. Hence, we address this problem by inferring high-level knowledge about user preferences, transforming the CT matrix into the ontology-based model.

However, these static user preferences rely on previous usage data. This can result in a failure to filter irrelevant markup points, because users' interests are dynamic and are likely to change over time. Multi-session interests are therefore not always reliable and do not always accurately reflect the user's interests, so a dynamic model is needed to cope with this problem. Conversely, the static model is needed when the dynamic model is not able to identify the user's interests.
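One possible way to combine the two profiles, following the rule stated above, is sketched below. The fallback test (an empty dynamic profile, or no weight reaching `min_weight`) and the threshold value 0.2 are hypothetical; the text does not define when the dynamic model counts as unable to identify the user's interests.

```python
from typing import Dict

Profile = Dict[str, float]  # concept -> preference weight


def select_profile(dynamic: Profile, static: Profile, min_weight: float = 0.2) -> Profile:
    """Prefer the current-session (dynamic) profile; fall back to the static one.

    The dynamic profile is treated as unable to identify interests when it is
    empty or none of its weights reaches min_weight (a hypothetical threshold).
    """
    if dynamic and max(dynamic.values()) >= min_weight:
        return dynamic
    return static


static = {"pub": 0.5, "restaurant": 0.4}
print(select_profile({}, static))               # no session evidence: static profile
print(select_profile({"museum": 0.8}, static))  # the session reveals a new interest
```

This keeps the multi-session profile as a safety net while letting fresh session evidence take over as soon as it is strong enough.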

3.3.3 Learning and Updating User Preferences

The learning component is needed to improve further retrieval results by detecting shifts in the user's interests and updating the user preferences: updating the weights of terms and removing existing knowledge about users. User preferences can be updated implicitly during and after the retrieval process. However, we do not focus here on improving the learning algorithm; instead, we adopt the adaptive learning algorithm proposed in [15], as follows:

$$M_t(i,j) = \frac{N^i_{t-1}}{N^i_t}\,M_{t-1}(i,j) + \frac{1}{N^i_t}\sum_{k} MT(k,j)\cdot CT(i,k) \qquad (1)$$


where $M_t(i,j)$ is the modified user preference at time t; $N^i_t$ is the number of markup points related to the i-th concept that have been accumulated from time zero to time t; and the second term on the right-hand side of (1) is the sum of the weights of the j-th term in the markup descriptions that are related to the i-th concept and were obtained between time t-1 and time t, divided by $N^i_t$. This approach allows the system to learn and update users' interests rapidly and makes user preferences more dynamic than in the surveyed frameworks.
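A worked version of equation (1) for a single cell (i, j) is sketched below; the function and argument names are our own. The first term rescales the previous value by $N^i_{t-1}/N^i_t$ and the second adds the newly observed weights divided by $N^i_t$, so the update behaves like an incremental mean over the markup points accumulated so far.

```python
from typing import Iterable, Tuple


def update_preference(m_prev: float, n_prev: int, n_t: int,
                      new_pairs: Iterable[Tuple[float, float]]) -> float:
    """Equation (1) for one cell (i, j).

    m_prev    -- M_{t-1}(i, j)
    n_prev    -- N^i_{t-1}, markup points related to concept i up to time t-1
    n_t       -- N^i_t, the same count up to time t
    new_pairs -- (MT(k, j), CT(i, k)) for each markup point k obtained in (t-1, t]
    """
    if n_t <= 0:
        raise ValueError("N_t must be positive")
    new_mass = sum(mt_kj * ct_ik for mt_kj, ct_ik in new_pairs)
    return (n_prev / n_t) * m_prev + new_mass / n_t


# Two earlier markup points averaged 0.6; one new point contributes 1.0 * 0.9.
print(round(update_preference(0.6, 2, 3, [(1.0, 0.9)]), 6))  # prints: 0.7
```

Numerically, (2 × 0.6 + 0.9) / 3 = 0.7: the earlier evidence keeps two thirds of the weight and the new markup point one third.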

3.4 Personalised Map Content (Markup) Information Retrieval

Once the knowledge base and user preferences are obtained, semantic retrieval is performed. The retrieval component applies the ontology model to support semantic queries on text-based markup descriptions. Again, several sub-processes are involved: eliminating stop words within descriptions, processing the query, formulating queries, etc.

3.4.1 WordNet

WordNet [18] is a semantic network database developed at Princeton University under the direction of George A. Miller. The basic building block in WordNet is the synset: a set of synonyms denoting the same concept, paired with a description of the synset. Synsets are interconnected by different relational links, such as hypernymy (is-a-kind-of), meronymy (is-a-part-of), antonymy (is-an-opposite-of) and others. We exploit WordNet to disambiguate word senses in user preferences.

3.4.2 Query Processing and Word Sense Disambiguation

Word sense disambiguation is necessary because keywords in the user's query could be ambiguous, having more than one word sense. The system implicitly expands those keywords to other relevant concepts, e.g., by finding hypernym (is-a-kind-of) concepts and other synonyms in WordNet. The algorithm to disambiguate the word sense of a user keyword is shown in Figure 6.

Fig. 6 Word-sense disambiguation algorithm based on user's preferences

1. Get the set of keywords Q by removing stop words and stemming;
2. Get the set of keywords U from the user's preferences;
3. For each keyword pair Qi, Ui:
   a. Look up all word senses in WordNet;
   b. Compute the similarity between Qi and Ui;
   c. Select the word sense with the highest similarity value;
4. Perform the semantic search;


In summary, the user query is processed to extract keywords by removing stop words and stemming. Stop words include: a, an, the, in, of, on, are, be, if, into, which, etc. These words do not contribute significant meaning to the documents or images in this research, so they are removed to reduce 'noise' and computation time. Stemming attempts to reduce a word to its stem or root form, so the key terms of a query are represented by stems rather than by the original words. In our framework, the Porter Stemming³ algorithm is applied. The remaining keywords from the user's query form the set of query keywords Q. Likewise, a set of keywords U is created from the user preferences. All pairs of U and Q are used to look up all word senses in WordNet and then to compute the similarity between them. The word sense with the highest similarity value is selected and used later in the semantic search.
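The query-processing and disambiguation steps of Fig. 6 can be sketched as follows. The stop words are those listed above; everything else is an assumption made to keep the example self-contained: a crude plural-stripping rule replaces the Porter stemmer, a hypothetical `SENSES` table replaces WordNet, and term overlap between a sense's related terms and the user keywords U replaces the (unspecified) similarity measure.

```python
import re
from typing import Dict, List, Set

# Stop words listed in the text.
STOP_WORDS = {"a", "an", "the", "in", "of", "on", "are", "be", "if", "into", "which"}

# Hypothetical sense inventory standing in for WordNet: each sense of a word is
# described by a small set of related terms (synonyms, hypernyms, gloss words).
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "bar": {"bar.pub": {"pub", "drink", "beer"},
            "bar.rod": {"rod", "metal", "iron"}},
}


def query_keywords(query: str) -> List[str]:
    """Remove stop words and apply a crude plural-stripping stand-in for Porter."""
    out = []
    for tok in re.findall(r"[a-z]+", query.lower()):
        if tok in STOP_WORDS:
            continue
        if len(tok) > 3 and tok.endswith("s") and not tok.endswith("ss"):
            tok = tok[:-1]
        out.append(tok)
    return out


def disambiguate(query: str, user_prefs: Set[str]) -> Dict[str, str]:
    """For each ambiguous keyword, keep the sense closest to the user keywords U.

    Similarity here is simple term overlap, an assumption for illustration.
    """
    chosen = {}
    for q in query_keywords(query):
        senses = SENSES.get(q)
        if senses:
            chosen[q] = max(senses, key=lambda s: len(senses[s] & user_prefs))
    return chosen


print(query_keywords("bars in the area"))                   # ['bar', 'area']
print(disambiguate("bars in the area", {"beer", "music"}))  # {'bar': 'bar.pub'}
```

A user whose preferences mention beer gets the pub sense of "bar"; a user whose preferences mention metal and iron would get the rod sense instead.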

3.4.3 Semantic Search

After disambiguating word senses, the system automatically formulates queries represented as SPARQL queries⁴. The SPARQL query performs a semantic search on the RDF file and returns results to the user. The SPARQL query language is a W3C recommendation for querying data from RDF documents, which form part of the KB; it returns a list of instance tuples that satisfy the query. To perform the semantic search, the similarity between the user's query, concepts in the domain ontology, and concepts in the user's preferences needs to be measured. Two types of measure are deployed in this framework: cosine similarity and personal relevance.

3.4.3.1 Similarity Measures

To ensure that the results are relevant to the query, a statistical computation in the form of a cosine similarity measure is performed. Equation (2) defines the cosine similarity formula: the similarity between the query (q) and concepts (p) in the map content KB is measured using the following normalised inner product:

$$\mathrm{sim}(q,p) = \frac{q \cdot p}{\lVert q \rVert\,\lVert p \rVert} \qquad (2)$$

The results obtained from the cosine similarity measure are further filtered according to the user profile. A personal relevance measure has been proposed in [19]. We adopt this formula to calculate the similarity between the user preferences (u) and concepts (p) in the map content KB. The personal relevance measure is defined in Equation (3):

$$\mathrm{prm}(u,p) = \frac{u \cdot p}{\lVert u \rVert\,\lVert p \rVert} \qquad (3)$$

3 Porter Stemming, See http://www.ling.gu.se/~lager/mogul/porter-stemmer/index.html
4 SPARQL query, See http://www.w3.org/TR/rdf-sparql-query
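Equations (2) and (3) are the same cosine measure applied to different vector pairs, and the combSum aggregation of Section 3.4.3.2 combines them linearly with a coefficient λ in [0, 1]. The sketch below assumes sparse term-weight vectors stored as dictionaries; the function names and the toy vectors are illustrative, not part of the described system.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight vector


def cosine(a: Vector, b: Vector) -> float:
    """a . b / (|a| |b|); 0.0 if either vector is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sim(q: Vector, p: Vector) -> float:   # Equation (2): query vs. concept
    return cosine(q, p)


def prm(u: Vector, p: Vector) -> float:   # Equation (3): preferences vs. concept
    return cosine(u, p)


def comb_sum(p: Vector, q: Vector, u: Vector, lam: float) -> float:
    """Linear combination lam * prm(u, p) + (1 - lam) * sim(q, p)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return lam * prm(u, p) + (1 - lam) * sim(q, p)


def rank(items: Dict[str, Vector], q: Vector, u: Vector, lam: float) -> List[str]:
    """Item IDs in descending order of combined score."""
    return sorted(items, key=lambda k: comb_sum(items[k], q, u, lam), reverse=True)


items = {"pub_markup": {"pub": 1.0}, "cafe_markup": {"food": 1.0}}
query, prefs = {"pub": 1.0}, {"food": 1.0}
print(rank(items, query, prefs, 0.3))  # ['pub_markup', 'cafe_markup']
print(rank(items, query, prefs, 0.9))  # ['cafe_markup', 'pub_markup']
```

With a small λ the query dominates and the pub markup ranks first (0.7 against 0.3); with λ = 0.9 the ranking follows the stored preference for food instead.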


3.4.3.2 Similarity Aggregation

To calculate the similarity between user preferences, query and map content, the cosine similarity and personal relevance measures are integrated using the so-called combSum model [19]. The combSum model merges the two rankings by a linear combination of the relevance scores.

$$\mathrm{score}(p,q,u) = \lambda \cdot \mathrm{prm}(u,p) + (1-\lambda)\cdot \mathrm{sim}(q,p) \qquad (4)$$

where $\lambda \in [0,1]$. The choice of the coefficient $\lambda$ in the linear combination above is critical and provides a way to gauge the degree of personalisation, from $\lambda = 0$, which produces no personalisation at all, to $\lambda = 1$, where the query (the current user interests) is ignored and results are ranked only on the basis of global user interests. The search results are presented to the user in descending order of score. More detail about the combSum model can be found in [19].

3.5 Traveller Map Markup

To model users' points of interest, a semantic ontology model has the advantage of building up potentially detailed relations between the different types of user markup, allowing better management and more precise searching. The construction of a user markup point is shown in Fig. 7. It contains the following properties: hasContent stores the content of the point; hasCreatedDate stores the date the point was created; hasLocationX and hasLocationY keep the longitude and latitude of the point location; hasModifiedDate stores the date the point was last modified; hasName represents the name of the point; hasOwner stores the owner of the point; hasType stores the group of the point; and isLocked indicates whether the point is locked by the owner. More detail about markup point grouping and access control is given in Section 3.6.

The following example illustrates how user markup data can be shared among users. User A can search the markup information that User B has created and shared in specific groups, provided that both have joined these two groups. Searches are performed on RDF instances using SPARQL [10]. User A can limit the search to User B's shared markup only, because markup can be filtered by owner. The results can also be filtered on additional constraints, e.g., to filter out instances that are not within the two groups.

Fig. 7 The ontology model of the markup point
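In the system this owner and group filtering is done with SPARQL over RDF instances. As a plain in-memory stand-in (not the SPARQL query itself), the following sketch mirrors the markup point properties of Fig. 7 in a Python dataclass. The owner IDs, group IDs and the second and third points are hypothetical; the first point's values are taken from the RDF instance shown in Fig. 8.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class MarkupPoint:
    name: str             # hasName
    content: str          # hasContent
    owner: str            # hasOwner
    group: str            # hasType (the group the point is shared in)
    x: float              # hasLocationX
    y: float              # hasLocationY
    created: str          # hasCreatedDate, e.g. "20080605"
    modified: str         # hasModifiedDate
    locked: bool = False  # isLocked


def search(points: Iterable[MarkupPoint], requester_groups: Set[str],
           owner: Optional[str] = None, keyword: Optional[str] = None) -> List[MarkupPoint]:
    """Keep points shared in the requester's groups, optionally by owner and keyword."""
    hits = []
    for p in points:
        if p.group not in requester_groups:
            continue
        if owner is not None and p.owner != owner:
            continue
        if keyword is not None and keyword.lower() not in p.content.lower():
            continue
        hits.append(p)
    return hits


points = [
    MarkupPoint("Good pub", "has nice meals and drinks", "userB", "G1",
                51.527615, -0.051452026, "20080605", "20080606"),
    MarkupPoint("Quiet cafe", "good coffee", "userC", "G1",
                51.52, -0.05, "20080601", "20080601"),
    MarkupPoint("Private spot", "meeting point", "userB", "G3",
                51.53, -0.04, "20080602", "20080602"),
]
# User A belongs to G1 and G2 and restricts the search to User B's markup.
print([p.name for p in search(points, {"G1", "G2"}, owner="userB")])  # ['Good pub']
```

User B's point in G3 is excluded because User A has not joined that group, which corresponds to the additional group constraint in the example above.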


Fig. 8 illustrates the structure of the RDF file for a markup point, called a PointOfInterests instance. Using a semantic mediation model, the ontology can be converted to different formats to better support specific applications. There are two approaches to storing user point-of-interest data. One is to store them in RDF file format, which can be used directly by the system. The other is to convert the RDF instance data for storage in a relational database such as MySQL. Storing all the data in a relational database can provide extra data storage management and access control, but it requires extra processing to convert RDF instances to the database format and vice versa. A hybrid approach can also be used, in which the database stores sensitive user data while an RDF file stores point-of-interest data.

<rdf_:PointOfInterests rdf:about="&rdf_;smap2008_Instance_11"
    rdf_:hasContent="has nice meals and drinks"
    rdf_:hasCreatedDate="20080605"
    rdf_:hasLocationX="51.527615"
    rdf_:hasLocationY="-0.051452026"
    rdf_:hasModifiedDate="20080606"
    rdf_:hasName="Good pub"
    rdfs:label="smap2008_Instance_11">
  <rdf_:hasOwner rdf:resource="&rdf_;smap2008_Instance_16"/>
  <rdf_:hasType rdf:resource="&rdf_;smap2008_Instance_2"/>
</rdf_:PointOfInterests>

Fig. 8 RDF format of an ontology instance of markup point (PointOfInterests)

3.6 Map Markup Sharing

3.6.1 Restricting Access to User Markup Information

How requesters can access an owner's context is defined by the owner's access control matrix, shown in Table 1. The rows of this table specify the access levels for different groups: access level R denotes read-only permission, while access level R+W denotes both read and write permission. The columns describe how the requesters are grouped. There are three main types of grouping: Anonymous Groups, Public Groups and Private Groups. Anonymous Groups represent markup data that are shared anonymously; owners do not restrict read access to them, but they can specify access constraints for the second access level (R+W). Private Groups are the groups created by the owner, and their access controls are fully determined by the owner. Each owner has their own private group access control matrix. It is stored online, in the network, rather than on the mobile client, to facilitate robustness and efficiency. Owners only need to set up a private group's access constraints once and upload them to the server side of the system, so the owner does not need to be involved every time a requester asks for that owner's markup data. Public Groups are the groups an owner has joined or created for other people to join. Read access to public groups is not determined directly by the owners of the groups: every group member has at least read access to a group. However, read and write access to public groups can be decided by owners. This solution is based on access control mechanisms that specify and interpret preferences about who can access what information, and at which level.

Table 1 Access control matrix

Grouping        Anonymous Groups      Public Groups                Private Groups
Access level    AG1        AG2        PG1        PG2        …      Family              Friend              …
R               All        All        GID010R    GID013R           UID0010, UID0012    UID0032, UID0044
R+W             UID0002    UID0003    GID010W    GID013W           UID0050, UID0053    UID0061, UID0068

Requesters are separated into groups based on the public groups (organisations) they join, or based on the information owner's private group settings, such as family, friend, colleague and others (see Table 1). The user assigns each requester an access level depending on its group membership. The combination of group and access level, along with the requester IDs and group IDs, forms a grid-based access control table of requester group versus access level.
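Table 1 can be held as a nested mapping from column to access level to a set of principals. The sketch below assumes that a requester matches a cell if the cell contains "All", the requester's user ID, or one of the group identifiers the requester holds. Reading entries such as GID010R as group identifiers is our interpretation, and `access_levels` is a hypothetical helper.

```python
from typing import Dict, Set

# Table 1 as column -> access level -> set of principals ("All", user IDs or group IDs).
MATRIX: Dict[str, Dict[str, Set[str]]] = {
    "AG1": {"R": {"All"}, "R+W": {"UID0002"}},
    "AG2": {"R": {"All"}, "R+W": {"UID0003"}},
    "PG1": {"R": {"GID010R"}, "R+W": {"GID010W"}},
    "PG2": {"R": {"GID013R"}, "R+W": {"GID013W"}},
    "Family": {"R": {"UID0010", "UID0012"}, "R+W": {"UID0050", "UID0053"}},
    "Friend": {"R": {"UID0032", "UID0044"}, "R+W": {"UID0061", "UID0068"}},
}
LEVELS = ["R", "R+W"]  # ordered from weakest to strongest


def access_levels(requester_id: str, requester_groups: Set[str]) -> Dict[str, str]:
    """Strongest access level the requester holds in each column of the matrix."""
    principals = {"All", requester_id} | requester_groups
    result: Dict[str, str] = {}
    for column, cells in MATRIX.items():
        for level in LEVELS:  # a stronger level overwrites a weaker one
            if cells.get(level, set()) & principals:
                result[column] = level
    return result


print(access_levels("UID0010", set()))
# prints: {'AG1': 'R', 'AG2': 'R', 'Family': 'R'}
```

Columns the requester cannot access are simply absent from the result, which keeps the later evaluation step a plain membership test.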

When making a request, a requester specifies their ID and the pseudonym of the owner (or holder), i.e., the owner's ID. A credential-based mechanism is used to bind the service identifier to a specific type of credential token. The token plays a role similar to a traditional X.509 certificate, but it is more general: it can bind any credential type to a service identity.

3.6.2 Access Control Evaluation

Access control evaluation is done by evaluating requests and credential tokens against the information owner's preference policies [21], using the algorithm given below. The access controls for requesters are defined by the information owner. When a request is made for access to a user's shared markup data via the system middleware (the broker component), the broker requests the access control matrix of the requested information owner ID. The broker then evaluates the request based upon the access control matrix, the requester's credentials and the description of the user markup groups. There are three possible outcomes of the evaluation: reject the request if the requester does not have a valid token; reveal the requested data when the requester appears in the owner's matrix with a valid token, access level and entity group (entity ID); or notify the requester that the specific owner has no shared information available to them. The algorithm for access control evaluation consists of four steps; an example of accessing an individual owner's markup information requested by a requester is given in pseudo-code below.

1. Validate the requester's identity by checking the provided ID against the provided token.
2. Collect the list of privileged group IDs available to the requester.
3. Collect markup data based on the list of privileged group IDs:
   For each group in the owner's Private Groups sector
      Check if the requester ID exists in the group's access constraints list
      If it exists, collect the shared markup data in that group;
   For each group in the Public Groups sector that the owner has joined
      Check if the requester ID exists in the public group's access constraints list
      If it exists, collect the markup data shared by the owner in that group;
   For each group in the Anonymous sector
      Collect all markup data in this sector;
4. Return the collected markup data if there is any, or return an empty list if no markup data from the owner is available to the requester.
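The four steps above can be sketched in Python as follows. Token validation is reduced to a lookup in a hypothetical `valid_tokens` table (a real deployment would verify a signed credential), and the policy structure, IDs and markup names are illustrative assumptions. Returning `None` models rejection, an empty list models the "nothing shared" notification, and a non-empty list models revealing the data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class OwnerPolicy:
    private_groups: Dict[str, Set[str]]   # private group -> permitted requester IDs
    public_groups: Dict[str, Set[str]]    # joined public group -> member IDs
    shared: Dict[str, List[str]] = field(default_factory=dict)  # group -> markup names


def evaluate(requester_id: str, token: str, owner: OwnerPolicy,
             anonymous: Dict[str, List[str]],
             valid_tokens: Dict[str, str]) -> Optional[List[str]]:
    """Return None to reject, [] to notify 'nothing shared', else the markup."""
    # Step 1: validate the requester ID against the supplied token.
    expected = valid_tokens.get(requester_id)
    if expected is None or expected != token:
        return None
    # Step 2: privileged groups = private and joined public groups listing the requester.
    privileged = [g for g, ids in owner.private_groups.items() if requester_id in ids]
    privileged += [g for g, ids in owner.public_groups.items() if requester_id in ids]
    # Step 3: collect markup from privileged groups, then the anonymous sector.
    collected: List[str] = []
    for g in privileged:
        collected.extend(owner.shared.get(g, []))
    for items in anonymous.values():
        collected.extend(items)
    # Step 4: return the (possibly empty) collected list.
    return collected


owner = OwnerPolicy(private_groups={"Family": {"UID0010"}},
                    public_groups={"PG1": {"UID0020"}},
                    shared={"Family": ["Home"], "PG1": ["Good pub"]})
tokens = {"UID0010": "tok-a", "UID0020": "tok-b", "UID0030": "tok-c"}
print(evaluate("UID0010", "tok-a", owner, {}, tokens))  # ['Home']
print(evaluate("UID0010", "wrong", owner, {}, tokens))  # None
print(evaluate("UID0030", "tok-c", owner, {}, tokens))  # []
```

Separating "reject" (`None`) from "nothing shared" (`[]`) lets the broker return the three outcomes described above without an extra status code.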

4 Travellers Personalised Spatial Map Service

A travellers' personalised spatial map service can be constructed on the basis of semantic user modelling of travellers, by filtering and generating individualised maps to meet their needs. User markup information, as part of the traveller model, can be created and shared amongst groups of users, controlled by an access control matrix mechanism (see Section 3.6.1). The USHER SAMS system used in this demonstrator captures part of the traveller's context indirectly, through user events, user annotation/markup, user queries and environment events, and directly, from user input, to construct the traveller model. Indirect user context input includes detecting the user's location by gathering GPS data, obtaining the user's movement by measuring three-axis acceleration data, and retrieving user preferences/interests by analysing user queries and markup information. Direct user input includes setting a destination, travel goal and preferences/interests. The traveller's semantic model has three predefined user stereotype instances: Business Man, Tourist and Regular. Different user stereotype instances are initialised with different default settings, which can be changed based on direct and indirect user input during use of the system. To construct the map service to meet the user's task/goal, the system loads the ontology instance of the traveller model. It then extracts the raw map data and converts it into a map based on filters generated from the traveller model instances. Other associated map data, such as shared markup points from different users that are available and match the user's interests, can be displayed as additional layers of the map content.
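How a traveller model might drive map filtering can be sketched with a simple layer-selection rule. The layer names, the per-stereotype defaults and the set of road-related layers are all hypothetical; they only mirror the behaviour described for the demonstrator, in which each stereotype starts from different defaults and driving mode focuses on road information while filtering out other content.

```python
from typing import Dict, Set

# Hypothetical default layers per stereotype, and the layers kept when driving.
DEFAULT_LAYERS: Dict[str, Set[str]] = {
    "Tourist": {"roads", "buildings", "sights", "eating places", "shared markup"},
    "Business Man": {"roads", "buildings", "meeting place", "shared markup"},
    "Regular": {"roads", "new places"},
}
ROAD_RELATED = {"roads", "meeting place", "route", "shared markup"}


def map_layers(stereotype: str, travel_mode: str) -> Set[str]:
    """Pick layers for a traveller; driving mode keeps only road-related ones."""
    layers = set(DEFAULT_LAYERS[stereotype])
    if travel_mode == "driving":
        layers &= ROAD_RELATED
    return layers


print(sorted(map_layers("Business Man", "walking")))
print(sorted(map_layers("Business Man", "driving")))
# prints: ['meeting place', 'roads', 'shared markup']
```

Treating stereotypes as defaults and travel mode as a further filter keeps the two sources of context independent, so either can change during use without redefining the other.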

Fig. 9 Traveller A’s Tourist map in walking mode

An example of Traveller A's Tourist map in pedestrian mode is shown in Fig. 9. It is based on the destination, e.g., Queen Mary University of London (QMUL), the user goal and task, e.g., visiting the campus and sightseeing, set by Traveller A, and the history of the places the traveller has been to. The system determines the traveller's status and generates the map accordingly. Most of the map content about the campus area is displayed to the user. The map's presentation is based on the display preferences, e.g., using different colours for different GIS objects and different symbols for different types of user markup information.

Another example, Traveller B's Business Man map in pedestrian mode, is shown in Fig. 10. Traveller B has also set the destination to QMUL, but with the different purpose of having a meeting, so the system sets his stereotype to Business Man, in walking mode, as it detects that he is walking. The system generates the map based on Traveller B's user model, which contains the basic content of the area and some useful/important information related to the meeting, such as the meeting place.


Fig. 10 Traveller B’s Business Man map in walking mode

After the meeting, Traveller B needs to see someone. The system changes his map mode and the displayed map content accordingly, as it detects the change of the user goal and that the user is driving. The route in purple (the user's preference for map presentation) between the meeting place (EE) and the place where he will see someone (W J Meade) is shown in Fig. 11. Some of the markup information shared by others that is relevant to the user goal is displayed, such as the New Global (Pub) near his meeting place. In driving mode, the map focuses on road information and associated spatial objects, and filters out other irrelevant map content.

Fig. 11 Traveller B’s Business Man map in driving mode


5 Discussion

Key issues for the presented ontology-based personalised SAMS include the construction of a semantic traveller model that can be used to personalise the map. Traveller model construction involves modelling the traveller and acquiring the traveller context, including the travel mode, the travel activity, the travel goal, the destination, user markup and user preferences (interests). Indirect user context acquisition, including methods to acquire user preferences and the traveller's current travel mode, is one of the main challenges for traveller model creation.

Integrating statistical computation into a personalisation model enables the use of more user-centred terminology in user models. However, because statistical techniques rely solely on numeric data, they can fail to capture the meaning of users' interests [19]. In particular, a purely statistical model fails to capture the context of the shared markup information, which reflects the user's interests. This feature is not supported by usage mining techniques, but it is supported by a semantic model (ontology). In this framework, the context of traveller preferences can be captured from shared markup points by NLP techniques; this information is then restructured to form traveller preferences in a hierarchical structure that keeps the relationships between the concepts found in the markup description. Consequently, the ontology can share the concept-based representation proposed for retrieval, and its expressiveness can be used to define user interests on the basis of the same concept space used to describe the map data. The rich concept descriptions of traveller interests and their relations provide useful information for easily retrieving markup information with a semantic query, because a structured query (SPARQL) can express more precise information, leading to more accurate answers. In this framework, personalised semantic search is achieved by exploiting an external knowledge base (WordNet), a domain-specific ontology and the traveller model. This can be seen as a form of query expansion, leading to a more effective search mechanism.

Semantic searches are able to find relevant markup points when querying class instances, even if the keyword(s) in the query are not present in the descriptions of the markup points or as concepts in the traveller model. For example, Traveller A might want to find information about Chinese restaurants in a certain area, e.g., Mile End. The system can find them because the ontology contains semantic relationships with sub-concepts of restaurant and place (see Fig. 4). The proposed system is thus able to recognise restaurant information annotated with a restaurant style that belongs to the 'Chinese' concept, even if the word 'Chinese' does not appear in the markup description, whereas the traditional type of user model cannot. This means that the personalised search can obtain better precision and recall than previous user models. Learning dynamic user preferences (interests) from only the most recent observation leads to a traveller model that can adjust more rapidly to a traveller's changing interests. This makes the traveller model more dynamic than previous frameworks, e.g., [23] [24] [26].
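The Chinese-restaurant example can be made concrete with a small Python sketch that contrasts keyword matching with concept subsumption. It stands in for the SPARQL query over the ontology: the concept hierarchy in `SUBCLASS_OF` is a hypothetical fragment, and `MarkupPoint`, `keyword_search` and `semantic_search` are illustrative names rather than the framework's API.

```python
from dataclasses import dataclass

# Hypothetical ontology fragment: concept -> parent concept.
SUBCLASS_OF = {
    "Restaurant": "Place",
    "ChineseRestaurant": "Restaurant",
    "CantoneseRestaurant": "ChineseRestaurant",
    "SichuanRestaurant": "ChineseRestaurant",
    "ItalianRestaurant": "Restaurant",
}


@dataclass
class MarkupPoint:
    description: str
    concept: str  # ontology class the markup point is annotated with
    area: str


def is_a(concept, target):
    """True if `concept` equals `target` or is a (transitive) sub-concept of it."""
    while concept is not None:
        if concept == target:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def keyword_search(points, keyword, area):
    """Baseline: match the keyword literally in the free-text description."""
    return [p for p in points if p.area == area and keyword.lower() in p.description.lower()]


def semantic_search(points, concept, area):
    """Match any markup point annotated with the concept or one of its sub-concepts."""
    return [p for p in points if p.area == area and is_a(p.concept, concept)]
```

With markup points such as "Great dim sum" annotated as `CantoneseRestaurant` in Mile End, a keyword search for "Chinese" returns nothing, while `semantic_search(points, "ChineseRestaurant", "Mile End")` returns them, because the subsumption walk expands the query to all sub-concepts.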

The method used to detect the traveller's travel mode is based on a previous experiment, and the results are not always accurate, as some travel modes have similar movement patterns. For example, walking and jogging can have similar movement patterns: when their speeds are not very distinct, the outputs of the three-axis accelerometer can be similar, making them difficult to separate.
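The ambiguity can be illustrated with a deliberately simple threshold classifier. This Python sketch is not the detection method used in the system or in [27]: it assumes windows of (x, y, z) accelerometer samples in m/s² and uses hypothetical thresholds on the spread of the acceleration magnitude.

```python
import math
import statistics


def magnitudes(samples):
    """Euclidean norm of each (x, y, z) accelerometer sample, in m/s^2."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def classify_mode(samples, still_max=0.3, walk_max=2.5):
    """Classify a window of samples by the spread of the acceleration magnitude.

    The thresholds are hypothetical. A brisk walk and a slow jog can both give a
    spread close to `walk_max`, which is where a single feature like this fails.
    """
    spread = statistics.pstdev(magnitudes(samples))
    if spread < still_max:
        return "still"
    if spread < walk_max:
        return "walking"
    return "jogging"
```

Because both modes are reduced to one number, any window whose spread lies near the boundary is effectively a guess, which is the limitation noted above.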

Another feature of this system is that it allows travellers to create their own markup for visited places, which is then shown on the map for their own convenience. More importantly, this markup information can be shared with other users. These requirements create challenges for designing the markup point structure in terms of scalability and usability. The system needs to cope with a certain amount of information and store it in an organised way to facilitate precise searching. Another key issue in managing these markup points is restricting access, as travellers may want to share their markup information only within certain groups or with certain users, or with any other traveller. To address this issue, an access control matrix is created for each traveller so that they can decide how their markup information is shared. In this way, traveller markup information can be safely shared amongst identified travellers.

6 Conclusion

More advanced existing spatially aware map services can automatically adapt spatial content to users' preferences and to the terminal display characteristics. A semantic extension to such a personalisation model has been proposed, to enable the model to adapt to users' tasks, to support sharing of information and to support more finely grained searches using the relations among the instances of the ontology models. Users can create their own personalised markup in the field and can share this information with others. The semantic markup can also be stored in a relational database to support added access control and to improve data storage management.

References

[1] Cheverst, K., Davies, N., Mitchell, K., et al.: Developing a context-aware electronic tourist guide: some issues and experiences. In: Proc. SIGCHI conference on Human factors in computing systems, pp. 17–24 (2000)

[2] Freebase, Open, Shared Database of the World's Knowledge developed by Metaweb, http://www.freebase.com/view/guid/9202a8c04000641f80000000010c2d43 (accessed in May 2008)

[3] Göker, A., Myrhaug, H.I.: User Context and Personalisation. In: European Conference on Case-Based Reasoning (ECCBR), pp. 1–7 (2002)

[4] Kofod-Petersen, A., Aamodt, A.: Case-based situation assessment in a mobile context-aware system. In: Proc. Artificial intelligence in Mobile Systems 2003 (AIMS), pp. 41–49 (2003)

[5] OpenStreetMap, Map Features (2008), http://wiki.openstreetmap.org/index.php/Map_Features (accessed in April 2008)


[6] Pignotti, E., Edwards, P., Grimnes, G.A.: Context-Aware Personalised Service Delivery. In: European Conference on Artificial Intelligence, ECAI 2004, pp. 1077–1078 (2004)

[7] Poslad, S., Laamanen, H.R., Malaka, A., et al.: CRUMPET: Creation of User-friendly Mobile services PErsonalised for Tourism. In: Proc. 3G 2001 Mobile Communication Technologies, London, pp. 28–32 (2001)

[8] Poslad, S.: Ubiquitous Computing: Smart Devices, Environments and Interaction. Wiley, London (2009)

[9] Titkov, L., Poslad, S., Tan, J.J.: An Integrated Approach to User-Centered Privacy for Mobile Information Services. Applied Artificial Intelligence 20, 159–178 (2006)

[10] W3C, SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/ (accessed in May 2008)

[11] Vallet, D., Castells, P., Fernandez, M., et al.: Personalized Content Retrieval in Context Using Ontological Knowledge. IEEE Transactions on Circuits and Systems for Video Technology 17, 336–346 (2007)

[12] Maes, P.: Agents that reduce work and information overload. Communications of the ACM 37, 30–40 (1994)

[13] Widyantoro, D.H., Ioerger, T.R., Yen, J.: Learning User Interest Dynamics with a Three-Descriptor Representation. Journal of the American Society for Information Science and Technology 52, 212–225 (2001)

[14] Gauch, S., Chaffee, J., Pretschner, A.: Ontology-based personalized search and browsing. Web Intelligence and Agent Systems 1, 219–234 (2003)

[15] Liu, F., Yu, C., Meng, W.: Personalized Web Search For Improving Retrieval Effectiveness. IEEE Transactions on Knowledge and Data Engineering 16, 28–40 (2004)

[16] Zhang, Y., Zhang, X., Xu, C., et al.: Personalized retrieval of sports video. In: Proc. of the International Workshop on Multimedia Information Retrieval, pp. 313–322 (2007)

[17] Xu, C., Wang, J., Lu, H., et al.: A Novel Framework for Semantic Annotation and Personalized Retrieval of Sports Video. IEEE Transactions on Multimedia 10, 421–436 (2008)

[18] Miller, G.A.: WordNet: a lexical database for English. Communications of the ACM 38, 39–41 (1995)

[19] Castells, P., Fernández, M., Vallet, D., et al.: Self-tuning Personalized Information Retrieval in an Ontology-Based Framework. In: OTM Workshops on the Move to Meaningful Internet Systems, pp. 977–986 (2005)

[20] Kobsa, A.: Personalised Hypermedia and International Privacy. Communications of the ACM 45(5), 64–67 (2002)

[21] Titkov, L., Poslad, S., Tan, J.J.: Enforcing Privacy via Brokering within Nomadic Environment. In: Proc. of the 4th International Symposium from Agent Theory to Agent Implementation (2004)

[22] Castells, P., Fernandez, M., Vallet, D.: An Adaptation of the Vector-Space Model for Ontology-Based Information Retrieval. IEEE Transactions on Knowledge and Data Engineering 19, 261–272 (2007)

[23] Mylonas, P., Vallet, D., Castells, P., et al.: Personalized Information Retrieval Based on Context and Ontological Knowledge. The Knowledge Engineering Review 23, 73–100 (2008)

[24] Daoud, M., Tamine, L., Boughanem, M., et al.: Learning Implicit User Interests Using Ontology and Search History for Personalization. In: Proc. of Web Information Systems Engineering – WISE 2007, pp. 325–336 (2007)


[25] Chen, S., Williams, M.: Learning Personalized Ontologies from Text: A Review on an Inherently Transdisciplinary Area. In: Chen, N. (ed.) Personalized Information Retrieval and Access: Concepts, Methods and Practices, New York (2008)

[26] Gondra, I.: Personalized Content-Based Image Retrieval. In: Chen, N. (ed.) Personalized Information Retrieval and Access: Concepts, Methods and Practices, New York (2008)

[27] Kobayashi, A., Iwamoto, T., Nishiyama, S.: UME: Method for Estimating User Movement Using an Acceleration Sensor. In: Proc. of International Symposium on Applications and the Internet - SAINT 2008, pp. 169–172 (2008)

Semantic Based Error Avoidance and Correction for Video Streaming

Christian Spielvogel, Sabina Serbu, Pascal Felber, and Peter Kropf

Abstract. Video streaming over best-effort networks remains a challenging task. Video quality decreases with an increasing number of frames that are corrupted, lost or only received after playback time. We use semantic information about the video and the network to decide between alternative or cooperative streaming sources to avoid or to correct data loss. We propose a distributed architecture that combines a peer-to-peer indexing archive for videos with error avoidance and error correction mechanisms to select the best delivery method from the corresponding sources. Our indexing-cache peer-to-peer overlay has two interesting properties for our selection model: it efficiently locates several sources for a video (if they exist), and it also locates rare videos. Based on the coding characteristics of the available videos and the state of the network, we apply a model for selecting between error avoidance, error correction and a combination of both approaches. This model is evaluated using the network simulator NS-2 and a modified version of EvalVid.

1 Introduction

Delivering videos in the desired quality over best-effort networks remains an important challenge. If a video is streamed from an arbitrary server to an arbitrary client, the perceived quality typically varies in an unpredictable way.

The commonly used methods for handling packet loss are Forward Error Correction (FEC) and Automatic Repeat Request (ARQ). The problem with FEC and ARQ is that, under certain circumstances, the first produces additional packet loss and the second causes too large a delay for real-time data. FEC leads to additional loss when the redundant packets are transmitted over the same congested path as the original ones. The delay of ARQ becomes too high because, in case of continuous loss, the same data needs to be retransmitted multiple times before it arrives successfully.

Christian Spielvogel, Sabina Serbu, Pascal Felber, and Peter Kropf
University of Neuchatel, Switzerland
e-mail: [email protected], [email protected]

This chapter presents a model for error avoidance in combination with error correction in peer-to-peer video source networks. Using the model, a tradeoff between error avoidance and error correction can be found, so that the probability of additional packet loss or late data arrival is minimized or, at best, fully avoided. Error avoidance is achieved by using semantic information about the content and the network conditions to adapt the media quality with Multiple Description Coding. The semantic information about the stream is composed of the number, size and type of frames within a GOP. The semantic information about the network is composed of the estimated loss rate of the network path between the sender and the receiver. Multiple Description Coding enables a scalable solution by allowing adaptation without transcoding. A detailed overview of Multiple Description Coding can be found in Section 3. In case error avoidance is not sufficient to deliver the data without packet loss, error correction is used additionally.

The error avoidance and correction model has been evaluated in an overlay peer-to-peer network. Video sources are located based on indexing-caches that contain information about the videos in the network.

We have evaluated our model using the NS-2 [10] network simulator and an extended version of the EvalVid plug-in [5]. We have extended EvalVid to support the evaluation of multiple sub-streams (descriptions) that are delivered within NS-2. The rest of the chapter is structured as follows: Section 2 presents related work, Section 3 gives an overview of Layered Coding and Multiple Description Coding, and Section 4 describes the peer-to-peer overlay and how we efficiently locate videos. Section 5 introduces the error model, followed by Sections 6 and 7 presenting the evaluation of the Multiple Description Coding approach, the stream location mechanism and the distributed streaming between the peers. Finally, Section 8 summarizes the chapter.

2 Related Work

Approaches relying on error avoidance and error correction are not new. Forward error correction is based on the principle of reconstructing data that has been lost during network transmission. Good overviews of forward error correction can be found in [1], [2] and [7]. The problem with all these forward error correction mechanisms is that they do not take into consideration the path of the redundant network packets.

Error avoidance is based on the principle of reducing the packet loss probability by sending parts of the data either from different sources or over parallel paths [9]. In [3] it is shown that Multiple Description Coding in combination with Multiple Source Streaming is able to deliver video streams in much better quality than the classical client-server approach. The main problem of these approaches is that they consider either network or stream characteristics, but none of them considers both.


3 Introduction to Layered Coding

Layered coding is an approach for producing a compressed media stream that consists of multiple dependent or independent layers. The technique for producing dependent layers is called Scalable Coding; the one for producing independent layers is called Multiple Description Coding. The advantage of Scalable Coding is high compression efficiency, while the advantage of Multiple Description Coding lies in high robustness against data loss.

Since our model for error avoidance and error correction in peer-to-peer networks is based on Multiple Description Coding, we give an overview of this technique in Section 3.1.

3.1 Overview of Multiple Description Coding

Multiple Description Coding (MDC) is used to produce multiple independent media streams of the same content. The streams are called descriptions and have roughly the same storage size and influence on the resolution, frame rate or quality. Each of the descriptions can be used independently or in combination with other descriptions. A single description produces the base quality; by combining multiple descriptions, it is possible to improve the resolution, frame rate or quality of the overall bit stream. The highest resolution, frame rate or quality is achieved when all descriptions are used in combination.

The advantage of Multiple Description Coding is the possibility of adapting the media characteristics without transcoding. The adaptation decisions can be influenced by the server capacity, the state of the network or the resources of the playback device. Application scenarios for Multiple Description Coding are manifold. Multiple Description Coding in the temporal domain can be applied to support heterogeneous devices with different frame rates. Devices with sufficient resources get the full frame rate (e.g., 30 frames per second), while devices with limited resources, like mobile devices, receive a limited number of layers, resulting in a lower frame rate (e.g., 15 frames per second).

An application scenario for Multiple Description Coding in the spatial domain is the support of devices with different resolutions. For example, an HDTV set with a resolution of 1920x1080 pixels would need all layers to render the video in high quality without using interpolation, whereas for a smartphone it would be sufficient to receive only the base layer with a resolution of 320x480 pixels, which can be displayed without discarding pixels.

A scenario for Multiple Description Coding in the quality domain is graceful degradation. Graceful degradation is the process of selecting enhancement layers that are not transmitted when the network bandwidth is insufficient. By dropping descriptions, it is possible to adapt the required bandwidth of the stream to the available bandwidth of the network and avoid random loss. In the following sections we explain Multiple Description Coding in the temporal, spatial and quality domains in more detail.
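Graceful degradation by dropping descriptions can be sketched as choosing how many descriptions fit into the available bandwidth. This minimal Python sketch assumes, as a simplification of "roughly the same storage size", that all descriptions have exactly the same bit rate; `descriptions_to_send` is an illustrative name, not part of the authors' implementation.

```python
def descriptions_to_send(available_kbps, description_kbps, total_descriptions):
    """Number of equally sized descriptions that fit into the available bandwidth.

    Assumes every description needs exactly `description_kbps`; the remaining
    descriptions are dropped instead of being sent and lost at random.
    """
    if description_kbps <= 0:
        raise ValueError("description bit rate must be positive")
    fitting = int(available_kbps // description_kbps)
    return max(0, min(total_descriptions, fitting))
```

For instance, with four 1000 kbit/s descriptions and 2500 kbit/s available, two descriptions are sent and the other two are dropped deliberately.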


3.2 Temporal Scalability

Temporal scaling is used to encode a video sequence into multiple descriptions, each having a subset of frames with the same spatial resolution. The lowest frame rate is achieved by decoding any one of the descriptions; adding the remaining descriptions increases the frame rate until the full rate is achieved. A block diagram that shows a simple example of producing two independent descriptions for one stream can be found in Figure 1. The two descriptions can be created very simply by splitting the frames between the descriptions, transforming them using the discrete cosine transform, quantizing them and applying variable-length coding.

Fig. 1 Block diagram for MDC in the temporal domain
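The frame-splitting step of Figure 1 can be sketched as a round-robin assignment of frames to descriptions. This minimal Python sketch treats frames as opaque objects and omits the DCT, quantization and variable-length coding stages; `split_temporal` and `merge_temporal` are illustrative names, not the authors' implementation.

```python
def split_temporal(frames, n=2):
    """Round-robin split: description k receives frames k, k + n, k + 2n, ..."""
    return [frames[k::n] for k in range(n)]


def merge_temporal(received, n=2):
    """Interleave the received descriptions (index -> frames) back into display order.

    Missing descriptions simply leave gaps, so fewer descriptions give a lower frame rate.
    """
    indexed = []
    for k, description in received.items():
        for j, frame in enumerate(description):
            indexed.append((k + j * n, frame))  # original position of this frame
    indexed.sort(key=lambda item: item[0])
    return [frame for _, frame in indexed]
```

Receiving both descriptions restores the full frame sequence, while receiving only one yields every second frame, i.e., half the frame rate.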

3.3 Spatial Scalability

Spatial scaling is used to encode a video sequence into multiple descriptions having the same frame rate, but each of them contributing a part to the full spatial resolution. When only one description is decoded, the spatial resolution of the resulting video is minimal. Decoding additional descriptions increases the spatial resolution towards the full size of the raw video. A block diagram for the encoder can be found in Figure 2. As an example, two descriptions can be created in 6 steps as follows:

1. The raw video is spatially down-sampled, transformed using the DCT and quantized to get the input for the second description.

2. To produce the 2nd description, each frame is reconstructed using inverse quantization and the inverse discrete cosine transform.

3. Each frame is spatially up-sampled to the original size using interpolation.

4. For the 2nd description, each up-sampled frame is subtracted from the original image. This difference is known as the residual.

5. The residual is transformed using the discrete cosine transform and quantized.

6. The coefficients from both descriptions are encoded using variable-length coding.


Fig. 2 Block diagram spatial scalable encoder
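Steps 1, 3 and 4 can be sketched in Python on a single greyscale frame stored as a list of rows. This is a minimal sketch under stated simplifications: the DCT, quantization and variable-length coding are omitted, so step 2 reduces to reusing the down-sampled frame; 2x2 averaging and nearest-neighbour replication are assumed stand-ins for the unspecified down-sampling and interpolation filters. The function names are illustrative.

```python
def downsample(frame):
    """Average each 2x2 block (assumes even frame dimensions)."""
    h, w = len(frame), len(frame[0])
    if h % 2 or w % 2:
        raise ValueError("frame dimensions must be even in this sketch")
    return [[(frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) / 4
             for x in range(0, w, 2)] for y in range(0, h, 2)]


def upsample(frame):
    """Nearest-neighbour replication, standing in for interpolation (step 3)."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def encode_spatial(frame):
    """Return (base description, residual description), without DCT/quantization."""
    base = downsample(frame)                                     # step 1
    up = upsample(base)                                          # step 3
    residual = [[o - u for o, u in zip(ro, ru)] for ro, ru in zip(frame, up)]  # step 4
    return base, residual


def decode_spatial(base, residual=None):
    """Base alone gives a low-resolution picture; adding the residual restores detail."""
    up = upsample(base)
    if residual is None:
        return up
    return [[u + r for u, r in zip(ru, rr)] for ru, rr in zip(up, residual)]
```

Without quantization the reconstruction from both descriptions is exact; in a real encoder the quantized residual would only approximate the original.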

An evaluation of our Multiple Description Coding implementation and the argument why MDC in the temporal domain is preferred over MDC in the spatial domain can be found in Section 7.

4 Peer-to-Peer Overlay

4.1 The Indexing-Cache Overlay

We introduce the distributed indexing architecture that is used by our selection model for network-error treatment.

We consider a peer-to-peer (P2P) system composed of peers (computers) sharing video files. Each peer has a partial view of the file system: it can communicate directly with only a small set of peers, called neighbours. The whole set of peers forms an overlay network.

In our scenario, the user application provides each peer with a set of videos, which can be delivered on request to the other peers. This means that, in order to find a certain video, a peer has to issue a search request in the peer-to-peer system, which is then responsible for efficiently finding the peer(s) that store and provide that video. A single location is enough when the network allows the transmission of the video in the desired quality. However, when multiple peers that have the video are detected, they can participate in the process of selecting the network-error treatment as alternative or cooperative streaming sources.

Intuitively, some videos will be requested much more often than others. Typically, the less popular videos will be available from only a few peers, while the more popular ones will be provided by many peers. However, the success rate of finding a video should not be influenced by its popularity. Thus, mechanisms have to be provided to efficiently locate unpopular videos as well.

In order to ensure a high success rate when searching for both unpopular and popular videos, while keeping the overlay maintenance and network costs low, we use at each peer a simple and dynamic structure called the indexing-cache. This structure contains information about videos that the overlay can deliver. This way, a peer is aware not only of the videos that it can deliver itself, but also of other videos and of the peers that can deliver them. The search time for a video can thus be considerably reduced.

In the overlay, each peer has a neighborhood (i.e., a set of peers that are known to it), from which it periodically collects up-to-date information about videos. This information is then placed or refreshed in its indexing-cache as a pair containing the video and the peer that can deliver it. In order to be able to contact other peers, the neighborhood is also periodically updated. The number of peers in a neighborhood is limited, so a peer has to replace an existing neighbor with a new one. The information from the replaced neighbor is still kept in the indexing-cache; however, since it is no longer refreshed (this neighbor is no longer part of the neighborhood), it now has an increasing age associated with it. The information from the new neighbor is added as up-to-date information. Whenever the size limit of the indexing-cache is reached, the information with the highest age is removed.

An example with 7 peers and their videos can be found in Figure 3. For each peer, we show the videos that it owns: peer A owns video v1, peer B owns videos v2 and v3, and so on. For peer A, we highlight its indexing-cache, which contains a list of the videos located on its neighbors: v4 on peer D and v2 on peer F. Before this configuration, peer C and then peer G used to be neighbors of A. This is why peer A's indexing-cache also lists the videos that peers C and G own, with ages associated with them. When A finds a new neighbor to replace one of its current neighbors, at least the entry for video v4 on peer C will be discarded (since it has the biggest age). The vertical arrow on the left of the indexing-cache of peer A shows the direction of insertion of new index entries.

Fig. 3 Indexing-cache overlay architecture
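The ageing and eviction behaviour can be sketched in Python and replayed on the Figure 3 scenario. This is a minimal sketch assuming one `update` call per periodic refresh round, a cache capacity counted in (video, peer) entries, and ages that increase by one per round for entries that are no longer refreshed; the class name and capacity value are illustrative, not taken from the authors' implementation.

```python
class IndexingCache:
    """Hypothetical indexing-cache: (video, peer) entries with an age per entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.age = {}  # (video, peer) -> age; 0 means refreshed in the latest round

    def update(self, neighbourhood):
        """One periodic round; `neighbourhood` maps current neighbours to their videos."""
        current = {(v, p) for p, videos in neighbourhood.items() for v in videos}
        for key in self.age:          # entries that are no longer refreshed get older
            if key not in current:
                self.age[key] += 1
        for key in current:           # neighbours' videos are (re)inserted as up to date
            self.age[key] = 0
        while len(self.age) > self.capacity:   # evict the oldest information first
            oldest = max(self.age, key=self.age.get)
            del self.age[oldest]

    def lookup(self, video):
        """Peers known to deliver `video`, freshest information first."""
        hits = [(a, p) for (v, p), a in self.age.items() if v == video]
        return [p for _, p in sorted(hits)]
```

Replaying Figure 3 with a capacity of five entries: peer A first has neighbours C and D, then G replaces C, then F replaces G. The entries from G then have age 1 and the entry for v4 on C has age 2; when a further neighbour is added, the v4 entry on C is the one evicted.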

4.2 Indexing-Cache Maintenance

Algorithm 1 presents the pseudo-code for finding a new neighbor. In order to avoid network partitions, each node keeps track of its number of incoming links (computed from the requests with different sources that it receives), and the new neighbor is chosen as the peer from the random walk that has the smallest number of incoming links. This strategy provides strong connectivity between the peers of the system, with a non-biased in-degree (which also provides load balancing).

Algorithm 1. Pseudo-code for the neighborhood update algorithm at peer pi

1: Find new neighbor:
2:   if incLinks(pi) < RW.incLinksMin then
3:     RW.incLinksMin ← incLinks(pi)
4:     RW.newNeighbor ← pi
5:   end if
6:   Add pi to RW.nodes
7:   RW.length++
8:   if RW.length < RW.maxLength then
9:     Choose rn ∈ neigh(pi) with rn ∉ RW.nodes
10:    Forward request to rn
11:  else
12:    Reply with RW.newNeighbor
13:  end if

The neighborhood update process works as follows. A peer issues a random walk to find a new neighbor. Peer pi is any peer the random walk goes through. The random walk message keeps track of the node (RW.newNeighbor) with the smallest number of incoming links (RW.incLinksMin). If pi has a smaller number of incoming links, these values are updated (lines 1-5). Node pi is added to the list of peers that the random walk went through (line 6) and the length of the random walk is increased (line 7). Then, the request is forwarded to a randomly chosen neighbor rn, excluding the peers that the random walk has already gone through (lines 8-10). The last node in the random walk is in charge of sending a reply containing the new neighbor (lines 11-12). The new neighbor is thus the peer with the smallest number of incoming links from the whole random walk.
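Algorithm 1 can be simulated centrally in Python, with the random walk message represented by local variables. This is a sketch under assumptions: the overlay is a dictionary from each peer to its neighbours, incoming-link counts are given, the issuing peer is never chosen as its own neighbour, and a walk that reaches a dead end (no unvisited neighbour, a case Algorithm 1 does not cover) replies early. The function name is illustrative.

```python
import random


def find_new_neighbour(graph, in_links, issuer, max_length, rng):
    """Simulate the random walk of Algorithm 1 started by `issuer`.

    graph: peer -> list of neighbours; in_links: peer -> number of incoming links.
    Returns the visited peer with the fewest incoming links (RW.newNeighbor).
    """
    rw_nodes = []                                  # RW.nodes; its length is RW.length
    new_neighbour, inc_links_min = None, float("inf")
    current = rng.choice(graph[issuer])            # first hop of the walk
    while True:
        if in_links[current] < inc_links_min:      # lines 2-5
            inc_links_min = in_links[current]
            new_neighbour = current
        rw_nodes.append(current)                   # lines 6-7
        if len(rw_nodes) >= max_length:            # lines 11-12: last node replies
            return new_neighbour
        candidates = [p for p in graph[current] if p != issuer and p not in rw_nodes]
        if not candidates:                         # dead end, not covered by Algorithm 1
            return new_neighbour
        current = rng.choice(candidates)           # lines 9-10
```

Passing an explicit `random.Random` instance keeps simulations reproducible; for a chain A-B-C-D where C has the fewest incoming links, a walk of length 3 from A returns C.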

In order to accommodate the new neighbor in the local view, an existing neighbor has to be removed; for efficiency, this is the node that was used to send the random walk.


4.3 Searching

Given that each peer has accurate knowledge of the videos stored in its neighborhood and (possibly outdated) information about videos from other peers, we use random walks of finite length for the search procedure. This method is expected to perform in practice as well as TTL-limited flooding, but with much less generated traffic. To search for a video, a peer sends a video request to a random neighbor, which checks its indexing-cache for the video and, if the video is not found, repeats the process by forwarding the request to a random neighbor. The periodic neighborhood updates give the peers a high diversity in their indexing-caches, which makes the random walk a simple strategy that is expected to find the requested video in a small number of hops.

In order for the search result to contain multiple peers that have the requested video, the indexing-cache can contain several locations for a video; also, the random walk can finish only when a certain number of locations have been found, without exceeding the maximum random walk length. If needed, several random walks can be issued. More than two peers having the same video are necessary to enable multiple source streaming (in case of error avoidance), as well as delivering correction streams from alternative sources (in case of error correction).
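The search can be sketched as a random walk that accumulates locations until enough sources are known or the maximum length is reached. This Python sketch assumes a simplified indexing-cache that maps each video to the peers able to deliver it, ignores ages, and uses illustrative names (`search_video`, `wanted`, `max_length`) that are not from the authors' implementation.

```python
import random


def search_video(video, start, caches, neighbours, wanted, max_length, rng):
    """Random-walk search collecting up to `wanted` distinct locations of `video`.

    caches: peer -> {video: [peers that can deliver it]} (simplified indexing-caches).
    neighbours: peer -> list of neighbours.
    """
    found = []
    current = rng.choice(neighbours[start])        # the request goes to a random neighbour
    for _ in range(max_length):
        for peer in caches.get(current, {}).get(video, []):
            if peer not in found:
                found.append(peer)
        if len(found) >= wanted:                   # enough sources for MDC or correction
            break
        current = rng.choice(neighbours[current])  # forward to a random neighbour
    return found
```

If fewer than `wanted` locations are returned, the requesting peer can issue a further random walk, as described above.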

4.4 Churn and Video Updates

The overlay deals easily with peer failure. When a peer from the indexing-cache fails, its corresponding entries are discarded. When a neighbor fails to respond, it is simply replaced with another peer from the overlay.

To join the overlay, a new peer simply issues a random walk of a fixed size and then picks from the path the peers with the smallest number of incoming links, with the purpose of reducing the risk of network partitioning. After joining, to ensure an initial degree of reachability, the peer can advertise its videos through a number of random walks. Then, its videos will gradually become more reachable through the periodic indexing-cache updates of the other peers in the system.

The video information of any peer can change over time, if the peer obtains more videos from the application level or if some of its videos are deleted. In such a case, for quick propagation of the information, the peer should notify the peers that have it as a neighbor so that they can update their indexing-caches.

5 The Model for Error Avoidance and Error Correction in Peer-to-Peer Networks

When using the indexing approach described in Section 4, the network bandwidth between the sender and the receiver is not taken into consideration. In order to deliver the content in the desired quality, it might be necessary to apply (1) error correction, (2) error avoidance or (3) a combination of both approaches. In this section we present a model to select between these three alternatives based on current network and content characteristics. The model combines two measures called Quality Probability and Network Probability. The combination of Quality Probability and Network Probability is called Success Probability.

$$\mathit{SuccessProbability} = \mathit{QualityProbability} \cdot \prod_{i=1}^{M} \mathit{NetworkProbability}_i \qquad (1)$$

where $M$ is the number of streaming peers and $\mathit{NetworkProbability}_i$ represents the probability of successfully sending packets between peer $i$ and the receiver. QualityProbability represents the probability that network errors are not propagated within the video stream. The combination of both (i.e., the Success Probability) can take values between 0 and 1.
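Equation (1) translates directly into code. This minimal Python sketch only evaluates the formula; how QualityProbability itself is obtained is not modelled here, and `success_probability` is an illustrative name.

```python
import math


def success_probability(quality_probability, network_probabilities):
    """Equation (1): QualityProbability times the product over the M streaming peers."""
    for p in (quality_probability, *network_probabilities):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return quality_probability * math.prod(network_probabilities)
```

Because the network terms are multiplied, every additional streaming peer whose path cannot fully carry its share lowers the overall success probability.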

5.1 Network Probability

Network probability is used to select between alternative peers based on the available bandwidth to the receiver and the bit rate of the video stream. We calculate the available bandwidth between the sender and the receiver based on the "TCP-friendliness" formula obtained from [6]:

$$\mathit{AvailableBandwidth} = \frac{s}{t_{RTT}\sqrt{\frac{2p}{3}} + t_{RTO}\left(3\sqrt{\frac{3p}{8}}\right)p\left(1 + 32p^{2}\right)} \qquad (2)$$

where $s$ is the packet size, $t_{RTT}$ is the round-trip time, $p$ is the packet loss probability and $t_{RTO}$ is the TCP retransmit timeout value.
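Equation (2) can be evaluated as follows. This is a sketch that follows the formula exactly as printed above, which matches the TFRC throughput equation of RFC 5348 with one packet acknowledged per ACK. The function name, the unit choice (bytes and seconds) and the example path parameters are assumptions.

```python
import math

def tcp_friendly_bandwidth(s, t_rtt, p, t_rto):
    """Equation (2): TCP-friendly throughput estimate.

    s: packet size (e.g. bytes); t_rtt, t_rto: seconds; p: loss probability.
    Returns the throughput in units of s per second.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]; the formula is undefined for p = 0")
    denominator = (t_rtt * math.sqrt(2 * p / 3)
                   + t_rto * (3 * math.sqrt(3 * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denominator

# Hypothetical path: 1000-byte packets, 100 ms RTT, 1 % loss, 400 ms RTO.
bytes_per_second = tcp_friendly_bandwidth(1000, 0.1, 0.01, 0.4)
print(f"{bytes_per_second * 8 / 1000:.0f} Kbit/s")
```

Note that the estimate is undefined for a loss-free path (p = 0); in practice a small lower bound on p would have to be used, which is an implementation choice not specified here.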

Network Probability is calculated as the ratio between the available bandwidth and the required bit rate, capped at 1:

$$\mathit{NetworkProbability} = \min\left(1, \frac{\mathit{AvailableBandwidth}}{\mathit{RequiredBandwidth}}\right)$$

where AvailableBandwidth is the TCP-friendly available bandwidth (Equation 2) between the sender and the receiver and RequiredBandwidth is the bit rate of the (partial) video stream. Network Probability can take values between 0 and 1. If the available bandwidth from the sender to the receiver is sufficient to deliver the content without loss, Network Probability has the value of 1.
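A direct sketch of the Network Probability formula; the function name is hypothetical, and both bandwidths are assumed to be given in the same unit (e.g. Kbit/s).

```python
def network_probability(available_bandwidth, required_bandwidth):
    """Ratio of TCP-friendly available bandwidth to the stream bit rate,
    capped at 1 (both arguments in the same unit, e.g. Kbit/s)."""
    if required_bandwidth <= 0:
        raise ValueError("required_bandwidth must be positive")
    return min(1.0, available_bandwidth / required_bandwidth)

# Alternative A from Scenario 1 (Table 5): 1257 Kbit/s available, 1462 Kbit/s needed.
print(round(network_probability(1257, 1462), 2))
```

For alternative A this rounds to 0.86, the value listed in Table 5; any path with more bandwidth than the stream needs yields exactly 1.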

5.2 Quality Probability

Quality probability expresses the probability that all video frames that arrive at the receiver can be decoded successfully. This probability depends on (1) the number of lost packets and (2) the type of frames affected by the packet loss. MPEG coded video streams [8] consist of three main frame types, I, P and B [4]. I-frames (intra-coded frames) have the advantage of being self-contained and allowing random access. They have the disadvantage that their compression rate is usually much lower than that of P- or B-frames. P-frames (predictive-coded frames) have a better compression ratio than I-frames, but encoding and decoding requires information from the previous I- or P-frame. The third type of frames are bidirectionally predictive-coded frames (B-frames). The advantage of B-frames is that they have the highest compression ratio compared to I- and P-frames, but they additionally depend on one preceding and one succeeding frame in the Group of Pictures (GOP). B-frames are always preceded by an I- or P-frame and succeeded by a P-frame. Quality probability is therefore used to take the structure of the stream into account in addition to the loss rate of the network (NetworkProbability). For example, a stream encoded using only I-frames that loses many packets usually results in better quality than the same content streamed at a lower bit rate but encoded using I-, P- and B-frames. Different packet losses have different effects on the media quality, and thus error handling has to be adapted to the relative importance of the frames.

The model requires knowing the number of network packets belonging to each video frame as well as the loss probability of the network path. The loss probability can be expressed in terms of the ratio of received to transmitted packets:

$$\mathit{LossProbability} = 1 - \frac{\mathit{PacketsReceived}}{\mathit{PacketsTransmitted}} \qquad (3)$$

The number of received packets (PacketsReceived) is determined by sending test packets from the sender to the receiver. The number of packets to be transmitted can be calculated by parsing the structure of the stream.

The LossProbability takes values between 0 and 1: 0 means that all packets are received, while 1 means that all packets are lost. Knowing the LossProbability of the path and the number of packets belonging to a video frame, the arrival probability, which is the probability of successfully receiving one single frame, can be calculated using the binomial distribution as follows:

$$ap(T, F, p) = \sum_{i=T}^{T+F} \binom{T+F}{i} (1-p)^{i}\, p^{T+F-i} \qquad (4)$$

where $T$ is the number of network packets of the frame, $F$ is the number of forward error correction packets and $p$ is the loss probability of the path (defined in Equation 3). A frame can be reconstructed if at least $T$ of its $T + F$ packets arrive.
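The loss estimate and the arrival probability can be sketched as below. The sketch assumes independent packet losses (the binomial model) and an ideal erasure code, so that any $T$ of the $T + F$ packets suffice to rebuild the frame; the function names and the test-packet counts are hypothetical.

```python
from math import comb

def loss_probability(packets_received, packets_transmitted):
    """Share of test packets that did not arrive (0 = none lost, 1 = all lost)."""
    if packets_transmitted <= 0:
        raise ValueError("packets_transmitted must be positive")
    return 1.0 - packets_received / packets_transmitted

def arrival_probability(t, f, p):
    """Equation (4): probability that at least t of the t + f packets of a
    frame arrive, assuming each packet is lost independently with probability p."""
    n = t + f
    return sum(comb(n, i) * (1 - p) ** i * p ** (n - i) for i in range(t, n + 1))

p = loss_probability(packets_received=95, packets_transmitted=100)
print(arrival_probability(t=10, f=0, p=p))  # frame of 10 packets, no FEC
print(arrival_probability(t=10, f=2, p=p))  # same frame with 2 FEC packets
```

With $F = 0$ the sum reduces to $(1-p)^T$, i.e., every packet of the frame must arrive; each added FEC packet adds one more tolerated loss.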

Computing the arrival probability (ap) for one single frame is not sufficient for selecting between streams from alternative peers. Videos have playback times ranging from several seconds to hours and thus analyzing the complete structure would take too long. However, the fact that video streams are organized in subsequent groups of pictures (GOPs) can be used to simplify calculations. In the test streams used in our experiments each GOP follows the same frame pattern, "IBBP...", providing sufficient information to make predictions about the complete video.


In order to model packet loss for a group of pictures, I-, P- and B-frames must be analyzed separately as they have different sizes and dependencies:

$$ap_I = ap(N_I, F_I, p), \qquad ap_P = ap(N_P, F_P, p), \qquad ap_B = ap(N_B, F_B, p) \qquad (5)$$

where $ap_I$, $ap_P$, $ap_B$ are the probabilities that I-, P- and B-frames are not lost, $N_I$, $N_P$, $N_B$ are the numbers of packets for each type of frame, $F_I$, $F_P$, $F_B$ are the numbers of forward error correction packets used and $p$ is the LossProbability (defined in Equation 3).

The probability (QualityProbability) of being able to successfully decode all frames belonging to the GOP is defined as:

$$\mathit{QualityProbability} = ap_I \cdot ap_P^{\,C_P} \cdot ap_B^{\,C_B}$$

where $C_P$ is the total number of P-frames and $C_B$ is the total number of B-frames in the GOP. In order to determine the required amount of forward error correction packets ($F_I$, $F_P$, $F_B$) for the I-, P- and B-frames, we compute the arrival probability for each frame separately. The necessity of doing so is explained with an example of two frames. Consider an I-frame with an arrival probability of 50% and a dependent P-frame with an arrival probability of 100%. Taking into account the dependency between the I-frame and the P-frame, the P-frame can also only be used with a probability of 50%. Sending correction packets for the P-frame would be useless; by using the equations explained in the rest of this section, it can be seen that it is the I-frame that needs to be protected.
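The QualityProbability formula can be sketched as follows, assuming that $C_P$ and $C_B$ are counted from a frame-type string for one GOP containing exactly one I-frame. The function name and the GOP string format are hypothetical.

```python
def quality_probability(ap_i, ap_p, ap_b, gop_pattern):
    """QualityProbability = ap_I * ap_P**C_P * ap_B**C_B for one GOP.

    gop_pattern is a frame-type string such as "IBBPBBPBB"; exactly one
    I-frame per GOP is assumed.
    """
    if gop_pattern.count("I") != 1 or set(gop_pattern) - {"I", "P", "B"}:
        raise ValueError("expected one I-frame and only I/P/B frame types")
    c_p = gop_pattern.count("P")  # C_P: number of P-frames in the GOP
    c_b = gop_pattern.count("B")  # C_B: number of B-frames in the GOP
    return ap_i * ap_p ** c_p * ap_b ** c_b

print(quality_probability(0.99, 0.97, 0.95, "IBBPBBPBB"))
```

Because $ap_P$ and $ap_B$ are raised to the number of frames of that type, even small per-frame loss probabilities compound quickly over long GOPs.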

The computation of the arrival probability ($R_I$) for the I-frame is simple because no dependencies need to be considered:

$$R_I = ap_I \qquad (6)$$

The dependencies of P- and B-frames are considered in the rest of this section. When computing the arrival probability for P-frame $i$, the dependencies on the I-frame and all previous P-frames have to be considered:

$$R_{P(i)} = \begin{cases} R_I \cdot ap_P & \text{if } i = 1,\\ R_{P(i-1)} \cdot ap_P & \text{if } i > 1 \end{cases} \qquad (7)$$

where $P(i)$ is the $i$th P-frame in the GOP. In the case of the first P-frame in the GOP ($i = 1$), only the probability of successfully decoding the I-frame and the P-frame itself is considered. If $i > 1$, the dependencies on all previous P-frames are also included.

As B-frames depend on I- and P-frames, the probability of arrival of a B-frame at position $j$ is calculated as:

$$R_{B(j)} = R_{P(k)} \cdot ap_B \qquad (8)$$


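where $B(j)$ is the $j$th B-frame in the GOP and $P(k)$ is the immediately succeeding P-frame that it references. Because $R_{P(k)}$ already includes the I-frame and all preceding P-frames, the preceding reference frame of $B(j)$ is covered as well.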
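Equations (6) to (8) can be combined into a single pass over a GOP. The sketch below assumes the GOP is given as a display-order string with one leading I-frame and that every B-frame is followed by a P-frame inside the same GOP (as stated above); the function name is hypothetical.

```python
def frame_arrival_probabilities(gop_pattern, ap_i, ap_p, ap_b):
    """Equations (6)-(8): effective arrival probability R of every frame in
    one GOP, given as a display-order string such as "IBBPBBP"."""
    if not gop_pattern.startswith("I") or gop_pattern.count("I") != 1:
        raise ValueError("expected exactly one leading I-frame")
    r = [0.0] * len(gop_pattern)
    r[0] = ap_i                      # Equation (6): R_I = ap_I
    chain = ap_i
    for idx, frame in enumerate(gop_pattern):
        if frame == "P":             # Equation (7): depends on I and all earlier P
            chain *= ap_p
            r[idx] = chain
        elif frame not in "IB":
            raise ValueError(f"unknown frame type {frame!r}")
    next_p = None                    # R of the nearest following P-frame
    for idx in range(len(gop_pattern) - 1, 0, -1):
        if gop_pattern[idx] == "P":
            next_p = r[idx]
        elif gop_pattern[idx] == "B":
            if next_p is None:
                raise ValueError("B-frame without a succeeding P-frame in the GOP")
            r[idx] = next_p * ap_b   # Equation (8): R_B(j) = R_P(k) * ap_B
    return r

# The two-frame example from the text: I-frame 50 %, P-frame 100 %.
print(frame_arrival_probabilities("IP", 0.5, 1.0, 1.0))  # [0.5, 0.5]
```

The output reproduces the two-frame example: the P-frame inherits the 50% of the I-frame, so FEC packets are better spent on the I-frame than on the P-frame.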

6 Evaluation

We first evaluate our Multiple Description Coding implementation (Section 7) and argue why we prefer MDC in the temporal domain over MDC in the spatial domain; then we evaluate the efficiency of our video location mechanism (i.e., the indexing-cache overlay, Section 7.1); and finally, in Section 7.2, we present two scenarios for streaming the content to the end client.

7 Evaluation of Multiple Description Coding in the Temporal and Spatial Domain

In this section we evaluate two characteristics of Multiple Description Coding, namely the additional storage/bandwidth requirement resulting from lower redundancy within each of the descriptions, as well as the graceful quality degradation capability. In the first experiment we have measured the storage/bandwidth overhead of Multiple Description Coding in the temporal domain (see Table 1), where the MDC streams are composed of two descriptions. The table shows the storage size of the conventional stream, the sizes of the two descriptions and the overhead of the two descriptions compared to the original stream.

Table 1 Storage overhead for MDC in the temporal domain

File name | Original stream size | Descr. 1 size | Descr. 2 size | Overhead (Descr. 1 + Descr. 2)
bridge | 1.9MB | 998KB | 998KB | 5.05%
carphone | 382KB | 213KB | 210KB | 10.70%
clair | 226KB | 125KB | 123KB | 0.97%
coastguard | 424KB | 265KB | 255KB | 22.60%
container | 208KB | 117KB | 109KB | 8.65%
foreman | 477KB | 283KB | 267KB | 15.30%
grandma | 511KB | 276KB | 276KB | 8.02%
highway | 1.5MB | 802KB | 783KB | 5.60%
lotrings | 2.4MB | 1.3MB | 1.3MB | 8.30%
mother | 168KB | 93KB | 87KB | 7.14%
news | 274KB | 157KB | 147KB | 10.94%
salesman | 352KB | 197KB | 198KB | 12.21%
silent | 284KB | 156KB | 148KB | 7.04%

Analyzing the results from Table 1, it can be seen that for the 13 test streams the average storage/bandwidth overhead is 8.88%. In the best case the overhead is only 0.97%, in the worst case 22.6%.
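The overhead column can be reproduced from the size columns, assuming it is defined as the combined size of both descriptions minus the original size, relative to the original size. A minimal sketch (function name hypothetical):

```python
def mdc_overhead_percent(original_size, description_sizes):
    """Extra storage/bandwidth of all descriptions relative to the single
    (conventional) stream, in percent. Sizes must share one unit (e.g. KB)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return (sum(description_sizes) - original_size) / original_size * 100.0

# coastguard row of Table 1: 424 KB original, descriptions of 265 KB and 255 KB.
print(f"{mdc_overhead_percent(424, [265, 255]):.1f} %")
```

For the coastguard row this gives 22.6%, matching the worst case reported above.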


Table 2 Mean Opinion Score values using MDC in the temporal domain

File name | Original stream (MOS) | Description 2 (MOS)
bridge | 1.08 | 3.0
carphone | 1.47 | 3.0
clair | 2.57 | 4.0
coastguard | 1.34 | 2.85
container | 1.82 | 3.0
foreman | 1.28 | 3.07
grandma | 2.13 | 4.0
highway | 1.13 | 3.86
lotrings | 1.19 | 4.32
mother | 1.83 | 4.0
news | 1.98 | 4.34
salesman | 1.98 | 4.79
silent | 1.66 | 4.26

In the next experiment we skip Description 1 and compare the quality against the effect of randomly losing the same amount of data from the single stream. Due to the graceful degradation, MDC-based streaming achieves a much higher Mean Opinion Score (MOS). MOS is a measure representing the satisfaction of an end user receiving a video stream; on this scale the best quality is 5 and the worst quality is 1. Table 2 shows that receiving only Description 2 always yields a better result than losing the same percentage of data randomly from the single stream. In summary, the quality of the 13 test streams that were encoded using MDC in the temporal domain was at least 23.6%, on average 41.58% and in the best case 62.6% better than the quality of the corresponding original video streams transmitted under the same conditions.

Evaluation of Multiple Description Coding in the spatial domain

Similarly to multiple description coding in the temporal domain, we have evaluated the additional storage space/bandwidth requirement for multiple description coding in the spatial domain. The evaluation results can be found in Table 3. When these results are compared to the results for MDC in the temporal domain (Table 1), it can be seen that the storage/bandwidth overhead of the temporal multiple description encoder is at least 15.7%, on average 37.94% and in the best case 72.53% lower than that of the spatial multiple description encoder.

In the last experiment the loss of Description 1 is compared against randomly losing the same amount of data from the original stream. Analyzing the results from Table 4, it can be seen that losing one description still yields a better result than sending the original stream in full quality and randomly losing the same amount of data. In summary, the quality of the 13 test streams that were encoded using MDC in the temporal domain was at least 7.8%, on average 16.8% and in the best case 18.4% better than the quality resulting from applying MDC to the same streams in the spatial domain under the same conditions.

Table 3 Spatial MDC downsampling results

File name | Original stream size | Descr. 1 size | Descr. 2 size | Overhead (Descr. 1 + Descr. 2)
bridge | 1.9MB | 1.2MB | 1.3MB | 31.58%
carphone | 382KB | 292KB | 311KB | 57.58%
clair | 226KB | 216KB | 230KB | 95.13%
coastguard | 424KB | 291KB | 298KB | 41.51%
container | 208KB | 131KB | 137KB | 29.81%
foreman | 477KB | 402KB | 437KB | 75.68%
grandma | 511KB | 339KB | 376KB | 39.33%
highway | 1.5MB | 989KB | 1100KB | 39.27%
lotrings | 2.4MB | 697MB | 713MB | 16.67%
mother | 168KB | 143KB | 151KB | 76.79%
news | 274KB | 193KB | 201KB | 43.8%
salesman | 352KB | 223KB | 235KB | 30.11%
silent | 284KB | 210KB | 216KB | 50.7%
superman | 14MB | 9.3MB | 11MB | 21.43%
f1-canada | 7.8MB | 5.9MB | 6.5MB | 61.54%
davinci | 5.8MB | 3.7MB | 4.3MB | 37.93%

Table 4 Spatial MDC downsampling results

File name | Throughput (Kbit/s) | Original stream (MOS) | Description 2 (MOS)
bridge | 431 | 1.08 | 3.0
carphone | 372 | 1.47 | 2.26
clair | 111 | 2.57 | 3.29
coastguard | 476 | 1.34 | 1.98
container | 182 | 1.82 | 2.76
foreman | 449 | 1.28 | 1.76
grandma | 310 | 2.13 | 3.28
highway | 375 | 1.13 | 2.35
lotrings | 1003 | 1.19 | 4.71
mother | 161 | 1.83 | 3.28
news | 210 | 1.98 | 2.32
salesman | 210 | 1.98 | 3.01
silent | 210 | 1.66 | 2.63

Conclusion from evaluating our Multiple Description Coding

Due to the much lower storage space and bandwidth overhead as well as the better quality of the streams under loss, we use multiple description coding in the temporal domain.


7.1 Searching the Overlay

In order to evaluate the success rate for both unpopular and popular videos, we have executed an experiment with 1,000 peers and 643 videos, where each peer has 5 neighbors and an indexing-cache of up to 50 entries. In the indexing-cache there can be up to 2 entries per movie. Each entry of the indexing-cache specifies a location, i.e., a peer that provides the movie. The association of videos to peers follows a Zipf distribution with α=1. Each peer issues a search request for all videos in the form of two random walks, each one with a maximum length of 20 hops. The search procedure stops when at least one location for the requested movie has been found.
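A small simulation in the spirit of this setup can be sketched as follows. Several details are assumptions rather than the authors' simulator: replicas per video scale as num_peers / rank^α (at least one copy), neighbor lists are drawn uniformly at random, each indexing-cache is filled once from the neighbors' videos in popularity order (the real system uses aging-based replacement), and all function names are hypothetical. The parameters are scaled down from the experiment to keep the example fast.

```python
import random

def build_overlay(num_peers, num_videos, degree, alpha, rng):
    """Assumed setup: video of popularity rank r is stored on about
    num_peers / r**alpha peers (at least one); each peer links to
    `degree` random other peers."""
    holdings = {peer: set() for peer in range(num_peers)}
    for rank in range(1, num_videos + 1):
        copies = max(1, int(num_peers / rank ** alpha))
        for peer in rng.sample(range(num_peers), copies):
            holdings[peer].add(rank)
    neighbors = {
        peer: rng.sample([q for q in range(num_peers) if q != peer], degree)
        for peer in range(num_peers)
    }
    return holdings, neighbors

def build_indexing_caches(holdings, neighbors, max_entries=50, per_video=2):
    """Each peer caches locations of its neighbors' videos: at most max_entries
    entries and per_video locations per video (fill order is an assumption)."""
    caches = {}
    for peer, outs in neighbors.items():
        cache, entries = {}, 0
        for neighbor in outs:
            for video in sorted(holdings[neighbor]):
                if entries == max_entries:
                    break
                locations = cache.setdefault(video, [])
                if len(locations) < per_video:
                    locations.append(neighbor)
                    entries += 1
        caches[peer] = cache
    return caches

def search(requester, video, holdings, neighbors, caches, walks=2, max_hops=20, rng=random):
    """Up to `walks` random walks of at most `max_hops` hops; stop at the first
    peer that owns the video or holds a cached location for it."""
    def lookup(peer):
        if video in holdings[peer]:
            return peer
        cached = caches[peer].get(video)
        return cached[0] if cached else None

    for _ in range(walks):
        current = requester
        found = lookup(current)
        for _hop in range(max_hops):
            if found is not None:
                return found
            current = rng.choice(neighbors[current])
            found = lookup(current)
        if found is not None:
            return found
    return None

rng = random.Random(1)
holdings, neighbors = build_overlay(num_peers=200, num_videos=120, degree=5, alpha=1.0, rng=rng)
caches = build_indexing_caches(holdings, neighbors)
found = sum(search(peer, 120, holdings, neighbors, caches, rng=rng) is not None
            for peer in range(200))
print(f"least popular video found by {found} of 200 peers")
```

Whether the numbers resemble Figure 4 depends on the assumed replica scaling and cache fill order, so the sketch is meant to show the mechanics of the indexing-cache lookup, not to reproduce the reported results.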

Fig. 4 Request success per video [plot: number of peers (vertical axis, 0 to 1000) versus videos in order of popularity (horizontal axis, 0 to 600); curves: (A) video popularity, (B) random network, (C) random network with cache, (D) indexing-cache overlay]

The results are presented in Figure 4, where we show the request success for each video. The horizontal axis represents the videos, in order of popularity (most popular movies on the left side). The vertical axis shows the number of peers that successfully find the specified movie. For comparison purposes, we have included in the figure the success rate of the search procedure for the following cases (the legend, from top to bottom):

(A) local search only, which is actually the Zipf distribution of the videos (i.e., the number of peers that have a certain video);

(B) random walks in a random network, with no notion of a cache;

(C) random walks in a random network, where each peer stores locally, in a cache, the results (i.e., locations) of the video requests that it has issued;

(D) random walks in the indexing-cache overlay, where the information from the neighbors (not the search results as before) is cached.

The caches of (C) and (D) have the same size and they use the same aging process as replacement policy. The particularity of these two cases is that whenever a random walk containing a video request arrives at a peer that does not own the video, the peer searches for the video in its cache.

Fig. 5 Average number of found locations for each movie [plot: number of found locations (vertical axis, 0 to 250) versus videos in order of popularity (horizontal axis, 0 to 600); curves: random walk of length 10, random walk of length 20, actual number of locations; a second panel shows the same curves with the vertical axis limited to 0 to 10]

The results for the indexing-cache overlay (D) show that the popular videos are always found and, moreover, most of the unpopular videos are found by at least half of the peers.

Under the same overlay configuration, we have done an experiment that shows the number of locations found for each movie during the search procedure. This time, the search stops only when the maximum random walk length has been reached (otherwise, there would be at most 2 hits). Again, each movie is requested from all peers, and we have computed the average number of locations found in the request path using a random walk of length 10 and 20, respectively. The results are shown in Figure 5, where we have also added, for comparison purposes, the number of real locations of each movie in the overlay. As expected, popular movies are found in more locations, while less popular movies are found in a smaller number of locations. The advantage is that the search procedure already returns multiple locations for the movies that are in at least 2 or 3 locations.

Figure 6 shows an analysis of the request success rate in a random network and in the indexing-cache overlay, while varying the number of issued random walks and their length. The z-axis shows the success rate as the percentage of times where at least one location of the requested video was found. The experiments were done for 500 peers and 380 videos with a popularity according to a Zipf distribution with α=1. Each peer requests each video, making 190,000 requests in total. For both the indexing-cache overlay and the random network, a larger random walk length for the same number of random walks gives a higher request success rate than a larger number of random walks for the same random walk length, since in the latter case the same nodes might be visited repeatedly, which is useless. As can be seen from the figure, the indexing-cache overlay returns a much higher request success rate than the random overlay, even for low values of the random walk length.


Fig. 6 Request success in a random network [3-D plot: success rate in % (0 to 100) versus number of random walks (1 to 4) and random walk length (1 to 8); surfaces: indexing-cache overlay and random network]

7.2 Streaming Scenarios

In this part we show that our model is able to find the best alternative among error correction, error avoidance and a combination of both approaches. To keep the examples comprehensible we pick a small subset of peers. The stream used consists of two descriptions encoded using MDC in the temporal domain. The full stream has a rate of 1462 Kbit/s when it is encoded using I-frames only and 1081 Kbit/s when it is encoded using I-, P- and B-frames.

The experiments have been performed using the network simulator NS-2 [10] and a plug-in called EvalVid [5]. Data streams from multiple servers are merged and forwarded as one single stream to the player.

7.2.1 Scenario 1

The following example illustrates the necessity of combining Network Probability and Quality Probability. The content is provided by two alternative peers in different qualities (Alternatives A and B, see Figure 7). The question is which peer to select as streaming source. Alternative A is encoded using only I-frames; alternative B is encoded using I-, P- and B-frames. Calculating only Network Probability (Section 5.1) yields a better result for alternative B (Table 5). When Success Probability (Equation 1) is calculated (Table 6), it can be seen that alternative A yields a better result (because of its higher QualityProbability). In order to verify the success probability calculation of our model, both decisions are simulated. For comparing the alternative qualities, the Mean Opinion Score (MOS) metric [5] is used again. Sending both streams shows that considering Network Probability alone is not sufficient and that selecting alternative B would have been the wrong decision. The MOS values of alternatives A and B are 4.02 and 3.50, respectively (see Table 6). Alternative B (the one with the lower bit rate) scores worse because of the temporal dependencies on the frames that were lost.

Table 5 Stream and Network Characteristics

Alternative | Bit rate (Kbit/s) | Available bandwidth (Kbit/s) | Network Probability
A | 1462 | 1257 | 0.86
B | 1081 | 989 | 0.90

Fig. 7 Logical view - Scenario 1

Table 6 Success Probability - Scenario 1

Alternative | Success Probability | MOS
A | 0.66 | 4.02
B | 0.5 | 3.50

It can be seen that our model is able to select the better alternative. This smallexample is used to show that computing the ratio between the available bandwidthand the bit rate of the stream is not sufficient to decide between alternative streamingsources, because the structure of the stream has a strong influence on the resultingquality.
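The decision step itself amounts to picking the alternative with the highest Success Probability. The sketch below is illustrative: the function name is hypothetical, the Network Probability values come from Table 5, and the Quality Probability values are approximate figures back-computed from Table 6 (0.66 / 0.86 and 0.5 / 0.90), not values reported by the model.

```python
def choose_alternative(alternatives):
    """Pick the alternative with the highest Success Probability (Equation 1).

    alternatives maps a name to (quality_probability, [network probabilities]).
    """
    def success(item):
        quality, network_probabilities = item[1]
        result = quality
        for p in network_probabilities:
            result *= p
        return result

    name, _ = max(alternatives.items(), key=success)
    return name

# NetworkProbability from Table 5; QualityProbability values are approximate,
# back-computed from Table 6.
scenario_1 = {"A": (0.77, [0.86]), "B": (0.56, [0.90])}
print(choose_alternative(scenario_1))

# Ignoring stream structure (QualityProbability = 1 for both) flips the choice.
network_only = {"A": (1.0, [0.86]), "B": (1.0, [0.90])}
print(choose_alternative(network_only))
```

The two calls show the point of this scenario: ranking by Network Probability alone selects B, while including Quality Probability selects A.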

7.2.2 Scenario 2

In the second scenario it is assumed that two peers provide the requested content in the same quality. The problem is that the network bandwidth is not sufficient to send any description without loss. Both network paths to the receiver have an average bandwidth of 300 Kbit/s, while the required bandwidths for sending description 1 and description 2 are 539 Kbit/s and 542 Kbit/s, respectively. The question is whether to send one description stream from each of the peers (and accept some loss) or one description stream and one forward error correction stream. When success probability is calculated, it can be seen that sending one description plus one forward error correction stream is better than sending two descriptions (see the higher value of 0.88 compared to 0.62 in Table 7). In order to verify the success probability (Equation 1), both decisions are simulated. The MOS values from the simulations are also listed in Table 7. The result from sending one description and one forward error correction stream is 8.2% better than sending two descriptions (see the higher MOS value of 2.45 compared to 2.25). The reason that sending two descriptions leads to a worse result than sending one description and one forward error correction stream is that neither of the two descriptions can be fully received.

Fig. 8 Logical view - Scenario 2

Table 7 Success Probability - Scenario 2

Alternative | Success Probability | MOS
2 Descriptions | 0.62 | 2.25
1 Description + FEC | 0.88 | 2.45

8 Conclusions

We have presented a semantics-based model for selecting between network-error avoidance, network-error correction and a combination of both approaches to deliver multimedia streams over best-effort networks in the desired quality. This model was presented in the context of a low-cost indexing-cache overlay that has been shown to deal well with requests for both popular and rare videos and, moreover, to locate multiple peers having the same video. The error handling model is based on considering network characteristics in combination with stream characteristics. The evaluation has been performed through simulations in NS-2 using varying stream and network characteristics. The simulation results show that the model can be used to take the decision (error avoidance, error correction or the combination of both) that allows the system to deliver the stream in the best quality.

References

1. Lamparter, B., Boehrer, O., Effelsberg, W., Turau, V.: Adaptable forward error correction for multimedia data streams. Technical Report TR-93-009, University of Mannheim (1993)

2. Liu, H., Ma, H., Zarki, M.E., Gupta, S.: Error control schemes for networks: An overview. Mobile Networks and Applications 2(2), 167–182 (1997)

3. Lee, I., Guan, L.: Reliable video communication with multi-path streaming using MDC. In: IEEE International Conference on Multimedia and Expo, ICME 2005 (2005)

4. Boyce, J.M., Gaglianello, R.D.: Packet loss effects on MPEG video sent over the public Internet. In: Multimedia 1998: Proceedings of the sixth ACM international conference on Multimedia, pp. 181–190. ACM Press, New York (1998)

5. Klaue, J., Rathke, B., Wolisz, A.: EvalVid - a framework for video transmission and quality evaluation. In: Computer Performance Evaluation/Tools, pp. 255–272 (2003)

6. Padhye, J., Firoiu, V., Towsley, D., Kurose, J.: Modeling TCP throughput: A simple model and its empirical validation. In: SIGCOMM 1998: Proceedings of the ACM SIGCOMM 1998 conference on Applications, Technologies, Architectures and Protocols for Computer Communication, pp. 303–314. ACM Press, New York (1998)

7. Park, K., Wang, W.: QoS-sensitive transport of real-time MPEG video using adaptive forward error correction. In: IEEE International Conference on Multimedia Computing and Systems, vol. 2, pp. 426–432 (1999)

8. Claypool, M., Zhu, Y.: Using interleaving to ameliorate the effects of packet loss in a video stream. In: ICDCSW 2003: Proceedings of the 23rd International Conference on Distributed Computing Systems, Washington, DC, USA, p. 508. IEEE Computer Society, Los Alamitos (2003)

9. Maxemchuk, N.F.: Dispersity Routing in Store-and-Forward Networks. PhD thesis, University of Pennsylvania (1975)

10. The Network Simulator NS-2 (v2.1b8a) (October 2001), http://www.ns-2.com



Semantics in the Field of Widgets: A Case Study in Public Transportation Departure Notifications

Alena Kovárová and Lucia Szalayová

Alena Kovárová
Faculty of Informatics and Information Technologies, Slovak University of Technology, Bratislava, Slovakia

Lucia Szalayová
Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic

Abstract. Widgets are becoming increasingly present in our everyday routines, which makes their portability and reusability desirable properties. As a particular example, we consider public transportation passengers who use the internet extensively to make their lives simpler. In order to minimize the time spent at the bus stop, they usually check their bus line departures on the web before they leave their homes or offices. For this purpose, there exist several internet portals providing information on local transportation time schedules. This chapter starts by presenting a better (quicker and easier) way of obtaining the same information: an adaptive desktop widget with a comfortable user interface. The second step is the utilization of the semantics of the data under consideration in order to make the widget portable across different data sources of the same domain.

1 Introduction

Due to the continually growing volume of information that is made freely available online, people often find themselves in the inconvenient situation where they have to invest disproportionate effort and time in order to interact with the information sources they use. Everyone, consciously or subconsciously, estimates how long it will take to obtain the desired information and, more importantly, whether this information is worth the time and effort.

This process includes, for example, decisions such as which electronic newspaper to read, which sports section to monitor, which broadcast to watch, which web pages contain relevant information and so on. This is of course a daily struggle; most of us would appreciate the time-saving and effort-saving option of having this “personalized” information wait for us somewhere, nicely arranged. To come as close as possible to this vision, we choose a favorite newspaper, favorite channels and programs, favorite web pages; simply put: favorite information sources. But this is still not enough; even within these favorite sources we still need to search and to filter. This simply reflects the fact that the majority of the available information sources are built for the masses and therefore do not implement any personalization or personal adaptation features to serve the needs of each individual person.

The above-mentioned quest for information can be divided into three distinct types, depending on whether someone is searching for:

1. general knowledge (whether in an unknown or a known field)
2. specific information in an unknown field
3. specific information in a known field

In this work we focus on information quests of the third type. This means that a user is interested in specific information from some known area, knows where to search for it and how to filter the information that is available at that location; the user already has a favorite source for this information. In other words, in our case the user is able to formulate his requirements in greater detail and to be explicit. Examples of such requirements could be: “I want to monitor this specific list of stocks on the stock market and I have no interest in the fluctuation of other stocks or of the general index.” or “I need to have the current weather forecast for the city where I live and I prefer to have it in textual and image form.” While this is a known area and the experienced user knows where and how to find (manually) the information he is interested in, the problem that remains is how to transform such a requirement into a computer language so that a computer can look for the information (automatically) instead of the user.

To understand this problem practically, let’s have a look at a specific case: the average morning of “John”. John looks for information on bus departures from home to work. It always takes John a few seconds until he opens the relevant web page in his web browser. The time depends on the degree to which John is able to customize the system he is working with and on how much the available settings (if they exist) allow him to speed up obtaining the desired information. Our goal is two-fold: to minimize this time and to relieve the user from the manual customization of the information source.

Clearly John's (and also our) requirement can be formulated like this: "I want to know when my bus departs from where I am now in the usual direction." It is important to notice the words "my", "where" and "usual", because they assume that an application is able to estimate his bus number, where he is and in which direction he wants to travel. Once an application fulfilling this requirement is developed, a second question arises: "If John moved, could he still use the application?" If the widget worked at a semantic, rather than flat information, level, then that kind of portability would be possible too, allowing John to continue using the tool he is accustomed to, even when his own circumstances and context change. The design and development of such a tool is the objective of this chapter.

The remainder of this chapter is structured as follows: Section 2 contains a brief survey of different solutions for the retrieval of desired information via browsers and widgets. We point out their pros and cons with respect to our purpose. Section 3 describes our widget, starting with possible data sources, through the system overview, to the widget's basic functionality. We also explain the widget architecture and take a closer look at its data model. Section 3 closes with an evaluation of the widget. Section 4 deals with semantics and the corresponding ontology model, which could make the widget independent of the data source. We compare our model with other ontology models that belong to the same area but are based on different requirements. Section 6 lists our concluding remarks.

2 Related Background

Let's take a closer look at the user's possibilities of

• searching,
• filtering,
• retrieving the web data within a specific site,
• customizing a web application for his own benefit,
• obtaining a web page in the quickest way.

The situation is the same for any problem of the third type, where the user knows where to search and how to filter. So our first question is: What is the usual way to obtain information from the Internet? Omitting highly specialized web applications, it is the well-known browsing.

2.1 Traditional Access to Resources on the Web Using Web Browsers

The most widely used tools are internet browsers, e.g., Microsoft Internet Explorer, Mozilla Firefox, Safari or Opera. How can an average internet browser save time? The user can adjust some settings, e.g., save a favorite web page via “Add to Favorites”, set some pages as his “home page”, or choose to “Show the windows and tabs from the last time” when the browser starts. Such settings allow the user to configure various aspects of web pages, but there is no possibility to specify or ask for specific information within a web page (if the user wants just a part of the page). A small improvement came with Microsoft Internet Explorer 8 and its Web Slices, which use simple HTML markup to represent a clipping of a web page, enabling users to subscribe to content directly within a web page.¹

1 Internet Explorer 8: Features – Web Slices, http://www.microsoft.com/windows/internet-explorer/features/easier.aspx


To move closer to user needs, next to generic internet browsers we find site-specific browsers. This type of browser is designed to create a more comfortable environment for the user, especially when browsing “favorite” sites, e.g., for e-mail or different types of social networks. Examples of site-specific browsers are Fluid (for Mac OS X), Mozilla Prism, Google Chrome or Bubbles. They are web applications that have the same core as web browsers but look like desktop applications from the outside. They offer drag-and-drop functionality and many other useful features, and they may have settings that can be adjusted manually so that the user obtains his information even more quickly than in a web browser. However, they still do not guess the user's focus, do not allow filtering (specifying which part of which web page is wanted) and do not let the user choose how the information is presented.

Apart from the bookmarking systems built into web browsers, users can take advantage of bookmarking web services, such as the social bookmarking system Delicious2 (formerly del.icio.us). Such services allow users to organize their bookmarks with tags and to access them independently of their location and browser.

Another option that can significantly speed up user access to relevant information is personalized and adaptive web-based systems [2], especially when combined with site-specific browsers. An appropriately trained personalized web-based system can often display the information the user is looking for directly on the first page.

2.2 Widgets

Without implementing our own engine or a robust system, a solution to our problem might be found among widgets (sometimes also called gadgets). In our context, widgets are not interface elements that help the user navigate, orient himself or make a choice; they are single-purpose mini (web) applications, typically of minimal size, dedicated to providing a simple, focused effect while the user works with the computer. Their functionality is oriented to one specific goal: to display very specific information. They come in two types, either for the web (web widgets) or for the desktop (widgets) [3]. The latter can run on computers as well as on mobile devices [1]. In this work we focus on desktop widgets for computers, which can be freely positioned and easily combined on the desktop. The most frequently used engines for widgets or gadgets are:

• Konfabulator3 from Yahoo! for Windows XP+ and Mac OS
  o known as Yahoo! Widgets4
• Windows Sidebar from Microsoft for Windows Vista
  o a sidebar with gadgets on the Windows Vista desktop
• Google Desktop Gadgets5 from Google for Windows XP+
  o in the form of Google Desktop
• Opera Widgets6 from Opera for Mac OS 10.5 (beta) and Windows XP+
• Dashboard7 from Apple for Mac OS 10.5
  o as a second desktop with widgets
• Joost Widgets for Joost 1.0 Beta on Mac OS 10.5, Windows XP and Windows Vista

2 Delicious – social bookmarking, http://delicious.com/
3 Konfabulator, Reference manual, Version 4.5, http://manual.widgets.yahoo.com/
4 Yahoo! widgets, http://widgets.yahoo.com/win

Most of them use an API that mainly processes HTML, JavaScript, XML and CSS files.

Desktop widget and gadget engines from different vendors differ in some respects. From the user's perspective, some widgets are represented by views or icons located in a standard sidebar of the desktop; such a widget becomes active only after the user clicks it, whereupon the icon expands onto the desktop. The widget can then be relocated as desired. Other gadgets, by contrast, occupy a sidebar almost twice as wide as that of the widgets, and they provide their service the whole time they are active. Clicking such a gadget expands it and increases the quality or quantity of the service, but it can only be relocated within the sidebar.

From an implementation point of view, there are three ways for the user to obtain a personal widget. The user first has to decide which API to use and, if it is not already part of his system or application, install it. The three choices are then:

1. To find a ready-made widget on a web page offering many complete widgets, download it, set it up manually and use it.

2. To read a tutorial on extending a generic widget and follow simple instructions to create a specific one.

3. To read a tutorial for developers and program their own widget.

Which of these three choices is taken depends strongly on the type of information to be displayed (the manner of display is not considered here). As with site-specific browsers, complete widgets cover the demands of the majority, so non-standard requirements are not met by the first choice. If a service that can be queried for the information already exists, such as an RSS feed or a web service, the second choice is sometimes sufficient. But if neither a complete widget nor such a service exists, the only option is the third. The third choice also gives the developer room to implement features that offer the user some kind of personalization. Generally, however, there is no effort to implement single-purpose widgets with broad applicability (i.e. independent of the information source); those that obtain information from the Internet are all oriented to exactly one site or exactly one web service. This is because there is no standardization of these sites or services that such a widget could rely on.

5 Google Desktop, http://desktop.google.com/index.html
6 Dev.Opera, http://dev.opera.com/articles/view/creating-your-first-opera-widget/
7 Dashboard widgets for Mac OS X Dashboard, http://www.apple.com/downloads/dashboard


3 Public Transportation Departure Widget

Based on the survey presented in the previous section, we can implement a widget that satisfies John's request: "I want to know when my bus leaves from where I am now in the usual direction." Widget technology is suitable for this purpose, since the request needs very little space on the user's desktop to display the relevant information: the closest departures of the chosen (or guessed) stop, direction and line of public city transport. This tiny desktop application is most suitable for laptop owners (whose mobility may increase their need for extensive transportation) as well as for any computer user interested in the schedules of his/her favorite lines.

In this and the following section we explain what our user needs, which features of the widget fulfill these needs and how they work together. Finally, we tackle the two main related theoretical questions: “Would it be possible to use metadata describing data semantics in order to make the widget independent of an information source?” and “What should the ontology model look like?”

Our first step was to look for suitable information sources (to show that they are not good enough and to choose one of them as our data source); the second was to gather the user's requirements for the application.

3.1 Sources of Public City Transport Departures in Bratislava

There are three well-known web sources of public transport information for Bratislava. They are briefly described below, with emphasis on what they offer the user.

The first source is the web site http://www.dpb.sk [4]. This web site is administered by the public transportation provider for the area of Bratislava, the capital city of Slovakia. Reaching the information (only timetables with departures are available) is relatively complicated and requires manual action: six steps are needed within the browser. Thus, this source is not very popular among users. Moreover, these pages cannot be personalized.

The second source is the web site http://www.imhd.sk [7] (imhd). This site is probably the most used. One useful feature, for example, lets the user also search for a stop-to-stop connection. An attractive service of imhd is e-mail notification, which can provide current changes, exclusions, news and other useful information. Personalization possibilities are very limited; thus, searching for relevant information is not quick.

The last and most recent source is the web site http://www.cp.sk [5], the National information system of timetables for Slovakia. This web site offers all kinds of timetables for trains, buses, flights and various public city transport systems within Slovakia. Considering only public city transport, the user can find a route by setting the starting stop, the final stop and the time of departure or desired arrival. Connections are found including interchanges, but the user can also ask for direct connections only. Alternatively, the user can get the entire timetable of one line at a given stop for a set date, or the schedule of one bus and its route. The only possible personalization is to save the displayed page as a favorite.

From the above it is clear that there is no service that would give us the required information on demand; there are only different web sites. Since our requirement is so specific, we had to take the third option: to program our own widget. After evaluating the pros and cons of different widget APIs, we decided to implement the widget using Konfabulator, and we chose imhd as the data source for our widget because its HTML code was structured well enough to be parsed into our database.

3.2 System Overview

The widget with line departures is not meant to substitute any of the information sources mentioned above. To explain the difference more closely, imagine the following scenario: John is at work. He knows which buses stop near the building and which one is suitable for him. But he does not remember its departures and just wants to know when his bus comes next, because he does not want to wait at the bus stop for ages. Of course, he does not want to browse the Internet, where he would either have to click many times or fill in some input boxes with the same strings again and again. He used to print out the entire timetable for his bus, but he always needed to check the time and search for the relevant value on paper. John is not interested in transfers between lines, and he does not search for the quickest or the cheapest route. He does not need to know when he will arrive at his destination.

As already mentioned, our two goals are to minimize time/effort and manual customization; in other words, we want to fulfill John's requirement in a way that minimizes the number of his actions and accelerates access to the information. A widget that achieves this needs, first of all, some input and output. An example of input is the user choosing a line number. This input is continuously monitored, which enables our widget to adapt to the user. The output (the view) is displayed to the user. Our output is the desired departures, which are either loaded from a local database or downloaded from the web. When a download is triggered, the new data are stored in the local database. The last use case for the user is the possibility to set up predefined locations (Fig. 1, upper part), which enables the user to adjust the widget from the very first use.
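The load-or-download behavior described above can be sketched as a cache-aside lookup. The Python below is a minimal illustration under assumptions, not the widget's Konfabulator/JavaScript code: the `departures` table layout and the `download` callback (a hypothetical stand-in for the Downloader and Parser) are invented for the example.

```python
import sqlite3
from typing import Callable, List

# Hypothetical table: one row per departure time ("HH:MM") of a line at a stop.
SCHEMA = """
CREATE TABLE IF NOT EXISTS departures (
    line TEXT, stop TEXT, direction TEXT, dep_time TEXT
)
"""

def get_departures(db: sqlite3.Connection, line: str, stop: str, direction: str,
                   download: Callable[[str, str, str], List[str]]) -> List[str]:
    """Return departure times from the local database, downloading only on a miss."""
    key = (line, stop, direction)
    rows = db.execute(
        "SELECT dep_time FROM departures WHERE line = ? AND stop = ? AND direction = ? "
        "ORDER BY dep_time", key).fetchall()
    if rows:
        return [dep_time for (dep_time,) in rows]   # served from the local database
    times = sorted(download(*key))                  # hypothetical Downloader + Parser
    db.executemany("INSERT INTO departures VALUES (?, ?, ?, ?)",
                   [(line, stop, direction, t) for t in times])
    db.commit()                                     # newly downloaded data are stored
    return times


if __name__ == "__main__":
    def fake_download(line: str, stop: str, direction: str) -> List[str]:
        return ["07:45", "07:15"]

    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    print(get_departures(db, "1", "Stop A", "Stop B", fake_download))  # downloaded
    print(get_departures(db, "1", "Stop A", "Stop B", fake_download))  # from local DB
```

The second call never reaches `download`, which is the point of keeping the local database: parsing a page is slow, a database lookup is not.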

As departure schedules change from time to time, these changes need to be propagated to the local database so that the user has the most up-to-date information. This updating process can run automatically every week, but the user can switch it off at any time. There is also a use case for automatic clean-up, which erases old and unused data. Most importantly, the displayed area must always show fresh data, i.e. current departures; this is the last use case of the time actor (Fig. 1, lower part).


Fig. 1 Use case diagram of widget system

3.3 Widget Basic Functionality

The basic functionality of the widget is to display the next five departures of a selected line from the chosen stop in a set direction (Fig. 1, case Input choices). To get them, the user goes through three steps, which should follow a proper and intuitive order:

1. Select a line number from a list in the dropdown menu (Fig. 2, point 1); this selection is needed only if the user does not want the automatically chosen line.
2. Change the direction with a single click (Fig. 2, point 3); this is needed only if the widget proposed the wrong (inverse) direction.
3. Choose a stop from a list in the dropdown menu (Fig. 2, point 2); only the stops that belong to the previously selected line are shown. This selection has to be made only if the automatically chosen stop is not the wanted one. When a line is selected for the first time, its first stop is preselected.

After these three steps, whether done automatically or by the user, the next five departures (from the current time) are displayed. For each departure the widget displays exactly: line number + direction + departure time + time left in minutes (Fig. 2, point 4). To keep the data current at all times, the display is refreshed every minute.
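The display step can be sketched as a small function that, given the day's departure times for the chosen line, stop and direction, returns the next five with the minutes left; calling it once a minute mirrors the refresh described above. This Python sketch rests on assumptions (times stored as zero-padded "HH:MM" strings, departures after midnight ignored) and is not the widget's actual code.

```python
from datetime import datetime
from typing import List, Tuple

def upcoming(departures: List[str], now: datetime, count: int = 5) -> List[Tuple[str, int]]:
    """Next `count` departures at or after `now`, each with the minutes left."""
    now_min = now.hour * 60 + now.minute
    result = []
    for hhmm in sorted(departures):
        hours, minutes = map(int, hhmm.split(":"))
        dep_min = hours * 60 + minutes
        if dep_min >= now_min:
            result.append((hhmm, dep_min - now_min))
            if len(result) == count:
                break
    return result

def display_rows(line: str, direction: str, departures: List[str], now: datetime) -> List[str]:
    """Rows shown in the widget: line number + direction + departure time + time left."""
    return [f"{line} {direction} {hhmm} ({left} min)"
            for hhmm, left in upcoming(departures, now)]

if __name__ == "__main__":
    times = ["11:40", "11:57", "12:00", "12:15", "12:30", "12:45", "13:00"]
    for row in display_rows("1", "Stop B", times, datetime(2009, 9, 17, 11, 57)):
        print(row)
```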


Fig. 2 Widget description

To spare the user from constantly checking how many minutes remain until a departure, we also implemented one extra feature: sound. The widget can announce the time of the next departure, e.g. "Next bus arrives at 12:00. That is in 3 minutes." Of course, this function can be turned off (Fig. 2, point 5).

Finally, every application should have a Help (Fig. 2, point 6). Our Help contains a user manual.

To make the widget more user-friendly, we gave the user the possibility to set up his favorite locations manually (Fig. 1, case Pre-defined location settings). For every location the user can choose several lines (with the respective stops and directions) that he usually travels with, for example from school or the office, and give it a name, e.g. route “school->home”. The output is the same as in the basic functionality, except that the upcoming five departures may differ in line number and stop name. Departures are ordered in the usual way, by departure time (Fig. 3).
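Merging the departures of several lines in a named route and keeping the five earliest can be sketched with the standard `heapq.nsmallest`. In this Python illustration the route layout (a dict from hypothetical (line, stop) pairs to "HH:MM" times) is an assumption; zero-padded "HH:MM" strings compare correctly as text, so no time parsing is needed.

```python
import heapq
from typing import Dict, List, Tuple

Route = Dict[Tuple[str, str], List[str]]  # (line, stop) -> departure times "HH:MM"

def route_departures(route: Route, now: str, count: int = 5) -> List[Tuple[str, str, str]]:
    """Next `count` departures of all lines in a route, ordered by departure time."""
    candidates = [
        (hhmm, line, stop)
        for (line, stop), times in route.items()
        for hhmm in times
        if hhmm >= now                  # zero-padded strings sort chronologically
    ]
    return heapq.nsmallest(count, candidates)

if __name__ == "__main__":
    home_to_work: Route = {
        ("1", "Stop A"): ["07:10", "07:40", "08:10"],
        ("2", "Stop B"): ["07:05", "07:25", "07:45"],
    }
    for hhmm, line, stop in route_departures(home_to_work, "07:00"):
        print(hhmm, line, stop)
```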

Fig. 3 Widget setup for multiple lines within one route (in Slovak language, translation of route: Home -> Work)


The last basic functionality is the widget's ability to adapt to the user's needs. As we do not use any other information sources (e.g. browsing history) to find out the user's usual bus stops and lines, the widget starts with an empty database (except for default data). While the user uses the widget, it monitors his choices and stores the number of selections of each choice in the local database (together with the downloaded data). The most frequently chosen option can then be preselected automatically, accelerating access to the service.
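The learning step amounts to counting selections and preselecting the most frequent choice. The Python sketch below keeps the counts in memory with `collections.Counter` for brevity, whereas the widget stores them in its local database; the class name and the fallback default are hypothetical. One instance would be kept per kind of choice (line, stop, direction).

```python
from collections import Counter
from typing import Optional

class ChoiceProfile:
    """Counts selections of one kind of choice and proposes the most frequent one."""

    def __init__(self, default: Optional[str] = None) -> None:
        self.counts: Counter = Counter()
        self.default = default          # used while no selection has been seen yet

    def record(self, choice: str) -> None:
        self.counts[choice] += 1        # relevance grows with every selection

    def preselect(self) -> Optional[str]:
        if not self.counts:
            return self.default
        # Ties go to the choice that was selected first (Counter keeps insertion order).
        return self.counts.most_common(1)[0][0]

if __name__ == "__main__":
    lines = ChoiceProfile(default="1")
    for chosen in ["39", "31", "39"]:
        lines.record(chosen)
    print(lines.preselect())
```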

3.4 System Architecture

We chose Konfabulator as the engine for our widget. This means we used mainly XML and JavaScript for programming, and the SQLite database supported by Konfabulator for our local database. Our system can be divided into the following parts (Fig. 4):

GUI (Graphical User Interface) sends data (user choices) to the Task manager and, based on them, can ask the Task manager for new data from the local database. The GUI can also send information about the user's choices to the User profiler.

The User profiler updates the number of the user's selections in the database and remembers the user's settings, including his favorite locations/routes. Whenever the user chooses a line number, stop or direction, its relevance increases.

The Task manager:

• updates the GUI (departures), either because time has passed or because the user made a different choice,
• updates the local database (with data downloaded from the public transport information provider, if an Internet connection is available), and
• cleans up the local database: for performance reasons, the Task manager periodically erases the least selected lines from the database.
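The periodic clean-up can be sketched as a single SQLite pass that keeps only the most frequently selected lines. The table and column names (`line.number`, `line.selection_count`, `departure.line_number`) are hypothetical, and the number of lines to keep is an assumed parameter; the chapter only states that the least selected lines are erased.

```python
import sqlite3

def clean_up(db: sqlite3.Connection, keep: int) -> int:
    """Erase all but the `keep` most selected lines and their departures.

    Returns the number of lines erased.
    """
    # In SQLite, LIMIT -1 means "no upper bound", so this selects every line
    # ranked after the top `keep` by selection count.
    doomed = [(number,) for (number,) in db.execute(
        "SELECT number FROM line ORDER BY selection_count DESC, number "
        "LIMIT -1 OFFSET ?", (keep,))]
    db.executemany("DELETE FROM departure WHERE line_number = ?", doomed)
    db.executemany("DELETE FROM line WHERE number = ?", doomed)
    db.commit()
    return len(doomed)
```

Deleting the departures together with their line keeps the database consistent; a foreign key with `ON DELETE CASCADE` would be an alternative design.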

Fig. 4 Conceptual architecture of the public transportation departures widget


The Downloader downloads the entered web page; an Internet connection is therefore needed whenever the user wants to download new timetables or a new calendar.

The input of the Parser is raw data (the HTML code of a web page), which is parsed and stored in the respective columns of the local database, from where it is loaded for the user on request.
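The Parser's task of locating key points in loosely structured HTML can be sketched with Python's standard `html.parser`. The markup assumed here (departure times in `<td class="departure">` cells) is hypothetical, not imhd's actual page structure. The parser re-evaluates its state at every `<td>`, so a missing `</td>` (one of the broken paired-tag rules mentioned in the evaluation) does not leak text from neighboring cells.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

class DepartureParser(HTMLParser):
    """Collects the text of <td class="departure"> cells (hypothetical markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.in_departure = False
        self.times: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "td":
            # Decided anew at every <td>, so an unclosed cell cannot leak.
            self.in_departure = ("class", "departure") in attrs

    def handle_endtag(self, tag: str) -> None:
        if tag == "td":
            self.in_departure = False

    def handle_data(self, data: str) -> None:
        if self.in_departure and data.strip():
            self.times.append(data.strip())

if __name__ == "__main__":
    parser = DepartureParser()
    parser.feed('<tr><td class="departure">07:15<td>note</td>'
                '<td class="departure"> 07:45 </td></tr>')
    parser.close()
    print(parser.times)
```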

3.5 Data Model

Parsing one web page takes several seconds, which runs contrary to our goal. Therefore we needed to store the data in our local database. The most important data to store are the lines, their stops and the departures from terminal stops. Since timetables differ depending on the type of day, we extended our database with two small separate tables for public and school holidays (Fig. 5).

The line table contains data about the lines previously loaded by the system. Learning is applied to lines: one of the attributes holds the line selection count, which is incremented each time the line is selected.

The line stops table is filled by parsing the left part of the schedule page. It contains information about the stops of each line and the time lag between each two consecutive stops on the route. Here the learning capability of the system is realized by incrementing the station selection count, i.e. the number of times the station has been selected for a specific line and direction.

The departure table represents departure times from the base station, so the arrival time at a specific station is calculated from the initial departure time plus the sum of the time lags up to the desired station. Since departures differ depending on the actual day (working day, weekend, public holiday or school holiday), this is also taken into account.
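The arrival-time rule can be expressed directly in SQL over a simplified version of the data model. The schema below is a hypothetical reduction of Fig. 5 (names, types and the `day_type` encoding are assumptions): `lag_min` is the time lag from the previous stop, so the arrival at a stop is the terminal departure plus the sum of the lags up to and including that stop.

```python
import sqlite3

DATA_MODEL = """
CREATE TABLE line (number TEXT PRIMARY KEY, selection_count INTEGER DEFAULT 0);
CREATE TABLE line_stop (
    line_number TEXT, direction TEXT, stop_order INTEGER, stop_name TEXT,
    lag_min INTEGER,                   -- minutes from the previous stop (0 at the terminal)
    selection_count INTEGER DEFAULT 0
);
CREATE TABLE departure (
    line_number TEXT, direction TEXT, day_type TEXT,
    minutes INTEGER                    -- departure from the terminal, minutes after midnight
);
"""

# Placeholders in textual order: stop name, line, direction, day type.
ARRIVALS = """
SELECT d.minutes + (
    SELECT SUM(ls.lag_min) FROM line_stop AS ls
    WHERE ls.line_number = d.line_number AND ls.direction = d.direction
      AND ls.stop_order <= (SELECT s.stop_order FROM line_stop AS s
                            WHERE s.line_number = d.line_number
                              AND s.direction = d.direction AND s.stop_name = ?)
) AS arrival
FROM departure AS d
WHERE d.line_number = ? AND d.direction = ? AND d.day_type = ?
ORDER BY arrival
"""

def arrivals(db: sqlite3.Connection, line: str, direction: str, day_type: str, stop: str):
    """Arrival times (minutes after midnight) of a line at a stop on a given day type."""
    return [a for (a,) in db.execute(ARRIVALS, (stop, line, direction, day_type))]
```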

Fig. 5 Logical data model of the widget database

The day differentiation mentioned above is done by recognizing the day of the week (working day or not), whereas public and school holidays are recognized using separate tables listing these special days. The list of school holidays is updated yearly; it can be obtained from the web site of the Ministry of Education in Slovakia8. The region attribute is necessary, because school holidays in our country differ by region.
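The day-type decision can be sketched as follows. The precedence (public holiday before weekend before school holiday) and the representation of the holiday tables as Python sets keyed by a hypothetical region name are assumptions made for illustration.

```python
from datetime import date
from typing import Dict, Set

def day_type(day: date, public_holidays: Set[date],
             school_holidays: Dict[str, Set[date]], region: str) -> str:
    """Classify a date in order to pick the matching timetable."""
    if day in public_holidays:
        return "public holiday"
    if day.weekday() >= 5:                  # Saturday = 5, Sunday = 6
        return "weekend"
    if day in school_holidays.get(region, set()):
        return "school holiday"             # school holidays differ by region
    return "working day"
```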

3.6 Evaluation

The evaluation was carried out among students of the Faculty of Informatics and Information Technologies of the Slovak University of Technology in Bratislava. Tests were performed by 10 volunteers who use a computer on a daily basis.

Their task was to download a new line (of public transport) into the application and display departures for one of its stops. When the widget started, instruction guidelines were displayed, but the testers usually skipped them. As the testers realized during their first attempts that the widget displays only one default line, they used the guidelines to find out how to extend the widget's functionality. Overall, it generally took users less than three minutes to find the desired link information. Testers noticed one specific behavior of the application: because data are parsed after the URL is set, the widget could not be used for a moment. This behavior had been documented in the guidelines beforehand.

One special feature of the widget is sound: the widget can announce the time of the next departure. This feature was also tested (speech was produced using the Windows functionality for automatically reading a given text). Users rated the voice functionality as very popular, and the widget as a whole was rated very positively. No negative features were found. The testers made one recommendation: to display the departures centered within the frame.

The system has been implemented according to its design. Several pitfalls arose during the implementation. One of the most complicated was the poorly structured HTML code of the imhd pages. Paired-tag rules were often broken, which forced us to study the source code in depth. It was necessary to identify key points within the HTML code that delimit the sections to be loaded. This approach complicates both implementation and execution. For this reason, automatic data updating has not been implemented: updating the database (departures of a few lines) would take several minutes, during which the widget would be out of order. Instead, updating is done when initiated by the user, in the same way as adding a new line.

4 Extending the Widget with Semantics

Coming back to the question posed earlier in this chapter: "If John moved, could he still use the application?" It would be desirable for our widget to work even with a different information source providing the same type of information, i.e. line departures. This idea assumes that the provider also supplies the semantics of its data. Such providers are very rare, as are widgets working with such data; web widgets are more common, e.g. in the project of Eetu Mäkelä and colleagues [6].

8 The web site of The Ministry of Education in Slovakia, http://www.minedu.sk


But the principle is the same, so we created our own ontology model (Fig. 6) to represent the semantics of, and relations within, the data our widget parses, stores and displays. This ontology model includes all three main tables of our data model and their attributes.

[Figure: ontology model with the concepts Line, Stop, Stop of the line, Terminal stop, Departure, Type of day, Direction and Vehicle type, connected by the properties has label (xsd:string), has number (xsd:int), is vehicle type, has stop*, is in order (xsd:int), has time-shift in minutes (xsd:int), has direction, is at stop, rdf:type, from, to, has departure*, from stop, is valid in and is scheduled at (xsd:time)]

Fig. 6 Ontology model of data from public transportation departures widget

To check compatibility with a provider, let us assume that the provider uses the same model as the one presented by Junli Wang and his colleagues [8]. Their model is not meant for widgets, but it also deals with public transportation. It is oriented towards public transport queries such as transfer trip schemes, route queries and station queries. This covers a wider range of the public transport domain than ours, so their ontology model is also wider (Fig. 7). Setting aside the concepts we do not use in our model and those the two models share, only one serious difference remains: the concept of a route with its timetable. We have nothing like this in our model, because we can calculate it from the departure from the terminal stop plus the time shift to the selected stop. Our model expects the timetable of departures (from the terminal) without knowing the last stop, whereas theirs always requires the first and the final stop to be set. This leads us to two conclusions. First, our widget would not work with their ontology model unless we reimplemented it. Second, our ontology model is better, since we used departures, which are a lower-level concept than routes: a route can easily be calculated from departures.
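The claim that a route can be derived from departures can be illustrated with a short Python sketch: accumulating the time shifts from one terminal departure yields the trip's timetable, whose first and last entries give the first and final stops that a route-based model such as [8] requires. The input layout is an assumption.

```python
from typing import List, Tuple

def route_from_departure(terminal_departure: int,
                         stops: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Timetable of one trip as (stop, minutes after midnight) pairs.

    `stops` lists (stop name, time shift in minutes from the previous stop),
    starting with the terminal stop, whose shift is 0.
    """
    timetable, t = [], terminal_departure
    for name, shift in stops:
        t += shift
        timetable.append((name, t))
    return timetable

if __name__ == "__main__":
    trip = route_from_departure(420, [("A", 0), ("M", 3), ("B", 4)])
    first_stop, final_stop = trip[0], trip[-1]
    print(trip, first_stop, final_stop)
```

The reverse direction does not hold as easily: a route-centered model fixes both end stops up front, which is exactly what our departure-based widget does not need to know.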


Fig. 7 Urban public transport ontology [8]

5 Conclusions

It was already well known that it is possible to implement a widget (as a client) that downloads and parses data from some web source (server side). Moreover, such a widget can be personalized, because it can adapt itself to best serve the user, making information retrieval more comfortable and quicker. This adaptation is achieved by monitoring the user's choices and storing the number of selections of each choice in the local database. The only disadvantage is that such a widget is totally dependent on its data source. In this chapter, in order to make such widgets portable across different web sources in the same domain, we proposed the creation of an ontology model that reflects data semantics.

We created such a model and compared it with another one from the same domain but with a different purpose. The comparison showed that the two ontological models differ in their main concept. This implies that although it is useful to use semantics in the widget (as in any other client application), it will work only if the server provides data with the same semantics.

Regarding further applications of the work presented herein, such a semantic model could also be applied in other kinds of systems with regular departures, e.g. logistics or catering. At the same time, our ontology model can be extended to serve other purposes as well, e.g. route planning.

Acknowledgement. This work was partially supported by the Scientific Grant Agency of Slovak Republic under the contract No. VG 1/0848/08.


References

[1] Boström, F., Nurmi, P., Floréen, P., Liu, T., Oikarinen, T., Vetek, A., Boda, P.: Capricorn - an intelligent user interface for mobile widgets. In: Proceedings of the 10th International Conference on Human Computer Interaction with Mobile Devices and Services, MobileHCI 2008, pp. 327–330. ACM, New York (2008)

[2] Brusilovsky, P., Millán, E.: User Models for Adaptive Hypermedia and Adaptive Educational Systems. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) Adaptive Web 2007. LNCS, vol. 4321, pp. 3–53. Springer, Heidelberg (2007)

[3] Caceres, M.: Widgets 1.0: The Widget Landscape. W3C (2008), http://www.w3.org/TR/widgets-land/ (accessed 17 September 2009)

[4] Dopravný podnik Bratislava, a.s. (company, provider), Public transportation for the area of the capital city Bratislava (web site), http://www.dpb.sk (accessed 17 September 2009)

[5] INPROP, s. r. o. (company, provider), National information system of timetables for Slovakia (web site), http://www.cp.sk/ (accessed 17 September 2009)

[6] Mäkelä, E.: Enabling the Semantic Web with Ready-to-Use Web Widgets Export. In: Nixon, L.J.B., Cuel, R., Bergamini, C. (eds.) Proc. of the First Industrial Results of Semantic Technologies Workshop (FIRST 2007), pp. 56–69 (2007)

[7] mhd.sk (citizen union, provider), imhd.sk (web site of public transportation for the area of the capital city Bratislava), http://www.imhd.sk (accessed 17 September 2009)

[8] Wang, J., Ding, Z., Jiang, C.: An Ontology-based Public Transport Query System. In: Proceedings of the First International Conference on Semantics, Knowledge and Grid, pp. 62–64. IEEE Computer Society, Los Alamitos (2005)


An Adaptive Mechanism for Author-Reviewer Matching in Online Peer Assessment

Ioannis Giannoukos, Ioanna Lykourentzou, Giorgos Mpardis, Vassilis Nikolopoulos, Vassili Loumos, and Eleftherios Kayafas

Abstract. Peer assessment techniques are an effective means of taking advantage of the knowledge that exists in web-based peer environments. Through these techniques, participants act both as authors and as reviewers of each other's work. However, as web-based cooperative environments continuously grow in popularity, there is a need to develop intelligent mechanisms that retrieve the optimal group of reviewers to comment on the work of each author, with a view to increasing the usefulness of these comments for the author's final result. This paper introduces a novel technique that uses feed forward neural networks to determine the optimal reviewers for a specific author during a peer assessment procedure. The proposed method seeks to match author profiles to reviewer profiles based on feedback regarding the usefulness of reviewer comments as perceived by the author. The proposed mechanism is expected to improve the peer assessment procedure by making it adaptive to individual user characteristics, increasing the overall quality of a group's projects and speeding up the peer assessment procedure. The method was tested on educational data derived from an e-learning course, and the preliminary results it yielded are promising.

Keywords: peer assessment, user matching, machine learning.

1 Introduction

During the last few years, web-based social networks have developed rapidly. Such networks consist of individuals from different expertise backgrounds and with various profiles who cooperate with one another to boost their knowledge and performance. The presence of a large number of peer users, the common goals shared by the members of the community, and the diversity of knowledge and expertise of the participants make these networks an ideal environment for incorporating peer assessment techniques. These techniques enable users to evaluate and comment on each other's work and thus benefit from the knowledge of some of their peers to improve the quality of their assignments.

Ioannis Giannoukos, Ioanna Lykourentzou, Giorgos Mpardis, Vassilis Nikolopoulos, Vassili Loumos, and Eleftherios Kayafas
Multimedia Technology Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, Zographou Campus, 15773 Athens, Greece
e-mail: igiann,ioanna,gmpardis,[email protected], loumos,[email protected]

Peer assessment techniques may be used in various real-world situations, such as the evaluation of novel research contributions, the assessment of medical diagnoses and all fields of corporate and academic education. In the academic and educational domain in particular, the beneficial aspects of peer assessment have been widely discussed in the literature. Researchers agree that peer assessment stimulates student motivation and encourages deeper learning and understanding [33, 29].

Although peer assessment techniques are broadly recognized as an important quality assurance mechanism, they also present a major drawback. More specifically, they need to be manually coordinated by a supervising expert who selects peers, seeking to ensure that the group members will receive the best possible reviews. If this matching procedure is successful, it helps enhance the quality of the users' work. The decision of the expert supervisor is typically based on his perceived knowledge of the user profiles and expertise. Relying on human experts to perform this task, however, is inevitably time-consuming, since the supervising authority needs to carefully examine each case and decide accordingly. This considerable time loss delays the final outcome and in some cases lowers the quality of the information that is finally released. On the other hand, if the matching has to be done quickly, mistakes are very likely and a less than appropriate peer matching may be arranged, because the time spent examining possible peer pairs is limited. These shortcomings may not have a substantial impact on small-scale groups involving a limited number of users, but when large user populations are involved, the quality of the peer community's output might be significantly lower.

Therefore, the traditional method of peer matching performed by a few expert individuals seems less suitable and viable for modern web-based environments involving a large number of participants. Instead, these environments require an intelligent mechanism that automatically and efficiently matches peer reviewers to peer authors, with a view to providing the author with the best possible quality of reviews, which is expected to lead to improved performance and better final results.

In this paper, a novel peer matching mechanism is proposed. This mechanism provides adaptive and personalized services for performing automatic optimal matching between authors and reviewers, taking into account the feedback the authors provided on the perceived usefulness of the comments they received from the reviewers. More specifically, this mechanism uses a popular machine learning technique, namely feed forward neural networks, to estimate the optimal reviewers for a specific author. The proposed method uses past data to construct author and reviewer user profiles. In addition, it uses the author's perceived usefulness of a specific review, which is obtained through a quality feedback attribute. The method then uses these data to estimate the usefulness that the author will probably find in the comments of each reviewer. Next, based on these estimations, it automatically assigns each author to the most fitting reviewer. Therefore, the proposed method adapts itself to the personal characteristics of each individual, which is expected to make the peer assessment procedure more efficient.
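To make the estimation and assignment steps concrete, the sketch below scores every candidate reviewer with a one-hidden-layer feed forward network applied to the concatenated author and reviewer profile vectors, then picks the reviewer with the highest predicted usefulness. The profile encoding, the sigmoid units and the weights are hypothetical placeholders, and training on the authors' usefulness feedback is not shown, so this illustrates the idea rather than reproducing the chapter's actual network.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (hidden weights, hidden biases, output weights, output bias)
Network = Tuple[List[List[float]], List[float], List[float], float]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_usefulness(x: Sequence[float], net: Network) -> float:
    """Forward pass of a one-hidden-layer feed forward network."""
    w_hidden, b_hidden, w_out, b_out = net
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

def best_reviewer(author: Sequence[float], reviewers: Dict[str, Sequence[float]],
                  net: Network) -> str:
    """Reviewer whose comments the author is predicted to find most useful."""
    scores = {name: predict_usefulness([*author, *profile], net)
              for name, profile in reviewers.items()}
    return max(scores, key=scores.get)

if __name__ == "__main__":
    toy_net: Network = ([[0.0, 1.0]], [0.0], [1.0], 0.0)  # untrained, illustrative weights
    print(best_reviewer([0.5], {"reviewer-1": [0.2], "reviewer-2": [0.9]}, toy_net))
```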

The rest of this paper is structured as follows: Section 2 presents the strengths and limitations of related research literature, and Section 3 introduces the reader to the theoretical background of the feed forward neural network technique. In Section 4 the proposed method is described in detail. The results of the method on e-learning peer assessment data are presented and discussed in Section 5. Section 6 includes a discussion of the potential of the proposed method as well as future extensions. Finally, Section 7 concludes the study.

2 Related Literature

Various research studies can be found in the literature that refer to the use of peer assessment in different domains. It should be noted that although it has been applied to various domains, the most prominent use of peer assessment is in the education sector.

The peer assessment process has been found to be effective in promoting peer learning [33] and in improving students' interpersonal relationships inside a classroom [29]. Supporting these findings, the study of Berg et al. [3], conducted with university students, reports a significant improvement when students process the feedback they receive from peer assessment and incorporate it into their work. The same study also reports that the timing of peer assessment is an important factor in the educational process: to be most effective, peer assessment should not coincide with the teacher's assessment of the students' work. The value of peer assessment is not limited to university or high school students; it can also benefit more advanced student groups, such as those involved in teacher education, as reported by Sluijsmans et al. [28]. The results of that work are based on three empirical studies and suggest that the peer assessment procedure leads to a general improvement in students' peer evaluation skills, as well as in their task performance in the course field. Taking the students' level of experience into account has also been found to benefit the peer assessment procedure. To this end, the study of Ljungman et al. [20], performed in university-level education, examines the effect of peer assessment on student performance when "older" students act as peer examiners for "younger" students. The study concludes that involving students in this type of peer assessment procedure increases their motivation to learn, helps them acquire tacit knowledge and makes them aware of the meta-cognitive competences needed to become responsible and autonomous learners.
Apart from improving student performance, peer assessment evaluations have also been found in [32] to be as reliable and valid as the assessments produced by the teacher. This is attributed mainly to the fact that a peer assessor has more time to spend on the peer assessment procedure than the instructor, which compensates for the assessor's possibly weaker knowledge of the subject.


Apart from classical education, peer assessment has also been widely used in online courses. In this type of course, where instructors have fewer means of assessing students' knowledge and the number of students may be large, peer assessment offers a variety of advantages, while also overcoming the time and place restrictions of traditional peer assessment processes. More specifically, students who actively participate in online peer assessment activities receive higher grades on final exams than those who do not [2], especially at the initial course stages [5]. In addition, the study of Prins et al. [30], which applies peer assessment in a computer-supported collaborative learning environment, showed that the students' attitude towards peer assessment was positive and that the assessment results added value to their performance. A different implementation of peer assessment, by Chang et al. [4], showed that the procedure can serve purposes other than performance enhancement. More specifically, that study develops a fuzzy peer assessment system for online use. With this system, students are divided into smaller groups, each assigned a specific task to complete, and the students within each group then assess the level of each other's contributions to the group's cooperative activities. This use of peer assessment allows all students to be rewarded according to their true participation in the final outcome of the group.

Peer assessment has also been used in the vocational sector. Keely et al. [19] report a peer assessment procedure applied to the written correspondence between health care providers, and more specifically to the consultation letters that these providers exchange. The study observes a high degree of satisfaction with the peer assessment procedure. In addition, the participants report that peer assessment led to positive changes in the quality of their consultation letters. After a six-month follow-up of the same participants, the study also reports that peer assessment produced lasting positive changes in the way the participants write their letters. Peer assessment is also used as a means of examining professional competence among medical students. In the study of Dannefer et al. [10], fifteen users evaluate the work habits, preparedness, initiative, respect and trustworthiness of their peers. The findings of this study suggest that peer assessment can be used to foster reflection on professional qualities and as a means of assessing professional skills. Another study focusing on peer assessment among professionals is that of Tsai et al. [34], in which twenty-four teachers took part in a three-round peer assessment in order to develop their science activities. The results of this work show that, as a result of the procedure, the teachers developed more creative science activities at both a theoretical and a practical level.

Another field where peer assessment is broadly used is academic publishing. Scientific journals internationally recognize this method as a quality assurance mechanism. To examine the use of peer assessment in the journal sector, Yue et al. [36] apply the procedure to forty-one clinical neurology journals, with peer opinions obtained from 254 members of the World Federation of Neurology. The results imply that peer assessment is a viable technique for assessing journal quality in the health sciences and provides a valuable tool for collection development decision-making by health care librarians. However, the study of Grainger [12] suggests that peer assessment is only useful to the academic publishing sector if the peer participants, especially those serving as reviewers, act with professional conduct and responsibility. In addition, this study stresses that the review process should be timely and of high quality, a condition which, if not met, might endanger the credibility and responsiveness of the journal. The time required to receive, file and organize article submissions, track article versions, match authors to reviewers and maintain correspondence with them is a problem even for journals with relatively few submissions. The technology introduced into the peer review process has helped to minimize the time needed and to reduce the related costs [24]. Another interesting result comes from the study of Schroter et al. [27], which examines the case where the authors of an article are given the opportunity to suggest the reviewers they consider most suitable, and compares author- and editor-suggested reviewers in terms of review quality. Using data from ten biomedical journals, this study reports that review quality did not differ significantly between author- and editor-suggested reviewers, although the author-suggested reviewers tended to make more favorable recommendations for publication. The study therefore concludes that editors can rely on author-suggested reviewers to produce reviews of adequate quality, but should be cautious when considering their publication recommendations. However, considering author-suggested reviewers as candidates for the peer assessment procedure is not always favored since, as reported by Clark et al. [6], these reviewers might be favorably predisposed towards the authors. Instead of this practice, that study suggests that editors can ask the authors of each article to briefly describe their contribution in relation to their prior work. This enables the editors to identify peers with knowledge relevant to the subject of each submitted article and to select the most appropriate reviewers accordingly.

From the above, one may observe that peer assessment has been found to be an especially beneficial quality assurance mechanism in various sectors. However, since reviewer assignment in typical peer assessment is performed manually by a few expert individuals, little attention has been given to automatically retrieving the optimal reviewer for a specific author. To this end, prototypes of author-reviewer pairs according to their level of proficiency are defined in [8] and [7]. In these studies, students are categorized as either "proficient" or "having difficulties". Fuzzy logic is then used to evaluate the possible level of satisfaction that the author would have with the comments of the reviewer. This is performed by assigning positive weights to the pairs (proficient, having difficulties) and (proficient, proficient) and a negative weight to the pair (having difficulties, having difficulties). Next, genetic algorithms are used to find the optimal match among the alternative possible mappings. However, this method does not adapt its estimations to the specific characteristics and preferences of each author as determined by the authors' feedback; instead, it uses a predetermined, static fuzzy logic model to determine the optimal pairs.


Therefore, a mechanism that adapts to the particular characteristics of each user should be sought in order to increase the effectiveness of the peer assessment procedure. Such a mechanism should automatically match authors to reviewers with the goal of increasing the usefulness that each author finds in the reviewers' comments, and thus improve the quality of the authors' work in a timely manner.

3 Feed Forward Neural Networks

Artificial neural networks have been successfully applied to various research and industry fields to perform tasks including forecasting, data classification and regression analysis.

The feed forward neural network (FFNN) architecture is one of the most popular forms of artificial neural networks. These networks were developed as a computational model of the functions and learning processes of the human brain. By mimicking biological neural networks, they attempt to learn from examples and generalize their findings to an unseen population.

Typically, a FFNN, as described in [14], consists of layers composed of several processing elements, called neurons. There are three types of layers: the input, the hidden and the output layers. In this type of network, the connections between neurons, called synapses, do not form any directed cycle: synapses exist only between neurons of successive layers, and information moves only forward, from the input nodes to the output nodes. A FFNN can therefore be considered an acyclic graph, as shown in figure 1.

Fig. 1 Feed Forward Neural Network Architecture (input layer, hidden layer, output layer)

The output y_k of the k-th neuron is calculated by multiplying its input vector x by its weight vector w_k, adding the bias b_k of the neuron and passing the result through the activation function f, as follows:


$$y_k = f(\mathbf{w}_k \cdot \mathbf{x} + b_k)$$

The activation function can be either linear or nonlinear, but its most common form is the logistic sigmoid function:

$$f(x) = \frac{1}{1 + \exp(-\beta x)}$$

where β is a slope parameter.
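As a minimal illustration of the two formulas above, the following Python sketch computes the output of a single neuron with a logistic sigmoid activation. The function names and the default β = 1 are illustrative choices, not part of the method described in this chapter.

```python
import math
from typing import Sequence


def sigmoid(z: float, beta: float = 1.0) -> float:
    """Logistic sigmoid f(z) = 1 / (1 + exp(-beta * z))."""
    return 1.0 / (1.0 + math.exp(-beta * z))


def neuron_output(x: Sequence[float], w: Sequence[float], b: float,
                  beta: float = 1.0) -> float:
    """y_k = f(w_k . x + b_k) for a single neuron."""
    if len(x) != len(w):
        raise ValueError("input and weight vectors must have the same length")
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(weighted_sum, beta)


print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0))  # weighted sum is 0, so 0.5
```

A larger β makes the transition around zero steeper. Note that `math.exp` raises `OverflowError` for very large arguments, which a production implementation would guard against.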

During its learning phase, the network is presented with a set of examples which form the network training set. Each example consists of an input vector and the corresponding output vector. The goal of the FFNN training is to minimize a cost function, which is typically defined as the mean square error between its actual and target outputs, by adjusting the network synaptic weights and neuron biases.

A very popular training algorithm is the back-propagation algorithm, proposed in [25] and [26]. According to this algorithm, information is passed forward from the input nodes, through the hidden layers, to the output nodes, and the error between the desired and the actual response of the network is calculated. This error signal is then propagated backwards, towards the input layer, and is used to adjust the weights and biases of the network. This process is repeated for each example in the training set. Each complete pass of the training set through the network constitutes one epoch. Since the training set may be presented to the network several times, many epochs may be needed before training finishes.
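The following Python sketch illustrates per-example back-propagation for a network with a single hidden layer of sigmoid neurons and one linear output neuron, minimizing the squared error. The architecture, learning rate and initialization range are assumptions chosen for illustration; this is plain gradient-descent back-propagation, not the exact training setup used in the experiments of this chapter.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyFFNN:
    """One hidden layer of sigmoid neurons and a single linear output neuron."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        """Forward pass: input -> hidden activations -> output."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, y

    def train_epoch(self, data: Sequence[Tuple[Sequence[float], float]],
                    lr: float) -> float:
        """One epoch of per-example back-propagation.

        Returns the mean squared error observed during the epoch
        (each example's error is measured before its own update).
        """
        sq_err = 0.0
        for x, target in data:
            h, y = self.forward(x)
            e = y - target                      # output error
            sq_err += e * e
            # Propagate the error back to the hidden layer (uses the old w2).
            delta_h = [e * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
            # Gradient-descent updates on 0.5 * e^2 for both layers.
            for j, hj in enumerate(h):
                self.w2[j] -= lr * e * hj
            self.b2 -= lr * e
            for j, dj in enumerate(delta_h):
                for i, xi in enumerate(x):
                    self.W1[j][i] -= lr * dj * xi
                self.b1[j] -= lr * dj
        return sq_err / len(data)


data = [([i / 10.0], i / 10.0) for i in range(11)]  # learn y = x on [0, 1]
net = TinyFFNN(n_in=1, n_hidden=3, seed=1)
first = net.train_epoch(data, lr=0.1)
for _ in range(500):
    last = net.train_epoch(data, lr=0.1)
print(first, last)  # the training error should have decreased
```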

A popular variation of the back-propagation algorithm is the Levenberg-Marquardt algorithm [13]. This algorithm increases the speed of convergence and the effectiveness of network training, since it has been found to be effective in solving nonlinear least-squares problems, such as minimizing the cost function of a FFNN.
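For reference, the standard Levenberg-Marquardt weight update can be written as follows. The notation is introduced here rather than taken from the chapter: w is the vector of all network weights and biases, e the vector of network errors over the training set, J the Jacobian of e with respect to w, I the identity matrix and μ > 0 a damping factor.

$$\Delta \mathbf{w} = -\left(\mathbf{J}^\top \mathbf{J} + \mu \mathbf{I}\right)^{-1} \mathbf{J}^\top \mathbf{e}$$

When μ is large, the update approaches a short gradient-descent step; as μ approaches zero, it approaches the Gauss-Newton step. In typical implementations μ is decreased after a step that reduces the cost function and increased otherwise.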

However, a FFNN may end up being overtrained. In this case, the weights and biases of the network are over-adjusted and reflect only the specific characteristics of the training set, so the FFNN loses its generalization ability. This phenomenon, called over-fitting, can be avoided by using a separate set, called the validation set, during training. At the end of each epoch, the network error is calculated on both the training and the validation sets. While the training set error is used to adjust the network parameters, the validation set error is used only to decide when to stop the learning process. More specifically, as soon as the network performance deteriorates on the validation set, which indicates that overtraining has probably begun, training stops and the network parameters from the previous epoch are retained. Training therefore terminates when the cost function reaches a minimum, when the performance goal is met, or when the validation set mean square error starts to increase.
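The stopping rules above can be sketched in Python as follows. This is a minimal illustration assuming a model object that a hypothetical `train_epoch` callable updates in place and a hypothetical `val_mse` callable that returns the validation error; the epoch limit `max_epochs` stands in for detecting a minimum of the cost function. Following the description above, training stops at the first increase of the validation error, although practical implementations often tolerate several consecutive increases first.

```python
import copy
from typing import Any, Callable, Tuple


def train_with_early_stopping(model: Any,
                              train_epoch: Callable[[Any], float],
                              val_mse: Callable[[Any], float],
                              max_epochs: int = 1000,
                              goal: float = 0.0) -> Tuple[Any, str]:
    """Train until the goal is met, the validation error rises, or epochs run out."""
    prev_state = copy.deepcopy(model)
    prev_val = val_mse(model)
    for _ in range(max_epochs):
        train_err = train_epoch(model)        # adjusts parameters using the training set
        if train_err <= goal:
            return model, "performance goal met"
        val_err = val_mse(model)              # monitored only, never used for updates
        if val_err > prev_val:
            # Overtraining has probably started: keep the previous epoch's parameters.
            return prev_state, "validation error increased"
        prev_state, prev_val = copy.deepcopy(model), val_err
    return model, "maximum number of epochs reached"


# Toy example: the "model" only counts epochs, and the validation error
# follows a fixed curve that starts rising after epoch 2.
val_curve = [5.0, 4.0, 3.0, 3.5, 2.0]


def toy_train_epoch(m):
    m["epoch"] += 1
    return 10.0 / m["epoch"]


def toy_val_mse(m):
    return val_curve[m["epoch"]]


best, reason = train_with_early_stopping({"epoch": 0}, toy_train_epoch, toy_val_mse)
print(best, reason)  # {'epoch': 2} validation error increased
```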

To examine the efficiency of a network on a specific problem, this study uses two typical strategies. The first is called k-fold repeated random subsampling. According to this method, the network is trained k times, each time using a different validation set randomly extracted from the dataset. The accuracy of the network is then calculated as the mean performance of the k trained networks. The second method uses a data set that is disjoint from the validation and training sets, called the test set. The test set is used to estimate the generalization ability of a specific network on data different from those used during the training phase.
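A minimal Python sketch of k-fold repeated random subsampling follows. The `train_and_score` callable is a hypothetical placeholder that would train a fresh network on the training split and return its error on the validation split; the default 15% validation fraction mirrors the value reported in Section 5 but is otherwise an assumption. Unlike standard k-fold cross-validation, the validation sets of different runs may overlap.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def repeated_random_subsampling(data: Sequence[T],
                                train_and_score: Callable[[List[T], List[T]], float],
                                k: int,
                                val_fraction: float = 0.15,
                                seed: int = 0) -> float:
    """Train k networks on random train/validation splits; return the mean score."""
    rng = random.Random(seed)
    n_val = round(val_fraction * len(data))
    scores = []
    for _ in range(k):
        idx = list(range(len(data)))
        rng.shuffle(idx)                      # a fresh random split for every run
        val = [data[i] for i in idx[:n_val]]
        train = [data[i] for i in idx[n_val:]]
        scores.append(train_and_score(train, val))
    return mean(scores)
```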

3.1 Strengths and Limitations of Using FFNNs

Neural networks present various strengths that make them suitable for classification and prediction tasks. One of their main advantages is that FFNNs are universal function approximators: given enough hidden neurons, they can approximate any continuous function to any desired degree of accuracy [9, 11, 15-17]. As a result, neural networks can efficiently map nonlinear relationships between their inputs and outputs.

Additionally, FFNNs have the ability to generalize to an unseen population. A neural network can learn from examples and correctly predict the output for data that are not included in its training set, even if the training examples contain noisy information. The robustness of neural networks in the presence of noise in the input data is one of their most significant advantages [31].

FFNNs have the advantage of being data-driven instead of model-driven; that is, they do not assume a priori an explicit model of the relationship among the data, as model-based linear or nonlinear methods do. Instead, the model structure and the model parameters they use are derived from the actual dataset of the problem.

Moreover, real-world problems are often nonlinear, and the relationships among their data are difficult to describe analytically. Usually, the only available information about these problems is prior experience in the form of past data. Therefore, given the characteristics of FFNNs, which include arbitrary function approximation, nonlinearity and generalization capability, they can be expected to predict future events efficiently.

Trained neural networks can quickly make predictions on new inputs. This characteristic, along with their high degree of accuracy, makes them suitable for applications where training is needed only occasionally but predictions must be made in real time.

Nevertheless, besides their strengths, neural networks also present certain limitations. Firstly, neural networks usually require considerable training time because of the number of iterations needed to achieve their optimal performance. Moreover, while minimizing the cost function during training, they may become trapped in local minima and therefore fail to reach the optimal solution. To overcome this, multiple training runs are usually performed and the best-trained network is selected [18].

Another limitation of neural networks is their dependency on the size and quality of the data used for their training [14]. The more representative of the problem the examples they are presented with, the more accurate the predictions they are expected to make. In addition, although they can infer a correct solution from noisy data, they have difficulty making correct predictions on data that contradict those used for their training.

Finally, neural networks are black-box methods. As such, they cannot be analyzed in as much detail as linear models, and the data relationships they approximate cannot be easily described [1].


4 The Proposed Method

In this section the proposed method is described in detail. The method uses a trained feed forward neural network to estimate the optimal set of reviewers for each author, in order to make the peer assessment process more effective.

In order for the proposed method to provide results that adapt to the characteristics of each user, peer profiles are constructed first. Since an individual may serve either as a reviewer or as an author, two distinct profile types are created: a reviewer profile and an author profile.

Reviewer profiles consist of information about the proficiency of the reviewer, the average strictness that the reviewer has demonstrated in grading past author projects, the average usefulness rating that the reviewer's comments have received and the reviewer's willingness to participate in the peer assessment procedure. Reviewer proficiency can be calculated from the available data about his achievements in the field. For instance, in the case of students, past academic performance and average grades can be used, while in the case of peer assessment in the academic sector, the method can use the number of journal articles that the reviewer has published. Average strictness refers to the average rating that the reviewer has given authors in the reviews he has submitted in the past. The average usefulness attribute is calculated from the quality feedback values that the reviewer has received from authors in the past. Usefulness is quantified through a five-point Likert-scale item placed at the end of each comment that the authors receive. More specifically, after reading a comment made by a specific reviewer, the author is asked to rate how useful he found it. The usefulness attribute may thus take five possible values, ranging from 1 (not useful at all) to 5 (very useful). Finally, the reviewer's willingness to participate in the procedure is derived from the number of reviews he has completed relative to the total number of reviews he has been assigned.

Author profiles are constructed from the author's proficiency level and the average grade that the author has received from reviewers. These attributes are summarized in Table 1.
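The profile attributes can be sketched in Python as follows. The record layout, the use of a single overall grade per review, the externally supplied proficiency score and the assumption that the records contain only completed reviews are all illustrative choices; the chapter does not specify how proficiency is scaled or which review grade feeds the strictness attribute. The sketch also assumes that each peer has at least one past review, since `statistics.mean` raises an error on empty data.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass
class ReviewRecord:
    reviewer: str
    author: str
    grade: float       # overall grade (1-5) the reviewer gave the author's project
    usefulness: float  # usefulness (1-5) the author assigned to the comments


def reviewer_profile(name: str, proficiency: float,
                     records: Sequence[ReviewRecord], assigned: int) -> List[float]:
    given = [r for r in records if r.reviewer == name]  # completed reviews only
    return [
        proficiency,
        mean(r.grade for r in given),       # average strictness (mean grade given)
        mean(r.usefulness for r in given),  # average usefulness rate received
        len(given) / assigned,              # willingness: completed / assigned reviews
    ]


def author_profile(name: str, proficiency: float,
                   records: Sequence[ReviewRecord]) -> List[float]:
    received = [r for r in records if r.author == name]
    return [proficiency, mean(r.grade for r in received)]  # average reviewer grading


records = [ReviewRecord("R", "A", 4, 5), ReviewRecord("R", "B", 2, 3),
           ReviewRecord("S", "A", 3, 4)]
print(reviewer_profile("R", 0.8, records, assigned=4))  # [0.8, 3, 4, 0.5]
print(author_profile("A", 0.6, records))                 # [0.6, 3.5]
```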

The first time the algorithm is applied, it creates reviewer-author pairs randomly. As soon as this first reviewing phase is over, authors rate the usefulness of the reviewer comments they have received. Then, based on the usefulness ratings obtained by the previously assigned reviewer-author pairs, the algorithm adapts itself to calculate the optimal pairs.

At each stage of the peer assessment procedure, the algorithm uses a trained FFNN to estimate the optimal order of reviewers for each author, that is, to provide a ranked list of possible reviewers for each author. This list is calculated from the usefulness ratings that, according to the FFNN's estimates, the author would probably assign to the comments of each reviewer.

Figure 2 depicts how the FFNN technique uses author and reviewer profiles as input to estimate the usefulness level that the author of each pair would assign. Specifically, each input vector is the concatenation of the reviewer profile and the author profile, and the FFNN output is the usefulness rating that the author would probably assign to the reviewer's comments.
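The following Python sketch shows how such a ranked list could be produced from concatenated profile vectors. `predict` stands for a trained FFNN and is a hypothetical placeholder; the usage example passes a trivial stand-in that simply returns the reviewer's average usefulness, so the code runs on its own without a trained network. The attribute order within each profile vector follows Table 1.

```python
from typing import Callable, Dict, List, Sequence


def rank_reviewers(author_vec: Sequence[float],
                   reviewer_vecs: Dict[str, Sequence[float]],
                   predict: Callable[[List[float]], float]) -> List[str]:
    """Return reviewer names sorted by estimated usefulness, best first."""
    scores = {}
    for name, rev_vec in reviewer_vecs.items():
        x = list(rev_vec) + list(author_vec)   # reviewer profile followed by author profile
        scores[name] = predict(x)              # estimated usefulness for this pair
    return sorted(scores, key=scores.get, reverse=True)


# Reviewer profile: [proficiency, avg strictness, avg usefulness, willingness]
# Author profile:   [proficiency, avg reviewer grading]
reviewers = {"R1": [0.5, 3.0, 4.0, 1.0],
             "R2": [0.9, 2.0, 2.5, 0.5],
             "R3": [0.7, 4.0, 4.5, 0.8]}
author = [0.6, 3.5]


def stand_in_ffnn(x: List[float]) -> float:
    """Placeholder, NOT a trained network: returns the reviewer's avg usefulness."""
    return x[2]


print(rank_reviewers(author, reviewers, stand_in_ffnn))  # ['R3', 'R1', 'R2']
```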


Table 1 Peer profile attributes

Peer type | Level of proficiency in the field | Average strictness | Average usefulness rate | Willingness to participate in the procedure | Average reviewer grading
Author | X | - | - | - | X
Reviewer | X | X | X | X | -

Fig. 2 The use of peer profiles by the FFNN technique (reviewer profile and author profile as inputs to the FFNN, which outputs the estimated usefulness)

Each time the algorithm is used to generate optimal reviewer-author pairs, it updates the peer profiles to incorporate recent user behavior. Therefore, as time progresses, the method adapts itself to the more detailed user data that have been gathered, and in this way increases its effectiveness. Figure 3 describes the algorithmic steps of the proposed method.

First, for each author, an ordered list named RevList of the estimated preferred reviewers is calculated by matching the author profile against the profile of each reviewer. This list comprises the reviewer profiles and the usefulness rates estimated by the FFNN, as described earlier. Then, the k reviewers with the highest estimated usefulness are copied from the preferred reviewer list (RevList) to a set of possible reviewers (PosRev) for this author. However, since a reviewer may only comment on a predetermined number of peer projects, some reviewers in the PosRev set for this author may already be fully occupied with other assignments and thus be unavailable. Therefore, before selecting any reviewer, the algorithm examines the availability of the reviewers in the PosRev set and removes those that cannot review further assignments. Next, if any possible reviewers remain, it randomly selects one of them and assigns this reviewer the task of commenting on the author's project. As soon as a reviewer has been matched to an author, his availability in RevAvail is updated; he is then removed from the set of possible reviewers for this author and inserted into the set of selected reviewers (called Rev). However, after examining the availability of the current possible reviewers, the algorithm may find that no reviewer in PosRev is available. In that case, it examines the availability of the next reviewers in the ordered list returned by the FFNN until an available reviewer is found. This procedure is repeated until every author has been matched to a predefined number n of reviewers.


Fig. 3 Algorithm Description


where RevList is the ordered list of predicted reviewers for a specific author, PosRev is the set of candidate reviewers, RevAvail is a table indicating reviewer availability, Rev is the set of selected reviewers, k is the initial size of PosRev, b is the index of the last examined reviewer in RevList and n is the predefined number of reviewers to be assigned to each author.
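The steps of Fig. 3, as described in the text, can be sketched in Python as follows. This is a hypothetical reconstruction from the prose rather than the authors' code: it assumes that authors are processed one after another, that each RevList already excludes the author, and that `capacity` (playing the role of RevAvail) holds the number of further reviews each reviewer may take. If the ordered list is exhausted before n reviewers are found, the sketch raises an error; the chapter does not say how this case is handled.

```python
import random
from typing import Dict, List


def match_reviewers(rev_lists: Dict[str, List[str]],
                    capacity: Dict[str, int],
                    k: int, n: int,
                    rng: random.Random) -> Dict[str, List[str]]:
    """Assign n reviewers to every author (hypothetical sketch of Fig. 3).

    rev_lists[a] -- RevList for author a: reviewers sorted by descending
                    estimated usefulness, assumed to exclude a itself.
    capacity[r]  -- RevAvail: number of further reviews reviewer r may take;
                    updated in place.
    """
    assignments: Dict[str, List[str]] = {}
    for author, rev_list in rev_lists.items():
        pos_rev = list(rev_list[:k])   # PosRev: the k best-estimated reviewers
        b = k - 1                      # index of the last examined reviewer in RevList
        rev: List[str] = []            # Rev: reviewers selected for this author
        while len(rev) < n:
            # Drop candidates that cannot take further assignments.
            pos_rev = [r for r in pos_rev if capacity[r] > 0]
            if pos_rev:
                chosen = rng.choice(pos_rev)  # random pick among the candidates
                capacity[chosen] -= 1         # update RevAvail
                pos_rev.remove(chosen)
                rev.append(chosen)
            else:
                b += 1                        # fall back to the next reviewer in RevList
                if b >= len(rev_list):
                    raise ValueError(f"not enough available reviewers for {author}")
                pos_rev.append(rev_list[b])
        assignments[author] = rev
    return assignments


rev_lists = {"A": ["B", "C", "D"], "B": ["A", "C", "D"],
             "C": ["A", "B", "D"], "D": ["A", "B", "C"]}
capacity = {p: 3 for p in "ABCD"}
result = match_reviewers(rev_lists, capacity, k=2, n=2, rng=random.Random(0))
print({a: sorted(r) for a, r in result.items()})
# {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B'], 'D': ['A', 'B']}
```

Because each selected reviewer is removed from PosRev and b only moves forward, no reviewer is assigned twice to the same author.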

Selecting one reviewer at random among the k best-estimated candidates, rather than always taking the top-ranked one, ensures that the algorithm does not repeatedly assign the same reviewers to the same authors, and therefore improves its fairness.

5 Experimental Results

5.1 Method Implementation on e-Learning Data

To examine the effectiveness of the proposed method, this study uses educational peer assessment data, derived from an introductory level e-learning course on “Web Design”. The course is provided by the e-learning team of the Multimedia Technology Laboratory of the National Technical University of Athens [22], through the Moodle open-source LMS platform [23].

The Web Design course consists of seven educational sections and is offered twice a year, in the Spring and Fall semesters. Over the seven sections, the educational material is delivered to the students, and their knowledge is assessed through five multiple-choice tests, which examine the theoretical knowledge the students have acquired, and seven projects, which test the practical application of this knowledge.

The projects require students to create a web site, which is assessed in terms of functionality, design and technical soundness. Functionality refers to how easy it is for a user to find the desired information in a web page, while design refers to the choice of colors, images and fonts in the page. The technical soundness of a web page refers to how well written its source code is. For a page to achieve the maximum possible grade, it should excel in all of the aforementioned criteria.

The course is introductory and targeted towards adults of various educational backgrounds, ranging from high-school graduates to holders of master's degrees. Nevertheless, students are advised to have basic computer and English language skills, since an important part of the material is delivered in English.

Since the Spring 2008 course, students have been participating in the peer assessment procedure at the end of each educational module. More specifically, each student is asked to review two randomly selected projects of fellow classmates and receives the comments of two reviewers. Students grade each other's projects by filling in a review form with four questions, regarding the design of the project, its technical soundness, its functionality and the overall impression. The reviewers are thus asked to give their opinions on the three web page criteria mentioned above and to provide an overall grade.

The grading scale for each question ranges from 1 (negative impression) to 5 (positive impression). As soon as a review form has been filled in, it is made visible to the author, who is then asked to evaluate the usefulness of the received comments. To ensure that students provide feedback that is as objective as possible, peer assessment ratings do not contribute to student grading.

5.2 Method Results

In this section the preliminary experimental results of the method are presented. The dataset consists of 152 reviews conducted by 16 students during the Spring 2008 semester. The method was implemented in the MATLAB R2008a environment [21].

To examine network efficiency, we first used 1000-fold repeated random subsampling. According to this method, the network was trained 1000 times, each time using a validation set randomly extracted from the dataset. The validation set comprised 15% of the dataset (23 examples), leaving the remaining 85% (129 examples) for the training set. The accuracy of the network was calculated as the average performance of the 1000 networks. The resulting mean absolute error (MAE) was 0.7682. This result indicates that the network's estimates of the usefulness an author perceives in a reviewer's comments are acceptably accurate, since on average the error is below one usefulness level on the five-point scale.
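As a small illustration of the figures above, the following Python sketch computes the split sizes for a 15% validation fraction of the 152-review dataset and defines the mean absolute error used to summarize each run. The ratings in the usage line are made-up numbers for illustration only, not data from the study.

```python
from typing import Sequence, Tuple


def split_sizes(n_examples: int, val_fraction: float) -> Tuple[int, int]:
    """Return (training size, validation size) for one random split."""
    n_val = round(val_fraction * n_examples)
    return n_examples - n_val, n_val


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between estimated and actual usefulness."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


print(split_sizes(152, 0.15))                           # (129, 23)
print(mean_absolute_error([3.5, 4.0, 2.0], [4, 4, 3]))  # 0.5
```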

The second strategy used to estimate network efficiency relies on three disjoint sets, namely the training, validation and test sets described earlier. In this case, the data regarding a single student as an author were used as the test set, and the remaining data were used as the training and validation sets. This strategy was applied to each of the 16 students. Since rating criteria vary between students, some being stricter than others, the network outputs were not compared with the actual ratings directly; instead, they were used to determine the optimal reviewer order for each student. To this end, the outputs were sorted in descending order, from the best-matching reviewer to the least preferred one. The estimated order was then compared against the actual preference order of each author.
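The chapter does not define exactly how the prediction accuracies reported in Table 2 are computed. One plausible reading, assumed here purely for illustration, is the fraction of an author's actual k best reviewers that also appear among the k reviewers ranked highest by the network; a minimal Python sketch of that hypothetical measure follows.

```python
from typing import Sequence


def top_k_overlap(predicted_order: Sequence[str], actual_order: Sequence[str],
                  k: int) -> float:
    """Fraction of the actual k best reviewers found in the predicted top k."""
    if k <= 0 or k > min(len(predicted_order), len(actual_order)):
        raise ValueError("k must be between 1 and the length of both rankings")
    return len(set(predicted_order[:k]) & set(actual_order[:k])) / k


predicted = ["R3", "R1", "R2", "R4"]   # network outputs sorted in descending order
actual = ["R1", "R3", "R4", "R2"]      # author's actual preference order
print([round(top_k_overlap(predicted, actual, k), 2) for k in range(1, 5)])
# [0.0, 1.0, 0.67, 1.0]
```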

Table 2 presents the accuracy in predicting the first one, two, three, four and five best reviewers preferred by eight indicative students. As one may observe, the proposed method achieved good results for Student 1: it was accurate 87% of the time in finding the best reviewer for Student 1, 83% in predicting the best 2 reviewers, 80% in finding the best 3 and 78% in proposing the best 4 and 5 reviewers, according to the author's profile. The best reviewers of Students 2, 3 and 8 were also predicted accurately, but at lower accuracy rates. The method was less accurate for Students 5 and 6 on the first two criteria, but its effectiveness increased on the remaining criteria. The method was not successful in predicting the best 1, 2 and 3 reviewers for Student 4, but presented satisfactory results on criteria 4 and 5. The method failed to predict the correct reviewers for Student 7, as it was correct only 5% and 3% of the time on criteria 1 and 2 respectively and did not exceed 50% on the rest.

The overall results of the method were 49%, 54%, 72%, 75% and 76% for the five criteria respectively. These preliminary results indicate that the best reviewers should be selected from among the first 3, 4 or 5 estimated reviewers.


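Table 2 Indicative examples of the method accuracy (criterion k: accuracy in predicting the author's k best reviewers)

Student no. | Criterion 1 | Criterion 2 | Criterion 3 | Criterion 4 | Criterion 5
#1 | 87% | 83% | 80% | 78% | 78%
#2 | 64% | 70% | 75% | 76% | 78%
#3 | 31% | 83% | 76% | 72% | 69%
#4 | 51% | 45% | 37% | 60% | 60%
#5 | 29% | 26% | 93% | 88% | 84%
#6 | 21% | 25% | 68% | 64% | 64%
#7 | 5% | 3% | 49% | 48% | 49%
#8 | 72% | 75% | 75% | 76% | 74%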

6 Discussion

This study proposes a novel method that uses a popular machine learning technique to improve the efficiency of peer assessment procedures. The method attempts to match reviewers to authors so as to maximize the estimated quality feedback that each author gives the assigned reviewer, in the form of the usefulness the author finds in the reviewer's comments. The preliminary results, obtained by applying the proposed method to an e-learning course on "Web Design", seem promising.

By improving the peer assessment procedure, the group that evaluates itself is expected to increase the quality of its work. In the e-learning course, a student's project is evaluated in terms of three criteria: functionality, technical soundness and design. A student might, for example, have good aesthetic taste, so that the projects he submits are well designed, yet lack the knowledge needed to write efficient source code. A reviewer familiar with web page development would then be the best choice for this author, helping him become a more effective web developer. The opposite case can also occur, where a student can develop a bug-free web site but does not know how to improve the page aesthetically. Therefore, by matching each author to the best estimated reviewer, each author's deficiencies should be addressed through the personalized peer assessment procedure.

The proposed method is also expected to help peer assessment supervisors match author-reviewer pairs automatically and efficiently. Especially in large populations, matching reviewers to authors takes a large amount of the supervisor’s time, and it is sometimes difficult to judge who would be a good reviewer for a given author. The proposed method could help reduce these shortcomings of peer assessment, speed up the process and help the population obtain more high-quality reviews.

The proposed method depends heavily on the user profiles that are gathered, so highly detailed data could enable the system to provide better reviewer-author matches. In the e-learning case study, the profiles included the user proficiency, the average strictness, the average usefulness that the reviewer had received in the past (from the authors whose work he had commented on) and the average grade an author had received in the peer assessment procedure.

To increase the effectiveness of the proposed method, more data could be examined, for example student demographic characteristics, student engagement with the course and student progress. Nevertheless, the preliminary results presented in this study seem promising and can be considered a first step towards facilitating the peer assessment procedure.

The size of the group might also be an important factor in producing accurate results. Firstly, with a large group, a large training set becomes available within a short period, so the method can learn from very different student cases. Secondly, applying peer assessment to a relatively small group might compromise the anonymity that is essential for peer assessment to succeed: students get acquainted with each other, and the ratings they provide tend to increase. Finally, a large group in which competition among members is high can produce more comprehensive reviews, so that authors receive higher-quality feedback.

Furthermore, the proposed method can be applied to fields other than e-learning courses. To this end, the only thing that needs to change is the gathering of user profile data relevant to the specific process to which the method is applied. For instance, in the professional sector, peer assessment has already helped towards better collaborative projects. Another sector to which the proposed method could be applied is the peer review of research publications. In both cases, the user profile characteristics might follow the general attribute descriptions of Table 1.

As far as machine learning is concerned, FFNNs are universal function approximators: given enough hidden neurons, they can approximate any continuous function to arbitrary accuracy. However, their accuracy drops when the training examples they are fed contradict each other. In the dataset used in this study, some students rate their reviewers inconsistently, sometimes assigning a higher grade to a bad review than to a good one, especially in the early course sections. This is because at the initial stages of the course students are not yet familiar with the peer assessment procedure; later on, they provide both better reviews and more reliable quality feedback. Training students to participate in peer assessment therefore seems to be a very important part of the procedure.

Trained neural networks can make predictions on unseen data quickly. This characteristic, together with their high accuracy, makes them suitable for applications where training is performed only occasionally but predictions must be made in real time. Their training, however, takes a certain amount of time to complete. Moreover, FFNN training might fail, so several training sessions might be needed before the network performs well, which further increases the total training time.
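The effect of repeated training sessions can be illustrated with a minimal sketch: a one-hidden-layer feedforward network (tanh hidden units, sigmoid output, mean-squared-error loss) is trained by plain batch gradient descent from several random initialisations, and the session with the lowest final loss is kept. This is not the authors' training setup. The network size, learning rate, number of epochs, the XOR toy data and the function names `train_ffnn` and `train_with_restarts` are all illustrative assumptions. In practice, the best session would be chosen on a held-out validation set rather than on the training loss.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_ffnn(X, y, n_hidden, seed, epochs=2000, lr=0.5):
    """One training session: tanh hidden layer, sigmoid output, MSE loss, batch gradient descent."""
    rng = np.random.default_rng(seed)              # the seed fixes the random initialisation
    W1 = rng.normal(0.0, 1.0, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0, (n_hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                   # hidden activations
        out = sigmoid(h @ W2 + b2)                 # network outputs
        # Gradient of mean((out - y)**2) with respect to the output pre-activation.
        d_out = (2.0 / y.size) * (out - y) * out * (1.0 - out)
        d_h = (d_out @ W2.T) * (1.0 - h ** 2)      # back-propagate through tanh (uses old W2)
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    out = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
    loss = float(np.mean((out - y) ** 2))
    return loss, (W1, b1, W2, b2)


def train_with_restarts(X, y, n_hidden=4, n_restarts=5, **train_kwargs):
    """Run several independent sessions and keep the one with the lowest final loss."""
    losses = []
    best = None
    for seed in range(n_restarts):
        loss, params = train_ffnn(X, y, n_hidden, seed, **train_kwargs)
        losses.append(loss)
        if best is None or loss < best[0]:
            best = (loss, seed, params)
    return best, losses


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    y = np.array([[0.0], [1.0], [1.0], [0.0]])     # XOR targets
    (best_loss, best_seed, _), losses = train_with_restarts(X, y)
    print("loss per session:", [round(l, 4) for l in losses])
    print("kept session", best_seed, "with loss", round(best_loss, 4))
```

Each seed gives a different starting point. Some sessions may stall in a poor local minimum while others converge, so keeping the best of several sessions trades extra training time for a more reliable network.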

Future work includes testing the method on a larger dataset, which can provide more indicative examples for network training. In addition, the dataset should first be preprocessed or transformed to reduce the influence of social factors on the review process. Other machine learning techniques can also be tested on the task of matching the optimal reviewers to authors based on user profiling. Another open question is whether the proposed method actually helps authors improve their performance and the quality of their final submitted work. Finally, the quality feedback, in the form of the usefulness attribute, should be re-examined, and the use of a fully objective and unbiased metric may also be considered.

7 Conclusion

This study proposes a method that uses a popular form of machine learning, feedforward neural networks, to determine the optimal reviewers for a specific author during a peer assessment procedure. The proposed method matches reviewer profiles to author profiles and aims to assign the work of each author to the reviewer who will make the most useful comments.

Preliminary experimental results on educational e-learning data indicate that the method is promising: using neural networks to estimate the optimal 3 to 5 peer reviewers achieved over 72% accuracy. Besides e-learning, the method may be applied to various types of peer assessment procedures in web-based environments where past data about the users involved in the process are available.

References

[1] Andrews, R., Diederich, J., Tickle, A.B.: Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge-Based Systems 8, 373–389 (1995)

[2] Barak, M., Rafaeli, S.: On-line question-posing and peer-assessment as means for web-based knowledge sharing in learning. International J. Human Computer Studies 61, 84–103 (2004)

[3] Berg van den, I., Admiraal, W., Pilot, A.: Design Principles and Outcomes of Peer Assessment. Stud. in High Education 31, 341–356 (2006)

[4] Chang, T., Chen, Y.: Cooperative learning in E-learning: A peer assessment of student-centered using consistent fuzzy preference. Expert Systems Appl. 36, 8342–8349 (2009)

[5] Chen, Y.C., Tsai, C.C.: An educational research course facilitated by online peer assessment. Innovations Education Teach International 46, 105–117 (2009)

[6] Clark, T., Wright, M.: Reviewing Journal Rankings and Revisiting Peer Reviews: Editorial Perspectives. J. Management Studies 44, 612–621 (2007)

[7] Crespo, R.M., Pardo, A., Pérez, J.P.S., Kloos, C.D.: An Algorithm for Peer Review Matching Using Student Profiles Based on Fuzzy Classification and Genetic Algorithms. In: Ali, M., Esposito, F. (eds.) IEA/AIE 2005. LNCS (LNAI), vol. 3533, pp. 685–694. Springer, Heidelberg (2005)

[8] Crespo, R.M., Pardo, A., Kloos, C.D.: An adaptive strategy for peer review. In: Frontiers in Education, Savannah, ASEE/IEEE (2004)

[9] Cybenko, G.: Approximation by superpositions of a sigmoidal function. Mathematics Control Signals Syst. 2, 303–314 (1989)

[10] Dannefer, E.F., Henson, L.C., Bierer, S.B., et al.: Peer assessment of professional competence. Med. Educ. 39, 713–722 (2005)

[11] Funahashi, K.I.: On the approximate realization of continuous mappings by neural networks. Neural Networks 2, 183–192 (1989)

[12] Grainger, D.W.: Peer review as professional responsibility: A quality control system only as good as the participants. Biomaterials 28, 5199–5203 (2007)

[13] Hagan, M.T., Menhaj, M.B.: Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Networks 5, 989–993 (1994)

[14] Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice-Hall, Englewood Cliffs (1999)

[15] Hornik, K.: Some new results on neural network approximation. Neural Networks 6, 1069–1072 (1993)

[16] Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Networks 4, 251–257 (1991)

[17] Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Networks 2, 359–366 (1989)

[18] Iyer, M.S., Rhinehart, R.R.: A method to determine the required number of neural-network training repetitions. IEEE T Neural Networ. 10, 427–432 (1999)

[19] Keely, E., Myers, K., Dojieiji, S., et al.: Peer assessment of outpatient consultation letters – feasibility and satisfaction. BMC Med. Education 22, 7–13 (2007)

[20] Ljungman, A.G., Silen, C.: Examination Involving Students as Peer Examiners. Assessment & Evaluation in Higher Education 33, 289–300 (2008)

[21] Matlab, Matlab Environment (2008), http://www.mathworks.com/products/matlab/

[22] Medialab, E-Learning Services, Multimedia Technology Laboratory, National Technological University of Athens (2008), http://elearn.medialab.ntua.gr

[23] Moodle, Moodle LMS (2008), http://moodle.org

[24] Rowland, F.: The Peer Review Process. Learned Publishing 15, 247–258 (2002)

[25] Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation. Parallel distributed processing: Explorations in the microstructure of cognition 1, 318–362 (1986a)

[26] Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986b)

[27] Schroter, S., Tite, L., Hutchings, A., Black, N.: Differences in Review Quality and Recommendations for Publication Between Peer Reviewers Suggested by Authors or by Editors. The Journal of the American Medical Association 295, 314–317 (2006)

[28] Sluijsmans, D.M.A., Prins, F.: A conceptual framework for integrating peer assessment in teacher education. Studies in Educational Evaluation 32, 6–22 (2006)

[29] Sluijsmans, D.M.A., Brand-Gruwel, S., Merrienboer, J.J.G.: Peer assessment training in teacher education: effects on performance and perceptions. Assessment and Evaluation in Higher Education 27, 443–454 (2002)

[30] Prins, F.J., Sluijsmans, D.M.A., Kirschner, P.A., Strijbos, J.W.: Formative peer assessment in a CSCL environment: A case study. Assessment and Evaluation in Higher Education 30, 417–444 (2002)

[31] Thrun, S.B.: Extracting provably correct rules from artificial neural networks, Technical Report, UMI Order Number: IAI-TR-93-5, University of Bonn, Germany (1994)

[32] Topping, K.J.: Peer Assessment. Theory into Practice 48, 20–27 (2009)

[33] Topping, K.: Peer assessment between students in colleges and universities. Review of Educational Research 68, 249–276 (1998)

[34] Tsai, C.C., Lin, S.S.J., Yuan, S.M.: Developing science activities through a networked peer assessment system. Computers & Education 38, 241–252 (2002)

[35] Wen, M.L., Tsai, C.-C.: University students’ perceptions of and attitudes toward (online) peer Assessment. Higher Education 51, 27–44 (2006)

[36] Yue, W., Wilson, C.S., Boller, F.: Peer assessment of journal quality in clinical neurology. Journal of the Medical Library Association 95, 70–76 (2007)


Towards Emotion Recognition from Speech: Definition, Problems and the Materials of Research

Christos-Nikolaos Anagnostopoulos and Theodoros Iliou

Cultural Technology and Communication Department, University of the Aegean, Mytilene, Lesvos Island, GR-81100
e-mail: canag,[email protected]

Abstract. One hundred thirty-three (133) sound/speech features extracted from pitch, Mel Frequency Cepstral Coefficients, energy and formants were evaluated in order to create a feature set sufficient to discriminate between seven emotions in acted speech. After appropriate feature selection, Multilayered Perceptrons were trained for emotion recognition on the basis of a 23-input vector, which provides information about the prosody of the speaker over the entire sentence. Several experiments were performed and the results are presented analytically. Extra emphasis was given to assessing the proposed 23-input vector in a speaker-independent framework, where speakers are not “known” to the classifier. The proposed feature vector achieved promising results (51%) for speaker-independent recognition in seven emotion classes. Moreover, for the problem of classifying high- and low-arousal emotions, our classifier reaches 86.8% successful recognition. The second classification model incorporated a Support Vector Machine with 35 predictive variables. This feature vector achieved promising results (78%) for speaker-independent recognition in seven emotion classes. Moreover, for classifying high- and low-arousal emotions, this classifier reaches 100% successful recognition for high-arousal and 87% for low-arousal emotions. Besides the combination of speech processing and artificial intelligence techniques, new approaches incorporating linguistic semantics could play a critical role in helping computers understand human emotions better.

Keywords: Emotion recognition, speech processing, neural networks.

1 Introduction

Communication is an important capability, based not only on the linguistic part but also on the emotional part. In the field of human-computer interaction (HCI), emotion recognition by the computer is still a challenging issue, especially when the recognition is based solely on voice, which is the basic means of human communication. In human-computer interaction systems, emotion recognition could provide users with improved personalization services by adapting to their emotions. Therefore, emotion detection from speech could have many potential applications in making the computer more adaptive to the user’s needs.

The most expressive way humans display emotions is through facial expressions and speech characteristics. Recently, the information provided by cameras and microphones has enabled the computer to “see” and “hear” the user through advanced image and sound processing techniques, in systems similar to the one presented in Figure 1. Therefore, one of the skills that a computer can potentially develop is the ability to understand the emotional state of a person. Feedback from the user has traditionally been given through the keyboard, the mouse or specialized interfaces, such as data gloves, touch screens and biosensors. An automated human affect analyzer should include all human interactive modalities (sight, sound and even haptics), and it should also be able to analyze nonverbal interactive signals (facial expressions, body gestures and physiological reactions).

Another possibility is to also include modules for linguistic processing of speech. Speech generally carries linguistic information (i.e. words) that can be associated with emotions (e.g. the word “happy” is correlated with a happy person), along with paralinguistic information, which is extracted by speech processing methods. Linguistic information identifies the qualitative patterns that the speaker has articulated, while paralinguistic information is usually measured by quantitative features describing variations in the way the linguistic patterns (i.e. words or phrases) are pronounced. The latter include variations in pitch and intensity, independent of linguistic content, as well as voice quality, which is related to spectral properties that cannot be correlated with word identity.

A disadvantage of linguistic information relates to cross-cultural diversity among nations. An extremely interesting study on this is reported in [1]. According to Wierzbicka, bilingual people know well that when they try to describe the same experience in their two languages, they are often forced to present it differently in each, because emotion words in the two languages may not match. For this reason, she proposes focusing on how a methodology developed in linguistic semantics, known as NSM (Natural Semantic Metalanguage), can help us understand human emotions better.

This could be applied especially to the emotions of people from different cultures, but also to those of people within our own Western cultural sphere. The significant role of semantics in linguistic emotion recognition is therefore easy to see.

According to Wierzbicka, emotion terms are always language- and culture-specific and therefore carry a particular linguistic and cultural slant. By contrast, cognitive scenarios formulated in simple and universal human concepts can be free of any such slant and can therefore be closer to the reality of emotional experience. The use of NSM thus allows one to compare emotion concepts across languages and cultures, and so to elucidate both cultural differences and transcultural similarities. NSM makes it possible to study human emotions from a genuinely cross-linguistic and cross-cultural, as well as psychological, perspective, and thus opens up new possibilities for the scientific understanding of subjectivity and psychological experience.

However, multi-modality architectures (i.e. combining speech, image processing and body sensors) would significantly affect the user-friendliness of an emotion recognition system. As a result, research in the literature is directed towards the visual interpretation of facial gestures together with voice processing. The former carries information about facial expressions, while the latter provides useful data on vocal intonation and characteristics. These two channels (i.e. visual and auditory) are considered the most important in the human recognition of affective feedback [33].

[Figure 1: two-channel system; labelled elements: microphone, camera, voice hue, head positions, gestures, speech content, dialogue/feedback, image processing, sound processing]

Fig. 1 Increased user personalization through an emotion recognition software system with two channels.

Relatively few existing works combine different modalities into a single system for human affective state analysis. Examples are the works of Chen et al. [4], [5], De Silva and Ng [6], and Yoshitomi et al. [7], who investigated the effects of combined detection of facial and vocal expressions of affective states. Almost all other existing studies investigate human affective states in a single-modal analysis framework. Following the same direction, this research deals with a single-modality system based on a non-linguistic speech processing module.

2 Related Work

2.1 Basic Emotions

Proponents of discrete emotion theories, inspired by Darwin, have suggested different numbers of so-called basic emotions [8], [9], [10], [11], [12], [13]. Most of these are emotions that play an important role in adapting to frequently occurring and prototypically patterned types of significant events in our life, such as anger, fear, joy and sadness, which are experienced relatively frequently. However, the list of emotions does not end here, as other states such as anxiety, boredom and neutral are also evident in our life, to name just a few. Scherer [14] proposed the following “working definition of emotion”, for which there is increasing consensus in the literature: emotions are episodes of coordinated changes in several components (including at least neurophysiological activation, motor expression and subjective feeling, but possibly also action tendencies and cognitive processes) in response to external or internal events of major significance to the organism. Based on this definition, social science scholars propose various representations of the basic human emotions. Adopting a theoretically based approach, Fontaine et al. [14] have shown that four dimensions are needed to satisfactorily represent similarities and differences in the meaning of emotions. In order of importance, these four dimensions (or axes in the emotion space) are evaluation-pleasantness, potency-control, activation-arousal and unpredictability. Of this four-dimensional space, the research community focuses mainly on the two-dimensional space of valence and arousal, as shown in Figure 2.

According to this two-dimensional view, much of the variation in emotions can be located in a two-dimensional space with coordinates of valence and arousal [14]. The valence dimension refers to the hedonic quality of an affective experience and ranges from unpleasant to pleasant. The arousal dimension refers to the perceived arousal associated with the experience and ranges from very calm at one end to very excited at the other. For the identification of emotional expressions by a computer, the basic set of emotions includes joy, anger, disgust, fear, sadness, boredom and neutral. Figure 2 shows this set of seven emotion classes, which can also be well separated into two hyper-classes: high arousal, containing anger, happiness and anxiety/fear, and low arousal, containing neutral, boredom, disgust and sadness. The classification of disgust as low arousal can be challenged, but according to the literature disgust belongs to the low-arousal emotions [32]. Table 1 highlights the effect of five emotions on well-known speech parameters, as reported in [35].

[Figure 2: emotions placed on a valence axis (negative to positive) and an arousal axis (low to high); high arousal: anxiety/fear, anger, joy (excitement); low arousal: sadness, disgust, neutral, boredom]

Fig. 2 Emotions of Berlin Database according to valence and arousal.

Table 1 Emotions and Speech Parameters as they appear in [35].

Parameter | Anger | Happiness | Sadness | Fear | Disgust
Rate | Slightly faster | Faster or slower | Slightly slower | Much faster | Very much faster
Pitch average | Very much higher | Much higher | Slightly lower | Very much higher | Very much lower
Pitch range | Much higher | Much wider | Slightly narrower | Much wider | Slightly wider
Intensity | Higher | Higher | Lower | Normal | Lower
Voice quality | Breathy, chest tone | Breathy, blaring | Resonant | Irregular voicing | Grumbled chest tone
Pitch changes | Abrupt on stressed syllables | Smooth, upward inflections | Downward inflections | Normal | Wide, downward terminal inflections
Articulation | Tense | Normal | Slurring | Precise | Normal

2.2 Databases in Emotion Research

A comprehensive survey of the available emotional speech databases is given in [15]. The survey shows that automated emotion recognition on these databases does not exceed a 50% correct classification rate for the four basic emotions. Moreover, the authors of [15] underline that natural (spontaneous) emotions cannot be classified as easily as simulated (acted) ones. Another important finding of their survey is that the most commonly investigated emotions are anger, sadness, happiness, fear, disgust, joy, surprise and boredom (see Table 2).

Table 2 Emotions recorded in the databases surveyed in [15].

Emotion | Occurrences in databases
Anger | 26
Sadness | 22
Happiness | 13
Fear | 13
Disgust | 10
Joy | 9
Surprise | 6
Boredom | 5
Stress | 3
Contempt | 2
Dissatisfaction | 2
Shame, pride, worry, startle, elation, despair, humour | 1


Since even humans cannot easily classify natural emotions, machines can hardly be expected to achieve a higher correct classification rate. Therefore, for the sake of simplicity, most databases contain acted emotional speech, which is sometimes exaggerated. The emotional utterances are performed by professional actors, drama students or lay people. Table 3 shows the types of emotional speech grouped into two classes (acted and spontaneous) and their frequency of occurrence as reported in [15].

Our research was conducted using the Berlin Emotional Database (EMO-DB) [34]. In the Berlin Emotional Database, ten German sentences have been acted in the seven emotions above by ten professional actors, five of them female. The database contains 535 utterances covering these emotions. In our experiments, whole utterances were always analysed. Table 4 lists the speaker codes, the utterance codes and the emotions acted by the actors. The Berlin Emotional Database was selected because it is the most complete and richest speech recordings database, and it is freely available to the scientific community.

Table 3 Acted and spontaneous speech occurrences in the databases surveyed in [15].

Type of emotion | Occurrences
Acted | 21
Spontaneous | 8
50% spontaneous speech / 50% acted speech | 2
Semi-spontaneous | 1

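Table 4 Speaker codes, Utterances and Emotions in Berlin Database.

Speaker code (gender): 03 (male), 08 (female), 09 (female), 10 (male), 11 (male), 12 (male), 13 (female), 14 (female), 15 (male), 16 (female)

Utterance code: context (English gloss)
a01: Der Lappen liegt auf dem Eisschrank. (The rag is lying on the fridge.)
a02: Das will sie am Mittwoch abgeben. (She wants to hand it in on Wednesday.)
a04: Heute abend könnte ich es ihm sagen. (Tonight I could tell him.)
a05: Das schwarze Stück Papier befindet sich da oben neben dem Holzstück. (The black piece of paper is up there next to the piece of wood.)
a07: In sieben Stunden wird es soweit sein. (In seven hours it will be time.)
b01: Was sind denn das für Tüten, die da unter dem Tisch stehen? (What are those bags standing there under the table?)
b02: Sie haben es gerade hochgetragen und jetzt gehen sie wieder runter. (They have just carried it upstairs and now they are going down again.)
b03: An den Wochenenden bin ich jetzt immer nach Hause gefahren und habe Agnes besucht. (On weekends I have now always gone home and visited Agnes.)
b09: Ich will das eben wegbringen und dann mit Karl was trinken gehen. (I just want to take this away and then go for a drink with Karl.)
b10: Die wird auf dem Platz sein, wo wir sie immer hinlegen. (It will be in the place where we always put it.)

Emotion: W (anger), L (boredom), E (disgust), A (anxiety/fear), F (happiness), T (sadness), N (neutral)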
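The codes in Table 4 are enough to label recordings automatically and to build a speaker-independent split. The sketch below assumes the EMO-DB file naming used in the public distribution: two digits for the speaker, three characters for the text code, one letter for the emotion and a version letter (e.g. `03a01Wa.wav`). The high/low arousal grouping follows the hyper-classes described with Figure 2, and the speaker genders come from Table 4. The function names and the leave-one-speaker-out helper are hypothetical; the text does not say which speaker-independent protocol the authors used.

```python
EMOTION_CODES = {
    "W": "anger", "L": "boredom", "E": "disgust", "A": "anxiety/fear",
    "F": "happiness", "T": "sadness", "N": "neutral",
}
HIGH_AROUSAL = {"anger", "happiness", "anxiety/fear"}   # every other class is low arousal
FEMALE_SPEAKERS = {"08", "09", "13", "14", "16"}       # speaker genders from Table 4


def parse_emodb_name(filename):
    """Split an EMO-DB file name such as '03a01Wa.wav' into its labelled parts (assumed layout)."""
    stem = filename.replace("\\", "/").rsplit("/", 1)[-1].split(".", 1)[0]
    if len(stem) < 6:
        raise ValueError(f"unexpected EMO-DB file name: {filename!r}")
    speaker, text_code, emotion_code = stem[0:2], stem[2:5], stem[5]
    emotion = EMOTION_CODES[emotion_code]
    return {
        "speaker": speaker,
        "gender": "female" if speaker in FEMALE_SPEAKERS else "male",
        "text": text_code,
        "emotion": emotion,
        "arousal": "high" if emotion in HIGH_AROUSAL else "low",
    }


def leave_one_speaker_out(filenames, test_speaker):
    """Speaker-independent split: every file of one speaker goes to the test set."""
    train = [f for f in filenames if parse_emodb_name(f)["speaker"] != test_speaker]
    test = [f for f in filenames if parse_emodb_name(f)["speaker"] == test_speaker]
    return train, test


if __name__ == "__main__":
    print(parse_emodb_name("03a01Wa.wav"))
```

Rotating `test_speaker` over all ten speakers gives a classifier that never sees the test speaker's voice during training. This matches the "speaker-independent framework" mentioned in the abstract.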


3 Sound/Speech Features in Our Experiments

Many diverse low-level and high-level acoustic features have been tested in the literature and assessed in terms of their performance. The fundamental frequency (F0), often referred to as the pitch, is one of the most important features for determining emotion in speech [16], [17], [18], [19]. Bänziger et al. argued that statistics related to pitch convey considerable information about emotional status [20]. However, pitch has also been shown to be the most gender-dependent feature [21]. If the recognition system ignores this issue, utterances may be misclassified. It should be noted that most of the features described below are gender-dependent to varying degrees.

Besides pitch, other commonly employed features are related to energy, speaking rate and formants, as well as spectral features such as mel-frequency cepstral coefficients (MFCCs). Wang & Guan [22], [23] used prosodic, MFCC and formant frequency features to represent the characteristics of emotional speech, while facial expressions were represented by Gabor wavelet features. According to Kostoulas et al. [24], an individual’s emotional state is strongly related to pitch and energy, and the pitch and energy of a speech signal expressing happiness or anger are usually higher than those associated with sadness. MFCCs have been widely used for the spectral representation of speech in numerous applications, including speech, speaker, gender and emotion recognition. They are also increasingly used in music information retrieval applications such as genre classification and audio similarity measures [25].

In this paper, pitch, energy, MFCCs and formants were extracted from the speech waveform using Praat [26]. Using a frame length of 100 ms, the pitch of each frame was calculated and stored in a vector at the position corresponding to that frame; if the frame is unvoiced, the corresponding entry in the pitch vector is set to zero. In addition, for each 5 ms frame of speech, the first four standard MFCC parameters were calculated by taking the absolute value of the short-time Fourier transform (STFT), warping it to a Mel-frequency scale, taking the discrete cosine transform (DCT) of the log-Mel spectrum and returning the first four components.
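The MFCC steps just described (magnitude of the STFT, Mel warping, logarithm, DCT, keep the first components) can be sketched in NumPy as follows. This is a generic textbook pipeline, not Praat's implementation. The 25 ms Hamming analysis window, the reading of "5 ms" as the hop between frames, the 26 triangular Mel filters, the Mel formula 2595 log10(1 + f/700) and all function names are assumptions of the sketch.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on the Mel scale; shape (n_filters, n_fft // 2 + 1)."""
    bin_freqs = np.linspace(0.0, fs / 2.0, n_fft // 2 + 1)   # centre frequency of each rfft bin
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        left, centre, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs outputs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)                                   # orthonormal scaling of k = 0
    return x @ basis.T


def mfcc(signal, fs=16000, frame_ms=25.0, hop_ms=5.0, n_filters=26, n_coeffs=4):
    """First n_coeffs MFCCs per frame: |STFT| -> Mel filterbank -> log -> DCT."""
    x = np.asarray(signal, dtype=float)
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    n_fft = 1 << (frame_len - 1).bit_length()                  # next power of two >= frame_len
    window = np.hamming(frame_len)
    frames = np.stack([x[s:s + frame_len] * window
                       for s in range(0, len(x) - frame_len + 1, hop)])
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))   # |STFT|
    mel_energies = magnitude @ mel_filterbank(n_filters, n_fft, fs).T
    log_mel = np.log(mel_energies + 1e-10)                     # small offset avoids log(0)
    return dct_ii(log_mel, n_coeffs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(mfcc(rng.standard_normal(16000)).shape)             # one row of 4 MFCCs per frame
```

Returning four coefficients follows the paragraph above. Setting `n_coeffs=12` would instead yield the 12 MFCC contours listed in Table 5.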

Energy, often referred to as the volume or intensity of speech, is also known to contain valuable information. Energy can be used to differentiate sets of emotions, but this measurement alone is not sufficient to differentiate the basic emotions. In [27], Scherer concludes that fear, joy and anger have increased energy levels, whereas sadness has a low energy level. The choice of window in short-time speech processing determines the nature of the measurement representation.

A long window would result in very little change of the measurement over time, whereas a measurement with a short window would not be sufficiently smooth. The energy frame should be long enough to smooth the contour appropriately but short enough to retain the fast energy changes that are common in speech signals; a frame size of 10–20 ms is suggested as adequate. Two windows are widely used: rectangular and Hamming. For the same length, the Hamming window has almost twice the bandwidth of the rectangular window, but its attenuation outside the passband is much greater. Short-time energy is a simple short-time speech measurement, defined as:

$$E_n = \sum_{m} \left[ x(m)\, w(n-m) \right]^2$$

where x is the original signal, w is the Hamming window positioned at sample n, and the sum runs over the samples m where the window overlaps the signal. A practical choice for the window length is 160–320 samples (i.e. 10–20 ms) at a sampling frequency of 16 kHz. In our experiments, the Hamming window was used, with the energy computed every 20 ms.
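The short-time energy formula can be implemented directly. The sketch below is a minimal pure-Python version that assumes non-overlapping 20 ms Hamming-windowed frames at 16 kHz (320 samples), which is one reading of the settings above. Because the Hamming window is symmetric, indexing it as w[k] inside each frame is equivalent to w(n − m) in the formula. The function names and default arguments are illustrative.

```python
import math


def hamming(n_len):
    """Hamming window w[k] = 0.54 - 0.46 cos(2 pi k / (N - 1)), k = 0 .. N - 1."""
    if n_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n_len - 1)) for k in range(n_len)]


def short_time_energy(x, fs=16000, win_ms=20.0, hop_ms=20.0):
    """E_n = sum_m [x(m) w(n - m)]^2, evaluated once per hop; incomplete trailing frames are dropped."""
    n_len = int(round(fs * win_ms / 1000.0))   # 320 samples for 20 ms at 16 kHz
    hop = int(round(fs * hop_ms / 1000.0))
    w = hamming(n_len)
    energies = []
    for start in range(0, len(x) - n_len + 1, hop):
        frame = x[start:start + n_len]
        energies.append(sum((s * wk) ** 2 for s, wk in zip(frame, w)))
    return energies


if __name__ == "__main__":
    tone = [math.sin(2.0 * math.pi * 200.0 * t / 16000) for t in range(16000)]  # 1 s, 200 Hz
    contour = short_time_energy(tone)
    print(len(contour), "energy values")     # one value per 20 ms frame
```

Passing a smaller `hop_ms` than `win_ms` gives overlapping frames and a smoother energy contour, at the cost of more computation.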

The resonant frequencies produced in the vocal tract are referred to as formant frequencies or formants [28]. Although some studies in automatic recognition have looked at the first two formant frequencies (F1 and F2) [29], [30], formants have not been extensively researched. Scherer [27] reports several observations relating formant frequencies to emotion classes. For happiness, the mean of the first formant (F1) is decreased while the F1 range is increased. For anger, fear and sadness, the F1 mean is increased while the F1 bandwidth is decreased. The F2 mean is decreased for sadness, anger, fear and disgust. In our experiments, the first five formant frequencies were evaluated.
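Formants are commonly estimated with linear predictive coding (LPC). An all-pole model is fitted to a windowed frame, and the angles of its complex poles are converted to frequencies. The sketch below uses the autocorrelation method with the Levinson-Durbin recursion. It is a generic illustration, not Praat's procedure (Praat's standard formant analysis is Burg-based). The model order, the absence of pre-emphasis and bandwidth filtering, and the function names are assumptions.

```python
import numpy as np


def lpc(frame, order):
    """LPC polynomial a[0..order] with a[0] = 1 (autocorrelation method, Levinson-Durbin)."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]   # autocorrelation lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                          # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k                      # remaining prediction error power
    return a


def estimate_formants(frame, fs, order=10, n_formants=5):
    """Return up to n_formants resonance frequencies (Hz) of an all-pole fit, lowest first."""
    frame = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    roots = np.roots(lpc(frame, order))         # poles of 1 / A(z)
    roots = roots[np.imag(roots) > 0]           # keep one root of each conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2.0 * np.pi))
    return freqs[:n_formants]
```

For 16 kHz speech, an order of roughly 12 to 16 is a common starting point when five formants are wanted. Candidate poles with very wide bandwidths are usually discarded before labelling them F1 to F5.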

Based on the acoustic features described above and the literature on automatic emotion detection from speech, 133 features are calculated from four prosodic groups represented as contours: the pitch, the 12 MFCCs, the energy and the first five formant frequencies. From each of these 19 contours, seven statistics were extracted: the mean, standard deviation, minimum, maximum and range (max - min) of the original contour, and the mean and standard deviation of the contour gradient, giving 19 × 7 = 133 features. All 133 measurements are shown in Table 5.
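The 19 × 7 = 133 feature layout can be sketched as follows. Each contour yields the seven statistics listed above, and the statistics of all contours are concatenated into one vector. The sketch makes several assumptions: population standard deviations, the first difference between consecutive frames as the contour gradient, and an arbitrary statistic order in `STAT_NAMES`. The text does not say whether unvoiced frames (pitch set to zero) are excluded before the pitch statistics are computed, so the sketch does not drop them. The function names are illustrative.

```python
import statistics

# Order in which each contour's statistics are concatenated (an arbitrary choice).
STAT_NAMES = ("mean", "std", "mean_of_derivative", "std_of_derivative", "max", "min", "range")


def contour_statistics(contour):
    """Seven statistics of one contour (pitch, one MFCC, energy or one formant track)."""
    if len(contour) < 2:
        raise ValueError("a contour needs at least two frames to have a gradient")
    gradient = [b - a for a, b in zip(contour, contour[1:])]   # first difference per frame
    return {
        "mean": statistics.fmean(contour),
        "std": statistics.pstdev(contour),
        "mean_of_derivative": statistics.fmean(gradient),
        "std_of_derivative": statistics.pstdev(gradient),
        "max": max(contour),
        "min": min(contour),
        "range": max(contour) - min(contour),
    }


def feature_vector(contours):
    """Concatenate the seven statistics of every contour (19 contours -> 133 features)."""
    vec = []
    for contour in contours:
        stats = contour_statistics(contour)
        vec.extend(stats[name] for name in STAT_NAMES)
    return vec


if __name__ == "__main__":
    pitch = [0.0, 180.0, 185.0, 190.0, 0.0]   # hypothetical pitch contour with unvoiced frames
    print(contour_statistics(pitch))
```

With the pitch contour, 12 MFCC contours, the energy contour and five formant contours as input, `feature_vector` returns the full 133-element vector from which the feature selection step below picks a subset.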

Table 5 The 133 sound features. Shaded cells indicate the selected features

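Prosodic group | Prosodic feature | Mean | Std | Mean of derivative | Std of derivative | Max | Min | Range
1 | Pitch | 1 | 2 | 3 | 4 | 5 | 6 | 7
2 | MFCC1 | 8 | 9 | 10 | 11 | 12 | 13 | 14
2 | MFCC2 | 15 | 16 | 17 | 18 | 19 | 20 | 21
2 | MFCC3 | 22 | 23 | 24 | 25 | 26 | 27 | 28
2 | MFCC4 | 29 | 30 | 31 | 32 | 33 | 34 | 35
2 | MFCC5 | 36 | 37 | 38 | 39 | 40 | 41 | 42
2 | MFCC6 | 43 | 44 | 45 | 46 | 47 | 48 | 49
2 | MFCC7 | 50 | 51 | 52 | 53 | 54 | 55 | 56
2 | MFCC8 | 57 | 58 | 59 | 60 | 61 | 62 | 63
2 | MFCC9 | 64 | 65 | 66 | 67 | 68 | 69 | 70
2 | MFCC10 | 71 | 72 | 73 | 74 | 75 | 76 | 77
2 | MFCC11 | 78 | 79 | 80 | 81 | 82 | 83 | 84
2 | MFCC12 | 85 | 86 | 87 | 88 | 89 | 90 | 91
3 | Energy | 92 | 93 | 94 | 95 | 96 | 97 | 98
4 | F1 | 99 | 100 | 101 | 102 | 103 | 104 | 105
4 | F2 | 106 | 107 | 108 | 109 | 110 | 111 | 112
4 | F3 | 113 | 114 | 115 | 116 | 117 | 118 | 119
4 | F4 | 120 | 121 | 122 | 123 | 124 | 125 | 126
4 | F5 | 127 | 128 | 129 | 130 | 131 | 132 | 133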


3.1 Sound Feature Selection

In order to select the most important prosodic features and reduce classification time, a subset evaluator was used. Subset evaluators take a subset of features and return a number that measures the quality of the subset and guides the further search. The WEKA data mining tool was used to select the method [32]. WEKA is a data mining workbench that allows comparison between many different machine learning algorithms. Moreover, WEKA offers many feature selection and feature ranking methods, each of which combines a feature search method with an evaluator of the currently selected features. Several combinations were tested in order to find the one that gives the best performance for our problem. The feature evaluator and search method (offered in WEKA) that performed best on the data set were CfsSubsetEval and BestFirst.

The Correlation-based Feature Selection subset evaluator (CfsSubsetEval) assesses the predictive ability of each feature individually and the degree of redundancy among them. It prefers sets of features that are highly correlated with the class but have low correlation with one another. An option iteratively adds attributes that have the highest correlation with the class, provided that the set does not already contain an attribute whose correlation with the attribute in question is even higher. The BestFirst search method searches the space of attribute subsets by greedy hill-climbing augmented with backtracking. The number of consecutive non-improving nodes allowed controls the level of backtracking. BestFirst may start with the empty set of attributes and search forward, start with the full set of attributes and search backward, or start at any point and search in both directions (by considering all possible single-attribute additions and deletions at a given point).
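The interplay of the two components can be illustrated with a small NumPy sketch. It is a simplified stand-in for WEKA's CfsSubsetEval and BestFirst, not a re-implementation: the merit is Hall's CFS heuristic computed with absolute Pearson correlations against a numeric class code (WEKA's implementation handles nominal classes, as in this study, differently), only forward search from the empty set is shown, and `max_stale=5` is an assumed value for the number of consecutive non-improving nodes. The function names and the toy data are illustrative.

```python
import heapq
from itertools import combinations

import numpy as np


def cfs_merit(X, y, subset):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf is the mean absolute correlation between the chosen features and the
    class, r_ff the mean absolute correlation between pairs of chosen features.
    Pearson correlation against a numeric class code is a simplification.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    pairs = list(combinations(subset, 2))
    r_ff = (np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
            if pairs else 0.0)
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))


def best_first_forward(X, y, max_stale=5):
    """Forward best-first search over feature subsets, scored by cfs_merit.

    Every evaluated subset stays on the open list, so the search can fall back
    (backtrack) to an earlier node; it stops after `max_stale` consecutive
    expansions that do not improve the best merit found so far.
    """
    n_features = X.shape[1]
    start = frozenset()
    best_subset, best_merit = start, 0.0
    open_list = [(0.0, 0, start)]  # (negated merit, tie-breaker, subset)
    seen = {start}
    tie, stale = 1, 0
    while open_list and stale < max_stale:
        _, _, subset = heapq.heappop(open_list)  # most promising subset so far
        improved = False
        for j in range(n_features):
            child = subset | {j}  # add one attribute
            if child in seen:
                continue
            seen.add(child)
            merit = cfs_merit(X, y, sorted(child))
            heapq.heappush(open_list, (-merit, tie, child))
            tie += 1
            if merit > best_merit:
                best_subset, best_merit, improved = child, merit, True
        stale = 0 if improved else stale + 1
    return sorted(best_subset), best_merit


# Toy data: f0 and f2 are informative, f1 is a near copy of f0, the last two are noise.
rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, size=n)
f0 = y + 0.3 * rng.normal(size=n)
f1 = f0 + 0.05 * rng.normal(size=n)
f2 = y + 0.3 * rng.normal(size=n)
X = np.column_stack([f0, f1, f2, rng.normal(size=(n, 2))])
selected, merit = best_first_forward(X, y)
```

On data like this, the search keeps the informative columns and drops the noise columns; the merit formula penalizes, but does not strictly forbid, a redundant near-copy such as `f1`.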

The combination of the above methods selected 23 of the 133 features that were originally extracted. The shaded cells in Table 5 indicate the selected features. From the first prosodic group (pitch), two features were selected, namely the mean and the minimum pitch. In addition, 16 features related to the Mel Frequency Cepstral Coefficients were found important, while four features were selected from the third prosodic group (energy). Finally, only one formant feature (the mean value of F1) was selected.

4 Classification

The first classification was performed using WEKA. The first classifier was an Artificial Neural Network following the multi-layer perceptron (MLP) architecture. After experimentation with various network topologies, the highest accuracy was obtained using one hidden layer with as many neurons as the sum of inputs (23 features) and outputs (7 emotions); the topology was therefore always 23-30-7. Early stopping was applied using a validation set consisting of 10% of the training set, and the number of training epochs was set to 200. This ensures that training stops when the mean squared error (MSE) begins to increase on the validation set, avoiding over-fitting. The learning rate and momentum were left at the WEKA defaults (0.3 and 0.2, respectively). Error backpropagation was used as the training algorithm. All neurons use the sigmoid activation function, and all attributes were normalized to improve the performance of the network.
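For readers who want to approximate this set-up outside WEKA, a rough scikit-learn counterpart is sketched below. It is an assumption-laden translation, not the authors' configuration: scikit-learn's `MLPClassifier` uses a softmax output layer for multi-class problems (rather than sigmoid output units), its early stopping monitors validation accuracy rather than the MSE and waits `n_iter_no_change` epochs before stopping, and `MinMaxScaler` merely stands in for WEKA's attribute normalization. `build_mlp` is a hypothetical helper, and the random data only demonstrate the 23-30-7 shapes.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def build_mlp(n_inputs=23, n_outputs=7, seed=0):
    """MLP approximating the 23-30-7 WEKA configuration described above."""
    n_hidden = n_inputs + n_outputs  # 23 + 7 = 30 hidden neurons
    mlp = MLPClassifier(
        hidden_layer_sizes=(n_hidden,),
        activation="logistic",     # sigmoid hidden units
        solver="sgd",              # gradient descent with backpropagation
        learning_rate_init=0.3,    # WEKA default learning rate
        momentum=0.2,              # WEKA default momentum
        nesterovs_momentum=False,  # classical momentum
        max_iter=200,              # at most 200 training epochs
        early_stopping=True,       # hold out part of the training data...
        validation_fraction=0.1,   # ...10% of it, as in the text
        random_state=seed,
    )
    # Normalize every attribute before it reaches the network.
    return make_pipeline(MinMaxScaler(), mlp)


# Illustrative run on random data with the paper's dimensions.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(140, 23))
y_demo = np.repeat(np.arange(7), 20)
model = build_mlp().fit(X_demo, y_demo)
```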

The second classification was performed using DTREG [2] with a Support Vector Machine classifier. A Support Vector Machine (SVM) performs classification by constructing an N-dimensional hyperplane that optimally separates the data into two categories. SVM models are closely related to neural networks; in fact, an SVM model using a sigmoid kernel function is equivalent to a two-layer, feed-forward neural network. Using a kernel function, SVMs are an alternative training method for polynomial, radial basis function and multi-layer perceptron classifiers, in which the weights of the network are found by solving a quadratic programming problem with linear constraints, rather than by solving a non-convex, unconstrained minimization problem.
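For reference, the quadratic programming problem behind a C-SVC (the model type reported below) can be written in its standard dual form, where $\alpha_i$ are the Lagrange multipliers, $y_i \in \{-1,+1\}$ the labels of the $n$ training vectors $\mathbf{x}_i$, $C$ the penalty parameter, $b$ the bias and $K$ the kernel. This is the textbook formulation, not taken from DTREG's documentation:

$$
\begin{aligned}
\max_{\boldsymbol{\alpha}}\quad & \sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,K(\mathbf{x}_i,\mathbf{x}_j)\\
\text{subject to}\quad & 0\le\alpha_i\le C,\qquad \sum_{i=1}^{n}\alpha_i y_i=0,
\end{aligned}
$$

$$
f(\mathbf{x})=\operatorname{sign}\Big(\sum_{i=1}^{n}\alpha_i y_i\,K(\mathbf{x}_i,\mathbf{x})+b\Big),\qquad
K_{\text{sigmoid}}(\mathbf{x},\mathbf{x}')=\tanh\big(\gamma\,\mathbf{x}^{\top}\mathbf{x}'+r\big),\qquad
K_{\text{RBF}}(\mathbf{x},\mathbf{x}')=\exp\big(-\gamma\,\lVert\mathbf{x}-\mathbf{x}'\rVert^{2}\big).
$$

The constraints are linear in $\boldsymbol{\alpha}$, and the training vectors with $\alpha_i>0$ are the support vectors.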

In the parlance of the SVM literature, a predictor variable is called an attribute, and a transformed attribute that is used to define the hyperplane is called a feature. The task of choosing the most suitable representation is known as feature selection. A set of features that describes one case (i.e., a row of predictor values) is called a vector. The goal of SVM modeling is thus to find the optimal hyperplane that separates clusters of vectors in such a way that cases with one category of the target variable are on one side of the plane and cases with the other category are on the other side. The vectors nearest the hyperplane are the support vectors. After several experiments, the highest accuracy was obtained with 35 predictor variables, using a Radial Basis Function (RBF) as the kernel function and C-SVC as the SVM model type.
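A rough scikit-learn counterpart of this SVM set-up is sketched below; it is not what DTREG runs internally. `SVC` wraps LIBSVM's C-SVC formulation and handles the seven classes one-vs-one. The values of `C` and `gamma`, the standardization step, and the helper name `build_svm` are assumptions, since the text does not report DTREG's parameter values or scaling; the random data only illustrate a 35-predictor, 7-class problem.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_svm(C=1.0, gamma="scale"):
    """C-SVC with an RBF kernel; C and gamma are placeholder values."""
    return make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma=gamma))


# Illustrative run: 7 classes described by 35 predictor variables.
rng = np.random.default_rng(0)
y_demo = np.repeat(np.arange(7), 30)
X_demo = rng.normal(size=(210, 35))
X_demo[:, 0] += y_demo  # make one predictor informative about the class
svm = build_svm().fit(X_demo, y_demo)
support_vectors = svm[-1].support_vectors_  # in the standardized feature space
```

In practice `C` and `gamma` would be tuned on held-out data, which is one way to carry out the "several experiments" mentioned above.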

4.1 Speaker Independent Recognition in Berlin Database

Speaker-independent emotion recognition on the Berlin database with the Artificial Neural Network was evaluated by averaging the results of five separate experiments. In each experiment, the measurements of a pair of speakers (e.g. speaker 03 and speaker 08) were removed from the training set and formed the testing set for the classifier. The pairs were chosen so that each included one male and one female speaker. The training and testing sets for the five experiments are shown in Table 6, and Tables 6a to 6e present the confusion matrices of the five experiments. Judging from the main diagonal of the confusion matrix in Table 7, the MLP does not reach high accuracy in the 7-class recognition problem: overall, approximately 51% of the samples are correctly classified across the seven emotions. The 23-feature vector therefore does not appear sufficient to distinguish the 7 emotions accurately. On the other hand, in the two hyper-classes (low and high arousal), the recognition rate reaches 88.8% for high arousal and 84.8% for low arousal emotions (see Table 8).
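The leave-one-speaker-pair-out protocol can be sketched in plain Python. The five test pairs are those named in the captions of Tables 6a to 6e; the assumption that every utterance is stored as a `(speaker_id, features, label)` tuple with a two-digit speaker ID, and the names `SPEAKER_PAIRS` and `speaker_pair_folds`, are illustrative.

```python
# Test pairs (one male, one female speaker) from the captions of Tables 6a-6e.
SPEAKER_PAIRS = [("03", "08"), ("10", "09"), ("11", "13"), ("12", "14"), ("15", "16")]


def speaker_pair_folds(utterances, pairs=SPEAKER_PAIRS):
    """Yield (test_pair, train, test) for each of the five experiments.

    `utterances` is a list of (speaker_id, features, label) tuples; every
    utterance of the two test speakers is moved out of the training data,
    so no test speaker is ever seen during training.
    """
    for pair in pairs:
        test = [u for u in utterances if u[0] in pair]
        train = [u for u in utterances if u[0] not in pair]
        yield pair, train, test


# Illustrative check with three dummy utterances per speaker.
demo = [(spk, None, "neutral") for pair in SPEAKER_PAIRS for spk in pair for _ in range(3)]
folds = list(speaker_pair_folds(demo))
```

Pooling the five test-set confusion matrices cell by cell then gives an overall matrix of the kind shown in Table 7.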


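Table 6 Testing and training sets for our experiments

| Experiment | Testing set | Training set |
|---|---|---|
| 1 | 03 (male), 08 (female) | 10, 11, 12, 15 (male), 09, 13, 14, 16 (female) |
| 2 | 10 (male), 09 (female) | 03, 11, 12, 15 (male), 08, 13, 14, 16 (female) |
| 3 | 11 (male), 13 (female) | 03, 10, 12, 15 (male), 08, 09, 14, 16 (female) |
| 4 | 12 (male), 14 (female) | 03, 10, 11, 15 (male), 08, 09, 13, 16 (female) |
| 5 | 15 (male), 16 (female) | 03, 10, 11, 12 (male), 08, 09, 13, 14 (female) |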
Table 6a Experiment 1: evaluation on speakers 03 and 08.

High arousal emotions: anger, happiness, anxiety/fear. Low arousal emotions: boredom, disgust, sadness, neutral.

| Actual \ Predicted | Anger | Happiness | Anxiety/fear | Boredom | Disgust | Sadness | Neutral |
|---|---|---|---|---|---|---|---|
| Anger | 5 (19.2%) | 20 (76.9%) | 1 (3.8%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Happiness | 0 (0.0%) | 15 (83.3%) | 0 (0.0%) | 0 (0.0%) | 3 (16.7%) | 0 (0.0%) | 0 (0.0%) |
| Anxiety/fear | 0 (0.0%) | 5 (50.0%) | 1 (10.0%) | 4 (40.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Boredom | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 12 (80.0%) | 3 (20.0%) | 0 (0.0%) | 0 (0.0%) |
| Disgust | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Sadness | 0 (0.0%) | 2 (12.5%) | 0 (0.0%) | 7 (43.8%) | 2 (12.5%) | 5 (31.3%) | 0 (0.0%) |
| Neutral | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 11 (52.4%) | 0 (0.0%) | 0 (0.0%) | 10 (47.6%) |

Table 6b Experiment 2: evaluation on speakers 10 and 09.

| Actual \ Predicted | Anger | Happiness | Anxiety/fear | Boredom | Disgust | Sadness | Neutral |
|---|---|---|---|---|---|---|---|
| Anger | 18 (78.3%) | 1 (4.3%) | 2 (8.7%) | 0 (0.0%) | 2 (8.7%) | 0 (0.0%) | 0 (0.0%) |
| Happiness | 2 (25.0%) | 2 (25.0%) | 3 (37.5%) | 1 (12.5%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Anxiety/fear | 0 (0.0%) | 0 (0.0%) | 8 (88.9%) | 1 (11.1%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Boredom | 0 (0.0%) | 2 (16.7%) | 0 (0.0%) | 10 (83.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Disgust | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 7 (77.8%) | 2 (22.2%) | 0 (0.0%) | 0 (0.0%) |
| Sadness | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 2 (28.6%) | 0 (0.0%) | 3 (42.9%) | 2 (28.6%) |
| Neutral | 1 (7.7%) | 0 (0.0%) | 0 (0.0%) | 6 (46.2%) | 0 (0.0%) | 2 (15.4%) | 4 (30.8%) |


Table 6c Experiment 3: evaluation on speakers 11 and 13.

| Actual \ Predicted | Anger | Happiness | Anxiety/fear | Boredom | Disgust | Sadness | Neutral |
|---|---|---|---|---|---|---|---|
| Anger | 14 (63.6%) | 0 (0.0%) | 1 (4.5%) | 0 (0.0%) | 6 (27.3%) | 0 (0.0%) | 1 (4.5%) |
| Happiness | 6 (33.3%) | 2 (11.1%) | 2 (11.1%) | 0 (0.0%) | 6 (33.3%) | 0 (0.0%) | 2 (11.1%) |
| Anxiety/fear | 13 (76.5%) | 0 (0.0%) | 1 (5.9%) | 1 (5.9%) | 1 (5.9%) | 1 (5.9%) | 0 (0.0%) |
| Boredom | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 7 (38.9%) | 0 (0.0%) | 1 (5.6%) | 10 (55.6%) |
| Disgust | 2 (20.0%) | 0 (0.0%) | 0 (0.0%) | 1 (10.0%) | 7 (70.0%) | 0 (0.0%) | 0 (0.0%) |
| Sadness | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 9 (75.0%) | 3 (25.0%) |
| Neutral | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 (5.6%) | 0 (0.0%) | 1 (5.6%) | 16 (88.9%) |

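Table 6d Experiment 4: evaluation on speakers 12 and 14.

| Actual \ Predicted | Anger | Happiness | Anxiety/fear | Boredom | Disgust | Sadness | Neutral |
|---|---|---|---|---|---|---|---|
| Anger | 14 (50.0%) | 2 (7.1%) | 12 (42.9%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Happiness | 5 (50.0%) | 4 (40.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 (10.0%) |
| Anxiety/fear | 8 (44.4%) | 1 (5.6%) | 9 (50.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Boredom | 1 (7.7%) | 0 (0.0%) | 0 (0.0%) | 5 (38.5%) | 0 (0.0%) | 6 (46.2%) | 1 (7.7%) |
| Disgust | 3 (30.0%) | 0 (0.0%) | 0 (0.0%) | 1 (10.0%) | 4 (40.0%) | 0 (0.0%) | 2 (20.0%) |
| Sadness | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 2 (14.3%) | 0 (0.0%) | 10 (71.4%) | 2 (14.3%) |
| Neutral | 0 (0.0%) | 2 (18.2%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 6 (54.5%) | 3 (27.3%) |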

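Table 6e Experiment 5: evaluation on speakers 15 and 16.

| Actual \ Predicted | Anger | Happiness | Anxiety/fear | Boredom | Disgust | Sadness | Neutral |
|---|---|---|---|---|---|---|---|
| Anger | 25 (92.6%) | 0 (0.0%) | 2 (7.4%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Happiness | 9 (52.9%) | 7 (41.2%) | 1 (5.9%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Anxiety/fear | 7 (46.7%) | 0 (0.0%) | 8 (53.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Boredom | 0 (0.0%) | 0 (0.0%) | 12 (52.2%) | 9 (39.1%) | 0 (0.0%) | 2 (8.7%) | 0 (0.0%) |
| Disgust | 1 (6.3%) | 8 (50.0%) | 2 (12.5%) | 2 (12.5%) | 3 (18.8%) | 0 (0.0%) | 0 (0.0%) |
| Sadness | 0 (0.0%) | 0 (0.0%) | 1 (7.7%) | 0 (0.0%) | 1 (7.7%) | 10 (76.9%) | 1 (7.7%) |
| Neutral | 0 (0.0%) | 0 (0.0%) | 3 (17.6%) | 2 (11.8%) | 0 (0.0%) | 1 (5.9%) | 11 (64.7%) |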


Table 7 Overall performance after the execution of the 5 experiments in the 7-emotion classification framework.

| Actual \ Predicted | Anger | Happiness | Anxiety/fear | Boredom | Disgust | Sadness | Neutral |
|---|---|---|---|---|---|---|---|
| Anger | 76 (60.3%) | 23 (18.3%) | 18 (14.3%) | 0 (0.0%) | 8 (6.3%) | 0 (0.0%) | 1 (0.8%) |
| Happiness | 22 (31.0%) | 30 (42.3%) | 6 (8.5%) | 1 (1.4%) | 9 (12.7%) | 0 (0.0%) | 3 (4.2%) |
| Anxiety/fear | 28 (40.6%) | 6 (8.7%) | 27 (39.1%) | 6 (8.7%) | 1 (1.4%) | 1 (1.4%) | 0 (0.0%) |
| Boredom | 1 (1.2%) | 2 (2.5%) | 12 (14.8%) | 43 (53.1%) | 3 (3.7%) | 9 (11.1%) | 11 (13.6%) |
| Disgust | 6 (13.0%) | 9 (19.6%) | 2 (4.3%) | 11 (23.9%) | 16 (34.8%) | 0 (0.0%) | 2 (4.3%) |
| Sadness | 0 (0.0%) | 2 (3.2%) | 1 (1.6%) | 11 (17.7%) | 3 (4.8%) | 37 (59.7%) | 8 (12.9%) |
| Neutral | 1 (1.3%) | 2 (2.5%) | 3 (3.8%) | 20 (25.0%) | 0 (0.0%) | 10 (12.5%) | 44 (55.0%) |

Table 8 Overall performance after the execution of the 5 experiments in the 2 hyper-class classification framework.

| Actual \ Predicted | High arousal | Low arousal |
|---|---|---|
| High arousal emotions | 236 (88.8%) | 30 (11.3%) |
| Low arousal emotions | 41 (15.2%) | 228 (84.8%) |
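The two-hyper-class results follow mechanically from the seven-class matrix: each count is added to the cell given by the arousal group of its actual and predicted emotion. The plain-Python sketch below applies this to the pooled counts of Table 7, using the grouping given with Table 6a; the function names are illustrative.

```python
# Rows = actual emotion, columns = predicted emotion (counts from Table 7).
EMOTIONS = ["anger", "happiness", "anxiety/fear", "boredom", "disgust", "sadness", "neutral"]
HIGH_AROUSAL = {"anger", "happiness", "anxiety/fear"}

TABLE_7 = [
    [76, 23, 18, 0, 8, 0, 1],
    [22, 30, 6, 1, 9, 0, 3],
    [28, 6, 27, 6, 1, 1, 0],
    [1, 2, 12, 43, 3, 9, 11],
    [6, 9, 2, 11, 16, 0, 2],
    [0, 2, 1, 11, 3, 37, 8],
    [1, 2, 3, 20, 0, 10, 44],
]


def collapse_to_arousal(cm, labels=EMOTIONS, high=HIGH_AROUSAL):
    """Merge a 7x7 emotion confusion matrix into a 2x2 high/low-arousal matrix."""
    group = [0 if label in high else 1 for label in labels]  # 0 = high, 1 = low
    merged = [[0, 0], [0, 0]]
    for i, row in enumerate(cm):
        for j, count in enumerate(row):
            merged[group[i]][group[j]] += count
    return merged


def accuracy(cm):
    """Fraction of all samples that lie on the main diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


arousal = collapse_to_arousal(TABLE_7)  # [[236, 30], [41, 228]]
```

Run on Table 7, this reproduces the counts of Table 8, and `accuracy(TABLE_7)` gives 273/535, about 51%, the seven-class figure quoted in the text.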

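Table 9 Overall performance after the execution of the 25% Random Sampling Validation Method by SVM.

| Actual \ Predicted | Anger | Happiness | Anxiety/fear | Boredom | Disgust | Sadness | Neutral |
|---|---|---|---|---|---|---|---|
| Anger | 84.38% | 6.25% | 9.37% | 0.0% | 0.0% | 0.0% | 0.0% |
| Happiness | 5.55% | 88.9% | 5.55% | 0.0% | 0.0% | 0.0% | 0.0% |
| Anxiety/fear | 5.6% | 0.0% | 94.4% | 0.0% | 0.0% | 0.0% | 0.0% |
| Boredom | 5% | 5% | 10% | 55% | 15% | 0.0% | 10% |
| Disgust | 0.0% | 27.27% | 0.0% | 18.18% | 54.55% | 0.0% | 0.0% |
| Sadness | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 80% | 20% |
| Neutral | 5% | 0.0% | 0.0% | 10% | 0.0% | 5% | 80% |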


Table 10 Overall performance after the execution of the 25% Random Sampling Validation Method by SVM in the 2 hyper-class classification framework.

| Actual \ Predicted | High arousal | Low arousal |
|---|---|---|
| High arousal emotions | 100% | 0.0% |
| Low arousal emotions | 13% | 87% |

For speaker-independent emotion recognition with the SVM on the Berlin database, the 25% Random Sampling validation method was used: DTREG selected a random set of data rows (134 of 535) and held them out of the model-building process. These rows were then run through the generated model, and the misclassification error rate was reported. Overall, approximately 78% of the samples were correctly classified across the seven emotions, as presented in Table 9. The 35-feature vector thus appears sufficient to distinguish the 7 emotions accurately. In the two hyper-classes (low and high arousal), the recognition rate reaches 100% for high arousal and 87% for low arousal emotions, as shown in Table 10.
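The text does not describe DTREG's sampling beyond holding out a random 25% of the rows, so the sketch below assumes simple random sampling without replacement, with 0.25 × 535 = 133.75 rounded to 134 rows. `random_holdout` is a hypothetical helper written with the standard library only.

```python
import random


def random_holdout(n_rows, fraction=0.25, seed=None):
    """Split row indices 0..n_rows-1 into (train, test) by simple random sampling."""
    rng = random.Random(seed)
    n_test = round(n_rows * fraction)  # 535 * 0.25 = 133.75 -> 134
    test = sorted(rng.sample(range(n_rows), n_test))
    held_out = set(test)
    train = [i for i in range(n_rows) if i not in held_out]
    return train, test


train_rows, test_rows = random_holdout(535, seed=42)
```

Because rows are drawn regardless of speaker, a split like this one generally places utterances of the same speaker in both sets; keeping speakers disjoint requires grouping by speaker, as in the speaker-pair protocol used for the MLP.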

5 Conclusions – Discussion

In the field of human-computer interaction (HCI), emotion recognition by computer is still a challenging issue, especially when recognition is based solely on voice, which is the basic means of human communication.

Generally, the difficulty of the speech emotion recognition problem should be emphasized. In this interdisciplinary field of research, aspects of psychology and physiology are not always considered, and the literature still offers ideas rather than solutions.

The literature on emotion detection in speech is not very rich, and researchers are still debating which features influence the recognition of emotion in speech. There is also considerable uncertainty as to the best algorithm for classifying emotion, and which emotions to group together. In Table 10, important issues such as the number of features, the number of classes and the overall performance of similar studies are briefly presented.

Concluding this paper, the 23-input vector for the ANN and the 35-feature vector for the SVM seem quite promising for speaker-independent recognition of high and low arousal emotions when tested on the Berlin database. Additional sound descriptors, such as periodicity, speaking rate and voiced/unvoiced time ratio, should be evaluated in future research.

Although it is impossible to accurately compare the recognition accuracies of this study with those of other studies, because different data sets were used, the feature set implemented in this work seems promising for further research. The proposed feature set contains 23 features for the ANN and 35 for the SVM, which provide information about the prosody of the speaker over the entire sentence. Future work should encompass more features for further evaluation. Ultimately, samples from various speech databases could be assessed by the classifier in order to also tackle the problem of multilingual context. The latter was interestingly addressed in [32]. In addition, researchers usually deal with elicited and acted emotions recorded in a lab setting from a few actors, just as in our case.

It is also the case that assembling databases has not traditionally been considered a high-profile or intellectually challenging area. Good quality recording and large balanced samples tend to be thought of as the basic requirements, with the human side assumed to be relatively straightforward. A little thought shows that in the domain of emotion that cannot be the case. The human race expends a huge proportion of its resources trying (with mixed success) to direct people out of some emotional states and into others. If it were easy to achieve the shifts, there would be no need for whole industries and cities to exist.

As a result, capturing a faithful, detailed record of human emotion as it appears in real action and interaction is an incredibly challenging task. Nevertheless, the payoff could also be tremendous. At root, it is enlisting computers to co-operate in the old task of directing people away from some emotional states and into others. The lure of technologies capable of doing that is enough to keep the enterprise going in spite of the difficulties [3].

However, in real-world conditions, different individuals reveal their emotions to different degrees and in different ways. There are also many differences between acted and spontaneous speech. Speaker-independent detection of negative emotional states from acted and real-world speech was investigated in [31]. The experiments revealed important differences between recognizing acted and non-acted speech, with a significant drop in performance on the real-world data.

References

[1] Wierzbicka, A.: Emotions across languages and cultures: Diversity and universals. Cambridge University Press, Cambridge (1999)

[2] Software for Predictive Modelling and Forecasting (2009), http://www.dtreg.com/

[3] Cowie, R., Cowie, E.D., Cox, C.: Beyond emotion archetypes: databases for emotion modelling using neural networks. Neural Networks 18(4), 371–388 (2005)

[4] Chen, L.S., Huang, T.S.: Emotional expressions in audiovisual human computer interaction. In: Proc. of International Conference of Multimedia and Expo (ICME), pp. 423–426 (2000)

[5] Chen, L.S., Huang, T.S., Miyasato, T., Nakatsu, R.: Multimodal human emotion/expression recognition. In: Proc. of 3rd IEEE International Conference on Automatic Face and Gesture Recognition (FG), pp. 396–401 (1998)

[6] De Silva, L.C., Ng, P.C.: Bimodal emotion recognition. In: Proc. of 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG), pp. 332–335 (2000)

[7] Yoshitomi, Y., Kim, S., Kawano, T., Kitazoe, T.: Effect of Sensor Fusion for Recognition of Emotional States Using Voice, Face Image and Thermal Image of Face. In: Proc. of 9th IEEE International Workshop on Robot and Human Interactive Communication, pp. 178–183 (2000)

[8] Ekman, P.: Universals and Cultural Differences in Facial Expression of Emotion. In: Cole, J.R. (ed.) Motivation. University of Nebraska Press (1972)


[9] Ekman, P.: An Argument for Basic Emotions. Cognition and Emotion 6(3), 169–200 (1992)

[10] Izard, C.E.: The Face of Emotion. Appleton-Century-Crofts, New York (1971)

[11] Izard, C.E.: Basic Emotions, Relations among Emotions and Emotion–Cognition Relations. Psychological Review 99, 561–565 (1992)

[12] Tomkins, S.S.: Affect, Imagery, Consciousness: The Positive Affects. Springer, New York (1962)

[13] Tomkins, S.S.: Affect Theory. In: Scherer, K.R., et al. (eds.) Approaches to Emotion. Erlbaum, Hillsdale (1984)

[14] Fontaine, J.R.J., Scherer, K.R., Roesch, E.B., Ellsworth, P.C.: The world of emotions is not two dimensional. Psychological Science 18(12), 1050–1057 (2007)

[15] Ververidis, D., Kotropoulos, C.: A State of the Art Review on Emotional Speech Databases. In: Proc. of the 1st Richmedia Conference, pp. 109–119 (2003)

[16] Kim, S., Georgiou, P., Lee, S., Narayanan, S.: Real-time emotion detection system using speech: Multi-modal fusion of different timescale features. In: Proc. of IEEE Multimedia Signal Processing Workshop, pp. 48–51 (2007)

[17] Morrison, D., Wang, R., De Silva, L.C.: Ensemble methods for spoken emotion recognition in call-centres. Speech Communication 49, 98–112 (2007)

[18] Ang, J., Dhillon, R., Krupski, A., Shriberg, E., Stolcke, A.: Prosody-based automatic detection of annoyance and frustration in human–computer dialog. In: Proc. of the International Conference on Spoken Language Processing (ICSLP), pp. 2037–2040 (2002)

[19] Petrushin, V.: Emotion recognition in speech signal: experimental study, development, and application. In: Proc. of the 6th International Conference on Spoken Language Processing (ICSLP), pp. 222–225 (2000)

[20] Bänziger, T., Scherer, K.R.: The role of intonation in emotional expression. Speech Communication 46, 252–267 (2005)

[21] Abdulla, W.H., Kasabov, N.K.: Improving speech recognition performance through gender separation. In: Proc. of the 5th Biannual Conference on Artificial Neural Networks and Expert Systems (ANNES), pp. 218–222 (2001)

[22] Wang, Y., Guan, L.: Recognizing human emotion from audiovisual information. In: Proc. of International Conference on Acoustic and Signal Processing (ICASP), pp. 1125–1128 (2005)

[23] Vogt, T., Andre, E.: Improving Automatic Emotion Recognition from Speech via Gender Differentiation. In: Proc. of Language Resources and Evaluation Conference (LREC), pp. 1123–1126 (2006)

[24] Kostoulas, T.P., Fakotakis, N.: A Speaker Dependent Emotion Recognition Framework. In: Proc. of Fifth International Symposium on Communication Systems, Networks and Digital Signal Processing (CSNDSP), pp. 305–309 (2006)

[25] Fingerhut, M.: Music Information Retrieval, or how to search for (and maybe find) music and do away with incipits. In: International Association of Music Libraries, Archives and Documentation Centers (IAML) and the International Association of Sound and Audiovisual Archives (IASA), IAML-IASA Congress (2004)

[26] Boersma, P., Weenink, D.: Praat, a system for doing phonetics by computer. Technical Report 132, Inst. Phonetic Sciences, Univ. Amsterdam (2003), http://www.praat.org

[27] Scherer, K.R.: Vocal communication of emotion: a review of research paradigms. Speech Communication 40, 227–256 (2003)


[28] Rabiner, L.R., Schafer, R.W.: Digital Processing of Speech Signals. Prentice-Hall, Englewood Cliffs (1978)

[29] Lee, C.M., Yildirim, S., Bulut, M., Kazemzadeh, A., Busso, C., Deng, Z., Lee, S., Narayanan, S.: Emotion recognition based on phoneme classes. In: Proc. of the International Conference on Spoken Language Processing, ICSLP (2004)

[30] Waikato Environment for Knowledge Analysis, WEKA (2006), http://www.cs.waikato.ac.nz/ml/weka/

[31] Kostoulas, T., Ganchev, T., Fakotakis, N.: Study on speaker-independent emotion recognition from speech on real-world data. In: Esposito, A., Bourbakis, N.G., Avouris, N., Hatzilygeroudis, I. (eds.) HH and HM Interaction. LNCS (LNAI), vol. 5042, pp. 235–242. Springer, Heidelberg (2008)

[32] Hozjan, V., Kacic, Z.: Context-independent multilingual emotion recognition from speech signals. International Journal of Speech Technology 6, 311–320 (2006)

[33] Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J.G.: Emotion recognition in human-computer interaction. IEEE Signal Processing Magazine 18, 32–80 (2001)

[34] Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., Weiss, B.: A database of German emotional speech. In: Proc. of Interspeech, pp. 1515–1520 (2005)

[35] Murray, I.R., Arnott, J.L.: Towards a simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. Journal of the Acoustical Society of America 93(2), 1097–1108 (1993)


Health Care Web Information Systems and Personalized Services for Assisting Living of Elderly People at Nursing Homes

Stefanos Nikolidakis, Dimitrios D. Vergados, and Ioannis Anagnostopoulos

Stefanos Nikolidakis and Dimitrios D. Vergados, University of Piraeus, Department of Informatics, 80 Karaoli & Dimitriou St., GR-185 34, Piraeus, Greece, e-mail: [email protected], [email protected]

Ioannis Anagnostopoulos, University of the Aegean, Department of Information and Communication Systems Engineering, Karlovassi, Samos Island, GR-832 00, Greece, e-mail: [email protected]

Abstract. Nursing homes, where home adaptations (environmental improvements) and assistive technology (AT) are provided, represent an increasingly attractive means of helping senior citizens maintain their independence and enhance the quality of their life. Doctors and specialists are also involved in order to provide elderly people with personalized health care services, thereby improving their treatment and living conditions. The main difference between nursing homes and a typical health home is that the former use new technologies and applications to collect data from the elderly and create an electronic file for each individual. The basic idea of this paper is to use a web application, along with tablet PCs or PDAs, to collect the personal information and the clinical characteristics of each patient. Moreover, this application helps doctors manage the nursing home and obtain a better view of the health status of each patient, while also providing them with a report on the medical supplies needed at the nursing home and on the overall health condition of the population.

1 Introduction

A number of elderly people in the community receive a great deal of informal care from one or more sources, but others, some of whom are very frail, receive little or no informal care, often because they have few or no relatives [1]. Friends and neighbors rarely provide much care, and even when they do, it is usually not efficient. Although many elderly people have a lot of support from their relatives and friends, many of them need formal health services, since most of them live on their own and fewer live with another elderly person who also has health problems. Even when they live with younger people, most of these people feel the need for formal services and specialized personnel to take care of them. Most of the elderly are known to social services, so it could be presumed that they receive some formal services [4]. What is not known is the extent to which such a group of people, all of whom are thought to be in some way at the margin of community and residential care, would be supported by the statutory sector.

2 Health Care Services and Public Health Information Systems

Health care stands for any service [5], supply, equipment or prescription that people get in order to help them stay healthy. It includes preventive care (such as the yearly check-up), care for illness or injury, a hospital stay, surgery, visits to a doctor’s office, lab tests and X-rays, and even drug prescriptions. An individual may also use other types of services to improve their health, such as buying over-the-counter medicine or keeping track of their own blood pressure. In this paper, though, “health care” stands for those treatments that people receive from a trained and licensed health care practitioner, such as their doctor or nurse practitioner. Health care services include services provided by different kinds of trained and licensed providers. These services usually operate in places such as a hospital, a doctor’s office or a health clinic. A health plan may include a large number of these providers, but people often need to pay the costs on their own if the provider is not in the health plan’s network.

The health care providers [3] are entities that provide services approved as medical and other health services in the Medicare law. The medical and other health services presented in detail in the Medicare law are listed below, according to [3]:

1. Physicians’ services
2. Nursing services
3. Services and supplies furnished as an incident to a physician’s professional services, or services or supplies which are commonly furnished in a physician’s office and commonly rendered without charge or included in a physician’s bill
4. Diagnostic services furnished to an individual as an outpatient by a hospital or by others under arrangements with them made by a hospital, and ordinarily furnished by a hospital to its outpatients for the purposes of diagnostic study
5. Outpatient physical therapy services
6. Outpatient health care services
7. Rural health clinic services
8. Federally-qualified health care services
9. Home dialysis supplies and equipment, self-care home dialysis support services and institutional dialysis services and supplies
10. Antigens prepared by physicians for a particular patient
11. Services furnished by contract to a member of an eligible organization by a physician assistant or by a nurse practitioner
12. Blood clotting factors for haemophilia patients
13. Prescription drugs used in immunosuppressive therapy furnished to an individual who receives an organ transplant, but only in case of certain drugs
14. Services furnished by a nurse that would be a physician’s services
15. Certified nurse-midwife services
16. Qualified psychologist services
17. Clinical social worker services
18. Erythropoietin for dialysis patients
19. Diabetes outpatient self-management training screening
20. Surgical dressings, splints, casts, and other devices used for reduction of fractures and dislocations
21. Durable medical equipment
22. Prosthetic devices (other than dental) which replace all or part of an internal body organ, including one pair of conventional eyeglasses or contact lenses furnished subsequent to cataract surgery
23. Services of a certified registered nurse anaesthetist
24. Screening mammography
25. Screening pap smear and screening pelvic exam

2.1 Health Care Providers

Generally, the definition of health care providers is based on the activities performed and not on the titles or labels of the professionals or institutions. The health care providers [3] listed in the Medicare law include:

1. Nursing home
2. Hospitals
3. Critical access hospitals
4. Comprehensive outpatient rehabilitation facilities
5. Home health agencies
6. Hospice programs

2.2 Nursing Home Services

A nursing home is an entity that provides skilled nursing care and rehabilitation services to people with illnesses, injuries or functional disabilities. Most facilities serve elderly people and take care of their needs. However, some facilities provide services to younger individuals with special needs, such as the developmentally disabled, the mentally ill, and those requiring drug and alcohol rehabilitation. Nursing homes are mostly independent facilities, although some of them are operated within a hospital or retirement community. The level of care provided by nursing homes has increased significantly over the past decade. Many homes now provide a great part of the nursing care that was previously provided in a hospital. As a result, most nursing homes now focus their attention on rehabilitation, so that their clients can return to their own homes as soon as possible. Some of the services [6] a nursing home may provide include:

1. Therapies: (Physical therapy, Occupational therapy, Speech therapy, Respiratory therapy)

2. Specialty Care: (Alzheimer's treatment, Head trauma, Hematological conditions, Mental disease, Neurological diseases, Neuromuscular diseases, Orthopedic rehabilitation, Pain therapy, Pulmonary disease, Para/quadriplegic impairments, Stroke recovery)

3. Independent Living: Independent living is for people who can take care of themselves and includes residing in one's own home or apartment.

4. Assisted Living: Assisted living provides apartment-style accommodations where services focus on providing assistance with daily living activities.

5. Congregate Care: Congregate care is similar to independent living, but features a community environment, with one or more meals per day prepared and served in a community dining room. Many other services and amenities may be provided, such as transportation, pools, a convenience store, bank, barber/beauty shop, resident laundry, housekeeping, and security.

6. Intermediate Care: Intermediate care is nursing home care for residents needing assistance with activities of daily living, but without significant nursing requirements.

7. Skilled Nursing: Skilled nursing facilities are traditional nursing facilities that provide 24-hour medical nursing care for people with serious illnesses or disabilities.

8. Continuing Care Retirement Communities or Life Care Communities: These communities are planned and operated to provide a continuum of care from independent living through skilled nursing.

9. Sub-acute Care: Sub-acute care is intensive nursing care for patients recovering from surgery or illness; patients receive this care in a nursing home setting.

10. Hospice Care: Hospice care is a combination of facility-based and home care provided to benefit terminally ill patients and support their families.

11. Hospitals: In addition to traditional services, many hospitals offer skilled or sub-acute nursing services either in their facility or on their campus.

12. Respite Care: Respite care is provided on a temporary basis to allow a primary care provider or family member relief for a few hours or days.

13. Adult Day Care: Adult day care programs provide meals and care services in a community setting during the day while a caregiver needs time off or must work.

14. Out-patient Therapy: Many facilities offer the same therapies provided in a nursing home on an out-patient basis. For those choosing a home-based option, out-patient therapy may be a necessary professional service.

15. Home Health Care: Home health care is provided in an individual's home by outside providers and aims to keep the individual functioning at the highest possible level.


2.3 Nursing Home for Elderly People

It seems that most elderly people do not want to go to a nursing home [2], and most of their relatives do not want to institutionalize a loved one. But even when they are committed to home care and have no intention of using the services of a nursing home, circumstances make institutionalization a necessity rather than a choice. Often, during a long hospital stay, elderly people may come to need a period of specialized care that they cannot receive in their own home.

Like so many issues in care provision, the decisions surrounding this process involve practical considerations overlaid with emotional components [2]. Feelings of sadness, relief, guilt, and a sense of failure may all be experienced when there is the need to institutionalize a loved one in a nursing home. As time passes, and the raw emotions of the moment subside, one of the most important areas of comfort is the knowledge that one has the right home for their care recipient.

It is impossible to choose a nursing facility without first determining the type of care the patient needs. This information not only helps elderly people find a home that provides the proper level of care, but is also a major factor in determining the public aid for which the care recipient will be eligible. The three most common types of care for elderly people are personal care (often referred to as custodial care), intermediate care, and skilled nursing care. Custodial care means that residents need help with personal activities such as dressing, bathing, and eating. This type of care is essentially non-medical and is administered by aides rather than trained medical personnel. Residents who need rehabilitative therapy and medications in addition to personal custodial care are candidates for intermediate care, which is delivered by licensed therapists and registered or licensed practical nurses. When the level of disability is such that the resident is not able to take care of himself or herself and may even be bedridden, skilled nursing care is needed. This is administered by licensed medical personnel on the orders of an attending physician.
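The three care levels above amount to a simple decision rule: the most intensive need determines the level. The Python sketch below encodes that rule for illustration only; the `Assessment` fields and the fallback level `NONE` are hypothetical and not part of the text, and the rule is a simplification, not a clinical assessment tool.

```python
from dataclasses import dataclass
from enum import Enum


class CareLevel(Enum):
    NONE = "no institutional care indicated"   # hypothetical fallback level
    CUSTODIAL = "custodial (personal) care"
    INTERMEDIATE = "intermediate care"
    SKILLED = "skilled nursing care"


@dataclass
class Assessment:
    # Hypothetical fields summarising the needs described in the text.
    needs_adl_help: bool               # help with dressing, bathing, eating
    needs_therapy_or_medication: bool  # rehabilitative therapy and medications
    self_care_possible: bool = True    # False if unable to care for oneself


def care_level(a: Assessment) -> CareLevel:
    """Return the most intensive level of care that the assessment calls for."""
    if not a.self_care_possible:
        return CareLevel.SKILLED        # licensed personnel, physician's orders
    if a.needs_therapy_or_medication:
        return CareLevel.INTERMEDIATE   # licensed therapists and nurses
    if a.needs_adl_help:
        return CareLevel.CUSTODIAL      # non-medical help from aides
    return CareLevel.NONE


print(care_level(Assessment(needs_adl_help=True,
                            needs_therapy_or_medication=False)).value)
# custodial (personal) care
```

Checking the most severe condition first keeps the levels mutually exclusive, mirroring how intermediate care includes custodial needs and skilled care includes both.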

3 Health Care and Information Technology

In recent years, grid computing has evolved as a standards-based approach for the coordinated sharing of distributed and heterogeneous resources to solve large-scale problems in dynamic virtual organizations [7]. Most existing developments in grid computing have focused on compute grids and data grids. A compute grid provides distributed computational resources to meet the computational requirements of applications, while a data grid provides seamless access to large amounts of distributed data and storage resources.

Although healthcare, and the use of IT to support effective treatment and the delivery and management of healthcare, are top priorities in the health field in many countries, there are many competing areas of investment [8]. Intuitively, the benefits of using even basic IT to provide high-quality information and decision support to clinicians and patients are very significant. However, progress even in basic IT has been patchy and slow in the healthcare industry: few high-quality, well-documented business cases with results, and very few large-scale IT implementations, can be found. There are even fewer cases that demonstrate the benefits of radically new IT technologies (such as the Grid) or of IT in innovative areas of healthcare such as genetics, imaging, or bioinformatics. Therefore, when applying for funding and prioritisation of resources to continue developing HealthGrid applications, it is vital to make a clear and highly compelling business case that covers all types of healthcare.

In the future, nursing homes will be an arena for medical treatment and care [9]. Health care will play an important role in achieving this. Due to factors such as demographic change and a shortage of people working in the health sector, there will be a need to change the way hospitals are organised as well as the way care is provided. Nursing homes can benefit from a variety of services, including rehabilitation after operations, policlinic check-ups and patient training. Providing medical treatment at nursing homes could result in better utilization of hospital resources, improved care quality, and improved quality of life for the patients. Emerging technologies such as the grid therefore have the potential to facilitate nursing-home-based health services and to enable a closer integration of the home environment as a part of a hospital or other cooperating health institutions. To fulfil the vision of a more extensive use of nursing-home-based health services, a new computer infrastructure is needed, and the grid will play an essential role in realizing the nursing home of the future. Nursing home treatment and care require a technological infrastructure that integrates the nursing home with the medical and health institutions in question. The concept of virtual organisations is well suited to nursing-home-based health care. A possible scenario is that, during a patient's recovery period, different virtual organisations will be formed. These organisations will depend on the medical condition of the patient as well as on the different phases of treatment and recovery. Initially, the nursing home will be a part of the virtual organisation. As soon as the patient has recovered to some extent, the virtual organisation could include both the nursing home and a primary care institution, with or without the hospital participating.
When the patient has fully recovered and there is no longer a need for the nursing home, the virtual organisation will cease to exist. It is not unrealistic to anticipate that in the future, and for certain diseases, healthcare services will be provided directly to the nursing home by nurses and doctors abroad.
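The phase-dependent virtual organisation described above can be sketched as a function from recovery phase to participating institutions. This is an illustrative Python model only: the `Phase` names are hypothetical, and the assumption that the treating hospital takes part in the initial phase is ours (it is implied, not stated, by the text).

```python
from enum import Enum, auto


class Phase(Enum):
    INITIAL_TREATMENT = auto()
    PARTIAL_RECOVERY = auto()
    FULL_RECOVERY = auto()


def vo_members(phase: Phase, hospital_still_involved: bool = False) -> frozenset[str]:
    """Return the institutions forming a patient's virtual organisation.

    An empty set means the virtual organisation has ceased to exist.
    """
    if phase is Phase.INITIAL_TREATMENT:
        # Assumption: the treating hospital participates alongside the nursing home.
        return frozenset({"hospital", "nursing home"})
    if phase is Phase.PARTIAL_RECOVERY:
        members = {"nursing home", "primary care institution"}
        if hospital_still_involved:  # "with or without the hospital participating"
            members.add("hospital")
        return frozenset(members)
    return frozenset()  # fully recovered: the VO is dissolved


for phase in Phase:
    print(phase.name, sorted(vo_members(phase)))
```

Recomputing membership from the phase, rather than mutating a shared member list, keeps each virtual organisation's composition traceable to the patient's current condition.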

But which grid functionality is needed to support medical care and treatment at nursing homes in an optimal way? Running a nursing home requires collaboration and information exchange among the nursing homes that constitute the virtual organisation. Using computational, information and collaboration grids as a way to categorise functionality, it is questionable whether there will be a need for vast amounts of computational power. There will, however, be a need to store rather large amounts of data, and a need for collaboration functionality: performing consultations, exchanging data and information such as monitoring data, remotely controlling medical equipment, and making virtual visits to patients at the nursing home. Today, the information related to a particular patient is scattered across different health institutions, and common ontologies are important in order to combine it. An infrastructure for virtual hospitals and nursing home care has to meet security requirements for different levels of quality of service and facilitate interoperability in dynamically formed virtual organisations. Today one can observe various non-grid initiatives addressing the need for a computing infrastructure for nursing-home-based care. The benefits of having a common standard grid for nursing-home-based healthcare in the future are ease of deployment and operability.

4 Improving Health Care at Nursing Homes – Our Approach

The Web has become an integral part of numerous applications in which a user interacts with many entities, such as a service provider, a product seller, a friend or a colleague. Content and services are available from different sources and places, so Web applications have to combine all available knowledge in order to offer personalized and user-friendly services. Thus, one of the main goals of the Semantic Web is to enable applications that offer end users high-quality health care services that take advantage of electronically stored information. For this purpose, it is vital to propose and use specific techniques for mining this data for actionable knowledge, which can then be used effectively to enhance the users' Web experience. However, those who apply these techniques must also take into consideration the large size and heterogeneous nature of the stored data, as well as the dynamic nature of user interactions with the Web.

Personalization is used more and more often in several areas of interactive multimedia, mainly in web applications. This is driven by the need to adapt content and presentation style to the preferences of a given user or set of users in order to offer them better services. The health and personal care that a resident receives will be based on the individual’s needs. In our work we assume that the relevant health care services are adapted to and applied to patients based on their health condition when they enter the nursing home. For example, if a patient's health condition is critical, the corresponding icon above the room (Fig. 2) turns black. This helps the doctors categorise the patients and offer personalized services that best fit their needs. Moreover, we assume that a number of nursing homes cooperate with each other, and through our software the doctors can assign a patient to the most suitable home in order to provide better treatment. The use of a common database also lets the doctors view the patients’ medical history and provide them with better health care treatment and services.

As already stated, it is important for elderly people to have trained personnel taking care of them, thus helping them enjoy better living conditions. Nursing homes for the elderly can provide these services and help residents with their daily activities. However, there are several ways to improve the services provided at nursing homes so that elderly people receive better health care focused on their individual problems. Consequently, it is useful to know the problems that elderly people face. A good idea is to keep a file for each resident containing the personal information and the clinical characteristics that the nursing home personnel should be aware of in order to provide the proper services. It is also important that different nursing homes be able to cooperate [7] to enable the transfer of senior citizens from one home to another. For example, if an elderly person with orthopaedic problems currently stays at a nursing home that does not specialize in this kind of disorder, it is vital that he or she be transferred to a specialized home in order to receive better treatment [8].
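Choosing where to transfer a resident, as in the orthopaedic example above, can be sketched as a filter over the cooperating homes. The Python below is an illustrative sketch: the `NursingHome` fields, the home names and the tie-breaking rule (most free beds wins) are all hypothetical assumptions, not the chapter's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NursingHome:
    name: str
    specialties: set[str]   # hypothetical, e.g. {"orthopaedics"}
    free_beds: int


def best_home(condition: str, homes: list[NursingHome]) -> Optional[NursingHome]:
    """Return a cooperating home specialised in `condition` with a free bed.

    Among several candidates, the one with most free beds wins (an illustrative
    tie-breaking rule); None means no suitable home is available.
    """
    candidates = [h for h in homes if condition in h.specialties and h.free_beds > 0]
    return max(candidates, key=lambda h: h.free_beds, default=None)


homes = [
    NursingHome("Home A", {"cardiology"}, free_beds=3),
    NursingHome("Home B", {"orthopaedics", "rehabilitation"}, free_beds=1),
]
print(best_home("orthopaedics", homes).name)  # Home B
```

Returning `None` instead of raising lets the caller decide whether to keep the resident in place or widen the search.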

Our approach is to use a web application along with a tablet PC or PDA in order to collect the personal information and clinical characteristics of the people in the nursing home and to monitor their health status. Creating such a file for each elderly person gives a better view of his or her clinical history and in this way supports better treatment.

In order to reach the above target [9], the nursing home needs some special technical equipment. First, a wireless network in the home and tablet PCs or PDAs are necessary. The staff of the institution should also be trained to use the tablet PCs or PDAs as well as the web application. The web application helps doctors collect patient information in a timely manner and monitor the patients' health status in order to have a better view of their needs. The most important functionality of the application is that it can generate a report of the residents' needs for medical supplies, including the health status of the residents of the nursing home.

Fig. 1 The main page where the doctors can select a nursing home among the cooperating ones


We assume that there is a list of cooperating nursing homes, each specialized in specific types of health care services, and a web application that manages these nursing homes and is used to transfer patients from one home to another according to their needs. The doctors can select a nursing home among the cooperating ones and then start a number of actions for the management of its elderly residents (Fig. 1). The doctors can add an elderly person to a specific room and bed by clicking on the circles above the doors, as depicted in Fig. 2: the circles above the doors represent the beds, while the doors represent the rooms. They can also view the status of the elderly people in a room by clicking on its door.

Fig. 2 The page where the doctors can add an elderly person to a specific room and bed or view the status of the nursing home


Fig. 3a The form in which the doctors can add the general information of the elderly people and information about their health status


Fig. 3b The form in which the doctors can add the general information of the elderly people and information about their health status

If the doctors want to add an elderly person to the home, they press the relevant button, i.e. the circles above the doors (Fig. 2), which takes them to another page with a form for the person's personal information and clinical characteristics. If they have already placed a person in a bed, they can add or change that person's clinical characteristics and assign medical supplies. As soon as the doctors have entered all the necessary information, they press the submit button and the person is added to the home in the selected bed. As a result, if the elderly person is in good health, the circle above the door is replaced by a white icon; if the person has a health problem, it is replaced by a grey icon; and if the person needs special care, it is replaced by a black icon. This helps the doctors acquire a better view of the condition of the elderly people in the nursing home and monitor its overall health status.
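The room and bed model behind Figs. 2 to 4 (doors are rooms, circles are beds, icon colour reflects health status) can be sketched as a small data model. This Python sketch is illustrative only: the class and function names are hypothetical, and two beds per room is an assumption, since the text does not fix the number.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Condition(Enum):
    GOOD = "white"          # resident in good health
    PROBLEM = "grey"        # resident has a health problem
    SPECIAL_CARE = "black"  # resident needs special care


@dataclass
class Bed:
    resident: Optional[str] = None
    condition: Optional[Condition] = None

    def icon(self) -> str:
        # A free bed keeps the plain circle that the doctor clicks to admit someone.
        if self.resident is None or self.condition is None:
            return "circle"
        return self.condition.value


@dataclass
class Room:
    number: int
    # Assumption: two beds per room; the text does not fix the number.
    beds: list[Bed] = field(default_factory=lambda: [Bed(), Bed()])


def admit(room: Room, bed_index: int, resident: str, condition: Condition) -> None:
    """Place a resident in the selected bed, as when the form is submitted."""
    bed = room.beds[bed_index]
    if bed.resident is not None:
        raise ValueError(f"bed {bed_index} in room {room.number} is occupied")
    bed.resident, bed.condition = resident, condition


def update_condition(room: Room, bed_index: int, condition: Condition) -> None:
    """Record a changed clinical status; the icon changes accordingly."""
    room.beds[bed_index].condition = condition


room = Room(101)
admit(room, 0, "resident-001", Condition.PROBLEM)
print([bed.icon() for bed in room.beds])  # ['grey', 'circle']
```

Deriving the icon from the stored condition, rather than storing the colour separately, keeps the overview page consistent with the clinical record.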

Prior to hospitalization, the doctor creates the personal record file of each patient. Doctors use a tablet PC or a PDA to fill in a form designed to store this information in a database. The form is illustrated in Figs. 3a and 3b.


In the <add disease> option the user can choose from a list of predefined diseases. Depending on the chosen <disease> category, the related diseases appear. Similarly, depending on the chosen disease, information about the necessary medication treatment appears. For example, if the doctors choose diseases of blood, the diseases iron deficiency anemia and coagulation factor deficiency appear; if they then choose iron deficiency anemia, the medication legofer oral sol 800 mg appears.
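The cascading selection described above (category, then disease, then medication) is essentially a two-level lookup. The Python sketch below reproduces the example from the text; the nested-dictionary layout is an assumption for illustration, and the empty medication list for coagulation factor deficiency only reflects that the text names no medication for it.

```python
# Hypothetical catalogue: category -> disease -> suggested medication(s).
CATALOGUE: dict[str, dict[str, list[str]]] = {
    "diseases of blood": {
        "iron deficiency anemia": ["legofer oral sol 800 mg"],
        "coagulation factor deficiency": [],  # no medication given in the text
    },
}


def diseases_for(category: str) -> list[str]:
    """Options shown in the disease list once a category is chosen."""
    return sorted(CATALOGUE.get(category, {}))


def medication_for(category: str, disease: str) -> list[str]:
    """Medication treatment options shown once a disease is chosen."""
    return CATALOGUE.get(category, {}).get(disease, [])


print(diseases_for("diseases of blood"))
# ['coagulation factor deficiency', 'iron deficiency anemia']
print(medication_for("diseases of blood", "iron deficiency anemia"))
# ['legofer oral sol 800 mg']
```

In a web form, each function would back one drop-down list, refreshed whenever the previous selection changes.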

If the doctors need a more specific view of the elderly people in a room, they can click on the door and get their names and a photo that a doctor or nurse has taken of them (Fig. 4). The doctors can see the patients of this room, select the elderly person they want, and view that person’s details and photo. Moreover, they can change the general information of the elderly person, for example the name or the residence, and they can also view and change the clinical status of this person, adding a disease, a medical treatment and a description of the person's health status.

Fig. 4 Getting the clinical view of a specific patient

Doctors can also search for personalized information regarding the health status of each elderly person, as shown in Fig. 5. The results provided are the age, the room, the bed and the special needs of the elderly people. Moreover, the doctors can see the list of diseases that the elderly people suffer from.
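The search just described can be sketched as a query over the common database. The Python sketch below uses an in-memory SQLite database; the table and column names are hypothetical, as the chapter does not describe its schema, and the sample resident data is invented for illustration.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create a hypothetical schema: one row per resident, one row per diagnosis."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE resident (id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
                               room INTEGER, bed INTEGER, special_needs TEXT);
        CREATE TABLE disease (resident_id INTEGER REFERENCES resident(id), name TEXT);
        """
    )
    return conn


def search(conn: sqlite3.Connection, disease: str) -> list[tuple]:
    """Residents with `disease`: name, age, room, bed, special needs, all diseases."""
    query = """
        SELECT r.name, r.age, r.room, r.bed, r.special_needs,
               (SELECT group_concat(d2.name, ', ') FROM disease d2
                 WHERE d2.resident_id = r.id) AS diseases
          FROM resident r
         WHERE EXISTS (SELECT 1 FROM disease d
                        WHERE d.resident_id = r.id AND d.name = ?)
         ORDER BY r.room, r.bed
    """
    return conn.execute(query, (disease,)).fetchall()


conn = open_db()
conn.execute("INSERT INTO resident VALUES (1, 'resident-001', 84, 101, 1, 'wheelchair')")
conn.executemany("INSERT INTO disease VALUES (?, ?)",
                 [(1, "iron deficiency anemia"), (1, "hypertension")])
for row in search(conn, "hypertension"):
    print(row)
```

The parameter placeholder (`?`) keeps search terms typed by staff from being interpreted as SQL; note that SQLite does not guarantee the order of names inside `group_concat`.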

Finally, the proposed application provides a report with the supplies and equipment needed by the nursing home (Fig. 6). Doctors and physicians can see how many people are institutionalized in the nursing home, as well as what kinds of diseases have been recorded and need to be supervised by the doctors. Moreover, the medical staff can see the total supply needs of the nursing home, which helps with managerial issues.
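The report in Fig. 6 is an aggregation over residents: a headcount, the number of residents per recorded disease, and summed supply quantities. The Python sketch below shows one way to compute it; the `Resident` fields and the supply quantities are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Resident:
    name: str
    diseases: list[str]
    supplies: dict[str, int]  # hypothetical: supply item -> quantity needed


def nursing_home_report(residents: list[Resident]) -> dict:
    """Aggregate headcount, recorded diseases and total supply needs."""
    diseases = Counter(d for r in residents for d in r.diseases)
    supplies: Counter = Counter()
    for r in residents:
        supplies.update(r.supplies)  # Counter.update adds quantities
    return {
        "residents": len(residents),
        "diseases": dict(diseases),
        "supplies": dict(supplies),
    }


report = nursing_home_report([
    Resident("resident-001", ["iron deficiency anemia"],
             {"legofer oral sol 800 mg": 2, "gauze": 5}),
    Resident("resident-002", [], {"gauze": 3}),
])
print(report["supplies"])  # {'legofer oral sol 800 mg': 2, 'gauze': 8}
```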


Fig. 5 The page where the doctors can search for elderly people


Fig. 6 The report page of the nursing home needs

5 Ethical Issues on Health Care Services

Electronic health, or e-health, enhances communication between patients and doctors. It also provides education through online resources, as well as information sharing irrespective of location [10]. This implies a need for strict confidentiality and enforced protection of privacy [11]. Privacy is in fact recognized as a fundamental human right, at least in Europe. Public authorities are sharply aware of these repercussions and are putting considerable effort into privacy protection legislation.

Health information includes [12] information for staying well, preventing and managing disease, as well as making other decisions related to health and health care. It includes information [13] for making decisions about health products and health services. It may be in the form of data, text, audio, and/or video. It may involve enhancements through programming and interactivity.

Health products [12] include drugs, medical devices, and other goods used to diagnose and treat illnesses or injuries or to maintain health. Health products include both drugs and medical devices subject to regulatory approval by agencies such as the U.S. Food and Drug Administration or U.K. Medicines Control Agency, as well as vitamin, herbal, or other nutritional supplements and other products not subject to such regulatory oversight.

Health services [13] include specific, personal medical care or advice; management of medical records; communication between health care providers and/or patients and health plans or insurers, or health care facilities regarding treatment decisions, claims, billing for services, etc.; and other services provided to support health care.

There is legitimate concern about the proper treatment of sensitive data. To better illustrate the problem of the privacy of personal data, we use the following scenario. A patient with acute abdominal pain is admitted to the Emergency Department of a hospital. The patient is assigned to a doctor who will perform the Acute Abdominal Pain Diagnosis procedure. The procedure requires the doctor to access the patient’s history, then carry out a physical exam, and finally request some lab and imaging exams. Optionally, the doctor can ask for the opinion of one or more colleagues, depending on the nature of the patient’s symptoms. The basic assumption is that the patients' medical records are stored in a database and are accessible from any computer in the hospital. However, since the records contain sensitive information, the hospital's medical staff should face specific restrictions on accessing them. For instance, a doctor can only access the records of patients assigned to his or her ward. Such a restriction requires that the patient is admitted to the ward, possibly requiring that the patient is physically present in the ward and that access can only be made from terminals in the ward. However, when a doctor from another ward needs to access a patient record, this mechanism will not allow access unless authorization is provided, for example by the first doctor for a limited time.
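The access rule in this scenario (own ward only, from a terminal in that ward, unless a doctor of the ward grants time-limited access) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `AccessPolicy` class and its fields are hypothetical, and we assume delegated access is not tied to a terminal in the patient's ward, which the text leaves open.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class AccessPolicy:
    patient_ward: dict[str, str]   # patient id -> ward the patient is admitted to
    doctor_ward: dict[str, str]    # doctor id -> ward the doctor belongs to
    grants: dict[tuple[str, str], datetime] = field(default_factory=dict)

    def delegate(self, granting_doctor: str, other_doctor: str, patient: str,
                 duration: timedelta, now: datetime) -> None:
        """A doctor of the patient's ward grants a colleague temporary access."""
        if self.doctor_ward.get(granting_doctor) != self.patient_ward.get(patient):
            raise PermissionError("only a doctor of the patient's ward may delegate")
        self.grants[(other_doctor, patient)] = now + duration

    def can_read(self, doctor: str, patient: str, terminal_ward: str,
                 now: datetime) -> bool:
        ward = self.patient_ward.get(patient)
        if ward is None:
            return False  # patient not admitted to any ward
        if self.doctor_ward.get(doctor) == ward and terminal_ward == ward:
            return True   # own ward, from a terminal located in that ward
        # Assumption: delegated access is valid from any terminal until it expires.
        expiry = self.grants.get((doctor, patient))
        return expiry is not None and now < expiry


t0 = datetime(2024, 1, 1, 9, 0)
policy = AccessPolicy({"p1": "surgery"}, {"d1": "surgery", "d2": "internal"})
policy.delegate("d1", "d2", "p1", timedelta(hours=2), t0)
print(policy.can_read("d2", "p1", "internal", t0 + timedelta(hours=1)))  # True
```

Storing an expiry time rather than a boolean makes the "limited time" part of the authorization enforceable without a separate revocation step.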

The above is a typical procedure, so there is a need for an access control mechanism that protects the privacy of patient records. It is natural to use roles to reflect the various responsibilities in organizations. Privacy Enhancing Technologies (PETs) are fairly new (the concept has only been around since the ‘90s) and have been extensively researched in both the USA and Europe. In healthcare, PETs are mainly used to protect the privacy of persons involved in medical data collection. Their goal is to guarantee the anonymity of data subjects while making information available for clinical practice and research. The use of such techniques in healthcare has been demonstrated in several research projects and in commercially deployed solutions: in clinical trials, disease studies, the exchange of research data, the daily handling of sensitive data, etc. PETs such as anonymisation have even taken the first steps towards standardization.
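As a concrete, deliberately simple example of such a technique, the Python sketch below prepares a research copy of a record: direct identifiers are dropped, the patient id is replaced by a keyed HMAC-SHA256 pseudonym, and the exact age is generalised to a 10-year band. The field names are hypothetical, and keyed pseudonymisation is one PET among many, not the one used by the projects mentioned above. Because quasi-identifiers may remain, this alone does not guarantee full anonymity.

```python
import hashlib
import hmac

DIRECT_IDENTIFIERS = {"name", "address", "phone"}  # hypothetical field names


def pseudonymise(record: dict, secret_key: bytes) -> dict:
    """Return a research copy of `record` without direct identifiers.

    The same patient id always yields the same pseudonym under a given key,
    so research records stay linkable, while mapping a pseudonym back to a
    patient requires the secret key held by the data controller.
    """
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["patient_id"] = hmac.new(secret_key, str(record["patient_id"]).encode(),
                                 hashlib.sha256).hexdigest()
    if "age" in out:
        low = (out["age"] // 10) * 10   # generalise exact age to a band
        out["age"] = f"{low}-{low + 9}"
    return out


record = {"patient_id": 42, "name": "resident-001", "age": 83,
          "diagnosis": "iron deficiency anemia"}
print(pseudonymise(record, b"example-key")["age"])  # 80-89
```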


As we described earlier, trust is fundamental to healthcare. Patients rely on healthcare providers to keep their personal information confidential, to provide accurate and appropriate information about their conditions and possible treatments, and to recommend the therapy they believe to be in the patient’s best interest.

In response to the ever-growing public scrutiny of the Internet health arena, several organizations have championed e-health ethics initiatives [14]. These include:

1. Health On the Net (HON), Code of Conduct (www.hon.ch/HONcode/Conduct.html)

2. American Medical Association, Guidelines for Medical and Health Information Sites on the Internet

3. Health Internet Ethics (Hi-Ethics), Ethical Principles for Offering Internet Health Services to Consumers (www.hiethics.org/Principles/index.asp)

4. Internet Healthcare Coalition, e-Health Code of Ethics (www.ihealthcoalition.org/ethics/ethics.html)

5. URAC “Health Web Site Standards” (www.urac.org/documents/HealthWebSitev1-0Standards040122.pdf)

The goal of these initiatives and organizations is to draft ethical guidelines for creating credible and trustworthy health information and services on the Internet. According to [14], the e-Health Code of Ethics can be summarized as follows:

1. Candor: Disclose information that, if known by consumers, would likely affect their understanding or use of the site, or their purchase or use of a product or service.

2. Honesty: Be truthful and not deceptive. People who seek health information on the Internet need to know that products or services are described truthfully and that the information they receive is presented clearly.

3. Quality: To make decisions about their health care, people need and have the right to expect that sites will provide accurate, well-supported information and products and services of high quality.

4. Informed Consent: Respect users’ right to determine whether or how their personal data may be collected, used, or shared. People who use the Internet for health-related reasons have the right to be informed that personal data may be gathered, and to choose whether they will allow their personal data to be collected and whether they will allow it to be used or shared. They have a right to be able to choose, consent, and control when and how they actively engage in a commercial relationship.

5. Privacy: Respect the obligation to protect users’ privacy. People who use the Internet for health-related reasons have the right to expect that personal data they provide will be kept confidential.

6. Professionalism in Online Healthcare: Respect fundamental ethical obligations to patients and clients. Inform and educate patients and clients about the limitations of online healthcare.

7. Responsible Partnering: Ensure that organizations and sites with which they affiliate are trustworthy. People need to be confident that organizations and individuals who operate on the Internet undertake to partner only with trustworthy individuals or organizations.

8. Accountability: Provide meaningful opportunity for users to give feedback to the site. People need to be confident that organizations and individuals that provide health information, products, or services on the Internet take users’ concerns seriously and that sites make good faith efforts to ensure that their practices are ethically sound.

6 Conclusions – Future Work

There is a growing demand for health care services provided at nursing homes, particularly for elderly and chronically ill people. Nursing homes and the trained personnel who work there should provide health care services that help elderly people recover and improve their quality of life. The medical staff should also provide them with secure and supervised health care services. New technologies make this feasible and affordable. In this paper, we introduce a web application that can be used in nursing homes to manage the health care services provided to elderly people and to support different types of health services according to the demands of senior citizens. Doctors can use our application through PDAs or tablet PCs to collect both personal and clinical information, creating in parallel a personalized record file for each hospitalized person. The most important part of the application is the tool for monitoring the health status of the people in the nursing homes, which gives doctors a better view of the health status of the nursing home. The application can also generate an overall report of the nursing home's needs for medical supplies as well as its demographics and population status. Last but not least, the application supports the exchange of information among different nursing homes, allowing the clinical data of elderly people to be transferred and processed when they move from one home to another. As far as future work is concerned, expanding the use of such applications and internet technologies for personalized health care services will help elderly people receive better treatment and enjoy better living conditions.

References

[1] Lansley, P., McCreadie, C., Tinker, A.: Can adapting the homes of older people and providing assistive technology pay its way? Age and Ageing, Oxford Journals 33(6), 571–576 (2004)

[2] Choosing a Nursing Home: A Caregiver’s Guide, National Family Caregivers Association (NFCA), 00/896-3650, http://www.thefamilycaregiver.org/pdfs/NursHomeChecklist.pdf

[3] HIPAA workgroups, Information Access Management, Health Insurance Portability and Accountability Act, Policy Memorandum, Chapter 1: Entity Status, Health Care Providers (2004), http://www.hipaa.org/

[4] Allen, I., Hogg, D., Peace, S.: Elderly People: Choice, Participation and Satisfaction. Policy Studies Institute (PSI), London (1992)


[5] Consumer Guide to Health Care Coverage, Consumer Affairs & Business Regulation (OCABR), http://www.mass.gov/

[6] What is a Nursing Home? Nelson & Wallery Ltd., http://www.nursinghomeinfo.com/nhserve.html

[7] Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of High Performance Computing Applications 15(3), 200–222 (2001)

[8] Dean, K., Lloyd, S.: The Healthgrid White Paper. In: Proceedings of Healthgrid 2005, From Grid to Healthgrid, vol. 112, pp. 18–21. IOS Press, Amsterdam (2005)

[9] Burkow, T.M., Bakkevoll, P.A.: The Grid as an Enabler for Home Based Healthcare Services. In: Engelbrecht, R., et al. (eds.) Proceedings of MIE 2005, Connecting Medical Informatics and Bio-Informatics, ENMI, pp. 1305–1310 (2005)

[10] Eysenbach, G.: What is e-health? J. Med. Internet Res. 3(2), e20 (2001)

[11] De Moor, G., Claerhout, B.: From grid to healthgrid: Confidentiality and ethical issues. HealthGrid White Paper, ch. 8

[12] Code of Ethics, http://www.advamed.org/MemberPortal/About/code/

[13] Ahmad, R.: eHealth Code of Ethics (1998)

[14] Mack, J.: Beyond HIPAA: Ethics in the e-Health Arena. Healthcare Executive (2004)


Introducing Context-Awareness and Adaptation in Telemedicine Systems

Charalampos Doukas, Ilias Maglogiannis, and Kostas Karpouzis

Charalampos Doukas University of the Aegean, Department of Information & Communication Systems Engineering, Greece

Ilias Maglogiannis University of Central Greece, Department of Biomedical Informatics Lamia, Greece

Kostas Karpouzis Image, Video and Multimedia Systems Lab, National Technical University of Athens, Greece

Abstract. Proper coding and transmission of medical and physiological data is a crucial issue for the effective deployment and performance of telemedicine services. This chapter presents a platform for performing medical content adaptation based on context awareness. Sensors are used to determine the status of a patient being monitored through a medical network. Additional contextual information regarding the patient’s environment (e.g., location, data transmission device and underlying network conditions) is represented through an ontological knowledge base model. Rule-based evaluation determines the proper coding and transmission of medical content (i.e., biosignals, medical video and audio) in order to optimize the telemedicine process. The chapter discusses the design of the ontological model and provides an initial assessment.

1 Introduction

A number of telemedicine applications exist nowadays, providing remote medical action systems (e.g., remote surgery systems), remote patient telemonitoring facilities (e.g., homecare of chronic disease patients), and transmission of medical content for remote assessment ([1]-[5]). Such platforms have proved to be significant tools for optimizing patient treatment, offering better possibilities for managing chronic care, controlling health delivery costs and increasing quality of life and quality of health services in underserved populations. Collaborative applications that allow the exchange of medical content (e.g., a patient health record) between medical experts for educational purposes or for assessment assistance are also considered of great significance ([6]-[8]). Due to the remote locations of the involved actors, a network infrastructure (wired and/or wireless) is needed to enable the transmission of the medical data, the majority of which is usually medical images and/or medical video related to the patient. Thus, telemedicine systems cannot always perform in a successful and efficient manner: issues like large data volumes (e.g., video sequences or high-quality medical images), unnecessary data transmissions and limited network resources can cause inefficient usage of such systems ([9], [10]). In addition, wired and/or wireless network infrastructures often fail to deliver the required quality of service (e.g., bandwidth, minimum delay and jitter requirements) due to network congestion and/or limited network resources. Appropriate content coding techniques (e.g., video and image compression) have been introduced to address such issues ([11]-[13]); however, these are closely tied to specific content types and cannot be applied in general. Additionally, they do not consider the underlying network status for appropriate coding and still cannot resolve the case of unnecessary data transmission. Scalable coding and context-aware medical networks can overcome the aforementioned issues by performing appropriate content adaptation.

The realization and integration of Semantic Medical Devices can make it possible to:

• develop solutions for the realization of smart hospitals;

• provide mobile access to the patient’s Electronic Health Record (EHR);

• enable medical devices to send parts of the EHR (e.g., measurement results), alerts, status information, etc. to the physician's PDA;

• develop medical devices that inherently support interoperability with each other;

• enable medical devices to send alerts (e.g., SMS) to the handheld devices (e.g., mobile phone, pager) of the patient's caregivers if something goes wrong with the patient;

• develop solutions that provide pervasive healthcare services (everywhere, anytime) to patients, whether staying at home or mobile;

• develop solutions for ambient assisted living for elderly patients (e.g., medication reminders, cognitive assistance).

In addition, policy and rule-based mechanisms can provide better adaptivity of medical networks: For example, there is a need to adapt the frequency of meas-urements on a sensor depending on the activity and clinical condition of the pa-tient. This enables optimizing power consumption whilst ensuring that important episodes are not missed. Similarly, the use of variable thresholds for transmitting sensor readings reduces the need for communication and thus power consumption. Typically, sensor configuration may also change depending on the user’s context, e.g., location, current activity and medical history. Physiological parameters such as heart rate thresholds then need to be configured and customized accordingly. Policy-based techniques have been used for over a decade in network and systems management in order to define how the system should adapt in response to events such as failures, changes of context or changes in requirements. By specifying the policies (i.e., what actions should be performed in response to an event) declara-tively and separately from the implementation of the actions, it is possible to dy-namically change the adaptation directives without changing the implementation

Introducing Context-Awareness and Adaptation in Telemedicine Systems 165

or interrupting the functioning of the device. Thus, policy-based mechanisms provide feedback control over the system and a constrained form of programmability.
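The policy idea above can be illustrated with a minimal event-condition-action sketch. It assumes a hypothetical `SensorConfig` with a sampling rate and a heart-rate transmission threshold, and a hypothetical set of events (`activity_changed`, `history_loaded`); the numeric values are illustrative only and are not clinical guidance. The point is that the policies are plain data, declared separately from the code that executes them.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SensorConfig:
    sampling_hz: float        # how often the sensor is read
    hr_threshold_bpm: int     # heart rate above which a reading is transmitted

@dataclass
class Policy:
    """Event-condition-action rule, declared separately from the actions."""
    event: str
    condition: Callable[[dict], bool]
    action: Callable[[SensorConfig], None]

def set_rate(hz: float) -> Callable[[SensorConfig], None]:
    def act(cfg: SensorConfig) -> None:
        cfg.sampling_hz = hz
    return act

def set_threshold(bpm: int) -> Callable[[SensorConfig], None]:
    def act(cfg: SensorConfig) -> None:
        cfg.hr_threshold_bpm = bpm
    return act

# Hypothetical policy set; the values are illustrative, not clinical guidance.
POLICIES = [
    Policy("activity_changed", lambda ctx: ctx["activity"] == "resting", set_rate(0.2)),
    Policy("activity_changed", lambda ctx: ctx["activity"] == "exercising", set_rate(1.0)),
    Policy("activity_changed", lambda ctx: ctx["activity"] == "exercising", set_threshold(150)),
    Policy("history_loaded", lambda ctx: ctx.get("cardiac_history", False), set_threshold(110)),
]

def handle_event(event: str, ctx: dict, cfg: SensorConfig, policies=POLICIES) -> SensorConfig:
    """Apply every policy whose event matches and whose condition holds."""
    for p in policies:
        if p.event == event and p.condition(ctx):
            p.action(cfg)
    return cfg

def should_transmit(heart_rate_bpm: int, cfg: SensorConfig) -> bool:
    """Variable-threshold transmission: only send readings above the threshold."""
    return heart_rate_bpm > cfg.hr_threshold_bpm
```

Changing the adaptation behavior means editing `POLICIES`, not `handle_event`, which mirrors the separation between policy and implementation described above.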

This chapter presents a context-aware medical content adaptation platform that uses a semantic representation of both the content and the context. Using appropriate reasoning techniques, the platform performs content adaptation: medical images and video are transmitted only when deemed necessary, and the transmitted data are encoded according to network availability and quality, user preferences and patient status. The framework’s architecture is open and independent of the monitoring applications, the underlying networks and any other aspect of the telemedicine system in use. The rest of the chapter is organized as follows: Section 2 reviews the notion of context awareness in telemedicine platforms as found in the literature and Section 3 discusses design issues in context-aware medical networks. Section 4 explains how context awareness can be achieved, whereas Section 5 discusses context representation issues. Section 6 describes content adaptation techniques and Section 7 presents the reasoning scheme, based on semantic rules, for content adaptation decisions. Section 8 presents the proposed platform architecture and Section 9 concludes the chapter.

2 Related Work

Context awareness has been around for more than six years and much has been written on the concept. There are several applications and application frameworks for modeling and evaluating context, but only a few in the domain of healthcare and telemedicine. JCAF [40] is built to operate using a network approach in which different sensory, control and output devices are connected in a peer-to-peer fashion. Entities such as locations, persons or items have their own context, in which context items can be placed. Entities can request and set context items and/or subscribe to context changes. One example usage described is the context-aware interactive hospital bed [40]: a touch-screen computer attached to a patient’s bed uses context and adjusts its display when the context changes, effectively interacting with the environment. Based on proximity, entities and users are identified and authenticated so that a different interface can be shown to surgeons, nurses or other personnel. The experimental setup can detect RFID chips on medicine trays and match the retrieved information with known entities within the infrastructure, so that it can distinguish between several physical objects. Using the bed’s context and patient information, the bed can tell whether the medicines on the tray are actually prescribed to the patient or have been misplaced by a nurse, which could pose a health risk to the patient. As an alternative, [41] describes a service-based infrastructure. Its authors take the position that “to greatly simplify the task of creating and maintaining context-aware systems, we should shift as much of the weight of context-aware computing onto network-accessible middleware infrastructures”. Although they do not cover privacy in the application infrastructure, they recognize that, regarding sensory input, “if it were processed in a context infrastructure, it is likely that the interactivity would be stilted due to network latency”. Biegel and Cahill [42] describe a model designed for mobile context-aware applications based on ubiquitous computing. Their approach, called the sentient object model, describes a network of sensors, actuators and services that run independently but interact in an ad hoc setup.

All the aforementioned works present general frameworks for medical context modeling and use them to provide specific health services. To the best of our knowledge, no other framework in the literature exploits context awareness for medical content adaptation in telemedicine.

3 Design Issues in Context-Aware Medical Networks

The goal of research into context awareness in clinical work is to provide a conceptual and technical framework that helps application programmers create context-aware clinical computer systems. Such a framework should enable the programmer to design, develop and deploy the application-specific context-awareness features required in specific usage settings, while automatically supporting those aspects of context awareness that are common across applications. This approach is similar to other frameworks and toolkits supporting the development of context-aware applications, such as the Context Toolkit [43]. Requirements for context-aware systems and frameworks have been widely discussed (see e.g. [43], [44], [45], [46], [47]). Context-aware medical applications, however, introduce additional requirements. A hospital runs a wide range of clinical computer systems, and new systems are installed and removed on a regular basis. Furthermore, many clinicians (typically research-active doctors) build their own applications, such as quality databases supporting a specific clinical experiment. Making such applications context-aware requires a stable infrastructure that they can access, as well as a programming interface for their developers. The basic design principle of a context-awareness framework for medical purposes is therefore to divide it into two parts. One part supports the deployment of ‘context services’, which are robust, scalable, flexible, adaptable, extensible, etc. Such services run independently of the applications supplying or using context information. The other part enables developers of context-aware applications to represent, acquire, handle, store and use context information.

Considering the aforementioned, the main design requirements for context-aware medical networks can be summarized into the following:

Distributed and Cooperating Services: Gathering and applying context information is often tied to specific spaces or environments dedicated to a specific purpose. For example, using a context-aware computer system to aid and guide a surgeon is highly dependent on accurate and detailed context information about things going on in the operating room. Therefore, a context-awareness infrastructure should be distributed and loosely coupled, while maintaining ways of cooperating in a peer-to-peer fashion.

Security and Privacy: Clinical data about patients are important context data for clinical applications, and such data should be handled securely and their privacy respected. For example, the hospital bed uses information about the patient’s treatment as context information, enabling it to adjust itself to the patient. Hence, context data should be protected, subject to access control and not revealed to unauthorized clients. Therefore, the context services should embed an access control mechanism. Furthermore, it is important to verify that clients delivering context data are legitimate.

Lookup and Discovery: Context-aware clinical applications will continuously enter and leave the hospital, e.g. running on mobile equipment or being deployed as new applications. Such clients should be able to locate and connect to relevant context services in the infrastructure. Services are therefore required to register with lookup and discovery services and advertise what they can do.

Extensibility: Clinical applications using new context information and acquisition methods will constantly be deployed in treatment facilities. Therefore, a context-awareness infrastructure should be extensible in several ways. First, it should be possible to deploy, modify and remove context services. Second, the infrastructure should support the evolution of the supported context types by dynamically loading context definitions, functionality and acquisition mechanisms, such as new context sensors.
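The lookup, discovery and extensibility requirements can be sketched as a small in-memory registry. This is a hypothetical illustration, not the API of JCAF or any other framework: context services register the context types they provide, clients look them up by type, and services can be removed at any time. The service identifiers and endpoints used with it are invented.

```python
from collections import defaultdict

class LookupService:
    """Minimal in-memory registry (hypothetical): context services advertise
    the context types they provide; clients look them up by type."""

    def __init__(self):
        # context type -> {service id: endpoint}
        self._by_type = defaultdict(dict)

    def register(self, service_id, endpoint, context_types):
        for ctype in context_types:
            self._by_type[ctype][service_id] = endpoint

    def unregister(self, service_id):
        # Services may be removed at any time (extensibility requirement).
        for services in self._by_type.values():
            services.pop(service_id, None)

    def lookup(self, context_type):
        # .get avoids creating empty entries for unknown context types.
        return sorted(self._by_type.get(context_type, {}).items())
```

A production infrastructure would add leases, access control and distribution across peers; the sketch only shows the register/lookup contract.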

4 Enabling Context Awareness

Context awareness refers to the ability of systems to react based on their environment. Devices and computer systems may have information about the circumstances under which they operate and, based on rules or an intelligent stimulus, react accordingly. The term context awareness in ubiquitous computing was introduced by Schilit [14], [15]. Context-aware devices may also try to make assumptions about the user's current situation. Dey defines context as "any information that can be used to characterize the situation of entities" [16].

Three important aspects of context are: (1) where the individual is; (2) who the individual is with; and (3) what resources are nearby. Although location is a primary capability, location awareness does not necessarily capture things of interest that are mobile or changing. Context awareness, in contrast, is used more generally to include nearby people, devices, lighting, noise level, network availability, and even the social situation, e.g. whether you are with your family or a friend from school.

In remote patient care, context awareness refers to detecting the patient’s status and adapting the medical services appropriately to that status and to the environmental conditions.

4.1 Patient Status Awareness

Patient status awareness can be achieved by continuously monitoring the patient state through collecting information either directly related to the individual’s health (e.g., biosignals like heart rate, temperature, blood oximetry and others summarized in Table 1) or information that can be processed and indicate emergency cases (e.g., detection of fall events, call for help, etc.).


A broad definition of a signal is a ‘measurable indication or representation of an actual phenomenon’, which in the field of biosignals refers to observable facts or stimuli of biological systems or life forms. In order to extract and document the meaning or the cause of a signal, a physician may use simple examination procedures, such as measuring the temperature of a human body, or may have to resort to highly specialized and sometimes intrusive equipment, such as an endoscope. Following signal acquisition, physicians go on to a second step, that of interpreting its meaning, usually after some kind of signal enhancement or ‘pre-processing’ that separates the captured information from noise and prepares it for specialized processing, classification and decision support algorithms.

Table 1 Broadly used biosignals with corresponding metric ranges, number of sensors required and information rate.

Biomedical measurement (broadly used biosignals) | Voltage range | Number of sensors | Information rate (b/s)
ECG | 0.5-4 mV | 5-9 | 15000
Heart sound | Extremely small | 2-4 | 120000
Heart rate | 0.5-4 mV | 2 | 600
EEG | 2-200 μV | 20 | 4200
EMG | 0.1-5 mV | 2+ | 600000
Respiratory rate | Small | 1 | 800
Body temperature | 0-100 mV | 1+ | 80

Biosignals require a digitization step in order to be converted into digital form. This process begins with acquiring the raw signal in its analog form, which is then fed into an analog-to-digital (A/D) converter. Since computers cannot handle or store continuous data, the first step of the conversion procedure is to produce a discrete-time series from the analog form of the raw signal. This step is known as ‘sampling’ and is meant to create a sequence of values sampled from the original analog signal at predefined intervals, from which the initial signal waveform can be faithfully reconstructed. The second step of the digitization process is quantization, which works on the temporally sampled values of the initial signal and produces a signal that is both temporally and quantitatively discrete; this means that the initial values are converted and encoded according to properties such as bit allocation and value range. Essentially, quantization maps the sampled signal into a range of values that is both compact and efficient for algorithms to work with.

The information described above is usually collected by equipment installed on the patient or in his/her surrounding environment and is transmitted to monitoring units. Appropriate processing and classification follow in order to determine the patient’s status from the data.


4.2 Patient Data Collection and Transmission

Data acquisition is usually performed either through sensor devices placed on the user’s body or through monitoring devices in the user’s environment. The former collect biosignals, sounds and/or movement-related data, whereas the latter capture and process audiovisual content and generate estimates for events such as the patient falling, abnormal movement and distress situations like fire [17], [18], [19]. Previous works [25], [26] present overviews of such systems and a prototype platform for detecting fall incidents and distress situations based on user motion and sound data. The sensor devices illustrated in Figure 1 have been used for data collection and transmission to the monitoring unit.

Regarding communication, there are two main enabling technologies according to their topology: on-body (wearable) and off-body networks. Recent technological advances have made possible a new generation of small, powerful, mobile computing devices. An off-body network connects to other systems that the user does not wear or carry and is based on a Wireless Local Area Network (WLAN) infrastructure, while an on-body network or Wireless Personal Area Network (WPAN) connects the devices themselves (the computers, peripherals, sensors and other subsystems) and runs in ad hoc mode.

Table 2 Wireless connection technologies for telemedicine systems.

Technology | Data rate | Range | Frequency
IEEE 802.11a | 54 Mbps | 150 m | 5 GHz
IEEE 802.11b | 11 Mbps | 150 m | 2.4 GHz ISM
Bluetooth (IEEE 802.15.1) | 721 kbps | 10 m - 150 m | 2.4 GHz ISM
HiperLAN2 | 54 Mbps | 150 m | 5 GHz
HomeRF (Shared Wireless Access Protocol, SWAP) | 1.6 Mbps (10 Mbps for Ver. 2) | 50 m | 2.4 GHz ISM
DECT | 32 kbps | 100 m | 1880-1900 MHz
PWT | 32 kbps | 100 m | 1920-1930 MHz
IEEE 802.15.3 (high data rate wireless personal area network) | 11-55 Mbps | 1 m - 50 m | 2.4 GHz ISM
IEEE 802.16 (Local and Metropolitan Area Networks) | 120 Mbps | City limits | 2-66 GHz
IEEE 802.15.4 (low data rate wireless personal area network), Zigbee | 250 kbps, 20 kbps, 40 kbps | 100 m - 300 m | 2.4 GHz ISM, 868 MHz, 915 MHz ISM
IrDA | 4 Mbps (IrDA 1.1) | 2 m | IR (0.90 μm)
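To make the data rates in Table 2 concrete, the sketch below estimates how long an uncompressed medical image takes to transfer over a few of the listed technologies. It uses the nominal rates from the table; the `efficiency` factor is an assumption standing in for protocol overhead and contention, since real throughput is typically well below the nominal rate. The 512×512, 16-bit image is a hypothetical example.

```python
# Nominal data rates from Table 2, in bits per second.
NOMINAL_RATE_BPS = {
    "IEEE 802.11a": 54e6,
    "IEEE 802.11b": 11e6,
    "Bluetooth (IEEE 802.15.1)": 721e3,
    "IEEE 802.15.4 (2.4 GHz)": 250e3,
    "DECT": 32e3,
}

def image_bits(width, height, bits_per_pixel):
    """Size of an uncompressed image in bits."""
    return width * height * bits_per_pixel

def transfer_time_s(size_bits, technology, efficiency=1.0):
    """Transfer time at a fraction `efficiency` of the nominal rate
    (efficiency=1.0 gives a lower bound; the factor is an assumption)."""
    return size_bits / (NOMINAL_RATE_BPS[technology] * efficiency)

bits = image_bits(512, 512, 16)   # hypothetical 512x512, 16-bit image
```

At nominal rates this image needs roughly 0.08 s over IEEE 802.11a but about 5.8 s over Bluetooth and over two minutes over DECT, which illustrates why content adaptation must take the active network interface into account.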


Telemedicine systems impose demanding requirements regarding energy, size, cost, mobility, connectivity and coverage. Varying size and cost constraints directly result in correspondingly varying limits on the available energy, as well as on computing, storage and communication resources. Low power consumption is also necessary for safety reasons, since such systems run near or inside the body.

Mobility is another major issue for pervasive e-health applications because of the nature of the users and applications and the ease of connecting to other available wireless networks. Neither off-body nor personal area networks should require line of sight (LoS). The various communication modalities (see Table 2) can be used in different ways to construct an actual communication network. Two common forms are infrastructure-based networks and ad hoc networks. Mobile ad hoc networks are complex systems consisting of wireless mobile nodes that can freely and dynamically self-organize into arbitrary and temporary “ad hoc” network topologies, allowing devices to seamlessly internetwork in areas with no pre-existing communication infrastructure or centralized administration. The effective range of the sensors attached to a sensor node defines the coverage area of that node. With sparse coverage, only parts of the area of interest are covered by sensor nodes; with dense coverage, the area of interest is completely (or almost completely) covered. The degree of coverage also influences information processing algorithms. High coverage is key to robust systems and may be exploited to extend the network lifetime by switching redundant nodes to power-saving sleep mode.
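The idea of switching redundant nodes to sleep under dense coverage can be sketched with a simple greedy rule, assuming circular sensing areas of equal radius and a discrete set of target points. This is an illustrative heuristic chosen for this sketch, not a scheme proposed in the chapter: a node goes to sleep if every target point it covers is still covered by at least one other active node.

```python
import math

def covers(node_xy, point_xy, radius):
    """True if a point lies within a node's circular sensing range."""
    return math.dist(node_xy, point_xy) <= radius

def schedule_sleep(nodes, points, radius):
    """Greedy sketch: nodes maps name -> (x, y). Visit nodes in name order
    and put a node to sleep if every target point it covers remains
    covered by some other still-active node. Returns the active set."""
    active = set(nodes)
    for name in sorted(nodes):
        others = active - {name}
        redundant = all(
            any(covers(nodes[o], p, radius) for o in others)
            for p in points
            if covers(nodes[name], p, radius)
        )
        if redundant:
            active.remove(name)
    return active
```

With sparse coverage no node is redundant and all stay active; with dense coverage duplicates are put to sleep while every originally covered point stays covered. A real scheme would also rotate sleeping nodes according to residual energy.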

Fig. 1 Wearable medical sensor devices: (a) A 3-axis accelerometer on a wrist device enabling the acquisition of patient movement data [37], (b) A ring sensor for monitoring of blood oxygen saturation [38], (c) Wearable heart rate monitoring system by Numetrex [39].

4.3 Medical Devices Access, Communication and Interoperability Issues

The discovery and description of medical devices must be semantic, so that a device can discover the appropriate medical devices with which it wants to communicate. Thus, we suggest that the profiles of medical devices be described using existing ontologies, e.g. FIPA [17] or CC/PP [18], or by further specializing these ontologies for medical devices. The FIPA ontology specifies a frame-based structure to describe devices and is intended to facilitate agent communication for purposes such as content adaptation. CC/PP, on the other hand, is an RDF-based framework for describing the software and hardware profiles of devices, specifically to support a server’s decision on how to customize web content and transfer it to a client device in a suitable format.
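The benefit of semantic device profiles can be sketched by treating each profile as RDF-style (subject, predicate, object) triples and matching a data sink against possible sources. The predicate names (`deviceType`, `outputs`, `accepts`, `transport`) and device identifiers are hypothetical, loosely inspired by CC/PP; they are not the actual CC/PP or FIPA vocabularies, and a real system would use an RDF store and ontology reasoning instead of plain tuples.

```python
# Device profiles as RDF-style triples. Predicate names and device IDs are
# hypothetical, loosely inspired by CC/PP, not the actual vocabulary.
PROFILES = [
    ("pulse-oximeter-7", "deviceType", "PulseOximeter"),
    ("pulse-oximeter-7", "outputs", "SpO2"),
    ("pulse-oximeter-7", "transport", "Bluetooth"),
    ("ecg-holter-2", "deviceType", "ECGMonitor"),
    ("ecg-holter-2", "outputs", "ECG"),
    ("ecg-holter-2", "transport", "ZigBee"),
    ("ward-pda-1", "deviceType", "PDA"),
    ("ward-pda-1", "accepts", "SpO2"),
    ("ward-pda-1", "transport", "Bluetooth"),
]

def values(subject, predicate, triples=PROFILES):
    """All objects for a given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def compatible_sources(sink, triples=PROFILES):
    """Devices that output a data type the sink accepts over a shared transport."""
    wanted = values(sink, "accepts", triples)
    transports = values(sink, "transport", triples)
    devices = {s for s, _, _ in triples if s != sink}
    return sorted(
        d for d in devices
        if values(d, "outputs", triples) & wanted
        and values(d, "transport", triples) & transports
    )
```

Here the PDA finds the pulse oximeter (same data type and transport) but not the ECG Holter device.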

In addition, medical devices can interoperate with existing legacy systems that are based on different health standards, e.g. HL7, OpenEHR etc. As shown in Figure 2, a component named “Device Management Module” enriches the legacy systems with the capability to discover and communicate with external medical systems, using semantic annotations for devices and the retrieved context. We suggest that this module be developed, for each system compliant with a particular health standard (e.g. HL7 v2.3), in a language that can be executed on a number of platforms without recompilation. The obvious choice for this purpose is Java, because runtime environments that execute Java bytecode exist for many software/hardware platforms. Once this module has been developed for a particular health standard with the aforementioned capabilities, devices can easily discover the HIS/LIS, query the functionalities it provides and communicate with it seamlessly by understanding the semantics of the functionalities it offers.

4.4 Semantic Medical Devices and Services

We propose the use of Semantic Web Services (SWS) [14] to expose the functionalities of medical devices as well as those of HISs/LISs, and to resolve interoperability issues at each end. By exposing the various functionalities as Web Services and advertising them via SWS, medical devices can discover the services available in a hospital, laboratory or clinic wherever they are physically present. Finally, the semantic descriptions of the Web Services provided by medical devices will enable the devices to automatically select, compose and execute the desired composite task.

Being a constituent part of Ambient Intelligence, a medical device must have context-awareness capability so that it can adapt itself to rapidly changing situations. The various types of contextual information used in the environment must be well defined so that different medical devices share a common understanding of the context. There must also be mechanisms that allow medical device users to specify how different applications and services should behave in different contexts.

A proposed architecture for medical device interoperability through semantics is illustrated in Figure 2. The Context Awareness Management (CAM) component manages the context-awareness behavior of a medical device. It includes the Context Manager (CM), which retrieves contextual information from the subcomponents, i.e. Device Context, User Context, Security Context and Physical Context. The device context provides information about the device (e.g. status, battery power etc.); the user context provides information about the user of the device (e.g. patient/health professional, personal preferences); the physical context provides information about the present environment (e.g. hospital, clinic, laboratory, home etc.); and the security context provides information about the required and provided security level for a particular environment (e.g. a health professional must provide his user identity, such as a smart card or eToken, to send patient information from the device or receive it on the device). These subcomponents provide basic contextual information in the form of context markups (e.g. an RDF graph), which allow the CM not only to retrieve contexts from the Context Knowledge Base (CKB) through the Knowledge Query Engine (KQE), but also to infer higher-level contexts with the help of the Knowledge Reasoner (KR).

Fig. 2 Proposed Architecture for Medical Device Interoperability utilizing Context awareness and Semantic modules

The CKB provides persistent knowledge storage in the form of an extended context ontology for a particular environment (e.g. hospital, laboratory etc.) and the context markups that are given by users or gathered from the basic context provider components (device context, physical context etc.). The CKB links the context ontology and the markups in a single semantic model and provides interfaces for the KQE and the KR to manipulate correlated contexts. The KQE provides an abstract interface to the CM for extracting desired contexts from the CKB. To support expressive queries, any RDF query language (e.g. SPARQL) can be used as the context query language.
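The interplay of the CKB, KQE and KR can be sketched with a toy triple store. This is a stand-in, not the proposed implementation: the store plays the role of the CKB, `query` (with `None` as a wildcard) plays the role of the KQE, and `infer` plays the role of the KR by applying forward-chaining rules until no new facts appear. The predicates and the `access_rule` security-context rule are hypothetical; a real deployment would use an RDF store, SPARQL and an ontology reasoner.

```python
class ContextKB:
    """Toy context knowledge base (hypothetical stand-in for CKB/KQE/KR)."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def query(self, s=None, p=None, o=None):
        """Pattern query; None acts as a wildcard (KQE role)."""
        return sorted(t for t in self.triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))

    def infer(self, rules):
        """Forward chaining until a fixpoint is reached (KR role)."""
        changed = True
        while changed:
            changed = False
            for rule in rules:
                for t in list(rule(self)):
                    if t not in self.triples:
                        self.triples.add(t)
                        changed = True

def access_rule(kb):
    """Hypothetical security-context rule: an authenticated health
    professional located in the hospital may access patient records."""
    out = []
    for user, _, _ in kb.query(p="role", o="HealthProfessional"):
        if (kb.query(user, "locatedIn", "Hospital")
                and kb.query(user, "authenticatedWith", "SmartCard")):
            out.append((user, "mayAccess", "PatientRecord"))
    return out
```

The higher-level context (`mayAccess`) is never asserted directly; it is derived from basic device, physical and security context markups, as the CM does with the help of the KR.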

4.5 Patient Location Technologies

Positioning of individuals enables healthcare applications to offer services such as supervision of elderly patients or of patients with mental illnesses who are ambulatory but restricted to a certain area. In addition, assisted care facilities can use network sensors and radio-frequency ID badges to alert staff members when patients leave a designated safety zone. Network or satellite positioning technology can also be used to locate wireless subscribers quickly and accurately in an emergency and to communicate information about their location. Proximity information services can direct mobile users to a nearby healthcare facility. Location-based health information services can help find people with matching blood types, organ donors and so on. A more extensive list of location-based health services can be found in [21].

Positioning techniques can be implemented in two ways: self-positioning and remote positioning. In the first approach, the equipment the user carries (e.g. a mobile terminal or a tagging device) uses signals transmitted by gateways/antennas (which can be either terrestrial or satellite) to calculate its own position. More specifically, the positioning receiver makes the appropriate signal measurements from geographically distributed transmitters and uses these measurements to compute its position. Technologies that can be used are satellite-based (e.g. the Global Positioning System (GPS) and assisted GPS) or terrestrial infrastructure-based (e.g. using the cell ID of a subscribed mobile terminal).

The second technique is called remote positioning. In this case the individual is located by measuring the signals traveling to and from a set of receivers. More specifically, the receivers, which can be installed at one or more locations, measure a signal originating from, or reflecting off, the object to be positioned. These signal measurements are used to determine the length and/or direction of the individual radio paths, and the position of the mobile terminal is then computed from geometric relationships. For example, a single Angle of Arrival (AOA) measurement produces a straight-line locus from the remote receiver to the mobile phone; a second AOA measurement yields another straight line, and the intersection of the two lines gives the position fix. Time delay can also be used: since electromagnetic waves travel at a constant speed (the speed of light) in free space, the distance between two points can easily be estimated by measuring the time delay of a radio wave transmitted between them. This principle is well suited to satellite systems, which use it universally. Popular positioning systems used for tracking include the Ekahau Positioning Engine [22], MS RADAR [23] and Nibble [24]. More information on positioning techniques and systems can be found in [20].
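The two geometric principles described above can be written out directly. The sketch assumes ideal, noise-free measurements in a 2D plane: `aoa_fix` intersects two bearing lines from known receiver positions, and `tof_distance_m` converts a one-way propagation delay into a distance using the speed of light. Real systems combine many noisy measurements, for example with least squares.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0

def tof_distance_m(delay_s):
    """Distance from a one-way time-of-flight measurement."""
    return SPEED_OF_LIGHT_M_S * delay_s

def aoa_fix(r1, theta1, r2, theta2):
    """Intersect two bearing lines.
    r1, r2: receiver positions (x, y) in metres;
    theta1, theta2: angles of arrival in radians from the x-axis.
    Solves r1 + t*d1 = r2 + s*d2 with Cramer's rule.
    Returns (x, y), or None if the bearings are parallel."""
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    det = d1[0] * (-d2[1]) - d1[1] * (-d2[0])
    if abs(det) < 1e-12:
        return None
    bx, by = r2[0] - r1[0], r2[1] - r1[1]
    t = (bx * (-d2[1]) - by * (-d2[0])) / det
    return (r1[0] + t * d1[0], r1[1] + t * d1[1])
```

For example, receivers at (0, 0) and (10, 0) that measure bearings of 45° and 135° place the terminal at (5, 5), and a delay of 1 μs corresponds to about 300 m.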

4.6 Data Processing and Classification

The collected data contain information about the user’s physiological status (in the case of biosignals), potential distress situations (e.g. falls, in the case of movement data) and general information that can be correlated with the patient’s state. The data need further processing after collection before this information can be extracted. Filtering may be required to remove irrelevant data such as noise (e.g. in movement or sound data). In some cases the patient’s state can be determined by applying simple value thresholds (e.g. for body temperature or heart rate), but detecting and interpreting motion may require advanced data classification techniques. An overview of classification algorithms that can be applied to movement and sound data collected by on-body sensors for patient fall detection is presented in [27].
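The distinction between simple thresholds and motion interpretation can be illustrated with a minimal sketch. `vital_status` applies fixed limits (illustrative values, not clinical guidance), and `detect_fall` is a deliberately simple heuristic: an acceleration-magnitude spike above an impact threshold followed by a period of near-stillness. It is not one of the classifiers surveyed in [27], and all thresholds are assumptions.

```python
import math
from statistics import pstdev

def vital_status(heart_rate_bpm, body_temp_c):
    """Simple threshold rules (illustrative limits, not clinical guidance)."""
    alerts = []
    if not 50 <= heart_rate_bpm <= 120:
        alerts.append("heart_rate")
    if body_temp_c >= 38.0:
        alerts.append("fever")
    return alerts

def detect_fall(samples_g, fs_hz, impact_g=2.5, still_s=2.0, still_std_g=0.05):
    """Heuristic fall detector on 3-axis accelerometer samples (in g):
    an impact (magnitude above impact_g) followed by still_s seconds whose
    magnitude varies less than still_std_g. Thresholds are assumptions."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples_g]
    window = int(still_s * fs_hz)
    for i, m in enumerate(mags):
        if m > impact_g:
            after = mags[i + 1 : i + 1 + window]
            if len(after) == window and pstdev(after) < still_std_g:
                return True
    return False
```

A spike followed by continued movement (e.g. a jump while walking) is not reported, whereas a spike followed by lying still is. Practical systems replace these fixed rules with trained classifiers to reduce false alarms.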

4.7 User Environment Context Awareness

Apart from determining the patient status, context aware medical treatment and monitoring systems must incorporate information related to user’s environment. More specifically:

The user’s indoor or outdoor location can be determined by external devices (e.g. GPS receivers, mobile phones or WLAN devices) and can facilitate ambulance dispatching in case of emergency events. Based on location, proactive or reactive data transmission may also be performed. Information about the communication equipment used (e.g. laptop computer, mobile phone or PDA) can facilitate content adaptation in the case of video communication.

The transmission capabilities of the underlying networking infrastructure (e.g. network interface type, allocated bandwidth, real-time network traffic information) affect communication and thus help determine the appropriate content adaptation, such as which compression scheme to apply.

More information on context-aware medical networks and telemedicine services can be found in [28].

Fig. 3 Illustration of the semantic representation of the context-aware data adaptation system using an ontological structure. Major component and actuator classes are illustrated along with the most important features of each class.

5 Context Semantic Representation

In order to semantically represent the context-aware system and the content adaptation, the ontology illustrated in Figure 3 has been developed. Both the patient-related context and the content have been modeled. Regarding the medical content, a representative class with three subclasses has been created, representing medical image and video data, audio data and biosignals respectively. The most important features for content adaptation are the transmission data rate, the type of encryption used (e.g. PKI [29], simple symmetric, or none), the compression ratio (in the case of scalable compression), the codec used (e.g. H.264 for video, JPEG2000 for images and ITU G.723 for audio) and the resolution (specifically for images and video, according to the network status and the presentation device). The patient status is characterized by physiological state, distress state (more generic than the physiological state, containing status indications based on voice and sound analysis) and movement state (e.g. detection of falls or long periods of inactivity). The basic attributes of these states are the severity of the status (e.g. a numerical representation of the emergency severity level), a description of the incident and an indication of a fall or long inactivity. A patient environment-related class has also been developed to represent the status of the underlying network infrastructure, the user location and the device types used for data collection and transmission. Concerning the network status, wired or wireless interfaces can be used. For both interfaces, the type of medium, the total available bandwidth and the current throughput affect data transmission and thus content adaptation, whereas for wireless interfaces the received signal strength may also be an important factor. User location has been categorized into indoor and outdoor, with a simple description as the respective attribute. Finally, the class “Device Type” refers to the transmission device the patient/user operates to communicate with the treatment/monitoring units. For static devices (e.g. PCs), the operating system and the screen resolution may determine content parameters such as video resolution and frame rate, whereas for mobile devices (e.g. mobile phones, PDAs) memory and power resources can also affect the transmission and presentation of the medical content.

The ontological model has been developed within the Protégé [34] semantic framework using the Web Ontology Language (OWL). The main advantages of the semantic representation of the context-aware adaptive system can be summarized as follows:

− Flexibility to modify and extend the contextual scheme by adding more classes. If the parameters that define the patient’s context (e.g. status, environment, location) need to be modified, the ontological model can be altered without requiring modifications to the implementation modules or the platform architecture.

− Better and more flexible evaluation of the context, facilitating decisions on medical content adaptation. Using advanced semantic rule evaluation techniques (discussed in Section 7), content adaptation decisions can be made according to a wide range of contextual parameters. The rules can be updated and extended without any modification of the platform software.

Additionally, ontologies are explicit, because they define the concepts, properties, relationships, functions, axioms and constraints that compose the contextual model, and formal, because they are machine-readable and machine-interpretable.
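The advantage that rules can change without changing platform software can be illustrated by expressing adaptation rules as data and evaluating them against a flattened view of the context. This is a plain-Python stand-in for the semantic rules over the OWL model, not the chapter’s rule engine; the attribute names (`severity`, `bandwidth_kbps`), thresholds and output settings are hypothetical. Rules are checked in order and the first match wins.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}

# Hypothetical adaptation rules expressed as data: editing this list changes
# behavior without touching decide(). First matching rule wins.
RULES = [
    {"if": [("severity", ">=", 3)],
     "then": {"transmit_video": True, "codec": "H.264", "resolution": "640x480"}},
    {"if": [("severity", ">=", 1), ("bandwidth_kbps", "<", 256)],
     "then": {"transmit_video": False, "codec": None, "resolution": None}},
    {"if": [("severity", ">=", 1)],
     "then": {"transmit_video": True, "codec": "H.264", "resolution": "320x240"}},
    {"if": [],  # default: no emergency, no video
     "then": {"transmit_video": False, "codec": None, "resolution": None}},
]

def decide(context, rules=RULES):
    """Return the settings of the first rule whose conditions all hold."""
    for rule in rules:
        if all(OPS[op](context[attr], value) for attr, op, value in rule["if"]):
            return rule["then"]
    raise ValueError("no rule matched")
```

In this sketch a high-severity incident always triggers video, a moderate one triggers low-resolution video only if bandwidth allows, and routine monitoring sends no video at all.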


6 Content Adaptation

Content adaptation refers to appropriate coding of medical data and proactive or reactive transmission, in order to make better use of network and system resources during monitoring and treatment. The most demanding data, in terms of network and system resources for transmission and processing, are medical images and audiovisual data. Content adaptation can also include applying different data encryption schemes according to the sensitivity of the data and the severity of an emergency incident.

6.1 Image and Video/Audio Coding

The coding of medical image and audiovisual data refers to data compression. Depending on the patient status and the underlying network interfaces and conditions, several compression schemes can be applied; for instance, uncompressed data can be transmitted over a fast wired network connection, whereas higher compression can be applied on wireless connections with lower available data rates. For visual assessment it may be important to keep particular parts of the image or video at higher quality and to increase the compression in diagnostically less important regions. Examples of region of interest (ROI) coding with scalable compression for both medical image and video data can be found in [30], [31].
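The relationship between link capacity, compression and ROI coding can be expressed as simple arithmetic. The sketch below computes the raw rate of an uncompressed video stream, the minimum overall compression ratio needed to fit it into a fraction (`headroom`, an assumed safety margin) of the available bandwidth, and, if the ROI is compressed more lightly, the stronger ratio the background must then use to stay within the same budget. The frame size, frame rate and ROI share are hypothetical, and the sketch says nothing about which codec achieves these ratios.

```python
def raw_video_bps(width, height, bits_per_pixel, fps):
    """Uncompressed video bit rate."""
    return width * height * bits_per_pixel * fps

def min_compression_ratio(raw_bps, available_bps, headroom=0.8):
    """Smallest ratio that fits the stream into headroom * available
    bandwidth; 1.0 means the data can be sent uncompressed."""
    return max(1.0, raw_bps / (available_bps * headroom))

def background_ratio(overall_ratio, roi_fraction, roi_ratio):
    """Ratio the non-ROI area must use when the ROI (roi_fraction of the
    pixels) is compressed at the lighter roi_ratio, so that the total
    size still meets overall_ratio."""
    budget = 1.0 / overall_ratio          # allowed output bits per raw bit
    bg_bits = budget - roi_fraction / roi_ratio
    if bg_bits <= 0:
        raise ValueError("ROI quality too high for this budget")
    return (1.0 - roi_fraction) / bg_bits

raw = raw_video_bps(640, 480, 24, 15)          # hypothetical 640x480, 24-bit, 15 fps
ratio_wlan = min_compression_ratio(raw, 11e6)  # IEEE 802.11b nominal rate
```

For this example stream (about 110.6 Mbps raw), a 1 Gbps wired link needs no compression, while 802.11b needs a ratio of roughly 12.6:1; keeping a 10% ROI at 4:1 forces the background to roughly 16.5:1.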

6.2 Adapted Data Security Policies

The medical context, as presented in the previous sections, contains sensitive information regarding the patient status, the patient's location and the surrounding environment. Context-aware medical networks must therefore address several security issues:

− Maintaining information privacy, i.e., preventing any disclosure of information directly related to the individual to a service or application without the user's prior approval or knowledge.

− Maintaining context privacy, i.e., preventing any disclosure of information related to the context in which the user is using the service (for example, her current device parameters), from which indirect information about the user could be extracted.

− Maintaining location privacy, i.e., denying an attacker knowledge of a device's current and past locations and preventing linkability.

− Preserving the anonymity of the users' identifiable parameters for distinct scenarios, i.e., preserving their “state of being not identifiable within a set of subjects”.


Possible solutions to these issues include:

− Mechanisms for protecting, at any level of granularity, any type of sensitive information that the user considers private. The user decides how to protect her sensitive information, anonymity and location privacy. Data abstractions over all types of low-level sensitive data are part of the mechanism and are processed first, allowing faster filtering and default settings.

− Descriptive profiles of the user, user roles, scenarios and context. These support personalization, hide the complexity of the system from the user and delegate privacy decisions from the user to her device.

− Rule-based access to private data, which lets the device take privacy decisions on the user's behalf and provide the correct privacy parameters to the services.

− Continuous re-evaluation: whenever the context attributes change, the privacy protection mechanism evaluates the overall privacy status and acts according to predefined rules.
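The last two mechanisms, rule-based access and re-evaluation whenever the context changes, can be illustrated with a minimal sketch. The rule format (a condition on the context paired with an attribute to withhold), the class name and the example attributes are all hypothetical. The code only demonstrates the pattern: rules are evaluated on every context update, and disclosure is filtered by the result.

```python
from typing import Callable, Dict, List, Set, Tuple

Context = Dict[str, str]
# Hypothetical rule format: (condition on the current context, attribute to withhold).
PrivacyRule = Tuple[Callable[[Context], bool], str]

class PrivacyGuard:
    def __init__(self, rules: List[PrivacyRule]):
        self.rules = rules
        self.context: Context = {}
        self.hidden: Set[str] = self._evaluate()  # default privacy status

    def _evaluate(self) -> Set[str]:
        """Apply every rule to the current context; collect attributes to withhold."""
        return {attr for condition, attr in self.rules if condition(self.context)}

    def update_context(self, key: str, value: str) -> None:
        """Record a changed context attribute and re-evaluate the privacy status."""
        self.context[key] = value
        self.hidden = self._evaluate()

    def release(self, data: Dict[str, str]) -> Dict[str, str]:
        """Return only the attributes the current privacy status allows to disclose."""
        return {k: v for k, v in data.items() if k not in self.hidden}

rules: List[PrivacyRule] = [
    (lambda c: c.get("status") != "emergency", "location"),  # reveal location only in emergencies
    (lambda c: c.get("network") == "public", "device"),      # hide device details on public networks
]
guard = PrivacyGuard(rules)
```

With these example rules, the patient's location stays hidden until the status changes to "emergency". From that point it is disclosed automatically, without any user interaction.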

In the presented platform, several data encryption schemes can be applied to provide privacy, confidentiality and non-repudiation for medical content. Depending on the sensitivity of the data and the severity of the case, schemes ranging from simple symmetric encryption [32] to more complex public key infrastructures [33] can be used. The platform uses specific context parameters to decide which encryption methodology to apply prior to transmission. For instance, in an emergency incident in an area where only low-bandwidth networks are available, the platform skips the encryption process.
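A context-driven choice of encryption scheme could look like the following sketch. The three-level scheme enumeration, the "high" sensitivity label and the 64 kbps low-bandwidth threshold are assumptions made for illustration. Only the rule "emergency over a low-bandwidth network means no encryption" comes from the text above.

```python
from enum import Enum

class Scheme(Enum):
    NONE = "none"            # send in the clear (emergency over a low-bandwidth link)
    SYMMETRIC = "symmetric"  # shared-key encryption, cheap to compute
    PKI = "pki"              # public-key infrastructure, stronger but heavier

def select_scheme(sensitivity: str, emergency: bool, bandwidth_kbps: float,
                  low_bandwidth_kbps: float = 64.0) -> Scheme:
    """Pick an encryption scheme from data sensitivity and the current context.

    The sensitivity labels and the bandwidth threshold are illustrative assumptions.
    """
    if emergency and bandwidth_kbps < low_bandwidth_kbps:
        return Scheme.NONE   # the case described in the text: skip encryption
    if sensitivity == "high":
        return Scheme.PKI
    return Scheme.SYMMETRIC
```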

6.3 Reactive Data Transmission

Unnecessary transmission of medical or monitoring data (e.g., video from the user's environment) can be avoided by using reactive data transmission. When the patient state is normal, data related to the patient context and status (e.g., visual data and biosignals) can be transmitted proactively to the monitoring units at specified time intervals. When a distress situation is detected, reactive transmission can begin. More information on context-aware data transmission can be found in [28].
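The difference between the proactive and reactive modes can be expressed as a small decision function. The 300-second interval and the three action labels are assumptions chosen for illustration. The function is not the platform's scheduler.

```python
def transmission_action(t_seconds: int, distress: bool, interval_s: int = 300) -> str:
    """Decide what to send at second t of monitoring.

    Normal state: proactive transmission of a status snapshot every interval_s seconds.
    Distress detected: reactive, continuous transmission starts immediately.
    """
    if distress:
        return "stream"    # reactive: continuous biosignals and video
    if t_seconds % interval_s == 0:
        return "snapshot"  # proactive: periodic context/status update
    return "idle"          # nothing sent, saving network resources

timeline = [(0, False), (150, False), (300, False), (301, True)]
actions = [transmission_action(t, d) for t, d in timeline]
```

Between scheduled snapshots nothing is sent, which is where the resource savings come from. A detected distress event overrides the schedule.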

7 Content Adaptation Based on Semantic Rules Evaluation

To perform the medical content adaptation discussed in the previous sections, several semantic rules have been defined. These rules concern features of the ontological class that semantically represents the context-aware model. Evaluating these rules makes it possible to reach decisions regarding content adaptation.


Creating the semantic rules required describing them in an abstract rule language such as the Semantic Web Rule Language (SWRL) [35]. The SWRL abstract syntax abstracts from any exchange syntax for OWL [48] and thus facilitates access to and evaluation of the language. An OWL ontology in the abstract syntax contains a sequence of axioms and facts. Axioms may be of various kinds, e.g., subClass axioms and equivalentClass axioms. SWRL extends this set with rule axioms:

axiom ::= rule

A rule axiom consists of an antecedent (body) and a consequent (head), each of which consists of a (possibly empty) set of atoms. A rule axiom can also be assigned a URI reference, which could serve to identify the rule.

rule ::= 'Implies(' [ URIreference ] { annotation } antecedent consequent ')'
antecedent ::= 'Antecedent(' { atom } ')'
consequent ::= 'Consequent(' { atom } ')'

Informally, a rule may be read as meaning that if the antecedent holds (is "true"), then the consequent must also hold. An empty antecedent is treated as trivially holding (true), and an empty consequent is treated as trivially not holding (false). Rules with an empty antecedent can thus be used to provide unconditional facts; however, such unconditional facts are better stated in OWL itself, i.e., without the use of the rule construct. Non-empty antecedents and consequents hold if all of their constituent atoms hold, i.e., they are treated as conjunctions of their atoms. Rules with conjunctive consequents could easily be transformed into multiple rules, each with an atomic consequent.

atom ::= description '(' i-object ')'
       | dataRange '(' d-object ')'
       | individualvaluedPropertyID '(' i-object i-object ')'
       | datavaluedPropertyID '(' i-object d-object ')'
       | sameAs '(' i-object i-object ')'
       | differentFrom '(' i-object i-object ')'
       | builtIn '(' builtinID { d-object } ')'
builtinID ::= URIreference

Atoms can be of the form C(x), P(x,y), sameAs(x,y), differentFrom(x,y), or builtIn(r,x,...). Here C is an OWL description or data range, P is an OWL property, r is a built-in relation, and x and y are either variables, OWL individuals or OWL data values, as appropriate. Atoms may refer to individuals, data literals, individual variables or data variables. Variables are treated as universally quantified, with their scope limited to a given rule. As usual, only variables that occur in the antecedent of a rule may occur in the consequent.

i-variable ::= 'I-variable(' URIreference ')'
d-variable ::= 'D-variable(' URIreference ')'

While this abstract syntax is consistent with the OWL specification, and is useful for defining XML and RDF serialisations, it is rather verbose and not particularly easy to read. A relatively informal "human readable" form is therefore often used, similar to that used in many published works on rules.
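The rule semantics described in this section can be checked with a small Python sketch over ground atoms:

- antecedents and consequents are conjunctions of atoms;
- an empty antecedent is treated as true;
- an empty consequent is treated as false;
- a conjunctive consequent can be split into several rules.

Variables and built-ins are omitted for brevity, and the class and function names are hypothetical. This is an illustration of the stated semantics, not an SWRL implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple

Atom = Tuple[str, ...]  # ground atom, e.g. ("Patient", "p1"); no variables in this sketch

@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[Atom]
    consequent: FrozenSet[Atom]

def holds(atoms: Iterable[Atom], facts: Set[Atom]) -> bool:
    """A set of atoms is a conjunction; an empty set holds trivially."""
    return all(atom in facts for atom in atoms)

def satisfied(rule: Rule, facts: Set[Atom]) -> bool:
    """Check a ground rule against a set of facts.

    If the antecedent does not hold, the rule is satisfied vacuously.
    An empty consequent counts as false, so such a rule is violated
    whenever its antecedent holds.
    """
    if not holds(rule.antecedent, facts):
        return True
    return bool(rule.consequent) and holds(rule.consequent, facts)

def split(rule: Rule) -> List[Rule]:
    """Turn a rule with a conjunctive consequent into rules with atomic consequents."""
    return [Rule(rule.antecedent, frozenset([atom])) for atom in sorted(rule.consequent)]
```

A rule with an empty antecedent and a single consequent atom behaves like an unconditional fact. As the text notes, such facts are better stated directly in OWL.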

In this syntax, variables are indicated using the standard convention of prefixing them with a question mark (e.g., ?x). Using this syntax, a rule asserting that the composition of the parent and brother properties implies the uncle property would be written:

parent(?x,?y) ^ brother(?y,?z) -> uncle(?x,?z)

Within this context, the SWRL Factory [34] mechanism and an integrated Jess rule engine [36] within the Protégé tool have been used. Jess provides both an interactive command-line interface and a Java-based API to its rule engine. The engine can be embedded in Java applications and provides flexible two-way run-time communication between Jess rules and Java. The Jess system consists of a rule base, a fact base and an execution engine.
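The roles of the rule base, fact base and execution engine can be made concrete with a deliberately naive forward-chaining engine, applied here to the uncle rule above. This is a Python illustration, not Jess or its API: Jess uses the Rete algorithm for efficient matching, whereas this sketch simply re-matches all rules until no new facts appear. The tuple encoding of atoms and the "?"-prefix test for variables are assumptions of the sketch.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, ...]                 # ("parent", "?x", "?y"); "?"-prefixed terms are variables
Rule = Tuple[List[Atom], List[Atom]]   # (antecedent, consequent)

def unify(pattern: Atom, fact: Atom, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend binding so that pattern matches fact, or return None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    b = dict(binding)
    for p, f in zip(pattern[1:], fact[1:]):
        if p.startswith("?"):
            if b.setdefault(p, f) != f:   # variable already bound to another value
                return None
        elif p != f:
            return None
    return b

def matches(antecedent: List[Atom], facts: Set[Atom],
            binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every variable binding under which all antecedent atoms are facts."""
    binding = binding or {}
    if not antecedent:
        yield binding
        return
    first, rest = antecedent[0], antecedent[1:]
    for fact in facts:
        b = unify(first, fact, binding)
        if b is not None:
            yield from matches(rest, facts, b)

def run(rules: List[Rule], facts: Set[Atom]) -> Set[Atom]:
    """Execution engine: fire all rules repeatedly until no new facts are derived."""
    facts = set(facts)
    while True:
        new: Set[Atom] = set()
        for antecedent, consequent in rules:
            for b in matches(antecedent, facts):
                for atom in consequent:
                    derived = (atom[0],) + tuple(b.get(t, t) for t in atom[1:])
                    if derived not in facts:
                        new.add(derived)
        if not new:
            return facts
        facts |= new

rule_base: List[Rule] = [([("parent", "?x", "?y"), ("brother", "?y", "?z")],
                          [("uncle", "?x", "?z")])]
fact_base: Set[Atom] = {("parent", "ann", "bob"), ("brother", "bob", "carl")}
result = run(rule_base, fact_base)
```

Here `result` contains the derived fact ("uncle", "ann", "carl") alongside the two original facts.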

Two indicative SWRL rules follow. They can be used within the presented framework to support content adaptation decisions based on the patient's context parameters:

Patient(?x) ^ PhysiologicalState(?y) ^ hasSeverity(?x,?y,?severity) ^ hasDescription(?y,?description) ^ Biosignal(?BS) ^ BiosignalRate(?Rate) ^ swrlb:notEqual(?severity,"Normal") -> DefineTransmissionRate(?Rate,"100kbps") ^ StartTransmission("true")

Patient(?x) ^ MovementState(?move) ^ FallDetected(?x,?move) ^ hasDescription(?move,?description) ^ UserLocation(?Location) ^ VideoRate(?Rate) ^ NetworkStatus(?Wired) ^ swrlb:equal(?move,"Fall") ^ swrlb:equal(?Location,"Indoor") ^ swrlb:equal(?Wired,"true") -> DefineVideoRate(?Rate,"300kbps") ^ StartTransmission("true")

The first rule examines the physiological state of the patient, as characterized by the status awareness modules, in terms of status severity. If the severity is other than "Normal", transmission of the collected biosignals to the monitoring units begins at a specific data rate. The second rule is more advanced: it takes into consideration a potential fall event, the location of the user and the network status. According to this rule, video transmission from the patient's premises begins at a high rate (300 kbps) when a fall has been detected, the user is located indoors and a wired network infrastructure is used.
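For readers less familiar with SWRL, the two rules can be paraphrased as plain conditionals over a flat context record. This is only an illustration, since the platform itself evaluates the rules with Jess. The context keys ("severity", "movement", "location", "wired") and the returned action names are hypothetical, and the rates are taken from the rules above.

```python
from typing import Dict

def adapt_content(ctx: Dict[str, object]) -> Dict[str, object]:
    """Imperative paraphrase of the two sample SWRL rules.

    ctx keys (hypothetical names): "severity", "movement", "location", "wired".
    """
    actions: Dict[str, object] = {"start_transmission": False}
    severity = ctx.get("severity")
    # Rule 1: a known, non-normal physiological state -> biosignals at 100 kbps.
    if severity is not None and severity != "Normal":
        actions["biosignal_rate_kbps"] = 100
        actions["start_transmission"] = True
    # Rule 2: fall detected indoors over a wired network -> video at 300 kbps.
    if (ctx.get("movement") == "Fall" and ctx.get("location") == "Indoor"
            and ctx.get("wired") is True):
        actions["video_rate_kbps"] = 300
        actions["start_transmission"] = True
    return actions
```

Like the SWRL version, rule 1 fires only when a severity value is actually present, mirroring the requirement that the corresponding atom be bound.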

8 Proposed Architecture Scheme

This section presents the proposed architecture scheme, which incorporates modules implementing the discussed aspects of context awareness and medical content adaptation. The interconnection and communication of the different components can be illustrated as five application layers (see Figure 4). Initial data acquisition from the sensor and monitoring devices is followed by processing for feature extraction. Context awareness is achieved by classifying the generated features and semantically evaluating the results. Applying the semantic rules facilitates the determination of patient status and the detection of emergency events.

Appropriate content adaptation is then applied to the medical data, according to the detected patient status and to additional contextual information regarding the patient's environment and the underlying network conditions. The content related to the incident is coded (i.e., compressed and encrypted) accordingly and transmitted to the monitoring units.

Fig. 4 Illustration of the incorporated application layers for context awareness and content adaptation and transmission.

Figure 5 illustrates a proposed architecture scheme that interconnects all the components involved in enabling context awareness and proper content coding.

The contextual data (i.e., the estimated patient status based on rule evaluation, medical data and other context data) can be provided either through appropriate web-based and application interfaces or through dedicated Web Services, as discussed in the following subsection.

8.1 Context Information Provision through Web Services

Web Services are emerging as a promising technology for building distributed applications. They are an implementation of Service Oriented Architecture (SOA) that supports the concept of loosely coupled, open-standard, language- and platform-independent systems. Web Services are accessed through the HTTP/HTTPS protocols and use XML (eXtensible Markup Language) for data exchange. As a result, they are independent of platform, programming language, tool and network infrastructure. Services can be assembled and composed so as to foster the reuse of existing back-end infrastructure. The basic SOA includes three service components: provider, requester and registry. A WSDL (Web Services Description Language) document, typically published by the service provider, describes how to invoke the service. SOAP (Simple Object Access Protocol) is adopted as the message transfer protocol between requester and provider. UDDI (Universal Description, Discovery and Integration) is used for service registration and discovery.


The typical scenario illustrated in Figure 6 consists of publishing a service (its WSDL reference), searching for a service and binding to a service provider. XML messaging between service consumer and provider uses the SOAP protocol. SOAP toolkits provide automatic marshalling/unmarshalling of the arguments, similar to a Remote Procedure Call (RPC).

Fig. 5 The proposed architecture that incorporates modules and components for proper medical content adaptation based on context awareness.

Fig. 6 General Web Services component framework

Web services provide several technological and business benefits, including application and data integration, versatility, code reuse and cost savings. The inherent interoperability of vendor-, platform- and language-independent XML technologies, combined with the ubiquitous HTTP transport, means that, in principle, any application can communicate with any other application using Web services. Web services are also versatile by design: they can be accessed by humans via a Web-based client interface, or by other applications and other Web services. Code reuse is another positive side effect of Web services' interoperability and flexibility. One service may be used by several clients, each of which employs the provided operations to fulfil different business objectives.

A Web Service module has been developed to provide direct and efficient access to the contextual information generated by the platform. This module exposes specific functionality that developers can use to create external client applications. Such applications can monitor the biosignals acquired by the platform, obtain information about the patient's context and perform content adaptation based on rule evaluation. Figure 7 illustrates a sample WSDL definition for the developed Web Service. It describes two functions: one concerns the status of the patient, and the other the parameters that describe the adaptation of the medical content, based on the platform's rule evaluation.

Fig. 7 WSDL sample description for the provided Web Services. “getPatientstatus” and “ContentAdaptationParams” refer to functions that provide information regarding the patient's state and the medical content adaptation parameters, respectively, as indicated by the proposed framework.
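A client invoking such a service exchanges SOAP envelopes with it. The sketch below uses Python's standard xml.etree.ElementTree to build a SOAP 1.1 request for the "getPatientstatus" operation named in Fig. 7. The WSDL itself is not reproduced in the text, so the service namespace, the "patientId" parameter and its value are hypothetical placeholders. The envelope namespace is the standard SOAP 1.1 one.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:context-service"               # hypothetical service namespace

def build_request(operation: str, params: dict) -> str:
    """Serialize a SOAP 1.1 request envelope that invokes `operation` with `params`."""
    ET.register_namespace("soap", SOAP_ENV)
    ET.register_namespace("svc", SERVICE_NS)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")

# "patientId" is an assumed parameter name; the real one is defined by the service's WSDL.
request = build_request("getPatientstatus", {"patientId": "p-001"})
```

In practice the request would be sent with an HTTP POST to the endpoint listed in the WSDL, and the response envelope parsed the same way.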

9 Conclusions

A context-aware medical content adaptation platform has been presented. The platform uses sensor data to determine the patient status and takes into account additional contextual information, such as the underlying network conditions and the data transmission devices. A semantic representation of the patient context has been developed, and a rule-based system is used to adapt the medical content according to the context, facilitating and improving the diagnosis and treatment process. In addition, a Web Service module provides access to information related to the patient's context and to the medical content adaptation.


Future work might include the deployment of the proposed platform in a real remote treatment and monitoring environment for assessing the actual contribution of context awareness and content adaptation to the remote medical care process.

References

1. Lin, J.C.: Applying telecommunication technology to health care delivery. IEEE Engineering in Medicine and Biology Magazine 4, 28–31 (1999)

2. Pavlopoulos, S., Kyriacou, E., Berler, A., Dembeyiotis, S., Koutsouris, D.: A novel emergency telemedicine system based on wireless communication technology-AMBULANCE. IEEE Transactions on Information Technology in Biomedicine 4, 261–267 (1998)

3. Deb, S., Ghoshal, S., Malepati, V.N., Kleinman, D.L.: Tele-diagnosis: remote monitoring of large-scale systems. In: Proc. of IEEE Aerospace Conference, pp. 31–42 (2001)

4. Choi, Y.B., Krause, J.S.H., Seo, C.K., Chung, E.K.: Telemedicine in the USA: standardization through information management and technical applications. IEEE Communications Magazine 44, 41–48 (2006)

5. Pattichis, C.S., Kyriacou, E., Voskarides, S., Pattichis, M.S., Istepanian, R., Schizas, C.N.: Wireless telemedicine systems: an overview. IEEE Antennas and Propagation Magazine 44, 143–153 (2002)

6. Akay, M., Marsic, I., Medl, A., Bu, G.: A system for medical consultation and education using multimodal human/machine communication. IEEE Transactions on Information Technology in Biomedicine 2(4), 282–291 (1998)

7. Zhou, J., Shen, X., Georganas, N.D.: Haptic tele-surgery simulation. In: Proc. of the 3rd IEEE International Workshop on Haptic, Audio and Visual Environments and their Applications, pp. 99–104 (2004)

8. Fontelo, P., DiNino, E., Johansen, K., Khan, A., Ackerman, M.: Virtual Microscopy: Potential Applications in Medical Education and Telemedicine in Countries with Developing Economies. In: Proc. of the 38th Annual Hawaii International Conference on System Sciences, p. 153 (2005)

9. Lage, A.-L., Martins, J., Oliveira, J., Cunha, W.: A quality of service approach for managing tele-medicine multimedia applications requirements. In: Proc. of IEEE Workshop on IP Operations and Management, pp. 186–190 (2004)

10. LeRouge, C., Garfield, M.J., Hevner, A.R.: Quality attributes in telemedicine video conferencing. In: Proc. of 35th Annual Hawaii International Conference on System Sciences, pp. 2050–2059 (2002)

11. Yu, H., Lin, Z., Pan, F.: Applications and improvement of H.264 in medical video compression. IEEE Transactions on Circuits and Systems 52(12), 2707–2716 (2005)

12. Bernabe, G., Gonzalez, J., Garcia, J.M., Duato, J.: A new lossy 3-D wavelet transform for high-quality compression of medical video. In: Proc. of IEEE EMBS International Conference on Information Technology Applications in Biomedicine, pp. 226–231 (2000)

13. Doukas, C.N., Maglogiannis, I., Kormentzas, G.: Medical Image Compression using Wavelet Transform on Mobile Devices with ROI coding support. In: Proc. of the 27th Annual International Conference of the IEEE EMBS, Shanghai, China,

14. Schilit, B., Adams, N., Want, R.: Context-aware computing applications. In: IEEE Workshop on Mobile Computing Systems and Applications (WMCSA 1994), Santa Cruz, CA, US, pp. 89–101 (1994)


15. Schilit, B.N., Theimer, M.M.: Disseminating Active Map Information to Mobile Hosts. IEEE Network 8(5), 22–32 (1994)

16. Dey, A.K.: Understanding and Using Context. Personal Ubiquitous Computing 5(1), 4–7 (2001)

17. Wang, S., Yang, J., Chen, N., Chen, X., Zhang, Q.: Human activity recognition with user-free accelerometers in the sensor networks. In: Proc. of International Conference on Neural Networks and Brain, pp. 1212–1217 (2005)

18. Miaou, S.G., Sung, P.-H., Huang, C.-Y.: A Customized Human Fall Detection System Using Omni-Camera Images and Personal Information. In: Proc. of 1st Transdisciplinary conference on Distributed Diagnosis and Home Healthcare, pp. 39–42 (2006)

19. Istrate, D., Castelli, E., Vacher, M., Besacier, L., Serignat, J.F.: Information extraction from sound for medical telemonitoring. IEEE Transactions on Information Technology in Biomedicine 10(2), 264–274 (2006)

20. Zeimpekis, V., Giaglis, G.M., Lekakos, G.: A taxonomy of indoor and outdoor positioning techniques for mobile location services. ACM SIGecom Exchanges 3(4), 19–27 (2003)

21. Shih-wei, L., Shao-you, C., Yung-jen, H.J., Polly, H., Chuang-wen, Y.: Emergency Care Management with Location-Aware Services. In: Pervasive Health Conference and Workshops, pp. 1–6 (2006)

22. Ekahau LBS, http://www.ekahau.com (accessed on September 26, 2005)

23. Bahl, P., Padmanabhan, V.N.: RADAR: An In-Building RF-based User Location and Tracking System. In: INFOCOM, pp. 775–784. IEEE Press, Los Alamitos (2000)

24. Castro, P., Chiu, P., Kremenek, T., Muntz, R.: A Probabilistic Room Location Service for Wireless Networked Environments. In: Abowd, G.D., Brumitt, B., Shafer, S. (eds.) UbiComp 2001. LNCS, vol. 2201, pp. 18–34. Springer, Heidelberg (2001)

25. Doukas, C., Maglogiannis, I.: Enabling Human Status Awareness in Assistive Environments based on Advanced Sound and Motion Data Classification. Presented at The 1st ACM International Conference on PErvasive Technologies Related to Assistive Environments (PETRAE), Athens, Greece, July 16-19 (2008)

26. Doukas, C., Maglogiannis, I.: Advanced Patient or Elder Fall Detection based on Movement and Sound Data. Presented at 2nd International Conference on Pervasive Computing Technologies for Healthcare (2008)

27. Doukas, C., Maglogiannis, I.: Human Distress Sound Analysis and Characterization using Advanced Classification Techniques. Presented at 5th Hellenic Conference on Artificial Intelligence, Syros, Greece, October 2-4 (2008)

28. Doukas, C., Maglogiannis, I., Kormentzas, G.: Advanced Telemedicine Services through Context-aware Medical Networks. In: Proceedings of the IEEE EMBS co-sponsored International Special Topic Conference on Information Technology in Biomedicine (ITAB 2006), Ioannina-Epirus, Greece, October 26-28 (2006)

29. Public Key Infrastructure, online information, http://www.ietf.org/html.charters/pkix-charter.html

30. Maglogiannis, I., Doukas, C., Kormentzas, G., Pliakas, T.: Optimized Mobile Access to DICOM Images using Wavelet compression with ROI coding support. To appear in IEEE Transactions on Information Technology in Biomedicine

31. Doukas, C., Maglogiannis, I.: Adaptive Transmission of Medical Image and Video using Scalable Coding and Context-aware Wireless Medical Networks. EURASIP Journal on Wireless Communications and Networking 2008, Article ID 428397, 12 pages (2008)

32. Makris, L., Argiriou, N., Strintzis, M.G.: Network and data security design for telemedicine applications. Informatics for Health and Social Care 22(2), 133–142 (1997)

33. Bao, S.-D., Shen, L.-F., Zhang, Y.-T.: A novel key distribution of body area networks for telemedicine. In: Proc. of 2004 IEEE International Workshop on Biomedical Circuits and Systems, pp. 1–17-20a (2004)

34. Protégé Ontology Editor and Knowledge Base Framework, more information, http://protege.stanford.edu/

35. The Semantic Web Rule Language definition, http://www.w3.org/Submission/SWRL/

36. The Jess Rule Engine, http://www.jessrules.com/jess/index.shtml

37. Malan, D., Fulford-Jones, T., Welsh, M., Moulton, S.: CodeBlue: An Ad Hoc Sensor Network Infrastructure for Emergency Medical Care. In: International Workshop on Wearable and Implantable Body Sensor Networks (2004)

38. Rhee, S., Yang, B.-H., Chang, K., Asada, H.H.: The Ring Sensor: a New Ambulatory Wearable Sensor for Twenty-Four Hour Patient Monitoring. In: Proceedings of the 20th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 20(4), pp. 1906–1909 (1998)

39. Numetrex cardio shirt, http://www.numetrex.com/about/cardio-shirt

40. Bardram, J.E.: The Java Context Awareness Framework (JCAF): A Service Infrastructure and Programming Framework for Context-Aware Applications (2005)

41. Hong, J., Landay, J.: An Infrastructure Approach to Context-Aware Computing (2001)

42. Biegel, G., Cahill, V.: A Framework for Developing Mobile, Context-aware Applications (2004)

43. Dey, A., Abowd, G.D., Salber, D.: A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications. Human-Computer Interaction 16, 97–166 (2001)

44. Hohl, F., Mehrmann, L., Hamdan, A.: A context system for a mobile service platform. In: Schmeck, H., Ungerer, T., Wolf, L. (eds.) ARCS 2002. LNCS, vol. 2299, pp. 21–33. Springer, Heidelberg (2002)

45. Henricksen, K., Indulska, J., Rakotonirainy, A.: Modeling context information in pervasive computing systems. In: Mattern, F., Naghshineh, M. (eds.) PERVASIVE 2002. LNCS, vol. 2414, pp. 167–180. Springer, Heidelberg (2002)

46. Hightower, J., Brumitt, B., Borriello, G.: The location stack: A layered model for location in ubiquitous computing. In: Proceedings of the Fourth IEEE Workshop on Mobile Computing Systems and Applications (WMCSA 2002). IEEE Computer Society Press, Los Alamitos (2002)

47. Abowd, G.D.: Software engineering issues for ubiquitous computing. In: Proceedings of the 21st international conference on Software engineering, pp. 75–84. IEEE Computer Society Press, Los Alamitos (1999)

48. Patel-Schneider, P.F., Hayes, P., Horrocks, I. (eds.): OWL Web Ontology Language Semantics and Abstract Syntax. W3C Recommendation 10 February (2004), http://www.w3.org/TR/owl-semantics


Blog Rating as an Iterative Collaborative Process

Malamati Louta and Iraklis Varlamis

Abstract. The blogosphere is a part of the World Wide Web, enhanced with several characteristics that differentiate blogs from traditional websites. These include the number of different authors, the multitude of user-provided tags, the inherent connectivity between blogs and bloggers, the high update rate and the time information attached to each post. All of these features can be exploited in various information retrieval tasks in the blogosphere. Traditional search engines perform poorly on blogs, since they do not take these aspects into account. To exploit these features and help specialized blog search engines rank blogs better, we propose a rating mechanism that capitalizes on the hyperlinks between blogs. The model assumes that a blog owner who creates a link to another blog intends to recommend it to the blog's readers, and it quantifies this intention as a score transferred to the blog being pointed to. The exchanged score is affected by the set of implicit and explicit links between any two blogs, along with the links' type and freshness. The process is iterative: the overall ranking score of a blog depends on its previous score and on the weighted aggregation of the scores assigned to it by all other blogs.

Keywords: blog, ranking, collaborative rating, local and global rating.

1 Introduction

In the competitive industry of web search, increasing web coverage and improving the ranking of results are the two main aims of any potential player. Due to the rapid increase of its content, the blogosphere has attracted the interest of several groups:

- popular web search engines (e.g., Google, Yahoo! and AskJeeves);
- companies that provide access exclusively to blogosphere content (e.g., Blogpulse [1] and Technorati [2]);
- researchers who focus on web search [3].

Every blog consists of a series of entries (namely posts). Apart from text or other media content, each post carries several hyperlinks to other entries or web pages, as well as timestamp information concerning its creation. Through this linking mechanism, the blogosphere becomes an interconnected sub-graph of the web, which also links to the surrounding web graph. Like normal links, blog links are used as suggestions or as a means to express agreement or disagreement [4] with a blog's content. However, due to the ease of the publishing mechanism, they have also been used to bias search engine results (e.g., splogs, Google bombs, etc.).

Malamati Louta and Iraklis Varlamis, Harokopio University of Athens, Department of Informatics and Telematics, 176 71, Athens, Greece, e-mail: louta,[email protected]

Since publishing in blogs comes at no cost for web users, the content and number of links provided by individual writers worldwide can easily surpass those in registered websites. This change affects the structure of the web graph and forces search engines to adapt. The ranking mechanisms of web search engines have two main options concerning blog links: a) to ignore them completely, in order to avoid spamming, or b) to take them into account. In the latter case, they have to tackle several trust-related issues.

This work perceives hyperlinks in blogs as recommendations to blog readers. It models the network of hyperlinked blogs as a continuous process in which respected or disreputable sources recommend other trustworthy or untrustworthy ones. The overall ranking score of a blog is computed from all its incoming links (inlinks). Moreover, the time information attached to blog posts is exploited in order to compute hyperlink freshness and recalculate the overall score of a blog.
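The iterative idea (a new score that depends on the previous score and on a weighted aggregation of the ratings given by other blogs) can be illustrated with a hypothetical update step. This is not the authors' formulation, which Section 4 presents. The following choices are all assumptions of this sketch:

- weighting each recommender by its own previous score;
- normalizing by the total weight;
- blending with the parameter alpha.

```python
from typing import Dict

Scores = Dict[str, float]

def update_scores(prev: Scores, local: Dict[str, Dict[str, float]],
                  alpha: float = 0.5) -> Scores:
    """One iteration of a hypothetical score update.

    local[a][b] is the local rating (0..1) that blog a currently assigns to blog b.
    Each recommender's rating is weighted by that recommender's previous score,
    and the result is blended with the target blog's own previous score.
    """
    new: Scores = {}
    for blog, old in prev.items():
        votes = [(prev.get(src, 0.0), ratings[blog])
                 for src, ratings in local.items() if src != blog and blog in ratings]
        total_weight = sum(w for w, _ in votes)
        aggregated = sum(w * r for w, r in votes) / total_weight if total_weight > 0 else 0.0
        new[blog] = (1 - alpha) * old + alpha * aggregated
    return new
```

Repeating the update until the scores stop changing significantly yields the iterative behaviour. A rating from a highly scored blog then moves the target's score more than one from a poorly scored blog.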

The rest of this chapter is organized as follows:

- Section 2 reviews research works on web document ranking that make use of various web page information, as well as works that emphasize the additional information that blogs carry.
- Section 3 gives an overview of blog information and the fundamental concepts of our rating model.
- Section 4 presents the mathematical formulation of the proposed model and suggests a model for attaching rating semantics to blogs.
- Section 5 evaluates the designed mechanism experimentally, presenting the first results from applying our model to a collection of blogs, along with some interesting findings.
- Section 6 contains the conclusions of this work and our future plans.

2 Related Work

Ranking on the web is primarily based on the analysis of the web graph formed by hyperlinks. It has been ten years since PageRank [5], the most cited ranking algorithm, was introduced. During this period, several research works have attempted to improve PageRank's performance and to incorporate as much information as possible into the web graph. The result is numerous PageRank variations and a multitude of interesting ideas (e.g., Topic-sensitive PageRank [6], TrustRank [7], SpamRank [8], Page-reRank [9], biased PageRank [10]).
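For reference, the basic PageRank computation that these variations build on can be written as a short power iteration. The sketch follows the common textbook formulation: damping factor 0.85, with dangling nodes spreading their rank uniformly. It does not reproduce the exact variant of [5] or of any of the cited extensions.

```python
from typing import Dict, List

def pagerank(graph: Dict[str, List[str]], d: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank. graph maps each node to the nodes it links to;
    every link target must also be a key of graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - d) / n for v in nodes}
        for v in nodes:
            targets = graph[v]
            if targets:
                share = d * rank[v] / len(targets)
                for w in targets:
                    new[w] += share
            else:
                # Dangling node (no outlinks): spread its rank uniformly.
                for w in nodes:
                    new[w] += d * rank[v] / n
        rank = new
    return rank
```

Every link passes on the same share of its source's rank, whatever its age or kind. This is the static behaviour that the blog-specific models discussed below try to refine.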

The primary aim of all the aforementioned works is to attach extra semantics to hyperlinks by analyzing neighboring content or other structural information (e.g., topic, negative or positive opinion, etc.). In addition to automatically extracted semantics, several hyperlink metadata formats have been proposed [11], [12], [13]. They allow web content authors to annotate hyperlinks, and search engines to distinguish between links that provide positive and negative recommendations. However, none of these metadata formats has been widely adopted yet. As a result, there is still no widely accepted method for distinguishing between positive and negative links.


In the case of blogs, several ranking algorithms have been suggested that exploit explicit (EigenRumor algorithm [14]) and/or implicit (BlogRank [15], [16]) hyperlinks between blogs. All these algorithms formulate a graph of blogs, based on hyperlinks and then apply PageRank or a variation of it in order to provide an overall ranking of blogs. However, all these algorithms provide a static measure of blog importance that does not reflect the temporal aspects accompanying the evolution of the blogosphere.

Several models that capture the freshness of links have been proposed, with applications to web pages and hyperlinks [17], [18] and to scientific papers and bibliographic citations [19], [20]. All these works build on the observation that PageRank and its variations favor old pages. To balance this, they employ a link (or citation) weighting scheme based on the age of the web page (or paper). In a post-processing step, the authority of a pointed node decays based on the node's age and the age of its incoming links.
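A simple way to weight a link by its age is exponential decay with a half-life, sketched below. The decay function and the 30-day half-life are assumptions chosen for illustration. The cited works [17]–[20] may use different decay functions and parameters.

```python
import math
from datetime import date

def link_weight(link_date: date, today: date, half_life_days: float = 30.0) -> float:
    """Exponentially decayed weight of a link: it halves every half_life_days."""
    age_days = (today - link_date).days
    if age_days < 0:
        raise ValueError("link date lies in the future")
    return math.pow(0.5, age_days / half_life_days)
```

With these settings, a link created 60 days ago contributes a quarter of the weight of a link created today.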

In the current work, we consider that ranking in the blogosphere is an iterative process. As a first step, we consider that links in the blogosphere act as recommendations to readers. In a second step, we exploit two special features of the blogosphere links: a) the difference between blogroll links, which denote a more permanent trust towards the blog being pointed, and post links, which represent a more transient reference to a blog, b) the timestamp information of a post, which can be employed as a timestamp for a hyperlink.

3 Background

This section illustrates the useful information that can be found in a blog and can be incorporated in the blog rating mechanism. In the following, we explain the details of each piece of information; we discuss its availability and its role in the iterative rating model.

3.1 Blog Structure

Although the blog structure is not standard, most blogs share the following structure: Each blog has a host URL and contains one or more posts, authored by the blog editors. Post information comprises an author, a body, a date and time of publishing and a URL of the full, individual article, called the permalink. A post optionally includes: comments of readers, category tags and links to referrers (trackback links).

The number of comments and trackbacks, where available, can be retrieved by processing the contents of each post. Since this type of information is not standard across blog servers, these numbers can be retrieved only for a small portion of the blogosphere: research works report that fewer than 1% of posts offer trackback and comment information [15]. Topic information is available for more posts ([15] reports a figure close to 24%). The choice of topic is subjective to the author. Even so, the combined analysis of topic and author information can yield useful information about the blogosphere, such as authors that link to other authors, linked or related topics, etc.

The date and time at which an entry was registered is another useful piece of information. Analysis of entries based on date and time reveals more and less recent blogs, more and less active blogs and authors, and topics with short or long lifecycles.

Finally, the blogroll, the list of blogs that is usually placed in the sidebar of a blog, can be used as the blogger's list of recommended blogs. The blogroll is considered to be a fixed list of links that is updated infrequently, and it can be used to indicate the affiliated blogs of a certain blog.

3.2 Hyperlinks and the Blog Rating Model

The aim of the proposed blog rating model is to adaptively assign a score to every blog based on the recommendations it receives from other blogs. Each blog contains a blogroll, which is a set of hyperlinks to affiliated blogs, and one or more posts, published at different times, that contain hyperlinks to the posts of other blogs. The model distinguishes between these two hyperlink types, as depicted in Figure 1.

Fig. 1 Hyperlink types in the blog rating model

More specifically, a blogroll hyperlink is a link in the blogroll of blog A pointing to a blog B. It denotes that A gives a permanent recommendation for B and thus contributes a constant amount to the score of B. In contrast, a post hyperlink from an individual post A1 of blog A to a post B2 in blog B denotes a temporary interest of A in the contents of B and consequently increases the score of B only for a short period of time after the post has been published.

The blogroll links of blog A increase the rating of all the blogs they point to. Moreover, any new posts, which are added daily to blog A, contribute to the rating of the respective blogs they point to. As a consequence, the local rating assigned by a blog A to a blog B is the weighted sum of the ratings assigned by blogroll and post hyperlinks, respectively. This local rating information depicts the image that A has of the part of the blogosphere it points to, and it is updated every time a new hyperlink appears, either in a post or in the blogroll of A. The rating assigned by a certain post hyperlink decreases as days pass and the post becomes old. By monitoring a certain blog A over several periods, we are able to compute the accumulative local ratings assigned to all blogs pointed to by A.

The local rating information of a blog A can be enhanced by the information provided by its affiliated blogs F_A (e.g., the blogs in its blogroll). The affiliated blogs are the trusted blogs of A, and their opinion of the blogosphere is of interest to A. As a result, the collaborative local rating combines the direct experiences of the evaluator blog A regarding B with information on B gathered from the N affiliated witness blog sites. If we consider that in Figure 1 the blogs C and A collaborate, then the collaborative rating of blog B is a weighted sum of the local ratings provided for B by A and C.

In a similar manner, we are able to compute a global rating for every blog by aggregating the local rating information of all blogs. Every new blog Y that is added to the blogosphere receives a default minimum global rating. This score increases with the number of incoming blogroll or post hyperlinks and indicates the blog's credibility when it is used as a witness for other blogs.

In general, when we rate a service by combining multiple witnesses, we take into account the credibility of each witness and the freshness of its information. In an analogous manner, when we combine local ratings from different blogs, we must consider a) the freshness of the ratings, which corresponds to the freshness of the links and the freshness of the prior rating (i.e., the time period during which the rating was estimated), and b) the credibility of each individual blog, which, for the global rating formation in this study, is expressed by the blog's global rating in the previous period. For example, the rating of blog B in Figure 1 depends on the local ratings from A to B and from C to B, weighted by the global ratings of A and C in the previous known period. The former is the sum of the ratings assigned via the blogroll link and $\mathit{Hyperlink}_{t_1}$, whereas the latter is based only on $\mathit{Hyperlink}_{t_3}$, which is, however, more recent than $\mathit{Hyperlink}_{t_1}$ (given that $t_3 > t_2 > t_1$). If no more post links are added in the next period, the global rating of B decreases, since the freshness of $\mathit{Hyperlink}_{t_1}$ and $\mathit{Hyperlink}_{t_3}$ decreases. If no more post links are added for several periods, the global rating of B depends only on the rating assigned by the blogroll link of blog A.

4 Blog Site Rating System Formulation

Let us assume the presence of $M$ Blog Sites (BSs) falling within the same category with respect to the topics covered and the interests shared. Let $BS = \{BS_1, BS_2, \ldots, BS_M\}$ be the set of Blog Sites in the system. In subsection 4.1, the local blog site rating formation is formally described taking into account only first-hand information (i.e., what the evaluator blog site considers about the target blog site); in subsection 4.2, the local blog site rating is formed collaboratively (the evaluator blog site takes into account the opinion of other affiliated blog sites concerning the blog site under evaluation); while in subsection 4.3, a global value for a blog site is formed taking into account the view of all blog sites in the system.


4.1 Local Accumulative Blog Site Rating Formation

Concerning the local formation of the rating of Blog Site $BS_i$, the Blog Site $BS_j$ may rate $BS_i$ at time period $c$ in accordance with the following formula:

$$\mathit{LABSR}^{BS_j}_{t_p=c}(BS_i) = \sum_{k=c-n+1,\; k>0}^{c} w_{t_p=k} \cdot \mathit{LBSR}^{BS_j}_{t_p=k}(BS_i) \tag{1}$$

where $\mathit{LABSR}^{BS_j}_{t_p=c}(BS_i)$ is the local accumulative $BS_i$ rating estimated by $BS_j$ at time period $t_p = c$, $\mathit{LBSR}^{BS_j}_{t_p=k}(BS_i)$ denotes the local rating the evaluator $BS_j$ attributes to the target $BS_i$ at time period $t_p = k$, and weight $w_{t_p=k}$ provides the relative significance of the $\mathit{LBSR}^{BS_j}_{t_p=k}(BS_i)$ factor estimated at time period $k$ to the overall $BS_i$ rating estimation by the evaluator $BS_j$.

Concerning the estimation of the $\mathit{LBSR}^{BS_j}_{t_p=k}(BS_i)$ factor, the evaluator $BS_j$ may exploit the following formula:

$$\mathit{LBSR}^{BS_j}_{t_p=k}(BS_i) = w_{BR} \cdot \mathit{BR}^{BS_j}_{t_p=k}(BS_i) + w_{EP} \cdot \mathit{EP}^{BS_j}_{t_p=k}(BS_i) \tag{2}$$

As may be observed from equation (2), the local rating of the target $BS_i$ is a weighted combination of two factors. The first factor contributing to the overall $BS_i$ rating value (i.e., $\mathit{BR}^{BS_j}_{t_p=k}(BS_i)$) is the blogroll-related factor. This factor is introduced on the basis that the $BS_j$ blogroll provides a list of friendly blog sites frequently accessed or read by the authors of $BS_j$. It has been assumed that $\mathit{BR}^{BS_j}_{t_p=k}(BS_i)$ lies within the [0,1] range, where a value close to 1 indicates that the target $BS_i$ is a friendly blog site to the evaluator $BS_j$. In the context of this study, $\mathit{BR}^{BS_j}_{t_p=k}(BS_i)$ is modeled as a decision variable taking the value 1 or 0 depending on whether or not $BS_i$ belongs to the blogroll of $BS_j$ at time period $k$. Alternatively, $BS_j$ could provide a rating of the friendly blog sites in its blogroll, which could be exploited in order to differentiate the $\mathit{BR}^{BS_j}_{t_p=k}(BS_i)$ factor among the friendly blog sites in the $BS_j$ blogroll. This issue will be considered in a future version of this study.


The second factor contributing to the overall $\mathit{LBSR}^{BS_j}_{t_p=k}(BS_i)$ (i.e., $\mathit{EP}^{BS_j}_{t_p=k}(BS_i)$) depends on the fraction of $BS_j$ posts pointing to $BS_i$ at time period $k$. This factor has been assumed to lie within the [0,1] range and may be given by the following equation:

$$\mathit{EP}^{BS_j}_{t_p=k}(BS_i) = \frac{\mathit{NoP}^{BS_j}_{t_p=k}(BS_i)}{\mathit{NoP}^{BS_j}_{t_p=k}} \tag{3}$$

where $\mathit{NoP}^{BS_j}_{t_p=k}(BS_i)$ denotes the number of posts created between time periods $t_p = k-1$ and $t_p = k$ that point to the target blog site $BS_i$, and $\mathit{NoP}^{BS_j}_{t_p=k}$ denotes the total number of posts of the evaluator $BS_j$ created between time periods $k-1$ and $k$.

Weights $w_{BR}$ and $w_{EP}$ provide the relative significance of the blogroll-related factor and the posts-related factor, respectively. It is assumed that weights $w_{BR}$ and $w_{EP}$ are normalized to add up to 1 (i.e., $w_{BR} + w_{EP} = 1$). Since both factors lie within [0,1] and equation (2) is a convex combination of them, the $\mathit{LBSR}^{BS_j}_{t_p=k}(BS_i)$ factor also lies within the [0,1] range.
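The following minimal Python sketch computes the local rating of equations (2) and (3) for a single time period, using the binary blogroll factor adopted in this study. The function names are hypothetical, the default $w_{BR} = 0.5$ is an illustrative assumption (no weight values are given here), and treating a period without posts as $\mathit{EP} = 0$ is likewise an assumption.

```python
def posts_factor(posts_to_target: int, total_posts: int) -> float:
    """EP of equation (3): share of the evaluator's posts in period k that point to the target."""
    if total_posts < 0 or not 0 <= posts_to_target <= total_posts:
        raise ValueError("need 0 <= posts_to_target <= total_posts")
    if total_posts == 0:
        return 0.0  # assumption: no posts in the period -> no post-based recommendation
    return posts_to_target / total_posts


def local_rating(in_blogroll: bool, posts_to_target: int, total_posts: int,
                 w_br: float = 0.5) -> float:
    """LBSR of equation (2): w_BR * BR + w_EP * EP, with w_BR + w_EP = 1.

    BR is the 0/1 decision variable used in this study (1 if the target is in
    the evaluator's blogroll). The default w_br = 0.5 is a hypothetical choice.
    """
    if not 0.0 <= w_br <= 1.0:
        raise ValueError("w_br must lie in [0, 1]")
    w_ep = 1.0 - w_br  # normalization w_BR + w_EP = 1
    br = 1.0 if in_blogroll else 0.0
    return w_br * br + w_ep * posts_factor(posts_to_target, total_posts)


if __name__ == "__main__":
    # Target is in the blogroll and is linked by 2 of the evaluator's 8 posts this period.
    print(local_rating(True, 2, 8))  # prints 0.625 (0.5 * 1 + 0.5 * 0.25)
```

Because both terms are bounded by 1 and the weights sum to 1, the result stays in [0,1] for any valid input, as stated above.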

Weights $w_{t_p=k}$ in equation (1) are normalized to add up to 1 (i.e., $\sum_{k=c-n+1,\; k>0}^{c} w_{t_p=k} = 1$) and may be given by the following equation:

$$w_{t_p=k} = \frac{w_k}{\sum_{l=c-n+1,\; l>0}^{c} w_l} \tag{4}$$

where

$$w_k = \begin{cases} k - c + n, & c \geq n \\ k, & c < n \end{cases}$$

At this point it should be noted that the authors have assumed that the local rating estimation takes place at consecutive, equally spaced time intervals. For the formation of the local accumulative BS rating at a time period $c$, the evaluator considers only the $n$ most recent ratings formed. The value $n$ determines the memory of the system: a small value of $n$ gives the system a short memory, whereas a large value gives it a long memory. Equation (4) in essence models the fact that more recent local BS ratings should weigh more in the overall BS rating evaluation: for $c \geq n$, the oldest rating in the window receives weight $2/(n(n+1))$ and the current one receives $2/(n+1)$.
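A minimal Python sketch of equations (4) and (1) follows. It assumes periods are numbered 1, 2, ..., c, and, as an assumption not stated in the model, that a period inside the memory window with no recorded local rating contributes 0. The function names are hypothetical.

```python
from __future__ import annotations


def recency_weights(c: int, n: int) -> dict[int, float]:
    """Normalized weights w_{t_p=k} of equation (4) for every period k in the memory window."""
    if c < 1 or n < 1:
        raise ValueError("c and n must be positive")
    first = max(1, c - n + 1)  # window k = c-n+1, ..., c, restricted to k > 0
    raw = {k: (k - c + n if c >= n else k) for k in range(first, c + 1)}
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}


def local_accumulative_rating(history: dict[int, float], c: int, n: int) -> float:
    """LABSR of equation (1): recency-weighted sum of the per-period local ratings LBSR.

    history maps a period k to LBSR^{BS_j}_{t_p=k}(BS_i); missing periods count as 0.
    """
    weights = recency_weights(c, n)
    return sum(w * history.get(k, 0.0) for k, w in weights.items())


if __name__ == "__main__":
    print(recency_weights(5, 3))  # periods 3, 4, 5 get weights 1/6, 2/6, 3/6
    # A target linked only recently: local rating 0 in period 3, 0.5 in 4, 1.0 in 5.
    print(local_accumulative_rating({3: 0.0, 4: 0.5, 5: 1.0}, c=5, n=3))
```

With $n = 3$ and $c = 5$ the current period carries half of the total weight, so a burst of links that is not repeated fades out of the rating once it leaves the window.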


4.2 Collaborative Local Blog Site Rating Formation

In order to estimate the rating of a target Blog Site $BS_i$, the evaluator Blog Site $BS_j$ needs to contact a set $\mathit{WBS}$ of $N$ witness Blog Sites ($\mathit{WBS} \subseteq BS$) in order to get feedback reports on the usability of $BS_i$. The set of the $N$ witnesses is a subset of the set $BS = \{BS_1, BS_2, \ldots, BS_M\}$ and can consist of the blog sites in the blogroll of $BS_j$. The overall collaborative rating $\mathit{CLBSR}^{BS_j}_{t_p=c}(BS_i)$ of the target $BS_i$ may be estimated by the evaluator Blog Site $BS_j$ at time period $c$ in accordance with the following formula:

$$\mathit{CLBSR}^{BS_j}_{t_p=c}(BS_i) = w^{BS_j}_{t_p=c}(BS_j) \cdot \mathit{LABSR}^{BS_j}_{t_p=c}(BS_i) + \sum_{k=1,\; k \neq i}^{N} w^{BS_j}_{t_p=c}(BS_k) \cdot \mathit{LABSR}^{BS_k}_{t_p=c}(BS_i) \tag{5}$$

As may be observed from equation (5), the collaborative rating of the target $BS_i$ is a weighted combination of two factors. The first factor contributing to the rating value is based on the direct experiences of the evaluator blog site $BS_j$, while the second factor depends on information regarding the past behaviour of $BS_i$ gathered from the $N$ witness blog sites.

Weight $w^{BS_j}_{t_p=c}(BS_x)$ provides the relative significance of the rating of the target blog site $BS_i$ as formed by the blog site $BS_x$ to the overall rating estimation by the evaluator $BS_j$. In general, $w^{BS_j}_{t_p=c}(BS_x)$ is a measure of the credibility of witness $BS_x$ and may be a function of the local accumulative blog site rating attributed to each $BS_x$ by the evaluator $BS_j$. It has been assumed that weights $w^{BS_j}_{t_p=c}(BS_x)$ are normalized to add up to 1 (i.e., $w^{BS_j}_{t_p=c}(BS_j) + \sum_{k=1,\; k \neq i}^{N} w^{BS_j}_{t_p=c}(BS_k) = 1$). Thus, weight $w^{BS_j}_{t_p=c}(BS_x)$ may be given by the following equation:

$$w^{BS_j}_{t_p=c}(BS_x) = \frac{\mathit{LABSR}^{BS_j}_{t_p=c}(BS_x)}{\sum_{BS_x \in \mathit{WBS} \cup \{BS_j\}} \mathit{LABSR}^{BS_j}_{t_p=c}(BS_x)} \tag{6}$$


where $\mathit{LABSR}^{BS_j}_{t_p=c}(BS_x)$ is the local accumulative blog site rating attributed to Blog Site $BS_x$ by the evaluator $BS_j$. One may easily conclude that, for the evaluator $BS_j$ itself, $\mathit{LABSR}^{BS_j}_{t_p=c}(BS_j) = 1$.
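The collaborative rating of equations (5) and (6) can be sketched in Python as follows. The nested-dictionary layout `labsr[x][y]` (the local accumulative rating that blog site x gives to y at period c) and the function name are hypothetical. Witnesses would typically come from the evaluator's blogroll; the target and the evaluator itself are excluded from the witness list, and a witness that has not rated the target is assumed to contribute 0.

```python
from __future__ import annotations


def collaborative_rating(evaluator: str, target: str, witnesses: list[str],
                         labsr: dict[str, dict[str, float]]) -> float:
    """CLBSR of equation (5), with the credibility weights of equation (6)."""
    own = labsr[evaluator]
    witnesses = [x for x in witnesses if x not in (target, evaluator)]  # k != i
    # Equation (6): each witness's credibility is the evaluator's LABSR of it;
    # the evaluator's rating of itself is fixed to 1.
    credibility = {evaluator: 1.0}
    credibility.update({x: own.get(x, 0.0) for x in witnesses})
    total = sum(credibility.values())
    weights = {x: cred / total for x, cred in credibility.items()}  # sums to 1

    rating = weights[evaluator] * own.get(target, 0.0)  # direct experience of BS_j
    for x in witnesses:                                 # reports of the witnesses
        rating += weights[x] * labsr.get(x, {}).get(target, 0.0)
    return rating


if __name__ == "__main__":
    # Figure 1 scenario with made-up numbers: A collaborates with C to rate B.
    labsr = {"A": {"B": 0.6, "C": 1.0}, "C": {"B": 0.9}}
    print(collaborative_rating("A", "B", ["C"], labsr))  # 0.5 * 0.6 + 0.5 * 0.9
```

A witness that the evaluator rates at 0 receives zero weight, so its report about the target is ignored entirely.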

At this point it should be noted that the duration of each time interval introduced in subsection 4.1 for the local accumulative blog site rating estimation may differ among blog sites. As a side effect, the witness blog sites do not necessarily estimate their local accumulative rating concerning the target blog site $BS_i$ at the same time. Consider, for example, a blog site that updates its local accumulative blog site ratings every month and a blog site that updates the related information every day. In order to introduce the time effect in our mechanism and model the fact that more recent ratings should weigh more in the overall collaborative blog site rating estimation, equation (5) should be rewritten as follows:

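$$\mathit{CLBSR}^{BS_j}_{\mathit{time}_c}(BS_i) = w^{BS_j}_{\mathit{time}_c}(BS_j) \cdot \mathit{LABSR}^{BS_j}_{\mathit{time}_c}(BS_i) + \sum_{k=1,\; k \neq i}^{N} w_{\mathit{time}_{d_k}} \cdot w^{BS_j}_{\mathit{time}_c}(BS_k) \cdot \mathit{LABSR}^{BS_k}_{\mathit{time}_{d_k}}(BS_i) \tag{7}$$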

where $w_{\mathit{time}_{d_k}}$ is a decaying parameter given by the following equation:

$$w_{\mathit{time}_{d_k}} = 1 - \frac{\mathit{time}_c - \mathit{time}_{d_k}}{\mathit{time}_c} \tag{8}$$

In the context of this study, $w_{\mathit{time}_{d_k}}$ is modeled as a polynomial (in fact, linear) function of the age of the rating; other functions (for example, exponential) could be defined as well. Here $\mathit{time}_{d_k}$ is the time at which witness $BS_k$ last estimated its local accumulative rating of $BS_i$. As may be observed from equation (7), the larger the quantity $\mathit{time}_c - \mathit{time}_{d_k}$, the smaller the contribution of the rating provided by witness blog site $BS_k$ to the overall collaborative rating of the target $BS_i$. At this point, we assume that when the collaborative rating is estimated by the evaluator $BS_j$ at $\mathit{time}_c$, its own local accumulative ratings have also been updated.
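Equations (7) and (8) add a linear decay to each witness report. The Python sketch below assumes that times are measured (e.g., in days) from a common origin, so that $0 < \mathit{time}_{d_k} \le \mathit{time}_c$; the report format (a rating plus the time it was estimated) and the function names are hypothetical. As in equation (7), the decayed weights are not renormalized, so stale witness reports lower the collaborative rating instead of being redistributed.

```python
from __future__ import annotations


def time_decay(time_c: float, time_dk: float) -> float:
    """Equation (8): w_{time_{d_k}} = 1 - (time_c - time_dk) / time_c."""
    if not 0 < time_dk <= time_c:
        raise ValueError("expected 0 < time_dk <= time_c")
    return 1.0 - (time_c - time_dk) / time_c


def timed_collaborative_rating(own_rating: float, own_credibility: dict[str, float],
                               reports: dict[str, tuple[float, float]],
                               time_c: float) -> float:
    """CLBSR of equation (7).

    own_rating      -- LABSR^{BS_j}_{time_c}(BS_i), the evaluator's own up-to-date rating
    own_credibility -- LABSR^{BS_j}_{time_c}(BS_k) for each witness k (input to equation (6))
    reports         -- witness k -> (LABSR^{BS_k}_{time_{d_k}}(BS_i), time_{d_k})
    """
    total = 1.0 + sum(own_credibility.get(k, 0.0) for k in reports)  # self-rating is 1
    rating = (1.0 / total) * own_rating
    for k, (witness_rating, time_dk) in reports.items():
        w_k = own_credibility.get(k, 0.0) / total  # equation (6)
        rating += time_decay(time_c, time_dk) * w_k * witness_rating
    return rating


if __name__ == "__main__":
    # Day 20: A rates B at 0.6; witness C (credibility 1.0) rated B at 0.9 on day 15.
    print(time_decay(20, 15))  # prints 0.75
    print(timed_collaborative_rating(0.6, {"C": 1.0}, {"C": (0.9, 15)}, time_c=20))
```

When every witness report is current ($\mathit{time}_{d_k} = \mathit{time}_c$) the decay equals 1 and equation (7) reduces to equation (5).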

4.3 Global Blog Site Rating Formation

In order to estimate the global rating of a target Blog Site $BS_i$, a specialized blog site search engine evaluator collects the feedback reports on the usability of $BS_i$ from the $M$ Blog Sites belonging to the set $BS = \{BS_1, BS_2, \ldots, BS_M\}$. The overall global rating $\mathit{GBSR}_{\mathit{time}_c}(BS_i)$ of the target $BS_i$ may be estimated by the evaluator search engine at time $c$ in accordance with the following formula:

$$\mathit{GBSR}_{\mathit{time}_c}(BS_i) = \sum_{k=1,\; k \neq i}^{M} w_{\mathit{time}_{d_k}} \cdot w_{BS_k} \cdot \mathit{LABSR}^{BS_k}_{\mathit{time}_{d_k}}(BS_i) \tag{9}$$

As may be observed from equation (9), the global rating of the target $BS_i$ is a weighted combination of the rating values provided by the blog sites $BS_k$, based on their direct experiences.

Weight $w_{BS_k}$ provides the relative significance of the rating of the target blog site $BS_i$ as formed by the blog site $BS_k$ to the overall rating estimation by the evaluator blog site search engine. In general, $w_{BS_k}$ is a measure of the credibility of blog site $BS_k$ and may be a function of its prior global rating as estimated by the evaluator blog site search engine during the previous time period. In the context of this study, weight $w_{BS_k}$ is given by the following equation:

$$w_{BS_k} = \mathit{GBSR}_{\mathit{time}_{c-1}}(BS_k) \cdot \frac{\mathit{NoBS}(BS_k)}{M} \tag{10}$$

where $\mathit{GBSR}_{\mathit{time}_{c-1}}(BS_k)$ denotes the prior global rating of blog site $BS_k$ estimated at $\mathit{time}_{c-1}$ in accordance with equation (9), $\mathit{NoBS}(BS_k)$ denotes the number of blog sites pointing to $BS_k$ at $\mathit{time}_c$, and $M$ is the total number of blog sites in the system. The fraction of the blog sites pointing to $BS_k$ at time $c$ has been introduced in equation (10) in order to enhance the credibility value of a witness blog site providing a rating for the blog site $BS_i$ under evaluation. Finally, in analogy to subsection 4.2, parameter $w_{\mathit{time}_{d_k}}$ is the decaying factor given by equation (8), introduced in order to weigh down possibly outdated evaluation ratings.
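A minimal Python sketch of equations (9) and (10) follows. The input dictionaries and the function name are hypothetical, and times are again assumed to be measured from a common origin. The parameter `default_prior` is a placeholder for the default minimum global rating that a newly added blog receives; its value is not given in this study, so 0.01 is purely an assumption.

```python
from __future__ import annotations


def global_rating(target: str,
                  reports: dict[str, tuple[float, float]],
                  prior_global: dict[str, float],
                  inlinks: dict[str, int],
                  m_total: int,
                  time_c: float,
                  default_prior: float = 0.01) -> float:
    """GBSR_{time_c}(BS_i) of equation (9).

    reports      -- blog site k -> (LABSR^{BS_k}_{time_{d_k}}(BS_i), time_{d_k})
    prior_global -- GBSR_{time_{c-1}}(BS_k) from the previous run of equation (9)
    inlinks      -- NoBS(BS_k): number of blog sites pointing to BS_k at time_c
    m_total      -- M, the total number of blog sites in the system
    """
    score = 0.0
    for k, (rating, time_dk) in reports.items():
        if k == target:  # k != i: a blog site does not rate itself
            continue
        prior = prior_global.get(k, default_prior)   # hypothetical default for new blogs
        w_bs = prior * inlinks.get(k, 0) / m_total   # equation (10)
        w_time = 1.0 - (time_c - time_dk) / time_c   # equation (8)
        score += w_time * w_bs * rating
    return score


if __name__ == "__main__":
    reports = {"A": (0.6, 10), "C": (0.9, 5)}
    prior = {"A": 0.5, "C": 0.8}
    inlinks = {"A": 2, "C": 1}
    print(global_rating("B", reports, prior, inlinks, m_total=4, time_c=10))
```

Because $w_{BS_k}$ multiplies the prior global rating by the fraction of blog sites that link to $BS_k$, a witness that few blog sites link to contributes little to the target's global rating, however high the rating it reports.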

4.4 The Semantics of the Rating Model

In order to support the operation of the rating mechanism, we suggest the use of semantics in the description of post and blogroll hyperlinks. The semantic information attached to each hyperlink will allow bloggers to better describe their intentions behind creating the link, to prioritize affiliated blogs in the blogroll, or even to provide topic information for the pointed posts. The rating mechanism can be adapted to update the local scores and to employ them in providing collaborative and global scores. RDF is a popular format for describing metadata, and it is used to support our rating model.

As mentioned in subsection 4.1, the local (accumulative or not) blog site ratings are solely based on the recommendations provided by the blog itself. As a result, only the RDF file associated with the current blog is required for storing local ratings. In general, the RDF file comprises the URI and ratings for each blog in the blogroll and for each blog referenced in the posts. A software entity acting on behalf of each blog is responsible for reading the RDF file, recomputing the accumulative localRating and updating the RDF file with the new ratings and the new date of update. Figure 2 provides a fictional example of an RDF file for a blog (my.blog.co.uk) that contains two links in the blogroll (myother.blog.co.uk and blog.co.uk/agoodone) and two posts with hyperlinks to an affiliated blog (blog.co.uk/agoodone) and a non-affiliated blog (anyblog.co.uk), respectively.

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:blog="http://www.blogtrust.fake/blog#">
  <rdf:Description rdf:about="http://my.blog.co.uk/">
    <blog:OutLinkTo>
      <rdf:Description rdf:about="http://myother.blog.co.uk/">
        <blog:blogroll blog:localRating="1" blog:dateUpdated="2008-1-1"/>
      </rdf:Description>
    </blog:OutLinkTo>
    <blog:OutLinkTo>
      <rdf:Description rdf:about="http://blog.co.uk/agoodone">
        <blog:blogroll blog:localRating="0.6" blog:dateUpdated="2009-5-1"/>
        <blog:postlink rdf:parseType="Resource">
          <blog:localRating>0.9</blog:localRating>
          <blog:dateUpdated>2008-11-1</blog:dateUpdated>
          <blog:sourcepermalink>http://my.blog.co.uk/2008-11-1</blog:sourcepermalink>
          <blog:targetpermalink>http://blog.co.uk/agoodone/2008-10-28</blog:targetpermalink>
        </blog:postlink>
      </rdf:Description>
    </blog:OutLinkTo>
    <blog:OutLinkTo>
      <rdf:Description rdf:about="http://anyblog.co.uk/">
        <blog:postlink rdf:parseType="Resource">
          <blog:localRating>0.8</blog:localRating>
          <blog:dateUpdated>2008-12-15</blog:dateUpdated>
          <blog:sourcepermalink>http://my.blog.co.uk/2008-12-15</blog:sourcepermalink>
          <blog:targetpermalink>http://anyblog.co.uk/2008-12-12</blog:targetpermalink>
        </blog:postlink>
      </rdf:Description>
    </blog:OutLinkTo>
  </rdf:Description>
</rdf:RDF>


Fig. 2 The RDF structure for a blog
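To illustrate how a software entity acting on behalf of a blog might read a file like the one in Figure 2, the Python sketch below parses a trimmed copy of it (permalinks and the first blogroll entry omitted) with the standard-library `xml.etree.ElementTree` module and lists each outgoing link with its type, localRating and dateUpdated. It treats the RDF/XML as plain XML, which only works for this fixed layout; arbitrary RDF/XML would need a proper RDF parser. The function name is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
BLOG = "{http://www.blogtrust.fake/blog#}"

# Trimmed copy of Figure 2.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:blog="http://www.blogtrust.fake/blog#">
  <rdf:Description rdf:about="http://my.blog.co.uk/">
    <blog:OutLinkTo>
      <rdf:Description rdf:about="http://blog.co.uk/agoodone">
        <blog:blogroll blog:localRating="0.6" blog:dateUpdated="2009-5-1"/>
        <blog:postlink rdf:parseType="Resource">
          <blog:localRating>0.9</blog:localRating>
          <blog:dateUpdated>2008-11-1</blog:dateUpdated>
        </blog:postlink>
      </rdf:Description>
    </blog:OutLinkTo>
    <blog:OutLinkTo>
      <rdf:Description rdf:about="http://anyblog.co.uk/">
        <blog:postlink rdf:parseType="Resource">
          <blog:localRating>0.8</blog:localRating>
          <blog:dateUpdated>2008-12-15</blog:dateUpdated>
        </blog:postlink>
      </rdf:Description>
    </blog:OutLinkTo>
  </rdf:Description>
</rdf:RDF>"""


def read_outlinks(xml_text: str) -> list[tuple[str, str, float, str]]:
    """Return (target URI, link type, localRating, dateUpdated) for every outgoing link."""
    root = ET.fromstring(xml_text)
    links = []
    for target in root.iter(RDF + "Description"):  # document order, nested included
        uri = target.get(RDF + "about")
        for child in target:
            if child.tag == BLOG + "blogroll":      # rating stored as attributes
                links.append((uri, "blogroll",
                              float(child.get(BLOG + "localRating")),
                              child.get(BLOG + "dateUpdated")))
            elif child.tag == BLOG + "postlink":    # rating stored as child elements
                links.append((uri, "postlink",
                              float(child.findtext(BLOG + "localRating")),
                              child.findtext(BLOG + "dateUpdated")))
    return links


if __name__ == "__main__":
    for link in read_outlinks(SAMPLE):
        print(link)
```

The blogroll and postlink entries store their values differently (attributes versus child elements), which is why the two branches read them with `get` and `findtext`, respectively.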

On the other hand, the local collaborative blog site rating depends on the blog's RDF file, but also on the RDF files of all other blogs in its blogroll, assuming that the witness set consists of the blog sites in the blogroll. Moreover, the collaborative process takes into account the rating of each external recommendation. This rating is available in the original RDF file, and the external recommendations can be retrieved from the respective RDF files. In this case, the rating mechanism for a blog should process the blog's current ratings and those provided by each of the affiliated blogs.

Finally, the rating mechanism must collect and process the RDF metadata files from all blogs in the set in order to calculate the global blog site rating. This process is repeated periodically, so as to keep the rating up to date.

In the computation of collaborative and global ratings, the mechanism should take into account three factors that affect the transitivity of rating: a) the effect of ratings provided by the affiliated blogs depends upon their credibility (i.e., the picture the evaluator blog site has formed about them), b) the rating contributed by a certain postlink decreases day by day, and c) the prior rating estimated during the previous time period decreases in order to weigh down possibly outdated evaluation ratings. The last factor is captured by the time decay factor of equation (8), whereas the first is captured by the respective weight $w^{BS_j}_{t_p=c}(BS_x)$ in equation (5). The localRating and dateUpdated values of the postlink are employed to store the rating and its time of update, on which these time-dependent factors are computed.

5 Experimental Setup

In order to demonstrate the blog rating model, we performed experiments on a sample blog dataset provided by Nielsen BuzzMetrics, Inc. The dataset spans a period before and after an important event, the London bombings (4/7/2005 – 24/7/2005). Table 1 summarizes the statistics of the dataset.


Table 1 Statistics of the sample blog set

Number of unique blogs | Links to any blog | Links to blogs in the set | Links to news sites
1,545,205 | 2,138,381 | 331,068 | 498,834

It is obvious from Table 1 that the majority of the links point to blogs outside the initial set and that a large portion of the links point to news sites. Many of the blogs outside the initial set may well be spam blogs (splogs), which are massively pointed to by blogs in the set in an attempt to improve their ranking.

We perform three experiments on the same dataset: a) we find the most referenced blogs and news sites for a single day using inlinks only; b) we rank sites according to the global rating using information for a single day and compare the results with those of the first experiment; c) we apply the global rating model to the blogs using the posts of a single day, with different values of the memory factor, and compare the position of spam blogs in the different sets of ranked results. As explained in the analysis of the results, our rating model penalizes spam blogs, even for small values of the memory factor.

Table 2 contains the top-20 sites ranked using the number of incoming links as the rating factor. According to these results, the most popular sites on the first and the last day of the dataset comprise news sites and spam blogs (positions 13 to 20 on 4/7/2005 and 11 to 20 on 24/7/2005). Although news sites are acceptable among the top-ranked results, spam blogs should be penalized by the rating model.

Table 2 Most referenced sites in the dataset for the 4th and 24th of July 2005

Rank | Most referenced sites (4/7) | Inlinks | Most referenced sites (24/7) | Inlinks
1 | www.livejournal.com | 9511 | www.livejournal.com | 3450
2 | spaces.msn.com | 2724 | www.xanga.com | 2502
3 | www.xanga.com | 2503 | spaces.msn.com | 1229
4 | www.skaterz.info | 1647 | news.yahoo.com | 1070
5 | news.yahoo.com | 1399 | www.nytimes.com | 1039
6 | news.bbc.co.uk | 1127 | news.bbc.co.uk | 841
7 | www.nytimes.com | 1092 | pics.livejournal.com | 535
8 | www.cnn.com | 563 | www.washingtonpost.com | 513
9 | www.washingtonpost.com | 560 | biz.yahoo.com | 443
10 | pics.livejournal.com | 530 | www.bbc.co.uk | 440
11 | www.msnbc.msn.com | 451 | miss-usa-teen.blogspot.com | 361
12 | www.guardian.co.uk | 389 | nude-thumbnails.blogspot.com | 361
13 | fantasy-fest-nude.blogspot.com | 376 | nude-girls-thumbnails.blogspot.com | 361
14 | top-play-lolita.blogspot.com | 376 | asian-nude-thumbnails.blogspot.com | 361
15 | lolita-top-sites.blogspot.com | 376 | non-nude-teen-photos.blogspot.com | 361
16 | hardcore-lesbian-pictures.blogspot.com | 376 | amateur-teen-nude.blogspot.com | 361
17 | lesbian-kissing-pictures.blogspot.com | 376 | nude-amateur-photos.blogspot.com | 361
18 | naturist-teen-photos.blogspot.com | 376 | photos-amateur-gratuites.blogspot.com | 361
19 | funny-as-shit.blogspot.com | 376 | breast-pumps-reviews.blogspot.com | 361
20 | really-funny-shit.blogspot.com | 376 | young-naked-gay-boys.blogspot.com | 361

The first step towards correcting this problem is to use our rating model instead of the number of inlinks. The local ratings are computed on a per-blog basis. Thereafter, the global rating for all sites is estimated using the accumulative algorithm. A useful observation is that spam blogs usually receive a large number of inlinks on the day they are created but receive no further inlinks afterwards, so spam blogs are expected to receive lower ratings from our model. Table 3 shows the ranking of the sites in the dataset according to their global rating on the 4th of July. Since this is the first day in our dataset, the global rating is computed using the inlinks of this specific day (memory equal to zero, m = 0). It is obvious from the results in Table 3 that news sites have improved their ranking against all other blogs (including spam blogs).

Table 3 Top-20 ranked sites in 4/7 using global rating (m=0)

Rank | URL | Rank in 4/7 using inlinks
1 | www.livejournal.com | 1
2 | news.yahoo.com | 5
3 | news.bbc.co.uk | 6
4 | www.bbc.co.uk | 120
5 | www.nytimes.com | 7
6 | www.cnn.com | 8
7 | www.washingtonpost.com | 9
8 | biz.yahoo.com | 118
9 | pics.livejournal.com | 10
10 | www.msnbc.msn.com | 11
11 | www.guardian.co.uk | 12
12 | www.latimes.com | 123
13 | www.usatoday.com | 238
14 | livejournal.com | 246
15 | today.reuters.com | 261
16 | www.sfgate.com | 122
17 | www.boston.com | 173
18 | www.forbes.com | 178
19 | www.timesonline.co.uk | 174
20 | www.newsday.com | 266

As mentioned before, spam blogs usually receive a large number of links in a single day, which explains their ranking in the results of Table 2. However, these incoming links have a single origin (another spam blog), which has been created for this purpose. According to equation (3), the contribution of blogs that contain many links is small, and consequently spam blogs of this type receive a small local rating. However, there are still spam blogs that receive fabricated inlinks from different origins at the same time. In order to penalize these links, we examine the blogosphere over several days (i.e., 20 days) using our model with memory (i.e., local accumulative blog site rating formation considering 20 time periods, one per day).

In Table 4, we present the top-20 ranked blog URLs in the dataset (URLs that contain the term 'blog') for the 24th of July, which is the last date in the set. The blogs are rated using the maximum possible memory in our dataset (m = 20). The rightmost column of Table 4 contains the number of inlinks for each blog on the 24th of July, and the middle column contains the position of this blog on the same date when ranked using only the number of inlinks. It is obvious that normal blogs rank higher, surpassing the spam blogs, when collective memory from the previous days is employed.


Table 4 Most highly ranked blogs in the 24th of July (global rating, m=20)

Rank | URL | Rank in 24/7 using inlinks | Inlinks in 24/7
1 | radio.weblogs.com | 199 | 88
2 | blog.livedoor.jp | 222 | 65
3 | blogs.sun.com | 318 | 26
4 | postsecret.blogspot.com | 348 | 22
5 | blog.searchenginewatch.com | 1204 | 4
6 | www.blogathon.org | 308 | 28
7 | atrios.blogspot.com | 302 | 30
8 | blogs.salon.com | 413 | 17
9 | www.problogger.net | 523 | 12
10 | doc.weblogs.com | 480 | 14
11 | www.blogherald.com | 519 | 12
12 | profiles.blogdrive.com | 253 | 47
13 | powerlineblog.com | 290 | 32
14 | www.bloggingbaby.com | 258 | 45
15 | googleblog.blogspot.com | 444 | 15
16 | americaninlebanon.blogspot.com | 1000 | 6
17 | blogs.guardian.co.uk | 1431 | 4
18 | badhairblog.blogspot.com | 496 | 13
19 | hurryupharry.bloghouse.net | 995 | 6
20 | www.captainsquartersblog.com | 267 | 40

A point of interest in the results is that professional blogs, such as Sun's blog or Google's blog, are ranked high when the global rating is computed from accumulative local blog site ratings, although they receive few links in a single day (and thus rank low when only a single day's links are used). This is because such blogs receive links on a daily basis, are the sole target of the linking post each time (in contrast to spam blogs), and are pointed to by highly rated blogs.

Table 5 The effect of memory size in spam blog global ranking

Memory | Inlinks | 1 | 3 | 5 | 7
Global ranking for the first set of spam blogs (first-last in the set) | 47-292 | 6298-12250 | 8207-19672 | 12157-21587 | 15019-23126
Best position for a spam blog | 47 | 6298 | 7124 | 6699 | 7882

In a third set of experiments, we examine the ranking of the 53383 blogs of our part of the blogosphere on the 14th of July (this date was selected because it lies in the middle of the period examined), using five different values for the system's memory: a) for memory equal to zero, only the inlinks created on the specific date affect the rating; b) for m = 1, 3, 5, 7, we take into account the postlinks provided at most m days before the 14th of July. We manually examine the set of results to find the position of the first spam blog in the global ranking of blogs. As shown in Table 5, the highest-ranked member of a set of spam blogs that occupy positions 47 to 292 when ranked by inlinks falls below the 6298th position when the local ratings of the current day are employed in the calculation of the global rating. It falls even lower for larger values of m, although the change is smaller.


6 Conclusions

This work presented an iterative collaborative process that provides a global rating for a set of blogs using local rating information expressed via blogroll and post hyperlinks. The rating model is mathematically formulated, comprising local and local accumulative blog site rating formation (where the accumulative rating is calculated considering the local rating as estimated over different consecutive time periods), collaborative local blog site rating formation (where the evaluator blog site exploits information gathered from other affiliated witness blog sites) and global rating formation, incorporating the view of all blog sites in the system. Our model exploits two special features of the blogosphere: a) the difference between blogroll links, which denote a more permanent trust towards the blog being pointed to, and post links, which represent a more transient reference to a blog, and b) the timestamp information of a post, which can be employed as a timestamp for a hyperlink. Additionally, a suggestion is made on the semantics that can be attached to each blog.

An initial experimental evaluation shows that the model performs well, penalizing spam blogs that receive many links from a single source and favouring blogs that receive inlinks on a regular basis. The next steps of this work are to develop the architecture and the system entities that estimate and attach the rating information to blogs, and that process local ratings periodically in order to update collaborative and global ratings. Future work also includes the incorporation of possible negative postlink recommendations.

References

1. Blogpulse. Automated trend discovery system for blogs (2005), http://blogpulse.com/ (accessed May 2009)

2. Technorati, Blog tracking service (2005), http://technorati.com/ (accessed May 2009)

3. Mishne, G.: Information Access Challenges in the Blogspace. In: IIIA-2006: International Workshop on Intelligent Information Access, Helsinki, Finland (2006)

4. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Now Publishers (July 2008) ISBN 978-1-60198-150-9

5. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project (1998)

6. Haveliwala, T.: Topic-sensitive PageRank. In: Proceedings of the Eleventh International World Wide Web Conference, Honolulu, Hawaii, May 2002, pp. 517–526 (2002)

7. Gyöngyi, Z., Garcia-Molina, H., Pedersen, J.: Combating web spam with TrustRank. In: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), Toronto, Canada, September 2004, pp. 271–279 (2004)

8. Benczur, A.A., Csalogany, K., Sarlos, T., Uher, M.: SpamRank - fully automatic link spam detection. In: Proceedings of the First International Workshop on Adversarial Information Retrieval on the Web, AIRWeb (2005)


9. Massa, P., Hayes, C.: Page-rerank: using trusted links to re-rank authority. In: Proceedings of Web Intelligence Conference, France (September 2005)

10. Jeh, G., Widom, J.: Scaling personalized web search. In: Proceedings of the Twelfth International World Wide Web Conference, Budapest, Hungary, May 2003, pp. 271–279 (2003)

11. Technorati.com. VoteLinks, http://developer.technorati.com/wiki/VoteLinks

12. Technorati.com. XFN (Xhtml Friends Network), http://gmpg.org/xfn/

13. Varlamis, I., Vazirgiannis, M.: Web Document Searching Using Enhanced Hyperlink Semantics Based on XML. In: Proceedings of the International Database Engineering & Applications Symposium (IDEAS 2001), pp. 34–43 (2001)

14. Nakajima, S., Tatemura, J., Hino, Y., Hara, Y., Tanaka, K.: Discovering Important Bloggers based on Analyzing Blog Threads. In: 2nd Annual Workshop on the Blogging Ecosystem: Aggregation, Analysis and Dynamics, WWW 2005 (2005)

15. Kritikopoulos, A., Sideri, M., Varlamis, I.: BlogRank: ranking blogs based on connectivity and similarity features. In: Proceedings of the 2nd International Workshop on Advanced Architectures and Algorithms for Internet Delivery and Applications (AAA-IDEA 2006), Pisa, Italy, October 10, vol. 198. ACM, New York (2006), http://doi.acm.org/10.1145/1190183.1190193

16. Adar, E., Zhang, L., Adamic, L., Lukose, R.: Implicit Structure and the Dynamics of Blogspace. In: Workshop on the Blogging Ecosystem: Aggregation, Analysis and Dynamics, WWW 2004 (2004)

17. Amitay, E., Carmel, D., Herscovici, M., Lempel, R., Soffer, A.: Trend Detection Through Temporal Link Analysis. Journal of the American Society for Information Science & Technology 55(14), 1270–1281 (2004)

18. Bar-Yossef, Z., Broder, A., Kumar, R., Tomkins, A.: Sic Transit Gloria Telae: Towards an Understanding of the Web’s Decay. In: Proceedings of the 13th International Conference on World Wide Web, pp. 328–337 (2004)

19. Berberich, K., Vazirgiannis, M., Weikum, G.: Time-Aware Authority Ranking. Internet Mathematics Journal 2(3) (2005)

20. Yu, P.S., Li, X., Liu, B.: On the Temporal Dimension of Search. In: Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters, pp. 448–449. ACM Press, New York (2004)


Simulation-Based UMTS e-Learning Software

Florin Sandu and Szilárd Cserey

Abstract. This chapter describes a soft-switch-based mobile network simulation developed for the purpose of e-Learning.

The subject of the simulation is a 3GPP Release 4 (R4) mobile communication network. Two types of scenarios are simulated: MOC (Mobile-Originated Call) and MTC (Mobile-Terminated Call).

The simulator can send real, pre-recorded H.248 and RANAP (Radio Access Network Application Protocol) messages to a loop-back network interface. This interface can be monitored with software such as Wireshark / Ethereal, so the messages can be decoded and clearly interpreted. This is of great benefit to professors, enabling them to better explain and show students how specific mobile network elements behave, and what happens in the phases of call establishment, call processing and call control.

It is educational software that allows university students and employees of companies in the mobile communication field to learn, understand and study the processes, events and flows that occur in typical UMTS call establishment and call control.

1 Introduction

The Release 4 architecture of the 3rd Generation Partnership Project (3GPP) adds two new elements to traditional mobile network architectures: the mobile switching center server (MSC server) and the media gateway (MGW). These network elements communicate through the so-called “Mc interface” [1, 2, 3].

This architecture gives mobile operators more options, because it separates the call control functionality of the mobile network from its call switching functionality. As a result, the architecture becomes much more flexible and the network becomes more scalable.

For the e-Learning of such modern and complex telecom architectures, the authors integrated and completed a software environment for simulating and monitoring specific call control and switching functions. The environment visualizes the messaging throughout the network and the signaling between network entities.

Florin Sandu, “Transilvania” University, Bd Eroilor Nr.29A, 500036 – Brasov, Romania, Tel.: +40268478705, [email protected]

Szilárd Cserey Siemens Program and System Engineering, Str. M. Kogalniceanu Nr.21 Bl.C6, 500090 – Brasov, Romania Tel.: +40765300301 [email protected]


The GUI of this application allows trainees to perform detailed studies in the same way as in protocol analysis. Protocol analysis is one of the most important and resource-intensive tasks in the “real world” of telecom laboratories.

2 The Simulator as e-Learning Software

2.1 Motivation

The e-Learning package was built on top of the OMNeT++ discrete event simulator and was written in C++, using the OMNeT++ Application Programming Interfaces (APIs) [4].

To implement our e-Learning concept, we chose UMTS R4, an architecture that had not been implemented in any other mobile network simulator.

The simulator not only shows the message flows between specific network elements, but also creates realistic packets that can be viewed with protocol monitoring tools such as Wireshark / Ethereal. This is the emulation part of the software: a tool that “generates” realistic RANAP and H.248 protocol messages / packets used on the Mc interface of the R4 network [5].

The messages are not generated the way real network emulators generate them. Instead, they are extracted from pre-recorded real trace files and rebuilt to form new packets.

A very important feature of the simulator, and one of great benefit for e-Learning, is that it can be run step by step. This lets the trainee analyze thoroughly, and the trainer explain in more detail, every message that is sent and received [14]. The simulation can be viewed from two perspectives:

- The architectural perspective shows which type of message was exchanged between which types of network nodes. This view is specific to the OMNeT++ simulator.
- The protocol-monitoring perspective uses Wireshark / Ethereal or similar tools. Here every packet can be analyzed in great detail, bit by bit, showing which protocol is used on each OSI layer and which specific parameters it carries.

2.2 How the Simulation Environment Was Created

The software was created using the OMNeT++ simulation engine and the OMNeT++ APIs. We developed seven new models for OMNeT++, each implementing the behavior of a specific network element. For a complete simulation of the UMTS R4 mobile network, we had to create models for the following network elements:

- User Mobile Equipment
- Node B
- Radio Network Controller
- Circuit Switched Media Gateway
- Mobile Switching Center Server
- Gateway Mobile Switching Center Server
- a generic model representing the PSTN [1, 4].

The simulation works only at the message-exchange level: every node can receive specific messages and respond to specific requests. We put the emphasis on messages, message exchange and detailed call flows.

This was required by our specific “reverse engineering” approach. We introduced into the simulation realistic packets that had been monitored in a real, operational, state-of-the-art industrial 3G network. These packets can be “monitored” and analyzed with the very popular Wireshark / Ethereal protocol monitor [5].


We enhanced the simulation by collecting realistic messages that are difficult to synthesize off-line. These packets were pre-recorded by Tektronix K1205 and K1297 protocol analyzers in the laboratories of Siemens Program and System Engineering Romania. In this way, trainees are shown complex packets generated by real mobile communication equipment, real Circuit Switched Media Gateways and real Mobile Switching Center Servers.

We wrote C++ software that rebuilds the messages from these real packets. The simulation environment runs on two planes:

- the OMNeT++ simulation;
- the packet generator software, which sends real packets, simultaneously with the simulation, to a virtual loop-back adapter monitored by Wireshark / Ethereal (see figures 6, 7 and 8).

The packet generator is controlled by the OMNeT++ simulation [4, 5, 6].
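The rebuilding step can be illustrated with a minimal C++ sketch. It assumes, purely for illustration, that a pre-recorded message is stored as a byte template. It also assumes that "rebuilding" means overwriting a 32-bit big-endian identifier at a known offset. `RecordedMessage`, `rebuild` and the offset field are hypothetical names, not taken from the authors' tool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical representation of a message cut out of a pre-recorded trace.
struct RecordedMessage {
    std::string name;                 // label only, e.g. "H.248 AddReq"
    std::vector<std::uint8_t> bytes;  // raw payload as captured
    std::size_t idOffset;             // assumed position of a 32-bit identifier
};

// Copies the template and writes a new 32-bit identifier in network byte
// order, so each replayed packet looks like a fresh message of the same type.
std::vector<std::uint8_t> rebuild(const RecordedMessage& rec, std::uint32_t newId) {
    if (rec.idOffset + 4 > rec.bytes.size()) {
        throw std::out_of_range("identifier field lies outside the template");
    }
    std::vector<std::uint8_t> out = rec.bytes;  // the recording itself stays intact
    out[rec.idOffset + 0] = static_cast<std::uint8_t>(newId >> 24);
    out[rec.idOffset + 1] = static_cast<std::uint8_t>(newId >> 16);
    out[rec.idOffset + 2] = static_cast<std::uint8_t>(newId >> 8);
    out[rec.idOffset + 3] = static_cast<std::uint8_t>(newId);
    return out;
}
```

Copying the template keeps the original recording reusable for later calls of the same scenario.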

3 The UMTS R4 Architecture

With UMTS Release 4 (R4), the architecture of the circuit-switched domain of the core network was radically revised. Circuit traffic is delivered over an internal packet-switched Internet Protocol (IP) network, and connections to external networks are handled via media gateways (MGW). The architecture of an R4 network is shown in figure 1.

The architecture of the CS (Circuit-Switched) core is described in the 3GPP TS 23.205 specification, “Bearer-independent circuit-switched core network”. It is termed bearer-independent because the core network can use asynchronous transfer mode (ATM) or IP, with many different Layer 2 options. Traffic entering or leaving the circuit-switched domain is controlled by the MGW [1, 2, 3].

The MGW is responsible for switching traffic within the core network domain. It also translates data between the packet-based format used within the core network and the circuit-switched data transmitted on the external PSTN or ISDN network.

The MGW is controlled by the mobile switching center (MSC) server, which sends it control commands, for example to establish the bearers that carry calls across the core network. The user data (i.e. voice traffic) within the CS-CN (Circuit Switched – Core Network) domain can be carried in ATM cells (ATM adaptation layer 2, AAL2) or in IP packets [1, 9, 10].

Fig. 1 The architecture of the Release 4 UMTS mobile communication network (control layer: MSC-S/VLR servers and HLR; transport layer: MGWs connecting UTRAN, GERAN and the PSTN; interfaces Mc, Nc, Nb, Iu-cs, A, D and CAP)

4 Implementation of the Simulator Based on OMNeT++

As already mentioned, the e-Learning package was built on top of the OMNeT++ simulator. The code was written in C++, using various OMNeT++ APIs that allow integration with the simulation engine. OMNeT++ is an object-oriented, modular, discrete-event network simulator. It can be used for traffic modeling of telecommunication networks, for protocol modeling and for other network-related simulations [4].

Our development can be considered mainly “modeling”: we created models for each element of an R4 UMTS network and implemented their behavior from the perspective of call flows. The models have been implemented and tailored to support two kinds of test scenarios, a Mobile-Originated Call (MOC) scenario and a Mobile-Terminated Call (MTC) scenario. These are the most basic scenarios and the ones that occur most frequently in a mobile communication network.

Fig. 2 The Network Editor

The simulation/emulation software consists of two parts. The first part is the simulation software, which contains the network topology description file and the simulation files of every model:

- mobile equipment
- Node B
- RNC
- Circuit-Switched Media Gateway
- MSC-Server
- GMSC-Server
- PSTN network

The second part is the emulation software that generates and sends RANAP and H.248 messages to the virtual loop-back network interface.

In the OMNeT++ simulator, every model has to be described by at least one class derived from the cSimpleModule class.

The behavior of a model is associated with a generic state machine, which describes the states of a specific network node.

Every network node has an initialization state (INIT), where its variables are initialized. After this state, the node enters the waiting state (IDLE), where it waits for incoming messages. Nodes become active when they receive a message.

Fig. 3 The class hierarchy of the models (UE, NodeB, RNC, MGW, MSC_S, GMSC_S and PSTN, each derived from cSimpleModule)

Fig. 4 The state machine associated with the network model (states INIT, IDLE, MSG RECV, MSG DISCR, MSG SEND and PACK)


Every event in the simulated network must be caused by the sending and receiving of a particular message. For the simulation to begin, a node in the initialization state must create a particular message and send it to an appropriate node. There are two kinds of messages:

- messages sent by one node (module) to another node (module);
- self-messages, which are used to implement counters and wake-up impulses.

Every message represents a specific event, and events have to happen in a specific order. For this reason, messages are placed in a waiting queue.
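The ordering of messages can be sketched with a small event queue in plain C++. This is not the OMNeT++ scheduler. The sketch assumes only that each event carries a delivery time and a target node, and that events with equal times are delivered in insertion order. `Event`, `EventQueue` and their fields are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// One scheduled event: a message between two nodes, or a self-message
// (counter / wake-up impulse) addressed to the node that scheduled it.
struct Event {
    double time;        // simulated delivery time
    std::uint64_t seq;  // insertion counter, breaks ties between equal times
    std::string from;
    std::string to;
    std::string name;
    bool isSelf() const { return from == to; }
};

// Makes std::priority_queue return the earliest (time, seq) pair first.
struct Later {
    bool operator()(const Event& a, const Event& b) const {
        if (a.time != b.time) return a.time > b.time;
        return a.seq > b.seq;
    }
};

class EventQueue {
public:
    void schedule(double time, std::string from, std::string to, std::string name) {
        q_.push(Event{time, next_++, std::move(from), std::move(to), std::move(name)});
    }
    bool empty() const { return q_.empty(); }
    // Removes and returns the next event to be processed.
    Event pop() {
        Event e = q_.top();
        q_.pop();
        return e;
    }
private:
    std::priority_queue<Event, std::vector<Event>, Later> q_;
    std::uint64_t next_ = 0;
};
```

The sequence counter makes the order deterministic when two messages are scheduled for the same simulated time. This matters for a step-by-step teaching tool.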

Another state is the message-receiving state (MSG RECV). The message may come from another node, or it may be a self-message. After this state, the node enters the message-analysis state, MSG DISCR (message discrimination).

The network node may or may not send a response message to the sender. If it decides to respond, it enters the MSG SEND state.

The last state is the PACK state, in which the node invokes the emulation software. This software creates realistic RANAP and H.248 protocol messages and sends them to a virtual loop-back network adapter provided by a free tool called VirtNet.

This interface can be monitored with the Wireshark / Ethereal network analyzer. RANAP messages are used by the MSC Server to communicate with the Radio Network Controller. Because the MSC Server is not directly interconnected with the RNC, these messages are forwarded to the RNC by the Media Gateway. RANAP is carried over the SIGTRAN protocol stack, and the Media Gateway contains a Signaling Gateway part whose role is to forward SIGTRAN messages.

The SIGTRAN (Signaling Transport) protocol stack adapts SS7 signaling to IP, so that SS7 protocol messages can be transmitted over IP networks.

The H.248 / MEGACO (Media Gateway Control) protocol carries notification and control messages between the MSC Server and the Media Gateway. The MSC Server acts as the master (controller), and the Media Gateway acts as the slave.
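The node states described above (INIT, IDLE, MSG RECV, MSG DISCR, MSG SEND and PACK) can be sketched as a plain C++ state machine. In the real tool each model derives from OMNeT++'s `cSimpleModule`. Here a hypothetical `NodeModel` class stands in for it. The `Responder` callback is a placeholder for the node-specific discrimination logic. The packet-generator callback is a placeholder for invoking the emulation software. Entering PACK after every sent reply is a simplifying assumption.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

enum class State { Init, Idle, MsgRecv, MsgDiscr, MsgSend, Pack };

// Hypothetical stand-in for a network-node model; not the OMNeT++ API.
class NodeModel {
public:
    // Decides whether (and with what) to answer a received message.
    using Responder = std::function<std::optional<std::string>(const std::string&)>;

    explicit NodeModel(Responder respond) : respond_(std::move(respond)) {
        state_ = State::Init;      // variables are initialized here
        trace_.push_back(state_);
        enter(State::Idle);        // then wait for incoming messages
    }

    // One pass through the state machine for a received message; returns the
    // reply, if any. `toPacketGenerator` models the PACK step.
    std::optional<std::string> receive(const std::string& msg,
                                       const std::function<void(const std::string&)>& toPacketGenerator) {
        enter(State::MsgRecv);
        enter(State::MsgDiscr);
        std::optional<std::string> reply = respond_(msg);
        if (reply) {
            enter(State::MsgSend);
            enter(State::Pack);
            toPacketGenerator(*reply);  // emulation part: build the real packet
        }
        enter(State::Idle);
        return reply;
    }

    const std::vector<State>& trace() const { return trace_; }

private:
    void enter(State s) {
        state_ = s;
        trace_.push_back(s);
    }
    State state_;
    Responder respond_;
    std::vector<State> trace_;
};
```

Keeping the visited states in a trace mirrors the step-by-step use of the simulator: a trainee can replay which states a node passed through for each message.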

The functionality of the nodes can be described with SDL diagrams. SDL, the “Specification and Description Language”, is used to describe the behavior of communication systems. It is standardized by the ITU-T in Recommendation Z.100.

Figure 5 shows the SDL diagram that describes the functionality of the Radio Network Controller (RNC).
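One reading of the RNC diagram in figure 5 can be written as a C++ dispatch function. The mapping below interprets the diagram's decision boxes ("RAB ASSIGNMENT REQ?", "Iu Release Command?", "from NodeB?", "from MGW?"); it is not the authors' source code. The `Incoming` and `Action` types are hypothetical. Relaying non-data messages from the NodeB to the MSC-S is an assumption not visible in the diagram.

```cpp
#include <cassert>
#include <string>

// Hypothetical message descriptor seen by the RNC model.
struct Incoming {
    std::string from;     // "NodeB", "MGW" or "MSC-S"
    std::string name;     // e.g. "RAB ASSIGNMENT REQ"
    bool isData = false;  // user-plane data rather than signaling
};

// What the RNC does in response: destination and message name.
struct Action {
    std::string to;       // empty means: go back to WAIT FOR MESSAGE
    std::string name;
};

Action rncDiscriminate(const Incoming& m) {
    if (m.name == "RAB ASSIGNMENT REQ") return {"MSC-S", "RAB ASSIGNMENT RESP"};
    if (m.name == "Iu Release Command") return {"MSC-S", "Iu Release Complete"};
    if (m.from == "NodeB") {
        // Uplink user data goes to the MGW as IuUP Data; other messages are
        // assumed (not shown in the diagram) to be relayed to the MSC-S.
        return m.isData ? Action{"MGW", "IuUP Data"} : Action{"MSC-S", m.name};
    }
    if (m.from == "MGW") return {"NodeB", m.name};  // downlink: relay to NodeB
    return {"", ""};
}
```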

The network nodes behave and communicate with each other as described in the 3GPP TS 23.205 specification. That specification contains the call flows for MOC (Mobile-Originated Call) and MTC (Mobile-Terminated Call) [1].

Figure 6 gives a generic illustration of the simulation environment. When the simulation is started from the OMNeT++ engine, packets start to circulate in the network. Whenever messages appear on the Mc interface, a packet generator is triggered. It automatically generates the corresponding real packets and sends them to the VirtNet loop-back adapter, where Wireshark captures them in real time [6].
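The coupling between the simulation and the packet generator can be sketched as follows. The sketch assumes that every simulated message is tagged with the interface it crosses, and that the generator holds a map from message names to pre-recorded packets. `PacketGenerator` and `onSimulatedMessage` are hypothetical names. The `Sender` callback stands in for writing to the VirtNet loop-back adapter.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Packet = std::vector<std::uint8_t>;

class PacketGenerator {
public:
    using Sender = std::function<void(const Packet&)>;  // e.g. write to loop-back

    explicit PacketGenerator(Sender send) : send_(std::move(send)) {}

    // Registers a packet extracted from a real trace under a message name.
    void addRecording(const std::string& name, Packet bytes) {
        recordings_[name] = std::move(bytes);
    }

    // Called by the simulation for every delivered message. Only messages on
    // the Mc interface with a matching recording produce a real packet.
    bool onSimulatedMessage(const std::string& iface, const std::string& name) {
        if (iface != "Mc") return false;
        auto it = recordings_.find(name);
        if (it == recordings_.end()) return false;
        send_(it->second);
        return true;
    }

private:
    Sender send_;
    std::map<std::string, Packet> recordings_;
};
```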


Fig. 5 The SDL diagram which describes the functionality of RNC (WAIT FOR MESSAGE; decisions “RAB ASSIGNMENT REQ?”, “Iu Release Command?”, “from NodeB?”, “from MGW?”, “Data?”; outputs RAB ASSIGNMENT RESP and Iu Release Complete to the MSC-S, IuUP Data to the MGW, messages to the NodeB; CHECK AND CAST voice / signaling; NOTIFY)

Fig. 6 The generic architecture of the e-Learning environment (the OMNeT++ simulation sends commands to the packet generator, which sends H.248 / RANAP packets through the operating system to the ViRTNeT virtual loop-back adapter, where Wireshark captures them)


Figure 7 shows a snapshot of the running simulation.

Fig. 7 Snapshot of the OMNet++ simulation

Figure 8 shows how packets are captured and decoded by Wireshark / Ethereal.

Fig. 8 Packets captured and decoded by Wireshark / Ethereal


5 Case Study: The MOC Scenario

Figures 9a–9d (call flows) and 10–12 (simulation) illustrate a Mobile-Originated Call. The call begins with a SETUP message sent by the mobile user equipment (UE) to the MSC Server. The MSC Server responds with a CALL PROCEEDING message. It then sends an IAM (Initial Address Message) to the GMSC Server, which is connected to the Public Switched Telephone Network (PSTN). In this message the MSC Server indicates that forward bearer establishment will be used, and it passes to the GMSC Server all the information about the bearer characteristics [1, 7, 8].

The MSC Server will choose a Media Gateway (MGW) to establish the bearer for the call. The IAM message also contains the Media Gateway identifier. The IAM message is part of the BICC (Bearer Independent Call Control) protocol.

The GMSC Server decides whether the call must be forwarded to the PSTN network. It commands the MGW under its control to make an association between the IP network (on the Core Network side) and the PSTN network.

To achieve this, the Media Gateway connected directly to the PSTN network creates two terminations. In the call flow of figure 9a, these are T3 and T4. First the GMSC Server sends the “ADD.request($)” command, asking the Media Gateway to create a new context, choose a termination and add it to the newly created context. The $ character is a so-called “wildcard”; here it is the “CHOOSE” wildcard, which tells the Media Gateway to choose a termination. The Media Gateway responds with the “ADD.reply(T4)” message, which contains the IDs (identifiers) of the context and of the termination.

The T3 termination is created in the same way (ADD.request($) and ADD.reply(T3)).
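The handling of the two ADD.request($) commands can be sketched in C++. This is a simplified model, not an H.248 stack. The `MediaGateway` class, its sequential numbering of contexts and terminations, and the use of `std::nullopt` for the $ context are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <utility>

using ContextId = std::uint32_t;
using TerminationId = std::uint32_t;

class MediaGateway {
public:
    // ADD: an empty context ID plays the role of the CHOOSE wildcard ($) and
    // makes the gateway create a new context; the gateway always chooses the
    // termination ID. Returns the (context, termination) pair carried back in
    // ADD.reply. No validation of an explicitly given context in this sketch.
    std::pair<ContextId, TerminationId> add(std::optional<ContextId> ctx) {
        ContextId c = ctx ? *ctx : nextContext_++;
        TerminationId t = nextTermination_++;
        contexts_[c].insert(t);
        return {c, t};
    }

    // Number of terminations currently in context `c`.
    std::size_t size(ContextId c) const {
        auto it = contexts_.find(c);
        return it == contexts_.end() ? 0 : it->second.size();
    }

private:
    ContextId nextContext_ = 1;  // 0 is kept free for the null context
    TerminationId nextTermination_ = 1;
    std::map<ContextId, std::set<TerminationId>> contexts_;
};
```

In the MOC flow, the first ADD creates the context together with the PSTN-side termination. The second ADD reuses the returned context ID so that the Core Network-side termination joins the same context.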

The T3 termination resides on the Core Network side and T4 on the other side, towards the PSTN network. The T4 termination is used to create a bearer to the PSTN network.

The GMSC Server forwards the IAM message to the PSTN network and sends a response back to the MSC Server. This response is an APM (Application Transport Message, one of the Bearer Information Messages) containing information about the bearer's characteristics. After the MSC (Mobile Switching Center) Server has received the APM message, it uses the “Establish Bearer Procedure”. With it, the MSC Server asks the Media Gateway under its control to create a bearer to the remote Media Gateway that is directly connected to the PSTN network.

Together with this request, the MSC Server sends information obtained from the previously received APM message, such as the “bearer address”, the “binding reference” and the “bearer characteristics”.


Fig. 9a The sequence of call flows in a Mobile Originated Call scenario – part 1. Participants: UE, RNC, NodeB, MSC-S, MGW-1, MGW-2, GMSC-S, PSTN. Messages and procedures shown:
- NAS SETUP and CALL PROCEEDING
- BICC IAM
- H.248 ADD.request($)/ADD.reply(T4) and ADD.request($)/ADD.reply(T3)
- ISUP IAM
- BICC APM (Bearer Information Message)
- H.248 ADD.request($)/ADD.reply(T2) and ADD.request($)/ADD.reply(T1)
- bearer establishment
- Establish Bearer, Prepare Bearer and Change Through-Connection procedures



Fig. 9b The sequence of call flows in a Mobile Originated Call scenario – part 2. Messages shown:
- RANAP RAB Assignment Request/Response
- IuUP Init/Init Ack (IuUP initialization)
- NbUP Init/Init Ack (NbUP initialization)
- BICC CONTINUITY
- ISUP and BICC ACM (Address Complete Message)
- NAS ALERTING
- ANM



Fig. 9c The sequence of call flows in a Mobile Originated Call scenario – part 3. Messages and procedures shown:
- H.248 MOD.request/MOD.reply for T3 and T4
- ANM (Answer Message)
- H.248 MOD.request/MOD.reply for T1 and T2
- NAS CONNECT and CONNECT ACKNOWLEDGE
- communication
- NAS DISCONNECT
- BICC RELEASE
- Change Through-Connection, Activate Inter-Working Function and Activate Voice Processing Function procedures



Fig. 9d The sequence of call flows in a Mobile Originated Call scenario – part 4. Messages and procedures shown:
- ISUP Release/Release Complete
- NAS RELEASE/RELEASE COMPLETE
- RANAP Iu Release Command/Iu Release Complete
- H.248 SUB.request/SUB.reply for T3, T4 and T1
- BICC Release Complete
- H.248 MOD.request/MOD.reply and SUB.request/SUB.reply for T2
- bearer release
- Release Termination and Release Bearer + Change Through-Connection procedures


Fig. 10 Mobile Originated Call scenario – simulation setup

Fig. 11 Mobile Originated Call scenario – simulation running


Fig. 12 Mobile Originated Call scenario – packets captured by Wireshark


The bearer between the two Media Gateways is established using the “ADD.request($)” and “ADD.reply(T2)” messages.

Note that the T2 termination resides on the Core Network side. When the bearer is created, a connection is established between the T2 termination on the local Media Gateway and the T3 termination on the remote Media Gateway.

At this point, one bearer exists between the two Media Gateways (local and remote), and another between the remote Media Gateway and the PSTN network. Next comes the “Prepare Bearer Procedure”, which creates a bearer to the UMTS Radio Access Network (UTRAN).

The MSC Server chooses the characteristics of the bearer. Using the “Prepare Bearer Procedure”, it asks the Media Gateway to prepare for the access bearer assignment. This procedure is accomplished with the “ADD.request($)” and “ADD.reply(T1)” commands, where T1 is the termination connected to the Radio Access Network.

In this request, the MSC Server asks the Media Gateway to return the “bearer address” and the “binding reference”. It also supplies the bearer characteristics and asks to be notified whether the bearer characteristics can be modified.

For voice calls, the MSC Server sends the Media Gateway information about voice encoding. For data calls, it sends information about the “PLMN Bearer Capability”.

The Media Gateway creates the T1 termination and adds it to the context. It then sends a response back to the MSC Server containing the IDs of the context and of the termination, together with the IP address and port number of the termination.

After the Media Gateway has responded with the “bearer address” and “binding reference” information, the MSC Server asks the RNC (Radio Network Controller) to allocate the access bearer. It does so by sending the RNC the “RAB Assignment Request” command, which also contains the “bearer address” and the “binding reference”.

At a later phase, the Media Gateway notifies the MSC Server whether the bearer's characteristics can be modified. This procedure is called the “Bearer Modification Support Procedure”.

Next, the user plane is initialized. The user plane consists of the Iu UP and Nb UP protocols of the Iu and Nb interfaces. The Iu interface lies between the RNC and the Media Gateway, and the Nb interface lies between two Media Gateways.

The Nb UP and Iu UP protocols are set to work in “forward bearer establishment” mode. The Media Gateway knows that forward bearer establishment is used from the information previously sent by the MSC Server in the “Establish Bearer” and “Prepare Bearer” procedures.

After the radio access bearer assignment, the MSC Server sends a CONTINUITY message to the GMSC Server to acknowledge the assignment of the radio access bearer. At the very beginning of call processing, when the IAM message was sent, the MSC Server had already told the GMSC Server that a CONTINUITY message would follow. This shows that “late access bearer assignment” is not used. (“Late access bearer assignment” means that the bearer assignment is done after the “alerting” and “answer” messages have been sent.)

The called party sends an ACM (Address Complete Message) towards the mobile network. The ACM is forwarded to the local MSC Server, which sends an “ALERTING” message to the calling party (the party that started the call). At this stage the called party's phone rings, and a ringing tone is also played to the calling party. When the called party answers, an ANM (Answer Message) is sent to the local MSC Server.

On receipt of the ANM message, the terminations are interconnected at both the local and the remote Media Gateways using the “Change Through-Connection” procedure. After this phase, data packets can travel through the terminations in both directions. This procedure uses the “MOD.request” and “MOD.reply” commands.

The same commands are also used to perform the “Activate Inter-Working Function” and “Activate Voice Processing Function” procedures.

The “Inter-Working Function” is used for data calls, and the “Voice Processing Function” for voice calls. The Voice Processing Function runs in the Media Gateway and ensures the acoustic quality of the voice. For data calls, this feature is deactivated.
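The effect of the Change Through-Connection procedure can be sketched with the stream modes that H.248 defines for terminations. In those modes, "send" and "receive" are seen from the outside of the context. The sketch assumes that the through-connection state is expressed through these modes. The helper names `receives`, `sends`, `flows` and `bothWay` are hypothetical.

```cpp
#include <cassert>

// H.248 stream modes; "send" and "receive" are relative to the exterior of the context.
enum class Mode { SendOnly, RecvOnly, SendRecv, Inactive, LoopBack };

bool receives(Mode m) { return m == Mode::RecvOnly || m == Mode::SendRecv; }
bool sends(Mode m) { return m == Mode::SendOnly || m == Mode::SendRecv; }

// Can media entering termination `a` from outside leave through termination
// `b`, when both are in the same context?
bool flows(Mode a, Mode b) { return receives(a) && sends(b); }

// Through-connected in both directions, as required once the call is answered.
bool bothWay(Mode a, Mode b) { return flows(a, b) && flows(b, a); }
```

Before the answer, a termination left inactive blocks the path. After the MOD commands, both terminations in SendRecv let packets travel in both directions.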

The MSC Server sends the calling party a “CONNECT” message to signal that the call has been connected successfully. The mobile user equipment (UE) responds with a “CONNECT ACKNOWLEDGE” message.

From this point, the conversation between the two parties can start.

The caller terminates the call as follows:

1. The mobile user equipment sends a “DISCONNECT” message to the MSC Server.
2. The MSC Server sends two “RELEASE” messages, one to the user equipment and one to the GMSC Server.
3. The GMSC Server in turn sends a “RELEASE” message to the PSTN network.
4. After its resources have been released, the PSTN network responds with a “RELEASE COMPLETE” message.
5. The GMSC Server also releases its resources: it either deletes the terminations or keeps them by returning them to the null context, and it deletes the context.

The resources at the remote Media Gateway are released at both terminations, on the PSTN side and on the Core Network side. The GMSC Server then sends a “RELEASE COMPLETE” message to the MSC Server.

While these operations are executed at the GMSC Server, the MSC Server also frees the allocated resources. It commands the RNC to delete the radio access bearer with the “Iu Release Command” message, and the RNC responds with the “Iu Release Complete” message. This procedure is called the “Release Bearer Procedure”.

After this event, the T1 termination is deleted. This termination was allocated on the UTRAN (UMTS Terrestrial Radio Access Network) side.

This operation is executed using the “SUB.request( T1 )” command and the procedure is called the “Release Termination Procedure”.

The T2 termination, allocated on the Core Network side, is deleted as soon as the MSC Server has received the “Release Complete” message from the GMSC Server.

First, the connection between the terminations is released with the “Change Through-Connection Procedure”, using the “MOD.request(T2)” command. Immediately afterwards, the T2 termination is removed with the “SUB.request(T2)” command. This procedure is also called the “Release Termination Procedure”.

Figure 13 presents an MTC (Mobile-Terminated Call) scenario, with the call arriving from the PSTN.

Fig. 13 Mobile Terminated Call scenario

6 Interpreting Trace Files. RANAP and H.248 Messages

The simulation environment generates two types of packets: RANAP and H.248. RANAP is the protocol that ensures communication between the MSC Server and the Radio Network Controllers. It is used in UMTS signaling between the Core Network and the Radio Access Network. RANAP is carried over the Iu interface, which directly connects the RNCs (Radio Network Controllers) to the CN (Core Network) [8, 12, 13].

RANAP is mainly used for tasks such as relocation, Radio Access Bearer management and paging. It also transports signaling messages between the UE (User Equipment) and the Core Network; this is called non-access stratum (NAS) signaling.

The call setup in a MOC scenario begins with the SETUP message which is sent by a mobile equipment to the core network.

RANAP implements the following functions:

• Relocation – including SRNS (Serving Radio Network Subsystem) relocation and hard handover

• RAB (Radio Access Bearer) Management – handling Radio Access Bearers through operations such as RAB set-up (with optional queuing of the set-up), modification of RAB characteristics and clearing of an existing RAB

• Iu Release – releasing all resources (control and user plane) of a specific Iu instance related to a certain UE

• Report Unsuccessfully Transmitted Data

• Common ID Management – sending the permanent identity of the UE from the CN to the UTRAN

• Paging – the CN pages an idle UE in order to establish a call with it

• Management of tracing

• UE-CN signaling transfer

• Security Mode Control

• Management of overload

• Reset

• Location Reporting

Fig. 14 Call initiation using SETUP

As figure 14 shows, the MOC begins with a SETUP message, which is answered with a CALL PROCEEDING message.

Figure 15 shows the protocol stack of the Iu interface. This stack is important to know, because it is needed to interpret the protocol dissection in Wireshark correctly.


Fig. 15 The protocol stack of the Iu interface

This is how the SETUP message looks after it has been captured and dissected by Wireshark. Its structure is almost the same as the stack in figure 15. The difference is that it is carried in a simple Ethernet frame rather than over ATM.

Fig. 16 Wireshark dissection of the SETUP message - details on RANAP

In this case, the role of RANAP is to carry the signaling message between the UE and the Core Network. It encapsulates the SETUP message sent by the mobile equipment (DTAP – Setup).

DTAP (Direct Transfer Application Part) messages are used to transfer call control and mobility management messages to and from the MS (Mobile Station).


RANAP is the radio network layer signaling protocol of the Iu interface. It transfers messages between the RNC and the 3G-SGSN, or between the RNC and the 3G MSC, through the Iu interface. It also provides a signaling channel through which messages are carried transparently between the UE and the Core Network.

There are 28 types of RANAP messages; this one is of the type “DirectTransfer”. Direct Transfer is used when a UE–CN signaling message has to be sent from the RNC to the CN without interpretation.

The RANAP PDU (protocol data unit) is of the type “initiatingMessage”. This means that it starts a RANAP elementary procedure, whereas other PDU types, such as “successfulOutcome” and “unsuccessfulOutcome”, report the result of a procedure.

When the MSC Server receives a SETUP message, it replies with CALL PROCEEDING. SETUP and CALL PROCEEDING are used specifically for call establishment.

The other type of message generated and captured by the simulation environment is H.248 (MEGACO), which the MSC Server uses to control one or more Media Gateways [2].

As figure 17 shows, the new architecture handles call control separately from call transport; this is why the new Mc interface was introduced.

Fig.17 The advantage of the R4 architecture

The main protocol used on this interface is H.248, which carries control and notification messages between MSC Servers and Media Gateways. RANAP only passes through Mc on its way to the MSC Server; it is not originated or terminated on this interface. RANAP runs between the Radio Network Controller and the MSC Server and does not belong directly to the Mc interface.

The architecture of H.248 contains specific elements called terminations and contexts. These are abstract elements that define the status of the connections inside the Media Gateway.

A termination is a source and/or sink of multimedia traffic, and it can source or sink several multimedia streams. A termination may refer to a physical resource, such as a time-slot of a TDM (Time-Division Multiplex) circuit. It is then considered a semi-permanent termination, because it exists for as long as it is provisioned in the Media Gateway. The other type is the ephemeral termination, which is created with the “add” command [1, 2].

An ephemeral termination can represent multimedia flows such as RTP or AAL2. It can have properties such as an IP address and port number, or channel IDs for AAL2.

Every termination of a Media Gateway has a specific 32-bit name (ID). A context defines the connection between multiple terminations: all terminations in a context send and receive multimedia traffic among themselves. A termination can be connected to other terminations simply by moving it from one context to another. There is also a special context, named the “null context”. Terminations placed in it are inactive and are not connected to any other termination. The illustration in figure 18 shows how two networks can be interconnected using the context and termination model: two SCN (Switched Circuit Network) bearer channels are connected directly to an RTP multimedia flow from an IP network.

Fig. 18 A generic model of the Media Gateway (diagram labels: contexts and null contexts holding terminations, including an RTP stream termination and two SCN bearer channel terminations)
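The context/termination model maps naturally onto a small data structure. The sketch below is a hypothetical toy model, not the H.248 protocol itself. The class `MediaGateway`, its methods and the termination names are invented for illustration. The model rests on these assumptions:

- Physical terminations idle in the null context (ID 0).
- Calling `add` without a context plays the role of the $ wildcard and creates a new context.
- A context disappears when its last termination leaves.
- Move is not allowed to or from the null context.

```python
from typing import Dict, Optional, Set

NULL_CONTEXT = 0  # reserved ID of the null context


class MediaGateway:
    """Toy model of H.248 contexts and terminations (hypothetical API)."""

    def __init__(self) -> None:
        self.contexts: Dict[int, Set[str]] = {NULL_CONTEXT: set()}
        self.where: Dict[str, int] = {}  # termination ID -> context ID
        self.physical: Set[str] = set()  # semi-permanent terminations
        self._next_id = 1

    def _detach(self, term: str) -> None:
        """Take a termination out of its current context, if it has one."""
        old = self.where.pop(term, None)
        if old is None:
            return
        self.contexts[old].discard(term)
        # A context lives only while it holds terminations (null context excepted).
        if old != NULL_CONTEXT and not self.contexts[old]:
            del self.contexts[old]

    def _attach(self, term: str, ctx: int) -> None:
        if self.where.get(term) == ctx:
            return  # already in that context
        self._detach(term)
        self.contexts[ctx].add(term)
        self.where[term] = ctx

    def provision(self, term: str) -> None:
        """Physical (e.g. TDM time-slot) terminations idle in the null context."""
        self.physical.add(term)
        self._attach(term, NULL_CONTEXT)

    def add(self, term: str, ctx: Optional[int] = None) -> int:
        """Add a termination; ctx=None acts like the $ wildcard (new context)."""
        if ctx is None:
            ctx = self._next_id
            self._next_id += 1
            self.contexts[ctx] = set()
        elif ctx == NULL_CONTEXT or ctx not in self.contexts:
            raise ValueError(f"cannot add to context {ctx}")
        self._attach(term, ctx)
        return ctx

    def move(self, term: str, ctx: int) -> None:
        """Reconnect a termination by moving it into another existing context."""
        current = self.where.get(term)
        if current in (None, NULL_CONTEXT) or ctx == NULL_CONTEXT or ctx not in self.contexts:
            raise ValueError("invalid Move (not allowed to or from the null context)")
        self._attach(term, ctx)

    def subtract(self, term: str) -> None:
        """Physical terminations go back to the null context; ephemeral ones vanish."""
        if term in self.physical:
            self._attach(term, NULL_CONTEXT)
        else:
            self._detach(term)

    def peers(self, term: str) -> Set[str]:
        """Terminations exchanging media with `term` (none in the null context)."""
        ctx = self.where[term]
        return set() if ctx == NULL_CONTEXT else self.contexts[ctx] - {term}


# Usage mirroring figure 18: two SCN bearer channels joined to one RTP stream.
mg = MediaGateway()
mg.provision("scn/1")
mg.provision("scn/2")
ctx = mg.add("rtp/1")  # ephemeral RTP termination in a new context
mg.add("scn/1", ctx)
mg.add("scn/2", ctx)
print(sorted(mg.peers("rtp/1")))  # ['scn/1', 'scn/2']
```

Tracking the semi-permanent terminations separately is what lets `subtract` return them to the null context instead of destroying them.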


Terminations can generate events, which are detected by Media Gateways and signaled to MSC Servers. An MSC Server can ask a Media Gateway to report certain events by sending it a Modify command that lists them. When such an event occurs, the Media Gateway informs the MSC Server with a Notify command.
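Here is a minimal sketch of this request/notify pattern, under a few assumptions. The hypothetical `GatewayEvents` class stands for the event bookkeeping of one Media Gateway. A Modify carrying an events descriptor replaces the list of events requested for a termination. Notify messages are collected in a list rather than sent over the Mc interface. The event name `bearer/established` is a made-up placeholder, not a real H.248 package event.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class GatewayEvents:
    """Hypothetical sketch of H.248 event reporting inside a Media Gateway."""
    requested: Dict[str, Set[str]] = field(default_factory=dict)
    outbox: List[Tuple[str, str, str]] = field(default_factory=list)

    def modify(self, term_id: str, events: Set[str]) -> None:
        """Handle a Modify carrying an events descriptor: replace the watch list."""
        self.requested[term_id] = set(events)

    def occur(self, term_id: str, event: str) -> bool:
        """A termination detected an event; queue a Notify only if it was requested."""
        if event in self.requested.get(term_id, set()):
            self.outbox.append(("Notify", term_id, event))
            return True
        return False


gw = GatewayEvents()
gw.modify("40000012", {"bearer/established"})
gw.occur("40000012", "bearer/established")  # reported
gw.occur("40000012", "something/else")      # ignored: not requested
print(gw.outbox)  # [('Notify', '40000012', 'bearer/established')]
```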

Figure 19 shows the dissection of an H.248 packet containing an Add Request command, summarized in the following format:

T 6107b89 c fffffffe AddReq 40000012

The AddReq message is sent by the MSC Server to the Media Gateway, in order to add the termination 40000012 to the fffffffe context. The code 6107b89 represents the transaction ID.
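The one-line summary above can be split into its fields mechanically. The sketch assumes the layout of that single example, `T <transaction ID> c <context ID> <command> <termination ID>`, with hexadecimal transaction and context IDs. It is not a general H.248 parser, and `parse_summary` and `CommandSummary` are hypothetical names.

```python
from typing import NamedTuple


class CommandSummary(NamedTuple):
    transaction_id: int
    context_id: int
    command: str
    termination_id: str


def parse_summary(line: str) -> CommandSummary:
    """Parse 'T <trans> c <ctx> <Command> <term>' (layout assumed from the example)."""
    tokens = line.split()
    if len(tokens) != 6 or tokens[0] != "T" or tokens[2] != "c":
        raise ValueError(f"unexpected summary line: {line!r}")
    return CommandSummary(
        transaction_id=int(tokens[1], 16),
        context_id=int(tokens[3], 16),
        command=tokens[4],
        termination_id=tokens[5],  # kept as text, as displayed
    )


s = parse_summary("T 6107b89 c fffffffe AddReq 40000012")
print(hex(s.context_id))  # 0xfffffffe
```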

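The context ID is expressed in hexadecimal. The value 0xfffffffe is not an ordinary context number but a reserved one: it is the CHOOSE wildcard, written $ in the H.248 text encoding, which means “choose any”. In other words, the Media Gateway itself chooses the context, creating a new one for the termination.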

If the * symbol (ALL, 0xffffffff) were used instead of $, it would mean “select all”. With *, the MSC Server can address all the contexts, or all the terminations, available in a Media Gateway.
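In the binary encoding, the wildcard contexts are fixed reserved values defined in ITU-T H.248.1, not character conversions. The value 0 is the null context (written "-" in text), 0xFFFFFFFE is CHOOSE ($) and 0xFFFFFFFF is ALL (*). The hypothetical helper below maps a 32-bit context ID to its wildcard symbol. For ordinary contexts it falls back to the hexadecimal form shown in the dissection.

```python
NULL_CONTEXT = 0x00000000    # null context, written "-"
CHOOSE_CONTEXT = 0xFFFFFFFE  # "$": the MGW chooses (creates) the context
ALL_CONTEXTS = 0xFFFFFFFF    # "*": all contexts

WILDCARDS = {NULL_CONTEXT: "-", CHOOSE_CONTEXT: "$", ALL_CONTEXTS: "*"}


def context_label(ctx: int) -> str:
    """Return the wildcard symbol for reserved context IDs, else the hex form."""
    if not 0 <= ctx <= 0xFFFFFFFF:
        raise ValueError("H.248 context IDs are 32-bit unsigned integers")
    return WILDCARDS.get(ctx, format(ctx, "x"))


print(context_label(int("fffffffe", 16)))  # $
```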

Fig. 19 Wireshark dissection of an H.248 AddReq message

The following types of messages were used in the MOC scenario simulation: AddReq, AddReply, ModReq, ModReply, SubReq and SubReply.

AddReq – adds a termination to a context (acknowledged by AddReply)

ModReq – modifies the properties of a termination (acknowledged by ModReply)

SubReq – removes (subtracts) a termination from a context (acknowledged by SubReply)
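Each request in this list is acknowledged by a reply that carries the same transaction ID. A captured MOC trace can therefore be checked for missing acknowledgements. The sketch assumes the trace has already been reduced to (transaction ID, message name) pairs. The function name and the second transaction ID in the sample trace are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Request -> acknowledging reply, as listed for the MOC scenario.
REPLY_FOR: Dict[str, str] = {"AddReq": "AddReply", "ModReq": "ModReply", "SubReq": "SubReply"}


def unanswered(trace: Iterable[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Return the requests whose reply (same transaction ID) never appears."""
    pending: Dict[int, str] = {}
    for trans_id, msg in trace:
        if msg in REPLY_FOR:
            pending[trans_id] = msg
        elif trans_id in pending and REPLY_FOR[pending[trans_id]] == msg:
            del pending[trans_id]
    return sorted(pending.items())


sample = [(0x6107B89, "AddReq"), (0x6107B89, "AddReply"), (0x6107B8A, "ModReq")]
print(unanswered(sample))  # only the ModReq is still waiting for its ModReply
```

A reply is accepted only if it matches the request type, so a stray ModReply cannot acknowledge an AddReq.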

7 Conclusions

The present approach does not cover every message type that can occur in a UMTS network. Instead, it provides a basic and consistent pool of the messages exchanged between the MSC Server and the CS Media Gateway. These are the new network elements that distinguish the UMTS R4 network from the traditional GSM/UMTS network, introducing “soft-switching” technology into mobile communications. The message pool is scalable, since the “proof of concept” was successful for the specific aim of our implementation: collecting and reusing prerecorded packets that are re-introduced into simulations, thereby creating a new kind of realistic e-Learning software package. Such packages could be very useful not only for university students but also for company employees, for both initial and update training (“delta training”).

The software has already been tested and validated in practice in several laboratory sessions at “Transilvania” University, with final-year engineering students studying mobile communications and telecom architectures.

The practical work was documented and cataloged as SCO (Shareable Content Objects) and listed in the Moodle LMS (Learning Management System) of our university for further use by teachers and tutors.

Personalization of educational services becomes possible through the SCORM (Sharable Content Object Reference Model) compliance of the metadata attached to these “educational objects” (laboratory simulations-emulations). Owing to their semantic nature, these SCOs can be catalogued, searched and aggregated into personalized, “tailored” learning programs (“individualized paths”). Tutors can select and recommend parts of the experiments, and/or students themselves can choose subsets adapted to “beginner” or “advanced” levels. Furthermore, personalization can depend on various prerequisites: previously completed modules, quantified in “transferable credits” systems, and/or chosen “vocational” profiles, including fee dependencies [15, 16].

The intrinsically layered nature of protocol analysis allows these e-Learning scenarios to be adapted to different levels of difficulty. Very popular protocol monitors and network simulators were chosen for them. The behavioral approach (state machines specific to telecommunication standards) brings an important feature to these educational services: virtualization, which carries semantics toward a “network of information”. Students can extrapolate this perspective to understand the distributed management of future global telecom systems, which are controllable like ontology-based “colonies”.


List of Abbreviations

3GPP Third Generation Partnership Project
AAL2 ATM Adaptation Layer type 2
ANM Answer Message
APM Application Transport Message
ATM Asynchronous Transfer Mode
BICC Bearer Independent Call Control
CC Call Control
CS Circuit Switched
CS-CN Circuit Switched Core Network
GERAN GSM/EDGE Radio Access Network
GGSN Gateway GPRS Support Node
GMSC-S Gateway MSC Server
HLR Home Location Register
IAM Initial Address Message
M3UA MTP 3 User Adaptation
MAP Mobile Application Part
MSC Mobile Switching Centre
MSC-S MSC Server
MOC Mobile Originated Call
MTC Mobile Terminated Call
MTP Message Transfer Part
MTP3-B Message Transfer Part level 3 B
MGW Media Gateway
MEGACO Media Gateway Control Protocol
OMNeT++ Objective Modular Network Testbed in C++
OSI Open Systems Interconnection
NED Network Description (OMNeT++ language)
NAS Non-Access Stratum
PSTN Public Switched Telephone Network
RAB Radio Access Bearer
RAN Radio Access Network
RANAP Radio Access Network Application Part
RNC Radio Network Controller
RTP Real-time Transport Protocol
SCCP Signaling Connection Control Part
SCTP Stream Control Transmission Protocol
SIGTRAN Signaling Transport
SGSN Serving GPRS Support Node
TDM Time Division Multiplexing
UMTS Universal Mobile Telecommunications System
UTRAN UMTS Terrestrial Radio Access Network
UE User Equipment
VLR Visitor Location Register


References

[1] 3GPP TS 23.205 version 4.11.0 Release 4, Universal Mobile Telecommunications System (UMTS); Bearer-independent circuit-switched core network; Stage 2, http://webapp.etsi.org/exchangefolder/ts_123205v041100p.pdf, http://www.3gpp.org

[2] ITU-T H.248.1, Gateway control protocol: Version 3, http://www.itu.int

[3] Bannister, J., Mather, P., Coope, S.: Convergence Technologies for 3G Networks: IP, UMTS, EGPRS and ATM. John Wiley & Sons, Chichester (2004)

[4] Varga, A.: OMNeT++ Discrete Event Simulation System Version 3.2 User Manual (2005), http://www.omnetpp.org

[5] Lamping, U.: Wireshark Developer’s Guide (2007), http://www.wireshark.org/

[6] Virtual Network Adapter VirtNet1.0, http://www.ntkernel.com/w&p.php?id=32

[7] Korhonen, J.: Introduction to 3G Mobile Communications, 2nd edn. Artech House (2003), ISBN 1-58053-507-0

[8] Kreher, R., Rüdebusch, T.: UMTS Signaling: UMTS Interfaces, Protocols, Message Flows and Procedures Analyzed and Explained. John Wiley & Sons, Chichester (2007)

[9] Znaty, S.: Next Generation Network (NGN) dans les réseaux mobiles (2005), http://www.efort.com

[10] Znaty, S., Dauphin, J.L.: Architecture NGN: Du NGN Téléphonie au NGN Multimédia (2005), http://www.efort.com

[11] Hillebrand, F.: GSM and UMTS: The Creation of Global Mobile Communication. Wiley, Chichester (2001)

[12] Wisely, D., Eardley, P., Burness, L.: IP for 3G: Networking Technologies for Mobile Communications. Wiley, Chichester (2002)

[13] Glisic, S.G.: Advanced Wireless Networks: 4G Technologies. Wiley, Chichester (2006)

[14] Sorensen, B., Ramachandran, S.: Simulation-Based Automated Intelligent Tutoring. In: Smith, M.J., Salvendy, G. (eds.) HCII 2007. LNCS, vol. 4558, pp. 466–474. Springer, Heidelberg (2007)

[15] Gibson, D.: New directions in e-learning: Personalization, simulation and program assessment. Invited presentation at the International Conference on Innovation in Higher Education, Kiev, Ukraine (2003), http://ali.apple.com/ali_media/Users/1000507/files/others/New_Directions_in_elearning.doc

[16] Rose, A., Eckard, D., Rubloff, G.: An application framework for creating simulation-based learning environments. University of Maryland Dept. of Computer Science Technical Report CS-TR 3907 (1998)

Author Index

Aghasaryan, Armen 23
Alexopoulos, Panos 9
Anagnostopoulos, Christos-Nikolaos 127
Anagnostopoulos, Ioannis 1, 145
Askounis, Dimitris 9

Bielikova, Maria 1

Cserey, Szilard 205

Doukas, Charalampos 163

Felber, Pascal 73

Giannoukos, Ioannis 109

Iliou, Theodoros 127

Kafentzis, Konstantinos 9
Karpouzis, Kostas 163
Kayafas, Eleftherios 109
Kesorn, Kraisak 49
Kovarova, Alena 93
Kropf, Peter 73

Liang, Zekeng 49
Loumos, Vassili 109
Louta, Malamati 187
Lykourentzou, Ioanna 109

Maglogiannis, Ilias 163
Mignon, Sabrina 23
Mpardis, Giorgos 109
Mylonas, Phivos 1

Naudet, Yannick 23
Nikolidakis, Stefanos 145
Nikolopoulos, Vassilis 109

Poslad, Stefan 49

Sandu, Florin 205
Senot, Christophe 23
Serbu, Sabina 73
Spielvogel, Christian 73
Szalayova, Lucia 93

Toms, Yann 23

Varlamis, Iraklis 187
Vergados, Dimitrios D. 145

Wallace, Manolis 1, 9

Zoumas, Christoforos 9
