

Source: lass.cs.umass.edu/theses/xiaotao.pdf (2007-01-05)

SYSTEM SUPPORT FOR PERVASIVE MULTIMEDIA SYSTEMS

A Dissertation Presented

by

XIAOTAO LIU

Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment

of the requirements for the degree of

DOCTOR OF PHILOSOPHY

September 2006

Department of Computer Science


© Copyright by Xiaotao Liu 2006

All Rights Reserved


SYSTEM SUPPORT FOR PERVASIVE MULTIMEDIA SYSTEMS

A Dissertation Presented

by

XIAOTAO LIU

Approved as to style and content by:

Prashant Shenoy, Co-chair

Mark D. Corner, Co-chair

Weibo Gong, Member

Deepak K. Ganesan, Member

W. Bruce Croft, Department Chair
Department of Computer Science


To my dear wife — Fan Yang, and

my loving parents — Guojun Liu and Meijun Wang


ACKNOWLEDGMENTS

I would first like to thank my advisors, Professor Prashant Shenoy and Professor Mark D. Corner, for their constant guidance and invaluable support during my Ph.D. study. Their encouragement and patience gave me tremendous strength to complete this dissertation. Without them, this thesis would not have been possible. I am extremely fortunate to have met and worked with such outstanding professors. Their invaluable advice will continue to benefit my future career.

I owe a special debt of gratitude to Professor Prashant Shenoy for his contributions to the development of my research and professional skills. Prashant introduced me to the world of multimedia systems. His broad vision, deep insights, and strong guidance have been extraordinarily valuable to my research. Whenever I encountered research challenges and difficulties, he was always available for discussion and advice.

I express my sincere thanks to Professor Mark D. Corner for his critical insights, contributions, and assistance. Mark has been extremely helpful to my research, especially in the last two years of my graduate study. He provided very helpful guidance and feedback on every aspect of my research and professional development.

I would like to thank the rest of my thesis committee, Professor Weibo Gong and Professor Deepak K. Ganesan, for their constructive comments and suggestions on my research. Weibo's exceptional knowledge of mathematics and control theory helped me understand the critical issues in my research. Deepak's expertise in sensor networks helped me develop interdisciplinary research topics.

During my stay at UMass, I have had a wonderful time working and discussing ideas with many colleagues. In particular, I am thankful to Purushottam Kulkarni, Jiang Lan, Peter Desnoyers, Huan Li, Ming Li, Gal Niv, Tim Wood, Bhuvan Urgaonkar, Abhishek Chandra, Vijar Sundaram, and Gaurav Marthur for their help with my work and the lively atmosphere they created in the laboratory.

I would also like to extend my gratitude and appreciation to all the people in the Computer Science Department at UMass. In particular, I am grateful to Karren Sacco, Sharon Mallory, and Pauline Hollister for their kindness and incredible efficiency. Thanks also go to Tyler Trafford and the members of CSCF for their wonderful technical support.

This dissertation is dedicated to my dear wife, Fan Yang. Her love, faith, patience, understanding, and continuous support have been, and will remain, very important to me. I would also like to thank my loving parents, Guojun Liu and Meijun Wang, for their never-ending encouragement and support throughout my life. I also dedicate this accomplishment to them. Finally, I would like to thank my sister, Xiaoling Liu, for her support.



ABSTRACT

SYSTEM SUPPORT FOR PERVASIVE MULTIMEDIA SYSTEMS

SEPTEMBER 2006

XIAOTAO LIU

B.Eng., SOUTH CHINA UNIVERSITY OF TECHNOLOGY, GUANGZHOU, CHINA

M.S., UNIVERSITY OF MASSACHUSETTS AMHERST

Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST

Directed by: Professor Prashant Shenoy and Professor Mark D. Corner

The proliferation of multimedia-capable mobile devices, such as laptops, personal digital assistants (PDAs), and cellular telephones, has led to an explosive increase in multimedia content. Together with the increasing capacity of disk storage, this trend has encouraged users to create ever-larger personal digital libraries of audio, pictures, and videos. Meanwhile, the prevalence of small and mobile devices such as sensors and RFID tags is expected to create a ubiquitous computing environment. This ubiquitous environment can provide pervasive locationing and identification services, and can further create sensor data streams encoding objects' locations and identities, which form the context of media capture. The concurrence of these two trends enables a new set of multimedia systems that we refer to as pervasive multimedia systems. Pervasive multimedia systems enable users to create, access, and navigate large volumes of multimedia content on a variety of personal mobile devices.



Designing pervasive multimedia systems poses three major challenges: (i) achieving energy efficiency on battery-powered mobile devices; (ii) searching and retrieving media content in a fast and human-friendly manner; and (iii) identifying and locating objects in a scalable and maintainable fashion. In this thesis, we propose techniques that provide system support for pervasive multimedia systems to address these challenges.

Because of their power constraints, battery-powered mobile devices demand energy efficiency. At the same time, the real-time nature of multimedia applications makes it challenging to trade application performance for energy savings. To achieve energy efficiency without sacrificing applications' quality of service (QoS), we have developed power management techniques that use application domain-specific knowledge and time-series-based models. Our approaches can reduce the power consumption of mobile devices significantly while still meeting the QoS requirements of applications.

Searching and retrieving media content is greatly enhanced by textual annotations describing the context of media capture: when, where, and who/what. Many mechanisms have been developed to generate this context manually or semi-automatically, but these mechanisms are error prone and have high computational requirements. To automate the generation of highly accurate context, we have designed and implemented sensor-enhanced video annotation (SEVA), a context-aware multimedia recording system. SEVA exploits sensor technology to automatically annotate media with the identities and locations of objects, and can greatly enhance users' ability to search and retrieve media content.

A locationing and identification system is a critical component of pervasive multimedia systems, and its accuracy, deployability, and cost are crucial to their success. Currently available locationing systems are expensive because they rely on costly, battery-powered tracking sensors. To address the scalability and maintainability of locationing systems, we propose Ferret, a pervasive locationing system that incorporates passive RFID technology. Ferret is cost-effective, and more easily maintained and deployed than current locationing systems.



TABLE OF CONTENTS

Page

ACKNOWLEDGMENTS . . . v
ABSTRACT . . . vii
LIST OF FIGURES . . . xiv

CHAPTER

1. INTRODUCTION . . . 1

   1.1 Motivation . . . 1
   1.2 Scope of Research . . . 5

       1.2.1 Energy Efficiency of Mobile Devices . . . 5
       1.2.2 Context-aware Multimedia Systems . . . 6
       1.2.3 Pervasive Locationing and Identification Systems . . . 6

   1.3 Thesis Contributions . . . 7
   1.4 Structure of the Thesis . . . 10

2. RELATED WORK . . . 11

   2.1 Content-based Media Retrieval . . . 11
   2.2 Sensor Systems . . . 12
   2.3 Sensor Annotation of Multimedia . . . 14
   2.4 Locationing Systems . . . 16
   2.5 Power Management Techniques . . . 21

3. APPLICATION LEVEL POWER MANAGEMENT . . . 24

   3.1 Introduction . . . 24
   3.2 Chameleon Architecture . . . 25
   3.3 Application-level Power Management . . . 27

       3.3.1 MPEG Video Decoder . . . 27
       3.3.2 Video Conferencing Tool . . . 34
       3.3.3 Word Processor . . . 39
       3.3.4 Web Browser . . . 42
       3.3.5 Batch Compilations . . . 43

   3.4 A User-level Power Manager . . . 44
   3.5 Implementation . . . 46
   3.6 Experimental Evaluation . . . 49

       3.6.1 Chameleon-aware Applications . . . 50

             3.6.1.1 Video Decoder . . . 50
             3.6.1.2 Video Conference Tool . . . 52
             3.6.1.3 Web Browser and Word Processor . . . 53
             3.6.1.4 Batch Compilations . . . 54

       3.6.2 Impact of Concurrent Workloads . . . 55
       3.6.3 Isolation in Chameleon . . . 58
       3.6.4 User-level Power Manager . . . 59
       3.6.5 System Overhead . . . 59

   3.7 Concluding Remarks . . . 60

4. TIME SERIES-BASED POWER MANAGEMENT . . . 62

   4.1 Introduction . . . 62
   4.2 System Architecture . . . 62
   4.3 Profiling Current Demands . . . 63

       4.3.1 Measurement of Processor Demand . . . 64
       4.3.2 Measurement of I/O Demand . . . 65

   4.4 Predicting Future Demand . . . 65
   4.5 Speed Setting Strategy . . . 66

       4.5.1 Processor Speed Setting Strategy . . . 66
       4.5.2 I/O Speed Setting Strategy . . . 67

   4.6 Implementation and Simulation . . . 68

       4.6.1 Implementation of TS-DVFS . . . 69
       4.6.2 Simulation of TS-DRPM . . . 70

   4.7 Experimental Evaluation . . . 71

       4.7.1 TS-DVFS Results . . . 71

             4.7.1.1 Multimedia Applications . . . 71
             4.7.1.2 Other Applications . . . 74

       4.7.2 TS-DRPM Results . . . 75

   4.8 Concluding Remarks . . . 76

5. SENSOR-ENHANCED VIDEO ANNOTATION . . . 78

   5.1 Introduction . . . 78
   5.2 System Model . . . 81
   5.3 System Architecture and Design . . . 82

       5.3.1 Video Recording . . . 82
       5.3.2 Pervasive Locationing/Identification . . . 83
       5.3.3 Stream Correlation . . . 85
       5.3.4 Extrapolation and Prediction . . . 86
       5.3.5 Filtering and Eliminating . . . 93
       5.3.6 Query and Retrieval . . . 94

   5.4 Implementation . . . 94
   5.5 Experimental Evaluation . . . 97

       5.5.1 Static Object, Static Camera . . . 98

             5.5.1.1 Cricket Locationing System . . . 98
             5.5.1.2 GPS Locationing System . . . 100

       5.5.2 Dynamic Experiments . . . 100

             5.5.2.1 Static Camera, Dynamic Objects . . . 101
             5.5.2.2 Dynamic Camera, Static Object . . . 105
             5.5.2.3 Dynamic Camera, Dynamic Object . . . 107

       5.5.3 Scalability . . . 107
       5.5.4 Computational Requirements . . . 109
       5.5.5 Summary and Discussions . . . 109

   5.6 Concluding Remarks . . . 110

6. RFID LOCALIZATION FOR PERVASIVE MULTIMEDIA . . . 111

   6.1 Introduction . . . 111
   6.2 Background . . . 113
   6.3 Ferret Design . . . 114

       6.3.1 Nomadic Location with RFID . . . 115
       6.3.2 Infrastructure Requirements . . . 116
       6.3.3 Location Storage . . . 116

   6.4 RFID Locationing . . . 118

       6.4.1 Offline Locationing Algorithm . . . 120
       6.4.2 Translation, Rotation and Projection . . . 121
       6.4.3 Online Locationing Algorithm . . . 125
       6.4.4 Dealing with Nomadic Objects . . . 126

   6.5 Implementation Considerations . . . 127
   6.6 Experimental Evaluation . . . 130

       6.6.1 Online Refinement Performance . . . 132
       6.6.2 Offline Algorithm Performance . . . 133
       6.6.3 Mobility Effects . . . 135
       6.6.4 Object Motion Detection . . . 137
       6.6.5 Spatial Requirements . . . 138
       6.6.6 Computational Requirements . . . 138

   6.7 Concluding Remarks . . . 139

7. CONCLUSIONS AND FUTURE WORK . . . 140

   7.1 Future Work . . . 142

BIBLIOGRAPHY . . . 144


LIST OF FIGURES

Figure Page

2.1 Current Locationing Systems . . . 17
3.1 The Chameleon Architecture . . . 25
3.2 Three scenarios for task execution in a soft real-time application . . . 28
3.3 Correlation Coefficients of MPEG 2 Videos . . . 31
3.4 Correlation Coefficients of MPEG 4 Videos . . . 31
3.5 Accuracy rate in predicting frame decode time within 1ms with varying window size . . . 33
3.6 Empirical CDF of error in predicting frame decode times with window size 8 . . . 33
3.7 Accuracy rate in predicting packet decode time within 0.5ms with varying window size . . . 36
3.8 Empirical CDF of error in predicting packet decode times with window size 8 . . . 36
3.9 Empirical CDF of error in predicting the number of packets in each frame . . . 38
3.10 Event processing in a word processor . . . 40
3.11 Characteristics of the TM5600-667 processor . . . 49
3.12 Speed adapter mappings from the percentage CPU speed to a CPU frequency for the Transmeta TM5600 . . . 49
3.13 Average CPU power consumption and percentage of frames that are late by more than 8ms (20% of the 40ms deadline) . . . 51
3.14 Characteristics of MPEG 4 Videos . . . 52
3.15 Average CPU power consumption of video conferencing . . . 53
3.16 Average CPU power consumption and % of late events . . . 55
3.17 Completion times and mean CPU power consumption for batch compilations . . . 55
3.18 Performance of concurrent applications: average response time of interactive applications and the percentage of late events and frames . . . 56
3.19 Average CPU power consumption for various mixes . . . 57
3.20 Fraction of time spent at various frequency levels by mplayer in Chameleon . . . 58
3.21 Isolation from power settings of other applications . . . 59
3.22 Average CPU power consumption of movie playback under GraceOS, Chameleon, and LongRun . . . 60
3.23 Overhead of application-level power management (in CPU cycles) . . . 60
3.24 Cost of voltage and frequency scaling (in CPU cycles) . . . 61
4.1 The architecture of a TS-PM-enabled OS kernel . . . 63
4.2 Characteristics of the TM5600-667 processor . . . 69
4.3 Mapping process utilizations to a CPU frequency in the Transmeta TM5600 . . . 70
4.4 Characteristics of the simulated DRPM-ready hard disk . . . 70
4.5 Characteristics of MPEG 4 Videos . . . 72
4.6 Average CPU power consumption and percentage of frames that are late by more than 8ms (20% of the 40ms deadline) . . . 73
4.7 Characteristics of MPEG videos for rescaling/transcoding . . . 74
4.8 CPU energy consumption and execution times for video rescaling/transcoding . . . 75
4.9 CPU energy consumption and execution times for other workloads . . . 76
4.10 Energy consumption and response times of disk requests for different workloads . . . 77
5.1 Pervasive Locationing/Identification System . . . 83
5.2 Query and Response Model . . . 85
5.3 Deriving an object's path using curve fitting . . . 87
5.4 The operation of the extended Kalman filter . . . 91
5.5 The Basic Optics Model . . . 93
5.6 SEVA recorder laptop equipped with a camera, a 3D digital compass, a Mote with wireless radio and Cricket receiver, a GPS receiver, and 802.11b wireless . . . 95
5.7 The layout of static experiments using Cricket . . . 98
5.8 The error rate of static experiments using Cricket . . . 99
5.9 The layout of experiments using GPS . . . 101
5.10 The error rate of static experiments using GPS . . . 102
5.11 Mobile object on a pulley . . . 103
5.12 Characteristics of different slopes . . . 103
5.13 Mean frames in error for a mobile object and static camera . . . 104
5.14 Remote control toy car with a Cricket node on the top . . . 105
5.15 Characteristics of different speeds . . . 106
5.16 Path of a mobile camera . . . 106
5.17 Mean frames in error for a mobile camera . . . 107
5.18 Response rate of Motes . . . 108
6.1 Use of Ferret to discover the location of a soup can in an office . . . 115
6.2 Coverage region of an RFID reader and tag detection probabilities in two dimensions . . . 119
6.3 Refining location estimates using multiple readings . . . 120
6.4 Simplified 2D observation model for the antenna . . . 121
6.5 Left-Handed Coordinate System . . . 122
6.6 Online location estimation in Ferret . . . 127
6.7 Ferret Prototype System . . . 128
6.8 Online refinement of location . . . 131
6.9 Performance of offline Ferret under different likelihood thresholds . . . 133
6.10 Empirical CDF of Ferret's locationing accuracy . . . 134
6.11 Path of the Ferret device . . . 136
6.12 Performance of Ferret under various mobility patterns . . . 136
6.13 Fraction of object movements detected . . . 137


CHAPTER 1

INTRODUCTION

1.1 Motivation

In the past decade, mobile devices such as laptops, personal digital assistants (PDAs), and cellular telephones have become indispensable in our daily life. Concurrently, the processing, storage, and communication capabilities of mobile devices have improved significantly, as predicted by Moore's law. These technological advances enable mobile devices to provide rich audio, video, and imaging capabilities that were previously available only on powerful desktops. The prevalence of these multimedia-capable mobile devices has led to an explosive increase in multimedia content, and has encouraged people to create ever-larger digital libraries of audio, pictures, and videos. Navigating through large multimedia libraries requires searching for and locating content of interest in a fast and human-friendly manner.

A concurrent trend is the proliferation of small, mobile devices such as RFID [21, 89] tags and numerous sensor platforms (e.g., Mica Motes [46], Telos [72], and the XYZ [62]). For instance, every piece of merchandise can be equipped with an RFID tag encoding its identity. These sensors and RFID tags are organized into sensor and RFID networks to create a ubiquitous computing environment full of tiny computing devices. Such an environment can provide pervasive, uninterrupted services such as locationing and identification. Researchers have proposed several locationing technologies using GPS [3], ultrasound [75], infrared [90], WiFi [2], and RFID [45, 70]. These pervasive services supply sensor data streams that encode information about objects, for example, an object's identity, location, and type. This information composes


the temporal, spatial, and social context of media capture, which can be used to organize media libraries on a context-aware basis and further enhance the user's ability to search for and retrieve media.

The confluence of these trends enables a brand-new class of multimedia systems: pervasive multimedia systems. Such systems exploit the sensor-rich world in multimedia applications. They combine pervasive services with multimedia-capable mobile devices to create ever-richer media streams, which include not only the traditional streams of audio, pictures, and videos, but also new sensor data streams carrying object information. Typically, pervasive multimedia systems capture the identities and locations of nearby objects along with audio, pictures, and videos.

Such pervasive multimedia systems produce media streams together with sensor data streams of object identities and locations. The availability of this metadata enables a variety of new applications. For instance, users can locate a misplaced book on a bookshelf. Robots can use such devices to conduct real-time identification and search operations. Vision-based applications can use them to quickly learn the structure or organization of a space. Inventory-tracking applications can proactively generate missing-object alerts upon detecting the absence of an object. Video surveillance systems can proactively trigger security alerts upon detecting an unidentified intruder. Multimedia digital libraries can use this metadata to automatically organize multimedia in context-aware and even content-aware ways. Object and face recognition techniques can use it to dramatically reduce the search space to only those objects actually present in the video or picture. We imagine that the ability to automatically locate and identify thousands of objects in media streams will enable new opportunities in vision and graphics, such as augmented reality and immersive systems.

Many of these potential applications impose the following key requirements on pervasive multimedia systems:


• The system should be energy efficient due to the energy constraints of battery-powered mobile devices, and it should maintain the quality of service (QoS) of multimedia applications while conserving power.

• The system should capture the sensor data and time-synchronize it with the audio, picture, and video streams in real time.

• The system should remove information about uncorrelated objects (for instance, objects not present in the videos) from the sensor data.

• The system should automatically organize media libraries on a context-aware and even content-aware basis according to the temporal, spatial, and social context of media capture, i.e., the information of objects. Furthermore, the system should provide fast, human-friendly tools for navigating media libraries in content-based ways.

• The system should be scalable, cost-effective, and easily maintainable.

Without satisfying these requirements, pervasive multimedia systems have several drawbacks. First, multimedia systems traditionally achieve energy savings by putting mobile devices into a sleep or standby mode when they are not in use, and waking the devices into the active mode upon usage. For example, a laptop can be put to sleep after some amount of idle time and woken up when the user presses a key. Normally, the wake-up procedure takes tens of seconds. Such a power management technique cannot reduce energy consumption while mobile devices are being used, and thus it is insufficient. Furthermore, the long transition time (seconds) between the sleep and active states makes these techniques impractical for multimedia applications, which are real-time in nature: the period or deadline of a multimedia task is normally tens of milliseconds, three orders of magnitude smaller than the transition time between the sleep and active states. Putting mobile devices to sleep and waking them up between multimedia tasks would completely ruin the QoS of multimedia applications.
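The timescale mismatch can be made concrete with a quick calculation. The 30 fps frame rate and 10-second resume latency below are illustrative assumptions chosen only to match the "tens of milliseconds" and "tens of seconds" figures above:

```python
# Illustrative comparison of multimedia task deadlines vs. sleep/wake latency.
# The specific numbers (30 fps playback, 10 s resume time) are assumptions,
# not measurements from any particular device.

frame_period_s = 1.0 / 30       # ~33 ms deadline per decoded frame
wakeup_latency_s = 10.0         # assumed laptop resume time, in seconds

ratio = wakeup_latency_s / frame_period_s
print(f"Frame deadline: {frame_period_s * 1000:.1f} ms")
print(f"Wake-up latency is {ratio:.0f}x longer than one frame deadline")
```

With these assumed numbers, a single sleep/wake transition spans roughly 300 frame deadlines, which is why per-task sleeping is unworkable for soft real-time media.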


Second, existing multimedia systems either capture only the audio, picture, and video streams, or they additionally capture only the time and location of the media capture devices. Without information about objects, such as their identities and locations, content-based media retrieval is impossible; we can only locate content of interest by browsing through all available media. For instance, to find a photo of a particular person taken at Miami Beach, we must browse through at least all photos taken at Miami Beach if we do not know who is present in each photo. The availability of object identities and locations, however, makes content-based media retrieval practical, since the system can automatically annotate media with information about the objects present in it. Therefore, a pervasive multimedia system must also be a context-aware multimedia system.

Finally, existing multimedia systems obtain the context of media capture either by manual entry or by automatic generation via a combination of learning- and vision-based object/face recognition, and then use the obtained context for content-based media retrieval. Manually entering the context of media capture is cumbersome and subject to imprecise human memory, and thus unsuitable for large collections of media archives. Automatically generating context with learning- and vision-based techniques is error prone and computationally expensive, and hence also inappropriate for large media libraries. By using object information from sensor data streams, multimedia systems can make this procedure automatic and highly accurate.

All these factors make it desirable to provide system support for pervasive multimedia systems that meets these requirements. Many important challenges arise in building such support. In the next section, we examine some of the fundamental problems that must be addressed in order to build a feasible pervasive multimedia system. The following section summarizes the primary contributions of this dissertation. We conclude this chapter with the structure of this dissertation.


1.2 Scope of Research

In this section, we present the scope and goals of this dissertation. We discuss the requirements on pervasive multimedia systems of energy efficiency, context awareness, cost-effectiveness, and scalability.

1.2.1 Energy Efficiency of Mobile Devices

Energy is a scarce resource in multimedia-capable mobile devices such as laptops, personal digital assistants (PDAs), and cellular phones. Within these devices, the CPU, network interface cards (NICs), LCD, and hard drive (if present) are the major consumers of energy. Meanwhile, multimedia applications such as audio/video players and multimedia capture and editing programs tend to be resource-hungry. Typically, such applications consume energy by accessing, processing, and rendering large amounts of data. Further, these applications impose soft real-time constraints, and thereby impose lower bounds on the speeds at which multimedia data can be accessed, processed, and rendered. Therefore, we want to reduce the energy consumption of mobile devices without sacrificing the performance of multimedia applications.

In this dissertation, we consider energy efficiency only for the CPU subsystem of mobile devices. The power consumption of the CPU is nearly a quadratic function of the CPU frequency, and the execution time of a task is inversely proportional to the CPU frequency. Therefore, we can reduce the energy consumption of the CPU simply by reducing its frequency. However, the real-time constraints of multimedia applications impose lower bounds on the CPU frequency, so we must take the processing needs and real-time constraints of all tasks into account when reducing the frequency. The goal of the first part of this dissertation is to (1) develop online profiling techniques to estimate the processing needs of applications, (2) exploit this information to develop power management techniques for the CPU that achieve energy efficiency without sacrificing application performance, and (3) implement these novel power management techniques in a real system and evaluate them in terms of energy savings and satisfaction of applications' real-time constraints.
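The intuition can be sketched numerically. Under the simplified model just stated (power roughly proportional to the square of frequency, execution time inversely proportional to frequency), the energy of a fixed task is roughly linear in frequency, so the slowest frequency that still meets the deadline minimizes energy. The concrete numbers below (a 1 GHz maximum frequency and 10 million cycles per frame) are hypothetical:

```python
# Sketch of the DVFS energy argument. Model assumptions from the text:
#   power(f) ~ f^2      (power is nearly quadratic in frequency)
#   time(f)  ~ 1/f      (execution time is inversely proportional to frequency)
# so per-task energy ~ f^2 * (1/f) = f: the slowest deadline-meeting
# frequency minimizes energy. All concrete numbers here are hypothetical.

def min_feasible_frequency(cycles: float, deadline_s: float) -> float:
    """Lowest CPU frequency (Hz) that finishes `cycles` within the deadline."""
    return cycles / deadline_s

def relative_energy(f: float, f_max: float) -> float:
    """Energy of one task at frequency f, relative to running it at f_max."""
    return f / f_max   # energy ~ f^2 * (1/f) = f

f_max = 1.0e9          # assumed 1 GHz maximum CPU frequency
cycles = 10.0e6        # assumed 10M cycles to decode one frame
deadline = 1.0 / 30    # one frame interval at 30 fps

f_min = min_feasible_frequency(cycles, deadline)
print(f"Minimum feasible frequency: {f_min / 1e6:.0f} MHz")
print(f"Energy vs. full speed: {relative_energy(f_min, f_max):.0%}")
```

Under these assumed numbers, scaling down to the 300 MHz minimum feasible frequency spends roughly 30% of the full-speed energy while still meeting every frame deadline, which is the opportunity the profiling techniques aim to exploit.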

1.2.2 Context-aware Multimedia Systems

A pervasive multimedia system is a context-aware multimedia system. By exploiting sensor technology, pervasive multimedia systems produce not only media streams but also sensor data streams containing the identities and locations of nearby objects. These sensor data constitute the spatial, temporal, and social context of media capture, for instance, when and where a video was captured and who or what was present in it. As discussed in Section 1.1, the availability of this metadata is helpful for a tremendous number of multimedia applications. However, without any processing, this metadata is raw, noisy, and uncorrelated with the media stream, whereas users want pervasive multimedia systems to provide accurate, well-organized, and correlated metadata. Therefore, we need to bridge the gap between pervasive sensing and user requirements. Furthermore, pervasive multimedia systems use many heterogeneous hardware and software platforms, such as laptops, PDAs, GPS, and RFID. Consequently, we need to combine all of them into a unified system.

The goal of the second part of this dissertation is to (1) gain a fundamental understanding of the research challenges in building pervasive multimedia systems, (2) design a unified system architecture that accommodates the heterogeneous nature of pervasive multimedia systems, and (3) develop such a pervasive multimedia system and evaluate its performance on real hardware platforms.
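One core operation implied by this goal is correlating each media frame's timestamp with asynchronous, lower-rate sensor readings. The sketch below shows a minimal version of that step: linearly interpolating an object's position at a frame's capture time from the two nearest timestamped samples. All values are made up, and the filtering and extrapolation a real system would also need are omitted:

```python
# Illustrative sketch of correlating a video frame with asynchronous sensor
# readings: linearly interpolate the (x, y) position at the frame timestamp
# from the two nearest timestamped samples. Sample values are hypothetical.
from bisect import bisect_left

def position_at(frame_t, readings):
    """Estimate (x, y) at time frame_t.

    readings: list of (timestamp, (x, y)) samples, sorted by timestamp.
    """
    times = [t for t, _ in readings]
    i = bisect_left(times, frame_t)
    if i == 0:
        return readings[0][1]        # before the first sample: clamp
    if i == len(readings):
        return readings[-1][1]       # after the last sample: clamp
    (t0, (x0, y0)), (t1, (x1, y1)) = readings[i - 1], readings[i]
    w = (frame_t - t0) / (t1 - t0)   # fractional distance between samples
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))

# A frame captured at t=2.5 s, between sensor fixes at t=2 s and t=3 s:
samples = [(1.0, (0.0, 0.0)), (2.0, (2.0, 0.0)), (3.0, (4.0, 2.0))]
print(position_at(2.5, samples))
```

For the example samples, the frame at t=2.5 is placed midway between the t=2 and t=3 fixes, at (3.0, 1.0).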

1.2.3 Pervasive Locationing and Identification Systems

As discussed in Section 1.1, pervasive multimedia systems exploit the pervasive locationing and identification services provided by the ubiquitous deployment of sensors and RFID tags to collect the spatial, temporal, and social context of media streams: when, where, and who. As a result, the cost, deployability, and accuracy of locationing and identification systems are critical factors in the success of pervasive multimedia systems. However, currently available locationing and identification systems [2, 3, 43, 45, 70, 75, 90] all use battery-powered sensors to locate and identify the objects to which they are attached. For reasons of cost, form factor, and limited battery life, it is impractical to equip every object with such sensors. Instead, we require a locationing and identification technology that is easily maintained and scales to hundreds or thousands of tagged objects.

The goal of the third part of this dissertation is to (1) identify the research challenges in building pervasive locationing and identification systems, and (2) design and implement a cost-effective, scalable pervasive locationing and identification system.

1.3 Thesis Contributions

This dissertation presents a study of system support for pervasive multimedia systems that achieves the three goals outlined in Section 1.2. The first part, consisting of Chapters 3 and 4, addresses the problem of achieving energy efficiency on battery-constrained mobile devices: two power management techniques are designed and implemented to reduce the energy consumption of mobile devices without sacrificing the QoS of applications. The second part addresses the challenges in building context-aware multimedia systems: a novel sensor-enhanced multimedia system is proposed and developed to provide context support to pervasive multimedia systems. The third part presents a cost-effective and scalable pervasive locationing and identification system; in particular, a novel pervasive locationing system incorporating RFID technology is proposed and studied.

In the rest of this section, we elaborate on the problems addressed by this thesis and our contributions.

• Power Management Techniques for Mobile Devices. Most multimedia applications are soft real-time applications that impose timeliness constraints, and the inability to meet these constraints impacts application correctness. For example, playback glitches will be observed in a video decoder if it cannot decode a frame within every frame interval. Consequently, power management techniques must not trade the QoS of applications for energy efficiency. For instance, when playing movies, the CPU speed (frequency) cannot be reduced too much, because the video player requires the CPU to run at a certain speed to decode video frames before their deadlines. In this dissertation, we focus on the problem of achieving energy efficiency for the CPU without degrading application QoS.

To achieve energy efficiency without sacrificing application performance, this thesis proposes two novel power management techniques. We design efficient profiling techniques to estimate the processing needs of applications, and we reduce the CPU frequency (speed) to match those needs. To bound the performance degradation of applications in the case of estimation errors, we propose CPU speed-setting strategies that use a feedback control loop to choose an appropriate CPU frequency for each process. Our experimental studies show that our power management techniques greatly reduce the energy consumption of the CPU while still satisfying applications' QoS requirements [56, 57].

• Context-aware Multimedia System. A pervasive multimedia system is a context-aware multimedia system: it produces media streams annotated with the spatial, temporal, and social context of media capture, for instance, when and where a video was captured, and who or what was present in it. By using the pervasive locationing and identification services provided by sensor networks, this annotation procedure can be automated and made highly accurate. Integrating sensor networks with multimedia systems requires properly handling several limitations of sensors: their coverage and range, their mobility, their resource constraints, and their accuracy.

To address these challenges, this thesis designs and implements a context-aware multimedia system. By exploiting the pervasive locationing services provided by sensor technology, our system can locate a mobile object with very high accuracy and record the identities and locations of objects along with images and videos. The system provides a pervasive infrastructure for wireless communication between sensors and recording devices, and it exploits a series of correlation, interpolation, extrapolation, and filtering techniques to produce video streams tagged with highly accurate context metadata (e.g., when and where a video was captured, and who or what is present in it). Our experiments show that our sensor-enhanced pervasive multimedia system can automatically annotate multimedia with highly accurate temporal, spatial, and social context [53].

• Pervasive Locationing and Identification System. In pervasive multimedia systems, the pervasive locationing and identification system provides the temporal, spatial, and social context of media capture: when, where, and who. The cost, deployability, and accuracy of locationing and identification systems are therefore extremely critical to the success of pervasive multimedia systems. However, currently available locationing and identification systems [2, 3, 43, 45, 70, 75, 90] are neither cost-effective nor scalable: they use expensive, battery-powered sensors to locate and identify objects, and they require a large amount of human effort for battery maintenance.

To provide a cost-effective and scalable locationing service for pervasive multimedia systems, this thesis proposes a locationing and identification system incorporating passive RFID technology [21, 89]. The basic idea is to use the location and directionality of RFID readers to infer the locations of nearby objects tagged with RFID tags. In particular, the system exploits the user's inherent mobility to produce readings of a tag from multiple vantage points, and it uses the intersection of the coverage regions from these readings to continually refine its estimate of the object's location. We demonstrate that our RFID-based pervasive locationing and identification system is scalable and can locate objects with high accuracy [54].
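The coverage-intersection idea can be illustrated with a toy model: each positive tag reading constrains the tag to lie inside the reader's coverage region, so intersecting regions from several vantage points shrinks the candidate area. The sketch below uses circular coverage on a 2D grid purely for illustration; the reader positions, range, and grid size are made up, and the actual system models a directional antenna in three dimensions:

```python
# Toy version of coverage-region intersection for tag localization.
# Assumptions (not from the real system): circular coverage of radius RANGE,
# a 20x20 unit grid of candidate cells, and three made-up reader positions.

GRID = 20      # 20x20 grid of candidate cells
RANGE = 6.0    # assumed reader coverage radius, in grid units

def coverage(reader_x, reader_y):
    """Set of grid cells within RANGE of a reader position."""
    return {(x, y) for x in range(GRID) for y in range(GRID)
            if (x - reader_x) ** 2 + (y - reader_y) ** 2 <= RANGE ** 2}

# Three positive readings of the same tag from different vantage points;
# the tag must lie in the intersection of the three coverage regions.
candidates = coverage(4, 4) & coverage(10, 4) & coverage(7, 9)
cx = sum(x for x, _ in candidates) / len(candidates)
cy = sum(y for _, y in candidates) / len(candidates)
print(f"{len(candidates)} candidate cells, centroid ~ ({cx:.1f}, {cy:.1f})")
```

Each additional vantage point can only remove candidate cells, which is why user mobility steadily tightens the location estimate.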

1.4 Structure of the Thesis

The rest of this thesis is structured as follows. We consider the problem of energy efficiency in Chapters 3 and 4: Chapter 3 presents an application-level power management technique for mobile devices, and Chapter 4 presents a time-series-based power management approach. Chapter 5 presents a sensor-enhanced video annotation (SEVA) system that uses sensor networks to automatically annotate media streams with context. Chapter 6 presents the design and implementation of Ferret, a pervasive locationing and identification system incorporating passive RFID technology. Finally, Chapter 7 concludes with a summary of our research contributions.


CHAPTER 2

RELATED WORK

Pervasive multimedia systems draw from several related areas: content-based media retrieval, sensor systems, sensor annotation, locationing systems, and power management techniques. Due to the overwhelming amount of work in these areas, we highlight only the most recent work in this chapter.

2.1 Content-based Media Retrieval

Content-based retrieval allows users to navigate large media libraries in a fast and human-friendly manner. There has been a great deal of work on content-based image retrieval in the community; Smeulders et al. provide an excellent survey in [80]. They identify two obstacles that content-based image retrieval must still overcome in order to gain wide acceptance: the "sensory gap" and the "semantic gap". The sensory gap is "the gap between the object in the world and the information in a (computational) description derived from a recording of that scene" [80]. For example, a car may be recognized as something other than a car if a tree stands in front of it; conversely, similar images of different objects may be recognized as images of the same object. The semantic gap is "the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation" [80]. For example, a vision system sees a picture of a person's birthday party merely as a collection of objects and people; the identities of those objects and people, the relationships between them, and the nature of the event are not represented.


To bridge the sensory gap, researchers have proposed a combination of learning- and vision-based object/face recognition techniques that use domain and world knowledge [18, 19, 47, 52, 69, 100]. The knowledge database includes several sources of general knowledge: literal laws, perceptual laws, physical laws, geometric and topological rules, category-based rules, and man-made customs. To recognize an object, the system must search the database using these laws and rules. Building a comprehensive knowledge database requires a great deal of human effort and experience. As a result, these techniques are error prone and have high computational requirements, and they are therefore unsuitable for large media libraries.

To bridge the semantic gap, researchers have proposed techniques that require users to manually enter the semantic context [31]. This manual procedure is normally performed long after the image was captured, and it requires enormous human effort. Manual processing of each frame or image is therefore cumbersome and subject to imprecise human memory, making it unsuitable for large collections of media archives.

All these techniques try to generate the temporal, spatial, and social context of media capture (when, where, and who/what) in order to bridge the sensory and semantic gaps. However, they are error prone and require substantial effort due to the limitations of knowledge databases and human memory. The emergence of numerous sensor technologies can automate the collection of media-capture context and provide highly accurate context. We expect sensor-enhanced multimedia systems to attract tremendous research interest in the future.

2.2 Sensor Systems

Small sensors can help us sense the environment and locate and identify objects. Their output is sensor data streams that can be used to determine the context of media capture. A great deal of recent work has focused on developing new sensor technologies, and several hardware platforms have recently been developed for portability, extensibility, and research prototyping, such as the Mica Motes [46], Telos [72], the XYZ [62], and Stargate [83].

The Mica series, released in 2001 by UC Berkeley, was carefully designed as a general-purpose platform for wireless sensor network (WSN) research. Mica [46], the first platform, uses an ATmega128 8-bit microcontroller that supports clock speeds only up to 16 MHz and has only 4 KB of RAM and 128 KB of flash memory. Mica2, the follow-on platform, replaces Mica's RFM TR1000 radio module with the Chipcon CC1000, which offers a higher data rate and tunable frequencies from 300 to 900 MHz. Mica2Dot is a smaller version of Mica2. The newest platform, MicaZ, replaces the CC1000 radio module with a CC2420 module, which provides more bandwidth and IEEE 802.15.4 compatibility. The Mica series is the de facto standard research platform in the community.

Telos [72] is the latest sensor platform developed by UC Berkeley. It provides IEEE 802.15.4 and USB support, uses a TI MSP430 16-bit microcontroller with 48 KB of flash and 10 KB of RAM, and communicates via a CC2420 radio module. Telos provides greater performance and throughput at only one-tenth the power consumption of previous mote platforms.

Stargate [83], designed by Intel and manufactured by Crossbow, is a higher-end sensor platform. It uses a 32-bit PXA processor whose speed can vary from 100 to 400 MHz, and it includes 32 MB of flash memory and 16 MB of SDRAM. Stargate has a mote interface for communicating with Mica and Telos sensors, and it also supports IEEE 802.11 and Ethernet communication.

XYZ [62], designed at Yale, is complementary to the previous sensor platforms. It has a larger dynamic operating range and a deep sleep mode that allows it to hibernate for prolonged periods. XYZ behaves like a MicaZ mote in its lowest power mode, and it provides functionality similar to Stargate in its most powerful mode.

RFID, both active and passive, has significant potential to provide low-cost, short-range identification for many consumer goods and can help identify objects [21].


2.3 Sensor Annotation of Multimedia

Several systems automatically annotate images, videos, and audio with sensor data such as GPS readings, light readings, and temperature readings, and use these data to aid media retrieval [1, 11, 17, 32, 67, 68, 84, 87].

Su et al. [84] augment film and video footage with sensor data such as light intensity, color temperature, and location, and they use light readings to evaluate the latency between an event's capture on video and its capture by their system.

In [11, 32, 67, 68, 87], each picture is automatically stamped with its location and time, and this information is used for content-based retrieval.

Toyama et al. [87] have built the World Wide Media eXchange (WWMX), a database for sharing geo-tagged images. They acquire time and geographic tags by manual entry, from image headers, and by matching timestamps between a photo and the location history from a GPS receiver. Combined with a map, the system allows users to effectively view images from specified locations.

Naaman et al. [68] have demonstrated how to use the time and GPS location of digital photographs to organize photo collections and help users recall and find photographs. In later work [67], they propose to automatically generate related contextual metadata, such as the local daylight status and weather conditions at the time and place a photo was taken, and they identify the categories of contextual metadata that are most useful for retrieving photographs. They acquire time and GPS location information in the same way as Toyama et al. do in [87].

Gemmell et al. [32] built SenseCam, a passive camera that includes an accelerometer/tilt sensor, a passive infrared sensor, a digital light sensor, a temperature sensor, and a camera module. In this system, changes in sensor data (e.g., light change, motion) and elapsed time intervals trigger image capture. The accelerometer/tilt and digital light sensors are used for power management of the camera module: the camera is shut down whenever movement and light changes fall below predefined thresholds. A handheld GPS provides location readings, with an onboard GPS planned for the future. The photos, sensor data, and GPS readings are uploaded into the MyLifeBits database [31], and the date/time correlation between photos and GPS readings is used to set the photos' locations.

Davis et al. [11] created the "Mobile Media Metadata" (MMM) prototype on Nokia 3650 camera phones. Their prototype automatically stamps each photo with the time and date of capture, the GSM network cell ID of the capture location, and the owner of the camera. It also allows users to manually annotate photos with the objects present and the activity. This spatial, temporal, and social contextual metadata can be combined and shared to infer the media content at a later time.

Aizawa et al. [1] developed a wearable system equipped with sensors including GPS, a gyroscope, an accelerometer, and a brain-wave sensor. The system continuously captures video along with the sensor readings. The GPS data (time and location) are used to extract key frames, which are then subjected to sophisticated processing such as image analysis; key frames are sampled by time, by distance, by movement speed, and by changes in speed and direction. They also propose and evaluate a conversation-scene detection scheme that uses human voice detection and face detection.

Somewhat differently from tagging images or videos with time and location information, Ellis et al. [17] explore tagging continuous audio archives with GPS positions to automatically collect information on changes in location and activity.

In summary, all these systems automatically record only two parameters of media capture (when and where); they cannot automatically annotate media with the objects present (who and what). Without such annotations, content-based retrieval is nearly impossible. Therefore, future work should concentrate on automatically recording information about the objects present, for instance, their names, types, and locations. To make this possible, objects need to make such information available to others; they can exploit RFID technology or ad-hoc networking technologies such as Bluetooth to publish it.

2.4 Locationing Systems

A locationing system is a critical component of pervasive multimedia systems; its accuracy, deployability, and cost are crucial to their success.

Researchers have been developing systems and technologies [2, 3, 43, 45, 70, 75, 90]

which automatically locate people, equipment, and other things. These systems differ in

many parameters, such as the physical phenomena used and the locationing techniques.

Hightower and Borriello provide an excellent overview of current systems [44], and we briefly survey the most recent work here.

Locationing systems use different physical phenomena to detect an object's presence, such as radio signals, ultrasound, and infrared. The use of radio signals can be further categorized as satellite-based, 802.11 WiFi-based, and RFID-based. To localize an object, people can choose from three locationing techniques, and locationing systems generally use them individually or in combination to determine an object's position [44]:

• Triangulation: This technique computes an object's position via lateration or angulation [44]. Lateration calculates the position of an object by measuring its distances from multiple known reference positions, and angulation uses angles instead of distances to compute an object's position.

• Proximity: This technique determines an object’s position by measuring its nearness

to a known set of reference points.

• Scene analysis: This technique examines the features of a view from a particular

vantage point to determine the position of the observer or of objects in the view.
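To make the first technique concrete, lateration reduces to a small least-squares computation: subtracting one range equation from the others linearizes the problem in the unknown position. The sketch below is illustrative only and is not drawn from any system surveyed here; the function name and anchor values are invented.

```python
import numpy as np

def laterate_2d(anchors, distances):
    """Estimate a 2D position from distances to known reference points.

    Subtracting the circle equation of the first anchor from the others
    yields a linear system A x = b in the unknown position x.
    """
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    x0, y0 = anchors[0]
    # One linear equation per remaining anchor.
    A = 2 * (anchors[1:] - anchors[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - x0 ** 2 - y0 ** 2)
    # A least-squares solve tolerates noisy distance measurements.
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Three anchors suffice for an unambiguous 2D fix.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = np.array([3.0, 4.0])
dists = [np.linalg.norm(true_pos - a) for a in anchors]
print(laterate_2d(anchors, dists))  # ≈ [3. 4.]
```

With noise-free distances the solve recovers the exact position; with measurement noise, the least-squares answer is the best linear fit over all anchors.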

In order to give a better overview of these locationing systems, we present a figure that categorizes them. This figure is adapted from the excellent table in [44].

Locationing System    Physical Phenomena    Locationing Technique          Accuracy
GPS                   Satellite             Lateration                     1-15 m
RADAR                 802.11                Scene analysis & lateration    3-4.3 m
Active Badge          Infrared              Proximity                      Room size
Active Bat            Ultrasound            Lateration                     9 cm
Cricket               Ultrasound            Proximity and lateration       3-5 cm
SpotON                RFID                  Lateration                     N/A
LANDMARC              RFID                  Proximity                      1-3 m

Figure 2.1. Current Locationing Systems

GPS

The Global Positioning System (GPS) [3] is the most widely used outdoor locationing

system. GPS uses a worldwide satellite constellation (24 satellites + 3 backups) to determine an object's geographic position. GPS satellites periodically transmit wireless signals. Upon receiving these signals, a GPS receiver triangulates its position by measuring its distances to at least 4 satellites. GPS estimates the distance to each satellite by taking time-of-flight measurements of the wireless radio signal. With a differential reference or the wide area augmentation system, GPS can provide an accuracy of 1 to 5 meters [44]. In this design, receivers do not communicate with satellites, and satellites have no knowledge of receivers. The advantage of this design is that it can support an unlimited number of users worldwide and provides privacy; the disadvantage is the complex design and the expensive cost of receivers ($100 per receiver). The European Galileo system uses similar techniques to GPS, and it is expected to reach better than 10 cm accuracy by 2008.

RADAR

RADAR [2], developed at Microsoft Research, is a building-wide WiFi-based location-

ing system using lateration and scene analysis. At the base stations, RADAR measures

the signal strength and signal-to-noise ratio of wireless signals emitted by network devices,

and then it uses this data to centrally compute the 2D position of a network device within

a building via scene analysis technique or lateration technique. By using the physical phe-

nomenon that the signal strength attenuates as the distance to the source increases, RADAR

uses the signal strength to measure the distance to the base station unlike GPS which uses

time-of-flight information. With the distance measurements, RADAR uses lateration tech-

nique to estimate the 2D positions of network devices. In using scene analysis, RADAR

builds a database of signal strength and signal-to-noise measurements by observing the radio transmissions at many positions throughout the building. The location of an 802.11 network device can then be looked up in this database. The scene analysis implementation of RADAR can locate an object within 3 meters with 50% probability, while the lateration implementation has an accuracy of 4.3 meters at the same probability. RADAR provides two advantages: it uses the same infrastructure that provides general-purpose 802.11 wireless networking services, and it requires no devices beyond normal 802.11 WiFi hardware. However, RADAR suffers from the substantial human effort required to build the database of signal strength and signal-to-noise measurements.
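At its core, RADAR's scene-analysis mode is a nearest-neighbor search in signal space over the surveyed database. The sketch below shows that basic lookup under an assumed data layout; it omits refinements RADAR applies, such as averaging the positions of several nearest neighbors.

```python
import numpy as np

def radar_locate(sample, fingerprints):
    """Scene-analysis lookup: return the surveyed position whose recorded
    signal-strength vector is closest (Euclidean) to the observed sample.

    fingerprints: list of (position, rss_vector) pairs built during the
    site survey; sample: rss_vector observed at the base stations.
    """
    sample = np.asarray(sample, dtype=float)
    best_pos, _ = min(
        fingerprints,
        key=lambda fp: np.linalg.norm(np.asarray(fp[1], dtype=float) - sample))
    return best_pos
```

For example, an observed vector of [-41, -69, -79] dBm would match a surveyed point whose fingerprint is [-40, -70, -80] far more closely than points surveyed elsewhere in the building.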

Active Badge

Active Badge [90] is a diffuse infrared-based (IR) indoor locationing system developed

at Olivetti Research Laboratory, now AT&T Cambridge. Active Badge uses the proximity technique to determine an object's position. In Active Badge, each object is tagged with an infrared badge for localization purposes. The infrared badge emits a unique identifier

periodically or on demand. A central server collects this data from fixed infrared sensors

around the building and then estimates the badge's location accordingly. A badge's location is represented symbolically using the location of infrared sensors, for example, in the

kitchen or in the living room. Due to interference from spurious infrared emissions, the Active Badge system has difficulty in environments with fluorescent lighting or direct sunlight, and it can only provide room-level accuracy. Active Badge provides centralized management and uses simple infrared badge devices, at the cost of privacy.

Active Bat

Active Bat [43], from AT&T, is an ultrasound-based locationing system for indoor objects. Active Bat uses a time-of-flight lateration technique to locate objects within an accuracy of 9 cm. Users and objects need to carry Bat tags in order to be located. In Active Bat, upon receiving a request sent by the controller via short-range radio, a Bat tag responds with an ultrasonic signal to a grid of ceiling-mounted ultrasound receivers. At the same time, a reset signal is sent to these receivers from the controller via a serial network. Each sensor computes its distance to the Bat tag by measuring the time interval from the reset to the arrival of the ultrasonic signal. Finally, a central controller applies the lateration technique to these distance measurements to triangulate the tag's position. Due to the short range (several meters)

of ultrasound, Active Bat requires a large fixed-sensor infrastructure deployment, and thus it is harder to deploy and has a higher cost. Active Bat uses simple Bat tags and provides centralized management; however, it pays a price in scalability and privacy.

Cricket

In contrast to Active Bat, the indoor ultrasound-based Cricket locationing system [75]

uses ultrasound beacons and receivers in the infrastructure and objects, respectively. A

beacon periodically emits a radio signal along with an ultrasonic signal. Because the radio signal travels at the speed of light, Cricket treats its arrival as effectively instantaneous. Upon receiving both, the receiver uses the time-of-flight of the ultrasonic signal to measure its distance to the beacon. With distances to more than three beacons, the receiver performs all the triangulation

computations to locate itself, achieving an accuracy of 3-5 cm. When only one beacon is received, Cricket falls back on the proximity technique to estimate the object's location. By moving the computational burden from the infrastructure to the mobile receivers, Cricket provides the advantages of privacy and decentralized scalability, while it suffers from the lack of centralized management and requires more complicated mobile receivers.
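The per-beacon distance computation described above amounts to one multiplication: since the radio pulse is treated as arriving instantly, the whole radio-to-ultrasound delay is ultrasound time-of-flight. A minimal sketch, assuming a nominal room-temperature speed of sound (343 m/s, not a value taken from the Cricket paper):

```python
def cricket_distance(dt_seconds, speed_of_sound=343.0):
    """Distance to a beacon from the delay between hearing the radio
    signal and hearing the ultrasonic signal.

    The RF pulse is treated as arriving instantly, so the entire delay
    dt_seconds is attributed to ultrasound time-of-flight.
    """
    return speed_of_sound * dt_seconds

print(cricket_distance(0.01))  # ~3.43 m for a 10 ms delay
```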

SpotON

SpotON [45] uses active RFID tags to locate indoor objects. As in RADAR, SpotON uses radio signal attenuation to estimate a tag's distance to the base station. A central server aggregates the distance measurements from several base stations to triangulate the position of the tagged object. A complete SpotON system has not yet been made available.

LANDMARC

LANDMARC [70] is another active RFID-based indoor locationing system. LANDMARC uses the proximity technique to estimate the position of an object tagged with an active RFID tag. LANDMARC deploys multiple fixed RFID readers and reference tags as infrastructure, and measures the tracked tag's nearness to the reference tags by the similarity of their signals as received at multiple readers. Finally, LANDMARC uses a weighted sum (each weight proportional to the nearness) of the positions of the reference tags to determine the position of the tag being tracked. LANDMARC can locate an object in 2D with an accuracy of 1-3 meters. Similar to SpotON, LANDMARC is also a centrally controlled system.
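The weighted-sum step can be sketched as follows. The specific weighting function used here (inverse squared Euclidean distance in signal space, over the k nearest reference tags) is one plausible reading of the description above, not code taken from LANDMARC itself.

```python
import numpy as np

def landmarc_estimate(tracked_rss, ref_rss, ref_pos, k=4):
    """Estimate a tagged object's 2D position from signal-strength similarity.

    tracked_rss: signal strengths of the tracked tag at each reader, shape (R,)
    ref_rss:     signal strengths of each reference tag, shape (M, R)
    ref_pos:     known 2D positions of the reference tags, shape (M, 2)
    """
    tracked_rss = np.asarray(tracked_rss, dtype=float)
    ref_rss = np.asarray(ref_rss, dtype=float)
    ref_pos = np.asarray(ref_pos, dtype=float)
    # Euclidean distance in signal space measures nearness to each reference tag.
    E = np.linalg.norm(ref_rss - tracked_rss, axis=1)
    nearest = np.argsort(E)[:k]
    # Weight the k nearest reference tags inversely to their signal distance.
    w = 1.0 / (E[nearest] ** 2 + 1e-9)
    w /= w.sum()
    return w @ ref_pos[nearest]
```

A tracked tag whose readings closely match one reference tag's readings is pulled strongly toward that tag's known position, which is exactly the intuition behind the nearness weighting.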

Summary

All these systems use a combination of infrastructure and battery-powered mobile devices to determine the position of an object. They differ in the hardware (wireless radio, infrared, and RFID) and the locationing techniques (triangulation, proximity, and scene

analysis) they use. Consequently, their positioning accuracies vary significantly, from sev-

eral centimeters to tens of meters. These systems successfully demonstrate the feasibility

of location sensing. However, due to cost and limited battery life, it is impractical to equip every object with such an expensive, battery-powered tracking device. In the future, research should focus on lowering cost, reducing infrastructure, improving maintainability and scalability, and creating systems that work both indoors and outdoors. Passive RFID tags are an ideal solution because they need no battery and cost only a few cents per tag. Ad-hoc locationing techniques without infrastructure or central control are another exciting topic because they can provide high maintainability and scalability at low cost. However, the lack of infrastructure will affect locationing accuracy. To address this, infrastructure-based systems can be combined with ad-hoc techniques to achieve a balance among accuracy, scalability, and maintainability.

2.5 Power Management Techniques

Power management techniques for mobile devices have received considerable research

attention. Most of the proposed techniques either use dynamic voltage and frequency scal-

ing (DVFS) [9, 23, 24, 59, 60, 61, 64, 73, 74, 82, 96, 97] or application/middleware-based

adaptation [26, 27, 78, 85] for energy savings. DVFS approaches extract energy savings

by varying the processor speed; the techniques do not affect the amount of processing per-

formed by the application—the processing is merely spread over longer time periods by

lowering CPU speeds. In contrast, middleware-based adaptation approaches vary quality

or data fidelity and thus, the amount of processing performed by the application to extract

energy savings. We review related work in both categories.

Application or middleware-based adaptation techniques trade the computational over-

head for application quality; energy savings are extracted by reducing video quality [78,

85], document quality [26], or data fidelity [27], and thus the processing overheads. Proxy-based adaptation for reducing streaming video quality has been explored in [78, 85]. Puppeteer adapts document quality (i.e., picture resolution, color depth, animation) for energy

savings of office applications [12, 26]. The impact of adapting the data fidelity on energy

savings of several applications has also been demonstrated in the Odyssey system [27, 71].

In contrast, DVFS techniques do not reduce the amount of processing overhead im-

posed by an application; instead they vary the CPU speed to match the CPU load and

extract energy savings [9, 23, 24, 59, 60, 61, 64, 73, 74, 82, 96, 97]. DVFS techniques fall

into four categories: hardware-based, OS-based, cooperative application-OS-based, and

application-directed methods. Hardware-based approaches such as Longrun [25] measure

system utilization in hardware and choose a system-wide speed setting based on the cur-

rent utilization. An online hardware approach for voltage and frequency control in multiple

clock domain microprocessors has been proposed in [95]. OS-based approaches determine

a system-wide CPU setting based on the processor demands of the currently active tasks

[23, 24, 59, 60, 74]. In this approach, individual applications do not have any direct control

over the CPU power settings. A single system-wide CPU setting is determined, typically

based on the needs of the most resource-hungry application, even when a mix of applications is executing on the processor. Furthermore, the operating system needs to infer the

processing needs of the applications using online measurements and can incur estimation

errors.

In cooperative application-OS approaches, the application provides some domain-specific

information to the kernel. The OS kernel and the CPU scheduler use this information for

CPU speed setting and/or scheduling. The GRACE-OS project [96, 97] proposes a cooper-

ative application/OS approach to save energy for periodic multimedia applications. It uses

probability distributions of CPU usage of periodic applications and knowledge of applica-

tion periods (which is supplied by the application) for choosing CPU speeds. Aperiodic or

non-real-time applications are currently not handled by the approach.

Similarly, the Milly Watt project [16] explores the design of a power-based API that

allows a partnership between applications and the system in setting energy use policy. In

the context of this project, researchers have proposed a Currentcy model that unifies energy accounting over diverse hardware components and enables fair allocation of available energy among applications [98], as well as a prototype energy-centric operating system, ECOSystem, that implements explicit energy management techniques from the system point of view [99]. Their goal is to extend battery lifetime by limiting the average discharge rate and to

share this limited resource among competing tasks according to user preferences.

A cooperative power management approach was proposed in [65] to unify low-level architectural optimizations (CPU, memory, register), OS power-saving mechanisms (Dynamic Voltage and Frequency Scaling), and adaptive middleware techniques (admission control,

optimal transcoding, network traffic regulation). In this technique, interaction parameters

between the different levels are identified and optimized to significantly reduce power con-

sumption.

Finally, there has been some work on application-level power management. Researchers

have proposed several different application-controlled DVFS techniques for video decod-

ing [9, 61, 64, 73, 82]. While some require offline estimation of CPU demands for decoding

[64], others can estimate the CPU demands online [9, 61, 73, 82]. However, all of these

techniques implicitly assume only a single application is executing on the CPU and grant

complete control of the processor settings to the video decoder.

CHAPTER 3

APPLICATION LEVEL POWER MANAGEMENT

3.1 Introduction

Modern mobile devices use energy judiciously by incorporating a number of power

management features. For instance, modern processors such as Intel’s XScale and Pentium-

M and Transmeta’s Crusoe incorporate dynamic voltage and frequency scaling (DVFS) ca-

pabilities. DVFS enables the CPU speed to be varied dynamically based on the workload

and reduces energy consumption during periods of low utilization [38, 50, 93]. In gen-

eral, such techniques must be carefully designed to prevent the processor slowdown from

degrading the responsiveness of the application.

DVFS techniques fall into three categories: hardware-based, OS-based, and cooperative

application-OS-based. Hardware approaches such as LongRun [25] measure processor

utilization at the hardware level and vary the CPU speed based on the measured system-

wide utilization. OS-based approaches determine a system-wide CPU setting based on the

processor demands of the currently active tasks [24, 23, 59, 60, 74]. These approaches infer

the processing needs of applications using online measurements and can incur estimation

errors.

In cooperative application-OS approaches [96, 97], the application provides some domain-

specific information to the kernel. The OS kernel and the CPU scheduler use this infor-

mation for CPU speed setting and/or scheduling. GRACE-OS [96, 97] proposes such a

cooperative application/OS approach for periodic multimedia applications. Aperiodic or

non-real-time applications are currently not handled by the approach.

Figure 3.1. The Chameleon Architecture.

In what follows, we propose an approach, namely application-level power management,

where applications are given complete control over their CPU power settings—an applica-

tion is allowed to specify its CPU power setting independently of other applications, and

the operating system isolates an application from the settings used by other applications.

3.2 Chameleon Architecture

Chameleon consists of three key components as shown in Figure 3.1. First, Chameleon

provides an OS interface that enables applications to query the kernel for resource usage

statistics and to convey their desired power settings to the kernel. The details of the interface

are presented in Section 3.5. In general, a user-level power management strategy combines

OS-level resource usage statistics with application domain knowledge to determine a desir-

able CPU power setting. This can be achieved in one of two ways. An application can use

the Chameleon interface to directly modify its own power settings. Alternatively, an appli-

cation can delegate the task of power management to a user-level power manager. Such a

power manager can use resource usage statistics and any application-supplied information

to adjust the application’s power settings on its behalf.

Second, Chameleon includes a modified CPU scheduler that supports per-process CPU

power settings and application isolation. The scheduler maintains the current power set-

tings for each process and conveys these settings to the underlying processor whenever the

process is scheduled for execution (i.e., at context switch time). The application’s power

settings can be modified at any time via system calls, either by the application itself or by

a user-level power manager acting on its behalf.

One concern is that if one application misuses the speed setting, either maliciously or

inadvertently, it may degrade the performance of other applications. In Chameleon, an application's power settings take effect only when it is scheduled. This is the only mechanism needed to provide isolation; matters of policy, such as CPU shares or energy allocations, should be implemented separately. For instance, if an application slows the CPU and thereby uses more CPU time, this is no different from a process misbehaving by entering an

endless loop, something to be managed by scheduling policy. We experimentally demonstrate Chameleon with Linux time sharing, a best-effort scheduler, and with start-time fair queuing, a QoS-aware proportional-share scheduler. In the time-sharing scheduler, each application will be scheduled for a quantum regardless of the behavior of other applications, while the proportional-share scheduler provides stronger guarantees on the frequency and duration of those quanta. In fact, Chameleon's architecture requires no direct modifications to the CPU scheduling algorithm itself, and as a result Chameleon is compatible with any scheduling algorithm.

Third, Chameleon implements a speed adapter that maps application-specified power

settings to the nearest CPU speed actually supported by the hardware. In particular, an

application specifies the desired CPU speed as a fraction f_i of the maximum processor

speed. The speed adapter maps this fraction to the nearest supported CPU speed; since

different hardware processors support different discrete speeds, such an approach ensures

portability across hardware.
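The speed adapter's mapping can be sketched in a few lines; the frequency table below is hypothetical, not the speed set of any particular processor, and the function name is invented.

```python
def nearest_speed(fraction, supported_mhz):
    """Map a requested fraction of full speed to the nearest supported frequency."""
    target = fraction * max(supported_mhz)
    # Pick the discrete speed minimizing the distance to the requested speed.
    return min(supported_mhz, key=lambda f: abs(f - target))

# Example: discrete speeds of a hypothetical DVFS-enabled processor, in MHz.
speeds = [200, 400, 600, 800, 1000]
print(nearest_speed(0.45, speeds))  # 400
```

Because the application only ever names a fraction of full speed, the same power management code runs unchanged on processors with different discrete speed tables.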

3.3 Application-level Power Management

Independent of the particular application, a user-level power management policy consists of three key steps. (i) Estimate processor demand: a combination of application domain knowledge and past CPU usage statistics is used to estimate processor demand in the near future. (ii) Estimate processor availability: this step explicitly accounts for the impact of other concurrent applications by estimating the amount of CPU time that will be available to the application in their presence. (iii) Determine processor speed setting: the third step chooses a speed setting

that attempts to “match” the processor demand to the processor availability. For instance,

if the actual demand is only half of the available CPU time, then the application can run the

processor at half speed and spread its CPU demand over the available time. In contrast, if

the processor demand and the processor availability are roughly equal, the application may

choose to run the processor at full speed.

In the rest of this section, we show how these ideas can be instantiated for four specific

applications that belong to three different application classes—soft real-time, interactive

best-effort, and batch.

3.3.1 MPEG Video Decoder

An MPEG video decoder is an example of a soft real-time application. Many multime-

dia applications such as DVD playback, audio players, music synthesizers, video capture

and editors belong to this category. A common characteristic of these applications is that

data needs to be processed under timeliness constraints. For instance, in a video decoder, frames need to be decoded and rendered at the playback rate—for a 30 frames/s video, a

frame needs to be decoded once every 33ms. The inability to meet timeliness constraints

impacts application correctness; playback glitches will be observed in a video decoder, for

example.

Figure 3.2. Three scenarios for task execution in a soft real-time application.

A soft real-time application can use the following general strategy for user-level power

management. Assume that the application executes a sequence of tasks; the decoding of

a single frame is an example of a task. Let c denote the amount of CPU time needed to execute this task at full processor speed. Let d denote the deadline of this task and let t denote the task begin time. Further, let e denote the amount of CPU time that will actually be allocated to the application for this task before its deadline. The parameter c captures processor demand, while e captures processor availability by accounting for the presence of other concurrent tasks in the system. In a time-sharing scheduler, for instance, the larger the number of runnable tasks, the smaller the value of e. In a QoS-aware scheduler that allows a fixed fraction of the CPU to be reserved for an application, the value of e will be independent of other tasks in the system.

Given the processor demand c, processor availability e, and deadline d, the processor speed can be chosen as follows.

Case 1. If t + c > d, then it is impossible to meet the task deadline (see Figure 3.2(a)).

Essentially, the task started “too late,” and neither the CPU scheduler nor the power man-

agement strategy can rectify the situation. In such a scenario, the appropriate policy is to

run the processor at full speed to mitigate the effects of the missed deadline.

Case 2: If e < c, then the processor demand exceeds processor availability for this

task (see Figure 3.2(b)). Although it is feasible to meet the deadline by allocating sufficient

CPU time to the task, the CPU scheduler is unable to do so due to the presence of other concurrent applications. Since application performance will suffer due to insufficient processor

availability, the power management strategy should not further worsen the situation. Thus,

the application should run at full processor speed for this task.

The final scenario assumes that neither case 1 nor case 2 holds.

Case 3: If t + c < d, then the task can finish before its deadline at full processor speed (see Figure 3.2(c)). In this case, the policy should slow down the CPU such that the demand c is spread over the amount of time the task will execute on the CPU, while still meeting the deadline. The CPU frequency f should be chosen as

    f = \frac{c}{\min(e, d - t)} \cdot f_{max}

where f_{max} is the maximum processor speed (frequency).

This strategy is applicable to a variety of soft real-time applications, so long as the

notion of a task is defined appropriately. In a video decoder, (i) decoding of each frame

represents a task, (ii) c denotes the time to decode the next frame at full speed, (iii) e denotes the estimated duration for which the decoder will be scheduled on the CPU until the frame deadline, and (iv) d denotes the playback instant of the frame (as determined by the playback rate of the video).
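Taken together, the three cases reduce to a small speed-selection routine. The sketch below expresses the policy as described above, returning the chosen speed as a fraction of f_max; the function name and argument order are illustrative.

```python
def choose_speed_fraction(t, c, d, e):
    """Return the CPU speed (as a fraction of full speed) for one task.

    t: task begin time, c: CPU demand at full speed,
    d: deadline, e: CPU time available to the task before the deadline.
    """
    if t + c > d:   # Case 1: deadline unreachable even at full speed.
        return 1.0
    if e < c:       # Case 2: demand exceeds availability.
        return 1.0
    # Case 3: stretch the demand c over the usable time, capped at full speed.
    return min(1.0, c / min(e, d - t))
```

For a frame that needs 10 ms of full-speed decoding, with 20 ms of CPU time available before a 30 ms deadline, the routine selects half speed, spreading the work over the slack instead of idling.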

While d is known, the parameters c and e need to be estimated for each frame.

Estimating processor demand: Processor demand is determined by estimating frame decode times. We consider mplayer, an open-source video decoder that supports both MPEG-2 and MPEG-4 playback. Note that MPEG-2 is widely used for DVD playback, while MPEG-4 is used by commercial streaming systems such as QuickTime and Windows Media. Using mplayer, we encoded a number of MPEG-2 and MPEG-4 video clips at different bit rates and different spatial resolutions. These video clips were decoded by an instrumented mplayer that measured and logged the decode time of each frame at full processor speed. We analyzed the resulting traces by studying the first-order and second-order statistics of the decode times and frame sizes for each frame type (i.e., I, P, B). In

our analysis we seek to: (i) show that it is likely that the frame size and frame decoding

time are correlated, and (ii) model this relationship using a linear function.

To demonstrate the correlation between frame size and decoding time, we start with the null hypothesis that the two random variables s and e, corresponding to the frame size and the frame decoding time, have a correlation coefficient of 0, i.e., they are not correlated. By

measuring N samples of s and e, we can obtain an estimate of the correlation coefficient:

    r_{se} = \frac{\sum_{i=1}^{N} (s_i - \bar{s})(e_i - \bar{e})}{\left[ \sum_{i=1}^{N} (s_i - \bar{s})^2 \sum_{i=1}^{N} (e_i - \bar{e})^2 \right]^{1/2}},    (3.1)

where \bar{s} and \bar{e} are the sample means of the frame sizes and decoding times. Using the result from

[6], we know that the acceptance region of the null hypothesis is:

    -z_{\alpha/2} \le \frac{\sqrt{N-3}}{2} \ln\left[ \frac{1 + r_{se}}{1 - r_{se}} \right] < z_{\alpha/2},    (3.2)

where z_{\alpha/2} is the z-score at an \alpha level of significance. So if A = \frac{\sqrt{N-3}}{2} \ln[\frac{1 + r_{se}}{1 - r_{se}}] falls outside the acceptance region of zero correlation bounded by \pm z_{\alpha/2}, the hypothesis that the actual correlation is zero is rejected at the \alpha level of significance. In other words, if A falls outside this region, the hypothesis is rejected, and we can infer that there likely exists a correlation between frame sizes and decoding times.

Given a 0.02 level of significance (±z_{\alpha/2} = ±2.33) and a large set of measurements from multiple video clips, we measured the sample correlation coefficient and the parameter A to test whether it falls within the acceptance region. Figure 3.3 (MPEG-2) and Figure 3.4 (MPEG-4) show the results of these measurements. Because A falls outside the acceptance region bounded by ±2.33, we can be reasonably assured that s and e are correlated.
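The test of Equations 3.1 and 3.2 is straightforward to reproduce. The sketch below computes r_se and the Fisher-transformed statistic A from paired samples and compares |A| against the z-score bound; the function name and synthetic data are illustrative.

```python
import math

def correlation_significant(s, e, z_crit=2.33):
    """Test whether paired samples s and e are correlated.

    Computes the sample correlation r_se (Eq. 3.1) and the statistic
    A = (sqrt(N-3)/2) * ln((1+r)/(1-r)) (Eq. 3.2); correlation is
    inferred when |A| exceeds the z-score bound z_crit.
    """
    n = len(s)
    ms, me = sum(s) / n, sum(e) / n
    cov = sum((si - ms) * (ei - me) for si, ei in zip(s, e))
    var = (sum((si - ms) ** 2 for si in s)
           * sum((ei - me) ** 2 for ei in e)) ** 0.5
    r = cov / var
    A = math.sqrt(n - 3) / 2 * math.log((1 + r) / (1 - r))
    return r, A, abs(A) > z_crit
```

Feeding in frame-size/decode-time pairs from a trace would reproduce the r_se and A columns of Figures 3.3 and 3.4.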

These results show a significant correlation between the frame size and decoding time,

allowing us to form a linear predictor of the decoding time. This is corroborated by the

findings of a prior study on MPEG-2 where an approximate linear relationship between

frame size and decode times was observed [5].


Res.      Frame Type   Bit-Rate (Kbps)   r_se     A
352x288   I            1120.0            0.8956   111.7280
352x288   P            1120.0            0.3443   47.7959
352x288   B            1120.0            0.1808   39.7774

Figure 3.3. Correlation Coefficients of MPEG-2 Videos

Res.      Frame Type   Bit-Rate (Kbps)   r_se     A
352x240   I            630.5             0.9045   42.0324
352x240   P            630.5             0.7664   492.1438
512x288   I            705.5             0.8201   40.9122
512x288   P            705.5             0.8084   455.9816
576x256   I            775.4             0.9162   88.7301
576x256   P            775.4             0.7667   389.0298
640x272   I            1290.9            0.8824   48.3261
640x272   P            1290.9            0.6464   216.6028
640x352   I            679.7             0.6861   50.9520
640x352   P            679.7             0.8217   486.8483

Figure 3.4. Correlation Coefficients of MPEG-4 Videos

We have constructed a predictor that uses the type and size of each frame to compute its

decode time. A key feature of our predictor is that the prediction model is parameterized

at run-time to determine the slope and intercept of the linear function. To do so, the video

decoder stores the observed decoding times of the previous n frames, scales these values

to the full-speed decoding time (since the observed decode times may be at slower CPU

speeds), and uses these values to periodically recompute the slopes and the intercepts of the

linear predictor using a least-squares fit. This not only enables the predictor to account for

differences across video clips (e.g., different bit rates require different linear predictors), it

also accounts for variations within a video (e.g., slow moving scenes versus fast moving

scenes in a video). The parameterized predictor is then used to estimate the decode time of

each frame at full processor speed.

For instance, given window size n, suppose we have the last n I-frame sizes and decoding
times; we then start to decode a new I frame and already know the size of this


new frame. Let s_i and e_i denote the frame size and the full-speed decoding time of the ith
I frame, respectively; let s_{n+1} denote the frame size of the new I frame and e_{n+1} its
predicted full-speed decoding time. Then e_{n+1} is given by a least-squares fit:

s̄ = ( Σ_{i=1}^{n} s_i ) / n

ē = ( Σ_{i=1}^{n} e_i ) / n

b = Σ_{i=1}^{n} (s_i − s̄) e_i / Σ_{i=1}^{n} (s_i − s̄)²        (3.3)

a = ē − b · s̄

e_{n+1} = a + b · s_{n+1}

In the predictor shown in Equation 3.3, the window size n has a great impact on the
performance of the predictor; choosing an appropriate n is thus an important issue in the design
of such a linear regression predictor. To do this, we applied the linear regression predictor
to our collected traces while varying the window size n from 5 to 50, and then measured
the absolute accuracy of the linear regression predictor with different window sizes. In
particular, we determined how often the predictor was within 1 ms of the actual decode
time. At a frame interval of 33 ms, an accuracy better than 1 ms makes providing on-time
frame decoding straightforward.

As shown for two sample videos in Figure 3.5, the linear regression predictor achieves
the best accuracy in most cases when the window size n is less than 10, and the accuracy
level varies little in that range. We found similar results for other frame sizes and
videos. Therefore, we choose a window size of 8 for our predictor, as the division operations
of Equation 3.3 can then be transformed into shift operations, reducing the computational
cost.
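A minimal sketch of such a sliding-window predictor follows. The struct and function names are illustrative; doubles are used here for clarity, whereas the shift optimization mentioned above applies when sizes and times are kept as integers, since the window size is a power of two:

```c
#include <stddef.h>

#define WINDOW 8   /* power of two: modulo becomes a mask, /WINDOW a shift
                      in an integer implementation */

/* Sliding window of the last WINDOW frame sizes and full-speed decode
   times for one frame type (I, P, or B). */
struct lr_window {
    double s[WINDOW], e[WINDOW];
    size_t next;                       /* circular index */
};

void lr_observe(struct lr_window *w, double size, double time) {
    w->s[w->next] = size;
    w->e[w->next] = time;
    w->next = (w->next + 1) & (WINDOW - 1);   /* mask instead of modulo */
}

/* Predict the full-speed decode time of a frame of size s_new using the
   least-squares fit of Equation 3.3. */
double lr_predict(const struct lr_window *w, double s_new) {
    double sbar = 0.0, ebar = 0.0;
    for (int i = 0; i < WINDOW; i++) { sbar += w->s[i]; ebar += w->e[i]; }
    sbar /= WINDOW;  ebar /= WINDOW;          /* divide by a power of two */

    double num = 0.0, den = 0.0;
    for (int i = 0; i < WINDOW; i++) {
        num += (w->s[i] - sbar) * w->e[i];
        den += (w->s[i] - sbar) * (w->s[i] - sbar);
    }
    double b = (den > 0.0) ? num / den : 0.0;  /* guard constant-size window */
    double a = ebar - b * sbar;
    return a + b * s_new;
}
```

The decoder would call `lr_observe` after each frame with the measured time scaled up to full speed, and `lr_predict` before decoding the next frame of that type.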

After choosing a window size of 8, it is important to know the distribution of the absolute
error. Figure 3.6 presents the accuracy of our predictors for all three frame
types (i.e., I, P, B) with window size 8 on two sample videos. Our experiments show that


[Figure: error rate (0–100%) versus window size (5–50) for (a) MPEG-4 (640x352), I and P frames, and (b) MPEG-2 (352x288), I, P, and B frames.]

Figure 3.5. Accuracy rate in predicting frame decode time within 1 ms with varying window size

[Figure: empirical CDFs of absolute prediction error (0–20 ms), probability from 0.6 to 1, for (a) MPEG-4 (640x352), I and P frames, and (b) MPEG-2 (352x288), I, P, and B frames.]

Figure 3.6. Empirical CDF of error in predicting frame decode times with window size 8

our predictor can almost always estimate frame decoding times within 5 ms—even a frame
that is 5 ms late can be masked by a small amount of buffering in a video player.

Estimating processor availability: Using the Chameleon interface, the application can

obtain the start times and the end times of the previous k instances where the application

was scheduled on the CPU. This history of quantum durations and the start times of the

quanta provide an estimate of how much CPU time was allocated to the application in the

recent past. Chameleon uses an exponential moving average of these values to determine


the amount of CPU time that is likely to be allocated to the application per unit time, and

this yields the processor availability over the next d − t time units.
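A sketch of such an availability estimator follows; the text does not give Chameleon's actual smoothing constant, so ALPHA here is an assumed value, and the names are illustrative:

```c
/* Exponential moving average of the application's CPU share.
   ALPHA is an assumed smoothing factor; the dissertation does not
   specify Chameleon's actual constant. */
#define ALPHA 0.25

struct avail_est {
    double avg;      /* estimated CPU share (fraction of wall time) */
    int primed;      /* has at least one observation been fed in? */
};

/* Feed one scheduling quantum: the application ran for `ran` time units
   out of `elapsed` wall-clock units since the previous quantum began. */
void avail_observe(struct avail_est *a, double ran, double elapsed) {
    double share = (elapsed > 0.0) ? ran / elapsed : 0.0;
    if (!a->primed) { a->avg = share; a->primed = 1; }
    else            a->avg = ALPHA * share + (1.0 - ALPHA) * a->avg;
}

/* Estimated CPU time available over the next `horizon` time units,
   e.g. horizon = d - t for a frame with deadline d at current time t. */
double avail_predict(const struct avail_est *a, double horizon) {
    return a->avg * horizon;
}
```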

Determining processor speed: Given an estimate c of the frame decoding time and e
of the processor availability, the actual CPU frequency f is chosen in mplayer as follows:

f = f_max                                       if t + c > d
    f_max                                       if e < c
    min( c · f_max / min(e, d − t), f_max )     otherwise        (3.4)

The Chameleon speed adapter then maps the computed f to the closest supported CPU

speed that is no less than the requested speed.
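Equation 3.4 and the adapter's rounding step can be sketched as follows; the function names are illustrative:

```c
/* Choose a CPU frequency for decoding one frame (Equation 3.4).
   c: predicted full-speed decode time, e: predicted CPU availability
   over the next d - t time units, t: current time, d: frame deadline,
   f_max: maximum CPU frequency. */
double choose_frequency(double c, double e, double t, double d, double f_max) {
    if (t + c > d || e < c)            /* already late, or too little CPU */
        return f_max;
    double budget = (e < d - t) ? e : (d - t);
    double f = c * f_max / budget;     /* stretch decoding over the budget */
    return (f < f_max) ? f : f_max;
}

/* Map the computed frequency to the closest supported speed that is no
   lower, as the Chameleon speed adapter does. `speeds` is ascending. */
double quantize_up(double f, const double *speeds, int n) {
    for (int i = 0; i < n; i++)
        if (speeds[i] >= f) return speeds[i];
    return speeds[n - 1];
}
```

For example, a frame with a 10 ms full-speed decode time and 40 ms until its deadline can run at a quarter of the maximum frequency, which the adapter then rounds up to the nearest supported speed.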

Implementation: We augmented mplayer with the frame decoding time predictor and
the speed setting strategy. Our modifications were primarily restricted to the beginning
and end of the frame decoding method in mplayer. We used gettimeofday to measure the
frame decoding time and the Chameleon interface to estimate the processor availability.
Other modifications involved using the Chameleon interface to set the CPU speed using
Equation 3.4. In all, the implementation of the frame decoding time predictor involved 221
lines of C code, and the implementation of the speed setting strategy involved 18 lines of C
code. This indicates that a user-level power management strategy can be implemented with
relatively modest effort.

3.3.2 Video Conferencing Tool

Video conferencing is another popular, soft-realtime application; it is often based on

the H.26x family of compression standards (specifically, H.261, H.263 and H.264). Video

conferencing exhibits a slightly different soft-realtime property than mplayer—H.26x com-

pression is specially designed to support low-latency streaming. As parts of the video are

encoded, they are streamed over the network to the client, which decompresses them as

they are received. This is in contrast with mplayer and MPEG4, which retrieves whole

34

Page 52: SYSTEM SUPPORT FOR PERVASIVE MULTIMEDIA SYSTEMSlass.cs.umass.edu/theses/xiaotao.pdf · 2007-01-05 · system support for pervasive multimedia systems to address these challenges

frames from disk and decodes them in bulk. Thus in a streaming application, Chameleon

does not have one deadline per-frame to meet, but rather a number of deadlines per-frame,

to be met as compressed data arrives. In the case of an Internet streaming application,

the client decodes individual IP packets as they arrive; each packet can be independently

decoded at the receiver without waiting for other packets of that frame. We construct a

Chameleon-aware video conferencing application as follows:

Estimating processor demand: We consider gnomemeeting, an open-source video
conferencing tool that supports H.261. Note that H.261 supports two resolutions: QCIF
(176x144) and CIF (352x288). Using gnomemeeting, we ran a number of video conferences
at different spatial resolutions, and measured and logged the decode time of each

frame at full processor speed. The low level compression mechanisms in H.261 share

many common ideas with MPEG. Not surprisingly, we observed a similar linear relation-

ship between the packet size and the packet decoding time. Consequently, we use a similar

predictor to that in Section 3.3.1 to estimate the decoding time of a packet.

As shown for two sample video conferences in Figure 3.7, the linear regression predictor
achieves the best accuracy in most cases when the window size is more than 8, and the
accuracy level varies little in that range. Therefore, for the same reason as in Section
3.3.1, we choose a window size of 8 for our predictor to reduce the computational cost.

Figure 3.8 shows the accuracy of our predictors with window size 8 on two sample
conferences. Our experiments show that our predictor can almost always estimate packet
decoding times within 1.5 ms.

Estimating the deadline of each packet: The deadline of each frame can be determined
from the time stamp of a frame and the frame rate. From this deadline, our client

must create a deadline for decoding each packet as it arrives. We adopt a simple policy of

dividing the deadline evenly by the number of packets the client expects to see in a frame.

In the case of H.261 the number of packets in a frame is unknown until the entire frame has


[Figure: error rate (0–100%) versus window size (5–50) for (a) QCIF (176x144) and (b) CIF (352x288).]

Figure 3.7. Accuracy rate in predicting packet decode time within 0.5 ms with varying window size

[Figure: empirical CDFs of absolute prediction error (0–5 ms), probability from 0 to 1, for (a) QCIF (176x144) and (b) CIF (352x288).]

Figure 3.8. Empirical CDF of error in predicting packet decode times with window size 8

arrived, thus we estimate the number of packets from the number of packets in the previous

frame.

As the client knows the deadline D of a frame from its time stamp, it can estimate the
deadline d of the ith packet in this frame as

d = t + (D − t) / (n − i + 1)


in which n is the estimated number of packets in this frame, and t is the task begin time
of decoding the ith packet. Then the deadline d of the ith packet in this frame is given by:

d = D                          if i > n and the ith packet is the last packet
    t + (D − t) / 2            if i > n and the ith packet is not the last packet
    t + (D − t) / (n − i + 1)  if i ≤ n        (3.5)

Our analysis of the gnomemeeting video conferencing traces showed that the frame

size, and thus the number of packets in a frame, is governed by the amount of human

motion in each frame. Due to the continuous nature of human motion, we used the number

of packets in the previous frame (denoted as last) and the mean number of packets in the

previous k frames (denoted as mean) for predictions, and we also applied a number of

time series models such as AR(1), AR(2), AR(3), and MA(1) to our resulting traces for

predictions [7].

As shown in Figure 3.9, we found the number of packets in the current frame to be the

best predictor of the number of packets in the next frame. Consequently, we use a simple

predictor that sets the estimated number of packets in the next frame to that in the current

frame.

Estimating processor availability: We use the same technique as that in Section 3.3.1

to estimate the processor availability.

Determining processor speed: Let n denote the estimated number of packets in the
current frame, let t denote the task begin time of decoding the jth packet, and let D denote
the deadline of the frame. Then the CPU speed f for decoding the jth packet in the
current frame is determined by scaling its full-speed decode time c until the deadline d.
That is,

f = f_max                                       if t + c > d
    f_max                                       if e < c
    min( c · f_max / min(e, d − t), f_max )     otherwise        (3.6)


[Figure: empirical CDFs of absolute error (0–10 packets) for the AR(1), AR(2), AR(3), MA(1), mean, and last predictors, for (a) QCIF (176x144) and (b) CIF (352x288).]

Figure 3.9. Empirical CDF of error in predicting the number of packets in each frame

where f_max denotes the maximum CPU speed, and d is the estimated deadline given by
Equation 3.5.

In an actual implementation, the computed f is mapped by the speed adapter to the

closest available speed that is no smaller than the requested speed.

Implementation: We modified gnomemeeting to implement the packet decoding time
predictor, the packet number predictor, and the speed setting strategy. As in the case of
the video decoder, our modifications were restricted to the beginning and end of the packet
decoding method in gnomemeeting, and we used gettimeofday to measure the packet
decoding time and the Chameleon interface to estimate the processor availability. Other
modifications involved using the Chameleon interface to set the CPU speed using Equation
3.6. In all, the implementation of the packet decoding time predictor involved 221 lines of C
code, the implementation of the packet number predictor involved only one line of C code, and
the implementation of the speed setting strategy involved 32 lines of C code. This again indicates
that a user-level power management strategy can be implemented with relatively modest effort.


3.3.3 Word Processor

A word processor from an Office suite is an example of an interactive best-effort ap-

plication. Many applications such as editors, shell terminals, web browsers and games fall

into this category. For instance, a word processor is an event-driven application that works

as follows. Upon an event such as a mouse click or key stroke, the word processor needs to

do some work to process the event. For example, when the user clicks on a menu item, the

application must display a drop-down menu of choices. When the user types a sentence,

each character representing a keystroke needs to be displayed on the screen. The window
needs to be redrawn when the draw event arrives. The speed at which these events are

processed by the word processor greatly impacts the user’s experience.

Studies have shown that there exists a human perception threshold under which events

appear to happen instantaneously [8]. Thus, completing these events any faster would not

have any perceptible impact on the user. While the exact value of the perception threshold

is dependent on the user and the type of task being accomplished, a value of 50 ms is

commonly used [8, 23, 24, 59, 60]. We also use this perception threshold in our work.

An event-driven interactive application should choose CPU speed settings such that

each event is processed no later than the human perception threshold. One possible strategy

to do so is to (i) estimate the processor demand of an event, (ii) estimate the processor

availability in the next 50 ms, and (iii) choose a speed such that the demand is spread

over the available CPU time while still meeting the 50 ms perception threshold. Since an

event-based application may process many different types of events, estimating processor

demand for each event will require the approach to be explicitly aware of different event

types and their computational needs. Such a strategy can be quite complex for applications

such as browsers or word processors that support a large number of event types.

Instead we propose a different technique that can meet the human perception threshold

without requiring explicit knowledge of various event types. Our technique, referred to as


[Figure: timeline of event processing in a word processor: the event arrives at time t, the process is scheduled at time t′ after a delay β = t′ − t, and event processing must complete within the remaining 50 − β ms of the 50 ms perception threshold.]

Figure 3.10. Event processing in a word processor

gradual processor acceleration (GPA), accounts for the processor demand and the processor

availability implicitly.

Upon the arrival of any event, the word processor is configured to run at the slowest

CPU speed, and a timer is set (the timer value is less than the perception threshold). If the

processing of the event finishes before the timer expires, then the application simply waits

for the next event. Otherwise, it increases the CPU speed by some amount and sets another

timer. If the event processing continues beyond the timer expiration, the CPU speed is

increased yet again and a new timer is set. Thus, the processor is gradually accelerated

until either the event processing is complete or the maximum CPU speed is reached. In

order to ensure adequate interactive performance, the maximum CPU speed is always used

when the event processing time exceeds the perception threshold.

To understand how this policy works in practice, suppose that the event arrives at time
t and the application is actually scheduled on the CPU at time t′ (although the application
becomes runnable as soon as the event arrives, other concurrent applications can delay the
scheduling of this application). From the perspective of the user, a response is desirable
from the application no later than t + 50 ms. Since the application actually starts executing
at time t′, it needs to process the event within the remaining 50 − β ms, where β = t′ − t
(see Figure 3.10). To do so, we choose n timers with values t_1, t_2, ..., t_n, such that
Σ_{i=1}^{n} t_i = 50 − β. After the expiration of the ith timer, the processor speed is increased
to f_i, where f_i denotes a fraction of the maximum speed. The values of f_i are chosen such
that the processor speed increases progressively and f_n = f_max = 1. Thus, the application


runs at full processor speed if the event processing continues beyond 50 − β ms. Observe

that rather than explicitly estimating the processor demand of the event, the GPA technique

monitors the progress of the event processing and adjusts the processor speed accordingly.

Further,β implicitly captures the impact of other concurrent applications in the system.

Analysis: It is possible to bound the maximum slowdown incurred by an application
in the GPA technique by carefully choosing timer values and CPU speeds. To see how,
observe that the work done under GPA in the interval [t′, t′ + Σ_{i=1}^{n} t_i] would take only
Σ_{i=1}^{n} f_i · t_i at full processor speed. If the actual full-speed processing time of the
event is smaller than this value, then the event finishes before the 50 − β ms perception
threshold in the GPA technique, and thus the user does not perceive any performance
degradation. For any event requiring more than this amount of full-speed execution time,
the maximum possible performance degradation under our strategy is given by:

degrade = 50 − β − Σ_{i=1}^{n} f_i · t_i        (3.7)

since the processor will run at full speed once the execution time exceeds the perception

threshold.

To illustrate, suppose that an event in the GPA technique should not take more than
20 ms longer than it would take at full processor speed. Let β = 0 for simplicity. If we
chose five timers with values 30 ms, 5 ms, 5 ms, 5 ms, and 5 ms, and the processor speeds
during these timer intervals as 45%, 60%, 80%, 90%, and 100%, respectively, then, from
Equation 3.7, the maximum possible user-perceived degradation for any event is 20 ms.
This is the maximum slowdown for any event that requires more than 50 ms of processing
time.
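The schedule from this example (β = 0) can be sketched as a lookup table; the function names are illustrative, and a real implementation would drive the speed changes from timers rather than polling elapsed time:

```c
/* GPA timer values t_i (ms) and speed fractions f_i from the example
   in the text: the timers sum to 50 ms and f_n = 1.0. */
static const double gpa_timer[] = { 30, 5, 5, 5, 5 };
static const double gpa_speed[] = { 0.45, 0.60, 0.80, 0.90, 1.00 };
enum { GPA_STEPS = 5 };

/* Speed fraction the application should run at, `elapsed` ms after event
   processing began. Full speed once processing outlives all timers,
   i.e., exceeds the perception threshold. */
double gpa_current_speed(double elapsed) {
    double mark = 0.0;
    for (int i = 0; i < GPA_STEPS; i++) {
        mark += gpa_timer[i];
        if (elapsed < mark) return gpa_speed[i];
    }
    return 1.0;
}

/* Worst-case slowdown for long events (Equation 3.7 with beta = 0):
   50 - sum of f_i * t_i. */
double gpa_max_degradation(void) {
    double total = 0.0, work = 0.0;
    for (int i = 0; i < GPA_STEPS; i++) {
        total += gpa_timer[i];
        work  += gpa_speed[i] * gpa_timer[i];
    }
    return total - work;
}
```

Working through Equation 3.7 with these values: Σ f_i · t_i = 13.5 + 3 + 4 + 4.5 + 5 = 30 ms, so the bound is 50 − 30 = 20 ms.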

Implementation: We implemented GPA into AbiWord, a sophisticated word processor

with a code base of hundreds of thousands of lines of C code. We added code at the beginning

of the AbiWord event handler to implement the GPA technique. The X11-server assigns

a time-stamp to each new user event such as mouse click or key-stroke. We extracted


this time-stamp t and used gettimeofday to determine the execution start time t′. The

parameter β is computed as the difference between t′ and t. This took only 17 lines of C

code. The rest of the modifications involved setting timers and invoking the Chameleon

interface to modify the CPU speed when each timer expires, which took 23 lines of C code.

In all, the implementation of GPA took only 40 lines of C code—a fairly modest change.

3.3.4 Web Browser

A web browser is another example of an event-driven interactive application that needs

to process various events such as a mouse click or a keystroke. When the user types a

URL or data into a web form, the keystrokes need to be displayed on the screen. When

the user clicks on a JavaScript menu on a web page, the menu needs to be expanded.

When the mouse is positioned over a hyperlink, visual feedback needs to be provided by

changing the shape of the mouse cursor. When the user clicks on a link, the browser needs

to construct and send out an HTTP request; when data arrives from the remote server, it needs

to parse and display the incoming data. Although the network delay is beyond the control

of the browser, all other “local” events should be processed within the human perception

threshold for good interactive performance. The GPA technique can be directly used for

power management in such a browser.

We added our GPA technique to Dillo, a compact, portable open-source browser that

runs on desktops, laptops and PDAs. As in the case of the word processor, our modifications

were restricted to the event handler in Dillo. First, we extracted the event arrival time and

the execution start time in the event handler to computeβ. We then added code to set

timers and increase the processor speed upon timer expiration. In all, the implementation

of GPA into Dillo involved 46 lines of C code, again demonstrating the modest nature of

our modifications.


3.3.5 Batch Compilations

Compilation using a utility such as make is an example of a batch application. Un-

like interactive applications where the response time is important, the completion time (or

throughput) is important for batch applications. Typically, make spawns a sequence of

compilation tasks, one for each source code file. One possible user-level power manage-

ment strategy is to estimate the processor demand for each compilation task and to choose

an appropriate speed setting. However, since each compilation task is a separate process

that is relatively short-lived, gathering CPU usage statistics in order to make reasonable

decisions for each process is difficult. Instead, we believe the correct strategy is to allow

the end-user to specify the desired speed setting. System defaults can be used when the

user does not specify a setting.

Most Unix-like operating systems support the nice utility, which allows the end-user
to specify a CPU scheduling priority for a new process. For instance, the user invokes
the command nice -n N make to specify that make should run at priority N.

A low priority enables the batch application to run in the background without interfering

with foreground interactive applications. A high priority can also be chosen if the new

application is more important than current applications.

A similar strategy can be used for choosing CPU speed settings. We implemented a
utility called pnice that enables the end-user to specify a particular CPU speed setting for
a new process. For instance, the user can invoke the command pnice -n N make to
specify that make and all compilations spawned by it should run at a fixed CPU speed
setting N. A lower speed setting enables energy savings at the expense of increasing the
completion time, whereas a higher setting lowers the completion time at the expense of higher
energy consumption.

Implementation of pnice was straightforward. The pnice process first changes its own
speed setting to the specified value N using the Chameleon interface. Next, it invokes exec
to run the specified command. The Chameleon kernel implementation ensures that any


process forked by a parent process inherits the CPU speed setting of the parent, giving the
user's process the correct speed setting. Since DVFS-enabled processors support a small
number of discrete speed settings, the parameter N specifies which of the N discrete speed
settings to use for the application. The pnice utility was implemented in 125 lines of C
code, again demonstrating that implementing user-level power management policies
takes modest effort.

3.4 A User-level Power Manager

The previous section demonstrated how many commonly used applications can imple-

ment their own power management strategy. However, implementing a user-level power

management strategy requires modification to the source code, which may not be feasible

for legacy applications. Such applications can delegate the task of power management to

a user-level power manager. The power manager can use CPU usage statistics and any

application-supplied knowledge to modify CPU speed settings on behalf of the applica-

tions. A simple user-level power manager may choose a single speed setting for all appli-

cations based on current utilization; the speed setting is varied with observed changes in

system utilization. A more complex strategy is to choose a different speed setting for each

individual application based on its observed behavior; doing so requires usage statistics to

be maintained for each application. Multiple user-level power managers can coexist in the

system, so long as each manages a mutually exclusive subset of the applications. Thus, it

is feasible to implement a different power manager for each class of application.

The Chameleon interface enables the entire range of these possibilities. To demonstrate

the flexibility of our approach, we take a recently proposed DVFS approach—GraceOS [96,

97]—and show how the proposed technique can be implemented as a user-level power

manager using Chameleon. Our objective is two-fold. First, we show that many recently

proposed approaches such as GraceOS that employ an in-kernel implementation can be

implemented as user-level power managers. Second, GraceOS advocates a cooperative


application-OS approach, where applications periodically supply information to the OS and

the OS chooses the processor speed setting based on this information and usage statistics.

We show that such interactions between the application and the CPU scheduler are feasible

using the interface provided by Chameleon.

Implementation: We begin with a brief overview of GraceOS [96]. GraceOS is de-

signed for periodic multimedia applications that belong to the soft real-time class. GraceOS

treats such applications differently from traditional best-effort applications. Whereas best-

effort applications are scheduled using the Linux time-sharing scheduler and do not benefit

from DVFS, soft real-time applications are scheduled using a QoS-aware soft real-time

scheduler and benefit from DVFS.

To handle soft real-time applications, GraceOS employs two key components: (i) a real-

time scheduler and (ii) a DVFS algorithm. The CPU scheduler is vanilla earliest deadline

first (EDF); standard EDF theory is used to perform admission control of soft real-time

tasks based on their worst case CPU demands. Admitted soft real-time tasks have strict

priority over best-effort tasks. Deadlines derived from the application-specified periods

are used for EDF scheduling of these tasks. Three system calls—EnterSRT, ExitSRT, and

FinishJob—are used to convey the start and finish time of tasks (e.g., frame decoding) to

the scheduler.

The DVFS algorithm maintains a histogram of CPU usage and derives a probability

distribution of processor demand. The processor demand and the application-specified

periods are used in a dynamic programming algorithm to derive a list of speed scaling

points. Each point (x, y) specifies that a job should run at speed y once it has used x
cycles. The DVFS algorithm monitors the cycle usage of the task. If the usage increases
beyond x, the next speed setting y is chosen. Observe that this technique has similarities

with our GPA technique where the progress of a task is monitored and the speed is increased

gradually. The key difference is that the durationsx and speedsy are computed at run-time

using dynamic programming, whereas in GPA, they are statically chosen.
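The speed-scaling points and the monitoring step can be sketched as follows; the structure and function names are illustrative, and the dynamic-programming computation of the points themselves is omitted:

```c
/* One GraceOS-style speed scaling point (x, y): once the job has used
   x cycles, it should run at speed y (a fraction of the maximum).
   Points are stored in increasing order of cycle count. */
struct speed_point {
    long cycles;
    double speed;
};

/* Speed the job should run at after consuming `used` cycles: the speed
   of the last scaling point whose threshold has been crossed. */
double schedule_speed(const struct speed_point *pts, int n, long used) {
    double speed = pts[0].speed;
    for (int i = 0; i < n; i++)
        if (used >= pts[i].cycles)
            speed = pts[i].speed;
    return speed;
}
```

The DVFS component would call `schedule_speed` as the task's cycle usage grows, raising the CPU speed step by step, much as GPA does with its timers.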


To implement GraceOS as a user-level power manager, we must distinguish between

the DVFS component and the CPU scheduler. The DVFS algorithm is fully implemented in

user space and uses the Chameleon interface to query usage statistics and monitor progress.

The CPU scheduler and any interactions between the application and the scheduler must

be implemented separately from Chameleon. Since Chameleon does not make any specific

assumptions about the underlying scheduler, it is compatible with any CPU scheduling

algorithm, including EDF.

Consequently, our implementation of GraceOS includes three components: (i) a user-

level daemon to calculate the soft real-time task’s demand distribution, cycle budget, and

speed schedule using dynamic programming (300 lines of C code); (ii) use of Chameleon’s

/dev/syscpu interface driver to query the actual usage of each soft real-time task (109 lines

of C code); and (iii) three system calls (EnterSRT, ExitSRT, and FinishJob) that allow an

application to convey the beginning and end of each soft real-time task (23 lines of C

code). Observe that the first two components relate to the DVFS algorithm, while the

third component is used by the CPU scheduler in GraceOS. The GraceOS user-level power

manager runs at the highest CPU priority in our system. All soft real-time applications run

at the next highest CPU priority, and best-effort jobs run at lower priorities. EDF scheduling

is emulated by manipulating priorities of tasks; the task with the earliest deadline is elevated

in priority.
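A minimal sketch of this EDF emulation, with illustrative names of our own: the runnable soft real-time task with the earliest absolute deadline is selected, and that task would then be elevated to the top soft real-time priority.

```c
#include <stddef.h>

/* Simplified view of a soft real-time task as seen by the user-level
 * scheduler emulation (illustrative, not GraceOS code). */
struct srt_task {
    int           pid;
    unsigned long deadline;  /* absolute deadline, e.g., in jiffies */
};

/* EDF emulation: return the index of the task with the earliest
 * deadline; this is the task whose priority would be elevated. */
int edf_pick(const struct srt_task *tasks, size_t n)
{
    if (n == 0)
        return -1;
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (tasks[i].deadline < tasks[best].deadline)
            best = i;
    return (int)best;
}
```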

3.5 Implementation

Our prototype of Chameleon is implemented as a set of modules and patches in the

Linux kernel 2.4.20-9.

New system calls: We added four new system calls to implement the Chameleon OS

interface: (i) get-speed, which returns the current CPU speed of the specified process or

process group; (ii) set-speed, which sets the CPU speed of the specified process or process

group; (iii) get-speed-schedule, which returns the processor budget and speed schedule of the


specified process, and (iv) set-speed-schedule, which sets the processor budget and speed

schedule of the specified task. The latter two system calls enable sophisticated speed-setting

strategies, where an application can specify an a priori schedule for changing the speed as

it executes.
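As a sketch, here is a hypothetical user-space view of the payload an application might pass to set-speed-schedule, together with the kind of validation a kernel would perform on entry. The field names and the fixed point limit are our assumptions, not the actual ABI.

```c
#include <stddef.h>

/* Hypothetical payload for set-speed-schedule: a cycle budget plus a
 * list of (cycles, speed) scaling points (illustrative layout only). */
struct speed_schedule {
    unsigned long budget;        /* allocated cycles for the next job */
    size_t        npoints;
    unsigned long cycles[8];     /* thresholds, strictly increasing   */
    unsigned int  speed_mhz[8];  /* speed once a threshold is crossed */
};

/* A schedule is well-formed if thresholds increase strictly and stay
 * within the budget; a kernel would reject anything else. */
int schedule_valid(const struct speed_schedule *s)
{
    if (s->npoints == 0 || s->npoints > 8)
        return 0;
    for (size_t i = 0; i < s->npoints; i++) {
        if (i > 0 && s->cycles[i] <= s->cycles[i - 1])
            return 0;  /* thresholds must strictly increase */
        if (s->cycles[i] > s->budget)
            return 0;  /* cannot scale past the allocation */
    }
    return 1;
}
```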

Chameleon-enhanced /proc interface: We enhanced the /proc interface by adding a

/proc/Chameleon sub-tree. This directory holds one file for each Chameleon-driven process

and allows applications to query their CPU quantum allocations in the recent past.

Chameleon /dev interfaces: To support user-level power managers, we added two

new /dev interfaces: /dev/sysdvfs and /dev/syscpu. The system-wide utilization is reported

via /dev/sysdvfs, whereas the CPU cycles consumed by individual tasks are reported via

/dev/syscpu.

Process control block enhancements: In order to allow Chameleon to implement

techniques such as PACE [59, 60] and GraceOS [96, 97] as user-level power managers,

we borrowed several process control block attributes from the GraceOS implementation:

(i) cycle counter, which measures the CPU cycles used by a task, (ii) cycle budget, which

stores the number of allocated cycles, and (iii) speed schedule, which stores a list and

schedule of speed scaling points. While these three attributes are meaningful only for

Chameleon processes managed by user-level power managers, we also added three more

attributes that are applicable to all processes in the system: (i) Chameleon-driven-flag,

which indicates if the process is directly modifying its speed settings; (ii) current-speed,

which specifies the current CPU speed setting of the process; (iii) inheritable-flag, which

indicates if the speed setting is inheritable by its children.
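These attributes can be pictured with the following sketch; the field and function names are our illustration, not the kernel's. It also shows the fork-time semantics implied by the inheritable-flag, under the assumption that a child starts with fresh accounting state.

```c
/* Illustrative subset of the process control block attributes
 * described above (names are ours, not the kernel's). */
struct chameleon_pcb {
    unsigned long cycle_counter;    /* cycles used by the task        */
    unsigned long cycle_budget;     /* cycles allocated to the task   */
    int           chameleon_driven; /* task sets its own speed        */
    unsigned int  current_speed;    /* current CPU speed setting      */
    int           inheritable;      /* children inherit the setting   */
};

/* On fork, a child inherits its parent's speed setting only if the
 * parent marked it inheritable; otherwise it starts at the default. */
void pcb_fork(const struct chameleon_pcb *parent,
              struct chameleon_pcb *child,
              unsigned int default_speed)
{
    child->cycle_counter = 0;   /* fresh accounting for the child */
    child->cycle_budget = 0;
    child->chameleon_driven = 0;
    child->inheritable = parent->inheritable;
    child->current_speed = parent->inheritable ? parent->current_speed
                                               : default_speed;
}
```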

DVS kernel module: The DVS kernel module is responsible for interfacing

with the hardware in order to modify the processor speed. This is done by writing the

frequency and voltage to two model-specific registers (MSRs) [96, 97]. Chameleon can be

applied to any DVFS-enabled processor by implementing a DVS kernel module specific to

that processor.


Linux scheduler enhancements: We modified the standard scheduler to add per-

process speed settings and cycle charging. Similar to our process control block enhance-

ments, cycle charging is only necessary to implement other techniques as user-level power

managers, and is directly inspired by the GraceOS implementation [96, 97]. Whenever the

schedule() function is invoked, the modified scheduler will do the following: (i) in the case

of no context switch, it may change the speed of the current task according to its speed

schedule; (ii) in the case of a context switch, the scheduler performs some book-keeping

only for the previous task with a speed schedule (e.g., update its cycle counter, decrement

cycle budget, advance speed schedule, etc.); (iii) then the scheduler sets the CPU speed for

the new task based on its current-speed attribute.
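The book-keeping performed by the modified schedule() function can be sketched in simplified form; the names are ours, and a real implementation operates on kernel task structures rather than this minimal model.

```c
/* Minimal model of the per-task state touched by the modified
 * schedule() function (illustrative; names are ours). */
struct task_state {
    unsigned long cycle_counter;  /* total cycles used            */
    unsigned long cycle_budget;   /* cycles left in the allocation */
    unsigned int  current_speed;  /* speed to run this task at     */
};

/* Book-keeping at a context switch: charge the outgoing task for
 * the cycles it just consumed, then return the speed the CPU
 * should be set to for the incoming task. */
unsigned int context_switch(struct task_state *prev,
                            struct task_state *next,
                            unsigned long cycles_used)
{
    prev->cycle_counter += cycles_used;          /* update counter  */
    if (prev->cycle_budget > cycles_used)
        prev->cycle_budget -= cycles_used;       /* decrement budget */
    else
        prev->cycle_budget = 0;                  /* budget exhausted */
    return next->current_speed;  /* CPU speed for the new task */
}
```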

Our implementation of Chameleon runs on a Sony Vaio PCG-V1CPK laptop with

Transmeta Crusoe TM5600-667 processor [86]. The Transmeta TM5600 processor sup-

ports five discrete frequency and voltage levels (see Figure 3.11) and implements the

LongRun [25] technology in hardware to dynamically vary the CPU frequency based on the

observed system-wide CPU utilization. LongRun varies the CPU frequency between

user-specified maximum and minimum values—these values can be set by users by writing

to two model-specific registers (MSRs). By default, these values are set to 300 MHz and

667 MHz, enabling the full range of voltage scaling. LongRun can be disabled by set-

ting the minimum and maximum register values to the same frequency (e.g., setting both

to 533 MHz does not allow any leeway in changing the CPU frequency, effectively dis-

abling LongRun). This feature can be used to implement voltage scaling in software—the

power-aware application can determine the desired frequency and set the two registers to

this value. Figure 3.12 shows the mapping from CPU speed percentages to a corresponding

CPU frequency for the Transmeta processor used in our prototype implementation.


Freq. (MHz)   Voltage (V)   Power (W)
300           1.2           1.30
400           1.225         1.90
533           1.35          3.00
600           1.5           4.20
667           1.6           5.30

Figure 3.11. Characteristics of the TM5600-667 processor.

CPU Speed Percentage   Freq. (MHz)
[0%, 45%)              300
[45%, 60%)             400
[60%, 80%]             533
(80%, 90%]             600
(90%, 100%]            667

Figure 3.12. Speed adapter mappings from the percentage CPU speed to a CPU frequency for the Transmeta TM5600.
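The mapping in Figure 3.12 translates directly into code; a sketch with an illustrative function name:

```c
/* Map a CPU speed expressed as a percentage of the maximum to the
 * nearest TM5600 frequency, following the ranges in Figure 3.12. */
unsigned int adapt_speed(double pct)
{
    if (pct < 45.0)  return 300;   /* [0%, 45%)   */
    if (pct < 60.0)  return 400;   /* [45%, 60%)  */
    if (pct <= 80.0) return 533;   /* [60%, 80%]  */
    if (pct <= 90.0) return 600;   /* (80%, 90%]  */
    return 667;                    /* (90%, 100%] */
}
```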

3.6 Experimental Evaluation

We evaluated Chameleon on a Sony Vaio PCG-V1CPK laptop equipped with a Trans-

meta Crusoe processor and 128MB RAM. The operating system was Red Hat Linux 9.0

with a modified version of Linux kernel 2.4.20-9. To compare Chameleon with other DVFS

approaches, we implemented three OS-based DVFS techniques proposed in the literature:

(i) PAST [93], (ii) PEAK [50], and (iii) AVGn [38], all of which are interval-based system-

wide DVFS techniques. Our experiments involve running several applications under six

different configurations: (i) with DVFS disabled—the CPU always runs at the maximum

speed (denoted as FULL), (ii) using the hardwired LongRun technology, (iii) using PAST,

(iv) using PEAK, (v) using AVGn, and (vi) using Chameleon (where LongRun is disabled

for power-aware applications but enabled for legacy applications). We also provide a com-

parison of Chameleon to GraceOS using a soft real-time application; GraceOS is applicable

only to periodic multimedia applications, and hence it is not feasible to compare it using the

other Chameleon applications.

The energy consumption of the processor during an interval T is computed as


energy = \sum_{i=1}^{n} p_i t_i    (3.8)

where n is the number of available frequency/voltage combinations on the processor, p_i

denotes the power consumption of the processor when running at the ith frequency/voltage

combination, and t_i represents the time spent at the ith frequency/voltage combination

during the interval T. We modified the Linux kernel to record the energy consumption of

the TM5600 processor using Equation 3.8 and Figure 3.11. Given the energy consumption

of the processor during an interval T, the average power consumption of the processor

during this interval is computed as

power_{avg} = \frac{energy}{T}    (3.9)
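Equations 3.8 and 3.9 can be sketched as small helpers; the power values p_i would come from Figure 3.11 and the residencies t_i from the kernel's accounting (the function names are ours):

```c
/* Energy over an interval per Equation 3.8: the sum of p_i * t_i
 * over the residency at each frequency/voltage level. */
double energy_joules(const double *power_w, const double *time_s, int n)
{
    double e = 0.0;
    for (int i = 0; i < n; i++)
        e += power_w[i] * time_s[i];  /* p_i * t_i */
    return e;
}

/* Average power per Equation 3.9: energy divided by interval length. */
double avg_power_w(const double *power_w, const double *time_s, int n,
                   double interval_s)
{
    return energy_joules(power_w, time_s, n) / interval_s;
}
```

For example, 10 s at 300 MHz (1.30 W) plus 5 s at 667 MHz (5.30 W) gives 39.5 J, or about 2.63 W averaged over the 15-second interval.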

In our experiments, we observed that PEAK always consumed the least processor en-

ergy among all the DVFS techniques. However, it trades its energy savings for an un-

acceptably high performance degradation for time-sensitive multimedia and interactive ap-

plications. For example, video decoding of a 30-minute clip took an extra 16.6 minutes,

resulting in poor performance. Therefore, we omit the results of PEAK in the rest of this

chapter and refer the readers to [55] for these results.

3.6.1 Chameleon-aware Applications

We first demonstrate the effectiveness of our four Chameleon-aware applications. Our

experiments assume a lightly-loaded system that runs a single application with the typical

background system processes.

3.6.1.1 Video Decoder

We encoded several DVD movies at different bit-rates and resolutions using the DivX MPEG-4

video codec and the MP3 audio codec. The characteristics of six such movies are listed in Fig-

ure 3.14. The bit-rates are depicted in the form (a + b) Kbps, where a is the video and b is


(a) Average Power Consumption (Watts)

          Chameleon  LongRun  PAST   AVGn   FULL
Movie 1   1.65       2.27     3.37   3.64   5.30
Movie 2   1.51       1.90     3.16   3.26   5.30
Movie 3   1.70       2.51     3.52   3.86   5.30
Movie 4   1.95       2.86     3.75   4.24   5.30
Movie 5   2.31       3.38     4.01   4.62   5.30
Movie 6   2.71       3.85     4.28   4.79   5.30

(b) % of Late Frames

          Chameleon  LongRun  PAST   AVGn   FULL
Movie 1   0.45%      0.49%    0.30%  0.31%  0.15%
Movie 2   0.27%      0.28%    0.21%  0.29%  0.14%
Movie 3   0.39%      1.30%    0.28%  0.48%  0.15%
Movie 4   0.48%      1.13%    0.33%  0.56%  0.29%
Movie 5   1.20%      1.56%    0.97%  2.29%  0.70%
Movie 6   1.55%      1.40%    1.37%  1.19%  1.11%

Figure 3.13. Average CPU power consumption and percentage of frames that are late by more than 8 ms (20% of the 40 ms deadline).

the audio bit-rate. We recorded the energy consumed by the processor during playback of

these movies at full speed, with LongRun, with Chameleon, with PAST, and with AVGn.

Our experiments show that all five configurations handle movie playback very well.

The same playback quality is observed under these five configurations: identical execution

times which equal the length of the movies, identical frame rates, no dropped frames, and

no user-noticeable delays. However, the average CPU power consumption differs signifi-

cantly across the various configurations (see Figure 3.13(a)). Figure 3.13(a) shows that: (i)

neither PAST nor AVGn can outperform LongRun; (ii) LongRun can achieve significant


          Res.      Length   Frames   Bit-Rate (Kbps)
Movie 1   640x272   3360s    80387    1290.9 + 179.2
Movie 2   640x272   612s     14577    757.2 + 128.0
Movie 3   640x352   7168s    179168   679.7 + 128.0
Movie 4   640x352   602s     15003    861.9 + 128.0
Movie 5   640x352   1755s    42040    2456.9 + 192.0
Movie 6   640x480   2394s    57355    1674.6 + 384.0

Figure 3.14. Characteristics of MPEG-4 videos.

energy savings (from 27.36% to 57.26%) when compared to FULL; (iii) the Chameleon-

aware mplayer can achieve an additional 20.52% to 31.99% energy savings when compared

to LongRun.

Although there are no user-perceived playback problems (in terms of dropped frames or

playback freezes) under the five configurations, we do observe jitter in the playback quality

at the frame-level. Such small inter-frame jitter is inevitable in a time-sharing CPU sched-

uler, although its effects are not perceptible at the user level. mplayer provides statistical

measurements of late frames—the number of frames that are behind their deadline by more

than 20% of the frame interval. As shown in Figure 3.13(b), the number of late frames in

Chameleon is mostly comparable to PAST and AVGn and typically better than LongRun

(while consuming the least energy). FULL has the least—although not zero—late frames

at the expense of the highest energy consumption. The number of late frames is small

(0.2–2.3%) in all configurations.

3.6.1.2 Video Conference Tool

To ensure repeatable and comparable experiments with the video conferencing tool,

we encoded several video clips with varying degrees of motion and replayed those videos

through a remote sending application. The sender encodes these videos and transmits

them to our Chameleon-aware client over a lightly loaded network. This ensures a fair

comparison across the various DVFS techniques and enables us to carefully control the

amount of motion in each session.


(a) QCIF

               Chameleon  LongRun  PAST   AVGn   FULL
Conference 1   1.45       1.63     2.44   2.47   5.30
Conference 2   1.47       1.65     2.46   2.49   5.30

(b) CIF

               Chameleon  LongRun  PAST   AVGn   FULL
Conference 3   2.67       4.07     4.07   4.35   5.30
Conference 4   2.73       4.20     4.17   4.41   5.30

Figure 3.15. Average CPU power consumption of video conferencing (Watts).

We ran our video conference experiments under two resolutions, QCIF (176x144) and

CIF (352x288), for all five configurations. In our experiments, all five configurations han-

dle the video conference very well. The same quality is observed under all configurations:

identical execution times and no deadline misses (i.e., the decoding of each packet com-

pletes before the arrival of the next packet). Our results in Figure 3.15 show that

LongRun achieves significant energy savings (from 20.75% to 69.25%) when compared to

FULL. Chameleon-aware gnomemeeting achieves an additional 11–34% energy savings

when compared to LongRun, while PAST and AVGn are worse than LongRun.

3.6.1.3 Web Browser and Word Processor

We ran the web browser and the word processor and measured their average power

consumption, the average response time, and the percentage of late events (where event

processing time exceeds the 50 ms threshold).

To eliminate the impact of variable network delays, our experiments with the web

browser consisted of a client requesting a sequence of web pages from a web server on

a local area network; the requested web pages consist of actual web content that was saved


from a variety of popular web sites. Each experiment consists of a sequence of requests

to these web pages with a uniformly distributed “think-time” between successive requests.

The experiments differ in the requested web pages and the chosen think times; each exper-

iment is repeated under the five configurations, and we measure the mean for each experi-

ment.

The workload for the word processor emulates a user editing a sequence of documents.

Each experiment contains a script that makes a sequence of editing requests to these doc-

uments with a uniformly distributed “think-time” between successive requests. The ex-

periments differ in the edited documents and the chosen think times; each experiment is

repeated under the five configurations, and we measure the mean for each experiment.

Our results, depicted in Figure 3.16(a), show that LongRun consumes a factor of three

less power than FULL. Chameleon is able to extract an additional 10.27% energy savings

when compared to LongRun, while PAST is worse than LongRun. We also note that the

average power consumption under Chameleon is only 0.03W and 0.06W higher than the

power consumption at the slowest CPU speed (300MHz) for the browser and the word

processor, respectively. Further, most events finish in Chameleon without any performance

degradation. The percentage of late events is only 0.24% and 0.22% in the word processor

and the browser, respectively, and is comparable to other approaches. Finally, the increase

in processing times of late events is no more than 20ms (obtained by substituting the chosen

timer values and CPU speeds in Equation 3.7).

3.6.1.4 Batch Compilations

We compiled a portion of the ns-2 network simulator using make and our pnice utility.

We chose different values of the CPU speed in pnice and measured the power consumption

and completion times of make. As expected, our results, depicted in Figure 3.17, show

that the power consumption can be traded for completion time by appropriately choosing a

speed setting. Faster speeds lower completion times at the expense of using more energy.


(a) Average Power Consumption (Watts)

                 Chameleon  LongRun  PAST   AVGn   FULL
Web Browser      1.33       1.48     1.88   1.40   5.30
Word Processor   1.36       1.44     1.89   1.93   5.30

(b) % of Late Events

                 Chameleon  LongRun  PAST   AVGn   FULL
Web Browser      0.67%      0.60%    0.55%  0.63%  0.47%
Word Processor   0.24%      0.23%    0.16%  0.11%  0.12%

Figure 3.16. Average CPU power consumption and % of late events.

Freq. (MHz)   Completion Time   Mean Power Consumed
300           1376s             1.38W
400           1066s             1.96W
533           910s              3.00W
600           812s              4.14W
667           776s              5.15W

Figure 3.17. Completion times and mean CPU power consumption for batch compilations.

3.6.2 Impact of Concurrent Workloads

To demonstrate that applications can make locally- and globally-optimal power man-

agement decisions in the presence of concurrent applications, we considered four appli-

cation mixes: (i) video decoder and web browser (mix M1), (ii) video decoder and word

processor (mix M2), (iii) video decoder and batch compilations (mix M3), and (iv) batch

compilations and word processor (mix M4). Note that, from the perspective of the video

decoder, the background load increases progressively from mix M1 to M3.

Figure 3.19 and Figure 3.18 show the average power consumption and the performance

of these applications under various power management strategies. Figure 3.19 shows that


(a) Average Response Time (Milliseconds)

         Chameleon  LongRun  PAST   AVGn   FULL
Mix M1   10.13      8.85     8.12   7.47   7.39
Mix M2   5.86       4.44     4.15   3.99   3.97
Mix M4   5.96       4.81     4.62   4.06   4.02

(b) % of Late Events

         Chameleon  LongRun  PAST   AVGn   FULL
Mix M1   0.26%      0.19%    0.22%  0.16%  0.13%
Mix M2   0.22%      0.13%    0.18%  0.12%  0.11%
Mix M4   0.56%      0.28%    0.34%  0.24%  0.21%

(c) % of Late Frames

         Chameleon  LongRun  PAST    AVGn    FULL
Mix M1   0.41%      0.42%    0.31%   0.46%   0.21%
Mix M2   2.11%      1.87%    1.34%   1.40%   0.85%
Mix M3   23.58%     23.39%   22.57%  23.93%  23.59%

Figure 3.18. Performance of concurrent applications: average response time of interactive applications and the percentage of late events and frames.

Chameleon always consumes the least energy among the five configurations. The energy

savings range from 19.81% to 31.19% when compared to LongRun, which itself extracts

up to a 41.89% reduction when compared to FULL. The performance degradation, depicted

in Figure 3.18(a), shows that interactive application performance in Chameleon is compa-

rable to the other techniques. For instance, the average event processing time of the word

processor under mix M2 increases from 4.4ms in LongRun to 5.96ms in Chameleon and is

well under the human perception threshold of 50ms. A similar result is seen for the web


         Chameleon  LongRun  PAST   AVGn   FULL
Mix M1   2.25W      3.27W    3.98W  4.42W  5.3W
Mix M2   2.47W      3.08W    3.79W  3.83W  5.3W
Mix M3   3.81W      5.27W    5.26W  5.27W  5.3W
Mix M4   3.71W      5.22W    5.23W  5.23W  5.3W

Figure 3.19. Average CPU power consumption for various mixes.

browser under mix M1. The percentage of late events remains well under 1% under all

mixes (see Figure 3.18(b)).

Figure 3.18(c) plots the percentage of late frames in the video decoder for different

mixes. The figure shows that the percentage of late frames in Chameleon is comparable to

other approaches. As the background load increases from mix 1 to mix 3, we see that the

percentage of late frames increases from around 0.4% to more than 22%. For mix M3, all

techniques, including FULL, incur 22% deadline misses. Decoding of the 10-minute clip

takes an extra 20 seconds under all techniques, resulting in poor performance. This is pri-

marily due to insufficient processor availability at higher loads, as opposed to deficiencies

in the power management technique. Due to the background load imposed by the batch

compilations in mix M3, the time sharing scheduler is unable to allocate sufficient CPU

time to the video decoder.

Figure 3.20 shows the fraction of time spent by the video decoder at different CPU

speed settings in Chameleon. In the absence of any background load, the decoder is able

to lower its speed setting to the lowest speed for more than 90% of the time. As the load

increases, the fraction of time spent at higher speeds increases. For mix M3, more than

80% of the time is spent at the highest speed (recall that insufficient processor availability

causes the video decoder to run at full speed—Case 2 in Section 3.3.1).

Under mix M3, the only possible solution is to use a QoS-aware scheduler that guar-

antees a fixed fraction of the CPU to the video decoder regardless of the background load.

We ran mix M3 with Chameleon on a proportional-share scheduler, namely the Hierarchical

Start-Time Fair Queueing (H-SFQ) CPU scheduler [34]. In this experiment, we assigned a 1/14


[Stacked-bar chart showing, for each of four runs (no background load, mix M2, mix M3, and mix M3 with H-SFQ), the fraction of time mplayer spent at each frequency level: 300, 400, 533, 600, and 667 MHz.]

Figure 3.20. Fraction of time spent at various frequency levels by mplayer in Chameleon.

fraction of CPU time to the batch compilations, a 12/14 fraction of CPU time to the video

decoder and the X server, and the remaining 1/14 to the other tasks. As expected, the

percentage of late frames in the video decoder fell to a very small value. Further, since pro-

cessor availability is guaranteed in HSFQ, as shown in Figure 3.20, the video decoder was

able to spend 73.73% of its execution time at the lowest frequency (300 MHz), compared

to 7.74% under the time-sharing CPU scheduler. This causes the mean power consumption to

fall to 2.1W, a 44.8% reduction when compared to the time-sharing scheduler.

3.6.3 Isolation in Chameleon

We claim that Chameleon isolates an application from the power settings of other appli-

cations. To demonstrate the effects of such isolation, we ran mplayer with a misbehaving

background application using the Linux time sharing scheduler. The background applica-

tion rapidly switches its CPU speeds from one setting to another every few milliseconds.

We ran mplayer with this application when it was well-behaved (it used a fixed CPU speed

throughout) and then with the misbehaving version of the application. We measured its

impact on the progress of mplayer. As shown in Figure 3.21, the progress made by

mplayer is unaffected by the rapid changes of CPU speed by the misbehaving application—


[Line plot of decoded frame number versus time (0 to 700 seconds) for mplayer under a well-behaved and a misbehaving background load; the two progress curves coincide.]

Figure 3.21. Isolation from power settings of other applications.

any change to the CPU speed by an application only impacts its own progress and has no

impact on the CPU shares received by other applications.

3.6.4 User-level Power Manager

We modified mplayer to use the GraceOS system calls and used it to decode the movies

in Figure 3.14. The GraceOS user-level power manager was used to make power man-

agement decisions on behalf of mplayer. We measured the energy consumed by mplayer

and plot it in Figure 3.22. Our results show that GraceOS can achieve 3.50% to 18.44%

energy savings when compared to LongRun. However, Chameleon outperforms GraceOS

by 9–41%. This is because the Chameleon-enhanced mplayer is able to estimate decoding

times of individual frames based on domain knowledge, while GraceOS relies on external

observations of the CPU usage of mplayer. This domain knowledge yields the extra 9–41%

energy savings in Chameleon.

3.6.5 System Overhead

An important consideration is the overhead caused by frequent changes in the CPU

speed setting. Using the CPU cycle counter register, we measure the cost as 1125 cycles

(about 3.75 µs at 300 MHz and 1.69 µs at 667 MHz). Due to better DVFS support

in the Transmeta processor, this is considerably lower than the 8,000-16,000 cycles reported


          Chameleon  GraceOS  LongRun
Movie 1   1.65       2.11     2.27
Movie 2   1.51       1.64     1.90
Movie 3   1.70       2.11     2.51
Movie 4   1.95       2.76     2.86
Movie 5   2.31       3.09     3.38
Movie 6   2.71       3.14     3.85

Figure 3.22. Average CPU power consumption (Watts) of movie playback under GraceOS, Chameleon, and LongRun.

Video decoder   2738 per frame
GPA technique   1149 per timer
pnice           127

Figure 3.23. Overhead of application-level power management (in CPU cycles).

for an HP laptop used in the GraceOS experiments [96, 97]; however, both incur minimal

overhead. Finally, the overheads of the video decoder, GPA, and pnice strategies are

2738, 1149, and 127 CPU cycles, respectively, which is on the order of a few microseconds.

3.7 Concluding Remarks

This chapter proposes Chameleon, a new approach for power management in mobile

processors. We argue that applications know best what their energy needs are and propose

an approach that allows them to make decisions on power management. The operating

system only enforces protection and isolates applications from the power settings of other

applications.

Our integration of application-level power management policies into four applications

demonstrates that such policies impose a modest cost of tens of lines of code. Our results

show that Chameleon can extract up to 32% energy savings when compared to LongRun


                        to frequency (MHz)
                  300    400    533    600    667
from        300   —      1101   1099   1086   1066
frequency   400   1125   —      1095   1086   1066
(MHz)       533   1117   1104   —      1073   1066
            600   1125   1101   1092   —      1066
            667   1117   1101   1088   1077   —

Figure 3.24. Cost of voltage and frequency scaling (in CPU cycles).

and up to 50% savings when compared to recently proposed OS-based DVFS techniques,

while delivering comparable performance to time-sensitive and interactive applications.

Chameleon imposes negligible overheads and is very effective at scheduling concurrent

applications with diverse energy needs. More broadly, our results demonstrate the feasibil-

ity and benefits of power management at the application level.

However, implementing an application-level power management strategy requires mod-

ifications to the source code, which may not be feasible for legacy applications. To achieve

energy savings for applications without access to their source code, in the next chapter we

propose a time series-based power management approach for such applications.


CHAPTER 4

TIME SERIES-BASED POWER MANAGEMENT

4.1 Introduction

In Chapter 3, we proposed Chameleon, an application-level power management approach

for mobile processors. Although Chameleon can achieve significant energy savings without

performance degradation, it requires modifications to the source code of applications.

Therefore, in this chapter, we propose a time series-based power management (TSPM)

approach for mobile processors and disks. Unlike Chameleon, TSPM is completely

transparent to applications: it requires neither access to application source code nor any

other information from applications. However, TSPM does share several features with

Chameleon: it employs different CPU frequencies for different tasks and makes no

assumptions about the nature of applications.

Our techniques are designed to handle both processors and disks with power manage-

ment capabilities. We assume a DVFS-capable processor and a DRPM-capable disk that

supports multiple rotational speeds and present techniques for varying the speeds of these

components based on the workload.

4.2 System Architecture

Our time series-based power management techniques assume a processor that supports

dynamic voltage and frequency scaling and a hard disk that supports different rotational

speeds. While DVFS-capable processors are widely available, disks that support multiple

rotational speeds are not yet commercially available—we assume that such disks will be

widely used in future mobile devices.


[Block diagram: applications issue processor and I/O demands to a TSPM-enabled OS, in which a demands profiler (monitoring) feeds a demands predictor, whose output drives speed scaling through a speed adaptor that sets the speed of the processor/disk.]

Figure 4.1. The architecture of a TSPM-enabled OS kernel.

Our TS-PM technique consists of three components: (i) a profiler that measures the current CPU and I/O demands of individual tasks as well as of the system as a whole, (ii) a predictor that uses a time series of recent CPU and I/O demands to predict future demands using statistical methods, and (iii) a speed setting strategy that uses these predictions to compute the desired CPU and disk settings, together with a speed adapter that maps these settings to the nearest speed actually supported by the hardware. Observe that the speed setting strategy computes an ideal speed for the processor and disk, while the speed adapter maps it to a speed actually supported by the hardware (since different processors and disks support different power settings, this separation ensures OS portability across hardware). Figure 4.1 depicts these components. Both TS-DVFS and TS-DRPM employ these components, although the specific algorithms used for profiling, prediction, and speed setting differ between processors and disks.

4.3 Profiling Current Demands

This section provides an overview of the profiling techniques employed in TS-PM to measure the processor and I/O demands of applications.


4.3.1 Measurement of Processor Demand

Since TS-DVFS supports per-process CPU speed settings, the profiler must estimate the processor demands of individual processes. Typically, system-wide processor demand is measured using system utilization, which is given as:

    U_sys = T_busy / T    (4.1)

where T_busy denotes the time for which the CPU is busy during some interval T. This concept can be extended to capture the CPU demands of individual processes. Process utilization u—the CPU utilization due to an individual process—can be defined as follows. Consider a process that executes for time e within a quantum. Assume the CPU frequency changes j times within the quantum and that the process runs at CPU frequency f_1 for time t_1, then at frequency f_2 for time t_2, and so on. Then the full-speed equivalent execution time e_fse for the execution e is given by:

    e_fse = Σ_{i=1}^{j} f_i · t_i    (4.2)

where f_i is the CPU frequency expressed as a percentage of the maximum available frequency. Intuitively, e_fse represents the time for which the process would have executed in the quantum if the processor were running at full speed throughout.

To compute the process utilization, assume that the process was scheduled n times during an interval T, and let e_i be the full-speed execution time of the ith execution. Let q_i denote the time quantum that the process receives during its ith execution in that interval. Then the process utilization u during that interval is given as:

    u = (Σ_{i=1}^{n} e_i) / (Σ_{i=1}^{n} q_i),    q_i = e_i + e_i · (idle_i / busy_i)    (4.3)


where busy_i refers to the length of the continuous non-idle period in which the ith execution sits, and idle_i denotes the length of the first continuous idle period after the ith execution.
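As a concrete illustration of Equations 4.2 and 4.3, the following minimal Python sketch computes the full-speed equivalent time and the resulting process utilization. The function and variable names are ours, not from the kernel implementation, and we read q_i as e_i + e_i · idle_i / busy_i, i.e., each execution is charged a proportional share of the idle time that follows its busy period.

```python
def full_speed_equivalent(segments):
    """e_fse for one execution (Equation 4.2). segments is a list of
    (f_i, t_i) pairs, where f_i is the CPU frequency as a fraction of
    the maximum, 0 < f_i <= 1, and t_i the time spent at it."""
    return sum(f * t for f, t in segments)

def process_utilization(executions):
    """Process utilization u over an interval (Equation 4.3).
    executions: list of (e_fse, busy, idle) tuples, one per scheduling
    of the process: its full-speed execution time, the surrounding
    non-idle period, and the idle period that follows."""
    e_total = sum(e for e, _, _ in executions)
    # q_i = e_i + e_i * idle_i / busy_i: charge each execution a share
    # of the subsequent idle time proportional to its busy-time share.
    q_total = sum(e + e * idle / busy for e, busy, idle in executions)
    return e_total / q_total
```

For example, a process that ran 20 ms at half speed (e_fse = 10 ms) inside a 20 ms busy period followed by 20 ms of idle time has u = 10 / (10 + 10 · 20/20) = 0.5.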

4.3.2 Measurement of I/O Demand

TS-DRPM estimates I/O demands by measuring the response times of disk requests. Given an interval T, suppose that there are n I/O requests during this interval with response times r_1, r_2, ..., r_n, respectively. Then the disk utilization in this interval is given by:

    u = (Σ_{i=1}^{n} r_i · s_i) / T    (4.4)

where s_i is a scaling factor based on the current disk speed, 0 < s_i ≤ 1, and r_i · s_i denotes the full-speed response time—the response time that would have been observed if the disk were running at full speed. Due to the presence of an on-board disk cache, not all requests result in disk accesses, and hence Equation 4.4 does not correctly reflect the I/O demand of applications. To measure true I/O demand, we should only consider those requests that miss in the on-disk cache (and thus result in actual disk accesses). The profiler labels each I/O request as a hit or a miss and computes the utilization by considering only misses. Our profiler uses a heuristic for this labeling: only those read requests with response times below a threshold τ are labeled as hits; the remaining reads and all writes are labeled as misses (typically τ is set to less than a millisecond for modern disks).
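The profiler's estimate of Equation 4.4 together with the hit/miss labeling heuristic can be sketched as follows. This is an illustrative Python fragment: the names are ours, and the 1 ms default for τ is taken from the text's "less than a millisecond" guideline.

```python
TAU = 0.001  # hit threshold (s): reads faster than this are assumed
             # to have been served from the on-disk cache

def disk_utilization(requests, interval, speed_scale):
    """Disk utilization over one interval (Equation 4.4), counting
    only cache misses. requests: list of (kind, response_time) with
    kind 'read' or 'write'; speed_scale is s_i in (0, 1], converting
    response times at the current RPM to full-speed equivalents."""
    busy = 0.0
    for kind, r in requests:
        if kind == 'read' and r < TAU:
            continue  # labeled a hit: does not reflect true I/O demand
        busy += r * speed_scale  # full-speed response time r_i * s_i
    return busy / interval
```

For instance, one sub-millisecond read (a hit) plus two 20 ms misses over a 10 s interval at full speed yield u = 0.04 / 10 = 0.004.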

4.4 Predicting Future Demand

In order to adapt the speed of the processor and hard disk proactively to changes in workload characteristics, TS-PM needs to predict future system behavior. Typical prediction approaches use the last observed value, or an exponential decay function that incorporates an observation history.

Time series analysis has been used extensively for forecasting [7]. This technique builds a process model for the time series under consideration and predicts


future values by combining the recently observed series with this model. We use the autoregressive AR(1) model to make predictions based on previously measured values [58]. This model is similar to an exponential decay function, except that the decay parameter is dynamically adapted to changes in workload behavior based on the time series of observed values. Using an AR(1) model, the processor or I/O demand u at measurement interval n+1 is given by Equation 4.5:

    u_{n+1} = ū + φ_1 (u_n − ū)    (4.5)

where φ_1 and ū are the autocorrelation and mean of the processor or I/O demands, respectively. They are estimated dynamically by our predictor using the m most recent observations:

    ū = (Σ_{i=0}^{m−1} u_{n−i}) / m
    φ_1 = (Σ_{i=0}^{m−1} (u_{n−i} − ū)(u_{n−1−i} − ū)) / (Σ_{i=0}^{m−1} (u_{n−i} − ū)²)    (4.6)

Although both TS-DVFS and TS-DRPM use the AR(1) predictor, they use it in different ways: TS-DVFS uses a separate AR(1) predictor for each process in the system, while TS-DRPM uses a single predictor for the aggregate I/O demand.
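A minimal sketch of the AR(1) predictor of Equations 4.5 and 4.6 follows. This is our own illustration: for simplicity, the lag-1 autocorrelation is computed over consecutive pairs inside the observation window, whereas the thesis's sum formally extends one sample beyond it.

```python
def ar1_predict(history):
    """Predict the next demand value from a window of recent
    observations (Equations 4.5 and 4.6). history is chronological;
    history[-1] is u_n. Returns the prediction for u_{n+1}."""
    m = len(history)
    mean = sum(history) / m
    dev = [u - mean for u in history]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return mean  # constant series: phi_1 undefined, predict mean
    # lag-1 autocorrelation phi_1 over consecutive pairs in the window
    phi1 = sum(dev[i + 1] * dev[i] for i in range(m - 1)) / denom
    return mean + phi1 * (history[-1] - mean)
```

With the window sizes used later (4 for TS-DVFS, 2 for TS-DRPM), the predictor needs only a handful of multiplications per interval, which is why it is cheap enough to run per process inside the kernel.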

4.5 Speed Setting Strategy

This section outlines the speed setting strategies for the processor and disk.

4.5.1 Processor Speed Setting Strategy

Since the AR(1) model may not always represent the time series of processor and I/O demands accurately, our predictor is imperfect and will yield prediction errors. In order to respond quickly to prediction errors, we implement a two-level CPU speed setting strategy in TS-DVFS. The first level works at the time scale of the prediction interval T and is responsible for computing a baseline CPU frequency for the entire interval T. The second level works at the granularity of subintervals within T and adjusts the baseline CPU speed setting whenever prediction errors are detected.


Suppose that the prediction interval is T = n × 10 ms, where n is an integer (we choose 3 ≤ n ≤ 5 in our implementation). At the end of each such interval, TS-DVFS computes the baseline CPU frequency for the next interval as

    f_base = u × f_max    (4.7)

where f_max is the maximum CPU frequency and u is the processor utilization predicted for the next interval.

The interval T is further divided into m subintervals, and the length of each such subinterval is 10n/m ms. Let u_j denote the observed process utilization of the application until the end of the jth subinterval, and let f_j denote the frequency setting in this subinterval. Then the CPU frequency setting is adjusted as follows:

    f_k = f_{k−1}          if |u_{k−1} − u_{k−2}| ≤ threshold_j · u_{k−2}
    f_k = u_{k−1} × f_max  if |u_{k−1} − u_{k−2}| > threshold_j · u_{k−2}
    f_k = f_base           if k = 1 or u_{k−1} = 0    (4.8)

where threshold_j is a predefined series of thresholds, u_0 = u, and f_0 = f_base. Intuitively, the frequency setting is adjusted whenever the observed utilization in a subinterval differs from that in the previous subinterval by more than a threshold (indicating a prediction error). It is left unchanged when the two utilizations are within the threshold, and it is reset to the baseline when the utilization drops to zero.
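The two-level strategy can be sketched as follows. This is hypothetical Python, not the kernel code: the 10% relative-change threshold is illustrative, since the thesis determines its threshold values empirically offline, and f_max is taken from the TM5600 used later in the chapter.

```python
F_MAX = 667.0  # MHz, maximum frequency of the TM5600 (Figure 4.2)

def baseline_frequency(u_pred):
    """First level: baseline frequency for the whole prediction
    interval (Equation 4.7), from the AR(1) utilization prediction."""
    return u_pred * F_MAX

def adjust_frequency(f_prev, u_prev, u_prev2, f_base, threshold=0.1):
    """Second level: per-subinterval correction (Equation 4.8).
    f_prev is the previous subinterval's setting, u_prev and u_prev2
    the utilizations observed in the last two subintervals."""
    if u_prev == 0:
        return f_base          # demand vanished: fall back to baseline
    if abs(u_prev - u_prev2) <= threshold * u_prev2:
        return f_prev          # prediction still holds: keep setting
    return u_prev * F_MAX      # prediction error: track observed demand
```

With the parameters used in the prototype (T = 40 ms, 10 ms subintervals), the second level gets four chances per interval to correct a mispredicted baseline.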

4.5.2 I/O Speed Setting Strategy

Modern hard drives implement pre-fetching in the hardware to maximize disk cache

performance (for instance, track buffering is a form of pre-fetching where an entire disk

track is read whenever any sector on that track is requested). Any speed setting strategy

should avoid choosing disk speeds that will interfere with such pre-fetching. Aggressive


lowering of disk speed can impact pre-fetching, reduce the cache hit ratio, and severely

degrade application performance.

Our I/O speed setting strategy takes these factors into account when computing an appropriate speed setting. Specifically, our technique takes into account the arrival rate during the last interval, the hit ratio of the most recent n requests, and the performance slowdown at different RPM levels. Suppose that the arrival rate for the last T seconds is a, the hit ratio of the most recent n requests is h, and TS-DRPM predicts the disk utilization for the next T seconds as u. Let R_diff[i] denote the difference in rotational latency between the maximum RPM level and the candidate RPM level i (note that the seek time remains unchanged when changing the disk rotational speed). The performance slowdown P_diff[i] is given by the increase in rotational latency seen by these a × T requests:

    P_diff[i] = a(1 − h) × T × R_diff[i]    (4.9)

With this, we can predict the disk utilizations under different RPM levels for the next T seconds:

    u_i = u + P_diff[i]    (4.10)

where u_i is the predicted utilization if the disk runs at RPM level i for the next T seconds. Let u_max denote the utilization at the maximum RPM level. We choose the lowest RPM level that satisfies the following property:

    (u_i − u_max) / u_max ≤ threshold    (4.11)

where threshold is a predefined threshold.
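The RPM selection of Equations 4.9 to 4.11 can be sketched as follows. This is our illustration, not the simulator code: we normalize the slowdown term by the interval length so that utilization stays a fraction, and we assume the predicted utilization is nonzero.

```python
def choose_rpm(levels, u_pred, arrival_rate, hit_ratio, interval,
               threshold=0.15):
    """Pick the lowest RPM level whose predicted slowdown stays within
    threshold of the full-speed utilization (Equations 4.9-4.11).
    levels: (rpm, r_diff) pairs sorted from lowest RPM, where r_diff
    is R_diff[i], the extra rotational latency versus the maximum RPM,
    in seconds. u_pred (> 0) is the AR(1) utilization prediction."""
    misses = arrival_rate * (1 - hit_ratio) * interval  # a(1-h)T
    u_max = u_pred  # utilization predicted at the maximum RPM level
    for rpm, r_diff in levels:
        p_diff = misses * r_diff           # Equation 4.9 (seconds)
        u_i = u_pred + p_diff / interval   # Equation 4.10, as a fraction
        if (u_i - u_max) / u_max <= threshold:  # Equation 4.11
            return rpm
    return levels[-1][0]  # fall back to the maximum RPM level
```

With a low miss rate the slowdown at every level stays under the threshold and the lowest RPM wins; as misses grow, the loop walks up toward the maximum RPM.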

4.6 Implementation and Simulation

We have implemented TS-DVFS in the Linux kernel on a Sony Vaio PCG-V1CPK laptop with a Transmeta Crusoe TM5600-667 processor [86]. Since DRPM-enabled disks


Freq. (MHz)   Voltage (V)   Power (W)
300           1.2           1.3
400           1.225         1.9
533           1.35          3.0
600           1.5           4.2
667           1.6           5.3

Figure 4.2. Characteristics of the TM5600-667 processor

are not yet commercially available, we implement TS-DRPM in a simulated DRPM-ready

hard disk using DiskSim [30]. Next we present the details of our implementation.

4.6.1 Implementation of TS-DVFS

The Transmeta TM5600 processor supports five discrete frequency and voltage levels (see Figure 4.2) and implements the LongRun [25] technology in hardware to dynamically vary the CPU frequency based on the observed system-wide CPU utilization. LongRun varies the CPU frequency between user-specified maximum and minimum values—these values can be set by writing to two model-specific registers (MSRs). By default, these values are set to 300 MHz and 667 MHz, enabling the full range of voltage scaling. LongRun can be disabled by setting the minimum and maximum register values to the same frequency (e.g., setting both to 533 MHz does not allow any leeway in changing the CPU frequency, effectively disabling LongRun). This feature can be used to implement voltage scaling in software—the OS can periodically determine the desired frequency and set the two registers to this value.

Our prototype of TS-DVFS is implemented as a set of modules and patches to the Linux 2.4.20-9 kernel. Our implementation uses a scaling interval T of 40 ms, subintervals of 10 ms, and a window size of 4 for the AR(1) predictor. Our prototyping effort involved the following issues: (i) implementation of the CPU demand profiler and predictor, (ii) modifications to the kernel CPU scheduler to support per-process DVFS settings (which are taken into account at context switch time), (iii) implementation of the CPU speed adaptor


Process Utilization   Freq. (MHz)
[0%, 45%)             300
[45%, 60%)            400
[60%, 80%]            533
(80%, 90%]            600
(90%, 100%]           667

Figure 4.3. Mapping Process Utilizations to a CPU Frequency in the Transmeta TM5600.

RPM    Idle Power   Seek Power   Read/Write Power
3000   0.8 W        1.4 W        1.3 W
3600   1.1 W        1.7 W        1.6 W
4200   1.4 W        2.0 W        1.9 W
4800   1.7 W        2.3 W        2.2 W
5400   2.0 W        2.6 W        2.5 W

Figure 4.4. Characteristics of the Simulated DRPM-Ready Hard Disk

for the Transmeta processor, and (iv) determination of the threshold_j values from off-line empirical experiments. The latter experiments yield a hardware-specific conversion table (see Figure 4.3) for mapping process utilizations to a corresponding CPU frequency.
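A sketch of the speed adaptor's table lookup based on Figure 4.3 is shown below. This is illustrative Python, not the kernel module, and the table's mixed open/closed interval boundaries are simplified to half-open intervals.

```python
# Conversion table from Figure 4.3: (upper utilization bound, MHz).
# The thesis derives these breakpoints empirically for the TM5600.
UTIL_TO_FREQ = [(0.45, 300), (0.60, 400), (0.80, 533),
                (0.90, 600), (1.00, 667)]

def adapt_speed(u):
    """Map a predicted process utilization in [0, 1] to the nearest
    CPU frequency actually supported by the hardware."""
    for bound, freq in UTIL_TO_FREQ:
        if u < bound:
            return freq
    return UTIL_TO_FREQ[-1][1]  # saturated demand: full speed
```

Keeping this table in the hardware-specific speed adaptor, rather than in the speed setting strategy, is what makes the rest of TS-DVFS portable across processors.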

4.6.2 Simulation of TS-DRPM

To simulate TS-DRPM, we consider an IBM TravelStar 40GNX [88] laptop hard disk as the baseline and enhance it with DRPM features: five RPM levels from 3000 to 5400 RPM in steps of 600 RPM. The assumed power characteristics for these RPM levels are shown in Figure 4.4. We implement TS-DRPM for this disk in the DiskSim simulator [30]. We assume a scaling interval T of 10 s, a threshold value of 0.15, a history of the 100 most recent requests, and a window size of 2 for the AR(1) predictor. We augment DiskSim with power models to record the energy consumption of the disk at the various RPM levels. Our implementation also accounts for the queuing and service delays caused by changes in the RPM level of the disk and in the STANDBY/ACTIVE/IDLE modes.


4.7 Experimental Evaluation

We evaluated TS-DVFS with a variety of applications and TS-DRPM with a variety of

real application traces. This section presents our experimental results.

4.7.1 TS-DVFS Results

To evaluate TS-DVFS, we ran applications under three different configurations: (i) with DVFS disabled—the CPU always runs at the maximum available speed (denoted FULL); (ii) using the hardwired LongRun technology; and (iii) using TS-DVFS with the LongRun technology disabled. The energy consumption of the processor during an interval T is computed as

    energy = Σ_{i=1}^{n} p_i · t_i    (4.12)

where n is the number of available frequency/voltage combinations on the processor, p_i denotes the power consumption of the processor when running at the ith frequency/voltage combination, and t_i represents the time spent at the ith frequency/voltage combination during the interval T. We modified the Linux kernel to record the energy consumption of the TM5600 processor using Equation 4.12 and the power values in Figure 4.2.
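The energy accounting of Equation 4.12, using the power values of Figure 4.2, can be sketched as follows (our illustration; the residency dictionary is a hypothetical stand-in for the kernel's per-frequency time counters):

```python
# Power draw at each TM5600 frequency/voltage pair (Figure 4.2).
POWER_W = {300: 1.3, 400: 1.9, 533: 3.0, 600: 4.2, 667: 5.3}

def cpu_energy(residency):
    """Equation 4.12: CPU energy in Joules over an interval, given
    the time in seconds spent at each frequency, e.g.
    {667: 2.0, 300: 8.0} for 2 s at full speed and 8 s at 300 MHz."""
    return sum(POWER_W[f] * t for f, t in residency.items())
```

For example, 2 s at 667 MHz plus 8 s at 300 MHz costs 2 × 5.3 + 8 × 1.3 = 21.0 J, versus 53.0 J for the same 10 s entirely at full speed, which is the kind of gap the playback results below exploit.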

4.7.1.1 Multimedia Applications

We encoded several DVD movies at different bit-rates and resolutions using the DivX MPEG4 video codec and the MP3 audio codec. The characteristics of six such movies are listed in Figure 4.5. The bit-rates are depicted in the form (a + b) Kbps, where a is the video and b is the audio bit-rate. We recorded the energy consumed in the playback of these movies at full speed, with LongRun, and with TS-DVFS. In addition to these techniques, we also played these movies using the Chameleon-enhanced mplayer under Chameleon, the application-level power management approach presented in Chapter 3.

Our experiments show that all four configurations—Chameleon, TS-DVFS, LongRun, and FULL—handle these movies very well. The same playback quality is observed under all four configurations: identical execution times, the same frame rate, no dropped frames, and


         Res.     Length  Frames  Bit-Rate (Kbps)
Movie 1  640x272  3360s   80387   1290.9 + 179.2
Movie 2  640x272  612s    14577   757.2 + 128.0
Movie 3  640x352  7168s   179168  679.7 + 128.0
Movie 4  640x352  602s    15003   861.9 + 128.0
Movie 5  640x352  1755s   42040   2456.9 + 192.0
Movie 6  640x480  2394s   57355   1674.6 + 384.0

Figure 4.5. Characteristics of MPEG 4 Videos

no user-noticeable delays. The results in Figure 4.6(a) show that: (i) LongRun already achieves significant energy savings (from 11.76% to 64.07%) compared to FULL; (ii) TS-PM achieves an additional 16.21% to 31.53% energy savings when compared to LongRun; and (iii) Chameleon achieves a further 3.50% to 6.46% energy savings over TS-PM, making it the most energy-efficient of the four techniques.

Although there are no user-perceived playback problems, we do observe some inter-frame jitter, which is unavoidable with a time-sharing CPU scheduler. However, the effects of this inter-frame jitter are not perceptible at the user level. In Figure 4.6(b), we plot the percentage of frames that miss their deadline by more than 20% of the frame interval. Our results show that: (i) Chameleon has a lower percentage of late frames than TS-PM and LongRun while consuming the least energy; (ii) TS-PM has the most late frames, 1.6 to 6.2 times more than Chameleon; and (iii) FULL has the fewest—although not zero—late frames while consuming the most energy.

In summary, TS-PM consumes less energy than FULL and LongRun at the expense of more late frames, while Chameleon outperforms TS-PM in terms of both energy consumption and late frames. This is because the Chameleon-enhanced mplayer exploits application domain knowledge to estimate the decoding times of individual frames, whereas TS-PM only observes the CPU usage of mplayer externally. This domain knowledge yields both lower energy consumption and fewer late frames. However, enhancing mplayer with Chameleon requires modifications to mplayer's source code, and hence the choice of


(a) Average power consumption during movie playback (Watts):

         Chameleon  TS-DVFS  LongRun  FULL
Movie 1  1.65       1.77     2.27     5.30
Movie 2  1.51       1.60     1.90     5.30
Movie 3  1.70       1.80     2.51     5.30
Movie 4  1.95       2.05     2.86     5.30
Movie 5  2.31       2.40     3.38     5.30
Movie 6  2.71       2.83     3.85     5.30

(b) Percentage of late frames during movie playback:

         Chameleon  TS-DVFS  LongRun  FULL
Movie 1  0.45%      1.83%    0.49%    0.15%
Movie 2  0.27%      1.02%    0.28%    0.14%
Movie 3  5.17%      8.77%    5.13%    5.00%
Movie 4  0.48%      3.24%    1.13%    0.29%
Movie 5  1.20%      7.46%    1.56%    0.70%
Movie 6  1.55%      6.29%    1.40%    1.11%

Figure 4.6. Average CPU power consumption and percentage of frames that are late by more than 8 ms (20% of the 40 ms deadline).

either Chameleon or TS-PM depends on whether one is willing to pay the cost of source code modifications for higher energy efficiency and better playback quality.

In addition to movie playback, we studied the energy efficiency of TS-PM for transcoding. We used mencoder, an encoder tool included in the MPlayer suite [66], to perform two tasks: (i) rescaling the resolution of an MPEG4 movie, and (ii) transcoding an MPEG1 movie to MPEG4. The characteristics of these workloads are shown in Figure 4.7. All of these movies use MP3 as their audio codec.


          Original Characteristics              After Rescaling/Transcoding
          Len.   Res.     Bit-Rate (Kbps)       Res.     Bit-Rate (Kbps)
Movie 7   610s   512x288  838.40 + 128.00       480x270  802.25 + 128.00
Movie 8   608s   640x272  757.20 + 132.64       560x238  614.10 + 127.12
Movie 9   600s   640x352  861.90 + 128.00       560x308  803.40 + 128.00
Movie 10  610s   352x240  1150.00 + 224.00      352x240  735.27 + 224.00
Movie 11  2270s  480x360  861.90 + 128.00       480x360  803.40 + 128.00

Figure 4.7. Characteristics of MPEG Videos for Rescaling/Transcoding

Our results in Figure 4.8 show that LongRun is unable to extract any energy savings for these workloads. However, TS-DVFS achieves up to 7.08% and 14.41% energy savings for rescaling and transcoding, respectively, when compared to LongRun and FULL. The data in Figure 4.8 also shows that TS-DVFS extracts these savings without any significant performance degradation—the observed loss in performance is only 2.73% for the transcoding workload.

4.7.1.2 Other Applications

We also evaluated the efficiency of TS-DVFS for a variety of other application workloads, such as build jobs and benchmarks. We consider two such workloads: (i) building the Linux kernel with background MP3 audio playback, labeled "make+mp3" in Figure 4.9; and (ii) running the Dhrystone benchmark [92] for 500,000,000 iterations.

Our results in Figure 4.9 show that LongRun is unable to extract any energy savings when compared to FULL for "make+mp3". In fact, LongRun incurs a 2.31% slowdown and consumes an extra 69.27 Joules when compared to FULL (the increase in energy consumption is due to the longer completion time of the build job). In contrast, TS-DVFS extracts 38.62% energy savings at the expense of a 4.03% longer execution time when compared to LongRun.

Our results in Figure 4.9 also show that, compared to running Dhrystone on FULL, LongRun and TS-DVFS achieve 0.419% and 4.784% energy savings, respectively, without any performance degradation. These results show that TS-DVFS can achieve more


Energy consumption (Joules) of rescaling and transcoding:

          TS-DVFS   LongRun   FULL
Movie 7   5729.54   6166.18   6166.48
Movie 8   5861.73   6233.10   6270.44
Movie 9   8495.92   9017.32   9044.84
Movie 10  28814.70  31631.39  31887.54
Movie 11  34320.14  40853.57  40916.00

Execution time (seconds) of rescaling and transcoding:

          TS-DVFS   LongRun   FULL
Movie 7   1164      1165      1164
Movie 8   1180      1178      1182
Movie 9   1701      1703      1702
Movie 10  7931      7720      7720
Movie 11  6099      5966      6018

Figure 4.8. CPU Energy Consumption and Execution Times for Videos Rescaling/Transcoding

energy savings than LongRun while still delivering good performance, even when the system is nearly 100% busy.

4.7.2 TS-DRPM Results

To evaluate TS-DRPM, we consider four disk configurations: (i) FULL, where the disk is assumed to run at full speed with no power optimizations; (ii) TPM_perf, traditional power management based on disk spin-down, where we assume a perfect arrival-time predictor and transition the disk to sleep mode whenever the time to the next request is long enough to accommodate a spin-down followed by a spin-up; (iii) SIMPLE-DRPM—the technique proposed in [40, 39]—which uses the variance in mean response times of disk requests to estimate the I/O demand; and (iv) TS-DRPM, our time series-based technique.
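The TPM_perf oracle can be sketched as a simple rule. This is a hypothetical illustration: the text specifies only the timing condition, and the energy break-even check included here is a common refinement, not something the thesis spells out.

```python
def should_spin_down(gap_s, spin_down_s, spin_up_s,
                     idle_power_w, sleep_power_w, transition_j):
    """With a perfect arrival-time predictor, gap_s is the known time
    until the next request. Sleep only if the gap covers the spin-down
    and spin-up transitions and the energy saved while asleep exceeds
    the energy cost of the transitions themselves."""
    if gap_s < spin_down_s + spin_up_s:
        return False  # cannot complete the transitions in time
    sleep_time = gap_s - spin_down_s - spin_up_s
    saved = sleep_time * (idle_power_w - sleep_power_w)
    return saved > transition_j
```

Because real predictors are imperfect, a mispredicted short gap turns this rule into a performance and energy penalty, which is exactly the weakness multi-speed (DRPM) schemes are designed to avoid.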

We instrument the Linux kernel to gather traces of disk requests from a variety of ap-

plication mixes. We present results from two such workloads: (i) One consisting of a mix

of multimedia workloads such as movie playback, MP3 playback, and movie transcoding;


make+mp3 Dhrystone0

3000

6000

9000

12000

15000

Other Applications

Energy Consumption of Other Applications

Ener

gy C

onsu

mpt

ion

in J

oule

s

5405

.32 88

04.8

4

8735

.57

3103

.62

3245

.91

3259

.57

TS−DVFSLongRunFULL

make+mp3 Dhrystone0

500

1000

1500

2000

2500

Other Applications

Execution Time of Other Applications

Exec

utio

n Ti

me

in S

econ

ds

1754

.00

1686

.00

1648

.00

610.

00

610.

00

610.

00

TS−DVFSLongRunFULL

Figure 4.9. CPU Energy Consumption and Execution Times for Other Workloads

(ii) the other consisting of a mix of workloads such as build jobs, web browsers, editors, and shell terminals. We conduct trace-driven simulations for the four configurations and determine the energy consumption of the disk and the response-time CDF. Figure 4.10 depicts our results and indicates that TS-DRPM, TPM_perf, and SIMPLE-DRPM all achieve significant energy savings when compared to FULL. TS-DRPM yields the best savings for both traces. It reduces the energy consumption of the multimedia workload by 9.46%, 20.35%, and 36.01% when compared to SIMPLE-DRPM, TPM_perf, and FULL, respectively. For the mixed workload, it reduces the energy consumption by 1.27%, 0.49%, and 30.79% compared to SIMPLE-DRPM, TPM_perf, and FULL, respectively.

The response-time CDFs indicate the performance seen by requests under the four configurations. Since TPM_perf is equipped with a perfect predictor, it incurs no performance penalty, and its CDF curve is identical to FULL's. The figure shows that although TS-DRPM incurs a performance slowdown, the degradation is small when compared to FULL.

4.8 Concluding Remarks

This chapter proposes a new approach to power management in mobile devices. The approach is adaptive to changing system workloads and transparent to applications. Our TS-PM approach is based on time series analysis and employs simple statistical methods to predict future workloads and to compute power settings. TS-PM con-


[Figure: four panels comparing TS-DRPM, SIMPLE-DRPM, TPM_perf, and FULL: disk energy consumption (Joules) for the mixed and multimedia workloads, and response-time CDFs (0 to 60 ms) of disk requests for each workload.]

Figure 4.10. Energy Consumption and Response Times of Disk Requests for Different Workloads

sists of two components: (i) a time series-based DVFS technique (TS-DVFS) that uses per-process utilizations to compute task-specific CPU settings, and (ii) a time series-based dynamic rotations-per-minute technique (TS-DRPM) that dynamically varies the disk rotational speed based on the arrival rate, response times, and access patterns (hit ratios seen at the on-board disk cache) of disk requests.

We have evaluated the energy efficiency of TS-PM through implementation and simulation. Our results show that, when compared to the LongRun technology, TS-PM reduces energy consumption by 31.53%, 7.08%, 14.41%, and 38.62% for movie playback, movie rescaling, transcoding, and Linux builds, respectively, without any significant performance loss. The results from our simulations of TS-DRPM show that TS-PM can achieve up to 36.01% energy savings when compared to disks without any power-saving features. TS-PM yields up to 20.35% savings when compared to traditional power management based on disk spin-down.


CHAPTER 5

SENSOR-ENHANCED VIDEO ANNOTATION

5.1 Introduction

Content-based media retrieval has attracted considerable interest over the past decade. Organizing media in a context-aware way is crucial to the success of content-based retrieval. Among all the metadata describing a medium's context, when, where, and who/what are the most important [11]. People enhance the searching and retrieval of media by using textual annotations. The annotations describe the media's context, and they are either entered manually [31] or generated automatically by a combination of learning- and vision-based object/face recognition techniques [18, 19, 47, 52, 69, 100]. Manual annotation is cumbersome and hampered by imprecise human memory. Automatic annotation by learning- and vision-based techniques is error-prone and has high computational requirements.

Recently, numerous sensor technologies such as RFID [21] and low-power sensors [46, 62, 63, 72] have emerged; another trend is the ubiquitous deployment of positioning technologies such as GPS [3] and ultrasound [75] that triangulate the exact location of a user. Taking advantage of these technologies, researchers record images, videos, and audio along with sensor data such as GPS, light, and temperature readings [1, 11, 17, 32, 68, 84, 87], and use these sensor data to aid media retrieval. However, these systems record only two pieces of context metadata, when and where, and miss the most important one, who/what.

In this chapter, we consider the problem of automatically recording the three most important pieces of context metadata, when, where, and who/what, along with visual images and videos, assuming a sensor-rich world. We first describe the challenges encountered. We then present our sensor-enhanced video annotation (SEVA) system, which addresses these challenges. SEVA records not only when and where visual content is captured but also who/what is present, and where, during the capture of images and videos. Consequently, SEVA produces a tagged stream that can later be used to search efficiently for videos and images in context-based and even content-based ways.

Research Challenges

Numerous practical challenges arise in the design and implementation of SEVA.

• Mismatch in coverage and range:The SEVA recorder includes a video camera and

a wireless radio to record images and sensor data, respectively. Typically, the cam-

era is a directional image sensor that captures a limited view of the scene depending

on where the lens is pointing. In contrast, the wireless radio antenna is an omni-

directional device and is able to listen to sensors that are outside the viewable area

of the camera. This can result in false positives since the radio may record objects

that do not actually appear in the captured image. Even with a directional antenna,

it is difficult to precisely match the coverage of the radio and the lens; the focus and zoom capabilities of the lens further complicate the issue. Similarly, the lens can capture images of objects that are infinitely far from the camera (e.g., a distant building), while the wireless radio has a limited range and is unable to record the identities of objects that are outside its range. This results in false negatives where objects that are in

the view of the camera are unable to report their identities to the wireless radio.

• Mobility: Mobile objects and a moving camera cause objects to move in and out

of the field of view. SEVA must correctly identify which frames contain a particular

object with a high degree of accuracy.

• Limitations of power-constrained, bandwidth-poor sensors: Sensors attached to

objects are either battery-powered or passive. Due to power-constraints, battery-

powered sensors aggressively duty-cycle and use sleep modes to enhance their lifetimes. Passive sensors such as RFID tags do not have a power source and instead

are powered by the electromagnetic signals from the wireless radio. Further, both

battery-powered and passive sensors use low-bandwidth wireless channels for com-

munication. While a video camera can record at a rate of 30 frames/second, due to

the resource constraints on sensors it is not feasible for the wireless radio to query all

objects every 33ms. Thus, sensors will respond less often than once per frame, necessitating extrapolation techniques to annotate every frame.

• Limitations of positioning systems: The positioning systems can only update at a

much slower rate compared to the video rate. For instance, current GPS receivers

provide position readings at a rate of 1-10 readings per second, and the current Cricket system provides an update rate of at most several readings per second. As a re-

sult, SEVA requires extrapolation techniques to annotate every frame with location

information. Further, SEVA requires a high degree of positioning accuracy in order

to properly identify viewable objects. Unfortunately, the current generation of po-

sitioning systems provide limited accuracy. For instance, current GPS technology

provides accuracy of 3-100 meters [3], while handling moving objects in ultrasound

has inherent problems [81]. SEVA must deal with the error that is introduced as a

result of these limitations.

The primary contribution of our work is to demonstrate the feasibility and benefits of

using sensors and locationing systems to automatically annotate video frames with the iden-

tities of objects. Our work has resulted in a number of novel techniques that are specifically

designed to address the above practical hurdles.

The mismatch in range and coverage of sensors is handled using a combination of

extrapolation and filtering. In particular, false positives are eliminated using elementary

optics and filtering techniques, while false negatives caused by a visible object that moves

out of radio range are handled using path extrapolation. To address the issue of mobile

objects as well as a moving camera, we draw upon the regression techniques and Kalman


filter techniques to determine the path of a mobile object and its location. To address the

issues of resource-constrained sensors and limited update rate of positioning systems, we

employ interpolation techniques to determine if an object is within range even if it did not

respond to a query or if the positioning system does not provide position reading when the

frame was captured. Finally, buffering and filtering are used to handle some, but not all, of

the inaccuracies of positioning systems.

5.2 System Model

In this section, we present the key assumptions made in SEVA. SEVA assumes a world

rich in sensors—we believe that, in the future, sensors will be pervasive, and most objects

will be equipped with one or more sensors. In general, sensors on objects will be heteroge-

neous and will be based on a mix of technologies such as RFID, Bluetooth, Zigbee, 802.11,

UWB, and future technologies.

We assume that all sensors report their identities as well as their locations when queried.

For stationary objects such as a building or a street sign, the precise location can be hard-

coded at sensor configuration time. To handle mobile objects as well as those that do not

hard-code their locations, we assume the presence of a positioning system. Currently,

we consider two types of positioning systems: GPS and an ultrasound system named

Cricket [81]. GPS is an outdoor positioning system that relies on satellites, and Cricket

is an indoor system based on ultra-sound beacons.

We also assume that the recording device incorporates four key elements: (i) a video

camera, (ii) a digital compass, (iii) a locationing system, and (iv) a wireless radio. The

camera is simply a digital recording device that captures video frames and the associated

audio. We assume that the parameters of the lens used in the camera are precisely known.

This is a reasonable assumption since these parameters are published or advertised for most

models of digital cameras and camcorders. The digital compass is used to determine the

direction where the camera is pointing at any instant; we use a 3D digital compass that


precisely provides both the orientation and the rotation of the camera. The camera is also

assumed to be equipped with GPS and Cricket so that it can determine its coordinates both

indoors and outdoors. Together, the positioning device and the 3D Compass, in conjunction

with the lens parameters, are used to determine which part of the scene can be seen by the

camera. This automatic computation of the visual range of the camera is used to determine

which objects are in view and which ones are false positives. Finally, the wireless radio is

used to query objects for their identities and locations.

In addition to recording video, the SEVA recorder is assumed to log (i) the orientation

and rotation of the camera when the compass provides a new reading, (ii) the GPS and/or

Cricket coordinates of the camera when the positioning systems provide a new reading,

(iii) a time stamp for each frame, and (iv) the identities and the locations of each queried

object and the time when the response was received.

Assuming such an environment, we present the architecture, design and implementation

of our sensor-enhanced video annotation (SEVA) application in the following sections.

5.3 System Architecture and Design

SEVA captures two streams, one of sensor data and one of video, and fuses them together in a series of stages. Each stage requires careful filtering and melding of object location,

object identification, camera positioning, and lens parameters. SEVA is capable of feed-

ing this annotated stream of video into a database for offline querying or to a streaming

query system. This process is broken into six key stages: video recording, pervasive location/identification, correlation, extrapolation and prediction, filtering and elimination, and

finally database querying. Next, we describe these stages in detail.

5.3.1 Video Recording

SEVA provides a video recording module that receives video input and camera parame-

ters from any video source. The source must provide frames at a constant and known frame


rate, or it must time stamp each frame. This allows later stages to synchronize location

information with individual frames. The camera must also supply a set of lens parameters

to the recording module: the sensor size and the lens focal length. For lenses with fixed fo-

cal lengths—so called prime lenses—the focal length will not change from frame to frame.

However, SEVA is also capable of handling zoom lenses with variable focal lengths.

5.3.2 Pervasive Locationing/Identification

SEVA collects information about the location and identity of proximate objects. This

depends on a pervasive infrastructure that responds to broadcast messages from SEVA

through a wireless network. Any objects within wireless range respond with information

about their identity, including properties of the object.

Such infrastructures have been proposed for a broad array of systems [37, 48, 51, 77]

and future systems may use a variety of technologies and standards. SEVA is designed to be independent of the exact technological implementation, so here we describe only an abstract set of properties that SEVA depends on.

The pervasive locationing and identification system shown in Figure 5.1 produces the sensor

stream used by later stages of SEVA. The system is organized as a set of modular layers:

locationing, network, privacy, querying, and location mapping:

Figure 5.1. Pervasive Locationing/Identification System.


The locationing layer provides location information to the objects as well as the camera.

The locationing system can be active, passive, or static. Active systems, such as active

ultrasound, beacon to the infrastructure, which responds with a location. Passive systems,

such as GPS, can compute locations with no transmission and only passive observations of

radio signals. Static systems use a programmed location. Active and passive systems are

best for objects that move, such as people and automobiles, whereas static systems are only

appropriate for immobile objects such as buildings and landmarks. The accuracy of these

systems is critical to SEVA’s efficacy.

The network layer provides communication between the camera and objects. As long as

the interface supports broadcasting, sending, and receiving, the particular technology used

(WiFi, Bluetooth, Zigbee, RFID) is immaterial. The range of the communication should

be sufficient to capture most objects within camera range; however, too great a range

will affect the scalability of the system. The limited range does mean that large, distant

objects such as mountains will not be captured by the identification system—future SEVA

mechanisms will support this feature through GIS information.

A privacy layer ensures that objects can control their own visibility. While a complete

implementation of such a system is beyond the scope of this thesis, the privacy layer should

permit people to provide varying levels of information. For instance, a person will provide

her name to her friend’s camera, whereas she will only provide meta-information such as

“a person” to an untrusted camera.

The querying layer manages interactions between the camera and the objects. The

camera broadcasts query messages to objects, which respond with identifying and location

information, as shown in Figure 5.2.

The location mapping layer maps different object locations to a frame of reference relative to the camera. Therefore, SEVA can still compute visibility even when different objects use different locationing systems, enabling interoperability across locationing systems.


Figure 5.2. Query and Response Model.
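The query-response exchange in Figure 5.2 can be sketched as a simple broadcast protocol. The snippet below is an illustrative sketch, not SEVA's actual wire format; the port number and message fields are assumptions.

```python
import json
import socket

QUERY_PORT = 9999  # hypothetical port for SEVA identity queries


def broadcast_query(sock: socket.socket, frame_index: int) -> None:
    """Requestor: broadcast a query asking nearby objects to identify themselves."""
    msg = json.dumps({"type": "query", "frame": frame_index}).encode()
    sock.sendto(msg, ("<broadcast>", QUERY_PORT))


def build_response(object_id: str, location: tuple) -> bytes:
    """Responder: reply with identity and current (x, y, z) location."""
    x, y, z = location
    return json.dumps({"type": "response", "id": object_id,
                       "location": [x, y, z]}).encode()


# An object within wireless range would answer a broadcast query with:
reply = json.loads(build_response("person-42", (3.0, 1.5, 0.0)))
```

In a real deployment the socket would be configured with `SO_BROADCAST`, and the recorder would collect whatever responses arrive before the next query round.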

5.3.3 Stream Correlation

The sensor stream needs to be time synchronized with the video stream in order to

correlate the location information in the former with specific frames in the latter. Unfor-

tunately, transmission, contention, and processing delays cause location information to be

desynchronized with the video.

Depending on whether sensors are active or passive, correlation can be done in two

ways. A straightforward implementation assumes a synchronized clock present at each

object—SEVA uses GPS receivers, cellular phone references, or NTP-based time sources.

If the sensor does not have a clock (e.g., RFID) or lacks resources to run a synchronization

protocol, then instead of a time stamp, it provides an estimate of the time from query to

response. This includes MAC layer delays (only meaningful for responses from nearby objects, and not applicable to the camera location) and internal processing. The recorder subtracts

this delay from the receipt time of the response and assigns the corrected time stamp to

the sensory information (propagation delays are assumed to be negligible). By perform-

ing this correlation, SEVA associates each query response and each camera location to the

appropriate frame.
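For clockless sensors, the correction described above amounts to simple arithmetic: subtract the sensor-reported query-to-response delay from the receipt time, then map the corrected timestamp to a frame index. A minimal sketch, assuming a hypothetical constant-rate 30 frames/s source:

```python
FRAME_RATE = 30.0  # frames per second (assumed constant-rate video source)


def corrected_timestamp(receipt_time: float, reported_delay: float) -> float:
    """Assign a corrected timestamp to a sensor response; propagation
    delay is assumed negligible, as in SEVA."""
    return receipt_time - reported_delay


def frame_for_time(t: float, recording_start: float) -> int:
    """Map a corrected timestamp to the index of the frame being
    captured at that instant."""
    return int((t - recording_start) * FRAME_RATE)


# A response received 10.070s into recording, with 50ms of MAC and
# processing delay, belongs to the frame captured around t = 10.020s.
t = corrected_timestamp(10.070, 0.050)
frame = frame_for_time(t, recording_start=0.0)
```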


5.3.4 Extrapolation and Prediction

Some per-object, per-frame location information will be missing from the correlated

sensor stream. This is due to three factors. First, sensors duty-cycle to maximize their

battery lifetime and will respond to queries only when awake. Broadcast requests will be

sent out every frame duration (e.g., every 33ms for 30 frames/s video) while sensors may

sleep for tens or hundreds of milliseconds between two wake-ups. Second, it is unlikely

that the network layer can scale its MAC protocol to the number of awake objects (due to

the possibility of MAC layer collisions). In that case the individual objects must randomly

ignore broadcast requests. Finally, the update rate of the positioning systems is slower (e.g.,

at most 10 positions/s) than the typical video rate (e.g., 25-30 frames/s). Therefore, only a

subset of frames contains the camera location.

SEVA explicitly deals with all of these scenarios by assuming that (i) each query will obtain responses from only a subset of the objects within radio range, and (ii) only a subset of the frames contain the camera location. SEVA employs post-processing techniques to account for missing responses and missing camera locations. Since we use the same post-processing techniques for both objects and the camera, in the following we present our techniques only for objects; it is straightforward to extend them to the camera. Depending on whether the objects are stationary or mobile, such interpolation is done as follows.

Rather than considering locations relative to the camera, SEVA considers the absolute locations (relative to the world coordinate frame) of both the camera and the objects. This simplifies the interpolation procedure because the locations logged in different frames use the same reference coordinate frame.

Static objects: If the objects are static, extracting missing information is straightforward: we simply copy the reported location of the object to intermediate frames. In particular, if the object responds to queries at times t_1 and t_2 and reports the same location for both queries, this location is tagged for all frames captured between times [t_1, t_2].


Figure 5.3. Deriving an object's path using curve fitting.

Mobile object: Next we consider a mobile object; determining missing location information in this case requires a motion model. SEVA uses two different tracking algorithms, regression [15] and Kalman filtering/smoothing [79, 94], to determine the object's location at any time instant. In the following, we present the details of these two algorithms. Assume that the object has responded to n queries, and suppose that the reported locations are (x_1, y_1, z_1), (x_2, y_2, z_2), ..., (x_n, y_n, z_n) at times t_1, t_2, ..., t_n.

Regression techniques: SEVA uses regression techniques [15] to derive a smooth curve through the reported coordinates, which is then assumed to be the path taken by the mobile object. Our regression technique determines the path (trajectory) of the object as a function of time; consequently, the location of the object at any instant can then be easily determined. If n = 2, only two locations are known, and this technique reduces to a straight line between the two reported locations. When n > 2, regression attempts to fit a curve through the reported points. Since the fit is not exact, the curve that yields the least error can be chosen. See Figure 5.3 for an example.

Our regression technique systematically tries n - 1 different curves for the best fit: linear, a 2nd-degree polynomial, 3rd-degree, and so on; the polynomial can have a degree of up to n - 1 for n known locations. The coefficients of each polynomial function are then determined using the least squares method [15]. Finally, a coefficient of determination is computed, which quantifies the goodness of the fit. The polynomial with the highest coefficient of determination is chosen; if all polynomials report determination coefficients


less than a threshold, then the path of the object is too erratic to be approximated by a

smooth curve. In this case, we simply assume that the object moves in a straight line

between two successive reported locations (i.e., approximate the path as a sequence of

linear segments).

Since the object reports X-axis coordinates x_1, x_2, ..., x_n at times t_1, t_2, ..., t_n, respectively, the regression analysis yields a k-degree polynomial, 1 <= k <= n - 1, that represents its location along the X-axis as a function of time:

X(t) = a_0 + a_1 t + a_2 t^2 + ... + a_k t^k        (5.1)

where a_0, a_1, ..., a_k denote the coefficients as determined by the least squares method. Similarly, the location along the Y and Z axes as a function of time is obtained:

Y(t) = b_0 + b_1 t + b_2 t^2 + ... + b_k t^k        (5.2)

Z(t) = c_0 + c_1 t + c_2 t^2 + ... + c_k t^k        (5.3)

Together, the functions X(t), Y(t), and Z(t) enable us to determine the X, Y, and Z coordinates of the object for any time instant t in [t_1, t_n]. Thus, the missing location information can be determined for every intermediate frame.
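The per-axis fitting procedure above can be sketched with NumPy's least-squares polynomial routines. This is an illustrative reconstruction, not SEVA's actual code; the R^2 threshold value is an assumption.

```python
import numpy as np


def best_fit_polynomial(times, coords, r2_threshold=0.9):
    """Try polynomials of degree 1..n-1 through the reported coordinates
    along one axis and return the coefficients with the highest coefficient
    of determination (R^2), or None if the path is too erratic and should
    fall back to piecewise-linear segments."""
    t = np.asarray(times, dtype=float)
    x = np.asarray(coords, dtype=float)
    best_poly, best_r2 = None, -np.inf
    ss_tot = np.sum((x - x.mean()) ** 2)
    for degree in range(1, len(t)):            # degrees 1 .. n-1
        coeffs = np.polyfit(t, x, degree)      # least squares fit
        residuals = x - np.polyval(coeffs, t)
        r2 = 1.0 - np.sum(residuals ** 2) / ss_tot if ss_tot > 0 else 1.0
        if r2 > best_r2:
            best_poly, best_r2 = coeffs, r2
    return best_poly if best_r2 >= r2_threshold else None


# Interpolate the X coordinate of an object that reported four positions:
poly = best_fit_polynomial([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 4.0, 9.0])
x_at_1_5 = float(np.polyval(poly, 1.5))  # location between samples
```

The same function is applied independently to the Y and Z coordinate series to obtain the full trajectory.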

Kalman filter/smoother: SEVA exploits an extended Kalman filter (EKF) to track the object's movement. This EKF uses a state vector with 6 components, 3 position components (x, y, z) and 3 velocity components (v_x, v_y, v_z), and is inspired by the work of Smith et al. [81]. In particular, after getting a position sample, the EKF knows its state and how

confident it is in that state (a covariance matrix). Therefore, the EKF can use its internal

state to predict where the object might be in the future. When the next position sample

arrives, the EKF then corrects its internal state according to the difference between the pre-

dicted position and the actual reported position. Finally, the EKF uses the corrected state


vector as the location and velocity estimate at the time instant of the new sample. In summary, the EKF estimates the object's state (position and velocity) using a form of feedback control: the filter estimates the object's state at some time and then obtains feedback in the form of noisy position samples. Next, we describe the prediction and correction steps of the EKF in more detail.

In the prediction step, suppose the predicted state at the (j-1)th position sample is S^-_{j-1}, the corrected state is S^+_{j-1}, and the reported position is M_{j-1} = (x_{j-1}, y_{j-1}, z_{j-1}). Assuming the object moves at a constant velocity between position samples, the predicted state S^- at time Δt after the (j-1)th sample is given by S^- = f(S^+_{j-1}) as follows:

x^- = x^+_{j-1} + v^+_x · Δt
y^- = y^+_{j-1} + v^+_y · Δt
z^- = z^+_{j-1} + v^+_z · Δt
v^-_x = v^+_x
v^-_y = v^+_y
v^-_z = v^+_z        (5.4)

in which f is a non-linear function, we omit the subscript j-1 of (v^+_x, v^+_y, v^+_z) for clarity, and (x^-, y^-, z^-) is the predicted position.

Now, assume P is the 6 × 6 covariance matrix for the state vector. Then, the covariance matrix is predicted as:

P^- = A P^+_{j-1} A^T + Q_{j-1}        (5.5)

where P^+_{j-1} is the corrected covariance matrix after getting the (j-1)th position sample, A is the state transition matrix specific to our model, given by the Jacobian of f, and Q_{j-1} reflects how the quality of the state vector degrades over time.

In the correction step, once the jth position sample is available, we can correct the predicted state and covariance matrix accordingly. The EKF computes the difference between the predicted position and the newly reported position sample, and then it incorporates the

reported position into the predictor state vector to reduce the difference and improve the

state estimate. The basic idea is to compute the weighted output between the predicted

state and the newly reported position based on their relative covariances, which reflect our confidence in these values. In the following, we overview the underlying mathematics of the correction step.

Suppose the jth position sample is M_j, and the variance matrix of the new position sample is R_j, given by the object; the predicted state vector and covariance matrix are given by S^-_j and P^-_j, respectively. We define a non-linear measurement function h(S) that computes the expected position of the object given a state vector S, and H is the Jacobian of h. Therefore, the corrected state vector S^+_j and the corrected covariance matrix P^+_j are given by:

K_j = P^-_j H^T_j (H_j P^-_j H^T_j + R_j)^{-1}        (5.6)

S^+_j = S^-_j + K_j [M_j - h(S^-_j)]        (5.7)

P^+_j = (I - K_j H_j) P^-_j        (5.8)

in which K_j is the Kalman gain, which weights the corrected output S^+_j between the predicted state S^-_j and the newly reported position sample M_j according to their relative covariances. If the measurement noise is large, then K_j decreases, and the corrected output S^+_j approaches the predicted state S^-_j. In contrast, if the measurement noise is small, then K_j increases, and the corrected output S^+_j approaches the new measurement M_j. Furthermore, the corrected state estimate S^+_j can be used to backwardly improve the earlier state estimates S^+_{j-1}, S^+_{j-2}, ..., S^+_0. This backward recursion is called Kalman smoothing [79]. A complete picture of the operation of the EKF is presented in Figure 5.4, which is adapted from the figure in the excellent introduction to the Kalman filter by Welch and Bishop [94].


Figure 5.4. The operation of the extended Kalman filter. The time update ("predict") projects the state ahead, S^-_j = f(S^+_{j-1}), and projects the error covariance ahead, P^-_j = A P^+_{j-1} A^T + Q_{j-1}; the measurement update ("correct") computes the Kalman gain K_j, updates the estimate with measurement M_j, and updates the error covariance, starting from initial estimates for S_{j-1} and P_{j-1}.
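The predict-correct loop of Figure 5.4 can be sketched directly. For a constant-velocity model, f and h are effectively linear, so the Jacobians A and H reduce to constant matrices; the process-noise value below is an assumption made for illustration.

```python
import numpy as np


def ekf_predict(S, P, dt, q=0.1):
    """Project the 6-state vector [x, y, z, vx, vy, vz] and its covariance
    ahead by dt under a constant-velocity model (Equations 5.4 and 5.5)."""
    A = np.eye(6)
    A[0, 3] = A[1, 4] = A[2, 5] = dt           # position += velocity * dt
    Q = q * dt * np.eye(6)                     # assumed process noise
    return A @ S, A @ P @ A.T + Q


def ekf_correct(S, P, M, R):
    """Fold a reported position M (3-vector) with measurement covariance R
    into the predicted state (Equations 5.6-5.8)."""
    H = np.hstack([np.eye(3), np.zeros((3, 3))])   # h(S) extracts position
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain (Eq. 5.6)
    S_new = S + K @ (M - H @ S)                    # Eq. 5.7
    P_new = (np.eye(6) - K @ H) @ P                # Eq. 5.8
    return S_new, P_new


# Track an object moving at 1 m/s along X, sampled every 0.5 s:
S = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0])
P = np.eye(6)
S, P = ekf_predict(S, P, dt=0.5)               # predicted x is 0.5
S, P = ekf_correct(S, P, np.array([0.52, 0.0, 0.0]), 0.01 * np.eye(3))
```

The corrected x estimate lands between the prediction (0.5) and the noisy measurement (0.52), weighted by the Kalman gain as described above.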

In reality, we do not have a position sample for every time instant, and consequently the EKF cannot run its correction step at every time instant. To solve this, we use the predicted state vector as the object's state estimate for time instants without a position sample, and the corrected state vector as the object's state estimate for time instants with position samples.

Discussion: As we have shown above, when doing interpolation, regression techniques do not consider the variance of the measurements, while the extended Kalman filter technique takes the variance into account. Without considering the variance, the interpolation performance of regression techniques relies completely on the quality of the measurements. In contrast, the EKF tries to balance between our confidence in the system model and

the measurements. If the measurement is more reliable (variance is small), then the output

of the EKF is more consistent with our measurements. If the measurement is less reliable

(variance is large), then the output is less consistent with our measurements. By taking

into account the variance of measurements, we expect that EKF will outperform regression

techniques when the variance of measurements is large or not constant.

In our current implementation of the EKF, the non-linear difference function f and the non-linear measurement function h are fixed and given by users. Their quality greatly affects the performance of the EKF in interpolating the object's location. Instead of fixing

these two non-linear functions, users can also use the unscented Kalman filter (UKF) [49] to

dynamically approximate the system dynamics and estimate these two non-linear functions.

By using these techniques, the UKF consistently achieves better accuracy than the EKF, at the cost of higher computational complexity. In the interest of computational efficiency, we use the EKF in this thesis.

Extrapolation: Our regression technique enables us to interpolate the location of an object given its path for an interval [t_1, t_n]. However, this does not yield any location information for frames captured before time t_1 or after time t_n. Extrapolation is useful when an object goes out of the range of the wireless radio but remains in view of the camera (e.g., an object that is steadily backing away from the camera). Once the object leaves the wireless radio range, its presence is no longer detected, yielding false negatives.

The trajectory computed by the regression analysis can be used to extrapolate this information and annotate a small number of frames before t_1 and after t_n. Extrapolation of the path beyond the interval [t_1, t_n] enables us to eliminate some of these false negatives. This extrapolation can be done only for a few frames (e.g., for a few seconds) in order to reduce errors caused by a change in trajectory after the object leaves the wireless range. Currently, our prototype uses a configurable parameter to determine the number of frames for which location information is extrapolated beyond the [t_1, t_n] interval.

As for the Kalman filter techniques, we can simply use the state vectors at t_1 and t_n to extrapolate the location information for frames captured before time t_1 and after time t_n. As discussed previously in this section, our confidence (covariance matrix) in the state vector degrades over time. Similarly, we use a configurable parameter to control the number of frames for which location information is extrapolated beyond the [t_1, t_n] interval.
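Extrapolation with a fitted trajectory amounts to evaluating it slightly outside the sampled interval, clamped by a configurable horizon. A minimal sketch; the horizon value is an assumed stand-in for the prototype's configurable parameter.

```python
import numpy as np

MAX_EXTRAP_SECONDS = 2.0   # configurable extrapolation horizon (assumed value)


def extrapolated_location(poly, t, t1, tn):
    """Evaluate a fitted per-axis trajectory polynomial at time t, allowing
    a bounded excursion beyond the interval [t1, tn] of reported samples.
    Returns None when t is too far outside to trust the extrapolation."""
    if t1 - MAX_EXTRAP_SECONDS <= t <= tn + MAX_EXTRAP_SECONDS:
        return float(np.polyval(poly, t))
    return None   # beyond the horizon: leave the frame unannotated


# An object last heard from at t = 10s, moving linearly with x(t) = 2t:
poly = np.array([2.0, 0.0])            # polynomial coefficients of x(t) = 2t
x = extrapolated_location(poly, 11.0, t1=5.0, tn=10.0)       # within horizon
too_far = extrapolated_location(poly, 15.0, t1=5.0, tn=10.0) # beyond horizon
```

Bounding the horizon captures the trade-off described above: a short excursion recovers frames where the object is still visible, while a long one risks annotating frames after the object has changed course.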


5.3.5 Filtering and Eliminating

After the extrapolation and prediction stage, every video frame has been annotated with the location information of the camera and proximate objects, and SEVA must now determine which objects are within the camera's field of view.

For each frame SEVA constructs a field of view based on an optics model, the camera's focal length, and parameters of the camera's sensor. As shown in Figure 5.5, let f denote the focal length of the lens and let y denote the height of the CMOS sensor of the digital camcorder. This implies that the camcorder has a viewable angle α = 2 tan^{-1}(y / 2f). At a distance d from the lens, the camera can see a view of height h = (d/f) · y. So if the object is within h/2 of the camera's axis, it is considered in view; otherwise it is out of view. In Figure 5.5, object A is in view and object B is out of view. The figure shows a two-dimensional model, and it easily extends to three dimensions. Using this model, combined with the location information, SEVA determines which objects are in the view of the camera.

Figure 5.5. The Basic Optics Model.
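The in-view test reduces to a few lines once the camera pose is known. This sketch assumes the object's location has already been transformed into the camera's coordinate frame (Z along the optical axis), which is the job of the location mapping layer; the sensor and lens values in the example are those of the prototype in Section 5.4.

```python
def is_in_view(obj_cam, focal_length, sensor_h, sensor_w):
    """Return True if an object at (x, y, z) in camera coordinates falls
    inside the field of view; z is the distance along the optical axis."""
    x, y, z = obj_cam
    if z <= 0:
        return False                       # behind the camera
    h = (z / focal_length) * sensor_h      # viewable height at distance z
    w = (z / focal_length) * sensor_w      # viewable width at distance z
    return abs(y) <= h / 2 and abs(x) <= w / 2


# Prototype lens: f = 2.75mm, sensor 2.4mm x 1.8mm. At 10m the camera
# sees a region roughly 8.7m wide by 6.5m high.
in_view = is_in_view((2.0, 1.0, 10.0), 0.00275, 0.0018, 0.0024)
out_of_view = is_in_view((6.0, 1.0, 10.0), 0.00275, 0.0018, 0.0024)
```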

This model does not take obstructions into account and SEVA will believe that some

objects that are hidden by walls are actually visible. One possible solution is to use the

calculated distance with radio power control and a free-space communications model to

estimate whether the object is obstructed. Similarly the object may be out of focus and

therefore not visible. Some cameras have variable apertures and optics that provide the


depth-of-field of the image. This allows us to compute whether objects are in or out of

focus and tag them appropriately.

5.3.6 Query and Retrieval

This module consists of a storage system for annotated video and tools for query and

retrieval. The storage system stores videos and corresponding annotations separately; the

annotations and videos are synchronized and linked by the video’s frame index; the location

information in the annotations is translated into user-readable format (e.g., CS Building,

Room 101). A tool allows users to query and retrieve videos of interest. Queries can

specifywhena video was captured,whereit was captured, andwho is in the video. The

search engine then searches video annotations produced by SEVA and returns the videos’

frame indexes satisfying the query. Finally, the returned frame indexes can be used to

retrieve video clips from storage.
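Annotation storage and search along the when/where/who axes can be sketched with an in-memory index keyed by frame; the field names and sample records below are illustrative, not SEVA's actual schema.

```python
# Each annotation maps a frame index to its context metadata.
annotations = [
    {"frame": 100, "time": "2006-03-01 14:05", "place": "CS Building, Room 101",
     "objects": ["Alice", "whiteboard"]},
    {"frame": 101, "time": "2006-03-01 14:05", "place": "CS Building, Room 101",
     "objects": ["Alice"]},
    {"frame": 250, "time": "2006-03-01 14:12", "place": "CS Building, Lobby",
     "objects": ["Bob"]},
]


def query_frames(who=None, where=None, when=None):
    """Return the frame indexes whose annotations satisfy the query;
    the indexes are then used to retrieve clips from video storage."""
    hits = []
    for a in annotations:
        if who is not None and who not in a["objects"]:
            continue
        if where is not None and where != a["place"]:
            continue
        if when is not None and not a["time"].startswith(when):
            continue
        hits.append(a["frame"])
    return hits


frames = query_frames(who="Alice", where="CS Building, Room 101")
```

In practice the annotations would live in a database indexed on frame, time, place, and object identity, but the query semantics are the same.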

5.4 Implementation

To provide a test platform, we have constructed a prototype system based on a Sony

Motion Eye web-camera connected to a Vaio laptop. The location and identity querying,

correlation, extrapolation and prediction, filtering and elimination, and database storage

software runs on the laptop. SEVA currently uses two 3-D locationing systems for the

camera and objects: GPS and the Cricket Ultrasound locationing system. To obtain the ori-

entation of the camera we augmented the laptop with a Sparton SP3003D Digital Compass

that provides the orientation (heading, pitch, and roll) of the camera’s lens.

Video Recording. The CMOS-based camera provides uncompressed 320x240 video

at 12 frames-per-second. The camera has been set to a fixed focal length of 2.75mm,

and uses a sensor size of 2.4mm by 1.8mm. The video recording module uses an MPEG

encoder (ffmpeg 0.4.8 [20]) to record video.
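Given the fixed focal length and sensor size above, the camera’s angular field of view follows from the pinhole/thin-lens geometry; a small sketch (the function name is ours):

```python
import math

def field_of_view_deg(sensor_dim_mm, focal_length_mm):
    """Angular field of view along one sensor axis for a pinhole camera."""
    return math.degrees(2 * math.atan(sensor_dim_mm / (2 * focal_length_mm)))

h_fov = field_of_view_deg(2.4, 2.75)  # horizontal, roughly 47 degrees
v_fov = field_of_view_deg(1.8, 2.75)  # vertical, roughly 36 degrees
```

These two angles bound the viewable region used later when deciding whether an object lies inside the camera’s field of view.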


Figure 5.6. SEVA recorder laptop equipped with a camera, a 3D digital compass, a Mote with wireless radio and Cricket receiver, a GPS receiver, and 802.11b wireless.

Pervasive Location/Identification. Outdoors, SEVA uses Deluo GPS receivers equipped

with WAAS correction [14], connected to the laptop to locate the camera and the object.

The GPS unit provides latitude, longitude, and altitude, and it provides an accuracy of 5-15

meters [14].

Indoors, SEVA employs an ultrasound locationing system called Cricket [75]. Using a

network of ultrasound sensors built onto sensor boards, Cricket can provide 3-D locations

with an accuracy of a few centimeters. Cricket can be used in two modes: active and

passive. In the current implementation, SEVA uses the active mode as it is more accurate.

In the future SEVA will use the passive mode as it scales to a larger number of objects.

The pervasive locationing and identification system uses two different network layers

to communicate with the objects. Outdoor objects are laptops equipped with WiFi and

indoor objects are Mica2 [46] low-power sensor boards equipped with 900 MHz short-

range radios. The laptop communicates with the objects using a sensor board of the same

type. A simple broadcast-based query protocol is implemented between the Linux-based

recorder and the Mica2 nodes.

Correlation. As GPS provides a globally synchronized clock among GPS receivers, we

use this clock to correlate the location information with specific frames. Since the Cricket

system doesn’t provide such a globally synchronized clock, SEVA simply correlates the

location information with specific frames by subtracting the mean processing and/or MAC


layer delay from the receiving time of sensor data and assigning the corrected time stamp

to the sensory information.
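The correction described above amounts to subtracting the mean delay before mapping a reading onto the frame timeline. A minimal sketch follows; the function and parameter names are ours, not SEVA’s.

```python
def frame_index_for_reading(recv_time_s, mean_delay_s, video_start_s, fps=12):
    """Map a sensor reading's arrival time to the video frame it describes.

    recv_time_s: local receive time; mean_delay_s: measured mean
    processing + MAC-layer delay; video_start_s: recording start time.
    """
    corrected = recv_time_s - mean_delay_s
    return round((corrected - video_start_s) * fps)
```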

Extrapolation and Prediction. As discussed in Section 5.3.4, we use regression analy-

sis to find the mathematical relationship between location and time, and we use the Kalman

filtering/smoothing technique to estimate the object’s location. Because the camera’s 3D

orientation will affect the result of filtering and elimination, we also apply the regression

analysis and the Kalman filtering/smoothing technique to the camera’s 3D orientation.
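As an illustration of the filtering step, a one-dimensional constant-velocity Kalman filter can be sketched as follows. This is a simplified stand-in for SEVA’s implementation: the state is (position, velocity), and the process/measurement noise values are tuning guesses.

```python
def kalman_cv(track, dt, q=1.0, r=4.0):
    """1-D constant-velocity Kalman filter over noisy position readings.

    track: measured positions sampled every dt seconds.
    q: process-noise scale, r: measurement variance (both tuning guesses).
    Returns a filtered (position, velocity) estimate per sample.
    """
    x, v = track[0], 0.0               # state: position, velocity
    p00, p01, p11 = r, 0.0, 1.0        # covariance [[p00, p01], [p01, p11]]
    out = []
    for z in track:
        # predict with constant-velocity dynamics (simplified process noise)
        x = x + v * dt
        p00 = p00 + dt * (2 * p01 + dt * p11) + q * dt
        p01 = p01 + dt * p11
        p11 = p11 + q * dt
        # update with the position measurement z
        s = p00 + r
        k0, k1 = p00 / s, p01 / s      # Kalman gain
        resid = z - x
        x += k0 * resid
        v += k1 * resid
        p00, p01, p11 = (1 - k0) * p00, (1 - k0) * p01, p11 - k1 * p01
        out.append((x, v))
    return out
```

The filtered velocity also supports prediction: the position one frame ahead is simply x + v * dt.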

Filtering and Elimination. In this stage, objects’ coordinates are transformed into a

coordinate system with the camera as the origin. This transformation is straightforward for

the Cricket system since we can easily subtract the camera’s coordinate from the objects’

coordinates. The transformation for a GPS system requires computing the distance between

camera and object, and we use the GPS Drive package for this purpose [36].
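The transformation and the subsequent viewability test can be sketched in two dimensions. This is a simplified illustration: SEVA operates in 3-D and also uses the camera’s pitch and roll, and the field-of-view angle here is an assumed value.

```python
import math

def viewable(obj_xy, cam_xy, heading_deg, h_fov_deg=47.0, max_range=None):
    """2-D check of whether an object lies inside the camera's field of view.

    obj_xy/cam_xy: world coordinates; heading_deg: compass heading of the
    lens (0 deg = +Y axis); h_fov_deg: horizontal field of view.
    """
    # translate into a camera-centred frame
    dx, dy = obj_xy[0] - cam_xy[0], obj_xy[1] - cam_xy[1]
    dist = math.hypot(dx, dy)
    if dist == 0 or (max_range is not None and dist > max_range):
        return False
    # bearing from camera to object, relative to the lens heading
    bearing = math.degrees(math.atan2(dx, dy))
    off_axis = (bearing - heading_deg + 180) % 360 - 180
    return abs(off_axis) <= h_fov_deg / 2
```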

Indexing and Querying. The results of filtering and elimination are inserted into

a MySQL database, while the videos are stored into the laptop’s file system. Before

SEVA adds annotations to the database, the outdoor GPS position (e.g., latitude, longi-

tude, and altitude) is translated into user-readable format (e.g., parking lot 45, UMass) via

geocoder [33], and the indoor Cricket location is translated into user-readable format (e.g.,

CS Building, Room 101) by extracting a user-readable location from the Cricket system.

For each video there is an entry in the database recording its start time and end time; for

every frame in a video there is an entry in the database recording its shooting time and

location; and for every annotation there is an entry in the database containing the object

identity and index of the corresponding video frame.
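The three kinds of database entries described above can be illustrated with a small sketch, using SQLite as a stand-in for MySQL; all table and column names are hypothetical.

```python
import sqlite3

# Illustrative schema mirroring the per-video, per-frame, and per-annotation
# entries described above; names are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE videos(video_id INTEGER PRIMARY KEY, start_time REAL, end_time REAL);
CREATE TABLE frames(video_id INTEGER, frame_idx INTEGER, shot_time REAL, location TEXT);
CREATE TABLE annotations(video_id INTEGER, frame_idx INTEGER, object_id TEXT);
""")
con.execute("INSERT INTO videos VALUES (1, 0.0, 60.0)")
con.executemany("INSERT INTO frames VALUES (1, ?, ?, 'CS Building, Room 101')",
                [(i, i / 12.0) for i in range(720)])
con.execute("INSERT INTO annotations VALUES (1, 100, 'book')")

# a combined where/when/who query returns the matching frame indexes
rows = con.execute("""
SELECT a.frame_idx FROM annotations a JOIN frames f
  ON a.video_id = f.video_id AND a.frame_idx = f.frame_idx
 WHERE f.location = 'CS Building, Room 101' AND a.object_id = 'book'
   AND f.shot_time BETWEEN 0.0 AND 60.0
""").fetchall()
```

The returned frame indexes are then used, as in SEVA, to pull the matching clip out of the stored video.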

We have also implemented a simple GUI retrieval tool for content-aware queries on this

database. This tool supports queries on where the video was captured (e.g., CS Building,

Room 101), when it was captured (e.g., morning of May 23, 2005), and who is present in

the video (e.g., car, book, building) and retrieves the indexes of all annotated frames that


match this query. Finally, these frame indexes are used to retrieve frame sequences from

videos.

5.5 Experimental Evaluation

In evaluating SEVA, we set out to answer the following questions:

• How accurate is SEVA in tagging frames with a moving camera, moving objects, and

with different locationing systems?

• How well does SEVA scale to larger numbers of objects?

• What is the overhead in using SEVA?

To answer these questions we used three different locationing systems: the Cricket

ultrasound system, GPS, and static locationing. We set up the Cricket locationing system in

a 4m x 10m x 3m room with five Cricket receivers mounted on the ceiling that serve as the

reference points for object and camera locationing. The origin of the coordinate system is

one of the corners of the room and the range of x, y and z is [0cm, 400cm], [0cm, 1000cm],

and [0cm, 300cm], respectively. Our GPS experiments were conducted in a large parking

lot with a clear view of the southern horizon. As the altitude did not vary significantly

for object and camera positions, we did not use it in any of our experiments. The camera

records all video at 12 frames/s.

To determine SEVA’s accuracy in tagging frames, we subject the system to five exper-

iments: a) the object and camera are both static, b) the object is moving in a straight line

and the camera is static, c) the camera is moving in different patterns and the objects are

static, d) the object is moving with semi-random trajectories and the camera is static, and

e) both the object and the camera are moving with semi-random trajectories. In these ex-

periments, we place the object in different positions—some inside the view of camera and

some outside the view of camera—and evaluate the error rate of our system when deter-

mining the viewability of objects. We selected the error rate or number of frames in error


Figure 5.7. The layout of static experiments using Cricket: camera at (x=223cm, y=350cm, z=57cm); object trajectories 1 (y=550cm, z=3cm), 2 (y=650cm, z=3cm), and 3 (x=200cm, z=3cm).

as the evaluation criteria. An error occurs when SEVA tags a frame as containing an object

when it doesn’t (false positives), or it tags a frame as not containing an object when it does

(false negatives).

It is important to note that the objects that we are using to evaluate the system are only

a few square centimeters in size. Larger objects, such as people, can absorb inaccuracies

in the positioning information by straddling the line between viewable and

non-viewable. We leave the issue of partially viewable objects as future work.

5.5.1 Static Object, Static Camera

5.5.1.1 Cricket Locationing System

To evaluate SEVA’s performance with static objects and a static camera, we place an

object at a large number of positions along three different trajectories. The setup for this

experiment is shown in Figure 5.7. The camera is set up at (223, 350, 57) with its lens

pointing horizontally along the positive Y axis and having 0° pitch and roll. We place

a single object (simply a Cricket node) at different positions along the three trajectories:

y = 550cm, y = 650cm and x = 200cm. As most of the errors are made very close to

the viewability boundary, we took readings every 2.5cm near the boundary, and every 5cm

when the object was at least 30cm from the boundary.

For each object position we take 100 frames, and for each frame we record the 3D

orientation of the camera and the coordinates of the camera and object. These coordinates


Figure 5.8. The error rate of static experiments using Cricket: (a) Trajectory 1 (y = 550cm), (b) Trajectory 2 (y = 650cm), (c) Trajectory 3 (x = 200cm). Each panel plots the error rate against position along the trajectory, with the viewable boundary marked.

are then fed into the SEVA system and we manually reviewed SEVA’s results to evaluate the

error rate (false positive for non-viewable objects and false negative for viewable objects).

The results of this experiment are shown in Figure 5.8.

As shown in Figure 5.8(a) and 5.8(b), the error rate is less than 20% when the object is

along the boundary, and the error rate quickly drops to single digits when the object is only

2.5cm away from the boundary and to zero when it is only 7.5cm away. One exception

occurs on Trajectory 2, where we get close to a 40% error rate when the object is along the

viewable boundary. We believe that this is caused by interference with the ultrasound

system from a nearby structural pillar.


Figure 5.8(c) shows that the error rate along the viewable boundary for Trajectory 3

is around 50%, and it drops to zero when the object is only 10cm away from

the boundary. The reason for this larger error rate along the viewable boundary is that

the measured location of the camera is 5cm to 7cm lower than its real position, and the

measured location of the object is 2cm to 3cm higher than its real position in most cases.

This type of error may come from the arrangement of the Cricket reference points

and could possibly be corrected by a different arrangement.

5.5.1.2 GPS Locationing System

We conducted a similar experiment with a GPS locationing system. GPS provides

latitudes and longitudes relative to the equator and prime meridian; however, for readability

we translate this coordinate system into (x, y) coordinates with the camera at the origin and

the camera pointing along the Y axis.

As shown in Figure 5.9, we used different positions along three trajectories: y = 10m,

y = 20m, and y = 80m. The positions are separated by a 3m step size starting 30m

from the viewable boundary and ending at the center of the field of view. For each

position, we take 100 pictures, and for each picture we record the 3D orientation of the

camera and the (x, y) coordinates of the camera and object. We then manually verify that

SEVA produces the correct results and record the error rate (false positive for non-viewable

objects and false negative for viewable objects). The results are shown in Figure 5.10.

Our results show that SEVA has more than a 20% error rate when the object is within 15

meters of the boundary; when the distance to the boundary is more than 18 meters the

error rate drops to zero. The low performance is due to the low accuracy of GPS (5-15m).

Figure 5.9. The layout of experiments using GPS: camera at (x=0m, y=0m, z=0m); object trajectories 1 (y=10m), 2 (y=20m), and 3 (y=80m).

5.5.2 Dynamic Experiments

To evaluate SEVA’s extrapolation and prediction mechanisms, we performed three sets

of experiments: (i) mobile object with a stationary camera; (ii) stationary object with a

mobile camera; and (iii) both object and camera mobile. The video clips were reviewed

manually as before to determine which frames had erroneous annotations.

5.5.2.1 Static Camera, Dynamic Objects

When the object is moving and the camera is static the critical factor affecting SEVA’s

accuracy is the speed of the object relative to how often SEVA updates the object location.

If the object speed is very high relative to the location update rate, SEVA will mis-extrapolate the

object position and make mistakes in tagging objects as in or out of the field of view.
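The extrapolation step can be illustrated with a least-squares line fit over recent beacons, evaluated at a frame’s timestamp. This is a first-order sketch with names of our choosing; the regression used by SEVA can also capture higher-order motion.

```python
def extrapolate(samples, t):
    """Least-squares linear fit over recent (time, position) beacons,
    evaluated at frame time t."""
    n = len(samples)
    ts = [s[0] for s in samples]
    xs = [s[1] for s in samples]
    mt, mx = sum(ts) / n, sum(xs) / n
    denom = sum((a - mt) ** 2 for a in ts)
    if denom == 0:
        return xs[-1]                  # single or duplicated timestamps
    slope = sum((a - mt) * (b - mx) for a, b in zip(ts, xs)) / denom
    return mx + slope * (t - mt)
```

With beacons arriving every 250ms but frames every 83ms, most frames fall between beacons and must be filled in this way.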

To explore this point we constructed two experiments: a repeatable experiment using

a straight-line trajectory, and a non-repeatable experiment using a semi-random path.

Repeatable Experiment: To construct a repeatable experiment we use an object moving at

different speeds and updating its position at different intervals. In order to make the ex-

periments as repeatable as possible we designed a test apparatus. We hung a fishing line

across the camera’s field of view at an angle and attached the object to a pulley (see Figure

5.11). When we release the object it accelerates down the line and then stops at the bottom.

We can change the acceleration of the object by changing the gradient of fishing line. We

accelerated the object across the camera’s field of view using three different slopes: 7.6°,

10.93°, and 19.47°; the characteristics of these different slopes are shown in Figure 5.12.

The object updates its position using the Cricket ultrasound system and it can reliably

update its position at most once every 250ms. In this experiment we used three different


Figure 5.10. The error rate of static experiments using GPS: (a) y = 10m, (b) y = 20m, (c) y = 80m. Each panel plots the error rate against x position along the trajectory, with the viewable boundary marked.

beacon intervals: 250ms, 500ms, and 1000ms. In our current setup, once we choose a

specific beacon interval, all objects use the same interval. This is appropriate when all

objects move at similar speeds. In an environment with a mix of fast-moving and

slow-moving objects, it is unsuitable to use the same beacon interval for all

objects because: (i) if the beacon interval is too large, fast-moving objects cannot

get their positions updated in time, so the extrapolation technique may miss

turning points in an object’s trajectory; and (ii) if the beacon interval is too small,

slow-moving objects wake up unnecessarily often to update their position, wasting

energy. To address this problem, we can vary the beacon interval according

to the object moving speed.
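The speed-dependent policy suggested above can be sketched as follows; the interval bounds and the per-update travel budget are illustrative assumptions, not values taken from SEVA.

```python
def beacon_interval_ms(speed_cm_s, min_ms=250, max_ms=1000, step_cm=25):
    """Pick a beacon interval so a tag travels at most ~step_cm between
    position updates, clamped to the radio's practical limits."""
    if speed_cm_s <= 0:
        return max_ms                        # stationary: beacon rarely
    ideal = 1000.0 * step_cm / speed_cm_s    # ms to cover step_cm
    return int(min(max_ms, max(min_ms, ideal)))
```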


Figure 5.11. Mobile object on a pulley, sliding across the camera’s field of view.

                               Slope 1      Slope 2      Slope 3
Gradient                       7.6°         10.93°       19.47°
Length                         303cm        350cm        360cm
AVG. Speed                     86.57cm/s    145.83cm/s   205.71cm/s
Time                           3.5s         2.4s         1.75s
Length in Viewable Area        150cm        228cm        240cm
AVG. Speed in Viewable Area    112.06cm/s   181.90cm/s   271.77cm/s
Time in Viewable Area          1.34s        1.25s        0.88s

Figure 5.12. Characteristics of different slopes.

For each slope and each beacon interval we encoded ten videos and used SEVA to

determine the object’s viewability in each frame. We manually compared SEVA’s results

with the original video on a frame-by-frame basis and evaluated which frame tags were in

error.

As before, incorrect decisions are made only when the object is close to the viewable

boundary. In these experiments that situation occurs either when the object enters or exits

the viewable area. The large number of frames in these experiments would make the error

rate appear very small, so instead of presenting an error rate, we present the absolute num-

ber of frames that are in error. When later querying the video for sequences including a

particular object, this metric determines how many extra or missing frames will be included

or excluded from the sequence. The result is taken over the average of all ten experiments.

We compare two systems: a full version of SEVA and a version of SEVA that does not

perform any extrapolation. The results are shown in Figure 5.13.


Figure 5.13. Mean frames in error for a mobile object and static camera: (a) Slope 1, (b) Slope 2, (c) Slope 3. Each panel plots the average number of error frames against the beacon interval (300-1000ms), on entering and leaving the field of view, without extrapolation, with curve fitting, and with the Kalman filter.

The results demonstrate that without extrapolation the average number of frames in

error increases from 1.8 to 7.0 as the beacon interval increases from 250ms to 1000ms.

The slower beacon interval forces SEVA to use old measurements of the object’s position,

which it cannot correct without extrapolation. The performance of the regression techniques

and the Kalman filter technique is quite similar: the average number of frames in error

for both methods is less than 1 and is fairly constant across beacon intervals.

The worst case occurs when the object is exiting the viewable area under the highest

acceleration and the beacon interval is the slowest. In this scenario the object leaves the

viewable area at 375cm/sec, reaches the end of the wire, and suddenly stops. This rapid

deceleration causes the extrapolation method to fail and SEVA misplaces the object at

intervening frame intervals. Given a faster beacon interval it is more likely that a beacon will

occur after the object leaves the viewable area, but before the object stops. This means that

two beacons straddle the exit from the viewable area and SEVA extrapolates the position

correctly.

Figure 5.14. Remote control toy car with a Cricket node on the top.

Non-Repeatable Experiment: In the repeatable experiment, the object moves in a

straight line. Although this stresses SEVA’s extrapolation system, it does not require higher-

order regression analysis to determine the linear path. To test a more complex path we

recorded a new object: a remote control toy car with a Cricket node attached to the top. We

randomly moved the car around the room for 5 minutes while recording the car with SEVA.

The car moved in and out of the camera’s field of view many times during the experiment and

we evaluated the performance in the same manner as before. Our results show that: (i) with

extrapolation using regression techniques, the mean number of frames in error is around

2; and (ii) with extrapolation using the Kalman filter, the mean number of frames in error is

around 1.9. This is only slightly larger than for an object moving in a straight line.

5.5.2.2 Dynamic Camera, Static Object

If the camera is moving, but the objects are static, SEVA must interpolate the position

as well as the orientation of the camera. To test this function with a variety of movement

patterns, we placed 4-5 objects at equal spacing, and moved the camera in

three patterns as shown in Figure 5.16: (a) straight line, the camera moves in a straight

line without changing the orientation of the lens; (b) rotation, the camera moves and the

lens’ orientation changes; (c) z-line, the camera moves in a z-shaped line without changing

its lens’ orientation.

         Straight Line   Rotation    z-Line
Slow     50cm/sec        25°/sec     50cm/sec
Fast     80cm/sec        60°/sec     80cm/sec

Figure 5.15. Characteristics of different speeds.

before. For each movement pattern we ran experiments under two different speeds labeled

slow and fast. The characteristics of these speeds are shown in Figure 5.15. In all cases we

used the full SEVA system with a location beacon interval of 250ms. Again we only report

the number of frames that are in error. The results are shown in Figure 5.17.

Figure 5.16. Path of a mobile camera relative to the objects: (a) straight line, (b) rotation, (c) z-line.

The results show that the regression techniques and the Kalman filter perform similarly. For

the straight line the average number of error frames, which is less than 1.0, is comparable

to when the object is moving and the camera is stationary. When the camera rotates

the average number of error frames is less than 2. We have traced these errors to variance in

the digital compass’s readings when the heading changes and to the latency of the digital

compass (up to 100ms). When the camera moves in a z-line the average number of error

frames is around 1.2. Although we don’t change the lens’ heading, SEVA’s interpolation

fails when the camera makes a sharp turn, slightly increasing the average number of error frames.


                   Straight Line   Rotation   z-Line
Curve     Slow     0.80            1.78       1.20
Fitting   Fast     0.70            1.67       1.30
Kalman    Slow     0.77            1.74       1.19
Filter    Fast     0.69            1.65       1.27

Figure 5.17. Mean frames in error for a mobile camera.

5.5.2.3 Dynamic Camera, Dynamic Object

When both camera and object are dynamic, SEVA must extrapolate the position of the

object, and the position as well as the orientation of the camera. To test SEVA’s performance

under this scenario, we recorded the remote control car with a Cricket node attached to the

top (see Figure 5.14) using a mobile camera. We randomly moved the car around the room

for 10 minutes, and at the same time, we also randomly moved the camera around the room

in the combinations of the moving patterns as shown in Figure 5.16. The car was in and out

of the camera’s field of view many times during the experiment and we evaluated SEVA’s

performance using the frame error metric as before. Our results show that: (i) the regression

techniques and the Kalman filter have similar performance, and (ii) with extrapolation,

the mean number of frames in error is around 3. This larger error is due to the movement

of both camera and object. When either the camera or the object is static, the position of only

one of them is extrapolated from reported values; the other always uses the reported position.

When both the camera and the object are mobile, however, the positions of both the camera

and the object may be extrapolated from reported values for the same frame. These extrapolated

positions have larger errors than the reported positions, and as a result the mean number of

frames in error is larger than when either the camera or the object is static.

5.5.3 Scalability

As discussed in Section 5.3, the camera uses periodic broadcast messages to query for

nearby objects. If there are a large number of objects within radio range, the radio’s MAC

layer may not scale to handle a large number of simultaneous responses. To test the

scalability of our current prototype we video recorded a large number of objects programmed

with static locations.

To create a larger number of objects we used low-bit rate wireless sensor nodes called

Motes [46], specifically Mica2 and Mica2dots. These nodes are representative of future

object tags due to their small size, low computational power and low energy consumption.

The Mica2 radio only supports a raw transmission rate of 19.2 Kbps, and the effective

throughput is 12.364 Kbps, or 42.93 packets/sec.

The scalability of the system is determined by the frequency at which the camera sends

queries relative to the number of objects and the rate of messages the radio can handle.

The maximum packet rate is fixed so we constructed an experiment with a variable number

of objects and query frequencies. We measure the response rate: the ratio of

responses the camera received (counting only responses that were at most one beacon

behind) to the number of objects. The results are shown in Figure 5.18.
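A back-of-the-envelope capacity model, our own simplification that ignores MAC collisions and retries, captures the expected trend: each beacon interval can carry at most 42.93 / (beacons per second) responses, shared among all responders.

```python
def expected_response_rate(n_objects, beacons_per_s, pkt_per_s=42.93):
    """Idealized response-rate model for the Mica2 radio: per beacon
    interval there is room for pkt_per_s / beacons_per_s response
    packets, shared among n_objects responders."""
    capacity = pkt_per_s / beacons_per_s
    return min(1.0, capacity / n_objects)
```

At 4 beacons/sec and 15 responders this predicts a response rate of about 0.72, roughly in line with the measured 72.3%, though the model has no claim to precision.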

Figure 5.18. Response rate of Motes, plotted against the number of Motes (5-25) for query rates of 1, 2, and 4 beacons/sec.

The results show that the prototype achieves a 100% response rate for up to 4 objects

under all beacon frequencies. It achieves more than a 90% response rate for up to 10

responders under all beacon frequencies. However, the response rate for 4 beacons/sec drops

quickly and almost linearly with more than 10 responders: it is 72.3% with 15 responders

and 43.4% with 25 responders. The response rates for 1 beacon/sec and 2 beacons/sec

are almost the same with up to 20 responders, after which the response rate for 2 beacons/sec

drops more quickly than that for 1 beacon/sec.

A combination of these results with those of the dynamic object experiments indicates

that the current prototype should scale well to 10 fast moving objects. If the environment

includes a mix of fast moving objects and slow moving objects, further scalability can be

achieved if slow moving objects respond less frequently to beacons.

5.5.4 Computational Requirements

We measured the computational requirements of each of SEVA’s stages. The correlation

and extrapolation modules impose a small computational overhead on the laptop

(less than 100µs per object); the filtering module imposes a 200µs overhead per

object. Unlike GPS systems, the Cricket sensor gives distances to beacons instead of

3D coordinates, so the laptop must solve a set of linear equations to compute the 3D

coordinates. This computation costs around 150µs per object. These results show that

our system incurs a small overhead and can run online on relatively inexpensive hardware.

5.5.5 Summary and Discussions

Our experiments show that: (i) using Cricket, SEVA has a zero error rate for static

objects when the object is at least 7.5-10cm away from the boundary; (ii) using GPS, SEVA

has more than a 20% error rate when the object is within 15m of the boundary, and the

error rate drops to zero when the distance to the boundary is more than 18m; (iii) for moving

objects or a moving camera SEVA only misses objects leaving or entering the viewable area

by 1-2 frames (80ms/frame); and (iv) the SEVA prototype scales well to 10 fast moving objects

using current sensor technology.

SEVA’s performance and scalability are largely affected by the limits of current sensor

technology: (i) the larger error rate when using GPS is due to the low accuracy (5-15m) of

current GPS technology; (ii) the low scalability (no more than 10 fast moving objects) is


due to the low bandwidth (19.2 Kbps) of Motes. However, we expect these problems will

be solved in the near future as sensor technology evolves: (i) SEVA using GPS or GPS-

like technology (Galileo) will have performance comparable to SEVA

using Cricket in a few years, as GPS is expected to reach 1-5m accuracy by the year 2013

with further improvements after 2016 [35], and Galileo is expected to reach less than 10cm

accuracy (compared to 3-5cm accuracy in Cricket) by 2008 [29]; and (ii) SEVA can scale to

ten times more fast moving objects, as the newest MicaZ mote has 250 Kbps bandwidth [10].

5.6 Concluding Remarks

In this chapter, we consider the problem of automatic sensor annotation of multime-

dia. Solving this problem helps organize multimedia content in context-aware

ways. The context metadata of multimedia can be used to search for and locate content of

interest, and such metadata is crucial for the success of content-based media retrieval. We iden-

tify the problems with existing solutions. To mitigate the limitations of these solutions,

we propose SEVA—a sensor-enhanced video annotation system. SEVA demonstrates the

feasibility and benefits of using sensors and locationing systems to automatically annotate

video frames with the identities of objects.


CHAPTER 6

RFID LOCALIZATION FOR PERVASIVE MULTIMEDIA

6.1 Introduction

In the previous chapter, we presented SEVA—a sensor-enhanced video annotation system.

SEVA exploits the pervasive locationing and identification services provided by GPS re-

ceivers [3] and Cricket ultrasound sensors [75] to automatically annotate video frames with

the identities and locations of objects. Our evaluations in Section 5.5 show that SEVA can

annotate video frames with high accuracy. However, GPS receivers and Cricket sensors are

battery-powered, and they cost tens of dollars. Their operational lifetimes last for only

a few days, and they demand considerable battery-maintenance effort. Consequently,

these devices are only suited to high-unit-value objects such as cars and people, and they

are impractical for identifying and locating low-unit-value objects such as books and CDs,

which exist in large volumes. For these kinds of objects, passive RFID (radio frequency iden-

tification) tags [21] are an ideal choice.

RFID tags are designed to replace bar-codes [89]. Each tag contains a numeric code

that uniquely identifies the object and can be queried by a wireless reader. It is likely that

in the near future many personal objects (e.g., books, clothing, food items, furniture) will

be equipped with self-identifying RFID tags. The ubiquity of RFID tags and the pervasive nature of multimedia recording devices enable novel pervasive multimedia applications

with automatic, inexpensive, and ubiquitous identification and location abilities. By equip-

ping cameras with RFID readers, it is possible to record images as well as the identities and

locations of all RFID-tagged objects contained within each image. The captured video can

then be queried in real-time to display the location of a particular object. The ability to pin-


point objects by their RFID identities within video streams enables many new applications.

For instance, users can use them to locate a misplaced book on a bookshelf. Robots can

use such devices to conduct real-time identification and search operations. Vision-based

applications can use them to quickly learn the structure or organization of a space. Inven-

tory tracking applications can proactively generate missing object alerts upon detecting the

absence of an object.

While the inexpensive nature of RFID tags eases large-scale deployment issues, their

passive nature raises a number of hurdles. A key limitation is that passive RFID tags are

self-identifying but not self-locating (i.e., upon being queried, a tag can report its identity

but not its location). Consequently, if multiple objects are present in a captured image,

it is not possible to distinguish between these objects or pinpoint their individual loca-

tions. Some of the above scenarios (e.g., pinpointing a misplaced book on a bookshelf)

require location information in addition to object identities. While numerous locationing

technologies such as GPS and ultrasound [75, 90, 91] are available, it is not possible to

equip passive RFID tags with these capabilities due to reasons of cost, form-factor and

limited battery life. Instead, we require a locationing technology that does not depend on

modifications to tags, is easily maintained, and scales to hundreds or thousands of tagged

objects.

In this chapter, we examine the problem of building an RFID-based locationing system. We then present our approach, Ferret, which overcomes these limitations. Our approach provides pervasive locationing and identification services to objects attached with passive RFID tags.

The rest of this chapter is structured as follows. In Section 6.2, we provide the background and identify limitations of existing RFID locationing techniques. Section 6.3 presents the problem formulation and a high-level design of Ferret, while Section 6.4 presents the details of our RFID locationing system. Section 6.5 presents our implementation and Section 6.6 our experimental results. Finally, Section 6.7 presents our conclusions.


6.2 Background

Researchers have developed RFID-based indoor locationing systems [45, 70] using active, battery-powered RFID tags. In SpotON [45], Hightower et al. use radio signal attenuation to estimate a tag's distance to base stations, and triangulate the position of the tagged object from distance measurements to several base stations. LANDMARC [70] deploys multiple fixed RFID readers and reference tags as infrastructure, and measures the tracked tag's nearness to reference tags by the similarity of their signals received at multiple readers. LANDMARC uses the weighted sum (with weights proportional to nearness) of the positions of the reference tags to determine the 2D position of the tag being tracked.

All of the above work [45, 70] uses battery-powered RFID tags to identify and locate objects. These tags are expensive (at least tens of dollars per tag) and have limited lifetimes (from several days to several years). These limitations have prevented them from scaling to applications dealing with hundreds or thousands of objects. In contrast, passive RFID tags are inexpensive (less than a dollar per tag, and the price is continuously falling) and do not require a battery. These features make passive RFID technology ideal for such applications.

Fishkin et al. propose a technique to detect human interactions with passive RFID-tagged objects using static RFID readers in [22]. The proposed technique uses changes in the response rate of RFID tags to unobtrusively detect human activities on RFID-tagged objects, such as rotating objects, moving objects, waving a hand in front of objects, and walking in front of objects. However, this work does not consider the problem of estimating the locations of RFID-tagged objects. Their experimental results show that the system could nearly always detect rotations, while it performed poorly in detecting translation-only movement.

In [41], Hahnel et al. propose a mapping and localization approach that combines a laser-range scanner with RFID technology. Their approach employs laser-based FastSLAM [42] and Monte Carlo localization [13] to generate maps of static RFID tags using mobile robots equipped with RFID readers and a laser-range scanner. Through practical experiments, they demonstrate that their system can build accurate 2D maps of RFID tags, and they further illustrate that the resulting maps can be used to accurately localize the robot and moving tags.

Another system is the 3D RFID tag [76]. The 3D RFID system is equipped with a robot-controlled uni-directional antenna, and the 3D tag consists of several combined tags. Two kinds of 3D tags are developed: a union tag and a cubic tag. The proposed system can not only detect the existence of the 3D tag but also estimate the orientation and position of the object. However, it requires specific orientation-sensitive 3D tags custom-built from multiple tags. Furthermore, the system uses a highly expensive robot system to control the antenna's movement and thereby estimate the orientation and position of the object. In contrast, Ferret needs only one standard orientation-insensitive tag per object and the user's inherent mobility to estimate the object's location.

6.3 Ferret Design

Ferret is designed to operate on a handheld video camera with a display. To use Ferret,

the user selects some set of objects she would like to locate in the room and moves around

the room with the camera. Using an RFID reader embedded in the video camera, Ferret

samples for nearby tags, and in real time updates the camera's display with an outline of

the probable location of the objects she is searching for. Ferret’s knowledge of object

location can be imprecise, so rather than showing a single centroid point, Ferret displays

the outline, leaving the interpretation of the precise location to the user’s cognition. For

instance, if Ferret can narrow the location of a book to a small region on a shelf, a user can

quickly find the precise location. Figure 6.1 provides a pictorial representation of how the

system would work. In this scenario the user is looking for a soup can in a messy office.

After scanning the room using a Ferret-based camera, the system highlights a small region

that contains the soup can.


Figure 6.1. Use of Ferret to discover the location of a soup can in an office

6.3.1 Nomadic Location with RFID

Many pervasive systems that rely on location are predicated on the assumption that the objects requiring location information are few and mobile. In contrast, we designed Ferret to support a massive number of mostly static, or nomadic, objects: objects

that change locations infrequently. As a fraction of all objects, nomadic ones are in the vast

majority—in any given room it is likely that there are hundreds, or possibly thousands of

nomadic or static objects, while there are only a few mobile ones.

The primary barrier to providing locationing information for such a large number of ob-

jects is the reliance on batteries: making objects self-locating requires battery-powered locationing hardware. Even though locationing systems such as ultrasound [75]

and Ultra-Wide Band (UWB) are becoming more energy efficient, equipping hundreds of

objects in a room with self-locating capabilities simply does not scale, since it will re-

quire changing an unmanageable number of batteries. In contrast, passive RFID provides

a battery-free, inexpensive, distributed, and easily maintained method for identifying ob-

jects; Ferret adds locationing capabilities to such objects. Ferret leverages the fact that an

increasing number of objects will be equipped with RFID tags as a replacement to bar-

codes. Further, RFID tags continue to drop in price, and one can imagine attaching tags to

a large number of household or office objects.

As RFID tags are passive devices and have no notion of their own location, Ferret must

continuously calculate and improve its own notion of the object locations. The system fuses


a stream of noisy and imprecise readings from an RFID reader to formulate a proposition of

the object’s location. The key insight in Ferret is to exploit the location of a camera/reader

to infer the location of objects in its vicinity. In essence, any tag that can be read by a reader

must be contained within its sensing range; by maintaining a history of tags read by the

system, Ferret can progressively narrow the region containing the object. This is a simple

yet elegant technique for inferring the location of passive RFID tags without expensive,

battery-powered locationing capabilities.

6.3.2 Infrastructure Requirements

Strictly speaking, calculating and displaying object locations do not require any in-

frastructural support. Displaying a location on the video, as well as combining multiple

readings of the object location, only requires relative locations, such as those from inertial

navigation systems [4]. However, it is likely that knowledge of object locations in relation

to a known coordinate system, such as GPS or a building map, will be useful for many

applications. We assume that the camera/reader uses a locationing system, such as ultrasound or UWB, to determine its own location, and then uses it to infer the location of

objects in its vicinity.

As Ferret uses a directional video camera and RFID reader, it also requires an orien-

tation system that can measure the pan (also known as heading and yaw), tilt (also known

as pitch), and roll of the system. While research has proposed orientation systems for ultrasound [75], we have chosen to use a commercially available digital compass to determine

the directionality of the reader and the camera at any instant. Similar to the locationing

system, Ferret benefits from having absolute orientation, although it can operate with only

a relative orientation.

6.3.3 Location Storage

For each object, Ferret must store a description of the object’s location. Considering

that some RFID tags are remotely rewritable, Ferret can store the location for an object


directly on the tag itself. Other options are to store the locations locally in each Ferret

reader, or in an online, external database. Each option provides different privacy, perfor-

mance, and management tradeoffs. Storing locations locally on each reader means that

each independent Ferret device must start finding objects with zero initial knowledge. As

the device moves and senses the same object from different vantage points, it can use a

sequence of readings to infer and refine the object location. The advantage of this method is

that it works with read-only RFID tags and does not require any information sharing across

devices. However, it prevents the device from exploiting history available from other read-

ers that have seen the object in the recent past. In contrast, if location information can be

remotely written to the RFID tags then other Ferret devices can start with better initial esti-

mates of object location. However, this option requires writable tags and the small storage

available on a tag limits the amount of history that can be maintained. In both of the above

options, any device that has an RFID reader can determine object locations without needing

the full complexity of the Ferret system.

A third option is to store the location information in a central database. This has the

advantages of allowing offline querying and providing initial location estimates to mobile

readers; further, since database storage is plentiful, the system can store long histories as

well as past locations of nomadic objects. However, it requires readers to have connectivity to the database, and it imposes the burden of managing the database and its privacy controls. Storing data on the tags also has implications for privacy control; however, one must at least be

proximate to the tag to query its location.

At the heart of Ferret is an RFID localization system that can infer the locations of in-

dividual passive RFID tagged objects. Ferret then uses this localization system to dynam-

ically discover, update, store, and display object locations. The following section presents

the design of our RFID localization technique.


6.4 RFID Locationing

Consider an RFID reader that queries all tags in its vicinity—the reader emits a signal

and tags respond with their unique identifier. Given all responses to a query, the reader can

produce positive or negative assertions about whether a particular tag is present within its reading range. The reader cannot directly determine the exact location of the tag in relation to the

reader, or even a distance measurement. However, just one positive reading of a tag greatly

reduces the possible locations for that particular object—a positive reading indicates that

the object is contained in the volume defined by the read range of the reader (see Figure

6.2). Ferret leverages the user’s mobility to produce a series of readings; the coverage

region from each reading is intersected with all readings from the recent past, further re-

ducing the possible locations for the object (see Figure 6.3). Using this method, Ferret can

continually improve its postulation of the object location.

In addition to positive readings of an object’s RFID tag, the reader implicitly indicates

a negative reading whenever it fails to get a reading for a particular tag that it is looking

for. Using a similar method to positive readings, Ferret subtracts the reader’s coverage

region from the postulation of the object’s location. This also improves the postulation

of the object’s location. A third method to reduce the likely positions for the object is to

modulate the power output of the reader. If a particular power output produces a positive

reading, and a lower power produces a negative reading, the system has gained additional

knowledge about the location of the object.

In general, whenever a tag is present in the read range, the reader is assumed to detect it

with a certain probability—objects closer to the centroid of its read range are detected with

higher probabilities, while objects at the boundary are detected with lower probabilities.

Thus, each positive reading not only gives us a region that is likely to contain the object, it

also associates probability values for each point within that region. This coverage map of a

reader is shown in Figure 6.2. The map can be determined from the antenna data sheet, or


Figure 6.2. Coverage region of an RFID reader and tag detection probabilities in two dimensions (regions where the reader has a 95%, 5%, or 0% chance of detecting a tag).

by manually mapping the probability of detecting tags at different (x,y,z) offsets from the

reader.

Given a three-dimensional grid of the environment and assuming no prior history, Ferret

starts with an initial postulate that associates an unknown probability of finding the object

at each coordinate within the grid. For each positive reading, the probability values of each

grid point contained within the coverage range are refined (by intersecting the range with

past history as shown in Figure 6.3). Similarly, for each negative reading, the probability

values of each grid point contained within the coverage range are decreased. This results in

a three-dimensional map, M(x, y, z), that contains the probability of seeing a tag at each

data point in relation to the reader. Using multiple power outputs requires building a map

for each power output level. Due to several constraints in our current prototype, Ferret

currently does not use power modulation; however, adding this to the system would be trivial.

The amount of computation that the system can do drastically affects the location algo-

rithm that performs intersections, the compensation for false negatives, and how it reflects

the map to the user. Next, we describe two alternative methods, the first of which is computationally intensive and cannot run in real time on current mobile hardware. Such an offline tech-

nique is useful for describing an eventual goal for the system, or how to use the system for


Object

Ferret with RFID Reader

Coverage Map

Figure 6.3. Refining location estimates using multiple readings.

analyzing the data after it is collected. However, our goal is to implement Ferret on a mo-

bile device so we also describe an online algorithm with drastically reduced computational

cost.

6.4.1 Offline Locationing Algorithm

Formally, consider Ferret's readings, both positive and negative, as a series D = {D_1, D_2, D_3, ..., D_n}; we want to derive the probability of the object being at position X given the readings from the RFID reader, or P(X|D). If we assume that each reading of the RFID reader is an independent trial, then by applying the well-known Bayesian filtering scheme, we can compute the likelihood as:

    P(X|D) = (1/Z) P(X|{D_1, ..., D_{n-1}}) P(D_n|X),     (6.1)

where Z is a normalization factor. We omit the proof as it is a straightforward application of conditional probability.

If we first assume that the Ferret device (camera) is completely stationary, it operates

as follows: i) once Ferret receives the first positive reading of a tag it initializes a three

dimensional map,L, with the coverage mapM , to track the probability that the object is at

each of the coordinates in the map. ii) each successive reading multiplies each coordinate

in L by M(x, y, z) if the reading was positive, or 1 - M(x, y, z) if the reading was negative.
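Steps (i) and (ii) can be sketched as follows; this is a minimal illustration over a flattened grid, and the function name and the tiny 1D example are ours, not Ferret's:

```python
# Sketch of the offline Bayesian map update: a positive reading multiplies
# the location map L by the coverage map M, a negative reading by 1 - M,
# and the result is renormalized (the factor Z in Equation 6.1).
# The flat-list grid and the example values are illustrative assumptions.

def update_map(L, M, positive):
    factor = M if positive else [1.0 - m for m in M]
    posterior = [l * f for l, f in zip(L, factor)]
    z = sum(posterior)                      # normalization factor Z
    return [p / z for p in posterior] if z > 0 else posterior

# Tiny 1D example: three grid cells, uniform prior, one positive reading.
L = [1.0 / 3, 1.0 / 3, 1.0 / 3]
M = [0.9, 0.5, 0.0]                         # detection probability per cell
L = update_map(L, M, positive=True)
# The cell where detection is most likely now dominates (L[0] = 0.9 / 1.4),
# and the cell where detection is impossible is ruled out (L[2] = 0).
```

A negative reading simply uses the complementary factor, which shifts mass away from the regions the reader should have covered.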


Figure 6.4. Simplified 2D observation model for the antenna

From the above, we can see that the key term is the probability P(D_n|X) of detecting a tag given its pose X relative to the RFID antenna. To determine the observation model of the RFID antenna, we used the following method. We set up the RFID antenna at a particular location and placed an RFID tag at a specific point near the antenna. We counted the frequency of detecting the tag and used it as the probability of detection at this location relative to the RFID antenna. We repeated this for every point in a discrete 3D grid around the RFID antenna until we reached the region in which the tag cannot be detected under any circumstances. At the end of this procedure, we obtain the observation model of the RFID antenna. Due to the varying characteristics of different kinds of RFID antennas and tags, this procedure must be repeated for every combination of antenna and tag. In Figure 6.4, we show the simplified 2D observation model of our RFID antenna; our implementation uses the 3D observation model. The major detection range of our RFID antenna resembles a balloon with a maximum length of 175 cm and a maximum width of 120 cm. The likelihood inside the detection region is also shown in Figure 6.4. For locations outside the two regions with likelihood 0.5 and 0.9, we assume a zero likelihood.
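The measurement procedure above amounts to estimating a detection frequency per grid offset. A sketch follows, with a simulated reader standing in for real hardware; the linear-decay model and the 175 cm cutoff are illustrative assumptions, since a real observation model is measured, not assumed:

```python
import random

# Build an empirical coverage map: for each (x, y, z) offset, query the
# reader many times with a tag at that offset and record the hit frequency.
def estimate_coverage(read_tag, offsets, trials=200):
    return {off: sum(read_tag(off) for _ in range(trials)) / trials
            for off in offsets}

# Hypothetical stand-in for a real reader: detection probability decays
# linearly with distance and drops to zero beyond 175 cm.
random.seed(1)
def simulated_reader(offset):
    d = sum(c * c for c in offset) ** 0.5
    return random.random() < max(0.0, 1.0 - d / 175.0)

M = estimate_coverage(simulated_reader, [(0, 0, 30), (0, 0, 150), (0, 0, 300)])
# M[(0, 0, 300)] is 0.0: that offset lies outside the detectable region,
# so the grid sweep can stop once frequencies reach zero in every direction.
```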

6.4.2 Translation, Rotation and Projection

The basic algorithm described above assumes a stationary camera/reader; Ferret’s no-

tion of object location does not improve beyond a point, even with a large number of

readings—most points in the reader’s range (i.e., within the coverage map) will continue to

have a high and equally likely probability of detecting the tag. Consequently, multiple read-


Figure 6.5. Left Handed Coordinate System

ings produce a large map with equally likely probabilities of the object’s location. Instead,

Ferret depends on the user's motion to reduce the possibilities for the object location—as

the user moves in the environment, the same object is observed by the camera from multiple

vantage points and intersecting these ranges allows Ferret to narrow the region containing

the object. Incorporating motion is straightforward; however, the coordinate system of the coverage map M must be reconciled with that of the map L before this can be done.

The coverage map shown in Figure 6.3 is described in a three-dimensional coordinate

system with the origin at the center of the reader’s RFID antenna, which we refer to as

the reader coordinate system. The camera, although attached to the RFID reader, is offset

from the reader, and has a slightly different coordinate system. We refer to this as the

camera coordinate system, which has its origin at the center of the camera's CCD sensor.

To combine multiple readings from the reader, and subsequently display them to the user,

each mapM must be transformed into a common coordinate system. We refer to this as the

world coordinate system. The world can have its origin at any point in space: with a locationing system we can use its origin, or with an inertial location system we can use the

first location of the reader. Without loss of generality, we assume the reader, camera, and

world coordinate systems are left-handed coordinate systems (see Figure 6.5).

Performing this transformation is possible using techniques from linear algebra and

computer graphics [28]. For each reading, the reader has a location and orientation with


respect to the world coordinates. This is described as a location (x0, y0, z0) and an orientation with a pan of α degrees (the rotation about the y axis, range [-180, 180]), a tilt of β degrees (the rotation about the x axis, range [-90, 90]), and a roll of γ degrees (the rotation about the z axis, range [-180, 180]). The direction of the rotation is given by the left-hand

rule where the thumb is in the positive direction of the rotation axis and the fingers show

the positive direction of rotation (see Figure 6.5). This transformation is formulated as a

rotation matrix:

    R = | cos(γ)  -sin(γ)  0 |   | 1     0        0      |   |  cos(α)  0  sin(α) |
        | sin(γ)   cos(γ)  0 | × | 0   cos(β)  -sin(β)   | × |  0       1  0      |     (6.2)
        | 0        0       1 |   | 0   sin(β)   cos(β)   |   | -sin(α)  0  cos(α) |

where R is a 3 × 3 orthonormal matrix whose columns are mutually orthogonal unit vectors, so that R^-1 = R^T.
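As a concrete check of Equation 6.2, the composite rotation can be built from the three axis rotations and verified to satisfy R^-1 = R^T. This is a pure-Python sketch with helper names of our choosing:

```python
import math

def matmul(A, B):
    # 3x3 matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_matrix(pan, tilt, roll):
    # Pan (alpha) about y, tilt (beta) about x, roll (gamma) about z,
    # composed as in Equation 6.2: R = Rz(gamma) * Rx(beta) * Ry(alpha).
    a, b, g = map(math.radians, (pan, tilt, roll))
    Rz = [[math.cos(g), -math.sin(g), 0.0],
          [math.sin(g),  math.cos(g), 0.0],
          [0.0, 0.0, 1.0]]
    Rx = [[1.0, 0.0, 0.0],
          [0.0, math.cos(b), -math.sin(b)],
          [0.0, math.sin(b),  math.cos(b)]]
    Ry = [[math.cos(a), 0.0, math.sin(a)],
          [0.0, 1.0, 0.0],
          [-math.sin(a), 0.0, math.cos(a)]]
    return matmul(Rz, matmul(Rx, Ry))

# Orthonormality check: R * R^T should be the 3x3 identity matrix.
R = rotation_matrix(30.0, 10.0, -5.0)
Rt = [[R[j][i] for j in range(3)] for i in range(3)]
I = matmul(R, Rt)
```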

So, if a point is located at (xw, yw, zw) in world coordinates, the object's location in reader coordinates (xr, yr, zr) can be computed via:

    | xr |       | xw - x0 |       | xw |                 | x0 |
    | yr | = R × | yw - y0 | = R × | yw | + T,   T = -R × | y0 |     (6.3)
    | zr |       | zw - z0 |       | zw |                 | z0 |

where the composite rotation matrix R is given by Equation 6.2.

Therefore, the reverse transformation from reader coordinate system to world coordi-

nate system is given by:

    | xw |           | xr |                 | xr |   | x0 |
    | yw | = R^-1 × (| yr | - T) = R^-1 ×   | yr | + | y0 |     (6.4)
    | zw |           | zr |                 | zr |   | z0 |

where (x0, y0, z0) is the reader's position in the world coordinate system and R^-1 = R^T.
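The forward and reverse transformations can be sketched as a pair of functions. This is pure Python; R would come from Equation 6.2, and the example rotation (a simple 90-degree rotation about the z axis) is chosen by hand for the round-trip check:

```python
def matvec(A, v):
    # 3x3 matrix times 3-vector
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def world_to_reader(R, origin, pw):
    # Equation 6.3: p_r = R * (p_w - origin)
    return matvec(R, [pw[i] - origin[i] for i in range(3)])

def reader_to_world(R, origin, pr):
    # Equation 6.4, using R^-1 = R^T for an orthonormal R.
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    v = matvec(Rt, pr)
    return [v[i] + origin[i] for i in range(3)]

# Round trip with a 90-degree rotation about the z axis.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
p_reader = world_to_reader(R, [1.0, 2.0, 3.0], [2.0, 2.0, 3.0])
p_world = reader_to_world(R, [1.0, 2.0, 3.0], p_reader)
# p_world recovers the original world point [2.0, 2.0, 3.0]
```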

When computing the intersection of coverage maps, Ferret first transforms the coverage map M into the world coordinate system using Equations 6.2 and 6.4, and computes the intersection according to the methods presented in Section 6.4.1 to produce a new map L containing the likelihood of object locations.

Once Ferret produces a three dimensional map that it believes contains a particular

object, it must overlay this region onto the video screen of the camera; doing so involves

projecting a 3D map onto a two-dimensional display. This is done in two steps: thresholding

and projection. The threshold step places a minimum value on the likelihood in the map L: by using a small but non-zero value for the threshold, Ferret reduces the volume that

encompasses the likely position of the object. However, using a larger threshold may cause

Ferret to shrink the volume excessively, thus missing the object. Currently this is a tunable

parameter in Ferret; in the evaluation section we demonstrate how to choose a reasonable value.

Finally, Ferret projects the intersection map onto the image plane of the video display.

Ferret must transform the intersection map from the world coordinate system into camera

coordinate system. Ferret performs this transformation using Equation 6.2 and 6.3, along

with the camera’s current position and orientation. As stated previously, the camera co-

ordinate system follows the left-hand convention, and the z-axis of the camera coordinate

system is co-linear with the camera's optical axis. Assume the camera has focal length f, and a point is positioned at (xc, yc, zc) in the camera coordinate system. The projection is given by:


    | u |    f    | xc |
    | v | = --- × | yc |     (6.5)
             zc

where (u, v) is the projected point on the CCD sensor.
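As code, the perspective projection is a single scaling by f/zc. This is an illustrative sketch; the behind-camera check is our addition:

```python
def project(f, pc):
    # Project a camera-space point (xc, yc, zc) onto the image plane:
    # u = f * xc / zc, v = f * yc / zc.
    xc, yc, zc = pc
    if zc <= 0.0:
        return None  # point lies behind the camera and is not visible
    return (f * xc / zc, f * yc / zc)

# A 50 mm (0.05 m) focal length, point 2 m in front of the camera:
uv = project(0.05, (0.2, 0.1, 2.0))
# a 20 cm lateral offset lands about 5 mm from the image center
```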

For each reading the RFID reader produces, the location algorithm must perform O(n^3) operations for a three-dimensional space that is n × n × n, in addition to translating and

rotating the coverage map, and projecting the location map onto the display.

If Ferret is searching for multiple objects, it must perform these operations for each individual object. In practice, we have found that processing each RFID reading consumes 0.7 seconds on a modern processor, while our RFID reader produces 4 readings per second. Given the speed at which a human may move the camera, this is not feasible to do in real time; however, it works well for an offline system that has less stringent latency requirements.

An offline algorithm also has the opportunity to perform these operations for the whole

video, and then use the smallest region that it has computed to annotate the entire video

stream with that region.

6.4.3 Online Locationing Algorithm

Given that the offline algorithm is too computationally intensive for a mobile device to

operate in real-time, we describe a greatly simplified version of the locationing algorithm.

The primary goal is to reduce the representation of the probability of where the object is.

Instead of a full representation that describes the probability at each location, we reduce it to

describing just the convex region where the object is with very high probability. Describing

such a region is very compact, as we only need to track the points that describe the perimeter

of the convex region. Intersecting two maps is very fast, as it is a series of line intersections.

Figure 6.6 shows this in detail for two dimensions; extending it to three dimensions is straightforward. The first half of the diagram shows sample points that describe the outside of the coverage map. Ferret rotates and translates the coverage map M as described in the previous section, and intersects it with the current map L. For each constant y value, the


system finds the intersection of the two line segments and uses that as the description of

the new map L. For instance, in Figure 6.6 we choose a constant y value y1. After rotating and translating the map M to match the reader's current position, the system intersects the two line segments, (x1, y1)-(x3, y1) from the current map L with (x2, y1)-(x4, y1) from the new map M. The resulting intersection is the segment (x2, y1)-(x3, y1), which describes the perimeter of the new location map L. Ferret repeats this process for all y values. Extending this to three dimensions is straightforward: intersect two line segments for each pair of constant y and z values. This means the complexity of the intersection is O(n^2) rather than O(n^3) as in the offline algorithm.
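A sketch of this perimeter intersection, storing the convex region as one x-interval per (y, z) pair; the dictionary layout is our illustrative choice, not Ferret's actual data structure:

```python
def intersect_regions(L, M):
    # For each (y, z) present in both maps, intersect the x-intervals;
    # pairs with no overlap, or present in only one map, drop out.
    out = {}
    for yz in L.keys() & M.keys():
        lo = max(L[yz][0], M[yz][0])
        hi = min(L[yz][1], M[yz][1])
        if lo <= hi:
            out[yz] = (lo, hi)
    return out

# The 2D example from Figure 6.6 at y = y1: segment x1-x3 from L
# intersected with x2-x4 from M leaves x2-x3.
L = {(1, 0): (0.0, 3.0), (2, 0): (0.0, 3.0)}   # x1 = 0, x3 = 3
M = {(1, 0): (2.0, 5.0), (3, 0): (1.0, 2.0)}   # x2 = 2, x4 = 5
print(intersect_regions(L, M))                  # {(1, 0): (2.0, 3.0)}
```

An empty result signals that the maps are disjoint, which Section 6.4.4 uses as the cue that an object has moved.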

Also, instead of using a map of probabilities for the coverage map, we reduce it to the convex shape that describes the coverage region in which the RFID reader can read tags with some probability greater than 0. This virtually eliminates the possibility of false positives. Additionally, describing the perimeter only requires two x points for each pair of y and z values; thus, the representation of the region is greatly reduced in size from O(n^3) to O(n^2). Using our prototype as an example, this reduces the storage requirement from 43.5 MB to 178 KB, and each of these is highly compressible. This greatly aids Ferret's ability to store the regions directly on the storage-poor tags. The line-segment representation does mean that the system cannot incorporate negative regions, as intersecting with a negative region can create a concave, rather than convex, region. A concave region would return the complexity of the representation and the intersection to O(n^3). False negatives do not affect the system, as negative readings are not used at all.

6.4.4 Dealing with Nomadic Objects

We designed Ferret to deal with objects that move infrequently, commonly referred to as nomadic, as opposed to mobile objects, which move frequently. When objects do move, Ferret should adjust accordingly. In the online algorithm, this is straightforward. When the location algorithm performs an intersection of two maps, it may produce a region of


Figure 6.6. Online location estimation in Ferret.

zero volume. This indicates that the maps were disjoint, and the object could not possibly be within the previously postulated region. The system then reinitializes the location map, L, to the most current reading, which is M rotated and translated to the reader's current position.

However, the offline algorithm is more complicated, as it produces a likelihood location map. One solution is to use only the intersection of positive readings to deal with nomadic objects, as the online algorithm does; however, this approach discards the useful information provided by negative readings. A more practical solution is therefore to apply a likelihood threshold to the likelihood location map and remove any location with a probability less than the threshold. If the resulting location map is empty, we consider the object to have moved and reinitialize the location map, L, to the most current reading. Choosing an appropriate threshold is critical in this approach: a larger threshold increases the likelihood that the resulting location map is empty even when the object has not actually moved. In Section 6.6, we show experiments on how to choose an appropriate threshold.
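The thresholding step can be sketched as follows. This is an illustrative fragment rather than Ferret's code; the default threshold of 0.01 is the value selected experimentally in Section 6.6.

```python
def object_moved(likelihood_map, threshold=0.01):
    """An object is deemed to have moved when thresholding empties
    the likelihood map, i.e., no location stays above the threshold."""
    remaining = {loc: p for loc, p in likelihood_map.items()
                 if p >= threshold}
    return len(remaining) == 0

# All likelihoods collapse below the threshold: the object has moved,
# so the location map should be reinitialized to the newest reading.
print(object_moved({(0, 0, 0): 0.002, (0, 0, 1): 0.0001}))  # True
print(object_moved({(0, 0, 0): 0.3}))                       # False
```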

6.5 Implementation Considerations

We have implemented a prototype Ferret system as shown in Figure 6.7. Although

the prototype is quite large, this is due to the combination of many separate pieces of


hardware—there is nothing that would preclude a much smaller commercial version. Our

prototype is based on the following hardware:

Figure 6.7. Ferret prototype system: (a) front; (b) back.

• A ThingMagic Mercury4 RFID reader, which has a SensorMagic monostatic circular antenna connected to it. The output power of the reader is set to 30 dBm (1 Watt). This reader operates in the frequency range 909-928 MHz, and supports RFID tags of EPC Class 0, EPC Class 1, and ISO 18000-6B. The reader is paired with a ThingMagic monostatic circular antenna that has a balloon-shaped radiation pattern. An alternative is to use a linear antenna that has a more focused radiation pattern and longer range; however, the narrower beam will produce fewer positive readings for each tag. The tradeoff in antenna choice and the possibility of future antennas with variable radiation patterns are interesting questions for future research. We used an orientation-insensitive, EPC Class 1, Alien Technology “M” RFID tag operating at 915 MHz.

• A Sony Motion Eye web-camera connected to a Sony Vaio laptop. This CMOS-based camera is set to a fixed focal length of 2.75mm, and uses a sensor size of 2.4mm by 1.8mm. The camera provides uncompressed 320x240 video at 12 frames per second.


• A Cricket [75] ultrasound 3D locationing system to estimate the location of the camera and RFID reader. We deploy Cricket beacons (which serve as references) on the ceiling, and attach a Cricket sensor to our prototype system. The Cricket sensor is offset from the camera and RFID reader, and we correct for this translation in software.

• A Sparton SP3003D digital compass to obtain the 3D orientations (pan, tilt, and roll) of the camera's lens and the reader's antenna. We mount the compass, the camera's lens, and the reader's antenna so that they all have the same 3D orientation.

Our prototype system consists of the following software modules:

• Video Module: This module records the video stream from the web camera and transcodes it into an MPEG-2 video clip. In addition, the video module projects and highlights the estimated region containing the target object when displaying the video stream. We modified the FFmpeg video suite [20] to implement this module: we implement the projection function according to Equation 6.4 to compute the projection of the location estimate, and then intercept the display function of the FFmpeg video suite to display the boundary of the projection area. This module requires 638 lines of C code.

• RFID Module: This module controls the RFID reader and records its readings. The RFID reader provides remote control and query functions over a TCP connection using SQL-like query and control messages. The RFID module submits a query request with an interval value of 250ms to the reader, and the reader then periodically responds with a configurable plain-text message including the tag ID, the ID of the antenna reading the tag, and so on. This module requires 173 lines of C code.

• Cricket and Compass Module: This module communicates with the Cricket sensor and digital compass to obtain the location and orientation of the camera and RFID reader.


The Cricket module communicates with the Cricket sensor via a serial port; the output of the Cricket sensor is its set of distances to the beacons. Our module records these distances and uses them to triangulate the location of the Cricket sensor. After adding a constant offset (measured manually), we then have the location of the camera and RFID reader. This module requires 486 lines of C code. The Compass module also communicates with the compass via a serial port, and requires 420 lines of C code.

• Locationing Module: This module implements the locationing algorithms discussed in Section 6.4. The implementation includes: (i) coordinate-transformation functions between the world coordinate system and the coordinate systems of the camera and RFID reader, according to Equations 6.2, 6.3, and 6.4; (ii) intersection functions to compute the intersection for positive and negative readings; and (iii) a central database to store the location information. This implementation requires 1535 lines of C code.
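As a rough illustration of the coordinate-transformation and projection step (Equations 6.2-6.4 themselves are not reproduced here), the following uses a generic translate-rotate-then-pinhole-project model with the prototype camera's parameters (2.75mm focal length, 2.4mm x 1.8mm sensor, 320x240 pixels). The function name and the pan-only rotation are simplifications for illustration, not Ferret's actual code.

```python
import math

def world_to_pixel(p_world, cam_pos, pan_deg=0.0, focal_mm=2.75,
                   sensor_mm=(2.4, 1.8), image_px=(320, 240)):
    """Translate a world point into the camera frame, rotate by the
    pan angle about the vertical axis, then pinhole-project."""
    dx = p_world[0] - cam_pos[0]
    dy = p_world[1] - cam_pos[1]          # vertical offset
    dz = p_world[2] - cam_pos[2]
    c = math.cos(math.radians(-pan_deg))
    s = math.sin(math.radians(-pan_deg))
    xc = c * dx + s * dz
    zc = -s * dx + c * dz                 # depth along the view axis
    if zc <= 0:
        return None                       # behind the camera: no projection
    u_mm = focal_mm * xc / zc             # position on the sensor (mm)
    v_mm = focal_mm * dy / zc
    u = image_px[0] / 2 + u_mm * image_px[0] / sensor_mm[0]
    v = image_px[1] / 2 - v_mm * image_px[1] / sensor_mm[1]
    return (u, v)

# A point 1m straight ahead lands at the image center:
print(world_to_pixel((0.0, 0.0, 1.0), (0.0, 0.0, 0.0)))  # (160.0, 120.0)
```

Projecting every perimeter point of the estimated region this way, and tracing the boundary of the resulting pixel set, yields the highlighted area the video module draws.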

6.6 Experimental Evaluation

In this section, we evaluate Ferret by focusing on the performance of locationing and projection. In particular, we concentrate on how quickly Ferret can refine the location of an object for a user. We show how to tune the offline algorithm to trade off the size of the location region against the overall error rate. We then compare the online and offline systems. We demonstrate that Ferret can detect objects that move within a room, and we show the computation and storage costs of our system.

We measure Ferret’s performance using two metrics: the size of the postulated location

and the error rate. Ferret automatically provides the size, either the volume of the three-

dimensional region, or the area of the two-dimensional projection on the video screen. The

three-dimensional region is not a sphere, but to interpret the results, a sphere with a volume

of 0.01m3 has a diameter of26.7cm and a volume of0.1m3 has a diameter of57.6cm.
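The quoted diameters follow directly from the sphere-volume formula V = (π/6)d^3, i.e., d = (6V/π)^(1/3); a quick check:

```python
import math

def sphere_diameter_cm(volume_m3):
    """Diameter (in cm) of a sphere with the given volume (in m^3)."""
    return 100.0 * (6.0 * volume_m3 / math.pi) ** (1.0 / 3.0)

print(round(sphere_diameter_cm(0.01), 1))  # 26.7
print(round(sphere_diameter_cm(0.1), 1))   # 57.6
```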


Figure 6.8. Online refinement of location: (a) absolute volume (m3) vs. time (secs); (b) relative volume vs. time; (c) display area vs. time.

Ferret’s error rate is the number of objects that do not appear in the area projected on to the

display. The error rate is determined through manual inspection of a video recording.

All of our experiments are conducted in a 4m x 10m x 3m room equipped with a Cricket ultrasound system. We used five beacons mounted on the ceiling, which we manually calibrated. The origin of our world-coordinate system is a corner of the room. The camera records all video at 12 frames/second, and the RFID reader produces 4 readings per second. For the online system, we use a coverage map that includes all places where the tag has


a non-zero probability of being read. That region is an irregular shape that is 2.56m x 1.74m x 2.56m at its maximum extent and has a volume of approximately 2 m3.

6.6.1 Online Refinement Performance

The primary goal of Ferret is to quickly locate, refine, and display an outline on the video display that contains a particular object. As this happens online, Ferret continuously collects readings and improves its postulation of the object's location; this is reflected in the volume of the region shrinking over time. To demonstrate this, we placed one tag in the room, and then walked “randomly” around the room with the prototype. We plot the volume of the location estimate versus time in Figure 6.8. The absolute volume tracks the total volume of the region, while the relative volume tracks the size of the region relative to the starting coverage region of the reader. In this case, Ferret does not make any errors in locating the object. The time starts from the first positive reading of the tag, and Ferret begins with no previous knowledge about object locations.

The results show that the volume of the location estimate drops from 2 m3 to 0.02 m3, only 1% of the reader's coverage region, in less than 2 minutes. The volume monotonically decreases, as intersecting positive readings only shrinks the area, while negative readings are ignored. Also, this is a pessimistic view of the refinement time; with prior knowledge, the process occurs much more rapidly. For instance, if the user switches to searching for another object in the same room, Ferret can take advantage of all of the previous readings. If a previous user has stored location information on the tag, this reader can also take advantage of that from the time of the first reading. Additionally, if some location information is stored in a centralized database, Ferret can immediately project an area onto the video without any positive readings.

In addition to the volume of the location estimate, we also plot the projection area versus time in Figure 6.8(c), in which the projection areas are shown as a percentage of the image-plane area. Our results show that the final projection area is only 3% of the whole


image, or approximately a 54-pixel-diameter circle on a 320 x 240 frame. However, the projection area does not monotonically decrease as the volume does. This is because the camera is constantly moving, so the point of view constantly changes, and the same volume can project to different areas from different orientations.
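The 54-pixel figure is simple geometry: 3% of a 320 x 240 frame, treated as a circle of equal area:

```python
import math

frame_area_px = 320 * 240                      # 76,800 pixels per frame
projection_px = 0.03 * frame_area_px           # 3% of the image plane
diameter_px = 2.0 * math.sqrt(projection_px / math.pi)
print(round(diameter_px))                      # 54
```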

6.6.2 Offline Algorithm Performance

While the online algorithm is useful for current mobile devices, the offline algorithm uses more information and a more precise representation of the object's location likelihood. To evaluate Ferret's precision in locating objects, we placed 30 tags in a 2.5m x 2.5m x 2m region and moved the prototype around the room for 20 minutes. We repeated the experiment 3 times, recorded the volume of the postulated region, and manually verified how many objects were truly contained in the area projected onto the video plane. With 30 tags and 3 experiments, Ferret can make between 0 and 90 location errors.

Before evaluating the offline algorithm, we must set a threshold for the minimum likelihood for the object, as described in Section 6.4. Recall that a larger threshold can reduce the volume encompassing the likely position of the object. However, a larger threshold will also increase the error rate of Ferret (the volume doesn't contain the object). To test the sensitivity of offline Ferret to changes in the likelihood threshold, we varied the likelihood threshold from 0.00001 to 0.4 and ran the offline Ferret algorithm on the data we collected in the experiment. We show the results in Figure 6.9.

Threshold   Errors   Mean Volume
0.00001      5/90    0.0117 m3
0.0001       5/90    0.0117 m3
0.001        5/90    0.0116 m3
0.01         5/90    0.0112 m3
0.1          6/90    0.0108 m3
0.2          7/90    0.0104 m3
0.3          8/90    0.0102 m3
0.4          9/90    0.0100 m3

Figure 6.9. Performance of offline Ferret under different likelihood thresholds.


Figure 6.10. Empirical CDF of Ferret's locationing accuracy (probability vs. volume in m3, online and offline).

The results show that: (i) the number of errors almost doubles, from 5 to 9, as the threshold increases from 0.00001 to 0.4; (ii) the mean volume of the location estimate is essentially constant; and (iii) for a threshold ≤ 0.01, the number of errors doesn't change. When using too high a threshold, Ferret incorrectly shrinks the volume, leaving out possible locations for the object. Balancing error rate against mean volume, we choose a likelihood threshold of 0.01. Using this threshold, we run the offline algorithm and compare it to the performance of the online algorithm. In Figure 6.10, we plot the CDF of Ferret's locationing accuracy for both algorithms.

The results show that: (i) the online algorithm can localize an object within 0.15 m3 and 0.05 m3 regions with 80% and 50% probability, respectively; the 0.15 m3 and 0.05 m3 regions are only 7.5% and 2.5% of the reader's 2 m3 coverage region; and (ii) the offline algorithm outperforms the online algorithm, localizing an object within a 0.05 m3 region with more than 90% probability and within a 0.1 m3 region with 100% probability.

However, when we verify the online algorithm's error rate, it makes only 2 errors, compared to the offline algorithm's 5 errors. We believe that the slightly greater number of errors in the offline algorithm is due to our incorporation of negative readings in the algorithm. In this experimental setup, the prototype system is constantly moving, and the tags are in the coverage region of the RFID reader for only a small portion of the total time (less than 5%). This scenario generates 19 times as many negative readings as positive


readings, and negative readings are weighted as heavily as positive readings. Considering that we measured the performance of the reader under ideal conditions, we have overestimated the performance of the RFID reader. The online algorithm does not exhibit the same behavior, as it never uses negative readings. Because negative readings are correlated with orientation and location, we believe that more accurate modeling of reader performance is an important direction for future research.
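A toy likelihood update illustrates the problem. Assume, as a simplification of the offline algorithm, that each positive reading multiplies a point's likelihood by the modeled read probability p and each negative reading multiplies it by 1 − p. With readings 19-to-1 negative and an optimistic model (here p = 0.9, an assumed value for illustration), the likelihood at the object's true location collapses:

```python
def updated_likelihood(prior, p_read, positives, negatives):
    """Multiply in p_read per positive reading and (1 - p_read) per
    negative reading, for one grid point with modeled read prob p_read."""
    return prior * (p_read ** positives) * ((1.0 - p_read) ** negatives)

# One positive and 19 negatives under an optimistic 0.9 read model:
print(updated_likelihood(1.0, 0.9, positives=1, negatives=19))

# A more conservative model (p = 0.3) keeps the same point plausible:
print(updated_likelihood(1.0, 0.3, positives=1, negatives=19))
```

The second value is many orders of magnitude larger, which is why an overestimated reader model combined with heavily weighted negatives pushes the offline algorithm into extra errors.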

6.6.3 Mobility Effects

Ferret exploits the user's mobility to produce a series of readings from multiple positions, and further refines its location estimate by intersecting the coverage regions at these positions. The previous experiment showed the results of a human, yet uncontrolled, mobility pattern. In reality users move erratically; however, their motions are composed of smaller, discrete motion patterns. To study how individual patterns affect the performance of Ferret, we placed a single tag in the room and evaluated Ferret with a small set of semi-repeatable motion patterns, shown in Figure 6.11: (a) straight line, the prototype system moves in a straight line, tangential to the object, without changing the orientation of the camera lens and RFID reader; (b) head-on, the prototype moves straight at the object and stops when the reader reaches the object; (c) z-line, the prototype system moves in a z-shaped line without changing its orientation; (d) rotation, the prototype system moves in an arc, while keeping the lens orientation radial to the path; (e) circle, the prototype system moves in a circle, while keeping the reader facing the object. Intuitively, the circular pattern may be the least likely of the mobility patterns, whereas head-on is probably the most likely: once the user gets one positive reading, she will tend to head towards the object in a head-on pattern. We evaluated Ferret's performance using the volume of the resulting region. For each movement pattern we ran three experiments, averaged the results, and compared the smallest volume achieved by online and offline Ferret. Our results are shown in Figure 6.12.


Figure 6.11. Path of the Ferret device: (a) straight line; (b) head-on; (c) z-line; (d) rotation; (e) circle.

                      Straight line   Head-on   z-Line   Rotate   Circle
Online volume (m3)        0.020       0.0042    0.023    0.026    0.032
Offline volume (m3)       0.0015      0.0030    0.0017   0.0011   0.026
Offline : online          13.33       1.40      13.52    23.63    1.23

Figure 6.12. Performance of Ferret under various mobility patterns.

The results show that Ferret performs similarly for each of the movement patterns; however, the circular pattern performs the worst. The circular pattern always keeps the object in view, and generally in the center of the reader's coverage region. This produces a set of readings that cover very similar regions. In each of the other cases, the mobility of the reader covers more disjoint spaces, and thus produces smaller volumes. This is true even of the head-on pattern, as the first reading and the last reading have very little volume in common. Another result is that the offline algorithm widely outperforms the online algorithm, except in the case of the circular and head-on patterns, where the performance is similar. Much of the offline algorithm's performance advantage comes from incorporating negative readings to reduce the possible locations for the object. In the case of the circular and head-on patterns, the object is always in view, producing few negative readings and yielding performance similar to the online algorithm. Although non-intuitive, this means that not seeing the object is as important as seeing it in narrowing its location.


Figure 6.13. Fraction of object movements detected vs. moving distance (cm), online and offline.

6.6.4 Object Motion Detection

Ferret is designed to deal with objects that move infrequently, but when an object does move, Ferret should detect this and start its refinement process over. As discussed in Section 6.4, whenever Ferret encounters an empty location estimate, it assumes that the corresponding object has moved. To evaluate Ferret's performance in detecting these nomadic objects, we place a tag in the room and use Ferret to estimate its location. We then move the tag a distance between 5cm and 200cm and again use Ferret to estimate its location. We repeat the experiment ten times for each distance, and record the number of times that Ferret failed to detect the moved object. The results are shown in Figure 6.13.

The figure shows that online and offline Ferret can detect 100% of object movements when the moving distance exceeds 25cm and 20cm, respectively. This is consistent with our previous results, which show that Ferret can localize an object to within a region whose volume is hundredths of a m3; this gives a radius on the order of 20cm, exactly how well Ferret can detect movement. When the object has not actually left the postulated area, Ferret is still correct about the object's location.


6.6.5 Spatial Requirements

The prototype has a non-zero probability of detecting tags in a balloon-shaped region with maximum dimensions of 2.56m x 2.56m x 1.74m; this shape has a volume of approximately 2 m3. For the offline algorithm, we sample this coverage region every centimeter. As discussed in Section 6.4, the offline algorithm requires every point in this space, while the online algorithm only requires a set of points that describe the exterior of the region. This reduced representation results in much smaller spatial requirements: (i) the offline algorithm uses a four-byte float to describe the probability of each sample point, for a total of 256 ∗ 256 ∗ 174 ∗ 4 = 43.5M bytes using a three-dimensional array to store the probabilities of all sample points; and (ii) the online algorithm uses a two-dimensional array (the dimensions correspond to y and z) to represent the coverage region, and consequently only needs two 2-byte values per (y, z) entry to track the x extent of the region, for a total of 256 ∗ 174 ∗ 2 ∗ 2 = 178K bytes. Both the offline and online representations are highly compressible: the offline representation can be reduced to 250K bytes and the online representation to 5K bytes using the commonly available compression tool gzip. For the foreseeable future, RFID tags will not contain enough storage for the offline representation, while the online version is not unreasonable. If tags have more or less storage, the number of sample points can be adjusted, although this will affect the precision of the system.
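The arithmetic behind these sizes, assuming one 4-byte float per sample point offline and two 2-byte x endpoints per (y, z) column online:

```python
# Coverage region sampled every 1 cm: 256 x 256 x 174 sample points.
offline_bytes = 256 * 256 * 174 * 4      # one 4-byte float per point
online_bytes = 256 * 174 * 2 * 2         # two 2-byte x endpoints per (y, z)

print(offline_bytes / 2**20)             # 43.5 (MiB)
print(online_bytes // 1000)              # 178 (KB)
```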

6.6.6 Computational Requirements

The computational requirements of the offline and online algorithms exhibit a similar relationship. We measured the computational requirements of Ferret's locationing algorithms on an IBM X40 laptop equipped with a 1.5GHz Pentium-M processor: (i) the offline algorithm costs 749.32ms per reading for each object, and (ii) the online algorithm costs only 6ms per positive reading for each object, roughly 1/125 of the offline computational requirements. Our results show that the online algorithm incurs a small overhead and will


run online to track multiple tags simultaneously on relatively inexpensive hardware, while the offline algorithm incurs a large overhead and can only run offline.

6.7 Concluding Remarks

This chapter presents the design and implementation of Ferret, a scalable system for locating nomadic objects augmented with RFID tags and displaying them to a user in real-time. We present two alternative algorithms for refining a postulation of an object's location using a stream of noisy readings from an RFID reader: an online algorithm for real-time use on a mobile device, and an offline algorithm for use in post-processing applications. We also present methods for detecting when nomadic objects move, and for resetting the algorithms to restart the refinement process.

We present the results of experiments conducted using a fully working prototype. Our results show that (i) Ferret can refine object locations to only 1% of the reader's coverage region in less than 2 minutes with a small error rate (2.22%); (ii) Ferret can detect nomadic objects with 100% accuracy when the moving distance exceeds 20cm; and (iii) Ferret is robust across different user movement patterns.


CHAPTER 7

CONCLUSIONS AND FUTURE WORK

Pervasive multimedia systems are enabled by the proliferation of multimedia mobile devices, sensors, and RFID tags. By attaching sensors or RFID tags to every object, we can provide mobile devices with pervasive locationing and identification services. Using these services, we can collect sensor data encoding the temporal, spatial, and social context of media capture: when, where, and who. This metadata provides fast and easy access to multimedia content and enables a tremendous number of multimedia applications. Due to the use of multimedia mobile devices, sensors, and RFID tags, pervasive multimedia systems are heterogeneous, power-constrained, and context-aware. Furthermore, pervasive multimedia systems must be scalable and easily maintainable.

This dissertation makes several key contributions toward providing system support for pervasive multimedia systems. In this dissertation, we develop techniques for energy-efficient usage of resource-constrained pervasive multimedia systems. Further, we design and implement a context-aware pervasive multimedia system which records the temporal, spatial, and social context of media capture along with images and videos. Finally, we address the issues of scalability and maintainability of locationing systems. Our specific research contributions are as follows:

• Power Management Techniques for Mobile Devices. We have designed and implemented Chameleon, an application-level power management technique. Chameleon grants complete control over CPU speed settings to the applications themselves: an application is allowed to specify its CPU speed setting independently of other applications. In Chameleon, different applications can choose different policies and


yet coexist with one another concurrently. Our experimental studies show that local decisions by individual applications can globally optimize system-wide energy consumption while simultaneously satisfying the QoS requirements of different applications. However, implementing an application-level power management strategy requires modifications to the source code, which may not be feasible for applications that do not provide access to their source code. Therefore, we have proposed TSPM, a time series-based power management technique. TSPM employs time series-based models to predict the future workloads of applications, and then changes the speed settings of the CPU and disk for each application accordingly. Evaluation studies demonstrate that our approach can achieve significant energy savings for mobile processors and disks without hurting application QoS.

• Context-aware Multimedia System. We have developed SEVA, a context-aware multimedia system. In addition to recording when and where a photo or video was captured, SEVA also records the identities and locations of objects (as advertised by the sensors attached to them) along with the images and videos. In SEVA, by using GPS and the ultrasound-based Cricket positioning system, we can locate a mobile object with very high accuracy. SEVA uses a pervasive infrastructure to establish wireless communication between heterogeneous sensors and recording devices. The system exploits a series of correlation, interpolation, extrapolation, and filtering techniques to produce video streams tagged with highly accurate context metadata (e.g., when and where a video was captured, and who/what is present in the video). This metadata can be used to efficiently search for videos or frames containing particular objects or people. Further, it can be used to infer the content of the videos and later organize them on a context-aware basis. For example, a video taken on a person's birthday with family members present is very likely to be the video of a birthday party.


• Pervasive Locationing and Identification System. We have designed a scalable locationing system called Ferret incorporating RFID technology. By combining locationing technology with pervasive multimedia applications, Ferret can locate objects using their RFID tags and display their locations to a mobile user in real-time. Ferret uses the location and directionality of RFID readers to infer the locations of nearby tags. Ferret leverages the user's inherent mobility to produce readings of the tag from multiple vantage points, and uses the intersection of the coverage regions from these readings to continually improve its postulation of the object location. Ferret does this through two novel algorithms: an online algorithm for real-time use on a mobile device, and an offline algorithm for use in post-processing applications. In the case of the offline algorithm, we also incorporate negative readings (when the reader does not see the tag), which greatly reduces the object's possible locations. An experimental evaluation of our prototype system shows that Ferret is scalable and can locate objects with high accuracy.

7.1 Future Work

In this section, we outline some future research directions that have evolved from the unanswered questions in this dissertation.

• Power Management Techniques for Other Resources: In this thesis, we presented power management techniques for the CPU and empirically demonstrated their ability to achieve energy efficiency without sacrificing application performance. However, besides the CPU, the display, network devices, and memory subsystem are three other major sources of energy consumption, and hence there is a need to achieve energy efficiency for these components as well. We intend to design power management techniques for these resources.


• Enhancements to Sensor-enhanced Video Annotation: We currently use the extended Kalman filter (EKF) in SEVA's interpolation and extrapolation stage. Although this technique has proved effective, we intend to explore other interpolation and extrapolation techniques in the future. In particular, we plan to employ and evaluate the unscented Kalman filter (UKF). Unlike the EKF, which uses a fixed state-transition model, the UKF can estimate the state-transition model dynamically, and thus achieves better accuracy in estimating the system state. Another direction for enhancing SEVA is to use the passive operating mode of the Cricket ultrasound locationing system. As opposed to the active mode we currently use, the passive mode can scale to a large number of objects.
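The interpolation idea behind this stage can be illustrated with a minimal one-dimensional, linear Kalman filter sketch. This is hypothetical code under a constant-velocity assumption, not SEVA's actual EKF, which handles nonlinear state-transition models:

```python
# Minimal 1-D linear Kalman-filter sketch of sensor-reading interpolation
# (constant-velocity model; a stand-in for SEVA's actual EKF stage).

def kf_step(x, v, p, z, dt=1.0, q=0.01, r=0.5):
    """One predict/update cycle.

    x, v : current position and velocity estimate
    p    : estimate variance (scalar stand-in for the covariance matrix)
    z    : new position measurement from the locationing system
    q, r : process and measurement noise variances (illustrative values)
    """
    # Predict: advance the state with the fixed constant-velocity model.
    x_pred = x + v * dt
    p_pred = p + q                      # process noise inflates uncertainty
    # Update: blend prediction and measurement by the Kalman gain.
    k = p_pred / (p_pred + r)
    innovation = z - x_pred
    x_new = x_pred + k * innovation
    v_new = v + k * innovation / dt     # crude velocity correction
    p_new = (1.0 - k) * p_pred
    return x_new, v_new, p_new
```

The fixed transition model in the predict step is precisely what the UKF would relax: instead of propagating a single linearized state, it propagates a set of sigma points through the (possibly nonlinear) dynamics and re-estimates the transition statistics at each step.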

• Enhancements to the RFID-based Locationing System: Ferret, our RFID-based locationing system, exploits the user's mobility to produce a series of readings at different vantage points, and uses the intersection of the coverage regions from these readings to continually refine its estimate of the object's location. This method can be further improved by modulating the RFID reader's power output, which greatly affects the coverage range. If the reader detects an object at a particular power output but cannot detect the same object at a lower power output, Ferret can reduce the object's likely positions even when the object is static. We would like to incorporate such power-output modulation into the Ferret system.
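A minimal sketch of this proposed refinement, assuming idealized circular coverage regions and a hypothetical set of candidate positions (this is not part of the current Ferret implementation):

```python
import math

# Hypothetical sketch of the proposed power-modulation refinement: if a tag
# answers at a high output power but not at a lower one, the object must lie
# between the two (assumed circular) read ranges around the reader.

def annulus_candidates(candidates, reader, r_low, r_high):
    """Keep only candidate positions between the two read ranges.

    candidates: iterable of (x, y) positions the object may occupy
    reader:     (x, y) position of the reader
    r_low:      read range at the power level where the tag was NOT seen
    r_high:     read range at the power level where the tag WAS seen
    """
    rx, ry = reader
    def dist(p):
        return math.hypot(p[0] - rx, p[1] - ry)
    # The object is outside the low-power range but inside the high-power one.
    return {p for p in candidates if r_low < dist(p) <= r_high}
```

Even a single stationary vantage point thus yields an annulus rather than a full disk of possible positions, which is why power modulation can refine the estimate without any reader movement.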

• A Unified System Architecture for Pervasive Multimedia Systems: Currently, SEVA and Ferret operate as two separate systems. We would like to extend them into a complete, unified framework for pervasive multimedia systems. Such unification would entail understanding and defining communication mechanisms and media specifications that are suitable and realistic for different kinds of sensors, multimedia content, and applications.


BIBLIOGRAPHY

[1] Aizawa, K., Tancharoen, D., Kawasaki, S., and Yamasaki, T. Efficient retrieval of life log based on context and content. In Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experience (CARPE'04), New York, NY (October 2004), pp. 22–31.

[2] Bahl, P., and Padmanabhan, V. N. RADAR: An in-building RF-based user location and tracking system. In Proceedings of the 19th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM'00), Tel-Aviv, Israel (March 2000), vol. 2, pp. 775–784.

[3] Bajaj, R., Ranaweera, S. L., and Agrawal, D. P. GPS: Location-tracking technology. Computer 35, 4 (March 2002), 92–94.

[4] Barshan, B., and Durrant-Whyte, H. F. Inertial navigation systems for mobile robots. IEEE Transactions on Robotics and Automation 11, 3 (June 1995), 328–342.

[5] Bavier, A., Moniz, A., and Peterson, L. Predicting MPEG execution times. In Proceedings of ACM SIGMETRICS'98, Madison, WI (June 1998), pp. 131–140.

[6] Bendat, J., and Piersol, A. Random Data Analysis and Measurement Procedures, second ed. John Wiley & Sons, 1985.

[7] Box, G. P., Jenkins, G. M., and Reinsel, G. C. Time Series Analysis: Forecasting and Control, third ed. Prentice Hall, 1994.

[8] Card, S. K., Moran, T. P., and Newell, A. The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, 1983.

[9] Choi, K., Dantu, K., Cheng, W., and Pedram, M. Frame-based dynamic voltage and frequency scaling for an MPEG decoder. In Proceedings of the 2002 IEEE/ACM International Conference on Computer-aided Design (CAD'02), San Jose, CA (November 2002), pp. 732–737.

[10] Crossbow Technology Inc. http://www.xbow.com.

[11] Davis, M., King, S., Good, N., and Sarvas, R. From context to content: Leveraging context to infer media metadata. In Proceedings of the 12th Annual ACM International Conference on Multimedia (MM'04), New York, NY (October 2004), pp. 188–195.


[12] de Lara, E., Wallach, D., and Zwaenepoel, W. Puppeteer: Component-based adaptation for mobile computing. In Proceedings of the 3rd USENIX Symposium on Internet Technologies and Systems (USITS'01), San Francisco, CA (March 2001), pp. 159–170.

[13] Dellaert, F., Fox, D., Burgard, W., and Thrun, S. Monte Carlo localization for mobile robots. In Proceedings of the 1999 IEEE International Conference on Robotics and Automation (ICRA'99), Detroit, MI (May 1999), pp. 1322–1328.

[14] Deluo GPS WAAS. http://www.deluoelectronics.com/.

[15] Devore, J. L. Probability and Statistics for Engineering and the Sciences, fifth ed. Brooks/Cole, 1999.

[16] Ellis, C. The case for higher-level power management. In Proceedings of the 7th IEEE Workshop on Hot Topics in Operating Systems (HotOS-VII), Rio Rico, AZ (March 1999), pp. 162–167.

[17] Ellis, D. P. W., and Lee, K. Minimal-impact audio-based personal archives. In Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experience (CARPE'04), New York, NY (October 2004), pp. 39–47.

[18] Fan, J., Gao, Y., and Luo, H. Multi-level annotation of natural scenes using dominant image components and semantic concepts. In Proceedings of the 12th Annual ACM International Conference on Multimedia (MM'04), New York, NY (October 2004), pp. 540–547.

[19] Feng, H., Shi, R., and Chua, T. A bootstrapping framework for annotating and retrieving WWW images. In Proceedings of the 12th Annual ACM International Conference on Multimedia (MM'04), New York, NY (October 2004), pp. 960–967.

[20] FFmpeg 0.4.8. http://ffmpeg.sourceforge.net/index.php.

[21] Finkenzeller, K. RFID Handbook: Fundamentals and Applications in Contactless Smart Cards and Identification, second ed. John Wiley & Sons, 2003.

[22] Fishkin, K., Jiang, B., Philipose, M., and Roy, S. I sense a disturbance in the force: Long-range detection of interactions with RFID-tagged objects. In Proceedings of the 6th International Conference on Ubiquitous Computing (UbiComp'04), Nottingham, England (September 2004), pp. 268–282.

[23] Flautner, K., and Mudge, T. Vertigo: Automatic performance-setting for Linux. In Proceedings of the 5th USENIX Symposium on Operating Systems Design and Implementation (OSDI'02), Boston, MA (December 2002), pp. 105–116.

[24] Flautner, K., Reinhardt, S., and Mudge, T. Automatic performance-setting for dynamic voltage scaling. In Proceedings of the 7th ACM International Conference on Mobile Computing and Networking (MobiCom'01), Rome, Italy (July 2001), pp. 260–271.


[25] Fleischmann, M. LongRun power management: Dynamic power management for Crusoe processors. Tech. rep., Transmeta Corporation, 2001.

[26] Flinn, J., de Lara, E., Satyanarayanan, M., Wallach, D., and Zwaenepoel, W. Reducing the energy usage of office applications. In Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms (Middleware 2001), Heidelberg, Germany (November 2001).

[27] Flinn, J., and Satyanarayanan, M. Energy-aware adaptation for mobile applications. In Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP'99), Charleston, SC (December 1999), pp. 48–63.

[28] Foley, J. D., van Dam, A., Feiner, S. K., and Hughes, J. F. Computer Graphics: Principles and Practice in C, second ed. Addison-Wesley Professional, 1995.

[29] Galileo. http://en.wikipedia.org/wiki/Galileo_positioning_system.

[30] Ganger, G. R., Worthington, B. L., and Patt, Y. N. The DiskSim simulation environment version 2.0 reference manual.

[31] Gemmell, J., Bell, G., Lueder, R., Drucker, S., and Wong, C. MyLifeBits: Fulfilling the Memex vision. In Proceedings of the 10th Annual ACM International Conference on Multimedia (MM'02), Juan Les Pins, France (December 2002), pp. 235–238.

[32] Gemmell, J., Williams, L., Wood, K., Lueder, R., and Bell, G. Passive capture and ensuing issues for a personal lifetime store. In Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experience (CARPE'04), New York, NY (October 2004), pp. 48–55.

[33] Find the latitude and longitude of any US address. http://www.geocoder.us.

[34] Goyal, P., Guo, X., and Vin, H. M. A hierarchical CPU scheduler for multimedia operating systems. In Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation (OSDI'96), Seattle, WA (October 1996), pp. 107–122.

[35] Why modernize GPS? http://www.gps.oma.be/gb/moderngb ok css.htm.

[36] GpsDrive 2.09. http://www.gpsdrive.cc/.

[37] Grimm, R. System support for pervasive applications. PhD thesis, University of Washington, Department of Computer Science and Engineering, December 2002.

[38] Grunwald, D., Levis, P., Farkas, K. I., Morrey, C. B., III, and Neufeld, M. Policies for dynamic clock scheduling. In Proceedings of the 4th USENIX Symposium on Operating Systems Design and Implementation (OSDI'00), San Diego, CA (October 2000), pp. 73–86.


[39] Gurumurthi, S., Sivasubramaniam, A., Kandemir, M., and Franke, H. DRPM: Dynamic speed control for power management in server class disks. In Proceedings of the 30th Annual IEEE International Symposium on Computer Architecture (ISCA'03), San Diego, CA (June 2003).

[40] Gurumurthi, S., Sivasubramaniam, A., Kandemir, M., and Franke, H. Reducing disk power consumption in servers. IEEE Computer: Special Issue on Power-aware and Temperature-aware Computing 36, 12 (December 2003), 59–66.

[41] Hahnel, D., Burgard, W., Fox, D., Fishkin, K., and Philipose, M. Mapping and localization with RFID technology. In Proceedings of the 2004 IEEE International Conference on Robotics and Automation (ICRA'04), New Orleans, LA (April 2004), pp. 1015–1020.

[42] Hahnel, D., Burgard, W., Fox, D., and Thrun, S. An efficient FastSLAM algorithm for generating maps of large-scale cyclic environments from raw laser range measurements. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'03), Las Vegas, NV (October 2003), pp. 206–211.

[43] Harter, A., Hopper, A., Steggles, P., Ward, A., and Webster, P. The anatomy of a context-aware application. In Proceedings of the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom'99), Seattle, WA (August 1999), pp. 59–68.

[44] Hightower, J., and Borriello, G. Location systems for ubiquitous computing. Computer 34, 8 (August 2001), 57–66.

[45] Hightower, J., Want, R., and Borriello, G. SpotON: An indoor 3D location sensing technology based on RF signal strength. Tech. Rep. 00-02-02, University of Washington, 2000.

[46] Hill, J., and Culler, D. Mica: A wireless platform for deeply embedded networks. IEEE Micro 22, 6 (November/December 2002), 12–24.

[47] Jin, R., Chai, J. Y., and Si, L. Effective automatic image annotation via a coherent language model and active learning. In Proceedings of the 12th Annual ACM International Conference on Multimedia (MM'04), New York, NY (October 2004), pp. 892–899.

[48] Johanson, B., Fox, A., and Winograd, T. The Interactive Workspaces project: Experiences with ubiquitous computing rooms. IEEE Pervasive Computing 1, 2 (2002).

[49] Julier, S. J., and Uhlmann, J. K. A new extension of the Kalman filter to nonlinear systems. In Proceedings of the SPIE 11th International Symposium on Aerospace/Defense Sensing, Simulation and Controls (AeroSense'97), Orlando, FL (April 1997), pp. 182–193.


[50] Govil, K., Chan, E., and Wasserman, H. Comparing algorithms for dynamic speed-setting of a low-power CPU. In Proceedings of the 1st ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom'95), Berkeley, CA (November 1995), pp. 13–25.

[51] Kindberg, T., et al. People, places, things: Web presence for the real world. Mobile Networks 7, 5 (October 2002).

[52] Li, B., and Goh, K. Confidence-based dynamic ensemble for image annotation and semantics discovery. In Proceedings of the 11th Annual ACM International Conference on Multimedia (MM'03), Berkeley, CA (November 2003), pp. 195–206.

[53] Liu, X., Corner, M., and Shenoy, P. SEVA: Sensor-enhanced video annotation. In Proceedings of the 13th Annual ACM Conference on Multimedia (MM'05), Singapore (November 2005), pp. 618–627.

[54] Liu, X., Corner, M., and Shenoy, P. Ferret: RFID localization for pervasive multimedia. In Proceedings of the 8th International Conference on Ubiquitous Computing (UbiComp'06), Orange County, CA (September 2006).

[55] Liu, X., Shenoy, P., and Corner, M. Chameleon: Application controlled power management with performance isolation. Tech. Rep. 04-26, University of Massachusetts Amherst, Department of Computer Science, 2004.

[56] Liu, X., Shenoy, P., and Corner, M. Chameleon: Application level power management with performance isolation. In Proceedings of the 13th Annual ACM Conference on Multimedia (MM'05), Singapore (November 2005), pp. 839–847.

[57] Liu, X., Shenoy, P., and Gong, W. A time series-based approach for power management in mobile processors and disks. In Proceedings of the 14th ACM International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV'04), Cork, Ireland (June 2004), pp. 74–79.

[58] Liu, X., Shenoy, P., and Gong, W. A time series-based approach for power management in mobile processors and disks. Tech. Rep. 04-25, University of Massachusetts Amherst, Department of Computer Science, 2004.

[59] Lorch, J. R., and Smith, A. J. Improving dynamic voltage scaling algorithms with PACE. In Proceedings of the 2001 ACM SIGMETRICS Conference, Cambridge, MA (June 2001), pp. 50–61.

[60] Lorch, J. R., and Smith, A. J. Operating system modifications for task-based speed and voltage scheduling. In Proceedings of the 1st ACM/USENIX International Conference on Mobile Systems, Applications, and Services (MobiSys'03), San Francisco, CA (May 2003), pp. 215–229.

[61] Lu, Z., Hein, J., Humphrey, M., Stan, M., Lach, J., and Skadron, K. Control-theoretic dynamic frequency and voltage scaling for multimedia workloads. In Proceedings of the 3rd ACM/IEEE International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES'02), Grenoble, France (October 2002), pp. 156–163.

[62] Lymberopoulos, D., and Savvides, A. XYZ: A motion-enabled, power aware sensor node platform for distributed sensor network applications. In Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN'05), Los Angeles, CA (April 2005).

[63] Mainwaring, A., Polastre, J., Szewczyk, R., Culler, D., and Anderson, J. Wireless sensor networks for habitat monitoring. In Proceedings of the 1st ACM International Workshop on Wireless Sensor Networks and Applications (WSNA'02), Atlanta, GA (September 2002), pp. 88–97.

[64] Mesarina, M., and Turner, Y. Reduced energy decoding of MPEG streams. In Proceedings of the ACM/SPIE Multimedia Computing and Networking Conference (MMCN'02) (January 2002), pp. 73–84.

[65] Mohapatra, S., Cornea, R., Dutt, N., Nicolau, A., and Venkatasubramanian, N. Integrated power management for video streaming to mobile handheld devices. In Proceedings of the 11th ACM International Conference on Multimedia (MM'03), Berkeley, CA (November 2003), pp. 582–591.

[66] MPlayer 0.90. http://www.mplayerhq.hu.

[67] Naaman, M., Harada, S., Wang, Q., Garcia-Molina, H., and Paepcke, A. Context data in geo-referenced digital photo collections. In Proceedings of the 12th Annual ACM International Conference on Multimedia (MM'04), New York, NY (October 2004), pp. 196–203.

[68] Naaman, M., Paepcke, A., and Garcia-Molina, H. From where to what: Metadata sharing for digital photographs with geographic coordinates. In Proceedings of the 10th International Conference on Cooperative Information Systems (CoopIS'03), Catania, Sicily (November 2003), pp. 196–217.

[69] Nack, F., and Putz, W. Designing annotation before it's needed. In Proceedings of the 9th Annual ACM International Conference on Multimedia (MM'01), Ottawa, Canada (September 2001), pp. 251–260.

[70] Ni, L. M., Liu, Y., Lau, Y. C., and Patil, A. P. LANDMARC: Indoor location sensing using active RFID. In Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom'03), Dallas-Fort Worth, TX (March 2003), pp. 407–417.

[71] Noble, B., Satyanarayanan, M., and Price, M. A programming interface for application-aware adaptation in mobile computing. In Proceedings of the 2nd USENIX Symposium on Mobile and Location-Independent Computing (MLICS'95), Ann Arbor, MI (April 1995), pp. 57–66.


[72] Polastre, J., Szewczyk, R., and Culler, D. Telos: Enabling ultra-low power wireless research. In Proceedings of the 4th International Conference on Information Processing in Sensor Networks: Special Track on Platform Tools and Design Methods for Network Embedded Sensors (IPSN/SPOTS) (April 2005).

[73] Pouwelse, J., Langendoen, K., Lagendijk, I., and Sips, H. Power-aware video decoding. In Proceedings of the 22nd Picture Coding Symposium (PCS'01), Seoul, Korea (April 2001), pp. 303–306.

[74] Pouwelse, J., Langendoen, K., and Sips, H. Application-directed voltage scaling. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 11, 5 (October 2003), 812–826.

[75] Priyantha, N. B., Chakraborty, A., and Balakrishnan, H. The Cricket location-support system. In Proceedings of the 6th Annual ACM International Conference on Mobile Computing and Networking (MobiCom'00), Boston, MA (August 2000), pp. 32–43.

[76] Roh, S., Park, J. H., Lee, Y. H., and Choi, H. R. Object recognition of robot using 3D RFID system. In Proceedings of the 2005 International Conference on Control, Automation and Systems (ICCAS'05), Gyeong Gi, Korea (June 2005).

[77] Roman, M., Hess, C., and Campbell, R. Gaia: An OO middleware infrastructure for ubiquitous computing environments. In ECOOP Workshop on Object-Orientation and Operating Systems, Malaga, Spain (June 2002).

[78] Shenoy, P., and Radkov, P. Proxy-assisted power-friendly streaming to mobile devices. In Proceedings of the 2003 Multimedia Computing and Networking Conference (MMCN'03), Santa Clara, CA (January 2003), pp. 177–191.

[79] Simon, D. Optimal State Estimation, first ed. Wiley-Interscience, 2006.

[80] Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., and Jain, R. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 12 (December 2000), 1349–1380.

[81] Smith, A., Balakrishnan, H., Goraczko, M., and Priyantha, N. Tracking moving devices with the Cricket location system. In Proceedings of the 2nd ACM International Conference on Mobile Systems, Applications, and Services (MobiSys'04), Boston, MA (June 2004), pp. 190–202.

[82] Son, D., Yu, C., and Kim, H. Dynamic voltage scaling on MPEG decoding. In Proceedings of the IEEE International Conference on Parallel and Distributed Systems (ICPADS'01), KyongJu City, Korea (June 2001), pp. 633–640.

[83] Stargate gateway by Crossbow. http://www.xbow.com/Products/xscale.htm.

[84] Su, N. M., Park, H., Bostrom, E., Burke, J., Srivastava, M. B., and Estrin, D. Augmenting film and video footage with sensor data. In Proceedings of the 2nd IEEE Annual Conference on Pervasive Computing and Communications (PerCom'04), Orlando, FL (March 2004), pp. 3–12.


[85] Tamai, M., Sun, T., Yasumoto, K., Shibata, N., and Ito, M. Energy-aware video streaming with QoS control for portable computing devices. In Proceedings of the 14th ACM International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV'04), Cork, Ireland (June 2004), pp. 68–73.

[86] Crusoe TM5600 processor data sheet. Transmeta Inc., http://www.transmeta.com.

[87] Toyama, K., Logan, R., and Roseway, A. Geographic location tags on digital images. In Proceedings of the 11th Annual ACM International Conference on Multimedia (MM'03), Berkeley, CA (November 2003), pp. 156–166.

[88] IBM hard disk - Travelstar 40GNX. IBM, http://www.ibm.com.

[89] Want, R. An introduction to RFID technology. IEEE Pervasive Computing 5, 1 (January–March 2006), 25–33.

[90] Want, R., Hopper, A., Falcao, V., and Gibbons, J. The Active Badge location system. ACM Transactions on Information Systems (TOIS) 10, 1 (January 1992), 91–102.

[91] Ward, A., Jones, A., and Hopper, A. A new location technique for the active office. IEEE Personal Communications Magazine 4, 5 (October 1997), 42–47.

[92] Weicker, R. P. Dhrystone: A synthetic systems programming benchmark. Communications of the ACM 27, 10 (October 1984), 1013–1030.

[93] Weiser, M., Welch, B., Demers, A. J., and Shenker, S. Scheduling for reduced CPU energy. In Proceedings of the 1st USENIX Symposium on Operating Systems Design and Implementation (OSDI'94), Monterey, CA (November 1994), pp. 13–23.

[94] Welch, G., and Bishop, G. An introduction to the Kalman filter. Tech. Rep. 95-041, University of North Carolina at Chapel Hill, Department of Computer Science, 1995.

[95] Wu, Q., Juang, P., Martonosi, M., and Clark, D. Formal online methods for voltage/frequency control in multiple clock domain microprocessors. In Proceedings of the 11th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'04), Boston, MA (October 2004).

[96] Yuan, W., and Nahrstedt, K. Energy-efficient soft real-time CPU scheduling for mobile multimedia systems. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP'03), Bolton Landing, NY (October 2003), pp. 149–163.

[97] Yuan, W., and Nahrstedt, K. Practical voltage scaling for mobile multimedia devices. In Proceedings of the 12th ACM International Conference on Multimedia (MM'04), New York, NY (October 2004), pp. 924–931.


[98] Zeng, H., Ellis, C., Lebeck, A., and Vahdat, A. ECOSystem: Managing energy as a first class operating system resource. In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), San Jose, CA (October 2002), pp. 123–132.

[99] Zeng, H., Ellis, C., Lebeck, A., and Vahdat, A. Currentcy: A unifying abstraction for expressing energy management policies. In Proceedings of the 2003 USENIX Annual Technical Conference, San Antonio, TX (June 2003).

[100] Zhang, L., Hu, Y., Li, M., Ma, W., and Zhang, H. Efficient propagation for face annotation in family albums. In Proceedings of the 12th Annual ACM International Conference on Multimedia (MM'04), New York, NY (October 2004), pp. 716–723.
