OddCI-Ginga: A Platform for High Throughput Computing Using Digital TV Receivers

Rostand Costa 1, 2, Diogo Henrique D. Bezerra 1, Diénert A. Vieira 1, Francisco Brasileiro 2, Dênio Mariz Sousa 1, Guido Souza Filho 1

1 Digital Video Application Lab (LAVID), Federal University of Paraíba (UFPB), João Pessoa, PB, Brazil
{rostand, diogoh, dienert, denio, guido}@lavid.ufpb.br

2 Distributed Systems Lab (LSD), Federal University of Campina Grande (UFCG), Campina Grande, PB, Brazil
{rostand, fubica}@dsc.ufcg.edu.br

Abstract — OddCI is a new architecture for distributed computing that is, at the same time, flexible and highly scalable. Previous work has demonstrated the theoretical feasibility of implementing the proposed architecture on a digital television (DTV) network, but without taking practical issues or details into consideration. This paper describes the implementation of a proof of concept for the architecture, called OddCI-Ginga, using a testbed based on DTV receivers compatible with the Brazilian DTV System. Performance tests using real broadcast transmission and the return channel demonstrate the feasibility of the model and its usefulness as a platform for efficient and scalable distributed computing.

Keywords — high throughput computing; bag-of-tasks; distributed computing infrastructures; broadcast networks; digital TV; set-top boxes

I. INTRODUCTION

Parallel computing is a key technology to enable processing the enormous amount of data being generated by an ever-increasing number of sensors, scientific experiments, simulation models and other related data sources. Some data sets are so large that the only feasible way to deal with them in a reasonable time is to break their processing into smaller tasks and run these in parallel on as many processors as one can possibly have access to. This approach to parallel processing has been referred to in the literature as High Throughput Computing (HTC) [1].

Parallelism at this extremely high scale can only be achieved if there is both a very large number of processing units available [2] and a relatively high level of independence among the tasks comprising the parallel application.

Fortunately, many parallel application workloads can be mapped into parallel tasks that can be processed completely independently from each other, forming a class of applications known as “bag-of-tasks” (BoT). The fact that the tasks of a BoT application are totally independent not only makes their scheduling trivial, but also allows faults to be tolerated with a simple retry mechanism that recovers tasks that happen to fail during execution [3]. As a result, BoT applications are less demanding on the quality of service supported by the underlying computational infrastructure.

On the other hand, nowadays it is more and more common to find devices that combine technologies that initially appeared in different contexts. An extreme example of such convergence is a cellular phone that, in addition to receiving and placing calls, is able to capture images and video, access the Internet, and reproduce television broadcasts. These smartphones, together with digital TV receivers, game consoles, tablets, and other similar convergent devices, all of which can be connected to the Internet, form a vast distributed contingent of computing resources able to execute fairly complex applications. This myriad of devices, computationally capable, well connected, and often underused, if properly coordinated and grouped, represents an enormous opportunity for building distributed computing infrastructures (DCI) of unprecedented scale, amenable to efficiently executing large HTC workloads.

However, the throughput obtained when running HTC workloads on a DCI depends not only on the scale that it provides, but also on the overheads associated with its operation. If, on the one hand, the size of the processing pool is the main performance enabler, on the other hand, the coordination effort involved in managing the DCI may represent a limiting factor. To achieve extremely high throughput, it is necessary to operate efficiently at extremely high scale, and also to ensure that the distribution of tasks to processors, the provision of input data to the tasks, and the collection of the results of the distributed processing do not become performance bottlenecks.

In a previous work, Costa et al. proposed a general architecture for assembling processing resources connected through a broadcast channel [4]. By leveraging this feature, the On-demand distributed Computing Infrastructure (OddCI) architecture allows for the on-demand assembling of extremely large DCIs in a flexible and scalable way. The OddCI architecture can be seen as a generalization of the ideas first described by Batista et al. [5], who proposed the utilization of Digital TV (DTV) receivers to build large computational grids. In this work we go a step further and describe the details of the implementation of an OddCI system atop the Brazilian Digital TV System. This implementation constitutes the first proof of concept for the OddCI architecture, and provides evidence of its viability in a real setting, as well as an idea of the performance that can be achieved by the system.

There are several characteristics that make a DTV system an appropriate setting for supporting the implementation of an OddCI system. Firstly, DTV systems possess a reasonably fast, and often underutilized, broadcast channel. Secondly, the number of DTV receivers available worldwide is already large, and will continue to increase in the years to come. Thirdly, DTV receivers offer features that range from improving the image and audio quality of conventional TVs to allowing active interaction of the audience with the broadcast content. For this, the DTV receiver has features typical of a personal computer, such as memory, processor, hard disk, operating system and network connection. Finally, our choice of the Brazilian Digital TV System (SBTVD) is justified by the active role that some of us had in the development of this standard, as well as by the potential scale that can be materialized in Brazil and the other Latin American countries that have adopted the SBTVD, including the whole of South America except Colombia.

The rest of the paper is organized as follows. For the sake of self-containment, in Section II we review the general architecture of OddCI systems. Section III discusses the operation model, design and implementation of OddCI-Ginga, an OddCI instantiation based on the SBTVD. In Section IV we present the metrics of interest and how we conducted the experiments that allowed us to evaluate the performance of OddCI-Ginga. In Section V, we present and analyze the results of these experiments. In Section VI we review related work. Finally, we present our concluding remarks in Section VII.

II. ON-DEMAND DISTRIBUTED COMPUTING INFRASTRUCTURE

The OddCI architecture proposed by Costa et al. consists of a Provider, a Backend, one or more broadcast networks, each containing a broadcast channel and a Controller, and Processing Node Agents (PNA). The latter are programs to be sent to and executed on each of the computational resources accessible by the Controller via its corresponding broadcast network. Furthermore, it is assumed that each computational resource has access to a bidirectional channel, called the direct channel, which connects it with both the Backend and its respective Controller (see Figure 1).

Figure 1. OddCI architecture

A brief description of these components is presented in Table I.

TABLE I. DESCRIPTION OF THE COMPONENTS OF THE ODDCI ARCHITECTURE

Provider: Responsible for creating, managing and destroying OddCI instances according to the clients' requests. The Provider is also responsible for client authentication and for checking the client's authorization to use the resources being requested.

Controller: Responsible for setting up the infrastructure, as instructed by the Provider. It formats and sends, via the broadcast channel, the control messages and PNA images (executables) necessary to build and maintain OddCI instances.

Backend: Responsible for managing the activities of each specific running application: task distribution, input data supply, reception and post-processing of the results generated by the parallel application, etc.

Processing Node Agents (PNA): Responsible for managing the execution of the client application on the computational resource, and for the communication with the Controller and the Backend.

Client: Represents the user who wants to run parallel applications on the OddCI infrastructure. It requests OddCI instances and provides the parallel application, which is transferred to the computing resources by the Controller.

Direct Channel: A communication channel that allows bidirectional communication between specific components of the architecture, normally via an Internet connection.

Broadcast Channel: A unidirectional channel for data transmission from the Controller to the resources. It can be a digital TV channel or a cell phone network, for example.

The OddCI model focuses on computational resources that can be accessed simultaneously through messages delivered in a one-to-many way (broadcast). Such messages can contain data and/or programs. It is assumed that the programs may be automatically started after being received by the resources reached by the message. Large-scale DCIs can be constructed and disassembled on demand on these computational resources. An OddCI instance represents a dynamically built set of computational resources for one Client.

After receiving a valid request from a Client, including the Client's credential, which contains the information needed for authentication and other access control procedures, the Provider decides, considering the currently active OddCI instances and its Controllers' characteristics, whether it is possible to meet the requirements contained in the request. If the request is accepted, the Provider then commands the most appropriate Controllers to create a new OddCI instance.

The OddCI instances are built by the Controllers using the computational resources that are connected to them via some broadcast communication technology, able to simultaneously distribute messages to all connected nodes.

Resources of an instance run the PNA component and are discovered and started up via a Wakeup Message (WM) transmitted by the Controller. This message contains, among other things, the executable of the PNA and the Client’s application image.
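For illustration only, the payload of a WM can be pictured as the following structure (a sketch in plain Java; the field names are assumptions on our part, and the actual on-air encoding is the DSM-CC object carousel described in Section III):

// Hypothetical in-memory view of a Wakeup Message (WM). In practice the
// message is serialized into a DSM-CC object carousel for broadcast.
public class WakeupMessage {
    public String msgType;     // "WAKEUP" (or "RESET" for tear-down messages)
    public String instanceId;  // identifies the OddCI instance being created
    public byte[] pnaImage;    // executable image of the PNA
    public byte[] appImage;    // the Client's application image
}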



An active PNA regularly sends probes (heartbeat messages) to the Controller to report its state and to indicate the instance it currently integrates, if any.
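A minimal sketch of such a probe, assuming an HTTP endpoint on the Controller (the URL and parameter names below are illustrative, not part of the architecture):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical heartbeat sender; the endpoint and field names are assumptions.
public class HeartbeatProbe {
    private final URL controller;

    public HeartbeatProbe(String controllerUrl) throws Exception {
        this.controller = new URL(controllerUrl); // e.g. "http://controller/hb" (assumed)
    }

    // Reports the PNA state and the instance it currently integrates, if any.
    public void send(String pnaId, String state, String instanceId) throws Exception {
        String body = "pna_id=" + pnaId + "&state=" + state
                + "&instance_id=" + (instanceId == null ? "" : instanceId);
        HttpURLConnection conn = (HttpURLConnection) controller.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        OutputStream out = conn.getOutputStream();
        out.write(body.getBytes("US-ASCII"));
        out.close();
        conn.getResponseCode(); // drain the reply; the Controller may piggyback control data
        conn.disconnect();
    }
}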

In the next section we discuss how the OddCI architecture can be implemented using the resources and technologies of a DTV system.

III. ODDCI-GINGA

To instantiate the OddCI architecture on a DTV system, it is necessary to implement the four software components discussed in the previous section, i.e., the Provider, the Controller, the Backend, and the PNA.

The Provider's role is played by a TV network that produces and broadcasts national programming for several affiliated stations. The Controller's role can be played by a DTV station or local repeater, which holds the concession of the TV channel and which sends, along with its programming, control messages (data) to the tuned DTV receivers. The Backend can be deployed as a set of servers under the control of the Client, or of a third party, possibly using resources from a public cloud provider. Each PNA is an application that runs on the DTV middleware present in the DTV receiver, which for the SBTVD is called Ginga [6]. The PNA uses the TCP/IP stack and the return channel (residential Internet) as a direct channel for communication with the Controller and the Backend. Figure 2 identifies the technologies currently available in a DTV system [7] that can be used, and how they are associated with the elements of the generic OddCI architecture.

Figure 2. OddCI architecture and the current technologies available on a DTV system for its implementation

A. OddCI-Ginga Operation Model

OddCI-Ginga works as follows. Initially, the Client asks the Provider to create an OddCI instance, providing the application in a format that allows it to be executed on DTV receivers. The Provider validates the Client and the application image and accepts (or not) the request based on historical estimates of the audience and of the receivers connected at the time the request is received. Then, the Controller formats and sends a control message to be transmitted via the broadcast channel, which includes an implementation of the PNA component compatible with the DTV receivers. The station, after validating the Controller and the control message, uses its transmitter to send the control message to all receivers tuned to its channel.

Data is sent using the process for the distribution and execution of interactive applications described in the Brazilian DTV standard, as follows. Initially, the image content of the application is serialized as an object carousel in the DSM-CC standard [8], where the folders and files related to the application are encoded in sessions and encapsulated in an MPEG-2 Transport Stream (TS) [9]. After coding the data, the application properties, such as name, type, class and other main characteristics, are defined and structured in the Application Information Table (AIT), and encapsulated into TS packets. After the preparation of the data, the Program Map Table (PMT) is configured with the identification (PID) used by the Object Carousel data TS and the PID of the AIT. The necessary descriptors are also added to signal the existence of a data stream for a particular program or service. Finally, the data stream is multiplexed with the other streams of audio, video and data, and the broadcast station transmits the combined stream.

All DTV receivers tuned to the station's frequency will receive the control message. Each receiver checks the data stream and performs a processing routine that verifies the integrity of the received content. Data is written following the structure of the folders and files configured in the AIT. At the end of the processing, the middleware is notified of the existence of a new application by passing information about the name, type and execution mode of the application to the application manager, which selects the presentation module (engine) for that type of application: NCL/Lua [10] or Java DTV [6], for example.

The DTV receiver receives the control message that encapsulates an application (in this case, the PNA) with the AUTOSTART flag activated, which immediately triggers the execution of the PNA. Then, the PNA uses the return channel of the receiver (direct channel) to signal to the Controller its availability to participate in the instance, and, if accepted, loads the Client's application for execution. From this point on, the Client’s application uses the direct channel to obtain tasks and to send results directly to the Backend.

B. The Processing Node Agent

We implemented the PNA using the programming models provided by Ginga (Java and NCL). According to the OddCI architecture, an active PNA has two states, idle and busy [4]. In the idle state, the PNA is not part of an OddCI instance, but it permanently monitors the broadcast channel for WMs that may have been sent by its associated Controller. When a WM is received, the PNA changes from idle to busy, loads and executes the application image, and stores the identification of the instance in which it was included. The PNA remains in this state until either the application finishes its execution, or it receives a Reset Message from its associated Controller carrying the identification of the instance to which the PNA belongs. At this point, the PNA releases the resources used by the application and returns to the idle state, restarting the cycle. In both states, the PNA keeps regularly sending heartbeat messages that contain the PNA's state and the identification of the instance to which it belongs, if any. A snippet of a PNA implemented in the Java DTV language containing its main algorithm is shown in Figure 3.


public class PNA {
    // Provided elsewhere by the PNA runtime: 'control' wraps the broadcast
    // channel and the direct channel to the Controller; 'vm' runs the
    // application image; 'finalize' is an optional cleanup hook.
    protected int state = PNAState.IDLE;
    protected String pna_id = "EMPTY";

    public void startXlet() throws XletStateChangeException {
        while (true) {
            // Heartbeat: report the current state; the Controller assigns
            // the PNA an identifier on first contact.
            pna_id = control.sendHBI(pna_id, state);
            if (!pna_id.equals("EMPTY") && control.hasMessage()) {
                switch (state) {
                case PNAState.IDLE:
                    if (control.get(PNAAttribute.MSGTYPE).equals("WAKEUP")) {
                        // Wakeup Message: load and run the application image.
                        state = PNAState.BUSY;
                        vm = new VMThread(control.get(PNAAttribute.APPIMAGE));
                        vm.start();
                    }
                    break;
                case PNAState.BUSY:
                    if (!vm.isAlive()) {
                        // The application finished by itself.
                        state = PNAState.IDLE;
                    } else if (control.get(PNAAttribute.MSGTYPE).equals("RESET")) {
                        // Reset Message: stop the application and clean up.
                        vm.stopped = true;
                        if (finalize != null) {
                            finalize.run();
                        }
                        // the PNA is free again
                        state = PNAState.IDLE;
                    }
                    break;
                }
            }
        }
    }
}

Figure 3. Main algorithm of the PNA in the Java DTV language

C. Provider, Controller, and Backend Components

The Controller and the Backend were also developed in complete and fully functional form, adhering to all the basic events described in the sequence diagram of Figure 4. This enabled us to observe the dynamics of the OddCI system, with the Controller interacting with the PNA through the exchange of control messages to create and remove instances, including the transmission of application images.

Figure 4. OddCI-Ginga sequence diagram. The depicted events, in order: (1) Client configures the OddCI instance; (2) Provider asks for instance creation; (3) Controller broadcasts the PNA; (4) PNA starts the probe thread; (5) PNA ready notification; (6) PNA accepted, (6.1) PNA BUSY; (7) PNA starts the App; (8) probe thread sends heartbeats; (9) App asks for a task; (10) App receives the task; (11) App performs the task; (12) App sends the results; (13) App exits; (14) PNA IDLE.

To validate the Backend, a parallel application called Primes was created, with two modules: a client module, developed as an application that runs on the DTV receiver, and a server module, which runs on a conventional computer representing the Backend component. The objective of the client module is to process the tasks received from the server module, which are characterized by two numbers representing a discrete numerical range. The client module must calculate all the primes existing in the range and return the result to the server module. At this point, the client module requests a new task and the cycle restarts.
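As an illustration of the client module's computation, the sketch below counts the primes in a received range with a sieve of Eratosthenes (plain Java; the real module runs on Ginga and returns the primes themselves to the server module rather than just a count):

import java.util.BitSet;

// Illustrative worker for a Primes task: a task is a range {II, IF}.
public class PrimesTask {
    public static int countPrimes(int lo, int hi) {
        // Sieve of Eratosthenes over [2, hi]; the BitSet marks composites.
        BitSet composite = new BitSet(hi + 1);
        for (int p = 2; (long) p * p <= hi; p++) {
            if (!composite.get(p)) {
                for (long m = (long) p * p; m <= hi; m += p) {
                    composite.set((int) m);
                }
            }
        }
        int count = 0;
        for (int n = Math.max(lo, 2); n <= hi; n++) {
            if (!composite.get(n)) count++;
        }
        return count;
    }
}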

IV. PERFORMANCE EVALUATION

In order to conduct a preliminary study of the performance of an OddCI-Ginga system, we built a working prototype that exercised all communication flows between the PNA and the Controller (via the broadcast and direct channels) and the exchange of information between the parallel application and its respective Backend (via the direct channel).

The test environment involves a complete DTV system for the transmission and reception of the signal following the SBTVD standard, available at the Digital Video Application Lab of the Federal University of Paraíba (Lavid/UFPB), consisting of: a carousel generator, a multiplexer, a modulator, a transmitter (low power, for local use), and some DTV receivers running the Ginga middleware. The use of a real environment yields results that are reliable parameters for assessing the performance of the platform at a larger scale.

The following subsections detail the metrics used to assess the system's performance, the experiments executed, and the configuration of the environment used in the tests.

A. Performance Metrics

Three specific characteristics of an OddCI-Ginga system were considered to measure the efficiency of the implemented system: a) the speed with which the Controller triggers commands via the broadcast channel; b) the ability of the return channel to receive tasks to be processed and to transmit results; and, finally, c) the potential of the DTV receiver for processing parallel applications. Accordingly, the following performance metrics were measured:

• Time for preparing the PNA (σ), which measures the speed of OddCI-Ginga in creating instances, and considers the time involved in the Controller-PNA-Backend communication to start the execution of the application; it is computed as:

σ = w + d + r + a    (1)

where w is the time for preparation and transmission of the WM (containing the executable image of the PNA) from the Controller to the receiver using the broadcast channel (data carousel), d is the processing time of the data carousel and the loading of the PNA image in the receiver, r is the time for sending the data request from the PNA to the Backend, and a is the response time from the Backend to the PNA.

• Average processing time (δ), which measures the average processing time of the several tasks of a parallel application executed by the DTV receiver; it accounts for the time elapsed from the instant the receiver starts processing a task received from the Backend (P1) until the instant it completes the computation of the task (P2).

B. Description of the Experiments

The first experiment aimed at measuring the time for preparing the PNA (σ) using applications of various sizes. To this end, WMs were formatted with sizes of 119, 500, 1000, 1500, 2500, 3500 and 7500 KB.

The clocks of the Controller and the Backend were synchronized using NTP (Network Time Protocol), and the measurement process involved the following steps. The Controller sends a WM to the receiver at time T1. When the message arrives at the receiver, it is interpreted and the PNA is loaded and executed. At the beginning of its execution, the PNA marks the time instant tA and requests the data from the Backend using the return channel. Upon receiving the request, the Backend logs the time instant tB and sends this timestamp to the PNA in the response to the request. Upon receiving the response, the PNA records the time instant tC. With the instants tA, tB and tC, the PNA calculates the instant T2 = tB - ((tC - tA) / 2), based on the classical algorithm by Cristian [11]. The total transmission time is then computed as T = T2 - T1.
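In code, the PNA-side computation reduces to a few lines; a sketch under the timestamp definitions above:

// Estimating the WM arrival instant on a receiver whose clock is not
// synchronized with the Controller/Backend (Cristian's algorithm [11]).
// tA, tC: local PNA clock; tB: Backend clock (NTP-synced with the Controller).
public static long estimateT2(long tA, long tB, long tC) {
    long halfRoundTrip = (tC - tA) / 2;
    return tB - halfRoundTrip;  // T2 = tB - (tC - tA)/2, in the Backend time base
}
// Total transmission time: T = T2 - T1, with T1 logged by the Controller.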

We also performed experiments to assess the average processing time (δ) of the DTV receivers. One experiment used the Primes application with interval limits of various magnitudes; the sizes chosen were equal to 10^n, with n varying from 1 to 6. In the case of the Primes application, a task is a numerical range {II, IF}, and the metric δ is calculated by dividing the size of the range by the total processing time:

δ = (IF − II) / (P2 − P1)    (2)

Although the Primes application represents a real example (prime factorization is widely used in science in general) and is especially suited to the purpose of this stage of the experiment, namely to stress the computing ability of the receiver, we also performed tests with a well-known bioinformatics application. The application selected for the tests was BLAST (Basic Local Alignment Search Tool) [12], a bioinformatics algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search compares a query sequence with a library or database of sequences, and identifies library sequences that resemble the query sequence above a certain threshold. It is available for download at the U.S. National Center for Biotechnology Information (NCBI) website [13]. In this case, the application was implemented in the C++ language as a resident application, which runs directly on the operating system of the DTV receiver.

For the sake of comparison, Primes and BLAST were also executed on a reference personal computer. We also conducted a broader assessment of the capacity of DTV receivers considering, besides the reference PC, resources provided by public cloud computing providers. To this end, we did a cross-analysis using the results of a benchmarking exercise executed by Neustar/Webmetrics [14]. The programs used in the benchmark were ported to the available DTV receivers, so that their performance could be assessed against the same benchmark. Again, these programs were implemented in C++ and run as resident applications.

Unless stated otherwise, all experiments were replicated as many times as necessary to obtain average values with confidence intervals that have a maximum error of 5% at a confidence level of 95%.

C. Configuration of the Testbed

The testbed comprised the following components (their configuration is detailed in Table II):
a) a TV station, used by the Controller for generating tasks in the data carousel, and for the multiplexing, modulation and transmission of WMs;
b) DTV receivers, to receive and process the WMs broadcast over the air by the TV station;
c) two versions of the PNA (one in Ginga-NCL/Lua and another in Ginga-J), both implementing the behavior described in Section III.B;
d) one client application in two versions (Ginga-NCL/Lua and Ginga-J), which implements the “sieve of Eratosthenes” to find prime numbers [15];
e) two resident applications implemented in C++: a bioinformatics algorithm and a benchmarking program;
f) the Provider, Controller and Backend, developed as network services and executed on conventional PCs.

TABLE II. DETAILS OF THE COMPONENTS OF THE ODDCI-GINGA TEST ENVIRONMENT

TV Station: Linear ISMOD modulator (ISDB-T Digital Modulator, ISCHIO series) and Linear carousel generator and multiplexer / DommXstream, installed on an Intel Xeon X3430 2.4 GHz server with a Dektec card, 3 GB RAM, Gigabit Ethernet network card and 32-bit Ubuntu Server OS v10.04; maximum rate of the data carousel set to 1 Mbps.

Digital TV Receivers: Low-end: Proview XPS-1000 (firmware 1.6.70, RCASoft Ginga, STMicroelectronics STi7001 tri-core (audio, video, data) processor at 266 MHz, 256 MB DDR RAM, 32 MB flash memory, Fast Ethernet (10/100) network card, and an adaptation of the STLinux operating system). High-end: PVR based on an Intel CE 3100 processor at 1.06 GHz, with 256 MB DDR RAM, Fast Ethernet (10/100) network card, and an adaptation of the Linux operating system.

Processing Node Agent (PNA): Version A: in Ginga-NCL/Lua script, with a 116.5 KB image (executable). Version B: in Ginga-J, with a 20.3 KB image.

Client Application: Primes application, which implements the “sieve of Eratosthenes” algorithm to find prime numbers up to a threshold value, implemented in two versions, Lua script and Ginga-J; the resulting executables had sizes of 2.6 KB and 10.8 KB, respectively. Bioinformatics application: using a cross compiler, we ported the NCBI Toolkit (blastall and blastcl3 programs) to the low-end DTV receiver used. Bitcurrent benchmarking: we implemented the same algorithms for CPU-intensive tasks (1,000,000 sin and sum operations) and I/O-intensive tasks (sequential search for a record in a 500,000-record file of 128 MB) described in Bitcurrent's benchmarking methodology, for both DTV receivers used in our tests (low-end and high-end).

Provider, Controller and Backend: implemented as network services running on Apache Tomcat v6.0.33, using the HTTP protocol for message exchange, the Grails Web framework with Groovy scripts, and MySQL v5.1 for storing tasks and results in the Backend. For the Provider, a Web interface was created for clients to request the creation of instances and for communication with the data carousel. These components ran on a computer with an Intel Xeon X3363 2.83 GHz, 512 MB RAM, Gigabit Ethernet network card and 32-bit Ubuntu Server OS v9.10.

Reference Personal Computer: for performance comparison with the DTV receivers, a notebook with an Intel Core i3-2310M 2.1 GHz, 4 GB RAM, Fast Ethernet network card and 64-bit Ubuntu OS v11.10.

V. RESULTS AND ANALYSIS

The average times for preparing the PNA for various image sizes are shown in Figure 5. It can be observed that the preparation time grows linearly with the size of the image, as expected. This shows that the preparation time can be estimated reliably, since it depends essentially on the image size and has little dependence on other factors.

Figure 5. Time to load the PNA

The results of the tests using the Primes application are shown in Figure 6 (log scale), which shows that the low-end DTV receiver is, on average, 27 times slower than the reference PC. Another observation is that the application on the low-end DTV receiver runs out of memory when trying to process numbers above 10^6.

Figure 6. Comparison of the runtime for the Primes application

In the case of the bioinformatics application BLAST, the tests represented varied workloads and were carried out using the BLASTALL and BLASTCL3 programs. They were divided into three categories: local processing with small databases (#1-9), local processing with large databases (#10-12), and remote processing (#13-15). A total of 15 experiments were run on the low-end DTV receiver in both “use mode”, with a TV channel tuned, and “standby mode”, with the middleware in an inactive state. The same tests were reproduced on the reference PC. The results for the first two categories are shown in Table III and discussed next, while the results for the last category are presented in Table IV and discussed later.

TABLE III. PROCESSING TIME OBTAINED IN THE EXECUTION OF THE BLASTALL PROGRAM ON THE DTV RECEIVER AND THE PC

#Test | DTV Receiver, In Use (s) | DTV Receiver, Standby (s) | PC with x86 Linux (s)
1 | 3.338 | 1.356 | 0.556
2 | 2.102 | 1.333 | 0.041
3 | 5.185 | 3.208 | 0.076
4 | 0.179 | 0.117 | 0.015
5 | 0.173 | 0.116 | 0.016
6 | 0.175 | 0.116 | 0.013
7 | 1.026 | 0.612 | 0.293
8 | 0.944 | 0.610 | 0.023
9 | 1.642 | 0.090 | 0.025
10 | 0.177 | 0.118 | 0.015
11 | 9314.247 | 6315.410 | 213.770
12 | 38858.298 | 26973.262 | 747.372

We used the BLASTALL program with different input parameters. We computed the average performance decrease for the samples presented in Table III with a confidence level of 90%. The average performance of the DTV receiver, when compared to the PC, was 20.6 times worse, with a maximum error of 10%. The results also show that the average performance reduction when comparing the execution times for the DTV receiver in standby and in normal use is 1.65, with a maximum error of 17%.

Figure 7. Comparison of access time to a web page

The tests to assess the performance of the direct channel are shown in Figure 7 (vertical logarithmic scale). Using a simple program that exercises the direct channel to fetch larger input data from the Backend, tests were conducted accessing Web pages of 100, 500, 1,000, 1,500, 2,500, 3,500, 5,000, and 7,000 KB over a standard home access connection of 1 Mbps.

The reference computer accessed the different pages without problems, whereas the application on the low-end DTV receiver had memory problems with pages larger than 2,500 KB. Thus, for comparison, we projected the times for pages above 2,500 KB on the DTV receiver using linear regression. The time on the receiver is on average 19-fold higher than on the reference computer. The difference is smaller than in the previous experiment because it involves the time of data traffic on the link, which is constant in both cases.
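The projection amounts to a straightforward least-squares line fit of load time against page size; a minimal sketch, assuming the measured (size, time) pairs as input:

// Ordinary least-squares fit of time = a * size + b over the measured points.
public static double[] fitLine(double[] size, double[] time) {
    int n = size.length;
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int i = 0; i < n; i++) {
        sx += size[i];
        sy += time[i];
        sxx += size[i] * size[i];
        sxy += size[i] * time[i];
    }
    double a = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    double b = (sy - a * sx) / n;
    return new double[] { a, b }; // projected time for a page of size s: a * s + b
}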

We also tested the capacity of the low-end DTV receiver to communicate appropriately with the Backend, using the direct channel to obtain tasks and to send results with the BLASTCL3 program. This program submits a sequence to be searched for in the NCBI databases, receives the result and writes it to a file. As the search processing runs remotely, the most relevant aspect of this experiment is how the receiver handles data over the network connections. In this case, as can be verified in Table IV, there is no significant performance difference between the PC and the receiver. Occasional NCBI server load or network traffic can explain test #13, in which the DTV receiver took less time than the PC.

TABLE IV. PROCESSING TIME OBTAINED IN THE EXECUTION OF THE BLASTCL3 PROGRAM ON THE STB AND THE PC

#Test | DTV Receiver, In Use (s) | DTV Receiver, Standby (s) | PC with x86 Linux (s)
13 | 79.285 | 77.389 | 114.240
14 | 84.916 | 89.880 | 82.158
15 | 449.189 | 436.174 | 445.050

We also conducted tests to compare the performance of the DTV receivers with that of virtual machines offered by public cloud computing providers. Our comparison uses the benchmarking conducted by the Bitcurrent team [16]. We performed the same Bitcurrent tests of CPU- and I/O-intensive tasks on both the low-end and the high-end DTV receivers. The results are consolidated in Table V (averages in seconds, with a confidence level of 95% and maximum error of 2%).

TABLE V. DTV RECEIVER RESULTS

Test | Low-End DTV Receiver | High-End DTV Receiver
CPU Test | 2.55 | 0.19
I/O Test | 12.90 | 1.48

The complete results of Bitcurrent's benchmarking are consolidated in a report [17]. Table VI presents a summary of these results.

TABLE VI. BITCURRENT'S CLOUD BENCHMARKING RESULTS (IN SECONDS)

Test | Salesforce | Google | Rackspace | Amazon | Terremark
1-pixel GIF | 0.11 | 0.25 | 0.18 | 0.23 | 0.23
2-MByte GIF | 0.50 | 1.97 | 3.25 | 4.41 | 5.00
CPU Test | 8.13 | 1.63 | 2.16 | 10.03 | 3.75
I/O Test | 6.26 | 2.03 | 3.33 | 19.46 | 12.35

As can be seen, both DTV receivers performed similarly to, or better than, conventional IaaS and PaaS platforms, especially in the CPU test. Although these tests were conducted with the devices idle, we also tested the DTV receivers during normal operation (while the user is watching TV). The observed performance loss was 33% for the low-end and 15% for the high-end DTV receiver, but the results remained compatible with those found on the cloud providers. We note that this is a superficial comparison, because we do not have confidence intervals for Bitcurrent's benchmarking.

The evaluation of the low-end receiver's processing capability showed that it is on average 27 times slower than a typical personal computer. As the tests involved low-cost receivers, representing the worst case, and the observed trend is that the capacity of DTV receivers will keep improving, it is expected that this ratio will become more favorable, as seen in the tests with the high-end receiver. However, the fact that the receiver is slower is not necessarily a problem, since the potential scale of a DTV network is on the order of at least a thousand times greater than that of a traditional computational grid.

The memory limitations of the DTV receivers observed in our experiments should guide the definition of the profile of applications suitable for OddCI instances. As BoT tasks can be very small, it is perfectly feasible to find applications whose key requirement is processing. There are cases in which memory use is small and constant (i.e., allocation does not grow over time), as in pattern matching applications. Thus, adjusting the granularity of the tasks of a BoT application can allow an appropriate use of this infrastructure.

In the experiments, it was possible to verify that the DTV broadcast channel was effective for the purposes of OddCI-Ginga. An SBTVD channel has a total bandwidth between ~18 and ~21 Mbit/s, depending on the configuration [18][19]. Experience shows that the broadcast band may have a residual 1-4 Mbit/s available for the data carousel, considering the throughput required for an H.264-coded Full HD video flow and a safety margin. At 1 Mbit/s, loading a PNA and a typical BoT application takes approximately 10 s. With a ~7 MB PNA, it takes no more than 70 s, an acceptable time since the PNA is loaded by the receiver at the first WM received and ignored in subsequent ones.
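To make the arithmetic explicit (our own back-of-the-envelope estimate, assuming 1 MB = 10^6 bytes): a ~7 MB PNA image amounts to roughly 7 × 8 = 56 Mbit, i.e. about 56 s of raw transmission time at 1 Mbit/s; the ~70 s figure above thus leaves a margin for DSM-CC section and carousel cycling overheads.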

VI. RELATED WORK

Considering the use of unconventional devices for the construction of infrastructures to run HTC applications, we highlight three systems: the BOINCOID project [20], the Folding@home project [21], and the TVGrid system [5].

The BOINCOID project was created in 2008 and also addresses the use of unconventional devices for running HTC applications with a focus on systems based on the Android operating system. Its main objective is porting the BOINC platform [22] to Android. This initiative enables the participation of a huge number of devices based on Android in volunteer computing projects.

Folding@home is a distributed computing project designed to perform molecular simulations to understand protein folding, malformations and diseases. It uses the idle time of video game consoles connected to the Internet to obtain PetaFLOPS-scale performance [23]. This experience confirms the trend of using digital devices for computing and shows the high scalability that such devices can offer.

The TVGrid [5] system proposes the use of DTV receivers to run BoT applications. The OddCI architecture is a generalization of the TVGrid idea, and OddCI-Ginga is a concrete instantiation of an OddCI system atop a DTV system.

Neill et al. [24] investigate the use of a heterogeneous system architecture that combines a traditional computer cluster with a broadband network of embedded set-top boxes to run parallel applications. Their experimental results also confirm that a broadband network of embedded processors is a promising new platform for a variety of computationally intensive and data-intensive grid applications, already able to deliver significant performance gains for some classes of Open MPI applications. Fedak et al. [25] built DSL-Lab, a platform for experimenting with distributed computing over broadband-connected low-power devices, which offers researchers the possibility to experiment under conditions close to what is usually available with domestic connections to the Internet.

Although volunteer computing [26] has proven suitable for providing extremely high throughput, this can only be achieved if significant effort is devoted to convincing volunteers to join the system, which, in turn, depends to a greater or lesser extent on factors such as the merit and public appeal of the application, the amount of media coverage received, explicit advertisement campaigns in popular media, viral marketing, incentives to volunteers and other public relations activities [27].

Scalability in deployment is achieved by making this task extremely simple and by having the resource owner actively involved in the system setup. If, on the one hand, the involvement of the user allows deployment on millions of resources to be cost-effectively attained, on the other hand, it makes the growth of the infrastructure slow and outside the control of the volunteer computing infrastructure provider.

Using an approach like OddCI opens up the opportunity for different business models in which the resources are not offered for free. Non-conventional devices like digital TV receivers and mobile phones can be grouped and coordinated at an appropriate scale by TV stations and telephone system operators, respectively. Incentive measures already existing in these contexts, as well as billing and charging channels, can be fully reused, reducing or eliminating the transactional cost for the Provider.

For instance, in the OddCI-Ginga scenario, the STB owner can be rewarded in the form of pay-per-view credits, a reward with higher added value than the payment of very small amounts of money. By purchasing bulk lots of pay-per-view credits, the Provider increases the sales of the TV station operators, helping to cover the costs of the TV network structure.

VII. CONCLUSIONS

In this paper we presented the first proof of concept for the OddCI architecture proposed by Costa et al. [4]. The implementation of the OddCI-Ginga system atop a DTV network, the setup of a real testbed, and the evaluation of its performance showed not only that the approach is feasible, but also that it is promising.

In particular, this research sought to obtain field measurements of the DTV potential for OddCI systems. It was thus possible to confirm: the linear behavior of the broadcast transmission time of control messages, which depends on the size of the transmitted information and not on the number of devices reached; the adequacy of the direct channel for obtaining tasks and returning results to the Backend component; and the actual potential of DTV receivers for parallel processing, considering a low-cost (entry-level), worst-case device.

Although not the focus of this work, it is known that allocating and keeping active a large pool of receivers (thousands or millions) is not trivial. First, the devices are prone to failures and, more importantly, to voluntary disconnections, and are therefore volatile. Second, the broadcast channel is efficient but unidirectional, demanding the complementary use of the direct channel for the operation of OddCI instances, which must be optimized. In this sense, the focus of further work in progress is to investigate how the unpredictability and volatility involved in the coordination and use of the computational resources of broadcast networks, once properly identified and treated, can be circumvented by applying prediction techniques and compensatory algorithms.

Another important issue is related to the energy consumption of devices associated with cable, satellite and other pay-TV services. The problem is that, as currently deployed, they can be always running, even when people think they have turned them off [28]. Future stages of our research will be devoted to investigating the FLOPS/watt ratio of set-top boxes and PCs, and how the energy wasted during the stand-by periods of TV receivers can be reused by OddCI systems.

REFERENCES

[1] M. Litzkow, M. Livny and M. Mutka, "Condor - A Hunter of Idle Workstations," Proceedings of the 8th International Conference on Distributed Computing Systems, IEEE Computer Society Press, 1988.
[2] R. Costa, F. Brasileiro, G. Lemos and D. Mariz, "Analyzing the impact of elasticity on the profit of cloud computing providers," Proceedings of the 2nd International Workshop on Cloud Computing and Scientific Applications (CCSA'12), in conjunction with CCGRID'12, IEEE/ACM, May 2012.
[3] W. Cirne, D. Paranhos, L. Costa, E. Santos-Neto, F. Brasileiro et al., "Running bag-of-tasks applications on computational grids: the MyGrid approach," Proceedings of the International Conference on Parallel Processing (ICPP'03), IEEE, 2003, pp. 407.
[4] R. Costa, F. Brasileiro, G. Lemos and D. Mariz, "OddCI: On-Demand Distributed Computing Infrastructure," Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, ACM, 2009, pp. 1-10, doi:10.1145/1646468.1646478.
[5] C. Batista, T. Araujo, D. Anjos, M. Castro, F. Brasileiro and G. Lemos, "TVGrid: A grid architecture to use the idle resources on a digital TV network," Proceedings of the 7th IEEE International Symposium on Cluster Computing and the Grid, IEEE Computer Society, Washington, DC, USA, 2007, pp. 823-828, doi:10.1109/CCGRID.2007.117.
[6] R. Kulesza, J. Lima, Á. Guedes, L. Junior, S. Meira and G. Lemos, "Ginga-J - An open java-based application environment for interactive digital television services," Springer Boston, 2011, vol. 365, pp. 34-49, doi:10.1007/978-3-642-24418-6_3.
[7] S. Morris and A. S. Chaigneau, "Interactive TV Standards: A Guide to MHP, OCAP, and JavaTV," Focal Press, 2005.
[8] ISO/IEC TR 13818-6, "Information technology: Generic coding of moving pictures and associated audio information. Part 6: Extensions for DSM-CC," 1998.
[9] ISO/IEC 13818-2, MPEG Committee International Standard, "Generic coding of moving pictures and associated audio information: video," 1994.
[10] L. F. Soares, R. F. Rodrigues and M. F. Moreno, "Ginga-NCL: the Declarative Environment of the Brazilian Digital TV System," Journal of the Brazilian Computer Society, vol. 12, no. 4, March 2007.
[11] F. Cristian, "Probabilistic clock synchronization," Distributed Computing, vol. 3, no. 3, 1989, pp. 146-158, doi:10.1007/BF01784024.
[12] S. F. Altschul, W. Gish, W. Miller, E. W. Myers and D. J. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, vol. 215, no. 3, 1990, pp. 403-410, doi:10.1006/jmbi.1990.9999.
[13] NCBI, "BLAST." Available at: http://blast.ncbi.nlm.nih.gov/Blast.cgi. June 2012.
[14] Neustar Webmetrics. Available at: http://www.webmetrics.com. June 2012.
[15] ARM, "Sieve of Eratosthenes," 2011. Available at: http://www.keil.com/benchmarks/sieve.asp. June 2012.
[16] Bitcurrent. Available at: http://www.bitcurrent.com/. June 2012.
[17] Bitcurrent Team, "The performance of clouds." Available at: http://www.webmetrics.com/landingpage/bitcurrentcloud2/index.html. June 2012.
[18] ABNT NBR 15606-2, "Digital terrestrial television - Data coding and transmission specification for digital broadcasting. Part 2: Ginga-NCL for fixed and mobile receivers - XML application language for application coding," ABNT/CEE-85 Digital Television Committee, June 2011, pp. 1-288.
[19] ABNT NBR 15606-4, "Digital terrestrial television - Data coding and transmission specification for digital broadcasting. Part 4: Ginga-J - The environment for the execution of procedural applications," ABNT/CEE-85 Digital Television Committee, May 2010, pp. 1-90.
[20] Boincoid, "An Android Port of the Boinc Platform," 2011. Available at: http://boincoid.sourceforge.net. June 2012.
[21] Folding@home, "PS3 FAQ," 2011. Available at: http://folding.stanford.edu/English/FAQ-PS3. June 2012.
[22] D. P. Anderson, "BOINC: A system for public-resource computing and storage," Proceedings of the Fifth IEEE/ACM International Workshop on Grid Computing (GRID'04), 2004, pp. 4-10, doi:10.1109/GRID.2004.14.
[23] Folding@home, "Petaflop barrier crossed." Available at: http://blog.us.playstation.com/2007/09/25/foldinghome-petaflop-barrier-crossed-update/. June 2012.
[24] R. Neill, L. P. Carloni, A. Shabarshin, V. Sigaev and S. Tcherepanov, "Embedded Processor Virtualization for Broadband Grid Computing," Proceedings of the 12th IEEE/ACM International Conference on Grid Computing (Grid'11), 2011.
[25] G. Fedak, J. P. Gelas, T. Hérault, V. Iniesta, D. Kondo, L. Lefèvre, P. Malécot, L. Nussbaum, A. Rezmerita and O. Richard, "DSL-Lab: a Platform to Experiment on Domestic Broadband Internet," Proceedings of the International Symposium on Parallel and Distributed Computing (ISPDC'10), Istanbul, Turkey, 2010.
[26] D. P. Anderson, J. Cobb, E. Korpela, M. Lebofsky and D. Werthimer, "SETI@home - an experiment in public-resource computing," Communications of the ACM, vol. 45, no. 11, November 2002, pp. 56-61, doi:10.1145/581571.581573.
[27] D. P. Anderson and G. Fedak, "The Computational and Storage Potential of Volunteer Computing," Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), Singapore, May 2006, pp. 73-80.
[28] Bloomberg, "Stop Cable Boxes From Draining Nation's Power Supply." Available at: http://www.bloomberg.com/news/2011-07-11/stop-cable-boxes-from-draining-the-nation-s-power-supply-view.html. June 2012.