OddCI-Ginga: A Platform for High Throughput Computing Using
Digital TV Receivers
Rostand Costa1, 2, Diogo Henrique D. Bezerra1, Diénert A. Vieira1,
Francisco Brasileiro2, Dênio Mariz Sousa1, Guido Souza Filho1
1Digital Video Application Lab (LAVID)
Federal University of Paraíba (UFPB)
João Pessoa, PB – Brazil
{rostand, diogoh, dienert, denio, guido}@lavid.ufpb.br
2 Distributed Systems Lab (LSD)
Federal University of Campina Grande (UFCG)
Campina Grande, PB – Brazil
{rostand, fubica}@dsc.ufcg.edu.br
Abstract — OddCI is a new architecture for distributed
computing that is, at the same time, flexible and highly scalable.
Previous works have demonstrated the theoretical feasibility of
implementing the proposed architecture on a digital television
(DTV) network, but without taking into consideration any
practical issues or details. This paper describes the
implementation of a proof of concept for the architecture,
called OddCI-Ginga, using a testbed based on DTV receivers
compatible with the Brazilian DTV System. Performance tests
using real broadcast transmission and the return channel
demonstrate the feasibility of the model and its usefulness as a platform for efficient and scalable distributed computing.
Keywords — High throughput computing; bag-of-tasks; distributed computing infrastructures; broadcast networks; digital TV; set-top-boxes
I. INTRODUCTION
Parallel computing is a key technology to enable processing the enormous amounts of data being generated by an ever-increasing number of sensors, scientific experiments, simulation models and other data sources. Some data sets are so large that the only feasible way to deal with them in a reasonable time is to break their processing into smaller tasks, and run these in parallel on as many processors as one can possibly have access to. This approach to parallel processing has been referred to in the literature as High Throughput Computing (HTC) [1].
Parallelism at this extremely high scale can only be achieved if there are both a very large number of processing units available [2], as well as a relatively high independence level of the tasks comprising the parallel application.
Fortunately, many parallel application workloads can be mapped into parallel tasks that can be processed completely independently from each other, forming a class of applications known as “bag-of-tasks” (BoT). The fact that the tasks of a BoT application are totally independent, not only makes their scheduling trivial, but also allows for faults to be tolerated with the use of a simple retry mechanism to recover tasks that eventually fail during execution [3]. As a result, BoT applications are less demanding on the quality of service supported by the underlying computational infrastructure.
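The retry mechanism for BoT tasks mentioned above can be sketched as follows. This is an illustrative toy dispatcher, not code from the paper: the `Worker` interface, integer task ids and the re-enqueue policy are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy bag-of-tasks dispatcher: tasks are independent, so scheduling is a
// simple queue, and a failed task is just re-enqueued and retried.
public class BotScheduler {
    interface Worker { boolean execute(int taskId); } // true = success

    // Runs every task until it succeeds; returns the total number of attempts.
    static int runAll(int taskCount, Worker worker) {
        Queue<Integer> pending = new ArrayDeque<>();
        for (int t = 0; t < taskCount; t++) pending.add(t);
        int attempts = 0;
        while (!pending.isEmpty()) {
            int task = pending.poll();
            attempts++;
            if (!worker.execute(task)) pending.add(task); // simple retry
        }
        return attempts;
    }
}
```

Because tasks are independent, the retry policy needs no coordination beyond the queue itself, which is exactly why BoT workloads tolerate unreliable resources well.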
On the other hand, nowadays, it is more and more common to find devices that combine technologies that appeared initially in different contexts. An extreme example
of such convergence is a cellular phone that, in addition to receiving and placing calls, is able to capture images and video, access the Internet, and reproduce television broadcasts. These smartphones, together with digital TV receivers, game consoles, tablets, and other similar convergent devices, all with the ability to connect to the Internet, form a vast distributed contingent of computing resources able to execute fairly complex applications. This myriad of computationally capable, well-connected, and often underused devices, if properly coordinated and grouped, represents an enormous opportunity for building distributed computing infrastructures (DCI) of unprecedented scale, amenable to efficiently executing large HTC workloads.
However, the throughput obtained when running HTC workloads on a DCI depends not only on the scale that it provides, but also on the overheads associated with its operation. While the size of the processing pool is the main performance enabler, the coordination effort involved in managing the DCI may represent a limiting factor. To achieve extremely high throughput, it is necessary to operate efficiently at extremely high scale, and also to ensure that the distribution of tasks to processors, the provision of input data to the tasks, and the collection of the results of the distributed processing do not become performance bottlenecks.
In a previous work, Costa et al. proposed a general architecture for assembling processing resources connected through a broadcast channel [4]. By leveraging this feature, the On-demand distributed Computing Infrastructure (OddCI) architecture allows for the on-demand assembly of extremely large DCIs in a flexible and scalable way. The OddCI architecture can be seen as a generalization of the ideas first described by Batista et al. [5], who proposed the utilization of Digital TV (DTV) receivers to build large computational grids. In this work we go a step further and describe the details of the implementation of an OddCI system atop the Brazilian Digital TV System. This implementation constitutes the first proof of concept for the OddCI architecture, and provides evidence of its viability in a real setting, as well as an idea of the performance that can be achieved by the system.
There are several characteristics that make a DTV system an appropriate setting for supporting the implementation of an OddCI system. Firstly, DTV systems possess a reasonably fast, and often underutilized, broadcast channel. Secondly, the number of DTV receivers available worldwide is already large, and will continue to increase in the years to come. Thirdly, DTV receivers offer features that range from improving the image and audio quality of conventional TVs to allowing active interaction of the audience with the broadcast content. For this, the DTV receiver has features typical of a personal computer, such as memory, processor, hard disk, operating system and network connection. On the other hand, our choice of the Brazilian Digital TV System (SBTVD) is justified by the active role that some of us had in the development of this standard, as well as by the scale potential that can be materialized in Brazil and the other Latin American countries that have adopted the SBTVD, which comprise the whole of South America except Colombia.
The rest of the paper is organized as follows. For the sake of self-containment, Section II reviews the general architecture of OddCI systems. Section III discusses the operation model, modeling and implementation of OddCI-Ginga, an OddCI instantiation based on the SBTVD. In Section IV we present the metrics of interest and describe how we conducted the experiments that allowed us to evaluate the performance of OddCI-Ginga. In Section V, we present and analyze the results of these experiments. In Section VI we review related work. Finally, we present our concluding remarks in Section VII.
II. ON-DEMAND DISTRIBUTED COMPUTING INFRASTRUCTURE
The OddCI architecture proposed by Costa et al. consists of a Provider, a Backend, one or more broadcast networks, each containing a broadcast channel and a Controller, and Processing Node Agents (PNA). The latter are programs to be sent and executed in each of the computational resources accessible by the Controller via its corresponding broadcast network. Furthermore, it is assumed that each computational resource has access to a bidirectional channel, called direct channel, which connects it with both the Backend, as well as with its respective Controller (see Figure 1).
Figure 1. OddCI architecture
A brief description of these components is presented in Table I.
TABLE I. DESCRIPTION OF THE COMPONENTS OF THE ODDCI ARCHITECTURE

Provider: It is responsible for creating, managing and destroying OddCI instances according to the client’s requests. The Provider is also responsible for client authentication and for checking the client’s authorization to use the resources being requested.

Controller: It is responsible for setting up the infrastructure, as instructed by the Provider. It formats and sends, via the broadcast channel, the control messages and PNA images (executables) necessary to build and maintain OddCI instances.

Backend: It is responsible for managing the activities of each specific running application: task distribution, input data supply, reception and post-processing of the results generated by the parallel application, etc.

Processing Node Agents (PNA): These are responsible for managing the execution of the client application on the computational resource, and for the communication with the Controller and the Backend.

Client: It represents the user who wants to run parallel applications on the OddCI infrastructure. It requests OddCI instances and provides the parallel application, which will be transferred to the computing resources by the Controller.

Direct Channel: A communication channel that allows bidirectional communication between specific components of the architecture, normally via an Internet connection.

Broadcast Channel: A unidirectional channel for data transmission from the Controller to the resources. It can be a digital TV channel or a cell phone network, for example.
The OddCI model focuses on computational resources
that can be accessed simultaneously through messages delivered in a one-to-many way (broadcast). Such messages can contain data and/or programs. It is assumed that the programs may be automatically started after being received by resources reached by the message. Large scale DCIs can be constructed and disassembled on demand on these computational resources. An OddCI instance represents a dynamically built set of computational resources for one Client.
After receiving a valid request from a Client, including the Client’s credential, which contains the information needed for authentication and other access control procedures, the Provider decides, considering the currently active OddCI instances and its Controllers’ characteristics, whether it is possible to meet the requirements contained in the request. If the request is accepted, the Provider commands the most appropriate Controllers to create a new OddCI instance.
The OddCI instances are built by the Controllers using the computational resources that are connected to them via some broadcast communication technology, able to simultaneously distribute messages to all connected nodes.
Resources of an instance run the PNA component and are discovered and started up via a Wakeup Message (WM) transmitted by the Controller. This message contains, among other things, the executable of the PNA and the Client’s application image.
An active PNA regularly sends probes (heartbeat messages) to the Controller to report its state and to indicate the instance it currently belongs to, if any.
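As an illustration, such a heartbeat might carry the PNA identifier, its state, and the instance id. The textual field layout below is an assumption for the sketch, not the OddCI wire format:

```java
// Toy formatter for the periodic heartbeat a PNA sends over the direct
// channel. Field order and separators here are illustrative assumptions.
public class Heartbeat {
    static String format(String pnaId, String state, String instanceId) {
        // instanceId is absent while the PNA is idle; encode that as "-"
        return "HB;" + pnaId + ";" + state + ";"
                + (instanceId == null ? "-" : instanceId);
    }
}
```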
In the next section we discuss how the OddCI architecture can be implemented using the resources and technologies of a DTV system.
III. ODDCI-GINGA
To instantiate the OddCI architecture on a DTV, it is necessary to implement the four software components discussed in the previous section, i.e. the Provider, the Controller, the Backend, and the PNA.
The Provider’s role is played by a TV network that produces and broadcasts national programming for several affiliated stations. The Controller’s role can be played by a station or local DTV repeater, which holds the concession of the TV channel and through which control messages (data) are sent, along with its programming, to the tuned DTV receivers. The Backend can be deployed as a set of servers under the control of the Client, or of a third party, possibly using resources from a public cloud provider. Each PNA is an application that runs on the DTV middleware present in the DTV receiver, which for the SBTVD is called Ginga [6]. The PNA uses the TCP/IP stack and the return channel (residential Internet) as a direct channel for communication with the Controller and the Backend. Figure 2 identifies the technologies currently available in a DTV system [7] that can be used, and how they are associated with the elements of the generic OddCI architecture.
Figure 2. OddCI architecture and the current technologies available on
DTV system for its implementation
A. OddCI-Ginga Operation Model
OddCI-Ginga works as follows. Initially, the Client asks the Provider to create an OddCI instance, providing the application in a format that allows it to be executed on DTV receivers. The Provider validates the Client and the application’s image and accepts (or not) the request based on historical audience estimates and on the receivers connected at the time the request is received. Then, the Controller formats and sends a control message to be transmitted via the broadcast channel, which includes an implementation of the PNA component compatible with the DTV receivers. The station, after validating the Controller and the control message, uses its transmitter to send the control message to all receivers tuned to its channel.
The process of distributing data and executing interactive applications, as described in the Brazilian DTV standard, works as follows: initially, the image content of the application is serialized as an object carousel in the DSM-CC standard [8], where the folders and files related to the application are encoded in sessions and encapsulated in an MPEG-2 Transport Stream (TS) [9]. After coding the data, the application properties, such as name, type, class and other main characteristics, are defined and structured in the Application Information Table (AIT), and encapsulated into TS packets. After the preparation of the data, the Program Map Table (PMT) is configured with the identification (PID) used by the Object Carousel data TS and the PID of the AIT. The necessary descriptors are also added to signal the existence of a data stream for a particular program or service. Finally, the data stream is multiplexed with the other streams of audio, video and data, and the broadcast station transmits the combined stream.
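The signalling steps above can be summarized with a toy model. The PID values and the helper below are hypothetical; the real DSM-CC/AIT/PMT encoding is defined by the MPEG-2 and SBTVD standards, not by this sketch:

```java
// Schematic model of the PMT configuration step: the PMT maps each
// elementary stream of a service to its PID. Here we only collect the PIDs
// of the object-carousel data stream, the AIT, and the audio/video streams.
public class BroadcastSignalling {
    static int[] buildPmtPids(int carouselPid, int aitPid, int[] avPids) {
        int[] pids = new int[2 + avPids.length];
        pids[0] = carouselPid; // DSM-CC object carousel (application files)
        pids[1] = aitPid;      // AIT (application name, type, AUTOSTART flag)
        System.arraycopy(avPids, 0, pids, 2, avPids.length);
        return pids;
    }
}
```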
All DTV receivers tuned to the station’s frequency will receive the control message. Each receiver checks the data stream and performs a processing routine that verifies the integrity of the content received. Data is written following the structure of the folders and files configured in the AIT. At the end of the processing, the middleware is notified of the existence of a new application by passing information about the name, type and mode of execution of the application to the application manager, which selects the presentation module (engine) for the type of application: NCL/Lua [10] or Java DTV [6], for example.
The DTV receiver receives the control message that encapsulates an application (in this case, the PNA) with the AUTOSTART flag activated, which immediately triggers the execution of the PNA. Then, the PNA uses the return channel of the receiver (direct channel) to signal to the Controller its availability to participate in the instance, and, if accepted, loads the Client's application for execution. From this point on, the Client’s application uses the direct channel to obtain tasks and to send results directly to the Backend.
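The task cycle run by the Client’s application over the direct channel can be sketched as follows. `TaskSource` stands in for the Backend’s interface; its method names are assumptions made for this illustration:

```java
// Toy version of the task loop: fetch a task from the Backend over the
// direct channel, run it, send the result straight back, repeat until the
// Backend has no more tasks.
public class TaskLoop {
    interface TaskSource {
        String fetchTask();              // null when no tasks remain
        void sendResult(String result);
    }
    interface TaskRunner { String run(String task); }

    static int process(TaskSource backend, TaskRunner app) {
        int done = 0;
        String task;
        while ((task = backend.fetchTask()) != null) {
            backend.sendResult(app.run(task)); // result goes to the Backend
            done++;
        }
        return done;
    }
}
```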
B. The Processing Node Agent
We implemented the PNA using the programming models provided by Ginga (Java and NCL). According to the OddCI architecture, an active PNA has two states, idle and busy [4]. In the idle state, the PNA is not integrating an OddCI instance but is permanently monitoring the broadcast channel for WMs that may have been sent by its associated Controller. When a WM is received, the PNA changes from idle to busy, loads and executes the application image, and stores the identification of the instance in which it was included. The PNA will remain in this state until either the application finishes its execution, or it receives a Reset Message from its associated Controller carrying the identification of the instance to which the PNA belongs. At this time, the PNA releases the resources used by the application and returns to the idle state, restarting the cycle. In both states, the PNA keeps regularly sending heartbeat messages that contain the PNA’s state and the identification of the instance to which it belongs, if any. A snippet of a PNA implemented in the Java DTV language, containing its main algorithm, is shown in Figure 3.
public class PNA {
    protected int state = PNAState.IDLE;
    protected String pna_id = "EMPTY";

    public void startXlet() throws XletStateChangeException {
        while (true) {
            pna_id = control.sendHBI(pna_id, state);
            if (!pna_id.equals("EMPTY") && control.hasMessage()) {
                switch (state) {
                    case PNAState.IDLE:
                        if (control.get(PNAAttribute.MSGTYPE).equals("WAKEUP")) {
                            state = PNAState.BUSY;
                            vm = new VMThread(control.get(PNAAttribute.APPIMAGE));
                            vm.start();
                        }
                        break;
                    case PNAState.BUSY:
                        if (!vm.isAlive()) {
                            state = PNAState.IDLE;
                        } else if (control.get(PNAAttribute.MSGTYPE).equals("RESET")) {
                            vm.stopped = true;
                            if (finalize != null) {
                                finalize.run();
                            }
                            // the PNA is free again
                            state = PNAState.IDLE;
                        }
                        break;
                }
            }
        }
    }
}
Figure 3. Main algorithm of the PNA in the Java DTV language
C. Provider, Controller, and Backend Components
The Controller and the Backend have also been developed in a complete and fully functional way, adhering to all basic events described in the sequence diagram in Figure 4. This enabled the observation of the dynamics of the OddCI system, with the Controller interacting with the PNA through the exchange of control messages to create and remove instances, including the transmission of application images.
Figure 4. OddCI-Ginga sequence diagram
To validate the Backend, we created a parallel application, called Primes, with two modules: a client module, developed as an application that runs on the DTV receiver, and a server module, which runs on a conventional computer representing the Backend component. The objective of the client module is to process the tasks received from the server module, which are characterized by two numbers representing a discrete numerical range. The client module calculates all the primes in the range and returns the result to the server module. At this point, the client module requests a new task and the cycle restarts.
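The kernel of such a client module can be written as a sieve-based prime counter over a closed range, in line with the “sieve of Eratosthenes” version used in the testbed (Section IV.C). This standalone sketch is illustrative, not the paper’s actual Ginga code:

```java
// Count the primes in the closed range [lo, hi] with a simple sieve of
// Eratosthenes over [2, hi]. A task {II, IF} maps to countInRange(II, IF).
public class Primes {
    static int countInRange(int lo, int hi) {
        boolean[] composite = new boolean[hi + 1];
        for (int p = 2; (long) p * p <= hi; p++)
            if (!composite[p])
                for (int m = p * p; m <= hi; m += p) composite[m] = true;
        int count = 0;
        for (int n = Math.max(lo, 2); n <= hi; n++)
            if (!composite[n]) count++;
        return count;
    }
}
```

On a memory-constrained receiver, sieving the whole of [2, hi] is what eventually overflows memory for large limits, as the experiments in Section V observe.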
IV. PERFORMANCE EVALUATION
In order to conduct a preliminary study of the performance of an OddCI-Ginga system we have built a working prototype that allowed all communication flows between the PNA and the Controller (via the broadcast and the direct channels) and the exchange of information between the parallel application and its respective Backend (via the direct channel).
The environment setup for the tests involves a complete DTV system for the transmission and reception of the signal following the SBTVD standard, available at the Digital Video Application Lab of the Federal University of Paraíba (Lavid/UFPB), consisting of: carousel generator, multiplexer, modulator, transmitter (low power, for local use), and some DTV receivers running the Ginga middleware. The use of a real environment yields results that are reliable parameters for assessing the performance of the platform on a larger scale.
The following subsections detail the metrics used to assess the system’s performance, the experiments executed, and the configuration of the environment used in the tests.
A. Performance Metrics
Three specific characteristics of an OddCI-Ginga system were considered to measure the efficiency of the implemented system: a) the speed of the Controller to trigger commands via the broadcast channel; b) the ability of the return channel to receive tasks to be processed and transmit results; and, finally c) the potential of the DTV receiver for processing parallel applications. In this sense, the following performance metrics were measured:
Time for preparing the PNA (σ), which measures the speed of OddCI-Ginga in creating instances, and considers the time involved in the Controller-PNA-Backend communication to start the execution of the application; it is computed as:

σ = w + d + r + a (1)

where w is the time for preparing and transmitting the WM (containing the executable image of the PNA) from the Controller to the receiver over the broadcast channel (data carousel), d is the time for processing the data carousel and loading the PNA image in the receiver, r is the time for sending the data request from the PNA to the Backend, and a is the response time from the Backend to the PNA.
Average processing time (δ), which measures the average processing time of the several tasks of a parallel application executed by the DTV receiver; it accounts for the time elapsed from the moment the receiver starts processing a task received from the Backend (P1) until the moment it completes the computation of the task (P2).
B. Description of the Experiments
The first experiment aimed at measuring the time for preparing the PNA (σ), using applications of various sizes. To this end, seven WMs were formatted, with sizes of 119, 500, 1000, 1500, 2500, 3500 and 7500 KB.
The clocks of the Controller and the Backend were synchronized using NTP (Network Time Protocol), and the measurement process involved the following steps. The Controller sends a WM to the receiver at time T1. When the message arrives at the receiver, it is interpreted and the PNA is loaded and executed. At the beginning of its execution, the PNA marks the time instant tA and requests data from the Backend using the return channel. Upon receiving the request, the Backend logs the time instant tB and sends this timestamp to the PNA in the response to the request. Upon receiving the response, the PNA records the time instant tC. With the instants tA, tB and tC, the PNA calculates the instant T2 = tB - ((tC - tA) / 2), based on the classical algorithm by Cristian [11]. The total transmission time is then computed as T = T2 - T1.
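The timestamp arithmetic above reduces to two small formulas, sketched here for clarity:

```java
// Cristian-style estimate: T2 places the PNA's start instant on the
// Backend's clock, assuming the one-way network delay is symmetric.
public class ClockSync {
    // tA, tC: PNA-side request/response instants; tB: Backend-side receipt.
    static double estimateT2(double tA, double tB, double tC) {
        return tB - (tC - tA) / 2.0;
    }
    // Total transmission time T = T2 - T1, with T1 the WM send instant.
    static double transmissionTime(double t1, double t2) {
        return t2 - t1;
    }
}
```

The symmetric-delay assumption is the usual caveat of Cristian’s algorithm: asymmetry between request and response paths biases T2 by half the difference.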
We also performed experiments to assess the average processing time (δ) of the DTV receivers. One experiment used the Primes application with interval limits of various magnitudes; the sizes chosen were equal to 10^n, with n varying from 1 to 6. In the case of the Primes application, the metric δ is calculated by dividing the size of the numerical range that constitutes a task of the application ({II, IF}) by the total processing time:

δ = (IF - II) / (P2 - P1) (2)
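Equation (2) transcribes directly into code, giving the number of candidates tested per unit of time:

```java
// Direct transcription of Equation (2): task throughput of the Primes
// application, i.e. size of the numerical range over elapsed processing time.
public class Throughput {
    static double delta(double ii, double iF, double p1, double p2) {
        return (iF - ii) / (p2 - p1);
    }
}
```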
Although the Primes application represents a real example (prime factorization is widely used in science) and is especially suited to the purpose of this stage of the experiment, namely stressing the computing ability of the receiver, we also performed tests with a well-known bioinformatics application. The selected application was BLAST (Basic Local Alignment Search Tool) [12], a bioinformatics algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search compares a query sequence with a library or database of sequences, and identifies library sequences that resemble the query sequence above a certain threshold. It is available for download at the U.S. National Center for Biotechnology Information (NCBI) website [13]. In this case, the application was implemented in the C++ language as a resident application, which runs directly on the operating system of the DTV receiver.
For the sake of comparison, Primes and BLAST were also executed on a reference personal computer. We also conducted a broader assessment of the capacity of DTV receivers considering, besides the reference PC, resources provided by public cloud computing providers. To this end, we did a cross analysis using the results of a benchmark executed by Neustar/Webmetrics [14]. The programs used in the benchmark were ported to the available DTV receivers, so that their performance could be assessed under the same benchmark. Again, these programs were implemented in C++ and run as resident applications.
Unless stated otherwise, all experiments were replicated as many times as necessary so as to obtain average values with confidence intervals that have a maximum error of 5% and a confidence level of 95%.
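This stopping rule can be sketched as follows. The normal critical value 1.96 is a large-sample approximation; for few replications a Student-t quantile would be used instead:

```java
// Replication stopping rule: keep replicating until the half-width of the
// 95% confidence interval is within maxRelErr (e.g. 0.05) of the mean.
public class ConfInterval {
    static double mean(double[] s) {
        double m = 0;
        for (double v : s) m += v;
        return m / s.length;
    }
    static double halfWidth(double[] s) {
        int n = s.length;
        double m = mean(s), var = 0;
        for (double v : s) var += (v - m) * (v - m);
        var /= (n - 1);                    // sample variance
        return 1.96 * Math.sqrt(var / n);  // normal approximation
    }
    static boolean precise(double[] s, double maxRelErr) {
        return halfWidth(s) <= maxRelErr * mean(s);
    }
}
```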
C. Configuration of the Testbed
The testbed comprised the following components (their configuration is detailed in Table II):
a) a TV station, used by the Controller for generating tasks in the data carousel, and for multiplexing, modulation and transmission of WMs;
b) DTV receivers, to receive and process the WMs broadcast over the air by the TV station;
c) two versions of the PNA (one in Ginga-NCL/Lua and another in Ginga-J), both implementing the behavior described in Section III.B;
d) one client application in two versions (Ginga-NCL/Lua and Ginga-J), which implements the “sieve of Eratosthenes” to find prime numbers [15];
e) two resident applications implemented in C++: a bioinformatics algorithm and a benchmarking program;
f) the Provider, Controller and Backend, developed as network services and executed on conventional PCs.
TABLE II. DETAILS OF THE COMPONENTS OF THE TEST ENVIRONMENT OF THE ODDCI-GINGA

TV Station: Linear Modulator ISMOD (ISDB-T Digital Modulator - Series ISCHIO) and Linear Carousel Generator and Multiplexer / DommXstream (installed on an Intel(R) Xeon(R) x3430 2.4 GHz server with a Dektec card, 3 GB RAM, Gigabit Ethernet network card, and 32-bit Ubuntu Server OS v10.04); maximum rate of the data carousel set to 1 Mbps.

Digital TV Receivers: Low-end: Proview XPS-1000 (firmware 1.6.70, RCASoft Ginga, STMicroelectronics STi7001 processor, tri-core (audio, video, data) at 266 MHz, 256 MB DDR RAM, 32 MB flash memory, Fast Ethernet (10/100) network card, and an adaptation of the STLinux operating system). High-end: PVR based on an Intel CE 3100 processor at 1.06 GHz, 256 MB DDR RAM, Fast Ethernet (10/100) network card, and an adaptation of the Linux operating system.

Processing Node Agent (PNA): Version A: in Ginga-NCL/Lua Script, image (executable) of 116.5 KB. Version B: in Ginga-J, image of 20.3 KB.

Client Application: Primes application, which implements the “sieve of Eratosthenes” algorithm to find prime numbers up to a threshold value. Implemented in two versions, Lua Script and Ginga-J; the resulting executables had sizes of 2.6 KB and 10.8 KB, respectively. Bioinformatics application: using a cross compiler, we ported the NCBI Toolkit (blastall and blastcl3 programs) to the low-end DTV receiver used. Bitcurrent benchmarking: we implemented the same algorithms for CPU-intensive tasks (1,000,000 sin and sum operations) and I/O-intensive tasks (sequential search for a record in a 500,000-record, 128 MB file) described in Bitcurrent’s benchmarking methodology, for both DTV receivers used in our tests (low-end and high-end).

Provider, Controller and Backend: Implemented as network services running on Apache/Tomcat v6.0.33, using the HTTP protocol for message exchange, the Grails Web framework with Groovy scripts, and MySQL v5.1 for storing tasks and results in the Backend. In the case of the Provider, a Web interface was created for clients to request the creation of instances and for communication with the data carousel. These components ran on a computer with an Intel(R) Xeon(R) x3363 2.83 GHz, 512 MB RAM, Gigabit Ethernet network card and 32-bit Ubuntu Server OS v9.10.

Reference Personal Computer: For performance comparison with the DTV receiver, a notebook with an Intel(R) Core(TM) i3-2310M 2.1 GHz, 4 GB RAM, Fast Ethernet network interface card and 64-bit Ubuntu OS v11.10 was used.
V. RESULTS AND ANALYSIS
The average times for preparing the PNA for various sizes of images are shown in Figure 5. It is observed that the preparation time grows linearly with the size of the image, as expected. This analysis shows that the preparation time can be estimated reliably, since it depends only on the image size, and has little dependence on other factors involved.
Figure 5. Time to load the PNA
The results of the tests using the Primes application are shown in Figure 6 (log scale); they demonstrate that the low-end DTV receiver is, on average, 27 times slower than the reference PC. Another observation is that the application on the low-end DTV receiver overflows its memory when we tried to process numbers above 10^6.
Figure 6. Comparison of the runtime for the Primes application
In the case of the bioinformatics application BLAST, the tests represented varied workloads and were carried out using the BLASTALL and BLASTCL3 programs. They were divided into three categories: local processing with small databases (#1-9), local processing with large databases (#10-12), and remote processing (#13-15). A total of 15 experiments were run on the low-end DTV receiver in both “use mode”, with a TV channel tuned, and “standby mode”, with the middleware in an inactive state. The same tests were reproduced on the reference PC. The results for the first two categories are shown in Table III and discussed in the following, while the results for the last category are presented in Table IV and discussed later.
TABLE III. PROCESSING TIME OBTAINED IN THE EXECUTION OF THE BLASTALL PROGRAM IN THE DTV RECEIVER AND THE PC.

#Test | DTV Receiver, In Use (s) | DTV Receiver, Standby (s) | PC with x86 Linux (s)
1  | 3.338     | 1.356     | 0.556
2  | 2.102     | 1.333     | 0.041
3  | 5.185     | 3.208     | 0.076
4  | 0.179     | 0.117     | 0.015
5  | 0.173     | 0.116     | 0.016
6  | 0.175     | 0.116     | 0.013
7  | 1.026     | 0.612     | 0.293
8  | 0.944     | 0.610     | 0.023
9  | 1.642     | 0.090     | 0.025
10 | 0.177     | 0.118     | 0.015
11 | 9314.247  | 6315.410  | 213.770
12 | 38858.298 | 26973.262 | 747.372
We used the BLASTALL program with different input parameters. We computed the average performance decrease for the samples presented in Table III with a confidence level of 90%. The average performance of the DTV receiver, when compared to the PC, was 20.6 times worse, with a maximum error of 10%. The results also show that the average performance reduction when comparing the execution times of the DTV receiver in standby and in normal use is 1.65, with a maximum error of 17%.
Figure 7. Comparison of access time to a web page
The tests to assess the performance of the direct channel are shown in Figure 7 (vertical logarithmic scale). Using a simple program that fetches larger input data from the Backend over the direct channel, tests were conducted to access Web pages of 100, 500, 1,000, 1,500, 2,500, 3,500, 5,000, and 7,000 KB over a standard 1 Mbps home access connection.
The reference computer accessed different pages without problems, whereas the application of the low-end DTV receiver had memory problems with pages larger than 2,500
KB. Thus, for comparison, we calculated the projected time for pages above 2,500 KB on the DTV receiver using linear regression. The receiver’s time is on average 19-fold higher than the reference computer’s. The difference is smaller than in the previous experiment because it includes the time of data traffic on the link, which is constant in both cases.
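The projection can be sketched with an ordinary least-squares fit of load time versus page size; the fitting routine below is a generic illustration, not the paper’s actual analysis code:

```java
// Ordinary least squares for y = slope*x + intercept; used to extrapolate
// load times to page sizes the receiver could not handle directly.
public class LinearFit {
    // returns {slope, intercept}
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; sxy += x[i] * y[i];
        }
        double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        return new double[] { slope, (sy - slope * sx) / n };
    }
    static double predict(double[] coef, double x) {
        return coef[0] * x + coef[1];
    }
}
```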
We also tested the capacity of the low-end DTV receiver to communicate appropriately with the Backend, using the direct channel to obtain tasks and send back results with the BLASTCL3 program. This program submits a sequence to be searched for in the NCBI databases, receives the result, and writes it to a file. As the search itself runs remotely, the most relevant aspect of this experiment is how the receiver handles data over the network connections. In this case, as can be verified in Table IV, there is no significant performance difference between the PC and the receiver. Occasional NCBI server load or network traffic may explain test #13, in which the DTV receiver took less time than the PC.
TABLE IV. PROCESSING TIME OBTAINED IN THE EXECUTION OF THE BLASTCL3 PROGRAM ON THE DTV RECEIVER AND THE PC.

#Test | DTV Receiver, In Use (s) | DTV Receiver, Standby (s) | PC with x86 Linux (s)
13    | 79.285  | 77.389  | 114.240
14    | 84.916  | 89.880  | 82.158
15    | 449.189 | 436.174 | 445.050
We also conducted tests to compare the performance of the DTV receivers with that of virtual machines offered by public cloud computing providers. Our comparison uses the benchmarking conducted by the Bitcurrent team [16]. We performed the same Bitcurrent CPU- and I/O-intensive tests on both the low-end and high-end DTV receivers. The results are consolidated in Table V (averages in seconds, with a confidence level of 95% and a maximum error of 2%).
TABLE V. DTV RECEIVER RESULTS (IN SECONDS)

Test     | Low-End DTV Receiver | High-End DTV Receiver
CPU Test | 2.55  | 0.19
I/O Test | 12.90 | 1.48
The complete results of the Bitcurrent benchmarking are consolidated in a report [17]. Table VI presents a summary of these results.
TABLE VI. BITCURRENT'S CLOUD BENCHMARKING RESULTS (IN SECONDS)

Test        | Salesforce | Google | Rackspace | Amazon | Terremark
1-pixel GIF | 0.11       | 0.25   | 0.18      | 0.23   | 0.23
2-MByte GIF | 0.50       | 1.97   | 3.25      | 4.41   | 5.00
CPU Test    | 8.13       | 1.63   | 2.16      | 10.03  | 3.75
I/O Test    | 6.26       | 2.03   | 3.33      | 19.46  | 12.35
As can be seen, both DTV receivers performed comparably to, or better than, the conventional IaaS and PaaS platforms, especially in the CPU test. Although these tests were conducted with the devices idle, we also tested the DTV receivers during normal operation (i.e., while the user is watching TV). The observed performance loss was 33% for the low-end DTV receiver and 15% for the high-end DTV receiver, but the results remained compatible with those found for the cloud providers. We note that this is a superficial comparison, because we do not have confidence intervals for Bitcurrent's benchmarking.
The evaluation of the low-end receiver's processing capability showed that it is on average 27 times slower than a typical personal computer. As the tests involved low-cost receivers, representing the worst case, and the capacity of DTV receivers is expected to keep improving, this ratio should become more favorable, as the tests with the high-end receiver suggest. However, the fact that the receiver is slower is not necessarily a problem, since the potential scale of a DTV network is at least thousands of times greater than that of a traditional computational grid.
The memory limitations of the DTV receivers observed in our experiments should be used to define the profile of applications suitable for OddCI instances. As BoT application tasks can be made very small, it is perfectly feasible to find applications whose key requirement is processing power. There are cases, such as pattern matching applications, in which memory use is small and constant (i.e., allocation does not grow over time). Thus, adjusting the granularity of BoT application tasks can allow an appropriate use of this infrastructure.
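One way to make that granularity adjustment concrete is to size tasks so that a task's input fits the receiver's memory budget; a sketch with hypothetical figures (the 2,500 KB budget echoes the page-size limit observed above, the item counts are invented):

```python
def task_granularity(total_items, item_kb, mem_budget_kb):
    """Items per task so a task's input fits the memory budget,
    and the resulting number of tasks in the bag."""
    per_task = max(1, mem_budget_kb // item_kb)
    n_tasks = -(-total_items // per_task)  # ceiling division
    return per_task, n_tasks

# E.g. 10,000 sequences of 64 KB each under a 2,500 KB budget:
per_task, n_tasks = task_granularity(10_000, 64, 2_500)
```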
In the experiments, it was possible to verify that the DTV broadcast channel was effective for the OddCI-Ginga purposes. An SBTVD channel has a total bandwidth between ~18 and ~21 Mbit/s, depending on the configuration [18][19]. Experience shows that the broadcast band may have a residual 1-4 Mbit/s available for the data carousel, considering the throughput required for an H.264-coded Full HD video stream and a safety margin. With 1 Mbit/s, loading a PNA plus a typical BoT application takes approximately 10 s. A ~7 MB PNA takes no more than 70 s, an acceptable time, since the PNA is loaded by the receiver when the first WM is received and ignored in subsequent ones.
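The carousel timings above follow directly from payload size over residual bandwidth; a back-of-the-envelope sketch (the values are the paper's ballpark figures, the helper function itself is ours):

```python
def carousel_time(payload_mbytes, residual_mbit_s=1.0):
    """Seconds to push a payload once through the data carousel."""
    return payload_mbytes * 8 / residual_mbit_s

small = carousel_time(1.2)  # PNA + typical BoT app (~1.2 MB): ~10 s
large = carousel_time(7.0)  # ~7 MB PNA: 56 s, within the 70 s bound
```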
VI. RELATED WORK
Considering the use of unconventional devices for the construction of infrastructures to run HTC applications, we highlight three systems: the BOINCOID project [20], the Folding@home project [21], and the TVGrid system [5].
The BOINCOID project was created in 2008 and also addresses the use of unconventional devices for running HTC applications with a focus on systems based on the Android operating system. Its main objective is porting the BOINC platform [22] to Android. This initiative enables the participation of a huge number of devices based on Android in volunteer computing projects.
Folding@home is a distributed computing project designed to perform molecular simulations to understand protein folding, misfolding, and related diseases. It uses the idle time of video game consoles connected to the Internet to achieve PetaFLOPS-scale performance [23]. This experience confirms the trend of using digital devices and shows the high scalability such devices can offer.
The TVGrid [5] system proposes the use of DTV receivers to run BoT applications. The OddCI architecture is a generalization of the TVGrid idea, and OddCI-Ginga is a
concrete instantiation of an OddCI system atop a DTV system.
Neill et al. [24] investigate the use of a heterogeneous system architecture that combines a traditional computer cluster with a broadband network of embedded set-top boxes to run parallel applications. Their experimental results also confirm that a broadband network of embedded processors is a promising new platform for a variety of computation- and data-intensive grid applications, already able to deliver significant performance gains for some classes of Open MPI applications. Fedak et al. [25] built DSL-Lab, a platform for experimenting with distributed computing over broadband-connected low-power devices, which lets researchers run experiments under conditions close to those of domestic Internet connections.
Although volunteer computing [26] has proven suitable for providing extremely high throughput, this can only be achieved if significant effort is devoted to convincing volunteers to join the system, which in turn depends, to a greater or lesser extent, on factors such as the merit and public appeal of the application, the amount of media coverage received, explicit advertisement campaigns in popular media, viral marketing, incentives to volunteers, and other public relations activities [27].
Scalability in deployment is achieved by making this task extremely simple and by having the resource owners actively involved in the system setup. If, on one side, the involvement of the users allows deployment across millions of resources to be attained cost-effectively, on the other side it makes the growth of the infrastructure slow and outside the control of the volunteer computing infrastructure provider.
Using an approach like OddCI opens up the opportunity for different business models in which the resources are not offered for free. In the case of non-conventional devices such as digital TV receivers and mobile phones, they can be grouped and coordinated at an appropriate scale by TV stations and telephone operators, respectively. Incentive mechanisms already existing in these contexts, as well as billing and charging channels, can be fully reused, reducing or eliminating the transactional cost for the Provider.
For instance, in the OddCI-Ginga scenario, the STB owner can be rewarded in the form of pay-per-view credits, a reward with higher added value than the payment of very small amounts of money. By purchasing bulk lots of pay-per-view credits, the Provider increases the sales of TV station operators, helping to cover the costs of the TV network infrastructure.
VII. CONCLUSIONS
In this paper we present the first proof of concept for the OddCI architecture proposed by Costa et al. [4]. The implementation of the OddCI-Ginga system atop a DTV network, the setup of a real testbed, and the evaluation of its performance showed not only that this approach is feasible, but also that it is promising.
In particular, this research attempted to obtain field measurements of the DTV potential for OddCI systems. It was thus possible to confirm: the linear behavior of the broadcast transmission time of control messages, which depends on the size of the transmitted information and not on the number of devices reached; the adequacy of the direct channel for obtaining tasks and returning results to the Backend component; and the actual potential of DTV receivers for parallel processing, considering a low-cost (entry-level) worst-case device.
Although not the focus of this work, it is known that allocating and keeping active a large pool of receivers (thousands or millions) is not trivial. First, the devices are prone to failure and, more importantly, to voluntary disconnections, and are therefore volatile. Second, the broadcast channel is efficient but unidirectional, demanding the complementary use of the direct channel for the operation of OddCI instances, which must be optimized. In this sense, the focus of further work in progress is to investigate how the unpredictability and volatility involved in the coordination and use of the computational resources of broadcast networks, once properly identified and treated, can be circumvented by applying prediction techniques and compensatory algorithms.
Another important issue relates to the energy consumption of devices associated with cable, satellite, and other pay-TV services. The problem is that, as currently deployed, they may be always running, even when people think they have turned them off [28]. Future stages of our research will investigate the FLOPS/watt ratio of set-top boxes versus PCs, and how the energy wasted during standby periods of TV receivers can be harnessed by OddCI systems.
REFERENCES
[1] M. Litzkow, M. Livny and M. Mutka. “Condor - A Hunter of Idle Workstations,” Proceedings of the 8th International Conference of
Distributed Computing Systems, IEEE Comput. Soc. Press, 1988.
[2] R. Costa, F. Brasileiro, G. Lemos and D. Mariz. “Analyzing the impact of elasticity on the profit of cloud computing providers,”
IEEE/ACM. Proceedings of the 2nd International Workshop on Cloud Computing and Scientific Applications (CCSA’12), in conjunction
with CCGRID’12. May 2012.
[3] W. Cirne, D. Paranhos, L. Costa, E. Santos-Neto, F. Brasileiro et al. “Running bag-of-tasks applications on computational grids: the
MyGrid approach,” IEEE. Proceedings of the International Conference on Parallel Processing (ICPP'03). 2003, pp. 407.
[4] R. Costa, F. Brasileiro, G. Lemos and D. Mariz. “OddCI: On-Demand
Distributed Computing Infrastructure,” ACM. Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers.
2009, pp. 1-10 , doi: 10.1145/1646468.1646478.
[5] C. Batista, T. Araujo, D. Anjos, M. Castro, F. Brasileiro and G. Lemos. “TVGrid: A grid architecture to use the idle resources on a
digital TV network,” Proceedings of the 7th IEEE Int. Symposium on Cluster Computing and the Grid. IEEE Computer Society.
Washington, DC, USA. 2007, pp. 823-828, doi: 10.1109/CCGRID.2007.117.
[6] R. Kulesza, J. Lima, Á. Guedes, L. Junior, S. Meira and G. Lemos.
“Ginga-J - An open java-based application environment for interactive digital television services,” Springer Boston. 2011, vol.
365, pp. 34-49. doi:10.1007/978-3-642-24418-6_3.
[7] S. Morris and A. S. Chaigneau. “Interactive TV Standards: A Guide
to MHP, OCAP, and JavaTV,” Focal Press, 2005.
[8] ISO/IEC. ISO/IEC TR 13818.6. “Information technology: Generic
coding of moving pictures and associated audio information. Part 6: Extensions for DSM/CC,” 1998
[9] ISO/IEC. ISO/IEC 13818.2. MPEG Committee International Standard: “Generic coding of moving pictures and associated audio
information: video,” ISOMEG 1994
[10] L. F. Soares, R. F. Rodrigues and M. F. Moreno. “Ginga NCL: the Declarative Environment of the Brazilian Digital TV System,”
Journal of the Brazilian Computer Society. March, 2007, no. 4, vol. 12.
[11] F. Cristian. “Probabilistic clock synchronization,” Distributed
Computing, vol. 3, no. 3. 1989, pp. 146-158, doi:10.1007/BF01784024.
[12] S. F. Altschul, W. Gish, W. Miller, E. W. Myers and F. J. Lipman.
“Basic local alignment search tool,” Journal of Molecular Biology, vol. 215 no. 3. 1990, pp. 403–410, doi:10.1006/jmbi.1990.9999.
[13] NCBI. “Blast,” Available in: http://blast.ncbi.nlm.nih.gov/Blast.cgi.
June 2012.
[14] Neustar Webmetrics. Available in: http://www.webmetrics.com. June 2012.
[15] ARM. (2011) Sieve of eratosthenes. Available in: http://www.keil.com/benchmarks/sieve.asp. June 2012.
[16] Bitcurrent. Available in: http://www.bitcurrent.com/. June 2012. [17] Bitcurrent Team. “The performance of clouds”. Available in
http://www.webmetrics.com/landingpage/bitcurrentcloud2/index.html June 2012.
[18] ABNT 15606-2. “Digital terrestrial television - Data coding and transmission specification for digital broadcasting. Part 2: Ginga-
NCL for fixed and mobile receivers - XML application language for application coding,” ABNT/CEE-85 Digital Television Committee.
NBR 15606-2. June 2011, pp. 1-288.
[19] ABNT 15606-4. “Digital terrestrial television — Data coding and transmission specification for digital broadcasting. Part 4: Ginga-J -
The environment for the execution of procedural applications,” ABNT/CEE-85 Digital Television Committee. NBR 15606-4. May
2010, pp. 1-90.
[20] Boincoid. “An Android Port of the Boinc Platform,” 2011. Available
in: http://boincoid.sourceforge.net. June 2012.
[21] Folding@home. PS3 FAQ, 2011. Available in:
http://folding.stanford.edu/English/FAQ-PS3. June 2012.
[22] D. P. Anderson. “BOINC: A system for public-resource computing and storage,” Grid Computing, 2004. Proceedings of the Fifth
IEEE/ACM International Workshop on Grid Computing (GRID'04). 2004, pp. 4-10, doi:10.1109/GRID.2004.14.
[23] Folding@home. “Petaflop barrier crossed,” Available in:
http://blog.us.playstation.com/2007/09/25/foldinghome-petaflop-barrier-crossed-update/. June 2012.
[24] R. Neill, L. P. Carloni, A. Shabarshin, V. Sigaev and S. Tcherepanov.
“Embedded Processor Virtualization for Broadband Grid Computing,” The Proceedings of the 12th IEEE/ACM International
Conference on Grid Computing (Grid’11). 2011.
[25] G. Fedak, J. P. Gelas, T. Hérault, V. Iniesta, D. Kondo, L. Lefèvre, P. Malécot, L. Nussbaum, A. Rezmerita and O. Richard. “DSL-Lab: a
Platform to Experiment on Domestic Broadband Internet,” Proceedings of the International Symposium on Parallel and
Distributed Computing (ISPDC'10). Istanbul, Turkey, 2010.
[26] D. P. Anderson, J. Cobb, E. Korpela, M. Lebofsky and D. Werthimer. “SETI@Home - an experiment in public-resource computing,”
Communications of the ACM Archive. ACM New York USA, November 2002, vol. 45(11), pp. 56-61, doi:10.1145/581571.581573.
[27] D. P. Anderson and G. Fedak. “The Computational and Storage Potential of Volunteer Computing,” Proceedings of the Sixth IEEE
International Symposium on Cluster Computing and the Grid (CCGRID'06). pp. 73--80. Singapore, May 2006.
[28] Bloomberg. “Stop Cable Boxes From Draining Nation’s Power
Supply,” Available in: http://www.bloomberg.com/news/2011-07-11/stop-cable-boxes-from-draining-the-nation-s-power-supply-
view.html. June, 2012.