
> Cloud Computing > Edge Computing > Data Analytics > Cyber-Physical Systems

JUNE 2019 www.computer.org


COMPSAC 2019: DATA DRIVEN INTELLIGENCE FOR A SMARTER WORLD

Hosted by Marquette University, Milwaukee, Wisconsin, USA, July 15–19. www.compsac.org

In the era of "big data," there is an unprecedented increase in the amount of data collected in data warehouses. Extracting meaning and knowledge from these data is crucial for governments and businesses to support their strategic and tactical decision making. Furthermore, artificial intelligence (AI) and machine learning (ML) make it possible for machines, processing large amounts of such data, to learn and execute tasks never before accomplished. Advances in big data-related technologies are increasing rapidly. For example, virtual assistants, smart cars, and smart home devices in the emerging Internet of Things world can, we think, make our lives easier. But despite the perceived benefits of these technologies and methodologies, there are many challenges ahead. What will be the social, cultural, and economic challenges arising from these developments? What are the technical issues related, for example, to the privacy and security of data used by AI/ML systems? How might humans interact with, rely on, or even trust AI predictions or decisions emanating from these technologies? How can we prevent such data-driven intelligence from being used to make malicious decisions?

Authors are invited to submit original, unpublished research work, as well as industrial practice reports. Simultaneous submission to other publication venues is not permitted. All submissions must adhere to IEEE Publishing Policies, and all will be vetted through the IEEE CrossCheck portal. For full CFP and conference information, please visit the conference website at WWW.COMPSAC.ORG

IMPORTANT DATES
April 7, 2019: Paper notifications
April 15, 2019: Workshop papers due
May 1, 2019: Workshop paper notifications
May 17, 2019: Camera-ready submissions and advance author registration due


Be The Difference.

ORGANIZING COMMITTEE
General Chairs: Jean-Luc Gaudiot, University of California, Irvine, USA; Vladimir Getov, University of Westminster, UK
Program Chairs in Chief: Morris Chang, University of South Florida, USA; Stelvio Cimato, University of Milan, Italy; Nariyoshi Yamai, Tokyo University of Agriculture & Technology, Japan
Workshop Chairs: Hong Va Leong, Hong Kong Polytechnic University, Hong Kong; Yuuichi Teranishi, National Institute of Information and Communications Technology, Japan; Ji-Jiang Yang, Tsinghua University, China
Local Organizing Committee Chair: Praveen Madiraju, Marquette University, USA
Standing Committee Chair: Sorel Reisman, California State University, USA
Standing Committee Vice Chair: Sheikh Iqbal Ahamed, Marquette University, USA


STAFF

Editor: Cathy Martin
Publications Operations Project Specialist: Christine Anthony
Publications Marketing Project Specialist: Meghan O’Dell
Production & Design: Carmen Flores-Garvey
Publications Portfolio Managers: Carrie Clark, Kimberly Sperka
Publisher: Robin Baldwin
Senior Advertising Coordinator: Debbie Sims

Circulation: ComputingEdge (ISSN 2469-7087) is published monthly by the IEEE Computer Society. IEEE Headquarters, Three Park Avenue, 17th Floor, New York, NY 10016-5997; IEEE Computer Society Publications Office, 10662 Los Vaqueros Circle, Los Alamitos, CA 90720; voice +1 714 821 8380; fax +1 714 821 4010; IEEE Computer Society Headquarters, 2001 L Street NW, Suite 700, Washington, DC 20036.

Postmaster: Send address changes to ComputingEdge-IEEE Membership Processing Dept., 445 Hoes Lane, Piscataway, NJ 08855. Periodicals Postage Paid at New York, New York, and at additional mailing offices. Printed in USA.

Editorial: Unless otherwise stated, bylined articles, as well as product and service descriptions, reflect the author’s or firm’s opinion. Inclusion in ComputingEdge does not necessarily constitute endorsement by the IEEE or the Computer Society. All submissions are subject to editing for style, clarity, and space.

Reuse Rights and Reprint Permissions: Educational or personal use of this material is permitted without fee, provided such use: 1) is not made for profit; 2) includes this notice and a full citation to the original work on the first page of the copy; and 3) does not imply IEEE endorsement of any third-party products or services. Authors and their companies are permitted to post the accepted version of IEEE-copyrighted material on their own Web servers without permission, provided that the IEEE copyright notice and a full citation to the original work appear on the first screen of the posted copy. An accepted manuscript is a version which has been revised by the author to incorporate review suggestions, but not the published version with copy-editing, proofreading, and formatting added by IEEE. For more information, please go to: http://www.ieee.org/publications_standards/publications/rights/paperversionpolicy.html. Permission to reprint/republish this material for commercial, advertising, or promotional purposes or for creating new collective works for resale or redistribution must be obtained from IEEE by writing to the IEEE Intellectual Property Rights Office, 445 Hoes Lane, Piscataway, NJ 08854-4141 or [email protected]. Copyright © 2019 IEEE. All rights reserved.

Abstracting and Library Use: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy for private use of patrons, provided the per-copy fee indicated in the code at the bottom of the first page is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

Unsubscribe: If you no longer wish to receive this ComputingEdge mailing, please email IEEE Computer Society Customer Service at [email protected] and type “unsubscribe ComputingEdge” in your subject line.

IEEE prohibits discrimination, harassment, and bullying. For more information, visit www.ieee.org/web/aboutus/whatis/policies/p9-26.html.

IEEE COMPUTER SOCIETY computer.org • +1 714 821 8380

www.computer.org/computingedge 1

IEEE Computer Society Magazine Editors in Chief

Computer: David Alan Grier (Interim), Djaghe LLC
IEEE Software: Ipek Ozkaya, Software Engineering Institute
IEEE Internet Computing: George Pallis, University of Cyprus
IT Professional: Irena Bojanova, NIST
IEEE Security & Privacy: David Nicol, University of Illinois at Urbana-Champaign
IEEE Micro: Lizy Kurian John, University of Texas, Austin
IEEE Computer Graphics and Applications: Torsten Möller, University of Vienna
IEEE Pervasive Computing: Marc Langheinrich, University of Lugano
Computing in Science & Engineering: Jim X. Chen, George Mason University
IEEE Intelligent Systems: V.S. Subrahmanian, Dartmouth College
IEEE MultiMedia: Shu-Ching Chen, Florida International University
IEEE Annals of the History of Computing: Gerardo Con Diaz, University of California, Davis


JUNE 2019 • VOLUME 5, NUMBER 6

8 Towards AI-Powered Multiple Cloud Management
24 Error-Resilient Server Ecosystems for Edge and Cloud Datacenters
39 Multimodal Sentiment Analysis: Addressing Key Issues and Setting Up the Baselines
48 Computing in the Real World Is the Grandest of Challenges

Subscribe to ComputingEdge for free at www.computer.org/computingedge.

Cloud Computing

8 Towards AI-Powered Multiple Cloud Management
BENIAMINO DI MARTINO, ANTONIO ESPOSITO, AND ERNESTO DAMIANI

16 Serverless Computing: From Planet Mars to the Cloud
JOSÉ LUIS VÁZQUEZ-POLETTI AND IGNACIO MARTÍN LLORENTE

Edge Computing

24 Error-Resilient Server Ecosystems for Edge and Cloud Datacenters
GEORGIOS KARAKONSTANTIS, DIMITRIOS S. NIKOLOPOULOS, DIMITRIS GIZOPOULOS, PEDRO TRANCOSO, YIANNAKIS SAZEIDES, CHRISTOS D. ANTONOPOULOS, SRIKUMAR VENUGOPAL, AND SHIDHARTHA DAS

28 A Serverless Real-Time Data Analytics Platform for Edge Computing
STEFAN NASTIC, THOMAS RAUSCH, OGNJEN SCEKIC, SCHAHRAM DUSTDAR, MARJAN GUSEV, BOJANA KOTESKA, MAGDALENA KOSTOSKA, BORO JAKIMOVSKI, SASKO RISTOV, AND RADU PRODAN

Data Analytics

36 The Unreasonable Effectiveness of Software Analytics
TIM MENZIES

39 Multimodal Sentiment Analysis: Addressing Key Issues and Setting Up the Baselines
SOUJANYA PORIA, NAVONIL MAJUMDER, DEVAMANYU HAZARIKA, ERIK CAMBRIA, ALEXANDER GELBUKH, AND AMIR HUSSAIN

Cyber-Physical Systems

48 Computing in the Real World Is the Grandest of Challenges
MARILYN WOLF

51 Society 5.0: For Human Security and Well-Being
YOSHIHIRO SHIROISHI, KUNIO UCHIYAMA, AND NORIHIRO SUZUKI

Departments

4 Magazine Roundup
7 Editor’s Note: Trends in Cloud Computing
72 Conference Calendar


June 2019. Published by the IEEE Computer Society. 2469-7087/19/$33.00 © 2019 IEEE

CS FOCUS

Magazine Roundup

The IEEE Computer Society’s lineup of 12 peer-reviewed technical magazines covers cutting-edge topics ranging from software design and computer graphics to Internet computing and security, from scientific applications and machine intelligence to visualization and microchip design. Here are highlights from recent issues.

Computer

Enabling Human-Centric Smart Cities: Crowdsourcing-Based Practice in China
Existing smart city systems are designed mainly for urban management from a governmental perspective, but they fail to cover the wide spectrum of citizens’ daily lives. The authors of this article from the December 2018 issue of Computer propose a crowdsourcing-based platform through which multiple players can collaborate to produce more abundant, personalized, and proactive services.

Computing in Science & Engineering

Mixed Precision: A Strategy for New Science Opportunities
Since the days of vector supercomputers, computational scientists have relied on high-precision arithmetic to accurately solve problems. But changes to hardware, spurred by the demand for more computing capability and growth in machine learning, have researchers considering lower-precision alternatives. Read more in the November/December 2018 issue of Computing in Science & Engineering.

IEEE Annals of the History of Computing

How Modeless Editing Came To Be
Several word processing and desktop publishing innovations grew out of 1970s research by Larry Tesler and his collaborators at Stanford and Xerox PARC. One prototype was PUB, a markup language for print publishing. Another was Gypsy, a text editor with a cut/copy/paste command suite that became the de facto standard in desktop publishing and most other applications. Read more in the July–September 2018 issue of IEEE Annals of the History of Computing.

IEEE Computer Graphics and Applications

CellPAINT: Interactive Illustration of Dynamic Mesoscale Cellular Environments
CellPAINT allows non-expert users to create interactive mesoscale illustrations that integrate a variety of biological data. Like popular digital painting software, scenes are created using a palette of molecular “brushes.” The authors of this article from the November/December 2018 issue of IEEE Computer Graphics and Applications describe how the current release allows creation of animated scenes with an HIV virion, blood plasma, and a simplified T-cell.

IEEE Intelligent Systems

Autonomous Nuclear Waste Management
This article from the November/December 2018 issue of IEEE Intelligent Systems focuses on the design, development, and demonstration of a reconfigurable rational agent-based robotic system that aims to highly automate nuclear waste management processes. The proposed system is being demonstrated through a downsized, lab-based setup incorporating a small-scale robotic arm, a time-of-flight camera, and a high-level rational agent-based decision making and control framework.

IEEE Internet Computing

Network Neutrality Is About Money, Not Packets
The topic of network neutrality has occupied engineers, economists, and telecommunication lawyers since at least 2003. Network engineers often perceive it as yet another debate about quality of service, but that is probably the least interesting part of the problem and has served more often to distract from other, more important, issues. Almost all network neutrality problems are rooted in economic incentives, and thus are likely to require approaches related to pricing and enhancing competition. Read more in the November/December 2018 issue of IEEE Internet Computing.

IEEE Micro

Performance Assessment of Emerging Memories through FPGA Emulation
Emerging memory technologies offer the prospect of large capacity, high bandwidth, and access latencies ranging from DRAM-like to SSD-like. In this article from the January/February 2019 issue of IEEE Micro, the authors evaluate the performance of parallel applications on CPUs whose main memories sweep a wide range of latencies within a bandwidth cap. The article highlights the performance impact of higher latency on concurrent applications and identifies conditions under which future high-latency memories can effectively be used as main memory.

IEEE MultiMedia

Clustering of Musical Pieces through Complex Networks: An Assessment over Guitar Solos
Musical pieces can be modeled as complex networks. This fosters innovative ways to categorize music, paving the way toward novel applications in multimedia domains, such as music didactics, multimedia entertainment, and digital music generation. Clustering these networks through their main metrics allows for grouping similar musical tracks. To show the viability of the approach, the authors of this article from the October–December 2018 issue of IEEE MultiMedia provide results on a dataset of guitar solos.


IEEE Pervasive Computing

Invisible, Inaudible, and Impalpable: Users’ Preferences and Memory Performance for Digital Content in Thin Air
The authors of this article from the October–December 2018 issue of IEEE Pervasive Computing address novel interfaces that enable users to access digital content “in thin air” in the context of smart spaces that superimpose digital layers upon the topography of the physical environment. The authors collect and evaluate users’ preferences for pinning digital content in thin air. They also set guidelines for practitioners to assist the design of novel user interfaces that implement digital content superimposed on the physical space.

IEEE Security & Privacy

The Need for Speed: An Analysis of Brazilian Malware Classifiers
Using a dataset containing about 50,000 samples from Brazilian cyberspace, the authors of this article from the November/December 2018 issue of IEEE Security & Privacy show that relying solely on conventional machine-learning systems, without taking into account changes in the subject’s concept, decreases the performance of classification. They emphasize the need to update the decision model immediately after concept drift occurs.

IEEE Software

Relationships between Project Size, Agile Practices, and Successful Software Development: Results and Analysis
Large-scale software development succeeds more often when using agile methods. Flexible scope, frequent deliveries to production, a high degree of requirement changes, and more competent providers are possible reasons. Read more in the March/April 2019 issue of IEEE Software.

IT Professional

Lightweight Access Control System for Wearable Devices
Wearable devices are being used in health and military environments where, due to information sensitivity, it is necessary to control how information is handled by the wearable device. However, current security solutions for wearable devices focus mostly on protecting information from access by unauthorized parties. With this in mind, the authors of this article from the January/February 2019 issue of IT Professional propose a wearable device access control system that, in addition to protecting the information from unauthorized access, allows for defining and enforcing a set of restrictions on how the information should be handled.

From the analytical engine to the supercomputer, from Pascal to von Neumann, IEEE Annals of the History of Computing covers the breadth of computer history. The quarterly publication is an active center for the collection and dissemination of information on historical projects and organizations, oral history activities, and international conferences.

www.computer.org/annals



EDITOR’S NOTE

Trends in Cloud Computing

The adoption of cloud computing has grown enormously in the past decade, and the vast majority of organizations now rely on the cloud. As cloud computing continues to evolve in 2019, new trends are emerging. This issue of ComputingEdge discusses some of the biggest trends in cloud computing: multicloud, artificial intelligence (AI), and serverless.

More people are choosing to use multiple cloud-computing services in one heterogeneous architecture (multicloud) to improve flexibility and cut costs. In IEEE Internet Computing’s “Towards AI-Powered Multiple Cloud Management,” the authors describe how AI supports automated management across clouds. Many people are also moving toward serverless computing, where they pay cloud providers for only the resources they consume. Computing in Science & Engineering’s “Serverless Computing: From Planet Mars to the Cloud” presents a serverless architecture for scientific data analysis.

Cloud computing is often used alongside edge computing in Internet of Things (IoT) applications. Computer’s “Error-Resilient Server Ecosystems for Edge and Cloud Datacenters” argues that more efficient and resilient hardware and software server stacks are needed for IoT cloud and edge computing. “A Serverless Real-Time Data Analytics Platform for Edge Computing,” from IEEE Internet Computing, shows how data from IoT devices can be better analyzed with serverless computing.

Serverless computing isn’t the only new idea in data analytics. IEEE Software’s “The Unreasonable Effectiveness of Software Analytics” examines why software analytics can predict software project behavior. IEEE Intelligent Systems’ “Multimodal Sentiment Analysis: Addressing Key Issues and Setting Up the Baselines” dives into how deep learning is improving multimodal sentiment classification.

Finally, this ComputingEdge issue covers cyber-physical systems (CPS), which integrate computation and physical components. Computer’s “Computing in the Real World Is the Grandest of Challenges” emphasizes the importance of CPS in our modern world. Another article from Computer, “Society 5.0: For Human Security and Well-Being,” describes a Japanese initiative aimed at making society more sustainable and prosperous through CPS.



Towards AI-Powered Multiple Cloud Management

Beniamino Di Martino, University of Campania “Luigi Vanvitelli”
Antonio Esposito, University of Campania “Luigi Vanvitelli”
Ernesto Damiani, Khalifa University and University of Milan

Abstract—Cloud users worldwide are looking at the next generation of artificial intelligence (AI) powered cloud management tools to automate cloud performance tuning and anomaly detection. To be effective across clouds, AI tools need a common representation of cloud services and support for machine learning optimization targeting multiple objectives. We put forward the notion that ontology-based models can support both.

View from the Cloud. Digital Object Identifier 10.1109/MIC.2018.2883839. Date of current version: 6 March 2019.

CLOUD COMPUTING’S SERVICE model, based on elastic on-demand allocation of virtual resources, turned out to be suitable for supporting data-intensive applications such as artificial intelligence (AI) pipelines, which exploit cloud scaling to perform large-volume data ingestion, preparation, model training, and inference. Today, many companies worldwide rely on the cloud for large AI workloads, making cloud management and control a key issue for AI pipelines. Experience has shown that different stages of an AI pipeline may have diverse nonfunctional requirements (e.g., different data confidentiality levels); for this reason, more and more organizations adopt edge-cloud or multicloud deployment strategies, deploying each pipeline stage on a different public or private cloud. Multicloud deployment promises to prevent provider lock-in, take advantage of dynamic resource pricing at run time, and secure the content exchanged or stored on the multicloud.

Early approaches to multicloud deployment were mostly programmatic: they consisted of multicloud libraries, which allowed run-time mapping of computations to the resources of multiple cloud providers. However, programmatic control of multicloud deployment hard-codes deployment decisions in scripts, which may lead to a lack of flexibility and, ultimately, to inefficiency and loss of control. Some recent models of multicloud services, such as MANTUS,9 support decoupling of the architecture model from the cloud resources used. This separation allows users to apply dynamic reconfiguration to tune the resources used on each cloud according to performance and cost targets. Still, reconfiguration procedures are mostly manually coded, and little support is available for automated management across clouds.

Our own research efforts focused on cloud architecture representations that rely on a service ontology for defining their entities.9 Using the Web Ontology Language for Services (OWL-S) standard to model cloud resources, services, and patterns5 provides a high-level framework that can be used not only for architecture description and maintenance but also, through automated reasoning, to support multiple cloud service discovery and brokering, and architecture-agnostic application development and deployment. The approach is especially useful for applications that can be deployed in a multicloud environment. Consider a common business intelligence application in which an extraction, transformation, and loading (ETL) process retrieves data from a database (DBMS), a customer relationship management (CRM) system, and an enterprise resource planning (ERP) system, and preprocesses them for further analysis after their storage in a data warehouse. The CRM and ERP components use data coming from their own databases, while data recorded in the data warehouse are used by the OLAP system and by the data mining component to perform market analysis. Such an architecture raises a series of concerns, first of all regarding interoperability, since the several application components need to interact to achieve the business goal. If each of these components were hosted by a different cloud provider, whether because of the specific functionalities it offers or for economic reasons, one could be concerned about the real capability of the components to communicate, owing to differences in the communication interfaces provided by the providers. A semantic representation, such as the one presented in this paper, can ease these communication difficulties and enable interoperability.
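The interoperability concern in the business intelligence example can be made concrete with a small sketch (all provider names and methods below are invented for illustration, not taken from the article): two providers expose the same storage capability behind different native interfaces, and a component written against one common abstract operation can use either.

```python
# Hypothetical sketch: two cloud providers expose the same capability
# ("store a record") behind different native interfaces.

class ProviderAStore:
    def put_item(self, key, value):        # Provider A's native call
        return {"status": "ok", "key": key}

class ProviderBStore:
    def insert(self, record_id, payload):  # Provider B's native call
        return {"code": 200, "id": record_id}

class StoreAdapter:
    """Maps one abstract 'store' operation onto either native interface,
    playing the role of a shared description of the service."""
    def __init__(self, backend):
        self.backend = backend

    def store(self, key, value):
        if isinstance(self.backend, ProviderAStore):
            return self.backend.put_item(key, value)["key"]
        return self.backend.insert(key, value)["id"]

# The ETL component sees only the abstract operation, so the data warehouse
# could be hosted on either provider without changing the component's code.
for backend in (ProviderAStore(), ProviderBStore()):
    assert StoreAdapter(backend).store("order-42", {"total": 10}) == "order-42"
```

An ontology-based description generalizes this idea: instead of hand-written adapters for each provider pair, the mapping from abstract operations to concrete interfaces is derived from the shared semantic model.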

Such capabilities have been demonstrated through several industrial case studies within the FP7 mOSAIC project.10 The feasibility of ontology-based models for representing the computation of AI pipelines has been tested in several industrial case studies within the H2020 TOREADOR project.2

SEMANTIC REPRESENTATION OF CLOUD PATTERNS AND SERVICES

The semantic representation reported in Figure 1 has been developed to ease the portability and interoperability issues that may arise when either composing multiple cloud services or migrating data and applications from one platform to another.8 The representation, of which we report an overview, is constituted by a multilayered stack of conceptual models, connected to one another in a graph-like structure but still independent of one another. The conceptual models are divided into the following layers.

• The Application Pattern layer describes general applications and their components, with no specific connection to an implementing technology or platform. The solutions represented by such patterns can in principle be applied to different scenarios, independently of the platforms, programming languages, or technologies involved.
• The Cloud Pattern layer provides the description of cloud-based solutions to be applied when implementing applications. The semantic representation helps not only to efficiently categorize them, but also to identify connections, similarities, and related application scenarios. Since many cloud patterns can be combined to form more complex ones, the representation also supports pattern composition. Cloud patterns’ components are represented by cloud services, which are described in the lower layer.
• The Cloud Service layer focuses on the description of cloud services and the functionalities they expose. The objective of this layer is to provide a common semantics for the description of services exposed by cloud platforms, so that comparisons can be made at both the functional and operational levels.
• The Operation layer describes functionalities that are exposed by services but do not refer to the cloud, including resources exposed through the web.
• The Ground layer represents the connection between the abstract description of input/output parameters and operations exposed by services, and their actual implementation. While the WSDL standard is the native format used for this purpose, any other standard way to represent the invocation interface of cloud or web services can be used.

Figure 1. Semantic representation of cloud patterns and services.
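In the authors’ work these layers are OWL/OWL-S ontologies; purely as a structural illustration (every class and name below is invented, not part of the actual ontology), the containment relations between the five layers can be sketched with plain data classes:

```python
from dataclasses import dataclass, field

# Hypothetical, simplified mirror of the five-layer stack: each entity
# references entities in the layer below it.

@dataclass
class Grounding:           # Ground layer: concrete invocation interface
    wsdl_url: str

@dataclass
class Operation:           # Operation layer: a single exposed functionality
    name: str
    inputs: list
    outputs: list
    grounding: Grounding

@dataclass
class CloudService:        # Cloud Service layer
    name: str
    provider: str
    operations: list = field(default_factory=list)

@dataclass
class CloudPattern:        # Cloud Pattern layer: composed of cloud services
    name: str
    services: list = field(default_factory=list)

@dataclass
class ApplicationPattern:  # Application Pattern layer: technology-agnostic
    name: str
    realized_by: list = field(default_factory=list)  # candidate cloud patterns

# Example: an abstract messaging pattern realizable on two providers
queue_a = CloudService("QueueServiceA", "ProviderA",
                       [Operation("send", ["msg"], ["ack"],
                                  Grounding("https://a.example/queue.wsdl"))])
queue_b = CloudService("QueueServiceB", "ProviderB",
                       [Operation("send", ["msg"], ["ack"],
                                  Grounding("https://b.example/queue.wsdl"))])
pattern = ApplicationPattern("AsyncMessaging",
                             [CloudPattern("ManagedQueue", [queue_a, queue_b])])

# "Discovery": walk from the abstract pattern down to concrete services
candidates = [s.name for cp in pattern.realized_by for s in cp.services]
print(candidates)  # ['QueueServiceA', 'QueueServiceB']
```

A discovery query then amounts to traversing this graph from an abstract pattern down to the concrete groundings, which is what automated reasoning over the real ontology enables.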

Different technologies have been employed to implement the layers, but they are all based on the standard OWL language for ontology description. The Application and Cloud Pattern layers have been implemented using a combination of ODOL,3 a semantic representation of design patterns that has been augmented and adapted to describe the static entities of patterns, and OWL-S, an ontology designed to describe web services, which in our case has been exploited to define the dynamic behavior of patterns. OWL-S has also been used to describe the Service and Operation layers, for which it provides native support. Figure 2 portrays the graphical representation of the semantic description of patterns and services that we have briefly introduced. In particular, Figure 2(a) shows the whole representation, with all the classes and connections existing among its components, while Figure 2(b) zooms in on the pattern class and its direct connections.

All layers are supported by a background ontology, which provides a common set of terms to describe each component of the described patterns, services, and operations, and which will be further described in the following sections.

NONFUNCTIONAL PROPERTIES:EXAMPLE FOR LEGAL CONTRAINTS

While functional composition of cloud serv-

ices is by itself a complex, yet interesting, mat-

ter, there are nonfunctional aspects that should

be taken in consideration when selecting a spe-

cific service to fulfill user needs. The semantic

description presented in the previous section

covers functional requirements of applications

and cloud services, making it possible to select

and compose them according to the required

functionalities. However, it does not take in con-

sideration nonfunctional requirements, and legal

constraints regarding the use of data in a specific

environment or country are a strong limitation

when selecting a cloud service. The frame-

work,6 of which we report an overview here,

Figure 2. Graphical representation of the semantic multilayered description of patterns and services.

View from the Cloud

66 IEEE Internet Computing

Page 13: Data Analytics - IEEE Computer Society … · he IEEE Computer Society’s lineup of 12 peer-reviewed tech-nical magazines covers cut-ting-edge topics ranging from software design

www.computer.org/computingedge 11

23mic01-dimartino-2883839.3d (Style 5) 05-04-2019 15:24

provides a user-oriented tool, that is able to

check the compliance of cloud services offered

by different cloud platforms, in respect to the

Italian regulations. A simplified graphical inter-

face allows the user to interact with the frame-

work, enabling her to specify the application’s

requirements. More specifically, users can do

the following tasks.

• Select the nature of the data that will be treated among a set of predefined categories (sensitive, health, judicial, or not subject to data protection).
• Decide the scope and aim of the data treatment, by selecting one of the available categories (scientific, statistical, historical, or generic).
• Specify the cloud service provider she wants to use for her resources, and optionally narrow down the selection of available services offered by that provider, according to the requirements of her application.
• Decide the location of the data center, choosing among the possible locations offered for the selected provider.
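The compliance check over these four selections can be sketched as a set of rules. The category names below mirror the ones listed above, but the rule bodies are invented placeholders: the real framework derives its rules from the Italian legislation via NLP, so this is only a shape-of-the-idea illustration.

```python
# Hypothetical sketch of a legislation-derived compliance check.
# The rule bodies are invented; the actual framework encodes rules
# extracted from Italian Legislative Decree 196/2003.

def compliant(data_nature: str, purpose: str, location: str) -> bool:
    """Return True if the (data nature, purpose, data-center location)
    combination satisfies the simplified rules."""
    if data_nature == "none":        # data not subject to data protection
        return True
    if data_nature == "judicial":    # assumed: judicial data must stay in Italy
        return location == "IT"
    if data_nature in ("sensitive", "health"):
        # assumed: sensitive/health data allowed only within IT/EU data centers
        return location in {"IT", "EU"} and purpose in {
            "scientific", "statistical", "historical", "generic"}
    return False
```

A user query such as `compliant("health", "scientific", "EU")` then plays the role of the framework's basic usage: establishing whether a specific service placement is allowed for the declared data and purpose.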

The framework was developed for the Italian legislation and in particular is based on the formalization of the Italian Legislative Decree 196/2003 and the Italian Code for Digital Administration. It can be used in different ways. The basic usage allows the user to establish the compliance of a specific service, running in a certain location or exploiting a specific data center, with the regulations. Of course, the kind of data used and the purpose of their processing should be known beforehand. Conversely, the user can discover the kinds of data processing that are allowed for a certain combination of cloud provider and data-center location.

The knowledge base exploited by the framework contains two main information sets:

• the semantic rules derived from the legislation;
• the semantic description of the terms of service of the providers' cloud services (such as Amazon, Microsoft, and IBM).

The first set of information has been obtained by analyzing, via natural language processing (NLP) techniques, the reference laws on privacy. The NLP analysis translated prescriptive sentences into logical rules, while ontologies have been used to describe the terms of service exposed by the cloud providers. Figure 3 shows the main components of the framework and how they are related.

There are two main components: the back end, composed of the Ontology Cache, the OWL Parser, and the SWIPL Facade, and the front end, represented by the user interface implemented in HTML. The OWL Parser extracts information from ontologies coded in OWL and converts them into Prolog facts that are then queried, using the rules, by the SWIPL Facade component.
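A minimal illustration of this pipeline, with invented service names and properties: the actual framework parses OWL into Prolog facts consulted through SWI-Prolog, while here plain Python tuples and functions stand in for the facts and rules.

```python
# Sketch of the OWL Parser idea: ontology statements become facts,
# and a rule (a plain function here, standing in for a Prolog rule
# evaluated by the SWIPL Facade) is checked against them.
# All service names and properties are invented for the example.

facts = {
    ("storage_a", "locatedIn", "EU"),
    ("storage_a", "offeredBy", "Amazon"),
    ("storage_b", "locatedIn", "US"),
}

def located_in(service: str, region: str) -> bool:
    """Fact lookup, analogous to querying located_in(Service, Region)."""
    return (service, "locatedIn", region) in facts

def eu_services() -> list:
    """Rule: every service whose data center is declared to be in the EU."""
    return sorted({s for (s, p, o) in facts if p == "locatedIn" and o == "EU"})
```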

CLOUD ONTOLOGY

An ontology is needed to provide a common background terminology for all the semantic layers of the cloud representation. The cloud ontology7 describes an approach, which aims at providing such an ontology, via a description of

Figure 3. Architecture of the legislation-aware framework.

January/February 2019 67




cloud services based on well-known semantic technologies, such as OWL and OWL-S. Here, we provide an overview of such an ontology, as it represents the set of semantic definitions that the ontology description provided in the previous sections requires to be effective.

The whole ontology is constituted by a set of interrelated subontologies, connected through ad hoc links and properties, which define a common representation base for existing cloud services, together with the operations they expose and the parameters they exchange.

Three main layers compose the cloud ontology.

• The upper layer contains the Agnostic Service Description Ontology, which provides a common terminology to describe cloud services, resources, operations, and parameters. Through this ontology, it is possible to annotate cloud entities by using a general and shareable catalog of concepts, which enables their discovery and comparison.
• The central layer is represented by the Cloud Services Categorization Ontology, which offers a categorization of cloud services and virtual appliances. The categories are based on the specific functionalities the different services and appliances offer, as declared by their respective vendors. By importing the upper ontology, references to platform-specific services and resources, which are organized according to the proposed categorization, can be directly related to Agnostic descriptions to enable comparisons.
• The bottom layer is in turn composed of two different groups of ontologies, describing provider-specific services, operations, and parameters. In particular, the Cloud Provider Ontology set defines proprietary concepts that describe a specific cloud provider's offer: a single and independent ontology exists for each cloud provider, which can be added or removed independently, in a modular fashion. The purpose of this set of ontologies is to describe proprietary services, resources, and related operations with the providers' specific concepts, which can be categorized against the Cloud Services Categorization Ontology and further annotated through the Agnostic Service Description Ontology.

On the other hand, the Cloud Services OWL-S Description set describes the services offered by the different providers in terms of their internal workflow and grounding; in this way, the ontology provides the information needed to automatically instantiate such services. Figure 4 portrays the aforementioned ontology, highlighting the three different layers and their connections.

SCOPE: A CLOUD SERVICES COMPOSITION TOOL

The use and management of the semantic representations proposed in the previous sections require knowledge of the ontology languages used to describe them, of logical languages, such as Prolog or SWRL, and of query languages, such as SPARQL, to interact with them. However, it is not viable to suppose that all users know how to build such queries, or that they are familiar with the semantic technologies backing them; that is why the graphical tool SCOPE has been created.2 Using the GUI, users can compose services and patterns just by dragging and dropping boxes and arrows, while the system suggests which connections to establish or services to use.

The architecture of the cloud composer consists of three main components.

Figure 4. Architecture of the cloud ontology.


• The Parser, which takes as input OWL plus OWL-S files, describing the desired services, patterns, and internal orchestration, and automatically generates an internal representation. Such a representation is used to keep track of the modifications made by the users before actually saving them into the ontologies.
• A Graph model, which is the internal representation for cloud patterns and services. Such a model is implemented as a Java data structure.
• The GUI, which allows visualization, creation, and modification of all elements defined in the graph model.

The tool is organized so that the users do not deal directly with the ontologies describing services and patterns, nor with the internal representation. The necessary SPARQL queries are dynamically produced and the ontologies are modified accordingly using the Jena OWL Web Ontology Language API. Such an API is at the core of the Parser component: through Jena, the tool automatically analyzes the services and patterns ontologies, and then a graph model is produced following the semantic-based representation.

The graph model has been implemented in Java: vertices of the graph are used to represent both cloud services and patterns' participants, while edges are used to represent both calls between services and relations existing between patterns' participants. Using such a representation, the tool can create a graph that simultaneously describes both a pattern's structure and workflow.
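The graph model just described can be sketched as follows. The tool's real model is a Java data structure; this Python transliteration, with invented vertex names, only shows the idea of typed edges distinguishing service calls from pattern relations.

```python
from collections import defaultdict

class PatternGraph:
    """Sketch of SCOPE's internal graph model: vertices are services or
    pattern participants; each edge carries a type, either 'call'
    (service invocation) or 'relation' (link between participants)."""

    def __init__(self):
        self.vertices = set()
        self.edges = defaultdict(list)  # source -> [(target, edge_type)]

    def add_vertex(self, name: str):
        self.vertices.add(name)

    def add_edge(self, src: str, dst: str, edge_type: str):
        assert edge_type in ("call", "relation")
        self.add_vertex(src)
        self.add_vertex(dst)
        self.edges[src].append((dst, edge_type))

    def calls_from(self, src: str) -> list:
        """Workflow view: only the service-invocation edges."""
        return [dst for dst, kind in self.edges[src] if kind == "call"]
```

Because every edge is typed, one graph simultaneously answers structural questions (which participants are related) and workflow questions (which services call which).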

MACHINE LEARNING APPROACH TO CLOUD MANAGEMENT

Public cloud providers' ability to offer affordable machine learning (ML) services has triggered interest in using ML for cloud management as well. ML techniques analyze patterns in the cloud entities' parameters (especially numerical ones) to classify, predict, and optimize cloud workloads. However, their seamless application across different clouds requires a shared representation of such entities. To discuss how the above ontology-based models for representing cloud resources and computations can support ML-driven optimization, let us consider a simple example: an agent using an ML model for allocating VMs in a multicloud scenario including a public and a private cloud.

To allow the agent to position new VMs, we could set up an ML classifier targeting the categories {public, private}. The simplest ML model for doing so is probably a Voronoi (1-NN) model that represents each VM as a numerical resource usage vector V = (CPU, RAM, disk) and decides where to allocate it based on a set of "correct" allocations, in turn based on experience. The implementation of our model would periodically monitor the current CPU, disk, and memory usage of VMs and decide their allocation, choosing between public and private based on the allocation of the closest VM in the reference set.
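A minimal sketch of this 1-NN allocator follows. The two-element reference set and its usage numbers are invented (the article does not publish reference data); the distance is plain Euclidean, matching the "obvious choice" discussed next.

```python
import math

# Reference set of (usage vector, allocation) pairs, assumed to come from
# past "correct" allocations; the (CPU %, RAM GB, disk GB) values are invented.
reference = [
    ((10.0, 2.0, 20.0), "private"),
    ((80.0, 14.0, 400.0), "public"),
]

def euclidean(u, v) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def allocate(vm_usage) -> str:
    """1-NN rule: place the VM on the cloud of its nearest reference vector."""
    _, label = min(reference, key=lambda rv: euclidean(rv[0], vm_usage))
    return label
```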

The behavior of our ML model clearly depends on the quality of the reference set, which needs to represent the best workload distribution for the specific environment. Perhaps more importantly, model behavior also depends on the notion of "closeness," i.e., on the definition of distance used. Euclidean distance between raw vectors is the obvious choice, but it is not necessarily the distance a human engineer would want to consider when writing a configuration cron job for the same purpose. Using Euclidean distance, two VMs exhibiting coherent behavior in their usage vectors (all components of one go "up" when the corresponding ones of the other also go "up" by the same amount, only with respect to a different baseline) could be further apart from each other than another pair whose loads are uncorrelated but have the same average and variance. Thus, these two VMs could be matched to different reference points by the 1-NN model, and different allocation decisions could result, while the human engineer would make the same allocation decision for both, disregarding the baseline difference.

The standard data science solution to this problem is to recenter and rescale the data, subtracting from each component the average and dividing by the standard deviation computed across all VMs. However, this adds to the computational load of the model's execution, which uses cloud resources that cannot be billed to customers.
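The recenter-and-rescale step can be sketched as follows: each feature (CPU, RAM, disk) is standardized with the mean and standard deviation computed across all monitored VMs. The usage numbers in the test below are invented.

```python
import statistics

def standardize(vectors):
    """Return z-scored copies of the usage vectors, feature by feature,
    using fleet-wide (population) statistics as described in the text."""
    cols = list(zip(*vectors))
    means = [statistics.fmean(c) for c in cols]
    # Guard against constant features, for which the deviation is zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]
    return [
        tuple((x - m) / s for x, m, s in zip(v, means, stdevs))
        for v in vectors
    ]
```

After this step every feature column has zero mean and unit variance across the fleet, which is exactly the extra per-period computation the text notes cannot be billed to customers.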



Another reason why Euclidean distance may not work as expected is related to the model's goal. Indeed, human-perceived distances related to nonfunctional properties such as fairness or confidentiality are often not isotropic, i.e., they do not treat all features equally (like Euclidean distance does). In terms of our example, if we want to use our model to achieve allocation fairness (rather than optimize our load), then disk and RAM usage should not compensate each other.
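One possible way to realize such a non-isotropic notion of closeness (our illustration, not a choice made in the article) is a weighted Chebyshev distance, where the worst per-feature discrepancy dominates, so a good match on disk cannot compensate a bad match on RAM.

```python
# Two candidate distances over usage vectors. Under Euclidean distance,
# discrepancies on different features average out; under the Chebyshev
# (max-norm) distance, the single worst feature mismatch decides.

def euclidean(u, v) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def chebyshev(u, v, weights=(1.0, 1.0, 1.0)) -> float:
    return max(w * abs(a - b) for a, b, w in zip(u, v, weights))
```

For the target vector (0, 0, 0), the candidate (0, 5, 0) is the Euclidean nearest neighbor while (4, 4, 0) is the Chebyshev nearest neighbor: the two metrics rank the same candidates differently, which is precisely why the distance choice is a hidden model parameter.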

Summarizing, we have noticed that the behavior and execution cost of the ML model depend on a hidden discretionary parameter (the distance used) that may not be uniform across clouds, impairing the use of ML models in multicloud environments.

Distance Based on Equivalence Relations

Let us now take a step toward explicit representation of this parameter by refining our problem statement. Instead of treating our VMs' features as vectors of numbers, we model them as elements of an equivalence class S, defined by some relation Fs. In terms of our example's 1-NN model, the reference set will contain one representative for each equivalence class. Two VMs will fall in the same class where the value of Fs computed on their representation is the same. Then, we will be able to assign distances between classes using a simple lookup table. The lookup table explicitly represents the metric structure we want to give to the resource space and can be published and shared. Also, we can define multiple tables corresponding to different goals. Not surprisingly, tabled distances can be learned,11 or computed automatically by other AI approaches: methods to do so include the classic isometric feature mapping procedure, or Isomap, which can reliably recover low-dimensional structure in perceptual datasets. Another well-studied technique, more suitable for stochastic usage data, is t-SNE by van der Maaten and Hinton.5
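The lookup-table metric can be sketched as follows. Both the equivalence relation Fs (here, "dominant resource of the usage vector") and the table entries are invented for illustration; in the scheme described above they would instead be published with, or learned for, a specific goal.

```python
# Hypothetical Fs: bucket each usage vector by its dominant resource.
def fs(vm) -> str:
    cpu, ram, disk = vm
    return max((("cpu", cpu), ("ram", ram), ("disk", disk)),
               key=lambda kv: kv[1])[0]

# Shareable symmetric table of class-to-class distances (invented values).
TABLE = {
    frozenset(["cpu"]): 0.0,
    frozenset(["ram"]): 0.0,
    frozenset(["disk"]): 0.0,
    frozenset(["cpu", "ram"]): 1.0,
    frozenset(["cpu", "disk"]): 2.0,
    frozenset(["ram", "disk"]): 1.0,
}

def table_distance(vm_a, vm_b) -> float:
    """Distance between two VMs is looked up between their classes."""
    return TABLE[frozenset([fs(vm_a), fs(vm_b)])]
```

Any two VMs in the same class are at distance zero, so a 1-NN model built on `table_distance` only needs one reference representative per class, as stated above.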

ML-FRIENDLY REPRESENTATIONS OF THE CLOUD RESOURCES SPACE

The above discussion suggests that semantic representations of cloud resources should be complemented with an ontological representation of the distance to be used in the resources' metric space, which in turn will support ML models in achieving the specific objective to be attained in managing the multicloud. The straightforward way to do so is to augment the ontology with the representation of the equivalence relations and of the associated distance tables. In general, for each cloud, the AI agent will:

1. choose the equivalence relation that expresses one or more of its nonfunctional objectives;
2. identify the corresponding lookup table on the equivalence classes of resources;
3. use the table to build and tune the corresponding ML model.

It is important to remark that steps 2 and 3 are the interface between the symbolic (ontology-based) and subsymbolic (ML-based) operation of the multicloud management agent, and therefore, equivalence relations are a key point of interest for future standardization of multicloud ontologies.

The extension of the cloud ontology to represent equivalence relations among resources12 will guarantee uniform behavior (if not uniform achievements) on the part of all ML models reconfiguring the cloud to reach an objective defined at the ontology (symbolic) level, such as a nonfunctional property mentioned in a service level agreement.

It is also important to remark that the AI agent can pursue multiple goals simultaneously: to jointly achieve the goal of fairness and the goal of performance, two different metric spaces with different lookup tables can be considered, and two different ML models can be tuned and used. Alternatively, a "hybrid" equivalence class can be defined, and a natural metric used on it.

ACKNOWLEDGMENT

This work was supported in part by the European Union's Horizon 2020 Research and Innovation Program under the TOREADOR Project under Grant 688797 and in part by the mOSAIC Project under Grant 256910.


REFERENCES

1. E. Damiani, C. Ardagna, P. Ceravolo, and N. Scarabottolo, "Toward model-based big data-as-a-service: The TOREADOR approach," in Proc. Eur. Conf. Adv. Databases Inf. Syst., 2017, pp. 3–9.
2. B. Di Martino and A. Esposito, "A tool for mapping and editing of cloud patterns: The semantic cloud patterns editor," Stud. Inform. Control, vol. 27, no. 1, pp. 117–126, 2018.
3. J. Dietrich and C. Elgar, "An ontology based representation of software design patterns," in Proc. Des. Pattern Formalization Techn., Jan. 2007, pp. 258–279.
4. C. Fehling, F. Leymann, R. Retter, W. Schupeck, and P. Arbitter, Cloud Computing Patterns—Fundamentals to Design, Build, and Manage Cloud Applications. New York, NY, USA: Springer, 2014.
5. L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res., vol. 9, pp. 2579–2605, 2008.
6. B. Di Martino, G. Cretella, and A. Esposito, "Towards a legislation-aware cloud computing framework," Procedia Comput. Sci., vol. 68, pp. 127–135, 2015.
7. B. Di Martino, G. Cretella, A. Esposito, and G. Carta, "An OWL ontology to support cloud portability and interoperability," Int. J. Web Grid Services, vol. 11, no. 3, pp. 303–326, 2015.
8. B. Di Martino, A. Esposito, and G. Cretella, "Semantic representation of cloud patterns and services with automated reasoning to support cloud application portability," IEEE Trans. Cloud Comput., vol. 5, no. 4, pp. 765–779, Oct./Dec. 2017.
9. A. Palesandro, M. Lacoste, N. Bennani, C. Ghedira-Guegan, and D. Bourge, "MANTUS: Putting aspects to work for flexible multi-cloud deployment," in Proc. IEEE 10th Int. Conf. Cloud Comput., 2017, pp. 656–663.
10. D. Petcu et al., "Experiences in building a mosaic of clouds," J. Cloud Comput., vol. 2, no. 1, pp. 2–12, 2013.
11. A. Bar-Hillel, "Learning distance functions using equivalence relations," in Proc. 20th Int. Conf. Mach. Learn., 2003, pp. 11–18.
12. N. Guarino and C. Welty, "Identity, unity, and individuality: Towards a formal toolkit for ontological analysis," in Proc. 14th Eur. Conf. Artif. Intell., 2000, pp. 219–223.

Beniamino Di Martino is a Full Professor at the University of Campania. He is an author of 14 international books and more than 300 publications in international journals and conferences; has been the coordinator of the EU-funded FP7-ICT Project mOSAIC and participates in various international research projects; is an editor/associate editor of seven international journals and an editorial board member of several international journals; is vice chair of the Executive Board of the IEEE CS Technical Committee on Scalable Computing; and is a member of the IEEE WG for the IEEE P3203 Standard on Cloud Interoperability, the IEEE Intercloud Testbed Initiative, the IEEE Technical Committees on Scalable Computing and on Big Data, the Cloud Standards Customer Council, and the Cloud Experts' Group of the European Commission. Contact him at [email protected].

Antonio Esposito is a Postdoc with the Department of Engineering, University of Campania "Luigi Vanvitelli" (Italy). He received the Ph.D. degree in December 2016, with a thesis on a pattern-guided semantic approach to the solution of portability and interoperability issues in the cloud, enabling automatic services composition. Contact him at [email protected].

Ernesto Damiani is a Full Professor at the University of Milan, where he leads the Secure Software Architectures Lab. He is the founding director of the Cyber-Physical Systems Research Center, Khalifa University, UAE. He is an author of more than 500 publications in international journals and conferences and of several patents. He has been the coordinator of the EU-funded H2020 Project TOREADOR and participates in various international research projects. He is an ACM Distinguished Scientist and a recipient of the Stephen Yau Award. He received a doctorate honoris causa from INSA Lyon (France) for his contributions to research and innovation architectures for big data analytics. Contact him at [email protected].



This article originally appeared in IEEE Internet Computing, vol. 23, no. 1, 2019.


16 June 2019 Published by the IEEE Computer Society 2469-7087/19/$33.00 © 2019 IEEE


Serverless Computing: From Planet Mars to the Cloud

Serverless computing is a new way of managing computations in the cloud. We show how it can be put to work for scientific data analysis. For this, we detail our serverless architecture for an application analyzing data from one of the instruments onboard the ESA Mars Express orbiter, and then we compare it with a traditional server solution.

SERVERLESS AND (PUBLIC) CLOUD COMPUTING

Serverless computing is an execution model where the user provides code to be run without any involvement in server management or capacity planning. Many serverless implementations are offered in the form of compute runtimes, which execute application logic but do not persistently store data. As the uploaded code is exposed to the outside world, the developer no longer needs to be involved in multitasking, handling requests, or operating system costs (installation, maintenance, and licenses). Serverless computing is a form of utility computing and paradoxically relies on actual servers maintained by cloud computing providers.

Cloud computing is a provision model that gives access to dynamic, elastic, and on-demand computational resources. Within the general deployment models, public clouds have attracted much interest among the scientific community in recent years. Public cloud services are provided by an independent organization that owns compute resources, which are then offered to their customers. Public cloud users, especially those from the science field, find this pay-as-you-go pricing model to be very convenient. Its great flexibility allows budgets to be easily adapted to computing needs.

José Luis Vázquez-Poletti, Universidad Complutense de Madrid

Ignacio Martín Llorente, Universidad Complutense de Madrid; Harvard University

Editors: Konrad Hinsen, [email protected]; Matthew Turk, [email protected]

DEPARTMENT: SCIENTIFIC PROGRAMMING

Computing in Science & Engineering, November/December 2018, 73

Published by the IEEE Computer Society, 1521-9615/18/$33.00 © 2018 IEEE


Amazon Web Services (AWS) is one of the most widely used public cloud platforms. Among its numerous services, Lambda (aws.amazon.com/lambda) offers what can be regarded as serverless computing. Under the premise “run code, not servers,” AWS Lambda makes it possible to run code in response to specific events while transparently managing the underlying compute resources for the user.
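As a minimal illustration of the “run code, not servers” model (our sketch, not code from the mission), this is the smallest shape a Node.js 6.10 Lambda handler takes; on AWS the function would be exported and invoked by the platform once per trigger event, while here it is simply called locally:

```javascript
// Minimal Lambda-style handler (Node.js 6.10 callback convention).
// On AWS this would be wired up as `exports.handler = handler`; the
// event payload and file name below are made up for illustration.
function handler(event, context, callback) {
  // Real work (e.g., running an executable on the event's data file) goes here.
  callback(null, 'processed ' + event.file);
}

// Local invocation standing in for a trigger event:
handler({ file: 'demo.dat' }, {}, function (err, result) {
  console.log(result);   // prints "processed demo.dat"
});
```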

AN APPLICATION FROM A NEIGHBORING PLANET

The application ported to the Lambda serverless environment processes data from the Mars Advanced Radar for Subsurface and Ionosphere Sounding (MARSIS).1 MARSIS is a pulse-limited and low-frequency radar sounder and altimeter installed on the Mars Express orbiter from the European Space Agency (sci.esa.int/mars-express/), which has been traveling around Mars since 2003. The instrument gained fame in July 2018 when it detected liquid water hidden beneath the Martian South Pole.2

The application processes MARSIS data from the active ionospheric sounding (AIS) experiment and displays it graphically in order to identify magnetic fields. It also allows for the detection of induced magnetic fields deep in the Martian ionosphere, compressing its plasma.3 Through this process, new avenues have become available to study the effects of solar wind and understand dust storms.

Figure 1 shows the process performed by the application, which reuses software previously used in the mission.

Our objective was to process every data file once it became available. This minimized response time, achieving an optimal performance-cost trade-off. The Mars Express mission should be extended to at least the year 2022, and the study of the ionosphere has been identified as one of the mission's priorities, as observations performed by the orbiter will see their coverage augmented and their long-time series extended.

In light of this, a large MARSIS dataset was generated as our starting point to evaluate the best computing approach. This consisted of 9761 files with a total size of 92 GB and was retrieved between 2005 and 2016 (see Figure 2). The idea was that if our proposal worked with this large dataset, it should also be suitable for future data.

Figure 1. Application overview. Data from the AIS experiment are converted into images, which are then used to calculate magnetic fields.


Execution time may take from milliseconds to half a minute for each data file depending on its size (see Figure 3). As there were many independent, simple processes involved, where each computation needed to be optimized for latency and completed in near real time, we considered porting to a serverless environment to be the next logical step.

GETTING IT DONE WITH AWS LAMBDA

With regard to Lambda, a so-called function was created that contained the following.

1. An executable that performs the core operations on the specified data file. This executable, programmed in C, was generated on an Amazon Linux box with static libraries. Its size is 21 KB.

2. The main function code that handles the data file, invoking the executable and copying the output to a specified repository. Node.js 6.10 was used to program it, but Lambda also offers Java, C#, and Python bindings. Its size is 11.1 KB.

The main function code is also responsible for retrieving a reference file needed in the process. The reason for not including it in the function bundle is its size (77 MB), which exceeds the maximum allowed. All output files must be generated in a scratch directory (/tmp) limited to 512 MB. The rest of the filesystem where the function is executed is read-only.

Figure 2. Arrival dates and times of the entire dataset from the AIS experiment between 2005 and 2016.

Figure 3. Distribution of data file sizes from the AIS experiment between 2005 and 2016 (in MB).


The function was then configured to be triggered every time a new file with the .dat suffix appeared in a specified Simple Storage Service (S3) bucket. In Lambda, there are many event sources that can be used for triggering functions, all of them related to AWS services. Some examples are email reception by Simple Email Service, data addition to a DynamoDB table, and scheduled triggers using CloudWatch events.
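For a sense of what such a trigger delivers, the sketch below builds a fake S3 event in the documented AWS shape (Records → s3.bucket.name / s3.object.key; the bucket and key names are invented) and filters it for the .dat suffix the way the trigger configuration above does:

```javascript
// A fabricated S3 event in the shape Lambda receives when a new object
// lands in the input bucket (bucket/key values are hypothetical).
const event = {
  Records: [{
    s3: {
      bucket: { name: 'marsis-input' },
      object: { key: 'FRM_AIS_RDR_0001.dat' }
    }
  }]
};

function extractDatFiles(evt) {
  return evt.Records
    .map(r => ({ bucket: r.s3.bucket.name, key: r.s3.object.key }))
    .filter(f => f.key.endsWith('.dat'));   // mirror the ".dat suffix" filter
}

console.log(extractDatFiles(event));   // one { bucket, key } entry for the .dat file
```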

Returning to our function, output files are moved to another S3 bucket. An overview of the architecture is shown in Figure 4.

Since Lambda is a serverless service, users do not need to be concerned with server specifications. Functions were conceived to be small and short, and the price depends on the following parameters.

• Memory assigned to the function. We specified 704 MB, with a maximum of 3008 MB.
• Deadline. Even if 30 s could be enough, we specified 1 min, with a maximum of 5 min.

When one of the previous limits is reached, Lambda stops the execution of the function.
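In the AWS SDK for JavaScript, these two knobs map onto the MemorySize and Timeout parameters of a function's configuration. The snippet below only builds the parameter object for a lambda.createFunction() call without making it; everything other than MemorySize and Timeout (the function name, role ARN, and deployment bucket) is an illustrative placeholder:

```javascript
// Parameter object for lambda.createFunction() in the AWS SDK for
// JavaScript; only MemorySize and Timeout reflect values from the text.
const params = {
  FunctionName: 'marsis-ais-processor',                          // hypothetical name
  Runtime: 'nodejs6.10',
  Handler: 'index.handler',
  Role: 'arn:aws:iam::123456789012:role/lambda-exec',            // placeholder ARN
  Code: { S3Bucket: 'deploy-bucket', S3Key: 'function.zip' },    // placeholder
  MemorySize: 704,   // MB of memory assigned (3008 MB maximum at the time)
  Timeout: 60        // seconds; Lambda stops the function at this deadline
};
console.log(params.MemorySize + ' MB, ' + params.Timeout + ' s');
```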

TUNING OUR FUNCTION: COLD AND WARM CONTAINERS

When a function is invoked, Lambda releases a container (sandbox) to be allocated within the AWS infrastructure, function files are copied onto it, and execution takes place. If several trigger events occur at the same time, several containers are released in order to meet the demand.

The lifetime of each of these containers is 15 min after the execution of the function. If there is a new invocation, the container is reused, including all of its files. We took advantage of this in order to speed up the execution of the function. For instance, a warm (reused) container does not require the function to obtain the reference file (77 MB), because it has already been downloaded. Also, a warm container skips the Node.js language initialization and the code initialization itself.
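The mechanism behind this speedup can be sketched in a few lines (our illustration; the names are made up): variables initialized at module scope survive between invocations in a reused container, so the expensive download happens only on a cold start.

```javascript
// Warm-container reuse: module-scope state persists while the container lives,
// so the 77-MB reference file is fetched only on a cold start.
let referenceData = null;   // survives across invocations in a warm container
let coldStarts = 0;

function fetchReferenceFile() {
  coldStarts += 1;          // stand-in for the S3 download of the reference file
  return 'reference-file-contents';
}

function handleEvent(event) {
  if (referenceData === null) {           // cold container: pay the download cost
    referenceData = fetchReferenceFile();
  }
  // ...run the executable on event.key using referenceData...
  return { key: event.key, coldStartsSoFar: coldStarts };
}

// Two invocations against the same container: only the first is cold.
console.log(handleEvent({ key: 'orbit1.dat' }).coldStartsSoFar);  // 1
console.log(handleEvent({ key: 'orbit2.dat' }).coldStartsSoFar);  // still 1
```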

Figure 4. Architecture overview showing AWS services involved. Official icon set provided by Amazon Web Services.


We finally estimated the container average warming time (execution time in cold containers minus that in warm ones) to be 5 s.

The full function structure is described in Figure 5.

SERVERLESS OR SERVERFUL? A PERFORMANCE AND COST ANALYSIS

This section compares our serverless solution with a more traditional server-based one, but still in the cloud, using AWS Elastic Compute Cloud (EC2).

In the AWS EC2 case, a t2.small machine (1 virtual CPU, 2 GB memory) was prepared with everything needed for processing the input files when they became available, including the reference file. As keeping this machine up 24/7 was not a valid option, the following procedure was considered.

1. Start the machine and download the .dat file from the S3 input bucket.
2. Run the executable, one .dat file at a time.
3. Upload the output file to the S3 output bucket.
4. If there is a new .dat file in less than 1 min, go to 2.
5. Stop the machine.
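The five steps above can be sketched as a control loop (no real AWS calls; the helper names are hypothetical stand-ins that simply record the order of operations):

```javascript
// EC2 procedure as a control-loop sketch; helpers just log the operation order.
const log = [];
const queue = ['orbit1.dat', 'orbit2.dat'];             // files in the S3 input bucket

function startMachine() { log.push('start'); }          // step 1
function processFile(f) { log.push('process ' + f); }   // step 2 (run the executable)
function uploadOutput(f) { log.push('upload ' + f); }   // step 3
function stopMachine() { log.push('stop'); }            // step 5

function runBatch(files) {
  startMachine();
  let file;
  while ((file = files.shift()) !== undefined) {        // step 4: loop while new .dat
    processFile(file);                                  //   files keep arriving within
    uploadOutput(file);                                 //   the 1-min window
  }
  stopMachine();
}

runBatch(queue);
console.log(log.join(' | '));
// start | process orbit1.dat | upload orbit1.dat | process orbit2.dat | upload orbit2.dat | stop
```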

In the EC2 simulations, we assumed the existence of an external program that detects the availability of new .dat files in the S3 input bucket. As explained before, this process is automatic in Lambda, where the user can easily define triggers.

Figure 5. Flowchart describing the lifecycle of our AWS Lambda function.


Stopping the machine ensures that its configuration and files (executable, reference file) will be ready for the next process without any further cost (which is $0.023/h when running). Starting the machine takes an average of 26 s, and the file transfer takes around 4 s.

The arrival of files, as detailed in Figure 2, was simulated in EC2. The machine would be up a total of 173 h 38 min (including the 1-min wait for new .dat files). The execution of 4776 processes (48.92% of the dataset) was delayed because of machines that were either stopped or busy. In the first case, boot time delays were applied. In the second, a new machine was started if its boot time was lower than the estimated remaining time for the active process.

During the simulations, no more than two simultaneous instances were needed. This happened 512 times. However, the accumulated delay was 39 h 20 min: an average of 29 s for each of the 4776 affected processes.

Using the virtual machine approach costs $3.99. Storage with S3 involves a very small cost, as it charges $0.023/GB when transfers are less than 50 TB/month. The total cost would be $2.21: $2.12 corresponding to input transfers (92 GB) and $0.092 to output transfers (4 GB).
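The S3 figures above follow directly from the per-gigabyte rate; as a quick check (rate and sizes taken from the text):

```javascript
// Reproducing the storage-cost arithmetic: S3 at $0.023/GB under the
// 50-TB/month tier, for 92 GB of input and 4 GB of output.
const S3_RATE = 0.023;              // USD per GB
const inputGB = 92, outputGB = 4;

const inputCost = inputGB * S3_RATE;    // 92 * 0.023  = 2.116  -> ~$2.12
const outputCost = outputGB * S3_RATE;  // 4  * 0.023  = 0.092  ->  $0.092
const total = inputCost + outputCost;   // 2.208               -> ~$2.21

console.log(inputCost.toFixed(2), outputCost.toFixed(3), total.toFixed(2));
// 2.12 0.092 2.21
```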

Moving to Lambda, the total usage time was 10 h 11 min, but AWS would bill 8 min more because of rounding up. With regard to container temperature, 898 were cold and 8863 were warm, taking advantage of the tweak described before. Also, 5 s was the average time for container warming, whereas the accumulated delay for Lambda was 1 h 15 min.

Table 1 shows a comparison of the execution times in both solutions for a set of file sizes. As shown, Lambda offers the best performance for all selected cases.

Prices in Lambda depend on the chosen configuration. In our case (704 MB and a 1-min deadline), the total cost would be $4.25 just for execution. Storage has the same cost as in the EC2 solution ($2.21), as S3 buckets are used.

As can be seen in the breakdown of costs for each of the solutions shown in Table 2, the EC2 solution is slightly cheaper than the Lambda one. On the other hand, Table 1 shows that execution time and delays for Lambda are lower. Moreover, execution on EC2 would require the development of a trigger system.

Table 1. Comparison of execution times in AWS EC2 and Lambda considering file sizes.

  File size (MB)    Execution time AWS EC2 (ms)    Execution time AWS Lambda (ms)
  4.9               3374                           1800
  13                9228                           3700
  17                11 796                         4600

Table 2. Costs for each of the solutions.

              AWS EC2    AWS Lambda
  Execution   $3.99      $4.25
  Storage     $2.21      $2.21

AWS Lambda can be a valid solution if the demand is for low latency processing at a reasonable cost, and the underlying infrastructure is not a concern (only two performance parameters can be specified). Serverless computing requires processes to be atomized, and Lambda is limited by the Amazon Linux image it comes with, static libraries, and what can be installed and used in less than the time available for execution. If you rely on some of the many services offered by Amazon or you are willing to


migrate from other providers, orchestration with Lambda is a must. Obviously, the drawback comes in the form of vendor lock-in.

ACKNOWLEDGMENTS

The authors would like to thank the ESA Mars Express mission team (in particular, its Project Scientist, Dr. Dmitri Titov) and their colleagues from the UCM Martian Studies Group (in particular, Prof. Luis Vázquez, Dr. María Ramírez-Nicolás, Dr. Pedro J. Pascual, Dr. Salvador Jiménez, and Dr. David Usero).

The authors are also grateful for the support provided by the Spanish Ministry of Economy and Competitiveness under project number TIN2015-65469-P and by the European Commission under the IN-TIME project under Grant 823934.

REFERENCES

1. G. Picardi, D. Biccardi, R. Seu, J. Plaut, W. T. K. Johnson, and R. L. Jordan, “MARSIS: Mars advanced radar for subsurface and ionosphere sounding,” in Mars Express: The Scientific Payload. Noordwijk, The Netherlands: ESA Publ. Div., 2004, pp. 51–69.

2. R. Orosei et al., “Radar evidence of subglacial liquid water on Mars,” Science, 2018, http://science.sciencemag.org/content/early/2018/07/24/science.aar7268

3. M. Ramírez-Nicolás et al., “The effect of the induced magnetic field on the electron density vertical profile of the Mars’ ionosphere: A Mars Express MARSIS radar data analysis and interpretation, a case study,” Planet. Space Sci., vol. 20, pp. 49–62, 2016.

ABOUT THE AUTHORS

José Luis Vázquez-Poletti is an Associate Professor with the Universidad Complutense de Madrid, Madrid, Spain, where he is also the Head of its Open Source Software and Open Technologies Office. His research interests include distributed, parallel, and data-intensive computing technologies, as well as innovative applications of those technologies to the scientific area. He received the Ph.D. degree in computer architecture from the Universidad Complutense de Madrid. Contact him at [email protected].

Ignacio Martín Llorente is a Visiting Professor with the Harvard John A. Paulson School of Engineering and Applied Sciences, Cambridge, MA, USA, a Professor in computer architecture and the Head of the Data-Intensive Cloud Lab, Universidad Complutense de Madrid, Madrid, Spain, and the Co-Founder and Director of the OpenNebula open-source project for cloud computing management. His research interests include distributed, parallel, and data-intensive computing technologies, as well as innovative applications of those technologies to business and scientific problems. Contact him at [email protected] or [email protected].


This article originally appeared in Computing in Science & Engineering, vol. 20, no. 6, 2018.


PURPOSE: The IEEE Computer Society is the world’s largest association of computing professionals and is the leading provider of technical information in the field.

MEMBERSHIP: Members receive the monthly magazine Computer, discounts, and opportunities to serve (all activities are led by volunteer members). Membership is open to all IEEE members, affiliate society members, and others interested in the computer field.

COMPUTER SOCIETY WEBSITE: www.computer.org

OMBUDSMAN: Direct unresolved complaints to [email protected].

CHAPTERS: Regular and student chapters worldwide provide the opportunity to interact with colleagues, hear technical experts, and serve the local professional community.

AVAILABLE INFORMATION: To check membership status, report an address change, or obtain more information on any of the following, email Customer Service at [email protected] or call +1 714 821 8380 (international) or our toll-free number, +1 800 272 6657 (US):

• Membership applications
• Publications catalog
• Draft standards and order forms
• Technical committee list
• Technical committee application
• Chapter start-up procedures
• Student scholarship information
• Volunteer leaders/staff directory
• IEEE senior member grade application (requires 10 years practice and significant performance in five of those 10)

PUBLICATIONS AND ACTIVITIES

Computer: The flagship publication of the IEEE Computer Society, Computer publishes peer-reviewed technical content that covers all aspects of computer science, computer engineering, technology, and applications.

Periodicals: The society publishes 12 magazines, 15 transactions, and two letters. Refer to membership application or request information as noted above.

Conference Proceedings & Books: Conference Publishing Services publishes more than 275 titles every year.

Standards Working Groups: More than 150 groups produce IEEE standards used throughout the world.

Technical Committees: TCs provide professional interaction in more than 30 technical areas and directly influence computer engineering conferences and publications.

Conferences/Education: The society holds about 200 conferences each year and sponsors many educational activities, including computing science accreditation.

Certifications: The society offers three software developer credentials. For more information, visit www.computer .org/certification.

2019 BOARD OF GOVERNORS MEETINGS
6–7 June: Hyatt Regency Coral Gables, Miami, FL
(TBD) November: Teleconference

EXECUTIVE COMMITTEE
President: Cecilia Metra; President-Elect: Leila De Floriani; Past President: Hironori Kasahara; First VP: Forrest Shull; Second VP: Avi Mendelson; Secretary: David Lomet; Treasurer: Dimitrios Serpanos; VP, Member & Geographic Activities: Yervant Zorian; VP, Professional & Educational Activities: Kunio Uchiyama; VP, Publications: Fabrizio Lombardi; VP, Standards Activities: Riccardo Mariani; VP, Technical & Conference Activities: William D. Gropp
2018–2019 IEEE Division V Director: John W. Walz
2019 IEEE Division V Director Elect: Thomas M. Conte
2019–2020 IEEE Division VIII Director: Elizabeth L. Burd

BOARD OF GOVERNORS
Term Expiring 2019: Saurabh Bagchi, Leila De Floriani, David S. Ebert, Jill I. Gostin, William Gropp, Sumi Helal, Avi Mendelson
Term Expiring 2020: Andy Chen, John D. Johnson, Sy-Yen Kuo, David Lomet, Dimitrios Serpanos, Forrest Shull, Hayato Yamana
Term Expiring 2021: M. Brian Blake, Fred Douglis, Carlos E. Jimenez-Gomez, Ramalatha Marimuthu, Erik Jan Marinissen, Kunio Uchiyama

EXECUTIVE STAFF
Executive Director: Melissa Russell
Director, Governance & Associate Executive Director: Anne Marie Kelly
Director, Finance & Accounting: Sunny Hwang
Director, Information Technology & Services: Sumit Kacker
Director, Marketing & Sales: Michelle Tubb
Director, Membership Development: Eric Berkowitz

COMPUTER SOCIETY OFFICES
Washington, D.C.: 2001 L St., Ste. 700, Washington, D.C. 20036-4928 • Phone: +1 202 371 0101 • Fax: +1 202 728 9614 • Email: [email protected]
Los Alamitos: 10662 Los Vaqueros Cir., Los Alamitos, CA 90720 • Phone: +1 714 821 8380 • Email: [email protected]
Asia/Pacific: Watanabe Building, 1-4-2 Minami-Aoyama, Minato-ku, Tokyo 107-0062, Japan • Phone: +81 3 3408 3118 • Fax: +81 3 3408 3553 • Email: [email protected]

MEMBERSHIP & PUBLICATION ORDERS
Phone: +1 800 272 6657 • Fax: +1 714 821 4641 • Email: [email protected]

IEEE BOARD OF DIRECTORS
President & CEO: Jose M.D. Moura
President-Elect: Toshio Fukuda
Past President: James A. Jefferies
Secretary: Kathleen Kramer
Treasurer: Joseph V. Lillie
Director & President, IEEE-USA: Thomas M. Coughlin
Director & President, Standards Association: Robert S. Fish
Director & VP, Educational Activities: Witold M. Kinsner
Director & VP, Membership and Geographic Activities: Francis B. Grosz, Jr.
Director & VP, Publication Services & Products: Hulya Kirkici
Director & VP, Technical Activities: K.J. Ray Liu

revised 13 February 2019


24 June 2019 Published by the IEEE Computer Society 2469-7087/19/$33.00 © 2019 IEEE

78 COMPUTER Published by the IEEE Computer Society 0018-9162/17/$33.00 © 2017 IEEE

CLOUD COVER

In the past few decades, aggressive miniaturization of semiconductor circuits has driven the explosive performance improvement of digital systems that have radically reshaped the way we work, entertain, and communicate. At the same time, new paradigms such as cloud and edge computing, as well as the Internet of Things (IoT), enable billions of devices to interconnect intelligently. These devices will generate huge volumes of data that must be processed and analyzed in centralized or decentralized datacenters located close to users. The analysis of this data could lead to new scientific discoveries and new applications that will improve our lives.1

Such advances are at risk, however, because ongoing technology miniaturization appears to be ending, as it frequently causes otherwise-identical nanoscale circuits to exhibit different performance or power-consumption behaviors, even though they're designed using the same processes and architecture (see Figure 1). Such variations are caused by imperfections in the manufacturing process that are magnified as circuits get smaller. This can result in many fabricated chips not meeting their intended performance and power specifications, thereby endangering the correct functionality of products that use them.

Error-Resilient Server Ecosystems for Edge and Cloud Datacenters

Georgios Karakonstantis and Dimitrios S. Nikolopoulos, Queen's University Belfast

Dimitris Gizopoulos, National and Kapodistrian University of Athens

Pedro Trancoso and Yiannakis Sazeides, University of Cyprus

Christos D. Antonopoulos, University of Thessaly

Srikumar Venugopal, IBM Research

Shidhartha Das, ARM Research

The explosive growth of Internet-connected devices that form the Internet of Things and the flood of data they yield require new energy-efficient and error-resilient hardware and software server stacks for next-generation cloud and edge datacenters.



EDITOR: San Murugesan, BRITE Professional Services; [email protected]

Manufacturers try to deal with the huge performance and power variability in fabricated chips, and hide it from the software layers, by adopting pessimistic safety timing margins and redundant error-correction schemes designed to counteract the worst possible scenario.2 Typically, such measures are extremely pessimistic because they're based on rare worst-case operating conditions and because the capabilities of the worst performing chips are far inferior to those of the vast majority of identically manufactured circuits. As a result, the majority of the chips are constrained to operate at the low speed and high power consumption of the relatively few worst-case chips, not at the speed and power they could actually achieve.3

The use of pessimistic timing margins, along with the inability to save power efficiently by scaling down the supply voltage in nanoscale circuits because that would make circuits even more prone to failure, has elevated energy efficiency's significance.4

Reducing processors' energy consumption could not only enable products to meet their tight power budgets but could also let users improve performance by employing more resources or by operating the chips at a higher frequency.5 This would be particularly important for servers, which will soon have to handle the huge amounts of data that the increasing number of interconnected devices will generate, a total estimated to reach 24.3 exabytes per month in 2019.6

IMPROVEMENTS REQUIRE NEW DESIGN APPROACHES

Substantially improving energy efficiency requires new types of error-resilient server ecosystems that can handle hardware components' increased power and performance variability more intelligently than conventional pessimistic paradigms.

The computing industry should see such heterogeneity not as a problem but as an opportunity to improve energy efficiency. This could be done by not artificially constraining all chips' performance based on a few outliers but rather by letting each chip operate according to its true capabilities. Exploiting such heterogeneity requires a shift away from current approaches and a redesign of next-generation servers' hardware and system software.

SERVER ECOSYSTEM

Making the most of heterogeneity requires automated firmware-level procedures to expose each processor's and memory resource's capabilities, even as they change over time. This requires the embedding of diagnostic and health-monitoring daemons in the firmware of any server that evaluates hardware components' operations during the product lifetime (see Figure 2). These daemons would access on-chip sensors and error-detection circuitry to collect and analyze various parameters, such as correctable and uncorrectable errors, performance counters, system crashes and hangs, and thermal and power behavior. This would be similar to the procedures that the machine-check architecture in x86 systems has adopted (www.mcelog.org).
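To make the idea concrete, the sketch below (our illustration, not code from the article) shows the aggregation step such a daemon might perform: summing correctable-error counters per component and flagging any component whose rate exceeds a policy threshold, loosely analogous to how mcelog accumulates x86 machine-check events. The component names and threshold are invented.

```javascript
// Health-monitoring sketch: aggregate correctable-error samples per
// component and flag components exceeding a (made-up) policy threshold.
const THRESHOLD = 5;   // correctable errors per monitoring window

function analyze(samples) {
  const totals = new Map();
  for (const { component, correctableErrors } of samples) {
    totals.set(component, (totals.get(component) || 0) + correctableErrors);
  }
  return [...totals.entries()]
    .filter(([, n]) => n > THRESHOLD)
    .map(([component]) => component);   // components to derate or retire
}

console.log(analyze([
  { component: 'DIMM0', correctableErrors: 4 },
  { component: 'DIMM0', correctableErrors: 3 },   // DIMM0 total: 7 > 5
  { component: 'CPU1',  correctableErrors: 1 },
]));
// [ 'DIMM0' ]
```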

The new system would use a hardware exposure interface (HEI) enhanced to collect the required information and communicate it to the software stack, which would identify energy-efficient voltage, frequency, and refresh-rate states for processors and memory subsystems.

We should also rethink the design of all system-software layers, including hypervisors and resource-management frameworks such as OpenStack, used in today's datacenters. These layers should be able to operate hardware close to its performance and power limits. For example, hypervisors could use this capability to allocate processor and memory resources with different reliability, power, and performance-efficiency characteristics to virtual machines so that they improve performance while reducing the server's energy consumption. Cloud-management frameworks could leverage the same capability to provide better quality of service (QoS) to users and applications.

However, operating hardware outside its normal safety margins could introduce critical system-software


Figure 1. Identical chips, in this case CPUs, might have substantially different performance characteristics, such as operating frequency (Freq), even if they were designed and manufactured using the same processes.


COMPUTER. Published by the IEEE Computer Society. 0018-9162/17/$33.00 © 2017 IEEE

CLOUD COVER

In the past few decades, aggressive miniaturization of semiconductor circuits has driven the explosive performance improvement of digital systems that have radically reshaped the way we work,

entertain, and communicate. At the same time, new

paradigms such as cloud and edge computing, as well as the Internet of Things (IoT), enable billions of devices to interconnect intelligently. These devices will generate huge volumes of data that must be processed and analyzed in centralized or decentralized datacenters located close to users. The analysis of this data could lead to new scientific discoveries and new applications that will improve our lives.1

Such advances are at risk, however, because ongoing technology miniaturization appears to be ending, as it frequently causes otherwise-identical nanoscale circuits to exhibit different performance or power-consumption behaviors, even though they’re designed using the same processes and architecture (see Figure 1). Such variations are caused by imperfections in the manufacturing process

that are magnified as circuits get smaller. This can result in many fabricated chips not meeting

their intended performance and power specifications, thereby endangering the correct functionality of products that use them.

Error-Resilient Server Ecosystems for Edge and Cloud Datacenters

Georgios Karakonstantis and Dimitrios S. Nikolopoulos, Queen’s University Belfast

Dimitris Gizopoulos, National and Kapodistrian University of Athens

Pedro Trancoso and Yiannakis Sazeides, University of Cyprus

Christos D. Antonopoulos, University of Thessaly

Srikumar Venugopal, IBM Research

Shidhartha Das, ARM Research

The explosive growth of Internet-connected devices that form the Internet of Things and the flood of data they yield require new energy-efficient and error-resilient hardware and software server stacks for next-generation cloud and edge datacenters.


errors and even crash servers. We need system-software-resilience approaches to avoid or mitigate such errors and sustain high server availability. In datacenters, this means not degrading the QoS of a customer workload in case of a server crash and adopting mechanisms for speeding up the recovery process.
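One standard way to speed up recovery is checkpoint-and-restart: snapshot workload state periodically so a margin-induced crash rolls back only to the last checkpoint rather than restarting the job. The following is a toy sketch of that general technique, not the authors’ implementation; all names are illustrative.

```python
import pickle

def run_with_checkpoints(steps, state, checkpoint_every=2):
    """Execute `steps` in order, snapshotting state every
    `checkpoint_every` completed steps; on a failure, roll back to the
    last snapshot and resume, so only the work since the last
    checkpoint is redone."""
    snapshot = pickle.dumps(state)      # would be durable storage in a real system
    i = 0
    while i < len(steps):
        try:
            state = steps[i](state)
            i += 1
            if i % checkpoint_every == 0:
                snapshot = pickle.dumps(state)
        except RuntimeError:            # stand-in for a margin-induced error
            state = pickle.loads(snapshot)
            i = (i // checkpoint_every) * checkpoint_every
    return state

# A step that fails once, then succeeds: the run recovers and completes.
calls = {"n": 0}
def flaky(s):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient fault")
    return s + 1

result = run_with_checkpoints([lambda s: s + 1, flaky, lambda s: s * 2], 0)
print(result)  # 4
```

The trade-off the column alludes to is visible even here: more frequent checkpoints shrink the recovery window but add steady-state overhead.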

EMPOWERING INTERNET EVOLUTION

Manufacturers could integrate our proposed ecosystem into standard high-end servers and the newly introduced microservers. Microservers don’t perform as well as mainstream servers yet, but they can service many types of application requests with an appropriate level of performance and with significantly less power consumption.

Integrating our proposed software and hardware ecosystem into servers would help power next-generation datacenters in the cloud and at the network edge, where energy efficiency is particularly critical for minimizing power supply, cooling, and maintenance costs.

Energy-efficient servers would help create a more sustainable Internet. Presently, most Internet processing and storage takes place in the cloud, in massive centralized datacenters that contain tens of thousands of servers, consume as much electricity as a small city, and utilize expensive cooling mechanisms. This won’t be practical in the IoT era because the current Internet infrastructure’s limited network capacity won’t accommodate the exabytes of data that Internet-connected

devices will soon generate. However, using the typical centralized datacenters along with new decentralized datacenters at the network edge, closer to users, could limit the load put on the Internet infrastructure by allowing preprocessing and selective forwarding of data to the cloud. This paradigm, used by both edge and fog computing, has advantages over the cloud paradigm and is being promoted by major companies such as Cisco, Huawei, IBM, and Intel as a way to transform the next-generation Internet.7

Finally, edge resources’ ability to provide all necessary services within a home or small business improves privacy because the data they carry doesn’t have to travel through the public network or reside in third-party datacenters.

[Figure 2 diagram: edge/cloud datacenter use cases, including security threat analysis and countermeasures that exploit the intrinsic hardware heterogeneity, run atop applications, operating systems, and virtual machines; the system-software level comprises a variability-aware OpenStack, a scalable fault-tolerant hypervisor with micro-checkpointing, VM/hardware interaction monitoring and characterization, and VM scheduling on a heterogeneous cluster; the firmware level comprises stress daemons and a health monitor feeding stress and health logs, proactive and adaptive error recovery, hardware resource management, and a hardware exposure interface; beneath lies intrinsically and functionally heterogeneous hardware (cores, static/dynamic memories) governed by system configuration policies.]

Figure 2. This server ecosystem spans all layers of the system stack and enhances it with technologies for monitoring hardware health and optimizing overall operation under an extended range of operating points.


DECEMBER 2017

Realizing our proposed error-resilient, energy-efficient ecosystem faces many challenges,

in part because it requires the design of new technologies and the adoption of a system operation philosophy that departs from the current pessimistic one.

The UniServer Consortium (www.uniserver2020.eu)—consisting of academic institutions and leading companies such as AppliedMicro Circuits, ARM, and IBM—is working toward such a vision. Its goal is the development of a universal system architecture and software ecosystem for servers used for cloud- and edge-based datacenters. The European Community’s Horizon 2020 research program is funding UniServer (grant no. 688540).

The consortium is already implementing our proposed ecosystem in a state-of-the-art X-Gene2 eight-core, ARMv8-based microserver with 28-nm feature sizes. The initial characterization of the server’s processing cores shows that there is a significant safety margin in the supply voltage used to operate each core. Results show that some cores could operate 10 percent below the nominal supply voltage that the manufacturer advises. This could lead to a 38 percent power savings.8

Similarly promising is the characterization of the DRAM memories used on the ARMv8-based microserver, showing that the refresh rate and supply voltage could be decreased by 98 percent and 5 percent from nominal levels, respectively. Such reductions could lead to an average power savings of more than 22 percent across a range of benchmarks.9
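The characterization loop behind such margin studies can be caricatured as a guard-banded downward search: lower the supply voltage step by step while a stress workload still passes, then add a small guardband back. This is a toy model, not the UniServer methodology; the `passes` probe and all numbers are invented.

```python
def find_vmin_mv(passes, v_nominal_mv, step_mv=10, guardband_mv=20):
    """Illustrative voltage-margin search in integer millivolts:
    `passes(v)` is a hypothetical hook that runs a stress test at supply
    voltage `v` and returns True when no errors are observed. The
    returned value is the lowest passing voltage plus a guardband."""
    v = v_nominal_mv
    while passes(v - step_mv):          # keep stepping down while stable
        v -= step_mv
    return v + guardband_mv

# Toy model of a core that is actually stable down to 880 mV
stable_floor_mv = 880
vmin = find_vmin_mv(lambda mv: mv >= stable_floor_mv, v_nominal_mv=980)
print(vmin)  # 900
```

In this toy case the search settles roughly 8 percent below nominal; since dynamic power scales with the square of supply voltage, even modest undervolting compounds into sizable savings, which is consistent in spirit with the margins the consortium reports.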

REFERENCES

1. H. Bauer, M. Patel, and J. Veira, The Internet of Things: Sizing up the Opportunity, online report, McKinsey & Co., December 2014; www.mckinsey.com/industries/semiconductors/our-insights/the-internet-of-things-sizing-up-the-opportunity.

2. S. Ghosh and K. Roy, “Parameter Variation Tolerance and Error Resiliency: New Design Paradigm for the Nanoscale Era,” Proc. IEEE, vol. 98, no. 10, 2010, pp. 1718–1751.

3. P.N. Whatmough et al., “An All-Digital Power-Delivery Monitor for Analysis of a 28nm Dual-Core ARM Cortex-A57 Cluster,” Proc. 2015 IEEE Solid-State Circuits Conf. (ISSCC 15), 2015, pp. 1–3.

4. H. Wong et al., “Implications of Historical Trends in the Electrical Efficiency of Computing,” IEEE Annals of the History of Computing, vol. 33, no. 3, 2011, pp. 46–54.

5. H. Esmaeilzadeh et al., “Dark Silicon and the End of Multicore Scaling,” IEEE Micro, vol. 32, no. 3, 2012, pp. 122–134.

6. Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2016–2021, online white paper, Cisco Systems, March 2017; www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/mobile-white-paper-c11-520862.html.

7. W. Shi et al., “Edge Computing: Vision and Challenges,” IEEE Internet of Things J., vol. 3, no. 5, 2016, pp. 637–646.

8. G. Papadimitriou et al., “Harnessing Voltage Margins for Energy Efficiency in Multicore CPUs,” Proc. 50th IEEE/ACM Int’l Symp. Microarchitecture (MICRO 17), 2017, pp. 503–516.

9. K. Tovletoglou, D. Nikolopoulos, and G. Karakonstantis, “Relaxing DRAM Refresh Rate through Access Pattern Scheduling: A Case Study on Stencil-Based Algorithms,” Proc. 23rd IEEE Int’l Symp. Online Testing and Robust System Design (IOLTS 17), 2017, pp. 45–50.

GEORGIOS KARAKONSTANTIS is an assistant professor in the School of Electronics, Electrical Engineering, and Computer Science (EEECS) at Queen’s University Belfast and the scientific coordinator of the UniServer project. Contact him at [email protected].

DIMITRIOS S. NIKOLOPOULOS is a professor and the head of the School of EEECS at Queen’s University Belfast. Contact him at [email protected].

DIMITRIS GIZOPOULOS is a professor in the Department of Informatics and Telecommunications at the National and Kapodistrian University of Athens, where he leads the Computer Architecture Laboratory. Contact him at [email protected].

PEDRO TRANCOSO is an associate professor in the University of Cyprus’ Department of Computer Science. Contact him at [email protected].

YIANNAKIS SAZEIDES is an associate professor in the University of Cyprus’ Department of Computer Science. Contact him at [email protected].

CHRISTOS D. ANTONOPOULOS is an assistant professor in the University of Thessaly’s Electrical and Computer Engineering Department. Contact him at [email protected].

SRIKUMAR VENUGOPAL is a research scientist at IBM Research–Ireland. Contact him at [email protected].

SHIDHARTHA DAS is a principal research engineer at ARM Research and a Royal Academy of Engineering visiting professor at Newcastle University. Contact him at [email protected].



This article originally appeared in Computer, vol. 50, no. 12, 2017.


ComputingEdge, June 2019. Published by the IEEE Computer Society. 2469-7087/19/$33.00 © 2019 IEEE

Internet of Things, People, and Processes
Editor: Schahram Dustdar • [email protected]

Published by the IEEE Computer Society. 1089-7801/17/$33.00 © 2017 IEEE. IEEE INTERNET COMPUTING

A Serverless Real-Time Data Analytics Platform for Edge Computing

Stefan Nastic, Thomas Rausch, Ognjen Scekic, and Schahram Dustdar • TU Wien

Marjan Gusev, Bojana Koteska, Magdalena Kostoska, and Boro Jakimovski • Ss. Cyril and Methodius University

Sasko Ristov and Radu Prodan • University of Innsbruck

A novel approach implements cloud-supported, real-time data analytics in edge computing applications. The authors introduce their serverless edge-data analytics platform and application model and discuss their main design requirements and challenges, based on real-life healthcare use case scenarios.

With the increasing growth of the Internet of Things (IoT) and edge computing,1-3 an abundance of geographically dispersed computing infrastructure and edge resources remains largely underused for data analytics applications. At the same time, the value of data becomes effectively lost at the edge by remaining inaccessible to the more powerful data analytics in the cloud due to networking costs, latency issues, and limited interoperability between edge devices. The reason for both of these shortcomings is that today’s cloud models do not optimally support data analytics at the volume and variety of data originating from sensors and edge devices, typically characterized by high latencies and response times. There is a hard line between the edge and the cloud parts of analytics applications in terms of responsibilities, design, and runtime considerations. While contemporary solutions for cloud-supported, real-time data analytics mostly apply analytics techniques in a rigid bottom-up approach regardless of the data’s origin,4,5 doing data analytics on the edge forces developers to resort to ad hoc solutions specifically tailored to the available infrastructure. The process is largely manual, task-specific, and error-prone and usually requires good knowledge of the underlying infrastructure. Consequently, when faced with large-scale, heterogeneous resource pools, performing effective data analytics is difficult, if not impossible.

A promising approach to address these issues is the serverless computing paradigm. Serverless computing is an emerging cloud-based execution model in which user-defined functions are seamlessly and transparently hosted and managed by a distributed platform.6 There are multiple commercial and open source implementations of serverless platforms, such as Amazon Web Services Lambda (see http://aws.amazon.com/lambda), Apache OpenWhisk (http://openwhisk.org), or OpenLambda.6 The benefits of the serverless model become especially evident in the context of the described cloud and edge model, as both models seek to mitigate inefficient, error-prone, and costly infrastructure and application management.

In this article, we propose a unified cloud and edge data analytics platform, which extends the notion of serverless computing to the edge and facilitates joint programmatic resource and


JULY/AUGUST 2017 65

analytics management. We introduce a reference architecture for our platform and a serverless application execution model, which enable uniform development and operation of analytics functions, thereby freeing users from worrying about the complexity of the underlying edge infrastructure. Finally, we outline a set of research challenges and design principles to serve as a road map toward a fully-fledged platform for cloud and edge real-time data analytics.

Use Case Scenarios

To better understand the need for distributed cloud and edge processing, we present two scenarios from IoT mobile healthcare (mHealth) and discuss holistic data analytics.

Use Case 1: Measuring Human Vital Signs in Disasters

In case of a major disaster, prompt paramedic attention is crucial to save people’s lives. An extension of major disaster protocol includes wearable biosensors that can be attached to injured people by on-site medics, providing critical information about the patient’s medical condition and helping determine a priority queue for further patient processing.

As soon as the sensors are attached, they start emitting data to nearby portable edge devices, such as a smartphone. Such devices perform only basic, lightweight analytics, such as a triage decision tree, to determine the overall stability of the injured person and inform on-site medics. Once a network connection becomes available, selected data can be transferred to the cloud for more complex analysis that will predict and prioritize more precisely the patients’ treatments, helping improve coordination between hospitals, optimize the time needed to reach a hospital, and provide timely information about patients’ conditions.

Use Case 2: Measuring Human Vital Signs in Everyday Life

In this use case, a wearable biosensor measures a patient’s vital signs during everyday life activities. Sensors continuously stream electrocardiogram readings to a nearby edge device that performs simple data analytics to monitor the patient’s health condition. If an anomaly such as a heart failure is detected, the system immediately notifies medical emergency services. Preprocessed and filtered data are subsequently sent to the cloud, where comprehensive data analytics can be performed to gain better insight into the patient’s overall condition, and help with diagnostics.
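The “simple data analytics” an edge device runs in this scenario can be as small as a range check over the incoming stream. The sketch below is purely illustrative: the function name, the use of R-R intervals, and the thresholds are our own examples, not clinical values or the authors’ algorithm.

```python
def detect_anomalies(rr_intervals_ms, low=400, high=1200):
    """Illustrative edge-side check on a stream of R-R intervals (the
    time between successive ECG R-peaks, in milliseconds): return the
    indices of readings outside a plausible range so the device can
    alert emergency services immediately."""
    return [i for i, rr in enumerate(rr_intervals_ms)
            if not (low <= rr <= high)]

stream = [810, 795, 820, 1900, 805]   # one implausibly long interval
alerts = detect_anomalies(stream)
print(alerts)  # [3]
```

The point is latency: a check this cheap runs on the edge device in microseconds, while the full, filtered stream is forwarded to the cloud for the heavier diagnostics described next.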

Holistic Data Analytics in mHealth

The presented use cases demonstrate the need for consolidating cloud- and edge-based data analytics techniques. To enable prompt reactions to a patient’s changing health condition, low-latency algorithms should process the data at the edge in real time. Conversely, detecting patterns in large amounts of historic patient data requires analytics techniques that depend on cloud storage and processing capabilities. In an architecture that combines cloud and edge data analytics, edge devices should act as a data gateway. Preprocessed and filtered data can be sent to the cloud, where they become persistent and highly available for compute-intensive analytics. To consolidate different techniques, and transparently handle data management, a holistic data analytics platform that unifies cloud and edge resources is needed.

Serverless Platform

Our main objective is to provide a full-stack platform for supporting real-time data analytics across cloud and edge in a uniform manner. The key role of the distributed cloud and

edge platform is to facilitate automated management of the underlying resource pool and optimal placement of analytics functions to support the envisioned serverless execution model. This approach enables combining the benefits of the edge (lower response time and heterogeneous data management) with the computational and storage capabilities of the cloud. For example, time-sensitive data, such as life-critical vital signs, can be analyzed at the edge, close to where data are generated instead of being transported to the cloud for processing. Alternatively, selected data can be forwarded to the cloud for further, more powerful analysis and long-term storage.
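The per-record routing decision this paragraph describes can be sketched as a tiny dispatch function. This is a minimal sketch under invented assumptions; the record fields are hypothetical, not part of the platform’s model.

```python
def dispatch(record):
    """Illustrative placement decision for one data record: analyze
    time-critical data at the edge, forward selected data to the cloud
    for deeper analysis and long-term storage."""
    if record.get("time_critical"):
        return "edge"     # low latency, close to where data are generated
    if record.get("selected"):
        return "cloud"    # powerful analytics and durable storage
    return "discard"      # not needed beyond the edge

print(dispatch({"time_critical": True}))  # edge
print(dispatch({"selected": True}))       # cloud
```

In the actual platform this decision is not hand-coded per record; the point of the serverless model is that the orchestration layer makes it automatically from declared goals.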

Platform Use and Architecture Overview

Figure 1a shows a high-level view of the platform and the main top-down control process (left) and bottom-up data management and delivery process (right). The proposed serverless data analytics paradigm is particularly suitable for managing different granularities of data analytics approaches bottom-up. This means that the edge focuses on local views (for example, per edge gateway), while the cloud supports global views, that is, combining and analyzing data from different edge devices, regions, or even domains. Data are collected from the underlying devices and delivered to the applications via consumption APIs. More importantly, the data analytics can be performed on edge nodes, cloud nodes, or both, and delivered from any of the nodes directly to the application, based on the desired view. Moreover, the top-down control process allows decoupling of application requirements (the what) from concrete realization of those requirements (the how). This allows developers to simply define the analytics function behavior and data-processing business logic and application goals (for example,



regarding provisioning) instead of dealing with the complexity of different management, orchestration, and optimization processes.

Figure 1b shows the platform’s core architecture. We detail the layers of the architecture in the following.

The analytics function wrapper and APIs layer. This layer focuses on executing and managing user-provided data analytics functions — for example, delivering required data to the function and creating the

resulting endpoints. To this end, this layer wraps the analytics functions in executable artifacts such as Linux containers and relies on the underlying layers to perform concrete runtime actions and execution steps.

The orchestration layer. This layer interprets and executes user-defined real-time analytics functions, requirements, and configuration models. It acts as a gluing component, bringing together an application’s configuration model, user-defined

analytics functions, and the platform’s runtime mechanisms. Therefore, the orchestration layer receives the application configuration directives, in terms of high-level objectives such as optimizing network latency. It interprets and analyzes these goals and decides how to orchestrate the underlying resources, as well as the user-defined functions, by invoking the underlying runtime mechanisms. To this end, this layer contains micro (edge-based) and macro (cloud-based) orchestration and control loops. For example, it can use the scheduling and placement mechanisms to determine the most suitable node (cloud or edge) for an analytics function to reduce the network latency.

The runtime mechanisms layer. This is an extensible plug-in layer, providing mechanisms to support executing the actions initiated by the orchestration layer. The deployment, scheduling, elasticity, and basic reasonable defaults for the quality of service (QoS) are core runtime mechanisms. More precisely, the platform determines the minimally required elastic resources, provisions them, deploys, and then schedules and executes analytics functions, which will satisfy QoS requirements. On the other hand, the governance, placement, fault tolerance, and extended QoS mechanisms are optional. For example, in some cases, the data could be confidential and some geographical regions should be excluded. Placing the functions closer to the data and deciding whether to use cloud or edge resources could improve the QoS. Additionally, having a k-fault-tolerant platform that will mitigate failure risks to acceptable levels could further improve the QoS.
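How governance and QoS mechanisms might compose during placement can be sketched as two successive filters. This is a hedged illustration only: the node and requirement fields are hypothetical, not the platform’s actual schema.

```python
def place_function(nodes, requirements):
    """Illustrative composition of runtime mechanisms: governance first
    filters out excluded regions, then the remaining nodes are checked
    against the QoS latency bound, and the lowest-latency admissible
    node wins."""
    allowed = [n for n in nodes
               if n["region"] not in requirements.get("excluded_regions", [])]
    qos_ok = [n for n in allowed
              if n["latency_ms"] <= requirements["max_latency_ms"]]
    if not qos_ok:
        raise RuntimeError("QoS cannot be met on the admissible nodes")
    return min(qos_ok, key=lambda n: n["latency_ms"])

nodes = [
    {"name": "edge-1",  "region": "eu", "latency_ms": 8},
    {"name": "cloud-1", "region": "us", "latency_ms": 90},
]
chosen = place_function(nodes, {"excluded_regions": ["us"],
                                "max_latency_ms": 50})
print(chosen["name"])  # edge-1
```

Note the ordering: governance constraints are hard filters applied before any QoS optimization, which matches the text’s framing of confidentiality-driven region exclusion as non-negotiable.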

Serverless Stream Model

To facilitate the serverless execution of edge real-time data analytics applications, we propose an

Figure 1. Cloud and edge real-time data analytics platform. (a) High-level usage context. (b) Internal software architecture.

[Figure 1a diagram: the top-down control process maps application/control requirements and goals through goal-to-execution mapping, runtime management/enforcement mechanisms, and resource configuration and management, while the bottom-up path runs from data collection through data analytics and data delivery views to data consumption endpoints/APIs; both run over a serverless execution model on a distributed cloud-and-edge platform whose infrastructure resource pool spans the cloud, edge/fog devices, and IoT devices. Figure 1b diagram: the platform stacks a stream application model, an analytics function wrapper and APIs layer, an orchestration layer with a monitor-analyze-plan-execute loop, a plug-ins/extensions runtime mechanisms layer (deployment, scheduling, placement, elasticity, fault tolerance, QoS, governance), and a resources abstraction/virtualization layer.]


extension of the traditional stream processing model. In our serverless stream model (see Figure 2), the transformation function is the core concept and encapsulates user-defined data analytics logic to process data along the stream. These functions are then composed into topologies that enable complex data processing applications. In our model, we consider streams as first-class citizens — that is, streams are defined through all of the presented concepts, as well as function wrappers and stream contracts.
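Composing transformation functions into a topology can be sketched, in its simplest linear form, as ordinary function chaining. This is an illustrative sketch of the general idea, not the platform’s API; the stage functions and record shape are invented.

```python
def compose(*functions):
    """Illustrative topology builder: chain user-defined transformation
    functions into a linear stream topology, so each record flows
    through every stage in order."""
    def topology(record):
        for fn in functions:
            record = fn(record)
        return record
    return topology

# Two toy transformation functions over sensor-like records
denoise = lambda r: {**r, "value": round(r["value"], 1)}
classify = lambda r: {**r, "anomalous": r["value"] > 1.2}

pipeline = compose(denoise, classify)
print(pipeline({"value": 1.2345}))  # {'value': 1.2, 'anomalous': False}
```

Real topologies are generally graphs rather than chains, with fan-out and fan-in between stages, but the essential idea of composing small user-defined functions is the same.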

The wrapper is responsible for encapsulating the transformation functions and exposing a thin API layer, enabling the analytics function layer to treat functions as microservices. This lets our system transparently schedule and deploy functions using container-based deployment strategies, and compose functions to complex topologies. For stateful functions, these wrappers also provide implicit state management. The wrapper transparently handles state replication and migration, and access to a function’s state is controlled via the exposed API.

The contract is a high-level description of how the platform manages deployment and execution of streams and their functions. Specifically, a contract gives a user fine-grained control over the platform’s runtime mechanisms — that is, placement and scaling policies, governance rules, and QoS requirements. A contract is divided into sections, where each section specifies a respective runtime mechanism. The placement section allows control over how analytics functions are deployed across the infrastructure, such as which cloud or edge resources to use. The scaling section allows control over elasticity strategies — that is, how the platform should adapt to varying workloads. Governance rules allow additional definition of restrictions

regarding security or privacy, such as the exclusion of certain geo-graphical regions in the distributed infrastructure. The QoS section al-lows control over QoS requirements the platform should respect, for ex-ample, maximum stream latency or minimum stream throughput. Unless explicitly de�ned in respec-tive sections, the platform will en-act sane defaults for each runtime mechanism.
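A contract of this kind could be expressed declaratively. The sketch below is purely illustrative — the section and key names are our assumption, not a published schema — but it shows the four sections and the fallback to sane defaults for omitted ones:

```python
# Hypothetical contract: each section configures one runtime mechanism.
DEFAULTS = {
    "placement":  {"resources": ["cloud"], "prefer": "cloud"},
    "scaling":    {"policy": "reactive", "max_replicas": 4},
    "governance": {"excluded_regions": []},
    "qos":        {},          # no QoS constraints unless stated
}

def effective_contract(user: dict) -> dict:
    """Sections the user omits fall back to the platform's sane defaults."""
    return {section: user.get(section, default)
            for section, default in DEFAULTS.items()}

# A user who only cares about latency specifies just the QoS section.
contract = effective_contract({"qos": {"max_latency_ms": 50}})
print(contract)
```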

Design Requirements and Challenges

This section outlines the design requirements and challenges in realizing our real-time data analytics platform and its main runtime mechanisms at the edge.

Provisioning Data Analytics Functions and Edge Resources

The serverless paradigm and application execution model undoubtedly have the potential to offer a wide range of benefits for provisioning and managing real-time edge data analytics functions. Unfortunately, due to the inherently different nature of edge infrastructure, for example, in terms of available resources, networks, and so on, provisioning solutions designed for cloud-based serverless landscapes are hardly applicable out of the box in this new computing environment. Fundamental architecture and design assumptions behind such approaches need to be reexamined and specifically tailored for the edge infrastructure to support seamless provisioning of both infrastructure resources and application components.

In our previous work, we developed models7 and middleware8 that enable provisioning edge/IoT resources and applications across large-scale IoT and edge deployments. Although these solutions offer numerous advantages to application developers and operations managers — such as a logically centralized point of operation in a geographically dispersed infrastructure, uniform interaction patterns with both cloud and edge resources,

Figure 2. Serverless stream application model. The transformation function is the core concept and encapsulates user-defined data analytics logic to process the stream data.


Internet of Things, People, and Processes

66 www.computer.org/internet/ IEEE INTERNET COMPUTING

regarding provisioning) instead of dealing with the complexity of different management, orchestration, and optimization processes.

Figure 1b shows the platform’s core architecture. We detail the layers of the architecture in the following.

The analytics function wrapper and APIs layer. This layer focuses on executing and managing user-provided data analytics functions — for example, delivering required data to the function and creating the resulting endpoints. To this end, this layer wraps the analytics functions in executable artifacts such as Linux containers and relies on the underlying layers to perform concrete runtime actions and execution steps.

The orchestration layer. This layer interprets and executes user-defined real-time analytics functions, requirements, and configuration models. It acts as a gluing component, bringing together an application's configuration model, user-defined analytics functions, and the platform's runtime mechanisms. Therefore, the orchestration layer receives the application configuration directives, in terms of high-level objectives such as optimizing network latency. It interprets and analyzes these goals and decides how to orchestrate the underlying resources, as well as the user-defined functions, by invoking the underlying runtime mechanisms. To this end, this layer contains micro (edge-based) and macro (cloud-based) orchestration and control loops. For example, it can use the scheduling and placement mechanisms to determine the most suitable node (cloud or edge) for an analytics function to reduce the network latency.

The runtime mechanisms layer. This is an extensible plug-in layer, providing mechanisms to support executing the actions initiated by the orchestration layer. The deployment, scheduling, elasticity, and basic reasonable defaults for the quality of service (QoS) are core runtime mechanisms. More precisely, the platform determines the minimally required elastic resources, provisions them, deploys, and then schedules and executes analytics functions, which will satisfy QoS requirements. On the other hand, the governance, placement, fault tolerance, and extended QoS mechanisms are optional. For example, in some cases, the data could be confidential and some geographical regions should be excluded. Placing the functions closer to the data and deciding whether to use cloud or edge resources could improve the QoS. Additionally, having a k-fault-tolerant platform that will mitigate failure risks to acceptable levels could further improve the QoS.

Serverless Stream Model

To facilitate the serverless execution of edge real-time data analytics applications, we propose an

Figure 1. Cloud and edge real-time data analytics platform. (a) High-level usage context. (b) Internal software architecture.



32 ComputingEdge June 2019


and utility-based resource consumption — there are still a number of challenges to address to enable our vision of a serverless real-time data analytics platform for the edge. These include fully automated provisioning solutions, where user interaction is limited to providing high-level policies and goals that need to be fulfilled by the platform; design and implementation of provisioning and orchestration mechanisms for the edge network (for example, based on network function virtualization slicing or software-defined networks); and models and techniques for secure edge-resource negotiation based on smart contracts and blockchain technologies.

Availability and Scheduling of Loosely Coupled Edge Resources

The analytics wrapper and APIs layer (see Figure 1b) is an abstraction level that addresses management of data analytics functions. These abstractions, which arrive stochastically at different rates, must be mapped to concrete runtime mechanisms. This orchestration and configuration cannot occur in an easy and predictable manner, because of the limited resources in the runtime mechanisms layer. If there exist unlimited, tightly coupled resources, as in the cloud, then the mapping is ideal, because each new function will be mapped to a new resource runtime mechanism in the described monitor, analyze, plan, and execute loop.

Resource availability in edge computing complicates the execution of the activities planned by the orchestration layer. Multiple methods can address the mismatch between limited available resources and the required processing, communication, or storage demands, such as inserting queues in front of each resource in the pool, or a common (grouped) queue.
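The common (grouped) queue variant can be illustrated with a shared queue feeding a small, fixed pool of edge workers; this is a sketch of the principle only, not the platform's implementation, and the pool size and doubling "analytics function" are stand-ins:

```python
import queue
import threading

tasks = queue.Queue()      # common (grouped) queue in front of the pool
results = []
lock = threading.Lock()

def worker():
    # Each limited edge resource drains the shared queue at its own pace.
    while True:
        item = tasks.get()
        if item is None:   # sentinel: no more work for this worker
            break
        with lock:
            results.append(item * 2)   # stand-in for an analytics function
        tasks.task_done()

POOL_SIZE = 3              # far fewer workers than pending tasks
threads = [threading.Thread(target=worker) for _ in range(POOL_SIZE)]
for t in threads:
    t.start()
for i in range(10):        # ten tasks compete for three resources
    tasks.put(i)
for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()
```

Per-resource queues would instead dedicate one `queue.Queue` to each worker, trading load balancing for isolation.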

Scheduling tasks over provisioned resources is by itself a challenge, as it is an NP-hard, real-time problem. Scheduling in the edge's loosely coupled infrastructure is even more challenging than in the cloud's tightly coupled one. Even if the edge provisions elastic resources, the edge's distribution will generate huge network latency because of its slower wide area network (WAN) compared to the cloud's high-speed LAN. Therefore, the platform should provide lightweight algorithms for optimizing real-time decision making for scheduling, considering multiple conflicting criteria, such as energy efficiency, dependability, real-time response, resiliency, reliability, time-predictability, fault tolerance, and system cost. To balance schedule quality against the time needed to compute a schedule in real time, a footprint analyzer can be implemented, whose historical outputs can be used

Related Work in Scalable Data Analytics Applications

Traditionally, scalable data analytics applications have been realized with cloud-supported, distributed, data-stream processing systems. Maintaining low end-to-end latencies under high data velocity is a major challenge for such systems, particularly in large-scale Internet of Things scenarios. Systems like StreamCloud1 and Twitter Heron2 were developed to handle massive amounts of data, using concepts such as auto-parallelization of stream operators, clustering, and elastic scaling. While these approaches address scalability issues, they don't consider edge-specific features such as locality awareness, which are crucial for achieving low-latency, real-time analytics.

Different approaches extended traditional stream processing with novel algorithms for deploying and scheduling operators at the edge. For example, Apostolos Papageorgiou and colleagues extended the stream topology deployment algorithm of Apache Storm.3 In particular, their deployment optimization approach incorporates quality of service (QoS) metrics of topology-external interactions to reduce communication latencies within stream-processing topologies. Valeria Cardellini and colleagues propose a similar approach, where the scheduling algorithm takes into account network QoS metrics, such as latency, between stream operators.4

Only a few efforts have been made to develop novel architectures for data analytics platforms in the edge. Mahadev Satyanarayanan and colleagues propose GigaSight, a hybrid architecture for computer-vision analytics based on cloudlets.5 In GigaSight, cloudlets filter and process mobile video streams in near real time. Only processing results, such as recognized objects, and corresponding metadata are sent to the cloud, thereby reducing end-to-end latencies as well as bandwidth usage.

References

1. P. Valduriez et al., "StreamCloud: An Elastic and Scalable Data Streaming System," IEEE Trans. Parallel and Distributed Systems, vol. 23, no. 12, 2012, pp. 2351–2365.
2. S. Kulkarni et al., "Twitter Heron: Stream Processing at Scale," Proc. ACM Sigmod Int'l Conf. Management of Data, 2015, pp. 239–250.
3. A. Papageorgiou, E. Poormohammady, and B. Cheng, "Edge-Computing-Aware Deployment of Stream Processing Tasks Based on Topology-External Information: Model, Algorithms, and a Storm-Based Prototype," Proc. IEEE Int'l Conf. Big Data, 2016, pp. 259–266.
4. V. Cardellini et al., "On QoS-Aware Scheduling of Data Stream Applications Over Fog Computing Infrastructures," Proc. IEEE Symp. Computers and Comm., 2015, pp. 271–276.
5. M. Satyanarayanan et al., "Edge Analytics in the Internet of Things," IEEE Pervasive Computing, vol. 14, no. 2, 2015, pp. 24–31.


as heuristics to predict edge resource behavior without interrupting real-time data analytics.
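One lightweight heuristic of the kind called for above is a weighted score over the conflicting criteria, with the footprint analyzer's history adjusting expectations before scoring. The weights, node attributes, and names below are hypothetical, chosen only to make the mechanism concrete:

```python
# Hypothetical lightweight heuristic: score candidate nodes by a weighted
# sum of conflicting criteria instead of solving the NP-hard problem exactly.
WEIGHTS = {"latency_ms": -1.0, "energy": -0.5, "reliability": 20.0}

def score(node: dict) -> float:
    return sum(w * node[k] for k, w in WEIGHTS.items())

def pick_node(candidates, history=None):
    """history plays the role of the footprint analyzer: past observations
    add a predicted latency penalty to a node before scoring."""
    history = history or {}
    adjusted = []
    for n in candidates:
        n = dict(n)
        n["latency_ms"] += history.get(n["name"], 0)
        adjusted.append(n)
    return max(adjusted, key=score)["name"]

nodes = [
    {"name": "edge-1",  "latency_ms": 10, "energy": 5, "reliability": 0.90},
    {"name": "cloud-1", "latency_ms": 80, "energy": 2, "reliability": 0.99},
]
print(pick_node(nodes))                    # the nearby edge node wins
print(pick_node(nodes, {"edge-1": 200}))   # history predicts edge overload
```

Scoring is O(nodes), so the decision stays real-time; the quality of the schedule then rests entirely on how well the history predicts resource behavior.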

Resources Virtualization and Heterogeneous Infrastructure Mapping

Most of the data collection devices (sensors) in the infrastructure resource pool are abstracted by their virtualized model in the resources abstraction layer. The main idea of the model is to transfer data and realize the data analytics functions via cloud/fog computing servers.

We address the autonomous function of IoT devices as a challenge that our model will face. Autonomous processing means that certain data analytics algorithms are also performed on a lower level — that is, the processing is located closer to the source (in the edge). This leads to a dew computing design, including a dew server (edge device) on the path between the IoT devices/sensors and the cloud. Our model implements the autonomous function by mapping the main data analytics functions directly to the IoT devices.

In this context, an open challenge is to solve the interoperability and integration between different device implementations on the infrastructure level. Our resource virtualization layer directly addresses these issues, because each device is presented by its functions.

Finally, portability on the lower layer means transferring defined functions between various devices. For example, this is addressed by the fault tolerance and elasticity runtime mechanisms, which replicate or enable a spare device that will continue performing the defined functions if a resource fails, or share the load if the load increases.

Rapid Elasticity at the Edge

Cloud-based serverless platforms use commodity infrastructure and have a small footprint and short execution duration, combined with statistical multiplexing of a large number of heterogeneous workloads over time.9 The edge, on the other hand, has different characteristics because of its different infrastructure — that is, infrastructure deployment and its geographic dispersion, the topology of network connectivity, and locality-awareness. One challenge is the discoverability of resources. Given the number of heterogeneous devices, standard discovery mechanisms could become burdened with massive workloads; as a solution, and to ensure elasticity, dynamic strategies should be in place for sharing resources. Another perspective is the geographic dispersion, where edge devices are scattered around the network and have limited capacity. A framework should be established that can dynamically select appropriate devices with regard to proximity to data sources, while considering latency and mobility. The framework should differentiate critical and noncritical functions by setting priorities for the services in the case of a heavy workload in a single location. The topology of the network connectivity represents another issue in rapid provisioning of processing capabilities. It should allow imperceptible handling of any increased workload without increasing latencies and cope with heterogeneous devices, while maintaining a secure environment. As a result, the communication protocols among the nodes should be carefully selected and established.
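The selection policy such a framework would need — prefer devices near the data source, and shed noncritical work from saturated locations — can be sketched as follows. The one-dimensional "location", device attributes, and cloud fallback are illustrative assumptions:

```python
# Illustrative only: choose a device near the data source, and let only
# critical functions run on devices that are already at capacity.
def select_device(devices, source_loc, critical):
    usable = []
    for d in devices:
        if d["load"] >= d["capacity"] and not critical:
            continue                      # heavy load: shed noncritical work
        dist = abs(d["loc"] - source_loc) # 1-D stand-in for proximity
        usable.append((dist, d["name"]))
    if not usable:
        return "cloud"                    # fall back when no edge device fits
    return min(usable)[1]                 # closest usable device

devices = [
    {"name": "edge-a", "loc": 1, "load": 5, "capacity": 5},  # saturated
    {"name": "edge-b", "loc": 9, "load": 1, "capacity": 5},
]
print(select_device(devices, source_loc=2, critical=False))  # edge-b
print(select_device(devices, source_loc=2, critical=True))   # edge-a
```

A production framework would additionally track mobility (devices moving between locations) and measured rather than geometric latency.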

Data Management

Apart from the five Vs (volume, velocity, variety, veracity, and value) for big data analytics, some real-time analytics applications require additional data management, which should be performed at the edge as well. The data must be preprocessed before going through the transformation functions. Further, data ingestion, transcription services, deduplication, and natural language-processing algorithms are required.

Many real-time analytics applications, such as our first use case, require updating the model in real time according to a shorter data horizon. This requirement calls for incremental learning algorithms, which compensate for the lack of edge resources compared to the cloud. On the other hand, the lack of storage capacity is compensated by faster data obsolescence — that is, prediction upon smaller data series. The second use case is a good representative of this. In both use cases, the data can be forgotten or destroyed after being processed, which will mitigate the risk of data leakage and bring data privacy and security to an acceptable level. For other real-time analytics, with longer data horizons or slower data obsolescence, the same challenges remain as in the cloud.
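A toy version of this pattern — a deliberately simple stand-in for the incremental learning algorithms the article refers to, with all names our own — keeps only a short sliding window, so data leaving the horizon is automatically forgotten:

```python
from collections import deque

class WindowedPredictor:
    """Toy incremental model: predict the next reading as the mean of a
    short sliding window. Items leaving the window are discarded, so the
    edge node needs no long-term storage and leaks less data."""

    def __init__(self, horizon: int):
        self.window = deque(maxlen=horizon)   # old data is forgotten

    def update(self, value: float) -> None:
        self.window.append(value)             # O(1) incremental update

    def predict(self) -> float:
        return sum(self.window) / len(self.window)

m = WindowedPredictor(horizon=3)
for v in [10.0, 20.0, 30.0, 40.0]:
    m.update(v)
print(m.predict())   # mean of the last 3 readings only
```

Any model with an O(1) update step (running statistics, online regression) fits the same slot; the key property is that prediction quality depends only on the short horizon the edge can afford to keep.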

To tackle these requirements, one challenge is to design a whole workflow of dependent processes in the orchestration layer. Placement, QoS, and fault tolerance should be added to the core runtime mechanisms for these types of real-time analytics, as struggling or failure of some execution nodes could lead to service-level agreement violations.

Despite the elastic computing power of the cloud, along with high-speed networks, real-time analytics in the novel edge computing landscape is becoming ever-more challenging. Our platform facilitates real-time data analytics over cloud and edge computing in a uniform manner. The proposed serverless stream model makes the underused underlying heterogeneous edge infrastructure transparent and enables easier and more intuitive development of various real-time analytics functions,



without worrying about nonfunctional requirements.

Our model will switch the current view of centralized premise and cloud real-time analytics to more distributed, edge, ubiquitous, real-time analytics, in which the data's value won't be lost at the edge and all computing layers will be used evenly. Our vision is that all computing layers work together like a team, without making the cloud the team's most important player. Our platform will act as a team manager that will follow the road map toward a fully-fledged platform for cloud and edge real-time data analytics. It will overcome the challenges of provisioning data analytics functions at edge resources, making them highly available and rapidly elastic, and processing and managing the data in real time without (or before) passing it to the cloud.

Acknowledgments

This work is partially supported by JPI Urban Europe, ERA-NET, under project 5631209 and by bilateral MKD-AUT project 18779 (Scalability and Elasticity Performance of Cloud Services).

References

1. M. Satyanarayanan, "The Emergence of Edge Computing," Computer, vol. 50, no. 1, 2017, pp. 30–39.
2. V. Bahl, "Cloud 2020: Emergence of Micro Data Centers (Cloudlets) for Latency Sensitive Computing," keynote address, Devices and Networking Summit, 21 Apr. 2015; https://channel9.msdn.com/Events/Microsoft-Research/Devices-and-Networking-Summit-2015/Closing-Keynote.
3. F. Bonomi et al., "Fog Computing and Its Role in the Internet of Things," Proc. MCC Workshop Mobile Cloud Computing, 2012, pp. 13–16.
4. P. Valduriez et al., "StreamCloud: An Elastic and Scalable Data Streaming System," IEEE Trans. Parallel and Distributed Systems, vol. 23, no. 12, 2012, pp. 2351–2365.
5. S. Kulkarni et al., "Twitter Heron: Stream Processing at Scale," Proc. ACM Sigmod Int'l Conf. Management of Data, 2015, pp. 239–250.
6. S. Hendrickson et al., "Serverless Computation with OpenLambda," Proc. Usenix Conf. Hot Topics in Cloud Computing, 2016, pp. 33–39.
7. S. Nastic et al., "Provisioning Software-Defined IoT Cloud Systems," Proc. Int'l Conf. Future Internet of Things and Cloud, 2014; doi:10.1109/FiCloud.2014.52.
8. S. Nastic, H.-L. Truong, and S. Dustdar, "A Middleware Infrastructure for Utility-Based Provisioning of IoT Cloud Systems," Proc. 1st IEEE/ACM Symp. Edge Computing, 2016; doi:10.1109/SEC.2016.35.
9. D. Breitgand et al., "SLA-Aware Resource Over-Commit in an IaaS Cloud," Proc. 8th Int'l Conf. Network and Service Management, 2012, pp. 73–81.

Stefan Nastic is a postdoctoral research assistant at the Distributed Systems Group (DSG), TU Wien, Austria. His research interests include Internet of Things (IoT) and edge computing, cloud computing, big data analytics, and smart cities. Nastic has a DrTech in programming, provisioning, and governing IoT cloud systems from TU Wien. Contact him at [email protected].

Thomas Rausch is a PhD student at the DSG, TU Wien, Austria. His research interests include IoT, edge computing, and event-based systems. Rausch has an MS in software engineering and Internet computing from TU Wien. Contact him at [email protected].

Ognjen Scekic is a postdoctoral university assistant at the DSG, TU Wien, Austria. His research interests include social computing, collective adaptive systems, and smart cities. Scekic has a PhD in automated incentive management for social computing (foundations, models, tools and algorithms) from TU Wien. Contact him at [email protected].

Schahram Dustdar is a full professor of computer science, and he heads the DSG at TU Wien. His work focuses on distributed systems. Dustdar is an IEEE Fellow, a member of the Academia Europaea, an ACM Distinguished Scientist, and recipient of the IBM Faculty Award. He is on the editorial boards of IEEE Internet Computing and Computer. He's an associate editor of IEEE Transactions on Services Computing, ACM Transactions on the Web, and ACM Transactions on Internet Technology. He's the editor-in-chief of Springer Computing. Contact him at [email protected]; http://dsg.tuwien.ac.at/staff/sd/.

Marjan Gusev is a professor at Ss. Cyril and Methodius University, Skopje, Macedonia. His research interests include IoT, cloud computing, and eHealth solutions. Gusev has a PhD in electrical sciences from the University of Ljubljana. Contact him at [email protected].

Bojana Koteska is a PhD student and teaching and research assistant at Ss. Cyril and Methodius University. Her research interests include scientific and cloud computing, software quality, and distributed computing. Koteska has an MS in software engineering from Ss. Cyril and Methodius University. Contact her at [email protected].

Magdalena Kostoska is an assistant professor at Ss. Cyril and Methodius University. Her research interests include cloud computing and IoT. Kostoska has a PhD in cloud computing from Ss. Cyril and Methodius University. Contact her at [email protected].

Boro Jakimovski is an associate professor at Ss. Cyril and Methodius University. His research interests include grid computing, high-performance computing, parallel and distributed processing, and genetic algorithms. Jakimovski has a PhD in distributed systems from Ss. Cyril and Methodius University. Contact him at [email protected].

Sasko Ristov is a university assistant (postdoc) at the University of Innsbruck, Austria, and an assistant professor at Ss. Cyril and Methodius University, Skopje, Macedonia. His research interests include performance modeling and optimization and parallel and distributed systems. Ristov has a PhD degree in computer science from Ss. Cyril and Methodius University. Contact him at [email protected].

Radu Prodan is an associate professor at the University of Innsbruck, Austria. His research interests include parallel and distributed systems and software tools (performance, debugging, scheduling, fault tolerance, and energy). Prodan has a habilitation degree in computer science from the University of Innsbruck. He's an associate editor of IEEE Transactions on Parallel and Distributed Systems. Contact him at [email protected].



Internet of Things, People, and Processes

70 www.computer.org/internet/ IEEE INTERNET COMPUTING

without worrying about nonfunc-tional requirements.

Our model will switch the cur-rent view of centralized premise and cloud real-time analytics into more distributed, edge, ubiquitous, real-time analytics, in which the data’s value won’t be lost at the edge and all computing layers will be used evenly. Our vision is that all com-puting layers work together like a team, without making the cloud the team’s most important player. Our platform will act as a team manager that will follow the road map toward a fully-�edged platform for cloud and edge real-time data analytics. It will overcome the challenges of pro-visioning data analytics functions at edge resources, making them highly available and rapidly elastic, and processing and managing the data in real time without (or before) passing it to the cloud.

AcknowledgmentsThis work is partiality supported by JPI Urban

Europe, ERA-NET, under project 5631209 and

by bilateral MKD-AUT project 18779 (Scal-

ability and Elasticity Performance of Cloud

Services).

References
1. M. Satyanarayanan, “The Emergence of Edge Computing,” Computer, vol. 50, no. 1, 2017, pp. 30–39.
2. V. Bahl, “Cloud 2020: Emergence of Micro Data Centers (Cloudlets) for Latency Sensitive Computing,” keynote address, Devices and Networking Summit, 21 Apr. 2015; https://channel9.msdn.com/Events/Microsoft-Research/Devices-and-Networking-Summit-2015/Closing-Keynote.
3. F. Bonomi et al., “Fog Computing and Its Role in the Internet of Things,” Proc. MCC Workshop Mobile Cloud Computing, 2012, pp. 13–16.
4. P. Valduriez et al., “StreamCloud: An Elastic and Scalable Data Streaming System,” IEEE Trans. Parallel and Distributed Systems, vol. 23, no. 12, 2012, pp. 2351–2365.
5. S. Kulkarni et al., “Twitter Heron: Stream Processing at Scale,” Proc. ACM SIGMOD Int’l Conf. Management of Data, 2015, pp. 239–250.
6. S. Hendrickson et al., “Serverless Computation with OpenLambda,” Proc. Usenix Conf. Hot Topics in Cloud Computing, 2016, pp. 33–39.
7. S. Nastic et al., “Provisioning Software-Defined IoT Cloud Systems,” Proc. Int’l Conf. Future Internet of Things and Cloud, 2014; doi:10.1109/FiCloud.2014.52.
8. S. Nastic, H.-L. Truong, and S. Dustdar, “A Middleware Infrastructure for Utility-Based Provisioning of IoT Cloud Systems,” Proc. 1st IEEE/ACM Symp. Edge Computing, 2016; doi:10.1109/SEC.2016.35.
9. D. Breitgand et al., “SLA-Aware Resource Over-Commit in an IaaS Cloud,” Proc. 8th Int’l Conf. Network and Service Management, 2012, pp. 73–81.

Stefan Nastic is a postdoctoral research assistant at the Distributed Systems Group (DSG), TU Wien, Austria. His research interests include Internet of Things (IoT) and edge computing, cloud computing, big data analytics, and smart cities. Nastic has a DrTech in programming, provisioning, and governing IoT cloud systems from TU Wien. Contact him at [email protected].

Thomas Rausch is a PhD student at the DSG, TU Wien, Austria. His research interests include IoT, edge computing, and event-based systems. Rausch has an MS in software engineering and Internet computing from TU Wien. Contact him at [email protected].

Ognjen Scekic is a postdoctoral university assistant at the DSG, TU Wien, Austria. His research interests include social computing, collective adaptive systems, and smart cities. Scekic has a PhD in automated incentive management for social computing (foundations, models, tools and algorithms) from TU Wien. Contact him at [email protected].

Schahram Dustdar is a full professor of computer science, and he heads the DSG at TU Wien. His work focuses on distributed systems. Dustdar is an IEEE Fellow, a member of the Academia Europaea, an ACM Distinguished Scientist, and recipient of the IBM Faculty Award. He is on the editorial boards of IEEE Internet Computing and Computer. He’s an associate editor of IEEE Transactions on Services Computing, ACM Transactions on the Web, and ACM Transactions on Internet Technology. He’s the editor-in-chief of Springer Computing. Contact him at [email protected]; http://dsg.tuwien.ac.at/staff/sd/.

Marjan Gusev is a professor at Ss. Cyril and Methodius University, Skopje, Macedonia. His research interests include IoT, cloud computing, and eHealth solutions. Gusev has a PhD in electrical sciences from the University of Ljubljana. Contact him at [email protected].

Bojana Koteska is a PhD student and teaching and research assistant at Ss. Cyril and Methodius University. Her research interests include scientific and cloud computing, software quality, and distributed computing. Koteska has an MS in software engineering from Ss. Cyril and Methodius University. Contact her at [email protected].

Magdalena Kostoska is an assistant professor at Ss. Cyril and Methodius University. Her research interests include cloud computing and IoT. Kostoska has a PhD in cloud computing from Ss. Cyril and Methodius University. Contact her at [email protected].

Boro Jakimovski is an associate professor at Ss. Cyril and Methodius University. His research interests include grid computing, high-performance computing, parallel and distributed processing, and genetic algorithms. Jakimovski has a PhD in distributed systems from Ss. Cyril and Methodius University. Contact him at [email protected].

Sasko Ristov is a university assistant (postdoc) at the University of Innsbruck, Austria.

This article originally appeared in IEEE Internet Computing, vol. 21, no. 4, 2017.



36 June 2019 Published by the IEEE Computer Society 2469-7087/19/$33.00 © 2019 IEEE

REDIRECTIONS

MARCH/APRIL 2018 | IEEE SOFTWARE 97

Furthermore, to make those estimates, we need only a remarkably small number of attributes. For example, feature subset selection (FSS) is an automatic technique for finding what attributes we can remove without damaging our ability to make a prediction from the data. Recent results show that traditional FSS methods (for example, stepwise regression) can be improved by AI search algorithms that quickly search very large subsets of the attributes to find the most useful ones.7 Applying FSS to software defect predictors or software effort estimators often reduces datasets with 24 to 42 attributes to sets with only two or three attributes.5,8 This means that (24 – 3)/24 ≈ 88 percent to (42 – 3)/42 ≈ 93 percent of the collected attributes aren’t essential to predicting software quality. That is, much of what we thought was important for quality prediction turns out to be mostly irrelevant. And, more interestingly, we can’t tell beforehand what attributes will be the most useful9 until we test those attributes on real project data.
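As a sketch of the kind of FSS described above, the following uses scikit-learn’s recursive feature elimination as a stand-in wrapper method (not the exact algorithms from the cited studies), on a synthetic dataset built so that only three of 24 attributes carry signal:

```python
# Illustrative feature subset selection: 24 attributes, only 3 informative,
# mirroring the "24 attributes reduced to 3" observation in the text.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=24, n_informative=3,
                       random_state=0)

# Wrapper-style selection: repeatedly drop the weakest attribute.
selector = RFE(LinearRegression(), n_features_to_select=3).fit(X, y)
kept = np.flatnonzero(selector.support_)

print("kept attributes:", kept)                            # 3 of 24 survive
print("fraction discarded:", 1 - len(kept) / X.shape[1])   # (24-3)/24 = 0.875
```

On this toy data the selector discards 21 of 24 columns, the same ≈88 percent reduction the text computes.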

This result is surprising, to say the least. Software engineers tend to emphasize the complexities, rather than the simplicity, of software projects. Much has been written about what factors might influence a software project, so developers often spend much effort collecting dozens of attributes. Yet for the projects I’ve mentioned, nearly all those attributes are irrelevant for prediction.

Important Questions
Why are so many things irrelevant? Software engineering data often contains much noise. Collecting data from multiple projects is difficult because the collected data’s meaning can vary from project to project. We can remove such noisy attributes without damaging predictive prowess.

We should also remove most of the closely associated attributes. Suppose a software company assigns its most skilled programmers to mission-critical projects. In that data, “programming skill” would be associated with “criticality.” We could dispense with either (but not both) of those attributes without losing important information.
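The skill/criticality example can be sketched as correlation-based pruning; the column names here are hypothetical stand-ins, and the 0.95 threshold is an arbitrary illustrative choice:

```python
# When two columns are near-duplicates (here, synthetic "skill" and
# "criticality"), one can be dropped without losing information.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
skill = rng.normal(size=300)
df = pd.DataFrame({
    "skill": skill,
    "criticality": skill + rng.normal(scale=0.05, size=300),  # near-duplicate
    "team_size": rng.normal(size=300),                        # unrelated
})

# Drop any column whose correlation with an earlier column exceeds 0.95.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
print("dropped:", to_drop)  # ['criticality']
```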

In addition, there’s the effect of context. Figure 1 shows data from NASA regarding software projects at the Jet Propulsion Laboratory (JPL). Most of the projects are of high complexity. Thus, feature selection would tend to delete “high complexity” because it’s (mostly) a constant across all the data. That is, although no one doubts that software complexity contributes to software cost, for the JPL data, it’s mostly irrelevant.
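The “mostly constant” effect can be shown with variance-based selection; the two columns below are synthetic stand-ins for the JPL attributes, and the threshold is an illustrative choice:

```python
# A feature with (almost) no variance carries no usable signal, so
# variance-based selection removes it -- as with "complexity" above.
import numpy as np
from sklearn.feature_selection import VarianceThreshold

complexity = np.array([1.0] * 97 + [0.0] * 3)  # high for 97 of 100 projects
size = np.linspace(0.0, 10.0, 100)             # an attribute that varies
X = np.column_stack([complexity, size])

selector = VarianceThreshold(threshold=0.05).fit(X)
print(selector.get_support())  # [False  True]: complexity dropped, size kept
```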

So, what does this mean for the practice of analytics? The previous examples tell us that real software projects can surprise and confound our expectations. In new projects, we should check all expectations (that some factor contributes to software quality). That’s the bad news. The good news is that such checks are now fast to run, given the ready availability of data-mining tools (and developers skilled in using them).

More generally, note how these examples are all motivation for this new Redirections department. Our field is rife with any number of truisms that are commonly quoted but rarely checked. Perhaps it’s time to reverse that trend. Let’s all look over old results in software engineering with a fresh eye and ask, “Which of those results are most applicable?” and “Can we confirm those results using contemporary data?” Hopefully, this department will prompt many such inquiries.

References
1. R. Prikladnicki and T. Menzies, “From Voice of Evidence to Redirections,” IEEE Software, vol. 35, no. 1, 2018, pp. 11–13.
2. T. Menzies and T. Zimmermann, “Software Analytics: So What?,” IEEE Software, vol. 30, no. 4, 2013, pp. 31–37; doi:10.1109/MS.2013.86.

[FIGURE 1. The distribution of complexity in a NASA project dataset.5 Complexity (x-axis: Low, Nominal, High, Very high, Extra high) is a constant across nearly all the data (y-axis: no. of projects), so it could be removed from consideration as an attribute during software analytics.]


96 IEEE SOFTWARE | PUBLISHED BY THE IEEE COMPUTER SOCIETY | 0740-7459/18/$33.00 © 2018 IEEE

REDIRECTIONS
Editor: Tim Menzies, North Carolina State University, [email protected]

The Unreasonable Effectiveness of Software Analytics
Tim Menzies

AS RAFAEL PRIKLADNICKI and I commented in last issue’s Voice of Evidence article, it’s time to ask, “What’s surprising about software engineering?”1 Accordingly, in this article, I explore one of the great mysteries of software analytics: why does it work at all?

Software analytics distills large amounts of low-value data into small chunks of very-high-value data. Such chunks are often predictive; that is, they can offer a somewhat accurate prediction about some quality attribute of future projects—for example, the location of potential defects or the development cost.

In theory, software analytics shouldn’t work because software project behavior shouldn’t be predictable. Consider the wide, ever-changing range of tasks being implemented by software and the diverse, continually evolving tools used for software’s construction (for example, IDEs and version control tools). Let’s make that worse. Now consider the constantly changing platforms on which the software executes (desktops, laptops, mobile devices, RESTful services, and so on) or the system developers’ varying skills and experience.

Given all that complex and continual variability, every software project could be unique. And, if that were true, any lesson learned from past projects would have limited applicability for future projects.

This turns out not to be the case. One of the lessons of software analytics is that software projects have predictable properties2 and that at least some of those properties hold for future projects. Stranger still, the number of variables required to make those predictions is small—which means that most of the things we think might affect software quality have little impact in practice.

Not as Complex as We Thought
Consider the task of predicting how long it takes to build software. Given dozens of attributes describing a software project, we can usually guess that project’s development time. We can do this using qualitative methods (for example, planning poker,3 which is favored by the agile community) or parametric-modeling methods (favored by large government projects4,5). However we do it, such estimates are surprisingly accurate.3,5,6
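Parametric estimation of the sort mentioned above can be sketched with a toy model; this is purely illustrative (synthetic attributes, a plain linear regression), not any of the cited estimation methods:

```python
# Toy parametric effort model: development effort as a learned function of
# project attributes. The data and attribute count are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))                  # 12 project attributes
effort = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, effort)
print(round(model.score(X, effort), 3))         # R^2 close to 1 on toy data
```

Note that only two of the 12 attributes actually drive the target here, foreshadowing the feature-selection point made later in the article.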

Call for Submissions
Do you have a surprising result or industrial experience? Something that challenges decades of conventional thinking in software engineering? If so, email a one-paragraph synopsis to [email protected] (use the subject line “REDIRECTIONS: Idea: [your idea]”). If that looks interesting, I’ll ask you to submit a 1,000- to 2,400-word article (where each graph, table, or figure is worth 250 words) for review for IEEE Software. Note: Heresies are more than welcome (if supported by well-reasoned industrial experiences, case studies, or other empirical results). —Tim Menzies


3. K. Molokken-Ostvold and N.C. Haugen, “Combining Estimates with Planning Poker—an Empirical Study,” Proc. 18th Australian Software Eng. Conf. (ASWEC 07), 2007, pp. 349–358; doi:10.1109/ASWEC.2007.15.
4. F. Sarro, A. Petrozziello, and M. Harman, “Multi-objective Software Effort Estimation,” Proc. 38th Int’l Conf. Software Eng. (ICSE 16), 2016, pp. 619–630; doi:10.1145/2884781.2884830.
5. Z. Chen et al., “Finding the Right Data for Software Cost Modeling,” IEEE Software, vol. 22, no. 6, 2005, pp. 38–46; doi:10.1109/MS.2005.151.
6. F. Zhang et al., “Towards Building a Universal Defect Prediction Model with Rank Transformed Predictors,” Empirical Software Eng., vol. 21, no. 5, 2016, pp. 2107–2145.
7. M.A. Hall and G. Holmes, “Benchmarking Attribute Selection Techniques for Discrete Class Data Mining,” IEEE Trans. Knowledge and Data Eng., vol. 15, no. 6, 2003, pp. 1437–1447.
8. T. Menzies et al., “Defect Prediction from Static Code Features: Current Results, Limitations, New Approaches,” Automated Software Eng., vol. 17, no. 4, 2010, pp. 375–407; doi:10.1007/s10515-010-0069-5.
9. R. Krishna and T. Menzies, “Bellwethers: A Baseline Method for Transfer Learning,” 3 Dec. 2017; arxiv.org/abs/1703.06218.


ABOUT THE AUTHOR
TIM MENZIES is a full professor at North Carolina State University, where he leads the RAISE (Real-World AI for Software Engineering) research group. Contact him at [email protected]; menzies.us.


This article originally appeared in IEEE Software, vol. 35, no. 2, 2018.


Page 41: Data Analytics - IEEE Computer Society … · he IEEE Computer Society’s lineup of 12 peer-reviewed tech-nical magazines covers cut-ting-edge topics ranging from software design

2469-7087/19/$33.00 © 2019 IEEE Published by the IEEE Computer Society June 2019 39

00mis00-poria-2882362.3d (Style 4) 05-04-2019 15:37

Multimodal Sentiment Analysis: Addressing Key Issues and Setting Up the Baselines

We compile baselines, along with dataset split, for multimodal sentiment analysis. In this paper, we explore three different deep-learning-based architectures for multimodal sentiment classification, each improving upon the previous. Further, we evaluate these architectures with multiple datasets with fixed train/test partition. We also discuss some major issues, frequently ignored in multimodal sentiment analysis research, e.g., the role of speaker-exclusive models, the importance of different modalities, and generalizability. This framework illustrates the different facets of analysis to be considered while performing multimodal sentiment analysis and, hence, serves as a new benchmark for future research in this emerging field.

Emotion recognition and sentiment analysis are opening up numerous opportunities pertaining to social media in terms of understanding users’ preferences, habits, and their contents.10 With the advancement of communication technology, an abundance of mobile devices, and the rapid rise of social media, a large amount of data is being uploaded as video, rather than text.2 For example, consumers tend to record their opinions on products using a webcam and upload them on social media platforms, such as YouTube and Facebook, to inform the subscribers of their views. Such videos often contain comparisons of products from competing brands, pros and cons of product specifications, and other information that can aid prospective buyers to make informed decisions.

Soujanya Poria, Nanyang Technological University
Navonil Majumder, Instituto Politécnico Nacional
Devamanyu Hazarika, National University of Singapore
Erik Cambria, Nanyang Technological University
Alexander Gelbukh, Instituto Politécnico Nacional
Amir Hussain, Edinburgh Napier University

Editor:Erik [email protected]

DEPARTMENT: Affective Computing and Sentiment Analysis

IEEE Intelligent Systems, November/December 2018, p. 17. Published by the IEEE Computer Society. 1541-1672/18/$33.00 © 2018 IEEE


The primary advantage of analyzing videos over mere text analysis, for detecting emotions and sentiment, is the surplus of behavioral cues. Videos provide multimodal data in terms of vocal and visual modalities. The vocal modulations and facial expressions in the visual data, along with text data, provide important cues to better identify true affective states of the opinion holder. Thus, a combination of text and video data helps to create a better emotion and sentiment analysis model.

Recently, a number of approaches to multimodal sentiment analysis producing interesting results have been proposed.11,13 However, there are major issues that remain mostly unaddressed in this field, such as the consideration of context in classification, the effect of speaker-inclusive and speaker-exclusive scenarios, the impact of each modality across datasets, and the generalization ability of a multimodal sentiment classifier. Not tackling these issues has made it difficult to effectively compare different multimodal sentiment analysis methods. In this paper, we outline some methods that address these issues and set up a baseline based on state-of-the-art methods. We use a deep convolutional neural network (CNN) to extract features from the visual and text modalities.

This paper is organized as follows: The “Related Work” section provides a brief literature review on multimodal sentiment analysis. The “Unimodal Feature Extraction” section briefly discusses the baseline methods; experimental results and discussion are given in the “Experiments and Observations” section, and finally, the “Conclusion” section concludes the paper.

RELATED WORK
In 1970, Ekman et al.6 carried out extensive studies on facial expressions. Their research work showed that universal facial expressions are able to provide sufficient clues to detect emotions. Recent studies on speech-based emotion analysis4 have focused on identifying relevant acoustic features, such as fundamental frequency (pitch), the intensity of utterance, bandwidth, and duration.

As to fusing audio and visual modalities for emotion recognition, two of the early works were done by De Silva et al.5 and Chen et al.3 Both works showed that a bimodal system yielded a higher accuracy than any unimodal system.

While there are many research papers on audio-visual fusion for emotion recognition, only a few research works have been devoted to multimodal emotion or sentiment analysis using text clues along with visual and audio modalities. Wollmer et al.14 fused information from audio, visual, and text modalities to extract emotion and sentiment. Metallinou et al.8 fused audio and text modalities for emotion recognition. Both approaches relied on feature-level fusion.

In this paper, we study the behavior of the method proposed in Poria et al.,12 in aspects rarely addressed by other authors, such as speaker independence, the generalizability of the models, and the performance of individual modalities.

UNIMODAL FEATURE EXTRACTION
For the unimodal feature extraction, we follow the procedure of bc-LSTM.12

Textual Feature Extraction
We employ a CNN for textual feature extraction. Following Kim,16 we obtain n-gram features from each utterance using three distinct convolution filters of sizes 3, 4, and 5, respectively, each having 50 feature maps. Outputs are then subjected to max-pooling followed by rectified linear unit (ReLU) activation. These activations are concatenated and fed to a 100-dimensional (100-D) dense layer, which is regarded as the textual utterance representation. This network is trained at the utterance level with the emotion labels.
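To make the pipeline concrete, here is a minimal NumPy sketch of the textual encoder just described (filter sizes 3, 4, and 5 with 50 feature maps each, max-pooling, ReLU, and a 100-D dense layer). The random weights and 300-D word embeddings are assumptions for illustration; this is not the authors’ implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w):
    # x: (seq_len, embed_dim); w: (k, embed_dim, n_maps)
    # -> (seq_len - k + 1, n_maps): one n-gram feature vector per position.
    k = w.shape[0]
    return np.stack([np.tensordot(x[i:i + k], w, axes=([0, 1], [0, 1]))
                     for i in range(len(x) - k + 1)])

def encode_utterance(x, conv_weights, dense):
    # One conv per filter size, max-pool over time, ReLU, then 100-D dense.
    pooled = np.concatenate([conv1d(x, w).max(axis=0) for w in conv_weights])
    return np.maximum(pooled, 0) @ dense      # utterance representation

embed_dim = 300                               # assumed embedding size
conv_weights = [rng.normal(size=(k, embed_dim, 50)) * 0.01 for k in (3, 4, 5)]
dense = rng.normal(size=(3 * 50, 100)) * 0.01
utterance = rng.normal(size=(20, embed_dim))  # 20 words, 300-D embeddings

features = encode_utterance(utterance, conv_weights, dense)
print(features.shape)  # (100,)
```

In practice the weights would of course be trained against the emotion labels, as the text states.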

Audio and Visual Feature Extraction
Identical to Poria et al.,12 we use a 3-D CNN and openSMILE7 for visual and acoustic feature extraction, respectively.



Fusion
In order to fuse the information extracted from the different modalities, we concatenated the feature vectors representative of the given modalities and sent the combined vector to a classifier for the final classification. This scheme of fusion is called feature-level fusion. Since the fusion involved concatenation and no overlapping, merging, or combination, scaling and normalization of the features were avoided. We discuss the results of this fusion in the “Experiments and Observations” section.
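Feature-level fusion as described above is a plain concatenation; the audio and visual dimensions below are hypothetical stand-ins (only the 100-D text representation is given in the paper):

```python
# Feature-level fusion: concatenate the per-modality feature vectors and hand
# the combined vector to a classifier -- no scaling, merging, or overlap.
import numpy as np

def fuse(text_feat, audio_feat, visual_feat):
    return np.concatenate([text_feat, audio_feat, visual_feat])

fused = fuse(np.ones(100), np.ones(384), np.ones(300))
print(fused.shape)  # (784,): one fused vector per utterance
```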

Baseline Methods

1. bc-LSTM: We follow bc-LSTM,12 which uses a bidirectional LSTM to capture context from the surrounding utterances and generate context-aware utterance representations.

2. SVM: After extracting the features, we merged them and sent the combined vector to an SVM with an RBF kernel for the final classification.
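The SVM baseline above amounts to fitting an RBF-kernel classifier on the fused features; the sketch below uses scikit-learn on synthetic data (the feature dimension and labels are stand-ins):

```python
# SVM baseline: fused utterance features classified with an RBF kernel.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 784))        # fused multimodal features
y_train = (X_train[:, 0] > 0).astype(int)    # synthetic sentiment labels

clf = SVC(kernel="rbf").fit(X_train, y_train)
print(clf.predict(X_train[:5]).shape)  # (5,): one label per utterance
```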

EXPERIMENTS AND OBSERVATIONS
In this section, we discuss the datasets and the experimental settings. We also analyze the results yielded by the aforementioned methods.

Datasets

1. Multimodal Sentiment Analysis Datasets: For our experiments, we used the MOUD dataset, developed by Perez-Rosas et al.9 They collected 80 product review and recommendation videos from YouTube. Each video was segmented into its utterances (498 in total), and each of these was categorized by a sentiment label (positive, negative, or neutral). On average, each video has six utterances, and each utterance is five seconds long. In our experiment, we did not consider neutral labels, which led to a final dataset consisting of 448 utterances. We dropped the neutral label to maintain consistency with previous work. In a similar fashion, Zadeh et al.15 constructed a multimodal sentiment analysis dataset called multimodal opinion-level sentiment intensity (MOSI), which is bigger than MOUD, consisting of 2199 opinionated utterances in 93 videos by 89 speakers. The videos address a large array of topics, such as movies, books, and products. In the experiment to address the generalizability issues, we trained a model on MOSI and tested it on MOUD. Table 1 shows the train/test split of these datasets.

2. Multimodal Emotion Recognition Dataset: The IEMOCAP database1 was collected for the purpose of studying multimodal expressive dyadic interactions. This dataset contains 12 hours of video data split into five-minute dyadic interactions between professional male and female actors. Each interaction session was split into spoken utterances. At least three annotators

Table 1. Person-independent train/test split details of each dataset (~70/30% split). Note: X → Y represents train: X and test: Y; validation sets are extracted from the shuffled train sets using an 80/20% train/val ratio.

Dataset        Train (utterances / videos)   Test (utterances / videos)
IEMOCAP        4290 / 120                    1208 / 31
MOSI           1447 / 62                     752 / 31
MOUD           322 / 59                      115 / 20
MOSI → MOUD    2199 / 93                     437 / 79

Affective Computing and Sentiment Analysis

November/December 2018 19 www.computer.org/intelligent

00mis00-poria-2882362.3d (Style 4) 05-04-2019 15:37

The primary advantage of analyzing videos over mere text analysis, for detecting emotions andsentiment, is the surplus of behavioral cues. Videos provide multimodal data in terms of vocal andvisual modalities. The vocal modulations and facial expressions in the visual data, along with text data,provide important cues to better identify true affective states of the opinion holder. Thus, acombination of text and video data helps to create a better emotion and sentiment analysis model.

Recently, a number of approaches to multimodal sentiment analysis producing interesting results have been proposed.11,13 However, major issues remain mostly unaddressed in this field, such as the consideration of context in classification, the effect of speaker-inclusive versus speaker-exclusive scenarios, the impact of each modality across datasets, and the generalization ability of a multimodal sentiment classifier. Not tackling these issues has made it difficult to compare different multimodal sentiment analysis methods effectively. In this paper, we outline some methods that address these issues and set up a baseline based on state-of-the-art methods. We use a deep convolutional neural network (CNN) to extract features from the visual and text modalities.

This paper is organized as follows: the “Related Work” section provides a brief literature review on multimodal sentiment analysis; the “Unimodal Feature Extraction” section briefly discusses the baseline methods; experimental results and discussion are given in the “Experiments and Observations” section; and, finally, the “Conclusion” section concludes the paper.

RELATED WORK

In 1970, Ekman et al.6 carried out extensive studies on facial expressions. Their research showed that universal facial expressions are able to provide sufficient clues to detect emotions. Recent studies on speech-based emotion analysis4 have focused on identifying relevant acoustic features, such as fundamental frequency (pitch), intensity of utterance, bandwidth, and duration.

As to fusing audio and visual modalities for emotion recognition, two of the early works were done by De Silva et al.5 and Chen et al.3 Both works showed that a bimodal system yielded higher accuracy than any unimodal system.

While there are many research papers on audio-visual fusion for emotion recognition, only a few research works have been devoted to multimodal emotion or sentiment analysis using text cues along with the visual and audio modalities. Wollmer et al.14 fused information from audio, visual, and text modalities to extract emotion and sentiment. Metallinou et al.8 fused audio and text modalities for emotion recognition. Both approaches relied on feature-level fusion.
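Feature-level (early) fusion, as used in these works, amounts to concatenating the per-utterance feature vectors of each modality into one vector before classification. The following minimal sketch illustrates the idea; the dimensionalities and feature values are made up for illustration and are not taken from any of the cited systems.

```python
# Feature-level ("early") fusion by concatenation: per-utterance feature
# vectors from each modality are joined into a single vector that a
# classifier (e.g., an SVM) would then consume. All values are illustrative.

def fuse_features(text_feat, audio_feat, visual_feat):
    """Concatenate per-modality feature vectors into a single vector."""
    return text_feat + audio_feat + visual_feat  # list concatenation

text_feat = [0.2, 0.7, 0.1]   # e.g., textual CNN features (shortened)
audio_feat = [0.5, 0.3]       # e.g., acoustic features
visual_feat = [0.9, 0.4]      # e.g., visual features

fused = fuse_features(text_feat, audio_feat, visual_feat)
assert len(fused) == len(text_feat) + len(audio_feat) + len(visual_feat)
print(fused)  # [0.2, 0.7, 0.1, 0.5, 0.3, 0.9, 0.4]
```

The fused vector is then fed to a single classifier, in contrast to decision-level fusion, where each modality is classified separately and the outputs are combined.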

In this paper, we study the behavior of the method proposed by Poria et al.12 in aspects rarely addressed by other authors, such as speaker independence, the generalizability of the models, and the performance of the individual modalities.

UNIMODAL FEATURE EXTRACTION

For unimodal feature extraction, we follow the procedure of bc-LSTM.12

Textual Feature Extraction

We employ a CNN for textual feature extraction. Following Kim,16 we obtain n-gram features from each utterance using three distinct convolution filters of sizes 3, 4, and 5, each having 50 feature maps. The outputs are then subjected to max-pooling followed by rectified linear unit (ReLU) activation. These activations are concatenated and fed to a 100-dimensional (100-D) dense layer, whose output is regarded as the textual utterance representation. This network is trained at the utterance level with the emotion labels.
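The pipeline above can be sketched in a few lines of NumPy. This is a minimal, untrained sketch: the weights are random, and the word-embedding dimensionality (50) is an assumption for illustration, as the paper does not state it here. Only the filter sizes (3, 4, 5), the 50 feature maps per filter, the pool-then-ReLU order, and the 100-D dense output follow the description above.

```python
import numpy as np

# Sketch of the textual feature extractor: convolution filters of sizes
# 3, 4, and 5 (50 feature maps each) over an utterance's word-embedding
# matrix, max-pooling over time, ReLU, and a 100-D dense layer.
# Weights are random here; in the paper the network is trained on
# utterance-level emotion labels. EMB = 50 is an assumed embedding size.

rng = np.random.default_rng(0)
EMB, N_MAPS, DENSE = 50, 50, 100

def textual_features(embeddings):
    """embeddings: (n_words, EMB) matrix for one utterance."""
    pooled = []
    for size in (3, 4, 5):                      # the three filter sizes
        W = rng.standard_normal((size * EMB, N_MAPS)) * 0.01
        # slide the filter over every n-gram window of the utterance
        windows = np.stack([embeddings[i:i + size].ravel()
                            for i in range(len(embeddings) - size + 1)])
        fmap = windows @ W                      # (n_windows, N_MAPS)
        pooled.append(fmap.max(axis=0))         # max-pool over time
    h = np.maximum(np.concatenate(pooled), 0)   # ReLU on pooled maps
    W_dense = rng.standard_normal((3 * N_MAPS, DENSE)) * 0.01
    return h @ W_dense                          # 100-D utterance vector

utterance = rng.standard_normal((12, EMB))      # a 12-word utterance
print(textual_features(utterance).shape)        # (100,)
```

In the trained system, this 100-D vector is what represents the utterance's textual modality in later fusion.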

Audio and Visual Feature Extraction

Identically to Poria et al.,12 we use a 3-D-CNN and openSMILE7 for visual and acoustic feature extraction, respectively.


IEEE INTELLIGENT SYSTEMS


42 ComputingEdge June 2019


assigned one emotion category to each utterance: happy, sad, neutral, angry, surprised, excited, frustration, disgust, fear, and other. In this paper, we considered only the utterances with majority agreement (i.e., at least two out of three annotators labeled the same emotion) in the emotion classes of angry, happy, sad, and neutral. Table 1 shows the train/test split of this dataset.
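The majority-agreement filter described above can be sketched as follows; the annotation labels used in the example are illustrative, not taken from IEMOCAP itself.

```python
from collections import Counter

# Keep only utterances where at least two of the three annotators agree,
# restricted to the four emotion classes retained in the paper.

KEEP = {"angry", "happy", "sad", "neutral"}

def majority_label(annotations):
    """Return the majority emotion if >=2 annotators agree and the label
    is one of the four retained classes; otherwise None (utterance dropped)."""
    label, count = Counter(annotations).most_common(1)[0]
    if count >= 2 and label in KEEP:
        return label
    return None

print(majority_label(["happy", "happy", "excited"]))    # happy
print(majority_label(["sad", "angry", "frustration"]))  # None (no majority)
print(majority_label(["excited", "excited", "sad"]))    # None (class not kept)
```

Utterances that return None are simply excluded from the experiments.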

Speaker-Exclusive Experiment

Most research on multimodal sentiment analysis is performed with datasets having common speaker(s) between the train and test splits. Given this overlap, however, results do not scale to true generalization. In real-world applications, the model should be robust to speaker variance. Thus, we performed speaker-exclusive experiments to emulate unseen conditions, where the train/test splits of the datasets were completely disjoint with respect to speakers. During testing, our models had to classify emotions and sentiments from utterances by speakers they had never seen before. Below, we elaborate on this speaker-exclusive experiment:

• IEMOCAP: As this dataset contains ten speakers, we performed a ten-fold speaker-exclusive test, where in each round exactly one of the speakers was included in the test set and absent from the train set. The same SVM model was used as before, and accuracy was used as the performance metric.

• MOUD: This dataset contains videos of about 80 people reviewing various products in Spanish. Each utterance in the videos has been labeled as positive, negative, or neutral. In our experiments, we considered only samples with positive and negative sentiment labels. The speakers were partitioned into five groups, and a five-fold person-exclusive experiment was performed, where in every fold one of the five groups was in the test set. Finally, we averaged the accuracies to summarize the results (Table 2).

• MOSI: The MOSI dataset is rich in sentimental expressions, where 93 people review various products in English. The videos are segmented into clips, and each clip is assigned a sentiment score between −3 and +3 by five annotators. We took the average of these labels as the sentiment polarity and thus considered two classes (positive and negative). As with MOUD, the speakers were divided into five groups, and a five-fold person-exclusive experiment was run. For each fold, on average 75 people were in the training set, with the remainder in the test set. The training set was further shuffled and partitioned into an 80%/20% split to generate train and validation sets for parameter tuning.
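The speaker-exclusive protocol above can be sketched as follows: speakers (not utterances) are partitioned into disjoint folds, so every test utterance comes from a speaker the model never saw during training. The MOSI score binarization is also shown. Speaker IDs, the round-robin grouping, and the treatment of an exact-zero mean as positive are illustrative assumptions, not details stated in the paper.

```python
# Speaker-exclusive fold construction plus MOSI polarity binarization.
# Both the speaker IDs and the grouping scheme are made up for illustration.

def speaker_folds(speakers, n_folds):
    """Partition the unique speakers into n_folds disjoint groups."""
    uniq = sorted(set(speakers))
    return [uniq[i::n_folds] for i in range(n_folds)]

def polarity(annotator_scores):
    """Average annotator scores in [-3, +3]; binarize into two classes."""
    mean = sum(annotator_scores) / len(annotator_scores)
    return "positive" if mean >= 0 else "negative"

speakers = [f"spk{i}" for i in range(10)]
folds = speaker_folds(speakers, 5)
# no speaker appears in two folds, so test speakers are always unseen
assert all(set(a).isdisjoint(b) for a in folds for b in folds if a is not b)
print(folds[0])                       # ['spk0', 'spk5']
print(polarity([2, 1, 3, 0, 2]))      # positive
print(polarity([-1, -2, 0, -3, -1]))  # negative
```

In each round, one fold's speakers supply the test utterances and the remaining folds supply the training utterances; accuracies are then averaged over folds.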

Table 2. Accuracy reported for speaker-exclusive (Sp-Ex) and speaker-inclusive (Sp-In) splits for concatenation-based fusion. IEMOCAP: ten-fold speaker-exclusive average. MOUD: five-fold speaker-exclusive average. MOSI: five-fold speaker-exclusive average. Legend: A stands for audio, V for video, T for text.

    Modality      IEMOCAP           MOUD              MOSI
    combination   Sp-In    Sp-Ex    Sp-In    Sp-Ex    Sp-In    Sp-Ex
    A             66.20    51.52    –        53.70    64.00    57.14
    V             60.30    41.79    –        47.68    62.11    58.46
    T             67.90    65.13    –        48.40    78.00    75.16
    T + A         78.20    70.79    –        57.10    76.60    75.72
    T + V         76.30    68.55    –        49.22    78.80    75.06
    A + V         73.90    52.15    –        62.88    66.65    62.4
    T + A + V     81.70    71.59    –        67.90    78.80    76.66


1) Speaker-Inclusive versus Speaker-Exclusive: In comparison with the speaker-inclusive experiment, the speaker-exclusive setting yielded inferior results. This is caused by the absence of knowledge about the speakers during the testing phase. Table 2 shows the performance obtained in the speaker-inclusive experiment. It can be seen that the audio modality consistently performs better than the visual modality in both the MOSI and IEMOCAP datasets. The text modality plays the most important role in both emotion recognition and sentiment analysis. The fusion of the modalities shows more impact for emotion recognition than for sentiment analysis. The root mean square error (RMSE) and TP-rate of the experiments using different modalities on the IEMOCAP and MOSI datasets are shown in Figure 1.
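The two measures referenced above are standard and can be computed as follows; the toy labels and predictions are purely illustrative, not values from the experiments.

```python
import math

# RMSE over real-valued outputs, and per-class TP-rate (recall) over labels.
# The inputs below are toy values for illustration only.

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))

def tp_rate(y_true, y_pred, cls):
    """Fraction of true `cls` instances that were predicted as `cls`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in relevant) / len(relevant)

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))        # ~1.1547
print(tp_rate(["pos", "pos", "neg", "pos"],
              ["pos", "neg", "neg", "pos"], "pos"))   # 0.6666666666666666
```

The TP-rate (also called recall or sensitivity) is reported per class, which makes it informative on class-imbalanced emotion data where overall accuracy can be misleading.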

Contributions of the Modalities

As expected, bimodal and trimodal models performed better than unimodal models in all experiments. Overall, the audio modality performed better than the visual one on all datasets. Except for the MOUD dataset, the unimodal performance of the text modality is substantially better than that of the other two modalities (Figure 2).

Figure 1. Experiments on the IEMOCAP and MOSI datasets. The top-left figure shows the RMSE of the models on IEMOCAP and MOSI. The top-right figure shows the dataset distribution. The bottom-left and bottom-right figures present the TP-rate of the models on the IEMOCAP and MOSI datasets, respectively.

Figure 2. Performance of the modalities on the datasets. The red line indicates the median of the accuracy.


Generalizability of the Models

To test the generalization ability of the models, we trained the framework on the MOSI dataset in a speaker-exclusive fashion and tested it on the MOUD dataset. From Table 3, we can see that the model trained on the MOSI dataset performed poorly on the MOUD dataset.

This is mainly due to the fact that the reviews in the MOUD dataset were recorded in Spanish, so both the audio and text modalities fail miserably in recognition, as the MOSI dataset contains reviews in English. A more comprehensive study would be to perform generalizability tests on datasets of the same language. However, we were unable to do this due to the lack of benchmark datasets. Similarly, cross-dataset generalization experiments were not performed for emotion detection, given the availability of only a single dataset (IEMOCAP).

Table 3. Cross-dataset results: model (with the previous configurations) trained on the MOSI dataset and tested on the MOUD dataset.

    Modality combination   SVM     bc-LSTM
    T                      46.5%   46.9%
    V                      43.3%   49.6%
    A                      42.9%   47.2%
    T + A                  50.4%   51.3%
    T + V                  49.8%   49.8%
    A + V                  46.0%   49.6%
    T + A + V              51.1%   52.7%

Table 4. Accuracy reported for speaker-exclusive classification. IEMOCAP: ten-fold speaker-exclusive average. MOUD: five-fold speaker-exclusive average. MOSI: five-fold speaker-exclusive average. Legend: A represents audio, V represents video, T represents text.

    Modality      IEMOCAP           MOUD              MOSI
    combination   SVM    bc-LSTM    SVM    bc-LSTM    SVM    bc-LSTM
    A             52.9   57.1       51.5   59.9       58.5   60.3
    V             47.0   53.2       46.3   48.5       53.1   55.8
    T             65.5   73.6       49.5   52.1       75.5   78.1
    T + A         70.1   75.4       53.1   60.4       75.8   80.2
    T + V         68.5   75.6       50.2   52.2       76.7   79.3
    A + V         67.6   68.9       62.8   65.3       58.6   62.1
    T + A + V     72.5   76.1       66.1   68.1       77.9   80.3


Comparison Among the Baseline Methods

Table 4 consolidates and compares the performance of all the baseline methods on all the datasets. We evaluated SVM and bc-LSTM fusion on the MOSI, MOUD, and IEMOCAP datasets.

From Table 4, it is clear that bc-LSTM outperforms SVM across all the experiments, so it is apparent that considering context in the classification process substantially boosts performance.

Visualization of the Datasets

The MOSI visualizations present information regarding the dataset distribution within single and multiple modalities (Figure 3). For the textual and audio modalities, the clusters overlap substantially. This problem is reduced for the video modality and for the combination of all modalities, but the overlap is clearly reduced only in the multimodal case. This forms an intuitive explanation of the improved performance in the multimodal scenario. The IEMOCAP visualizations provide insight into the four-class distribution for the uni- and multimodal scenarios, where clearly the

Figure 3. t-SNE 2-D visualization of the MOSI and IEMOCAP datasets when unimodal features and multimodal features are used.


multimodal distribution has the least overlap (increase in red and blue visuals, apart from the rest), with the sparse distribution aiding the classification process.

CONCLUSION

We have presented useful baselines for multimodal sentiment analysis and multimodal emotion recognition. We also discussed some major aspects of the multimodal sentiment analysis problem, such as performance in the unknown-speaker setting and the cross-dataset performance of the models.

Our future work will focus on extracting semantics from the visual features, the relatedness of the cross-modal features, and their fusion. We will also include contextual dependency learning in our model to overcome the limitations mentioned in the previous section.

REFERENCES

1. C. Busso et al., “IEMOCAP: Interactive emotional dyadic motion capture database,” Lang. Resour. Eval., vol. 42, no. 4, pp. 335–359, 2008.
2. S. Poria, E. Cambria, D. Hazarika, N. Mazumder, A. Zadeh, and L.-P. Morency, “Multi-level multiple attentions for context-aware multimodal sentiment analysis,” in Proc. Int. Conf. Data Mining, 2017, pp. 1033–1038.
3. L. S. Chen, T. S. Huang, T. Miyasato, and R. Nakatsu, “Multi-modal human emotion/expression recognition,” in Proc. 3rd IEEE Int. Conf. Autom. Face Gesture Recognit., 1998, pp. 366–371.
4. D. Datcu and L. Rothkrantz, “Semantic audio-visual data fusion for automatic emotion recognition,” Euromedia, 2008, http://mmi.tudelft.nl/pub/dragos/_datcu_euromedia08.pdf
5. L. C. De Silva, T. Miyasato, and R. Nakatsu, “Facial emotion recognition using multi-modal information,” in Proc. IEEE ICICS, 1997, pp. 397–401.
6. P. Ekman, “Universal facial expressions of emotion,” Culture and Personality: Contemporary Readings, Chicago, pp. 151–158, 1974.
7. F. Eyben, M. Wöllmer, and B. Schuller, “openSMILE: The Munich versatile and fast open-source audio feature extractor,” in Proc. 18th ACM Int. Conf. Multimedia, 2010, pp. 1459–1462.
8. A. Metallinou, S. Lee, and S. Narayanan, “Audio-visual emotion recognition using Gaussian mixture models for face and voice,” in Proc. 10th IEEE Int. Symp. ISM, 2008, pp. 250–257.
9. V. Pérez-Rosas, R. Mihalcea, and L.-P. Morency, “Utterance-level multimodal sentiment analysis,” in Proc. 51st Annu. Meeting Assoc. Comput. Linguistics, 2013, pp. 973–982.
10. S. Poria, E. Cambria, R. Bajpai, and A. Hussain, “A review of affective computing: From unimodal analysis to multimodal fusion,” Inf. Fusion, vol. 37, pp. 98–125, 2017.
11. N. Majumder, D. Hazarika, A. Gelbukh, E. Cambria, and S. Poria, “Multimodal sentiment analysis using hierarchical fusion with context modeling,” Knowl.-Based Syst., vol. 161, pp. 124–133, 2018.
12. S. Poria, E. Cambria, D. Hazarika, N. Majumder, A. Zadeh, and L.-P. Morency, “Context-dependent sentiment analysis in user-generated videos,” in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics (Vol. 1: Long Papers), Jul. 2017, pp. 873–883.
13. S. Poria, I. Chaturvedi, E. Cambria, and A. Hussain, “Convolutional MKL based multimodal emotion recognition and sentiment analysis,” in Proc. Int. Conf. Data Mining, 2016, pp. 439–448.
14. M. Wöllmer et al., “YouTube movie reviews: Sentiment analysis in an audio-visual context,” IEEE Intell. Syst., vol. 28, no. 3, pp. 46–53, May/Jun. 2013.
15. A. Zadeh, R. Zellers, E. Pincus, and L.-P. Morency, “Multimodal sentiment intensity analysis in videos: Facial gestures and verbal messages,” IEEE Intell. Syst., vol. 31, no. 6, pp. 82–88, Nov./Dec. 2016.
16. Y. Kim, “Convolutional neural networks for sentence classification,” arXiv:1408.5882, 2014.


ABOUT THE AUTHORS

Soujanya Poria is a Presidential Postdoctoral Fellow with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. Contact him at [email protected].

Navonil Majumder is currently working toward the Ph.D. degree at the Centro de Investigación en Computación (CIC) of the Instituto Politécnico Nacional, Mexico City, Mexico. Contact him at [email protected].

Devamanyu Hazarika is currently working toward the Ph.D. degree at the School of Computing, National University of Singapore, Singapore. Contact him at [email protected].

Erik Cambria is an Assistant Professor with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. Contact him at [email protected].

Alexander Gelbukh is a Research Professor with the CIC of the Instituto Politécnico Nacional, Mexico City, Mexico. Contact him at [email protected].

Amir Hussain is a Full Professor with the School of Computing, Edinburgh Napier University, Edinburgh, U.K. Contact him at [email protected].




This article originally appeared in IEEE Intelligent Systems, vol. 33, no. 6, 2018.


48 June 2019 Published by the IEEE Computer Society 2469-7087/19/$33.00 © 2019 IEEE
90 COMPUTER Published by the IEEE Computer Society 0018-9162/18/$33.00 © 2018 IEEE

CYBER-PHYSICAL SYSTEMS

Cyber-physical systems (CPS) are often regarded as small and simple when it comes to computing. The opposite is true. These systems rely on large and complex hardware and software.

CPS not only suffer from all of the problems of information technology (IT) systems, they also present unique and difficult challenges.

Ford Motor announced at CES 2016 that their F-150 pickup, which for years has been the most popular vehicle in America, now includes over 150 million lines of software in its design. That is a big software project by any measure. Software systems of this size present a number of challenges. They are traditionally built using software from multiple sources and often in multiple languages. Version control and configuration management must be exercised on the code base. The software must be tested.

Now consider that all of this software is wrapped in 4,000 pounds of metal and plastic. The physicality of cyber-physical computing raises the stakes. The cost of failures is measured not just in lost productivity but in physical damage and lives.

Computing systems tied to physical plants must meet constraints that do not concern IT systems. Timing is key to the behavior of cyber-physical systems. Failure to meet timing constraints and deadlines can cause the system to fail. Timing is a first-class functional characteristic in the case of CPS.

Cyber-physical systems also present lifecycle challenges that dwarf even the considerable long lifetimes of many legacy IT systems. Automobiles are typically in service for 10 to 20 years. Many airplanes operate for a half century or more. Roadways and buildings, which are increasingly instrumented with IoT sensors, might last for centuries. Electric power grids operate over continental scales and must operate continuously.

Computing in the Real World Is the Grandest of Challenges

Marilyn Wolf, Georgia Tech

Cyber-physical systems are no longer a sideline for computing—they are a core concern for computer scientists and engineers.


EDITOR DIMITRIOS SERPANOS ISI/ATHENA and University of Patras; [email protected]

Security is also a key challenge that takes on new dimensions in the context of cyber-physical systems. Insecure computing devices in cyber-physical systems—and there are many—threaten not only information but also physical safety. Dimitrios Serpanos and I recently argued (“Scanning the Issue,” Proc. IEEE, vol. 106, no. 1, Jan. 2018, pp. 7–8; https://doi.org/10.1109/JPROC.2017.2777799) that safety and security can’t be considered separate and distinct in cyber-physical systems: insecure systems degrade safety; safety considerations also mean that some traditional computer security approaches can’t be used in cyber-physical systems.

What do these challenges mean? What should we do about them? Do we need to treat cyber-physical computation differently than the development of IT systems?

The first thing we need to do is take CPS seriously. The professional societies and universities often treat cyber-physical systems as secondary. We should promote cyber-physical computing to the forefront of professional concerns.

All computer scientists and engineers should know a few basic principles of cyber-physical computing, just as computer scientists are expected to know the fundamentals of algorithms and operating systems. Real-time and low-power computing are fundamental to CPS and also underlie other major computing applications. Multimedia and virtual reality rely on the principles established by embedded and cyber-physical computing. Data centers must live under power constraints, a topic that was pioneered by embedded computing.

CPS practitioners need to make use of the best practices from IT and scientific computing, a principle that isn't always followed. CPS designers also need to understand when those principles are lacking or inadequate. In many cases, we know how to adapt computing system design to the needs of CPS; in some cases, particularly safety and security, we have more work to do.

To manage the challenges of safety and security, we must extend traditional engineering methodologies. Physical plant designers are used to building machines to the specifications of an unchanging world. Unfortunately, computer systems face an ever-changing set of security challenges and potential attacks. Even cyber-physical systems that aren't directly connected to the Internet are vulnerable to these attack vectors; a number of studies have demonstrated indirect attacks on cyber-physical installations. We need to develop cyber-physical systems that can be updated to respond to new cyberthreats. And we need cyber-physical architectures that are inherently resilient in the face of a wide range of attacks.
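Updatability only helps if the update channel itself can be trusted. A minimal sketch of that idea in Python, assuming a device that accepts only updates it can verify; the key, function names, and HMAC scheme here are illustrative stand-ins (fielded devices typically verify asymmetric signatures, such as Ed25519, against a public key baked into the device):

```python
import hashlib
import hmac

# Hypothetical shared key for illustration only; real devices would not
# embed a symmetric signing key in firmware.
DEVICE_KEY = b"example-device-key"

def sign_update(firmware: bytes, key: bytes = DEVICE_KEY) -> bytes:
    """Produce an authentication tag over a firmware image."""
    return hmac.new(key, firmware, hashlib.sha256).digest()

def apply_update(firmware: bytes, signature: bytes,
                 key: bytes = DEVICE_KEY) -> bool:
    """Accept the update only if its tag verifies (constant-time compare)."""
    if not hmac.compare_digest(sign_update(firmware, key), signature):
        return False
    # ...write the image to the inactive slot and reboot (omitted)...
    return True
```

The design point is that rejection is the default: anything that fails verification is simply not applied, so a compromised update server cannot push arbitrary code.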

Cyber-physical systems are no longer a sideline for computing; they are a core concern for computer scientists and engineers. It's too late to turn back the clock to a day when computers were isolated in data centers. These challenges can be met with a combination of effort and skill.

MARILYN WOLF is the Rhesa "Ray" S. Farmer, Jr. Distinguished Chair in Embedded Computing Systems at Georgia Tech and a Georgia Research Alliance Eminent Scholar. She is a Fellow of IEEE, a Fellow of ACM, a Golden Core Member of the IEEE Computer Society, and a recipient of the ASEE Frederick E. Terman Award. Contact her at [email protected].


90 COMPUTER | Published by the IEEE Computer Society | 0018-9162/18/$33.00 © 2018 IEEE

CYBER-PHYSICAL SYSTEMS

Cyber-physical systems (CPS) are often regarded as small and simple when it comes to computing. The opposite is true. These systems rely on large and complex hardware and software.

CPS not only suffer from all of the problems of information technology (IT) systems, they also present unique and difficult challenges.

Ford Motor announced at CES 2016 that its F-150 pickup, which for years has been the most popular vehicle in America, now includes over 150 million lines of software in its design. That is a big software project by any measure. Software systems of this size present a number of challenges. They are traditionally built using software from multiple sources and often in multiple languages. Version control and configuration management must be exercised on the code base. The software must be tested.

Now consider that all of this software is wrapped in 4,000 pounds of metal and plastic. The physicality of cyber-physical computing raises the stakes. The cost of failures is measured not just in lost productivity but in physical damage and lives.

Computing systems tied to physical plants must meet constraints that do not concern IT systems. Timing is key to the behavior of cyber-physical systems. Failure to meet timing constraints and deadlines can cause the system to fail. Timing is a first-class functional characteristic in the case of CPS.
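To make the point concrete, here is a small sketch (in Python, with hypothetical sensor and actuator stubs) that treats a missed deadline as a functional failure rather than as a performance statistic:

```python
import time

DEADLINE_S = 0.010  # hypothetical 10-ms control-loop deadline

def read_sensor() -> float:
    return 0.0          # stand-in for a real sensor read

def actuate(command: float) -> None:
    pass                # stand-in for a real actuator write

def control_step() -> float:
    """One iteration of the control loop; raises if the deadline is missed."""
    start = time.monotonic()
    command = -0.5 * read_sensor()   # toy proportional control law
    actuate(command)
    elapsed = time.monotonic() - start
    if elapsed > DEADLINE_S:
        # In a CPS, a deadline miss is incorrect behavior, not just slowness.
        raise RuntimeError(f"deadline missed: {elapsed * 1e3:.2f} ms")
    return command
```

An IT system would log the overrun and move on; a CPS must treat it as an error path, because the physical plant does not wait.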

Cyber-physical systems also present lifecycle challenges that dwarf even the considerable long lifetimes of many legacy IT systems. Automobiles are typically in service for 10 to 20 years. Many airplanes operate for a half century or more. Roadways and buildings, which are increasingly instrumented with IoT sensors, might last for centuries. Electric power grids span continental scales and must operate continuously.

Computing in the Real World Is the Grandest of Challenges

Marilyn Wolf, Georgia Tech

Cyber-physical systems are no longer a sideline for computing—they are a core concern for computer scientists and engineers.


This article originally appeared in Computer, vol. 51, no. 5, 2018.






2469-7087/19/$33.00 © 2019 IEEE. Published by the IEEE Computer Society, ComputingEdge, June 2019, p. 51. Originally published in Computer, July 2018, p. 91; 0018-9162/18/$33.00 © 2018 IEEE.

CYBER-PHYSICAL SYSTEMS

EDITOR DIMITRIOS SERPANOS, ISI/ATHENA and University of Patras; [email protected]

Today, more than ever, advances through innovations in science and technology, such as the dramatic increase in computing power, are contributing to improvements in business and society. At the same time, the world is facing global-scale challenges such as depletion of natural resources, global warming, growing economic disparity, and terrorism. We live in a challenging age of uncertainty, with complexity growing at all levels. Thus, it's critical that we leverage ICT to its fullest to gain new knowledge and create new values by making connections between "people and things" and between the "real and cyber" worlds to effectively and efficiently resolve issues in society, create better lives for its people, and sustain healthy economic growth. Overcoming these challenges by encouraging various stakeholders at multiple levels to share a common future vision will be vital to realizing such a society through digitalization.

Society 5.0: For Human Security and Well-Being

Yoshihiro Shiroishi, Kunio Uchiyama, and Norihiro Suzuki, Hitachi Research and Development Group

The Japanese Cabinet's "Society 5.0" initiative seeks to create a sustainable society for human security and well-being through a cyber-physical system. Keidanren (Japan Business Federation) is well aligned to proactively deliver on the United Nations' Sustainable Development Goals to end poverty, protect the planet, and ensure prosperity for all through the creation of Society 5.0. Typical collaborative ecosystem activities for Society 5.0 in Japan are outlined in this column.




In 2016, an initiative called "Society 5.0" was proposed by the Japanese Cabinet in its 5th Science and Technology Basic Plan,1 with a vision toward creating a "Super Smart Society." The Super Smart Society is positioned as the fifth developmental stage in human society, following the hunter/gatherer, pastoral/agrarian, industrial, and information stages,2 and represents a sustainable society connected by digital technologies that attend in detail to the various needs of that society by providing necessary items or services to the people who require them, when they are required, in the amount required, thus enabling its citizens to live an active and comfortable life through high-quality services regardless of age, sex, region, language, and so on. Note, however, that digitalization is only the means; it is essential that we humans remain the central actors so that a firm focus is kept on building a society that makes us happy and provides us with a sense of worth. The Japanese government presented its vision of Society 5.0, together with exhibits by supporting companies from Japan, at CeBIT 2017,3 Europe's business festival for innovation and digitalization that covers the digitalization of business, government, and society from every angle.

International discussion is proceeding on the implementation of the United Nations' Sustainable Development Goals (SDGs),4 which were adopted in September 2015 as guideposts for the entire world. The driving principle is to realize peace and prosperity for all people and the planet by responding to the challenges with an inclusiveness that "leaves no one behind." The Japanese government has made science, technology, and innovation (STI) a key policy and priority area in its SDGs Implementation Guiding Principles. The Advisory Board for the Promotion of Science and Technology Diplomacy deliberated on these concepts and prepared a recommendation5 identifying four action areas to mobilize "STI for SDGs":

› creating a global future through Society 5.0,
› enabling solutions using global data,
› promoting cooperation at a global level, and
› fostering human resources to undertake STI efforts for SDGs.

The 12 service platforms shown in Figure 1 will be developed by fully utilizing the Internet of Things (IoT): big data, computation, artificial intelligence (AI), display, and robotics technologies. A series of government initiatives are now in progress in Japan, including "Robot Industry"6 and "Connected Industries,"7 which were introduced by the Ministry of Economy, Trade, and Industry (METI), and "Conference toward AI Network Society,"8 introduced by the Ministry of Internal Affairs and Communications (MIC). These initiatives essentially target the development of advanced common platform technologies, services and systems, and system of

Figure 1. The 12 service platforms for creating a Super Smart Society.2 [Figure: labels recoverable from the diagram include intelligent transportation systems; new manufacturing systems; smart manufacturing systems; regional inclusive care systems; smart food-chain systems; hospitality systems; integrated material development systems; energy value chains; infrastructure maintenance and updates; a society resilient against natural disasters; a global environment information platform; and advanced security and social implementation, supported by standardization of interfaces and data formats, utilization of standard data and existing systems for positioning and authentication, human resource development, strengthened information communication platforms, regulatory and institutional reform for new services, and new business and services.]


systems for new market creation and transformation into a prosperous society by creating new values through cyber-physical systems (CPS).9 As the CPS in Figure 2 shows, various big data items, collected from low-power intelligent sensing devices and networks and kept in information storage devices, can be analyzed and visualized using analytic tools such as AI with high computing power in cyberspace. This valuable data, often hard for humans alone to notice, will inform actions taken by decision-makers to provide solutions to societal issues and economic growth in the physical world.
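The sense-analyze-act loop described above can be sketched in a few lines of Python; the threshold rule below is a deliberately simple stand-in for the AI analytics running in cyberspace, and the function names are illustrative:

```python
from statistics import mean, pstdev

def detect_anomalies(readings, k=3.0):
    """Flag indices whose value lies more than k standard deviations
    from the mean of the batch (a toy stand-in for cloud-side AI)."""
    mu, sigma = mean(readings), pstdev(readings)
    if sigma == 0:
        return []
    return [i for i, r in enumerate(readings) if abs(r - mu) > k * sigma]

def cps_loop(readings):
    """Sense -> analyze -> act: turn flagged sensor readings into
    hypothetical actuation commands for the physical world."""
    return [("inspect", i) for i in detect_anomalies(readings)]
```

The point is structural: data gathered in the physical world is lifted into cyberspace for analysis, and the result flows back as an action, closing the loop.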

In Japan, a series of government projects, such as ImPACT (Impulsing Paradigm Change through Disruptive Technologies Program) and SIP (cross-ministerial Strategic Innovation Promotion program),10 are geared toward realizing such technologies and service platforms. ImPACT is designed to develop industry- and society-changing disruptive STIs through high-risk, high-impact research and development, while SIP covers the entire path from basic research to effective exit strategies (practical application/commercialization) as well as taking on initiatives to reform regulations and systems. In SIP, for example, projects such as an automated driving system; energy carriers; cybersecurity for critical infrastructure; and technologies for creating next-generation agriculture, forestry, and fisheries are now in progress with allocated budgets from the Council for Science, Technology, and Innovation (CSTI), which is responsible for planning and coordinating STI policies under the Japanese Cabinet. In addition, several academic studies on authentic third proposals and policy proposals for Society 5.0, including the "SDGs Project," "Next-Generation Computing Project," and "Telexistence Project," are being carried out by The Engineering Academy of Japan (EAJ),11 a unique non-governmental engineering academy in Japan.

TOWARD HUMAN SECURITY AND WELL-BEING

We have also entered an era in which the human lifespan, because of STI, is reaching 100 years; therefore, it's increasingly vital to empower humans by including a wider range of stakeholders and digitalization technologies. We are just beginning to strive for true human security and well-being. We must foster a pioneering spirit and the ability to be disruptive if necessary by increasing the number of people working under their own initiative and acting as game changers. To achieve a sustainable society on a global scale as soon as possible, it will be necessary to pursue transformation through a collaborative ecosystem that brings together ideas from industry, academia, and citizens.

Japan's most important business federation, Keidanren, is well aligned with this game-changing initiative. Keidanren issued the fifth revision of its Charter of Corporate Behavior12 with the primary aim of proactively

Figure 2. A cyber-physical system. [Figure: real-world sensing by endpoint devices with MCUs, RFID, and sensors (temperature, acceleration, position, etc.) feeds, via edge computing and a wide-area network, a data center/cloud where many-core CPUs, GPUs, and AI perform accumulated-data batch processing, streaming continuous processing, and modeling and mapping over in-memory databases and distributed key value storage (KVS); business applications and service solutions then drive real-world actuation through robots and displays for industry, human nature, and the smart society, with security, privacy, and digital rights protection applied across the IoT infrastructure so raw data can be made available for analytics.]




delivering on SDGs through the creation of Society 5.0. Figure 3 summarizes the concept of Society 5.0 for SDGs, as well as the challenges, key technologies, and systems of Society 5.0 and the 17 goals of the SDGs.13 Although STI has in many ways greatly enhanced the convenience of our lifestyle, it has also increased social complexity, revealing some negative aspects of a digital society. Society 5.0 can provide approaches to reducing or eliminating these negative aspects. However, doing so will require breaking down what the position paper calls the "five walls": ministries and agencies, the legal system, technologies, human resources, and social acceptance.14 This will be our global challenge, and professional societies such as the IEEE Computer Society are expected to play a leading role in fully fledged cooperation with industrial society on STI, trans-science, and multidisciplinary issues.

We are living in a challenging age of societal complexity and uncertainty. The Japanese Cabinet's Society 5.0 initiative envisions the creation of a Super Smart Society: a sustainable society where various types of values are connected through CPS and where people can live in safety, security, and comfort. CPS can bridge different sectors, countries, regions, and societies that otherwise tend to be divided. The key to implementing Society 5.0/SDGs is that stakeholders share and address the challenges together by fully utilizing the potential of CPS.

To move toward greater human security and well-being, we will need to pursue transformation through a collaborative ecosystem with a shared vision for the future created with the participation of all stakeholders. Specifically, we should take the following actions:

› present a future vision of changes in society through STI;
› grasp and overcome challenges by creating new values through CPS; and
› establish collaborations among industry, multidisciplinary academia, and the public and private sectors.

Figure 3. Society 5.0 for SDGs (Sustainable Development Goals).13 [Figure: the 17 SDGs encircle "Society 5.0 for SDGs," which is enabled by IoT, AI, big data, robots, drones, 5G, AR/VR/MR, 3D printing, sensor cloud, edge, mobile, PKI, sharing, and on-demand technologies; application examples include smart cities, smart agriculture and smart food, smart grid systems, i-Construction, e-learning systems, early warning alert systems, empowerment of women, a global innovation ecosystem, and utilization of remote sensing, oceanographic, meteorological, and other observation data, along with "xTech" areas such as FinTech, AgriTech, EdTech, HealthTech, CareTech, GovTech, CivicTech, LegalTech, and BioTech.]

REFERENCES

1. "The 5th Science and Technology Basic Plan," Government of Japan, 22 Jan. 2016; http://www8.cao.go.jp/cstp/english/basic/5thbasicplan.pdf.
2. Y. Harayama, "Society 5.0: Aiming for a New Human-Centered Society," Hitachi Rev., vol. 66, no. 6, 2017, pp. 556–557; http://www.hitachi.com/rev/archive/2017/r2017_06/pdf/p08-13_TRENDS.pdf.
3. "CeBIT: Japan's Vision of Society 5.0," Euronews; http://www.euronews.com/tag/cebit-2017.
4. "Sustainable Development Goals: 17 Goals to Transform Our World," United Nations; http://www.un.org/sustainabledevelopment.
5. "Recommendation for the Future—STI as a Bridging Force to Provide Solutions for Global Issues: Four Actions of Science and Technology Diplomacy to Implement the SDGs," Advisory Board for the Promotion of Science and Technology Diplomacy, 12 May 2017; http://www.mofa.go.jp/files/000255801.pdf.
6. "Robot Industry," Ministry of Economy, Trade and Industry (METI), 31 Jan. 2018; http://www.meti.go.jp/english/policy/mono_info_service/robot_industry/index.html.
7. "Connected Industries," METI, 13 June 2018; http://www.meti.go.jp/english/policy/mono_info_service/connected_industries/index.html.
8. "Conference toward AI Network Society," Ministry of Internal Affairs and Communications; http://www.oecd.org/going-digital/ai-intelligent-machines-smart-policies/conference-agenda/ai-intelligent-machines-smart-policies-sudoh.pdf.
9. R. Poovendran et al., "Special Issue on Cyber-Physical Systems," Proc. IEEE, vol. 100, no. 1, 2012, pp. 6–12.
10. "Pioneering the Future: Japanese Science, Technology and Innovation 2017," Cabinet Office SIP brochure; http://www8.cao.go.jp/cstp/panhu/sip_english/sip_en.html.
11. "Engineering Academy of Japan"; https://www.eaj.or.jp.
12. "Revision of the Charter of Corporate Behavior," Keidanren (Japan Business Federation) Policy Proposals, 8 Nov. 2017; http://www.keidanren.or.jp/en/policy/csr/charter2017.html.
13. "Society 5.0 for SDGs," Keidanren (Japan Business Federation), 8 Nov. 2017; https://www.keidanren.or.jp/en/policy/csr/2017reference2.pdf.
14. "Toward Realization of the New Economy and Society: Reform of the Economy and Society by the Deepening of 'Society 5.0'," Keidanren (Japan Business Federation) Policy and Action, 19 Apr. 2016; http://www.keidanren.or.jp/en/policy/2016/029_outline.pdf.

DISCLAIMER

This article does not necessarily reflect the positions or views of the authors' employer.

YOSHIHIRO SHIROISHI is a technology advisor in the R&D Group at Hitachi, Ltd.; a Fellow of IEEE; and a member of EAJ. Contact him at [email protected].

KUNIO UCHIYAMA is a technology advisor in the R&D Group at Hitachi, Ltd.; a Fellow of IEEE; and a member of EAJ. Contact him at [email protected].

NORIHIRO SUZUKI is vice president, executive officer, and chief technology officer of Hitachi, Ltd., and general manager of its R&D Group; he is a Senior Member of IEEE and a member of EAJ. Contact him at [email protected].



This article originally appeared in Computer, vol. 51, no. 7, 2018.



IEEE Internet Computing delivers novel content from academic and industry experts on the latest developments and key trends in Internet technologies and applications.

Written by and for both users and developers, the bimonthly magazine covers a wide range of topics, including:

• Applications
• Architectures
• Big data analytics
• Cloud and edge computing
• Information management
• Middleware
• Security and privacy
• Standards
• And much more

In addition to peer-reviewed articles, IEEE Internet Computing features industry reports, surveys, tutorials, columns, and news.

Join the IEEE Computer Society for subscription discounts today! www.computer.org/product/magazines/internet-computing



72 June 2019 Published by the IEEE Computer Society 2469-7087/19/$33.00 © 2019 IEEE

IEEE Computer Society conferences are valuable forums for learning on broad and dynamically shifting topics from within the computing profession. With over 200 conferences featuring leading experts and thought leaders, we have an event that is right for you.

Conference Calendar
Questions? Contact [email protected]

Find a region: Africa ■ | Asia ▲ | Australia ◆ | Europe ● | North America ◗ | South America ★

JULY
8 July
• ICME (IEEE Int'l Conf. on Multimedia and Expo) ▲
• ICMEW (IEEE Int'l Conf. on Multimedia & Expo Workshops) ▲
• SERVICES (IEEE World Congress on Services) ●
• SNPD (20th IEEE/ACIS Int'l Conf. on Software Eng., Artificial Intelligence, Networking and Parallel/Distributed Computing) ▲
15 July
• ASAP (IEEE 30th Int'l Conf. on Application-specific Systems, Architectures and Processors) ◗
• CBI (IEEE 21st Int'l Conf. on Business Informatics) ●
• COMPSAC (IEEE 43rd Annual Computer Software and Applications Conf.) ◗
• ICALT (19th IEEE Int'l Conf. on Advanced Learning Technologies) ★
• ISVLSI (IEEE Computer Society Annual Symposium on VLSI) ◗
23 July
• ICCI*CC (IEEE 18th Int'l Conf. on Cognitive Informatics & Cognitive Computing) ●
30 July
• IRI (IEEE 20th Int'l Conf. on Information Reuse and Integration for Data Science) ◗
• SMC-IT (IEEE Int'l Conf. on Space Mission Challenges for Information Technology) ◗

AUGUST
1 August
• CSE (IEEE Int'l Conf. on Computational Science and Eng.) ◗
• EUC (IEEE Int'l Conf. on Embedded and Ubiquitous Computing) ◗
5 August
• TrustCom (IEEE Int'l Conf. on Trust, Security and Privacy in Computing and Communications) ◆
8 August
• 2019 Cloud Summit ◗
9 August
• SmartIoT (IEEE Int'l Conf. on Smart Internet of Things) ▲
10 August
• HPCC (IEEE 21st Int'l Conf. on High Performance Computing and Communications) ▲
• SmartCity (IEEE 17th Int'l Conf. on Smart City)
• DSS (IEEE 5th Int'l Conf. on Data Science and Systems) ▲
15 August
• NAS (IEEE Int'l Conf. on Networking, Architecture and Storage) ▲
18 August
• HCS (IEEE Hot Chips 31 Symposium) ◗
• RTCSA (IEEE 25th Int'l Conf. on Embedded and Real-Time Computing Systems and Applications) ▲
27 August
• ASONAM (IEEE/ACM Int'l Conf. on Advances in Social Networks Analysis and Mining) ◗

SEPTEMBER
13 September
• EWDTS (IEEE East-West Design & Test Symposium) ●
19 September
• AVSS (16th IEEE Int'l Conf. on Advanced Video and Signal Based Surveillance) ▲
• ESEM (ACM/IEEE Int'l Symposium on Empirical Software Eng. and Measurement) ★
23 September
• CLUSTER (IEEE Int'l Conf. on Cluster Computing) ◗
• PACT (28th Int'l Conf. on Parallel Architectures and Compilation Techniques) ◗
• RE (IEEE 27th Int'l Requirements Eng. Conf.) ▲
• SecDev (IEEE Secure Development) ◗
25 September
• HCC (IEEE Int'l Conf. on Humanized Computing and Communication) ◗
29 September
• ICSME (IEEE Int'l Conf. on Software Maintenance and Evolution) ◗

OCTOBER
1 October
• MCSoC (IEEE 13th Int'l Symposium on Embedded Multicore/Many-core Systems-on-Chip) ▲
2 October
• DFT (IEEE Int'l Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems) ●
6 October
• MODELS (ACM/IEEE 22nd Int'l Conf. on Model Driven Eng. Languages and Systems) ●
12 October
• MICRO (52nd Annual IEEE/ACM Int'l Symposium on Microarchitecture) ◗
14 October
• ISMAR (IEEE Int'l Symposium on Mixed and Augmented Reality) ▲
• LCN (IEEE 44th Conf. on Local Computer Networks) ●
• VL/HCC (IEEE Symposium on Visual Languages and Human-Centric Computing) ◗
15 October
• AIPR (IEEE Applied Imagery Pattern Recognition Workshop) ◗
16 October
• FIE (IEEE Frontiers in Education Conf.) ◗
20 October
• ICCV (IEEE/CVF Int'l Conf. on Computer Vision) ▲
• VIS (IEEE Visualization Conf.) ◗
28 October
• EDOC (IEEE 23rd Int'l Enterprise Distributed Object Computing Conf.) ●
• ISSRE (IEEE 30th Int'l Symposium on Software Reliability Eng.) ●

NOVEMBER
4 November
• ICTAI (IEEE 31st Int'l Conf. on Tools with Artificial Intelligence) ◗
7 November
• SEC (IEEE/ACM Symposium on Edge Computing) ◗
8 November
• ICDM (IEEE Int'l Conf. on Data Mining) ▲

Learn more about IEEE Computer Society Conferences
www.computer.org/conferences


VERACODE AT THE FOREFRONT OF THE TRANSFORMATION
By providing a comprehensive and accurate view of software security defects, Veracode helps companies create secure software and ensure that the software they buy or download is free of vulnerabilities. As a result, companies using Veracode are free to boldly innovate, explore, pioneer, discover, entertain, and change the world.

VISIT US AT VERACODE.COM