
An Investigation into the Implications of Cloud Computing Models on Information Management, Discovery and Exploitation

Yannick Allard
Mathieu Lavallee
Michel Mayrand
Elisa Shahbazian

Prepared By: OODA Technologies Inc., 4891 Grosvenor, Montréal (Qc), H3W 2M2, 514.476.4773

Prepared For: Defence Research & Development Canada, Atlantic Research Centre, 9 Grove Street, PO Box 1012, Dartmouth, NS, B2Y 3Z7, 902-426-3100

Scientific Authority: Anthony Isenor
Contract Number: W7707-145677
Call Up Number: 11; 4501328099
Project: 01da, Maritime Information Warfare
Report Delivery Date: November 30, 2015

The scientific or technical validity of this Contract Report is entirely the responsibility of the contractor and the contents do not necessarily have the approval or endorsement of Defence R&D Canada.

Contract Report DRDC-RDDC-2016-C030, November 2015


Executive Summary

All branches of the armed forces are exploring the adoption of cloud computing. By pushing applications into the cloud, rather than relying on local computer resources, many of the deployment, security, redundancy, scalability and flexibility issues for tactical information systems are simplified. However, unlike traditional cloud computing applications that benefit from highly reliable, always-on, high-bandwidth networks, the military cannot be guaranteed such stability. As such, different situations will arise in the case of at-sea data exchange: cloud-based, cloud-to-disadvantaged-cloud and cloud-to-no-cloud situations will occur. There is therefore a need to investigate different strategies for information management and distribution in a cloud environment.

This document presents the cloud architectures and information management models used within some of the major online service providers, namely Facebook and eBay. It provides insight on how these service providers developed and implemented cloud solutions that deliver their services with state-of-the-art security, redundancy, scalability and flexibility.

First, the architecture and information management model of Facebook, the largest social network to date, is investigated. This service provider is by far the largest content provider, with more than a billion users. It uses a graph-based representation of its data, a log-based storage system named Haystack for write-once, read-often data, a publish-and-subscribe mechanism for subsystem notification, and a pull mechanism for client updates.

Then, the architecture of eBay, the largest online marketplace, is presented. It uses data partitioning along usage patterns, a horizontal scaling solution, a publish / subscribe mechanism for real-time updates of its subsystems, and a high level of metadata for clear classification of items within specific categories. Both service providers developed their search engines, storage solutions and security applications internally, which allowed them to scale easily and provide their users with the experience that was required to make them a success. No existing applications were able to meet the unique requirements of either Facebook or eBay.

The current state of at-sea data exchange is then presented to extract its unique characteristics. The interplay between the Facebook and eBay architectures and information management models and these unique characteristics is assessed, as applied to future cloud-based naval coalition operations, and then related to cloud, cloud-to-disadvantaged-cloud and cloud-to-no-cloud situations.


Contents

Executive Summary

Contents

List of Figures

List of Tables

1 Introduction

2 Cloud Computing Architectures
  2.1 Overview of Cloud Computing
  2.2 Cloud Type of Service and Deployment Model
    2.2.1 Type of Services
    2.2.2 Deployment Model
  2.3 Cloud Messaging Architecture
    2.3.1 Request / Response
    2.3.2 Publish/Subscribe (pub/sub)
      2.3.2.1 Push / Pull
  2.4 Cloud Interoperability
  2.5 Examples of Cloud Architectures
    2.5.1 Cloud Architecture for Social Network
    2.5.2 Cloud Architecture for e-Commerce
    2.5.3 Conclusion

3 Architectures and Information Management Models
  3.1 Facebook
    3.1.1 Facebook’s Cloud Architecture
      3.1.1.1 Storage
      3.1.1.2 Wormhole Pub-Sub System
      3.1.1.3 Facebook Messages Stack
    3.1.2 The Social Graph, How Information is Discovered and Shared
    3.1.3 Analysis of Facebook’s Information Model
  3.2 eBay
    3.2.1 eBay’s Cloud Architecture
      3.2.1.1 Openstack
      3.2.1.2 Storage
    3.2.2 How Information is Discovered and Shared on eBay
      3.2.2.1 Metadata
      3.2.2.2 Search Engine
    3.2.3 Analysis of eBay’s Information Model
  3.3 Conclusion

4 At-sea data exchange environment
  4.1 Overview of Maritime Communications
  4.2 At Sea Combat System
    4.2.1 On-board information storage, discovery, distribution, and access
    4.2.2 Tactical Data Links
    4.2.3 GCCS
    4.2.4 Internet and other communication subsystems
    4.2.5 Satellite Communication
  4.3 Global Information Grid
  4.4 Summary of unique characteristics
    4.4.1 Information Storage
    4.4.2 Discovery
    4.4.3 Distribution
    4.4.4 Access
    4.4.5 At Sea Information Model (IM) Summary

5 Information management models in at-sea data exchange environment
  5.1 Requirements for the Future At-sea Information Environment
  5.2 Suitability to Particular Cloud Architectures
  5.3 CtoDC and CtoNC Interfaces
  5.4 Cloud Computing (CC) Architecture Essentials

6 Conclusion

Bibliography


List of Figures

2.1 Publish Subscribe Flow

2.2 Push vs Pull

3.1 The Facebook Stack

3.2 Overall BLOB storage architecture (C: creates, D: delete, R: read), where creates and most deletes are handled by Haystack. Reads are handled by either Haystack or f4 (from [14])

3.3 Serving a photo using Haystack (from [16])

3.4 Components of Wormhole (from [15])

4.1 Major Naval Networking Environment Components, reproduced from [33]

4.2 Global Interconnections Between the Combined Enterprise Regional Information Exchange System (CENTRIXS) Networks, reproduced from [35]

4.3 Operational View of HFIP and SNR, reproduced from [35]

4.4 Surface Vessel Combat System, reproduced from [37]

4.5 Comparison of Link-11 and Link-22/Link-16 Message Features, reproduced from [40]

4.6 GCCS-M High-Level Information Flow and Functions, reproduced from [43]

4.7 GCCS-M System Interfaces, reproduced from [44]

4.8 Message Traffic Process, reproduced from [35]

4.9 GIG Architecture, reproduced from [49]

4.10 GIG Networking, reproduced from [33]

5.1 DoD Enterprise Cloud Environment, reproduced from [2]

5.2 Cloud computing hardware and software as components of the GIG, reproduced from [56]

5.3 ONR Vision of Naval Tactical Cloud Processing, reproduced from [57]


List of Tables

2.1 Possible arrangements of different cloud deployment models

2.2 Push vs Pull in Publish/Subscribe system


Acronyms

ADNS Automated Digital Networking System

AEHF Advanced EHF

AMHS Automatic Message Handling System

BGDBM Battle Group Database Management

BLII Base Level Information Infrastructure

BLOBs Binary Large OBjects

C2 Command & Control

C2OIX Command and Control Office Information Exchange

C4I Command, Control, Communications, Computers and Intelligence

CANES Consolidated Afloat Networks and Enterprise Service

CDMI Cloud Data Management Interface

CDN Content Delivery Network

CENTRIXS Combined Enterprise Regional Information Exchange System

CIAs Communications Information Advisories

CIBs Communications Information Bulletins

CIO Chief Information Officer

COE Common Operating Environment

CONUS Continental United States

COP Common Operating Picture

CtoDC Cloud to Disadvantaged Cloud

CtoNC Cloud to No Cloud

CUDIXS Common User Digital Information Exchange Subsystem

DaaS Data as a Service

DBMS Database Management System

DII Defence Information Infrastructure

DMS Defence Message System

DoD Department of Defence

DON Department of the Navy


DSB Defense Science Board

DSCS Defense Satellite Communications System

DUSC Directory Update and Service Center

EHF Extremely High Frequency

ESM Electronic Support Measures

FBKS Fleet Broadcast Keying System

FMX Fleet Message Exchange

FSM Fleet SIPRNET Messaging

GCCS Global Command and Control System

GCCS-M Global Command and Control System - Maritime

GIG Global Information Grid

HAG High Assurance Guard

HF High Frequency

HFIP High Frequency Internet Protocol

I3 Integrated Imagery and Intelligence

IaaS Infrastructure as a Service

IE Information Enterprise

IEA Information Enterprise Architecture

IFF Identification Friend or Foe

IM Information Model

IP Internet Protocol

ISNS Integrated Shipboard Network System

IT-21 Information Technology for the 21st Century

JIE Joint Information Environment

JWICS Joint Worldwide Intelligence Communication System

LAN Local Area Network

MCEN Marine Corps Enterprise Network

MCS Message Conversion System


MDA Maritime Domain Awareness

MIDB Modernized Integrated Database

MILSATCOM Military Satellite Communications

MLS Multi-Level Security

MMS Multi-Level Mail Server

MUOS Mobile User Objective System

NATO North Atlantic Treaty Organization

NGEN Next Generation Enterprise Network

NIPRNET Unclassified but Sensitive Internet Protocol Router Network

NIST National Institute of Standards and Technology

NLP Natural Language Processing

NMCI Navy Marine Corps Intranet

OCONUS Outside the Continental United States

ONE-NET OCONUS Navy Enterprise Network

ONR Office of Naval Research

OTH Over-The-Horizon

PaaS Platform as a Service

PCMT Personal Computer Message Terminal

RCS Radio Communications System

RF Radio Frequency

SaaS Software as a Service

SATCOM Satellite Communications

SHF Super High Frequency

SIPRNET Secret Internet Protocol Router Network

SNR Sub-Network Relay

STANAG NATO Standardization Agreement

SSDS Ship Self-Defense System

TADILs Tactical Digital Information Links


TAO The Associations and Objects

TDL Tactical Data Link

TMS Tactical Management System

UAV Unmanned Aerial Vehicle

UHF Ultra-High Frequency

US United States

UUV Unmanned Underwater Vehicle

VHF Very High Frequency

WAN Wide Area Network

WGS Wideband Global SATCOM


Part 1

Introduction

All branches of the armed forces are exploring the adoption of cloud computing. By pushing applications into the cloud, rather than relying on local computer resources, many of the deployment, security, redundancy, scalability and flexibility issues for tactical information systems are simplified [1]. The benefits can be summarised as gains in [2] :

• Efficiency : improved asset utilisation, removal of duplicative systems and improved productivity in application development and management, compared to the current state of low server utilisation and systems that are complex and difficult to manage.

• Agility : near-instantaneous increases and reductions in capacity, as well as as-a-Service capabilities, compared to the current state where years are required to build new data centres and new services, and where months might be required to increase the capacity of existing services.

• Innovation : easier access to private sector innovation and a much better link to emerging technologies.

However, unlike traditional cloud computing applications that benefit from highly reliable, always-on, high-bandwidth networks, the military cannot be guaranteed such stability. As such, different situations will arise, particularly in the case of at-sea data exchange: cloud-based, cloud-to-disadvantaged-cloud and cloud-to-no-cloud situations will occur. There is therefore a need to investigate different strategies for information management and distribution in a cloud environment.

This document investigates the management of information in a cloud environment, more specifically for at-sea data exchange, and is organised as follows :

• Section 2 briefly describes cloud computing architectures and other concepts related to cloud computing, such as interoperability and standardisation efforts, and presents two reference cloud architectures.

• Section 3 examines specific cloud architectures of some major online service providers and their information management models.

• Section 4 presents unique characteristics that are typical of a naval at-sea environment, as compared to the ashore environment.

• Section 5 documents the interplay between the information management models and the unique characteristics of the at-sea data exchange environment.

• Section 6 summarises the information acquired during the completion of this call-up and serves as the general conclusion of this document.


Part 2

Cloud Computing Architectures

There are many definitions and many applications of cloud computing. Cloud computing usually involves two parties: the client and the provider. The provider delivers IT resources as a service (XaaS), where the IT resources can be hardware, platforms or applications. The client uses a transparent and easy-to-use interface as an abstraction of a complex infrastructure. That infrastructure is usually maintained by a group of experts and usually consists of a distributed system of computers able to deliver a high level of reliability.

This section is organised as follows :

• Section 2.1 presents a high-level definition of cloud architectures and principles.

• Section 2.2 presents cloud service types and deployment models.

• Section 2.3 presents cloud messaging architectures.

• Section 2.4 provides some remarks about interoperability in cloud computing.

• Section 2.5 presents some generic architectures for e-commerce and social networks.

2.1 Overview of Cloud Computing

Cloud Architectures are designs of software applications that use Internet-accessible, on-demand services. Applications built on Cloud Architectures are such that the underlying computing infrastructure is used only when it is needed (for example to process a user’s request), draw the necessary resources on demand (like compute servers or storage), perform a specific job, then relinquish the unneeded resources and often dispose of themselves after the job is done. While in operation, the application scales up or down elastically based on resource needs [3].

There are some clear business benefits to building applications using cloud architectures. A few of these are listed here [3]:


• Almost zero upfront infrastructure investment if the application is hosted within a public cloud;

• Just-in-time Infrastructure;

• More efficient resource utilisation;

• Usage-based costing;

• Potential for shrinking the processing time.

Some key organisational concerns can act as barriers to the adoption of cloud computing:

• Interoperability: The cloud-computing community has not yet defined a universal set of standards or interfaces, resulting in a significant risk of vendor lock-in.

• Latency: All access to the cloud occurs through a network (or the internet in the case of public clouds), introducing latency into every communication between the user and the environment.

• Legal issues: Because cloud vendors tend to locate server farms and data centres where they are cheaper to operate, some cloud-computing users have concerns about jurisdiction, data protection, fair information practices, and international data transfer.

• Platform or language constraints: Some cloud environments provide support for specific platforms and languages only.

• Security: The key concern is data privacy; in most cases, organisations do not have control of, or know, where cloud providers store their data.

According to NIST [4], a cloud computing infrastructure is composed of five essential characteristics:

1. On demand self-service : Cloud clients must be able to access the service whenever it is needed. Resources must be expandable without requiring downtime.

2. Broad network access : Capabilities and services are available over the network through standard interfaces and accessed through standard mechanisms using heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops and workstations).

3. Resource pooling : The provider must pool its computing units so that it can dynamically assign the amount of resources allocated to each service or client. Pooling must be transparent to users, and precautions must be taken to ensure security and privacy.

4. Rapid elasticity : The amount of resources provided must scale rapidly to the demand of the client.

5. Measured service : The provider must control and optimise resources using metered data collected from the different services provided.


2.2 Cloud Type of Service and Deployment Model

2.2.1 Type of Services

Cloud services are generally depicted as a cloud owner providing services to clients. There are commonly three levels of services offered by the cloud provider. Every level of service offers the basic features of the cloud but differs in the range of services offered. The three common levels of abstraction of cloud services are: Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS).

IaaS provides everything needed to run an operating system in a virtual environment. The infrastructure provided usually consists of one or multiple distributed servers running a virtualisation server. It usually offers a simple web interface to manage basic physical actions (boot, reboot, status, mount DVD) and network access to the OS. IaaS offers full control of the virtual computer. The infrastructure may be designed to offer the key features mentioned by NIST.

At this level, the most important features are those that guarantee uptime. Depending on the hardware architecture, there are many features to consider, the most important being high bandwidth, network redundancy, dynamic scaling and failure recovery. Common applications are storage, Content Delivery Network (CDN) and computing.

PaaS offers more services than IaaS. PaaS offers all the tools and libraries needed for the development of an application. The provider takes care of all the infrastructure and management of the platform. It manages the networks, the virtualisation, and operating system updates and security. The client brings its own code into a preconfigured architecture based on its needs.

Finally, SaaS offers the most abstraction to the user. It is basically an application accessible via the cloud. The provider handles all operational aspects and leaves only the configuration to the consumer. The consumer uses the application without installing or developing it. Software on the cloud can be of various domains, from email clients to social platforms and storage solutions.

2.2.2 Deployment Model

NIST [4] also refers to different types of deployment. The deployment type refers to where the cloud is hosted and who manages it. Four types of deployment are defined in [4] :

• Public cloud is provided for use by the general public. This model is a true representation of cloud hosting; in it, the service provider renders services and infrastructure to various clients. This model is better suited to business requirements that involve managing load, hosting SaaS-based applications and managing applications that many users consume. Due to the decreased capital overheads and operational costs, this model is economical. From the technical viewpoint, there may be little or no difference between the structural design of private and public clouds, except in the level of security offered for the various services given to public cloud subscribers by the cloud hosting providers.


Private: hosted in-house or by a cloud provider; managed by the company, or by the company / cloud provider.

Community: hosted by one company or a cloud provider; managed by the host or the cloud provider.

Public: hosted by the cloud provider; managed by the cloud provider.

Hybrid: hosted by one or many companies and/or a cloud provider; managed by one or many companies and/or a cloud provider.

Table 2.1: Possible arrangements of different cloud deployment models (where each model is hosted and who manages it)

• Private cloud is used by a single organisation for every purpose it needs and is also known as an internal cloud. The platform for cloud computing is implemented in a cloud-based secure environment that is safeguarded by a firewall. As it permits access only to authorised users, the private cloud gives the organisation greater and more direct control over its data. Businesses that have dynamic or unforeseen needs, mission-critical assignments, security concerns, management demands and uptime requirements are better suited to adopt this model. Obstacles with regard to security can be avoided in a private cloud, but in the case of a natural disaster or internal data theft, the private cloud may still be prone to vulnerabilities.

• Hybrid cloud is a composition of multiple infrastructures that are run in parallel but bound by the same technology. They can be run on the same hardware to load balance server usage. A hybrid cloud can cross isolation and provider boundaries; hence, it cannot simply be categorised as a public, private or community cloud. It permits the user to increase capacity or capability by aggregation, assimilation or customisation with another cloud package / service. In a hybrid cloud, the resources are managed and provided either in-house or by external providers. Resources that are non-critical, like development and test workloads, can be housed in the public cloud belonging to a third-party provider, while workloads that are critical or sensitive must be housed internally. Businesses that focus more on security and demand a unique presence can implement the hybrid cloud as an effective business strategy. When facing demand spikes, the additional resources required by a particular application can be accessed from the public cloud.

• Community cloud is used by multiple consumers for a single purpose they all share. It can be owned and operated by multiple organisations. A community cloud is appropriate for organisations and businesses that work on joint ventures, tenders or research that needs a centralised cloud computing capability for managing, building and implementing similar projects.

There are many considerations when choosing the deployment model. One issue that information experts, computer scientists and entrepreneurs debate is the concept of data ownership. Who owns the data stored in a cloud system? Does it belong to the client who originally saved the data to the hardware? Does it belong to the company that owns the physical equipment storing the data? What happens if a client goes out of business? Can a cloud storage host delete the former client’s data? Opinions vary on these issues.


2.3 Cloud Messaging Architecture

Cloud services use several nodes that may need to communicate with each other. There are a few messaging architectures that may be applied in different kinds of situations. Those covered in this section are Request/Response, followed by Publish/Subscribe with two sub-implementations called Push and Pull. Finally, a comparison of these messaging architectures is presented.

2.3.1 Request / Response

The Request / Response architecture is simple. It is usually used for one-on-one communication. The communication starts when the requester opens a connection and sends a request to the replier; the replier then processes the request and sends a response to the requester. The communication is often in synchronous mode, meaning that the requester sends a request and waits for a response message. Also, every message is sent sequentially and the connection stays open until the communication ends. Many protocols can be used to implement Request/Response.
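As an illustrative sketch (not taken from the report's sources), the following Python snippet shows this synchronous pattern: the requester opens a connection, sends a request and blocks until the replier returns a response. The host name and resource path are hypothetical.

    import http.client

    # Requester opens a connection to the replier (hypothetical host and path).
    connection = http.client.HTTPSConnection("replier.example.com")
    connection.request("GET", "/status")      # send the request
    response = connection.getresponse()       # block (synchronous) until the response arrives
    print(response.status, response.read().decode())
    connection.close()                        # the connection stays open until the exchange ends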

2.3.2 Publish/Subscribe (pub/sub)

Figure 2.1: Publish Subscribe Flow

The publish / subscribe messaging architecture brings a high level of flexibility and scalability.


Latency:
    Push: delivery is immediate (no added latency).
    Pull: delivery on demand; delays may occur between message publication and delivery.

Message handling:
    Push: automatic delivery; the client needs to be listening.
    Pull: on demand; more client-side control.

Flow control:
    Push: implicit acknowledgement.
    Pull: explicit acknowledgement.

Network usage:
    Push: low network usage, more efficient.
    Pull: high network usage if polling.

Table 2.2: Push vs Pull in Publish/Subscribe system

Pub/sub helps to manage communication using middleware, also called the message broker or event bus. The message broker manages the interconnection between subscribers and publishers. The publisher only needs to send a message, including the description of the message and a topic, to the message broker. The event bus keeps the messages in the message store until they are delivered to the subscribers. All of the subscribers have to send an acknowledgement of the message, and then the message can be deleted from the message store of the hub.

The pub/sub architecture provides a great decoupling of the publisher and the subscriber. The publisher only has to know where the pub/sub hub is instead of managing all the output destinations. In the same way, the subscriber only has to subscribe to one topic to receive information from multiple sources. The message broker needs to be highly scalable to be able to serve all the publishers and subscribers that may use the system. The published data model also needs to be well-defined, since any modification can affect publisher operability with the system.
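The following minimal, in-process sketch (our illustration only; a real message broker adds persistence, acknowledgements and scaling) shows the broker-mediated pattern: publishers send messages tagged with a topic to the broker, which forwards them to every callback registered for that topic.

    from collections import defaultdict

    class MessageBroker:
        """Toy event bus: routes published messages to the subscribers of a topic."""
        def __init__(self):
            self.subscribers = defaultdict(list)    # topic -> list of subscriber callbacks

        def subscribe(self, topic, callback):
            self.subscribers[topic].append(callback)

        def publish(self, topic, message):
            for callback in self.subscribers[topic]:
                callback(message)                    # push delivery to each subscriber

    broker = MessageBroker()
    broker.subscribe("track-updates", lambda msg: print("subscriber A got:", msg))
    broker.subscribe("track-updates", lambda msg: print("subscriber B got:", msg))
    broker.publish("track-updates", {"id": 42, "position": (44.65, -63.57)})

The publisher only knows the broker and the topic; the subscribers never need to know who published the message.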

2.3.2.1 Push / Pull

Push/Pull is a variation of the publish/subscribe system that focuses on the subscriber. The publisher still publishes a message to the message broker using a topic, and the message broker still receives and stores the message. The subscriber then has two choices to get the message. The first choice is to subscribe to the message broker to receive ’push’ updates on a particular topic. The second choice is to ’pull’ updates from the message broker when needed. There is no limitation on using both patterns, delivering push messages to push subscribers and serving pull subscribers when asked. In either case, the advantages and disadvantages are summarised in Table 2.2.
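The pull variant can be sketched by having the broker queue messages per topic until a subscriber asks for them and explicitly acknowledges receipt (again a toy illustration under our own assumptions, not a production design):

    from collections import defaultdict, deque

    class PullBroker:
        """Toy broker for the pull pattern: messages wait in a per-topic queue."""
        def __init__(self):
            self.queues = defaultdict(deque)

        def publish(self, topic, message):
            self.queues[topic].append(message)      # stored until a subscriber pulls it

        def pull(self, topic):
            return self.queues[topic][0] if self.queues[topic] else None

        def acknowledge(self, topic):
            self.queues[topic].popleft()            # deleted only after explicit acknowledgement

    broker = PullBroker()
    broker.publish("track-updates", {"id": 42, "speed": 12.3})
    message = broker.pull("track-updates")          # the subscriber polls when it is ready
    if message is not None:
        print("processing", message)
        broker.acknowledge("track-updates")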


Figure 2.2: Push vs Pull

2.4 Cloud Interoperability

Interoperability can be defined as a measure of the degree to which diverse systems or components can work together successfully. More formally, IEEE and ISO define interoperability as the ability of two or more systems or applications to exchange information and mutually use the information that has been exchanged. In the context of cloud computing, interoperability should be viewed as the capability of public clouds, private clouds, and other diverse systems within the enterprise to understand each other’s application and service interfaces, configuration, forms of authentication and authorisation, data formats, etc. in order to cooperate and interoperate with each other. Interoperability requires standard data models and communication technologies compatible with the existing Internet infrastructure. It is determined both at the data and the service level.

The core cloud interoperability problem is that cloud providers have not done a good job of coordinating the use of languages, data, interfaces and other subsystems that are now largely proprietary. There are different levels of system interoperability. Technical interoperability is about exchanging data, semantic interoperability is about exchanging meaningful data, and organisational interoperability is about participating in multi-organisational business processes [5].

In general, the cloud-computing community sees the lack of cloud interoperability as a barrier to cloud computing adoption because organisations fear vendor lock-in. Vendor lock-in refers to a situation in which, once an organisation has selected a cloud provider, either it cannot move to another provider or it can change providers but only at great cost [5].

The greatest level of interoperability is likely to be found for IaaS cloud services, where functionality is often broadly equivalent and there are a number of standard interfaces - some formally standardised, such as the Cloud Data Management Interface (CDMI), others being de facto standards in the marketplace. PaaS cloud services have lower levels of interoperability. There are few interface standards for PaaS functionality, although some open source platforms are becoming popular in the marketplace, and where different cloud service providers use the same open-source platform, their interfaces are either identical or closely equivalent.

It is SaaS applications which present the greatest interoperability challenge today. There are very few standard APIs for SaaS applications - even switching from one SaaS application to another SaaS application with comparable functionality typically involves a change in interfaces. There is a resulting impact both on end users of the cloud service, for any user interfaces, and on any application or system belonging to the cloud service customer that uses APIs offered by the SaaS application.

Within any joint force or coalition, interoperability will be of tremendous importance for efficient data exchange. From cloud-to-cloud to cloud-to-no-cloud cases, interoperability will require the use of standards. According to [6], IaaS is the service model that would most benefit from standardisation, because the main building blocks of IaaS are workloads represented as virtual-machine images and storage units that vary from typed data to raw data.

For data migration, query and exchange, standard efforts such as CDMI and the Amazon S3 API, which multiple providers support, would enable users to extract data from one provider and upload it to a different provider. If a cloud provider implements these standard interfaces using SOAP- or REST-based protocols, the cloud will offer the advantages of ease of development and tool availability. However, these standards are more useful for raw data that is not typed (e.g., virtual-machine images, files, blobs) because the cloud resource in this case simply acts as a container and usually does not require data transformation. For typed data, data migration would occur similarly to any other data migration task: users must extract data from its original source, transform it to a format compatible with the target source, and upload it into the target source, which could be a complex process.

As a note for the reader, CDMI defines the functional interface that applications will use to create, retrieve, update and delete data elements from the cloud. As part of this interface, the client will be able to discover the capabilities of the cloud storage offering and use this interface to manage containers and the data that is placed in them. In addition, metadata can be set on containers and their contained data elements through this interface. The capabilities of the underlying storage and data services are exposed so that clients can understand the offering. Regarding the Amazon S3 API, it provides various operations, related request and response structures, and error codes. It enables one to store data in the cloud and then download the data or use the data with other AWS services, such as Amazon Elastic Compute Cloud (Amazon EC2).
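As a hedged illustration of how such a de facto standard interface supports raw-data migration, the sketch below copies an untyped object from one S3-compatible provider to another using the boto3 library; the endpoints, credentials, bucket and key names are hypothetical.

    import boto3

    # Clients for two S3-compatible providers (hypothetical endpoints and credentials).
    source = boto3.client("s3", endpoint_url="https://source-provider.example.com",
                          aws_access_key_id="SRC_KEY", aws_secret_access_key="SRC_SECRET")
    target = boto3.client("s3", endpoint_url="https://target-provider.example.com",
                          aws_access_key_id="TGT_KEY", aws_secret_access_key="TGT_SECRET")

    # Raw (untyped) data needs no transformation: read the object from one cloud,
    # then write the same bytes to the other.
    blob = source.get_object(Bucket="source-bucket", Key="images/vm-image.raw")
    target.put_object(Bucket="target-bucket", Key="images/vm-image.raw",
                      Body=blob["Body"].read())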


2.5 Examples of Cloud Architectures

This section presents a high-level description of the requirements of cloud architectures for two of the most important online service types, namely : social networks and e-commerce. These two online service types serve very different purposes and are based on very distinct sets of requirements. As such, their underlying architectures and information management models are very different. The remaining sections of this chapter give an overview of each of these service types, and the specific services will be described in greater detail in section 3.

2.5.1 Cloud Architecture for Social Network

Social networks are networks of users connected through relationships such as friendship, following or otherwise. Through these relationships, users are able to share content amongst themselves [7]. On these sites, one of the greatest concerns has been the security and privacy of personal data [8]. Cloud computing presents the same general advantages to social applications: a significant decrease in operational and infrastructure costs, along with the ease of scalability to meet the increasing or decreasing needs of the applications.

Cloud computing and social networks have numerous examples of being used together. Typically these involve the social network being hosted on a cloud platform or social applications being hosted on the cloud. Recent research has explored the idea of building cloud infrastructure leaning on the social network for the established relationships and user management it provides. The functional requirements of a social network are [9]:

• Cloud-based : responds to changes in processing demand by modifying the amount of available computing resources;

• Database storage : required for persistent storage and optimised data retrieval;

• Social Network : provides users with a richer content discovery experience by allowing users to obtain meaningful content suggestions;

• Natural Language Processor : allows the articles in the system to be analysed and used for content suggestions and discovery;

• Suggestion engine : meaningful content discovery tool for users;

• Content Layout System : provides content producers with an experience similar to that of a desktop editor application;

• User Interface : renders the content for the reader the way the content producer envisioned it to be;

• Business Logic : user access to content should be controlled in order to differentiate between a user who may edit the content of a given article or publication and a user who may only view the content.


The cloud architecture commonly used for social applications does not differ from typical cloud architectures. PaaS is commonly used for social applications as a total solution for social application development. Social applications can be designed as applications on top of existing social networks or as separate applications [8].

The established relationships within the social network are used to map certain resources and services to particular users. For example, resource sharing can be done only with friends, or members of the same group. The application itself serves as the type of marketplace where the actual services or resources can be obtained.
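A small sketch (our illustration, not drawn from any particular provider) of how established relationships can gate resource sharing as just described: a resource is visible only to users connected to its owner by a friendship edge or by membership in a group the resource is shared with. The data and names are hypothetical.

    friends = {"alice": {"bob"}, "bob": {"alice"}, "carol": set()}
    groups = {"sailing-club": {"alice", "carol"}}

    def can_access(owner, requester, shared_with_groups=()):
        """Grant access to the owner, the owner's friends, or members of a shared group."""
        if requester == owner or requester in friends.get(owner, set()):
            return True
        return any(requester in groups.get(g, set()) for g in shared_with_groups)

    print(can_access("alice", "bob"))                                          # True: friend edge
    print(can_access("alice", "carol", shared_with_groups=["sailing-club"]))   # True: shared group
    print(can_access("bob", "carol"))                                          # False: no relationship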

The flexibility of cloud services to scale up and down to meet resource needs fits well with the dynamic nature of the social network. Many existing applications fall under the social network category, namely :

1. Facebook;

2. YouTube;

3. Twitter;

4. LinkedIn;

5. Flickr;

6. eHarmony.

The case of Facebook’s architecture and IM model will be discussed in greater detail in section 3.1.

2.5.2 Cloud Architecture for e-Commerce

The cloud is a natural fit for e-commerce because resources can be provisioned and paid for when they are needed, instead of building enormous static environments scaled for peaks [10]. Also, e-commerce presents some unique characteristics which lead to its platforms being architected and deployed differently than most other systems. These characteristics are :

• Revenue generation : most revenue now flows through an organisation’s e-commerce platform. A platform-wide outage will prevent an entire organisation from taking in revenue and cause damage to the brand reputation;

• Visibility : high visibility characterises most e-commerce platforms, often serving as the public face, and increasingly the back office, of an organisation. Every millisecond of delayed response time reflects more poorly on that brand;

• Traffic spikiness : subject to often unpredictable spikes in traffic that are one or two orders of magnitude larger than the steady state;

• Security : organisations are often liable for breaches, with even small breaches costing tens of millions of dollars, not to mention the negative publicity and loss of confidence by customers. Breaches tend to be far-reaching, with all data under management exposed;


• Statefulness : the challenge with e-commerce is that customers often browse anonymously for an extended period of time before they identify themselves by logging in. Most large websites force users to log in immediately (e.g., social media, email, online banking). When a customer logs in to an e-commerce website, everything that happened throughout the anonymous session has to be merged with the data that has been persisted about the customer (a toy sketch of this merge follows below).
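The following sketch of that merge-on-login step uses a hypothetical shopping-cart data model of our own (not from the report's sources): the state gathered during the anonymous session is folded into the state already persisted for the customer.

    persisted_carts = {"customer-123": {"anchor-light": 1}}    # state already stored about the customer

    def merge_on_login(customer_id, anonymous_cart):
        """Merge the anonymous session's cart into the customer's persisted cart."""
        cart = persisted_carts.setdefault(customer_id, {})
        for item, quantity in anonymous_cart.items():
            cart[item] = cart.get(item, 0) + quantity
        return cart

    # Items added before the customer identified themselves:
    session_cart = {"rope-10m": 2, "anchor-light": 1}
    print(merge_on_login("customer-123", session_cart))
    # {'anchor-light': 2, 'rope-10m': 2}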

In order to develop a cloud-based e-commerce solution, one must consider, at a minimum, the following [11] :

• Storage of credit card data or integration of third-party payment options (examples: PayPal, etc.);

• Catalogue display logic and shopping cart logic;

• Database storage.

In addition, some very large e-commerce sites are starting to deliver a more personal experience to their users to maximise revenue, by modifying search results to present buyers with items they are more inclined to buy. In order to achieve this, a lot of behavioural data has to be stored and processed. This raises the concern for enhanced data security requirements. Most e-commerce sites also include some real-time risk and fraud detection system.

Within an e-commerce site, a data-centric catalogue component is distributed in net-based data centres to support application scaling in peak situations. The non-critical catalogue data can be distributed in the cloud, for accessing personal data and for processing transactions. The way data are stored and retrieved significantly impacts the time it takes to complete online transactions. Typically, three options are available :

• Cloud databases : high performance SQL database on the cloud. The Cloud Databases architecture is built for high and consistent performance, with container-based virtualisation;

• Database installed on a Cloud Server connected to Cloud Block Storage. This enhances the network security of Cloud Servers by running the shopping cart and database servers on an isolated Cloud Network to filter illegitimate traffic from the catalogue display. Cloud Block Storage attached to a Cloud Server can provide the extra space that the database will require;

• Running a database on a dedicated server.

The possible architectures of e-commerce applications are:

• Two-tier Architecture (client-server) : data resides on a server, while business logic and user interfaces reside on the client side. The drawback is that the clients sustain the main load, resulting in a monolithic and heavyweight implementation with excessive overhead; this architecture is simple but unsuitable for e-commerce applications;


• Three-tier architecture : it separates the business logic of the application from the user interfaces and from data access. The middle tier can be further divided; in this case it is called a multi-tier architecture with multiple components, where it is easier to modify one component and the application has lower costs to deploy and maintain.

The case of eBay’s architecture and IM model will be discussed in greater detail in section 3.2.

2.5.3 Conclusion

This section presented an overview of cloud services and communication types, along with an overview of social networks and e-commerce internet services. The requirements for storage, data discovery and distribution, and how privacy is handled, are very different from one service type to another. The next section will describe in greater detail how some major service providers are currently dealing with the requirements of a cloud-based social network and online marketplace, namely Facebook and eBay.


Part 3

Architectures and Information Management Models

This section presents the cloud architectures and information management models used within some of the major online service providers. The selection of the architectures and models was based on the availability of information, which is mainly found in the services' technical blogs. The most comprehensive technical information was found on the blogs of, or in several presentations by:

• Facebook;

• Twitter;

• Netflix;

• eBay.

All of the above are giants of the Internet which face both unique and common problems. Facebook and Twitter are very similar services, and even though Twitter and eBay are very different, their problems related to information search are very similar. We will focus on Facebook and eBay as they are two of the major service providers in very different domains, one in social networking and the other in e-commerce. However, some very interesting technologies, now open source, have been developed by all of the major online service providers, and all of them are using technologies and libraries developed by others to solve their problems. As an example, eBay is using Netflix’s Hystrix library to improve its application resiliency.

Note that the architectures and models described in the following sections might not accurately reflect the current state of the Facebook and eBay services, as the technology and requirements of both services evolve and change very quickly.


3.1 Facebook

Facebook has more than a billion active users who record their relationships, share their interests, upload text, images, and video, and curate semantic information about their data. The personalized experience of social applications comes from timely, efficient, and scalable access to this flood of data, the social graph. Facebook is one of the largest online services, and the following numbers display its current state at a glance :

• Facebook has over 1.393 billion monthly active users,

• 890 million people log into Facebook daily,

• Facebook stores more than 300 petabytes of user data (1 petabyte holds the equivalent of 223,000 DVDs),

• There are 4.5 billion Facebook likes every day,

• Each minute there are 3,125,000 new likes,

• The total number of uploaded Facebook photos, as of February 2014, is 400 billion,

• On average 350 million photos are uploaded daily to Facebook,

• 243,055 new photos are uploaded to Facebook every minute,

• On average, there are 4.75 billion items shared by Facebook users each day,

• 10 billion Facebook messages are sent each day.

3.1.1 Facebook’s Cloud Architecture

Figure 3.1 represents the different backend services used at Facebook to support such a high volume of user queries and interactions with its online service.

This section is organised as follows:

1. First, Facebook’s storage solution is presented;

2. Then, Wormhole, the publish-subscribe system developed for use within Facebook, is presented;

3. Finally, its messaging service is presented.


Figure 3.1: The Facebook Stack

3.1.1.1 Storage

As Facebook has grown and the amount of data shared per user has grown, storing data efficiently has become increasingly important. An important class of data that Facebook stores is Binary Large OBjects (BLOBs), which are immutable binary data. BLOBs are created once, read many times, never modified, and sometimes deleted. BLOB types at Facebook include photos, videos, documents, traces, heap dumps, and source code [12].

With BLOB storage being one of the biggest demands on the service, it is important for Facebook to understand just what users are doing with BLOBs, as how they are viewed and shared determines how storage needs to be designed [13]. Facebook has found that there is a strong correlation between the age of a BLOB and its request rate. As such, Facebook qualifies a BLOB using what it calls its temperature. Three levels of temperature are used: hot, warm and cold.

Newly created BLOBs are requested at a far higher rate than older ones and are considered to be hot. The request rate for week-old BLOBs is an order of magnitude lower than for less-than-a-day-old content, and content less than one day old receives more than 100 times the request rate of one-year-old content.


The request rate drops by an order of magnitude in less than a week and, for most content types, it drops by 100x in less than 60 days. Similarly, there is a strong correlation between age and deletion rate: older BLOBs see an order of magnitude lower deletion rate than new BLOBs. This older content is called warm: it does not see frequent access like hot content, but it is not completely frozen either.

This understanding led to a redesign of Facebook's storage architecture, with the following services [13]:

1. Haystack (section 3.1.1.1.1) : a storage service for hot BLOBs that are often requested;

2. f4 (section 3.1.1.1.2) : a slower tier, for warm storage of BLOBs that have not quite stopped being requested;

3. The cold storage facility (section 3.1.1.1.3) : extra copies are deleted and BLOBs are moved to an offline facility.

Figure 3.2: Overall BLOB storage architecture (C: create, D: delete, R: read), where creates and most deletes are handled by Haystack. Reads are handled by either Haystack or f4 (from [14])

This way, performance is provided for the short period when friends and family are sharing images with each other, and the focus then shifts to durability in the long term. If an image becomes popular again, it can be moved out of the cold storage tier for a short period, until its re-found popularity fades [13].
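To make the tiering decision concrete, the following minimal Python sketch routes a BLOB to a storage tier purely by age, used here as a proxy for request rate. The thresholds are illustrative assumptions only; the sources state that request rates fall sharply within a week and by roughly 100x within 60 days, so the real cut-offs used by Facebook may differ.

    from datetime import datetime, timedelta

    # Hypothetical age thresholds; real cut-offs are not published in the sources.
    WARM_AFTER = timedelta(days=30)
    COLD_AFTER = timedelta(days=365)

    def choose_tier(created_at: datetime, now: datetime) -> str:
        """Route a BLOB to a storage tier based on its age (a proxy for request rate)."""
        age = now - created_at
        if age < WARM_AFTER:
            return "haystack"       # hot: frequently requested, served from the fast tier
        if age < COLD_AFTER:
            return "f4"             # warm: still read occasionally, stored more cheaply
        return "cold_storage"       # cold: extra copies dropped, moved to the offline facility

    print(choose_tier(datetime(2015, 11, 1), datetime(2015, 11, 5)))   # haystack
    print(choose_tier(datetime(2015, 1, 1), datetime(2015, 11, 5)))    # f4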

Facebook also has a distributed data store for the Social Graph, called The Associations and Objects (TAO); it is described in more detail in section 3.1.1.1.4. Data storage systems are geo-replicated with a single-master, multiple-slave topology [15].


3.1.1.1.1 Haystack

Haystack is an object store for sharing BLOBs on Facebook where data is written once, read often, never modified, and rarely deleted. The Haystack storage system was designed because traditional filesystems perform poorly under the Facebook workload [16].

On a traditional file system, several disk operations are necessary to read a single BLOB. While insignificant on a small scale, multiplied over billions of items and petabytes of data, accessing filesystem metadata to retrieve the desired BLOB becomes the throughput bottleneck.

To resolve the disk access bottleneck, the Haystack system keeps all the metadata in main memory. In order to do so, it dramatically reduces the memory used for filesystem metadata. Since storing a single photo per file results in more filesystem metadata than could be reasonably cached, Haystack takes a straightforward approach: it stores millions of photos in a single file and therefore maintains very large files.

It consists of three core components [17]:

1. The Haystack Store: the Store encapsulates the persistent storage system for photos and is the only component that manages the filesystem metadata for photos;

2. The Haystack Directory: the Directory maintains the logical to physical mapping along with other application metadata, such as the logical volume where each photo resides and the logical volumes with free space;

3. The Haystack Cache: the Cache functions as an internal CDN, which shelters the Store from requests for the most popular photos and provides insulation if upstream CDN nodes fail and need to refetch content.

When the browser requests a photo (see figure 3.3), the webserver uses the Directory to construct a URL for the photo, which includes the physical as well as logical volume information. Each Store machine manages multiple physical volumes, and each volume holds millions of photos. A physical volume is simply a very large file (100 GB) saved as /hay/haystack-<logical volume ID>.

Haystack is a log-structured, append-only object store. A Store machine can access a photo quickly using only the ID of the corresponding logical volume and the file offset at which the photo resides. This knowledge is the keystone of the Haystack design: retrieving the filename, offset, and size for a particular photo without needing disk operations. A Store machine keeps open file descriptors for each physical volume that it manages, as well as an in-memory mapping of photo IDs to the filesystem metadata (i.e., file, offset and size in bytes) critical for retrieving that photo. Each photo stored in the file is called a needle. A complete and detailed description of the Haystack object store can be found in [17].
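The following toy Python sketch illustrates the core idea described above: an append-only volume file on disk plus an in-memory map from photo ID to (file, offset, size), so a read needs a single seek rather than several filesystem metadata lookups. The class and method names are hypothetical; this is a sketch of the design idea, not Facebook's implementation.

    from collections import namedtuple

    # In-memory index entry: everything needed to fetch a needle with one disk read.
    Needle = namedtuple("Needle", ["volume_path", "offset", "size"])

    class HaystackStoreSketch:
        """Toy version of a Store machine: one big append-only file per volume,
        plus an in-memory map from photo ID to (file, offset, size)."""

        def __init__(self):
            self.index = {}          # photo_id -> Needle, kept entirely in RAM

        def put(self, photo_id, volume_path, data):
            with open(volume_path, "ab") as volume:
                offset = volume.tell()
                volume.write(data)   # appended to the log-structured volume file
            self.index[photo_id] = Needle(volume_path, offset, len(data))

        def get(self, photo_id):
            needle = self.index[photo_id]          # no filesystem metadata lookup
            with open(needle.volume_path, "rb") as volume:
                volume.seek(needle.offset)         # single seek, single read
                return volume.read(needle.size)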

3.1.1.1.2 f4 : Warm BLOB Storage

Warm content now represents more than 89 percent of the objects in Facebook's storage [12]. f4 was introduced by Facebook as a warm BLOB storage system because the request rate for its content is lower than that for content in Haystack, and thus it is not as hot.


Figure 3.3: Serving a photo using Haystack (from [16])

Warm storage is also in contrast with cold storage systems, which reliably store data but may take a long time to retrieve it, something that is unacceptable for user-facing requests.

The data and index files are the same as in Haystack; the journal file is new. The journal file is a write-ahead journal with tombstones appended to track BLOBs that have been deleted. f4 keeps dedicated spare backoff nodes to help with online BLOB reconstruction. f4 currently stores over 65 PB of logical data and, as such, saves over 53 PB of storage with a 1.2 replication factor.

3.1.1.1.3 Cold Storage

Facebook's cold storage facility is very different from a traditional data centre. Instead of being designed for performance, it is designed above all to be energy efficient and to provide great data durability. High storage efficiency means no UPS, no redundant power supplies, and no generators. The facility was built from scratch in eighteen months, right down to the custom software it uses.


It is a lot quieter than a typical data centre, as the service spins down unused drives to save energy [13].

3.1.1.1.4 TAO, Storage of the Social Graph

The TAO distributed data store is a system purpose-built for the storage, expansion and, most importantly, delivery of the complex web of relationships among people, places and things that Facebook represents, i.e. the Social Graph (see section 3.1.2 for more details about the Social Graph). A single Facebook page may aggregate and filter hundreds of items from the social graph. Since Facebook presents each user with customised content (which needs to be filtered with privacy checks), an efficient, highly available, and scalable graph data store is needed to serve this dynamic, read-heavy workload [18].

Before TAO, Facebook's web servers directly accessed MySQL to read or write the social graph. TAO aggressively caches objects and associations to provide good read performance; its memory management is based on Facebook's customised Memcached. The TAO data store implements a graph abstraction directly: it implements an object and association model and continues to use MySQL for persistent storage, but mediates access to the database and uses its own graph-aware cache.

According to [18], TAO objects are typed nodes, and TAO associations are typed directed edges between objects. Objects are identified by a 64-bit integer that is unique across all objects, regardless of the object type. Associations are identified by the source object, association type and destination object. At most one association of a given type can exist between any two objects. Both objects and associations may contain data as key-value pairs. A per-type schema lists the possible keys, the value type, and a default value. Each association has a 32-bit time field, which plays a central role in queries. To handle multi-region scalability, TAO employs replication using the per-record master idea. TAO can sustain a billion reads per second on a changing data set of many petabytes.
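A minimal sketch of that object/association model, assuming a single in-memory store, is given below. The class and method names are invented for illustration; the real TAO is a distributed, MySQL-backed system with a graph-aware cache.

    import time

    class TaoLikeGraphSketch:
        """Toy in-memory version of the object/association model described above."""

        def __init__(self):
            self.objects = {}   # id -> {"type": ..., "data": {...}}
            self.assocs = {}    # (source_id, assoc_type, dest_id) -> {"time": ..., "data": {...}}

        def add_object(self, obj_id, obj_type, **data):
            self.objects[obj_id] = {"type": obj_type, "data": data}

        def add_assoc(self, src, assoc_type, dst, **data):
            # At most one association of a given type between any two objects.
            self.assocs[(src, assoc_type, dst)] = {"time": int(time.time()), "data": data}

        def assoc_range(self, src, assoc_type, limit=10):
            """Return the newest associations of one type from a source object,
            reflecting the central role the time field plays in queries."""
            edges = [(k[2], v) for k, v in self.assocs.items()
                     if k[0] == src and k[1] == assoc_type]
            return sorted(edges, key=lambda e: e[1]["time"], reverse=True)[:limit]

    g = TaoLikeGraphSketch()
    g.add_object(1, "user", name="Alice")
    g.add_object(2, "page", title="Sailing")
    g.add_assoc(1, "likes", 2)
    print(g.assoc_range(1, "likes"))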

The caching layer consists of multiple cache servers that together form a tier. A tier is collectively capable of responding to any TAO request. Each request maps to a single cache server using a sharding scheme. Clients issue requests directly to the appropriate cache server, which is then responsible for completing the read or write. For cache misses and write requests, the server contacts other caches and/or databases. TAO validates the graph data model as a way to access data under the challenging read-dominated workloads that social networks and sites with similar workloads face.

3.1.1.2 Wormhole Pub-Sub System

Wormhole is a publish-subscribe system developed for use within Facebook's geographically replicated data centres. When a user posts content to Facebook, it is written to a database. Numerous systems need the newly updated data to function correctly. For instance, Facebook aggressively employs caching systems such as Memcached and TAO so the underlying storage systems are not inundated with read queries. Similarly, Graph Search (see section 3.1.2) maintains an index over all user-generated data so it can quickly retrieve queried data.


On a write, cached and indexed copies of the data need to be either invalidated or updated.

Directing each application to poll the database for newly written data would be untenable, as applications would have to choose between long poll intervals, which lead to stale data, and frequent polling, which interferes with the production workload of the storage system. Publish-subscribe systems that identify updates and transmit notifications to interested applications offer a more scalable solution.

There are a number of challenges that an update dissemination system deployed at Facebook needs to handle [15]:

1. Different consumption speeds: Applications consume updates at different speeds. A slow application that synchronously processes updates should not hold up data delivery to a fast one;

2. At least once delivery: All updates are delivered at least once. This ensures that applications can trust that they have received all updates that they are interested in;

3. In-order delivery of new updates: When an update is received, the application should be confident that all updates prior to the received one have also been received earlier;

4. Fault tolerance: The system must be resilient to frequent hardware and software failures, both on the datastore and on the application end.

Within Wormhole, producers produce data and write it to datastores. Publishers read the transaction logs of the datastores, construct updates from them, and send them to subscribers of various applications, which in turn do application-specific work, e.g., invalidate caches or update indices.
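The following minimal Python sketch illustrates the flow just described: a publisher tails a datastore's append-only log and pushes updates to subscribers, with one delivery position per subscriber so that a slow consumer does not hold up a fast one and delivery stays in order and at least once. All names are hypothetical; this is a sketch of the pattern, not Wormhole's actual code.

    class LogTailingPublisherSketch:
        """Toy publisher: reads a datastore's append-only log and pushes updates to
        subscriber callbacks, tracking one delivery position per subscriber."""

        def __init__(self, log):
            self.log = log                 # list of update records, in commit order
            self.positions = {}            # subscriber name -> next index to deliver

        def subscribe(self, name):
            self.positions[name] = 0

        def deliver(self, name, callback):
            # In-order, at-least-once: the position only advances after the callback
            # succeeds, so a crash before that point leads to redelivery, never a gap.
            pos = self.positions[name]
            while pos < len(self.log):
                callback(self.log[pos])
                pos += 1
                self.positions[name] = pos

    datastore_log = [{"op": "update", "key": "user:42", "field": "city", "value": "Halifax"}]
    publisher = LogTailingPublisherSketch(datastore_log)
    publisher.subscribe("cache_invalidator")
    publisher.deliver("cache_invalidator", lambda update: print("invalidate", update["key"]))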

However, a malfunction of that system affects some Facebook users much more than others. For example, suppose Wormhole publishers are malfunctioning on 1 percent of datastore machines, so that 1 percent of the cache is stale. This would cause 100 percent of the cached data for 1 percent of the users to be stale, and not 1 percent of the cached data for 100 percent of the users. This makes the reliability of the publishers very important in the cloud environment.

On the other hand, in order to render the news feed when a user connects to Facebook, the pull model is preferred. In a social graph, the number of incoming edges is much smaller than the number of outgoing ones, and the pull mechanism puts less pressure on the infrastructure.

3.1.1.3 Facebook Messages Stack

Facebook Messages (FM) is a messaging system that enables Facebook users to send chat and email-like messages to one another. It handles millions of messages each day. FM stores its information within HBase (and thus, the Hadoop Distributed File System (HDFS)). Users of FM interact with a web layer, which is backed by an application cluster, which in turn stores data in a separate HBase cluster. The application cluster executes FM-specific logic and caches HBase rows, while HBase itself is responsible for persisting most data (small messages, message metadata such as thread/message indices, and the search index). Large objects (e.g., message attachments) are an exception; these are stored in Haystack because HBase is inefficient for large data.
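A minimal sketch of this storage split is given below, under the assumption of a simple size threshold for deciding what counts as a large object (the real criterion is not given in the sources). The helper names and the threshold are hypothetical.

    ATTACHMENT_THRESHOLD = 64 * 1024   # assumption: anything larger goes to the BLOB store

    def store_message(hbase_put, haystack_put, thread_id, message_id, body, attachments):
        """Persist one message the way the split above suggests: row data in HBase,
        large binary attachments in the Haystack-style BLOB store."""
        refs = []
        for name, data in attachments.items():
            if len(data) > ATTACHMENT_THRESHOLD:
                refs.append(haystack_put(data))          # store BLOB, keep only a reference
            else:
                refs.append(("inline", name, data))      # small enough to live with the row
        row_key = f"{thread_id}:{message_id}"
        hbase_put(row_key, {"body": body, "attachments": refs})

    # Toy backends standing in for the real HBase and Haystack clients.
    store_message(
        hbase_put=lambda key, row: print("hbase put", key, list(row)),
        haystack_put=lambda blob: ("haystack", hash(blob)),
        thread_id="t1", message_id="m1", body="hello",
        attachments={"photo.jpg": b"\x00" * 100000},
    )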


Figure 3.4: Components of Wormhole (from [15])

HBase is used as the underlying support database because of the following specifications:

• High write throughput;

• Good random read performance;

• Horizontal scalability;

• Automatic Failover;

• Strong consistency;

• Benefits of HDFS;

– Fault tolerant, scalable, checksums, MapReduce;

– Internal devops expertise at Facebook.


FM uses RFC 822, the standard for the format of ARPA Internet text messages.

3.1.2 The Social Graph, How Information is Discovered and Shared

Facebook maintains a database of the inter-relationships between the people and things in the real world. These entities and connections are modelled as nodes and edges in a graph, which is called the Social Graph. It is a directed graph consisting of nodes signifying people and things, and edges representing a relationship between two nodes [19]. This representation is very flexible; it directly models real-life objects, and can also be used to store an application's internal implementation-specific data [18]. This blend of general information and social context in a single graph makes Facebook a rich source of content, and a unique data set [20].

Although there are many billions of nodes in the social graph, it is quite sparse: a typical node will have fewer than one thousand edges connecting it to other nodes. The most popular pages and applications have tens of millions of edges, but these pages represent a tiny fraction of the total number of entities in the graph [19]. Both nodes and edges have metadata associated with them. For example, the node corresponding to a person will have a name, a birthday, etc., and the node corresponding to a company's page will have its title and description as metadata. Nodes in the graph are identified by a unique number called the fbid [20].

The Graph Search engine is built upon highly structured data in the form of a graph, representing hundreds of types of nodes and thousands of types of edges. Users, Pages, places, photos and posts are all nodes in the graph, each with structured information of its own nature. For example, users have gender information, places have addresses, and photos have posting dates. Moreover, the nodes are connected to each other in various ways. A user can like a page, study at a school, live in a city, be in a relationship with another user, check in at a place, and comment on a photo. A photo, in turn, can be tagged with a user, and be taken at a place. Graph Search is designed towards understanding the user intent precisely and serving structured objects [21].

The kinds of entities searched are users, pages, places, groups, applications, and events [20]. Facebook found that a keyword-based search system would not be the best choice because keywords, which usually consist of nouns or proper nouns, can be nebulous in their intent. For example, "friends Facebook" can mean friends on Facebook, friends who work at Facebook Inc., or friends who like Facebook the Page. Keywords, in general, are good for matching objects in the graph but not for matching connections between the objects [21].
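As a toy illustration of why connection matching works better than keyword matching for such queries, representing friendships and "lives in" relations as typed edges turns a query such as "friends of Alice who live in Montreal" into a simple edge intersection. The data and function below are purely illustrative and are not Facebook's query machinery.

    # Tiny directed, typed-edge graph: (source, edge_type) -> set of destinations.
    edges = {
        ("alice", "friend"): {"bob", "carol", "dan"},
        ("bob", "lives_in"): {"montreal"},
        ("carol", "lives_in"): {"halifax"},
        ("dan", "lives_in"): {"montreal"},
    }

    def friends_who_live_in(user, city):
        """Answer a connection query ('friends of <user> who live in <city>') by
        intersecting typed edges rather than matching keywords."""
        return {f for f in edges.get((user, "friend"), set())
                if city in edges.get((f, "lives_in"), set())}

    print(friends_who_live_in("alice", "montreal"))   # {'bob', 'dan'}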

Therefore, the graph search system uses a query suggestion system. The query suggestions are always constructed in natural language, expressing the precise intention interpreted by the system. This means a user knows in advance whether the system has correctly understood his intent before selecting any suggestion and executing a search [21]. The system also suggests options for completing one's search as the user types into the typeahead, demonstrating what kinds of queries it can understand [21].

The components of the architecture of Facebook’s natural language interface are [21]:

• Entity recognition and resolution, i.e., finding possible entities and their categories in an input query and resolving them to database entries;


• Lexical analysis, i.e., analysing the morphological, syntactical and semantic information of the words/phrases in the input query;

• Semantic parsing, i.e., finding the top N interpretations of an input query given a grammar expressing what one can potentially search for using Graph Search.

Unicorn (described in detail in [19]) is the primary backend system for Facebook Graph Search and is designed to serve billions of queries per day with response latency of less than a few hundred milliseconds.

When a user types a query into the search box, the Natural Language Processing (NLP) module attempts to parse it based on a grammar. It identifies parts of the query as potential entities and passes these parts down to Unicorn to search for them. The NLP module also provides hints as to what kind of entity each part may be; for example, if the searcher types "people who live in Sri", the NLP module sends the query "sri" to Unicorn, suggesting a strong bias towards cities and places [22].

Once the results for all the search requests are back at the NLP module, it constructs all possible parse trees with this information, assigns a score to each parse tree and shows the top parse trees as suggestions to the searcher [22]. Relevance indicators which influence the scoring process are [23]:

• Personal Context;

• Social Context;

• The query itself;

• Global popularity of an entity.

The search phase begins when the user has made a selection from the suggestions. The parse tree, along with the fbids of the matched entities, is sent back to a Top Aggregator, which aggregates the results from several Unicorn entities [22].

Regarding the privacy of the content, when someone shares something on Facebook, he gets to decide exactly who can see that content. As such, a lot of what Graph Search will find is content that is not public, but content that someone has shared with a limited audience. This is also part of what makes Graph Search an interesting technical challenge for Facebook to build: the system has to do an extraordinary amount of privacy checking in real time to deliver the experience Facebook wants its users to have.

3.1.3 Analysis of Facebook’s Information Model

Everything created on the Facebook infrastructure is integrated into the social graph as a node, with its supporting metadata, as soon as it enters the system, i.e. when a user uploads it. As such, we can state that in the process chain, the IM model is applied right at the beginning. This IM model is also able to deal with changing data distribution requirements.


When a user restricts the permissions on a particular picture, or changes the privacy of something so that it becomes public content, the search infrastructure takes that into account. However, this is believed to have been a very challenging technical achievement.

The BLOB storage, in combination with the graph-based representation, as an IM model can easily accommodate information products. A picture to which tagged information is added can be considered an information product; the same holds for the geolocation attached to a particular news feed item or post. Graph-structured data does not require a schema to be fixed a priori, so one can augment the available information on a particular BLOB / graph node as needed without breaking the underlying functionality.

The level of metadata required by the model to support the management of the information is huge and very diverse, as the set of metadata attached to each node type in the Social Graph differs from type to type. As mentioned earlier, a person will have a birthday, occupation and relationships, while a place will be described with very different metadata. With respect to data incest, the same information could technically be published by two different users and appear in one's feed from different, unrelated people (for example, the same picture downloaded from the web). However, once an entity is in the social graph and is shared by users, it remains the same node in the graph and cannot be duplicated except, as mentioned, if it is uploaded again by a different user. This leads to data pedigree, which from our point of view is not fully accounted for in the metadata that supports the model. The same information can technically be published multiple times by many different users; more metadata would be required to make every piece of information truly unique.

In terms of the international and national standards used in the IM model, Facebook had to invent almost everything from the ground up, as the scale of the service it provides was unprecedented. Facebook developed and published many open source libraries and modified many others in order to achieve scale and performance. It uses Open Graph metadata tags, which are now recognised by all other major platforms and are becoming a de facto standard. Much of the underlying infrastructure is based on MySQL, Apache software and Memcached. Facebook developed Apache Cassandra, but now uses Apache HBase, which runs on top of HDFS and is becoming a standard in big data analytics supported by all the major cloud providers. As such, Facebook uses a mix of key-value (Memcached), SQL, column-family (a type of NoSQL) and graph-based datastores. Apache ZooKeeper is also used for distributed synchronisation. The messaging application uses RFC 822, the standard for the format of ARPA Internet text messages.

Among the strengths of the Facebook IM model, the graph data model certainly ranks first. It allows storage of all of a node's connections along with the node itself, so that no additional step is needed to compute connected data beyond reading the node into memory. The graph data model also prescribes that relationships have properties; property-rich relationships are critical when exploring connected data. The graph data model therefore allows one to find interconnected data much faster and in a much more scalable manner. However, Hadoop and its associated technologies (such as Pig and Hive) were not primarily designed to support scalable processing of graph-structured data, which can be a handicap if one needs to perform MapReduce operations and analytics on such representations.

The graph-based model also makes it easier to express many kinds of data that require significant kludging to fit in a relational database.


Certain kinds of searches that are very difficult in a relational database (i.e., any search where relationships between different kinds of data are important) are very quick and easy in a graph. It easily accommodates new kinds of data and is very well suited to the irregular, complex data involved in mapping the real world.

The log-based storage of Haystack is also very performant when one needs to interact with data that is written once and read often. The use of BLOB temperature to choose a storage solution is also very clever and reduces the resources needed and the energy consumption, which at Facebook's scale can become gigantic.

However, as mentioned previously, if one requires operations on large amounts of data, the graph model can be very slow and it can use a lot of space. It is also not widely used in business environments, and it can be very easy to describe data inconsistently, which can quickly reduce the usefulness of the database. Generally, it also requires all data to exist explicitly in relation to other data.

Facebook has developed many open source libraries, which are all free to use. However, given Facebook's scale, the solutions that it developed might be overkill for anyone else's problems.

3.2 eBay

eBay Inc. is an American multinational corporation and e-commerce company providing consumer-to-consumer and business-to-consumer sales services via the Internet. It is recognised as the largest online marketplace and is also the owner of PayPal, one of the most popular third-party services for credit card transactions. eBay's numbers are impressive, for instance:

• It manages 248,000,000 registered users;

• It manages over 1 Billion photos;

• 190 million items for sale in 50,000 categories;

• Over 8 billion URL requests per day;

• Roughly 10 percent of items are listed or ended every day;

• In 39 countries and 10 languages;

• 24x7x365;

• There are more than 44 billion SQL executions per day;

• 70 billion read / write operations / day;

• Processes 50 TB of new, incremental data per day;

• Analyses 50PB of data per day.


This section presents eBay's system architecture and its information model, and is organised as follows:

• First, section 3.2.1 presents an overview of eBay's architecture, with a focus on storage and information exchange;

• Then, section 3.2.2 presents eBay’s information management model and search engine.

Note that the information provided by the different sources about eBay is much less structured and complete than that provided by the technical papers from Facebook.

3.2.1 eBay’s Cloud Architecture

eBay runs one of its most crucial workloads, its website, on an OpenStack private cloud platform that it developed itself. The company wanted to build a robust, scalable, agile infrastructure that can support the IT demands of all its brands, including the PayPal and eBay websites. The services provided by eBay, as in most e-commerce web services, each being a subsystem, are:

• Selling;

• Search;

• Item view;

• Bidding;

• User account;

• Checkout;

• Feedback.

eBay's system has a 3-tier architecture with a web-enabled device (a browser), application and transaction servers, and databases at the data services layer.

eBay's cloud architecture follows several patterns. First, everything is partitioned: processing is partitioned into pools, services and stages, and data is partitioned along usage boundaries. A horizontal split is also used (see section 3.2.1.2 for more details). No session state about the user is kept in the application tier; the user session flow moves through the different application pools (i.e. the above-mentioned subsystems).

eBay is built on a publish/subscribe paradigm. A primary use case produces an event (ITEM.NEW, ITEM.SOLD, etc.) transactionally with the primary insert/update within the databases. Consumers subscribe to an event and are guaranteed at least one delivery; however, the order in which events are received is not guaranteed. The publish/subscribe mechanism follows a message multicast pattern (a sketch of this flow is given after the list below):


1. Search Feeder publishes item updates;

(a) Reads item updates from primary database;

(b) Publishes sequenced updates via an SRM-inspired protocol;

2. Nodes listen to their assigned subset of messages;

(a) Update in-memory index in real time;

(b) Request recovery when messages are missed.
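As a rough illustration of the sequenced-update pattern in the list above, the sketch below shows a consumer node that applies item updates in order and requests recovery when it detects a gap in the sequence numbers. The class and event names are hypothetical; the real protocol is only described as SRM-inspired.

    class IndexNodeSketch:
        """Toy consumer of sequenced item updates: applies them in order and
        requests recovery when a sequence number is missing."""

        def __init__(self, request_recovery):
            self.expected_seq = 0
            self.request_recovery = request_recovery   # callback back to the feeder
            self.index = {}                             # item_id -> latest update

        def on_update(self, seq, item_id, payload):
            if seq > self.expected_seq:
                # Gap detected: ask the feeder to resend the missed range.
                self.request_recovery(self.expected_seq, seq - 1)
            if seq >= self.expected_seq:
                self.index[item_id] = payload           # update the in-memory index
                self.expected_seq = seq + 1
            # seq < expected_seq means a duplicate (at-least-once delivery); ignore it.

    node = IndexNodeSketch(lambda lo, hi: print(f"recover {lo}..{hi}"))
    node.on_update(0, "item-1", {"event": "ITEM.NEW"})
    node.on_update(2, "item-2", {"event": "ITEM.SOLD"})   # triggers "recover 1..1"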

3.2.1.1 OpenStack

95 percent of eBay's marketplace traffic is powered by its own OpenStack cloud [24]. OpenStack has replaced VMware in the data centres of the eBay-owned online-payment firm PayPal. The reasons the company made the switch are some of the most frequently cited reasons for switching from a proprietary platform to an open source one: to have more freedom to customise and to avoid vendor lock-in. eBay has about 20 percent of its customer-facing website running on OpenStack, and processed all PayPal transactions on applications deployed on the platform. The company also hosts significant amounts of data. Close to 100 percent of traffic running through PayPal web and API applications, as well as mid-tier services, is now served by the company's own private OpenStack cloud.

OpenStack adoption is on the rise. Examples of other big-name users that have deployed it in production include Walmart Labs, Time Warner Cable, and CERN, the European Organisation for Nuclear Research, which operates the Large Hadron Collider. However, data from recent user surveys suggest that manageability, and in particular upgradeability, long held to be significant barriers to OpenStack adoption, are still huge issues. Not enough details are made available on the actual implementations, and only high-level discussions and presentations are available from the companies that are using OpenStack.

3.2.1.2 Storage

The database technologies used within eBay's data infrastructure are a heterogeneous mix of the following [25]:

• Apache Cassandra;

• MongoDB;

• Oracle;

• MySQL;

• Apache HBase;

• Xmp.


Each of these databases has been chosen for a specific part of the underlying data infrastructure, to answer very specific requirements. For instance, Cassandra is used to store the taste graph, i.e., the graph that is used to deliver a more personal experience when shopping on eBay. MongoDB is used to store the metadata about the items [26], and the data for indexing an item is stored in HBase for efficient updates and random reads [25].

In order to scale the storage efficiently, the databases are segmented into different functional areas [27]:

• User hosts;

• Item hosts;

• Account hosts;

• Feedback hosts;

• Transaction hosts;

• 70 more functional categories.

The databases are partitioned by different scaling and usage characteristics, and the architecture supports functional decoupling and isolation [27]. eBay uses a horizontal split of the databases, i.e. splitting a table into several tables that each contain a subset of the rows of the initial table, in order to scale the transactional load horizontally and to limit the business impact of a potential database outage. Memcached is also used to store the results of highly expensive and frequently used queries.
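A minimal sketch of this two-level partitioning, first by functional area and then by a horizontal split of the rows, is shown below. Hashing the record key is used here only as one illustrative way to assign rows to shards; the shard counts and host names are purely hypothetical, as eBay's real topology is not public.

    import hashlib

    # Hypothetical shard counts per functional area.
    SHARDS_PER_AREA = {"user": 4, "item": 8, "transaction": 4, "feedback": 2}

    def route(functional_area: str, key: str) -> str:
        """Pick a database host: first split by functional usage, then split the
        rows of that area horizontally by hashing the record key."""
        n_shards = SHARDS_PER_AREA[functional_area]
        digest = hashlib.md5(key.encode()).hexdigest()
        shard = int(digest, 16) % n_shards
        return f"{functional_area}-host-{shard:02d}"

    print(route("item", "item:190000001"))   # e.g. item-host-05
    print(route("user", "user:42"))          # e.g. user-host-02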

One of the largest Hadoop clusters is run by eBay on data sets (stored in HBase) such as:

• Inventory Data : Product Listings, Catalogue, Quantity etc.;

• Transactional Data : Buying, Returning, etc.

• User Behavioural Data : Click stream, comments, suggestions, user activities, etc.

• Customer profiles : Buyer, Seller, Partner information, etc.

• Machine data : Logs, application data.

eBay's taste graph for its users, stored in Cassandra [25], is mostly derived from user behavioural and historical data. It is used within eBay's search engine to rank the results of a query, in order to deliver a more personal experience to the user according to the user's computed preferences.

The most important aspect to retain from eBay's storage mechanism is the heavy partitioning of the data based on usage.


3.2.2 How Information is Discovered and Shared on eBay

eBay is amazingly dynamic. Around ten percent of the 300+ million items for sale end each day (sell or end up unsold), and a new 10 percent is listed. A large fraction of items have updates: they get bids, prices change, sellers revise descriptions, buyers watch, buyers offer, buyers ask questions, and so on. This section presents an overview of the metadata associated with an item to enable its discovery by the user (section 3.2.2.1), as well as details on the eBay search engine (section 3.2.2.2).

3.2.2.1 Metadata

Items on eBay are indexed using several metadata fields, usually entered by the item's seller, which are in turn used when a buyer is looking for a particular item. The metadata's purpose is to help find an item rapidly and to filter the results returned by the search engine. Metadata attached to an item include, for example, but are not limited to:

• Category and Subcategory details;

• Item condition details;

• Payment methods.

In addition, other metadata and information are also part of a listing, such as the item title and description as well as seller data. Complete details about the metadata can be found in the API documentation [28]. It is worth noting that every item category and subcategory has its own set of metadata attached to it. Metadata can be assigned automatically to an item if the seller first uses the eBay catalogue to find the item he wishes to sell.
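As a purely illustrative example, an item listing and its metadata could be represented as the record below, and the search front end narrows results by matching these fields. The field names are invented for illustration; the authoritative list of metadata is in the eBay API documentation cited above [28].

    # Illustrative listing record; field names are examples only, not the actual API schema.
    listing = {
        "item_id": "190000001",
        "title": "Sailing compass, brass, 1950s",
        "description": "Working condition, minor scratches on the casing.",
        "category": {"name": "Antiques", "subcategory": "Maritime"},
        "condition": "Used",
        "payment_methods": ["PayPal", "Credit card"],
        "seller": {"user_id": "sea_dealer_42", "feedback_score": 1843},
    }

    def matches(listing, category=None, condition=None):
        """Filter a listing by metadata, the way a buyer narrows search results."""
        if category and listing["category"]["name"] != category:
            return False
        if condition and listing["condition"] != condition:
            return False
        return True

    print(matches(listing, category="Antiques", condition="Used"))   # True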

Other metadata are computed and inferred using all the logged events in eBay's system. They mostly concern buyer behaviour and historical information, and are used to personalise the experience when using the auction site.

3.2.2.2 Search Engine

In the case of eBay, the dominant user need is to find a great deal on something the user wants to purchase. There is a high contrast between web search and eBay's search. Web search is largely unstructured and is mostly about searching through text within documents to find the highest-precision matches. eBay does have text in its items and products, but there is much more structure in the associated information. For example, items are listed in categories, and categories have a hierarchy. eBay's search is more about matching text or preferences to structured information. Also, eBay has some unique search requirements [29]:

• Real-time updates: update an item on any change (list, bid, sales, etc.), as users expect changes to be visible immediately;


• Exhaustive recall: sellers notice if search results miss any item, and search results require data from every matching item;

• Flexible data storage: keywords as well as structured categories and attributes.

eBay's search engine went through two iterations. The first search engine, named Voyager, was built in 2002. It delivered impressively and reliably for over ten years. However, it was architected before many of the modern advances in how search works, having been launched before Microsoft began its search effort and before Google's current generation of search engines, and as such had to be replaced to deliver the information in a different way. In 2013, eBay launched its new search engine, named Cassini. Cassini is a true start-from-scratch rewrite of eBay's search engine; because of the unique nature of eBay's search problem, eBay could not use an existing solution [30]. Among the design principles of Cassini were the following:

• It supports searching over vastly more text; by default, Voyager lets users search over the title of items but not the description. It was decided that Cassini would allow searches over the entire document;

• The data centre automation suite made deployment of the Cassini software much easier than deploying Voyager;

• Support of sophisticated, modern approaches to search ranking, including being able to process a query in multiple rounds of rankings (where the first iteration was fast and approximate, and later rounds were intensive and accurate).

In addition, the search engine is able to remove a seller's domination of the search results when that seller lists the same product multiple times with different titles or very small differences. Cassini is all about the buyer and how much value a seller can offer to the buyer. Previously, it was very easy to manipulate eBay's search engine by flooding it with listings for the same item, but this is no longer the case.

The search engine is built around five factors, each of them supported with the appropriate metadata [31]:

• Text factors;

• Image factors: users prefer pictures where the background is a single colour, that is, where the object of interest is easily distinguished from the background;

• Seller factors: how long have they been selling? How is their feedback? Do they ship on time? Are they a trusted seller?

• Buyer factors: Do they always buy fixed-price items? What are the categories they buy in? What is the shoe size they keep on asking for in their queries? Do they buy internationally?

• Behavioural factors: does this item get clicks from buyers for this query? What is the watch count on the item? How many bids does the auction have? How many sales have there been of this fixed-price item, given it has been shown to users that many times?


Each of these factors is used when ranking the results presented to a user. To derive all the needed metadata, eBay runs possibly one of the largest Hadoop clusters to analyse all the logged events about buyer, seller and item behaviour. Search result rankings are also corrected for position bias by normalising the number of clicks an item receives with regard to its position. If not corrected, this bias would lead to an item at the top of the list remaining there forever, as it is normally clicked more often [32]. Price bias is also accounted for, as eBay estimates the expected sales for an item of price X based on historical data.
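A minimal sketch of such a position-bias correction is shown below: each observed click is weighted by the inverse of the click propensity of the slot in which the item was displayed, so an item is not rewarded simply for already sitting at the top. The propensity values are made-up numbers, not eBay's.

    # Hypothetical probability that a user clicks a result purely because of its slot.
    POSITION_PROPENSITY = [0.40, 0.25, 0.15, 0.10, 0.05]   # slots 1..5, made-up values

    def debiased_click_score(clicks_by_position):
        """Weight each click by the inverse propensity of the slot it occurred in,
        so items shown lower on the page are not unfairly penalised."""
        score = 0.0
        for position, clicks in clicks_by_position.items():
            score += clicks / POSITION_PROPENSITY[position]
        return score

    # An item with 40 clicks at slot 1 scores the same as one with 5 clicks at slot 5.
    print(debiased_click_score({0: 40}))   # 100.0
    print(debiased_click_score({4: 5}))    # 100.0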

3.2.3 Analysis of eBay’s Information Model

Every item or user sent to the eBay auction site / infrastructure is integrated in eBay's IM model with its supporting metadata as soon as it enters the infrastructure through eBay's API. Metadata is assigned by the item's seller, or automatically if the seller can provide the UPC information for the product. The level of metadata is very heavy, to enable very efficient item search. Different sets of metadata are used for different item categories and subcategories, to deliver the most complete information to the buyer. In addition, a lot of user-based metadata is extracted using possibly the largest Hadoop cluster and is used within eBay's search engine, Cassini, to provide custom search results to each and every user.

It can be assumed that this IM model accommodates information products, given the large quantity of metadata and the use of different metadata sets for different items/products. Even if only textual information and pictures are stored, other kinds of information could be integrated within this IM model.

One drawback is that, in its current state, the metadata supporting eBay's IM model does not help avoid data incest. The same information could technically be published many times by the same item seller: a very slight change in the item description or title can enable someone to list the same product many times. This was a real problem a few years ago, when sellers were using this technique to hijack the top spots in the search results. A completely new search algorithm and an updated eBay policy reduced the problem by imposing penalties on the sellers who use this technique; however, nothing prevents someone from publishing the same item several times by modifying some of the metadata.

Even if data pedigree is accounted for in the metadata that supports the model, sellers have found ways to get around it. Separate listings for identical items compatible with several products or models can occur, as can more than one listing of an identical item listed separately by the same seller under another eBay user ID, or identical items listed in different categories. eBay gave sellers a deadline to mend their duplicate-listing tactics, after which the company would maintain its previously announced strategy of reducing listing visibility for those who pollute the eBay marketplace with duplicates. Those who continued to use duplicate listings would see reduced search visibility on the site, which is, of course, terrible for sales. This IM model is clearly not perfect when dealing with data pedigree and data incest. On the other hand, the sellers' motivation for profit, which pushes the items into the eBay marketplace, might be the real culprit in this case.

This IM model does not have to deal with changing data distribution requirements.


There is no private information besides the buyer profiles, and those are not subject to any modification in their distribution requirements. The data distribution is also real-time when a buyer requests information about a particular item. However, in order to deal with security around the data it stores, eBay follows a security approach with four pillars: access control, perimeter security, data classification, and data activity monitoring. eBay built its own open-source data activity monitoring tool, called Eagle. Eagle provides, in real time and in a scalable manner, the following functionality:

• Anomalous data access detection based on user behaviour;

• Discovery of intrusions and security breaches;

• Discovery and prevention of sensitive data loss;

• Policy-based detection and alerting;

In terms of international/national standards used, eBay is mainly built around SOA principles. As such, it supports SOAP, XML and JSON in its communications. Apache ZooKeeper, Hadoop/HBase (HDFS) and Apache Cassandra are also heavily used alongside MySQL, MongoDB and Memcached. Just as with Facebook, its datastores are a mix of data structure paradigms, from standard relational databases to document datastores and key-value stores. However, it is also important to note that each of these open-source products is often tweaked by eBay to serve very particular requirements.

A clear advantage of this IM model is the partitioning of the data and databases along their functional usage, which enables very performant horizontal scaling. The real-time updates that notify a user of a price change or a new bid on an item are also crucial given the service eBay provides. The stateless user sessions are also what enables eBay to scale more easily than other services of the same type. The main disadvantage of eBay's information model is that it relies on the seller as the input for data and information. The motivation for profit therefore serves as an incentive to enter the same product multiple times with slight differences. As such, data pedigree is not fully taken into account, since the same object can be sold under different listings at the same time. Also, since the same object can be found in multiple different listings, data incest can occur, with the same information being stored in multiple places.

Just like Facebook, eBay developed most of its infrastructure and information model from the ground up. Both are tied to the specific set of requirements they must meet.

3.3 Conclusion

This section presented the cloud computing architectures and the IM models of two of the major service providers, namely Facebook and eBay. Both have very different architectures and data representations, uniquely built and designed to meet their particular requirements.

Facebook is mainly designed to store data in a write-once, read-often paradigm and to deliver search results that are very user-centric, while eBay's item list is highly dynamic and often changing.


As such, Facebook adopted a log-structured object store, called Haystack, coupled with a BLOB temperature based on request rate to determine where each BLOB is stored. eBay opted for a usage-based horizontal partitioning of its datasets to scale easily.

Both services have structured their data and information, as well as the supporting metadata, to enable fast and reliable data access. Facebook developed its search engine and query suggestion engine to make optimal use of the underlying graph structure; the supporting metadata of the data and information also reflects their intended usage. eBay also developed its own search engine, but with very different requirements: it searches within highly structured metadata, where each item is part of a category and subcategory and where ranking is influenced by the taste of the user, to maximise the profit of the organisation.

However, both services rely on the same data distribution mechanism, publish/subscribe, to notify their subsystems, as this messaging model offers very low latency and is less expensive because it has no persistent queues between distributed stages. Its drawback is in dealing with slow consumers, which is not a problem when operating in the cloud. Both services also use Memcached aggressively to keep in memory the results of very expensive requests, or content that is requested very often by the users of their service. This mechanism removes load from the underlying databases and is faster than sending a request and waiting for an answer from the storage layer.
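A minimal sketch of this cache-aside pattern is shown below, with a plain dictionary standing in for Memcached; in a real deployment the get/set calls would go through a Memcached client instead, and the TTL is an arbitrary assumption.

    import time

    cache = {}          # stands in for Memcached: key -> (expiry_timestamp, value)
    TTL_SECONDS = 300   # assumption; real TTLs depend on how stale a result may be

    def expensive_query(key):
        # Placeholder for a costly database or search request.
        time.sleep(0.1)
        return f"result-for-{key}"

    def cached_lookup(key):
        """Cache-aside: try the in-memory cache first, fall back to the storage layer,
        then populate the cache so later readers skip the expensive request."""
        entry = cache.get(key)
        if entry and entry[0] > time.time():
            return entry[1]                        # cache hit, no storage-layer round trip
        value = expensive_query(key)               # cache miss: do the expensive work once
        cache[key] = (time.time() + TTL_SECONDS, value)
        return value

    print(cached_lookup("top-sellers"))   # slow: goes to the storage layer
    print(cached_lookup("top-sellers"))   # fast: served from memory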

The next section presents the current state of at-sea data exchange and its unique characteristics. It will be used as a basis for relating the previously introduced architectures and IM models to that application domain and for determining their suitability there.


Part 4

At-Sea Data Exchange Environment

This section describes the at-sea information management environment, namely the information storage, discovery, distribution, and access in operations with multinational partners, and identifies unique characteristics that are typical of a naval at-sea environment, as compared to the ashore environment.

4.1 Overview of Maritime Communications

Coalition member countries have their own Defence Information Infrastructures (DIIs), which provide for the necessary message traffic between ashore and afloat systems via satellites as well as other High Frequency (HF), Very High Frequency (VHF) and Ultra-High Frequency (UHF) communications systems.

Figure 4.1 [33] depicts the Major Naval Networking Environment Components used as of 2008 in the United States (US) Department of Defence (DoD) Navy. Each of the four parts of this figure represents a portfolio of programs under the US Department of the Navy (DON) that encompasses developments providing connectivity between its naval organisations, as follows:

• Navy Marine Corps Intranet (NMCI) provides secure, universal access to integrated voice, video, and data communications and a common computing environment supporting roughly 650,000 Navy and Marine Corps user accounts on 340,000 seats, servicing over 3,000 locations across the DON,

• OCONUS Navy Enterprise Network (ONE-NET), including the Base Level Information Infrastructure (BLII), is a Navy enterprise network that delivers centralised IT, improved security, standard configurations, and increased service levels to roughly 41,000 users at shore installations overseas, serving the commands at Outside the Continental United States (OCONUS) sites,

• Information Technology for the 21st Century (IT-21) programs provide networking capabilities to the afloat forces. These programs include local area networks (e.g., the Integrated Shipboard Network System (ISNS)), services (e.g., Global Command and Control System - Maritime (GCCS-M)), routing (e.g., the Automated Digital Networking System (ADNS)), satellite communications (e.g., Extremely High Frequency (EHF), Super High Frequency (SHF) and/or UHF), and a shore-based infrastructure that supports global operations. These networking capabilities are explained further in the sections below.


Figure 4.1: Major Naval Networking Environment Components, reproduced from [33]


• Marine Corps Enterprise Network (MCEN) is a portfolio of acquisition programs that provides network services to Continental United States (CONUS), OCONUS, and deployed Marine Air-Ground Task Forces (MAGTFs). MCEN utilises NMCI and IT-21 capabilities in addition to its own internal tactical functions.

A number of programs have been planned by the DON for evolving naval networking to 2016 [33], including the implementation of its Next Generation Enterprise Network (NGEN), which will replace NMCI, and the Consolidated Afloat Networks and Enterprise Services (CANES) program, which reduces server footprints and migrates existing shipboard hardware into a centralised, managed process, replacing ISNS and parts of IT-21 for afloat connectivity.

The Defence Message System (DMS) is the DoD system for all message traffic. It uses COTS-based (equivalent to e-mail) X.400 messaging and X.500 directory services. The DMS supports organisational messaging at the unclassified, SECRET, and TOP SECRET levels and can exchange information via the Unclassified but Sensitive Internet Protocol Network (NIPRNET), the Secret Internet Protocol Network (SIPRNET) and other government networks.

There are DoD publications that provide information for the routing of message traffic within and/or between communication systems and for the transfer of message traffic between national communications systems.


US government organisations use NIPRNET, a global long-haul Internet Protocol (IP)-based network supporting unclassified IP data communications services for combat support applications, and SIPRNET, the DoD's largest interoperable command and control data network, supporting the Global Command and Control System (GCCS) and numerous other classified warfighter applications.

The Combined Enterprise Regional Information Exchange System (CENTRIXS), initially a US system, is a collection of classified coalition networks, called enclaves, that enable information sharing through the use of email and Web services, instant messaging or chat, the Common Operational Picture service, and Voice over IP. CENTRIXS uses SIPRNET and NIPRNET services, supporting combatant commands throughout the world, including the US Pacific, Central and European commands [34]. The global interconnections between the CENTRIXS networks are shown in Figure 4.2 [35].

Figure 4.2: Global Interconnections Between the CENTRIXS Networks, reproduced from [35]

High Frequency Internet Protocol (HFIP) and Sub-Network Relay (SNR) enable warfighters on CENTRIXS-Maritime to plan and execute coalition operations in a real-time tactical environment by transporting IP data directly to and from ships. HFIP and SNR provide Allied, Coalition and US maritime units with a direct platform-to-platform tactical networking capability using standard UHF and HF voice radios. HFIP operates in the HF spectrum and SNR operates in the UHF spectrum, giving surface platforms the ability to share a single SATCOM resource for reach-back capability. Figure 4.3 is an operational view of HFIP and SNR connectivity [35].

Systems such as ADNS and Radio Communications System (RCS) provide connectivity between the Navy shipboard networks and other ships and shore networks and route IP data over multiple Radio Frequency (RF) media, using various means including [35]:

• High Frequency Communications System,


Figure 4.3: Operational View of HFIP and SNR, reproduced from [35]

• Very High Frequency Communications System,

• Ultra High Frequency Line-of-Sight Communications System,

• Ultra High Frequency Satellite Communications System,

• Extremely High Frequency Satellite Communications System,

• Super High Frequency Satellite Communications System,

• International Marine/Maritime Satellite (INMARSAT),

• Commercial Broadband Satellite Programs.

The information exchange capability between ashore and afloat systems is only as good as the available connectivity systems a ship has access to. All of the available connectivity means will continuously be enhanced and new connectivity means will be installed by coalition members with the advancement of technologies. However, it is apparent that not all coalition members will have the same access to the same advanced level of communication technologies and subsystems, satellite systems, on-board information management systems, etc.

To ensure interoperability, US Military Standards (MIL-STD-...) and North Atlantic Treaty Organization (NATO) Standardization Agreement (STANAG)s have been (and are being) developed describing the Information Model (IM) providing the connectivity between the ships and shore national and coalition systems. Coalition interoperability tests take place to ensure that the standards are consistently implemented.

From the above description of maritime communications, the following can be summarised about the way information is currently exchanged between at-sea vessels:

• There are a number of separate networks that support the exchange of information for different purposes and between different geopolitical subgroups

• There is information exchange at different levels of security

• There are satellite as well as other radio communication systems that support the connectivity needs of at-sea vessels using various frequency channels

• There are US military and NATO standards that govern how often and what type of information is exchanged

4.2 At Sea Combat System

A surface vessel Combat System comprises sensors, weapons, communication, navigation, and Command & Control (C2) capabilities. C2 ties together all the operational functions and tasks and applies to all levels of war and echelons of command across the range of military operations [36]. C2 in each ship at sea comprises the means by which an operational commander synchronises and integrates force activities in order to achieve unity of command. Figure 4.4 provides a high-level view of an example advanced surface vessel Combat System [37] based on the configuration of the US AEGIS frigate combat system.

This section focuses on the discussion of the IM within the ship combat system, including:

• On-board and remote information sources and their characteristics,

• Storage of the information,

• Establishment of the Maritime Domain Awareness (MDA), and

• Sharing of the information with participating units (national and coalition ships collaborating in a mission).

4.2.1 On-board information storage, discovery, distribution, and access

On-board information sources can include radar, Identification Friend or Foe (IFF), Electronic Support Measures (ESM), various sonar systems, infrared, optical and radar imaging systems as well as sensors of on-board helicopters and Unmanned Aerial Vehicle (UAV)s and Unmanned Underwater Vehicle (UUV)s. The on-board information characteristics as well as quality attributes are a function of sensor interface specifications as well as the information pre-processing approach within the ship C2. Modern C2 systems provide multi-source integration/fusion enabled decision support for estimation of target kinematics and identification by integrating any information received from on-board sensors and storing it in a structured database that provides the local tactical situation.

Figure 4.4: Surface Vessel Combat System, reproduced from [37]

Various fusion approaches either update the target information ad hoc based on each subsequent sensor input or periodically fuse all sensor information collected during a pre-defined time period. The database structure is usually designed to contain at least all the target information (time, position, velocity, identification attributes, uncertainties, quality, etc.) required by the Tactical Data Link (TDL) standards for sharing the information with participating units. The number of slots in the database is limited; more modern ships have a larger capacity for storing information, hence a larger database.

Database access is by track number, which is assigned by the TDL system and shared by the participating units on the network.

The TDL protocols are designed to ensure that the tactical picture on each participating unit is the best information available on the network. As per these protocols, an estimated on-board track is shared with the participating units via TDL if it is new to the network or if it is correlated with an existing TDL track which has a lower calculated track quality.
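To make this forwarding rule concrete, the minimal sketch below (Python, purely illustrative) expresses the decision of whether a locally held track estimate should be reported on the link: transmit if the track is new to the network or if the correlated network track has a lower calculated quality. The field names, the integer quality scale and the comparison are assumptions for illustration; they are not taken from any specific Link standard.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Track:
    track_number: int   # TDL-assigned track number shared by the participating units
    quality: int        # calculated track quality (higher assumed better; scale is illustrative)
    kinematics: tuple   # (time, x, y, vx, vy) -- simplified placeholder

def should_transmit(local: Track, network_tracks: Dict[int, Track]) -> bool:
    """Report the local estimate if it is new to the network, or if it correlates
    with an existing network track whose calculated quality is lower."""
    remote: Optional[Track] = network_tracks.get(local.track_number)
    if remote is None:
        return True                         # new to the network
    return local.quality > remote.quality   # we hold the better estimate

# Example: the network copy of track 42 has quality 7; our local estimate has quality 9.
network = {42: Track(42, 7, (0.0, 10.0, 20.0, 0.3, 0.1))}
local = Track(42, 9, (1.0, 10.3, 20.1, 0.3, 0.1))
print(should_transmit(local, network))      # True -> transmit the updated estimate
```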

The ship operational staff has access to information additional to the established tactical picture, enhancing their MDA, such as the Common Operating Picture (COP) through GCCS-M, intelligence databases such as the Modernized Integrated Database (MIDB), connectivity to the internet and a number of messaging capabilities for intelligence and other mission-related communications. However, the access and connectivity to such systems vary between ships nationally in each country as well as between coalition members.

4.2.2 Tactical Data Links

The Tactical Digital Information Links (TADILs) or TDLs are a means to disseminate information processed from RADAR, SONAR, IFF, Electronic Warfare, self-reporting and visual observations [38].

TADILs are a way of passing digital information between platforms of a battle group using netted communication techniques and standard message formats. They can provide the following tactical advantages [39]:

• increased situational awareness;

• improved real time weapons coordination among land, surface and airborne units;

• high integrity communications, navigation and identification;

• improved data accuracy, throughput and availability;

• secure jam resistant communications and interoperability with joint and international forces.

The TDL systems used by various navies today can include Link-14, Link-11, Link-16 and Link-22. Link-14 is a slow semi-automatic link and has only a few ship-to-ship and ship-to-shore applications. Link-11, 16 and 22 are used for tactical maritime data exchange. Link-14 and Link-11 are usually found in the older naval systems, while the more advanced naval vessels include Link-16 and Link-22, or may include a link processing system, which receives and transmits information on all link networks, providing the ship C2 with a consolidated tactical picture from all TDLs.

A comparison of Link-11 and Link-22/Link-16 message features is presented in Figure 4.5 [40].

Even though there are significant improvements in Link-16 and Link-22, there are limitations in terms of how many tracks can be maintained on the network, how much information regarding each track is sent as per the communication standard, how often each participating unit sends its observations to the link network and how often it receives updates from the link network.

Furthermore, since the information analysis and processing methods/algorithms on each platform are not necessarily the same, especially between different coalition members, the estimated information values as well as the quality attributes are often inconsistent; it is therefore anticipated that the precision of the tactical information shared over the link network is not uniform even between the most advanced afloat systems of coalition members.


Figure 4.5: Comparison of Link-11 and Link-22/Link-16 Message Features, reproduced from [40]

4.2.3 GCCS

GCCS-M provides a technical solution for the display of information in a COP. GCCS-M is reported to integrate data from over 80 different command and control systems. The GCCS-M supporting systems architecture is the US DII Common Operating Environment (COE). The GCCS-M system is of particular importance to the Canadian and US navies [41].

The objective of the GCCS-M program is to satisfy Fleet Command, Control, Communications, Computers and Intelligence (C4I) requirements through the rapid and efficient development and fielding of C4I capabilities. GCCS-M enhances the operational commander's war-fighting capabilities and aids in the decision-making process by receiving, retrieving, and displaying information relative to the current tactical situation. GCCS-M receives, processes, displays, and manages data on the readiness of neutral, friendly, and hostile forces in near-real-time via external communication channels, Local Area Network (LAN)s and direct interfaces with other systems [42].

The GCCS-M system comprises four main variants, Ashore, Afloat, Tactical/Mobile and Multi-Level Security (MLS), that together provide command and control information to warfighters in all naval environments. GCCS-M provides centrally managed C4I services to the Fleet, allowing both United States and allied maritime forces the ability to operate in network-centric warfare operations. Figure 4.6 [43] depicts the high-level information flow and functions of GCCS-M.

As shown in Figure 4.6, the various GCCS-M variants use different sets of functions. Within the DII COE architecture, a collection of one or more software or data units most conveniently managed as a unit is defined as a segment, and various functions can be subscribed to various subsets of segments.


Figure 4.6: GCCS-M High-Level Information Flow and Functions, reproduced from [43]

In order to allow for maximum interoperability among GCCS systems at all sites and activities (Afloat, Ashore and Tactical/Mobile), GCCS-M utilises common communications media to the maximum extent possible. The SIPRNET, NIPRNET and the Joint Worldwide Intelligence Communication System (JWICS) provide the necessary Wide Area Network (WAN) connectivity using the same protocols as the internet. Figure 4.7 [44] depicts the interfaces of the GCCS-M system.

It is apparent that GCCS-M provides a wealth of information. However, no open documents were found describing a naval vessel with full connectivity to GCCS-M. As per [37], as of 2010 the following issues existed in the US AEGIS combat system interface to GCCS-M:

• Combat system interface to GCCS-M was defined by a point-to-point fixed message interface

– Difficult to change on AEGIS platforms

– Interface never implemented on Ship Self-Defense System (SSDS) platforms

• Very limited set of data was provided

– Filtered subset of combat system tracks and COP tracks are accessible

– Only ability for operators to remotely log into GCCS-M applications is provided

– Never implemented ability to access reference databases

The Over-The-Horizon (OTH)-Gold format is used to structure track information, and typically GCCS-M OTH-Gold messages are on the order of 1-2 kilobytes of data transmitted as ASCII text messages [45].


Figure 4.7: GCCS-M System Interfaces, reproduced from [44]

The GCCS-M is currently installed on the Canadian frigates [41]. Part of the GCCS-M install is the Tactical Management System (TMS). The TMS segment provides the database management function for the tactical data. The installation of GCCS-M Integrated Imagery and Intelligence (I3), which is an assortment of selected functionality from GCCS-M and GCCS-I3, is planned for the future. This combines some functionality from both the maritime and intelligence communities. One functionality being included as part of the GCCS-M I3 install is the MIDB. MIDB is a repository for intelligence data and is used by the American, Canadian and Australian navies. GCCS-M and GCCS-M I3 use the operational specification for OTH-T-GOLD formatted structured messages. Again, the importance of GCCS-M and GCCS-M I3 for the Canadian frigate situational awareness and its ability to interoperate with the allies is apparent. However, no open documentation detailing the level of GCCS-M connectivity to the frigate C2 could be found. From the available open literature it appears that the current access of the Canadian vessels to GCCS-M is read only, making it possible to augment the vessel's situational awareness. It does not appear that the Canadian vessels have access to the GCCS-M decision support capabilities or have the ability to contribute pertinent ship observations into its COP.

4.2.4 Internet and other communication subsystems

The information exchange requirements nationally as well as between coalition members are supported by a messaging system (DMS in the US) comprising a number of communication functions. Figure 4.8 [35] depicts the data flow between the various messaging systems used in the US DoD Navy.


Figure 4.8: Message Traffic Process, reproduced from [35]

The systems described in Figure 4.8 [35] are part of the 2008 state-of-the-art in defence messaging and communication technologies, providing functionality necessary for an at-sea DMS, which includes: Automated Message Store and Forward (NOVA); Fleet Message Exchange (FMX); Message Conversion System (MCS); Directory Update and Service Center (DUSC); Personal Computer Message Terminal (PCMT); Common User Digital Information Exchange Subsystem (CUDIXS); Fleet SIPRNET Messaging (FSM); Fleet Broadcast Keying System (FBKS); Multi-Level Mail Server (MMS); High Assurance Guard (HAG); Automatic Message Handling System (AMHS); etc.

The 2008 document [35] identifies message passing limitations and describes the architecture and functionality specifics needed to overcome them. In 2013 the Command and Control Office Information Exchange (C2OIX) was announced [46], which is expected to support the current connectivity needs; however, not all coalition member at-sea systems are likely to adopt it at the same time.

4.2.5 Satellite Communication

The primary means of ship/shore communications is via satellite. Shore and shipboard satellite subsystems are in a continuing state of evolutionary development. The latest procedures for communicating in this dynamic environment are maintained in Communications Information Advisories (CIAs) and Communications Information Bulletins (CIBs) [35].

The current Military Satellite Communications (MILSATCOM) architecture consists of three types of systems operated by the military: wideband, narrowband, and protected. The 2013 report [47] consolidated information from various sources as follows:


• Wideband systems provide high data rate communication links (up to and beyond 274 Mbps) for data and video. The military currently operates two primary constellations of wideband satellites: the legacy Defense Satellite Communications System (DSCS) operating in X-band and the newest Wideband Global SATCOM (WGS) system operating in both X-band and Ka-band. The military also leases transponders on commercial wideband satellites, such as Intelsat, for additional wideband capacity beyond what DSCS and WGS provide. By some estimates, up to 80 percent of DoD's Satellite Communications (SATCOM) needs have been met using commercial systems.

• Narrowband systems provide voice and low data rate (up to 384 Kbps) communications for mobile users in the UHF band. The primary military system currently used for narrowband communications is the legacy UHF Follow-On (UFO) constellation. The first satellite of the next generation narrowband constellation, the Mobile User Objective System (MUOS), was launched in 2012. An additional four MUOS satellites are planned, including one on-orbit spare. The military also leases commercial narrowband services from companies such as Iridium.

• Protected MILSATCOM systems provide assured, survivable communications that are difficult to detect, intercept, and jam and that can overcome some of the atmospheric effects generated by a nuclear blast. The military currently operates two protected constellations in the EHF band. The legacy Milstar constellation provides data rates up to 1.5 Mbps, and the recently launched Advanced EHF (AEHF) satellites provide data rates up to 8.2 Mbps. These constellations are supplemented by the Interim Polar System (IPS), a two-satellite constellation in polar orbit that provides continuous coverage above 65 degrees north latitude. To lessen their reliance on ground stations, which can be vulnerable to attack, the Milstar and AEHF constellations use inter-satellite links to pass data directly from one satellite to another without going through a ground station.

The MILSATCOM systems provide the bandwidth and security to support the necessary connectivity between at-sea systems; however, not all coalition members have access to satellite connectivity of equivalent capabilities, and the information sharing services should be able to accommodate the various connectivity levels of coalition members.

As per the Inmarsat press release [48], since October 2012 the Canadian Navy has been deploying FleetBroadband and the new Assured Access service on 29 Canadian naval vessels, providing global broadband connectivity while at sea.

4.3 Global Information Grid

Ultimately, the ideal way to ensure interoperability, integration and automation of military forces nationally and in coalition operations is through an agreed-upon established infrastructure, information processing, situation analysis, planning and decision support capabilities, resources, standards, etc. available to all parties involved in and collaborating in military operations.

Starting in 1999, the DoD began to put in place a set of strategies, policies and initiatives intended to create an Enterprise-wide Information Environment known as the Global Information Grid (GIG). The GIG is defined as the globally interconnected, end-to-end set of information capabilities, associated processes, and personnel for collecting, processing, storing, disseminating, and managing information on demand to warfighters, policy makers, and support personnel. The GIG includes all owned and leased communications and computing systems and services, software (including applications), data, security services, and other associated services necessary to achieve Information Superiority [50].

Figure 4.9: GIG Architecture, reproduced from [49]

Figure 4.10: GIG Networking, reproduced from [33]

The GIG is designed to provide information capabilities that enable the access and exchange of information and services within the DoD and extending to mission partners [49]. Figure 4.9 [49] depicts the GIG connectivity architecture and Figure 4.10 [33] shows the naval networking capabilities providing GIG connectivity envisaged for 2016.

The GIG's role is to create an environment in which users can access data on demand at any location without having to rely on (and wait for) organisations in charge of data collection to process and disseminate the information. At the core are communications satellites, next-generation radios, and an installations-based network with significantly expanded bandwidth. These will provide the basic infrastructure through which data will be routed and shared. In addition, the GIG would employ a variety of information technology services and applications to manage the flow of information and ensure the network is reliable and secure [51].

Currently the GIG is a concept that is not fully realised, but in early 2013 the DoD put forward the vision for how to realise the GIG functionality by 2020 [52], discussed further in the next chapter (5).

4.4 Summary of unique characteristics

The sections above summarised the current state-of-the-art of various systems and processes that bring about the unique characteristics of IM that are typical of a naval at-sea environment. These characteristics, using as a baseline the AEGIS frigate combat system shown in Figure 4.4, are highlighted below.

4.4.1 Information Storage

In addition to the track stores, which contain the integrated information (using various manual or automated correlation/fusion/discovery methods) from all local and remote sources tracking above-water, surface and subsurface entities (as well as intelligence) pertinent to the mission (the tactical picture), there are multiple places where information is stored in a naval vessel, including:

• Local sensor subsystems,

• TDL subsystem,

• Weapon Management subsystems,

• Intelligence information,


• Meteorological information,

• Navigation subsystem,

• Logistics subsystem,

• Training subsystem.

Some of these stores are specific to the vessel and are not currently shared with the coalition members, e.g. local sensors, navigation and logistics data. Some stores that are not currently shared could benefit from multi-platform sharing and collaboration, e.g. local sensors and weapon management.

The TDL subsystem conforms to STANAGs to coordinate, send and receive information from remote platforms, augmenting the information in the track stores. If the vessel also has connectivity to GCCS-M, tactical and intelligence information in OTH-Gold format will also contribute to the information in the track stores.

The various information stores are files of fixed (predefined) size and structure, most often limited by the connectivity and computer resource availability on the vessel.

The size of the track stores is limited by the capacity of the on-board sensors, the maximum capacity of the specific TDL subsystem available on the vessel, as well as the connectivity to the GCCS-M of that vessel. As per [44], as of 2010 the naval Battle Group Database Management (BGDBM) maximum size was limited to 999 tracks. However, this number can be much smaller in older vessels, which have radars of limited buffer size. Generally, to reduce computer resource utilisation, such vessels will often maintain tracks only within the maximum range of their radar surveillance.

The structure of the track stores is designed to ensure that they store and maintain at least all information required by STANAG (and OTH-Gold), including target kinematic estimates, identification/classification attributes, time stamp, track quality (using an algorithm that accounts for detection, latency and other parameters), etc. However, depending on the information fusion/integration approaches used on the vessel, the store may have additional information about the tracks.

An individual track slot gets initialised when a sensor observation does not correlate with an existing track or when TDL or GCCS-M provides a new track report. The track is updated in real time with each new piece of information becoming available from own sensors and as soon as the track quality is assessed to be better than the one in the TDL network.
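A rough sketch of this slot behaviour is given below: an observation that does not correlate with an existing track initialises a new slot (subject to the store's fixed capacity), and a correlating observation updates the existing slot in place. The correlation gate, field names and capacity handling are assumptions for illustration only, not drawn from any STANAG or fielded system.

```python
import math
from typing import Optional

class TrackStore:
    """Fixed-capacity, track-number-indexed store (simplified, illustrative sketch)."""

    def __init__(self, capacity: int = 999):   # 999 mirrors the BGDBM limit cited above
        self.capacity = capacity
        self.slots = {}                         # track_number -> track record (dict)

    def _correlate(self, obs: dict) -> Optional[int]:
        """Return the number of the closest existing track within a gate, if any."""
        best, best_dist = None, 5.0             # 5 km correlation gate -- assumed value
        for tn, rec in self.slots.items():
            d = math.hypot(obs["x"] - rec["x"], obs["y"] - rec["y"])
            if d < best_dist:
                best, best_dist = tn, d
        return best

    def process_observation(self, obs: dict, new_track_number) -> int:
        tn = self._correlate(obs)
        if tn is None:
            if len(self.slots) >= self.capacity:
                raise RuntimeError("track store full")   # older vessels hit this sooner
            tn = new_track_number()
            self.slots[tn] = dict(obs)                   # initialise a new slot
        else:
            self.slots[tn].update(obs)                   # real-time update of the existing slot
        return tn

# Example usage
counter = iter(range(1, 1000))
store = TrackStore()
tn = store.process_observation({"x": 12.0, "y": 45.0, "time": 0.0, "quality": 5},
                               lambda: next(counter))
print(tn, store.slots[tn])
```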

4.4.2 Discovery

Information discovery in the current systems is performed by a combination of computer-based decision support services/tools and humans.

The main objective of these tools is to maintain the best tactical picture by integrating own sensor information with the information from all other ships (including coalition members) of the task force in near real time. This means that each ship has advanced real-time capabilities to correlate its sensor information with the tracks in the track store and fuse them, estimating an updated tactical picture, including track quality and uncertainty information, in near real time.

Considering that the vast majority of naval at-sea systems among all coalition members (including many of the US ships) do not have access to the same advanced decision support, information and situation analysis capabilities, each ship has such services/tools of various levels of maturity and sophistication. This means that information shared over the TDL is not likely to have consistent accuracy and track quality. Furthermore, without visibility into the pedigree of information contributed by a participating unit into the link network (not a STANAG requirement), even the most advanced combat systems will not be able to calibrate their information discovery methods accordingly. It is critical to ensure that pedigree information becomes part of the future STANAG requirements.

The naval vessels also include computer-based services to strategise and manage on-board and task force resources to accomplish the mission. These services include, for example, methods to analyse the lethality of entities within the tactical picture and to select the best weapon(s) within the task force to engage them in real time. These too are of various levels of maturity and sophistication between vessels of a specific nation as well as between coalition members, and it was not obvious from the open literature how this is calibrated within a task force.

4.4.3 Distribution

Distribution of the tactical information (as well as command and force management orders) within the naval vessels currently takes place via the TDL systems. The TDL subsystem structures buffers of specific maximum size containing local track kinematic and attribute information that is new or of better quality than what exists on the TDL network. These buffers get transmitted when polled at periodic intervals, depending on the specific Link system that is installed on the vessel. Similarly, it receives buffers of information from remote platforms that are new or of better track quality than those in the track stores at predefined intervals and forwards them to the Track Management subsystem for integration and storing in the track stores. On AEGIS systems there is a link management subsystem that communicates on all TDL systems and consolidates the remote information that is sent out from the vessel or received to be integrated with the information in the track stores.

If the vessel has connectivity with GCCS-M, depending on its connectivity level it will request information or receive broadcast information from it. This information will also be forwarded to the Track Management subsystem for integration and storing in the track stores.

No open literature has been found which describes the process by which a naval vessel would contribute information into the COP of the GCCS-M.

While there are internet and numerous messaging communication subsystems within the ships, it was not clear from the open literature found whether there are any structured standards on how these subsystems contribute to coalition operations.


4.4.4 Access

Local information access for all Combat System functions in current systems takes place in multiple places, each accessed by dedicated services. The track stores are structured files accessed by the Link track number, which is unique across the coalition.

The information access in the TDL network is based on the publish and subscribe principle, differing depending on the specific TDL system installed on the vessel [38]. Link-11 supports one network of participating units, each of which gets polled to broadcast its information within its time slot, while Link-16 supports multiple networks to which a subset of ships can be subscribed. The units automatically transmit, receive and relay data at pre-assigned times on pre-assigned nets based on instructions given to their terminals when they are initialised.

The information access in GCCS-M can be broadcast or publish and subscribe.

4.4.5 At Sea IM Summary

A bullet-form summary of the at-sea IM based on open literature is given below:

• Information is stored in structured files with indexed access

• Information is maintained/updated in near-real-time by a dedicated service in each vessel

• Information volume is limited to the immediate vicinity (max range of sensors) of the vessels

• Information is shared and synchronised between coalition members conforming to STANAG and OTH-Gold standards and protocols, managed by dedicated services supporting the appropriate TDL terminal and level of GCCS-M connectivity available on the vessel

• Information discovery/analysis services are specific to each vessel


Part 5

Information management models in at-sea data exchange environment

The objective of this section is to assess and document the interplay between the cloud computing architectures presented in section 3 and the current IM approaches and information sharing characteristics in the at-sea data exchange environment identified in section 4. It will include an analysis of:

• Suitability of the IM of the above-referenced cloud computing architectures to cloud implementations that could occur in an at-sea environment

• Suitability of the current at-sea IM to particular cloud architectures

• An IM existing within a single cloud architecture that is shared among platforms that could be cloud disadvantaged or have no cloud computing capabilities (CtoDC and CtoNC cases).

Before proceeding with the analysis of the suitability of particular IMs and cloud architectures in the naval at-sea environment, the requirements for the cloud-enabled future at-sea information environment as envisioned by the DoD are discussed below. This will help establish a vision for the path from the current naval at-sea environment to the future cloud-based naval coalition operations, and help evaluate which aspects of the particular IMs discussed in sections 2 and 3 could be appropriate for the future naval at-sea IM.

5.1 Requirements for the Future At-sea Information Environment

The white paper published in January 2013 by U.S. Army General Martin E. Dempsey, the Chairman of the US Joint Chiefs of Staff, identified the requirement for a Joint Information Environment (JIE) to improve mission effectiveness as one of the first concrete changes along the path to constructing Joint Force 2020 [52]. The Deputy Secretary of Defence published implementation guidance for the JIE in May 2013 [53]. The DoD Chief Information Officer (CIO) has set a vision to deliver an Information Enterprise (IE) that will incrementally evolve to realise the JIE vision [54]. The DoD Information Enterprise Architecture (IEA) describes what the DoD IE must be and how its elements should work together to accomplish such a transformation and deliver effective and efficient information and service sharing. To accommodate operational needs, the DoD IEA incorporates the concepts previously embedded in separate net-centric strategies, particularly the GIG. The evolution and the requirements for the future at-sea information environment will be consistent with the at-sea information environment envisioned in the GIG.

The GIG Convergence Master Plan 2012 (GCMP 2012) [55] envisages, in the long term, the GIG concept being implemented in a commercial-government hybrid cloud computing environment with the DoD retaining the identity provider role. The GCMP 2012 describes three stages of cloud implementation of the GIG, with the short-term implementations of the GIG using existing services as available, but also providing two private clouds: an unclassified DoD platform and a classified DoD platform.

As per [2], the DoD cloud computing goal is to "Implement cloud computing as the means to deliver the most innovative, efficient, and secure information and IT services in support of the Department's mission, anywhere, anytime, on any authorised device." This document as well as a number of others (e.g. [56], [55]) envisage many challenges of various types that need to be overcome, including security, technology and funding resource availability, policy, design, upgradeability, etc. However, it also includes a vision for an integrated environment on the GIG, consisting of DoD Components, commercial entities, federal organisations, and mission partners, depicted in Figure 5.1.

Figure 5.1: DoD Enterprise Cloud Environment, reproduced from [2]

The Final Report of the Defense Science Board (DSB) Task Force on Cyber Security and Reliability in a Digital Cloud [56] provides an assessment of the suitability of cloud computing architectures for DoD applications and investigated the benefits and risks (and challenges) of cloud computing for the needs of deployed forces. The benefits include:

• Accessibility of agile computational capabilities to support increasingly multifaceted missions

• Availability of resources and services to handle:

– Varying or unpredictable computing requirements,

– Many high-capacity data feeds from sensor networks and other sources,

– Analysis of very large data sets or those that require the ability to move computational resources.

• Availability of ubiquitous connection to common cloud-based services, such as email, shared calendars, unclassified training, or document preparation

For sensitive, classified, or time-critical applications, the DSB report recommends that the DoD pursue private cloud computing to enhance mission capabilities.

Figure 5.2 [56] depicts a simplified diagram that shows a notional relationship of cloud computing to some traditional functional components of the GIG.

The vision in this figure is consistent with the high-level vision of this project, where there could be afloat networks that are cloud disadvantaged with their own sensors and weapons, while the sensors and weapons of cloud-advantaged vessels are directly interfaced with the common cloud-based stores and command and control services.

The DoD Cloud Computing Strategy [2] envisages separate implementations and data exchanges on NIPRNET, SIPRNET, and Top Secret Sensitive Compartmentalised Information (TS SCI) security domains, using cloud services of commercial vendors where possible, and specifically emphasises the importance of diverging from the National Institute of Standards and Technology (NIST) cloud service model definitions (i.e. IaaS, PaaS and SaaS) to uniquely identify Data as a Service (DaaS) and the resulting DoD Data Cloud as key concepts. As per [2], "Within the DoD, DaaS encompasses two primary activities. The first is the continued implementation of the DoD Data Strategy and deployment of standardised data interfaces that make DoD information visible and accessible to all authorised users. The second is the incorporation of emerging 'big data' technologies and approaches to effectively manage rapidly increasing amounts of information and deliver new insights and actionable information". Figure 5.3 [57] presents the Office of Naval Research (ONR) vision for a naval cloud-based information environment. This figure provides a good visual illustration of a multitude of new dimensions of data deployment on the cloud that will enhance at-sea systems' situational awareness, decision making and collaboration.

Inferring from the DoD studies referenced in this section ([2], [56], [52], [53], [54], [55], etc.) and also the ONR vision, the future cloud-based at-sea information environment would support naval task force operations in numerous ways, including:

• Observations from sensors (information sources) in every cloud-enabled at-sea vessel are directly provided to the DaaS in real-time,


Figure 5.2: Cloud computing hardware and software as components of the GIG, reproduced from [56]

• Strategic, mission, intelligence, etc. information is maintained in DaaS in real-time, available to every vessel according to its access privileges,

• Information discovery services establish the tactical situation pertinent to each task force in real-time, prioritising the information contributing to the estimated situational picture based on pre-defined criteria, including the pedigree of information sources,

• Access to the tactical situation estimates is available to every vessel on the cloud according to its access privileges,

• Sophisticated information discovery services perform situation and threat analysis and designate the appropriate resources on the cloud (part of a specific vessel combat system) to act in real-time,

• Storage and maintenance of extensive strategic, a priori knowledge and historical data, as well as appropriate discovery services, are available for enhanced, elaborate situational assessment and predictive analysis by the vessels, according to their access privileges,

• Unique, high-quality tactical situational awareness information, not limited to any geographic area, is available to the vessels on the cloud far beyond their sensor detection ranges (access to wide-area surveillance),


Figure 5.3: ONR Vision of Naval Tactical Cloud Processing, reproduced from [57]

• Sophisticated services for data analytics, mining, predictive situational evaluation and mission planning are available for deployment to meet just-in-time discovery needs (e.g. in real-time for the information pertinent to the task force area of operations and, as requested, beyond the area of interest of the task force),

• While the information discovery services in the cloud of the near future provide at least capabilities equivalent to those of GCCS-M afloat and advanced combat systems, with no limitations on connectivity and computer resource availability, much more powerful and sophisticated methods are added continuously, maintaining a fully consistent situational awareness in every vessel at all times, including services for consistent strategy and force control development,

• The possibility of transparency and knowledge and information sharing between at-sea task force operations and land-based multi-agency decision centres.

5.2 Suitability to Particular Cloud Architectures

This section puts the particularities of the IM models and architectures presented in section 3 in the context of the at-sea data exchange requirements. The topics of storage, information representation, data discovery and distribution are discussed. Regarding data storage, the following patterns can be extracted from the Facebook and eBay IM models and architectures:

• Hot / warm / cold storage: this concept is very interesting for naval data storage. It is increasingly used across different web services to speed up access to information that is often queried, versus information which is archived and less likely to be queried;

• Log-based storage (Haystack): this type of storage would not be very useful for the naval environment. It is likely that the data and information products will be updated frequently, and as such it is not compatible with Haystack, which has been tailor-made for the written-once, read-often pattern. One of the few data types that would fit this paradigm is raw sensor data;

• Partitioning of the database along usage patterns of the different topics: database partitions could translate easily to the at-sea data exchange case where, for instance, maps are likely to be requested less often than tracks and track updates. As such, partitioning the database and allocating more resources to the most requested information would make sense; a small tiering sketch follows this list.
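As a purely illustrative sketch of how such tiering and partitioning could be applied to at-sea data, the routine below assigns a data item to a hot, warm or cold partition by type and age. The categories, thresholds and tier semantics are assumptions, not a recommended policy.

```python
from datetime import datetime, timedelta, timezone

def storage_tier(item_type: str, last_accessed: datetime) -> str:
    """Pick a storage tier for a maritime data item (illustrative policy only).

    Frequently queried, time-sensitive data (tracks, track updates) stay hot;
    reference data such as maps sit in warm storage; raw sensor logs and
    anything untouched for a month fall through to cold, archival storage."""
    age = datetime.now(timezone.utc) - last_accessed
    if item_type in ("track", "track_update") and age < timedelta(hours=1):
        return "hot"    # in-memory / fast partition given the most resources
    if item_type in ("map", "chart", "intel_report") and age < timedelta(days=30):
        return "warm"   # conventional database partition
    return "cold"       # archival, rarely queried storage (e.g. raw sensor logs)

now = datetime.now(timezone.utc)
print(storage_tier("track_update", now))                      # hot
print(storage_tier("map", now - timedelta(days=2)))           # warm
print(storage_tier("raw_sensor", now - timedelta(days=90)))   # cold
```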

One can also notice that multiple different Database Management System (DBMS)s are used in the different sub-systems of both Facebook and eBay (Cassandra, HBase, MySQL, etc.). The DBMS selection for a particular back-end is tightly coupled with its requirements. As different DBMSs exhibit different behaviours and are specialised for certain types of storage and operation, their selection must be made carefully when designing all the different systems to be deployed on the cloud.

Information representation and discovery are handled in very different manners by Facebook and eBay. Most of Facebook's information representation and discovery is made through the use of the Social Graph, a graph-like representation of the information where everything is a node. Information is discovered starting from a particular node, and specific queries are built by the system and suggested to the user to be fast at execution. On the other hand, eBay's information discovery is based on the use of a huge level of metadata and searching within a highly structured environment. All the items are divided into categories and several sub-categories, but they also contain textual information, such as title and description. Putting them in the context of at-sea data exchange, we can state the following:

• Graph-based representation: graph approaches can be useful for some parts of the information/data/geographical parameter/linguistics relationship analysis in various situation analysis services. Graph approaches have also been proposed for resource allocation ([58], [59]) like sensor management and target/weapon pairing. As an example, a directed acyclic graph (DAG) is already used within STANAG 4559 (implemented in the coalition shared data servers) to represent objects and exchanged information. The main advantage of this representation is that it easily allows for new kinds of data as nodes and edges, and can be used to effectively store any kind of data; a minimal graph sketch follows this list.

• Huge level of metadata and searching within a highly structured environment: this data/information structure is similar to what we can expect from military data. The first level of discovery is of the request/response type, and search results are given a specific treatment to return a varied and personalised list of items to give the buyer a large number of options. Buyers can ask for notifications on particular items (publish/subscribe). This approach is likely to be well suited for maritime surveillance as the domain is filled with a wide variety of information. However, such a model is less likely to accommodate new kinds of data as it is more rigid than the graph-based approach.
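The sketch below illustrates the graph-style representation discussed in the first bullet above: information objects become nodes, relationships become labelled edges, and discovery proceeds by walking the edges from a starting node. The node kinds, edge labels and attributes are invented for illustration and are not taken from STANAG 4559 or the Social Graph.

```python
from collections import defaultdict

class InfoGraph:
    """Minimal directed graph of maritime information objects (illustrative only)."""

    def __init__(self):
        self.nodes = {}                  # node_id -> attribute dict
        self.edges = defaultdict(list)   # node_id -> list of (label, target_id)

    def add_node(self, node_id: str, **attrs):
        self.nodes[node_id] = attrs      # any new kind of data is just another node

    def add_edge(self, src: str, label: str, dst: str):
        self.edges[src].append((label, dst))

    def neighbours(self, node_id: str, label: str = None):
        """Discovery starts from a node and walks its labelled outgoing edges."""
        return [dst for lbl, dst in self.edges[node_id] if label is None or lbl == label]

g = InfoGraph()
g.add_node("track:1042", kind="surface_track", quality=8)
g.add_node("img:77", kind="radar_image")
g.add_node("report:5", kind="intel_report", classification="SECRET")
g.add_edge("track:1042", "observed_in", "img:77")
g.add_edge("report:5", "refers_to", "track:1042")

print(g.neighbours("track:1042"))             # ['img:77']
print(g.neighbours("report:5", "refers_to"))  # ['track:1042']
```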

Both Facebook and eBay had to develop their own search engine. They both used an approach that was optimal for their underlying data representation, as serving billions of requests a day can quickly become a bottleneck if it is not addressed properly and the result of a query had to deliver the expected user experience. It is important to note that both services make use of large Memcached server farms to speed up queries. Memcached is also a trend in cloud applications to speed up access to information, accelerate information discovery and sharing, and remove some pressure from the underlying database storage.
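The caching pattern alluded to here is essentially cache-aside: check the cache first, fall back to the authoritative database on a miss, then repopulate the cache for subsequent readers. The sketch below stands in a plain dictionary for a Memcached cluster; the key scheme, time-to-live and query function are assumptions for illustration.

```python
import time

cache = {}          # stand-in for a Memcached cluster: key -> (value, expiry_time)
TTL_SECONDS = 30    # illustrative time-to-live

def query_database(track_number: int) -> dict:
    """Placeholder for the (comparatively slow) authoritative database lookup."""
    return {"track_number": track_number, "position": (44.6, -63.6), "quality": 7}

def get_track(track_number: int) -> dict:
    """Cache-aside read: serve from the cache when fresh, otherwise hit the database."""
    key = f"track:{track_number}"
    hit = cache.get(key)
    if hit is not None and hit[1] > time.time():
        return hit[0]                                   # cache hit: no database pressure
    value = query_database(track_number)                # cache miss: slow path
    cache[key] = (value, time.time() + TTL_SECONDS)     # repopulate for later readers
    return value

print(get_track(1042))   # miss: goes to the database, then caches the result
print(get_track(1042))   # hit: served from the cache
```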

For data distribution, Facebook and eBay both use a publish and subscribe mechanism to notify the different subsystems of their services of an update event. However, the pull mechanism is used within Facebook when a user requests information about one of their friends. The pros and cons of both models can be summarised as follows:

• Publish and subscribe mechanism: it offers very low latency and is less expensive as it has no persistent queues between distributed stages. However, its drawback is in dealing with slow consumers, which can be the case in the context of at-sea data exchange. There is a potential for losing messages if the consumer cannot keep up. This could be very problematic in the cloud-to-no-cloud case. Publish and subscribe is what is currently in use and is likely to stay in the future. It is the most effective way to have less impact on the network and real-time updates on the data;

• Pull model: the pull model has higher latency than the publish/subscribe mechanism and is more expensive as it involves persistent queues. The pull mechanism is unlikely to work in the surveillance domain for time-sensitive data as it is executed at some interval. However, this model is very suitable for slow consumers and very reliable. A toy sketch contrasting the two behaviours follows this list.
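The trade-off between the two bullets above can be seen in the toy broker below, where each subscriber has a small, non-persistent bounded queue: a fast consumer sees every update, while a slow (e.g. cloud-to-no-cloud) consumer that rarely drains its queue silently loses the oldest messages. Queue size, naming and the drop policy are illustrative assumptions only.

```python
from collections import deque

class Broker:
    """Toy publish/subscribe broker with bounded, non-persistent per-subscriber queues."""

    def __init__(self, max_queue: int = 3):
        self.queues = {}                 # subscriber name -> deque of pending messages
        self.max_queue = max_queue       # small bound makes the slow-consumer problem visible

    def subscribe(self, name: str):
        self.queues[name] = deque()

    def publish(self, message: str):
        for q in self.queues.values():
            if len(q) >= self.max_queue:
                q.popleft()              # slow consumer: the oldest update is lost
            q.append(message)

    def drain(self, name: str):
        """The consumer empties its queue at its own pace (pull-style consumption)."""
        q = self.queues[name]
        messages = list(q)
        q.clear()
        return messages

broker = Broker()
broker.subscribe("cloud_node")           # fast, well-connected consumer
broker.subscribe("ctonc_ship")           # disadvantaged consumer that rarely drains
for i in range(5):
    broker.publish(f"track_update_{i}")
print(broker.drain("ctonc_ship"))        # only the last 3 updates survive -> data loss
```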

Regarding data access and security, nothing is sensitive within eBay, except for personal information, which is not shared unless an actual transaction occurs between the seller and buyer. Transactions are taken care of by a third-party service (e.g. PayPal) so that no credit card information is shared between the buyer and seller. However, all the information that eBay keeps about its users is very sensitive. As such, a complete open-source solution was developed to secure in real time their Hadoop cluster and all its related data. Still, the different classification aspects that the military has to deal with when distributing and sharing information are very different from the ones that eBay is facing.

For Facebook's users, the privacy checking of what is published is of tremendous importance. Privacy levels can easily be related to the classification levels of surveillance data. Several checks are made for every information-sharing action within Facebook, although few details are available about how these privacy checks are resolved from an implementation point of view.


5.3 CtoDC and CtoNC Interfaces

The discussions in section 5.1 presented huge advantages (in terms of coordination and situational awareness) of a cloud-enabled naval task force compared to a task force sharing the results of its task force management and estimated tactical situational picture via TDL or other means. Cloud environments can provide almost infinite computer resource availability and an IM employing very powerful services and ensuring consistent situational awareness for force-level coordinated operations, in comparison to the current state-of-the-art for at-sea collaboration described in section 4.4.

However, there are risks in such centralised systems compared to distributed networks. In the centralised system, should there be a disruption of any sort (communication, processing, etc.), some or all units on the network will become helpless. In the distributed networks, each node has its own estimates for the tactical situation as well as its own evaluation of threats and actions to support the mission. Even if this is of lesser quality, because due to the bandwidth and computer resource constraints some information is not shared, the ships in the task force can function independently should there be any disruption in communication, and any corruption in one ship will not impact the others.

Realistically, it is reasonable to assume that in the near future a naval task force will consist of cloud-enabled, not fully cloud-enabled (or Cloud to Disadvantaged Cloud (CtoDC)) and cloud-disabled (or Cloud to No Cloud (CtoNC)) vessels, similar to most current naval coalition task forces, where different ships have different levels of connectivity to GCCS-M and TDL. Diagrammatically, this situation is well presented in Figure 5.2. In this figure the sensors and weapons directly on the cloud are those on the cloud-enabled vessels, and the local area network with its own sensors and weapons could represent one or a network of a few CtoDC and CtoNC vessels.

To ensure the best combined information quality on the network, it is preferable that all sensor information be provided to the cloud discovery services unprocessed. This is naturally easy for cloud-enabled systems; the CtoDC and CtoNC ships, however, could form a (possibly line-of-sight) sub-network with a cloud-enabled (at least CtoDC) vessel, which would act as a relay in both directions: it would relay the sensor data of the sub-network up to the cloud, and relay the tactical picture and force commands from the cloud back to the sub-network. Alternatively, if no sub-network is established, the CtoDC and CtoNC ships should be able to communicate with the cloud and/or the cloud-enabled vessels using TDL. For all such communication and collaboration to be possible, the cloud must provide TDL capabilities.
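A minimal Python sketch of this relay role is given below. The class and its uplink/sub-network transports are hypothetical placeholders (a real implementation would sit on TDL and the cloud discovery services); only the two forwarding directions described above are illustrated.

```python
# Illustrative sketch of a CtoDC relay vessel bridging a line-of-sight
# sub-network and the cloud. The transport callables are placeholders for TDL
# and cloud interfaces; names are invented for this example.
class RelayVessel:
    def __init__(self, cloud_uplink, subnet_link):
        self.cloud_uplink = cloud_uplink   # sends raw sensor data up to the cloud
        self.subnet_link = subnet_link     # pushes picture/commands to the sub-network

    def forward_sensor_data(self, raw_reports):
        """Relay unprocessed sensor reports from CtoDC/CtoNC ships up to the cloud."""
        for report in raw_reports:
            self.cloud_uplink(report)

    def distribute_from_cloud(self, tactical_picture, force_commands):
        """Relay the fused picture and force commands back down to the sub-network."""
        self.subnet_link({"picture": tactical_picture, "commands": force_commands})


if __name__ == "__main__":
    relay = RelayVessel(cloud_uplink=lambda r: print("to cloud:", r),
                        subnet_link=lambda m: print("to sub-network:", m))
    relay.forward_sensor_data([{"sensor": "radar-1", "contact": "C042"}])
    relay.distribute_from_cloud({"tracks": 12}, ["maintain station"])
```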

A network of CtoDC and CtoNC ships could also replicate and share services to combine their computing resources and, if the specific TDL protocol permits, should send raw sensor data to the cloud discovery services so that maximum knowledge can be extracted.

Furthermore, for the cloud-enabled vessels to act as relays for the CtoDC and CtoNC vessels, they too must maintain TDL capabilities. And so that the cloud-enabled ships are not incapacitated by any disruption of connectivity, they should also have back-up C2 capabilities providing enough functionality to accomplish the mission until connectivity to the cloud is re-established.

The cloud IM should allow a cloud-enabled ship to become disconnected and function autonomously, continuing to be part of the coalition task force via TDL, and to resume working on the cloud when possible. The cloud discovery services should be able to ensure synchronisation of information when connectivity resumes, in a manner similar to that described in [60].
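As a rough illustration of such resynchronisation, the Python sketch below merges a ship's locally buffered track updates with the cloud's state on reconnect using a simple last-writer-wins rule per track. This rule is only an example; [60] discusses more complete approaches for disconnected, intermittent and low-bandwidth environments.

```python
# Illustrative resynchronisation of a ship's local track store with the cloud
# after an outage. Last-writer-wins on a per-track timestamp is an example
# rule only; see [60] for fuller treatments of DIL environments.
def merge_tracks(local_tracks, cloud_tracks):
    """Merge two {track_id: record} stores, keeping the newest record of each track."""
    merged = dict(cloud_tracks)
    for track_id, record in local_tracks.items():
        current = merged.get(track_id)
        if current is None or record["timestamp"] > current["timestamp"]:
            merged[track_id] = record
    return merged


def resynchronise(local_store, cloud_store):
    """On reconnect, reconcile the two stores and adopt the result on both sides."""
    merged = merge_tracks(local_store, cloud_store)
    local_store.clear()
    local_store.update(merged)      # ship adopts the reconciled picture...
    cloud_store.clear()
    cloud_store.update(merged)      # ...and its buffered updates reach the cloud
    return merged


if __name__ == "__main__":
    local = {"C042": {"timestamp": 105, "position": (44.6, -63.6)}}
    cloud = {"C042": {"timestamp": 100, "position": (44.5, -63.7)},
             "C043": {"timestamp": 101, "position": (44.9, -63.2)}}
    print(resynchronise(local, cloud))
```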

5.4 Cloud Computing (CC) Architecture Essentials

This section summarises the specifics of a CC architecture able to support the naval at-sea coalition task force, based on the discussions above. Most DoD documents analysing the future transition of the military to cloud architectures envisage two cloud platforms, classified and unclassified. They envisage using the cloud services of commercial vendors where possible, but predict a commercial-government hybrid cloud computing environment. A hybrid environment implies that some parts of the cloud architecture are proprietary, and those parts will have to be designed to augment commercial offerings and provide the control and security features that commercial cloud vendors are not designed to provide.

A number of DoD studies ([2], [56], [61], [62], etc.) identify specific challenges such as authentication, control, policy and standards, as listed in section 5.1, which apply to both unclassified and classified military cloud processing. They also identify specific cloud computing security challenges, detailed further in [61]:

• Processing information at multiple classification levels and under multiple authorities (e.g. DoD, DHS)

– Sanitisation/purging of local storage

– Data labelling (security labels)

– Privilege-based access control to data stored in the cloud

– Tailoring the common operating picture presented to a user based on their privileges (illustrated in the sketch following this list)

• Certification and Accreditation

– Approves system Hardware/Software configuration

– Extremely difficult in dynamically provisioned environment

– Must trust system to enforce a security policy and accredit the policy
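The following Python sketch illustrates, with invented labels and rules, how data labelling, privilege-based access control and tailoring of the common operating picture could fit together: each track carries a classification level and a releasability set, and a track is presented to a user only if both checks pass. It is a conceptual illustration only, not a representation of any accredited labelling scheme.

```python
# Hypothetical data labelling and privilege-based filtering of a common
# operating picture (COP). Level names, caveats and nations are invented.
LEVELS = {"UNCLASS": 0, "CONF": 1, "SECRET": 2}


def releasable(track, user):
    """A track is releasable if the user's clearance dominates its level and
    the user's nation appears in the track's releasability set."""
    label = track["label"]
    return (LEVELS[user["clearance"]] >= LEVELS[label["level"]]
            and user["nation"] in label["rel_to"])


def tailor_cop(tracks, user):
    """Return only the tracks this user is privileged to see."""
    return [t for t in tracks if releasable(t, user)]


if __name__ == "__main__":
    cop = [
        {"id": "C042", "label": {"level": "SECRET", "rel_to": {"CAN", "USA"}}},
        {"id": "C043", "label": {"level": "CONF", "rel_to": {"USA"}}},
    ]
    user = {"clearance": "SECRET", "nation": "CAN"}
    print(tailor_cop(cop, user))   # only C042 is releasable to this user
```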

The other very important aspect of the CC architecture is the appropriate design of the DaaS management, control and strategy, including:

• Information storage strategy for fast read/write of tracks and real-time updates (e.g. Memcached; see the cache sketch following this list),

• Information attributes such as quality, pedigree, relevance, latency, pertinence, priority, security, access control, etc., together with data-driven services that act on these attributes to ensure data integrity and availability to all coalition members according to their access privileges,

• Information synchronisation when a coalition member joins the task group or when a ship finds itself without connectivity to the cloud,

• Intelligence information sharing,

• Duplication of services for information sharing between coalition members that are CtoNC and CtoDC (e.g. the cloud must provide TDL services to CtoNC and CtoDC vessels), together with appropriate information discovery methods that account for the different nature of the information attributes (e.g. lower quality, higher latency) coming from such coalition members.
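A minimal sketch of such a storage strategy is shown below: a write-through cache keeps the latest track state in a fast tier while persisting it to a backing store. Plain Python dictionaries stand in for the cache and the database; an operational system would use a distributed cache such as Memcached and a persistent store.

```python
# Minimal write-through cache for hot track state, in the spirit of the
# Memcached usage described for Facebook and eBay. Dicts stand in for the
# cache tier and the backing database; this is an illustration only.
import time


class TrackStore:
    def __init__(self):
        self.cache = {}   # fast tier: latest track state keyed by track id
        self.db = {}      # slow tier: authoritative storage

    def update_track(self, track_id, position):
        record = {"position": position, "updated": time.time()}
        self.cache[track_id] = record   # write-through: update the cache...
        self.db[track_id] = record      # ...and persist to the backing store

    def read_track(self, track_id):
        record = self.cache.get(track_id)
        if record is None:              # cache miss: fall back to the database
            record = self.db.get(track_id)
            if record is not None:
                self.cache[track_id] = record
        return record


if __name__ == "__main__":
    store = TrackStore()
    store.update_track("C042", (44.6, -63.6))
    print(store.read_track("C042"))
```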


Part 6

Conclusion

This document presented an overview of the cloud computing architectures of some of the major service providers in social networking and e-commerce, namely Facebook and eBay. Both companies had to develop most of their underlying solutions from scratch, as no available tools could meet the particular scalability, flexibility and security requirements they had to cope with.

The two adopted very dissimilar information management models: Facebook uses a graph-based model while eBay follows a more traditional approach. Each developed a unique search engine, tailored to the characteristics and storage structure of its underlying data and information, in order to give users the desired experience. Despite these major differences, both rely on large Memcached deployments to speed up query responses and to avoid making too many requests to the underlying database, which has been observed to become a bottleneck when the user base grows rapidly. Both Facebook and eBay also use a publish/subscribe mechanism to notify subsystems of particular changes in real time.

When one looks at current at-sea data exchange characteristics and future requirements, it is easy to see that some of the approaches used by Facebook and eBay merit investigation in greater detail, as they appear suitable in the naval context. Though not easy to grasp at first, some data and information can benefit from graph-based storage; Facebook has shown, however, that a dedicated search engine must be developed on top of such data in order to present the user with meaningful information. eBay uses a more traditional data representation, well structured and well defined, that can easily be related to the information models currently encountered in systems such as GCCS-M and exchanged using the standards defined within NATO.

Still, even with a cloud-enabled navy, support for current data exchange capabilities such as Link 11 and Link 16 will remain necessary to accommodate older fleet vessels and other coalition or joint force members, since the communication layer required for a fully cloud-enabled navy may not be achievable in all situations.


Bibliography

[1] Rubin Dhillon. Cloud computing in a denied environment, July 2013. URL http://www.geautomation.com/blog/cloud-computing-denied-environment.

[2] DoD. Cloud computing strategy, 2012. URL http://dodcio.defense.gov/Portals/0/Documents/Cloud/DoD%20Cloud%20Computing%20Strategy%20Final%20with%20Memo%20-%20July%205%202012.pdf.

[3] Jinesh Varia. Cloud architectures. White Paper of Amazon, jineshvaria.s3.amazonaws.com/public/cloudarchitectures-varia.pdf, page 16, 2008. URL http://ciowhitepapers.com/reader/papers/owp.whitepaper.b3e7d6c0e71abbac.636c6f7564617263686974656374757265732d76617269612e706466.pdf.

[4] NIST. NIST cloud computing standards roadmap, July 2013. URL http://www.nist.gov/itl/cloud/upload/NIST_SP-500-291_Version-2_2013_June18_FINAL.pdf.

[5] Grace Lewis et al. Role of standards in cloud-computing interoperability. In System Sciences (HICSS), 2013 46th Hawaii International Conference on, pages 1652–1661. IEEE, 2013. URL http://repository.cmu.edu/cgi/viewcontent.cgi?article=1679&context=sei.

[6] Lee Badger, David Bernstein, Robert Bohn, Frederic de Vaulx, Mike Hogan, Michaela Iorga, Jian Mao, John Messina, Kevin Mills, Eric Simmon, Annie Sokol, Jin Tong, Fred Whiteside, and Dawn Leaf. NIST US government cloud computing technology roadmap, November 2011. URL http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.500-293.pdf.

[7] Edward Krebs. An enterprise social network reference architecture, 2013. URL http://www.w3.org/2013/socialweb/presentations/krebs1.pdf.

[8] Sanjay P. Ahuja and Bryan Moore. A survey of cloud computing and social networks. Network and Communication Technologies, 2(2):p11, 2013. URL http://www.ccsenet.org/journal/index.php/nct/article/download/23882/18442.

[9] Jeff Luo, Jon Kivinen, Joshua Malo, and Richard Khoury. Architecture of a cloud-based social networking news site. Journal of Emerging Technologies in Web Intelligence, 4(3):227–233, 2012. URL http://ojs.academypublisher.com/index.php/jetwi/article/download/jetwi0403227233/5230.

[10] Kelly Goetsch. eCommerce in the Cloud: Bringing Elasticity to eCommerce. O'Reilly Media, Inc., 2014. URL https://www.geekbooks.me/books/c9/fb/f37bc23d500030d7d63ab158cced/ecommerce_in_the_cloud.pdf.


[11] Georg Lackermair. Hybrid cloud architectures for the online commerce. Procedia Computer Science, 3:550–555, 2011. URL http://ac.els-cdn.com/S1877050910004667/1-s2.0-S1877050910004667-main.pdf?_tid=d3b4424c-6209-11e5-8a2a-00000aacb362&acdnat=1443023185_95f18bd7f38158e2269f4c891ddd203c.

[12] Subramanian Muralidhar, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest Lin, Weiwen Liu, Satadru Pan, Shiva Shankar, Viswanath Sivakumar, Linpeng Tang, et al. F4: Facebook's warm BLOB storage system. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI, 2014. URL www-bcf.usc.edu/~wyattllo/papers/f4-osdi14.pdf.

[13] Simon Bisson. How Facebook does storage, January 2015. URL http://thenewstack.io/facebook-storage/.

[14] Murat Demirbas. Facebook's software architecture, October 2014. URL http://muratbuffalo.blogspot.ca/2014/10/facebooks-software-architecture.html.

[15] Yogeshwer Sharma, Philippe Ajoux, Petchean Ang, David Callies, Abhishek Choudhary, Laurent Demailly, Thomas Fersch, Liat Atsmon Guz, Andrzej Kotulski, Sachin Kulkarni, et al. Wormhole: Reliable pub-sub to support geo-replicated internet services. In NSDI, May 2015. URL https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-sharma.pdf.

[16] Murat Demirbas. Finding a needle in Haystack: Facebook's photo storage, December 2010. URL http://muratbuffalo.blogspot.ca/2010/12/finding-needle-in-haystack-facebooks.html.

[17] Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, Peter Vajgel, et al. Finding a needle in Haystack: Facebook's photo storage. In OSDI, volume 10, pages 1–8, 2010. URL https://www.usenix.org/event/osdi10/tech/full_papers/Beaver.pdf.

[18] Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry C. Li, et al. TAO: Facebook's distributed data store for the social graph. In USENIX Annual Technical Conference, pages 49–60, 2013. URL http://dl.frz.ir/FREE/papers-we-love/datastores/tao-facebook-distributed-datastore.pdf.

[19] Michael Curtiss, Iain Becker, Tudor Bosman, Sergey Doroshenko, Lucian Grijincu, Tom Jackson, Sandhya Kunnatur, Soren Lassen, Philip Pronin, Sriram Sankar, et al. Unicorn: A system for searching the social graph. Proceedings of the VLDB Endowment, 6(11):1150–1161, 2013. URL http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p871-curtiss.pdf.

[20] Sriram Sankar, Soren Lassen, and Mike Curtiss. Under the hood: Building out the infrastructure for Graph Search, March 2013. URL https://code.facebook.com/posts/153483574851223/under-the-hood-building-out-the-infrastructure-for-graph-search/.


[21] Xiao Li and Maxime Boucher. Under the hood: The natural language interface of Graph Search, April 2013. URL https://code.facebook.com/posts/316353631844205/under-the-hood-the-natural-language-interface-of-graph-search/.

[22] Sriram Sankar. Under the hood: Indexing and ranking in Graph Search, March 2013. URL https://code.facebook.com/posts/153625638171563/under-the-hood-indexing-and-ranking-in-graph-search/.

[23] Akhil Wable. Intro to Facebook Search, March 2010. URL https://code.facebook.com/posts/422334894553532/intro-to-facebook-search/.

[24] Yevgeniy Sverdlik. Why PayPal replaced VMware with OpenStack, March 2015. URL http://www.datacenterknowledge.com/archives/2015/03/31/private-openstack-cloud-replaces-vmware-at-paypal/.

[25] Jay Patel. Cassandra at eBay, June 2013. URL http://www.slideshare.net/planetcassandra/c-summit-2013-buy-it-now-cassandra-at-ebay-by-jay-patel.

[26] Yuri Finkelstein. Storing eBay's media metadata on MongoDB, May 2013. URL https://www.mongodb.com/presentations/storing-ebays-media-metadata-mongodb-0.

[27] Randy Shoup and Dan Pritchett. The eBay architecture. In SD Forum, 2006. URL http://www.addsimplicity.com/downloads/eBaySDForum2006-11-29.pdf.

[28] eBay. API documentation, 2015. URL https://go.developer.ebay.com/api-documentation.

[29] Hugh Williams. Ranking at eBay (part 1), April 2012. URL http://hughewilliams.com/2012/04/19/ranking-at-ebay-part-1/.

[30] Hugh Williams. The Cassini search engine, July 2013. URL http://hughewilliams.com/2013/07/08/the-cassini-search-engine/.

[31] Hugh Williams. Ranking at eBay (part 2), April 2012. URL http://hughewilliams.com/2012/04/28/ranking-at-ebay-part-2/.

[32] Mike Mathieson. Using behavioral data to improve search, April 2011. URL http://www.ebaytechblog.com/2011/04/13/using-behavioral-data-to-improve-search/.

[33] US Navy. Naval networking environment (NNE) 2016. Technical report, Department of the Navy, May 2008. URL http://www.documbase.com/goto/2847074-4c0415078ae3879bc1df196c4309e529/Department-of-the-Navy-Naval-Networking-Environment-%28NNE%29~2016.pdf.

[34] Wikipedia. CENTRIXS, September 2014. URL https://en.wikipedia.org/wiki/CENTRIXS.

[35] Naval Network Warfare Command. Naval telecommunications procedures (NTP 4), 2008. URL http://navybmr.com/study%20material/NTP%204%20(E).pdf.

[36] US Navy. Naval doctrine publication 1: Naval warfare, March 2010. URL https://www.usnwc.edu/Academics/Maritime--Staff-Operators-Course/documents/NDP-1-Naval-Warfare-%28Mar-2010%29_Chapters2-3.aspx.


[37] Kathy Emery Ref Delgado and Delores Washburn. PEO IWS & PEO C4I: Vision brief, 2010. URL http://www.onr.navy.mil/~/media/Files/Funding-Announcements/BAA/2010/10-018-3-IWS-C4I-Industry.ashx.

[38] Northrop Grumman. Understanding voice and data link networking, December 2013. URL http://www.northropgrumman.com/Capabilities/DataLinkProcessingAndManagement/Documents/Understanding_Voice+Data_Link_Networking.pdf.

[39] Arthur Filippidis, Steve Blandford, Kate Foster, and Gary Moran. Simulation activities using gateway and tactical digital information links. Technical report, DTIC Document, 2006. URL www.dtic.mil/get-tr-doc/pdf?AD=ADA460117.

[40] John Asenstorfer, Thomas Cox, and Darren Wilksch. Tactical data link systems and the Australian Defence Force (ADF) – technology developments and interoperability issues, February 2004. URL http://dspace.dsto.defence.gov.au/dspace/bitstream/1947/4031/1/DSTO-TR-1470%20PR.pdf.

[41] Anthony W. Isenor and Eric Dorion. The use of GCCS in the Canadian Navy and its relationship to C2IEDM. Technical report, DRDC-Atlantic, February 2005. URL http://www.dtic.mil/cgi-bin/GetTRDoc?Location=U2&doc=GetTRDoc.pdf&AD=ADA436415.

[42] GlobalSecurity. Global command and control system - maritime (GCCS-M) AN/USQ-119E(V), July 2011. URL http://www.globalsecurity.org/military/systems/ship/systems/gccs-m.htm.

[43] DARPA. Considerations in developing survivable architectures for global information grid (GIG) systems. Technical report, Defense Advanced Research Projects Agency, August 2001. URL http://www.ai.mit.edu/projects/tuesday-group-as-of-02mar03/SGIGReport-Final.pdf.

[44] Topic 11: Operational and tactical command and control systems. URL http://www.pwlk.net/IWBC/CORE/Topic%2011%20OP-TAC%20C2%20INST%20rev%200210%2010%20Feb%2010.ppt.

[45] Alfred Mitchell and Charles Gooding. Global command and control system-maritime (GCCS-M) segments and SkyCAP assured IP software and their application in joint/combined expeditionary operations. Technical report, ICCRTS, 2005. URL http://www.dodccrp.org/events/10th_ICCRTS/CD/papers/032.pdf.

[46] US Navy. C2OIX, the Navy's newest messaging system - cost efficient and simpler, November 2013. URL http://www.navy.mil/submit/display.asp?story_id=74740.

[47] Todd Harrison. The future of MILSATCOM, 2013. URL http://www.csbaonline.org/wp-content/uploads/2013/07/Future-of-MILSATCOM-web.pdf.

[48] Inmarsat. Canadian Navy deploys 'FleetBroadband with Assured Access', 2015. URL http://www.inmarsat.com/press-release/canadian-navy-deploys-fleetbroadband-with-assured-access/.

[49] Altera. Fulfilling technology needs for 40G–100G network-centric operations and warfare. Technical report, Altera, September 2010. URL https://www.altera.com/en_US/pdfs/literature/wp/wp-01138-stxv-intelligence.pdf.


[50] Robert Damashek and Derek Anderson. Flowing focused and relevant information to the edge through semantic channels. In 10th ICCRTS, 2005. URL http://www.dodccrp.org/events/10th_ICCRTS/CD/papers/315.pdf.

[51] GAO. The global information grid and challenges facing its implementation. Technical report, United States Government Accountability Office, July 2004. URL www.gao.gov/new.items/d04858.pdf.

[52] Martin E. Dempsey. Joint information environment. Technical report, Joint Chiefs of Staff, January 2013. URL http://www.dtic.mil/doctrine/concepts/white_papers/cjcs_wp_infoenviroment.pdf.

[53] DoD. Guidance for implementing the joint information environment, September 2013. URL http://dodcio.defense.gov/Portals/0/Documents/JIE/20130926_Joint%20Information%20Environment%20Implementation%20Guidance_DoD%20CIO_Final_Document.pdf.

[54] DoD. Information enterprise architecture (DoD IEA) version 2.0. Technical report, DoD, July 2012. URL http://dodcio.defense.gov/Portals/0/Documents/DIEA/DoD%20IEA%20v2%200_Volume%20II_Description%20Document_Final_20120806.pdf.

[55] DISA. GIG convergence master plan 2012 (GCMP 2012) volume 1. Technical report, Defense Information Systems Agency (DISA), August 2012. URL http://www.disa.mil/Audience/~/media/Files/DISA/About/GCMP-2012-Volume-I.pdf.

[56] DoD Defense Science Board. Cyber security and reliability in a digital cloud, 2013. URL http://www.acq.osd.mil/dsb/reports/CyberCloud.pdf.

[57] ONR. Data focused naval tactical cloud (DF-NTC): ONR information package, 2014. URL http://www.onr.navy.mil/~/media/Files/Funding-Announcements/BAA/2014/14-011-Attachment-0001.ashx.

[58] Jay M. Rosenberger, Hee S. Hwang, Ratna P. Pallerla, Adnan Yucel, Ron L. Wilson, and Ed G. Brungardt. The generalized weapon target assignment problem. Technical report, DTIC Document, June 2005. URL http://www.dodccrp.org/events/10th_ICCRTS/CD/papers/182.pdf.

[59] Z. R. Bogdanowicz and N. P. Coleman. Advanced algorithm for optimal sensor-target and weapon-target pairings in dynamic collaborative engagement. Technical report, DTIC Document, 2008. URL http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA505790.

[60] Robert Perkins, Fernando Dejesus, Jayson Durham, Robert Hastings, and John McDonnell. C2 data synchronization in disconnected, intermittent, and low-bandwidth (DIL) environments. 18th ICCRTS, 2013. URL http://www.dodccrp.org/events/18th_iccrts_2013/post_conference/papers/097.pdf.

[61] Chris Kubic. DoD cloud computing security challenges, December 2008. URL http://csrc.nist.gov/groups/SMA/ispab/documents/minutes/2008-12/cloud-computing-IA-challenges_ISPAB-Dec2008_C-Kubic.pdf.

[62] Eugene W. P. Bingue and David A. Cook. Security in the cloud - now and around the corner, 2012. URL http://www.ieee-stc.org/proceedings/2012/pdfs/2951EugeneBingue.pdf.
