
Synapse - The hive big data platform by Mohan Reddy - Chief Architect, The Hive


DESCRIPTION

Talk by Mohan Reddy, Chief Architect, The Hive, at The Hive Big Data Think Tank Internet of Things Meetup, hosted at Intel.



Synapse: The Hive Big Data Platform

Mohan Reddy, Chief Architect, The Hive

Vision

Figure: The Hive Big Data Stack. Layers from top to bottom: The Hive Portfolio of Online, Enterprise and Internet of Things applications; Synapse Big Data; Data Infrastructure. Also labeled: Knowledge, Action.

Goals of Synapse

• Accelerate product development & go-to-market for The Hive portfolio companies

• Plug in the latest open source innovations in data science & infrastructure

• Engage & contribute back to relevant open source communities

• Share insights & experiences with The Hive Think Tank

Synapse for IoT Applications

Figure: The Hive IoT Portfolio (Smart Home, Smart Building, Smart Factory) built on Synapse Data Infrastructure, with components for Data-driven Control, Deep Learning, Security and Business Apps.

Synapse IoT Compute Models

Trends Driving Synapse Design

• Fast-changing open source technologies add complexity to application design

• Real-time stream analytics lets operations respond to patterns in live data streams

• Rethinking the trade-offs between scale-up & scale-out architectures, especially for real-time use cases

• Faster machine learning through smarter data partitioning & parallelism in model building

• Data management, lineage and curation add significant overhead to product development

Synapse Infrastructure Services

Service layers, top to bottom:

• Visualization / Service APIs

• Machine Learning / Provisioning & Deployment

• Stream Processing / Batch Processing

• Storage

• Data Ingestion & Lineage

Synapse Service Abstractions

• Visualization: Taswira / Service APIs: Alchemy

• Machine Learning: Akili / Provisioning & Deployment: Chombo

• Stream Processing: Tempus / Batch Processing: Huduma (together forming a Lambda Architecture)

• Storage: Duka

• Data Ingestion & Lineage: Ukoo

Extendable Service Implementations by Present/Future Open Source Projects

Figure: the same service layers (Visualization, Service APIs, Machine Learning, Provisioning & Deployment, Stream Processing, Batch Processing, Storage, Data Ingestion & Lineage), with Morphlines, Kite and Falcon shown as implementations for Data Ingestion & Lineage.

Ukoo - Data Ingestion, Lineage and Management

• A framework to build, reuse, link, manage and run data and job pipelines

• A pipeline is a collection of procedural steps, interactions, inputs and outputs: the steps needed to describe a big data business process

• Datasets come from different sources through industry-standard and proprietary adapters such as Apache Flume, MQTT and iBeacon

• Based on Apache Falcon, Kite SDK and Morphlines
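The pipeline-plus-lineage idea can be sketched in a few lines of Python. This is a minimal illustration assuming an in-memory pipeline whose steps are plain functions; `Pipeline`, `Step`, `add_step` and `run` are hypothetical names, not the Apache Falcon, Kite SDK or Morphlines APIs. Each run records one lineage entry (input dataset, step, output dataset) per step, so the origin of every derived dataset can be traced.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    # Hypothetical: one named procedural step in a pipeline.
    name: str
    fn: Callable[[Any], Any]


@dataclass
class Pipeline:
    # Hypothetical in-memory pipeline; not the Falcon/Kite/Morphlines API.
    name: str
    steps: List[Step] = field(default_factory=list)
    lineage: List[Dict[str, str]] = field(default_factory=list)

    def add_step(self, name: str, fn: Callable[[Any], Any]) -> "Pipeline":
        self.steps.append(Step(name, fn))
        return self  # returning self allows chained definitions

    def run(self, source_name: str, data: Any) -> Any:
        # Apply each step in order and append one lineage record per step.
        current = source_name
        for step in self.steps:
            data = step.fn(data)
            output = f"{self.name}.{step.name}"
            self.lineage.append({"input": current, "step": step.name, "output": output})
            current = output
        return data
```

For example, a pipeline that parses comma-separated sensor rows and drops rows with an empty reading:

```python
p = (Pipeline("ingest")
     .add_step("parse", lambda rows: [r.split(",") for r in rows])
     .add_step("drop_empty", lambda rows: [r for r in rows if r[1]]))
p.run("mqtt:sensors", ["a,1", "b,", "c,3"])
```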

Tempus - Realtime Processor

• An extensible framework for processing real-time data, with an API to compute real-time rankings and aggregations

• Works with Spark Streaming and Storm

• Real-time classification
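A real-time ranking over a live stream is commonly computed over a sliding time window. The sketch below is a plain-Python illustration of that technique, not the Tempus, Spark Streaming or Storm API; `WindowedRanker` is a hypothetical name, and it assumes events arrive with non-decreasing timestamps. It keeps per-key counts for events newer than `now - window_seconds` and evicts older ones as new events arrive.

```python
from collections import Counter, deque


class WindowedRanker:
    """Hypothetical sliding-window counter that reports the top-k keys."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()    # (timestamp, key) in arrival order
        self.counts = Counter()  # key -> count of events still in the window

    def add(self, timestamp, key):
        # Assumes timestamps are non-decreasing, so the oldest event is at the left.
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events at or before now - window; they are outside the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, k):
        # Highest counts first.
        return self.counts.most_common(k)
```

With a 60-second window, an event at t=0 no longer counts once an event at t=65 arrives, so `top` reflects only recent activity. A production speed layer would partition keys across workers; this sketch is single-process.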

Tempus Speed Layer

• Stream processing

• Continuous computation

• Transactional

• Stores a limited window of data

• Complexity isolated in this layer only

• Fault tolerant: errors are auto-corrected by the next batch run

• Compensates for batch latency
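The speed-layer properties above follow from how a Lambda Architecture splits work: the batch layer periodically recomputes a view over all data up to a cutoff, and the speed layer only holds what arrived since that cutoff. The sketch below is a minimal Python illustration assuming summable values and ingestion-time timestamps; `SpeedLayer`, `on_batch_complete` and `query` are hypothetical names, not part of Tempus. When a batch run finishes, the speed layer discards everything the batch now covers, which is how its errors get "auto-corrected".

```python
from collections import defaultdict


class SpeedLayer:
    """Hypothetical speed layer holding only events newer than the last batch cutoff."""

    def __init__(self):
        self.batch_cutoff = float("-inf")
        self.pending = []  # (timestamp, key, value) not yet covered by a batch run

    def ingest(self, timestamp, key, value):
        # Timestamps are assumed to be ingestion times, so anything at or
        # before the cutoff is already reflected in the batch view.
        if timestamp > self.batch_cutoff:
            self.pending.append((timestamp, key, value))

    def realtime_view(self):
        view = defaultdict(int)
        for _, key, value in self.pending:
            view[key] += value
        return dict(view)

    def on_batch_complete(self, cutoff):
        # The batch view now covers everything up to `cutoff`; dropping those
        # events replaces any speed-layer approximation with the batch result.
        self.batch_cutoff = cutoff
        self.pending = [e for e in self.pending if e[0] > cutoff]


def query(key, batch_view, speed):
    # Answer = precomputed batch value + increments since the batch cutoff.
    return batch_view.get(key, 0) + speed.realtime_view().get(key, 0)
```

Because the speed layer only retains events since the last cutoff, its state stays bounded by batch latency rather than by total history.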

Huduma - Batch Processor

• Data adapters and pipelines for different sources

• DSL-based jobs using Scalding

• Data connectors to the storage layer, supporting HBase, Cassandra and Redis

• Input to machine learning models
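Scalding jobs are written as chains of collection-style operations (map, filter, group, aggregate). The Python sketch below imitates that style on an in-memory list to show how a batch view is recomputed from raw records; it is an illustration only, not Scalding's API, and `Pipe`, `sum_by_key` and `build_batch_view` are hypothetical names. Writing the resulting view to HBase, Cassandra or Redis is omitted.

```python
from collections import defaultdict
from typing import Callable, Iterable


class Pipe:
    """Hypothetical eager, Scalding-flavoured pipe over an in-memory iterable."""

    def __init__(self, items: Iterable):
        self.items = list(items)

    def map(self, fn: Callable) -> "Pipe":
        return Pipe(fn(x) for x in self.items)

    def filter(self, pred: Callable) -> "Pipe":
        return Pipe(x for x in self.items if pred(x))

    def sum_by_key(self) -> dict:
        # Expects (key, number) pairs; equivalent to a group-by followed by a sum.
        totals = defaultdict(int)
        for key, value in self.items:
            totals[key] += value
        return dict(totals)


def build_batch_view(lines):
    # Raw records are assumed to look like "timestamp,key,value".
    return (Pipe(lines)
            .map(lambda line: line.split(","))
            .filter(lambda fields: len(fields) == 3)  # skip malformed rows
            .map(lambda fields: (fields[1], int(fields[2])))
            .sum_by_key())
```

Running `build_batch_view(["1,kwh,5", "2,kwh,3", "bad", "3,temp,21"])` recomputes the whole view from scratch, which is what lets a batch run overwrite earlier speed-layer results.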

Akili - Machine Learning as a Service

• Framework and infrastructure to run machine learning models

• Embedded models with code generation in R, JavaScript and Java

• Online classification service

• Large-scale recommendation engine based on collaborative filtering

• Uses MLlib, GraphLab and 0xdata

• Model selection based on SMAC / Auto-WEKA / GhostFace
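Automated model selection means scoring candidate configurations on held-out data and keeping the best one. SMAC and Auto-WEKA search large configuration spaces with Bayesian optimization; the sketch below swaps in a much simpler exhaustive search over one hyperparameter (k in a k-nearest-neighbour classifier) scored by leave-one-out accuracy. It is pure Python, and `knn_predict`, `loo_accuracy` and `select_k` are hypothetical names, not Akili functions.

```python
from collections import Counter


def knn_predict(train, x, k):
    # train: list of (feature_tuple, label). Majority vote among the k nearest
    # points by squared Euclidean distance.
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(data, k):
    # Leave-one-out: predict each point using all the other points.
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        correct += knn_predict(rest, x, k) == y
    return correct / len(data)


def select_k(data, candidates):
    # Exhaustive search over the candidates; ties go to the first candidate listed.
    scores = {k: loo_accuracy(data, k) for k in candidates}
    best = max(candidates, key=lambda k: scores[k])
    return best, scores
```

On two well-separated clusters of three points each, k=1 and k=3 classify every held-out point correctly, while k=5 always outvotes the true class with the other cluster, so `select_k` returns 1.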

Akili - Schematic Description

Alchemy - Service Layer

• Real-time and batch views of data

• REST interface

• Scalable and highly available

• Generic service that interfaces with data storage and other real-time and batch processes
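A service layer like this answers REST queries by combining the batch view with the real-time view. The sketch below uses only Python's standard `http.server` module; the `/views/<key>` route, the in-memory `BATCH_VIEW` and `REALTIME_VIEW` dictionaries, and the `lookup`, `ViewHandler` and `serve` names are all hypothetical stand-ins for Alchemy's real storage connectors and endpoints. It is single-process, so it does not show the scalability or high availability the slide describes.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory stand-ins for views that would live in the storage layer.
BATCH_VIEW = {"kwh": 8}
REALTIME_VIEW = {"kwh": 2}


def lookup(key, batch_view, realtime_view):
    """Merge batch and real-time values for one key into a JSON-ready dict."""
    batch = batch_view.get(key, 0)
    realtime = realtime_view.get(key, 0)
    return {"key": key, "batch": batch, "realtime": realtime, "total": batch + realtime}


class ViewHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical route: GET /views/<key>
        prefix = "/views/"
        if not self.path.startswith(prefix):
            self.send_error(404)
            return
        result = lookup(self.path[len(prefix):], BATCH_VIEW, REALTIME_VIEW)
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8080):
    # Blocks until interrupted.
    HTTPServer((host, port), ViewHandler).serve_forever()
```

After calling `serve()`, a request to `http://127.0.0.1:8080/views/kwh` returns `{"key": "kwh", "batch": 8, "realtime": 2, "total": 10}`. Keeping the merge in a pure `lookup` function lets it be tested without starting a server.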

Taswira - Visualization Framework

• Scalable interactive visualization

• Uses D3, Aperture and Gephi

• Works with Tableau

Chombo - Deployment and Provisioning

• Deploys components as lightweight, portable, self-sufficient containers that can run virtually anywhere

• Docker-based containers