Chicago Hadoop Users Group
About Me
• Engineering background: AppDev
• Open-source contributor
• Hadoop: 10 years
• HWX Principal Solution Engineer
• Director, Solutions Engineering @ Kinetica
Kinetica Local Contact Information
• Sunile Manjee, Director, Solutions Engineering, [email protected]
• Phil Zacharia, Director, Central Region, [email protected]
What is Kinetica?
Patented, In-Memory, Columnar, Distributed, GPU-Accelerated Database
Developed to Identify Terrorist Threats in Real Time
Kinetica was incubated as a massively parallel computational engine for US Army INSCOM
Ingests 200+ sources of streaming data – mobile devices, drones, social media, cyber data
200B new records per hour
Incorporates geospatial and temporal data
Real-time, actionable threat intelligence
First high-performance database leveraging GPUs
Who is Kinetica?
2009
‘HPC Research Project’ incubated by US military
2010
2011
Patent # US8373710 B1 issued to GPUdb
2012
US Army deploys GPUdb
2013
GPUdb commercially available
2014
IDC HPC Innovation Excellence Award (Army)
GPUdb goes into production at USPS
2015
Iron Net selects GPUdb for cyber defense
2015
PG&E selects GPUdb for electric grid analysis
IDC HPC Innovation Excellence Award (USPS)
2016
Rebrand to Kinetica
Confidential Information
Current Data Architectures Can’t Keep Up | Complex, Rigid, Limited Agility
Challenges
• Infrastructure complexity and costs – stitching together multiple tools: separate tools for BI, ML, OLAP cubes, databases
• High latency – can’t handle big data’s volume, variety, velocity
• Data needs to be pre-aggregated and transformed into cubes
• Processing is batch, not real-time
• Rigid – can’t handle changing requirements, changing data
• Dashboard slowness pains
• Data marts in Tableau, caching, very complex queries
• Difficult to simultaneously ingest and analyze at scale
• Limited agility – admin overhead, resources, skills
[Diagram: current architecture, built from data sources (3rd-party ERP, CRM, SFA, databases, flat files); data integration (INFA, Talend) plus NiFi, Kafka; Hadoop (Hortonworks, Cloudera); EDW (Teradata, Oracle) with a star schema of facts & dimensions; data marts, OLAP cubes, indices and summary tables; and BI/analytics tools (Tableau, MSTR, SAS, others)]
Kinetica Database | Real-Time, Flexible, Simple Data and Analytics
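[Diagram: Kinetica architecture, with the same data sources (3rd-party ERP, CRM, SFA, databases, flat files), data integration (INFA, Talend), NiFi, Kafka, Hadoop (HDP, CDH, MapR), EDW (Teradata, Oracle) and BI/analytics tools (Tableau, MSTR, SAS, others), plus Kinetica; the data marts, OLAP cubes, indices and summary tables are no longer shown]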
Solution
• Low latency – millisecond response time
• Real-time at scale – simultaneously ingest and analyze
• Full data provisioning – ingest, manage, analyze, visualize
• Flexible – handle changing requirements and changing data; minimize aggregates, indexes, cubes
• Simplicity – minimize admin overhead, resources, skills
Plus
• Converge AI and BI
• Location-based analytics
• Deploy on commodity hardware on-prem or in the cloud
Kinetica: Unique Strengths & Capabilities
Fast, Distributed, In-Memory Analytics Engine for Fast-Moving, Large-Scale Data
Kinetica is designed to take advantage of the parallel processing nature of the GPU. It delivers low-latency, high-performance analytics on large datasets and makes streaming data available for query in real time.
OLAP Performance, Scalability, Stability
Geospatial Processing & Visualization
API for GPU-Powered Data & Compute Orchestration
Converged AI and BI
User-defined functions (UDFs) and distributed orchestration of data enable Kinetica to offer low-level customizations for machine learning and AI workloads.
Native Geospatial & Visualization Pipeline
A native visualization pipeline makes it easier to work with large geospatial datasets, which is ideal for IoT use cases and for powering geospatial applications.
Sonic Layer (Fast/True Real-Time Analytics)
Historic and Predictive Insights
Interactive Location-Based Analytics
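CUDA Abstraction, SAXPY Example
CUDA (requires Make/Build)
SQL
SELECT a*x+y FROM TABLE
Python
import gpudb
# Connect to a Kinetica (GPUdb) server on localhost, port 9191
h_db = gpudb.GPUdb(encoding='BINARY', host='127.0.0.1', port='9191')
# Evaluate the expression a*x+y for records 0-9 of TABLE; data returned JSON-encoded
response = h_db.get_records_by_column('TABLE', ["(a*x+y)"], 0, 10, 'json', {})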
https://devblogs.nvidia.com/parallelforall/easy-introduction-cuda-c-and-c/
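For reference, the per-row computation that both the SQL statement and the get_records_by_column call ask Kinetica to evaluate can be written in plain Python. This is a minimal sketch, assuming a, x and y are equal-length numeric columns; the function name a_x_plus_y is hypothetical, and this is not how Kinetica executes the expression internally. In NVIDIA's classic SAXPY, a is a scalar, which corresponds to a constant a column.

```python
from typing import List, Sequence

def a_x_plus_y(a: Sequence[float], x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Row-wise a*x + y over three equal-length columns (what SELECT a*x+y computes)."""
    if not (len(a) == len(x) == len(y)):
        raise ValueError("columns a, x and y must have the same length")
    # Every row is independent of every other row; this is the property
    # that lets a GPU assign one thread per row and run them all in parallel.
    return [ai * xi + yi for ai, xi, yi in zip(a, x, y)]

# A constant 'a' column reproduces classic SAXPY with scalar a = 2.0.
print(a_x_plus_y([2.0, 2.0, 2.0], [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [12.0, 24.0, 36.0]
```

The point of the slide is that the CUDA version needs this loop written as a kernel and built, while SQL and the Python client only state the expression.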
Kinetica Architecture
Visualization via ODBC/JDBC
APIs: Java, JavaScript, REST, C++, Node.js, Python
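The language APIs listed above sit on top of the REST API. A minimal sketch, using only the Python standard library, that builds (without sending) the HTTP request behind the get_records_by_column call from the SAXPY slide. The endpoint path /get/records/bycolumn and the JSON field names are assumptions modeled on the Python client's arguments; check them against the REST reference for your Kinetica version. The helper name build_get_records_by_column_request is hypothetical.

```python
import json
import urllib.request

def build_get_records_by_column_request(base_url, table_name, column_names,
                                        offset=0, limit=10):
    """Build (but do not send) a JSON POST request for a column query.

    Assumption: endpoint path and payload fields mirror the Python client's
    get_records_by_column() arguments; verify against your server's docs.
    """
    payload = {
        "table_name": table_name,
        "column_names": column_names,
        "offset": offset,
        "limit": limit,
        "encoding": "json",
        "options": {},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/get/records/bycolumn",  # assumed endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_get_records_by_column_request("http://127.0.0.1:9191", "TABLE", ["(a*x+y)"])
print(req.full_url)      # http://127.0.0.1:9191/get/records/bycolumn
print(req.get_method())  # POST
# Sending it against a live server would be: urllib.request.urlopen(req)
```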
Open-Source Integration: Apache NiFi, Apache Kafka, Apache Spark, Apache Storm
Geospatial Capabilities: geometric objects, tracks, geospatial endpoints, WMS, WKT
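WKT (Well-Known Text) is the OGC text encoding for geometries, for example POINT(x y). To illustrate the kind of filter a geospatial engine evaluates over millions of rows, here is a minimal plain-Python sketch that parses 2-D WKT points and keeps those inside a bounding box. It handles POINT only, assumes x = longitude and y = latitude, and the function names are hypothetical; it is not Kinetica's geospatial API.

```python
import re
from typing import List, Tuple

# Matches a 2-D WKT point such as "POINT(-87.63 41.88)"; other geometry
# types (LINESTRING, POLYGON, ...) are out of scope for this sketch.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*([-+]?\d+(?:\.\d+)?)\s+([-+]?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)

def parse_wkt_point(wkt: str) -> Tuple[float, float]:
    """Return (x, y) from a 2-D WKT POINT string."""
    m = _POINT_RE.match(wkt)
    if m is None:
        raise ValueError(f"not a 2-D WKT POINT: {wkt!r}")
    return float(m.group(1)), float(m.group(2))

def points_in_bbox(wkts: List[str], min_x: float, min_y: float,
                   max_x: float, max_y: float) -> List[str]:
    """Return the WKT points that fall inside the inclusive bounding box."""
    out = []
    for w in wkts:
        x, y = parse_wkt_point(w)
        if min_x <= x <= max_x and min_y <= y <= max_y:
            out.append(w)
    return out

pts = ["POINT(-87.63 41.88)", "POINT(-122.42 37.77)", "POINT(-87.90 41.98)"]
print(points_in_bbox(pts, -88.0, 41.6, -87.5, 42.1))
# ['POINT(-87.63 41.88)', 'POINT(-87.90 41.98)']
```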
Kinetica Cluster (on-demand scale)
[Diagram: four nodes, each commodity hardware with GPUs, disk, columnar in-memory storage (columns A, B, C; rows 1–4) and an HTTP head node]
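The cluster diagram's toy table (columns A, B, C with rows 1 through 4) is stored column by column in memory. A minimal sketch of that pivot from a row layout to a columnar layout; in a real engine each column would be a contiguous, typed buffer, and Python lists only stand in for that here. The helper name to_columnar is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Row-oriented view of the diagram's toy table: one tuple per record.
rows: List[Tuple[str, str, str]] = [
    ("A1", "B1", "C1"),
    ("A2", "B2", "C2"),
    ("A3", "B3", "C3"),
    ("A4", "B4", "C4"),
]
column_names = ["A", "B", "C"]

def to_columnar(rows: Sequence[Sequence[str]], names: Sequence[str]) -> Dict[str, List[str]]:
    """Pivot records into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

columns = to_columnar(rows, column_names)
print(columns["B"])  # ['B1', 'B2', 'B3', 'B4']
```

A query that touches only column B now scans one array instead of every full record, and long runs of same-typed values are what GPU (and SIMD CPU) kernels process efficiently.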
Other Integration: message queues, ETL tools, streaming tools
Reliable, Available and Scalable
• Disk-based persistence
• Add nodes on demand
• Data replication for high availability
• Scale up and/or out
Performance
• GPU accelerated (1000s of cores per GPU)
• Ingest billions of records in minutes
• Ultra-low-latency query performance
Massive Data Sizes
• 100s of terabytes scale
• Billions of entries
Connectors
• ODBC/JDBC
• RESTful endpoints
• Rich APIs
• Standard geospatial capabilities
Run Anywhere
• On premise, Amazon, Azure, Google Cloud, Nimbix, SoftLayer
• Hardware partners: IBM, Dell, Cisco, HP
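"Data replication for high availability" means every shard of data lives on more than one node, so losing a node loses no data. A minimal sketch of one simple placement policy, round-robin with copies on consecutive nodes. The policy and the function names are hypothetical illustrations; the slide does not say how Kinetica places replicas.

```python
from typing import Dict, List

def place_shards(num_shards: int, nodes: List[str], replicas: int = 2) -> Dict[int, List[str]]:
    """Assign each shard to `replicas` distinct nodes, round-robin (hypothetical policy)."""
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    placement: Dict[int, List[str]] = {}
    for shard in range(num_shards):
        # Primary on node shard % n, extra copies on the following nodes,
        # so no two copies of one shard share a node.
        placement[shard] = [nodes[(shard + k) % len(nodes)] for k in range(replicas)]
    return placement

def surviving_shards(placement: Dict[int, List[str]], failed_node: str) -> List[int]:
    """Shards that still have at least one copy after `failed_node` goes down."""
    return [s for s, owners in placement.items() if any(n != failed_node for n in owners)]

p = place_shards(4, ["node1", "node2", "node3"])
print(p)  # {0: ['node1', 'node2'], 1: ['node2', 'node3'], 2: ['node3', 'node1'], 3: ['node1', 'node2']}
print(len(surviving_shards(p, "node2")))  # 4
```

Adding a node on demand in this scheme just means calling place_shards with a longer node list and moving the shards whose owners changed.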
Core Design & Architecture
[Diagram: two CPU sockets, each hosting logical nodes with GPUs; each logical node holds shards, and each shard holds chunks. Chunks in system memory (RAM) hold Table:Column:Data, with a "Map to Persist" step.]
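The core design diagram nests data as logical node → shard → chunk → Table:Column:Data. A minimal sketch of the shard and chunk levels, assuming rows are routed to shards by hashing a shard key and each column within a shard is split into fixed-size chunks. The hashing scheme, CHUNK_SIZE and all names are hypothetical; the slide does not say how Kinetica assigns rows, and logical nodes, GPUs and "Map to Persist" are not modeled.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List

CHUNK_SIZE = 4  # hypothetical rows per chunk; real engines use far larger chunks

@dataclass
class Shard:
    # column name -> list of chunks, each chunk a list of values
    chunks: Dict[str, List[List[object]]] = field(default_factory=dict)

    def append(self, record: Dict[str, object]) -> None:
        """Add one record; assumes every record has the same columns."""
        for col, value in record.items():
            col_chunks = self.chunks.setdefault(col, [[]])
            if len(col_chunks[-1]) == CHUNK_SIZE:
                col_chunks.append([])  # current chunk full: start a new one
            col_chunks[-1].append(value)

def shard_for(key: str, num_shards: int) -> int:
    # crc32 is stable across runs, unlike Python's salted hash() for str
    return zlib.crc32(key.encode("utf-8")) % num_shards

def load(records: List[Dict[str, object]], shard_key: str, num_shards: int) -> List[Shard]:
    shards = [Shard() for _ in range(num_shards)]
    for rec in records:
        shards[shard_for(str(rec[shard_key]), num_shards)].append(rec)
    return shards

records = [{"id": i, "value": i * 10} for i in range(20)]
shards = load(records, "id", num_shards=4)
print(sum(len(c) for s in shards for c in s.chunks.get("id", [])))  # 20
```

Because a record's key always hashes to the same shard, each shard can be scanned by its own worker without coordinating with the others.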
Kinetica UDF
[Diagram: UDFs executing across the same CPU socket, GPU, logical node, shard and chunk layout in system memory (RAM)]
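The UDF slide shows user code running where the data already lives, shard by shard, instead of pulling data out to the code. A minimal sketch of that pattern: apply a function to each shard's local data concurrently, then combine the partial results. Threads stand in for per-node or per-GPU workers (Python's GIL means no real CPU speedup here), and run_distributed_udf is a hypothetical helper, not Kinetica's UDF API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

def run_distributed_udf(
    shards: Sequence[Sequence[float]],
    udf: Callable[[Sequence[float]], float],
    combine: Callable[[List[float]], float],
) -> float:
    """Apply `udf` to every shard's local data concurrently, then combine.

    Each worker only sees its own shard, mirroring the idea of running
    user code next to the data instead of moving the data to the code.
    """
    with ThreadPoolExecutor(max_workers=len(shards) or 1) as pool:
        partials = list(pool.map(udf, shards))  # results keep shard order
    return combine(partials)

# Hypothetical example: sum of squares computed shard-locally, then summed.
shards = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
total = run_distributed_udf(shards, lambda part: sum(v * v for v in part), sum)
print(total)  # 91.0
```

The combine step is the only point where partial results cross shard boundaries, which is why this shape suits aggregations and per-shard model scoring.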
CPU-Bound “Real-Time” Architectures
[Diagram: data stream → concurrent ingest & analytics; scaling requires buying/adding more nodes]
Kinetica Real-Time Analytics Architecture
[Diagram: data stream → concurrent ingest & analytics on the GPU]
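Both real-time slides revolve around concurrent ingest and analytics: the same table is queried while new data keeps streaming in, with no batch load window. A minimal sketch of that concurrency pattern only, assuming a lock-protected append-only column and snapshot reads; the GPU is not modeled, and the class and function names are hypothetical rather than Kinetica's mechanism.

```python
import threading
from typing import List, Tuple

class ConcurrentTable:
    """Append-only column that can be queried while it is being ingested."""

    def __init__(self) -> None:
        self._values: List[float] = []
        self._lock = threading.Lock()

    def ingest(self, batch: List[float]) -> None:
        with self._lock:  # a batch becomes visible all at once
            self._values.extend(batch)

    def query_count_and_sum(self) -> Tuple[int, float]:
        # Take a consistent snapshot under the lock, then aggregate outside
        # it so a long-running query does not block ingest.
        with self._lock:
            snapshot = list(self._values)
        return len(snapshot), sum(snapshot)

def demo(num_batches: int = 100, batch_size: int = 50):
    table = ConcurrentTable()
    observed: List[Tuple[int, float]] = []  # results seen mid-ingest
    done = threading.Event()

    def producer() -> None:
        for _ in range(num_batches):
            table.ingest([1.0] * batch_size)
        done.set()

    def analyst() -> None:
        while not done.is_set():
            observed.append(table.query_count_and_sum())

    threads = [threading.Thread(target=producer), threading.Thread(target=analyst)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.query_count_and_sum(), observed

final, observed = demo()
print(final)  # (5000, 5000.0)
```

Every result the analyst sees mid-stream covers whole batches only, so queries never observe a half-ingested batch.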