HEMAL GANDHI
DIRECTOR OF DATA ENGINEERING
DATA ENGINEERING AT ONE KINGS LANE
Powering business decisions through understanding
customer behavior.
Observations on Data Platforms
DREAM
You start with a simple design...
REALITY
...but you end up with a complex design.
DREAM
You start with full speed...
REALITY
...but you end up being slow.
DREAM
You start with the latest technology...
REALITY
...but you end up with an old stack before going live.
DREAM
You dream of a low cost platform...
REALITY
...but you end up shelling out a lot of $$.
WHAT IS OUR GOAL
To build a scalable, loosely coupled big data platform.
DESIGN
Some design questions we need to answer:
Which technologies to choose?
How to keep the stack current?
How to keep up with evolving business needs?
How to make your investment count?
It’s like building a city.
Technology
Process
People
High Level Architecture
COLLECTION
- Apache Flume
- Sqoop
FLOW
- Kafka
- Spark
STORAGE
- HBase
- Hive
PROCESSING
- Pig
- Spark
DELIVERY
- Visualization
- Email / FTP
DATA PLATFORM
SCHEDULING & CLUSTER MONITORING
SECURITY
APPLICATIONS & VISUALIZATION TOOLS
DATA ACCESS ABSTRACTION API
DATA QUALITY SERVICE
DREDGE
WHAT IS DREDGE
A declarative abstraction layer for integrating big data
tools, enabling a loosely coupled big data platform.
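To make "declarative" concrete, here is a minimal Python sketch of what a flow definition and its validation could look like. The slides do not show Dredge's actual configuration format, so the field names (`source`, `tasks`, `target`, `mode`), the `validate_flow` helper, and the example values are all hypothetical.

```python
# Hypothetical declarative flow definition: it states WHAT to move and
# transform, not HOW each underlying tool is invoked.
FLOW = {
    "name": "orders_to_hive",
    "source": {"type": "rdbms", "table": "orders", "mode": "direct"},
    "tasks": [
        {"type": "filter", "field": "status", "equals": "shipped"},
        {"type": "aggregator", "group_by": "customer_id", "total": "amount"},
    ],
    "target": {"type": "hive", "table": "orders_by_customer", "mode": "direct"},
}

REQUIRED_KEYS = ("name", "source", "tasks", "target")
ALLOWED_MODES = ("direct", "stream")  # mirrors the DIRECT/STREAM labels


def validate_flow(flow):
    """Return a list of problems found in a flow definition (empty if valid)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in flow]
    for endpoint in ("source", "target"):
        spec = flow.get(endpoint, {})
        if "type" not in spec:
            problems.append(f"{endpoint} has no type")
        if spec.get("mode") not in ALLOWED_MODES:
            problems.append(f"{endpoint} mode must be one of {ALLOWED_MODES}")
    return problems
```

Because the definition is plain data, swapping the target from Hive to another store would only mean changing the `target` entry, which is one way to read the "loosely coupled" goal.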
DREDGE LOGICAL VIEW
SOURCE ENDPOINTS
SOURCE READERS
TASKS
HADOOP CLUSTER
TARGET WRITERS
STREAM/DIRECT
TARGET ENDPOINTS
LOG STREAMING
EVENTS MANAGEMENT
CONFIGURATION ABSTRACTION
DREDGE REPOSITORY – HBASE
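The logical view reads as a chain: source readers feed tasks, and tasks feed target writers. The Python sketch below illustrates that chain with in-memory stand-ins. It is an assumption about the shape of the flow, not Dredge's implementation; `make_list_reader`, `make_list_writer`, `keep_shipped`, and `run_flow` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Reader = Callable[[], Iterable[Record]]
Task = Callable[[Iterable[Record]], Iterable[Record]]
Writer = Callable[[Iterable[Record]], int]


def make_list_reader(rows: List[Record]) -> Reader:
    """Source reader over an in-memory list (stand-in for logs or an RDBMS)."""
    return lambda: iter(rows)


def make_list_writer(sink: List[Record]) -> Writer:
    """Target writer that appends to a list (stand-in for Hive or HBase)."""
    def write(records: Iterable[Record]) -> int:
        before = len(sink)
        sink.extend(records)
        return len(sink) - before  # number of records written
    return write


def keep_shipped(records: Iterable[Record]) -> Iterable[Record]:
    """Example task: pass through only records whose status is 'shipped'."""
    return (r for r in records if r["status"] == "shipped")


def run_flow(reader: Reader, tasks: List[Task], writer: Writer) -> int:
    """Pull records from the reader, apply each task in order, then write."""
    records = reader()
    for task in tasks:
        records = task(records)
    return writer(records)
```

The reader and writer only agree on the record shape, so either side can be replaced without touching the tasks in between.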
DREDGE ARCHITECTURE
DREDGE UI
- Declarative configuration
- Logical Flows
- Data Lineage
- Runtime Logs
- Admin
DREDGE RUNTIME
- TEMP STORE - HDFS
- EVENT MANAGEMENT
- LOGGER
- STREAM
DREDGE DATA SERVICES
- ABSTRACTION BUILDER (KAFKA, FLUME, PIG, CUSTOM)
- PLUGIN (JAVA/SHELL, PIG, SQL)
- RANK, SORTER
- AGGREGATOR
- UDFs
- SET OPERATIONS
- COMBINERS, ROUTERS...
- FILTERS/PATTERNS ANALYSIS
- SOURCE READERS (LOGS, RDBMS, UNSTRUCTURED DATA, CUSTOM) DIRECT/STREAM
- TARGET WRITERS (HIVE, HBASE, RDBMS, CUSTOM) DIRECT/STREAM
DREDGE REPOSITORY – HBASE
LAMBDA ARCHITECTURE: HDFS, HIVE, HBASE, PIG, FLUME, KAFKA, OOZIE
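The data services layer lists reusable components such as filters, aggregators, and sorters alongside a plugin mechanism. One common way to wire such components to a declarative spec is a registry keyed by component type. The Python sketch below assumes that approach; the slides do not describe how Dredge does it, and `TASK_REGISTRY`, `register`, `build_tasks`, and `apply_tasks` are hypothetical names.

```python
from collections import defaultdict

TASK_REGISTRY = {}  # maps a task type name to a factory function


def register(name):
    """Decorator that records a task factory under the given type name."""
    def decorator(factory):
        TASK_REGISTRY[name] = factory
        return factory
    return decorator


@register("filter")
def make_filter(field, equals):
    return lambda records: [r for r in records if r.get(field) == equals]


@register("aggregator")
def make_aggregator(group_by, total):
    def aggregate(records):
        sums = defaultdict(float)
        for r in records:
            sums[r[group_by]] += r[total]
        return [{group_by: key, total: value} for key, value in sums.items()]
    return aggregate


@register("sorter")
def make_sorter(field, descending=False):
    return lambda records: sorted(records, key=lambda r: r[field], reverse=descending)


def build_tasks(specs):
    """Turn declarative specs (dicts with a 'type' key) into callables."""
    tasks = []
    for spec in specs:
        params = {k: v for k, v in spec.items() if k != "type"}
        tasks.append(TASK_REGISTRY[spec["type"]](**params))
    return tasks


def apply_tasks(tasks, records):
    """Run records through each task in order and return the result."""
    for task in tasks:
        records = task(records)
    return records
```

Adding a new component is then a matter of registering one more factory, with no change to `build_tasks` or to existing flows.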
Closing the Loop
Abstraction layer
Reusable data components
Event Driven dependencies
Plug n Play integration, loosely coupled (Cluster Resources, Data)
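"Event driven dependencies" can be illustrated with a small Python sketch in which a job runs once every event it requires has been published, and then emits its own completion event. This is an assumed reading of the bullet, not Dredge's event management code; the `EventBus` class and the `<job>.done` event naming are hypothetical.

```python
class EventBus:
    """Runs each registered job once all of its required events have been seen."""

    def __init__(self):
        self.seen = set()   # events published so far
        self.jobs = []      # (name, required events, action) tuples
        self.ran = []       # job names in the order they ran

    def add_job(self, name, requires, action):
        self.jobs.append((name, frozenset(requires), action))

    def publish(self, event):
        """Record an event, then run every job whose dependencies are now met."""
        self.seen.add(event)
        progress = True
        while progress:  # a finished job may unblock further jobs
            progress = False
            for name, requires, action in self.jobs:
                if name not in self.ran and requires <= self.seen:
                    action()
                    self.ran.append(name)
                    self.seen.add(f"{name}.done")  # completion event
                    progress = True
```

Jobs depend on events rather than on each other directly, so a producer can be replaced as long as it publishes the same event.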
Summarizing
Big data requires a different mindset: Innovate, iterate often and
keep it simple.
ENGINEERING.ONEKINGSLANE.COM
Thank you.
CONTRIBUTORS:
Maria Latushkin (CTO, One Kings Lane)
Joana Koiller (Senior Product Designer, One Kings Lane)