Copyright © 2015 KNIME.com AG
Integrating Big Data is as easy as 1,2,3 … 4!
Tobias Kötter
KNIME.com
Variety, Volume, Velocity
Variety:
• integrating heterogeneous data (and tools)
Volume:
• from small files...
• ...to distributed data repositories (Hadoop)
• bring the tools to the data
Velocity:
• from distributing heavy computations...
• ...to real-time scoring of millions of records/sec.
Every Minute…
IoT
The Challenge
Energy Usage Prediction from Smart Meters Data
• Read Smart Meter Energy Data
• Clean Up and Aggregate Total Energy Usage by hour, day, week, month, and year
• Calculate Behavioral Measures for each Smart Meter
• Cluster Smart Meters with Similar Behavior (k-Means)
• Predict Energy Usage in Clustered Smart Meters (Auto-Regressive Time Series Prediction)
Workflow 1
Workflow 2
Workflow 3
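The aggregation step in the list above can be sketched in plain Python. This is only an illustration, not the KNIME workflow itself: the sample readings, the `GRANULARITY` table, and the `aggregate_usage` helper are all hypothetical names, and weekly buckets are left out for brevity.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample readings: (meter_id, ISO timestamp, kWh used in that interval).
READINGS = [
    ("meter-1", "2015-01-05T10:00:00", 0.25),
    ("meter-1", "2015-01-05T10:30:00", 0.5),
    ("meter-1", "2015-01-05T11:00:00", 0.25),
    ("meter-1", "2015-02-01T09:00:00", 1.0),
    ("meter-2", "2015-01-05T10:00:00", 2.0),
]

# strftime patterns that truncate a timestamp to each granularity.
GRANULARITY = {
    "hour": "%Y-%m-%d %H",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
    "year": "%Y",
}


def aggregate_usage(readings, granularity):
    """Sum usage per (meter_id, time bucket) for the given granularity."""
    pattern = GRANULARITY[granularity]
    totals = defaultdict(float)
    for meter_id, timestamp, kwh in readings:
        bucket = datetime.fromisoformat(timestamp).strftime(pattern)
        totals[(meter_id, bucket)] += kwh
    return dict(totals)


hourly = aggregate_usage(READINGS, "hour")
monthly = aggregate_usage(READINGS, "month")
```

Every reading passes through local memory here, which is the kind of cost that moving the work to the data is meant to avoid.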
Workflow 1: PrepareData
~ 2 days
Big Data
Big Data Support
• KNIME Big Data Access Nodes
– in-database processing
– preconfigured connectors
• Big Data Platforms
– HDFS, Hive, Impala, HP Vertica, Hortonworks, ParStream, any big data platform really!
• Spark MLlib integration (coming soon)
• Streaming Executor (coming soon)
Virtual Machines
• Hortonworks:
http://hortonworks.com/products/hortonworks-sandbox/
• Cloudera:
http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms.html
• VirtualBox:
https://www.virtualbox.org/
• VMware Player:
http://www.vmware.com/
Accessing Big Data: Database Connector
Generic Database Connector
– Can connect to any JDBC source
– Register new JDBC driver via preferences page
Register JDBC Driver
Open KNIME and go to File -> Preferences
Increase connection timeout for long running retrieval operations
Accessing Big Data: Dedicated Connectors
Dedicated pre-configured connectors
– Bundling necessary JDBC drivers
– Easy to use
– DB-specific behavior and capabilities
Some dedicated connectors are part of the KNIME Analytics Platform, some belong to the commercial KNIME Big Data Extension
Works for most Hadoop Hive installations, including Hortonworks
free
Dedicated Connector
Accessing Big Data: Dedicated Connectors
Data Table Selection
In-Database Processing
Manipulation
• Filter rows and columns
• Join tables/queries
• Sort your data
• Write your own query
• Aggregate* your data
Similar settings as the GroupBy node
Similar settings as the Joiner node
* The Database GroupBy node exposes DB-specific aggregation methods
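As a rough illustration of what these manipulation steps amount to when they run inside the database, the sketch below combines a column filter, a join, a row filter, and a sort into one SQL query. It uses Python's built-in `sqlite3` module as a stand-in for a Hive or Impala connection; the `meters` and `readings` tables and their contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a remote big data platform.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meters (meter_id TEXT PRIMARY KEY, region TEXT);
CREATE TABLE readings (meter_id TEXT, reading_time TEXT, kwh REAL);
INSERT INTO meters VALUES ('m1', 'north'), ('m2', 'south');
INSERT INTO readings VALUES
    ('m1', '2015-01-05 10:00:00', 0.5),
    ('m1', '2015-01-05 11:00:00', 1.5),
    ('m2', '2015-01-05 10:00:00', 2.0);
""")

# Each clause corresponds to one manipulation step; the database does the
# work and only the final rows are transferred to the client.
query = """
SELECT r.meter_id, m.region, r.kwh          -- column filter
FROM readings AS r
JOIN meters AS m ON m.meter_id = r.meter_id -- join
WHERE r.kwh >= 1.0                          -- row filter
ORDER BY r.kwh DESC                         -- sort
"""
rows = conn.execute(query).fetchall()
```

Only the filtered, joined, and sorted result leaves the database, so the amount of data moved depends on the result size rather than the table size.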
Adding SQL Queries for Average Measures
Average Monthly Values
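One possible reading of this step is a per-meter, per-month average computed by the database itself. The sketch below again uses Python's `sqlite3` as a stand-in; the `readings` table, its columns, and the sample rows are assumptions, and the date function (`strftime` here) differs between SQL dialects such as Hive and Impala.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (meter_id TEXT, reading_time TEXT, kwh REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("m1", "2015-01-05 10:00:00", 1.0),
        ("m1", "2015-01-20 10:00:00", 3.0),
        ("m1", "2015-02-03 10:00:00", 4.0),
        ("m2", "2015-01-07 10:00:00", 2.0),
    ],
)

# Group by meter and calendar month; AVG is evaluated inside the database.
monthly_avg = conn.execute(
    """
    SELECT meter_id,
           strftime('%Y-%m', reading_time) AS month,
           AVG(kwh) AS avg_kwh
    FROM readings
    GROUP BY meter_id, month
    ORDER BY meter_id, month
    """
).fetchall()
```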
Import Data from Database into KNIME
< 30 min
New Big Data Platform?
No problem! Just change the connector node!
Other Useful Database Nodes
• Drop table
– missing table handling
– cascade option
• Execute any SQL statement e.g. DDL
• Manipulate existing queries
Executes several queries separated by ; and a new line
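The "; and new line" separator convention can be approximated as follows. This is a minimal sketch, not KNIME's actual parsing logic, which the slides do not describe; `split_statements` and the sample script are hypothetical, and `DROP TABLE IF EXISTS` is only loosely analogous to the missing table handling of the Drop Table node.

```python
import re
import sqlite3

# Hypothetical script: statements end with ';' followed by a line break.
SCRIPT = """DROP TABLE IF EXISTS usage_summary;
CREATE TABLE usage_summary (meter_id TEXT, note TEXT);
INSERT INTO usage_summary VALUES ('m1', 'a;b');
INSERT INTO usage_summary VALUES ('m2', 'ok')"""


def split_statements(script):
    """Split on ';' directly followed by a line break; drop empty pieces."""
    parts = re.split(r";[ \t]*\r?\n", script)
    return [part.strip() for part in parts if part.strip()]


conn = sqlite3.connect(":memory:")
statements = split_statements(SCRIPT)
for statement in statements:
    conn.execute(statement)

rows = conn.execute(
    "SELECT meter_id, note FROM usage_summary ORDER BY meter_id"
).fetchall()
```

Requiring the line break after the semicolon means a semicolon inside a string literal such as `'a;b'` does not split the statement, as long as the literal does not itself contain `;` at the end of a line.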
KNIME Big Data Extension
• KNIME Big Data Access Nodes
– preconfigured connectors
– HDFS File Handling
– Hive/Impala Loader
• Big Data Platforms
– HDFS, Hive, Impala, HP Vertica, Hortonworks, ParStream, SAP HANA (planned), Teradata (planned), …
• Spark MLlib integration (coming soon)
• Streaming Executor (coming soon)
HDFS File Handling
• KNIME & Extensions -> KNIME File Handling Nodes
• HDFS Connection and HDFS File Permission nodes
Hive/Impala Loader
• Upload a KNIME data table to Hive/Impala
KNIME Big Data Extension: Download and Install
KNIME.com Extension Store
License Required!
Installation Instructions
http://tech.knime.org/installation-instructions
Product Description
http://www.knime.org/knime-big-data-extension
License on KNIME Store
http://tech.knime.org/knime-store
30-day trial license available with special Promotion [email protected]
Thank You!
• We are hiring
– Java Hadoop/BigData developers!
– Senior Software Engineer - Web Development
• Whitepaper “KNIME opens the Doors to Big Data”
http://www.knime.org/files/big_data_in_knime_1.pdf
• Blog Post “Integrating Big data is as Easy as 1,2,3,4”
http://www.knime.org/blog/integrating-big-data-is-as-easy-as-1-2-3-4