Unified Data Analytics Platform (with Zeppelin, Ambari, Geode, SpringXD and HAWQ)
by Christian Tzolov (@christzolov)
Whoami
Christian Tzolov
Technical Architect at Pivotal, BigData, Hadoop, SpringXD, Apache Committer, Crunch PMC member
ctzolov@pivotal.io
blog.tzolov.net
@christzolov
Contents
• DEMO
• Zeppelin Interpreters
  • PSQL (to become JDBC in 0.6.x)
  • Geode
  • SpringXD
• Apache Ambari
  • Zeppelin Service
  • Geode, HAWQ and Spring XD services
  • Webpage Embedder View
Demo: Twitter Streams with SpringXD, Geode and HAWQ
Technical Stack
• Apache HDFS: Data Lake - PHD or HDP Hadoop
• Apache HAWQ: SQL on Hadoop (OLAP)
• Apache Geode: In-memory data grid (OLTP)
• Spring XD: Integration and Streaming Runtime
• Apache Ambari: Manages All Clusters
• Apache Zeppelin: Web UI for interaction with Data Systems
[Architecture diagram: Hadoop/HDFS, Geode, HAWQ, SpringXD, Ambari, Zeppelin]
Spring XD
Orchestrates and automates all steps across multiple data stream pipelines
Sources: • HTTP • Tail • File • Mail • Twitter • Gemfire • Syslog • TCP • UDP • JMS • RabbitMQ • MQTT • Kafka • Reactor TCP/UDP
Processors: • Filter • Transformer • Object-to-JSON • JSON-to-Tuple • Splitter • Aggregator • HTTP Client • Groovy Scripts • Java Code • JPMML Evaluator • Spark Streaming
Sinks: • File • HDFS • JDBC • TCP • Log • Mail • RabbitMQ • Gemfire • Splunk • MQTT • Kafka • Dynamic Router • Counters
Apache Geode
• Cache - Performance / Consistency / Resiliency
• Region - Highly available, redundant, distributed Map
China Railway Corporation
• 5,700 train stations
• 4.5 million tickets per day
• 20 million daily users
• 1.4 billion page views per day
• 40,000 visits per second
Indian Railways
• 7,000 stations
• 72,000 miles of track
• 23 million passengers daily
• 120,000 concurrent users
• 10,000 transactions per minute
Apache HAWQ
• Built around a Greenplum MPP DB
• 100% ANSI SQL compliant: SQL-92/99/2003…
• ODBC and JDBC
• Hadoop Native: Parquet, HDFS and YARN
• Extensible - Web Tables, PXF
• On TPC-DS, outperforms Impala by 454% overall
Demo
tweets = twittersearch --query=<keyword> | hdfs --directory=/user/zeppelin/xd/tweets
geodeTap = tap:stream:tweets > gemfire-json-server --regionName=regionTweet
hawqTap = tap:stream:tweets > transform --script=tweetJsonToTsv.groovy | gpfdist --table=xdsink
tweetsCount = tap:stream:tweets > json-to-tuple | transform --expression='payload.id_str' | counter
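The hawqTap stream relies on a tweetJsonToTsv.groovy script that the slides do not show. As a rough illustration of what such a JSON-to-TSV step might do, here is a minimal Python sketch. The selected fields (FIELDS) and the function name tweet_json_to_tsv are assumptions made for this example, not the contents of the actual Groovy script.

```python
import json

# Hypothetical column selection; the real tweetJsonToTsv.groovy script
# is not shown in the slides and may pick different fields.
FIELDS = ("id_str", "created_at", "lang", "text")


def tweet_json_to_tsv(raw: str, fields=FIELDS) -> str:
    """Convert one tweet JSON document into a single tab-separated row."""
    tweet = json.loads(raw)
    cells = []
    for name in fields:
        value = tweet.get(name)
        cell = "" if value is None else str(value)
        # Tabs and line breaks inside a value would break the row layout,
        # so they are flattened to spaces.
        for ch in ("\t", "\r", "\n"):
            cell = cell.replace(ch, " ")
        cells.append(cell)
    return "\t".join(cells)


if __name__ == "__main__":
    sample = json.dumps({"id_str": "42", "lang": "en", "text": "hello\tworld"})
    print(tweet_json_to_tsv(sample))
```

Missing fields become empty cells, so every row keeps the same number of columns for the table that the gpfdist sink loads into.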
SpringXD Interpreter(s)
• %xd.stream and %xd.job
• Multiple streams or jobs in a paragraph.
• Special Deploy/Launch Semantics
• Zeppelin Dynamic Forms (${…})
• Comprehensive Stream and Job DSL auto-completion (Ctrl+.)
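Dynamic Forms let a paragraph carry ${name} or ${name=default} placeholders that are filled from form input before the text reaches the interpreter. The Python sketch below only illustrates that substitution idea; it is not Zeppelin's implementation, it ignores select and checkbox forms, and the names FORM and fill_forms are hypothetical.

```python
import re

# Matches ${name} or ${name=default}; other form types are not handled here.
FORM = re.compile(r"\$\{(\w+)(?:=([^}]*))?\}")


def fill_forms(paragraph: str, values: dict) -> str:
    """Replace each placeholder with the submitted value or its default."""

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in values:
            return str(values[name])
        # Fall back to the inline default, or to an empty string if none.
        return default if default is not None else ""

    return FORM.sub(substitute, paragraph)


if __name__ == "__main__":
    text = "tweets = twittersearch --query=${keyword=hadoop} | hdfs"
    print(fill_forms(text, {}))
    print(fill_forms(text, {"keyword": "geode"}))
```

With this approach the same stream definition can be redeployed for a different search term by changing only the form value.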
SpringXD Conf
PSQL Interpreter
• Prefix: %psql.sql
• PostgreSQL, HAWQ/PXF, Greenplum … JDBC
• PSQL command line shell (via %sh)
• Zeppelin Dynamic Forms (${…})
• Comprehensive SQL/JDBC auto-completion (Ctrl+.)
PSQL Configuration
PSQL Doc
https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/postgresql.html
PSQL/HAWQ Demo
• http://10.68.58.121:9995/#/notebook/2B2ZYS18Y
Geode Interpreter
• Prefix: %geode.oql
• OQL and PDX nested access (user.name)
• Geode command line shell (via %sh)
• Zeppelin Dynamic Forms (${…})
• Basic OQL auto-completion (Ctrl+.)
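Nested access means an OQL query can reach into a stored object with a dotted path, for example something like SELECT t.user.name FROM /regionTweet t (an illustrative query, assuming tweets stored as JSON in the regionTweet region from the demo). The Python sketch below mimics only the dotted-path lookup on a plain dictionary; resolve_path is a hypothetical helper and does not involve Geode or PDX.

```python
def resolve_path(record: dict, path: str):
    """Follow a dotted path such as 'user.name' through nested dictionaries.

    Returns None when any step of the path is missing.
    """
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


if __name__ == "__main__":
    tweet = {"id_str": "42", "user": {"name": "christzolov", "lang": "en"}}
    print(resolve_path(tweet, "user.name"))
    print(resolve_path(tweet, "user.location"))
```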
Geode Configuration
Geode Doc
https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/geode.html
Geode Tutorial
• http://10.68.58.121:9995/#/notebook/2AW57BUN4
Apache Ambari
Zeppelin, Geode, HAWQ, SpringXD Services …
Ambari Services
• Ambari Zeppelin Service: github, rpm, blog
• Ambari Geode Service: github, rpm
• Ambari SpringXD Service: github
• Ambari HAWQ Service (Pivotal BDS dist)
Ambari Blueprint
http://<ambari>:8080/api/v1/clusters/mv10?format=blueprint
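The blueprint URL above can also be requested from a script. The following Python sketch builds such a request with HTTP Basic authentication using only the standard library. The host name, the credentials and the helper name blueprint_request are placeholders chosen for this example; mv10 is the cluster name from the slide.

```python
import base64
import urllib.request


def blueprint_request(host, cluster, user, password, port=8080):
    """Build a GET request that asks Ambari to export a cluster as a blueprint."""
    url = f"http://{host}:{port}/api/v1/clusters/{cluster}?format=blueprint"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Placeholder host and credentials; replace with real values.
    req = blueprint_request("ambari.example.com", "mv10", "admin", "admin")
    print(req.full_url)
    # To actually fetch the blueprint JSON from a reachable Ambari server:
    # with urllib.request.urlopen(req) as response:
    #     print(response.read().decode())
```

The network call is left commented out so that the sketch runs without a live Ambari server.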
Webpage Embedder
https://github.com/tzolov/ambari-webpage-embedder-view
Stay in touch
ctzolov@pivotal.io
blog.tzolov.net
@christzolov
https://nl.linkedin.com/in/tzolov