Cloudera Impala is an open-source project under the Apache License that enables real-time, interactive analytical SQL queries on data stored in HBase or HDFS. The work was inspired by the Google Dremel paper, which is also the basis for Google BigQuery. Impala provides access to the same unified storage platform through its own distributed query engine and does not use MapReduce. It also shares the same metadata, SQL syntax (HiveQL-like), ODBC driver and user interface (Hue Beeswax) as Hive. Beyond the traditional Hadoop approach, which aims to provide a low-cost, resilient solution for batch-oriented distributed data processing, more and more effort in the Big Data world is pursuing the right solution for ad-hoc, fast queries and real-time processing of large datasets. In this presentation, we explore how to run interactive queries with Impala, the advantages of the approach and its architecture, and show how it optimizes data systems, including a practical performance analysis.
Real-time Big Data Analytics Engine using Impala
Jason Shih Etu 28 Sept, HIT 2013
Outline
• Motivation & users' perspective
• Impala architecture and data analytics stack overview
• Performance benchmark
• Use cases (demo)
HIT 2013 2
Motivation & Users’ Perspective • Leverage existing Hadoop deployment
• Reuse Hive metadata, metastore, DDL & JDBC/ODBC drivers
• File formats widely supported in Hadoop
• Read performance: disk awareness and short-circuit reads
• MPP SQL query engine (over Hadoop):
  • billions to trillions of records at interactive speeds
  • both analytical & transactional
  • general-purpose & ad-hoc
• MR:
  • High latency, unsuitable for interactive workloads
  • Disk-based intermediate outputs
  • Execution strategies (lack of optimization based on data statistics)
  • Task and scheduling overhead
    • Task launch delay of 5~10 sec (pre-defined delay due to the periodic heartbeat for newly scheduled tasks)
Motivation & Users’ Perspective (cont’)
• High performance:
  • In-memory query engine
  • C++ instead of Java
  • Runtime code generation
  • Completely new execution engine (cf. the MR framework)
  • Data locality and short-circuit reads:
    • HDFS-2246: avoid HDFS API overhead
    • HDFS-347: making short-circuit local reads secure
  • Intermediate data never hits disk
  • Data streamed to the client
Motivation & Users’ Perspective (cont’)
• MPP-RDB paradigm
• HDFS:
  • Scalability & availability
  • Price/performance & commodity hardware
• MPP DW appliances:
  • Exadata, Vertica, HANA, Aster (SQL-MapReduce), HAWQ (Pivotal HD) & Dremel, etc.
  • Pros:
    • Very mature & highly optimized engines
  • Cons:
    • Generally not fault-tolerant
      • for long-running queries as the cluster scales up
    • Lack of rich analytics (machine learning)
• Impala:
  • Real-time queries in Apache Hadoop, sitting atop HDFS
  • ~2010-2012, 7 FTEs (Marcel Kornacker)
  • Completely open source, ASLv2
  • GA: connectors for BI/DW generally available
Google F1 - The Fault-Tolerant Distributed RDBMS, May 2012
Ref: http://www.wired.com/wiredenterprise/2012/10/cloudera-impala-hadoop/
Impala Overview: SQL Support
• Functionality highlights:
  • SQL-92 features, minus correlated subqueries
  • SELECT, INSERT INTO ... SELECT, INSERT INTO ... VALUES(...)
  • ORDER BY requires LIMIT
  • Flexible file formats: RCFile
• Unsupported/limitations:
  • The WITH clause does not support recursive queries
  • Only hash joins
    • joined tables have to fit in the aggregate memory of all executing nodes
  • No beyond-SQL features:
    • buckets, samples, transforms, arrays, structs, maps, XPath and JSON
• UDF support:
  • Impala 1.2: supports Hive UDFs (existing jars, without recompiling)
  • Impala-native UDFs/UDAs, registered in the metadata catalog
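The ORDER BY restriction above is worth illustrating; a minimal sketch against a hypothetical orders table (the table and column names are assumptions, not from the slides):

```sql
-- Impala 1.x rejects a bare ORDER BY; it must be paired with LIMIT
-- so each node only keeps a bounded top-N buffer in memory.
SELECT customer_id, total
FROM orders
ORDER BY total DESC
LIMIT 10;
```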
Impala SQL: create table
Ref: SQL Language Element: http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_langref_sql.html
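The create-table slide itself is an image; as a hedged sketch of the DDL it covers (table names, columns and the HDFS path are illustrative only):

```sql
-- Managed table stored as an RCFile
CREATE TABLE logs (
  host  STRING,
  ts    BIGINT,
  bytes INT
)
STORED AS RCFILE;

-- External table over data that already lives in HDFS
CREATE EXTERNAL TABLE raw_logs (line STRING)
LOCATION '/user/etu/raw_logs';
```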
Architecture Overview • Two daemons:
• impalad:
  • runs on all HDFS DNs
  • functions as the distributed query engine
  • handles client and internal requests (query execution)
  • designs execution plans for queries and processes queries on DNs
  • Thrift services for these two roles
• statestored:
  • cluster metadata, name service & metadata distribution
    – cf. the Hive metastore: metadata in an RDB
  • metadata is updated when impalad processes are added/removed
  • daemons cache metadata (INVALIDATE METADATA or REFRESH)
  • exports a Thrift service
  • sends periodic heartbeats, checks for live backends and pushes new data
  • failure of the statestore won't affect query execution, except for stale DN state
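The cached-metadata point can be made concrete with the two statements Impala provides (the table name is illustrative):

```sql
-- After new files are added outside Impala (e.g. via Hive or hdfs dfs -put),
-- reload the file/block metadata for that one table:
REFRESH logs;

-- After tables are created or dropped outside Impala,
-- discard and reload the cached catalog metadata:
INVALIDATE METADATA;
```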
Architecture Overview: Impala daemons
• impalad:
  • Impala 1.1 integrates Sentry for a fine-grained authorization framework
  • Daemon startup args (default):
    • impalad -log_dir=/opt/impala/var/log/impala -state_store_port=24000 -state_store_host=impala-server -be_port=22000
  • Enabling security:
    • relies on the existing Kerberos subsystem for the authentication framework
    • -use_statestore -kerberos_reinit_interval=60 -principal=impala/[email protected] -keytab_file=impala.keytab
  • Authorization:
    • -authorization_policy_file arg, fed with an .ini-format file
    • divided into [groups] & [roles] (optional: [databases] & [users])
    • [users] overrides the OS-level mapping of users to groups
• statestored:
  • daemon startup:
    • statestored -log_dir=/opt/impala/var/log/impala -state_store_port=24000
  • enabling Kerberos:
    • -kerberos_reinit_interval=60 -principal=impala/[email protected] -keytab_file=impala.keytab
  • available flags:
    • http://statestored-server:25010/varz
Architecture Overview (cont’)
• Query execution phases:
  • planner, coordinator, executor
  • queries arrive via JDBC/ODBC, the Thrift API/CLI, or Hue/Beeswax
  • the planner turns a request into a collection of plan fragments
  • the coordinator initiates execution on impalad instances local to the data
Architecture Overview: Query Execution
• Plan fragments are built upon a request from a JDBC/ODBC or Thrift client
• The coordinator initiates execution on the impalad instances
• Intermediate results are streamed between impalads
• Final results are streamed back to the client
Architecture Overview: Query Plan
• Plan nodes & operators:
  • Depth-first execution tree
  • Scan, HashJoin, HashAggr, Union, TopN, Exchange
• Two-phase process:
  • Single-node plan (left-deep tree)
  • Plan fragments: partitioning the operator tree
    • Fragment: a distributed, atomic executable unit (of plan nodes)
• Distributed plans:
  • Query operators are fully distributed
  • Max. scan locality & min. data movement
• Parallel joins:
  • Order: FROM clause
  • Broadcast join & partitioned join
  • Future roadmap: cost-based optimization based on column stats & the cost of data transfers
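The plan shapes described above can be inspected directly: EXPLAIN prints the fragments Impala would run without executing the query (the tables here are hypothetical):

```sql
EXPLAIN
SELECT c.region, SUM(o.total)
FROM orders o
JOIN customers c ON o.cust_id = c.id
GROUP BY c.region;
-- The output lists the operators discussed above: HDFS scans,
-- a hash join, pre- and merge-aggregation, and the exchange
-- nodes that connect the plan fragments.
```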
Architecture Overview: Query Plan (cont’)
Logging and Profile • Impala logs:
• Logging level controlled by:
  • the GLOG_v environment variable:
    – Default level 1: connection logging and execution profiles
    – Level 2: logs each RPC initiated and execution progress info
    – Level 3: everything, plus every row read
  • the -logbuflevel daemon startup flag
• Examples:
  • $IMPALA_HOME/var/log/impala/{impalad,statestore}.{INFO,WARNING,ERROR}
  • Consolidated: impala-server.log & impala-state-store.log
  • http://impalad-server:25000/logs
• Content:
  • Startup options: CPU, available spindles, flags, version and machine info
  • Query profile: composition, degree of data locality, throughput statistics and response time
  • Auditing log featured in release 1.1.1
• Extensive analytics data for query execution: • query profile stored in zlib-compressed fmt: • $IMPALA_HOME/var/log/impala/profiles • http://impalad-server:25000/queries
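From impala-shell, the same profile can be pulled up right after a statement runs; a short sketch (the query is illustrative):

```sql
-- in impala-shell:
SELECT count(*) FROM demo;
PROFILE;   -- dumps the detailed runtime profile of the query above
```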
Performance Tip • Partitioning
• Large tables that are always or almost always queried with conditions on the partitioning columns
• JOIN:
  • Broadcast join by default
  • Partitioned join:
    • suitable for large tables of roughly equal size
    • subsets of rows can be processed in parallel by shipping a portion of each table
  • Join the biggest table first
  • Then join the tables with the most selective filters
• INSERT:
  • not suitable for loading large quantities of data into HDFS-based tables, due to the lack of parallelized operations
  • stage temporary files in an ETL pipeline and upload them to HDFS (then refresh)
• Resource usage:
  • impalad startup flag: “-mem_limits”
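Two of the tips above in query form; the tables are illustrative, and the hint syntax is the square-bracket form Impala 1.x accepts:

```sql
-- Override the default broadcast strategy with a partitioned join
-- when both sides are large:
SELECT b.id, d.name
FROM big_facts b
JOIN [SHUFFLE] big_dims d ON b.dim_id = d.id;

-- Bulk-load via HDFS staging instead of many small INSERTs,
-- then tell Impala about the new files:
LOAD DATA INPATH '/user/etu/staging' INTO TABLE big_facts;
REFRESH big_facts;
```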
Troubleshooting Hint • Queries are slow?
• Test: “select count(*) from table”
• A non-zero “Total remote scan volume” in the impalad log indicates either that some DNs are not running impalad, or that an impalad instance failed to contact one or more other impalad instances.
• Missing impalad instances on DNs
  • live backends: http://statestore:25010/metrics
• Data locality and native checksumming (>= CDH 4.2)
  • Enable the properties “dfs.client.read.shortcircuit” & “dfs.client.read.shortcircuit.skip.checksum”
  • Rebuild/reinstall the Hadoop native library “libhadoop.so” if needed
  • Errors:
– Unknown disk id. This will negatively affect performance. Check your hdfs settings to enable block location metadata
– Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Troubleshooting Hint (cont’)
• Queries getting slower?
  • impalad starts paging after the memory limit is exceeded
    • E.g.: mem-limit.h:86] Query: 0:0 Exceeded limit: limit=26996031488 consumption=26996148624
• Incorrect results?
  • Stale metadata (GA: REFRESH; post-GA: INVALIDATE METADATA)
• Invalid query?
  • Cross-check the query in Hive
  • Useful debugging info in the Impala service logs
  • Invalid/unsupported statements:
• http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_langref.html#langref
• Auth errors:
  • Server logging:
    • “Minor code may provide more information (Cannot contact any KDC for realm)”
    • “GSSAPI Error: Unspecified GSS failure”
  • Client: “Error connecting: <class 'thrift.transport.TTransport.TTransportException'>, TSocket read 0 bytes”
  • Ensure:
    • a valid Kerberos ticket lifetime on the client
    • the “-s” service principal and “-k” flag when connecting to a kerberized impalad
Limitation and Wish List
• Limitations:
  • Subqueries referenced in the SELECT list
  • Optional WITH clause before INSERT
  • Recursive queries in WITH clauses
  • Inconsistent VIEWs
  • Parentheses in WHERE clauses
• Wish list:
  • SQL modeling tools
  • Fault-tolerant queries
  • Memory management (caching Parquet tables) & usage estimation
  • Aggregation over groups of columns (> 30, etc.)
Impala: Now & Future Roadmap • Now (1.1.x/1.0)
• OS Support: • RHEL/CentOS 5.7, Ubuntu, Debian, SLES, and Oracle Linux
• Connectors: JDBC/ODBC drivers
• DDL support & SQL performance optimization
• Fast & memory-efficient joins & aggregation
• File formats: Parquet, Avro & LZO-compressed
• Future (1.2) – late 2013
  • UDFs and extensibility
  • Automatic metadata refresh
  • In-memory HDFS caching
  • Cost-based join order optimization
  • Preview of the YARN-integrated resource manager
• 2.0 roadmap – first third of 2014
  • SQL 2003-compliant analytic window functions
  • Additional authentication mechanisms
  • UDTFs (user-defined table functions)
  • Intra-node parallelized aggregations and joins
  • Nested data
  • YARN-integrated resource manager
  • Additional data types, including Date and Decimal
More Information & Related Works
• “Dremel: Interactive Analysis of Web-Scale Datasets”, Sergey Melnik et al., Google
• “Cloudera Impala: Real-Time Queries in Apache Hadoop, For Real”
  http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/
• “Impala unlocks Interactive BI on Hadoop with MicroStrategy”, Justin Erickson & Jochen Demuth, Cloudera
• “Cloudera Impala Performance Evaluation”, Yukinori Suda
• “HANA vs Impala, on AWS Cloud”, Aron MacDonald
• “Spark and Shark: High-speed In-memory Analytics over Hadoop Data”, Reynold Xin, AMPLab
• Stinger Initiative: http://hortonworks.com/blog/100x-faster-hive/
• Apache Drill: http://incubator.apache.org/drill/
Performance Evaluation
(Bar chart: top speed in km/h of a shark, an impala, a pig and an elephant. Ref: Wiki & http://www.speedofanimals.com)
Breakdown of DNS Anomaly Analytics
Two DNs + master: dual E5620 2.40 GHz, 32 GB MEM each, 4 spindles of 2 TB each
(Chart: HDFS size (GB) vs. query resp. (sec))
Data Volume and Ingest
                    1D     1W     1M      2M
Data (Raw) (GB)     5.1    35     140     280
Data (HDFS) (GB)    3.8    25.9   103.6   207.2
Blocks (HDFS)       31     211    844     1598
MEvt                42     291    1,166   2,209
PIG vs. Impala
• Domain-level computation in streaming pre-processing
• DN sort throughput: ~120 MB/s; SIP/query: ~50 MB/s
• Processing time scales linearly with data volume
(Chart: query resp. (sec); Impala: 71 s, about 7 times faster.)
Observation & Estimation
• Speed-up: 4.5~7 times
• DL calc.: 57~70% memory usage
• Data ingest:
  • est. ~3 TB takes ~55K sec, plus pre-processing time
  • throughput constrained by the GbE link (in/out bound)
  • avg. throughput ~80 MB/s (non-encrypted file transfer)
• RTQ: ~15K sec to process 3 TB, cf. ~115K sec based on MR
Query Throughput & Latency
• Queries:
  • 20 from TPC-DS
  • 3 categories:
    • interactive: 1 month of data
    • reports: several months
    • deep analytics: all data
• Fact table:
  • 1 TB of snappy-compressed sequence files / 5 years
• Resource level:
  • 20 nodes, 24 cores/node
• Speed-up:
  • interactive: 25~68×
  • reports: 6~56×
  • deep analytics: 6~55×
Ref: “Impala: A Modern, Open-Source SQL Engine for Hadoop”, Marcel Kornacker, Cloudera
Impala vs. Stinger
• Stinger:
  • optimized execution plans
  • the Tez framework optimizes execution
  • columnar file format
Ref: Cloudera Impala Overview, Scott Leberknight, Cloudera.
Impala Use Cases
• Offloads the DW for ad-hoc query environments, ETL and archiving
• Interactive BI/analytics on large volumes of data
• Real-time responses for unstructured data analysis
Impala and HIVE
• Impala:
  • native MPP query engine for low runtime overhead & interactive SQL
  • no fault tolerance
  • GA: UDFs supported
• Hive:
  • MapReduce as the execution engine
  • fault-tolerant, leveraging the MR framework
  • high runtime overhead (extensive layering)
  • UDFs
• Common to clients:
  • SQL syntax highly compatible with HiveQL
  • ODBC/JDBC drivers
  • metadata (table definitions)
  • Hue
Data Warehouse Offload
Ref: Hadoop and the Data Warehouse: When to Use Which, Teradata
Query Run Times • Table with 60M Records
Ref: HANA vs Impala, on AWS Cloud
TPC-H Query Run Times • Lineitem table 60M Rows
Ref: HANA vs Impala, on AWS Cloud
• On-demand customer segmentation based on various demographic and mobile-behavior attributes
• On-demand customer profiling through fast screening & ranking of critical attributes
With the power of distributed in-memory computation on Hadoop, Impala enables market analysts to conduct various interactive analytics, such as OLAP, statistical correlation and data mining, on big data.
「目標族群」關聯屬性分析 (association analysis of target-group attributes)
(Charts, largely garbled in extraction, comparing two audience segments: social-network share — Facebook 43%/44%, Twitter 31%/30%, Google+ 19%/17%, LinkedIn 7%/9% — plus gender splits (53%/47% and 56%/44%) and app-usage breakdowns.)
DEMO
• CREATE TABLE, LOAD DATA from HDFS:

DROP TABLE IF EXISTS demo;
CREATE EXTERNAL TABLE demo (
  a string,
  b int,
  c int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/etu/demo';

• Pig & Impala:
  • SUM
  • SUM with GROUP BY
DEMO (cont’)
• SUM in Pig:

a = load 'demo/demo_data.csv' using PigStorage(',') as (col1:chararray, col2:int, col3:int);
b = foreach a generate col1, col2, col3, 1 as col4;
d = group b by col4;                  -- constant key: one group over all rows
d1 = foreach d generate SUM(b.col3);  -- sum col3, matching sum(demo.c) in Impala
store d1 into 'demo/count2' using PigStorage(',');
• SUM in Impala: SELECT sum(demo.c) FROM demo;
DEMO (cont’)
• SUM with GROUP BY in Pig:

a = load 'demo/demo_data.csv' using PigStorage(',') as (col1:chararray, col2:int, col3:int);
b = foreach a generate col1, col2, col3, 1 as col4;
c = group b by col1;
c1 = foreach c generate group, SUM(b.col2);
store c1 into 'demo/count1' using PigStorage(',');
• SUM with GROUP BY in Impala SELECT demo.a AS tag, sum(demo.b) AS val FROM demo GROUP BY demo.a;
DEMO (cont’)
• Speed-up:
(Chart: query resp. (sec); Impala ~60× and ~18× faster than Pig.)
Two DNs, same spec as the DNS log analytics setup: dual E5620, 32 GB MEM each. ~100 times faster as the cluster scales.
Question? [email protected]
Slideshare
www.slideshare.net/hlshih/hit2013-impala-0925etu
Acknowledgement Dr. CM Fan, MFactory, SYSTEX
www.etusolution.com  [email protected]
Taipei, Taiwan: 318, Rueiguang Rd., Taipei 114, Taiwan  T: +886 2 7720 1888  F: +886 2 8798 6069
Beijing, China: Room B-26, Landgent Center, No. 24, East Third Ring Middle Rd., Beijing, China 100022  T: +86 10 8441 7988  F: +86 10 8441 7227
Contact