
Apache Kudu

Zbigniew Baranowski

Intro

What is KUDU?

• New storage engine for structured data (tables) – does not use HDFS!

• Columnar store

• Mutable (insert, update, delete)

• Written in C++

• Apache-licensed, open source

– Quite new: version 1.0 was recently released

• First commit on October 11th, 2012

– …and immature?

KUDU tries to fill the gap

• HBase (on HDFS) excels at

– Fast random lookups by key

– Making data mutable

• HDFS excels at

– Scanning large amounts of data at speed

– Accumulating data with high throughput

Table oriented storage

• A Kudu table has an RDBMS-like schema

– Primary key (one or many columns)

• No secondary indexes

– Finite and constant number of columns (unlike HBase)

– Each column has a name and a type

• boolean, int(8,16,32,64), float, double, timestamp, string, binary

• Horizontally partitioned (range, hash) – partitions are called tablets

– tablets can have 3 or 5 replicas
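Hash partitioning can be pictured with a small sketch: a hash of the primary-key value, taken modulo the number of buckets, selects the tablet. This is only an illustration in Python (Kudu internally hashes the encoded key bytes); the function name is made up.

```python
import hashlib

def hash_bucket(key: str, num_buckets: int) -> int:
    """Map a primary-key value to one of num_buckets hash partitions (tablets)."""
    digest = hashlib.md5(key.encode()).digest()
    # Use the first 8 bytes of the digest as a stable integer hash.
    return int.from_bytes(digest[:8], "big") % num_buckets

# Every row with the same key always lands in the same tablet,
# so point lookups only have to contact one tablet server.
bucket = hash_bucket("run-1234", 64)
```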

Data Consistency

• Writing

– Single-row mutations are done atomically across all columns

– No multi-row ACID transactions

• Reading

– Tuneable freshness of the data

• read whatever is available

• or wait until all changes committed in WAL are available

– Snapshot consistency

• changes made during scanning are not reflected in the results

• point-in-time queries are possible (based on a provided timestamp)
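The snapshot-consistency behaviour above can be sketched in a few lines of Python: each committed change carries a timestamp, and a point-in-time scan sees only versions committed at or before the snapshot timestamp. The names (`Version`, `snapshot_read`) are illustrative, not Kudu's API.

```python
from dataclasses import dataclass

@dataclass
class Version:
    commit_ts: int   # timestamp at which this value was committed
    value: str

def snapshot_read(history: dict, snapshot_ts: int) -> dict:
    """For each key, return the newest value committed at or before snapshot_ts.

    history maps key -> list[Version]; keys with no visible version are omitted,
    so changes made after the snapshot timestamp never appear in the result.
    """
    result = {}
    for key, versions in history.items():
        visible = [v for v in versions if v.commit_ts <= snapshot_ts]
        if visible:
            result[key] = max(visible, key=lambda v: v.commit_ts).value
    return result
```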

Kudu simplifies the Big Data deployment model for online analytics (low-latency ingestion and access)

• Classical low-latency design

[Diagram: stream sources feed events into a staging area; indexed data is flushed immediately for fast data access, while big files are flushed periodically to HDFS for batch processing]

Implementing low latency with Kudu

[Diagram: stream sources write events directly to Kudu, which serves both batch processing and fast data access from the same store]

Kudu Architecture

Architecture overview

• Master server (can be multiple masters for HA)

– Stores metadata (table definitions)

– Tablet directory (tablet locations)

– Coordinates cluster reconfigurations

• Tablet servers (worker nodes)

– Write and read tablets

• Stored on local disks (no HDFS)

– Track the status of tablet replicas (followers)

• Replicate data to followers
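The master's tablet directory can be pictured as a small lookup table: a client asks the master which server hosts a tablet's leader, then sends its writes there. This is a conceptual sketch only; `tablet_map` and `locate_leader` are made-up names, and the entries mirror the example table on the next slide.

```python
# Illustrative tablet directory, as held by the master.
tablet_map = {
    "TEST1": {"leader": "TS1", "followers": ["TS2", "TS3"]},
    "TEST2": {"leader": "TS4", "followers": ["TS1", "TS2"]},
    "TEST3": {"leader": "TS3", "followers": ["TS4", "TS1"]},
}

def locate_leader(tablet_id: str) -> str:
    """What the master answers when a client looks up a tablet's write target."""
    return tablet_map[tablet_id]["leader"]

def all_replicas(tablet_id: str) -> list:
    """Leader plus followers: every server holding a copy of the tablet."""
    entry = tablet_map[tablet_id]
    return [entry["leader"]] + entry["followers"]
```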

Tables and tablets

Map of table TEST (held by the master):

TabletID  Leader  Follower1  Follower2
TEST1     TS1     TS2        TS3
TEST2     TS4     TS1        TS2
TEST3     TS3     TS4        TS1

[Diagram: tablets TEST1-TEST3 spread across TabletServer1-TabletServer4, each with one leader and two follower replicas]

Data changes propagation in Kudu (Raft Consensus - https://raft.github.io)

[Diagram: a client, after locating the tablet through the master, sends a write to the leader replica of tablet 1 on tablet server X; the leader appends the change to its WAL and replicates it to the follower WALs on tablet servers Y and Z; the change is committed on each replica]
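The commit rule behind the diagram is Raft's majority rule: a change is acknowledged to the client once a majority of replica WALs contain it, so a 3-replica tablet survives one server loss. A conceptual one-liner in Python (not Kudu code):

```python
def is_committed(acks: int, replicas: int) -> bool:
    """Raft-style commit: a write is durable once a strict majority
    of the replicas (leader included) have appended it to their WAL."""
    return acks > replicas // 2

# With 3 replicas, the leader plus one follower is already a majority,
# so the client does not have to wait for the slowest replica.
```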

Tablet server

Insert into a tablet (without uniqueness check)

• Inserts first go to the MemRowSet, a B+tree whose leaves (rows: Col1, Col2, Col3) are sorted by primary key

• When full, the MemRowSet is flushed to a DiskRowSet (32MB): a columnar store encoded similarly to Parquet, rows sorted by PK, with the PK {min, max} range recorded per set

• Each DiskRowSet keeps Bloom filters for PK ranges, stored in a cached B-tree

– There might be thousands of sets per tablet

• An interval tree keeps track of the PK ranges covered by the DiskRowSets
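Together, the PK ranges and Bloom filters let a tablet server skip most DiskRowSets on a key lookup. A conceptual sketch (a Python set stands in for the real Bloom filter, and all names are illustrative):

```python
class DiskRowSet:
    def __init__(self, pk_min, pk_max, keys):
        self.pk_min, self.pk_max = pk_min, pk_max
        self.bloom = set(keys)  # stand-in for the probabilistic Bloom filter

    def may_contain(self, key) -> bool:
        """First check the [min, max] PK range (interval-tree step),
        then consult the Bloom filter to rule the set out cheaply."""
        return self.pk_min <= key <= self.pk_max and key in self.bloom

def candidate_rowsets(rowsets, key):
    """Only these DiskRowSets must actually be read for this key."""
    return [rs for rs in rowsets if rs.may_contain(key)]
```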

DiskRowSet compaction

• Periodic task

• Removes deleted rows

• Reduces the number of sets with overlapping PK ranges

• Does not create bigger DiskRowSets

– the 32MB size of each DRS is preserved

[Diagram: DiskRowSet1 {A, G} and DiskRowSet2 {B, E}, whose PK ranges overlap, are compacted into DiskRowSet1 {A, D} and DiskRowSet2 {E, G}, each still 32MB]
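The compaction step above can be sketched as a merge: collect the rows of the overlapping DiskRowSets, drop deleted ones, and re-cut the sorted result into fixed-size sets. In this toy Python version a row count stands in for the 32MB budget; none of this is Kudu's actual code.

```python
def compact(rowsets, max_rows):
    """rowsets: lists of (pk, deleted) tuples, each list sorted by pk.

    Returns new non-overlapping chunks: deleted rows are removed and the
    surviving rows are globally sorted by primary key before re-cutting.
    """
    live = sorted(
        (pk, deleted)
        for rs in rowsets for (pk, deleted) in rs
        if not deleted
    )
    # Re-cut into fixed-size DiskRowSets (max_rows plays the role of 32MB).
    return [live[i:i + max_rows] for i in range(0, len(live), max_rows)]
```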

How columns are stored on disk (DiskRowSet)

• Each column is stored separately as a sequence of 256KB pages, each with page metadata

• A B-tree index per column maps row offsets to pages

• A primary-key B-tree index maps PKs to row offsets

• Pages are encoded with a variety of encodings, such as dictionary encoding, bitshuffle, or RLE

• Pages can be compressed: Snappy, LZ4 or ZLib
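Two of the page encodings named above are easy to illustrate. These are toy Python versions working on Python lists, only meant to show the idea; real Kudu pages are binary.

```python
def dictionary_encode(values):
    """Replace repeated values with small integer codes plus a dictionary.
    Pays off for low-cardinality string columns (e.g. a datatype column)."""
    dictionary, codes, index = [], [], {}
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

def rle_encode(values):
    """Run-length encoding: collapse consecutive repeats into (value, count).
    Works well on sorted or slowly-changing columns."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs
```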

Kudu deployment

Three options for deployment

• Build from source

• Using RPMs

– 1 core RPM

– 2 service RPMs (master and tablet server)

– One shared config file

• Using Cloudera Manager

– Click, click, click, done

Interfacing with Kudu

Table access and manipulations

• Operations on tables (NoSQL)

– insert, update, delete, scan

– Python, C++, Java API

• Integrated with

– Impala & Hive (SQL), MapReduce, Spark

– Flume sink (ingestion)

Manipulating Kudu tables with SQL (Impala/Hive)

• Table creation

CREATE TABLE `kudu_example` (
  `runnumber` BIGINT,
  `eventnumber` BIGINT,
  `project` STRING,
  `streamname` STRING,
  `prodstep` STRING,
  `datatype` STRING,
  `amitag` STRING,
  `lumiblockn` BIGINT,
  `bunchid` BIGINT
)
DISTRIBUTE BY HASH (runnumber) INTO 64 BUCKETS
TBLPROPERTIES(
  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
  'kudu.table_name' = 'example_table',
  'kudu.master_addresses' = 'kudu-master.cern.ch:7051',
  'kudu.key_columns' = 'runnumber, eventnumber'
);

• DMLs

insert into kudu_example values (1,30,'test',….);
insert into kudu_example select * from data_parquet;
update kudu_example set datatype='test' where runnumber=1;
delete from kudu_example where project='test';

• Queries

select count(*), max(eventnumber) from kudu_example where datatype like '%AOD%' group by runnumber;

select * from kudu_example k, parquet_table p where k.runnumber=p.runnumber;

Creating table with Java

import org.kududb.*;

//CREATING TABLE

String tableName = "my_table";

String KUDU_MASTER_NAME = "master.cern.ch";

KuduClient client = new KuduClient.KuduClientBuilder(KUDU_MASTER_NAME).build();

List<ColumnSchema> columns = new ArrayList<>();

columns.add(new ColumnSchema.ColumnSchemaBuilder("runnumber", Type.INT64).key(true).encoding(ColumnSchema.Encoding.BIT_SHUFFLE).nullable(false).compressionAlgorithm(ColumnSchema.CompressionAlgorithm.SNAPPY).build());

columns.add(new ColumnSchema.ColumnSchemaBuilder("eventnumber", Type.INT64).key(true).encoding(ColumnSchema.Encoding.BIT_SHUFFLE).nullable(false).compressionAlgorithm(ColumnSchema.CompressionAlgorithm.SNAPPY).build());

……..

Schema schema = new Schema(columns);

List<String> partColumns = new ArrayList<>();

partColumns.add("runnumber");

partColumns.add("eventnumber");

CreateTableOptions options = new CreateTableOptions().addHashPartitions(partColumns, 64).setNumReplicas(3);

client.createTable(tableName, schema,options);

……..

Inserting rows with Java

//INSERTING

KuduTable table = client.openTable(tableName);

KuduSession session = client.newSession();

Insert insert = table.newInsert();

PartialRow row = insert.getRow();

row.addLong(0, 1);

row.addString(2, "test");

….

session.apply(insert); //stores them in memory on client side (for batch upload)

session.flush(); //sends data to Kudu

……..

Scanner in Java

//configuring column projection

List<String> projectColumns = new ArrayList<>();

projectColumns.add("runnumber");

projectColumns.add("dataType");

//setting a scan range

PartialRow start = table.getSchema().newPartialRow();

start.addLong("runnumber", 8);

PartialRow end = table.getSchema().newPartialRow();

end.addLong("runnumber",10);

KuduScanner scanner = client.newScannerBuilder(table)

.lowerBound(start)

.exclusiveUpperBound(end)

.setProjectedColumnNames(projectColumns)

.build();

while (scanner.hasMoreRows()) {

RowResultIterator results = scanner.nextRows();

while (results.hasNext()) {

RowResult result = results.next();

System.out.println(result.getString(1)); //getting 2nd column

}

}

Spark with Kudu

wget http://central.maven.org/maven2/org/apache/kudu/kudu-spark_2.10/1.0.0/kudu-spark_2.10-1.0.0.jar

spark-shell --jars kudu-spark_2.10-1.0.0.jar

import org.apache.kudu.spark.kudu._

// Read a table from Kudu

val df = sqlContext.read.options(

  Map("kudu.master" -> "kudu_master.cern.ch:7051",

      "kudu.table" -> "kudu_table")).kudu

// Query using the DF API...

df.select(df("runnumber"),df("eventnumber"),df("db0")).filter($"runnumber"===169864).filter($"eventnumber"===1).show();

// ...or register a temporary table and use SQL

df.registerTempTable("kudu_table")

sqlContext.sql("select id from kudu_table where id >= 5").show()

// Create a new Kudu table from a dataframe schema

// NB: No rows from the dataframe are inserted into the table

kuduContext.createTable("test_table", df.schema, Seq("key"), new CreateTableOptions().setNumReplicas(1))

// Insert data

kuduContext.insertRows(df, "test_table")

Kudu Security

To be done!

Performance(based on ATLAS EventIndex case)

Average row length

• Each row consists of 56 attributes

– Most of them are strings

– A few integers and floats

• Very good compression ratio

– The same as Parquet

[Chart: average row length in bytes for Kudu, Parquet, HBase and Avro, with no compression, Snappy and GZip-like compression, compared with the row length in CSV]

Insertion rates (per machine, per partition) with Impala

• Average ingestion speed

– worse than parquet

– better than HBase

[Chart: insertion speed in kHz for Kudu, Parquet, HBase and Avro, with no compression, Snappy and GZip-like compression]

Random lookup with Impala

• Good random data lookup speed

– Similar to HBase

[Chart: average random lookup time in seconds for Kudu, Parquet, HBase and Avro, with no compression, Snappy and GZip-like compression]

Data scan rate per core with a predicate on non PK column (using Impala)

• Quite good data scanning speed

– Much better than HBase

– If natively supported predicate operations are used, it is even faster than Parquet

[Chart: scan speed in kHz for Kudu, Parquet and HBase, with no compression, Snappy and GZip-like compression]

Kudu monitoring

Cloudera Manager

• A lot of metrics are published through the servers' HTTP endpoints

• All are collected by CM agents and can be plotted

• Predefined CM dashboards

– Monitoring of Kudu processes

– Workload plots

• CM can also be used for Kudu configuration

CM – Kudu host status

CM - Workload plots

CM - Resource utilisation

Observations & Conclusions

What is nice about Kudu

• The first in the open source Big Data world to combine a columnar store with indexing

• Simple to deploy

• It works (almost) without problems

• It scales (depending on how the schema is designed)

– Writing, accessing, scanning

• Integrated with mainstream Big Data processing frameworks

– Spark, Impala, Hive, MapReduce

– SQL and NoSQL on the same data

• Gives more flexibility in optimizing schema design compared to HBase (two levels of partitioning)

• Cloudera is pushing to deliver production-quality software ASAP

What is bad about Kudu?

• No security (it should be added in upcoming releases)

– authentication (who connected)

– authorization (ACLs)

• Raft consensus does not always work as it should

– Too frequent tablet leader changes (sometimes a leader cannot be elected at all)

– Periods without a leader are quite long (sometimes never end)

– This freezes updates on the affected tables

• Handling disk failures

– you have to erase/reinitialize the entire server

• Only one index per table

• No nested types (but there is a binary type)

• Cannot control tablet placement on servers

When can Kudu be useful?

• When you have structured 'big data'

– Like in an RDBMS

– Without complex types

• When sequential and random data access are required simultaneously and have to scale

– Data extraction and analytics at the same time

– Time series

• When low ingestion latency is needed

– and a lambda architecture is too expensive

Learn more

• Main page: https://kudu.apache.org/

• Video: https://www.oreilly.com/ideas/kudu-resolving-transactional-and-analytic-trade-offs-in-hadoop

• Whitepaper: http://kudu.apache.org/kudu.pdf

• KUDU project: https://github.com/cloudera/kudu

• Some Java code examples: https://gitlab.cern.ch:8443/zbaranow/kudu-atlas-eventindex

• Get the Cloudera Quickstart VM and test it
