Veracity think bigdata #2, 6.7.2015


DWH OVER HADOOP

THE BASICS

COLUMNAR FORMATS (ORC/PARQUET)

- Projection push down
- Predicate push down
- Excellent compression ratios
- Column indices
- Max/avg/min values
- Rows must be batched to benefit from these optimizations
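A minimal HiveQL sketch of where these optimizations kick in; the table and column names are illustrative:

  -- Hypothetical events table stored in a columnar format.
  CREATE TABLE events (
    event_time TIMESTAMP,
    user_id    BIGINT,
    payload    STRING
  )
  STORED AS ORC;

  -- Projection push down: only the user_id column stream is read.
  -- Predicate push down: stripes whose min/max statistics rule out
  -- user_id = 42 are skipped without being decompressed.
  SELECT COUNT(*)
  FROM events
  WHERE user_id = 42;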

PARQUET

- Strongly endorsed by Cloudera
- One of the few formats Impala supports (and the most optimal for it)
- Also supported by Hive, Spark, Tajo, Drill & Presto
- Speaking from my own personal experience, a bit more expensive to generate

ORC

- Endorsed by Hortonworks
- Most optimal for Presto
- Spark support was recently introduced

QUERYING ENGINES

HIVE

Hive provides a SQL-like interface for accessing the data (files), called HiveQL. The HQL is translated into M/R code and executed immediately.

- Batch oriented
- Fault tolerant and thus reliable
- Not a DB! Does not support updates & deletes and has no transactions (or does it?)
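A small HiveQL sketch of that interface; the path and names are hypothetical:

  -- Expose raw files on HDFS as a table, without moving them.
  CREATE EXTERNAL TABLE raw_logs (line STRING)
  LOCATION '/data/raw/logs';

  -- This statement is compiled into an M/R job and executed.
  SELECT COUNT(*) FROM raw_logs WHERE line LIKE '%ERROR%';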

LOW LATENCY SQL

Map-Reduce can be compared to a tractor: it's very strong and can plow a field better than any other vehicle, but it's also very slow. As prices of memory dropped, a demand emerged to better utilize it for faster response times.

CLOUDERA IMPALA

- Written in C++
- Utilizes Hive's metadata
- Very fast
- Not fault tolerant
- Doesn't support custom data formats
- Doesn't support complex data types (maps/arrays/structs)
- A bit complicated to set up on non-CDH distributions
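Because Impala reuses Hive's metastore, tables defined in Hive are queryable from impala-shell once the catalog is refreshed; a minimal sketch (table name illustrative):

  -- Pick up tables created or changed outside Impala.
  INVALIDATE METADATA;

  -- Query the same table Hive defined, at interactive speed.
  SELECT COUNT(*) FROM events;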

FACEBOOK PRESTO

Can connect to:

- Cassandra
- Hive
- JMX sources
- Postgres & MySQL

- Allows cross-engine joins
- Used at Facebook to serve online dashboards
- Easy to set up
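A hypothetical cross-engine join: "hive" and "mysql" here are configured Presto catalogs, and the schema/table names are illustrative:

  SELECT u.name, COUNT(*) AS events
  FROM hive.web.events e
  JOIN mysql.crm.users u ON e.user_id = u.id
  GROUP BY u.name;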

SPARK SQL

- Not affiliated with any Hadoop vendor
- Supports all of the optimized file formats (ORC/Parquet/Avro)
- Can auto-discover schema
- Aims to provide second/sub-second latency
- Still not very mature
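A sketch of schema auto-discovery in Spark SQL (1.x syntax; the path is illustrative): the schema is inferred from the Parquet file footers rather than declared up front.

  CREATE TEMPORARY TABLE events
  USING org.apache.spark.sql.parquet
  OPTIONS (path '/data/warehouse/events');

  SELECT COUNT(*) FROM events;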

THE USUAL DATA FLOW

Collect -> Store -> Convert -> Select

- The data latency conflict: lots of fragmented small files, or big optimized files with high latency
- Processing efforts involved in the conversion process should be minimized
- Example (sketched below)
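One common shape of the conversion step, as a hedged HiveQL sketch (names are illustrative): small text files land in a staging table and are periodically rewritten into an optimized table.

  -- Rewrite fragmented text files into one set of ORC files.
  CREATE TABLE events_orc STORED AS ORC
  AS SELECT * FROM events_staging_text;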

A BETTER DATA FLOW

Collec-tor-vert -> Select

- Convert the data as it is being collected, where possible
- Or convert the data as it is being stored (streaming), but without losing optimizations
- How can this be achieved?

SQOOP

- Imports data from RDBMS into Hadoop
- Creates Java classes and Hive tables on import
- Exports data back to RDBMS
- Runs a "Map Only" job to perform the task
- Supports incremental imports
- Now supports importing directly as Parquet
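A hypothetical Sqoop invocation combining these features; the connection string, credentials and table/column names are all illustrative:

  sqoop import \
    --connect jdbc:mysql://db.example.com/shop \
    --username etl -P \
    --table orders \
    --hive-import \
    --incremental append --check-column id \
    --as-parquetfile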

HIVE & ACID

Recently a conceptual change has been introduced into Hive: CRUD with ACID transactions. It is not meant to replace your OLTP database, but rather to supply a better data modification mechanism for a subset of the data.

- Explanation of how it works
- Demo: a simple insert (sketched below)
- Still requires M/R :(
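A minimal HiveQL sketch of how this looks (Hive 0.14+; names illustrative). ACID tables must be bucketed, stored as ORC and flagged as transactional, and the session needs the transactional settings (e.g. hive.txn.manager set to DbTxnManager) enabled:

  CREATE TABLE users_acid (id INT, name STRING)
  CLUSTERED BY (id) INTO 4 BUCKETS
  STORED AS ORC
  TBLPROPERTIES ('transactional'='true');

  -- The simple insert, plus the new UPDATE/DELETE verbs.
  INSERT INTO TABLE users_acid VALUES (1, 'alice');
  UPDATE users_acid SET name = 'bob' WHERE id = 1;
  DELETE FROM users_acid WHERE id = 1;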

HIVE & STREAMING INGEST

With the new ACID capabilities it is now possible to continuously insert data into Hive:

- Data appears almost immediately
- Data is optimized in a columnar format
- Data is compacted by different triggers
- Code snippet (below)
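The snippet in question is the Hive streaming API (org.apache.hive.hcatalog.streaming); a hedged Java sketch, assuming a transactional partitioned table events_acid(id, name) and a metastore at thrift://metastore:9083, both of which are illustrative:

  import java.util.Arrays;
  import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
  import org.apache.hive.hcatalog.streaming.HiveEndPoint;
  import org.apache.hive.hcatalog.streaming.StreamingConnection;
  import org.apache.hive.hcatalog.streaming.TransactionBatch;

  public class StreamDemo {
    public static void main(String[] args) throws Exception {
      // Target: database "demo", table "events_acid", partition dt=2015-07-06.
      HiveEndPoint endPoint = new HiveEndPoint("thrift://metastore:9083",
          "demo", "events_acid", Arrays.asList("2015-07-06"));
      StreamingConnection conn = endPoint.newConnection(true); // create partition if missing

      DelimitedInputWriter writer =
          new DelimitedInputWriter(new String[] {"id", "name"}, ",", endPoint);

      // Each batch groups several transactions for efficiency.
      TransactionBatch batch = conn.fetchTransactionBatch(10, writer);
      batch.beginNextTransaction();
      batch.write("1,alice".getBytes());
      batch.write("2,bob".getBytes());
      batch.commit(); // rows become visible to readers almost immediately
      batch.close();
      conn.close();
    }
  }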

FLUME

- Distributed
- Durable
- Scalable
- Fault tolerant
- Serves for ingestion and basic pre-processing of the data
- Composed of source -> channel -> sink (draw architecture)
- Utilizes Hive's ACID capabilities to instantly stream data into Hive (demo)
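A hypothetical agent configuration wiring these pieces together with Flume 1.6's Hive sink; all names, paths and partition values are illustrative:

  # Source -> channel -> sink, as in the architecture above.
  a1.sources = r1
  a1.channels = c1
  a1.sinks = k1

  # Tail an application log.
  a1.sources.r1.type = exec
  a1.sources.r1.command = tail -F /var/log/app.log
  a1.sources.r1.channels = c1

  # Durable, disk-backed channel.
  a1.channels.c1.type = file

  # Hive sink: streams events straight into the ACID table.
  a1.sinks.k1.type = hive
  a1.sinks.k1.channel = c1
  a1.sinks.k1.hive.metastore = thrift://metastore:9083
  a1.sinks.k1.hive.database = demo
  a1.sinks.k1.hive.table = events_acid
  a1.sinks.k1.hive.partition = 2015-07-06
  a1.sinks.k1.serializer = DELIMITED
  a1.sinks.k1.serializer.delimiter = ","
  a1.sinks.k1.serializer.fieldnames = id,name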