© 2014 EXASOL AG
A Tool For A Job
Thomas Bestfleisch, Solution Engineer, EXASOL
@EXADude Loves His Lawnmower ...
... because it cuts his grass well
But ...
... he struggles quite a bit cutting his hedge
And ...
... it isn’t good at making apple sauce
And don’t even think about ...
... using it to cut hair
So … how does this apply to Big Data ?
You can run Analytical queries
On Multipurpose databases
But they stop running well
As data volumes increase
You could run them on Hadoop
but you’ll wish you hadn’t
Maybe you need a Tool for This Job
“Pimp My Database”
Add to a multipurpose database
Add to Hadoop
Build a specialist tool from scratch
Multi-Purpose Databases
Architecture dates back to the 1980s
Row-based
(usually) run on a single machine
Heavily reliant on Disk-based processing
Why this architecture?
Memory was expensive
Works well with a wide range of data problems
BUT: doesn’t work well with big analytical queries
Enhancing a Multipurpose Database
ADD PARALLEL
Run the database on a cluster of machines
ADD COLUMNAR
Replace row-based with columnar
OR offer an additional column store
ADD IN-MEMORY
Multipurpose Database plus Parallel
Better to “scale out“ than “scale up“.
Get many people to dig rather than using a bigger shovel.
There is a physical limit to the size of shovel one person can effectively use.
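The "many diggers" idea can be sketched in a few lines. This is only an illustration, not how any particular database implements it: the function names (`partition`, `partial_sum`, `scale_out_sum`) are made up for this example, and threads on one machine stand in for the separate machines of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_workers):
    """Split rows into n_workers slices (round-robin), one per 'digger'."""
    return [rows[i::n_workers] for i in range(n_workers)]


def partial_sum(chunk):
    """Work done by a single worker: aggregate only its own slice."""
    return sum(chunk)


def scale_out_sum(rows, n_workers=4):
    """Aggregate in parallel, then combine the small partial results."""
    chunks = partition(rows, n_workers)
    # Assumption: threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)


sales = list(range(1, 1001))  # hypothetical fact-table column
total = scale_out_sum(sales, n_workers=4)
```

Each worker touches only a quarter of the data, and only the four partial sums travel back to be combined. Adding workers shrinks each slice; a bigger shovel does not.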
Multipurpose Database plus Column Storage
Better compression
Faster query performance
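A rough sketch of why both points tend to hold, using invented sample data and run-length encoding as one simple compression scheme (real column stores use a range of encodings; the names `to_columns` and `run_length_encode` are hypothetical):

```python
# Row-based layout: each tuple is one record (country, day, amount).
rows = [
    ("DE", "2014-01-01", 10),
    ("DE", "2014-01-01", 12),
    ("DE", "2014-01-02", 7),
    ("UK", "2014-01-02", 5),
    ("UK", "2014-01-02", 9),
]


def to_columns(rows):
    """Pivot row tuples into one list per column."""
    return [list(col) for col in zip(*rows)]


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


country, day, amount = to_columns(rows)

# Similar values sit next to each other in a column, so they compress well.
country_rle = run_length_encode(country)

# An analytical query such as SUM(amount) reads one column, not whole rows.
total_amount = sum(amount)
```

Here the five `country` values shrink to two runs, and the sum never touches the `country` or `day` data at all.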
Multipurpose Database plus In-Memory
A Multipurpose Database
Examples of Enhanced Multipurpose Databases
Netezza = Postgres + Parallel
Greenplum & ParAccel = Postgres + Parallel + Columnar
Redshift = ParAccel + Cloud
Aster Data = Postgres + Map/Reduce
InfiniDB = MySQL + Parallel
Oracle 12c in-memory = Oracle + Columnar + In-memory
My Opinion of Enhanced Multipurpose Databases
• Those add-ons are very clever, but not as good as purpose-built
• Often make it worse for the original purpose
Hadoop
Hadoop was invented to index the Internet on a cluster of machines.
Map/Reduce – distributed processing
HDFS – distributed file system
Analytical Queries are NOT like indexing the internet
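To make the contrast concrete, here is a toy, single-process sketch of the Map/Reduce pattern building an inverted index. It does not use Hadoop's actual API; the page contents and the function names (`map_phase`, `shuffle`, `reduce_phase`) are invented for illustration.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> text.
pages = {
    "a.html": "big data tools",
    "b.html": "big analytical queries",
}


def map_phase(url, text):
    """Emit one (word, url) pair per word on the page."""
    return [(word, url) for word in text.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, urls):
    """Collapse the URLs seen for one word into a sorted, de-duplicated list."""
    return word, sorted(set(urls))


pairs = [p for url, text in pages.items() for p in map_phase(url, text)]
index = dict(reduce_phase(w, u) for w, u in shuffle(pairs).items())
```

The job reads every page once, in bulk, and writes a complete result at the end. An analytical query usually wants a quick answer from a few columns of structured data, which is a different workload.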
Potential “Fixes” for Hadoop
Forget Map/Reduce
Add another Execution Engine
e.g. Tez or Spark
OR
Forget HDFS
Add a columnar, in-memory file store
Tachyon in-memory file system
Columnar file formats like Parquet, RCFile
My Opinion of a Partly-Fixed Hadoop solution
The legacy bits don’t really fit with the new stuff
A complete replacement
Let’s ignore Map/Reduce AND HDFS for analytical queries.
But without Map/Reduce and HDFS, is it still Hadoop ?
Or have we started from scratch and made something new …
Adapting Hadoop for Analytical Queries is not an option
All new SQL-on-Hadoop projects are starting from scratch
New columnar in-memory file formats / systems
New low latency execution frameworks
This is EXASOL over a decade ago
Catch us if you can !
A Tool For A Job
Use the appropriate tool for each job
Multipurpose databases are great for transactional processing
Hadoop is amazing with unstructured data
EXASolution is breathtaking on analytical queries
Why choose one when you can have them all ?
Questions ?
More details and a FREE community version of our database available at
www.exasol.com
Email:
Twitter : @EXASOLAG, @EXADude