
Greenplum Versus Redshift and Actian Vectorwise Comparison


Criteria compared: Greenplum, Amazon Redshift and Actian Vectorwise

Description

Greenplum: An analytic database platform built on PostgreSQL. Its full name is Pivotal Greenplum Database.

From <https://db-engines.com/en/system/Greenplum%3BIngres>

URLs: https://pivotal.io/pivotal-greenplum, http://greenplum.org/, http://gpdb.docs.pivotal.io/43160/common/welcome.html

Amazon Redshift: A large-scale data warehouse service for use with business intelligence tools.

From <https://db-engines.com/en/system/Amazon+Redshift%3BGreenplum>

Actian Vectorwise (vendor's description): "Actian Vector is a relational database engine designed for high performance analytics. Actian Vector was designed from the ground up to exploit performance features in today’s x86 CPUs such as vectorization and larger chip caches enabling in-chip analytics. Actian Vector’s record breaking speed delivers results faster than any of its competitors."

From <https://www.actian.com/analytic-database/vector-smp-analytic-database/>

URL: https://www.actian.com/analytic-database/vector-smp-analytic-database/

Vendor

Greenplum: Pivotal Software Inc., a company majority-owned by Dell Technologies (the parent of Dell EMC).

From <https://db-engines.com/en/system/Greenplum%3BIngres>

Amazon Redshift: Amazon

Actian Vectorwise: Actian Corporation

From <https://db-engines.com/en/system/Greenplum%3BIngres>

DB-Engines Ranking

Greenplum: Score 11.41; rank #35 overall, #22 among relational DBMSs. For comparison, PostgreSQL is at #4.

From <https://db-engines.com/en/system/Greenplum%3BIngres>

The comparison with PostgreSQL is apples to oranges: the ranking places general-purpose OLTP databases alongside columnar analytic (OLAP) databases, so it is not a like-for-like comparison.

Amazon Redshift: Score 13.04; rank #32 overall, #20 among relational DBMSs.

Actian Vectorwise: Score 0.66; rank #151 overall, #76 among relational DBMSs.

From <https://db-engines.com/en/system/Actian+Vector>

Ingres is ranked #52, versus PostgreSQL at #4; Informix is also ranked quite low, at #25.

Source: https://db-engines.com/en/ranking

IT Central Station Ranking

Greenplum: Ranked No. 6. The top five are Oracle Exadata, Teradata, HPE Vertica, Netezza and SAP IQ, all of which are commercial solutions.

Actian Vectorwise: Actian offers ParAccel, which is ranked 18th; we are not using ParAccel and can't comment on it. Actian Vectorwise itself is not listed.

Release History

Greenplum: Initial release 2005; current release 4.3.11.1 (January 2017). A 5.x beta is out, and it is expected to improve PostgreSQL version support: https://gpdb.docs.pivotal.io/500Beta/relnotes/GPDB_500_README.html#topic_yxx_bq2_lx

Actian Vectorwise: Vector 5.0, July 2016.

From <https://db-engines.com/en/system/Actian+Vector>

Licensing

Greenplum: Both open-source and licensed versions are available.

Amazon Redshift: Cloud-based SaaS solution, priced by usage.

Actian Vectorwise: Licensed version only.

Technical Support (including bug-fix support)

Greenplum: Available with the licensed version.

Technical Documentation

Greenplum: gpdb.docs.pivotal.io

From <https://db-engines.com/en/system/Greenplum%3BIngres>

Amazon Redshift and Actian Vectorwise: Available.

Community

Greenplum: Open source, with the Greenplum community, the PostgreSQL community and plenty of material on YouTube.

Amazon Redshift: Yes.

Actian Vectorwise: Lacks a community and is closed source, so we would have to rely on technical support.

Licensing Cost

Greenplum: Free for the open-source edition (Apache License 2.0). With a technical-support subscription, the unofficial estimate is USD 1,000 per CPU core per year, i.e. about USD 100,000 per year for 100 CPU cores.

Amazon Redshift: Commercial.

Actian Vectorwise: Commercial.
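The support estimate is plain per-core arithmetic. A minimal Python sketch, assuming a flat per-core yearly rate; `annual_support_cost` is a hypothetical helper, and the USD 1,000 rate is only the unofficial estimate above, not a published price:

```python
def annual_support_cost(cpu_cores: int, usd_per_core_per_year: float) -> float:
    """Yearly support cost under an assumed flat per-core pricing model."""
    if cpu_cores < 0 or usd_per_core_per_year < 0:
        raise ValueError("core count and rate must be non-negative")
    return cpu_cores * usd_per_core_per_year


if __name__ == "__main__":
    # Unofficial estimate from these notes: 100 cores at USD 1,000 per core per year.
    cost = annual_support_cost(100, 1_000)
    print(f"USD {cost:,.0f} per year")  # USD 100,000 per year
```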

Architecture

Greenplum: Columnar, shared-nothing, with MPP support. Runs on the EMC appliance as well as on suitable off-the-shelf hardware.

Amazon Redshift: Columnar.

Actian Vectorwise: Columnar; MPP is supported, although this depends on our license and we are not sure whether we have the license for it.
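In a shared-nothing MPP design each segment works only on the rows it stores, and the master merges the partial results. A minimal Python sketch of that scatter/gather pattern for an average, assuming threads stand in for segment hosts (real segments are separate database processes on separate machines); the function names and data are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Hypothetical layout: each inner list holds the rows one segment stores locally.
Segment = List[float]


def local_partial(rows: Segment) -> Tuple[float, int]:
    """Runs on one segment: scan only local rows and return a partial (sum, count)."""
    return sum(rows), len(rows)


def parallel_avg(segments: List[Segment]) -> float:
    """Scatter the query to every segment in parallel, then combine partials on the master."""
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        partials = list(pool.map(local_partial, segments))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


if __name__ == "__main__":
    segments = [[10.0, 20.0], [30.0], [40.0, 50.0, 60.0]]
    print(parallel_avg(segments))  # 35.0
```

Partial (sum, count) pairs are shipped rather than per-segment averages because segments hold different numbers of rows, so averaging the averages would give the wrong answer.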

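Hardware and Setup

Greenplum: Can be deployed on commodity hardware; the DCA (Data Computing Appliance) is also available. It is claimed to be the only enterprise MPP database that can run on commodity hardware. Cloud deployment on Amazon and Microsoft Azure is available.

Amazon Redshift: Cloud setup only.

Actian Vectorwise: Linux-based deployment only.

GitHub

Greenplum: https://github.com/greenplum-db/gpdb

Amazon Redshift and Actian Vectorwise: Not available.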

Storage

Greenplum: Polymorphic data storage and execution. The table or partition storage, execution, and compression settings can be configured to suit the way data is accessed. Users have the choice of row- or column-oriented storage and processing for any table or partition.

From <http://greenplum.org/>

Actian Vectorwise: Hadoop is supported but requires a separately licensed product.
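Choosing row or column orientation per table or partition matters because analytic queries usually read a few columns of many rows. A simplified Python sketch, assuming a hypothetical three-column table and counting "values touched" as a rough proxy for I/O (real engines read pages and blocks, not individual values):

```python
from typing import Dict, List, Tuple

# Hypothetical schema: (order_id, region, amount)
Row = Tuple[int, str, float]
COLUMNS = ("order_id", "region", "amount")


def to_column_layout(rows: List[Row]) -> Dict[str, list]:
    """Pivot row tuples into one list per column (column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def sum_amount_rows(rows: List[Row]) -> Tuple[float, int]:
    """Row layout: each row is read whole, so every field counts as touched."""
    touched = sum(len(row) for row in rows)
    return sum(row[2] for row in rows), touched


def sum_amount_columns(columns: Dict[str, list]) -> Tuple[float, int]:
    """Column layout: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


if __name__ == "__main__":
    rows = [(1, "EU", 10.0), (2, "US", 20.5), (3, "EU", 5.5)]
    print(sum_amount_rows(rows))                       # (36.0, 9)
    print(sum_amount_columns(to_column_layout(rows)))  # (36.0, 3)
```

A row layout still suits tables that are read or updated a whole row at a time, which is why choosing the orientation per table or partition is useful.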

Replication Methods

Greenplum: Master-slave.

Amazon Redshift: Yes.

Actian Vectorwise: Yes.

Partitioning Methods

Greenplum: Sharding.

Amazon Redshift: Sharding.
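Sharding in an MPP database usually means hashing a distribution key to pick the segment that stores each row; Greenplum declares the key with a `DISTRIBUTED BY (...)` clause and Redshift with a `DISTKEY`. A minimal Python sketch, assuming CRC32 as the hash (it is not the hash either product uses) and hypothetical `(customer_id, amount)` rows:

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # hypothetical rows: (customer_id, amount)


def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution-key value to a segment deterministically.

    CRC32 is a stand-in; Python's built-in hash() of str is randomized per process,
    so it would not place the same key consistently across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows: List[Row], num_segments: int) -> Dict[int, List[Row]]:
    """Hash-distribute rows on their first field, the assumed distribution key."""
    if num_segments <= 0:
        raise ValueError("need at least one segment")
    placement: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        placement[segment_for(row[0], num_segments)].append(row)
    return dict(placement)


if __name__ == "__main__":
    rows = [("c1", 5.0), ("c2", 7.5), ("c1", 2.5), ("c3", 1.0)]
    for seg, seg_rows in sorted(distribute(rows, 4).items()):
        print(seg, seg_rows)
```

Rows sharing a key always land on the same segment, so joins and aggregations on that key can run locally without moving data between segments; a heavily skewed key, by contrast, overloads one segment.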

Compression

Greenplum: Up to 30%.
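The "up to 30%" figure does not say whether it is the space saved or the size remaining, so it is worth measuring on sample data. A minimal Python sketch using `zlib` (also one of the compression types Greenplum offers for append-optimized tables), assuming a hypothetical low-cardinality column stored contiguously, as a columnar layout would store it:

```python
import zlib
from typing import List


def compressed_ratio(values: List[str], level: int = 6) -> float:
    """Return compressed size / raw size for newline-joined values (lower is better)."""
    raw = "\n".join(values).encode("utf-8")
    if not raw:
        return 1.0
    return len(zlib.compress(raw, level)) / len(raw)


if __name__ == "__main__":
    # Hypothetical low-cardinality column, as a columnar layout would store it.
    status_column = ["active", "inactive", "active", "suspended"] * 250
    ratio = compressed_ratio(status_column)
    print(f"compressed to {ratio:.1%} of raw size ({1 - ratio:.1%} saved)")
```

Results on real data depend on cardinality, sort order and data types, so a sample of the actual tables gives a more useful number than a headline figure.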

ACID Compliance

Greenplum: Yes.

Amazon Redshift: Yes.

Actian Vectorwise: Yes.

Notes dated Wednesday, August 23, 2017, 10:39 AM

Backup and Recovery

Greenplum: Supports parallel and non-parallel backup and restore: https://gpdb.docs.pivotal.io/4350/admin_guide/managing/backup.html

Using the Greenplum appliance: EMC offers Data Domain systems for backup and recovery; similar solutions should be available from other vendors as well. https://www.emc.com/collateral/hardware/white-papers/h8038-backup-recovery-greenplum-data-domain-wp.pdf

Using Commvault: http://documentation.commvault.com/commvault/v11/article?p=products/greenplum/t_greenplum_backup.htm (backup) and http://documentation.commvault.com/fujitsu/v11/article?p=products/greenplum/t_greenplum_restore_from_backup_job.htm (restore)

Using Veritas: https://gpdb.docs.pivotal.io/500Beta/admin_guide/managing/backup-veritas.html

Amazon Redshift: A cloud-based replication approach is used, which is costly.

Scalability

Greenplum: A NewSQL, shared-nothing MPP RDBMS designed for multi-petabyte environments, where a shared-everything DBMS such as Oracle would fail because of I/O limits on the SAN.

From <https://www.quora.com/Why-would-anyone-migrate-from-Oracle-to-the-Greenplum-database>

Amazon Redshift: Cloud-based options.

Actian Vectorwise: MPP.

Big Data

Greenplum: Supports data stored in HDFS, Hive, HBase, Avro, ProtoBuf, delimited text and sequence files.

Solr/Lucene integration for multilingual full-text search embedded in SQL.

Row- and/or column-oriented data storage. It is the only database where a table can be polymorphic, with both columnar and row-based partitions as defined by the DBA.

Advanced MapReduce and cost-based (CBO) query optimizer; queries can run on more than 1,000 nodes.

It has a dynamic, distributed, pipelined execution model for query processing. Whereas older MapReduce databases rely on materialized execution, Greenplum does not have to write data to disk at every intermediate query step: it streams data in memory to the next stage of the query plan instead of materializing it to disk, so it is much faster than anything demonstrated on Hadoop.

Deep analytics, including data mining and machine learning algorithms, using MADlib (think of it as R for MPP), plus deep semantic text analytics using GPText.

Graph analysis: a billion-edge distributed in-memory graph database and algorithms using GraphLab.

SQL, Solr indexes, GPText, MADlib and GraphLab can all be combined in a single query.

From <https://www.quora.com/Why-would-anyone-migrate-from-Oracle-to-the-Greenplum-database>

Amazon Redshift: No support; MapReduce is not supported.

Actian Vectorwise: Not sure about data science support and integration in Actian.
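The difference between pipelined and materialized execution can be illustrated with Python generators. A simplified sketch, assuming in-memory lists stand in for the intermediate results an older MapReduce-style engine would write to disk; all stage names are hypothetical:

```python
from typing import Iterable, Iterator, List


def scan(n: int) -> Iterator[int]:
    """Hypothetical table scan that yields one row value at a time."""
    for i in range(n):
        yield i


def keep_even(rows: Iterable[int]) -> Iterator[int]:
    """Filter stage: passes matching rows downstream as soon as they arrive."""
    for r in rows:
        if r % 2 == 0:
            yield r


def square(rows: Iterable[int]) -> Iterator[int]:
    """Projection stage."""
    for r in rows:
        yield r * r


def pipelined(n: int) -> int:
    """Rows stream scan -> filter -> project -> aggregate; no stage output is stored whole."""
    return sum(square(keep_even(scan(n))))


def materialized(n: int) -> int:
    """Each stage builds its complete output (standing in for a disk write) before the next runs."""
    stage1: List[int] = list(scan(n))
    stage2: List[int] = [r for r in stage1 if r % 2 == 0]
    stage3: List[int] = [r * r for r in stage2]
    return sum(stage3)


if __name__ == "__main__":
    print(pipelined(10), materialized(10))  # 120 120
```

Both functions return the same answer; the pipelined version never holds a full intermediate result, which is the property the quoted text attributes to Greenplum's execution model.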

Data Loading

Greenplum:
• Distributed ETL rate of 16 TB/hr without using the master node.
• Integration with Talend is available.
• A data loading component for PDI (Pentaho Data Integration) is also available.
• gpload is the data loading utility.
• Compatible with open-source and commercial ETL tools.

Amazon Redshift: AWS tools are available; compatible with various open-source and commercial ETL tools.

Actian Vectorwise: vwload is available for bulk loading.
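The quoted 16 TB/hr loading rate translates directly into load windows. A back-of-the-envelope Python sketch, assuming the rate is sustained for the whole load and spread evenly across segments (the 32-segment cluster and both helper names are hypothetical):

```python
def load_hours(data_tb: float, rate_tb_per_hour: float = 16.0) -> float:
    """Hours to load data_tb at a sustained aggregate rate (16 TB/hr is the quoted figure)."""
    if rate_tb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return data_tb / rate_tb_per_hour


def per_segment_rate(total_rate_tb_per_hour: float, num_segments: int) -> float:
    """Rate each segment must sustain if the load is spread evenly (an assumption)."""
    if num_segments <= 0:
        raise ValueError("need at least one segment")
    return total_rate_tb_per_hour / num_segments


if __name__ == "__main__":
    # 108 TB is the data volume from the customer case study cited in these notes.
    print(load_hours(108))             # 6.75
    print(per_segment_rate(16.0, 32))  # 0.5 (hypothetical 32-segment cluster)
```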

Integration with SpagoBI

Greenplum: SpagoBI supports PostgreSQL, but it still needs to be tested with Greenplum, as SpagoBI is not officially certified for Greenplum. A POC will focus on this area.

Amazon Redshift: Should be possible through JDBC; not tested.

Actian Vectorwise: Integration with SpagoBI 5.2 has been tested. Basic features such as dashboards, reports and ad hoc analysis work. There are currently issues with creating QBE (Query by Example) business models.

Known Issues

Greenplum: "Greenplum may have problems with high concurrency and volatility, but it would be silly to have high concurrency in the PB range. In the < 100 TB range it becomes a question of whether you need high concurrency (Oracle) or data-science-style analytics (Greenplum)."

From <https://www.quora.com/Why-would-anyone-migrate-from-Oracle-to-the-Greenplum-database>

Note: This is one person's opinion and, as far as I know, may not hold; for example, the VMware case study describes 108 TB with 6,000 users, 300 of them concurrent.

Amazon Redshift: "Redshift is still very limited in terms of the SQL functionality it offers. You can't have procedures, functions, triggers, CTEs etc. in Redshift, so in most cases you have to follow an ETL approach for your data warehouse, even though ELT might suit you better. Another major limitation with Redshift is the number of concurrent queries it can run: 15. Yes, that's right. Redshift can only run 15 concurrent queries as of now. For a big data warehouse with ETL processes, admins, users and dashboards on top, this number is ridiculously low. I hope they do something about this sooner rather than later."

From <https://www.quora.com/What-features-does-Amazon-Redshift-fail-to-offer-compared-to-higher-priced-alternatives-like-Teradata-How-likely-are-customers-to-switch>

This opinion overstates some limits: Amazon's Redshift documentation describes a WITH clause (common table expressions) and scalar user-defined functions, and treats 15 as the recommended total concurrency across workload management (WLM) queues rather than a hard cap; up to 50 slots can be configured.

Actian Vectorwise:
1) The product was discontinued and then restarted.
2) Large, complex queries crashed in version 3.x; these issues were resolved in version 4.x.
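The 15-concurrent-query figure quoted for Redshift behaves like a fixed pool of slots: queries beyond the limit wait rather than fail. A minimal Python model using a semaphore, assuming uniform query durations; it is a generic illustration, not how Redshift workload management (WLM) is implemented, and `SlotLimiter` is a hypothetical name:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SlotLimiter:
    """Admits at most `slots` queries at once; later arrivals wait for a free slot."""

    def __init__(self, slots: int) -> None:
        if slots < 1:
            raise ValueError("need at least one slot")
        self._slots = threading.Semaphore(slots)
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0
        self.completed = 0

    def run_query(self, seconds: float) -> None:
        with self._slots:  # blocks while all slots are busy
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            time.sleep(seconds)  # stand-in for query execution time
            with self._lock:
                self.running -= 1
                self.completed += 1


def simulate(slots: int, queries: int, seconds: float = 0.01) -> SlotLimiter:
    """Submit all queries at once and wait for every one to finish."""
    limiter = SlotLimiter(slots)
    with ThreadPoolExecutor(max_workers=max(1, queries)) as pool:
        futures = [pool.submit(limiter.run_query, seconds) for _ in range(queries)]
        for f in futures:
            f.result()  # re-raise any error from a worker
    return limiter


if __name__ == "__main__":
    limiter = simulate(slots=15, queries=40)
    print(limiter.peak <= 15, limiter.completed)  # True 40
```

With 40 queries and 15 slots, the other 25 queue up, which is why dashboards, ETL jobs and ad hoc users competing for the same slots see added latency.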


Customers

Greenplum: BC Hydro, China Railway, TCS Bank (Russia), WellCare, VMware (case study: 108 TB with 6,000 users, 300 of them concurrent).

Performance

Greenplum: Petabyte-scale data warehouse solution.

Actian Vectorwise: Some customers have claimed better performance than Greenplum: https://pavanskumar.wordpress.com/2015/04/23/actian-vector-migration-from-greenplum/

Gartner

Greenplum: Rated Visionary (2015-16) and Niche Player (2017). Gartner praises Greenplum for its built-in data science support, which would be a major advantage in developing advanced payment analytics features.

Amazon Redshift: Rated Leader (it is not clear to us why).

Actian Vectorwise: Rated Visionary (2015-16); Actian did not make it into the 2017 Gartner Magic Quadrant.

In-memory Grid

Greenplum: GemFire. A key application is fraud detection, which requires real-time transaction data, as described here: https://content.pivotal.io/blog/big-data-meets-fast-data-to-fight-fraud-and-more

Hadoop

Greenplum: HAWQ provides the most robust SQL interface for Hadoop and can tackle data exploration and transformation in HDFS.

HAWQ is a parallel SQL query engine that combines the key technological advantages of the industry-leading Pivotal Analytic Database with the scalability and convenience of Hadoop. HAWQ reads data from and writes data to HDFS natively. HAWQ delivers industry-leading performance and linear scalability. It provides users the tools to confidently and successfully interact with petabyte-range data sets. HAWQ provides users with a complete, standards-compliant SQL interface. By using the proven parallel database technology of the Pivotal Analytic Database, HAWQ has been shown to be consistently tens to hundreds of times faster than all Hadoop query engines in the market today.

Pivotal HAWQ is a massively parallel processing (MPP) database built from several Postgres database instances over HDFS storage. Think of a regular MPP database such as Teradata, Greenplum or Netezza, except that instead of local storage it uses HDFS to store its data files. Each processing node still has its own CPU, memory and storage.

References

From <https://dwarehouse.wordpress.com/2014/03/14/pivotal-hawq-mpp-database-on-hdfs/>

From <https://www.quora.com/What-is-HAWQ>

From <https://www.pivotalguru.com/?p=642>

Actian Vectorwise: Hadoop support exists, but it is not clear whether Hadoop connectivity and support are part of the existing product license; it seems Hadoop support will carry an extra cost. VectorH appears to be a separate product: https://www.actian.com/analytic-database/vectorh-sql-hadoop/

Management Tools

Greenplum: Pivotal Command Center and Workload Manager.

Amazon Redshift: Cloud-based.

Figure: http://www.cmswire.com/cms/analytics/a-look-at-gartners-data-management-analytics-leaders-028772.php


Reference: https://dwarehouse.wordpress.com/2014/03/14/pivotal-hawq-mpp-database-on-hdfs/

Figure: Key Components of Greenplum

Figure: Comparison with Other MPPs
