12
© 2015 GridGain Systems, Inc. Shared In-Memory RDD Ignite Fixing A Missing Link in Spark http:/ /ignite.apache.o rg @apacheign ite NIKITA IVANOV GridGain Founder & CTO Apache Ignite PMC See all the presentations from the In-Memory Computing Summit at http://imcsummit.org

IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Link in Spark

Embed Size (px)

Citation preview

Page 1: IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Link in Spark

© 2015 GridGain Systems, Inc.

Shared In-Memory RDDIgnite Fixing A Missing Link in Spark

http://ignite.apache.org @apacheignite

NIKITA IVANOVGridGain Founder & CTO

Apache Ignite PMC

See all the presentations from the In-Memory Computing Summit at http://imcsummit.org

Page 2: IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Link in Spark

© 2014 GridGain Systems, Inc.

Apache Ignite Project History

• 2005: GridGain project started• 2007: 1st public release• 2009: Largest production cluster > 2,000 nodes• 2010: GridGain Systems formed• 2011: GridGain starts every 10 sec around the world• 2014: Accepted to ASF Incubation as Apache Ignite• 2015: Ignite Graduated to Top Level Project in ASF

Page 3: IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Link in Spark

© 2014 GridGain Systems, Inc.

Apache Ignite - In-Memory Data Fabric

Page 4: IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Link in Spark

© 2015 GridGain Systems, Inc.

In-Memory Data Fabric – Main Features

Page 5: IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Link in Spark

© 2015 GridGain Systems, Inc.

• Direct API for MapReduce• Zero Deployment• Cron-like Task Scheduling• State Checkpoints• Load Balancing• Automatic Failover• Full Cluster Management• Pluggable SPI Design

In-Memory Compute Grid

Page 6: IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Link in Spark

© 2015 GridGain Systems, Inc.

• Distributed In-Memory Key-Value Store• Replicated and Partitioned data• TBs of data, of any type• On-Heap and Off-Heap Storage• Highly Available In-Memory Replicas• Automatic Failover • Distributed ACID Transactions • SQL queries and JDBC driver• Collocation of Compute and Data

In-Memory Data Grid

Page 7: IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Link in Spark

© 2015 GridGain Systems, Inc.

Spark Integration – IGFS & Shared RDD

Page 8: IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Link in Spark

© 2015 GridGain Systems, Inc.

Apache Ignite - Advanced Distributed SQL

• ANSI-99 SQL• H2 Local SQL Engine• Shipping with JDBC driver• Topology change tolerance• In-Memory indexes (On-Heap and Off-Heap)• Distributed Group By, Aggregations, Sorting• Cross-Cache Joins, Unions, etc.• Ad-Hoc SQL Support• Custom SQL functions (Java, Scala, etc.)• Auto GUI-based schema import• Dynamic schema change support• Geo-spatial indexes• Support for Lucene (text) and predicate queries• Flash-only support (Q116)• Un-collocated joins support (Q116)

Page 9: IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Link in Spark

© 2015 GridGain Systems, Inc.

Java Ex: SQL Cross-Cache JOIN

Page 10: IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Link in Spark

© 2015 GridGain Systems, Inc.

Java Ex: SQL Cross-Cache GROUP BY

Page 11: IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Link in Spark

© 2015 GridGain Systems, Inc.

Demo: Spark SQL & Ignite SQL via Zeppelin

https://github.com/apacheignite/zeppelin-demo

Page 12: IMC Summit 2016 Breakout - Nikita Ivanov - Shared In-Memory RDDs – Missing Link in Spark

© 2015 GridGain Systems, Inc.

Thank you!www.gridgain.comignite.apache.org

#apacheignite

Demo: https://github.com/vkulichenko/zeppelin-demo