WELCOME TO NOSQL
Ahmed [email protected]
2016 EMC Proven Professional Knowledge Sharing 2
Table of Contents
1. INTRODUCTION
2. NOSQL BACKGROUND
2.1. NOSQL MAIN CATEGORIES
2.1.1 Key-Value Stored NoSQL
2.1.2 Document Stored NoSQL
2.1.3 Wide-Column Stored NoSQL
2.1.4 Graph-Oriented NoSQL
2.1.5 Choosing a NoSQL Category
2.2. HADOOP AND NOSQL RELATIONSHIP
2.3. BIG DATA ANALYTIC PLATFORM SUPPORT FOR NOSQL
2.3.1. Datameer
2.3.2. DataStax
2.3.3. Karmasphere
2.3.4. Solr
2.3.5. RapidMiner
2.3.6. R
2.3.7. Pivotal HD
3. CONCLUSION
4. REFERENCES
Disclaimer: The views, processes or methodologies published in this article are those of the authors.
They do not necessarily reflect Dell EMC’s views, processes or methodologies.
Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.
1. Introduction
Welcome to NoSQL (Not Only SQL) technology! This research summarizes:
The four main categories of NoSQL databases and when to use each type
The relationship between Hadoop and NoSQL technologies
Big Data analytic platform support for NoSQL
The article starts with a brief background about NoSQL technology. This is followed by a discussion
on the four main types of NoSQL databases: Key-value Stored, Document Stored, Wide-column
Stored, and Graph Oriented. While there is a difference between NoSQL technology and Hadoop,
there is an important relationship binding the two and this relationship will be addressed. Finally,
the research will examine the current state of play with regard to Big Data analytic platforms on
NoSQL databases.
2. NoSQL Background
The term NoSQL was first used in 1998 as the name of a database. According to Google Trends, the
term gained significant awareness in 2009 [7].
Figure 1: Google Trend Result for the term NoSQL
Many believe that NoRelational would have been a better name than NoSQL, since that term
better describes the non-relational, flexible schema that characterizes NoSQL databases. According
to the list of NoSQL databases referred to as the “ultimate guide to the non-relational universe” [2],
there are currently roughly 150 NoSQL databases. The huge volume (petabyte scale)
and increasing variety (structured/unstructured) of data posed significant challenges for classic
relational database management systems (RDBMS). This challenge led to the creation of alternative
database management systems (DBMS) such as NoSQL databases. NoSQL systems are distributed,
non-relational databases, designed for large scale data storage and for massively parallel data
processing across a large number of commodity servers [8].
According to the Couchbase survey conducted in 2012 [9], the two main drivers for adopting NoSQL
databases relate to Variety and Volume. This confirms the earlier statement that people turn to
NoSQL databases as opposed to RDBMS to solve their Big Data challenges.
Figure 2: Couchbase Survey Results
2.1. NoSQL Main Categories
NoSQL databases can be classified into four basic categories, each appropriate to different kinds of
tasks. The four main categories are:
1) Key-value Stored NoSQL
2) Document Stored NoSQL
3) Wide-Column Stored NoSQL
4) Graph-Oriented NoSQL
2.1.1 Key-Value Stored NoSQL
These DBMS store items as keys and values. The key is an alpha-numeric identifier, while the value
may be a simple text string or more complex lists and sets. Data searches are usually performed
against keys only, and are limited to exact matches. An example key value store is as follows:
Key   Value
1     Mobile Device Type: Computer
      Model: Toshiba Laptop
      Location: Office
      Expires: 2015
2     Mobile Device Type: Tablet
      Model: Samsung S4
      Location: Home
      Expires: 2014
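The exact-match lookup described above can be sketched in a few lines of Python. This is a minimal illustration only: a plain dictionary stands in for the store, and the function names `put` and `get` are hypothetical, not the API of any product named in this article.

```python
# Minimal in-memory key-value store sketch (hypothetical, for illustration only).
store = {}

def put(key, value):
    """Store a value under a key, replacing any previous value."""
    store[key] = value

def get(key):
    """Return the value stored under exactly this key, or None if absent."""
    return store.get(key)

# The two rows from the example table above.
put("1", {"Mobile Device Type": "Computer", "Model": "Toshiba Laptop",
          "Location": "Office", "Expires": 2015})
put("2", {"Mobile Device Type": "Tablet", "Model": "Samsung S4",
          "Location": "Home", "Expires": 2014})

print(get("2")["Model"])    # a lookup by key finds the stored value
print(get("Samsung S4"))    # a value is not a key, so nothing is found
```

The second lookup shows the limitation noted above: the store is searched by key only, so a string that appears inside a value cannot be used to find the record.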
When to Use: This NoSQL database type is ideal for extremely fast, highly scalable retrieval of
values. This can be found in use cases such as user profile management, shopping carts, or any other
use case where extremely low response time is critical.
Examples: Voldemort (LinkedIn), Dynamo (Amazon), Redis, Riak
While Amazon’s Dynamo has significant influence over a number of key-value NoSQL databases,
other players in the field have significant presence such as Voldemort and Redis. Dynamo is
proprietary, while Voldemort (Apache license) and Redis (BSD license) are both open source and
therefore gaining more popularity. Both Voldemort and Redis support compression, while Dynamo
does not [5]. Redis allows matching for key-ranges, such as matching of numeric ranges or regular
expressions, while Dynamo represents “Just a key-value store” [5]. On the other hand, revision
control is a capability that both Dynamo and Voldemort excel at over Redis. Voldemort is special in
that its data replication technique is symmetric (peer-to-peer) versus master-slave replication for
both Dynamo and Redis. Redis supports in-memory operations and is considered by some to be the
most popular key-value store in the cloud [12].
2.1.2 Document-Stored NoSQL
These DBMS are document databases and are inspired by Lotus Notes. They are designed to store
and manage documents which are encoded in a standard data exchange format such as XML, JSON
(JavaScript Object Notation), or BSON (Binary JSON). The value column of these databases is more
complex than that of a simple Key-Value store. A single column can host hundreds of
attribute/value pairs, and these attributes can change from one row to another. Both the values and
keys are fully searchable in document-stored NoSQL databases.
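As a rough sketch of these two properties (attributes that vary from one document to another, and searching on values rather than keys only), the following Python snippet parses two JSON documents and filters them by a field. The document contents and the helper name `find` are made up for illustration and do not correspond to any particular database's query API.

```python
import json

# Two JSON documents with different sets of attributes (hypothetical data).
raw_documents = {
    "doc1": '{"type": "email", "sender": "alice", "subject": "Quarterly report"}',
    "doc2": '{"type": "note", "title": "Shopping list", "tags": ["home", "food"]}',
}
documents = {key: json.loads(text) for key, text in raw_documents.items()}

def find(field, value):
    """Return the keys of all documents whose field equals value.

    Documents that do not have the field at all are simply skipped,
    so no null placeholders are needed for missing attributes.
    """
    return sorted(key for key, doc in documents.items() if doc.get(field) == value)

print(find("type", "note"))
print(find("subject", "Quarterly report"))
```

A real document store would typically use indexes rather than scanning every document; the scan here only keeps the sketch short.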
Figure 3: Document Store NoSQL Database [8]
When to Use: This NoSQL database type is ideal for storing and managing Big Data-size collections
of documents. Examples include text documents, XML documents, and emails. This database type
works well for storing semi-structured data that would require extensive use of nulls in an RDBMS
for missing or nonexistent values.
Examples: CouchDB (JSON), MongoDB (BSON)
The two main players in this NoSQL database type are CouchDB and MongoDB. Both are open
source, with CouchDB following the Apache license and MongoDB following the AGPL license.
MongoDB follows the BASE (Basically Available, Soft-State, Eventual Consistency) data integrity
model, while CouchDB can follow the ACID (Atomicity, Consistency, Isolation, Durability) data integrity
model. MongoDB makes it easy to perform full text search, while a similar search in CouchDB may
require a MapReduce query. Replication in MongoDB is master-slave, while in CouchDB it is
peer-to-peer, which may be valuable in some scenarios. Sharding in MongoDB is more advanced than in
CouchDB, which itself has no built-in sharding mechanism yet, although several projects
provide sharding support for CouchDB [7]. The ease of use and documentation available for
MongoDB are better than for CouchDB [11]. According to recent research on LinkedIn profiles,
MongoDB is in higher demand than CouchDB. In fact, MongoDB is considered the most popular
NoSQL database in terms of LinkedIn profile mentions (see Figure 4).
Figure 4: NoSQL LinkedIn Skills Index [13]
2.1.3 Wide-Column Stored NoSQL
These DBMS are referred to as Wide-Column or Column-Family (WC/CF) stores. This database
type is similar to document databases in that it uses a distributed, column-oriented
data structure with multiple attributes per key. Some Wide-Column stores, such as the popular
Cassandra, take the form of a Key-Value store. The majority of these databases, however, follow
Google's Bigtable, which Google developed as a petabyte-scale data storage system for its
search index. Google also developed a distributed file system called GFS and a MapReduce
parallel processing framework. Similarly, Hadoop Core consists
of the Hadoop file system (HDFS) and MapReduce. The Hadoop ecosystem extends to include HBase,
a Bigtable-style database [8].
Row   Super Column Families: Electronics
100   Super Column: Device Type
        Model: Toshiba Laptop
        Weight: 2Kg
        Dimensions: 30cm x 20cm x 3cm
      Super Column: Manufacturer
        Name: Toshiba Corporation
        Country: Japan
        City: Tokyo
101   Super Column: Device Type
        Model: Television
        Size: 40 inch
        Type: Plasma
      Super Column: Manufacturer
        Name: Samsung
        Country: Korea
        Zip: 1135345
Figure 5: Wide-Column Store NoSQL Database
When to Use: This NoSQL database type is ideal for distributed storage of versioned data, thanks to
the time-stamping functions available in Wide-Column stores. Large scale batch-oriented data
processing such as sorting and parsing also works well for this database type.
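The nested layout of Figure 5 and the time-stamped versioning mentioned above can be sketched with nested Python dictionaries. This is an illustrative model only, not how any of the databases below is implemented: the functions `put` and `get`, the integer timestamps, and the second weight value are assumptions made for the example.

```python
from collections import defaultdict

# table[row_key][super_column][column] -> list of (timestamp, value) versions
table = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

def put(row, super_column, column, value, timestamp):
    """Append a new time-stamped version instead of overwriting the old one."""
    table[row][super_column][column].append((timestamp, value))

def get(row, super_column, column, as_of=None):
    """Return the newest value, or the newest one written at or before as_of."""
    versions = table[row][super_column][column]
    candidates = [v for v in versions if as_of is None or v[0] <= as_of]
    if not candidates:
        return None
    return max(candidates, key=lambda version: version[0])[1]

# Row 100 from Figure 5, with a hypothetical later correction to the weight.
put(100, "Device Type", "Model", "Toshiba Laptop", timestamp=1)
put(100, "Device Type", "Weight", "2Kg", timestamp=1)
put(100, "Device Type", "Weight", "1.8Kg", timestamp=2)

print(get(100, "Device Type", "Weight"))            # newest version
print(get(100, "Device Type", "Weight", as_of=1))   # version as of timestamp 1
```

Because every write keeps its timestamp, earlier values remain readable, which is the versioning property the paragraph above refers to.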
Examples: Bigtable, HBase, Cassandra
The three main players in this NoSQL database type are Bigtable, HBase, and Cassandra. Bigtable is
proprietary to Google, while both HBase and Cassandra are open source following the Apache
license. Bigtable uses the GFS distributed file system for data storage, HBase uses the Hadoop
distributed file system for storage, and Cassandra has its own file system. Cassandra has a special
query language, Cassandra Query Language (CQL), and also supports API calls for queries. HBase
is queried through API calls or REST, and Bigtable is queried through APIs. All support MapReduce. For
the integrity model, Bigtable uses multi-version concurrency control (MVCC), HBase uses log
replication, and Cassandra uses basically available, soft state, eventual consistency (BASE). Bigtable
supports full text search, while both HBase and Cassandra do not. The maximum value size for
HBase is much higher than for Cassandra (2TB vs 2GB). Bigtable is based on C/C++, while both HBase
and Cassandra are based on Java. Since Bigtable is proprietary, Cassandra and HBase are the ones in
wide use. According to recent research on LinkedIn profiles, Cassandra is in higher demand than HBase
[13]. This conclusion is also validated by another survey included below.
Figure 6: Wide-Column Store NoSQL Database Rankings [20]
2.1.4 Graph-Oriented NoSQL
These DBMS replace relational tables with structured relational graphs of interconnected
key-value pairings. This database type is unique because it is the only one of the four types that
represents relations visually. This visual representation makes graph databases more
intuitive to people than any of the other NoSQL database types. Nevertheless, specialists in the
field often overlook this database type when analyzing NoSQL databases [20].
Figure 7: Graph-Oriented Store NoSQL Database [29]
When to Use: This NoSQL database type is ideal for exploring relationships between data, rather
than exploring the data itself. Traversing and representing social networks is one example. This
database type is optimized for relationship traversal, not for querying. If the use case is more about
querying values, it may be better to use a search-based DBMS instead. Perhaps that is what led
LinkedIn to use Voldemort, and Facebook to use Cassandra, as their database instead of a
Graph-Oriented store.
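To make "relationship traversal" concrete, the sketch below walks a small, made-up social graph breadth-first and reports who is reachable within a given number of hops. The graph data and the function name `within_hops` are hypothetical; a graph database would express the same question in its own query language rather than in application code.

```python
from collections import deque

# Hypothetical social graph stored as adjacency lists.
friends = {
    "ann": ["bob", "cara"],
    "bob": ["ann", "dan"],
    "cara": ["ann", "dan"],
    "dan": ["bob", "cara", "eve"],
    "eve": ["dan"],
}

def within_hops(start, max_hops):
    """Breadth-first traversal: map each person reachable within max_hops
    to their hop distance from start (start itself is excluded)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in friends.get(person, []):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]
    return distance

print(within_hops("ann", 2))
```

The question answered here ("who is connected to whom, and how closely") is about the edges rather than the stored values, which is the kind of workload this category is suited to.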
Examples: Neo4j, AllegroGraph
The main player in this NoSQL database type is Neo4j. Its data storage is mainly volatile memory
and it does not support MapReduce. Neo4j is based on the ACID (Atomicity, Consistency, Isolation,
Durability) integrity model. Full text search is supported, as are graph queries. Neo4j is based on Java [8].
2.1.5 Choosing a NoSQL Category
The line between the different NoSQL databases is very thin; however, they have some small but
very significant differences. Providing a sorted view of a data set is a typical task for a database. The
document-oriented databases excel compared to other databases when ordering by multiple
attributes is required. All database types are more or less capable of ordering by a single attribute
[7]. Understanding the workload is key to selecting the right NoSQL category for it. In a
vendor-independent comparison of NoSQL databases [3], the performance of Cassandra, HBase, MongoDB,
and Riak was compared under various workloads. The tests showed that Wide-Column
store NoSQL databases (HBase and Cassandra) outperformed Document stores
(MongoDB) in write workloads. This is a logical result when one understands that Wide-Column stores
such as HBase favor consistency over availability by committing writes after a particular number of
in-memory HDFS replicas. Document stores, on the other hand, favor availability over consistency and
therefore absorb write workloads more slowly than Wide-Column stores. Figure 8 below places the two
types of databases within the CAP Theorem (Consistency, Availability, Partition tolerance).
Wide-Column prefers CP, while Document prefers AP.
Figure 8: Relative Position of the NoSQL Databases in the CAP Theorem [7]
On the other hand, the popular Document store (MongoDB) and Key-Value store databases outperformed
the Wide-Column store (HBase) in read workloads. To improve latency and
throughput, multiple NoSQL types are often combined to get
the best of both worlds. For example, one case study described an architecture in which Redis
(Key-Value Store) was backed by Cassandra (Wide-Column Store) [27]. The case study explained
how one organization scaled using this architecture to serve 4 billion videos.
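One common way to combine a fast in-memory store with a slower backing store is the cache-aside pattern, sketched below. This is not taken from the case study: plain dictionaries stand in for both stores so the example stays self-contained, and the class name `CachedStore` and the sample keys are assumptions made for illustration.

```python
class CachedStore:
    """Cache-aside sketch: an in-memory cache in front of a slower backend."""

    def __init__(self, backend):
        self.backend = backend    # stands in for the slower store of record
        self.cache = {}           # stands in for the in-memory key-value store
        self.backend_reads = 0    # counts how often the backend is consulted

    def read(self, key):
        # 1. Serve from the cache when the key is already there.
        if key in self.cache:
            return self.cache[key]
        # 2. Otherwise fall back to the backend and remember the result.
        self.backend_reads += 1
        value = self.backend.get(key)
        if value is not None:
            self.cache[key] = value
        return value

store = CachedStore({"video:1": "metadata for video 1",
                     "video:2": "metadata for video 2"})
store.read("video:1")   # first read goes to the backend
store.read("video:1")   # repeat read is served from the cache
print(store.backend_reads)
```

Only the first read of a key pays the cost of the backend; repeated reads of hot keys are answered from memory, which is where the latency benefit comes from.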
To help in choosing a specific NoSQL database after deciding which NoSQL category is appropriate,
the following popularity diagrams can help [20].
Figure 9: Graph-Oriented NoSQL Databases Popularity [20]
Figure 10: Document NoSQL Databases Popularity [20]
Figure 11: Key-Value NoSQL Databases Popularity [20]
Figure 12: Wide-Column NoSQL Databases Popularity [20]
2.2. Hadoop and NoSQL Relationship
Data platforms have evolved from traditional RDBMS, to Data Warehouses, to Big Data platforms
such as Hadoop. Hadoop Core consists of the Hadoop filesystem (HDFS), and an open-source
implementation of MapReduce. The Hadoop Filesystem serves as a distributed file system to store
huge amounts of unstructured data. Hadoop Core is an analytic platform that serves batch
processing well. In batch processing, delays are tolerable and the workload is not real-time, which
is appropriate in some scenarios. Transactional processing, by contrast, is
characterized by a low latency requirement that is often near real-time. Hadoop
Core alone cannot meet such requirements, and that is why NoSQL databases are
needed. NoSQL databases interface with the Hadoop platform, which can serve as the data storage
component of the database. Figure 13 shows a taxonomy of the different data platforms.
Figure 13: Taxonomy of Data Platforms – NoSQL in the Real World [31]
As evident in Figure 13, Hadoop is located in the analytic section at the top and not the operational
section at the bottom. The Hadoop filesystem (HDFS) is where unstructured data is stored.
Hadoop MapReduce is the data processing layer, which takes that unstructured data and derives some
structure from it. That structure can be stored in a NoSQL database to support low latency
transactional processing. The most obvious example of Hadoop and NoSQL working together is found
in the Hadoop Ecosystem, which includes HBase. HBase is a Wide-Column NoSQL database
that is integrated with Hadoop Core (HDFS and MapReduce).
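The flow described above (unstructured data in, MapReduce deriving structure, the result stored for fast lookups) can be sketched in plain Python. This is a toy, single-process imitation of the map, shuffle, and reduce phases, not Hadoop code; the log lines are invented and a dictionary stands in for the NoSQL table that would hold the result.

```python
from collections import defaultdict

# Unstructured input: free-text log lines (hypothetical sample data).
lines = ["error disk full", "warning disk slow", "error network down"]

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(word, counts):
    """Reduce: collapse the grouped values for one key into a total."""
    return word, sum(counts)

pairs = [pair for line in lines for pair in map_phase(line)]
# The reduced output is structured (word -> count) and is kept in a dict
# that stands in for a key-value table serving low-latency reads.
word_counts = dict(reduce_phase(word, counts)
                   for word, counts in shuffle(pairs).items())

print(word_counts["error"])
```

The batch job runs once over all the input, while the resulting table can then answer individual lookups immediately, which mirrors the analytic versus operational split discussed in this section.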
Figure 14: HBase Architecture [6]
Even with HBase, Hadoop’s transactional database based on columnar storage, the latency is still
based on disk I/O, queries are based on API-level programming, and high availability isn’t mature
[26]. That is why there are other examples of NoSQL with Hadoop such as Hadoop with Redis – the
popular Key-Value Store NoSQL database [25]. Redis can be used as a front end to serve data out of
Hadoop, caching the hot pieces of data in-memory for fast access when they are needed again. This
is achieved by using a Java client called Jedis, which can ingest and retrieve data with Redis. Figure
15 below summarizes the relationship between Hadoop and in-memory NoSQL. A final example of
Hadoop with NoSQL is Hadoop with MongoDB – the popular document NoSQL database. A practical
example of such a scenario exists in the paper titled Performance Evaluation of a MongoDB and
Hadoop Platform for Scientific Data Analysis [4].
Figure 15: Hadoop and In-Memory NoSQL Comparison [6]
2.3. Big Data Analytic Platform Support for NoSQL
There are multiple Big Data analytic platforms. This section will summarize the relationship between
some of the main Big Data analytic platforms and NoSQL databases. The following platforms have
been analyzed:
Datameer
DataStax
Karmasphere
Solr
RapidMiner
R
Pivotal HD
2.3.1. Datameer
Datameer is a company that attempts to unify data analytics into a single application. Its main value
proposition is the significant reduction in complexity in terms of data integration, data
transformation, and data visualization. According to Datameer, one typically goes through a three
step process for data analytics involving three different technologies. Datameer simplifies this
complex environment into a single application on top of the powerful Hadoop platform. The NoSQL
support available from Datameer is in integrating data from the Wide-Column Store NoSQL
category. The Datameer analytic platform supports HBase or Cassandra as sources for data
integration [14].
2.3.2. DataStax
DataStax is a company that claims to have introduced the world’s first NoSQL Big Data
platform with comprehensive enterprise-grade security features. On February 25th, 2013 the
company announced the general availability of its product DataStax Enterprise (DSE) 3, the newest
version of DataStax’s Apache Cassandra-based big data platform. DSE 3 is a complete integrated big
data platform that combines a production-certified version of Cassandra with Apache Solr and
Apache Hadoop to deliver continuous availability support and performance across multiple data
centers. According to DataStax, the product is architected to securely manage real-time (through
Cassandra), analytic (through Hadoop), and enterprise search (through Solr) data all in the same
database cluster [15].
2.3.3. Karmasphere
Karmasphere is a company that created a product designed for teams of analysts to explore and
analyze Big Data on Hadoop, and to discover business insights about their customers that can be
applied to all points of customer engagement. According to Karmasphere, its product is natively
designed for Hadoop and provides a unified workspace for the Big Data Analytics workflow, making
it possible to transform vast amounts of raw data into business insight spanning data ingestion,
iterative analysis, as well as the visualization and publishing of new insights. Karmasphere itself uses
a MySQL database, and its supported databases do not include NoSQL databases [16].
2.3.4. Solr
Solr is an Apache open source project for a search solution. It is unique in that it is similar to a
NoSQL database without being one. Solr is most similar to the MongoDB architecture. It includes the
following NoSQL features: Realtime-Get, Update Durability, Atomic Compare and Set, Versioning, and
Optimistic Locking. Like some NoSQL databases’ implementation of the CAP theorem, it favors
Consistency and Partition tolerance (CP) rather than Availability and Partition tolerance (AP) [20].
Several search projects are associated with NoSQL databases, for example Lucandra/Solandra for
Cassandra, HSearch for HBase, and Riak Search for Riak. Solr was built to be a search solution, and
search capability is its sweet spot [18]. According to the popularity survey below, it is by far the
most popular search engine.
Figure 16: Solr Popularity Compared to Other Search Solutions [20]
Solr and NoSQL remain two inter-related but separate worlds, each with its own sweet spot. Thus
far, neither has dominated the other [20]. One proof point is the recently released DataStax
product DSE 3, which uses both Solr and the Cassandra NoSQL database.
2.3.5. RapidMiner
RapidMiner is a company that provides software, solutions, and services in the fields of predictive
analytics, data mining, and text mining. Its flagship product is user friendly, and used by many
beginners in the field of Data Analytics. The company helps to automatically and intelligently analyze
data – including databases and text. The company currently has limited support for NoSQL
technology [21].
2.3.6. R
R is an open source statistical package used in many data analytics projects. R has strong support for
NoSQL technologies in the Key-Value, Wide-Column, and Document store categories. Specifically, R
supports the most popular NoSQL databases in each category: Redis, Cassandra, and MongoDB [22]
[23].
2.3.7. Pivotal HD
Pivotal is a company providing application and data infrastructure software, agile development
services, and data science consulting. Its product, Pivotal HD, is a fully supported,
enterprise-ready Hadoop distribution. Pivotal HD supports multiple NoSQL databases such as GemFire, a
proprietary NoSQL database, as well as the open source HBase [28]. Redis, the most popular Key-Value
NoSQL database, is also supported by Pivotal [12].
3. Conclusion
Big Data is about variety, velocity, and volume. Traditional relational databases are not flexible
enough to deal with the new variety of data types. The implication is that a new generation of
databases is required with a more flexible schema. The velocity and volume of data pose a similar
challenge to traditional scale-up databases in terms of the performance requirement. The
implication is that a new generation of databases is required with a scale-out and distributed
architecture. Indeed these two challenges led to the development of a new generation of databases
referred to as NoSQL databases. Although vendors have developed their proprietary NoSQL
databases, open source remains king in this new space.
There are four main categories of NoSQL databases: Key-Value, Document, Wide-Column, and
Graph. Each category has its strengths and weaknesses and is populated with multiple databases,
each with its special implementation and characteristics. In each category one database has gained
more popularity than the others: Redis in the Key-Value category, MongoDB in the Document
category, Cassandra in the Wide-Column category, and Neo4j in the Graph category. NoSQL
databases and Hadoop – the popular big data platform – work together closely. NoSQL works in the
operational space, while Hadoop works in the analytical space. NoSQL exists in the front-end, and
Hadoop exists in the back-end.
The Big Data analytic platforms arena is continuously changing with some platforms embracing
NoSQL more than others. R, DataStax, and Pivotal HD are examples of platforms that have embraced
NoSQL, while the RapidMiner and Karmasphere platforms are examples of those that are behind in
embracing NoSQL. With billions of users spending billions of hours online, application usage can
grow from zero to a million users overnight [1]. The application tier has long been accustomed to
scale-out architecture to absorb such spikes. Now it is the database tier’s turn for scale-out using
NoSQL.
4. References
[1] Online article, Couchbase, "What is NoSQL Database & Why NoSQL", Accessible from:
http://www.couchbase.com/why-nosql/nosql-database, Date Accessed: November 28th 2013.
[2] Online article, NoSQL Databases, "Your Ultimate Guide to the Non-Relational Universe!", Accessible
from: http://nosql-database.org/, Date Accessed: November 28th 2013.
[3] Online article, Bushik, S., "A Vendor Independent Comparison of NoSQL Databases: Cassandra,
HBase, MongoDB, Riak", Accessible from:
http://www.networkworld.com/news/tech/2012/102212-nosql-263595.html, Date Accessed: November
29th 2013
[4] University Research Paper, Dede, E., Govindaraju, M., Gunter, D., Canon, R., Ramakrishnan, L.,
"Performance Evaluation of MongoDB and Hadoop Platform for Scientific Data Analysis", Accessible
from: http://datasys.cs.iit.edu/events/ScienceCloud2013/p02.pdf, Date Accessed: November 23rd 2013
[5] University Research Report, Strauch, C., "NoSQL Databases – Selected Topics on Software Technology
Ultra-Large Scale Sites”, Accessible from: http://www.christof-strauch.de/nosqldbs.pdf, Date Accessed:
November 23rd 2013
[6] Research Paper, Sharma, S., "A Brief Review on Modern NoSQL Data Models, Handling Big Data",
Accessible from:
www.cs.iastate.edu/~sugamsha/articles/A Brief Review on Modern NoSQL Data Models_Handling Big
Data.pdf , Date Accessed: November 24th 2013
[7] Master’s Thesis, Orend, K., "Analysis and Classification of NoSQL Databases and Evaluation of their
Ability to Replace an Object-relational Persistence Layer”, Accessible from:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.483, Date Accessed: November 24th 2013
[8] Research Paper, Moniruzzaman, A., Hossain, S., "NoSQL Database: New Era of Databases for Big Data
Analytics – Classification, Characteristics and Comparison", Accessible from: International Journal of
Database Theory and Application Vol. 6, No. 4, 2013, Date Accessed: November 24th 2013
[9] Research Survey, Couchbase, "Accelerated Adoption of NoSQL”, Accessible from:
http://www.couchbase.com/press-releases/couchbase-survey-shows-accelerated-adoption-nosql-2012,
Date Accessed: November 30th 2013
[10] Master’s Thesis, Feng, H., "Benchmarking the Suitability of Key-Value Stores for Distributed Scientific
Data”, Accessible from:
http://www.epcc.ed.ac.uk/sites/default/files/Dissertations/2011-2012/Submission-1054977.pdf, Date
Accessed: November 30th 2013
[11] Online Blog, "The Comparison Wiki”, Accessible from:
http://vschart.com/compare/dynamo-db/vs/project-voldemort/vs/redis-database, Date Accessed:
November 30th 2013
[12] Web Page, Pivotal "Open Source Software”, Accessible from: http://gopivotal.com/oss, Date
Accessed: November 30th 2013
[13] Online Research, 451 Group, "NoSQL LinkedIn Skills Index”, Accessible from:
http://blogs.the451group.com, Date Accessed: November 30th 2013
[14] Web Page, Datameer, "Data Integration with Datameer”, Accessible from:
http://www.datameer.com/product/data-integration.html, Date Accessed: December 1st 2013
[15] Web Page, DataStax, "DataStax Introduces World’s First NoSQL Big Data Platform with
Comprehensive Enterprise-Grade Security Features”, Accessible from:
http://www.datastax.com/2013/02/datastax-introduces-worlds-first-nosql-big-data-platform-with-
comprehensive-enterprise-grade-security-features, Date Accessed: December 1st 2013
[16] Web Page, Karmasphere, "Karmasphere Technical Specifications”, Accessible from:
http://www.karmasphere.com/product-overview/technical-specifications/, Date Accessed: December 1st
2013
[17] Online Blog, Yonik, "SolrCloud, NoSQL and More”, Accessible from:
http://searchhub.org/2012/05/21/solr-4-preview/, Date Accessed: December 1st 2013
[18] Online Presentation, Ingersoll, G., Johnson, R., "Solr Power FTW”, Accessible from:
http://portal.sliderocket.com/ANYSX/SXSW-2011-Solr-Nosql, Date Accessed: December 1st 2013
[19] Online Presentation, Ingersoll, G., "Apache Lucene, Solr and NoSQL: A Comparison”, Accessible from:
http://www.lucenerevolution.org/sites/default/files/LuceneRevPreso_Ingersoll_NoSQL.pdf, Date Accessed:
December 1st 2013
[20] Online Presentation, Miller, M., "Solr The Search First NoSQL Database”, Accessible from:
http://www.slideshare.net/lucenerevolution/solr-cloud-the-search-first-nosql-database-extended-deep-
dive, Date Accessed: December 1st 2013
[21] Online Blog, Rieger, A., "Large Scale Data Analysis and Predictive Modeling in Data Mining”,
Accessible from: http://blog.bosch-si.com/large-scale-data-analysis-and-predictive-modeling-in-data-
mining/, Date Accessed: December 1st 2013
[22] Online Documentation, Urbanek, S., "Package RCassandra”, Accessible from: http://cran.r-
project.org/web/packages/RCassandra/RCassandra.pdf, Date Accessed: December 1st 2013
[23] Online Documentation, Lindsly, G., "CRAN – Package MongoDB Driver”, Accessible from: http://cran.r-
project.org/web/packages/rmongodb/index.html, Date Accessed: December 1st 2013
[24] Online Blog, Apicella, P., "Adding Years to Your RDBMS by Scaling with Spring and NoSQL”, Accessible
from: http://blog.gopivotal.com/products/adding-years-to-your-rdbms-by-scaling-with-spring-and-nosql,
Date Accessed: December 1st 2013
[25] Online Blog, Shook, A., "Making Hadoop MapReduce Work with a Redis Cluster”, Accessible from: http://blog.gopivotal.com/products/making-hadoop-mapreduce-work-with-a-redis-cluster, Date Accessed:
December 1st 2013
[26] Online Blog, Melo, F., "Cultivating Hybrids: 4 Key Data Architectures for Scaling Infinitely”, Accessible
from: http://blog.gopivotal.com/features/cultivating-hybrids-4-key-data-architectures-for-scaling-infinitely,
Date Accessed: December 1st 2013
[27] Online Blog, Bloom, A., "Case Study: How Hulu Scaled Serving 4 Billion Videos Using Redis”,
Accessible from: http://blog.gopivotal.com/case-studies-2/case-study-how-hulu-scaled-serving-4-billion-
videos-using-redis, Date Accessed: December 1st 2013
[28] Online Blog, Miner, D., "Introducing Pivotal HD”, Accessible from: http://blog.gopivotal.com/features/introducing-pivotal-hd, Date Accessed: December 1st 2013
[29] Online Blog, neo4j, "Top 10 Ways to get to Know Neo4j”, Accessible from:
http://blog.ne4j.org/2010/02/top-10-ways-to-get-to-know-neo4j.html, Date Accessed: December 6th 2013
[30] Online Article, Swoyer, S., "DataStax: Anything Hadoop Can Do Cassandra Can Do Better”, Accessible
from: http://tdwi.org/Articles/2013/08/20/DataStax-Hadoop-Cassandra.aspx?Page=1, Date Accessed:
December 6th 2013
[31] Online Blog, Techielicous, "NoSQL in the Real World”, Accessible from:
http://techielicous.com/2011/06/04/search-and-analytics/, Date Accessed: December 6th 2013
Dell EMC believes the information in this publication is accurate as of its publication date. The
information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” DELL EMC MAKES NO
REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS
PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.
Use, copying and distribution of any Dell EMC software described in this publication requires an
applicable software license.