
Current Data Security Issues of NoSQL Databases January 2014

Fidelis Cybersecurity 1601 Trapelo Road, Suite 270

Waltham, MA 02451


© 2015, Fidelis Cybersecurity

Abstract

NoSQL databases, sometimes referred to as Not-Only-SQL databases, have recently gained much attention and popularity because of their demonstrated high scalability and performance. The primary advantage of NoSQL databases is that they are designed to store large amounts of unstructured data efficiently. Facing the "Big Data" problem that has challenged most traditional relational database management systems (RDBMS), major Web 2.0 companies have developed or adopted different flavors of NoSQL databases for their growing data and infrastructure needs, including Amazon (Dynamo), Google (BigTable), LinkedIn (Voldemort), and Facebook (Cassandra). From their inception, NoSQL databases have been designed to solve the Big Data problem by using distributed, collaborating hosts to achieve satisfactory performance in data storage and retrieval. Other equally important database requirements, such as data security and consistency, have not been fully addressed. Following a previous study published in 2011 [Ref. 1] that identified several NoSQL security issues, this white paper summarizes open-source research on recent NoSQL improvements in data security, as dictated by PCI-DSS compliance. With the help of third-party security solutions, some current NoSQL databases appear able to achieve PCI-DSS compliance. However, the potential for data inconsistency among replicas may impede wide acceptance of NoSQL by financial applications, which are much less tolerant of it. For the foreseeable future, NoSQL and RDBMS are generally expected to be co-deployed, each processing the data flows it is best designed for. NoSQL databases may eventually replace RDBMS once they have been improved to provide sufficient data security.



Relational and NoSQL Databases

There are three basic requirements for database management systems: confidentiality, integrity, and availability. The stored data must be available when it is needed (availability), but only to authorized entities (confidentiality), and may be modified only by authorized entities (integrity). Traditional relational database management systems (RDBMS), such as Oracle, SQL Server, and MySQL, are well developed to meet these three requirements. In addition, enterprise RDBMS are required to have the ACID properties (Atomicity, Consistency, Isolation, and Durability), which guarantee that database transactions are processed reliably [Ref. 2]. With such desirable properties, RDBMS have been the dominant data storage choice.

RDBMS now face major performance problems in processing the exponential growth of unstructured data, such as documents, e-mail, multimedia, and social media. Thus a new breed of non-relational, cloud-based distributed databases, called NoSQL, has emerged to satisfy the unprecedented needs for scalability, performance, and storage. Currently there are about 150 different NoSQL databases available [Ref. 3]. They are designed to achieve the desired scalability and performance by sharing a BASE transaction concept (Basically Available, Soft state, and Eventually consistent). Under this concept, a committed transaction does not have to be written to every copy of the data immediately to enforce consistency, as it would in an RDBMS. Instead, the database only needs to reach a consistent state eventually across the clustered hosts.

Based on the data storage model, NoSQL databases can generally be categorized into the following four groups [Ref. 4 and 5]:

Key-Value Databases: Store uninterpreted, arbitrary data values in a system from which they can be recalled later using a key (hash). This schema-less data model allows for easy scaling and very simple APIs.

Column Databases: Store data in a similar key-value fashion, except that the key is a combination of column, row, and/or timestamp, which points to one or more columns (Column Family). The column family used here is like a table commonly found in a relational database.

Document Databases: Store documents that consist of one or more self-contained named fields, in a format such as JSON or BSON. The structure of documents is dynamic, so fields can be freely added to or removed from existing documents. Indexing on the named fields enables fast data retrieval.



Graph Databases: Store data in a flexible graph model that scales across multiple machines. This model is suitable for data with relations that are best represented as a graph (elements interconnected by an undetermined number of relations), such as social relations, public transport links, road maps, or network topologies.

The diagram below illustrates how RDBMS and NoSQL databases scale in both data size and data complexity. While RDBMS are limited in both respects, NoSQL databases with simpler data models, e.g., key-value and column, scale more easily in data size.

Fig. 1 Relative scalability in data size and complexity of RDBMS and NoSQL
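The four storage models described above can be made concrete with a small sketch. It expresses one hypothetical user record in each model using plain JavaScript structures; every name and value here is invented for illustration, and no real database client is involved.

```javascript
// Hypothetical record for one user, expressed in each of the four storage models.
// All field names and values are invented for illustration.

// Key-value: an opaque value recalled by a key; the store does not interpret it.
const keyValueStore = new Map();
keyValueStore.set("user:42", JSON.stringify({ name: "Ada", city: "Waltham" }));

// Column: the key combines row, column family, column, and timestamp.
const columnStore = new Map();
function columnKey(row, family, column, timestamp) {
  return [row, family, column, timestamp].join("/");
}
columnStore.set(columnKey("42", "profile", "name", 1), "Ada");
columnStore.set(columnKey("42", "profile", "city", 1), "Waltham");

// Document: self-contained named fields; fields can be added or removed freely.
const documentStore = [{ _id: 42, name: "Ada", city: "Waltham" }];
documentStore[0].email = "ada@example.com"; // new field, no schema change

// Graph: nodes plus an open-ended set of relations between them.
const graph = {
  nodes: { 42: { name: "Ada" }, 43: { name: "Bob" } },
  edges: [{ from: 42, to: 43, type: "FOLLOWS" }],
};

// Follow all outgoing relations of one type from a node.
function neighbors(g, id, type) {
  return g.edges.filter((e) => e.from === id && e.type === type).map((e) => e.to);
}
```

The simpler the model, the less the store needs to know about the value, which is one reason key-value and column stores are the easiest to spread across hosts.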

The following table shows a sample list of well-known companies that use NoSQL databases in production. Young internet media and social network companies are more ready to accept NoSQL because of their needs for data flexibility and scalability. For example, in 2013 Netflix completely migrated its streaming services from Oracle to NoSQL (Cassandra) to improve availability [Ref. 6]. Well-established companies have been slower to transition to NoSQL, possibly burdened by legacy data storage and/or applications, in addition to lingering concerns about NoSQL data security.



Table 1. Examples of major companies using NoSQL databases

| Company Name | NoSQL Name | NoSQL Storage Type |
|---|---|---|
| Adobe | HBase | Column |
| Amazon | Dynamo, SimpleDB | Key-Value, Document |
| BestBuy | Riak | Key-Value |
| eBay | Cassandra, MongoDB | Column, Document |
| Facebook | Cassandra, Neo4j | Column, Graph |
| Google | BigTable | Column |
| LinkedIn | Voldemort | Key-Value |
| LotsOfWords | CouchDB | Document |
| MongoHQ | MongoDB | Document |
| Mozilla | HBase, Riak | Column, Key-Value |
| Netflix | SimpleDB, HBase, Cassandra | Document, Column, Column |
| Twitter | Cassandra | Column |

Current NoSQL Data Security Issues

Recent data breaches at MongoHQ (Oct 2013) [Ref. 7] and LinkedIn (June 2012) [Ref. 8] underscore the importance of NoSQL data security as more and more companies embrace this new family of products. Although these two incidents were caused by weak encryption of passwords and were not directly linked to any known NoSQL vulnerability, they show that NoSQL databases are becoming targets for attackers who seek valuable information. NoSQL databases may become even more susceptible to exploits once attackers overcome the learning curve and are able to identify hidden security or software weaknesses.
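Since both incidents are attributed to weak password protection, a short sketch of salted password hashing may help show what a stronger scheme looks like. It uses Node.js's built-in crypto module; the function names and the "salt:hash" storage format are assumptions of this sketch, and it does not describe what either company used or later adopted.

```javascript
const crypto = require("crypto");

// Derive a salted hash; a fresh random salt makes identical passwords hash differently.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 64);
  // Hypothetical storage format: hex salt and hex hash joined by a colon.
  return salt.toString("hex") + ":" + hash.toString("hex");
}

// Recompute the hash with the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(":");
  const expected = Buffer.from(hashHex, "hex");
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, "hex"), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}
```

Because each record carries its own salt, an attacker who obtains the stored values cannot reuse one precomputed table against every account.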



Okman et al. published a comprehensive study of NoSQL security issues in 2011 [Ref. 1], which discussed common security issues in two popular NoSQL databases, Cassandra (version 0.8) and MongoDB (version unknown). As most NoSQL databases are still works in progress, it is worth taking another look three years later to re-evaluate their recent developments. The current version of Apache Cassandra is 2.0 (Enterprise version 3.2 is offered by DataStax), and the current version of MongoDB is 2.4. Since enhanced database security usually comes at the expense of database performance, it is no surprise that most default security settings for Cassandra are set to either none or minimum [Ref. 9], and MongoDB's current manual states, "The most effective way to reduce risk for MongoDB is to run your entire MongoDB deployment in a trusted environment" [Ref. 10].

1. Data at Rest

[Cassandra] The latest Cassandra (Enterprise 3.2) provides an optional Transparent Data Encryption (TDE) feature to protect data that is flushed from memory (the memtable) and written to disk. To some extent, this feature can be enabled to protect sensitive data. However, since the encryption certificate is stored locally, a secured file system is necessary before TDE is turned on. In addition, the Cassandra commit log, a file to which committed data is appended, is not encrypted at all.

[MongoDB] Data files in MongoDB are never encrypted, and no method is provided to accomplish this. If encryption is needed, the application layer should encrypt the data before writing it to the database. Strong file system security is also recommended.

[Third Party Tools] To help NoSQL databases address this critical deficiency in data-at-rest security, a few third-party tools have emerged that provide transparent data encryption and the associated key management, such as Gazzang [Ref. 11], Zettaset [Ref. 12], and IBM InfoSphere Guardium [Ref. 13]. The solutions provided by Gazzang and Zettaset specifically target distributed, cloud-based NoSQL and Hadoop systems. IBM InfoSphere Guardium, on the other hand, is suitable for a wide range of RDBMS and NoSQL databases.

2. Data in Motion (Client-Node Communications)

[Cassandra] By default, client-node communication is not encrypted. After valid server certificates have been generated, SSL can be turned on by editing the corresponding settings under client_encryption_options in the cassandra.yaml file.



[MongoDB] The default distribution of MongoDB does not support SSL for client-node communication. To use SSL, it is necessary either to recompile MongoDB with the "--ssl" option or to use the MongoDB Enterprise version. Additional steps to generate keys are needed to configure the client and server for SSL communication.

3. Data in Motion (Inter-Node Communications)

[Cassandra] By default, inter-node communication is not encrypted either. If needed, the available SSL encryption options are "all" (all inter-node traffic), "dc" (between datacenters), and "rack" (between racks). Inter-node SSL communication can be configured by editing the corresponding settings under server_encryption_options in the cassandra.yaml file.

[MongoDB] Encrypted inter-node communication is not supported in MongoDB.

4. Authentication

[Cassandra] By default, the authenticator setting of basic Cassandra is AllowAllAuthenticator, which means there is essentially no authentication. The other available option is PasswordAuthenticator, in which user names and passwords (hashed but unsalted) are stored in the system_auth.credentials table. Enterprise Cassandra can additionally provide Kerberos authentication, which requires setting up separate Kerberos servers and installing Kerberos client software on all participating Cassandra hosts.

[MongoDB] Authentication is also disabled by default. Basic MongoDB does support authentication on a per-database level: users exist in the context of a single logical database. MongoDB Enterprise additionally supports Kerberos for authentication.

5. Authorization - Due to the schema-less nature of NoSQL data models, fine-grained access controls at the row or column level, as provided by RDBMS such as Oracle, are not available in current NoSQL databases. Some of them do implement a coarser form of authorization that can be enabled if needed.

[Cassandra] The default choice is AllowAllAuthorizer, which provides essentially no authorization and allows any action by any user. If CassandraAuthorizer is selected, privileged administrators can grant any of the privileges (ALTER, AUTHORIZE, CREATE, DROP, MODIFY, SELECT) on any resource (ALL KEYSPACES, KEYSPACE, TABLE) to a selected user by executing CQL (Cassandra Query Language) statements.



[MongoDB] Authorization is disabled by default. When enabled, MongoDB provides authorization on a per-database level using a role-based approach. The available roles are limited to the following: read, readWrite, dbAdmin, userAdmin, clusterAdmin, readAnyDatabase, readWriteAnyDatabase, userAdminAnyDatabase, and dbAdminAnyDatabase.
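The Cassandra authorization model in this item (a fixed set of privileges granted on a fixed set of resource kinds) lends itself to a whitelist check before any statement is issued. The sketch below is one way to do that: the helper name and the identifier pattern are assumptions, and the statement form `GRANT <privilege> ON <resource> TO <user>` is assumed from CQL 3 rather than taken from this paper. It only builds a string and does not connect to a cluster.

```javascript
// Privileges and resource kinds as listed in the Cassandra paragraph above.
const PRIVILEGES = ["ALTER", "AUTHORIZE", "CREATE", "DROP", "MODIFY", "SELECT"];
const RESOURCE_KINDS = ["ALL KEYSPACES", "KEYSPACE", "TABLE"];

// Conservative identifier pattern (an assumption of this sketch, stricter than CQL).
const IDENT = /^[A-Za-z][A-Za-z0-9_]*$/;

// Hypothetical helper: validate every part, then assemble the GRANT statement.
function buildGrant(privilege, kind, name, user) {
  if (!PRIVILEGES.includes(privilege)) throw new Error("unknown privilege");
  if (!RESOURCE_KINDS.includes(kind)) throw new Error("unknown resource kind");
  if (typeof user !== "string" || !IDENT.test(user)) throw new Error("invalid user name");

  let resource = kind;
  if (kind !== "ALL KEYSPACES") {
    if (typeof name !== "string") throw new Error("resource name required");
    // A table may be qualified as keyspace.table; validate each part separately.
    const parts = name.split(".");
    const maxParts = kind === "TABLE" ? 2 : 1;
    if (parts.length > maxParts || !parts.every((p) => IDENT.test(p))) {
      throw new Error("invalid resource name");
    }
    resource = kind + " " + parts.join(".");
  }
  return "GRANT " + privilege + " ON " + resource + " TO " + user + ";";
}
```

Rejecting anything outside the two lists keeps administrative tooling from becoming an injection path of its own.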

6. Audit - Security logging and monitoring are also required for PCI-DSS compliance (Requirement 10), to determine the "who, what, where, and when" of users accessing a data processing resource, such as a database.

[Cassandra] Auditing is available in Enterprise Cassandra as a log4j-based integration, configured on a per-node basis. To capture the maximum audit information, turning on auditing on every node is recommended. Logging can be filtered using a combination of the following categories: ADMIN, ALL, AUTH, DML, DDL, DCL, and QUERY.

[MongoDB] MongoDB is far behind in implementing the desired security logging and monitoring. Most monitoring and reporting tools currently distributed with MongoDB relate to database performance, mainly showing the running state of a MongoDB instance. Each MongoDB instance has an HTTP Console that shows information about the system and connecting clients. However, if security is not enabled for the MongoDB instance, which is the default, no authorization is needed to access this interface, resulting in a potential vulnerability.

7. Data Consistency - Because NoSQL databases share the BASE design, data inconsistency among the nodes of a cluster is inherently possible. This may explain why NoSQL databases have not yet made much headway in processing critical financial transactions. The potential data inconsistency is shown in the following series of diagrams:



Fig. 2 A user enters information into a social network site

Fig. 3 Shortly the information is updated, but hasn't been consistently replicated



Fig. 4 Read inconsistency could happen if stale data is retrieved
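The sequence in Figs. 2 to 4 can be reproduced with a toy in-memory model. The sketch below is not how any particular NoSQL database is implemented; the class names, the three-replica cluster, and the explicit replicate() step are all assumptions chosen to make the stale read visible.

```javascript
// Three in-memory replicas of one value; names and values are hypothetical.
class Replica {
  constructor() {
    this.value = "old status";
  }
}

class Cluster {
  constructor(n) {
    this.replicas = Array.from({ length: n }, () => new Replica());
    this.pending = []; // replication work that has not been applied yet
  }

  // Acknowledge the write after updating a single replica.
  write(value) {
    this.replicas[0].value = value;
    for (let i = 1; i < this.replicas.length; i++) {
      this.pending.push({ index: i, value });
    }
  }

  // Read from one replica only.
  read(index) {
    return this.replicas[index].value;
  }

  // Apply the queued updates: the "eventually" in eventual consistency.
  replicate() {
    for (const { index, value } of this.pending) {
      this.replicas[index].value = value;
    }
    this.pending = [];
  }
}

const cluster = new Cluster(3);
cluster.write("new status");       // Fig. 2: the user enters information
const fresh = cluster.read(0);     // the replica that took the write
const stale = cluster.read(2);     // Fig. 4: a replica not yet updated
cluster.replicate();               // replication catches up
const converged = cluster.read(2); // all replicas now agree
```

Between write() and replicate(), which value a reader sees depends entirely on which replica answers.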

Since NoSQL databases do not guarantee strong data consistency, it usually falls on developers to design applications that can work with the eventual consistency model, and to weigh tradeoffs between data consistency and performance impact. Cassandra does provide a range of configurable write and read consistency levels (CL) to meet particular application needs, as shown in Fig. 5 [Ref. 14].



Fig. 5 Configurable write and read consistency levels available in Cassandra
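One commonly cited rule for choosing among such levels is that a read is guaranteed to see the latest acknowledged write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (RF). The sketch below models only three levels (ONE, QUORUM, ALL); the function names are hypothetical, and the remaining levels shown in Fig. 5 are deliberately left out.

```javascript
// Number of replicas that must respond for a consistency level,
// given replication factor rf. Only three levels are modeled here.
function requiredReplicas(level, rf) {
  switch (level) {
    case "ONE":
      return 1;
    case "QUORUM":
      return Math.floor(rf / 2) + 1;
    case "ALL":
      return rf;
    default:
      throw new Error("unsupported level: " + level);
  }
}

// Reads overlap the latest acknowledged write when R + W > RF.
function isStronglyConsistent(writeLevel, readLevel, rf) {
  return requiredReplicas(writeLevel, rf) + requiredReplicas(readLevel, rf) > rf;
}
```

With a replication factor of 3, QUORUM writes paired with QUORUM reads satisfy the rule (2 + 2 > 3), while ONE paired with ONE does not (1 + 1 < 3). The second combination is faster, and it is the one that permits the stale read shown in Fig. 4.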

8. NoSQL Injection Exploits

Just like their traditional RDBMS counterparts, NoSQL databases are susceptible to injection attacks, especially those that rely heavily on server-side JavaScript and PHP to enhance database performance. Take MongoDB for example: its internal operator "$where", designed to be used as a filter like the "where" clause in SQL, can also take sophisticated JavaScript functions to filter data. An attacker can thus pass arbitrary code or commands into the $where operator as part of the query. Other vulnerable MongoDB operations include db.eval(), mapReduce, and group, which also allow arbitrary JavaScript expressions to run on the server. The next release of the Open Web Application Security Project (OWASP) Testing Guide (v4), currently still in draft, is to include new procedures for testing NoSQL injection [Ref. 15]. Although the draft test uses MongoDB as the example target, other NoSQL databases built upon JavaScript and/or PHP engines may have similar vulnerabilities. Typically, NoSQL injection attacks execute where the attack string is parsed, evaluated, or concatenated into a NoSQL API call. Attackers just need to be familiar with the syntax, data model, and underlying programming language of the target database in order to design specific exploits. The following examples demonstrate how JavaScript NoSQL injections can be crafted against a vulnerable MongoDB instance.

[JavaScript NoSQL Injection #1] To demonstrate a potential NoSQL injection against MongoDB, consider the following two valid, equivalent JavaScript statements that retrieve the documents meeting the (credits < debits) condition:

    1. db.myCollection.find( { $where: "this.credits < this.debits" } );

    2. db.myCollection.find( { $where: function() { return obj.credits - obj.debits < 0; } } );

If a dynamic threshold that takes user input is desired, the second statement can be rewritten as follows:

    3. db.myCollection.find( { $where: function() { return obj.credits - obj.debits < $userInput; } } );

This may expose a vulnerability in which an attacker sets the $userInput variable to arbitrary code, such as:

    $userInput = "0;var date=new Date(); do{curDate = new Date();}while(curDate-date<10000)"

If a sanitization check fails to screen the $userInput value, then upon concatenation the third statement takes the following form, which could trigger a DoS attack and cause the MongoDB instance to run at 100% CPU usage for 10 seconds:

    4. db.myCollection.find( { $where: function() { return obj.credits - obj.debits < 0;var date=new Date(); do{curDate = new Date();}while(curDate-date<10000); } } );
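The sanitization check mentioned above could take roughly the following form. This is a minimal sketch, not a complete defense: the function names are hypothetical, and it assumes the threshold is meant to be a plain decimal number, so anything else is rejected before it can be concatenated into the $where expression.

```javascript
// Accept only a plain decimal number; anything else is rejected before it
// can reach a query string.
function parseThreshold(input) {
  const text = String(input).trim();
  if (!/^-?\d+(\.\d+)?$/.test(text)) {
    throw new Error("threshold must be a number");
  }
  return Number(text);
}

// Build the $where filter from the validated number only.
function buildWhereFilter(input) {
  const threshold = parseThreshold(input);
  return { $where: "this.credits - this.debits < " + threshold };
}
```

With this check in place, the attack string from the example above throws an error instead of becoming part of statement 4, because it contains characters other than digits.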

[JavaScript NoSQL Injection #2] If developers are not careful about secure coding, an attacker can also pass malicious code directly through a malformed URL [Ref. 16]. The following is a generic JavaScript query function that performs a search based on a provided 'year' criterion, input_value.

    function() {
        var search_year = input_value;
        return this.publicationYear == search_year ||
               this.filmingYear == search_year ||
               this.recordingYear == search_year;
    }

The application developer may write this application in PHP, and the source code that builds the above function, before it is passed to a MongoDB instance, might look like the following:

    // VULNERABLE: the request parameter is concatenated directly into JavaScript source.
    $query = 'function() {var search_year = \'' . $_GET['year'] . '\';' .
        'return this.publicationYear == search_year || ' .
        'this.filmingYear == search_year || ' .
        'this.recordingYear == search_year;}';
    $cursor = $collection->find(array('$where' => $query));

This code builds the function ad hoc by concatenating the value of the request parameter "year" and then passes it to MongoDB. It is vulnerable to a server-side JavaScript injection attack. For example, an attacker could craft the following URL to cause an effective DoS attack against the system:

    http://server/app.php?year=1995';while(1);var%20foo='bar

Conclusions

Based on this open-source research, the following conclusions can be drawn:

1. NoSQL databases are desirable and popular among Web-based companies because of their demonstrated advantages in data flexibility, scalability, and performance.

2. NoSQL security in general still needs improvement. Only a few NoSQL databases (e.g., Cassandra) currently meet the data security requirements of PCI-DSS, e.g., for data at rest and data in motion. However, enhanced security is expected to come at the expense of performance.

3. More server-side JavaScript injection vulnerabilities are expected in NoSQL databases, because many of them run JavaScript engines to achieve high performance.



4. When working with NoSQL databases, application developers bear much greater responsibility for ensuring reliable transactions and data consistency. They also have to adhere more closely to the standards and practices of secure coding.

5. Relational and NoSQL databases are best co-deployed to process different data flows, combining the strengths of both families.
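As one illustration of the secure coding practices called for in conclusion 4, the year search from Injection #2 can be written without server-side JavaScript at all. The sketch below is a hedged example, not the remediation prescribed by any cited source: the function names are hypothetical, it assumes the three year fields are stored as numbers, and it only builds the filter object that would be passed to a find() call.

```javascript
// Validate the request parameter as a four-digit year.
function parseYear(raw) {
  if (typeof raw !== "string" || !/^\d{4}$/.test(raw)) {
    throw new Error("year must be four digits");
  }
  return parseInt(raw, 10);
}

// Build a structured filter; the value is treated as data, never as JavaScript source.
function buildYearFilter(raw) {
  const year = parseYear(raw);
  return {
    $or: [
      { publicationYear: year },
      { filmingYear: year },
      { recordingYear: year },
    ],
  };
}
```

Because the filter uses the $or operator with plain equality conditions, there is no JavaScript string for an attacker to break out of, and the malformed year value from the earlier URL is rejected by parseYear.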



Author

Dr. Ming-Shih Wong is a Senior Cyber Security Engineer for Incident Response & Forensics at Fidelis Cybersecurity Solutions. He has years of experience conducting advanced projects for Air Force Intelligence, Surveillance and Reconnaissance (AFISR) and the Defense Advanced Research Projects Agency (DARPA). He has also participated in several major commercial data-breach incident response and remediation engagements involving noteworthy companies in the payment card and technology industries. As a member of the Fidelis First Response team, he collaborates with forensics and SOC experts to optimize search strategies for investigating key indicators of compromise (IOC), using advanced tools such as Splunk, ArcSight, open-source log analyzers, and customized databases. He has also extended his database management foundation to cover PCI security standards and compliance, including data security at rest and in motion. He provides expertise in the following data analytics and remediation areas: database security management, breach-indicator investigation, PCI data security standards and compliance, and network and host intrusion detection.

References

1. L. Okman, N. Gal-Oz, Y. Gonen, E. Gudes, and J. Abramov, "Security Issues in NoSQL Databases", 2011 International Joint Conference of IEEE TrustCom-11/IEEE ICESS-11/FACT-11.

2. http://en.wikipedia.org/wiki/ACID

3. http://nosql-database.org/

4. http://en.wikipedia.org/wiki/NoSQL

5. "The Four Categories of NoSQL", http://rebelic.nl/2011/05/28/the-four-categories-of-nosql-databases

6. "Netflix Relies on NoSQL", http://www.dataversity.net/netflix-relies-on-nosql/

7. "Hosting Service MongoHQ Suffers Major Security Breach That Explains Buffer's Hack Over the Weekend", http://techcrunch.com/2013/10/29/hosting-service-mongohq-suffers-major-security-breach-that-explains-buffers-hack-over-the-weekend/

8. "LinkedIn Suffers Data Breach", http://www.reuters.com/article/2012/06/06/net-us-linkedin-breach-idUSBRE85511820120606

9. "DataStax Enterprise 3.2 Documentation", http://www.datastax.com/docs/datastax_enterprise3.2/index

10. "MongoDB Security Introduction", http://docs.mongodb.org/manual/core/security-introduction

11. "Data Encryption and Key Management for the Cloud", http://www.gazzang.com/solutions/cloud-security

12. "Hadoop Strict Encryption for (Big) Data-At-Rest", http://www.drdobbs.com/tools/hadoop-strict-encryption-for-big-data-at/240165149

13. "NoSQL Does Not Have to Mean No Security - Data security and compliance best practices for NoSQL data systems", http://public.dhe.ibm.com/common/ssi/ecm/en/nib03019usen/NIB03019USEN.PDF

14. "Cassandra Replication & Consistency", http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replicatio-and-consistency

15. "Testing for NoSQL Injection", https://www.owasp.org/index.php/Testing_for_NoSQL_injection

16. "Server-Side JavaScript Injections", http://media.blackhat.com/bh-us-11/Sullivan/BH_US_11_Sullivan_Server_Side_WP.pdf
