NoSQL continued
CMSC 461
Michael Wilson

MongoDB
MongoDB is another NoSQL solution
Provides a bit more structure than a solution like Accumulo
Data is stored as BSON (Binary JSON)
  Binary-encoded JSON, extends JSON
Allows storage of large amounts of data

SQL vs. MongoDB
SQL has databases, tables, rows, columns
Mongo has databases, collections, documents, fields
Both have primary keys, indexes
Collection structures are not heavily enforced
  Inserts automatically create whatever structure is needed; there is no schema to declare up front
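
To make the lack of an enforced structure concrete, the sketch below models a collection as a plain JavaScript array of documents. It is only an illustration and does not talk to MongoDB; the `students` collection and its field names are made up for this example. The `_id` field plays the primary key role.

```javascript
// A "collection" modeled as an array of documents (plain objects).
// Hypothetical data: the collection name and fields are illustrative only.
const students = [];

// Documents in the same collection need not share the same fields.
students.push({ _id: 1, name: "Ada", major: "CMSC" });
students.push({ _id: 2, name: "Grace", credits: 90, clubs: ["chess"] });

// Collect the field names that each document actually carries.
const fieldsPerDocument = students.map((doc) => Object.keys(doc).sort());

console.log(fieldsPerDocument);
```

A SQL table would reject the second row unless the table had been altered first; here both documents sit side by side.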

Interacting with MongoDB
Multiple databases within MongoDB
  Switch databases
    use newDb
  New databases are only stored after the first insert
Create collection
  db.createCollection("collectionName")
  Not necessary, collections are implicitly created on insert

BSON
MongoDB uses BSON very heavily
Binary JSON
  Like JSON with a binary serialization method
  Has extensions so that it can represent data types that JSON cannot
Used to represent documents, provide input to queries
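
One data type that plain JSON cannot represent is a date, which BSON does support. The sketch below shows the loss in plain JavaScript: it does not use a BSON library, and the object and its field names are made up for illustration.

```javascript
// JSON has no date type: a Date is serialized as a string and
// is still a string after a round trip.
const original = { name: "exam", when: new Date(0) };
const roundTripped = JSON.parse(JSON.stringify(original));

console.log(original.when instanceof Date); // true
console.log(typeof roundTripped.when);      // "string"
```

A format with a native date type can hand back a date rather than text that the application has to parse again.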

Selects/queries
In MongoDB, querying typically consists of providing an appropriately crafted BSON document
SELECT * FROM collectionName
  db.collectionName.find()
SELECT * FROM collectionName WHERE field = value
  db.collectionName.find( {field: value} )
SELECT * FROM collectionName WHERE field > 5
  db.collectionName.find( {field: {$gt: 5} } )
Other functions that take a query argument expect queries formatted the same way
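
The way a query document selects documents can be sketched with a small in-memory matcher. This is not how MongoDB is implemented, and it covers only the two query shapes shown above; `matches`, `find`, and the `items` data are hypothetical names introduced for this example.

```javascript
// Minimal in-memory matcher for two query shapes from the slides:
//   {field: value}       equality
//   {field: {$gt: n}}    greater-than
// Illustrative only; real MongoDB supports many more operators.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const isGtCondition =
      condition !== null && typeof condition === "object" && "$gt" in condition;
    if (isGtCondition) {
      return doc[field] > condition.$gt;
    }
    return doc[field] === condition;
  });
}

// Hypothetical stand-in for db.collectionName.find(query).
function find(collection, query = {}) {
  return collection.filter((doc) => matches(doc, query));
}

const items = [
  { name: "a", field: 3 },
  { name: "b", field: 5 },
  { name: "c", field: 8 },
];

console.log(find(items));                        // no query: every document
console.log(find(items, { field: 5 }));          // equality
console.log(find(items, { field: { $gt: 5 } })); // greater-than
```

Because the query is itself a document, the same object can be passed unchanged to update or remove operations.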

Interacting with MongoDB
Insert
  db.collectionName.insert( {documentBSON} )
Update
  db.collectionName.update( {queryBSON}, {updateBSON}, {optionBSON} )
  updateBSON
    Set field to 5: {$set: {field: 5}}
    Increment field by 1: {$inc: {field: 1}}
  optionBSON
    Options that determine whether or not to create new documents (upsert), update more than one document (multi), write concerns (writeConcern)
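
The effect of the two update operators can be sketched on a plain JavaScript object. This is an illustration only: `applyUpdate` is a hypothetical helper, it handles just `$set` and `$inc`, and it ignores the options document entirely.

```javascript
// Applies the two update operators from the slides to a single document.
// Illustrative only: it mutates a plain object and ignores all options.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set ?? {})) {
    doc[field] = value; // $set overwrites the field or creates it
  }
  for (const [field, amount] of Object.entries(update.$inc ?? {})) {
    doc[field] = (doc[field] ?? 0) + amount; // a missing field starts from 0
  }
  return doc;
}

const doc = { field: 1, visits: 2 };
applyUpdate(doc, { $set: { field: 5 } }); // field becomes 5
applyUpdate(doc, { $inc: { field: 1 } }); // field becomes 6

console.log(doc);
```

Both operators touch only the named fields, so `visits` is left untouched.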

Interacting with MongoDB
Delete
  db.collectionName.remove( {queryBSON} )

Apache Hive
Also runs on Hadoop, uses HDFS as a data store
Queryable like SQL
  Using an SQL-inspired language, HiveQL

Hive data organization
Databases
Tables
Partitions
  Tables are broken down into partitions
  Partition keys allow data to be stored into separate data files on HDFS
  Can query on particular partitions
Buckets
  Can bucket by column to sample data
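
Why partitioning lets a query read less data can be sketched in plain JavaScript. This is a conceptual model, not Hive code: the rows, the `day` partition key, and the `partitionBy` helper are all made up for this example, and each in-memory group stands in for a separate set of data files on HDFS.

```javascript
// Hypothetical table rows; "day" is the assumed partition key.
const rows = [
  { day: "2024-01-01", user: "a", bytes: 10 },
  { day: "2024-01-01", user: "b", bytes: 20 },
  { day: "2024-01-02", user: "a", bytes: 30 },
];

// Group rows by partition key value, one group per partition.
function partitionBy(tableRows, key) {
  const partitions = new Map();
  for (const row of tableRows) {
    const name = `${key}=${row[key]}`; // key=value style partition name
    if (!partitions.has(name)) partitions.set(name, []);
    partitions.get(name).push(row);
  }
  return partitions;
}

const partitions = partitionBy(rows, "day");

// A query restricted to one partition reads only that partition's rows.
const scanned = partitions.get("day=2024-01-02") ?? [];

console.log([...partitions.keys()]);
console.log(scanned);
```

A query filtered on `day` scans one group instead of the whole table, which is the saving that partition keys are meant to provide.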

Purpose of Hive
Provide analytics, query large volumes of data
NOT to be used for real-time queries like Postgres or Oracle
Hive queries are slow compared to those databases
  Partitions and buckets can help reduce query time

Hive queries
Hive queries actually generate MapReduce jobs
MapReduce jobs take a while to set up and run
MapReduce jobs can be written and run manually, but for structured data and analytics, Hive can be used instead
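
The kind of work a generated job performs can be sketched as map, shuffle, and reduce steps in plain JavaScript. This shows the general MapReduce pattern for a query shaped like SELECT word, COUNT(*) FROM words GROUP BY word; it is not the plan Hive actually produces, and the `records` data is made up for illustration.

```javascript
// Hypothetical input: one record per row of a single-column "words" table.
const records = ["hive", "mongo", "hive", "hdfs", "hive"];

// Map: emit a (key, 1) pair for every record.
const mapped = records.map((word) => [word, 1]);

// Shuffle: group the emitted values by key.
const grouped = new Map();
for (const [key, value] of mapped) {
  if (!grouped.has(key)) grouped.set(key, []);
  grouped.get(key).push(value);
}

// Reduce: sum the values collected for each key.
const counts = {};
for (const [key, values] of grouped) {
  counts[key] = values.reduce((sum, v) => sum + v, 0);
}

console.log(counts);
```

Writing these phases by hand for every query is the manual route; HiveQL lets the same result be requested in one SQL-like statement.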
