vivian-s-zhang
This was a NYC Open Data Meetup event. All credit goes to Kannan and Roman. Event link: http://www.meetup.com/NYC-Open-Data/events/141123082/ Blog post: http://www.nycopendata.com/2014/02/11/mongodb/
MONGODB WORKSHOP
{
  meetup: "NYC Open Data",
  presenters: ["Kannan Sankaran", "Roman Kubiak"],
  host: "Vivian",
  location: "ThoughtWorks",
  audience: "You guys"
}
MONGODB WORKSHOP
{
  meetup: "NYC Open Data",
  presenters: ["Kannan Sankaran", "Roman Kubiak"],
  host: "Vivian is awesome, THANK YOU",
  location: "ThoughtWorks is awesome, THANK YOU",
  audience: "You guys are awesome, THANK YOU"
}
OUR TOPICS
OVERVIEW OF DATABASES
WHAT IS MONGODB?
MONGODB, NOSQL, AND RELATIONAL DATABASES
A PEEK AT MONGODB COMMANDS
SHARDING AND REPLICATION IN MONGODB
FUTURE OF MONGODB AND US
DEMO
WORKSHOP
ARCHITECT
MONGO PIE
OVERVIEW OF DATABASES
ROWS
COLUMNS
TABLES
ORGANIZING DATA
DATA SPREAD OUT IN VARIOUS TABLES
DATA MAY BE RELATED
1970s 1980s 1990s 2000s 2007
RELATIONAL DATABASES (RDBMS) CREATED
CLIENT/SERVER MODEL
STRUCTURED QUERY LANGUAGE (SQL) CREATED
RDBMS CONTINUE TO BE POPULAR
INTERNET ARRIVES
INTERNET GROWS
NoSQL DATABASES EMERGE
MONGODB CREATED
DATABASES AND THEIR GROWTH
WHAT IS NoSQL?
A TWITTER HASHTAG: #nosql
NOSQL GENERALLY REFERS TO DATABASES THAT DO NOT HAVE
A FIXED ROW-COLUMN DATA ORGANIZATION STRUCTURE.
WHAT IS MONGODB?
A HUMONGOUS NoSQL DB
A HUMONGOUS NoSQL DB WHERE DATA IS ORGANIZED BY
DOCUMENTS, NOT ROWS
COLLECTIONS, NOT TABLES
WHAT IS A DOCUMENT?
A DOCUMENT IS LIKE A ROW…
{
  _id: ObjectID("12AB34CD56EF"),
  name: "Ed Brown",
  orderDate: "2-1-2014"
}
…BUT IT IS MORE FLEXIBLE
{
  _id: ObjectID("12AB34CD56EF"),
  name: "Ed Brown",
  orderDate: "2-1-2014",
  payments: {
    car: "100.50",
    hotel: "200"
  }
}
THAT LOOKS LIKE A DOCUMENT WITHIN ANOTHER DOCUMENT!
{
  _id: ObjectID("12AB34CD56EF"),
  name: "Ed Brown",
  orderDate: "2-1-2014",
  payments: {
    car: "100.50",
    hotel: "200"
  },
  tags: ["shirt", "tie"]
}
WHAT IS THIS? MULTIPLE VALUES WITHIN A COLUMN?
HOW LARGE CAN THIS DOCUMENT BE?
{
  _id: ObjectID("12AB34CD56EF"),
  name: "Ed Brown",
  orderDate: "2-1-2014",
  payments: {
    car: "100.50",
    hotel: "200"
  }
  …
}
UP TO 16 MB
LEO TOLSTOY'S 1225-PAGE BOOK, WAR AND PEACE, CAN FIT IN 1 DOCUMENT, AS IT IS ONLY AROUND 3 MB.
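A rough way to check whether data fits under the 16 MB limit is to measure its serialized size. The sketch below is plain Node.js, not a MongoDB API. It measures the JSON text size, which only approximates the BSON size MongoDB actually stores. The helper names `approximateSizeBytes` and `fitsInOneDocument` and the sample data are hypothetical.

```javascript
// 16 MB document limit quoted on the slide, in bytes.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Approximate a document's size by the UTF-8 length of its JSON text.
// Assumption: JSON size is only a stand-in for the real BSON size.
function approximateSizeBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), 'utf8');
}

// True when the approximate size is within the limit.
function fitsInOneDocument(doc) {
  return approximateSizeBytes(doc) <= MAX_DOCUMENT_BYTES;
}

const order = {
  name: 'Ed Brown',
  orderDate: '2-1-2014',
  payments: { car: '100.50', hotel: '200' },
};

// A filler string of about 3 MB, the size the slide quotes for the book.
const longBook = { title: 'War and Peace', text: 'x'.repeat(3 * 1024 * 1024) };

console.log(fitsInOneDocument(order));    // true
console.log(fitsInOneDocument(longBook)); // true
```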
ISN'T THAT JSON?
WELL, ALMOST!
WHAT IS JSON?
WEB SERVER
MONGODB DATABASE
{
  "vehicle": "Chevy Malibu 2014",
  "price": { "min": 22340, "max": 29950 },
  "citympg": 25
}
{
  "make": "Chevy",
  "model": "Malibu",
  "year": 2014
}
WHAT IS JSON?
{
  vehicle: "car",
  make: "Malibu",
  color: "blue"
}
JAVASCRIPT OBJECT NOTATION
NAME-VALUE PAIRS
{
  name: "Kannan",
  gender: "male",
  favorites: {
    color: "blue"
  },
  interests: ["MongoDB", "R"]
}
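To make the name-value idea concrete, here is a minimal sketch in plain JavaScript (Node.js) that turns an object like the one above into JSON text and back. It uses only the built-in `JSON.stringify` and `JSON.parse`; the variable names are illustrative.

```javascript
// A JavaScript object made of name-value pairs, as on the slide.
const person = {
  name: 'Kannan',
  gender: 'male',
  favorites: { color: 'blue' },
  interests: ['MongoDB', 'R'],
};

// Serialize to JSON text, the form a web server could send or receive.
const jsonText = JSON.stringify(person);

// Parse the text back into an object.
const parsed = JSON.parse(jsonText);

console.log(jsonText);
console.log(parsed.favorites.color); // blue
console.log(parsed.interests[1]);    // R
```

Note that JSON text always quotes the names, while JavaScript object literals and the mongo shell allow them unquoted.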
MONGODB DOCUMENT
{
  _id: ObjectID("12AB34CD56EF"),
  name: "Kannan",
  gender: "male",
  favorites: {
    color: "blue"
  },
  interests: ["MongoDB", "R"],
  date: new Date()
}
WHAT IS A COLLECTION?
A GROUP OF DOCUMENTS
SIMILAR
{
  _id: ObjectID("12AB34CD56EF"),
  name: "Ed Brown",
  orderDate: "2-1-2014"
}
{
  _id: ObjectID("78AB34CD56EF"),
  name: "Roman Ku",
  orderDate: "2-1-2014"
}
{
  _id: ObjectID("56AB34CD56EF"),
  name: "Eva Green",
  orderDate: "2-1-2014"
}
DIFFERENT
{
  _id: ObjectID("34AB34CD56EF"),
  name: "Ed Brown",
  orderDate: "2-1-2014",
  tags: ["shirt", "tie"]
}
{
  _id: ObjectID("90AB34CD56EF"),
  name: "Roman Ku",
  orderDate: "2-1-2014",
  payments: { car: "100.50", hotel: "200" }
}
{
  _id: ObjectID("13AB34CD56EF"),
  name: "Eva Green",
  orderDate: "2-1-2014"
}
VERY DIFFERENT
{
  _id: ObjectID("35AB34CD56EF"),
  name: "Ed Brown",
  orderDate: "2-1-2014"
}
{
  _id: ObjectID("79AB34CD56EF"),
  vehicle: "car",
  make: "Malibu",
  color: "blue"
}
{
  _id: ObjectID("57AB34CD56EF"),
  name: "Eva Green",
  orderDate: "2-1-2014",
  tags: ["shirt", "tie"]
}
MONGODB IS...
A DOCUMENT-ORIENTED NOSQL DATABASE WHERE DATA CONSISTS OF
DOCUMENTS STORED IN COLLECTIONS.
MONGODB FEATURES
EASY TO LEARN
DYNAMIC QUERY LANGUAGE
- SEARCH BY FIELDS, REGULAR EXPRESSIONS
- USER-DEFINED JAVASCRIPT FUNCTIONS
- AGGREGATION, INCLUDING MAP/REDUCE
INDEXING: SINGLE, COMPOUND, GEOSPATIAL
REPLICATION
LOAD BALANCING USING SHARDING
GRIDFS TO STORE FILES
MONGODB USAGE
CONTENT MANAGEMENT SYSTEMS
E-COMMERCE WEBSITES
LOG DATA AND HIERARCHICAL AGGREGATION
REAL-TIME ANALYTICS
MONGODB, NOSQL, AND RELATIONAL DATABASES
1970s 1980s 1990s 2000s 2007
BERKELEY INGRES
ORACLE
INFORMIX
DB2
SYBASE
SQL SERVER
MS ACCESS
POSTGRESQL
MYSQL
NETEZZA
GREENPLUM
VERTICA
MARIADB
MONGODB
DATABASE MANAGEMENT SYSTEMS
MOST SYSTEMS USE SOME FLAVOR OF SQL
RELATIONAL DATABASES WERE, AND STILL ARE, THE DE FACTO STANDARD IN SEVERAL COMPANIES.
RELATIONAL DATABASE FEATURES
C.R.U.D. OPERATIONS
STRUCTURED QUERY LANGUAGE (SQL)
FIXED DATABASE SCHEMAS
NORMALIZATION
REFERENTIAL INTEGRITY (E.G. FOREIGN KEYS, CONSTRAINTS)
JOINS
TRANSACTIONS - A.C.I.D. PROPERTIES
INDEXES
IN THE LATE 90s/EARLY 2000s…
DOT COM BUBBLE
DOT COM BUST
WEB SERVICES
SOCIAL NETWORKS
GOOGLE, AMAZON
COMPUTER OWNERS/USERS
WEBSITE DATA COLLECTION
DATABASE SIZES
COMPUTING/STORAGE RESOURCES BECAME A CHALLENGE FOR THEN-SMALLER COMPANIES LIKE GOOGLE AND AMAZON THAT HAD LOTS OF DATA.
SCALE UP
MORE DISK SPACE
MORE RAM
MORE PROCESSORS
MORE EXPENSIVE
SINGLE POINT OF FAILURE
HARDWARE HAS LIMITS!
BIGGER MACHINE
SCALE OUT
LESS DISK SPACE
LESS RAM
FEWER PROCESSORS
LESS EXPENSIVE
NO SINGLE POINT OF FAILURE
HIGHER RELIABILITY DESPITE FAILURE OF INDIVIDUAL MACHINES
SMALLER MACHINES
RELATIONAL DATABASES WERE DESIGNED TO OPERATE ON A
SINGLE MACHINE, AND SCALING OUT MEANT A LOT OF
CHALLENGES.
SPLITTING DATA FOR SCALE OUT
BY COLUMNS
BY ROWS
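The two ways of splitting can be sketched in plain JavaScript (Node.js). This is an illustration only: the sample rows, the function names `splitByColumns` and `splitByRows`, and the round-robin row assignment are all assumptions, not how any particular database does it.

```javascript
// Hypothetical sample rows; the column names echo the WordPress example.
const rows = [
  { ID: 1, post_author: 'Amy W', post_date: '1/1/2014' },
  { ID: 2, post_author: 'Ed Brown', post_date: '1/5/2014' },
  { ID: 3, post_author: 'Eva Green', post_date: '1/9/2014' },
  { ID: 4, post_author: 'Roman Ku', post_date: '1/12/2014' },
];

// Split by columns: each machine keeps the key plus a subset of columns.
function splitByColumns(data, key, columnGroups) {
  return columnGroups.map((columns) =>
    data.map((row) => {
      const part = { [key]: row[key] };
      for (const column of columns) part[column] = row[column];
      return part;
    })
  );
}

// Split by rows: each machine keeps whole rows, assigned round-robin.
function splitByRows(data, machineCount) {
  const machines = Array.from({ length: machineCount }, () => []);
  data.forEach((row, i) => machines[i % machineCount].push(row));
  return machines;
}

const byColumns = splitByColumns(rows, 'ID', [['post_author'], ['post_date']]);
const byRows = splitByRows(rows, 2);

console.log(byColumns[0][0]);            // { ID: 1, post_author: 'Amy W' }
console.log(byRows[1].map((r) => r.ID)); // [ 2, 4 ]
```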
WORDPRESS MYSQL SCHEMA WITH 2 TABLES
A JOIN QUERY IN MYSQL
WP_POSTS
WP_COMMENTS
SELECT p.post_author, p.post_date, c.comment_author, c.comment_date
FROM wp_posts AS p
INNER JOIN wp_comments AS c ON p.ID = c.comment_post_ID
WHERE p.ID = 1;
A JOIN QUERY IN MYSQL
WP_POSTS
WP_COMMENTS
RESULT
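What the inner join computes can be sketched in plain JavaScript (Node.js). The rows below are hypothetical sample data; only the table and column names come from the query. The function `joinPostWithComments` is an illustrative nested-loop join, not how MySQL executes the query.

```javascript
// Hypothetical rows; only the column names come from the query.
const wpPosts = [
  { ID: 1, post_author: 'Amy W', post_date: '1/1/2014' },
  { ID: 2, post_author: 'Ed Brown', post_date: '1/5/2014' },
];
const wpComments = [
  { comment_post_ID: 1, comment_author: 'bestguy', comment_date: '1/1/2014' },
  { comment_post_ID: 1, comment_author: 'baddie', comment_date: '1/10/2014' },
  { comment_post_ID: 2, comment_author: 'clever24', comment_date: '1/11/2014' },
];

// Inner join on p.ID = c.comment_post_ID, filtered to a single post ID.
function joinPostWithComments(posts, comments, postId) {
  const result = [];
  for (const p of posts) {
    if (p.ID !== postId) continue;
    for (const c of comments) {
      if (c.comment_post_ID !== p.ID) continue;
      result.push({
        post_author: p.post_author,
        post_date: p.post_date,
        comment_author: c.comment_author,
        comment_date: c.comment_date,
      });
    }
  }
  return result;
}

const joined = joinPostWithComments(wpPosts, wpComments, 1);
console.log(joined.length);            // 2
console.log(joined[0].comment_author); // bestguy
```

Each output row repeats the post's columns once per matching comment, which is the shape a SQL join returns.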
SCALE OUT DATA BY ROWS
WP_POSTS: A, B
WP_COMMENTS: C, D
HOW COMPLICATED WOULD SCALING THIS BE?
JOINS MAY GET REALLY MESSY WITH MANY MACHINES
(DISTRIBUTED JOINS)
TRANSACTIONS
WP_POSTS
WP_COMMENTS
BEGIN TRANSACTION
TRY
    DELETE FROM wp_comments WHERE comment_post_ID = 1;
    DELETE FROM wp_posts WHERE ID = 1;
    COMMIT TRANSACTION
CATCH
    IF ERROR THEN ROLLBACK TRANSACTION
END TRANSACTION
MUST SATISFY A.C.I.D. PROPERTIES
TRANSACTIONS MAY TAKE A LONG TIME TO EXECUTE IF DATA
IS ON DIFFERENT MACHINES (DISTRIBUTED TRANSACTIONS)
TO SPLIT THE DATA, A WHOLE BUNCH OF COMPROMISES
MUST BE MADE IN RELATIONAL DATABASES
THIS GAVE RISE TO NON-RELATIONAL SOLUTIONS
GOOGLE
AMAZON
NoSQL SYSTEM CHARACTERISTICS
C.R.U.D. OPERATIONS
STRUCTURED QUERY LANGUAGE (SQL)
FIXED DATABASE SCHEMAS
NORMALIZATION
REFERENTIAL INTEGRITY (E.G. FOREIGN KEYS, CONSTRAINTS)
JOINS
TRANSACTIONS – LIMITED A.C.I.D. PROPERTIES
INDEXES
OPEN SOURCE
HOW IS THIS SCALABILITY ACHIEVED IN MONGODB?
STACKING THE DATA
STACKING THE DATA
WP_POSTS
WP_COMMENTS
NO NEED TO JOIN
{
  _id: 1,
  post_author: "Amy W",
  post_date: "1/1/2014",
  comments: [
    { comment_author: "bestguy", comment_date: "1/1/2014" },
    { comment_author: "baddie", comment_date: "1/10/2014" },
    { comment_author: "clever24", comment_date: "1/11/2014" }
  ]
}
NOW, EACH DOCUMENT CAN BE IN A DIFFERENT MACHINE
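The "stacking" step can be sketched as a small transformation in plain JavaScript (Node.js): rows from two tables go in, one self-contained document per post comes out. The input rows and the function name `embedComments` are hypothetical; the output shape follows the document shown above.

```javascript
// Hypothetical relational-style rows.
const postRows = [{ ID: 1, post_author: 'Amy W', post_date: '1/1/2014' }];
const commentRows = [
  { comment_post_ID: 1, comment_author: 'bestguy', comment_date: '1/1/2014' },
  { comment_post_ID: 1, comment_author: 'baddie', comment_date: '1/10/2014' },
  { comment_post_ID: 1, comment_author: 'clever24', comment_date: '1/11/2014' },
];

// Build one document per post, with its comments embedded as an array.
function embedComments(posts, comments) {
  return posts.map((p) => ({
    _id: p.ID,
    post_author: p.post_author,
    post_date: p.post_date,
    comments: comments
      .filter((c) => c.comment_post_ID === p.ID)
      .map((c) => ({
        comment_author: c.comment_author,
        comment_date: c.comment_date,
      })),
  }));
}

const documents = embedComments(postRows, commentRows);
console.log(JSON.stringify(documents[0], null, 2));
```

Because each document carries everything needed to display a post, reading it requires no lookup on another machine.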
WHAT ABOUT TRANSACTIONS?
MONGODB DOES NOT SUPPORT MULTI-DOCUMENT TRANSACTIONS (AS OF THIS 2014 TALK; THEY WERE ADDED LATER, IN VERSION 4.0)
BUT SINGLE DOCUMENT UPDATE IS ATOMIC
{
  _id: 1,
  post_author: "Amy W",
  post_date: "1/1/2014",
  comments: [
    { comment_author: "bestguy", comment_date: "1/1/2014" },
    { comment_author: "baddie", comment_date: "1/10/2014" },
    { comment_author: "clever24", comment_date: "1/11/2014" }
  ]
}
THE KEY IS TO FOCUS ON THE DATA MODEL
MONGODB CHARACTERISTICS
C.R.U.D. OPERATIONS
STRUCTURED QUERY LANGUAGE (SQL) → DYNAMIC QUERY LANGUAGE
FIXED DATABASE SCHEMAS → FLEXIBLE DATABASE SCHEMAS
NORMALIZATION
REFERENTIAL INTEGRITY (E.G. FOREIGN KEYS, CONSTRAINTS)
JOINS
TRANSACTIONS – LIMITED A.C.I.D. PROPERTIES
INDEXES
OPEN SOURCE
WHEN NOT TO USE MONGODB
IF TRANSACTIONS ARE A MUST
IF JOINS ARE ABSOLUTELY NECESSARY
SOFTWARE PRODUCTS LIKE WORDPRESS THAT ALREADY HAVE TONS OF SUPPORT FOR RELATIONAL DATABASES
FOR MONGODB vs MYSQL ARGUMENTS, WATCH…
Source: http://www.youtube.com/watch?v=b2F-DItXtZs
A PEEK AT MONGODB COMMANDS
{
  _id: ObjectID("A1234566789"),
  name: "Ed Brown",
  orderDate: "2-1-2014"
}
{
  _id: ObjectID("A1234566789"),
  name: "Roman Ku",
  orderDate: "1-1-2014"
}
{
  _id: ObjectID("A1234566789"),
  name: "Eva Green",
  orderDate: "10-12-2013"
}
MONGODB IS A DOCUMENT-ORIENTED DATABASE
DOCUMENTS ARE INTERNALLY STORED AS BSON (BINARY JSON)
MONGODB SYNTAX SEEMS TO BE BORROWED FROM…
- MYSQL
- JSON
- JAVASCRIPT
- UNIX
MONGODB SUPPORTS SEVERAL LANGUAGES
DRIVERS FOR
- PYTHON
- NODE.JS
- C#
- HADOOP
- R
AND MANY MORE
MONGODB TERMINOLOGY
RDBMS       MONGODB
DATABASE    DATABASE
TABLE       COLLECTION
ROW         DOCUMENT
A DATABASE CAN HAVE 1 OR MORE COLLECTIONS.
A COLLECTION CAN HAVE 1 OR MORE DOCUMENTS.
A DOCUMENT CAN HAVE 1 OR MORE NAME-VALUE PAIRS, AND/OR 1 OR MORE EMBEDDED DOCUMENTS.
MONGODB SUPPORTS SEVERAL DATA TYPES
STRING
NUMBER
BOOLEAN
ARRAY
DATE
EMBEDDED DOCUMENT
NULL
MONGODB OPERATIONS
C.R.U.D.
CREATE
READ
UPDATE
DELETE
CONNECTING TO MONGODB
MONGOD
MONGO ROBOMONGO
MONGO SHELL IS A JAVASCRIPT INTERPRETER.
ROBOMONGO HAS THE SAME JAVASCRIPT ENGINE AS THE MONGO SHELL.
mongoimport -d tennis -c ParksNYC --type json --drop < ParksNYC.json
IMPORT JSON TO MONGO COLLECTION
CREATE TABLE ParksNYC
(
id int identity(1, 1),
Prop_ID varchar(10),
Name varchar(50) not null,
Location varchar(20) not null,
EstablishedOn datetime
)
SQL MONGODB
CREATE COLLECTION
INSERT INTO ParksNYC (Prop_ID, Name, Location, EstablishedOn)
VALUES ('Q900', 'Ridge Park', '1843 Norman St.', '1/1/1970')
db.ParksNYC.insert({
Prop_ID : "Q900",
Name : "Ridge Park",
Location : "1843 Norman St.",
EstablishedOn: "1/1/1970"
})
SQL MONGODB
CREATE DOCUMENT
Prop_ID Name Location EstablishedOn
Q900 Ridge Park 1843 Norman St. 1/1/1970
SELECT * FROM ParksNYC
SQL MONGODB
READ ALL DOCUMENTS
db.ParksNYC.find()
SELECT * FROM ParksNYC
WHERE Name = 'Ridge Park'
SQL MONGODB
READ SPECIFIC DOCUMENT
db.ParksNYC.find(
{
Name : "Ridge Park"
})
SELECT TOP 1 * FROM ParksNYC
SQL MONGODB
READ FIRST DOCUMENT
db.ParksNYC.findOne()
SELECT id, Name FROM ParksNYC
SQL MONGODB
READ SPECIFIC FIELDS IN DOCUMENT
db.ParksNYC.find(
{ },
{
_id: 1, Name: 1
}
)
SELECT id, Name FROM ParksNYC WHERE Courts > 5 AND Courts <= 8
SQL MONGODB
READ DOCUMENTS WITH RANGE CRITERIA
db.ParksNYC.find(
{
Courts: { $gt: 5, $lte: 8}
}
)
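To show how the range operators in that query read, here is a minimal sketch in plain JavaScript (Node.js) that applies the same kind of condition to an in-memory array. It is not MongoDB's implementation: the sample parks, the `operators` table, and the function names are hypothetical, and only four comparison operators are modeled.

```javascript
// Hypothetical park documents; only the field names come from the slides.
const parks = [
  { Name: 'Ridge Park', Courts: 4 },
  { Name: 'Forest Park', Courts: 6 },
  { Name: 'Flushing Fields', Courts: 8 },
  { Name: 'Central Park', Courts: 26 },
];

// A small subset of comparison operators, keyed by their MongoDB names.
const operators = {
  $gt: (value, bound) => value > bound,
  $gte: (value, bound) => value >= bound,
  $lt: (value, bound) => value < bound,
  $lte: (value, bound) => value <= bound,
};

// True when `value` satisfies every operator in `condition`.
function matchesCondition(value, condition) {
  return Object.entries(condition).every(([op, bound]) => operators[op](value, bound));
}

// Keep documents matching a query such as { Courts: { $gt: 5, $lte: 8 } }.
function findByRange(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, condition]) =>
      matchesCondition(doc[field], condition)
    )
  );
}

const inRange = findByRange(parks, { Courts: { $gt: 5, $lte: 8 } });
console.log(inRange.map((p) => p.Name)); // [ 'Forest Park', 'Flushing Fields' ]
```

All operators listed for a field must hold at once, which is why the single condition object corresponds to the SQL `AND`.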
SELECT id, Name FROM ParksNYC WHERE Name LIKE 'F%'
SQL MONGODB
READ DOCUMENTS THAT START WITH A LETTER (REGULAR EXPRESSION)
db.ParksNYC.find(
{
Name: /^F/
}
)
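The correspondence between a SQL `LIKE` pattern and a regular expression can be sketched in plain JavaScript (Node.js). The helper `likeToRegExp` is hypothetical and handles only the `%` and `_` wildcards. It ignores `ESCAPE` clauses and collation rules, so SQL case-insensitive matching is not modeled.

```javascript
// Convert a SQL LIKE pattern to an anchored JavaScript RegExp.
// % matches any run of characters, _ matches exactly one character.
function likeToRegExp(pattern) {
  // Escape characters that are special in regular expressions.
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const body = escaped.replace(/%/g, '.*').replace(/_/g, '.');
  return new RegExp('^' + body + '$');
}

// Hypothetical park names.
const names = ['Forest Park', 'Flushing Fields', 'Ridge Park'];

const startsWithF = likeToRegExp('F%');
console.log(startsWithF.source);                          // ^F.*$
console.log(names.filter((n) => startsWithF.test(n)));    // [ 'Forest Park', 'Flushing Fields' ]
```

For a prefix search, `/^F.*$/` and the shorter `/^F/` used on the slide select the same names.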
UPDATE ParksNYC SET VisitDate = '1/1/2014'
SQL MONGODB
UPDATE FIELD IN DOCUMENT
db.ParksNYC.update(
{ },
{ $set: { VisitDate: "1/1/2014" } },
{ multi: true }
)
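The effect of `$set` and the `multi` option can be sketched in plain JavaScript (Node.js) against an in-memory array. This is an illustration, not the driver or server code: the function `updateWithSet` and the sample parks are hypothetical, and the filter supports only exact field matches.

```javascript
// Apply a $set-style update to documents matching an exact-match filter.
// Without options.multi, only the first matching document is changed.
function updateWithSet(docs, filter, setFields, options = {}) {
  let modified = 0;
  for (const doc of docs) {
    const matches = Object.entries(filter).every(([k, v]) => doc[k] === v);
    if (!matches) continue;
    Object.assign(doc, setFields); // add or overwrite the listed fields
    modified += 1;
    if (!options.multi) break;
  }
  return modified;
}

// Hypothetical park documents.
const parkList = [
  { Name: 'Ridge Park' },
  { Name: 'Forest Park' },
  { Name: 'Flushing Fields' },
];

// Work on a copy to compare the two modes.
const copy = parkList.map((p) => ({ ...p }));
const changedSingle = updateWithSet(copy, {}, { VisitDate: '1/1/2014' });
const changedAll = updateWithSet(parkList, {}, { VisitDate: '1/1/2014' }, { multi: true });

console.log(changedSingle); // 1
console.log(changedAll);    // 3
```

The empty filter `{ }` matches every document, so `multi: true` is what makes the shell command behave like the SQL `UPDATE` without a `WHERE` clause.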
DELETE FROM ParksNYC WHERE Name = 'Ridge Park'
SQL MONGODB
DELETE DOCUMENT
db.ParksNYC.remove(
{
Name : "Ridge Park"
})
SELECT COUNT(Name) AS Parks_Number,
SUM(Courts) AS Courts_Number
FROM ParksNYC
GROUP BY Accessible
SQL MONGODB
GROUP BY AND SUM
db.ParksNYC.aggregate([
  { $group: {
      _id: "$Accessible",
      Parks_Number: { $sum: 1 },
      Courts_Number: { $sum: "$Courts" }
  } }
])
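What the `$group` stage computes can be sketched in plain JavaScript (Node.js) over an in-memory array. The sample documents, the values of `Accessible`, and the function name `groupByAccessible` are hypothetical; the output field names follow the pipeline above.

```javascript
// Hypothetical park documents; only the field names come from the slides.
const parkDocs = [
  { Name: 'Ridge Park', Courts: 4, Accessible: 'Y' },
  { Name: 'Forest Park', Courts: 6, Accessible: 'N' },
  { Name: 'Flushing Fields', Courts: 8, Accessible: 'Y' },
];

// Group by the Accessible field, counting parks and summing courts.
function groupByAccessible(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc.Accessible;
    if (!groups.has(key)) {
      groups.set(key, { _id: key, Parks_Number: 0, Courts_Number: 0 });
    }
    const group = groups.get(key);
    group.Parks_Number += 1;           // like { $sum: 1 }
    group.Courts_Number += doc.Courts; // like { $sum: "$Courts" }
  }
  return [...groups.values()];
}

const grouped = groupByAccessible(parkDocs);
console.log(grouped);
```

The sketch returns groups in first-seen order; MongoDB itself does not promise any particular order for `$group` output.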
SHARDING AND REPLICATION IN MONGODB
EACH DOCUMENT CAN BE IN A DIFFERENT MACHINE
HOW DOES MONGODB DO THIS?
AUTOSHARDING, FOR A COLLECTION
MONGODB CLUSTER
MONGOS
CLIENT
MONGOD MONGOD MONGOD
CLIENT
MONGOD
SHARDING STEPS
1. ENABLE SHARDING ON DATABASE.
2. PICK A SHARD KEY FROM THE COLLECTION. MAKE SURE THE KEY IS
- INDEXED
- HIGH IN CARDINALITY, SO IT HAS MANY DISTINCT VALUES.
3. SIT BACK AND RELAX. MONGODB WILL AUTOMATICALLY DO THE SHARDING.
SHARDING WP_POSTS COLLECTION
{
  _id: 1,
  post_author: "Amy W",
  post_date: "1/1/2014",
  comments: [
    { comment_author: "bestguy", comment_date: "1/1/2014" },
    { comment_author: "baddie", comment_date: "1/10/2014" },
    { comment_author: "clever24", comment_date: "1/11/2014" }
  ]
}
SHARD KEY
BREAKING THE USERS INTO CHUNKS
$minKey TO Abba1234
Abba1235 TO CarlW
CarlZ TO FrankT
FrankY TO JackA
JackB TO LambV
LambW TO RobF
RobG TO TimA
TimB TO $maxKey
BREAKING THE RANGE INTO CHUNKS
$minKey TO Abba1234
Abba1235 TO CarlW
CarlZ TO FrankT
FrankY TO JackA
JackB TO LambV
LambW TO RobF
RobG TO TimA
TimB TO $maxKey
MONGOS
CLIENT
MONGOD
MONGOD
MONGOD
SHARD0000
SHARD0001
SHARD0002
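How a router could map a shard key value to a chunk, and a chunk to a shard, can be sketched in plain JavaScript (Node.js). Several things here are assumptions: the chunk lower bounds are read off the ranges on the slide, `$minKey` is modeled as the empty string, keys are compared with ordinary JavaScript string ordering rather than MongoDB's BSON ordering, and the round-robin chunk-to-shard assignment is hypothetical (in a real cluster the balancer decides placement).

```javascript
// Lower bound of each chunk, taken from the ranges on the slide.
// The first chunk starts at $minKey, modeled here as the empty string.
const chunkLowerBounds = [
  '', 'Abba1235', 'CarlZ', 'FrankY', 'JackB', 'LambW', 'RobG', 'TimB',
];

// Shard names from the diagram; the assignment below is hypothetical.
const shards = ['shard0000', 'shard0001', 'shard0002'];

// Index of the last chunk whose lower bound is <= key.
function findChunkIndex(key) {
  let index = 0;
  for (let i = 0; i < chunkLowerBounds.length; i++) {
    if (key >= chunkLowerBounds[i]) index = i;
  }
  return index;
}

// Assumed round-robin placement of chunks on shards.
function routeToShard(key) {
  return shards[findChunkIndex(key) % shards.length];
}

console.log(findChunkIndex('Amy W'));   // 1
console.log(routeToShard('Amy W'));     // shard0001
console.log(routeToShard('Eva Green')); // shard0002
```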
BENEFITS OF SHARDING
1. INCREASES AVAILABLE MEMORY.
2. REDUCES LOAD ON THE SERVER.
3. INCREASES HARD DISK SPACE.
4. LOCATION-BASED SHARD KEYS CAN PUT DATA CLOSE TO THE USERS AND KEEP RELATED DATA TOGETHER.
MASTER-SLAVE REPLICATION
[Diagram: CLIENT and a REPLICA SET of three MONGOD processes labeled MASTER, SLAVE, SLAVE]
[Diagram: the same replica set, labeled ELECTION]
[Diagram: the replica set with two members labeled MASTER and SLAVE]
MINIMUM 3 MEMBERS TO FORM REPLICA SET
[Diagram: the replica set with three members labeled MASTER, SLAVE, SLAVE]
REPLICATION SOLVES THE PROBLEM OF AVAILABILITY
AND FAULT TOLERANCE
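One way to see why three members is the practical minimum: electing a new master requires votes from a majority of the set's voting members. The sketch below is plain JavaScript (Node.js) arithmetic, not MongoDB code, and the function names are hypothetical.

```javascript
// Votes needed for a majority in a set of `memberCount` voting members.
function majority(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

// Can the members that are still reachable elect a master?
function canElectMaster(memberCount, reachableCount) {
  return reachableCount >= majority(memberCount);
}

console.log(canElectMaster(3, 2)); // true: one of three members is down
console.log(canElectMaster(2, 1)); // false: one of two members is down
```

With two members, losing either one leaves no majority, so the set cannot pick a new master. With three, the set survives the loss of any single member.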
FUTURE OF MONGODB AND US
COMPANIES USING MONGODB
MONGODB WINS AWARD
36 MOST VALUABLE STARTUPS ON EARTH
ORACLE
SQL SERVER
MONGODB
POSTGRESQL
?RIAK
NEO4J
POLYGLOT PERSISTENCE
GOOD TO KNOW BOTH SQL AND
NOSQL
MYSQL
DREMEL
ARCHITECT
WHAT WE DID NOT COVER
SECURITY
BACKUP/RECOVERY
DATA MODELING
THANK YOU VERY MUCH
AND THANK YOU TO EVERYONE WHO HELPED US
DR. BILL HOWE, UNIVERSITY OF WASHINGTON
JASON CHEN, MONGODB RECRUITER
KRISTINA CHODOROW (DEFINITIVE GUIDE AUTHOR)
FRANCESCA KRIHELY (MONGODB COMMUNITY MANAGER)
DR. MARKUS SCHMIDBERGER, RMONGODB
JOHANNES BRANDSTETTER, MONGOSOUP (THE FIRST EUROPEAN PARTNER OF MONGODB TO PROVIDE MONGODB AS A SERVICE)
DR. RAMNATH VAIDYANATHAN, RCHARTS
REFERENCES
MongoDB: http://www.mongodb.org
Book: MongoDB: The Definitive Guide, Kristina Chodorow
Book: NoSQL Distilled, Pramod J. Sadalage and Martin Fowler
NoSQL: http://en.wikipedia.org/wiki/NoSQL
MongoDB Use Cases: http://www.mongodb.com/use-cases
First NoSQL Meetup Notes: http://developer.yahoo.com/blogs/ydn/notes-nosql-meetup-7663.html
Billion dollar club: http://graphics.wsj.com/billion-dollar-club/
Photos from Google
DEMO