Building Java Applications on MongoDB
Aveek Bushan [email protected] Solutions Architect – APAC Lead
• @aveekshith
Contents
• From Sampling to n=all
• Implications from a Data Standpoint
• Building an Application in Java
• Core Features of MongoDB
• Connecting the Dots
Random Sample
Image Source: SurveyMonkey
Sampling
• Based on Random Sampling
• Used in a variety of fields: Opinion Polls, Bug Estimation etc.
Issues
• Loss of detail: 3% margin of error
• Are the samples truly Random?
• Outliers might have very interesting information
• Black Swan Events have a massive impact that cannot be captured in a Normal Distribution
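A margin of error of about 3% is what the standard formula for a simple random sample, z * sqrt(p(1 - p) / n), gives for roughly a thousand respondents. The sketch below is only an illustration: the 95% confidence level (z = 1.96), the worst-case proportion p = 0.5 and the sample size of 1,067 are assumptions, not values from the slides.

```java
// Illustrative only: margin of error for a simple random sample.
class MarginOfError {
    // z: z-score for the chosen confidence level
    // p: assumed proportion in the population (0.5 is the worst case)
    // n: sample size
    static double marginOfError(double z, double p, int n) {
        return z * Math.sqrt(p * (1.0 - p) / n);
    }

    public static void main(String[] args) {
        // Assumed inputs: 95% confidence, p = 0.5, 1,067 respondents
        double moe = marginOfError(1.96, 0.5, 1067);
        System.out.println("Margin of error (fraction): " + moe);
    }
}
```

Because the error shrinks only with the square root of n, halving it requires roughly four times as many samples, which is one reason the deck moves on to n=all.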
N=All
Causation to Correlation
From Why to What
Ever-more ubiquitous access to the Digital World
Cost of Storage has plummeted over the years
Ability to process Unstructured and semi-structured information
Software tools that can process the data in real time
Source: Big Data, by Viktor Mayer-Schönberger and Kenneth Cukier
Data Implications
Rich Data
Data Variety
Fast Processing
Data Availability
Data Volume
Geo-Spatial
Real-time Access
Data Durability
Expressive Query Language
Strong Consistency
Secondary Indexes
Flexibility
Scalability
Performance
MongoDB - Nexus Architecture
Relational + NoSQL
Example Application Requirements
• Skillsets of Employees
• Certification and Skill level
• Dashboard View of real-time data
• Scalable, Reliable and Performant
Database
Design the Schema
Embedded Information
Sub-documents, Arrays etc
Natively Supported
Differing Data
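Legend: RD = Rich Data, DVa = Data Variety, FP = Fast Processing, DA = Data Availability, DVo = Data Volume, GS = Geo-Spatial, RTA = Real-time Access, DD = Data Durability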
Preparing the Java Application
• Add the driver libraries to the classpath
3.0 New Features
– Generic MongoCollection Interface
– New Asynchronous API
– New Codec Infrastructure
– New Core Driver
• Start the MongoDB instance. Let's start with a standalone instance. For a write-performant storage engine, start the mongod with --storageEngine wiredTiger
Build the Java Object

public class DataObject {
    private int id;
    private String name;
    private List<SkillObject> obj;
    private InfoObject info;
    // constructors, getters and setters omitted
}

public class SkillObject {
    private String skill;
    private int level;
    private String version;
    private boolean certified;
}

public class InfoObject {
    private String dept;
    private int experience;
    private List<Double> gps;
    private String location;
    private boolean reviewed;
}

Or use an Object-Document Mapper such as Morphia

@Entity
public class coll {
    @Id
    private int id;
    private String name;
    @Embedded
    private List<SkillsPOJO> skills;
    @Embedded
    private InfoPOJO info;
}

@Embedded
public class SkillsPOJO {
    private String skill;
    private int level;
    private String version;
    private boolean certified;
}

// Similarly for InfoPOJO
Connect to MongoDB
[Diagram: Java Client -> Driver -> mongod (DB Tier)]
import java.util.ArrayList;
import java.util.List;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;

// client, database and collection are fields of the enclosing class
public void MongoConnect(String[] hosts) {
    List<ServerAddress> seeds = new ArrayList<ServerAddress>();
    for (String h : hosts) {
        // MongoDB server address and port
        seeds.add(new ServerAddress(h));
    }
    // MongoDB client with internal connection pooling
    client = new MongoClient(seeds);
    // The database to connect to
    database = client.getDatabase("mydb");
    // The collection to connect to
    collection = database.getCollection("coll");
}
Or Use an ODM

import java.util.ArrayList;
import java.util.List;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.mongodb.morphia.Datastore;
import org.mongodb.morphia.Morphia;

public void MorphiaConnect(String[] hosts) {
    List<ServerAddress> seeds = new ArrayList<ServerAddress>();
    for (String h : hosts) {
        seeds.add(new ServerAddress(h));
    }
    client = new MongoClient(seeds);
    morphia = new Morphia();
    // Map the Morphia objects
    morphia.map(coll.class).map(SkillsPOJO.class).map(InfoPOJO.class);
    // Create a datastore to interact with MongoDB using POJOs
    ds = morphia.createDatastore(client, "mydb");
}
Authentication
String dbName = "testdb";
String userName = "user1";
char[] password = {'p', 'w', 'd'};
// createMongoCRCredential takes the user name first, then the database
MongoCredential credential = MongoCredential.createMongoCRCredential(userName, dbName, password);
// With the appropriate credential
client = new MongoClient(seeds, Arrays.asList(credential));
Perform some Inserts

import org.bson.Document;
import com.mongodb.client.MongoCollection;

Document doc = new Document("_id", emplList.get(i).getId())
    .append("name", emplList.get(i).getName())
    .append("skills", skillBOList)
    .append("info", new Document("dept", info.getDept())
        .append("yearsexp", info.getExperience())
        .append("gps", info.getGPS())
        .append("location", info.getLocation()));
collection.insertOne(doc);

Using Morphia

public void insert(List<coll> emplList) throws InterruptedException {
    ds.save(emplList);
}
Async Operations

import com.mongodb.async.SingleResultCallback;
import com.mongodb.async.client.*;

// Factory of MongoClient instances
client = MongoClients.create("mongodb://localhost");
database = client.getDatabase("mydb");
collection = database.getCollection("coll");
…
// Methods that cause network IO take a SingleResultCallback<T> and return immediately
collection.insertOne(doc, new SingleResultCallback<Void>() {
    @Override
    public void onResult(final Void result, final Throwable t) {
        // t is non-null when the insert failed
        if (t != null) {
            System.err.println("Insert failed: " + t.getMessage());
        } else {
            System.out.println("Inserted!");
        }
    }
});
…
Retrieve the Data

import static com.mongodb.client.model.Filters.*;
…
public void read(int id) {
    Document myDoc = collection.find(eq("_id", id)).first();
    System.out.println("Read Document with id: " + id + "\n" + myDoc.toJson() + "\n");
    …
}

Using Morphia

List<coll> empl = ds.createQuery(coll.class).filter("id =", id).asList();
Retrieving a Datapoint

{
  "_id" : 5,
  "name" : "John Snow",
  "skills" : [
    { "name" : "java", "level" : 3, "certified" : true },
    { "name" : "mongo", "level" : 5 }
  ],
  "info" : { "dept" : "A91", "yearsexp" : 3, "gps" : [-74.00597, 40.71427], "location" : "New York" }
}
Geo-Location Query

import static com.mongodb.client.model.Filters.*;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.model.geojson.Point;
import com.mongodb.client.model.geojson.Position;
…
public void read(List<Double> gps, Double maxDistance, Double minDistance) {
    double longitude = gps.get(0);
    double latitude = gps.get(1);
    // The near query needs a geospatial index on the queried field
    collection.createIndex(new Document("info.gps", "2dsphere"));
    MongoCursor<Document> cursor = collection.find(
        near("info.gps", new Point(new Position(longitude, latitude)),
            maxDistance, minDistance)).iterator();
    while (cursor.hasNext()) { … }
    …
}
Geo-Location - Output

• Query to get all employees in and around Boston (GPS coordinates Lat 42.35843, Long -71.05977), within a maxDistance of 400,000 m

{
  "_id" : 5,
  "name" : "John Snow",
  "skills" : [
    { "name" : "java", "level" : 3, "certified" : true },
    { "name" : "mongo", "level" : 5 }
  ],
  "info" : { "dept" : "A91", "yearsexp" : 3, "gps" : [-74.00597, 40.71427], "location" : "New York" }
}
{
  "_id" : 45,
  "name" : "Jack Kingsley",
  "skills" : [
    { "name" : "c++", "level" : 4 },
    { "name" : "mongo", "level" : 2, "version" : "3.0" }
  ],
  "info" : { "dept" : "A83", "yearsexp" : 18, "gps" : [-71.05977, 42.35843], "location" : "Boston" }
}
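The New York employee appears in the result because the two cities are closer together than the 400,000 m limit. A rough check with the haversine formula is sketched below. It assumes a spherical Earth with a mean radius of 6,371 km, so the figure may differ slightly from the distance the server computes; the class and method names are illustrative.

```java
// Illustrative only: great-circle distance between two points given as
// longitude/latitude in degrees, using the haversine formula.
class GeoDistance {
    static final double EARTH_RADIUS_M = 6_371_000.0; // assumed mean Earth radius

    static double haversineMeters(double lon1, double lat1, double lon2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Coordinates taken from the two documents above
        double d = haversineMeters(-71.05977, 42.35843, -74.00597, 40.71427);
        System.out.println("Boston to New York (m): " + Math.round(d));
        System.out.println("Within 400,000 m: " + (d <= 400_000));
    }
}
```

The result is a little over 300 km, which is why a maxDistance of 400,000 m around Boston also matches the New York document.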
Update the Data

import static com.mongodb.client.model.Filters.*;
…
Map<String, Object> updateOps = new HashMap<String, Object>();
updateOps.put("$inc", new Document("info.yearsexp", 1));
updateOps.put("$set", new Document("info.reviewed", true));
result = collection.updateOne(eq("_id", id), new Document(updateOps));

Using Morphia

Query<coll> query = ds.createQuery(coll.class).field("id").equal(id);
UpdateOperations<coll> ops = ds.createUpdateOperations(coll.class)
    .inc("info.experience", 1)
    .set("info.reviewed", true);
ds.update(query, ops);
Update - Output

• Data point has been reviewed after 1 more year of employment

Before:
{
  "_id" : 5,
  "name" : "John Snow",
  "skills" : [
    { "name" : "java", "level" : 3, "certified" : true },
    { "name" : "mongo", "level" : 5 }
  ],
  "info" : { "dept" : "A91", "yearsexp" : 3, "gps" : [-74.00597, 40.71427], "location" : "New York" }
}
After:
{
  "_id" : 5,
  "name" : "John Snow",
  "skills" : [
    { "name" : "java", "level" : 3, "certified" : true },
    { "name" : "mongo", "level" : 5 }
  ],
  "info" : { "dept" : "A91", "yearsexp" : 4, "gps" : [-74.00597, 40.71427], "location" : "New York", "reviewed" : true }
}
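To make the effect of the two update operators concrete, the sketch below models them on plain Java maps. It is not how the driver or the server implements updates. The class UpdateSketch and its helpers are hypothetical, and only integer increments and dotted paths through nested maps are handled.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: models $inc and $set on nested Maps.
class UpdateSketch {
    // Walks a dotted path such as "info.yearsexp" and returns the map that
    // holds the last key, creating intermediate maps when they are missing.
    @SuppressWarnings("unchecked")
    static Map<String, Object> parentOf(Map<String, Object> doc, String path) {
        String[] keys = path.split("\\.");
        Map<String, Object> current = doc;
        for (int i = 0; i < keys.length - 1; i++) {
            Object next = current.get(keys[i]);
            if (!(next instanceof Map)) {
                next = new HashMap<String, Object>();
                current.put(keys[i], next);
            }
            current = (Map<String, Object>) next;
        }
        return current;
    }

    static String lastKey(String path) {
        return path.substring(path.lastIndexOf('.') + 1);
    }

    // Like $inc: adds amount to the field, treating a missing field as 0.
    static void inc(Map<String, Object> doc, String path, int amount) {
        Map<String, Object> parent = parentOf(doc, path);
        Object old = parent.get(lastKey(path));
        int base = (old instanceof Integer) ? (Integer) old : 0;
        parent.put(lastKey(path), base + amount);
    }

    // Like $set: writes the value, creating the field if needed.
    static void set(Map<String, Object> doc, String path, Object value) {
        parentOf(doc, path).put(lastKey(path), value);
    }

    public static void main(String[] args) {
        Map<String, Object> info = new HashMap<String, Object>();
        info.put("dept", "A91");
        info.put("yearsexp", 3);
        Map<String, Object> doc = new HashMap<String, Object>();
        doc.put("_id", 5);
        doc.put("info", info);

        inc(doc, "info.yearsexp", 1);
        set(doc, "info.reviewed", true);
        System.out.println(doc);
    }
}
```

Both operators touch only the named fields, so the rest of the document (name, skills, gps) is left as it was.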
Delete Data

import static com.mongodb.client.model.Filters.*;
…
public void delete(int id) {
    collection.deleteOne(eq("_id", id));
    System.out.println("Deleted Document with id: " + id + "\n");
    …
}

Using Morphia

public void delete(int id) {
    Query<coll> query = ds.createQuery(coll.class).field("id").equal(id);
    ds.delete(query);
    …
}
Replica Set
High Availability
[Diagram: Java Client -> Driver -> replica set of one Primary and two Secondaries, connected by heartbeats]
• Automated Fail-over
• Rolling upgrades
• Multi Data Center Support
• Data Durability and Strong Consistency
MongoDB set up
Use MongoDB OpsManager or Cloud Manager Automation to set up the cluster
(or)
sudo mongod --port 27017 --dbpath /data/rs1 --replSet rs --logpath /logs/rs1.log --fork
sudo mongod --port 27018 --dbpath /data/rs2 --replSet rs --logpath /logs/rs2.log --fork
sudo mongod --port 27019 --dbpath /data/rs3 --replSet rs --logpath /logs/rs3.log --fork
mongo --port 27017
> config = { "_id" : "rs", "members" : [
... {"host":"localhost:27017", "_id":0},
... {"host":"localhost:27018", "_id":1},
... {"host":"localhost:27019", "_id":2}
... ]
... }
> rs.initiate(config)
In the Java program, pass the addresses and ports of the replica set members as part of the connection string
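As a small illustration of that last step, the sketch below assembles a connection string for the three local members started above, using the standard mongodb:// URI form with the replicaSet option. The helper name replicaSetUri is hypothetical, and how the string is handed to the client depends on the driver version in use.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: builds a MongoDB connection string for a replica set.
class ConnectionStringSketch {
    // hostsAndPorts: entries such as "localhost:27017"
    // replicaSetName: the name given to --replSet when starting mongod
    static String replicaSetUri(List<String> hostsAndPorts, String replicaSetName) {
        return "mongodb://" + String.join(",", hostsAndPorts)
                + "/?replicaSet=" + replicaSetName;
    }

    public static void main(String[] args) {
        String uri = replicaSetUri(
                Arrays.asList("localhost:27017", "localhost:27018", "localhost:27019"), "rs");
        // prints mongodb://localhost:27017,localhost:27018,localhost:27019/?replicaSet=rs
        System.out.println(uri);
    }
}
```

Listing more than one member means the driver can still discover the set if one of the listed hosts is down at connection time.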
Ensuring Durability
• By default, WriteConcern is Acknowledged: the primary has received the write operation and applied the change in memory
• A crash of the primary server means that the data might be lost
• Use a stricter WriteConcern such as Majority or w:2

for (int retry = 0; retry < 3; retry++) {
    try {
        collection.withWriteConcern(WriteConcern.MAJORITY).insertOne(doc);
        break;
    } catch (Exception e) {
        // Log the failure, then wait before retrying
        System.err.println(e.getMessage());
        Thread.sleep(5000);
    }
}
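The retry loop above can be factored into a small reusable helper. The sketch below uses only the standard library. The names RetrySketch and withRetry are hypothetical, and the operation passed in stands in for the majority-acknowledged insert. Unlike the loop above, it rethrows the last failure when every attempt fails, so the caller knows the write did not succeed.

```java
import java.util.concurrent.Callable;

// Illustrative only: retries an operation a fixed number of times.
class RetrySketch {
    static <T> T withRetry(Callable<T> operation, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                last = e;
                // Wait before the next attempt, but not after the final one
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Stand-in for a write that fails twice before succeeding
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IllegalStateException("simulated failure " + calls[0]);
            }
            return "written";
        }, 3, 100L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```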
Eventual Consistency
[Diagram: Reporting Application -> Driver -> replica set (P, S, S); the Reporting Application and a Secondary member are in the same DC]
• Read from the nearest node for lower latency
• Read-only applications where eventual consistency is OK
  – For example: Reporting Applications
• Can be achieved using ReadPreference in MongoDB
• Modes of Primary, PrimaryPreferred, Secondary, SecondaryPreferred and Nearest

myDoc = collection
    .withReadPreference(ReadPreference.nearest())
    .find(eq("_id", id)).first();
HA Best Practices
• HA against DC failures and active-active => 5 nodes across 3 DCs
• For writes => a majority of nodes needs to be in an active state
• For reads => secondary reads can continue
• Majority inactive => force reconfig to continue writes

rs:SECONDARY> config = { "_id" : "rs", "members" : [
... {"host":"localhost:27018", "_id":1}
... ]
... }
rs:SECONDARY> rs.reconfig(config, {force:true})
{ "ok" : 1 }
rs:PRIMARY>
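The majority rule behind these bullets is simple arithmetic: a set with n voting members needs floor(n / 2) + 1 of them reachable to keep a primary. The sketch below checks the 5-node case, assuming a 2 + 2 + 1 layout across the three data centers (the layout is an assumption, the slide only gives the totals), and the 3-node case from the shell session, where only one member is left.

```java
// Illustrative only: majority arithmetic for replica set elections.
class MajoritySketch {
    static int majority(int votingMembers) {
        return votingMembers / 2 + 1;
    }

    static boolean canAcceptWrites(int votingMembers, int reachableMembers) {
        return reachableMembers >= majority(votingMembers);
    }

    public static void main(String[] args) {
        int[] membersPerDc = {2, 2, 1}; // assumed layout of 5 nodes over 3 DCs
        int total = 5;
        for (int dc = 0; dc < membersPerDc.length; dc++) {
            int reachable = total - membersPerDc[dc];
            System.out.println("Lose DC " + (dc + 1) + ": " + reachable + " of " + total
                    + " reachable, writes continue = " + canAcceptWrites(total, reachable));
        }
        // Three-member set with two members lost: no majority, hence the forced reconfig
        System.out.println("1 of 3 reachable, writes continue = " + canAcceptWrites(3, 1));
    }
}
```

With this layout, losing any single data center still leaves at least three of five members, so no forced reconfiguration is needed.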
[Diagram: Java Client -> Driver -> replica set in which only the Primary remains (✔) and the other two members have been removed (✗)]
Aggregation of Data

import static com.mongodb.client.model.Accumulators.avg;
import static com.mongodb.client.model.Accumulators.sum;
import static com.mongodb.client.model.Aggregates.group;
import static com.mongodb.client.model.Aggregates.sort;
import static com.mongodb.client.model.Aggregates.unwind;
import static com.mongodb.client.model.Aggregates.out;
…
public void deptForSkills() {
    // Group key: skill name and department
    Document group = new Document();
    group.append("skills", "$skills.name");
    group.append("dept", "$info.dept");
    AggregateIterable<Document> iter = collection.aggregate(Arrays.asList(
        unwind("$skills"),
        group(group, avg("avgLevel", "$skills.level"), sum("count", 1)),
        sort(new Document().append("_id.skills", 1).append("avgLevel", -1)),
        out("skills")));
}
{
  "_id" : 5,
  "name" : "John Snow",
  "skills" : [
    { "name" : "java", "level" : 3, "certified" : true },
    { "name" : "mongo", "level" : 5 }
  ],
  "info" : { "dept" : "A91", "yearsexp" : 3, "gps" : [-74.00597, 40.71427], "location" : "New York" }
}
Aggregation - Output
{ "_id" : { "skills" : "c++", "dept" : "A75" }, "avgLevel" : 5, "count" : 10 }
{ "_id" : { "skills" : "c++", "dept" : "A83" }, "avgLevel" : 4.666666666666667, "count" : 30 }
{ "_id" : { "skills" : "c++", "dept" : "A91" }, "avgLevel" : 3, "count" : 10 }
{ "_id" : { "skills" : "java", "dept" : "A75" }, "avgLevel" : 4, "count" : 10 }
{ "_id" : { "skills" : "java", "dept" : "A83" }, "avgLevel" : 3.5, "count" : 10 }
{ "_id" : { "skills" : "java", "dept" : "A91" }, "avgLevel" : 3, "count" : 40 }
{ "_id" : { "skills" : "mongo", "dept" : "A91" }, "avgLevel" : 5, "count" : 40 }
{ "_id" : { "skills" : "mongo", "dept" : "A83" }, "avgLevel" : 2, "count" : 10 }
{ "_id" : { "skills" : "mongo", "dept" : "A75" }, "avgLevel" : 1, "count" : 10 }
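For readers who want to see what the pipeline stages compute, the sketch below reproduces the same steps on an in-memory list: one entry per skill (unwind), grouping by skill and department with an average and a count (group), then ordering by skill ascending and average level descending (sort). It is an analogue for illustration, not the server's implementation; the classes and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: an in-memory analogue of unwind -> group -> sort.
class AggregationSketch {
    static class Skill {
        final String name;
        final int level;
        Skill(String name, int level) { this.name = name; this.level = level; }
    }

    static class Employee {
        final String dept;
        final List<Skill> skills;
        Employee(String dept, Skill... skills) { this.dept = dept; this.skills = Arrays.asList(skills); }
    }

    static class Row {
        final String skill;
        final String dept;
        final double avgLevel;
        final long count;
        Row(String skill, String dept, double avgLevel, long count) {
            this.skill = skill; this.dept = dept; this.avgLevel = avgLevel; this.count = count;
        }
        @Override public String toString() {
            return skill + "/" + dept + " avgLevel=" + avgLevel + " count=" + count;
        }
    }

    static List<Row> deptForSkills(List<Employee> employees) {
        Map<List<String>, DoubleSummaryStatistics> groups = new HashMap<>();
        for (Employee e : employees) {
            for (Skill s : e.skills) {                            // "unwind": one entry per skill
                List<String> key = Arrays.asList(s.name, e.dept); // "group" key: skill + dept
                groups.computeIfAbsent(key, k -> new DoubleSummaryStatistics()).accept(s.level);
            }
        }
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<List<String>, DoubleSummaryStatistics> g : groups.entrySet()) {
            rows.add(new Row(g.getKey().get(0), g.getKey().get(1),
                    g.getValue().getAverage(), g.getValue().getCount()));
        }
        // "sort": skill ascending, then average level descending
        rows.sort(Comparator.comparing((Row r) -> r.skill)
                .thenComparing(Comparator.comparingDouble((Row r) -> r.avgLevel).reversed()));
        return rows;
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee("A91", new Skill("java", 3), new Skill("mongo", 5)),
                new Employee("A91", new Skill("java", 5)),
                new Employee("A83", new Skill("mongo", 2)));
        for (Row row : deptForSkills(employees)) {
            System.out.println(row);
        }
    }
}
```

Running the grouping inside the database avoids pulling every employee document to the client, which is the point of the aggregation slide.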
Sharding
[Diagram: Client Tier (Java Client -> Driver -> Routers); DB Tier (three Config Servers and Shard 1, Shard 2, … Shard n, each shard a replica set of P, S, S)]
• Scale as you grow
• Redundancy is built-in at all levels
• 3 Types of Sharding: Range, Hashed or Tag-Aware
MongoDB set up
Use MongoDB OpsManager or Cloud Manager Automation to set up the cluster
(or)
sudo mongod --port 37017 --dbpath /data/shard1 --logpath /logs/shard1.log --fork
sudo mongod --port 37018 --dbpath /data/shard2 --logpath /logs/shard2.log --fork
sudo mongod --port 47017 --dbpath /data/cfg --configsvr --logpath /logs/cfg.log --fork
sudo mongos --port 57017 --configdb localhost:47017
sudo mongos --port 57018 --configdb localhost:47017
mongo --port 57017
> sh.addShard("localhost:37017")
> sh.addShard("localhost:37018")
> sh.enableSharding("mydb")
> sh.shardCollection("mydb.coll",{"_id":1})
In the Java program, pass the router IP addresses and ports as part of the connection string
MongoDB for a Big Data World
Rich Data
Data Variety
Fast Processing
Data Availability
Data Volume
Geo-Spatial
Real-time Access
Data Durability
Flexible Data Model and Dynamic Schema
Embedded Data
Native Replication Across Data Centers
Appropriate WriteConcern
Rich Query Model and Aggregation
Native Geo-Spatial Features
Horizontal Scalability as you grow
Sub-documents, Arrays etc
More Information – Java/MongoDB
Resource: Location
MongoDB Java Driver: http://docs.mongodb.org/ecosystem/drivers/java/
Java API to connect to MongoDB: http://api.mongodb.org/java/3.0/
Driver Download: http://mongodb.github.io/mongo-java-driver/
Morphia Project: https://github.com/mongodb/morphia
Hadoop Driver for MongoDB: http://docs.mongodb.org/ecosystem/tools/hadoop/
University Course: https://university.mongodb.com/courses/M101J/about?jmp=docs&_ga=1.249916550.1866581253.1440492145
More Information – MongoDB
Resource: Location
Case Studies: mongodb.com/customers
Presentations: mongodb.com/presentations
Free Online Training: university.mongodb.com
Webinars and Events: mongodb.com/events
Documentation: docs.mongodb.org
MongoDB Downloads: mongodb.com/download
Additional Info: [email protected]
Thank You!
You can reach me at [email protected]