NoSQL Databases
Oracle - Berkeley DB
Content
A brief intro to NoSQL
About Berkeley DB
About our application
What is NoSQL?
• Stands for Not Only SQL
• Class of non-relational data storage systems
• Usually do not require a fixed table schema and do not support joins, GROUP BY, ORDER BY, and similar relational operations.
• Most NoSQL offerings relax one or more of the ACID properties.
What is NoSQL?
• Next-generation databases
• Characteristics:
– Large Data Volumes
– Non-relational
– Distributed
– Open-source
– Scalable replication and distribution
CAP Theorem
History of NoSQL
• The term NoSQL was introduced by Carlo Strozzi in 1998 as the name of his file-based database.
• It was reintroduced in 2009 by Eric Evans, when an event was organized to discuss open-source distributed databases.
Why NoSQL Databases?
• Very large data sets
• Massive write performance
• Fast key-value access
• Flexible schema and Flexible data types
• No single point of failure
• Programming ease of use
Scaling to size vs complexity.
Berkeley DB - Introduction
• An open-source, embedded transactional data management system.
• A key/value store.
• Runs on everything from cell phones to large servers.
• Distributed as a library that can be linked directly into an application.
• Designed for high reliability and high performance.
Berkeley DB Product Family Architecture
Berkeley DB: The Design Philosophy
• Provide mechanisms without specifying policies.
• For example, Berkeley DB is abstracted as a store of <key, value> pairs.
– Both keys and values are opaque byte-strings.
– Berkeley DB has no schema.
– The application that embeds Berkeley DB is responsible for imposing its own schema on the data.
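To make the "opaque byte-strings" point concrete, here is a minimal sketch in plain Java. `OpaqueStore` is a hypothetical in-memory stand-in for a <key, value> store, not the Berkeley DB API, and `UserCodec` is an assumed example of a schema that the application imposes on its own records.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a <key, value> store (NOT the Berkeley DB API).
// The store sees only bytes; it knows nothing about what they mean.
class OpaqueStore {
    // ByteBuffer wraps byte[] so that keys compare by content, not identity.
    private final Map<ByteBuffer, byte[]> data = new HashMap<>();

    void put(byte[] key, byte[] value) {
        data.put(ByteBuffer.wrap(key.clone()), value.clone());
    }

    byte[] get(byte[] key) {
        byte[] value = data.get(ByteBuffer.wrap(key));
        return value == null ? null : value.clone();
    }
}

// The application's own "schema" (an assumption for illustration):
// a user record is a 4-byte big-endian age followed by a UTF-8 name.
class UserCodec {
    static byte[] encode(int age, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + nameBytes.length)
                .putInt(age)
                .put(nameBytes)
                .array();
    }

    static int age(byte[] record) {
        return ByteBuffer.wrap(record).getInt();
    }

    static String name(byte[] record) {
        return new String(record, 4, record.length - 4, StandardCharsets.UTF_8);
    }
}
```

The store never interprets the bytes; only `UserCodec` knows the layout, so a different application could keep a completely different format in the same store.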
Data Access Services
• Indexing methods
– B-Tree
– Hash
– Queue
– Recno (a record-number-based index)
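As a rough analogy only, the practical difference between an ordered index and a hash index can be sketched with the standard Java collections. `TreeMap` is a red-black tree rather than a B-tree, and the class below is a hypothetical illustration that does not use Berkeley DB.

```java
import java.util.HashMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration: ordered vs. hashed access to <key, value> pairs.
class IndexAnalogy {
    // Ordered index (B-tree-like behaviour): keys are kept sorted,
    // so a range of keys can be scanned in order.
    static SortedMap<String, String> range(TreeMap<String, String> index,
                                           String fromInclusive,
                                           String toExclusive) {
        return index.subMap(fromInclusive, toExclusive);
    }

    // Hash index: fast point lookups, but no meaningful key order,
    // so range scans are not available.
    static String lookup(HashMap<String, String> index, String key) {
        return index.get(key);
    }
}
```

An ordered index is the natural choice when the application needs range queries or sorted iteration; a hash index fits workloads made up of exact-key lookups.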
Advantages of <key, value> pairs
• An application is free to store data in whatever form is most natural to it.
– Objects (like structs in C)
– Rows in Oracle, SQL Server
– Columns in C-Store
• Different data formats can be stored in the same database.
Data Management Services
Concurrency
Transactions
Recovery
Berkeley DB Applications
• Lightweight Directory Access Protocol (LDAP) servers
• Mail Servers
• Manage access control lists
• Store user keys in a public-key infrastructure
• Record machine-to-network address mappings in address servers
Berkeley DB for Computationally Intensive Algorithms
• Algorithms that repeatedly execute a computationally intensive operation
– E.g. factorial
• Useful to create a cache containing the already computed results
– Cache = set of <key, value> pairs containing <n, factorial(n)>
• Advantages:
– Avoid recomputing results for the same input (even across different executions)
– After a process crash, we can restart the process and quickly return to the point where it stopped
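The caching idea can be sketched in a few lines of Java. This is a minimal illustration, not the deck's Factorial.java: the class name `FactorialCache` and the `multiplications` counter are assumptions, and the cache is any `java.util.Map`. An in-memory `HashMap` works for a single run; a persistent map backed by a key/value store would be needed for results to survive across executions.

```java
import java.math.BigInteger;
import java.util.Map;

// Hypothetical sketch of a <n, factorial(n)> cache over any Map.
class FactorialCache {
    private final Map<Integer, BigInteger> cache;
    // Counts the multiplications actually performed, to show what the cache saves.
    int multiplications = 0;

    FactorialCache(Map<Integer, BigInteger> cache) {
        this.cache = cache;
    }

    BigInteger factorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be >= 0");
        }
        BigInteger result = BigInteger.ONE;
        int start = 1;
        // Resume from the largest already-cached k <= n, if there is one.
        for (int k = n; k >= 1; k--) {
            BigInteger hit = cache.get(k);
            if (hit != null) {
                result = hit;
                start = k + 1;
                break;
            }
        }
        // Compute only the missing values and record each one in the cache.
        for (int i = start; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
            cache.put(i, result);
            multiplications++;
        }
        return result;
    }
}
```

If a second `FactorialCache` is created over the same map, it reuses every stored entry, which mirrors restarting a process on top of a store that survived the crash.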
• In-memory map
– Simple
– Very efficient (because it is held completely in memory)
– Needs a considerable amount of memory
– No fault tolerance (data must be saved to a file manually)
• Relational databases
– ACID properties may not be necessary
– Cannot handle big data
– Slow
• NoSQL databases (Berkeley DB)
– Fast key-value access
– Flexible schema and flexible data types
– Ease of use
– Fault tolerance
Berkeleydb.java
• Open Environment:
– The EnvironmentConfig class specifies the environment configuration parameters
• Open Class Catalog:
– Class catalog: a specialized database that stores the Java class descriptions of all serialized objects stored in the database
• Create the Database and StoredClassCatalog objects
• Open Database:
• Close Environment, Class Catalog and Databases:
DBViews.java
Factorial.java
Factorial (Berkeley DB ) – Memory Usage
Factorial (MySQL) – Memory Usage
Factorial (HashMap) – Memory Usage