An Evaluation of Distributed Datastores Using the AppScale Cloud Platform
Presented by: Himanshu Ranjan Vaishnav
TE-42065 (Comp-I)
SEMINAR GUIDE - Prof. Mrs S. S. Sonawani
1
04/09/23
What is AppScale?
AppScale is an open-source implementation of the Google App Engine
cloud platform.
AppScale is an extension of the non-scalable software development kit
that Google makes available for testing and debugging applications.
AppScale currently supports HBase, Hypertable, Cassandra, Voldemort,
MongoDB, MemcacheDB, Scalaris, and MySQL Cluster datastores.
What Does AppScale Do?
AppScale is a robust, open source implementation of the Google App
Engine APIs that executes over private virtualized cluster resources and
cloud infrastructures including Amazon Web Services and Eucalyptus.
Users can execute their existing Google App Engine applications over
AppScale without modification.
AppScale automates deployment and simplifies configuration of datastores
that implement the API and facilitates their comparison and evaluation on
end-to-end performance using real programs (Google App Engine
applications).
AppScale Features
• More choices of datastores
• Neptune language
• MapReduce
• Fault tolerance
• App Engine portability
And more
Google App Engine
A software development platform
Platform-as-a-service (PaaS)
GAE Datastore
Bigtable
A master/slave relationship
Google App Engine (continued)
The GAE Datastore API provides the following primitives:
• Put (k, v): Add key k and value v to table; creating a table if needed
• Get (k): Return value associated with key k
• Delete (k): Remove key k and its value
• Query (q): Perform query q using the Google Query Language (GQL) on a single table, returning a list of values
• Count (t): For a given query, returns the size of the list of values returned
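The five primitives above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the Google App Engine or AppScale API: the class name `ToyDatastore`, the explicit `table` argument, and the use of a Python predicate in place of a GQL query string are all hypothetical choices made for illustration.

```python
from typing import Any, Callable, Dict, List


class ToyDatastore:
    """Hypothetical in-memory stand-in for the five primitives (not the real GAE API)."""

    def __init__(self) -> None:
        # Maps table name -> {key -> value}.
        self._tables: Dict[str, Dict[str, Any]] = {}

    def put(self, table: str, key: str, value: Any) -> None:
        # Create the table on first use, then add or overwrite the key.
        self._tables.setdefault(table, {})[key] = value

    def get(self, table: str, key: str) -> Any:
        # Return the stored value, or None when the key is absent.
        return self._tables.get(table, {}).get(key)

    def delete(self, table: str, key: str) -> None:
        # Remove the key and its value; a missing key is ignored.
        self._tables.get(table, {}).pop(key, None)

    def query(self, table: str, predicate: Callable[[Any], bool]) -> List[Any]:
        # Assumption: a predicate replaces the GQL string. The query touches
        # a single table and returns a list of matching values.
        return [v for v in self._tables.get(table, {}).values() if predicate(v)]

    def count(self, table: str, predicate: Callable[[Any], bool]) -> int:
        # Size of the list that the same query would return.
        return len(self.query(table, predicate))


store = ToyDatastore()
store.put("Guestbook", "k1", {"author": "alice", "votes": 3})
store.put("Guestbook", "k2", {"author": "bob", "votes": 7})
popular = store.query("Guestbook", lambda v: v["votes"] > 5)
```

In this toy model a query scans every value in the table, so its cost grows with table size regardless of how many values match.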
Google App Engine APIs
Blobstore API
Channel API
Datastore API
Images API
Memcache API
Namespace API
Task Queue API
Users API
URL Fetch API
XMPP API
MapReduce Streaming API
EC2 API
AppScale deployment
AS – App Server
ALB – App Load Balancer
DBS – Database Slave Peer
DBM – Database Master Peer
Multi-tiered approach within AppScale
Database Services
Protocol Buffer Server (PBServer)
User/App Server (UAServer)
Blobstore service
Monitoring Services
Neptune
APPSCALE DISTRIBUTED DATABASE SUPPORT
Cassandra
HBase
Hypertable
MemcacheDB
MongoDB
Voldemort
MySQL
1. Cassandra
Designed, implemented, and released by Facebook engineers
A hybrid approach
Eventually consistent
Written in Java and exposes its API through the Thrift software
framework
Supports range queries
2. HBase
Developed and released by Powerset
An official Hadoop subproject
Employs a master-slave distributed architecture
Provides flexible column support
Written primarily in Java, with a small portion of the code base in C
HBase is deployed over the Hadoop Distributed File System (HDFS)
3. Hypertable
Developed by Zvents
Provides an open-source version of Google’s Bigtable
Written in C++
RangeServer
4. MemcacheDB
Developed by open-source developer Steve Chu
Employs a master-slave approach
Runs with a single master node and multiple replica nodes
Written in C and uses Berkeley DB
5. MongoDB
Developed and released by 10gen
Aims to provide both speed and scalability
Written in C++
Queries are performed using a hashtable
6. Voldemort
Developed by and currently in use internally at LinkedIn
Eventual consistency
More developer-friendly
Written in Java and exposes its API via Thrift
7. MySQL
A well-known relational database
AppScale employs MySQL Cluster
Provides concurrent access to the system
Written in C and C++
EVALUATION
Load tables in all databases with 1000 items
Test specifics:
– Put, get, delete, and no-op operations are performed on each database
– Load levels considered: light (one thread), medium (three concurrent threads),
heavy (nine concurrent threads)
– Each experiment is repeated 5 times
The test application is executed in an AppScale cloud
Each node runs with 2 virtual processors, up to 10 GB of disk, and 4 GB of
memory
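The test procedure above can be sketched as a small harness. This is a minimal illustration, not the benchmark application used in the evaluation: `LockedDictStore` is a hypothetical in-memory stand-in for a real datastore backend, and the function names and the per-call latency metric are assumptions. Only the item count, thread counts, operation list, and repeat count come from the slide.

```python
import threading
import time
from statistics import mean
from typing import Callable, Dict, List

ITEMS = 1000                                   # items loaded per table
LOADS = {"light": 1, "medium": 3, "heavy": 9}  # concurrent threads per load level
REPEATS = 5                                    # repetitions per experiment


class LockedDictStore:
    """Hypothetical in-memory store standing in for a real datastore backend."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._lock = threading.Lock()

    def put(self, key: str, value: str = "value") -> None:
        with self._lock:
            self._data[key] = value

    def get(self, key: str):
        with self._lock:
            return self._data.get(key)

    def delete(self, key: str) -> None:
        with self._lock:
            self._data.pop(key, None)  # deleting an absent key is a no-op

    def noop(self, key: str) -> None:
        return None  # measures harness overhead only


def time_operation(op: Callable[[str], object], threads: int) -> float:
    """Call op once per key from each worker thread; return mean seconds per call."""
    latencies: List[float] = []
    merge_lock = threading.Lock()

    def worker() -> None:
        local: List[float] = []
        for i in range(ITEMS):
            start = time.perf_counter()
            op(f"key{i}")
            local.append(time.perf_counter() - start)
        with merge_lock:
            latencies.extend(local)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return mean(latencies)


def run_benchmark() -> Dict[str, Dict[str, float]]:
    """Return mean latency per operation for each load level."""
    results: Dict[str, Dict[str, float]] = {}
    for load, threads in LOADS.items():
        store = LockedDictStore()
        for i in range(ITEMS):  # preload the table with 1000 items
            store.put(f"key{i}")
        ops = {"put": store.put, "get": store.get,
               "delete": store.delete, "no-op": store.noop}
        # Note: after the first delete run the keys are gone, so later
        # delete repetitions in this sketch act on absent keys.
        results[load] = {
            name: mean(time_operation(op, threads) for _ in range(REPEATS))
            for name, op in ops.items()
        }
    return results


if __name__ == "__main__":
    for load, timings in run_benchmark().items():
        print(load, {name: f"{sec * 1e6:.2f} us" for name, sec in timings.items()})
```

To compare real backends, the same harness would be pointed at a client object exposing the same four methods. The no-op timing gives a baseline for the overhead of the harness itself.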
Experimental Results
Limitations
Persistence
Blobstore maximum file size
Datastore
Task Queue
Follows a "deploy on all nodes" approach
Limited distribution support
Queries require retrieving the entire table
Source code of the Java App Engine server has not been released
Future Work
Expand out of the web services domain
– Investigating opportunities in streaming
– Integrated MapReduce support for high-performance computing (HPC)
– Co-locate AppEngines and use shared memory
Additional databases:
– MongoDB, Scalaris, CouchDB
Future Work (continued)
Extending AppScale with new services for
- large-scale data analytics
- data- and computation-intensive tasks
Cloud-agnostic
Integration of mobile devices
CONCLUSION
Presents an open-source implementation of the Google App Engine
(GAE) Datastore API within a cloud platform called AppScale.
The implementation unifies access to a wide range of open-source
distributed database technologies and automates their configuration
and deployment. However, each database differs in the degree to which
it implements the APIs.
DEMO
Thank You
Any Questions?