Introduction - What is SimpleDB?

• Amazon SimpleDB is a web service for running queries on structured data in real time.

• Amazon SimpleDB requires no schema, automatically indexes your data and provides a simple API for storage and access. This eliminates the administrative burden of data modeling, index maintenance, and performance tuning.

• This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud.

Domains, Attributes, and Items

• A domain is like a table.

• An attribute is analogous to a field or column.

• An item is similar to a database row.

• We can change the structure of a domain easily, since it has no schema.

• In addition, attributes are of string type and can contain multiple values.
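The domain, item, and attribute model above can be pictured as nested maps. The following is a minimal in-memory sketch, not SimpleDB itself: the class DomainSketch and its methods are hypothetical names chosen for illustration. It shows that items in one domain need not share attributes and that one attribute can hold several string values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** In-memory stand-in for one SimpleDB domain (hypothetical class, not part of any AWS SDK). */
final class DomainSketch {
    // item name -> attribute name -> set of string values
    private final Map<String, Map<String, Set<String>>> items = new LinkedHashMap<>();

    /** Adds one value to an attribute; the item and the attribute are created on first use. */
    void put(String itemName, String attribute, String value) {
        items.computeIfAbsent(itemName, k -> new LinkedHashMap<>())
             .computeIfAbsent(attribute, k -> new TreeSet<>())
             .add(value);
    }

    /** Returns the values of an attribute, or an empty set if the item does not have it. */
    Set<String> values(String itemName, String attribute) {
        return items.getOrDefault(itemName, Map.of()).getOrDefault(attribute, Set.of());
    }

    public static void main(String[] args) {
        DomainSketch domain = new DomainSketch();
        domain.put("item1", "Title", "The Right Stuff");
        domain.put("item1", "Keyword", "Book");
        domain.put("item1", "Keyword", "Hardcover"); // second value for the same attribute
        domain.put("item2", "Rating", "4 stars");    // item2 carries a different set of attributes
        System.out.println(domain.values("item1", "Keyword"));
    }
}
```

Because no schema is declared anywhere, adding a new attribute is just another put call on an existing item.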

Services offered by SimpleDB

Setting up Eclipse with AWS Plugin

• Requires Eclipse IDE 3.4 or higher.

• Open Help -> Install New Software. Click the Add button, then enter “AWS Toolkit for Eclipse” in the name box and http://aws.amazon.com/eclipse/ in the location box.

• You need to provide your AWS credentials, which can be retrieved from your AWS account.

• From the Window menu, select Open Perspective -> Other -> Database Development perspective and create a connection for Amazon SimpleDB.

• Your AWS account is now linked to your Eclipse platform, and you can directly create domains, attributes, and items, and also use the SQL Scrapbook feature to query data.

Data loading

SimpleDB can be queried in one of the following ways:

• Making RESTful GET and POST requests over HTTP or HTTPS.

• Issuing SQL-like queries from a programming language.

• I implemented data loading in Java using the AWS plugin, which lets us use our AWS credentials to store data in SimpleDB domains and query it later.

Using Eclipse IDE in conjunction with SimpleDB

The Java program I wrote opens the SimpleDB connection using the AWS credentials and loads each field into the domain we specify.

We read each line of the CSV source file and load the data into the SimpleDB domain. The code snippet is shown below:
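The original slide's snippet is not reproduced in this text. As a rough illustration of the parsing half of such a loader, here is a minimal sketch that turns CSV lines into item names and attribute maps. Everything in it is an assumption: the class CsvToItems, the use of the first column as the item name, and the sample data are hypothetical, and the actual write to SimpleDB (one put request per item through the AWS SDK) is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns CSV text into (item name, attributes) pairs ready to be stored. */
final class CsvToItems {
    /**
     * Parses simple CSV lines (no quoted fields). The first line holds the attribute names,
     * and the first column of each data row is used as the item name.
     */
    static Map<String, Map<String, String>> parse(List<String> lines) {
        Map<String, Map<String, String>> items = new LinkedHashMap<>();
        if (lines.isEmpty()) {
            return items;
        }
        String[] header = lines.get(0).split(",", -1);
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) {
                continue; // skip empty lines
            }
            String[] fields = line.split(",", -1);
            Map<String, String> attributes = new LinkedHashMap<>();
            for (int i = 1; i < header.length && i < fields.length; i++) {
                String value = fields[i].trim();
                if (!value.isEmpty()) { // no schema: absent values are simply not stored
                    attributes.put(header[i].trim(), value);
                }
            }
            items.put(fields[0].trim(), attributes);
        }
        return items;
    }

    public static void main(String[] args) {
        List<String> csv = List.of(
                "Id,Title,Year",
                "1,The Right Stuff,1983",
                "2,Untitled,");
        // Each entry would become one put request against the target domain.
        parse(csv).forEach((itemName, attrs) -> System.out.println(itemName + " -> " + attrs));
    }
}
```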

Output

The DB setup in Java looks like this:

Output in SQL Scrapbook

Once the data is loaded, we can inspect it through the DB connection we created earlier. The data for each attribute can be viewed as below:

Using REST to load data

This shows a REST request that puts three attributes and values for an item named Item123 into the domain named MyDomain.
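As a rough sketch of what such a request carries, the code below builds the unsigned part of a PutAttributes query string for Item123 in MyDomain. The class PutAttributesQuery and the attribute names and values (Color, Size, Price) are illustrative assumptions, not taken from the slide. A real request must also include the access key, API version, timestamp, and signature parameters, which are omitted here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds the unsigned part of a PutAttributes query string. */
final class PutAttributesQuery {
    static String build(String domain, String item, Map<String, String> attributes) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("Action", "PutAttributes");
        params.put("DomainName", domain);
        params.put("ItemName", item);
        int n = 1; // attribute parameters are numbered starting at 1
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            params.put("Attribute." + n + ".Name", e.getKey());
            params.put("Attribute." + n + ".Value", e.getValue());
            n++;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(encode(p.getKey()) + "=" + encode(p.getValue()));
        }
        return query.toString();
    }

    /** Percent-encodes a value; spaces become %20 rather than '+'. */
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("Color", "Blue");
        attributes.put("Size", "Med");
        attributes.put("Price", "0014.99");
        System.out.println(build("MyDomain", "Item123", attributes));
    }
}
```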

Response to the request:

Data Querying

Once the data is loaded into a SimpleDB domain, we can query it either with the SQL Scrapbook feature available through the AWS Toolkit for Eclipse or by writing queries in Java.

Query Output in console

Query Output in SQL Scrapbook

Types of Queries

Simple Queries:

These are the usual queries we perform in any database. Examples:

select * from mydomain where Title = 'The Right Stuff'

select * from mydomain where Year > '1985'

Range Queries:

Amazon SimpleDB enables us to execute more than one comparison against attribute values within the same predicate. This is most commonly used to specify a range of values.

select * from mydomain where Year between '1975' and '2008'

select * from mydomain where (Year > '1950' and Year < '1960') or Year like '193%' or Year = '2007'
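Since every attribute value is a string, comparisons such as Year > '1985' are string comparisons. They behave like numeric comparisons only when the values have the same width, as four-digit years do. The sketch below illustrates this with plain Java string ordering; the class LexicographicCompare and the zero-padding helper are hypothetical names, and padding is a common workaround rather than something the slides prescribe.

```java
import java.util.Locale;

/** Hypothetical illustration of string ordering versus numeric ordering. */
final class LexicographicCompare {
    /** Zero-pads a non-negative number to a fixed width so string order matches numeric order. */
    static String pad(int value, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    /** Mimics a string comparison such as: attribute > 'threshold'. */
    static boolean greaterThan(String attributeValue, String threshold) {
        return attributeValue.compareTo(threshold) > 0;
    }

    public static void main(String[] args) {
        // Four-digit years compare as expected.
        System.out.println(greaterThan("1990", "1985"));
        // Unpadded "250" sorts after "1985" because '2' comes after '1'.
        System.out.println(greaterThan("250", "1985"));
        // Padding both sides to the same width restores numeric order.
        System.out.println(greaterThan(pad(250, 4), pad(1985, 4)));
    }
}
```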

Queries on Attributes with Multiple Values:

• Amazon SimpleDB allows you to associate multiple values with a single attribute.

• Each attribute is considered individually against the comparison conditions defined in the predicate.

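Example:

select * from mydomain where Keyword = 'Book' and Keyword = 'Hardcover'

• This query is meant to retrieve all items whose Keyword attribute has both the values "Book" and "Hardcover."

• Because each value is evaluated individually against the predicate expression, and no single value can equal both "Book" and "Hardcover", the query selects no items.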

Multiple Attribute Queries:

• Multiple attribute queries work by producing a set of item names from each predicate and applying the intersection operator.

• The intersection operator only returns item names that appear in both result sets.

select * from mydomain where Keyword = 'Book' intersection Keyword = 'Hardcover'

• Suppose the first predicate produces the item names 100, 200, and 50, and the second produces only 50. The query then returns item 50, because the intersection operator keeps only the item names that appear in both result sets.
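The difference between a per-value "and" and the intersection operator can be sketched in memory. The code below is an illustration under assumptions, not SimpleDB's implementation: the class MultiValueQuery and the keyword data are made up, and only the item names 100, 200, and 50 follow the example above.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical in-memory model of predicate evaluation on a multi-valued attribute. */
final class MultiValueQuery {
    /** Item names for which at least one single value of the attribute satisfies the predicate. */
    static Set<String> select(Map<String, List<String>> valuesByItem, Predicate<String> predicate) {
        Set<String> names = new TreeSet<>();
        valuesByItem.forEach((item, values) -> {
            if (values.stream().anyMatch(predicate)) {
                names.add(item);
            }
        });
        return names;
    }

    /** Item names present in both result sets. */
    static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> keywords = Map.of(
                "100", List.of("Book"),
                "200", List.of("Book", "Paperback"),
                "50", List.of("Book", "Hardcover"));

        // Keyword = 'Book' and Keyword = 'Hardcover': no single value equals both strings.
        Set<String> perValueAnd = select(keywords, v -> v.equals("Book") && v.equals("Hardcover"));

        // Keyword = 'Book' intersection Keyword = 'Hardcover': combine two separate result sets.
        Set<String> both = intersection(
                select(keywords, v -> v.equals("Book")),
                select(keywords, v -> v.equals("Hardcover")));

        System.out.println(perValueAnd);
        System.out.println(both);
    }
}
```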

Query Optimisation

• Amazon performs query optimization on its own, so users only need to store the data and query it.

• The 10 GB domain limit was created with optimization in mind.

• Users can also optimize performance themselves by splitting data across multiple domains.

• To improve performance, we can partition our dataset among multiple domains to parallelize queries and have them operate on smaller individual datasets.

Partitioning the data

Cases where partitioning helps parallelize queries:

• Natural Partitions: The data set naturally partitions along some dimension. For example, a university catalog might be partitioned into the "Grad", "UnderGrad" and "Staff" domains. Although we could store all the catalog data in a single domain, partitioning can improve overall performance.

• High Performance Application: This can be useful when the application requires higher throughput than a single domain can provide.

• Large Data Set: This can be useful when timeout limits are reached because of the data size or query complexity.
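When the data has no natural partition, one option is to spread items across domains by hashing the item name and to send the same query to every partition in parallel. The sketch below shows that idea under assumptions: the class DomainPartitioner, the "catalog_N" naming scheme, and the queryDomain function (a stand-in for a real per-domain query call) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical helper for hash-partitioning items across several domains. */
final class DomainPartitioner {
    /** Picks one of the partition domains for an item name, e.g. "catalog_0" to "catalog_3". */
    static String domainFor(String baseName, String itemName, int partitions) {
        int bucket = Math.floorMod(itemName.hashCode(), partitions);
        return baseName + "_" + bucket;
    }

    /** Runs the same query against every partition in parallel and merges the results in partition order. */
    static List<String> queryAll(String baseName, int partitions,
                                 Function<String, List<String>> queryDomain) {
        List<CompletableFuture<List<String>>> futures = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            String domain = baseName + "_" + i;
            futures.add(CompletableFuture.supplyAsync(() -> queryDomain.apply(domain)));
        }
        List<String> merged = new ArrayList<>();
        for (CompletableFuture<List<String>> future : futures) {
            merged.addAll(future.join()); // join() waits for that partition's result
        }
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(domainFor("catalog", "Item123", 4));
        // Stand-in query function; a real one would call SimpleDB for the given domain.
        List<String> hits = queryAll("catalog", 4, domain -> List.of(domain + ":hit"));
        System.out.println(hits);
    }
}
```

Each partition then answers over a smaller data set, at the cost of merging results in the application.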

Aggregation and Joins

• If we need aggregation, SimpleDB is not the right solution.

• It is built around the school of thought that the DB is just a key value store, and aggregation should be handled by an aggregation process that writes the results back to the key value store.

• The count(*) function was recently added to the set of supported functions.

• A single query returns at most 2,500 items, so larger result sets must be fetched page by page, and a long-running count may return a partial count that the client has to add up across pages.

• We cannot perform joins in SimpleDB, because a query can be executed against a single domain only. This is one of its limitations.
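Because a query touches only one domain, a join has to happen in application code: query each domain separately, then match the rows on a shared attribute. The following is a minimal sketch of that approach. The class ClientSideJoin and the attribute names in the usage example are hypothetical, and the input lists stand in for the results of two separate queries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that joins the results of two single-domain queries in the client. */
final class ClientSideJoin {
    /**
     * Each row is an attribute map. Rows from "left" are combined with every row from
     * "right" that has the same value for joinAttribute (an inner join).
     */
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right,
                                          String joinAttribute) {
        // Index the right-hand rows by their join value.
        Map<String, List<Map<String, String>>> index = new HashMap<>();
        for (Map<String, String> row : right) {
            String key = row.get(joinAttribute);
            if (key != null) {
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
            }
        }
        List<Map<String, String>> joined = new ArrayList<>();
        for (Map<String, String> leftRow : left) {
            for (Map<String, String> rightRow : index.getOrDefault(leftRow.get(joinAttribute), List.of())) {
                Map<String, String> combined = new LinkedHashMap<>(leftRow);
                combined.putAll(rightRow);
                joined.add(combined);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Map<String, String>> students = List.of(Map.of("StudentId", "s1", "Name", "Ann"));
        List<Map<String, String>> enrollments = List.of(
                Map.of("StudentId", "s1", "Course", "Databases"),
                Map.of("StudentId", "s2", "Course", "Networks"));
        System.out.println(join(students, enrollments, "StudentId"));
    }
}
```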

Data Indexing

• Amazon does not provide enough information about how indexes are created or managed on SimpleDB, except for the fact that they are automatically created and managed.

• SimpleDB users do not have any control over it.

• Following are some of the salient features of indexes:

1. Domain keys are indexed.

2. Data are indexed when we enter or modify them in the database.

3. SimpleDB takes all data as input and indexes all the attributes.

Replication

• Asynchronous replication is supported.

• Amazon SimpleDB creates and manages multiple geographically distributed replicas of the data automatically.

• Every time we store a data item, multiple replicas are created in different data centers within the region we select.

Use Cases

Amazon S3 Content Search

• It is easy to store attributes in SimpleDB, along with pointers to where the media is stored in S3.

• SimpleDB creates an index for every attribute for quick searching. Different file types can have different attributes in the same SimpleDB domain. New file types or new attributes on existing file types can be added at any time without requiring existing records to be updated.

Low-Usage Application

• There are applications in the enterprise and on the open web that do not see a consistent heavy load. They can be low usage in general with periodic or seasonal spikes.

• For these types of applications, it can be difficult to justify an entire database server for one application. With SimpleDB, low-usage applications can run within the free tier of service while maintaining the ability to scale up to large request volumes when necessary.

Fat clients

For years, everyone has been working on thin clients. The new, smarter apps do a lot themselves, but in the age of the cloud they can’t do everything. SimpleDB is a perfect companion for this type of system: self-contained clients operating on cloud-based information.

One advantage of SimpleDB is that it’s ready to use right away. There is no setup or administration hassle, the data is secure and most importantly, SimpleDB provides access through web services that can be called easily from these clients. Typically, many applications take advantage of storing data in the cloud to build different kinds of clients—web, smartphone, desktop—accessing the same data.

Thank you