App Engine: Datastore Introduction
Part 1
Another very useful course: https://www.udacity.com/course/developing-scalable-apps-in-java--ud859
Topics covered in this lesson
• What is Datastore?
– Datastore and relational database
– Scalability, reliability and performance
• Datastore Internals
– Bigtable
• Datastore Basic Operations
– Entity, Properties and Keys
– Properties and Value Types
– Datastore APIs
What is Datastore?
• Datastore is a database (persistent storage) for AppEngine
                            AppEngine                            Traditional Web Apps
Web application framework   AppEngine (Java, Python, PHP, Go)    Perl/CGI, PHP, Ruby on Rails
Persistent storage          Datastore                            RDBMS: MySQL, MS SQL, Oracle
What is Datastore?
• Persistent storage for AppEngine
• AppEngine is very scalable => Many instances
=> Central server to store data from all instances.
• Why not RDB? Scalability!
Datastore and RDBMS
                              Datastore                              RDBMS
Query language flexibility    SQL-like query language:               Full support of SQL:
                              limited to simple filter and sort      table JOIN, flexible filtering, subquery
Reliability and scalability   Highly scalable and reliable,          Hard to scale
                              with performance
Datastore offers Google-level scalability
Problems of Scalability and Reliability
• Single Instance
– Performance limited by machine resources
– Single point of failure
• Replication (copies) increases reliability
– Consistency among instances
• Sharding (split among machines)
– Lock control (transaction)
[Shard = split server into multiple machines]
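To make the sharding idea concrete, here is a minimal sketch that routes each key to one of several in-memory "machines" by hashing the key. The names `ShardedStore` and `shard_for` and the fixed shard count are assumptions for illustration only; hash-based routing is just one common choice and is not claimed to be what Datastore or any particular RDBMS does.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that splits keys across several shards."""

    def __init__(self, num_shards):
        # Each dict stands in for one machine (hypothetical setup).
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, key):
        # A stable hash, so the same key always lands on the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(num_shards=3)
for name in ["alice", "bob", "carol", "dave"]:
    store.put(name, {"name": name})

print(store.get("carol"))
print([len(shard) for shard in store.shards])  # how the keys were spread
```

No single machine holds all the data, which removes the single-instance limit, but any operation touching keys on different shards now needs coordination (the lock control mentioned above).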
Strong Consistency and Eventual Consistency
Strong Consistency
Data is always consistent among all database instances
- Just after a write operation
- After a crash in the middle of a write operation
-> All servers return the same results.
Eventual Consistency
Takes time until all data becomes consistent after a write
(Think of DNS as an example)
DNS is a distributed database system.
Updated configuration on a domain -> Not reflected on all DNS servers immediately. For a certain period of time, some DNS servers return old results.
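The difference between the two models can be sketched with a toy replicated store. This is a simulation under stated assumptions, not a real replication protocol: `ReplicatedStore`, `propagate`, and `read_all` are hypothetical names, and "propagation" is just a method call standing in for background replication.

```python
class ReplicatedStore:
    """Toy store that keeps one copy of the data per replica."""

    def __init__(self, num_replicas, strong):
        self.replicas = [{} for _ in range(num_replicas)]
        self.strong = strong
        self.pending = []  # writes not yet copied to every replica

    def write(self, key, value):
        if self.strong:
            # Strong: update every replica before the write returns.
            for replica in self.replicas:
                replica[key] = value
        else:
            # Eventual: update one replica now, the rest later.
            self.replicas[0][key] = value
            self.pending.append((key, value))

    def propagate(self):
        # Stands in for background replication catching up.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()

    def read_all(self, key):
        # What each replica would answer if asked right now.
        return [replica.get(key) for replica in self.replicas]


strong = ReplicatedStore(3, strong=True)
strong.write("ip", "10.0.0.2")
print(strong.read_all("ip"))    # every replica agrees immediately

eventual = ReplicatedStore(3, strong=False)
eventual.write("ip", "10.0.0.2")
print(eventual.read_all("ip"))  # some replicas do not have the value yet
eventual.propagate()
print(eventual.read_all("ip"))  # after catching up, all replicas agree
```

The strong variant pays for its guarantee on every write (all replicas are updated first), while the eventual variant returns quickly and behaves like the DNS example until propagation finishes.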
Scalability, Reliability and Performance on RDB
• Replication and/or sharding for scalability
• But…
– Strong consistency on RDB slows write operations due to locks.
– Join operation is a bottleneck due to data shuffling.
RDB ensures strong consistency -> Hard to ensure scalability.
Datastore for AppEngine
Datastore Internals
• Based on Bigtable, which offers super high scalability.
• High availability by High Replication Datastore (HRD)
– Synchronous write on multiple datacenters.
• Supports strong consistency among multiple rows
What is Bigtable?
• Scalable, distributed, highly-available and structured storage
– Bigtable is not a database itself (it doesn’t support queries)
• Consistency
– Strong consistency for single row
– Eventual consistency at multi-row level
• Google usage
– In production since April 2005
– Web search, YouTube…
Automatic Scale-out of Bigtable table server
Bigtable Data Model
• Key-value data storage
• A row has a Key and Columns
• Sorted by Key
– In lexical order
– Enables range query by application
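Why sorted keys enable range queries can be shown with a small in-memory sketch. `SortedTable` and the `user:` / `post:` key prefixes are made up for illustration; the only point is that rows sharing a key prefix sit next to each other once keys are in lexical order.

```python
import bisect


class SortedTable:
    """Toy table whose rows are kept sorted by key."""

    def __init__(self):
        self.keys = []   # sorted list of row keys
        self.rows = {}   # row key -> dict of columns

    def put(self, key, columns):
        if key not in self.rows:
            bisect.insort(self.keys, key)  # keep keys in lexical order
        self.rows[key] = columns

    def scan(self, start, end):
        """Return (key, columns) pairs with start <= key < end."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, end)
        return [(k, self.rows[k]) for k in self.keys[lo:hi]]


table = SortedTable()
table.put("user:carol", {"age": 41})
table.put("user:alice", {"age": 30})
table.put("post:001", {"title": "hello"})
table.put("user:bob", {"age": 25})

# ";" is the character right after ":", so this range covers every "user:" key.
print([k for k, _ in table.scan("user:", "user;")])
```

The scan only touches the contiguous slice of keys between the two bounds, regardless of how many other rows the table holds.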
Bigtable Operations
• CRUD on a row
– Create, Read, Update and Delete operations
– Preserves single-row strong consistency (not multiple rows).
• Scan by range of keys
– But cannot search by column values
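The operations listed above can be mimicked with a plain dictionary to see what is cheap and what is not. This is a sketch with hypothetical function names, not the Bigtable API: each CRUD call touches exactly one row by key, while `find_by_column` has no choice but to inspect every row, which is what the application would be left with if it wanted to search by value.

```python
rows = {}  # row key -> dict of columns


def create(key, columns):
    if key in rows:
        raise KeyError(f"row {key!r} already exists")
    rows[key] = dict(columns)


def read(key):
    return rows.get(key)


def update(key, columns):
    rows[key].update(columns)


def delete(key):
    rows.pop(key, None)


def find_by_column(name, value):
    # No index on column values: every row has to be inspected.
    return sorted(k for k, cols in rows.items() if cols.get(name) == value)


create("user:alice", {"city": "Tokyo"})
create("user:bob", {"city": "Osaka"})
update("user:bob", {"city": "Tokyo"})
delete("user:alice")

print(read("user:bob"))
print(find_by_column("city", "Tokyo"))
```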
Scalability is based on Bigtable automated sharding
Megastore supports transactions (strong consistency)
Property = actual data you want to store
Property can have multiple values. (Multiple data for one property)
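A rough way to picture an entity with single-valued and multi-valued properties is a key plus a dictionary. The `Entity` class, the `values_of` helper, and the sample book data are hypothetical and only model the idea; they are not the Datastore API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    key: str                                        # identifies this entity
    properties: dict = field(default_factory=dict)  # property name -> value(s)


def values_of(entity, name):
    """Return the values of a property as a list, whether it holds one or many."""
    if name not in entity.properties:
        return []
    value = entity.properties[name]
    return list(value) if isinstance(value, list) else [value]


book = Entity(key="book-1")
book.properties["title"] = "Moby-Dick"        # one value
book.properties["tags"] = ["whales", "sea"]   # multiple values for one property

print(values_of(book, "title"))
print(values_of(book, "tags"))
```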
App Engine: Datastore Query, Index and Transactions
Part 2
Bigtable can scan on a key, not on a value!
Index table on Bigtable: property name and value
Implement query on Bigtable (without reading actual value)
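The index-table idea can be sketched as follows: put the property name and value into the key of a separate sorted table, so that a query by value turns into a key-range scan. The row layout `(property name, value, entity key)`, the sample entities, and `query_equal` are assumptions for illustration; the real index layout is more elaborate.

```python
import bisect

entities = {
    "e1": {"city": "Tokyo", "age": 30},
    "e2": {"city": "Osaka", "age": 25},
    "e3": {"city": "Tokyo", "age": 41},
}

# Index rows are keys only: (property name, value, entity key), kept sorted.
# Values are stored as strings here to keep the toy index uniformly comparable.
index = sorted(
    (name, str(value), key)
    for key, props in entities.items()
    for name, value in props.items()
)


def query_equal(name, value):
    """Find entity keys whose property equals value, using only the index keys."""
    prefix = (name, str(value))
    lo = bisect.bisect_left(index, prefix + ("",))
    result = []
    for row in index[lo:]:
        if row[:2] != prefix:
            break  # left the matching key range
        result.append(row[2])
    return result


print(query_equal("city", "Tokyo"))
```

The lookup never opens the `entities` table: the answer comes entirely from scanning a contiguous range of index keys, which is the kind of operation a key-sorted store supports.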
Caveats = limitations