This presentation tackles a particularly challenging situation that often occurs when creating a distributed relational database. In this presentation you will learn:
- What a shard conflict is
- How to identify shard conflicts
- How to resolve shard conflicts in a distributed database
- How shard conflicts affect query processing
Database Scalability: The Shard Conflict July 2014
2 Database Scalability: The Shard Conflict
This presentation tackles a particularly challenging situation that often occurs when creating a distributed database. In this presentation you will learn:
- What a shard conflict is
- How to identify shard conflicts
- How to resolve shard conflicts in a distributed database
- How shard conflicts affect query processing
3 Traditional Databases vs. Distributed Databases
Traditional Monolithic DB:
- Made up of tables of data that are related to one another
- All of the data is located in one place and is easily accessible
- The data relationship is stored deep in the database and can be easily analyzed and queried using conventional methods
Modern Distributed DB:
- Data distribution is necessary for scalability
- Information is spread across various servers (instances)
- Related data can be distributed into different partitions, or shards, making related query requests difficult to process
4 So, What Is a Shard Conflict? At ScaleBase, we have coined the term "shard conflict" to describe a situation where a given statement cannot be executed as is, unchanged, on all partitions (or on a single partition) and be relied upon to yield a truly correct result. Let's take a look at the following examples.
5 Identifying the Conflict Example #1: Choosing id as the Employee table's shard key presents a shard conflict, because there is no guarantee that each employee is stored in the same shard as the corresponding department.
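The conflict in Example #1 can be made concrete with a minimal sketch. The tables shown on the original slide are not reproduced here, so the rows, the two-shard layout, and the modulo routing rule in `shard_for` below are all illustrative assumptions, not ScaleBase's actual distribution policy.

```python
# Hypothetical sample data; not taken from the original slide.
NUM_SHARDS = 2

departments = [
    {"id": 1, "name": "Engineering"},
    {"id": 2, "name": "Sales"},
]
employees = [
    {"id": 1, "first_name": "Ann", "department_id": 1},
    {"id": 2, "first_name": "Bob", "department_id": 1},
    {"id": 3, "first_name": "Cid", "department_id": 2},
    {"id": 4, "first_name": "Dee", "department_id": 2},
]


def shard_for(key: int) -> int:
    """Assumed routing rule: modulo on the shard key."""
    return key % NUM_SHARDS


def find_conflicts(employees, departments):
    """Return ids of employees stored on a different shard than their department.

    Both tables are sharded by their own id column, as in Example #1.
    """
    dept_shard = {d["id"]: shard_for(d["id"]) for d in departments}
    return [
        e["id"]
        for e in employees
        if shard_for(e["id"]) != dept_shard[e["department_id"]]
    ]


conflicts = find_conflicts(employees, departments)
print(conflicts)
```

With this sample data, half of the employees land on a shard that does not hold their department row, so a join executed locally on each shard would silently skip them.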
6 Resolving the Conflict Example #2
The Method: Choose department_id as the Employee table's shard key.
The Outcome:
- The join query is optimized because all department-related data is stored in the same partition
- No cross-partition joins are required
- Statements can now safely be executed on all partitions
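A short sketch of why this works: when the Employee table is placed by department_id and the Department table by its id, every employee row ends up next to its department row, so each shard can run the join locally. The sample rows and the modulo routing rule are assumptions made for illustration only.

```python
# Hypothetical sample data and routing rule; not taken from the original slides.
NUM_SHARDS = 2

departments = [
    {"id": 1, "name": "Engineering"},
    {"id": 2, "name": "Sales"},
]
employees = [
    {"id": 1, "first_name": "Ann", "department_id": 1},
    {"id": 2, "first_name": "Bob", "department_id": 1},
    {"id": 3, "first_name": "Cid", "department_id": 2},
    {"id": 4, "first_name": "Dee", "department_id": 2},
]


def shard_for(key: int) -> int:
    """Assumed routing rule: modulo on the shard key."""
    return key % NUM_SHARDS


def place(rows, key_column):
    """Distribute rows across shards using the given column as the shard key."""
    shards = {i: [] for i in range(NUM_SHARDS)}
    for row in rows:
        shards[shard_for(row[key_column])].append(row)
    return shards


dept_shards = place(departments, "id")
emp_shards = place(employees, "department_id")  # the fix from Example #2


def local_join(shard_no):
    """Join employees to departments using only the rows stored on one shard."""
    names = {d["id"]: d["name"] for d in dept_shards[shard_no]}
    return [
        (e["first_name"], names[e["department_id"]])
        for e in emp_shards[shard_no]
        if e["department_id"] in names
    ]


# Run the unchanged join on every shard and merge the partial results.
joined = sorted(pair for s in range(NUM_SHARDS) for pair in local_join(s))
print(joined)
```

Every employee appears in the merged result, which is the property the slide describes as statements being safe to execute on all partitions.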
7 Wait a Minute... There's Still a Conflict
SELECT e.first_name, e.last_name, m.first_name, m.last_name FROM employee e JOIN employee m ON e.manager_id = m.id
When the Employee table is joined with itself to find each employee's manager, there is no guarantee that an employee and their manager are in the same shard. The Employee table cannot be sharded by both id and manager_id at the same time.
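The self-join problem can be reproduced with a small experiment. The sketch below uses Python's built-in `sqlite3` module as a stand-in for each shard (the presentation itself concerns MySQL); the sample rows, the two-shard layout, and the modulo placement by department_id are assumptions for illustration. It runs the slide's query unchanged on each shard, merges the results, and compares them with the result from a single, unsharded table.

```python
import sqlite3

# The self-join from the slide, unchanged.
QUERY = """
    SELECT e.first_name, e.last_name, m.first_name, m.last_name
    FROM employee e JOIN employee m ON e.manager_id = m.id
"""

# Hypothetical rows: (id, first_name, last_name, department_id, manager_id)
ROWS = [
    (1, "Ann", "Lee", 1, None),
    (2, "Bob", "Ray", 1, 1),
    (3, "Cid", "Fox", 2, 1),  # Cid's manager (Ann) works in another department
    (4, "Dee", "Kim", 2, 3),
]

NUM_SHARDS = 2


def run_on(rows):
    """Load rows into a fresh in-memory database and run the self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee ("
        "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, "
        "department_id INTEGER, manager_id INTEGER)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", rows)
    result = set(conn.execute(QUERY).fetchall())
    conn.close()
    return result


# Assumed placement: shard by department_id, as in Example #2.
shards = [
    [r for r in ROWS if r[3] % NUM_SHARDS == s] for s in range(NUM_SHARDS)
]

expected = run_on(ROWS)                              # monolithic database
merged = set().union(*(run_on(s) for s in shards))   # per-shard results merged
missing = expected - merged

print(sorted(missing))
```

No shard raises an error, yet the merged result omits the employee whose manager lives on a different shard. This is what makes the conflict easy to miss.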
8 Shard Conflict Effects on Query Processing
- The examples show that when a foreign key relates two tables, a common shard key can resolve certain (but not all) conflicts
- Distributed data can become quite complex if not handled correctly
- It's the kind of problem that is not always obvious and can yield incorrect results that go unnoticed
9 ScaleBase Can Help ScaleBase is a modern, distributed MySQL database management system. It is optimized for the cloud and deploys in minutes to enable you to scale out to an unlimited number of users, data and transactions. It is a horizontally scalable database cluster built on MySQL that dynamically optimizes workloads and availability by logically distributing data across public, private and geo-distributed clouds. Use your relational DBA skills and get NoSQL capabilities. Contact us at firstname.lastname@example.org or download the free ScaleBase software: http://www.scalebase.com/software/
10 Start Using ScaleBase Today Check out ScaleBase's software: ScaleBase on Amazon, ScaleBase on Rackspace