Lookup Tables: Fine-Grained Partitioning for Distributed Databases
Aubrey L. Tatarowicz, Carlo Curino, Evan P. C. Jones, Sam Madden
Massachusetts Institute of Technology, USA
Goal: scale a distributed OLTP DBMS by horizontally partitioning the data.
To be effective, the partitioning strategy must minimize the number of nodes involved in answering each query.
The most common strategy is to horizontally partition the database using hash or range partitioning.
BACKGROUND
Many-to-many relationships are hard to partition.
For social networking workloads, simple partitioning schemes create a large fraction of distributed queries/transactions.
While queries on the partitioning attribute go to a single partition, queries on other attributes must be broadcast to all partitions.
Problems
Use a fine-grained partitioning strategy: related individual tuples are co-located in the same partition.
Use a partition index: it specifies which partitions contain tuples matching a given attribute value, without partitioning the data by that attribute.
Solution---Lookup Tables
To solve both the fine-grained partitioning and partition index problems, we introduce lookup tables.
Lookup tables map from a key to a set of partition ids that store the corresponding tuples.
Lookup tables are small enough that they can be cached in memory on database query routers, even for large databases.
Lookup Tables
Lookup tables must be stored compactly in RAM, to avoid adding disk accesses when processing queries.
Lookup tables must be maintained efficiently in the presence of updates.
Challenges for lookup tables
Applications interact with the database through a JDBC driver. The system consists of two layers:
• Backend databases (plus an agent on each)
• Query routers, which contain the lookup tables and partitioning metadata
OVERVIEW-The structure of our system.
The routers are given the network address of each backend, the schema, and the partitioning metadata when they are started.
Lookup tables are stored in memory and consulted to determine which backends should run each query.
Query routers send queries to backend databases.
This design results in excellent performance, providing 40% to 300% better throughput.
Overview-Basic flow path of the system
Content
LOOKUP TABLE QUERY PROCESSING
START-UP, UPDATES AND RECOVERY
STORAGE ALTERNATIVES
EXPERIMENTAL EVALUATION
CONCLUSION
When the router receives a query, the lookup tables tell it which backends store the data that is referenced.
If the query references a column that uses a lookup table, the router consults its local copy of the lookup table and determines where to send the query.
If multiple backends are referenced, the router rewrites the query and sends a separate query to each backend.
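The routing decision above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the names `route`, `lookup_table`, and the broadcast-on-miss behavior are assumptions.

```python
# Minimal sketch of lookup-table query routing (hypothetical names).
def route(query, key_column, key_values, lookup_table, all_backends):
    """Return the set of backends the query must be sent to."""
    if key_column in lookup_table:
        table = lookup_table[key_column]
        backends = set()
        for v in key_values:
            # Each key maps to the set of partitions storing matching tuples;
            # an unknown key conservatively falls back to all backends.
            backends |= table.get(v, set(all_backends))
    else:
        backends = set(all_backends)  # no lookup table: broadcast
    return backends

# Example: a lookup table on users.id mapping each id to its partition set.
lt = {"id": {1: {0}, 2: {1}, 3: {0}, 4: {1}}}
print(route("SELECT ...", "id", [1, 3], lt, [0, 1]))  # {0}: one backend
```

If the result covers multiple backends, the router would rewrite the query once per backend, as described above.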
Basic Lookup Table Operation
We will use two tables:
Users(id, status)
Followers(source, destination)
source and destination are foreign keys to Users.id.
Basic operation--Example
Users want to get the status of all users they are following:
R = SELECT destination FROM followers WHERE source = x
SELECT * FROM users WHERE id IN (R)
Traditional hash partitioning:
• Partition the users table by id
• Partition the followers table by source
Problem:
• The second query accesses several partitions
• It is hard to scale this system by adding more machines
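The problem can be made concrete by counting the partitions the IN-list query touches. A small sketch, with modulo standing in for the hash function and an illustrative lookup table that co-locates friends (the placement shown is an assumption for the example):

```python
# How many partitions does "SELECT * FROM users WHERE id IN (R)" touch?
def partitions_hash(ids, n_partitions):
    """Hash partitioning: each id is placed independently by its hash."""
    return {i % n_partitions for i in ids}  # modulo as a stand-in hash

def partitions_lookup(ids, lookup):
    """Lookup table: ids are placed wherever the partitioner chose."""
    out = set()
    for i in ids:
        out |= lookup[i]
    return out

followed = [2, 4]                       # R for source = 1
print(partitions_hash(followed, 4))     # scatters across partitions
lookup = {2: {1}, 4: {1}}               # friends co-located on partition 1
print(partitions_lookup(followed, lookup))  # a single partition
```

With hash partitioning the touched-partition count grows with the IN-list; with a friend-aware lookup table it can stay at one.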
Example-Partition
Users(id, status): (1,0), (2,0), (3,1), (4,1)
Followers(source, destination): (1,2), (1,4), (2,1), (3,1), (3,2), (3,4)
Intelligent partitioning-share many friends:
Partition 1: Users (1,0), (3,1); Followers (1,2), (1,4), (3,1), (3,2), (3,4)
Partition 2: Users (2,0), (4,1); Followers (2,1), (4,1)
Defining Lookup Tables
Query Planning
LOOKUP TABLE QUERY PROCESSING
CREATE TABLE users (
  id int, ...,
  PRIMARY KEY (id),
  PARTITION BY lookup(id) ON (part1, part2)
  DEFAULT NEW ON hash(id));
This says that users is partitioned with a lookup table on id; new tuples are placed by hashing id by default.
ALTER TABLE users SET PARTITION=part2 WHERE id=27;
This places one or more users into a given partition.
Define Lookup Tables
ALTER TABLE followers
  PARTITION BY lookup(source) SAME AS users;
This specifies that the followers table should be partitioned in the same way as the users table: each followers tuple f should be placed on the same partition as the users tuple u where u.id = f.source.
CREATE SECONDARY LOOKUP l_a ON users(name);
This defines a partition index: a lookup table l_a on users.name should be maintained, without repartitioning the data.
Each router maintains a copy of the partitioning metadata. This metadata describes how each table is partitioned or replicated.
The router parses each query to extract the tables and attributes that are being accessed
The goal is to push the execution of queries to the backend nodes, involving as few of them as possible.
Query Planning
When starting, each router knows the network address of each backend; this is part of the static configuration data.
The router then attempts to contact other routers to copy their lookup tables.
As a last resort, it contacts each backend agent to obtain the latest copy of each lookup table subset.
START-UP
To ensure correctness, the copy of the lookup table at each router is considered a cache that may not be up to date.
To keep the routers up to date, backends piggyback changes with query responses.
This is only a performance optimization, and is not required for correctness.
UPDATES
Lookup table keys are usually unique.
If tuples are found, the existence of a tuple on a backend indicates that the query was routed correctly; otherwise the router has a stale lookup table entry or no lookup table entry, and must fall back to a broadcast.
Piggyback changes
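The stale-entry handling above can be sketched as follows. This is an illustrative sketch only; `lookup_with_fallback` and the `query_backend` callback are hypothetical names, not the system's API:

```python
# Sketch of routing with a possibly-stale cached lookup table entry.
def lookup_with_fallback(key, cache, backends, query_backend):
    """query_backend(b, key) returns a tuple or None (hypothetical call)."""
    guess = cache.get(key)
    if guess is not None and query_backend(guess, key) is not None:
        return guess                       # tuple found: routing was correct
    # Stale or missing entry: broadcast to the remaining backends.
    for b in backends:
        if b == guess:
            continue
        if query_backend(b, key) is not None:
            cache[key] = b                 # repair the cached mapping
            return b
    return None                            # key not in the database at all

# Example: key 7 moved from backend 0 to backend 2; the cache is stale.
data = {0: set(), 1: set(), 2: {7}}
cache = {7: 0}
b = lookup_with_fallback(7, cache, [0, 1, 2],
                         lambda b, k: k if k in data[b] else None)
print(b, cache[7])  # 2 2: rerouted, and the cache is repaired
```

In the real system the repaired mappings arrive piggybacked on query responses rather than via an explicit repair step; correctness never depends on the cache being fresh.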
Lookup tables must be stored in RAM to avoid imposing a performance penalty.
Two implementations of lookup tables:
• Hash tables: support any data type and sparse key spaces, and hence are a good default choice.
• Arrays: work better for dense key spaces, but are not always an option because they require mostly-dense, countable key spaces.
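The space difference between the two implementations can be sketched in Python (a rough illustration; the byte-per-key and 256-partition limits are assumptions of this sketch, not the paper's design):

```python
import array

# Dense integer keys: an array stores one partition id per key slot.
# Assumes keys 0..n-1 and at most 256 partitions (illustrative limits).
n = 1_000_000
dense = array.array('B', [0]) * n      # ~1 byte per key
dense[42] = 3                          # key 42 lives on partition 3

# Sparse or non-integer keys: a hash table is the general default,
# at a cost of tens of bytes per entry rather than one.
sparse = {"alice": 3}
print(len(dense) * dense.itemsize)     # 1000000 bytes for a million keys
```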
STORAGE ALTERNATIVES
Lookup Table Reuse
• Reuse the same lookup table in the router for tables with location dependencies.
• Costs slightly more complex handling of metadata.
Compressed Tables
• Trade CPU time to reduce space; specifically, we used Huffman encoding.
Efficiently store large lookup tables in RAM
Hybrid Partitioning
• Combine the fine-grained partitioning of a lookup table with the space-efficient representation of range or hash partitioning.
• The idea is to place “important” tuples in specific partitions, while treating the remaining tuples with a default policy.
• To derive a hybrid partitioning, we use decision tree classifiers.
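The core data structure behind hybrid partitioning can be sketched as an exceptions table backed by a default policy (the function and variable names here are illustrative, and this sketch omits the decision-tree step that derives the exceptions):

```python
# Hybrid partitioning sketch: explicit placements for "important" tuples,
# a space-efficient default hash policy for everything else.
def hybrid_partition(key, exceptions, n_partitions):
    if key in exceptions:
        return exceptions[key]            # fine-grained lookup-table entry
    return hash(key) % n_partitions       # default policy: no per-key state

exceptions = {101: 0, 205: 0}             # hot tuples pinned to partition 0
print(hybrid_partition(101, exceptions, 8))  # 0: from the exceptions table
```

Memory use then scales with the number of exceptions rather than the number of tuples.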
Partial Lookup Tables
• Trade memory for performance by maintaining only the recently used part of a lookup table.
• Effective if the data is accessed with skew.
• The basic approach is to let each router maintain its own least-recently-used lookup table over part of the data.
• If the id being accessed is not found in the table, the router falls back to a broadcast query, and adds the resulting mapping to its current table.
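The LRU-with-broadcast-fallback behavior above can be sketched with an ordered dictionary (a minimal sketch; the class name and the `broadcast` callback are assumptions of this illustration):

```python
from collections import OrderedDict

class PartialLookupTable:
    """LRU cache over part of a lookup table; misses fall back to broadcast."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()       # key -> partition id

    def route(self, key, broadcast):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as recently used
            return self.entries[key]
        part = broadcast(key)              # query all partitions
        self.entries[key] = part           # learn the mapping
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return part

t = PartialLookupTable(capacity=2)
t.route(1, lambda k: 0); t.route(2, lambda k: 1); t.route(3, lambda k: 0)
print(list(t.entries))  # [2, 3]: key 1 was evicted
```

Under a skewed workload, most requests hit the cached hot keys and only the cold tail pays the broadcast cost.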
Backend nodes run Linux and MySQL; the backend servers are older single-CPU, single-disk systems.
The query router is written in Java and communicates with the backends using MySQL’s protocol via JDBC.
All machines were connected to the same gigabit Ethernet switch; the network was not a bottleneck.
EXPERIMENTAL EVALUATION
We partition the data using both lookup tables and hash/range partitioning.
The dataset includes approximately 1.5 million entries in each of the revision and text tables, and occupies 36 GB of space in MySQL.
We extracted the most common operation: fetch the current version of an article.
EXPERIMENTAL EVALUATION----Wikipedia
Wikipedia
R = select pid from page where title = “world”
Z = select rid, page, text_id from R, revision where revision.page = R.pid and revision.rid = R.latest
select text.tid from text where text.tid = Z.text_id
Query statements
Partition page on title, revision on rid, and text on tid.
The first query is efficient and goes to a single partition: 1 message.
The join must be executed in two steps across all partitions (fetch page by pid, which queries all partitions, then fetch revision where rid = p.latest): k + 1 messages.
Finally, text can be fetched directly from one partition: 1 message.
Alternative 1
The read-only distributed transaction can be committed with another broadcast to all partitions (a distributed transaction that accesses more than one partition must use two-phase commit; the 2PC read-only optimization reduces this to a single round): k messages.
Total: 2k + 3 messages.
Partition page on pid, revision on page , and text on tid.
The first query (by title) goes everywhere: k messages.
The join is pushed down to a single partition: 1 message.
The final query goes to a single partition: 1 message.
With the commit broadcast, this results in a total of 2k + 2 messages.
Alternative 2
Hash or range partition page on title: 1 message.
Build a lookup table on page.pid, and co-locate revisions with their corresponding page by partitioning revision using the lookup table: 2 messages.
Create a lookup table on revision.text_id and partition text such that text.tid = revision.text_id: 1 message.
A total of 4 messages.
Alternative 3
The number of messages required for hash and range partitioning grows linearly with the number of backends, implying that this solution will not scale. Lookup tables enable a constant number of messages for a growing number of backends, and thus better scalability.
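The message counts of the three alternatives, as functions of the number of backends k, make the scaling difference explicit:

```python
# Messages per "fetch current article" operation vs. number of backends k,
# using the totals derived above for each partitioning alternative.
def alt1(k): return 2 * k + 3   # partition page on title, revision on rid
def alt2(k): return 2 * k + 2   # partition page on pid, revision on page
def alt3(k): return 4           # lookup tables: constant

for k in (1, 4, 8, 16):
    print(k, alt1(k), alt2(k), alt3(k))
# Hash/range costs grow linearly with k; lookup tables stay at 4 messages.
```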
The lookup table keys are mostly dense integers (76% to 92% dense), so we use an array implementation of lookup tables.
We reuse lookup tables when there are location dependencies.
• In this case, there is one lookup table shared by page.pid and revision.page, and a second shared by revision.text_id and text.tid.
We can store the 360 million tuples in the complete Wikipedia snapshot in less than 200 MB of memory, which easily fits in RAM.
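A quick back-of-the-envelope check of that figure: 200 MB over 360 million tuples works out to well under one byte per tuple, which is why table reuse and compression matter on top of plain array storage.

```python
# Bytes of router RAM per lookup-table tuple for the Wikipedia snapshot.
tuples = 360_000_000
budget = 200 * 2**20                  # 200 MB
print(budget / tuples)                # roughly 0.58 bytes per tuple
# A plain array needs at least 1 byte per key; sharing one table between
# co-partitioned columns and compressing entries gets below that.
```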
Storage Optimization
The primary benefit of lookup tables is to reduce the number of distributed queries and transactions.
To examine the cost of distributed queries, we:
• Scale the number of backends.
• Increase the percentage of distributed queries.
Cost of Distributed Queries
We measured throughput with 1, 4, and 8 backends. As the percentage of distributed queries increases, the throughput decreases; the reason is that the communication overhead for each query is a significant cost.
We partitioned the data across 1, 2, 4 and 8 backends, with both hash partitioning and lookup tables.
Example-Wikipedia
Shared nothing distributed databases typically only support hash or range partitioning of the data.
Lookup tables can be used with all these systems, in conjunction with their existing support for partitioning.
Related Work
We use lookup tables as a type of secondary index for tables that are accessed via more than one attribute.
Bubba proposed Extended Range Declustering, where a secondary index on the non-partitioned attributes is created and distributed across the database nodes.
Our approach simply stores this secondary data in memory across all query routers, avoiding an additional round trip.
Previous work has argued that hard to partition applications containing many-to-many relationships can be partitioned effectively by allowing tuples to be placed in partitions based on their relationships.
Schism uses graph partitioning algorithms to derive the partitioning, but it does not discuss how to use the fine-grained partitioning it produces.
Using lookup tables, application developers can implement any partitioning scheme they desire, and can also create partition indexes that make it possible to efficiently route queries to just the partitions they need to access.
The paper presented a set of techniques to efficiently store and compress lookup tables, and to manage updates, inserts, and deletes to them.
CONCLUSION
With these applications, we showed that lookup tables with an appropriate partitioning scheme can achieve 40% to 300% better performance than either hash or range partitioning, and show greater potential for further scale-out.
THANK YOU FOR YOUR TIME!