Advanced Non-Relational Schemas For Big Data


by Victor Smirnov

Non-Relational Schema

● Is just a data structure
● That uses some Memory Model
● Typically, a Key->Value mapping
● Where Key is an Integer ID
● And Value is an arbitrary array of limited size, or memory block
● It's assumed that operations on memory blocks are atomic.
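A minimal sketch of this memory model, assuming Python and an in-process dictionary standing in for real storage. `BlockStore`, its methods and the 4096-byte default block size are hypothetical names and values chosen for illustration. The single dictionary assignment in `write` stands in for the atomic block write the model assumes.

```python
from typing import Dict


class BlockStore:
    """Hypothetical memory model: integer IDs mapped to fixed-size blocks."""

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self._blocks: Dict[int, bytes] = {}
        self._next_id = 1

    def allocate(self) -> int:
        """Create a zero-filled block and return its new integer ID."""
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = bytes(self.block_size)
        return block_id

    def read(self, block_id: int) -> bytes:
        return self._blocks[block_id]

    def write(self, block_id: int, data: bytes) -> None:
        """Replace a whole block; a value is limited to block_size bytes."""
        if block_id not in self._blocks:
            raise KeyError(block_id)
        if len(data) > self.block_size:
            raise ValueError("value does not fit into one memory block")
        # One assignment replaces the block as a unit, standing in for the
        # atomic block operation the model assumes.
        self._blocks[block_id] = data.ljust(self.block_size, b"\0")

    def free(self, block_id: int) -> None:
        del self._blocks[block_id]
```

Every structure in the rest of the talk (trees, vectors, version history) is ultimately laid out on top of blocks addressed this way.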

Storage Options

Partial (Prefix) Sums Tree

● Given a sequence S[0, N) = s0, s1, ..., sN-1 of non-negative integers
● Sum(i) returns X = s0 + s1 + ... + si
● FindLT(X) returns the position i of the largest Sum(i) < X
● FindLE(X) is the same, but for Sum(i) <= X
● We can also define range versions Sum(i, j) and FindLT(j, X)
● All operations run in O(log N) time.

Packing Perfect Balanced Tree into an Array
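One common way to pack a perfect binary tree into an array is the heap layout: the root sits at index 1, node k has children 2k and 2k + 1, and the leaves occupy the second half of the array. The sketch below applies that layout to the partial sums operations defined above. It assumes Python, a binary (2-children) tree and N >= 1; `PackedSumTree` and its method names are hypothetical. `prefix_sum(i)` plays the role of Sum(i), and `find_lt`/`find_le` return -1 when no position qualifies.

```python
from typing import List


class PackedSumTree:
    """Perfect binary partial sums tree stored in one flat array."""

    def __init__(self, values: List[int]) -> None:
        assert values and all(v >= 0 for v in values)
        self.n = len(values)
        self.cap = 1
        while self.cap < self.n:  # round capacity up to a power of two
            self.cap *= 2
        # tree[1] is the root, node k has children 2k and 2k + 1,
        # leaves are tree[cap .. cap + n - 1]; padding leaves stay 0.
        self.tree = [0] * (2 * self.cap)
        self.tree[self.cap:self.cap + self.n] = values
        for k in range(self.cap - 1, 0, -1):
            self.tree[k] = self.tree[2 * k] + self.tree[2 * k + 1]

    def update(self, i: int, value: int) -> None:
        """Set s_i and refresh the sums on the leaf-to-root path."""
        node = self.cap + i
        self.tree[node] = value
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2

    def prefix_sum(self, i: int) -> int:
        """Sum(i) = s_0 + ... + s_i."""
        node = self.cap + i
        total = self.tree[node]
        while node > 1:
            if node & 1:  # right child: its left sibling lies entirely before i
                total += self.tree[node - 1]
            node //= 2
        return total

    def _first_reaching(self, x: int, strict: bool) -> int:
        """Smallest j with Sum(j) >= x (or > x if strict); n if none exists."""
        total = self.tree[1]
        if (total <= x) if strict else (total < x):
            return self.n
        node, remaining = 1, x
        while node < self.cap:  # descend one level per step: O(log N)
            left = self.tree[2 * node]
            if (left > remaining) if strict else (left >= remaining):
                node = 2 * node
            else:
                remaining -= left
                node = 2 * node + 1
        return node - self.cap

    def find_lt(self, x: int) -> int:
        """Largest i with Sum(i) < x."""
        return self._first_reaching(x, strict=False) - 1

    def find_le(self, x: int) -> int:
        """Largest i with Sum(i) <= x."""
        return self._first_reaching(x, strict=True) - 1
```

For S = [3, 1, 4, 1, 5] the partial sums are 3, 4, 8, 9, 14, so `find_lt(8)` is 1 and `find_le(8)` is 2. Each operation walks a single root-to-leaf path, which gives O(log N); wider nodes, like the 32-children PackedTree, reduce the number of levels at the cost of more work per node.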

Some Performance Bits

[Figure: PackedTree random read performance, 1 million random reads. X axis: memory block size, Kb (1 to 262144); Y axis: performance, operations/sec (0 to 5e+07). Series: PackedTree<BigInt> with 2 children, PackedTree<BigInt> with 32 children, std::set<BigInt> with 2 children. The L1, L2, L3 and RAM regions are marked on the plot.]

Dynamic Vector

● An ordered sequence of N elements (bytes, integers, strings)
● Access(i) is O(log N)
● Insert(i, value) is O(log N)
● Delete(i) is O(log N)
● We can also define batch operations:
  ● Insert(i, value[])
  ● Delete(i, j)
  ● Split(i); Merge(AnotherVector); ...
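To make the operation list concrete, here is a minimal sketch in Python. It deliberately swaps in a different balanced tree: an implicit-key treap (a binary tree ordered by position, balanced in expectation by random priorities) rather than the block-based tree the slides describe, so bounds are expected O(log N), not worst case. `DynamicVector` and its method names are hypothetical.

```python
import random
from typing import Any, Iterable, Optional, Tuple


class _Node:
    def __init__(self, value: Any) -> None:
        self.value = value
        self.prio = random.random()  # random heap priority: balanced in expectation
        self.size = 1                # number of elements in this subtree
        self.left: Optional["_Node"] = None
        self.right: Optional["_Node"] = None


def _size(t: Optional[_Node]) -> int:
    return t.size if t is not None else 0


def _fix(t: _Node) -> None:
    t.size = 1 + _size(t.left) + _size(t.right)


def _split(t: Optional[_Node], k: int) -> Tuple[Optional[_Node], Optional[_Node]]:
    """Split t into (first k elements, remaining elements)."""
    if t is None:
        return None, None
    if _size(t.left) < k:
        left, right = _split(t.right, k - _size(t.left) - 1)
        t.right = left
        _fix(t)
        return t, right
    left, right = _split(t.left, k)
    t.left = right
    _fix(t)
    return left, t


def _merge(a: Optional[_Node], b: Optional[_Node]) -> Optional[_Node]:
    """Concatenate a and b; every element of a precedes every element of b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:
        a.right = _merge(a.right, b)
        _fix(a)
        return a
    b.left = _merge(a, b.left)
    _fix(b)
    return b


class DynamicVector:
    def __init__(self, values: Iterable[Any] = ()) -> None:
        self.root: Optional[_Node] = None
        for v in values:
            self.root = _merge(self.root, _Node(v))

    def __len__(self) -> int:
        return _size(self.root)

    def access(self, i: int) -> Any:
        t = self.root
        while t is not None:  # subtree sizes steer the descent by position
            left_size = _size(t.left)
            if i < left_size:
                t = t.left
            elif i == left_size:
                return t.value
            else:
                i -= left_size + 1
                t = t.right
        raise IndexError(i)

    def insert(self, i: int, value: Any) -> None:
        left, right = _split(self.root, i)
        self.root = _merge(_merge(left, _Node(value)), right)

    def delete(self, i: int) -> None:
        left, rest = _split(self.root, i)
        _, right = _split(rest, 1)  # drop exactly one element
        self.root = _merge(left, right)

    def split(self, i: int) -> "DynamicVector":
        """Keep elements [0, i) here and return [i, N) as a new vector."""
        left, right = _split(self.root, i)
        self.root = left
        tail = DynamicVector()
        tail.root = right
        return tail

    def merge(self, other: "DynamicVector") -> None:
        """Append all elements of other, leaving other empty."""
        self.root = _merge(self.root, other.root)
        other.root = None
```

Split and Merge fall out naturally here because every other operation is built from them.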

Dynamic Vector

Dynamic Vector Operations

● FindLT(i) over the block sizes returns the block B that contains position i, and the offset j of i inside B
● Access(i) is O(log N)
● Insert(i, value) and Delete(i) are also O(log N), because the tree is balanced.
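The lookup step can be illustrated on its own. This sketch assumes Python and flattens the tree's partial sums of block sizes into a plain list searched with `bisect`; `locate` is a hypothetical helper. Rebuilding the list costs O(number of blocks) per call, whereas in the balanced tree those sums live in internal nodes and the same search costs O(log N).

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def locate(block_sizes: List[int], i: int) -> Tuple[int, int]:
    """Return (block B, offset j) such that element i is element j of block B."""
    prefix = list(accumulate(block_sizes))  # prefix[k] = size_0 + ... + size_k
    total = prefix[-1] if prefix else 0
    if not 0 <= i < total:
        raise IndexError(i)
    b = bisect_right(prefix, i)  # first block whose running total exceeds i
    start = prefix[b - 1] if b > 0 else 0
    return b, i - start
```

With block sizes [4, 0, 3, 5], position 4 maps to block 2, offset 0: the empty block 1 is skipped because its running total equals that of block 0.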

File System: Map<ID, Vector<T>>

● Maps ID to Vector<T>
● Merge all values into one large Dynamic Vector, in ID order
● Create a separate “index” sequence of <ID, Offset> pairs, in ID order
● We can represent this “index” sequence as two partial sums trees, one for ID and one for Offset
● We can merge both trees into one because they have exactly the same structure: a multi-index balanced partial sums tree.
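A flattened sketch of this layout in Python, assuming non-negative integer IDs. `VectorMap` is a hypothetical name, and plain lists with `accumulate`/`bisect` stand in for the balanced tree. Each index row stores two non-negative columns (the ID delta from the previous row and the vector length), so both IDs and offsets are recovered as partial sums over rows of the same shape, which is what lets one multi-index tree hold both.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import Any, Dict, List, Sequence, Tuple


class VectorMap:
    """Map<ID, Vector<T>> as one data sequence plus a two-column index."""

    def __init__(self, vectors: Dict[int, Sequence[Any]]) -> None:
        self.data: List[Any] = []              # all values, concatenated in ID order
        self.index: List[Tuple[int, int]] = []  # rows of (ID delta, vector length)
        prev_id = 0
        for vid in sorted(vectors):
            values = vectors[vid]
            self.index.append((vid - prev_id, len(values)))
            self.data.extend(values)
            prev_id = vid

    def get(self, vid: int) -> List[Any]:
        ids = list(accumulate(d for d, _ in self.index))  # partial sums of column 0
        pos = bisect_left(ids, vid)
        if pos == len(ids) or ids[pos] != vid:
            raise KeyError(vid)
        offset = sum(size for _, size in self.index[:pos])  # partial sum of column 1
        size = self.index[pos][1]
        return self.data[offset:offset + size]
```

For {5: [1, 2], 2: [9], 9: []} the index rows are (2, 1), (3, 2), (4, 0) and the data sequence is [9, 1, 2].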

Map<ID, Vector<T>>

Sharing Tree Structures

● Tree structure sharing saves both space and time: SPMD principle (single program, multiple data)

● We can align partial sum trees with different structures using interpolation (padding with zeroes)

● We can merge the index and data streams of Map<ID, Vector<T>> into one multi-stream tree.

● When merging the trees, we try to keep index pairs and their corresponding data in the same leaf node of the multi-stream tree.
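One possible reading of "padding with zeroes", sketched in Python: two streams keyed on a common axis are aligned onto the union of their keys, and each stream gets a 0 wherever it has no entry. Since adding zeros leaves every partial sum unchanged, the padded columns can share one tree shape. `align` and `prefix_by_key` are hypothetical helpers, and this is an illustration of the principle, not the actual merging procedure.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def align(a: List[Tuple[int, int]], b: List[Tuple[int, int]]
          ) -> Tuple[List[int], List[int], List[int]]:
    """Put two (key, value) streams on one common key sequence, padding with 0."""
    keys = sorted({k for k, _ in a} | {k for k, _ in b})
    da, db = dict(a), dict(b)
    return keys, [da.get(k, 0) for k in keys], [db.get(k, 0) for k in keys]


def prefix_by_key(keys: List[int], values: List[int]) -> Dict[int, int]:
    """Partial sum of a column at every key; zeros do not change these sums."""
    return dict(zip(keys, accumulate(values)))
```

For a = [(1, 5), (4, 2)] and b = [(2, 7), (4, 1)] the common keys are [1, 2, 4], the padded columns are [5, 0, 2] and [0, 7, 1], and the partial sum of a at key 4 is still 7.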

Multistream Tree Node Layout

Multistream Balanced Tree

ACID

● Atomic block operations are not enough
● Even a simple tree update affects several blocks
● So, ACID is mandatory for advanced non-relational schemas
● We can get ACID for free with Multi-Version Concurrency Control (MVCC)
● We need a Version History over data blocks
● Where each transaction is a version.
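A toy sketch of "each transaction is a version", assuming Python. `VersionedStore` and `Transaction` are hypothetical. Every commit appends a new immutable snapshot of logical block ID to contents, so a multi-block update becomes visible all at once or not at all, and readers of an older version are unaffected. Two simplifications are loud assumptions: each commit copies the whole mapping (a real tree would copy only the changed path), and conflict detection is coarse, rejecting any commit whose base version is no longer the latest.

```python
from typing import Dict, List, Optional


class VersionedStore:
    """Toy MVCC store: version v maps logical block ID -> block contents."""

    def __init__(self) -> None:
        self.versions: List[Dict[int, bytes]] = [{}]  # version 0 is empty

    def latest(self) -> int:
        return len(self.versions) - 1

    def read(self, version: int, block_id: int) -> Optional[bytes]:
        return self.versions[version].get(block_id)

    def begin(self) -> "Transaction":
        return Transaction(self, self.latest())


class Transaction:
    def __init__(self, store: VersionedStore, base: int) -> None:
        self.store = store
        self.base = base  # snapshot this transaction reads from
        self.writes: Dict[int, bytes] = {}

    def read(self, block_id: int) -> Optional[bytes]:
        if block_id in self.writes:
            return self.writes[block_id]
        return self.store.read(self.base, block_id)

    def write(self, block_id: int, data: bytes) -> None:
        self.writes[block_id] = data  # buffered until commit

    def commit(self) -> int:
        """Publish all buffered writes as one new version, or fail."""
        if self.store.latest() != self.base:
            raise RuntimeError("conflict: a newer version was committed")
        snapshot = dict(self.store.versions[self.base])
        snapshot.update(self.writes)
        self.store.versions.append(snapshot)
        return self.store.latest()
```

An aborted transaction is simply dropped: its buffered writes never reach any version.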

Transaction History via MVCC

Version History Implementation

● Version History maps a pair <ID, Version> to the ID of the real data block holding that version of block ID

● We have Map<ID, Vector<Version, ID>>
● We can turn it into a Version History by sorting each Vector<Version, ID> (less space, slower)
● Or by creating an additional partial sums tree index on top of it (more space, but much faster)
● We can do it in just one multi-stream balanced tree
● MVCC requires some other data structures, but they can be designed by analogy.
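The "sort each Vector<Version, ID>" option can be sketched in Python as follows. `VersionHistory`, `record` and `resolve` are hypothetical names, and sorted Python lists with `bisect` replace the tree. `resolve` returns the physical block of the newest version not newer than the requested one, which is the snapshot-read rule MVCC needs.

```python
from bisect import bisect_right, insort
from typing import Dict, List, Optional, Tuple


class VersionHistory:
    """Map<ID, Vector<(Version, PhysicalID)>>, each vector kept sorted."""

    def __init__(self) -> None:
        self._history: Dict[int, List[Tuple[int, int]]] = {}

    def record(self, block_id: int, version: int, physical_id: int) -> None:
        insort(self._history.setdefault(block_id, []), (version, physical_id))

    def resolve(self, block_id: int, version: int) -> Optional[int]:
        """Physical block visible for block_id at `version`, or None."""
        entries = self._history.get(block_id, [])
        # Count entries whose version is <= `version`; the last one wins.
        pos = bisect_right(entries, (version, float("inf")))
        return entries[pos - 1][1] if pos else None
```

Each lookup is a binary search inside one ID's vector; the partial sums index variant would replace that search with a tree descent.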

Concurrency Handling

● Version History is a complicated data structure

● Concurrent access to it must be restricted

● Split the whole Version History into shards

● And shard blocks by ID to reduce lock contention on the Version History
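A minimal sketch of ID-based sharding, assuming Python threads. `ShardedVersionHistory` is hypothetical, the shard function `block_id % num_shards` is an arbitrary choice, and each shard holds a simplified per-ID list of versions. Writers touching blocks in different shards take different locks, so they do not contend with each other.

```python
import threading
from typing import Dict, List


class ShardedVersionHistory:
    """Version History split into shards, each guarded by its own lock."""

    def __init__(self, num_shards: int = 16) -> None:
        self._locks = [threading.Lock() for _ in range(num_shards)]
        self._shards: List[Dict[int, List[int]]] = [{} for _ in range(num_shards)]

    def _shard(self, block_id: int) -> int:
        return block_id % len(self._shards)  # assumed shard function

    def add_version(self, block_id: int, version: int) -> None:
        s = self._shard(block_id)
        with self._locks[s]:  # only this shard is locked
            self._shards[s].setdefault(block_id, []).append(version)

    def versions(self, block_id: int) -> List[int]:
        s = self._shard(block_id)
        with self._locks[s]:
            return list(self._shards[s].get(block_id, []))
```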

Distributed Storage and Processing

● MVCC is very Raft/Paxos-friendly

● Because each transaction is already a version in the Version History

● So we can join storage nodes into Raft groups

● And join Raft groups into larger groups with two-phase commit (2PC)

● Using a split/merge model to map data to nodes.
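The 2PC layer on top of the groups can be sketched as a single-process toy in Python. `Participant` stands in for a whole Raft group, and it and `two_phase_commit` are hypothetical names. The sketch shows only the voting rule (commit everywhere or nowhere) and omits everything that makes real 2PC hard: a durable coordinator log, timeouts and recovery after failures.

```python
from typing import Dict


class Participant:
    """Hypothetical stand-in for one Raft group: a local key-value state."""

    def __init__(self, name: str, will_vote_yes: bool = True) -> None:
        self.name = name
        self.will_vote_yes = will_vote_yes  # lets the example force a "no" vote
        self.state: Dict[str, str] = {}
        self.staged: Dict[int, Dict[str, str]] = {}

    def prepare(self, txn_id: int, writes: Dict[str, str]) -> bool:
        """Phase 1: stage the writes and vote."""
        if not self.will_vote_yes:
            return False
        # In the real system this "prepared" record would itself be
        # replicated through the group's consensus log before voting yes.
        self.staged[txn_id] = dict(writes)
        return True

    def commit(self, txn_id: int) -> None:
        self.state.update(self.staged.pop(txn_id))

    def abort(self, txn_id: int) -> None:
        self.staged.pop(txn_id, None)


def two_phase_commit(txn_id: int, writes: Dict[Participant, Dict[str, str]]) -> bool:
    """Commit on every participant or on none of them."""
    prepared = []
    for participant, part_writes in writes.items():
        if participant.prepare(txn_id, part_writes):
            prepared.append(participant)
        else:
            for p in prepared:  # any "no" vote aborts the whole transaction
                p.abort(txn_id)
            return False
    for p in prepared:  # phase 2: everyone voted yes
        p.commit(txn_id)
    return True
```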

Bonus Slides

Searchable Bitmaps

● rank1(n) = number of ones in [0, n)
● select1(i) = position of i-th 1 in the bitmap
● rank0(n) = number of zeroes in [0, n)
● select0(i) = position of i-th 0 in the bitmap
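A common way to make these operations fast is to store a running count of ones per fixed-size block: rank adds a block counter to a short scan, and select binary-searches the counters before scanning one block. The sketch assumes Python, a static bitmap, 64-bit blocks and 1-based i for select; `SearchableBitmap` is a hypothetical name, and the real structure may organize its counters differently.

```python
from bisect import bisect_left
from typing import List


class SearchableBitmap:
    """Static bitmap with rank/select backed by per-block counters."""

    BLOCK = 64  # bits per block; an arbitrary choice for this sketch

    def __init__(self, bits: List[int]) -> None:
        self.bits = bits
        # ones_before[b] = number of ones in bits[0 : b * BLOCK]
        self.ones_before = [0]
        for start in range(0, len(bits), self.BLOCK):
            block_ones = sum(bits[start:start + self.BLOCK])
            self.ones_before.append(self.ones_before[-1] + block_ones)
        # zeros_before[b] = number of zeros in the same prefix
        self.zeros_before = [
            min(b * self.BLOCK, len(bits)) - ones
            for b, ones in enumerate(self.ones_before)
        ]

    def rank1(self, n: int) -> int:
        """Number of ones in [0, n): block counter plus a short scan."""
        b = n // self.BLOCK
        return self.ones_before[b] + sum(self.bits[b * self.BLOCK:n])

    def rank0(self, n: int) -> int:
        return n - self.rank1(n)

    def select1(self, i: int) -> int:
        return self._select(i, 1)

    def select0(self, i: int) -> int:
        return self._select(i, 0)

    def _select(self, i: int, bit: int) -> int:
        """Position of the i-th (1-based) occurrence of `bit`."""
        before = self.ones_before if bit else self.zeros_before
        if not 1 <= i <= before[-1]:
            raise ValueError("the bitmap has fewer than i such bits")
        # Binary search over block counters, then scan inside one block.
        b = bisect_left(before, i) - 1
        seen = before[b]
        for pos in range(b * self.BLOCK, len(self.bits)):
            if self.bits[pos] == bit:
                seen += 1
                if seen == i:
                    return pos
        raise AssertionError("unreachable")
```

With this convention rank1(select1(i) + 1) == i for every valid i.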

Searchable Bitmap: Structure

Searchable Bitmaps: Views

LOUDS Tree

LOUDS Tree: Parent()
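A sketch of LOUDS and its parent operation in Python, under one common convention (assumed here, the slides' own diagram may differ): the bitmap starts with "10" for a virtual super-root, then each node in breadth-first order writes one 1 per child followed by a 0. Nodes are numbered 1, 2, ... in breadth-first order, node j corresponds to the j-th 1, and parent(j) = rank0(select1(j)); a result of 0 means the super-root. rank/select here are naive linear scans to keep the focus on the formula; `build_louds` and the other names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Tuple


def build_louds(children: Dict[Hashable, List[Hashable]],
                root: Hashable) -> Tuple[List[int], List[Hashable]]:
    """Return the LOUDS bits and the BFS order (order[j - 1] is node j)."""
    bits = [1, 0]  # super-root with a single child: the real root
    order: List[Hashable] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        kids = children.get(node, [])
        bits.extend([1] * len(kids) + [0])  # unary-coded child count
        queue.extend(kids)
    return bits, order


def rank0(bits: List[int], p: int) -> int:
    """Number of zeros in [0, p)."""
    return p - sum(bits[:p])


def select1(bits: List[int], j: int) -> int:
    """0-based position of the j-th (1-based) one."""
    seen = 0
    for pos, b in enumerate(bits):
        seen += b
        if b and seen == j:
            return pos
    raise ValueError(j)


def parent(bits: List[int], j: int) -> int:
    """BFS number of node j's parent; 0 means the virtual super-root."""
    return rank0(bits, select1(bits, j))
```

For a root a with children b and c, where b has a child d, the bitmap is 101101000 and parent(4) = 2, i.e. d's parent is b: the zeros before d's 1-bit count how many node descriptions have already ended.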

Wavelet Tree

● Searchable sequence [0, N) for large alphabets
● Rank(i, s) returns the number of symbols s in [0, i)
● Select(k, s) returns the position i of the k-th symbol s
● Insert(i, s), Delete(i), Access(i): insert, remove and access the symbol at position i, respectively
● All these operations have O(log N) time complexity
● By mapping numbers to symbols, we can perform the following lookup operations in O(log N) time: >, >=, <, <=, <>.
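A static wavelet tree sketch in Python, assuming integer symbols and a non-empty input. Each node splits its symbol range [lo, hi] at mid and stores one bit per element (0 goes left, 1 goes right) plus prefix counts of zeros and ones, which act as that node's rank structure. Insert and Delete are omitted: supporting them would require dynamic bitmaps in every node. `WaveletTree` and `count_less` are hypothetical names; `count_less(i, s)` counts symbols < s in [0, i), and the other comparisons follow from it, e.g. <= s is `count_less(i, s + 1)`.

```python
from bisect import bisect_left
from typing import List, Optional


class WaveletTree:
    """Static wavelet tree over integer symbols in [lo, hi]."""

    def __init__(self, seq: List[int], lo: Optional[int] = None,
                 hi: Optional[int] = None) -> None:
        self.lo = min(seq) if lo is None else lo
        self.hi = max(seq) if hi is None else hi
        self.n = len(seq)
        self.left: Optional["WaveletTree"] = None
        self.right: Optional["WaveletTree"] = None
        if self.lo == self.hi or not seq:
            return  # leaf: a single symbol, or an empty range
        mid = (self.lo + self.hi) // 2
        self.bits = [0 if s <= mid else 1 for s in seq]
        # zeros[t] / ones[t] = number of 0 / 1 bits in bits[0:t]
        self.zeros, self.ones = [0], [0]
        for b in self.bits:
            self.zeros.append(self.zeros[-1] + (1 - b))
            self.ones.append(self.ones[-1] + b)
        self.left = WaveletTree([s for s in seq if s <= mid], self.lo, mid)
        self.right = WaveletTree([s for s in seq if s > mid], mid + 1, self.hi)

    def access(self, i: int) -> int:
        node = self
        while node.left is not None:
            if node.bits[i] == 0:
                i, node = node.zeros[i], node.left
            else:
                i, node = node.ones[i], node.right
        return node.lo

    def rank(self, i: int, s: int) -> int:
        """Occurrences of symbol s in [0, i)."""
        if not self.lo <= s <= self.hi:
            return 0
        node = self
        while node.left is not None:
            if s <= (node.lo + node.hi) // 2:
                i, node = node.zeros[i], node.left
            else:
                i, node = node.ones[i], node.right
        return i

    def select(self, k: int, s: int) -> int:
        """Position of the k-th (1-based) occurrence of s."""
        if not self.lo <= s <= self.hi:
            raise ValueError("symbol out of range")
        path, node = [], self
        while node.left is not None:
            went_right = s > (node.lo + node.hi) // 2
            path.append((node, went_right))
            node = node.right if went_right else node.left
        if not 1 <= k <= node.n:
            raise ValueError("fewer than k occurrences")
        pos = k - 1  # position inside the leaf
        for parent, went_right in reversed(path):
            counts = parent.ones if went_right else parent.zeros
            pos = bisect_left(counts, pos + 1) - 1  # (pos + 1)-th matching bit
        return pos

    def count_less(self, i: int, s: int) -> int:
        """Number of symbols < s in [0, i)."""
        if s <= self.lo:
            return 0
        if s > self.hi:
            return i
        node, total = self, 0
        while node.left is not None:
            if s <= (node.lo + node.hi) // 2:
                i, node = node.zeros[i], node.left
            else:
                total += node.zeros[i]  # every left-branch symbol is < s
                i, node = node.ones[i], node.right
        return total
```

Every query walks one root-to-leaf path, so its cost grows with the depth of the tree (the logarithm of the alphabet size) times the cost of one rank or select on a node's bitmap.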

Wavelet Tree: Structure

Wavelet Tree: Rank

Wavelet Tree: Inverted Index

Inverted Index Lookup

Thanks! More details are at:

https://bitbucket.org/vsmirnov/memoria/wiki/MemoriaForBigData

