46
1 FOSS Asia 2010

BlackRay FOSS Asia 2010

  • Upload
    fschupp

  • View
    348

  • Download
    1

Embed Size (px)

DESCRIPTION

Hot off the press from FOSS Asia in Saigon, Vietnam (Nov 12th-14th) the current "State of the Engine" presentation.

Citation preview

Page 1: BlackRay FOSS Asia 2010

1 FOSS Asia 2010

Page 2: BlackRay FOSS Asia 2010

2 FOSS Asia 2010

The State of the Engine

➔ Brief Technology Overview➔ New SQL Parser (lemon/quex)➔ User Defined Functions➔ BlackRay as a storage engine➔ Outlook: Realtime Data Updates

Page 3: BlackRay FOSS Asia 2010

FOSS Asia 2010

Brief BlackRay History

Page 4: BlackRay FOSS Asia 2010

4 FOSS Asia 2010

What is BlackRay?

● BlackRay is a relational, in-memory database● Supports SQL, utilizes PostgreSQL drivers● Fulltext (Tokenized) Search in Text fields● Object-Oriented API Support● Persistence via Files, Transaction support● Scalable and Fault Tolerant● Open Source, Open Community● Available under the GPLv2

Page 5: BlackRay FOSS Asia 2010

5 FOSS Asia 2010

Current Release

Current 0.10.0 – Released December 2009● Complete rewrite of SQL Parser (boost::spirit2)● PostgreSQL client compatibility (via network protocol) to

allow JDBC/ODBC... via PostgreSQL driver● Rewritten CLI tools● Major bugfixes (potential memory leaks)● Better Authentication suppor for Instances

Page 6: BlackRay FOSS Asia 2010

FOSS Asia 2010

Technology Overview

Page 7: BlackRay FOSS Asia 2010

7 FOSS Asia 2010

Why call it Data Engine?

● BlackRay is a hybrid between a relational database and a search engine thus we call it „→ data engine“

● Database features:● Relational structure, with Join between tables● Wildcards and index functions● SQL and JDBC/ODBC

● Search Engine Features● Fulltext retrieval (token index)● Phonetic and similar approximation search● Extremely low latency

Page 8: BlackRay FOSS Asia 2010

8 FOSS Asia 2010

BlackRay Architecture

C++ API

Java API

Management Server

InstanceServer

Data Universe(RAM Resident)

<

Redo Log

Snapshots

SQLInterface

Postgres*Clients

L5: Multi-Values

L4: Multi-Tokens

L5: Multi-Values

L3: Row Index

L2: Postings

L1: Dictionary

5-Perspective Index

Python API

PHP API

Python API

C# API

Page 9: BlackRay FOSS Asia 2010

9 FOSS Asia 2010

Data Universe

● BlackRay features a 5-Perspective Index ● Layer 1: Dictionary● Layer 2: Postings ● Layer 3: Row Index● Layer 4: Multi-Token Layer● Layer 5: Multi-Value Layer

● Layer 1 and 2 comprise a fully inverted Index● Statistics in this Index used for Query Plan Building● All data - index and raw output - are held in memory

Page 10: BlackRay FOSS Asia 2010

10 FOSS Asia 2010

Core BlackRay Features

● Standard loaders enable high performance loading of data into tables

● Persistence is done via file based snapshots● Snapshots enable data versioning and simple

backups● Basic ACID Transaction complianc is implemented

in BlackRay, without crash recovery support.

Page 11: BlackRay FOSS Asia 2010

11 FOSS Asia 2010

Query Interfaces

● BlackRay implements the PostgreSQL server socket interface and binary APIs in Java, C++ and Python

● PostgreSQL compatible drivers can be utilized against BlackRay (JDBC/ODBC)

● Native API enables object oriented data access ● Performance of native APIs currently is substantially

better than SQL via PostgreSQL drivers● Dynamic query building is very efficient with native

APIs

Page 12: BlackRay FOSS Asia 2010

FOSS Asia 2010

A New SQL Parser

Page 13: BlackRay FOSS Asia 2010

13 FOSS Asia 2010

A New SQL Parser - again?

● The 0.10 release included a much improved SQL parser, built with boost::spirit

● Quite solid, fast and simple to use● However, boost deprecates spirit1● boost::spirit2 is not compatible to spirit1, requiring a

rewrite anyways● Our impression: spirit2 requires too many resources

and large grammars result in huge generated files● Also: spirit and C++ templates do not mix well

Page 14: BlackRay FOSS Asia 2010

14 FOSS Asia 2010

What would be a better choice?

● Flex/Bison: ● The obvious choice of MySQL and PostgreSQL● Two-step compile process, generates C not C++● No Unicode support

● ANTLR: ● Odd grammar rules, not optimal for C++● Recursive Descent parsers are not suited for SQL

● Lemon/QUEX● Our new choice ;)

Page 15: BlackRay FOSS Asia 2010

15 FOSS Asia 2010

Lemon/Quex: Our experience

● Lemon:● Lemon is part of SQLite● Much more intuitive syntax than Flex syntax

● Quex:● Generates tokenizers in C++● Unicode and external Parser support● Partially buggy but all issues were fixed witihn days

● Synopsis: Lemon/Quex are like Bison/Flex, just with Unicode and C++ support and maybe easier to debug

Page 16: BlackRay FOSS Asia 2010

16 FOSS Asia 2010

Current Progress

● Basic SQL Features are ported from spirit to Lemon/Quex

● The „issue-77“ branch contains all recent SQL parser code

● Unit-Testing and Database level testing very solid● Will be part of the 0.11 release

Page 17: BlackRay FOSS Asia 2010

17 FOSS Asia 2010

Recent Additions

● Support for simple (single column) User Defined Functions is now complete

● Query portion (no subselect, no aggregate functions) is very stable

● Data Definition Language was added recently● CREATE SCHEMA ● CREATE TABLE● ALTER TABLE ● Index is created dynamically, so no CREATE INDEX

required

Page 18: BlackRay FOSS Asia 2010

FOSS Asia 2010

User Defined Functions

Page 19: BlackRay FOSS Asia 2010

19 FOSS Asia 2010

User Defined Functions

● BlackRay was designed with support for Index functions that operate on data in tables

● Functions pre-compute index results, improving speed and enabling queries that are not possible otherwise

● Functions are called on data load, and also on queries.

● Functions must not maintain state outside of tables of the same instance they operate on.

Page 20: BlackRay FOSS Asia 2010

20 FOSS Asia 2010

A Sample Function

● Using functions in BlackRaySELECT name FROM employee_table WHERE fx_phonetic (name) = 'mike';

● Functions need to be loaded beforehand:CREATE FUNCTION fx_phonetic(varchar, varchar) RETURNS int AS 'DIRECTORY/funcs', 'phonetic' ;

● The function must implement the BlackRay default function signature, which is almost identical to the MySQL and PGSQL signatures

Page 21: BlackRay FOSS Asia 2010

21 FOSS Asia 2010

Current State

● User Defined Function Repository fully implemented● All built-in functions ported to be compatible to User

Defined Functions● SQL support for User Defined Functions under way● Will be part of the 0.11 release

Page 22: BlackRay FOSS Asia 2010

FOSS Asia 2010

Our Adventure: BlackRay As A Storage Engine

Page 23: BlackRay FOSS Asia 2010

23 FOSS Asia 2010

Why even bother?

● In Fall 2009, we embarked on a little adventure to implement BlackRay as a storage engine

● The old Engine had only a minimal SQL interface and we lacked the expertise to build it ourselves

● Plugging into the MySQL ecosystem seemed like a very pleasant choice

● The features of BlackRay would make it a good query cache for large disk tables.....

Page 24: BlackRay FOSS Asia 2010

24 FOSS Asia 2010

Our First Problem

BlackRay does not support a simple table scan.....● It may seem strange, but due to it's design as an in-

memory index, we do not separate table and index● Each column index basically is the data of the column● BlackRay distinguishes select and output columns, both

of which remain in RAM● The index therefore was never designed to be forced

back into a row format, for a simple table walk

Page 25: BlackRay FOSS Asia 2010

25 FOSS Asia 2010

Possible Solution?

So, we can walk the Index instead?● Rather than scanning the table, it is possible to scan the

index instead● This only works for the columns markes „searchable“● Causes nasty errors when trying to select against result-

only columns● In tokenized index columns, getting the data back out

means concatenation with a blank between values – not nice, as tokenizing can follow complex rules

● Requires Refactoring of our Layer 3 (Row-Index)

Page 26: BlackRay FOSS Asia 2010

26 FOSS Asia 2010

Next Issue

Optimizing Queries● The BlackRay Optimizer uses the Layer 1/2 (Inverted

Index) and Layer 4 (Multi-Tokens) Data to chose a Query Path

● In BlackRay „SELECT text FROM t WHERE text LIKE '*pattern1*' AND text LIKE '*pattern2* is extremely efficient as the inverted index has all the data

● Even with OR this is an efficient Query, due to the fact that we can immediately chose the smaller query first and eliminate double matches

Page 27: BlackRay FOSS Asia 2010

27 FOSS Asia 2010

Next Issue

● Optimizing in the Storage Engine Interface?● In BlackRay, the Optimizer uses the AST from the SQL

Parser to figure out what to optimize● Based on a field or single Index level, the number of

matches really are not useful● Without utilizing the Layer2 and Layer4 structures, we

lose performance by several orders of magnitude● Personal Opinion: The MySQL Optimizer really seems to

like table scans, and tricking it with random vs sequential read cost did not do the trick

Page 28: BlackRay FOSS Asia 2010

28 FOSS Asia 2010

Functions in the Index

● Columns can take functions to be used on the data upon indexing, and when select is carried out

● The most common functions are – TOKENIZE – to support multi-token indexes– PHONETIC – match against defined phonetic rules– ALIAS – match a token against words with similar meaning

● Internally these functions could be considered Meta-Columns on the Index

● To be able to chose the proper column, we need to know what function was used in the select

Page 29: BlackRay FOSS Asia 2010

29 FOSS Asia 2010

Functions in the Index

Consider this Query:SELECT text FROM t WHERE fx_phonetic(text) LIKE 'maier%';

● Functions can take more than one parameter, and may be nested

● We could not quite figure out how to explain this to the MySQL Parser

● The function data would need to be available to the Index to chose where to look

Page 30: BlackRay FOSS Asia 2010

30 FOSS Asia 2010

Threading Models...

● BlackRay has a highly optimized Threading Model● In RAM, we do not expect I/O-waits, so a model of

two dedicated Threads per CPU core works really well

● Locking in the Index is built around this model● „One Thread per Conection“ requires at least a

careful review of the way critical data structures are accessed

Page 31: BlackRay FOSS Asia 2010

31 FOSS Asia 2010

.... Our Conclusion

● Currently, BlackRay really does not fit too well into the storage engingine architecture

● Did we lose all hope? Absolutely not.....● BlackRay Applications could really benefit from

being able to utilize MySQL features, including the Archive Engine as well as temporary tables in Heap

● Thanks to the excellent Blog and postings by Brian Aker, which allowed us to not make all beginner mistakes ourselves

Page 32: BlackRay FOSS Asia 2010

FOSS Asia 2010

Outlook: Realtime Update/Insert

Page 33: BlackRay FOSS Asia 2010

33 FOSS Asia 2010

Current Challenges

● Bulk Updates● BlackRay supports Insert and Delete via the Bulk Loader ● Updates are done via Insert & Delete

● Insert/Delete via API● An API exists for Insert/Delete● The Insert/Delete API is separate from the Query API● Both APIs cannot be used in the same Thread

● Insert/Delete via SQL● Currently Insert/Delete are not available via SQL

Page 34: BlackRay FOSS Asia 2010

34 FOSS Asia 2010

Supporting Insert/Delete

● Pull together Insert/Delete and Query APIs● Take out the separate APIs● Unified API will then support transactions

● Enable Insert/Delete via SQL● Extend the SQL Grammar to include INSERT/DELETE● Implement the functions via the unified API

● The Bulk Loader and SQL● Rewrite of the Bulk Loader to utilize the unified API,

rather than SQL

Page 35: BlackRay FOSS Asia 2010

35 FOSS Asia 2010

Performance Impact

● Insert and Delete has a severe performance impact on parallel queries

● Locking needs to be utilized to ensure transactional integrity, causing queries to stall on data modification

● Currently BlackRay uses sorted lists for the data ductionary and the other index layers

● For indeces that have frequent changes, it may be much more desirable to utilize other basic data structures underneath the index

Page 36: BlackRay FOSS Asia 2010

FOSS Asia 2010

Project Roadmap

Page 37: BlackRay FOSS Asia 2010

37 FOSS Asia 2010

Immediate Roadmap

● Planned 0.11.0 – Due in Fall 2010● Pluggable Function architecture (loadable libraries)● Make all index functions available in SQL● Support for Prepared Statements (ODBC/JDBC) ● Improved thread and memory management (Perftools?)

● BlackRay Admin Console (Remora) 0.11● Engine Statistics via GUI● Cluster Node management

Page 38: BlackRay FOSS Asia 2010

38 FOSS Asia 2010

Shortterm Roadmap

● Planned 0.12.0 – Due in February 2012● Realtime INSERT/UPDATE/DELETE● SQL to support subselect● Default aggregate functions (SUM/AVG/....)● Fix several potential memory leaks (smart pointers)

● The 0.12 release should be the last pre-GA release

Page 39: BlackRay FOSS Asia 2010

39 FOSS Asia 2010

Midterm Roadmap

● Scalability Features● Sharding & Partitioning Options● Federated Search

● Fully portable snapshot format (across platforms)● Query Performance Analyzer● Improved Statistics Module with GUI● BlackRay as a Storage Backend for SUN OpenDS

LDAP Engine

Page 40: BlackRay FOSS Asia 2010

40 FOSS Asia 2010

Midterm Roadmap

● Security Features● Improved User and Access Control concepts● SSL for all connections● External User Store (LDAP/OpenSSO/PAM...)

● Increased Platform support● Windows 7 and Windows Server platforms● Embedded platforms

● Other, random features by popular request.

Page 41: BlackRay FOSS Asia 2010

FOSS Asia 2010

The Team behind BlackRay

Page 42: BlackRay FOSS Asia 2010

42 FOSS Asia 2010

SoftMethod GmbH

● SoftMethod GmbH initiated the project in 2005 ● Company was founded in 2004 and currently has

10 employees● Focus of SoftMethod is high performance software

engineering● Product portfolio includes telco/contact center and

LDAP applications● SoftMethod also offers load testing and technical

software quality assurance support.

Page 43: BlackRay FOSS Asia 2010

43 FOSS Asia 2010

Development Team

● Felix Schupp, Initiator and Project Sponsor● Thomas Wunschel, Architect and Lead Developer● Mike Alexeev, Key Contributor (SQL/Functions)● Souvik Roy, Performance Analysis and Tools● Simon Courtenage, C++ and boost expert

Page 44: BlackRay FOSS Asia 2010

FOSS Asia 2010

Wrap-Up

Page 45: BlackRay FOSS Asia 2010

45 FOSS Asia 2010

What to do next

● Get BlackRay:● Register yourself on http://forge.softmethod.de● SVN checkout available at

http://svn.softmethod.de/opensource/blackray/trunk● Get Involved

● Anyone can register and create tickets, news etc● We have an active mailing list for discussion as well

● Contribute● We require a signed Contributor agreement before being

allowed commit access to the repository

Page 46: BlackRay FOSS Asia 2010

46 FOSS Asia 2010

Contact Us

● Website: http://www.blackray.org● Twitter: http://twitter.com/dataengine● Facebook http://facebook.com/dataengine● Mailing List: http://lists.softmethod.de● Download: http://sourceforge.net/projects/blackray

● Felix: [email protected]