Best Practices and Optimization with Infobright John Ringhofer Sales Engineer [email protected]

Infobright Best Practices

DESCRIPTION

Infobright Best Practices May 2012

Page 1: Infobright Best Practices

Best Practices and Optimization with Infobright

John Ringhofer, Sales Engineer
[email protected]

Page 2: Infobright Best Practices

Agenda

• Infobright Architecture Review
• Installation Tips
• Leveraging the Architecture
• Toolset Integration
• Getting Started Down the Right Path
• Q&A

Page 3: Infobright Best Practices

Infobright Architecture Review

Page 4: Infobright Best Practices

Infobright Technology: Key Concepts

1. Column orientation
2. Data packs and Compression
3. Knowledge Grid
4. Granular Engine

Page 5: Infobright Best Practices

1. Column Orientation

Incoming Data

EMP_ID  FNAME  LNAME   SALARY
1       Moe    Howard  10000
2       Curly  Joe     12000
3       Larry  Fine    9000

Column Oriented Layout
(1,2,3; Moe,Curly,Larry; Howard,Joe,Fine; 10000,12000,9000;)

• Works well with aggregate results (sum, count, avg.)
• Only columns that are relevant need to be touched
• Consistent performance with any database design
• Allows for very efficient compression
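The pivot from incoming rows to a column-oriented layout can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not Infobright's storage code; the function name `to_columns` is made up for the example.

```python
# Rows as they arrive (the sample table from the slide).
rows = [
    (1, "Moe", "Howard", 10000),
    (2, "Curly", "Joe", 12000),
    (3, "Larry", "Fine", 9000),
]
column_names = ["EMP_ID", "FNAME", "LNAME", "SALARY"]


def to_columns(rows, column_names):
    """Pivot row tuples into one list of values per column."""
    return {name: list(values) for name, values in zip(column_names, zip(*rows))}


columns = to_columns(rows, column_names)

# An aggregate only has to touch the single column it needs.
total_salary = sum(columns["SALARY"])

print(columns["FNAME"])
print(total_salary)
```

Because each column is stored together, `sum(SALARY)` never reads the name columns, which is the reason the slide recommends touching only the relevant columns.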

Page 6: Infobright Best Practices

2. Data Packs and Compression

Data Packs
• Each data pack contains 65,536 (64K) data values
• Compression is applied to each individual data pack
• The compression algorithm varies depending on data type and distribution

Compression
• Results vary depending on the distribution of data among data packs
• A typical overall compression ratio seen in the field is 10:1
• Some customers have seen results of 40:1 and higher
• For example, 1TB of raw data compressed 10 to 1 would only require 100GB of disk capacity

Patent-pending compression algorithms
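A rough sketch of per-pack compression and the capacity arithmetic follows. It assumes zlib as a stand-in compressor (the product's own algorithms are not shown in the slides), uses made-up sample data, and takes 1TB as 1000GB to match the slide's 100GB figure. The helper names are hypothetical.

```python
import zlib
from array import array

PACK_SIZE = 65_536  # values per data pack, as stated on the slide


def split_into_packs(values, pack_size=PACK_SIZE):
    """Cut one column's values into consecutive fixed-size packs."""
    return [values[i:i + pack_size] for i in range(0, len(values), pack_size)]


def compress_pack(pack):
    """Return (raw_size, compressed_size) in bytes for one pack.

    zlib is only a stand-in; it is not the algorithm Infobright uses.
    """
    raw = array("q", pack).tobytes()  # 8 bytes per integer value
    return len(raw), len(zlib.compress(raw))


def disk_needed_gb(raw_gb, ratio):
    """Disk capacity needed for raw_gb of data at a given compression ratio."""
    return raw_gb / ratio


column = [i % 100 for i in range(200_000)]  # made-up, highly repetitive sample
packs = split_into_packs(column)
sizes = [compress_pack(p) for p in packs]
overall_ratio = sum(r for r, _ in sizes) / sum(c for _, c in sizes)

print(len(packs), "packs, overall ratio about", round(overall_ratio, 1))
print(disk_needed_gb(1000, 10), "GB for 1TB of raw data at 10:1")
```

The ratio printed for the sample data says nothing about real workloads; as the slide notes, results depend on how the data is distributed among the packs.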

Page 7: Infobright Best Practices

3. The Knowledge Grid

[Diagram: the columns of a table (Column A, Column B, …) split into Data Packs DP1 to DP6]

• Knowledge Grid: applies to the whole table
• Knowledge Nodes: built for each Data Pack
• Information about the data: string and character data, numeric data, distributions
• Global knowledge and dynamic knowledge
• Built during LOAD, or built per query (e.g. for aggregates, joins)

Knowledge Nodes:
• answer the query directly, or
• identify only required Data Packs, minimizing decompression, and
• predict required data in advance based on workload
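How per-pack knowledge can answer a query directly or narrow it to a few packs can be sketched as follows. The statistics kept here (minimum, maximum, sum, count) are an assumption made for the example; the slides only say that Knowledge Nodes hold information about the data. `PackStats` and `sum_greater_than` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PackStats:
    minimum: int
    maximum: int
    total: int
    count: int


def build_stats(pack):
    """Summarise one pack; in this sketch that happens once, at load time."""
    return PackStats(min(pack), max(pack), sum(pack), len(pack))


def sum_greater_than(packs, stats, threshold):
    """SUM(value) WHERE value > threshold, opening as few packs as possible.

    Returns (result, number_of_packs_opened).
    """
    total = 0
    opened = 0
    for pack, s in zip(packs, stats):
        if s.maximum <= threshold:
            continue  # no value can qualify: skip the pack entirely
        if s.minimum > threshold:
            total += s.total  # every value qualifies: answer from statistics
            continue
        opened += 1  # mixed pack: the values themselves must be read
        total += sum(v for v in pack if v > threshold)
    return total, opened


packs = [[1, 2, 3], [10, 20, 30], [4, 15, 6]]  # tiny made-up packs
stats = [build_stats(p) for p in packs]
result, opened = sum_greater_than(packs, stats, 5)
print(result, opened)
```

Only the third pack is opened here: the first is ruled out by its maximum and the second is answered from its stored sum.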

Page 8: Infobright Best Practices

4. Granular Engine

[Diagram labels: Report; Rough Set Granular Engine; Infobright Database; Compressed Data; 1%]

Page 9: Infobright Best Practices

Leverage DomainExpert

DomainExpert: Breakthrough Analytics
• Enables Infobright and users to add intelligence into Knowledge Grid directly with no schema changes
• Optimized for web data analysis
    • IP addresses
    • Email addresses
    • URL/URI
• Can cut query time in half when using this data
• Improves compression
• Intelligence to automatically optimize the database

Page 10: Infobright Best Practices

Leverage DomainExpert

Pattern recognition in data enables faster query performance
• Patterns defined and stored
• Complex fields decomposed into more homogeneous parts
• Database uses this information when processing query

IB 4.0 delivered with pre-defined data types common to machine-generated data
• URL
• E-Mail addresses
• IP Addresses

Users can also easily add their own data patterns
• Identify strings, numerics, or constants
• Financial Trading example: ticker feed
    “AAPL-350,354,347,349” encoded “%s-%d,%d,%d,%d”
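The idea of decomposing a complex field into more homogeneous parts can be illustrated with the ticker example. The regular expression below is a hypothetical translation of the "%s-%d,%d,%d,%d" pattern and assumes a plain hyphen separator; it is not DomainExpert's pattern syntax or implementation.

```python
import re

# Hypothetical regex equivalent of the slide's "%s-%d,%d,%d,%d" pattern:
# one string part followed by four integer parts.
TICKER_PATTERN = re.compile(r"^([^-]+)-(\d+),(\d+),(\d+),(\d+)$")


def decompose(value):
    """Split a ticker string into (symbol, [four integers]), or None if it does not match."""
    match = TICKER_PATTERN.match(value)
    if match is None:
        return None
    symbol, *numbers = match.groups()
    return symbol, [int(n) for n in numbers]


print(decompose("AAPL-350,354,347,349"))
print(decompose("not a ticker"))
```

Once split, the symbol can be treated as a short repeating string and the four numbers as integers, which are easier to compress and to summarise than the original mixed string.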

http://www.infobright.com/News-&-Events/Events/

Page 11: Infobright Best Practices

Infobright Architected on MySQL

“The world’s most popular open source database”

Page 12: Infobright Best Practices

Installation Tips

Page 13: Infobright Best Practices

Installation Tips

• Install Directory: Don’t install to Linux /home or Windows /program files
• IEE Evaluation Key (Linux or Windows): Put .lic file in the /path/to/infobright directory. (Eval only)
• IEE Evaluation Key Windows: Put the .license file in the installation directory. (Eval only)
• IEE Evaluation: After starting Infobright for the first time, back up the data/iblicense.dat file.

Note: The memory settings assume that there are no other services on the machine consuming significant memory. If this is not the case, lower the memory settings for Infobright.

Page 14: Infobright Best Practices

Leveraging the Architecture

Page 15: Infobright Best Practices

Leverage Column Orientation

DO
• Create tables with lots of columns
• Only use the columns you need to complete a specific query

Avoid
• Select * from wide tables
• Using views with many columns

Page 16: Infobright Best Practices

Leverage Data Packs – Bulk Loads

Expect transactional insert to be slow
• Uses MySQL API
• When using JDBC, refer to:
    • http://www.infobright.org/Blog/Entry/using_the_mysql_jdbc_driver_with_infobright_for_data_loading/
• Overhead
    • Decompression & Compression

Infobright Bulk Loader
• Optimized for Infobright
• Writes data packs, not rows
• Up to 150GB per hour

Distributed Load Processor
• Optimized for Infobright
• Multiple servers creating data packs
• Has loaded 10TB per hour into a SINGLE table
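The throughput figures above can be turned into a rough load-window estimate. This is back-of-the-envelope arithmetic only: both rates are the best-case numbers quoted on the slide, 1TB is taken as 1000GB, and `load_hours` is a made-up helper.

```python
BULK_LOADER_GB_PER_HOUR = 150   # "up to" figure quoted for the Infobright Bulk Loader
DLP_GB_PER_HOUR = 10_000        # 10TB per hour quoted for the Distributed Load Processor


def load_hours(data_gb, rate_gb_per_hour):
    """Hours needed to load data_gb at a sustained rate."""
    return data_gb / rate_gb_per_hour


bulk_hours = load_hours(1_000, BULK_LOADER_GB_PER_HOUR)
dlp_hours = load_hours(1_000, DLP_GB_PER_HOUR)

print(round(bulk_hours, 2), "hours for 1TB with the bulk loader")
print(round(dlp_hours, 2), "hours for 1TB with DLP")
```

Real load times depend on hardware and data, so these numbers are a lower bound for planning rather than a prediction.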

Page 17: Infobright Best Practices

Leverage Compression – Backups/Restore

Compressed data = faster backup
• Avoid MySQLDump
• Use compressed database files

Backup Procedure
• Entire Infobright Directory
• Full Backup (e.g. Weekly)
• Regular Incremental Backups (e.g. Daily)

Restore Procedure
• Copy backup image to Infobright directory (e.g. data image to data directory)

Page 18: Infobright Best Practices

Leverage Fault Tolerance – High Availability

Block level replication
• Active/Passive
• Moves compressed data image
• DRBD
• Use DLP, Infobright loader, or MySQL loader
• Use IEE or ICE

Page 19: Infobright Best Practices

Leverage Fault Tolerance – High Availability

Use DLP to Load the Same Data to Multiple Locations Simultaneously (IEE only)

• Active/Active
• Very fast data loading
• Can be scaled out to support as many servers as needed
• Servers are highly-available to process queries since the data packs are created remotely
• No time lag as with asynchronous replication; data is immediately available to be queried.

Page 20: Infobright Best Practices

Leverage the Knowledge Grid

• Do constrain the fact table directly
• Do use sub-selects instead of joins
• Do add additional columns to create useful knowledge nodes
• Do remove references to indexes and other constraints (PK, FK)
• Do remove aggregate, reporting and summary tables per use case.

Everyone wants to be a Star

Adding as many WHERE conditions as you can to your SQL increases the chance that knowledge grid statistics can be used to speed up your queries.

Page 21: Infobright Best Practices

Leverage the Knowledge Grid – Query Example

Original SQL

    select sum(dlr_trans_amt), a.msa_id
    from fact_sales a, dim_dates b, dim_msa c
    where a.trans_date=b.trans_date
      and a.msa_id=c.msa_id
      and b.trans_year=2006
      and b.trans_month='MARCH'
      and c.msa_name in ('BIRMINGHAMHOOVER', 'NAPLESMARCO ISLAND', 'CHAMPAIGNURBANA')
    group by a.msa_id;

    3 rows in set (3 min 11.65 sec)

Becomes

    select sum(dlr_trans_amt), msa_id
    from fact_sales a
    where trans_date in (select trans_date from dim_dates b
                         where b.trans_year=2006 and b.trans_month='MARCH')
      and msa_id in (select msa_id from dim_msa
                     where msa_name in ('BIRMINGHAMHOOVER', 'NAPLESMARCO ISLAND', 'CHAMPAIGNURBANA'))
    group by msa_id;

    3 rows in set (21.28 sec)
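Before swapping a join for sub-selects it is worth confirming that both forms return the same rows. The sketch below does that with Python's built-in sqlite3 module on a handful of made-up rows; SQLite is only a stand-in to check the logic and says nothing about Infobright timings. An ORDER BY is added so the two results compare deterministically, and the dimension keys are assumed to be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_dates (trans_date TEXT, trans_year INTEGER, trans_month TEXT);
CREATE TABLE dim_msa (msa_id INTEGER, msa_name TEXT);
CREATE TABLE fact_sales (trans_date TEXT, msa_id INTEGER, dlr_trans_amt REAL);
-- Made-up sample rows; dimension keys are unique.
INSERT INTO dim_dates VALUES ('2006-03-01', 2006, 'MARCH'), ('2006-04-01', 2006, 'APRIL');
INSERT INTO dim_msa VALUES (1, 'BIRMINGHAMHOOVER'), (2, 'NAPLESMARCO ISLAND'), (3, 'OTHER');
INSERT INTO fact_sales VALUES
    ('2006-03-01', 1, 10.0), ('2006-03-01', 1, 5.0), ('2006-03-01', 2, 7.0),
    ('2006-03-01', 3, 99.0), ('2006-04-01', 1, 50.0);
""")

JOIN_SQL = """
SELECT sum(dlr_trans_amt), a.msa_id
FROM fact_sales a, dim_dates b, dim_msa c
WHERE a.trans_date = b.trans_date AND a.msa_id = c.msa_id
  AND b.trans_year = 2006 AND b.trans_month = 'MARCH'
  AND c.msa_name IN ('BIRMINGHAMHOOVER', 'NAPLESMARCO ISLAND', 'CHAMPAIGNURBANA')
GROUP BY a.msa_id ORDER BY a.msa_id
"""

SUBSELECT_SQL = """
SELECT sum(dlr_trans_amt), msa_id
FROM fact_sales a
WHERE trans_date IN (SELECT trans_date FROM dim_dates b
                     WHERE b.trans_year = 2006 AND b.trans_month = 'MARCH')
  AND msa_id IN (SELECT msa_id FROM dim_msa
                 WHERE msa_name IN ('BIRMINGHAMHOOVER', 'NAPLESMARCO ISLAND', 'CHAMPAIGNURBANA'))
GROUP BY msa_id ORDER BY msa_id
"""

join_rows = conn.execute(JOIN_SQL).fetchall()
subselect_rows = conn.execute(SUBSELECT_SQL).fetchall()
print(join_rows == subselect_rows, subselect_rows)
```

If a dimension table held duplicate keys, the join would count fact rows more than once while the sub-select would not, so the rewrite is only safe when those keys are unique.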

Page 22: Infobright Best Practices

Leverage the Knowledge Grid - DML

Delete, Truncate and Update Operations
• Query performance can suffer
• Update mimics Delete/Insert
• Drop/Create Alternative
    • Speed and Simplicity
    • Works in ICE as well as IEE
    • Knowledge Nodes permanently deleted

Infobright Reorg (defragment)
• Reload data
• Knowledge Grid refreshes
• http://www.infobright.org/Downloads/Contributed-Software/

Page 23: Infobright Best Practices

Leverage the Granular Engine – Optimizing Queries

Original:

    SELECT t1.a, sum(t2.b)
    FROM t1 JOIN t2 ON t1.key=t2.key
    WHERE t1.x > 0 AND t2.y = 5
    GROUP BY t1.a;

Modified:

    SELECT t1copy.a, sum(temp_tab.sum2)
    FROM ( SELECT t2.key AS k2, sum(t2.b) AS sum2
           FROM t1 JOIN t2 ON t1.key=t2.key
           WHERE t1.x > 0 AND t2.y = 5
           GROUP BY t2.key ) temp_tab, t1 t1copy
    WHERE temp_tab.k2 = t1copy.key
    GROUP BY t1copy.a;

Union All
• Faster than just using “Union”

Leveraging Columnar Architecture
• Only select necessary columns (avoid “select *”)
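The original and modified queries can be checked for equivalence on a small data set. The sketch below uses Python's sqlite3 module as a stand-in engine with made-up rows; the join column is renamed from `key` to `k` for the example, and `t1.k` is declared unique, which the rewrite appears to rely on. An ORDER BY is added only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (k INTEGER PRIMARY KEY, a TEXT, x INTEGER);
CREATE TABLE t2 (k INTEGER, b INTEGER, y INTEGER);
-- Made-up sample rows.
INSERT INTO t1 VALUES (1, 'red', 1), (2, 'red', 3), (3, 'blue', 2), (4, 'blue', 0);
INSERT INTO t2 VALUES (1, 10, 5), (1, 20, 5), (2, 5, 5), (3, 7, 5), (3, 100, 9), (4, 50, 5);
""")

ORIGINAL = """
SELECT t1.a, sum(t2.b)
FROM t1 JOIN t2 ON t1.k = t2.k
WHERE t1.x > 0 AND t2.y = 5
GROUP BY t1.a ORDER BY t1.a
"""

# Aggregate on the integer join key first, then map keys to the grouping column.
MODIFIED = """
SELECT t1copy.a, sum(temp_tab.sum2)
FROM (SELECT t2.k AS k2, sum(t2.b) AS sum2
      FROM t1 JOIN t2 ON t1.k = t2.k
      WHERE t1.x > 0 AND t2.y = 5
      GROUP BY t2.k) temp_tab, t1 t1copy
WHERE temp_tab.k2 = t1copy.k
GROUP BY t1copy.a ORDER BY t1copy.a
"""

original_rows = conn.execute(ORIGINAL).fetchall()
modified_rows = conn.execute(MODIFIED).fetchall()
print(original_rows == modified_rows, modified_rows)
```

With duplicate values in `t1.k` the inner and outer joins would both multiply rows, so the two forms should be compared on real data before adopting the rewrite.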

Page 24: Infobright Best Practices

Leverage the Granular Engine – Data Types

Integers perform best
• Join columns
• Surrogate keys

Character best practice
• Sub-selects with surrogate keys
• Column option ‘lookup’
    • http://www.infobright.org/wiki/How_and_When_to_use_Lookups/
• Chksum columns on large strings
• Binary collations

Page 25: Infobright Best Practices

Leverage the Granular Engine – Table Definitions

Original DDL

    Create Table Customer(
      Customer_Key varchar(10),
      Customer_Name varchar(50),
      Customer_Address varchar(300),
      Category varchar(10));

Becomes

    Create Table Customer(
      Customer_Key integer,
      Customer_Name varchar(50),
      Customer_Address varchar(300),
      Category varchar(10) comment 'lookup',
      Customer_Name_MD5 bigint,
      Customer_Address_MD5 bigint);

Original Query

    SELECT ... FROM table WHERE str='value'

Becomes…

    SELECT ... FROM table WHERE str='value' AND cksum=cksum(str)
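The slides do not say how the bigint checksum columns are populated. One possible approach, shown purely as an assumption, is to take the first 8 bytes of the string's MD5 digest as a signed 64-bit integer during ETL and to compute the same value for the search string at query time. `md5_bigint` is a hypothetical helper.

```python
import hashlib


def md5_bigint(text):
    """First 8 bytes of the MD5 digest, read as a signed 64-bit integer.

    The result fits a BIGINT column such as Customer_Name_MD5.
    """
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


# At load time: store the checksum next to the string.
name = "Moe Howard"
row = (name, md5_bigint(name))

# At query time: filter on both the string and its integer checksum.
search = "Moe Howard"
params = (search, md5_bigint(search))

print(row == params)
```

The string comparison stays in the query because different strings can share a checksum; the integer condition is there to give the engine a numeric column to narrow the search with.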

Page 26: Infobright Best Practices

Toolset Integration

Page 27: Infobright Best Practices

Bear in Mind

The unique attributes of Infobright are transparent to developers.

The benefits are obvious and immediate to users.
• Infobright is a relational database
• Infobright observes and obeys SQL standards
• Infobright observes and obeys standards-based connectivity
    • Design tools
    • Development tools
    • Administrative tools
    • Query and reporting tools

Page 28: Infobright Best Practices

Infobright Development

When developing applications, you can use:
• Industry standard interfaces including those listed below;
• Comprehensive Management Services and Utilities;
• Robust connectivity with BI Tools.

• Connector/ODBC
• Connector/NET
• Connector/J
• Connector/MXJ
• Connector/C++
• Connector/C

• C API
• PHP API
• Perl API
• C++ API
• Python API
• Ruby APIs

Note: API calls are restricted to the functional support of the Brighthouse engine. (e.g. mysql_stmt_insert_id )

Page 29: Infobright Best Practices

Popular Database Tools

• MySQL Workbench
• Toad for MySQL on Windows
• Navicat for MySQL on Mac
• phpMyAdmin

Points to Remember
• Default port = 5029
• No Explain Plan

Page 30: Infobright Best Practices

Infobright Technology Partners

BI tools
• MicroStrategy
• Jaspersoft
• Pentaho
• BIRT

ETL Tools
• Talend (aka Jasper ETL)
• Pentaho Data Integration

Page 31: Infobright Best Practices

Getting Started Down the Right Path

Page 32: Infobright Best Practices

Plan your Evaluation

Before starting your evaluation, define a concise set of target objectives and requirements

Remember that Infobright shows value with medium to large data sizes (>100GB and growing on up)
• Don’t undersize the evaluation

A planning document is available that can help with the evaluation exercise

Page 33: Infobright Best Practices

Migration Tools – ICE Breakers

The community has contributed code called ICE Breakers that can make data migration easier

Industry tools can be used but will require manual intervention

Page 34: Infobright Best Practices

Additional Resources

Both infobright.com and infobright.org have additional documentation and white papers

Ask a question on the Infobright forum

If you are new to MySQL you can also visit www.mysql.com for additional help

You can also visit the Infobright YouTube channel

Page 35: Infobright Best Practices

Integrated VMs with Partner Solutions

Infobright Community Edition with Pentaho: http://www.infobright.org/Downloads/Pentaho_ICE_VM/

Infobright Community Edition with Talend: http://www.infobright.org/Downloads/Talend_VM/

Infobright Community Edition with Jaspersoft: http://www.infobright.org/Downloads/Jaspersoft_ICE_VM/

Infobright Community Edition with Jaspersoft AND Talend: http://www.infobright.org/Downloads/Talend_VM/

Page 36: Infobright Best Practices

Questions?

For the open community
• ICE Quick Start (http://www.infobright.org/wiki)
• ICE FAQ (http://www.infobright.org/Resources/FAQ/)
• ICE Data Loading Guide (http://www.infobright.org/wiki/Data_Loading/)
• MySQL Online Tutorial (http://dev.mysql.com/doc/refman/5.1/en/tutorial.html)

For Licensed or Evaluation Customers
• IEE Quick Start and Knowledgebase (http://www.infobright.com/Wiki/IEE_Wiki)
• Screencasts (http://support.infobright.com/Training-Screencasts/)

Please check out our forums at (http://www.infobright.org/Forums)
