Infobright Best Practices May 2012
Best Practices and Optimization with Infobright
John Ringhofer, Sales
Agenda
• Infobright Architecture Review
• Installation Tips
• Leveraging the Architecture
• Toolset Integration
• Getting Started Down the Right Path
• Q&A
Infobright Architecture Review
Infobright Technology: Key Concepts
1. Column Orientation
2. Data Packs and Compression
3. Knowledge Grid
4. Granular Engine
1. Column Orientation
Incoming Data
EMP_ID  FNAME  LNAME   SALARY
1       Moe    Howard  10000
2       Curly  Joe     12000
3       Larry  Fine    9000

Column Oriented Layout
(1,2,3; Moe,Curly,Larry; Howard,Joe,Fine; 10000,12000,9000;)

• Works well with aggregate results (sum, count, avg)
• Only columns that are relevant need to be touched
• Consistent performance with any database design
• Allows for very efficient compression
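The row-to-column layout on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Infobright's storage code; the names `rows`, `column_names`, and `columns` are made up for the example, and the values come from the slide's table.

```python
# Rows as they arrive (values taken from the slide's example table).
rows = [
    (1, "Moe", "Howard", 10000),
    (2, "Curly", "Joe", 12000),
    (3, "Larry", "Fine", 9000),
]
column_names = ("EMP_ID", "FNAME", "LNAME", "SALARY")

# Transpose the row tuples so that each column's values sit together.
columns = {name: list(values) for name, values in zip(column_names, zip(*rows))}

# An aggregate such as avg(SALARY) now reads one list, not every field of every row.
average_salary = sum(columns["SALARY"]) / len(columns["SALARY"])
print(columns["SALARY"], average_salary)
```

Because each list holds values of a single type and meaning, it is also the natural unit to compress.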
2. Data Packs and Compression
[Figure: incoming column data divided into data packs of 64K values each]
Data Packs
• Each data pack contains 65,536 data values
• Compression is applied to each individual data pack
• The compression algorithm varies depending on data type and distribution
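As a rough sketch of the data pack idea, the Python below cuts one column into packs of 65,536 values and compresses each pack on its own. Note the assumptions: `zlib` is only a stand-in for Infobright's own algorithms, which the slides do not describe, and the column contents and function names are hypothetical.

```python
import zlib
from array import array

PACK_SIZE = 65_536  # values per data pack, as stated on the slide


def split_into_packs(values, pack_size=PACK_SIZE):
    """Cut one column's values into consecutive packs of at most pack_size values."""
    return [values[i:i + pack_size] for i in range(0, len(values), pack_size)]


def compress_pack(pack):
    """Compress one pack on its own. zlib is a stand-in, not Infobright's algorithm."""
    return zlib.compress(array("q", pack).tobytes())


# Hypothetical column: 200,000 integers with few distinct values.
column = [i % 100 for i in range(200_000)]
packs = split_into_packs(column)
compressed = [compress_pack(p) for p in packs]

for number, (pack, blob) in enumerate(zip(packs, compressed), start=1):
    print(f"pack {number}: {len(pack)} values, {len(pack) * 8} -> {len(blob)} bytes")
```

Compressing pack by pack means a query only has to decompress the packs it actually needs.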
Compression
• Results vary depending on the distribution of data among data packs
• A typical overall compression ratio seen in the field is 10:1
• Some customers have seen results of 40:1 and higher
• For example, 1TB of raw data compressed 10 to 1 would only require 100GB of disk capacity

Patent-Pending Compression Algorithms
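The capacity arithmetic behind the 10:1 example is a single division. The helper below is a hypothetical name, and it takes 1 TB as 1,000 GB to match the slide's round numbers.

```python
def compressed_size_gb(raw_gb, ratio):
    """Disk needed for raw_gb of raw data at a ratio-to-1 compression ratio."""
    return raw_gb / ratio


# The slide's example: 1 TB (taken here as 1,000 GB) at 10:1.
print(compressed_size_gb(1000, 10))   # 100.0
# The same terabyte at the 40:1 that some customers report.
print(compressed_size_gb(1000, 40))   # 25.0
```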
3. The Knowledge Grid
[Figure: Column A, Column B, … each divided into data packs DP1 to DP6, with a Knowledge Node attached to each pack]

• Knowledge Grid applies to the whole table
• Knowledge Nodes are built for each Data Pack
• Information about the data
  • String and character data
  • Numeric data
  • Distributions
• Global knowledge
  • Built during LOAD
• Dynamic knowledge
  • Built per query, e.g. for aggregates, joins
Knowledge Nodes
• Answer the query directly, or
• Identify only required Data Packs, minimizing decompression, and
• Predict required data in advance based on workload
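One way to picture how per-pack metadata can answer a query directly or narrow it to the required packs is the sketch below. It assumes each node stores only a minimum, maximum, and count for its pack; the slides do not spell out the node contents, so `PackNode` and both functions are illustrative, not Infobright's implementation.

```python
from dataclasses import dataclass


@dataclass
class PackNode:
    """Hypothetical knowledge node: summary statistics for one data pack."""
    minimum: int
    maximum: int
    count: int


def build_nodes(packs):
    """Summarise each pack once, as would happen at load time."""
    return [PackNode(min(p), max(p), len(p)) for p in packs]


def count_greater_than(packs, nodes, threshold):
    """Count values > threshold, opening only packs the nodes cannot decide."""
    total, opened = 0, 0
    for pack, node in zip(packs, nodes):
        if node.maximum <= threshold:
            continue                      # no value can match: skip the pack
        if node.minimum > threshold:
            total += node.count           # every value matches: answered from the node
            continue
        opened += 1                       # undecided: the pack must be read
        total += sum(1 for v in pack if v > threshold)
    return total, opened


# Tiny packs of 4 values each, purely for illustration.
packs = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
nodes = build_nodes(packs)
print(count_greater_than(packs, nodes, 6))   # (6, 1)
```

Here only one of the three packs has to be read; the other two are settled from the node statistics alone.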
4. Granular Engine
[Figure: Infobright Database. Labels: Report; Rough Set Granular Engine; 1%; Compressed Data]
Leverage DomainExpert
DomainExpert: Breakthrough Analytics
• Enables Infobright and users to add intelligence into the Knowledge Grid directly, with no schema changes
• Optimized for web data analysis
  • IP addresses
  • Email addresses
  • URL/URI
• Can cut query time in half when using this data
• Improves compression
DomainExpert
• Intelligence to automatically optimize the database
Leverage DomainExpert
• Pattern recognition in data enables faster query performance
  • Patterns defined and stored
  • Complex fields decomposed into more homogeneous parts
  • Database uses this information when processing a query
• IB 4.0 delivered with pre-defined data types common to machine-generated data
  • URL
  • E-mail addresses
  • IP addresses
• Users can also easily add their own data patterns
  • Identify strings, numerics, or constants
  • Financial trading example: ticker feed
    "AAPL-350,354,347,349" encoded as "%s-%d,%d,%d,%d"
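To make the ticker-feed decomposition concrete, here is a small Python sketch that splits a value shaped like the slide's example into one string part and four integer parts. It uses a regular expression as an assumed equivalent of the "%s-%d,%d,%d,%d" pattern; it does not show DomainExpert's actual rule syntax, and `decompose` is a hypothetical helper.

```python
import re

# The slide's pattern "%s-%d,%d,%d,%d" written as a regular expression:
# one string part followed by four integer parts.
TICKER_PATTERN = re.compile(r"^([^-]+)-(\d+),(\d+),(\d+),(\d+)$")


def decompose(ticker_line):
    """Split a ticker string into its symbol and four integer fields."""
    match = TICKER_PATTERN.match(ticker_line)
    if match is None:
        raise ValueError(f"line does not fit the pattern: {ticker_line!r}")
    symbol, *numbers = match.groups()
    return symbol, [int(n) for n in numbers]


print(decompose("AAPL-350,354,347,349"))   # ('AAPL', [350, 354, 347, 349])
```

Once split this way, the symbol and the numbers form more homogeneous groups of values than the original mixed string.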
http://www.infobright.com/News-&-Events/Events/
Infobright Architected on MySQL
“The world’s most popular open source database”
Installation Tips
• Install Directory: Don't install to /home on Linux or Program Files on Windows
• IEE Evaluation Key (Linux or Windows): Put the .lic file in the /path/to/infobright directory. (Eval only)
• IEE Evaluation Key Windows: Put the .license file in the installation directory. (Eval only)
• IEE Evaluation: After starting Infobright for the first time, back up the data/iblicense.dat file.
Note: The memory settings assume that there are no other services on the machine consuming significant memory. If this is not the case, lower the memory settings for Infobright.
Leveraging the Architecture
Leverage Column Orientation
Do
• Create tables with lots of columns
• Only use the columns you need to complete a specific query
Avoid
• Select * from wide tables
• Using views with many columns
Leverage Data Packs – Bulk Loads
Expect transactional insert to be slow
• Uses MySQL API
• When using JDBC, refer to:
  • http://www.infobright.org/Blog/Entry/using_the_mysql_jdbc_driver_with_infobright_for_data_loading/
• Overhead
• Decompression & Compression
Infobright Bulk Loader
• Optimized for Infobright
• Writes data packs, not rows
• Up to 150GB per hour
Distributed Load Processor
• Optimized for Infobright
• Multiple servers creating data packs
• Has loaded 10TB per hour into a SINGLE table
Leverage Compression – Backups/Restore
Compressed data = faster backup
• Avoid MySQLDump
• Use compressed database files
Backup Procedure
• Entire Infobright Directory
• Full Backup (e.g. Weekly)
• Regular Incremental Backups (e.g. Daily)
Restore Procedure
• Copy backup image to Infobright directory (e.g. data image to data directory)
Leverage Fault Tolerance – High Availability
Block level replication
• Active/Passive
• Moves compressed data image
• DRBD
• Use DLP, Infobright loader, or MySQL loader
• Use IEE or ICE
Leverage Fault Tolerance – High Availability
Use DLP to Load the Same Data to Multiple Locations Simultaneously (IEE only)
• Active/Active
• Very fast data loading
• Can be scaled out to support as many servers as needed
• Servers are highly available to process queries since the data packs are created remotely
• No time lag as with asynchronous replication; data is immediately available to be queried.
Leverage the Knowledge Grid
• Do constrain the fact table directly
• Do use sub-selects instead of joins
• Do add additional columns to create useful knowledge nodes
• Do remove references to indexes and other constraints (PK, FK)
• Do remove aggregate, reporting and summary tables per use case.
Everyone wants to be a Star
Adding as many WHERE conditions as you can to your SQL increases the chance that knowledge grid statistics can be used to speed up your queries.
Leverage the Knowledge Grid – Query Example
Original SQL
select sum(dlr_trans_amt), a.msa_id
from fact_sales a, dim_dates b, dim_msa c
where a.trans_date=b.trans_date
  and a.msa_id=c.msa_id
  and b.trans_year=2006 and b.trans_month='MARCH'
  and c.msa_name in ('BIRMINGHAMHOOVER', 'NAPLESMARCO ISLAND', 'CHAMPAIGNURBANA')
group by a.msa_id;
3 rows in set (3 min 11.65 sec)

Becomes
select sum(dlr_trans_amt), msa_id
from fact_sales a
where trans_date in (select trans_date from dim_dates b
                     where b.trans_year=2006 and b.trans_month='MARCH')
  and msa_id in (select msa_id from dim_msa
                 where msa_name in ('BIRMINGHAMHOOVER', 'NAPLESMARCO ISLAND', 'CHAMPAIGNURBANA'))
group by msa_id;
3 rows in set (21.28 sec)
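The join and sub-select forms of this query can be checked for equal results on toy data. The Python sketch below uses SQLite purely as a stand-in: it says nothing about Infobright's timings, the table contents are invented, an ORDER BY is added so the two result lists compare directly, and the equivalence relies on each dimension key appearing once in its dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_dates (trans_date TEXT, trans_year INTEGER, trans_month TEXT);
    CREATE TABLE dim_msa (msa_id INTEGER, msa_name TEXT);
    CREATE TABLE fact_sales (trans_date TEXT, msa_id INTEGER, dlr_trans_amt REAL);
    INSERT INTO dim_dates VALUES ('2006-03-01', 2006, 'MARCH'), ('2006-04-01', 2006, 'APRIL');
    INSERT INTO dim_msa VALUES (1, 'BIRMINGHAMHOOVER'), (2, 'NAPLESMARCO ISLAND'), (3, 'OTHER');
    INSERT INTO fact_sales VALUES
        ('2006-03-01', 1, 10.0), ('2006-03-01', 1, 5.0),
        ('2006-03-01', 2, 7.0), ('2006-03-01', 3, 99.0),
        ('2006-04-01', 1, 50.0);
""")

# Join form: the fact table is joined to both dimensions.
join_sql = """
    SELECT sum(dlr_trans_amt), a.msa_id
    FROM fact_sales a, dim_dates b, dim_msa c
    WHERE a.trans_date = b.trans_date AND a.msa_id = c.msa_id
      AND b.trans_year = 2006 AND b.trans_month = 'MARCH'
      AND c.msa_name IN ('BIRMINGHAMHOOVER', 'NAPLESMARCO ISLAND', 'CHAMPAIGNURBANA')
    GROUP BY a.msa_id ORDER BY a.msa_id
"""
# Sub-select form: the fact table is constrained directly.
subselect_sql = """
    SELECT sum(dlr_trans_amt), msa_id
    FROM fact_sales a
    WHERE trans_date IN (SELECT trans_date FROM dim_dates b
                         WHERE b.trans_year = 2006 AND b.trans_month = 'MARCH')
      AND msa_id IN (SELECT msa_id FROM dim_msa
                     WHERE msa_name IN ('BIRMINGHAMHOOVER', 'NAPLESMARCO ISLAND', 'CHAMPAIGNURBANA'))
    GROUP BY msa_id ORDER BY msa_id
"""
join_rows = conn.execute(join_sql).fetchall()
subselect_rows = conn.execute(subselect_sql).fetchall()
print(join_rows, subselect_rows)
```

Running a comparison like this on a sample of real data is a cheap safeguard before replacing joins with sub-selects in production queries.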
Leverage the Knowledge Grid - DML
Delete, Truncate and Update Operations
• Query performance can suffer
• Updates mimic Delete/Insert
• Drop/Create Alternative
  • Speed and Simplicity
  • Works in ICE as well as IEE
  • Knowledge Nodes permanently deleted
Infobright Reorg (defragment)
• Reload data
• Knowledge Grid refreshes
• http://www.infobright.org/Downloads/Contributed-Software/
Leverage the Granular Engine – Optimizing Queries
Original:
SELECT t1.a, sum(t2.b)
FROM t1 JOIN t2 ON t1.key=t2.key
WHERE t1.x > 0 AND t2.y = 5
GROUP BY t1.a;

Modified:
SELECT t1copy.a, sum(temp_tab.sum2)
FROM ( SELECT t2.key AS k2, sum(t2.b) AS sum2
       FROM t1 JOIN t2 ON t1.key=t2.key
       WHERE t1.x > 0 AND t2.y = 5
       GROUP BY t2.key ) temp_tab, t1 t1copy
WHERE temp_tab.k2 = t1copy.key
GROUP BY t1copy.a;

Union All
• Faster than just using "Union"
Leveraging Columnar Architecture
• Only select necessary columns (avoid "select *")
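The Original/Modified rewrite on this slide pre-aggregates t2 by the join key before grouping by t1.a. The sketch below checks on invented data that both forms return the same rows. It runs on SQLite rather than Infobright, renames the slide's `key` column to `k` to avoid a keyword clash, adds an ORDER BY for comparison, and assumes the join key is unique in t1 (without that, the two forms can differ).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (k INTEGER PRIMARY KEY, a TEXT, x INTEGER);
    CREATE TABLE t2 (k INTEGER, b INTEGER, y INTEGER);
    INSERT INTO t1 VALUES (1, 'red', 1), (2, 'red', 3), (3, 'blue', 2), (4, 'blue', 0);
    INSERT INTO t2 VALUES (1, 10, 5), (1, 20, 5), (2, 5, 5), (2, 7, 4),
                          (3, 100, 5), (4, 1000, 5);
""")

# Original form: join first, then group by the descriptive column.
original = """
    SELECT t1.a, sum(t2.b)
    FROM t1 JOIN t2 ON t1.k = t2.k
    WHERE t1.x > 0 AND t2.y = 5
    GROUP BY t1.a ORDER BY t1.a
"""
# Modified form: aggregate per join key, then map keys back to t1.a.
modified = """
    SELECT t1copy.a, sum(temp_tab.sum2)
    FROM (SELECT t2.k AS k2, sum(t2.b) AS sum2
          FROM t1 JOIN t2 ON t1.k = t2.k
          WHERE t1.x > 0 AND t2.y = 5
          GROUP BY t2.k) temp_tab, t1 t1copy
    WHERE temp_tab.k2 = t1copy.k
    GROUP BY t1copy.a ORDER BY t1copy.a
"""
original_rows = conn.execute(original).fetchall()
modified_rows = conn.execute(modified).fetchall()
print(original_rows, modified_rows)
```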
Leverage the Granular Engine – Data Types
Integers perform best
• Join columns
• Surrogate keys
Character best practice
• Sub-selects with surrogate keys
• Column option 'lookup'
  • http://www.infobright.org/wiki/How_and_When_to_use_Lookups/
• Chksum columns on large strings
• Binary collations
Leverage the Granular Engine – Table Definitions
Original DDL
Create Table Customer(
  Customer_Key varchar(10),
  Customer_Name varchar(50),
  Customer_Address varchar(300),
  Category varchar(10)
);

Becomes
Create Table Customer(
  Customer_Key integer,
  Customer_Name varchar(50),
  Customer_Address varchar(300),
  Category varchar(10) comment 'lookup',
  Customer_Name_MD5 bigint,
  Customer_Address_MD5 bigint
);
Original Query
SELECT ... FROM table WHERE str='value'

Becomes…
SELECT ... FROM table WHERE str='value' AND cksum=cksum(str)
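One possible way to populate integer checksum columns such as Customer_Address_MD5 is sketched below in Python. The slides do not say how the checksum is derived, so folding the first 8 bytes of an MD5 digest into a signed 64-bit value is an assumption, as are the function names and sample rows. The lookup reads the slide's cksum(...) as the checksum of the searched value, and it still compares the string itself because distinct strings can share a checksum.

```python
import hashlib


def md5_bigint(text):
    """Fold the MD5 digest of text into a signed 64-bit integer (fits a BIGINT)."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


# Hypothetical rows: (Customer_Key, Customer_Address, Customer_Address_MD5).
addresses = ["12 Elm Street, Springfield", "99 Oak Avenue, Shelbyville"]
rows = [(key, addr, md5_bigint(addr)) for key, addr in enumerate(addresses, start=1)]


def find_by_address(rows, wanted):
    """Compare the integer checksum first, then confirm with the string itself."""
    wanted_sum = md5_bigint(wanted)
    return [key for key, addr, checksum in rows
            if checksum == wanted_sum and addr == wanted]


print(find_by_address(rows, "99 Oak Avenue, Shelbyville"))   # [2]
```

The integer comparison is the part that benefits from the "integers perform best" guidance; the string comparison only confirms the few rows that survive it.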
Toolset Integration
Bear in Mind
• The unique attributes of Infobright are transparent to developers.
• The benefits are obvious and immediate to users.
• Infobright is a relational database
• Infobright observes and obeys SQL standards
• Infobright observes and obeys standards-based connectivity
  • Design tools
  • Development tools
  • Administrative tools
  • Query and reporting tools
Infobright Development
When developing applications, you can use:
• Industry standard interfaces including those listed below;
• Comprehensive Management Services and Utilities;
• Robust connectivity with BI Tools.

Connector/ODBC, Connector/NET, Connector/J, Connector/MXJ, Connector/C++, Connector/C
C API, PHP API, Perl API, C++ API, Python API, Ruby APIs

Note: API calls are restricted to the functional support of the Brighthouse engine (e.g. mysql_stmt_insert_id).
Popular Database Tools
• MySQL Workbench
• Toad for MySQL on Windows
• Navicat for MySQL on Mac
• PHPMyAdmin
Points to Remember
• Default port = 5029
• No Explain Plan
Infobright Technology Partners
BI Tools
• MicroStrategy
• Jaspersoft
• Pentaho
• BIRT
ETL Tools
• Talend (aka Jasper ETL)
• Pentaho Data Integration
Getting Started Down the Right Path
Plan your Evaluation
• Before starting your evaluation, define a concise set of target objectives and requirements
• Remember that Infobright shows value with medium to large data sizes (>100GB and growing on up)
• Don't undersize the evaluation
• A planning document is available that can help with the evaluation exercise
Migration Tools – ICE Breakers
The community has contributed code called ICE Breakers that can make data migration easier
Industry tools can be used but will require manual intervention
Additional Resources
Both infobright.com and infobright.org have additional documentation and white papers
Ask a question on the Infobright forum
If you are new to MySQL you can also visit www.mysql.com for additional help
You can also visit the Infobright YouTube channel
Integrated VMs with Partner Solutions
• Infobright Community Edition with Pentaho: http://www.infobright.org/Downloads/Pentaho_ICE_VM/
• Infobright Community Edition with Talend: http://www.infobright.org/Downloads/Talend_VM/
• Infobright Community Edition with Jaspersoft: http://www.infobright.org/Downloads/Jaspersoft_ICE_VM/
• Infobright Community Edition with Jaspersoft AND Talend: http://www.infobright.org/Downloads/Talend_VM/
Questions?
For the open community
• ICE Quick Start (http://www.infobright.org/wiki)
• ICE FAQ (http://www.infobright.org/Resources/FAQ/)
• ICE Data Loading Guide (http://www.infobright.org/wiki/Data_Loading/)
• MySQL Online Tutorial (http://dev.mysql.com/doc/refman/5.1/en/tutorial.html)
For Licensed or Evaluation Customers
• IEE Quick Start and Knowledgebase (http://www.infobright.com/Wiki/IEE_Wiki)
• Screencasts (http://support.infobright.com/Training-Screencasts/)
Please check out our forums at http://www.infobright.org/Forums