COLLABORATE 12
April 22-26, 2012
Mandalay Bay Convention Center
Las Vegas, Nevada, USA
www.collaborate12.org
www.collaborate12.ioug.org
2 Leveraging and Enriching the capabilities of Oracle Database 11g
Dylan Kucera, Director – Data Architecture
Ontario Teachers’ Pension Plan
Oracle OpenWorld – Session 3700
Sun Oct 2, 2011 – 12:15 – 1:15pm
3 Themes
• Useful “out of the box” features to support the operations of your DW
• Easily adding capabilities that Oracle doesn’t give you in the box
• Why you don’t have to live with legacy indefinitely
4 Database Capabilities to support your DW
• Partitioning
• Advanced Compression
• Oracle DB JVM – DW integration
• Advanced Queuing
• Oracle Streams – Legacy DW/DM
• Data Warehouse Modernization
5 PROBLEM: Rolling analysis results
“The results of my analytics produce
120M rows. Each day I need to
delete the oldest set of analysis
results and insert a new set for the
new day. I need this operation to
run as quickly as possible. When
working with an analysis set, I
need to not be bogged down by the
analysis results from other days.”
6 Partitioning Use Case – Data Retention
7 Partitioning Use Case – Ease of Implementation
Instead of
DELETE FROM RISK.PNL_DETAIL
WHERE ANALYSIS_ID = 12345;
Use
ALTER TABLE RISK.PNL_DETAIL
DROP PARTITION P12345;
8 Partitioning Use Case – Outcomes
• Process time reduced by hours
• Eliminates unnecessary pressure on undo space
• Avoids costly index maintenance – access by partition/sub-partition
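A minimal sketch of how the table behind the DROP PARTITION approach might be laid out, assuming list partitioning on ANALYSIS_ID. Only RISK.PNL_DETAIL, ANALYSIS_ID and P12345 come from the slides; the other columns, the index name and partition P12346 are hypothetical.

```sql
-- Hypothetical column list; one partition per analysis set.
CREATE TABLE RISK.PNL_DETAIL (
  ANALYSIS_ID  NUMBER        NOT NULL,
  POSITION_ID  NUMBER        NOT NULL,
  PNL_AMOUNT   NUMBER(18,2)
)
PARTITION BY LIST (ANALYSIS_ID) (
  PARTITION P12345 VALUES (12345)
);

-- A LOCAL index is partitioned the same way as the table, so dropping a
-- table partition also drops the matching index partition.
CREATE INDEX RISK.PNL_DETAIL_POS_IX
  ON RISK.PNL_DETAIL (ANALYSIS_ID, POSITION_ID) LOCAL;

-- Each day: add a partition for the new analysis set ...
ALTER TABLE RISK.PNL_DETAIL ADD PARTITION P12346 VALUES (12346);

-- ... and drop the oldest set instead of deleting its rows.
ALTER TABLE RISK.PNL_DETAIL DROP PARTITION P12345;
```

With only local indexes in place, the drop is a dictionary operation and leaves no index to rebuild. A global index would need UPDATE GLOBAL INDEXES on the ALTER TABLE or a rebuild afterwards.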
9 PROBLEM: Traceability of DW ETL
“We receive our data warehouse
feeds from counterparties and
vendors who send us CSV, XML,
fixed-width and other files.
Occasionally I need to refer back to
the source file when data in the
Data Warehouse is in question. I
could need to go back as far as 3
months.”
10 Use Case – Advanced Compression
11 Use Case – Advanced Compression
Connected to Oracle Database 11g Enterprise Edition Release 11.1.0.7.0
Connected as OPS$KUCERAD

SQL> SELECT COUNT(ETL_ID), ROUND(AVG(LENGTH_)), TO_CHAR(SUM(LENGTH_), 'FM999,999,999,999') FROM ETL.ETL_ARCHIVE;

COUNT(ETL_ID) ROUND(AVG(LENGTH_)) TO_CHAR(SUM(LENGTH_),'FM999,99
------------- ------------------- ------------------------------
       216413              269977 58,426,579,616

SQL>
CREATE TABLE ETL.ETL_ARCHIVE (
  ETL_ID   NUMBER,
  DATE_    DATE,
  LENGTH_  NUMBER,
  FILE_    BLOB
)
LOB(FILE_) STORE AS SECUREFILE (
  COMPRESS
  CACHE
);
Back to Data Retention: Interval Partition!!
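One way the interval-partition idea could be combined with the compressed archive table, sketched under the assumption of one partition per month on DATE_. The partition name P_BASE, its boundary date and the date used in the drop are hypothetical; the slides do not show the actual partitioning scheme.

```sql
-- Assumption: monthly interval partitions keyed on DATE_.
CREATE TABLE ETL.ETL_ARCHIVE (
  ETL_ID   NUMBER,
  DATE_    DATE NOT NULL,
  LENGTH_  NUMBER,
  FILE_    BLOB
)
LOB (FILE_) STORE AS SECUREFILE (COMPRESS CACHE)
PARTITION BY RANGE (DATE_)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
  -- Anchor partition; later months are created automatically on first insert.
  PARTITION P_BASE VALUES LESS THAN (DATE '2011-01-01')
);

-- Retention: drop a month that has aged out of the 3-month window.
-- Interval partitions get system-generated names, so PARTITION FOR
-- addresses the partition by any date it contains.
ALTER TABLE ETL.ETL_ARCHIVE DROP PARTITION FOR (DATE '2011-06-15');
```

The anchor range partition stays in place; only the automatically created monthly partitions are dropped as they age out.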
12 PROBLEM: Proprietary file/protocol formats
“We just signed up for a new data
feed from ConvolutaCorp. They
will be expecting us to pick up
their file from their SFTP site. The
file will be in Microsoft Excel (.XLS)
format.”
13 Oracle Database JVM
14 Oracle Database JVM
15 Oracle Database JVM
E:\>loadjava -schema ETL -user *******/********@EDBDEV -verbose jxl.jar
…etc…
creating : class ETL.jxl/NumberCell
loading : class ETL.jxl/NumberCell
creating : class ETL.jxl/NumberFormulaCell
loading : class ETL.jxl/NumberFormulaCell
creating : class ETL.jxl/StringFormulaCell
loading : class ETL.jxl/StringFormulaCell
Classes Loaded: 520
Resources Loaded: 7
Sources Loaded: 0
Published Interfaces: 0
Classes generated: 0
Classes skipped: 0
Synonyms Created: 0
Errors: 0
E:\>
16 Oracle Database JVM
17 Oracle Database JVM
18 Oracle Database JVM
19 Oracle Database JVM
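Once jxl.jar is loaded, a small Java class plus a PL/SQL call specification is one way to make the library callable from SQL. The sketch below is an illustration, not the code from the slides: the class XlsReader, the method cellText and the function ETL.XLS_CELL_TEXT are hypothetical names, and the script is assumed to run as the ETL schema owner.

```sql
-- Hypothetical wrapper class compiled inside the database JVM.
CREATE OR REPLACE AND COMPILE JAVA SOURCE NAMED "XlsReader" AS
import java.io.InputStream;
import jxl.Sheet;
import jxl.Workbook;

public class XlsReader {
    // Returns the text of one cell on the first sheet of an .XLS file held in a BLOB.
    public static String cellText(oracle.sql.BLOB file, int col, int row)
            throws Exception {
        InputStream in = file.getBinaryStream();
        try {
            Workbook wb = Workbook.getWorkbook(in);
            try {
                Sheet sheet = wb.getSheet(0);
                return sheet.getCell(col, row).getContents();
            } finally {
                wb.close();
            }
        } finally {
            in.close();
        }
    }
}
/

-- Call specification: publishes the static Java method as a SQL function.
CREATE OR REPLACE FUNCTION ETL.XLS_CELL_TEXT (
  P_FILE IN BLOB,
  P_COL  IN NUMBER,
  P_ROW  IN NUMBER
) RETURN VARCHAR2
AS LANGUAGE JAVA
NAME 'XlsReader.cellText(oracle.sql.BLOB, int, int) return java.lang.String';
/

-- Example call against the archive table; column and row are zero-based in jxl.
SELECT ETL.XLS_CELL_TEXT(FILE_, 0, 0) FROM ETL.ETL_ARCHIVE WHERE ETL_ID = :etl_id;
```

Keeping the Java wrapper thin and doing the rest in SQL or PL/SQL means the ETL logic stays in the database alongside the data it loads.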
20 Oracle Database JVM – Other useful Libraries
• Apache Commons Net – FTP(S), POP, Telnet, etc.
  http://commons.apache.org/net/
• Orion SSH2 – SFTP
  http://sourceforge.net/projects/orion-ssh2/
• jCIFS – NetBIOS/CIFS/SMB network protocols (MS Windows)
  http://jcifs.samba.org/
21 PROBLEM: Notifying when DW changes
“I need my application to be notified
whenever data changes in these
(they hand you a list) Data
Warehouse tables. Some of these
tables are updated directly by
Users who link the tables via
Microsoft Access in order to
override or supplement the data
feeds.”
22 Oracle Advanced Queuing – Asynchronous Notification
Diagram labels: Application – Producer; AQ (Returns control almost instantly); HTTP; SMTP; JMS; AQ/JMS; Subscriber
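A minimal sketch of the producer side of such a notification, assuming a statement-level trigger on each table in the list. Every object name here (DW.DW_CHANGE_MSG, DW.DW_CHANGE_Q, DW.COUNTERPARTY, the subscriber MY_APP) is hypothetical. Because the trigger fires inside the database, it also catches changes made through linked Microsoft Access tables.

```sql
-- Hypothetical message payload.
CREATE TYPE DW.DW_CHANGE_MSG AS OBJECT (
  TABLE_NAME  VARCHAR2(61),
  CHANGED_AT  TIMESTAMP
);
/

BEGIN
  DBMS_AQADM.CREATE_QUEUE_TABLE(
    queue_table        => 'DW.DW_CHANGE_QT',
    queue_payload_type => 'DW.DW_CHANGE_MSG',
    multiple_consumers => TRUE);
  DBMS_AQADM.CREATE_QUEUE(
    queue_name  => 'DW.DW_CHANGE_Q',
    queue_table => 'DW.DW_CHANGE_QT');
  DBMS_AQADM.START_QUEUE(queue_name => 'DW.DW_CHANGE_Q');
  -- A multi-consumer queue needs a subscriber before messages can be enqueued.
  DBMS_AQADM.ADD_SUBSCRIBER(
    queue_name => 'DW.DW_CHANGE_Q',
    subscriber => SYS.AQ$_AGENT('MY_APP', NULL, NULL));
END;
/

-- Fires once per DML statement on the watched table, whatever client issued it.
CREATE OR REPLACE TRIGGER DW.COUNTERPARTY_NOTIFY
AFTER INSERT OR UPDATE OR DELETE ON DW.COUNTERPARTY
DECLARE
  l_enq_opts   DBMS_AQ.ENQUEUE_OPTIONS_T;
  l_msg_props  DBMS_AQ.MESSAGE_PROPERTIES_T;
  l_msg_id     RAW(16);
BEGIN
  -- Default visibility is ON_COMMIT: the message is published only if
  -- the user's transaction commits.
  DBMS_AQ.ENQUEUE(
    queue_name         => 'DW.DW_CHANGE_Q',
    enqueue_options    => l_enq_opts,
    message_properties => l_msg_props,
    payload            => DW.DW_CHANGE_MSG('DW.COUNTERPARTY', LOCALTIMESTAMP),
    msgid              => l_msg_id);
END;
/
```

The enqueue adds little to the user's statement, and subscribers receive the message on their own schedule, which is the "returns control almost instantly" property noted on the slide.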
23 PROBLEM: Asynchronous Data Warehouse Updates
“When we close the ‘Month End’, I
need the Data Warehouse to
update its Materialized Views
ASAP. Our ETL folks have told us
that the process takes 30-40
minutes. Our users don’t want
their workflow tool to hang for that
long waiting for the update, they
just want to know when it is done.”
24 AQ Use case – Publishing Materialized Views
Diagram labels: Authorized Employee Closes the Month-end; Workflow Tool; AQ (Returns control almost instantly); Materialized Views; Published data; 20 minutes elapsed
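The month-end flow could be sketched as a pair of procedures around a single-consumer queue: one that the workflow tool calls and that returns at once, and one that a background session or scheduler job runs to do the slow refresh. All names (ETL.MONTH_END_MSG, ETL.MONTH_END_Q, ETL.CLOSE_MONTH_END, ETL.PUBLISH_MONTH_END, DW.MV_MONTH_END) are hypothetical, and the owner is assumed to hold direct execute grants on DBMS_AQADM and DBMS_AQ.

```sql
CREATE TYPE ETL.MONTH_END_MSG AS OBJECT (PERIOD_END DATE);
/

BEGIN
  DBMS_AQADM.CREATE_QUEUE_TABLE(
    queue_table        => 'ETL.MONTH_END_QT',
    queue_payload_type => 'ETL.MONTH_END_MSG');
  DBMS_AQADM.CREATE_QUEUE(
    queue_name  => 'ETL.MONTH_END_Q',
    queue_table => 'ETL.MONTH_END_QT');
  DBMS_AQADM.START_QUEUE(queue_name => 'ETL.MONTH_END_Q');
END;
/

-- Called by the workflow tool: enqueue the request and return immediately.
CREATE OR REPLACE PROCEDURE ETL.CLOSE_MONTH_END (P_PERIOD_END IN DATE) AS
  l_enq_opts   DBMS_AQ.ENQUEUE_OPTIONS_T;
  l_msg_props  DBMS_AQ.MESSAGE_PROPERTIES_T;
  l_msg_id     RAW(16);
BEGIN
  DBMS_AQ.ENQUEUE(
    queue_name         => 'ETL.MONTH_END_Q',
    enqueue_options    => l_enq_opts,
    message_properties => l_msg_props,
    payload            => ETL.MONTH_END_MSG(P_PERIOD_END),
    msgid              => l_msg_id);
  COMMIT;
END;
/

-- Run in the background: block until a request arrives, then refresh.
CREATE OR REPLACE PROCEDURE ETL.PUBLISH_MONTH_END AS
  l_deq_opts   DBMS_AQ.DEQUEUE_OPTIONS_T;
  l_msg_props  DBMS_AQ.MESSAGE_PROPERTIES_T;
  l_msg_id     RAW(16);
  l_msg        ETL.MONTH_END_MSG;
BEGIN
  l_deq_opts.wait := DBMS_AQ.FOREVER;
  DBMS_AQ.DEQUEUE(
    queue_name         => 'ETL.MONTH_END_Q',
    dequeue_options    => l_deq_opts,
    message_properties => l_msg_props,
    payload            => l_msg,
    msgid              => l_msg_id);
  -- Complete refresh of a hypothetical materialized view.
  DBMS_MVIEW.REFRESH(list => 'DW.MV_MONTH_END', method => 'C');
  COMMIT;
END;
/
```

A completion message enqueued at the end of ETL.PUBLISH_MONTH_END would be one way to tell users when the refresh is done, in the same style as the request message.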
25 PROBLEM: Legacy DW and DMs
“10 years ago we wrote a key data
acquisition process to load to this
isolated legacy SQL Server over
here. We need this data in our
central Oracle Data Warehouse.
However, existing users must be
assured that the data is available
each day in the legacy SQL Server
for the next 6-9 months.”
26 Data Warehouse Centralization – Heterogeneous Oracle Streams
Diagram labels: Oracle Streams; ETL
27 Heterogeneous Streams – General Architecture
Diagram Adapted from “Oracle Database 11g: Oracle Streams Replication, An Oracle White Paper, July 2007”
28 Heterogeneous Streams – Before inserts to Captured Oracle table
29 Heterogeneous Streams – Insert some rows into Captured Oracle table
SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
  2  VALUES ('2010-08-31', 'VAN', 33, 'Henrik Sedin', '1980-09-26', 112, 49.5);
1 row inserted
SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
  2  VALUES ('2010-08-31', 'PIT', 87, 'Sidney Crosby', '1987-08-07', 109, 55.9);
1 row inserted
SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
  2  VALUES ('2010-08-31', 'WSH', 8, 'Alex Ovechkin', '1985-09-17', 109, 45.4);
1 row inserted
SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
  2  VALUES ('2010-08-31', 'SJS', 19, 'Joe Thornton', '1979-07-02', 89, 53.9);
1 row inserted
SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
  2  VALUES ('2010-08-31', 'OTT', 11, 'Daniel Alfredsson', '1972-12-11', 71, 35.0);
1 row inserted
SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
  2  VALUES ('2010-08-31', 'CGY', 12, 'Jarome Iginla', '1977-07-01', 69, 47.0);
1 row inserted
SQL> INSERT INTO PLAY.NHL_PLAYER_STAT (DATE_, TEAM, SWEATER_NO, NAME_, BIRTH_DATE, POINTS, FACE_OFF_PCT)
  2  VALUES ('2010-08-31', 'TOR', 15, 'Tomas Kaberle', '1978-03-02', 49, NULL);
1 row inserted
SQL> COMMIT;
Commit complete
30 Heterogeneous Streams – After inserts to Captured Oracle table
31 PROBLEM: Data Warehouse Modernization
“Our data warehouse structures are
outdated. Our data modelers have
come up with a new set of
structures. We need to keep the
old data structures alive for quite
some time to give people a chance
to migrate. Users are
opportunistic people who will, if
not prevented, employ both old
and new.”
32 Oracle 11gR2 – Edition-Based Redefinition
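A rough sketch of how Edition-Based Redefinition might keep an old structure alive beside a new one, using an editioning view over a renamed table. The schema DW, the table CUSTOMER, its columns, the edition DW_V2 and the user REPORT_USER are all hypothetical; the slides do not show the actual structures.

```sql
-- Allow editioned objects (such as editioning views) in the schema.
ALTER USER DW ENABLE EDITIONS;

-- Hide the physical table behind an editioning view that carries the old name,
-- so existing queries against DW.CUSTOMER keep working unchanged.
ALTER TABLE DW.CUSTOMER RENAME TO CUSTOMER_T;
CREATE OR REPLACE EDITIONING VIEW DW.CUSTOMER AS
  SELECT CUST_ID, CUST_NAME FROM DW.CUSTOMER_T;

-- New edition for the modernized structure.
CREATE EDITION DW_V2 AS CHILD OF ORA$BASE;
GRANT USE ON EDITION DW_V2 TO REPORT_USER;

-- In the new edition only: add a column and expose the new shape.
ALTER SESSION SET EDITION = DW_V2;
ALTER TABLE DW.CUSTOMER_T ADD (SEGMENT VARCHAR2(30));
CREATE OR REPLACE EDITIONING VIEW DW.CUSTOMER AS
  SELECT CUST_ID, CUST_NAME AS FULL_NAME, SEGMENT FROM DW.CUSTOMER_T;

-- When the migration window closes, make the new edition the default.
ALTER DATABASE DEFAULT EDITION = DW_V2;
```

A session sees exactly one edition at a time, so a user connects to either the old shape or the new one and cannot mix both in the same session.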
33 Wrap-up
• Oracle has likely thought of the feature or optimization you need
• When they haven’t, the Java Open Source community likely has
• Data re-centralization IS possible with creative Oracle Streams use
• Don’t let your users drag the legacy along forever; EBR is here!
34 Q&A
Dylan Kucera, Director – Data Architecture
Ontario Teachers’ Pension Plan
Oracle OpenWorld – Session 3700
Sun Oct 2, 2011 – 12:15 – 1:15pm
Thank You!