Matthew P. Johnson, OCL3, CISDD CUNY, June 2005
OCL3 Oracle 10g: SQL & PL/SQL
Session #10
Matthew P. Johnson
CISDD, CUNY
January, 2005

Agenda
  Security & web apps
  RegEx support in 10g
  Oracle & XML
  Data warehousing
  More on the PL/SQL labs
  Any more lab?

Review: Why security is hard
  It’s a “negative deliverable”
  It’s an asymmetric threat
  Tolstoy: “Happy families are all alike; every unhappy family is unhappy in its own way.”
    Analogs: “homeland”, jails, debugging, proofreading, Popperian science, fishing, MC algs
  So: fix biggest problems first

DB users have privileges
  SELECT: read access to all columns
  INSERT(col-name): can insert rows with non-default values in this column
  INSERT: can insert rows with non-default values in all columns
  DELETE
  REFERENCES(col-name): can define foreign keys that refer to (or other constraints that mention) this column
  TRIGGER: triggers can reference table
  EXECUTE: can run function/SP

Granting privileges (Oracle)
  One method of setting access levels
  Creator of object automatically gets all privileges to it
  Possible objects: tables, whole databases, stored functions/procedures, etc.
    <DB-name>.* - all tables in DB
  A privileged user can grant privileges to other users or groups

  GRANT privileges ON object TO user <WITH GRANT OPTION>
  GRANT SELECT ON mytable TO someone WITH GRANT OPTION;

Granting and revoking
  Privileged user has privileges
  Privileged-WGO (with grant option) user can grant them, with or without GO
  Granter can revoke privileges or GO
  Revocation cascades by default
    To prevent, use RESTRICT (at end of cmd)
    If would cascade, command fails
  Can change owner:

  ALTER TABLE my-tbl OWNER TO new-owner;

Granting and revoking
  What we giveth, we may taketh away
  mjohnson: (effects?)
    GRANT SELECT, INSERT ON my-table TO george WITH GRANT OPTION;
  george: (effects?)
    GRANT SELECT ON my-table TO laura;
  mjohnson: (effects?)
    REVOKE SELECT ON my-table FROM laura;

Role-based authorization
  In SQL-1999, privileges assigned with roles
  For example:
    Student role
    Instructor role
    Admin role
  Each role gets to do same (sorts of) things
  Privileges assigned by assigning role to users

  GRANT SELECT ON my-table TO employee;
  GRANT employee TO billg;

Passwords
  DBMS recognizes your privileges because it recognizes you
    How?
  Storing passwords in the DB in plaintext is a bad idea

Hashed or digested passwords
  One-way hash function:
    1. Computing f(x) is easy;
    2. Computing f^-1(y) is hard/impossible;
    3. Finding some x2 s.t. f(x2) = f(x) (“collisions”) is hard/impossible
  Intuitively: seeing f(x) gives little (useful) info on x
    f(x) “looks random” (cf. PRNGs)
  MD5, SHA-1
  RFID for cars: http://www.rfidanalysis.org/
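
A minimal sketch of storing a digest instead of the password itself, in Python. It swaps in salted PBKDF2 with SHA-256 rather than the MD5/SHA-1 named on the slide, since those two are no longer considered collision-resistant. The function names, salt size and iteration count are illustrative assumptions, not part of the course material.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); only these two values would be stored in the DB."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute f(x) for the candidate password and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

stored_salt, stored_digest = hash_password("secret")
print(verify_password("secret", stored_salt, stored_digest))
print(verify_password("guess", stored_salt, stored_digest))
```

The login check never needs f^-1: it only recomputes f on the submitted password and compares.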

Built-in accounts
  Many DBMSs (and OSs) have built-in demo accounts by default
    In some versions, must “opt out”
  MySQL: root/(blank) (closed on sales)
    http://lists.seifried.org/pipermail/security/2004-February/001782.html
  Oracle: scott/tiger (was open on sales last year)
  SQLServer: sa/(blank/null)
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;313418

Query-related: Injection attacks
  Here’s a situation:
    Prompt for user/pass
    Do lookup:

    SELECT * FROM users WHERE user = u AND password = p;

    If found, user gets in
  test.user table in MySQL
    http://pages.stern.nyu.edu/~mjohnson/dbms/php/loginphp.txt
    http://pages.stern.nyu.edu/~mjohnson/dbms/php/login.php
  Apart from no hashing, is this safe?

Injection attacks
  We expect to get input of something like:
    user: mjohnson
    pass: secret

  SELECT * FROM users WHERE user = u AND password = p;

  becomes:

  SELECT * FROM users WHERE user = 'mjohnson' AND password = 'secret';

Injection attacks – MySQL/Perl/PHP
  Consider another input:
    user: ' OR 1=1 OR user = '
    pass: ' OR 1=1 OR password = '

  SELECT * FROM users WHERE user = u AND password = p;

  becomes (AND binds tighter than OR):

  SELECT * FROM users
  WHERE user = ''
     OR 1=1
     OR user = '' AND password = ''
     OR 1=1
     OR password = '';

  http://pages.stern.nyu.edu/~mjohnson/dbms/php/login.php
  http://pages.stern.nyu.edu/~mjohnson/dbms/eg/injection.txt
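
The attack can be reproduced locally. The sketch below uses Python's built-in sqlite3 in place of the MySQL/PHP login page, so the table contents and the helper name build_login_query are assumptions made for illustration; only the query shape comes from the slides.

```python
import sqlite3

def build_login_query(user: str, password: str) -> str:
    # Vulnerable on purpose: user input is pasted straight into the SQL text.
    return ("SELECT * FROM users WHERE user = '" + user +
            "' AND password = '" + password + "'")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user TEXT, password TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("mjohnson", "secret"), ("your-boss", "hunter2")])

# Expected input: exactly one matching row, so the user gets in.
honest = conn.execute(build_login_query("mjohnson", "secret")).fetchall()

# Wrong password: no rows, so the user is rejected.
wrong = conn.execute(build_login_query("mjohnson", "guess")).fetchall()

# Injected input: the OR 1=1 terms make the WHERE clause true for every row.
injected = conn.execute(
    build_login_query("' OR 1=1 OR user = '", "' OR 1=1 OR password = '")
).fetchall()

print(len(honest), len(wrong), len(injected))
```

Because the login page only asks whether any row came back, the injected input is accepted without a valid password.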

Injection attacks – MySQL/Perl/PHP
  Consider this one:
    user: your-boss' OR 1=1 #
    pass: abc

  SELECT * FROM users WHERE user = u AND password = p;

  becomes (everything after # is a comment in MySQL):

  SELECT * FROM users
  WHERE user = 'your-boss'
     OR 1=1 #' AND password = 'abc';

  http://pages.stern.nyu.edu/~mjohnson/dbms/php/login.php

Injection attacks – MySQL/Perl/PHP
  Consider another input:
    user: your-boss
    pass: ' OR 1=1 OR password = '

  SELECT * FROM users WHERE user = u AND password = p;

  becomes:

  SELECT * FROM users
  WHERE user = 'your-boss' AND password = ''
     OR 1=1
     OR password = '';

  http://pages.stern.nyu.edu/~mjohnson/dbms/php/login.php

Multi-command inj. attacks (other DBs)
  Consider another input:
    user: '; DELETE FROM users WHERE user = 'abc'; SELECT * FROM users WHERE password = '
    pass: abc

  SELECT * FROM users WHERE user = u AND password = p;

  becomes three statements:

  SELECT * FROM users WHERE user = '';
  DELETE FROM users WHERE user = 'abc';
  SELECT * FROM users WHERE password = '' AND password = 'abc';

Multi-command inj. attacks (other DBs)
  Consider another input:
    user: '; DROP TABLE users; SELECT * FROM users WHERE password = '
    pass: abc

  SELECT * FROM users WHERE user = u AND password = p;

  becomes three statements:

  SELECT * FROM users WHERE user = '';
  DROP TABLE users;
  SELECT * FROM users WHERE password = '' AND password = 'abc';

Multi-command inj. attacks (other DBs)
  Consider another input:
    user: '; SHUTDOWN WITH NOWAIT; SELECT * FROM users WHERE password = '
    pass: abc

  SELECT * FROM users WHERE user = u AND password = p;

  becomes three statements:

  SELECT * FROM users WHERE user = '';
  SHUTDOWN WITH NOWAIT;
  SELECT * FROM users WHERE password = '' AND password = 'abc';

Injection attacks – MySQL/Perl/PHP
  Consider another input:
    user: your-boss
    pass: ' OR 1=1 AND user = 'your-boss
  Delete your boss!

  DELETE FROM users WHERE user = u AND pass = p;

  becomes:

  DELETE FROM users
  WHERE user = 'your-boss' AND pass = ''
     OR 1=1 AND user = 'your-boss';

  http://pages.stern.nyu.edu/~mjohnson/dbms/php/users.php

Injection attacks – MySQL/Perl/PHP
  Consider another input:
    user: ' OR 1=1 OR user = '
    pass: ' OR 1=1 OR user = '
  Delete everyone!

  DELETE FROM users WHERE user = u AND pass = p;

  becomes:

  DELETE FROM users
  WHERE user = ''
     OR 1=1
     OR user = '' AND pass = ''
     OR 1=1
     OR user = '';

  http://pages.stern.nyu.edu/~mjohnson/dbms/php/users.php

Preventing injection attacks
  Ultimate source of problem: quotes
  Soln 1: don’t allow quotes!
    Reject any entered data containing single quotes
    Q: Is this satisfactory?
      Does Amazon need to sell O’Reilly books?
  Soln 2: escape any single quotes
    Replace any ' with a '' or \'
    In Perl, use taint mode – won’t show
    In PHP, turn on magic_quotes_gpc flag in .htaccess
      show both PHP versions
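
Soln 2 can be sketched in a few lines. This Python version doubles each single quote, which is the escaping rule SQLite and standard SQL use; the helper name and the books table are hypothetical, and hand-rolled escaping like this is shown only to illustrate the idea.

```python
import sqlite3

def escape_single_quotes(value: str) -> str:
    """Replace each ' with '' so the value stays inside its string literal."""
    return value.replace("'", "''")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (publisher TEXT)")

# The escaped value can be pasted into the statement without ending the literal early.
publisher = "O'Reilly"
conn.execute("INSERT INTO books VALUES ('" + escape_single_quotes(publisher) + "')")

stored = conn.execute("SELECT publisher FROM books").fetchone()[0]
print(escape_single_quotes(publisher), "->", stored)
```

The database stores the original value with its single quote intact, so O’Reilly books stay on sale.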

Preventing injection attacks
  Soln 3: use prepared, parameter-based queries
    Supported in JDBC, Perl DBI, PHP ext/mysqli
    http://pages.stern.nyu.edu/~mjohnson/dbms/perl/loginsafe.cgi
    http://pages.stern.nyu.edu/~mjohnson/dbms/perl/userssafe.cgi
  Very dangerous: using tainted data to run commands at the Unix command prompt
    Semi-colons, prime char, etc.
    Safest: define set of legal chars, not illegal ones
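
A sketch of Soln 3 plus the "set of legal chars" advice, using Python's sqlite3 placeholders as a stand-in for JDBC, Perl DBI or mysqli prepared statements. The whitelist pattern, table contents and function names are assumptions made for illustration.

```python
import re
import sqlite3

# Hypothetical whitelist: letters, digits, underscore, hyphen; 1 to 32 chars.
LEGAL_USERNAME = re.compile(r"[A-Za-z0-9_-]{1,32}")

def is_legal_username(name: str) -> bool:
    """Accept only names made entirely of whitelisted characters."""
    return LEGAL_USERNAME.fullmatch(name) is not None

def login(conn: sqlite3.Connection, user: str, password: str) -> bool:
    """Bind the inputs as parameters; they are treated as data, never parsed as SQL."""
    rows = conn.execute(
        "SELECT user FROM users WHERE user = ? AND password = ?",
        (user, password),
    ).fetchall()
    return len(rows) > 0

safe_conn = sqlite3.connect(":memory:")
safe_conn.execute("CREATE TABLE users (user TEXT, password TEXT)")
safe_conn.execute("INSERT INTO users VALUES (?, ?)", ("mjohnson", "secret"))

print(login(safe_conn, "mjohnson", "secret"))
print(login(safe_conn, "' OR 1=1 OR user = '", "' OR 1=1 OR password = '"))
print(is_legal_username("your-boss"), is_legal_username("your-boss' OR 1=1 #"))
```

With placeholders, the injected strings are compared literally against the stored values, so no row matches.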

Preventing injection attacks
  When to do security checking for quotes, etc.?
  Natural choice: in client-side data validation
    But not enough!
    As we saw earlier: can submit GET and POST params manually
  Must do security checking on server

More Info
  phpGB MySQL Injection Vulnerability
    http://www.securiteam.com/unixfocus/6X00O1P5PY.html
  “How I hacked PacketStorm”
    http://www.wiretrip.net/rfp/txt/rfp2k01.txt

SQL*Plus settings

  SQL> SET RECSEP OFF
  SQL> COLUMN text FORMAT A60

New topic: Regular Expressions
  In automata theory, Finite Automata are the simplest, weakest kind of computer, Turing Machines the strongest
    Chomsky’s Hierarchy
  FAs are equivalent to regular expressions
    Expressions that specify a pattern
    Can check whether a string matches the pattern

RegEx matching
  Use REGEXP_LIKE
  Metachar for any char is .
  First, get employee_comment table:
    http://pages.stern.nyu.edu/~mjohnson/oracle/empcomm.sql
  Now do search:

  SELECT emp_id, text
  FROM employee_comment
  WHERE REGEXP_LIKE(text,'...-....');

  So far, like LIKE

RegEx matching
  Can also pull out the matching text with REGEXP_SUBSTR:

  SELECT emp_id, REGEXP_SUBSTR(text,'...-....') text
  FROM employee_comment
  WHERE REGEXP_LIKE(text,'...-....');

  If want only numbers, can specify a set of chars rather than a dot:

  SELECT emp_id, REGEXP_SUBSTR(text, '[0123456789]..-...[0123456789]') text
  FROM employee_comment
  WHERE REGEXP_LIKE(text, '[0123456789]..-...[0123456789]');

RegEx matching
  Or can specify a range of chars:

  SELECT emp_id, REGEXP_SUBSTR(text, '[0-9]..-....') text
  FROM employee_comment
  WHERE REGEXP_LIKE(text, '[0-9]..-....');

  Or, finally, can state how many copies to match:

  SELECT emp_id, REGEXP_SUBSTR(text,'[0-9]{3}-[0-9]{4}') text
  FROM employee_comment
  WHERE REGEXP_LIKE(text,'[0-9]{3}-[0-9]{4}');
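
The difference between the dot pattern and the digit-range pattern is easy to try outside the database. This sketch uses Python's re module, whose syntax for these particular constructs (dot, [0-9], {n}) coincides with what the slides use in Oracle; the sample comment text is invented.

```python
import re

# Invented sample text with one non-numeric and one numeric "xxx-xxxx" run.
comment = "Ask for Al at ext ABC-DEFG or call 555-1234."

# Dot matches any character, so the first hyphenated run wins, digits or not.
any_chars = re.search(r"...-....", comment)

# Restricting to digits with a repeat count skips the non-numeric run.
digits_only = re.search(r"[0-9]{3}-[0-9]{4}", comment)

print(any_chars.group(0))
print(digits_only.group(0))
```

re.search plays the role of REGEXP_LIKE (is there a match anywhere?) and .group(0) the role of REGEXP_SUBSTR (what text matched?).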

RegExp matching
  Other operators:
    * - 0 or more matches
    + - 1 or more matches
    ? - 0 or 1 match
  Also, can OR options together with | op
  Here: some phone nums have area codes, some not, so want to match both:

  SELECT emp_id, REGEXP_SUBSTR(text,'[0-9]{3}-[0-9]{3}-[0-9]{4}|[0-9]{3}-[0-9]{4}') text
  FROM employee_comment
  WHERE REGEXP_LIKE(text,'[0-9]{3}-[0-9]{3}-[0-9]{4}|[0-9]{3}-[0-9]{4}');

RegExp matching
  Order of ORed together patterns matters:
    First matching pattern wins

  SELECT emp_id, REGEXP_SUBSTR(text,'[0-9]{3}-[0-9]{4}|[0-9]{3}-[0-9]{3}-[0-9]{4}') text
  FROM employee_comment
  WHERE REGEXP_LIKE(text,'[0-9]{3}-[0-9]{4}|[0-9]{3}-[0-9]{3}-[0-9]{4}');

RegExp matching
  There’s a shared structure between the two, though
    Area code is just optional
    Can use ? op

  SELECT emp_id, REGEXP_SUBSTR(text,'([0-9]{3}-)?[0-9]{3}-[0-9]{4}') text
  FROM employee_comment
  WHERE REGEXP_LIKE(text,'([0-9]{3}-)?[0-9]{3}-[0-9]{4}');

RegExp matching
  Also, different kinds of separators:
    dash, dot, just blank
  Can OR together whole number patterns
  Better: Just use set of choices of each sep.

  SELECT emp_id, REGEXP_SUBSTR(text, '([0-9]{3}[-. ])?[0-9]{3}[-. ][0-9]{4}') text
  FROM employee_comment
  WHERE REGEXP_LIKE(text,'([0-9]{3}[-. ])?[0-9]{3}[-. ][0-9]{4}');

RegExp matching
  One other thing: area codes in parentheses
    Of course, area codes are still optional
    Parentheses must be escaped - \( \)

  SELECT emp_id, REGEXP_SUBSTR(text, '([0-9]{3}[-. ]|\([0-9]{3}\) )?[0-9]{3}[-. ][0-9]{4}') text
  FROM employee_comment
  WHERE REGEXP_LIKE(text,'([0-9]{3}[-. ]|\([0-9]{3}\) )?[0-9]{3}[-. ][0-9]{4}');
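
The final phone pattern can be checked against a few sample strings. The sketch below reuses the pattern from the query unchanged in Python's re module; the helper name and sample comments are invented for illustration.

```python
import re
from typing import Optional

# Same pattern as in the query: optional area code (plain or parenthesized),
# then 3 digits, a separator (dash, dot or blank), then 4 digits.
PHONE = re.compile(r"([0-9]{3}[-. ]|\([0-9]{3}\) )?[0-9]{3}[-. ][0-9]{4}")

def first_phone(text: str) -> Optional[str]:
    """Return the first phone-like substring, or None if there is none."""
    match = PHONE.search(text)
    return match.group(0) if match else None

samples = [
    "call 555-1234 now",
    "cell: 212.555.1234",
    "office (212) 555-1234, mornings",
    "no number on file",
]
for sample in samples:
    print(repr(sample), "->", first_phone(sample))
```

Because the area code group is optional, the same pattern covers both the 7-digit and the 10-digit forms.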

And now for something completely different: XML
  XML: eXtensible Mark-up Language
  Very popular language for semi-structured data
  Mark-up language: consists of elements composed of tags, like HTML
  Emerging lingua franca of the Internet, Web Services, inter-vendor communication

Unstructured data
  At one end of continuum: unstructured data
    Text files
    Stock market prices
    CIA intelligence intercepts
    Audio recordings
  “Just one damn bit after another” ~ Henry Ford
  No (intentional, formal) patterns to the data
  Difficult to manage/make sense of
    Why we need data-mining

Structured data
  At the other end: structured data
    Tables in RDBMSs
  Data organized into semantic chunks
    entities
  Similar/related entities grouped together
    Relationships, classes
  Entities in same group have same structure
    Same fields/attributes/properties
  Easy to make sense of
    But sometimes too rigid a req.
    Difficult to send: convert to tab-delimited

Semi-structured data
  Not too random
    Data organized into entities
    Similar/related grouped to form other entities
  Not too structured
    Some attributes may be missing
    Size of attributes may vary
    Support of lists/sets
  Juuust Right
    Data is self-describing

Semi-structured data
  Predominant examples:
    HTML: HyperText Mark-up Language
    XML: eXtensible Mark-up Language
  NB: both mark-up languages (use tags)
  Mark-up lends itself to semi-structured data
    Demarcate boundaries for entities
    But freely allow other entities inside

Data model for semi-structured data
  Usually represented as directed graphs
  Graph: set of vertices (nodes) and edges
    Dots connected by lines; not nec. a tree!
  In model,
    Nodes ~ entities or fields/attributes
    Edges ~ attribute-of/sub-entity-of
  Example: publisher publishes >=0 books
    Each book has one title, one year, >=1 authors
    Draw publishers graph

XML is a SSD language
  Standard published by W3C
    Officially announced/recommended in 1998
  XML != HTML
  XML != a replacement for HTML
  Both are mark-up languages
  Big diffs:
    1. XML doesn’t use predefined tags (!)
       But it’s extensible: tags can be added
    2. HTML is about presentation: <I>, <B>, <P>
       XML is about content: <book>, <author>

XML syntax
  Like HTML in many respects but more strict
  All tags must be closed
    Can’t have: this is a line<br>
    Every start tag has an end tag
    Although <br/> style can replace both
  IS case-sensitive
  IS space-sensitive
  XML doc has a unique root element

XML syntax
  Tags must be properly nested
    Not allowed: <b><i>I’m not kidding</b></i>
    Intuition: file folders
  Elements may have quoted attributes
    <Myelm myatt="myval">…</Myelm>
  Comments same as in HTML:
    <!-- Pay no attention… -->
  Draw publishers XML
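
These syntax rules are what an XML parser enforces as well-formedness. A small Python sketch using the standard library's ElementTree, which rejects any document that breaks them; the helper name and sample strings are illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if the parser accepts the text as a single well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

checks = {
    "<p>this is a line<br/></p>": "closed empty tag",
    "<p>this is a line<br></p>": "unclosed <br>",
    "<b><i>I'm not kidding</b></i>": "improper nesting",
    "<Myelm>x</myelm>": "case mismatch",
    "<a/><b/>": "two root elements",
}
for doc, label in checks.items():
    print(label, "->", is_well_formed(doc))
```

Only the first sample parses; the others each violate one of the rules listed above.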

Escape chars in XML
  Some chars must be escaped
    Distinguish content from syntax

    char   escaped as
    <      &lt;
    >      &gt;
    &      &amp;
    "      &quot;
    '      &apos;

  <elm>3 &lt; 5</elm>
  <elm>&quot;Don&apos;t call me &apos;Ishmael&apos;!&quot;</elm>

  Can also declare value to be pure text:

  <aRealTag> <![CDATA[<notAtag>jsdljsd<neitherAmI<"'><>>]]></aRealTag>
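
A short Python sketch of both escaping routes, using the standard library. xml.sax.saxutils.escape handles &, < and > by default, and the extra mapping passed for the two quote characters is an explicit choice here. The CDATA sample text is invented.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# escape() covers &, < and > by default; quotes need an explicit mapping.
quote_entities = {'"': "&quot;", "'": "&apos;"}

escaped_math = escape("3 < 5")
escaped_quote = escape("\"Don't call me 'Ishmael'!\"", quote_entities)

# Parsing turns the entities back into the original characters.
parsed = ET.fromstring("<elm>" + escaped_quote + "</elm>")

# Inside CDATA, markup characters are plain text and need no escaping.
cdata = ET.fromstring("<aRealTag><![CDATA[<notAtag> 3 < 5 & so on]]></aRealTag>")

print(escaped_math)
print(escaped_quote)
print(parsed.text)
print(cdata.text)
```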

XML Namespaces
  Different schemas/DTDs may overlap
    XHTML and MathML share some tags
  Soln: namespaces
    as in Java/C++/C#

  <book xmlns:isbn="www.isbn-org.org/def">
    <title>...</title>
    <number>15</number>
    <isbn:number>...</isbn:number>
  </book>
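
To see that the two number elements really are distinct, the book example can be parsed with Python's ElementTree. The title and ISBN values below are placeholders filled in only so the sketch runs; the slide leaves them elided.

```python
import xml.etree.ElementTree as ET

# Placeholder values stand in for the "..." on the slide.
doc = """<book xmlns:isbn="www.isbn-org.org/def">
  <title>placeholder title</title>
  <number>15</number>
  <isbn:number>placeholder-isbn</isbn:number>
</book>"""

book = ET.fromstring(doc)
prefixes = {"isbn": "www.isbn-org.org/def"}  # prefix -> namespace name

plain = book.find("number")                  # the un-namespaced <number>
qualified = book.find("isbn:number", prefixes)  # the namespaced one

print(plain.tag, plain.text)
print(qualified.tag, qualified.text)
```

The parser reports the qualified element's tag with the namespace name attached, so the two can never be confused.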

From Relational Data to XML Data

  persons
  Name     SSN  Mailing-address
  Michael  123  NY
  Hilary   456  DC
  Bill     789  Chappaqua

  XML:
  <persons>
    <row><name>Michael</name> <ssn>123</ssn></row>
    <row><name>Hilary</name> <ssn>456</ssn></row>
    <row><name>Bill</name> <ssn>789</ssn></row>
  </persons>

  Tree:
  persons
    row: name “Michael”, ssn 123
    row: name “Hilary”, ssn 456
    row: name “Bill”, ssn 789
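
The table-to-XML mapping is mechanical: one row element per tuple, one child element per column. A Python sketch with ElementTree, using the name and ssn columns of the persons table; the function name is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

rows: List[Tuple[str, str]] = [("Michael", "123"), ("Hilary", "456"), ("Bill", "789")]

def rows_to_xml(table_name: str, people: List[Tuple[str, str]]) -> ET.Element:
    """Build <table_name> with one <row> per tuple and one child per column."""
    root = ET.Element(table_name)
    for name, ssn in people:
        row = ET.SubElement(root, "row")
        ET.SubElement(row, "name").text = name
        ET.SubElement(row, "ssn").text = ssn
    return root

xml_text = ET.tostring(rows_to_xml("persons", rows), encoding="unicode")
print(xml_text)
```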

Semi-structured Data Explained
  List-valued attributes
  XML is not 1NF!
  Impossible in (single, BCNF) tables: two phones!

  name  phone
  Bill  914-222-2222
        212-333-3333
  ???

  <persons>
    <row><name>Hilary</name> <phone>202-222-2222</phone> <phone>914-222-2222</phone></row>
    <row><name>Bill</name> <phone>914-222-2222</phone> <phone>212-333-3333</phone></row>
  </persons>

Object ids and References
  SSD graph might not be trees!
  But XML docs must be
    Would cause much redundancy
  Soln: same concept as pointers in C/C++/Java
    Object ids and references
  Graph example:
    Movies: Lost in Translation, Hamlet
    Stars: Bill Murray, Scarlet Johansson

  <movieinfo>
    <movie id="o111">
      <title>Lost in Translation</title>
      <year>2003</year>
      <stars idref="o333 o444"/>
    </movie>
    <movie id="o222">
      <title>Hamlet</title>
      <year>1999</year>
      <stars idref="o333"/>
    </movie>
    <person id="o333">
      <name>Bill Murray</name>
      <movies idref="o111 o222"/>
    </person>
    <person id="o444">
      <name>Scarlet Johansson</name>
      <movies idref="o111"/>
    </person>
  </movieinfo>
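
Following an idref works like dereferencing a pointer: build an index from id to element, then look each reference up. The Python sketch below assumes each star is a person element whose id matches the values in the stars idref list (o333 for Bill Murray, o444 for the second star); that id assignment is an assumption read off the idref values.

```python
import xml.etree.ElementTree as ET
from typing import List

doc = """<movieinfo>
  <movie id="o111">
    <title>Lost in Translation</title>
    <year>2003</year>
    <stars idref="o333 o444"/>
  </movie>
  <movie id="o222">
    <title>Hamlet</title>
    <year>1999</year>
    <stars idref="o333"/>
  </movie>
  <person id="o333"><name>Bill Murray</name><movies idref="o111 o222"/></person>
  <person id="o444"><name>Scarlet Johansson</name><movies idref="o111"/></person>
</movieinfo>"""

root = ET.fromstring(doc)

# Index every element that carries an id attribute: id -> element.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}

def star_names(movie: ET.Element) -> List[str]:
    """Dereference the space-separated idref list on <stars> to person names."""
    refs = movie.find("stars").get("idref").split()
    return [by_id[ref].find("name").text for ref in refs]

for movie in root.findall("movie"):
    print(movie.find("title").text, "->", star_names(movie))
```

Each person is stored once and shared by reference, which is how the document encodes a graph inside a tree.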

What do we do with XML?
  Things done with XML:
    Send to partners
    Parse XML received
    Convert to RDBMS rows
    Query for particular data
    Convert to other XML
    Convert to formats other than XML
  Lots of tools/standards for these…

DTDs & understanding XML
  XML is extensible
  Advantage: when creating, we can use any tags we like
  Disadv: when reading, they can use any tags they like
  Using XML docs a priori is very difficult
  Solution: impose some constraints

DTDs
  DTD: Document Type Definition
  You and partners/vertical industry/academic discipline decide on a DTD/schema for your docs
    Specify which entities you may use/must understand
    Specify legal relationships
  DTD specifies the grammar to be used
    DTD = set of rules for creating valid entities
  DTD tells your software what to look for in doc

DTD examples
  Well-formed XML v. valid XML
  Simple example:
    http://pages.stern.nyu.edu/~mjohnson/dbms/xml/note.xml
    http://pages.stern.nyu.edu/~mjohnson/dbms/xml/badnote.xml
    http://pages.stern.nyu.edu/~mjohnson/dbms/xml/badnote2.xml
    Copy from: http://pages.stern.nyu.edu/~mjohnson/dbms/eg/xml.txt
  Partial publisher example rules:
    Root -> publisher
    Publisher -> name, book*, author*
    Book -> title, date, author+
    Author -> firstname, middlename?, lastname

Partial DTD example

  <?xml version="1.0" encoding="UTF-8" ?>
  <!DOCTYPE publisher [
  <!ELEMENT publisher (name, book*, author*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT book (title, date, author+)>
  <!ELEMENT author (firstname, middlename?, lastname)>
  <!ELEMENT firstname (#PCDATA)>
  <!ELEMENT lastname (#PCDATA)>
  <!ELEMENT middlename (#PCDATA)>
  ]>

  DTD is not XML, but can be embedded in or ref.ed from XML
  Replacement for DTDs is XML Schema

XML Applications/dialects
  MathML: Mathematical Markup Language
    http://wwwasdoc.web.cern.ch/wwwasdoc/WWW/publications/ictp99/ictp99N8059.html
  VoiceXML:
    http://newmedia.purchase.edu/~Jeanine/interfaces/rps.xml
  ChemML: Chemical Markup Language
  XHTML: HTML retrofitted as an XML application

XML Applications/dialects
  VoiceXML:
    http://newmedia.purchase.edu/~Jeanine/interfaces/rps.xml
    AT&T Directory Assistance
    http://phone.yahoo.com/
  Image from http://www.voicexml.org/tutorials/intro2.html

More XML Apps
  FIXML
    XML equiv. of FIX: Financial Information eXchange
  swiftML
    XML equiv. of SWIFT: Society for Worldwide Interbank Financial Telecommunications message format
  Apache’s Ant
    Scripting language for Java build management
    http://ant.apache.org/manual/using.html
  Many more:
    http://www-106.ibm.com/developerworks/xml/library/x-stand4/

More XML Applications/Protocols
  RSS: Rich Site Summary/Really Simple Syndication
    News sites, blogs…
      http://slate.msn.com/rss/
      http://slashdot.org/index.rss
    Screenshot
      http://paulboutin.weblogger.com/pictures/viewer$673
    More info: http://slate.msn.com/id/2096660/

  <channel>
    <title>my channel</title>
    <item>
      <title>story 1</title>
      <link>…</link>
    </item>
    <!-- other items -->
  </channel>

More XML Applications/Protocols
  SOAP: Simple Object Access Protocol
    XML-based messaging format
    Used by Google API: http://www.google.com/apis/
    Amazon API: http://amazon.com/gp/aws/landing.html
    Amazon light: http://kokogiak.com/amazon/
    Other examples:
      http://www.wired.com/wired/archive/12.03/google.html?pg=10&topic=&topic_set=
  SOAP envelope with header and body
    Request sales tax for total

  <SOAP:Envelope xmlns:SOAP="urn:schemas-xmlsoap-org:soap.v1">
    <SOAP:Header></SOAP:Header>
    <SOAP:Body>
      <GetSalesTax>
        <SalesTotal>100</SalesTotal>
      </GetSalesTax>
    </SOAP:Body>
  </SOAP:Envelope>

More XML Applications/Protocols

  <?xml version="1.0" encoding="UTF-8"?>
  <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
      <gs:doGoogleSearch xmlns:gs="urn:GoogleSearch">
        <key>%(key)s</key>
        <start>0</start>
        <maxResults>10</maxResults>
        <filter>true</filter>
        <restrict/>
        <safeSearch>false</safeSearch>
        <lr/>
      </gs:doGoogleSearch>
    </soap:Body>
  </soap:Envelope>

New topic: XML in Oracle - purchase-order e.g.

  <?xml version="1.0"?>
  <purchase_order>
    <customer_name>Alpha Tech</customer_name>
    <po_number>11257</po_number>
    <po_date>2004-01-20</po_date>
    <po_items>
      <item>
        <part_number>AI5-4557</part_number>
        <quantity>20</quantity>
      </item>
      <item>
        <part_number>EI-T5-001</part_number>
        <quantity>12</quantity>
      </item>
    </po_items>
  </purchase_order>

Storing XML data
  As of 9i, has XMLType data type
    By default, underlying storage is as CLOB

  CREATE TABLE purchase_order(
    po_id number(5) not null,
    customer_po_nbr varchar(20),
    customer_inception_date date,
    order_nbr number(5),
    purchase_order_doc xmltype,
    constraint purchase_order_pk primary key(po_id));

Loading XML into Oracle
  First, log in as sys:

  connect sys/junk as sysdba
  create directory xml_data as '/xml';
  grant read, write on directory xml_data to scott;

  Now scott can import:

  connect scott/tiger

  declare
    bf1 bfile;
  begin
    bf1 := bfilename('XML_DATA', 'purch_ord.xml');
    insert into purchase_order(po_id, purchase_order_doc)
    values(1000, xmltype(bf1, nls_charset_id('we8mswin1252')));
  end;
  /

Loading XML into Oracle
  Not just loading raw text
  XMLType data must be well-formed
    Parsable as XML
  Try modifying customer_name open tag

Accessing XML in Oracle
  Now can look at raw XML:

  SQL> SELECT purchase_order_doc
       FROM purchase_order;

  Can also use XPath to extract particular nodes and values, with extract function:

  SQL> SELECT extract(purchase_order_doc, '/purchase_order/customer_name')
       FROM purchase_order;

XPath in Oracle
  Can also extract all nodes of one type, underneath some node, with double-slash //
    All purchase order items:

  SQL> SELECT extract(purchase_order_doc, '/purchase_order//item')
       FROM purchase_order;

  NB: this is not valid XML
    No unique root
  Can request just one with bracket op
    Numbering starts at 1, not 0
    Wrong name/number: no error, no results

  SQL> SELECT extract(purchase_order_doc, '/purchase_order/po_items/item[2]')
       FROM purchase_order;
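
The same path ideas can be tried without an Oracle instance. This Python sketch uses ElementTree's limited XPath subset on the purchase-order document; unlike Oracle's extract, its paths are relative to the root element, so '/purchase_order' is dropped from each path.

```python
import xml.etree.ElementTree as ET

doc = """<purchase_order>
  <customer_name>Alpha Tech</customer_name>
  <po_number>11257</po_number>
  <po_date>2004-01-20</po_date>
  <po_items>
    <item><part_number>AI5-4557</part_number><quantity>20</quantity></item>
    <item><part_number>EI-T5-001</part_number><quantity>12</quantity></item>
  </po_items>
</purchase_order>"""

po = ET.fromstring(doc)  # po is the <purchase_order> root element

# Like //item: every item anywhere below the root.
all_items = po.findall(".//item")

# Like po_items/item[2]: positions are 1-based, as in XPath.
second = po.find("po_items/item[2]")

# Wrong name or number: no error, just no result.
missing = po.find("po_items/item[3]")

# Like extractvalue: the text of a single node rather than the node itself.
customer = po.findtext("customer_name")

print(len(all_items), second.findtext("part_number"), missing, customer)
```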

extract v. extractvalue
  extractvalue returns value, not whole node:

  SQL> SELECT extractvalue(purchase_order_doc, '/purchase_order/customer_name')
       FROM purchase_order;

  vs.

  SQL> SELECT extract(purchase_order_doc, '/purchase_order/customer_name')
       FROM purchase_order;

  extractvalue applies only to unique nodes:

  SQL> SELECT extractvalue(purchase_order_doc, '/purchase_order/po_items')
       FROM purchase_order;

existsnode function
  Can check whether node/location exists with existsnode function
    Returns 1 or 0
    SQL> SELECT po_id FROM purchase_order
         WHERE existsnode(purchase_order_doc, '/purchase_order/customer_name') = 1;
  Also applies to bracketed paths:
    SQL> SELECT po_id FROM purchase_order
         WHERE existsnode(purchase_order_doc, '/purchase_order/po_items/item[1]') = 1;

Moving data from XML to relations
  To move single values from XML to tables, can simply use extractvalue in UPDATE statements:
    SQL> UPDATE purchase_order
         SET order_nbr = 7101,
             customer_po_nbr = extractvalue(purchase_order_doc, '/purchase_order/po_number'),
             customer_inception_date =
               to_date(extractvalue(purchase_order_doc, '/purchase_order/po_date'), 'yyyy-mm-dd');

Moving data from XML to relations
  What about moving a set of nodes?
    The two item nodes:
    SQL> SELECT extract(purchase_order_doc, '/purchase_order//item')
         FROM purchase_order;
  Use xmlsequence to get a varray of items
  Use TABLE to convert to a relation
    SQL> SELECT rownum, item.*
         FROM TABLE(SELECT xmlsequence(extract(purchase_order_doc, '/purchase_order//item'))
                    FROM purchase_order) item;

Moving data from XML to relations
  Result is a two-row relation with XMLTypes
  Can use extractvalue to extract this data
  First, create destination table:
    CREATE TABLE LINE_ITEM(
      ORDER_NBR NUMBER(9) NOT NULL,
      PART_NBR VARCHAR2(20) NOT NULL,
      QTY NUMBER(5) NOT NULL,
      FILLED_QTY NUMBER(5),
      CONSTRAINT line_item_pk PRIMARY KEY (ORDER_NBR, PART_NBR));

Moving data from XML to relations
  Then insert results:
    SQL> INSERT INTO line_item(order_nbr, part_nbr, qty)
         SELECT 7109,
                extractvalue(column_value, '/item/part_number'),
                extractvalue(column_value, '/item/quantity')
         FROM TABLE(SELECT xmlsequence(extract(purchase_order_doc, '/purchase_order//item'))
                    FROM purchase_order);
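The shredding step above (a set of item nodes becomes rows, and one value per column is pulled from each node) can be sketched outside Oracle. This is an illustrative stand-in, not the xmlsequence/TABLE mechanism itself: it uses Python's ElementTree and an in-memory SQLite table, and the sample document, its values, and the SQLite column types are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; element names follow the slides, values are invented.
DOC = """
<purchase_order>
  <po_items>
    <item><part_number>A-1</part_number><quantity>2</quantity></item>
    <item><part_number>B-2</part_number><quantity>5</quantity></item>
  </po_items>
</purchase_order>
"""
root = ET.fromstring(DOC)

conn = sqlite3.connect(":memory:")
# Same shape as the LINE_ITEM table, with SQLite types substituted.
conn.execute("""
    CREATE TABLE line_item (
        order_nbr  INTEGER NOT NULL,
        part_nbr   TEXT    NOT NULL,
        qty        INTEGER NOT NULL,
        filled_qty INTEGER,
        PRIMARY KEY (order_nbr, part_nbr))
""")

# One tuple per <item> node: the analogue of one row per sequence element,
# with findtext playing the role of extractvalue on each node.
rows = [
    (7109, item.findtext("part_number"), int(item.findtext("quantity")))
    for item in root.findall(".//item")
]
conn.executemany(
    "INSERT INTO line_item (order_nbr, part_nbr, qty) VALUES (?, ?, ?)", rows
)

loaded = conn.execute(
    "SELECT order_nbr, part_nbr, qty FROM line_item ORDER BY part_nbr"
).fetchall()
print(loaded)
```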

XML Schemas and Oracle
  By default, XML must be well-formed to be read into the XMLType field
  XML is valid if it conforms to a schema
  To use a schema with Oracle, must first register it:
    declare
      bf1 bfile;
    begin
      bf1 := bfilename('XML_DATA', 'purch_ord.xsd');
      dbms_xmlschema.registerschema('http://localhost:8080/home/xml/schemas/purch_ord.xsd', bf1);
    end;

XML Schemas and Oracle
  With schema registered, can apply it to an XMLType field:
    CREATE TABLE purchase_order2 (
      po_id NUMBER(5) NOT NULL,
      customer_po_nbr VARCHAR2(20),
      customer_inception_date DATE,
      order_nbr NUMBER(5),
      purchase_order_doc XMLTYPE,
      CONSTRAINT purchase_order2_pk PRIMARY KEY (po_id))
    XMLTYPE COLUMN purchase_order_doc
      XMLSCHEMA "http://localhost:8080/home/xml/schemas/purch_ord.xsd"
      ELEMENT "purchase_order";

Importing to schema field
  Try to import xml file, get error:
    declare
      bf1 bfile;
    begin
      bf1 := bfilename('XML_DATA', 'purch_ord.xml');
      insert into purchase_order2(po_id, purchase_order_doc)
      values (2000, XMLTYPE(bf1, nls_charset_id('WE8MSWIN1252')));
    end;

Importing to schema field
  Root node of XML must specify the schema
  Change root to the following:
    <purchase_order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="http://localhost:8080/home/xml/schemas/purch_ord.xsd">
  Now can import
  Also fails if extra or missing nodes
    Modify company_name node
    Add new comments node

Can check to see whether schema is used
  Can call isSchemaBased(), getSchemaURL() and isSchemaValid() on XMLType fields:
    SQL> select po.purchase_order_doc.isSchemaBased(),
                po.purchase_order_doc.getSchemaURL(),
                po.purchase_order_doc.isSchemaValid()
         from purchase_order2 po;

Updating XMLType data
  Can update XMLType data with ordinary UPDATE statements:
    SQL> UPDATE purchase_order po
         SET po.purchase_order_doc =
             XMLTYPE(BFILENAME('XML_DATA', 'purch_ord_alt.xml'), nls_charset_id('WE8MSWIN1252'))
         WHERE po.po_id = 2000;
  Replaces whole XMLType object with new one

Updating XMLType data
  Can also modify the existing XMLType object
    By writing node values
  updateXML() function does search/replace
    But searches for node, not value
    SQL> SELECT extract(po.purchase_order_doc, '/purchase_order/customer_name')
         FROM purchase_order po
         WHERE po_id = 1000;
    SQL> UPDATE purchase_order po
         SET po.purchase_order_doc =
             updateXML(po.purchase_order_doc,
                       '/purchase_order/customer_name/text()', 'some other company')
         WHERE po.po_id = 1000;

Updating XMLType data
  Can also write whole node, using XMLType:
    SQL> UPDATE purchase_order po
         SET po.purchase_order_doc =
             updateXML(po.purchase_order_doc,
                       '/purchase_order/customer_name',
                       XMLTYPE('<customer_name>some third company</customer_name>'))
         WHERE po.po_id = 1000;
    SQL> SELECT extract(po.purchase_order_doc, '/purchase_order/customer_name')
         FROM purchase_order po
         WHERE po_id = 1000;
  Validation/well-formedness is still checked

Updating XMLType data
  And can update items in a collection:
    SQL> SELECT extract(po.purchase_order_doc, '/purchase_order//item')
         FROM purchase_order po
         WHERE po.po_id = 1000;
    SQL> UPDATE purchase_order po
         SET po.purchase_order_doc =
             updateXML(po.purchase_order_doc,
                       '/purchase_order/po_items/item[1]',
                       XMLTYPE('<item><part_number>T-1000</part_number><quantity>33</quantity></item>'))
         WHERE po.po_id = 1000;

Converting relational data to XML
  Saw how to put XML in a table
  Conversely, can convert ordinary relational data to XML
  XMLElement() generates an XML node
  First, create supplier table:
    CREATE TABLE SUPPLIER(
      SUPPLIER_ID NUMBER(5) NOT NULL,
      NAME VARCHAR2(30) NOT NULL,
      PRIMARY KEY (SUPPLIER_ID));
    insert into supplier values(1, 'Acme');
    insert into supplier values(2, 'Tilton');
    insert into supplier values(3, 'Eastern');

Converting relational data to XML
  Now can call XMLElement function to wrap values in tags:
    SELECT XMLElement("supplier_id", s.supplier_id) ||
           XMLElement("name", s.name) xml_fragment
    FROM supplier s;
  Don’t concatenate like this!
    Turns to strings, escapes < >
    Error in book
  And can build it up by nesting instead:
    SELECT XMLElement("supplier",
                      XMLElement("supplier_id", s.supplier_id),
                      XMLElement("name", s.name))
    FROM supplier s;

XMLForest()
  More simply, can use XMLForest() function:
    SELECT XMLElement("supplier", XMLForest(s.supplier_id, s.name))
    FROM supplier s;

XMLAgg()
  Can use XMLAgg() to put nodes together inside another node:
    SELECT XMLElement("supplier_list",
                      XMLAgg(XMLElement("supplier",
                                        XMLElement("supplier_id", s.supplier_id),
                                        XMLElement("name", s.name)))) xml_document
    FROM supplier s;
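The difference between nesting nodes and concatenating their text can be seen in any XML library. The sketch below uses Python's ElementTree as a stand-in for XMLElement and XMLAgg (the helper name supplier_node is hypothetical; the three supplier rows are the ones inserted on the earlier slide). It builds one supplier_list node from the rows, then shows what happens when markup is glued together as a string and used as a value.

```python
import xml.etree.ElementTree as ET

# The three rows inserted into the supplier table on the earlier slide.
suppliers = [(1, "Acme"), (2, "Tilton"), (3, "Eastern")]


def supplier_node(supplier_id, name):
    """Build <supplier><supplier_id>..</supplier_id><name>..</name></supplier>."""
    node = ET.Element("supplier")
    ET.SubElement(node, "supplier_id").text = str(supplier_id)
    ET.SubElement(node, "name").text = name
    return node


# Aggregation: collect one node per row under a single parent node.
supplier_list = ET.Element("supplier_list")
for sid, name in suppliers:
    supplier_list.append(supplier_node(sid, name))
nested = ET.tostring(supplier_list, encoding="unicode")

# Pitfall: concatenated markup is only a string, so when it is used as an
# element's value the angle brackets are escaped instead of forming child nodes.
wrapper = ET.Element("supplier")
wrapper.text = "<supplier_id>1</supplier_id>" + "<name>Acme</name>"
escaped = ET.tostring(wrapper, encoding="unicode")

print(nested)
print(escaped)
```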

New topic: Data Warehousing
  Physical warehouse: stores different kinds of items
    combined from different sources in supply chain
    access items as a combined package
    “Synergy”
  DW is the system containing the data from many DBs
  OLAP is the system for easily querying the DW
    Online analytical processing
    front-end to DW & stats

Integrating Data
  Ad hoc combination of DBs from different sources can be problematic
  Data may be spread across many systems
    geographically
    by division
    different systems from before mergers…

Conversion/scrubbing/merging
  Lots of issues…
  Different types of data
    Varchar(255) v. char(30)
  Different values for data
    ‘GREEN’/‘GR’/‘2’
  Semantic differences
    Cars v. Automobiles
  Missing values
    Handle with nulls or XML

Federated DBs
  Situation: n different DBs must work together
  One idea: write programs for each DB to talk to each of the others
    How many programs required?
    Like ambassadors for each country

Federated DBs
  Better idea: introduce another DB
    write programs for it to talk to each other DB
  Now how many programs?
    English in business, French in diplomacy
  Warehousing
    Refreshed nightly
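The two "how many programs?" questions have a quick arithmetic answer. The sketch below assumes one two-way program per pair of DBs in the first design and one program per source DB in the second; if each direction needs its own program, both counts simply double. The function names are hypothetical.

```python
def pairwise_programs(n: int) -> int:
    """Every DB talks directly to every other: one program per unordered pair."""
    return n * (n - 1) // 2


def hub_programs(n: int) -> int:
    """Every DB talks only to the one central DB: one program per source DB."""
    return n


# Compare the two designs as the number of source DBs grows.
for n in (3, 5, 10, 50):
    print(n, pairwise_programs(n), hub_programs(n))
```

The pairwise count grows quadratically while the hub count grows linearly, which is the argument for introducing the central DB.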

OLTP v. OLAP
  DWs usually not updated in real-time
    data is usually not live
    but care about higher-level, longer-term patterns
    For “knowledge workers”/decision-makers
  Live data is in system used by OLTP
    online transaction processing
    E.g., airline reservations
    OLTP data loaded into DW periodically, say nightly

Utilizing Data
  Situation: each time manager has hunch
    requests custom reports
    directs programmers to write/modify SQL app to produce these results
    on higher or lower levels, for different specifics
  Problem: too difficult/expensive/slow
    too great a time lag

EISs
  Could just write queries at command-prompt
  But decision makers aren’t (all) SQL programmers
  Soln: create an executive information system
    provides friendly front-end to common, important queries
    basically a simple DB front-end
    your project part 5
  GROUP BY queries are particularly applicable…

EISs v. OLAP
  Okay for fixed set of queries
  But what if queries are open-ended?
  Q: What’s driving sales in the Northeast?
    What’s the source cause?
    Result from one query influences next query tried
  OLAP systems are interactive:
    run query
    analyze results
    think of new query
    repeat

Star Schemas
  Popular schema for DW data
  One central table surrounded by specific tables
    Center: fact table
    Extremities: data (dimension) tables
  Fields in fact table are foreign keys to data tables
  Normalization -> Snowflake Schema
    May not be worthwhile…

Dates and star schemas
  OLAP behaves as though you had a Days table, with every possible row
    Dates(day, week, month, year, DID)
    (5, 27, 7, 2000)
  Can join on Days like any other table

Dates and star schemas
  E.g.: products x salesperson x region x date
    Products sold by salespeople in regions on dates
  Regular dim tables:
    Product(PID, name, color)
    Emp(name, SSN, sal)
    Region(name, RID)
  Fact table:
    Sales(PID, DID, SSN, RID)
  Interpret as a cube (cross product of all dimensions)
    Can have both data and stats
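A star schema like the one above can be tried out in a few lines. This is a minimal sketch under stated assumptions: it uses an in-memory SQLite database in place of Oracle, the Days table stands in for the dates dimension, and every row of data is invented. The fact table holds only foreign keys, so the query counts sales after joining to two of the dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables, with the columns named on the slide.
CREATE TABLE Product (PID INTEGER PRIMARY KEY, name TEXT, color TEXT);
CREATE TABLE Emp     (SSN INTEGER PRIMARY KEY, name TEXT, sal INTEGER);
CREATE TABLE Region  (RID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Days    (DID INTEGER PRIMARY KEY, day INTEGER, week INTEGER,
                      month INTEGER, year INTEGER);

-- Fact table: one row per sale, holding only foreign keys to the dimensions.
CREATE TABLE Sales (
    PID INTEGER REFERENCES Product(PID),
    DID INTEGER REFERENCES Days(DID),
    SSN INTEGER REFERENCES Emp(SSN),
    RID INTEGER REFERENCES Region(RID));

-- Invented sample data.
INSERT INTO Product VALUES (1, 'widget', 'green'), (2, 'gadget', 'red');
INSERT INTO Emp     VALUES (100, 'Ann', 50000), (200, 'Bob', 45000);
INSERT INTO Region  VALUES (1, 'Northeast'), (2, 'West');
INSERT INTO Days    VALUES (1, 5, 27, 7, 2000), (2, 6, 27, 7, 2000);
INSERT INTO Sales   VALUES (1, 1, 100, 1), (1, 2, 100, 1),
                           (2, 2, 200, 1), (2, 1, 200, 2);
""")

# Number of sales per region and month: join the fact table to the two
# dimensions of interest and group on their attributes.
rows = conn.execute("""
    SELECT r.name, d.month, COUNT(*)
    FROM Sales s
    JOIN Region r ON r.RID = s.RID
    JOIN Days d   ON d.DID = s.DID
    GROUP BY r.name, d.month
    ORDER BY r.name, d.month
""").fetchall()
print(rows)
```

Grouping on a different dimension attribute (product color, salesperson, year) only changes which dimension tables are joined; the fact table stays the same.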

Drill-down & roll-up
  Imagine: notice some region’s sales way up
  Why? Good salesperson? Some popular product there?
  Maybe need to search by month, or month and product, abstract back up to just product…
  “slicing & dicing”

OLAP and data warehousing
  Could write GROUP BY queries for each
  OLAP systems provide simpler, non-SQL interface for this sort of thing
  Vendors: MicroStrategy, SAP, etc.
  OTOH: DW-style operators have been added to SQL and some DBMSs…

DW extensions in SQL: ROLLUP (Oracle)
  Suppose have orders table (from two years), with region and date info:
    SQL> column month format a10
    SQL> @mosql2_data
    SQL> describe all_orders
  Can select total sales:
    SELECT sum(o.tot_sales)
    FROM all_orders o join region r
    ON r.region_id = o.region_id;
  Examples derived from Mastering Oracle SQL, 2e (O’Reilly)
    Get data here: http://examples.oreilly.com/mastorasql2/mosql2_data.sql

DW extensions in SQL: ROLLUP (Oracle)
  Can write GROUP BY queries for year or region or both:
    SELECT r.name region, o.year, sum(o.tot_sales)
    FROM all_orders o join region r
    ON r.region_id = o.region_id
    GROUP BY (r.name, o.year);

DW extensions in SQL: ROLLUP (Oracle)
  ROLLUP operator
    Extension of GROUP BY
    Does GROUP BY on several levels, simultaneously
    Order matters
  Get sales totals for each region/year pair, each region, and the grand total:
    SELECT r.name region, o.year, sum(o.tot_sales)
    FROM all_orders o join region r
    ON r.region_id = o.region_id
    GROUP BY ROLLUP (r.name, o.year);
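One way to see what ROLLUP (r.name, o.year) computes is to write out the plain GROUP BY queries it stands for and glue them together with UNION ALL. The sketch below does this in an in-memory SQLite database, which is used here only as a stand-in that lacks ROLLUP; the tiny region and all_orders tables are invented and carry just the columns the query needs, not the full tables from the book's data script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down, invented versions of the two tables used on the slide.
CREATE TABLE region (region_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE all_orders (region_id INTEGER, year INTEGER, tot_sales INTEGER);
INSERT INTO region VALUES (1, 'East'), (2, 'West');
INSERT INTO all_orders VALUES (1, 2000, 10), (1, 2001, 20),
                              (2, 2000, 30), (2, 2001, 40);
""")

# ROLLUP (region, year) corresponds to three grouping levels:
# (region, year), (region) and the grand total. NULL marks a rolled-up column.
ROLLUP_BY_HAND = """
SELECT r.name AS region, o.year AS year, SUM(o.tot_sales) AS total
FROM all_orders o JOIN region r ON r.region_id = o.region_id
GROUP BY r.name, o.year
UNION ALL
SELECT r.name, NULL, SUM(o.tot_sales)
FROM all_orders o JOIN region r ON r.region_id = o.region_id
GROUP BY r.name
UNION ALL
SELECT NULL, NULL, SUM(o.tot_sales)
FROM all_orders o JOIN region r ON r.region_id = o.region_id
"""

rows = conn.execute(ROLLUP_BY_HAND).fetchall()
for row in rows:
    print(row)
```

Reversing the column order to ROLLUP (o.year, r.name) would replace the middle query with one grouped by year alone, which is why order matters.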

DW extensions in SQL: ROLLUP (Oracle)
  Change the order of the group fields to get a different sequence of groups
  To get totals for each year/region pair, each year, and the grand total, just reverse the group-by order:
    SELECT o.year, r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
    ON r.region_id = o.region_id
    GROUP BY ROLLUP (o.year, r.name);

DW extensions in SQL: ROLLUP (Oracle)
  Adding more dimensions, like month, is easy (apart from formatting):
    SELECT o.year, to_char(to_date(o.month, 'MM'), 'Month') month,
           r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
    ON r.region_id = o.region_id
    GROUP BY ROLLUP (o.year, o.month, r.name);
  NB: summing happens on each level

DW extensions in SQL: ROLLUP (Oracle)
  If desired, can combine fields for the sake of grouping:
    SELECT o.year, to_char(to_date(o.month, 'MM'), 'Month') month,
           r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
    ON r.region_id = o.region_id
    GROUP BY ROLLUP ((o.year, o.month), r.name);

DW extensions in SQL: CUBE (Oracle)
  Another GROUP BY extension: CUBE
    Subtotals all possible combinations of group-by fields (powerset)
    Syntax: “ROLLUP” -> “CUBE”
    Order of fields doesn’t matter (apart from the ordering of result rows)
  To get subtotals for each region/month pair, each region, each month, and the grand total:
    SELECT to_char(to_date(o.month, 'MM'), 'Month') month,
           r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
    ON r.region_id = o.region_id
    GROUP BY CUBE (o.month, r.name);

DW extensions in SQL: CUBE (Oracle)
  Again, can easily add more dimensions:
    SELECT o.year, to_char(to_date(o.month, 'MM'), 'Month') month,
           r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
    ON r.region_id = o.region_id
    GROUP BY CUBE (o.year, o.month, r.name);

DW SQL exts: GROUPING SETS (Oracle)
  That’s a lot of rows
  Instead of a cube of all combinations, maybe we just want the totals for each individual field:
    SELECT o.year, to_char(to_date(o.month, 'MM'), 'Month') month,
           r.name region, sum(o.tot_sales)
    FROM all_orders o join region r
    ON r.region_id = o.region_id
    GROUP BY GROUPING SETS (o.year, o.month, r.name);
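The three operators differ only in which grouping levels they produce from the same list of fields. The sketch below lists those levels in plain Python (the function names are hypothetical and nothing here talks to a database): ROLLUP yields the prefixes of the field list, CUBE yields every subset, and GROUPING SETS with single fields yields one level per field.

```python
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP (a, b, c): prefixes, longest first: (a,b,c), (a,b), (a), ()."""
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE (a, b, c): every subset of the fields (the powerset)."""
    return [c for k in range(len(cols), -1, -1) for c in combinations(cols, k)]


def single_field_sets(cols):
    """GROUPING SETS (a, b, c): one grouping level per individual field."""
    return [(c,) for c in cols]


cols = ["year", "month", "region"]
for label, sets in (("ROLLUP", rollup_sets(cols)),
                    ("CUBE", cube_sets(cols)),
                    ("GROUPING SETS", single_field_sets(cols))):
    # The empty tuple () is the grand-total level.
    print(label, len(sets), sets)
```

For n fields this gives n + 1 levels for ROLLUP, 2^n for CUBE, and n for single-field GROUPING SETS, which is why the CUBE output grows so quickly as dimensions are added.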

Next
  Final evals
  More lab…

That’s all, folks!
  Selected solutions to exercises:
    sqlzoo ~ “Answers” on sqlzoo.net
    PL/SQL ~ http://pages.stern.nyu.edu/~mjohnson/oracle/archive/fall04/plsql/
  mpjohnson-at-gmail.com