
Oracle Optimizer

Combining Output From Multiple Index Scans

• AND-EQUAL:

– select * from sailors
  where sname = 'Jim' and rating = 10

• Suppose we have 2 indexes: one on sname and one on rating

TABLE ACCESS BY ROWID
  AND-EQUAL
    INDEX RANGE SCAN Sailors(sname)
    INDEX RANGE SCAN Sailors(rating)

• Suppose we also have an index on (sname, rating)

– How should the query be performed?
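The AND-EQUAL step can be pictured as intersecting the rowid lists returned by the two index range scans and then fetching only the surviving rows. A minimal Python sketch, assuming each equality range scan returns rowids in ascending order; the table is a hypothetical Python list in which a row's position stands in for its rowid, and build_index is a hypothetical dictionary-based stand-in for a real index:

```python
from collections import defaultdict

def build_index(table, column):
    """Hypothetical single-column index: value -> ascending list of rowids."""
    index = defaultdict(list)
    for rowid, row in enumerate(table):
        index[row[column]].append(rowid)
    return index

def and_equal(rowids_a, rowids_b):
    """Merge-intersect two ascending rowid lists (the AND-EQUAL step)."""
    i, j, result = 0, 0, []
    while i < len(rowids_a) and j < len(rowids_b):
        if rowids_a[i] == rowids_b[j]:
            result.append(rowids_a[i])
            i += 1
            j += 1
        elif rowids_a[i] < rowids_b[j]:
            i += 1
        else:
            j += 1
    return result

# Hypothetical Sailors rows: (sname, rating)
sailors = [("Jim", 10), ("Ann", 10), ("Jim", 7), ("Jim", 10), ("Bob", 3)]
sname_idx = build_index(sailors, 0)
rating_idx = build_index(sailors, 1)

# INDEX RANGE SCAN on each index, AND-EQUAL, then TABLE ACCESS BY ROWID
rowids = and_equal(sname_idx["Jim"], rating_idx[10])
rows = [sailors[r] for r in rowids]
print(rowids, rows)   # [0, 3] [('Jim', 10), ('Jim', 10)]
```

Only rows that satisfy both predicates are fetched from the table; the rows that match just one index are discarded using rowids alone.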

Operations that Manipulate Data Sets

• Up until now, all operations returned the rows as they were found

• There are operations that must find all rows before returning a single row

• Try to avoid these operations for online users!

– SORT ORDER BY: query with order by

select sname, age

from Sailors

order by age;


– SORT UNIQUE: sorting records while eliminating duplicates, e.g., a query with distinct, or a query with minus, intersect or union

select DISTINCT age from Sailors;

– SORT AGGREGATE, SORT GROUP BY: queries with aggregate functions (like MIN, MAX) or with group by
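The difference between returning rows as they are found and operations that must see every row first can be sketched with Python generators. The scan and SORT UNIQUE functions below are hypothetical stand-ins for Oracle row sources, and the list of ages is made-up data:

```python
def full_scan(table, log):
    """TABLE ACCESS FULL stand-in: yields rows one at a time, logging each read."""
    for row in table:
        log.append(row)
        yield row

def sort_unique(rows):
    """SORT UNIQUE stand-in: must read every input row before emitting the first."""
    previous = object()  # sentinel that never equals a real row
    for row in sorted(rows):  # sorted() consumes the whole input
        if row != previous:
            yield row
            previous = row

ages = [30, 25, 30, 40]

pipelined_log = []
first = next(full_scan(ages, pipelined_log))
print(first, len(pipelined_log))           # 30 1

blocking_log = []
first_unique = next(sort_unique(full_scan(ages, blocking_log)))
print(first_unique, len(blocking_log))     # 25 4
```

The scan hands back its first row after a single read, while SORT UNIQUE has read all four rows before it can return anything, which is what makes such operations slow to respond for online users.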

Is the table always accessed?

What if there is no index?


• Consider the query:

– select sname from sailors

union

select bname from boats;


• Consider the query:

– select sname from sailors

minus

select bname from boats;

How do you think Oracle implements intersect? And union all?
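One sort-based way to evaluate the union and minus queries, following the SORT UNIQUE description above, is to sort-unique the inputs and then merge them. A Python sketch; the function names and the sailor and boat names are hypothetical, and this is not necessarily the exact plan Oracle chooses:

```python
def sort_unique(rows):
    """Sort the input and drop adjacent duplicates (SORT UNIQUE)."""
    result = []
    for row in sorted(rows):
        if not result or result[-1] != row:
            result.append(row)
    return result

def union(a, b):
    """UNION: sort-unique the concatenation of both inputs."""
    return sort_unique(list(a) + list(b))

def minus(a, b):
    """MINUS: sort-unique both inputs, then merge, keeping rows found only in a."""
    a, b = sort_unique(a), sort_unique(b)
    i, j, result = 0, 0, []
    while i < len(a):
        if j == len(b) or a[i] < b[j]:
            result.append(a[i])
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:  # the row appears in both inputs: drop it
            i += 1
            j += 1
    return result

snames = ["Dustin", "Lubber", "Rusty", "Dustin"]       # hypothetical sailor names
bnames = ["Interlake", "Clipper", "Marine", "Rusty"]   # hypothetical boat names
print(union(snames, bnames))
print(minus(snames, bnames))   # ['Dustin', 'Lubber']
```

Both operators need fully sorted inputs before the first result row is known, so they belong to the blocking group of operations.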

• Select age, COUNT(*)

from Sailors

GROUP BY age

SORT GROUP BY
  TABLE ACCESS FULL
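The SORT GROUP BY step in this plan can be pictured as sorting the rows on the grouping column and then counting each run of equal values. A minimal Python sketch with hypothetical Sailors data:

```python
from itertools import groupby

def sort_group_by_count(rows, key):
    """SORT GROUP BY sketch: sort on the grouping key, then count each run of equal keys."""
    ordered = sorted(rows, key=key)   # the SORT step: needs every row before grouping
    return [(k, sum(1 for _ in run)) for k, run in groupby(ordered, key=key)]

# Hypothetical Sailors rows: (sname, age)
sailors = [("Dustin", 45), ("Lubber", 55), ("Rusty", 35), ("Horatio", 35)]
print(sort_group_by_count(sailors, key=lambda row: row[1]))
# [(35, 2), (45, 1), (55, 1)]
```

`itertools.groupby` only merges adjacent equal keys, which is exactly why the sort has to come first.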


Distinct

• What should Oracle do when processing the following query (assuming that sid is the primary key)?

– select distinct sid
  from Sailors

Join Methods

• Select * from Sailors, Reserves

where Sailors.sid = Reserves.sid

• Oracle can use an index on Sailors.sid or on Reserves.sid (note that it will not use both)

• Join methods: MERGE JOIN, NESTED LOOPS, HASH JOIN

Nested Loops Joins

• Block nested loop join

NESTED LOOPS
  TABLE ACCESS FULL OF our_outer_table
  TABLE ACCESS FULL OF our_inner_table

• Index nested loop join

NESTED LOOPS
  TABLE ACCESS FULL OF our_outer_table
  TABLE ACCESS BY ROWID OF our_inner_table
    INDEX RANGE SCAN OF inner_table_index
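The two nested loops plans can be sketched in Python. Assumptions: tables are Python lists, a row's position stands in for its rowid, build_index is a hypothetical dictionary-based stand-in for inner_table_index, block_size counts rows rather than disk pages, and the Sailors/Reserves rows are made up:

```python
from collections import defaultdict

def block_nested_loops_join(outer, inner, outer_key, inner_key, block_size=2):
    """One full scan of the inner table per block of outer rows."""
    for start in range(0, len(outer), block_size):
        block = outer[start:start + block_size]   # outer rows held in memory
        for s in inner:                            # TABLE ACCESS FULL of the inner table
            for r in block:
                if outer_key(r) == inner_key(s):
                    yield (r, s)

def build_index(table, key):
    """Hypothetical stand-in for inner_table_index: key -> list of rowids."""
    index = defaultdict(list)
    for rowid, row in enumerate(table):
        index[key(row)].append(rowid)
    return index

def index_nested_loops_join(outer, inner, outer_key, inner_index):
    """For each outer row, probe the inner index, then fetch inner rows by rowid."""
    for r in outer:
        for rowid in inner_index.get(outer_key(r), []):   # INDEX RANGE SCAN
            yield (r, inner[rowid])                        # TABLE ACCESS BY ROWID

def sid(row):
    return row[0]

sailors = [(22, "Dustin"), (31, "Lubber"), (58, "Rusty")]   # (sid, sname)
reserves = [(22, 101), (58, 103), (22, 102)]                  # (sid, bid)

bnl = list(block_nested_loops_join(sailors, reserves, sid, sid))
inl = list(index_nested_loops_join(sailors, reserves, sid, build_index(reserves, sid)))
print(sorted(bnl) == sorted(inl))   # True
```

Both variants return the first matches as soon as they are found, which fits the "results returned online" use case; the index version avoids rescanning the inner table for every outer block.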

When Are Nested Loops Joins Used?

• If tables are of unequal size

• If results should be returned online

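Hash Join

//Partition R into k partitions with hash function h (applied to the join attribute)
foreach tuple r in R do          //flush a buffer page to disk when it fills
    read r and add it to buffer page h(r)
//Partition S into k partitions with the same h
foreach tuple s in S do          //flush a buffer page to disk when it fills
    read s and add it to buffer page h(s)
for l = 1..k
    //Build in-memory hash table for Rl using a second hash function h2
    foreach tuple r in Rl do
        read r and insert it into the hash table using h2
    //Probe the hash table with the tuples of Sl
    foreach tuple s in Sl do
        read s and probe the table using h2
        output matching pairs <r,s>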
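A runnable Python version of the partition-then-build/probe pseudocode above. Assumptions: partitions are kept in in-memory lists instead of buffer pages flushed to disk, Python's built-in `hash` modulo k plays the role of h, a Python dict (with its own internal hashing) plays the role of h2, k=4 is arbitrary, and the sample rows are hypothetical:

```python
def grace_hash_join(R, S, r_key, s_key, k=4):
    """Partition R and S with h, then build on each Rl and probe it with Sl."""
    def h(value):                      # partitioning hash function h
        return hash(value) % k

    # Phase 1: partition both relations with the same h
    r_parts = [[] for _ in range(k)]
    s_parts = [[] for _ in range(k)]
    for r in R:
        r_parts[h(r_key(r))].append(r)
    for s in S:
        s_parts[h(s_key(s))].append(s)

    # Phase 2: for l = 1..k, build an in-memory table on Rl and probe it with Sl
    for r_l, s_l in zip(r_parts, s_parts):
        table = {}                     # the dict stands in for the h2 hash table
        for r in r_l:
            table.setdefault(r_key(r), []).append(r)
        for s in s_l:
            for r in table.get(s_key(s), []):
                yield (r, s)           # output matching pair <r, s>

def sid(row):
    return row[0]

sailors = [(22, "Dustin"), (31, "Lubber"), (58, "Rusty")]   # (sid, sname)
reserves = [(22, 101), (58, 103), (22, 102)]                  # (sid, bid)
print(sorted(grace_hash_join(sailors, reserves, sid, sid)))
```

Matching tuples always land in partitions with the same index, because both relations are partitioned with the same h; that is why each Rl only needs to be probed with its own Sl.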

Hash Join Plan

HASH JOIN
  TABLE ACCESS FULL OF table_A
  TABLE ACCESS FULL OF table_B

When Are Hash Joins Used?

• If tables are small

• If results should be returned online

Sort-Merge Join Plan

MERGE JOIN
  SORT JOIN
    TABLE ACCESS FULL OF table_A
  SORT JOIN
    TABLE ACCESS FULL OF table_B
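A Python sketch of the sort-merge plan: both inputs are sorted on the join key (the two SORT JOIN steps) and then merged, emitting the cross product of each group of equal keys. The sample rows are hypothetical:

```python
def sort_merge_join(R, S, r_key, s_key):
    """MERGE JOIN over two inputs that are first sorted on the join key."""
    R = sorted(R, key=r_key)   # SORT JOIN on table_A
    S = sorted(S, key=s_key)   # SORT JOIN on table_B
    i = j = 0
    while i < len(R) and j < len(S):
        kr, ks = r_key(R[i]), s_key(S[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product
            i_end = i
            while i_end < len(R) and r_key(R[i_end]) == kr:
                i_end += 1
            j_end = j
            while j_end < len(S) and s_key(S[j_end]) == kr:
                j_end += 1
            for r in R[i:i_end]:
                for s in S[j:j_end]:
                    yield (r, s)
            i, j = i_end, j_end

def sid(row):
    return row[0]

sailors = [(58, "Rusty"), (22, "Dustin"), (31, "Lubber")]   # (sid, sname)
reserves = [(22, 101), (58, 103), (22, 102)]                  # (sid, bid)
print(list(sort_merge_join(sailors, reserves, sid, sid)))
```

No row can be produced until both sorts have finished, so the plan behaves like the blocking operations discussed earlier.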

When Are Sort/Merge Joins Used?

• Performs badly when tables are of unequal size. Why?

Hints

• You can give the optimizer hints about how to perform query evaluation

• Hints are written in /*+ */ right after the SELECT keyword

• Note: These are only hints. The Oracle optimizer can choose to ignore your hints

Examples

Select /*+ FULL (sailors) */ sid
From sailors
Where sname = 'Joe';

Select /*+ INDEX (sailors) */ sid
From sailors
Where sname = 'Joe';

Select /*+ INDEX (S s_ind) */ S.sid
From sailors S, reserves R
Where S.sid = R.sid AND sname = 'Joe';

(When a table has an alias in the query, the hint must name the alias, not the table.)

More Examples

Select /*+ USE_NL (S) */ S.sid
From sailors S, reserves R
Where S.sid = R.sid AND sname = 'Joe';

(The table named in USE_NL is used as the inner table of the nested loops join.)

Select /*+ USE_MERGE (S R) */ S.sid
From sailors S, reserves R
Where S.sid = R.sid AND sname = 'Joe';

Select /*+ USE_HASH (S R) */ S.sid
From sailors S, reserves R
Where S.sid = R.sid AND sname = 'Joe';

Information Retrieval and DB

CONTAINS

• Introduces text search into SQL

• CONTAINS operator

select Name
from article
where CONTAINS(abstract, 'play') > 0;

• Search terms can be combined with OR and AND

Stemming

• Given the “stem” of a word, Oracle will expand the list of words to search for to include all words having the same stem

– Stem of plays, played, playing, playful: play

– where CONTAINS(abstract, '$play') > 0;

Ranking

• We need to rank the retrieved tuples according to their relevance

– Open challenge

– Several implementations for Oracle

The following slides are based on those of Dr. Sara Cohen

The Vector Space Model

• The Vector Space Model (VSM) is a way of representing text data through the words that it contains

• It is a standard technique in Information Retrieval

• In the following, we call such text data a document (as in classical IR)

• The VSM allows decisions to be made about which documents are similar to each other and to keyword queries

How Does it Work?

• Each document is represented as a vector which contains a value for each word in the vocabulary

– this value is 0 if the word does not appear in the document

• Similarly, a query is represented as a vector

• The rank of a document with respect to the query is determined by how close their vectors are

Example: Boolean Value

• P1 = “I live in a green

house with a green roof”

• P2 = “There is no life

form on Mars”

• P3 = “Men love green

cars”

• P4 = “I saw some little

green men yesterday”

Word    P1  P2  P3  P4
a       1   0   0   0
cars    0   0   1   0
green   1   0   1   1
house   1   0   0   0
I       0   0   0   1
is      1   1   0   0
life    0   1   0   0
little  1   0   0   1
love    0   0   1   0
mars    0   1   0   0
men     0   0   1   1
my      1   0   0   0
no      0   1   0   0
on      1   1   0   0
roof    1   0   0   0
saw     0   0   0   1
there   1   1   0   0

1 if the word appears, 0 otherwise




The P1 column of the table above is the vector for P1.


• Query: Q = green OR men OR mars

Word    Q
a       0
cars    0
green   1
house   0
I       0
is      0
life    0
little  0
love    0
mars    1
men     1
my      0
no      0
on      0
roof    0
saw     0
there   0
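Boolean vectors like these can be built mechanically from a fixed vocabulary. A Python sketch, assuming case-insensitive matching and whitespace tokenization; the helper names are hypothetical, and the example encodes P3 and the query:

```python
VOCABULARY = ["a", "cars", "green", "house", "I", "is", "life", "little", "love",
              "mars", "men", "my", "no", "on", "roof", "saw", "there"]

def boolean_vector(text, vocabulary=VOCABULARY):
    """1 if the (case-insensitive) vocabulary word appears in the text, 0 otherwise."""
    words = set(text.lower().split())
    return [1 if term.lower() in words else 0 for term in vocabulary]

def nonzero_terms(vector, vocabulary=VOCABULARY):
    """List the vocabulary words whose entry in the vector is non-zero."""
    return [term for term, value in zip(vocabulary, vector) if value]

p3 = boolean_vector("Men love green cars")
q = boolean_vector("green OR men OR mars")   # "or" is not in the vocabulary, so it is ignored
print(nonzero_terms(p3))   # ['cars', 'green', 'love', 'men']
print(nonzero_terms(q))    # ['green', 'mars', 'men']
```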

Distance Between Vectors

• For two vectors d and d′, the cosine similarity between d and d′ is given by:

$$\cos(d, d') = \frac{d \cdot d'}{|d|\,|d'|}$$

• d · d′ is the scalar product of d and d′, calculated by multiplying corresponding values together and summing the results

• |d| is the norm (Euclidean length) of d

• The “cosine measure” calculates the cosine of the angle between the vectors in a high-dimensional space; the closer it is to 1, the more similar the vectors

Distance Between Documents

[Figure: documents d1 to d5 drawn as vectors in a space with term axes t1, t2, t3; θ and φ mark angles between document vectors]

Example

• Consider the query Q = "green men" and the document P3 = "Men love green cars"

Word    P3  Query
cars    1   0
green   1   1
love    1   0
men     1   1

(Only dimensions that are non-zero in at least one of the vectors are shown.)

• The cosine similarity:

– scalar product: 1*0 + 1*1 + 1*0 + 1*1 = 2

– norms: $|P3| = \sqrt{1^2 + 1^2 + 1^2 + 1^2} = 2$ and $|Q| = \sqrt{0^2 + 1^2 + 0^2 + 1^2} = \sqrt{2}$

– similarity: $\frac{2}{2\sqrt{2}} = \frac{1}{\sqrt{2}} \approx 0.707$
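The arithmetic above can be checked with a direct implementation of the cosine formula. A minimal Python sketch over the four dimensions listed in the example:

```python
import math

def cosine_similarity(d, q):
    """(d . q) / (|d| |q|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(d, q))
    norm_d = math.sqrt(sum(x * x for x in d))
    norm_q = math.sqrt(sum(y * y for y in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)

# Dimensions: cars, green, love, men
p3 = [1, 1, 1, 1]
query = [0, 1, 0, 1]
print(round(cosine_similarity(p3, query), 4))   # 0.7071
```

The zero-vector guard is a design choice for the sketch: a document sharing no vocabulary with anything would otherwise cause a division by zero.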

Defining Vector Values: TF

• Instead of a boolean value, use the word's frequency in the document (called tf, for "term frequency")

• What effect does this have?

• Sometimes a normalized version is used:

– term frequency / number of words in the document


Normalized TF (in every column, the values always sum to 1)

Word    P1   P2      P3    P4
a       0.1  0       0     0
cars    0    0       0.25  0
green   0.2  0       0.25  0.2
house   0.1  0       0     0
I       0    0       0     0.2
is      0.1  0.1667  0     0
life    0    0.1667  0     0
little  0.1  0       0     0.2
love    0    0       0.25  0
mars    0    0.1667  0     0
men     0    0       0.25  0.2
my      0.1  0       0     0
no      0    0.1667  0     0
on      0.1  0.1667  0     0
roof    0.1  0       0     0
saw     0    0       0     0.2
there   0.1  0.1667  0     0
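Normalized TF divides each word count by the document length, which is why each column sums to 1. A Python sketch, assuming the input is an already tokenized, lowercased list of words:

```python
from collections import Counter

def normalized_tf(words):
    """Term frequency divided by the number of words in the document."""
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}

tf_p3 = normalized_tf("men love green cars".split())
print(tf_p3)                 # {'men': 0.25, 'love': 0.25, 'green': 0.25, 'cars': 0.25}
print(sum(tf_p3.values()))   # 1.0
```

A word that occurs twice, like green in P1, simply gets twice the weight of a word that occurs once.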

Another Option: Defining Vector Values as IDF

• We can combine TF with IDF, inverse document frequency

– 1/(number of documents containing the word)

• What is the effect?

IDF value, shown for each document that contains the word (0 otherwise)

Word    P1      P2   P3      P4
a       1       0    0       0
cars    0       0    1       0
green   0.3333  0    0.3333  0.3333
house   1       0    0       0
I       0       0    0       1
is      0.5     0.5  0       0
life    0       1    0       0
little  0.5     0    0       0.5
love    0       0    1       0
mars    0       1    0       0
men     0       0    0.5     0.5
my      1       0    0       0
no      0       1    0       0
on      0.5     0.5  0       0
roof    1       0    0       0
saw     0       0    0       1
there   0.5     0.5  0       0

Normalized IDF

• Sometimes a normalized version is used:

$$idf_w = \log \frac{N}{n_w}$$

where N is the number of documents and n_w is the number of documents in which w appears

• The logarithm gives IDF less influence when TF and IDF are combined

• What is the value for a word that appears in all documents? Why?
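Both IDF variants can be computed from the word sets in the Boolean table. A Python sketch; the natural logarithm is an assumption (the slides do not fix a base, and changing the base only rescales all values), and the helper names are hypothetical:

```python
import math

# Word sets of P1-P4 as given by the Boolean table
DOCS = [
    {"a", "green", "house", "is", "little", "my", "on", "roof", "there"},   # P1
    {"is", "life", "mars", "no", "on", "there"},                           # P2
    {"cars", "green", "love", "men"},                                      # P3
    {"green", "I", "little", "men", "saw"},                                # P4
]

def document_frequency(word, docs=DOCS):
    """n_w: the number of documents that contain the word."""
    return sum(1 for doc in docs if word in doc)

def raw_idf(word, docs=DOCS):
    """1 / (number of documents containing the word), as in the IDF table."""
    return 1 / document_frequency(word, docs)

def log_idf(word, docs=DOCS):
    """log(N / n_w), the normalized version (natural log assumed)."""
    n_w = document_frequency(word, docs)
    if n_w == 0:
        raise ValueError(f"{word!r} appears in no document")
    return math.log(len(docs) / n_w)

print(round(raw_idf("green"), 4), round(log_idf("green"), 4))   # 0.3333 0.2877
```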

Standard Measure is TF-IDF

• Use normalized TF times normalized IDF

• Note: Once the values are chosen (using any of the schemes considered), we use cosine similarity to compare the document and the query
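To tie the pieces together, a Python sketch that weights P1 to P4 with normalized TF times log IDF and ranks them by cosine similarity against the query "green men". Assumptions: tokens come from lowercasing and splitting on whitespace, every word is kept (no stop-word removal or stemming), so the vocabulary and weights differ from the tables above; the natural log is used, and all function names are hypothetical:

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()

def tf_idf_vectors(docs):
    """Normalized TF times log(N / n_w) for every word of every document."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))          # count each document once per word
    idf = {word: math.log(len(docs) / n) for word, n in df.items()}
    vectors = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        vectors[name] = {w: (c / len(words)) * idf[w] for w, c in counts.items()}
    return vectors, idf

def query_vector(query, idf):
    """Weight the query the same way; words unseen in the collection get weight 0."""
    words = tokenize(query)
    counts = Counter(words)
    return {w: (c / len(words)) * idf.get(w, 0.0) for w, c in counts.items()}

def cosine_similarity(u, v):
    """Cosine measure on sparse vectors stored as {word: weight} dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)

docs = {
    "P1": "I live in a green house with a green roof",
    "P2": "There is no life form on Mars",
    "P3": "Men love green cars",
    "P4": "I saw some little green men yesterday",
}
vectors, idf = tf_idf_vectors(docs)
q = query_vector("green men", idf)
ranking = sorted(docs, key=lambda name: cosine_similarity(vectors[name], q), reverse=True)
print(ranking)   # ['P3', 'P4', 'P1', 'P2']
```

P3 ranks first because it is short and contains both query words; P1 contains green twice but no men, and green is common across the collection, so its IDF weight is low.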


XML (Extensible Markup Language) and the Semi-Structured Data Model

Motivation

• We have seen that relational databases are very convenient to query. However:

– There is a LOT of data that is not in relational databases!

• Perhaps the most widely accessed database is the web, and it certainly isn’t a relational database.

Querying the Web

• The web can be queried using a search engine; however, we can’t ask questions like:

– What is the lowest price for which a Jaguar is sold on the web?

• Problems:

– There are no facilities for asking complex questions, such as aggregation of data

Understanding the Web

• In order to query the web, we must be able to understand it.

• 2 Computer Science Approaches:

– Artificial Intelligence Approach

– Database Approach

Database Approach

“The web is unstructured and we will structure it”

• Sometimes problems that are very difficult can be solved easily by enforcing a standard

• Encourage the use of XML as a standard for data exchange on the web

Example XML Document

<addresses >
  <person friend="yes">
    <name> Jeff Cohen</name>
    <tel> 04-828-1345 </tel>
    <tel> 054-470-778 </tel>
    <email> [email protected] </email>
  </person>
  <person friend="no">
    <name> Irma Levy</name>
    <tel> 03-426-1142 </tel>
    <email>[email protected]</email>
  </person>
</addresses>

Labeled parts: opening tag, attribute, element, closing tag
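The tree of elements, attributes and text in this document can be walked programmatically. A Python sketch with the standard xml.etree.ElementTree module; the <email> elements are left out of the embedded copy, and the function name people is hypothetical:

```python
import xml.etree.ElementTree as ET

# The address book above, with the <email> elements left out
ADDRESSES = """<addresses>
  <person friend="yes">
    <name> Jeff Cohen</name>
    <tel> 04-828-1345 </tel>
    <tel> 054-470-778 </tel>
  </person>
  <person friend="no">
    <name> Irma Levy</name>
    <tel> 03-426-1142 </tel>
  </person>
</addresses>"""

def people(xml_text):
    """Return (name, friend attribute, list of phone numbers) for each <person>."""
    root = ET.fromstring(xml_text)          # root element: <addresses>
    result = []
    for person in root.findall("person"):   # child elements, in document order
        name = person.find("name").text.strip()
        phones = [tel.text.strip() for tel in person.findall("tel")]
        result.append((name, person.get("friend"), phones))
    return result

for entry in people(ADDRESSES):
    print(entry)
```

Note that the two persons have different numbers of <tel> elements; nothing forces every record to share one fixed structure, which is what makes the data semi-structured.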

Very Unstructured XML

<?xml version="1.0"?>
<DamageReport>
The insured’s <Vehicle Make = "Toyota">
Corolla </Vehicle> broke through the guard rail and plummeted into the ravine. The cause was determined to be <Cause>faulty brakes </Cause>. Amazingly there were no casualties.
</DamageReport>
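Even in this free-text style, the few marked-up pieces can be extracted reliably. A Python sketch with the standard xml.etree.ElementTree module, using the document above (with straight quotes around attribute values, as XML requires):

```python
import xml.etree.ElementTree as ET

REPORT = """<?xml version="1.0"?>
<DamageReport>
The insured’s <Vehicle Make="Toyota">
Corolla </Vehicle> broke through the guard rail and plummeted into the ravine. The cause was determined to be <Cause>faulty brakes </Cause>. Amazingly there were no casualties.
</DamageReport>"""

root = ET.fromstring(REPORT)
vehicle = root.find("Vehicle")   # the marked-up vehicle mention
cause = root.find("Cause")
print(vehicle.get("Make"), vehicle.text.strip())   # Toyota Corolla
print(cause.text.strip())                          # faulty brakes

# itertext() walks the mixed content (text, element text and tails) in document order
full_text = " ".join("".join(root.itertext()).split())
print(full_text)
```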

XML Vs. HTML

• XML and HTML are brothers: both are special cases of SGML.

• HTML has a fixed set of tag and attribute names, each associated with a specific meaning

• XML can use any tag and attribute names; these are not associated with any predefined meaning

• HTML is used to specify visual style

• XML is used to specify meaning

A Different Data Model

                  Relational        Semi-Structured
Abstract Model    Sets of tuples    Labeled Directed Graph
Concrete Model    Tables            XML Documents
Standard for      Storing Data      Data Exchange; Separating Content from Style

Data Exchange

• Problem: Many data sources, each of a different type (different vendor), with a different schema.

– How can the data be combined and used together?

– How can different companies collaborate on their data?

– What format should be used to exchange the data?

Separating Content from Style

• Web sites develop over time

• Important to separate style from data in order to allow changes to the site structure and appearance

• Using XML, we can store data alone

• CSS separates style from data only in a limited way

• Using XSL, this data can be translated into HTML

• The data can be translated differently as the site develops

Write Once, Use Everywhere

[Figure: XML Data is translated with XSL into WML (hand-held devices), HTML (web browser), and TEXT (Excel)]

http://www.pdabuyersguide.com/palmOne_Tungsten_T5.htm

Using XML

• Querying and Searching XML: There are query languages and search engines that query XML and return XML. Examples: XPath, XQuery/SQL4X, Equix, XSEarch

• Displaying XML: An XML document can have an associated style sheet, which specifies how the document should be displayed, for example by translating it to HTML. Examples: CSS, XSL
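Python's standard xml.etree.ElementTree module supports only a limited subset of XPath (and no XQuery), but that subset is enough to show the flavor of path-based queries. A sketch over the address book from the earlier example, with the <email> elements left out:

```python
import xml.etree.ElementTree as ET

ADDRESSES = """<addresses>
  <person friend="yes">
    <name> Jeff Cohen</name>
    <tel> 04-828-1345 </tel>
    <tel> 054-470-778 </tel>
  </person>
  <person friend="no">
    <name> Irma Levy</name>
    <tel> 03-426-1142 </tel>
  </person>
</addresses>"""

root = ET.fromstring(ADDRESSES)
# Names of all persons whose friend attribute is "yes"
friend_names = [n.text.strip() for n in root.findall("./person[@friend='yes']/name")]
# Every <tel> element at any depth below the root
all_phones = [t.text.strip() for t in root.findall(".//tel")]
print(friend_names)   # ['Jeff Cohen']
print(all_phones)     # ['04-828-1345', '054-470-778', '03-426-1142']
```

Full XPath engines (and XQuery) add functions, axes and result construction that this subset lacks.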

DTD: Document Type Descriptors

• Document Type Descriptors (DTDs) impose structure on an XML document

• There is some relationship between a DTD and a (relational) schema