Upload
scott-wesley
View
1.145
Download
7
Embed Size (px)
Citation preview
SAGE Computing Services
Customised Oracle Training Workshops and Consulting
Oracle Text in Apex
Advanced Indexing Techniques Integrated with
Application Express
Scott Wesley
Systems Consultant & Trainer
Agenda
• Introduction
• Architecture
• Fundamentals
• Considerations
• Setting Up
• Samples
• Index Maintenance
• Visualisation
• New Features
Larry Lessig?
the law is strangling creativity
http://www.ted.com/talks/larry_lessig_says_the_law_is_strangling_creativity.html
http://presentationzen.blogs.com/presentationzen/2005/10/the_lessig_meth.html
Identity 2.0 – Dick Hardt
http://identity20.com/media/OSCON2005/
who’s the Dick on your site
so today’s going to be more like this
and this
after I show a few pictures
who_am_i;
http://strategy2c.wordpress.com/2009/01/10/strategy-for-goldfish-funny-illustration-by-frits/
balance
Why use Oracle Application Express?
Why use Oracle Text?
What is Oracle Text?
Document Collection
Catalogue Information
Document Classification
Architecture
Class Description
Datastore How are your documents stored?
Filter How can the documents be converted to plain text?
Lexer What language is being indexed?
Wordlist How should stem and fuzzy queries be expanded?
Storage How should the index data be stored?
Stop List What words or themes are not to be indexed?
Section Group How are documents sections defined?
1) Example
CREATE INDEX ctx_name ON my_names(name)
INDEXTYPE IS ctxsys.context
PARAMETERS ('DATASTORE CTXSYS.DEFAULT_DATASTORE');
SQL> SELECT SCORE(1), name
2 FROM my_names
3 WHERE CONTAINS(name, 'fuzzy(john,,,weight)', 1) > 0
4 ORDER BY SCORE(1) DESC;
SCORE(1) NAME
---------- ----------------------------------------
100 John
100 John
70 Jon
70 Jon
63 Joan
63 Joan
52 Jong
48 Jona
8 rows selected.
2) Datastore
CTXSYS.DEFAULT_DATASTORE
BLOB
BFiles
Pointers to objects on file system
URLs
Pointers to objects on the intertube
User Defined
Why would you?
3) Index Type
a) CONTEXT
Document Collection
large document size
provides a score
asynchronous index & table data
CONTAINS
b) CTXCAT
Catalogue Information
smaller documents
text fragments
multiple attributes
set lists
similar to typical index paradigm
transactional
CATSEARCH
c) CTXRULE
Document Classification
routing information
displace manual interaction
not binary files
MATCHES
4) Considerations
location of text
document format
bypassing rows - images
character set
language
fuzzy matching & stemming
wildcard query performance
stopwords & stopthemes
query performance and storage of LOBs
mixed queries
5) Setting up
GRANT ctxapp TO ausoug;
create & delete indexing preferences
use Oracle Text PL/SQL supplied packages
1* select grantee, owner, table_name, privilege from dba_tab_privs where table_name =
'CTX_DDL'
SQL> /
GRANTEE OWNER TABLE_NAME PRIVILEGE
-------------------- ------------ ------------------------------ --------------------
CTXAPP CTXSYS CTX_DDL EXECUTE
APEX_040000 CTXSYS CTX_DDL EXECUTE
APEX_030200 CTXSYS CTX_DDL EXECUTE
AUSOUG CTXSYS CTX_DDL EXECUTE
XDB CTXSYS CTX_DDL EXECUTE
5 rows selected.
PLS-00201: identifier "string" must be declared
CTX PL/SQL Packages
GRANT EXECUTE ON CTXSYS.CTX_CLS TO ausoug;
GRANT EXECUTE ON CTXSYS.CTX_DDL TO ausoug;
GRANT EXECUTE ON CTXSYS.CTX_DOC TO ausoug;
GRANT EXECUTE ON CTXSYS.CTX_OUTPUT TO ausoug;
GRANT EXECUTE ON CTXSYS.CTX_QUERY TO ausoug;
GRANT EXECUTE ON CTXSYS.CTX_REPORT TO ausoug;
GRANT EXECUTE ON CTXSYS.CTX_THES TO ausoug;
GRANT EXECUTE ON CTXSYS.CTX_ULEXER TO ausoug;
Using URL Datastore in 11g
CREATE ROLE apex_url_datastore_role;
GRANT apex_url_datastore_role TO APEX_040000
WITH ADMIN OPTION;
GRANT apex_url_datastore_role TO ausoug;
EXEC
ctxsys.ctx_adm.set_parameter
('file_access_role'
,'APEX_URL_DATASTORE_ROLE');
Demonstrations
Script Description
Ctx_blobs.sql Import & index a range of documents
Ctx_bfiles.sql Import & index BFILE pointers
Ctx_urls.sql Index & search URL references
Ctx_dict.sql Index & search English dictionary words
Ctx_views.sql Index view SQL text for impact analysis
Ctx_apex_files.sql Duplicate and search Apex file repository
Ctx_apex_backups.sql Hunt through your (automated) Apex app backups
Ctx_names.sql Basic name filter options
Ctx_products.sql Multiple column searches
Ctx_category.sql Attribute based searching
Ctx_classify.sql Classify documents into categories
6) Index maintenance
indexing errors
resume failed index
ALTER INDEX ctx_surname
REBUILD PARAMETERS ('resume memory 10m');
recreate index online (11g)
EXEC ctx_ddl.recreate_index_online
('ctx_surname', 'replace lexer sw_lexer');
rebuilding an index
ALTER INDEX ctx_surname
REBUILD PARAMETERS('replace lexer sw_lexer')
ONLINE;
ctx_report.index_stats
create table ausoug.my_stats (stats clob);
declare
x clob := null;
begin
for r_rec in
(select *
from ctxsys.ctx_indexes
where idx_owner = 'AUSOUG'
and idx_type = 'CONTEXT') loop
ctx_report.index_stats(r_rec.idx_name,x);
insert into ausoug.my_stats values (x);
end loop;
commit;
dbms_lob.freetemporary(x);
end;
/
7) Data Dictionary
SQL> select count(*)
2 from all_views
3 where owner = 'CTXSYS';
COUNT(*)
----------
58
8) Common Questions
DML operations on a CONTEXT index
ctxsys.ctx_user_pending
synchronise the index
synchronize
EXEC ctx_ddl.sync_index('ctx_surname');
dbms_job
dbms_scheduler
how often?
optimise the index
can get fragmented
inverted index
each entry contains list of documents
DOG - DOC1 DOC3 DOC5
DOG - DOC7
DOG - DOC9
DOG - DOC11
ctx_ddl.optimize_index
capacity planning?
Object of Interest Num
Rows
Table
Size
Index
size
Dictionary 150k 7 27
Documents 28 34 1.5
Names 27k 1 6
Views 2k 7 2
BFiles 4
Product 1
URL 1
more text
cleaner data
less overhead
document format
next steps?
read Application Developer’s Guide
find examples
experiment
SAGE Computing Services
Customised Oracle Training Workshops and Consulting
Question time
Presentations are available from our website:
http://www.sagecomputing.com.au
http://triangle-circle-square.blogspot.com