Information Resources Management
April 24, 2001


Agenda

  • Administrivia
  • Object-Oriented & Databases
  • Data Warehousing
  • Data Mining
  • SQL Extensions
  • XML

Administrivia

  • Homework #8
  • Homework #9
  • Current Scores
  • Final Review Session?

OODBMS vs. ORDBMS

  • OODBMS - Object-Oriented
  • ORDBMS - Object-Relational
  • Appendix A

OODBMS

  • Persistent Objects
      • By class
      • By creation
      • By marking
      • By reference
  • Storage/Retrieval Methods

OODBMS - Benefits

  • Match
      • Programming
      • Methodology
      • Data types & structures
  • Ease of programming
  • Inheritance

OODBMS - Challenges

  • Standards
      • ODMG - Object Database Management Group
  • Performance
  • Database vs. persistent language
      • Loss of integrity, queries
  • Storage Space
  • Maturity

ORDBMS

  • Extensions to relational model
      • Complex data types
      • Inheritance
      • References
  • Migration path
      • Use existing applications and knowledge base

ORDBMS - Benefits

  • SQL
  • Existing Systems
  • Vendors

ORDBMS - Challenges

  • Standards
  • “Fit” with the development language
  • Programming Complexity

Using a relational database to store data from an object-oriented system has been likened to parking your car in your garage. With an OODBMS, you simply park the car in the garage. With an (O)RDBMS, you must first completely disassemble the car and put each part in its specific location on a shelf. This process must then be reversed the next time you want to go for a drive.
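
The garage analogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Car` and `Part` classes, the table names, and the data are all hypothetical, and a JSON string in a dictionary merely stands in for an object store (a real OODBMS would hand back the object itself).

```python
import json
import sqlite3
from dataclasses import dataclass, field, asdict


@dataclass
class Part:
    name: str
    position: int


@dataclass
class Car:
    owner: str
    parts: list = field(default_factory=list)


car = Car("Pat", [Part("engine", 0), Part("wheel", 1)])

# "Park the car": store the whole object as one unit.
# The JSON string is only a stand-in for what an OODBMS would persist.
garage = {"car-1": json.dumps(asdict(car))}
data = json.loads(garage["car-1"])
parked_car = Car(data["owner"], [Part(**p) for p in data["parts"]])

# "Disassemble the car": one table per kind of thing, one row per piece.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE car (car_id INTEGER PRIMARY KEY, owner TEXT)")
db.execute("CREATE TABLE part (car_id INTEGER, position INTEGER, name TEXT)")
car_id = db.execute("INSERT INTO car (owner) VALUES (?)", (car.owner,)).lastrowid
db.executemany(
    "INSERT INTO part VALUES (?, ?, ?)",
    [(car_id, p.position, p.name) for p in car.parts],
)

# "Reassemble before driving": join the rows back into an object.
owner = db.execute(
    "SELECT owner FROM car WHERE car_id = ?", (car_id,)
).fetchone()[0]
rows = db.execute(
    "SELECT name, position FROM part WHERE car_id = ? ORDER BY position",
    (car_id,),
).fetchall()
rebuilt_car = Car(owner, [Part(name, position) for name, position in rows])
```

Both paths return an equal `Car`, but the relational path needs a schema, two inserts, and two queries to do so.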

OODBMS/ORDBMS Products

Vendor               Website                       Product
Computer Associates  www.cai.com/products/jasmine  Jasmine
Franz                www.franz.com                 AllegroStore
Fujitsu Software     www.fsc.fujitsu.com           Jasmine
Gemstone Systems     www.gemstone.com              GemStone/S
Matisse Software     www.matisse.com               ADB
O2 Technology        www.o2tech.com                O2
Object Design        www.odi.com                   ObjectStore

OODBMS/ORDBMS Products

Vendor               Website                       Product
Objectivity          www.objectivity.com           Objectivity/DB
Object Systems       www.iprolink.ch/ibex.com      ITASCA
Ontos                www.ontos.com                 Ontos Integrator
Persistence          www.persistence.com           Persistence LiveObject Server
Poet Software        www.poet.com                  Poet Object Server
Unisys               www.osmos.com                 Osmos
Versant              www.versant.com               Versant ODBMS

Other Links

  • Object Database Management Group
      • www.odmg.org
  • Object Database Newsgroup
      • comp.databases.object

Data Warehouse

  • A subject-oriented, integrated, time-variant, nonvolatile collection of data
  • Usually all data for a corporation
  • Multidimensional database

Data Warehousing

  • Single location
  • Long-term storage
  • Greater availability
  • Separate “data” processing from day-to-day operations (performance)
  • All data is historical
  • Support data mining, et al.

Data Warehousing Questions

  • What data needs to be kept?
  • Where is it from?
  • How good is it?
  • How long should it be kept?
  • Can it be summarized? When?
  • Will it make sense? What is the schema?
  • When is it updated?

Data Warehousing - Benefits

  • Support for decision making tools
      • DSS, EIS, Data Mining
  • Separation of information and day-to-day processing
  • Unification - Centralization
  • Improved quality and consistency

Data Warehousing - Challenges

  • Costs: Storage, Setup, Maintenance
  • Historical data issues
  • Defining the warehouse schema
  • Doing the conversion
      • Implementation & every time
  • Keeping up with operational system changes
  • Answering the questions

Multidimensional Databases

  • Two views
      • Multidimensional tables
      • Star schema
  • Multidimensional table
      • each cell is an attribute
      • dimensions are “interesting” categories

Multidimensional Table

  • Cell - sales
  • Dimensions
      • day
      • person
      • store
      • item
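
One way to picture such a table is a dictionary whose key lists the four dimensions and whose value is the cell. The sketch below is only an illustration: the sample values, the `DIMENSIONS` tuple, and the `roll_up` helper are assumptions, not part of the lecture.

```python
from collections import defaultdict

# Each cell holds a sales amount; the key lists the four dimensions
# in the order (day, person, store, item). All values are made up.
cube = {
    ("Mon", "Ann", "Store 1", "milk"): 3.00,
    ("Mon", "Bob", "Store 1", "bread"): 2.50,
    ("Tue", "Ann", "Store 2", "milk"): 6.00,
}

DIMENSIONS = ("day", "person", "store", "item")


def roll_up(cells, dimension):
    """Total the sales cells for each value of the named dimension."""
    index = DIMENSIONS.index(dimension)
    totals = defaultdict(float)
    for key, sales in cells.items():
        totals[key[index]] += sales
    return dict(totals)


sales_by_item = roll_up(cube, "item")
sales_by_day = roll_up(cube, "day")
```

Rolling up along one dimension answers questions such as "how much milk was sold overall?" without touching the other three.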

Star Schema

  • Multiple tables
  • Central table - data item (cell)
  • Surrounding tables - information about each category (dimensions)

Star Schema

[Diagram: the Sales table at the center, linked to the Person, Store, Item, and Day tables]

Star Schema

Sales (Day, Person, Store, Item, sales)

Day (Day, day info)

Person (Person, person info)

Store (Store, store info)

Item (Item, item info)
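
The relations above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the "info" columns (`weekday`, `name`, `city`, `category`) and all rows are invented placeholders for whatever each dimension table would really hold.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Dimension tables: one row per category value, plus descriptive info.
CREATE TABLE day    (day TEXT PRIMARY KEY, weekday TEXT);
CREATE TABLE person (person TEXT PRIMARY KEY, name TEXT);
CREATE TABLE store  (store TEXT PRIMARY KEY, city TEXT);
CREATE TABLE item   (item TEXT PRIMARY KEY, category TEXT);

-- Central table: one row per cell, keyed by the four dimensions.
CREATE TABLE sales (
    day    TEXT REFERENCES day,
    person TEXT REFERENCES person,
    store  TEXT REFERENCES store,
    item   TEXT REFERENCES item,
    sales  REAL,
    PRIMARY KEY (day, person, store, item)
);

INSERT INTO day VALUES ('2001-04-23', 'Mon'), ('2001-04-24', 'Tue');
INSERT INTO person VALUES ('p1', 'Ann'), ('p2', 'Bob');
INSERT INTO store VALUES ('s1', 'Pittsburgh'), ('s2', 'Erie');
INSERT INTO item VALUES
    ('milk', 'dairy'), ('cheese', 'dairy'), ('bread', 'bakery');
INSERT INTO sales VALUES
    ('2001-04-23', 'p1', 's1', 'milk', 3.0),
    ('2001-04-23', 'p2', 's1', 'bread', 2.5),
    ('2001-04-24', 'p1', 's2', 'cheese', 4.0);
""")

# A typical star-schema query: join the central table to one dimension
# table and summarize by an attribute stored in that dimension.
sales_by_category = db.execute("""
    SELECT item.category, SUM(sales.sales)
    FROM sales JOIN item ON sales.item = item.item
    GROUP BY item.category
    ORDER BY item.category
""").fetchall()
```

The central table stays narrow (keys plus the measure), while descriptive attributes live once in each dimension table.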

Building/Maintaining a Data Warehouse

1. Capture

2. Scrub

3. Transform

4. Load and Index
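
The four steps can be sketched as a small pipeline. Everything here is hypothetical: the hard-coded records stand in for an operational system, and the scrubbing and naming rules are just examples of the kind of decisions each step involves.

```python
import sqlite3


def capture():
    """Capture: pull raw records from an operational system (hard-coded here)."""
    return [
        {"store": " Store 1 ", "item": "Milk", "amount": "3.00"},
        {"store": "store 1", "item": "milk", "amount": "n/a"},  # bad amount
        {"store": "Store 2", "item": "BREAD", "amount": "2.50"},
    ]


def scrub(records):
    """Scrub: drop records whose amount is not a number."""
    clean = []
    for record in records:
        try:
            float(record["amount"])
        except ValueError:
            continue
        clean.append(record)
    return clean


def transform(records):
    """Transform: convert to the warehouse's types and naming conventions."""
    return [
        (r["store"].strip().lower(), r["item"].strip().lower(), float(r["amount"]))
        for r in records
    ]


def load_and_index(rows):
    """Load and index: insert the rows and index a commonly queried column."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (store TEXT, item TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    db.execute("CREATE INDEX sales_by_item ON sales (item)")
    return db


warehouse = load_and_index(transform(scrub(capture())))
```

In practice the same pipeline runs again at every refresh, which is why the conversion is an ongoing cost rather than a one-time one.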

Data Marts

  • Making specific data available
  • Different ones for different needs

[Diagram: Operational Systems, the data warehouse (DW), and two data marts (DM1, DM2)]
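
A data mart can be thought of as a purpose-built extract of the warehouse. The sketch below assumes a tiny `sales` table and two invented audiences (a store manager and a purchasing group); the table names are illustrative only.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
CREATE TABLE sales (day TEXT, store TEXT, item TEXT, sales REAL);
INSERT INTO sales VALUES
    ('2001-04-23', 's1', 'milk', 3.0),
    ('2001-04-23', 's2', 'milk', 6.0),
    ('2001-04-24', 's1', 'bread', 2.5);

-- DM1: a mart for the manager of store s1, holding only that store's rows.
CREATE TABLE dm1_store_s1 AS
    SELECT day, item, sales FROM sales WHERE store = 's1';

-- DM2: a mart for purchasing, holding total sales per item.
CREATE TABLE dm2_item_totals AS
    SELECT item, SUM(sales) AS total FROM sales GROUP BY item;
""")

dm1 = dw.execute("SELECT * FROM dm1_store_s1 ORDER BY day").fetchall()
dm2 = dw.execute("SELECT * FROM dm2_item_totals ORDER BY item").fetchall()
```

Each mart answers one group's questions without exposing, or scanning, the whole warehouse.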

Data Mining

  • Corporations have colossal amounts of data
      • Usually only used for very specific purposes (operations)
  • Automated attempt to learn from the data
  • Find statistical rules and patterns in the data
  • Example: Giant Eagle Advantage Card

Goals of Data Mining

  • Explanatory - Why?
  • Confirmatory - Is it?
  • Exploratory - ???

Approaches to Data Mining

  • Classification
      • identify rules that create groups
  • Association
      • find related conditions or events
  • Correlation
      • relationships between values

  • User Guided
      • hypothesis driven
  • Automatic
      • data driven - AI based
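
As an illustration of the association approach, the sketch below scores a rule of the form "customers who buy X also buy Y" using support and confidence, two commonly used measures. The baskets are invented, and the lecture does not say which measures or algorithm it has in mind.

```python
# Hypothetical shopping baskets, such as a loyalty card might record.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction that also
    contain `consequent`."""
    both = support(antecedent | consequent, baskets)
    return both / support(antecedent, baskets)


beer_given_diapers = confidence({"diapers"}, {"beer"}, baskets)
diapers_given_beer = confidence({"beer"}, {"diapers"}, baskets)
```

A rule with high confidence but very low support is a typical source of spurious results, so both numbers are usually checked together.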

Data Mining - Benefits

  • Use data
  • Learn new things
  • Improve decision making

Data Mining - Challenges

  • Time (human and/or computer)
  • Spurious results
      • Separating the wheat from the chaff
  • Availability of data
  • Amount of data
  • Changes in tools and technologies
  • Validity over time

Enhanced Data Analysis

  • Beyond SUM, COUNT, and AVG
  • SQL extensions (suggested)
      • GROUP BY … AS PERCENTILE
          • Specific percentiles
      • GROUP BY … WITH CUBE
          • Cross-tabulations
  • Statistical package interface
      • SAS, S++, others
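
Because the slides list these SQL extensions only as suggestions, the sketch below emulates the two ideas in plain Python rather than guessing at SQL syntax. The `cube` and `percentile` helpers, the nearest-rank percentile definition, and the sample rows are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations

rows = [
    {"store": "s1", "item": "milk", "sales": 3.0},
    {"store": "s1", "item": "bread", "sales": 2.5},
    {"store": "s2", "item": "milk", "sales": 6.0},
]


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (a cross-tabulation).

    None in a key marks a dimension that has been totaled away.
    """
    totals = defaultdict(float)
    for size in range(len(dims) + 1):
        for kept in combinations(dims, size):
            for row in rows:
                key = tuple(row[d] if d in kept else None for d in dims)
                totals[key] += row[measure]
    return dict(totals)


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


crosstab = cube(rows, ("store", "item"), "sales")
median_sale = percentile([r["sales"] for r in rows], 50)
```

With two dimensions the cross-tabulation holds the detail cells, the per-store totals, the per-item totals, and the grand total in one result.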

Enhanced Data Analysis - Benefits

  • Greater functionality
  • Improved decision making

Enhanced Data Analysis - Challenges

  • Lack of standards
  • Understandability
  • Processing requirements
  • Cost of poorly written queries
      • “ad hoc” queries aren’t reviewed

Extending Relational DBs

  • Spatial and Geographic Databases
  • Multimedia Databases
  • Changing the data stored while retaining the benefits of relational databases

Spatial & Geographic DBs

  • Spatial - CAD
  • Geographic - GIS
  • Similar issue
      • How to store and retrieve such data

Spatial Databases

  • Geometric objects (2 or 3 dimensions)
      • Locations
      • Connections
  • Nonspatial information about each object
  • Substructures
  • Spatial integrity constraints
      • Two things can’t occupy the same space
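
The "same space" constraint can be sketched with axis-aligned rectangles standing in for 2-D geometric objects. The `Box` class and the rule that touching edges are allowed are assumptions made for this example; a real spatial DBMS would support richer shapes and its own constraint rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle standing in for a 2-D geometric object."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def overlaps(a, b):
    """True when the interiors of the two boxes share any area."""
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


def add_object(objects, new):
    """Append `new` unless it would occupy space that is already taken."""
    for existing in objects:
        if overlaps(existing, new):
            raise ValueError(f"{new.name} overlaps {existing.name}")
    objects.append(new)


layout = []
add_object(layout, Box("desk", 0, 0, 2, 1))
add_object(layout, Box("chair", 2, 0, 3, 1))  # touches the desk edge: allowed
```

Checking every existing object is fine for a sketch; with many objects a spatial index would narrow the candidates first.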

GIS Databases

  • Raster Data (fractal data)
      • Pictures - possibly over time
      • Maps
  • Vector Data
      • Locations
      • Connections
  • Nongeographic information

Spatial & Geographic DB - Benefits

  • DBMS
  • Specialized queries
      • Spatial & Geographic Data
      • “Standard” Data
      • Mix of the two
  • Integrity constraints

Spatial & Geographic DB - Challenges

  • Space requirements
  • Level of detail
  • Understandability - Complexity
  • Processing requirements
  • Compatibility between systems
  • Lack of standards

Multimedia Databases

  • Images, Audio, Video
  • Nonmultimedia data (text) about each
  • Database Enhancements
      • BLOBs (Binary Large Objects)
      • Similarity-based queries
      • Guaranteed steady rate
      • Synchronization of audio and video
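
A similarity-based query asks for the stored items closest to an example rather than for an exact match. The sketch below assumes each image has already been reduced to a short feature vector (the numbers are invented) and ranks by Euclidean distance, which is only one of many possible similarity measures.

```python
import math

# Hypothetical feature vectors (for example, coarse color histograms)
# kept alongside each image's BLOB and text description.
images = {
    "sunset.jpg": (0.9, 0.4, 0.1),
    "beach.jpg": (0.8, 0.5, 0.3),
    "forest.jpg": (0.1, 0.8, 0.2),
}


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query, catalog, k=1):
    """Return the names of the k items whose features are closest to `query`."""
    ranked = sorted(catalog, key=lambda name: distance(query, catalog[name]))
    return ranked[:k]


matches = most_similar((1.0, 0.4, 0.0), images, k=2)
```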

Multimedia Databases - Benefits

  • DBMS
  • Greater compression may be possible
  • “Paperless” office - document imaging
  • Workflow redesign - improvements
  • Greater availability

Multimedia Databases - Challenges

  • S T O R A G E
  • Specialized DBMS
  • Unity of database and network
      • Usually requires ATM
  • Specialized hardware
      • “juke boxes”
      • optical disks

XML

  • What is it?
  • What isn’t it?
  • What are the goals?
  • Who controls it?
  • Who’s using it?
  • Beyond XML

What is XML?

  • eXtensible Markup Language
  • Markup language for “structured information”
      • “structured” - content & role of that content
      • markup - identify structures
  • “meta language for describing markup languages”

Huh?

  • Storing structured data in a text file
      • spreadsheet, address book, transactions (think EDI)
  • Looks like HTML, <tags>, but isn’t
  • Text is universal, but not efficient
      • Does disk space matter?
      • What about network capacity?
  • XML is license-free & platform-independent
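
A small address book shows what "structured data in a text file" looks like in practice. The tag and attribute names below are invented for this example (XML lets each application define its own), and the parsing uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# The tags identify both the content and the role of that content.
document = """<addressbook>
  <contact id="1">
    <name>Ann Smith</name>
    <phone type="work">555-0100</phone>
  </contact>
  <contact id="2">
    <name>Bob Jones</name>
    <phone type="home">555-0199</phone>
  </contact>
</addressbook>"""

root = ET.fromstring(document)

names = [contact.findtext("name") for contact in root.findall("contact")]
work_phones = [
    phone.text
    for phone in root.iter("phone")
    if phone.get("type") == "work"
]

# The markup is overhead: compare the document's size with the size of
# the data it actually carries.
payload = sum(len(text.strip()) for text in root.itertext())
overhead_ratio = len(document) / payload
```

The ratio makes the "universal, but not efficient" trade-off concrete: most of the characters in the file are markup rather than data.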

What XML isn’t

  • HTML
  • SGML - Standard Generalized Markup Language - printing
  • Limited to current definitions (tags)
      • XML is the way to add new definitions
  • A relational database management system
  • A database, or is it?

Goals of XML

  • Easy to use over Internet
  • Wide variety of applications
  • Compatible with SGML (subset)
  • Easy to write programs that use XML documents
  • No (or few) optional features
  • Human-legible if necessary

Goals of XML (2)

  • Standards developed quickly
  • Formal and concise
  • Easy to create documents
  • No need for “shortcuts”

Who Controls XML?

  • W3 Consortium
  • www.w3.org/XML
  • XML 1.0 specification

Who’s Using XML?

  • Financial Products Markup Language
      • FpML
      • FpML.org
      • “A standard for financial derivatives business-to-business e-Commerce”
  • Others?

Beyond XML

  • XLink - hyperlinks in XML
  • XPointer & XFragments - point to parts of an XML document
  • CSS - style sheet language
      • XML and HTML
  • XSL - advanced language for style sheets
  • XSLT - XSL transformation language

Beyond XML (2)

  • DOM - standard function calls for manipulating XML (and HTML) from programs
  • XML Namespaces - link a URL with every tag and attribute
  • XML Schemas 1 & 2 - help in precisely developing own XML-based formats
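
The DOM and namespace ideas can be tried with Python's standard `xml.dom.minidom` module, which implements a subset of the DOM calls. The order document and the namespace URL `http://example.com/orders` are made up for this sketch.

```python
from xml.dom.minidom import parseString

ORDERS_NS = "http://example.com/orders"  # hypothetical namespace URL

doc = parseString(
    '<order xmlns="http://example.com/orders">'
    '<item sku="A1">milk</item>'
    '<item sku="B2">bread</item>'
    '</order>'
)

# Read: look elements up by namespace URL plus local tag name.
items = doc.getElementsByTagNameNS(ORDERS_NS, "item")
skus = [node.getAttribute("sku") for node in items]
first_text = items[0].firstChild.data

# Modify: build a new element with DOM calls and attach it to the tree.
new_item = doc.createElementNS(ORDERS_NS, "item")
new_item.setAttribute("sku", "C3")
new_item.appendChild(doc.createTextNode("cheese"))
doc.documentElement.appendChild(new_item)

item_count = len(doc.getElementsByTagNameNS(ORDERS_NS, "item"))
```

Because the namespace URL is part of every lookup, an `item` tag from some other vocabulary would not be confused with this one.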

Homework #10

  • Last One! (No HW #11)
  • Research and evaluate products
  • 100 points

Final

  • Next Tuesday, 5/1
  • Approximately 1/3 from 4/3 - 4/24
  • Remainder - comprehensive

Thank You