8/2/2019 Data Mining & Housing
ABSTRACT
Data warehousing supports business analysis and decision making by
creating an enterprise-wide integrated database of summarized, historical information.
It integrates data from multiple incompatible sources. By transforming data into
meaningful information, a data warehouse allows the business manager to perform
more substantive, accurate, and consistent analysis.
Data mining techniques can be implemented rapidly on existing software and
hardware platforms to enhance the value of existing information resources, and can be
integrated with new products and systems as they are brought online. When
implemented on high-performance client/server or parallel processing computers,
data mining tools can analyze massive databases and support querying effectively.
A data warehouse is, of course, a database, but one that contains summarized
information. Integrating data mining with the warehouse yields benefits such as
a better querying process, performance sharing, and more reliable information.
The following sections present the concepts of data warehousing and
data mining.
NARAYANA ENGG. COLLEGE
By
D. Ajith kumar (IVth CSIT) &
SANJAY JOSHI (IIIrd ECE)
Contents
1) Introduction
Features
Decision Support Systems
2) Data warehouse schemas
3) Microsoft Data Warehousing Framework
4) Data mining working procedure
Data warehouse with data mining
An approach to Client/Server data warehousing
Applications
Conclusion
INTRODUCTION:
Modern organizations are under enormous pressure from the rapid development of
technology and need rapid access to all kinds of information. To support this,
we need to consider the past and identify relevant trends. Any trend analysis,
in turn, requires a database.
In most organizations you will find very large databases in operation for
normal daily transactions. These are known as operational
databases; in most cases they have not been designed to store historical data or to
respond to queries, but simply to support the applications for day-to-day
transactions.
The second type of database found in organizations is the data warehouse. It is
designed for strategic decision support and is largely built up from the
operational databases. The basic characteristic of a data warehouse is that
it contains a vast amount of data, which can mean billions of records. Smaller, local data
warehouses are called data marts.
A data warehouse is designed especially for decision support queries. Therefore, only
data that is needed for decision support is extracted from the operational data and
stored in the data warehouse, along with the time when it was retrieved from
the operational databases.
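The extraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `orders` and `order_history` tables, their columns, and the `extract_to_warehouse` helper are hypothetical names, and in-memory SQLite databases stand in for the real operational and warehouse systems.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical operational table: supports daily transactions, keeps no history.
operational = sqlite3.connect(":memory:")
operational.execute(
    "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL, internal_note TEXT)"
)
operational.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "Asha", 120.0, "call back"), (2, "Ravi", 80.0, "")],
)

# Hypothetical warehouse table: decision-support columns plus the extraction time.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE order_history (order_id INTEGER, customer TEXT, amount REAL, extracted_at TEXT)"
)


def extract_to_warehouse(src, dst, extracted_at):
    """Copy only the decision-support columns and stamp each row with the extraction time."""
    rows = src.execute("SELECT order_id, customer, amount FROM orders").fetchall()
    dst.executemany(
        "INSERT INTO order_history VALUES (?, ?, ?, ?)",
        [(*row, extracted_at) for row in rows],
    )
    dst.commit()
    return len(rows)


loaded = extract_to_warehouse(
    operational, warehouse, datetime.now(timezone.utc).isoformat()
)
```

The `internal_note` column is deliberately left behind, since it serves the daily application rather than decision support.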
Data warehousing
Need for a data warehouse:
To summarize large volumes of data.
To integrate data from different sources.
To give decision makers access to past data.
To enable people to make informed decisions.
FEATURES:
1. Time dependent: The warehouse contains information collected over time, which
implies there must always be a connection between the information in the
warehouse and the time when it was entered.
2. Non-volatile (permanent): Data in the data warehouse is never updated
but used only for queries. End users who want to update the data must use
the operational database. This means that the data warehouse will always be filled with
historical data.
3. Subject oriented: The warehouse is built around all the existing applications of the
operational data. It is designed specifically for decision
support, while the operational databases contain information for day-to-day
use.
4. Integrated: In a data warehouse it is essential to integrate this information and
make it consistent; only one name must exist to describe each individual entity.
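The "only one name per entity" requirement of the integrated feature can be illustrated with a small Python sketch. The field names and the `integrate` helper are hypothetical; real integration also has to reconcile codes, units, and keys, which this sketch does not attempt.

```python
# Hypothetical field names used by two operational systems for the same entity.
FIELD_MAP = {
    "cust_no": "customer_id",
    "CustomerID": "customer_id",
    "cust_name": "customer_name",
    "Name": "customer_name",
}


def integrate(record):
    """Rename source-specific fields to the single name used in the warehouse."""
    return {FIELD_MAP.get(key, key): value for key, value in record.items()}


billing_row = {"cust_no": 7, "cust_name": "Asha"}   # from a billing system
crm_row = {"CustomerID": 7, "Name": "Asha"}         # from a CRM system
```

After `integrate`, both rows describe the customer with the same field names, so they can be stored and queried together.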
DECISION SUPPORT SYSTEM:
When designing a decision support system, particular importance should be placed on
the requirements of the end users and on the hardware and software products that will be required.
The requirements of the end users:
Some end users need specific query tools so that they can build their queries
themselves. Others are interested only in a particular part of the information; we can
build a specific type of application around that part to speed up the query process.
Hardware and software products of a decision support system:
Working in a client/server environment allows great flexibility in choosing the
appropriate software for end users, because each individual need can be catered for on a
local workstation. The hardware requirements depend on the type of data warehouse and
the techniques with which you want to work.
There are two basic types of data warehouse:
1. Enterprise data warehouses: The enterprise data warehouse
contains corporate-wide information integrated from multiple operational data sources
for consolidated data analysis. Typically it is composed of several subject areas, such
as customers, products, and sales, and is used for both tactical and strategic decision
making.
2. Data marts: Data marts contain a subset of corporate-wide data
that is built for use by an individual department or division of an organization. Unlike
the enterprise data warehouse, data marts are often built from the bottom up by
departmental resources for a specific support application or group of users. Data marts
contain summarized and often detailed data about a subject area.
DATA WAREHOUSE SCHEMAS:
A multidimensional data model identifies the dimensions, their hierarchies, the
measure functions, etc., for the design of a data cube. The data cube itself is realized
in the design phase, for which various schemas are employed.
1. Star schema:
It is a modeling paradigm in which the data warehouse contains a single large fact table
and a set of smaller dimension tables, one for each dimension.
[Figure: star schema. A central fact table (Dim1-key, Dim2-key, Dim3-key, Summary)
is linked to three dimension tables: Dim1 table (Dim1 attributes), Dim2 table
(Dim2 attributes), and Dim3 table (Dim3 attributes).]
Fact table:
It contains detailed summary data.
Each tuple consists of a foreign key to each dimension table.
Each tuple corresponds to only one tuple in each dimension table.
Dimension table:
It consists of columns that correspond to the attributes of the dimension.
One tuple in a dimension table may correspond to more than one tuple
in the fact table.
A 1:N relationship exists between each dimension table and the fact table.
The star schema is easy to understand, and hierarchies are easy to define.
It reduces the number of physical joins and is easy to maintain.
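A star schema can be sketched with Python's built-in sqlite3 module. The table names, columns, and sample rows below are hypothetical and chosen only to show one fact table holding foreign keys to denormalized dimension tables; this is an illustration, not a production design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one per dimension, denormalized (category stored inline).
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT);

-- Fact table: one foreign key per dimension plus the measure.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    amount      REAL
);

INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_store   VALUES (1, 'Nellore'), (2, 'Chennai');
INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 2, 15.0), (2, 1, 40.0);
""")

# A typical decision-support query: one join per dimension that is needed.
totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

Each row of `fact_sales` points to exactly one row in each dimension table, while one product row is referenced by several fact rows, which is the 1:N relationship described above.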
2. Snowflake schema:
It consists of a single fact table and multiple dimension tables. The difference between
the star schema and the snowflake schema is that in the star schema the dimension tables are
denormalized, whereas in the snowflake schema they are normalized.
Advantages of the normalized dimension tables:
Easier to maintain.
Saves storage space.
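The normalization that distinguishes a snowflake schema can be sketched the same way. In this hypothetical example the product category is moved out of the product dimension into its own table, so each category name is stored once; the table and column names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized dimension: the category is stored once in its own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);

INSERT INTO dim_category VALUES (1, 'Stationery');
INSERT INTO dim_product  VALUES (1, 'Pen', 1), (2, 'Pencil', 1);
INSERT INTO fact_sales   VALUES (1, 10.0), (2, 5.0), (1, 2.5);
""")

# The same kind of query now needs one extra join to reach the category name.
snowflake_totals = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_key  = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category
""").fetchall()
```

Renaming a category now touches a single row, which is the maintenance and storage benefit; the cost is the additional join.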
[Figure: a fact table linked to the Dimension1, Dimension2, and Dimension3 tables.]
Microsoft Data Warehousing Framework:
The goal of the data warehousing framework is to simplify the design, implementation,
and management of data warehousing solutions. The framework
describes the relationships between the various components used in the process of
building, using, and managing a data warehouse.
The core of the Microsoft framework is a set of enabling technologies comprising
the data transport layer and an integrated data repository. Operational data must pass
through a cleaning and transformation stage before being placed into the data marts or
data warehouse, so that it conforms to the decisions laid out during the design stage.
End-user tools, including desktop productivity products, specialized analysis
products, and custom programs, are used to gain access to the information in the data
warehouse. Ideally, user access is through a directory facility that enables the user
to search for appropriate and relevant data to resolve business questions, and that provides a
layer of security between the users and the back-end systems. Finally, a variety of tools
come into play for the management of the data warehouse environment, such as
scheduling repeated tasks and managing multiserver networks.
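[Figure: Microsoft Data Warehousing Framework. Data flows from operational sources
through data transformation/cleaning into the data marts or data warehouse, and
reaches the end-user tools via the information directory. A repository (persistent
shared metadata) and data warehouse management underlie the components: Data
Warehouse/Data Mart Design, Schema, Transform, Schedule, Repl, Info, Publish,
and OLAP. The stages are labeled Building and Using.]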
The Microsoft Repository provides the integration point for the metadata shared by the
various tools used in the data warehousing process. Shared metadata allows for the
transparent integration of multiple tools from a variety of vendors, without the
need for specialized interfaces between each of the products.
Data mining
Data mining, or knowledge discovery in databases, is the nontrivial extraction of implicit,
previously unknown, and potentially useful information from data. Data
mining is the search for relationships and global patterns that exist in large databases
but are hidden among the vast amount of data.
WORKING PROCEDURE:
Data mining software analyzes relationships and patterns in stored transaction data
based on open-ended user queries.
Four types of relationships are generally sought:
Classes: Stored data is used to locate data in predetermined groups.
Clusters: Data items are grouped according to logical relationships or consumer
preferences.
Associations: Data can be mined to identify associations.
Sequential patterns: Data is mined to anticipate behaviour patterns and trends.
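As an illustration of the association type of relationship, the sketch below counts how often pairs of items occur together in a set of transactions and derives the usual support and confidence measures. The transactions and the helper names `support` and `confidence` are hypothetical, and the brute-force pair counting is a simple stand-in, not an optimized algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "jam"},
]

# How many baskets contain each item, and each (sorted) pair of items.
item_counts = Counter(item for basket in transactions for item in basket)
pair_counts = Counter(
    pair for basket in transactions for pair in combinations(sorted(basket), 2)
)


def support(pair):
    """Fraction of all transactions that contain both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = pair_counts[tuple(sorted((antecedent, consequent)))]
    return both / item_counts[antecedent]
```

For example, `confidence("butter", "bread")` measures how reliably a basket with butter also contains bread, which is the kind of association a retailer might look for.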
Major steps:
Extract, transform, and load transaction data onto the data warehouse system.
Store and manage the data in a multidimensional database system.
Provide data access to business analysts and information technology professionals.
Analyze the data with application software.
Present the data in a useful format, such as a graph or table.
Techniques in data mining:
1. Artificial neural networks: Non-linear predictive models that learn through
training and resemble biological neural networks in structure.
2. Decision trees: Tree-shaped structures that represent sets of decisions. These
decisions generate rules for the classification of a dataset.
3. Genetic algorithms: Optimization techniques that use processes such as genetic
combination, mutation, and natural selection in a design based on the concepts of
evolution.
4. Rule induction: The extraction of useful if-then rules from data based on
statistical significance.
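To make rule induction concrete, here is a minimal sketch of the simple OneR method, chosen here only for brevity and not named in the text above: for each attribute it builds if-then rules that map every attribute value to its most frequent class, then keeps the attribute whose rules make the fewest errors. The records and the `one_r` function are hypothetical, and error counting stands in for a proper statistical significance test.

```python
from collections import Counter, defaultdict

# Hypothetical training records: (attributes, class label).
records = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
]


def one_r(data):
    """Return (attribute, rules, errors) for the attribute whose
    value -> majority-class rules misclassify the fewest records."""
    best = None
    for attribute in sorted(data[0][0]):
        # Count class labels for each value of this attribute.
        by_value = defaultdict(Counter)
        for attributes, label in data:
            by_value[attributes[attribute]][label] += 1
        # One if-then rule per value: predict the majority class.
        rules = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        errors = sum(
            sum(counts.values()) - counts[rules[value]]
            for value, counts in by_value.items()
        )
        if best is None or errors < best[2]:
            best = (attribute, rules, errors)
    return best


attribute, rules, errors = one_r(records)
```

On this data the selected rules read "if outlook is sunny then play" and "if outlook is rainy then stay". A decision tree generalizes the idea by splitting again on further attributes within each branch.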
Data warehouse with data mining:
Data mining: As is well known, in mining, enormous quantities of debris have to
be removed before diamonds or gold can be found. The analogy that, with a computer,
you can automatically find the one 'information diamond' among the tons of data
debris in your database is of course very attractive.
Integrating data mining into a decision support system is very helpful. The
sole function of a data warehouse is to supply the information needed to make adequate
decisions. In some cases you can use standard SQL tools for decision support, but if
you want to compare millions of records and do not know exactly the type of
information you require, or if you want to find hidden data, then you have to turn to
data mining. In many cases you will find that you need a separate computer for data
mining; trying to mine operational data is almost impossible, because there are
different applications with different types of attributes and different data types, but no
historical data. With a data warehouse this problem does not exist, because all the information
has been transferred from the operational database to the data warehouse.
Furthermore, in many cases you can clean the data before commencing data mining.
[Figure: The relationship between operational data, a data warehouse, and data marts.
Labels: operational data; extracts from several databases; data warehouse; data marts.]
Client/Server and data warehousing:
Over the past few years it has proved very difficult to build effective decision
support systems because the techniques available were not able to support the end
user satisfactorily. End users would ideally like to have available all kinds of
techniques, such as GUIs, statistical techniques, windowing mechanisms, and
visualization techniques, so that they can easily access the data being sought. This
means that a great deal of local computing power is needed at each workstation, and
the client/server technique is the solution to this problem. Client/server involves
distributing the software over several computers and creating an environment in which
it appears to each end user that they are working on just one system. The heavy load of the GUI
or other visual techniques can be processed on the local machines, and all the
database tasks handled by a dedicated database server. In this way the database server
can be completely optimized for the database. In some cases you can buy special
databases that operate with a specific type of hardware. With client/server you only have to
change the piece of software that is related to the end user; the other applications do not
require alteration. Of all the techniques currently available on the market, client/server
represents the best choice for building a data warehouse.
APPLICATIONS:
Data warehousing:
a. Sales and marketing analysis across many industries.
b. Inventory turn and product tracking in manufacturing.
c. Profitable lane or driver risk analysis in transportation.
d. Claims analysis or fraud detection in insurance.
Data mining:
Retail/Marketing: Identifying buying patterns from customers.
Banking: Detecting patterns of fraudulent credit card use.
Healthcare:
1. Identifying the behaviour of the risky customer.
2. Identifying successful medical therapies for different illnesses.
Conclusion:
Getting the right information to the right people at the right time is the key to making
the right decisions. The data warehouse provides the path to this goal, and data mining
is applied to the data it holds.
Bibliography:
1. Pieter Adriaans and Dolf Zantinge, Data Mining.
2. Efrem G. Mallach, Decision Support and Data Warehouse Systems.
Contact Address:
D. Ajith kumar ,
01711A1201, IV/IV CSIT
NARAYANA ENGG. COLLEGE,
NELLORE.
Mail: [email protected]