Importance of Data Mining
Preview:
DESCRIPTION
Data mining focuses on extraction of information from a large set of data and transforms it into an easily interpretable structure for further use.
Citation preview
- 1. Follow Us:1 Importance of Data Mining in IT Industry
- 2. Follow Us:2 Data , Data everywhere..
- 3. Follow Us:3 Introduction What is Data? Facts and statistics
collected together for reference or analysis. As per Moores Law,
The information density on silicon integrated circuits double every
18 to 24 months.
- 4. Follow Us:4 There has been a huge increase in the amount of
data being stored in database over past twenty years. All that user
wants is more sophisticated information. This surges up the demand
of Data mining.
- 5. Follow Us:5 What is Data mining? Data mining refers to
extracting or mining Knowledge from large amount of data. - By
Jiawei Han, Micheline Kamber,
- 6. Follow Us:6 Data mining focuses on extraction of information
from a large set of data and transforms it into an easily
interpretable structure for further use. As data is growing at very
remarkable rate, there comes a need to analyze large, complex and
information rich data sets to gain the hidden information. This may
result into greater customer satisfaction and remarkable turn over
for the firm.
- 7. Follow Us: Why Data mining? 7 I am not able to find the data
I need. (Data is dispersed over the network) The data I have access
to is poorly documented. (Proper information is missing) I am not
able to use the data I have. (Unexpected results)
- 8. Follow Us: What is Data Warehouse? 8 Data Warehouse is an
integrated, subject oriented and time-varying collection of data
that is used primarily in organizational decision making. -Bill
Inmon, Building the Data Warehouse 1996 Data Warehousing: Data
Warehousing is a process of managing data from various sources in
order of answering the Business questions.
- 9. Follow Us:9 Examples of Data warehousing: Wal-Mart Terabytes
(10^12 bytes) Geographic Information System Peta bytes(10^15 bytes)
National Medical Records Extra bytes (10^18 bytes) Weather Images-
Zettabyte(10^21 bytes) Intelligent video agency- Yottabyte (10^24
bytes)
- 10. Follow Us: How data mining works with warehouse data? 10
Data warehouse avails enterprise with memory. Data warehouse avails
enterprise with intelligence.
- 11. Follow Us: Example of Data mining in IT: 11 A company that
offers online services in several countries faces a problem in
times of loading the homepage. Even IT department employees could
not solve the bug. Based on customer feedback and last mile
measurements, company gets to know that IT services are suffering
from insufficient time for loading the homepage for some
users.
- 12. Follow Us:12 Management has the high interest in solving
the problem because the response time of the site directly affects
the online sales, IT department has already looked in to log files
with the help of mining tool. It has been discovered that theres no
such single type of typical problem. Instead data mining program
discovers that there are various problem clusters.
- 13. Follow Us:13 The patterns for this problem is determined by
a combination of country code , operating system, and the version
of a browser that the customers use. One another pattern shows that
the same bad performance in loading times of the website is caused
by a different problem.
- 14. Follow Us:14 Speed of internet and cookie settings also
lead to greater time for loading the site. This is how data mining
helps in indentifying the problem and hence solving the problem
becomes quick. This is just one example of how data mining can be
so useful.
- 15. Follow Us:15 Data mining Models and Tasks Data mining is
widely divided into two parts: Predictive Data mining Descriptive
Data mining
- 16. Follow Us:16
- 17. Follow Us:17 Predictive Data mining The objective of
predictive tasks is to use the values of some variable to predict
the values of other variable. Ex: Web mining is used by the online
marketers to predict the purchase by online user on a website.
- 18. Follow Us:18 Classification is used to map data in a
predefined groups. Regression maps a data item to a real valued
prediction variable. Clustering forms a similar data together.
Summarization is used to map data in a subsets. Link Analysis
defines relationships among data.
- 19. Follow Us:19 Descriptive Data mining The objective of
descriptive tasks is to find human readable patterns which
describes the relationships between data.
- 20. Follow Us:20 Architectures of a data mining system There
are four Data mining architecture: I. No- Coupling II. Loose
Coupling III.Semi tight Coupling IV.Tight coupling
- 21. Follow Us:21 In this architecture, data mining system
doesnt use any functionality of a database or data warehouse
system. Data is retrieved from data sources like file system and
processed using data mining algorithms which are stored into file
system. This architecture is considered as a poor architecture for
data mining system as it does not take any advantages of database
or data warehouse. However it is used for simple data mining
processes No-coupling:
- 22. Follow Us:22 The loose coupling data mining system uses
database or data warehouse for data retrieval. In this
architecture, data mining system retrieves data from database or
data warehouse, processes data using data mining algorithms and
stores the result in those systems. Loose coupling architecture is
for memory-based data mining system which does not require high
scalability and high performance. Loose Coupling:
- 23. Follow Us:23 In semi-tight coupling data mining
architecture, it not only links it to database or data warehouse
system, but it also uses several features of database or data
warehouse systems which perform some data mining tasks like sorting
and indexing etc. Moreover the intermediate result can also be
stored in database or data warehouse system for better performance.
Semi-tight Coupling:
- 24. Follow Us:24 Tight-coupling: In this architecture, database
or data warehouse is treated as an information retrieval component.
Tight-coupling data mining architecture provides scalability, high
performance and integrated information.
- 25. Follow Us:25 Three tiers of Tight-coupling Data mining
Architecture: i. Data Layer ii. Data mining application layer
iii.Front-end Layer
- 26. Follow Us:26 Data layer can be database or data warehouse
systems which is an interface for all data sources. It stores
results in data layer so it can be presented to end-user in form of
reports. Datalayer
- 27. Follow Us:27 This layer is used to retrieve data from
database. And by performing transformation routine, we can get data
into desired format. Then data is processed using various data
mining algorithms. Data mining application layer
- 28. Follow Us:28 Front-end layer provides visceral and user
friendly interface for end-user to interact with data mining
system. The result of data mining is presented in visualization
form in front-end layer. Front-end layer:
- 29. Follow Us:29 Although data mining is still in its early
stage; companies in a wide range of industries such as retail,
heath care, manufacturing, finance, and transportation are already
using data mining techniques to take advantage of historical data.
Data mining helps analysts to recognize significant facts,
relationships, trends, patterns and anomalies which might go
unnoticed otherwise. Application Area of Data mining:
- 30. Follow Us:30 Data mining uses pattern recognition
technologies and mathematical techniques to sift through warehoused
information. In business, data mining is useful for discovering
patterns and relationships in data to help make better decisions.
Data mining helps in developing smarter marketing campaigns and to
predict customer loyalty.
- 31. Follow Us:31 Advantages of Data Mining Marketing /Retail :
Data mining helps marketing companies to build models based on
historical data which will precisely predict responders to the new
marketing campaigns. Marketers will have appropriate approach for
targeted customers. Data mining helps retail companies as well. By
using market basket analysis, a store can have an appropriate
arrangement in such a way that customers can purchase frequent
buying products together with pleasant. It also helps the retail
companies to offer certain discounts which will attract more
customers.
- 32. Follow Us:32 Finance / Banking By building a model from
historical customers data of loans, the bank officials and
financial institution can determine good and bad loans. Data mining
also helps banks to detect fraudulent credit card
transactions.
- 33. Follow Us:33 Manufacturing Data mining is useful in
operational engineering data which can detect faulty equipments and
determines optimal control parameters. Data mining can determine
the range of control parameters which leads to the production of
perfect product. Hence optimal control parameters can provide the
desired quality.
- 34. Follow Us:34 Governments Data mining helps government
agency to analyze records of financial transaction which will help
in building patterns that can detect money laundering or criminal
activities.
- 35. Follow Us:35 Market segmentation - Data mining helps to
identify the common characteristics of customers who buy the same
products from your company. Customer anticipation - It helps to
predict which customers may leave your company and go to a
competitor. Fraud detection - It indentifies which transactions are
most likely to be fraudulent. Specific use of data mining
include:
- 36. Follow Us:36 Direct marketing - Direct marketing identifies
which prospects should be included to obtain the highest response
rate. Interactive marketing - It is useful for predicting what each
user on a Web site is most likely interested in seeing. Market
basket analysis - It helps to understand what products or services
are commonly purchased together. Trend analysis - Trend analysis
identifies the difference between a typical customer this month and
last.
- 37. Follow Us:37 Disadvantages of data mining Privacy Issues
The internet is booming with social networks, e- commerce, blogs
etc, the concerns about the personal privacy has been increasing.
This worries the users as the information might be collected and
used in unethical way which can potentially cause a lot of
troubles. Businesses collect the information of its users for
setting up the marketing strategies but there are chances that
business might be taken by other firms or gets shut down and thats
where a concern of misusing or leaking the personal information
arises.
- 38. Follow Us:38 Security issues Security is the biggest
concern in data mining. Businesses own all the information of their
employees which even includes personal and financial information,
there are the chances of misusing data by hackers and which cause
serious trouble to the organization and its employees.
- 39. Follow Us:39 Misuse of information/inaccurate information
The information collected by the data mining may be exploited by
unethical people or businesses in order to take benefits of
vulnerable people. Data mining techniques is not totally accurate.
So inaccurate information may lead the wrong decision-making which
may cause serious consequences.
- 40. Follow Us:40 Challenges of Data Mining Scalability:
Scalable techniques are needed for handling massive size of
datasets that are now being created. Poor efficiency: Such large
datasets require the use of ecient methods for storing, indexing,
and retrieving data from secondary or even tertiary storage
systems. In order to perform the data mining task in timely manner,
data mining techniques are needed.
- 41. Follow Us:41 Complexity: Such techniques can dramatically
increase the size of the datasets which can be handled and for that
it requires new designs and algorithms. Dimensionality: Some
domains have number of dimensions which are very large and makes
the data analyzing difficult. That is called as 'curse of
dimensionality. For example, bioinformatics.
- 42. Follow Us:42