Click here to load reader

Importance of Data Mining

Embed Size (px)

DESCRIPTION

Data mining focuses on extraction of information from a large set of data and transforms it into an easily interpretable structure for further use.

Citation preview

  • 1. Follow Us:1 Importance of Data Mining in IT Industry
  • 2. Follow Us:2 Data , Data everywhere..
  • 3. Follow Us:3 Introduction What is Data? Facts and statistics collected together for reference or analysis. As per Moores Law, The information density on silicon integrated circuits double every 18 to 24 months.
  • 4. Follow Us:4 There has been a huge increase in the amount of data being stored in database over past twenty years. All that user wants is more sophisticated information. This surges up the demand of Data mining.
  • 5. Follow Us:5 What is Data mining? Data mining refers to extracting or mining Knowledge from large amount of data. - By Jiawei Han, Micheline Kamber,
  • 6. Follow Us:6 Data mining focuses on extraction of information from a large set of data and transforms it into an easily interpretable structure for further use. As data is growing at very remarkable rate, there comes a need to analyze large, complex and information rich data sets to gain the hidden information. This may result into greater customer satisfaction and remarkable turn over for the firm.
  • 7. Follow Us: Why Data mining? 7 I am not able to find the data I need. (Data is dispersed over the network) The data I have access to is poorly documented. (Proper information is missing) I am not able to use the data I have. (Unexpected results)
  • 8. Follow Us: What is Data Warehouse? 8 Data Warehouse is an integrated, subject oriented and time-varying collection of data that is used primarily in organizational decision making. -Bill Inmon, Building the Data Warehouse 1996 Data Warehousing: Data Warehousing is a process of managing data from various sources in order of answering the Business questions.
  • 9. Follow Us:9 Examples of Data warehousing: Wal-Mart Terabytes (10^12 bytes) Geographic Information System Peta bytes(10^15 bytes) National Medical Records Extra bytes (10^18 bytes) Weather Images- Zettabyte(10^21 bytes) Intelligent video agency- Yottabyte (10^24 bytes)
  • 10. Follow Us: How data mining works with warehouse data? 10 Data warehouse avails enterprise with memory. Data warehouse avails enterprise with intelligence.
  • 11. Follow Us: Example of Data mining in IT: 11 A company that offers online services in several countries faces a problem in times of loading the homepage. Even IT department employees could not solve the bug. Based on customer feedback and last mile measurements, company gets to know that IT services are suffering from insufficient time for loading the homepage for some users.
  • 12. Follow Us:12 Management has the high interest in solving the problem because the response time of the site directly affects the online sales, IT department has already looked in to log files with the help of mining tool. It has been discovered that theres no such single type of typical problem. Instead data mining program discovers that there are various problem clusters.
  • 13. Follow Us:13 The patterns for this problem is determined by a combination of country code , operating system, and the version of a browser that the customers use. One another pattern shows that the same bad performance in loading times of the website is caused by a different problem.
  • 14. Follow Us:14 Speed of internet and cookie settings also lead to greater time for loading the site. This is how data mining helps in indentifying the problem and hence solving the problem becomes quick. This is just one example of how data mining can be so useful.
  • 15. Follow Us:15 Data mining Models and Tasks Data mining is widely divided into two parts: Predictive Data mining Descriptive Data mining
  • 16. Follow Us:16
  • 17. Follow Us:17 Predictive Data mining The objective of predictive tasks is to use the values of some variable to predict the values of other variable. Ex: Web mining is used by the online marketers to predict the purchase by online user on a website.
  • 18. Follow Us:18 Classification is used to map data in a predefined groups. Regression maps a data item to a real valued prediction variable. Clustering forms a similar data together. Summarization is used to map data in a subsets. Link Analysis defines relationships among data.
  • 19. Follow Us:19 Descriptive Data mining The objective of descriptive tasks is to find human readable patterns which describes the relationships between data.
  • 20. Follow Us:20 Architectures of a data mining system There are four Data mining architecture: I. No- Coupling II. Loose Coupling III.Semi tight Coupling IV.Tight coupling
  • 21. Follow Us:21 In this architecture, data mining system doesnt use any functionality of a database or data warehouse system. Data is retrieved from data sources like file system and processed using data mining algorithms which are stored into file system. This architecture is considered as a poor architecture for data mining system as it does not take any advantages of database or data warehouse. However it is used for simple data mining processes No-coupling:
  • 22. Follow Us:22 The loose coupling data mining system uses database or data warehouse for data retrieval. In this architecture, data mining system retrieves data from database or data warehouse, processes data using data mining algorithms and stores the result in those systems. Loose coupling architecture is for memory-based data mining system which does not require high scalability and high performance. Loose Coupling:
  • 23. Follow Us:23 In semi-tight coupling data mining architecture, it not only links it to database or data warehouse system, but it also uses several features of database or data warehouse systems which perform some data mining tasks like sorting and indexing etc. Moreover the intermediate result can also be stored in database or data warehouse system for better performance. Semi-tight Coupling:
  • 24. Follow Us:24 Tight-coupling: In this architecture, database or data warehouse is treated as an information retrieval component. Tight-coupling data mining architecture provides scalability, high performance and integrated information.
  • 25. Follow Us:25 Three tiers of Tight-coupling Data mining Architecture: i. Data Layer ii. Data mining application layer iii.Front-end Layer
  • 26. Follow Us:26 Data layer can be database or data warehouse systems which is an interface for all data sources. It stores results in data layer so it can be presented to end-user in form of reports. Datalayer
  • 27. Follow Us:27 This layer is used to retrieve data from database. And by performing transformation routine, we can get data into desired format. Then data is processed using various data mining algorithms. Data mining application layer
  • 28. Follow Us:28 Front-end layer provides visceral and user friendly interface for end-user to interact with data mining system. The result of data mining is presented in visualization form in front-end layer. Front-end layer:
  • 29. Follow Us:29 Although data mining is still in its early stage; companies in a wide range of industries such as retail, heath care, manufacturing, finance, and transportation are already using data mining techniques to take advantage of historical data. Data mining helps analysts to recognize significant facts, relationships, trends, patterns and anomalies which might go unnoticed otherwise. Application Area of Data mining:
  • 30. Follow Us:30 Data mining uses pattern recognition technologies and mathematical techniques to sift through warehoused information. In business, data mining is useful for discovering patterns and relationships in data to help make better decisions. Data mining helps in developing smarter marketing campaigns and to predict customer loyalty.
  • 31. Follow Us:31 Advantages of Data Mining Marketing /Retail : Data mining helps marketing companies to build models based on historical data which will precisely predict responders to the new marketing campaigns. Marketers will have appropriate approach for targeted customers. Data mining helps retail companies as well. By using market basket analysis, a store can have an appropriate arrangement in such a way that customers can purchase frequent buying products together with pleasant. It also helps the retail companies to offer certain discounts which will attract more customers.
  • 32. Follow Us:32 Finance / Banking By building a model from historical customers data of loans, the bank officials and financial institution can determine good and bad loans. Data mining also helps banks to detect fraudulent credit card transactions.
  • 33. Follow Us:33 Manufacturing Data mining is useful in operational engineering data which can detect faulty equipments and determines optimal control parameters. Data mining can determine the range of control parameters which leads to the production of perfect product. Hence optimal control parameters can provide the desired quality.
  • 34. Follow Us:34 Governments Data mining helps government agency to analyze records of financial transaction which will help in building patterns that can detect money laundering or criminal activities.
  • 35. Follow Us:35 Market segmentation - Data mining helps to identify the common characteristics of customers who buy the same products from your company. Customer anticipation - It helps to predict which customers may leave your company and go to a competitor. Fraud detection - It indentifies which transactions are most likely to be fraudulent. Specific use of data mining include:
  • 36. Follow Us:36 Direct marketing - Direct marketing identifies which prospects should be included to obtain the highest response rate. Interactive marketing - It is useful for predicting what each user on a Web site is most likely interested in seeing. Market basket analysis - It helps to understand what products or services are commonly purchased together. Trend analysis - Trend analysis identifies the difference between a typical customer this month and last.
  • 37. Follow Us:37 Disadvantages of data mining Privacy Issues The internet is booming with social networks, e- commerce, blogs etc, the concerns about the personal privacy has been increasing. This worries the users as the information might be collected and used in unethical way which can potentially cause a lot of troubles. Businesses collect the information of its users for setting up the marketing strategies but there are chances that business might be taken by other firms or gets shut down and thats where a concern of misusing or leaking the personal information arises.
  • 38. Follow Us:38 Security issues Security is the biggest concern in data mining. Businesses own all the information of their employees which even includes personal and financial information, there are the chances of misusing data by hackers and which cause serious trouble to the organization and its employees.
  • 39. Follow Us:39 Misuse of information/inaccurate information The information collected by the data mining may be exploited by unethical people or businesses in order to take benefits of vulnerable people. Data mining techniques is not totally accurate. So inaccurate information may lead the wrong decision-making which may cause serious consequences.
  • 40. Follow Us:40 Challenges of Data Mining Scalability: Scalable techniques are needed for handling massive size of datasets that are now being created. Poor efficiency: Such large datasets require the use of ecient methods for storing, indexing, and retrieving data from secondary or even tertiary storage systems. In order to perform the data mining task in timely manner, data mining techniques are needed.
  • 41. Follow Us:41 Complexity: Such techniques can dramatically increase the size of the datasets which can be handled and for that it requires new designs and algorithms. Dimensionality: Some domains have number of dimensions which are very large and makes the data analyzing difficult. That is called as 'curse of dimensionality. For example, bioinformatics.
  • 42. Follow Us:42