DATA MINING: AN INTRODUCTION TO DATA MINING TECHNIQUES AND ITS APPLICATIONS


Outline

Introduction

Motivation for Data Mining

What is Data Mining?

Data Mining Techniques

Data Mining Applications

Motivation for Text Mining

Text Mining Applications

Software

NAHED TAHA DATA MINING: AN INTRODUCTION


Introduction… BIG DATA Everywhere!

Big Data is a term that describes large volumes of data, both structured and unstructured, that traditional data management systems have not been able to handle. To handle this data, analytics is required.

Big Data

Big Analytics

Big Insights

Big Data can be described by the following characteristics (4 V’s):

Volume - too Big - Terabytes and more of Credit Card Transactions, Web Usage Data.

Velocity - too Fast - Sensor Data, Live Web Traffic, Mobile Phone Usage, GPS Data.

Variety - too Complex - Truly Unstructured Data such as Social Media, Customer Reviews.

Veracity - Messiness or Trustworthiness - Social Media Content.

There is another V to take into account when looking at Big Data: Value! Having access to big data is no good unless we can turn it into value.

Value


Introduction… “Search” vs. “Discover”

                           Search (Goal-Oriented)    Discover (Opportunistic)
Structured Data            Data Retrieval            Data Mining
Unstructured Data (Text)   Information Retrieval     Text Mining


Motivation for Data Mining

Growth Rate of Data.

Growth of Internet (Arena for Information Generation).

Information and communications technology (ICT) produces a flood of data. These data represent traces of almost all kinds of activities of individuals, enabling an entirely new scientific approach for social analysis. (BIG DATA: New Era of Computational Social Sciences)

The development of computer capacities makes it possible to handle the data deluge and to invent models that reflect the diversity and complexity of the society.


What is Data Mining?

Many Definitions

Non-trivial extraction of implicit, previously unknown and potentially useful information from data.

Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules.

Discovering meaningful correlations, patterns and trends by sifting through large amounts of data stored in repositories. Data mining employs pattern recognition technologies, as well as statistical and mathematical techniques.

There are many other terms carrying a similar or slightly different meaning to DM, such as Knowledge Mining from Databases, Knowledge Extraction, Data or Pattern Analysis, and Business Intelligence (BI).


DM is a major and growing area of research and applications across the social sciences and humanities that intersects with many disciplines such as Artificial Intelligence (AI), Database Systems, Statistics, Visualization, and High-Performance and Parallel Computing.


What is Data Mining?

Knowledge Discovery in Databases (KDD) Process

Data Mining is the core of the Knowledge Discovery in Databases (KDD) process, involving the inferring of algorithms that explore the data, develop the model, and discover previously unknown patterns.


Key Data Mining Techniques

Association Rules

Clustering Techniques

Decision Trees

Neural Nets


Association Rules Mining

Which products are frequently bought together? Analyzing the sales behavior of customers in order to identify sets of products frequently bought together.

Proposed by Agrawal et al. in 1993.

The idea comes from Market Basket Analysis (MBA).

Study of “what goes with what.”

Finding frequent patterns, associations, correlations, or causal structures among a set of items or objects in transaction databases, relational databases, and other information repositories.

Association rules take the form of IF-THEN rules (if item A is present in a transaction, then item B will be present as well).

Applications of Association Rules:

Product Assortment Optimization, Fraud Detection, Sequence Discovery, Inventory Control, Cross-Selling, Healthcare.


Association Rules Mining

Figure: Graphical Representation of Association Rules Using RapidMiner (Online Statistics Courses Purchase Data at Statistics.com). Each rule is labeled as Rule # (Support, Confidence). Thresholds: Min Support: 0.5, Min Confidence: 0.5.

Performance Measures:

Support: Number of transactions that include both the antecedent and consequent item sets; expressed as a proportion of all transactions, P(A,B).

Confidence: Ratio of the number of transactions that include all antecedent and consequent item sets to the number of transactions that include all the antecedent item sets, P(B|A).

Lift Ratio: Confidence of the rule divided by the confidence assuming independence of consequent from antecedent, P(B|A)/P(B). Indicates how efficient the rule is in finding the consequent, compared to random selection.
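The three measures can be made concrete with a short sketch. The Python below is illustrative only: the course names and the five transactions are hypothetical (they are not the Statistics.com data), and support is computed as a proportion so that it lines up with P(A,B).

```python
def support(transactions, itemset):
    """Proportion of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """P(B|A): support of A and B together divided by support of A."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """P(B|A) / P(B): confidence relative to the consequent's base rate."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Hypothetical purchase data: each transaction is one customer's set of courses.
transactions = [
    frozenset({"intro", "regression"}),
    frozenset({"intro", "regression"}),
    frozenset({"intro", "regression", "forecasting"}),
    frozenset({"forecasting"}),
    frozenset({"forecasting", "survey"}),
]

rule = ({"intro"}, {"regression"})  # IF intro THEN regression
print(support(transactions, rule[0] | rule[1]))  # 0.6
print(confidence(transactions, *rule))           # 1.0
print(round(lift(transactions, *rule), 2))       # 1.67
```

With these made-up transactions the rule clears a minimum support and confidence of 0.5, and its lift above 1 says the consequent is found more often than random selection would find it.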


Association Rules Mining

The larger the lift ratio, the greater the strength of the association.

Figure: Graphical Representation of Association Rules Using RapidMiner (Online Statistics Courses Purchase Data at Statistics.com). Thresholds: Min Support: 0.5, Min Confidence: 0.5, Min Lift Ratio: 1.


Association Rules Mining

Application of Association Rules in e-Commerce Recommendation Systems (e.g. Amazon.com’s Online Shopping System)

Recommendations to customers are based on past purchases and what other customers are purchasing.
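As a rough illustration of recommending from past purchases and other customers' baskets, the sketch below ranks unseen items by how often they were bought alongside items the customer already has. It is a simple co-purchase heuristic under assumed data, not a description of how any real shopping site works; the function name and the product names are hypothetical.

```python
from collections import Counter


def recommend(basket, other_baskets, top_n=2):
    """Rank items the customer does not have by how often they co-occur
    with the customer's items in other customers' baskets."""
    basket = set(basket)
    scores = Counter()
    for other in other_baskets:
        other = set(other)
        overlap = basket & other
        if not overlap:
            continue  # this basket shares nothing with the customer
        for item in other - basket:
            scores[item] += len(overlap)
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase histories of other customers.
history = [
    {"camera", "memory card", "tripod"},
    {"camera", "memory card"},
    {"laptop", "mouse"},
    {"camera", "camera bag"},
]

print(recommend({"camera"}, history))  # ['memory card', 'camera bag']
```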


Clustering Data Mining

Exploratory Technique.

Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups.

There are a number of clustering methods, including Partitional Clustering (k-means algorithm), Hierarchical Clustering, Density-based Clustering, etc.

Choice of method (or algorithm) depends on the type of data available and the nature and purpose of the application.

Objectives of Clustering: Taxonomy Description, Data Simplification, Relationship Identification, Outlier Detection.

Applications of Clustering DM:

Marketing, Information Retrieval, Land Use, Insurance, City Planning, Crime Detection.

Goal: Find high-quality clusters such that the inter-cluster similarity is low (inter-cluster distances are maximized) and the intra-cluster similarity is high (intra-cluster distances are minimized).
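A minimal k-means sketch shows how a partitional method works toward this goal by alternating between assigning points to the nearest centroid and moving each centroid to the mean of its members. The points are hypothetical, and starting from the first k points is an assumption made here for reproducibility; practical implementations usually initialize more carefully.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on a list of numeric tuples.

    Centroids start at the first k points (an assumption for reproducibility).
    Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D data with two visually separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```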


Clustering Data Mining

Figure (using ArcMap 10.3): Location of Property Crimes across several cities in the greater Los Angeles Area. Legend: Compton Law Enforcement, Lakewood Law Enforcement, Property Crime, Law Areas, Block Groups.


Clustering Data Mining

Mapping Crime Intensity: This map visualizes where property crime occurs more frequently in the two law enforcement areas. (Legend: Property Crimes Density, from Low Density to High Density.)

Where are property crime rates high? How many property crimes occurred in each block group? This map visualizes the rate of property crime per 100 crimes in each block group. (Legend: Property Crime Rate, in classes 38 to 200, > 200 to 320, > 320 to 455, > 455 to 625, > 625 to 1,000.)

GIS-based Spatial Cluster Analysis of Property Crimes


Clustering Data Mining

Hot-Spot Analysis: This map shows that the eastern corner of the Compton law enforcement area and the northwestern corner of the Lakewood law enforcement area have statistically significant higher rates of property crime. (Legend: Crime Hotspots, from Cold Spot at 99%, 95%, and 90% Confidence, through Not Significant, to Hot Spot at 90%, 95%, and 99% Confidence.)

Cluster and Outlier Analysis: This map distinguishes between a statistically significant cluster of high values (HH), cluster of low values (LL), outlier in which a high value is surrounded primarily by low values (HL), and outlier in which a low value is surrounded primarily by high values (LH). (Legend: Property Crime Rate, with classes Not Significant, High-High Cluster, High-Low Outlier, Low-High Outlier, Low-Low Cluster.)

GIS-based Spatial Cluster Analysis of Property Crimes
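The HH, LL, HL, and LH labels can be illustrated with a simplified quadrant classification: compare each area's value, and the average value of its neighbors, with the overall mean. This sketch is an assumption-laden toy. It skips the statistical significance testing that the maps rely on, and the block-group names, rates, and adjacency lists are hypothetical.

```python
def classify_quadrants(values, neighbors):
    """Label each area HH, LL, HL, or LH.

    First letter: is the area's own value above (H) or not above (L) the
    overall mean? Second letter: the same test for the mean of its neighbors.
    No significance test is applied in this simplified version.
    """
    overall = sum(values.values()) / len(values)
    labels = {}
    for area, value in values.items():
        nearby = [values[n] for n in neighbors[area]]
        neighbor_mean = sum(nearby) / len(nearby)
        own = "H" if value > overall else "L"
        around = "H" if neighbor_mean > overall else "L"
        labels[area] = own + around
    return labels


# Hypothetical property crime rates for six block groups.
rates = {"A": 900, "B": 850, "C": 100, "D": 120, "E": 800, "F": 150}
# Hypothetical adjacency: which block groups border which.
adjacent = {
    "A": ["B", "F"],
    "B": ["A", "F"],
    "C": ["D"],
    "D": ["C", "E"],
    "E": ["D"],
    "F": ["A", "B"],
}

print(classify_quadrants(rates, adjacent))
# {'A': 'HH', 'B': 'HH', 'C': 'LL', 'D': 'LL', 'E': 'HL', 'F': 'LH'}
```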


Decision Trees Mining

One of the most widely used techniques in DM.

Classification (Classification Trees) and Predictive (Regression Trees) DM Technique.

Also called Trees or CART.

Decision tree (DT) models consist of a set of rules (IF-THEN Rules) for dividing a large heterogeneous population into smaller, more homogeneous (mutually exclusive) groups with respect to a particular target. (Recursive Partitioning)

DT Generation Approach usually consists of Two Phases: Tree Construction and Tree Pruning.
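One step of recursive partitioning can be sketched as a search for the threshold that leaves the two resulting groups as homogeneous as possible. The sketch below uses Gini impurity as the homogeneity measure, which is an assumption (the slides do not name a criterion), and the income and acceptance values are hypothetical. Tree construction would repeat this step on each resulting group; pruning is not shown.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels (0.0 means fully homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n  # share of class 1
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the rule `value <= threshold`
    that minimizes the size-weighted impurity of the two groups."""
    best = (None, gini(labels))  # no split yet: impurity of the whole group
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical customers: income (in thousands) and loan acceptance (1 = acceptor).
income = [30, 45, 60, 85, 110, 150]
accepted = [0, 0, 0, 1, 1, 1]

print(best_split(income, accepted))  # (60, 0.0)
```

The result reads as an IF-THEN rule: IF income <= 60 THEN non-acceptor, ELSE acceptor.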

Figure (using RapidMiner): Best Pruned Classification Tree for The Loan Acceptance Data. This figure presents a tree for classifying bank customers who receive a loan offer as either acceptors (=1) or non-acceptors (=0), as a function of information such as their income, age, average credit card expenditure, mortgage, etc.

Accuracy: 95.07%
Classification Error: 4.93%


Decision Tree Mining

DT can also be used to estimate the value of a continuous target variable (Regression Tree). However, regression models and neural nets are generally more appropriate for estimation.

Regression trees operate in much the same fashion as classification trees.
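To show how a regression tree follows the same pattern, the sketch below picks a split for a continuous target by minimizing the sum of squared errors around each group's mean, and each group's mean would then serve as its prediction. The squared-error criterion is an assumption rather than something the slides specify, and the car ages and prices are hypothetical, not the Toyota Corolla data.

```python
def sse(ys):
    """Sum of squared errors of `ys` around their mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_regression_split(xs, ys):
    """Return (threshold, total_sse) for the rule `x <= threshold`
    that minimizes the combined squared error of the two groups."""
    best = (None, sse(ys))  # no split yet: error of the whole group
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = sse(left) + sse(right)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical used cars: age in months and price.
age_months = [12, 24, 36, 60, 72, 84]
price = [20000, 19000, 18000, 9000, 8500, 8000]

threshold, error = best_regression_split(age_months, price)
print(threshold)  # 36
```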

Advantages of Decision Tree:

Easy to use and, maybe even more important, also easy to understand even by non-experts.

Useful for data exploration and variable selection even when you plan to use a different technique to create the final model.

Applications of Decision Tree:

Credit Approval, Target Marketing, Medical Diagnosis, etc.

Figure (using XLMiner): Best Pruned Regression Tree for Toyota Corolla Prices. This figure presents a tree for predicting Toyota Corolla prices, as a function of information such as age, HP, KM, # of doors, automatic, fuel type, etc.


Neural Networks Mining

Classification and Predictive DM Technique.

Also called Artificial Neural Networks (ANN).

NNs incorporate the two fundamental components of biological neural nets: Neurons (nodes) and Synapses (weights).

The main strength of NNs is their high predictive performance. Their structure supports capturing very complex relationships between predictors and a response, which is often not possible with other classifiers.

Applications of Neural Nets:

Customer Relationship Management (CRM), Currency Market Trading, Bank Loan Approval, Bankruptcy Predictions, etc.

RapidMiner Output for Neural Net for The Loan Acceptance Data.

Confusion Matrix

                       True non-Acceptors   True Acceptors
Pred. non-Acceptors    1342                 61
Pred. Acceptors        8                    89

Accuracy: 95.40%
Classification Error: 4.60%
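The reported accuracy and classification error follow directly from the four counts in the confusion matrix. The short check below reproduces them; the recall figure for acceptors is an extra computed from the same counts, not a number reported on the slide.

```python
# Counts copied from the confusion matrix (rows are predictions, columns are truth).
true_negatives = 1342   # predicted non-acceptor, truly non-acceptor
false_negatives = 61    # predicted non-acceptor, truly acceptor
false_positives = 8     # predicted acceptor, truly non-acceptor
true_positives = 89     # predicted acceptor, truly acceptor

total = true_negatives + false_negatives + false_positives + true_positives
accuracy = (true_negatives + true_positives) / total
error = (false_negatives + false_positives) / total

# Extra measure, not on the slide: share of true acceptors the model found.
recall_acceptors = true_positives / (true_positives + false_negatives)

print(f"{accuracy:.2%}")          # 95.40%
print(f"{error:.2%}")             # 4.60%
print(f"{recall_acceptors:.2%}")  # 59.33%
```

The recall line is a reminder that a high overall accuracy can coexist with a noticeably lower hit rate on the smaller class.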

Figure: Neural network diagram (a "Black-Box") with Input Nodes, Hidden Nodes, and Output Nodes, connected by weights.

Parameters: Training Cycles: 100, Learning Rate: 0.5, Momentum: 0.2
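The roles of nodes, weights, learning rate, and momentum can be sketched in a few lines. This is a generic single-hidden-layer forward pass and a textbook momentum update, offered as an assumption about the general technique; it is not RapidMiner's implementation. Biases are omitted, the example weights are hypothetical, and only the default learning rate (0.5) and momentum (0.2) are taken from the parameters listed on the slide.

```python
import math


def sigmoid(x):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, output_weights):
    """One forward pass: inputs -> hidden nodes -> a single output node.

    hidden_weights holds one weight list per hidden node (biases omitted).
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)))
        for node_weights in hidden_weights
    ]
    output = sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))
    return hidden, output


def momentum_step(weight, gradient, previous_delta, learning_rate=0.5, momentum=0.2):
    """Gradient-descent update that also carries over part of the last change."""
    delta = -learning_rate * gradient + momentum * previous_delta
    return weight + delta, delta


# Hypothetical weights for two inputs, two hidden nodes, one output node.
hidden, output = forward([0.0, 0.0], [[0.1, 0.2], [0.3, 0.4]], [0.5, -0.5])
print(hidden, output)  # [0.5, 0.5] 0.5

new_weight, delta = momentum_step(weight=1.0, gradient=0.4, previous_delta=0.1)
```

Training would repeat the forward pass and weight updates over the data for the chosen number of training cycles.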


Text Mining

                           Search (Goal-Oriented)    Discover (Opportunistic)
Structured Data            Data Retrieval            Data Mining
Unstructured Data (Text)   Information Retrieval     Text Mining


Motivation for Text Mining

Text mining is well motivated, due to the fact that much of the world’s data can be found in free text form (newspaper articles, emails, literature, etc.). There is a lot of information available to mine.

Unstructured data will account for more than 80% of the world’s data (Source: Oracle Corporation).

Internet of Things


Text Mining

Also called Text Data Mining or Text Intelligent Analysis.

Aim: Extract interesting and non-trivial information and knowledge from unstructured textual data.

Uses methods and techniques from Machine Learning, Natural Language Processing, Linguistics, Statistics, etc.

Text mining tasks include keyword-based association analysis, text categorization, text clustering, concept or entity extraction, sentiment analysis, document summarization, and entity relation modeling.

Text Mining Applications:

Customer Profile Analysis, Analyzing Open-ended Survey Responses, Trend Analysis, Information Filtering and Routing, Event Tracks, News Stories Classification, Web Search, etc.

Figure (using VOSviewer): Term Co-Occurrence Map of the Journal of the American Society for Information Science and Technology, using a set of 2000 papers published from 2004-2015. In this map, colors are used to distinguish clusters of related terms. These clusters reflect the three main topics discussed in the publications: Bibliometrics and Scientometrics, Information Retrieval, and Information Science. (Data Source: Web of Science Website).

Circle: Occurrence Weight
Line Strength: Number of co-occurrences of the two terms that are connected by the line.
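The two quantities behind such a map, how often each term occurs and how often two terms occur together, can be counted with a few lines of Python. This is a plain document-level count under assumed data; the term lists are hypothetical, and it does not reproduce whatever weighting or clustering the mapping tool applies.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents):
    """Return (term_counts, pair_counts).

    term_counts: number of documents in which each term occurs.
    pair_counts: number of documents in which each unordered pair co-occurs.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for terms in documents:
        unique = sorted(set(terms))  # count each term once per document
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return term_counts, pair_counts


# Hypothetical term lists, one per paper.
docs = [
    ["citation", "impact", "journal"],
    ["citation", "journal"],
    ["query", "retrieval"],
    ["citation", "retrieval"],
]

term_counts, pair_counts = cooccurrence(docs)
print(term_counts["citation"])               # 3
print(pair_counts[("citation", "journal")])  # 2
```

In the map's terms, a count like `term_counts["citation"]` would drive circle size and a count like `pair_counts[("citation", "journal")]` would drive line strength.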


Text Mining

Knowledge Discovery in Textual Databases

Text Cleanup

Tokenization

Part of Speech Tagging

Word Sense Disambiguation

Semantic Structures
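The first two steps, text cleanup and tokenization, are simple enough to sketch with the standard library. The version below is a minimal illustration under stated assumptions: the tag-stripping rule, the token pattern, and the tiny stop-word list are all hypothetical choices, and the later steps (part of speech tagging, word sense disambiguation, semantic structures) need linguistic resources that are not shown.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "is"}


def clean(text):
    """Text cleanup: drop markup-like tags, lowercase, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Tokenization: split cleaned text into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text)


def preprocess(text):
    """Cleanup, tokenization, then stop-word removal."""
    return [token for token in tokenize(clean(text)) if token not in STOPWORDS]


print(preprocess("<p>Text Mining is the  discovery of Patterns in TEXT.</p>"))
# ['text', 'mining', 'discovery', 'patterns', 'text']
```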


Text Mining Applications

Detecting Patterns in 2012 US Elections News Coverage

Apply insights from several theories and techniques including: graph partitioning, centrality, assortativity, hierarchy and structural balance.

Using text mining techniques, the authors detect election-related articles, parse their content, extract the key actors in a narration and their relations, and form a network whose topology is then analyzed.

This is the first study in which political positions are automatically extracted and derived from a very large corpus of online news, generating a network that goes well beyond traditional word-association networks by means of richer linguistic analysis of texts.

Figure: Endorse/Oppose Network of Actors in 2012 U.S. Presidential Elections. Narrative Network of US Election 2012: nodes indicate noun phrases, links go from subject to object, node size reveals the frequency of mentions of an actor, and the green/red links imply positive/negative relations between actors (support/opposition) (Sudhahar et al., 2015). Panels: During the primaries phase (January to August); After conventions 2012 (August to November).

Semantic Graphs

Obama & Romney: Dominant Players

Detecting “Who did What to Whom”. Ex. Obama criticized Romney. (Subject-Verb-Object (SVO))
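Once Subject-Verb-Object triplets are available, turning them into a signed network is mostly bookkeeping. The sketch below is a hypothetical illustration, not the study's pipeline: the verb lists are invented, only "Obama criticized Romney" comes from the slide, and the other triplets (including the placeholder actor "Senator X") are made up. Extracting triplets from raw news text requires a parser and is not shown.

```python
from collections import Counter, defaultdict

# Hypothetical verb lists marking support and opposition.
POSITIVE = {"endorsed", "supported", "praised"}
NEGATIVE = {"criticized", "opposed", "attacked"}


def build_network(triplets):
    """Aggregate (subject, verb, object) triplets into signed edge weights
    and per-actor mention counts.

    Each edge runs from subject to object; a positive weight means net
    support and a negative weight means net opposition.
    """
    edges = defaultdict(int)
    mentions = Counter()
    for subject, verb, obj in triplets:
        mentions[subject] += 1
        mentions[obj] += 1
        if verb in POSITIVE:
            edges[(subject, obj)] += 1
        elif verb in NEGATIVE:
            edges[(subject, obj)] -= 1
    return dict(edges), mentions


triplets = [
    ("Obama", "criticized", "Romney"),      # example from the slide
    ("Romney", "attacked", "Obama"),        # hypothetical
    ("Senator X", "endorsed", "Obama"),     # hypothetical placeholder actor
    ("Obama", "criticized", "Romney"),      # repeated mention
]

edges, mentions = build_network(triplets)
print(edges[("Obama", "Romney")])  # -2
print(mentions["Obama"])           # 4
```

In the figure's terms, the sign of an edge weight would choose the link color and the mention count would set the node size.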


POLITICAL CAMPAIGNS: DATA MINING IN US PRESIDENTIAL ELECTION CAMPAIGN

The 2012 U.S. presidential election will be a classic case of big data analytics and its applications for many years to come. The real winner of the 2012 U.S. presidential election is ANALYTICS.


PREDICT SOCCER WORLD CUP 2010 WINNER

RapidMiner (analytics platform) predicted Spain as the winner of the 2010 World Cup before the event had even started.

WHAT SHOULD A PLAYER DO WHEN BIG DATA PREDICTS HIS BEHAVIOR?

In 2010, RapidMiner used sentiment analysis (opinion mining) to accurately predict the winner of the final game more than a week before the matches began. The company used its RapidMiner server solution to monitor sentiment in news texts, blog and forum posts, analyzing more than 1,000 online channels and thousands of posts per minute, and aggregated the results. As a result of its sentiment analysis, RapidMiner correctly identified Spain as the winner of the 2010 championship, which featured a matchup between Spain and the Netherlands that Spain won 1-0.

Source: http://www.kdnuggets.com/2014/05/predict-world-cup-2014-winner-rapidminer-contest.html


BIG DATA JOBS TRENDS

A LinkedIn study (November 2013) examined the data of over 259 million members’ profiles to determine the top 10 most popular job titles that were nowhere to be found in 2008.

Source: https://business.linkedin.com/talent-solutions/blog/2014/01/top-10-job-titles-that-didnt-exist-5-years-ago-infographic


MODERN DATA SCIENTIST

Most successful data scientists have substantial, deep expertise in at least one aspect of data science: Statistics, Machine Learning, Big Data, Business Communication.


Software

RapidMiner Studio (Open Source Predictive Analytics Platform): https://rapidminer.com/

XLMiner (DM add-in for Excel): http://www.solver.com/xlminer-data-mining

Microsoft Azure Machine Learning: https://azure.microsoft.com/en-us/services/machine-learning/

DM Packages for R (DM Programming Language):

Data Mining Classification and Regression Methods Package: https://cran.r-project.org/web/packages/rminer/index.html

Text Mining Package: https://cran.r-project.org/web/packages/tm/index.html

Microsoft SQL Server 2012 (DM Add-ins for Microsoft Office): http://www.microsoft.com/en-us/download/details.aspx?id=35578

Angoss Predictive Analytics: http://www.angoss.com/


References

Abdullah, N., Ismail, S., Sophiayati, S., & Sam, S. (2015). Data Quality in Big Data: A Review. International Journal of Advances in Soft Computing and its Application, 7(3), 16-27.

Agrawal, R., Imieliński, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD 1993), p. 207.

Chitra, K., & Subashini, B. (2013). Data Mining Techniques and its Applications in Banking Sector. International Journal of Emerging Technology and Advanced Engineering, 3(8), 219-226.

Conte, R., Gilbert, N., Bonelli, G., Cioffi-Revilla, C., Deffuant, G., Kertesz, J., Loreto, V., Moat, S., Nadal, P., Sánchez, A., Nowak, A., Flache, A., San Miguel, M., & Helbing, D. (2012). Manifesto of Computational Social Science. The European Physical Journal Special Topics, 214(1), 325-346.

Han, J., & Kamber, M. (2006). Data Mining: Concepts and Techniques, 2nd Edition. Morgan Kaufmann Publishers, San Francisco.

Issenberg, S. (2012). How President Obama’s campaign used big data to rally individual voters. MIT Technology Review. Accessed 30 December 2015. URL: http://www.technologyreview.com/featuredstory/509026/how-obamas-team-used-big-data-to-rally-voters/

North, M. A. (2012). Data Mining for the Masses. A Global Text Project Book.

Shmueli, G., Patel, N. R., & Bruce, P. (2010). Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner, 2nd Edition. John Wiley & Sons Inc.

Sudhahar, S., Veltri, G. A., & Cristianini, N. (2015). Automated analysis of the US presidential elections using Big Data and network analysis. Big Data & Society, 2(1), 1-28. URL: http://bds.sagepub.com/content/2/1/2053951715572916

Van Eck, N. J., & Waltman, L. (2011). Text mining and visualization using VOSviewer. ISSI Newsletter, 7(3), 50-54.

Zafarani, R., Abbasi, M. A., & Liu, H. (2014). Social Media Mining: An Introduction. Cambridge: Cambridge University Press.


Thank You