• Data mining https://store.theartofservice.com/the-data-mining-toolkit.html

Data mining


Page 1: Data mining

• Data mining


Page 2: Data mining

Data mining

The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.


Page 3: Data mining

Data mining

Even the popular book "Data mining: Practical machine learning tools and techniques with Java" (which covers mostly machine learning material) was originally to be named just "Practical machine learning", and the term "data mining" was only added for marketing reasons.


Page 4: Data mining

Data mining

Data collection, data preparation, and result interpretation and reporting are not part of the data mining step, but they do belong to the overall knowledge discovery in databases (KDD) process as additional steps.


Page 5: Data mining

Data mining

The related terms data dredging, data fishing, and data snooping refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used in creating new hypotheses to test against the larger data populations.


Page 6: Data mining

Data mining

Data mining turns data into real-time analysis that can be used to increase sales, promote new products, or discontinue products that do not add value to the company.


Page 7: Data mining

Data mining Etymology

Currently, the terms data mining and knowledge discovery are used interchangeably.


Page 8: Data mining

Data mining Background

Data mining is the process of applying these methods with the intention of uncovering hidden patterns in large data sets.


Page 9: Data mining

Data mining Research and evolution

The premier professional body in the field is the Association for Computing Machinery's (ACM) Special Interest Group (SIG) on Knowledge Discovery and Data Mining (SIGKDD). Since 1989 this ACM SIG has hosted an annual international conference and published its proceedings, and since 1999 it has published a biannual academic journal titled "SIGKDD Explorations".


Page 10: Data mining

Data mining Research and evolution

Computer science conferences on data mining include:


Page 11: Data mining

Data mining Research and evolution

DMKD Conference – Research Issues on Data Mining and Knowledge Discovery


Page 12: Data mining

Data mining Research and evolution

ECDM Conference – European Conference on Data Mining


Page 13: Data mining

Data mining Research and evolution

ECML-PKDD Conference – European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases


Page 14: Data mining

Data mining Research and evolution

EDM Conference – International Conference on Educational Data Mining


Page 15: Data mining

Data mining Research and evolution

PAKDD Conference – The annual Pacific-Asia Conference on Knowledge Discovery and Data Mining


Page 16: Data mining

Data mining Research and evolution

SSTD Symposium – Symposium on Spatial and Temporal Databases


Page 17: Data mining

Data mining Research and evolution

Data mining topics are also present at many data management and database conferences, such as the ICDE Conference, the SIGMOD Conference, and the International Conference on Very Large Data Bases.


Page 18: Data mining

Data mining Process

(5) Interpretation/Evaluation.


Page 19: Data mining

Data mining Process

It exists, however, in many variations on this theme, such as the Cross Industry Standard Process for Data Mining (CRISP-DM), which defines six phases:


Page 20: Data mining

Data mining Process

(5) Evaluation


Page 21: Data mining

Data mining Process

or a simplified process such as (1) pre-processing, (2) data mining, and (3) results validation.


Page 22: Data mining

Data mining Process

Polls conducted in 2002, 2004, and 2007 show that the CRISP-DM methodology is the leading methodology used by data miners. The only other data mining standard named in these polls was SEMMA. However, three to four times as many people reported using CRISP-DM. Several teams of researchers have published reviews of data mining process models, and Azevedo and Santos conducted a comparison of CRISP-DM and SEMMA in 2008.


Page 23: Data mining

Data mining Pre-processing

Before data mining algorithms can be used, a target data set must be assembled.


Page 24: Data mining

Data mining

Anomaly detection (outlier/change/deviation detection) – The identification of unusual data records that might be interesting, or data errors that require further investigation.
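
As a rough illustration of outlier detection, the sketch below flags values that lie far from the mean, measured in standard deviations (a z-score test). This is only one simple approach and not necessarily the one intended here; the function name `zscore_outliers`, the threshold of 2.0, and the sample readings are assumptions made for the example.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds
    `threshold` population standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one suspicious record.
readings = [10, 11, 9, 10, 12, 10, 11, 95]
print(zscore_outliers(readings))  # → [95]
```

A flagged record is only a candidate: it may be an interesting event or a plain data error, which is why further investigation is still needed.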


Page 25: Data mining

Data mining

Association rule learning (dependency modeling) – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
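
The supermarket example can be sketched as a count of how often each pair of products appears in the same basket. This is a minimal illustration of the first step of market basket analysis, not a full association rule miner; the function name `frequent_pairs`, the `min_count` cutoff, and the sample baskets are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_count=2):
    """Count product pairs that occur together in a basket and keep
    those seen at least `min_count` times."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) removes duplicates and gives each pair a stable order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical point-of-sale baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "eggs"],
    ["bread", "milk", "eggs"],
]
print(frequent_pairs(baskets))  # → {('bread', 'milk'): 3}
```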


Page 26: Data mining

Data mining

Clustering – The task of discovering groups and structures in the data that are in some way "similar", without using known structures in the data.
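
One common way to discover such groups is k-means, shown here in a deliberately small one-dimensional form. The slide does not name a specific algorithm, so this choice is an assumption, as are the function name `kmeans_1d`, the starting centers, and the sample points.

```python
def kmeans_1d(points, centers, iterations=10):
    """Very small 1-D k-means: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical measurements with two visible groups; no labels are supplied.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)   # → [1.5, 10.0]
print(clusters)  # → [[1.0, 1.5, 2.0], [9.0, 10.0, 11.0]]
```

The result depends on the starting centers and on the number of groups requested, both of which have to be chosen by the analyst.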


Page 27: Data mining

Data mining

Classification – The task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
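
The e-mail example can be sketched with a tiny naive Bayes classifier that learns word counts from labelled messages and then scores a new message. Naive Bayes is an assumption chosen for illustration; the function names `train` and `classify` and the four training messages are hypothetical.

```python
import math
from collections import Counter


def train(examples):
    """examples: list of (text, label). Returns per-label word counts
    and the number of examples seen for each label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Pick the label with the highest log-probability, using
    add-one (Laplace) smoothing for unseen words."""
    vocab = set().union(*word_counts.values())
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total)
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled messages (the "known structure").
examples = [
    ("win a free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("lunch on monday", "legitimate"),
]
model = train(examples)
print(classify("free prize offer", *model))  # → spam
print(classify("monday meeting", *model))    # → legitimate
```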


Page 28: Data mining

Data mining

Regression – Attempts to find a function that models the data with the least error.
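
For a straight line, "least error" is usually taken to mean the smallest sum of squared differences between the line and the data (ordinary least squares). The sketch below assumes that interpretation; the function name `fit_line` and the sample data are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # → 2.0 1.0
```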


Page 29: Data mining

Data mining

Summarization – Providing a more compact representation of the data set, including visualization and report generation.


Page 30: Data mining

Data mining Results validation

For example, a data mining algorithm trying to distinguish "spam" from "legitimate" e-mails would be trained on a training set of sample e-mails.
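
A common way to validate such a model is to hold back part of the labelled data, train on the rest, and measure accuracy on the held-back part. The sketch below assumes that hold-out approach; the function names `holdout_split` and `accuracy`, the 25% test fraction, and the fixed seed are hypothetical choices for the example.

```python
import random


def holdout_split(items, test_fraction=0.25, seed=0):
    """Shuffle the items reproducibly and split them into
    (training set, test set)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, labelled):
    """Fraction of (input, expected label) pairs that `predict` gets right."""
    correct = sum(1 for x, y in labelled if predict(x) == y)
    return correct / len(labelled)


# Hypothetical usage: eight labelled e-mails, six for training, two for testing.
emails = [(f"message {i}", "spam" if i % 2 else "legitimate") for i in range(8)]
training_set, test_set = holdout_split(emails)
print(len(training_set), len(test_set))  # → 6 2
```

Measuring accuracy only on the training set would overstate how well the learned patterns carry over to new e-mails.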


Page 31: Data mining

Data mining Results validation

If the learned patterns do not meet the desired standards, it is necessary to re-evaluate and change the pre-processing and data mining steps. If the learned patterns do meet the desired standards, then the final step is to interpret the learned patterns and turn them into knowledge.


Page 32: Data mining

Data mining Standards

There have been some efforts to define standards for the data mining process, for example the 1999 European Cross Industry Standard Process for Data Mining (CRISP-DM 1.0) and the 2004 Java Data Mining standard (JDM 1.0). Development on successors to these processes (CRISP-DM 2.0 and JDM 2.0) was active in 2006, but has stalled since. JDM 2.0 was withdrawn without reaching a final draft.


Page 33: Data mining

Data mining Standards

As the name suggests, it only covers prediction models, a particular data mining task of high importance to business applications.


Page 34: Data mining

Data mining Games

for 3x3-chess) with any beginning configuration, small-board dots-and-boxes, small-board-hex, and certain endgames in chess, dots-and-boxes, and hex; a new area for data mining has been opened


Page 35: Data mining

Data mining Business

If Walmart analyzed its point-of-sale data with data mining techniques, it would be able to determine sales trends, develop marketing campaigns, and more accurately predict customer loyalty.


Page 36: Data mining

Data mining Business

Once the results from data mining (potential prospect/customer and channel/offer) are determined, this "sophisticated application" can automatically send either an e-mail or regular mail.


Page 37: Data mining

Data mining Business

In order to maintain this quantity of models, they need to manage model versions and move on to automated data mining.


Page 38: Data mining

Data mining Business

Data mining can also be helpful to human resources (HR) departments in identifying the characteristics of their most successful employees. Information obtained – such as universities attended by highly successful employees – can help HR focus recruiting efforts accordingly. Additionally, Strategic Enterprise Management applications help a company translate corporate-level goals, such as profit and margin share targets, into operational decisions, such as production plans and workforce levels.


Page 39: Data mining

Data mining Business

If a clothing store records the purchases of customers, a data mining system could identify those customers who favor silk shirts over cotton ones.


Page 40: Data mining

Data mining Business

Market basket analysis has also been used to identify the purchase patterns of the Alpha Consumer. Alpha Consumers are people who play a key role in connecting with the concept behind a product, then adopting that product, and finally validating it for the rest of society. Analyzing the data collected on this type of user has allowed companies to predict future buying trends and forecast supply demands.


Page 41: Data mining

Data mining Business

Data mining is a highly effective tool in the catalog marketing industry. Catalogers have rich databases of transaction histories for millions of customers, dating back a number of years. Data mining tools can identify patterns among customers and help identify the customers most likely to respond to upcoming mailing campaigns.


Page 42: Data mining

Data mining Business

Data mining for business applications is a component that needs to be integrated into a complex modeling and decision-making process. Reactive business intelligence (RBI) advocates a "holistic" approach that integrates data mining, modeling, and interactive visualization into an end-to-end discovery and continuous innovation process powered by human and automated learning.


Page 43: Data mining

Data mining Business

The relation between the quality of a data mining system and the amount of investment that the decision maker is willing to make was formalized by providing an economic perspective on the value of “extracted knowledge” in terms of its payoff to the organization. This decision-theoretic classification framework was applied to a real-world semiconductor wafer manufacturing line, where decision rules for effectively monitoring and controlling the semiconductor wafer fabrication line were developed.


Page 44: Data mining

Data mining Business

Another implication is that on-line monitoring of the semiconductor manufacturing process using data mining may be highly effective.


Page 45: Data mining

Data mining Science and engineering

In recent years, data mining has been used widely in the areas of science and engineering, such as bioinformatics, genetics, medicine, education and electrical power engineering.


Page 46: Data mining

Data mining Science and engineering

The data mining method that is used to perform this task is known as multifactor dimensionality reduction.


Page 47: Data mining

Data mining Science and engineering

In the area of electrical power engineering, data mining methods have been widely used for condition monitoring of high-voltage electrical equipment.


Page 48: Data mining

Data mining Science and engineering

Data mining methods have also been applied to dissolved gas analysis (DGA) in power transformers. DGA, as a diagnostic technique for power transformers, has been available for many years. Methods such as the self-organizing map (SOM) have been applied to analyze generated data and to determine trends which are not obvious to the standard DGA ratio methods (such as the Duval Triangle).

Page 49: Data mining

Data mining Science and engineering

In this way, data mining can facilitate institutional memory.


Page 50: Data mining

Data mining Science and engineering

Other examples of the application of data mining methods include biomedical data facilitated by domain ontologies, mining clinical trial data, and traffic analysis using SOM.


Page 51: Data mining

Data mining Science and engineering

In adverse drug reaction surveillance, the Uppsala Monitoring Centre has, since 1998, used data mining methods to routinely screen for reporting patterns indicative of emerging drug safety issues in the WHO global database of 4.6 million suspected adverse drug reaction incidents. Recently, similar methodology has been developed to mine large collections of electronic health records for temporal patterns associating drug prescriptions to medical diagnoses.


Page 52: Data mining

Data mining Human rights

Data mining of government records – particularly records of the justice system (i.e., courts, prisons) – enables the discovery of systemic human rights violations in connection with the generation and publication of invalid or fraudulent legal records by various government agencies.


Page 53: Data mining

Data mining Medical data mining

In 2011, in the case of Sorrell v. IMS Health, Inc., the Supreme Court of the United States ruled that pharmacies may share information with outside companies. This practice was authorized under the First Amendment of the Constitution, which protects the "freedom of speech."


Page 54: Data mining

Data mining Spatial data mining

So far, data mining and Geographic Information Systems (GIS) have existed as two separate technologies, each with its own methods, traditions, and approaches to visualization and data analysis.


Page 55: Data mining

Data mining Spatial data mining

Data mining offers great potential benefits for GIS-based applied decision-making. Recently, the task of integrating these two technologies has become of critical importance, especially as various public and private sector organizations possessing huge databases with thematic and geographically referenced data begin to realize the huge potential of the information contained therein. Among those organizations are:

Page 56: Data mining

Data mining Spatial data mining

offices requiring analysis or dissemination of geo-referenced statistical data


Page 57: Data mining

Data mining Spatial data mining

public health services searching for explanations of disease clustering


Page 58: Data mining

Data mining Spatial data mining

environmental agencies assessing the impact of changing land-use patterns on climate change


Page 59: Data mining

Data mining Spatial data mining

geo-marketing companies doing customer segmentation based on spatial location.


Page 60: Data mining

Data mining Spatial data mining

Challenges in spatial mining: Geospatial data repositories tend to be very large.


Page 61: Data mining

Data mining Spatial data mining

Developing and supporting geographic data warehouses (GDWs): Spatial properties are often reduced to simple aspatial attributes in mainstream data warehouses. Creating an integrated GDW requires solving issues of spatial and temporal data interoperability – including differences in semantics, referencing systems, geometry, accuracy, and position.

Page 62: Data mining

Data mining Spatial data mining

Geographic data mining methods should recognize more complex geographic objects (i.e., lines and polygons) and relationships (i.e., non-Euclidean distances, direction, connectivity, and interaction through attributed geographic space such as terrain).


Page 63: Data mining

Data mining Spatial data mining

Geographic knowledge discovery using diverse data types: GKD methods should be developed that can handle diverse data types beyond the traditional raster and vector models, including imagery and geo-referenced multimedia, as well as dynamic data types (video streams, animation).


Page 64: Data mining

Data mining Sensor data mining

By measuring the spatial correlation between data sampled by different sensors, a wide class of specialized algorithms can be developed to make spatial data mining more efficient.


Page 65: Data mining

Data mining Visual data mining

In the transition from analog to digital, large data sets have been generated, collected, and stored, with the aim of discovering statistical patterns, trends, and information hidden in the data in order to build predictive patterns. Studies suggest visual data mining is faster and much more intuitive than traditional data mining. See also Computer vision.

Page 66: Data mining

Data mining Music data mining

Data mining techniques, and in particular co-occurrence analysis, have been used to discover relevant similarities among music corpora (radio lists, CD databases) for the purpose of classifying music into genres in a more objective manner.


Page 67: Data mining

Data mining Surveillance

Data mining has been used to fight terrorism by the U.S


Page 68: Data mining

Data mining Surveillance

In the context of combating terrorism, two particularly plausible methods of data mining are "pattern mining" and "subject-based data mining".


Page 69: Data mining

Data mining Pattern mining

"Pattern mining" is a data mining method that involves finding existing patterns in data. In this context, "patterns" often means association rules. The original motivation for searching association rules came from the desire to analyze supermarket transaction data, that is, to examine customer behavior in terms of the purchased products. For example, an association rule "beer ⇒ potato chips (80%)" states that four out of five customers who bought beer also bought potato chips.
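
The 80% figure in such a rule is usually called the rule's confidence: the share of transactions containing the left-hand item that also contain the right-hand item. The sketch below computes it together with support, the share of all transactions containing both items; the function name `rule_stats` and the ten sample transactions are assumptions made for the example.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent ⇒ consequent."""
    with_antecedent = [t for t in transactions if antecedent in t]
    with_both = [t for t in with_antecedent if consequent in t]
    support = len(with_both) / len(transactions)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Hypothetical data: five customers bought beer, four of them also bought chips.
transactions = (
    [{"beer", "potato chips"}] * 4
    + [{"beer", "bread"}]
    + [{"milk"}] * 5
)
support, confidence = rule_stats(transactions, "beer", "potato chips")
print(support, confidence)  # → 0.4 0.8
```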


Page 70: Data mining

Data mining Pattern mining

In the context of pattern mining as a tool to identify terrorist activity, the National Research Council provides the following definition: "Pattern-based data mining looks for patterns (including anomalous data patterns) that might be associated with terrorist activity – these patterns might be regarded as small signals in a large ocean of noise." Pattern mining includes new areas such as Music Information Retrieval (MIR), where patterns seen both in the temporal and non-temporal domains are imported to classical knowledge discovery search methods.


Page 71: Data mining

Data mining Subject-based data mining

"Subject-based data mining" is a data mining method involving the search for associations between individuals in data. In the context of combating terrorism, the National Research Council provides the following definition: "Subject-based data mining uses an initiating individual or other datum that is considered, based on other information, to be of high interest, and the goal is to determine what other persons or financial transactions or movements, etc., are related to that initiating datum."


Page 72: Data mining

Data mining Knowledge grid

Knowledge discovery "On the Grid" generally refers to conducting knowledge discovery in an open environment using grid computing concepts, allowing users to integrate data from various online data sources, as well as make use of remote resources, for executing their data mining tasks.


Page 73: Data mining

Data mining Reliability / Validity

Data mining can be misused, and can also unintentionally produce results which appear significant but which do not actually predict future behavior and cannot be reproduced on a new sample of data. See Data dredging.


Page 74: Data mining

Data mining Privacy concerns and ethics

In particular, data mining government or commercial data sets for national security or law enforcement purposes, such as in the Total Information Awareness Program or in ADVISE, has raised privacy concerns.


Page 75: Data mining

Data mining Privacy concerns and ethics

This is not data mining per se, but a result of the preparation of data before – and for the purposes of – the analysis.


Page 76: Data mining

Data mining Privacy concerns and ethics

It is recommended that an individual be made aware of the following before data are collected:


Page 77: Data mining

Data mining Privacy concerns and ethics

the purpose of the data collection and any (known) data mining projects


Page 78: Data mining

Data mining Privacy concerns and ethics

how the data will be used


Page 79: Data mining

Data mining Privacy concerns and ethics

who will be able to mine the data and use the data and their derivatives


Page 80: Data mining

Data mining Privacy concerns and ethics

the status of security surrounding access to the data


Page 81: Data mining

Data mining Privacy concerns and ethics

In the United States, privacy concerns have been addressed to some extent by Congress via the passage of regulatory controls such as the Health Insurance Portability and Accountability Act (HIPAA).


Page 82: Data mining

Data mining Privacy concerns and ethics

Data may also be modified so as to become anonymous, so that individuals may not readily be identified. However, even "de-identified"/"anonymized" data sets can potentially contain enough information to allow identification of individuals, as occurred when journalists were able to find several individuals based on a set of search histories that were inadvertently released by AOL.
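
One way to gauge this risk is to check how many records share each combination of seemingly harmless attributes (often called quasi-identifiers); a combination held by a single record points to one person. The sketch below is an illustrative k-anonymity style check, not the method used in the AOL case; the function name `smallest_group`, the field names, and the records are all hypothetical.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for the given quasi-identifier fields."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


# Hypothetical "anonymized" records: names removed, other fields kept.
records = [
    {"zip": "12345", "birth_year": 1970, "query": "garden tools"},
    {"zip": "12345", "birth_year": 1970, "query": "dog training"},
    {"zip": "67890", "birth_year": 1985, "query": "local news"},
]
print(smallest_group(records, ["zip", "birth_year"]))  # → 1
```

A result of 1 means at least one record is unique on those fields, so anyone who knows that person's postal code and birth year could single out their record.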


Page 83: Data mining

Data mining Free open-source data mining software and applications

Carrot2: Text and search results clustering framework.


Page 84: Data mining

Data mining Free open-source data mining software and applications

Chemicalize.org: A chemical structure miner and web search engine.


Page 85: Data mining

Data mining Free open-source data mining software and applications

ELKI: A university research project with advanced cluster analysis and outlier detection methods written in the Java language.


Page 86: Data mining

Data mining Free open-source data mining software and applications

GATE: A natural language processing and language engineering tool.


Page 87: Data mining

Data mining Free open-source data mining software and applications

KNIME: The Konstanz Information Miner, a user-friendly and comprehensive data analytics framework.


Page 88: Data mining

Data mining Free open-source data mining software and applications

ML-Flex: A software package that enables users to integrate with third-party machine-learning packages written in any programming language, execute classification analyses in parallel across multiple computing nodes, and produce HTML reports of classification results.


Page 89: Data mining

Data mining Free open-source data mining software and applications

NLTK (Natural Language Toolkit): A suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python language.


Page 90: Data mining

Data mining Free open-source data mining software and applications

SenticNet API: A semantic and affective resource for opinion mining and sentiment analysis.


Page 91: Data mining

Data mining Free open-source data mining software and applications

Orange: A component-based data mining and machine learning software suite written in the Python language.


Page 92: Data mining

Data mining Free open-source data mining software and applications

R: A programming language and software environment for statistical computing, data mining, and graphics. It is part of the GNU Project.


Page 93: Data mining

Data mining Free open-source data mining software and applications

UIMA: The Unstructured Information Management Architecture (UIMA) is a component framework for analyzing unstructured content such as text, audio and video, originally developed by IBM.


Page 94: Data mining

Data mining Free open-source data mining software and applications

Weka: A suite of machine learning software applications written in the Java programming language.


Page 95: Data mining

Data mining Commercial data-mining software and applications

Angoss KnowledgeSTUDIO: data mining tool provided by Angoss.


Page 96: Data mining

Data mining Commercial data-mining software and applications

BIRT Analytics: visual data mining and predictive analytics tool provided by Actuate Corporation.


Page 97: Data mining

Data mining Commercial data-mining software and applications

Clarabridge: enterprise-class text analytics solution.


Page 98: Data mining

Data mining Commercial data-mining software and applications

IBM DB2 Intelligent Miner: in-database data mining platform provided by IBM, with modeling, scoring and visualization services based on the SQL/MM - PMML framework.


Page 99: Data mining

Data mining Commercial data-mining software and applications

LIONsolver: an integrated software application for data mining, business intelligence, and modeling that implements the Learning and Intelligent OptimizatioN (LION) approach.


Page 100: Data mining

Data mining Commercial data-mining software and applications

NetOwl: suite of multilingual text and entity analytics products that enable data mining.


Page 101: Data mining

Data mining Commercial data-mining software and applications

SAS Enterprise Miner: data mining software provided by the SAS Institute.


Page 102: Data mining

Data mining Marketplace surveys

Several researchers and organizations have conducted reviews of data mining tools and surveys of data miners. These identify some of the strengths and weaknesses of the software packages. They also provide an overview of the behaviors, preferences and views of data miners. Some of these reports include:

Page 103: Data mining

Data mining Marketplace surveys

1 Forrester Research 2010 Predictive Analytics and Data Mining Solutions report

Page 104: Data mining

Data mining Marketplace surveys

1 Gartner 2008 "Magic Quadrant" report

Page 105: Data mining

Data mining Marketplace surveys

1 Haughton et al.'s 2003 Review of Data Mining Software Packages in The American

Statistician

Page 106: Data mining

Data mining Further reading

1 M.S. Chen, J. Han, P.S. Yu (1996) "Data mining: an overview from a database perspective". IEEE

Transactions on Knowledge and Data Engineering 8 (6), 866-883

Page 107: Data mining

Data mining Further reading

1 Feldman, Ronen; and Sanger, James; The Text Mining Handbook,

Cambridge University Press, ISBN 978-0-521-83657-9

Page 108: Data mining

Data mining Further reading

1 Guo, Yike; and Grossman, Robert (editors) (1999); High Performance Data Mining: Scaling Algorithms, Applications and Systems, Kluwer

Academic Publishers

Page 109: Data mining

Data mining Further reading

1 Han, Jiawei, Micheline Kamber, and Jian Pei. Data mining: concepts and

techniques. Morgan Kaufmann, 2006.

Page 110: Data mining

Data mining Further reading

1 Liu, Bing (2007); Web Data Mining: Exploring Hyperlinks, Contents and Usage Data, Springer, ISBN 3-540-37881-2

Page 111: Data mining

Data mining Further reading

1 Murphy, Chris (16 May 2011). "Is Data Mining Free Speech?". InformationWeek (UMB): 12.

Page 112: Data mining

Data mining Further reading

1 Poncelet, Pascal; Masseglia, Florent; and Teisseire, Maguelonne (editors)

(October 2007); "Data Mining Patterns: New Methods and

Applications", Information Science Reference, ISBN 978-1-59904-162-9

Page 113: Data mining

Data mining Further reading

1 Tan, Pang-Ning; Steinbach, Michael; and Kumar, Vipin (2005); Introduction to Data Mining, ISBN 0-321-32136-7

Page 114: Data mining

Data mining Further reading

1 Theodoridis, Sergios; and Koutroumbas, Konstantinos (2009);

Pattern Recognition, 4th Edition, Academic Press, ISBN 978-1-59749-272-0

Page 115: Data mining

Data mining Further reading

1 Weiss, Sholom M.; and Indurkhya, Nitin (1998); Predictive Data Mining, Morgan

Kaufmann

Page 116: Data mining

Data mining Further reading

1 Witten, Ian H.; Frank, Eibe; Hall, Mark A. (30 January 2011). Data Mining:

Practical Machine Learning Tools and Techniques (3 ed.). Elsevier. ISBN

978-0-12-374856-0. (See also Free Weka software)

Page 117: Data mining

Data mining Further reading

1 Ye, Nong (2003); The Handbook of

Data Mining, Mahwah, NJ:

Lawrence Erlbaum

Page 118: Data mining

Data Mining Extensions

1 Data Mining Extensions (DMX) is a query language for Data Mining

Models supported by Microsoft's SQL Server Analysis Services product.

Page 119: Data mining

Data Mining Extensions

1 DMX is used to create and train data mining models, and to browse, manage, and predict

against them

Page 120: Data mining

Data Mining Extensions - DMX Queries

1 DMX Queries are formulated using the SELECT statement. They can extract information from existing

data mining models in various ways.

Page 121: Data mining

Data Mining Extensions - Data Definition Language

1 The Data Definition Language (DDL)

part of DMX can be used to:

Page 122: Data mining

Data Mining Extensions - Data Definition Language

1 Create new data mining models and mining structures - CREATE MINING STRUCTURE,

CREATE MINING MODEL

Page 123: Data mining

Data Mining Extensions - Data Definition Language

1 Delete existing data mining models and mining structures - DROP MINING

STRUCTURE, DROP MINING MODEL

Page 124: Data mining

Data Mining Extensions - Data Definition Language

1 Export and import mining structures - EXPORT, IMPORT

Page 125: Data mining

Data Mining Extensions - Data Manipulation Language

1 The Data Manipulation

Language (DML) part of DMX can be

used to:

Page 126: Data mining

Data Mining Extensions - Data Manipulation Language

1 Make predictions using a mining model -

SELECT ... FROM PREDICTION JOIN

Page 127: Data mining

Data Mining Extensions - Example: a prediction query

1 This example is a singleton prediction query, which predicts for the given customer whether she will be interested in home loan products.

Page 128: Data mining

Data Mining Extensions - Example: a prediction query

1 NATURAL PREDICTION JOIN

Page 129: Data mining

Data Mining Extensions - Example: a prediction query

1 18 AS [Total Years of Education]

Page 130: Data mining

OAuth - Abuse of OAuth for Internet data mining

1 A growing number of social networking services promote OAuth

logins to the dominant social networks (Facebook, Twitter, etc.) as the primary authentication method, over "traditional" email confirmation

type processes

Page 131: Data mining

OAuth - Abuse of OAuth for Internet data mining

1 The use of OAuth logins to social networks for "authentication" permits

the application provider to legitimately circumvent the often

significant restrictions on API use put in place by social network providers

to prevent large-scale data extraction

Page 132: Data mining

Social networking service - Data mining

1 Through data mining, companies are able to improve their sales and profitability

Page 133: Data mining

United States Department of Homeland Security - Data mining (ADVISE)

1 The Associated Press reported on September 5, 2007, that DHS had scrapped an anti-terrorism data

mining tool called ADVISE (Analysis, Dissemination, Visualization, Insight and Semantic Enhancement) after

the agency's Privacy Office and Office of Inspector General (OIG)

found that pilot testing of the system had been performed using data on real people without having done a

Privacy Impact Assessment, a required privacy safeguard for the

various uses of real personally identifiable information required by

section 208 of the e-Government Act of 2002

Page 134: Data mining

Multitenancy - Data aggregation/data mining

1 One of the most compelling reasons for vendors/ISVs to utilize

multitenancy is for the inherent data aggregation benefits

Page 135: Data mining

Machine learning - Machine learning and data mining

1 These two terms are commonly confused, as they often employ the

same methods and overlap significantly. They can be roughly

defined as follows:

Page 136: Data mining

Machine learning - Machine learning and data mining

1 Machine learning focuses on prediction, based on known properties learned from the training

data.

Page 137: Data mining

Machine learning - Machine learning and data mining

1 Data mining focuses on the discovery of (previously) unknown properties in the data. This is the analysis step of Knowledge Discovery in Databases.

Page 138: Data mining

Machine learning - Machine learning and data mining

1 Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the

basic assumptions they work with: in machine learning, performance is

usually evaluated with respect to the ability to reproduce known

knowledge, while in Knowledge Discovery and Data Mining (KDD) the

key task is the discovery of previously unknown knowledge

Page 139: Data mining

Surveillance - Data mining and profiling

1 Data mining is the application of statistical techniques and

programmatic algorithms to discover previously unnoticed relationships

within the data.

Page 140: Data mining

Surveillance - Data mining and profiling

1 Economic (such as credit card purchases) and social (such as

telephone calls and emails) transactions in modern society

create large amounts of stored data and records. In the past, this data was documented in paper records, leaving a paper trail, or was simply

not documented at all. Correlation of paper-based records was a laborious

process—it required human intelligence operators to manually dig through documents, which was time-consuming and incomplete, at

best.

Page 141: Data mining

Surveillance - Data mining and profiling

1 But today many of these records are electronic, resulting in an electronic trail

Page 142: Data mining

Surveillance - Data mining and profiling

1 Information relating to many of these individual transactions is often easily available because it is generally not

guarded in isolation, since the information, such as the title of a

movie a person has rented, might not seem sensitive

Page 143: Data mining

Surveillance - Data mining and profiling

1 In addition to its own aggregation and profiling tools, the government is able to access information from third parties (for example, banks, credit companies, or employers) by requesting access informally, by

compelling access through the use of subpoenas or other procedures, or

by purchasing data from commercial data aggregators or data brokers

Page 144: Data mining

Surveillance - Data mining and profiling

1 Under United States v. Miller (1976) (http://caselaw.lp.findlaw.com/scripts/getcase.pl?court=us&vol=425&invol=435), data held by third parties is generally not subject to Fourth Amendment warrant requirements.

Page 145: Data mining

Criticism of Facebook - Data mining

1 There have been some concerns expressed regarding the use of

Facebook as a means of surveillance and data mining

Page 146: Data mining

Criticism of Facebook - Data mining

1 The possibility of data mining by private individuals unaffiliated with Facebook has been a concern, as evidenced by the fact that two

Massachusetts Institute of Technology (MIT) students were able

to download, using an automated script, over 70,000 Facebook profiles

from four schools (MIT, NYU, the University of Oklahoma, and Harvard

University) as part of a research project on Facebook privacy

published on December 14, 2005

Page 147: Data mining

Criticism of Facebook - Data mining

1 A second clause that brought criticism from some users allowed

Facebook the right to sell users' data to private companies, stating: "We may share your information with

third parties, including responsible companies with which we have a

relationship." This concern was addressed by spokesman Chris

Hughes, who said: "Simply put, we have never provided our users'

information to third party companies, nor do we intend to." Facebook

eventually removed this clause from its privacy policy.

Page 148: Data mining

Criticism of Facebook - Data mining

1 Previously, third party applications had access to almost all user

information. Facebook's privacy policy previously stated: "Facebook

does not screen or approve Platform Developers and cannot control how such Platform Developers use any

personal information." However, that language has since been removed. Regarding use of user data by third party applications, the 'Preapproved

Third-Party Websites and Applications' section of the Facebook

privacy policy now states:

Page 149: Data mining

Criticism of Facebook - Data mining

1 In the United Kingdom, the Trades Union Congress (TUC) has

encouraged employers to allow their staff to access Facebook and other social-networking sites from work,

provided they proceed with caution.

Page 150: Data mining

Criticism of Facebook - Data mining

1 In September 2007, Facebook drew a fresh round of criticism after it began allowing non-members to search for

users, with the intent of opening limited public profiles up to search

engines such as Google in the following months. Facebook's privacy

settings, however, allow users to block their profiles from search

engines.

Page 151: Data mining

Criticism of Facebook - Data mining

1 Concerns were also raised on the BBC's

Watchdog program in October 2007 when Facebook was shown to be an

easy way in which to collect an individual's personal information in

order to facilitate identity theft. However, there is barely any

personal information presented to non-friends - if users leave the

privacy controls on their default settings, the only personal

information visible to a non-friend is the user's name, gender, profile

picture, networks, and user name.

Page 152: Data mining

Criticism of Facebook - Data mining

1 A New York Times article in February 2008 pointed out that Facebook does not actually provide a mechanism for

users to close their accounts, and raised the concern that private user data would remain indefinitely on

Facebook's servers. Facebook has since given users the option to deactivate or

delete their accounts.

Page 153: Data mining

Criticism of Facebook - Data mining

1 Deactivating an account allows it to be restored later, while deleting it

will remove the account permanently, although some data

submitted by that account (like posting to a group or sending

someone a message) will remain.

Page 154: Data mining

Criticism of Facebook - Data mining

1 A third party site, uSocial, was involved in a controversy

surrounding the sale of fans and friends. uSocial received a

cease-and-desist letter from Facebook and has stopped selling friends.

Page 155: Data mining

Data visualization - Data mining

1 Data mining is the process of sorting through large amounts of data and

picking out relevant information. It is usually used by business intelligence organizations and financial analysts, but is increasingly being used in the sciences to extract information from

the enormous data sets generated by modern experimental and observational methods.

Page 156: Data mining

Data visualization - Data mining

1 It has been described as "the nontrivial extraction of implicit, previously unknown,

and potentially useful information from data" and "the science of extracting useful information from large data sets or databases". In relation to enterprise

resource planning, according to Monk (2006), data mining is the statistical and

logical analysis of large sets of transaction data, looking for patterns that can aid

decision making.

Page 157: Data mining

Mass surveillance in the United States - Data mining of subpoenaed records

1 The FBI collected nearly all hotel, airline,

rental car, gift shop, and casino records in Las Vegas

during the last two weeks of 2003

Page 158: Data mining

Oracle Data Mining

1 It provides means for the creation, management and operational

deployment of data mining models inside the database environment.

Page 159: Data mining

Oracle Data Mining - Overview

1 These operations include functions to create, apply, test, and manipulate data mining models

Page 160: Data mining

Oracle Data Mining - Overview

1 In data mining, the process of using a model to derive predictions or

descriptions of behavior that is yet to occur is called scoring
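
As a rough illustration of scoring, the sketch below applies an already-built model to new records. The logistic model, its weights, and the field names (age, owns_house) are invented for this example and are not taken from Oracle Data Mining.

```python
import math

# Hypothetical, already-trained logistic model: these weights are made up
# for illustration and do not come from any real data set.
WEIGHTS = {"age": 0.04, "owns_house": 1.2}
BIAS = -2.0

def score(record):
    """Return the model's predicted probability for one new record."""
    z = BIAS + sum(WEIGHTS[name] * record[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

# Scoring: apply the fixed model to records it has not seen before.
new_customers = [
    {"age": 35, "owns_house": 1},
    {"age": 22, "owns_house": 0},
]
scores = [score(c) for c in new_customers]
```

Model building is deliberately left out: scoring only needs the finished model and the new data.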

Page 161: Data mining

Oracle Data Mining - Overview

1 Most Oracle Data Mining functions also allow text mining by accepting

text (unstructured data) attributes as input

Page 162: Data mining

Oracle Data Mining - History

1 Oracle Data Mining was first introduced in 2002 and its releases

are named according to the corresponding Oracle database

release:

Page 163: Data mining

Oracle Data Mining - History

1 * Oracle Data Mining 10gR1 (10.1.0.2.0 - February 2004)

Page 164: Data mining

Oracle Data Mining - History

1 * Oracle Data Mining 10gR2 (10.2.0.1.0 - July 2005)

Page 165: Data mining

Oracle Data Mining - History

1 Oracle Data Mining is a logical successor of the Darwin data mining

toolset developed by Thinking Machines Corporation in the mid-1990s

and later distributed by Oracle after its acquisition of Thinking

Machines in 1999. However, the product itself

Page 166: Data mining

Oracle Data Mining - History

1 is a complete redesign and rewrite from the ground up:

while Darwin was a classic GUI-based analytical workbench, ODM

offers a data mining development/deployment platform

integrated into the Oracle database, along with the Oracle Data Miner

GUI.

Page 167: Data mining

Oracle Data Mining - History

1 The Oracle Data Miner 11gR2 New Workflow GUI was previewed at Oracle Open World 2009. An

updated Oracle Data Miner GUI was released in 2012. It is free, and is available as an extension to Oracle

SQL Developer 3.1.

Page 168: Data mining

Oracle Data Mining - Functionality

1 As of release 11gR1 Oracle Data Mining contains the following data mining functions:

Page 169: Data mining

Oracle Data Mining - Functionality

1 ** Model exploration,

evaluation and analysis.

Page 170: Data mining

Oracle Data Mining - Functionality

1 * Feature selection (Attribute Importance).

Page 171: Data mining

Oracle Data Mining - Functionality

1 ** Support Vector Machine (SVM).

Page 172: Data mining

Oracle Data Mining - Functionality

1 ** One-class Support Vector Machine (SVM).

Page 173: Data mining

Oracle Data Mining - Functionality

1 ** Generalized linear model (GLM) for

Multiple regression

Page 174: Data mining

Oracle Data Mining - Functionality

1 ** Orthogonal Partitioning Clustering (O-Cluster).

Page 175: Data mining

Oracle Data Mining - Functionality

1 * Association rule learning:

Page 176: Data mining

Oracle Data Mining - Functionality

1 ** Itemsets and association rules

(AM).

Page 177: Data mining

Oracle Data Mining - Functionality

1 * Feature extraction.

Page 178: Data mining

Oracle Data Mining - Functionality

1 ** Combined text and non-text columns of

input data.
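
To make the "itemsets and association rules" entry in the list above more concrete, here is a minimal sketch of the two standard measures, support and confidence, computed over a toy set of transactions. The data and function names are invented for illustration; this is not how Oracle Data Mining implements the function.

```python
from itertools import combinations

# Toy market-basket data, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) from the transactions."""
    return support(antecedent | consequent) / support(antecedent)

# Frequent pairs: item pairs whose support reaches a chosen threshold.
items = sorted(set().union(*transactions))
frequent_pairs = [
    pair for pair in combinations(items, 2)
    if support(set(pair)) >= 0.5
]
```

A rule such as "bread implies milk" would then be judged by confidence({"bread"}, {"milk"}), alongside the support of the pair.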

Page 179: Data mining

Oracle Data Mining - Input sources and data preparation

1 Most Oracle Data Mining functions accept as input one relational table or view. Flat data can be combined with transactional data through the

use of nested columns, enabling mining of data involving one-to-many

relationships (e.g. a star schema). The full functionality of SQL can be used when preparing data for data mining, including dates and spatial

data.
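
The idea of combining flat data with one-to-many transactional data can be sketched outside the database as follows. This is only an analogy in plain Python, with invented customer and purchase data; it does not use Oracle's nested column types.

```python
from collections import defaultdict

# Flat customer attributes (one row per case) and transactional rows
# (many rows per case). All values are invented for illustration.
customers = {1: {"age": 35}, 2: {"age": 52}}
purchases = [(1, "loan"), (1, "card"), (2, "card")]

# Collect each customer's purchases into a nested list.
nested = defaultdict(list)
for customer_id, product in purchases:
    nested[customer_id].append(product)

# One record per case, with the one-to-many data held in a nested field.
cases = [
    {"id": cid, **attrs, "purchases": nested[cid]}
    for cid, attrs in customers.items()
]
```

Each case stays a single record, which is what lets a mining function treat the customer, rather than the individual purchase, as the unit of analysis.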

Page 180: Data mining

Oracle Data Mining - Input sources and data preparation

1 Oracle Data Mining distinguishes numerical, categorical, and

unstructured (text) attributes. The product also provides utilities for

data preparation steps prior to model building such as outlier treatment,

discretization, normalization and

binning (sorting values into bins, in general terms).
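
Two of the preparation steps named above, normalization and binning, can be sketched in a few lines. The sketch assumes min-max normalization and equal-width bins, which are common choices but not necessarily the ones the product's utilities use; the function names and sample ages are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [18, 25, 40, 62, 78]
normalized = min_max_normalize(ages)
bins = equal_width_bins(ages, 3)
```

Binning turns a numerical attribute into a categorical one, which some algorithms handle more easily than raw numbers.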

Page 181: Data mining

Oracle Data Mining - Graphical user interface: Oracle Data Miner

1 There is also an independent interface: the Spreadsheet Add-In for

Predictive Analytics which enables access to the Oracle Data Mining

Predictive Analytics PL/SQL package from Microsoft Excel.

Page 182: Data mining

Oracle Data Mining - PL/SQL and Java interfaces

1 Oracle Data Mining provides a native PL/SQL package

(DBMS_DATA_MINING) to create, destroy, describe, apply, test, export and import models. The code below

illustrates a typical call to build a classification

model:

Page 183: Data mining

Oracle Data Mining - PMML

1 In Release 11gR2 (11.2.0.2), ODM supports the import of externally created

PMML for some of the data mining models. PMML is an XML-based

standard for representing data mining models.

Page 184: Data mining

Oracle Data Mining - Predictive Analytics MS Excel Add-In

1 The PL/SQL package DBMS_PREDICTIVE_ANALYTICS

automates the data mining process including data preprocessing, model building and evaluation, and scoring

of new data

Page 185: Data mining

Oracle Data Mining - References and further reading

1 * T. H. Davenport, Competing on Analytics (http://www.lbl.gov/BLI/BLI_Library/assets/articles/OM/OM_PSDM_Competing_Analytics.pdf), Harvard Business Review,

January 2006.

Page 186: Data mining

Oracle Data Mining - References and further reading

1 * I. Ben-Gal, Outlier detection (http://www.eng.tau.ac.il/~bengal/outlier.pdf), in: Maimon O. and Rokach L. (Eds.) Data Mining and Knowledge Discovery Handbook:

A Complete Guide for Practitioners and Researchers, Kluwer Academic

Publishers, 2005, ISBN 0-387-24435-2.

Page 187: Data mining

Oracle Data Mining - References and further reading

1 * M. M. Campos, P. J. Stengard, and B. L. Milenova, Data-centric Automated Data Mining. In proceedings of the Fourth International Conference on Machine Learning and Applications 2005, 15–17 December 2005. pp8,

ISBN 0-7695-2495-8

Page 188: Data mining

Oracle Data Mining - References and further reading

1 * M. F. Hornick, Erik Marcade, and Sunil Venkayala. Java Data Mining: Strategy, Standard, and Practice.

Morgan-Kaufmann, 2006, ISBN 0-12-370452-9.

Page 189: Data mining

Oracle Data Mining - References and further reading

1 * B. L. Milenova, J. S. Yarmus, and M. M. Campos. SVM in Oracle database

10g: removing the barriers to widespread adoption of support

vector machines. In Proceedings of the 31st international Conference on Very Large Data Bases (Trondheim, Norway, August 30 - September 2,

2005). pp1152–1163, ISBN 1-59593-154-6.

Page 190: Data mining

Oracle Data Mining - References and further reading

1 * B. L. Milenova and M. M. Campos. O-Cluster: scalable clustering of large

high dimensional data sets. In proceedings of the 2002 IEEE

International Conference on Data Mining: ICDM 2002. pp290–297, ISBN

0-7695-1754-4.

Page 191: Data mining

Oracle Data Mining - References and further reading

1 * P. Tamayo, C. Berger, M. M. Campos, J. S. Yarmus, B. L. Milenova,

A. Mozes, M. Taft, M. Hornick, R. Krishnan, S. Thomas, M. Kelly, D.

Mukhin, R. Haberstroh, S. Stephens and J. Myczkowski. Oracle Data

Mining - Data Mining in the Database Environment. In Part VII of Data

Mining and Knowledge Discovery Handbook, Maimon, O.; Rokach, L.

(Eds.) 2005, pp. 1315-1329, ISBN 0-387-24435-2.

Page 192: Data mining

Oracle Data Mining - References and further reading

1 * Brendan Tierney, Predictive Analytics using Oracle Data Miner:

for the data scientist, Oracle analyst, Oracle developer, DBA, Oracle Press,

McGraw Hill, Spring 2014.

Page 193: Data mining

Computational sociology - Data mining and social network analysis

1 Independent from developments in computational models of social

systems, social network analysis emerged in the 1970s and 1980s from advances in graph theory, statistics, and studies of social

structure as a distinct analytical method and was articulated and

employed by sociologists like James S. Coleman

Page 194: Data mining

Department of Homeland Security - Data mining (ADVISE)

1 found that pilot testing of the system had been

performed using data on real people without having done a Privacy Impact

Assessment, a required privacy safeguard for the various uses of real

personally identifiable information required by section 208 of the

e-Government Act of 2002

Page 195: Data mining

List of free and open-source software packages - Data mining

1 * Environment for DeveLoping KDD-Applications Supported by Index-Structures (ELKI): data mining software framework

written in Java with a focus on clustering and outlier detection

methods.

Page 196: Data mining

List of free and open-source software packages - Data mining

1 * Orange: data visualization and data mining for

novices and experts, through visual programming or Python scripting. Extensions for bioinformatics and

text mining.

Page 197: Data mining

List of free and open-source software packages - Data mining

1 * RapidMiner: data mining software written in Java, fully integrating

Weka, featuring 350+ operators for preprocessing, machine learning,

visualization, etc.

Page 198: Data mining

List of free and open-source software packages - Data mining

1 * Scriptella ETL: ETL (Extract-Transform-Load) and script execution tool. Supports integration with J2EE and Spring. Provides connectors to CSV, LDAP, XML, JDBC/ODBC and

other data sources.

Page 199: Data mining

List of free and open-source software packages - Data mining

1 * Weka: data mining software written in Java featuring machine learning operators

for classification, regression, and clustering.

Page 200: Data mining

List of open-source software packages - Data mining

1 * OpenNN: open source neural networks software library written in the C++

programming language.

Page 201: Data mining

Learning analytics - Differentiating Learning Analytics and Educational Data Mining

1 They go on to attempt to disambiguate educational data mining from academic analytics based on whether the process is hypothesis driven or not, though

Brooks C

Page 202: Data mining

Learning analytics - Differentiating Learning Analytics and Educational Data Mining

1 Regardless of the differences between the LA and EDM

communities, the two areas have significant overlap both in the

objectives of investigators as well as in the methods and techniques that

are used in the investigation.

Page 203: Data mining

Customer analytics - Data mining

1 There are two categories of data mining. Predictive models use previous customer interactions to

predict future events while segmentation techniques are used to

place customers with similar behaviors and attributes into distinct

groups. This grouping can help marketers to optimize their campaign management and

targeting processes.
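
Segmentation of the kind described above can be illustrated with a very small clustering routine. The sketch below uses one-dimensional k-means as a stand-in technique; the source does not name a specific algorithm, and the spend figures, starting centers, and function name are all invented for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small 1-D k-means: returns (centers, labels)."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        # Assign each value to the nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        # Move each center to the mean of its assigned values.
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, labels

# Hypothetical yearly spend per customer.
spend = [120, 150, 130, 900, 950, 1000]
centers, segments = kmeans_1d(spend, centers=[100, 1000])
```

Customers sharing a segment label could then be targeted with the same campaign, while a predictive model would instead estimate a future event for each customer individually.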

Page 204: Data mining

Conference on Knowledge Discovery and Data Mining

1 SIGKDD is the Association for Computing Machinery's Special

Interest Group on Knowledge Discovery and Data Mining. It became an official ACM SIG in 1998. The official web page of SIGKDD can be found at

www.KDD.org.

https://store.theartofservice.com/the-data-mining-toolkit.html

Page 205: Data mining

Conference on Knowledge Discovery and Data Mining - Conferences

1 SIGKDD has hosted an annual conference, the 'ACM SIGKDD Conference on Knowledge Discovery and Data Mining' ('KDD'), since 1995. KDD conferences grew from KDD (Knowledge Discovery and Data Mining) workshops at AAAI conferences, which were started by Gregory Piatetsky-Shapiro in 1989, 1991, and 1993, and Usama Fayyad in 1994.

Page 206: Data mining

Conference on Knowledge Discovery and Data Mining - Conferences

1 Conference papers of each Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining are published through the ACM (http://dl.acm.org/event.cfm?id=RE329). See also http://www.sigkdd.org/conferences.php.

Page 207: Data mining

Conference on Knowledge Discovery and Data Mining - Conferences

1 KDD-2012 took place in Beijing, China (http://kdd2012.sigkdd.org/), KDD-2013 took place in Chicago, USA, and KDD-2014 will take place in New York City, USA, August 24–27, 2014. A full list of past KDD meetings is available at http://www.kdnuggets.com/meetings/past-meetings-kdd.html.

Page 208: Data mining

Conference on Knowledge Discovery and Data Mining - KDD-Cup

1 SIGKDD sponsors the [http://www.kdd.org/kddcup/ KDD Cup] competition every year in

conjunction with the annual conference. It is aimed at members

of the industry and academia, particularly students, interested in

KDD.


Page 209: Data mining

Conference on Knowledge Discovery and Data Mining - Awards

1 The group also annually recognizes members of the KDD community with its [http://www.kdd.org/sigkdd-innovation-award Innovation Award] and [http://www.kdd.org/innovation-service-awards Service Award]. Additionally, KDD presents a Best Paper Award to recognize the highest quality paper at each conference.

Page 210: Data mining

Conference on Knowledge Discovery and Data Mining - SIGKDD Explorations

1 SIGKDD has also published a biannual academic journal titled

[http://www.kdd.org/explorations/ SIGKDD Explorations] since June,

1999.


Page 211: Data mining

Conference on Knowledge Discovery and Data Mining - Leadership

1 The new SIGKDD leadership team

took office on July 1, 2013


Page 212: Data mining

Conference on Knowledge Discovery and Data Mining - Leadership

1 * Gregory Piatetsky-Shapiro (http://www.kdnuggets.com/gps.html) (2005-2008)

Page 213: Data mining

Conference on Knowledge Discovery and Data Mining - Leadership

1 * David D. Jensen (http://kdl.cs.umass.edu/people/jensen/)

Page 214: Data mining

Conference on Knowledge Discovery and Data Mining - Information Directors

1 * [http://faculty.washington.edu/ankurt/ Ankur Teredesai] (2011-)

Page 215: Data mining

Quantitative structure–activity relationship - Data mining approach

1 Computer SAR models typically calculate a relatively large number of

features. Because those lack structural interpretation ability, the preprocessing steps face a feature

selection problem (i.e., which structural features should be interpreted to determine the

structure-activity relationship). Feature selection can be

accomplished by visual inspection (qualitative selection by a human);

by data mining; or by molecule mining.


Page 216: Data mining

Quantitative structure–activity relationship - Data mining approach

1 A typical data-mining-based prediction uses, for example, support vector machines, decision trees or neural networks to induce a predictive learning model.

Page 217: Data mining

Quantitative structure–activity relationship - Data mining approach

1 Molecule mining approaches, a special case of structured data mining approaches, apply a similarity-matrix-based prediction or an automatic fragmentation scheme into molecular substructures. There are also approaches that use maximum common subgraph searches or graph kernels.
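A similarity-matrix-based prediction can be sketched as follows. The sketch assumes, purely for illustration, that each molecule is described by a set of substructure labels, that similarity is the Tanimoto (Jaccard) coefficient, and that the prediction is a similarity-weighted average; the molecules and activity values are made up, and the source does not specify any of these choices.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fragment labels."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_activity(query, training):
    """Similarity-weighted average of the known activities."""
    weights = [tanimoto(query, frags) for frags, _ in training]
    total = sum(weights)
    if total == 0:
        return None  # no overlap with any training molecule
    return sum(w * act for w, (_, act) in zip(weights, training)) / total


# Hypothetical molecules (sets of substructure labels) with made-up activities.
training = [
    ({"benzene", "hydroxyl"}, 1.0),
    ({"benzene", "amine"}, 0.0),
]
similarity_matrix = [[tanimoto(a, b) for b, _ in training] for a, _ in training]
prediction = predict_activity({"benzene", "hydroxyl", "methyl"}, training)
print(similarity_matrix, prediction)
```

The query shares more fragments with the first training molecule, so its predicted activity is pulled toward that molecule's value.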

Page 218: Data mining

Data mining in meteorology

1 Meteorology is the interdisciplinary scientific study of the atmosphere. It observes changes in temperature, air pressure, moisture and wind direction. Temperature, pressure, wind and humidity are usually measured with a thermometer, barometer, anemometer and hygrometer, respectively. Data are also collected by other methods, including radar, lidar and satellites.

Page 219: Data mining

Data mining in meteorology

1 Weather forecasts are made by collecting quantitative data about the current state of the atmosphere. The main difficulty in this prediction is that the data are high-dimensional. To overcome it, the data must first be analyzed and simplified before further analysis. Some data mining techniques are appropriate in this context.

Page 220: Data mining

Data mining in meteorology - What is Data mining?

1 Consequently, data mining consists of more than collecting and analyzing data; it also includes analysis and prediction

Page 221: Data mining

Data mining in meteorology - What is Data mining?

1 The network architectures and signal processes used to model nervous systems can roughly be divided into three categories, each based on a different philosophy.

Page 222: Data mining

Data mining in meteorology - What is Data mining?

1 #Feedforward neural network: the network transforms a set of input signals into a set of output signals.

Page 223: Data mining

Data mining in meteorology - What is Data mining?

1 #Feedback network: the input information defines the initial activity state of a feedback system, and after state transitions, the asymptotic final state is identified as the outcome of

the computation.


Page 224: Data mining

Data mining in meteorology - What is Data mining?

1 #Neighboring cells in a neural network compete in their activities by means of mutual lateral interactions, and develop adaptively into specific detectors of different signal patterns. In this category, learning is called competitive, unsupervised or self-organizing.
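The feedback category, where the input sets an initial state and the settled state is the result, can be illustrated with a small Hopfield-style network. This is an assumption made for illustration: the source does not name a specific feedback model, and the stored pattern below is made up.

```python
def sign(x):
    return 1 if x >= 0 else -1


def train_hebbian(patterns):
    """Build symmetric weights from +1/-1 patterns (Hebbian rule, zero diagonal)."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def settle(weights, state, max_steps=20):
    """Update units one at a time until a full sweep changes nothing."""
    state = list(state)
    for _ in range(max_steps):
        changed = False
        for i in range(len(state)):
            new = sign(sum(weights[i][j] * state[j] for j in range(len(state))))
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break  # a stable state has been reached
    return state


stored = [1, -1, 1, -1, 1, -1]
weights = train_hebbian([stored])
noisy = [1, -1, 1, -1, 1, 1]  # the input: the stored pattern with its last unit flipped
final_state = settle(weights, noisy)
print(final_state)
```

Here the noisy input defines the initial activity state, the unit updates are the state transitions, and the state at which nothing changes any more is read out as the outcome.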

Page 225: Data mining

Data mining in meteorology - Self-organizing Maps

1 Self-Organizing Map (SOM) is one of the most popular neural network

models, which is especially suitable for high dimensional data

visualization, clustering and modeling


Page 226: Data mining

Data mining in meteorology - Self-organizing Maps

1 The Self-Organizing Map projects high-dimensional input data onto a

low dimensional (usually two-dimensional) space


Page 227: Data mining

Data mining in meteorology - Self-organizing Maps

1 For each input vector, the system chooses the output neuron (the winning neuron) that most closely matches the given input vector.
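Choosing the winning neuron and adapting it can be sketched as below. For brevity the sketch assumes a one-dimensional map (a line of neurons) rather than the usual two-dimensional grid, a fixed learning rate and neighbourhood radius, and made-up weights; none of these values come from the source.

```python
def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x (squared Euclidean)."""
    def dist2(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda i: dist2(weights[i]))


def som_step(weights, x, rate=0.5, radius=1):
    """Move the winner and its neighbours on a 1-D map toward the input x."""
    winner = best_matching_unit(weights, x)
    for i, w in enumerate(weights):
        if abs(i - winner) <= radius:
            weights[i] = [wi + rate * (xi - wi) for wi, xi in zip(w, x)]
    return winner


# Hypothetical map of four neurons with two-dimensional weight vectors.
neurons = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
winner = som_step(neurons, [0.9, 0.1])
print(winner, neurons)
```

A full SOM would repeat this step over many inputs while shrinking the learning rate and radius; the single step only shows the winner selection and the neighbourhood update.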

Page 228: Data mining

Police-enforced ANPR in the UK - Data mining

1 A major feature of the National ANPR Data Centre for car numbers is the ability to data mine. Advanced, versatile automated data mining software trawls through the vast amounts of data collected, finding patterns and meaning in the data. Data mining can be used on the records of previous sightings to build up intelligence on a vehicle's movements on the road network, or to find cloned vehicles by searching the database for impossibly quick journeys.
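A search for impossibly quick journeys could look roughly like the sketch below. The record layout, the 200 km/h threshold and the sample sightings are all hypothetical; the source does not describe how the real system stores or queries its data.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def impossible_journeys(sightings, max_kmh=200.0):
    """Return plates whose consecutive sightings imply a speed above max_kmh.

    sightings: iterable of (plate, unix_seconds, lat, lon) tuples (assumed layout).
    """
    flagged = set()
    last_seen = {}
    for plate, t, lat, lon in sorted(sightings, key=lambda s: s[1]):
        if plate in last_seen:
            t0, lat0, lon0 = last_seen[plate]
            hours = (t - t0) / 3600.0
            km = haversine_km(lat0, lon0, lat, lon)
            if km > 0 and (hours == 0 or km / hours > max_kmh):
                flagged.add(plate)
        last_seen[plate] = (t, lat, lon)
    return flagged


# Hypothetical records: AB12 is seen in London and Manchester ten minutes apart.
records = [
    ("AB12", 0, 51.5074, -0.1278),
    ("AB12", 600, 53.4808, -2.2426),
    ("CD34", 0, 51.5074, -0.1278),
    ("CD34", 14400, 53.4808, -2.2426),
]
suspects = impossible_journeys(records)
print(suspects)
```

A plate flagged this way is only a candidate for a cloned vehicle; misreads and clock errors would produce the same signal, so the threshold is a tuning choice rather than a fixed rule.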

Page 229: Data mining

Police-enforced ANPR in the UK - Data mining

1 We can use ANPR on investigations or we can use it looking forward in a proactive,

intelligence way


Page 230: Data mining

Multifactor dimensionality reduction - Data mining with MDR

1 Another approach is to generate many random permutations of the data to see what the data mining algorithm finds when given the

chance to overfit

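The permutation idea can be sketched with a generic label-shuffling test. This is not MDR itself: the "mining" step below is a toy stand-in that picks the best single binary feature, and the data are made up for illustration.

```python
import random


def best_feature_accuracy(rows, labels):
    """Toy mining step: accuracy of the binary feature that best matches the labels."""
    best = 0.0
    for f in range(len(rows[0])):
        agree = sum(1 for r, y in zip(rows, labels) if r[f] == y)
        acc = max(agree, len(labels) - agree) / len(labels)  # allow an inverted feature
        best = max(best, acc)
    return best


def permutation_p_value(rows, labels, n_permutations=1000, seed=0):
    """Share of label shufflings that score at least as well as the real labels."""
    rng = random.Random(seed)
    observed = best_feature_accuracy(rows, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if best_feature_accuracy(rows, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical data: feature 0 matches the label exactly, feature 1 is unrelated.
rows = [[1, 0], [1, 1], [1, 0], [1, 1], [1, 0], [1, 1],
        [0, 0], [0, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
labels = [1] * 6 + [0] * 6
observed, p_value = permutation_p_value(rows, labels)
print(observed, p_value)
```

Because the search over features is rerun on every shuffled copy, the null distribution already includes whatever the algorithm can find by overfitting, which is the point of the approach.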

Page 231: Data mining

Educational data mining

1 Baker (2010) Data Mining for Education


Page 232: Data mining

Educational data mining - Definition

1 Educational Data Mining refers to techniques, tools, and research

designed for automatically extracting meaning from large repositories of

data generated by or related to people's learning activities in

educational settings


Page 233: Data mining

Educational data mining - Definition

1 In other cases, the data is less fine-grained

Page 234: Data mining

Educational data mining - History

1 Educational Data Mining: A Review of the State-of-the-Art

Page 235: Data mining

Educational data mining - History

1 As interest in EDM continued to increase, EDM researchers

established an academic journal in 2009, the

[http://www.educationaldatamining.org/JEDM/ Journal of Educational Data

Mining], for sharing and disseminating research results. In

2011, EDM researchers established the

[http://educationaldatamining.org/ International Educational Data Mining Society] to connect EDM researchers

and continue to grow the field.


Page 236: Data mining

Educational data mining - History

1 With the introduction of public educational data repositories in

2008, such as the Pittsburgh Science of Learning Centre’s (PSLC) DataShop

and the National Center for Education Statistics (NCES), public data sets have made educational data mining more accessible and

feasible, contributing to its growth.


Page 237: Data mining

Educational data mining - Goals

1 Baker and Yacef identified the following

four goals of EDM:


Page 238: Data mining

Educational data mining - Goals

1 #'Predicting students' future learning behavior' – With the use of student modeling, this goal can be achieved by creating student models that incorporate the learner’s characteristics, including detailed information such as their knowledge, behaviours and motivation to learn. The user experience of the learner and their overall satisfaction with learning are also measured.

Page 239: Data mining

Educational data mining - Goals

1 #'Discovering or improving domain models' – Through the various methods and applications of EDM, the discovery of new models and improvements to existing ones is possible. Examples include illustrating the educational content to engage learners and determining optimal instructional sequences to support the student’s learning style.

Page 240: Data mining

Educational data mining - Goals

1 #'Studying the effects of educational support' that can be achieved through learning

systems.


Page 241: Data mining

Educational data mining - Goals

1 #'Advancing scientific knowledge about learning and learners' by

building and incorporating student models, the field of EDM research and the technology and software

used.


Page 242: Data mining

Educational data mining - Users and Stakeholders

1 There are four main users and stakeholders involved with educational data mining. These

include:


Page 243: Data mining

Educational data mining - Users and Stakeholders

1 JEDM-Journal of Educational Data Mining

5.2 (2013): 102-126.


Page 244: Data mining

Educational data mining - Users and Stakeholders

1 * 'Educators' - Educators attempt to understand the learning process and the methods they can use to improve

their teaching methods


Page 245: Data mining

Educational data mining - Users and Stakeholders

1 * 'Researchers' - Researchers focus on the development and the evaluation of data mining techniques for effectiveness. A yearly international conference for researchers began in 2008, followed by the establishment of the [http://www.educationaldatamining.org/JEDM/index.php/JEDM Journal of Educational Data Mining] in 2009. Topics in EDM range from using data mining to improve institutional effectiveness to improving student performance.

Page 246: Data mining

Educational data mining - Users and Stakeholders

1 * 'Administrators' - Administrators are responsible for allocating the resources for implementation in institutions

Page 247: Data mining

Educational data mining - Phases of Educational Data Mining

1 As research in the field of educational data mining has

continued to grow, a myriad of data mining techniques have been applied to a variety of educational contexts. In each case, the goal is to translate raw data into meaningful information about the learning process in order to

make better decisions about the design and trajectory of a learning environment. Thus, EDM generally

consists of four phases:


Page 248: Data mining

Educational data mining - Phases of Educational Data Mining

1 # The first phase of the EDM process (not counting pre-processing) is discovering relationships in data


Page 249: Data mining

Educational data mining - Phases of Educational Data Mining

1 # Discovered relationships must then be validated in order to avoid overfitting.

Page 250: Data mining

Educational data mining - Phases of Educational Data Mining

1 # Validated relationships are applied to make predictions about future

events in the learning environment.


Page 251: Data mining

Educational data mining - Phases of Educational Data Mining

1 # Predictions are used to support decision-making processes and policy decisions.


Page 252: Data mining

Educational data mining - Phases of Educational Data Mining

1 During phases 3 and 4, data is often visualized or in some other way distilled for human judgment. A large amount of research has been conducted on best practices for visualizing data.

Page 253: Data mining

Educational data mining - Main Approaches

1 Of the general categories of methods mentioned, prediction, clustering and relationship mining are considered universal methods across all types of data mining; however, 'Discovery with Models' and 'Distillation of Data for Human Judgment' are considered more prominent approaches within educational data mining.

Page 254: Data mining

Educational data mining - Discovery with Models

1 In the Discovery with Models method, a model is developed via prediction, clustering or knowledge engineering based on human reasoning, and then used as a component in another analysis, namely in prediction and relationship mining

Page 255: Data mining

Educational data mining - Discovery with Models

1 Key applications of this method include discovering relationships

between student behaviors, characteristics and contextual

variables in the learning environment. Further discovery of

broad and specific research questions across a wide range of

contexts can also be explored using this method.


Page 256: Data mining

Educational data mining - Distillation of Data for Human Judgment

1 Humans can make inferences about data that may be beyond the scope of what an automated data mining method provides. In educational data mining, data is distilled for human judgment for two key purposes: identification and classification.

Page 257: Data mining

Educational data mining - Distillation of Data for Human Judgment

1 For the purpose of identification, data is distilled to enable humans to identify well-known patterns, which may otherwise be difficult to interpret. For example, the learning curve, classic to educational studies, is a pattern that clearly reflects the relationship between learning and experience over time.

Page 258: Data mining

Educational data mining - Distillation of Data for Human Judgment

1 Data is also distilled for the purpose of classifying features of data, which in educational data mining is used to support the development of the prediction model. Classification greatly expedites the development of the prediction model.

Page 259: Data mining

Educational data mining - Distillation of Data for Human Judgment

1 The goal of this method is to summarize and present the

information in a useful, interactive and visually appealing way in order to understand the large amounts of

education data and to support decision making


Page 260: Data mining

Educational data mining - Applications

1 A list of the primary applications of EDM is provided by Cristobal Romero

and Sebastian Ventura. In their taxonomy, the areas of EDM

application are:


Page 261: Data mining

Educational data mining - Applications

1 * Providing feedback for supporting

instructors


Page 262: Data mining

Educational data mining - Applications

1 * Recommendations for students


Page 263: Data mining

Educational data mining - Applications

1 * Predicting student performance


Page 264: Data mining

Educational data mining - Applications

1 * Detecting undesirable student behaviors


Page 265: Data mining

Educational data mining - Applications

1 * Constructing courseware - EDM can be applied to course management systems such as the open source Moodle. Moodle contains usage data that includes various activities by users, such as test results, amount of readings completed and participation in discussion forums. Data mining tools can be used to customize learning activities for each user and adapt the pace at which the student completes the course. This is particularly beneficial for online courses with varying levels of competency.

Page 266: Data mining

Educational data mining - Applications

1 New research on mobile learning environments also suggests that data mining can be useful. Data mining can be used to help provide personalized content to mobile users, despite the differences in managing content between mobile devices and standard PCs and web browsers.

Page 267: Data mining

Educational data mining - Applications

1 New EDM applications will focus on allowing non-technical users to use and engage in data mining tools and activities, making data collection and processing more accessible for all users of EDM. Examples include statistical and visualization tools that analyze social networks and their influence on learning outcomes and productivity.

Page 268: Data mining

Educational data mining - Courses

1 In October 2013, Coursera offered a free online course on “Big Data in Education” that teaches how and

when to use key methods for EDM. A course archive is now available

online.


Page 269: Data mining

Educational data mining - Courses

1 Teachers College, Columbia University offers a Learning Analytics focus as part of its Cognitive Studies Masters. http://catalog.tc.columbia.edu/tc/departments/humandevelopment/cognitivestudiesineducation/

Page 270: Data mining

Educational data mining - Publication Venues

1 Considerable amounts of EDM work are published at the peer-reviewed International Conference on Educational Data Mining, organized by the [http://www.educationaldatamining.org/ International Educational Data Mining Society].

Page 271: Data mining

Educational data mining - Publication Venues

1 * [http://www.educationaldatamining.org/EDM2008 1st International Conference on Educational Data Mining] (2008) -- Montreal, Canada

Page 272: Data mining

Educational data mining - Publication Venues

1 * [http://www.educationaldatamining.org/EDM2009 2nd International Conference on Educational Data Mining] (2009) -- Cordoba, Spain

Page 273: Data mining

Educational data mining - Publication Venues

1 * [http://www.educationaldatamining.org/EDM2010 3rd International Conference on Educational Data Mining] (2010) -- Pittsburgh, USA

Page 274: Data mining

Educational data mining - Publication Venues

1 * [http://www.educationaldatamining.org/EDM2011 4th International Conference on Educational Data Mining] (2011) -- Eindhoven, Netherlands

Page 275: Data mining

Educational data mining - Publication Venues

1 * [http://www.educationaldatamining.org/EDM2012 5th International Conference on Educational Data Mining] (2012) -- Chania, Greece

Page 276: Data mining

Educational data mining - Publication Venues

1 * [http://www.educationaldatamining.org/EDM2013 6th International Conference on Educational Data Mining] (2013) -- Memphis, USA

Page 277: Data mining

Educational data mining - Publication Venues

1 EDM papers are also published in the [http://www.educationaldatamining.org/JEDM/ Journal of Educational Data

Mining] (JEDM).


Page 278: Data mining

Educational data mining - Publication Venues

1 Many EDM papers are routinely published in related conferences, such as Artificial Intelligence and

Education, Intelligent Tutoring Systems, and User Modeling and

Adaptive Personalization.


Page 279: Data mining

Educational data mining - Publication Venues

1 In 2011, Chapman Hall/CRC Press, Taylor and Francis Group published the first Handbook of Educational Data Mining. This resource was

created for those that are interested in participating in the educational

data mining community.


Page 280: Data mining

Educational data mining - Contests

1 In 2010, the Association for Computing Machinery's

[http://www.kdd.org/kdd2010/kddcup.shtml KDD Cup] was conducted using data from an educational

setting


Page 281: Data mining

Educational data mining - Costs and Challenges

1 Along with technological advancements are costs and challenges associated with

implementing EDM applications


Page 282: Data mining

Educational data mining - Criticisms

1 Research also indicates that the field of educational data mining is

concentrated in North America and western cultures and subsequently,

other countries and cultures may not be represented in the research and

findings


Page 283: Data mining

Educational data mining - Criticisms

1 As users become savvy in their understanding of online privacy, administrators of educational data mining tools need to be proactive in protecting the privacy of their users and be transparent about how and with whom the information will be used and shared

Page 284: Data mining

Educational data mining - Criticisms

1 * 'Plagiarism' - Plagiarism detection is an ongoing challenge for educators

and faculty whether in the classroom or online. However, due to the complexities associated with

detecting and preventing digital plagiarism in particular, educational data mining tools are not currently sophisticated enough to accurately

address this issue. Thus, the development of predictive capability in plagiarism-related issues should

be an area of focus in future research.


Page 285: Data mining

Educational data mining - Criticisms

1 * 'Adoption' - It is unknown how widespread the adoption of EDM is and the extent to which institutions

have applied and considered implementing an EDM strategy. As

such, it is unclear whether there are any barriers that prevent users from adopting EDM in their educational

settings.


Page 286: Data mining

Java Data Mining

1 JDM enables applications to integrate data mining technology for

developing predictive analytics applications and tools


Page 287: Data mining

Java Data Mining

1 Various data mining functions and techniques, such as statistical classification and association, regression analysis, data clustering, and attribute importance, are covered by the 1.0 release of this standard.

Page 288: Data mining

Cross Industry Standard Process for Data Mining

1 In Proceedings of the IADIS European Conference on Data Mining 2008, pp 182-185.


Page 289: Data mining

Cross Industry Standard Process for Data Mining - Major phases

1 The lessons learned during the process can trigger new, often more

focused business questions and subsequent data mining processes will benefit from the experiences of

previous ones.


Page 290: Data mining

Cross Industry Standard Process for Data Mining - Major phases

1 ;Business Understanding: This initial phase focuses on understanding the project objectives and requirements

from a business perspective, and then converting this knowledge into

a data mining problem definition, and a preliminary plan designed to

achieve the objectives.


Page 291: Data mining

Cross Industry Standard Process for Data Mining - Major phases

1 ;Data Understanding: The data understanding phase starts with an initial data collection and proceeds

with activities in order to get familiar with the data, to identify data quality

problems, to discover first insights into the data, or to detect interesting

subsets to form hypotheses for hidden information.


Page 292: Data mining

Cross Industry Standard Process for Data Mining - Major phases

1 ;Data Preparation: The data preparation phase covers all

activities to construct the final dataset (data that will be fed into the modeling tool(s)) from the initial raw

data. Data preparation tasks are likely to be performed multiple times,

and not in any prescribed order. Tasks include table, record, and

attribute selection as well as transformation and cleaning of data

for modeling tools.


Page 293: Data mining

Cross Industry Standard Process for Data Mining - Major phases

1 ;Modeling: In this phase, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values.

Typically, there are several techniques for the same data mining problem type. Some techniques have specific requirements on the form of data. Therefore, stepping back to the

data preparation phase is often needed.


Page 294: Data mining

Cross Industry Standard Process for Data Mining - Major phases

1 At the end of this phase, a decision on the use of the data mining results should be

reached.


Page 295: Data mining

Cross Industry Standard Process for Data Mining - Major phases

1 Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data

mining process

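The calibration step of the Modeling phase, where a technique's parameters are tuned, can be sketched by choosing a parameter on held-out data. CRISP-DM does not prescribe any technique; the k-nearest-neighbour classifier, the candidate values and the data below are assumptions made for illustration.

```python
def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (one feature, labels 0/1)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def accuracy(train, data, k):
    """Share of points in data that the k-NN model built on train labels correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


def calibrate_k(train, validation, candidates):
    """Return the candidate k with the best validation accuracy (first wins ties)."""
    return max(candidates, key=lambda k: accuracy(train, validation, k))


# Hypothetical prepared dataset: (feature, label) pairs, with (4.0, 1) acting as a noisy point.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (6.0, 1), (7.0, 1), (8.0, 1)]
validation = [(3.8, 0), (3.6, 0), (6.5, 1), (2.5, 0)]
best_k = calibrate_k(train, validation, candidates=[1, 3, 5])
print(best_k)
```

If no candidate performs acceptably, or the technique needs the data in a different form, that is the situation in which the process steps back to data preparation.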

Page 296: Data mining

Cross Industry Standard Process for Data Mining - History

1 CRISP-DM was conceived in 1996. In 1997 it got underway as a European Union project under the European Strategic Program on Research in Information Technology (ESPRIT) funding initiative. The project was led by five companies: SPSS, Teradata, Daimler AG, NCR Corporation and OHRA, an insurance company.


Page 297: Data mining

Cross Industry Standard Process for Data Mining - History

1 This core consortium brought different experiences to the project. ISL was later acquired by and merged into SPSS Inc. The computer giant NCR Corporation produced the Teradata data warehouse and its own data mining software. Daimler-Benz had a significant data mining team. OHRA was just starting to explore the potential use of data mining.


Page 298: Data mining

Cross Industry Standard Process for Data Mining - History

1 and published as a step-by-step data mining guide later that year. (Pete Chapman, Julian Clinton, Randy Kerber, Thomas Khabaza, Thomas Reinartz, Colin Shearer, and Rüdiger Wirth (2000); CRISP-DM 1.0 Step-by-step data mining guide, ftp://ftp.software.ibm.com/software/analytics/spss/support/Modeler/Documentation/14/UserManual/CRISP-DM.pdf)


Page 299: Data mining

Cross Industry Standard Process for Data Mining - History

1 Between 2006 and 2008 a CRISP-DM 2.0 SIG was formed and there were discussions about updating the CRISP-DM process model. (Colin Shearer (2006); First CRISP-DM 2.0 Workshop Held, http://www.kdnuggets.com/news/2006/n19/4i.html) The current status of these efforts is not known. However, the original crisp-dm.org website cited in the reviews and the CRISP-DM 2.0 SIG website are both no longer active.


Page 300: Data mining

Cross Industry Standard Process for Data Mining - History

1 While many non-IBM data mining practitioners use CRISP-DM, IBM is the primary corporation that currently embraces the CRISP-DM process model. IBM makes some of the old CRISP-DM documents available for download and has incorporated the process model into its SPSS Modeler product.


Page 301: Data mining

Data mining in agriculture

1 'Data mining in agriculture' is a very recent research topic. It consists of the application of data mining techniques to agriculture. Recent technologies are able to provide a lot of information on agriculture-related activities, which can then be analyzed in order to find important information. A related, but not equivalent, term is precision agriculture.


Page 302: Data mining

Data mining in agriculture - Prediction of problematic wine fermentations

1 Wine is widely produced all around the world.


Page 303: Data mining

Data mining in agriculture - Detection of diseases from sounds issued by animals

1 The detection of animal diseases on farms can have a positive impact on the productivity of the farm, because sick animals can cause contamination.


Page 304: Data mining

Data mining in agriculture - Sorting apples by watercores

1 For this reason, a computational system is under study which takes X-ray photographs of the fruit while it runs on conveyor belts, and which is also able to analyse the pictures (using data mining techniques) and estimate the probability that the fruit contains watercores.


Page 305: Data mining

Data mining in agriculture - Optimizing pesticide use by data mining

1 By data mining the cotton pest scouting data along with the meteorological recordings, it was shown how pesticide use can be optimized (reduced).


Page 306: Data mining

Data mining in agriculture - Explaining pesticide abuse by data mining

1 After a novel Pilot Agriculture Extension Data Warehouse was created and analysed through querying and data mining, some interesting discoveries were made, such as pesticides sprayed at the wrong time, wrong pesticides used for the right reasons, and a temporal relationship between pesticide usage and day of the week.
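A temporal relationship of the kind mentioned above can be looked for with a simple tally of spraying events per weekday. The sketch below is only an illustration: the dates are invented, and the name `sprays_per_weekday` is an assumption for the example, not something taken from the study.

```python
from collections import Counter
from datetime import date

# Hypothetical spraying dates; invented purely for illustration.
SPRAY_DATES = [
    date(2024, 1, 1), date(2024, 1, 5), date(2024, 1, 8),
    date(2024, 1, 12), date(2024, 1, 17), date(2024, 1, 19),
]


def sprays_per_weekday(dates):
    """Count events per weekday, where 0 is Monday and 6 is Sunday."""
    return Counter(d.weekday() for d in dates)


counts = sprays_per_weekday(SPRAY_DATES)
# The weekday with the most recorded sprayings and its count.
busiest_weekday, busiest_count = counts.most_common(1)[0]
```

A strongly uneven tally would only suggest a pattern; it would still need to be checked against the larger data set before drawing conclusions.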


Page 307: Data mining

Data mining in agriculture - Literature

1 There are a few precision agriculture journals, such as Springer's Precision Agriculture (http://www.springerlink.com/content/103317/) or Elsevier's Computers and Electronics in Agriculture (http://www.sciencedirect.com/science/journal/01681699), but those are not exclusively devoted to data mining in agriculture.

Page 308: Data mining

Data mining in agriculture - Conferences

1 There are many conferences organized every year on data mining techniques and applications, but rather few of them consider problems arising in the agricultural field. To date, there is only one example of a conference completely devoted to applications of data mining in agriculture. It is organized by Georg Ruß, and the conference web page is http://dma-workshop.de/.


Page 309: Data mining

Dependent variables - Data mining

1 In data mining tools (for multivariate statistics and machine learning), the dependent variable is assigned a role as 'target variable' (or in some tools as label attribute), while an independent variable may be assigned a role as regular variable. (English Manual version 1.0 for RapidMiner 5.0, October 2013: http://1xltkxylmzx3z8gd647akcdvov.wpengine.netdna-cdn.com/wp-content/uploads/2013/10/rapidminer-5.0-manual-english_v1.0.pdf)
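The idea of assigning roles can be sketched without any particular tool: one attribute is marked as the target, and the remaining attributes are treated as regular inputs. The records, the attribute names, and the function `assign_roles` below are assumptions made for the example; they do not reproduce the interface of RapidMiner or any other product.

```python
# Hypothetical records; attribute names and values are invented for illustration.
RECORDS = [
    {"temperature": 21.0, "humidity": 0.40, "yield": 3.1},
    {"temperature": 25.5, "humidity": 0.55, "yield": 3.8},
    {"temperature": 18.2, "humidity": 0.70, "yield": 2.6},
]


def assign_roles(records, target):
    """Split records into regular attributes (inputs) and the target values."""
    regular_names = [name for name in records[0] if name != target]
    inputs = [[rec[name] for name in regular_names] for rec in records]
    targets = [rec[target] for rec in records]
    return regular_names, inputs, targets


# "yield" plays the role of the target (dependent) variable here.
names, X, y = assign_roles(RECORDS, target="yield")
```

A learning algorithm would then be fitted to predict `y` from `X`, which is why the role assignment has to happen before modeling.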


Page 310: Data mining

Learning algorithms - Machine learning and data mining

1 * Machine learning focuses on prediction, based on known properties learned from the training data.


Page 311: Data mining

Learning algorithms - Machine learning and data mining

1 * Data mining focuses on the discovery of (previously) unknown properties in the data. This is the analysis step of Knowledge Discovery in Databases.


Page 312: Data mining

Learning algorithms - Machine learning and data mining

1 Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in Knowledge Discovery and Data Mining (KDD) the key task is the discovery of previously unknown knowledge.


Page 313: Data mining

Activity recognition - Data mining based approach to activity recognition

1 They proposed a data mining approach based on discriminative patterns, which describe significant changes between any two activity classes of data, to recognize sequential, interleaved and concurrent activities in a unified solution.


Page 314: Data mining

Activity recognition - Data mining based approach to activity recognition

1 Gilbert et al. (Gilbert A, Illingworth J, Bowden R. Action Recognition using Mined Hierarchical Compound Features. IEEE Trans Pattern Analysis and Machine Intelligence) use 2D corners in both space and time. These are grouped spatially and temporally using a hierarchical process, with an increasing search area. At each stage of the hierarchy, the most distinctive and descriptive features are learned efficiently through data mining (Apriori rule).
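For readers unfamiliar with Apriori, the sketch below shows the generic frequent-itemset form of the algorithm on a toy example. It is not the feature-mining pipeline of Gilbert et al.; the transactions, the item names, and the support threshold are assumptions chosen purely for illustration.

```python
from itertools import combinations

# Hypothetical transactions: each is a set of feature identifiers seen together.
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""

    def count(candidates):
        # Support count: number of transactions containing the candidate.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {item for t in transactions for item in t}
    singles = count({frozenset([i]) for i in items})
    current = {c: n for c, n in singles.items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


frequent = apriori(TRANSACTIONS, min_support=3)
```

The prune step is what makes the search efficient: an itemset can only be frequent if all of its subsets are, so most candidates are discarded before their support is ever counted.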


Page 315: Data mining

Covert surveillance - Data mining and profiling

1 Data mining is the application of statistical techniques and programmatic algorithms to discover previously unnoticed relationships within the data.


Page 316: Data mining

Covert surveillance - Data mining and profiling

1 Economic (such as credit card purchases) and social (such as telephone calls and emails) transactions in modern society create large amounts of stored data and records. In the past, this data was documented in paper records, leaving a paper trail, or was simply not documented at all. Correlation of paper-based records was a laborious process: it required human intelligence operators to manually dig through documents, which was time-consuming and incomplete at best.


Page 317: Data mining

Covert surveillance - Data mining and profiling

1 But today many of these records are electronic, resulting in an electronic trail.
