G54DMT – Data Mining Techniques and Applications
http://www.cs.nott.ac.uk/~jqb/G54DMT
Dr. Jaume Bacardit, [email protected]
Topic 3: Data Mining
Lecture 5: Regression and Association Rules
Some slides from chapter 5 of Data Mining: Concepts and Techniques by Han & Kamber
Outline of the lecture
• Regression
  – Definition
  – Representations
• Association rules
  – Definition
  – Methods
• Resources
Regression
• Regression problems are supervised problems where the output variable is continuous
• Many techniques with different names are included in this category
  – Regression
  – Function approximation
  – Modelling
  – Curve-fitting
• Given an input vector X and a corresponding output y, we want to find a function f such that y' = f(X) is as close as possible to the true y
Evaluating regression
• Supervised learning: we know the true outputs, so we measure how different they are from the predicted ones (sketched below)
  – Mean Absolute Error (MAE)
  – Mean Squared Error (MSE)
  – Root Mean Squared Error (RMSE)
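A minimal sketch of the three metrics in Python (numpy assumed; the data is made up for illustration):

    import numpy as np

    y_true = np.array([3.0, 5.0, 2.5, 7.0])  # known true outputs
    y_pred = np.array([2.5, 5.0, 3.0, 8.0])  # model predictions

    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))   # Mean Absolute Error
    mse = np.mean(errors ** 2)      # Mean Squared Error
    rmse = np.sqrt(mse)             # Root Mean Squared Error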
Linear Regression
• The most classic (and the most widespread in statistics) type of regression
• f(X) is modelled as
  – y' = w0 + w1x1 + w2x2 + … + wnxn
http://upload.wikimedia.org/wikipedia/en/thumb/1/13/Linear_regression.png/400px-Linear_regression.png
Linear regression
• Simple but limited in expressive power
  – The same model would apply to these four datasets (Anscombe's quartet)
http://en.wikipedia.org/wiki/Anscombe%27s_quartet
Linear regression
• How to find W?
  – Many mathematical methods available
    • Least squares (sketched below)
    • Ridge regression
    • Lasso
    • Etc.
  – We can also use some kind of metaheuristic (e.g. a Genetic Algorithm)
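A minimal sketch of ordinary least squares with numpy (toy data; np.linalg.lstsq finds the W that minimises the squared error):

    import numpy as np

    # Toy data: 5 examples, 2 input features
    X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
    y = np.array([3.1, 2.9, 7.2, 6.8, 10.1])

    # Prepend a column of ones so that w0 acts as the intercept
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])

    # Least squares: the W minimising ||Xb @ W - y||^2
    W, *_ = np.linalg.lstsq(Xb, y, rcond=None)

    y_pred = Xb @ W  # y' = w0 + w1*x1 + w2*x2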
Polynomial regression
• More complex and sophisticated functions
  – y = w0 + w1x + w2x² + …
  – y = w0 + w1x1 + w2x2 + w3x1x2 + …
• Now the job is twofold
  – Choosing the correct function (human inspection may help)
  – Adjusting the weights of the model (see the sketch below)
• Still, would a single mathematical function fit any type of data?
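A minimal sketch of the single-variable case with numpy (the degree is fixed by hand, which is the "choosing the correct function" half of the job):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-3, 3, 50)
    y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0, 0.1, x.shape)  # noisy quadratic

    # Fit y = w0 + w1*x + w2*x^2; polyfit returns coefficients from highest degree down
    w2, w1, w0 = np.polyfit(x, y, deg=2)
    y_pred = w0 + w1 * x + w2 * x**2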
Piece-wise regression
• The input space is partitioned into regions
• A local regression model is generated from the training examples that fall inside each region
  – Example: approximating a sine function with linear regressions (see the sketch after the figure)
(Butz, 2010)
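A minimal sketch of the idea (not Butz's actual method): split the input range into equal-width regions and fit an independent linear model in each one.

    import numpy as np

    x = np.linspace(0, 2 * np.pi, 200)
    y = np.sin(x)

    n_regions = 8
    edges = np.linspace(x.min(), x.max(), n_regions + 1)

    y_pred = np.empty_like(y)
    for i in range(n_regions):
        # Training examples that fall inside region i (last bin keeps the right edge)
        upper = x <= edges[i + 1] if i == n_regions - 1 else x < edges[i + 1]
        mask = (x >= edges[i]) & upper
        # Local linear model y = w1*x + w0 fitted only on this region
        w1, w0 = np.polyfit(x[mask], y[mask], deg=1)
        y_pred[mask] = w1 * x[mask] + w0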
Piece-wise regression
• How to partition the input space
  – Using a series of rules
    • With a (hyper)rectangular condition (XCSF)
    • With a (hyper)ellipsoidal condition (XCSF, LWPR)
    • With a neural condition (XCSF)
  – Using a tree-like structure (CART, M5), as sketched below
• How to perform the regression process for each local approximation
  – Pick any of the functions discussed before
  – Plus some truly non-linear methods (SVR)
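A minimal sketch of the tree-based route, assuming scikit-learn (this is CART-style, with a constant prediction per leaf, unlike M5, which fits linear models in the leaves):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
    y = np.sin(X).ravel()

    # The tree recursively partitions the input space;
    # each leaf is a region with its own local (constant) model
    tree = DecisionTreeRegressor(max_depth=4)
    tree.fit(X, y)
    y_pred = tree.predict(X)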
Piece-wise approximation with hyperellipsoids
• Using XCSF (Wilson, 02) with hyperellipsoid conditions (Butz et al, 08)
[Figure: test function and XCSF's population]
(Stalph et al, 2010)
Other regression methods
• Neural networks
  – An MLP is natively a regression method
    • Classification is done by discretising the output of the network
  – It is proven that an MLP with enough hidden nodes can approximate any function
• Support Vector Regression (SVR)
  – As in SVM, depending on the kernel we get linear or non-linear regression
  – The margin specifies a tube around the approximated function; all points inside the tube have their errors ignored
  – Support vectors are the points that lie outside the tube (see the sketch below)
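A minimal SVR sketch, assuming scikit-learn: epsilon is the half-width of the tube, so errors smaller than epsilon are ignored, and the points that end up outside the tube become the support vectors.

    import numpy as np
    from sklearn.svm import SVR

    X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
    y = np.sin(X).ravel()

    # RBF kernel -> non-linear regression; epsilon defines the error-insensitive tube
    svr = SVR(kernel="rbf", C=10.0, epsilon=0.05)
    svr.fit(X, y)
    y_pred = svr.predict(X)

    n_support_vectors = len(svr.support_)  # points lying outside the tube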
Association Rules
• Association rule mining tries to find frequent patterns, i.e. items that appear together in the dataset
• It can use the class label but it does not have to, so we can consider it an unsupervised learning paradigm
• Two types of elements are generated
  – Association rules: they have an antecedent and a consequent
  – Frequent itemsets: they just have an antecedent
• Both antecedent and consequent are logic predicates (generally of conjunctive form)
Association rule mining
Witten and Frank, 2005 (http://www.cs.waikato.ac.nz/~eibe/Slides2edRev2.zip)
Origin of Association Rules
• These methods were originally employed to analyse shopping carts
• The database is specified as a set of transactions; each transaction includes one or more items from a given set of items
• A frequent itemset is a set of items that appears in many transactions
• These databases are extremely sparse. Example:

  Tid   Items
  10    A, C, D
  20    B, C, E
  30    A, B, C, E
  40    B, E
Beers and diapers
• An urban myth about association rules says that, when they were applied to analyse a very large volume of shopping carts, they discovered a very simple pattern
  – "Customers that buy beer also tend to buy diapers"
• This story has changed through time. You can find an article about it here
• It is a good example of data mining, as it was able to find an unexpected pattern
Why Is Freq. Pattern Mining Important?
• Discloses an intrinsic and important property of data sets
• Forms the foundation for many essential data mining tasks
  – Association, correlation, and causality analysis
  – Sequential, structural (e.g., sub-graph) patterns
  – Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
  – Classification: associative classification
  – Cluster analysis: frequent pattern-based clustering
  – Data warehousing: iceberg cube and cube-gradient
  – Semantic data compression: fascicles
  – Broad applications
Evaluation of association rules
• Support
  – Percentage of examples covered by the predicate in the antecedent
  – Applies to both association rules and frequent itemsets
• Confidence
  – Percentage of the examples matched by the antecedent that also match the consequent
  – Only applies to association rules
• Typically, the user specifies a minimum support and confidence and the algorithm finds all rules above those thresholds (see the sketch below)
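A minimal sketch of both measures in Python, computed over the toy transaction database from the earlier slide (the helper names are illustrative):

    transactions = [
        {"A", "C", "D"},       # Tid 10
        {"B", "C", "E"},       # Tid 20
        {"A", "B", "C", "E"},  # Tid 30
        {"B", "E"},            # Tid 40
    ]

    def support(itemset, transactions):
        # Fraction of transactions containing every item in itemset
        return sum(itemset <= t for t in transactions) / len(transactions)

    def confidence(antecedent, consequent, transactions):
        # support(antecedent and consequent together) / support(antecedent)
        return (support(antecedent | consequent, transactions)
                / support(antecedent, transactions))

    print(support({"B", "E"}, transactions))       # 0.75
    print(confidence({"C"}, {"E"}, transactions))  # 0.666...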
Scalable Methods for Mining Frequent Patterns
• The downward closure property of frequent patterns (the pruning check it enables is sketched below)
  – Any subset of a frequent itemset must be frequent
  – If {beer, diaper, nuts} is frequent, so is {beer, diaper}
  – i.e., every transaction having {beer, diaper, nuts} also contains {beer, diaper}
• Scalable mining methods: three major approaches
  – Apriori (Agrawal & Srikant @VLDB'94)
  – Frequent pattern growth (FPgrowth, Han, Pei & Yin @SIGMOD'00)
  – Vertical data format approach (Charm, Zaki & Hsiao @SDM'02)
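The downward closure property is what makes pruning possible: a candidate can be discarded as soon as any of its subsets is known to be infrequent. A minimal sketch of that check (the helper name is illustrative):

    from itertools import combinations

    def has_infrequent_subset(candidate, frequent_itemsets):
        # True if any (k-1)-subset of the k-itemset candidate is not frequent
        k = len(candidate)
        return any(frozenset(sub) not in frequent_itemsets
                   for sub in combinations(candidate, k - 1))

    # {beer, diaper} is not frequent here, so {beer, diaper, nuts} can be pruned
    frequent_2 = {frozenset({"beer", "nuts"}), frozenset({"diaper", "nuts"})}
    print(has_infrequent_subset(frozenset({"beer", "diaper", "nuts"}), frequent_2))  # True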
Apriori: A Candidate Generation-and-Test Approach
• Apriori pruning principle: if there is any itemset which is infrequent, its superset should not be generated/tested! (Agrawal & Srikant @VLDB'94; Mannila et al. @KDD'94)
• Method:
– Initially, scan the DB once to get the frequent 1-itemsets

– Generate length-(k+1) candidate itemsets from length-k frequent itemsets

– Test the candidates against the DB

– Terminate when no frequent or candidate set can be generated
The Apriori Algorithm: An Example (Supmin = 2)

Database TDB:
  Tid   Items
  10    A, C, D
  20    B, C, E
  30    A, B, C, E
  40    B, E

1st scan → C1 (candidate 1-itemsets and their support):
  {A} 2, {B} 3, {C} 3, {D} 1, {E} 3

L1 (frequent 1-itemsets, sup ≥ 2):
  {A} 2, {B} 3, {C} 3, {E} 3

C2 (candidate 2-itemsets generated from L1):
  {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}

2nd scan → C2 with support counts:
  {A, B} 1, {A, C} 2, {A, E} 1, {B, C} 2, {B, E} 3, {C, E} 2

L2 (frequent 2-itemsets):
  {A, C} 2, {B, C} 2, {B, E} 3, {C, E} 2

C3 (candidate 3-itemsets generated from L2):
  {B, C, E}

3rd scan → L3 (frequent 3-itemsets):
  {B, C, E} 2
The Apriori Algorithm
• Pseudo-code:
    Ck: candidate itemsets of size k
    Lk: frequent itemsets of size k

    L1 = {frequent items};
    for (k = 1; Lk != ∅; k++) do begin
        Ck+1 = candidates generated from Lk;
        for each transaction t in database do
            increment the count of all candidates in Ck+1 that are contained in t
        Lk+1 = candidates in Ck+1 with min_support
    end
    return ∪k Lk;
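A compact runnable version of the same loop in Python (a sketch written for this handout, not code from the original paper), using the toy database and Supmin = 2 from the worked example:

    from itertools import combinations

    transactions = [{"A", "C", "D"}, {"B", "C", "E"},
                    {"A", "B", "C", "E"}, {"B", "E"}]
    min_support = 2

    def count_and_filter(candidates):
        # One DB scan: count each candidate, keep those with enough support
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # L1: frequent 1-itemsets
    items = {i for t in transactions for i in t}
    L = count_and_filter({frozenset({i}) for i in items})
    frequent = dict(L)

    k = 1
    while L:
        # Generate C(k+1) by joining pairs of frequent k-itemsets
        candidates = {a | b for a in L for b in L if len(a | b) == k + 1}
        # Downward closure: drop candidates with an infrequent k-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k))}
        L = count_and_filter(candidates)  # next DB scan
        frequent.update(L)
        k += 1

    print(frequent)  # contains frozenset({'B', 'C', 'E'}): 2, matching L3 above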
Resources
• "The Elements of Statistical Learning" by Hastie et al. contains a lot of detail about statistical regression
• List of regression and association rule methods in KEEL
• Weka also contains both kinds of methods
• Chapter 5 of the Han and Kamber book is all about association rules (Han created the FPgrowth method)
• Review of evolutionary algorithms for association rule mining
Questions?