Advanced Analytics Data Mining using SQL Server Tuesday, April
17, 2012 from 5:30 PM to 7:30 PM (CT) Thomas Arehart Microsoft
Technology Center
Slide 2
Growing Business Use Whether delivered as dashboards,
scorecards or standalone tools, the number of users benefiting from
access to business intelligence (BI) and analytics tools is taking
off. Once limited to only a few number crunchers with degrees in
advanced mathematics, BI and analytics tools are rapidly being
deployed to all professionals in many organizations, and to
everyone in a substantial number of companies, according to
analysts and recent surveys. While traditional BI tools were
complex and expensive, access to powerful BI and analytics
capabilities is no longer out of reach for the masses. Today, BI
capabilities are increasingly embedded in a wide range of software
applications. Another reason for the broader use of these tools is
that the market has evolved into a broad ecosystem. A wide swath of
vendors in a variety of fields have, in effect, collaborated to
simplify the technology front ends and to focus the tools on
specific vertical markets such as retailing, telecom, and consumer
packaged goods manufacturing. The business analytics (BA) market
ranges from platform technologies, such as data warehouse
management, to end user-facing analytic applications and BI tools. (1)
Slide 3
Business Need With BI capabilities now found in a wide range of
software applications as well as lighter weight, standalone
packages, new-generation BI is often invisible to its users. This
lets them focus on making better decisions and serving customers
more effectively as opposed to staying up to speed on the latest
technology acronyms. Knowledge workers need analytical tools to
explore the gaps in a process when things break. Analytical
software that analyzes a multitude of databases and transaction
histories can provide guidance and predictions about future
customer needs and behavior. This guidance empowers employees to
anticipate customer needs, reduce costs, and improve overall
efficiency. Companies want more automation and consistency around
the decisions employees make on a daily basis. (1)
Slide 4
Data Collection (1960s)
- Business question: "What was my total revenue in the last five years?"
- Enabling technologies: spreadsheets, desktop databases
- Product providers: Microsoft Excel and Access
- Data characteristics: retrospective, static data delivery

Data Access (1980s)
- Business question: "What were unit sales in New England last March?"
- Enabling technologies: SQL, relational database management systems (RDBMS)
- Product providers: Microsoft SQL Server
- Data characteristics: retrospective, dynamic data delivery at record level

Data Warehousing & Decision Support (1990s)
- Business question: "What were unit sales in New England last March? Drill down to Boston."
- Enabling technologies: online analytical processing (OLAP), multidimensional databases, data warehouses
- Product providers: Microsoft SQL Server Reporting Services (SSRS), Microsoft SQL Server Analysis Services (SSAS)
- Data characteristics: retrospective, dynamic data delivery at multiple levels

Data Mining (Emerging Today)
- Business question: "What's likely to happen to Boston unit sales next month? Why?"
- Enabling technologies: advanced algorithms, multiprocessor computers, massive databases
- Product providers: SQL Server Integration Services (SSIS), Excel Add-In
- Data characteristics: prospective, proactive information delivery

Table 1. Steps in the Evolution of Data Mining
Slide 5
Example tasks and the Microsoft algorithms to use (2)

Predicting a discrete attribute
- Examples: Flag the customers in a prospective buyers list as good or poor prospects. Calculate the probability that a server will fail within the next 6 months. Categorize patient outcomes and explore related factors.
- Algorithms: Microsoft Decision Trees, Microsoft Naive Bayes, Microsoft Clustering, Microsoft Neural Network

Predicting a continuous attribute
- Examples: Forecast next year's sales. Predict site visitors given past historical and seasonal trends. Generate a risk score given demographics.
- Algorithms: Microsoft Decision Trees, Microsoft Time Series, Microsoft Linear Regression

Predicting a sequence
- Examples: Perform clickstream analysis of a company's Web site. Analyze the factors leading to server failure. Capture and analyze sequences of activities during outpatient visits, to formulate best practices around common activities.
- Algorithms: Microsoft Sequence Clustering

Finding groups of common items in transactions
- Examples: Use market basket analysis to determine product placement. Suggest additional products to a customer for purchase. Analyze survey data from visitors to an event, to find which activities or booths were correlated, to plan future activities.
- Algorithms: Microsoft Association, Microsoft Decision Trees

Finding groups of similar items
- Examples: Create patient risk profile groups based on attributes such as demographics and behaviors. Analyze users by browsing and buying patterns. Identify servers that have similar usage characteristics.
- Algorithms: Microsoft Clustering, Microsoft Sequence Clustering
Slide 6
Analytic Algorithm Categories Regression: a powerful and
commonly used algorithm that evaluates the relationship of one
variable, the dependent variable, with one or more other variables,
called independent variables. By measuring exactly how large and
significant each independent variable has historically been in its
relation to the dependent variable, the future value of the
dependent variable can be estimated. Regression models are widely
used in applications, such as seasonal forecasting, quality
assurance and credit risk analysis.
Slide 7
Analytic Algorithm Categories Clustering / Segmentation: the
process of grouping items together to form categories. You might
look at a large collection of shopping baskets and discover that
they are clustered corresponding to health food buyers, convenience
food buyers, luxury food buyers, and so on. Once these
characteristics have been grouped together, they can be used to
find other customers with similar characteristics. This algorithm
is used to create groups for applications, such as customers for
marketing campaigns, rate groups for insurance products, and crime
statistics groups for law enforcement.
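The grouping step can be sketched with a tiny k-means loop. The basket totals and starting centers here are hypothetical, chosen only to show how records gravitate into segments; this is not the Microsoft Clustering algorithm itself.

```python
# Minimal 1-D k-means sketch (hypothetical basket totals, not from the
# slides): group shoppers into low-spend and high-spend segments.
def kmeans_1d(values, centers, iterations=10):
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assign each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Move each center to the mean of its cluster (keep empty ones put).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

baskets = [8, 10, 12, 95, 100, 105]           # two obvious spending groups
centers, clusters = kmeans_1d(baskets, [0.0, 50.0])
print(centers)   # -> [10.0, 100.0]
```

Once the centers settle, a new customer can be assigned to the nearest one, which is how the discovered groups are reused to find similar customers.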
Slide 8
Analytic Algorithm Categories Nearest Neighbor: quite similar to
clustering, but it looks only at the records in the dataset
that are nearest to a chosen unclassified record, based on a
similarity measure. Records that are near to each other tend to
have similar predictive values as well. Thus, if you know the
prediction value of one record, you can predict the values of its
nearest neighbors. This algorithm works much the way people think:
by detecting closely matching examples. Nearest Neighbor is often
used in retail and life sciences applications.
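The "closest matching example" idea can be shown in a few lines. The customer records and labels below are hypothetical, invented only to illustrate 1-nearest-neighbor classification.

```python
# Minimal 1-nearest-neighbor sketch (hypothetical customer records,
# not from the slides): copy the label of the closest known record.
import math

# (age, income-in-thousands) -> label; all values hypothetical.
known = [((25, 30), "low-value"), ((30, 35), "low-value"),
         ((50, 90), "high-value"), ((55, 95), "high-value")]

def predict(record):
    # Euclidean distance to every known record; take the nearest label.
    nearest = min(known, key=lambda kv: math.dist(record, kv[0]))
    return nearest[1]

print(predict((28, 33)))   # -> low-value
print(predict((52, 88)))   # -> high-value
```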
Slide 9
Analytic Algorithm Categories Association Rules: detects related
items in a dataset. Association analysis identifies and groups
together similar records that would otherwise go unnoticed by a
casual observer. This type of analysis is often used for market
basket analysis to find popular bundles of products that are
related by transaction, such as low-end digital cameras being
associated with smaller capacity memory sticks to store the digital
images.
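The core of market basket analysis is counting which items co-occur in the same transaction. The transactions below are hypothetical, and this pair-counting loop is a simplified stand-in for the support counting a real association algorithm performs.

```python
# Minimal market-basket sketch (hypothetical transactions, not from the
# slides): count item pairs that occur together to surface associations.
from collections import Counter
from itertools import combinations

transactions = [
    {"camera", "memory stick", "case"},
    {"camera", "memory stick"},
    {"camera", "case"},
    {"laptop", "mouse"},
]

pairs = Counter()
for basket in transactions:
    # Every unordered pair of items bought together in one basket.
    for pair in combinations(sorted(basket), 2):
        pairs[pair] += 1

print(pairs[("camera", "memory stick")])   # -> 2 (a candidate bundle)
```

Pairs with high counts relative to the total number of transactions are the "popular bundles" the slide describes, such as cameras sold with memory sticks.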
Slide 10
Analytic Algorithm Categories Decision Tree: a tree-shaped
graphical predictive algorithm that represents alternative
sequential decisions and the possible outcomes for each decision.
This algorithm provides alternative actions that are available to
the decision maker, the probabilistic events that follow from and
affect these actions, and the outcomes that are associated with
each possible scenario of actions and consequences. Its
applications range from credit card scoring to time series
predictions of exchange rates.
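A full tree is built by repeating one step: finding the single best split. That step, a "decision stump," can be sketched directly; the credit scores and outcomes below are hypothetical illustration data.

```python
# Minimal decision-stump sketch (a one-split "tree"; hypothetical credit
# scores, not from the slides): find the threshold that best separates
# good outcomes (1) from bad ones (0).
def best_stump(values, labels):
    best_t, best_err = None, len(labels) + 1
    for t in sorted(set(values)):
        # Predict 1 when value > t; count misclassifications.
        errors = sum(int(v > t) != y for v, y in zip(values, labels))
        if errors < best_err:
            best_t, best_err = t, errors
    return best_t, best_err

scores = [300, 400, 450, 700, 750, 800]   # hypothetical credit scores
good   = [0,   0,   0,   1,   1,   1]     # 1 = good credit outcome
threshold, errors = best_stump(scores, good)
print(threshold, errors)   # -> 450 0
```

A real tree algorithm applies this search recursively on each branch, producing the sequential decision paths described above.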
Slide 11
Analytic Algorithm Categories Sequence Association: detects
causality and association between time-ordered events, although the
associated events may be spread far apart in time and may seem
unrelated. Tracking specific time-ordered records and linking these
records to a specific outcome allows companies to predict a
possible outcome based on a few occurring events. A sequence model
can be used to reduce the number of clicks customers have to make
when navigating a company's website.
Slide 12
Analytic Algorithm Categories Neural Network: a sophisticated
pattern detection algorithm that uses machine learning techniques
to generate predictions. This technique models itself on the
process of cognitive learning and the neurological functions of the
brain, and is capable of predicting new observations from other known
observations. Neural networks are very powerful, complex, and
accurate predictive models that are used in detecting fraudulent
behavior, in predicting the movement of stocks and currencies, and
in improving the response rates of direct marketing campaigns.
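The smallest possible neural network is a single neuron (a perceptron) that nudges its weights whenever it predicts wrongly. The fraud-style example below is hypothetical and far simpler than a real multi-layer network, but it shows the learn-from-errors loop the slide describes.

```python
# Minimal single-neuron sketch of the idea behind neural networks
# (a perceptron learning the AND function; data is hypothetical):
def train_perceptron(samples, epochs=20):
    w, b = [0, 0], 0
    for _ in range(epochs):
        for (x1, x2), y in samples:
            pred = int(w[0] * x1 + w[1] * x2 + b > 0)
            err = y - pred          # nudge weights only when wrong
            w[0] += err * x1
            w[1] += err * x2
            b += err
    return w, b

# Hypothetical rule: flag a transaction (1) only when BOTH signals fire.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(samples)

def predict(x1, x2):
    return int(w[0] * x1 + w[1] * x2 + b > 0)

print([predict(x1, x2) for (x1, x2), _ in samples])   # -> [0, 0, 0, 1]
```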
Slide 13
Conventional BI Reporting Architecture
Slide 14
Slide 15
Excel Data Analysis Tool | Analysis Category
Anova: Single Factor | multiple linear regression
Anova: Two-Factor with Replication | multiple linear regression
Anova: Two-Factor without Replication | multiple linear regression
Correlation | linear regression
Covariance | linear regression
Descriptive Statistics | linear regression
Exponential Smoothing | naïve forecast
F-Test Two-Sample for Variances | linear regression
Fourier Analysis | linear regression
Histogram | linear regression
Moving Average | linear regression
Random Number Generation | N/A
Rank and Percentile | clustering
Regression | linear regression
Sampling | N/A
t-Test: Paired Two Sample for Means | linear regression
t-Test: Two-Sample Assuming Equal Variances | linear regression
t-Test: Two-Sample Assuming Unequal Variances | linear regression
z-Test: Two Sample for Means | linear regression
Slide 16
Table Analysis Tools for Excel (SQL Server 2008 Data Mining
Add-ins) The Analyze Key Influencers tool enables you to
select a column that contains a desired outcome or target value,
and then analyze the patterns in your data to determine which
factors had the strongest influence on the outcome. For example, if
you have a customer list that includes a column that shows the
total purchases for each customer over the past year, you could
analyze the table to determine the customer demographics of your
top purchasers. Microsoft SQL Server 2008 Data Mining Add-Ins for
Office 2007: Analyze Key Influencers (Table Analysis Tools for
Excel)
Slide 17
Market Basket Analysis
- Discover items sold together to create recommendations on the fly and to determine how product placement can directly contribute to your bottom line.
- Algorithms: Association, Decision Trees

Churn Analysis
- Anticipate customers who may be considering canceling their service and identify the benefits that will keep them from leaving.
- Algorithms: Decision Trees, Linear Regression, Logistic Regression

Market Analysis
- Define market segments by automatically grouping similar customers together. Use these segments to seek profitable customers.
- Algorithms: Clustering, Sequence Clustering

Forecasting
- Predict sales and inventory amounts and learn how they are interrelated to foresee bottlenecks and improve performance.
- Algorithms: Decision Trees, Time Series

Data Exploration
- Analyze profitability across customers, or compare customers that prefer different brands of the same product, to discover new opportunities.
- Algorithms: Neural Network

Unsupervised Learning
- Identify previously unknown relationships between various elements of your business to inform your decisions.
- Algorithms: Neural Network

Web Site Analysis
- Understand how people use your Web site and group similar usage patterns to offer a better experience.
- Algorithms: Sequence Clustering

Campaign Analysis
- Spend marketing funds more effectively by targeting the customers most likely to respond to a promotion.
- Algorithms: Decision Trees, Naïve Bayes, Clustering

Information Quality
- Identify and handle anomalies during data entry or data loading to improve the quality of information.
- Algorithms: Linear Regression, Logistic Regression

Text Analysis
- Analyze feedback to find common themes and trends that concern your customers or employees, informing decisions with unstructured input.
- Algorithms: Text Mining

Microsoft Office 2007 Data Mining Tasks (4)
Slide 18
Data Analysis Expressions (DAX) is the standard PowerPivot
formula language that supports custom calculations in PowerPivot
tables and Excel PivotTables. While many of the functions used in
Excel are included, DAX also offers additional functions for
carrying out dynamic aggregation and other operations with your
data. (8) (7)
Slide 19
Time-related calculated measures (10)

Previous Year sales:
=IF( COUNTROWS(VALUES(DimDate[CalendarYear])) = 1,
     CALCULATE([Sales], PREVIOUSYEAR(DimDate[DateKey])),
     BLANK() )
or, equivalently:
=IF( COUNTROWS(VALUES(DimDate[CalendarYear])) = 1,
     CALCULATE([Sales], PARALLELPERIOD(DimDate[DateKey], -12, MONTH)),
     BLANK() )

Year-over-year growth:
=IF( COUNTROWS(VALUES(DimDate[CalendarYear])) = 1,
     [Sales] - CALCULATE([Sales], PREVIOUSYEAR(DimDate[DateKey])),
     BLANK() )
Slide 20
The DMX query editor for SQL Server Reporting Services
Reporting is a fundamental activity in most businesses, and SQL
Server 2008 Reporting Services provides a comprehensive solution
for creating, rendering, and deploying reports throughout the
enterprise. SQL Server Reporting Services can render reports
directly from a data mining model by using a data mining extensions
(DMX) query. This enables users to visualize the content of data
mining models for optimized data representation. Furthermore, the
ability to query directly against the data mining structure enables
users to easily include attributes beyond the scope of the mining
model requirements, presenting complete and meaningful information.
(4)
Slide 21
For more information about the functions that are supported for
each model type, see the following links:
- Association Model Query Examples
- Clustering Model Query Examples
- Decision Trees Model Query Examples
- Linear Regression Model Query Examples
- Logistic Regression Model Query Examples
- Microsoft Naive Bayes Algorithm
- Neural Network Model Query Examples
- Sequence Clustering Model Query Examples
- Time Series Model Query Examples
You can also call VBA functions, or create your own functions.
For more information, see Functions (DMX).
Slide 22
SELECT
  PredictTimeSeries([Forecasting].[Amount]) AS [PredictedAmount],
  PredictTimeSeries([Forecasting].[Quantity]) AS [PredictedQty]
FROM
  [Forecasting]

Prediction Queries (Data Mining) (9)
Slide 23
SQL Server 2008 data mining supports a number of application
programming interfaces (APIs) that developers can use to build
custom solutions that take advantage of the predictive analysis
capabilities in SQL Server. DMX, XMLA, OLEDB and ADOMD.NET, and
Analysis Management Objects (AMO) offer a rich, fully documented
development platform, empowering developers to build data mining
aware applications and providing real-time discovery and
recommendation through familiar tools. This extensibility creates
an opportunity for business organizations and independent software
vendors (ISVs) to embed predictive analysis into line-of-business
applications, introducing insight and forecasting that inform
business decisions and processes. For example, the Analytics
Foundation adds predictive scoring to Microsoft Dynamics CRM, to
enable information workers across sales, marketing, and service
organizations to identify attainable opportunities that are more
likely to lead to a sale, increasing efficiency and improving
productivity (for more information, see the Microsoft Dynamics
site).
Slide 24
Integration Services Data Mining Tasks and Transformations
--------------------------------------------------------------------------------
SQL Server Integration Services provides many components that
support data mining. Some tools in Integration Services are
designed to help automate common data mining tasks, including
prediction, model building, and processing. For example, you can:
1) Create an Integration Services package that automatically updates the model every time the dataset is updated with new customers.
2) Perform custom segmentation or custom sampling of case records.
3) Automatically generate models based on parameters.
You can also use data mining in a package workflow, as an input to other processes. For example:
1) Use probability values generated by the model to weight scores for text mining or other classification tasks.
2) Automatically generate predictions based on prior data and use those values to assess the validity of new data.
3) Use logistic regression to segment incoming customers by risk.
Slide 25
Data mining in SQL Server Integration Services Microsoft SQL
Server 2008 Integration Services provides a powerful, extensible
ETL platform that Business Intelligence solution developers can use
to implement ETL operations. SQL Server Integration Services
includes a Data Mining Model Training destination for training data
mining models, and a Data Mining Query transformation that can be
used to perform predictive analysis on data as it is passed through
the data flow. Integrating predictive analysis with SQL Server
Integration Services enables organizations to flag unusual data,
classify business entities, perform text mining, and fill in
missing values on the fly based on the power and insight of the
data mining algorithms. (4)
Slide 26
After you have created a mining structure and mining model by
using the Data Mining Wizard, you can use the Data Mining Designer
from either SQL Server Data Tools (SSDT) or SQL Server Management
Studio to work with existing models and structures. The designer
includes tools for these tasks:
1) Modify the properties of mining structures, add columns and create column aliases, and change the binning method or expected distribution of values.
2) Add new models to an existing structure; copy models, change model properties or metadata, or define filters on a mining model.
3) Browse the patterns and rules within the model, explore associations or decision trees, and get detailed statistics. Custom viewers are provided for each different type of model, to help you analyze your data and explore the patterns revealed by data mining.
4) Validate models by creating lift charts or analyzing the profit curve for models. Compare models using classification matrices, or validate a data set and its models by using cross-validation.
5) Create predictions and content queries against existing mining models. Build one-off queries, or set up queries to generate predictions for entire tables of external data.
Slide 27
SQL Server 2008 Analysis Services provides a highly scalable
platform for multidimensional OLAP analysis. Many customers are
already reaping the benefits of creating a unified dimensional
model (UDM) in Analysis Services and using it to slice and dice
business measures by multiple dimensions. Predictive analysis, as
part of SQL Server 2008 Analysis Services, provides a richer
OLAP experience, featuring data mining dimensions that slice your
data by the hidden patterns within. (4)
A data mining dimension in an OLAP cube
Slide 28
Data Mining Algorithms (Analysis Services - Data Mining)
Choosing an Algorithm by Task To help you select an algorithm for
use with a specific task, the following table provides suggestions
for the types of tasks for which each algorithm is traditionally
used.
Slide 29
Many businesses use KPIs to evaluate critical business metrics
against targets. SQL Server 2008 Analysis Services provides a
centralized platform for KPIs across the organization, and
integration with Microsoft Office PerformancePoint Server 2007
enables decision makers to build business dashboards from which
they can monitor the company's performance. KPIs are traditionally
retrospective, for example showing last month's sales total compared
to the sales target. However, with the insights made possible
through data mining, organizations can build predictive KPIs that
forecast future performance against targets, giving the business an
opportunity to detect and resolve potential problems proactively.
Predictive analysis can detect attributes that influence KPIs.
Together with Office PerformancePoint Server 2007, users can
monitor trends in key influencers to recognize those attributes
that have a sustained effect. Such insights enable businesses to
inform and improve their response strategy. (4)
Slide 30
The SQL Server data mining toolset is fully extensible through
Microsoft .NET stored procedures, plug-in algorithms, custom
visualizations, and PMML. This enables developers to extend the
out-of-the-box data mining technologies of SQL Server 2008 to meet
uncommon business needs that are specific to the organization by:
Creating custom data mining algorithms to solve business-specific
analytical problems. Using data mining algorithms from other
software vendors. Creating custom visualizations of data mining
models through plug-in viewer APIs. Although the data mining
functionality provided with SQL Server 2008 is comprehensive enough
to meet the needs of a wide range of business scenarios, its
extensibility ensures that it can be used to solve virtually any
predictive problem. The ability to extend the data mining
technologies of SQL Server through custom algorithms and
visualizations, together with the ability to embed predictive
functionality into line-of-business applications makes SQL Server
2008 a powerful platform for introducing predictive analysis into
existing business processes to add insight and recommendations into
everyday operations. (4)
Slide 31
Plugin Algorithms (SQL Server 2005, SQL Server 2008, SQL Server
2008 R2, SQL Server 2012) In addition to the algorithms that Microsoft
SQL Server Analysis Services provides, there are many other
algorithms that you can use for data mining. Accordingly, Analysis
Services provides a mechanism for "plugging in" algorithms that are
created by third parties. As long as the algorithms follow certain
standards, you can use them within Analysis Services just as you
use the Microsoft algorithms. Plugin algorithms have all the
capabilities of algorithms that SQL Server Analysis Services
provides. For a full description of the interfaces that Analysis
Services uses to communicate with plugin algorithms, see the
samples for creating a custom algorithm and custom model viewer
that are published on the CodePlex Web site.
Slide 32
One Way ANOVA (Analysis of Variance) When to Use One-Way,
Single Factor ANOVA In a manufacturing or service environment, you
might wonder if changing a formula, process or material might
deliver a better product at a lower cost. Saving a penny a pound on
five million pounds a month can really add up. Saving ten minutes
of wait time in a hospital might add $100,000 to the bottom line and
deliver better patient outcomes. Comparing two or more drug
formulations might pinpoint the best drug for a desired result. How
can you compare the old formula with a new one and be certain that
you have an opportunity to improve? Use one-way ANOVA (also known
as single factor ANOVA) to determine if there's a statistically
significant difference between two or more alternatives.
Slide 33
One Way ANOVA (Analysis of Variance) Imagine that you
manufacture paper bags and that you want to improve the tensile
strength of the bag. You suspect that changing the concentration of
hardwood in the bag will change the tensile strength. You measure
the tensile strength in pounds per square inch (PSI). So, you
decide to test this at 5%, 10%, 15% and 20% hardwood concentration
levels. These "levels" are also called "treatments." Since we are
only evaluating a single factor (hardwood concentration), this is
called one-way ANOVA. The null hypothesis is that the means are
equal: H0: Mean1 = Mean2 = Mean3 = Mean4. The alternate hypothesis
is that at least one of the means is different: Ha: at least one
mean differs. To conduct the one-way ANOVA test, you need to
randomize the trials (assumption #1). Imagine that we've conducted
these trials at each of the four levels of hardwood concentration.
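The F statistic the test produces is just a ratio of between-group to within-group variance, and it can be computed by hand. The readings below are hypothetical stand-ins, not the QI Macros anova.xls dataset, so the resulting F does not match the slides' 19.60521.

```python
# One-way ANOVA F statistic computed by hand, to show what the test does
# under the hood (the readings below are hypothetical, not the QI Macros data):
def anova_f(groups):
    n = sum(len(g) for g in groups)          # total observations
    k = len(groups)                          # number of treatment levels
    grand = sum(sum(g) for g in groups) / n  # grand mean
    # Between-group sum of squares, k - 1 degrees of freedom.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares, n - k degrees of freedom.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical tensile-strength readings (PSI) at three concentration levels.
groups = [[7, 8, 9], [12, 13, 14], [19, 20, 21]]
f = anova_f(groups)
print(f)   # a large F means the level means clearly differ
```

When F exceeds the critical value for the chosen significance level, the null hypothesis of equal means is rejected, exactly as the slides go on to do.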
Slide 34
One Way ANOVA (Analysis of Variance) You'll find the results of
these trials in the ANOVA test data provided with the QI Macros at
c:\qimacros\testdata\anova.xls. The QI Macros will prompt you for
the significance level you desire. While the default is 0.05 (95%
confident), in this example we want to be even more certain, so we
use 0.01 (99% confident).
Slide 35
One Way ANOVA (Analysis of Variance) Interpreting the Anova One
Way test results The QI Macros automatically compares the p value
to the significance level (alpha), but you might want to know how
to do this manually. The "null" hypothesis assumes that there is no
difference between the hardwood concentrations.
If test statistic > critical value (i.e., F > Fcrit), reject the null hypothesis.
If test statistic < critical value (i.e., F < Fcrit), accept the null hypothesis.
If p value < alpha, reject the null hypothesis.
If p value > alpha, accept the null hypothesis.
The P-value of 0.000 is less than the significance level (0.01), so
we can reject the null hypothesis and safely assume that hardwood
concentration affects tensile strength. F (19.60521) is greater
than F crit (4.938193), so again, we can reject the null
hypothesis.
Slide 36
One Way ANOVA (Analysis of Variance) Now we can look at the
average tensile strength and variances: The average tensile
strength increases, but we cannot say for certain which means
differ. The variance at the 15% level looks substantially lower
than the other levels. We might need to do additional analysis. If
we reran the one way Anova test with just 10% and 15%, we'd
discover there is no statistically significant difference between
the two means. The P value (0.349) is greater than the significance
level (0.01), so we cannot reject the null hypothesis that the
means are equivalent. And F (0.963855) is less than F crit
(10.04429) so we cannot reject the null hypothesis. Based on this
analysis, if we were aiming for a tensile strength of 15 PSI or
greater, the 10% level might be more cost effective.
Slide 37
Two Way ANOVA (Analysis of Variance) - Without Replication
What's cool about QI Macros Two-Way ANOVA? Unlike other statistical
software, the QI Macros is the only SPC software that compares the
p-values to the significance level and tells you when to "Accept or
Reject the Null Hypothesis" and what that tells you: "Means are
Same or Different". Two Way Anova (Analysis of Variance), also
known as two-factor Anova, can help you determine if two factors
have the same "mean" or average. This is a form of "hypothesis
testing."
Slide 38
Two Way ANOVA (Analysis of Variance) - Without Replication The
null hypothesis is that the means are equal: H0: Factor 1's Means =
Factor 2's Means The alternate hypothesis is: Ha: The means are
different. The goal is to accept the null hypothesis (the means are
the same) or reject it (the samples have different means) at a
certain confidence level (95% or 99%).
Slide 39
Two Way ANOVA (Analysis of Variance) - Without Replication
Using Excel and the QI Macros, run a two-way analysis without
replication (alpha=0.05 for a 95% confidence). Click on QI Macros
menu and select: ANOVA Two Factor without replication.
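What the tool computes can be sketched by hand: one F statistic per factor, each compared against the same mean-square error. The golfer-by-ball-brand distances below are hypothetical, not the QI Macros golf dataset, so the F values do not match the slides.

```python
# Two-way ANOVA without replication, computed by hand (hypothetical
# golfer-by-ball-brand distances, not the QI Macros golf dataset):
def two_way_anova(table):
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    # Residual (error) sum of squares, (r - 1) * (c - 1) degrees of freedom.
    ss_err = sum((table[i][j] - row_means[i] - col_means[j] + grand) ** 2
                 for i in range(r) for j in range(c))
    ms_err = ss_err / ((r - 1) * (c - 1))
    f_rows = (ss_rows / (r - 1)) / ms_err    # F for the row factor
    f_cols = (ss_cols / (c - 1)) / ms_err    # F for the column factor
    return f_rows, f_cols

# Rows = two golfers, columns = three ball brands (hypothetical yardages).
distances = [[201, 202, 203],
             [202, 204, 206]]
f_rows, f_cols = two_way_anova(distances)
print(f_rows, f_cols)   # compare each F to its critical value
```

Each F is then compared to its own critical value (or its p-value to alpha), which is the interpretation step the next slide walks through.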
Slide 40
Two Way ANOVA (Analysis of Variance) - Without Replication
Interpreting the Anova Two Way Without Replication Results In case
you want to know how to do this manually, use these instructions.
If test statistic > critical value (i.e., F > Fcrit), reject the null hypothesis.
If test statistic < critical value (i.e., F < Fcrit), accept the null hypothesis.
If p value < alpha, reject the null hypothesis.
If p value > alpha, accept the null hypothesis.
Here, the P-value for Rows (i.e., golfers) is less than alpha
(0.05), so we can reject the hypothesis that all of the golfers are
the same. The P-Value for Columns (i.e., golf balls) is also less
than alpha, so we can reject the hypothesis that all of the golf
balls are the same.
Slide 41
Two Way ANOVA (Analysis of Variance) - Without Replication It
does look like Brands B and C are similar. We could run a paired
two-sample t test on Brands B and C to determine if they deliver
the same average distance. Since the p values are greater than
alpha (0.05), we can accept the null hypothesis that there is no
difference between the two brands of golf balls, except perhaps
price.
Slide 42
Two Way ANOVA (Analysis of Variance) With Replication When to
Use Two Way Anova Two Way Anova (Analysis of variance), also known
as two factor Anova, can help you determine if two or more samples
have the same "mean" or average. This is a form of "hypothesis
testing." The null hypothesis is that the means are equal; the
alternate hypothesis is that they are not: H0: Mean1 = Mean2 =
Mean3; Ha: at least one mean differs. The goal is to accept or
reject the null hypothesis at a certain confidence level (95% or
99%).
Slide 43
Two Way ANOVA (Analysis of Variance) With Replication What if
you have two populations of patients (male/female) and three
different kinds of medications, and you want to evaluate their
effectiveness? You might run a study with three "replications",
three men and three women.
Slide 44
Two Way ANOVA (Analysis of Variance) With Replication Using the
QI Macros, run a two-way Anova analysis with replication
(alpha=0.05 for a 95% confidence). What's cool about QI Macros
ANOVA? Unlike other statistical software, the QI Macros is the only
SPC software that compares the p-values (0.179) to the signficance
(0.05) and tells you to "Accept the Null Hypothesis because
p>0.05" and that the "Means are the same ".
Slide 45
Two Way ANOVA (Analysis of Variance) With Replication
If test statistic > critical value (i.e., F > Fcrit), reject the null hypothesis.
If test statistic < critical value (i.e., F < Fcrit), accept the null hypothesis.
If p value < alpha, reject the null hypothesis.
If p value > alpha, accept the null hypothesis.
Here, the P-value for Male/Female is greater than alpha
(.179 > .05), so we accept the null hypothesis that the means are
the same. The P-Value for Drugs is greater than alpha (.106 > .05),
so the null hypothesis holds as well (means are the same). The
P-value for the interaction of the drugs and patients is less than
alpha (.006