DESCRIPTION
Active technique [5] [6] [7]: it is based on the same principle as passive cancellation, namely the creation of an appropriate destructive echo that cancels the real echo of the target at the radar. The target emits electromagnetic energy synchronised with the received radar energy to minimise the reflected signal.
1.2 Benefits of high manoeuvrability in combat
Manoeuvrability can be defined as the rate at which an aircraft can change speed, altitude and direction in any desired combination. Another important characteristic of a fighter aircraft is its agility, which measures the rate of change of manoeuvrability with respect to time. Erich Hartmann, the world's top fighter pilot in WW II with 352 confirmed air-to-air kills, had a formula: See – Decide – Attack – Coffee Break [8]. The whole purpose of air combat is to be superior in agility in order to evade a pursuer and corner a quarry. It relies on offensive and defensive basic fighter manoeuvres to gain an advantage over an opponent, and this can be achieved only with high manoeuvrability. In any combat manoeuvre, energy is quickly depleted. Instead of using kinetic energy (speed) to chase the enemy or potential energy (altitude) to escape from the enemy, a highly manoeuvrable aircraft can use its specific excess power, (T − D)V/W, to engage the enemy, kill the targets, disengage and quickly get into a safe position. The initial advantage goes to the pilot who enters combat with the most aircraft energy, in terms of speed (kinetic energy) and/or altitude (potential energy), and who can manoeuvre over the enemy quickly. Also, when locked by an enemy missile's radar or IR tracker, an agile aircraft (one that can manoeuvre quickly) can survive against missiles that are less agile than the aircraft itself.
1.3 Comparison of the benefits of stealth over manoeuvrability in combat
The stand taken here is that, in aerial combat, a highly manoeuvrable fighter aircraft is preferred over a stealth fighter. The arguments for this stand are discussed below. A low radar signature (stealth aircraft) means only that the target can be detected and tracked by radar at a shorter distance; it does not mean that the aircraft disappears completely from radar screens. Usually, in air-to-air combat, the enemy fires a missile at the target only from close range, to maximise the probability of a kill. But the missile exhaust is a brilliant flame that is visible from a long distance, so it can be tracked visually by the target pilot. Coupled with the fact that missiles are less manoeuvrable than the fighter itself and hence have difficulty following an agile fighter, this means missiles too often miss an agile target. The result is usually a close-range dogfight in which the cannon/gun is the proven weapon; at such ranges, the missile is unusable and hence harmless. Stealth aircraft have higher flyaway and maintenance costs, and they have significant operational limitations due to the specific aircraft shape and materials used. Also, due to their limited ability to carry fuel and weapons internally, they become inefficient in an air-to-air combat scenario. Further, having realised the capabilities of stealth aircraft, many countries have been developing anti-stealth technologies such as multistatic radars, very low frequency radars, over-the-horizon radars and sensitive IR sensor systems. Be it a highly manoeuvrable fighter or a stealth aircraft, when under attack the target always tries to turn into the enemy as quickly as possible, because turning away makes it easier for the enemy to kill it. This turn has to be made at a very fast rate; if the aircraft is not agile, it falls prey to the enemy.
Most aerial attacks are ambushes, where manoeuvrability is the key. Superior pilot skill in tackling the enemy is the most important factor in any aerial combat. To achieve this, it is b
M. S. Ramaiah University of Applied Sciences
Data Analysis
Session Speaker
K.M. Sharath Kumar
Session 6
Session Objectives
- To explain the relevance of data analysis for carrying out research
- To explore different types of data analysis techniques for effective interpretation
- To critique and recommend appropriate exploratory data analysis techniques for a problem
Session Outline
Sampling Design
Data Collection Methods
Quantitative and Qualitative Data Analysis
Stages in Data Analysis
Review of Techniques
Error Analysis
One Variant
6,200 Distinct Parts
Imported from 17 Countries
From 240 Suppliers
Assembled in 1 Plant
Within a few minutes
Exported to 34 Countries
Same day
Without becoming inventory!
Suzuki Grand Vitara
The secret of success is to know something nobody else knows
- Aristotle Onassis
Turn Data into Insight
Insight into Action
Action into Tangible Results
- Accenture
Data Analysis (1/2)
Explore relationships among the variables
Partition the total variability (by statement / variance component analysis)
Handle noisy data appropriately
Questions to be answered:
Is the process stable?
Is the process capable of meeting specifications?
What are the major sources of variation (noise, etc.)?
Listen to what the data is saying
Data Analysis (2/2)
Data analysis is carried out in two distinct environments:
Experimental: the result of a special study or experiment
Observational: a by-product of ongoing operations
Experimental Studies
Here we compare various conditions and try to determine which condition is better. We have a finite amount of data and carry out a one-time analysis.
Observational Studies
Here we get data from a steady-state process and try to find out whether any unplanned change has occurred. Generally we perform a sequential analysis using a continuing stream of data.
Classification of Data Analysis: Quantitative vs. Qualitative
Quantitative | Qualitative
Explanation through numbers | Explanation through words
Objective | Subjective
Deductive reasoning | Inductive reasoning
Predefined variables and measurement | Creativity, extraneous variables
Data collection before analysis | Data collection and analysis intertwined
Cause and effect relationships | Description, meaning
Ambushed Everywhere
Data analysis should be:
Supported by data
Shown in graphical and statistical format
Not based on intuition
Sensible from an engineering standpoint
Data and Hard Evidence!!
Key Components of a Data Analysis Plan
Purpose of the evaluation
Questions
What you hope to learn from the question
Analysis technique
How data will be presented
Types of Data
Continuous Data
Discrete Data
Continuous Data
Data generated by
Physically measuring the characteristic
Generally using an instrument
Assigning a unique value to each item
Examples:
Time to receive a shipment, time spent per page, time to activate, CPU speed, Total Minutes per Incident (TMPI), etc.
Hardness, Strength, Weight, Diameter, etc.
Discrete Data
Data generated by
Classifying the items into different groups based on
some criteria
No physical measurement is involved
Examples:
Sex, shade variation, surface defects, etc.
% of visitors signing in to AOL Messenger per day, number of recharges per month, number of operating systems, % escalations, etc.
Continuous Data: Example (Time spent per page visit, in minutes)
Sl. No. | Data | Sl. No. | Data
1 | 0.98 | 11 | 1.02
2 | 1.03 | 12 | 0.98
3 | 1.00 | 13 | 1.01
4 | 1.00 | 14 | 1.01
5 | 0.99 | 15 | 0.99
6 | 1.01 | 16 | 1.00
7 | 0.97 | 17 | 1.01
8 | 1.02 | 18 | 0.99
9 | 1.00 | 19 | 1.00
10 | 0.99 | 20 | 1.02
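A short sketch of how these 20 observations can be summarised before they are plotted: a frequency count of each recorded value (the basis of the histogram that follows) together with the mean, sample standard deviation and range. It uses only the Python standard library; the variable names are illustrative.

```python
from collections import Counter
from statistics import mean, stdev

# Time spent per page visit (minutes), Sl. No. 1 to 20 from the table above
times = [0.98, 1.03, 1.00, 1.00, 0.99, 1.01, 0.97, 1.02, 1.00, 0.99,
         1.02, 0.98, 1.01, 1.01, 0.99, 1.00, 1.01, 0.99, 1.00, 1.02]

# Frequency of each recorded value, printed as a text histogram
frequency = Counter(times)
for value in sorted(frequency):
    print(f"{value:.2f} min: {'*' * frequency[value]}")

average = mean(times)    # arithmetic mean
spread = stdev(times)    # sample standard deviation (divisor n - 1)
print(f"n = {len(times)}, mean = {average:.4f}, s = {spread:.4f}, "
      f"range = {min(times):.2f} to {max(times):.2f}")
```

The values cluster tightly around 1.00 minute, which is what the histogram on the next slide shows visually.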
Continuous Data: Example (Time spent per page visit, in minutes): Graphical Representation
[Figure: histogram of the 20 observations; frequency (0 to 6) against time in minutes, with bins labelled 0.9631 to 1.0331]
Random Variables
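Sample space (four births, B = boy, G = girl): BBBB, BGBB, GBBB, BBBG, BBGB, GGBB, GBBG, BGBG, BGGB, GBGB, BBGG, BGGG, GBGG, GGGB, GGBG, GGGG
[Figure: X maps each outcome in the sample space to a point on the real line, X = 0, 1, 2, 3 or 4]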
Random Variables (Continued)
The random variable X (number of girls) equals 3 when any of the four outcomes BGGG, GBGG, GGBG or GGGB occurs, so
P(X = 3) = P(BGGG) + P(GBGG) + P(GGBG) + P(GGGB) = 4/16
The probability distribution of a random variable is a table that lists the possible values of the random variable and their associated probabilities.
x | P(x)
0 | 1/16
1 | 4/16
2 | 6/16
3 | 4/16
4 | 1/16
Total | 16/16 = 1
The graphical display for this probability distribution is shown on the next slide.
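The table can be reproduced by brute-force enumeration of the sample space. This sketch assumes, as the 16-outcome sample space implies, that every birth sequence is equally likely; it counts how many outcomes map to each value of X and keeps the probabilities as exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 2**4 = 16 equally likely sequences of B (boy) and G (girl)
outcomes = ["".join(seq) for seq in product("BG", repeat=4)]

# X maps each outcome to a point on the real line: the number of girls
counts = Counter(outcome.count("G") for outcome in outcomes)
distribution = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in distribution.items():
    print(x, p)   # Fraction reduces 4/16 to 1/4 and 6/16 to 3/8
print("total:", sum(distribution.values()))
```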
Random Variables (Continued)
[Figure: Probability Distribution of the Number of Girls in Four Births; bar chart of probability P(X) against number of girls X = 0 to 4, with bar heights 1/16, 4/16, 6/16, 4/16 and 1/16]
Example
Consider the experiment of tossing two six-sided dice. There are 36 possible outcomes. Let the random variable X represent the sum of the numbers on the two dice:
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
(Outcomes on the same anti-diagonal share the same sum, from 2 for (1,1) to 12 for (6,6).)
x | P(x)
2 | 1/36
3 | 2/36
4 | 3/36
5 | 4/36
6 | 5/36
7 | 6/36
8 | 5/36
9 | 4/36
10 | 3/36
11 | 2/36
12 | 1/36
Total | 1
[Figure: Probability Distribution of Sum of Two Dice; bar chart of p(x) (0 to about 0.17) against x = 2 to 12]
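The same enumeration idea confirms the dice table: list all 36 equally likely (first die, second die) pairs, add the faces, and count. This is a sketch assuming fair dice; the `#` bars give a rough text version of the chart.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# X = sum of the two dice for each of the 36 equally likely outcomes
sums = Counter(a + b for a, b in product(faces, repeat=2))
distribution = {x: Fraction(sums[x], 36) for x in sorted(sums)}

for x, p in distribution.items():
    print(f"{x:2d}  {str(p):>5}  {'#' * sums[x]}")
```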
NORMAL DISTRIBUTION
Generic Causes Of Variation
Machines
Materials
Methods
Measurements
Mother Nature
People
PROCESS
THE NORMAL CURVE
[Figure: histogram with a smooth curve interconnecting the centre of each bar; horizontal axis in units of measure]
Normal Distribution
If the frequency distribution of a set of values is normal, then:
about 68.27% of the values lie within 1σ of the mean,
about 95.45% of the values lie within 2σ of the mean, and
about 99.73% of the values lie within 3σ of the mean.
A data set that roughly follows these percentages is consistent with a normal distribution (a quick check, not a proof of normality).
NORMAL DISTRIBUTION IS CHARACTERISED BY A BELL-SHAPED CURVE.
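The three percentages follow from the normal distribution itself rather than from any data set. For any normal variable, P(|X − μ| < kσ) = erf(k/√2), so a few lines with Python's `math.erf` reproduce them. This is a sketch; `prob_within` is an illustrative helper name.

```python
import math

def prob_within(k: float) -> float:
    """P(|X - mu| < k*sigma) for any normal distribution, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * prob_within(k):.2f}%")
```

Printed tables that round each half-area to four decimals (0.3413, 0.4773) give the slightly different figures 68.26% and 95.46% that are often quoted.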
Standard Normal Distribution
Normal variables have different units of measurement; the standard normal distribution removes this difference.
Standard normal variable: $Z = \dfrac{x - \mu}{\sigma}$
First convert the original problem into Z; probabilities for Z can then be read from the standard normal table.
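As a sketch of the two-step procedure (standardise, then look up the probability), the example below uses `statistics.NormalDist` from the Python standard library in place of the printed Z table. The process mean of 100 and standard deviation of 15 are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """Standard normal variable Z = (x - mu) / sigma."""
    return (x - mu) / sigma

# Hypothetical example: a process with mean 100 and standard deviation 15
z = z_score(130, mu=100, sigma=15)
p_below = NormalDist().cdf(z)   # replaces the printed Z table lookup
print(f"Z = {z:.2f}, P(X < 130) = {p_below:.4f}")
```

Because standardising does not change probabilities, `NormalDist(100, 15).cdf(130)` gives the same answer directly.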
Sampling Design
Population (N) Sample (n)
Samples and Populations
Sampling Design within the Research Process
Question hierarchy → Define relevant population → Identify existing sampling frame → Evaluate sampling frame (accept: select sampling frame; don't accept: modify sampling frame) → Sample type (probability or non-probability) → Sampling technique → Draw sample
Types of Sampling
Probability sampling: simple random sampling, stratified random sampling, systematic sampling, cluster sampling
Non-probability sampling: convenience sampling, quota sampling, expert sampling
In stratified random sampling, we assume that the population of N units may be divided into m groups with Ni units in each group i=1,2,...,m. The m strata are nonoverlapping and together they make up the total population: N1 + N2+...+ Nm =N.
Stratified Random Sampling
[Figure: the population divided into Stratum 1 (N1 units), Stratum 2 (N2 units), ..., Stratum m (Nm units)]
The m strata are non-overlapping: $\sum_{i=1}^{m} N_i = N$
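The slide defines the strata but not how many units to draw from each. A common choice, assumed here, is proportional allocation, n_i = n × N_i / N, with a simple random sample drawn without replacement inside each stratum. The helper name and the strata sizes (50, 30, 20) are hypothetical.

```python
import random

def stratified_sample(strata: dict, n: int, seed=None) -> dict:
    """Proportional-allocation stratified random sample (illustrative helper).

    strata maps a stratum name to its list of units (N_i units each);
    each stratum gets n_i = round(n * N_i / N) units drawn without replacement.
    """
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_i = round(n * len(units) / N)
        sample[name] = rng.sample(units, n_i)
    return sample

# Hypothetical population of N = 100 units in m = 3 non-overlapping strata
strata = {
    "Stratum 1": list(range(1, 51)),
    "Stratum 2": list(range(51, 81)),
    "Stratum 3": list(range(81, 101)),
}
result = stratified_sample(strata, n=10, seed=1)
for name, units in result.items():
    print(name, sorted(units))
```

With uneven stratum sizes the rounded n_i may not add up exactly to n; a fuller implementation would adjust the largest remainders.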
Systematic Random Sampling
Units are drawn from the population at regular, clearly defined intervals
Steps
- Compute k = N/n and take the integer value; k is called the sampling interval
- Select a random number between 1 and k
- Starting with this number, select every kth unit until all n units are selected
Example
Suppose in a market survey, you have to select 5 households out of 50 households in a block.
- Number of units in the population N = 50
- Number of units in the sample n = 5
- Sampling Interval K = (N/n) = 50/5 = 10
- Select a random number between 1 and 10
Suppose the selected random number is 5. Starting with 5, select every 10th unit.
Example Contd.
Households are numbered 1 to 50. Starting with 5 and selecting every 10th unit gives the sample 5, 15, 25, 35, 45.
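The steps above translate almost line for line into code. In this sketch (the function name is illustrative) the random start is drawn from 1..k unless it is supplied, so the slide's example can be reproduced by passing `start=5`.

```python
import random

def systematic_sample(N: int, n: int, start=None, seed=None) -> list:
    """Systematic sampling of units numbered 1..N.

    k = N // n is the sampling interval; start is a random number in 1..k
    unless given explicitly.
    """
    k = N // n
    if start is None:
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(n)]

print(systematic_sample(50, 5, start=5))   # the slide's example
```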
Cluster Sampling
[Figure: population distribution and sample distribution across groups 1 to 7]
In stratified sampling, a random sample (ni) is chosen from each segment of the population (Ni).
In cluster sampling, observations are drawn from m out of M areas or clusters of the population.
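To contrast with stratified sampling, here is a sketch of one-stage cluster sampling: choose m of the M clusters at random and keep every unit inside them. The one-stage design is an assumption (the slide does not say whether units are sub-sampled within clusters), and the household labels are hypothetical.

```python
import random

def cluster_sample(clusters: dict, m: int, seed=None) -> dict:
    """One-stage cluster sampling: pick m of the M clusters at random
    and keep every unit in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {name: clusters[name] for name in chosen}

# Hypothetical population: M = 7 areas (groups 1 to 7) of households
areas = {g: [f"G{g}-H{h}" for h in range(1, 4 + g)] for g in range(1, 8)}
picked = cluster_sample(areas, m=2, seed=7)
print(sorted(picked), sum(len(units) for units in picked.values()))
```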
Caution
Results from non-probability sampling should not be generalised to the population
Sampling Distribution
- A conceptual framework
Sampling Distribution of the Mean from Normal Population
If X1, X2, ..., Xn are n independent observations drawn from a normal population with mean μ and standard deviation σ, then the sampling distribution of the sample mean
$\bar{X} = \dfrac{X_1 + X_2 + \dots + X_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$
follows a normal distribution with mean μ and standard deviation σ/√n.
Standard deviation of the sample mean: $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$ = standard error
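The σ/√n result can be checked empirically. This simulation sketch draws many samples from a normal population with hypothetical parameters μ = 50 and σ = 10, takes each sample's mean, and compares the spread of those means with σ/√n for several sample sizes. A fixed seed keeps the run repeatable; the helper name is illustrative.

```python
import math
import random
from statistics import pstdev

def simulate_sample_means(mu, sigma, n, reps=4000, seed=42):
    """Draw `reps` samples of size n from a normal population and return their means."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]

mu, sigma = 50.0, 10.0   # hypothetical population parameters
for n in (4, 16, 64):
    means = simulate_sample_means(mu, sigma, n)
    theory = sigma / math.sqrt(n)
    print(f"n = {n:2d}: simulated SE = {pstdev(means):.3f}, sigma/sqrt(n) = {theory:.3f}")
```

Quadrupling n should roughly halve the simulated standard error, which is the behaviour the next slide illustrates.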
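Sample Size and Standard Error
The sample size determines the bound of a statistic, since the standard error of a statistic shrinks as the sample size increases (for the sample mean, doubling n divides the standard error by √2):
[Figure: sampling distributions of a statistic for sample size n and sample size 2n; the distribution for 2n is narrower, with a smaller standard error]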
Determining Sample Size
Determining Sample Size using Confidence Interval
If we know the precision (sampling error), the confidence level and the standard deviation of the original population, the sample size can be determined
Sample Size Determination: Population Mean
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
Sampling error $E = \bar{X} - \mu$; substituting and squaring both sides, we get
$n = \dfrac{Z^2 \sigma^2}{E^2}$
where Z is the value corresponding to an area of $(1-\alpha)/2$ from the mean of the standard normal distribution
Example
A marketing manager of a fast food restaurant in a city wishes to estimate the average yearly amount that families spend at fast food restaurants. He wants the estimate to be within ± Rs. 100 with a confidence level of 99%. It is known from an earlier pilot study that the standard deviation of family expenditure at fast food restaurants is Rs. 500. How many families must be chosen for this problem?
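Solution
Applying the formula $n = \dfrac{Z^2 \sigma^2}{E^2}$ with Z = 2.58 (99% confidence), σ = Rs. 500 and E = Rs. 100:
n = (2.58² × 500²) / 100² = 166.41
≈ 167 (a sample size is always rounded up, so that the required precision is still met)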
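The calculation is easy to script. This sketch rounds up with `math.ceil`, which is the usual convention for sample sizes, and also shows what happens when the unrounded 99% Z value (about 2.5758, from `NormalDist().inv_cdf`) is used instead of the table value 2.58. The helper name is illustrative.

```python
import math
from statistics import NormalDist

def sample_size_mean(z: float, sigma: float, E: float) -> int:
    """n = Z^2 * sigma^2 / E^2, rounded up so the precision target is still met."""
    return math.ceil((z * sigma / E) ** 2)

print(sample_size_mean(2.58, 500, 100))          # slide's rounded Z value
z_exact = NormalDist().inv_cdf(1 - 0.01 / 2)     # about 2.5758 for 99% confidence
print(round(z_exact, 4), sample_size_mean(z_exact, 500, 100))
```

With Z = 2.58 the answer is 167; with the more precise Z the requirement drops to 166, so the rounding of Z matters more than the rounding of n here.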
Sample Size Determination: Population Proportion
We know
$Z = \dfrac{p - P}{\sqrt{p(1-p)/n}}$
Sampling error $E = p - P$; squaring both sides and simplifying, we get:
$n = \dfrac{Z^2 p(1-p)}{E^2}$
where Z is the value corresponding to an area of $(1-\alpha)/2$ from the mean of the standard normal distribution
Example
A company manufacturing sports goods wants to estimate the proportion of cricket players among high school students in India. The company wants the estimate to be within ± 0.03 with a confidence level of 99%. A pilot study done earlier reveals that out of 80 high school students, 36 students play cricket. What should the sample size be?
Solution
p = 36/80 = 0.45
Applying the formula
n = (2.58² × 0.45 × (1 − 0.45)) / 0.03² = 1830.51
n = 1831 (rounded up)
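The proportion version differs from the mean version only in replacing σ² with p(1 − p). A sketch using the pilot-study estimate p = 36/80 (the helper name is illustrative):

```python
import math

def sample_size_proportion(z: float, p: float, E: float) -> int:
    """n = Z^2 * p * (1 - p) / E^2, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / E ** 2)

p_pilot = 36 / 80     # pilot study: 36 of 80 students play cricket
print(sample_size_proportion(2.58, p_pilot, 0.03))
```

When no pilot estimate is available, p = 0.5 is the conservative choice, because p(1 − p) is largest there.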
Data Collection Methods
Primary Data Collection
Secondary Data Collection
Data Collection Methods
Primary Data
Observation method
Interview method
Questionnaires
Warranty cards
Mechanical devices
Secondary Data
Agency
Published material etc.
Scales of Measurement
Nominal scale: groups or classes
Gender
Ordinal scale: order matters
Ranks (top ten videos)
Interval scale: differences (distances) matter; the zero point is arbitrary
Temperature (°F, °C)
Ratio scale: ratios matter; there is a natural zero
Salaries
Likert scale
Sample Rating Scales
Simple category scale: (data: nominal)
Ex:
I plan to purchase a laptop in next twelve months
Yes
No
Sample Rating Scales
Multiple choice Single response scale (data: nominal)
Ex:
What newspaper do you read most often?
TOI
DH
The Hindu
Mint
Others (specify:_________)
Sample Rating Scales
Likert Scale (data: interval)
Ex:
The internet is superior to traditional libraries for comprehensive searches
Strongly Agree | Agree | Neutral | Disagree | Strongly Disagree
Sample Rating Scales
Semantic Differential Scale (data: interval)
Ex:
Lands' End catalog
Fast ___: ___ : ___ : ___ : ___ : ___ : ___ : Slow
Sample Rating Scales
Numerical Scale (data: ordinal or interval)
Ex:
Extremely Favourable 5 4 3 2 1 Extremely Unfavourable
Employees' cooperation in teams ___
Employees' knowledge of task ___
Employees' planning effectiveness ___
Sample Rating Scales
Multiple rating list scale (data: interval)
Ex:
Please indicate how important or unimportant each service characteristic is:
Important Unimportant
Fast reliable repair 7 6 5 4 3 2 1
Service at my location 7 6 5 4 3 2 1
Maintenance by manufacturer 7 6 5 4 3 2 1
Knowledgeable technicians 7 6 5 4 3 2 1
Service contract after warranty 7 6 5 4 3 2 1
Sample Rating Scales
Constant-Sum Scale (data: ratio)
Ex:
Taking all the supplier characteristics we have just discussed and now considering cost, what is their relative importance to you? (Divide 100 units between the following.)
Being one of the lowest cost suppliers
All other aspects of supplier performance
Sum 100
Sample Rating Scales
Stapel Scale (data: ordinal or interval)
Ex:
Company Name
Technology Leader: +3 +2 +1 -1 -2 -3
Existing Products: +3 +2 +1 -1 -2 -3
Reputation: +3 +2 +1 -1 -2 -3
Sample Rating Scales
Graphic rating scale (data: ordinal or interval or ratio)
Ex:
How likely are you to recommend complete care to others?
Very Likely Very Unlikely
Data Analysis
The cure for boredom is curiosity. There is no cure for curiosity.
- Dorothy Parker
Things aren't always what we think!
Six blind men go to observe an elephant. One feels the side and thinks the elephant is like a wall. One feels the tusk and thinks the elephant is like a spear. One touches the squirming trunk and thinks the elephant is like a snake. One feels the knee and thinks the elephant is like a tree. One touches the ear and thinks the elephant is like a fan. One grasps the tail and thinks it is like a rope. They argue long and loud, and though each was partly in the right, all were in the wrong.
For a detailed version of this fable see:
http://www.wordinfo.info/words/index/info/view_unit/1/?letter=B&spage=3
Blind men and an elephant
- Indian fable
Stages in Data Analysis
Editing and coding
Data entry (keyboarding)
Error checking and verification
Data analysis: descriptive analysis, univariate analysis, bivariate analysis, multivariate analysis
Interpretation
Descriptive Analysis Techniques
Count (frequencies)
Percentage
Mean
Mode
Median
Range
Standard deviation
Variance
Ranking
Frequency Distributions
To what extent did you increase your skills in putting together a household budget?
Response | A lot | Some | A little | Not at all
Women (N=30) | 14 | 9 | 5 | 2
Uni-variate Analysis: the analysis of a single variable, for purposes of description (examples: frequency distribution, averages and measures of dispersion)
Percentage Distributions
To what extent did you increase your skills in putting together a household budget?
Response | A lot | Some | A little | Not at all
Women (N=30) | 46.7% | 30.0% | 16.7% | 6.7%
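Converting the frequency counts into a percentage distribution is a one-line division per category. A sketch using the counts from the previous slide; note that rounding each percentage can make the displayed total differ slightly from 100%.

```python
# Responses of the N = 30 women from the frequency distribution slide
counts = {"A lot": 14, "Some": 9, "A little": 5, "Not at all": 2}
total = sum(counts.values())

percentages = {answer: 100 * c / total for answer, c in counts.items()}
for answer, pct in percentages.items():
    print(f"{answer:<11} {counts[answer]:>3}  {pct:5.1f}%")
```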
Graphing Frequency Data
How did you first hear about the web site?
Source | N | Percent
Court Referral | 10 | 24.4%
Social Worker | 5 | 12.2%
Friend | 5 | 12.2%
Web Search Engine | 8 | 19.5%
Librarian | 9 | 22.0%
Newspaper Story | 3 | 7.3%
Other | 1 | 2.4%
Total | 41 | 100.0%
[Figure: chart of "How did you first hear about the web-site?" with categories Court Referral, Social Worker, Friend or Acquaintance, Web Search Engine, Librarian, Newspaper Story and Other]
Means and Medians
Grades as listed: History 95, English 96, Biology 93, Latin 92, Math 98, Music 94, Gym 40
Grades sorted: Math 98, English 96, History 95, Music 94, Biology 93, Latin 92, Gym 40
Mean = 87
Median = 94
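Note
Data set 1: 40 50 55 94 100 100 100; Mean = 77
Data set 2: 40 92 93 94 95 96 98; Mean = 87
Both data sets have the same median (94) but very different means: the mean is sensitive to extreme values, while the median is not.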
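A quick check of the two data sets with Python's `statistics` module shows why the median is the more robust summary here: both sets share the median 94, while the means differ by about ten points.

```python
from statistics import mean, median

grades = {"History": 95, "English": 96, "Biology": 93, "Latin": 92,
          "Math": 98, "Music": 94, "Gym": 40}
data_set_1 = [40, 50, 55, 94, 100, 100, 100]   # first data set on the Note slide

for label, values in (("grades (data set 2)", list(grades.values())),
                      ("data set 1", data_set_1)):
    print(f"{label}: mean = {mean(values):.2f}, median = {median(values)}")
```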
Histograms
[Figure: two histograms of frequency against score (60 to 100)]
Cross Tabulations
Program Type | Area of Inquiry | Outcome
Web site | Employment law | Satisfied
I & R Line | Family law | Not satisfied
Law clinic | Immigration | Pending
Web site | Immigration | Satisfied
I & R Line | Immigration | Satisfied
I & R Line | Family law | Not satisfied
Web site | Employment law | Not satisfied
Law clinic | Other | Satisfied
I & R Line | Other | Not satisfied
I & R Line | Other | Satisfied
Law clinic | Employment law | Satisfied
Web site | Family law | Satisfied
Law clinic | Family law | Satisfied
Web site | Immigration | Not satisfied
Law clinic | Immigration | Not satisfied
I & R Line | Family law | Satisfied
I & R Line | Immigration | Not satisfied
I & R Line | Employment law | Not satisfied
Law clinic | Other | Pending
Count of Outcome
Program Type | Not satisfied | Pending | Satisfied | Grand Total
I & R Line | 7 | 0 | 5 | 12
Law clinic | 1 | 3 | 7 | 11
Web site | 6 | 0 | 5 | 11
Grand Total | 14 | 3 | 17 | 34
Percentage of Outcome (row percentages)
Program Type | Not satisfied | Pending | Satisfied | Grand Total
I & R Line | 58% | 0% | 42% | 100%
Law clinic | 9% | 27% | 64% | 100%
Web site | 55% | 0% | 45% | 100%
Grand Total | 41.18% | 8.82% | 50.00% | 100.00%
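A cross tabulation is just a count of records per (row category, column category) pair, followed by optional row percentages. The sketch below applies a tally function to the 19 records listed on the slide and then computes row percentages from the summary table. The listed records appear to be only an excerpt (the summary counts 34), so the two tables are not expected to match; the helper names are illustrative.

```python
from collections import Counter, defaultdict

# Records listed on the slide: (program type, area of inquiry, outcome)
records = [
    ("Web site", "Employment law", "Satisfied"),
    ("I & R Line", "Family law", "Not satisfied"),
    ("Law clinic", "Immigration", "Pending"),
    ("Web site", "Immigration", "Satisfied"),
    ("I & R Line", "Immigration", "Satisfied"),
    ("I & R Line", "Family law", "Not satisfied"),
    ("Web site", "Employment law", "Not satisfied"),
    ("Law clinic", "Other", "Satisfied"),
    ("I & R Line", "Other", "Not satisfied"),
    ("I & R Line", "Other", "Satisfied"),
    ("Law clinic", "Employment law", "Satisfied"),
    ("Web site", "Family law", "Satisfied"),
    ("Law clinic", "Family law", "Satisfied"),
    ("Web site", "Immigration", "Not satisfied"),
    ("Law clinic", "Immigration", "Not satisfied"),
    ("I & R Line", "Family law", "Satisfied"),
    ("I & R Line", "Immigration", "Not satisfied"),
    ("I & R Line", "Employment law", "Not satisfied"),
    ("Law clinic", "Other", "Pending"),
]

def crosstab(rows, row_key=0, col_key=2):
    """Count records for each (row category, column category) pair."""
    table = defaultdict(Counter)
    for r in rows:
        table[r[row_key]][r[col_key]] += 1
    return table

def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    return {row: {col: 100 * c / sum(cols.values()) for col, c in cols.items()}
            for row, cols in table.items()}

excerpt = crosstab(records)
for program in sorted(excerpt):
    print(program, dict(excerpt[program]))

# Full summary counts from the slide (N = 34)
full = {
    "I & R Line": {"Not satisfied": 7, "Pending": 0, "Satisfied": 5},
    "Law clinic": {"Not satisfied": 1, "Pending": 3, "Satisfied": 7},
    "Web site": {"Not satisfied": 6, "Pending": 0, "Satisfied": 5},
}
for program, pcts in row_percentages(full).items():
    print(program, {k: f"{v:.0f}%" for k, v in pcts.items()})
```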
Graphing comparisons
[Figure: Satisfaction with Services; satisfaction score (0 to 40) by clinic name, A to E]
[Figure: Satisfaction with Services; satisfaction score (0 to 16) by clinic, A to E, with separate bars for Staff, Advice and Facility]
[Figure: Satisfaction with Services; satisfaction score (0 to 16) by satisfaction component (Staff, Advice, Facility), with separate bars for clinics A to E]
Bi-variate Analysis
The analysis of two variables simultaneously for determining the empirical relationship between them
Y = f (X)
Few Techniques Available
Correlation
Regression
Chi-square test and Cramér's V
Hypothesis test for two population means/proportions
Paired t-tests comparing two groups
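Measure of Correlation: Coefficient of Correlation
Symbol: r
Range: −1 to 1
Sign: type (direction) of correlation
Value: degree (strength) of correlation
Examples:
r = 0.6: moderate positive correlation
r = −0.82: strong negative correlation
r = 0: no linear correlation
(r is not a percentage; r² gives the proportion of variance explained, e.g. r = 0.6 gives r² = 0.36)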
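As a sketch of how r is computed, the function below implements the Pearson formula r = Sxy / √(Sxx · Syy) directly, without any library. It is applied to the six (x, y) pairs used in this session's error-analysis example, whose regression output reports Multiple R = 0.594 with a negative slope, so r should come out near −0.594.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# x, y data from the error-analysis example in this session
x = [65, 8, 89, 88, 50, 73]
y = [69, 78, 8, 21, 24, 72]
r = pearson_r(x, y)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```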
Regression
Regression helps
To identify the exact form of the relationship
To model output in terms of input or process variables
y = a + b x
Examples:
Yield = 5 + 3 x Time
Y = 2 - 5x
Coefficient of Determination
Measure of the degree of relationship
Symbol: R²
Range of R²: 0 to 1
As a rule of thumb, if R² > 0.6, the model is reasonably good
Error or Residual Analysis
Root Mean Square Error and Mean Square Error of Prediction (MSEP)
Regression Statistics
Multiple R | 0.594159006
R Square | 0.353024925
Adjusted R Square | 0.191281156
Standard Error | 27.80337004
Observations | 6
Coefficients
Intercept | 83.00449781
x | -0.605970474
Data
x | y
65 | 69
8 | 78
89 | 8
88 | 21
50 | 24
73 | 72
Root Mean Square Error:
Predicted y = 83.0045 − 0.6060x
Error = y − predicted y
x | y | Predicted y | Error | Error²
65 | 69 | 43.62 | 25.38 | 644.33
8 | 78 | 78.16 | -0.16 | 0.02
89 | 8 | 29.07 | -21.07 | 444.08
88 | 21 | 29.68 | -8.68 | 75.33
50 | 24 | 52.71 | -28.71 | 824.03
73 | 72 | 38.77 | 33.23 | 1104.32
Sum of squared errors = 3092.11
Mean Square Error = 3092.11 / 6 = 515.35
Root Mean Square Error = √515.35 = 22.70
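The whole table can be regenerated from the raw (x, y) pairs. This sketch fits the least-squares line y = a + bx with the closed-form slope and intercept, then computes the residuals, the sum of squared errors, MSE (dividing by n = 6, as the slide does), RMSE and R². The values should agree with the regression output and the table above.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

x = [65, 8, 89, 88, 50, 73]
y = [69, 78, 8, 21, 24, 72]
a, b = fit_line(x, y)
predicted = [a + b * xi for xi in x]
errors = [yi - pi for yi, pi in zip(y, predicted)]   # error = y - predicted y

sse = sum(e * e for e in errors)
mse = sse / len(x)                 # the slide divides by n = 6
rmse = math.sqrt(mse)
mean_y = sum(y) / len(y)
r_squared = 1 - sse / sum((yi - mean_y) ** 2 for yi in y)
print(f"a = {a:.4f}, b = {b:.4f}, SSE = {sse:.2f}, MSE = {mse:.2f}, "
      f"RMSE = {rmse:.2f}, R^2 = {r_squared:.4f}")
```

Dividing SSE by n − 2 = 4 instead of n gives the regression output's "Standard Error" of about 27.80, which is why the two error figures differ.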
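MSEP measures the difference between the observed values $Y_i$ and the model-predicted values $f(X_i)$ for n data sets:
$MSEP = \dfrac{1}{n}\sum_{i=1}^{n}\left(Y_i - f(X_i)\right)^2$
The MSEP has been decomposed into mean bias ($U_M$), slope bias ($U_R$) and random error ($U_D$):
$U_M = \dfrac{\left(\overline{f(X)} - \bar{Y}\right)^2}{MSEP}$
$U_R = \dfrac{(b - 1)^2 S_{f(X)}^2}{MSEP}$
$U_D = \dfrac{(1 - r^2) S_Y^2}{MSEP}$
where b is the slope and r the correlation coefficient of the regression of $Y_i$ on $f(X_i)$, and S denotes a standard deviation computed with divisor n, so that $U_M + U_R + U_D = 1$.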
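A sketch of the decomposition, written directly from the three formulas with divide-by-n variances so that the three fractions add up to 1. It is applied to the predicted values from the RMSE table and to a hypothetical model that over-predicts every point by 10; the shift should show up almost entirely as mean bias U_M.

```python
def msep_decomposition(observed, predicted):
    """Split MSEP into mean bias (U_M), slope bias (U_R) and random error (U_D).

    Uses divide-by-n variances so that U_M + U_R + U_D = 1.
    b and r come from regressing the observed values on the predictions.
    """
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    msep = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    var_o = sum((o - mo) ** 2 for o in observed) / n
    var_p = sum((p - mp) ** 2 for p in predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted)) / n
    b = cov / var_p                          # slope of observed on predicted
    r_squared = cov ** 2 / (var_o * var_p)
    u_m = (mp - mo) ** 2 / msep
    u_r = (b - 1) ** 2 * var_p / msep
    u_d = (1 - r_squared) * var_o / msep
    return msep, u_m, u_r, u_d

observed = [69, 78, 8, 21, 24, 72]
predicted = [43.62, 78.16, 29.07, 29.68, 52.71, 38.77]   # predicted y from the RMSE table
biased = [p + 10 for p in predicted]                      # hypothetical model, 10 units too high

for label, preds in (("fitted line", predicted), ("biased model", biased)):
    msep, u_m, u_r, u_d = msep_decomposition(observed, preds)
    print(f"{label}: MSEP = {msep:.2f}, U_M = {u_m:.3f}, U_R = {u_r:.3f}, U_D = {u_d:.3f}")
```

For a least-squares fit U_M and U_R are close to zero, so nearly all of the error is random (U_D near 1).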
Logistic Regression
Objective
To develop a mathematical model for an attribute or response metric (Y) in terms of other available attributes (Xs).
When to Use
Xs: continuous
Y: discrete (binary)
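To make the idea concrete, here is a minimal sketch of logistic regression with one continuous X and a binary Y: it models P(Y = 1 | x) = 1 / (1 + e^−(b0 + b1·x)) and fits b0 and b1 by plain batch gradient ascent on the log-likelihood. The fitting method, learning rate, epoch count and the eight data points are all illustrative assumptions; statistical packages normally use Newton-type methods and also report standard errors.

```python
import math

def sigmoid(t):
    """Logistic function mapping any real number to (0, 1)."""
    return 1 / (1 + math.exp(-t))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(Y=1 | x) = sigmoid(b0 + b1*x) by batch gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)   # per-point gradient term
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical data: continuous X and binary Y (not perfectly separable)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
for x in (1, 4.5, 8):
    print(f"x = {x}: P(Y = 1) = {sigmoid(b0 + b1 * x):.3f}")
```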
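Hypothesis Test for Difference between Two Means
Objective
To test hypotheses that compare the population means of interest for two separate populations (independent samples)
Test statistic (large sample):
$Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
Test statistic (small sample):
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
where S is the pooled standard deviation, $S^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$, with $n_1 + n_2 - 2$ degrees of freedom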
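Both test statistics take only a few lines. In this sketch the large-sample Z uses known population standard deviations, and the small-sample t uses the pooled standard deviation, which assumes equal population variances. The function names and all input numbers are hypothetical; the resulting statistic would still be compared with a Z or t critical value.

```python
import math
from statistics import mean, variance

def z_two_means(x1, x2, sigma1, sigma2, n1, n2):
    """Large-sample Z statistic for the difference between two means."""
    return (x1 - x2) / math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

def pooled_t(sample1, sample2):
    """Small-sample t statistic with pooled standard deviation S.

    Returns (t, degrees of freedom n1 + n2 - 2); assumes equal population variances.
    """
    n1, n2 = len(sample1), len(sample2)
    s2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2)) / (math.sqrt(s2) * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

z = z_two_means(52, 50, 10, 10, 100, 100)             # hypothetical large samples
t, df = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])    # hypothetical small samples
print(f"Z = {z:.3f}; t = {t:.3f} with {df} degrees of freedom")
```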
Chi-Square Test
Objective:
To test whether two variables with frequency data are related
Usage:
When both the variables (X and Y) are categorical (grouped)
Cramér's V: to quantify the strength of the relationship between X and Y
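A sketch of both quantities for a table of counts: the Pearson chi-square statistic Σ(O − E)²/E with expected counts E = row total × column total / N, its degrees of freedom, and Cramér's V = √(χ² / (N · (min(rows, columns) − 1))). It reuses the program-type by outcome counts from the cross-tabulation slide. Several expected counts there are below 5, so the chi-square approximation would be doubtful for a formal test; the numbers are for illustration only, and a p-value would come from the chi-square distribution with the printed degrees of freedom.

```python
import math

def chi_square_cramers_v(table):
    """Pearson chi-square statistic, degrees of freedom and Cramér's V for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    r, c = len(table), len(table[0])
    df = (r - 1) * (c - 1)
    v = math.sqrt(chi2 / (n * (min(r, c) - 1)))
    return chi2, df, v

# Program type x outcome counts from the cross-tabulation slide
# columns: Not satisfied, Pending, Satisfied
counts = [
    [7, 0, 5],   # I & R Line
    [1, 3, 7],   # Law clinic
    [6, 0, 5],   # Web site
]
chi2, df, v = chi_square_cramers_v(counts)
print(f"chi-square = {chi2:.3f}, df = {df}, Cramer's V = {v:.3f}")
```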
Multivariate Analysis
The analysis of the simultaneous relationships among several variables
Analyse the data covariance structure to understand it or to reduce the data dimension
Assign observations to groups
Explore relationships among categorical variables
Few Techniques Available
Multiple Linear Regression
Cluster Analysis
Factor Analysis
ANOVA
MANOVA
Conjoint Analysis
Optimisation Techniques
Multiple Regression
To model the output variable y in terms of two or more variables
General form:
$Y = a + b_1X_1 + b_2X_2 + \dots + b_kX_k$
Two-variable case:
$Y = a + b_1X_1 + b_2X_2$
Adjusted R²
As a rule of thumb, if adjusted R² > 0.6, the model is reasonably good
P value from the coefficient table
If the p value < 0.05, the corresponding term is statistically significant (at the 5% level) in explaining the output
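A sketch of how the coefficients a, b1, ..., bk can be obtained: build the normal equations (XᵀX)β = Xᵀy and solve them by Gaussian elimination with partial pivoting. To make the result checkable, the six data points are hypothetical and noise-free, generated from Y = 44 + 0.19X1 − 2.55X2 (the fitted model quoted on the residual-plot slide), so the fit should recover those coefficients. Real work would normally use a statistics package, which also reports adjusted R² and p values.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_multiple(X, y):
    """Least squares coefficients [a, b1, ..., bk] from the normal equations (X'X) beta = X'y."""
    rows = [[1.0] + list(xs) for xs in X]      # leading 1 for the intercept a
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)

# Hypothetical, noise-free data generated from Y = 44 + 0.19*X1 - 2.55*X2
X = [(10, 2), (20, 5), (30, 1), (40, 7), (50, 3), (60, 6)]
y = [44 + 0.19 * x1 - 2.55 * x2 for x1, x2 in X]
coefs = fit_multiple(X, y)
print("a = {:.4f}, b1 = {:.4f}, b2 = {:.4f}".format(*coefs))
```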
Residual Plots: Error Analysis
Y = 44+0.19X1-2.55X2
Evidence of a strong Shift to Shift Effect
Main Effects Plot
[Figure: Main Effects Plot - Data Means for Impurity; mean impurity (about 0.018 to 0.038) for Day (1, 2), Shift (1, 2) and Time (1 to 4)]
Validation Tests for Model Adequacy
Mean Square Error (MSE) for checking model precision:
$MSE = \dfrac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$
Mean Bias (MB) for checking model accuracy:
$MB = \dfrac{1}{n}\sum_{i=1}^{n}\left(Y_i - f(X_i)\right)$
where $f(X_i) = \hat{Y}_i$ is the ith model prediction
Factor Analysis
[Figure: Loading Plot of Pop, ..., Home; second-factor loadings (−0.50 to 0.75) against first-factor loadings (−0.4 to 1.0) for the variables Pop, School, Employ, Health and Home]
Interpret each factor through the variables that load on it and the sign (+ or −) of each loading. In this way the number of variables can be reduced to a few factors.
Predictors Selection
P = 0.001
Classification Methods
Example:
Attribute 1: x1
Attribute 2: x2
Label: y, with classes y1 (red) and y2 (blue)
[Figure: scatter plot of x2 (20 to 40) against x1 (10 to 20), with points coloured by class]
[Figure: decision tree that first splits on x2 (> 35: y1; < 28: y2) and then, for the remaining points, on x1 (> 15.5: y1; < 15.5: y2)]
CLASSIFICATION METHODS
Example: Rules
Attribute 1: x1
Attribute 2: x2
Label: y, with classes y1 (red) and y2 (blue)
[Figure: the same decision tree, splitting on x2 at 35 and 28 and then on x1 at 15.5]
If x2 > 35, then y = y1
If x2 < 28, then y = y2
If 28 ≤ x2 ≤ 35 and x1 > 15.5, then y = y1
If 28 ≤ x2 ≤ 35 and x1 < 15.5, then y = y2
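The four rules can be written as a single function, which makes it easy to classify new points. The slide does not say which branch the boundary values (x2 exactly 28 or 35, x1 exactly 15.5) belong to, so the handling of those cases below is an assumption, and the test points are hypothetical.

```python
def classify(x1: float, x2: float) -> str:
    """Decision rules read off the tree on the slide.

    Boundary handling (x2 exactly 28 or 35, x1 exactly 15.5) is an assumption;
    the slide does not say which branch those values fall in.
    """
    if x2 > 35:
        return "y1"
    if x2 < 28:
        return "y2"
    # 28 <= x2 <= 35: split on attribute x1
    return "y1" if x1 > 15.5 else "y2"

for point in [(12, 38), (18, 24), (17, 30), (13, 30)]:
    print(point, classify(*point))
```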
Cluster Analysis
Objective
To classify the records or items into a smaller number of groups based
on the values of available attributes.
When to Use
When there is no Y attribute
All attributes are considered as Xs only
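The slide does not name a clustering algorithm (the next slide's title mentions k-nearest neighbours, which is usually a classification method), so this sketch swaps in k-means, one common choice when there is no Y attribute. Each point is assigned to its nearest centroid, centroids move to the mean of their members, and the loop stops when they no longer change. The deterministic start (points spread through the input order) and the six two-attribute points are illustrative assumptions.

```python
import math

def k_means(points, k, iterations=100):
    """Plain k-means: assign each point to the nearest centroid, then move centroids."""
    # Deterministic start: k points spread evenly through the input order
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # keep an empty cluster's centroid where it was
            new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members))
                                 if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical records with two attributes (all treated as Xs)
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
print(labels, centroids)
```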
K-Nearest Neighbors Cluster Analysis
[Figure: two scatter plots of acceleration (m/s²) against weight (kg)]
ANOVA or Experimental Design
Sometimes an investigator would like to compare more than two population means in a problem situation
ANOVA decomposes the total variation into components of variation
[Figure: distributions of Population 1, Population 2 and Population 3, each with its own mean]
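A sketch of the one-way ANOVA decomposition: the total sum of squares splits into a between-group part and a within-group part, and their mean squares form the F ratio. The three small samples are hypothetical and chosen so that the arithmetic is easy to follow (SST = 60, SSB = 54, SSW = 6, F = 27). The F value would then be compared with an F distribution on (k − 1, N − k) degrees of freedom.

```python
def one_way_anova(groups):
    """Decompose total variation into between- and within-group parts; return the F ratio."""
    all_values = [v for g in groups for v in g]
    N, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / N
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    f_ratio = (ss_between / (k - 1)) / (ss_within / (N - k))
    return ss_total, ss_between, ss_within, f_ratio

# Hypothetical samples from Population 1, 2 and 3
sst, ssb, ssw, f = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"SST = {sst}, SSB = {ssb}, SSW = {ssw}, F = {f}")
```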
MANOVA and Conjoint Analysis
MANOVA is similar to ANOVA, with the added ability to handle several dependent variables
The most common applications of conjoint analysis are market research and product development, for making trade-offs
Optimisation Methods
Objective
To identify the best values of a set of variables (Xs) which will optimize an objective function satisfying a given set of constraints
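For n variables and m constraints:
Max / Min $Z = C_1x_1 + C_2x_2 + \dots + C_nx_n$
subject to
$a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n \;(\le, =, \ge)\; b_1$
$a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n \;(\le, =, \ge)\; b_2$
$\vdots$
$a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n \;(\le, =, \ge)\; b_m$
and $x_j \ge 0,\ j = 1, 2, \dots, n$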
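For two decision variables, a bounded linear programme can be solved by checking the corner points of the feasible region, because an optimum (if one exists) lies at a vertex. This brute-force sketch intersects every pair of constraint boundary lines, keeps the feasible intersections and returns the best objective value, using exact fractions. It handles only "≤" constraints with x1, x2 ≥ 0 and does not detect unbounded problems; the simplex method or an LP library would be used in practice. The example problem is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def solve_lp_2d(c, constraints):
    """Maximise c[0]*x1 + c[1]*x2 subject to a1*x1 + a2*x2 <= b and x1, x2 >= 0.

    Brute force over corner points: intersect every pair of boundary lines,
    keep the feasible intersections, and return the best one as (Z, x1, x2).
    """
    lines = [(Fraction(a1), Fraction(a2), Fraction(b)) for a1, a2, b in constraints]
    lines += [(Fraction(-1), Fraction(0), Fraction(0)),    # -x1 <= 0
              (Fraction(0), Fraction(-1), Fraction(0))]    # -x2 <= 0
    best = None
    for (a1, a2, b), (d1, d2, e) in combinations(lines, 2):
        det = a1 * d2 - a2 * d1
        if det == 0:
            continue                                       # parallel lines never meet
        x1 = (b * d2 - a2 * e) / det                       # Cramer's rule for 2 x 2 systems
        x2 = (a1 * e - b * d1) / det
        if all(p * x1 + q * x2 <= r for p, q, r in lines):
            z = c[0] * x1 + c[1] * x2
            if best is None or z > best[0]:
                best = (z, x1, x2)
    return best

# Hypothetical problem: max Z = 3x1 + 5x2 s.t. x1 <= 4, 2x2 <= 12, 3x1 + 2x2 <= 18
z, x1, x2 = solve_lp_2d((3, 5), [(1, 0, 4), (0, 2, 12), (3, 2, 18)])
print(f"Z = {z} at x1 = {x1}, x2 = {x2}")
```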
You never know what is enough unless you know what is more than enough
- William Blake
Session Summary (1/2)
Statistical Techniques and Tools:
Completely dependent on the type of data used (continuous or discrete)
Normal Distribution:
Describes many natural phenomena and industrial and scientific situations. A normal curve is a graphical representation used to describe the normal distribution
Data analysis is carried out in two distinct environments:
Result of a special study or experiment (experimental)
By-product of some operations (observational)
Session Summary (2/2)
Uni-variate Analysis:
The analysis of a single variable, for purposes of description (examples: frequency distribution, averages and measures of dispersion)
Bi-variate Analysis:
The analysis of two variables simultaneously for determining the empirical relationship between independent and dependent variables
Multi-variate Analysis:
The analysis of the simultaneous relationships among several variables