Session - 6 Statistics and Data Analysis

DESCRIPTION

Active technique [5] [6] [7]: it is based on the same principle as passive cancellation, namely the creation of an appropriate destructive echo that cancels the real echo of the target at the radar. The target emits electromagnetic energy synchronized with the received radar energy to minimize the reflected signal.

1.2 Benefits of high manoeuvrability in combat

Manoeuvrability can be defined as the rate at which an aircraft can change speed, altitude, and direction in any desired combination. Another important characteristic of a fighter aircraft is its agility, which measures the rate of change of manoeuvrability with respect to time. Erich Hartmann, the world's top fighter pilot in WW II with 352 confirmed air-to-air kills, had a formula: See – Decide – Attack – Coffee Break [8]. The whole purpose of air combat is to be superior in agility in order to evade a pursuer and corner a quarry. It relies on offensive and defensive basic fighter manoeuvres to gain an advantage over an opponent, and this can be achieved only with high manoeuvrability. In any combat manoeuvre, energy is quickly depleted. Instead of using kinetic energy (speed) to chase the enemy or potential energy (altitude) to get away from the enemy, a highly manoeuvrable aircraft can use its excess power, (T-D)*V/W, to engage the enemy, kill the targets, disengage and quickly get into a safe position. The initial advantage goes to the pilot who enters combat with the most aircraft energy, in terms of speed (kinetic energy) and/or altitude (potential energy), and who can quickly manoeuvre over the enemy. Also, when locked by the enemy missile radar or IR trackers, an agile aircraft (one that can manoeuvre quickly) can survive against missiles that are not as agile as the aircraft itself.
1.3 Comparison of the benefits of stealth over manoeuvrability in combat

The stand taken here is that, in aerial combat, a highly manoeuvrable fighter aircraft is preferred over a stealth fighter. The logical arguments for this stand are discussed below.

A low radar signature (stealth aircraft) means that the target can be detected and tracked by radar only at a shorter distance. It does not mean that the aircraft disappears completely from the radar screens. Usually in air-to-air combat, the enemy fires a missile at the target only from close range in order to achieve a 100% kill rate. But the missile exhaust is a brilliant flame and, being visible from a long distance, it can be visually tracked by the target pilot. Because missiles are also less manoeuvrable than the fighter itself, they have difficulty following an agile fighter and too often miss an agile target. The result is usually a close-range dogfight where the cannon/gun is the proven weapon; at such ranges, the missile is unusable and hence harmless. Stealth aircraft have higher flyaway and maintenance costs, and they have significant operational limitations due to the specific aircraft shape imposed and the materials used. Also, due to their limited ability to carry fuel and weapons internally, they become inefficient in an air-to-air combat scenario. Further, having realized the capabilities of stealth aircraft, many countries have been developing anti-stealth technologies such as multistatic radars, very low frequency radars, over-the-horizon radars and sensitive IR sensor systems. Be it a highly manoeuvrable fighter or a stealth aircraft, when under attack the target always tries to turn into the enemy as quickly as possible, because turning away makes it easier for the enemy to kill the target. This turn has to be made at a very fast rate. If the aircraft is not agile, it falls prey to the enemy.
Most of the aerial attacks are ambushed ones, where manoeuvrability is the key. Superior pilot skill to tackle the enemy is the most important factor in any aerial combat. To achieve this, it is b

  • M. S. Ramaiah University of Applied Sciences

    1

    Data Analysis

    Session Speaker

    K.M. Sharath Kumar

    Session 6

  • M. S. Ramaiah University of Applied Sciences

    2

    Session Objectives

    - To explain the relevance of data analysis for carrying out research

    - To explore different types of data analysis techniques for effective interpretation

    - To critique and recommend appropriate exploratory data analysis techniques for a problem

  • M. S. Ramaiah University of Applied Sciences

    3

    Session Outline

    Sampling Design

    Data Collection Methods

    Quantitative and Qualitative Data Analysis

    Stages in Data Analysis

    Review of Techniques

    Error Analysis

  • M. S. Ramaiah University of Applied Sciences

    4

  • M. S. Ramaiah University of Applied Sciences

    5

    One Variant

    6,200 Distinct Parts

    Imported from 17 Countries

    From 240 Suppliers

    Assembled in 1 Plant

    Within few minutes

    Exported to 34 Countries

    Same day

    Without becoming inventory!

    Suzuki Grand Vitara

  • M. S. Ramaiah University of Applied Sciences

    6

    The secret of success is to know something nobody else knows

    - Aristotle Onassis

  • M. S. Ramaiah University of Applied Sciences

    7

    Turn Data into Insight

    Insight into Action

    Action into Tangible Results

    - Accenture

  • M. S. Ramaiah University of Applied Sciences

    8

    Data Analysis (1/2)

    Explore relationships among the variables

    Partition the total variability (by statement / variance component analysis)

    Handle noisy data appropriately

    Questions to be answered:

    Is the process stable?

    Is the process capable of meeting specifications?

    What are the major sources of variation (noise, etc)?

    Listen to what the data is saying

  • M. S. Ramaiah University of Applied Sciences

    9

    Data Analysis (2/2)

    Data analysis is carried out in two distinct environments:

    As the result of a special study or experiment (experimental)

    As a by-product of some operation (observational)

    Experimental Studies

    Here we compare various conditions and try to determine which condition is better. We have a finite amount of data and carry out a one-time analysis.

    Observational Studies

    Here we get data from a steady-state process and try to find out whether any unplanned change has occurred. Generally we perform a sequential analysis using a continuing stream of data.

  • M. S. Ramaiah University of Applied Sciences

    10

    Classification of Data Analysis: Quantitative vs. Qualitative

    Quantitative

    - Explanation through numbers
    - Objective
    - Deductive reasoning
    - Predefined variables and measurement
    - Data collection before analysis
    - Cause and effect relationships

    Qualitative

    - Explanation through words
    - Subjective
    - Inductive reasoning
    - Creativity, extraneous variables
    - Data collection and analysis intertwined
    - Description, meaning

  • M. S. Ramaiah University of Applied Sciences

    11

    Ambushed Everywhere

  • M. S. Ramaiah University of Applied Sciences

    12

    Data analysis should be:

    Supported by data

    Shown in graphical and statistical format

    Not based on intuition

    Make sense from an engineering standpoint

    Data and Hard Evidence!!

  • M. S. Ramaiah University of Applied Sciences

    13

    Key Components of a Data Analysis Plan

    Purpose of the evaluation

    Questions

    What you hope to learn from the question

    Analysis technique

    How data will be presented

  • M. S. Ramaiah University of Applied Sciences

    14

    Types of Data

    Continuous Data

    Discrete Data

  • M. S. Ramaiah University of Applied Sciences

    15

    Continuous Data

    Data generated by

    Physically measuring the characteristic

    Generally using an instrument

    Assigning a unique value to each item

    Examples:

    Time to receive a shipment, Time spent per page, Time to activate, CPU Speed, Total Minutes per Incident (TMPI), etc.

    Hardness, Strength, Weight, Diameter, etc.

  • M. S. Ramaiah University of Applied Sciences

    16

    Discrete Data

    Data generated by

    Classifying the items into different groups based on some criteria

    No physical measurement is involved

    Examples:

    Sex, Shade variation, Surface defects, etc.

    % of visitors signing in for AOL messenger per day, Number of Recharges per Month, Number of Operating Systems, % Escalations, etc.

  • M. S. Ramaiah University of Applied Sciences

    17

    Continuous Data: Example (time spent per page visit, in minutes)

    SL No.   Data     SL No.   Data
    1        0.98     11       1.02
    2        1.03     12       0.98
    3        1.00     13       1.01
    4        1.00     14       1.01
    5        0.99     15       0.99
    6        1.01     16       1.00
    7        0.97     17       1.01
    8        1.02     18       0.99
    9        1.00     19       1.00
    10       0.99     20       1.02

  • M. S. Ramaiah University of Applied Sciences

    18

    Continuous Data: Example (time spent per visit, in minutes), Graphical Representation

    [Histogram: frequency (0 to 6) against time per visit, with class boundaries from 0.9631 to 1.0331 in steps of 0.01]

  • M. S. Ramaiah University of Applied Sciences

    19

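    Random Variables

    The random variable X (number of girls in four births; B = boy, G = girl) maps each outcome in the sample space to a point on the real line:

    X    Sample space outcomes
    0    BBBB
    1    BGBB, GBBB, BBBG, BBGB
    2    GGBB, GBBG, BGBG, BGGB, GBGB, BBGG
    3    BGGG, GBGG, GGGB, GGBG
    4    GGGG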

  • M. S. Ramaiah University of Applied Sciences

    20

    Random Variables (Continued)

    Suppose the random variable X = 3 when any of the four outcomes BGGG, GBGG, GGBG, or GGGB occurs. Then

    P(X = 3) = P(BGGG) + P(GBGG) + P(GGBG) + P(GGGB) = 4/16

    The probability distribution of a random variable is a table that lists the possible values of the random variable and their associated probabilities.

    x        P(x)
    0        1/16
    1        4/16
    2        6/16
    3        4/16
    4        1/16
    Total    16/16 = 1

    The graphical display for this probability distribution is shown on the next slide.
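
The distribution of the number of girls can be reproduced by enumerating the sample space. This is a sketch that assumes, as the 4/16 figure implies, that all 16 outcomes are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 16 outcomes of four births, e.g. "BGGB".
outcomes = ["".join(p) for p in product("BG", repeat=4)]

# X = number of girls in each outcome; count how often each value occurs.
counts = Counter(outcome.count("G") for outcome in outcomes)

# P(X = x) = (number of outcomes with x girls) / 16.
distribution = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in distribution.items():
    print(x, f"{counts[x]}/16", float(p))
```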

  • M. S. Ramaiah University of Applied Sciences

    21

    Random Variables (Continued)

    [Bar chart: Probability Distribution of the Number of Girls in Four Births. x-axis: Number of Girls, X (0 to 4); y-axis: Probability, P(X) (0.0 to 0.4); bar heights 1/16, 4/16, 6/16, 4/16, 1/16]

  • M. S. Ramaiah University of Applied Sciences

    22

    Example

    Consider the experiment of tossing two six-sided dice. There are 36 possible outcomes. Let the random variable X represent the sum of the numbers on the two dice. The sums run from 2 (outcome 1,1) to 12 (outcome 6,6):

    1,1  1,2  1,3  1,4  1,5  1,6
    2,1  2,2  2,3  2,4  2,5  2,6
    3,1  3,2  3,3  3,4  3,5  3,6
    4,1  4,2  4,3  4,4  4,5  4,6
    5,1  5,2  5,3  5,4  5,5  5,6
    6,1  6,2  6,3  6,4  6,5  6,6

    x        P(x)
    2        1/36
    3        2/36
    4        3/36
    5        4/36
    6        5/36
    7        6/36
    8        5/36
    9        4/36
    10       3/36
    11       2/36
    12       1/36
    Total    1

    [Bar chart: Probability Distribution of Sum of Two Dice. x-axis: x (2 to 12); y-axis: p(x)]
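
The same enumeration idea gives the distribution of the sum of two dice. A short sketch, assuming fair dice; names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 (first die, second die) outcomes give each sum.
sum_counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# P(X = x) = count / 36 for x = 2, ..., 12.
dice_distribution = {x: Fraction(sum_counts[x], 36) for x in sorted(sum_counts)}

for x, p in dice_distribution.items():
    print(x, f"{sum_counts[x]}/36", round(float(p), 3))
```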

  • M. S. Ramaiah University of Applied Sciences

    23

    NORMAL DISTRIBUTION

  • M. S. Ramaiah University of Applied Sciences

    24

    Generic Causes Of Variation

    Machines

    Materials

    Methods

    Measurements

    Mother Nature

    People

    PROCESS

  • M. S. Ramaiah University of Applied Sciences

    25

    THE NORMAL CURVE

    [Figure: histogram with a smooth curve interconnecting the center of each bar; x-axis: Units of Measure]

  • M. S. Ramaiah University of Applied Sciences

    26

    Normal Distribution

    If the frequency distribution of a set of values is such that (σ = standard deviation):

    68.27% of the values lie within ±1σ of the mean, AND

    95.45% of the values lie within ±2σ of the mean, AND

    99.73% of the values lie within ±3σ of the mean,

    then the distribution is approximately normal.

    A NORMAL DISTRIBUTION IS CHARACTERISED BY A BELL-SHAPED CURVE.
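
The coverage percentages for 1, 2 and 3 standard deviations can be checked from the error function, since for a normal variable P(|X - mean| < kσ) = erf(k/√2). The sketch below prints the exact values, which may differ in the last digit from figures rounded from printed tables; the function name is illustrative.

```python
from math import erf, sqrt

def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * coverage(k):.2f}%")
```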

  • M. S. Ramaiah University of Applied Sciences

    27

    Standard Normal Distribution

    Since each normal variable has different units of measurement, the standard normal distribution is used to put them on a common scale.

    Standard normal variable: $Z = (x - \mu) / \sigma$

    First convert the original problem into Z; probability tables for Z are available.
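
As a sketch of the conversion to Z, the snippet below standardises a value and looks up the cumulative probability with `statistics.NormalDist` (Python 3.8+) instead of a printed table. The numbers 1.03, 1.00 and 0.015 are hypothetical and only serve to give z = 2.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardise x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

# Hypothetical example: x = 1.03, mean = 1.00, standard deviation = 0.015.
z = z_score(1.03, 1.00, 0.015)

# P(Z <= z) from the standard normal distribution (mean 0, sd 1).
p_below = NormalDist().cdf(z)

print(f"z = {z:.2f}, P(Z <= z) = {p_below:.4f}")
```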

  • M. S. Ramaiah University of Applied Sciences

    28

    Sampling Design

  • M. S. Ramaiah University of Applied Sciences

    29

    Population (N) Sample (n)

    Samples and Populations

  • M. S. Ramaiah University of Applied Sciences

    30

    Sampling Design within the Research Process

    [Flowchart with the boxes: Question hierarchy; Define relevant population; Identify existing sampling frame; Evaluate sampling frame (accept / don't accept); Modify sampling frame; Select sampling frame; Sample type (Probability / Non-Probability); Sampling technique; Draw sample]

  • M. S. Ramaiah University of Applied Sciences

    31

    Types of Sampling

    Probability Sampling

    - Simple Random Sampling
    - Stratified Random Sampling
    - Systematic Sampling
    - Cluster Sampling

    Non-Probability Sampling

    - Convenience Sampling
    - Quota Sampling
    - Expert Sampling

  • M. S. Ramaiah University of Applied Sciences

    32

    Stratified Random Sampling

    In stratified random sampling, we assume that the population of N units may be divided into m groups (strata) with $N_i$ units in group i, i = 1, 2, ..., m. The m strata are non-overlapping and together they make up the total population:

    $\sum_{i=1}^{m} N_i = N_1 + N_2 + \dots + N_m = N$

    [Figure: the population partitioned into Stratum 1, Stratum 2, ..., Stratum m, of sizes $N_1$, $N_2$, ..., $N_m$]
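
A minimal sketch of drawing a stratified random sample: a simple random sample of a chosen size is taken independently from each stratum. The function name, the strata labels and the per-stratum sample sizes are all assumptions for illustration; the slide does not specify how the sample sizes are allocated.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of sizes[name] units from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical population of N = 50 units split into m = 2 non-overlapping strata.
strata = {"A": list(range(1, 31)), "B": list(range(31, 51))}
sample = stratified_sample(strata, {"A": 3, "B": 2}, seed=1)
print(sample)
```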

  • M. S. Ramaiah University of Applied Sciences

    33

    Systematic Random Sampling

    Units are drawn from the population at clearly defined regular intervals

    Steps

    - Compute K = (N/n) and take its integer value. K is called the sampling interval

    - Select a random number between 1 and K

    - Starting with this number, select every Kth unit until all the n units are selected

  • M. S. Ramaiah University of Applied Sciences

    34

    Example

    Suppose in a market survey, you have to select 5 households out of 50 households in a block.

    - Number of units in the population N = 50

    - Number of units in the sample n = 5

    - Sampling Interval K = (N/n) = 50/5 = 10

    - Select a random number between 1 and 10

    Suppose the selected random number is 5. Starting with 5, select every 10th unit.
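
The steps of this example can be sketched as follows. The function name and the optional `start` argument are illustrative; passing `start=5` reproduces the random start assumed in the example.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Select n units from 1..N at a fixed interval K = N // n."""
    K = N // n                                   # sampling interval (integer part)
    if start is None:
        start = random.Random(seed).randint(1, K)  # random start between 1 and K
    return [start + i * K for i in range(n)]

# Market-survey example: 5 households out of 50, random start 5.
selected = systematic_sample(50, 5, start=5)
print(selected)
```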

  • M. S. Ramaiah University of Applied Sciences

    35

    Example Contd.

    1  2  3  4  5  6  7  8  9
    10 11 12 13 14 15 16 17 18
    19 20 21 22 23 24 25 26 27
    28 29 30 31 32 33 34 35 36
    37 38 39 40 41 42 43 44 45
    46 47 48 49 50

  • M. S. Ramaiah University of Applied Sciences

    36

    Cluster Sampling

    In stratified sampling a random sample ($n_i$) is chosen from each segment of the population ($N_i$).

    In cluster sampling observations are drawn from m out of M areas or clusters of the population.

    [Figure: Population Distribution and Sample Distribution across Groups 1 to 7]

  • M. S. Ramaiah University of Applied Sciences

    37

    Caution

    Results from non-probability sampling methods should not be generalised to the population

  • M. S. Ramaiah University of Applied Sciences

    38

    Sampling Distribution

    - A conceptual framework

  • M. S. Ramaiah University of Applied Sciences

    39

    Sampling Distribution of the Mean from Normal Population

    If $X_1, X_2, \dots, X_n$ are n independent random observations drawn from a normal population with mean $\mu$ and standard deviation $\sigma$,

    then

    the sampling distribution of the sample mean $\bar{X}$ follows a normal distribution with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$, where

    $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \dots + X_n}{n}$

    Standard deviation of the sample mean = $\sigma_{\bar{X}} = \sigma / \sqrt{n}$ = standard error

  • M. S. Ramaiah University of Applied Sciences

    40

    Sample Size and Standard Error

    The sample size determines the error bound of a statistic, since the standard error of a statistic shrinks as the sample size increases.

    [Figure: sampling distribution of a statistic for sample size n and for sample size 2n; the standard error is smaller for 2n]
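
For the sample mean, the standard error is σ/√n, so doubling the sample size shrinks it by a factor of 1/√2 (about 0.71), not by half. A small sketch with hypothetical values σ = 500 and n = 100:

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the sample mean for population sd sigma and sample size n."""
    return sigma / sqrt(n)

se_n = standard_error(500, 100)    # hypothetical sigma = 500, n = 100
se_2n = standard_error(500, 200)   # same sigma, sample size doubled

print(f"SE(n) = {se_n:.2f}, SE(2n) = {se_2n:.2f}, ratio = {se_2n / se_n:.3f}")
```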

  • M. S. Ramaiah University of Applied Sciences

    41

    Determining Sample Size

  • M. S. Ramaiah University of Applied Sciences

    42

    Determining Sample Size using Confidence Interval

    If we know the precision (sampling error), the confidence level, and the standard deviation of the original population, the sample size can be determined

  • M. S. Ramaiah University of Applied Sciences

    43

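    Sample Size Determination: Population Mean

    $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

    Sampling error $E = \bar{X} - \mu$; substituting, squaring both sides and solving for n, we get

    $n = \frac{Z^2 \sigma^2}{E^2}$

    Where Z is the value corresponding to an area of $(1 - \alpha)/2$ from the mean of the standard normal distribution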

  • M. S. Ramaiah University of Applied Sciences

    44

    Example

    A marketing manager of a fast food restaurant in a city wishes to estimate the average yearly amount that families spend on fast food restaurants. He wants the estimate to be within ± Rs. 100 at a confidence level of 99%. It is known from an earlier pilot study that the standard deviation of the family expenditure on fast food restaurants is Rs. 500. How many families must be chosen for this problem?

  • M. S. Ramaiah University of Applied Sciences

    45

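    Solution

    Applying the formula $n = \frac{Z^2 \sigma^2}{E^2}$ with Z = 2.58, $\sigma$ = 500 and E = 100:

    n = ((2.58^2) * (500^2)) / (100^2) = 166.41

    = 167 families (rounded up, so that the sampling error does not exceed Rs. 100)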
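
The calculation can be sketched in a few lines. The function name is illustrative, and rounding up with `ceil` follows the usual convention that the sample must be at least as large as the computed value.

```python
from math import ceil

def sample_size_mean(z, sigma, error):
    """n = Z^2 * sigma^2 / E^2 for estimating a population mean."""
    return (z ** 2) * (sigma ** 2) / (error ** 2)

# Fast food example: Z = 2.58 (99% confidence), sigma = Rs. 500, E = Rs. 100.
n_raw = sample_size_mean(2.58, 500, 100)
n_required = ceil(n_raw)   # round up to the next whole family

print(f"n = {n_raw:.2f} -> {n_required} families")
```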

  • M. S. Ramaiah University of Applied Sciences

    46

    Sample Size Determination: Population Proportion

    We know

    $Z = \frac{p - P}{\sqrt{p(1 - p)/n}}$

    Sampling error E = (p - P); squaring both sides and simplifying, we get:

    $n = \frac{Z^2 \, p(1 - p)}{E^2}$

    Where Z is the value corresponding to an area of $(1 - \alpha)/2$ from the mean of the standard normal distribution

  • M. S. Ramaiah University of Applied Sciences

    47

    Example

    A company manufacturing sports goods wants to estimate the proportion of cricket players among high school students in India. The company wants the estimate to be within ± 0.03 at a confidence level of 99%. A pilot study done earlier reveals that out of 80 high school students, 36 students play cricket. What should be the sample size?

  • M. S. Ramaiah University of Applied Sciences

    48

    Solution

    p = 36/80 = 0.45

    Applying the formula

    n = ((2.58^2) (0.45(1-0.45)))/(0.03^2)

    n = 1831
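
A short sketch of the same calculation, with an illustrative function name; the pilot proportion p = 36/80 is taken from the example.

```python
from math import ceil

def sample_size_proportion(z, p, error):
    """n = Z^2 * p * (1 - p) / E^2 for estimating a population proportion."""
    return (z ** 2) * p * (1 - p) / (error ** 2)

p_pilot = 36 / 80                                 # 0.45 from the pilot study
n_raw = sample_size_proportion(2.58, p_pilot, 0.03)
n_required = ceil(n_raw)                          # round up to a whole student

print(f"p = {p_pilot:.2f}, n = {n_raw:.2f} -> {n_required}")
```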

  • M. S. Ramaiah University of Applied Sciences

    49

    Data Collection Methods

    Primary Data Collection

    Secondary Data Collection

  • M. S. Ramaiah University of Applied Sciences

    50

    Data Collection Methods

    Primary Data

    Observation method

    Interview method

    Questionnaires

    Warranty cards

    Mechanical devices

    Secondary Data

    Agency

    Published material etc.

  • M. S. Ramaiah University of Applied Sciences

    51

    Scales of Measurement

    Nominal Scale - groups or classes

    Gender

    Ordinal Scale - order matters

    Ranks (top ten videos)

    Interval Scale - difference or distance matters; has an arbitrary zero value

    Temperatures (°F, °C)

    Ratio Scale - ratio matters; has a natural zero value

    Salaries

    Likert Scale

  • M. S. Ramaiah University of Applied Sciences

    52

    Sample Rating Scales

    Simple category scale: (data: nominal)

    Ex:

    I plan to purchase a laptop in next twelve months

    Yes

    No

  • M. S. Ramaiah University of Applied Sciences

    53

    Sample Rating Scales

    Multiple choice Single response scale (data: nominal)

    Ex:

    What newspaper do you read most often?

    TOI

    DH

    The Hindu

    Mint

    Others (specify:_________)

  • M. S. Ramaiah University of Applied Sciences

    54

    Sample Rating Scales

    Likert Scale (data: interval)

    Ex:

    The internet is superior to traditional libraries for comprehensive searches

    Strongly Agree | Agree | Neutral | Disagree | Strongly Disagree

  • M. S. Ramaiah University of Applied Sciences

    55

    Sample Rating Scales

    Semantic Differential Scale (data: interval)

    Ex:

    Lands' End catalog

    Fast ___: ___ : ___ : ___ : ___ : ___ : ___ : Slow

  • M. S. Ramaiah University of Applied Sciences

    56

    Sample Rating Scales

    Numerical Scale (data: ordinal or interval)

    Ex:

    Extremely Favourable 5 4 3 2 1 Extremely Unfavourable

    Employee's cooperation in teams ___

    Employee's knowledge of task ___

    Employee's planning effectiveness ___

  • M. S. Ramaiah University of Applied Sciences

    57

    Sample Rating Scales

    Multiple rating list scale (data: interval)

    Ex:

    Please indicate how important or unimportant each service characteristic is:

    Important Unimportant

    Fast reliable repair 7 6 5 4 3 2 1

    Service at my location 7 6 5 4 3 2 1

    Maintenance by manufacturer 7 6 5 4 3 2 1

    Knowledgeable technicians 7 6 5 4 3 2 1

    Service contract after warranty 7 6 5 4 3 2 1

  • M. S. Ramaiah University of Applied Sciences

    58

    Sample Rating Scales

    Constant-Sum Scale (data: ratio)

    Ex:

    Taking all the supplier characteristics we have just discussed and now considering cost, what is their relative importance to you (dividing 100 units between)

    Being one of the lowest cost suppliers

    All other aspects of supplier performance

    Sum 100

  • M. S. Ramaiah University of Applied Sciences

    59

    Sample Rating Scales

    Stapel Scale (data: ordinal or interval)

    Ex:

    Company Name

    +3                   +3                   +3
    +2                   +2                   +2
    +1                   +1                   +1
    Technology Leader    Existing Products    Reputation
    -1                   -1                   -1
    -2                   -2                   -2
    -3                   -3                   -3

  • M. S. Ramaiah University of Applied Sciences

    60

    Sample Rating Scales

    Graphic rating scale (data: ordinal or interval or ratio)

    Ex:

    How likely are you to recommend complete care to others?

    Very Likely Very Unlikely

  • M. S. Ramaiah University of Applied Sciences

    61

    Data Analysis

  • M. S. Ramaiah University of Applied Sciences

    62

    The cure for boredom is curiosity. There is no cure for curiosity.

    - Dorothy Parker

  • M. S. Ramaiah University of Applied Sciences

    63

    Things aren't always what we think!

    Six blind men go to observe an elephant. One feels the side and thinks the elephant is like a wall. One feels the tusk and thinks the elephant is like a spear. One touches the squirming trunk and thinks the elephant is like a snake. One feels the knee and thinks the elephant is like a tree. One touches the ear and thinks the elephant is like a fan. One grasps the tail and thinks it is like a rope. They argue long and loud, and though each was partly in the right, all were in the wrong.

    For a detailed version of this fable see:

    http://www.wordinfo.info/words/index/info/view_unit/1/?letter=B&spage=3

    Blind men and an elephant

    - Indian fable

  • M. S. Ramaiah University of Applied Sciences

    64

    Stages in Data Analysis

    [Flowchart: Editing -> Coding -> Data entry (keyboarding) -> Data analysis (descriptive, univariate, bivariate, multivariate) -> Interpretation, with error checking and verification alongside]

  • M. S. Ramaiah University of Applied Sciences

    65

    Descriptive Analysis Techniques

    Count (frequencies)

    Percentage

    Mean

    Mode

    Median

    Range

    Standard deviation

    Variance

    Ranking

  • M. S. Ramaiah University of Applied Sciences

    66

    Overview of the Stages in Data Analysis

    [Flowchart: Editing -> Coding -> Data entry (keyboarding) -> Data analysis (descriptive, univariate, bivariate, multivariate) -> Interpretation, with error checking and verification alongside]

  • M. S. Ramaiah University of Applied Sciences

    67

    Frequency Distributions

    To what extent did you increase your skills in putting together a household budget?

                    A lot    Some    A little    Not at all
    Women (N=30)    14       9       5           2

    Uni-variate Analysis: the analysis of a single variable, for purposes of description (examples: frequency distribution, averages, and measures of dispersion)

  • M. S. Ramaiah University of Applied Sciences

    68

    Percentage Distributions

    To what extent did you increase your skills in putting together a household budget?

                    A lot    Some    A little    Not at all
    Women (N=30)    47%      30%     17%         7%

    (Percentages are rounded to the nearest whole number, so they need not sum to exactly 100%.)
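
Converting a frequency distribution to a percentage distribution is a single division per category. The sketch below uses the counts from the frequency table (14, 9, 5, 2 out of 30); rounding to the nearest whole percent is an assumption about the intended presentation, and with it the first category comes out as 47%.

```python
# Frequency distribution for Women (N = 30).
frequencies = {"A lot": 14, "Some": 9, "A little": 5, "Not at all": 2}

total = sum(frequencies.values())

# Percentage of respondents in each category, rounded to the nearest whole percent.
percentages = {answer: round(100 * count / total) for answer, count in frequencies.items()}

for answer, pct in percentages.items():
    print(f"{answer}: {pct}%")
```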

  • M. S. Ramaiah University of Applied Sciences

    69

    Graphing Frequency Data

    How did you first hear about the web site?

                         N     Percent
    Court Referral       10    24.4%
    Social Worker        5     12.2%
    Friend               5     12.2%
    Web Search Engine    8     19.5%
    Librarian            9     22.0%
    Newspaper Story      3     7.3%
    Other                1     2.4%
    Total                41    100.0%

    [Pie chart: "How did you first hear about the web-site?" with slices Court Referral, Social Worker, Friend or Acquaintance, Web Search Engine, Librarian, Newspaper Story, Other]

  • M. S. Ramaiah University of Applied Sciences

    70

    Means and Medians

    Subject    Score
    History    95
    English    96
    Biology    93
    Latin      92
    Math       98
    Music      94
    Gym        40

    Mean = 87

    Sorted by score:

    Subject    Score
    Math       98
    English    96
    History    95
    Music      94
    Biology    93
    Latin      92
    Gym        40

    Median = 94

  • M. S. Ramaiah University of Applied Sciences

    71

    Note

    40 50 55 94 100 100 100    Mean = 77

    40 92 93 94 95 96 98       Mean = 87

    (Both data sets have the same median, 94.)
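
The two data sets in this note can be compared directly with the standard library. As computed below, they share the median 94 while their means are 539/7 = 77 and 608/7 ≈ 86.9 (87 when rounded); variable names are illustrative.

```python
from statistics import mean, median

scores_a = [40, 50, 55, 94, 100, 100, 100]
scores_b = [40, 92, 93, 94, 95, 96, 98]   # the subject scores from the previous slide

for label, scores in (("A", scores_a), ("B", scores_b)):
    # The mean is pulled around by extreme values; the median is the middle score.
    print(f"set {label}: mean = {mean(scores):.1f}, median = {median(scores)}")
```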

  • M. S. Ramaiah University of Applied Sciences

    72

    Histograms

    [Two histograms: Frequency (0 to 6 and 0 to 7) against score, with x-axis ticks at 60, 70, 80, 90, 100]

  • M. S. Ramaiah University of Applied Sciences

    73

    Cross Tabulations

    Program Type Area of Inquiry Outcome

    Web site Employment law Satisfied

    I & R Line Family law Not satisfied

    Law clinic Immigration Pending

    Web site Immigration Satisfied

    I & R Line Immigration Satisfied

    I & R Line Family law Not satisfied

    Web site Employment law Not satisfied

    Law clinic Other Satisfied

    I & R Line Other Not satisfied

    I & R Line Other Satisfied

    Law clinic Employment law Satisfied

    Web site Family law Satisfied

    Law clinic Family law Satisfied

    Web site Immigration Not satisfied

    Law clinic Immigration Not satisfied

    I & R Line Family law Satisfied

    I & R Line Immigration Not satisfied

    I & R Line Employment law Not satisfied

    Law clinic Other Pending

    Count of Outcome

    Program Type    Not satisfied    Pending    Satisfied    Grand Total
    I & R Line      7                0          5            12
    Law clinic      1                3          7            11
    Web site        6                0          5            11
    Grand Total     14               3          17           34

    Percentage of Outcome (by row)

    Program Type    Not satisfied    Pending    Satisfied    Grand Total
    I & R Line      58%              0%         42%          100%
    Law clinic      9%               27%        64%          100%
    Web site        55%              0%         45%          100%
    Grand Total     41.18%           8.82%      50.00%       100.00%
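
The row percentages follow from the count table by dividing each cell by its row total. The sketch below starts from the counts in the cross tabulation (treating blank cells as 0, which is an assumption) rather than from the raw records.

```python
# Counts of Outcome by Program Type, taken from the cross tabulation.
counts = {
    "I & R Line": {"Not satisfied": 7, "Pending": 0, "Satisfied": 5},
    "Law clinic": {"Not satisfied": 1, "Pending": 3, "Satisfied": 7},
    "Web site":   {"Not satisfied": 6, "Pending": 0, "Satisfied": 5},
}

# Row percentages: each cell divided by the total of its row.
row_pct = {
    program: {outcome: round(100 * c / sum(row.values())) for outcome, c in row.items()}
    for program, row in counts.items()
}

grand_total = sum(sum(row.values()) for row in counts.values())

for program, row in row_pct.items():
    print(program, row)
print("Grand total:", grand_total)
```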

  • M. S. Ramaiah University of Applied Sciences

    74

    Graphing comparisons

    [Bar chart: Satisfaction with Services. x-axis: Clinic Name (A to E); y-axis: Satisfaction Score (0 to 40)]

  • M. S. Ramaiah University of Applied Sciences

    75

    [Grouped bar chart: Satisfaction with Services. x-axis: Clinic (A to E); y-axis: Satisfaction Score (0 to 16); series: Staff, Advice, Facility]

  • M. S. Ramaiah University of Applied Sciences

    76

    [Grouped bar chart: Satisfaction with Services. x-axis: Satisfaction Component (Staff, Advice, Facility); y-axis: Satisfaction Score (0 to 16); series: clinics A to E]

  • M. S. Ramaiah University of Applied Sciences

    77

    Overview of the Stages in Data Analysis

    [Flowchart: Editing -> Coding -> Data entry (keyboarding) -> Data analysis (descriptive, univariate, bivariate, multivariate) -> Interpretation, with error checking and verification alongside]

  • M. S. Ramaiah University of Applied Sciences

    78

    Bi-variate Analysis

    The analysis of two variables simultaneously for determining the empirical relationship between them

    Y = f (X)

  • M. S. Ramaiah University of Applied Sciences

    79

    Few Techniques Available

    Correlation

    Regression

    Chi-square Test and Cramér's V

    Hypothesis Test for two population means/proportions

    Paired T-tests comparing two groups

  • M. S. Ramaiah University of Applied Sciences

    80

    Measure of Correlation: Coefficient of Correlation

    Symbol: r

    Range: -1 to 1

    Sign: type of correlation (positive or negative)

    Value: degree of correlation

    Examples:

    r = 0.6: moderate positive correlation

    r = -0.82: strong negative correlation

    r = 0: no linear correlation
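
A minimal sketch of computing r from its definition, written out by hand so that it runs on any Python 3 version. The six (x, y) pairs are the ones used in the error-analysis slides later in this session, where the regression output reports Multiple R = 0.594; r itself is negative because the fitted slope is negative.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson coefficient of correlation between two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

x = [65, 8, 89, 88, 50, 73]
y = [69, 78, 8, 21, 24, 72]

r = pearson_r(x, y)
print(f"r = {r:.4f}, r squared = {r ** 2:.4f}")
```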

  • M. S. Ramaiah University of Applied Sciences

    81

    Regression

    Regression helps

    To identify the exact form of the relationship

    To model output in terms of input or process variables

    y = a + b x

    Examples:

    Yield = 5 + 3 × Time

    Y = 2 - 5x

  • M. S. Ramaiah University of Applied Sciences

    82

    Coefficient of Determination

    Measure of degree of relationship (proportion of the variation in y explained by the model)

    Symbol: R²

    Range of R²: 0 to 1

    As a rule of thumb, if R² > 0.6 the model is reasonably good

  • M. S. Ramaiah University of Applied Sciences

    83

    Error or Residual Analysis

    Root Mean Square Error for Prediction (MSEP)

    x     y
    65    69
    8     78
    89    8
    88    21
    50    24
    73    72

    Regression Statistics
    Multiple R           0.594159006
    R Square             0.353024925
    Adjusted R Square    0.191281156
    Standard Error       27.80337004
    Observations         6

    Coefficients
    Intercept    83.00449781
    x            -0.605970474

  • M. S. Ramaiah University of Applied Sciences

    84

    Root Mean Square Error:

    Predicted y = 83.0045 - 0.6060 x

    Error = y - predicted y

    x     y     Predicted y    Error     Error Square
    65    69    43.62          25.38     644.33
    8     78    78.16          -0.16     0.02
    89    8     29.07          -21.07    444.08
    88    21    29.68          -8.68     75.33
    50    24    52.71          -28.71    824.03
    73    72    38.77          33.23     1104.32
    Sum                                  3092.11

    Mean Square Error = 3092.11 / 6 = 515.35

    Root Mean Square Error = 22.70
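
The fitted line and the root mean square error in this example can be reproduced with ordinary least squares written out by hand. The sketch divides the sum of squared errors by n = 6, as the slide does (the regression output's "Standard Error" instead divides by n - 2); variable names are illustrative.

```python
from math import sqrt

x = [65, 8, 89, 88, 50, 73]
y = [69, 78, 8, 21, 24, 72]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares slope and intercept of y = intercept + slope * x.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Error = y - predicted y for each observation.
errors = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

sse = sum(e ** 2 for e in errors)   # sum of squared errors
mse = sse / n                       # mean square error (divisor n, as on the slide)
rmse = sqrt(mse)                    # root mean square error

print(f"predicted y = {intercept:.4f} + ({slope:.4f}) x")
print(f"SSE = {sse:.2f}, MSE = {mse:.2f}, RMSE = {rmse:.2f}")
```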

  • M. S. Ramaiah University of Applied Sciences

    85

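    Difference between observed values $Y_i$ and model-predicted values $f(X_i)$ for n data points:

    $\mathrm{MSEP} = \frac{1}{n} \sum_{i=1}^{n} \left(Y_i - f(X_i)\right)^2$

    Decomposition of MSEP has been carried out using mean bias ($U_M$), slope bias ($U_R$) and random error ($U_D$):

    $U_M = \left(\overline{f(X)} - \bar{Y}\right)^2 / \mathrm{MSEP}$

    $U_R = S_{f(X)}^2 \, (1 - b)^2 / \mathrm{MSEP}$

    $U_D = (1 - r^2) \, S_Y^2 / \mathrm{MSEP}$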

  • M. S. Ramaiah University of Applied Sciences

    86

    Objective

    To develop a mathematical model for an attribute or response metric

    (Y) in terms of other available attributes (Xs).

    When to Use

    Xs : Continuous

    Y : Discrete binary

    Logistic Regression

  • M. S. Ramaiah University of Applied Sciences

    87

    Hypothesis Test for Difference between Two Means

    Objective

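    To test hypotheses that compare the population means of interest for two separate populations (independent samples)

    Test Statistic (Large Sample)

    $Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

    Test Statistic (Small Sample)

    $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{S^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$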
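The two test statistics for comparing means can be sketched as below. This assumes the small-sample statistic uses the pooled sample variance with n1 + n2 - 2 degrees of freedom (the usual equal-variance form); the slide does not state this explicitly. Function names and the sample values are hypothetical.

```python
import math
from statistics import mean, variance


def z_statistic(x1_bar, x2_bar, var1, var2, n1, n2):
    """Large-sample statistic from the two means, variances and sample sizes."""
    return (x1_bar - x2_bar) / math.sqrt(var1 / n1 + var2 / n2)


def pooled_t_statistic(sample1, sample2):
    """Small-sample statistic with a pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample1)
                  + (n2 - 1) * variance(sample2)) / dof
    t = (mean(sample1) - mean(sample2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, dof


# Hypothetical samples from two independent populations.
sample_a = [10, 12, 14]
sample_b = [8, 9, 10]
t_value, dof = pooled_t_statistic(sample_a, sample_b)
print(f"t = {t_value:.4f} with {dof} degrees of freedom")
```

The computed statistic is then compared with the critical value from the normal or t table at the chosen significance level.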


    Chi-Square Test

    Objective:

    To test whether two variables that have frequency data are related or not

    Usage:

    When both variables (X and Y) are categorical (grouped)

    Cramér's V: To quantify the strength of the relationship between X and Y
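A small sketch of the chi-square test of independence on a table of observed frequencies, together with Cramér's V as the measure of association (assumed here to be the measure the slide intends). The function name and the 2 x 2 table are hypothetical.

```python
import math


def chi_square_and_cramers_v(table):
    """Return (chi-square, degrees of freedom, Cramér's V) for a frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    rows, cols = len(row_totals), len(col_totals)
    dof = (rows - 1) * (cols - 1)
    v = math.sqrt(chi2 / (n * (min(rows, cols) - 1)))
    return chi2, dof, v


# Hypothetical frequency data: rows are categories of X, columns categories of Y.
observed_table = [[10, 20],
                  [20, 10]]
chi2, dof, v = chi_square_and_cramers_v(observed_table)
print(f"chi-square = {chi2:.4f}, dof = {dof}, Cramér's V = {v:.4f}")
```

V ranges from 0 (no association) to 1 (perfect association), so it complements the test by describing how strong the relationship is.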


    Overview of the Stages in Data Analysis

    Editing

    Coding

    Data entry (keyboarding)

    Data analysis: descriptive analysis, univariate analysis, bivariate analysis, multivariate analysis

    Interpretation

    Error checking and verification


    Multivariate Analysis

    The analysis of the simultaneous relationships among several variables

    Analyse the data covariance structure to understand it or to reduce the data dimension

    Assign observations to groups

    Explore relationships among categorical variables


    Few Techniques Available

    Multiple Linear Regression

    Cluster Analysis

    Factor Analysis

    ANOVA

    MANOVA

    Conjoint Analysis

    Optimisation Techniques


    Multiple Regression

    To model the output variable Y in terms of two or more input variables

    General Form:

    Y = a + b1X1 + b2X2 + ... + bkXk

    Two variable case:

    Y = a + b1X1 + b2X2

    Adjusted R2

    If Adj R2 > 0.6, then the model is reasonably good

    P value from coefficient table

    If p value < 0.05, the corresponding term has a statistically significant relationship with the output
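A minimal sketch of the two-variable case, Y = a + b1X1 + b2X2, fitted by solving the normal equations with the standard library only. The helper names and the data are hypothetical. The adjusted R² function uses the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1), which reproduces the Adjusted R Square of the simple regression example earlier in the deck (R² = 0.353, n = 6, k = 1 gives 0.191).

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_two_predictors(x1, x2, y):
    """Return ((a, b1, b2), R squared) for Y = a + b1*X1 + b2*X2."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]  # design matrix [1, X1, X2]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    a, b1, b2 = solve_linear_system(xtx, xty)
    fitted = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return (a, b1, b2), 1 - ss_res / ss_tot


def adjusted_r2(r2, n, k):
    """Penalise R squared for the number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical data for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [9.3, 7.8, 19.4, 17.7, 29.1, 28.2]
coefficients, r2 = fit_two_predictors(x1, x2, y)
print("a, b1, b2 =", [round(c, 3) for c in coefficients])
print(f"R2 = {r2:.4f}, adjusted R2 = {adjusted_r2(r2, len(y), 2):.4f}")
```

The p values mentioned on the slide need the coefficient standard errors and a t distribution, which this sketch leaves to a statistics package.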


    Residual Plots: Error Analysis

    Y = 44+0.19X1-2.55X2


    Evidence of a strong Shift to Shift Effect

    [Figure: Main Effects Plot - Data Means for Impurity. Factors: Day (1, 2), Shift (1, 2), Time (1 to 4); y-axis: Impurity, 0.018 to 0.038]


    Validation Tests for Model Adequacy

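    Mean Square Error (MSE) for checking Model Precision

    $\mathrm{MSE} = \dfrac{\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2}{n}$

    Mean Bias (MB) for checking Model Accuracy

    $\mathrm{MB} = \dfrac{\sum_{i=1}^{n} \left(Y_i - f(X_i)\right)}{n}$

    where f(Xi) = Ŷi = ith model prediction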
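As a quick illustration of the difference between the two validation measures, the sketch below applies both to the observed and predicted values from the worked regression table earlier in the deck (the predicted values are the rounded figures printed there). Variable names are illustrative.

```python
# Observed y and predicted y copied from the worked regression table.
observed = [69, 78, 8, 21, 24, 72]
predicted = [43.62, 78.16, 29.07, 29.68, 52.71, 38.77]

n = len(observed)
residuals = [y - f for y, f in zip(observed, predicted)]

mse = sum(e ** 2 for e in residuals) / n  # precision: average squared miss
mb = sum(residuals) / n                   # accuracy: average signed miss

print(f"MSE = {mse:.2f}")
print(f"MB  = {mb:.4f}")
```

Positive and negative errors cancel in MB, so a model can be nearly unbiased (MB close to zero) and still imprecise (large MSE), which is why both checks are reported.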


    Factor Analysis

    [Figure: Loading Plot of Pop, ..., Home. x-axis: First Factor (-0.4 to 1.0); y-axis: Second Factor (-0.50 to 0.75); variables plotted: Pop, School, Employ, Health, Home]

    Explain the presence of each variable with the sign (+ or -). This way we can reduce the number of variables


    Predictors Selection


    P = 0.001


    Classification Methods

    Example:

    Attribute 1 x1

    Attribute 2 x2

    Label : y y1 (Red) , y2 (Blue)

    [Figure: scatter plot of x2 (20 to 40) against x1 (10.00 to 20.00), with a decision tree that splits first on x2 (> 35: y1; < 28: y2) and then on x1 (< 15.5: y2; > 15.5: y1)]


    CLASSIFICATION METHODS

    Example: Rules

    Attribute 1 x1

    Attribute 2 x2

    Label : y y1 (Red) , y2 (Blue)

    Decision tree: split first on x2 (> 35: y1; < 28: y2; otherwise split on x1), then on x1 (< 15.5: y2; > 15.5: y1)

    If x2 > 35, then y = y1

    If x2 < 28, then y = y2

    If 28 < x2 < 35 and x1 > 15.5, then y = y1

    If 28 < x2 < 35 and x1 < 15.5, then y = y2
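The four rules translate directly into a small function. This is a sketch only: the function name is hypothetical, and the handling of the exact boundary values (x2 equal to 28 or 35, x1 equal to 15.5) is an assumption, because the slide does not say which class those points receive.

```python
def classify(x1, x2):
    """Apply the decision-tree rules and return the label 'y1' (Red) or 'y2' (Blue)."""
    if x2 > 35:
        return "y1"
    if x2 < 28:
        return "y2"
    # Here 28 <= x2 <= 35, so the decision falls to x1.
    # Assumption: x1 exactly equal to 15.5 is assigned to y2.
    return "y1" if x1 > 15.5 else "y2"


for point in [(12.0, 36.0), (18.0, 25.0), (16.0, 30.0), (15.0, 30.0)]:
    print(point, "->", classify(*point))
```

The order of the checks mirrors the tree: x2 is tested first, and x1 is consulted only for records in the middle band of x2.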


    Cluster Analysis

    Objective

    To classify the records or items into a smaller number of groups based

    on the values of available attributes.

    When to Use

    When there is no Y attribute

    All attributes are considered as Xs only
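One common way to group records when there is no Y attribute is k-means clustering. The sketch below is an illustration of that technique, not necessarily the method used in the course; the function name, the starting rule (first k records as initial centres) and the data are all hypothetical.

```python
def k_means(points, k, max_iterations=100):
    """Group points into k clusters; returns (labels, centroids)."""
    centroids = [points[i] for i in range(k)]  # simple deterministic start
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each record joins its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Hypothetical records with two attributes each (all treated as Xs).
records = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
           (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(records, k=2)
print("labels:", labels)
print("centroids:", centroids)
```

The number of groups k has to be chosen by the analyst, and attributes on very different scales are usually standardised first so that no single attribute dominates the distances.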


    K-Nearest Neighbors Cluster Analysis

    [Figure: two scatter plots of Acceleration in m/s² against Weight in kg]


    ANOVA or Experimental Design

    Sometimes, an investigator would like to compare more than two population means in a problem situation

    ANOVA decomposes the total variation into components of variation

    [Figure: distributions of Population 1, Population 2 and Population 3, with means labelled 1, 2 and 3]
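The decomposition of variation can be made concrete with a one-way ANOVA sketch. The total sum of squares splits into a between-group part and a within-group part, and the F statistic is the ratio of their mean squares. The function name and the three samples are hypothetical.

```python
def one_way_anova(groups):
    """Return (SS total, SS between, SS within, F) for a list of samples."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)

    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_total, ss_between, ss_within, f_stat


# Hypothetical samples drawn from three populations.
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_total, ss_between, ss_within, f_stat = one_way_anova(samples)
print(f"SS total = {ss_total:.2f} = {ss_between:.2f} (between) + {ss_within:.2f} (within)")
print(f"F = {f_stat:.2f}")
```

A large F means the group means differ by more than the within-group scatter would explain; the value is compared with the F table for (k - 1, n - k) degrees of freedom.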


    MANOVA and Conjoint Analysis

    MANOVA is similar to ANOVA, with the added ability to handle several dependent variables

    The most common applications of conjoint analysis are market research and product development for making trade-offs


    Optimisation Methods

    Objective

    To identify the best values of a set of variables (Xs) which will optimize an objective function satisfying a given set of constraints

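    For n variables and m constraints

    $\text{Max / Min } Z = C_1 x_1 + C_2 x_2 + \dots + C_n x_n$

    Subject to

    $a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n \;(\le, =, \ge)\; b_1$

    $a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n \;(\le, =, \ge)\; b_2$

    $\vdots$

    $a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n \;(\le, =, \ge)\; b_m$

    and $x_i \ge 0, \; i = 1, 2, \dots, n$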
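For two variables, a linear programme of this form can be solved by checking the corner points of the feasible region, since a bounded linear programme attains its optimum at a vertex. The sketch below uses brute-force vertex enumeration, which is practical only for tiny problems; real applications use the simplex method or a solver library. The function name and the example problem are hypothetical, and all constraints are assumed to be of the "less than or equal" type.

```python
from itertools import combinations


def solve_lp_2d(objective, constraints, maximize=True, tol=1e-9):
    """Optimise c1*x1 + c2*x2 subject to a1*x1 + a2*x2 <= b and x1, x2 >= 0.

    Returns (z, x1, x2) at the best feasible vertex, or None if none exists.
    """
    c1, c2 = objective
    # Express x1 >= 0 and x2 >= 0 in the same "<=" form as the others.
    all_constraints = list(constraints) + [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
    best = None
    for (a1, a2, b1), (a3, a4, b2) in combinations(all_constraints, 2):
        det = a1 * a4 - a2 * a3
        if abs(det) < tol:
            continue  # parallel boundaries do not meet at a single point
        # Intersection of the two boundary lines.
        x1 = (b1 * a4 - a2 * b2) / det
        x2 = (a1 * b2 - b1 * a3) / det
        if not all(p * x1 + q * x2 <= r + tol for p, q, r in all_constraints):
            continue  # intersection lies outside the feasible region
        z = c1 * x1 + c2 * x2
        if best is None or (z > best[0] if maximize else z < best[0]):
            best = (z, x1, x2)
    return best


# Hypothetical problem: Max Z = 3*x1 + 5*x2
# subject to x1 <= 4, 2*x2 <= 12, 3*x1 + 2*x2 <= 18, x1 >= 0, x2 >= 0
z, x1, x2 = solve_lp_2d((3.0, 5.0), [(1.0, 0.0, 4.0),
                                      (0.0, 2.0, 12.0),
                                      (3.0, 2.0, 18.0)])
print(f"Z = {z:.1f} at x1 = {x1:.1f}, x2 = {x2:.1f}")
```

The approach assumes the feasible region is bounded; an unbounded problem would need a separate check.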


    You never know what is enough unless you know what is more than enough

    - William Blake


    Session Summary (1/2)

    Statistical Techniques and Tools:

    Completely dependent on the type of data used (continuous or discrete)

    Normal Distribution:

    Describes many natural phenomena and many industrial and scientific situations. A normal curve is a graphical representation of the normal distribution

    Data analysis is carried out in two distinct environments:

    Experimental: the result of a special study

    Observational: a by-product of some operations


    Session Summary (2/2)

    Uni-variate Analysis:

    The analysis of a single variable, for purposes of description (examples: frequency distribution, averages, and measures of dispersion)

    Bi-variate Analysis:

    The analysis of two variables simultaneously for determining the empirical relationship between independent and dependent variables

    Multi-variate Analysis:

    The analysis of the simultaneous relationships among several variables