
Students' Tutorial Answers, Week 3


  • 1

    BES Tutorial Sample Solutions, Semester 2, 2010

    WEEK 3 TUTORIAL EXERCISES (To be discussed in the week starting August 2)

    1. Using the car data from Week 2, Question 3:

    (a) Redo Q3(c) using EXCEL to confirm that the frequency histogram is given by Figure 3.1.

    (b) Calculate the mean, median and mode for this sample of data and use them to further describe the distribution of ages.

    Mean: \( \bar{x} = \frac{5 + 5 + 6 + \dots + 24 + 11}{20} = \frac{146}{20} = 7.3 \)

    [Figure 3.1: Revised histogram for age of cars. Horizontal axis: Age; vertical axis: Frequency.]

    Ordering the data from lowest to highest:

  • 2

    2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 9, 10, 11, 11, 14, 24

    Median = (6+6)/2 = 6 (the average of the 10th and 11th ordered observations)

    Mode = 6

    The sample mean lies to the right of the mode and median, suggesting that the sample distribution is skewed to the right. The cause seems to be a large outlier: one car had an age of 24, which is very different from the ages of the other cars. Given the skewness and the outlier, the median is possibly a better measure of central tendency. Hence a typical second-hand car is 6 years old.

    Alternatively the EXCEL output is:

    Age
    Mean                  7.3
    Standard Error        1.126476
    Median                6
    Mode                  6
    Standard Deviation    5.037752
    Sample Variance       25.37895
    Kurtosis              5.712234
    Skewness              2.0983
    Range                 22
    Minimum               2
    Maximum               24
    Sum                   146
    Count                 20
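The summary measures above can also be cross-checked outside EXCEL. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original exercise.

```python
import statistics

# Ages of the 20 cars, ordered from lowest to highest (taken from the solution).
ages = [2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 9, 10, 11, 11, 14, 24]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)   # averages the two middle values when n is even
mode_age = statistics.mode(ages)
sample_sd = statistics.stdev(ages)     # sample standard deviation (divides by n - 1)

print(f"mean = {mean_age}, median = {median_age}, mode = {mode_age}")
print(f"sample standard deviation = {sample_sd:.6f}")

# A mean above the median is the informal sign of right skew used in the answer.
print("mean exceeds median:", mean_age > median_age)
```

Note that `statistics.stdev` uses the sample (n - 1) formula, which is the convention behind the "Standard Deviation" row of the EXCEL output.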

    (c) If the largest observation were removed from this data set, how would the three measures of central tendency you have calculated change?

    Mean: \( \bar{x} = \frac{5 + 5 + 6 + \dots + 6 + 11}{19} = \frac{122}{19} \approx 6.4 \) (now closer to the median)

  • 3

    Median = 6, the 10th of the 19 ordered observations (unchanged)

    Mode = 6 (unchanged)

    2. For the following statistical population, compute the mean, range, variance and standard deviation: 3, 3, 5, 12, 13, 14, 17, 20, 21, 21.

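    Mean: \( \mu = \frac{3 + 3 + 5 + 12 + 13 + 14 + 17 + 20 + 21 + 21}{10} = 12.9 \)

    Range: \( 21 - 3 = 18 \)

    Variance: \( \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{(3 - 12.9)^2 + \dots + (21 - 12.9)^2}{10} = 45.89 \)

    Standard deviation: \( \sigma = \sqrt{45.89} = 6.7742 \)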
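As a check on the arithmetic, here is a short sketch that recomputes the four population measures with Python's standard library. It assumes the data are a full population, so it uses the divide-by-N functions `pvariance` and `pstdev`; the variable names are illustrative.

```python
import statistics

# The population from Question 2.
population = [3, 3, 5, 12, 13, 14, 17, 20, 21, 21]

mu = statistics.mean(population)
data_range = max(population) - min(population)
variance = statistics.pvariance(population)   # population variance: divides by N
std_dev = statistics.pstdev(population)       # square root of the population variance

print(f"mean = {mu}")
print(f"range = {data_range}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.4f}")
```

Using `statistics.variance` instead would divide by N - 1 and give the sample variance, which is not what this question asks for.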

  • 4

    3. For the population in Q2 above, what would happen to each of the measures you have calculated if:

    (a) 4 were added to each data point (observation)?

    The mean would increase by 4, but the range, variance and standard deviation would be unchanged.

    (b) Each data point was multiplied by 2?

    The mean, range and standard deviation would be multiplied by 2, whilst the variance would be multiplied by 4.

    4. Calculate the 90th percentile for the following set of data: -2.4, -1.34, 3.4, 3.5, 4.01, 6.5, 6.7, 7.25, 7.9, 8.46, 9.7, 9.8, 10.45

    For a value of \( p = 90 \), we have

    \( L_p = (n + 1)\frac{p}{100} = (14)\frac{90}{100} = 12.6 \).

    Implying the 90th percentile is 60% of the distance between the 12th and 13th observations. Then:

    90th percentile \( = 9.8 + (0.6)(10.45 - 9.8) = 10.19 \)
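The location rule used above can be sketched as a small Python function. This is an illustrative implementation of the \( L_p = (n + 1)p/100 \) rule with linear interpolation; the function name `percentile` and the clamping at the two ends of the data are assumptions added for the sketch, not part of the original solution.

```python
def percentile(data, p):
    """Return the p-th percentile using the location L_p = (n + 1) * p / 100."""
    xs = sorted(data)
    n = len(xs)
    location = (n + 1) * p / 100
    lower = int(location)            # 1-based position of the observation below L_p
    fraction = location - lower      # share of the gap to the next observation
    if lower < 1:                    # assumption: clamp to the smallest value
        return xs[0]
    if lower >= n:                   # assumption: clamp to the largest value
        return xs[-1]
    return xs[lower - 1] + fraction * (xs[lower] - xs[lower - 1])


data = [-2.4, -1.34, 3.4, 3.5, 4.01, 6.5, 6.7, 7.25, 7.9, 8.46, 9.7, 9.8, 10.45]
p90 = percentile(data, 90)
print(f"90th percentile = {p90:.2f}")
```

Other software may use a different percentile convention, so results from other tools can differ slightly from the value computed with this rule.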

  • 5

    5. SIA: Migrant wealth.

    Suppose the Minister for Immigration is interested in research on the assimilation of migrant households (a household where the chief income-earner is foreign born). The Household, Income and Labour Dynamics in Australia (HILDA) survey is a representative survey of Australian households. Using 4,669 household observations for 2002 from HILDA, we find there are 3,567 households classified as Australian-born and 1,102 classified as migrants. One key consideration is how migrant households are doing in terms of wealth compared with Australian-born households. Using these data, we find the following:

    Summary statistics for net household wealth ($A)

                       Mean      10th percentile   Median    90th percentile
    Australian-born    236,064   1,545             123,020   560,006
    Migrant            248,970   1,720             131,152   524,372

    (a) What can you say about the distribution of net household wealth for both Australian-born and migrant households by looking at just the mean and the median figures?

    The wealth distribution is skewed quite heavily towards the right for both Australian-born and migrant households. The mean is much larger than the median,

  • 6

    suggesting that more than 50% of each sample have less than average wealth, while less than 50% of each sample have more than average wealth. In other words, there is a fair amount of wealth inequality in both samples.

    (b) More generally, what can you say about the distribution of wealth for migrant households compared to that for Australian-born households? In particular, which type of household has greater variation in wealth?

    Based on just the mean and the median measures, a typical migrant family appears to be slightly wealthier than a typical Australian-born family. Both figures are larger for the migrant sample than for the Australian-born sample. This is also the case for the 10th percentile figure. By contrast, the 90th percentile is greater for the Australian-born sample than for the migrant sample. These figures suggest that, while typical migrant families are better off than typical Australian-born families in terms of wealth, migrant families are less likely to be very poor or very rich compared with Australian-born families. In other words, Australian-born families have greater variation in household wealth than migrant families.
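The comparisons in parts (a) and (b) come down to a few subtractions on the summary table. The sketch below makes them explicit; using the gap between the 90th and 10th percentiles as a rough measure of spread is an illustrative choice for this sketch, and the dictionary layout and key names are assumptions.

```python
# Summary statistics for net household wealth ($A), copied from the table.
wealth = {
    "Australian-born": {"mean": 236_064, "p10": 1_545, "median": 123_020, "p90": 560_006},
    "Migrant": {"mean": 248_970, "p10": 1_720, "median": 131_152, "p90": 524_372},
}

summary = {}
for group, stats in wealth.items():
    summary[group] = {
        # A mean well above the median points to right skew.
        "mean_minus_median": stats["mean"] - stats["median"],
        # Gap between the 90th and 10th percentiles as a rough spread measure.
        "p90_minus_p10": stats["p90"] - stats["p10"],
    }
    print(group, summary[group])
```

Both groups show a mean far above the median, and the percentile gap is wider for Australian-born households, which is consistent with the answers given.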

    (c) Suppose the minister has net household wealth of $600,000. What can you say about their financial circumstances relative to other Australian-born households?

    The minister's household has greater wealth than at least 90% of Australian-born households in Australia. They

  • 7

    are amongst the wealthiest 10% of Australian households.

    6. SIA: Sydney housing prices.

    Figure 3.2 depicts a scatter plot of Sydney housing prices versus distance from Sydney. The unit of observation is a suburb, price is the mean of the median price of houses sold in each suburb for two quarters (September and December 2002) and distance is measured in kilometers from Sydney's CBD.

    (a) What would you expect the correlation to be between price and distance?

    There is an inverse relationship between distance to the CBD and price, so we expect the correlation to be negative.

    (b) Does it appear that there is a linear relationship between the two variables?

    The relationship does not look linear, largely because of the large variability in prices for suburbs close to the CBD. (These observations also tend to distort what the relationship looks like for the bulk of the data. If you were to eliminate these outliers, it is not clear what the relationship would look like for the remainder of the data.)

    (c) What other key features of these data can be determined from the plot?

  • 8

    We have already mentioned the large variability in prices for suburbs close to the CBD. More formally, the variance of prices close to the CBD (the conditional variance) is much larger than the variance of prices further away from the CBD.

    There are other outliers around 30 km from the CBD (Clareville, Palm Beach and Whale Beach).

    There is no suspicion that these outliers are due to errors. All are feasible observations.

    Can see that the price and distance variables are both skewed to the right.

    There are numerous suburbs where there were no sales. Most of these are suburbs relatively close to the CBD.

    What should we do with the zero sales observations when we analyse the data? They are not data errors of the kind that sometimes occur. But they are not real zeros either, as we don't know what the price would have been had there been sales in the period in question.

    [Figure 3.2: House prices in Sydney suburbs versus distance to CBD. Scatter plot; horizontal axis: Distance to CBD (kms); vertical axis: Price $.]

  • 9

    7. Anzac Grange wants to develop guidelines for setting prices of cars according to the car's age. They hire a business consultant who chooses a sample of 117 second-hand passenger car advertisements collected from www.drive.com.au and retrieves data on age and price of the cars.

    (a) The business consultant first calculates the correlation coefficient between age and price and finds it to be -0.278. Interpret this result.

  • 10

    Correlation coefficients lie between -1 and 1. A negative value suggests an inverse relationship between the variables. A magnitude of (-)0.278 suggests that the relationship is not very strong.

    (b) Then the business consultant constructs a simple linear regression model using price (in dollars) as the dependent variable, and age (in years) as the independent variable. This model can be represented by:

    \( price_i = \beta_0 + \beta_1 age_i + u_i \)

    Interpret the ordinary least squares coefficient estimates, found to be b0 = 47,467 and b1 = -2,658.

    The estimated slope coefficient of -2,658 suggests that for each additional year of age, second-hand car prices are expected to drop by $2,658. The sign is as expected: older cars tend to have a lower value. The sign is also consistent with the negative correlation coefficient. Literally, the intercept is the predicted price of second-hand cars with age = 0, i.e. $47,467. As is sometimes the case, interpretation of intercepts may be somewhat problematic. In this particular situation all second-hand cars have age > 0.
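To make the interpretation concrete, the fitted line can be evaluated at a few ages. This is a minimal sketch that uses the estimates given in the question (b0 = 47,467 and b1 = -2,658); the function name `predicted_price` and the ages chosen are illustrative assumptions. It also squares the correlation from part (a), since in a simple linear regression with one explanatory variable the R-squared equals the squared correlation coefficient.

```python
B0 = 47_467   # estimated intercept in dollars, as given in the question
B1 = -2_658   # estimated slope in dollars per year of age


def predicted_price(age_years):
    """Predicted price from the fitted line: b0 + b1 * age."""
    return B0 + B1 * age_years


# Illustrative ages only; the original sample's age range is not reported here.
for age in (1, 5, 10):
    print(f"age {age:>2} years: predicted price ${predicted_price(age):,}")

# Each extra year of age changes the prediction by exactly the slope.
print("change per year:", predicted_price(6) - predicted_price(5))

# Squared correlation: share of price variation associated with age.
r = -0.278
r_squared = r ** 2
print(f"r squared = {r_squared:.3f}")
```

The small squared correlation is in line with the answer to part (a): age alone accounts for only a modest share of the variation in advertised prices.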