
Print Namevolweb.utk.edu/.../uploads/2018/05/Exam-1-Spring-201… · Web viewMosaic plot Leaf report Split history True/False Questions Circle the best answer (2 points each) T F


Print Name KEY

Statistics 201 Spring 2018

Exam 1 – Version B

Check One Below

I am enrolled in the following section:

(Check One)  Section #  Time               Days  Instructor      Exam Room
             001        11:15 – 12:05 pm   MWF   Missy Morris
             002        1:25 – 2:15 pm     MWF   Terry Higgins
             003        8:00 – 8:50 am     MWF   Missy Morris
             004        2:10 – 3:25 pm     TR    Charlie Cwiek
             005        9:05 – 9:55 am     MWF   Missy Morris
             006        11:10 – 12:25 pm   MWF   Jamie Paul
             007        12:20 – 1:10 pm    MWF   Terry Higgins
             008        10:10 – 11:00 am   MWF   Missy Morris
             009        8:10 – 9:25 am     TR    Jamie Paul
             010        12:40 – 1:55 pm    TR    Jamie Paul
             011        5:05 – 6:20 pm     TR    Brian Stevens

Make sure you are in the proper test room listed for your section. 10 points are deducted from exams taken in the improper room.

REPORT ALL ANSWERS TO TWO DECIMALS


EXAM SCORE: ____________


1. The following side-by-side boxplots show the HS GPA of Stat 201 students. Use this information to complete the questions below.

i) (2 points) Which group has the smallest range? Circle the best answer:

Youngest In the Middle Oldest Only Child

ii) (3 points) What is the IQR for the boxplot for oldest children?

About 0.45 GPA points (Accept answers between 0.35 and 0.55)

iii) (2 points) Which group has the lowest minimum? Circle the best answer:

Youngest In the Middle Oldest Only Child


2. The following scatterplot shows the relationship between “Longest Awake” and “Hours of Sleep Nightly” for a sample of Stat 201 students from Spring 2018. Both variables are in hours.

i) (6 points) Before one calculates a correlation coefficient (r), there are 3 conditions that must be checked. Name each condition below and indicate if each condition is met in this case.

Condition 1 is: Quantitative variable condition

Is the condition met here? Circle one: YES NO

Briefly explain your YES or NO choice below: Both longest hours awake and hours of sleep nightly are quantitative variables

Condition 2 is: Straight Enough

Is the condition met here? Circle one: YES NO

Briefly explain your YES or NO choice below: A straight line goes through the data. NOTE FOR GRADER: This is a different condition than the outlier condition. We meet this straight enough condition.

Condition 3 is: No Outliers

Is the condition met here? Circle one: YES NO

Briefly explain your YES or NO choice below: There is an extreme outlier. Someone reported sleeping 14 hours a night and only being awake 18 hours for longest awake.

NOTE – Carefully read what they write. There is lots of variation in answers. See if they understand the concept with their answers.


Question 2 (continued)

The regression output below was created from the data on the previous page:

ii) (2 points) Regardless of your answer to part (i), find the value of the correlation coefficient for the scatterplot on the previous page.

-0.3059

(Take off one point if the answer is not negative)

To analyze additional variables and their relationships, a correlation matrix was created. Use the matrix below to answer the questions. Do not consider correlations of 1.0000 in your answers below.

iii) (3 points) What two variables have the weakest correlation?

HS GPA and Hours of Sleep Nightly

iv) (2 points) How many unique pairs of correlations are there for the variables above?

10
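The count in part (iv) is the number of ways to choose 2 variables out of n. A minimal sketch in Python, assuming the correlation matrix (not reproduced here) contains five variables, which is the count consistent with the key's answer of 10; the function name `unique_pairs` is hypothetical:

```python
from math import comb


def unique_pairs(n_variables: int) -> int:
    """Count distinct variable pairs in an n x n correlation matrix.

    The diagonal (each variable with itself, r = 1.0000) is excluded,
    and (A, B) is treated as the same pair as (B, A).
    """
    return comb(n_variables, 2)


# Assumption: the matrix has 5 variables (the figure is not shown here).
pairs = unique_pairs(5)
print(pairs)
```

Equivalently, an n x n matrix has n*n cells; dropping the n diagonal cells and halving the rest gives (n*n - n) / 2 pairs.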


3. A local high school tracks the GPA of their graduating seniors. The GPAs are normally distributed with a mean of 3.6 points and a standard deviation of 0.1 points [which can be represented as N(3.6, 0.1)]. Use the graphics below to answer the questions on the next page.

N(3.6, 0.1): Area above 3.75 is 0.0668

N(3.6, 0.1): Area between 3.35 and 3.85 is 0.9876

N(0, 1): Area below 0.5 is 0.6915

N(0, 1): The bottom 90% is below 1.2818

N(0, 1): The middle 90% is between -1.645 and 1.645

N(0, 1): The top 80% starts at -0.841


i) (2 points) What percent of students have a GPA lower than 3.85?

0.9938 or 99.38%

ii) (3 points) What percent of students had a GPA higher than 3.65?

0.3085 or 30.85%

iii) (3 points) What is the GPA for a student in the 95th percentile?

3.7645
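The three answers above can be cross-checked numerically. This is a sketch using Python's standard-library `statistics.NormalDist` rather than the exam's graphics; the variable names are illustrative only:

```python
from statistics import NormalDist

# GPA model from the question: N(3.6, 0.1)
gpa = NormalDist(mu=3.6, sigma=0.1)

# i) Proportion of students with a GPA lower than 3.85
below_385 = gpa.cdf(3.85)

# ii) Proportion of students with a GPA higher than 3.65
above_365 = 1 - gpa.cdf(3.65)

# iii) GPA at the 95th percentile
pct95 = gpa.inv_cdf(0.95)

print(round(below_385, 4), round(above_365, 4), round(pct95, 4))
```

With the graphics alone, the same values come from symmetry: (i) the area outside 3.35 to 3.85 is 1 - 0.9876 = 0.0124, half of which lies above 3.85; (ii) 3.65 is z = 0.5, so the area above is 1 - 0.6915; (iii) the middle 90% ends at z = 1.645, so the 95th percentile is 3.6 + 1.645(0.1).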

iv) (6 points) Using ONLY the 68-95-99.7 rule, provide as small an interval as possible that contains the answer to the question below. Fill in the blanks below with your answer:

“Between 2.5% and 0.15% of students have GPAs above 3.82.”

NOTE: Count each of the answers above as 3 points. The order of the numbers does not matter. Give the student 2 points out of 3 if the student gets close to the answer, like 2.35 instead of 2.5.
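The two bounds in part (iv) follow from where 3.82 falls in standard-deviation units. A short sketch of that reasoning in Python, using only the 68-95-99.7 percentages; the variable names are illustrative:

```python
mean, sd = 3.6, 0.1

# 3.82 sits about 2.2 standard deviations above the mean,
# so it lies between the +2 SD and +3 SD marks.
z = (3.82 - mean) / sd

# 95% of values fall within 2 SD, so half of the remaining 5% is above +2 SD.
upper_bound = (100 - 95) / 2

# 99.7% fall within 3 SD, so half of the remaining 0.3% is above +3 SD.
lower_bound = (100 - 99.7) / 2

print(round(z, 2), round(upper_bound, 2), round(lower_bound, 2))
```

Because 3.82 is above the +2 SD mark (3.8) and below the +3 SD mark (3.9), the percent of students above 3.82 must be less than 2.5% and more than 0.15%.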


4. The following is a portion of a larger data set for employees at a company.

- SSN: The Social Security Number of the employee
- Name: The name of the employee
- Age: The age of the employee
- State Born: The abbreviation of the state the employee was born in
- Education: The level of education of the employee
- Children: Assessed as 0=has children, 1=does not have children

i) (3 points) Identify each column as a categorical variable [CAT], a quantitative variable [QUANT] or an identifier variable [ID]. Write either CAT, QUANT or ID above each column.

ID CAT QUANT CAT CAT CAT

ii) (3 points) Name two variables from the list above that can be used together to make a mosaic plot.

Any two of these together:

Name, State Born, Education, Children

iii) (3 points) Name two variables from the list above that can be used together to make stacked histograms.

Must have: Age

One of the following: Name, State Born, Education, Children


5. Employees at a small company were asked how much they spend on lunch per day. Each value was rounded to the nearest dollar. Use the data set below to answer the following questions.

12, 5, 5, 14, 9, 10, 14, 6, 15, 55

i) (2 points) What is the range of values spent on lunch?

50

ii) (3 points) What is the average amount spent on lunch?

14.5

iii) (2 points) What is the median amount spent on lunch?

11

iv) (4 points) What is the best measure of center and spread for these data? Note – Your answers below should be in English, not numbers.

Best measure of center for these data:

Median

Best measure of spread for these data:

IQR
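The numeric answers for Question 5 can be reproduced directly from the ten values. This is a sketch with Python's standard library; the comparison without the 55 value is added here only to illustrate why the median is preferred, and is not part of the exam:

```python
from statistics import mean, median

lunch = [12, 5, 5, 14, 9, 10, 14, 6, 15, 55]

value_range = max(lunch) - min(lunch)  # largest minus smallest
average = mean(lunch)                  # sum of values / number of values
middle = median(lunch)                 # mean of the 5th and 6th sorted values

# Illustration only: drop the extreme value (55) and recompute.
without_outlier = [x for x in lunch if x != 55]
average_trimmed = mean(without_outlier)
middle_trimmed = median(without_outlier)

print(value_range, average, middle)
print(average_trimmed, middle_trimmed)
```

Removing the single value of 55 moves the mean from 14.5 to 10 but the median only from 11 to 10, which is why the median (and the IQR, for spread) is the better summary for data with an outlier.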


6. A cellphone company is interested in determining if the number of texts customers send on Saturdays can be used to predict the number of texts those customers send on Wednesday. Some output from JMP is included below:

i) (4 points) Interpret the value of the slope of the least squares regression equation in the context of this problem. Be sure to use actual variable names and units of measure.

For each additional Saturday text, we expect Wednesday texts to increase by 0.5105 on average.

Note: Give grades 0-4 based on how well they interpreted the slope.


Problem 6 (continued)

ii) (3 points) Does this y-intercept have any practical interpretation? (Circle One): YES NO If yes, interpret that value. If no, explain why it has no practical interpretation.

Individuals who text 0 times on Saturday are expected to text 8.45 times on Wednesday on average.

iii) (4 points) Report the value of R², and interpret its value (don’t comment on the magnitude of the number, just explain what the number means).

76.77% of the variation in Wednesday texts is explained by the variation in Saturday texts.

iv) (2 points) When doing regression, we check the same conditions that we check when we do correlation, but there is one additional condition that we check. Name the condition and comment on whether or not the condition is met. (If necessary, refer to the output on the previous page when you make your comments.)

Regression condition: “Does the plot thicken?” condition (also known as the “plot doesn’t thicken” condition)

Is the condition met here? (Circle one): YES NO

Briefly explain your answer: The plot gets a lot thicker.

v) (4 points) A person in the data set texted 50 times on Saturday. The residual for this person is 7.0225. Use this information and the space below to solve for how many times they texted on Wednesday. Circle your final answer.

Predicted Wednesday = 8.45 + 0.51(Saturday Texts)

8.45255119 + 0.5104999(50) = 33.977

Actual – predicted = residual

41 = 33.977 + 7.0225

THE ANSWER IS 41 (Or very close to)
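The steps in part (v) can be written out as a short calculation. A minimal sketch in Python, using the coefficients quoted in the key above; the variable names are illustrative:

```python
# Coefficients as quoted in the key (from the JMP output, not shown here)
b0 = 8.45255119   # intercept
b1 = 0.5104999    # slope: expected change in Wednesday texts per Saturday text

saturday_texts = 50
residual = 7.0225

# Predicted Wednesday texts from the least squares line
predicted = b0 + b1 * saturday_texts

# residual = actual - predicted, so actual = predicted + residual
actual = predicted + residual

print(round(predicted, 3), round(actual))
```

A positive residual means the person texted more on Wednesday than the line predicts, so the actual value must be larger than the predicted value.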


7. The following is a pivot chart created in Excel of some Stat 201 survey data. The pivot table summarizes the average number of languages spoken by whether or not a person was born in Tennessee and their gender.

i) (3 points) What could be added to the pivot chart to help the reader understand the values represented by the bars in the graphic?

Data table or data labels

ii) (2 points) What additional element could be added to this pivot chart to give the chart more context?

Axis Titles

Chart titles

Legend Title

iii) (3 points) The most languages any person in the data set said they spoke was 6. If we want to use the pivot chart to find out which group they are in, what field would we change?

a) Columns
b) Rows
c) Values
d) The pivot chart can’t display this information


8. A doctor collects data on her patients to see if they are more or less likely to break a bone. She uses a decision tree to make her predictions. Use the analysis below to answer the questions on the next page.

Leaf Report


Question 8 (Continued)

i) (2 points) According to the decision tree, which patient below is most likely to break a bone?

a) A patient who takes 1 monthly selfie and has a longest awake value of 40.
b) A patient who takes 6 monthly selfies and has a longest awake value of 20.
c) A patient who takes 1 monthly selfie and has a longest awake value of 20.
d) A patient who takes 6 monthly selfies and has a longest awake value of 40.

ii) (2 points) What two types of variables can a decision tree use to make predictions?

Categorical and quantitative

iii) (3 points) What percent of the patients who have stayed awake greater than or equal to 28 hours have broken a bone?

203/375 = 54.133%

iv) (2 points) What additional graphic could be added to the decision tree report on the previous page to show the value of R-squared as we created the splits?

a) Pareto plot
b) Mosaic plot
c) Leaf report
d) Split history


True/False Questions

Circle the best answer

(2 points each)

T F Outliers in a boxplot start at 2 IQRs above Q3 and 2 IQRs below Q1.

T F When the correlation is above 0.90 or below -0.90, it is safe to conclude that changes in X cause changes in Y.

T F Outliers can drastically impact the value of R².

T F Correlation is described with shape, center and spread.

T F When checking the significance of a correlation coefficient, a p-value of 0.01 would imply that the correlation is not significant (i.e. possibly due to random chance).

T F If one knows the mean and standard deviation of a data set, then all values in the data set can be converted into z-scores.


FORMULA PAGE:

ŷ = b0 + b1x

e = y − ŷ
