21
Applied Multivariate Statistics Fall Semester 2014 University of Mannheim Department of Economics Chair of Statistics Toni Stocker

Slides 0

Embed Size (px)

DESCRIPTION

Presentation

Citation preview

  • Applied Multivariate StatisticsFall Semester 2014

    University of MannheimDepartment of EconomicsChair of StatisticsToni Stocker

  • Applied Multivariate Statistics (AMS) - ContentIntroduction to AMS

    Principal Component Analysis (PCA)

    Correspondence Analysis

    Matrix Algebra

    Multivariate Samples

    Biplots

    Multidimensional Scaling (MDS)Cluster Analysis

    Linear Discriminant Analysis (LDA)Binary Response Models

    Factor Analysis

    1

    20

    57

    77

    129

    141

    152

    170

    183

    194

    212

  • Introduction to AMS

    1

  • 2 All other students:should have attended two or more courses in Statistics(descriptive statistics, estimating and hypothesis testing)

    Students in Economics from Mannheim: no problem

    A course in Basic Econometrics is helpful but not strictly required.

    The statistical software R will intensively be used throughout thiscourse. Students who are not yet familiar with R should workthrough chapters 1-5 of the R introduction (see course folder) ontheir own by September 12 at the latest.

    If you are not yet sure whether you will attend this course, you mayread sections 1.1 and 1.2 in Johnson & Wichern (see p. 3) to get anidea about the purposes of this course.

    Though R is easy to learn, you need to invest some time at the beginning. But you may benefit from it for a long time.

    Prerequisites

    General Course Information

  • Day Time LocationLecture Friday 10:15-11:45 L7, 3-5, P043Tutorial Friday 08:30-10:15 L7, 3-5, P043Extra-R-Tutorial Monday 12:00-13:30 t.b.a.

    Office Hour: Wednesday, 3:30-5:00 p.m. or by appointment

    Office: L7, 3-5, 1st floor, room 143

    Phone: 0621-181-3963

    Email: [email protected]

    General Course Information

    Tutorials start in the 2nd week.

    3

    Time and Locations

    Contact

  • Slides (Lecture), Assignments (Tutorials), Introduction to R (see p. 1)Material will be updated weekly (Friday) to find in course folder at Studierendenportal (ILIAS)

    General Course Information

    R. Johnson, D. Wichern (2007):Applied Multivariate Statistical Analysis; Pearson Intl. Ed.

    4

    Main Reference

    A. J. Izenman (2008):Modern Multivariate Statistical Techniques; Springer.

    P. Hewson (2009):Multivariate Statistics with R; Open Text Book.

    A. Rencher (2002):Methods of Multivariate Analysis; Wiley.

    W. Hrdle, L. Simar (2003):Applied Multivariate Statistical Analysis; Wiley.

    Course Material

    References

  • Exam + Assignments:80% written exam (120 minutes) + 20% Assignmentsin terms of points to earn in total.

    Example:Points

    Written Exam: 60 (from 80)Assignments: 18 (from 20): Total: 78 (from 100)=> Grading will be based on 78 points (from 100)Minimum for passing: 40

    Assignments:Need to submit homework and attend tutorial. To get full points (20) youneed to work at least on 10 assignments (out of 11) in a meaningful way.(See Guidelines for Assignments)

    5

    Examination

  • Multivariate analysis consists of a collection of methods that can beused when several measurements are made on each individual orobject in one or more samples.

    6

    See Renchner (2002), p.1

    Objectives Dimension reduction and structural simplification

    Grouping, discrimination and classification

    Investigation of the dependence among variables

    Visualization of high-dimensional data

    (see also J+W (2007), p.2)

    What is it about?

    Issues of Applied Multivariate Statistics (AMS)

    Close link to other areas such as Exploratory Data Analysis (EDA) and Data Mining

  • What is it about?

    7

    Example 1: Dimension Reduction

    Economic Indicators for the 27 European Union Countries in 2011(see WIREs Comput Stat 2012, 4:399406. doi: 10.1002/wics.1200)

  • What is it about?

    8

    Example 2: Modern Graphical Techniques

  • What is it about?

    9

    Example 2 ...

  • What is it about?

    10

    Example 3: Factor Analysis

    Consumer Preference (J&W, example 9.9, p. 508)

    =

    179.011.085.001.079.0150.071.042.011.050.0113.096.085.071.013.0102.001.042.096.002.01

    R

    T

    a

    s

    t

    e

    G

    o

    o

    d

    b

    u

    y

    f

    o

    r

    m

    o

    n

    e

    y

    F

    l

    a

    v

    o

    r

    S

    u

    i

    t

    a

    b

    l

    e

    f

    o

    r

    s

    n

    a

    c

    k

    P

    r

    o

    v

    i

    d

    e

    s

    l

    o

    t

    s

    o

    f

    e

    n

    e

    r

    g

    y

  • What is it about?

    Example 4: Distances

    11

    Voting results for 15 congressmen from New Jersey(example from R package HSAUR)

    Hunt(R) Sandman(R) Howard(D)Hunt(R) 0 8 15Sandman(R) 8 0 17Howard(D) 15 17 0

    Extraction from the distance matrix ...

  • What is it about?

    Example 5: Grouping

    12

  • What is it about?

    13

    Example 5 ...

  • What is it about?

    Example 6: Classification

    14

    Labor Market Participation of Married Women in Switzerland (1981)(example from R package AER)

  • What is it about?

    Example 7: Discrimination

    15

    Heights and weights of students

  • What is it about?

    Example 8: Correspondence Analysis

    16

    Baccalaurat in France (Hrdle & Simar, p. 313)

  • What is it about?

    Generally: Chapters 1-4, 8, 9, 11, 12 from Johnson and Wichern (J&W)

    Timetable and Contents

    Lecture 3: Matrix Algebra (part 2)

    Lecture 2: Matrix Algebra (part 1)

    Lecture 4: Multivariate Samples

    Lecture 5: Principal Component Analysis

    Lecture 1: Introduction (today)

    17

    Lecture 6: Biplots

    Lecture 7: Factor Analysis

    Course Outline

  • What is it about?

    Lecture 8: Multidimensional Scaling

    Lecture 10: Linear Discriminant Analysis

    18

    Lecture 13: Used as time buffer

    Lecture 11: Binary Response Models

    Lecture 12: Correspondence Analysis

    Lecture 9: Cluster Analysis

    Note: This is just a plan! Topics may be skipped; order

    may be changed; lecture topics may overlap

  • What is it about?

    ... at the end of the semester you

    know and (hopefully) understand most common methods foranalyzing multivariate data and their theoretical background

    Generally: This is an introductory and applied course. Modern multivariate techniques based on machine learning algorithms will hardly be covered.

    19

    can proficiently use R when using multivariate techniques:data import, constructing graphics, inference, model diagnosis and assessment

    have experienced the possibilities and limitations ofmultivariate methods on the basis of real data examples

    Main Objectives