Package ‘MMST’

February 15, 2013

Type Package

Title DATASETS FROM MMST

Version 0.6-1.1

Date 2010-07-04

Author Keith Halbert <[email protected]>

Maintainer Keith Halbert <[email protected]>

Description The datasets from Modern Multivariate Statistical Techniques by Alan Julian Izenman are contained in this package. The documentation descriptions show the page numbers of references to the data set within the text. See the text for detailed descriptions of the datasets. Also included in this package is a function for exporting these datasets en masse.

License GPL (>= 2)

LazyLoad yes

Repository CRAN

Date/Publication 2011-02-11 16:58:30

Depends R (>= 2.10)

NeedsCompilation no

R topics documented:

MMST-package
AirlineDistances
alontop
baseball
bodyfat
boston
BritishTowns
bupa
cleveland
color.stimuli
COMBO17
covertype
detergent
diabetes
ecoli
foetal
food
geyser
gilgaied.soil
glass
Hidalgo1872
ionosphere
iris
letter.recognition
leukemia
lloyd
MEG
MMST.out
morse
ncifinal
norwaypaper1
pendigits
pet
pima
primate.scapulae
psych24r
root.stocks
satimage
shoplifting
shuttle
soldat
sonar
spambase
steganography
swiss.roll
SwissBankNotes
tobacco
tumors
turtles
ushighways
vehicle
wdbc
wine
x498.matrix
yeast
Index

MMST-package DATASETS FROM MMST

Description

Data sets for Modern Multivariate Statistical Techniques, by A. Izenman (2008).

Details

Package: MMST
Type: Package
Version: 0.6-1
Date: 2010-07-04
License: GPL (>= 2)
LazyLoad: yes

Author(s)

Keith Halbert <[email protected]>

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

See Also

MMST.out

AirlineDistances MMST AIRLINE DISTANCES DATA

Description

airline distances, 464, 467, 481, 482, 484

Usage

data(AirlineDistances)

Format

A distance matrix of the distances in kilometers between the following 18 cities: Beijing, Cape Town, Hong Kong, Honolulu, London, Melbourne, Mexico, Montreal, Moscow, New Delhi, New York, Paris, Rio de Janeiro, Rome, San Francisco, Singapore, Stockholm, and Tokyo.

Source

National Geographic Society (1995), National Geographic Atlas of the World, Rev. 6th Edition, National Geographic

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

alontop MMST ALONTOP DATA

Description

colon cancer, 19, 20, 443, 444, 446

Usage

data(alontop)

Format

A data frame with 62 observations (tissue samples) on 92 numeric variables (a subset of a larger set of more than 6500 genes).

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., and Levine, A. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proceedings of the National Academy of Sciences, 96, 6745-6750.

baseball MMST BASEBALL DATA

Description

Major League Baseball salaries, 307, 308, 309, 368

Usage

data(baseball)

Format

A data frame with 337 observations on the following 18 variables.

salary a numeric vector

BA a numeric vector

OBP a numeric vector

Runs a numeric vector

Hits a numeric vector

X2B a numeric vector

X3B a numeric vector

HR a numeric vector

RBI a numeric vector

BB a numeric vector

SO a numeric vector

SB a numeric vector

E a numeric vector

FAE a numeric vector

FA a numeric vector

AE a numeric vector

A a numeric vector

Name a character vector

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Watnik, M.R. (1998). Pay for play: are baseball salaries based on performance? Journal of Statistics Education, 6.

bodyfat MMST BODYFAT DATA

Description

bodyfat, 116, 125, 128, 146, 148, 150, 151, 154

Usage

data(bodyfat)

Format

A data frame with 252 observations on the following 15 variables.

density a numeric vector

bodyfat a numeric vector

age a numeric vector

weight a numeric vector

height a numeric vector

neck a numeric vector

chest a numeric vector

abdomen a numeric vector

hip a numeric vector

thigh a numeric vector

knee a numeric vector

ankle a numeric vector

biceps a numeric vector

forearm a numeric vector

wrist a numeric vector

Source

http://lib.stat.cmu.edu/datasets/

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

boston MMST BOSTON HOUSING DATA

Description

Boston housing, 158

Usage

data(boston)

Format

A data frame with 506 observations on the following 14 variables.

crim a numeric vector

zn a numeric vector

indus a numeric vector

chas a numeric vector

nox a numeric vector

rm a numeric vector

age a numeric vector

dis a numeric vector

rad a numeric vector

tax a numeric vector

ptratio a numeric vector

black a numeric vector

lstat a numeric vector

medv a numeric vector

Source

http://lib.stat.cmu.edu/datasets/

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

BritishTowns MMST BRITISHTOWNS DATA

Description

British towns, 504

Usage

data(BritishTowns)

Format

A distance matrix of the distances in kilometers between the following 48 British towns: Aberdeen, Aberystwyth, Barnstaple, Birmingham, Brighton, Bristol, Cambridge, Cardiff, Carlisle, Carmarthen, Colchester, Dorchester, Dov, Edinburgh, Exeter, Fort.William, Glasgow, Gloucester, Guildford, He, Holyhead, Hull, Inverness, Kendal, Leeds, Lincoln, Liverpool, Maidstone, Manchester, Middlesborough, Newcastle, Northampton, Norwich, Nottingham, Oxford, Penzance, Perth, Plymouth, Preston, Salisbury, Sheffield, Shre, Southampton, Stoke, Stranraer, Taunton, York, and London.

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

bupa MMST BUPA LIVER DISORDERS DATA

Description

BUPA liver disorders, 258, 260, 348, 387, 508

Usage

data(bupa)

Format

A data frame with 345 observations on the following 7 variables.

mcv a numeric vector

alkphos a numeric vector

sgpt a numeric vector

sgot a numeric vector

gammagt a numeric vector

drinks a numeric vector

group a factor with levels 1 2

Source

http://archive.ics.uci.edu/ml/datasets.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

cleveland MMST CLEVELAND DATA

Description

Cleveland heart-disease, 284, 286, 287, 289, 291, 314, 368

Usage

data(cleveland)

Format

A data frame with 296 observations on the following 15 variables.

age a numeric vector

gender a factor with levels fem male

cp a factor with levels abnang angina asympt notang

trestbps a numeric vector

chol a numeric vector

fbs a factor with levels fal true

restecg a factor with levels abn hyp norm

thalach a numeric vector

exang a factor with levels fal true

oldpeak a numeric vector

slope a factor with levels down flat up

ca a numeric vector

thal a factor with levels fix norm rev

diag1 a factor with levels buff sick

diag2 a factor with levels H S1 S2 S3 S4

Source

http://archive.ics.uci.edu/ml/datasets.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

color.stimuli MMST COLOR.STIMULI DATA

Description

perceptions of color, 468, 469, 503

Usage

data(color.stimuli)

Format

A distance matrix of colors represented by the following wavelengths (nanometers): 434, 445, 465, 472, 490, 504, 537, 555, 584, 600, 610, 628, 651, and 674.

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Ekman, G. (1954). Dimensions of color vision, Journal of Psychology, 38, 467-474.

COMBO17 MMST COMBO17 DATA

Description

COMBO-17 galaxy photometric catalogue, 216, 219, 235

Usage

data(COMBO17)

Format

A data frame with 3462 observations on 65 numeric variables.

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Wolf, C., Meisenheimer, M., Kleinheinrich, M., Borch, A., Dye, S., Gray, M., Wisotski, L., Bell, E.F., Rix, H.-W., Cimatti, A., Hasinger, G., and Szokoly, G. (2004). A catalogue of the Chandra Deep Field South with multi-colour classification and photometric redshifts from COMBO-17, Astronomy & Astrophysics, arXiv:astro-ph/0403666v1.

covertype MMST COVERTYPE DATA

Description

covertype, 279

Usage

data(covertype)

Format

A data frame with 581012 observations on 55 variables.

Source

http://kdd.ics.uci.edu/databases/covertype/

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

detergent MMST DETERGENT DATA

Description

laundry detergent, 156

Usage

data(detergent)

Format

A data frame with 12 observations on 1173 variables. There are five Y variables, representing four compounds in an aqueous solution (the fifth Y variable is the amount of water in the solution). The X input variables consist of mid-infrared spectrum values recorded as the absorbances at 1168 equally spaced frequencies in the detergent.

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

diabetes MMST DIABETES DATA

Description

diabetes, 272, 278, 348, 391

Usage

data(diabetes)

Format

A data frame with 145 observations on the following 6 variables.

glucose.area a numeric vector

insulin.area a numeric vector

SSPG a numeric vector

relative.weight a numeric vector

fasting.plasma.glucose a numeric vector

class a numeric vector

Source

http://lib.stat.cmu.edu/datasets/Andrews/

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Andrews, D.F. and Herzberg, A.M. (1985). Data, New York: Springer.

ecoli MMST ECOLI DATA

Description

e-coli, 273, 279, 348, 391

Usage

data(ecoli)

Format

A data frame with 336 observations on the following 9 variables.

label a character vector

mvg a numeric vector

gvh a numeric vector

lip a numeric vector

chg a numeric vector

aac a numeric vector

alm1 a numeric vector

alm2 a numeric vector

site a factor with levels cp im imL imS imU om omL pp

Source

http://archive.ics.uci.edu/ml/datasets.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

foetal MMST FOETAL DATA

Description

cutaneous potential recordings of a pregnant woman, 554, 556, 592

Usage

data(foetal)

Format

A data frame with 2500 observations of ECG points. The first variable is a simple timestep, the next five channels are measured near the fetus (abdominal signals) and the last three channels were placed on the mother’s thorax (chest).

timestep a numeric vector

ab1 a numeric vector

ab2 a numeric vector

ab3 a numeric vector

ab4 a numeric vector

ab5 a numeric vector

th1 a numeric vector

th2 a numeric vector

th3 a numeric vector
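
The channel grouping described in this Format section can be recovered from the column names alone. The following is a minimal sketch that assumes only the documented names (timestep, ab1 to ab5, th1 to th3). The data frame `sim` is a simulated stand-in filled with random numbers, not the real recordings, and the object names `abdominal` and `thoracic` are illustrative.

```r
# Simulated stand-in with the documented column names.
# The values are random noise, NOT the real ECG recordings.
set.seed(1)
n <- 10
channels <- c(paste0("ab", 1:5), paste0("th", 1:3))
sim <- data.frame(
  timestep = seq_len(n),
  matrix(rnorm(n * length(channels)), nrow = n,
         dimnames = list(NULL, channels))
)

# Split the channels by name prefix: "ab" = abdominal, "th" = thoracic.
abdominal <- sim[, grepl("^ab", names(sim)), drop = FALSE]
thoracic  <- sim[, grepl("^th", names(sim)), drop = FALSE]

names(abdominal)
names(thoracic)
```

With the package installed, the same two subsetting lines should apply to the object loaded by data(foetal), provided its column names match the list above.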

Source

http://homes.esat.kuleuven.be/~smc/daisy/daisydata.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

de Lathauwer, L., de Moor, B., Vandewalle, J. (2000). Fetal electrocardiogram extraction by blind source subspace separation, IEEE Transactions on Biomedical Engineering, 47, 567-573. Proceedings of the IEEE SP/Athos Workshop on Higher-Order Statistics, Girona, Spain, pp. 134-138.

food MMST FOOD DATA

Description

nutritional value of food, 196, 198, 206, 208, 462, 612, 613, 631

Usage

data(food)

Format

A data frame with 961 observations on the following 7 variables.

Fat.grams a numeric vector

Food.energy.calories a numeric vector

Carbohydrates.grams a numeric vector

Protein.grams a numeric vector

Cholesterol.mg a numeric vector

weight.grams a numeric vector

Saturated.fat.grams a numeric vector

Source

http://www.ntwrks.com/~mikev/chart1.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

geyser MMST GEYSER DATA

Description

Old Faithful Geyser, 99, 100, 409, 410

Usage

data(geyser)

Format

A data frame with 107 observations on the following 2 variables.

X1 a numeric vector

X2 a numeric vector

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Weisberg, S. (1985). Applied Linear Regression, Second Edition, New York: Wiley.

gilgaied.soil MMST GILGAIED SOIL DATA

Description

gilgaied soil, 271, 278, 367

Usage

data(gilgaied.soil)

Format

A data frame with 48 observations on the following 11 variables.

pH a numeric vector

N a numeric vector

BD a numeric vector

P a numeric vector

Ca a numeric vector

Mg a numeric vector

K a numeric vector

Na a numeric vector

cond a numeric vector

Block.no. a factor with levels 1 2 3 4

Group.no. a factor with levels 1 2 3 4 5 6 7 8 9 10 11 12

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Horton, I.F., Russell, J.S., and Moore, A.W. (1968). Multivariate covariance and canonical analysis: a method for selecting the most effective discriminators in a multivariate situation, Biometrics, 24, 845-858.

glass MMST GLASS DATA

Description

forensic glass, 273, 348, 391, 508, 550

Usage

data(glass)

Format

A data frame with 214 observations on the following 10 variables.

RI a numeric vector

Na a numeric vector

Mg a numeric vector

Al a numeric vector

Si a numeric vector

K a numeric vector

Ca a numeric vector

Ba a numeric vector

Fe a numeric vector

type a factor with levels 1 2 3 5 6 7

Source

http://archive.ics.uci.edu/ml/datasets.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Hidalgo1872 MMST HIDALGO 1872 STAMP DATA

Description

Hidalgo postage stamps, 93, 96, 98

Usage

data(Hidalgo1872)

Format

A data frame with 485 observations on the following 3 variables.

thickness a numeric vector

thicknessA a numeric vector

thicknessB a numeric vector

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Izenman, A.J. and Sommer, C.J. (1988). Philatelic mixtures and multimodal densities, Journal of the American Statistical Association, 83, 941-953.

ionosphere MMST IONOSPHERE DATA

Description

ionosphere, 258, 260, 348, 387

Usage

data(ionosphere)

Format

A data frame with 351 observations on 34 continuous variables and 1 factor, classifying the observation as Good (show some type of structure in the ionosphere) or Bad (pass through the ionosphere).

Source

http://archive.ics.uci.edu/ml/datasets.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

iris MMST IRIS DATA

Description

iris, 235, 274, 278, 348, 391

Usage

data(iris)

Format

A data frame with 150 observations on the following 5 variables. This is R.A. Fisher’s classic dataset.

sepal.length a numeric vector

sepal.width a numeric vector

petal.length a numeric vector

petal.width a numeric vector

type a factor with levels Iris-setosa Iris-versicolor Iris-virginica

Source

http://archive.ics.uci.edu/ml/datasets.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

letter.recognition MMST LETTER.RECOGNITION DATA

Description

letter recognition, 274, 348, 391

Usage

data(letter.recognition)

Format

A data frame with 20000 observations on the following 17 variables. V1 through V16 are primitive numeric attributes, scaled to fit into an integer range of 0 to 15.

letter a factor with levels A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

V1 a numeric vector

V2 a numeric vector

V3 a numeric vector

V4 a numeric vector

V5 a numeric vector

V6 a numeric vector

V7 a numeric vector

V8 a numeric vector

V9 a numeric vector

V10 a numeric vector

V11 a numeric vector

V12 a numeric vector

V13 a numeric vector

V14 a numeric vector

V15 a numeric vector

V16 a numeric vector

Source

http://archive.ics.uci.edu/ml/datasets.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

leukemia MMST LEUKEMIA DATA

Description

leukemia (ALL/AML), 451, 453, 461

Usage

data(leukemia)

Format

A data frame with 72 observations on 7140 variables.

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Golub, T.R., Slonim, D.K., Tamayo, P., Huard, M., Gaasenbeek, J.P., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., and Lander, E.S. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science, 286, 531-537.

lloyd MMST LLOYD’S BANK DATA

Description

employee careers at Lloyds Bank, 477, 478, 489

Usage

data(samp05)
data(samp25)
data(samp05d)
data(samp25d)

Format

There are two data frames and two distance matrices.

Details

Four data sets are used in this analysis; the data function must be called once for each of them.
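
The repeated calls to data can be wrapped in a loop. The sketch below is one possible way to do this and is not part of the package: `load_datasets` is a hypothetical helper name, and the demonstration uses two data sets from base R's datasets package so that it runs without MMST installed.

```r
# Hypothetical helper (not part of MMST): load several data sets from one
# package into an environment and return that environment.
load_datasets <- function(names, package, envir = new.env()) {
  for (nm in names) {
    data(list = nm, package = package, envir = envir)
  }
  envir
}

# Demonstration with data sets that ship with base R.
e <- load_datasets(c("iris", "mtcars"), package = "datasets")
sort(ls(e))

# With MMST installed, the analogous call would presumably be:
# lloyd <- load_datasets(c("samp05", "samp25", "samp05d", "samp25d"), "MMST")
```

Loading into a separate environment keeps the four objects together and avoids overwriting objects of the same name in the global workspace.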

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Stovel, K., Savage, M., and Bearman, P. (1996). Ascription into Achievement: Models of career systems at Lloyds Bank, 1890-1970. American Journal of Sociology, 102, 358-399.

MEG MMST MEG DATA

Description

identifying artifacts in MEG recordings, 569

Usage

data(MEG)

Format

A data frame with 122 observations on 17730 unnamed variables.

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Vigario, R., Jousmaki, V., Hamalainen, M., Hari, R., and Oja, E. (1998). Independent component analysis for identification of artifacts in magnetoencephalographic recordings, In: Advances in Neural Information Processing Systems, 10, pp. 229-235, Cambridge, MA: MIT Press.

MMST.out MMST DATA SET OUTPUT

Description

Function to output data sets for Modern Multivariate Statistical Techniques, by A. Izenman (2008), to a single destination

Usage

MMST.out(dest.folder, datasets = 'all')

Arguments

dest.folder String containing path to destination folder for files

datasets Vector of strings, each component being the name of a desired dataset (default is to output all the data sets contained in the package)

Details

The datasets will be tab delimited with file extension .txt. This task could be done manually using write.table, and this is what the user should do if they are particular about the format of the exported dataset. The reason this function exists is for one to be able to easily export every dataset in the book at a single stroke.
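
The manual route mentioned here might look like the following sketch. It uses `iris` from base R's datasets package as a stand-in for an MMST data set and writes to a temporary folder. The options `quote = FALSE` and `row.names = FALSE` are illustrative choices, not necessarily the settings MMST.out uses internally.

```r
# Sketch: export one data frame as a tab-delimited .txt file by hand.
# datasets::iris stands in for an MMST data set.
out_dir  <- tempdir()                       # hypothetical destination folder
out_file <- file.path(out_dir, "iris.txt")

write.table(datasets::iris, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back to confirm the round trip.
check <- read.table(out_file, header = TRUE, sep = "\t")
dim(check)
```

Other arguments of write.table, such as `sep`, `na`, or `dec`, can be changed in the same call when a different file format is needed.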

Value

NULL

Note

The datasets of class dist are exported as symmetric matrices
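
To illustrate what this conversion means, the sketch below builds a small dist object from three made-up points and expands it with as.matrix. This is an assumption about how such an export could be done by hand, not a description of the function's internals. The points and the file name are hypothetical.

```r
# Three hypothetical points on a line; a dist object stores only the
# lower triangle of the pairwise distances.
pts <- matrix(c(0, 3, 7), ncol = 1,
              dimnames = list(c("a", "b", "c"), NULL))
d <- dist(pts)

# as.matrix() expands the dist object into a full symmetric matrix
# with zeros on the diagonal.
m <- as.matrix(d)
isSymmetric(m)

# The full matrix can then be written as a tab-delimited text file.
out_file <- file.path(tempdir(), "toy_dist.txt")
write.table(m, file = out_file, sep = "\t", quote = FALSE)
```

A matrix read back from such a file can be turned into a dist object again with as.dist.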

Author(s)

Keith Halbert <[email protected]>

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

See Also

write.table

Examples

## Not run:
MMST.out('C:/output') ## exports all the book's datasets
MMST.out('C:/output', 'bodyfat') ## exports single dataset
MMST.out('C:/output', c('bodyfat', 'tobacco')) ## exports two datasets

## End(Not run)

morse MMST MORSE DATA

Description

confusion of Morse-code signals, 469, 470, 503, 504

Usage

data(morse)

Format

A data frame with 36 numeric observations on variables representing each of the 36 alphanumeric characters.

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Rothkopf, E.Z. (1957). A measure of stimulus similarity and errors in some paired-associate learning, Journal of Experimental Psychology, 53, 94-101.

ncifinal MMST NCIFINAL DATA

Description

National Cancer Institute, 461

Usage

data(ncifinal)

Format

A data frame with 5244 observations on 62 variables.

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

http://www.cancer.gov/

norwaypaper1 MMST NORWAYPAPER1 DATA

Description

Norwegian paper quality, 166, 167, 190, 193, 194

Usage

data(norwaypaper1)

Format

A data frame with 29 observations on the following 22 variables.

Y1 a numeric vector

Y2 a numeric vector

Y3 a numeric vector

Y4 a numeric vector

Y5 a numeric vector

Y6 a numeric vector

Y7 a numeric vector

Y8 a numeric vector

Y9 a numeric vector

Y10 a numeric vector

Y11 a numeric vector

Y12 a numeric vector

Y13 a numeric vector

X1 a numeric vector

X2 a numeric vector

X3 a numeric vector

X4 a numeric vector

X5 a numeric vector

X6 a numeric vector

X7 a numeric vector

X8 a numeric vector

X9 a numeric vector

Source

http://lib.stat.cmu.edu/datasets/

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Aldrin, M. (1996). Moderate projection pursuit regression for multivariate response data, Computational Statistics & Data Analysis, 21, 501-531.

pendigits MMST PENDIGITS DATA

Description

pen-based handwritten digit recognition, 211, 234, 274, 348, 391, 631

Usage

data(pendigits)

Format

A data frame with 10992 observations on 36 unnamed variables.

Source

http://archive.ics.uci.edu/ml/datasets.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

pet MMST PET DATA

Description

PET yarns, 130, 133, 134, 136, 137, 142, 144, 156

Usage

data(pet)

Format

A data frame with 28 observations on 270 variables.

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Swierenga, H., de Weijer, A.P., van Wijk, R.J., and Buydens, L.M.C. (1999). Strategy for constructing robust multivariate calibration models, Chemometrics and Intelligent Laboratory Systems, 49, 1-17.

pima MMST PIMA DATA

Description

Pima Indians diabetes, 292, 294, 296, 298, 299, 301, 302, 314, 368, 549

Usage

data(pima)

Format

A data frame with 532 observations on the following 9 variables.

npregnant a numeric vector

glucose a numeric vector

diastolic.bp a numeric vector

skinfold.thickness a numeric vector

bmi a numeric vector

pedigree a numeric vector

age a numeric vector

classdigit a factor with levels 0 1

class a factor with levels diabetic normal

Source

http://archive.ics.uci.edu/ml/datasets.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

primate.scapulae MMST PRIMATE.SCAPULAE DATA

Description

primate scapulae, 274, 279, 280, 348, 391, 420, 421, 461

Usage

data(primate.scapulae)

Format

A data frame with 105 observations on the following 11 variables.

genus a numeric vector

AD.BD a numeric vector

AD.CD a numeric vector

EA.CD a numeric vector

Dx.CD a numeric vector

SH.ACR a numeric vector

EAD a numeric vector

beta a numeric vector

gamma a numeric vector

class a factor with levels Gorilla Homo Hylobates Pan Pongo

classdigit a factor with levels 1 2 3 4 5

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Ashton, K.H., Oxnard, C.E., and Spence, T.F. (1965). Scapular shape and primate classification, Proceedings of the Zoological Society, London, 145, 125-142.

psych24r MMST PSYCH24R DATA

Description

psychological tests, 587, 588, 595

Usage

data(psych24r)

Format

A data frame with 301 observations on the following 31 variables.

Case a numeric vector

Sex a factor with levels F M

Age a numeric vector

Grp a numeric vector

V1 a numeric vector

V2 a numeric vector

V3 a numeric vector

V4 a numeric vector

V5 a numeric vector

V6 a numeric vector

V7 a numeric vector

V8 a numeric vector

V9 a numeric vector

V10 a numeric vector

V11 a numeric vector

V12 a numeric vector

V13 a numeric vector

V14 a numeric vector

V15 a numeric vector

V16 a numeric vector

V17 a numeric vector

V18 a numeric vector

V19 a numeric vector

V20 a numeric vector

V21 a numeric vector

V22 a numeric vector

V23 a numeric vector

V24 a numeric vector

V25 a numeric vector

V26 a numeric vector

group a factor with levels GRANT PASTEUR

Source

http://www.psych.yorku.ca/friendly/lab/files/psy6140/data/psych24r.sas

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

root.stocks MMST ROOT STOCKS DATA

Description

root-stocks of apple trees, 193

Usage

data(root.stocks)

Format

A data frame with 104 observations on the following 5 variables.

type a factor with levels I II III IV IX V VI VII X XII XIII XV XVI

Y1 a numeric vector

Y2 a numeric vector

Y3 a numeric vector

Y4 a numeric vector

Source

http://lib.stat.cmu.edu/datasets/Andrews/

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Andrews, D.F. and Herzberg, A.M. (1985). Data, New York: Springer.

satimage MMST SATIMAGE DATA

Description

Landsat satellite image, 428, 431, 436, 438, 461

Usage

data(satimage)

Format

A data frame with 4435 observations on 37 variables.

Source

http://www.liacc.up.pt/

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

shoplifting MMST SHOPLIFTING DATA

Description

shoplifting in the Netherlands, 634, 635, 646, 647

Usage

data(shoplifting)

Format

A data frame with 18 observations on the following 13 variables.

clothing a numeric vector

accessories a numeric vector

tobacco a numeric vector

writing a numeric vector

books a numeric vector

records a numeric vector

goods a numeric vector

sweets a numeric vector

toys a numeric vector

jewelry a numeric vector

perfume a numeric vector

tools a numeric vector

other a numeric vector

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

van der Heijden, P.G.M., de Falguerolles, A., and de Leeuw, J. (1989). A combined approach to contingency table analysis using correspondence analysis and log-linear analysis, Applied Statistics, 38, 249-292.

shuttle MMST SHUTTLE DATA

Description

shuttle, 274, 348, 391

Usage

data(shuttle)

Format

A data frame with 43500 observations on 10 unnamed numeric variables.

Source

http://archive.ics.uci.edu/ml/datasets.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

soldat MMST SOLDAT DATA

Description

aqueous solubility in drug discovery, 514, 515

Usage

data(soldat)

Format

A data frame with 5631 observations on 72 input variables and 1 output variable.

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Culp, M., Johnson, K., and Michailidis, G. (2006). ada: an R package for stochastic boosting, Journal of Statistical Software, 17, 2.

sonar MMST SONAR DATA

Description

sonar, 259, 260, 348, 387

Usage

data(sonar)

Format

A data frame with 208 observations on 61 unnamed variables.

Source

http://archive.ics.uci.edu/ml/datasets.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

spambase MMST SPAMBASE DATA

Description

spambase, 259, 260, 278, 348, 385, 387, 512, 549

Usage

data(spambase)

Format

A data frame with 4601 observations on the following 59 variables.

make a numeric vector

address a numeric vector

all a numeric vector

xd a numeric vector

our a numeric vector

over a numeric vector

remove a numeric vector

internet a numeric vector

order a numeric vector

mail a numeric vector

receive a numeric vector

will a numeric vector

people a numeric vector

report a numeric vector

addresses a numeric vector

free a numeric vector

business a numeric vector

email a numeric vector

you a numeric vector

credit a numeric vector

your a numeric vector

font a numeric vector

x000 a numeric vector

money a numeric vector

hp a numeric vector

hpl a numeric vector

george a numeric vector

x650 a numeric vector

lab a numeric vector

labs a numeric vector

telnet a numeric vector

x857 a numeric vector

data a numeric vector

x415 a numeric vector

x85 a numeric vector

technology a numeric vector

x1999 a numeric vector

parts a numeric vector

pm a numeric vector

direct a numeric vector

cs a numeric vector

meeting a numeric vector

original a numeric vector

project a numeric vector

re a numeric vector

edu a numeric vector

table a numeric vector

conference a numeric vector

x. a numeric vector

x.. a numeric vector

x...1 a numeric vector

x..1 a numeric vector

x..2 a numeric vector

x..3 a numeric vector

crla a numeric vector

crll a numeric vector

crrt a numeric vector

classdigit a factor with levels 0 1

class a factor with levels email spam

Source

http://archive.ics.uci.edu/ml/datasets.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer
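
The Format list above contains two label columns, classdigit (levels 0 1) and class (levels email spam). They appear to encode the same outcome, although the manual does not say so. The following is a minimal sketch, under that assumption, of checking whether one column is a one-to-one recoding of the other before dropping it. The helper name labels_are_redundant and the toy values are hypothetical and are not part of the package.

```r
# Hypothetical helper: TRUE when two label columns are a one-to-one recoding
# of each other (every level of `a` pairs with exactly one level of `b`,
# and vice versa).
labels_are_redundant <- function(a, b) {
  tab <- table(a, b)
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# Toy stand-in carrying the two documented label columns (values are made up).
toy <- data.frame(
  make       = c(0.00, 0.21, 0.06, 0.00),
  classdigit = factor(c(0, 1, 1, 0)),
  class      = factor(c("email", "spam", "spam", "email"))
)
redundant <- labels_are_redundant(toy$classdigit, toy$class)

# With the package installed, the same check can be tried on the real data.
if (requireNamespace("MMST", quietly = TRUE)) {
  data(spambase, package = "MMST")
  if (labels_are_redundant(spambase$classdigit, spambase$class)) {
    spambase$classdigit <- NULL  # keep only the descriptive label
  }
}
```

The same check may be useful for vehicle and wine, whose Format lists also show a classdigit column next to class.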

steganography MMST STEGANOGRAPHY DATA

Description

steganography, 344, 345

Usage

data(steganography)

Format

A data frame with 1000 observations on 73 variables.

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Kahn, D. (1996). The history of steganography, Proceedings of Information Hiding, First International Workshop, Cambridge, U.K.

swiss.roll MMST SWISS ROLL DATA

Description

Swiss roll, 598, 617, 619, 620, 622, 623

Usage

data(swiss.roll)

Format

A data frame with 20000 observations on the following 5 variables.

X1 a numeric vector

X2 a numeric vector

X3 a numeric vector

Y1 a numeric vector

Y2 a numeric vector

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

SwissBankNotes MMST SWISSBANKNOTES DATA

Description

Swiss bank notes, 235

Usage

data(SwissBankNotes)

Format

A data frame with 200 observations on the following 6 variables.

length a numeric vector

height.left a numeric vector

height.right a numeric vector

inner.lower a numeric vector

inner.upper a numeric vector

diagonal a numeric vector

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

tobacco MMST TOBACCO DATA

Description

chemical composition of tobacco, 183, 187

Usage

data(tobacco)

Format

A data frame with 25 observations on the following 9 variables.

Y1.BurnRate a numeric vector

Y2.PercentSugar a numeric vector

Y3.PercentNicotine a numeric vector

X1.PercentNitrogen a numeric vector

X2.PercentChlorine a numeric vector

X3.PercentPotassium a numeric vector

X4.PercentPhosphorus a numeric vector

X5.PercentCalcium a numeric vector

X6.PercentMagnesium a numeric vector

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Anderson, R.L. and Bancroft, T.A. (1952). Statistical Theory in Research, New York: McGraw-Hill.

tumors MMST TUMORS DATA

Description

four childhood tumors, 541, 545, 550

Usage

data(tumors)

Format

A data frame with 2308 observations on 90 variables.

Source

http://research.nhgri.nih.gov/microarray/Supplement/

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Khan, J., Wei, J.S., Ringner, M., Saal, L.H., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C.R., Peterson, C., and Meltzer, P.S. (2001). Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks, Nature Medicine, 7, 673-679.

turtles MMST TURTLES DATA

Description

turtle carapaces, 234

Usage

data(turtles)

Format

A data frame with 48 observations on the following 4 variables.

sex a factor with levels f m

length a numeric vector

width a numeric vector

height a numeric vector

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

ushighways MMST USHIGHWAYS DATA

Description

U.S. highways, 106

Usage

data(ushighways)

Format

A data frame with 221 observations on the following 4 variables.

Interstate a numeric vector

State a factor with levels AL AR CA CO CT DE FL GA IA ID IL IN KS KY LA MA MD ME MI MN MO MS NC NE NH NJ NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY

Approx.Miles a numeric vector

Location a character vector

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Rand McNally (1992), Rand McNally 1993 Business Traveler’s Road Atlas and Guide to Major Cities, Rand McNally

vehicle MMST VEHICLE DATA

Description

vehicle, 274, 302, 304, 348, 391

Usage

data(vehicle)

Format

A data frame with 564 observations on the following 20 variables.

Comp a numeric vector

Circ a numeric vector

Dcirc a numeric vector

RR a numeric vector

PrAxisAR a numeric vector

MaxLAR a numeric vector

ScatterR a numeric vector

Elong a numeric vector

PrAxisRect a numeric vector

MaxLRect a numeric vector

SvarMajAxis a numeric vector

SvarMinAxis a numeric vector

SradGyration a numeric vector

SkewMajAxis a numeric vector

SkewMinAxis a numeric vector

KurtMinAxis a numeric vector

KurtMajAxis a numeric vector

Hratio a numeric vector

classdigit a factor with levels 1 2 3 4

class a factor with levels bus opel saab van

Source

http://archive.ics.uci.edu/ml/datasets.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

wdbc MMST WDBC DATA

Description

Wisconsin diagnostic breast cancer, 239, 241, 246, 247, 249, 251, 255, 256, 260, 279, 348, 387, 462, 508, 550

Usage

data(wdbc)

Format

A data frame with 569 observations on the following 32 variables.

id a numeric vector

class a factor with levels B M

radius.mv a numeric vector

texture.mv a numeric vector

peri.mv a numeric vector

area.mv a numeric vector

smooth.mv a numeric vector

comp.mv a numeric vector

scav.mv a numeric vector

ncav.mv a numeric vector

symt.mv a numeric vector

fracd.mv a numeric vector

radius.sd a numeric vector

texture.sd a numeric vector

peri.sd a numeric vector

area.sd a numeric vector

smooth.sd a numeric vector

comp.sd a numeric vector

scav.sd a numeric vector

ncav.sd a numeric vector

symt.sd a numeric vector

fracd.sd a numeric vector

radius.ev a numeric vector

texture.ev a numeric vector

peri.ev a numeric vector

area.ev a numeric vector

smooth.ev a numeric vector

comp.ev a numeric vector

scav.ev a numeric vector

ncav.ev a numeric vector

symt.ev a numeric vector

fracd.ev a numeric vector

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Street, W.N., Wolberg, W.H., and Mangasarian, O.L. (1993). Nuclear feature extraction for breast tumor diagnosis, IS&T/SPIE International Symposium on Electronic Imaging: Science and Technology (San Jose, CA), 1905, 861-870.
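
The Format list above mixes an identifier (id), a label (class), and 30 numeric measurements. A minimal sketch of separating the identifier and label from the measurements before a numeric analysis such as principal components is given below. The helper name split_features and the toy values are hypothetical; only the column names come from the Format list.

```r
# Hypothetical helper: split a data frame into its non-feature columns and a
# numeric feature matrix, so identifiers and labels never enter the analysis.
split_features <- function(df, drop = c("id", "class")) {
  keep <- setdiff(names(df), drop)
  list(meta = df[, intersect(names(df), drop), drop = FALSE],
       x    = as.matrix(df[, keep, drop = FALSE]))
}

# Toy stand-in using four of the documented column names (values are made up).
toy <- data.frame(
  id         = 1:6,
  class      = factor(c("B", "M", "B", "M", "B", "M")),
  radius.mv  = c(12.1, 20.3, 11.8, 19.5, 13.0, 21.2),
  texture.mv = c(17.5, 22.1, 15.9, 24.0, 18.2, 21.7)
)
parts <- split_features(toy)

# Principal components on the standardized measurements only.
pca <- prcomp(parts$x, center = TRUE, scale. = TRUE)
```

With the package installed, the same helper could presumably be applied after data(wdbc), assuming the loaded data frame matches the Format list.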

wine MMST WINE DATA

Description

wine, 275, 278, 348, 391

Usage

data(wine)

Format

A data frame with 178 observations on the following 15 variables.

Alcohol a numeric vector

MalicAcid a numeric vector

Ash a numeric vector

AlcAsh a numeric vector

Mg a numeric vector

Phenols a numeric vector

Flav a numeric vector

NonFlavPhenols a numeric vector

Proa a numeric vector

Color a numeric vector

Hue a numeric vector

OD a numeric vector

Proline a numeric vector

classdigit a factor with levels 1 2 3

class a factor with levels Barbera Barolo Grignolino

Source

http://archive.ics.uci.edu/ml/datasets.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

x498.matrix MMST 498 MATRIX DATA

Description

mapping the protein universe, 484, 486

Usage

data(x498.matrix)

Format

A distance matrix mapping 498 unnamed variables representing proteins.

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Hou, J., Sims, G.E., Zhang, C., and Kim, S.-H. (2003). A global representation of the protein fold space, Proceedings of the National Academy of Sciences, 100, 2386-2390.
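
Because the Format entry describes a distance matrix rather than a data frame of observations, one natural way to look at it is an embedding by classical multidimensional scaling. The sketch below assumes the object is square and coercible with as.matrix, which the manual does not state. The helper name embed_distances and the toy distances are hypothetical.

```r
# Hypothetical helper: embed a square dissimilarity matrix in k dimensions
# using classical (metric) multidimensional scaling from the stats package.
embed_distances <- function(d, k = 2) {
  d <- as.matrix(d)
  stopifnot(nrow(d) == ncol(d))
  cmdscale(as.dist(d), k = k)
}

# Toy stand-in: pairwise distances among four points on a line (not real data).
toy_d  <- as.matrix(dist(c(0, 1, 3, 6)))
toy_xy <- embed_distances(toy_d, k = 1)
```

With the package installed, a call such as embed_distances(x498.matrix, k = 2) after data(x498.matrix) would be the analogous step, provided the loaded object really is a square numeric matrix.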

yeast MMST YEAST DATA

Description

yeast, 275, 279, 348, 391, 508

Usage

data(yeast)

Format

A data frame with 1484 observations on the following 10 variables.

yeast a character vector

mcg a numeric vector

gvh a numeric vector

alm a numeric vector

mit a numeric vector

erl a numeric vector

pox a numeric vector

vac a numeric vector

nuc a numeric vector

site a factor with levels CYT ERL EXC ME1 ME2 ME3 MIT NUC POX VAC

Source

http://archive.ics.uci.edu/ml/datasets.html

References

A. Izenman (2008), Modern Multivariate Statistical Techniques, Springer

Index

∗Topic IO

MMST-package, MMST.out

∗Topic datasets

AirlineDistances, alontop, baseball, bodyfat, boston, BritishTowns, bupa, cleveland, color.stimuli, COMBO17, covertype, detergent, diabetes, ecoli, foetal, food, geyser, gilgaied.soil, glass, Hidalgo1872, ionosphere, iris, letter.recognition, leukemia, lloyd, MEG, MMST-package, morse, ncifinal, norwaypaper1, pendigits, pet, pima, primate.scapulae, psych24r, root.stocks, satimage, shoplifting, shuttle, soldat, sonar, spambase, steganography, swiss.roll, SwissBankNotes, tobacco, tumors, turtles, ushighways, vehicle, wdbc, wine, x498.matrix, yeast

AirlineDistances, alontop

baseball, bodyfat, boston, BritishTowns, bupa

cleveland, color.stimuli, COMBO17, covertype

detergent, diabetes

ecoli

foetal, food

geyser, gilgaied.soil, glass

Hidalgo1872

ionosphere, iris

letter.recognition, leukemia, lloyd

MEG, MMST (MMST-package), MMST-package, MMST.out, morse

ncifinal, norwaypaper1

pendigits, pet, pima, primate.scapulae, psych24r

root.stocks

samp05 (lloyd), samp05d (lloyd), samp25 (lloyd), samp25d (lloyd), satimage, shoplifting, shuttle, soldat, sonar, spambase, steganography, swiss.roll, SwissBankNotes

tobacco, tumors, turtles

ushighways

vehicle

wdbc, wine, write.table

x498.matrix

yeast