Statistics_in_Musicology, By Jan Beran


  • 7/22/2019 Statistics_in_Musicology, By Jan Beran


    CHAPMAN & HALL/CRC

    A CRC Press Company

    Boca Raton London New York Washington, D.C.

    Interdisciplinary Statistics

    STATISTICS in

    MUSICOLOGY

    Jan Beran

    2004 CRC Press LLC


    This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

    Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

    The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.

    Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

    Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

    Visit the CRC Press Web site at www.crcpress.com

    2004 by Chapman & Hall/CRC

    No claim to original U.S. Government works

    International Standard Book Number 1-58488-219-0

    Library of Congress Card Number 2003048488

    Printed in the United States of America 1 2 3 4 5 6 7 8 9 0

    Printed on acid-free paper

    Library of Congress Cataloging-in-Publication Data

    Beran, Jan, 1959-
    Statistics in musicology / Jan Beran.
    p. cm. (Interdisciplinary statistics series)
    Includes bibliographical references (p. ) and indexes.
    ISBN 1-58488-219-0 (alk. paper)
    1. Musical analysis--Statistical methods. I. Title. II. Interdisciplinary statistics
    MT6.B344 2003
    781.2--dc21 2003048488


    Contents

    Preface

    1 Some mathematical foundations of music
      1.1 General background
      1.2 Some elements of algebra
      1.3 Specific applications in music

    2 Exploratory data mining in musical spaces
      2.1 Musical motivation
      2.2 Some descriptive statistics and plots for univariate data
      2.3 Specific applications in music - univariate
      2.4 Some descriptive statistics and plots for bivariate data
      2.5 Specific applications in music - bivariate
      2.6 Some multivariate descriptive displays
      2.7 Specific applications in music - multivariate

    3 Global measures of structure and randomness
      3.1 Musical motivation
      3.2 Basic principles
      3.3 Specific applications in music

    4 Time series analysis
      4.1 Musical motivation
      4.2 Basic principles
      4.3 Specific applications in music

    5 Hierarchical methods
      5.1 Musical motivation
      5.2 Basic principles
      5.3 Specific applications in music

    6 Markov chains and hidden Markov models
      6.1 Musical motivation
      6.2 Basic principles
      6.3 Specific applications in music

    7 Circular statistics
      7.1 Musical motivation
      7.2 Basic principles
      7.3 Specific applications in music

    8 Principal component analysis
      8.1 Musical motivation
      8.2 Basic principles
      8.3 Specific applications in music

    9 Discriminant analysis
      9.1 Musical motivation
      9.2 Basic principles
      9.3 Specific applications in music

    10 Cluster analysis
      10.1 Musical motivation
      10.2 Basic principles
      10.3 Specific applications in music

    11 Multidimensional scaling
      11.1 Musical motivation
      11.2 Basic principles
      11.3 Specific applications in music

    List of figures

    References


    Preface

    An essential aspect of music is structure. It is therefore not surprising that a connection between music and mathematics was recognized long before our time. Perhaps best known among the ancient "quantitative musicologists" are the Pythagoreans, who found fundamental connections between musical intervals and mathematical ratios. An obvious reason why mathematics comes into play is that a musical performance results in sound waves that can be described by physical equations. Perhaps more interesting, however, is the intrinsic organization of these waves that distinguishes music from "ordinary noise". Also, since music is intrinsically linked with human perception, emotion, and reflection as well as the human body, the scientific study of music goes far beyond physics. For a deeper understanding of music, a number of different sciences, such as psychology, physiology, history, physics, mathematics, statistics, computer science, semiotics, and of course musicology (to name only a few), need to be combined. This, together with the lack of available data, prevented, until recently, a systematic development of quantitative methods in musicology. In the last few years, the situation has changed dramatically. Collection of quantitative data is no longer a serious problem, and a number of mathematical and statistical methods have been developed that are suitable for analyzing such data. Statistics is likely to play an essential role in future developments of musicology, mainly for the following reasons: a) statistics is concerned with finding structure in data; b) statistical methods and structures are mathematical and can often be carried over to various types of data, which makes statistics an ideal interdisciplinary science that can link different scientific disciplines; and c) musical data are massive and complex, and therefore basically useless unless suitable tools are applied to extract essential features.

    This book is addressed to anybody who is curious about how one may analyze music in a quantitative manner. Clearly, the question of how such an analysis may be done is very complex, and no ultimate answer can be given here. Instead, the book summarizes various ideas that have proven useful in musical analysis and may provide the reader with food for thought or inspiration to do his or her own analysis. Specifically, the methods and applications discussed here may be of interest to students and researchers in music, statistics, mathematics, computer science, communication, and engineering. There is a large variety of statistical methods that can be applied in music. Selected topics are discussed in this book, ranging from simple descriptive statistics to formal modeling by parametric and nonparametric processes. The theoretical foundations of each method are discussed briefly, with references to more detailed literature. The emphasis is on examples that illustrate how to use the results in musical analysis. The methods can be divided into two groups: general classical methods and specific new methods developed to solve particular questions in music. Examples illustrate on one hand how standard statistical methods can be used to obtain quantitative answers to musicological questions. On the other hand, the development of more specific methodology illustrates how one may design new statistical models to answer specific questions. The data examples are kept simple in order to be understandable without extended musicological terminology. This implies many simplifications from the point of view of music theory and leaves scope for more sophisticated analysis that may be carried out in future research. Perhaps this book will inspire the reader to join the effort.

    Chapters are essentially independent to allow selective reading. Since the book describes a large variety of statistical methods in a nutshell, it can be used as a quick reference for applied statistics, with examples from musicology.

    I would like to thank the following libraries, institutes, and museums for their permission to print various pictures, manuscripts, facsimiles, and photographs: Zentralbibliothek Zürich (Ruth Häusler, Handschriftenabteilung; Aniko Ladanyi and Michael Kotrba, Graphische Sammlung); Belmont Music Publishers (Anne Wirth); Philippe Gontier, Paris; Österreichische Post AG; Deutsche Post AG; Elisabeth von Janota-Bzowski, Düsseldorf; University Library Heidelberg; Galerie Neuer Meister, Dresden; Robert-Sterl-Haus (K.M. Mieth); Béla Bartók Memorial House (János Szirányi); Frank Martin Society (Maria Martin); Karadar-Bertoldi Ensemble (Prof. Francesco Bertoldi); col legno (Wulf Weinmann). Thanks also to B. Repp for providing us with the tempo data for Schumann's "Träumerei". I would also like to thank numerous colleagues from mathematics, statistics, and musicology who encouraged me to write this book. Finally, I would like to thank my wife and my daughter for their encouragement and support, without which this book could not have been written.

    Jan Beran

    Konstanz, March 2003


    CHAPTER 1

    Some mathematical foundations of music

    1.1 General background

    The study of music by means of mathematics goes back several thousand years. Well documented are, for instance, mathematical and philosophical studies by the Pythagorean school in ancient Greece (see e.g. van der Waerden 1979). Advances in mathematics, computer science, psychology, semiotics, and related fields, together with technological progress (in particular computer technology), led to a revival of quantitative thinking in music in the last two to three decades (see e.g. Archibald 1972, Solomon 1973, Schnitzler 1976, Balzano 1980, Götze and Wille 1985, Lewin 1987, Mazzola 1990a, 2002, Vuza 1991, 1992a,b, 1993, Keil 1991, Lendvai 1993, Lindley and Turner-Smith 1993, Genevois and Orlarey 1997, Johnson 1997; also see Hofstadter 1999, Andreatta et al. 2001, Leyton 2001, and Babbitt 1960, 1961, 1987, Forte 1964, 1973, 1989, Rahn 1980, Morris 1987, 1995, Andreatta 1997; for early accounts of mathematical analysis of music also see Graeser 1924, Perle 1955, Norden 1964). Many recent references can be found in specialized journals such as Computing in Musicology, Music Theory Online, Perspectives of New Music, Journal of New Music Research, Intégral, Music Perception, and Music Theory Spectrum, to name a few.

    Music is, to a large extent, the result of a subconscious intuitive "process". The basic question of quantitative musical analysis is to what extent music may nevertheless be described or explained partially in a quantitative manner. The German philosopher and mathematician Leibniz (1646-1716) (Figure 1.5) called music the "arithmetic of the soul". This is a profound philosophical statement; however, the difficulty is to formulate what exactly it may mean. Some composers, notably in the 20th century, consciously used mathematical elements in their compositions. Typical examples are permutations, the golden section, transformations in two or higher-dimensional spaces, random numbers, and fractals (see e.g. Schönberg, Webern, Bartók, Xenakis, Cage, Lutosławski, Eimert, Kagel, Stockhausen, Boulez, Ligeti, Barlow; Figures 1.1, 1.4, 1.15). More generally, conscious "logical" construction is an inherent part of composition. For instance, the forms of sonata and symphony were developed based on reflections about well-balanced proportions. The tormenting search for "logical perfection" is well documented in Beethoven's famous sketchbooks. Similarly, the art of counterpoint that culminated in J.S. Bach's (Figure 1.2) work relies to a high degree on intrinsically mathematical principles. A rather peculiar early account of explicit applications of mathematics is the use of permutations in change ringing in English churches since the 10th century (Fletcher 1956, Price 1969, Stewart 1992, White 1983, 1985, 1987, Wilson 1965). More standard are simple symmetries, such as retrograde (e.g. Crab fugue, or Canon cancrizans), inversion, arpeggio, or augmentation. A curious example of this sort is Mozart's "Spiegel Duett" (or mirror duet, Figures 1.6, 1.7; the attribution to Mozart is actually uncertain). In the 20th century, composers such as Messiaen or Xenakis (Xenakis 1971; Figure 1.15) attempted to develop mathematical theories that would lead to new techniques of composition. From a strictly mathematical point of view, their derivations are not always exact. Nevertheless, their artistic contributions were very innovative and inspiring. More recent, mathematically stringent approaches to music theory, or certain aspects of it, are based on modern tools of abstract mathematics, such as algebra, algebraic geometry, and mathematical statistics (see e.g. Reiner 1985, Mazzola 1985, 1990a, 2002, Lewin 1987, Fripertinger 1991, 1999, 2001, Beran and Mazzola 1992, 1999a,b, 2000, Read 1997, Fleischer et al. 2000, Fleischer 2003).

    Figure 1.1 Quantitative analysis of music helps to understand creative processes. (Pierre Boulez, photograph courtesy of Philippe Gontier, Paris; and "Jim" by J.B.)

    Figure 1.2 J.S. Bach (1685-1750). (Engraving by L. Sichling after a painting by Elias Gottlob Haussmann, 1746; courtesy of Zentralbibliothek Zürich.)

    The most obvious connection between music and mathematics is due to the fact that music is communicated in the form of sound waves. Musical sounds can therefore be studied by means of physical equations. Already in ancient Greece (around the 5th century BC), Pythagoreans found the relationship between certain musical intervals and numeric proportions, and calculated intervals of selected scales. These results were probably obtained by studying the vibration of strings. Similar studies were done in other cultures, but are mostly not well documented. In practical terms, these studies led to singling out specific frequencies (or frequency proportions) as "musically useful" and to the development of various scales and harmonic systems. A more systematic approach to the physics of musical sounds, music perception, and acoustics was initiated in the second half of the 19th century by path-breaking contributions by Helmholtz (1863) and other physicists (see e.g. Rayleigh 1896). Since then, a vast amount of knowledge has been accumulated in this field (see e.g. Backus 1969, 1977, Morse and Ingard 1968, 1986, Benade 1976, 1990, Rigden 1977, Yost 1977, Hall 1980, Berg and Stork 1995, Pierce 1983, Cremer 1984, Rossing 1984, 1990, 2000, Johnston 1989, Fletcher and Rossing 1991, Graff 1975, 1991, Roederer 1995, Rossing et al. 1995, Howard and Angus 1996, Beament 1997, Crocker 1998, Nederveen 1998, Orbach 1999, Kinsler et al. 2000, Raichel 2000). For a historical account of musical acoustics see e.g. Bailhache (2001).

    It may appear at first that once we have mastered modeling musical sounds by physical equations, music is understood. This is, however, not so. Music is not just an arbitrary collection of sounds; music is "organized sound".

    Figure 1.3 Ludwig van Beethoven (1770-1827). (Drawing by E. Dürck after a painting by J.K. Stieler, 1819; courtesy of Zentralbibliothek Zürich.)

    Figure 1.4 Anton Webern (1883-1945). (Courtesy of Österreichische Post AG.)

    Figure 1.5 Gottfried Wilhelm Leibniz (1646-1716). (Courtesy of Deutsche Post AG and Elisabeth von Janota-Bzowski.)

    Physical equations for sound waves only describe the propagation of air pressure. They do not provide, by themselves, an understanding of how and why certain sounds are connected, nor do they tell us anything (at least not directly) about the effect on the audience. As far as structure is concerned, one may even argue, for the sake of argument, that music does not necessarily need physical realization in the form of a sound. Musicians are able to hear music just by looking at a score. Beethoven (Figures 1.3, 1.16) composed his ultimate masterpieces after he lost his hearing. Thus, on an abstract level, music can be considered as an organized structure that follows certain laws. This structure may or may not express feelings of the composer. Usually, the structure is communicated to the audience by means of physical sounds, which in turn trigger an emotional experience of the audience (not necessarily identical with the one intended by the composer). The structure itself can be analyzed, at least partially, using suitable mathematical structures. Note, however, that understanding the mathematical structure does not necessarily tell us anything about the effect on the audience. Moreover, any mathematical structure used for analyzing music describes certain selected aspects only. For instance, studying symmetries of motifs in a composition by purely algebraic means ignores psychological, historical, perceptual, and other important issues. Ideally, all relevant scientific disciplines would need to interact to gain a broad understanding. A further complication is that the existence of a unique "truth" is by no means certain (and is in fact rather unlikely). For instance, a composition may contain certain structures that are important for some listeners but are ignored by others. This problem became apparent in the early 20th century with the introduction of 12-tone music. The general public was not ready to perceive the complex structures of dodecaphonic music and was rather appalled by the seemingly chaotic noise, whereas a minority of "specialized" listeners was enthusiastic. Another example is the comparison of performances. Which pianist is the best? This question has no unique answer, if any. There is no fixed gold standard and no unique solution that would represent the ultimate unchangeable truth. What one may hope for at most is a classification into types of performances that are characterized by certain quantifiable properties, without attaching a subjective judgment of "quality".

    The main focus of this book is statistics. Statistics is essential for connecting theoretical mathematical concepts with observed "reality", to find and explore structures empirically and to develop models that can be applied and tested in practice. Until recently, traditional musical analysis was mostly carried out in a purely qualitative, and at least partially subjective, manner. Applications of statistical methods to questions in musicology and performance research are very rare (for examples see Yaglom and Yaglom 1967, Repp 1992, de la Motte-Haber 1996, Steinberg 1995, Waugh 1996, Nettheim 1997, Widmer 2001, Stamatatos and Widmer 2002) and mostly consist of simple applications of standard statistical tools to confirm results or conjectures that had been known or "derived" before by musicological, historic, or psychological reasoning. An interesting overview of statistical applications in music, and many references, can be found in Nettheim (1997). The lack of quantitative analysis may be explained, in part, by the impossibility of collecting "objective" data. Meanwhile, however, due to modern computer technology, an increasing number of musical data are becoming available. An in-depth statistical analysis of music is therefore no longer unrealistic. On the theoretical side, the development of sophisticated mathematical tools such as algebra, algebraic geometry, mathematical statistics, and their adaptation to the specific needs of music theory, made it possible to pursue a more quantitative path. Because of the complex, highly organized nature of music, existing, mostly qualitative, knowledge about music must be incorporated into the process of mathematical and statistical modeling. The statistical methods that will be discussed in the subsequent chapters can be divided into two categories:

    1. Classical methods of mathematical statistics and exploratory data analysis: many classical methods can be applied to analyze musical structures, provided that suitable data are available. A number of examples will be discussed. The examples are relatively simple from the point of view of musicology, the purpose being to illustrate how the appropriate use of statistics can yield interesting results, and to stimulate the reader to invent his or her own statistical methods that are appropriate for answering specific musicological questions.

    2. New methods developed specifically to answer concrete questions in musicology: in the last few years, questions in music composition and performance led to the development of new statistical methods that are specifically designed to solve questions such as classification of performance styles, identification and modeling of metric, melodic, and harmonic structures, quantification of similarities and differences between compositions and performance styles, automatic identification of musical events and structures from audio signals, etc. Some of these methods will be discussed in detail.

    A mathematical discipline that is concerned specifically with abstract definitions of structures is algebra. Some elements of basic algebra are therefore discussed in the next section. Naturally, depending on the context, other mathematical disciplines also play an equally important role in musical analysis, and will be discussed later where necessary. Readers who are familiar with modern algebra may skip the following section. A few examples that illustrate applications of algebraic structures to music are presented in Section 1.3. An extended account of mathematical approaches to music based on algebra and algebraic geometry is given, for instance, in Mazzola (1990a, 2002) (also see Lewin 1987 and Benson 1995-2002).

    1.2 Some elements of algebra

    1.2.1 Motivation

    Algebraic considerations in music theory have gained increasing popularity in recent years. The reason is that there are striking similarities between musical and algebraic structures. Why this is so can be illustrated by a simple example: notes (or rather pitches) that differ by an octave can be considered "equivalent" with respect to their harmonic "meaning". If an instrument is tuned according to equal temperament, then, from the harmonic perspective, there are only 12 different notes. These can be represented as integers modulo 12. Similarly, there are only 12 different intervals. This means that we are dealing with the set $Z_{12} = \{0, 1, \ldots, 11\}$. The sum of two elements $x, y \in Z_{12}$, $z = x + y$ (computed modulo 12), is interpreted as the note/interval resulting from "increasing" the note/interval $x$ by the interval $y$. The set $Z_{12}$ of notes (intervals) is then an additive group (see definition below).
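    As a minimal sketch of the arithmetic described above, the following Python fragment represents pitch classes as integers modulo 12. The note names, the labeling 0 = C, and the use of MIDI note numbers are illustrative assumptions, not part of the text.

```python
# Pitch classes in equal temperament as integers modulo 12.
# Assumption for illustration: 0 = C, 1 = C#, ..., 11 = B.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def transpose(pitch_class: int, interval: int) -> int:
    """Increase a note/interval x by the interval y, i.e. z = x + y (mod 12)."""
    return (pitch_class + interval) % 12


def pitch_class_of(midi_note: int) -> int:
    """Identify notes that differ by octaves (MIDI numbering is an assumption)."""
    return midi_note % 12


g = transpose(0, 7)   # C raised by 7 semitones (a fifth)
c = transpose(g, 5)   # raised by a further 5 semitones (a fourth)
print(NOTE_NAMES[g], NOTE_NAMES[c])  # prints "G C"
```

    A fifth followed by a fourth adds up to 12 semitones, which is 0 modulo 12, so the result is the starting pitch class.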

    1.2.2 Definitions and results

    We discuss some important concepts of algebra that are useful to describe musical structures. A more comprehensive overview of modern algebra can be found in standard textbooks such as those by Albert (1956), Herstein (1975), Zassenhaus (1999), Gilbert (2002), and Rotman (2002).

    The most fundamental structures in algebra are group, ring, field, module, and vector space.

    Definition 1 Let $G$ be a nonempty set with a binary operation $+$ such that $a + b \in G$ for all $a, b \in G$ and the following holds:

    1. $(a + b) + c = a + (b + c)$ (associativity)

    2. There exists a zero element $0 \in G$ such that $0 + a = a + 0 = a$ for all $a \in G$

    3. For each $a \in G$, there exists an inverse element $(-a) \in G$ such that $(-a) + a = a + (-a) = 0$

    Then $(G, +)$ is called a group. The group $(G, +)$ is called commutative (or abelian), if for each $a, b \in G$, $a + b = b + a$. The number of elements in $G$ is called the order of the group and is denoted by $o(G)$. If the order is finite, then $G$ is called a finite group.

    In a multiplicative way this can be written as

    Definition 2 Let $G$ be a nonempty set with a binary operation $\cdot$ such that $a \cdot b \in G$ for all $a, b \in G$ and the following holds:

    1. $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ (associativity)

    2. There exists an identity element $e \in G$ such that $e \cdot a = a \cdot e = a$ for all $a \in G$

    3. For each $a \in G$, there exists an inverse element $a^{-1} \in G$ such that $a^{-1} \cdot a = a \cdot a^{-1} = e$

    Then $(G, \cdot)$ is called a group. The group $(G, \cdot)$ is called commutative (or abelian), if for each $a, b \in G$, $a \cdot b = b \cdot a$.

    For subsets we have

    Definition 3 Let $(G, \cdot)$ and $(H, \cdot)$ be groups and $H \subset G$. Then $H$ is called a subgroup of $G$.

    Some groups can be generated by a single element of the group:

    Definition 4 Let $(G, \cdot)$ be a group with $n < \infty$ elements denoted by $a^i$ ($i = 0, 1, \ldots, n - 1$) and such that

    1. $a^0 = a^n = e$

    2. $a^i \cdot a^j = a^{i+j}$ if $i + j \le n$ and $a^i \cdot a^j = a^{i+j-n}$ if $i + j > n$

    Then $G$ is called a cyclic group. Furthermore, if $G = (a) = \{a^i : i \in Z\}$ where $a^i$ denotes the product with all $i$ terms equal to $a$, then $a$ is called a generator of $G$.

    An important notion is given in the following

    Definition 5 Let $G$ be a group that "acts" on a set $X$ by assigning to each $x \in X$ and $g \in G$ an element $g(x) \in X$. Then, for each $x \in X$, the set $G(x) = \{y : y = g(x), g \in G\}$ is called the orbit of $x$.

    Note that, given a group $G$ that acts on $X$, the set $X$ is partitioned into disjoint orbits.
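    The definitions above can be checked by brute force on the additive group $Z_{12}$ from Section 1.2.1. The sketch below is only an illustration: the function names are hypothetical, and the choice of the subgroup generated by 4 acting on $Z_{12}$ by transposition is an assumed example, not one taken from the text.

```python
from itertools import product

N = 12
Z12 = range(N)


def add(a, b):
    """Group operation of Z_12: addition modulo 12."""
    return (a + b) % N


def is_group(elements, op, identity):
    """Brute-force check of closure, associativity, identity and inverses."""
    elements = list(elements)
    closed = all(op(a, b) in elements for a, b in product(elements, repeat=2))
    associative = all(op(op(a, b), c) == op(a, op(b, c))
                      for a, b, c in product(elements, repeat=3))
    has_identity = all(op(identity, a) == a == op(a, identity) for a in elements)
    has_inverses = all(any(op(a, b) == identity == op(b, a) for b in elements)
                       for a in elements)
    return closed and associative and has_identity and has_inverses


def generated_subgroup(a):
    """Cyclic subgroup (a): everything reachable by repeatedly adding a."""
    seen, x = set(), 0
    while x not in seen:
        seen.add(x)
        x = add(x, a)
    return seen


def orbits(group_elements, action, points):
    """Partition the points into orbits G(x) = {g(x) : g in G}."""
    remaining, result = set(points), []
    while remaining:
        x = min(remaining)
        orbit = {action(g, x) for g in group_elements}
        result.append(sorted(orbit))
        remaining -= orbit
    return result


# Elements a with (a) = Z_12, i.e. the generators in the sense of Definition 4.
generators = [a for a in Z12 if len(generated_subgroup(a)) == N]

print(is_group(Z12, add, 0))                   # True
print(generators)                              # [1, 5, 7, 11]
print(orbits(generated_subgroup(4), add, Z12))
# [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]
```

    In this assumed example the four orbits are disjoint and cover all of $Z_{12}$, in line with the remark that a group action partitions the set into orbits.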

    If there are two operations $+$ and $\cdot$, then a ring is defined by

    Definition 6 Let $R$ be a nonempty set with two binary operations $+$ and $\cdot$ such that the following holds:

    1. $(R, +)$ is an abelian group

    2. $a \cdot b \in R$ for all $a, b \in R$

    3. $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ (associativity)

    4. $a \cdot (b + c) = a \cdot b + a \cdot c$ and $(b + c) \cdot a = b \cdot a + c \cdot a$ (distributive law)

    Then $(R, +, \cdot)$ is called an (associative) ring. If also $a \cdot b = b \cdot a$ for all $a, b \in R$, then $R$ is called a commutative ring.

    Further useful definitions are:

    Definition 7 Let $R$ be a commutative ring and $a \in R$, $a \neq 0$ such that there exists an element $b \in R$, $b \neq 0$ with $a \cdot b = 0$. Then $a$ is called a zero-divisor. If $R$ has no zero-divisors, then it is called an integral domain.

    Definition 8 Let $R$ be a ring such that $(R \setminus \{0\}, \cdot)$ is a group. Then $R$ is called a division ring. A commutative division ring is called a field.

    A module is defined as follows:

    Definition 9 Let $(R, +, \cdot)$ be a ring and $M$ a nonempty set with a binary operation $+$. Assume that

    1. $(M, +)$ is an abelian group

    2. For every $r \in R$, $m \in M$, there exists an element $r \cdot m \in M$

    3. $r \cdot (a + b) = r \cdot a + r \cdot b$ for every $r \in R$, $a, b \in M$

    4. $r \cdot (s \cdot a) = (r \cdot s) \cdot a$ for every $r, s \in R$, $a \in M$

    5. $(r + s) \cdot a = r \cdot a + s \cdot a$ for every $r, s \in R$, $a \in M$

    Then $M$ is called an $R$-module or module over $R$. If $R$ has a unit element $e$ and if $e \cdot a = a$ for all $a \in M$, then $M$ is called a unital $R$-module. A unital $R$-module where $R$ is a field is called a vector space over $R$.
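    To make Definitions 7 and 8 concrete, the following sketch lists the zero-divisors of the ring of integers modulo $n$ and tests whether that ring is a field. The function names are hypothetical and the moduli 12 and 7 are assumed examples.

```python
def zero_divisors(n):
    """Nonzero a in Z_n for which some nonzero b satisfies a * b = 0 (mod n)."""
    return [a for a in range(1, n)
            if any((a * b) % n == 0 for b in range(1, n))]


def is_field(n):
    """Z_n is a field if every nonzero element has a multiplicative inverse."""
    return n > 1 and all(any((a * b) % n == 1 for b in range(1, n))
                         for a in range(1, n))


print(zero_divisors(12))            # [2, 3, 4, 6, 8, 9, 10]
print(zero_divisors(7))             # []
print(is_field(12), is_field(7))    # False True
```

    For the modulus 12 used for pitch classes, products such as 3 times 4 vanish modulo 12, so $Z_{12}$ is a commutative ring but neither an integral domain nor a field.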

    There is an enormous amount of literature on groups, rings, modules, etc. Some of the standard results are summarized, for instance, in textbooks such as those given above. Here, we cite only a few theorems that are especially useful in music. We start with a few more definitions.

    Definition 10 Let $H \subset G$ be a subgroup of $G$ such that for every $a \in G$, $a \cdot H \cdot a^{-1} \subset H$. Then $H$ is called a normal subgroup of $G$.

    Definition 11 Let $G$ be such that the only normal subgroups are $H = G$ and $H = \{e\}$. Then $G$ is called a simple group.

    Definition 12 Let $G$ be a group and $H_1, \ldots, H_n$ normal subgroups such that

    $G = H_1 \cdot H_2 \cdots H_n$    (1.1)

    and any $a \in G$ can be written uniquely as a product

    $a = b_1 \cdot b_2 \cdots b_n$    (1.2)

    with $b_i \in H_i$. Then $G$ is said to be the (internal) direct product of $H_1, \ldots, H_n$.
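    Definition 12 can be illustrated on $Z_{12}$, written additively, so that the "product" in (1.2) becomes a sum modulo 12. The subgroups chosen below are assumed examples, and the helper names are hypothetical. Since $Z_{12}$ is abelian, every subgroup is normal, so only the unique-decomposition condition needs to be checked.

```python
from itertools import product

N = 12


def decompositions(a, h1, h2, n=N):
    """All pairs (b1, b2) with b1 in h1, b2 in h2 and b1 + b2 = a (mod n)."""
    return [(b1, b2) for b1, b2 in product(h1, h2) if (b1 + b2) % n == a]


def is_internal_direct_product(h1, h2, n=N):
    """True if every element of Z_n has exactly one decomposition b1 + b2."""
    return all(len(decompositions(a, h1, h2, n)) == 1 for a in range(n))


H1 = [0, 4, 8]       # subgroup generated by 4 (order 3)
H2 = [0, 3, 6, 9]    # subgroup generated by 3 (order 4)

print(is_internal_direct_product(H1, H2))            # True
print(decompositions(7, H1, H2))                     # [(4, 3)]
print(is_internal_direct_product([0, 6], H2))        # False
```

    The last line shows a pair of subgroups that fails the condition: their sums do not reach every element of $Z_{12}$, and the elements they do reach have more than one decomposition.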


    Definition 13 Let $G_1$ and $G_2$ be two groups, define $G = G_1 \times G_2 = \{(a, b) : a \in G_1, b \in G_2\}$ and the operation $\cdot$ by $(a_1, b_1) \cdot (a_2, b_2) = (a_1 a_2, b_1 b_2)$. Then the group $(G, \cdot)$ is called the (external) direct product of $G_1$ and $G_2$.

    Definition 14 Let $M$ be an $R$-module and $M_1,\dots,M_n$ submodules such that every $a \in M$ can be written uniquely as a sum
    $$a = a_1 + a_2 + \dots + a_n \tag{1.3}$$
    with $a_i \in M_i$. Then $M$ is said to be the direct sum of $M_1,\dots,M_n$.

    We now turn to the question of which subgroups of finite groups exist.

    Theorem 1 Let $H$ be a subgroup of a finite group $G$. Then $o(H)$ is a divisor of $o(G)$.

    Theorem 2 (Sylow) Let $G$ be a finite group and $p$ a prime number such that $p^m$ is a divisor of $o(G)$. Then $G$ has a subgroup $H$ with $o(H) = p^m$.

    Definition 15 A subgroup $H \subseteq G$ with $o(H) = p^m$, where $p^m$ is a divisor of $o(G)$ but $p^{m+1}$ is not a divisor, is called a $p$-Sylow subgroup.

    The next theorems help to decide whether a ring is a field.

    Theorem 3 Let $R$ be a finite integral domain. Then $R$ is a field.

    Corollary 1 Let $p$ be a prime number and $R = \mathbb{Z}_p = \{x \bmod p : x \in \mathbb{N}\}$ be the set of integers modulo $p$ (with the operations $+$ and $\cdot$ defined accordingly). Then $R$ is a field.

    An essential way to compare algebraic structures is in terms of operation-preserving mappings. The following definitions are needed:

    Definition 16 Let $(G_1, \circ)$ and $(G_2, \ast)$ be two groups. A mapping $g : G_1 \to G_2$ such that
    $$g(a \circ b) = g(a) \ast g(b) \tag{1.4}$$
    is called a (group-)homomorphism. If $g$ is a one-to-one and onto (group-)homomorphism, then it is called an isomorphism (or group-isomorphism). Moreover, if $G_1 = G_2$, then $g$ is called an automorphism (or group-automorphism).

    Definition 17 Two groups $G_1, G_2$ are called isomorphic, if there is an isomorphism $g : G_1 \to G_2$.

    Analogous definitions can be given for rings and modules:

    Definition 18 Let $R_1$ and $R_2$ be two rings. A mapping $g : R_1 \to R_2$ such that
    $$g(a + b) = g(a) + g(b) \tag{1.5}$$
    and
    $$g(a \cdot b) = g(a) \cdot g(b) \tag{1.6}$$
    is called a (ring-)homomorphism. If $g$ is a one-to-one and onto (ring-)homomorphism, then it is called an isomorphism (or ring-isomorphism). Furthermore, if $R_1 = R_2$, then $g$ is called an automorphism (or ring-automorphism).
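Corollary 1 and the homomorphism property (1.4) can be checked by brute force for small moduli. The following is only a minimal sketch; the function names `is_field_mod` and `is_additive_homomorphism` are hypothetical helpers introduced for illustration and do not come from the text.

```python
def is_field_mod(n):
    """Return True if every nonzero residue modulo n (n >= 2) has a multiplicative inverse."""
    return all(any((a * b) % n == 1 for b in range(1, n)) for a in range(1, n))


def is_additive_homomorphism(g, n1, n2):
    """Check g(a + b) = g(a) + g(b) for a map g from Z_n1 to Z_n2 by trying all pairs."""
    return all(
        g((a + b) % n1) == (g(a) + g(b)) % n2
        for a in range(n1)
        for b in range(n1)
    )


if __name__ == "__main__":
    # Z_7 is a field (7 is prime), Z_12 is not (e.g. 2 has no inverse modulo 12).
    print(is_field_mod(7), is_field_mod(12))
    # Reduction modulo 3 respects addition on Z_12, reduction modulo 5 does not.
    print(is_additive_homomorphism(lambda x: x % 3, 12, 3))
    print(is_additive_homomorphism(lambda x: x % 5, 12, 5))
```

The exhaustive search is only practical for small n, which is sufficient for the pitch-class examples used later in this chapter.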


    Definition 19 Two rings $R_1, R_2$ are called isomorphic, if there is an isomorphism $g : R_1 \to R_2$.

    Definition 20 Let $M_1$ and $M_2$ be two modules over $R$. A mapping $g : M_1 \to M_2$ such that for every $a, b \in M_1$, $r \in R$,
    $$g(a + b) = g(a) + g(b) \tag{1.7}$$
    and
    $$g(r \cdot a) = r \cdot g(a) \tag{1.8}$$
    is called a (module-)homomorphism (or a linear transformation). If $g$ is a one-to-one and onto (module-)homomorphism, then it is called an isomorphism (or module-isomorphism). Furthermore, if $M_1 = M_2$, then $g$ is called an automorphism (or module-automorphism).

    Definition 21 Two modules $M_1, M_2$ are called isomorphic, if there is an isomorphism $g : M_1 \to M_2$.

    Finally, a general family of transformations is defined by

    Definition 22 Let $g : M_1 \to M_2$ be a (module-)homomorphism. Then a mapping $h : M_1 \to M_2$ defined by
    $$h(a) = c + g(a) \tag{1.9}$$
    with $c \in M_2$ is called an affine transformation. If $M_1 = M_2 = M$, then $h$ is called a symmetry of $M$. Moreover, if $h$ is invertible, then it is called an invertible symmetry of $M$.

    Studying properties of groups is equivalent to studying groups of automorphisms:

    Theorem 4 (Cayley's theorem) Let $G$ be a group. Then there is a set $S$ such that $G$ is isomorphic to a subgroup of $A(S)$, where $A(S)$ is the set of all one-to-one mappings of $S$ onto itself.

    Definition 23 Let $S$ be a finite set with $n$ elements. Then the group $(A(S), \circ)$ (where $a \circ b$ denotes successive application of the functions $a$ and $b$) is called the symmetric group of degree $n$, and is denoted by $S_n$.

    Note that $S_n$ is isomorphic to the group of permutations of the numbers $1, 2, \dots, n$, and has $n!$ elements. Another important concept is motivated by representation in coordinates as we are used to from Euclidean geometry. The representation follows since, in terms of isomorphy, the inner and outer product can be shown to be equivalent:

    Theorem 5 Let $G = H_1 H_2 \cdots H_n$ be the internal direct product of $H_1,\dots,H_n$ and $G^{*} = H_1 \times H_2 \times \dots \times H_n$ the external direct product. Then $G$ and $G^{*}$ are isomorphic, through the isomorphism $g : G^{*} \to G$ defined by $g(a_1,\dots,a_n) = a_1 a_2 \cdots a_n$.

    This theorem implies that one does not need to distinguish between the internal and external direct product. The analogous result holds for modules:


    Theorem 6 Let $M$ be a direct sum of $M_1,\dots,M_n$. Then $M$ is isomorphic to the module $M^{*} = \{(a_1, a_2, \dots, a_n) : a_i \in M_i\}$ with the operations $(a_1, a_2, \dots) + (b_1, b_2, \dots) = (a_1 + b_1, a_2 + b_2, \dots)$ and $r \cdot (a_1, a_2, \dots) = (r \cdot a_1, r \cdot a_2, \dots)$.

    Thus, a module $M = M_1 + M_2 + \dots + M_n$ can be described in terms of its coordinates with respect to $M_i$ ($i = 1,\dots,n$), and the structure of $M$ is known as soon as we know the structure of $M_i$ ($i = 1,\dots,n$).

    Direct products can be used, in particular, to characterize the structure of finite abelian groups:

    Theorem 7 Let $(G, \cdot)$ be a finite commutative group. Then $G$ is isomorphic to the direct product of its Sylow subgroups.

    Theorem 8 Let $(G, \cdot)$ be a finite commutative group. Then $G$ is the direct product of cyclic groups.

    Similar, but slightly more involved, results can be shown for modules, but will not be needed here.
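As an illustration of Theorem 7 and of the internal direct product in Definition 12, one can verify by enumeration that the additive group of integers modulo 12 decomposes into its Sylow subgroups of orders 3 and 4. This is a sketch only; the names `H3`, `H4`, `is_subgroup` and `decompositions` are assumptions made for this example, and the group is written additively.

```python
from itertools import product

N = 12
H3 = {0, 4, 8}        # subgroup of order 3 (3-Sylow subgroup of Z_12)
H4 = {0, 3, 6, 9}     # subgroup of order 4 (2-Sylow subgroup of Z_12)


def is_subgroup(h, n=N):
    """Check that h contains 0 and is closed under addition modulo n."""
    return 0 in h and all((a + b) % n in h for a in h for b in h)


def decompositions(x, n=N):
    """List all pairs (a, b) with a in H3, b in H4 and a + b = x modulo n."""
    return [(a, b) for a, b in product(sorted(H3), sorted(H4)) if (a + b) % n == x]


if __name__ == "__main__":
    for x in range(N):
        # Each x should have exactly one decomposition, as required for a direct sum.
        print(x, decompositions(x))
```

Uniqueness of the decomposition for every element is exactly the condition in Definition 12; the orders 3 and 4 also divide 12, in agreement with Theorem 1.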

    1.3 Specific applications in music

    In the following, the usefulness of algebraic structures in music is illustrated by a few selected examples. This is only a small selection from the extended literature on this topic. For further reading see e.g. Graeser (1924), Schönberg (1950), Perle (1955), Fletcher (1956), Babbitt (1960, 1961), Price (1969), Archibald (1972), Halsey and Hewitt (1978), Balzano (1980), Rahn (1980), Götze and Wille (1985), Reiner (1985), Berry (1987), Mazzola (1990a, 2002 and references therein), Vuza (1991, 1992a,b, 1993), Fripertinger (1991), Lendvai (1993), Benson (1995-2002), Read (1997), Noll (1997), Andreatta (1997), Stange-Elbe (2000), among others.

    1.3.1 The Mathieu group

    It can be shown that finite simple groups fall into families that can be described explicitly, except for 26 so-called sporadic groups. One such group is the so-called Mathieu group $M_{12}$, which was discovered by the French mathematician Mathieu in the 19th century (Mathieu 1861, 1873; also see e.g. Conway and Sloane 1988). In their study of probabilistic properties of (card) shuffling, Diaconis et al. (1983) show that $M_{12}$ can be generated by two permutations (which they call Mongean shuffles), namely
    $$\pi_1 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ 7 & 6 & 8 & 5 & 9 & 4 & 10 & 3 & 11 & 2 & 12 & 1 \end{pmatrix} \tag{1.10}$$
    and
    $$\pi_2 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ 6 & 7 & 5 & 8 & 4 & 9 & 3 & 10 & 2 & 11 & 1 & 12 \end{pmatrix} \tag{1.11}$$
    where the lower rows denote the images of the numbers $1, \dots, 12$. The order of this group is $o(M_{12}) = 95040$ (!). An interesting application of these permutations can be found in Île de feu 2 by Olivier Messiaen (Berry 1987), where $\pi_1$ and $\pi_2$ are used to generate sequences of tones and durations.
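The stated group order can be checked numerically by generating all products of the two permutations in (1.10) and (1.11). The sketch below uses a plain breadth-first closure; the names `PI1`, `PI2`, `compose` and `generated_group` are introduced here for illustration only.

```python
from collections import deque

# Lower rows of (1.10) and (1.11): position i holds the image of i (1-based values).
PI1 = (7, 6, 8, 5, 9, 4, 10, 3, 11, 2, 12, 1)
PI2 = (6, 7, 5, 8, 4, 9, 3, 10, 2, 11, 1, 12)


def compose(p, q):
    """Return the permutation 'first q, then p'."""
    return tuple(p[q[i] - 1] for i in range(len(q)))


def generated_group(generators):
    """Return the set of all permutations generated by the given permutations."""
    identity = tuple(range(1, len(generators[0]) + 1))
    seen = {identity}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for s in generators:
            h = compose(s, g)
            if h not in seen:
                seen.add(h)
                queue.append(h)
    return seen


if __name__ == "__main__":
    group = generated_group([PI1, PI2])
    print(len(group))  # number of elements of the generated group
```

Because the group is finite, closing the identity under left multiplication by the generators is enough; inverses arise automatically as powers.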

    1.3.2 Campanology

    A rather peculiar example of group theory in action (though perhaps rather trivial mathematically) is campanology or change ringing (Fletcher 1956, Wilson 1965, Price 1969, White 1983, 1985, 1987, Stewart 1992). The art of change ringing started in England in the 17th century and is still performed today. The problem that is to be solved is as follows: there are $k$ swinging bells in the church tower. One starts playing a melody that consists of a certain sequence in which the bells are played, each bell being played only once. Thus, the initial sequence is a permutation of the numbers $1,\dots,k$. Since it is not interesting to repeat the same melody over and over, the initial melody has to be varied. However, the bells are very heavy so that it is not easy to change the timing of the bells. Each variation is therefore restricted, in that in each round a bell can exchange its position only with an adjacent bell. Thus, for instance, if $k = 4$ and the previous sequence was $(1, 2, 3, 4)$, then the only permutations obtained by exchanging a single pair are $(2, 1, 3, 4)$, $(1, 3, 2, 4)$, and $(1, 2, 4, 3)$; exchanging the two disjoint pairs simultaneously gives $(2, 1, 4, 3)$. A further, mainly aesthetic restriction is that no sequence should be repeated, except that the last one is identical with the initial sequence. A typical solution to this problem is, for instance, the Plain Bob that starts by $(1, 2, 3, 4)$, $(2, 1, 4, 3)$, $(2, 4, 1, 3)$, ... and continues until all permutations in $S_4$ are visited.
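The Plain Bob on four bells can be written down as a fixed pattern of exchanges of adjacent positions. The following sketch assumes the usual scheme for four bells (alternating a double exchange with an exchange of the middle pair, and exchanging the last pair at the end of each block of eight changes); the names `CROSS`, `MIDDLE`, `LEAD_END`, `apply_change` and `plain_bob_minimus` are hypothetical.

```python
from itertools import permutations

CROSS = (0, 2)     # exchange positions 1-2 and 3-4 simultaneously
MIDDLE = (1,)      # exchange positions 2-3
LEAD_END = (2,)    # exchange positions 3-4


def apply_change(row, swaps):
    """Exchange the adjacent positions (i, i + 1) for every index i in swaps."""
    row = list(row)
    for i in swaps:
        row[i], row[i + 1] = row[i + 1], row[i]
    return tuple(row)


def plain_bob_minimus():
    """Return the successive rows, starting and ending with (1, 2, 3, 4)."""
    lead = [CROSS, MIDDLE] * 3 + [CROSS, LEAD_END]
    row = (1, 2, 3, 4)
    rows = [row]
    for _ in range(3):          # three blocks of eight changes
        for change in lead:
            row = apply_change(row, change)
            rows.append(row)
    return rows


if __name__ == "__main__":
    rows = plain_bob_minimus()
    print(rows[:3])
    print(len(set(rows[:-1])), rows[-1])
```

Each change moves every bell by at most one position, and the 24 rows before the final return to the initial sequence cover all of $S_4$ without repetition.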

    1.3.3 Representation of music

    Many aspects of music can be embedded in a suitable algebraic module (see e.g. Mazzola 1990a). Here are some examples:

    1. Apart from glissando effects, the essential frequencies in most types of music are of the form
    $$\omega = \omega_o \prod_{i=1}^{K} p_i^{x_i} \tag{1.12}$$
    where $K < \infty$, $\omega_o$ is a fixed basic frequency, $p_i$ are certain fixed prime numbers and $x_i \in \mathbb{Q}$. Thus,
    $$\psi = \log \omega = \psi_o + \sum_{i=1}^{K} x_i \psi_i \tag{1.13}$$
    where $\psi_o = \log \omega_o$, $\psi_i = \log p_i$ ($i \ge 1$). Let $\Psi = \{\psi : \psi = \sum_{i=1}^{K} x_i \psi_i,\ x_i \in \mathbb{Q}\}$ be the set of all log-frequencies generated this way. Then $\Psi$ is a module over $\mathbb{Q}$. Two typical examples are:


    (a) $\omega_o = 440$ Hz, $K = 3$, $p_1 = 2$, $p_2 = 3$, $p_3 = 5$: This is the so-called Euler module in which most Western music operates. An important subset consists of frequencies of the just intonation with the pure intervals octave (ratio of frequencies 2), fifth (ratio of frequencies 3/2) and major third (ratio of frequencies 5/4):
    $$\psi = \log \omega = \log 440 + x_1 \log 2 + x_2 \log 3 + x_3 \log 5 \tag{1.14}$$
    ($x_i \in \mathbb{Z}$). The notes (frequencies) can then be represented by points in a three-dimensional space of integers $\mathbb{Z}^3$. Note that, using the notation $a = (a_1, a_2, a_3)$ and $b = (b_1, b_2, b_3)$, the pitch obtained by addition $c = a + b$ corresponds to the frequency $\omega_o 2^{a_1 + b_1} 3^{a_2 + b_2} 5^{a_3 + b_3}$.

    (b) $\omega_o = 440$ Hz, $K = 1$, $p_1 = 2$, and $x_1 = p/12$, where $p \in \mathbb{Z}$: This corresponds to the well-tempered tuning where an octave is divided into 12 equal intervals. Thus, the ratio 2 is decomposed into 12 ratios $\sqrt[12]{2}$ so that
    $$\psi = \log 440 + \frac{p}{12} \log 2 \tag{1.15}$$
    If notes that differ by one or several octaves are considered equivalent, then we can identify the set of notes with the $\mathbb{Z}$-module $\mathbb{Z}_{12} = \{0, 1, \dots, 11\}$.

    2. Consider a finite module of notes (frequencies), such as for instance the well-tempered module $M = \mathbb{Z}_{12}$. Then a scale is an element of $S = \{(x_1,\dots,x_k) : k \le |M|,\ x_i \in M,\ x_i \ne x_j\ (i \ne j)\}$, the set of all finite vectors with different components.
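The two tuning examples can be turned into a few lines of code. The sketch below maps integer coordinates in the Euler module to frequencies, following (1.14), and tempered steps to frequencies, following (1.15). The names `euler_frequency`, `add_points`, `tempered_frequency` and `pitch_class` are illustrative assumptions, not notation from the text.

```python
import math

BASE_FREQUENCY = 440.0   # basic frequency in Hz, as in examples (a) and (b)
PRIMES = (2, 3, 5)       # primes of the Euler module


def euler_frequency(x):
    """Frequency 440 * 2**x1 * 3**x2 * 5**x3 for integer coordinates x = (x1, x2, x3)."""
    freq = BASE_FREQUENCY
    for p, xi in zip(PRIMES, x):
        freq *= p ** xi
    return freq


def add_points(a, b):
    """Componentwise addition in Z^3, i.e. addition of log-frequencies."""
    return tuple(ai + bi for ai, bi in zip(a, b))


def tempered_frequency(p):
    """Frequency of the note p tempered semitone steps away from the basic frequency."""
    return BASE_FREQUENCY * 2 ** (p / 12)


def pitch_class(p):
    """Identify notes that differ by octaves: element of Z_12."""
    return p % 12


if __name__ == "__main__":
    fifth = (-1, 1, 0)        # ratio 3/2
    major_third = (-2, 0, 1)  # ratio 5/4
    print(euler_frequency(fifth), euler_frequency(major_third))
    print(euler_frequency(add_points(fifth, major_third)))
    print(tempered_frequency(7), pitch_class(19))
```

Comparing `euler_frequency((-1, 1, 0))` with `tempered_frequency(7)` shows the small difference between the pure and the tempered fifth.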

    1.3.4 Classification of circular chords and other musical objects

    A central element of classical theory of harmony is the triad. An algebraic property that distinguishes harmonically important triads from other chords can be described as follows: let $x_1, x_2, x_3 \in \mathbb{Z}_{12}$, such that (a) $x_i \ne x_j$ ($i \ne j$) and (b) there is an inner symmetry $g : \mathbb{Z}_{12} \to \mathbb{Z}_{12}$ such that $\{y : y = g^k(x_1),\ k \in \mathbb{N}\} = \{x_1, x_2, x_3\}$. It can be shown that all chords $(x_1, x_2, x_3)$ for which (a) and (b) hold are standard chords that are harmonically important in traditional theory of harmony. Consider for instance the major triad $(c, e, g) = (0, 4, 7)$ and the minor triad $(c, e\flat, g) = (0, 3, 7)$. For the first triad, the symmetry $g(x) = 3x + 7$ yields the desired result: $g(0) = 7$ (the note g), $g(7) = 4$ (the note e) and $g(4) = 7$ (the note g). For the minor triad the only inner symmetry is $g(x) = 3x + 3$ with $g(7) = 0$ (the note c), $g(0) = 3$ (the note e$\flat$) and $g(3) = 0$ (the note c); here the orbit starts at $x_1 = 7$. This type of classification of chords can be carried over to more complicated configurations of notes (see e.g. Mazzola 1990a, 2002, Straub 1989). In particular, musical scales can be classified by comparing their inner symmetries.
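The orbit condition (b) is easy to evaluate for affine maps $x \mapsto ax + b$ on the integers modulo 12. The sketch below computes orbits and enumerates candidate maps; `orbit` and `inner_symmetries` are hypothetical helper names. Note that the enumeration uses only condition (b) as stated here, so it may list more maps than a stricter definition of inner symmetry would admit.

```python
def orbit(a, b, start, modulus=12):
    """Return {g^k(start) : k = 0, 1, 2, ...} for g(x) = a*x + b modulo the given modulus."""
    seen = set()
    x = start % modulus
    while x not in seen:
        seen.add(x)
        x = (a * x + b) % modulus
    return seen


def inner_symmetries(chord, modulus=12):
    """List all pairs (a, b) whose orbit, started at some note of the chord, equals the chord."""
    target = set(chord)
    return sorted(
        {
            (a, b)
            for a in range(modulus)
            for b in range(modulus)
            for start in target
            if orbit(a, b, start, modulus) == target
        }
    )


if __name__ == "__main__":
    print(orbit(3, 7, 0))   # major triad generated from c by g(x) = 3x + 7
    print(orbit(3, 3, 7))   # minor triad generated from g by g(x) = 3x + 3
    print(inner_symmetries((0, 4, 7)))
```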


    1.3.5 Torus of thirds

    Consider the group $G = (\mathbb{Z}_{12}, +)$ of pitches modulo octave. Then $G$ is isomorphic to the direct sum of the Sylow groups $\mathbb{Z}_3$ and $\mathbb{Z}_4$ by applying the isomorphism
    $$g : \mathbb{Z}_{12} \to \mathbb{Z}_3 + \mathbb{Z}_4, \tag{1.16}$$
    $$x \mapsto y = (y_1, y_2) = (x \bmod 3,\ x \bmod 4) \tag{1.17}$$
    Geometrically, the elements of $\mathbb{Z}_3 + \mathbb{Z}_4$ can be represented as points on a torus, $y_1$ representing the position on the vertical meridian and $y_2$ the position on the horizontal equatorial circle (Figure 1.8). This representation has a musical meaning: a movement along a meridian corresponds to a major third, whereas a movement along a horizontal circle corresponds to a minor third. One can then define the torus-distance $d_{torus}(x, y)$ by equating it to the minimal number of steps needed to move from $x$ to $y$. The value of $d_{torus}(x, y)$ expresses to what extent there is a third-relationship between $x$ and $y$. The possible values of $d_{torus}$ are 0 (if $x = y$), 1, 2, and 3 (smallest third-relationship). Note that $d_{torus}$ can be decomposed into $d_3 + d_4$, where $d_3$ counts the number of meridian steps and $d_4$ the number of equatorial steps.
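The torus coordinates and the torus-distance can be computed directly from (1.17). The following minimal sketch uses hypothetical function names (`torus_coordinates`, `cyclic_distance`, `torus_distance`) and counts the steps on each circle as the shorter way around.

```python
def torus_coordinates(x):
    """Map a pitch class x in Z_12 to (x mod 3, x mod 4)."""
    return (x % 3, x % 4)


def cyclic_distance(u, v, n):
    """Minimal number of unit steps between u and v on a circle with n points."""
    d = (u - v) % n
    return min(d, n - d)


def torus_distance(x, y):
    """Torus-distance d3 + d4: meridian steps (major thirds) plus equatorial steps (minor thirds)."""
    (x3, x4), (y3, y4) = torus_coordinates(x), torus_coordinates(y)
    d3 = cyclic_distance(x3, y3, 3)
    d4 = cyclic_distance(x4, y4, 4)
    return d3 + d4


if __name__ == "__main__":
    # The twelve pitch classes receive twelve different coordinate pairs.
    print(sorted(torus_coordinates(x) for x in range(12)))
    # Distances from c (0) to all pitch classes.
    print([torus_distance(0, y) for y in range(12)])
```

A major third (4 semitones) changes only the first coordinate and a minor third (3 semitones) only the second, which is why each of them has distance 1 from the starting note.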

    1.3.6 Transformations

    For suitably chosen integers $p_1, p_2, p_3, p_4$, consider the four-dimensional module $M = \mathbb{Z}_{p_1} \times \mathbb{Z}_{p_2} \times \mathbb{Z}_{p_3} \times \mathbb{Z}_{p_4}$ over $\mathbb{Z}$, where the coordinates represent onset time, pitch (well-tempered tuning if $p_2 = 12$), duration, and volume. Transformations in this space play an essential role in music. A selection of historically relevant transformations used by classical composers is summarized in Table 1.1 (also see Figure 1.13).

    Table 1.1 Some affine transformations used in classical music

    Function                                                 Musical meaning
    Shift: $f(x) = x + a$                                    Transposition, repetition, change of duration, change of loudness
    Shear, e.g. of $x = (x_1,\dots,x_4)^t$ w.r.t. line       Arpeggio
      $y = o + t\,(0, 1, 0, 0)$: $f(x) = x + a\,(0, 1, 0, 0)$
      for $x$ not on line, $f(x) = x$ for $x$ on line
    Reflection, e.g. w.r.t. $v = (a, 0, 0, 0)$:              Retrograde, inversion
      $f(x) = (a - (x_1 - a), x_2, x_3, x_4)$
    Dilatation, e.g. w.r.t. pitch:                           Augmentation
      $f(x) = (x_1, a \cdot x_2, x_3, x_4)$
    Exchange of coordinates:                                 Exchange of parameters (20th century)
      $f(x) = (x_2, x_1, x_3, x_4)$

    Generally, one may say that affine transformations are most important, and among these the invertible ones. In particular, it can be shown that each invertible symmetry of $\mathbb{Z}_{12}$ can be written as a product (in the group of symmetries $Symm(\mathbb{Z}_{12})$) of the following musically meaningful transformations:

    • Multiplication by $-1$ (inversion);
    • Multiplication by 5 (ordering of notes according to the circle of fourths);
    • Addition of 3 (transposition by a minor third);
    • Addition of 4 (transposition by a major third).

    All these transformations have been used by composers for many centuries. Some examples of apparent similarities between groups of notes (or motifs) are shown in Figures 1.10 through 1.12. In order not to clutter the pictures, only a small selection of similar motifs is marked. In dodecaphonic and serial music, transformation groups have been applied systematically (see e.g. Figure 1.9). For instance, in Schönberg's Orchestervariationen op. 31, the full orbit generated by inversion, retrograde and transposition is used. Webern used 12-tone series that are diagonally symmetric in the two-dimensional space spanned by pitch and onset time. Other famous examples include Eimert's rotation by 45 degrees together with a dilatation by 2 (Eimert 1964) and serial compositions such as Boulez's Structures and Stockhausen's Kontra-Punkte. With advanced computer technology (e.g. composition soft- and hardware such as Xenakis's UPIC graphics/computer system or the recently developed Presto software by Mazzola 1989/1994), the application of affine transformations in musical spaces of arbitrary dimension is no longer the tedious work of the early dodecaphonic era. On the contrary, the practical ease and enormous artistic flexibility lead to an increasing popularity of computer-aided transformations among contemporary composers (see e.g. Iannis Xenakis, Kurt Dahlke, Wilfried Jentzsch, Guerino Mazzola 1990b, Dieter Salbert, Karl-Heinz Schöppner, Tamas Ungvary, Jan Beran 1987, 1991, 1992, 2000; cf. Figure 1.14).
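The statement that inversion, multiplication by 5 and transposition by minor and major thirds generate the invertible symmetries of the integers modulo 12 can be checked by enumeration. In the sketch below an affine map $x \mapsto ax + b$ is stored as the pair `(a, b)`; the names `GENERATORS`, `compose`, `generated_symmetries`, `invertible_symmetries` and `apply_symmetry` are assumptions made for this illustration.

```python
from collections import deque
from math import gcd

MOD = 12

# The four generating transformations, each stored as (a, b) for x -> a*x + b (mod 12).
GENERATORS = {
    "inversion": (11, 0),              # multiplication by -1
    "circle of fourths": (5, 0),       # multiplication by 5
    "minor third up": (1, 3),          # addition of 3
    "major third up": (1, 4),          # addition of 4
}


def compose(f, g):
    """Return the map 'first g, then f' for affine maps stored as pairs (a, b)."""
    (a1, b1), (a2, b2) = f, g
    return ((a1 * a2) % MOD, (a1 * b2 + b1) % MOD)


def generated_symmetries(generators):
    """Return all affine maps obtainable as products of the given generators."""
    identity = (1, 0)
    seen = {identity}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for s in generators:
            h = compose(s, g)
            if h not in seen:
                seen.add(h)
                queue.append(h)
    return seen


def invertible_symmetries():
    """All maps x -> a*x + b (mod 12) with a invertible modulo 12."""
    return {(a, b) for a in range(MOD) for b in range(MOD) if gcd(a, MOD) == 1}


def apply_symmetry(f, pitches):
    """Apply the affine map f = (a, b) to a sequence of pitch classes."""
    a, b = f
    return tuple((a * x + b) % MOD for x in pitches)


if __name__ == "__main__":
    generated = generated_symmetries(list(GENERATORS.values()))
    print(len(generated), generated == invertible_symmetries())
    print(apply_symmetry(GENERATORS["inversion"], (0, 4, 7)))
```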


    Figure 1.6 W.A. Mozart (1756-1791) (authorship uncertain): Spiegel-Duett. (Score for violin; Allegro, quarter note = 120.)


    Figure 1.7 Wolfgang Amadeus Mozart (1756-1791). (Engraving by F. Müller after a painting by J.W. Schmidt; courtesy of Zentralbibliothek Zürich.)

    Figure 1.8 The torus of thirds $\mathbb{Z}_3 + \mathbb{Z}_4$.


    Figure 1.9 Arnold Schönberg: Sketch for the piano concerto op. 42, notes with tone row and its inversions and transpositions. (Used by permission of Belmont Music Publishers.)

    Figure 1.10 Notes of Air by Henry Purcell. (For better visibility, only a small selection of related motifs is marked.)


    Figure 1.11 Notes of Fugue No. 1 (first half) from Das Wohltemperierte Klavier by J.S. Bach. (For better visibility, only a small selection of related motifs is marked.)

    Figure 1.12 Notes of op. 68, No. 2 from Album für die Jugend by Robert Schumann. (For better visibility, only a small selection of related motifs is marked.)


    Figure 1.13 A miraculous transformation caused by high exposure to Wagner operas. (Caricature from a 19th century newspaper; courtesy of Zentralbibliothek Zürich.)

    Figure 1.14 Graphical representation of pitch and onset time in $\mathbb{Z}_{71}^2$ together with instrumentation of polygonal areas. (Excerpt from Santi, Piano concert No. 2, by Jan Beran, col legno CD 20062; courtesy of col legno, Germany.)


    Figure 1.15 Iannis Xenakis (1922-1998). (Courtesy of Philippe Gontier, Paris.)

    Figure 1.16 Ludwig van Beethoven (1770-1827). (Courtesy of Zentralbibliothek Zürich.)


    Figure 2.1 Robert Schumann (1810-1856): Träumerei op. 15, No. 7. (Score for piano; tempo marking: quarter note = 100 (72).)


    Figure 2.2 Tempo curves of Schumann's Träumerei performed by Vladimir Horowitz in 1947, 1963 and 1965. (Axes: onset time versus log(tempo).)

    2.2 Some descriptive statistics and plots for univariate data

    2.2.1 Definitions

    We give a brief summary of univariate descriptive statistics. For a comprehensive discussion we refer the reader to standard textbooks such as Tukey (1977), Mosteller and Tukey (1977), Hoaglin (1977), Tufte (1977), Velleman and Hoaglin (1981), Chambers et al. (1983), Cleveland (1985).

    Suppose that we observe univariate data $x_1, x_2, \dots, x_n$. To summarize general characteristics of the data, various numerical summary statistics can be calculated. Essential features are in particular center (location), variability, asymmetry, shape of distribution, and location of unusual values (outliers). The most frequently used statistics are listed in Table 2.1.

    Table 2.1 Simple descriptive statistics

    Name                              Definition                                                   Feature measured
    Empirical distribution function   $F_n(x) = n^{-1} \sum_{i=1}^{n} 1\{x_i \le x\}$              Proportion of obs. $\le x$
    Minimum                           $x_{\min} = \min\{x_1,\dots,x_n\}$                           Smallest value
    Maximum                           $x_{\max} = \max\{x_1,\dots,x_n\}$                           Largest value
    Range                             $x_{range} = x_{\max} - x_{\min}$                            Total spread
    Sample mean                       $\bar{x} = n^{-1} \sum_{i=1}^{n} x_i$                        Center
    Sample median                     $M = \inf\{x : F_n(x) \ge \frac{1}{2}\}$                     Center
    Sample $\alpha$-quantile          $q_\alpha = \inf\{x : F_n(x) \ge \alpha\}$                   Border of lower $100\alpha\%$
    Lower and upper quartile          $Q_1 = q_{1/4}$, $Q_2 = q_{3/4}$                             Border of lower 25%, upper 75%
    Sample variance                   $s^2 = (n-1)^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$          Variability
    Sample standard deviation         $s = +\sqrt{s^2}$                                            Variability
    Interquartile range               $IQR = Q_2 - Q_1$                                            Variability
    Sample skewness                   $m_3 = n^{-1} \sum_{i=1}^{n} [(x_i - \bar{x})/s]^3$          Asymmetry
    Sample kurtosis                   $m_4 = n^{-1} \sum_{i=1}^{n} [(x_i - \bar{x})/s]^4 - 3$      Flat/sharp peak

    We recall a few well known properties of these statistics:

    • Sample mean: The sample mean can be understood as the center of gravity of the data, whereas the median divides the sample in two halves with an (approximately) equal number of observations. In contrast to the median, the mean is sensitive to outliers, since observations that are far from the majority of the data have a strong influence on its value.

    • Sample standard deviation: The sample standard deviation is a measure of variability. In contrast to the variance, $s$ is directly comparable with the data, since it is measured in the same unit. If observations are drawn independently from the same normal probability distribution (or a distribution that is similar to a normal distribution), then the following rule of thumb applies: (a) approximately 68% of the data are in the interval $\bar{x} \pm s$; (b) approximately 95% of the data are in the interval $\bar{x} \pm 2s$; (c) almost all data are in the interval $\bar{x} \pm 3s$. For a sufficiently large sample size, these conclusions can be carried over to the population from which the data were drawn.


    • Interquartile range: The interquartile range also measures variability. Its advantage, compared to $s$, is that it is much less sensitive to outliers. If the observations are drawn from the same normal probability distribution, then $IQR/1.35$ (or more precisely $IQR/[\Phi^{-1}(0.75) - \Phi^{-1}(0.25)]$, where $\Phi^{-1}$ is the quantile function of the standard normal distribution) estimates the same quantity as $s$, namely the population standard deviation.

    • Quantiles: For $\alpha = i/n$ ($i = 1,\dots,n$), $q_\alpha$ coincides with at least one observation. For other values of $\alpha$, $q_\alpha$ can be defined as in Table 2.1 or, alternatively, by interpolating neighboring observed values as follows: let $\beta = \frac{i}{n} < \alpha < \gamma = \frac{i+1}{n}$. Then the interpolated quantile $\tilde{q}_\alpha$ is defined by
    $$\tilde{q}_\alpha = q_\beta + \frac{\alpha - \beta}{1/n} (q_\gamma - q_\beta) \tag{2.1}$$
    Note that a slightly different convention used by some statisticians is to call $\inf\{x : F_n(x) \ge \alpha\}$ the $(\alpha - \frac{0.5}{n})$-quantile (see e.g. Chambers et al. 1983).

    • Skewness: Skewness measures symmetry/asymmetry. For exactly symmetric data, $m_3 = 0$; for data with a long right tail, $m_3 > 0$; for data with a long left tail, $m_3 < 0$.

    • Kurtosis: The kurtosis is mainly meaningful for unimodal distributions, i.e. distributions with one peak. For a sample from a normal distribution, $m_4 \approx 0$. The reason is that then $E[(X - \mu)^4] = 3\sigma^4$ where $\mu = E(X)$. For samples from unimodal distributions with a sharper or flatter peak than the normal distribution, we then tend to have $m_4 > 0$ and $m_4 < 0$ respectively.
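The summary statistics discussed above can be computed with a few lines of code. The sketch below follows the definitions of Table 2.1 literally, in particular the quantile as the smallest observation at which the empirical distribution function reaches the given level; the function names `quantile`, `interpolated_quantile` and `describe` are hypothetical, and the small tolerance `1e-12` is an implementation choice to guard against rounding errors.

```python
import math
from statistics import NormalDist


def quantile(data, alpha):
    """q_alpha = inf{x : F_n(x) >= alpha} for 0 < alpha <= 1."""
    xs = sorted(data)
    n = len(xs)
    k = max(1, math.ceil(alpha * n - 1e-12))   # smallest k with k/n >= alpha
    return xs[k - 1]


def interpolated_quantile(data, alpha):
    """Linear interpolation between q_beta and q_gamma with beta = i/n < alpha < gamma = (i+1)/n."""
    xs = sorted(data)
    n = len(xs)
    i = math.floor(alpha * n + 1e-12)
    if i < 1 or i >= n or abs(alpha * n - i) < 1e-12:
        return quantile(data, alpha)           # alpha is of the form i/n or outside the range
    beta = i / n
    q_beta, q_gamma = xs[i - 1], xs[i]
    return q_beta + (alpha - beta) / (1 / n) * (q_gamma - q_beta)


def describe(data):
    """Return the main statistics of Table 2.1 as a dictionary."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    s = math.sqrt(variance)
    q1, q2 = quantile(data, 0.25), quantile(data, 0.75)
    return {
        "mean": mean,
        "median": quantile(data, 0.5),
        "variance": variance,
        "sd": s,
        "Q1": q1,
        "Q2": q2,
        "IQR": q2 - q1,
        "range": max(data) - min(data),
        "skewness": sum(((x - mean) / s) ** 3 for x in data) / n,
        "kurtosis": sum(((x - mean) / s) ** 4 for x in data) / n - 3,
    }


# Divisor that turns the IQR into an estimate of the standard deviation under normality.
NORMAL_IQR_FACTOR = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)

if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8]
    summary = describe(sample)
    print(summary)
    print(interpolated_quantile(sample, 0.3125))
    print(summary["IQR"] / NORMAL_IQR_FACTOR)
```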

    Simple, but very useful graphical displays are:

    • Histogram: 1. Divide an interval $(a, b]$ that includes all observations into disjoint intervals $I_1 = (a_1, b_1], \dots, I_k = (a_k, b_k]$. 2. Let $n_1,\dots,n_k$ be the number of observations in the intervals $I_1,\dots,I_k$ respectively. 3. Above each interval $I_j$, plot a rectangle of width $w_j = b_j - a_j$ and height $h_j = n_j / w_j$. Instead of the absolute frequencies, one can also use relative frequencies $n_j / n$ where $n = n_1 + \dots + n_k$. The essential point is that the area is proportional to $n_j$. If the data are drawn from a probability distribution with density function $f$, then the histogram is an estimate of $f$.

    • Kernel estimate of a density function: The histogram is a step function, and in that sense does not resemble most density functions. This can be improved as follows. If the data are realizations of a continuous random variable $X$ with distribution $F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du$, then a smooth estimate of the probability density function $f$ can be defined by a kernel estimate (Rosenblatt 1956, Parzen 1962, Silverman 1986) of the form
    $$\hat{f}(x) = \frac{1}{nb} \sum_{i=1}^{n} K\left(\frac{x_i - x}{b}\right) \tag{2.2}$$
    where $K(u) = K(-u) \ge 0$ and $\int_{-\infty}^{\infty} K(u)\,du = 1$. Most kernels used in practice also satisfy the condition $K(u) = 0$ for $|u| > 1$. The bandwidth $b$ then specifies which data in the neighborhood of $x$ are used to estimate $f(x)$. In situations where one has partial knowledge of the shape of $f$, one may incorporate this into the estimation procedure. For instance, Hjort and Glad (2002) combine parametric estimation based on a preliminary density function $f(x; \hat{\theta})$ with kernel smoothing of the remaining density $f / f(x; \hat{\theta})$. They show that major efficiency gains can be achieved if the preliminary model is close to the truth.

    • Barchart: If data can assume only a few different values, or if data are qualitative (i.e. we only record which category an item belongs to), then one can plot the possible values or names of categories on the x-axis and on the vertical axis the corresponding (relative) frequencies.

    • Boxplot (simple version): 1. Calculate $Q_1$, $M$, $Q_2$ and $IQR = Q_2 - Q_1$. 2. Draw parallel lines (in principle of arbitrary length) at the levels $Q_1$, $M$, $Q_2$, $A_1 = Q_1 - \frac{3}{2} IQR$, $A_2 = Q_2 + \frac{3}{2} IQR$, $B_1 = Q_1 - 3\,IQR$ and $B_2 = Q_2 + 3\,IQR$. The points $A_1, A_2$ are called inner fence, and $B_1, B_2$ are called outer fence. 3. Identify the observation(s) between $Q_1$ and $A_1$ that is closest to $A_1$ and draw a line connecting $Q_1$ with this point. Do the same for $Q_2$ and $A_2$. 4. Identify observation(s) between $A_1$ and $B_1$ and draw points (or other symbols) at those places. Do the same for $A_2$ and $B_2$. 5. Draw points (or other symbols) for observations beyond $B_1$ and $B_2$ respectively. The boxplot can be interpreted as follows: the relative positions of $Q_1$, $M$, $Q_2$ and the inner and outer fences indicate symmetry or asymmetry. Moreover, the distance between $Q_1$ and $Q_2$ is the IQR and thus measures variability. The inner and outer fences help to identify outliers, i.e. values lying unusually far from most of the other observations.

    • Q-q-plot for comparing two data sets $x_1,\dots,x_n$ and $y_1,\dots,y_m$: 1. Define a certain number of points $0 < p_1 < \dots < p_k \le 1$ (the standard choice is $p_i = \frac{i - 0.5}{N}$ where $N = \min(n, m)$). 2. Plot the $p_i$-quantiles ($i = 1,\dots,N$) of the $y$-observations versus those of the $x$-observations. Alternative plots for comparing distributions are discussed e.g. in Ghosh and Beran (2000) and Ghosh (1996, 1999).
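The numerical ingredients of these displays can be sketched without any plotting library. The code below computes histogram heights, a kernel density estimate, the boxplot fences and the points of a q-q-plot. All function names are hypothetical, and the Epanechnikov kernel is just one possible choice of a kernel that is symmetric, nonnegative, integrates to one and vanishes outside $[-1, 1]$.

```python
import math


def quantile(data, alpha):
    """q_alpha = inf{x : F_n(x) >= alpha} for 0 < alpha <= 1."""
    xs = sorted(data)
    k = max(1, math.ceil(alpha * len(xs) - 1e-12))
    return xs[k - 1]


def histogram_heights(data, edges):
    """Heights h_j = n_j / w_j for the intervals (edges[j], edges[j+1]]."""
    heights = []
    for lo, hi in zip(edges, edges[1:]):
        n_j = sum(1 for x in data if lo < x <= hi)
        heights.append(n_j / (hi - lo))
    return heights


def epanechnikov(u):
    """Kernel K(u) = 0.75 * (1 - u^2) for |u| <= 1 and 0 otherwise."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def kernel_density(data, x, bandwidth, kernel=epanechnikov):
    """Kernel estimate (1 / (n b)) * sum of K((x_i - x) / b)."""
    n = len(data)
    return sum(kernel((xi - x) / bandwidth) for xi in data) / (n * bandwidth)


def boxplot_fences(q1, q2):
    """Inner fences A1, A2 and outer fences B1, B2 from the quartiles."""
    iqr = q2 - q1
    return {"A1": q1 - 1.5 * iqr, "A2": q2 + 1.5 * iqr,
            "B1": q1 - 3 * iqr, "B2": q2 + 3 * iqr}


def qq_points(x, y):
    """Pairs of p_i-quantiles of x and y with p_i = (i - 0.5) / N, N = min(n, m)."""
    big_n = min(len(x), len(y))
    probs = [(i - 0.5) / big_n for i in range(1, big_n + 1)]
    return [(quantile(x, p), quantile(y, p)) for p in probs]


if __name__ == "__main__":
    data = [1, 2, 2, 3, 5, 8]
    print(histogram_heights(data, [0, 2, 4, 8]))
    print(kernel_density(data, 2.0, bandwidth=1.5))
    print(boxplot_fences(quantile(data, 0.25), quantile(data, 0.75)))
    print(qq_points([1, 2, 3, 4], [2, 4, 6, 8]))
```

In a q-q-plot of two samples with the same distribution up to scale, as in the last call, the points lie on a straight line through the origin.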


    2.3 Specific applications in music: univariate

    2.3.1 Tempo curves

    Figure 2.3 displays 28 tempo curves for performances of Schumann's Träumerei op. 15, No. 7, by 24 pianists. The names of the pianists and dates of the recordings (in brackets) are Martha Argerich (before 1983), Claudio Arrau (1974), Vladimir Ashkenazy (1987), Alfred Brendel (before 1980), Stanislav Bunin (1988), Sylvia Capova (before 1987), Alfred Cortot (1935, 1947 and 1953), Clifford Curzon (about 1955), Fanny Davies (1929), Jörg Demus (about 1960), Christoph Eschenbach (before 1966), Reine Gianoli (1974), Vladimir Horowitz (1947, before 1963 and 1965), Cyprien Katsaris (1980), Walter Klien (date unknown), Andre Krust (about 1960), Antonin Kubalek (1988), Benno Moiseiwitsch (about 1950), Elly Ney (about 1935), Guiomar Novaes (before 1954), Cristina Ortiz (before 1988), Artur Schnabel (1947), Howard Shelley (before 1990), Yakov Zak (about 1960).

    Tempo is more likely to be varied in a relative rather than absolute way. For instance, a musician plays a certain passage twice as fast as the previous one, but may care less about the exact absolute tempo. This suggests consideration of the logarithm of tempo. Moreover, the main interest lies in comparing the shapes of the curves. Therefore, the plotted curves consist of standardized logarithmic tempo (each curve has sample mean zero and variance one).
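The standardization of logarithmic tempo can be sketched as follows. The function name `standardized_log_tempo` and the tempo values are made up for illustration; they are not taken from the recordings discussed here.

```python
import math


def standardized_log_tempo(tempo):
    """Return log-tempo values centered to sample mean zero and scaled to sample variance one."""
    logs = [math.log(t) for t in tempo]
    n = len(logs)
    mean = sum(logs) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in logs) / (n - 1))
    return [(v - mean) / sd for v in logs]


if __name__ == "__main__":
    tempo = [60.0, 72.0, 48.0, 90.0, 66.0]      # hypothetical tempo values
    z = standardized_log_tempo(tempo)
    z_fast = standardized_log_tempo([2 * t for t in tempo])
    print(z)
    print(z_fast)   # the same curve: doubling the tempo only shifts the logarithm
```

Since multiplying all tempo values by a constant only adds a constant to the logarithm, the standardized curve depends on relative tempo changes only, which is the reason given above for working on the logarithmic scale.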

    Schumann's Träumerei is divided into four main parts, each consisting of about eight bars, the first two and the last one being almost identical (see Figure 2.1). Thus, the structure is: $A$, $A'$, $B$, and $A''$. Already a very simple exploratory analysis reveals interesting features. For each pianist, we calculate the following statistics for the four parts respectively: $\bar{x}$, $M$, $s$, $Q_1$, $Q_2$, $m_3$ and $m_4$. Figures 2.4a through e show a distinct pattern that corresponds to the division into $A$, $A'$, $B$, and $A''$. Tempo is much lower in $A''$ and generally highest in $B$. Also, $A'$ seems to be played at a slightly slower tempo than $A$, though this distinction is not quite so clear (Figures 2.4a,b). Tempo is varied most towards the end and considerably less in the first half of the piece (Figure 2.4c). Skewness is generally negative, which is due to occasional extreme ritardandi. This is most extreme in part $B$ and, again, least pronounced in the first half of the piece ($A$, $A'$). A mirror image of this pattern, with most extreme positive values in $B$, is observed for kurtosis. This indicates that in $B$ (and also in $A''$), most tempo values vary little around an average value, but occasionally extreme tempo changes occur. Also, for $A$, there are two outliers with an extremely negative skewness; these turn out to be Fanny Davies and Jörg Demus. Figures 2.4f through h show another interesting comparison of boxplots. In Figure 2.4f, the differences between the lower quartiles in $A$ and $A'$ for performances before 1965 are compared with those from performances recorded in 1965 or later. The clear difference indicates that, at least for the sample considered here, pianists of the modern era tend to make a much stronger distinction between $A$ and $A'$ in terms of slow tempi. The only exceptions (outliers in the left boxplot) are Moiseiwitsch and Horowitz's first performance, and Ashkenazy (outlier in the right boxplot). The comparison of skewness and kurtosis in Figures 2.4g and h also indicates that modern pianists seem to prefer occasional extreme ritardandi. The only exception in the early 20th century group is Artur Schnabel, with an extreme skewness of $-2.47$ and a kurtosis of 7.04.

    Figure 2.3 Twenty-eight tempo curves of Schumann's Träumerei performed by 24 pianists. (For Cortot and Horowitz, three tempo curves were available.) (Axes: onset time versus log(tempo).)

    Direct comparisons of tempo distributions are shown in Figures 2.5a through f. The following observations can be made: a) compared to Demus (quantiles on the horizontal axis), Ortiz has a few relatively extreme slow tempi (Figure 2.5a); b) similarly, but in a less extreme way, Cortot's interpretation includes occasional extremely slow tempo values (Figure 2.5b); c) Ortiz and Argerich have practically the same (marginal) distribution (Figure 2.5c); d) Figure 2.5d is similar to 2.5a and b, though less extreme; e) the tempo distribution of Cortot's performance (Figure 2.5e) did not change much in 1947 compared to 1935; f) similarly, Horowitz's tempo distributions in 1947 and 1963 are almost the same, except for slight changes for very low tempi (Figure 2.5f).

    Figure 2.4 Boxplots of descriptive statistics for the 28 tempo curves in Figure 2.3.

    Figure 2.5 q-q-plots of several tempo curves (from Figure 2.3). Panels: (a) Demus (1960) - Ortiz (1988); (b) Demus (1960) - Cortot (1935); (c) Ortiz (1988) - Argerich (1983); (d) Demus (1960) - Krust (1960); (e) Cortot (1935) - Cortot (1947); (f) Horowitz (1947) - Horowitz (1963). In each panel, the quantiles of the first performance are on the horizontal axis and those of the second on the vertical axis.

    2.3.2 Notes modulo 12

In most classical music, a central tone around which notes fluctuate can be identified, and a small selected number of additional notes or chords (often triads) play a special role. For instance, from about 400 to 1500 A.D., music was mostly written using so-called modes. The main notes


Figure 2.8 Johannes Chrysostomus Wolfgangus Theophilus Mozart (1756-1791) in the house of Salomon Gessner in Zurich. (Courtesy of Zentralbibliothek Zürich.)


Figure 2.9 R. Schumann (1810-1856), lithography by H. Bodmer. (Courtesy of Zentralbibliothek Zürich.)


6 and 7 by F. Martin (1890-1971). For each $j = 4, 8, \ldots, 64$, the frequencies $p_j(0), \ldots, p_j(11)$ are joined by lines. The obvious common feature for Bach, Mozart and Schumann is a distinct preference (local maximum) for the notes 5 and 7 (apart from 0). Note that if 0 is the root of the tonic triad, then 5 corresponds to the root of the subdominant triad. Similarly, 7 is the root of the dominant triad. Also relatively frequent are the notes 3 = minor third (second note of the tonic triad in minor) and 10 = minor seventh, which is the fourth note of the dominant seventh chord to the subtonic. Also note that, for Schumann, the local maxima are somewhat less pronounced. A different pattern can be observed for Scriabin and even more for Martin. In Scriabin's Prelude op. 51/2, the perfect fifth almost never occurs, but instead the major sixth is very frequent. In Scriabin's Prelude op. 51/4, the tonal system is dissolved even further, as the clearly dominating note is 6, which together with 0 builds the augmented fourth (or diminished fifth), an interval that is considered highly dissonant in tonal music. Nevertheless, even in Scriabin's compositions, the distribution of notes does not change very rapidly, since the sixteen overlaid curves are almost identical. This may indicate that the notion of scales or a slow harmonic development still plays a role. In contrast, in Frank Martin's Prelude No. 6, the distribution changes very quickly. This is hardly surprising, since Martin's style incorporates, among other influences, dodecaphonism (12-tone music), a compositional technique that does not impose traditional restrictions on the harmonic structure.

    2.4 Some descriptive statistics and plots for bivariate data

2.4.1 Definitions

We give a short overview of important descriptive concepts for bivariate data. For a comprehensive treatment we refer the reader to the standard textbooks given above (also see e.g. Plackett 1960, Ryan 1996, Srivastava and Sen 1997, Draper and Smith 1998, and Rao 1973 for basic theoretical results).

    Correlation

If each observation consists of a pair of measurements $(x_i, y_i)$, then the main objective is to investigate the relationship between $x$ and $y$. Consider, for example, the case where both variables are quantitative. The data can then be displayed in a scatter plot ($y$ versus $x$). Useful statistics are Pearson's sample correlation

$$r = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{2.3}$$

where $s_x^2 = n^{-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$ and $s_y^2 = n^{-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$, and Spearman's rank correlation

$$r_{Sp} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{u_i-\bar{u}}{s_u}\right)\left(\frac{v_i-\bar{v}}{s_v}\right) = \frac{\sum_{i=1}^{n}(u_i-\bar{u})(v_i-\bar{v})}{\sqrt{\sum_{i=1}^{n}(u_i-\bar{u})^2\,\sum_{i=1}^{n}(v_i-\bar{v})^2}} \tag{2.4}$$

where $u_i$ denotes the rank of $x_i$ among the $x$-values and $v_i$ is the rank of $y_i$ among the $y$-values. In (2.3) and (2.4) it is assumed that $s_x$, $s_y$, $s_u$ and $s_v$ are not zero. Recall that these definitions imply the following properties: a) $-1 \le r, r_{Sp} \le 1$; b) $r = 1$, if and only if $y_i = \beta_0 + \beta_1 x_i$ and $\beta_1 > 0$ (exact linear relationship with positive slope); c) $r = -1$, if and only if $y_i = \beta_0 + \beta_1 x_i$ and $\beta_1 < 0$ (exact linear relationship with negative slope); d) $r_{Sp} = 1$, if and only if $x_i > x_j$ implies $y_i > y_j$ (strictly monotonically increasing relationship); e) $r_{Sp} = -1$, if and only if $x_i > x_j$ implies $y_i < y_j$ (strictly monotonically decreasing relationship); f) $r$ measures the strength (and sign) of the linear relationship; g) $r_{Sp}$ measures the strength (and sign) of monotonicity; h) if the data are realizations of a bivariate random variable $(X, Y)$, then $r$ is an estimate of the population correlation $\rho = cov(X, Y)/\sqrt{var(X)\,var(Y)}$ where $cov(X, Y) = E[XY] - E[X]E[Y]$, $var(X) = cov(X, X)$ and $var(Y) = cov(Y, Y)$.

When using these measures of dependence one should bear in mind that each of them measures a specific type of dependence only, namely linear and monotonic dependence respectively. Thus, a Pearson or Spearman correlation near or equal to zero does not necessarily mean independence. Note also that correlation can be interpreted in a geometric way as follows: defining the $n$-dimensional vectors $x = (x_1, \ldots, x_n)^t$ and $y = (y_1, \ldots, y_n)^t$, $r$ is equal to the standardized scalar product between $x$ and $y$ (after centering each vector at its mean), and is therefore equal to the cosine of the angle between these two centered vectors.
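The two correlation coefficients (2.3) and (2.4) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names `pearson`, `ranks` and `spearman` are hypothetical, the toy data are invented, and the use of average ranks for tied values is an assumption that the text does not specify.

```python
import math


def pearson(x, y):
    """Pearson's sample correlation r, following (2.3)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # assumes sxx and syy are not zero


def ranks(values):
    """Ranks 1..n; tied values get their average rank (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks, as in (2.4)."""
    return pearson(ranks(x), ranks(y))


# Toy data: a strictly increasing but nonlinear relationship y = x^3.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]
r = pearson(x, y)
r_sp = spearman(x, y)
```

For these toy data the relationship is strictly monotone but not linear, so the rank correlation reaches its maximum while Pearson's coefficient stays below one, in line with properties d) and f).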

A special type of correlation is interesting for time series. Time series are data that are taken in a specific ordered (usually temporal) sequence. If $Y_1, Y_2, \ldots, Y_n$ are random variables observed at time points $i = 1, \ldots, n$, then one would like to know whether there is any linear dependence between observations $Y_i$ and $Y_{i+k}$, i.e. between observations that are $k$ time units apart. If this dependence is the same for all time points $i$, and the expected value of $Y_i$ is constant, then the corresponding population correlation can be written as a function of $k$ only (see Chapter 4),

$$\frac{cov(Y_i, Y_{i+k})}{\sqrt{var(Y_i)\,var(Y_{i+k})}} = \rho(k) \tag{2.5}$$

and a simple estimate of $\rho(k)$ is the sample autocorrelation (acf)

$$\hat{\rho}(k) = \frac{1}{n}\sum_{i=1}^{n-k}\left(\frac{y_i-\bar{y}}{s}\right)\left(\frac{y_{i+k}-\bar{y}}{s}\right) \tag{2.6}$$

where $s^2 = n^{-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$. Note that here summation stops at $n - k$, because no data are available beyond $(n - k) + k = n$. For large lags (large compared to $n$), $\hat{\rho}(k)$ is not a very precise estimate, since there are only very few pairs that are $k$ time units apart.
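As a small illustration of the sample autocorrelation (2.6), the following Python sketch computes $\hat{\rho}(k)$ for an artificial alternating series. The function name `sample_acf` and the toy series are hypothetical, and $s^2$ is taken to be the sample variance with divisor $n$.

```python
def sample_acf(y, k):
    """Sample autocorrelation at lag k >= 0, following (2.6)."""
    n = len(y)
    mean = sum(y) / n
    s2 = sum((v - mean) ** 2 for v in y) / n  # sample variance, divisor n
    # The sum runs over the n - k available pairs (y_i, y_{i+k}).
    cov_k = sum((y[i] - mean) * (y[i + k] - mean) for i in range(n - k)) / n
    return cov_k / s2


# Toy series alternating between +1 and -1 (n = 20).
series = [1.0, -1.0] * 10
acf0 = sample_acf(series, 0)
acf1 = sample_acf(series, 1)
acf2 = sample_acf(series, 2)
```

Because the divisor is always $n$ while only $n - k$ pairs enter the sum, the estimate is shrunk towards zero for larger lags; for this series the lag-1 value is slightly above $-1$ and the lag-2 value slightly below $1$.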

The definition of $\rho(k)$ and $\hat{\rho}(k)$ can be extended to multivariate time series, taking into account that dependence between different components of the series may be delayed. For instance, for a bivariate time series $(X_i, Y_i)$ ($i = 1, 2, \ldots$), one considers lag-$k$ sample cross-correlations

$$\hat{\rho}_{XY}(k) = \frac{1}{n}\sum_{i=1}^{n-k}\left(\frac{x_i-\bar{x}}{s_X}\right)\left(\frac{y_{i+k}-\bar{y}}{s_Y}\right) \tag{2.7}$$

as estimates of the population cross-correlations

$$\rho_{XY}(k) = \frac{cov(X_i, Y_{i+k})}{\sqrt{var(X_i)\,var(Y_{i+k})}} \tag{2.8}$$

where $s_X^2 = n^{-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$ and $s_Y^2 = n^{-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$. If $|\rho_{XY}(k)|$ is high, then there is a strong linear dependence between $X_i$ and $Y_{i+k}$.

Regression

In addition to measuring the strength of dependence between two variables, one is often interested in finding an explicit functional relationship. For instance, it may be possible to express the response variable $y$ in terms of an explanatory variable $x$ by $y = g(x, \varepsilon)$ where $\varepsilon$ is a variable representing the part of $y$ that is unexplained. More specifically, we may have, for example, an additive relationship $y = g(x) + \varepsilon$ or a multiplicative equation $y = g(x)e^{\varepsilon}$. The simplest relationship is given by the simple linear regression equation

$$y = \beta_0 + \beta_1 x + \varepsilon \tag{2.9}$$

where $\varepsilon$ is assumed to be a random variable with $E(\varepsilon) = 0$ (and usually finite variance $\sigma^2 = var(\varepsilon) < \infty$). Thus, the data are $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ ($i = 1, \ldots, n$) where the $\varepsilon_i$ are generated by the same zero mean distribution. Often the $\varepsilon_i$ are also assumed to be uncorrelated or even independent; this is, however, not a necessary assumption. An obvious estimate of the unknown parameters $\beta_0$ and $\beta_1$ is obtained by minimizing the total sum of squared errors

$$SSE = SSE(b_0, b_1) = \sum (y_i - b_0 - b_1 x_i)^2 = \sum r_i^2(b_0, b_1) \tag{2.10}$$

with respect to $b_0, b_1$. The solution is found by setting the partial derivatives with respect to $b_0$ and $b_1$ equal to zero. A more elegant way to find the solution is obtained by interpreting the problem geometrically: defining the $n$-dimensional vectors $\mathbf{1} = (1, \ldots, 1)^t$, $b = (b_0, b_1)^t$ and the $n \times 2$ matrix $X$ with columns $\mathbf{1}$ and $x$, we have $SSE = \|y - b_0\mathbf{1} - b_1 x\|^2 = \|y - Xb\|^2$ where $\|\cdot\|$ denotes the Euclidean norm, or length of the vector. It is then clear that SSE is minimized by the orthogonal projection of $y$ on the plane spanned by $\mathbf{1}$ and $x$. The estimate of $\beta = (\beta_0, \beta_1)^t$ is therefore

$$\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1)^t = (X^t X)^{-1} X^t y \tag{2.11}$$

and the projection, which is the vector of estimated values $\hat{y}_i$, is given by

$$\hat{y} = (\hat{y}_1, \ldots, \hat{y}_n)^t = X (X^t X)^{-1} X^t y \tag{2.12}$$

Defining the measure of the total variability of $y$, $SST = \|y - \bar{y}\mathbf{1}\|^2$ (total sum of squares), and the quantities $SSR = \|\hat{y} - \bar{y}\mathbf{1}\|^2$ (regression sum of squares = variability due to the fact that the fitted line is not horizontal) and $SSE = \|y - \hat{y}\|^2$ (error sum of squares, variability unexplained by the regression line), we have by Pythagoras

$$SST = SSR + SSE \tag{2.13}$$

The proportion of variability explained by the regression line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ is therefore

$$R^2 = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} = \frac{\|\hat{y} - \bar{y}\mathbf{1}\|^2}{\|y - \bar{y}\mathbf{1}\|^2} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}. \tag{2.14}$$

By definition, $0 \le R^2 \le 1$, and $R^2 = 1$ if and only if $y_i = \hat{y}_i$ (i.e. all points are on the regression line). Moreover, for simple regression we also have $R^2 = r^2$. The advantage of defining $R^2$ as above (instead of via $r^2$) is that the definition remains valid for the multiple regression model (see below), i.e. when several explanatory variables are available. Finally, note that an estimate of $\sigma^2$ is obtained by $\hat{\sigma}^2 = (n - 2)^{-1}\sum r_i^2(\hat{\beta}_0, \hat{\beta}_1)$.

In analogy to the sample mean and the sample variance, the least squares estimates of the regression parameters are sensitive to the presence of outliers. Outliers in regression can occur in the $y$-variable as well as in the $x$-variable. The latter are also called influential points. Outliers may often be correct and in fact very interesting observations (e.g. telling us that the assumed model may not be correct). However, since least squares estimates are highly influenced by outliers, it is often difficult to notice that there may be a problem, since the fitted curve tends to lie close to the outliers. Alternative, robust estimates can be helpful in such situations (see Huber 1981, Hampel et al. 1986). For instance, instead of minimizing the residual sum of squares we may minimize $\sum \rho(r_i)$ where $\rho$ is a bounded function. If $\rho$ is differentiable, then the solution can usually also be found by solving the equations

$$\sum_{i=1}^{n} \psi\left(\frac{r_i}{\hat{\sigma}}\right)\frac{\partial}{\partial b_j} r_i(b) = 0 \quad (j = 0, \ldots, p) \tag{2.15}$$

where $\psi$ denotes the derivative of $\rho$, $\hat{\sigma}^2$ is a robust estimate of $\sigma^2$ obtained from an additional equation and $p$ is the number of explanatory variables. This leads to estimates that are (up to a certain degree) robust with respect to outliers in $y$, not however with respect to influential points (outliers in $x$). To control the effect of influential points one can, for instance, solve a set of equations

$$\sum_{i=1}^{n} \psi_j\left(\frac{r_i}{\hat{\sigma}}, x_i\right) = 0 \quad (j = 0, \ldots, p) \tag{2.16}$$

where $\psi$ is such that it downweighs outliers in $x$ as well. For a comprehensive theory of robustness see e.g. Huber (1981), Hampel et al. (1986). For more recent, efficient and highly robust methods see Yohai (1987), Rousseeuw and Yohai (1984), Gervini and Yohai (2002), and references therein.
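To make the idea behind (2.15) concrete, the following Python sketch fits a straight line by an M-estimate and compares it with least squares on toy data containing one outlier in $y$. Several ingredients are assumptions made only for this illustration and are not taken from the text: Huber's $\psi$ function with tuning constant 1.345 (its $\psi$ is bounded although the corresponding $\rho$ is not), the normalized median absolute deviation as robust scale $\hat{\sigma}$, and iteratively reweighted least squares as the solution method. The function names and the data are hypothetical. As noted above, such an estimate protects against outliers in $y$, not against influential points.

```python
import math
import statistics


def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def huber_line(x, y, c=1.345, n_iter=50):
    """M-estimate of a line by iteratively reweighted least squares.

    Assumptions for this sketch: Huber's psi and a MAD-based scale.
    """
    b0, b1 = wls_line(x, y, [1.0] * len(x))  # start from ordinary least squares
    for _ in range(n_iter):
        res = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        med = statistics.median(res)
        # Robust scale: normalized median absolute deviation of the residuals.
        scale = 1.4826 * statistics.median([abs(r - med) for r in res])
        if scale == 0.0:
            break
        # Huber weights psi(u)/u: 1 for small residuals, c/|u| for large ones.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in res]
        b0, b1 = wls_line(x, y, w)
    return b0, b1


# Toy data: line 2 + 3x with small deterministic noise and one outlier in y.
x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi + 0.1 * math.cos(3.0 * xi) for xi in x]
y[8] += 50.0
ols = wls_line(x, y, [1.0] * len(x))
rob = huber_line(x, y)
```

In this toy example the least squares slope is pulled strongly towards the outlier, whereas the robust slope stays close to the slope 3 of the underlying line.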

The results for simple linear regression can be extended easily to the case where more than one explanatory variable is available. The multiple linear regression model with $p$ explanatory variables is defined by $y = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p + \varepsilon$. For data we write $y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip} + \varepsilon_i$ ($i = 1, \ldots, n$). Note that the word "linear" refers to linearity in the parameters $\beta_0, \ldots, \beta_p$. The function itself can be nonlinear. For instance, we may have polynomial regression with $y = \beta_0 + \beta_1 x + \ldots + \beta_p x^p + \varepsilon$. The same geometric arguments as above apply so that (2.11) and (2.12) hold with $\beta = (\beta_0, \ldots, \beta_p)^t$, and the $n \times (p + 1)$ matrix $X = (x^{(1)}, \ldots, x^{(p+1)})$ with columns $x^{(1)} = \mathbf{1}$ and $x^{(j+1)} = x_j = (x_{1j}, \ldots, x_{nj})^t$ ($j = 1, \ldots, p$).
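The least squares formulas (2.11) to (2.14) can be illustrated with a short Python sketch for polynomial regression of degree 2. The helper names `solve` and `least_squares` and the toy data are hypothetical. Instead of forming $(X^t X)^{-1}$ explicitly, the sketch solves the normal equations $(X^t X)\beta = X^t y$ by Gaussian elimination, which is an implementation choice made here and not something prescribed by the text.

```python
def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z


def least_squares(X, y):
    """Return beta_hat solving the normal equations (X^t X) beta = X^t y."""
    p1 = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p1)] for j in range(p1)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p1)]
    return solve(xtx, xty)


# Toy data, roughly following 1 + 2 x^2; design matrix with columns 1, x, x^2.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 2.9, 9.2, 18.8, 33.1, 50.9]
X = [[1.0, xi, xi ** 2] for xi in x]
beta = least_squares(X, y)
fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]

ybar = sum(y) / len(y)
sst = sum((yi - ybar) ** 2 for yi in y)                 # total sum of squares
ssr = sum((fi - ybar) ** 2 for fi in fitted)            # regression sum of squares
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
r2 = ssr / sst
```

Up to rounding error, `sst` equals `ssr + sse`, which is the decomposition (2.13); it relies on the column of ones being part of the design matrix.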

    Regression smoothing

A more general, but more difficult, approach to modeling a functional relationship is to impose less restrictive assumptions on the function $g$. For instance, we may assume

$$y = g(x) + \varepsilon \tag{2.17}$$

with $g$ being a twice continuously differentiable function. Under suitable additional conditions on $x$ and $\varepsilon$ it is then possible to estimate $g$ from observed data by nonparametric smoothing. As a special example consider observations $y_i$ taken at time points $i = 1, 2, \ldots, n$. A standard model is

$$y_i = g(t_i) + \varepsilon_i \tag{2.18}$$

where $t_i = i/n$, and the $\varepsilon_i$ are independent identically distributed (iid) random variables with $E(\varepsilon_i) = 0$ and $\sigma^2 = var(\varepsilon_i) < \infty$. The reason for using standardized time $t_i \in [0, 1]$ is that this way $g$ is observed on an increasingly fine grid. This makes it possible to ultimately estimate $g(t)$ for all values of $t$ by using neighboring values $t_i$, provided that $g$ is not too "wild". A simple estimate of $g$ can be obtained, for instance, by a weighted average (kernel smoothing)

$$\hat{g}(t) = \sum_{i=1}^{n} w_i y_i \tag{2.19}$$

with suitable weights $w_i \ge 0$, $\sum w_i = 1$. For example, one may use the Nadaraya-Watson weights

$$w_i = w_i(t; b, n) = \frac{K\left(\frac{t - t_i}{b}\right)}{\sum_{j=1}^{n} K\left(\frac{t - t_j}{b}\right)} \tag{2.20}$$

with $b > 0$, and a kernel function $K \ge 0$ such that $K(u) = K(-u)$, $K(u) = 0$ ($|u| > 1$) and $\int_{-1}^{1} K(u)\,du = 1$. The role of $b$ is to restrict observations that influence the estimate to a small window of neighboring time points. For instance, the rectangular kernel $K(u) = \frac{1}{2}\mathbf{1}\{|u| \le 1\}$ yields the sample mean of observations $y_i$ in the window $n(t - b) \le i \le n(t + b)$. An even more elegant formula can be obtained by approximating the Riemann sum $\frac{1}{nb}\sum_{j=1}^{n} K\left(\frac{t - t_j}{b}\right)$ by the integral $\int_{-1}^{1} K(u)\,du = 1$:

$$\hat{g}(t) = \sum_{i=1}^{n} w_i y_i = \frac{1}{nb}\sum_{i=1}^{n} K\left(\frac{t - t_i}{b}\right) y_i \tag{2.21}$$

In this case, the sum of the weights is not exactly equal to one, but asymptotically (as $n \to \infty$ and $b \to 0$ such that $nb^3 \to \infty$) this error is negligible. It can be shown that, under fairly general conditions on $g$ and $\varepsilon$, $\hat{g}$ converges to $g$, in a certain sense that depends on the specific assumptions (see e.g. Gasser and Müller 1979, Gasser and Müller 1984, Härdle 1991, Beran and Feng 2002, Wand and Jones 1995, and references therein).
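The kernel estimate (2.19) with Nadaraya-Watson weights (2.20) can be sketched as follows. The function names are hypothetical, and the one-dimensional Epanechnikov kernel $K(u) = \frac{3}{4}(1 - u^2)$ on $[-1, 1]$ is an assumed choice for this illustration; it satisfies the stated conditions on $K$ but is not the kernel used in the text's example.

```python
def epanechnikov(u):
    """Kernel K(u) = 0.75 * (1 - u^2) on [-1, 1]; symmetric, integrates to one."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nw_smooth(t, times, y, b, kernel=epanechnikov):
    """Nadaraya-Watson estimate of g(t) with bandwidth b, as in (2.19)-(2.20)."""
    k = [kernel((t - ti) / b) for ti in times]
    total = sum(k)
    if total == 0.0:
        raise ValueError("no observations inside the window; increase b")
    # Dividing by the kernel sum makes the weights add up to one exactly.
    return sum(ki * yi for ki, yi in zip(k, y)) / total


# Toy data on the standardized grid t_i = i / n with g(t) = 2t and no noise.
n = 100
times = [i / n for i in range(1, n + 1)]
y = [2.0 * ti for ti in times]
est = nw_smooth(0.5, times, y, b=0.1)
```

At an interior point the window is symmetric around $t$, so a linear $g$ is reproduced (here $g(0.5) = 1$); near $t = 0$ or $t = 1$ the window becomes one-sided and the same estimate would be biased.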

An alternative to kernel smoothing is local polynomial fitting (Fan and Gijbels 1995, 1996; also see Feng 1999). The idea is to fit a polynomial locally, i.e. to data in a small neighborhood of the point of interest. This can be formulated as a weighted least squares problem as follows:

$$\hat{g}(t) = \hat{\beta}_0 \tag{2.22}$$

where $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p)^t$ solves a local least squares problem defined by

$$\hat{\beta} = \arg\min_{a} \sum K\left(\frac{t_i - t}{b}\right) r_i^2(a). \tag{2.23}$$

Here $r_i = y_i - [a_0 + a_1(t_i - t) + \ldots + a_p(t_i - t)^p]$, $K$ is a kernel as above and $b > 0$ is the bandwidth defining the window of neighboring observations. It can be shown that asymptotically, a local polynomial smoother can be written as a kernel estimator (Ruppert and Wand 1994). A difference only occurs at the borders ($t$ close to 0 or 1) where, in contrast to the local polynomial estimate, the kernel smoother has to be modified. The reason is that observations are no longer symmetrically spaced in the window $t \pm b$. A major advantage of local polynomials is that they automatically provide estimates of derivatives, namely $\hat{g}'(t) = \hat{\beta}_1$, $\hat{g}''(t) = 2\hat{\beta}_2$ etc. Kernel smoothing can also be used for estimation of derivatives; however, different (and rather complicated) kernels have to be used for each derivative (Gasser and Müller 1984, Gasser et al. 1985). A third alternative, so-called wavelet thresholding, will not be discussed here (see e.g. Daubechies 1992, Donoho and Johnstone 1995, 1998, Donoho et al. 1995, 1996, Vidakovic 1999, and Percival and Walden 2000 and references therein). A related method based on wavelets is discussed in Chapter 5.

Smoothing of two-dimensional distributions, sharpening

Estimating a relationship between $x$ and $y$ (where $x$ and $y$ are realizations of random variables $X$ and $Y$ respectively) amounts to estimating the joint two-dimensional distribution function $F(x, y) = P(X \le x, Y \le y)$. For continuous variables with $F(x, y) = \int_{u \le x}\int_{v \le y} f(u, v)\,du\,dv$, the density function $f$ can be estimated, for instance, by a two-dimensional histogram. For visual and theoretical reasons, a better estimate is obtained by kernel estimation (see e.g. Silverman 1986) defined by

$$\hat{f}(x, y) = \frac{1}{n b_1 b_2}\sum_{i=1}^{n} K(x_i - x, y_i - y; b_1, b_2) \tag{2.24}$$

where the kernel $K$ is such that $K(u, v) = K(-u, v) = K(u, -v) \ge 0$, and $\int\int K(u, v)\,du\,dv = 1$. Usually, $b_1 = b_2 = b$ and $K(u, v)$ has compact support. Examples of kernels are $K(u, v) = \frac{1}{4}\mathbf{1}\{|u| \le 1\}\mathbf{1}\{|v| \le 1\}$ (rectangular kernel with rectangular support), $K(u, v) = \pi^{-1}\mathbf{1}\{u^2 + v^2 \le 1\}$ (rectangular kernel with circular support), $K(u, v) = 2\pi^{-1}[1 - u^2 - v^2]$ for $u^2 + v^2 \le 1$ (Epanechnikov kernel with circular support) or $K(u, v) = (2\pi)^{-1}\exp[-\frac{1}{2}(u^2 + v^2)]$ (normal density kernel with infinite support). In analogy to one-dimensional density estimation, it can be shown that under mild regularity conditions, $\hat{f}(x, y)$ is a consistent estimate of $f(x, y)$, provided that $b_1, b_2 \to 0$, and $nb_1, nb_2 \to \infty$.

Graphical representations of two-dimensional distribution functions are
• 3-dimensional perspective plot: $z = f(x, y)$ (or $\hat{f}(x, y)$) is plotted against $x$ and $y$;
• contour plot: like in a geographic map, curves corresponding to equal levels of $f$ are drawn in the $x$-$y$-plane;
• image plot: coloring of the $x$-$y$-plane with the color at point $(x, y)$ corresponding to the value of $f$.

A simple way of enhancing the visual understanding of scatterplots is so-called sharpening (Tukey and Tukey 1981; also see Chambers et al. 1983): for given numbers $a$ and $b$, only points with $a \le \hat{f}(x, y) \le b$ are drawn in the scatterplot. Alternatively, one may plot all points and highlight points with $a \le \hat{f}(x, y) \le b$.

Interpolation

Often a process may be generated in continuous time, but is observed at discrete time points. One may then wish to guess the values of the points in between. Kernel and local polynomial smoothing provide this possibility, since $\hat{g}(t)$ can be calculated for any $t \in (0, 1)$. Alternatively, if the observations are assumed to be completely without error, i.e. $y_i = g(t_i)$, then deterministic interpolation can be used. The most popular method is spline interpolation. For instance, cubic splines connect neighboring observed values $y_{i-1}, y_i$ by cubic polynomials such that the first and second derivatives at the endpoints $t_{i-1}, t_i$ are equal. For observations $y_1, \ldots, y_n$ at equidistant time points $t_i$ with $t_i - t_{i-1} = t_j - t_{j-1} = \Delta t$ ($i, j = 1, \ldots, n$), we have $n - 1$ polynomials

$$p_i(t) = a_i + b_i(t - t_i) + c_i(t - t_i)^2 + d_i(t - t_i)^3 \quad (i = 1, \ldots, n - 1) \tag{2.25}$$

To achieve smoothness at the points $t_i$ where two polynomials $p_{i-1}, p_i$ meet, one imposes the condition that the polynomials and their first two derivatives are equal at $t_i$. This together with the conditions $p_i(t_i) = y_i$ leads to a system of $3(n - 2) + n = 4(n - 1) - 2$ equations for $4(n - 1)$ parameters $a_i, b_i, c_i, d_i$ ($i = 1, \ldots, n - 1$). To specify a unique solution one therefore needs two additional conditions at the border. A typical assumption is $p''(t_1) = p''(t_n) = 0$, which defines so-called natural splines. Cubic splines have a physical meaning, since these are the curves that form when a thin rod is forced to pass through $n$ knots (in our case the knots are $t_1, \ldots, t_n$), corresponding to minimum strain energy. The term "spline" refers to the thin flexible rods that were used in the past by draftsmen to draw smooth curves in ship design. In spite of their "natural" meaning, interpolation splines (and similarly other methods of interpolation) can be problematic since the interpolated values may be highly dependent on the specific method of interpolation and are therefore purely hypothetical, unless the aim is indeed to build a ship.
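The construction of a natural cubic spline can be sketched in Python as follows. The code returns the coefficients $a_i, b_i, c_i, d_i$ of the pieces in (2.25), using 0-based list indices. The function names and toy data are hypothetical, and the tridiagonal elimination used to solve for the $c_i$ is a standard textbook scheme assumed here; the text does not prescribe a particular algorithm.

```python
def natural_cubic_spline(t, y):
    """Coefficients (a, b, c, d) of the pieces in (2.25), natural boundary."""
    n = len(t) - 1  # number of polynomial pieces
    h = [t[i + 1] - t[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    # Tridiagonal system for the c_i; c = 0 at both ends (zero second derivative).
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (t[i + 1] - t[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    c = [0.0] * (n + 1)
    b = [0.0] * n
    d = [0.0] * n
    for j in range(n - 1, -1, -1):  # back substitution
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (y[j + 1] - y[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])
    return list(y[:n]), b, c[:n], d


def evaluate(t, coef, s):
    """Evaluate the spline at s, assuming t[0] <= s <= t[-1]."""
    a, b, c, d = coef
    i = 0
    while i < len(a) - 1 and s > t[i + 1]:
        i += 1
    u = s - t[i]
    return a[i] + u * (b[i] + u * (c[i] + u * d[i]))


# Toy data at equidistant knots.
knots = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [0.0, 1.0, 0.0, 1.0, 0.0]
coef = natural_cubic_spline(knots, values)
at_knots = [evaluate(knots, coef, s) for s in knots]
mid_value = evaluate(knots, coef, 1.5)
```

The spline passes through all observed values, while a value such as `mid_value` between the knots depends on the chosen boundary conditions, which illustrates the caveat that interpolated values are hypothetical.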

Splines can also be used for smoothing purposes by removing the restriction that the curve has to go through all observed points. More specifically, one looks for a function $\hat{g}(t)$ such that

$$V(\lambda) = \sum_{i=1}^{n} (y_i - \hat{g}(t_i))^2 + \lambda \int [\hat{g}''(t)]^2\,dt \tag{2.26}$$

is minimized. The parameter $\lambda > 0$ controls the smoothness of the resulting curve. For small values of $\lambda$, the fitted curve will be rather rough but close to the data; for large values more smoothness is achieved but the curve is, in general, not as close to the data. The question of which $\lambda$ to choose reflects a standard dilemma in statistical smoothing: one needs to balance the aim of achieving a small bias ($\lambda$ small) against the aim of a small variance ($\lambda$ large). For a given value of $\lambda$, the solution to the minimization problem above turns out to be a natural cubic spline (see Reinsch 1967; also see