Motivation Aggregating Signals Dutch Pronunciation Geographic Projections Geographic Musings
How Geography Influences Language.
John Nerbonne, j.nerbonne@rug
Center for Language and Cognition, University of Groningen
Linguistisches Forschungskolloquium, Universität Zürich
14 Apr. 2011
John Nerbonne j.nerbonne@rug 1/34
Overview
Old problems in dialectology
  Massive variation
  Counterindicating signals
Aggregating signals (dialectometry)
  Levenshtein distance
Examining the influence of geography
  Understanding Seguy’s curve (distance)
  Incorporating areas
Proceeding more geographically (ideas)
One old problem in dialectology
Pronunciations are very variable: 87 different pronunciations of ich in the PAD
[Slide lists the 87 phonetic transcriptions of ich]
A second old problem in dialectology
We receive noisy signals of provenance.
Maps: front/low V in Haus; [p] (dark) vs. [pf]; [t] vs. [ts]; [k] vs. [x(c)]
“non-overlapping isoglosses”
Isoglosses seldom overlap
Maps: aggregate 2nd shift; [S] (dark) vs. s (non-initially); [z] (dark) vs. [s] (initially); N d/t (dark) vs. deletion
Isoglosses seldom overlap, more
Maps: apical [r] (dark) vs. uvular r; final [n] drop (dark) vs. retention; medial [t] vs. s; init. lenited /g/
Why dialectometry?
Strengthen geographic signals by aggregating
Solve problems of earlier dialectology
  Non-overlapping distributions
  Selection of features too arbitrary
  “Atomism” (Coseriu), idiosyncratic words (Bloomfield)
Introduce replicable procedures
  Following Seguy, Goebl, Schiltz, Kretzschmar, Shackleton, ...
Seeking law-like relations in linguistic variation
  Sublinear distributions of linguistic variation vs. geography
Calculating dialect distances
To determine the aggregate distance between dialects:
We determine the distance between each dialect pair for every single linguistic element (in the sample, e.g. a dialect atlas)
  Perhaps just same (0) vs. different (1)
  ... but we’ve developed more sensitive measures (below)
We sum these distances for every element (hundreds of them)
Immediate result: place × place table of dialect differences
  Seguy (1971), Goebl (1980s and on), many others
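The aggregation step can be sketched in a few lines of Python. This is only an illustration of the same (0) vs. different (1) measure summed over items. The site names and word forms in `atlas` are invented, and the helper names `item_distance` and `aggregate_distance` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy atlas: site -> the variant recorded for each item.
# Site names and word forms are invented purely for illustration.
atlas = {
    "A": ["melk", "huis", "ik"],
    "B": ["molke", "huis", "ik"],
    "C": ["molke", "hoes", "ich"],
}

def item_distance(x, y):
    """Categorical measure: 0 if two sites share a variant, 1 otherwise."""
    return 0 if x == y else 1

def aggregate_distance(site1, site2):
    """Sum the per-item distances over every item in the sample."""
    return sum(item_distance(x, y) for x, y in zip(atlas[site1], atlas[site2]))

# Place x place table of aggregate differences (symmetric, zero diagonal).
sites = sorted(atlas)
table = {s: {t: 0 for t in sites} for s in sites}
for s, t in combinations(sites, 2):
    table[s][t] = table[t][s] = aggregate_distance(s, t)

for s in sites:
    print(s, [table[s][t] for t in sites])
```

Swapping `item_distance` for a more sensitive measure, such as a string edit distance, leaves the rest of the aggregation unchanged.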
Aside: more sensitive pronunciation distance measure
Levenshtein distance enables analysis of phonetic transcriptions without manual alignment
  A move from categorical to numerical analysis of data
One of the most successful methods to determine sequence distance (Levenshtein, 1964)
  biological molecules, software engineering, ...
Levenshtein distance: minimum number of insertions, deletions and substitutions to transform one string into the other
Syllabicity constraint added: vowels never substitute for consonants
Example of the Levenshtein distance
mO@lk@   delete @     1
mOlk@    subst. O/E   1
mElk@    delete @     1
mElk     insert @     1
mEl@k
         Total        4

Alignment:
m  O  @  l     k  @
m  E     l  @  k
   1  1     1     1
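The example can be checked with a small dynamic-programming sketch. It uses unit costs and treats the syllabicity constraint as an infinite cost for vowel/consonant substitutions. The vowel set `VOWELS` is an assumption made for the ASCII transcription of this example, not the segment table used in the actual studies.

```python
# Assumed vowel inventory for the ASCII transcription used in the example
# (O, E, @ stand for vowels); a real study would use a full phonetic table.
VOWELS = set("aeiouyAEIOUY@")

def substitution_cost(a, b):
    """0 for identical segments, 1 for vowel/vowel or consonant/consonant,
    infinity for vowel/consonant pairs (the syllabicity constraint)."""
    if a == b:
        return 0
    if (a in VOWELS) != (b in VOWELS):
        return float("inf")
    return 1

def levenshtein(source, target):
    """Minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(target) + 1))  # distances from the empty prefix
    for i, s in enumerate(source, start=1):
        current = [i]
        for j, t in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,                            # delete s
                current[j - 1] + 1,                         # insert t
                previous[j - 1] + substitution_cost(s, t),  # substitute s by t
            ))
        previous = current
    return previous[-1]

print(levenshtein("mO@lk@", "mEl@k"))  # prints 4
```

Because a vowel can never be substituted for a consonant, a pair such as "a" and "k" costs one deletion plus one insertion.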
Example
Based on Dutch pronunciation data from the Goeman-Taeldeman-Van Reenen-Project (GTRP; Goeman and Taeldeman, 1996)
We use 562 words for 424 varieties in the Netherlands
  Wieling, Heeringa & Nerbonne (2007) An Aggregate Analysis of Pronunciation in the Goeman-Taeldeman-van Reenen-Project Data. In: Taal en Tongval 59(1), 84-116
Calculating Levenshtein distances yields interesting sound correspondences contained in the alignments (more on that later)
  Note that a 100-word comparison already yields about 500 sound correspondences
Distribution of sites
Analytical steps
Obtain the distances between each of the ≈ 90,000 pairs of varieties
  n.b. this involves 500 × 5² segment comparisons per pair, ≈ 1.1 × 10⁹ segment comparisons in total
Organize these in a 400 × 400 table
Seek groups (dialect areas) or continuum-like relations, e.g. by applying clustering or multi-dimensional scaling, respectively
Note that no attention has been paid to geography thus far!
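The orders of magnitude can be reproduced with a little arithmetic. The sketch below reads the slide's figures as roughly 500 words, each compared in a 5 × 5 table of segment comparisons, for every pair of the 424 varieties. The average word length of five segments is an assumption inferred from the numbers, not something the slides state.

```python
from math import comb

varieties = 424          # varieties in the GTRP sample
words = 500              # round figure used on the slide (562 words in the data)
segments_per_word = 5    # assumed average word length in segments

pairs = comb(varieties, 2)                   # unordered pairs of varieties
per_pair = words * segments_per_word ** 2    # segment comparisons per pair
total = pairs * per_pair

print(pairs)             # 89676, i.e. about 90,000
print(f"{total:.2e}")    # 1.12e+09
```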
Multi-Dimensional Scaling
[MDS plot with labelled groups: Frisian; Frisian cities, Het Bildt; Westerkwartier; Stellingwerf; Low Saxon; Central Gelderland; Dutch Low Franconian; Flemish Low Franconian]
MDS dimensions → colors, projected to map
Interpolated, interpreted maps
Noisy Clustering
[Figure: noisy-clustering dendrogram of the German sites, with a support percentage on each branch]
Seeks groups in data, enabling comparison to older dialectology which sought areas
Only bootstrap (or noisy) clustering to avoid instability
Projecting groups to geography
[Map of the cluster groups with reference towns labelled: Den Burg, Schiermonnikoog, Oosterend, Leeuwarden, Grouw, Groningen, Heerhugowaard, Haarlem, Delft, Staveren, Steenwijk, Urk, Hattem, Amersfoort, Assen, Emmen, Itterbeck, Lochem, Brugge, Veurne, Middelburg, Gent, Vianen, Zevenbergen, Kalmthout, Mechelen, Groesbeek, Helmond, Venlo, Overpelt, Roeselare, Steenbeek, Geraardsbergen, Tienen, Kerkrade, Aubel]
Large body of dialectometric work—positive aspects
Dutch, German, American English, Norwegian, Swedish, Afrikaans, Sardinian, Tuscan, Catalan, Sino-Tibetan, Chinese, Bulgarian, Bantu, Central Asian (Turkic & Indo-Iranian), ...
Development of consistency measure (Cronbach’s α) indicating whether data set is sufficiently large
Novel reflection, work on validation aimed at assessing degree of detection of SIGNALS OF PROVENANCE
  Gooskens & Heeringa (2004) Perceptive Evaluation of Levenshtein Dialect Distance Measurements using Norwegian Dialect Data. Language Variation and Change 16(3), 189-207.
Criticisms of dialectometry, esp. Levenshtein-based work
Measure is too insensitive, 0/1 segment differences
Too little attention to phonetic/phonological conditioning
Too reliant on transcription: what about acoustics?
Where is the sociolinguistics? Isn’t variationist linguistics mostly about sociolinguistics?
“Distance-based” methods yield too little insight into the linguistic basis of differences (concrete differences lost in the aggregate sums)
  The hint is that it may be all smoke & mirrors
So what? Isn’t this all just confirming what we knew earlier?
... progress on all fronts, but presentation would take too long; question and discussion period for those interested
The Influence of Geography
Regression design
Dependent variable: varietal distance, as measured by aggregate categorical distance or Levenshtein distance
Independent variable: geographical distance, regarded as an operationalization of the chance of social contact
Statistical cautions:
  1 correlations involving averages are inflated, but we’re interested in the entire varieties (dialects)
  2 distances are not independent, so significance may be inflated; remedy: Mantel tests
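A Mantel test addresses the second caution by permuting site labels rather than treating the pairwise distances as independent observations. The following is a minimal pure-Python sketch, not the procedure used in the cited studies. The toy coordinates, the square-root relation and the function names `pearson` and `mantel` are assumptions for illustration.

```python
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def mantel(ling, geo, permutations=999, seed=0):
    """Correlate two square distance matrices and estimate a one-sided
    p-value by permuting the site labels of one matrix."""
    n = len(ling)
    pairs = list(combinations(range(n), 2))
    geo_vec = [geo[i][j] for i, j in pairs]
    observed = pearson([ling[i][j] for i, j in pairs], geo_vec)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel the sites, rows and columns together
        permuted = [ling[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, geo_vec) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Invented toy data: five sites on a line; linguistic distance grows with
# geographic distance but levels off (square root).
positions = [0, 10, 25, 60, 100]
geo = [[abs(a - b) for b in positions] for a in positions]
ling = [[abs(a - b) ** 0.5 for b in positions] for a in positions]

r, p = mantel(ling, geo)
print(round(r, 2), p)
```

With only five sites there are just 120 distinct relabellings, so the p-value is coarse; real data sets with hundreds of sites give a much finer estimate.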
Inspiration: Jean Seguy
Seguy (1971) La relation entre la distance spatiale et la distance lexicale. Revue de Linguistique Romane 35(138), 335-357:
Aggregate variation increases sublinearly with respect to geography
[Figure from Seguy (1971): mean curve (courbe moyenne), Y = 36 √log(x + 1)]
Sublinear spread is general
[Figure: linguistic distance plotted against geographic distance in six panels: Bantu, Bulgaria, Germany, LAMSAS / Lowman, The Netherlands, Norway]
Aside: Trudgill’s “Gravity hypothesis”
[Illustration: solar system (Sun, Venus, Earth, Moon, Mars, Phobos, Deimos)]
According to Trudgill (1972) diffusion follows an inverse square
law, with the consequence that linguistic distance should likewise
increase with the square of the distance. Population size plays
the role of mass.
Trudgill’s “Gravity hypothesis”
Sublinear aggregate relation incompatible with a quadratic influence
  J. Nerbonne (to appear) Measuring the Diffusion of Linguistic Change. Phil. Transactions of the Royal Society B
How much does distance influence language?
Area            Corr.(l, geo)   r²
Gabon Bantu     0.47            0.22
Bulgaria        0.49            0.24
Germany         0.57            0.32
Eastern U.S.    0.51            0.26
Netherlands     0.62            0.38
Norway          0.41            0.16

Norwegian ling. dist. correlates better with travel time in 1900 (r = 0.54)
  Gooskens (2005) Dialectologia et Geolinguistica 13.
Very primitive geography!
Geography influence on language
Geography accounts for 22-38% of aggregate linguistic variation.
General (sublinear) characterization of relation between geographical distance and linguistic differences
Like population geneticists’ “isolation by distance” (Wright, 1943; Malecot, 1955)
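One way to see what "sublinear" and "accounts for" mean here is to fit both a linear and a logarithmic model and compare the share of variance each explains (r²). The sketch below does this on invented data. The distances, coefficients and the helper `fit_line` are hypothetical and stand in for real dialect measurements.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    a = my - b * mx
    return a, b, sxy ** 2 / (sxx * syy)

# Invented data for illustration only: distances in km and a linguistic
# distance that rises quickly at first and then flattens out.
geo = [5, 10, 20, 40, 80, 160, 320]
ling = [0.02 + 0.01 * math.log(d) for d in geo]

_, _, r2_linear = fit_line(geo, ling)
a, b, r2_log = fit_line([math.log(d) for d in geo], ling)

print(f"linear model r2 = {r2_linear:.2f}")
print(f"log model    r2 = {r2_log:.2f} (a = {a:.3f}, b = {b:.3f})")
```

On real data neither model fits perfectly; the point is only that a logarithmic predictor captures a flattening curve better than raw distance does.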
Wrede’s (1926-56) German Dialect Areas
[Map of Wrede’s German dialect areas with the sampled sites labelled, from Aachen to Zellingen]
Influence of Dialect Areas?
We add to the regression design variables indicating whether two varieties belong to the same or different dialect areas.
Results: Dialect areas contribute substantially to the explanation of aggregate linguistic distance. r² increases from about 32% (based only on geographic distance) to about 47% (based on geographic distance and areal differences).
  John Nerbonne (submitted, 2010) How much does Geography Influence Language Variation? Auer et al. (eds.) Proc. of the Freiburg (FRIAS) language and space workshops. Mouton de Gruyter: Berlin.
Generalized Additive Models
Simon N. Wood (2006) Generalized Additive Models: An Introduction with R
Allows regression using combination of predictors, e.g. longitude and latitude
A more sophisticated notion of geography (than simple distance)
But I do not understand all the math!
LingDist ∼ S(long.lat)
[Contour plot of the fitted smooth over longitude (3.5 to 7.0) and latitude (51.0 to 53.5), with contour levels from −0.2 to 0.15]
How could Geographers help?
Alternative perspectives on geographic explanation
  Do you have other means of exploring geographic explanations?
Tools
  Is there good, easy-to-use simulation software, e.g. to examine questions of diffusion?
Access to potentially explanatory demographic data
  Age
  Educational level
  “Locality”: percentage born and raised within k kilometers
  Population density
How does geography influence language variation?
Pure distance models explain 22% - 38% of aggregate linguistic variation.
Areal distinctions are somewhat collinear, but nonetheless add substantially to simple models, perhaps as much as 50% (moving 30% to 45%, for example).
Naturally, there is also subdialectal variation (social, sexual, individual), but few systematic data collections.
Emerging questions:
  What is the linguistic structure of the dialect differences we find?
  Do typological constraints play a (confounding) role?
  Can we tease apart geographical and historical explanations, and how?
Try Gabmap! www.gabmap.nl
Questions?
Thank You!