
Introduction
Data Preparation
Language Variation Suite
Working with Data
Visual Analytics
Inferential Analysis
Data Modification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
Data Modification
References

Optimizing Language Variation Analysis: Language Variation Suite

Olga Scrivner, Manuel Díaz-Campos and Rafael Orozco

Indiana University and Louisiana State University

NWAV45, 2016


Goal

Provide researchers with a variety of quantitative methods to advance language variation studies.

[Figure: conditional tree plot for word order (VO vs. OV). Split variables: PositionSentence (p < 0.001; ind, pre vs. post), Heaviness (p = 0.003; ≤ 1 vs. > 1), Period (p < 0.001; ≤ 1 vs. > 1 and ≤ 2 vs. > 2), Focus (p < 0.001; cf vs. nf), Main_Verb_Structure (p < 0.001; ACI vs. Other, Restructuring). Terminal nodes: 4 (n = 81), 5 (n = 119), 6 (n = 181), 8 (n = 221), 10 (n = 66), 12 (n = 43), 13 (n = 265); each panel shows VO/OV proportions on a 0-1 scale.]


Objectives

1 Introduce a novel sociolinguistic toolkit

2 Develop practical quantitative analytical skills

3 Understand and interpret advanced statistical models


What is LVS?

Language Variation Suite

It is a Shiny web application designed for data analysis in sociolinguistic research.

It can be used for:

Processing spreadsheet data

Reporting in tables and graphs

Analyzing means, regression, conditional trees and much more


Background

LVS is built in R using the Shiny package:

1 R - a free programming language for statistical computing and graphics

2 Shiny App - a web application framework for R

Computational power of R + Web interactivity


http://littleactuary.github.io/blog/Web-application-framework-with-Shiny/


Data Preparation

Important things to consider before data entry:

File format:

Comma separated value (CSV) facilitates faster processing

Excel format will slow processing

Column names should not contain spaces

Permitted: non-accented characters, numbers, underscore, hyphen, and period

One column must contain your dependent variable

The rest of the columns contain independent variables
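The column-name rules above can be checked before upload. What follows is a minimal Python sketch and not part of LVS: the function name `check_header` and the sample header are hypothetical, and the pattern only encodes the permitted characters listed on this slide (unaccented letters, numbers, underscore, hyphen, period).

```python
import csv
import io
import re

# Permitted characters per the slide: unaccented letters, digits,
# underscore, hyphen, and period. Spaces are not allowed.
VALID_NAME = re.compile(r"[A-Za-z0-9_.-]+")

def check_header(csv_text):
    """Return the column names that break the naming rules."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return [name for name in header if not VALID_NAME.fullmatch(name)]

# Hypothetical file content: two names are fine, two are not.
sample = "r_use,store,lexical item,emphasís\nretention,Saks,floor,normal\n"
bad_columns = check_header(sample)
print(bad_columns)
```

Renaming the flagged columns (for example `lexical item` to `lexical_item`) before saving the CSV should avoid the problem.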


Workspace

Browser

Chrome, Firefox, Safari - recommended

Internet Explorer may cause instability issues

Accessibility

PC, Mac, Linux

Data files can be uploaded from any location on your computer

Smart Phone

Data files must be on a cloud platform connected to your phone account (e.g. Dropbox)


Terminology Review

a. Categorical - non-numerical data with two values

yes - no; deletion - retention; perfective - imperfective

b. Continuous - numerical data

duration, age, chronological period

c. Multinomial - non-numerical data with three or more values

deletion - aspiration - retention

d. Ordinal - scale: currently not supported
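As a rough illustration of these definitions, the sketch below labels a column from its raw values. It is an assumption-laden Python example, not how LVS detects types: the function `classify_variable` is hypothetical, and a categorical variable coded with numbers (e.g. 0/1) would be labeled continuous by this simple rule.

```python
def classify_variable(values):
    """Label a column as continuous, categorical (two values), or multinomial."""
    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    if all(is_number(v) for v in values):
        return "continuous"
    distinct = set(values)
    if len(distinct) == 2:
        return "categorical"
    if len(distinct) >= 3:
        return "multinomial"
    return "constant"  # a single value shows no variation to analyze

# Examples built from the value sets named on this slide.
print(classify_variable(["deletion", "retention", "deletion"]))
print(classify_variable(["deletion", "aspiration", "retention"]))
print(classify_variable(["0.95", "1.05", "2"]))
```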


Workshop Files

https://languagevariationsuite.wordpress.com/

1 categoricaldata.csv: categorical dependent - Labov New York 1966 study

2 continuousdata.csv: continuous dependent - Intervocalic /d/ in Caracas corpus (Díaz-Campos et al.)

3 LVS web site:

https://languagevariationsuite.shinyapps.io/Pages/


Language Variation Suite - Structure

1 Demo

Brief introduction

2 Data

Upload file, data summary, adjust data, cross tabulation

3 Visual Analysis

Plotting - histograms, frequencies, cluster plots

4 RBRUL

New version by Daniel Johnson!

5 Inferential statistics

Modeling, regression, conditional trees, random forest, model comparison


Language Variation Suite - Data

1 Upload CSV file

2 Upload Excel file


Excel Format

1 Slow processing

2 Requires the name of your Excel sheet


Save Excel as CSV Format

To optimize speed, save the file as CSV prior to upload


Server

Since LVS is hosted on a server, Shiny idle time-out settings may stop the application when it is left inactive (it will grey out).

Solution: Click reload and re-upload your CSV data


Upload File

Upload categoricaldata.csv


Table

Table displays the dataset and allows sorting columns in ascending or descending order.


Summary

Summary provides a quantitative summary for each variable, e.g. frequency count, mean, median.
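To make the summary concrete, here is a small Python sketch of the same idea: frequency counts for a non-numeric column, mean and median for a numeric one. The function `summarize` and both sample columns are hypothetical; this is not the code behind the LVS Summary tab.

```python
from collections import Counter
from statistics import mean, median

def summarize(values):
    """Summarize one column: mean and median if numeric, else frequency counts."""
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        # At least one value is not a number: treat the column as categorical.
        return dict(Counter(values))
    return {"mean": mean(numbers), "median": median(numbers)}

# Hypothetical columns, not taken from the workshop files.
store_summary = summarize(["Saks", "Macys", "Saks", "Kleins"])
duration_summary = summarize(["1", "2", "6"])
print(store_summary)
print(duration_summary)
```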


Data Structure

1 Total number of observations

2 Number of variables

3 Variable types

Factor - categorical values

Num - numeric values (0.95, 1.05)

Int - integer values (1, 2, 3)


Questions?


Visual Analytics

Visual Analytics: “The science of analytical reasoning facilitated by visual interactive interfaces”

(Thomas et al. 2005)


One Variable Plot


Two Variables Plot


Three Variables Plot


Cluster Plot

Classification of data into sub-groups is based on pairwise similarities

Groups are clustered in the form of a tree-like dendrogram

Independent variable must have at least THREE values (e.g. store - Saks, Kleins, Macy’s)
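The idea of grouping by pairwise similarity can be sketched in a few lines of Python. This is an illustrative single-linkage procedure on one number per group, under stated assumptions: the retention rates are made up, the function names are hypothetical, and the slides do not say which distance or linkage LVS actually uses.

```python
from itertools import combinations

def distance(cluster_a, cluster_b):
    """Single-linkage distance: the smallest gap between any two members."""
    return min(abs(x - y) for x in cluster_a[1] for y in cluster_b[1])

def build_dendrogram(rates):
    """Merge the two closest clusters until one remains; return the merge order."""
    clusters = [(name, [rate]) for name, rate in rates.items()]
    merges = []
    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        clusters.remove(left)
        clusters.remove(right)
        clusters.append(((left[0], right[0]), left[1] + right[1]))
        merges.append((left[0], right[0]))
    return merges

# Made-up retention rates per store, for illustration only.
rates = {"Saks": 0.6, "Macys": 0.5, "Kleins": 0.2}
merges = build_dendrogram(rates)
print(merges)
```

With three groups there are two merges, which is why at least three values are needed for the tree to show any structure.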


Saks (upper middle-class store), Macy’s (middle-class store), Kleins (working-class store)


Inferential Statistics


Modeling


We are interested in RETENTION (= application value)


Regression Types

Model

a.) Fixed effects

b.) Mixed effects - individual speaker/token variation (within group)

Type of Dependent Variable

a.) Binary/categorical (only two values)

b.) Continuous (numeric)

c.) Multinomial - categorical with more than two values


Regression


Model Output


Interpretation

1 Estimate: reported in log-odds; the sign indicates a negative or positive effect, and values closer to zero indicate a weaker effect

2 P - significance (p < 0.05)
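Log-odds are easier to read once converted. The Python sketch below uses two standard identities: the exponential of an estimate is an odds ratio, and the inverse logit of a total log-odds value (intercept plus the relevant estimates) is a probability. The function names and the sample values are hypothetical, and how an estimate relates to its baseline depends on the contrast coding of the model, which the slides do not specify.

```python
import math

def odds_ratio(estimate):
    """Odds ratio implied by one log-odds estimate."""
    return math.exp(estimate)

def probability(log_odds):
    """Probability implied by a total log-odds value (intercept plus estimates)."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical values, chosen only to show the direction of each effect.
for estimate in (-1.0, 0.0, 1.0):
    print(estimate, round(odds_ratio(estimate), 3), round(probability(estimate), 3))
```

An estimate of zero gives an odds ratio of 1 and a probability of 0.5, which matches the reading that values near zero mean little or no effect.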


Interpretation

Lexical item Fourth has a negative effect on retention and is significant

Normal style has a slightly negative effect on retention but its coefficient is not significant

Macy’s and Saks have a positive and significant effect on retention. Saks (upper middle class store) is more significant than Macy’s (middle class store)

http://www.free-online-calculator-use.com/scientific-notation-converter.html


Exponential notation: 1.48e-8 = 0.0000000148
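The same conversion can be done without an online calculator. This is a short Python sketch using the standard `decimal` module; the helper name `expand_exponent` is hypothetical.

```python
from decimal import Decimal

def expand_exponent(text):
    """Write a number given in exponential notation as a plain decimal string."""
    return format(Decimal(text), "f")

print(expand_exponent("1.48e-8"))  # prints 0.0000000148
```

Parsing the value as a `Decimal` rather than a float keeps the digits exactly as they appear in the model output.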


Questions?


New Tools of Linguistic Analysis (Baayen 2008, Tagliamonte 2014, Gries 2015)

Conditional inference trees and Random Forests

“Proves to be more stable than stepwise variable selection approaches available for logistic regression” (Strobl 2009:325)

Can handle skewed data that often violate the assumptions of regression approaches


Conditional Tree

Conditional tree: a simple non-parametric regression analysis, commonly used in social and psychological studies

Linear regression: all information is combined linearly

Conditional tree regression: visual splitting to capture interaction between variables

Recursive splitting (tree branches)
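Recursive splitting can be illustrated with a deliberately simplified Python sketch. It is a stand-in, not the conditional inference tree algorithm itself: real conditional trees select splits with permutation-based significance tests, whereas this example uses a plain Pearson chi-square on 2x2 tables with a fixed cut-off of 3.84 (p < 0.05 at one degree of freedom). All function names and the token counts are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def best_split(rows, outcome, predictors, target):
    """Find the (statistic, predictor, level) split most associated with the outcome."""
    best = (0.0, None, None)
    for p in predictors:
        for level in sorted({r[p] for r in rows}):
            inside = [r for r in rows if r[p] == level]
            outside = [r for r in rows if r[p] != level]
            a = sum(r[outcome] == target for r in inside)
            c = sum(r[outcome] == target for r in outside)
            stat = chi_square_2x2(a, len(inside) - a, c, len(outside) - c)
            if stat > best[0]:
                best = (stat, p, level)
    return best

def grow_tree(rows, outcome, predictors, target, threshold=3.84):
    """Split on the strongest predictor, then repeat inside each branch."""
    stat, p, level = best_split(rows, outcome, predictors, target)
    if p is None or stat < threshold:
        hits = sum(r[outcome] == target for r in rows)
        return {"n": len(rows), "rate": hits / len(rows)}  # terminal node
    yes = [r for r in rows if r[p] == level]
    no = [r for r in rows if r[p] != level]
    return {
        "split": (p, level),
        "yes": grow_tree(yes, outcome, predictors, target, threshold),
        "no": grow_tree(no, outcome, predictors, target, threshold),
    }

# Made-up token counts: store matters, style does not.
rows = []
for store, style, kept, dropped in [
    ("Saks", "normal", 8, 2), ("Saks", "emphatic", 8, 2),
    ("Kleins", "normal", 1, 9), ("Kleins", "emphatic", 1, 9),
]:
    rows += [{"store": store, "style": style, "r": "retention"}] * kept
    rows += [{"store": store, "style": style, "r": "deletion"}] * dropped

tree = grow_tree(rows, "r", ["store", "style"], "retention")
print(tree)
```

In this toy data the first split is on store, and neither branch splits further because style shows no association with the outcome, so each branch ends in a terminal node reporting its size and retention rate.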


Conditional Tree - Tagliamonte and Baayen (2012)

1 The distribution of was/were is split into two groups by individuals.

2 The variant were occurs significantly more frequently with the first group.


Conditional Tree - Tagliamonte and Baayen (2012)

1 Polarity is relevant to the second group of individuals.

2 The variant were occurs significantly more often with negative polarity


Conditional Tree - Tagliamonte and Baayen (2012)

1 Affirmative Polarity is conditioned by Age.

2 The variant was is produced significantly more often by individuals aged 46 and younger.


Conditional Tree

1 Store is the most significant factor for R-use

Kleins (working class store) - more R-deletion

2 R-use in Macy’s and Saks is conditioned by lexical item:

Floor shows more R-retention than Fourth

3 Style is not significant


Random Forest

1 Variable importance for predictors

2 Robust technique with small-n, large-p data

3 All predictors considered jointly (allows for inclusion of correlated factors)

49 / 93

Page 54: Optimizing Language Variation Analysis: Language Variation ...cl.indiana.edu/~obscrivn/docs/WorkshopLVS.pdf · Cross Tabulation Data Modi cation ... Important things to consider before

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Random Forest


Random Forest

1 Store is the most important predictor

2 Lexical Item is the second most important predictor

3 Style is irrelevant: its importance is close to zero and to the red dotted line (cut-off value).


Let’s Have a Short Break


Preparing Data

1 Download continuousdata.csv

2 Upload this file to LVS


Table


Summary


Summary


Changing Class from Integer to Factor


Change Class


Adjusted Dataset


Summary - New Dataset


Continuous Variable - Histogram


Density - Histogram

Density: a non-parametric model of the distribution of points based on a smooth density estimate

http://scikit-learn.org/stable/modules/density.html
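The idea of a smooth density estimate can be sketched with a Gaussian kernel density estimate written in plain Python. This is only an illustration, not the code LVS runs: the function name `gaussian_kde`, the bandwidth value, and the duration measurements are all assumptions made for the example.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x from the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # One Gaussian bump per observation, centred on that observation.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

# Hypothetical duration measurements (ms); not taken from continuousdata.csv.
durations = [82, 85, 90, 91, 95, 110, 112, 118, 120, 135]
density = gaussian_kde(durations, bandwidth=8.0)

# Evaluate the smooth curve on a coarse grid, as a density overlay on a histogram would.
grid = list(range(60, 161, 10))
curve = [density(x) for x in grid]
```

A smaller bandwidth follows the individual observations more closely, while a larger one gives a smoother curve.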


Frequency Plot


Frequency Plot - Word Cloud


Frequency Plot


Questions?


Fixed and Mixed Effects Models

I’m ready for Mixed Models!


Fixed and Mixed Models

Fixed Effects Model: All predictors are treated independently.

Underlying assumption - no group-internal variation between speakers or tokens

Mixed Effects Model: Allows for evaluation of individual- and group-level variation


Fixed and Mixed Models: Errors

Fixed Regression Model - ignoring individual variations (speakers or words) may lead to Type I Error: “a chance effect is mistaken for a real difference between the populations”

Mixed Regression Model - prone to Type II Error: “if speaker variation is at a high level, we cannot discern small population effects without a large number of speakers” (Johnson 2009, 2015)


Mixed Effect Regression

Mixed Model = fixed effects + random effects

Fixed-effects factor - “repeatable and a small number of levels”

Examples: aspectual verb, stative verb; male, female

Random-effects factor - “a non-repeatable random sample from a larger population” (Wieling 2012)

Examples: walk, sleep, study, finish, eat, etc.; speaker1, speaker2, speaker3, etc.
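One common way to write such a model for a binary dependent variable, with random intercepts for speaker and word, is sketched below. The notation is generic and is an assumption of this example, not taken from the slides: i indexes speakers, j indexes words, x is a fixed-effects predictor, and u and w are the speaker and word random intercepts.

```latex
\operatorname{logit} P(y_{ij} = 1) = \beta_0 + \beta_1 x_{ij} + u_i + w_j,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad w_j \sim \mathcal{N}(0, \sigma_w^2)
```

The fixed part (the beta terms) is shared by all speakers and words, while the variances of u and w measure how much individual speakers and words deviate from it.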


Mixed Effect Modeling

NULL when the dependent variable is continuous

Fixed Effects - independent variables


Mixed Effect Modeling

Mixed Effects - group-internal variation


Regression Results


Interpretation - Random Effects

1 Standard Deviation: a measure of the variability for each random effect (speakers and tokens)

2 Residual: random variation that is not due to speakers or tokens (residual error)
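The difference between the two quantities can be made concrete with a descriptive calculation on toy data. This is a rough, moment-based illustration under stated assumptions, not the estimation procedure of a mixed model (which fits the components by maximum likelihood or REML), and the measurements are hypothetical.

```python
import statistics

# Hypothetical measurements of a continuous dependent variable, grouped by speaker.
by_speaker = {
    "speaker1": [10.0, 12.0, 11.0],
    "speaker2": [20.0, 19.0, 21.0],
    "speaker3": [15.0, 14.0, 16.0],
}

speaker_means = {s: statistics.mean(v) for s, v in by_speaker.items()}

# Spread of the speaker means: how much speakers differ from one another.
between_speaker_sd = statistics.stdev(list(speaker_means.values()))

# Deviation of each token from its own speaker's mean: what speakers do not explain.
residuals = [x - speaker_means[s] for s, v in by_speaker.items() for x in v]
n_tokens = len(residuals)
n_speakers = len(by_speaker)
residual_sd = (sum(r * r for r in residuals) / (n_tokens - n_speakers)) ** 0.5
```

Here the speakers differ from each other far more than tokens differ within a speaker, which is the pattern a large speaker standard deviation and a small residual would indicate in the regression output.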


Interpretation - Fixed Effects

1 Estimate/coefficient: reported in log-odds (negative or positive)

2 P-value: indicates whether the effect of the factor level is statistically significant
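For a binary dependent variable, a log-odds estimate can be converted to a probability or an odds ratio to make it easier to read. The sketch below shows the arithmetic; the intercept and coefficient values are hypothetical and do not come from the workshop output.

```python
import math

def log_odds_to_probability(log_odds):
    """Inverse logit: map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def probability_to_log_odds(p):
    """Logit: map a probability to log-odds."""
    return math.log(p / (1.0 - p))

intercept = -0.4    # hypothetical baseline log-odds
coefficient = 1.1   # hypothetical estimate for one factor level

baseline = log_odds_to_probability(intercept)
with_level = log_odds_to_probability(intercept + coefficient)

# exp(coefficient) is the multiplicative change in the odds for that level.
odds_ratio = math.exp(coefficient)
```

A positive estimate raises the predicted probability of the application value relative to the baseline, a negative one lowers it, and an estimate of zero leaves it unchanged.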


Language Variation Suite - Structure

1 Demo

Brief introduction

2 Data

Upload file, data summary, adjust data, cross tabulation

3 Visual Analysis

Plotting, cluster classification

4 RBRUL

New version by Daniel Johnson!

5 Inferential statistics

Modeling, regression, varbrul analysis, conditional trees, random forest


RBRUL 3.0 Beta


Upload File


Model Selection


Model Selection


Model Selection


Output


Application Values


Questions?


Appendix 1: Cross-Tabulation

Cross-tabulation examines the relationship between two variables (their interaction).


Cross-Tabulation: One Dependent and One Independent Variable


Cross-Tabulation Output

Raw frequency / Proportion by column / Proportion across row
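The three views of a cross-tabulation can be reproduced by hand as follows. This is a minimal sketch with made-up tokens; the variable names and the counts are assumptions for illustration and are not the workshop data.

```python
from collections import Counter

# Hypothetical tokens: (dependent variable, independent variable).
tokens = [
    ("retention", "Saks"), ("retention", "Saks"), ("deletion", "Saks"),
    ("retention", "Macys"), ("deletion", "Macys"),
    ("deletion", "Kleins"), ("deletion", "Kleins"), ("retention", "Kleins"),
]

counts = Counter(tokens)
rows = sorted({dep for dep, _ in tokens})   # levels of the dependent variable
cols = sorted({ind for _, ind in tokens})   # levels of the independent variable

# Raw frequency of each combination.
raw = {r: {c: counts[(r, c)] for c in cols} for r in rows}

col_totals = {c: sum(raw[r][c] for r in rows) for c in cols}
row_totals = {r: sum(raw[r].values()) for r in rows}

# Proportion by column: each column sums to 1.
by_column = {r: {c: raw[r][c] / col_totals[c] for c in cols} for r in rows}

# Proportion across row: each row sums to 1.
across_row = {r: {c: raw[r][c] / row_totals[r] for c in cols} for r in rows}
```

Proportions by column answer "within each store, how often does each variant occur?", while proportions across rows answer "of all tokens of a variant, how are they distributed over stores?".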


Appendix 2: Data Modification


Adjust Data

Retain: Select data subset

Exclude: Exclude variables from a factor group

Recode: Combine and rename variables

Change class: Numeric → factor; factor → numeric

Transform: Apply log transformation to a specific column

ADJUSTED DATASET:

Run - to apply all above changes

Reset - to reset to the original dataset
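The operations listed above can be sketched on a small in-memory table. This is not how LVS implements them; the helper functions, column names, and rows below are hypothetical and serve only to show what each step does to the data.

```python
import math

# Hypothetical rows; column names and values are illustrative only.
data = [
    {"speaker": 1, "style": "normal", "store": "Saks", "duration": 120.0},
    {"speaker": 2, "style": "emphatic", "store": "Macys", "duration": 95.0},
    {"speaker": 3, "style": "normal", "store": "Kleins", "duration": 150.0},
    {"speaker": 1, "style": "emphatic", "store": "Saks", "duration": 80.0},
]

def retain(rows, column, keep):
    """Retain: keep only the rows whose value is in `keep`."""
    return [r for r in rows if r[column] in keep]

def exclude(rows, column, drop):
    """Exclude: drop the rows whose value is in `drop`."""
    return [r for r in rows if r[column] not in drop]

def recode(rows, column, mapping):
    """Recode: combine or rename values according to `mapping`."""
    return [{**r, column: mapping.get(r[column], r[column])} for r in rows]

def change_class(rows, column, convert):
    """Change class: convert a column, e.g. integer codes to labels."""
    return [{**r, column: convert(r[column])} for r in rows]

def log_transform(rows, column):
    """Transform: apply the natural log to a numeric column."""
    return [{**r, column: math.log(r[column])} for r in rows]

# "Run": apply the steps in sequence. Each helper returns new rows, so `data`
# stays untouched and plays the role of the original dataset for "Reset".
adjusted = exclude(data, "style", {"emphatic"})
adjusted = recode(adjusted, "store", {"Saks": "upper", "Macys": "upper", "Kleins": "lower"})
adjusted = change_class(adjusted, "speaker", str)
adjusted = log_transform(adjusted, "duration")
```

Because every step returns a new list, reverting to the original dataset is simply a matter of going back to `data`.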


Exclude: Emphatic Style


Adjusted Dataset


Adjusting Dataset

To revert to the original data, select RESET:


References I

[1] Baayen, Harald. 2008. Analyzing linguistic data: A practical introduction to statistics. Cambridge: Cambridge University Press

[2] Bentivoglio, Paola and Mercedes Sedano. 1993. Investigación sociolingüística: sus métodos aplicados a una experiencia venezolana. Boletín de Lingüística 8. 3-35

[3] Gries, Stefan Th. 2015. Quantitative designs and statistical techniques. In Douglas Biber & Randi Reppen (eds.), The Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press

[4] Labov, W. 1966. The Social Stratification of English in New York City. Washington: Center for Applied Linguistics

[5] http://gifsanimados.espaciolatino.com/x bob esponja 8.gif

[6] https://daniellestolt.files.wordpress.com/2013/01/are-you-ready1.gif

[7] http://www.martijnwieling.nl/R/sheets.pdf
