WHY DO LINGUISTS NEED STATISTICS AND WHERE DO WE BEGIN? MÍŠA HEJNÁ, 8TH APRIL 2017, LINGVISTISK WEEKENDKONFERENCE, AARHUS UNIVERSITY

WHY DO LINGUISTS NEED STATISTICS

AND WHERE DO WE BEGIN? MÍŠA HEJNÁ

8TH APRIL 2017, LINGVISTISK WEEKENDKONFERENCE

AARHUS UNIVERSITY

WHAT IS STATISTICS?

•  SET OF ANALYTICAL TOOLS

•  DEVELOPED FROM THE 17TH CENTURY ONWARDS

•  AT FIRST MAINLY IN AGRICULTURE, ASTRONOMY, AND POLITICS

•  NOW INDISPENSABLE IN THE FIELD OF LINGUISTICS TOO

•  HAVE YOU EVER WORKED WITH AVERAGES IN ANY WAY?

•  HAVE YOU EVER MADE A GRAPH?

Then you’ve had experience with statistics…

Garellek et al. (2016, 115)

WHAT IS STATISTICS?

•  WE WANT TO KNOW IF WE CAN GENERALISE OUR RESULTS

•  TO KNOW THAT, WE NEED TO KNOW HOW LIKELY IT IS THAT OUR RESULT IS DUE TO CHANCE

•  IF IT IS UNLIKELY TO BE DUE TO CHANCE, WE CAN GENERALISE

•  WHICH MEANS WE CAN ALSO MAKE PREDICTIONS

•  ULTIMATELY MAKE SENSE OF THE WORLD AROUND US!

•  TESTING AFFECTS PEOPLE’S LIVES → E.G. TESTING A TYPE OF MEDICINE

•  SO IT’S IMPORTANT TO KNOW HOW SERIOUSLY WE CAN TAKE OUR RESULTS

STATISTICAL ANALYSES IN LINGUISTICS

•  IT’S INCREASINGLY THE ACCEPTED STANDARD TO CARRY OUT STATISTICAL ANALYSES (← DATA-DRIVEN APPROACHES ON THE RISE, E.G. EXPERIMENTAL PRAGMATICS AS A NEW SUBFIELD OF LINGUISTICS)

•  FEW JOURNALS WILL ACCEPT PAPERS WITHOUT STATISTICAL SUPPORT, ALTHOUGH THIS DOES DEPEND ON THE RESEARCH QUESTION AND TYPE OF ANALYSIS DONE

•  UNLESS YOU DO A GOOD QUALITATIVE STUDY, YOU’RE LIKELY TO BE EXPECTED TO DO A STATISTICAL ANALYSIS

VISUALISING YOUR DATA

•  YOU’VE GOT ALL THE DATA

•  NOW MAKE SENSE OF IT!

•  IT REALLY HELPS TO START WITH MAKING GRAPHS AND EYE-BALLING WHAT’S GOING ON

We can’t quickly see the general patterns in such a big dataset → we need graphs and averages (i.e. descriptive statistics)

DESCRIPTIVE STATISTICS
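
As a minimal sketch of the averages mentioned above, the following Python snippet summarises a small set of measurements. The durations are invented for illustration (hypothetical pre-aspiration durations in milliseconds), and Python is used here only as an example language; the slides themselves work with Excel and R.

```python
import statistics

# Hypothetical pre-aspiration durations in milliseconds (invented for illustration).
durations_ms = [38, 41, 35, 44, 40, 52, 57, 49, 60, 55]

mean_ms = statistics.mean(durations_ms)      # arithmetic average
median_ms = statistics.median(durations_ms)  # middle value of the sorted data
spread_ms = statistics.stdev(durations_ms)   # sample standard deviation

print(f"n = {len(durations_ms)}")
print(f"mean = {mean_ms:.1f} ms, median = {median_ms:.1f} ms, sd = {spread_ms:.1f} ms")
```

Three numbers like these already say more about the general pattern than scanning the raw list does, and they are the usual starting point before any graph or test.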

•  GRAPHS HELP ESPECIALLY IF YOUR STUDY IS A FISHING EXPEDITION (AN EXPLORATORY STUDY)

CONVINCING YOUR READER

•  VICTORIANS: STATISTICS AS A MEANS TO PREDICT STATE OF AFFAIRS

•  RESULTED IN PUBLIC HEALTH ACTS

FLORENCE NIGHTINGALE (1820-1910)

•  NURSE DURING THE CRIMEAN WAR

•  APT STATISTICIAN

•  SHOWED THAT, DUE TO BAD SANITARY CONDITIONS, MORE MEN DIED OF CHOLERA AND TYPHUS THAN OF WAR WOUNDS

•  MORE MEN ALSO DIED DURING PEACETIME THAN WARTIME

Magnello ND

CONVINCING YOUR READER

•  SHE AND HER COLLEAGUE FARR EMPLOYED GRAPHS – HELPFUL TO THOSE WHO ARE NOT USED TO STATS AND DATA ANALYSIS

•  MORTALITY OF THE BRITISH ARMY – POLAR AREA GRAPH (CHANGE ACROSS TIME)

•  BLUE – DEATHS PREVENTABLE BY BETTER HYGIENE

•  PINK – DEATHS FROM WOUNDS

•  BLACK – OTHER CAUSES

•  BRITISH ARMY WOULD HAVE DIED OUT IN THE WAR BECAUSE OF CHOLERA AND TYPHUS RATHER THAN WOUNDS!

Magnello ND

INFERENTIAL STATISTICS

Salkind (2004, 5)

WHAT’S THE P?

•  P < 0.05, P < 0.01, P < 0.0001, P > 0.5

•  ROUGHLY: HOW LIKELY WE WOULD BE TO GET A RESULT AT LEAST AS EXTREME AS OURS IF ONLY CHANCE WERE AT WORK

•  IF WE HAVE A RESULT WITH P = 0.05, CHANCE ALONE WOULD PRODUCE A RESULT AT LEAST THIS EXTREME ABOUT 5% OF THE TIME

•  THE LOWER THE P-VALUE, THE HARDER IT IS TO PUT THE RESULT DOWN TO CHANCE

•  THE LEVEL OF SIGNIFICANCE IS ARBITRARY

•  0.05 IN LINGUISTICS; 0.01 IN OTHER SCIENCES (ESPECIALLY WHERE LIFE IS AT STAKE!)

Do please bear in mind that the whole presentation simplifies many things.
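
One way to make "due to chance" concrete is a permutation test: shuffle the group labels in every possible way and count how often the reshuffled data show a difference at least as large as the one observed. The sketch below assumes two small invented samples (hypothetical pre-aspiration durations for older and younger speakers); it is only an illustration of the idea, not a test named in these slides.

```python
from itertools import combinations

# Hypothetical pre-aspiration durations in milliseconds (invented for illustration).
older = [38, 41, 35, 44, 40]
younger = [52, 57, 49, 60, 55]


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b):
    """Share of all relabellings whose mean difference is at least as large as the observed one."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Every way of choosing which values count as "group a".
    for chosen in combinations(range(len(pooled)), len(group_a)):
        chosen_set = set(chosen)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(first) - mean(second)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


p_value = permutation_p_value(older, younger)
print(f"p = {p_value:.4f}")
```

With these invented numbers the two groups do not overlap at all, so only 2 of the 252 possible relabellings are as extreme as the real one, which gives p ≈ 0.008, below the 0.05 level.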

VARIABLES

•  DEPENDENT VARIABLES

•  THAT’S THE ONE AFFECTED BY OTHER VARIABLES

•  E.G. THE FREQUENCY OF BE LIKE, F1, EXTENT OF CREAK, AMOUNT OF SPEECH

•  INDEPENDENT VARIABLES

•  THOSE ARE THE VARIABLES WHICH CAN AFFECT THE DEPENDENT VARIABLE

•  WE ARE INTERESTED IN THESE EFFECTS FOR VARIOUS REASONS

•  E.G. THE EFFECT OF AGE ON THE FREQUENCY OF BE LIKE (INDICATION OF CHANGE)

•  FOLLOWING SYNTACTIC CONSTITUENT WITH SOME HEDGES (INDICATION OF GRAMMATICALISATION)

•  NUISANCE VARIABLES (CONFOUNDING FACTORS)

In order to do statistical analyses (or indeed descriptive statistics), we need to understand different types of variables. Different types of variables require different mathematical treatment when calculating the probability (amongst many other things).

VARIABLES

•  NUISANCE VARIABLES (CONFOUNDING FACTORS): EXAMPLE

•  DO YOUNGER SPEAKERS HAVE LONGER PRE-ASPIRATION THAN OLDER SPEAKERS?

•  DEPENDENT VARIABLE: PRE-ASPIRATION DURATION

•  INDEPENDENT VARIABLE: AGE

NUISANCE VARIABLES

•  SPEAKING RATE: YOUNGER SPEAKERS MAY SPEAK FASTER, WHICH MAY RESULT IN SHORTER PRE-ASPIRATION

•  SMOKING: IF ONE AGE GROUP IS CORRELATED WITH SMOKING, THIS COULD BE A CONFOUND TOO

•  PLACE OF ARTICULATION OF THE CONSONANT: IF THIS HAS AN EFFECT ON PRE-ASPIRATION DURATION, WE SHOULD MAKE SURE THAT WE DON’T HAVE DIFFERENT PLACES OF ARTICULATION ACROSS THE GENERATIONS (E.G. MAINLY WORDS WITH THE BILABIAL PLOSIVE FOR THE OLDER SPEAKERS AND MAINLY WORDS WITH THE VELAR PLOSIVE FOR THE YOUNGER SPEAKERS)

•  HOW TO CONTROL FOR THESE?

•  EITHER WE EXCLUDE THESE IN THE DESIGN OF THE STUDY OR WE INCLUDE THEM AS INDEPENDENT VARIABLES
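
Before deciding how to control for a nuisance variable, it helps to check whether it is in fact unevenly spread across the groups. The sketch below cross-tabulates age group against place of articulation for a handful of invented tokens; the data and the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical tokens: (age group, place of articulation of the plosive).
tokens = [
    ("older", "bilabial"), ("older", "bilabial"), ("older", "bilabial"), ("older", "velar"),
    ("younger", "velar"), ("younger", "velar"), ("younger", "velar"), ("younger", "bilabial"),
]

counts = Counter(tokens)
age_groups = sorted({age for age, _ in tokens})
places = sorted({place for _, place in tokens})

# Print a small cross-tabulation: rows are age groups, columns are places.
print("age".ljust(10) + "".join(place.ljust(10) for place in places))
for age in age_groups:
    print(age.ljust(10) + "".join(str(counts[(age, place)]).ljust(10) for place in places))
```

In this invented sample the older speakers contribute mostly bilabials and the younger speakers mostly velars, which is exactly the kind of imbalance that could be mistaken for an age effect.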

VARIABLES

•  FACTORS

•  NON-NUMERIC VARIABLES (WHICH CAN HOWEVER BE TRANSFORMED INTO NUMERIC ONES, AS WE DISCUSSED)

•  HAIR COLOUR (“BROWN”, “BLACK”, “STRAWBERRY”, ETC.)

•  E.G. THE TYPE OF A QUOTATIVE VERB (BE LIKE, SAY, WHISPER, BE ALL…)

•  THE OPTIONS ARE REFERRED TO AS “LEVELS”; “BROWN” IS A LEVEL OF THE “HAIR COLOUR” VARIABLE

•  NUMERIC VARIABLES

•  INTEGERS (1, 2, 3, 4, 5…) E.G. NUMBER OF SYLLABLES A WORD CAN HAVE

•  CONTINUOUS NUMBERS (1.34, 1.78, 6.54…) E.G. F1, F2, DURATION OF X…
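
The distinction between factors and numeric variables can be sketched in a few lines of Python. The tokens and measurements are invented, and the integer recoding shown is just one common convention: the codes are arbitrary labels, not quantities.

```python
# Hypothetical quotative tokens from a small corpus sample (invented for illustration).
quotatives = ["be like", "say", "be like", "whisper", "say", "be like", "be all"]

# The levels of the factor are its distinct values.
levels = sorted(set(quotatives))

# One common numeric recoding: each level gets an integer code.
codes = {level: index for index, level in enumerate(levels)}
coded = [codes[token] for token in quotatives]

# Numeric variables, by contrast, are already numbers.
syllable_counts = [1, 2, 3, 2]       # integers
f1_hz = [312.5, 640.2, 455.8]        # continuous measurements (hypothetical F1 values)

print(levels)  # ['be all', 'be like', 'say', 'whisper']
print(coded)   # [1, 2, 1, 3, 2, 1, 0]
```

Averaging the codes would be meaningless, whereas averaging the F1 values is not; this is why the type of variable decides which test is appropriate.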

WHICH TEST DO I USE?

•  UNDERSTANDING HOW TO TELL APART THOSE TYPES OF VARIABLES MATTERS A LOT

•  BECAUSE WITHOUT THAT YOU CAN’T DECIDE WHICH TEST TO USE!

•  GOOGLE DECISION MAKING TREES (BUT BEWARE, WE DIDN’T HAVE TIME TO DISCUSS ALL DIFFERENT TYPES OF VARIABLES)

•  IF YOU SEE A DECISION TREE WITH VARIABLE TYPES YOU DON’T KNOW, GOOGLE THOSE TYPES FIRST

•  HERE’S ONE TREE

•  HTTP://STATS.IDRE.UCLA.EDU/OTHER/MULT-PKG/WHATSTAT/

•  MORE HERE (AND IN MANY OTHER PLACES TOO, JUST GOOGLE THE EXACT QUESTION YOU HAVE IN “”): HTTP://WWW.BIOSTATHANDBOOK.COM/TESTCHOICE.HTML

SOME YOUTUBE TUTORIALS FOR TWO OF THE BASIC TESTS USING EXCEL

YOU CAN TAKE IT FROM HERE

•  CHI-SQUARE

•  HTTP://STUDY.COM/ACADEMY/LESSON/WHAT-IS-A-CHI-SQUARE-TEST-DEFINITION-EXAMPLE.HTML

•  T-TEST

•  HTTPS://WWW.YOUTUBE.COM/WATCH?V=0PD3DC1GCHC

•  HTTPS://WWW.YOUTUBE.COM/WATCH?V=AVIXQ-YSXV0
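
To show what these two tests compute under the hood, here is a rough sketch using only the Python standard library. All counts and durations are invented. The t statistic shown is Welch's variant, which does not assume equal variances and may differ from what a given Excel tutorial uses, and the chi-square p-value shortcut is valid only for a 2x2 table (1 degree of freedom).

```python
import math
import statistics

# Hypothetical counts of quotatives by age group (invented for illustration).
#         be like  say
table = [[30, 10],   # younger speakers
         [12, 28]]   # older speakers

# Hypothetical pre-aspiration durations in milliseconds (invented for illustration).
older = [38, 41, 35, 44, 40]
younger = [52, 57, 49, 60, 55]


def chi_square_2x2(observed):
    """Pearson chi-square statistic and p-value for a 2x2 table (1 degree of freedom)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    # With 1 degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2))


def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    var_a = statistics.variance(sample_a) / len(sample_a)
    var_b = statistics.variance(sample_b) / len(sample_b)
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


chi2, chi2_p = chi_square_2x2(table)
t_value, t_df = welch_t(younger, older)
print(f"chi-square = {chi2:.2f}, p = {chi2_p:.5f}")
print(f"Welch t = {t_value:.2f}, df = {t_df:.1f}")
```

The chi-square test suits two factors (counts in a table), while the t-test compares a numeric variable across the two levels of a factor. Turning the t statistic into a p-value requires the t distribution, which spreadsheet and statistics software provide.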

“THE HARDEST PART OF ANY STATISTICAL WORK IS GETTING STARTED” CRAWLEY (2012, 1)

WHERE DO WE BEGIN?

WHERE TO START WITH STATS?

•  SALKIND, N.J. 2004. STATISTICS FOR PEOPLE WHO (THINK THEY) HATE STATISTICS. 2ND ED. SAGE PUBLICATIONS: LONDON.

WHEN YOU’VE READ SALKIND 2004 (AND IF YOU WANT TO USE R)

•  CRAWLEY, M.J. 2012. STATISTICS: AN INTRODUCTION USING R. CAMBRIDGE: WILEY.

•  WOODS, A.; FLETCHER, P.; HUGHES, A. 1993. STATISTICS IN LANGUAGE STUDIES. CUP: CAMBRIDGE.

IF YOU KNOW YOU WILL BE USING R FOR GRAPHS AND STATS

•  CRAWLEY, M.J. 2012. THE R BOOK. 2ND ED. CAMBRIDGE: WILEY.

•  FIRST EDITION IS STILL FAIRLY AWESOME (AND MUCH LESS EXPENSIVE!)

There are more books you may like, but here I’m only mentioning the ones I’ve got experience with.

EXCEL

•  WIDELY AVAILABLE

•  ENABLES THE USERS TO USE MANY ANALYSES (BUT NOT ALL THAT YOU MAY NEED)

•  MATHEMATICAL AND OTHER FUNCTIONS

•  GRAPHS CAN BE MADE

•  RELATIVELY EASY TO USE

•  SOME DIFFERENCES ACROSS PLATFORMS

R: DISADVANTAGES FIRST

•  REQUIRES THE USER TO LEARN THE PROGRAMMING LANGUAGE OF R

→ LESS USER-FRIENDLY THAN EXCEL

→ STEEPER LEARNING CURVE

•  IS IT WORTH IT?

BEAR IN MIND THIS PRESENTATION IS GIVEN BY AN R USER…

R VS EXCEL

•  R FREELY AVAILABLE (INCLUDING UPDATES, CONSTANTLY BEING DEVELOPED BY PEOPLE WHO MAY HAVE INTERESTS VERY SIMILAR TO YOURS)

•  R THE SAME IRRESPECTIVE OF PLATFORM

•  MORE STATISTICAL TESTS AVAILABLE THAN IN EXCEL

•  MORE GRAPHICAL OPTIONS AVAILABLE THAN IN EXCEL (SEE THE NEXT SLIDE), VERY GOOD QUALITY

•  MORE FLEXIBILITY IN MANIPULATING GRAPH DESIGN AND STATS TOO, YOU CAN REALLY BE IN CHARGE OF WHAT YOU’RE DOING IN R

•  YOU CAN WORK WITH MULTIPLE DATASETS AT ONCE IN R (NOT POSSIBLE IN EXCEL AS FAR AS I KNOW)

•  DATA COLUMNS DON’T HAVE TO BE NEXT TO ONE ANOTHER FOR ANY TESTS OR GRAPHS IN R (YOU DO NEED TO HAVE THE COLUMNS NEXT TO ONE ANOTHER FOR SOME TESTS IN EXCEL, WHICH CAN BE RATHER ANNOYING)

•  YOU CAN SAVE SCRIPTS AND RUN THEM VERY QUICKLY WITH FUTURE DATASETS

•  IF YOU MASTER THE BASICS OF R, IT’S LIKELY THAT OTHER TYPES OF SOFTWARE WORKING WITH PROGRAMMING WILL COME TO YOU MUCH MORE EASILY (AND YOUR FUTURE EMPLOYERS MAY BE VERY IMPRESSED TOO)

R VS EXCEL

•  R FREELY AVAILABLE (INCLUDING UPDATES)

•  R THE SAME IRRESPECTIVE OF PLATFORM; NOT QUITE THE CASE OF EXCEL

•  MORE STATISTICAL TESTS AVAILABLE THAN IN EXCEL

•  MORE GRAPHICAL OPTIONS AVAILABLE THAN IN EXCEL

•  MORE FLEXIBILITY IN MANIPULATING GRAPH DESIGN AND STATS TOO

•  YOU WORK WITH RAW DATA: NO PIVOT TABLES NEEDED FOR MOST TESTS IN R

•  YOU CAN WORK WITH MULTIPLE DATASETS AT ONCE IN R

•  DATA COLUMNS DON’T HAVE TO BE NEXT TO ONE ANOTHER FOR ANY TESTS OR GRAPHS IN R

•  YOU CAN SAVE SCRIPTS AND RUN THEM VERY QUICKLY WITH FUTURE DATASETS

Iosad (2016, 19)

OTHER SOFTWARE

•  RBRUL

•  CONSIDERED A GOOD STEP BEFORE PLUNGING INTO R

•  WORKS WITH CODING (COMMANDS) TOO

•  VARBRUL

•  PROGRAMMING SKILLS NOT REQUIRED (YOU DON’T WORK WITH COMMANDS; BUT LESS CONTROL OVER THINGS)

•  HTTP://WWW.ROMANISTIK.UNI-FREIBURG.DE/PUSCH/DOWNLOAD/VARIACIONISMO/GOLDVARB2001_USER_MANUAL.PDF

•  THE PROGRAM GIVES YOU OPTIONS, YOU TICK THE ONES YOU WANT

•  EASIER TO DO TESTS WITHOUT UNDERSTANDING WHAT THEY DO AND HOW

•  HTTPS://WWW.YOUTUBE.COM/WATCH?V=XBQRRO_TAPO (WHICH SOFTWARE SHOULD YOU USE?)

•  HTTPS://WWW.YOUTUBE.COM/WATCH?V=WKI0BQLZTCO

•  WHY USE R? SOME VERY PRACTICAL CONSIDERATIONS: HTTPS://WWW.YOUTUBE.COM/WATCH?V=W2GZFEYGU3S

•  INTRO TO R, START WITH THIS ONE AND FOLLOW UP: HTTPS://WWW.YOUTUBE.COM/WATCH?V=2-KW1MLOS1U

it’s hard work, but it’s worth it and it’s rewarding

REFERENCE LIST

•  CRAWLEY, M.J. 2012. STATISTICS: AN INTRODUCTION USING R. CAMBRIDGE: WILEY.

•  GARELLEK, M.; RITCHART, A.; KUANG, J. 2016. “BREATHY VOICE DURING NASALITY: A CROSS-LINGUISTIC STUDY”. JOURNAL OF PHONETICS 59: 110-21.

•  IOSAD, P. 2016. “THE PHONOLOGIZATION OF REDUNDANCY: LENGTH AND QUALITY IN WELSH VOWELS’ PHONOLOGY”. PHONOLOGY 34, 1.

•  SALKIND, N.J. 2004. STATISTICS FOR PEOPLE WHO (THINK THEY) HATE STATISTICS. 2ND ED. SAGE PUBLICATIONS: LONDON.