R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Ventura 13.2.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
loaded via a namespace (and not attached):
[1] htmlwidgets_1.6.4 compiler_4.4.0 fastmap_1.1.1 cli_3.6.2
[5] htmltools_0.5.8.1 tools_4.4.0 rstudioapi_0.16.0 yaml_2.3.7
[9] rmarkdown_2.24 knitr_1.43 jsonlite_1.8.7 xfun_0.40
[13] digest_0.6.33 rlang_1.1.3 renv_1.0.7 evaluate_0.21
8 Independence
Motivating mixed models
This chapter is not fully translated from bullet points (from my slides) to prose. This will happen eventually (hopefully by spring 2024).
This lecture covers Winter & Grice (2021) (Sections 1 and 2), Chapter 14 'Mixed Models 1: Conceptual Introduction' (Sections 14.1-14.3; Winter, 2019), and Sections 8.1-8.2 from Ch. 8 (Mixed-effects models I: Linear Regression) in Sonderegger (2023).
Learning Objectives
Today we will learn about…
- the independence assumption
- types of non-independence in linguistic data
- the history of mixed models in linguistics
9 Independence assumption
- we already learned about some model assumptions
- assumption of normality of residuals
- homoscedasticity (constant variance) of residuals
- absence of collinearity of predictors
- there is another, arguably more important assumption
- assumption of independence
9.1 (Non-)Independence
- non-independence: any possible link or connection between groups of data points
- e.g., two observations from the same participant will tend to be more similar to each other than two completely independent observations
- any case where you might expect some clustering of observations by some grouping factor
- the independence assumption states that our data points are not linked
- i.e., the value of one observation is completely independent from another
- violations of this assumption have major implications for Type I (alpha) error
- i.e., the chances of observing an effect where there is none (false positive)
- violations also artificially inflate the sample size (linked observations are counted as if each were independent), which affects statistical power
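The effect on Type I error can be illustrated with a small simulation. The sketch below is not from the lecture materials. It assumes two groups of participants with no true group difference, normally distributed by-participant baselines, and purely illustrative parameter values (`n_participants`, `n_trials`, and the two standard deviations are all hypothetical). It compares a test that treats every trial as an independent data point with one that uses a single mean per participant. To stay dependency-free, the two-sided 5% critical values are hard-coded: 1.96 as a normal approximation for the trial-level test, and 2.101 for a t-test with 18 degrees of freedom.

```python
import random
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def mean_diff_statistic(a, b):
    """Difference in means divided by its estimated standard error."""
    se = sqrt(sample_variance(a) / len(a) + sample_variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def simulate_false_positive_rates(n_sims=2000, n_participants=10, n_trials=20,
                                  sd_participant=1.0, sd_trial=1.0, seed=1):
    """All defaults are illustrative assumptions, not values from the lecture."""
    rng = random.Random(seed)
    naive_hits = 0
    aggregated_hits = 0
    for _sim in range(n_sims):
        trials = ([], [])  # every single observation, per group
        means = ([], [])   # one mean per participant, per group
        for group in (0, 1):
            for _participant in range(n_participants):
                # A shared baseline links all trials from the same participant.
                baseline = rng.gauss(0, sd_participant)
                obs = [baseline + rng.gauss(0, sd_trial) for _trial in range(n_trials)]
                trials[group].extend(obs)
                means[group].append(mean(obs))
        # Naive analysis: pretend all trials are independent.
        if abs(mean_diff_statistic(trials[0], trials[1])) > 1.96:
            naive_hits += 1
        # One data point per participant; 2.101 is only valid for the
        # default of 10 participants per group (df = 18).
        if abs(mean_diff_statistic(means[0], means[1])) > 2.101:
            aggregated_hits += 1
    return naive_hits / n_sims, aggregated_hits / n_sims


naive_rate, aggregated_rate = simulate_false_positive_rates()
print(f"naive: {naive_rate:.3f}, per-participant means: {aggregated_rate:.3f}")
```

Under these assumptions there is no true effect, so every significant result is a false positive. The trial-level analysis should flag a difference far more often than the nominal 5%, while the per-participant analysis should stay close to it.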
9.2 Repeated measures design
- the reason most (experimental) linguistic data is non-independent is the use of the repeated-measures design
- collecting multiple data points from, e.g., the same participant and for the same item
- increases statistical power, so fewer participants are needed (more data points, and lower variance because between-subject variability is controlled)
- saves resources (fewer subjects)
9.3 Other sources of non-independence
- non-independence is prevalent in other fields of linguistics, e.g.,
- corpus studies: text, author, language, dialect, register
- phonetic experiments: speaker, listener, exact repetitions
- socio-phonetics: dialect/geographical proximity, register, speaker
10 Pseudoreplication
Pseudoreplication refers to the treatment of dependent observations as independent data points, which causes an overabundance of erroneously significant results.
— Winter (2011), p. 2137
- analysing non-independent data as if they were independent
- essentially, violating the independence assumption
- very (very) common in older publications
- can also result in Type M (magnitude) and Type S (sign) errors
- is one contributor (out of many) to the so-called replication crisis
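Type M and Type S errors can be made concrete with a simulation sketch. It is an illustration under stated assumptions, not an analysis from the cited papers: a small true group difference (`true_effect`, a hypothetical value), clustered trials per participant, and a naive test that ignores the clustering. Among the results that the naive test calls significant, it reports the average absolute estimate relative to the true effect (Type M, the exaggeration ratio) and the share of estimates with the wrong sign (Type S).

```python
import random
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate_type_m_s(true_effect=0.2, n_sims=2000, n_participants=10,
                      n_trials=20, sd_participant=1.0, sd_trial=1.0, seed=2):
    """All defaults are illustrative assumptions, not values from the lecture."""
    rng = random.Random(seed)
    significant_estimates = []
    for _sim in range(n_sims):
        groups = []
        for shift in (0.0, true_effect):  # control group, then treatment group
            trials = []
            for _participant in range(n_participants):
                baseline = shift + rng.gauss(0, sd_participant)
                trials.extend(baseline + rng.gauss(0, sd_trial)
                              for _trial in range(n_trials))
            groups.append(trials)
        control, treatment = groups
        estimate = mean(treatment) - mean(control)
        # Naive standard error: treats all trials as independent.
        naive_se = sqrt(sample_variance(control) / len(control)
                        + sample_variance(treatment) / len(treatment))
        if abs(estimate / naive_se) > 1.96:
            significant_estimates.append(estimate)
    # Type M: how much significant estimates exaggerate the true effect.
    type_m = mean([abs(e) for e in significant_estimates]) / true_effect
    # Type S: share of significant estimates pointing in the wrong direction.
    type_s = mean([1.0 if e < 0 else 0.0 for e in significant_estimates])
    return type_m, type_s, len(significant_estimates)


type_m, type_s, n_significant = simulate_type_m_s()
print(f"significant: {n_significant}, Type M ratio: {type_m:.2f}, Type S rate: {type_s:.2f}")
```

With these settings the significant estimates should overstate the true effect severalfold, and a noticeable share should have the wrong sign.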
10.1 Problem: Generalizability
- beyond spurious results, how researchers interpret the implications of their findings is problematic
Unfortunately, outside of a few domains such as psycholinguistics, it remains rare to see psychologists model stimuli as random effects – despite the fact that most inferences researchers draw are clearly meant to generalize over populations of stimuli.
— Yarkoni (2022), p. 4
- if we don’t include grouping factors in our models, our findings are not generalisable beyond our sample
- it could be that our findings are due to a few participants or experimental items that deviate from the rest
- we need to take this by-grouping-factor variation into account, but how?
10.2 Solution 1: Averaging
- e.g., repeated measures ANOVA
- separate models for by-participant and by-item variance (with averaging) interpreted together
- PRO: takes both by-participant and by-item variance into account
- CONs: not flexible or appropriate for complex designs, and:
- loses information regarding the variation across the grouped observations
- lowers N
- e.g., if we average by participant, we’d have only 1 data point per participant!
- therefore loses statistical power (higher Type II error)
- inflates Type I error (chance of a false positive)
- in sum: not optimal
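The drop in N caused by averaging can be seen in a toy data set. The following is a minimal sketch with made-up numbers: the design (20 participants crossed with 30 items), the column names, and the simulated reaction times are all hypothetical.

```python
import random
from collections import defaultdict

rng = random.Random(42)
n_participants, n_items = 20, 30  # hypothetical design

# Long format: one row per participant-item combination.
rows = [
    {"participant": p, "item": i, "rt": 500 + rng.gauss(0, 50)}
    for p in range(n_participants)
    for i in range(n_items)
]


def aggregate_means(rows, key):
    """Average the response within each level of one grouping factor."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key]].append(row["rt"])
    return {level: sum(values) / len(values) for level, values in buckets.items()}


by_participant = aggregate_means(rows, "participant")  # averages over items
by_item = aggregate_means(rows, "item")                # averages over participants

print(len(rows), len(by_participant), len(by_item))  # prints: 600 20 30
```

The 600 observations shrink to 20 by-participant means or 30 by-item means, and the trial-level variation inside each mean is no longer available to the model.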
10.3 Solution 2: Single observations
- run an experiment without repeated measures
- but this lowers statistical power
- and drastically reduces generalizability
- e.g., we could present 60 participants with a single item
- or we could present 1 participant with 60 trials
- but these findings also can’t be generalised beyond that one item or one participant…
- in sum: not optimal
10.4 Solution 3: Linear mixed models
- best available solution: use a repeated-measures design and mixed models
- a.k.a. mixed (effects) models/LM(E)Ms, multi-level models, hierarchical models
- “mixed” because they contain:
- fixed effects: usually predictors; describe systematic variation in our data that we wish to explain
- random effects: unsystematic variation that is due to random sampling
- random effects take dependence between observations into account
- contain varying intercepts and slopes per level of a grouping factor
- fixed effects estimates are usually qualitatively unchanged
- what is affected is the measures of variance
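One common way to write such a model down is sketched here. The notation is an assumption chosen for illustration, not taken from the readings: $y_{ij}$ is the response of participant $i$ to item $j$, $x_{ij}$ is a single predictor, $\beta_0$ and $\beta_1$ are the fixed intercept and slope, $u_{0i}$ and $u_{1i}$ are by-participant adjustments to the intercept and slope, and $w_{0j}$ is a by-item adjustment to the intercept.

```latex
\begin{aligned}
y_{ij} &= (\beta_0 + u_{0i} + w_{0j}) + (\beta_1 + u_{1i})\,x_{ij} + \varepsilon_{ij} \\
(u_{0i}, u_{1i}) &\sim \mathcal{N}(\mathbf{0}, \Sigma_u), \qquad
w_{0j} \sim \mathcal{N}(0, \sigma_w^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

The fixed effects $\beta_0$ and $\beta_1$ describe the systematic pattern, while the $u$ and $w$ terms absorb the clustering by participant and by item. In lme4 formula syntax, a structure like this would typically be written as `y ~ x + (1 + x | participant) + (1 | item)`, where the variable names are placeholders.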
11 History of mixed-effects models
11.1 1973: Language-as-fixed-effect-fallacy
- none of these ideas are new to linguistics
- Clark (1973):
- without including dependencies between repeated observations from the same linguistic items in our models, we cannot generalise our findings beyond our stimuli
- our results are relevant only for the subset of the population from which we sampled
The remedies for the language-as-fixed-effect fallacy are for the most part obvious. They include doing the right statistics, choosing the appropriate experimental design, and selecting a random or representative sample of language.
— Clark (1973), p. 347
11.2 ANOVAs: aggregation
- repeated measures ANOVAs were commonly used to take dependence between observations into account (and are still common in some fields today)
- require aggregation (i.e., averaging) over items or subjects, not both simultaneously
- drastically reduces our number of observations
- loss of information in the variance of observed data
- i.e., a loss of power (Type II error) and inflated Type I error (false positive)
11.3 2008: Baayen et al. (2008) and lme4
- enter mixed models with crossed random effects
- published in the Journal of Memory and Language, Special Issue: Emerging Data Analysis
- in addition, Baayen (2008) was published, a textbook for analysing linguistic data with R, with an emphasis on LMMs with lme4
Learning Objectives 🏁
Today we learned about…
- the independence assumption ✅
- types of non-independence in linguistic data ✅
- the history of mixed models in linguistics ✅
12 Task
Discuss the following questions.
- What is the independence assumption?
- What happens when the independence assumption is violated?
- What is the language-as-fixed-effect-fallacy?
- What other sources of variance might be present in language research?
- Why are repeated measures ANOVAs sub-optimal?
Session Info
Developed with Quarto using R version 4.4.0 (2024-04-24) (Puppy Cup) and RStudio version 2023.9.0.463 (Desert Sunflower), and the following packages: