8 Independence

Motivating mixed models

Author

Affiliation

Daniela Palleschi

Humboldt-Universität zu Berlin

Published

January 12, 2024

Modified

February 13, 2024

Under construction

This chapter is not fully translated from bullet points (from my slides) to prose. This will happen eventually (hopefully by spring 2024).

This lecture covers Sections 1 and 2 of Winter & Grice (2021), Sections 14.1-14.3 of Chapter 14 ('Mixed Models 1: Conceptual Introduction') in Winter (2019), and Sections 8.1-8.2 of Chapter 8 ('Mixed-effects models I: Linear Regression') in Sonderegger (2023).

Learning Objectives

Today we will learn about…

the independence assumption
types of non-independence in linguistic data
the history of mixed models in linguistics

9 Independence assumption

we already learned about some model assumptions
- assumption of normality of residuals
- homoscedasticity (constant variance) of residuals
- absence of collinearity of predictors
there is another, arguably more important assumption
- assumption of independence

9.1 (Non-)Independence

non-independence: any possible link or connection between groups of data points
- e.g., two observations from the same participant will tend to be more similar to each other than two observations from different participants
- any case where you might expect some clustering of observations by some grouping factor
the independence assumption assumes that our data points are not linked
- i.e., the value of one observation is completely independent from another
violations of this assumption have major implications for Type I (alpha) error
- i.e., the chances of observing an effect where there is none (false positive)
it also artificially inflates the sample size, since linked observations are counted as if each contributed fully independent information, which distorts statistical power
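
The inflation of sample size can be made concrete with the design effect from survey statistics. This formula is not taken from the readings above; it is added here as an illustration. Assuming a single grouping factor, equal cluster sizes, and a common intraclass correlation (ICC, the share of total variance that is due to the grouping factor), a rough estimate of the number of effectively independent observations is N / (1 + (m - 1) * ICC), where m is the number of observations per cluster. The function name and the numbers below are hypothetical.

```python
def effective_sample_size(n_clusters, obs_per_cluster, icc):
    """Approximate number of independent observations (design effect).

    Assumes one grouping factor, equal cluster sizes, and a common
    intraclass correlation `icc` between 0 and 1.
    """
    n_total = n_clusters * obs_per_cluster
    design_effect = 1 + (obs_per_cluster - 1) * icc
    return n_total / design_effect

# Hypothetical experiment: 30 participants, 40 trials each (N = 1200)
print(effective_sample_size(30, 40, icc=0.0))            # 1200.0 (fully independent)
print(effective_sample_size(30, 40, icc=1.0))            # 30.0 (trials carry no new information)
print(round(effective_sample_size(30, 40, icc=0.2), 1))  # 136.4
```

Even a modest ICC shrinks 1200 nominal data points to far fewer effective ones, which is why counting every trial as independent overstates the evidence.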

9.2 Repeated measures design

the reason most (experimental) linguistic data is non-independent is the use of the repeated-measures design
- collecting multiple data points from, e.g., the same participant and for the same item
- increases statistical power, so fewer participants are needed (more data points per participant, and less unexplained variance because between-subject variability is controlled for)
- saves resources (fewer subjects)

9.3 Other sources of non-independence

non-independence is prevalent in other fields of linguistics, e.g.,
- corpus studies: text, author, language, dialect, register
- phonetic experiments: speaker, listener, exact repetitions
- socio-phonetics: dialect/geographical proximity, register, speaker

10 Pseudoreplication

Pseudoreplication refers to the treatment of dependent observations as independent data points, which causes an overabundance of erroneously significant results.

— Winter (2011), p. 2137

analysing non-independent data as if they were independent
essentially, violating the independence assumption
- very (very) common in older publications
can also result in Type M (magnitude) and S (sign) error
is one contributor (out of many) to the so-called replication crisis
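
The link between pseudoreplication and false positives can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not an analysis from the readings: two groups of participants with no true group difference, a participant-specific baseline (standard deviation 1), trial-level noise (standard deviation 1), and hard-coded approximate critical values of the t distribution. All names and parameter values are hypothetical. It compares a naive t-test over all trials with a t-test over one mean per participant.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def t_statistic(a, b):
    """Two-sample t statistic with pooled variance (equal group sizes assumed)."""
    n = len(a)
    pooled_var = (variance(a) + variance(b)) / 2
    return (mean(a) - mean(b)) / (2 * pooled_var / n) ** 0.5

def simulate(n_sims=1000, n_participants=10, n_trials=20, seed=1):
    rng = random.Random(seed)
    naive_hits = 0
    aggregated_hits = 0
    for _ in range(n_sims):
        all_trials, participant_means = [], []
        for _group in range(2):  # two groups, NO true difference between them
            trials_in_group, means_in_group = [], []
            for _p in range(n_participants):
                baseline = rng.gauss(0, 1)  # participant-specific baseline
                trials = [baseline + rng.gauss(0, 1) for _ in range(n_trials)]
                trials_in_group.extend(trials)
                means_in_group.append(mean(trials))
            all_trials.append(trials_in_group)
            participant_means.append(means_in_group)
        # Approximate two-sided 5% critical values of the t distribution:
        # about 1.966 for df = 398 (all trials), about 2.101 for df = 18 (means)
        if abs(t_statistic(*all_trials)) > 1.966:
            naive_hits += 1
        if abs(t_statistic(*participant_means)) > 2.101:
            aggregated_hits += 1
    return naive_hits / n_sims, aggregated_hits / n_sims

naive_rate, aggregated_rate = simulate()
print(f"false-positive rate, all trials treated as independent: {naive_rate:.3f}")
print(f"false-positive rate, one mean per participant:          {aggregated_rate:.3f}")
```

Under these assumptions the naive analysis should reject the (true) null hypothesis far more often than the nominal 5%, while the by-participant analysis should stay close to 5%. Note that this toy design has only one grouping factor (participants); with items as a second grouping factor, averaging over one factor still ignores the other.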

10.1 Problem: Generalizability

beyond spurious results, the way researchers interpret the implications of their findings is also problematic

Unfortunately, outside of a few domains such as psycholinguistics, it remains rare to see psychologists model stimuli as random effects – despite the fact that most inferences researchers draw are clearly meant to generalize over populations of stimuli.

— Yarkoni (2022), p. 4

if we don’t include grouping factors in our models, our findings are not generalisable beyond our sample
- it could be that our findings are due to a few participants or experimental items that deviate from the rest
we need to take this by-grouping factor variation into account, but how?

10.2 Solution 1: Averaging

e.g., repeated measures ANOVA
- separate models for by-participant and by-item variance (with averaging), interpreted together
PRO: takes both by-participant and by-item variance into account
CONs: not flexible or appropriate for complex designs, and:
- loses information regarding the variation across the grouped observations
- lowers N
  - e.g., if we average over participants, we’d have only 1 data point per participant!
- therefore loses statistical power (i.e., increases Type II error)
- inflates Type I error (chance of a false positive)
in sum: not optimal
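
What averaging does to the data can be seen in a few lines. The sketch below uses a made-up long-format data set (participant, item, reaction time in ms); the values and the helper name `aggregate` are hypothetical. It computes by-participant and by-item means, the inputs to separate by-participant and by-item analyses.

```python
from collections import defaultdict

# Hypothetical long-format data: (participant, item, reaction time in ms)
trials = [
    ("p1", "i1", 510), ("p1", "i2", 530), ("p1", "i3", 490),
    ("p2", "i1", 620), ("p2", "i2", 660), ("p2", "i3", 640),
]

def aggregate(rows, key_index):
    """Average the response (last column) within each level of one grouping factor."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[-1])
    return {level: sum(values) / len(values) for level, values in groups.items()}

by_participant = aggregate(trials, key_index=0)
by_item = aggregate(trials, key_index=1)

print(len(trials))     # 6 trial-level observations
print(by_participant)  # {'p1': 510.0, 'p2': 640.0}
print(by_item)         # {'i1': 565.0, 'i2': 595.0, 'i3': 565.0}
```

Six observations shrink to two (by participant) or three (by item), and the trial-to-trial variation within each participant or item is no longer visible to the model.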

10.3 Solution 2: Single observations

run an experiment without repeated measures
- but this lowers statistical power
- and drastically reduces generalizability
e.g., we could present 60 participants with a single item
- or we could present 1 participant with 60 trials
- but these findings also can’t be generalised beyond that one item or one participant…
in sum: not optimal

10.4 Solution 3: Linear mixed models

best available solution: use repeated-measures design and mixed models
a.k.a. mixed (effects) models/LM(E)Ms, multi-level models, hierarchical models
“mixed” because they contain:
- fixed effects: usually predictors; describe systematic variation in our data that we wish to explain
- random effects: unsystematic variation that is due to random sampling
random effects take dependence between observations into account
- contain varying intercepts and slopes per level of a grouping factor
fixed effects estimates are usually qualitatively unchanged
- what is affected is the measures of variance
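
The difference between fixed and random effects is easiest to see in how a mixed model assumes the data were generated. The sketch below simulates data only; it does not fit a model. It assumes one two-level predictor (`condition`), varying intercepts for participants and items, and a varying slope for participants. All parameter values and names are hypothetical.

```python
import random

rng = random.Random(42)

# Fixed effects (hypothetical values): overall intercept and effect of condition
beta_0, beta_1 = 500.0, 30.0

participants = [f"p{i}" for i in range(1, 4)]
items = [f"i{j}" for j in range(1, 5)]

# Random effects: one deviation drawn per level of each grouping factor
participant_intercept = {p: rng.gauss(0, 50) for p in participants}
participant_slope = {p: rng.gauss(0, 10) for p in participants}
item_intercept = {i: rng.gauss(0, 20) for i in items}

rows = []
for p in participants:
    for i in items:
        for condition in (0, 1):
            y = (
                beta_0 + participant_intercept[p] + item_intercept[i]  # varying intercepts
                + (beta_1 + participant_slope[p]) * condition          # varying slope
                + rng.gauss(0, 25)                                      # residual noise
            )
            rows.append((p, i, condition, round(y, 1)))

print(len(rows))  # 24 = 3 participants x 4 items x 2 conditions
```

All observations from one participant share that participant's intercept and slope deviations, and all observations of one item share that item's intercept deviation. This shared structure is the dependence that random effects account for. In `lme4`, a model with this structure would typically be written with a formula along the lines of `y ~ condition + (1 + condition | participant) + (1 | item)`.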

11 History of mixed-effects models

11.1 1973: Language-as-fixed-effect-fallacy

none of these ideas are new to linguistics
Clark (1973):
- without including dependencies between repeated observations from the same linguistic items in our models, we cannot generalise our findings beyond our stimuli
- our results are relevant only for the subset of the population from which we sampled

The remedies for the language-as-fixed-effect fallacy are for the most part obvious. They include doing the right statistics, choosing the appropriate experimental design, and selecting a random or representative sample of language.

— Clark (1973), p. 347

11.2 ANOVAs: aggregation

repeated measures ANOVAs were commonly used to take dependence between observations into account (and are still common in some fields today)
- require aggregation (i.e., averaging) over items or subjects, not both simultaneously
- drastically reduces our number of observations
- loss of information in the variance of observed data
- i.e., a loss of power (Type II error) and inflated Type I error (false positive)

11.3 2008: Baayen et al. (2008) and `lme4`

enter mixed models with crossed random effects
Journal of Memory and Language, Special Issue: Emerging Data Analysis
- Baayen et al. (2008): introduction of the lme4 package for linear mixed models
- Jaeger (2008): overview of generalised linear mixed models
in addition, Baayen (2008) was published: a textbook for analysing linguistic data with R, with an emphasis on LMMs with lme4

Learning Objectives 🏁

Today we learned about…

the independence assumption ✅
types of non-independence in linguistic data ✅
the history of mixed models in linguistics ✅

12 Task

Discuss the following questions.

What is the independence assumption?
What happens when the independence assumption is violated?
What is the language-as-fixed-effect-fallacy?
What other sources of variance might be present in language research?
Why are repeated measures ANOVAs sub-optimal?

Session Info

Developed with Quarto using R version 4.4.0 (2024-04-24) (Puppy Cup) and RStudio version 2023.9.0.463 (Desert Sunflower), and the following packages:

R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Ventura 13.2.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 compiler_4.4.0    fastmap_1.1.1     cli_3.6.2        
 [5] htmltools_0.5.8.1 tools_4.4.0       rstudioapi_0.16.0 yaml_2.3.7       
 [9] rmarkdown_2.24    knitr_1.43        jsonlite_1.8.7    xfun_0.40        
[13] digest_0.6.33     rlang_1.1.3       renv_1.0.7        evaluate_0.21
