Regression for Linguists
  • D. Palleschi


8 Independence

Motivating mixed models

Author: Daniela Palleschi
Affiliation: Humboldt-Universität zu Berlin

Published: January 12, 2024
Modified: February 13, 2024

Under construction

This chapter has not yet been fully translated from bullet points (from my slides) into prose. This will happen eventually (hopefully by spring 2024).

This lecture covers Winter & Grice (2021) (Sections 1 and 2), Chapter 14 'Mixed Models 1: Conceptual Introduction' (Sections 14.1-14.3) of Winter (2019), and Sections 8.1-8.2 of Chapter 8 ('Mixed-effects models I: Linear Regression') in Sonderegger (2023).

Learning Objectives

Today we will learn about…

  • the independence assumption
  • types of non-independence in linguistic data
  • the history of mixed models in linguistics

9 Independence assumption

  • we have already learned about some model assumptions
    • assumption of normality of residuals
    • homoscedasticity (constant variance) of residuals
    • absence of collinearity of predictors
  • there is another, arguably more important, assumption
    • the assumption of independence

9.1 (Non-)Independence

  • non-independence: any possible link or connection between groups of data points
    • e.g., two observations from the same participant will tend to be more similar to each other than two completely independent observations
    • any case where you might expect observations to cluster by some grouping factor (e.g., participant, item)
  • the independence assumption states that our data points are not linked
    • i.e., the value of one observation is completely independent of the value of another
  • violations of this assumption can greatly inflate the Type I (alpha) error rate
    • i.e., the chance of observing an effect where there is none (false positive)
  • it also artificially inflates the sample size: correlated observations are treated as if each contributed independent information, which overstates precision and statistical power
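The claim that non-independence "inflates sample size" can be made concrete with Kish's design effect. For clusters of equal size it is 1 + (m - 1) × ICC, where m is the number of observations per cluster and the ICC (intraclass correlation) is the share of the total variance that lies between clusters. The R sketch below is a back-of-the-envelope illustration under assumed values (20 participants × 20 trials, equal between- and within-participant SDs, a single grouping factor). The helper functions `design_effect()` and `effective_n()` are hypothetical names, not functions from any package.

```r
# Kish's design effect for equal-sized clusters:
#   deff = 1 + (m - 1) * icc
# m   = observations per cluster (e.g., trials per participant)
# icc = proportion of the total variance that lies between clusters
design_effect <- function(m, icc) {
  1 + (m - 1) * icc
}

# Approximate number of independent observations the data are "worth"
effective_n <- function(n_clusters, m, icc) {
  (n_clusters * m) / design_effect(m, icc)
}

# Assumed (hypothetical) values: 20 participants x 20 trials each
sd_participant <- 1  # between-participant SD
sd_residual    <- 1  # within-participant (trial-level) SD
icc <- sd_participant^2 / (sd_participant^2 + sd_residual^2)

n_rows <- 20L * 20L                  # rows in the data frame
n_eff  <- effective_n(20, 20, icc)   # effective number of independent data points

cat(sprintf("ICC = %.2f, rows = %d, effective n = %.1f\n", icc, n_rows, n_eff))
# ICC = 0.50, rows = 400, effective n = 38.1
```

Under these assumptions, a model that treats all 400 rows as independent behaves as if it had about ten times more independent information than the data actually contain.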

9.2 Repeated measures design

  • most (experimental) linguistic data are non-independent because they come from repeated-measures designs
    • i.e., multiple data points are collected from, e.g., the same participant and for the same item
    • this increases statistical power with fewer participants (more data points per participant, and lower variance because between-participant variability is controlled for)
    • this saves resources (fewer participants are needed)

9.3 Other sources of non-independence

  • non-independence is prevalent in other fields of linguistics, e.g.,
    • corpus studies: text, author, language, dialect, register
    • phonetic experiments: speaker, listener, exact repetitions
    • socio-phonetics: dialect/geographical proximity, register, speaker

10 Pseudoreplication

Pseudoreplication refers to the treatment of dependent observations as independent data points, which causes an overabundance of erroneously significant results.

— Winter (2011), p. 2137

  • analysing non-independent data as if they were independent
  • i.e., violating the independence assumption
    • very (very) common in older publications
  • can also result in Type M (magnitude) and Type S (sign) errors, i.e., overestimating the size of an effect or getting its direction wrong
  • is one contributor (out of many) to the so-called replication crisis
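A small simulation can show what pseudoreplication does to the false-positive rate. The R sketch below assumes a hypothetical design with 20 participants (10 per group), 20 trials each, a between-participant predictor `group`, and no true group effect. Trials from the same participant share a random offset, so they are not independent. Fitting `lm()` to all 400 trials treats them as independent, and the proportion of p < .05 results estimates the Type I error rate under pseudoreplication. The sizes and SDs are illustrative assumptions, not values from the cited work.

```r
set.seed(1)

# Hypothetical design: 20 participants (10 per group), 20 trials each,
# a between-participant predictor 'group', and NO true group effect.
n_subj   <- 20
n_trials <- 20
n_sims   <- 1000

one_sim <- function() {
  participant <- rep(seq_len(n_subj), each = n_trials)
  group       <- ifelse(participant <= n_subj / 2, "a", "b")
  # by-participant offsets make trials from the same participant similar
  subj_offset <- rnorm(n_subj, mean = 0, sd = 1)
  y <- subj_offset[participant] + rnorm(n_subj * n_trials, mean = 0, sd = 1)
  # Pseudoreplication: every trial is treated as an independent data point
  fit <- lm(y ~ group)
  summary(fit)$coefficients["groupb", "Pr(>|t|)"]
}

p_values <- replicate(n_sims, one_sim())
false_positive_rate <- mean(p_values < 0.05)
false_positive_rate  # nominal rate would be 0.05
```

Under these assumptions the estimated rate lands far above the nominal 5%. The exact value depends on the seed and on how large the by-participant SD is relative to the trial-level SD.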

10.1 Problem: Generalizability

  • beyond spurious results, there is also a problem with how researchers interpret the implications of their findings: how far do they generalize?

Unfortunately, outside of a few domains such as psycholinguistics, it remains rare to see psychologists model stimuli as random effects – despite the fact that most inferences researchers draw are clearly meant to generalize over populations of stimuli.

— Yarkoni (2022), p. 4

  • if we don’t include grouping factors in our models, our findings cannot be generalised beyond our sample
    • our findings could be driven by a few participants or experimental items that deviate from the rest
  • we need to take this by-grouping-factor variation into account, but how?

10.2 Solution 1: Averaging

  • e.g., repeated measures ANOVA
    • separate analyses of by-participant and by-item means (obtained by averaging) that are interpreted together
  • PRO: takes both by-participant and by-item variance into account (in separate analyses)
  • CONs: not flexible or appropriate for complex designs, and:
    • loses information regarding the variation across the grouped observations
    • lowers N
      • e.g., if we average over items, we’d have only 1 data point per participant (per condition)!
    • therefore loses statistical power (Type II error)
    • inflates Type I error (chance of a false positive), e.g., a by-participant analysis ignores by-item variability
  • in sum: not optimal
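The R sketch below illustrates the traditional aggregation approach. It simulates a hypothetical crossed design (20 participants × 16 items, with `condition` varying within participants but between items) and runs the by-participant (F1) and by-item (F2) analyses on averaged data. With only two conditions, a paired t-test on participant means and a Student's t-test on item means correspond to the F1 and F2 ANOVAs (F = t²). All variable names and parameter values are assumptions for illustration.

```r
set.seed(2)

# Hypothetical crossed design: every participant sees every item once.
n_subj <- 20
n_item <- 16
d <- expand.grid(participant = factor(seq_len(n_subj)),
                 item        = factor(seq_len(n_item)))
# condition varies within participants but between items (8 items each)
d$condition <- ifelse(as.integer(d$item) <= n_item / 2, "a", "b")

subj_offset <- rnorm(n_subj, 0, 30)  # by-participant variation
item_offset <- rnorm(n_item, 0, 20)  # by-item variation
d$rt <- 400 + subj_offset[d$participant] + item_offset[d$item] +
  rnorm(nrow(d), 0, 50)              # trial-level noise; no true condition effect

# F1: average over items -> one mean per participant per condition
f1_means <- with(d, tapply(rt, list(participant, condition), mean))  # 20 x 2
f1 <- t.test(f1_means[, "a"], f1_means[, "b"], paired = TRUE)

# F2: average over participants -> one mean per item
f2_means <- aggregate(rt ~ item + condition, data = d, FUN = mean)
f2 <- t.test(rt ~ condition, data = f2_means, var.equal = TRUE)

# How much of the data survives aggregation?
c(trials = nrow(d), F1_means = length(f1_means), F2_means = nrow(f2_means))
# trials = 320, F1_means = 40, F2_means = 16
```

Each analysis sees only 40 or 16 numbers instead of 320 trials. Each also ignores one source of variability (F1 ignores items, F2 ignores participants), which is why results were traditionally reported for both.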

10.3 Solution 2: Single observations

  • run an experiment without repeated measures
    • but this lowers statistical power
    • and drastically reduces generalizability
  • e.g., we could present 60 participants with a single item
    • or we could present 1 participant with 60 trials
    • but these findings also can’t be generalised beyond that one item or one participant…
  • in sum: not optimal

10.4 Solution 3: Linear mixed models

  • best available solution: use a repeated-measures design and mixed models
  • a.k.a. mixed (effects) models/LM(E)Ms, multi-level models, hierarchical models
  • “mixed” because they contain:
    • fixed effects: usually our predictors; describe systematic variation in our data that we wish to explain
    • random effects: unsystematic variation due to random sampling (e.g., of participants and items)
  • random effects take dependence between observations into account
    • they contain varying intercepts and slopes per level of a grouping factor
  • fixed-effect estimates are usually qualitatively unchanged
    • what is affected are the measures of variance (e.g., standard errors, and therefore p-values and confidence intervals)
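For comparison, the R sketch below specifies such a design as a linear mixed model with the `lme4` package. It simulates a hypothetical crossed data set and fits by-participant varying intercepts and slopes for `condition`, plus by-item varying intercepts. Because `condition` varies between items here, there is no by-item slope. Variable names, effect sizes and SDs are assumptions. The model is fitted only if `lme4` is installed.

```r
set.seed(3)

# Hypothetical crossed design: 20 participants x 16 items,
# condition within participants but between items.
n_subj <- 20
n_item <- 16
d <- expand.grid(participant = factor(seq_len(n_subj)),
                 item        = factor(seq_len(n_item)))
d$condition <- ifelse(as.integer(d$item) <= n_item / 2, -0.5, 0.5)  # sum-coded

subj_intercept <- rnorm(n_subj, 0, 30)  # varying intercepts by participant
subj_slope     <- rnorm(n_subj, 0, 15)  # varying condition slopes by participant
item_intercept <- rnorm(n_item, 0, 20)  # varying intercepts by item

d$rt <- 400 + subj_intercept[d$participant] + item_intercept[d$item] +
  (25 + subj_slope[d$participant]) * d$condition +  # assumed fixed effect: 25
  rnorm(nrow(d), 0, 50)                             # trial-level noise

# Fixed effect: condition. Random effects: by-participant intercepts and
# condition slopes, plus by-item intercepts (crossed random effects).
model_formula <- rt ~ condition + (1 + condition | participant) + (1 | item)

if (requireNamespace("lme4", quietly = TRUE)) {
  m <- lme4::lmer(model_formula, data = d)
  print(lme4::fixef(m))    # fixed-effect estimates
  print(lme4::VarCorr(m))  # SDs of the varying intercepts and slopes
}
```

In the output, `fixef()` gives the fixed-effect estimates (the systematic effect of `condition`). `VarCorr()` reports the estimated standard deviations of the by-participant and by-item intercepts and slopes. This is the by-grouping-factor variation that an ordinary `lm()` would ignore.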

11 History of mixed-effects models

11.1 1973: Language-as-fixed-effect-fallacy

  • none of these ideas are new to linguistics
  • Clark (1973):
    • if we do not account for dependencies between repeated observations from the same linguistic items in our models, we cannot generalise our findings beyond our stimuli
    • our results then apply only to the specific items we sampled, not to the wider population of language they were drawn from

The remedies for the language-as-fixed-effect fallacy are for the most part obvious. They include doing the right statistics, choosing the appropriate experimental design, and selecting a random or representative sample of language.

— Clark (1973), p. 347

11.2 ANOVAs: aggregation

  • repeated measures ANOVAs were commonly used to take dependence between observations into account (and are still common in some fields today)
    • they require aggregation (i.e., averaging) over items or subjects, not both simultaneously
    • this drastically reduces our number of observations
    • and loses information about the variance in the observed data
    • i.e., a loss of power (Type II error) and inflated Type I error (false positive)

11.3 2008: Baayen et al. (2008) and lme4

  • enter mixed models with crossed random effects
  • Journal of Memory and Language, Special Issue: Emerging Data Analysis
    • Baayen et al. (2008): introduced mixed-effects modelling with crossed random effects for subjects and items, using the lme4 package
    • Jaeger (2008): argued for logit (generalised linear) mixed models instead of ANOVAs for categorical data
  • in addition, Baayen (2008) was published, a textbook on analysing linguistic data with R with an emphasis on LMMs fitted with lme4

Learning Objectives 🏁

Today we learned about…

  • the independence assumption ✅
  • types of non-independence in linguistic data ✅
  • the history of mixed models in linguistics ✅

12 Task

Discuss the following questions.

  1. What is the independence assumption?
  2. What happens when the independence assumption is violated?
  3. What is the language-as-fixed-effect-fallacy?
  4. What other sources of variance might be present in language research?
  5. Why are repeated measures ANOVAs sub-optimal?

Session Info

Developed with Quarto using R version 4.4.0 (2024-04-24) (Puppy Cup) and RStudio version 2023.9.0.463 (Desert Sunflower), and the following packages:

R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Ventura 13.2.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 compiler_4.4.0    fastmap_1.1.1     cli_3.6.2        
 [5] htmltools_0.5.8.1 tools_4.4.0       rstudioapi_0.16.0 yaml_2.3.7       
 [9] rmarkdown_2.24    knitr_1.43        jsonlite_1.8.7    xfun_0.40        
[13] digest_0.6.33     rlang_1.1.3       renv_1.0.7        evaluate_0.21    

References

Baayen, R. H. (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics using R.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412. https://doi.org/10.1016/j.jml.2007.12.005
Clark, H. H. (1973). The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning and Verbal Behavior, 12(4), 335–359. https://doi.org/10.1016/S0022-5371(73)80014-3
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434–446. https://doi.org/10.1016/j.jml.2007.11.007
Sonderegger, M. (2023). Regression Modeling for Linguistic Data.
Winter, B. (2011). Pseudoreplication in phonetic research.
Winter, B. (2019). Statistics for Linguists: An Introduction Using R. Routledge. https://doi.org/10.4324/9781315165547
Winter, B., & Grice, M. (2021). Independence and generalizability in linguistics. Linguistics, 59(5), 1251–1277. https://doi.org/10.1515/ling-2019-0049
Yarkoni, T. (2022). The generalizability crisis. Behavioral and Brain Sciences, 45, e1. https://doi.org/10.1017/S0140525X20001685