Reproducible analysis reports with eye-tracking reading time data
  • D. Palleschi

1  Reproducible Analyses

Published

April 12, 2023

1.1 Replication

“There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims”

– Ioannidis (2005)

  • replication refers to re-running a previous experiment with as few differences as possible
    • aim: determine whether the original results were robust and are replicable
    • if yes, great! the original findings are reliable
    • if no, hmm, maybe the original findings were false positives? or due to some other factor?
  • in recent years, researchers have tried to replicate classic studies in their field
    • but in many cases, they did not get the same effects the original study reported (and were famous for)
  • this began the replication crisis

1.1.1 An example from language research

  • Nieuwland et al. (2018): a direct EEG1 replication (versus conceptual replication)
  • a multi-lab replication of DeLong et al. (2005)’s impactful paper
    • DeLong et al. (2005): reported N400 effects elicited at unexpected nouns, but also at preceding determiners (English a/an) when these signalled an unexpected word
      • e.g., The day was breezy so the boy went outside to fly…a kite/*an airplane
      • taken as evidence of pre-activation of phonological form, graded by cloze probability
    • Nieuwland et al. (2018): replicated the N400 effect at the noun, but not at the determiner
      • i.e., failure to replicate a famous finding

1.2 Reproducibility

  • reproducibility refers to the ability to reproduce somebody’s analyses with their
    • data
    • and code
  • it is not something we do once, nor is it something that will get us published
    • but it’s important for open science and encourages transparency

1.2.1 Replication vs. Reproducibility

  • replication of a study
    • repeating an experiment
    • getting similar results
  • reproducibility of analyses
    • repeating analyses of the same data
    • getting the same results
  • e.g., when you submit a paper to a journal, they may ask for your data and code so reviewers can reproduce your analyses
    • requires data and code
  • if you have interesting findings, other researchers (or future you) may want to replicate your study to see if they can replicate your findings
    • (may require) stimuli, set-up and presentation information, participant demographics
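
As a minimal sketch of what "getting the same results" can look like in code: analyses that involve random numbers only give identical output on every run if the random seed is fixed. The seed value and object names below are illustrative assumptions, not part of the course materials.

```r
# Fix the random seed so that "random" draws are repeatable
# (2023 is an arbitrary, illustrative value)
set.seed(2023)
draw_1 <- rnorm(5)

# Re-running the same code with the same seed reproduces the draws exactly
set.seed(2023)
draw_2 <- rnorm(5)

identical(draw_1, draw_2)
```

Without the second `set.seed()` call the two vectors would almost certainly differ, which is one reason why sharing the code, and not only the data, matters.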

1.3 Open Science: Why should I care?

  1. Science is cumulative
    • We should ensure we’re building on reliable, robust findings
    • i.e., it’s good scientific practice
  2. Because the field cares
    • replication/reproducibility are beginning to be foregrounded by e.g., journals/job advertisements
  3. Helps future you
    • pre-registration, reproducible analyses, clean and shareable data: all help future you

1.3.1 What can I do?

  • there’s a variety of open science practices that we can choose to implement
  • some suggestions from Kathawalla et al. (2021):

Level: Easy

  1. Journal Club
  2. Project workflow
  3. Pre-prints

Level: Medium

  1. Reproducible code
  2. Sharing data
  3. Transparent manuscripts
  4. Pre-registration

Level: Difficult

  1. Registered reports

1.3.2 How to do better science

  • don’t be afraid of making mistakes
    • (most) researchers aren’t statisticians or programmers
    • do the best you can, and be transparent
  • doing some of the steps is better than doing none

1.3.3 What will we learn here?

Design and Reporting

  • Preregistration/Registered Reports
  • Transparent writing

Analysis

  • Reproducible code
    • with open source software (R, RStudio, packages)
    • dynamic reports with Quarto/Rmarkdown
  • Project workflow
    • folder structure
      • how to sensibly set up your folders
    • contained environments
      • using RProjects and the here package

Image source: Kathawalla et al. (2021) (all rights reserved)

1.4 R is for Reproducibility

  • we will be working with R, RStudio, Quarto, and RProjects
    • R: a programming language for statistical computing and graphics
    • RStudio: an integrated development environment (IDE)
      • RStudio Desktop
      • RStudio Server
    • Quarto (similar to Rmarkdown): dynamic reports
      • combining text, code, and printed tables and figures
    • RProjects: a workflow tool
      • contains all files necessary for a project
      • works with relative file paths
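
The difference between absolute and relative file paths can be sketched as follows. The folder and file names are hypothetical, and the last step assumes the here package is installed (it is skipped otherwise).

```r
# Absolute path: only works on the computer where it was written
# (hypothetical location)
abs_path <- "/Users/me/Documents/my_project/data/reading_times.csv"

# Relative path: resolved from the project folder, so it keeps working
# wherever the whole project folder is copied to
rel_path <- file.path("data", "reading_times.csv")
rel_path

# With the here package, paths are built from the project root
if (requireNamespace("here", quietly = TRUE)) {
  here::here("data", "reading_times.csv")
}
```

Opening the project via its RProject file sets the working directory to the project folder, which is what makes the relative path resolve correctly.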

1.5 Exercises

1.5.1 RStudio

  1. Open RStudio
    • locate the Environment, Files, and Console panes
    • File > New File > R script
    • write [your birth-month number]*[your birth day] and hit Enter
    • write print("Hello World!")
    • write number <- 3*32; this will create an object/variable ‘number’
    • write string <- "Hello World!"; this will create an object/variable ‘string’
    • write number
    • write string
    • add comments describing each step using #
    • File > Save As

# multiply 5 by 7
5*7
[1] 35
# print some text
print("Hello World!")
[1] "Hello World!"
# save an object 'number' with 5*7
number <- 5*7
# save an object 'string' with text
string <- "Hello World!"
# print number
number
[1] 35
# print string
string
[1] "Hello World!"
# do math with objects
number+number
[1] 70
number*number
[1] 1225
number*2
[1] 70
month <- 5
day <- 7
month*day
[1] 35

1.5.2 Quarto2

  • R scripts are a great way to keep track of what you did
    • however, the output is not saved, and adding comments with # gets kind of clunky
    • enter: dynamic reports!
  • dynamic reports are those that combine text, code, and output
    • they are a great tool for communicating, collaborating, and documenting
    • they are also fantastic for note-taking
  • Rmarkdown vs. Quarto
    • both can combine text with code, outputting PDFs, Word Documents, html, or slides
    • main difference: Quarto has native support for a wider range of programming languages (e.g., Python and Julia)
  • Want to know more? Check out Hadley Wickham’s intro (Wickham et al., n.d.)

1.5.2.1 YAML

---
title: "My title"
author: "My name"
format: html
---
  • YAML is a human-readable data-serialisation language, used here to configure documents
  • formatting is important: the YAML must be sandwiched between --- and ---
  • in Quarto, at least the output format should be given (e.g., html, pdf, revealjs)

1.5.2.2 Headings and text

# This is a heading

This is text.

## This is a sub-heading

This is more text.
  • headings are indicated by #
    • the number of #’s indicates the heading level

1.5.2.3 Code snippets

# save two objects: a number and a string
year <- 1989
dog <- "Lola"
  • code chunks are sandwiched between ```{r} and ```
    • shortcut: Ctrl/Cmd+Alt+I

1.5.2.4 In-line code

I was born on `r month`/`r day`/`r year`. My dog's name is `r dog`.

I was born on 5/7/1989. My dog’s name is Lola.

  • objects created by code that was run above the text can be called in-line using `r `
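
In-line code inserts the value of an object into the sentence when the document is rendered. A rough base-R equivalent of the sentence above, assuming the objects `month`, `day`, `year`, and `dog` hold the values used in this chapter, would be:

```r
month <- 5
day <- 7
year <- 1989
dog <- "Lola"

# paste0() glues the values into the text, much like in-line code does
sentence <- paste0("I was born on ", month, "/", day, "/", year,
                   ". My dog's name is ", dog, ".")
sentence
```

If one of the objects has not been created before the sentence, rendering stops with an "object not found" error, which is why the order of chunks and text matters.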

1.5.2.5 Altogether

---
title: "My title"
author: "My name"
format: html
---

# This is a heading

This is text.

## This is a sub-heading

This is more text.

Add some code chunks.

```{r}
# save some objects
month <- 5
day <- 7
year <- 1989
dog <- "Lola"
```

And call the objects with in-line code: I was born on `r month`/`r day`/`r year`. My dog's name is `r dog`.

1.5.3 Quarto Exercises

  1. Create a new Quarto document
    • File > New File > Quarto Document
    • Read the instructions
    • Practice running the chunks individually
    • render the document
    • verify that you can modify the code, re-run it, and see modified output
  2. Create one new Quarto document for each of the three built-in formats: HTML, PDF and Word.
    • Render each of the three documents
    • How do the outputs differ?
    • How do the inputs differ?3

1.5.4 Quarto cont’d

  • Choose a Quarto document:
    • give it a title, your name (author), and unclick ‘Use visual markdown editor’
  • Render
  • YAML:
title: "Eye-tracking during reading"
subtitle: "Lecture 2 notes"
author: "[YOUR NAME HERE]"
lang: en
date: "`r Sys.Date()`"
  • Render

  • you can now try writing your class notes in this document (if you’re brave)
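
`Sys.Date()` returns the current date, so a document that uses it in the `date` field is stamped with the day it was rendered. A quick way to see what it produces is to run it in the Console; the `format()` pattern below is just one illustrative choice.

```r
# Current date as a Date object
today <- Sys.Date()
class(today)

# Dates print in ISO format (YYYY-MM-DD) by default;
# format() changes how they are displayed
format(today, "%d.%m.%Y")
```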

References

DeLong, K. A., Urbach, T. P., & Kutas, M. (2005). Probabilistic word pre-activation during language comprehension inferred from electrical brain activity. Nature Neuroscience, 8(8), 1117–1121. https://doi.org/10.1038/nn1504
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Med, 2(8), 2–8. https://doi.org/10.1371/journal.pmed.0020124
Kathawalla, U.-K., Silverstein, P., & Syed, M. (2021). Easing Into Open Science: A Guide for Graduate Students and Their Advisors. Collabra: Psychology, 7(1), 18684. https://doi.org/10.1525/collabra.18684
Nieuwland, M. S., Politzer-Ahles, S., Heyselaar, E., Segaert, K., Darley, E., Kazanina, N., Von Grebmer Zu Wolfsthurn, S., Bartolozzi, F., Kogan, V., Ito, A., Mézière, D., Barr, D. J., Rousselet, G. A., Ferguson, H. J., Busch-Moreno, S., Fu, X., Tuomainen, J., Kulakova, E., Husband, E. M., … Huettig, F. (2018). Large-scale replication study reveals a limit on probabilistic prediction in language comprehension. eLife, 7, e33468. https://doi.org/10.7554/eLife.33468
Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (n.d.). R for Data Science (2nd ed.). https://r4ds.hadley.nz/

  1. electroencephalography

  2. https://r4ds.hadley.nz/quarto.html#workflow

  3. You may need to install LaTeX in order to build the PDF output; RStudio will prompt you if this is necessary.

Source Code
---
date: 2023-04-12
bibliography: references/references.json
csl: references/apa.csl
---

# Reproducible Analyses

```{r, eval = T, cache = F, echo = F, message=FALSE}
# Create references.json file based on the citations in this script
# make sure you have 'bibliography: references.json' in the YAML
rbbt::bbt_update_bib("reproducibility.qmd")
```

```{r}
knitr::opts_chunk$set(eval = TRUE, # change this to 'eval = TRUE' to reproduce the analyses; make sure to comment out
                      echo = TRUE, # 'print code chunk?'
                      message = FALSE, # 'print messages?'
                      error = FALSE, # 'keep rendering after an error?'
                      warning = FALSE) # 'print warnings?'
```

## Replication

> "There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims"
>
> -- <cite>@ioannidis_why_2005</cite>

- replication refers to re-running a previous experiment with as few differences as possible
  + aim: determine whether the original results were *robust* and are *replicable*
  + if yes, great! the original findings are reliable
  + if no, hmm, maybe the original findings were false positives? or due to some other factor?

- in recent years, researchers have tried to *replicate* classic studies in their field
  + but in many cases, they did not get the same effects the original study reported (and were famous for)
- this began the ***replication crisis***

### An example from language research

- @nieuwland_large-scale_2018: a *direct* EEG^[electroencephalography] replication (versus *conceptual* replication)

::: {style="font-size: 75%"}
  + a multi-lab replication of @delong_probabilistic_2005's impactful paper
    + @delong_probabilistic_2005: reported N400 effects elicited at unexpected nouns, but also at preceding determiners (English *a/an*) when these signalled an unexpected word
      + e.g., *The day was breezy so the boy went outside to fly...a kite*/\**an airplane*
      + taken as evidence of pre-activation of phonological form, graded by cloze probability
    + @nieuwland_large-scale_2018: replicated the N400 effect at the noun, but not at the determiner
      + i.e., *failure to replicate* a famous finding
:::
  

## Reproducibility

- reproducibility refers to the ability to *reproduce* somebody's analyses with their
  + data
  + *and* code
- it is not something we do once, nor is it something that will get us published
  + but it's important for open science and encourages transparency

### Replication vs. Reproducibility

- **replication** of a study
  + repeating an **experiment**
  + getting *similar* results
- **reproducibility** of analyses
  + repeating **analyses** of the *same data*
  + getting the *same* results
  
- e.g., when you submit a paper to a journal, they may ask for your data and code so reviewers can *reproduce* your analyses
  + requires data and code
- if you have interesting findings, other researchers (or future you) may want to *replicate* your study to see if they can *replicate* your findings
  + (may require) stimuli, set-up and presentation information, participant demographics
  
## Open Science: Why should I care?

1. Science is cumulative
    + We should ensure we're building on reliable, robust findings
    + i.e., it's *good* scientific practice
2. Because the field cares
    + replication/reproducibility are beginning to be foregrounded by e.g., journals/job advertisements
3. Helps future you
    + pre-registration, reproducible analyses, clean and shareable data: all help *future you*
    
### What can I do?

- there's a variety of open science practices that we can choose to implement
- some suggestions from @kathawalla_easing_2021:

:::: {.columns}

::: {.column width="30%"}
Level: Easy

1. Journal Club
2. Project workflow
3. Pre-prints
:::

::: {.column width="40%"}
Level: Medium

4. Reproducible code
5. Sharing data
6. Transparent manuscripts
7. Pre-registration
:::

::: {.column width="30%"}    
Level: Difficult

8. Registered reports
:::

::::

### How to do better science

- don't be afraid of making mistakes
  + (most) researchers aren't statisticians or programmers
  + do the best you can, and ***be transparent***
- doing *some* of the steps is better than doing *none*

### What will we learn here?



:::: {.columns} 

::: {.column width="50%"}

Design and Reporting

::: {style="font-size: 75%;"}
- Preregistration/Registered Reports
- Transparent writing
:::

Analysis

::: {style="font-size: 75%;"}
- Reproducible code
  + with open source software (R, RStudio, packages)
  + dynamic reports with Quarto/Rmarkdown
  
- Project workflow
  + folder structure
      + how to sensibly set up your folders
   + contained environments
      + using RProjects and the `here` package
:::

:::

::: {.column width="50%"}

```{r echo = F, fig.env = "figure",out.height="100%", fig.align = "center", set.cap.width=T, fig.cap="Image source: @kathawalla_easing_2021 (all rights reserved)"}
knitr::include_graphics(here::here("media/Kathawalla_research_cycle.png"))
```

:::  

::::


## R is for Reproducibility

- we will be working with R, RStudio, Quarto, and RProjects
  + R: a programming language for statistical computing and graphics
  + RStudio: an integrated development environment (IDE)
    - RStudio Desktop
    - RStudio Server
  + Quarto (similar to Rmarkdown): dynamic reports
    - combining text, code, and printed tables and figures
  + RProjects: a workflow tool
    - contains all files necessary for a project
    - works with *relative* file paths

::: {.content-visible when-format="revealjs"}
::: {.notes}
Students: open R, then RStudio, then create an RProject
Me: show them each on my computer

- R can run code and save a script
- RStudio has so many more options (cheatsheet)
- RProjects keep everything tidy and together
:::
:::

## Exercises

### RStudio {.smaller}

1. Open RStudio
    - locate the Environment, Files, and Console panes
    - File > New File > R script
    - write `[your birth-month number]*[your birth day]` and hit Enter
    - write `print("Hello World!")`
    - write `number <- 3*32`; this will create an object/variable 'number'
    - write `string <- "Hello World!"`; this will create an object/variable 'string'
    - write `number`
    - write `string`
    - add comments describing each step using `#`
    - File > Save As

## {-}

```{r, eval = T}
#| output-location: column-fragment
# multiply 5 by 7
5*7
```
```{r, eval = T}
#| output-location: column-fragment
# print some text
print("Hello World!")
```
```{r, eval = T}
#| output-location: column-fragment
# save an object 'number' with 5*7
number <- 5*7
```
```{r, eval = T}
#| output-location: column-fragment
# save an object 'string' with text
string <- "Hello World!"
```
```{r, eval = T}
#| output-location: column-fragment
# print number
number
```
```{r, eval = T}
#| output-location: column-fragment
# print string
string
```
```{r, eval = T}
#| output-location: column-fragment
# do math with objects
number+number
```
```{r, eval = T}
#| output-location: column-fragment
number*number
```
```{r, eval = T}
#| output-location: column-fragment
number*2
```
```{r, eval = T}
#| output-location: column-fragment
month <- 5
```
```{r, eval = T}
#| output-location: column-fragment
day <- 7
```
```{r, eval = T}
#| output-location: column-fragment
month*day
```

### Quarto^[https://r4ds.hadley.nz/quarto.html#workflow] {.smaller}

- R scripts are a great way to keep track of what you did
  + however, the output is not saved, and adding comments with `#` gets kind of clunky
  + enter: dynamic reports!
- dynamic reports are those that combine text, code, and output
  + they are a great tool for communicating, collaborating, and documenting
  + they are also fantastic for note-taking
- Rmarkdown vs. Quarto
  + both can combine text with code, outputting PDFs, Word Documents, html, or slides
  + main difference: Quarto has native support for a wider range of programming languages (e.g., Python and Julia)

- Want to know more? Check out [Hadley Wickham's intro](https://r4ds.hadley.nz/quarto.html) [@wickham_r_nodate]

#### YAML

```{r, eval = F}
#| code-line-numbers: false
---
title: "My title"
author: "My name"
format: html
---
```

- YAML is a human-readable data-serialisation language, used here to configure documents
- formatting is important: the YAML must be sandwiched between `---` and `---`
- in Quarto, at least the output format should be given (e.g., html, pdf, revealjs)



#### Headings and text

```{r, eval = F}
# This is a heading

This is text.

## This is a sub-heading

This is more text.
```

- headings are indicated by `#`
  + the number of `#`'s indicates the heading level



#### Code snippets

```{r}
#| results: asis

# save two objects: a number and a string
year <- 1989
dog <- "Lola"
```

- code chunks are sandwiched between ```` ```{r} ```` and ```` ``` ````
  + shortcut: Ctrl/Cmd+Alt+I



#### In-line code

```{r, eval = F}
I was born on `r month`/`r day`/`r year`. My dog's name is `r dog`.
```

I was born on `r month`/`r day`/`r year`. My dog's name is `r dog`.

- objects created by code that was run *above* the text can be called in-line using \``r` \`

#### Altogether

````markdown
---
title: "My title"
author: "My name"
format: html
---

# This is a heading

This is text.

## This is a sub-heading

This is more text.

Add some code chunks.

`r ''````{r}
# save some objects
month <- 5
day <- 7
year <- 1989
dog <- "Lola"
```

And call the objects with in-line code: I was born on `r knitr::inline_expr("month")`/`r knitr::inline_expr("day")`/`r knitr::inline_expr("year")`. My dog's name is `r knitr::inline_expr("dog")`.
````

### Quarto Exercises
:::: {.columns}

::: {.column width="50%"}

3. Create a new Quarto document
    - File > New File > Quarto Document
    - Read the instructions
    - Practice running the chunks individually
    - render the document
    - verify that you can modify the code, re-run it, and see modified output
:::

::: {.column width="50%"}

4. Create one new Quarto document for each of the three built-in formats: HTML, PDF and Word. 
    - Render each of the three documents
    - How do the outputs differ? 
    - How do the inputs differ?^[You may need to install LaTeX in order to build the PDF output — RStudio will prompt you if this is necessary.]
:::

::::

### Quarto cont'd {.smaller}

- Choose a Quarto document:
  + give it a title, your name (author), and unclick 'Use visual markdown editor'
- Render
- YAML:

```{r, eval = FALSE}
title: "Eye-tracking during reading"
subtitle: "Lecture 2 notes"
author: "[YOUR NAME HERE]"
lang: en
date: "`r Sys.Date()`"
```
  + Render
  
- you can now try writing your class notes in this document (if you're brave)

# References {.unnumbered}

::: {#refs custom-style="Bibliography"}
:::