1 Reproducible Analayses

Published

April 12, 2023

1.1 Replication

“There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims”

– Ioannidis (2005)

replication refers to re-running a previous experiment with as few differences as possible
- aim: determine whether the original results were robust and are replicable
- if yes, great! the original findings are reliable
- if no, hmm, maybe the original findings were false positives? or due to some other factor?
in recent years, researchers have tried to replicate classic studies in their field
- but in many cases, they did not get the same effects the original study reported (and were famous for)
this began the replication crisis

1.1.1 An example from language research

Nieuwland et al. (2018): a direct EEG¹ replication (versus conceptual replication)

a multi-lab replication of DeLong et al. (2005)’s impactful paper
- DeLong et al. (2005): reported N400 effects elicted at unexpected nouns, but also on preceding determiners (English a/an) when it signalled an unexpected word,
  - e.g., The day was breezy so the boy went outside to fly…a kite/*an airplane
  - taken as evidence of pre-activation of phonological form, graded by cloze probability
- Nieuwland et al. (2018): replicated N400 at noun, but not at adjective
  - i.e., failure to replicate a famous finding

1.2 Reproducibility

reproducibility refers to the ability to reproduce somebody’s analyses with their
- data
- and code
it is not something we do once, nor is it something that will get us published
- but it’s important for open science and encourages transparency

1.2.1 Replication vs. Reproducibility

replication of a study
- repeating an experiment
- getting similar results
reproducibility of analyses
- repeating analyses of the same data
- getting the same results
e.g., when you submit a paper to a journal, they make ask for your data and code so reviewers can reproduce your analyses
- requires data and code
if you have interesting findings, other researchers (or future you) may want to replicate your study to see if they can replicate your findings
- (may require) stimuli, set-up and presentation information, participant demographics

1.3 Open Science: Why should I care?

Science is cumulative
- We should ensure we’re building on reliable, robust findings
- i.e., it’s good scientific practice
Because the field cares
- replication/reproducibility are beginning to be foregrounded by e.g., journals/job advertisements
Helps future you
- pre-registration, reproducible analyses, clean and shareable data: all help future you

1.3.1 What can I do?

there’s a variety of open science practices that we can choose to implement
some suggestions from Kathawalla et al. (2021):

Level: Easy

Journal Club
Project workflow
Pre-prints

Level: Medium

Reproducible code
Sharing data
Transparent manuscripts
Pre-registration

Level: Difficult

Registered reports

1.3.2 How to do better science

don’t be afraid of making mistakes
- (most) researchers aren’t statisticians or programmers
- do the best you can, and be transparent
doing some of the steps is better than doing none

1.3.3 What will we learn here?

Design and Reporting

Preregistration/Registered Reports
Transparent writing

Analysis

Reproducible code
- with open source software (R, RStudio, packages)
- dynamic reports with Quarto/Rmarkdown
Project workflow
- folder structure
  - how to sensibly set up your folders
- contained environments
  - using RProjects and the here package

Image source: Kathawalla et al. (2021) (all rights reserved)

1.4 R is for Reproducibility

we will be working with R, RStudio, Quarto, and RProjects
- R: a programming language for statistical computing and graphics
- RStudio: an integrated development environment (IDE)
  - RStudio Desktop
  - RStudio Server
- Quarto (similar to Rmarkdown): dynamic reports
  - combining text, code, and printed tables and figures
- RProjects: a workflow tool
  - contains all files necessary for a project
  - works with relative file paths

1.5 Exercises

1.5.1 RStudio

Open RStudio
- locate the Environment, Files, and Console panes
- File > New File > R script
- write [your birth-month number]*[the your birth day] and hit Enter
- write print("Hello World!")
- write number <- 3*32; this will create an object/variable ‘number’
- write string <- "Hello World!"; this will create an object/variable ‘string’
- write number
- write string
- add comments describing each step using #
- File > Save As

# multiply 5 by 7
5*7

[1] 35

# print some text
print("Hello World!")

[1] "Hello World!"

# save an object 'number' with 5*7
number <- 5*7

# save an object 'string' with text
string <- "Hello World!"

# print number
number

[1] 35

# print string
string

[1] "Hello World!"

# do math with objects
number+number

[1] 70

number*number

[1] 1225

number*2

[1] 70

month <- 5

day <- 7

month*day

[1] 35

1.5.2 Quarto²

R scripts are a great way to keep track of what you did
- however, the output is not saved, and adding comments with # gets kind of chunky
- enter: dynamic reports!
dynamic reports are those that combine text, code, and output
- they are a great tool for communicating, collaborating, and documenting
- they are also fantastic for note-taking
Rmarkdown vs. Quarto
- both can combine text with code, outputting PDFs, Word Documents, html, or slides
- main difference: Quarto has native support of a wider range of programming languages (e.g., Python and Julia)
Want to know more? Check out Hadley Wickham’s intro (Wickham et al., n.d.)

1.5.2.1 YAML

---
title: "My title"
author: "My name"
format: html
---

YAML is a human-readable programming language used to configure documents
formatting is important: but be sandwiched between --- and ---
in Quarto the output type must at least be given (with R: pdf, html, revealjs)

1.5.2.2 Headings and text

# This is a heading

This is text.

## This is a sub-heading

This is more text.

headings are indicated by #
- the number of #’s indicates the heading level

1.5.2.3 Code snippets

# do some math
year <- 1989
dog <- "Lola"

sandwiched between markdown```{r} and `markdown
- shortcut: Ctrl/Cmd+Alt+I

1.5.2.4 In-line code

I was born on `r month`/`r day`/`r year`. My dog's name is `r dog`.

I was born on 5/7/1989. My dog’s name is Lola.

code output that was run above text can be called in-line using `r `

1.5.2.5 Altogether

---
title: "My title"
author: "My name"
format: html
---

# This is a heading

This is text.

## This is a sub-heading

This is more text.

Add some code chunks.

```{r}
# do some math
year <- 1989
dog <- "Lola"
```

And use call objects for in-line code: I was born on `r month`/`r day`/`r year`. My dog's name is `r dog`.

1.5.3 Quarto Exercises

Create a new Quarto document
- File > New File > Quarto Document
- Read the instructions
- Practice running the chunks individually
- render the document
- verify that you can modify the code, re-run it, and see modified output

Create one new Quarto document for each of the three built-in formats: HTML, PDF and Word.
- Render each of the three documents
- How do the outputs differ?
- How do the inputs differ?³

1.5.4 Quarto cont’d

Choose a Quarto document:
- give it a title, your name (author), and unclick ‘Use visual markdown editor’
Render
YAML:

title: "Eye-tracking during reading"
subtitle: "Lecture 2 notes"
author: "[YOUR NAME HERE]"
lang: en
date: `r Sys.Date()`

Render
you can now try writing your class notes in this document (if you’re brave)

References

electroencephalography↩︎
https://r4ds.hadley.nz/quarto.html#workflow↩︎
You may need to install LaTeX in order to build the PDF output — RStudio will prompt you if this is necessary.↩︎

--- date: 2023-04-12 bibliography: references/references.json csl: references/apa.csl --- # Reproducible Analayses ```{r, eval = T, cache = F, echo = F, message=FALSE} # Create references.json file based on the citations in this script # make sure you have 'bibliography: references.json' in the YAML rbbt::bbt_update_bib("reproducibility.qmd") ``` ```{r} knitr::opts_chunk$set(eval = T, # change this to 'eval = T' to reproduce the analyses; make sure to comment out echo = T, # 'print code chunk?' message = F, # 'print messages (e.g., warnings)?' error = F, warning = F) ``` ## Replication > "There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims" > > -- <cite>@ioannidis_why_2005</cite> - replication refers to re-running a previous experiment with as few differences as possible + aim: determine whether the original results were *robust* and are *replicable* + if yes, great! the original findings are reliable + if no, hmm, maybe the original findings were false positives? or due to some other factor? - in recent years, researchers have tried to *replicate* classic studies in their field + but in many cases, they did not get the same effects the original study reported (and were famous for) - this began the ***replication crisis*** ### An example from language research - @nieuwland_large-scale_2018: a *direct* EEG^[electroencephalography] replication (versus *conceptual* replication) ::: {style="font-size: 75%"} + a multi-lab replication of @delong_probabilistic_2005's impactful paper + @delong_probabilistic_2005: reported N400 effects elicted at unexpected nouns, but also on preceding determiners (English *a/an*) when it signalled an unexpected word, + e.g., *The day was breezy so the boy went outside to fly...a kite*/\**an airplane* + taken as evidence of pre-activation of phonological form, graded by cloze probability + @nieuwland_large-scale_2018: replicated N400 at noun, but not at adjective + i.e., *failure to replicate* a famous finding ::: ## Reproducibility - reproducibility refers to the ability to *reproduce* somebody's analyses with their + data + *and* code - it is not something we do once, nor is it something that will get us published + but it's important for open science and encourages transparency ### Replication vs. Reproducibility - **replication** of a study + repeating an **experiment** + getting *similar* results - **reproducibility** of analyses + repeating **analyses** of the *same data* + getting the *same* results - e.g., when you submit a paper to a journal, they make ask for your data and code so reviewers can *reproduce* your analyses + requires data and code - if you have interesting findings, other researchers (or future you) may want to *replicate* your study to see if they can *replicate* your findings + (may require) stimuli, set-up and presentation information, participant demographics ## Open Science: Why should I care? 1. Science is cumulative + We should ensure we're building on reliable, robust findings + i.e., it's *good* scientific practice 2. Because the field cares + replication/reproducibility are beginning to be foregrounded by e.g., journals/job advertisements 3. Helps future you + pre-registration, reproducible analyses, clean and shareable data: all help *future you* ### What can I do? - there's a variety of open science practices that we can choose to implement - some suggestions from @kathawalla_easing_2021: :::: {.columns} ::: {.column width="30%"} Level: Easy 1. Journal Club 2. Project workflow 3. Pre-prints ::: ::: {.column width="40%"} Level: Medium 4. Reproducible code 5. Sharing data 6. Transparent manuscripts 7. Pre-registration ::: ::: {.column width="30%"} Level: Difficult 8. Registered reports ::: :::: ### How to do better science - don't be afraid of making mistakes + (most) researchers aren't statisticians or programmers + do the best you can, and ***be transparent*** - doing *some* of the steps is better than doing *none* ### What will we learn here? :::: {.columns} ::: {.column width="50%"} Design and Reporting ::: {style="font-size: 75%;"} - Preregistration/Registered Reports - Transparent writing ::: Analysis ::: {style="font-size: 75%;"} - Reproducible code + with open source software (R, RStudio, packages) + dynamic reports with Quarto/Rmarkdown - Project workflow + folder structure + how to sensibly set up your folders + contained environments + using RProjects and the `here` package ::: ::: ::: {.column width="50%"} ```{r echo = F, fig.env = "figure",out.height="100%", fig.align = "center", set.cap.width=T, fig.cap="Image source: @kathawalla_easing_2021 (all rights reserved)"} knitr::include_graphics(here::here("media/Kathawalla_research_cycle.png")) ``` ::: :::: ## R is for Reproducibility - we will be working with R, RStudio, Quarto, and RProjects + R: a programming language for statistical computing and graphics + RStudio: an integrated development environment (IDE) - RStudio Desktop - RStudio Server + Quarto (similar to Rmarkdown): dynamic reports - combining text, code, and printed tables and figures + RProjects: a workflow tool - contains all files necessary for a project - works with *relative* file paths ::: {.content-visible when-format="revealjs"} ::: {.notes} Students: open R, then RStudio, then create an RProject Me: show them each on my computer - R can run code and save a script - RStudio has so many more options (cheatsheet) - RProjects keep everything tidy and together ::: ::: ## Exercises ### RStudio {.smaller} 1. Open RStudio - locate the Environment, Files, and Console panes - File > New File > R script - write `[your birth-month number]*[the your birth day]` and hit Enter - write `print("Hello World!")` - write `number <- 3*32`; this will create an object/variable 'number' - write `string <- "Hello World!"`; this will create an object/variable 'string' - write `number` - write `string` - add comments describing each step using `#` - File > Save As ## {-} ```{r, eval = T} #| output-location: column-fragment ``` ```{r, eval = T} #| output-location: column-fragment # multiply 5 by 7 5*7 ``` ```{r, eval = T} #| output-location: column-fragment # print some text print("Hello World!") ``` ```{r, eval = T} #| output-location: column-fragment # save an object 'number' with 5*7 number <- 5*7 ``` ```{r, eval = T} #| output-location: column-fragment # save an object 'string' with text string <- "Hello World!" ``` ```{r, eval = T} #| output-location: column-fragment # print number number ``` ```{r, eval = T} #| output-location: column-fragment # print string string ``` ```{r, eval = T} #| output-location: column-fragment # do math with objects number+number ``` ```{r, eval = T} #| output-location: column-fragment number*number ``` ```{r, eval = T} #| output-location: column-fragment number*2 ``` ```{r, eval = T} #| output-location: column-fragment month <- 5 ``` ```{r, eval = T} #| output-location: column-fragment day <- 7 ``` ```{r, eval = T} #| output-location: column-fragment month*day ``` ### Quarto^[https://r4ds.hadley.nz/quarto.html#workflow] {.smaller} - R scripts are a great way to keep track of what you did + however, the output is not saved, and adding comments with `#` gets kind of chunky + enter: dynamic reports! - dynamic reports are those that combine text, code, and output + they are a great tool for communicating, collaborating, and documenting + they are also fantastic for note-taking - Rmarkdown vs. Quarto + both can combine text with code, outputting PDFs, Word Documents, html, or slides + main difference: Quarto has native support of a wider range of programming languages (e.g., Python and Julia) - Want to know more? Check out [Hadley Wickham's intro](https://r4ds.hadley.nz/quarto.html) [@wickham_r_nodate] #### YAML ```{r, eval = F} #| code-line-numbers: false --- title: "My title" author: "My name" format: html --- ``` - YAML is a human-readable programming language used to configure documents - formatting is important: but be sandwiched between `---` and `---` - in Quarto the output type must at least be given (with R: pdf, html, revealjs) #### Headings and text ```{r, eval = F} # This is a heading This is text. ## This is a sub-heading This is more text. ``` - headings are indicated by `#` + the number of `#`'s indicates the heading level #### Code snippets ```{r} #| results: asis # do some math year <- 1989 dog <- "Lola" ``` - sandwiched between ````markdown```{r}```` and ````markdown``` + shortcut: Ctrl/Cmd+Alt+I #### In-line code ```{r, eval = F} I was born on `r month`/`r day`/`r year`. My dog's name is `r dog`. ``` I was born on `r month`/`r day`/`r year`. My dog's name is `r dog`. - code output that was run *above* text can be called in-line using \``r` \` #### Altogether ````markdown --- title: "My title" author: "My name" format: html --- # This is a heading This is text. ## This is a sub-heading This is more text. Add some code chunks. `r ''````{r} # do some math year <- 1989 dog <- "Lola" ``` And use call objects for in-line code: I was born on `r knitr::inline_expr("month")`/`r knitr::inline_expr("day")`/`r knitr::inline_expr("year")`. My dog's name is `r knitr::inline_expr("dog")`. ```` ### Quarto Exercises :::: {.columns} ::: {.column width="50%"} 3. Create a new Quarto document - File > New File > Quarto Document - Read the instructions - Practice running the chunks individually - render the document - verify that you can modify the code, re-run it, and see modified output ::: ::: {.column width="50%"} 4. Create one new Quarto document for each of the three built-in formats: HTML, PDF and Word. - Render each of the three documents - How do the outputs differ? - How do the inputs differ?^[You may need to install LaTeX in order to build the PDF output — RStudio will prompt you if this is necessary.] ::: :::: ### Quarto cont'd {.smaller} - Choose a Quarto document: + give it a title, your name (author), and unclick 'Use visual markdown editor' - Render - YAML: ```{r, eval = FALSE} title: "Eye-tracking during reading" subtitle: "Lecture 2 notes" author: "[YOUR NAME HERE]" lang: en date: `r Sys.Date()` ``` + Render - you can now try writing your class notes in this document (if you're brave) # References {.unnumbered} ::: {#refs custom-style="Bibliography"} :::