1 Reproducible Analyses
1.1 Replication
“There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims”
– Ioannidis (2005)
- replication refers to re-running a previous experiment with as few differences as possible
- aim: determine whether the original results were robust and are replicable
- if yes, great! the original findings are reliable
- if no, hmm, maybe the original findings were false positives? or due to some other factor?
- in recent years, researchers have tried to replicate classic studies in their field
- but in many cases, they did not get the same effects the original study reported (and were famous for)
- this began the replication crisis
1.1.1 An example from language research
- a multi-lab replication of DeLong et al. (2005)’s impactful paper
- DeLong et al. (2005): reported N400 effects elicited at unexpected nouns, but also at preceding determiners (English a/an) when they signalled an unexpected word
- e.g., The day was breezy so the boy went outside to fly…a kite/*an airplane
- taken as evidence of pre-activation of phonological form, graded by cloze probability
- Nieuwland et al. (2018): replicated N400 at noun, but not at determiner
- i.e., failure to replicate a famous finding
1.2 Reproducibility
- reproducibility refers to the ability to reproduce somebody’s analyses with their
- data
- and code
- it is not something we do once, nor is it something that will get us published
- but it’s important for open science and encourages transparency
1.2.1 Replication vs. Reproducibility
- replication of a study
- repeating an experiment
- getting similar results
- reproducibility of analyses
- repeating analyses of the same data
- getting the same results
- e.g., when you submit a paper to a journal, they may ask for your data and code so reviewers can reproduce your analyses
- requires data and code
- if you have interesting findings, other researchers (or future you) may want to replicate your study to see if they can replicate your findings
- (may require) stimuli, set-up and presentation information, participant demographics
1.3 Open Science: Why should I care?
- Science is cumulative
- We should ensure we’re building on reliable, robust findings
- i.e., it’s good scientific practice
- Because the field cares
- replication/reproducibility are beginning to be foregrounded by e.g., journals/job advertisements
- Helps future you
- pre-registration, reproducible analyses, clean and shareable data: all help future you
1.3.1 What can I do?
- there’s a variety of open science practices that we can choose to implement
- some suggestions from Kathawalla et al. (2021):
Level: Easy
- Journal Club
- Project workflow
- Pre-prints
Level: Medium
- Reproducible code
- Sharing data
- Transparent manuscripts
- Pre-registration
Level: Difficult
- Registered reports
1.3.2 How to do better science
- don’t be afraid of making mistakes
- (most) researchers aren’t statisticians or programmers
- do the best you can, and be transparent
- doing some of the steps is better than doing none
1.3.3 What will we learn here?
Design and Reporting
- Preregistration/Registered Reports
- Transparent writing
Analysis
- Reproducible code
- with open source software (R, RStudio, packages)
- dynamic reports with Quarto/Rmarkdown
- Project workflow
- folder structure
- how to sensibly set up your folders
- contained environments
- using RProjects and the `here` package
1.4 R is for Reproducibility
- we will be working with R, RStudio, Quarto, and RProjects
- R: a programming language for statistical computing and graphics
- RStudio: an integrated development environment (IDE)
- RStudio Desktop
- RStudio Server
- Quarto (similar to Rmarkdown): dynamic reports
- combining text, code, and printed tables and figures
- RProjects: a workflow tool
- contains all files necessary for a project
- works with relative file paths
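To illustrate why relative file paths matter, here is a minimal base-R sketch. The folder name `data` and the file name `results.csv` are hypothetical, and a temporary directory stands in for the RProject root. Inside a real RProject, the `here` package's `here("data", "results.csv")` would build the same kind of path starting from the project root.

```r
# A temporary directory stands in for the project root (assumption for this sketch)
project_root <- file.path(tempdir(), "my_project")
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)

# Build the path relative to the project root, rather than hard-coding
# an absolute path that only exists on one computer
rel_path <- file.path("data", "results.csv")
full_path <- file.path(project_root, rel_path)

# Write a small data frame to that path and read it back in
results <- data.frame(month = 5, day = 7)
write.csv(results, full_path, row.names = FALSE)
reread <- read.csv(full_path)

reread$month * reread$day
```

Because only `project_root` depends on the computer, the rest of the script keeps working when the whole project folder is moved or shared.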
1.5 Exercises
1.5.1 RStudio
- Open RStudio
- locate the Environment, Files, and Console panes
- File > New File > R script
- write `[your birth-month number]*[your birth day]` and hit Enter
- write `print("Hello World!")`
- write `number <- 3*32`; this will create an object/variable ‘number’
- write `string <- "Hello World!"`; this will create an object/variable ‘string’
- write `number`
- write `string`
- add comments describing each step using `#`
- File > Save As
# multiply 5 by 7
5*7
[1] 35
# print some text
print("Hello World!")
[1] "Hello World!"
# save an object 'number' with 5*7
number <- 5*7
# save an object 'string' with text
string <- "Hello World!"
# print number
number
[1] 35
# print string
string
[1] "Hello World!"
# do math with objects
number + number
[1] 70
number * number
[1] 1225
number * 2
[1] 70
month <- 5
day <- 7
month * day
[1] 35
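As a small extension of the exercise, the following sketch shows one way to check what kind of object you have created. It uses only the base R functions `class()`, `is.numeric()`, and `ls()`, and the object names are the ones from the exercise.

```r
number <- 5 * 7
string <- "Hello World!"

# class() reports what type of object we have
class(number)
class(string)

# math works on numbers...
number * 2

# ...but not on text: is.numeric() lets us check before trying
is.numeric(number)
is.numeric(string)

# ls() lists the objects in the current environment
# (the same ones shown in RStudio's Environment pane)
ls()
```

Checking the type first is a cheap way to catch mistakes such as trying to multiply a piece of text.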
1.5.2 Quarto
- R scripts are a great way to keep track of what you did
- however, the output is not saved, and adding comments with `#` gets kind of clunky
- enter: dynamic reports!
- dynamic reports are those that combine text, code, and output
- they are a great tool for communicating, collaborating, and documenting
- they are also fantastic for note-taking
- Rmarkdown vs. Quarto
- both can combine text with code, outputting PDFs, Word Documents, html, or slides
- main difference: Quarto has native support of a wider range of programming languages (e.g., Python and Julia)
- Want to know more? Check out Hadley Wickham’s intro (Wickham et al., n.d.)
1.5.2.1 YAML
---
title: "My title"
author: "My name"
format: html
---
- YAML is a human-readable data-serialization language used to configure documents
- formatting is important: the YAML must be sandwiched between `---` and `---`
- in Quarto, the output format is set with `format` (e.g., pdf, html, revealjs); html is the default if none is given
1.5.2.2 Headings and text
# This is a heading
This is text.
## This is a sub-heading
This is more text.
- headings are indicated by `#`
- the number of `#`’s indicates the heading level
1.5.2.3 Code snippets
# create some objects
year <- 1989
dog <- "Lola"
- code chunks are sandwiched between ```{r} and ```
- shortcut: Ctrl/Cmd+Alt+I
1.5.2.4 In-line code
I was born on `r month`/`r day`/`r year`. My dog's name is `r dog`.
I was born on 5/7/1989. My dog’s name is Lola.
- output from code that was run above the text can be called in-line using `r` followed by the object name, all inside backticks (e.g., `r dog`)
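In-line code pastes the value of an object into the surrounding sentence. As a rough base-R analogue (a sketch of the idea, not how Quarto implements it internally), `paste0()` can build the same sentence from the objects defined earlier:

```r
# the objects from the earlier examples
month <- 5
day <- 7
year <- 1989
dog <- "Lola"

# glue the object values into the text, as in-line code does when rendering
sentence <- paste0(
  "I was born on ", month, "/", day, "/", year,
  ". My dog's name is ", dog, "."
)
sentence
```

If any of the objects change, re-running the code updates the sentence, which is what makes in-line code useful for reporting results.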
1.5.2.5 Altogether
---
title: "My title"
author: "My name"
format: html
---
# This is a heading
This is text.
## This is a sub-heading
This is more text.
Add some code chunks.
```{r}
# create some objects
month <- 5
day <- 7
year <- 1989
dog <- "Lola"
```
And call objects with in-line code: I was born on `r month`/`r day`/`r year`. My dog's name is `r dog`.
1.5.3 Quarto Exercises
- Create a new Quarto document
- File > New File > Quarto Document
- Read the instructions
- Practice running the chunks individually
- render the document
- verify that you can modify the code, re-run it, and see modified output
- Create one new Quarto document for each of the three built-in formats: HTML, PDF and Word.
- Render each of the three documents
- How do the outputs differ?
- How do the inputs differ?
1.5.4 Quarto cont’d
- Choose a Quarto document:
- give it a title, your name (author), and unclick ‘Use visual markdown editor’
- Render
- YAML:
title: "Eye-tracking during reading"
subtitle: "Lecture 2 notes"
author: "[YOUR NAME HERE]"
lang: en
date: "`r Sys.Date()`"
- Render
- you can now try writing your class notes in this document (if you’re brave)