Workshop overview

This website contains slides for a two-part workshop given by Daniela Palleschi at the Leibniz-Zentrum Allgemeine Sprachwissenschaft (ZAS) Berlin. The tools discussed are specific to the R environment, but the concepts are universal and programming-language agnostic. The materials are restructured from various other renditions of the workshop, all of which were based on a semester-length course on the same topic given at the Humboldt-Universität zu Berlin in the summer semester 2024. The materials for the semester-length course are more exhaustive, and can be viewed here.

Schedule

Table 1 shows the tentative plan for the workshop and may be adjusted based on the needs of the participants.

Table 1: Tentative schedule for the 2-day workshop
Day 1: Tues. Oct. 8, 1-4pm
(i) Open Science Practices and reproducibility
(ii) Data management: folder/file organisation, data handling
(iii) Working with RProjects + project-relative filepaths with {here}
Day 2: Thurs. Oct. 17, 1-4pm
(iv) Modular analyses and literate programming with Quarto
(v) Package management with {renv}
(vi) Code review via online repositories

How to navigate this website

Each topic is listed in the sidebar in chronological order. Three output formats are available, all with the same content:

  1. HTML page (landing page)
  2. PDF of content (sub-optimally formatted)
  3. Slides in RevealJS format

The contents were formatted for the slide output, so tables and figures may be too large or too small in the HTML and PDF formats (especially the latter). Each page of the website presents the HTML format. The other two formats can be viewed by clicking on their symbol under ‘Other Formats’ (right sidebar).

Preparation

It is assumed you have at least some basic familiarity with R and RStudio. At the very least, please make sure you have the required software before Day 1.

Software: R and RStudio

Please make sure you have recent versions of R and RStudio installed prior to the workshop. Below you will find information on how to check which version of R and RStudio you currently have, and how to install or update them as needed.

Check software versions

R

To check which version of R you currently have, run the command R.version$version.string in the Console (to print just the version name and release date), or R.version$nickname (to print the nickname).

In the Console: print R version and release date
R.version$version.string
[1] "R version 4.4.1 (2024-06-14)"
In the Console: print R version nickname
R.version$nickname
[1] "Race for Your Life"
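
If you would rather compare your installed version against a minimum than read the version string by eye, something like the following sketch may help. The helper name meets_minimum and the minimum version "4.2.0" are assumptions made for illustration only; they are not requirements stated for this workshop.

```r
# Hypothetical minimum version, chosen only for illustration
minimum_version <- "4.2.0"

# Hypothetical helper: TRUE if the running R version is at least `minimum`
meets_minimum <- function(minimum) {
  # getRversion() returns the running R version as a comparable object
  getRversion() >= minimum
}

if (meets_minimum(minimum_version)) {
  message("R ", as.character(getRversion()), " meets the minimum of ", minimum_version)
} else {
  message("R ", as.character(getRversion()), " is older than ", minimum_version,
          "; consider updating")
}
```

Comparing with getRversion() avoids the pitfalls of comparing version strings as plain text (where, for example, "4.10" would sort before "4.2").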

RStudio

To check which version of RStudio you currently have, run the command RStudio.Version()$version in the Console (to print just the version number), or RStudio.Version()$release_name (to print the release name). Be sure to run these only in the Console, as R Markdown/Quarto scripts will not be able to run these commands.

In the Console: print RStudio version number
RStudio.Version()$version
In the Console: print RStudio version nickname
RStudio.Version()$release_name
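
Because RStudio.Version() is only defined inside a running RStudio session, calling it elsewhere (for example when rendering a document) stops with an error. A guarded call along the following lines is one way around that. This is a minimal sketch; the helper name get_rstudio_version is hypothetical.

```r
# Hypothetical helper: returns the RStudio version as a string,
# or NA when the code is not run inside an RStudio session
get_rstudio_version <- function() {
  # RStudio.Version() is only available in an RStudio session
  if (exists("RStudio.Version", mode = "function")) {
    as.character(RStudio.Version()$version)
  } else {
    NA_character_
  }
}

get_rstudio_version()
```

Outside RStudio the function returns NA instead of failing, so a script that records software versions can still run to completion.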

Alternatively, you can go to Help > About RStudio in RStudio. You should see a pop-up like Figure 1.

Figure 1: Help > About RStudio

Install/update software

  1. Install or update R
    • N.B., I am currently using version 4.4.1 (Race for Your Life, 2024-06-14)
    • having an R version from 2022.07 or later should suffice

Disclaimer: Updating R

Beware that updating R can interfere with ongoing R projects, most notably because you will need to re-install packages (and thus may end up installing more recent package versions, which may break existing code). If you are currently in the middle of analysing some data, you may not want to update R right now. In this case, just make note of which version you’re currently running (e.g., by running R.version in the Console).
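
If you do decide to update, it may be worth recording your current setup first, so you know which package versions your existing code last ran with. The following is only a sketch using base R: the function name snapshot_packages and the file name r-package-snapshot.csv are hypothetical, and the example writes to a temporary folder that you would replace with a path inside your own project. Package management with {renv}, covered on Day 2, is a more complete approach.

```r
# Hypothetical helper: save installed package names and versions,
# together with the current R version, to a CSV file
snapshot_packages <- function(path) {
  # installed.packages() returns a character matrix with one row per package
  pkgs <- installed.packages()[, c("Package", "Version"), drop = FALSE]
  pkgs <- as.data.frame(pkgs, stringsAsFactors = FALSE)
  rownames(pkgs) <- NULL
  # Store the R version alongside every package entry
  pkgs$R <- R.version$version.string
  write.csv(pkgs, file = path, row.names = FALSE)
  invisible(pkgs)
}

# Assumption: a temporary location is used here; swap in a project path
snapshot_file <- file.path(tempdir(), "r-package-snapshot.csv")
snapshot <- snapshot_packages(snapshot_file)
head(snapshot)
```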

  2. Install or update RStudio
    • I am currently using RStudio version 2023.12.1+402, as I encountered issues when updating to 2024.04.2+764 in April when it was released. As a rule of thumb, I update R and/or RStudio a few months after their initial release, and only when I know I have time to fix any bugs that might pop up (i.e., I don’t have a looming deadline).

Suggested readings (before, during, or after the workshop)

There is currently a wealth of literature on the topic of reproducibility, both in terms of meta-science reviews of rates of reproducibility and in terms of best-practice advice. Some reading I would suggest for a soft introduction into the latter would be:

For a book-length treatment of R-specific reproducible workflows:

And for an overview of R-specific data analysis suggestions, I recommend the following on-line resources:

  • Bryan, J., Hester, J., Pileggi, S., & Aja, D. E. (n.d.). What They Forgot to Teach You About R. Retrieved May 6, 2024, from https://rstats.wtf/
  • Bryan, J., & The STAT 545 TAs. (n.d.). R Basics and workflows. In STAT 545 Course materials. Retrieved May 6, 2024, from https://stat545.com/

For more general discussions on Open Science Practices:

  • Kathawalla, U.-K., Silverstein, P., & Syed, M. (2021). Easing Into Open Science: A Guide for Graduate Students and Their Advisors. Collabra: Psychology, 7(1), 18684. https://doi.org/10.1525/collabra.18684
  • Crüwell, S., Van Doorn, J., Etz, A., Makel, M. C., Moshontz, H., Niebaum, J. C., Orben, A., Parsons, S., & Schulte-Mecklenbeck, M. (2019). Seven Easy Steps to Open Science: An Annotated Reading List. Zeitschrift Für Psychologie, 227(4), 237–248. https://doi.org/10.1027/2151-2604/a000387