RProjects
Creating a project-oriented workflow in R
Learning Objectives
Today we will…
- learn about project-oriented workflows
- create an RProject
- use project-relative filepaths with the `here` package
Installation requirements
- required installations/recent versions of:
  - R
    - version 4.4.0, “Puppy Cup”
    - check current version with `R.version`
    - download/update: https://cran.r-project.org/bin/macosx/
  - RStudio
    - version 2023.12.1.402, “Ocean Storm”
    - Help > Check for updates
    - new install: https://posit.co/download/rstudio-desktop/
Project-oriented workflow
- Folder structure:
- keeping everything related to a project in one place
- i.e., contained in a single folder, with subfolders as needed
- Project-relative working directory
- the project folder should act as your working directory
- all file paths should be relative to this folder
Folder structure
- a core computer literacy skill
- keep your Desktop as empty as possible
- have a sensible folder structure
- avoid mixing subfolders and files
- i.e., if a folder contains subfolders, ideally it should not contain files
RProjects
- in data analysis, using an IDE is beneficial
- e.g., RStudio
- most IDEs have their own implementation of a Project
- in RStudio, this is the RProject
  - creates a `.Rproj` file in a project folder
  - stores project settings
- you can have several RProjects open simultaneously
- and run several scripts across projects simultaneously
- most importantly, RProjects (can) centralise a specific project’s workflow and file paths
- to read more about R Projects, check out Section 6.2: Projects from Wickham et al. (2023; or Ch. 8 - Workflow: Projects in Wickham & Grolemund, 2016)
Creating a new Project
- when?
- whenever you’re starting a new course or project which will use R
- why?
- to keep all the relevant materials in one place
- where?
- somewhere that makes sense, e.g., a folder called `SoSe2024` or `Mastersarbeit`
- how?
- File > New Project > New Directory > New Project > [Directory name] > Create Project
New RProject
Create a new RProject for this workshop
File > New Project > New Directory > New Project > [Directory name] > Create Project
- make sure you choose a sensible location
Opening a Project
- to open a project, locate its `.Rproj` file and double-click
- or if you’re already in RStudio, you can use the `Project (None)` drop-down (top right)
Adding a README file
- `File > New File > Markdown File` (not R Markdown!)
- add some text describing the purpose of this project
- include your name, the date
- use Markdown formatting (e.g., `#` for headings, `*italics*`, `**bold**`)
- save as `README.md` in your project directory
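The same README can also be written from the R Console. The sketch below is only an illustration, not part of the workshop steps: it uses a temporary folder as a stand-in for your project directory, and the title, author name, and description are placeholders.

```r
# Stand-in for the project directory (hypothetical; use your own project folder)
project_dir <- file.path(tempdir(), "my-project")
dir.create(project_dir, showWarnings = FALSE, recursive = TRUE)

# README contents in Markdown; title, name, and description are placeholders
readme_lines <- c(
  "# My project",
  "",
  "**Author:** Your Name",
  paste0("*Date:* ", format(Sys.Date(), "%Y-%m-%d")),
  "",
  "This project contains the materials for the RProjects workshop."
)

# Write the file and confirm that it exists
readme_path <- file.path(project_dir, "README.md")
writeLines(readme_lines, readme_path)
file.exists(readme_path)
```

Inside an RProject you would typically replace `project_dir` with the project folder itself.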
Global RStudio options
Tools > Global Options
- Workspace: Restore .RData into workspace at startup: NO
- Save workspace to .RData on exit: Never
- this will ensure that you are always starting with a clean slate
- and that your code is not dependent on some package or object you created in another session
- this is also how RMarkdown and Quarto scripts run
- they start with an empty environment and run the script linearly
Global settings
Change your Global Options so that
- Workspace: Restore .RData into workspace at startup: NO
- Save workspace to .RData on exit: Never
Identifying your RProject
- there are a few ways to check which (if any) RProject you’re in
- there are 6 differences between the two figures (fig-noproject and fig-project)
- which is in an RProject session?
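One rough check from the Console is to look for a `.Rproj` file in the current working directory. The helper `has_rproj()` below is a hypothetical name introduced for illustration; it only inspects a folder's contents and does not ask RStudio which project is active.

```r
# Hypothetical helper: does a folder contain an .Rproj file?
has_rproj <- function(dir = getwd()) {
  length(list.files(dir, pattern = "\\.Rproj$")) > 0
}

# Where is the session currently working, and is there an .Rproj file there?
getwd()
has_rproj()
```

In an RProject session, the working directory is normally the project folder, so `has_rproj()` should usually return `TRUE` there.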
Folder structure
- some folders you’ll typically want to have:
  - `data`: containing your dataset(s)
  - `scripts` (or `analyses`, etc.): containing any analysis scripts
  - `manuscript`: containing any write-ups of your results
  - `materials`: containing relevant experiment materials (e.g., stimuli)
- let’s just create the first 2 (`data` and `scripts`)
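If you prefer the Console to your file browser, the two folders can be created with base R. This is a minimal sketch that uses a temporary folder as a stand-in for the project directory; `project_dir` is a placeholder name.

```r
# Stand-in for the project directory (hypothetical; in an RProject you could use ".")
project_dir <- file.path(tempdir(), "my-project")

# Create the first two folders; recursive = TRUE also creates project_dir if needed
for (folder in c("data", "scripts")) {
  dir.create(file.path(project_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist in the project
list.dirs(project_dir, full.names = FALSE, recursive = FALSE)
```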
data/
- do you have “raw”, i.e., not yet processed, data?
  - if so, you might want to create a `raw` sub-folder
  - and any other relevant sub-folders (e.g., `processed` or `tidy`)
- download the dataset from the workshop repo (from Chromý et al., 2023)
- or, move a dataset of your own to this folder
scripts/
- try to create a single script for each “product”
- e.g., anonymised data, ‘cleaned’ data, data exploration, visualisation, analyses, etc.
- you can create sub-folders as the project develops and move scripts around
- for now, let’s create a new script to take a look at our data
New script
Create a new Quarto script: `File > New File > Quarto Document`
- Add a title
- Uncheck the `Use Visual Editor` box
- Click `Create`
- Save it in your `scripts/` folder: `File > Save as...`
Load in the data
- load in the data however you normally would
  - e.g., `readr::read_csv()`
`here` package
- the `here` package (Müller, 2020) enables file referencing relative to the project folder
  - avoids the use of `setwd()`
The problem with setwd()
If the first line of your R script is
setwd("C:\Users\jenny\path\that\only\I\have")
I will come into your office and SET YOUR COMPUTER ON FIRE🔥.
- `setwd()` depends on your entire machine’s folder structure
- `setwd()` breaks when you
  - send your project folder to a collaborator
  - make your analyses open
  - change the location of your project folder
- the path separator (`/` or `\`) also depends on your operating system
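The difference between a machine-specific path and a project-relative one can be sketched in base R. The absolute path below is made up for illustration, and `file.path()` is used here only to show the idea of building a path from its components; it is not the approach this workshop recommends for loading data.

```r
# A hard-coded absolute path (hypothetical) only exists on one machine
abs_path <- "C:/Users/jenny/path/that/only/I/have/data_lifetime_pilot.csv"
file.exists(abs_path)  # FALSE wherever this exact folder structure is missing

# A path built from its components, relative to the project folder
rel_path <- file.path("data", "data_lifetime_pilot.csv")
rel_path

# The absolute location depends on where the project folder lives,
# which is why that part should not be written into the script
file.path(getwd(), rel_path)
```

Note that backslashes in an R string need to be escaped (`"C:\\Users\\..."`) or replaced with forward slashes, which is one more way hand-typed paths go wrong.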
The benefit of here()
- builds file paths relative to the top-level directory of your project
- folder and file names are passed as comma-separated arguments, so you don’t need to type slashes
`here`
Load the dataset using `here`
- Install `here` (e.g., `install.packages("here")`)
- Load `here` at the beginning of your script
  - or use `here::` before calling a function
- Use the `here()` function to load in your data
- Inspect the dataset however you usually would (e.g., `summary()`, `names()`, etc.)
- Save your script
here::here()
- install package
In the Console
install.packages("here")
- load package and call the `here` function
# load package
library(here)
# read in data
df_data <- read.csv(here("data", "data_lifetime_pilot.csv"))
- or directly call the `here` function without loading the package
# read in data without loading here
df_data <- read.csv(here::here("data", "data_lifetime_pilot.csv"))
- note that I stored the data with the prefix `df_`
  - `df` stands for data frame
- I recommend using object-type defining prefixes for all objects in your Environment
  - e.g., `fit_` for models, `fig_` for figures, `sum_` for summaries, `tbl_` for tables, etc.
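As a small illustration of this naming scheme, the sketch below uses the built-in `mtcars` dataset as a stand-in for your own data. The object names are made up for the example.

```r
# Built-in dataset used as a stand-in for your own data
df_cars <- mtcars

# Model, summary, and table objects, each with a type-defining prefix
fit_mpg <- lm(mpg ~ wt, data = df_cars)
sum_mpg <- summary(df_cars$mpg)
tbl_cyl <- table(df_cars$cyl)

# Prefixes make it easy to list all objects of one type
ls(pattern = "^df_")
```

A side benefit is that typing `fit_` or `df_` in RStudio brings up all objects of that type in the autocomplete list.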
Reproduce your analysis
- Perform some data exploration (e.g., with `names()`, `summary()`, `dplyr::glimpse()`, whatever you typically do)
- Save your script, then close RStudio/your RProject.
- Re-open the project. Can you re-run the script?
Learning objectives 🏁
Today we learned…
- learn about project-oriented workflows ✅
- create an RProject ✅
- establish a self-contained project environment with `here` ✅
References
Bryan, J., & The STAT 545 TAs. (n.d.). R Basics and workflows. In STAT 545 Course materials. Retrieved May 6, 2024, from https://stat545.com/
Chromý, J., Brand, J., Laurinavichyute, A., & Lacina, R. (2023). Number agreement attraction in Czech and English comprehension: A direct experimental comparison. Glossa Psycholinguistics, 2(1), 1–20. https://doi.org/10.5070/G6011235
Müller, K. (2020). Here: A Simpler Way to Find Your Files (Version 1.0.1). https://CRAN.R-project.org/package=here
Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science (2nd ed.). https://r4ds.hadley.nz/
Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media, Inc.