3 Working with eye-tracking reading data in R

Loading and eye-balling a dataset

Author: Daniela Palleschi

Affiliation: Humboldt-Universität zu Berlin

Set-up

library(here) # relative path
library(tidyverse) # tidy/transform
library(beepr) # beeps when code runs or fails
library(rbbt) # zotero plugin

beep code

## play sound if error encountered
### from: https://sejohnston.com/2015/02/24/make-r-beep-when-r-markdown-finishes-or-when-it-fails/
options(error = function(){    # Beep on error
  beepr::beep(sound = "wilhelm")
  Sys.sleep(2) # allow to play for 2 seconds
  }
 )
## and when knitting is complete
.Last <- function() {          # Beep on exiting session
  beepr::beep(sound = "ping")
  Sys.sleep(6) # allow to play for 6 seconds
  }

rbbt code

# Create references.json file based on the citations in this script:
# 1. make sure you have 'bibliography: references/references.json' in the YAML
# 2. create a new folder called 'references'
# 3. run:
rbbt::bbt_update_bib("_et_dataset.qmd")

4 The Perfect Lifetime Effect

the English Present Perfect (e.g., has done) (e.g., comrie_aspect_1976?)
- must be used in temporal contexts that include the present
  - I have been sick since last week
  - *I have been sick last year
The Lifetime Effect
- a referent’s lifetime (dead/alive) constrains verb tense in certain circumstances (e.g., mittwoch_tenses_2008?)
  - *Queen Elizabeth II is the British monarch.
  - *King Charles III was the British monarch.
the Perfect Lifetime Effect
- the (English) Present Perfect cannot be used to describe events of a dead person (e.g., mittwoch_english_2008?)
  - *Queen Elizabeth II has met many politicians.
  - King Charles III has met many politicians.

4.1 Our first dataset

referent-lifetime context
- dead/alive
critical sentence
- Present Perfect/Simple Future
binary naturalness judgement to end trial
- accept/reject

4.2 Design description

2x2 mixed design
- two 2-level factors (2x2 = 2-level x 2-level)
  - factor 1: lifetime (levels: dead, alive)
  - factor 2: tense (levels: PP, SF)

      alive                       dead
PP    Eddie Redmayne…has won      Gene Kelly…*has won
SF    Eddie Redmayne…will win     Gene Kelly…*will win
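
The four cells of this design can be listed by crossing the two factors. A minimal sketch in base R; the object name conditions is only illustrative and not part of the course materials:

```r
# cross the two 2-level factors to list every cell of the 2x2 design
conditions <- expand.grid(
  lifetime = c("alive", "dead"),
  tense = c("PP", "SF")
)

conditions        # one row per condition
nrow(conditions)  # 2 levels x 2 levels = 4 cells
```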

predictors/independent variables
- lifetime
- tense

measure/dependent variables (verb region)
- first-fixation time (milliseconds)
- first-pass reading time (ms)
- regression path duration (ms)
- total reading time (ms)

4.2.1 Repeated measures design

observations are repeated, e.g., there are multiple data points per participant, and multiple data points per item across participants
- essentially, data are not independent
- e.g., each participant will have their own reading speed, some items might be systematically less acceptable for some unforeseen reason, etc.
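
To make the repeated structure concrete, here is a small sketch with a made-up mini-experiment (3 participants who each read the same 4 items). The object toy and its values are hypothetical and only serve as illustration:

```r
# hypothetical mini-experiment: 3 participants each read the same 4 items
toy <- expand.grid(px = c("px1", "px2", "px3"), item_id = 1:4)

# number of observations contributed by each participant
table(toy$px)

# number of observations collected for each item (across participants)
table(toy$item_id)
```

Every participant and every item appears more than once, which is why the rows cannot be treated as independent observations.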

5 Working with the data

Day 1

load the data
inspect data
- eyeball data structure
- print summaries
- plot data distributions

Day 2

tidy data
visualise data
communicate data

Day 3

analyse data
- confirmatory (a priori)
- exploratory (post-hoc)
report analyses

5.1 Install packages

install.packages("tidyverse")
install.packages("here")

install
- only do once
- …or when you are working on a new computer
- …or after updating R
might be a wise idea to create a script just for installing packages
- can save time/energy when updating R
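
One possible shape for such an installation script is sketched below. The helper missing_packages() is a made-up name (not from any package), and the interactive() guard is an assumption about how you might want the script to behave:

```r
# return the packages from `pkgs` that are not yet installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tidyverse", "here"))

# only install when run interactively, so that rendering a document
# never triggers an installation
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

After updating R, running this script once reinstalls only what is missing.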

5.2 Load packages

library(tidyverse)
library(here)

load packages
- needed at the start of each session

5.3 Load dataset

df_lifetime <- readr::read_csv(here::here("data/data_lifetime_pilot.csv"))

N.B., readr::read_csv can be read as “read_csv() function in the readr package”
- i.e., package::function()
- you only need this syntax if you haven’t loaded the package yet (maybe because you only need it once), or if the same function name exists in more than one loaded package (i.e., it would be ambiguous which read_csv is meant)
- why did I use it here?
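
A small sketch of the package::function() syntax using only packages that ship with R. The function tools::file_ext() is not used elsewhere in these materials; it was picked here because the tools package is installed with R but not attached at start-up:

```r
# tools is not attached by default,
# so its functions need the package::function() syntax
ext <- tools::file_ext("data/data_lifetime_pilot.csv")
ext

# stats is attached by default, so both spellings reach the same function
identical(sd, stats::sd)
```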

5.4 `here` package

Using the here package, we can access files relative to where our .Rproj file is stored.

In ‘olden times’, we had to specify the file path with something like:

# load in data from an *absolute* file path
df_lifetime <- read_csv("/Users/yournamehere/Documents/SoSe2023/ET_reading/data/data_lifetime_pilot.csv")

Or, we’d set an absolute path as our working directory, to which all other file paths were relative

# set *absolute* path as working directory
setwd("/Users/yournamehere/Documents/SoSe2023/ET_reading")

# load in data *relative* to our wd
df_lifetime <- read_csv("data/data_lifetime_pilot.csv")

This meant that if I sent my project folder to somebody else, they wouldn’t be able to run my code because they would have to change the absolute file path to match their machine.
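
With here, only the part of the path inside the project is written in the code, and the machine-specific part is worked out when the code runs. A minimal sketch; the requireNamespace() guard is only there so the example also runs where here is not installed:

```r
# the part of the path that is the same on every machine
rel_path <- file.path("data", "data_lifetime_pilot.csv")
rel_path

# here::here() prepends the project root it detects on the current machine
if (requireNamespace("here", quietly = TRUE)) {
  full_path <- here::here("data", "data_lifetime_pilot.csv")
  print(full_path)
}
```

The printed full path will differ between computers, but the code itself does not need to change.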

5.5 Inspect dataset

there are several different things you can inspect
- and different ways to accomplish those things
the first thing I usually do is look at the column/variable names

5.5.1 `names()`

the names in all caps are variables created during the experiment
- i.e., they are our recorded data, mainly what we wanted to measure: dependent variables (DV)
- also includes some information about the experiment set-up per participant
the other names are variables from my stimuli lists
- i.e., they mostly contain our independent variables (IV)/stimuli
we typically want to see what effect our IVs had on any given DVs
variable descriptions can be found on the Moodle: Data > Documentation

names(df_lifetime)

 [1] "RECORDING_SESSION_LABEL"     "TRIAL_INDEX"                
 [3] "EYE_USED"                    "IA_DWELL_TIME"              
 [5] "IA_FIRST_FIXATION_DURATION"  "IA_FIRST_RUN_DWELL_TIME"    
 [7] "IA_FIXATION_COUNT"           "IA_ID"                      
 [9] "IA_LABEL"                    "IA_REGRESSION_IN"           
[11] "IA_REGRESSION_IN_COUNT"      "IA_REGRESSION_OUT"          
[13] "IA_REGRESSION_OUT_COUNT"     "IA_REGRESSION_PATH_DURATION"
[15] "KeyPress"                    "rt"                         
[17] "bio"                         "critical"                   
[19] "gender"                      "item_id"                    
[21] "list"                        "match"                      
[23] "condition"                   "name"                       
[25] "name_vital_status"           "tense"                      
[27] "type"                        "yes_press"
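
The all-caps names can also be picked out programmatically. A rough sketch using a handful of the names printed above; note that mixed-case names such as KeyPress do not match this pattern and would need to be checked against the documentation:

```r
# a few of the column names from the output above
nms <- c("TRIAL_INDEX", "IA_DWELL_TIME", "IA_LABEL", "item_id", "tense", "name")

# names made up only of capital letters and underscores
is_caps <- grepl("^[A-Z_]+$", nms)

nms[is_caps]   # all-caps names
nms[!is_caps]  # the remaining names
```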

5.5.2 `rename()`

some of the recorded variable names are pretty clunky, let’s rename a few:
- RECORDING_SESSION_LABEL corresponds to a single participant
- TRIAL_INDEX logged the trial number
- EYE_USED logged which eye was tracked

df_lifetime <- df_lifetime %>%
  rename("px" = RECORDING_SESSION_LABEL,
         "trial" = TRIAL_INDEX,
         "eye" = EYE_USED)

5.5.2.1 Naming variables

Naming conventions

It’s wise to keep variable and object names concise but informative

all lowercase means fewer key strokes overall
separate words with either periods or underscores, e.g., trial.index or trial_index
e.g., we called our dataset df_lifetime because it is a dataframe (df) with data from our lifetime experiment
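
As a small illustration of the lowercase convention, base R's tolower() converts names in one step. This is only a sketch on two of the column names from our dataset, not a step the course code performs:

```r
# two of the clunkier column names in the dataset
old_names <- c("IA_DWELL_TIME", "IA_FIRST_FIXATION_DURATION")

# all lowercase, words still separated by underscores
new_names <- tolower(old_names)
new_names
```

Lowercasing alone does not make a name concise, so shortening by hand (as with px, trial and eye) is still worthwhile.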

5.5.3 Data structure

datasets typically contain a lot of rows and columns
- so we want to get a feel for how the data is structured

head(df_lifetime)

# A tibble: 6 × 28
  px    trial eye   IA_DWELL_TIME IA_FIRST_FIXATION_DUR…¹ IA_FIRST_RUN_DWELL_T…²
  <chr> <dbl> <chr>         <dbl>                   <dbl>                  <dbl>
1 px3       1 RIGHT             0                       0                      0
2 px3       2 RIGHT             0                       0                      0
3 px3       3 RIGHT             0                       0                      0
4 px3       3 RIGHT             0                       0                      0
5 px3       3 RIGHT             0                       0                      0
6 px3       3 RIGHT             0                       0                      0
# ℹ abbreviated names: ¹IA_FIRST_FIXATION_DURATION, ²IA_FIRST_RUN_DWELL_TIME
# ℹ 22 more variables: IA_FIXATION_COUNT <dbl>, IA_ID <dbl>, IA_LABEL <chr>,
#   IA_REGRESSION_IN <dbl>, IA_REGRESSION_IN_COUNT <dbl>,
#   IA_REGRESSION_OUT <dbl>, IA_REGRESSION_OUT_COUNT <dbl>,
#   IA_REGRESSION_PATH_DURATION <dbl>, KeyPress <dbl>, rt <dbl>, bio <chr>,
#   critical <chr>, gender <chr>, item_id <dbl>, list <dbl>, match <chr>,
#   condition <chr>, name <chr>, name_vital_status <chr>, tense <chr>, …

the same call can also be written with a pipe; both versions print the same output as above

# magrittr pipe (loaded with the tidyverse)
df_lifetime %>%
  head()

# native pipe (available from R 4.1.0)
df_lifetime |> 
  head()

5.5.4 `head()` function

prints the first 6 rows of your data
- you can also specify the number of rows

df_lifetime %>%
  head(n = 2)

# A tibble: 2 × 28
  px    trial eye   IA_DWELL_TIME IA_FIRST_FIXATION_DUR…¹ IA_FIRST_RUN_DWELL_T…²
  <chr> <dbl> <chr>         <dbl>                   <dbl>                  <dbl>
1 px3       1 RIGHT             0                       0                      0
2 px3       2 RIGHT             0                       0                      0
# ℹ abbreviated names: ¹IA_FIRST_FIXATION_DURATION, ²IA_FIRST_RUN_DWELL_TIME
# ℹ 22 more variables: IA_FIXATION_COUNT <dbl>, IA_ID <dbl>, IA_LABEL <chr>,
#   IA_REGRESSION_IN <dbl>, IA_REGRESSION_IN_COUNT <dbl>,
#   IA_REGRESSION_OUT <dbl>, IA_REGRESSION_OUT_COUNT <dbl>,
#   IA_REGRESSION_PATH_DURATION <dbl>, KeyPress <dbl>, rt <dbl>, bio <chr>,
#   critical <chr>, gender <chr>, item_id <dbl>, list <dbl>, match <chr>,
#   condition <chr>, name <chr>, name_vital_status <chr>, tense <chr>, …

`head()` function task

Exercise: head()

print only 2 rows using whichever syntax you prefer
change n = 2 to some other number and print
run ?head in the Console
- can you find the opposite function (i.e., the one that prints the last rows) in the function description?
run this function with df_lifetime as its argument; how many rows does it print by default?
play with n = in this function to print some other number of rows

5.5.5 `tail()` function

prints the last rows of a dataframe (or matrix, vector, table, or function)

df_lifetime %>%
  tail()

# A tibble: 6 × 28
  px    trial eye   IA_DWELL_TIME IA_FIRST_FIXATION_DUR…¹ IA_FIRST_RUN_DWELL_T…²
  <chr> <dbl> <chr>         <dbl>                   <dbl>                  <dbl>
1 px4     207 LEFT            509                     218                    509
2 px4     208 LEFT              0                       0                      0
3 px4     208 LEFT            317                     167                    317
4 px4     208 LEFT            162                     162                    162
5 px4     208 LEFT            139                     139                    139
6 px4     208 LEFT            280                     280                    280
# ℹ abbreviated names: ¹IA_FIRST_FIXATION_DURATION, ²IA_FIRST_RUN_DWELL_TIME
# ℹ 22 more variables: IA_FIXATION_COUNT <dbl>, IA_ID <dbl>, IA_LABEL <chr>,
#   IA_REGRESSION_IN <dbl>, IA_REGRESSION_IN_COUNT <dbl>,
#   IA_REGRESSION_OUT <dbl>, IA_REGRESSION_OUT_COUNT <dbl>,
#   IA_REGRESSION_PATH_DURATION <dbl>, KeyPress <dbl>, rt <dbl>, bio <chr>,
#   critical <chr>, gender <chr>, item_id <dbl>, list <dbl>, match <chr>,
#   condition <chr>, name <chr>, name_vital_status <chr>, tense <chr>, …

5.5.6 `names()`

prints the column/variable names

df_lifetime %>%
  names()

 [1] "px"                          "trial"                      
 [3] "eye"                         "IA_DWELL_TIME"              
 [5] "IA_FIRST_FIXATION_DURATION"  "IA_FIRST_RUN_DWELL_TIME"    
 [7] "IA_FIXATION_COUNT"           "IA_ID"                      
 [9] "IA_LABEL"                    "IA_REGRESSION_IN"           
[11] "IA_REGRESSION_IN_COUNT"      "IA_REGRESSION_OUT"          
[13] "IA_REGRESSION_OUT_COUNT"     "IA_REGRESSION_PATH_DURATION"
[15] "KeyPress"                    "rt"                         
[17] "bio"                         "critical"                   
[19] "gender"                      "item_id"                    
[21] "list"                        "match"                      
[23] "condition"                   "name"                       
[25] "name_vital_status"           "tense"                      
[27] "type"                        "yes_press"

5.5.7 `summary()`

prints a summary of each variable (column)

df_lifetime %>%
  summary()

      px                trial           eye            IA_DWELL_TIME   
 Length:4431        Min.   :  1.0   Length:4431        Min.   :   0.0  
 Class :character   1st Qu.: 52.5   Class :character   1st Qu.:   0.0  
 Mode  :character   Median :104.0   Mode  :character   Median : 301.0  
                    Mean   :105.0                      Mean   : 587.5  
                    3rd Qu.:157.0                      3rd Qu.: 765.5  
                    Max.   :208.0                      Max.   :8968.0  
 IA_FIRST_FIXATION_DURATION IA_FIRST_RUN_DWELL_TIME IA_FIXATION_COUNT
 Min.   :  0.0              Min.   :   0.0          Min.   : 0.000   
 1st Qu.:  0.0              1st Qu.:   0.0          1st Qu.: 0.000   
 Median :161.0              Median : 245.0          Median : 2.000   
 Mean   :139.4              Mean   : 507.9          Mean   : 2.714   
 3rd Qu.:202.5              3rd Qu.: 586.0          3rd Qu.: 4.000   
 Max.   :775.0              Max.   :8968.0          Max.   :35.000   
     IA_ID         IA_LABEL         IA_REGRESSION_IN  IA_REGRESSION_IN_COUNT
 Min.   :1.000   Length:4431        Min.   :0.00000   Min.   :0.0000        
 1st Qu.:1.000   Class :character   1st Qu.:0.00000   1st Qu.:0.0000        
 Median :2.000   Mode  :character   Median :0.00000   Median :0.0000        
 Mean   :2.681                      Mean   :0.09817   Mean   :0.1318        
 3rd Qu.:4.000                      3rd Qu.:0.00000   3rd Qu.:0.0000        
 Max.   :6.000                      Max.   :1.00000   Max.   :5.0000        
 IA_REGRESSION_OUT IA_REGRESSION_OUT_COUNT IA_REGRESSION_PATH_DURATION
 Min.   :0.00000   Min.   :0.00000         Min.   :    0.0            
 1st Qu.:0.00000   1st Qu.:0.00000         1st Qu.:    0.0            
 Median :0.00000   Median :0.00000         Median :  282.0            
 Mean   :0.08147   Mean   :0.09185         Mean   :  595.6            
 3rd Qu.:0.00000   3rd Qu.:0.00000         3rd Qu.:  747.0            
 Max.   :1.00000   Max.   :7.00000         Max.   :10242.0            
    KeyPress           rt            bio              critical        
 Min.   :4.000   Min.   :  533   Length:4431        Length:4431       
 1st Qu.:4.000   1st Qu.: 1332   Class :character   Class :character  
 Median :4.000   Median : 1890   Mode  :character   Mode  :character  
 Mean   :4.496   Mean   : 2467                                        
 3rd Qu.:5.000   3rd Qu.: 2910                                        
 Max.   :5.000   Max.   :15654                                        
    gender             item_id            list          match          
 Length:4431        Min.   :  1.00   Min.   :14.00   Length:4431       
 Class :character   1st Qu.: 26.00   1st Qu.:15.00   Class :character  
 Mode  :character   Median : 51.00   Median :25.00   Mode  :character  
                    Mean   : 64.16   Mean   :29.45                     
                    3rd Qu.: 78.50   3rd Qu.:35.00                     
                    Max.   :208.00   Max.   :45.00                     
  condition             name           name_vital_status     tense          
 Length:4431        Length:4431        Length:4431        Length:4431       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
     type             yes_press    
 Length:4431        Min.   :4.000  
 Class :character   1st Qu.:4.000  
 Mode  :character   Median :4.000  
                    Mean   :4.499  
                    3rd Qu.:5.000  
                    Max.   :5.000

5.5.8 Exercise

Take some time to explore the dataset.

double click on the dataset name in the Environment pane to view it like a spreadsheet
look at the names, can you figure out what they represent?

5.6 class types

there are different classes of data that R can read
- the function class() takes as its argument an object or number

df_lifetime$rt %>%
  class()

[1] "numeric"
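
A few more calls to class() on simple objects, as a quick sketch of the classes discussed in the following sections (the values are arbitrary examples):

```r
class(1.5)           # "numeric"
class("verb")        # "character"
class(TRUE)          # "logical"
class(factor("PP"))  # "factor"
```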

5.7 Selecting a column

# with column index
df_lifetime[2] %>% summary()

     trial      
 Min.   :  1.0  
 1st Qu.: 52.5  
 Median :104.0  
 Mean   :105.0  
 3rd Qu.:157.0  
 Max.   :208.0

# with column name
df_lifetime[,"trial"] %>% summary()

     trial      
 Min.   :  1.0  
 1st Qu.: 52.5  
 Median :104.0  
 Mean   :105.0  
 3rd Qu.:157.0  
 Max.   :208.0

# with data$column_name
df_lifetime$trial %>% summary()

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0    52.5   104.0   105.0   157.0   208.0

# with the tidyverse: select()
df_lifetime %>% 
  select(trial) %>%
  summary()

     trial      
 Min.   :  1.0  
 1st Qu.: 52.5  
 Median :104.0  
 Mean   :105.0  
 3rd Qu.:157.0  
 Max.   :208.0
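
The summary printed for df_lifetime$trial looks different from the others because $ returns a plain vector, while the other approaches keep the table structure. A sketch with a made-up base R data frame called toy; indexing with [ , "trial"] is left out on purpose, because a base data frame and a tibble behave differently there:

```r
# hypothetical two-row data frame
toy <- data.frame(px = c("px1", "px2"), trial = c(1, 2))

# single square brackets keep the data frame structure
class(toy[2])          # "data.frame"
class(toy["trial"])    # "data.frame"

# $ and double square brackets extract the column as a plain vector
class(toy$trial)       # "numeric"
class(toy[["trial"]])  # "numeric"
```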

5.7.1 `character` class

contains strings: collections of characters (i.e., text)
there’s no grouping in character variables
- each value is considered ‘unique’ and assumed to not be repeated
we usually aren’t interested in character class variables
- unless e.g., we have unique values per row (e.g., if a participant gave a free-text answer)
- or perhaps we have stored some stimuli sentences
  - although this would arguably be better as a ‘category’, since there should be multiple trials across participants that contain the same sentences

5.7.2 `numeric` class

variables with numeric values, usually some variable we’d want to compute summaries on, e.g., means
sometimes we don’t want numbers to be stored as numeric class, however
- this is the case for our variables yes_press and KeyPress (with 4 or 5)
the same is true for our variable item_id, which ranges from 1 to 208 in this dataset (see the summary() output)
- the numbers are just unique codes for our stimuli, the difference between item 1 and item 2 has nothing to do with the difference between the numbers 1 and 2
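
A short sketch of why a numeric summary is unhelpful for such codes. The vector key is made up, loosely modelled on the 4/5 key codes in yes_press:

```r
# hypothetical key codes (4 or 5)
key <- c(4, 5, 4, 4, 5, 4)

mean(key)   # a number, but not a meaningful one for a response code
table(key)  # counts per code are what we actually care about
```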

5.7.3 `factor` class

we typically want grouping variables to be factor class
- factors contain categorical data
- any number that could be replaced with some other label should be a factor
region of interest (ROI) = 1:7
- but we want to know how many observations per region, the number is not informative
- ROI could alternatively be coded as, e.g., “adverb”, “pronoun”, “verb”, “spillover”

`factor` class

let’s change df_lifetime$yes_press to factor
- using the mutate() verb from dplyr
- and as_factor() from forcats

# change yes_press to factor
df_lifetime %>%
  mutate(yes_press = as_factor(yes_press))

# A tibble: 4,431 × 28
   px    trial eye   IA_DWELL_TIME IA_FIRST_FIXATION_DU…¹ IA_FIRST_RUN_DWELL_T…²
   <chr> <dbl> <chr>         <dbl>                  <dbl>                  <dbl>
 1 px3       1 RIGHT             0                      0                      0
 2 px3       2 RIGHT             0                      0                      0
 3 px3       3 RIGHT             0                      0                      0
 4 px3       3 RIGHT             0                      0                      0
 5 px3       3 RIGHT             0                      0                      0
 6 px3       3 RIGHT             0                      0                      0
 7 px3       3 RIGHT             0                      0                      0
 8 px3       3 RIGHT             0                      0                      0
 9 px3       4 RIGHT             0                      0                      0
10 px3       5 RIGHT             0                      0                      0
# ℹ 4,421 more rows
# ℹ abbreviated names: ¹IA_FIRST_FIXATION_DURATION, ²IA_FIRST_RUN_DWELL_TIME
# ℹ 22 more variables: IA_FIXATION_COUNT <dbl>, IA_ID <dbl>, IA_LABEL <chr>,
#   IA_REGRESSION_IN <dbl>, IA_REGRESSION_IN_COUNT <dbl>,
#   IA_REGRESSION_OUT <dbl>, IA_REGRESSION_OUT_COUNT <dbl>,
#   IA_REGRESSION_PATH_DURATION <dbl>, KeyPress <dbl>, rt <dbl>, bio <chr>,
#   critical <chr>, gender <chr>, item_id <dbl>, list <dbl>, match <chr>, …

5.7.4 multiple arguments in a verb

we can also change multiple columns at once:

# change KeyPress & item_id to factor
df_lifetime %>%
  mutate(KeyPress = as_factor(KeyPress),
         item_id = as_factor(item_id))

# A tibble: 4,431 × 28
   px    trial eye   IA_DWELL_TIME IA_FIRST_FIXATION_DU…¹ IA_FIRST_RUN_DWELL_T…²
   <chr> <dbl> <chr>         <dbl>                  <dbl>                  <dbl>
 1 px3       1 RIGHT             0                      0                      0
 2 px3       2 RIGHT             0                      0                      0
 3 px3       3 RIGHT             0                      0                      0
 4 px3       3 RIGHT             0                      0                      0
 5 px3       3 RIGHT             0                      0                      0
 6 px3       3 RIGHT             0                      0                      0
 7 px3       3 RIGHT             0                      0                      0
 8 px3       3 RIGHT             0                      0                      0
 9 px3       4 RIGHT             0                      0                      0
10 px3       5 RIGHT             0                      0                      0
# ℹ 4,421 more rows
# ℹ abbreviated names: ¹IA_FIRST_FIXATION_DURATION, ²IA_FIRST_RUN_DWELL_TIME
# ℹ 22 more variables: IA_FIXATION_COUNT <dbl>, IA_ID <dbl>, IA_LABEL <chr>,
#   IA_REGRESSION_IN <dbl>, IA_REGRESSION_IN_COUNT <dbl>,
#   IA_REGRESSION_OUT <dbl>, IA_REGRESSION_OUT_COUNT <dbl>,
#   IA_REGRESSION_PATH_DURATION <dbl>, KeyPress <fct>, rt <dbl>, bio <chr>,
#   critical <chr>, gender <chr>, item_id <fct>, list <dbl>, match <chr>, …
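
Note that the calls above only print the changed data; to keep the change, the result has to be assigned back to an object. A sketch on a made-up stand-in for df_lifetime, using base factor() instead of as_factor() so that only dplyr is needed; across() is one way (not the only one) to apply the same function to several columns:

```r
library(dplyr)

# hypothetical stand-in for df_lifetime
toy <- tibble(
  KeyPress = c(4, 5, 4),
  item_id = c(1, 2, 3),
  rt = c(1332, 1890, 2910)
)

# without assignment the result is only printed; toy itself is unchanged
toy %>%
  mutate(KeyPress = factor(KeyPress))
class(toy$KeyPress)  # still "numeric"

# assign the result back to keep the change;
# across() applies the same function to several columns
toy <- toy %>%
  mutate(across(c(KeyPress, item_id), factor))

class(toy$KeyPress)  # "factor"
class(toy$item_id)   # "factor"
class(toy$rt)        # "numeric"
```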

5.7.5 Pop quiz

Which class should the following variables be (numeric, factor, or character)?:
- participant ID
- trial number
- first-pass reading time
- regression path duration
- regressions in
- context sentence
- lifetime
- tense
- celebrity name
change them to these class types, and print a summary
save and render the document

5.8 Plot the data

at this stage we want to explore the data
- distribution
  - peaks, spread
- boundaries
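
The same properties can also be checked numerically before plotting. A sketch on simulated values (not our data); the log-normal shape is just an assumption to get something right-skewed like reading times:

```r
set.seed(1)
# simulated, right-skewed "reading times" in ms (not real data)
rt_sim <- rlnorm(200, meanlog = 6, sdlog = 0.5)

range(rt_sim)                         # boundaries
quantile(rt_sim, c(0.25, 0.5, 0.75))  # spread around the median

# with a long right tail, the mean tends to sit above the median
c(mean = mean(rt_sim), median = median(rt_sim))
```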

Histogram

hist(df_lifetime$IA_FIRST_RUN_DWELL_TIME)

Boxplot

boxplot(df_lifetime$IA_FIRST_RUN_DWELL_TIME)

Scatterplot

plot(df_lifetime$IA_FIRST_RUN_DWELL_TIME)

5.8.1 Plotting two variables

Scatterplot

plot(df_lifetime$IA_FIRST_FIXATION_DURATION, df_lifetime$IA_FIRST_RUN_DWELL_TIME)

5.8.2 Exercise

In your Quarto document:

create a heading ‘Data exploration’

briefly describe the data

For each of our dependent variables:

create a subheading
calculate the mean and standard deviation of the variable (mean(), sd()) + create a boxplot of the variable

Render the document often to make sure it runs
Upload the source file (day1-nachname_vorname.qmd) to Moodle
download the source file below yours in the list to the same folder, and try to run it

does it run?

5.8.3 Print options

each code chunk can have different print options:
- eval = FALSE: do not evaluate this chunk
- include = FALSE: evaluate this chunk but don’t show it or its results
- echo = FALSE: don’t print this chunk’s code (only its results)
- message = FALSE/warning = FALSE: don’t print messages or warnings
- error = TRUE: continue rendering the document even if there’s an error
  - do not use error = TRUE for final versions! You want to make sure things work as they should

```{r, eval = TRUE, echo = TRUE, results = "asis", warning = FALSE}
code here
```

```{r}
#| eval: false
code here
```
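
Several options can be combined in the same chunk, one per line. A sketch using the built-in cars dataset as placeholder code:

```{r}
#| echo: false
#| warning: false
#| message: false
summary(cars)
```

With these settings the summary is shown in the rendered document, but the code, warnings and messages are not.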

Session Info

sessionInfo()

R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] rbbt_0.0.0.9000 beepr_1.3       lubridate_1.9.2 forcats_1.0.0  
 [5] stringr_1.5.0   dplyr_1.1.2     purrr_1.0.1     readr_2.1.4    
 [9] tidyr_1.3.0     tibble_3.2.1    ggplot2_3.4.2   tidyverse_2.0.0
[13] here_1.0.1     

loaded via a namespace (and not attached):
 [1] utf8_1.2.3        generics_0.1.3    renv_0.17.3       stringi_1.7.12   
 [5] hms_1.1.3         digest_0.6.33     magrittr_2.0.3    evaluate_0.21    
 [9] grid_4.3.0        timechange_0.2.0  fastmap_1.1.1     rprojroot_2.0.3  
[13] jsonlite_1.8.7    audio_0.1-10      fansi_1.0.4       scales_1.2.1     
[17] cli_3.6.1         rlang_1.1.1       crayon_1.5.2      bit64_4.0.5      
[21] munsell_0.5.0     withr_2.5.0       yaml_2.3.7        tools_4.3.0      
[25] parallel_4.3.0    tzdb_0.4.0        colorspace_2.1-0  vctrs_0.6.3      
[29] R6_2.5.1          lifecycle_1.0.3   htmlwidgets_1.6.2 bit_4.0.5        
[33] vroom_1.6.3       pkgconfig_2.0.3   pillar_1.9.0      gtable_0.3.3     
[37] glue_1.6.2        xfun_0.39         tidyselect_1.2.0  rstudioapi_0.15.0
[41] knitr_1.43        htmltools_0.5.5   rmarkdown_2.23    compiler_4.3.0