Statistics and Quantitative Methods (S1)

.title[
# Statistics and Quantitative Methods (S1)
]
.subtitle[
## Week 8
]
.author[
### Dr Stefano Coretta
]
.institute[
### University of Edinburgh
]
.date[
### 2022/11/08
]

---

.f1.link.dim.br3.ph3.pv2.mb2.dib.white.bg-purple[
[.white[Linear models illustrated]](https://stefanocoretta.shinyapps.io/lines/)
]

---

## Repeated measures: Multiple subjects

---

```r
mald_1_1
```

```
## # A tibble: 5,000 × 7
##    Subject Item        IsWord PhonLev    RT ACC     RT_log
##    <chr>   <chr>       <fct>    <dbl> <int> <fct>    <dbl>
##  1 15308   acreage     TRUE      6.01   617 correct   6.42
##  2 15308   maxraezaxr  FALSE     6.78  1198 correct   7.09
##  3 15308   prognosis   TRUE      8.14   954 correct   6.86
##  4 15308   giggles     TRUE      6.22   579 correct   6.36
##  5 15308   baazh       FALSE     6.13  1011 correct   6.92
##  6 15308   unflagging  TRUE      7.66  1402 correct   7.25
##  7 15308   ihnpaykaxrz FALSE     7.47  1059 correct   6.97
##  8 15308   hawk        TRUE      6.09   739 correct   6.61
##  9 15308   assessing   TRUE      6.37   789 correct   6.67
## 10 15308   mehlaxl     FALSE     5.80   926 correct   6.83
## # … with 4,990 more rows
```

---

```r
mald_1_1 %>%
  mutate(ACC_num = ifelse(ACC == "correct", 1, 0)) %>%
  ggplot(aes(PhonLev, ACC_num, colour = IsWord)) +
  geom_point() +
  geom_smooth(method = "glm", method.args = list(family = "binomial"))
```

---

.f1.link.dim.br3.ph3.pv2.mb2.dib.white.bg-purple[
[.white[Linear models illustrated]](https://stefanocoretta.shinyapps.io/lines/)
]

---

## Random effects: varying intercept

---

```r
mald_lm <- glm(ACC ~ IsWord * PhonLev, data = mald_1_1, family = binomial())

tidy(mald_lm)
```

```
## # A tibble: 4 × 5
##   term                estimate std.error statistic    p.value
##   <chr>                  <dbl>     <dbl>     <dbl>      <dbl>
## 1 (Intercept)           1.59      0.368       4.31 0.0000162 
## 2 IsWordFALSE          -2.77      0.591      -4.68 0.00000285
## 3 PhonLev               0.0719    0.0520      1.38 0.167     
## 4 IsWordFALSE:PhonLev   0.385     0.0861      4.47 0.00000784
```

---

```r
mald_lm_p <- ggpredict(mald_lm, terms = c("PhonLev [all]", "IsWord")) %>% plot(limits = c(0.75, 1))
mald_lm_p
```

---

```r
subjs <- sample(mald_1_1$Subject, 9)
mald_1_1 %>%
  filter(Subject %in% subjs) %>%
  ggplot(aes(Subject, fill = ACC)) +
  geom_bar(position = "fill")
```

---

```r
library(lme4)
mald_lm_2 <- glmer(ACC ~ IsWord * PhonLev + (1 | Subject), data = mald_1_1, family = binomial())

cat(capture.output(summary(mald_lm_2))[13:23], sep = "\n")
```

```
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  Subject (Intercept) 0.09121  0.302   
## Number of obs: 5000, groups:  Subject, 50
## 
## Fixed effects:
##                     Estimate Std. Error z value Pr(>|z|)
## (Intercept)          1.61126    0.37304   4.319 1.57e-05
## IsWordFALSE         -2.77650    0.59261  -4.685 2.80e-06
## PhonLev              0.07383    0.05234   1.410    0.158
```

---

```r
mald_lm_2_p <- ggpredict(mald_lm_2, terms = c("PhonLev [all]", "IsWord")) %>%
  plot(limits = c(0.75, 1))
mald_lm_2_p
```

---

```r
# library(patchwork)
mald_lm_p + mald_lm_2_p + plot_layout(nrow = 2, guides = "collect")
```

---

Let's check the **conditional means**

```r
mald_lm_2_r <- ranef(mald_lm_2) %>% as_tibble()
mald_lm_2_r
```

```
## # A tibble: 50 × 5
##    grpvar  term        grp    condval condsd
##    <chr>   <fct>       <fct>    <dbl>  <dbl>
##  1 Subject (Intercept) 15308  0.450    0.234
##  2 Subject (Intercept) 15345  0.0591   0.221
##  3 Subject (Intercept) 15373 -0.0208   0.217
##  4 Subject (Intercept) 15384 -0.392    0.201
##  5 Subject (Intercept) 15388  0.00308  0.216
##  6 Subject (Intercept) 15540 -0.388    0.204
##  7 Subject (Intercept) 15739 -0.147    0.215
##  8 Subject (Intercept) 15903 -0.251    0.208
##  9 Subject (Intercept) 16018 -0.260    0.209
## 10 Subject (Intercept) 16054 -0.402    0.205
## # … with 40 more rows
```

---

```r
mald_lm_2_r %>%
  ggplot(aes(condval, grp)) +
  geom_vline(xintercept = 0, colour = "#7fc97f") +
  geom_pointrange(
    aes(xmin = condval + 1.96 * condsd, xmax = condval - 1.96 * condsd),
    alpha = 0.7
  )
```

---

---

```r
ggpredict(mald_lm_2, terms = c("PhonLev [all]", "IsWord", "Subject [sample=8]"), type = "re") %>% plot()
```

---

### Reporting

Easy! Same as before but in the model specification blurb add the following:

> We also included a by-subject varying (or random) intercept.

---

.f1.link.dim.br3.ph3.pv2.mb2.dib.white.bg-purple[
[.white[Linear models illustrated]](https://stefanocoretta.shinyapps.io/lines/)
]

---

## Random effects: varying slopes

---

```r
mald_1_1 %>%
  filter(Subject %in% sample(mald_1_1$Subject, 9)) %>%
  mutate(ACC_num = ifelse(ACC == "correct", 1, 0)) %>%
  ggplot(aes(PhonLev, ACC_num, colour = IsWord)) +
  geom_point() +
  geom_smooth(method = "glm", method.args = list(family = "binomial")) +
  facet_wrap(~ Subject)
```

---

---

```r
mald_lm_3 <- glmer(ACC ~ IsWord * PhonLev + (IsWord | Subject), data = mald_1_1, family = binomial())

cat(capture.output(summary(mald_lm_3))[13:24], sep = "\n")
```

```
## 
## Random effects:
##  Groups  Name        Variance Std.Dev. Corr 
##  Subject (Intercept) 0.3114   0.558         
##          IsWordFALSE 1.4020   1.184    -0.89
## Number of obs: 5000, groups:  Subject, 50
## 
## Fixed effects:
##                     Estimate Std. Error z value Pr(>|z|)
## (Intercept)          1.66918    0.38633   4.321 1.56e-05
## IsWordFALSE         -2.83255    0.63037  -4.493 7.01e-06
## PhonLev              0.07740    0.05336   1.451    0.147
```

---

```r
mald_lm_3_p <- ggpredict(mald_lm_3, terms = c("PhonLev [all]", "IsWord")) %>%
  plot(limits = c(0.75, 1))
mald_lm_3_p
```

---

```r
mald_lm_p + mald_lm_3_p + plot_layout(nrow = 2, guides = "collect")
```

---

Let's check the **conditional means**

```r
mald_lm_3_r <- ranef(mald_lm_3) %>% as_tibble()
mald_lm_3_r
```

```
## # A tibble: 100 × 5
##    grpvar  term        grp   condval condsd
##    <chr>   <fct>       <fct>   <dbl>  <dbl>
##  1 Subject (Intercept) 15308  0.124   0.340
##  2 Subject (Intercept) 15345  0.714   0.358
##  3 Subject (Intercept) 15373 -0.0677  0.321
##  4 Subject (Intercept) 15384  0.380   0.350
##  5 Subject (Intercept) 15388  0.229   0.349
##  6 Subject (Intercept) 15540 -0.415   0.297
##  7 Subject (Intercept) 15739 -0.689   0.294
##  8 Subject (Intercept) 15903  0.920   0.367
##  9 Subject (Intercept) 16018  0.0954  0.329
## 10 Subject (Intercept) 16054  0.967   0.372
## # … with 90 more rows
```

---

```r
mald_lm_3_r %>%
  filter(term == "IsWordFALSE") %>%
  ggplot(aes(condval, reorder(grp, condval))) +
  geom_vline(xintercept = 0, colour = "#7fc97f") +
  geom_pointrange(
    aes(xmin = condval + 1.96 * condsd, xmax = condval - 1.96 * condsd),
    alpha = 0.7
  )
```

---

---

```r
ggpredict(mald_lm_3, terms = c("PhonLev [all]", "IsWord", "Subject [sample=9]"), type = "re") %>% plot()
```

---

### Reporting

Easy! Same as before but in the model specification blurb add the following:

> We also included a by-subject varying intercept plus a by-subject varying slope for IsWord.

---

## Repeated measures: Multiple subjects and words

---

```r
shallow_filt <- shallow %>%
  filter(Branching == "Left")

shallow_filt
```

```
## # A tibble: 1,170 × 11
##    Group ID    List  Target    RT RT_log Criti…¹ Word_…² Relat…³ Branc…⁴ accur…⁵
##    <chr> <chr> <chr> <chr>  <dbl>  <dbl> <chr>   <chr>   <fct>   <chr>   <chr>  
##  1 L1    L1_01 A     unawa…   603   6.40 Critic… Word    Unrela… Left    correct
##  2 L1    L1_01 A     unhol…   739   6.61 Critic… Word    Consti… Left    correct
##  3 L1    L1_01 A     unhap…   370   5.91 Critic… Word    Unrela… Left    correct
##  4 L1    L1_01 A     unsha…   821   6.71 Critic… Word    Consti… Left    correct
##  5 L1    L1_01 A     unrip…  1035   6.94 Critic… Word    Unrela… Left    incorr…
##  6 L1    L1_01 A     unkin…   498   6.21 Critic… Word    NonCon… Left    correct
##  7 L1    L1_01 A     unwar…  1133   7.03 Critic… Word    NonCon… Left    correct
##  8 L1    L1_01 A     uncle…   513   6.24 Critic… Word    Consti… Left    correct
##  9 L1    L1_01 A     reobt…   964   6.87 Critic… Word    NonCon… Left    incorr…
## 10 L1    L1_01 A     resea…   709   6.56 Critic… Word    Unrela… Left    correct
## # … with 1,160 more rows, and abbreviated variable names ¹Critical_Filler,
## #   ²Word_Nonword, ³Relation_type, ⁴Branching, ⁵accuracy
```

---

```r
unique(shallow_filt$ID)
```

```
##  [1] "L1_01" "L1_02" "L1_03" "L1_05" "L1_06" "L1_07" "L1_08" "L1_09" "L1_10"
## [10] "L1_11" "L1_12" "L1_13" "L1_14" "L1_15" "L1_16" "L1_17" "L1_18" "L1_19"
## [19] "L1_20" "L1_21" "L1_22" "L1_23" "L1_24" "L1_25" "L1_26" "L1_27" "L1_28"
## [28] "L1_29" "L1_30" "L2_01" "L2_02" "L2_03" "L2_04" "L2_05" "L2_06" "L2_07"
## [37] "L2_08" "L2_10" "L2_11" "L2_12" "L2_13" "L2_14" "L2_16" "L2_17" "L2_18"
## [46] "L2_19" "L2_20" "L2_21" "L2_22" "L2_23" "L2_24" "L2_25" "L2_26" "L2_27"
## [55] "L2_28" "L2_29" "L2_30" "L2_31" "L2_32" "L2_33" "L2_34" "L2_35" "L2_36"
## [64] "L2_37" "L2_38"
```

```r
unique(shallow_filt$Target)
```

```
##  [1] "unawareness"  "unholiness"   "unhappiness"  "unsharpness"  "unripeness"  
##  [6] "unkindness"   "unwariness"   "unclearness"  "reobtainable" "resealable"  
## [11] "recomputable" "readjustable" "reattachable" "resellable"   "reclosable"  
## [16] "uncleverness" "reusable"     "rehydratable"
```

`ID` and `Target` are crossed, i.e. each subject has seen all the targets and each target has been seen by each subject.

We talk about **crossed random effects**.

---

```r
shal_lm <- lmer(
  RT ~ Relation_type * Group + (1 | ID) + (1 | Target),
  data = shallow_filt
)

cat(capture.output(summary(shal_lm))[11:25], sep = "\n")
```

```
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  ID       (Intercept) 21495    146.61  
##  Target   (Intercept)  7928     89.04  
##  Residual             86941    294.86  
## Number of obs: 1170, groups:  ID, 65; Target, 18
## 
## Fixed effects:
##                                     Estimate Std. Error t value
## (Intercept)                          897.788     41.005  21.894
## Relation_typeConstituent             -95.502     31.619  -3.020
## Relation_typeNonConstituent          -18.350     31.619  -0.580
## GroupL2                              160.338     47.336   3.387
## Relation_typeConstituent:GroupL2       4.362     42.488   0.103
## Relation_typeNonConstituent:GroupL2  -45.018     42.488  -1.060
```

---

```r
shal_lm_p <- ggpredict(shal_lm, terms = c("Group", "Relation_type")) %>% plot()
shal_lm_p
```

---

```r
shal_lm_r <- ranef(shal_lm) %>% as_tibble()
shal_lm_r
```

```
## # A tibble: 83 × 5
##    grpvar term        grp   condval condsd
##    <chr>  <fct>       <fct>   <dbl>  <dbl>
##  1 ID     (Intercept) L1_01  -106.    63.9
##  2 ID     (Intercept) L1_02    13.9   63.9
##  3 ID     (Intercept) L1_03   -12.6   63.9
##  4 ID     (Intercept) L1_05   164.    63.9
##  5 ID     (Intercept) L1_06   -83.9   63.9
##  6 ID     (Intercept) L1_07   -55.3   63.9
##  7 ID     (Intercept) L1_08   -16.9   63.9
##  8 ID     (Intercept) L1_09  -101.    63.9
##  9 ID     (Intercept) L1_10    29.2   63.9
## 10 ID     (Intercept) L1_11  -167.    63.9
## # … with 73 more rows
```

---

```r
shal_lm_r %>%
  filter(grpvar == "ID") %>%
  ggplot(aes(condval, reorder(grp, condval))) +
  geom_vline(xintercept = 0, colour = "#7fc97f") +
  geom_pointrange(
    aes(xmin = condval + 1.96 * condsd, xmax = condval - 1.96 * condsd),
    alpha = 0.7
  )
```

---

---

```r
shal_lm_r %>%
  filter(grpvar == "Target") %>%
  ggplot(aes(condval, reorder(grp, condval))) +
  geom_vline(xintercept = 0, colour = "#7fc97f") +
  geom_pointrange(
    aes(xmin = condval + 1.96 * condsd, xmax = condval - 1.96 * condsd),
    alpha = 0.7
  )
```

---

---

```r
ggpredict(shal_lm, terms = c("ID [sample=6]", "Relation_type"), type = "re") %>% plot()
```

---

```r
ggpredict(shal_lm, terms = c("Target [sample=6]", "Relation_type"), type = "re") %>% plot()
```

---

```r
shal_lm_2 <- lmer(RT ~ Relation_type * Group + (Relation_type | ID) + (Relation_type | Target),
  data = shallow_filt)
cat(capture.output(summary(shal_lm_2))[11:20], sep = "\n")
```

```
## 
## Random effects:
##  Groups   Name                        Variance Std.Dev. Corr       
##  ID       (Intercept)                 24923.06 157.870             
##           Relation_typeConstituent      517.75  22.754  -0.93      
##           Relation_typeNonConstituent   837.73  28.944  -0.46  0.76
##  Target   (Intercept)                  7220.07  84.971             
##           Relation_typeConstituent       70.07   8.371  -1.00      
##           Relation_typeNonConstituent   468.28  21.640   1.00 -1.00
##  Residual                             86554.56 294.202
```

---

```r
cat(capture.output(summary(shal_lm_2))[22:29], sep = "\n")
```

```
## 
## Fixed effects:
##                                     Estimate Std. Error t value
## (Intercept)                          897.854     41.930  21.413
## Relation_typeConstituent             -95.618     31.890  -2.998
## Relation_typeNonConstituent          -18.300     32.409  -0.565
## GroupL2                              160.432     49.499   3.241
## Relation_typeConstituent:GroupL2       4.199     42.770   0.098
```

---

```r
shal_lm_2_p <- ggpredict(shal_lm_2, terms = c("Group", "Relation_type")) %>% plot()
shal_lm_2_p
```

---

```r
ggpredict(shal_lm_2, terms = c("ID [sample=10]", "Relation_type"), type = "re") %>% plot()
```

---

```r
ggpredict(shal_lm_2, terms = c("Target [sample=6]", "Relation_type"), type = "re") %>% plot()
```

---

### Reporting

Easy! Same as before but in the model specification blurb add the following:

> We also included by-subject and by-target varying intercepts, plus by-subject and by-target varying slopes for relation type.