Statistics and Quantitative Methods (S1)

.title[
# Statistics and Quantitative Methods (S1)
]
.subtitle[
## Week 7
]
.author[
### Dr Stefano Coretta
]
.institute[
### University of Edinburgh
]
.date[
### 2022/09/01
]

---

# INTERACTIONS

---

# Motivating interactions

---

Example 1: Shallow morphological parsing

```r
shallow %>% select(ID, Group, Target, Relation_type, Branching, accuracy)
```

```
## # A tibble: 1,950 × 6
## ID Group Target Relation_type Branching accuracy 
## <chr> <chr> <chr> <fct> <chr> <chr> 
## 1 L1_01 L1 unawareness Unrelated Left correct 
## 2 L1_01 L1 unholiness Constituent Left correct 
## 3 L1_01 L1 unhappiness Unrelated Left correct 
## 4 L1_01 L1 unsharpness Constituent Left correct 
## 5 L1_01 L1 unripeness Unrelated Left incorrect
## 6 L1_01 L1 undeniable Unrelated Right correct 
## 7 L1_01 L1 unskillful Constituent Right correct 
## 8 L1_01 L1 unkindness NonConstituent Left correct 
## 9 L1_01 L1 unwariness NonConstituent Left correct 
## 10 L1_01 L1 unclearness Constituent Left correct 
## # … with 1,940 more rows
```

---

```r
shallow <- shallow %>%
 filter(Relation_type != "NonConstituent") %>%
 mutate(
 Group = factor(Group, levels = c("L1", "L2")),
 Relation_type = factor(Relation_type, levels = c("Unrelated", "Constituent", "NonConstituent")),
 Branching = factor(Branching, levels = c("Left", "Right")),
 accuracy = factor(accuracy, levels = c("incorrect", "correct"))
 )
```

Let's filter the data to exclude `"NonConstituent"` from `Relation_type` and let's change character columns to factor columns.

---

```r
shallow %>%
  ggplot(aes(Relation_type, fill = accuracy)) +
  geom_bar(position = "fill") +
  facet_grid(~ Branching)
```

]

![](index_files/figure-html/shallow-plot-2-1.png)

]

---

Let's try first with no interaction.

```r
shallow_lm <- glm(accuracy ~ Relation_type + Branching, data = shallow, family = binomial())
```

`$$\text{accuracy}_{LO} = \beta_0 + \beta_1 \cdot \text{Relation_type}_{Const} + \beta_2 \cdot \text{Branching}_{Right}$$`

`$$\text{accuracy(Unrelated, Left)}_{LO} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 = \beta_0$$`

`$$\text{accuracy(Constituent, Left)}_{LO} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 = \beta_0 + \beta_1$$`
`$$\text{accuracy(Unrelated, Right)}_{LO} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 = \beta_0 + \beta_2$$`

`$$\text{accuracy(Constituent, Right)}_{LO} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 1 = \beta_0 + \beta_1 + \beta_2$$`

---

```r
# library(broom.mixed)
tidy(shallow_lm)
```

```
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.888 0.105 8.45 2.98e-17
## 2 Relation_typeConstituent 0.477 0.146 3.28 1.04e- 3
## 3 BranchingRight 1.52 0.165 9.18 4.47e-20
```

|                | Left          | Right                 |
|----------------|---------------|-----------------------|
| Unrelated      | 0.888         | 0.888 + 1.516         |
| Constituent    | 0.888 + 0.477 | 0.888 + 0.477 + 1.516 |

|                | Left  | Right |
|----------------|-------|-------|
| Unrelated      | 0.888 | 2.404 |
| Constituent    | 1.365 | 2.881 |

---

```r
ggpredict(shallow_lm, terms = c("Relation_type", "Branching"))
```

---

```r
ggpredict(shallow_lm, terms = c("Branching", "Relation_type")) %>% plot()
```

---

```r
shallow %>%
  ggplot(aes(Relation_type, fill = accuracy)) +
  geom_bar(position = "fill") +
  facet_grid(~ Branching)
```

---

# Categorical + categorical: Example 1

---

Let's add an interaction term between `Relation_type` and `Branching`.

```r
shallow_lm_2 <- glm(
 accuracy ~ Relation_type + Branching + Relation_type:Branching,
 data = shallow, family = binomial()
)
```

`$$\text{accuracy}_{LO} = \beta_0 + \beta_1 \cdot \text{Relation_type}_{Const} + \beta_2 \cdot \text{Branching}_{Right} + \beta_3 \cdot \text{Relation_type}_{Const} \cdot \text{Branching}_{Right}$$`

`$$\text{accuracy(Unrelated, Left)}_{LO} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \beta_3 \cdot 1 \cdot 0 = \beta_0$$`

`$$\text{accuracy(Constituent, Left)}_{LO} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \beta_3 \cdot 1 \cdot 0 = \beta_0 + \beta_1$$`
`$$\text{accuracy(Unrelated, Right)}_{LO} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \beta_3 \cdot 0 \cdot 1 = \beta_0 + \beta_2$$`

`$$\text{accuracy(Constituent, Right)}_{LO} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 1 + \beta_3 \cdot 1 \cdot 1 = \beta_0 + \beta_1 + \beta_2 + \beta_3$$`

---

```r
tidy(shallow_lm_2)
```

```
## # A tibble: 4 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.823 0.110 7.48 7.22e-14
## 2 Relation_typeConstituent 0.629 0.170 3.71 2.10e- 4
## 3 BranchingRight 1.78 0.228 7.80 6.44e-15
## 4 Relation_typeConstituent:BranchingRight -0.588 0.331 -1.78 7.58e- 2
```

|             | Left          | Right                          |
|-------------|---------------|--------------------------------|
| Unrelated   | 0.822         | 0.822 + 1.775                  |
| Constituent | 0.822 + 0.628 | 0.822 + 0.628 + 1.775 + -0.588 |

|             | Left  | Right |
|-------------|-------|-------|
| Unrelated   | 0.822 | 2.597 |
| Constituent | 1.45  | 2.637 |

---

```r
ggpredict(shallow_lm_2, terms = c("Relation_type", "Branching"))
```

---

```r
ggpredict(shallow_lm_2, terms = c("Branching", "Relation_type")) %>% plot()
```

---

### Reporting the model and results

We fitted a linear model with a Bernoulli family to accuracy (incorrect vs correct) with the following predictors: relation type (unrelated vs constituent) and branching (left vs right).
We also included an interaction between relation type and branching.
Both predictors were coded with treatment contrasts and the first level listed above was set as the reference level.
We report in the following paragraph the results as probabilities (the model's estimates in log-odds are reported between parenthesis with standard errors).

According to the model, there is a 69% probability of obtaining a correct response when the relation type is unrelated and the word is left-branching (`$\beta$` = 0.822, SE = 0.109).
When relation type is constituent and the word is left-branching, there probability increases to 81% (`$\beta$` = 0.628, SE = 0.169).
When the word is right-branching and the relation type is unrelated, there is a 93% probability of obtaining a correct response (`$\beta$` = 1.775, SE = 0.227).
The interaction term between relation type and branching indicates that when the word is right-branching and relation type is constituent there is a negligible difference in probability relative to relation type unrelated (`$\beta$` = -0.588, SE = 0.331), so that in the right-branching condition relation type does not really affects the probability of getting a correct response.

---

# Categorical + categorical: Example 2

---

Example 2: Vowel duration and consonant voicing in Italian and Polish

- CVCV words where V1 = /a, o, u/ and C2 = /t, d, k, ɡ/.

- Investigate the effect of voicing of C2 (voiceless /t, k/, voiced /d, g/) on V1 duration.

- Data from Italian and Polish speakers.

---

```r
dur_ita_pol %>% select(speaker, word, vowel, v1_duration, c2_phonation)
```

```
## # A tibble: 1,334 × 5
## speaker word vowel v1_duration c2_phonation
## <chr> <chr> <chr> <dbl> <chr> 
## 1 it01 pugu u 95.2 voiced 
## 2 it01 pada a 139. voiced 
## 3 it01 poco o 127. voiceless 
## 4 it01 pata a 127. voiceless 
## 5 it01 boco o 132. voiceless 
## 6 it01 podo o 125. voiced 
## 7 it01 boto o 134. voiceless 
## 8 it01 paca a 119. voiceless 
## 9 it01 bodo o 130. voiced 
## 10 it01 pucu u 99.1 voiceless 
## # … with 1,324 more rows
```

---

```r
d_pos <- position_dodge(width = 0.8)

vdur_voi_plot <- ggplot(dur_ita_pol, aes(c2_phonation, v1_duration)) +
 geom_jitter(size = 1, alpha = 0.25, width = 0.2) +
 geom_violin(aes(fill = c2_phonation), width = 0.25) +
 stat_summary(fun = mean, position = d_pos, colour = "black") +
 labs(
 x = "Consonant voicing", y = "Vowel duration"
 ) +
 theme(legend.position = "none")
```

---

---

```r
vdur_lang_plot <- ggplot(dur_ita_pol, aes(language, v1_duration)) +
 geom_jitter(size = 1, alpha = 0.25, width = 0.2) +
 geom_violin(aes(fill = language), width = 0.25) +
 stat_summary(fun = mean, position = d_pos, colour = "black") +
 labs(
 x = "Language", y = "Vowel duration"
 ) +
 theme(legend.position = "none")
```

---

---

```r
jd_pos <- position_jitterdodge(dodge.width = 0.8, jitter.width = 0.3)
d_pos <- position_dodge(width = 0.8)

vdur_plot <- ggplot(dur_ita_pol, aes(language, v1_duration)) +
 geom_jitter(aes(colour = c2_phonation), size = 1, alpha = 0.25, position = jd_pos) +
 geom_violin(aes(fill = c2_phonation), position = d_pos, width = 0.5) +
 stat_summary(aes(group = paste(c2_phonation, language)), fun = mean, position = d_pos, colour = "black") +
 labs(
 x = "Language", y = "Vowel duration"
 ) +
 theme(legend.position = "none")
```

---

???

Is the effect of consonant voicing on vowel duration the same in Italian and Polish?

To answer this question we need to include an interaction between consonant voicing and language.

---

```r
vdur_lm <- lm(v1_duration ~ c2_phonation * language, data = dur_ita_pol)
# NOTE: "c2_phonation * language" is a short-cut for
# "c2_phonation + language + c2_phonation:language"
```

`$$\text{v1_duration} = \beta_0 + \beta_1 \cdot \text{c2_phonation} + \beta_2 \cdot \text{language} + \beta_3 \cdot \text{c2_phonation} \cdot \text{language}$$`

`$$\text{v1_duration(voiced, Italian)} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \beta_3 \cdot 0 \cdot 0= \beta_0$$`

`$$\text{v1_duration(voiceless, Italian)} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \beta_3 \cdot 1 \cdot 0 = \beta_0 + \beta_1$$`

`$$\text{v1_duration(voiced, Polish)} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \beta_3 \cdot 0 \cdot 1 = \beta_0 + \beta_2$$`

`$$\text{v1_duration(voiceless, Polish)} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 1 + \beta_3 \cdot 1 \cdot 1 = \beta_0 + \beta_1 + \beta_2 + \beta_3$$`

---

```r
tidy(vdur_lm)
```

```
## # A tibble: 4 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 124. 1.38 89.8 0 
## 2 c2_phonationvoiceless -13.5 1.95 -6.94 6.22e-12
## 3 languagePolish -41.0 2.36 -17.4 5.47e-61
## 4 c2_phonationvoiceless:languagePolish 5.78 3.34 1.73 8.36e- 2
```

---

```r
ggpredict(vdur_lm, terms = c("c2_phonation", "language"))
```

---

```r
ggpredict(vdur_lm, terms = c("c2_phonation", "language")) %>%
  plot()
```

???

```r
marginaleffects(vdur_lm, variables = "c2_phonation", by = "language")
```

```
##       type         term                       contrast       dydx std.error
## 1 response c2_phonation mean(voiceless) - mean(voiced) -13.522775  1.949285
## 2 response c2_phonation mean(voiceless) - mean(voiced)  -7.743682  2.709335
##   statistic      p.value  conf.low conf.high language
## 1 -6.937299 3.996664e-12 -17.34330 -9.702246  Italian
## 2 -2.858149 4.261203e-03 -13.05388 -2.433484   Polish
```

---

### Reporting the model and results

We fitted a linear model with a Gaussian family to vowel duration with the following predictors: C2 voicing (voiced vs voiceless) and language (Italian vs Polish).
We also included an interaction between voicing type and language.
Both predictors were coded with treatment contrasts and the first level listed above was set as the reference level.

According to the model, vowels followed by a voiced consonant are 124 ms long in Italian (SE = 1.38).
In Polish, vowels followed by a voiced consonant are 83 ms long (`$\beta$` = -41, SE = 2).
Italian vowels followed by a voiceless consonant are 13 ms shorter (SE = 1.9) than vowels followed by a voiced consonant.
According to the interaction, the effect of voicing (i.e. consonant being voiceless) in Polish is 5 ms smaller than the effect in Italian (`$\beta$` = 5.7, SE = 3.3).

---

# Categorical + continous

---

```r
dur_ita <- dur_ita_pol %>%
 filter(language == "Italian") %>%
 mutate(
 vowel = factor(vowel, levels = c("a", "o", "u"))
 )

dur_ita %>% select(word, vowel, v1_duration, c2_duration)
```

```
## # A tibble: 879 × 4
## word vowel v1_duration c2_duration
## <chr> <fct> <dbl> <dbl>
## 1 pugu u 95.2 85.0
## 2 pada a 139. 75.2
## 3 poco o 127. 103. 
## 4 pata a 127. 100. 
## 5 boco o 132. 108. 
## 6 podo o 125. 70.0
## 7 boto o 134. 109. 
## 8 paca a 119. 91.7
## 9 bodo o 130. 61.3
## 10 pucu u 99.1 124. 
## # … with 869 more rows
```

---

```r
dur_ita %>%
  ggplot(aes(c2_duration, v1_duration)) +
  geom_point(alpha = 0.25) +
  geom_smooth(aes(colour = vowel), method = "lm")
```

???

There seems to be a positive effect of consonant duration on vowel duration: i.e. the longer the consonant the longer the vowel.

But is the effect of consonant duration on vowel duration the same for the three vowels /a, o, u/?

---

```r
vdur_lm_2 <- lm(v1_duration ~ c2_duration * vowel, data = dur_ita)
```

$$
`\begin{align}
\text{v1_duration} = & \beta_0 + \beta_1 \cdot \text{c2_duration} + \beta_2 \cdot \text{vowel_o} + \beta_3 \cdot + \text{vowel_u} \\
& + \beta_4 \cdot \text{c2_duration} \cdot \text{vowel_o} + \beta_5 \cdot \text{c2_duration} \cdot \text{vowel_u}
\end{align}`
$$

`$$\text{v1_duration(c2_dur = 0, a)} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \beta_3 \cdot 0 \cdot 0= \beta_0$$`

`$$\text{v1_duration(c2_dur = 0, o)} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \beta_3 \cdot 0 + \beta_4 \cdot 0 \cdot 1 + \beta_5 \cdot 0 \cdot 0 = \beta_0 + \beta_2$$`

`$$\text{v1_duration(c2_dur = 0, u)} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \beta_3 \cdot 1 + \beta_4 \cdot 0 \cdot 0 + \beta_5 \cdot 0 \cdot 1 = \beta_0 + \beta_3$$`

---

`$$\text{v1_duration(c2_dur = 1, a)} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \beta_3 \cdot 0 \cdot 0= \beta_0 + \beta_1$$`

`$$\text{v1_duration(c2_dur = 1, o)} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 1 + \beta_3 \cdot 0 + \beta_4 \cdot 1 \cdot 1 + \beta_5 \cdot 1 \cdot 0 = \beta_0 + \beta_1 + \beta_2 + \beta_4$$`

`$$\text{v1_duration(c2_dur = 1, u)} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \beta_3 \cdot 1 + \beta_4 \cdot 1 \cdot 0 + \beta_5 \cdot 1 \cdot 1 = \beta_0 + \beta_1 + \beta_3 + \beta_5$$`

---

```r
tidy(vdur_lm_2)
```

```
## # A tibble: 6 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 103. 6.09 16.9 1.50e-55
## 2 c2_duration 0.299 0.0674 4.44 1.01e- 5
## 3 vowelo 2.21 8.42 0.262 7.93e- 1
## 4 vowelu -13.3 8.91 -1.49 1.36e- 1
## 5 c2_duration:vowelo -0.105 0.0905 -1.16 2.48e- 1
## 6 c2_duration:vowelu -0.200 0.0950 -2.10 3.57e- 2
```

---

```r
ggpredict(
  vdur_lm_2, 
  terms = c(
    "c2_duration [0, 50, 100]",
    "vowel"
  )
)
```
]

```
## # Predicted values of v1_duration
## 
## # vowel = a
## 
## c2_duration | Predicted |           95% CI
## ------------------------------------------
##           0 |    102.87 | [ 90.93, 114.81]
##          50 |    117.84 | [111.90, 123.79]
##         100 |    132.82 | [128.89, 136.75]
## 
## # vowel = o
## 
## c2_duration | Predicted |           95% CI
## ------------------------------------------
##           0 |    105.08 | [ 93.68, 116.48]
##          50 |    114.82 | [108.77, 120.86]
##         100 |    124.55 | [120.93, 128.18]
## 
## # vowel = u
## 
## c2_duration | Predicted |          95% CI
## -----------------------------------------
##           0 |     89.58 | [76.84, 102.33]
##          50 |     94.56 | [87.85, 101.28]
##         100 |     99.54 | [95.82, 103.26]
```
]

???

You can specify specific values for continuous predictors, like in `"c2_duration [0, 50, 100]"`.

Here, I am asking `ggpredict()` to return predicted values for when `c2_duration` is 0, 50 and 100 ms.

To learn more about how to specify values, check the documentation in `?ggpredict`.

---

```r
ggpredict(vdur_lm_2, terms = c("c2_duration", "vowel")) %>%
  plot()
```

???

Note that for plotting you should not give specific values for the continuous predictor that goes into the *x*-axis, because this axis will include the entire range of values for that continuous predictor.

---

### Reporting the model and results

We fitted a linear model with a Gaussian family to Italian vowel duration with the following predictors: C2 duration and vowel (/a/, /o/, /u/).
We also included an interaction between C2 duration and vowel.
Vowel was coded with treatment contrasts using /a/ as the reference level.

According to the model, /a/ is 102 ms long (SE = 6) when C2 duration is 0 ms.
When C2 duration is 0, the duration of /o/ is virtually the same as that of /a/ (`$\beta$` = 2, SE = 8.42).
On the other hand, the model suggests that /u/ is 13 ms shorter than /a/ (SE = 8.9) when C2 duration is 0.
For each unit increase of C2 duration, /a/ increases by 0.3 ms (SE = 0.06).
In other words, for every 10 ms increase in C2 duration there is a corresponding 3 ms increase in the duration of /a/.
The interaction between C2 duration and vowel indicates that for each unit increase of C2 duration, the expected C2 duration effect for /o/ is 0.1 ms smaller than that of /a/ (`$\beta$` = -0.1, SE = 0.09) while the effect for /u/ is 0.2 ms smaller than that of /a/ (`$\beta$` = -0.2, SE = 0.09).
In other words, for each 10 ms increase of C2 duration, /o/ increases by 2 ms and /u/ increases by 1 ms.
Overall, the model suggest that the effect of C2 duration decreases from /a/ to /o/ to /u/, albeit by a small margin.

???

Now the question is: *Is a 1 ms difference in vowel duration meaningful?*.

This is a question that pertains to linguistics, rather than quantitative data analysis or statistics, so I will leave it at that (but if you are curious come to my office hours and I can tell you about my PhD thesis).

---

# Continuous + continuous

---

```r
dur_ita %>%
  ggplot(aes(c2_duration, v1_duration, colour = speech_rate)) +
  geom_point(alpha = 0.5)
```

???

Let's disregard vowel for the time being (you will add vowel in in the tutorial).

---

```r
vdur_lm_3 <- lm(v1_duration ~ c2_duration * speech_rate, data = dur_ita)

tidy(vdur_lm_3)
```

```
## # A tibble: 4 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 221. 19.0 11.6 4.61e-29
## 2 c2_duration 0.730 0.197 3.71 2.24e- 4
## 3 speech_rate -14.3 3.60 -3.97 7.80e- 5
## 4 c2_duration:speech_rate -0.199 0.0392 -5.08 4.56e- 7
```

---

```r
ggpredict(vdur_lm_3, terms = c("c2_duration", "speech_rate [3, 5, 7]")) %>%
  plot()
```

---

```r
ggpredict(vdur_lm_3, terms = c("speech_rate", "c2_duration [50, 100, 150]")) %>%
  plot()
```

---

```r
bi_pred <- ggpredict(vdur_lm_3, terms = c("c2_duration [0:200]", "speech_rate [0:8 by=0.25]")) %>%
 mutate(group = as.numeric(as.character(group)))

bi_pred_plot <- bi_pred %>%
 ggplot(aes(x, group, fill = predicted)) +
 geom_raster() +
 scale_fill_distiller(direction = 1, palette = "YlGnBu") +
 labs(
 x = "C2 duration (ms)", y = "Speech rate (syl/s)",
 fill = "Vowel\nduration"
 )
```

---

---

### Reporting the model and results

We fitted a linear model with a Gaussian family to vowel duration, with C2 duration and speech rate (as syllables per second) as predictors, including an interaction between the two.

The model suggests that, when C2 duration and speech rate are at 0, Italian vowels are on average 220 ms long (SE = 19).
The effect of C2 duration (when speech rate is 0) is 0.72 ms (SE = 0.2) while the effect of speech rate (when C2 duration is 0) is -14 ms (SE = 3.6).
The interaction of C2 duration and speech rate indicates that 
the effect of C2 duration on vowel duration decreases by 0.19 ms (SE = 0.04) for each unit increase of speech rate (i.e. +1 syl/s).
Equivalently, the effect of speech rate on vowel duration decreases by 0.19 ms for each unit increase (i.e. +1 ms) of C2 duration.