Statistics and Quantitative Methods (S1)

.title[
# Statistics and Quantitative Methods (S1)
]
.subtitle[
## Week 9
]
.author[
### Dr Stefano Coretta
]
.institute[
### University of Edinburgh
]
.date[
### 2022/11/15
]

---

![:scale 70%](../../img/p-value-1.png)

---

![:scale 70%](../../img/p-value-2.png)

---

![:scale 70%](../../img/p-value-3.png)

---

![:scale 70%](../../img/p-value-4.png)

---

![:scale 70%](../../img/p-value-5.png)

---

![:scale 70%](../../img/p-value-6.png)

---

## Difference between two groups

We have two hypotheses:

- **Null Hypothesis**: the difference between the means of Group A and B *is 0* (i.e. there is no difference).

- **Alternative Hypothesis**: the difference between the means of Group A and B *is not 0*.

`$$H_0: \mu_a - \mu_b = 0$$`

`$$H_1: \mu_a - \mu_b \neq 0$$`

---

![:scale 70%](../../img/p-value-7.png)

---

---

---

## Student's *t*-statistic

---

- We need a standardised measure of difference.

- One such measure is the **Student's *t*-statistic**.

`$$t = \frac{\mu_b - \mu_a}{\sqrt{\frac{\sigma^2_a}{n_a} + \frac{\sigma^2_b}{n_b}}}$$`

where:

- `$\mu_a$` and `$\mu_b$` are the means of group A and B.

- `$\sigma^2_a$` and `$\sigma^2_b$` are the squared standard deviations (i.e. variances) of group A and B.

- `$n_a$` and `$n_b$` are the sample sizes of group A and B.

---

```r
mono <- rnorm(n = 100, mean = 620, sd = 200)
mono
```

```
##   [1]  633.4392  752.2582  622.6836  610.4020  783.9224  378.9738  979.2863
##   [8]  891.5091  593.8808  516.6690  645.9104  912.1761  370.3845  520.3164
##  [15]  611.6311  527.5961  451.6295  694.7455  792.2663  586.6827  774.7792
##  [22]  419.2396  646.1119  803.3226  648.2603  314.2178  570.4713  296.9715
##  [29]  398.2627  609.8286  509.4211  751.1529  491.1971  785.5648  697.0619
##  [36]  499.1004  932.7318  820.9807  884.7186  820.3825  411.5537  727.6397
##  [43]  142.3023  382.6853  880.6865  715.6209 1061.3670  487.7238  415.3841
##  [50]  656.3461  716.4856  227.0040  855.3491  748.4832  479.5161  673.0863
##  [57]  617.9996  544.0142  550.8123  687.1248  727.7789  847.8614  631.0109
##  [64]  736.1269  685.9042  811.5472  787.8352  448.1374  832.4290  512.2818
##  [71]  807.5913  608.2524  285.2201  623.3032  921.1473  586.2498  746.5308
##  [78]  765.1335  509.0301  627.8762  584.6256  749.6515  491.6601  746.5627
##  [85]  562.3107  507.2156  907.1254  361.0339  491.9959  730.1587  129.0729
##  [92]  790.7218  834.4099  779.9508  960.2928  832.7353  504.3768  424.6288
##  [99]  986.2496  936.4772
```

---

```r
mono <- rnorm(n = 100, mean = 620, sd = 200)
bi <- rnorm(100, 680, 200)

exp <- tibble(
 rt = c(mono, bi),
 group = rep(c("mono", "bi"), each = 100)
)
exp
```

```
## # A tibble: 200 × 2
## rt group
## <dbl> <chr>
## 1 338. mono 
## 2 834. mono 
## 3 510. mono 
## 4 735. mono 
## 5 629. mono 
## 6 654. mono 
## 7 640. mono 
## 8 733. mono 
## 9 292. mono 
## 10 816. mono 
## # … with 190 more rows
```

---

---

`$$t = \frac{\mu_b - \mu_a}{\sqrt{\frac{\sigma^2_a}{n_a} + \frac{\sigma^2_b}{n_b}}}$$`

`$$t = \frac{680 - 620}{\sqrt{\frac{200^2}{100} + \frac{200^2}{100}}}$$`

`$$t = 2.12132$$`

---

---

---

## *p*-value

---

---

???

0.0176256

---

---

---

## Two-tailed *t*-test and *p*-value

- Difference between means: 60 ms.

- *t*-statistic: 2.12.

- *p*-value: 0.0352

**There is a 3.5% probability that we would find a difference that is 60 ms or more, assuming that the null hypothesis (`$H_0: \mu_a-\mu_b = 0$`) is true.**

---

## The `$\alpha$`-level

We need to **set a threshold**, i.e. a value of *p* below which we decide to **reject the null hypothesis**.

This threshold is known as the **α-level** and for most purposes:

In other words:

- If *p* < 0.05: we reject the null hypothesis.

- If *p* ≥ 0.05: we cannot reject the null hypothesis.

---

## Null Hypothesis Significance Testing

---

This method of statistical inference is called **Null Hypothesis Significance Testing**, or NHST, or frequentist approach.

- The difference between two means is **significant** if the *p*-value is smaller than 0.05 (the α-level).

- The difference between two means is **not significant** if the *p*-value is equal to or greater than 0.05 (the α-level).

- If the difference between two means is significant, then we can reject the null hypothesis. If it is not significant, we canno reject the null hypothesis.

---

**IMPORTANT**:

- We can **only either reject or not reject the null hypothesis**!

- **NHST does not allow us** to make statements about the alternative hypothesis.

- A significant result (i.e. difference or effect) does not mean that the alternative hypothesis is correct.
  
  - It only indicates that the result is compatible with the alternative hypothesis, but does not provide evidence for it.

- **NHST also does not allow us to accept** the null hypothesis, only reject it.

- ***p*-values are sensitive to sample size**: they decrease with increasing sample size.

- In other words, a non-significant *p*-value will become significant with greater a sample size.

- NHST is somewhat **perverse**, or at least very **counterintuitive**.

---

## *p*-values and linear models

---

- `g/lm()` report *p*-values by default.

- `g/lmer()` report *p*-values when the lmerTest package is attached.

---

```r
mald_lm_1 <- lm(RT ~ IsWord * PhonLev, data = mald_1_1)

summary(mald_lm_1)
```

```
## 
## Call:
## lm(formula = RT ~ IsWord * PhonLev, data = mald_1_1)
## 
## Residuals:
## Min 1Q Median 3Q Max 
## -938.58 -193.05 -73.39 100.74 1987.42 
## 
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 664.191 34.044 19.510 < 2e-16
## IsWordFALSE 230.167 48.895 4.707 2.58e-06
## PhonLev 40.903 4.741 8.627 < 2e-16
## IsWordFALSE:PhonLev -16.290 6.788 -2.400 0.0164
## 
## Residual standard error: 309.5 on 4996 degrees of freedom
## Multiple R-squared: 0.05244,	Adjusted R-squared: 0.05187 
## F-statistic: 92.17 on 3 and 4996 DF, p-value: < 2.2e-16
```

---

```r
library(lmerTest)
mald_lm_2 <- lmer(RT ~ IsWord * PhonLev + (IsWord | Subject), data = mald_1_1)

tidy(mald_lm_2)
```

```
## # A tibble: 8 × 8
## effect group term estim…¹ std.e…² stati…³ df p.value
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 fixed <NA> (Intercept) 661. 34.2 19.4 1269. 2.42e-73
## 2 fixed <NA> IsWordFALSE 195. 46.7 4.18 3024. 2.97e- 5
## 3 fixed <NA> PhonLev 41.6 4.39 9.46 4934. 4.45e-21
## 4 fixed <NA> IsWordFALSE:PhonLev -12.1 6.30 -1.93 4934. 5.42e- 2
## 5 ran_pars Subject sd__(Intercept) 93.4 NA NA NA NA 
## 6 ran_pars Subject cor__(Intercept).Is… 0.383 NA NA NA NA 
## 7 ran_pars Subject sd__IsWordFALSE 77.9 NA NA NA NA 
## 8 ran_pars Residual sd__Observation 285. NA NA NA NA 
## # … with abbreviated variable names ¹estimate, ²std.error, ³statistic
```

---

### Reporting

We fitted a linear model with a Gaussian distribution to reaction times in milliseconds. As predictors we have included whether the word is a real or a nonce word (`IsWord`), phonemic Levinshtein distance (`PhonLev`) and the interaction between the two.
A by-subject varying intercept was also included together with a by-subject varying slope for `IsWord`. Including a varying slope for PhonLev led to non-convergence, so it was dropped.
*P*-values were obtained with the lmerTest package (Kuznetsova et al. 2017) using the Satterthwaite's approximation of degrees of freedom.

According to the model, the mean reaction time with real words and Levenshtein distance equal 0 is about 661 ms (SE = 34, *t* = 19.356, df = 1269.461, *p* < 0.001).
When the word is a nonce word, reaction times increase by 195 ms (SE = 47, *t* = 4.182, df = 3024.393, *p* < 0.001).
The effect of Levenshtein distance is an increase of about 42 ms in reaction time for every unit increase (SE = 4.4, *t* = 9.464, df = 4933.716, *p* < 0.001).
The interaction between IsWord and PhonLev indicates an effect decrease of 12 ms, although it is not significant (`$\beta$` = -12, SE = 6.3, *t* = -1.926, df = 4934.293, *p* = 0.0542).

---

```r
ggpredict(mald_lm_2, terms = c("PhonLev [all]", "IsWord")) %>% plot()
```

---

```r
mald_lm_3 <- glmer(ACC ~ IsWord * PhonLev + (IsWord | Subject), data = mald_1_1, family = binomial())

tidy(mald_lm_3)
```

```
## # A tibble: 7 × 7
## effect group term estim…¹ std.e…² stati…³ p.value
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 fixed <NA> (Intercept) 1.67 0.386 4.32 1.56e-5
## 2 fixed <NA> IsWordFALSE -2.83 0.630 -4.49 7.01e-6
## 3 fixed <NA> PhonLev 0.0774 0.0534 1.45 1.47e-1
## 4 fixed <NA> IsWordFALSE:PhonLev 0.405 0.0885 4.57 4.82e-6
## 5 ran_pars Subject sd__(Intercept) 0.558 NA NA NA 
## 6 ran_pars Subject cor__(Intercept).IsWordFALSE -0.892 NA NA NA 
## 7 ran_pars Subject sd__IsWordFALSE 1.18 NA NA NA 
## # … with abbreviated variable names ¹estimate, ²std.error, ³statistic
```

---

### Reporting

We fitted a linear model with a Bernoulli distribution (aka binomial or logistic regression) to accuracy. As predictors we have included whether the word is a real or a nonce word (`IsWord`), phonemic Levinshtein distance (`PhonLev`) and the interaction between the two.
A by-subject varying intercept was also included together with a by-subject varying slope for `IsWord`. Including a varying slope for PhonLev led to non-convergence, so it was dropped.
*P*-values were obtained with the lmerTest package (Kuznetsova et al. 2017) using the Satterthwaite's approximation of degrees of freedom.

According to the model, the mean accuracy with real words and Levenshtein distance equal 0 is about 84% (`$\beta$` = 1.67, SE = 0.386, *z* = 4.321, *p* < 0.001).
When the word is a nonce word (and Levenshtein distance is 0), accuracy decreases to 57% (`$\beta$` = -2.83, SE = 0.63, *z* = -4.493, *p* < 0.001).
The effect of Levenshtein distance on accuracy in real words correponds to an increase of 0.077 log-odds although it is not significant (SE = 0.053, *z* = 1.451, *p* = 0.147).
The interaction between IsWord and PhonLev indicates there is an effect increase of 0.4 log-odds with nonce words (SE = 0.088, *z* = 4.572, *p* < 0.001).

---

```r
ggpredict(mald_lm_3, terms = c("PhonLev [all]", "IsWord")) %>% plot()
```