Statistics and Quantitative Methods (S1)

Week 9

Dr Stefano Coretta

University of Edinburgh

2022/11/15

Difference between two groups

We have two hypotheses:

  • Null Hypothesis: the difference between the means of Group A and B is 0 (i.e. there is no difference).

  • Alternative Hypothesis: the difference between the means of Group A and B is not 0.


$$H_0: \mu_a - \mu_b = 0$$

$$H_1: \mu_a - \mu_b \neq 0$$

Student's t-statistic

  • We need a standardised measure of difference.

  • One such measure is the Student's t-statistic.

$$t = \frac{\mu_b - \mu_a}{\sqrt{\frac{\sigma^2_a}{n_a} + \frac{\sigma^2_b}{n_b}}}$$

where:

  • $\mu_a$ and $\mu_b$ are the means of group A and B.

  • $\sigma^2_a$ and $\sigma^2_b$ are the squared standard deviations (i.e. variances) of group A and B.

  • $n_a$ and $n_b$ are the sample sizes of group A and B.
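
As a minimal sketch, the formula can be written as a small R function. The function name `t_statistic()`, its argument names and the example values are made up for illustration; they are not part of any package or of the data used in these slides.

```r
# Hypothetical helper (not from any package): the t-statistic for the
# difference between two group means, following the formula above.
t_statistic <- function(mean_a, mean_b, sd_a, sd_b, n_a, n_b) {
  # Standard error of the difference: square root of the sum of each
  # group's variance divided by its sample size
  se_diff <- sqrt(sd_a^2 / n_a + sd_b^2 / n_b)
  (mean_b - mean_a) / se_diff
}

# Made-up values: means of 500 and 550, both SDs 100, 50 observations per group
t_statistic(mean_a = 500, mean_b = 550, sd_a = 100, sd_b = 100, n_a = 50, n_b = 50)
# 50 / sqrt(200 + 200) = 50 / 20 = 2.5
```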


Student's t-statistic

# Simulate reaction times (ms) for 100 monolingual participants
mono <- rnorm(n = 100, mean = 620, sd = 200)
mono
## [1] 633.4392 752.2582 622.6836 610.4020 783.9224 378.9738 979.2863
## [8] 891.5091 593.8808 516.6690 645.9104 912.1761 370.3845 520.3164
## [15] 611.6311 527.5961 451.6295 694.7455 792.2663 586.6827 774.7792
## [22] 419.2396 646.1119 803.3226 648.2603 314.2178 570.4713 296.9715
## [29] 398.2627 609.8286 509.4211 751.1529 491.1971 785.5648 697.0619
## [36] 499.1004 932.7318 820.9807 884.7186 820.3825 411.5537 727.6397
## [43] 142.3023 382.6853 880.6865 715.6209 1061.3670 487.7238 415.3841
## [50] 656.3461 716.4856 227.0040 855.3491 748.4832 479.5161 673.0863
## [57] 617.9996 544.0142 550.8123 687.1248 727.7789 847.8614 631.0109
## [64] 736.1269 685.9042 811.5472 787.8352 448.1374 832.4290 512.2818
## [71] 807.5913 608.2524 285.2201 623.3032 921.1473 586.2498 746.5308
## [78] 765.1335 509.0301 627.8762 584.6256 749.6515 491.6601 746.5627
## [85] 562.3107 507.2156 907.1254 361.0339 491.9959 730.1587 129.0729
## [92] 790.7218 834.4099 779.9508 960.2928 832.7353 504.3768 424.6288
## [99] 986.2496 936.4772

Student's t-statistic

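library(tidyverse)

# Simulate reaction times (ms) for two groups of 100 participants each
mono <- rnorm(n = 100, mean = 620, sd = 200)
bi <- rnorm(n = 100, mean = 680, sd = 200)

# Combine the two samples into one tibble, with a column identifying the group
exp <- tibble(
  rt = c(mono, bi),
  group = rep(c("mono", "bi"), each = 100)
)
exp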
## # A tibble: 200 × 2
##       rt group
##    <dbl> <chr>
##  1  338. mono
##  2  834. mono
##  3  510. mono
##  4  735. mono
##  5  629. mono
##  6  654. mono
##  7  640. mono
##  8  733. mono
##  9  292. mono
## 10  816. mono
## # … with 190 more rows
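
R's built-in `t.test()` computes a t-statistic directly from data like these. The sketch below re-simulates the two groups and checks the result against the formula applied to the sample means and variances. The seed value is arbitrary and a base R data frame (here called `rt_data`, a made-up name) stands in for the tibble, so the numbers will differ from the output above.

```r
set.seed(2022)  # arbitrary seed, only so that the sketch is reproducible

# Re-simulate the two groups of reaction times (ms)
mono <- rnorm(n = 100, mean = 620, sd = 200)
bi <- rnorm(n = 100, mean = 680, sd = 200)
rt_data <- data.frame(
  rt = c(mono, bi),
  group = rep(c("mono", "bi"), each = 100)
)

# Welch's two-sample t-test (the default of t.test()); group levels are
# ordered alphabetically, so the difference is computed as bi minus mono
welch <- t.test(rt ~ group, data = rt_data)
welch$statistic

# The same value from the formula, using the sample means and variances
t_manual <- (mean(bi) - mean(mono)) / sqrt(var(mono) / 100 + var(bi) / 100)
t_manual
```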

Student's t-statistic

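$$t = \frac{\mu_b - \mu_a}{\sqrt{\frac{\sigma^2_a}{n_a} + \frac{\sigma^2_b}{n_b}}}$$

$$t = \frac{680 - 620}{\sqrt{\frac{200^2}{100} + \frac{200^2}{100}}}$$

$$t = 2.12132$$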

p-value

One-tailed p-value: 0.0176256

Two-tailed t-test and p-value

  • Difference between means: 60 ms.

  • t-statistic: 2.12.

  • p-value: 0.0352


Assuming that the null hypothesis ($H_0: \mu_a - \mu_b = 0$) is true, there is a 3.5% probability of finding a difference of 60 ms or more in either direction.
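
A rough sketch of how a p-value like 0.0352 can be obtained from the t-statistic in R, using `pt()`, the cumulative distribution function of the t-distribution. The degrees of freedom are an assumption here (100 + 100 − 2 = 198, as in a classical two-sample t-test); the slides do not state which value was used.

```r
t_value <- 2.12132
dof <- 198  # assumed degrees of freedom: n_a + n_b - 2

# Probability of a t-value at least this large in one tail
p_one_tailed <- pt(t_value, df = dof, lower.tail = FALSE)

# Two-tailed test: differences in either direction count,
# so the tail area is doubled
p_two_tailed <- 2 * p_one_tailed
p_two_tailed  # about 0.035
```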


If p is small enough, we can reject the null hypothesis (that there is no difference).

But how small is small enough?

The α-level

We need to set a threshold, i.e. a value of p below which we decide to reject the null hypothesis.


This threshold is known as the α-level and for most purposes:

α=0.05


In other words:

  • If p < 0.05: we reject the null hypothesis.

  • If p ≥ 0.05: we cannot reject the null hypothesis.
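
The decision rule can be sketched as a small R function. `nhst_decision()` is a hypothetical helper written for this illustration, and the p-values passed to it are examples only.

```r
# Hypothetical helper: turn p-values into the NHST decision at a given alpha
nhst_decision <- function(p, alpha = 0.05) {
  ifelse(p < alpha, "reject H0", "cannot reject H0")
}

nhst_decision(c(0.0352, 0.05, 0.2))
# "reject H0" "cannot reject H0" "cannot reject H0"
```

Note that a p-value exactly equal to the α-level does not lead to rejection, because the rule uses a strict inequality.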

Null Hypothesis Significance Testing

This method of statistical inference is called Null Hypothesis Significance Testing, or NHST, or the frequentist approach.

  • The difference between two means is significant if the p-value is smaller than 0.05 (the α-level).

  • The difference between two means is not significant if the p-value is equal to or greater than 0.05 (the α-level).

  • If the difference between two means is significant, we can reject the null hypothesis. If it is not significant, we cannot reject the null hypothesis.

Null Hypothesis Significance Testing

IMPORTANT:

  • We can only either reject or not reject the null hypothesis!

  • NHST does not allow us to make statements about the alternative hypothesis.

    • A significant result (i.e. difference or effect) does not mean that the alternative hypothesis is correct.

    • It only indicates that the result is compatible with the alternative hypothesis, but does not provide evidence for it.

  • NHST also does not allow us to accept the null hypothesis, only reject it.

  • p-values are sensitive to sample size: for a given non-zero difference, they tend to decrease as the sample size increases.

    • In other words, a non-significant difference can become significant with a large enough sample.

  • NHST is somewhat perverse, or at least very counterintuitive.
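
The dependence on sample size can be illustrated with the formula for the t-statistic, keeping the difference (60 ms) and the standard deviations (200 ms) of the earlier example fixed and varying only the number of participants per group. The sample sizes and the use of $n_a + n_b - 2$ degrees of freedom are choices made for this sketch.

```r
mean_diff <- 60           # difference between the group means (ms)
sd_group <- 200           # standard deviation of both groups (ms)
n <- c(10, 50, 100, 500)  # participants per group (illustrative values)

# t-statistic and two-tailed p-value for each sample size
t_value <- mean_diff / sqrt(sd_group^2 / n + sd_group^2 / n)
p_value <- 2 * pt(t_value, df = 2 * n - 2, lower.tail = FALSE)

data.frame(n = n, t = round(t_value, 2), p = round(p_value, 4))
```

Under these assumptions, the same 60 ms difference is not significant with 10 or 50 participants per group, but it is significant with 100 or 500.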


p-values and linear models

  • lm() and glm() report p-values by default.

  • lmer() reports p-values when the lmerTest package is attached; glmer() reports p-values from Wald z-tests by default.


p-values and linear models

mald_lm_1 <- lm(RT ~ IsWord * PhonLev, data = mald_1_1)
summary(mald_lm_1)
##
## Call:
## lm(formula = RT ~ IsWord * PhonLev, data = mald_1_1)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -938.58 -193.05  -73.39  100.74 1987.42
##
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)          664.191     34.044  19.510  < 2e-16
## IsWordFALSE          230.167     48.895   4.707 2.58e-06
## PhonLev               40.903      4.741   8.627  < 2e-16
## IsWordFALSE:PhonLev  -16.290      6.788  -2.400   0.0164
##
## Residual standard error: 309.5 on 4996 degrees of freedom
## Multiple R-squared: 0.05244, Adjusted R-squared: 0.05187
## F-statistic: 92.17 on 3 and 4996 DF, p-value: < 2.2e-16
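
The coefficient table, including the p-values, can also be extracted from the summary as a matrix. Since the MALD data are not available here, the sketch below uses R's built-in `mtcars` data as a stand-in; the model and the object names are only illustrative.

```r
# Stand-in model on a built-in dataset (not the MALD data used in the slides)
cars_lm <- lm(mpg ~ wt * am, data = mtcars)

# Coefficient table as a matrix: estimate, standard error, t value and p-value
coef_table <- coef(summary(cars_lm))
coef_table

# The p-values alone, and whether each is below the alpha level of 0.05
p_values <- coef_table[, "Pr(>|t|)"]
p_values < 0.05
```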

p-values and linear models

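library(lmerTest)     # attaches lme4 and adds p-values to lmer() output
library(broom.mixed)  # provides tidy() for mixed-effects models
mald_lm_2 <- lmer(RT ~ IsWord * PhonLev + (IsWord | Subject), data = mald_1_1)
tidy(mald_lm_2)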
## # A tibble: 8 × 8
##   effect   group    term                 estim…¹ std.e…² stati…³     df   p.value
##   <chr>    <chr>    <chr>                  <dbl>   <dbl>   <dbl>  <dbl>     <dbl>
## 1 fixed    <NA>     (Intercept)             661.    34.2    19.4  1269.  2.42e-73
## 2 fixed    <NA>     IsWordFALSE             195.    46.7    4.18  3024.  2.97e- 5
## 3 fixed    <NA>     PhonLev                 41.6    4.39    9.46  4934.  4.45e-21
## 4 fixed    <NA>     IsWordFALSE:PhonLev    -12.1    6.30   -1.93  4934.  5.42e- 2
## 5 ran_pars Subject  sd__(Intercept)         93.4      NA      NA     NA        NA
## 6 ran_pars Subject  cor__(Intercept).Is…   0.383      NA      NA     NA        NA
## 7 ran_pars Subject  sd__IsWordFALSE         77.9      NA      NA     NA        NA
## 8 ran_pars Residual sd__Observation         285.      NA      NA     NA        NA
## # … with abbreviated variable names ¹estimate, ²std.error, ³statistic

p-values and linear models

Reporting

We fitted a linear model with a Gaussian distribution to reaction times in milliseconds. As predictors, we included whether the word is a real or a nonce word (IsWord), phonemic Levenshtein distance (PhonLev) and the interaction between the two. A by-subject varying intercept was also included, together with a by-subject varying slope for IsWord. Including a varying slope for PhonLev led to non-convergence, so it was dropped. P-values were obtained with the lmerTest package (Kuznetsova et al. 2017) using Satterthwaite's approximation of degrees of freedom.

According to the model, the mean reaction time with real words and Levenshtein distance equal to 0 is about 661 ms (SE = 34, t = 19.356, df = 1269.461, p < 0.001). When the word is a nonce word, reaction times increase by 195 ms (SE = 47, t = 4.182, df = 3024.393, p < 0.001). The effect of Levenshtein distance is an increase of about 42 ms in reaction time for every unit increase (SE = 4.4, t = 9.464, df = 4933.716, p < 0.001). The interaction between IsWord and PhonLev indicates that the effect of Levenshtein distance is about 12 ms smaller with nonce words, although the interaction is not significant (β = -12, SE = 6.3, t = -1.926, df = 4934.293, p = 0.0542).


p-values and linear models

library(ggeffects)  # provides ggpredict()
ggpredict(mald_lm_2, terms = c("PhonLev [all]", "IsWord")) %>% plot()


p-values and linear models

mald_lm_3 <- glmer(ACC ~ IsWord * PhonLev + (IsWord | Subject), data = mald_1_1, family = binomial())
tidy(mald_lm_3)
## # A tibble: 7 × 7
##   effect   group   term                         estim…¹ std.e…² stati…³   p.value
##   <chr>    <chr>   <chr>                          <dbl>   <dbl>   <dbl>     <dbl>
## 1 fixed    <NA>    (Intercept)                     1.67   0.386    4.32   1.56e-5
## 2 fixed    <NA>    IsWordFALSE                    -2.83   0.630   -4.49   7.01e-6
## 3 fixed    <NA>    PhonLev                       0.0774  0.0534    1.45   1.47e-1
## 4 fixed    <NA>    IsWordFALSE:PhonLev            0.405  0.0885    4.57   4.82e-6
## 5 ran_pars Subject sd__(Intercept)                0.558      NA      NA        NA
## 6 ran_pars Subject cor__(Intercept).IsWordFALSE  -0.892      NA      NA        NA
## 7 ran_pars Subject sd__IsWordFALSE                 1.18      NA      NA        NA
## # … with abbreviated variable names ¹estimate, ²std.error, ³statistic

p-values and linear models

Reporting

We fitted a linear model with a Bernoulli distribution (aka binomial or logistic regression) to accuracy. As predictors, we included whether the word is a real or a nonce word (IsWord), phonemic Levenshtein distance (PhonLev) and the interaction between the two. A by-subject varying intercept was also included, together with a by-subject varying slope for IsWord. Including a varying slope for PhonLev led to non-convergence, so it was dropped. P-values were obtained from the Wald z-tests reported by glmer().

According to the model, the mean accuracy with real words and Levenshtein distance equal to 0 is about 84% (β = 1.67, SE = 0.386, z = 4.321, p < 0.001). When the word is a nonce word (and Levenshtein distance is 0), accuracy decreases to about 24% (β = -2.83, SE = 0.63, z = -4.493, p < 0.001). The effect of Levenshtein distance on accuracy in real words corresponds to an increase of 0.077 log-odds for every unit increase, although it is not significant (SE = 0.053, z = 1.451, p = 0.147). The interaction between IsWord and PhonLev indicates that the effect of Levenshtein distance is 0.4 log-odds greater with nonce words (SE = 0.088, z = 4.572, p < 0.001).
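
Percentages like these come from converting log-odds to probabilities with the inverse logit function, which is `plogis()` in R. A minimal sketch using the rounded estimates from the model output above (the variable names are made up for illustration):

```r
intercept <- 1.67  # log-odds of a correct response: real word, PhonLev = 0
b_nonce <- -2.83   # change in log-odds for nonce words (IsWordFALSE)

# Real words at PhonLev = 0
plogis(intercept)  # about 0.84

# Nonce words at PhonLev = 0: add the IsWordFALSE coefficient to the
# intercept first, then convert the sum to a probability
plogis(intercept + b_nonce)
```

The coefficients have to be summed on the log-odds scale before the conversion, since converting each coefficient separately does not give meaningful probabilities.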


p-values and linear models

ggpredict(mald_lm_3, terms = c("PhonLev [all]", "IsWord")) %>% plot()
