We have two hypotheses:
Null Hypothesis: the difference between the means of Group A and B is 0 (i.e. there is no difference).
Alternative Hypothesis: the difference between the means of Group A and B is not 0.
$$H_0: \mu_a - \mu_b = 0$$
$$H_1: \mu_a - \mu_b \neq 0$$
We need a standardised measure of difference.
One such measure is Student's t-statistic.
$$t = \frac{\mu_b - \mu_a}{\sqrt{\frac{\sigma^2_a}{n_a} + \frac{\sigma^2_b}{n_b}}}$$
where:
$\mu_a$ and $\mu_b$ are the means of group A and B.
$\sigma^2_a$ and $\sigma^2_b$ are the squared standard deviations (i.e. variances) of group A and B.
$n_a$ and $n_b$ are the sample sizes of group A and B.
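The formula can be sketched as a small R function. This is a minimal illustration, not code from the slides: the helper name `t_statistic()` and the two short vectors are hypothetical, and the means and variances are estimated from the samples.

```r
# Hypothetical helper: t-statistic for two independent samples,
# i.e. the difference between the means divided by its standard error.
t_statistic <- function(a, b) {
  (mean(b) - mean(a)) / sqrt(var(a) / length(a) + var(b) / length(b))
}

# Made-up reaction times (ms), for illustration only
a <- c(610, 640, 590, 655, 605)
b <- c(670, 700, 660, 690, 680)

t_statistic(a, b)

# Base R's t.test() (Welch version, the default) computes the same statistic
t.test(b, a)$statistic
```

The sign of t depends only on which group is subtracted from which; here it follows the formula, with group A subtracted from group B.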
# Simulate 100 reaction times (ms) from a normal distribution
mono <- rnorm(n = 100, mean = 620, sd = 200)
mono
##   [1]  633.4392  752.2582  622.6836  610.4020  783.9224  378.9738  979.2863
##   [8]  891.5091  593.8808  516.6690  645.9104  912.1761  370.3845  520.3164
##  [15]  611.6311  527.5961  451.6295  694.7455  792.2663  586.6827  774.7792
##  [22]  419.2396  646.1119  803.3226  648.2603  314.2178  570.4713  296.9715
##  [29]  398.2627  609.8286  509.4211  751.1529  491.1971  785.5648  697.0619
##  [36]  499.1004  932.7318  820.9807  884.7186  820.3825  411.5537  727.6397
##  [43]  142.3023  382.6853  880.6865  715.6209 1061.3670  487.7238  415.3841
##  [50]  656.3461  716.4856  227.0040  855.3491  748.4832  479.5161  673.0863
##  [57]  617.9996  544.0142  550.8123  687.1248  727.7789  847.8614  631.0109
##  [64]  736.1269  685.9042  811.5472  787.8352  448.1374  832.4290  512.2818
##  [71]  807.5913  608.2524  285.2201  623.3032  921.1473  586.2498  746.5308
##  [78]  765.1335  509.0301  627.8762  584.6256  749.6515  491.6601  746.5627
##  [85]  562.3107  507.2156  907.1254  361.0339  491.9959  730.1587  129.0729
##  [92]  790.7218  834.4099  779.9508  960.2928  832.7353  504.3768  424.6288
##  [99]  986.2496  936.4772
library(tibble)

# Simulate both groups and combine them in a single data frame
mono <- rnorm(n = 100, mean = 620, sd = 200)
bi <- rnorm(100, 680, 200)
exp <- tibble(
  rt = c(mono, bi),
  group = rep(c("mono", "bi"), each = 100)
)
exp
## # A tibble: 200 × 2
##       rt group
##    <dbl> <chr>
##  1  338. mono
##  2  834. mono
##  3  510. mono
##  4  735. mono
##  5  629. mono
##  6  654. mono
##  7  640. mono
##  8  733. mono
##  9  292. mono
## 10  816. mono
## # … with 190 more rows
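$$t = \frac{\mu_b - \mu_a}{\sqrt{\frac{\sigma^2_a}{n_a} + \frac{\sigma^2_b}{n_b}}}$$
$$t = \frac{680 - 620}{\sqrt{\frac{200^2}{100} + \frac{200^2}{100}}}$$
$$t = 2.12132$$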
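One-tailed probability of a t-value this large under the null hypothesis: 0.0176256. Doubling it gives the two-tailed p-value, approximately 0.0352.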
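The calculation above can be reproduced in R. This is a sketch under two assumptions that the slides do not state: the degrees of freedom are taken to be n_a + n_b - 2 = 198, and the t-value is rounded to two decimals before looking up the probability, which appears to match the figures shown.

```r
# Parameters of the two simulated groups (from the rnorm() calls)
mean_a <- 620; sd_a <- 200; n_a <- 100  # mono
mean_b <- 680; sd_b <- 200; n_b <- 100  # bi

# Difference between the means divided by its standard error
t_stat <- (mean_b - mean_a) / sqrt(sd_a^2 / n_a + sd_b^2 / n_b)

# Assumption: degrees of freedom for two equally sized groups
dof <- n_a + n_b - 2

# Upper-tail probability of the (rounded) t-value under the null hypothesis
p_one_tailed <- pt(round(t_stat, 2), df = dof, lower.tail = FALSE)

# The alternative hypothesis is two-sided, so both tails are counted
p_two_tailed <- 2 * p_one_tailed

c(t = t_stat, p_one_tailed = p_one_tailed, p_two_tailed = p_two_tailed)
```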
Difference between means: 60 ms.
t-statistic: 2.12.
p-value: 0.0352
There is a 3.5% probability of finding a difference of 60 ms or more (in either direction), assuming that the null hypothesis ($H_0: \mu_a - \mu_b = 0$) is true.
If p is small enough, we can reject the null hypothesis (that there is no difference).
But how small is small enough?
We need to set a threshold, i.e. a value of p below which we decide to reject the null hypothesis.
This threshold is known as the α-level. By convention, it is usually set to:
$$\alpha = 0.05$$
In other words:
If p < 0.05: we reject the null hypothesis.
If p ≥ 0.05: we cannot reject the null hypothesis.
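The decision rule can be sketched in R with the simulated groups. The seed, the use of base R's `t.test()` (Welch version by default) and the object names `res` and `decision` are assumptions made for illustration; the slides do not show this step.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible

# Simulate the two groups as in the slides
mono <- rnorm(n = 100, mean = 620, sd = 200)
bi <- rnorm(n = 100, mean = 680, sd = 200)

alpha <- 0.05  # the conventional threshold

# Two-sided test of the difference between the two means
res <- t.test(bi, mono)

# Apply the rule: reject H0 only if p falls below alpha
decision <- if (res$p.value < alpha) {
  "reject the null hypothesis"
} else {
  "cannot reject the null hypothesis"
}

res$p.value
decision
```

Because the data are random draws, the p-value and therefore the decision can change from one simulated sample to the next, even though the underlying means stay the same.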
This method of statistical inference is called Null Hypothesis Significance Testing, or NHST, or the frequentist approach.
The difference between two means is significant if the p-value is smaller than 0.05 (the α-level).
The difference between two means is not significant if the p-value is equal to or greater than 0.05 (the α-level).
If the difference between two means is significant, then we can reject the null hypothesis. If it is not significant, we cannot reject the null hypothesis.
IMPORTANT:
We can only either reject or not reject the null hypothesis!
NHST does not allow us to make statements about the alternative hypothesis.
A significant result (i.e. difference or effect) does not mean that the alternative hypothesis is correct.
It only indicates that the result is compatible with the alternative hypothesis, but does not provide evidence for it.
NHST also does not allow us to accept the null hypothesis, only reject it.
p-values are sensitive to sample size: for the same observed difference and variability, they decrease as the sample size increases.
NHST is somewhat perverse, or at least very counterintuitive.
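The sensitivity of p-values to sample size can be illustrated with the same 60 ms difference and standard deviation of 200 ms used earlier. This is a minimal sketch: the sample sizes are arbitrary, and the degrees of freedom are assumed to be 2n - 2 for two groups of n observations each.

```r
diff_means <- 60  # difference between the two group means (ms)
sd_both <- 200    # standard deviation of each group (ms)

# Arbitrary per-group sample sizes, for illustration
n <- c(25, 50, 100, 200, 400)

# t-statistic for two groups of equal size n and equal variance
t_stat <- diff_means / sqrt(2 * sd_both^2 / n)

# Two-tailed p-value, assuming 2n - 2 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = 2 * n - 2, lower.tail = FALSE)

data.frame(n = n, t = round(t_stat, 2), p = signif(p_value, 3))
```

The difference between the means never changes, yet the p-value shrinks as n grows, so a significant result says nothing by itself about how large or important the difference is.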
lm() and glm() report p-values by default.
lmer() reports p-values when the lmerTest package is attached; glmer() reports p-values from Wald z-tests by default.
mald_lm_1 <- lm(RT ~ IsWord * PhonLev, data = mald_1_1)
summary(mald_lm_1)
## 
## Call:
## lm(formula = RT ~ IsWord * PhonLev, data = mald_1_1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -938.58 -193.05  -73.39  100.74 1987.42 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)          664.191     34.044  19.510  < 2e-16
## IsWordFALSE          230.167     48.895   4.707 2.58e-06
## PhonLev               40.903      4.741   8.627  < 2e-16
## IsWordFALSE:PhonLev  -16.290      6.788  -2.400   0.0164
## 
## Residual standard error: 309.5 on 4996 degrees of freedom
## Multiple R-squared:  0.05244, Adjusted R-squared:  0.05187 
## F-statistic: 92.17 on 3 and 4996 DF,  p-value: < 2.2e-16
library(lmerTest)
library(broom.mixed)  # provides tidy() for lmer/glmer fits

mald_lm_2 <- lmer(RT ~ IsWord * PhonLev + (IsWord | Subject), data = mald_1_1)
tidy(mald_lm_2)
## # A tibble: 8 × 8
##   effect   group    term                 estim…¹ std.e…² stati…³    df   p.value
##   <chr>    <chr>    <chr>                  <dbl>   <dbl>   <dbl> <dbl>     <dbl>
## 1 fixed    <NA>     (Intercept)          661.      34.2    19.4  1269.  2.42e-73
## 2 fixed    <NA>     IsWordFALSE          195.      46.7     4.18 3024.  2.97e- 5
## 3 fixed    <NA>     PhonLev               41.6      4.39    9.46 4934.  4.45e-21
## 4 fixed    <NA>     IsWordFALSE:PhonLev  -12.1      6.30   -1.93 4934.  5.42e- 2
## 5 ran_pars Subject  sd__(Intercept)       93.4     NA      NA      NA  NA
## 6 ran_pars Subject  cor__(Intercept).Is…   0.383   NA      NA      NA  NA
## 7 ran_pars Subject  sd__IsWordFALSE       77.9     NA      NA      NA  NA
## 8 ran_pars Residual sd__Observation      285.      NA      NA      NA  NA
## # … with abbreviated variable names ¹estimate, ²std.error, ³statistic
We fitted a linear model with a Gaussian distribution to reaction times in milliseconds. As predictors, we included whether the word is a real or a nonce word (IsWord), phonemic Levenshtein distance (PhonLev) and the interaction between the two.
A by-subject varying intercept was also included, together with a by-subject varying slope for IsWord. Including a varying slope for PhonLev led to non-convergence, so it was dropped.
P-values were obtained with the lmerTest package (Kuznetsova et al. 2017) using Satterthwaite's approximation of degrees of freedom.
According to the model, the mean reaction time for real words at a Levenshtein distance of 0 is about 661 ms (SE = 34, t = 19.356, df = 1269.461, p < 0.001). When the word is a nonce word, reaction times increase by 195 ms (SE = 47, t = 4.182, df = 3024.393, p < 0.001). The effect of Levenshtein distance is an increase of about 42 ms in reaction time for every unit increase (SE = 4.4, t = 9.464, df = 4933.716, p < 0.001). The interaction between IsWord and PhonLev indicates that this effect is about 12 ms smaller for nonce words, although the interaction is not significant (β = -12, SE = 6.3, t = -1.926, df = 4934.293, p = 0.0542).
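The way the fixed-effect estimates combine can be made explicit with a little arithmetic. The sketch below uses the rounded estimates from the model table; the vector `b` and the helper `predict_rt()` are hypothetical names introduced only for illustration, and random effects are ignored.

```r
# Rounded fixed-effect estimates from the model table (ms)
b <- c(intercept = 661, nonce = 195, phonlev = 41.6, interaction = -12.1)

# Hypothetical helper: predicted mean RT from the fixed effects only
predict_rt <- function(is_nonce, phon_lev) {
  unname(
    b["intercept"] +
      b["nonce"] * is_nonce +
      b["phonlev"] * phon_lev +
      b["interaction"] * is_nonce * phon_lev
  )
}

predict_rt(is_nonce = FALSE, phon_lev = 0)  # real word, distance 0
predict_rt(is_nonce = TRUE, phon_lev = 0)   # nonce word, distance 0

# Slope of Levenshtein distance for nonce words: main effect plus interaction
unname(b["phonlev"] + b["interaction"])
```

With treatment coding, each coefficient is a difference from the reference level (real words), so the nonce-word values are obtained by adding the relevant terms to the reference estimates.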
ggpredict(mald_lm_2, terms = c("PhonLev [all]", "IsWord")) %>% plot()
mald_lm_3 <- glmer(ACC ~ IsWord * PhonLev + (IsWord | Subject), data = mald_1_1, family = binomial())
tidy(mald_lm_3)
## # A tibble: 7 × 7
##   effect   group   term                         estim…¹ std.e…² stati…³ p.value
##   <chr>    <chr>   <chr>                          <dbl>   <dbl>   <dbl>   <dbl>
## 1 fixed    <NA>    (Intercept)                   1.67    0.386     4.32 1.56e-5
## 2 fixed    <NA>    IsWordFALSE                  -2.83    0.630    -4.49 7.01e-6
## 3 fixed    <NA>    PhonLev                       0.0774  0.0534    1.45 1.47e-1
## 4 fixed    <NA>    IsWordFALSE:PhonLev           0.405   0.0885    4.57 4.82e-6
## 5 ran_pars Subject sd__(Intercept)               0.558  NA        NA   NA
## 6 ran_pars Subject cor__(Intercept).IsWordFALSE -0.892  NA        NA   NA
## 7 ran_pars Subject sd__IsWordFALSE               1.18   NA        NA   NA
## # … with abbreviated variable names ¹estimate, ²std.error, ³statistic
We fitted a linear model with a Bernoulli distribution (also known as binomial or logistic regression) to accuracy. As predictors, we included whether the word is a real or a nonce word (IsWord), phonemic Levenshtein distance (PhonLev) and the interaction between the two.
A by-subject varying intercept was also included, together with a by-subject varying slope for IsWord. Including a varying slope for PhonLev led to non-convergence, so it was dropped.
P-values are based on the Wald z-tests reported by glmer().
According to the model, the mean accuracy for real words at a Levenshtein distance of 0 is about 84% (β = 1.67, SE = 0.386, z = 4.321, p < 0.001). When the word is a nonce word (and Levenshtein distance is 0), accuracy decreases to about 24% (β = -2.83, SE = 0.63, z = -4.493, p < 0.001). The effect of Levenshtein distance on accuracy in real words corresponds to an increase of 0.077 log-odds per unit, although it is not significant (SE = 0.053, z = 1.451, p = 0.147). The interaction between IsWord and PhonLev indicates that this effect is 0.4 log-odds larger with nonce words (SE = 0.088, z = 4.572, p < 0.001).
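Estimates from a logistic regression are in log-odds, and they can be converted to probabilities with the inverse logit function, `plogis()` in base R. The sketch below uses the rounded estimates from the model table, so the resulting percentages are approximate; the object names are chosen for illustration.

```r
# Rounded fixed-effect estimates from the model table (log-odds)
b_intercept <- 1.67  # real words, Levenshtein distance 0
b_nonce <- -2.83     # change in log-odds for nonce words

# Accuracy for real words at distance 0
p_real <- plogis(b_intercept)            # about 0.84

# Accuracy for nonce words at distance 0: add the coefficients first,
# then convert the sum to a probability
p_nonce <- plogis(b_intercept + b_nonce)  # about 0.24

round(c(real = p_real, nonce = p_nonce), 2)
```

Coefficients have to be summed on the log-odds scale before the conversion; converting each coefficient separately and adding the probabilities would give a wrong answer.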
ggpredict(mald_lm_3, terms = c("PhonLev [all]", "IsWord")) %>% plot()