21  Gaussian models

In the context of a quantitative research study, one simple objective could be to figure out the probability distribution of the variable of interest: Voice Onset Time, number of telic verbs, informativity score, acceptability ratings, reaction times, and so on. Let’s imagine we are interested in understanding more about the nature of reaction times in auditory lexical decision tasks (lexical decision tasks in which the target is presented aurally rather than in writing). We can revisit the RT data from above to try to address the following research question:

RQ: In a typical auditory lexical decision task, what are the mean and standard deviation of reaction times (RTs)?

Now, you might wonder: why the mean and the standard deviation? This is because we are assuming that reaction times (i.e. the population of reaction times, rather than our specific sample) are distributed according to a Gaussian probability distribution. The onus is usually on the researcher to assume a probability distribution family. You will learn some heuristics for picking a distribution family later, but for now the Gaussian family is a safe assumption to make. In statistical notation, we can write:

\[ \text{RT} \sim Gaussian(\mu, \sigma) \]

which you can read as: “reaction times are distributed according to a Gaussian distribution with mean \(\mu\) and standard deviation \(\sigma\)”. So the research question above is about finding the values of \(\mu\) and \(\sigma\).
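To build an intuition for what this assumption means, we can simulate draws from a hypothetical Gaussian distribution of RTs. The values \(\mu\) = 1000 ms and \(\sigma\) = 300 ms below are made up purely for illustration, not estimated from any data:

```r
# Simulate 100,000 RTs from a hypothetical Gaussian(mu = 1000, sigma = 300).
# These parameter values are invented for illustration.
set.seed(42)
rt_sim <- rnorm(1e5, mean = 1000, sd = 300)

# With this many draws, the sample mean and SD land very close to mu and sigma.
sim_mean <- mean(rt_sim)
sim_sd <- sd(rt_sim)
```

The more draws we take, the closer the sample mean and SD get to the true \(\mu\) and \(\sigma\); with a real sample of a few hundred RTs, they can stray quite a bit further, which is exactly why inference is needed.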

For illustration’s sake, let’s assume the sample mean and standard deviation are also the population \(\mu\) and \(\sigma\): \(Gaussian(\mu = 1010, \sigma = 318)\). Figure 21.1 shows the empirical sample probability distribution (in grey) and the theoretical sample probability distribution (in purple) based on the sample mean and SD: in other words, the purple curve is the density curve of the probability distribution \(Gaussian(1010, 318)\). However, we have seen earlier that the sample mean and SD of the RTs from the mald data are biased, due to uncertainty and variability. What we are really after are the values of \(\mu\) and \(\sigma\): the mean and standard deviation of the Gaussian distribution of the population of RTs in auditory lexical decision tasks. In other words, we want to make inferences from the sample to the population of RTs.

Code
library(tidyverse)  # for ggplot2 and tibble
library(glue)

mald <- readRDS("data/tucker2019/mald_1_1.rds")

rt_mean <- mean(mald$RT)
rt_sd <- sd(mald$RT)
rt_mean_text <- glue("mean: {round(rt_mean)} ms")
rt_sd_text <- glue("SD: {round(rt_sd)} ms")
x_int <- 2000

ggplot(data = tibble(x = 0:3000), aes(x)) +
  geom_density(data = mald, aes(RT), colour = "grey", fill = "grey", alpha = 0.2) +
  stat_function(fun = dnorm, n = 101, args = list(rt_mean, rt_sd), colour = "#9970ab", linewidth = 1.5) +
  scale_x_continuous(n.breaks = 5) +
  geom_vline(xintercept = rt_mean, colour = "#1b7837", linewidth = 1) +
  geom_rug(data = mald, aes(RT), alpha = 0.1) +
  annotate(
    "label", x = rt_mean + 1, y = 0.0015,
    label = rt_mean_text,
    fill = "#1b7837", colour = "white"
  ) +
  annotate(
    "label", x = x_int, y = 0.0015,
    label = rt_sd_text,
    fill = "#8c510a", colour = "white"
  ) +
  annotate(
    "label", x = x_int, y = 0.001,
    label = "theoretical sample\ndistribution",
    fill = "#9970ab", colour = "white"
  ) +
  annotate(
    "label", x = x_int, y = 0.0003,
    label = "empirical sample\ndistribution",
    fill = "grey", colour = "white"
  ) +
  labs(
    title = "Theoretical sample distribution of reaction times",
    subtitle = glue("Gaussian distribution: mean = {round(rt_mean)} ms, SD = {round(rt_sd)} ms"),
    x = "RT (ms)", y = "Relative probability (density)"
  )
Figure 21.1

A statistical tool we can use to obtain an estimate of \(\mu\) and \(\sigma\) is a Gaussian model: a statistical model that estimates the values of the parameters of a Gaussian distribution, i.e. \(\mu\) and \(\sigma\). Since the values of the parameters are uncertain, we can estimate a probability distribution of the parameters from the data, rather than just point values. This is what a Bayesian Gaussian model does.
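Later chapters fit such models with dedicated software, but the core idea can be sketched in a few lines of base R: evaluate how plausible each combination of \(\mu\) and \(\sigma\) makes the data, over a grid of candidate values. This is a grid approximation with flat priors, and the simulated `rt_sample` and all grid bounds below are made-up stand-ins, not the mald data:

```r
# Grid-approximation sketch of a Gaussian model (flat priors).
set.seed(1)
rt_sample <- rnorm(50, mean = 1000, sd = 300)  # stand-in sample of RTs

# Candidate values for mu and sigma.
mu_grid <- seq(800, 1200, length.out = 100)
sigma_grid <- seq(150, 500, length.out = 100)
grid <- expand.grid(mu = mu_grid, sigma = sigma_grid)

# Log-likelihood of the sample for each (mu, sigma) pair.
grid$ll <- mapply(
  function(m, s) sum(dnorm(rt_sample, mean = m, sd = s, log = TRUE)),
  grid$mu, grid$sigma
)

# Normalise to get a (discretised) posterior probability for each pair.
grid$post <- exp(grid$ll - max(grid$ll))
grid$post <- grid$post / sum(grid$post)

# Posterior means of mu and sigma: probability-weighted averages.
mu_hat <- sum(grid$mu * grid$post)
sigma_hat <- sum(grid$sigma * grid$post)
```

The output is not two numbers but a whole probability distribution over \((\mu, \sigma)\) pairs, from which summaries like the posterior mean can be computed. Real Bayesian software does the same job far more efficiently and with explicit priors.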

21.1 Prior probability distributions

As mentioned in Chapter 20, the essence of Bayesian inference is updating prior knowledge with data: posterior probability distributions are estimated from prior probability distributions and the data. In practical terms, you need prior probability distributions, or priors for short, and data.

[XXX illustration of prior to posterior]

Priors have to be specified by the researcher.

Parameter values can be modelled as coming from Gaussian distributions themselves. So, for the mean \(\mu\), we can estimate a Gaussian distribution with its own mean and standard deviation, \(\mu_1\) and \(\sigma_1\). For the standard deviation \(\sigma\), we can estimate a half-Gaussian distribution. Why half? Because standard deviations cannot be negative, we simply keep the positive side of a Gaussian distribution centred at 0 (i.e. with mean = 0). In other words, we assume \(\sigma\) to be distributed according to \(Gaussian(0, \sigma_2)\), truncated at zero. We can extend the mathematical notation of the model above to include the distributions of \(\mu\) and \(\sigma\).

\[ \begin{align} \text{RT} & \sim Gaussian(\mu, \sigma)\\ \mu & \sim Gaussian(\mu_1, \sigma_1)\\ \sigma & \sim HalfGaussian(\mu_2 = 0, \sigma_2)\\ \end{align} \]
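To see what such priors imply, we can draw parameter values from them. Folding a zero-centred Gaussian with `abs()` is one simple way to obtain half-Gaussian draws. The hyperparameter values \(\mu_1 = 1000\), \(\sigma_1 = 200\) and \(\sigma_2 = 300\) below are arbitrary choices for illustration, not recommended priors:

```r
# Sample parameter values from hypothetical priors.
# mu    ~ Gaussian(mu_1 = 1000, sigma_1 = 200)
# sigma ~ HalfGaussian(0, sigma_2 = 300)
set.seed(9)
n_draws <- 1e4
mu_draws <- rnorm(n_draws, mean = 1000, sd = 200)
sigma_draws <- abs(rnorm(n_draws, mean = 0, sd = 300))  # fold negatives to positives
```

Every draw is a candidate \((\mu, \sigma)\) pair, i.e. a candidate Gaussian distribution of RTs; the priors encode which candidates we consider plausible before seeing the data, and `sigma_draws` is guaranteed non-negative by construction.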