Statistics and Quantitative Methods (S2)

.title[
# Statistics and Quantitative Methods (S2)
]
.subtitle[
## Week 4
]
.author[
### Dr Stefano Coretta
]
.institute[
### University of Edinburgh
]
.date[
### 2023/02/07
]

---

## Random variables

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**We have a question about the world, so we sample from a population.**

- `$y$` is a **sample** of values (`$y_1, y_2, y_3, ..., y_n$`).

- We say that the values in the sample `$y$` were generated by a **random variable `$Y$`**.

]

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
In probability theory, **a random variable `$Y$` is a variable whose value is unknown and is generated by a random event.**

- `$Y$` is uncertain.

- We can describe `$Y$` by talking about the **probabilities** of `$Y$` taking on different values.
]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
**Probabilities are at the very core of statistics**, because the ultimate aim of statistics is to quantify WHAT?
]

---

## Probabilities

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**Probability**

- Of an event occurring or of a random variable taking on a specific value.

- Probabilities can only be **between 0 and 1**.
  - ⛔️ 0 means **impossible**.
  - 🤷 0.5 means **as likely to happen as not to happen**.
  - ✅ 1 means **certain**.
]

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**Probability**

- Probability of an event occurring: 0 to 100% probability.

- **Probability of a random variable taking on a value**: a bit more complicated...
]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
**We need probability distributions!**
]

---

## Probability distributions

---

![](../../img/grubabilities.png)

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
A **probability distribution** is a mathematical function that describes *how the probabilities are distributed over the values* that a random variable can take on.
]

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
Two types of probability distributions

- **Discrete probability distributions.**

- **Continuous probability distributions.**
]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
**We talked about discrete and continuous variables in Week 2!**

Discrete variables follow discrete probability distributions and continuous variables follow continuous probability distributions.
]

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
We can visualise probability distributions:

- Using the **probability mass function** for discrete probability distributions.

- Using the **probability density function** for continuous probability distributions.
]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
You don't need to understand the math behind this, but you are free to learn about it through the internet search engine of your choice!
]

---

**Probability Mass Function**
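A minimal sketch of a probability mass function, using base R's `dbinom()`. The variable is hypothetical: the number of telic verbs among 10 verbs, assuming (purely for illustration) that each verb is telic with probability 0.3. The PMF gives one probability per possible count, and these probabilities sum to 1.

```r
# Hypothetical example: number of telic verbs out of 10 verbs,
# assuming each verb is telic with probability 0.3 (illustrative value).
counts <- 0:10

# Probability mass function: one probability for each possible count.
pmf <- dbinom(counts, size = 10, prob = 0.3)
round(pmf, 3)

# The most probable count.
counts[which.max(pmf)]

# The probabilities of all possible values sum to 1.
sum(pmf)
```

To visualise it, you can pass `pmf` to `barplot()`, with one bar per discrete value.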

---

**Probability Density Function**
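A minimal sketch with base R's `dnorm()` (density) and `pnorm()` (cumulative probability), for a Gaussian with mean 0 and SD 1 (illustrative values). For a continuous variable the PDF returns densities, not probabilities: a density can even be larger than 1. Probabilities are areas under the curve over a range of values.

```r
# Density of a Gaussian (mean = 0, SD = 1) at the value 0.
dens_at_zero <- dnorm(0, mean = 0, sd = 1)
dens_at_zero

# With a small SD the density can exceed 1, which a probability never can.
narrow_dens <- dnorm(0, mean = 0, sd = 0.1)
narrow_dens

# Probability of a value between -1.96 and 1.96 = area under the curve.
area <- integrate(dnorm, lower = -1.96, upper = 1.96)$value
area

# The same probability from the cumulative distribution function.
pnorm(1.96) - pnorm(-1.96)
```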

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**Probability distributions can be expressed by a set of parameters.**

- We summarise a probability distribution with a **set of parameters**.

- In mathematical notation: `$\Theta = (\theta_1, ..., \theta_n)$`
  
  - Different (sub-)types of probability distributions have different parameters, and a different number of them.
]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
The **Gaussian probability distribution** is a continuous probability distribution and it has two parameters:

- The mean `$\mu$`.
- The standard deviation `$\sigma$`.

]
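In R, the two Gaussian parameters are the `mean` and `sd` arguments of `dnorm()` and `pnorm()`. The sketch below (values chosen for illustration, roughly in the VOT range) shows what each parameter does: `$\mu$` sets where the peak is, `$\sigma$` sets how spread out the values are.

```r
# Grid of VOT-like values in ms (illustrative).
x <- seq(0, 24, by = 0.5)

# Gaussian density with mean 12 and SD 3.
d_wide <- dnorm(x, mean = 12, sd = 3)

# The mean sets the location of the peak.
peak_location <- x[which.max(d_wide)]
peak_location

# The SD sets the spread: halving it doubles the height of the peak.
d_narrow <- dnorm(x, mean = 12, sd = 1.5)
peak_ratio <- max(d_narrow) / max(d_wide)
peak_ratio

# Probability of a value within 3 ms of the mean, for each SD.
within_wide <- pnorm(15, mean = 12, sd = 3) - pnorm(9, mean = 12, sd = 3)
within_narrow <- pnorm(15, mean = 12, sd = 1.5) - pnorm(9, mean = 12, sd = 1.5)
c(sd_3 = within_wide, sd_1.5 = within_narrow)
```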

Go to **[Seeing Theory](https://seeing-theory.brown.edu/probability-distributions/index.html#section2)**.

???

Seeing Theory was created by Daniel Kunin while an undergraduate at Brown University. The goal of this website is to make statistics more accessible through interactive visualizations (designed using Mike Bostock’s JavaScript library D3.js).

<https://seeing-theory.brown.edu/index.html#3rdPage>

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
In research, you try to **estimate** the probability distribution of the variable of interest (VOT, number of telic verbs, informativity score, acceptability ratings, ...).

- In other words, you are trying to **estimate the parameters** of the probability distribution.
]
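A sketch of the idea: if we simulate data from a Gaussian whose parameters we know (here `$\mu = 12$` and `$\sigma = 3$`, hypothetical values), the sample mean and SD are estimates of those parameters. With real data, the true values are of course unknown.

```r
set.seed(2023)  # for reproducibility

# Simulate a large sample from a Gaussian with known (hypothetical) parameters.
y <- rnorm(10000, mean = 12, sd = 3)

# Estimate the parameters from the sample.
mu_hat <- mean(y)
sigma_hat <- sd(y)
c(mu_hat = mu_hat, sigma_hat = sigma_hat)
```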

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
Now, let's talk a bit about ontology...
]

---

.bg-washed-yellow.b--orange.ba.bw2.br3.shadow-5.ph4.mt2[
**Frequentist view**

- The parameters are **fixed** (they are *unknown but certain*).

- They take on a specific value.
]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
**Bayesian view**

- The parameters are **random variables** (they are *unknown and uncertain*).

- We describe each parameter as a probability distribution.

- And each parameter's probability distribution is described by its own set of parameters (called hyperparameters).
]

---

layout: false
background-image: url(https://media.giphy.com/media/443pAv9m6Ti8KiCoAi/giphy.gif)

---

## Albanian VOT

---

---

Based on the sample (N = 24): mean VOT ≈ 12 ms, SD ≈ 3 ms.

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**Sample of 24 VOT values of Albanian voiceless stops**.

- Sample mean = 11.6 ms.
- Sample SD = 2.8 ms.
]

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
Let's assume VOT values are **distributed according to a Gaussian distribution**.

- The VOT values we sampled are generated by a Gaussian distribution with mean `$\mu$` and standard deviation `$\sigma$`.
]
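The assumption can be read generatively: a Gaussian with mean `$\mu$` and SD `$\sigma$` produces the VOT values. This sketch plugs in the sample mean and SD as stand-ins for `$\mu$` and `$\sigma$` (an assumption for illustration, since the true values are unknown) and generates 24 simulated values. These are simulated numbers, not the Albanian data.

```r
set.seed(42)

# Stand-ins for the unknown parameters, taken from the sample summary (ms).
mu <- 11.6
sigma <- 2.8

# Generate 24 simulated VOT values from Gaussian(mu, sigma).
sim_vot <- rnorm(24, mean = mu, sd = sigma)
round(sim_vot, 1)

# The simulated sample has its own mean and SD: close to mu and sigma, but not equal.
c(mean = mean(sim_vot), sd = sd(sim_vot))
```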

---

---

We want to make inferences about the population based on the sample.

---

`$$vot \sim Gaussian(\mu, \sigma)$$`

**Read as**: VOT values (`$vot$`) are distributed according to (`$\sim$`) a Gaussian distribution (`$Gaussian()$`) with mean `$\mu$` and standard deviation `$\sigma$`.

---

**Parameters**: mean `$\mu$` and SD `$\sigma$`.

---

**Parameters**: mean `$\mu$` and SD `$\sigma$`.

**BUT** this does not take into account the **uncertainty and variability** of the sampling procedure.
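Why a single value for `$\mu$` is not enough: a sketch of repeated sampling. Assuming (for illustration only) that VOT really follows `$Gaussian(11.6, 2.8)$`, we draw many samples of N = 24 and look at how much the sample mean changes from one sample to the next.

```r
set.seed(1)

# Hypothetical "true" parameters (ms), used only for this simulation.
mu <- 11.6
sigma <- 2.8
n <- 24

# Draw 5000 samples of size 24 and compute the mean of each one.
sample_means <- replicate(5000, mean(rnorm(n, mean = mu, sd = sigma)))

# The sample mean varies across samples: this is sampling variability.
range(sample_means)

# Its SD is close to the theoretical value sigma / sqrt(n), about 0.57 ms.
c(simulated = sd(sample_means), theoretical = sigma / sqrt(n))
```

This spread is of a similar size to the uncertainty the Bayesian model reports for the mean later on (Est.Error = 0.59 ms).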

---

**Parameters**: mean `$\mu$` and SD `$\sigma$`.

**Hyperparameters** of `$\mu$`: mean `$\mu_1$` and SD `$\sigma_1$`.

Standard deviations cannot be negative! So for `$\sigma$` we need a truncated Gaussian distribution (only the positive half!).
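A truncated Gaussian keeps the Gaussian shape for values ≥ 0, sets the density to 0 below 0, and rescales the rest so the total probability is still 1. Base R has no truncated Gaussian density, so `dtrunc_gauss()` below is a hypothetical helper written for this sketch.

```r
# Hypothetical helper: density of a Gaussian(mean, sd) truncated below at 0.
dtrunc_gauss <- function(x, mean, sd) {
  # Probability of the untruncated Gaussian above 0, used to rescale.
  mass_above_zero <- pnorm(0, mean = mean, sd = sd, lower.tail = FALSE)
  ifelse(x < 0, 0, dnorm(x, mean = mean, sd = sd) / mass_above_zero)
}

# Negative values get zero density.
dtrunc_gauss(-1, mean = 0, sd = 1)

# With mean 0, exactly half is cut off, so the positive half is doubled.
dtrunc_gauss(1, mean = 0, sd = 1) / dnorm(1)

# The density still integrates to 1 over the allowed (non-negative) values.
integrate(dtrunc_gauss, lower = 0, upper = Inf, mean = 0, sd = 1)$value
```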

---

**Parameters**: mean `$\mu$` and SD `$\sigma$`.

**Hyperparameters** of `$\mu$`: mean `$\mu_1$` and SD `$\sigma_1$`.

**Hyperparameters** of `$\sigma$`: mean `$\mu_2$` and SD `$\sigma_2$`.

---

## Estimating probability distributions

---

.pull-left[
<img src="index_files/figure-html/norm-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="index_files/figure-html/hnorm-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
- `$vot \sim Gaussian(\mu, \sigma)$`, `$\mu \sim Gaussian(\mu_1, \sigma_1)$`, `$\sigma \sim TruncGaussian(\mu_2, \sigma_2)$`

- We need to estimate:

  - `$\mu_1$` and `$\sigma_1$` for the Gaussian probability distribution of `$\mu$`.

  - `$\mu_2$` and `$\sigma_2$` for the truncated Gaussian probability distribution of `$\sigma$`.
]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
With our sample of N = 24 we want to make inferences about the population of VOT values of Albanian voiceless stops, by estimating the four hyperparameters `$\mu_1$`, `$\sigma_1$`, `$\mu_2$` and `$\sigma_2$`.
]

---

```r
# Attach the brms package
library(brms)

# Run a Bayesian model
vot_bm <- brm(
  # This is the formula of the model.
  vot ~ 1,
  # This is the probability distribution family.
  family = gaussian(),
  # And the data.
  data = alb_vot_vl
)
```

---

.f2.center[
`vot ~ 1`
]

**Read as**: Model VOT values (`vot`) as a function of (`~`) the intercept (`1`), which in a model with no predictors is the mean.

.f2.center[
`family = gaussian()`
]

**Read as**: using a Gaussian probability distribution.

- The Gaussian distribution also has a second parameter, the SD (`sigma`), which `brm()` estimates too.

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
**Altogether**: Model VOT values as a function of the mean and standard deviation of a Gaussian probability distribution.
]
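To see what the `1` stands for, the same formula can be fitted with base R's `lm()`. Note this is a frequentist linear model, not the Bayesian `brm()` model above; it is used here only because it runs without Stan. With an intercept-only formula, the estimated intercept is the sample mean and the residual standard error is the sample SD. The data are a simulated stand-in, since `alb_vot_vl` is not included in this sketch.

```r
set.seed(7)

# Simulated stand-in for a VOT column (ms); not the real data.
vot_data <- data.frame(vot = rnorm(24, mean = 11.6, sd = 2.8))

# Intercept-only model: vot as a function of the intercept (1) only.
lm_fit <- lm(vot ~ 1, data = vot_data)

# The intercept equals the sample mean...
c(intercept = unname(coef(lm_fit)), mean = mean(vot_data$vot))

# ...and the residual standard error equals the sample SD.
c(sigma = summary(lm_fit)$sigma, sd = sd(vot_data$vot))
```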

---

```
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: vot ~ 1 
##    Data: alb_vot_vl (Number of observations: 24) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    11.62      0.59    10.48    12.78 1.00     2559     2158
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     2.88      0.44     2.18     3.88 1.00     2327     2013
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

---

```
##  Family: gaussian 
## Formula: vot ~ 1 
##    Data: alb_vot_vl (Number of observations: 24) 
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    11.62      0.59    10.48    12.78 1.00     2559     2158
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     2.88      0.44     2.18     3.88 1.00     2327     2013
```

---

```
##  Family: gaussian 
## Formula: vot ~ 1 
##    Data: alb_vot_vl (Number of observations: 24)
```

---

## Estimating the mean

---

.pull-left[
```
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    11.62      0.59    10.48    12.78 1.00     2559     2158
```

- **Estimate**: `$\mu_1 = 11.62$` ms.

- **Est.Error**: `$\sigma_1 = 0.59$` ms.
]

.pull-right[
`$$vot \sim Gaussian(\mu, \sigma)$$`
.purple[
`$$\mu \sim Gaussian(\mu_1, \sigma_1)$$`
]

`$$\sigma \sim TruncGaussian(\mu_2, \sigma_2)$$`
]

---

---

Based on the model and the data, there is a 95% probability that the mean VOT is between 10.48 and 12.78 ms.
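As a rough check, we can treat the estimate of `$\mu$` as `$Gaussian(\mu_1 = 11.62, \sigma_1 = 0.59)$`, as in the model notation, and compute its central 95% interval with `qnorm()`. The result, about 10.46 to 12.78 ms, is close to the reported interval but not identical, because `brm()` computes the interval from the 4000 posterior draws rather than from a Gaussian formula.

```r
# Gaussian approximation of the estimate of mu (values from the brm() summary).
mu_1 <- 11.62
sigma_1 <- 0.59

# Central 95% interval: the 2.5% and 97.5% quantiles.
ci_mu <- qnorm(c(0.025, 0.975), mean = mu_1, sd = sigma_1)
round(ci_mu, 2)
```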

---

## Estimating the standard deviation

---

.pull-left[
```
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     2.88      0.44     2.18     3.88 1.00     2327     2013
```

- **Estimate**: `$\mu_2 = 2.88$` ms.

- **Est.Error**: `$\sigma_2 = 0.44$` ms.
]

.pull-right[
`$$vot \sim Gaussian(\mu, \sigma)$$`

`$$\mu \sim Gaussian(\mu_1, \sigma_1)$$`
.purple[
`$$\sigma \sim TruncGaussian(\mu_2, \sigma_2)$$`
]
]

---

---

Based on the model and the data, there is a 95% probability that the standard deviation of VOT is between 2.18 and 3.88 ms.
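The same rough check for `$\sigma$`, treating its estimate as `$TruncGaussian(\mu_2 = 2.88, \sigma_2 = 0.44)$` as in the model notation. `qtrunc_gauss()` is a hypothetical helper, since base R has no truncated Gaussian quantile function. The result, about 2.02 to 3.74 ms, sits a bit lower than the reported 2.18 to 3.88 ms: the reported interval comes from the posterior draws and is not symmetric around 2.88 (0.70 ms below, 1.00 ms above), so a (truncated) Gaussian is only an approximation here.

```r
# Hypothetical helper: quantiles of a Gaussian(mean, sd) truncated below at 0,
# obtained by inverting the CDF over the part of the distribution above 0.
qtrunc_gauss <- function(p, mean, sd) {
  p_below_zero <- pnorm(0, mean = mean, sd = sd)
  qnorm(p_below_zero + p * (1 - p_below_zero), mean = mean, sd = sd)
}

# Values from the brm() summary for sigma (ms).
mu_2 <- 2.88
sigma_2 <- 0.44

ci_sigma <- qtrunc_gauss(c(0.025, 0.975), mean = mu_2, sd = sigma_2)
round(ci_sigma, 2)

# Truncation barely matters here: almost no probability lies below 0.
pnorm(0, mean = mu_2, sd = sigma_2)
```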

---

## Putting mean and SD together
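One way to put the two estimates together is to simulate: draw a value of `$\mu$` from `$Gaussian(11.62, 0.59)$`, a value of `$\sigma$` from `$TruncGaussian(2.88, 0.44)$`, then a VOT value from `$Gaussian(\mu, \sigma)$`, and repeat many times. This is a sketch using the approximate distributions from the model notation, not the exact posterior draws stored in the `brm()` model object.

```r
set.seed(2023)
n_draws <- 10000

# Step 1: plausible values of mu (approximate distribution from the summary).
mu_draws <- rnorm(n_draws, mean = 11.62, sd = 0.59)

# Step 2: plausible values of sigma from a Gaussian truncated below at 0,
# by inverting the CDF over the part of the distribution above 0.
p_below_zero <- pnorm(0, mean = 2.88, sd = 0.44)
sigma_draws <- qnorm(runif(n_draws, min = p_below_zero, max = 1),
                     mean = 2.88, sd = 0.44)

# Step 3: for each (mu, sigma) pair, draw one VOT value.
vot_draws <- rnorm(n_draws, mean = mu_draws, sd = sigma_draws)

# Predicted VOT values combine the uncertainty about mu and sigma
# with the variability of VOT itself.
quantile(vot_draws, probs = c(0.025, 0.5, 0.975))
```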

---

## Summary

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
- A random variable `$Y$` is a variable whose value is unknown and is generated by a random event.

- A **probability distribution** is a mathematical function that describes *how the probabilities are distributed over the values* that a random variable can take on.
  - **Discrete probability distributions.**
  - **Continuous probability distributions.**

- The Gaussian distribution has two parameters: mean `$\mu$` and SD `$\sigma$`.

- We can describe `$\mu$` and `$\sigma$` as probability distributions and estimate the (hyper-)parameters of those probability distributions.

- R package [brms](https://paul-buerkner.github.io/brms/), function `brm()`.
]