class: center, middle, inverse, title-slide .title[ # Statistics and Quantitative Methods (S2) ] .subtitle[ ## Week 7 - Binary outcomes ] .author[ ###
Dr Stefano Coretta
Elizabeth Pankratz ] .institute[ ### University of Edinburgh ] .date[ ### 2023/03/07 ] --- class: center middle reverse ## Any questions about the new teaching situation? While you're thinking, please fill out the attendance form: `https://forms.office.com/e/XzxhFEBHv6` ![:scale 30%](imgs/qr.png) --- ## Summary from last time .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - **Comparing groups** with `brm()` - `outcome ~ predictor`. - Categorical predictor with 2 and 3 levels. - **Treatment coding** of categorical predictors. - N−1 **dummy variables**, where N is the number of levels in the predictor. - Level ordering is *alphabetical*, but you can specify your own. - **NOTE**: You don't have to apply treatment coding yourself! It's done under the hood by R. But you should **understand how it works**. - **Remember:** - The **Intercept** `\(\beta_0\)` is the mean of the reference level. - The other `\(\beta\)`s are the **differences** of the other levels relative to the reference level. ] --- layout: false ## Six learning objectives for this week .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(1)** What are **binary outcomes?** ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(2)** How do we **visualise** them? ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(3)** What **distribution** do binary outcomes follow? ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(4)** The data is binary, but linear models require a **continuous outcome space.** How do we fit such a model using binary data? ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(5)** How do we **interpret** the estimates of such a model? ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(6)** What are **credible intervals**? ] --- layout: false ## What are binary outcomes? .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **The variable you're trying to predict has two levels**, e.g.: - yes / no - grammatical / ungrammatical - Spanish / English - indirect object (*gave the girl the book*) / to-PP (*gave the book to the girl*) - correct / incorrect Very common in linguistics! ] ??? LO 1 --- ## Today's running example: Morphological processing .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - English L1 and L2 participants (L2 participants are native speakers of Cantonese). - **Lexical decision task:** Is the string you see a word in English or not? - Each trial: - **Prime**: *prolong* (unrelated), *unkind* (constituent), *kindness* (non-constituent). - **Target**: *unkindness* (*[un-kind]-ness*, not *un-[kind-ness]*). - Data gathered: Reaction times and accuracy. ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ We will focus on **accuracy** (correct identification of a real word: **correct/incorrect**) for L1 participants. ] --- ## Visualise as proportions <img src="index_files/figure-html/shal-1.png" width="60%" style="display: block; margin: auto;" /> ??? LO 2 --- ## What process could have generated this data? -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ We assume that there is **some probability `\(p\)` of responding correctly.** We don't know this probability, so we want to **use the data to estimate it.** ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **How can a probability generate binary outcomes?** - Imagine a coin flip: `\(p = 0.5\)`. - If you flip the coin 10 times, you get 10 observations of binary outcomes (heads/tails), probably at roughly a 50/50 frequency.
- If it's not a fair coin, you might get 9 heads and 1 tail from, e.g., `\(p = 0.9\)`. ] --- ## What distribution do we use to model this process? .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ We want to use a distribution that is: - based on the probability of some event `\(p\)`, and - able to generate binary outcomes (0 and 1). ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ The best distribution for the job: the **Bernoulli distribution**. `$$Bernoulli(p)$$` ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - A Bernoulli distribution generates a 1 ("success") with probability `\(p\)`. - And a 0 ("failure") with probability `\(1 - p = q\)`. And we can **think of our binary outcomes in terms of 0s and 1s:** - 0 = incorrect (or tails, or English, or ...) - 1 = correct (or heads, or Spanish, or ...) ] ??? LO3 --- ## Building up the model .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - Accuracy `\(\text{acc}\)` is a binary variable: incorrect vs. correct. - `\(p\)` is the probability of obtaining a "correct" response. ] .f3[ $$ `\begin{aligned} \text{acc} &\sim Bernoulli(p) \\ p &=\text{ ...} \end{aligned}` $$ ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ **Our goal: estimate `\(p\)`.** But there's a problem... ] --- ## Linear models can't cope with bounded data -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - A line, in principle, goes on forever in a continuous space. - But probabilities are bounded between 0 and 1. - This means we can't fit a line to probabilities directly. ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ The solution: **transform the probabilities into a continuous space**. ] --- ## The transformation: `logit()` .f3[ $$ `\begin{aligned} \text{acc} &\sim Bernoulli(p) \\ logit(p) &=\text{ ...} \end{aligned}` $$ ] <br> .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - The **logit** (*log*istic un*it*) function converts probabilities to **log-odds**. - The model can work with log-odds because they are not bounded. ] ??? LO 4 <!-- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ --> <!-- - The **logistic** function converts log-odds to **probabilities**. --> <!-- - The logistic function is the *inverse* of the logit function. --> <!-- ] --> --- ## What are log-odds? -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ First, **what are odds**? $$ \text{odds} = \frac{\text{probability of a thing happening}}{\text{probability of the thing not happening}} = \frac{p}{1-p} $$ ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ If the probability of rain tomorrow is `\(p = 0.7\)`, then the odds of rain tomorrow are: $$ `\begin{aligned} \text{odds}_{rain} &= \frac{p}{1-p} \\ &= \frac{0.7}{1-0.7} \\ &= \frac{0.7}{0.3} \\ &= 2.333... \end{aligned}` $$ ] ??? No longer bounded by 1, but are odds unbounded? --- ## Odds are still bounded (and also asymmetrical) .center[ ![:scale 60%](imgs/numberline_odds.png) ] --- ## Applying the log function unsquishes [0, 1] <img src="index_files/figure-html/logarithm-1.png" width="60%" style="display: block; margin: auto;" /> --- ## Log-odds are continuous and unbounded .center[ ![:scale 60%](imgs/numberline_logodds.png) ] --- ## Probabilities and log-odds <img src="index_files/figure-html/p-log-odds-1.png" width="60%" style="display: block; margin: auto;" /> ??? On logit vs logistic function: <https://stats.stackexchange.com/a/120364>. 
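--- ## Simulating binary outcomes in R .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ A minimal, optional sketch (an aside, not needed for the rest of the lecture) connecting the last few slides to code: base R's `rbinom()` with `size = 1` draws Bernoulli outcomes, and odds and log-odds can be computed directly from `\(p\)` with the formulas above. ]

```r
# Ten Bernoulli trials ("coin flips") with a biased coin, p = 0.9:
# rbinom() with size = 1 returns one 0/1 outcome per trial.
set.seed(42) # for reproducibility
rbinom(n = 10, size = 1, prob = 0.9)

# Odds and log-odds for p = 0.7, following the formulas above:
p <- 0.7
p / (1 - p)      # odds: 2.333...
log(p / (1 - p)) # log-odds: 0.8472979
```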
--- ## From probabilities to log-odds Use `qlogis()` (logit function) to go from probabilities to log-odds. ```r qlogis(0.3) ``` ``` ## [1] -0.8472979 ``` ```r qlogis(0.5) ``` ``` ## [1] 0 ``` ```r qlogis(0.7) ``` ``` ## [1] 0.8472979 ``` --- ## From log-odds to probabilities Use `plogis()` (logistic function, the inverse of the logit function) to go from log-odds to probabilities. ```r plogis(-1) ``` ``` ## [1] 0.2689414 ``` ```r plogis(0) ``` ``` ## [1] 0.5 ``` ```r plogis(1) ``` ``` ## [1] 0.7310586 ``` --- ## Back to the model .f3[ $$ `\begin{aligned} \text{acc} & \sim Bernoulli(p) \\ logit(p) & \sim Gaussian(\mu, \sigma) \end{aligned}` $$ ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - Because we are dealing with the outcome variable in log-odds space, **all the parameters of the model are going to be estimated in log-odds space.** - Here, this applies to `\(\mu\)` and `\(\sigma\)`. - We'll need to remember this when interpreting the model's estimates. ] --- class: center middle reverse # But first, a few minutes' break 😮‍💨 Then we'll fit the model to the lexical decision data! --- ## Set up the outcome variable .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - We need to define a "success" level for our outcome variable. - We'll set success = `"correct"`, so that we estimate `\(p\)` as the probability of responding correctly. - To make this happen, let's manually reorder the levels in `Accuracy` as `"incorrect"` (will be coded as 0) and `"correct"` (will be coded as 1). ] -- ```r shallow <- shallow %>% mutate( Accuracy = factor(Accuracy, levels = c("incorrect", "correct")) ) levels(shallow$Accuracy) ``` ``` ## [1] "incorrect" "correct" ``` --- layout: false layout: true ## Starting small: Intercept-only model --- ```r acc_bm <- brm( Accuracy ~ 1, family = bernoulli(), data = shallow, backend = "cmdstanr", file = "data/cache/acc_bm" ) ``` -- ```r summary(acc_bm) ``` ``` ## Family: bernoulli ## Links: mu = logit ## Formula: Accuracy ~ 1 ## Data: shallow (Number of observations: 518) ## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; ## total post-warmup draws = 4000 ## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 1.32 0.11 1.11 1.53 1.00 1218 1913 ## ## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS ## and Tail_ESS are effective sample size measures, and Rhat is the potential ## scale reduction factor on split chains (at convergence, Rhat = 1). ``` --- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 1.32 0.11 1.11 1.53 1.00 1218 1913 ``` <br> <br> .pull-left[ - **Intercept**: `\(logit(p)\)`. - **Estimate**: `\(\mu = 1.32\)` log-odds. - **Est.Error**: `\(\sigma = 0.11\)` log-odds. ] .pull-right[ $$ `\begin{aligned} \text{acc} & \sim Bernoulli(p) \\ logit(p) & \sim Gaussian(1.32, 0.11) \end{aligned}` $$ ] --- <img src="index_files/figure-html/acc-int-1.png" width="60%" style="display: block; margin: auto;" /> There is a 95% probability that the log-odds of getting a "correct" response are between 1.11 and 1.53. --- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - 'There is a 95% probability that the log-odds of getting a "correct" response are between 1.11 and 1.53.' - Uh, but what do these numbers actually mean...?
] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ In easier-to-understand probability terms: **There is a 95% probability that the probability of getting a "correct" response is between 0.75 and 0.82.** ] ```r round(plogis(1.11), 2) ``` ``` ## [1] 0.75 ``` ```r round(plogis(1.53), 2) ``` ``` ## [1] 0.82 ``` --- layout: false ## Set up the predictor `Relation_type` ```r shallow <- shallow %>% mutate( Relation_type = factor(Relation_type, levels = c("Unrelated", "NonConstituent", "Constituent")) ) levels(shallow$Relation_type) ``` ``` ## [1] "Unrelated" "NonConstituent" "Constituent" ``` <br> <br> | | NonConstituent | Constituent | |----------------------------|----------------|---------------| | Relation type = Unrelated | 0 | 0 | | Relation type = NonConstituent | 1 | 0 | | Relation type = Constituent | 0 | 1 | --- ## Modelling accuracy by relation type .f3[ $$ `\begin{aligned} \text{acc} & \sim Bernoulli(p) \\ logit(p) & = \beta_0 + \beta_1 \cdot relation_{ncons} + \beta_2 \cdot relation_{cons} \\ \beta_0 & \sim Gaussian(\mu_0, \sigma_0) \\ \beta_1 & \sim Gaussian(\mu_1, \sigma_1) \\ \beta_2 & \sim Gaussian(\mu_2, \sigma_2) \\ \end{aligned}` $$ ] ```r acc_bm_2 <- brm( Accuracy ~ Relation_type, family = bernoulli(), data = shallow, backend = "cmdstanr", file = "data/cache/acc_bm_2" ) ``` --- ## Modelling accuracy by relation type: Intercept ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 1.02 0.17 0.71 1.36 1.00 2777 2484 ## Relation_typeNonConstituent 0.19 0.25 -0.30 0.67 1.00 2948 2801 ## Relation_typeConstituent 0.86 0.29 0.30 1.42 1.00 2999 2789 ``` .pull-left[ - **Intercept**: `\(\beta_0\)`. - **Estimate**: `\(\mu = 1.02\)` log-odds. - **Est.Error**: `\(\sigma = 0.17\)` log-odds. ] .pull-right[ $$ `\begin{aligned} \text{acc} & \sim Bernoulli(p) \\ logit(p) & = \beta_0 + \beta_1 \cdot relation_{ncons} + \beta_2 \cdot relation_{cons} \\ \beta_0 & \sim Gaussian(1.02, 0.17) \\ \beta_1 & \sim Gaussian(\mu_1, \sigma_1) \\ \beta_2 & \sim Gaussian(\mu_2, \sigma_2) \\ \end{aligned}` $$ ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ **When the relation type is `Unrelated`:** - The mean log-odds of getting a correct response is 1.02 (corresponds to a **probability of 0.73**). - There is a 95% probability that the true mean is between 0.71 and 1.36 (corresponds to a **probability between 0.67 and 0.8**). ] --- layout: false layout: true ## Modelling accuracy by relation type: `NonConstituent` --- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 1.02 0.17 0.71 1.36 1.00 2777 2484 ## Relation_typeNonConstituent 0.19 0.25 -0.30 0.67 1.00 2948 2801 ## Relation_typeConstituent 0.86 0.29 0.30 1.42 1.00 2999 2789 ``` .pull-left[ - **Effect of `\(relation_{ncons}\)`**: `\(\beta_1\)`. - **Estimate**: `\(\mu = 0.19\)` log-odds. - **Est.Error**: `\(\sigma = 0.25\)` log-odds. ] .pull-right[ $$ `\begin{aligned} \text{acc} & \sim Bernoulli(p) \\ logit(p) & = \beta_0 + \beta_1 \cdot relation_{ncons} + \beta_2 \cdot relation_{cons} \\ \beta_0 & \sim Gaussian(1.02, 0.17) \\ \beta_1 & \sim Gaussian(0.19, 0.25) \\ \beta_2 & \sim Gaussian(\mu_2, \sigma_2) \\ \end{aligned}` $$ ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ **When the relation type is non-constituent**: - The mean change in log-odds is 0.19. - To figure out the **conditional probability of answering correctly**, we'll put some of this math to use.
] --- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 1.02 0.17 0.71 1.36 1.00 2777 2484 ## Relation_typeNonConstituent 0.19 0.25 -0.30 0.67 1.00 2948 2801 ## Relation_typeConstituent 0.86 0.29 0.30 1.42 1.00 2999 2789 ``` -- $$ `\begin{aligned} logit(p) &= \beta_0 + \beta_1 \cdot relation_{ncons} + \beta_2 \cdot relation_{cons} \\ &= 1.02 + (0.19 \cdot 1) + (\beta_2 \cdot 0) \\ &= 1.02 + 0.19\\ &= 1.21\\ p &= logistic(1.21)\\ &= 0.77\\ \end{aligned}` $$ -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ **When the relation type is non-constituent**, the mean probability of responding correctly is 0.77. ] -- ```r round(plogis(1.02 + 0.19), 2) ``` ``` ## [1] 0.77 ``` --- <img src="index_files/figure-html/acc-bm-2-cond-1.png" width="60%" style="display: block; margin: auto;" /> --- layout: false layout: true ## Modelling accuracy by relation type: `Constituent` --- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 1.02 0.17 0.71 1.36 1.00 2777 2484 ## Relation_typeNonConstituent 0.19 0.25 -0.30 0.67 1.00 2948 2801 ## Relation_typeConstituent 0.86 0.29 0.30 1.42 1.00 2999 2789 ``` <br> -- ```r # Mean probability of success when relation type = constituent: round(plogis(1.02 + 0.86), 2) ``` ``` ## [1] 0.87 ``` ??? When the relation type is constituent, the mean change in log-odds is 0.86, and there is a 95% probability that the true change in log-odds is between 0.3 and 1.42. --- <img src="index_files/figure-html/acc-bm-2-cond-2-1.png" width="60%" style="display: block; margin: auto;" /> --- layout: false class: center middle reverse # Questions about what we just saw? The models themselves? Moving between log-odds and probabilities? Interpreting the models' estimates? Anything else not sitting right? ??? LO 5 --- ## How do we find the 95% CrIs? .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **The empirical rule** (a.k.a. the 68–95–99.7 rule) ] --- layout: false layout: true ## The empirical rule --- <img src="index_files/figure-html/empirical-rule-0-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/empirical-rule-1-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/empirical-rule-2-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/empirical-rule-3-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/empirical-rule-4-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/empirical-rule-5-1.png" width="60%" style="display: block; margin: auto;" /> ??? As a general rule, `\(\pm2\sigma\)` covers 95% of the Gaussian distribution, which means that there's a 95% probability that the value lies within that range. --- layout: false layout: true ## Computing CrIs using quantiles --- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - **Quantiles** are cut points that divide a continuous probability distribution into intervals with equal probability. - Common quantiles: - Quartiles (4 intervals). - Percentiles or centiles (100 intervals).
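] --- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ A quick sketch (an aside, not part of the original worked example) previewing where the next slides are headed: for a Gaussian, the 2.5th and 97.5th quantiles mark off the middle 95%. Plugging in the intercept-only model's posterior mean and SD from earlier (`\(\mu = 1.32\)`, `\(\sigma = 0.11\)`), `qnorm()` recovers the 95% CrI reported by `summary(acc_bm)`. ]

```r
# 2.5th and 97.5th quantiles of Gaussian(1.32, 0.11),
# i.e. roughly the mean +/- 2 SDs of the empirical rule:
qnorm(c(0.025, 0.975), mean = 1.32, sd = 0.11)
```

```
## [1] 1.104404 1.535596
```

Close to the reported 95% CrI [1.11, 1.53]; the posterior is only approximately Gaussian, so the match is approximate.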
--- <img src="index_files/figure-html/quartiles-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/quart-1-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/quart-2-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/quart-3-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/centiles-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/cent-96-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/cent-95-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/cent-80-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/cent-60-1.png" width="60%" style="display: block; margin: auto;" /> --- ```r acc_bm_2_draws ``` ``` ## # A tibble: 12,000 × 2 ## Relation_type value ## <fct> <dbl> ## 1 Unrelated 0.773 ## 2 NonConstituent 1.16 ## 3 Constituent 2.18 ## 4 Unrelated 1.22 ## 5 NonConstituent 1.32 ## 6 Constituent 1.92 ## 7 Unrelated 0.881 ## 8 NonConstituent 1.46 ## 9 Constituent 1.55 ## 10 Unrelated 0.845 ## # … with 11,990 more rows ``` --- ```r library(posterior) # The 95% CrI acc_bm_2_draws %>% group_by(Relation_type) %>% summarise( q95_lo = quantile2(value, probs = 0.025), # the 2.5th centile q95_hi = quantile2(value, probs = 0.975), # the 97.5th centile p_q95_lo = round(plogis(q95_lo), 2), p_q95_hi = round(plogis(q95_hi), 2) ) ``` ``` ## # A tibble: 3 × 5 ## Relation_type q95_lo q95_hi p_q95_lo p_q95_hi ## <fct> <dbl> <dbl> <dbl> <dbl> ## 1 Unrelated 0.705 1.36 0.67 0.8 ## 2 NonConstituent 0.853 1.57 0.7 0.83 ## 3 Constituent 1.44 2.35 0.81 0.91 ``` --- layout: false layout: true ## Learning objectives revisited --- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(1)** What are **binary outcomes?** - Outcomes with two discrete levels (e.g., yes/no, grammatical/ungrammatical, correct/incorrect). ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(2)** How do we **visualise** them? - As proportions. ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(3)** What **distribution** do binary outcomes follow? - A Bernoulli distribution with one parameter, `\(p\)`, the probability of success. - In brms: `family = bernoulli()`. ] --- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(4)** How do we fit a model requiring a **continuous outcome space** using binary data? - We model the binary outcomes as being generated based on the probability of success `\(p\)`. - We transform this probability into log-odds, which are continuous and unbounded. ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(5)** How do we **interpret** the estimates of such a model? - The model's estimates are in log-odds space. - To interpret them, we convert back to probability space using `plogis()`. ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(6)** What are **credible intervals**? - A credible interval is the interval between two quantiles (for a 95% CrI, the 2.5th and 97.5th centiles). - In Bayesian modelling, it is the region in which, with some probability (usually 95%), the true value lies. ] --- layout: false class: center middle reverse ## We made it! 🎉🎉 ### For any latecomers: please fill out the attendance form. `https://forms.office.com/e/XzxhFEBHv6` ![:scale 30%](imgs/qr.png)