class: center, middle, inverse, title-slide .title[ # Statistics and Quantitative Methods (S2) ] .subtitle[ ## Week 7 - Binary outcomes ] .author[ ###
Dr Stefano Coretta
Elizabeth Pankratz ] .institute[ ### University of Edinburgh ] .date[ ### 2023/03/07 ] --- class: center middle reverse ## Any questions about the new teaching situation? While you're thinking, please fill out the attendance form: `https://forms.office.com/e/XzxhFEBHv6` ![:scale 30%](imgs/qr.png) --- ## Summary from last time .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - **Comparing groups** with `brm()` - `outcome ~ predictor`. - Categorical predictor with 2 and 3 levels. - **Treatment coding** of categorical predictors. - N−1 **dummy variables**, where N is the number of levels in the predictor. - Level ordering is *alphabetical*, but you can specify your own. - **NOTE**: You don't have to apply treatment coding yourself! It's done under the hood by R. But you should **understand how it works**. - **Remember:** - The **Intercept** `\(\beta_0\)` is the mean of the reference level. - The other `\(\beta\)`s are the **differences** of the other levels relative to the reference level. ] --- layout: false ## Six learning objectives for this week .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(1)** What are **binary outcomes?** ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(2)** How do we **visualise** them? ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(3)** What **distribution** do binary outcomes follow? ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(4)** The data is binary, but linear models require a **continuous outcome space.** How do we fit such a model using binary data? ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(5)** How do we **interpret** the estimates of such a model? ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(6)** What are **credible intervals**? ] --- layout: false ## What are binary outcomes? .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **The variable you're trying to predict has two levels**, e.g.: - yes / no - grammatical / ungrammatical - Spanish / English - indirect object (*gave the girl the book*) / to-PP (*gave the book to the girl*) - correct / incorrect Very common in linguistics! ] ??? LO 1 --- ## Today's running example: Morphological processing .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - English L1 and L2 participants (L2 participants are native speakers of Cantonese). - **Lexical decision task:** Is the string you see a word in English or not? - Each trial: - **Prime**: *prolong* (unrelated), *unkind* (constituent), *kindness* (non-constituent). - **Target**: *unkindness* (*[un-kind]-ness*, not *un-[kind-ness]*). - Data gathered: Reaction times and accuracy. ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ We will focus on **accuracy** (correct identification of a real word: **correct/incorrect**) for L1 participants. ] --- ## Visualise as proportions <img src="index_files/figure-html/shal-1.png" width="60%" style="display: block; margin: auto;" /> ??? LO 2 --- ## What process could have generated this data? -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ We assume that there is **some probability `\(p\)` of responding correctly.** We don't know this probability, so we want to **use the data to estimate it.** ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **How can a probability generate binary outcomes?** - Imagine a coin flip: `\(p = 0.5\)`. - If you flip the coin 10 times, you get 10 observations of binary outcomes (heads/tails), probably at roughly a 50/50 frequency.
- If it's not a fair coin, you might get 9 heads and 1 tail from, e.g., `\(p = 0.9\)`. ] --- ## What distribution do we use to model this process? .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ We want to use a distribution that is: - based on the probability of some event `\(p\)`, and - able to generate binary outcomes (0 and 1). ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ The best distribution for the job: the **Bernoulli distribution**. `$$Bernoulli(p)$$` ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - A Bernoulli distribution generates a 1 ("success") with probability `\(p\)`. - And a 0 ("failure") with probability `\(1 - p = q\)`. And we can **think of our binary outcomes in terms of 0s and 1s:** - 0 = incorrect (or tails, or English, or ...) - 1 = correct (or heads, or Spanish, or ...) ] ??? LO3 --- ## Building up the model .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - Accuracy `\(\text{acc}\)` is a binary variable: incorrect vs. correct. - `\(p\)` is the probability of obtaining a "correct" response. ] .f3[ $$ `\begin{aligned} \text{acc} &\sim Bernoulli(p) \\ p &=\text{ ...} \end{aligned}` $$ ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ **Our goal: estimate `\(p\)`.** But there's a problem... ] --- ## Linear models can't cope with bounded data -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - A line, in principle, goes on forever in a continuous space. - But probabilities are bounded between 0 and 1. - This means we can't fit a line to probabilities directly. ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ The solution: **transform the probabilities into a continuous space**. ] --- ## The transformation: `logit()` .f3[ $$ `\begin{aligned} \text{acc} &\sim Bernoulli(p) \\ logit(p) &=\text{ ...} \end{aligned}` $$ ] <br> .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - The **logit** (*log*istic un*it*) function converts probabilities to **log-odds**. - The model can work with log-odds because they are not bounded. ] ??? LO 4 <!-- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ --> <!-- - The **logistic** function converts log-odds to **probabilities**. --> <!-- - The logistic function is the *inverse* of the logit function. --> <!-- ] --> --- ## What are log-odds? -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ First, **what are odds**? $$ \text{odds} = \frac{\text{probability of a thing happening}}{\text{probability of the thing not happening}} = \frac{p}{1-p} $$ ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ If the probability of rain tomorrow is `\(p = 0.7\)`, then the odds of rain tomorrow are: $$ `\begin{aligned} \text{odds}_{rain} &= \frac{p}{1-p} \\ &= \frac{0.7}{1-0.7} \\ &= \frac{0.7}{0.3} \\ &= 2.333... \end{aligned}` $$ ] ??? No longer bounded by 1, but are odds unbounded? --- ## Odds are still bounded (and also asymmetrical) .center[ ![:scale 60%](imgs/numberline_odds.png) ] --- ## Applying the log function unsquishes [0, 1] <img src="index_files/figure-html/logarithm-1.png" width="60%" style="display: block; margin: auto;" /> --- ## Log-odds are continuous and unbounded .center[ ![:scale 60%](imgs/numberline_logodds.png) ] --- ## Probabilities and log-odds <img src="index_files/figure-html/p-log-odds-1.png" width="60%" style="display: block; margin: auto;" /> ??? On logit vs logistic function: <https://stats.stackexchange.com/a/120364>. 
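--- ## Simulating binary outcomes in R .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ A minimal, optional sketch (an aside, not needed for the rest of the lecture) connecting the last few slides to code: base R's `rbinom()` with `size = 1` draws Bernoulli outcomes, and odds and log-odds can be computed directly from `\(p\)` with the formulas above. ]

```r
# Ten Bernoulli trials ("coin flips") with a biased coin, p = 0.9:
# rbinom() with size = 1 returns one 0/1 outcome per trial.
set.seed(42) # for reproducibility
rbinom(n = 10, size = 1, prob = 0.9)

# Odds and log-odds for p = 0.7, following the formulas above:
p <- 0.7
p / (1 - p)      # odds: 2.333...
log(p / (1 - p)) # log-odds: 0.8472979
```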
--- ## From probabilities to log-odds Use `qlogis()` (logit function) to go from probabilities to log-odds. ```r qlogis(0.3) ``` ``` ## [1] -0.8472979 ``` ```r qlogis(0.5) ``` ``` ## [1] 0 ``` ```r qlogis(0.7) ``` ``` ## [1] 0.8472979 ``` --- ## From log-odds to probabilities Use `plogis()` (logistic function, the inverse of the logit function) to go from log-odds to probabilities. ```r plogis(-1) ``` ``` ## [1] 0.2689414 ``` ```r plogis(0) ``` ``` ## [1] 0.5 ``` ```r plogis(1) ``` ``` ## [1] 0.7310586 ``` --- ## Back to the model .f3[ $$ `\begin{aligned} \text{acc} & \sim Bernoulli(p) \\ logit(p) & \sim Gaussian(\mu, \sigma) \end{aligned}` $$ ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - Because we are dealing with the outcome variable in log-odds space, **all the parameters of the model are going to be estimated in log-odds space.** - Here, this applies to `\(\mu\)` and `\(\sigma\)`. - We'll need to remember this when interpreting the model's estimates. ] --- class: center middle reverse # But first, a few minutes' break 😮‍💨 Then we'll fit the model to the lexical decision data! --- ## Set up the outcome variable .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - We need to define a "success" level for our outcome variable. - We'll set success = `"correct"`, so that we estimate `\(p\)` as the probability of responding correctly. - To make this happen, let's manually reorder the levels in `Accuracy` as `"incorrect"` (will be coded as 0) and `"correct"` (will be coded as 1). ] -- ```r shallow <- shallow %>% mutate( Accuracy = factor(Accuracy, levels = c("incorrect", "correct")) ) levels(shallow$Accuracy) ``` ``` ## [1] "incorrect" "correct" ``` --- layout: false layout: true ## Starting small: Intercept-only model --- ```r acc_bm <- brm( Accuracy ~ 1, family = bernoulli(), data = shallow, backend = "cmdstanr", file = "data/cache/acc_bm" ) ``` -- ```r summary(acc_bm) ``` ``` ## Family: bernoulli ## Links: mu = logit ## Formula: Accuracy ~ 1 ## Data: shallow (Number of observations: 518) ## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; ## total post-warmup draws = 4000 ## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 1.32 0.11 1.11 1.53 1.00 1218 1913 ## ## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS ## and Tail_ESS are effective sample size measures, and Rhat is the potential ## scale reduction factor on split chains (at convergence, Rhat = 1). ``` --- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 1.32 0.11 1.11 1.53 1.00 1218 1913 ``` <br> <br> .pull-left[ - **Intercept**: `\(logit(p)\)`. - **Estimate**: `\(\mu = 1.32\)` log-odds. - **Est.Error**: `\(\sigma = 0.11\)` log-odds. ] .pull-right[ $$ `\begin{aligned} \text{acc} & \sim Bernoulli(p) \\ logit(p) & \sim Gaussian(1.32, 0.11) \end{aligned}` $$ ] --- <img src="index_files/figure-html/acc-int-1.png" width="60%" style="display: block; margin: auto;" /> There is a 95% probability that the log-odds of getting a "correct" response are between 1.11 and 1.53. --- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - 'There is a 95% probability that the log-odds of getting a "correct" response are between 1.11 and 1.53.' - Uh, but what do these numbers actually mean...?
] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ In easier-to-understand probability terms: **There is a 95% probability that the probability of getting a "correct" response is between 0.75 and 0.82.** ] ```r round(plogis(1.11), 2) ``` ``` ## [1] 0.75 ``` ```r round(plogis(1.53), 2) ``` ``` ## [1] 0.82 ``` --- layout: false ## Set up the predictor `Relation_type` ```r shallow <- shallow %>% mutate( Relation_type = factor(Relation_type, levels = c("Unrelated", "NonConstituent", "Constituent")) ) levels(shallow$Relation_type) ``` ``` ## [1] "Unrelated" "NonConstituent" "Constituent" ``` <br> <br> | | NonConstituent | Constituent | |----------------------------|----------------|---------------| | Relation type = Unrelated | 0 | 0 | | Relation type = NonConstituent | 1 | 0 | | Relation type = Constituent | 0 | 1 | --- ## Modelling accuracy by relation type .f3[ $$ `\begin{aligned} \text{acc} & \sim Bernoulli(p) \\ logit(p) & = \beta_0 + \beta_1 \cdot relation_{ncons} + \beta_2 \cdot relation_{cons} \\ \beta_0 & \sim Gaussian(\mu_0, \sigma_0) \\ \beta_1 & \sim Gaussian(\mu_1, \sigma_1) \\ \beta_2 & \sim Gaussian(\mu_2, \sigma_2) \\ \end{aligned}` $$ ] ```r acc_bm_2 <- brm( Accuracy ~ Relation_type, family = bernoulli(), data = shallow, backend = "cmdstanr", file = "data/cache/acc_bm_2" ) ``` --- ## Modelling accuracy by relation type: Intercept ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 1.02 0.17 0.71 1.36 1.00 2777 2484 ## Relation_typeNonConstituent 0.19 0.25 -0.30 0.67 1.00 2948 2801 ## Relation_typeConstituent 0.86 0.29 0.30 1.42 1.00 2999 2789 ``` .pull-left[ - **Intercept**: `\(\beta_0\)`. - **Estimate**: `\(\mu = 1.02\)` log-odds. - **Est.Error**: `\(\sigma = 0.17\)` log-odds. ] .pull-right[ $$ `\begin{aligned} \text{acc} & \sim Bernoulli(p) \\ logit(p) & = \beta_0 + \beta_1 \cdot relation_{ncons} + \beta_2 \cdot relation_{cons} \\ \beta_0 & \sim Gaussian(1.02, 0.17) \\ \beta_1 & \sim Gaussian(\mu_1, \sigma_1) \\ \beta_2 & \sim Gaussian(\mu_2, \sigma_2) \\ \end{aligned}` $$ ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ **When the relation type is `Unrelated`:** - The mean log-odds of getting a correct response is 1.02 (corresponds to a **probability of 0.73**). - There is a 95% probability that the true mean is between 0.71 and 1.36 (corresponds to a **probability between 0.67 and 0.8**). ] --- layout: false layout: true ## Modelling accuracy by relation type: `NonConstituent` --- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 1.02 0.17 0.71 1.36 1.00 2777 2484 ## Relation_typeNonConstituent 0.19 0.25 -0.30 0.67 1.00 2948 2801 ## Relation_typeConstituent 0.86 0.29 0.30 1.42 1.00 2999 2789 ``` .pull-left[ - **Effect of `\(relation_{ncons}\)`**: `\(\beta_1\)`. - **Estimate**: `\(\mu = 0.19\)` log-odds. - **Est.Error**: `\(\sigma = 0.25\)` log-odds. ] .pull-right[ $$ `\begin{aligned} \text{acc} & \sim Bernoulli(p) \\ logit(p) & = \beta_0 + \beta_1 \cdot relation_{ncons} + \beta_2 \cdot relation_{cons} \\ \beta_0 & \sim Gaussian(1.02, 0.17) \\ \beta_1 & \sim Gaussian(0.19, 0.25) \\ \beta_2 & \sim Gaussian(\mu_2, \sigma_2) \\ \end{aligned}` $$ ] -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ **When the relation type is non-constituent**: - The mean change in log-odds is 0.19. - To figure out the **conditional probability of answering correctly**, we'll put some of this math to use.
] --- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 1.02 0.17 0.71 1.36 1.00 2777 2484 ## Relation_typeNonConstituent 0.19 0.25 -0.30 0.67 1.00 2948 2801 ## Relation_typeConstituent 0.86 0.29 0.30 1.42 1.00 2999 2789 ``` -- $$ `\begin{aligned} logit(p) &= \beta_0 + \beta_1 \cdot relation_{ncons} + \beta_2 \cdot relation_{cons} \\ &= 1.02 + (0.19 \cdot 1) + (\beta_2 \cdot 0) \\ &= 1.02 + 0.19\\ &= 1.21\\ p &= logistic(1.21)\\ &= 0.77\\ \end{aligned}` $$ -- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ **When the relation type is non-constituent**, the mean probability of responding correctly is 0.77. ] -- ```r round(plogis(1.02 + 0.19), 2) ``` ``` ## [1] 0.77 ``` --- <img src="index_files/figure-html/acc-bm-2-cond-1.png" width="60%" style="display: block; margin: auto;" /> --- layout: false layout: true ## Modelling accuracy by relation type: `Constituent` --- ``` ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 1.02 0.17 0.71 1.36 1.00 2777 2484 ## Relation_typeNonConstituent 0.19 0.25 -0.30 0.67 1.00 2948 2801 ## Relation_typeConstituent 0.86 0.29 0.30 1.42 1.00 2999 2789 ``` <br> -- ```r # Mean probability of success when relation type = constituent: round(plogis(1.02 + 0.86), 2) ``` ``` ## [1] 0.87 ``` ??? When the relation type is constituent, the mean change in log-odds is 0.86, and there is a 95% probability that the true change in log-odds is between 0.3 and 1.42. --- <img src="index_files/figure-html/acc-bm-2-cond-2-1.png" width="60%" style="display: block; margin: auto;" /> --- layout: false class: center middle reverse # Questions about what we just saw? The models themselves? Moving between log-odds and probabilities? Interpreting the models' estimates? Anything else not sitting right? ??? LO 5 --- ## How do we find the 95% CrIs? .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **The empirical rule** (a.k.a. the 68–95–99.7 rule) ] --- layout: false layout: true ## The empirical rule --- <img src="index_files/figure-html/empirical-rule-0-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/empirical-rule-1-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/empirical-rule-2-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/empirical-rule-3-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/empirical-rule-4-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/empirical-rule-5-1.png" width="60%" style="display: block; margin: auto;" /> ??? As a general rule, `\(\pm2\sigma\)` covers 95% of the Gaussian distribution, which means that there's a 95% probability that the value lies within that range. --- layout: false layout: true ## Computing CrIs using quantiles --- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ - **Quantiles** are cut points that divide a continuous probability distribution into intervals with equal probability. - Common quantiles: - Quartiles (4 intervals). - Percentiles or centiles (100 intervals).
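] --- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[ A quick sketch (an aside, not part of the original worked example) previewing where the next slides are headed: for a Gaussian, the 2.5th and 97.5th quantiles mark off the middle 95%. Plugging in the intercept-only model's posterior mean and SD from earlier (`\(\mu = 1.32\)`, `\(\sigma = 0.11\)`), `qnorm()` recovers the 95% CrI reported by `summary(acc_bm)`. ]

```r
# 2.5th and 97.5th quantiles of Gaussian(1.32, 0.11),
# i.e. roughly the mean +/- 2 SDs of the empirical rule:
qnorm(c(0.025, 0.975), mean = 1.32, sd = 0.11)
```

```
## [1] 1.104404 1.535596
```

Close to the reported 95% CrI [1.11, 1.53]; the posterior is only approximately Gaussian, so the match is approximate.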
--- <img src="index_files/figure-html/quartiles-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/quart-1-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/quart-2-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/quart-3-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/centiles-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/cent-96-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/cent-95-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/cent-80-1.png" width="60%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/cent-60-1.png" width="60%" style="display: block; margin: auto;" /> --- ```r acc_bm_2_draws ``` ``` ## # A tibble: 12,000 × 2 ## Relation_type value ## <fct> <dbl> ## 1 Unrelated 0.773 ## 2 NonConstituent 1.16 ## 3 Constituent 2.18 ## 4 Unrelated 1.22 ## 5 NonConstituent 1.32 ## 6 Constituent 1.92 ## 7 Unrelated 0.881 ## 8 NonConstituent 1.46 ## 9 Constituent 1.55 ## 10 Unrelated 0.845 ## # … with 11,990 more rows ``` --- ```r library(posterior) # The 95% CrI acc_bm_2_draws %>% group_by(Relation_type) %>% summarise( q95_lo = quantile2(value, probs = 0.025), # the 2.5th centile q95_hi = quantile2(value, probs = 0.975), # the 97.5th centile p_q95_lo = round(plogis(q95_lo), 2), p_q95_hi = round(plogis(q95_hi), 2) ) ``` ``` ## # A tibble: 3 × 5 ## Relation_type q95_lo q95_hi p_q95_lo p_q95_hi ## <fct> <dbl> <dbl> <dbl> <dbl> ## 1 Unrelated 0.705 1.36 0.67 0.8 ## 2 NonConstituent 0.853 1.57 0.7 0.83 ## 3 Constituent 1.44 2.35 0.81 0.91 ``` --- layout: false layout: true ## Learning objectives revisited --- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(1)** What are **binary outcomes?** - Outcomes with two discrete levels (e.g., yes/no, grammatical/ungrammatical, correct/incorrect). ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(2)** How do we **visualise** them? - As proportions. ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(3)** What **distribution** do binary outcomes follow? - A Bernoulli distribution with one parameter, `\(p\)`, the probability of success. - In brms: `family = bernoulli()`. ] --- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(4)** How do we fit a model requiring a **continuous outcome space** using binary data? - We model the binary outcomes as being generated based on the probability of success `\(p\)`. - We transform this probability into log-odds, which are continuous and unbounded. ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(5)** How do we **interpret** the estimates of such a model? - The model's estimates are in log-odds space. - To interpret them, we convert back to probability space using `plogis()`. ] -- .bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[ **(6)** What are **credible intervals**? - A credible interval is the interval between two quantiles (for a 95% CrI, the 2.5th and 97.5th centiles). - In Bayesian modelling, it is the region in which, with some probability (usually 95%), the true value lies. ] --- layout: false class: center middle reverse ## We made it! 🎉🎉 ### For any latecomers: please fill out the attendance form. `https://forms.office.com/e/XzxhFEBHv6` ![:scale 30%](imgs/qr.png)