class: center, middle, inverse, title-slide

.title[
# Statistics and Quantitative Methods (S2)
]
.subtitle[
## Week 4
]
.author[
### Dr Stefano Coretta
]
.institute[
### University of Edinburgh
]
.date[
### 2023/02/07
]

---

## Random variables

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**We have a question about the world, so we sample from a population.**

- `\(y\)` is a **sample** of values (`\(y_1, y_2, y_3, ..., y_n\)`).
- We say that the values in the sample `\(y\)` were generated by a **random variable `\(Y\)`**.
]

--

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
In probability theory, **a random variable `\(Y\)` is a variable whose value is unknown and is generated by a random event.**

- `\(Y\)` is uncertain.
- We can describe `\(Y\)` by talking about the **probabilities** of different values being taken on by `\(Y\)`.
]

--

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
**Probabilities are at the very core of statistics**, because the ultimate aim of statistics is to quantify WHAT?
]

---

layout: true

## Probabilities

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**Probability**

- Of an event occurring or of a random variable taking on a specific value.
- Probabilities can only be **between 0 and 1**.
- ⛔️ 0 means **impossible**.
- 🤷 0.5 means **it can happen but it can also not happen**.
- ✅ 1 means **certain**.
]

--

.center[
![:scale 50%](../../img/probabilities.png)
]

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**Probability**

- Probability of an event occurring: 0 to 100% probability.
- **Probability of a random variable taking on a value**: a bit more complicated...
]

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
**We need probability distributions!**
]

---

layout: false
layout: true

## Probability distributions

---

<br>

![](../../img/grubabilities.png)

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
A **probability distribution** is a mathematical function that describes *how the probabilities are distributed over the values* that a random variable can take on.
]

--

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
Two types of probability distributions

- **Discrete probability distributions.**
- **Continuous probability distributions.**
]

--

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
**We talked about discrete and continuous variables in Week 2!**

Discrete variables follow discrete probability distributions and continuous variables follow continuous probability distributions.
]

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
We can visualise probability distributions:

- Using the **probability mass function** for discrete probability distributions.
- Using the **probability density function** for continuous probability distributions.
]

--

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
You don't need to understand the math behind this, but you are free to learn about it through the internet search engine of your choice!
]

---

**Probability Mass Function**

<br>

.center[
![:scale 50%](../../img/dice.png)
]

---

**Probability Density Function**

<img src="index_files/figure-html/vot-dens-1.png" width="60%" style="display: block; margin: auto;" />

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**Probability distributions can be expressed by a set of parameters.**

- We summarise a probability distribution with a **set of parameters**.
- In mathematical notation: `\(\Theta = (\theta_1, ..., \theta_n)\)`
- Different (sub-)types of probability distributions have a different number of parameters and different parameters.
]

--

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
The **Gaussian probability distribution** is a continuous probability distribution and it has two parameters:

- The mean `\(\mu\)`.
- The standard deviation `\(\sigma\)`.
]

Go to **[Seeing Theory](https://seeing-theory.brown.edu/probability-distributions/index.html#section2)**.

???

Seeing Theory was created by Daniel Kunin while an undergraduate at Brown University. The goal of this website is to make statistics more accessible through interactive visualizations (designed using Mike Bostock’s JavaScript library D3.js).

<https://seeing-theory.brown.edu/index.html#3rdPage>

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
In research, you try to **estimate** the probability distribution of the variable of interest (VOT, number of telic verbs, informativity score, acceptability ratings, ...).

- In other words, you are trying to **estimate the parameters** of the probability distribution.
]

--

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
Now, let's talk a bit about ontology...
]

---

.bg-washed-yellow.b--orange.ba.bw2.br3.shadow-5.ph4.mt2[
**Frequentist view**

- The parameters are **fixed** (they are *unknown but certain*).
- They take on a specific value.
]

--

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
**Bayesian view**

- The parameters are **random variables** (they are *unknown and uncertain*).
- We describe each parameter as a probability distribution.
- And each parameter's probability distribution is described by a set of parameters (called hyper-parameters).
]

---

layout: false
background-image: url(https://media.giphy.com/media/443pAv9m6Ti8KiCoAi/giphy.gif)

---

layout: true

## Albanian VOT

---

<img src="index_files/figure-html/alb-vot-1-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/alb-vot-2-1.png" width="60%" style="display: block; margin: auto;" />

--

Based on the sample (N = 24): mean VOT = 12 ms, with SD = 3 ms.

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
**Sample of 24 VOT values of Albanian voiceless stops**.

- Sample mean = 11.6 ms.
- Sample SD = 2.8 ms.
]

--

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
Let's assume VOT values are **distributed according to a Gaussian distribution**.

- The VOT values we sampled are generated by a Gaussian distribution with mean `\(\mu\)` and standard deviation `\(\sigma\)`.
]

--

<br>

.f3[
`$$vot \sim Gaussian(\mu, \sigma)$$`
]

---

<img src="index_files/figure-html/alb-vot-dist-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/alb-vot-dist-2-1.png" width="60%" style="display: block; margin: auto;" />

--

We want to make inferences about the population based on the sample.

---

.f3[
`$$vot \sim Gaussian(\mu, \sigma)$$`
]

**Read as**: VOT values (`\(vot\)`) are distributed according to (`\(\sim\)`) a Gaussian distribution (`\(Gaussian()\)`) with mean `\(\mu\)` and standard deviation `\(\sigma\)`.

---

.f3[
`$$vot \sim Gaussian(\mu, \sigma)$$`
]

**Parameters**: mean `\(\mu\)` and SD `\(\sigma\)`.

--

.f3[
`$$\mu = ...$$`

`$$\sigma = ...$$`
]

---

.f3[
`$$vot \sim Gaussian(\mu, \sigma)$$`
]

**Parameters**: mean `\(\mu\)` and SD `\(\sigma\)`.

.f3[
`$$\mu = 11.6$$`

`$$\sigma = 2.8$$`
]

--

<br>

**BUT**, this does not take into consideration the **uncertainty and variability** of the sampling procedure.

---

.f3[
`$$vot \sim Gaussian(\mu, \sigma)$$`
]

**Parameters**: mean `\(\mu\)` and SD `\(\sigma\)`.

.f3[
`$$\mu \sim Gaussian(\mu_1, \sigma_1)$$`
]

**Hyperparameters**: mean `\(\mu_1\)` and SD `\(\sigma_1\)`.

.f3[
`$$\sigma = ...$$`
]

--

Standard deviations are always positive! So we need a truncated Gaussian distribution (only the positive half!).

---

.f3[
`$$vot \sim Gaussian(\mu, \sigma)$$`
]

**Parameters**: mean `\(\mu\)` and SD `\(\sigma\)`.

.f3[
`$$\mu \sim Gaussian(\mu_1, \sigma_1)$$`
]

**Hyperparameters**: mean `\(\mu_1\)` and SD `\(\sigma_1\)`.

.f3[
`$$\sigma \sim TruncGaussian(\mu_2, \sigma_2)$$`
]

**Hyperparameters**: mean `\(\mu_2\)` and SD `\(\sigma_2\)`.

---

layout: false
layout: true

## Estimating probability distributions

---

.pull-left[
<img src="index_files/figure-html/norm-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="index_files/figure-html/hnorm-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
- `\(vot \sim Gaussian(\mu, \sigma), \mu \sim Gaussian(\mu_1, \sigma_1), \sigma \sim TruncGaussian(\mu_2, \sigma_2)\)`
- We need to estimate:
  - `\(\mu_1\)` and `\(\sigma_1\)` for the Gaussian probability distribution of `\(\mu\)`.
  - `\(\mu_2\)` and `\(\sigma_2\)` for the truncated Gaussian probability distribution of `\(\sigma\)`.
]

--

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
With our sample of N = 24 we want to make inferences about the population of VOT values of Albanian voiceless stops, by estimating the four hyperparameters `\(\mu_1\)`, `\(\sigma_1\)`, `\(\mu_2\)` and `\(\sigma_2\)`.
]

---

```r
# Attach the brms package
library(brms)

# Run a Bayesian model
vot_bm <- brm(
  # This is the formula of the model.
  vot ~ 1,
  # This is the probability distribution family.
  family = gaussian(),
  # And the data.
  data = alb_vot_vl
)
```

---

.f2.center[
`vot ~ 1`
]

**Read as**: Model VOT values (`vot`) as a function of (`~`) the mean (`1`).

.f7[We will see later that `1` is also called the *Intercept*.]

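A minimal sketch of why `1` stands for the mean. It uses base R's `lm()` (least squares, not the Bayesian `brm()` used in these slides) and made-up values: `vot_example` is hypothetical, not the Albanian data. The intercept of an intercept-only model is simply the sample mean.

```r
# Hypothetical VOT values in ms, made up for illustration (not the Albanian data).
vot_example <- c(9.5, 12.1, 14.3, 10.8, 11.9, 13.0)

# Intercept-only model fitted with least squares (lm), not with brm().
vot_lm <- lm(vot_example ~ 1)

# The model has a single coefficient: the intercept.
vot_intercept <- unname(coef(vot_lm)[1])

# The intercept and the sample mean coincide (up to floating-point error).
intercept_is_mean <- isTRUE(all.equal(vot_intercept, mean(vot_example)))
print(intercept_is_mean)
```
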
--

.f2.center[
`family = gaussian()`
]

**Read as**: using a Gaussian probability distribution.

- The Gaussian distribution also has another parameter, the SD.

--

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt2[
**Altogether**: Model VOT values as a function of the mean and standard deviation of a Gaussian probability distribution.
]

---

```
##  Family: gaussian
##   Links: mu = identity; sigma = identity
## Formula: vot ~ 1
##    Data: alb_vot_vl (Number of observations: 24)
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
##
## Population-Level Effects:
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    11.62      0.59    10.48    12.78 1.00     2559     2158
##
## Family Specific Parameters:
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     2.88      0.44     2.18     3.88 1.00     2327     2013
##
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

---

```
##  Family: gaussian
## Formula: vot ~ 1
##    Data: alb_vot_vl (Number of observations: 24)
##
## Population-Level Effects:
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    11.62      0.59    10.48    12.78 1.00     2559     2158
##
## Family Specific Parameters:
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     2.88      0.44     2.18     3.88 1.00     2327     2013
```

---

```
##  Family: gaussian
## Formula: vot ~ 1
##    Data: alb_vot_vl (Number of observations: 24)
```

---

layout: false
layout: true

## Estimating the mean

---

```
## Population-Level Effects:
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    11.62      0.59    10.48    12.78 1.00     2559     2158
```

--

<br>
<br>

.pull-left[
- **Intercept**: the mean `\(\mu\)`.
- **Estimate**: `\(\mu_1 = 11.62\)` ms.
- **Est.Error**: `\(\sigma_1 = 0.59\)` ms.
]

.pull-right[
`$$vot \sim Gaussian(\mu, \sigma)$$`

.purple[
`$$\mu \sim Gaussian(\mu_1, \sigma_1)$$`
]

`$$\sigma \sim TruncGaussian(\mu_2, \sigma_2)$$`
]

---

<img src="index_files/figure-html/vot-int-p-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/vot-int-p-2-1.png" width="60%" style="display: block; margin: auto;" />

--

There is a 95% probability that mean VOT is between 10.48 and 12.78 ms.

---

layout: false
layout: true

## Estimating the standard deviation

---

```
## Family Specific Parameters:
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     2.88      0.44     2.18     3.88 1.00     2327     2013
```

--

<br>
<br>

.pull-left[
- **sigma**: the SD `\(\sigma\)`.
- **Estimate**: `\(\mu_2 = 2.88\)` ms.
- **Est.Error**: `\(\sigma_2 = 0.44\)` ms.
]

.pull-right[
`$$vot \sim Gaussian(\mu, \sigma)$$`

`$$\mu \sim Gaussian(\mu_1, \sigma_1)$$`

.purple[
`$$\sigma \sim TruncGaussian(\mu_2, \sigma_2)$$`
]
]

---

<img src="index_files/figure-html/vot-sig-p-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/vot-sig-p-2-1.png" width="60%" style="display: block; margin: auto;" />

--

There is a 95% probability that VOT standard deviation is between 2.18 and 3.88 ms.

---

layout: false

## Putting mean and SD together

<img src="index_files/figure-html/vot-bm-pp-1.png" width="60%" style="display: block; margin: auto;" />

---

## Summary

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
- A random variable `\(Y\)` is a variable whose value is unknown and is generated by a random event.
- A **probability distribution** is a mathematical function that describes *how the probabilities are distributed over the values* that a random variable can take on.
  - **Discrete probability distributions.**
  - **Continuous probability distributions.**
- The Gaussian distribution has two parameters: mean `\(\mu\)` and SD `\(\sigma\)`.
- We can describe `\(\mu\)` and `\(\sigma\)` as probability distributions and estimate the (hyper-)parameters of those probability distributions.
- R package [brms](https://paul-buerkner.github.io/brms/), function `brm()`.
]
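
---

## Sketch: from Estimate and Est.Error to a 95% interval

A rough sketch of how the hyperparameters of `\(\mu\)` relate to the 95% interval in the model summary. It assumes that the distribution of `\(\mu\)` is exactly `\(Gaussian(\mu_1, \sigma_1)\)`, with the values copied from the Intercept row; the names `mu_1`, `sigma_1` and `mu_interval` are only illustrative. brms computes its intervals from the posterior draws, so the numbers come close to the reported 10.48 and 12.78 but need not match them exactly.

```r
# Values copied from the Intercept row of the model summary.
mu_1 <- 11.62    # Estimate
sigma_1 <- 0.59  # Est.Error

# Assumption: treat the distribution of the mean as exactly Gaussian(mu_1, sigma_1).
# The 2.5% and 97.5% quantiles bound the central 95% of that distribution.
mu_interval <- qnorm(c(0.025, 0.975), mean = mu_1, sd = sigma_1)

# Prints roughly 10.46 and 12.78.
print(round(mu_interval, 2))
```

The same shortcut works less well for `\(\sigma\)`, whose reported interval (2.18 to 3.88) is not symmetric around the estimate of 2.88.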