class: center, middle, inverse, title-slide

.title[
# Statistics and Quantitative Methods (S1)
]
.subtitle[
## Week 1
]
.author[
### Dr Stefano Coretta
]
.institute[
### University of Edinburgh
]
.date[
### 2022/09/20
]

---

# Before we begin...

.pull-left[
.f3[Complete the **SATS-36 questionnaire**.]

.f3[https://www.soscisurvey.de/sqmf-sats/]
]

.pull-right[
.center[
![](../../img/QR-Code-sqmf-sats.svg)
]
]

???

Photo by Emily Morter on Unsplash

---

# News

- **SQM (Statistics and Quantitative Methods)** is an updated version of previous years' SED (Statistics and Experimental Design).

- The main learning objective is to enable you to **conduct basic data analyses** which can be applied to linguistic questions and data.

- We will focus on **modern techniques** of data handling and quantitative analysis.

---
class: middle

.pull-left[
![](../../img/Alice_par_John_Tenniel_25.png)
]

.pull-right[
.f3[For, you see, so many out-of-the-way things had happened lately, that Alice had begun to think that very few things indeed were really impossible.]

.right[—Lewis Carroll, Alice's Adventures in Wonderland]
]

???

"If I had a world of my own, everything would be nonsense. Nothing would be what it is, because everything would be what it isn't. And contrariwise, what is, it wouldn't be. And what it wouldn't be, it would. You see?"

.right[—Alice, from Disney's *Alice in Wonderland*]

---

# What (applied) statistics is NOT

- Statistics is not maths.

- Statistics is not about hard truths.

- Statistics is not a purely objective endeavour.

- Statistics is not a substitute for common sense and expert knowledge.

- Statistics is not equivalent to `\(p\)`-values and significance testing.

---
background-image: url(https://saffroninteractive.com/wp-content/uploads/2014/09/Knowledge-1024x576.png)
background-size: cover

???

What is knowledge? What does "knowledge" mean to you?

---
class: inverse
background-image: url(../../img/word-cloud.png)
background-size: cover

???
<!-- UPDATE IMAGE -->

Results from Slido poll.

Where does knowledge come from?

---
layout: true

# Sources of knowledge

---

.center[
![:scale 55%](../../img/sources-knowledge.png)
]

???

One way of classifying sources of knowledge:

- Authority
- Intuition
- Reason (logic)
- Observation

---

.center[
![:scale 55%](../../img/sources-knowledge-empirical.png)
]

???

Observations generate **empirical data**.

*Empirical* just means 'based on experience', where *experience* here means 'observation'.

How can we use observations/empirical data?

NOTE: Do not mix up empirical methods with experimental methods. Experiments are a kind of empirical method.

EXTRA: Check out the etymology of the word *empiric(al)* here: <https://en.wiktionary.org/wiki/empiric#English>

---
layout: false

# Research methods

.center[
![:scale 60%](../../img/res-methods.png)
]

---
layout: true

# Research process

---

.center[
![:scale 60%](../../img/res-process.png)
]

---

.center[
![:scale 60%](../../img/res-process-data-ana.png)
]

---
layout: true

# Data analysis

---

.center[
![:scale 80%](../../img/data-quant-qual.png)
]

---

.center[
![:scale 80%](../../img/data-quant.png)
]

---
layout: false
class: inverse right middle

.f2[The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning.]

— Nate Silver, *The Signal and the Noise*

---

# Inference process

.center[
![:scale 40%](../../img/bro.png)
]

???

Imbuing numbers with meaning is a good characterisation of the "inference process".

We have a question about something. Let's imagine that this something is the population of British Sign Language signers. We want to know whether the cultural background of the BSL signers is linked to different pragmatic uses of the sign for BROTHER.

But we can't survey the entire population of BSL signers.

NOTE: *Population* can be a set of anything, not just a specific group of people.
For example, the words in a dictionary can be a "population"; or the antipassive constructions of Austronesian languages; or ...

---

# Inference process

.center[
![:scale 75%](../../img/inference.png)
]

???

We take a **sample** from the population. This is our empirical data (the product of observation).

How do we go from data/observation to answering our question? We can use **inference**.

**Inference** is the process of understanding something about a population based on the sample (aka the data) taken from that population.

---
class: middle center inverse

# Inference is not infallible

???

However, inference based on data does not guarantee that the answers to our questions are right/true.

In fact, any observation we make comes with a degree of **uncertainty and variability**.

EXTRA: Check out this article: <https://www.scientificamerican.com/article/if-you-say-science-is-right-youre-wrong/>

EXTRA: Find out about Popper's view of falsification and fallibilism.

---

# Uncertainty and variability

.center[
![](../../img/pliny.jpg)
]

???

So we have to deal with:

- Uncertainty in any observation of a phenomenon.
- Variability among different observations of the same phenomenon.

---
class: center, middle, inverse

![](../../img/uncertainty.png)

???

Guess what it is...

---
class: center middle inverse

# *Statistics* as a tool to deal with *uncertainty* and *variability*

???

Statistics helps us quantify uncertainty and control for variability.

---
class: middle

.pull-left[
# What is statistics?
]

.pull-right[
![:scale 90%](../../img/charles-deluvio-Mv9hjnEUHR4-unsplash.jpg)
]

???

EXTRA: Check out the etymology of *statistics* here: <https://en.wiktionary.org/wiki/statistics#Etymology_1>.
Photo by <a href="https://unsplash.com/@charlesdeluvio?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Charles Deluvio</a> on <a href="https://unsplash.com/s/photos/dog?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Unsplash</a>

---

# Quick poll

.f3[*Which words come to your mind when you think of "statistics"?*]

<br>

.pull-left[
.f3[Join at]

.f1[slido.com]

.f1[\#4115 041]
]

.pull-right[
.center[
![](../../img/QR-SQM-1-Week-1.png)
]
]

???

Slido poll. <https://app.sli.do/event/7vnRtEYaWhpasrYTT4B11P>

---

# What is statistics?

.f3[
- Statistics is the **science** concerned with developing and studying methods for collecting, analyzing, interpreting and presenting empirical data. (From [UCI Department of Statistics](https://www.stat.uci.edu/what-is-statistics/))
]

--

.f3[
- Statistics is the **technology** of extracting information, illumination and understanding from data, often in the face of uncertainty. (From the [British Academy](https://www.thebritishacademy.ac.uk/blog/what-is-statistics/))
]

--

.f3[
- Statistics is a **mathematical and conceptual** discipline that focuses on the relation between data and hypotheses. (From the [Stanford Encyclopedia of Philosophy](https://plato.stanford.edu/entries/statistics/))
]

--

.f3[
- Statistics is the **art** of applying the science of scientific methods. (From [ORI Results](https://www.oriresults.com/articles/blog-posts/the-art-of-statistics/), [Nature](https://www.nature.com/articles/d41586-019-00898-0))
]

---
class: inverse right middle

.f2[*Statistics is both a science and an art*.]

.f4[It is a *science* in that its methods are basically systematic and have general application and an *art* in that their successful application depends, to a considerable degree, on the skill and special experience of the statistician, and on his knowledge of the field of application.]

—L. H. C.
Tippett

---
class: middle

.pull-left[
# But...<br>*all that glisters is not gold*
]

.pull-right[
![](../../img/gollum-statistician.png)
]

---
layout: true

# Many Analysts, One Data Set

---

.pull-left[
.center[
![:scale 60%](../../img/red-card.jpg)
]
]

.pull-right[
.f3[**Is there a link between player skin tone and number of red cards in soccer?**]

- **29 independent analysis teams**.
- 69% of the teams reported an effect, and 31% did not.
- **21 unique** types of statistical analysis.

<br>

> The observed results from analyzing a complex data set can be highly contingent on **justifiable**, but **subjective**, analytic decisions.

.right[—Silberzahn et al. 2018]
]

---

.pull-left[
.center[
![:scale 85%](../../img/forking-paths.png)
]
]

.pull-right[
.f3[**Do speakers acoustically modify utterances to signal atypical word combinations?**]

- **30 independent analysis teams**.
- **104** individual models.
- **52** unique measurement specifications.
- **47** unique model specifications.

<br>

> The observed results from analyzing a complex data set can be highly contingent on **justifiable**, but **subjective**, analytic decisions.

.right[—Coretta et al., *in-principle*]
]

---
layout: false
class: bottom
background-image: url(../../img/francesco-gallarotti-ruQHpukrN7c-unsplash.jpg)
background-size: cover

# THE "NEW STATISTICS"

???

Photo by <a href="https://unsplash.com/@gallarotti?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Francesco Gallarotti</a> on <a href="https://unsplash.com/s/photos/new?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Unsplash</a>

---

# The *New Statistics*

It addresses three problems ([Cumming 2014](https://doi.org/10.1177%2F0956797613504966)):

- Published research is a biased selection of all research.

--

- Data analysis and reporting are often selective and biased.

--

- In many research fields, studies are rarely replicated, so false conclusions persist.
<br>

--

This course will introduce you to the *New Statistics* incrementally.

---

# Course info

- Check the Learn website **and** <https://stefanocoretta.github.io/sqmf/>.

- Course announcements are sent via Learn.

- **ASK FOR HELP**: It is OK not to be OK, and remember that you are not alone.

  - Go to the PPLS UG or PG Hub (on SharePoint Online) > Support for students > Wellbeing and Health.

- Assessment:

  - **Formative assessments**: Monday and Thursday deadlines (FM and FT).

  - **Summative assessment**: Data Analysis Report, due on Thursday 8 December at noon (100% of the final mark).