Syllabus overview
This course is an introduction to quantitative data analysis, as commonly employed in linguistics, using the R software.
We will cover the following topics:
- The basics of quantitative data analysis.
- Data preparation.
- Data summaries.
- Principles of data visualisation.
- Statistical modelling with linear models.
- Statistical inference using Bayesian inference.
On completion of the course, you will be able to:
- Import common data formats, and tidy and transform data.
- Choose and report appropriate summary measures.
- Use compelling visualisations to communicate a specific message about patterns in the data.
- Master linear models for different types of data (continuous measures and binary outcomes).
- Use Bayesian inference to answer research questions and avoid common interpretation pitfalls.
Examples from different branches of linguistics will be used to provide you with hands-on experience in quantitative data analysis and Open Research practices.
Weekly plan
Week 1: Quantitative methods and uncertainty
- What is quantitative data analysis?
- What is the inference process?
- How can we talk about uncertainty and variability?
- What are the limits of quantitative methods?
- Think critically about statistics, uncertainty and variability.
- Use R to perform simple calculations.
- Master the basics of the programming language R.
- Use RStudio.
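As a small taste of the "simple calculations" skill above, here is a minimal sketch of R used as a calculator and of storing values in variables. The variable names and the numbers are invented for illustration and are not taken from the workshop materials.

```r
# Arithmetic follows the usual order of operations.
2 + 3 * 4      # multiplication happens before addition
(2 + 3) * 4    # brackets change the order
2^10           # exponentiation

# Values are stored in variables with the assignment arrow `<-`.
# Hypothetical word durations in milliseconds.
word_durations <- c(310, 275, 402, 288)

n_words <- length(word_durations)   # how many values there are
total <- sum(word_durations)        # their sum

# Dividing the sum by the count gives the same value as mean().
total / n_words
mean(word_durations)
```

Running the lines one at a time in the RStudio console is a good way to see what each one returns.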
Intake form
- You must complete the intake form before coming to the Tuesday lecture.
- The link to the form will be circulated via Learn.
Install R and RStudio
- If you wish to use your own laptop for the workshops, instead of the Lab PCs, you need to install both R and RStudio.
- NOTE: If you have installed either R or RStudio prior to January 2023, please make sure you delete both R and RStudio from your laptop.
- Please follow the instructions for Steps 1 and 2 here:
Main textbooks
- Statistics for Linguists with R, by Bodo Winter (S4LR) Ch. 1. [via library]
- R for Data Science (R4DS) Ch. 1, Ch. 2. [online book]
- Statistical (Re)thinking, by Richard McElreath (SReT), Ch. 1. [via library]
- Darwin Holmes 2020. Researcher Positionality - A Consideration of Its Influence and Place in Qualitative Research - A New Researcher Guide
- Jafar 2018. What is positionality and should it be expressed in quantitative studies?
From the lecture
- Silberzahn et al. 2018. Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results
- Coretta et al. (in principle). Multidimensional signals and analytic flexibility: Estimating degrees of freedom in human speech analyses
- Cumming 2014. The New Statistics: Why and How
- Kruschke and Liddell 2018. The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective
Week 2: Data wrangling
- What are the types of statistical variables?
- Which summary measures are appropriate for which types of variables?
- What are common measures of central tendency?
- What are common measures of dispersion?
- Organise files efficiently.
- Import tabular data in R.
- Obtain mean, median, mode, range and standard deviation.
- Use R scripts to save and reuse code.
- Lecture slides.
- Workshop tutorial.
- Workshop files (right-click and download):
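To illustrate the Week 2 summary measures (this is a separate sketch, not one of the workshop files), the code below computes the mean, median, mode, range and standard deviation of a small numeric vector. The reaction times are made up, and `stat_mode()` is a helper name introduced here: base R has a function called `mode()`, but it reports how a value is stored rather than the most frequent value.

```r
# Hypothetical reaction times in milliseconds (invented for illustration).
rt <- c(420, 455, 455, 480, 510, 530, 610)

mean(rt)          # arithmetic mean
median(rt)        # middle value of the sorted data
range(rt)         # minimum and maximum
diff(range(rt))   # range expressed as a single number
sd(rt)            # sample standard deviation

# Helper for the statistical mode of a numeric vector.
# It returns every value tied for the highest count.
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

stat_mode(rt)
```

The mean and standard deviation suit continuous variables like these, while the mode is also meaningful for categorical ones.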
Week 3: Data visualisation
- What are the principles of good data visualisation?
- Which are the main components of a plot?
- Which are the appropriate plots for different types of data?
- How can we visualise uncertainty?
- Create common types of plots with ggplot2.
- Use colour and shape to effectively convey meaning.
- Describe a plot in writing and comment on observable patterns.
- Filter and mutate data.
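The skills listed above can be sketched together in a few lines: filter rows, create a new column, then plot with ggplot2 using colour and shape to mark a grouping variable. The data frame `vowels` and its values are invented for illustration, and the example assumes the dplyr and ggplot2 packages are installed and that R is version 4.1 or later (for the `|>` pipe).

```r
library(dplyr)
library(ggplot2)

# Hypothetical vowel durations, invented for illustration.
vowels <- data.frame(
  vowel = c("a", "a", "i", "i", "u", "u"),
  stress = c("stressed", "unstressed", "stressed",
             "unstressed", "stressed", "unstressed"),
  duration_ms = c(142, 98, 110, 74, 118, 80)
)

# Keep only /a/ and /i/, then add a column with duration in seconds.
vowels_s <- vowels |>
  filter(vowel != "u") |>
  mutate(duration_s = duration_ms / 1000)

# Map stress to both colour and shape so the groups stay
# distinguishable in greyscale print as well.
p <- ggplot(vowels_s, aes(x = vowel, y = duration_s,
                          colour = stress, shape = stress)) +
  geom_point(size = 3) +
  labs(x = "Vowel", y = "Duration (s)",
       colour = "Stress", shape = "Stress")

p
```

A written description of this plot might note that, in these made-up data, stressed vowels are longer than unstressed ones for both vowel qualities.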
Set up your laptop
From Week 4 on, we will need to use software that unfortunately is not currently working on the Lab PCs. This means you will need to bring your own laptop to the workshop. Moreover, you will have to install software prior to the workshop.
Note that it will take up to an hour to install everything and you might encounter errors, so please do this as soon as possible. DO NOT WAIT UNTIL NEXT WEEK.
You can find the full instructions here: set-up instructions.
- Lecture slides.
- Workshop tutorial.
- Workshop files (right-click and download):
Main textbooks
From the lecture
- Spiegelhalter 2020. The Art of Statistics: Learning from Data.
- Fundamentals of Data Visualisation.
- Data viz catalogues
- Tutorials
- Colour
- Caveats
Week 4: Statistical modelling basics
- What are probability distributions?
- How can we describe probability distributions with statistical parameters?
- What are the frequentist and Bayesian views of statistical parameters?
- How can we estimate parameters using statistical models?
- Transform data by creating new columns and filtering based on specific values.
- Use logical operators to transform data.
- Fit a statistical model to estimate the mean and standard deviation of a Gaussian variable.
- Interpret the summary of the model and understand the meaning of the reported estimates.
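The idea of estimating the mean and standard deviation of a Gaussian variable with a model can be sketched with an intercept-only model. Note the assumptions: this sketch uses base R's `lm()`, which is a least-squares stand-in chosen so the example runs without extra software; it is not necessarily the tool used in the workshop, and it returns point estimates rather than posterior distributions. The simulated variable `f0` and its parameter values are invented.

```r
set.seed(42)

# Simulate a Gaussian variable with known parameters (arbitrary values).
f0 <- rnorm(200, mean = 180, sd = 25)

# Logical operators select values that satisfy a condition.
mid_range <- f0[f0 > 150 & f0 < 200]
length(mid_range)

# Intercept-only model: `f0 ~ 1` means "f0 has one overall mean".
m <- lm(f0 ~ 1)

# The intercept is the estimated mean of the variable.
coef(m)[["(Intercept)"]]

# The residual standard deviation is the estimated standard deviation.
sigma(m)

# The summary also reports the standard error of the mean estimate.
summary(m)
```

Because the data were simulated, the estimates can be compared with the true values (180 and 25) to get a feel for how sampling variability affects them.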
Set up your laptop
From Week 4 on, we will need to use software that unfortunately is not currently working on the Lab PCs. This means you will need to bring your own laptop to the workshop. Moreover, you were asked to install software prior to the workshop.
In the tragic event you did not go through the installation instructions last week, DO IT NOW AT YOUR OWN PERIL. I won’t be able to help you with the installation during the workshop.
You can find the full instructions here: set-up instructions.
- Lecture slides.
- Workshop tutorial.
- Workshop files (right-click and download):
Week 5: Categorical predictors
DUE on Thursday 16 February at NOON.
Formative assessment 1 requires you to read, mutate and plot a given data set. You can preview the formative instructions here:
We will use GitHub Classroom for all assessments. For an overview of how GitHub Classroom works, watch these videos.
The GitHub Classroom invitation link can be found on Learn.
- How do we model variables using categorical predictors?
- Which are the most common coding systems for categorical predictors?
- How do we interpret the model output when there are categorical predictors?
- How can we quickly check model goodness?
- Master contrast coding in R for categorical predictors.
- Understand treatment coding.
- Fit, interpret and plot models with a categorical predictor.
- Report model specification and results.
- Lecture slides.
- Workshop tutorial.
- Workshop files (right-click and download):
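To make treatment coding concrete (again a separate sketch, not one of the workshop files), the code below fits a model with one two-level categorical predictor. The data are invented, and `lm()` is used as a simple stand-in that may differ from the modelling function used in the workshop. With R's default treatment coding, the intercept is the mean of the reference level and the other coefficient is the difference from that level.

```r
# Hypothetical reaction times in two conditions (invented values).
d <- data.frame(
  condition = factor(c("control", "control", "control",
                       "primed", "primed", "primed")),
  rt = c(520, 540, 560, 470, 480, 490)
)

# Treatment (dummy) coding is the default for unordered factors:
# the reference level "control" is coded 0, "primed" is coded 1.
contrasts(d$condition)

m <- lm(rt ~ condition, data = d)
coef(m)
# (Intercept)      = mean of control          = 540
# conditionprimed  = mean(primed) - mean(control) = 480 - 540 = -60
```

The reference level is the first factor level, which is alphabetical by default; `relevel()` changes it if a different baseline is easier to interpret.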
Main textbooks
- R4DS Ch. 15. [online book]
- S4LR Ch. 7. [via library]
- SReT Sec. 5.3. [via library]
Flexible Learning Week
There is no homework as such, so take the time to revise the materials and/or catch up with the previous weeks’ materials.
There will be no classes.
Week 6: Lecture and tutorials cancelled
DUE on Thursday 2 March at NOON.
F2 requires you to find a data table and fit a model with one continuous outcome and one categorical predictor. You can preview the instructions here:
From the .Rmd file, you will render a PDF and submit it over Turnitin. You can find the link to Turnitin on Learn under Assessment > Formative Assessment F2.
Week 7: Binary outcomes
- How can we visualise proportions of binary outcomes (yes/no, correct/incorrect, …)?
- Which distribution do binary outcomes follow?
- What is the relationship between probabilities and log-odds?
- How do we interpret log-odds and odds?
- Plot binary data as proportions in ggplot2.
- Pivot data from wide to long with tidyr.
- Fit, interpret and plot linear models with binary outcome variables, using the Bernoulli distribution family.
- Convert between log-odds, odds and probabilities.
- Lecture slides.
- Workshop tutorial.
- Workshop files (right-click and download):
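The conversion between probabilities, odds and log-odds mentioned in the skills above can be sketched with a few lines of base R (this is an illustration, not one of the workshop files). The starting probability is arbitrary. `qlogis()` and `plogis()` are the built-in logit and inverse-logit functions.

```r
p <- 0.8                   # a probability, e.g. of a correct response

# Probability -> odds -> log-odds, step by step.
odds <- p / (1 - p)        # 0.8 / 0.2, i.e. odds of about 4 to 1
log_odds <- log(odds)

# The same conversion in one step.
qlogis(p)

# Log-odds -> odds and log-odds -> probability.
exp(log_odds)              # back to the odds
plogis(log_odds)           # back to the probability

# Log-odds of 0 correspond to a probability of 0.5.
plogis(0)
```

Models for binary outcomes report estimates on the log-odds scale, so `plogis()` is the usual way to turn an estimate back into a probability.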
Week 8: Multiple predictors and interactions
- What is a factorial design?
- How do we estimate and interpret the effects of multiple predictors?
- How do we deal with situations where one predictor’s effect differs depending on the value of another predictor?
- How can such interactions between predictors be built into our models?
- How do we interpret model estimates of interactions?
- Run and interpret models with multiple predictors.
- Interpret interactions between two predictors.
- Plot posterior and conditional probabilities from models with interactions.
- Practise transforming and back-transforming variables.
- Lecture slides.
- Workshop tutorial.
- Workshop files (right-click and download):
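As a separate illustration of how an interaction appears in model estimates (not one of the workshop files), the sketch below fits a model to a 2 x 2 factorial design with one invented value per cell. `lm()` is used as a simple stand-in and may differ from the workshop's modelling function. With treatment coding, the interaction coefficient is a difference of differences.

```r
# A 2 x 2 design: expand.grid() varies the first factor fastest,
# giving rows (L1, easy), (L2, easy), (L1, hard), (L2, hard).
d <- expand.grid(
  group = factor(c("L1", "L2")),
  condition = factor(c("easy", "hard"))
)

# Hypothetical cell means, invented for illustration.
d$score <- c(80, 70, 75, 50)

# `group * condition` expands to both main terms plus their interaction.
m <- lm(score ~ group * condition, data = d)
coef(m)
# (Intercept)            = L1, easy                     = 80
# groupL2                = L2 - L1 when easy            = 70 - 80 = -10
# conditionhard          = hard - easy for L1           = 75 - 80 = -5
# groupL2:conditionhard  = (50 - 75) - (70 - 80)        = -15
```

The `groupL2` estimate here is the group difference in the easy condition only; the interaction term says how much that difference changes in the hard condition.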
Data analysis report
Given a data table and a research question, submit a short and concise data analysis report, including summary measures, plots, and a linear model. You can download everything you need from here:
From the .Rmd file, you will render a PDF and submit it over Turnitin. You can find the link to Turnitin on Learn under Assessment > Formative Assessment F3.
Week 9: Continuous predictors and interactions
- How do we model predictors that aren’t categorical, but continuous?
- How do we interpret model estimates for continuous predictors?
- How do we fit and interpret interactions involving continuous predictors?
- Centre continuous predictors.
- Run and interpret models with continuous predictors.
- Interpret interactions that are categorical * continuous (in the lecture) and continuous * continuous (in the tutorial).
- Lecture slides.
- Workshop tutorial.
- Workshop files (right-click and download):
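Centring a continuous predictor, as listed in the skills above, can be sketched as follows (an illustration, not one of the workshop files). The data are simulated with arbitrary values, and `lm()` is a simple stand-in that may differ from the workshop's modelling function. Centring leaves the slope unchanged but moves the intercept to the predicted outcome at the mean of the predictor.

```r
set.seed(1)

# Hypothetical log word frequency and reaction times (simulated).
freq <- runif(50, min = 1, max = 7)
rt <- 700 - 30 * freq + rnorm(50, sd = 20)

# Centre the predictor by subtracting its mean.
freq_c <- freq - mean(freq)

m_raw <- lm(rt ~ freq)      # intercept = predicted rt when freq is 0
m_cen <- lm(rt ~ freq_c)    # intercept = predicted rt at mean freq

coef(m_raw)
coef(m_cen)

# With a single centred predictor, the intercept equals the mean of rt.
mean(rt)
```

A frequency of 0 lies outside the simulated range, so the uncentred intercept describes a case that never occurs in the data, which is why the centred version is easier to interpret.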
Thursday 30 March
Due on Thursday 30 March at noon
The first summative contains a series of guided exercises. You can find the summative materials and exercises here:
You may find helpful information in the FAQ post on Piazza here.
Thursday 27 April
Due on Thursday 27 April at noon
In the second summative assessment, you will:
- select a dataset and its associated research questions,
- analyse the data using one linear model, and
- write a report about the data, the model, and your findings.
You can find more information about the assessment on GitHub here: