class: center, middle, inverse, title-slide

.title[
# Statistics and Quantitative Methods (S1)
]
.subtitle[
## Week 2
]
.author[
### Dr Stefano Coretta
]
.institute[
### University of Edinburgh
]
.date[
### 2022/09/27
]

---

# Data visualisation

.center[
![](../../img/data-quant.png)
]

---

# Good data visualisation

Alberto Cairo has identified four common features of a good data visualisation:

1. It contains **reliable information**.

--

2. The design has been chosen so that relevant **patterns become noticeable**.

--

3. It is presented in an **attractive** manner, but appearance should not get in the way of **honesty, clarity and depth**.

--

4. When appropriate, it is organized in a way that **enables some exploration**.

???

Spiegelhalter, David. The Art of Statistics (Pelican Books) (pp. 64-66). Penguin Books Ltd. Kindle Edition.

---

# Quick poll

.f3[*Which of the following are not continuous measures, i.e. they are discrete?*]

<br>

.pull-left[
.f3[Join at]

.f1[slido.com]

.f1[\#4018 941]
]

.pull-right[
.center[
![](../../img/QR-SQM-1-Week-2.png)
]
]

???

Slido poll. <https://app.sli.do/event/dh4i7sfLzWJJFyUWZJZzJt>

---
layout:true

# Types of variables

---

.center[
![:scale 70%](../../img/discr-cont-probs.png)
]

---

.pull-left[
**Continuous variables**

Between any two values, there is an infinite set of other possible values.

- Duration of segments, words, pauses, ...
- Acoustic measurements, like formants and Centre of Gravity.
- Reaction times.
- Brain activity from EEG or MRI scans.
- Can you think of anything else?
]

--

.pull-right[
**Discrete variables**

Between any two consecutive values, there cannot be any other possible value.

- Counts of words, segments, gestures, f0 peaks, ...
- Binary outcomes like *yes/no*, *correct/incorrect*, *voiced/voiceless*, *real word/nonce word*, *L1/L2*, *mono/bilingual*, ...
- Categorical outcomes with more than two levels, like gender, languages, places of articulation, diagnosis, ...
- Ordered scales, like Likert scales and ratings, language attitude, ...
]

???

The continuous/discrete dichotomy is orthogonal to the numeric/categorical dichotomy:

- Counts are numeric but discrete.
- Durations cannot be negative but are numeric and continuous.

---
layout: false

# Quick poll

.f3[*Which of the following graphs are you familiar with?*]

<br>

.pull-left[
.f3[Join at]

.f1[slido.com]

.f1[\#4018 941]
]

.pull-right[
.center[
![](../../img/QR-SQM-1-Week-2.png)
]
]

???

Slido poll. <https://app.sli.do/event/dh4i7sfLzWJJFyUWZJZzJt>

---

# Bar chart

<img src="index_files/figure-html/unnamed-chunk-1-1.png" height="500px" style="display: block; margin: auto;" />

???

Bar charts are great for counts (of anything). The *x*-axis shows the AES levels, while the *y*-axis shows the number of languages per AES level.

---
layout: true

# Stacked bar chart

---

<img src="index_files/figure-html/aes-stack-1-1.png" height="500px" style="display: block; margin: auto;" />

???

In this plot I separated endangered vs non-endangered languages. Within the endangered languages I further show the counts of different AES levels.

---

<img src="index_files/figure-html/aes-stack-2-1.png" height="500px" style="display: block; margin: auto;" />

???

Here, the *x*-axis corresponds to the language macro-areas in the data. Within each bar, the counts for each of the AES levels are given.

---
layout: false

# Stacked proportion (filled) bar chart

<img src="index_files/figure-html/aes-filled-1.png" height="500px" style="display: block; margin: auto;" />

???

So far we have seen raw counts. What about proportions? You can show proportions by using a "filled" bar chart. Each bar is stretched so that it covers the entire range from 0 to 1.

Note that proportions are between 0 and 1, while percentages are between 0% and 100%.
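
As a rough sketch of where the segment heights in a filled bar chart come from, the Python snippet below divides each raw count by the total of its bar. The macro-area names, level labels and counts are made up for illustration and are not the data plotted in these slides.

```python
# Hypothetical counts of languages per level within each macro-area
# (illustrative numbers only, not the data shown in the slides).
counts = {
    "Area A": {"not endangered": 60, "threatened": 30, "extinct": 10},
    "Area B": {"not endangered": 25, "threatened": 50, "extinct": 25},
}


def to_proportions(level_counts):
    """Divide each count by the bar total, so the values of one bar sum to 1."""
    total = sum(level_counts.values())
    return {level: n / total for level, n in level_counts.items()}


proportions = {area: to_proportions(levels) for area, levels in counts.items()}

for area, props in proportions.items():
    for level, p in props.items():
        # A proportion lies between 0 and 1; multiply by 100 for a percentage.
        print(f"{area}  {level:15s} proportion={p:.2f}  percentage={p * 100:.0f}%")
```

Because every bar is rescaled by its own total, bars built from very different numbers of languages end up with the same height, which is what makes the proportions comparable across bars.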
---

# Dot matrix chart

<img src="index_files/figure-html/aes-matrix-1.png" height="500px" style="display: block; margin: auto;" />

---

# Mosaic plot

<img src="index_files/figure-html/aes-mosaic-1.png" height="500px" style="display: block; margin: auto;" />

---
layout: true

# Line plot

---

<img src="index_files/figure-html/ip-line-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/ip-point-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/ip-line-point-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/gest-line-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/gest-line-facet-1-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/gest-line-facet-2-1.png" height="500px" style="display: block; margin: auto;" />

---
layout: false
layout: true

# Connected dots plot

---

<img src="index_files/figure-html/gest-conn-1-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/gest-conn-2-1.png" height="500px" style="display: block; margin: auto;" />

---
layout: false
layout: true

# Strip chart

---

<img src="index_files/figure-html/pol-strip-f0-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-strip-hnr-1.png" height="500px" style="display: block; margin: auto;" />

---
layout: false
layout: true

# Density plot

---

<img src="index_files/figure-html/pol-dens-1-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-dens-2-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-dens-3-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-dens-4-1.png" height="500px" style="display: block; margin: auto;" />

---
layout: false
layout: true

# Violin plot

---

<img src="index_files/figure-html/pol-vio-1-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-vio-2-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-vio-3-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-vio-4-1.png" height="500px" style="display: block; margin: auto;" />

---
layout: false
layout: true

# Scatter plot

---

<img src="index_files/figure-html/pol-sca-1-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-sca-2-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-sca-3-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-sca-4-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-sca-5-1.png" height="500px" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/mald-1-1.png" height="500px" style="display: block; margin: auto;" />

---
layout: false
class: center middle reverse

# DO'S AND DON'TS

---
layout: true

# DO

---

<img src="index_files/figure-html/mald-bar-1-1.png" height="500px" style="display: block; margin: auto;" />

???

Bar charts should be used for discrete numeric variables, not for continuous variables.

---

<img src="index_files/figure-html/mald-bar-2-1.png" height="500px" style="display: block; margin: auto;" />

???

If you want to show proportions, instead of raw counts, use proportion bar charts (aka filled bar charts).

---

<img src="index_files/figure-html/mald-bar-3-1.png" height="500px" style="display: block; margin: auto;" />

???

To show proportions from multiple subjects/items, use strip charts.
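
A minimal sketch of how the points of such a strip chart could be computed is given below: one proportion per subject, obtained from trial-level binary outcomes. The subject labels and responses are hypothetical and only serve to illustrate the calculation.

```python
from collections import defaultdict

# Hypothetical trial-level data: (subject, response was correct).
trials = [
    ("s01", True), ("s01", True), ("s01", False), ("s01", True),
    ("s02", False), ("s02", True), ("s02", False), ("s02", False),
    ("s03", True), ("s03", True), ("s03", True), ("s03", True),
]


def proportion_by_subject(rows):
    """Return the proportion of correct responses for each subject."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for subject, is_correct in rows:
        totals[subject] += 1
        correct[subject] += int(is_correct)
    return {subject: correct[subject] / totals[subject] for subject in totals}


# One value per subject: these are the points a strip chart would display.
subject_props = proportion_by_subject(trials)

# A single mean collapses all subjects into one number.
mean_prop = sum(subject_props.values()) / len(subject_props)

print(subject_props)
print(round(mean_prop, 2))
```

In this made-up example the per-subject proportions range from 0.25 to 1, a spread that the single mean of about 0.67 does not reveal.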
---
layout: false

# DON'T

<img src="index_files/figure-html/mald-dont-1.png" height="500px" style="display: block; margin: auto;" />

???

Never ever ever use bar charts with error bars to show mean proportions. They are misleading:

- The bars do not indicate discrete numeric values: mean proportions are continuous variables.
- Error bars mask the true variability of the data: show raw proportions instead.

For more see: https://www.data-to-viz.com/caveat/error_bar.html, https://stats.stackexchange.com/questions/349422/does-it-make-sense-to-add-error-bars-in-a-bar-chart-of-frequencies/367889#367889

---

# DO

<img src="index_files/figure-html/pol-do-1.png" height="500px" style="display: block; margin: auto;" />

???

For continuous variables, like acoustic measures or reaction times, use violins with overlaid strip charts. You can include very narrow box plots, but remember that box plots mask variability in the raw data.

---

# DON'T

<img src="index_files/figure-html/pol-dont-1.png" height="500px" style="display: block; margin: auto;" />

???

Can you see what difference it makes to use box plots only?

---

# Summary

- Carefully think about which type of variable you are working with: **continuous or discrete**?

- The type of variable allows you to select appropriate types of plots. Your **go-to plots** are:
  - Bar charts (and variants).
  - Strip charts.
  - Line plots.
  - Density plots.
  - Violin plots.

- Be mindful of the **DOs and DON'Ts** of plotting.