Advanced Statistical analysis – Bivariate data analysis

This lesson comprises two master classes focusing on:

  • Classifying data
  • Organising and representing data
  • Measures of central tendency and spread
  • Bivariate scatterplots
  • Lines of best fit
  • Pearson's correlation coefficient
  • Describing bivariate data sets
  • Limitations of interpolation and extrapolation

Content:

MA-S2.1


  • Classify data relating to a single random variable
  • Organise, interpret and display data into appropriate tabular and/or graphical representations including Pareto charts, cumulative frequency distribution tables or graphs, parallel box-plots and two-way tables
    • compare the suitability of different methods of data presentation in real-world contexts
  • Summarise and interpret grouped and ungrouped data through appropriate graphs and summary statistics
  • Calculate measures of central tendency and spread, investigate their suitability in real-world contexts, and use them to compare large datasets
    • investigate real-world examples from the media illustrating appropriate and inappropriate uses or misuses of measures of central tendency and spread
  • Identify outliers and investigate and describe the effect of outliers on summary statistics
    • use different approaches for identifying outliers, for example consideration of the distance from the mean or median, or the use of values below \( Q_1 - 1.5 \times IQR \) and above \( Q_3 + 1.5 \times IQR \) as criteria, recognising and justifying when each approach is appropriate
    • investigate and recognise the effect of outliers on the mean, median and standard deviation
  • Describe, compare and interpret the distributions of graphical displays and/or numerical datasets and report findings in a systematic and concise manner
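
As an illustration of the outlier criteria listed above, the following is a minimal sketch in Python. The dataset `scores` and the helper name `iqr_outliers` are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption; calculators and other software may use a slightly different quartile method and so give slightly different fences.

```python
import statistics

def iqr_outliers(data):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 x IQR criterion."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    # method="inclusive" is an assumed convention, not the only valid one.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return lower, upper, outliers

# Hypothetical dataset with one unusually large value.
scores = [12, 14, 15, 15, 16, 17, 18, 19, 45]
lower, upper, outliers = iqr_outliers(scores)

# Compare summary statistics with and without the flagged values.
trimmed = [x for x in scores if x not in outliers]
print("fences:", lower, upper, "outliers:", outliers)
print("mean:", statistics.mean(scores), "->", statistics.mean(trimmed))
print("median:", statistics.median(scores), "->", statistics.median(trimmed))
print("st dev:", statistics.stdev(scores), "->", statistics.stdev(trimmed))
```

With this data, removing the flagged value moves the mean far more than the median, which is the kind of effect the dot points on outliers ask students to investigate.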

 

MA-S2.2


  • Construct a bivariate scatterplot to identify patterns in the data that suggest the presence of an association
  • Use bivariate scatterplots (constructing them where needed) to describe the patterns, features and associations of bivariate datasets, justifying any conclusions
    • describe bivariate datasets in terms of form (linear/non-linear) and in the case of linear, also the direction (positive/negative) and strength of association (strong/moderate/weak)
    • identify the dependent and independent variables within bivariate datasets where appropriate
    • describe and interpret a variety of bivariate datasets involving two numerical variables using real-world examples in the media or those freely available from government or business datasets
  • Calculate and interpret Pearson's correlation coefficient (r) using technology to quantify the strength of a linear association of a sample
  • Model a linear relationship by fitting an appropriate line of best fit to a scatterplot and using it to describe and quantify associations
    • fit a line of best fit to the data by eye and using technology
    • fit a least squares regression line to the data using technology
    • interpret the intercept and gradient of the fitted line
  • Use the appropriate line of best fit, both found by eye and by applying the equation of the fitted line, to make predictions by either interpolation or extrapolation
    • distinguish between interpolation and extrapolation, recognising the limitations of using the fitted line to make predictions, and interpolate from plotted data to make predictions where appropriate
  • Solve problems that involve identifying, analysing and describing associations between two numeric variables
  • Construct, interpret and analyse scatterplots for bivariate numerical data in practical contexts
    • demonstrate an awareness of issues of privacy and bias, ethics, and responsiveness to diverse groups and cultures when collecting and using data
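
The calculations behind Pearson's correlation coefficient, the least squares regression line, and the interpolation/extrapolation distinction can be sketched as follows. This is a minimal illustration only: the syllabus expects these values to be found using technology, and the data (`hours`, `marks`) and function names here are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired samples xs, ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least squares regression line of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

def predict(x, xs, gradient, intercept):
    """Predict y at x and label the prediction as interpolation or extrapolation."""
    # Interpolation: x lies within the range of the observed x-values.
    kind = "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"
    return gradient * x + intercept, kind

# Hypothetical data: hours of study (independent) and test mark (dependent).
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]

r = pearson_r(hours, marks)
gradient, intercept = least_squares_line(hours, marks)
inside = predict(2.5, hours, gradient, intercept)
outside = predict(10, hours, gradient, intercept)
print(f"r = {r:.3f}, line: y = {gradient:.1f}x + {intercept:.1f}")
print(inside, outside)
```

In this sketch the gradient is read as the estimated change in mark per extra hour of study, and the intercept as the estimated mark at zero hours. The prediction at 10 hours is labelled an extrapolation because it lies outside the observed range of 1 to 5 hours, so it should be treated with caution.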