
Standard Statistical analysis – Data analysis

This lesson comprises three (3) master classes focusing on:

  • Data collection methods
  • Classifying and organising data
  • Data spread and central tendency
  • Data interpretation and communication

Content:

MS-S1.1


  • Describe and use appropriate data collection methods for a population or a sample
    • investigate whether a sample obtained from a population may or may not be representative of the population by considering different kinds of sampling methods: systematic sampling, self-selected sampling, capture-recapture, simple random sampling and stratified sampling
    • investigate the advantages and disadvantages of each type of sampling
    • describe the potential faults in the design and practicalities of data collection processes, eg surveys, experiments and observational studies, misunderstandings and misrepresentations, including examples from the media
  • Classify data relating to a single random variable
    • classify a categorical variable as either ordinal, eg income level (low, medium, high) or nominal, eg place of birth (Australia, overseas)
    • classify a numerical variable as either discrete, eg the number of rooms in a house, or continuous, eg the temperature in degrees Celsius
  • Review how to organise and display data into appropriate tabular and/or graphical representations
    • display categorical data in tables and, as appropriate, in bar charts or Pareto charts
    • display numerical data as frequency distribution tables and histograms, cumulative frequency distribution tables and graphs, dot plots and stem and leaf plots (including back-to-back where comparing two datasets)
    • construct and interpret tables and graphs related to real-world contexts, including: motor vehicle safety including driver behaviour, accident statistics, blood alcohol content over time, running costs of a motor vehicle, costs of purchase and insurance, vehicle depreciation, rainfall, hourly temperature, household and personal water usage
  • Interpret and compare data by considering it in tabular and/or graphical representations
    • choose appropriate tabular and/or graphical representations to enable comparisons
    • compare the suitability of different methods of data presentation in real-world contexts, including their visual appeal, eg a heat map to illustrate climate change data or the median house prices across suburbs
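
The sampling methods listed under MS-S1.1 can be sketched in a few lines of Python. This is an illustration only, not part of the syllabus: the function names, the fixed random seed and the proportional allocation used for stratified sampling are all assumptions made for the example. The capture-recapture estimate assumes the proportion of tagged individuals in the second catch matches the proportion tagged in the whole population.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Simple random sampling: every member is equally likely to be chosen."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    return rng.sample(population, n)


def systematic_sample(population, n, seed=0):
    """Systematic sampling: random starting point, then every k-th member."""
    k = len(population) // n  # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]


def stratified_sample(strata, fraction, seed=0):
    """Stratified sampling: take the same fraction from each stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        size = round(len(members) * fraction)
        sample.extend(rng.sample(members, size))
    return sample


def capture_recapture_estimate(tagged_first, caught_second, tagged_in_second):
    """Estimate population size N from
    tagged_first / N = tagged_in_second / caught_second."""
    return tagged_first * caught_second / tagged_in_second
```

For instance, if 50 fish are tagged and released, and a later catch of 40 fish contains 8 tagged fish, `capture_recapture_estimate(50, 40, 8)` gives an estimated population of 250.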

 

MS-S1.2


  • Describe the distinguishing features of a population and sample
    • define notations associated with population values (parameters) and sample-based estimates (statistics), including population mean \( \mu \), population standard deviation \( \sigma \), sample mean \( \bar{x} \) and sample standard deviation \( s \)
  • Summarise and interpret grouped and ungrouped data through appropriate graphs and summary statistics
    • discuss the mode and determine it where possible
    • calculate measures of central tendency, including the arithmetic mean and the median
    • investigate the suitability of measures of central tendency in real-world contexts and use them to compare datasets
    • calculate measures of spread including the range, quantiles (including quartiles, deciles and percentiles), interquartile range (IQR) and standard deviation (calculations for standard deviation only required using technology)
  • Investigate and describe the effect of outliers on summary statistics
    • use different approaches for identifying outliers, including consideration of the distance from the mean or median, or the use of \( Q_1 - 1.5 \times \text{IQR} \) and \( Q_3 + 1.5 \times \text{IQR} \) as criteria, recognising and justifying when each approach is appropriate
    • investigate and recognise the effect of outliers on the mean and median
  • Investigate real-world examples from the media illustrating appropriate and inappropriate uses or misuses of measures of central tendency and spread
  • Describe, compare and interpret the distributions of graphical displays and/or numerical datasets and report findings in a systematic and concise manner
    • identify modality (unimodal, bimodal or multimodal)
    • identify shape (symmetric or positively or negatively skewed)
    • identify central tendency, spread and outliers, using and justifying appropriate criteria
    • calculate measures of central tendency or measures of spread where appropriate
  • Construct and compare parallel box-plots
    • complete a five-number summary for different datasets
    • compare groups in terms of central tendency (median), spread (IQR and range) and outliers (using appropriate criteria)
    • interpret and communicate the differences observed between parallel box-plots in the context of the data
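
The summary statistics in MS-S1.2 can be illustrated with a short Python sketch. The dataset `scores` is made up for the example, and the function names are hypothetical. Quartiles are found here as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd), which is an assumption about the method; other quartile conventions give slightly different values.

```python
from statistics import mean, median, pstdev, stdev


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]          # values below the median position
    upper = values[half + n % 2:]  # values above the median position
    return (values[0], median(lower), median(values), median(upper), values[-1])


def outliers_by_fences(data):
    """Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


# Made-up dataset with one unusually large value
scores = [12, 15, 15, 17, 18, 20, 21, 22, 24, 60]

summary = five_number_summary(scores)   # (12, 15, 19.0, 22, 60)
outliers = outliers_by_fences(scores)   # [60]

# Effect of the outlier on the mean and the median
trimmed = [x for x in scores if x not in outliers]
mean_change = mean(scores) - mean(trimmed)        # about 4.2
median_change = median(scores) - median(trimmed)  # 1.0

# Sample standard deviation (s) versus population standard deviation (sigma)
s = stdev(scores)
sigma = pstdev(scores)
```

In this example, removing the outlier moves the mean from 22.4 to about 18.2 but moves the median only from 19 to 18, which shows why the median is often the preferred measure of central tendency for skewed data.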