Statistics — Study Notes for SOF IMO
Overview
Statistics is the branch of mathematics dealing with the collection, organization, analysis and interpretation of numerical data. For SOF IMO, this topic focuses on measures of central tendency (mean, median, mode) for both ungrouped and grouped data, along with graphical representation of data through bar graphs, histograms, frequency polygons and pie charts.
Statistics problems regularly appear in the Mathematical Reasoning section and occasionally in the Achievers Section where data interpretation meets real-world scenarios. Students must master quick calculation techniques for mean, median and mode, understand frequency distribution tables, and interpret various graphical formats. The topic bridges pure mathematics with everyday applications, making it both conceptually important and practically relevant for competitive exam success.
Strong performance in statistics requires accuracy in arithmetic operations, careful reading of frequency tables, and the ability to extract information from graphs quickly. Students should focus on identifying which measure of central tendency best represents a given dataset and practice converting raw data into grouped frequency distributions.
Key Concepts
- **Ungrouped data** consists of individual observations listed separately, while **grouped data** organizes observations into class intervals with their frequencies. Ungrouped data is easier to handle but becomes unwieldy for large datasets.
- **Mean (arithmetic average)** is the sum of all observations divided by their count. For grouped data, each class mark (midpoint) is multiplied by its frequency, and the sum of these products is divided by the total frequency. Because every value enters the sum, the mean is sensitive to extreme values.
- **Median** is the middle value when data is arranged in ascending or descending order, which makes it resistant to outliers. For an even number of observations, the median is the average of the two middle values.
- **Mode** is the observation or class interval with the highest frequency. A dataset can be unimodal (one mode), bimodal (two modes) or multimodal. A dataset has no mode when all observations occur equally often.
- **Class mark (midpoint)** for any class interval equals (lower limit + upper limit)/2 and serves as the representative value for that entire class in grouped data calculations.
- **Cumulative frequency** is the running total of frequencies up to a particular class, essential for finding the median in grouped data and constructing cumulative frequency curves (ogives).
- **Range** measures data spread as the difference between maximum and minimum values, providing a simple indicator of variability alongside central tendency measures.
- Graphical representations transform numerical data into visual formats: **bar graphs** for categorical data, **histograms** for continuous grouped data with no gaps between bars, **frequency polygons** formed by joining midpoints of histogram tops, and **pie charts** showing proportional parts of a whole.
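The mode and range definitions above can be sketched in a few lines of Python. This is only an illustration: the function names `find_modes` and `data_range` and the sample lists are hypothetical, and the code assumes a non-empty list of numbers. It follows the convention in these notes that a dataset has no mode when every value is equally frequent.

```python
from collections import Counter


def find_modes(data):
    """Return the sorted list of modes, or [] when all values are equally frequent."""
    counts = Counter(data)
    highest = max(counts.values())
    # Several distinct values all sharing one frequency: treated here as "no mode".
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)


def data_range(data):
    """Range = maximum value - minimum value."""
    return max(data) - min(data)


print(find_modes([2, 3, 3, 5, 7]))  # [3]      (unimodal)
print(find_modes([1, 1, 2, 2, 3]))  # [1, 2]   (bimodal)
print(find_modes([4, 5, 6, 7]))     # []       (no mode)
print(data_range([2, 3, 3, 5, 7]))  # 5
```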
Formulas / Key Facts
**For ungrouped data:**
- Mean = Sum of all observations / Number of observations = Σx / n
- Median = Middle value when arranged in order; for n odd: the ((n + 1)/2)th term; for n even: the average of the (n/2)th and (n/2 + 1)th terms
- Mode = Most frequently occurring observation
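As a rough sketch, the position rules for the median translate directly into code. The helper names `mean` and `median_by_position` and the `scores` list are made up for illustration; the only subtlety is that Python lists are indexed from 0, so the k-th term sits at index k - 1.

```python
def mean(data):
    """Mean = sum of observations / number of observations."""
    return sum(data) / len(data)


def median_by_position(data):
    """Median using the (n + 1)/2 rule for odd n and the two-middle-terms rule for even n."""
    ordered = sorted(data)  # the median always requires ordered data
    n = len(ordered)
    if n % 2 == 1:
        # ((n + 1)/2)th term; subtract 1 for zero-based indexing
        return ordered[(n + 1) // 2 - 1]
    lower = ordered[n // 2 - 1]  # (n/2)th term
    upper = ordered[n // 2]      # (n/2 + 1)th term
    return (lower + upper) / 2


scores = [7, 3, 9, 4, 6]  # hypothetical data
print(mean(scores))                       # 5.8
print(median_by_position(scores))         # 6   (ordered: 3, 4, 6, 7, 9)
print(median_by_position(scores + [10]))  # 6.5 (ordered: 3, 4, 6, 7, 9, 10)
```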
**For grouped data:**
- Mean = Σ(f × x) / Σf, where f = frequency, x = class mark
- Class mark = (Lower limit + Upper limit) / 2
- Median class = Class containing the (n/2)th observation, i.e. the first class whose cumulative frequency is at least n/2, where n = Σf
- Modal class = Class with maximum frequency
- Cumulative frequency of a class = Sum of frequencies of that class and all previous classes
**Graphical conversions:**
- Pie chart angle for a category = (Frequency of category / Total frequency) × 360°
- Histogram bars touch each other; bar width represents class width, and height represents frequency when class widths are equal (frequency density when they are not)
- Frequency polygon connects the midpoints of the tops of the histogram bars, with two additional points on the x-axis at the imaginary class marks before the first class and after the last class
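The pie chart formula can be checked with a short sketch. The function name `pie_angles` and the transport survey figures are invented for illustration; the calculation itself is just (frequency / total) × 360°.

```python
def pie_angles(frequencies):
    """Map each category to its sector angle in degrees."""
    total = sum(frequencies.values())
    # Multiply before dividing so whole-number results stay exact.
    return {category: freq * 360 / total for category, freq in frequencies.items()}


# Hypothetical survey: how 40 students travel to school
transport = {"Bus": 18, "Cycle": 12, "Walk": 6, "Car": 4}
angles = pie_angles(transport)

print(angles)                # {'Bus': 162.0, 'Cycle': 108.0, 'Walk': 54.0, 'Car': 36.0}
print(sum(angles.values()))  # 360.0
```

A quick sanity check in any pie chart question is that the sector angles add up to 360°.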
Worked Examples
**Example 1: Mean, median and mode of ungrouped data**
Find the mean, median and mode of: 5, 8, 12, 15, 8, 20, 8, 25
*Solution:* Mean = (5 + 8 + 12 + 15 + 8 + 20 + 8 + 25) / 8 = 101 / 8 = 12.625
For the median, arrange the data in order: 5, 8, 8, 8, 12, 15, 20, 25.
Since n = 8 (even), median = average of 4th and 5th terms = (8 + 12) / 2 = 10
Mode = 8 (appears 3 times, most frequent)
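For anyone who wants to cross-check the arithmetic in Example 1, Python's standard `statistics` module gives the same values. This is only a verification aid, not an exam technique; the variable names are arbitrary.

```python
import statistics

data = [5, 8, 12, 15, 8, 20, 8, 25]

mean_value = statistics.mean(data)      # 101 / 8
median_value = statistics.median(data)  # average of the 4th and 5th ordered terms
mode_value = statistics.mode(data)      # most frequent observation

print(mean_value, median_value, mode_value)  # 12.625 10.0 8
```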
**Example 2: Mean of grouped data**
| Class Interval | 0-10 | 10-20 | 20-30 | 30-40 |
|----------------|------|-------|-------|-------|
| Frequency | 5 | 8 | 12 | 5 |
Find the mean.
*Solution:*

| Class | Frequency (f) | Class mark (x) | f × x |
|-------|---------------|----------------|-------|
| 0-10 | 5 | 5 | 25 |
| 10-20 | 8 | 15 | 120 |
| 20-30 | 12 | 25 | 300 |
| 30-40 | 5 | 35 | 175 |
| Total | Σf = 30 | | Σ(f × x) = 620 |

Mean = Σ(f × x) / Σf = 620 / 30 ≈ 20.67
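The same table can be rebuilt in a few lines of Python, which is a handy way to check the f × x column. The variable names here are illustrative; the class limits and frequencies are those of Example 2.

```python
# Class intervals (lower limit, upper limit) and frequencies from Example 2
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [5, 8, 12, 5]

# Class mark = (lower limit + upper limit) / 2
class_marks = [(lower + upper) / 2 for lower, upper in classes]

# f × x for each class, then mean = Σ(f × x) / Σf
fx = [f * x for f, x in zip(frequencies, class_marks)]
grouped_mean = sum(fx) / sum(frequencies)

print(class_marks)                # [5.0, 15.0, 25.0, 35.0]
print(sum(fx), sum(frequencies))  # 620.0 30
print(round(grouped_mean, 2))     # 20.67
```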
**Example 3: Median from cumulative frequency**
Using the data from Example 2, find the median class.
*Solution:*

| Class | Frequency | Cumulative Frequency |
|-------|-----------|----------------------|
| 0-10 | 5 | 5 |
| 10-20 | 8 | 13 |
| 20-30 | 12 | 25 |
| 30-40 | 5 | 30 |
n = 30, so n/2 = 15
The 15th observation lies in the first class whose cumulative frequency reaches or exceeds 15. The cumulative frequency is only 13 up to 10-20 but 25 up to 20-30, so that class is 20-30. Therefore, **median class = 20-30**.
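The cumulative frequency column and the median class can be reproduced with `itertools.accumulate`. This is a minimal sketch using the Example 2 data; the variable names are illustrative, and the modal class is included only because it uses the same frequency list.

```python
from itertools import accumulate

classes = ["0-10", "10-20", "20-30", "30-40"]
frequencies = [5, 8, 12, 5]

# Running total of frequencies: each entry adds the current class to all previous ones
cumulative = list(accumulate(frequencies))

n = cumulative[-1]  # total frequency
half = n / 2

# Median class: first class whose cumulative frequency is at least n/2
median_class = next(label for label, cf in zip(classes, cumulative) if cf >= half)

# Modal class: class with the maximum frequency
modal_class = classes[frequencies.index(max(frequencies))]

print(cumulative)    # [5, 13, 25, 30]
print(half)          # 15.0
print(median_class)  # 20-30
print(modal_class)   # 20-30
```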
Common Mistakes
- **Confusing class mark with class boundaries** → Class mark is the midpoint used for calculations, while boundaries (like 9.5, 19.5) are used for continuous data representation. For mean calculation, always use class marks.
- **Forgetting to arrange data in order before finding median** → Students often pick the middle position from unsorted data. Always sort first: median requires ordered data.
- **Using mode instead of modal class in grouped data** → For grouped data, we identify the modal class (highest frequency interval), not a single mode value. Don't report a class mark as "the mode."
- **Wrong cumulative frequency calculation** → The cumulative frequency of a class is its own frequency plus the frequencies of all previous classes, not its own frequency alone. Double-check that each cumulative total builds on the last and that the final total equals Σf.
- **Mixing up histogram and bar graph properties** → Histograms have touching bars for continuous data; bar graphs have gaps for discrete categories. Also, histogram bars can have varying widths (use frequency density then), while bar graphs typically have uniform width.
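For the unequal-width case mentioned in the last point, frequency density is the frequency divided by the class width. The sketch below uses invented classes and frequencies purely to show the calculation; the function name `frequency_densities` is hypothetical.

```python
def frequency_densities(classes, frequencies):
    """Frequency density = frequency / class width, one value per class."""
    return [f / (upper - lower) for (lower, upper), f in zip(classes, frequencies)]


# Hypothetical grouped data with unequal class widths (10, 20 and 10)
classes = [(0, 10), (10, 30), (30, 40)]
frequencies = [6, 10, 8]

densities = frequency_densities(classes, frequencies)
print(densities)  # [0.6, 0.5, 0.8]
```

Here the 10-30 class has the highest frequency but the lowest bar, because its frequency is spread over a class twice as wide.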
Quick Reference
- Mean = Σ(f × x) / Σf for grouped data; sensitive to extreme values
- Median = middle value of ordered data; for grouped data, the median class is the first class whose cumulative frequency reaches n/2
- Mode = highest frequency observation or class
- Class mark = (Lower + Upper limit) / 2
- Always arrange ungrouped data in order before finding median
- Pie chart sector angle = (Frequency / Total) × 360°