Elementary Statistics — RRB NTPC Study Notes
Overview
Elementary Statistics is a moderate-weightage topic in RRB NTPC Mathematics, typically yielding 2–4 questions per paper. Questions test your ability to compute measures of central tendency (mean, median, mode) and dispersion (range) from raw data, frequency distributions, or grouped data. You may also encounter basic data interpretation from tables or simple charts.
This topic rewards systematic practice. Most questions follow standard formulas, so accuracy and speed in arithmetic are crucial. Students often lose marks through calculation errors rather than conceptual gaps. The key is to memorize the formulas, understand when each measure applies, and work through enough problems to handle variations such as missing frequencies or combined datasets.
Master this topic by drilling formula application on 15–20 varied problems. Since calculations can be lengthy, develop shortcuts for locating the median in a sorted list and for computing the mean with the assumed-mean (step-deviation) method. This preparation pays off with quick, confident answers on exam day.
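A minimal sketch of the assumed-mean (step-deviation) shortcut, assuming equally spaced values: choose an assumed mean A near the centre, compute d = (x − A) / h for a common step h, and correct at the end with Mean = A + (Σfd / Σf) × h. It is applied here to the score table from Example 2; the function name and the choices A = 30, h = 10 are illustrative assumptions.

```python
from fractions import Fraction

def step_deviation_mean(values, freqs, assumed_mean, step):
    """Mean via the step-deviation method: A + (sum(f*d) / sum(f)) * h.

    Each d = (x - A) / h is a small integer for evenly spaced exam data;
    Fraction keeps the arithmetic exact.
    """
    deviations = [Fraction(x - assumed_mean, step) for x in values]
    sum_fd = sum(f * d for f, d in zip(freqs, deviations))
    total_f = sum(freqs)
    return assumed_mean + (sum_fd / total_f) * step

# Score table from Example 2 (A = 30 and h = 10 are assumed choices)
scores = [10, 20, 30, 40, 50]
freqs = [3, 5, 7, 4, 1]

mean = step_deviation_mean(scores, freqs, assumed_mean=30, step=10)
print(float(mean))  # 27.5
```

Here the d values are −2, −1, 0, 1, 2, so Σfd = −5 and the mean is 30 + (−5/20) × 10 = 27.5. Any A and h give the same answer; picking A at the middle value just keeps the numbers small.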
Key Concepts
- **Mean (Arithmetic Average)**: The sum of all observations divided by the number of observations. It represents the "balance point" of the data and is affected by every value, including outliers.
- **Median**: The middle value when data is arranged in ascending or descending order. For odd n, it's the ((n+1)/2)th term; for even n, it's the average of the (n/2)th and (n/2 + 1)th terms. Median is robust to extreme values.
- **Mode**: The most frequently occurring value in a dataset. A dataset can be unimodal (one mode), bimodal (two modes), multimodal (more than two), or have no mode if every value occurs equally often.
- **Range**: The difference between the maximum and minimum values. It measures the spread or dispersion of data and is the simplest measure of variability.
- **Frequency Distribution**: Data organized into classes/groups with their frequencies. For grouped data, use class marks (midpoints) and apply modified formulas for mean, median, and mode.
- **Cumulative Frequency**: Running total of frequencies, used to locate the median class in grouped data.
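The mode cases listed above (unimodal, bimodal or multimodal, no mode) can be checked with a small helper. This is a sketch: the name `modes` and the convention of returning an empty list for "no mode" are assumptions for illustration, and the input is assumed to be non-empty.

```python
from collections import Counter

def modes(data):
    """Return all modes in ascending order; [] means 'no mode'."""
    counts = Counter(data)
    highest = max(counts.values())
    # No mode: more than one distinct value, and all occur equally often.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)

print(modes([12, 15, 18, 15, 22, 19, 15, 20]))  # [15]    unimodal
print(modes([2, 3, 3, 5, 5, 7]))                # [3, 5]  bimodal
print(modes([4, 8, 9, 11]))                     # []      no mode
```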
Formulas / Key Facts
**For Ungrouped (Raw) Data:**
- **Mean** = (Sum of all observations) / (Number of observations) = (Σx) / n
- **Median** = Middle value after sorting. If n is odd: ((n+1)/2)th term. If n is even: average of (n/2)th and (n/2 + 1)th terms.
- **Mode** = Most frequent value(s) in the dataset.
- **Range** = Maximum value − Minimum value
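A minimal sketch of the ungrouped-data formulas above, written without library helpers so each step mirrors the hand method. The function names are illustrative; the data is the list from Example 1.

```python
def mean(data):
    # Σx / n
    return sum(data) / len(data)

def median(data):
    # Sort first, then take the middle term (odd n) or average the two middle terms (even n).
    s = sorted(data)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]          # ((n+1)/2)th term as a 0-based index
    return (s[n // 2 - 1] + s[n // 2]) / 2  # average of (n/2)th and (n/2 + 1)th terms

def data_range(data):
    # Max - Min (named data_range so it does not shadow Python's built-in range)
    return max(data) - min(data)

values = [12, 15, 18, 15, 22, 19, 15, 20]
print(mean(values), median(values), data_range(values))  # 17.0 16.5 10
```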
**For Grouped (Frequency Distribution) Data:**
- **Mean** = Σ(f × x) / Σf, where f is the frequency and x is the observation value (for class intervals, the class mark, i.e. the midpoint of the interval).
- **Median** = L + [(n/2 − CF) / f] × h, where L = lower boundary of median class, n = total frequency, CF = cumulative frequency before median class, f = frequency of median class, h = class width.
- **Mode** = L + [(f₁ − f₀) / (2f₁ − f₀ − f₂)] × h, where L = lower boundary of modal class (class with highest frequency), f₁ = frequency of modal class, f₀ = frequency of class before modal class, f₂ = frequency of class after modal class, h = class width.
- **Median class**: The class where cumulative frequency first equals or exceeds n/2.
- **Modal class**: The class with the highest frequency.
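The grouped-data mode formula can be sketched as below, assuming equal-width classes given by their lower boundaries. Taking f₀ or f₂ as 0 when the modal class is the first or last class is an assumed convention, as is the function name. Applied to the class table in Example 3 (modal class 20–30, f₁ = 10, f₀ = 6, f₂ = 8), it gives 20 + (4/6) × 10 ≈ 26.67.

```python
def grouped_mode(lower_bounds, freqs, width):
    """Mode = L + [(f1 - f0) / (2*f1 - f0 - f2)] * h for equal-width classes."""
    k = freqs.index(max(freqs))                     # modal class (first one if tied)
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0               # class before (assumed 0 if none)
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0  # class after (assumed 0 if none)
    denom = 2 * f1 - f0 - f2
    if denom == 0:
        raise ValueError("neighbouring classes tie with the modal class; formula undefined")
    return lower_bounds[k] + (f1 - f0) / denom * width

# Classes 0–10, 10–20, 20–30, 30–40, 40–50 from Example 3
lowers = [0, 10, 20, 30, 40]
freqs = [4, 6, 10, 8, 2]
print(round(grouped_mode(lowers, freqs, 10), 2))  # 26.67
```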
Worked Examples
**Example 1: Mean, Median, Mode, Range of Raw Data**
Find mean, median, mode and range: 12, 15, 18, 15, 22, 19, 15, 20
**Solution:**
- **Mean** = (12 + 15 + 18 + 15 + 22 + 19 + 15 + 20) / 8 = 136 / 8 = 17
- **Sorted data**: 12, 15, 15, 15, 18, 19, 20, 22
- **Median**: n = 8 (even), so median = average of 4th and 5th terms = (15 + 18) / 2 = 16.5
- **Mode** = 15 (appears 3 times, most frequent)
- **Range** = 22 − 12 = 10
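When practising, Python's standard `statistics` module offers a quick way to double-check Example 1 (this is a study aid, not an exam method). `statistics.multimode` (Python 3.8+) returns every most-frequent value, so it also handles bimodal data.

```python
import statistics

data = [12, 15, 18, 15, 22, 19, 15, 20]

print(statistics.mean(data))       # 17
print(statistics.median(data))     # 16.5
print(statistics.multimode(data))  # [15]
print(max(data) - min(data))       # 10
```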
**Example 2: Mean from Frequency Distribution**
| Score (x) | 10 | 20 | 30 | 40 | 50 |
|-----------|----|----|----|----|----|
| Frequency (f) | 3 | 5 | 7 | 4 | 1 |
Find the mean score.
**Solution:**
- Σ(f × x) = (3×10) + (5×20) + (7×30) + (4×40) + (1×50) = 30 + 100 + 210 + 160 + 50 = 550
- Σf = 3 + 5 + 7 + 4 + 1 = 20
- **Mean** = 550 / 20 = 27.5
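The Σ(f × x) / Σf calculation in Example 2 can be checked in a few lines. As a sanity check, this sketch also expands the frequency table back into 20 raw scores and averages them directly; both routes agree because the frequency formula is a shortcut for that expansion.

```python
scores = [10, 20, 30, 40, 50]
freqs = [3, 5, 7, 4, 1]

# Mean = Σ(f × x) / Σf
sum_fx = sum(f * x for f, x in zip(freqs, scores))
total_f = sum(freqs)
weighted_mean = sum_fx / total_f
print(sum_fx, total_f, weighted_mean)  # 550 20 27.5

# Same answer by listing each score f times and averaging the raw data
raw = [x for x, f in zip(scores, freqs) for _ in range(f)]
print(sum(raw) / len(raw))  # 27.5
```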
**Example 3: Median from Grouped Data**
| Class | 0–10 | 10–20 | 20–30 | 30–40 | 40–50 |
|-------|------|-------|-------|-------|-------|
| Frequency | 4 | 6 | 10 | 8 | 2 |
Find the median.
**Solution:**
- Σf = 4 + 6 + 10 + 8 + 2 = 30, so n/2 = 15
- Cumulative frequencies: 4, 10, 20, 28, 30
- Median class is 20–30 (its CF of 20 is the first to equal or exceed 15)
- L = 20, CF (before median class) = 10, f = 10, h = 10
- **Median** = 20 + [(15 − 10) / 10] × 10 = 20 + 5 = 25
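The steps of Example 3 (running total, median class, formula) can be sketched as one function. It assumes equal-width, continuous classes given by their lower boundaries; the function name and inputs are illustrative. `itertools.accumulate` builds the cumulative-frequency column.

```python
from itertools import accumulate

def grouped_median(lower_bounds, freqs, width):
    """Median = L + [(n/2 - CF) / f] * h for equal-width classes."""
    n = sum(freqs)
    half = n / 2
    cum = list(accumulate(freqs))              # running totals, e.g. [4, 10, 20, 28, 30]
    # Median class: first class whose cumulative frequency reaches n/2
    k = next(i for i, c in enumerate(cum) if c >= half)
    cf_before = cum[k - 1] if k > 0 else 0     # CF of the class *before* the median class
    L, f = lower_bounds[k], freqs[k]
    return L + (half - cf_before) / f * width

lowers = [0, 10, 20, 30, 40]
freqs = [4, 6, 10, 8, 2]
print(grouped_median(lowers, freqs, 10))  # 25.0
```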
Common Mistakes
- **Forgetting to sort data before finding median** → Always arrange data in ascending/descending order first. Median of unsorted data is meaningless.
- **Confusing mean with median in word problems** → "Average" means the arithmetic mean unless the context clearly indicates the median. Read carefully whether the question asks for the "middle value" (median) or the "arithmetic mean."
- **Using wrong cumulative frequency in median formula** → CF in the formula is the cumulative frequency of the class *before* the median class, not the median class itself. Double-check your cumulative frequency table.
- **Calculation errors in mode formula** → The mode formula has three frequency terms (f₀, f₁, f₂). Ensure you correctly identify the modal class and the classes immediately before and after it. A sign error in (2f₁ − f₀ − f₂) is common.
- **Assuming every dataset has a mode** → If all values occur with equal frequency, there is no mode. Don't force an answer; state "no mode" if appropriate.
Quick Reference
- **Mean** = Sum / Count; affected by all values and outliers.
- **Median** = Middle term (sorted data); robust to extremes.
- **Mode** = Most frequent value; can be none, one, or multiple.
- **Range** = Max − Min; simplest measure of spread.
- **Grouped data median**: Find n/2, locate median class from cumulative frequency, apply L + [(n/2 − CF)/f] × h.
- **Always sort ungrouped data before finding median.**