Standard Deviation and Variance
Measure data spread using range, variance, and standard deviation; understand σ notation.
Measures of Spread
Measures of center: Mean, median, mode (where data clusters)
Measures of spread: Range, variance, standard deviation (how spread out data is)
Why important: Two datasets can have same mean but very different spreads
Example: Same Mean, Different Spread
Data set A: 10, 10, 10, 10 (no variation)
Data set B: 0, 5, 15, 20 (lots of variation)
Both have mean = 10, but B is much more spread out
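The comparison above can be checked with a short Python sketch. It assumes Python's standard `statistics` module and uses the population standard deviation (`pstdev`), a measure of spread defined later in this lesson; the variable names are illustrative only.

```python
from statistics import mean, pstdev

data_a = [10, 10, 10, 10]
data_b = [0, 5, 15, 20]

for name, data in (("A", data_a), ("B", data_b)):
    spread = max(data) - min(data)   # range = max - min
    sigma = pstdev(data)             # population standard deviation (divides by N)
    print(f"Set {name}: mean = {mean(data)}, range = {spread}, sigma = {sigma:.2f}")
```

Both sets report the same mean, while the range and standard deviation are zero for A and clearly positive for B.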
Range
Range: Difference between maximum and minimum
Formula: Range = max - min
Pros: Easy to calculate
Cons: Only uses two values, sensitive to outliers
Example: Calculate Range
Data: 5, 8, 12, 15, 20
Range: 20 - 5 = 15
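A minimal Python sketch of the range formula follows. The helper name `data_range` and the extra value 100 are assumptions made for illustration; they are not part of the example above.

```python
def data_range(values):
    """Return max - min for a non-empty sequence of numbers."""
    return max(values) - min(values)

data = [5, 8, 12, 15, 20]
print(data_range(data))          # 15

# A single extreme value changes the range a lot,
# which is the "sensitive to outliers" drawback.
print(data_range(data + [100]))  # 95
```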
Variance
Variance: Average of squared deviations from mean
Measures: How far data points are from the mean, on average, in squared units
Symbol: σ² (population), s² (sample)
Formula (population):
σ² = Σ(x - μ)² / N
Where:
- x = each data value
- μ = population mean
- N = number of values
Formula (sample):
s² = Σ(x - x̄)² / (n - 1)
Where:
- x̄ = sample mean
- n = sample size
- (n - 1) is called "degrees of freedom"
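The two formulas differ only in the divisor. A minimal sketch in plain Python, where the function names `population_variance` and `sample_variance` are hypothetical helpers chosen for this illustration:

```python
def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / N"""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """s^2 = sum((x - x_bar)^2) / (n - 1)"""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

data = [2, 4, 6, 8, 10]
print(population_variance(data))  # 8.0
print(sample_variance(data))      # 10.0
```

The same data gives a larger value with the sample formula because the sum of squared deviations is divided by 4 rather than 5.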
Example 1: Calculate Variance
Data: 2, 4, 6, 8, 10
Step 1: Find mean
μ = (2 + 4 + 6 + 8 + 10)/5 = 30/5 = 6
Step 2: Find deviations from mean
2 - 6 = -4
4 - 6 = -2
6 - 6 = 0
8 - 6 = 2
10 - 6 = 4
Step 3: Square deviations
(-4)² = 16
(-2)² = 4
0² = 0
2² = 4
4² = 16
Step 4: Find average
σ² = (16 + 4 + 0 + 4 + 16)/5 = 40/5 = 8
Variance: 8
Example 2: Using Table
Data: 1, 3, 3, 5, 8
Mean: μ = (1 + 3 + 3 + 5 + 8)/5 = 20/5 = 4
| x | x - μ | (x - μ)² |
|---|---|---|
| 1 | -3 | 9 |
| 3 | -1 | 1 |
| 3 | -1 | 1 |
| 5 | 1 | 1 |
| 8 | 4 | 16 |
Variance:
σ² = (9 + 1 + 1 + 1 + 16)/5 = 28/5 = 5.6
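The table method can be reproduced in a few lines of Python. This is a sketch under the assumption that the data is a full population (dividing by N); the variable names are illustrative.

```python
data = [1, 3, 3, 5, 8]
mu = sum(data) / len(data)

# One row per value: x, deviation from the mean, squared deviation
rows = [(x, x - mu, (x - mu) ** 2) for x in data]

print("| x | x - mu | (x - mu)^2 |")
print("|---|---|---|")
for x, dev, sq in rows:
    print(f"| {x} | {dev:g} | {sq:g} |")

# Population variance: average of the last column
variance = sum(sq for _, _, sq in rows) / len(data)
print("variance =", variance)
```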
Standard Deviation
Standard deviation: Square root of variance
Symbol: σ (population), s (sample)
Same units as original data (variance has squared units)
Formula:
σ = √(σ²) or s = √(s²)
Interpretation: Typical distance of a value from the mean (strictly, the root-mean-square deviation)
Example 1: From Variance
Variance = 25
Standard deviation:
σ = √25 = 5
Example 2: Complete Calculation
Data: 10, 12, 14, 16, 18
Mean: μ = 14
Deviations: -4, -2, 0, 2, 4
Squared: 16, 4, 0, 4, 16
Variance:
σ² = (16 + 4 + 0 + 4 + 16)/5 = 40/5 = 8
Standard deviation:
σ = √8 ≈ 2.83
Interpretation: Values typically lie about 2.83 units from the mean
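The complete calculation can be sketched in Python as follows. It assumes the population formula and cross-checks the hand computation against `statistics.pstdev` from the standard library.

```python
import math
from statistics import pstdev

data = [10, 12, 14, 16, 18]

mu = sum(data) / len(data)                               # mean
variance = sum((x - mu) ** 2 for x in data) / len(data)  # population variance
sigma = math.sqrt(variance)                              # standard deviation

print(round(sigma, 2))                     # 2.83
print(math.isclose(sigma, pstdev(data)))   # True: matches the library result
```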
Population vs Sample
Population: All members of a group
- Use σ, σ²
- Divide by N
Sample: Subset of population
- Use s, s²
- Divide by (n - 1) so that s² is an unbiased estimate of the population variance σ²
Example: Sample Standard Deviation
Sample data: 5, 7, 9, 11
Mean: x̄ = 8
Squared deviations: 9, 1, 1, 9
Sample variance:
s² = (9 + 1 + 1 + 9)/(4 - 1) = 20/3 ≈ 6.67
Sample standard deviation:
s = √(20/3) ≈ 2.58
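In Python's standard `statistics` module, `variance` and `stdev` use the sample divisor (n - 1), while `pvariance` and `pstdev` use the population divisor N. A short sketch on the sample above:

```python
from statistics import pstdev, pvariance, stdev, variance

sample = [5, 7, 9, 11]

# Sample versions divide by n - 1 = 3
print(round(variance(sample), 2))  # 6.67
print(round(stdev(sample), 2))     # 2.58

# Population versions divide by N = 4, so they come out smaller
print(pvariance(sample))
print(round(pstdev(sample), 2))
```

Treating a sample as if it were the whole population gives a smaller spread, which is why the choice of formula matters.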
Properties of Standard Deviation
σ ≥ 0 (always non-negative)
σ = 0 only when all values are identical
Larger σ means more spread out data
Units same as data (unlike variance)
Adding constant to all data: σ unchanged
Multiplying all data by constant k: σ multiplied by |k|
Example: Effect of Transformation
Original data: 2, 4, 6, 8 (σ = 2.24)
Add 10 to each: 12, 14, 16, 18
- Mean changes to 15
- σ stays 2.24
Multiply by 3: 6, 12, 18, 24
- Mean becomes 15
- σ becomes 3(2.236) ≈ 6.71
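The shift and scale properties can be verified numerically. This sketch assumes the population standard deviation; the negative multiplier -3 is an extra case added here to illustrate the |k| in the scaling rule.

```python
import math
from statistics import pstdev

original = [2, 4, 6, 8]
shifted = [x + 10 for x in original]   # add a constant
scaled = [3 * x for x in original]     # multiply by k = 3
negated = [-3 * x for x in original]   # multiply by k = -3

sigma = pstdev(original)

print(math.isclose(pstdev(shifted), sigma))      # True: adding a constant leaves sigma unchanged
print(math.isclose(pstdev(scaled), 3 * sigma))   # True: sigma is multiplied by |3|
print(math.isclose(pstdev(negated), 3 * sigma))  # True: sigma is multiplied by |-3| = 3
```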
Interpreting Standard Deviation
Small σ: Data clustered near mean
Large σ: Data widely dispersed
Compare datasets: Larger σ means more variability
Example: Compare Two Classes
Class A scores: Mean = 75, σ = 5
- Most scores 70-80 (close to mean)
Class B scores: Mean = 75, σ = 15
- Most scores 60-90 (very spread out)
Same average, but Class B has more variation
Using Calculator/Technology
Most calculators have built-in functions:
- σₓ or σ (population)
- sₓ or s (sample)
Enter data in list, use statistics function
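Software offers the same two options. As one possible illustration, the sketch below mimics a calculator's one-variable statistics screen using Python's standard `statistics` module; the function name `one_var_stats` and the dictionary keys are hypothetical, and the data is reused from the standard deviation example earlier in this lesson.

```python
from statistics import mean, pstdev, stdev

def one_var_stats(values):
    """Return a summary similar to a calculator's one-variable statistics."""
    return {
        "n": len(values),
        "mean": mean(values),
        "s_x": stdev(values),       # sample standard deviation (divides by n - 1)
        "sigma_x": pstdev(values),  # population standard deviation (divides by N)
        "min": min(values),
        "max": max(values),
    }

stats = one_var_stats([10, 12, 14, 16, 18])
for label, value in stats.items():
    print(f"{label}: {value}")
```

As on a calculator, the user still has to decide whether the data is a sample or a full population and read off the matching value.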
Empirical Rule (68-95-99.7)
For approximately normal distributions:
About 68% of data within 1σ of mean
About 95% of data within 2σ of mean
About 99.7% of data within 3σ of mean
Example: Apply Empirical Rule
Heights: mean = 170 cm, σ = 10 cm
Within 1σ: 160-180 cm (68%)
Within 2σ: 150-190 cm (95%)
Within 3σ: 140-200 cm (99.7%)
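The percentages in the rule are rounded values from the normal distribution. A sketch that recomputes them for the heights example, assuming heights are exactly normal and using `statistics.NormalDist` (Python 3.8+); the helper `share_within` is a hypothetical name:

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)

def share_within(dist, k):
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    print(f"within {k} sigma: {low:.0f}-{high:.0f} cm, {share_within(heights, k):.1%}")
```

Real data is only approximately normal, so observed percentages will differ somewhat from these values.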
Outliers and Standard Deviation
Outlier: Value significantly different from others
Outliers increase standard deviation
Common rule: A value is an outlier if it lies more than 2σ from the mean (some texts use 3σ)
Example: Identify Outlier
Data: 10, 12, 14, 16, 50
Mean: 20.4
Standard deviation (population): σ ≈ 14.93
Check 50:
50 - 20.4 = 29.6
29.6 > 2(14.93) ≈ 29.87? No
Borderline: 50 narrowly misses the cutoff because it inflates σ itself (without it, σ ≈ 2.24)
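The 2σ check can be sketched as a small Python function. It assumes the population standard deviation computed from all values, including the suspect one; the name `flag_outliers`, the parameter `k`, and the stricter cutoff of 1.5 are illustrative assumptions rather than a standard API or rule.

```python
from statistics import mean, pstdev

def flag_outliers(values, k=2):
    """Return the values lying more than k population standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    return [x for x in values if abs(x - mu) > k * sigma]

data = [10, 12, 14, 16, 50]

print(flag_outliers(data))          # []   (50 is just inside the 2-sigma cutoff)
print(flag_outliers(data, k=1.5))   # [50] (a stricter cutoff flags it)

# How much the single value 50 inflates sigma:
print(round(pstdev(data), 2))       # with 50
print(round(pstdev(data[:-1]), 2))  # without 50: 2.24
```

Because an extreme value enlarges the very σ used to judge it, this rule can miss outliers in small data sets.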
Coefficient of Variation
Compares variability across different scales
CV = (σ/μ) × 100%
Useful when comparing datasets with different units or means
Example: Compare Variation
Heights: Mean = 170 cm, σ = 10 cm
CV = (10/170) × 100% ≈ 5.9%
Weights: Mean = 70 kg, σ = 8 kg
CV = (8/70) × 100% ≈ 11.4%
Weights have more relative variation
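A minimal sketch of the CV formula in Python. The function name `coefficient_of_variation` is a hypothetical helper, and the zero-mean check is an added safeguard because the ratio is undefined when μ = 0.

```python
def coefficient_of_variation(sigma, mu):
    """CV = (sigma / mu) * 100, returned as a percentage."""
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sigma / mu * 100

cv_height = coefficient_of_variation(10, 170)  # heights in cm
cv_weight = coefficient_of_variation(8, 70)    # weights in kg

print(f"{cv_height:.1f}%")  # 5.9%
print(f"{cv_weight:.1f}%")  # 11.4%
```

The units cancel in the ratio, which is what allows centimetres and kilograms to be compared directly.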
Real-World Applications
Quality control: Product consistency (low σ desired)
Finance: Investment risk (σ measures volatility)
Testing: Score consistency across students
Weather: Temperature variability
Sports: Player/team consistency
Example: Manufacturing
Target bolt length: 5.0 cm
σ = 0.05 cm (tight tolerance)
Assuming lengths are approximately normal and centered on the target, about 95% of bolts fall within:
5.0 ± 2(0.05) = 4.9 to 5.1 cm
Acceptable quality control
Practice
Data: 3, 5, 7, 9. What is the population variance?
If variance = 16, what is standard deviation?
All data values are identical. What is σ?
For a normal distribution with mean 50 and σ = 5, what percent of values fall within 45-55?