Scatter Plots
Plot data points to identify correlation, trends, and relationships between variables.
For Elementary Students
What is a Scatter Plot?
A scatter plot is a graph that uses dots to show whether two things are related!
Think about it like this: Imagine you want to know if studying more helps you get better test scores. A scatter plot lets you see the pattern by putting dots on a graph!
Parts of a Scatter Plot
Dots (points) — each dot represents one person or thing
x-axis (horizontal) → first thing you're measuring (like hours studied)
y-axis (vertical) ↑ second thing you're measuring (like test score)
Title — tells what the graph is about
Example: Ice Cream and Temperature
Let's say we want to see if hotter days mean more ice cream sales.
Data:
| Temperature (°F) | Ice Creams Sold |
|---|---|
| 60 | 10 |
| 70 | 15 |
| 80 | 25 |
| 90 | 30 |
Making the scatter plot:
[Scatter plot: "Ice Cream Sales vs. Temperature", with Temperature (°F) on the x-axis, Ice Creams Sold on the y-axis, and dots at (60, 10), (70, 15), (80, 25), and (90, 30)]
Each dot shows one day!
What Does It Tell Us?
Look at the dots: they go UP from left to right!
This means: When temperature goes up, ice cream sales go up too!
The two things are related!
Three Patterns to Look For
Pattern 1: Dots Go Up (like ice cream example)
[Picture: dots rising from lower left to upper right ↗]
Meaning: Both things increase together
Examples:
- More study → Higher scores
- More rain → More umbrella sales
Pattern 2: Dots Go Down
[Picture: dots falling from upper left to lower right ↘]
Meaning: One goes up, the other goes down
Examples:
- More absences → Lower grades
- More practice → Fewer mistakes
Pattern 3: Dots Are All Over
[Picture: dots scattered all over, with no direction]
Meaning: No pattern! The two things aren't related.
Examples:
- Shoe size vs. test score (no connection!)
- Hair color vs. height
Reading a Scatter Plot
Question: Look at our ice cream graph. How many ice creams were sold when it was 80°F?
Answer: Find 80 on the bottom (x-axis), go up to the dot, then look left → 25 ice creams
Making Your Own Scatter Plot
Steps:
Step 1: Collect data with two numbers for each item
- Example: (height, weight) for each student
Step 2: Draw the axes
- Horizontal (x) = first measurement
- Vertical (y) = second measurement
Step 3: Plot each pair as a dot
- For (x, y): go right x, then up y
Step 4: Look for a pattern!
Why Use Scatter Plots?
To see relationships between two things!
To predict: If the pattern is clear, you can guess what might happen next
Example: If it's 85°F tomorrow, we can guess we'll sell about 27-28 ice creams (between the 80° and 90° dots)
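The "about 27-28" guess is the number halfway between the 80°F dot and the 90°F dot. Here is a tiny Python sketch of that arithmetic, assuming we connect the two neighboring dots with a straight line (the name `guess_between` is made up for this example):

```python
def guess_between(x, x1, y1, x2, y2):
    """Estimate y at x by drawing a straight line between two dots."""
    fraction = (x - x1) / (x2 - x1)  # how far x is from x1 toward x2 (0 to 1)
    return y1 + fraction * (y2 - y1)

# 85°F is halfway between the 80°F dot (25 sold) and the 90°F dot (30 sold)
estimate = guess_between(85, 80, 25, 90, 30)
print(estimate)  # 27.5
```

Halfway between 25 and 30 is 27.5, which rounds to the "27-28" range.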
Real-Life Scatter Plots
Sports: Practice time vs. number of goals scored
School: Pages read vs. reading level
Science: Plant water vs. plant height
Health: Exercise vs. energy level
For Junior High Students
Formal Definition
A scatter plot (or scatter diagram) displays paired numerical data (x, y) as points on a coordinate plane to reveal relationships between two quantitative variables.
Components:
[Diagram: a vertical y-axis (dependent variable) and a horizontal x-axis (independent variable), with each data point drawn as a dot]
Each point (x, y):
- x-coordinate: value of independent variable
- y-coordinate: value of dependent variable
Types of Correlation
Correlation: The relationship between two variables
Positive Correlation: As x increases, y tends to increase
Points trend upward: ↗
Example: Study hours vs. test scores
- Strong positive: Points close to upward line
- Moderate positive: Points generally upward but scattered
- Weak positive: Slight upward trend, very scattered
Negative Correlation: As x increases, y tends to decrease
Points trend downward: ↘
Example: Number of absences vs. final grade
- Strong negative: Points close to downward line
- Moderate negative: General downward trend
- Weak negative: Slight downward trend, scattered
No Correlation: No clear pattern
Points randomly scattered
Example: Shoe size vs. test scores
Variables appear unrelated (no straight-line trend)
Strength of Correlation
Strong: Points closely follow a line
- Clear, predictable relationship
- Correlation coefficient r near 1 (or near -1 for negative)
Moderate: Points somewhat follow a line
- General trend visible but with variation
Weak: Points barely follow a line
- Little predictability
Creating a Scatter Plot
Example: Hours Studied vs. Test Score
Data:
| Student | Hours | Score |
|---|---|---|
| Ana | 1 | 65 |
| Ben | 2 | 70 |
| Cal | 3 | 75 |
| Dan | 4 | 85 |
| Eva | 5 | 90 |
Procedure:
Step 1: Set up coordinate system
- x-axis: Hours Studied (0-6)
- y-axis: Test Score (60-100)
Step 2: Choose appropriate scale
- x-axis: Count by 1s
- y-axis: Count by 10s
Step 3: Plot points
(1, 65), (2, 70), (3, 75), (4, 85), (5, 90)
Step 4: Analyze pattern
- Points trend upward → positive correlation
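The four steps can be tried without graph paper. Below is a minimal sketch in plain Python (no plotting library), assuming a text grid is good enough: each column is one hour value, and each row covers a 5-point band of scores. That is a finer scale than the 10s in Step 2, chosen so each dot lands on the row matching its score. The helper `text_scatter` is hypothetical, written only for this example.

```python
# Hours studied vs. test score, from the table above
data = [(1, 65), (2, 70), (3, 75), (4, 85), (5, 90)]

def text_scatter(points, x_min, x_max, y_min, y_max, y_step):
    """Return a text scatter plot as a list of strings.

    Each whole-number x gets one 3-character column; each row covers
    scores from y up to (but not including) y + y_step.
    """
    rows = []
    y = y_max
    while y >= y_min:
        line = f"{y:>4} |"  # y-axis label for this row
        for x in range(x_min, x_max + 1):
            # mark the column if a point has this x and falls in this row's band
            hit = any(px == x and y <= py < y + y_step for px, py in points)
            line += " • " if hit else "   "
        rows.append(line)
        y -= y_step
    axis = "     +" + "---" * (x_max - x_min + 1)
    labels = "      " + "".join(f"{x:^3}" for x in range(x_min, x_max + 1))
    return rows + [axis, labels]

for row in text_scatter(data, 0, 6, 60, 90, 5):
    print(row)
```

Reading the printed grid from left to right, the dots climb upward, which is the positive correlation noted in Step 4.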
Line of Best Fit (Trend Line)
Definition: A line that best represents the trend of the data
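Not all points lie on the line. The standard "best" line, called the least-squares line, makes the sum of the squared vertical distances from the points to the line as small as possible.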
Purpose:
- Summarize the relationship
- Make predictions
Drawing by hand:
- Draw line through middle of points
- Balance points above and below line
- Line should follow general trend
Equation form: y = mx + b
- m = slope (rate of change)
- b = y-intercept (starting value)
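Drawing by hand gives an estimate. The usual way to compute the line exactly is the least-squares method, which chooses m and b so that the sum of the squared vertical distances from the points to the line is as small as possible. Here is a minimal Python sketch for the hours-studied data above (the function name `best_fit_line` is our own):

```python
def best_fit_line(points):
    """Least-squares slope m and intercept b for y = mx + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # slope = sum of (x - mean_x)(y - mean_y) divided by sum of (x - mean_x)^2
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    m = sxy / sxx
    b = mean_y - m * mean_x  # the line always passes through (mean_x, mean_y)
    return m, b

data = [(1, 65), (2, 70), (3, 75), (4, 85), (5, 90)]
m, b = best_fit_line(data)
print(f"y = {m}x + {b}")  # y = 6.5x + 57.5
```

Reading the result: each extra hour of study goes with about 6.5 more points, and b = 57.5 is the predicted score for 0 hours. That intercept is itself an extrapolation, because the data start at 1 hour.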
Making Predictions
Interpolation: Predict within the data range
Example: Data from x = 0 to x = 10
Predict at x = 7 → More reliable (within range)
Extrapolation: Predict outside the data range
Example: Data from x = 0 to x = 10
Predict at x = 15 → Less reliable (beyond data)
Caution: Extrapolation assumes pattern continues, which may not be true!
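A short Python sketch of the difference, assuming the trend line y = 6.5x + 57.5 (the least-squares line for the hours-studied table, where hours run from 1 to 5). The function `predict` is hypothetical. It labels each prediction and shows how extrapolation can break down: for 10 hours the line "predicts" a score above 100, which is impossible on a 100-point test.

```python
# Trend line for hours studied vs. score (least-squares fit of the five students)
M, B = 6.5, 57.5
X_MIN, X_MAX = 1, 5  # range of hours actually in the data

def predict(hours):
    """Return (predicted score, kind of prediction)."""
    score = M * hours + B
    kind = "interpolation" if X_MIN <= hours <= X_MAX else "extrapolation"
    return score, kind

print(predict(3.5))  # (80.25, 'interpolation')
print(predict(10))   # (122.5, 'extrapolation')
```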
Outliers
Outlier: A point that doesn't fit the general pattern
Identification: Point far from the trend line or cluster
Example: Hours studied vs. score
Most points: More hours → higher score
Outlier: Student studied 5 hours but scored 60 (much lower than expected)
Possible reasons:
- Data entry error
- Unusual circumstance (student was sick)
- Other factors at play
Impact: Outliers can affect correlation strength and trend line position
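The sketch below shows both ideas in Python. It uses the five students from the hours-studied table, their least-squares trend line y = 6.5x + 57.5, and the outlier described above (5 hours, score 60) added as a sixth point. The 15-point cutoff and the helper `pearson_r` are assumptions for this example, not a standard rule.

```python
from math import sqrt

def pearson_r(points):
    """Correlation coefficient r for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)

# Trend line for the five students (hours studied vs. score)
M, B = 6.5, 57.5
TOLERANCE = 15  # hypothetical cutoff: flag points more than 15 points off the line

students = [(1, 65), (2, 70), (3, 75), (4, 85), (5, 90)]
with_outlier = students + [(5, 60)]  # the student who studied 5 hours but scored 60

for x, y in with_outlier:
    residual = y - (M * x + B)  # actual score minus predicted score
    if abs(residual) > TOLERANCE:
        print(f"Possible outlier: ({x}, {y}), {residual:+} from the line")

print(round(pearson_r(students), 2))      # 0.99
print(round(pearson_r(with_outlier), 2))  # 0.39
```

Only the (5, 60) point is flagged. Adding that one point drops r from about 0.99 to about 0.39, turning a strong correlation into a weak one.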
Clusters and Gaps
Cluster: Group of points close together
Example: Most students scored between 70 and 80 on the test
Gap: Region with no data points
Example: No students scored between 50 and 65
Interpretation: Shows distribution patterns in data
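Gaps are easy to spot once values are sorted: look for neighbors that are far apart. A minimal Python sketch, checking one variable (the test scores) rather than both axes, which is a simplification. The scores are hypothetical, made up to match the examples above, and the 5-point gap threshold is an arbitrary assumption.

```python
def find_gaps(values, min_gap):
    """Return (low, high) pairs of neighboring sorted values more than min_gap apart."""
    ordered = sorted(values)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > min_gap]

def count_between(values, low, high):
    """Count values in the closed range [low, high]."""
    return sum(1 for v in values if low <= v <= high)

# Hypothetical test scores, made up for illustration
scores = [45, 48, 50, 65, 70, 72, 74, 75, 77, 78, 80, 92]

print(find_gaps(scores, 5))           # [(50, 65), (80, 92)]
print(count_between(scores, 70, 80))  # 7
```

Here 7 of the 12 scores fall in the 70-80 cluster, and no scores fall between 50 and 65.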
Causation vs. Correlation
CRITICAL DISTINCTION:
Correlation: Two variables are related (change together)
Causation: One variable CAUSES the other to change
Correlation ≠ Causation!
Example: Ice Cream Sales and Drownings
Observation: Both increase in summer (positive correlation)
WRONG conclusion: Ice cream causes drownings
CORRECT explanation: Both caused by a third variable (warm weather)
- Warm weather → more swimming → more drownings
- Warm weather → more ice cream purchases
Key principle: Just because two things are correlated doesn't mean one causes the other.
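A small Python simulation makes the ice cream and drownings example concrete. The numbers are hypothetical, made up for illustration. Both lists are computed only from temperature, so by construction neither one causes the other, yet they still come out strongly correlated. The helper `pearson_r` is our own.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical daily temperatures (°F): the hidden third variable
temps = [60, 65, 70, 75, 80, 85, 90]

# Both lists are built ONLY from temperature. Ice cream never appears in the
# drownings formula, and drownings never appear in the ice cream formula.
wobble = [3, -2, 1, 0, -1, 2, -3]  # small day-to-day variation in sales
ice_cream = [2 * t - 100 + w for t, w in zip(temps, wobble)]
drownings = [t // 10 - 5 for t in temps]

r = pearson_r(ice_cream, drownings)
print(round(r, 2))  # 0.97
```

The shared dependence on temperature is enough to produce r close to 1, even though neither list affects the other.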
Determining Causation
Need:
- Controlled experiments
- Logical mechanism explaining the cause
- Temporal relationship (cause before effect)
- Elimination of alternative explanations
Correlation alone is NOT enough!
Analyzing Scatter Plots
Questions to ask:
- Direction: Positive, negative, or no correlation?
- Strength: Strong, moderate, or weak?
- Form: Linear (straight line) or nonlinear (curved)?
- Outliers: Are there unusual points?
- Clusters/gaps: Are data grouped or spread evenly?
Example Analysis
Temperature vs. Ice Cream Sales
- Direction: Positive (both increase together)
- Strength: Strong (points close to line)
- Form: Linear (straight-line pattern)
- Outliers: A day such as 90°F with only 15 sales would be an outlier (perhaps it rained)
- Interpretation: Strong positive linear relationship
Correlation Coefficient (r)
Advanced concept: A number measuring the direction and strength of a linear (straight-line) correlation
Range: -1 ≤ r ≤ 1
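For reference, r is usually computed with the Pearson formula below, where x̄ and ȳ are the means of the x-values and y-values. As a worked check, here it is applied to the hours-studied table from this section (x̄ = 3, ȳ = 77):

```latex
r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}

\sum (x_i - \bar{x})(y_i - \bar{y}) = 24 + 7 + 0 + 8 + 26 = 65, \quad
\sum (x_i - \bar{x})^2 = 4 + 1 + 0 + 1 + 4 = 10, \quad
\sum (y_i - \bar{y})^2 = 144 + 49 + 4 + 64 + 169 = 430

r = \frac{65}{\sqrt{10 \cdot 430}} = \frac{65}{\sqrt{4300}} \approx 0.99
```

A value of about 0.99 confirms the strong positive correlation seen in the plot.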
Interpretation (a common rule of thumb; textbooks draw the cutoffs slightly differently):
- r = 1: Perfect positive correlation
- r = 0.7 to 1: Strong positive
- r = 0.4 to 0.7: Moderate positive
- r = 0 to 0.4: Weak positive
- r = 0: No correlation
- r = -0.4 to 0: Weak negative
- r = -0.7 to -0.4: Moderate negative
- r = -1 to -0.7: Strong negative
- r = -1: Perfect negative correlation
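The bands above can be turned into a small lookup. This is a minimal Python sketch: `describe_r` is a hypothetical helper. Any |r| of 0.7 or more (short of 1) counts as strong, and a value exactly on a boundary, such as 0.7 or 0.4, is placed in the stronger band. That is an arbitrary choice, since neighboring bands share endpoints.

```python
def describe_r(r):
    """Describe r using rule-of-thumb strength bands."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no correlation"
    if abs(r) == 1:
        strength = "perfect"
    elif abs(r) >= 0.7:
        strength = "strong"
    elif abs(r) >= 0.4:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

for r in (0.99, 0.55, 0.2, -0.85, -1.0):
    print(r, describe_r(r))
```

For the hours-studied data (r ≈ 0.99), this prints "strong positive".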
Applications
Science: Identifying relationships between variables
- Temperature vs. chemical reaction rate
- Fertilizer amount vs. plant growth
Economics: Analyzing market trends
- Advertising spend vs. sales revenue
- Price vs. demand
Medicine: Health studies
- Exercise vs. blood pressure
- Age vs. bone density
Sports: Performance analysis
- Practice time vs. free throw percentage
- Height vs. rebounds
Social Science: Behavioral studies
- Study time vs. GPA
- Sleep hours vs. test performance
Common Errors
Error 1: Assuming correlation means causation
❌ Ice cream sales and drownings correlate → ice cream causes drownings
✓ Both have common cause (temperature)
Error 2: Extrapolating too far beyond data
❌ Data from ages 5-18, predict for age 50
✓ Only predict within or near data range
Error 3: Ignoring outliers
❌ Let an outlier pull the trend line without checking it
✓ Consider whether outlier is error or meaningful
Error 4: Wrong scale distorts pattern
❌ Scale makes weak correlation look strong
✓ Use appropriate, honest scale
Tips for Success
Tip 1: Always label axes with units
Tip 2: Use consistent scale (equal spacing)
Tip 3: Look at overall pattern, not individual points
Tip 4: Consider context when interpreting
Tip 5: Question causation claims based only on correlation
Tip 6: Check for outliers and investigate causes
Tip 7: Use scatter plots for continuous numerical data, not categorical
Summary
Scatter plot: Graph of (x, y) points showing relationship between variables
Correlation types:
- Positive: both increase
- Negative: one increases, other decreases
- None: no pattern
Strength: Strong (tight pattern), moderate, weak (scattered)
Trend line: Represents overall pattern, used for prediction
Key warning: Correlation ≠ Causation
Uses: Identify relationships, make predictions, analyze patterns
Practice
Points on a scatter plot trend upward from left to right. This shows:
As hours of TV watched increases, test scores decrease. This is:
Points are very close to the trend line. This indicates:
Using a trend line to predict a value within the data range is called: