Scatter Plots

Plot data points to identify correlation, trends, and relationships between variables.

beginner · statistics · graphs · correlation · middle-school · Updated 2026-02-02

For Elementary Students

What is a Scatter Plot?

A scatter plot is a graph that shows dots to see if two things are related!

Think about it like this: Imagine you want to know if studying more helps you get better test scores. A scatter plot lets you see the pattern by putting dots on a graph!

Parts of a Scatter Plot

Dots (points) — each dot represents one person or thing

x-axis (horizontal) → first thing you're measuring (like hours studied)

y-axis (vertical) ↑ second thing you're measuring (like test score)

Title — tells what the graph is about

Example: Ice Cream and Temperature

Let's say we want to see if hotter days mean more ice cream sales.

Data:

Temperature (°F)    Ice Creams Sold
60                  10
70                  15
80                  25
90                  30

Making the scatter plot:

Ice Cream Sales vs. Temperature

30 |                      •
25 |                •
20 |
15 |          •
10 |    •
 5 |
   └────┼─────┼─────┼─────┼─────┼───→ Temperature (°F)
        60    70    80    90    100

Each dot shows one day!

What Does It Tell Us?

Looking at the dots, they go UP from left to right!

This means: When temperature goes up, ice cream sales go up too!

The two things are related!

Three Patterns to Look For

Pattern 1: Dots Go Up (like ice cream example)

    •
  •
•

Meaning: Both things increase together

Examples:

  • More study → Higher scores
  • More rain → More umbrella sales

Pattern 2: Dots Go Down

•
  •
    •

Meaning: One goes up, the other goes down

Examples:

  • More absences → Lower grades
  • More practice → Fewer mistakes

Pattern 3: Dots Are All Over

•   •
  •   •
•

Meaning: No pattern! The two things aren't related.

Examples:

  • Shoe size vs. test score (no connection!)
  • Hair color vs. height

Reading a Scatter Plot

Question: Look at our ice cream graph. How many ice creams were sold when it was 80°F?

Answer: Find 80 on the bottom (x-axis), go up to the dot, then look left → 25 ice creams

Making Your Own Scatter Plot

Steps:

Step 1: Collect data with two numbers for each item

  • Example: (height, weight) for each student

Step 2: Draw the axes

  • Horizontal (x) = first measurement
  • Vertical (y) = second measurement

Step 3: Plot each pair as a dot

  • (x, y) → go right x, then up y

Step 4: Look for a pattern!
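The four steps above can be sketched in a few lines of Python. This is only a rough illustration, not a real graphing tool: the function name `text_scatter` and its parameters are made up for this example, and it assumes every data point sits exactly on one of the tick values. It uses the ice cream data from earlier.

```python
def text_scatter(points, x_ticks, y_ticks, col_width=5):
    """Return a plain-text scatter plot, one row per y tick (largest at the top)."""
    lines = []
    for y in sorted(y_ticks, reverse=True):
        row = ""
        for x in x_ticks:
            # Step 3: put a dot where a data pair (x, y) exists
            mark = "•" if (x, y) in points else " "
            row += mark.center(col_width)
        lines.append(f"{y:>3} |{row}".rstrip())
    # Step 2: draw the horizontal axis and its labels
    lines.append("    +" + "-" * (col_width * len(x_ticks)))
    lines.append("     " + "".join(str(x).center(col_width) for x in x_ticks))
    return "\n".join(lines)


# Step 1: collect pairs of numbers (temperature in °F, ice creams sold)
ice_cream = [(60, 10), (70, 15), (80, 25), (90, 30)]

plot = text_scatter(
    ice_cream,
    x_ticks=[60, 70, 80, 90, 100],
    y_ticks=[5, 10, 15, 20, 25, 30],
)
print(plot)  # Step 4: look at the dots and check for a pattern
```

The printed dots climb from the lower left to the upper right, which is the "dots go up" pattern.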

Why Use Scatter Plots?

To see relationships between two things!

To predict: If the pattern is clear, you can guess what might happen next

Example: If it's 85°F tomorrow, we can guess we'll sell about 27-28 ice creams (between the 80° and 90° dots)
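One way to arrive at that guess is to assume the sales rise in a straight line between the 80° dot and the 90° dot. The short sketch below does that arithmetic; the function name `interpolate` is just a label chosen for this example.

```python
def interpolate(x, point_a, point_b):
    """Estimate y at x on the straight line between two known points."""
    (x0, y0), (x1, y1) = point_a, point_b
    fraction = (x - x0) / (x1 - x0)   # how far x is from point_a toward point_b
    return y0 + fraction * (y1 - y0)


# 85°F is halfway between the 80°F dot (25 sold) and the 90°F dot (30 sold)
estimate = interpolate(85, (80, 25), (90, 30))
print(estimate)  # 27.5
```

The result, 27.5, sits right in the "about 27-28" range.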

Real-Life Scatter Plots

Sports: Practice time vs. number of goals scored

School: Pages read vs. reading level

Science: Plant water vs. plant height

Health: Exercise vs. energy level

For Junior High Students

Formal Definition

A scatter plot (or scatter diagram) displays paired numerical data (x, y) as points on a coordinate plane to reveal relationships between two quantitative variables.

Components:

y-axis (dependent variable)
        ↑
        |  • (data point)
        |    •
        |  •   •
        |
        └───────────→ x-axis (independent variable)

Each point (x, y):

  • x-coordinate: value of independent variable
  • y-coordinate: value of dependent variable

Types of Correlation

Correlation: The relationship between two variables

Positive Correlation: As x increases, y tends to increase

Points trend upward: ↗

Example: Study hours vs. test scores

  • Strong positive: Points close to upward line
  • Moderate positive: Points generally upward but scattered
  • Weak positive: Slight upward trend, very scattered

Negative Correlation: As x increases, y tends to decrease

Points trend downward: ↘

Example: Number of absences vs. final grade

  • Strong negative: Points close to downward line
  • Moderate negative: General downward trend
  • Weak negative: Slight downward trend, scattered

No Correlation: No clear pattern

Points randomly scattered

Example: Shoe size vs. test scores

The variables show no linear relationship
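The direction of a correlation can also be checked with arithmetic instead of by eye. The sketch below is one possible approach, not a standard classroom procedure: it adds up, for every point, the product of how far x is from its mean and how far y is from its mean. A positive total means the variables tend to rise together, a negative total means one rises as the other falls. The function name `direction` is chosen for this example, and the "negative" data set is simply the ice cream sales listed in reverse order for illustration.

```python
def direction(xs, ys):
    """Classify the direction of a relationship from paired data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Positive when x and y tend to be above (or below) their means together
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_movement > 0:
        return "positive"
    if co_movement < 0:
        return "negative"
    return "none"


temperatures = [60, 70, 80, 90]
print(direction(temperatures, [10, 15, 25, 30]))  # positive (ice cream data)
print(direction(temperatures, [30, 25, 15, 10]))  # negative (made-up reversed data)
print(direction([1, 2, 3], [5, 5, 5]))            # none (y never changes)
```

With real data the total is almost never exactly zero, so a total close to zero is better read as a weak or absent correlation than as a definite direction.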

Strength of Correlation

Strong: Points closely follow a line

  • Clear, predictable relationship
  • r value near 1 (or near -1 for negative correlation)

Moderate: Points somewhat follow a line

  • General trend visible but with variation

Weak: Points barely follow a line

  • Little predictability

Creating a Scatter Plot

Example: Hours Studied vs. Test Score

Data:

Student    Hours    Score
Ana        1        65
Ben        2        70
Cal        3        75
Dan        4        85
Eva        5        90

Procedure:

Step 1: Set up coordinate system

  • x-axis: Hours Studied (0-6)
  • y-axis: Test Score (60-100)

Step 2: Choose appropriate scale

  • x-axis: Count by 1s
  • y-axis: Count by 10s

Step 3: Plot points

  • (1, 65), (2, 70), (3, 75), (4, 85), (5, 90)

Step 4: Analyze pattern

  • Points trend upward → positive correlation

Line of Best Fit (Trend Line)

Definition: A line that best represents the trend of the data

Not every point lies on the line, but the line is placed so that the overall distance between it and the points is as small as possible.

Purpose:

  • Summarize the relationship
  • Make predictions

Drawing by hand:

  1. Draw line through middle of points
  2. Balance points above and below line
  3. Line should follow general trend

Equation form: y = mx + b

  • m = slope (rate of change)
  • b = y-intercept (starting value)
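A computer usually finds m and b with the least-squares method, which picks the line that makes the sum of the squared vertical distances to the points as small as possible. This is a specific technique swapped in for "drawing through the middle", so treat the sketch below as one reasonable way to get a trend line rather than the only one. It uses the hours and scores of the five students in the table above; the function name `fit_line` is chosen for this example.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx                 # slope: change in y per unit of x
    b = mean_y - m * mean_x       # intercept: the line passes through the means
    return m, b


hours = [1, 2, 3, 4, 5]
scores = [65, 70, 75, 85, 90]

m, b = fit_line(hours, scores)
print(f"y = {m}x + {b}")  # y = 6.5x + 57.5
```

On this data the slope of 6.5 says each extra hour of study goes with about 6.5 more points, and the intercept of 57.5 is the score the line gives for zero hours.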

Making Predictions

Interpolation: Predict within the data range

Example: Data from x = 0 to x = 10

Predict at x = 7 → More reliable (within range)

Extrapolation: Predict outside the data range

Example: Data from x = 0 to x = 10

Predict at x = 15 → Less reliable (beyond data)

Caution: Extrapolation assumes pattern continues, which may not be true!
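The difference can be made explicit in code by checking whether the x value falls inside the range of the data before trusting the prediction. The sketch below assumes a hypothetical trend line y = 2x + 1 (the slope and intercept are invented for illustration) and the data range x = 0 to x = 10 from the examples above.

```python
def predict(x, slope, intercept, x_min, x_max):
    """Return the predicted y and whether x lies inside the observed data range."""
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return slope * x + intercept, kind


# Hypothetical trend line y = 2x + 1, fitted to data that runs from x = 0 to x = 10
for x in (7, 15):
    y, kind = predict(x, slope=2, intercept=1, x_min=0, x_max=10)
    print(f"x = {x}: predicted y = {y} ({kind})")
```

Both calls return a number, which is exactly the risk: the formula gives an answer at x = 15 even though no data supports the pattern that far out.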

Outliers

Outlier: A point that doesn't fit the general pattern

Identification: Point far from the trend line or cluster

Example: Hours studied vs. score

Most points: More hours → higher score
Outlier: Student studied 5 hours but scored 60 (much lower than expected)

Possible reasons:

  • Data entry error
  • Unusual circumstance (student was sick)
  • Other factors at play

Impact: Outliers can affect correlation strength and trend line position
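One simple way to spot a candidate outlier is to fit a trend line and look for the point with the largest vertical distance (residual) from it. The sketch below is an illustration under assumptions: it combines the five students from the earlier hours-and-score table with the outlier student described above (5 hours, score 60), and the function names are chosen for this example. It uses a least-squares line, which is only one of several ways to define the trend.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def largest_residual(xs, ys):
    """Index and residual of the point farthest (vertically) from the trend line."""
    m, b = fit_line(xs, ys)
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    index = max(range(len(xs)), key=lambda i: abs(residuals[i]))
    return index, residuals[index]


# Five students from the earlier table plus the outlier (assumed combination)
hours = [1, 2, 3, 4, 5, 5]
scores = [65, 70, 75, 85, 90, 60]

index, residual = largest_residual(hours, scores)
print(hours[index], scores[index], round(residual, 2))  # 5 60 -18.75

slope_with, _ = fit_line(hours, scores)
slope_without, _ = fit_line(hours[:5], scores[:5])
print(round(slope_with, 2), round(slope_without, 2))    # 2.75 6.5
```

The single unusual point pulls the slope from 6.5 down to 2.75, which shows how strongly an outlier can shift the trend line.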

Clusters and Gaps

Cluster: Group of points close together

Example: Most students between 70-80 on test

Gap: Region with no data points

Example: No students scored between 50-65

Interpretation: Shows distribution patterns in data

Causation vs. Correlation

CRITICAL DISTINCTION:

Correlation: Two variables are related (change together)

Causation: One variable CAUSES the other to change

Correlation ≠ Causation!

Example: Ice Cream Sales and Drownings

Observation: Both increase in summer (positive correlation)

WRONG conclusion: Ice cream causes drownings

CORRECT explanation: Both caused by a third variable (warm weather)

  • Warm weather → more swimming → more drownings
  • Warm weather → more ice cream purchases

Key principle: Just because two things are correlated doesn't mean one causes the other.

Determining Causation

Need:

  • Controlled experiments
  • Logical mechanism explaining the cause
  • Temporal relationship (cause before effect)
  • Elimination of alternative explanations

Correlation alone is NOT enough!

Analyzing Scatter Plots

Questions to ask:

  1. Direction: Positive, negative, or no correlation?
  2. Strength: Strong, moderate, or weak?
  3. Form: Linear (straight line) or nonlinear (curved)?
  4. Outliers: Are there unusual points?
  5. Clusters/gaps: Are data grouped or spread evenly?

Example Analysis

Temperature vs. Ice Cream Sales

  • Direction: Positive (both increase together)
  • Strength: Strong (points close to line)
  • Form: Linear (straight-line pattern)
  • Outliers: One day (90°F, 15 sales) - perhaps rainy?
  • Interpretation: Strong positive linear relationship

Correlation Coefficient (r)

Advanced concept: A number measuring correlation strength

Range: -1 ≤ r ≤ 1

Interpretation:

  • r = 1: Perfect positive correlation
  • 0.7 ≤ r < 1: Strong positive
  • 0.4 ≤ r < 0.7: Moderate positive
  • 0 < r < 0.4: Weak positive
  • r = 0: No linear correlation
  • -0.4 < r < 0: Weak negative
  • -0.7 < r ≤ -0.4: Moderate negative
  • -1 < r ≤ -0.7: Strong negative
  • r = -1: Perfect negative correlation

These cutoffs are common rules of thumb, not fixed standards.
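For readers curious how r is calculated, the sketch below computes the Pearson correlation coefficient for the hours-and-score data of the five students from earlier, then labels it using the 0.4 and 0.7 cutoffs from the list above. The function names are chosen for this example, and treating every value of 0.7 or more as "strong" is an assumption of the sketch.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe(r):
    """Turn r into a label using the 0.4 and 0.7 rule-of-thumb cutoffs."""
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


hours = [1, 2, 3, 4, 5]
scores = [65, 70, 75, 85, 90]

r = pearson_r(hours, scores)
print(round(r, 3), describe(r))  # 0.991 strong positive
```

A value this close to 1 matches what the scatter plot shows: the five points lie almost exactly on an upward line.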

Applications

Science: Identifying relationships between variables

  • Temperature vs. chemical reaction rate
  • Fertilizer amount vs. plant growth

Economics: Analyzing market trends

  • Advertising spend vs. sales revenue
  • Price vs. demand

Medicine: Health studies

  • Exercise vs. blood pressure
  • Age vs. bone density

Sports: Performance analysis

  • Practice time vs. free throw percentage
  • Height vs. rebounds

Social Science: Behavioral studies

  • Study time vs. GPA
  • Sleep hours vs. test performance

Common Errors

Error 1: Assuming correlation means causation

❌ Ice cream sales and drownings correlate → ice cream causes drownings
✓ Both have common cause (temperature)

Error 2: Extrapolating too far beyond data

❌ Data from ages 5-18, predict for age 50
✓ Only predict within or near data range

Error 3: Ignoring outliers

❌ Draw trend line through outlier
✓ Consider whether outlier is error or meaningful

Error 4: Wrong scale distorts pattern

❌ Scale makes weak correlation look strong
✓ Use appropriate, honest scale

Tips for Success

Tip 1: Always label axes with units

Tip 2: Use consistent scale (equal spacing)

Tip 3: Look at overall pattern, not individual points

Tip 4: Consider context when interpreting

Tip 5: Question causation claims based only on correlation

Tip 6: Check for outliers and investigate causes

Tip 7: Use scatter plots for continuous numerical data, not categorical

Summary

Scatter plot: Graph of (x, y) points showing relationship between variables

Correlation types:

  • Positive: both increase
  • Negative: one increases, other decreases
  • None: no pattern

Strength: Strong (tight pattern), moderate, weak (scattered)

Trend line: Represents overall pattern, used for prediction

Key warning: Correlation ≠ Causation

Uses: Identify relationships, make predictions, analyze patterns

Practice

Points on a scatter plot trend upward from left to right. This shows:

As hours of TV watched increases, test scores decrease. This is:

Points are very close to the trend line. This indicates:

Using a trend line to predict a value within the data range is called: