Line Graphs and Scatter Plots

Selecting the right chart type is one of the most important decisions in data visualization. The right chart makes your message obvious, while the wrong one can actively mislead your audience. Here we break down two of the most commonly confused chart types: the Line Graph and the Scatter Plot.


1. The Line Graph: Connecting the Dots of Time

A Line Graph is the gold standard for displaying continuous change and trends.

Primary Use and Visual Metaphor

  • Purpose: To show how a quantitative value evolves continuously over an ordered, often temporal, scale.

  • Key Feature: Points are connected. The connection is not merely decorative; it carries a visual metaphor. The line tells the reader that the value moved continuously from one measurement to the next, so anyone reading the chart will interpolate along it.

  • Ideal Data: Continuous or sequential data, typically measured at successive points in time (e.g., stock prices, temperature over 24 hours, sales growth over quarters).
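Because the line joins each point to the next, the data must be in order along the x-axis before it is drawn; plotting libraries such as matplotlib connect points in the order they are supplied. The sketch below is a hypothetical helper, not part of any library: it sorts (x, y) pairs and returns the segments a line graph would draw, together with the slope each segment implies. The readings are made-up illustration data.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (hour of day, temperature)


def line_segments(points: List[Point]) -> List[Tuple[Point, Point, float]]:
    """Return the segments a line graph would draw, each with its slope.

    Points are sorted by x first; drawn in unsorted order, the line
    would double back on itself.
    """
    ordered = sorted(points)
    xs = [x for x, _ in ordered]
    if len(set(xs)) != len(xs):
        # A line graph assumes one y-value per x-value.
        raise ValueError("duplicate x-values: consider a scatter plot instead")
    segments = []
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        slope = (y1 - y0) / (x1 - x0)  # rate of change the line implies
        segments.append(((x0, y0), (x1, y1), slope))
    return segments


# Hypothetical temperature readings (hour, degrees C), listed out of order.
readings = [(12, 21.0), (0, 14.0), (6, 12.5), (18, 18.5)]
for start, end, slope in line_segments(readings):
    print(f"{start} -> {end}: {slope:+.2f} per hour")
```

The duplicate-x check encodes the assumption of one y-value per x-value; data that breaks it, such as repeated samples, usually belongs in a scatter plot.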



The Cardinal Rule

  • Use a line graph only when the x-axis has a meaningful order, such as time, so that neighboring points are genuinely related. Connecting points that represent unordered categories (like "Sales by Country" joined by a line) implies a sequence, and a trend between categories, that does not exist, which is misleading.
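To see why the rule matters, consider what a line between categories actually encodes. In this sketch, with made-up figures and a hypothetical helper `apparent_changes`, the same four sales totals produce a zigzag when the countries are listed alphabetically and a steady climb when they are sorted by value:

```python
# Hypothetical sales totals; the countries have no natural order.
sales = {"Brazil": 40, "Canada": 25, "France": 35, "Japan": 30}


def apparent_changes(order, values):
    """Rises and falls a line would draw between neighboring categories."""
    return [values[b] - values[a] for a, b in zip(order, order[1:])]


alphabetical = sorted(sales)             # Brazil, Canada, France, Japan
by_value = sorted(sales, key=sales.get)  # Canada, Japan, France, Brazil

print(apparent_changes(alphabetical, sales))  # [-15, 10, -5]
print(apparent_changes(by_value, sales))      # [5, 5, 5]
```

Neither pattern is real; both come from an arbitrary choice of order.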





2. The Scatter Plot: Mapping Relationships and Distribution

A Scatter Plot is used to explore the relationship (or correlation) between two quantitative variables.

Primary Use and Data Structure

  • Purpose: To show how two quantitative variables, plotted on the X and Y axes, are distributed together and whether they are correlated.

  • Key Feature: The data are not a series, so the points are not connected. Each point is a separate observation, and its position is what carries the information.

  • Ideal Data: Often stochastic data (values subject to random variation), where each point represents an independent sample. A single X-value may have several Y-values, as is typical when plotting individual samples (e.g., Height vs. Weight for every person in a study).
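A scatter plot invites the eye to judge correlation, and Pearson's correlation coefficient r puts a number on the same linear relationship. The sketch below uses made-up height and weight data and a hypothetical `pearson_r` helper written out from the standard formula (Python 3.10 and later also ship `statistics.correlation`, which computes Pearson's r). It also shows one x-value mapping to several y-values.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of the spreads."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical study data: one (height cm, weight kg) pair per person.
heights = [160, 165, 165, 170, 175, 180]
weights = [55, 60, 63, 68, 72, 80]

# Each pair is an independent point; nothing connects them.
points = list(zip(heights, weights))

# Two people share a height, so one x-value maps to several y-values.
weights_by_height = defaultdict(list)
for h, w in points:
    weights_by_height[h].append(w)
print(weights_by_height[165])  # [60, 63]

r = pearson_r(heights, weights)
print(f"Pearson r = {r:.2f}")
```

Pearson's r captures only linear association: a curved or clustered pattern can have a low r even when the scatter plot shows a clear relationship, which is one reason to plot the points rather than rely on the number alone.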



Watch Out for Over-plotting!

The primary challenge with scatter plots is over-plotting: when several data points fall on the same or nearly the same coordinates, they stack on top of each other, and the chart no longer shows how many points are in each area.
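Over-plotting is easy to produce with discrete or rounded data. This sketch uses made-up answers on two 1 to 5 rating scales and counts how many points land on each coordinate; drawn with opaque markers, eight points would show up as only four.

```python
from collections import Counter

# Hypothetical survey answers on two 1-5 scales; discrete values overlap easily.
answers = [(3, 4), (3, 4), (3, 4), (5, 2), (1, 1), (3, 4), (5, 2), (2, 5)]

stack_sizes = Counter(answers)      # points stacked at each coordinate
print(len(answers))                 # 8 points plotted
print(len(stack_sizes))             # 4 distinct markers visible when opaque
print(stack_sizes.most_common(1))   # [((3, 4), 4)]
```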

One Simple Trick: Transparency (The Alpha Value)

A simple and effective way to address over-plotting is to use transparency.

  • Technique: Make the points partly see-through by setting the alpha value (opacity) of their color to less than 1, where 1 is fully opaque and 0 is fully transparent.

  • Result: Where points overlap, the color builds up and looks stronger and darker. Density becomes a new visual variable, letting the audience see which areas of the chart hold the most data points. The effect is approximate: once enough points stack up, the color saturates, so very dense regions can look alike.
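The darkening follows from how overlapping colors are blended. Assuming standard "over" alpha compositing of identical markers, each marker lets a fraction (1 - alpha) of whatever lies beneath show through, so n stacked markers cover 1 - (1 - alpha)^n of the background. The helper names below are hypothetical:

```python
def stacked_opacity(alpha: float, n: int) -> float:
    """Effective opacity of n identical markers stacked on one spot.

    Under "over" compositing, each layer lets (1 - alpha) of what lies
    beneath show through, so n layers leave (1 - alpha) ** n visible.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return 1.0 - (1.0 - alpha) ** n


def blended_channel(color: float, background: float, alpha: float, n: int) -> float:
    """One color channel (0 to 1) after n markers are drawn over a background."""
    coverage = stacked_opacity(alpha, n)
    return coverage * color + (1.0 - coverage) * background


for n in (1, 2, 5, 20):
    print(n, round(stacked_opacity(0.2, n), 3))
# 1 0.2
# 2 0.36
# 5 0.672
# 20 0.988

# Fully opaque markers (alpha = 1) look identical whether 1 or 100 are stacked.
print(stacked_opacity(1.0, 1) == stacked_opacity(1.0, 100))  # True
```

The curve flattens quickly: at alpha = 0.2, twenty overlapping points are already about 99% opaque, so very dense regions become hard to tell apart. Lower alpha values preserve more of that range for larger datasets. In matplotlib, for example, the value is passed as the `alpha` keyword of `scatter`.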


