Data Representation & Visualisation | GCSE Statistics Notes

📖 30 min read | 📅 Updated: 9 May 2026

Visualising data is a core skill in Statistics. From simple tally charts to complex histograms with unequal widths, this chapter covers every diagram you need to know for your exam.

Topic 3.1 — Tally Charts & Frequency Tables

Tally charts and frequency tables are the foundational tools for organising raw data into a manageable and interpretable format. A tally chart is a simple method of recording and counting data using tally marks, where every fifth mark is drawn diagonally across the previous four to create a bundle of five. This grouping into fives makes it quick and easy to count large numbers of responses without losing track. Once the data has been tallied, it is typically transferred into a frequency table, which presents the information more formally. A frequency table lists each category or value and its corresponding frequency—the number of times it occurs. For numerical data, this can be extended to grouped frequency tables, where data is organised into class intervals, such as 10–19, 20–29, and so on.

These tables are not merely administrative tools; they are the gateway to further statistical analysis. From a frequency table, you can easily calculate totals, proportions, and percentages. For example, if a frequency table shows that 15 out of 50 students walk to school, you can immediately deduce that 30% of the sample uses this mode of transport. However, the construction of these tables requires care. Categories must be mutually exclusive and exhaustive, ensuring every data point fits into one and only one category. When dealing with grouped data, the choice of class intervals is critical, as intervals that are too wide will obscure detail, while those that are too narrow will fail to summarise effectively.
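As a minimal sketch of these steps, the Python snippet below builds a frequency table from raw responses with `collections.Counter`, counts whole-number data into class intervals of width 10, and turns a frequency into a percentage. The survey data and the helper name `grouped_frequency` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical raw data: how ten students travel to school
responses = ["walk", "bus", "car", "walk", "bus", "walk", "cycle", "car", "bus", "walk"]

# Frequency table: each category paired with the number of times it occurs
frequency_table = Counter(responses)

def grouped_frequency(values, start, width, n_classes):
    """Count whole numbers into class intervals such as 10-19, 20-29, ...

    Each value can fall into only one interval (mutually exclusive). Values outside
    the chosen intervals are not counted, so check the total afterwards.
    """
    table = {}
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - 1
        table[f"{lower}-{upper}"] = sum(1 for v in values if lower <= v <= upper)
    return table

ages = [12, 15, 21, 23, 27, 34, 38, 19, 25, 31]  # hypothetical ages
grouped = grouped_frequency(ages, start=10, width=10, n_classes=3)

# The intervals are exhaustive for this data only if every value has been counted
assert sum(grouped.values()) == len(ages)

# Percentage of a sample: 15 out of 50 students walk to school
percentage_walk = 15 * 100 / 50
```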

Topic 3.2 — Pictograms

A pictogram is a form of data visualisation that uses pictures or symbols to represent quantities, making it visually engaging and accessible, particularly for non-specialist audiences. Every pictogram must include a key, which specifies what quantity a single symbol represents. For instance, one picture of a car might represent 10 actual vehicles sold. To construct a pictogram, the frequency of each category is converted into the appropriate number of symbols. If a category has a frequency of 35 and the key states that one symbol represents 10, you would draw three and a half symbols. Handling partial symbols is an important skill: a half symbol must be drawn accurately, so that its value can be read unambiguously from the key.
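Converting between frequencies and symbols is a single division or multiplication. A short Python sketch, assuming the same key value applies to every category; the helper names are hypothetical:

```python
def symbols_needed(frequency, key_value):
    """Number of symbols to draw when one symbol represents key_value items."""
    return frequency / key_value

def split_symbols(frequency, key_value):
    """Split into whole symbols and the fraction of one extra symbol."""
    n = symbols_needed(frequency, key_value)
    whole = int(n)
    return whole, n - whole

def frequency_from_symbols(n_symbols, key_value):
    """Reading a pictogram: convert a count of symbols back into a frequency."""
    return n_symbols * key_value

# Key: one car symbol represents 10 vehicles sold
print(split_symbols(35, 10))            # whole symbols, then the fraction of one more
print(frequency_from_symbols(3.5, 10))  # three and a half symbols read back as a frequency
```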

The primary advantage of a pictogram is its intuitive appeal. However, this strength is also its greatest weakness. Because the human eye responds to area rather than simply to the number of symbols, a pictogram becomes misleading if the symbols are drawn at different sizes for different categories, as this distorts the true proportions. Furthermore, pictograms are generally unsuitable for displaying large numbers or precise data, as drawing hundreds of tiny icons is impractical and makes comparison difficult. Although pictograms are rarely the most precise tool, you need to understand them because they appear in the media and in public-facing statistics.

Topic 3.3 — Bar Charts (Dual, Multiple, Composite, Percentage)

Bar charts are among the most versatile and widely used tools for representing categorical and discrete data. A simple bar chart uses bars of equal width, with the height of each bar representing the frequency of a category, and crucially, there are gaps between the bars to signify that the categories are distinct and not continuous. Beyond the basic form, there are several specialised types designed for more complex comparisons. Dual or multiple bar charts place two or more bars side-by-side for each category, allowing for direct visual comparison between groups. For example, you might compare the sales of different products across two different years.

A composite, or stacked, bar chart places segments on top of one another to show both the total frequency and the breakdown of that total into its component parts. This is particularly useful for showing how a total is made up, such as the numbers of students achieving different grades in separate classes. A percentage bar chart is a variation of the composite chart where every bar is scaled to represent 100%, making it easy to compare the proportional breakdown across categories regardless of differing totals. Another related form is the vertical line chart (sometimes called a bar-line chart), which replaces each bar with a thin vertical line and is typically used for discrete numerical data.
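To illustrate the difference between a composite bar and a percentage bar, the sketch below uses hypothetical grade counts for two classes. The bar total is the height of the composite bar, and each segment's percentage is its frequency divided by that total, multiplied by 100.

```python
# Hypothetical grade counts for two classes (one composite bar per class)
classes = {
    "Class A": {"Grades 7-9": 6, "Grades 4-6": 12, "Grades 1-3": 2},
    "Class B": {"Grades 7-9": 9, "Grades 4-6": 15, "Grades 1-3": 6},
}

def bar_total(segments):
    """Height of a composite bar: the sum of its stacked segments."""
    return sum(segments.values())

def percentage_breakdown(segments):
    """Percentage bar chart: each segment as a percentage of its bar's total."""
    total = bar_total(segments)
    return {name: count * 100 / total for name, count in segments.items()}

totals = {name: bar_total(seg) for name, seg in classes.items()}
percentages = {name: percentage_breakdown(seg) for name, seg in classes.items()}
```

Although Class B has more students in total (30 against 20), both classes have 30% of students in the top grade band, which a percentage bar chart shows immediately and a composite bar chart does not.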

Topic 3.4 — Pie Charts & Comparative Pie Charts

A pie chart is a circular graph divided into sectors, where the angle of each sector is proportional to the frequency it represents. To calculate the angle for a sector, you use the formula: (category frequency ÷ total frequency) × 360 degrees. For example, if 30 out of 120 people prefer tea, the sector angle is (30/120) × 360 = 90 degrees. Drawing a pie chart accurately requires a protractor and a sharp pencil. When interpreting a pie chart, you are comparing proportions rather than absolute values. Within one pie chart, equal angles mean equal frequencies; across two pie charts, sectors with the same angle represent the same proportion of their respective totals, even if the actual numbers they represent are very different.
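The sector-angle formula can be checked with a short Python sketch. The drinks survey below is hypothetical apart from the 30-out-of-120 tea example; a useful check is that the angles add up to 360 degrees.

```python
def sector_angles(frequencies):
    """Angle for each sector: (category frequency / total frequency) x 360 degrees."""
    total = sum(frequencies.values())
    return {category: count * 360 / total for category, count in frequencies.items()}

# Hypothetical survey of 120 people, 30 of whom prefer tea
drinks = {"tea": 30, "coffee": 50, "water": 40}
angles = sector_angles(drinks)
```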

A comparative pie chart is used to compare the same categories across two or more different groups or time periods. If all the pies are drawn the same size, only the proportions can be compared, because the total each pie represents may be different. At the Higher tier, students encounter proportional pie charts, where the area of each pie is proportional to the total quantity it represents. Because the area of a circle is proportional to the square of its radius, the radius of the second pie is calculated as: new radius = old radius × √(new total ÷ old total). A pie representing a larger total therefore has a larger radius, allowing both proportion and overall quantity to be compared at the same time.
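As a sketch of the radius calculation, assuming the first pie's radius and total are known; the figures below are hypothetical:

```python
import math

def comparative_radius(old_radius, old_total, new_total):
    """Radius of a pie whose area is proportional to its total.

    Area is proportional to radius squared, so the radius scales with the
    square root of the ratio of the totals.
    """
    return old_radius * math.sqrt(new_total / old_total)

# Hypothetical: a pie for 100 people drawn with radius 4 cm; the second group has 225 people
r2 = comparative_radius(4, 100, 225)   # 4 x sqrt(2.25) = 4 x 1.5
area_ratio = (r2 / 4) ** 2             # should equal 225 / 100
```

Squaring the ratio of the radii recovers the ratio of the totals, which is a quick way to check the answer.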

Topic 3.5 — Stem and Leaf Diagrams (inc. Back-to-Back)

A stem and leaf diagram is a semi-graphical method for displaying quantitative data that preserves the original data values while showing the shape of the distribution. To construct one, the data is split into a "stem" (typically the leading digit or digits) and a "leaf" (the trailing digit). For the dataset 12, 15, 21, 23, the stem '1' would have leaves '2' and '5', and the stem '2' would have leaves '1' and '3'. A critical rule is that the leaves for each stem must be ordered from smallest to largest. The diagram must also include a key to explain the format. One of the major advantages of a stem and leaf diagram is that the median and range can be found directly without referring back to the raw data.
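A minimal sketch of building a stem and leaf diagram in Python, assuming non-negative two-digit whole numbers so that the tens digit is the stem and the units digit is the leaf. The helper names are hypothetical; the data is the example from the paragraph above.

```python
def stem_and_leaf(data):
    """Map each stem (tens digit) to its leaves (units digits), leaves in ascending order."""
    diagram = {}
    for value in sorted(data):              # sorting first keeps each stem's leaves ordered
        stem, leaf = divmod(value, 10)
        diagram.setdefault(stem, []).append(leaf)
    return diagram

def render(diagram, key_stem, key_leaf):
    """Text version of the diagram, including any empty stems, followed by a key."""
    lines = []
    for stem in range(min(diagram), max(diagram) + 1):
        leaves = " ".join(str(leaf) for leaf in diagram.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    lines.append(f"Key: {key_stem} | {key_leaf} means {key_stem * 10 + key_leaf}")
    return "\n".join(lines)

data = [23, 12, 21, 15]
diagram = stem_and_leaf(data)
print(render(diagram, 1, 2))

# Reading the values back off the diagram in order gives the median and range directly
ordered = [stem * 10 + leaf for stem in sorted(diagram) for leaf in diagram[stem]]
n = len(ordered)
median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2 if n % 2 == 0 else ordered[n // 2]
data_range = ordered[-1] - ordered[0]
```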

A back-to-back stem and leaf diagram is used to compare two distributions. The shared stem is placed in the centre, with the leaves for one dataset branching out to the left and the leaves for the other dataset branching out to the right. On both sides, the leaves are ordered from the stem outwards, so the left-hand leaves are read from right to left, and the key must explain how to read each side. This allows for immediate visual comparison of the shapes, centres, and spreads of the two distributions. For example, you might compare the heights of boys and girls in a class. From the diagram, you can quickly see which group is generally taller by observing where the bulk of the leaves are concentrated, and you can find the median for each group by counting to the middle value on each side.

Topic 3.6 — Venn Diagrams

Venn diagrams are a visual tool used to illustrate the logical relationships between different sets of data. In GCSE Statistics, they are most commonly used with two or sometimes three sets. Each set is represented by a circle, and the circles overlap to show the intersection between sets. The universal set, which contains all possible elements under consideration, is typically represented by an enclosing rectangle. The key notations to understand are the union (A ∪ B), which represents all elements in set A, set B, or both; the intersection (A ∩ B), which represents only the elements in both sets; and the complement (A'), which represents all the elements not in set A.

Venn diagrams are particularly powerful for solving counting and probability problems. For instance, if you are told how many students study Maths, how many study Statistics, and how many study both, a Venn diagram helps you work out how many study only one subject and how many study neither. The overlapping region is often the most important, as it can be the most difficult to work out from a word problem alone. When filling in a Venn diagram, it is best to start with the intersection and work outwards. For three-set problems, the central region where all three circles overlap must be filled in first to avoid double-counting.
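Starting from the intersection and working outwards amounts to a few subtractions. The sketch below uses hypothetical figures for a class of 30 students, with A standing for Maths and B for Statistics; the helper name is also hypothetical.

```python
def two_set_regions(total, in_a, in_b, in_both):
    """Counts for each region of a two-set Venn diagram, starting from the intersection."""
    only_a = in_a - in_both
    only_b = in_b - in_both
    neither = total - (only_a + only_b + in_both)
    return {"A only": only_a, "B only": only_b, "A and B": in_both, "neither": neither}

# Hypothetical class: 30 students, 18 study Maths, 12 study Statistics, 5 study both
regions = two_set_regions(total=30, in_a=18, in_b=12, in_both=5)
union = regions["A only"] + regions["B only"] + regions["A and B"]   # n(A ∪ B)
p_neither = regions["neither"] / 30   # probability a randomly chosen student studies neither
```

Here 13 students study only Maths, 7 study only Statistics and 5 study neither, so n(A ∪ B) = 25 and P(neither) = 5/30 = 1/6.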

Topic 3.7 — Line Charts & Time Series

A line chart is a graph that uses points connected by straight lines to show how a quantity changes, typically over a period of time. A time series is a specific type of line chart where the data points are plotted at successive time intervals, such as daily, monthly, or yearly. Constructing a time series involves placing time on the horizontal axis and the variable of interest on the vertical axis. The points are plotted and joined with straight lines, not because the value necessarily changed linearly between the two points, but to help the eye track the progression and identify overall trends.

Interpreting a time series requires looking for several key features. The trend is the long-term direction of the data: whether it is generally increasing, decreasing, or remaining stable. Seasonal patterns are regular, short-term fluctuations that repeat at fixed intervals, such as increased ice cream sales every summer. Cyclic patterns also repeat, but not necessarily over a fixed period. The distinction between a line graph and a time series is subtle; all time series are line graphs, but not all line graphs are time series. For example, a conversion graph between pounds and kilograms is a line graph in which neither axis represents time.

Topic 3.8 — Scatter Graphs & Line of Best Fit (by Eye)

A scatter graph, or scatter diagram, is the standard way to display bivariate data and visually investigate the relationship between two quantitative variables. The explanatory (independent) variable is plotted on the horizontal x-axis, and the response (dependent) variable is plotted on the vertical y-axis. Once the points are plotted, the pattern they form can reveal the type, strength, and direction of any correlation. A line of best fit is then drawn by eye to summarise the general trend of the data. This line does not have to pass through the origin, nor does it have to pass through any specific data points; rather, it should have roughly an equal number of points above and below it, and it should follow the slope of the main cloud of points.

The line of best fit is not merely a visual aid; it is a predictive tool. By reading up from one axis to the line and then across to the other, you can estimate the value of one variable from the other. If the prediction is made for a value within the range of the existing data, it is called interpolation, and it is generally considered reliable, particularly when the correlation is strong. If the prediction is made for a value outside the range of the data, it is called extrapolation, and it must be treated with extreme caution because the pattern observed within the data may not hold true outside it. Drawing a balanced line of best fit that represents the overall trend is a skill that requires practice.
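Whether an estimate is interpolation or extrapolation depends only on whether the input lies within the observed range. The sketch below assumes a hypothetical line of best fit drawn by eye through (0, 40) and (10, 90), so its gradient is 5 and its y-intercept is 40; the data and the helper name are hypothetical.

```python
def predict(x, gradient, intercept, observed_x):
    """Estimate y from a line of best fit y = gradient * x + intercept.

    Also reports whether this is interpolation (x inside the observed range)
    or extrapolation (x outside it), which should be treated with caution.
    """
    y = gradient * x + intercept
    if min(observed_x) <= x <= max(observed_x):
        kind = "interpolation"
    else:
        kind = "extrapolation"
    return y, kind

# Hypothetical: hours of revision (1 to 10) against test score out of 100
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(predict(6, 5, 40, hours))    # inside the data range
print(predict(15, 5, 40, hours))   # outside the range
```

The first estimate, 70, is an interpolation. The second gives 115, an impossible score for a test out of 100, which shows how extrapolation can break down.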

Topic 3.9 — Frequency Polygons & Cumulative Frequency Charts

Frequency polygons and cumulative frequency charts are two powerful tools for representing grouped continuous data. A frequency polygon is constructed by plotting the frequency of each class interval at its midpoint and then joining the points with straight lines. Although it can be drawn by joining the midpoints of the tops of histogram bars, a frequency polygon does not need the bars; on its own, it makes it easy to compare two or more distributions on the same axes. The key is to use the midpoint of the class interval on the x-axis, not the upper or lower boundary.
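A short Python sketch of the plotting positions for a frequency polygon, using hypothetical class intervals:

```python
# Hypothetical grouped continuous data: (lower boundary, upper boundary, frequency)
# e.g. 0 <= t < 10 has frequency 4
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 6), (30, 40, 1)]

def polygon_points(classes):
    """Frequency polygon: plot each frequency at the midpoint of its class interval."""
    return [((lower + upper) / 2, freq) for lower, upper, freq in classes]

points = polygon_points(classes)
```

Joining these points in order with straight lines gives the polygon; a second dataset's points can be plotted on the same axes for comparison.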

Cumulative frequency charts, on the other hand, show the running total of frequencies. To create one, first construct a cumulative frequency table by adding up the frequencies as you move from the lowest to the highest class interval. Each cumulative frequency is then plotted against the upper class boundary of its interval, never the midpoint, and the points are joined with a smooth curve or straight lines. From the cumulative frequency curve, you can read off important statistical measures: the median is found at the 50th percentile (half the total frequency), the lower quartile (Q1) at the 25th percentile, and the upper quartile (Q3) at the 75th percentile. The interquartile range (IQR) can then be calculated as Q3 − Q1.
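The sketch below, using hypothetical grouped data, builds the cumulative frequency points and estimates the median and quartiles. It assumes the points are joined with straight lines, so readings are found by linear interpolation; a curve drawn freehand would give slightly different estimates. Starting the graph at the lowest class boundary with a cumulative frequency of 0 follows common practice. The helper names are hypothetical.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 6), (30, 40, 1)]

def cumulative_points(classes):
    """Running totals plotted at each upper class boundary.

    The graph starts at (lowest boundary, 0), since no values lie below it.
    """
    running_totals = accumulate(freq for _, _, freq in classes)
    points = [(classes[0][0], 0)]
    for (_, upper, _), cf in zip(classes, running_totals):
        points.append((upper, cf))
    return points

def read_off(points, target_cf):
    """Estimate the value at a given cumulative frequency, with points joined by straight lines."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target_cf <= y1 and y1 > y0:
            return x0 + (target_cf - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target is outside the cumulative frequency range")

points = cumulative_points(classes)
n = points[-1][1]                       # total frequency
median = read_off(points, n / 2)
q1 = read_off(points, n / 4)
q3 = read_off(points, 3 * n / 4)
iqr = q3 - q1
```

For this data, the median is about 16.7, Q1 about 11.1 and Q3 about 23.3, giving an IQR of about 12.2.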

Topic 3.10 — Box Plots (Box and Whisker Diagrams)

A box plot, or box and whisker diagram, is a graphical representation of the five-number summary of a dataset: the minimum value, the lower quartile (Q1), the median (Q2), the upper quartile (Q3), and the maximum value. The "box" encloses the middle 50% of the data, from Q1 to Q3, with a line inside the box to represent the median. The "whiskers" extend from the box to the minimum and maximum values, showing the full range of the data. Drawing a box plot requires an accurate scale, as the relative positions and lengths of the components convey important information about the distribution.

Box plots are exceptionally useful for comparing distributions. When placed side-by-side, they allow for immediate visual comparison of the central tendency (via the median) and the spread (via the IQR and range). A box that is shifted higher on the scale indicates a higher central value. A longer box indicates greater variability in the middle 50% of the data. When comparing two box plots, you must always reference both a measure of average (the median) and a measure of spread (the IQR). Outliers can also be identified on a box plot; any data point that falls more than 1.5 times the IQR below Q1 or above Q3 is often marked with an individual cross to flag it as an outlier.
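Quartiles can be calculated in several slightly different ways, and textbooks, calculators and software do not always agree. The sketch below uses Python's `statistics.quantiles`, whose default `'exclusive'` method places the quartiles at positions (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 in the ordered data, interpolating between values when needed. The scores and helper names are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum using statistics.quantiles (default 'exclusive' method)."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)

def find_outliers(data):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

scores = [3, 5, 7, 8, 9, 10, 11, 12, 13, 15, 30]  # hypothetical test scores
print(five_number_summary(scores))
print(find_outliers(scores))
```

Here Q1 = 7 and Q3 = 13, so the IQR is 6 and the upper limit is 13 + 1.5 × 6 = 22; the score of 30 is therefore flagged as an outlier.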

Topic 3.11 — Histograms (Equal & Unequal Width) & Frequency Density

Histograms are used to represent continuous data and, unlike bar charts, have no gaps between the bars to reflect the uninterrupted nature of the variable. For equal-width histograms, the frequency is plotted on the vertical axis, and the height of each bar directly represents the frequency. However, when dealing with unequal class widths, plotting frequency on the y-axis would be misleading because the area of the bars would no longer be proportional to the frequency. To solve this, we use frequency density.

Frequency Density is defined by the formula: Frequency Density = Frequency ÷ Class Width. The frequency of a class can be found by calculating the area of the bar: Frequency = Frequency Density × Class Width. On a histogram with unequal widths, the vertical axis is labelled "Frequency Density," and it is the area of the bar, not its height, that corresponds to the frequency. One of the most common and costly errors is using frequency on the y-axis for unequal-width intervals, which creates a distorted representation of the data.
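A minimal sketch of both formulas, using hypothetical classes of unequal width; the helper names are hypothetical:

```python
# Hypothetical classes with unequal widths: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 20), (10, 15, 15), (15, 30, 30)]

def frequency_density(frequency, class_width):
    """Frequency density = frequency / class width (the bar height on the histogram)."""
    return frequency / class_width

def frequency_from_bar(density, class_width):
    """Frequency = frequency density x class width (the area of the bar)."""
    return density * class_width

bars = [(lower, upper, frequency_density(freq, upper - lower)) for lower, upper, freq in classes]
```

The 15 to 30 class has the largest frequency (30) but a lower bar than the 10 to 15 class, because its frequency is spread over a width of 15. Plotting frequency instead would make the widest class look far more important than it is.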

Topic 3.12 — Population Pyramids & Choropleth Maps

Population pyramids and choropleth maps are specialised visualisation tools for demographic and geographic data. A population pyramid is a type of bar chart that shows the distribution of various age groups in a population. It typically has the percentage or number of males on the left side and females on the right side, with age groups arranged in cohorts along the central vertical axis. The shape of a population pyramid reveals a great deal about a region's demographics. A wide base indicates high birth rates and a young population, while a narrow base and bulging top suggest an ageing population with low birth rates.

A choropleth map uses shading or colour to represent data values across different geographical regions. Darker shades typically represent higher values, and lighter shades represent lower values, with a clear key or scale explaining the correspondence. Choropleth maps are excellent for showing regional patterns at a glance, such as crime rates by county or election results by constituency. However, they have a significant limitation: each region is shaded in a single colour, which implies that the value is the same throughout the region and hides any variation within it. When interpreting these diagrams, you must be able to describe their features, draw conclusions about the underlying population or geography, and critique their limitations.

Topic 3.13 — Graphical Misrepresentation (Misleading Graphs)

Graphical misrepresentation, or the use of misleading graphs, is a critical topic because it develops a student's ability to think critically about visual information. One of the most common techniques of distortion is the truncated y-axis, where the vertical axis does not start at zero. This exaggerates the differences between values, making a small increase look like a dramatic surge. Other misleading techniques include using inconsistent scale intervals on the axes, which can distort the perceived rate of change, and using 3D effects or distorted imagery where the area or volume of the shape increases more than the value it represents.
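The size of these distortions can be quantified. The sketch below, with hypothetical figures and helper names, compares how much taller one bar appears with and without a truncated axis, and shows how enlarging a picture in both directions affects its area.

```python
def apparent_ratio(smaller, larger, axis_start=0):
    """How many times taller the larger bar looks when the y-axis starts at axis_start."""
    return (larger - axis_start) / (smaller - axis_start)

def area_factor(scale_factor):
    """Area multiplier when a picture is enlarged by scale_factor in both width and height."""
    return scale_factor ** 2

# Hypothetical sales figures: 100 one year, 110 the next (a 10% increase)
honest = apparent_ratio(100, 110)                    # axis starts at zero
truncated = apparent_ratio(100, 110, axis_start=95)  # axis starts at 95
doubled_picture = area_factor(2)                     # picture doubled in width and height
```

With the axis starting at 95, the second bar is drawn three times as tall as the first, even though the value rose by only 10%. In the same way, doubling both the width and the height of a picture to show a doubled value makes it look four times as large.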

Another common error is omitting labels, titles, or units, which leaves the reader without the context needed to interpret the data correctly. At the Higher tier, students must also be able to spot histograms with unequal class widths that plot frequency, rather than frequency density, on the vertical axis. When asked to critique a graph, it is not enough to say it is "misleading." You must identify the specific graphical feature and explain how it distorts the true picture.

Topic 3.14 — Choosing & Justifying the Right Chart/Graph

The ability to select and justify the most appropriate diagram for a given dataset is a high-level skill. The choice of chart depends entirely on the type of data and the purpose of the visualisation. For categorical data, where we are showing separate groups (e.g., favourite colour), a bar chart, pie chart, or pictogram is appropriate. A bar chart is best for comparing the sizes of categories, a pie chart for showing the proportion of a whole, and a pictogram for engaging a non-technical audience. For continuous grouped data, such as height or weight, a histogram or frequency polygon is usually the best way to show the shape of the distribution.

When investigating the relationship between two quantitative variables, a scatter graph is the appropriate choice, as it reveals any correlation. To show change over time, a time series or line graph is used. When comparing two distributions directly, back-to-back stem and leaf diagrams or comparative box plots are superior because they allow for the comparison of both average and spread. In examinations, the command word "justify" requires you to explain your choice by linking the properties of the graph to the nature of the data. For example: "I would use a histogram because the data is continuous and grouped into unequal class widths, so frequency density is needed to represent the distribution fairly."

Frequently Asked Questions

What is frequency density and why is it used?

Frequency density = Frequency ÷ Class Width. It is used in histograms with unequal class widths to ensure that the AREA of the bar, rather than its height, represents the frequency. This prevents the graph from being misleading.

Where should points be plotted on a cumulative frequency graph?

Points on a cumulative frequency graph must ALWAYS be plotted at the upper class boundary of the interval. Plotting at the midpoint is a common mistake.

What is the difference between a bar chart and a histogram?

Bar charts have gaps between bars and are used for categorical or discrete data. Histograms have no gaps and are used for continuous data. In histograms, it is the area of the bars that represents frequency.

What makes a graph misleading?

Common misleading features include a truncated y-axis (not starting at zero), inconsistent scales, 3D effects that distort area, and missing labels or keys.

Chapter 3: Data Representation & Visualisation