5 Examples of GROUP BY
When you start learning SQL, you quickly come across the GROUP BY clause. Data grouping—or data aggregation—is an important concept in the world of databases. In this article, we’ll demonstrate how you can use the GROUP BY clause in practice. We’ve gathered five GROUP BY examples, from easier to more complex ones so you can see data grouping in a real-life scenario. As a bonus, you’ll also learn a bit about aggregate functions and the HAVING clause.
SQL is a universal language to talk to databases that has been around for almost 50 years. If you’re a complete beginner, consider taking our SQL Basics course before reading this article.
One of the core concepts behind SQL is data grouping, or data aggregation. If you’re reading this article, you’ve probably already heard about the GROUP BY
clause. To help you understand it better, we’ve presented five business problems and showed how they can be solved in our GROUP BY
examples.
If you need a quick introduction to GROUP BY
, watch our five-minute YouTube video.
In this article, we’re going to help an imaginary museum analyze their guests. We’ll use the history of visits to the museum to derive meaningful insights using the GROUP BY
clause. Let’s get down to work, then!
Mục Lục
Input Data
We’re going to work with a single table named visit
. Each row represents a single visit to the museum. Below, you can see a few sample rows from this table:
visit
datepriceduration
2020-05-01215
2020-05-01437
2020-05-06724
…
As you can see, the table is not very complicated. It only contains three columns:
- date—The date of the visit to the museum.
- price—The price paid for the ticket in dollars (you will see a variety of prices because the museum uses different pricing options on various days along with many types of reduced tickets).
- duration—The duration of the visit to the museum in minutes.
By the way, if you already know basic SQL statements and want to try creating tables instead, take a look at this LearnSQL.com Table Basics course.
Why Do We Group Rows?
We know that we can aggregate (group) rows in SQL, but why do we do that? The GROUP BY
clause is typically used alongside aggregate functions, which compute various statistics about the groups of rows. The five most basic aggregate functions in SQL are:
COUNT()
—Used to count the number of rows.AVG()
—Used to find the average value.MIN()
andMAX()
—Used to find the minimum and maximum value, respectively.SUM()
—Used to find the sum of all values.
In short, we group rows to compute various statistics.
GROUP BY Examples
Good. Now that we know a bit about aggregate functions, let’s take a look at five GROUP BY examples.
Example 1: GROUP BY With One Column
We’ll start with a simple example. We want to find out how many people visited the museum on each day. In other words, for each date, we’ll show the number of visits to the museum. The query we need will look like this:
SELECT date, COUNT(*) FROM visit GROUP BY date;
We only have two columns: date and count. COUNT(*) means “count everything.” Because we also use the date column in the GROUP BY clause, we’ll see a separate count for each date. When we run the query in our database, we should see something like this:
datecount
2020-06-297
2020-05-236
2020-06-235
…
Excellent. We now know how many people visited the museum on each day.
Instead of COUNT(*)
, which means “count every row,” we could also use a column name inside, e.g., COUNT(duration)
. The difference is that COUNT(*)
counts all rows for a given group even if some rows contain NULL (unknown) values. COUNT(duration)
, on the other hand, only counts those rows that have a non-NULL value in the duration
column. In our table, however, there are no NULL values, so the result would be the same.
Example 2: GROUP BY With Two Columns
We now want to know the average price paid for a ticket in a given month. For this, we’ll need a more complicated query. Take a look:
SELECT EXTRACT(YEAR FROM date) AS year, EXTRACT(MONTH FROM date) AS month, ROUND(AVG(price), 2) AS avg_price FROM visit GROUP BY EXTRACT(YEAR FROM date), EXTRACT(MONTH FROM date);
We used the EXTRACT(YEAR FROM date)
function to get the year from each date, and we used EXTRACT(MONTH FROM date)
to get the month (as a numerical value, where “1” means “January,” “2” means “February,” etc.). Note that they are treated as separate columns, so you’ll see the year in one column and the month in another one.
Because we want to see the average for each month of each year separately, we also need to group by these two columns. We have to repeat the same functions in the GROUP BY
clause.
The third column is a combination of two functions. Inside, we’ve got AVG(price)
, which will calculate the average price in each group. We also have ROUND(AVG(price), 2))
to round the average value to two decimal places.
When you use a GROUP BY
clause, try to remember the golden rule: All column names from the SELECT
clause should either appear in the GROUP BY
clause or be used in the aggregate functions. In this case, both EXTRACT(YEAR FROM date)
and EXTRACT(MONTH FROM date)
should appear in the GROUP BY
clause. If you forget about one of them, you’ll probably see an error. The third column uses an aggregate function, AVG(price)
, so it’s not mentioned in the GROUP BY
clause.
There are some exceptions to this rule, and they may lead to unexpected behavior.
When we run the query, we’ll see something like this:
yearmonthavg_price
202057.52
202066.70
As you can see, the average ticket price decreased in June, compared to May. This could translate into lower income for the museum.
Example 3: GROUP BY and ORDER BY
This time, we want to find the average visit duration value for each month. We also want to make sure that the rows are sorted chronologically. The query we’ll need will be similar to the previous example:
SELECT EXTRACT(YEAR FROM date) AS year, EXTRACT(MONTH FROM date) AS month, ROUND(AVG(duration), 2) FROM visit GROUP BY EXTRACT(YEAR FROM date), EXTRACT(MONTH FROM date) ORDER BY EXTRACT(YEAR FROM date), EXTRACT(MONTH FROM date);
The new piece here is the ORDER BY
clause. An ORDER BY
clause lets us specify the order in which we should see the rows. In this case, we want to see all rows sorted first by the year, and then by the month.
Again, we need to repeat the same functions from the SELECT
clause for the ORDER BY
clause to work. By default, ORDER BY
sorts the rows in the ascending order.
If you want to see rows sorted in descending order, you need to add the DESC
keyword after the column name. For example, write ORDER BY EXTRACT(YEAR FROM date) DESC
. You can read more about the difference between GROUP BY
and ORDER BY
in this article.
When we run the query, we’ll see something like this:
yearmonthavg_duration
2020547.61
2020651.33
On average, a guest spent more time in the museum in June than in May. That’s good news!
Example 4: GROUP BY and HAVING
Now, we have the following problem: we want to see the average ticket price for each day. However, there’s an extra condition: we don’t want to show days with 3 or fewer visits. This condition translates into a new piece in our SQL query. Take a look:
SELECT date, ROUND(AVG(price), 2) AS avg_price FROM visit GROUP BY date HAVING COUNT(*) > 3 ORDER BY date;
The new part here is HAVING COUNT(*) > 3
. HAVING
is a clause we can use to filter on the grouped rows. In this case, we group rows by the date (GROUP BY date
). When we do that, we want to make sure that a given group has more than three rows (HAVING COUNT(*) > 3
). If a group (in this case, visits on a given day) doesn’t fulfill this condition, we don’t show it at all.
When we run the query, we’ll see something like this:
dateavg_price
2020-05-015.80
2020-05-157.00
2020-05-236.67
…
Example 5: GROUP BY, HAVING, and WHERE
Finally, we have the following problem to solve: we want to show the average visit duration for each day. Again, we only want to show days with more than three visits. However, we also want to make sure that visits which are five minutes long or shorter are not included in the calculations. These are most probably tests performed by the museum employees, so we want to ignore them. This is the query we’ll need:
SELECT date, ROUND(AVG(duration), 2) AS avg_duration FROM visit WHERE duration > 5 GROUP BY date HAVING COUNT(*) > 3 ORDER BY date;
The new part here is the WHERE
clause. It is used to only include visits that lasted more than five minutes.
The WHERE
and HAVING
clauses may look similar, but there is a difference between them: WHERE
is used to filter single rows before they are grouped (i.e., single visits) while HAVING
is used to filter groups of rows (i.e., visits on a given day). Read more in this article.
When we run the query, we’ll see something like this:
dateavg_duration
2020-05-0129.80
2020-05-1555.75
2020-05-2332.17
2020-05-2969.50
2020-06-0239.83
2020-06-0448.67
2020-06-0948.50
2020-06-2351.60
2020-06-2957.86
Note how the average visit duration increases with almost every day in June. It looks like we added aninteresting exhibition and our guests spread the word about it.
Summary and Follow-Up
With these five examples, we went from easy to more complex GROUP BY
cases. The versatility of SQL allowed us to analyze the museum visits and answer several questions about them. This shows how effectively GROUP BY
can solve real business problems.
If you’d like to read more about the GROUP BY clause, our Chief Content Editor, Agnieszka, prepared a comprehensive article available here.
If you feel like learning more SQL, have a look at LearnSQL.com. LearnSQL.com team teaches SQL from scratch in a completely interactive manner.
For beginners, we have our best-selling SQL Basics course. We guarantee a smooth entry into the world of coding for people with no prior experience in IT. You won’t need to worry about the technical setup—you will study straight from your web browser. We take care of the database while you focus on the key concepts of SQL.
If you want to know more about the benefits of learning with us before you purchase a subscription, take a look at our article: Why Take the SQL Basics Course at LearnSQL.com. This article explains in detail why we built the course, what it contains, and the philosophy behind it.