How to Work with the SQL GROUP BY Clause | 365 Data Science
The SQL GROUP BY clause could seem intuitive at the beginning, but understanding it is a must when writing more advanced SQL queries. Mastering this feature will make your life easier; without it, you can end up in deep trouble. So, please be very careful!
Side note: If you want to learn another feature that will make it easier for you to code, you can check the tutorial on the ORDER BY clause.
Mục Lục
How to Group Results in SQL?
When working in SQL, results can be grouped according to a specific field or fields. Technically, this is done by using the SQL GROUP BY clause, which must be placed immediately after the WHERE conditions, if any, and just before the ORDER BY clause. Remember SQL GROUP BY is one of its most powerful and useful tools!
The syntax to comply with is:
Important: The GROUP BY clause is located just above the ORDER BY clause. An Example:
First, what will happen if we select the “first_name” column from the “employees” table? I am sure you can tell that we will obtain an unsorted list of the names registered in the database.
Side note: If you haven’t downloaded the ‘employees’ database, you can see how to do it in the linked tutorial.
Let’s check what’s going to happen when we GROUP the output BY first name.
SELECT
first_name
FROM
employees
GROUP BY first_name;
The result will be shorter, as only distinct values will be selected. If a value within the “first_name” field appears in the table more than once, by using GROUP BY, it will be displayed only in a single row.
An Alternative
Practically speaking, this is the same as if we used the well-known SELECT DISTINCT structure. The only difference is, this time, the selected rows were sorted in a different way, as shown in the picture below.
In addition, GROUP BY was slower than SELECT DISTINCT. Therefore, why would we need this GROUP BY statement at all?
Why We Need the SQL GROUP BY Clause?
Knowing how to use a SQL GROUP BY statement whenever you have aggregate functions is essential. In most cases, when you need an aggregate function, you must add a GROUP BY clause in your query too. Assume you need a list composed of two fields. The first must contain a distinct first name of the employee and the second – the number of times this name is encountered in our database.
Using it with the COUNT() Function
The SELECT COUNT() function will do the job just fine. In order to get the total value of records in this table, we can type:
SELECT
COUNT(first_name)
FROM
employees
Then, to split the result returned from the SELECT statement into groups, we can type.
GROUP BY (first_name);
In the column from the picture above, we see the number of times each name is encountered. But we don’t see the names these values refer to.
Here is a rule of thumb professionals strictly comply with:
Important: Always include the field you have grouped your results by in the SELECT statement!
Let’s do that. We can insert “first_name” just before the COUNT part. Re-run to obtain a clear output with a straightforward interpretation!
Aamer can be seen 228 times, Aamod – 216, and so on. Avoiding Mistakes
This rule is crucial because, in Workbench, as you just saw, the query would run properly if you don’t include the grouped-by field in the SELECT statement, but this will not be valid in some other databases. There, it will be impossible to execute the query if written without the group-by column in the SELECT statement. So, please stick to this simple rule. It also improves the organization and readability of your output. Make this a habit from now on, and you will avoid facing messy situations when coding.
How to Combine SQL GROUP BY with SQL ORDER BY?
Imagine you wanted to sort the entire outcome by first name in descending order. The code will look like this:
SELECT
first_name, COUNT(first_name)
FROM
employees
GROUP BY first_name
ORDER BY first_name DESC;
This last piece of information was an important addition to the content of this tutorial. Not all blocks of code are mandatory, but you must get used to the order in which you state these blocks in the query. Remember the following logical flow:
SELECT something FROM a certain table WHERE certain conditions are met. GROUP the results BY a column and, possibly, ORDER them in a certain direction.
The Problems with the Output Section
We retrieved all distinct first names from the “employees” table and their corresponding amount.
Please, take a look at the output section.
Observe how the columns bear the names of the fields they have been extracted from and the function applied – the second column is called “COUNT(first_name)”.
There are two problems with the name of that second column:
- It is not professional to leave the function keyword in the column names of the output.
- It would be much better to see a name that explains the reasoning behind using this aggregate function.
The Solution
Both problems can be solved using aliases.
In our example, the first_name is one selection, and the counted number of first names in each group is another selection.
By using the keyword AS, you could rename a selection.
How to Use Aliases
Therefore, if you rewrite the code in the following way:
SELECT
first_name, COUNT(first_name) AS names_count
FROM
employees
GROUP BY first_name
ORDER BY first_name;
When you finally execute the code, you will see the second column name in your result to be renamed as “names_count”.
Well, if someone, who does not have a clue about your analysis observes this output now, they could immediately understand that the left-hand column contains the names of some people, while the right-hand column delivers information about the number of times each name appears in our records.
What Are the Benefits of Grouping Data?
In conclusion, we didn’t say that the SQL GROUP BY clause is one of the most powerful tools out there for no reason. It is preferred over the SELECT DISTINCT statement because it can be combined with aggregate functions. You can also use it with SQL ORDER BY. However, you must make sure that you keep the right order when writing it. Moreover, we found out why aliases can be helpful. So, using them does not actually change your output. But this technique is constantly being applied in queries, as it clarifies the analysis undertaken. If you are interested in learning about what else you can combine with the GROUP BY statement, you can learn all about the HAVING clause.
***
Eager to hone your SQL skills? Take our SQL course.
Next Tutorial: Whether To Use WHERE Or HAVING