How To Use GROUP BY and ORDER BY in SQL | DigitalOcean

Structured Query Language (SQL) databases can store and manage a lot of data across numerous tables. With large data sets, it’s important to understand how to sort data, especially for analyzing result sets or organizing data for reports or external communications.

Two common statements in SQL that help with sorting your data are GROUP BY and ORDER BY. A GROUP BY statement sorts data by grouping it based on column(s) you specify in the query and is used with aggregate functions. An ORDER BY allows you to organize result sets alphabetically or numerically and in ascending or descending order.

In this tutorial, you will sort query results in SQL using the GROUP BY and ORDER BY statements. You’ll also practice implementing aggregate functions and the WHERE clause in your queries to sort the results even further.

To follow this guide, you will need a computer running some type of relational database management system (RDBMS) that uses SQL. The instructions and examples in this tutorial were validated using the following environment:

  • A server running Ubuntu 20.04, with a non-root user with sudo administrative privileges and firewall enabled. Follow our Initial Server Setup with Ubuntu 20.04 to get started.
  • MySQL installed and secured on the server. Follow our How To Install MySQL on Ubuntu 20.04 guide to set this up. This guide assumes you’ve also set up a non-root MySQL user, as outlined in Step 3 of this guide.

Note: Please note that many relational database management systems use their own unique implementations of SQL. Although the commands outlined in this tutorial will work on most RDBMSs, the exact syntax or output may differ if you test them on a system other than MySQL.

To practice sorting data results in this tutorial, you’ll need a database and table loaded with sample data. If you do not have one ready to insert, you can read the following Connecting to MySQL and Setting up a Sample Database section to learn how to create a database and table. This tutorial will refer to this sample database and table throughout.

If your SQL database runs on a remote server, SSH into your server from your local machine:

  1. ssh

    sammy@your_server_ip

Next, open the MySQL prompt, replacing sammy with your MySQL user account information:

  1. mysql

    -u

    sammy

    -p

Create a database named movieDB:

  1. CREATE DATABASE movieDB

    ;

If the database was created successfully, you’ll receive the following output:

Output

Query OK, 1 row affected (0.01 sec)

To select the movieDB database run the following USE statement:

  1. USE movieDB

    ;

Output

Database changed

After selecting the database, create a table within it. For this tutorial’s example, we’ll create a table that stores information about a local movie theater’s showings. This table will hold the following seven columns:

  • theater_id: stores values of the int data type for each theater’s showing rooms, and will serve as the table’s primary key, meaning each value in this column will function as a unique identifier for its respective row.
  • date: uses the DATE data type to store the specific date by the year, month, and day a movie was shown. This data type adheres to the following parameters: four digits for the year, and a maximum of two digits for the month and day (YYYY-MM-DD).
  • time: represents the movie’s scheduled showing with the TIME data type by hours, minutes, and seconds (HH:MM:SS).
  • movie_name: stores the movie’s name using the varchar data type with a maximum of 40 characters.
  • movie_genre: uses the varchar data type with a maximum of 30 characters, to hold information on each movie’s respective genre.
  • guest_total: shows the total number of guests that attended a movie showing with the int data type.
  • ticket_cost: uses the decimal data type, with a precision of four and a scale of one, meaning values in this column can have four digits, and two digits to the right of the decimal point. This column represents the ticket cost for the specific movie showing.

Create a table named movie_theater that contains each of these columns by running the following CREATE TABLE command:

  1. CREATE TABLE movie_theater

    (

  2. theater_id int,
  3. date

    DATE,
  4. time

    TIME,
  5. movie_name varchar

    (

    40

    )

    ,
  6. movie_genre varchar

    (

    30

    )

    ,
  7. guest_total int,
  8. ticket_cost decimal

    (

    4,2

    )

    ,
  9. PRIMARY KEY

    (

    theater_id

    )

  10. )

    ;

Next, insert some sample data into the empty table:

  1. INSERT INTO movie_theater
  2. (

    theater_id, date, time, movie_name, movie_genre, guest_total, ticket_cost

    )

  3. VALUES
  4. (

    1

    ,

    '2022-05-27'

    ,

    '10:00:00'

    ,

    'Top Gun Maverick'

    ,

    'Action'

    ,

    131

    ,

    18.00

    )

    ,
  5. (

    2

    ,

    '2022-05-27'

    ,

    '10:00:00'

    ,

    'Downton Abbey A New Era'

    ,

    'Drama'

    ,

    90

    ,

    18.00

    )

    ,
  6. (

    3

    ,

    '2022-05-27'

    ,

    '10:00:00'

    ,

    'Men'

    ,

    'Horror'

    ,

    100

    ,

    18.00

    )

    ,
  7. (

    4

    ,

    '2022-05-27'

    ,

    '10:00:00'

    ,

    'The Bad Guys'

    ,

    'Animation'

    ,

    83

    ,

    18.00

    )

    ,
  8. (

    5

    ,

    '2022-05-28'

    ,

    '09:00:00'

    ,

    'Top Gun Maverick'

    ,

    'Action'

    ,

    112

    ,

    8.00

    )

    ,
  9. (

    6

    ,

    '2022-05-28'

    ,

    '09:00:00'

    ,

    'Downton Abbey A New Era'

    ,

    'Drama'

    ,

    137

    ,

    8.00

    )

    ,
  10. (

    7

    ,

    '2022-05-28'

    ,

    '09:00:00'

    ,

    'Men'

    ,

    'Horror'

    ,

    25

    ,

    8.00

    )

    ,
  11. (

    8

    ,

    '2022-05-28'

    ,

    '09:00:00'

    ,

    'The Bad Guys'

    ,

    'Animation'

    ,

    142

    ,

    8.00

    )

    ,
  12. (

    9

    ,

    '2022-05-28'

    ,

    '05:00:00'

    ,

    'Top Gun Maverick'

    ,

    'Action'

    ,

    150

    ,

    13.00

    )

    ,
  13. (

    10

    ,

    '2022-05-28'

    ,

    '05:00:00'

    ,

    'Downton Abbey A New Era'

    ,

    'Drama'

    ,

    118

    ,

    13.00

    )

    ,
  14. (

    11

    ,

    '2022-05-28'

    ,

    '05:00:00'

    ,

    'Men'

    ,

    'Horror'

    ,

    88

    ,

    13.00

    )

    ,
  15. (

    12

    ,

    '2022-05-28'

    ,

    '05:00:00'

    ,

    'The Bad Guys'

    ,

    'Animation'

    ,

    130

    ,

    13.00

    )

    ;

Output

Query OK, 12 rows affected (0.00 sec) Records: 12 Duplicates: 0 Warnings: 0

Once you’ve inserted the data, you’re ready to start sorting query results in SQL.

The function of a GROUP BY statement is to group records with shared values. A GROUP BY statement is always used with an aggregate function in a query. As you may recall, an aggregate function summarizes information and returns a single result. For instance, you can query for the total count or sum of a column and this will produce a single value in your result. With a GROUP BY clause, you can implement the aggregate function to get one result value for each group you desire.

GROUP BY is useful for returning multiple desired results sorted by your specified group(s), rather than solely one column. Additionally, GROUP BY must always come after the FROM statement and the WHERE clause, if you choose to use one. Here’s an example of how a query with a GROUP BY and aggregate function is structured:

GROUP BY syntax

SELECT column_1, AGGREGATE_FUNCTION(column_2) FROM table GROUP BY column_1;

To illustrate how you can use GROUP BY statements, say you’re leading the campaign for several movie releases, and you want to evaluate the success of your marketing efforts. You ask a local theater to share the data they collected from guests on Friday and Saturday. Start by reviewing the data by running SELECT and the * symbol to select “every column” from the movie_theater table:

  1. SELECT * FROM movie_theater

    ;

Output

+------------+------------+----------+-------------------------+-------------+-------------+-------------+ | theater_id | date | time | movie_name | movie_genre | guest_total | ticket_cost | +------------+------------+----------+-------------------------+-------------+-------------+-------------+ | 1 | 2022-05-27 | 10:00:00 | Top Gun Maverick | Action | 131 | 18.00 | | 2 | 2022-05-27 | 10:00:00 | Downton Abbey A New Era | Drama | 90 | 18.00 | | 3 | 2022-05-27 | 10:00:00 | Men | Horror | 100 | 18.00 | | 4 | 2022-05-27 | 10:00:00 | The Bad Guys | Animation | 83 | 18.00 | | 5 | 2022-05-28 | 09:00:00 | Top Gun Maverick | Action | 112 | 8.00 | | 6 | 2022-05-28 | 09:00:00 | Downton Abbey A New Era | Drama | 137 | 8.00 | | 7 | 2022-05-28 | 09:00:00 | Men | Horror | 25 | 8.00 | | 8 | 2022-05-28 | 09:00:00 | The Bad Guys | Animation | 142 | 8.00 | | 9 | 2022-05-28 | 05:00:00 | Top Gun Maverick | Action | 150 | 13.00 | | 10 | 2022-05-28 | 05:00:00 | Downton Abbey A New Era | Drama | 118 | 13.00 | | 11 | 2022-05-28 | 05:00:00 | Men | Horror | 88 | 13.00 | | 12 | 2022-05-28 | 05:00:00 | The Bad Guys | Animation | 130 | 13.00 | +------------+------------+----------+-------------------------+-------------+-------------+-------------+ 12 rows in set (0.00 sec)

While this data is helpful, you want to perform a deeper assessment and sort the results for some specific columns.

Since you worked on movies across a few different genres, you’re interested in knowing how well-received they were by movie-goers. Specifically, you want to know the average amount of people that watched each movie genre. Use SELECT to retrieve the various types of movies from the movie_genre column. Then apply the aggregate function AVG on the guest_total column, use AS to create an alias for a column called average, and include the GROUP BY statement to group results by movie_genre. Grouping them this way will provide you with the average results for each movie genre:

  1. SELECT movie_genre, AVG

    (

    guest_total

    )

    AS average
  2. FROM movie_theater
  3. GROUP BY movie_genre

    ;

Output

+-------------+----------+ | movie_genre | average | +-------------+----------+ | Action | 131.0000 | | Drama | 115.0000 | | Horror | 71.0000 | | Animation | 118.3333 | +-------------+----------+ 4 rows in set (0.00 sec)

This output provides the four averages for each genre within the movie_genre group. Based on this information, Action movies attracted the highest average number of guests per showing.

Next, let’s say you want to measure the theater’s revenues over two separate days. The following query returns values from the date column, as well as values returned by the SUM aggregate function. Specifically, the aggregate function SUM will enclose a mathematical equation in parentheses to multiply (using the * operator) the number of total guests by the cost of a ticket, represented as: SUM(guest_total * ticket_cost). This query includes the AS clause to provide the alias total_revenue for the column returned by the aggregate function. Then complete the query with the GROUP BY statement to group the query results by the date column:

  1. SELECT date, SUM

    (

    guest_total * ticket_cost

    )

  2. AS total_revenue
  3. FROM movie_theater
  4. GROUP BY

    date

    ;

Output

+------------+---------------+ | date | total_revenue | +------------+---------------+ | 2022-05-27 | 7272.00 | | 2022-05-28 | 9646.00 | +------------+---------------+ 2 rows in set (0.00 sec)

Since you used GROUP BY to group the date column, your output provides the results for the total revenue in ticket sales for each day, in this case, $7,272 for Friday, May 27, and $9,646 for Saturday, May, 28.

Now imagine you want to focus on and analyze one movie: The Bad Guys. In this scenario, you want to figure out how timing and price points impact a family’s choice to watch an animated film. For this query use the aggregate function MAX to retrieve the maximum ticket_cost, making sure to include AS to create the alias for the price_data column. After, use the WHERE clause to narrow down the results by movie_name to solely “The Bad Guys”, and use AND to also determine the most popular movie times based on guest_total numbers that were more than 100 with the comparison operator >. Then complete the query with the GROUP BY statement and group it by time:

  1. SELECT time, MAX

    (

    ticket_cost

    )

    AS price_data
  2. FROM movie_theater
  3. WHERE movie_name

    =

    "The Bad Guys"

  4. AND guest_total

    >

    100

  5. GROUP BY

    time

    ;

Output

+----------+------------+ | time | price_data | +----------+------------+ | 09:00:00 | 8.00 | | 05:00:00 | 13.00 | +----------+------------+ 2 rows in set (0.00 sec)

According to this output, more guests attended The Bad Guys movie at the early matinee time of 9:00 am, which had the more affordable price point of $8.00 per ticket. However, these results also show movie guests paid the higher ticket price of $13.00 at 5:00 pm, suggesting that families prefer showings that aren’t too late in the day and will pay a bit more for a ticket. This seems to be a fair assessment when compared to the 10:00 pm time when The Bad Guys movie only had 83 guests and the price per ticket was $18.00. This can be helpful information to provide the movie theater manager with evidence that opening more matinee and early evening time slots can increase the attendance for families that are making a choice based on a preferred time and price point.

Please note that even though GROUP BY is almost always used with an aggregate function, there can be exceptions, although unlikely. However, if you did want to group your results without an aggregate function, you can use the DISTINCT statement to achieve the same result. A DISTINCT clause removes any duplicates in a result set by returning the unique values in the column, and it can only be used with a SELECT statement. For example, if you wanted to group all the movies together by name, you could do so with the following query:

  1. SELECT DISTINCT movie_name FROM movie_theater

    ;

Output

+-------------------------+ | movie_name | +-------------------------+ | Top Gun Maverick | | Downton Abbey A New Era | | Men | | The Bad Guys | +-------------------------+ 4 rows in set (0.00 sec)

As you recall from viewing all the data in the table, there were duplicates of the movie names since there were multiple showings. Therefore, DISTINCT removed those duplicates and effectively grouped the unique values under the single column movie_name. This is effectively identical to the following query, which includes a GROUP BY statement:

  1. SELECT movie_name FROM movie_theater GROUP BY movie_name

    ;

Now that you’ve practiced using GROUP BY with aggregate functions, next you’ll learn how to sort your query results with the ORDER BY statement.

The function of the ORDER BY statement is to sort results in ascending or descending order based on the column(s) you specify in the query. Depending on the data type stored by the column you specify after it, ORDER BY will organize them in alphabetical or numerical order. By default, ORDER BY will sort results in ascending order; if you prefer descending order, however, you have to include the keyword DESC in your query. You can also use the ORDER BY statement with GROUP BY, but it must come after in order to function properly. Similar to GROUP BY, ORDER BY must also come after the FROM statement and WHERE clause. The general syntax for using ORDER BY is as follows:

ORDER BY syntax

SELECT column_1, column_2 FROM table ORDER BY column_1;

Let’s continue with the sample data for the movie theater and practice sorting results with ORDER BY. Begin with the following query which retrieves values from the guest_total column and organizes those numerical values with an ORDER BY statement:

  1. SELECT guest_total FROM movie_theater
  2. ORDER BY guest_total

    ;

Output

+-------------+ | guest_total | +-------------+ | 25 | | 83 | | 88 | | 90 | | 100 | | 112 | | 118 | | 130 | | 131 | | 137 | | 142 | | 150 | +-------------+ 12 rows in set (0.00 sec)

Since your query specified a column with numerical values, the ORDER BY statement organized the results by numerical and ascending order, starting with 25 under the guest_total column.

If you preferred to order the column in descending order, you would add the DESC keyword at the end of the query. Additionally, if you wanted to order the data by the character values under movie_name, you would specify that in your query. Let’s perform that type of query using ORDER BY to order the movie_name column with character values in descending order. Sort the results even further by including a WHERE clause to retrieve the data on movies showing at 10:00 pm from the time column:

  1. SELECT movie_name FROM movie_theater
  2. WHERE

    time

    =

    '10:00:00'

  3. ORDER BY movie_name DESC

    ;

Output

+-------------------------+ | movie_name | +-------------------------+ | Top Gun Maverick | | The Bad Guys | | Men | | Downton Abbey A New Era | +-------------------------+ 4 rows in set (0.01 sec)

This result set lists the four different movie showings at 10:00 pm in descending alphabetical order, starting from Top Gun Maverick to Downtown Abbey A New Era.

For this next query, combine the ORDER BY and GROUP BY statements with the aggregate function SUM to generate results on the total revenue received for each movie. However, let’s say the movie theater miscounted the total guests and forgot to include special parties that had pre-purchased and reserved tickets for a group of 12 people at each showing.

In this query use SUM and include the additional 12 guests at each movie showing by implementing the operator for addition + and then adding 12 to the guest_total. Make sure to enclose this in parenthesis. Then, multiply this total by the ticket_cost with the operator *, and complete the mathematical equation by closing the parenthesis at the end. Add the AS clause to create the alias for the new column titled total_revenue. Then, use GROUP BY to group total_revenue results for each movie based on the data retrieved from the movie_name column. Lastly, use ORDER BY to organize the results under the new column total_revenue in ascending order:

  1. SELECT movie_name, SUM

    ((

    guest_total +

    12

    )

    * ticket_cost

    )

  2. AS total_revenue
  3. FROM movie_theater
  4. GROUP BY movie_name
  5. ORDER BY total_revenue

    ;

Output

+-------------------------+---------------+ | movie_name | total_revenue | +-------------------------+---------------+ | Men | 3612.00 | | Downton Abbey A New Era | 4718.00 | | The Bad Guys | 4788.00 | | Top Gun Maverick | 5672.00 | +-------------------------+---------------+ 4 rows in set (0.00 sec)

This result set tells us the total revenue for each movie with the additional 12 guest ticket sales and organizes the total ticket sales in ascending order from lowest to highest. From this, we learn that Top Gun Maverick received the most ticket sales, while Men received the least. Meanwhile, The Bad Guys and Downton Abbey A New Era movies were very close in total ticket sales.

In this section, you practiced various ways to implement the ORDER BY statement and how to specify the order you prefer, such as ascending and descending orders for both character and numerical data values. You also learned how to include the WHERE clause to narrow down your results, and performed a query using both the GROUP BY and ORDER BY statements with an aggregate function and mathematical equation.

Understanding how to use GROUP BY and ORDER BY statements are important for sorting your results and data. Whether you want to organize multiple results under one group, organize one of your columns in alphabetical and descending order, or do both simultaneously; it’s up to you and your desired result(s). You also learned about other ways to sort your results even further with the WHERE clause. If you want to learn more, check out our tutorial on How To Use Wildcards in SQL to practice filtering results with the LIKE clause.

Alternate Text Gọi ngay