SQL for Beginners (Part 7) : The GROUP BY Clause and HAVING Clause

This is the seventh part of a series of articles showing the basics of SQL. In this article we take a look at the GROUP BY clause and HAVING clause in SQL.

Setup

You can perform all these queries online for free using SQL Fiddle.

The examples in this article require the following tables to be present.

--DROP TABLE employees PURGE;
--DROP TABLE departments PURGE;

CREATE TABLE departments (
  department_id   NUMBER(2) CONSTRAINT departments_pk PRIMARY KEY,
  department_name VARCHAR2(14),
  location        VARCHAR2(13)
);

INSERT INTO departments VALUES (10,'ACCOUNTING','NEW YORK');
INSERT INTO departments VALUES (20,'RESEARCH','DALLAS');
INSERT INTO departments VALUES (30,'SALES','CHICAGO');
INSERT INTO departments VALUES (40,'OPERATIONS','BOSTON');
COMMIT;


CREATE TABLE employees (
  employee_id   NUMBER(4) CONSTRAINT employees_pk PRIMARY KEY,
  employee_name VARCHAR2(10),
  job           VARCHAR2(9),
  manager_id    NUMBER(4),
  hiredate      DATE,
  salary        NUMBER(7,2),
  commission    NUMBER(7,2),
  department_id NUMBER(2) CONSTRAINT emp_department_id_fk REFERENCES departments(department_id)
);

INSERT INTO employees VALUES (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20);
INSERT INTO employees VALUES (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30);
INSERT INTO employees VALUES (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30);
INSERT INTO employees VALUES (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20);
INSERT INTO employees VALUES (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30);
INSERT INTO employees VALUES (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30);
INSERT INTO employees VALUES (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10);
INSERT INTO employees VALUES (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20);
INSERT INTO employees VALUES (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10);
INSERT INTO employees VALUES (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30);
INSERT INTO employees VALUES (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20);
INSERT INTO employees VALUES (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30);
INSERT INTO employees VALUES (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20);
INSERT INTO employees VALUES (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10);
COMMIT;

These tables are a variant of the EMP and DEPT tables from the SCOTT schema. You will see a lot of Oracle examples on the internet using the tables from the SCOTT schema. You can find the original table definitions in the “$ORACLE_HOME/rdbms/admin/utlsampl.sql” script.

GROUP BY Clause

SQL includes many Aggregate Functions, which take information from multiple rows and aggregate it down to produce fewer rows in the final result set. The following code contains a simple example of that using the COUNT, AVG and SUM aggregate functions. We have 14 rows in the EMPLOYEES table, but the query produces a single row with the count of the rows in the table, as well as the the average and total of the SALARY column for all rows in the table.

SELECT COUNT(*) AS employee_count,
       AVG(e.salary) AS avg_salary,
       SUM(e.salary) AS sum_salary
FROM   employees e;

EMPLOYEE_COUNT AVG_SALARY SUM_SALARY
-------------- ---------- ----------
            14 2073.21429      29025

1 row selected.

SQL>

Without a GROUP BY clause, the whole result set is treated as a single group, so the aggregate functions work on the result set as a whole. Adding the GROUP BY clause splits the results into groups of rows, with the aggregate functions being applied on a group basis. The following example groups the rows by DEPARTMENT_ID, so the aggregates are on a per-department basis.

SELECT e.department_id,
       COUNT(*) AS employee_count,
       AVG(e.salary) AS avg_salary,
       SUM(e.salary) AS sum_salary
FROM   employees e
GROUP BY e.department_id
ORDER BY e.department_id;

DEPARTMENT_ID EMPLOYEE_COUNT AVG_SALARY SUM_SALARY
------------- -------------- ---------- ----------
           10              3 2916.66667       8750
           20              5       2175      10875
           30              6 1566.66667       9400

3 rows selected.

SQL>

The more columns included in the GROUP BY clause, the more granular the aggregates. The following example groups by the DEPARTMENT_ID and JOB columns.

SELECT e.department_id,
       e.job,
       COUNT(*) AS employee_count,
       AVG(e.salary) AS avg_salary,
       SUM(e.salary) AS sum_salary
FROM   employees e
GROUP BY e.department_id, e.job
ORDER BY e.department_id, e.job;

DEPARTMENT_ID JOB       EMPLOYEE_COUNT AVG_SALARY SUM_SALARY
------------- --------- -------------- ---------- ----------
           10 CLERK                  1       1300       1300
           10 MANAGER                1       2450       2450
           10 PRESIDENT              1       5000       5000
           20 ANALYST                2       3000       6000
           20 CLERK                  2        950       1900
           20 MANAGER                1       2975       2975
           30 CLERK                  1        950        950
           30 MANAGER                1       2850       2850
           30 SALESMAN               4       1400       5600

9 rows selected.

SQL>

Remember, all non-aggregate columns must be included in the GROUP BY clause.

Joins

A grouped query can join to other tables to provide additional information. In the following example we want to report some aggregate data based on department, but we want to display the DEPARTMENT_NAME, rather than the DEPARTMENT_ID, so we have to join to the DEPARTMENTS tables.

SELECT d.department_name,
       COUNT(*) AS employee_count,
       AVG(e.salary) AS avg_salary,
       SUM(e.salary) AS sum_salary
FROM   departments d
       JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_name
ORDER BY d.department_name;

DEPARTMENT_NAM EMPLOYEE_COUNT AVG_SALARY SUM_SALARY
-------------- -------------- ---------- ----------
ACCOUNTING                  3 2916.66667       8750
RESEARCH                    5       2175      10875
SALES                       6 1566.66667       9400

3 rows selected.

SQL>

This works fine, but we are losing an important piece of information. The OPERATIONS department has no employees, but we don’t display it because we have used an INNER JOIN. Switching to a LEFT OUTER JOIN appears to fix that problem, but it reports the wrong number of employees.

SELECT d.department_name,
       COUNT(*) AS employee_count,
       AVG(e.salary) AS avg_salary,
       SUM(e.salary) AS sum_salary
FROM   departments d
       LEFT OUTER JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_name
ORDER BY d.department_name;

DEPARTMENT_NAM EMPLOYEE_COUNT AVG_SALARY SUM_SALARY
-------------- -------------- ---------- ----------
ACCOUNTING                  3 2916.66667       8750
OPERATIONS                  1
RESEARCH                    5       2175      10875
SALES                       6 1566.66667       9400

4 rows selected.

SQL>

Now we have a row for the OPERATIONS department, the COUNT(*) call has correctly returned the value 1, but this is not the correct number of employees. This is not the fault of the GROUP BY, but the way the COUNT(*) call works. If instead we reference a mandatory column from the outer-joined table in the COUNT call, we will get the correct result, as NULL values are not counted. The following example shows how.

SELECT d.department_name,
       COUNT(e.employee_id) AS employee_count,
       AVG(e.salary) AS avg_salary,
       SUM(e.salary) AS sum_salary
FROM   departments d
       LEFT OUTER JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_name
ORDER BY d.department_name;

DEPARTMENT_NAM EMPLOYEE_COUNT AVG_SALARY SUM_SALARY
-------------- -------------- ---------- ----------
ACCOUNTING                  3 2916.66667       8750
OPERATIONS                  0
RESEARCH                    5       2175      10875
SALES                       6 1566.66667       9400

4 rows selected.

SQL>

HAVING Clause

We mentioned previously, having more columns referenced in the GROUP BY clause gives us more granular information. The following example groups the data by DEPARTMENT_NAME and JOB.

SELECT d.department_name, e.job,
       COUNT(e.employee_id) AS employee_count,
       AVG(e.salary) AS avg_salary,
       SUM(e.salary) AS sum_salary
FROM   departments d
       LEFT OUTER JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_name, e.job
ORDER BY d.department_name, e.job;

DEPARTMENT_NAM JOB       EMPLOYEE_COUNT AVG_SALARY SUM_SALARY
-------------- --------- -------------- ---------- ----------
ACCOUNTING     CLERK                  1       1300       1300
ACCOUNTING     MANAGER                1       2450       2450
ACCOUNTING     PRESIDENT              1       5000       5000
OPERATIONS                            0
RESEARCH       ANALYST                2       3000       6000
RESEARCH       CLERK                  2        950       1900
RESEARCH       MANAGER                1       2975       2975
SALES          CLERK                  1        950        950
SALES          MANAGER                1       2850       2850
SALES          SALESMAN               4       1400       5600

10 rows selected.

SQL>

Sometimes we are only interested in information from specific groups of data. The HAVING clause allows us to filter out groups that do not meet a specific requirement. You can think of it as a WHERE clause for the GROUP BY clause. In the following example we are only returning information for groups with more than 1 employee.

SELECT d.department_name, e.job,
       COUNT(e.employee_id) AS employee_count,
       AVG(e.salary) AS avg_salary,
       SUM(e.salary) AS sum_salary
FROM   departments d
       LEFT OUTER JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_name, e.job
HAVING COUNT(e.employee_id) > 1
ORDER BY d.department_name, e.job;

DEPARTMENT_NAM JOB       EMPLOYEE_COUNT AVG_SALARY SUM_SALARY
-------------- --------- -------------- ---------- ----------
RESEARCH       ANALYST                2       3000       6000
RESEARCH       CLERK                  2        950       1900
SALES          SALESMAN               4       1400       5600

3 rows selected.

SQL>

Just as a silly aside, it is possible to have a HAVING clause without a GROUP BY clause, as shown in the following example. The whole result set is treated as a single group, so the HAVING clause filer can still be applied.

SELECT COUNT(e.employee_id) AS employee_count,
       AVG(e.salary) AS avg_salary,
       SUM(e.salary) AS sum_salary
FROM   departments d
       LEFT OUTER JOIN employees e ON d.department_id = e.department_id
HAVING COUNT(e.employee_id) > 2
ORDER BY d.department_name, e.job;

EMPLOYEE_COUNT AVG_SALARY SUM_SALARY
-------------- ---------- ----------
            14 2073.21429      29025

1 row selected.

SQL>

23c Updates

From Oracle 23c onward we can use the column alias in GROUP BY and HAVING clauses, or the column position in the GROUP BY clause. This functionality is demonstrated here.

For more information see:

Hope this helps. Regards Tim…

Back to the Top.