SQL for Beginners (Part 7) : The GROUP BY Clause and HAVING Clause
Home » Articles » Misc » Here
Mục Lục
SQL for Beginners (Part 7) : The GROUP BY Clause and HAVING Clause
This is the seventh part of a series of articles showing the basics of SQL. In this article we take a look at the GROUP BY
clause and HAVING
clause in SQL.
Related articles.
Setup
You can perform all these queries online for free using SQL Fiddle.
The examples in this article require the following tables to be present.
--DROP TABLE employees PURGE; --DROP TABLE departments PURGE; CREATE TABLE departments ( department_id NUMBER(2) CONSTRAINT departments_pk PRIMARY KEY, department_name VARCHAR2(14), location VARCHAR2(13) ); INSERT INTO departments VALUES (10,'ACCOUNTING','NEW YORK'); INSERT INTO departments VALUES (20,'RESEARCH','DALLAS'); INSERT INTO departments VALUES (30,'SALES','CHICAGO'); INSERT INTO departments VALUES (40,'OPERATIONS','BOSTON'); COMMIT; CREATE TABLE employees ( employee_id NUMBER(4) CONSTRAINT employees_pk PRIMARY KEY, employee_name VARCHAR2(10), job VARCHAR2(9), manager_id NUMBER(4), hiredate DATE, salary NUMBER(7,2), commission NUMBER(7,2), department_id NUMBER(2) CONSTRAINT emp_department_id_fk REFERENCES departments(department_id) ); INSERT INTO employees VALUES (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20); INSERT INTO employees VALUES (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30); INSERT INTO employees VALUES (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30); INSERT INTO employees VALUES (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20); INSERT INTO employees VALUES (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30); INSERT INTO employees VALUES (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30); INSERT INTO employees VALUES (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10); INSERT INTO employees VALUES (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20); INSERT INTO employees VALUES (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10); INSERT INTO employees VALUES (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30); INSERT INTO employees VALUES (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20); INSERT INTO employees VALUES (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30); INSERT INTO employees VALUES (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20); INSERT INTO employees VALUES (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10); COMMIT;
These tables are a variant of the EMP and DEPT tables from the SCOTT schema. You will see a lot of Oracle examples on the internet using the tables from the SCOTT schema. You can find the original table definitions in the “$ORACLE_HOME/rdbms/admin/utlsampl.sql” script.
GROUP BY Clause
SQL includes many Aggregate Functions, which take information from multiple rows and aggregate it down to produce fewer rows in the final result set. The following code contains a simple example of that using the COUNT
, AVG
and SUM
aggregate functions. We have 14 rows in the EMPLOYEES
table, but the query produces a single row with the count of the rows in the table, as well as the the average and total of the SALARY
column for all rows in the table.
SELECT COUNT(*) AS employee_count, AVG(e.salary) AS avg_salary, SUM(e.salary) AS sum_salary FROM employees e; EMPLOYEE_COUNT AVG_SALARY SUM_SALARY -------------- ---------- ---------- 14 2073.21429 29025 1 row selected. SQL>
Without a GROUP BY
clause, the whole result set is treated as a single group, so the aggregate functions work on the result set as a whole. Adding the GROUP BY
clause splits the results into groups of rows, with the aggregate functions being applied on a group basis. The following example groups the rows by DEPARTMENT_ID, so the aggregates are on a per-department basis.
SELECT e.department_id, COUNT(*) AS employee_count, AVG(e.salary) AS avg_salary, SUM(e.salary) AS sum_salary FROM employees e GROUP BY e.department_id ORDER BY e.department_id; DEPARTMENT_ID EMPLOYEE_COUNT AVG_SALARY SUM_SALARY ------------- -------------- ---------- ---------- 10 3 2916.66667 8750 20 5 2175 10875 30 6 1566.66667 9400 3 rows selected. SQL>
The more columns included in the GROUP BY
clause, the more granular the aggregates. The following example groups by the DEPARTMENT_ID and JOB columns.
SELECT e.department_id, e.job, COUNT(*) AS employee_count, AVG(e.salary) AS avg_salary, SUM(e.salary) AS sum_salary FROM employees e GROUP BY e.department_id, e.job ORDER BY e.department_id, e.job; DEPARTMENT_ID JOB EMPLOYEE_COUNT AVG_SALARY SUM_SALARY ------------- --------- -------------- ---------- ---------- 10 CLERK 1 1300 1300 10 MANAGER 1 2450 2450 10 PRESIDENT 1 5000 5000 20 ANALYST 2 3000 6000 20 CLERK 2 950 1900 20 MANAGER 1 2975 2975 30 CLERK 1 950 950 30 MANAGER 1 2850 2850 30 SALESMAN 4 1400 5600 9 rows selected. SQL>
Remember, all non-aggregate columns must be included in the GROUP BY
clause.
Joins
A grouped query can join to other tables to provide additional information. In the following example we want to report some aggregate data based on department, but we want to display the DEPARTMENT_NAME, rather than the DEPARTMENT_ID, so we have to join to the DEPARTMENTS tables.
SELECT d.department_name, COUNT(*) AS employee_count, AVG(e.salary) AS avg_salary, SUM(e.salary) AS sum_salary FROM departments d JOIN employees e ON d.department_id = e.department_id GROUP BY d.department_name ORDER BY d.department_name; DEPARTMENT_NAM EMPLOYEE_COUNT AVG_SALARY SUM_SALARY -------------- -------------- ---------- ---------- ACCOUNTING 3 2916.66667 8750 RESEARCH 5 2175 10875 SALES 6 1566.66667 9400 3 rows selected. SQL>
This works fine, but we are losing an important piece of information. The OPERATIONS department has no employees, but we don’t display it because we have used an INNER JOIN
. Switching to a LEFT OUTER JOIN
appears to fix that problem, but it reports the wrong number of employees.
SELECT d.department_name, COUNT(*) AS employee_count, AVG(e.salary) AS avg_salary, SUM(e.salary) AS sum_salary FROM departments d LEFT OUTER JOIN employees e ON d.department_id = e.department_id GROUP BY d.department_name ORDER BY d.department_name; DEPARTMENT_NAM EMPLOYEE_COUNT AVG_SALARY SUM_SALARY -------------- -------------- ---------- ---------- ACCOUNTING 3 2916.66667 8750 OPERATIONS 1 RESEARCH 5 2175 10875 SALES 6 1566.66667 9400 4 rows selected. SQL>
Now we have a row for the OPERATIONS department, the COUNT(*)
call has correctly returned the value 1, but this is not the correct number of employees. This is not the fault of the GROUP BY
, but the way the COUNT(*)
call works. If instead we reference a mandatory column from the outer-joined table in the COUNT
call, we will get the correct result, as NULL values are not counted. The following example shows how.
SELECT d.department_name, COUNT(e.employee_id) AS employee_count, AVG(e.salary) AS avg_salary, SUM(e.salary) AS sum_salary FROM departments d LEFT OUTER JOIN employees e ON d.department_id = e.department_id GROUP BY d.department_name ORDER BY d.department_name; DEPARTMENT_NAM EMPLOYEE_COUNT AVG_SALARY SUM_SALARY -------------- -------------- ---------- ---------- ACCOUNTING 3 2916.66667 8750 OPERATIONS 0 RESEARCH 5 2175 10875 SALES 6 1566.66667 9400 4 rows selected. SQL>
HAVING Clause
We mentioned previously, having more columns referenced in the GROUP BY
clause gives us more granular information. The following example groups the data by DEPARTMENT_NAME and JOB.
SELECT d.department_name, e.job, COUNT(e.employee_id) AS employee_count, AVG(e.salary) AS avg_salary, SUM(e.salary) AS sum_salary FROM departments d LEFT OUTER JOIN employees e ON d.department_id = e.department_id GROUP BY d.department_name, e.job ORDER BY d.department_name, e.job; DEPARTMENT_NAM JOB EMPLOYEE_COUNT AVG_SALARY SUM_SALARY -------------- --------- -------------- ---------- ---------- ACCOUNTING CLERK 1 1300 1300 ACCOUNTING MANAGER 1 2450 2450 ACCOUNTING PRESIDENT 1 5000 5000 OPERATIONS 0 RESEARCH ANALYST 2 3000 6000 RESEARCH CLERK 2 950 1900 RESEARCH MANAGER 1 2975 2975 SALES CLERK 1 950 950 SALES MANAGER 1 2850 2850 SALES SALESMAN 4 1400 5600 10 rows selected. SQL>
Sometimes we are only interested in information from specific groups of data. The HAVING
clause allows us to filter out groups that do not meet a specific requirement. You can think of it as a WHERE
clause for the GROUP BY
clause. In the following example we are only returning information for groups with more than 1 employee.
SELECT d.department_name, e.job, COUNT(e.employee_id) AS employee_count, AVG(e.salary) AS avg_salary, SUM(e.salary) AS sum_salary FROM departments d LEFT OUTER JOIN employees e ON d.department_id = e.department_id GROUP BY d.department_name, e.job HAVING COUNT(e.employee_id) > 1 ORDER BY d.department_name, e.job; DEPARTMENT_NAM JOB EMPLOYEE_COUNT AVG_SALARY SUM_SALARY -------------- --------- -------------- ---------- ---------- RESEARCH ANALYST 2 3000 6000 RESEARCH CLERK 2 950 1900 SALES SALESMAN 4 1400 5600 3 rows selected. SQL>
Just as a silly aside, it is possible to have a HAVING
clause without a GROUP BY
clause, as shown in the following example. The whole result set is treated as a single group, so the HAVING
clause filer can still be applied.
SELECT COUNT(e.employee_id) AS employee_count, AVG(e.salary) AS avg_salary, SUM(e.salary) AS sum_salary FROM departments d LEFT OUTER JOIN employees e ON d.department_id = e.department_id HAVING COUNT(e.employee_id) > 2 ORDER BY d.department_name, e.job; EMPLOYEE_COUNT AVG_SALARY SUM_SALARY -------------- ---------- ---------- 14 2073.21429 29025 1 row selected. SQL>
23c Updates
From Oracle 23c onward we can use the column alias in GROUP BY and HAVING clauses, or the column position in the GROUP BY clause. This functionality is demonstrated here.
For more information see:
Hope this helps. Regards Tim…
Back to the Top.