The GROUP BY Clause specifies the groups into which output rows are to be placed. It also determines if aggregate functions are included in the SELECT clause (select list), and calculates a summary value for each group.
In other words, whenever the SELECT clause of your SELECT statement includes column values and aggregate functions, you must include the GROUP BY clause. The GROUP BY clause must list all of the column values used in the SELECT clause.
The HAVING clause sets conditions on the GROUP BY clause similar to the way WHERE interacts with SELECT. The WHERE search condition is applied before the grouping operation occurs; the HAVING search condition is applied after the grouping operation occurs. The HAVING syntax is exactly like the WHERE syntax, except HAVING can contain aggregate functions. HAVING clauses can reference any of the items that appear in the select list.
This query finds publishers who have had year-to-date sales greater than $40,000:
USE pubs
SELECT pub_id, total = SUM(ytd_sales)
FROM titles
GROUP BY pub_id
HAVING SUM(ytd_sales) > 40000
Here is the result set:
pub_id total ------ ----------- 0877 44219 (1 row(s) affected)To make sure there are at least six books involved in the calculations for each publisher, this example uses HAVING COUNT(*) > 5 to eliminate the publishers that return totals for fewer than six books:
USE pubs
SELECT pub_id, total = SUM(ytd_sales)
FROM titles
GROUP BY pub_id
HAVING COUNT(*) > 5
Understanding the correct sequence in which the WHERE, GROUP BY, and HAVING clauses are applied helps in coding efficient queries:
/////////////////
Syntax:
The following is the position of the HAVING clause in a query:
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
The HAVING clause must follow the GROUP BY clause in a query and must also precede the ORDER BY clause if used. The following is the syntax of the SELECT statement, including the HAVING clause:
//////////////
The WHERE clause is used to filter the rows that result from the operations specified in the FROM clause.
The GROUP BY clause is used to group the output of the WHERE clause.
The HAVING clause is used to filter rows from the grouped result.
For any search conditions that could be applied either before or after the grouping operation, it is more efficient to specify them in the WHERE clause. This reduces the number of rows that have to be grouped. The only search conditions that should be specified in the HAVING clause are those search conditions that must be applied after the grouping operation has been performed.
The following query shows HAVING with an aggregate function. It groups the rows in the titles table by type and eliminates the groups that include only one book.
Sample Database All the examples on this page use the:
AdventureWorks Lightweight (LT) Sample Database.
Pubs Sample Database or
Northwind Sample Database.
if you don't have an instance of these databases running on your pc, please follow the links to download and install them.
USE pubs
SELECT type
FROM titles
GROUP BY type
HAVING COUNT(*) > 1
Here is the result set:
type | ||
---|---|---|
business | ||
mod_cook | ||
popular_comp | ||
psychology | ||
trad_cook | ||
** (5 row(s) affected) |
Below is an example of a HAVING clause without aggregate functions. It groups the rows in titles by type and eliminates those types that do not start with the letter p.
USE pubs
SELECT type
FROM titles
GROUP BY type
HAVING type LIKE 'p%'
Here is the result set:
type | ||
---|---|---|
popular_comp | ||
psychology | ||
** (2 row(s) affected) |
When multiple conditions are included in HAVING, they are combined with AND, OR, or NOT. The following example shows how to group titles by publisher, including only those publishers with identification numbers greater than 0800, who have paid more than $15,000 in total advances, and who sell books for an average of less than $20.
SELECT pub_id, SUM(advance) AS AmountAdvanced,
AVG(price) AS AveragePrice
FROM pubs.dbo.titles
WHERE pub_id > '0800'
GROUP BY pub_id
HAVING SUM(advance) > $15000
AND AVG(price) < $20
ORDER BY can be used to order the output of a GROUP BY clause. This example shows using the ORDER BY clause to define the order in which the rows from a GROUP BY clause are returned:
SELECT pub_id, SUM(advance) AS AmountAdvanced,
AVG(price) AS AveragePrice
FROM pubs.dbo.titles
WHERE pub_id > '0800'
AND price >= $5
GROUP BY pub_id
HAVING SUM(advance) > $15000
AND AVG(price) < $20
ORDER BY pub_id DESC
The GROUP BY keywords are followed by a list of columns, known as the grouping columns. The GROUP BY clause restricts the rows of the result set. When a GROUP BY clause is used, each item in the select list must produce a single value for each group. There is only one row for each distinct value in the grouping column or columns. Each result set row contains summary data related to the specific value in its grouping columns. Null values in one item are placed in one group.
The following example retrieves the total for each SalesOrderID from the SalesOrderDetail table.
A.
--
SELECT SalesOrderID, SUM(LineTotal) AS SubTotal
FROM Sales.SalesOrderDetail AS sod
GROUP BY SalesOrderID
ORDER BY SalesOrderID;
GO
-- Retrieve the number of employees in each city
SELECT City, COUNT(EmployeeID) AS NumEmployees
FROM Employees
GROUP BY City;
The following example retrieves the number of employees for each City from the Address table joined to the EmployeeAddress table. This example uses AdventureWorks.
B.
--
SELECT a.City, COUNT(bea.AddressID) EmployeeCount
FROM Person.BusinessEntityAddress AS bea
INNER JOIN Person.Address AS a
ON bea.AddressID = a.AddressID
GROUP BY a.City
ORDER BY a.City;
GO
The following example retrieves the total sales for each year by using the DATEPART function. The same expression must be present in both the SELECT list and GROUP BY clause.
C.
--
SELECT DATEPART(yyyy,OrderDate) AS N'Year'
,SUM(TotalDue) AS N'Total Order Amount'
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(yyyy,OrderDate)
ORDER BY DATEPART(yyyy,OrderDate);
GO
The HAVING clause is used to filter grouped data. For example, the following example uses the HAVING clause to specify which of the groups generated in the GROUP BY clause should be included in the result set.
D.
--
SELECT DATEPART(yyyy,OrderDate) AS N'Year'
,SUM(TotalDue) AS N'Total Order Amount'
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(yyyy,OrderDate)
HAVING DATEPART(yyyy,OrderDate) >= N'2003'
ORDER BY DATEPART(yyyy,OrderDate);
GO
/*
Retrieve the number of employees in each city in which there are at least 2 employees.
*/
SELECT City, COUNT(EmployeeID) AS NumEmployees
FROM Employees
GROUP BY City
HAVING COUNT(EmployeeID) > 1;
Here is an example of a query using every clause listed above:
/*
Find the number of sales representatives in each city that contains at least 2 sales representatives. Order by the number of employees.
*/
SELECT City, COUNT(EmployeeID) AS NumEmployees
FROM Employees
WHERE Title = 'Sales Representative'
GROUP BY City
HAVING COUNT(EmployeeID) > 1
ORDER BY NumEmployees;
1. Every non-aggregate column that appears in the SELECT clause must also appear in the GROUP BY clause.
2. You may not use aliases in the HAVING clause. MySQL allows usage of aliases in the HAVING clause, but you may want to avoid this to keep your code as cross-database compatible as possible.
3. You may use aliases in the ORDER BY clause.
4. To filter rows based on aggregate functions, use the HAVING clause.
5. You should declare column aliases for any calculated fields in the SELECT clause.
6. You may not use aggregate functions in the WHERE clause.
Note