The GROUP BY Clause specifies the groups into which output rows are to be placed. It also determines if aggregate functions are included in the SELECT clause (select list), and calculates a summary value for each group.
In other words, whenever the SELECT clause of your SELECT statement includes column values and aggregate functions, you must include the GROUP BY clause. The GROUP BY clause must list all of the column values used in the SELECT clause.
The GROUP BY keywords are followed by a list of columns, known as the grouping columns. The GROUP BY clause restricts the rows of the result set. When a GROUP BY clause is used, each item in the select list must produce a single value for each group. There is only one row for each distinct value in the grouping column or columns. Each result set row contains summary data related to the specific value in its grouping columns. Null values in one item are placed in one group.
The following example retrieves the total for each SalesOrderID from the SalesOrderDetail table.
A.
--
SELECT SalesOrderID, SUM(LineTotal) AS SubTotal
FROM Sales.SalesOrderDetail AS sod
GROUP BY SalesOrderID
ORDER BY SalesOrderID;
GO
-- Retrieve the number of employees in each city
SELECT City, COUNT(EmployeeID) AS NumEmployees
FROM Employees
GROUP BY City;
Sample Database All the examples on this page use the:
AdventureWorks Lightweight (LT) Sample Database.
Pubs Sample Database or
Northwind Sample Database.
if you don't have an instance of these databases running on your pc, please follow the links to download and install them.
The following example retrieves the number of employees for each City from the Address table joined to the EmployeeAddress table. This example uses AdventureWorks.
B.
--
SELECT a.City, COUNT(bea.AddressID) EmployeeCount
FROM Person.BusinessEntityAddress AS bea
INNER JOIN Person.Address AS a
ON bea.AddressID = a.AddressID
GROUP BY a.City
ORDER BY a.City;
GO
The following example retrieves the total sales for each year by using the DATEPART function. The same expression must be present in both the SELECT list and GROUP BY clause.
C.
--
SELECT DATEPART(yyyy,OrderDate) AS N'Year'
,SUM(TotalDue) AS N'Total Order Amount'
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(yyyy,OrderDate)
ORDER BY DATEPART(yyyy,OrderDate);
GO
The HAVING clause is used to filter grouped data. For example, the following example uses the HAVING clause to specify which of the groups generated in the GROUP BY clause should be included in the result set.
D.
--
SELECT DATEPART(yyyy,OrderDate) AS N'Year'
,SUM(TotalDue) AS N'Total Order Amount'
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(yyyy,OrderDate)
HAVING DATEPART(yyyy,OrderDate) >= N'2003'
ORDER BY DATEPART(yyyy,OrderDate);
GO
/*
Retrieve the number of employees in each city in which there are at least 2 employees.
*/
SELECT City, COUNT(EmployeeID) AS NumEmployees
FROM Employees
GROUP BY City
HAVING COUNT(EmployeeID) > 1;
Here is an example of a query using every clause listed above:
/*
Find the number of sales representatives in each city that contains at least 2 sales representatives. Order by the number of employees.
*/
SELECT City, COUNT(EmployeeID) AS NumEmployees
FROM Employees
WHERE Title = 'Sales Representative'
GROUP BY City
HAVING COUNT(EmployeeID) > 1
ORDER BY NumEmployees;
Understanding the correct sequence in which the WHERE, GROUP BY, and HAVING clauses are applied helps in coding efficient queries:
The WHERE clause is used to filter the rows that result from the operations specified in the FROM clause.
The GROUP BY clause is used to group the output of the WHERE clause.
The HAVING clause is used to filter rows from the grouped result.
For any search conditions that could be applied either before or after the grouping operation, it is more efficient to specify them in the WHERE clause. This reduces the number of rows that have to be grouped. The only search conditions that should be specified in the HAVING clause are those search conditions that must be applied after the grouping operation has been performed.
The following query shows HAVING with an aggregate function. It groups the rows in the titles table by type and eliminates the groups that include only one book.
1. Every non-aggregate column that appears in the SELECT clause must also appear in the GROUP BY clause.
2. You may not use aliases in the HAVING clause. MySQL allows usage of aliases in the HAVING clause, but you may want to avoid this to keep your code as cross-database compatible as possible.
3. You may use aliases in the ORDER BY clause.
4. To filter rows based on aggregate functions, use the HAVING clause.
5. You should declare column aliases for any calculated fields in the SELECT clause.
6. You may not use aggregate functions in the WHERE clause.
When used without a GROUP BY clause, aggregate functions report only one aggregate value for a SELECT statement. The GROUP BY clause is used to produce aggregate values for each row in the result set.
This example returns the number of units sold for each product in category 2:
SELECT OrdD.ProductID AS ProdID,
SUM(OrdD.Quantity) AS AmountSold
FROM [Order Details] AS OrdD JOIN Products as Prd
ON OrdD.ProductID = Prd.ProductID
AND Prd.CategoryID = 2
GROUP BY OrdD.ProductID
Here is the result set:
ProdID | AmountSold | |
---|---|---|
3 | 328 | |
4 | 453 | |
5 | 298 | |
6 | 301 | |
8 | 372 | |
15 | 122 | |
44 | 601 | |
61 | 603 | |
63 | 445 | |
65 | 745 | |
66 | 239 | |
77 | 791 | |
** (12 row(s) affected) |
One or more aggregate-free expressions. These are usually references to the grouping columns.
Optionally, the ALL keyword, which specifies that all groups produced by the GROUP BY clause are returned, even if some of the groups do not have any rows that meet the search conditions.
CUBE or ROLLUP.
The HAVING clause is typically used in conjunction with the GROUP BY clause, although HAVING can be specified separately.
You can group by an expression as long as it does not include aggregate functions, for example:
SELECT DATEPART(yy, HireDate) AS Year,
COUNT(*) AS NumberOfHires
FROM Northwind.dbo.Employees
GROUP BY DATEPART(yy, HireDate)
USE pubs
SELECT type, pub_id, 'avg' = AVG(price), 'sum' = sum(ytd_sales)
FROM titles
GROUP BY type, pub_id
Here is the result set:
type | pub_id avg | avg | sum |
---|---|---|---|
business | 0736 | 11.96 | 18722 |
business | 1389 | 17.31 | 12066 |
mod_cook | 0877 | 11.49 | 24278 |
popular_comp | 1389 | 21.48 | 12875 |
psychology | 0736 | 45.93 | 9564 |
psychology | 0877 | 21.59 | 375 |
trad_cook | 0877 | 15.96 | 19566 |
UNDECIDED | 0877 | (null) | (null) |
** (8 row(s) affected) |
USE pubs
SELECT type
FROM titles
GROUP BY type
HAVING COUNT(*) > 1
** |
USE pubs
SELECT type
FROM titles
GROUP BY type
HAVING type LIKE 'p%'
When multiple conditions are included in HAVING, they are combined with AND, OR, or NOT. The following example shows how to group titles by publisher, including only those publishers with identification numbers greater than 0800, who have paid more than $15,000 in total advances, and who sell books for an average of less than $20.
SELECT pub_id, SUM(advance) AS AmountAdvanced,
AVG(price) AS AveragePrice
FROM pubs.dbo.titles
WHERE pub_id > '0800'
GROUP BY pub_id
HAVING SUM(advance) > $15000
AND AVG(price) < $20
ORDER BY can be used to order the output of a GROUP BY clause. This example shows using the ORDER BY clause to define the order in which the rows from a GROUP BY clause are returned:
SELECT pub_id, SUM(advance) AS AmountAdvanced,
AVG(price) AS AveragePrice
FROM pubs.dbo.titles
WHERE pub_id > '0800'
AND price >= $5
GROUP BY pub_id
HAVING SUM(advance) > $15000
AND AVG(price) < $20
ORDER BY pub_id DESC
Note